Group leader of UDBMS, University of Helsinki
Email : jiahenglu.at.gmail.com
Office : Exactum C211
Research goal: Improving the performance and usability of database systems
I am a computer scientist and a teacher, with a broad interest in databases and data management. My recent interests include multi-model database management systems, semantic string processing, and job optimization for big data platforms.
I was awarded a PhD degree in 2007 from the National University of Singapore. My PhD topic was XML query processing. I did two years of postdoctoral research at the University of California, Irvine. I then joined Renmin University of China in 2008, where I worked for seven years. I am now working at the University of Helsinki, Finland. I have broad research and teaching experience in four countries (China, Singapore, USA, and Finland).
One of my books on Hadoop was named one of the Top 10 Bestselling IT Books in China.
- We were awarded a new grant on multi-model data management by the Academy of Finland. See abstract. (22.06.2017).
- A new PhD student, Yuxing Chen, will join our research group in Helsinki on 06.03.2017. Welcome Yuxing! (04.03.2017).
- We will give a new tutorial on multi-model data management at the EDBT 2017 conference [PDF][slides] (01.02.2017).
- We were awarded a new grant on heterogeneous big data platforms by Huawei (14.12.2016).
- "Towards Benchmarking Multi-Model Databases" CIDR 2017 [Abstract] (16.10.2016).
- Our group gave a poster presentation at the Linux Jubilee seminar at the department on 22 August 2016. [Poster] (16.10.2016).
- A new PhD student, Chao Zhang, will join our research group in Helsinki on 09.09.2016. Welcome Chao! (03.09.2016).
- Prof. Irena Holubova from Charles University in Prague visited our group (June-August 2016) to collaborate on multi-model databases. Thanks Irena! (03.09.2016).
- We organized a workshop with ICDE 2016 in Helsinki on keyword search and data exploration (20.05.2016).
- We organized the First Europe-China workshop on Big Data Management at the University of Helsinki (16.05.2016).
- Multi-model database management systems: As more businesses realize that data, in all forms and sizes, is critical to making the best possible decisions, we see continued growth of systems that support massive volumes of non-relational or unstructured data. Our research focus is to develop new theories and algorithms for a novel multi-model database management system that manages both well-structured data and NoSQL data. Our approach aims to reduce integration issues, simplify operations, and eliminate migration issues between relational and NoSQL data.
- Jiaheng Lu: Towards Benchmarking Multi-Model Databases (Abstract). CIDR 2017 [PDF]
- Jiaheng Lu, Irena Holubova: Multi-model Data Management: What's New and What's Next? EDBT 2017 Tutorial [PDF][slides]
- Semantic-based similarity string search and join: String data is ubiquitous, and supporting semantic string processing is an important task in databases. The vision of this project is to enhance the usability of databases with semantics by extending query languages and keyword search. The results returned by our techniques will rely not on syntactic matching of strings but on the actual meaning and context of terms.
- Pengfei Xu, Jiaheng Lu: Top-k String Auto-Completion with Synonyms. DASFAA (2) 2017: 202-218 [PDF, Slides, Source Codes]
- Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Xiaokui Xiao: Boosting the Quality of Approximate String Matching by Synonyms. ACM Trans. Database Syst. 40(3): 15 (2015) [PDF]
- Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Haiyong Wang: String similarity measures and joins with synonyms. SIGMOD Conference 2013: 373-384 [PDF]
Code and dataset release
- Multi-model data generation and benchmark: We developed a new benchmark called UniBench to provide a comprehensive evaluation of multi-model databases. Download the data and scripts here.
- Seminar on big data management (Spring 2016, 2017): Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. The seminar covered selected topics on the challenges of big data management, including big data platforms, querying, exploration, analysis, sampling, and cloud data management, as well as big data applications.
- Introduction to big data management (Autumn 2016, 2017): We are in the era of "big data". Data sets grow fast in size because they are increasingly being gathered by cheap and numerous information-sensing mobile devices, remote sensing, software logs, cameras, microphones, and wireless sensor networks. Most big data environments go beyond relational databases and traditional data warehouse platforms. The increasing focus on big data is shaping new algorithms and techniques. This course mainly discusses selected algorithms and systems for big data management, including data sketch algorithms, the Hadoop MapReduce framework, and query languages for XML and graph documents.
- Yuxing Chen (2017-)
- Pengfei Xu (2016-)
- Chao Zhang (2015-)
- Yu Liu (Renmin University of China) (2014-)
- Juwei Shi (Renmin University of China) (2013-)
- Zhaoan Dong (Renmin University of China) (2013-)
- Keyword search and data exploration workshop 2016 with ICDE 2016
- Keyword search on structured data (KEYS) workshop with SIGMOD 2012
- XML-DM Workshop with WAIM 2010
- Cloud-DB workshop with CIKM 2010
- ACM SIGMOD'2010, 2013, 2014, 2015, 2016 Research track
- Proceedings of the VLDB Endowment (PVLDB) 2010, 2015, 2017
- IEEE ICDE Conference 2011, 2017
- Database Systems for Advanced Applications Conference DASFAA 2010, 2012, 2013, 2014
- Asia-Pacific Web Conference APWeb 2008, 2009, 2011, 2013, 2014, 2015
- Web-Age Information Management Conference WAIM 2014, 2015, 2016
- WAIM-APWEB Conference 2017
- Web Information Systems Engineering (WISE) Conference 2009
- Chinese Conference on Information Retrieval (CCIR) 2015, 2016
- Australasian Database Conference ADC 2013, 2017