Group leader of UDBMS, University of Helsinki
Email: jiahenglu.at.gmail.com
Office: Exactum C211
Research goal: Improving the performance and usability of database systems
I am a computer scientist and a teacher, with a research interest in databases and data management. My recent topics include multi-model database management systems, semantic string processing, and job optimization for big data platforms.
I was awarded a Ph.D. degree in 2007 by the National University of Singapore. My PhD topic was XML query processing. I spent two years as a postdoctoral researcher at the University of California, Irvine. I then joined Renmin University of China in 2008, where I worked for seven years. I am now working at the University of Helsinki, Finland. I have broad research and teaching experience in four countries (China, Singapore, USA, and Finland).
One of my books on Hadoop was named one of the Top 10 Bestselling IT Books in China.
- Two new PhD students Zhengtong Yan and Shuxun Zhang joined our UDBMS group in October. Welcome Zhengtong and Shuxun! (9.10.2020).
- Our Data Science M.Sc. program is among the global top 10 according to Forbes. Congratulations to all the program's teachers and students! [Link] (4.10.2020).
- I was promoted to a permanent professorship in Computer Science (Data Management). Thanks to everyone who has supported my academic career! (4.10.2020).
- We will give a new tutorial at CIKM 2020! "Multi-Model Data Query Languages and Processing Paradigms" [Details] (23.5.2020).
- Our survey paper "A Survey on Automatic Parameter Tuning for Big Data Processing Systems" has been published in ACM Computing Surveys (CSUR) [Open access], [Related VLDB tutorial] (2.5.2020).
- We published a new journal paper on benchmarking multi-model databases: "Holistic evaluation in multi-model databases benchmarking." [PDF] [Code] (6.3.2020).
- We received a research gift fund from Oracle, California. I gave an invited talk at Oracle, titled "A Categorical Framework on Multi-Model Databases" [PDF] (7.9.2019).
- We will give a new tutorial at CIKM 2019! "Synergy of Database Techniques and Machine Learning Models for String Similarity Search and Join" [Details] (7.9.2019).
- Two papers were accepted at VLDB 2019! One research paper: "Towards a Unified Framework for String Similarity Joins" [PDF], [Source code], [Slides], and one demo paper: "PivotE: Revealing and Visualizing the Underlying Entity Structures for Exploration" [PDF] (22.6.2019).
- We will give a new tutorial on autonomous performance tuning at VLDB 2019! See more information here (24.4.2019).
- A new survey paper (38 pages) on multi-model databases (to appear in ACM Computing Surveys)! [PDF] (6.3.2019).
- A new postdoctoral researcher, Dr. Qingsong Guo, joined our research group in Helsinki on 14.1.2019. Welcome Qingsong! (22.1.2019).
- Multi-model database management systems: As more businesses realize that data, in all forms and sizes, is critical to making the best possible decisions, we see the continued growth of systems that support massive volumes of non-relational or unstructured data. Our research focus is to develop new theories and algorithms for a novel multi-model database management system that manages both well-structured data and NoSQL data. Our approach will reduce integration issues, simplify operations, and eliminate migration issues between relational and NoSQL data.
- Jiaheng Lu, Irena Holubova: Multi-model Databases: A New Journey to Handle the Variety of Data. ACM Computing Surveys 2019 [PDF]
- Jiaheng Lu, Irena Holubova, Bogdan Cautis: Multi-model Databases and Tightly Integrated Polystores. CIKM 2018 Tutorial [PDF]
- Jiaheng Lu: Towards Benchmarking Multi-Model Databases (Abstract). CIDR 2017 [PDF]
- Jiaheng Lu, Irena Holubova: Multi-model Data Management: What's New and What's Next? EDBT 2017 Tutorial [PDF] [Slides]
- Chao Zhang, Jiaheng Lu, Pengfei Xu, Yuxing Chen: UniBench: A Benchmark for Multi-model Database Management Systems. TPCTC 2018: 7-23 [PDF]
- Semantic-based similarity string search and join: String data is ubiquitous, and supporting semantic string processing is an important task in databases. The vision of this project is to enhance the usability of databases with semantics by extending query languages and keyword search. The results returned by our techniques rely not on syntactic matching of strings but on the actual meaning and context of terms.
- Pengfei Xu, Jiaheng Lu: Towards a Unified Framework for String Similarity Joins. PVLDB 12(12) 2019 [PDF], [Slides], [Source code]
- Pengfei Xu, Jiaheng Lu: Top-k String Auto-Completion with Synonyms. DASFAA (2) 2017: 202-218 [Slides, Source code]
- Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Xiaokui Xiao: Boosting the Quality of Approximate String Matching by Synonyms. ACM Trans. Database Syst. 40(3): 15 (2015)
- Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Haiyong Wang: String similarity measures and joins with synonyms. SIGMOD Conference 2013: 373-384
Code and dataset release
- Multi-model data generation and benchmark: We developed a new benchmark called UniBench to give a comprehensive evaluation for multi-model databases. Download the data and scripts here.
- Seminar on big data management: Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. The seminar covered selected topics about challenges of big data management, including big data platform, querying, exploration, analysis, sampling, and cloud data management, as well as big data applications.
- Introduction to big data management: We are in the era of "big data". Data sets grow rapidly in size because they are increasingly gathered by cheap and numerous information-sensing mobile devices, remote sensing, software logs, cameras, microphones, and wireless sensor networks. Most big data environments go beyond relational databases and traditional data warehouse platforms, and the increasing focus on big data is shaping new algorithms and techniques. This course mainly discusses selected algorithms and systems for big data management, including data sketch algorithms, the Hadoop MapReduce framework, and query languages for XML and graph documents.
- Shuxun Zhang (2020-)
- Zhengtong Yan (2020-)
- Gongsheng Yuan (2017-)
- Yuxing Chen (2017-)
- Pengfei Xu (2016-2020) Thesis title: Efficient Approximate String Matching with Synonyms and Taxonomies
- Chao Zhang (2015-)
- Yu Liu (Renmin University of China) (2014-2018) (Co-supervised with Prof. Zhewei Wei)
- Juwei Shi (Renmin University of China) (2013-2018)
- Zhaoan Dong (Renmin University of China) (2013-2018) (Co-supervised with Prof. Xiaofang Zhou and Prof. Ju Fan)
- Workshop co-chair, ER 2018.
- Keyword search and data exploratory workshop 2016 with ICDE 2016
- Keyword search on structured data (KEYS) workshop with SIGMOD 2012
- XML-DM Workshop with WAIM 2010
- Cloud-DB workshop with CIKM 2010
- ACM SIGMOD 2010, 2013, 2014, 2015, 2016 Research track
- Proceedings of the VLDB Endowment (PVLDB) 2010, 2015, 2017, 2020, 2021
- IEEE ICDE Conference 2011, 2017, 2019, 2020
- ER Conference 2018, 2019
- Database Systems for Advanced Applications Conference DASFAA 2010, 2012, 2013, 2014, 2020, 2021
- Asia-Pacific Web Conference APWeb 2008, 2009, 2011, 2013, 2014, 2015
- Web-Age Information Management Conference WAIM 2014, 2015, 2016
- WAIM-APWEB Conference 2017
- Web Information Systems Engineering (WISE) Conference 2009
- Chinese Conference on Information Retrieval (CCIR) 2015, 2016
- Australasian Database Conference ADC 2013, 2017, 2018, 2019