Group leader of UDBMS, University of Helsinki
Email : jiahenglu.at.gmail.com
Office : Exactum C211
Research goal: Improving the performance and usability of database systems
I am a computer scientist and a teacher, with a broad interest in databases and data management. My recent interests include multi-model database management systems, semantic string processing, and job optimization for big data platforms.
I was awarded a PhD degree in 2007 from the National University of Singapore. My PhD topic was XML query processing. I did two years of postdoctoral research at the University of California, Irvine. I then joined the Renmin University of China in 2008, where I worked for seven years. I am now working at the University of Helsinki, Finland. I have broad research and teaching experience in four countries (China, Singapore, USA, and Finland).
One of my books on Hadoop was named one of the Top 10 Bestselling IT Books in China.
- We published three papers on the vision and benchmark for multi-model databases: Vision 1, Vision 2, Benchmark (05.08.2018).
- An invited lecture on big data management at the 2018 Summer School "Challenges for the XXI century: data, information and communication" under the Utrecht Network collaboration. See the exercise questions. (28.06.2018).
- Congratulations to my two PhD students at Renmin University of China, who successfully defended their PhD theses! Juwei Shi: Performance Evaluation, Models and Optimization for Big Data Analytics Platforms. Yu Liu: Structural-Based Approximate Algorithms for Massive Graphs. (18.05.2018).
- An invited lecture at the EDUFI Winter School in 2018 on "Introduction to Big Data Management". [Talk_slides] [Task] (03.04.2018).
- A new PhD student, Gongsheng Yuan, joined our research group in Helsinki on 30.10.2017. Welcome Gongsheng! (03.11.2017). Archived news
- Multi-model database management systems: As more businesses realize that data, in all forms and sizes, is critical to making the best possible decisions, we see continued growth of systems that support massive volumes of non-relational or unstructured data. Our research focus is to develop new theories and algorithms for a novel multi-model database management system that manages both well-structured data and NoSQL data. Our approach aims to reduce integration issues, simplify operations, and eliminate migration issues between relational and NoSQL data.
- Jiaheng Lu: Towards Benchmarking Multi-Model Databases (Abstract). CIDR 2017 [PDF]
- Jiaheng Lu, Irena Holubova: Multi-model Data Management: What's New and What's Next? EDBT 2017 Tutorial [PDF][slides]
- Semantic-based similarity string search and join: String data is ubiquitous, and supporting semantic string processing is an important task in databases. The vision of this project is to enhance the usability of databases with semantics by extending query languages and keyword search. The results returned by our techniques rely not on the syntactic matching of strings but on the real meaning and context of terms.
- Pengfei Xu, Jiaheng Lu: Top-k String Auto-Completion with Synonyms. DASFAA (2) 2017: 202-218 [PDF, Slides, Source Codes]
- Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Xiaokui Xiao: Boosting the Quality of Approximate String Matching by Synonyms. ACM Trans. Database Syst. 40(3): 15 (2015) [PDF]
- Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Haiyong Wang: String similarity measures and joins with synonyms. SIGMOD Conference 2013: 373-384 [PDF]
Code and dataset release
- Multi-model data generation and benchmark: We developed a new benchmark called UniBench to provide a comprehensive evaluation of multi-model databases. Download the data and scripts here.
- Seminar on big data management (Spring 2016, 2017): Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. The seminar covered selected topics on the challenges of big data management, including big data platforms, querying, exploration, analysis, sampling, and cloud data management, as well as big data applications.
- Introduction to big data management (Autumn 2016, 2017): We are in the era of "big data". Data sets grow fast in size because they are increasingly gathered by cheap and numerous information-sensing mobile devices, remote sensing, software logs, cameras, microphones, and wireless sensor networks. Most big data environments go beyond relational databases and traditional data warehouse platforms. The increasing focus on big data is shaping new algorithms and techniques. This course mainly discusses selected algorithms and systems for big data management, including data sketch algorithms, the Hadoop MapReduce framework, and query languages for XML and graph documents.
- Gongsheng Yuan (2017-)
- Yuxing Chen (2017-)
- Pengfei Xu (2016-)
- Chao Zhang (2015-)
- Yu Liu (Renmin University of China) (2014-2018) (Co-supervised with Prof. Zhewei Wei)
- Juwei Shi (Renmin University of China) (2013-2018)
- Zhaoan Dong (Renmin University of China) (2013-) (Co-supervised with Prof. Xiaofang Zhou and Prof. Ju Fan)
- Workshop co-chair in ER 2018.
- Keyword search and data exploration workshop with ICDE 2016
- Keyword search on structured data (KEYS) workshop with SIGMOD 2012
- XML-DM Workshop with WAIM 2010
- Cloud-DB workshop with CIKM 2010
- ACM SIGMOD 2010, 2013, 2014, 2015, 2016 Research track
- Proceedings of the VLDB Endowment (PVLDB) 2010, 2015, 2017
- IEEE ICDE Conference 2011, 2017, 2019
- ER Conference 2018
- Database Systems for Advanced Applications Conference DASFAA 2010, 2012, 2013, 2014
- Asia-Pacific Web Conference APWeb 2008, 2009, 2011, 2013, 2014, 2015
- Web-Age Information Management Conference WAIM 2014, 2015, 2016
- WAIM-APWEB Conference 2017
- Web Information Systems Engineering (WISE) Conference 2009
- Chinese Conference on Information Retrieval (CCIR) 2015, 2016
- Australasian Database Conference ADC 2013, 2017, 2018