International Fund for Big Data Research in the Department
The UDBMS research group, led by Prof. Jiaheng Lu, recently received a grant for big data research from the Huawei Open Research Innovation Program (HIRP). This is an internationally open research competition. The program received a large number of submissions (more than 600 this year), and only the most competitive ones are chosen to receive the research award. The title of the project is "Large Scale Heterogeneous Data Processing".
Introduction to the project:
A cloud-hosted application is expected to support millions of end users with terabytes of data. To accommodate such large-scale workloads, it is common to deploy thousands of servers in one data center. Meanwhile, existing big data platforms (e.g., Hadoop or Spark) employ naive scheduling algorithms, which consider neither the heterogeneity of resources nor the differences among jobs. This motivates a more advanced scheduling scheme for big data environments.
This project studies three problems with respect to heterogeneous resources and job requirements, namely resource and job modeling, cost modeling, and scheduling. The project is expected to tackle each problem and combine the solutions into an integrated system that can be applied to current big data platforms such as Hadoop or Spark.
Related big data papers:
1. Juwei Shi, Jia Zou, Jiaheng Lu, Zhao Cao, Shiqiang Li, Chen Wang: MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs. Proceedings of the VLDB Endowment 7(13): 1319-1330 (2014)
2. Yu Liu, Jiaheng Lu, Hua Yang, Xiaokui Xiao, Zhewei Wei: Towards Maximum Independent Sets on Massive Graphs. Proceedings of the VLDB Endowment 8(13): 2122-2133 (2015)
3. Jiaheng Lu, Irena Holubova: Multi-model Data Management: What's New and What's Next? EDBT Tutorial 2017
Big data courses in the department:
1. Algorithms and Systems for Big Data Management: This course discusses selected algorithms and systems for big data management, including data sketch algorithms, NoSQL database systems, semi-structured data stores for XML and JSON documents, and the GFS, MapReduce and Bigtable frameworks.
2. Seminar on Big Data Management: This seminar discusses new research papers in different fields of big data management, including data querying, exploration, sampling, sharing, cleansing, big data benchmarking and applications.
Website of the UDBMS group: http://udbms.cs.helsinki.fi/