Algorithms and Systems for Big Data Management
Exam
Year | Semester | Date | Period | Language | Person in charge |
---|---|---|---|---|---|
2016 | autumn | 31.10-14.12. | 2-2 | English | Jiaheng Lu |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Mon 12-14 | C222 | Jiaheng Lu | 31.10.2016-14.12.2016 |
Wed 12-14 | C222 | Jiaheng Lu | 31.10.2016-14.12.2016 |
Information for international students
This course is entirely in English.
General
We are in the era of “big data”. Data sets grow rapidly in size because they are increasingly gathered by cheap and numerous information-sensing mobile devices, remote sensing, software logs, cameras, microphones, and wireless sensor networks. Most big data environments go beyond relational databases and traditional data warehouse platforms. The increasing focus on big data is shaping new algorithms and techniques. This course discusses selected algorithms and systems for big data management, including data sketch algorithms, NoSQL database systems, semi-structured data stores for XML and JSON documents, and the GFS and MapReduce frameworks.
This course is for those new to data science. The prerequisites are an introductory programming course (Concepts of Programming) and a mathematics course (Math for CS: Discrete Math). Knowledge of traditional relational databases is recommended but not compulsory.
Feedback: Your feedback on this course is really appreciated. Please use this anonymous feedback form.
Completing the course
The course consists of lectures, three exercises, two study groups and an exam. The grading is based on the sum of the points from the exercises (max. 40 points) and the exam (max. 60 points).
- 50 points are required to pass; this gives the lowest grade 1.
- 85 points or more gives the highest grade 5.
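The grading arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, and because the grade boundaries between 50 and 85 points are not given on this page, the sketch reports a grade only where the rules state one.

```python
from typing import Optional

MAX_EXERCISE_POINTS = 40   # stated maximum for the exercises
MAX_EXAM_POINTS = 60       # stated maximum for the exam
PASS_THRESHOLD = 50        # stated: required to pass, gives grade 1
TOP_GRADE_THRESHOLD = 85   # stated: this total or more gives grade 5


def total_points(exercise_points: float, exam_points: float) -> float:
    """Sum exercise and exam points after checking the stated maxima."""
    if not 0 <= exercise_points <= MAX_EXERCISE_POINTS:
        raise ValueError("exercise points must be between 0 and 40")
    if not 0 <= exam_points <= MAX_EXAM_POINTS:
        raise ValueError("exam points must be between 0 and 60")
    return exercise_points + exam_points


def passed(total: float) -> bool:
    """A total of at least 50 points passes the course."""
    return total >= PASS_THRESHOLD


def known_grade(total: float) -> Optional[int]:
    """Return the grade only where the rules state it explicitly."""
    if total >= TOP_GRADE_THRESHOLD:
        return 5
    if total == PASS_THRESHOLD:
        return 1
    # Failing total, or a passing total whose grade boundary is not given here.
    return None
```

For example, 30 exercise points and 45 exam points sum to 75, which passes, but the exact grade for that total is not determined by the rules listed here.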
Renewal Exam
The renewal exam requires participation in the course and can be taken only by those eligible for the course exam, which requires:
(1) Participation in both study groups during the course; and
(2) At least 10 solved exercise questions.
Literature and material
All essential content can be found in the lecture notes and other material that will be posted here during the course.
No. | Date | Topics | Slides | References | Exercises session (Wednesday) |
---|---|---|---|---|---|
1 | 31.10 | Introduction to big data management | Introduction | [1,2,3,4] | |
2 | 07.11 | Data model, relational, XML and graph | Data Model | Self-assessment | Study group: Big data |
3 | 14.11 | JSON and Graph model | JSON | Self-assessment | Exercise(I) |
4 | 21.11 | Data sketches I | Countmin Sketch | Self-assessment [6,7,8] | Study group: NoSQL databases |
5 | 28.11 | Data sketches II | FMSketch | Self-assessment | Q&A |
6 | 05.12 | MapReduce | MapReduce | Self-assessment | Exercise(II) |
7 | 12.12 | GFS and Bigtable | GFS | Self-assessment | Exercise(III) |
List of associated papers:
(1) Cheikh Kacfah Emani, Nadine Cullot, Christophe Nicolle: Understandable Big Data: A survey. Computer Science Review 17: 70-81 (2015) [PDF paper]