Big Data Frameworks
|Tue 12-14||D122||Sasu Tarkoma||10.03.2015-28.04.2015|
|Fri 10-12||D122||Mohammad Hoque||13.03.2015-13.03.2015|
|Fri 10-12||D122||Mohammad Hoque||16.03.2015-24.04.2015|
|Wed 10-12||D122||Mohammad Hoque||29.04.2015-29.04.2015|
This course examines current and emerging Big Data frameworks with a focus on Data Science applications. The course starts with an introduction to MapReduce-based systems and then focuses on Spark and the Berkeley Data Analytics Stack (BDAS) architecture. The course covers traditional MapReduce processes, streaming operation, machine learning, and SQL integration. The course consists of lectures and assignments.
The course has an IRCnet channel #tkt-bdf.
Assignments are given by Ella Peltonen, Eemil Lagerspetz, and Mohammad Hoque.
Completing the course
The course consists of the lectures and the course assignments. The assignments are based on the Spark Big Data framework and the Scala programming language.
Instead of the first week's exercise session, we have a Spark coding tutorial on Friday 13.3. at 10-12. Please bring your laptop if you have one. You can install the latest Spark version beforehand.
The Scala & Spark Tutorial 13.03.2015 slides are available here: http://is.gd/bigdatascala
You can find the answers for Exercise set 1 here (Not yet complete).
The second exercise set is available here. Deadline is 26.3. 2pm; please return your answers via Moodle. These exercises will be discussed on Friday 27.3., when there will also be a Q&A for exercise set three. Some hints have been added to the exercise set. Extended deadline 2.4. 2pm. Maximum number of points will be 5 if you use this opportunity. You can pick and do the 5 that you are sure of, or do all 6 if you're not sure about one of them.
Access answers for Exercise Set 2 here (Not yet complete).
The third exercise set is now published. Deadline is 9.4. 2pm; please return your answers via Moodle. These exercises will be discussed on Friday 10.4. after Easter. Because of the Easter break, we will not have an exercise session on 3.4. Extended deadline 16.4. 2pm. Maximum number of points will be 5 if you use this opportunity. Please return the entire solution set, including the exercises you are happy with from the first round.
On Friday 17.4., there is a Q&A session instead of the exercise session. Prepare your questions beforehand.
The fourth (and last) exercise set is published. Deadline is 23.4. 2pm; return your answers via Moodle as always. These exercises will be discussed on Friday 24.4. Note that there will be no extension for this last exercise set.
Tentative lecture outline
7.4. Easter break
21.4. Two industry presentations (Nokia and F-Secure) on Big Data and Spark
- Results of the 16.6.2015 exam
- 18.9.2015 16:00 in B123
- 1.12.2015 16:00 in B123
Literature and material
The course is based on the lectures, assignments, and additional material available on the Web. The key source of information is the Apache Spark Web site:
Reading list (for the exam):