# Distributed Systems Project

Year | Semester | Date | Period | Language | Person in charge |
---|---|---|---|---|---|
2014 | Spring | 14.01.-20.02. | 3-3 | English | Jussi Kangasharju |

## Lectures

Time | Room | Lecturer | Date |
---|---|---|---|
Tue 10-12 | D122 | Jussi Kangasharju | 14.01.2014-20.02.2014 |
Thu 10-12 | D122 | Jussi Kangasharju | 14.01.2014-20.02.2014 |

The first meeting is on Tuesday, 14 January, at 10:00 in room D122. Later meetings will be agreed on at the first meeting.

## General

The introductory slides describe the basics of the course, schedule, and grading.

We use Twitter as the main channel for announcements, under the hashtag #tktl_dsp. Longer announcements will also be posted separately on this page.

#### Schedule

- 14.1. Start of first and second assignments (individual)
- 16.1., 21.1., and 23.1. Q&A for first two assignments
- **28.1. DEADLINE** for first assignment
- 28.1. Start of third assignment (groups allowed but not mandatory)
- **4.2. NEW DEADLINE** for second assignment
- **13.2. Review of answers to second assignment**
- 30.1., 4.2., 6.2., 11.2., 13.2., 18.2., and 20.2. Q&A for third assignment
- **7.3. DEADLINE** for third assignment

## Completing the course

The course has 3 assignments and you have to pass all of them to pass the course. Each assignment is graded separately and the final grade is a weighted average of the individual assignment grades. Assignments 1 and 2 have weight 1 and assignment 3 has weight 2.
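The weighting can be sketched as follows. This is only an illustration of the arithmetic: the function name `final_grade` is hypothetical, and the sketch does not cover the requirement to pass every assignment or any rounding of the result, since neither the grading scale nor the rounding rule is stated here.

```python
def final_grade(grades, weights=(1, 1, 2)):
    """Weighted average of the three assignment grades.

    The weights (1, 1, 2) come from the course description; the
    function name and the unrounded result are assumptions.
    """
    if len(grades) != len(weights):
        raise ValueError("expected one grade per assignment")
    weighted_sum = sum(g * w for g, w in zip(grades, weights))
    return weighted_sum / sum(weights)


# Assignment 3 counts twice: (3*1 + 4*1 + 5*2) / 4
print(final_grade((3, 4, 5)))
```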

#### Assignment 1

The first assignment is to implement one distributed algorithm from the following set. Your assignment number is (your student ID % 5) + 1, where % denotes the remainder after integer division.
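As a quick check of the formula, here is a minimal sketch. It assumes the student ID is treated as a plain integer; the function name and the example ID are made up for illustration.

```python
def assignment_number(student_id: int) -> int:
    """Map a numeric student ID to an assignment number in 1..5."""
    return student_id % 5 + 1


# Hypothetical ID: 14000123 % 5 == 3, so the assignment number is 4.
print(assignment_number(14000123))
```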

#### Assignment 2

- Assignment sheet (NOTE: Assignment description has been updated on 14.1. to reflect the new example code.)
- Small example program that you can run and use as base for your answer

You can check the status of the Hadoop installation via the following two links. They only work from computers within the department's network. University VPN is NOT sufficient.

**Hints for Hadoop assignment**

Getting answers to question 1 (min, max, avg, variance) can be done easily as a trivial modification to the example code. You can make all 4 run in a single program in a single pass, but it is even easier to write 4 programs and run them separately.
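The single-pass variant works because all four statistics can be derived from partial aggregates that merge associatively. The sketch below shows the idea in plain Python rather than Hadoop. It is not the course's example code: the names `to_partial`, `merge`, and `summarize` are hypothetical, and it assumes the population variance is wanted.

```python
from functools import reduce


def to_partial(x):
    """Map step: turn one value into a partial aggregate."""
    return (1, x, x * x, x, x)  # count, sum, sum of squares, min, max


def merge(a, b):
    """Combine/reduce step: merge two partial aggregates.

    The operation is associative, so partial results can be
    merged in any grouping.
    """
    return (
        a[0] + b[0],
        a[1] + b[1],
        a[2] + b[2],
        min(a[3], b[3]),
        max(a[4], b[4]),
    )


def summarize(values):
    """Compute min, max, avg, and variance in one pass over the values."""
    n, total, total_sq, lo, hi = reduce(merge, map(to_partial, values))
    mean = total / n
    # Population variance via E[x^2] - E[x]^2 (assumed definition).
    variance = total_sq / n - mean * mean
    return {"min": lo, "max": hi, "avg": mean, "variance": variance}


print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```

The sum-of-squares formula is simple but can lose precision when the values are large relative to their spread, so it is worth checking the result against a small sample.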

To get the median in question 2, think about what the median is and what its properties are in a data set. Sorting the data set is one way of getting the answer, but it is a very bad way in this case and you should not do that, since we do not have enough storage space for everyone to sort their data sets. For a larger data set, sorting would be infeasible to begin with. :-)

There are far simpler and more efficient ways to figure out the value of the median. Do not try to write a program that attempts to find it in a single pass. Instead, run the program a few times (fewer than 5 should be easily doable, fewer than 10 almost a given), iterating towards the answer and modifying the program based on the results of the previous run.
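One possible reading of this hint is to count, in each run, how many values fall into each of a fixed number of value ranges, and then narrow the search to the range that must contain the median. The sketch below simulates that with an in-memory list, where each call to `count_pass` stands in for one run of the job. It is not the intended solution from the assignment sheet: it assumes integer values, and `count_pass`, `median_by_passes`, and the `buckets` parameter are hypothetical names.

```python
def count_pass(data, lo, width, buckets):
    """Simulate one full pass: count values below lo and per-bucket counts."""
    below = 0
    counts = [0] * buckets
    for x in data:
        if x < lo:
            below += 1
        else:
            i = (x - lo) // width
            if i < buckets:
                counts[i] += 1
    return below, counts


def median_by_passes(data, buckets=1000):
    """Return (lower median, number of passes) for integer data."""
    if buckets < 2:
        raise ValueError("need at least 2 buckets to make progress")
    # First pass: count, min, and max (the statistics from question 1).
    n = len(data)
    lo, hi = min(data), max(data) + 1  # search interval [lo, hi)
    k = (n - 1) // 2  # zero-based rank of the lower median
    passes = 1
    while hi - lo > 1:
        width = -(-(hi - lo) // buckets)  # ceiling division
        below, counts = count_pass(data, lo, width, buckets)
        passes += 1
        seen = below
        for i, c in enumerate(counts):
            if seen + c > k:
                # The value of rank k lies in bucket i: narrow to it.
                lo, hi = lo + i * width, min(hi, lo + (i + 1) * width)
                break
            seen += c
    return lo, passes


print(median_by_passes([5, 1, 9, 3, 7], buckets=4))
```

Each pass shrinks the candidate range by roughly a factor of `buckets`, so a wide value range is covered in a handful of runs without sorting or storing the data.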

For question 3, you need the median to get the correct values. If you are stuck on the median and cannot find it, simply describe in your answer to question 3 how you would calculate the values, assuming you had the median.

#### Assignment 3

There will be Q&A in all sessions until February 20th. An optional discussion session about different overlays will be held in March; the date will be announced later.