# Seminar on Distance Measures

58314302
3
Algoritmit ja koneoppiminen
Syventävät opinnot
Computational aspects of selected distance measures over variety of objects are studied. Examples of measures covered are Fréchet distance, Earth mover"s distance, and distances for point sets under geometric transformations.
Vuosi Lukukausi Päivämäärä Periodi Kieli Vastuuhenkilö
2014 syksy 01.09-08.12. 1-2 Englanti Veli Mäkinen

## Luennot

Aika Huone Luennoija Päivämäärä
Ma 14-16 C220 Veli Mäkinen 01.09.2014-13.10.2014
Ma 14-16 C220 Veli Mäkinen 27.10.2014-08.12.2014

## Yleistä

Consider two sets of points in 2D space. There are several ways to define their distance, that is, to assign a non-negative value to characterize how dissimilar the two sets are. If the two sets are identical, one assigns distance 0. If there are points in the other set not present in the other, or some points appear to be shifted locally, the distance should reflect the amount of work to do in order to convert one set into the other, with some fair way to define the cost of the work. Earth mover's distance is a one way of defining such work, with the points additionally associated some weights. One set can then be thought of as piles of earth, and the other as wholes in earth. One wishes to find the minimum cost movement of piles to fill up the wholes.

In many cases, there is global invariant operation that can be carried out free of charge to move one set near the other such that their distance is minimized. In the case of point sets, translations, rotations, and scalings are such operations.

Computation of distance measures with and without the global invariant operations is an interesting field of study. Many times the computation reduces to some combinatorial optimization problem and discretization of the invariance space to meaningful areas where the distance computation is sufficient to compute.

Distance measures on fundamental objects such as weighted point sets, curves, line segments, strings, etc. find applications in wide range of fields, such as fingerprint detection, music retrieval, image retrieval, biomedical imaging, and metagenomic analysis.

The seminar explores selected object classes and distance measure computation aspects around them. Also metric space indexing is studied: How to store a set S of objects such that given a new query object, one can find fast its nearest neighbor in S, that is, an object of S with minimum distance from the query.

Each participant selects a topic based on some original articles. A written report  (5-10 pages using e.g.  tktl-style latex template or office template ) is prepared over the first period, and presentations on the topics are given through the second period. To pass the seminar, a participant should also attend minimum of 4 gatherings during the second period. Final presentation and active participation throughout the seminar affect grading. The written report is taken as rehearsal and you should hence use it for explaining material you plan to present in your talk. You will receive feedback on the report before you presentation; check the feedback schedule below, or book your time by email if the fixed time on Tuesdays is not good for you.

Schedule (due to the University opening seminar on 1.9, we start at the second week of the semester):

• First meeting on Monday 8.9 14-16 C220: Topics will be introduced and each participant should select one of interest. Presentation slots reserved for period II.
• Mondays until the end of period I: study group activity and personal guidance (book your slot by email some days before):
• Mon 15.9 Study group
• Mon 22.9 14:15-15 Study group; 15:00-15:20 TBA; 15:20-15:40 TBA; 15:40-16:00 TBA
• Mon 29.9 14:15-15 Study group; 15:00-15:20 TBA; 15:20-15:40 TBA; 15:40-16:00 TBA
• Mon 6.10 14:15-15 Study group; 15:00-15:20 TBA; 15:20-15:40 TBA; 15:40-16:00 TBA
• Mon 13.10 14:15-15 Study group; 15:00-15:20 TBA; 15:20-15:40 TBA; 15:40-16:00 TBA
• Friday 17.10: Written reports due.
• Mondays in period II:
• Mon 27.10. EN: Topic 1; ASZ: Topic 2
• Mon 3.11. JH: Topic 3
• Mon 10.11. AH: Topic 7; DV Topic 8
• Mon 17.11. KL: Topic 5; HK Topic 6
• Mon 24.11. AK Topic 10
• Mon 1.12. SP: Topic 11
• Tuesdays in period II:
• at 13 room A239b feedback for the first speaker next week
• at 14 room A239b feedback for the second speaker next week

## Kirjallisuus ja materiaali

List of topics / articles:

• Topic 1, Fréchet distance
• Topic 2, Earth mover's distance
• Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. International Journal of Computer Vision 40(2): 99-121, 2000.
• Scott D. Cohen and Leonidas J. Guibas. The Earth Mover's Distance: Lower Bounds and Invariance under Translation. Report CS-TR-97-1597, Stanford University, Department of Computer Science, 1997.
• Sergio Cabello, Panos Giannopoulos, Christian Knauer, and Günter Rote. Comput. Geom. 39(2): 118-133, 2008.
• Hausdorff distance etc.
• Topic 3
• (review) Veli Mäkinen and Esko Ukkonen. Point Pattern Matching. Encyclopedia of Algorithms, pp. 657-660, 2008.
• (survey) Helmut Alt, Kurt Mehlhorn, Hubert Wagener, and Emo Welzl. Discrete & Computational Geometry 3: 237-256, 1988.
• Topic 4
• L. Paul Chew and Klara Kedem. In Proc. SWAT 1992. 318-325.
• Topic 5, Metric space indexing
• Edgar Chávez, Gonzalo Navarro, Ricardo A. Baeza-Yates, and José L. Marroquín. ACM Comput. Surv. 33(3): 273-321, 2001.
• Topic 6, Metric properties & computation
• Thomas Eiter and Heikki Mannila. Acta Inf. 34(2): 109-133, 1997.
• Compression distance
• Topic 7
• Ming Li, Xin Chen, Xin Li, Bin Ma, and Paul M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory,  50(12): 3250-3264, 2004.
• Topic 8
• Paolo Ferragina, Igor Nitto, and Rossano Venturini. SIAM J. Comput. 42(4): 1521-1541, 2013.