Seminar on Distance Measures
Vuosi  Lukukausi  Päivämäärä  Periodi  Kieli  Vastuuhenkilö 

2014  syksy  01.0908.12.  12  Englanti  Veli Mäkinen 
Luennot
Aika  Huone  Luennoija  Päivämäärä 

Ma 1416  C220  Veli Mäkinen  01.09.201413.10.2014 
Ma 1416  C220  Veli Mäkinen  27.10.201408.12.2014 
Yleistä
Consider two sets of points in 2D space. There are several ways to define their distance, that is, to assign a nonnegative value to characterize how dissimilar the two sets are. If the two sets are identical, one assigns distance 0. If there are points in the other set not present in the other, or some points appear to be shifted locally, the distance should reflect the amount of work to do in order to convert one set into the other, with some fair way to define the cost of the work. Earth mover's distance is a one way of defining such work, with the points additionally associated some weights. One set can then be thought of as piles of earth, and the other as wholes in earth. One wishes to find the minimum cost movement of piles to fill up the wholes.
In many cases, there is global invariant operation that can be carried out free of charge to move one set near the other such that their distance is minimized. In the case of point sets, translations, rotations, and scalings are such operations.
Computation of distance measures with and without the global invariant operations is an interesting field of study. Many times the computation reduces to some combinatorial optimization problem and discretization of the invariance space to meaningful areas where the distance computation is sufficient to compute.
Distance measures on fundamental objects such as weighted point sets, curves, line segments, strings, etc. find applications in wide range of fields, such as fingerprint detection, music retrieval, image retrieval, biomedical imaging, and metagenomic analysis.
The seminar explores selected object classes and distance measure computation aspects around them. Also metric space indexing is studied: How to store a set S of objects such that given a new query object, one can find fast its nearest neighbor in S, that is, an object of S with minimum distance from the query.
Kurssin suorittaminen
Each participant selects a topic based on some original articles. A written report (510 pages using e.g. tktlstyle latex template or office template ) is prepared over the first period, and presentations on the topics are given through the second period. To pass the seminar, a participant should also attend minimum of 4 gatherings during the second period. Final presentation and active participation throughout the seminar affect grading. The written report is taken as rehearsal and you should hence use it for explaining material you plan to present in your talk. You will receive feedback on the report before you presentation; check the feedback schedule below, or book your time by email if the fixed time on Tuesdays is not good for you.
Schedule (due to the University opening seminar on 1.9, we start at the second week of the semester):
 First meeting on Monday 8.9 1416 C220: Topics will be introduced and each participant should select one of interest. Presentation slots reserved for period II.

Mondays until the end of period I: study group activity and personal guidance (book your slot by email some days before):
 Mon 15.9 Study group
 Mon 22.9 14:1515 Study group; 15:0015:20 TBA; 15:2015:40 TBA; 15:4016:00 TBA
 Mon 29.9 14:1515 Study group; 15:0015:20 TBA; 15:2015:40 TBA; 15:4016:00 TBA
 Mon 6.10 14:1515 Study group; 15:0015:20 TBA; 15:2015:40 TBA; 15:4016:00 TBA
 Mon 13.10 14:1515 Study group; 15:0015:20 TBA; 15:2015:40 TBA; 15:4016:00 TBA
 Friday 17.10: Written reports due.

Mondays in period II:
 Mon 27.10. EN: Topic 1; ASZ: Topic 2
 Mon 3.11. JH: Topic 3
 Mon 10.11. AH: Topic 7; DV Topic 8
 Mon 17.11. KL: Topic 5; HK Topic 6
 Mon 24.11. AK Topic 10
 Mon 1.12. SP: Topic 11

Tuesdays in period II:
 at 13 room A239b feedback for the first speaker next week
 at 14 room A239b feedback for the second speaker next week
Kirjallisuus ja materiaali
List of topics / articles:

Topic 1, Fréchet distance
 Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves, International Journal of Computational Geometry and Applications 5 (1–2): 75–91, 1995. doi:10.1142/S0218195995000064.
 Helmut Alt, Christian Knauer, and Carola Wenk. Matching Polygonal Curves with Respect to the Fréchet Distance. In Proc. STACS 2001: 6374, 2001.

Topic 2, Earth mover's distance

Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The Earth Mover's Distance as a Metric for Image Retrieval. International Journal of Computer Vision 40(2): 99121, 2000.

Scott D. Cohen and Leonidas J. Guibas. The Earth Mover's Distance: Lower Bounds and Invariance under Translation. Report CSTR971597, Stanford University, Department of Computer Science, 1997.

Sergio Cabello, Panos Giannopoulos, Christian Knauer, and Günter Rote. Matching point sets with respect to the Earth Mover's Distance. Comput. Geom. 39(2): 118133, 2008.


Hausdorff distance etc.

Topic 3

(review) Veli Mäkinen and Esko Ukkonen. Point Pattern Matching. Encyclopedia of Algorithms, pp. 657660, 2008.

(survey) Helmut Alt, Kurt Mehlhorn, Hubert Wagener, and Emo Welzl. Congruence, Similarity, and Symmetries of Geometric Objects. Discrete & Computational Geometry 3: 237256, 1988.


Topic 4

L. Paul Chew and Klara Kedem. Improvements on Geometric Pattern Matching Problems. In Proc. SWAT 1992. 318325.



Topic 5, Metric space indexing

Edgar Chávez, Gonzalo Navarro, Ricardo A. BaezaYates, and José L. Marroquín. Searching in metric spaces. ACM Comput. Surv. 33(3): 273321, 2001.


Topic 6, Metric properties & computation

Thomas Eiter and Heikki Mannila. Distance Measures for Point Sets and their Computation. Acta Inf. 34(2): 109133, 1997.


Compression distance

Topic 7

Ming Li, Xin Chen, Xin Li, Bin Ma, and Paul M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12): 32503264, 2004.


Topic 8

Paolo Ferragina, Igor Nitto, and Rossano Venturini. On the BitComplexity of LempelZiv Compression. SIAM J. Comput. 42(4): 15211541, 2013.



Topic 9, Normalized edit distance
 Andrés Marzal and Enrique Vidal. Computation of Normalized Edit Distance and Applications. IEEE Trans Pattern Anal Mach Intell. 15(9):926932, 1993.
 L Yujian and L Bo. A normalized Levenshtein distance metric. IEEE Trans Pattern Anal Mach Intell. 29(6):10915, 2007 Jun.

Topic 10, Tree edit distance
 Philip Bille. A Survey on Tree Edit Distance and Related Problems. Theor. Comput. Sci. 337(13): 217239, 2005.

Topic 11, Graph distance measures
 Horst Bunke and Kim Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters 19: 255259, 1998.

Topic 12, Graph kernels
 Thomas Gärtner, Peter A. Flach, abd Stefan Wrobel. On Graph Kernels: Hardness Results and Efficient Alternatives. In Proc. COLT 2003, pp. 129143.