Seminar on Distance Measures
Year | Semester | Date | Period | Language | In charge |
---|---|---|---|---|---|
2014 | autumn | 01.09-08.12. | 1-2 | English | Veli Mäkinen |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Mon 14-16 | C220 | Veli Mäkinen | 01.09.2014-13.10.2014 |
Mon 14-16 | C220 | Veli Mäkinen | 27.10.2014-08.12.2014 |
General
Consider two sets of points in 2D space. There are several ways to define their distance, that is, to assign a non-negative value to characterize how dissimilar the two sets are. If the two sets are identical, one assigns distance 0. If there are points in the other set not present in the other, or some points appear to be shifted locally, the distance should reflect the amount of work to do in order to convert one set into the other, with some fair way to define the cost of the work. Earth mover's distance is a one way of defining such work, with the points additionally associated some weights. One set can then be thought of as piles of earth, and the other as wholes in earth. One wishes to find the minimum cost movement of piles to fill up the wholes.
In many cases, there is global invariant operation that can be carried out free of charge to move one set near the other such that their distance is minimized. In the case of point sets, translations, rotations, and scalings are such operations.
Computation of distance measures with and without the global invariant operations is an interesting field of study. Many times the computation reduces to some combinatorial optimization problem and discretization of the invariance space to meaningful areas where the distance computation is sufficient to compute.
Distance measures on fundamental objects such as weighted point sets, curves, line segments, strings, etc. find applications in wide range of fields, such as fingerprint detection, music retrieval, image retrieval, biomedical imaging, and metagenomic analysis.
The seminar explores selected object classes and distance measure computation aspects around them. Also metric space indexing is studied: How to store a set S of objects such that given a new query object, one can find fast its nearest neighbor in S, that is, an object of S with minimum distance from the query.
Completing the course
Each participant selects a topic based on some original articles. A written report (5-10 pages using e.g. tktl-style latex template or office template ) is prepared over the first period, and presentations on the topics are given through the second period. To pass the seminar, a participant should also attend minimum of 4 gatherings during the second period. Final presentation and active participation throughout the seminar affect grading. The written report is taken as rehearsal and you should hence use it for explaining material you plan to present in your talk. You will receive feedback on the report before you presentation; check the feedback schedule below, or book your time by email if the fixed time on Tuesdays is not good for you.
Schedule (due to the University opening seminar on 1.9, we start at the second week of the semester):
- First meeting on Monday 8.9 14-16 C220: Topics will be introduced and each participant should select one of interest. Presentation slots reserved for period II.
-
Mondays until the end of period I: study group activity and personal guidance (book your slot by email some days before):
- Mon 15.9 Study group
- Mon 22.9 14:15-15 Study group; 15:00-15:20 TBA; 15:20-15:40 TBA; 15:40-16:00 TBA
- Mon 29.9 14:15-15 Study group; 15:00-15:20 TBA; 15:20-15:40 TBA; 15:40-16:00 TBA
- Mon 6.10 14:15-15 Study group; 15:00-15:20 TBA; 15:20-15:40 TBA; 15:40-16:00 TBA
- Mon 13.10 14:15-15 Study group; 15:00-15:20 TBA; 15:20-15:40 TBA; 15:40-16:00 TBA
- Friday 17.10: Written reports due.
-
Mondays in period II:
- Mon 27.10. EN: Topic 1; ASZ: Topic 2
- Mon 3.11. JH: Topic 3
- Mon 10.11. AH: Topic 7; DV Topic 8
- Mon 17.11. KL: Topic 5; HK Topic 6
- Mon 24.11. AK Topic 10
- Mon 1.12. SP: Topic 11
-
Tuesdays in period II:
- at 13 room A239b feedback for the first speaker next week
- at 14 room A239b feedback for the second speaker next week
Literature and material
List of topics / articles:
-
Topic 1, Fréchet distance
- Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves, International Journal of Computational Geometry and Applications 5 (1–2): 75–91, 1995. doi:10.1142/S0218195995000064.
- Helmut Alt, Christian Knauer, and Carola Wenk. Matching Polygonal Curves with Respect to the Fréchet Distance. In Proc. STACS 2001: 63-74, 2001.
-
Topic 2, Earth mover's distance
-
Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The Earth Mover's Distance as a Metric for Image Retrieval. International Journal of Computer Vision 40(2): 99-121, 2000.
-
Scott D. Cohen and Leonidas J. Guibas. The Earth Mover's Distance: Lower Bounds and Invariance under Translation. Report CS-TR-97-1597, Stanford University, Department of Computer Science, 1997.
-
Sergio Cabello, Panos Giannopoulos, Christian Knauer, and Günter Rote. Matching point sets with respect to the Earth Mover's Distance. Comput. Geom. 39(2): 118-133, 2008.
-
-
Hausdorff distance etc.
-
Topic 3
-
(review) Veli Mäkinen and Esko Ukkonen. Point Pattern Matching. Encyclopedia of Algorithms, pp. 657-660, 2008.
-
(survey) Helmut Alt, Kurt Mehlhorn, Hubert Wagener, and Emo Welzl. Congruence, Similarity, and Symmetries of Geometric Objects. Discrete & Computational Geometry 3: 237-256, 1988.
-
-
Topic 4
-
L. Paul Chew and Klara Kedem. Improvements on Geometric Pattern Matching Problems. In Proc. SWAT 1992. 318-325.
-
-
-
Topic 5, Metric space indexing
-
Edgar Chávez, Gonzalo Navarro, Ricardo A. Baeza-Yates, and José L. Marroquín. Searching in metric spaces. ACM Comput. Surv. 33(3): 273-321, 2001.
-
-
Topic 6, Metric properties & computation
-
Thomas Eiter and Heikki Mannila. Distance Measures for Point Sets and their Computation. Acta Inf. 34(2): 109-133, 1997.
-
-
Compression distance
-
Topic 7
-
Ming Li, Xin Chen, Xin Li, Bin Ma, and Paul M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12): 3250-3264, 2004.
-
-
Topic 8
-
Paolo Ferragina, Igor Nitto, and Rossano Venturini. On the Bit-Complexity of Lempel-Ziv Compression. SIAM J. Comput. 42(4): 1521-1541, 2013.
-
-
-
Topic 9, Normalized edit distance
- Andrés Marzal and Enrique Vidal. Computation of Normalized Edit Distance and Applications. IEEE Trans Pattern Anal Mach Intell. 15(9):926-932, 1993.
- L Yujian and L Bo. A normalized Levenshtein distance metric. IEEE Trans Pattern Anal Mach Intell. 29(6):1091-5, 2007 Jun.
-
Topic 10, Tree edit distance
- Philip Bille. A Survey on Tree Edit Distance and Related Problems. Theor. Comput. Sci. 337(1-3): 217-239, 2005.
-
Topic 11, Graph distance measures
- Horst Bunke and Kim Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters 19: 255-259, 1998.
-
Topic 12, Graph kernels
- Thomas Gärtner, Peter A. Flach, abd Stefan Wrobel. On Graph Kernels: Hardness Results and Efficient Alternatives. In Proc. COLT 2003, pp. 129-143.