No lecture on February 4!
Description
The segmentation of sequences or time series into homogenous pieces has
many applications. The course covers some algorithmic and probabilistic
techniques for segmentation methods and related applications.
Tentative contents
-
Introduction: sequences and sequence segmentation; what and why
-
Basic problem definition and the basic dynamic programming algorithm
-
Experimental results
-
Probabilistic versions of the problem
-
Approximate solutions
-
Top-down and iterative methods for segmentation
-
Finding recurrent sources: (k,h)-segmentation
-
Markov chain Monte Carlo methods
Prerequisities
The participants are assumed to know
the basic concepts of algorithms and probability.
For example, the course 58053 Design and Analysis of Algorithms
gives a good background for this course.
Slides
Set 1
Additions to set 1
Set 2
Additions to set 2
Set 3
Set 4: recurrent sources
Set 5: MCMC techniques
Some code
The following pieces of matlab code probably
contain bugs, but they can be perhaps be useful
in building something more robust.
Basic dynamic programming, L_p metric
Basic dynamic programming for 0-1 sequences, loglikelihood metric
Some other papers
Johan Himberg, Kalle Korpiaho, Heikki Mannila, Johanna Tikanmäki, and
Hannu T.T. Toivonen.
Time
series segmentation for context recognition in mobile devices
In The 2001 IEEE International Conference on Data Mining
(ICDM'01),
203 - 210, San Jose, California, November-December 2001. IEEE.
H. Mannila and M. Salmenkivi:
Finding simple intensity descriptions from event sequence data.
Proceedings of the Seventh ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD 2001),
F. Provost and R. Srikant (eds.),
p. 341-346.
A. Gionis and H. Mannila:
Finding recurrent sources in sequences.
ACM ReCOMB 2003.