
=====================================================================

    segkh: Sequence segmentation with recurrent sources
	
    Copyright (C) 2004 Aristides Gionis

    This code is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY.  For more details see the license 
    that accompanies the code. 

    The code is an implementation of the algorithms described in 
    the paper

    [1] A. Gionis, H. Mannila, Finding recurrent sources in 
        sequences, 7th International Conference on Research 
        in Computational Molecular Biology (RECOMB) 2003

=====================================================================

General
=======

The code is implemented in C++ for Linux/UNIX platforms.
It consists of the following files:

segkh.cc	The main C++ file
Makefile	The Makefile
LICENSE 	The file containing the software license
README		This file

To create the executable program simply run "make" or "gmake".
The program is called "segkh".

Running the program with no parameters shows a help screen 
with a short explanation of how the program should be called.


Input
=====

The program takes its input from the standard input. 

For example, if the data is stored in a file called "datafile",
it can be fed to the program with either of the commands

$ cat datafile | segkh c 20 5
$ segkh r 30 7 < datafile

The input to the program is a multi-dimensional time series. 
The format of the input is assumed to be as follows:

-- The first line is an integer d specifying the number of 
   dimensions of the time series. 
-- Then n lines follow, one for each point in the time 
   series. Each line consists of d numbers giving the 
   d coordinates of the corresponding point.

(Note that there is no need to specify the number of points n.)
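
The input format described above can be read with a few lines of
C++. The following is only a minimal sketch and is not taken from
segkh.cc: the names TimeSeries and read_time_series are hypothetical,
and the sketch assumes that the numbers on a line are separated by
whitespace and that blank lines may be skipped.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the parsed input (not from segkh.cc).
struct TimeSeries {
    std::size_t d = 0;                        // number of dimensions
    std::vector<std::vector<double>> points;  // n points with d values each
};

// Returns true if the line holds only whitespace.
static bool is_blank(const std::string& s) {
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Reads the dimension count d from the first non-blank line, then one
// point per line until end of input. The number of points n is simply
// the number of data lines found.
TimeSeries read_time_series(std::istream& in) {
    TimeSeries ts;
    std::string line;

    // Skip leading blank lines (an assumption of this sketch).
    while (std::getline(in, line) && is_blank(line)) {}
    if (!in) throw std::runtime_error("missing dimension line");

    std::istringstream header(line);
    long long d = 0;
    if (!(header >> d) || d <= 0)
        throw std::runtime_error("invalid dimension count");
    ts.d = static_cast<std::size_t>(d);

    while (std::getline(in, line)) {
        if (is_blank(line)) continue;
        std::istringstream row(line);
        std::vector<double> point;
        double value;
        while (row >> value) point.push_back(value);
        // Reject lines with a non-numeric token or the wrong value count.
        if (!row.eof() || point.size() != ts.d)
            throw std::runtime_error("expected d numbers on each line");
        ts.points.push_back(point);
    }
    return ts;
}
```

Calling read_time_series(std::cin) from a small main function reads
the series from the standard input, in the same way the data is 
passed to segkh in the commands shown earlier.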


Parameters
==========

The program needs three parameters from the command line:

1. The first parameter is a letter specifying the algorithm
   to be used. The options are: 
     r  for the Segment2Level algorithm 
     c  for the ClusterSegment algorithm
     e  for the EM algorithm starting from the solution found 
        by the ClusterSegment algorithm

   For details on the algorithms see the original paper [1].

2. The second parameter is the number of segments k to be 
   used for the segmentation.

3. The third parameter is the number of levels h to be 
   used for the segmentation.

Example of usage:

segkh c 10 4 < timeseries.1
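
The three parameters can be checked along the following lines. This
is a hedged sketch and not the argument handling of segkh.cc: the
names Options, parse_positive and parse_args, the error messages, and
the requirement that k and h be positive integers are all 
assumptions made for illustration.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical holder for the three command-line parameters.
struct Options {
    char algorithm;  // 'r', 'c' or 'e'
    long k;          // number of segments
    long h;          // number of levels
};

// Parses a strictly positive decimal integer, rejecting trailing junk.
static long parse_positive(const char* text, const char* name) {
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || value <= 0)
        throw std::invalid_argument(std::string(name) +
                                    " must be a positive integer");
    return value;
}

// Expects argv in the form: segkh <r|c|e> <k> <h>
Options parse_args(int argc, const char* const argv[]) {
    if (argc != 4)
        throw std::invalid_argument("expected: <r|c|e> <k> <h>");

    std::string alg = argv[1];
    if (alg != "r" && alg != "c" && alg != "e")
        throw std::invalid_argument("algorithm must be r, c or e");

    Options opt;
    opt.algorithm = alg[0];
    opt.k = parse_positive(argv[2], "k");
    opt.h = parse_positive(argv[3], "h");
    return opt;
}
```

With the example above, parse_args would yield algorithm 'c', k = 10
and h = 4.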

