Follow these steps to analyse data using Tane:

1) Prepare your data: the input data for Tane must be a set of comma
   separated records without extra whitespace. For example, if you
   take a look at the file original/testdata.orig, you'll see lines like:

128059,1,1,1,1,2,5,5,1,1,2
1285531,1,1,1,1,2,1,3,1,1,2
1287775,5,1,1,2,2,2,3,1,1,2
144888,8,10,10,8,5,10,7,8,1,4
145447,8,4,4,1,2,9,3,3,1,4
167528,4,1,1,1,2,1,3,6,1,2

Put your data in the directory called original. The name of the file
must end with .orig, for example, data.orig. The entries can be
arbitrary strings (excluding ','), not necessarily numeric. For example:

monday,raining,15,abc
tuesday,shining,24,dfg

etc.
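
Before going further it can help to check the file mechanically. The
following is only a sketch, not part of Tane: check_orig is a
hypothetical helper name, and it assumes that "extra whitespace" means
spaces or tabs next to a comma or at either end of a line, and that
every record should have as many fields as the first one.

```shell
# check_orig FILE (hypothetical helper, not shipped with Tane)
# Reports records with whitespace around a field or with a field count
# that differs from the first record. Exit status 0 means no problems.
check_orig() {
    awk -F, '
        BEGIN   { bad = 0 }
        NR == 1 { want = NF }
        /(^|,)[ \t]|[ \t\r](,|$)/ {
            printf "line %d: whitespace around a field\n", NR
            bad = 1
        }
        NF != want {
            printf "line %d: %d fields, expected %d\n", NR, NF, want
            bad = 1
        }
        END     { exit bad }
    ' "$1"
}
```

For example:

$ check_orig original/data.orig && echo "looks fine"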

2) Generate a description file for your data as follows. Take a look at
   the description file descriptions/testdata.dsc. Make a copy of that
   file and replace the word testdata (in the line DataIn = ...) with the
   name of your data file (in this case data). The name of the copy must
   end with .dsc, for example data.dsc.
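
The copy-and-replace in step 2 can be scripted. This is a sketch under
assumptions: make_dsc is a hypothetical helper name, it changes only the
first occurrence of testdata on lines containing DataIn, and it expects
a data file name without characters that are special to sed (such as
'/' or '&'). Inspect the resulting file before using it.

```shell
# make_dsc NAME [DIR] (hypothetical helper, not shipped with Tane)
# Copies DIR/testdata.dsc to DIR/NAME.dsc, replacing "testdata" with
# NAME on the DataIn line only. DIR defaults to "descriptions".
make_dsc() {
    name=$1
    dir=${2:-descriptions}
    sed "/DataIn/ s/testdata/$name/" "$dir/testdata.dsc" > "$dir/$name.dsc"
}
```

Run from the Tane root directory, for example:

$ make_dsc data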


3) Generate the data for Tane as follows. You should now have two files ready:

original/data.orig
           [file name? cf. step 1) above]
descriptions/data.dsc

Change to the original directory and run the data generation command:

$ cd original
$ ../bin/select.bin ../descriptions/data.dsc

Now you should find two generated files in the data directory, data.dat
and data.rel.

4) Run the analysis from the Tane root directory. For example:

$ ./bin/tane 11 100 11 data/testdata.dat

where the first 11 is the level at which you want to stop the search, 100
is the number of records, counted from the beginning of the data file,
that you want to use in the search, and the last 11 is the number of
attributes you want to use in the search. The testdata contains a total
of 11 attributes and 100 records, so the whole data set is used in the
analysis. Finally, data/testdata.dat is the name of the data file.

As output you'll see lines like:

1 -> 10
1 -> 11
1 2 -> 3
1 2 -> 4
1 2 -> 6
1 2 -> 9
1 4 -> 3
1 3 -> 4

These are the dependencies found during the analysis. The line '1
-> 10' means that attribute number 1 determines attribute number 10.
Replace the numbers with the actual names of the attributes to make the
output more readable.
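
Replacing the numbers by hand is tedious for long outputs. A possible
way to automate it is sketched below. name_attrs is a hypothetical
helper name, and the attribute names in the usage example are made up
for illustration. The sketch assumes that dependency lines are exactly
the lines containing '->'.

```shell
# name_attrs NAMES (hypothetical helper, not shipped with Tane)
# Reads Tane output on stdin. On lines containing "->", each attribute
# number is replaced by the matching entry of NAMES, a comma separated
# list whose first entry names attribute 1. Numbers without a name and
# all other lines are passed through unchanged.
name_attrs() {
    awk -v names="$1" '
        BEGIN { split(names, name, ",") }
        /->/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+$/ && ($i + 0) in name)
                    $i = name[$i + 0]
        }
        { print }
    '
}
```

For example, with three made-up names:

$ echo "1 2 -> 3" | name_attrs "id,size,shape"
id size -> shape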

Between the dependency lines there is some logging information, which
you can ignore, such as:

Level == 4  #candidates == 330     avg.elements == 6     (14186/2257)

i.e., at search level 4 there are 330 candidates left to analyse.
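
If you want only the dependencies, the logging lines can be filtered
out. This is a minimal sketch: deps_only is a hypothetical helper name,
and it assumes that the logging lines never contain '->' and are
written to the same output stream as the dependencies.

```shell
# deps_only (hypothetical helper, not shipped with Tane)
# Reads Tane output on stdin and keeps only lines containing "->".
deps_only() {
    grep -e '->'
}
```

For example:

$ ./bin/tane 11 100 11 data/testdata.dat | deps_only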

You can also redirect the result of the search to a file:

$ ./bin/tane 11 100 11 data/testdata.dat > results.txt
 
results.txt will be in Unix format, with a newline character at the end
of each line, so it should display correctly in Wordpad (but not in Notepad).
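
If the file has to be readable in an editor that expects Windows line
endings, it can be converted. The following is a sketch only; to_crlf
is a hypothetical helper name and results-crlf.txt is just an example
output name.

```shell
# to_crlf (hypothetical helper, not shipped with Tane)
# Reads text on stdin and writes every line with a CR LF ending.
# A carriage return already present at the end of a line is not doubled.
to_crlf() {
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}
```

For example:

$ to_crlf < results.txt > results-crlf.txt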