
Tree Disequilibrium Test (TreeDT) software
-------------------------------------------

Tree Disequilibrium Test (TreeDT) is a method for gene mapping. It
extracts information about historical recombinations in the
population, essentially in the form of substrings and prefix trees.
This information is then used to locate fragments potentially
inherited from a common diseased founder, and to map the disease gene
to the most likely fragment.

For more information about TreeDT, refer to: 

Petteri Sevon, Hannu Toivonen, Vesa Ollikainen. TreeDT: Tree pattern
mining for gene mapping. IEEE/ACM Transactions on Computational
Biology and Bioinformatics 3 (2): 174-185, April-June 2006.

Some additional information, including source code, is available at
http://www.cs.helsinki.fi/group/genetics/hpm.html


Compilation of TreeDT
---------------------

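TreeDT is provided as C source code, which you compile yourself for
your own environment. In our Linux environment, at the time of
writing, we use this command to compile it:

gcc -lm *.c -o treedt

With some linker configurations this ordering can fail with undefined
references to math functions. In that case, list the math library
after the source files:

gcc *.c -o treedt -lm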


Input file format
-----------------

TreeDT expects an input file in the following format.

- The first line must be a title line with columns "Id", "Status",
  "M1", "M2", etc. (TreeDT actually only checks that the names of the
  marker columns begin with an "M". In the results the markers are
  always reported as "M1", "M2", etc., regardless of what is on the
  title line.)

- Accepted values for the Status field are "a" (affected) and "c"
  (unaffected/control). Values in the Id field can be anything.
  Alleles (values of fields M1, M2, etc.) must be labeled with
  non-negative integers, where 0 denotes a missing value. (In
  contrast to HPM, TreeDT does not accept "!" parameter settings,
  comment lines, or blank lines.)

This is a toy example of a valid input file:

Id Status M1 M2 M3
1 a 1 1 2
1 a 2 1 0
2 c 1 1 1
2 c 2 1 2
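
The rules above for data lines can be checked with a small helper
before running TreeDT. The following is only a sketch and is not part
of the TreeDT source: the function name row_is_valid is hypothetical,
and it assumes that fields are separated by spaces or tabs and that
the Status letters are lower case, as in the toy example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of TreeDT: returns 1 if `line` looks
   like a valid data line with exactly n_markers allele fields, and 0
   otherwise. Assumes whitespace-separated fields. */
int row_is_valid(const char *line, int n_markers)
{
    char buf[4096];
    char *tok;
    const char *p;
    int field = 0;

    if (strlen(line) >= sizeof buf)
        return 0;                       /* too long for this sketch */
    strcpy(buf, line);

    for (tok = strtok(buf, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n"), field++) {
        if (field == 0)
            continue;                   /* Id: any value is accepted */
        if (field == 1) {               /* Status: "a" or "c" only */
            if (strcmp(tok, "a") != 0 && strcmp(tok, "c") != 0)
                return 0;
            continue;
        }
        /* Allele: digits only, i.e. a non-negative integer */
        for (p = tok; *p != '\0'; p++)
            if (!isdigit((unsigned char)*p))
                return 0;
    }
    /* Id + Status + one allele per marker; a blank line has 0 fields */
    return field == n_markers + 2;
}
```

For the toy example, row_is_valid("1 a 2 1 0", 3) accepts the line,
while a line with a Status other than "a" or "c", a negative allele,
or a missing field is rejected.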


About output
------------

For information about p-values output by TreeDT, please see the
article referred to above.

The TreeDT output refers to locations 0.5, 1.5, etc. These are
relative to the markers, where the first marker is considered to be at
location 1, the second at location 2, etc. That is, the first location
in the output (0.5) is before the first marker, the second (1.5) is
between the first two markers, etc., and the last location (for the
example data above, 3.5) is after the last marker.
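
The numbering can be illustrated with two small functions. This is a
sketch of the convention described above, not code from TreeDT; the
names location_of and flanking_markers are hypothetical, and nothing
is assumed here about the layout of the actual output.

```c
/* Location reported for the i-th tested position, i = 0 .. n_markers.
   With n_markers markers there are n_markers + 1 such locations. */
double location_of(int i)
{
    return i + 0.5;
}

/* Markers on either side of a reported location, numbered from 1.
   A value of 0 means there is no marker on that side. */
void flanking_markers(double location, int n_markers,
                      int *left, int *right)
{
    int l = (int)location;              /* 1.5 -> 1, 0.5 -> 0 */

    *left = l;                          /* 0 before the first marker */
    *right = (l < n_markers) ? l + 1 : 0;
}
```

For the example data with three markers, location 1.5 lies between
markers 1 and 2, location 0.5 has only marker 1 to its right, and
location 3.5 has only marker 3 to its left.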


Command line options
--------------------

Option -h lists the forms in which treedt can be invoked. Try the
command

treedt -h

The most useful options are p (normal mode) and k (a fixed number of
subtrees). Options P and K are meant for power estimation on simulated
datasets with evenly distributed markers, and they require the known
position of the causal variant as an extra argument. Option s prints
raw scores without permutations and is intended for testing only.


Copyright
---------

Copyright is owned by the authors and Licentia Ltd. You may freely use
this software for non-commercial purposes. In scientific publications,
please refer to the article mentioned above.


Disclaimer
----------

The source code is provided "as is" without any guarantees or
warranties.

