The program HIT (Haplotype Inference Technique) solves the so-called genotype phasing (or haplotype inference) problem. It estimates the haplotypes of a given set of genotypes consisting of SNPs (single nucleotide polymorphisms). It does this by learning a hidden Markov model for the haplotypes using an EM-type algorithm.
There is an applet version of HIT, and it can be started here. When the applet is started, it asks to be signed. HIT can be used without signing, but then you cannot open or save files. HIT requires a recent Java runtime environment, which can be downloaded from www.java.com.
Alternatively, HIT can be run from the command line. Download the source package of HIT and see README.txt in the package, or download the jar package and run it with the Java interpreter: "java -cp hit.jar hit.HIT input K tol/steps". The parameter input is the input filename, and the other parameters are explained below. An example command could look like:
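For scripted runs, the invocation pattern quoted above can be assembled programmatically. The following is a minimal Python sketch, not part of HIT: the helper name build_hit_command, the file name genotypes.txt and the values 7 and 0.001 are hypothetical and chosen only for illustration. The sketch builds and prints the command; it does not run it.

```python
import shlex

def build_hit_command(input_path, k, em_tol_or_steps, jar="hit.jar"):
    """Return the argument list for one HIT run (nothing is executed here)."""
    # Argument order follows the pattern: java -cp hit.jar hit.HIT input K tol/steps
    return ["java", "-cp", jar, "hit.HIT",
            str(input_path), str(k), str(em_tol_or_steps)]

# Hypothetical values: input file genotypes.txt, K = 7 founders, EM tolerance 0.001.
cmd = build_hit_command("genotypes.txt", 7, "0.001")
print(shlex.join(cmd))
# prints: java -cp hit.jar hit.HIT genotypes.txt 7 0.001
```

The resulting list can be passed to subprocess.run(cmd) once hit.jar and the input file are in the working directory.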
The first input genotype is described on input lines 1 and 2, the second on lines 3 and 4, and so on. So if there are n input genotypes, the input file has 2n lines. Here is an example input file for HIT (another example). Inputs can be typed, pasted, or opened into the input tab. SNPs are given in physical order using any (at most two) characters for each SNP. The symbols "?", "." and "-" mark missing values.
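The two-lines-per-genotype layout can be illustrated with a small reader. This is a sketch in Python, not HIT's own parser. It assumes that the symbols on a line are separated by whitespace and that blank lines can be ignored, neither of which is stated above, and the function name parse_genotypes and the use of None for missing values are illustrative choices.

```python
MISSING = {"?", ".", "-"}  # missing-value symbols listed in the text

def parse_genotypes(text):
    """Pair consecutive lines: lines 1-2 form genotype 1, lines 3-4 genotype 2, ..."""
    # Assumption: symbols are whitespace-separated and blank lines are skipped.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected 2n lines for n genotypes")
    genotypes = []
    for first, second in zip(rows[0::2], rows[1::2]):
        if len(first) != len(second):
            raise ValueError("both lines of a genotype must cover the same SNPs")
        # Each SNP becomes a pair of alleles; None marks a missing value.
        genotypes.append([
            tuple(None if allele in MISSING else allele for allele in pair)
            for pair in zip(first, second)
        ])
    return genotypes

# Made-up input: 2 genotypes over 3 SNPs, so 4 lines.
example = "A C ?\nA T G\nC C G\nA T -\n"
genotypes = parse_genotypes(example)
```

Here genotypes[0] is [("A", "A"), ("C", "T"), (None, "G")]: the first SNP is homozygous, the second heterozygous, and the third has one missing allele.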
The resulting haplotypes appear in the output tab (in the same format as the genotypes), from where the result can be saved or copied.
The parameter "K" is the number of founders, and the parameter "EM tol/steps" is the stopping criterion for the EM algorithm. If the latter parameter is an integer, the EM step is iterated that many times. Otherwise, EM is stopped when the increase in the log-likelihood becomes smaller than the given value.
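The dual meaning of the "EM tol/steps" parameter can be sketched as follows. This is an illustration in Python, not HIT's code: the names parse_stop and run_em, and the callables em_step and loglik, are hypothetical stand-ins for one EM update and the log-likelihood of a model. Whether HIT returns the model from before or after the last small improvement is not stated above; this sketch returns the latest one.

```python
def parse_stop(param):
    """Return ("steps", n) for an integer string, otherwise ("tol", x)."""
    try:
        return ("steps", int(param))
    except ValueError:
        return ("tol", float(param))

def run_em(em_step, loglik, model, param):
    """Iterate em_step until the stopping rule encoded in param is met."""
    kind, value = parse_stop(param)
    if kind == "steps":
        # Integer parameter: run exactly this many EM steps.
        for _ in range(value):
            model = em_step(model)
        return model
    # Otherwise: stop once the log-likelihood gain drops below the tolerance.
    previous = loglik(model)
    while True:
        model = em_step(model)
        current = loglik(model)
        if current - previous < value:
            return model
        previous = current
```

For example, parse_stop("50") yields ("steps", 50), while parse_stop("0.001") yields ("tol", 0.001).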
The initialization selection changes the initialization algorithm. The option "Greedy init" is the original algorithm from [RKMU05]. The other choice, "Supergreedy Init", is much faster and more practical when K > 10.
The source code of HIT is available under the GPL license here. Bug reports and comments are welcome; please send them to the contact person, Pasi Rastas.
[RKMU05] Pasi Rastas, Mikko Koivisto, Heikki Mannila, and Esko Ukkonen: A Hidden Markov Technique for Haplotype Reconstruction. In: R. Casadio and G. Myers (eds.), 5th Workshop on Algorithms in Bioinformatics - WABI 2005, pp. 140-151, Springer 2005.
[RKMU08] Pasi Rastas, Mikko Koivisto, Heikki Mannila, and Esko Ukkonen: Phasing genotypes using a hidden Markov model. In: I. Mandoiu and A. Zelikovsky (eds.), Bioinformatics Algorithms: Techniques and Applications, pp. 373-391, Wiley 2008.