BACH: Fast Haplotype Inference via Context Tree Weighting

Introduction

The program BACH (BAyesian Context-based Haplotyping) solves the genotype phasing (haplotype inference) problem. It takes a Bayesian approach, modeling haplotypes and genotypes with variable-order Markov chains, and finds accurate haplotypes for the given genotypes using the Bayesian maximum a posteriori (MAP) criterion.

Program with Source Code

BACH (BAyesian Context-based Haplotyping) can be downloaded with its source code here. It is available under the GPL license.

Installation

First, create the folder where BACH should be installed and move to it, e.g.

mkdir bach
cd bach

Save the source package to this new folder and unzip it, e.g.

unzip bach_source.zip

Move to the folder "source":

cd source

BACH can then be run as explained in Basic Usage.

BACH requires the Java runtime environment to be installed. It can be downloaded from www.java.com.

Basic Usage

java Bach genotypes.txt >haplotypes.txt

The file "genotypes.txt" contains the input genotypes. These are given in the same format as in HIT (see HIT's documentation). Example inputs could look like example1 or example2.

The haplotypes are printed to standard output (the screen) in the same format as the input genotypes. Adding ">haplotypes.txt" to the command redirects the results to the file "haplotypes.txt".

General Usage

java Bach input [D [prior [iterations [branching]]]] >output

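The parameters input and output are as in Basic Usage. The optional parameters are positional, so a later parameter can be given only if all earlier ones are given as well. The parameter D defines the maximum context length (default 40).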
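The nesting of the optional parameters can be illustrated with a short sketch. This is a hypothetical illustration and not BACH's actual source code: the class name BachArgsSketch and its field names are made up, the value 0.0 stands in for "non-positive, so use the ML-prior", and the default of the branching parameter is left unset because this document does not state it.

```java
// Hypothetical sketch of how positional, nested optional arguments map to
// settings. Not taken from BACH's source code.
public class BachArgsSketch {
    final String input;
    int maxContextLength = 40;  // D, documented default
    double prior = 0.0;         // assumption: a non-positive value selects the ML-prior
    int iterations = 10;        // documented default
    Double branching = null;    // default value is not stated in this document

    BachArgsSketch(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("an input file is required");
        }
        input = args[0];
        // Each optional argument is read only if all earlier ones are present.
        if (args.length > 1) maxContextLength = Integer.parseInt(args[1]);
        if (args.length > 2) prior = Double.parseDouble(args[2]);
        if (args.length > 3) iterations = Integer.parseInt(args[3]);
        if (args.length > 4) branching = Double.valueOf(args[4]);
    }

    public static void main(String[] args) {
        // Corresponds to: java Bach genotypes.txt 40 0.5
        BachArgsSketch a = new BachArgsSketch(new String[] {"genotypes.txt", "40", "0.5"});
        System.out.println("D=" + a.maxContextLength + " prior=" + a.prior
                + " iterations=" + a.iterations + " branching=" + a.branching);
    }
}
```

Under this reading, changing only the number of iterations still requires D and prior to be written out explicitly.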

The prior parameter defines the prior for the emission parameters in the variable-order Markov chain. If this parameter x is positive (for example 0.5), a Beta(x, x) prior is used; otherwise (the default) the ML-prior is used (see [1]).
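To give a feel for what a symmetric Beta(x, x) prior does, the sketch below computes the standard posterior mean of a binary emission probability after observing some counts, (ones + x) / (zeros + ones + 2x). The class and method names are hypothetical, and whether BACH uses exactly this quantity internally is an assumption; see [1] for the actual model, including the ML-prior, which is not covered here.

```java
// Illustrative sketch only; names are hypothetical and not from BACH.
public class BetaEmissionSketch {

    /**
     * Posterior mean of P(next allele = 1) under a Beta(x, x) prior,
     * given how often alleles 0 and 1 were seen in some context.
     */
    static double posteriorMean(int zeros, int ones, double x) {
        if (x <= 0) {
            throw new IllegalArgumentException("x must be positive for a Beta(x, x) prior");
        }
        return (ones + x) / (zeros + ones + 2 * x);
    }

    public static void main(String[] args) {
        // With no data the estimate is 0.5 for any positive x.
        System.out.println(posteriorMean(0, 0, 0.5));
        // One 0 and three 1s with x = 0.5: (3 + 0.5) / (4 + 1).
        System.out.println(posteriorMean(1, 3, 0.5));
    }
}
```

Smaller values of x let the observed counts dominate sooner, while larger values pull the estimate more strongly toward 0.5.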

The iterations parameter defines the number of runs performed (default 10). In each run, two sets of haplotypes with maximal posterior probability are found: the first by reading the input genotypes in ascending physical order, the second by reading them in reverse order. The output is a centroid of all haplotypes found.

The branching parameter defines the probability of branching in a context tree, which is used to represent a variable-order Markov chain. This parameter fixes the prior over all context trees by giving an independent probability of splitting or stopping at each node of the tree.
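The following sketch shows how an independent split-or-stop probability at each node induces a prior over whole trees. It makes several assumptions that are not stated in this document: the tree is binary (one child per allele), a node that splits always gets both children, and a node at the maximum depth D has no choice left and contributes a factor of 1, as is common in context tree weighting. All names are hypothetical.

```java
// Illustrative sketch only; not taken from BACH's source code.
public class ContextTreePriorSketch {

    /** A context tree node; a leaf has no children, an inner node has two. */
    static final class Node {
        final Node zero, one;

        Node() { this(null, null); }

        Node(Node zero, Node one) {
            if ((zero == null) != (one == null)) {
                throw new IllegalArgumentException("a node has either two children or none");
            }
            this.zero = zero;
            this.one = one;
        }

        boolean isLeaf() { return zero == null; }
    }

    /**
     * Prior probability of the subtree rooted at node, where p is the
     * branching probability and maxDepth plays the role of D.
     */
    static double prior(Node node, int depth, int maxDepth, double p) {
        if (depth == maxDepth) {
            if (!node.isLeaf()) {
                throw new IllegalArgumentException("tree exceeds the maximum depth");
            }
            return 1.0;               // forced to stop, so no probability is spent
        }
        if (node.isLeaf()) {
            return 1.0 - p;           // chose to stop here
        }
        return p                      // chose to split here
                * prior(node.zero, depth + 1, maxDepth, p)
                * prior(node.one, depth + 1, maxDepth, p);
    }

    public static void main(String[] args) {
        Node tree = new Node(new Node(), new Node()); // root splits once
        System.out.println(prior(tree, 0, 40, 0.5));  // 0.5 * 0.5 * 0.5
    }
}
```

Under these assumptions, a larger branching probability shifts prior mass toward deeper trees, that is, toward longer contexts.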

References

[1] Pasi Rastas, Jussi Kollin, Mikko Koivisto: Fast Bayesian Haplotype Inference via Context Tree Weighting. In: K. Crandall, J. Lagergren (Eds.), Proc. 8th Workshop on Algorithms in Bioinformatics (WABI 2008), pp. 259-270.
Pasi Rastas
Last modified: Fri Oct 30 13:36:14 EET 2009