HaploVisual is a Java applet for visualizing and analyzing haplotype data. It is intended for both computer scientists (bioinformatics researchers) and biologists. This short document describes how to use the program. Most of the algorithms are explained in [Ukkonen02].
HaploVisual is a signed applet. When the HaploVisual applet is opened with a modern browser, the browser should ask whether you trust the applet and want to give it access to your computer. If you don't give access, the applet will run, but you won't be able to load or save files and you might also encounter some visual artifacts on the screen.
Below is a picture of the program when it starts. The red lines and text labels explain some of the program's functionality.
The program loads three different file formats. Each format has its own item in the File menu ('open', 'open msat' and 'open snps'). The snps format is simply haplotype data in a file, with values separated by a tab or space and lines separated by a newline. Possible values are 0 (or 0.0), 0.5 and 1 (or 1.0), where 0.5 means a missing value and 0 and 1 are the two alleles. The msat format is similar, but the possible values are 0, 1, 2, ..., where 0 means a missing value. In HaploVisual, missing values are not handled specially: they are treated like any other value. The format of files used by 'open' (and by 'save') is explained in the table below. The same example input is also shown the way HaploVisual displays it and can be loaded as a file here.
If you choose 'Open cross' from the File menu, the current founders are used to color the newly loaded data. The data is assumed to be in the open/save format. Cross opening is possible only if the founder set explains the new data.
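As an illustration of the snps format described above, here is a minimal Python sketch of a reader. It is not HaploVisual's own code: the function name `parse_snps` is hypothetical, and skipping blank lines and rejecting rows of unequal length are assumptions.

```python
def parse_snps(text):
    """Parse snps data: one haplotype per line, values separated by tabs or spaces."""
    allowed = {0.0, 0.5, 1.0}  # 0 and 1 are the alleles, 0.5 marks a missing value
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines carry no data
        values = [float(token) for token in line.split()]
        bad = [v for v in values if v not in allowed]
        if bad:
            raise ValueError(f"line {line_no}: unexpected value(s) {bad}")
        rows.append(values)
    if len({len(row) for row in rows}) > 1:
        raise ValueError("haplotypes have different lengths")  # assumption
    return rows


# Both "1" and "1.0" spellings are accepted, as are tabs and spaces.
example = "0 1 0.5 1\n1.0\t0.0\t1\t0\n"
rows = parse_snps(example)
```

The msat format could be read the same way by parsing integers instead and dropping the check against `allowed`.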
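The condition that the founder set "explains" the new data can be sketched as follows. This is an interpretation, not the applet's documented check: it assumes a haplotype is explained when every one of its characters occurs in at least one founder at the same column, so that some segmentation exists. The name `founders_explain` is hypothetical.

```python
def founders_explain(founders, haplotypes):
    """Return True if every haplotype can be built column by column from the founders."""
    n = len(founders[0])
    # Characters available at each column across all founders.
    column_chars = [{f[j] for f in founders} for j in range(n)]
    return all(
        len(h) == n and all(h[j] in column_chars[j] for j in range(n))
        for h in haplotypes
    )


founders = ["ACAA", "BCCC"]
ok = founders_explain(founders, ["ACCC", "BCAA"])
not_ok = founders_explain(founders, ["ABCC"])  # no founder has 'B' in column 2
```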
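| File content | Meaning |
|---|---|
| ABC | alphabet (one character per symbol) |
| 4 | length of haplotypes |
| 3 | number of haplotypes |
| 2 | number of founders |
| ACAA | haplotype 1 |
| BCCC | haplotype 2 |
| ACCC | haplotype 3 (there are lines below this only if #founders > 0) |
| 0 0 0 0 | stored segmentation for haplotype 1 |
| 1 1 1 1 | stored segmentation for haplotype 2 |
| 0 0 1 1 | stored segmentation for haplotype 3 |
| ACAA | founder 1 |
| BCCC | founder 2 |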
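The open/save format in the table can be read with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, segmentation values are taken to be whitespace-separated, and each segmentation entry is read as a zero-based founder index, which is consistent with the example (0 0 1 1 rebuilds ACCC from ACAA and BCCC).

```python
def parse_haplovisual(text):
    """Parse the open/save format: header, haplotypes, then optional segmentation and founders."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    alphabet = lines[0]
    n, m, k = int(lines[1]), int(lines[2]), int(lines[3])
    haplotypes = lines[4:4 + m]
    segmentation, founders = [], []
    if k > 0:  # segmentation and founder lines exist only if there are founders
        seg_lines = lines[4 + m:4 + 2 * m]
        segmentation = [[int(tok) for tok in ln.split()] for ln in seg_lines]
        founders = lines[4 + 2 * m:4 + 2 * m + k]
    return alphabet, n, haplotypes, segmentation, founders


def rebuild(segmentation_row, founders):
    """Rebuild a haplotype by copying each column from the founder named in the segmentation."""
    return "".join(founders[f][j] for j, f in enumerate(segmentation_row))


example = "ABC\n4\n3\n2\nACAA\nBCCC\nACCC\n0 0 0 0\n1 1 1 1\n0 0 1 1\nACAA\nBCCC\n"
alphabet, n, haplotypes, segmentation, founders = parse_haplovisual(example)
rebuilt = [rebuild(row, founders) for row in segmentation]
```

Here `rebuilt` equals `haplotypes`, which is a quick consistency check on a saved file.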
Choosing 'New...' from the File menu opens HaploVisual in a new window. All data is copied to the new window, so this function can be used to store results.
You can change the color of a founder string by clicking a mouse button on that string. The first 8 colors can be set to basic colors by clicking the text 'Founders'.
Haplotypes can be sorted by clicking different mouse buttons on the data. The left button moves the character at the mouse position to the beginning of the column. The middle button makes characters vote for their positions, and the right button works like the left one except that only the lines below the mouse position are affected.
Pressing the 'Statistics' button shows some simple statistics of the current data. For example, it shows the number of fragments and the average length of a fragment in the data.
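The two statistics mentioned above can be computed from a segmentation as in the following sketch. It assumes that a fragment is a maximal run of consecutive positions colored by the same founder; the function name `fragment_stats` is hypothetical.

```python
from itertools import groupby


def fragment_stats(segmentation):
    """Return (number of fragments, average fragment length) over all haplotypes."""
    # Each maximal run of equal founder indices in a row is one fragment.
    lengths = [len(list(run)) for row in segmentation for _, run in groupby(row)]
    return len(lengths), sum(lengths) / len(lengths)


# Segmentation from the file format example: fragments of length 4, 4, 2 and 2.
stats = fragment_stats([[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]])
```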
Simple random inputs can be generated by pressing the 'Random' button in the main window. A picture of the parameter dialog for this function is shown below.
Parameter n is the length of the generated haplotypes and m is their number. k is the number of founders used to generate the data. 'average' is the average fragment length and 'colors' is the number of founders used to generate one haplotype. 'Alphabet' is the set of characters used to generate the founders.
Random data is generated as follows. First, a set of k founders, each of length n, is generated by choosing each character at random from the alphabet. Then each haplotype is generated from 'colors' randomly chosen founders, changing founder with probability 1/'average'.
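The generation procedure can be sketched in Python as follows. Details the text does not specify are assumptions: the founder switch is tested independently at every position after the first, a switch always moves to a different founder from the haplotype's pool, and `colors` must not exceed k. The function name and the `seed` parameter are hypothetical.

```python
import random


def random_haplotypes(n, m, k, average, colors, alphabet, seed=None):
    """Generate k random founders of length n and m haplotypes built from them."""
    rng = random.Random(seed)
    founders = ["".join(rng.choice(alphabet) for _ in range(n)) for _ in range(k)]
    haplotypes = []
    for _ in range(m):
        pool = rng.sample(range(k), colors)  # founders used for this haplotype
        current = rng.choice(pool)
        chars = []
        for j in range(n):
            # Assumption: switch founder with probability 1/average at each position.
            if j > 0 and len(pool) > 1 and rng.random() < 1.0 / average:
                current = rng.choice([f for f in pool if f != current])
            chars.append(founders[current][j])
        haplotypes.append("".join(chars))
    return founders, haplotypes


founders, haplotypes = random_haplotypes(
    n=20, m=5, k=4, average=5, colors=2, alphabet="ABC", seed=1
)
```

With a per-position switch probability of 1/'average', fragment lengths are roughly geometric with mean 'average', which matches the role the parameter is described as having.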
The 'permutate columns' option rearranges the data columns into a random order.
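A column permutation of this kind can be sketched as below. The function name is hypothetical, and applying the same random order to every haplotype is an assumption that follows from permuting whole columns.

```python
import random


def permute_columns(haplotypes, seed=None):
    """Apply one random column order to all haplotypes; return the new rows and the order."""
    order = list(range(len(haplotypes[0])))
    random.Random(seed).shuffle(order)
    return ["".join(h[j] for j in order) for h in haplotypes], order


data = ["ACAA", "BCCC", "ACCC"]
permuted, order = permute_columns(data, seed=3)
```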
By pressing 'Calc Founders' you can calculate the founder set using different algorithms. The following dialog opens when the 'Calc Founders' button is pressed.
You can choose the algorithm by clicking the desired radio button. There are also three tabs at the top of the window. The algorithms on all tabs are described below.
Most algorithms are found under this tab. You can limit the number of founders (#), the average fragment length or the minimum fragment length. The two algorithms at the bottom of the list are optimal but very slow. They require a lot of memory and time, so they can be used only for very small inputs.
'Remove character' is used, for example, to remove missing values from the data. For each removed value, it tries to find as long a similar neighborhood as possible in the other haplotypes, and then copies the new value from the best-matching haplotype.
'Prune rows' removes haplotypes that contain more occurrences of the chosen character than 'limit' percent of the average.
'Optimal coloring' calculates the optimal coloring (segmentation) for a fixed founder set.
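One way to compute such a coloring is a small dynamic program over the columns. The sketch below assumes that "optimal" means using as few fragments as possible, and that a founder may be used at a column only if its character matches the haplotype there. It is an illustration, not HaploVisual's implementation, and the function name is hypothetical.

```python
def optimal_coloring(haplotype, founders):
    """Return a founder index per column with the fewest founder switches, or None."""
    inf = float("inf")
    k, n = len(founders), len(haplotype)
    # cost[f]: fewest switches for the prefix so far, ending on founder f.
    cost = [0 if founders[f][0] == haplotype[0] else inf for f in range(k)]
    back = []  # back[j-1][f]: founder used at column j-1 when f is used at column j
    for j in range(1, n):
        best_prev = min(range(k), key=lambda f: cost[f])
        new_cost, choice = [], []
        for f in range(k):
            if founders[f][j] != haplotype[j]:
                new_cost.append(inf)
                choice.append(None)
            elif cost[f] <= cost[best_prev] + 1:
                new_cost.append(cost[f])  # stay on the same founder
                choice.append(f)
            else:
                new_cost.append(cost[best_prev] + 1)  # switch from the best founder
                choice.append(best_prev)
        cost = new_cost
        back.append(choice)
    end = min(range(k), key=lambda f: cost[f])
    if cost[end] == inf:
        return None  # the founders cannot explain this haplotype
    coloring = [end]
    for choice in reversed(back):
        coloring.append(choice[coloring[-1]])
    coloring.reverse()
    return coloring


founders = ["ACAA", "BCCC"]
coloring = optimal_coloring("ACCC", founders)
unexplained = optimal_coloring("ABCC", founders)
```

For ACCC the result has two fragments. Several colorings can be equally good, so the result need not equal the stored segmentation 0 0 1 1 from the file format example.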
These algorithms, except for Test, use the greedy set cover approximation to cover the data with 'boxes'. A 'box' is a set of haplotypes over some interval whose Hamming distance is smaller than 'hamming'. Only 'cover%' percent of the data is covered. These algorithms also calculate the founder set.
The Test algorithm simply covers the data with 'boxes' whose area is larger than 'min area'. Statistics on how the boxes lie on the data are then collected and plotted. This algorithm does not calculate founders, and it cannot use the Hamming distance.
The 'Multirun' option is used to run an algorithm several times, using, for example, an 80% sample of the data. The chosen algorithm is run 'iterations' times.
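The greedy set cover step can be illustrated generically. In the sketch below each box is represented as a set of (row, column) cells and the candidate boxes are given as input. How HaploVisual enumerates boxes and applies the 'hamming' limit is not described here, so that part is left out, and all names are hypothetical.

```python
def greedy_cover(universe, boxes, cover_percent):
    """Pick boxes greedily until cover_percent of the cells in universe are covered."""
    target = len(universe) * cover_percent / 100.0
    covered, chosen = set(), []
    while len(covered) < target:
        # Greedy rule: take the box that covers the most still-uncovered cells.
        best = max(range(len(boxes)), key=lambda i: len(boxes[i] - covered))
        gain = boxes[best] - covered
        if not gain:
            break  # no box adds anything; the target cannot be reached
        chosen.append(best)
        covered |= gain
    return chosen, covered


# Two haplotypes of length 4, so 8 cells in total.
universe = {(r, c) for r in range(2) for c in range(4)}
boxes = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},  # both rows, columns 0-1
    {(0, 2), (0, 3)},                  # row 0, columns 2-3
    {(1, 2), (1, 3)},                  # row 1, columns 2-3
    {(0, 1), (0, 2)},                  # row 0, columns 1-2
]
chosen, covered = greedy_cover(universe, boxes, cover_percent=75)
```

With a 75% target the loop stops after two boxes, once 6 of the 8 cells are covered.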
[Ukkonen02] E. Ukkonen, Finding founder sequences from a set of recombinants, WABI 2002: 277-286