This directory contains the data files used in [Eronen et al. 04].

--------------------------------------------------------------------------

Simulated data
--------------

The setting corresponds to an association study in a population isolate.
The population simulator of Vesa Ollikainen [Oll02] was used to generate
the test data. We simulated a population with an effective founder
population of size 20 (20 founders, each with independent random
haplotypes with uniformly distributed alleles). The population then
expanded for 20 generations with random mating, leading to a final
population of 100000 individuals. We used a sample of 500 genotypes,
drawn randomly and independently from the last generation.

We experimented separately with biallelic markers (SNPs) and 6-allele
markers (microsatellites). In the experiments we always used a marker map
of 32 evenly spaced markers. The major parameter varied in the
experiments was the distance between adjacent markers: it ranged between
0.01 and 1 cM. Correspondingly, the simulated chromosomal regions have
genetic lengths between 0.31 and 31 cM (31 intervals between 32 markers).
10 independent simulations were run for each different setting.

The names of the simulated data files have the following format:
A_...hpm, where the name encodes the distance in centiMorgans between
each adjacent pair of markers, the number of different alleles, and which
of the 10 independent simulations is in question.

--------------------------------------------------------------------------

Daly data
---------

As a real data set we used the publicly available data from
[Daly et al. 01], which consists of 129 genotyped trios from a
European-derived population. The marker map consists of 103 SNPs ranging
over 500 kb located on chromosome 5q31. The original data set was
obtained from http://www.broad.mit.edu/humgen/IBD5/raw_data.txt.

We inferred the haplotypes of the 129 children from the trios and used
the nontransmitted chromosomes as an extra 129 (pseudo) haplotype pairs.
Markers for which both alleles could not be inferred were marked as
missing. From the resulting set of 258 haplotype pairs, the ones with
more than 20% missing alleles were removed, leaving 147 haplotype pairs
as the test set. The test set is contained in the file "daly.hpm".

--------------------------------------------------------------------------

References
----------

[Eronen et al. 04]
    L. Eronen, F. Geerts, H. Toivonen. A Markov Chain Approach to
    Reconstruction of Long Haplotypes. In Proceedings of the 9th Pacific
    Symposium on Biocomputing (PSB), 104-115, January 2004. World
    Scientific.

[Daly et al. 01]
    Mark J. Daly, John D. Rioux, Stephen F. Schaffner, Thomas J. Hudson,
    and Eric S. Lander. High-resolution haplotype structure in the human
    genome. Nature Genetics, 29:229-232, 2001.

[Oll02]
    Vesa Ollikainen. Simulation techniques for disease gene localization
    in isolated populations. PhD thesis, University of Helsinki, Finland,
    2002.
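
--------------------------------------------------------------------------

Illustrative sketch
-------------------

The marker-map arithmetic under "Simulated data" and the missing-data
filter under "Daly data" can be sketched roughly as follows. This is not
the code used to produce the files. The function names, the use of None
for a missing allele, and the choice to pool both haplotypes of a pair
when counting missing alleles are all assumptions made for illustration;
the layout of the .hpm files is not described here and is not modelled.

```python
# Sketch only: every name and the in-memory layout are assumptions,
# not the format of the .hpm files.

N_MARKERS = 32   # markers per simulated map (stated in the text)
MISSING = None   # hypothetical placeholder for a missing allele


def region_length_cm(spacing_cm, n_markers=N_MARKERS):
    """Genetic length in cM of a map of evenly spaced markers."""
    # n markers are separated by n - 1 intervals.
    return (n_markers - 1) * spacing_cm


def missing_fraction(pair):
    """Fraction of missing alleles, pooled over both haplotypes (assumed)."""
    alleles = [allele for haplotype in pair for allele in haplotype]
    return sum(allele is MISSING for allele in alleles) / len(alleles)


def filter_pairs(pairs, max_missing=0.20):
    """Drop haplotype pairs with more than max_missing missing alleles."""
    return [pair for pair in pairs if missing_fraction(pair) <= max_missing]


if __name__ == "__main__":
    # The two extremes of the marker spacing used in the simulations.
    for spacing in (0.01, 1.0):
        print(f"{spacing} cM spacing: {region_length_cm(spacing):.2f} cM")
```

With a spacing of 0.01 cM the 31 intervals give 0.31 cM, and with 1 cM
they give 31 cM, matching the range stated above. A pair with exactly 20%
missing alleles is kept, since only pairs with more than 20% were removed.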