These files reproduce the synthetic DNA data used in:

Jouni Sirén, Niko Välimäki, Veli Mäkinen, and Gonzalo Navarro: Run-Length 
Compressed Indexes Are Superior for Highly Repetitive Sequence Collections.
In 15th Symposium on String Processing and Information Retrieval (SPIRE 2008), 
Springer-Verlag LNCS 5280, pp. 164-175, Melbourne, Australia, November 10-12, 
2008.

The compiler used in the experiments was

  g++ (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)

with a 32-bit Intel target. The random number generator should behave the same 
under any GCC 4.* on any 32-bit platform, so the same data sets should be 
reproducible there.

File dna.50MB.gz contains the 50 MB prefix of the DNA collection from the Pizza & 
Chili corpus. Decompress it and use mutator to generate the data sets. For 
example,

  mutator dna.50MB output 4 25 0.003

writes the 25 x 4 MB data set at mutation rate 0.003 to file output.

Mutation rates used in the experiments were the following:

  0.000
  0.001
  0.003
  0.005
  0.010
  0.015
  0.020
  0.030
  0.040
  0.050
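
To generate one data set per mutation rate, the example command can be wrapped 
in a loop. The following is only a sketch: it assumes the 4 and 25 arguments 
from the example above are kept for every rate, and the output file names 
(dna.25x4MB.<rate>) are hypothetical, not the names used in the experiments. 
It prints the commands instead of running them, so they can be checked first.

```shell
#!/bin/sh
# Mutation rates listed above.
RATES="0.000 0.001 0.003 0.005 0.010 0.015 0.020 0.030 0.040 0.050"

# Print one mutator command per rate.
# Assumptions: size arguments "4 25" are taken from the example above;
# the output name pattern dna.25x4MB.<rate> is made up for illustration.
mutator_commands() {
  for rate in $RATES; do
    echo "mutator dna.50MB dna.25x4MB.$rate 4 25 $rate"
  done
}

# Show the commands for review.
mutator_commands

# To actually run them (mutator must be on PATH, dna.50MB decompressed):
# mutator_commands | sh
```

Adjust the size arguments if other data set sizes are needed.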