These are the original files. The following commands can be used to produce the 500-megabyte parts used in the experiments:

    bunzip2 -c fiwiki.bz2 | split_wikipedia.py - 500 revision
    bunzip2 -c enwiki.bz2 | split_wikipedia.py - 500 page

Program split_wikipedia.py can be found in the RLCSA package.
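
For readers without the RLCSA package at hand, the general idea of cutting a decompressed dump into parts of roughly fixed size can be sketched as follows. This is not the code of split_wikipedia.py. It assumes that the arguments 500 and revision/page mean "about 500 megabytes per part" and "cut only after a closing tag of that element"; the function name split_at_boundaries and its parameters are hypothetical and chosen only for illustration.

```python
import io


def split_at_boundaries(lines, max_bytes, closing_tag):
    """Yield parts (as bytes) cut only after a line containing closing_tag.

    Assumption: a part is closed at the first boundary line seen once it
    holds at least max_bytes bytes, so parts may slightly exceed max_bytes.
    """
    part = []
    size = 0
    for line in lines:
        part.append(line)
        size += len(line)
        # Cut only at an element boundary, never in the middle of a record.
        if size >= max_bytes and closing_tag in line:
            yield b"".join(part)
            part, size = [], 0
    # Whatever is left forms the last, possibly shorter, part.
    if part:
        yield b"".join(part)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a decompressed dump read from stdin.
    demo = io.BytesIO(
        b"<page>\naaaa\n</page>\n<page>\nbb\n</page>\n<page>\nc\n</page>\n"
    )
    for number, part in enumerate(split_at_boundaries(demo, 30, b"</page>")):
        print(number, len(part))
```

With a real dump, the lines would come from the decompressed stream on standard input (for example sys.stdin.buffer), max_bytes would be 500 * 1024 * 1024 under the assumption above, and each part would be written to its own file instead of being printed.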