To produce a 400-megabyte prefix of the fiwiki archive, with each revision of a
page stored as a separate document, use the following command:

  bunzip2 -c fiwiki.bz2 | wikipedia_extract.py - 400 revision

Replace 'revision' with 'page' to treat all revisions of a page as one document.
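
To check the start of the archive without the extraction script, the streaming
step of the pipeline above (decompress, then stop after a fixed amount of data)
can be sketched in Python. This is only an illustration, not the script's
implementation: the function name read_decompressed_prefix is hypothetical, and
the sketch cuts at a raw byte count without splitting the data into revision or
page documents.

```python
import bz2


def read_decompressed_prefix(path, max_bytes, chunk_size=1 << 20):
    """Return up to max_bytes of decompressed data from a bzip2 file.

    Hypothetical helper for illustration only. It reads the archive as a
    stream, so the whole file is never decompressed at once.
    """
    parts = []
    remaining = max_bytes
    with bz2.open(path, "rb") as stream:
        while remaining > 0:
            # Never request more than what is still needed for the prefix.
            chunk = stream.read(min(chunk_size, remaining))
            if not chunk:
                break  # the archive ended before max_bytes was reached
            parts.append(chunk)
            remaining -= len(chunk)
    return b"".join(parts)
```

For example, read_decompressed_prefix("fiwiki.bz2", 400 * 1024 * 1024) would
return the first 400 MiB of the decompressed dump. Whether the script counts a
megabyte as 2^20 or 10^6 bytes is not stated here, so that factor is an
assumption.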

The following command instead extracts the first 10000 documents and stores
them in a single file:

  bunzip2 -c fiwiki.bz2 | extract_sequences.py - 10000 revision

The Python script can be found in the RLCSA package.