Utilities
File Input
A dataset must store each transaction on a separate line as a list of items separated by white space and ending with a newline. Each item is a non-negative integer. The provided Data class can be downloaded here.
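As an illustration of the input format described above, the following is a minimal Python sketch of a reader. It is not the provided Data class; the function name `read_transactions` and the choice to skip blank lines are assumptions made for this example only.

```python
import io
from typing import Iterator, List, TextIO


def read_transactions(stream: TextIO) -> Iterator[List[int]]:
    """Yield each transaction as a list of non-negative integer items."""
    for line_number, line in enumerate(stream, start=1):
        tokens = line.split()  # split on any run of white space
        if not tokens:
            # Assumption: blank lines are skipped, not read as empty transactions.
            continue
        items = []
        for token in tokens:
            # Accept only plain ASCII digits, i.e. non-negative integers.
            if not (token.isascii() and token.isdigit()):
                raise ValueError(
                    f"line {line_number}: item {token!r} is not a non-negative integer"
                )
            items.append(int(token))
        yield items


# Three transactions, one per line, items separated by white space.
sample = io.StringIO("1 2 3\n7 42\n0 5 9 11\n")
transactions = list(read_transactions(sample))
print(transactions)  # [[1, 2, 3], [7, 42], [0, 5, 9, 11]]
```

To read a real dataset, an open file object can be passed in place of the in-memory `sample`, for example `with open(path) as f: ...`, where `path` is whichever dataset file was downloaded.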
File Output
To print all frequent itemsets to the output file, the FSout class can be used, provided here.
Datasets
The following two datasets were generated using the generator from the IBM Almaden Quest research group. This generator can be downloaded from their website. Another implementation, which can be compiled with the g++ compiler, can be downloaded from Paolo Palmerini's website.
The following datasets were prepared by Roberto Bayardo from the UCI datasets and PUMSB.
The next dataset was provided to us by Ferenc Bodon and contains (anonymized) click-stream data of a Hungarian on-line news portal.
Three datasets that were used for the KDD CUP 2000 are available.
They are described in the paper "Real world performance of association rule algorithms" by Zheng, Kohavi and Mason.
Before you can download the datasets, you are required to click through an agreement,
after which you receive a password that will allow you to download the datasets:
