-----
What:
-----
This is the readme file for the Learning Optimal Bayesian Network Structures
Weighted Partial MaxSAT datasets.
These datasets are part of the publication:
J. Berg, A. Hyttinen, M. Jarvisalo:
"Applications of MaxSAT in Data Analysis"
----------
References
----------
The datasets are the product of the following paper:
Jeremias Berg, Matti Jarvisalo, and Brandon Malone:
"Learning Optimal Bounded Treewidth Bayesian Networks via Maximum Satisfiability"
In Jukka Corander and Samuel Kaski, editors, Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS 2014), volume 33 of JMLR Workshop and Conference Proceedings, pages 86-95. JMLR, 2014.
-----------------------
Datasets
-----------------------
These instances are all based on a score-based approach to Bayesian network structure learning (BNSL).
The MaxSAT instances were created from 21 different BNSL datasets, each with three different bounds (2, 3, and 4) enforced on the treewidth. This resulted in 63 instances.
Asia 100, 1000 and 10000:
All three sets are sampled from a BN with 8 nodes.
The precomputed local scores are available from
http://www.cs.york.ac.uk/aig/sw/gobnilp/
The scores are calculated using the BDeu scoring
function with an equivalent sample size of 1.
Abalone:
A raw dataset containing 9 nodes.
The raw data is available from: http://archive.ics.uci.edu/ml
We calculated the scores using the well-known MDL scoring function.
Wine:
A raw dataset containing 14 nodes.
The raw data is available from: http://archive.ics.uci.edu/ml
We calculated the scores using the well-known MDL scoring function.
Housing:
Precomputed scores from a dataset with 14 nodes.
The precomputed local scores are available from
http://www.cs.helsinki.fi/u/jazkorho/aistats-2013/
Adult:
Precomputed scores from a dataset with 15 nodes.
The precomputed local scores are available from
http://www.cs.helsinki.fi/u/jazkorho/aistats-2013/
Zoo:
A raw dataset containing 17 nodes.
The raw data is available from: http://archive.ics.uci.edu/ml
We calculated the scores using the well-known MDL scoring function.
Voting:
A raw dataset containing 17 nodes.
The raw data is available from: http://archive.ics.uci.edu/ml
We calculated the scores using the well-known MDL scoring function.
Hepatitis:
A raw dataset containing 20 nodes.
The raw data is available from: http://archive.ics.uci.edu/ml
We calculated the scores using the well-known MDL scoring function.
Heart:
A raw dataset containing 23 nodes.
The raw data is available from: http://archive.ics.uci.edu/ml
We calculated the scores using the well-known MDL scoring function.
Insurance 100, 1000:
Sampled from a BN with 27 nodes.
The precomputed scores are available from
http://www.cs.york.ac.uk/aig/sw/gobnilp/
The scores are calculated using the BDeu scoring
function with an equivalent sample size of 1.
Horse:
A raw dataset containing 28 nodes.
The raw data is available from: http://archive.ics.uci.edu/ml
We calculated the scores using the well-known MDL scoring function.
Flag:
A raw dataset containing 29 nodes.
The raw data is available from: http://archive.ics.uci.edu/ml
We calculated the scores using the well-known MDL scoring function.
Water 100, 1000:
Sampled from a BN with 32 nodes.
The precomputed scores are available from
http://www.cs.york.ac.uk/aig/sw/gobnilp/
The scores are calculated using the BDeu scoring
function with an equivalent sample size of 1.
Alarm 100:
Sampled from a BN with 37 nodes.
The precomputed scores are available from
http://www.cs.york.ac.uk/aig/sw/gobnilp/
The scores are calculated using the BDeu scoring
function with an equivalent sample size of 1.
Hailfinder 100, 1000, 10000:
Sampled from a BN with 56 nodes.
The precomputed scores are available from
http://www.cs.york.ac.uk/aig/sw/gobnilp/
The scores are calculated using the BDeu scoring
function with an equivalent sample size of 1.
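
As a quick sanity check on the counts stated at the top of this section, the dataset list above can be tallied with a short Python sketch. The dictionary layout and the names DATASETS, TREEWIDTH_BOUNDS, LABELS and INSTANCES are illustrative assumptions and are not part of the distributed files; the dataset names and sample sizes are copied from the list above.

```python
# Dataset families and their sample-size variants, as listed in this readme.
# An empty tuple means the dataset comes in a single version.
# (This table is an illustrative assumption, not a file shipped with the data.)
DATASETS = {
    "Asia": (100, 1000, 10000),
    "Abalone": (),
    "Wine": (),
    "Housing": (),
    "Adult": (),
    "Zoo": (),
    "Voting": (),
    "Hepatitis": (),
    "Heart": (),
    "Insurance": (100, 1000),
    "Horse": (),
    "Flag": (),
    "Water": (100, 1000),
    "Alarm": (100,),
    "Hailfinder": (100, 1000, 10000),
}

# The three treewidth bounds enforced in the instances.
TREEWIDTH_BOUNDS = (2, 3, 4)


def dataset_labels(datasets):
    """Expand each family into one label per sample-size variant."""
    labels = []
    for name, sizes in datasets.items():
        if sizes:
            labels.extend(f"{name} {n}" for n in sizes)
        else:
            labels.append(name)
    return labels


LABELS = dataset_labels(DATASETS)

# One instance per (dataset, treewidth bound) pair.
INSTANCES = [(label, tw) for label in LABELS for tw in TREEWIDTH_BOUNDS]

print(len(LABELS), len(INSTANCES))  # prints: 21 63
```

The tally agrees with the 21 datasets and 63 instances stated above.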
-----------
FILE NAMES
-----------
Files are named following the convention:
(Rounded)_BTWBNSL_<Dataset>_TWBound**.wcnf
where
Rounded = Indicates whether the weights in the instance are rounded to whole numbers.
<Dataset> = Name of the dataset used.
** = The bound on the treewidth of the learned network enforced in the instance.
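
The naming convention above can be turned into a small parser. The following Python sketch is only an illustration under stated assumptions: it assumes that the "Rounded" prefix is either present (followed by an underscore) or absent, that the dataset name sits between "BTWBNSL_" and "_TWBound", and that the bound is written as digits directly before ".wcnf" (optionally preceded by an underscore). The function name parse_instance_name, the class InstanceName, and the file names used in the example are hypothetical; check them against the actual files before relying on this.

```python
import re
from typing import NamedTuple, Optional


class InstanceName(NamedTuple):
    """Fields recovered from an instance file name (hypothetical helper type)."""
    rounded: bool          # True if the name carries the "Rounded" prefix
    dataset: str           # dataset part of the name, kept verbatim
    treewidth_bound: int   # bound on the treewidth enforced in the instance


# ASSUMED pattern, derived from the convention described in this readme.
# The exact separators in the real file names may differ.
_PATTERN = re.compile(
    r"^(?P<rounded>Rounded_)?BTWBNSL_(?P<dataset>.+?)_TWBound_?(?P<bound>\d+)\.wcnf$"
)


def parse_instance_name(filename: str) -> Optional[InstanceName]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return InstanceName(
        rounded=match.group("rounded") is not None,
        dataset=match.group("dataset"),
        treewidth_bound=int(match.group("bound")),
    )


# Hypothetical file names, used only to exercise the parser.
print(parse_instance_name("Rounded_BTWBNSL_asia_100_TWBound2.wcnf"))
print(parse_instance_name("BTWBNSL_wine_TWBound4.wcnf"))
print(parse_instance_name("notes.txt"))
```

Returning None for names that do not fit keeps the helper safe to run over a directory that also contains other files.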
-------
CONTACT
-------
In case of questions, please check the original paper first. If the question remains open, you can contact:
Jeremias Berg
email: jeremias.berg@cs.helsinki.fi