Algorithms for scalable Bayesian learning of causal DAGs.

Developed at the Sums of Products research group at the University of Helsinki. Originally published at NeurIPS 2020 [1].

Installation and use

Both algorithms are implemented in Sumu. After installing Sumu version 0.1.1 with the command pip install sumu==0.1.1 (following its installation instructions), you can run the algorithms from the command line. The -h flag prints help:

$ python -h
usage: [-h]
		       [-c {opt,top,pc,mb,ges,greedy,greedy-lite,back-forth}]
		       [-s {bdeu,bge}] [-e ESS] [-m MAX_ID] [-d D]
		       [-b BURN_IN] [-i ITERATIONS] [-n NTH] [-nc N_CHAINS]
		       datapath K

  A path to a space-separated file of either discrete or continuous
  data. No header rows for variable names or arities (in the discrete
  case) are assumed. Discrete data is assumed to be integer encoded;
  continuous data uses "." as the decimal separator.
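The input format described above can be produced from Python with NumPy. This is a minimal sketch that assumes discrete codes should be 0-based and contiguous per variable (a common convention; check Sumu's documentation for the exact requirement). The helper names `integer_encode`, `write_discrete` and `write_continuous` are hypothetical.

```python
import numpy as np

def integer_encode(columns):
    """Map each column's category labels to integers 0..r-1 (r = arity).

    `columns` is a list of equal-length sequences of labels of one type.
    Returns an (n_samples, n_vars) int array and the per-column labels.
    """
    encoded, categories = [], []
    for col in columns:
        labels = sorted(set(col))  # deterministic code assignment
        index = {lab: i for i, lab in enumerate(labels)}
        encoded.append([index[v] for v in col])
        categories.append(labels)
    return np.array(encoded, dtype=int).T, categories

def write_discrete(path, data):
    # Space separated, no header row, one sample per line.
    np.savetxt(path, data, fmt="%d", delimiter=" ")

def write_continuous(path, data):
    # Python %-formatting is locale independent, so "." is the decimal separator.
    np.savetxt(path, data, fmt="%.6f", delimiter=" ")
```

Keeping the returned `categories` lets you map the integer codes back to the original labels when reading the results.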

  The data path argument should be followed by the number K of candidate
  parents to use for each node, and additional optional arguments as
  explained in this help.
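The help text for the -d option suggests that, for each node, the scored parent sets are all subsets of its K candidate parents plus any other parent set of at most d nodes. Under that assumption (not confirmed here), the number of parent sets per node grows roughly as 2^K rather than 2^(n-1), so K largely determines the cost. A minimal sketch with a hypothetical helper `n_parent_sets` that counts them:

```python
from math import comb

def n_parent_sets(n_vars, K, d, max_indegree=None):
    """Count parent sets for one node under an assumed parent-set space:
    every subset of its K candidate parents, plus every other set of at
    most d parents (drawn from the n_vars - 1 other variables) that is
    not a subset of the candidates. max_indegree optionally caps set size.
    """
    cap = n_vars - 1 if max_indegree is None else max_indegree
    # All subsets of the K candidates, up to the size cap.
    in_candidates = sum(comb(K, k) for k in range(min(K, cap) + 1))
    # Small sets over all other nodes, minus those already counted above.
    small = sum(comb(n_vars - 1, k) - comb(K, k) for k in range(min(d, cap) + 1))
    return in_candidates + small

print(n_parent_sets(20, 10, 2))  # 1159
```

For 20 variables, K = 10 and the default d = 2 this gives 1159 parent sets per node, compared with 2^19 = 524288 without any restriction.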


  Output files for:
  • Candidate parents found with the selected algorithm.
  • DAGs sampled by Gadget.
  • Causal effects estimated by Beeps (if run on continuous data).
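A common downstream use of sampled DAGs is estimating posterior edge probabilities by averaging over the samples. This sketch assumes each sampled DAG has already been loaded as a list of parent sets, where `parent_sets[j]` holds the parents of node j; the layout of the Gadget output file is not described here, so the loading step is omitted and the helper names `to_adjacency` and `edge_posteriors` are hypothetical.

```python
import numpy as np

def to_adjacency(parent_sets):
    """Convert a list of parent sets into a 0/1 adjacency matrix A,
    where A[i, j] == 1 means the edge i -> j."""
    n = len(parent_sets)
    A = np.zeros((n, n), dtype=int)
    for child, parents in enumerate(parent_sets):
        for p in parents:
            A[p, child] = 1
    return A

def edge_posteriors(dags):
    """Estimate P(i -> j | data) as the fraction of sampled DAGs
    (given as adjacency matrices) that contain the edge i -> j."""
    stack = np.asarray(list(dags), dtype=float)
    return stack.mean(axis=0)

samples = [[(), (0,)], [(), (0,)], [(1,), ()]]  # three sampled 2-node DAGs
P = edge_posteriors(to_adjacency(s) for s in samples)
```

Here `P[0, 1]` is 2/3 and `P[1, 0]` is 1/3, since two of the three samples contain 0 -> 1 and one contains 1 -> 0.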

Example run

  $ python cont_data.csv 10 -s bge
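To try the example run without a dataset at hand, a synthetic file can be generated. The BGe score assumes linearly related Gaussian variables, so this minimal sketch samples from a random linear Gaussian structural equation model. The function name, edge density, weight range and sample size are arbitrary illustrative choices, and the use of 12 variables rests on the assumption that K = 10 candidate parents requires more than 10 other variables per node.

```python
import numpy as np

def simulate_linear_gaussian(weights, n_samples, noise_sd=1.0, seed=0):
    """Sample X = X W + E with W strictly upper triangular, so that the
    order 0..n-1 is topological (edge i -> j iff W[i, j] != 0)."""
    W = np.asarray(weights, dtype=float)
    if np.any(np.tril(W) != 0):
        raise ValueError("weights must be strictly upper triangular")
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    X = rng.normal(0.0, noise_sd, size=(n_samples, n))
    # Fill columns in topological order: add each node's parent effects.
    for j in range(n):
        X[:, j] += X[:, :j] @ W[:j, j]
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 12  # more variables than the K = 10 used in the example run
    mask = np.triu(rng.random((n, n)) < 0.2, k=1)
    W = np.where(mask, rng.uniform(0.5, 2.0, (n, n)), 0.0)
    X = simulate_linear_gaussian(W, n_samples=1000)
    # Space separated, no header, "." as decimal separator.
    np.savetxt("cont_data.csv", X, fmt="%.6f", delimiter=" ")
```

Because the true weight matrix is known, the sampled DAGs and estimated effects can be compared against it.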


  [1] Jussi Viinikka, Antti Hyttinen, Johan Pensar, and Mikko
  Koivisto. Towards Scalable Bayesian Learning of Causal DAGs. In
  NeurIPS 2020, in press.

positional arguments:
  datapath              path to data file
  K                     how many candidate parents to include

optional arguments:
  -h, --help            show this help message and exit
  -c {opt,top,pc,mb,ges,greedy,greedy-lite,back-forth}, --candidate-parent-algorithm {opt,top,pc,mb,ges,greedy,greedy-lite,back-forth}
			candidate algorithm to use (default: greedy-lite)
  -s {bdeu,bge}, --score {bdeu,bge}
			score function to use
  -e ESS, --ess ESS     equivalent sample size for BDeu
  -m MAX_ID, --max-id MAX_ID
			maximum indegree for scores (default: no max-indegree)
  -d D                  maximum indegree for psets which are not subsets of
			candidates (default: 2)
  -b BURN_IN, --burn-in BURN_IN
			number of burn-in samples (default: 1000)
  -i ITERATIONS, --iterations ITERATIONS
			number of iterations after burn-in (default: 1000)
  -n NTH, --nth NTH     sample dag every nth iteration (default: 10)
  -nc N_CHAINS, --n-chains N_CHAINS
			number of Metropolis coupled MCMC chains (default: 16)
  -r RANDOMSEED, --randomseed RANDOMSEED
			random seed
  -o OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
			path prefix for output files (default: input file