Research
 
My main line of research concerns causal inference from non-experimental (or partial experimental) data. The basic idea is explained below:
 
One of the main goals in science is the discovery of causal relationships. In disciplines ranging from biology to medicine to economics and social science, one seeks to know the prevailing cause-effect mechanisms between variables or phenomena of interest. In many studies, controlled randomized experiments can be run to establish causal relationships. A typical case is the testing of new medicines: subjects are randomly divided into treatment and control groups so that the groups are statistically identical but for the actual medicine given. Differences in responses between the groups can then be uniquely attributed to the effect of the medicine (taking into account statistical sampling errors, of course).
 
Unfortunately, controlled experiments can be too costly, unethical, or even technically impossible in many studies. Consider, for instance, the effect of alcohol consuption during pregnancy on the fetus. A proper experiment would involve forcing alcohol consumption for one group and prohibiting alcohol intake for the control group; such an experimental protocol is obviously unethical (and also difficult to carry out). Thus, one must rely on non-experimental (‘observational’) studies where subjects are free to choose, but their choices are recorded and analyzed. For such studies, the problem is that observed correlations do not necessarily imply causation; there are typically many different causal interpretations of a given set of data.  
 
Fortunately, the fact that a given set of observed data can be explained in lots of different ways does not make all competing explanations equally plausible. My research focuses on computational and statistical methods for finding the most credible causal explanations from non-experimental data. Such methods are particularly useful when there are many variables involved as the data cannot in this case be easily visualized. The basic framework I use is probabilistic (Bayesian) inference, in which the competing hypotheses are causal models, i.e. full descriptions of the way the data was generated. Such models not only define the joint distribution of observed variables but also specify the way the system would respond to interventions.