Graphical Models
Course Material
Lectures
All lectures are given from the slides provided here. Please print your own copies before the lectures
(using the 2-up versions provided).
The lecture itself will sometimes digress to web-based material and Java applets for
demonstrations. These will usually be listed on this page, but
not on the PDF slides.
Lecture 1: Introduction: Graphs and Probability
Review of some famous graphical models including circuits, fault
trees, entity-attribute diagrams,
and Ising lattices. Review of graphs including definitions for
directed and undirected graphs, relations such as parents and
ancestors, and connectivity notions. Review of probability as
frequency or belief in both discrete and continuous contexts.
We may play with the Change of variables
applet and the
Integration applet
to better understand the implications of working with continuous domains.
Review of modelling as a key part of intelligent systems design.
Examples of using graphs to model dependence, causality and relevance.
[Aug. 24th, Lecture 1 in PDF,
in Postscript, 2-up, gzipped ]
Preparatory reading: Please review the material
under graphs from Math World (know these definitions), on Wikipedia (only graph coloring
needs to be understood; briefly browse the others), and the Ising Model.
Review course notes of
Three Concepts: Probability
(in particular, Set 2 and 3 from the Material).
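To make the graph relations from this lecture concrete, here is a minimal sketch (in Python; the example graph and names are illustrative, not from the lecture) of a directed graph with parent and ancestor relations:

```python
# A directed graph stored as a mapping from each node to its set of parents.
# Example DAG: a -> b, a -> c, b -> d, c -> d.
parents = {
    "a": set(),
    "b": {"a"},
    "c": {"a"},
    "d": {"b", "c"},
}

def ancestors(node, parents):
    """All nodes reachable from `node` by repeatedly following parent links."""
    result = set()
    frontier = set(parents[node])
    while frontier:
        p = frontier.pop()
        if p not in result:
            result.add(p)
            frontier |= parents[p]
    return result

print(sorted(ancestors("d", parents)))  # ['a', 'b', 'c']
```

The same dictionary-of-sets representation works for undirected graphs if each edge is stored in both directions.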
Lecture 2: Independence Modelling
Functional independence in optimization as a method for task
simplification. Graph partitioning as a means of problem
decomposition. Probabilistic independence as functional independence
in probability space. Undirected graphs, relevance, and independence
defined as set separation.
Clique trees as undirected graphs on variable sets,
and a normalised form for them called Almond trees, a tree-structured
generalization of binary independence.
Directed graphs, dependence, and
independence defined on the underlying undirected graph. Soundness
and completeness between graphical independence and functional
independence forms. A catalogue of graphical forms for time-series,
spatial modelling, diagnosis and fault detection and their
independence properties.
[Sept. 30th, Lecture 2 in PDF,
in Postscript, 2-up, gzipped ]
Preparatory reading: Please review
Shachter's manuscript "Examples without Numbers" (local copy)
and glance briefly at
Scheines'
tutorial on d-separation. You don't have to understand d-separation,
because it's difficult and we'll use a Lauritzen-style definition instead,
but have a quick look at it.
Part I of Bishop's tutorial (see below) is good material for this lecture.
A demo will also be given using
B-Course.
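Independence as set separation can be checked mechanically: two nodes are separated by a set exactly when every path between them passes through that set. Here is a small sketch (the graph and names are illustrative; this is the Lauritzen-style separation test on an undirected graph, not course-supplied code):

```python
from collections import deque

# Undirected graph as adjacency sets: a chain x - s - y.
adj = {
    "x": {"s"},
    "s": {"x", "y"},
    "y": {"s"},
}

def separated(a, b, sep, adj):
    """True if the set `sep` blocks every path from a to b,
    i.e. a and b are graphically independent given `sep`."""
    seen = {a}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v == b:
                return False          # found a path avoiding sep
            if v not in sep and v not in seen:
                seen.add(v)
                queue.append(v)
    return True

print(separated("x", "y", {"s"}, adj))  # True
print(separated("x", "y", set(), adj))  # False
```

For a directed graph the same test is applied to the underlying undirected graph after the appropriate moralisation step, as discussed in the lecture.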
Lecture 3: Exact Inference
Given a graphical model on discrete variables of low dimension, and a
probability formula, evaluate the probability, or find the maximum
likelihood assignment for the free variables. We will consider these two
problems in a general sense.
We will develop an exact algorithm for the computation as nested
summation on a clustered graph using variable elimination as the basic step.
This turns out to create a junction tree which we can turn into an
Almond tree, the general independence structure from the last lecture.
Thus we will consider finding good junction trees, and their complexity.
We will not cover implementation details for the algorithm used to
speed it up in practice.
[Oct. 1st, Lecture 3 in PDF,
in Postscript, 2-up, gzipped]
- Variable elimination demo.
Warning: the demo uses IP addresses to distinguish users, so expect trouble or conflicts if
you are using a proxy for internet access.
Preparatory reading: TBD
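The basic step of the exact algorithm, eliminating a variable by summing it out of a product of factors, can be sketched in a few lines. This is an illustrative toy implementation for binary variables with made-up numbers, not the optimized algorithm used in practice:

```python
import itertools

# A factor is a (variable-name tuple, {assignment-tuple: value}) pair.
def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    fv, ft = f
    gv, gt = g
    vars_ = tuple(dict.fromkeys(fv + gv))
    table = {}
    for assign in itertools.product([0, 1], repeat=len(vars_)):
        env = dict(zip(vars_, assign))
        table[assign] = (ft[tuple(env[v] for v in fv)] *
                         gt[tuple(env[v] for v in gv)])
    return (vars_, table)

def sum_out(f, var):
    """Eliminate `var` from factor f by summation."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for assign, val in ft.items():
        key = assign[:i] + assign[i + 1:]
        table[key] = table.get(key, 0.0) + val
    return (fv[:i] + fv[i + 1:], table)

# Chain A -> B with P(A) and P(B|A); marginal P(B) by eliminating A.
pA = (("A",), {(0,): 0.6, (1,): 0.4})
pBgA = (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1,
                     (1, 0): 0.2, (1, 1): 0.8})
pB = sum_out(multiply(pA, pBgA), "A")
print({k: round(v, 2) for k, v in pB[1].items()})  # {(0,): 0.62, (1,): 0.38}
```

The order in which variables are eliminated determines the size of the intermediate factors, which is exactly why finding good junction trees matters.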
Lecture 4: Approximate Inference
We will consider some basic types of approximate inference
on graphs, including Gibbs sampling, rejection sampling,
and importance sampling.
For Gibbs sampling, we will review
OpenBugs,
maintained at the local Statistics Department.
Gibbs sampling will be compared with local search, a popular
method in optimization and constraint satisfaction.
[Oct. 6th, Lecture 4 in PDF,
in Postscript, 2-up, gzipped]
Preparatory reading:
We will look at examples from
OpenBugs,
given in Examples:Examples Volume I, Pump and Dogs.
The
Ising Model is another example of Gibbs sampling; in fact, it is the inverse of
2-coloring.
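A Gibbs sampler for the Ising model is short enough to sketch here: each site is resampled from its conditional distribution given its horizontal and vertical neighbors. The grid size and coupling strength below are illustrative choices, not values from the lecture:

```python
import math
import random

N, BETA = 8, 0.5  # grid side and coupling strength (illustrative)

def neighbors(i, j):
    """Horizontal and vertical neighbors inside the N x N grid."""
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < N and 0 <= nj < N:
            yield ni, nj

def gibbs_sweep(s, rng):
    """One full sweep: resample each spin from its conditional given the rest."""
    for i in range(N):
        for j in range(N):
            field = sum(s[ni][nj] for ni, nj in neighbors(i, j))
            p_up = 1.0 / (1.0 + math.exp(-2.0 * BETA * field))
            s[i][j] = 1 if rng.random() < p_up else -1
    return s

rng = random.Random(0)
s = [[rng.choice([-1, 1]) for _ in range(N)] for _ in range(N)]
for _ in range(100):
    gibbs_sweep(s, rng)
```

Compare this with local search: local search would always flip a spin to the locally best value, while Gibbs sampling flips it randomly in proportion to the conditional probability.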
Lecture 5: Basic Learning and Graphs
We will consider a simple case of learning:
proportions using the Beta
Coin Experiment applet from the Prob./Stats Object Library.
This example is
covered in detail because it forms the basis of many algorithms.
This introduces the notions of maximum likelihood,
maximum a posteriori, expected value,
sufficient statistics,
exponential family, the trade-off
between complexity and predictive accuracy, and illustrates the
problems with priors.
The notions of maximum likelihood, maximum a posteriori,
sufficient statistics,
exponential family, and problems with priors will be covered in
more detail, making use of graphs as a tool to illustrate the
variations.
Preparatory reading: Notes of
Three Concepts: Probability
Set 4
on proportions.
Here is a presentation version.
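The Beta coin computation itself is a few lines of arithmetic. Here is a sketch of the three point estimates discussed in the lecture; the prior pseudo-counts and the data are illustrative numbers, not from the applet:

```python
# Beta-Bernoulli learning of a proportion: with prior Beta(a, b) and
# h heads, t tails observed, the posterior is Beta(a + h, b + t).
a, b = 2.0, 2.0   # prior pseudo-counts (illustrative)
h, t = 7, 3       # observed heads and tails (the sufficient statistics)

ml = h / (h + t)                              # maximum likelihood
map_est = (a + h - 1) / (a + b + h + t - 2)   # posterior mode (MAP)
mean = (a + h) / (a + b + h + t)              # posterior expected value

print(round(ml, 3), round(map_est, 3), round(mean, 3))  # 0.7 0.667 0.643
```

Note how the prior pulls the MAP and posterior-mean estimates away from the maximum-likelihood value toward 0.5, which is one way the problems with priors show up.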
Lecture 6: Selected Applications and Review
Please bring your questions about the assignments.
We will review any material necessary so that everyone
has the background they need to complete things.
Some applications may also be covered.
Preparatory reading: None for this week.
Literature, further reading
There is no need to purchase any books. We will
supply copies of any necessary material prior to lectures. Material
available online will sometimes be in Adobe's PDF or Postscript format,
so you should be able to view these formats. Moreover, some of the
demonstrations require Java capable browsers. Other than the use of
these long-time standard formats, we make efforts to avoid online
material that requires special browsers (e.g., thus material converted
from Powerpoint is avoided). Note that when we say something is a
reference book, we mean you should not use it to learn the
material, but only to check details or find more advanced material.
Graphs
-
Graph Theory from Math World
including
graph,
directed acyclic graph,
forest,
tree,
degree,
outdegree and indegree,
clique,
induced subgraph, and
connected
component.
Some additional material we will touch on and you probably won't know
is
cut point,
mincut,
minimum spanning tree,
random graph, and phase transition.
Strangely missing are the notion of width and
the parent and ancestor relations.
- The general notion of constraint satisfaction is often presented in a graphical context,
including graph coloring
as a special case.
Note that the gory proofs about average-case analysis of graph coloring originated in
Turner, "Almost All k-Colorable Graphs are Easy
to Color", Journal of Algorithms, 1988.
- Descriptions of venerable graph based models can be browsed on the
Wikipedia
including Unified Modeling Language,
Entity-relationship model,
Data flow diagram,
Semantic network
(e.g., WordNet),
Decision tree,
Influence diagram,
and Circuit diagram.
-
Ising Model
is an undirected graph on a grid where each point is
connected to its horizontal and vertical neighbors. It works like a
probabilistic version of Conway's Game of Life.
-
“Why is Automatic Test Pattern Generation (ATPG) easy?”
(PDF, 627kB). The subject has
nothing to do with this course, but the paper illustrates the notion
of tree-width using a heuristic proxy called "cut-width" (a safe upper
bound on the tree-width we define later). The hardness of problems in
practice is shown as a function of this in a real-world context. Skim
this; you aren't expected to understand the details.
-
Research on phase transitions in NP-complete problems,
the slide set from "Where the Really Hard Problems Are"
by P. Cheeseman, B. Kanefsky, and W. Taylor, in IJCAI 1991
as an informal explanation of properties of optimization in graph problems.
- “A tourist guide through treewidth”
(PDF, 1311kB), by Bodlaender. Accessible, but in much more
detail than we need; it shows the breadth and use of
the computational/optimization side of graphical modelling.
Probabilistic graphs get a mention under the old tag of “expert
systems”.
Probability
Graphical Models
-
“Probabilistic Graphical Models Part One: Graphs and Markov Properties
”
(PDF, 890kB),
and “Probabilistic Graphical Models Part Two: Inference and Learning
”
(PDF, 1394kB), by C.M. Bishop.
Good tutorial overlapping quite a lot of the course.
-
Heckerman's
tutorials,
especially “A Tutorial on Learning with Bayesian
Networks” and “Technical overview for statistician (includes
undirected graphs) - slides from Valencia 2002 tutorial”.
Used as a reference in the learning lectures.
-
Friedman's
AAAI-98 tutorial with Goldszmidt,
covers different material from his NIPS-2001 tutorial,
though the NIPS-2001 tutorial is more recent.
- A Brief Introduction to Graphical Models and Bayesian Networks, Kevin Murphy's
tutorial from 1998.
-
B-Course developed in the department.
Used in the learning lectures.
- Learning in Graphical Models edited by M.I. Jordan, 1999
(available in Computer Science Department library). An advanced
reference book. Individual
chapters also
available on-line.
- M.I. Jordan, “Graphical Models”
(PDF, 1239kB), Statistical Science, 2003.
- Graphical Models by S.L. Lauritzen, 1996 (available in
Computer Science Department library). An advanced reference book.
Lauritzen has been instrumental in a lot of early developments.
- S. M. Aji and R. J. McEliece, “The Generalized Distributive Law”, IEEE Trans. Inform. Theory, vol. 46, no. 2 (March 2000), pp. 325-343. Overview and reference for exact computation.
- Graphical Models in Applied Multivariate Statistics by
Whittaker, 1990 (available in Valtiotieteellisen
tiedekunnan kirjasto). A good reference book from the statistical
side, intended as an advanced text for graduates.
- “A catalogue of graphs”
(PDF, 1318kB),
by Studeny.
Ignore the first 8 pages of theory; the rest catalogues
different types of directed and undirected graphs over 2-5 variables.
In general, Studeny is an excellent source of theoretical
work on independence models, probability and graphs.
- Graphical Models Discussion Papers edited by Joel Chenu, another reference, collection
of research-oriented tutorials.
-
Cecil Huang and Adnan Darwiche.
Inference in belief networks: A procedural guide.
(PDF, 451kB)
International Journal of Approximate Reasoning, 15(3):225-263,
October, 1996.
-
Rina Dechter's Network-Based Reasoning - Bayesian Networks
course at UCI.
References
Only use as references if needed.
-
Introduction to Probability Models by S.M. Ross,
Academic Press, 1989 (4th edition).
A good introduction to probability for computer scientists
with basic distributions, Markov models, queuing and reliability
theory and simulation.
- Stochastic Simulation by B.D. Ripley,
John Wiley & Sons, 1987.
A good introduction to sampling algorithms.
- Multivariate Analysis by K.V. Mardia, J.T. Kent and J.M. Bibby,
Academic Press, 1979.
Introduction to multivariate distributions and their analysis
based on vector and matrix algebra.
- Matrix Computations by G.H. Golub and C.F. Van Loan,
Johns Hopkins University Press, Baltimore, 1989 (2nd edition).
Introduction to computation on matrices,
SVDs, inverses, and lots more.
- Elements of Information Theory by T.M. Cover and J.A. Thomas,
John Wiley & Sons, 1991.
Introduction to everything about information theory.
- Practical Optimization by P.E. Gill and W. Murray and M.H. Wright,
Academic Press, 1981.
Introduction to numerical optimization.
Toolboxes/Software
Things we use and recommend:
Other good software.