Research Seminar on Markov Decision Processes (Fall 2006)

Instructor/Organizer: Huizhen Janey Yu (janey.yu@cs.helsinki.fi)
Time and Location: Tue 16-18, Exactum C221


Description

This weekly seminar is about analytical and computational methods for discrete-time Markov decision processes (MDP), a general class of models for sequential decision making under uncertainty. We will study the fundamentals of MDPs as well as the more recent research works, which include approximation methods for large scale problems, simulation-based reinforcement learning methods, and partially observable Markov decision processes (POMDP). Our focus will be on MDPs with finite space models and the expected cost criteria.

The intention of the seminar in general is to introduce the audience to the stochastic dynamic programming methodology, to the extent that the audience will be able to analyze, apply, or develop such methods for their own area of interest.

A tentative syllabus is as follows:



Seminar Format

The development of the theory of MDPs is fairly linear: new analyses and methods build upon ones preceding them. For this reason, we will have lectures covering some of the fundamentals during the first 1/3 of the seminar, meanwhile the participants will have time to absorb and practice the ideas, and to decide which topic they would like to investigate for their own presentations. Later in the seminar, we will also have introductory lectures whenever we start a different topic.

The requirements for the participants are a term paper and a presentation. For presentation, each participant can either prepare a lecture (possibly jointly with another participant), or prepare a presentation on articles. For term paper, participants are encouraged to apply the methods to a problem of their own choice; the contents of the paper can include modeling issues, analysis, or computational results, and should demonstrate understanding of the subject of the seminar.


Reading Materials

We will have lecture notes in addition to the references.

Reference books:

[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, Athena Scientific, 2005. (Slides on some materials of the book are available online.)

[2] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley series in probability and mathematical statistics, John Wiley & Sons, Inc., 1994.

Articles:

A tentative selection of articles for presentations >>


Lectures and Presentations

     05/09  Lecture 1: Introduction

     12/09  Prof. D. P. Bertsekas: Guest lecture on Neuro-Dynamic Programming

19,26/09  Lecture 2: Finite-horizon problems

                Student presentation on the Sequential Assignment Problem

     03/10  Lecture 3: Infinite-horizon discounted problems (I)

     10/10  Lecture 4: Infinite-horizon discounted problems (II)

     31/10  Lecture 5: Markov chains and classification of MDP based on chain structure

     07/11  Petteri Nurmi: Guest lecture on Game Theory and Reinforcement Learning with Applications to Routing in Ad Hoc Networks

     21/11  Prof. Jyrki Kivinen: Guest lecture on Combining Expert Advice

                Lecture 6: Selected Topics on Average Cost Problems (I)

     28/11  Student presentation on Robust Markov Decision Processes with Uncertain Transition Matrices

                Student presentation on Semi-Markov Decision Processes

     30/11  Student presentation on Inverse Reinforcement Learning

                Student presentation on the SEARN Algorithm

     07/12  Student Presentation on Linear Programming in Discounted Markov Decision Problems

                Student presentation on the Linear Programming Approach to Approximate Dynamic Programming

                Student presentation on Review of Reinforcement Learning Methods


Updated in December, 2006.