Research Seminar on Markov Decision Processes (Fall 2006)


A Selection of Materials for Presentations

Semi-Markov Decision Processes:

M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, Chapter 11. John Wiley & Sons, Inc. 1994.

Linear Programming for Constrained Problems:

M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, Sections 6.9, 8.8, 9.3. John Wiley & Sons, Inc. 1994.

Approximate Linear Programming and Case Studies:

D. P. de Farias and B. Van Roy, "The Linear Programming Approach to Approximate Dynamic Programming," Operations Research, Vol. 51, No. 6, pp. 850-865, 2003.

Carlos Guestrin, Daphne Koller, Ronald Parr and Shobha Venkataraman. "Efficient Solution Algorithms for Factored MDPs," Journal of Artificial Intelligence Research (JAIR), Volume 19, pp. 399-468, 2003.

V. F. Farias and B. Van Roy. "Tetris: A Study of Randomized Constraint Sampling," in Probabilistic and Randomized Methods for Design Under Uncertainty, G. Calafiore and F. Dabbene, eds., Springer-Verlag, 2006.

Dynamic Games and Robust Control:

A. Nilim and L. El Ghaoui. "Robust Solutions to Markov Decision Problems with Uncertain Transition Matrices," EECS ERL Memo MO 04/26, January 2004.

D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Chapter 7. Athena Scientific, 1996.

Simulation-Based Methods and Reinforcement Learning:

J. N. Tsitsiklis and B. Van Roy, "An Analysis of Temporal-Difference Learning with Function Approximation," IEEE Transactions on Automatic Control, Vol. 42, No. 5, May 1997, pp. 674-690. (and other papers on the same site)

Andrew Y. Ng and Stuart Russell. "Algorithms for Inverse Reinforcement Learning," in Proceedings of the Seventeenth International Conference on Machine Learning, 2000.

Partially Observable Markov Decision Processes:

W. S. Lovejoy. "Computationally Feasible Bounds for Partially Observed Markov Decision Processes," Operations Research, Vol. 39, pp. 162-175, 1991.

D. Aberdeen and J. Baxter. "Internal-state Policy-gradient Algorithms for Infinite-horizon POMDPs," Technical report, RSISE, Australian National University, 2002.