Reading List


Hierarchical Structure and Other Decompositions for Addressing Very Large State Spaces in MDPs and POMDPs


Although Markov decision processes (MDPs) and the corresponding partially-observable models (POMDPs) are clearly powerful and general methods for modeling learning and planning problems in stochastic domains, our current understanding limits practical application of these techniques to relatively small state spaces (e.g., those which are explicitly enumerable). A great deal of recent and ongoing effort has focused on developing hierarchical decompositions and approximations which allow us to represent "macros" of actions or "meta-states", typically at the cost of optimality or of the Markov property. The autonomous mobile robotics group is currently surveying such approaches. If you have suggestions for additions to this list, please contact Terran.
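To make the scaling concern concrete, here is a minimal sketch (in Python with NumPy; it is not taken from any of the works listed below, and the toy problem at the end is invented) of plain tabular value iteration over an explicitly enumerated state space. Each sweep costs on the order of |S|^2 * |A| operations and the transition tables themselves are of that size, which is precisely what makes explicit enumeration impractical for large problems and motivates the hierarchical, factored, and sampling-based methods surveyed here.

    import numpy as np

    # Illustrative only: tabular value iteration over an explicitly enumerated
    # state space.  Each sweep touches every (state, action, next-state) triple,
    # so the per-sweep cost and the table sizes grow as |S|^2 * |A|.
    def value_iteration(P, R, gamma=0.95, tol=1e-6):
        """P: |A| x |S| x |S| transition tensor; R: |S| x |A| reward matrix."""
        n_actions, n_states, _ = P.shape
        V = np.zeros(n_states)
        while True:
            # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
            Q = R + gamma * np.einsum("asn,n->sa", P, V)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)  # optimal values, greedy policy
            V = V_new

    # Toy usage on a random 4-state, 2-action MDP (sizes are arbitrary).
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(4), size=(2, 4))  # each row is a distribution over next states
    R = rng.standard_normal((4, 2))
    V, policy = value_iteration(P, R)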

Note that this list has now expanded somewhat beyond strictly hierarchy and decomposition work to include background and related works as well, such as Puterman and Kemeny & Snell.


[Bertsekas and Tsitsiklis, 1996]
Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Optimization and neural computation series. Athena Scientific, Belmont, MA.
[Boutilier et al., 1999]
Boutilier, C., Dean, T., and Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11:1--94.
[Boutilier and Dearden, 1994]
Boutilier, C. and Dearden, R. (1994). Using abstractions for decision-theoretic planning with time constraints. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 1016--1022. AAAI Press.
[Boutilier et al., 1995]
Boutilier, C., Dearden, R., and Goldszmidt, M. (1995). Exploiting structure in policy construction. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95).
[Dayan and Hinton, 1993]
Dayan, P. and Hinton, G. E. (1993). Feudal reinforcement learning. In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann: San Mateo, CA.
[Dean et al., 1995]
Dean, T., Kaelbling, L. P., Kirman, J., and Nicholson, A. (1995). Planning under time constraints in stochastic domains. Artificial Intelligence, 76.
[Dietterich, 2000]
Dietterich, T. G. (2000). An overview of MAXQ hierarchical reinforcement learning. In Choueiry, B. Y. and Walsh, T., editors, Proceedings of the Symposium on Abstraction, Reformulation and Approximation SARA 2000, Lecture Notes in Artificial Intelligence. Springer Verlag, New York.
[Dietterich and Flann, 1995]
Dietterich, T. G. and Flann, N. S. (1995). Explanation-based learning and reinforcement learning: A unified view. In Proceedings of the Twelfth International Conference on Machine Learning, pages 176--184. Morgan Kaufmann.
[Hauskrecht et al., 1998]
Hauskrecht, M., Meuleau, N., Boutilier, C., Kaelbling, L. P., and Dean, T. (1998). Hierarchical solution of Markov decision processes using macro-actions. In Cooper, G. F. and Moral, S., editors, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98). Morgan Kaufmann.
[Kaelbling, 1993]
Kaelbling, L. P. (1993). Hierarchical learning in stochastic domains: Preliminary results. In Proceedings of the Tenth International Conference on Machine Learning, pages 167--173.
[Kaelbling et al., 1998]
Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101.
[Kaelbling et al., 1996]
Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237--277.
[Kearns et al., 1999a]
Kearns, M., Mansour, Y., and Ng, A. Y. (1999a). Approximate planning in large POMDPs via reusable trajectories. In Advances in Neural Information Processing Systems.
[Kearns et al., 1999b]
Kearns, M., Mansour, Y., and Ng, A. Y. (1999b). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In Dean, T., editor, Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), volume 2, pages 1324--1331, Stockholm, Sweden. Morgan Kaufmann.
[Kemeny and Snell, 1976]
Kemeny, J. G. and Snell, J. L. (1976). Finite Markov Chains. Undergraduate Texts in Mathematics. Springer-Verlag, New York.
[Koller and Parr, 2000]
Koller, D. and Parr, R. (2000). Policy iteration for factored MDPs. In Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI 2000). Morgan Kaufmann.
[Koller and Pfeffer, 1997]
Koller, D. and Pfeffer, A. (1997). Object-oriented Bayesian networks. In Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence, pages 302--313, Providence, Rhode Island.
[Littman et al., 1995]
Littman, M. L., Dean, T. L., and Kaelbling, L. P. (1995). On the complexity of solving Markov decision problems. In Proceedings of the Eleventh International Conference on Uncertainty in Artificial Intelligence (UAI-95).
[McCallum, 1997]
McCallum, A. K. (1997). Efficient exploration in reinforcement learning with hidden state. In AAAI Fall Symposium on "Model-directed Autonomous Systems". AAAI Press.
[McCallum, 1994]
McCallum, R. A. (1994). Reduced training time for reinforcement learning with hidden state. In Proceedings of the Eleventh International Machine Learning Workshop (Robot Learning).
[McGovern et al., 1997]
McGovern, A., Sutton, R. S., and Fagg, A. H. (1997). Roles of macro-actions in accelerating reinforcement learning. In Proceedings of the 1997 Grace Hopper Celebration of Women in Computing, pages 13--18.
[Meuleau et al., 1998]
Meuleau, N., Hauskrecht, M., Kim, K.-E., Peshkin, L., Kaelbling, L. P., Dean, T., and Boutilier, C. (1998). Solving very large weakly coupled Markov decision processes. In Proceedings of the Fifteenth National Conference on Artificial Intelligence. AAAI Press.
[Moore et al., 1999]
Moore, A. W., Baird, L. C., and Kaelbling, L. (1999). Multi-value-functions: Efficient automatic action hierarchies for multiple goal MDPs. In Dean, T., editor, Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), Stockholm, Sweden. Morgan Kaufmann.
[Parr, 1998]
Parr, R. (1998). Flexible decomposition algorithms for weakly coupled Markov decision problems. In Cooper, G. F. and Moral, S., editors, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98). Morgan Kaufmann.
[Parr and Russell, 1998]
Parr, R. and Russell, S. (1998). Reinforcement learning with hierarchies of machines. In Jordan, M. I., Kearns, M. J., and Solla, S. A., editors, Advances in Neural Information Processing Systems, volume 10. The MIT Press.
[Precup, 2000]
Precup, D. (2000). Temporal Abstraction in Reinforcement Learning. PhD thesis, University of Massachusetts, Amherst, Department of Computer Science.
[Puterman, 1994]
Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York.
[Singh and Cohn, 1998]
Singh, S. and Cohn, D. (1998). How to dynamically merge Markov decision processes. In Jordan, M., Kearns, M., and Solla, S., editors, Advances in Neural Information Processing Systems 10, pages 1057--1063. MIT Press, Cambridge, MA.
[Sutton et al., 1999]
Sutton, R. S., Precup, D., and Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181--211.
[Tash and Russell, 1994]
Tash, J. and Russell, S. (1994). Control strategies for a stochastic planner. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA. AAAI Press.
[Tash, 1996]
Tash, J. K. (1996). Decision Theory Made Tractable: The Value of Deliberation, with Applications to Markov Decision Process Planning. PhD thesis, University of California, Berkeley.

People


Here are some of the people whose work is listed above. This is far from a complete list of those working in this area; these are just the ones I've found recently. If you know a name (and URL) that belongs here, please contact me.


(TDL)