Reading List
Hierarchical Structure and Other Decompositions for Addressing
Very Large State Spaces in MDPs and POMDPs
Although it is clear that Markov decision processes (MDPs) and the
corresponding partially observable models (POMDPs) are powerful and
general methods for modeling learning and planning problems in
stochastic domains, our current understanding limits practical
application of these techniques to relatively small state spaces
(e.g., those which are explicitly enumerable). A great deal of recent
and ongoing effort has focused on developing hierarchical
decompositions and approximations that allow us to represent "macros"
of actions or "meta-states", typically at the cost of optimality or
the Markov property. The autonomous mobile robotics group is currently
surveying such approaches. If you have suggestions for additions to
this list, please contact Terran.
Note that this list has now expanded somewhat beyond strictly hierarchy
and decomposition work to include background and related works as
well, such as Puterman and Kemeny & Snell.
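As a rough illustration of the "macros" these papers study (in the spirit of the options framework of Sutton, Precup, and Singh, though greatly simplified), the sketch below runs value iteration over a mixed action set in a toy corridor MDP: two primitive moves plus one open-loop macro. The MDP, the macro, and all names here are invented for illustration and come from none of the listed papers.

```python
# Toy corridor MDP: states 0..N-1, absorbing goal state at the right end.
# Invented purely for illustration; not taken from any paper listed below.
N, GAMMA, GOAL = 7, 0.9, 6

def step(s, a):
    """One primitive transition: move by a (clipped to the corridor);
    reward 1.0 on first reaching the goal, 0 otherwise."""
    if s == GOAL:                        # goal is absorbing, no further reward
        return s, 0.0
    s2 = min(max(s + a, 0), N - 1)
    return s2, 1.0 if s2 == GOAL else 0.0

def macro_model(s, seq):
    """Multi-step model of an open-loop macro: returns the end state, the
    discounted reward accumulated along the way, and the residual discount."""
    total, disc = 0.0, 1.0
    for a in seq:
        s, r = step(s, a)
        total += disc * r
        disc *= GAMMA
    return s, total, disc

# Action set: LEFT, RIGHT, and a three-step RIGHT macro.
# Primitive actions are just macros of length one.
actions = [[-1], [+1], [+1, +1, +1]]

# Value iteration treating each macro as a single temporally extended action.
V = [0.0] * N
for _ in range(100):
    V = [
        0.0 if s == GOAL else
        max(r + d * V[s2]
            for s2, r, d in (macro_model(s, seq) for seq in actions))
        for s in range(N)
    ]
```

In this deterministic toy the macro's multi-step model is exact, so planning over macros loses nothing; in general, restricting attention to a fixed macro set shrinks the policy class, which is exactly the optimality sacrifice mentioned above.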
- [Bertsekas and Tsitsiklis, 1996]
-
Bertsekas, D. P. and Tsitsiklis, J. N. (1996).
Neuro-Dynamic Programming.
Optimization and neural computation series. Athena Scientific, Belmont, MA.
- [Boutilier et al., 1999]
-
Boutilier, C., Dean, T., and Hanks, S. (1999).
Decision-theoretic planning: Structural assumptions and computational
leverage.
Journal of Artificial Intelligence Research, 11:1--94.
- [Boutilier and Dearden, 1994]
-
Boutilier, C. and Dearden, R. (1994).
Using abstractions for decision-theoretic planning with time constraints.
In Proceedings of the Twelfth National Conference on Artificial
Intelligence, pages 1016--1022. AAAI Press.
- [Boutilier et al., 1995]
-
Boutilier, C., Dearden, R., and Goldszmidt, M. (1995).
Exploiting structure in policy construction.
In Proceedings of the Fourteenth International Joint Conference on
Artificial Intelligence (IJCAI-95).
- [Dayan and Hinton, 1993]
-
Dayan, P. and Hinton, G. E. (1993).
Feudal reinforcement learning.
In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in
Neural Information Processing Systems 5. Morgan Kaufmann: San Mateo, CA.
- [Dean et al., 1995]
-
Dean, T., Kaelbling, L. P., Kirman, J., and Nicholson, A. (1995).
Planning under time constraints in stochastic domains.
Artificial Intelligence, 76.
- [Dietterich, 2000]
-
Dietterich, T. G. (2000).
An overview of MAXQ hierarchical reinforcement learning.
In Choueiry, B. Y. and Walsh, T., editors, Proceedings of the Symposium on
Abstraction, Reformulation and Approximation SARA 2000, Lecture Notes in
Artificial Intelligence. Springer Verlag, New York.
- [Dietterich and Flann, 1995]
-
Dietterich, T. G. and Flann, N. S. (1995).
Explanation-based learning and reinforcement learning: A unified view.
In Proceedings of the Twelfth International Conference on Machine
Learning, pages 176--184. Morgan Kaufmann.
- [Hauskrecht et al., 1998]
-
Hauskrecht, M., Meuleau, N., Boutilier, C., Kaelbling, L. P., and Dean, T.
(1998).
Hierarchical solution of Markov decision processes using macro-actions.
In Cooper, G. F. and Moral, S., editors, Proceedings of the Fourteenth
Conference on Uncertainty in Artificial Intelligence (UAI-98). Morgan
Kaufmann.
- [Kaelbling, 1993]
-
Kaelbling, L. P. (1993).
Hierarchical learning in stochastic domains: Preliminary results.
In Proceedings of the Tenth International Conference on Machine
Learning, pages 167--173.
- [Kaelbling et al., 1998]
-
Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. (1998).
Planning and acting in partially observable stochastic domains.
Artificial Intelligence, 101.
- [Kaelbling et al., 1996]
-
Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996).
Reinforcement learning: A survey.
Journal of Artificial Intelligence Research, 4:237--277.
- [Kearns et al., 1999a]
-
Kearns, M., Mansour, Y., and Ng, A. Y. (1999a).
Approximate planning in large POMDPs via reusable trajectories.
In Advances in Neural Information Processing Systems.
- [Kearns et al., 1999b]
-
Kearns, M., Mansour, Y., and Ng, A. Y. (1999b).
A sparse sampling algorithm for near-optimal planning in large Markov
decision processes.
In Dean, T., editor, Proceedings of the Sixteenth International Joint
Conference on Artificial Intelligence (IJCAI-99), volume 2, pages
1324--1331, Stockholm, Sweden. Morgan Kaufmann.
- [Kemeny and Snell, 1976]
-
Kemeny, J. G. and Snell, J. L. (1976).
Finite Markov Chains.
Undergraduate Texts in Mathematics. Springer-Verlag, New York.
- [Koller and Parr, 2000]
-
Koller, D. and Parr, R. (2000).
Policy iteration for factored MDPs.
In Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth
Conference (UAI 2000). Morgan Kaufmann.
- [Koller and Pfeffer, 1997]
-
Koller, D. and Pfeffer, A. (1997).
Object-oriented Bayesian networks.
In Proceedings of the Thirteenth Annual Conference on Uncertainty in
Artificial Intelligence, pages 302--313, Providence, Rhode Island.
- [Littman et al., 1995]
-
Littman, M. L., Dean, T. L., and Kaelbling, L. P. (1995).
On the complexity of solving Markov decision problems.
In Proceedings of the Eleventh International Conference on Uncertainty in
Artificial Intelligence (UAI-95).
- [McCallum, 1997]
-
McCallum, A. K. (1997).
Efficient exploration in reinforcement learning with hidden state.
In AAAI Fall Symposium on "Model-directed Autonomous Systems". AAAI
Press.
- [McCallum, 1994]
-
McCallum, R. A. (1994).
Reduced training time for reinforcement learning with hidden state.
In The Proceedings of the Eleventh International Machine Learning Workshop
(Robot Learning).
- [McGovern et al., 1997]
-
McGovern, A., Sutton, R. S., and Fagg, A. H. (1997).
Roles of macro-actions in accelerating reinforcement learning.
In Proceedings of the 1997 Grace Hopper Celebration of Women in
Computing, pages 13--18.
- [Meuleau et al., 1998]
-
Meuleau, N., Hauskrecht, M., Kim, K.-E., Peshkin, L., Kaelbling, L. P., Dean,
T., and Boutilier, C. (1998).
Solving very large weakly coupled Markov decision processes.
In Proceedings of the Fifteenth National Conference on Artificial
Intelligence. AAAI Press.
- [Moore et al., 1999]
-
Moore, A. W., Baird, L. C., and Kaelbling, L. (1999).
Multi-value-functions: Efficient automatic action hierarchies
for multiple goal MDPs.
In Dean, T., editor, Proceedings of the Sixteenth International Joint
Conference on Artificial Intelligence (IJCAI-99), Stockholm, Sweden.
Morgan Kaufmann.
- [Parr, 1998]
-
Parr, R. (1998).
Flexible decomposition algorithms for weakly coupled Markov decision problems.
In Cooper, G. F. and Moral, S., editors, Proceedings of the Fourteenth
Conference on Uncertainty in Artificial Intelligence (UAI-98). Morgan
Kaufmann.
- [Parr and Russell, 1998]
-
Parr, R. and Russell, S. (1998).
Reinforcement learning with hierarchies of machines.
In Jordan, M. I., Kearns, M. J., and Solla, S. A., editors, Advances in
Neural Information Processing Systems, volume 10. The MIT Press.
- [Precup, 2000]
-
Precup, D. (2000).
Temporal Abstraction in Reinforcement Learning.
PhD thesis, University of Massachusetts, Amherst, Department of Computer
Science.
- [Puterman, 1994]
-
Puterman, M. L. (1994).
Markov Decision Processes: Discrete Stochastic Dynamic Programming.
John Wiley & Sons, New York.
- [Singh and Cohn, 1998]
-
Singh, S. and Cohn, D. (1998).
How to dynamically merge Markov decision processes.
In Jordan, M., Kearns, M., and Solla, S., editors, Advances in Neural
Information Processing Systems 10, pages 1057--1063. MIT Press,
Cambridge, MA.
- [Sutton et al., 1999]
-
Sutton, R. S., Precup, D., and Singh, S. (1999).
Between MDPs and semi-MDPs: A framework for temporal abstraction in
reinforcement learning.
Artificial Intelligence, 112:181--211.
- [Tash and Russell, 1994]
-
Tash, J. and Russell, S. (1994).
Control strategies for a stochastic planner.
In Proceedings of the Twelfth National Conference on Artificial
Intelligence, Seattle, WA. AAAI Press.
- [Tash, 1996]
-
Tash, J. K. (1996).
Decision Theory Made Tractable: The Value of Deliberation, with
Applications to Markov Decision Process Planning.
PhD thesis, University of California, Berkeley.
People
Here are some of the people whose work is listed above. This is far
from a complete list of those working in this area; these are just
the ones I've found recently. If you know a name (and URL) that
belongs here, please contact me.
(TDL)