Leonid Peshkin
(pesha AT ai.mit.edu)
MIT Artificial Intelligence Lab.
Cambridge, MA 02139
Virginia Savova
(savova AT jhu.edu)
Johns Hopkins University
Baltimore, MD 21218
Abstract:
Reinforcement learning means learning a policy--a mapping of observations
into actions--based on feedback from the environment. The learning can be
viewed as browsing a set of policies while evaluating them by trial through
interaction with the environment. We present an application of gradient ascent
algorithm for reinforcement learning to a complex domain of packet routing in
network communication and compare the performance of this algorithm to other
routing methods on a benchmark problem.