|
Randomized Algorithms for Solving Markov Decision Processes
by
Jiaqiao Hu
State University of New York at Stony Brook, Stony Brook, NY, 11794
Coauthors: Hyeong Soo Chang, Michael C. Fu, Steven I. Marcus
We discuss two randomized algorithms for solving discounted reward Markov Decision Processes (MDPs). One algorithm targets MDPs with large state spaces but small action spaces, and uses the multi-armed bandit model as an efficient tool to allocate computational resources. The value function estimate produced by the algorithm is asymptotically unbiased and the worst case bias converges to zero at rate O(lnN/N), where N is the total simulation samples. The other algorithm is based on evolving a population of policies, in contrast to the usual policy iteration approach, which updates a single policy. The computational complexity of the algorithm is polynomial in the size of the state space, and is insensitive to the size of the action space; thus the algorithm is promising for MDPs with large action spaces.
Date received: March 12, 2007
Copyright © 2007 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # cauc-41.