Bandit-based planning in continuous-action Markov decision processes
In reinforcement learning, algorithms are traditionally concerned with finding a policy (an optimal mapping from states to actions) in domains with finite state and action spaces. Extending this approach to domains with continuous state and action spaces, however, is difficult, because methods such as coarse discretization or function approximation can provably fail to converge to optimal values in many cases. In this talk, I will discuss a planning algorithm that operates natively in continuous action spaces and is agnostic to state during planning. As such, it does not suffer from the problems that arise when trying to represent a global policy. Empirical results demonstrate that the algorithm outperforms current state-of-the-art methods for planning in continuous state and action domains.
Department/affiliation: Rutgers Computer Science, Rutgers Perceptual Science