Improving Weak Reinforcement Learning with Multiplicative Approximation Guarantees

A Boosting Approach to Reinforcement Learning

We study efficient algorithms for reinforcement learning in Markov decision processes whose complexity is independent of the number of states. We consider the methodology of boosting, borrowed from supervised learning, for converting weak learners into an accurate policy. We prove sample-complexity and running-time bounds for our method that are polynomial in the natural parameters of the problem: approximation guarantee, discount factor, distribution mismatch, and number of actions.
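As a rough illustration of the boosting template described above (not the paper's actual algorithm), the sketch below aggregates the stochastic policies produced by a hypothetical weak learner into a single averaged policy. The `weak_learner` function, the uniform averaging weights, and the tabular state/action representation are all illustrative assumptions; the paper's method may weight rounds differently and need not be tabular.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_STATES, NUM_ACTIONS, NUM_ROUNDS = 5, 3, 10


def weak_learner(round_idx):
    # Hypothetical weak learner: returns a stochastic policy as a
    # (states x actions) matrix of action probabilities. A real weak
    # learner would be fit against feedback from the current aggregate.
    logits = rng.normal(size=(NUM_STATES, NUM_ACTIONS))
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def boost(num_rounds):
    # Combine weak policies by uniform averaging (an assumed, simple
    # aggregation scheme); each row of the result is again a valid
    # action distribution, so the aggregate is itself a policy.
    aggregate = np.zeros((NUM_STATES, NUM_ACTIONS))
    for t in range(num_rounds):
        aggregate += weak_learner(t)
    return aggregate / num_rounds


policy = boost(NUM_ROUNDS)
assert np.allclose(policy.sum(axis=1), 1.0)  # each state gets a valid distribution
```

The point of the sketch is only that convex combinations of stochastic policies remain stochastic policies, which is what makes aggregation of weak learners well defined in this setting.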