Human Aided Reinforcement Learning in Complex Environments

From the abstract: "Reinforcement learning algorithms enable computer programs (agents) to learn to solve tasks through a trial-and-error process. As an agent takes actions in an environment, it receives positive and negative signals that shape its future behavior. To assist the process of learning, and to learn the task faster and more accurately, a human expert can be added to the system to guide an agent in solving the task. This project seeks to expand on current systems that combine a human expert with a reinforcement learning agent. Current systems use human input to modify the signal the agent receives from the environment, which works particularly well for reactive tasks. In more complex tasks, these systems do not work as intended. The manipulation of the environment's signal structure results in undesired and unexpected results for the agent's behavior following human training. Our systems attempt to incorporate humans in ways that do not modify the environment, but rather modify the decisions the agent makes at critical times in training. One of our solutions (Time Warp) allows the human expert to revert back several seconds in the training of the agent to provide an alternate sequence of actions for the agent to take. Another solution (Curriculum Development) allows the human expert to set up critical training points for the agent to learn. The agent then learns how to solve these necessary subskills prior to training in the entire world. Our systems seek to solve the planning requirement by employing a human expert during critical times of learning, as the expert sees fit. Our approaches to the planning requirement will allow the human expert-agent model to be expanded to more complex environments than the previous human systems developed. 
We hypothesize our project will increase the rate at which a reinforcement learning agent learns a solution to a specific task, and increase the quality of solutions to problems that require planning into the future, while successfully employing the use of a human teacher that guides the agents."

Report Number:
Trident Scholar Report No. 465
Public Domain
Retrieved From:
Defense Technical Information Center (DTIC): http://www.dtic.mil/dtic/