
REINFORCEMENT LEARNING

Reinforcement learning addresses the question of how an autonomous agent that senses and acts in its environment can learn to choose optimal actions to achieve its goals. 

INTRODUCTION 

  • Consider building a learning robot. The robot, or agent, has a set of sensors to observe the state of its environment, and a set of actions it can perform to alter this state.
  • Its task is to learn a control strategy, or policy, for choosing actions that achieve its goals.
  • The goals of the agent can be defined by a reward function that assigns a numerical value to each distinct action the agent may take from each distinct state.
  • This reward function may be built into the robot, or known only to an external teacher who provides the reward value for each action performed by the robot.
  • The task of the robot is to perform sequences of actions, observe their consequences, and learn a control policy.
  • The control policy is one that, from any initial state, chooses actions that maximize the reward accumulated over time by the agent. 

Example:

  • A mobile robot may have sensors such as a camera and sonars, and actions such as "move forward" and "turn."
  • The robot may have a goal of docking onto its battery charger whenever its battery level is low.
  • The goal of docking to the battery charger can be captured by assigning a positive reward (e.g., +100) to state-action transitions that immediately result in a connection to the charger, and a reward of zero to every other state-action transition. 
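The docking reward described above can be sketched as a simple function. This is only an illustration: the string-valued states and actions ("at_charger", "dock") are assumptions invented here, not part of the original example.

```python
def reward(state, action):
    """+100 for the state-action transition that docks the robot onto
    its charger, and 0 for every other transition, as in the example.

    States and actions are plain strings purely for illustration.
    """
    if state == "at_charger" and action == "dock":
        return 100
    return 0
```

Note that the reward is sparse: the agent receives no signal at all until it actually docks, which is exactly what makes the learning problem hard.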

Reinforcement Learning Problem

  • Consider an agent interacting with its environment. The agent exists in an environment described by some set of possible states S.
  • The agent can perform any of a set of possible actions A. Each time it performs an action a_t in some state s_t, the agent receives a real-valued reward r_t that indicates the immediate value of this state-action transition. This produces a sequence of states s_i, actions a_i, and immediate rewards r_i, as shown in the figure.
  • The agent's task is to learn a control policy, π : S → A, that maximizes the expected sum of these rewards, with future rewards discounted exponentially by their delay.
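The quantity the agent tries to maximize, the discounted sum r_0 + γr_1 + γ²r_2 + …, can be computed for a finite reward sequence as follows. The discount factor γ = 0.9 below is an arbitrary illustrative choice.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted sum of a reward sequence:
    r_0 + gamma * r_1 + gamma**2 * r_2 + ...

    gamma in [0, 1) weights future rewards exponentially less
    the further they are delayed.
    """
    return sum((gamma ** i) * r for i, r in enumerate(rewards))
```

For the docking example, a reward of +100 received two steps in the future contributes only 0.9² × 100 = 81 to the return, so earlier rewards are worth more than equal but later ones.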


Reinforcement learning problem characteristics

1.      Delayed reward: The task of the agent is to learn a target function 𝜋 that maps from the current state s to the optimal action a = 𝜋(s). In reinforcement learning, training information is not available in the form (s, 𝜋(s)). Instead, the trainer provides only a sequence of immediate reward values as the agent executes its sequence of actions. The agent, therefore, faces the problem of temporal credit assignment: determining which of the actions in its sequence are to be credited with producing the eventual rewards.

2.      Exploration: In reinforcement learning, the agent influences the distribution of training examples by the action sequence it chooses. This raises the question of which experimentation strategy produces most effective learning. The learner faces a trade-off in choosing whether to favor exploration of unknown states and actions, or exploitation of states and actions that it has already learned will yield high reward.
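The exploration/exploitation trade-off described above is commonly handled with an ε-greedy rule: with probability ε the agent explores by picking a random action, otherwise it exploits its current value estimates. This is a minimal sketch; the list of per-action value estimates `q_values` is an assumption about how the agent stores what it has learned.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of value estimates.

    With probability epsilon, explore: choose a uniformly random action.
    Otherwise, exploit: choose the action with the highest estimate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Setting ε = 0 gives pure exploitation, ε = 1 pure exploration; in practice ε is often decayed over time so the agent explores early and exploits once its estimates are reliable.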

3.      Partially observable states: Although in some settings the agent's sensors can perceive the entire state of the environment at each time step, in many practical situations sensors provide only partial information. In such cases, the agent needs to consider its previous observations together with its current sensor data when choosing actions, and the best policy may be one that chooses actions specifically to improve the observability of the environment.
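One simple way to combine previous observations with current sensor data, as the point above suggests, is to treat a fixed-length window of recent observations as the agent's effective state. The class and its parameter k below are illustrative assumptions, not a construction from the original text.

```python
from collections import deque

class HistoryAgent:
    """Sketch: augment the current observation with the previous k
    observations, so states that look identical from a single sensor
    reading can be told apart by their recent history."""

    def __init__(self, k=2):
        # Buffer holds the current observation plus k earlier ones.
        self.history = deque(maxlen=k + 1)

    def observe(self, obs):
        """Record an observation and return the effective state."""
        self.history.append(obs)
        return tuple(self.history)
```

For example, a robot seeing "blank wall" twice in a row is in a different effective state than one seeing it for the first time, even though the raw sensor reading is the same.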

4.      Life-long learning: A robot often needs to learn several related tasks within the same environment, using the same sensors. For example, a mobile robot may need to learn how to dock on its battery charger, how to navigate through narrow corridors, and how to pick up output from laser printers. This setting raises the possibility of using previously obtained experience or knowledge to reduce sample complexity when learning new tasks.
