Convergence

Will the Q-learning algorithm converge toward a learned estimate Q̂ that equals the true Q function?

Yes, under certain conditions.

  1. Assume the system is a deterministic MDP.
  2. Assume the immediate reward values are bounded; that is, there exists some positive constant c such that for all states s and actions a, |r(s, a)| < c.
  3. Assume the agent selects actions in such a fashion that it visits every possible state-action pair infinitely often.
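
To make the three conditions concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and is not taken from the text above: the four-state chain, the discount factor GAMMA = 0.9, the helper names step and q_learning, and the deterministic update rule Q̂(s, a) ← r + γ · max over a' of Q̂(s', a'). Transitions are deterministic (condition 1), rewards are either 0 or 1 and therefore bounded (condition 2), and state-action pairs are sampled uniformly at random, so each pair keeps being revisited (condition 3).

```python
import random

GAMMA = 0.9          # assumed discount factor, 0 <= GAMMA < 1
N_STATES = 4         # hypothetical chain of states 0..3
ACTIONS = (0, 1)     # 0 = move left, 1 = move right
GOAL = 3             # absorbing goal state


def step(s, a):
    """Deterministic transition and reward (condition 1).

    Rewards are 0 or 1, so |r(s, a)| < c holds for c = 2 (condition 2).
    """
    if s == GOAL:
        return s, 0.0                      # absorbing: stay put, no reward
    s_next = max(s - 1, 0) if a == 0 else s + 1
    r = 1.0 if s_next == GOAL else 0.0     # reward only on entering the goal
    return s_next, r


def q_learning(n_updates=5000, seed=0):
    """Run the assumed deterministic Q-learning update on random pairs."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(n_updates):
        # Uniform sampling keeps revisiting every (s, a) pair (condition 3).
        s = rng.randrange(N_STATES)
        a = rng.choice(ACTIONS)
        s_next, r = step(s, a)
        # Deterministic update: overwrite with reward plus discounted best value.
        q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
    return q


if __name__ == "__main__":
    for (s, a), value in sorted(q_learning().items()):
        print(f"Q({s}, {a}) = {value:.3f}")
```

For this toy chain the true Q values can be worked out by hand: Q(2, right) = 1, Q(1, right) = 0.9, Q(0, right) = Q(2, left) = 0.81, Q(1, left) = Q(0, left) = 0.729, and both actions in the goal state have value 0. The learned table should match these once every pair has been updated enough times in a suitable order, which random sampling makes overwhelmingly likely within a few thousand updates.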
