Will the Q-learning algorithm converge toward an estimate Q̂ equal to the true Q function?
Yes, under certain conditions.
- Assume the system is a deterministic MDP.
- Assume the immediate reward values are bounded; that is, there exists some positive constant c such that for all states s and actions a, |r(s, a)| < c
- Assume the agent selects actions in such a fashion that it visits every possible state-action pair infinitely often
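
The three conditions above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the text: a hypothetical four-state chain with an absorbing goal, a reward of 100 for entering the goal, a discount factor of 0.9, and the usual deterministic update Q(s, a) <- r + gamma * max Q(s', a'). Repeated sweeps over all state-action pairs stand in for "visiting every pair infinitely often".

```python
# Hypothetical toy problem (not from the text): a 4-state chain, goal at state 3.
GAMMA = 0.9           # assumed discount factor
N_STATES = 4          # states 0..3; state 3 is an absorbing goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(s, a):
    """Deterministic transition with a bounded reward (0 or 100)."""
    if s == N_STATES - 1:
        return s, 0.0                      # goal is absorbing, no further reward
    s_next = max(s - 1, 0) if a == 0 else s + 1
    reward = 100.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward


def q_learning(n_sweeps=50):
    """Apply the deterministic update to every state-action pair, repeatedly."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(n_sweeps):
        for s in range(N_STATES):
            for a in ACTIONS:
                s_next, r = step(s, a)
                q[s][a] = r + GAMMA * max(q[s_next])
    return q


q_hat = q_learning()
for s, row in enumerate(q_hat):
    print(s, [round(v, 1) for v in row])
```

In this toy setting the estimates stop changing after a few sweeps, and the values for moving right decay by the discount factor with distance from the goal. If some state-action pair were never updated, its entry would stay at its initial value, which is why the third condition matters.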