Convergence

Will the Q-learning algorithm converge, so that its learned estimate of Q equals the true Q function?

Yes, under certain conditions.

Assume the system is a deterministic MDP.
Assume the immediate reward values are bounded; that is, there exists some positive constant c such that for all states s and actions a, |r(s, a)| < c.
Assume the agent selects actions in such a fashion that it visits every possible state-action pair infinitely often.
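
The three conditions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the notes: the four-state chain, the names delta, reward and q_learning, the discount factor GAMMA = 0.5 (assumed to satisfy 0 <= GAMMA < 1), and the uniform random action choice. A finite run can only approximate "infinitely often", so this shows the estimate approaching the true Q rather than proving convergence.

```python
import random

N_STATES = 4       # hypothetical chain of states 0..3
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.5        # discount factor, assumed to satisfy 0 <= GAMMA < 1


def delta(s, a):
    """Deterministic transition: each (s, a) has exactly one successor."""
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)


def reward(s, a):
    """Bounded immediate reward: |r(s, a)| < 2 for every pair."""
    return 1.0 if delta(s, a) == N_STATES - 1 else 0.0


def q_learning(steps=5000, seed=0):
    """Run the deterministic Q-learning update along one random trajectory."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    s = 0
    for _ in range(steps):
        # Uniform random actions keep every state-action pair reachable,
        # which stands in for the "visited infinitely often" condition.
        a = rng.choice(ACTIONS)
        s_next = delta(s, a)
        # Deterministic update: Q(s, a) <- r + GAMMA * max_a' Q(s', a')
        q[(s, a)] = reward(s, a) + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
        s = s_next
    return q


q_hat = q_learning()
```

For this particular chain the true values can be worked out by hand from Q(s, a) = r(s, a) + GAMMA * max over a' of Q(delta(s, a), a'): for example Q(3, right) = 2, Q(2, right) = 2, Q(3, left) = 1 and Q(0, left) = 0.25. Comparing q_hat against those values is one way to check the run.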

Classification/Types of Operating Systems