7 For a comprehensive treatment of the different algorithms, see Sutton and Barto (2018). Most algorithms for learning either action values or state values rely on very similar recursive computations, derived from the Bellman equation described in footnote 5.
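As an illustration of how closely the state-value and action-value recursions mirror each other, here is a minimal sketch of iterative policy evaluation under a fixed policy. It uses the textbook form of the Bellman expectation backup, which may differ in notation from footnote 5. The two-state MDP, the uniform policy, the discount factor, and the names `evaluate_v` and `evaluate_q` are all hypothetical, chosen only for this example. Both functions repeatedly apply one Bellman backup until the values stop changing. The only difference is whether the policy average over actions is taken at the current state (V) or at the successor state (Q).

```python
# Hypothetical two-state, two-action MDP used only for illustration.
# P[s][a] is a list of (probability, next_state); R[s][a] is the expected reward.
P = {
    0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
    1: {0: [(1.0, 1)], 1: [(1.0, 0)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.0}}
PI = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}  # assumed fixed uniform policy
GAMMA = 0.9  # assumed discount factor


def evaluate_v(tol=1e-10):
    """Iterative policy evaluation of the state-value function V(s)."""
    v = {s: 0.0 for s in P}
    while True:
        # V(s) = sum_a pi(a|s) [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
        new_v = {
            s: sum(
                PI[s][a] * (R[s][a] + GAMMA * sum(p * v[s2] for p, s2 in P[s][a]))
                for a in P[s]
            )
            for s in P
        }
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


def evaluate_q(tol=1e-10):
    """The same recursion, applied to the action-value function Q(s, a)."""
    q = {(s, a): 0.0 for s in P for a in P[s]}
    while True:
        # Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) sum_a' pi(a'|s') Q(s',a')
        new_q = {
            (s, a): R[s][a]
            + GAMMA
            * sum(
                p * sum(PI[s2][a2] * q[(s2, a2)] for a2 in P[s2])
                for p, s2 in P[s][a]
            )
            for (s, a) in q
        }
        if max(abs(new_q[k] - q[k]) for k in q) < tol:
            return new_q
        q = new_q


V = evaluate_v()
Q = evaluate_q()
# Consistency check: V(s) equals the policy-weighted average of Q(s, a).
for s in P:
    print(s, round(V[s], 6), round(sum(PI[s][a] * Q[(s, a)] for a in P[s]), 6))
```

Each state's printed V value should agree with the policy-weighted average of its Q values. This agreement is the sense in which both kinds of algorithm share one underlying recursive computation.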