18 This can be done using a special form of RPE called the temporal-difference (TD) error. See Sutton and Barto (2018, p. 268), who call it the Bellman error, or Bhatnagar et al. (2009) for related developments.