A lot of replay is probably related to rewards, and thus to planning and RL, but some part of wandering thoughts and replay is clearly independent of any rewards. We saw earlier that people are able to perform unsupervised or supervised learning from a single presentation of a data point (🡭). If you hear a nice melody, it may be replayed in your mind repetitively, even quite obsessively. Such replay is best understood as performing some kind of unsupervised learning, which does not need any reward or reinforcement signal. For example, it can be Hebbian learning or some kind of feature extraction, which learns the melody and its characteristics particularly well through repetition. The crucial similarity between reinforcement learning, Hebbian learning, and most kinds of machine learning is their iterative nature, in particular the need for many iterations. Some of the replayed data may not be real data at all but simulated data, more akin to planning; such simulation can in fact be used to perform learning in a Bayesian framework (Gutmann et al., 2018).

An alternative theory actually links resting-state activity to the priors used in Bayesian perception (Berkes et al., 2011; Aitchison and Lengyel, 2016; Hoyer and Hyvärinen, 2003). The idea is that the activities of neurons in the resting state, at least in the sensory cortices, follow the prior distribution of the features they encode. While this theory is not framed in terms of replay, we could interpret it as saying that resting-state activity is in some sense “replaying” typical sensory inputs. These two theories may thus not be incompatible: the replay or wandering-thoughts theory focuses on reward processing, while the Bayesian theory focuses on basic sensory processing.
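To illustrate why the iterative nature matters, the following is a minimal sketch, in Python with NumPy, of how replaying a single stimulus many times could drive reward-free Hebbian feature learning. It uses Oja's rule (a normalized Hebbian update) and an arbitrary vector encoding of the "melody"; both are illustrative assumptions rather than anything specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

# A single "heard" stimulus, e.g. a melody encoded as a feature vector.
# (The vector encoding is purely illustrative, not taken from the text.)
stimulus = rng.normal(size=20)
stimulus /= np.linalg.norm(stimulus)

# Random, initially uninformative synaptic weights of one feature-detecting neuron.
w = 0.01 * rng.normal(size=20)

eta = 0.1                                       # learning rate
for _ in range(500):                            # mental "replay": the same input, many times
    x = stimulus + 0.05 * rng.normal(size=20)   # each replay is slightly noisy
    y = w @ x                                   # neuron's response to the replayed input
    w += eta * y * (x - y * w)                  # Oja's rule: Hebbian term plus normalization

# After many replays the weights align (up to sign) with the stimulus direction,
# i.e. the neuron has extracted the melody's characteristic feature; no reward was used.
alignment = abs(w @ stimulus) / (np.linalg.norm(w) * np.linalg.norm(stimulus))
print(f"alignment with stimulus: {alignment:.3f}")
```

No single update would extract the feature; only the many repetitions supplied by replay make this simple, reward-free learning rule converge, which is the point of the comparison above.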