¹⁷In game-playing AI, planning by simulation has been taken to an extreme in the form of “self-play”. A much-publicized example is AlphaGo, the first system to beat professional human players in the board game of Go, which we used as an example of dual processes earlier (Silver et al., 2016). After being trained on records of a huge number of actual games played by humans, it started playing against itself. This is a very special kind of planning, in which the opponent is simulated as part of the environment. In fact, there is no distinction between the agent and its opponent, since the same agent “plays” both sides and learns from the successes and failures of both. A later version of the system omits the learning from human games altogether and learns entirely by playing against itself; it is aptly named AlphaGo Zero (Silver et al., 2017). Pure self-play has also allowed an AI to rapidly approach human level in Dota 2, a highly complicated multi-player esports video game: using more than 100,000 processors running self-play in parallel, the OpenAI Five system can generate in one day an amount of game experience that would take more than a hundred years to collect in ordinary play against humans (OpenAI, 2018). Self-play was also used to achieve super-human performance in poker (Brown and Sandholm, 2018). Such learning by self-play had been used successfully much earlier in simpler games such as backgammon (Tesauro, 1995) and, as far back as 1959, in checkers (Samuel, 1959), one of the earliest machine learning projects. While some human knowledge was fed into the learning process in most of the preceding studies, Tesauro (1995) also reported a variant with pure self-play, similar to AlphaGo Zero.
Something akin to self-play is used by humans when they simulate social encounters in their own minds: we might use the same model for the actions of other people and for our own actions, and learn both simultaneously. (Yet the connection between our model of our own mind and our model of the minds of others is complex; see Carruthers (2009) for different possibilities.)
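To make the idea of a single agent "playing" both sides concrete, here is a minimal sketch of self-play on a toy game. It is not the method of any of the systems cited in this note: the game (a take-away game in which players alternately remove one to three stones and whoever takes the last stone wins), the tabular value update, and all names and parameter values (`train_by_self_play`, `alpha`, `epsilon`, and so on) are assumptions chosen purely for illustration. The one point it is meant to show is that a single shared value table is used for whichever player is about to move, so that a position that is good for the opponent is automatically scored as bad for the mover.

```python
import random


def legal_moves(stones, max_take=3):
    """Moves available to the player to move: take 1..max_take stones."""
    return range(1, min(max_take, stones) + 1)


def best_value(q, stones):
    """Value of a position for the player about to move (unseen moves count as 0)."""
    return max(q.get((stones, a), 0.0) for a in legal_moves(stones))


def greedy_move(q, stones):
    """Move with the highest learned value in the current position."""
    return max(legal_moves(stones), key=lambda a: q.get((stones, a), 0.0))


def train_by_self_play(games=10000, max_pile=10, alpha=0.5, epsilon=0.3, seed=0):
    """Learn one value table by letting the same agent play both sides.

    All hyperparameters are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    q = {}  # shared table: (stones, move) -> value for the player making the move
    for _ in range(games):
        stones = rng.randint(1, max_pile)  # random start so all positions get visited
        while stones > 0:
            # Both "players" choose from the same table, with some exploration.
            if rng.random() < epsilon:
                move = rng.choice(list(legal_moves(stones)))
            else:
                move = greedy_move(q, stones)
            remaining = stones - move
            if remaining == 0:
                target = 1.0  # the mover took the last stone and wins
            else:
                # The opponent moves next using the same table, so whatever
                # is good for the opponent is bad for the current mover.
                target = -best_value(q, remaining)
            old = q.get((stones, move), 0.0)
            q[(stones, move)] = old + alpha * (target - old)
            stones = remaining  # the other side now moves
    return q


q = train_by_self_play()
for pile in (5, 6, 7):
    print(pile, "stones -> take", greedy_move(q, pile))
```

For this particular toy game the best strategy is known to be leaving the opponent a multiple of four stones, so the learned greedy moves can be checked against it. No human games are used at any point, which loosely mirrors the pure self-play variants mentioned above, although those systems rely on far more elaborate learning and search.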