Chapter 7
Fast and slow intelligence and their problems

This chapter concludes Part I by discussing the connections between the different forms of information processing and frustration we have seen so far. To this end, we need to understand better two different modes of processing in the brain, which coincide with those in modern AI. They were already discussed in Chapter 4: neural networks and logic-based Good Old-Fashioned AI. The idea of two complementary systems, or processes, is ubiquitous in modern neuroscience and psychology. It is assumed that the two systems in the brain work relatively independently of each other while complementing each other’s computations. This leads to what is called “dual-process” or “dual-systems” theories. The two systems, or modes of operation, roughly correspond to unconscious processing in the brain’s neural networks, and conscious language-based thinking. In this chapter, we go a bit deeper into that distinction due to its great importance for understanding suffering.

Each of the two systems has its own advantages and disadvantages, which is the theme of this chapter, and in fact, a theme to which we will return many times in this book. Neural networks are based on learning: they need a lot of data and result in inflexible functioning. On the other hand, GOFAI relies on well-defined categories which may not be found in reality, and the computations needed may be overwhelming, as in the case of planning. On the positive side, we show how the two systems can work together in recent AI systems. We conclude by discussing how the different forms of frustration seen in earlier chapters are related to this two-systems distinction, providing a synthesis of such different forms of suffering.

Fast and automated vs. slow and deliberative

Let us start with the viewpoint on the two systems given by cognitive psychology and neuroscience.1 According to such “dual-process” (or “dual-systems”) theories, one of the two systems in the brain is similar to the neural networks in AI: It performs its computation very fast, and in an automated manner. It is fast thanks to its computation being massively parallel, i.e., happening in many tiny “processors” at the same time. It is automated in the sense that the computations are performed without any conscious decision to do so, and without any feeling of effort. If visual input comes to your eyes, it will be processed without your deciding to do so, and usually you recognize a cat or a dog in your visual field right away, that is, in something like one-tenth of a second.2 Most of the processing in this system is also unconscious. You don’t even understand how the computations are made; the result of, say, visual recognition just somehow appears in your mind, which is why this system is also called “implicit”.

The processing in the conscious, GOFAI-like system is very different. To begin with, it is much slower. Consider planning how to get home from a restaurant where you are for the first time: you can easily spend several seconds, even minutes, solving this planning task. The main reason is that the computations are not parallelized: They work in a serial way, one command after another, so the speed is limited by the speed of a single processing unit. In humans, another reason why symbolic processing is slow is, presumably, that it is evolutionarily a very new system, and thus not very well optimized. Other typical features of such processing are that you need to concentrate on solving the problem, the processing takes some mental effort, and it can make you tired. Such processing is also usually conscious, which means that you can explain how you arrived at your conclusion; hence the system is also called “explicit”.3

Note that in an ordinary computer, the situation above is in some ways reversed, as already explained on 🡭 in Chapter 4. A computer can do logical operations much faster than neural network computations, since logical operations are in line with its internal architecture. In fact, a computer can only do neural network computations based on a rather cumbersome conversion of such analog operations into logical ones. Presumably, in fact, the brain is only able to do logical operations after converting them into neural network computations, and that is equally cumbersome.

To see the division into two systems particularly clearly, we can consider situations where the two systems try to accomplish the same task, say, classification of visual input. We can have a neural network that proposes a solution, as well as a logic-based system that proposes its own. Sometimes, the systems may agree; at other times, they disagree.

Suppose a cat enters your visual field. When the conditions for object recognition are good, your visual neural network would recognize it as a cat. In other words, the network would output the classification “cat” with high certainty. However, when it is dark, and you only get a faint glimpse of the cat that runs behind some bushes, your neural network might not be able to resolve the categorization. It might say it is probably either a cat or a dog, but it cannot say which. At this point, the more conscious, logic-based system might take over. You recall that your neighbour has a cat; you don’t know anybody who owns a dog nearby; you think this is just the right moment in the evening for a cat to hunt for mice. Thus, you logically conclude it was probably a cat. In this case, the task of recognizing an object used the two different systems, working together. The logic-based one took quite some time and effort to use, while the neural network gave its output immediately and without any effort. Here, the systems were not completely independent, since the logic-based system did need input from the neural network to have some options to work on.

The two systems can also disagree, as often happens in the case of fear. Talking about fear and related emotional reactions, people often call them “hard-wired”. This expression is not too far from reality. What happens is that the brain uses special shortcut connections to relay information from the eye to a region called the amygdala, an emotional center in the brain. This shortcut bypasses those areas where visual information is usually processed.4 If such a connection learns to elicit fear (due to a previous unpleasant encounter with some animals, for example), it will be very difficult to get rid of it. Any amount of reasoning is futile, presumably since the visual signal triggering fear is processed by completely different brain areas than logical, conceptual reasoning. Often, the logic-based system loses here, and the neural-network-based fear prevails. This division into two processes also explains why it is difficult for us to change unconscious associations, such as fear: the conscious, symbolic processing has limited power over the neural networks.

Interestingly, people tend to think that the main information processing in our brain happens by the conscious, symbolic system, including our internal speech and conceptual thinking. But what if that is simply the tip of the iceberg, as early psychoanalysts5 claimed more than a hundred years ago? The idea that most information processing is conscious and conceptual may very well be an illusion. We may have such an impression because conceptual processing requires more effort, or because it is more accessible to us by virtue of being conscious. However, if you quantify the amount of computational resources which are used for conceptual, logical thinking, and compare them with those used for, say, vision, it is surely vision that will be the winner.6

Similar to the dual-process theories in cognitive psychology and neuroscience just described, the division between GOFAI and neural networks has been prominent in the history of AI research, which has largely oscillated between the two paradigms. Currently, neural networks are very popular, while GOFAI is not used very widely. However, this may very well change, and perhaps in the future, AI will combine logic-based and neural models in a balanced way. Since GOFAI-like processing is used by humans, it is very likely to have some distinct advantage over neural networks, at least for some tasks.7

Note that in AI we find another important distinction which is not very prominent in the neuroscientific literature: learning vs. no learning. Neural networks in AI are fundamentally based on learning, and using them without learning is not feasible. In contrast, in its original form, Good Old-Fashioned AI promises to deliver intelligence without any learning. That comes at the cost of much more computation, and more effort spent on programming. This distinction seems to be relevant to the brain, even if not as strict as in AI, as we will see next.8

Neural network learning is slow, data-hungry, and inflexible

To understand the relative advantages of the two systems, let us first consider the limitations in neural networks, and especially the learning that they depend on. First of all, neural network learning is data-hungry: it needs large amounts of data. This is because the learning is by its very nature statistical; that is, it learns based on statistical regularities, such as correlations. Computing any statistical regularities necessarily needs a lot of data; you cannot compute statistics by just observing, say, two or three numbers.

Second, neural network learning is slow. Often, it is based on gradient optimization, which is iterative and needs a great many iterations. The same applies to Hebbian learning, where changing neural connections takes many repetitions of the input-output pairs—this is natural since Hebbian learning can be seen as a special case of stochastic gradient descent. In fact, feeding a really large number of data points into a learning system almost necessarily requires a lot of computation, since each data point takes some small amount of time to process.
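
To make this concrete, here is a minimal sketch in Python (my own illustration, not a model of any particular brain circuit; the learning rate and the decay term standing in for unlearning are invented numbers). The point is simply that a small update per co-occurrence means many repetitions are needed to build an association, and many more to weaken it again.

```python
# Minimal sketch: one Hebbian-style connection strengthened a little on each
# co-occurrence of inputs x and y. The small learning rate means many
# repetitions are needed before the association becomes strong, and the
# (invented) decay term shows that unlearning is just as gradual.

learning_rate = 0.01
weight = 0.0                      # association strength between X and Y

def hebbian_step(w, x, y, lr=learning_rate):
    # Basic Hebbian rule: the change is proportional to the product of the
    # two activations ("cells that fire together, wire together").
    return w + lr * x * y

# Phase 1: X and Y co-occur many times; the weight creeps up step by step.
for _ in range(1000):
    weight = hebbian_step(weight, x=1.0, y=1.0)
print(f"after 1000 paired presentations: {weight:.2f}")

# Phase 2: the world changes and Y no longer follows X; with a small decay
# term standing in for unlearning, forgetting takes many steps as well.
for _ in range(1000):
    weight = hebbian_step(weight, x=1.0, y=0.0) - 0.001 * weight
print(f"after 1000 presentations without Y: {weight:.2f}")
```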

This statistical and iterative nature of neural network learning has wide-ranging implications, for AI as well as for the brain. To begin with, these properties help us to further understand why it is so difficult, in us humans, to change any kind of deeply ingrained associations. Mental associations are presumably in a rather tight correspondence with neural connections: If you associate X with Y, it is because there are physical neural connections between the neurons representing X and Y. Now, even if the statistical connection ceases to exist in the real world, perhaps because you move to live in a new environment, it will take a long time before the Hebbian mechanisms learn to remove the association between X and Y, or to associate X with something else.9

In fact, these learning rules, whether basic Hebbian learning or some other stochastic gradient methods, may seem rather inadequate as an explanation for human learning: We humans can learn from single examples and do not always need a lot of data. You only need to hear somebody say once “Helsinki is the capital of Finland”, and you have learned it, at least for a while. Surely, you don’t need to hear it one thousand times, although that may help. This does not invalidate the neural network models, however, since the brain has multiple memory systems, and Hebbian learning is only one way we learn things and remember them—we will get back to this point in Chapter 9.10

The iterative nature of neural learning, together with the two-process theory, also helps to explain in more detail why it is so difficult to deliberately change unconscious associations. Suppose you consciously decide to learn an unconscious association between X and Y (where X might be “exercise” and Y might be “good”). How can you transfer such information from the conscious, explicit system to the neural networks? Perhaps the best you can do is to recall X and Y simultaneously to your mind—but that has to be done many times! In fact, you are in effect creating new data and feeding it into the unconscious association learning in your brain. You are almost cheating your brain by pretending that you perceive the association “X and Y” many times. We will see many variations on this technique when we consider methods for reducing suffering in Chapter 15.

Another limitation is that when a neural network learns something, it is strictly based on the specific input and output it has been trained on. While this may seem like an obvious and innocuous property, it is actually another major limitation of modern AI. Suppose that a neural network in a robot is trained to recognize animals of different species: It can tell if a picture depicts a cat or a dog, or any other species in the training set. Next, suppose somebody just replaces the camera in the robot with a new one, with higher resolution. What happens is that the neural network previously trained in the robot does not work anymore. It will have no idea how to interpret the high-resolution images since they do not match the templates it learned for the original data. A similar problem is that the learning is dependent on the context: An AI might be trained on images where cats tend to be indoors and dogs outdoors, and it will then erroneously classify any animal pictured indoors as a cat. The AI sees a strong correlation between the surroundings and the animal species, and it will not understand that the actual task is about recognizing the animals and not recognizing the surroundings. That is why a neural network will typically only work in the environment or context it is trained in.11
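
The context-dependence can be seen even in a toy example. The following sketch (with invented data and features, not any real vision system) trains a simple linear classifier on pictures where cats happen to be indoors and dogs outdoors; on a typical run, it leans heavily on the "indoors" cue, and its accuracy collapses once that correlation is broken.

```python
# Toy sketch of context dependence: a perceptron trained on data where the
# animal species is spuriously correlated with the surroundings.
import random
random.seed(0)

def make_example(animal, indoors):
    # Feature 1: a crude "pointy ears" cue (noisy but genuinely informative).
    # Feature 2: whether the picture was taken indoors (spurious context cue).
    pointy_ears = (0.8 if animal == "cat" else 0.2) + random.gauss(0, 0.3)
    return [pointy_ears, 1.0 if indoors else 0.0], (1 if animal == "cat" else -1)

# Training set: cats always indoors, dogs always outdoors.
train = ([make_example("cat", indoors=True) for _ in range(200)]
         + [make_example("dog", indoors=False) for _ in range(200)])

w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(50):                          # simple perceptron updates
    for x, y in train:
        if y * (w[0]*x[0] + w[1]*x[1] + b) <= 0:
            w = [w[0] + lr*y*x[0], w[1] + lr*y*x[1]]
            b += lr * y

def accuracy(data):
    correct = 0
    for x, y in data:
        predicted_cat = (w[0]*x[0] + w[1]*x[1] + b) > 0
        correct += predicted_cat == (y == 1)
    return correct / len(data)

# Test set: the correlation is broken (cats outdoors, dogs indoors).
test = ([make_example("cat", indoors=False) for _ in range(200)]
        + [make_example("dog", indoors=True) for _ in range(200)])
print("accuracy when context matches training:", accuracy(train))
print("accuracy when the context changes:     ", accuracy(test))
```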

In light of these limitations, AI based on neural networks is thus rather different from what intelligence usually is supposed to be like in humans. In a celebrated experiment, human participants started wearing goggles containing a prism which made their world look upside down. Surprisingly soon, the participants were able to function normally; somehow, their visual systems were able to process the input correctly in spite of the inverted visual input.12 In general, when humans learn to perform a task, they are often somehow able to abstract general knowledge out of the learning material, and they are able to transfer such knowledge from one task to another. It has even been argued that the hallmark of real intelligence is that it is able to function in many different kinds of environments and accomplish a variety of tasks without having to learn everything from scratch. If all a robot can do is to mow the lawn, we would think it is just accomplishing a mechanical task and is not “really” intelligent.13

Using planning and habits together

Let us next look at how the two systems might interact in AI. Regarding action selection, we have actually seen how two different approaches can solve the same problem in AI: reinforcement learning and planning. Planning is in fact one of the core ideas of the GOFAI theory. Planning is undeniably a highly sophisticated and demanding computational activity, and probably impossible for simple animals—some would even claim it is only present in humans, although that is a hotly debated question.14 In any case, it seems to correspond closely to the view humans have about their own intelligence, and therefore was the target of early AI research. However, in the 1980s, there was growing recognition that building agents, perhaps robots, whose actions show human-level intelligence is extremely difficult, and it may be better to set the ambitions lower. Perhaps building a robot which has the level of intelligence of some simple animal would be a more realistic goal. Moreover, as in other fields of AI, learning gained prominence. That is why habit-like reinforcement learning started to be seen as an interesting alternative to planning.15

Habits die hard—and are hard to learn

However, habit-based behaviour has its problems, partly similar to those considered above for neural network learning. Learning the value function, that is, learning habits, obeys the same laws as other kinds of machine learning. It needs a lot of data: the agent needs to go and act in the world many, many times. This is a major bottleneck in teaching AI and robots to behave intelligently, since it may take a lot of time and energy to make, say, a cleaning robot try to clean the room thousands of times. Basic reinforcement algorithms are also similar to neural network algorithms in that they work by adjusting parameters in the system little by little, based on something like the stochastic gradient methods.

Another limitation which is crucial here is that the result of the learning, the state- or action-value function, is very context-specific. If the robot has learned the value function for cleaning a room, that function may not work when the robot has to clean a garden. Even different rooms to clean may require slightly different value functions! The world could also change. Suppose the fridge from which the robot fetches the orange juice for its master is next to a red closet. Then, the robot will associate the red closet with high value, since on seeing it, the robot knows it is close to getting the juice. However, if somebody moves the closet to a different room, the robot will start acting in a seemingly very stupid way: It will go to the room which now has the red closet when it is supposed to get the orange juice—in fact, it might simply approach any new red object introduced to its environment in the hope that this is how it finds the fridge. It will need to re-learn its action-values all over again.
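
As a rough illustration (a hypothetical one-dimensional world with invented numbers, not the book's robot), the following sketch learns state values by temporal-difference updates and then moves the reward; the old values keep pulling the agent toward the place where the fridge used to be until they are slowly unlearned.

```python
# Minimal tabular sketch: five locations in a row, with the fridge at the
# right end. The agent learns state values by TD(0) updates; when the fridge
# is moved, the old values still steer it the wrong way.
import random
random.seed(1)

N_STATES, ALPHA, GAMMA = 5, 0.1, 0.9

def run_episodes(values, fridge, n_episodes=200):
    for _ in range(n_episodes):
        state = 2                                 # start in the middle
        for _ in range(20):
            # Greedy-ish policy: move toward the neighbour with the higher
            # value, with a little random exploration.
            left, right = max(state - 1, 0), min(state + 1, N_STATES - 1)
            state_next = right if values[right] >= values[left] else left
            if random.random() < 0.1:
                state_next = random.choice([left, right])
            reward = 1.0 if state_next == fridge else 0.0
            # TD(0) update: nudge the value toward reward + discounted value.
            values[state] += ALPHA * (reward + GAMMA * values[state_next] - values[state])
            state = state_next
            if reward > 0:                        # fridge reached: episode ends
                break                             # (the fridge itself is terminal)
    return values

values = [0.0] * N_STATES
values = run_episodes(values, fridge=4)           # fridge next to the red closet
print("learned values, fridge at 4:", [round(v, 2) for v in values])

values = run_episodes(values, fridge=0, n_episodes=20)   # fridge moved
print("shortly after the move:     ", [round(v, 2) for v in values])
```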

Here we come to the other side of the slowness of learning habits: Once a habit is learned, it is difficult to get rid of it. In humans, the system learning and computing the reinforcement value function is outside of any conscious control: We cannot tell it to associate a smaller or larger value to some event. This is why we often do things we would prefer not to do, out of habit. In order to learn that a habit is pointless in the sense that it does not give any reward anymore (as happened with the robot above), a new learning process has to happen, and this is just as slow as the initial learning of the habit. That is why habits die hard.16

Combining habits and planning

These problems motivate a recent trend in AI: combining planning and habit-like behaviour. The habit-based framework using reinforcement learning will lead to fast but inflexible action selection, and is ideally complemented by a planning mechanism which searches an action tree a few steps ahead—as many as is computationally feasible. Depending on the circumstances, the action recommended by either of the two systems can then be implemented.17

Let us go back to the robot which is trying to get the orange juice from the fridge. One possible way of implementing a combination of planning and habit-like behaviour is to have a habit-based system help the planning system in the tree search. Using reinforcement learning, you could train a habit-based system so that when the robot is in front of the fridge whose door is closed, the system suggests the action “open the door”. When the door of the fridge is open with orange juice inside, the habit-based system suggests “grab the orange juice”. While these outputs could be directly used for selecting actions, the point here is that we can use them as mere suggestions to a planning system. Such suggestions would greatly facilitate planning: The search can concentrate on those paths which start with the action suggested by the habit-based system, focusing the search and reducing its complexity. However, the planning system would still be able to correct any errors in the habit-like system, and could override it if the habit turns out to be completely inadequate.
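
A minimal sketch of this idea might look as follows (the state names, the transition table, and the habit suggestions are all invented for illustration): the planner searches the action tree depth-first, but tries the habit system's suggestion first, which usually shortens the search; if the suggestion turns out to be inadequate, ordinary search still finds a plan.

```python
# Sketch of a habit system guiding a planner in a toy fridge world.

# World model: state -> {action: next_state}
TRANSITIONS = {
    "at_fridge_door_closed": {"open_door": "at_fridge_door_open",
                              "walk_away": "in_kitchen"},
    "at_fridge_door_open":   {"grab_juice": "has_juice",
                              "close_door": "at_fridge_door_closed"},
    "in_kitchen":            {"go_to_fridge": "at_fridge_door_closed"},
    "has_juice":             {},
}

# Habit-like suggestions, e.g. learned earlier by reinforcement learning.
HABIT = {"at_fridge_door_closed": "open_door",
         "at_fridge_door_open": "grab_juice"}

def plan(state, goal, depth=6, visited=()):
    """Return a list of actions leading from state to goal, or None."""
    if state == goal:
        return []
    if depth == 0 or state in visited:
        return None
    actions = list(TRANSITIONS[state])
    suggestion = HABIT.get(state)
    if suggestion in actions:                  # try the habit's choice first
        actions.remove(suggestion)
        actions.insert(0, suggestion)
    for action in actions:
        rest = plan(TRANSITIONS[state][action], goal, depth - 1, visited + (state,))
        if rest is not None:
            return [action] + rest
    return None

print(plan("in_kitchen", "has_juice"))
# -> ['go_to_fridge', 'open_door', 'grab_juice'] in this toy world
```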

One very successful real-world application using such a dual-process approach is AlphaGo, a system playing the board game of Go better than any human player.18 The tree to be searched in planning consists of moves by the AI and its opponent. This is a classical planning problem in a GOFAI sense. The world has a finite number of well-defined states, and also, the actions and their effects on the world are clearly defined, based on the rules of the game. What is a bit different is that there is an opponent whose actions are unpredictable; however, that is not a big problem because the agent can assume that the opponent chooses its actions using the same planning engine the agent uses itself.

The search tree in Go is huge since the number of possible moves at any given point of the game is quite large, even larger than in chess. In fact, the number of possible board positions (positions of all the stones on the board) is larger than the number of atoms in the universe—highlighting the fundamental problem in GOFAI-style planning. Since it is computationally impossible to exhaustively search the whole tree, AlphaGo randomly tries out as many paths as it has time for. This leads to a “randomized” tree search method called Monte Carlo Tree Search. Algorithms having some randomness deliberately programmed in them are often called Monte Carlo methods after the name of a famous casino. However, even such a random search is obviously quite slow and unreliable.19

The crucial ingredient in AlphaGo is another system which learns habit-like behaviours. This system is used inside the planning system, a bit like in the juice robot just described. While the system is rather complex, let’s just consider the fact that in the initial stage of the learning, AlphaGo looks at a large database of games played by human experts. Using that data, it trains a neural network to predict what human experts would do in a given board position—the board positions correspond to the states here. The neural network is very similar to those used in computer vision, and gets as input a visual view of the Go board. This part of the action selection system could be interpreted as learning a “habit”, i.e., an instinctual way of playing the game without any planning.20 The action proposed by the habit system can be used as such, but even more intelligent performance is obtained by using it as a heuristic for the tree search: the tree search is focused on paths related to that proposed action. This heuristic is then refined by further learning stages. In particular, the system also learns to approximate the state-values by another neural network.21
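
The following toy sketch conveys the general flavour, though it is of course nothing like AlphaGo itself: the "policy prior" below is a hand-written stand-in for the habit-like network, the game is a tiny Nim variant rather than Go, and the search is plain Monte Carlo evaluation rather than a full tree search. Still, it shows how prior-weighted random rollouts let a modest amount of search find the good move.

```python
# Toy illustration of policy-guided Monte Carlo search. The game: players
# alternately take 1 or 2 stones; whoever takes the last stone wins.
import random
random.seed(0)

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

def policy_prior(stones, move):
    # Stand-in for a network trained on expert games: "experts" prefer
    # leaving a multiple of 3 stones to the opponent (the winning strategy).
    return 5.0 if (stones - move) % 3 == 0 else 1.0

def rollout(stones, to_move):
    # Play the game out with prior-weighted random moves; return the winner.
    player = to_move
    while True:
        moves = legal_moves(stones)
        weights = [policy_prior(stones, m) for m in moves]
        stones -= random.choices(moves, weights=weights)[0]
        if stones == 0:
            return player            # whoever took the last stone wins
        player = 1 - player

def choose_move(stones, n_rollouts=500):
    # Evaluate each first move by the fraction of prior-guided rollouts
    # that player 0 eventually wins.
    scores = {}
    for move in legal_moves(stones):
        remaining = stones - move
        if remaining == 0:
            scores[move] = 1.0       # taking the last stone wins immediately
        else:
            wins = sum(rollout(remaining, to_move=1) == 0 for _ in range(n_rollouts))
            scores[move] = wins / n_rollouts
    return max(scores, key=scores.get), scores

print(choose_move(10))   # with 10 stones, taking 1 (leaving 9) is the good move
```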

Such suggestions based on neural networks are fast, and intuitively similar to what humans would do. Often, a single glimpse at the scene in front of your eyes will tell a lot about where reward can be obtained, and suggests what you should do. Even when humans are engaged in planning, such input coming from neural networks often guides the planning. If you go to get something from the fridge, don’t you have almost automated reactions to seeing the fridge door closed, and seeing your favourite food or drink inside the fridge? These are presumably given by a simple neural network. Yet, there is a deliberative, thinking aspect in your behaviour, and you can change it if you realize, for example, that the juice has gone bad—which the simple neural network did not know.

What is typical in humans is that action selection can also switch from one system to another as a function of practice. Learning a new skill, such as driving a car, is a good example—skills are similar to habits from the computational viewpoint. First, you really have to concentrate and consciously think about different action possibilities. With increasing practice and learning, you need to think less and less, since something like a value function is being formed in the brain. In the end, your actions become highly automated, and you don’t really need to think about what you are doing anymore. The habit-based system takes over and drives the car effortlessly.22

Advantages of categories and symbols

While neural networks and GOFAI work nicely together in this example of Go playing, it is actually not easy in most other tasks to show any clear utility of symbolic AI approaches. This may of course change any time, since AI is a field of rapid development. It is quite likely that GOFAI is necessary for particularly advanced intelligence—something much more advanced than what we have at this moment. Yet, the tendency has recently been almost the opposite: Tasks which were previously thought to be particularly suitable for symbolic AI have been more successfully solved by neural approaches.23 Perhaps symbolic AI works with board games only because such games are in a sense discrete-valued: the stones on the Go board can only be in a limited number of positions, so the game is inherently suitable for GOFAI. So, we have to think hard about what might be the general advantages of logic-based intelligence compared to neural networks. In the following, I explore some possibilities.

GOFAI is more flexible and facilitates generalization

Suppose that there is a neural network that recognizes objects in the world and outputs the category of each object. Then, what would be the utility of operating on those categories as discrete entities, using symbolic-logical processing, instead of having just a huge neural network that does all the processing needed?

We have already seen, more than once, one great promise of GOFAI in the case of planning: flexibility. Given any current state and any goal state, a planning system can, if the computational resources are sufficient, find a plan to get there. If anything changes in the environment—say, it is no longer possible to transition between two states due to some kind of blockage—the planning system takes that into account without any problems. This is in contrast to reinforcement learning which will not know what to do if the environment changes; it may have to spend a lot of time re-learning its value functions.

Furthermore, GOFAI is easily capable of representing various kinds of data structures and relationships in the same way as a computer database. For example, it can easily represent the fact that both cats and dogs are animals, i.e. the hierarchical structure of the categories. It can also represent the relationship that the character string “Scooby” is the name of a particular dog. This adds to the flexibility of GOFAI by allowing more abstract kinds of processing, which are easily performed by humans.24

Even without going into logical operations, we can consider the advantages of using discrete categories (cats vs. dogs) instead of a high-dimensional feature space with continuous values. A widespread idea is that categories are useful for generalizing knowledge over their members, which in turn underlies various forms of abstract thinking. Even though cats are not all the same, it is useful to learn some of their general properties. They like milk, they purr; they don’t like to chew bones like dogs do, and they are not dangerous like bears. Having categories enables the system to learn to associate various properties to the whole category: Observing a few cats drink milk, the system learns to associate milk-drinking to the whole category of cats, instead of just some individual cats. Importantly, associating properties to categories means the system was able to generalize: after seeing some of the cats drink milk, it inferred that all cats drink milk. Such generalization is clearly an important part of intelligence. If the system needed to learn such a property separately for each cat, it would be in great trouble when it sees a new cat and needs to feed it—it would have no idea what to do. But, learning that the whole category of cats is associated with milk-drinking, it knows, immediately and without any further data, what to give to this new cat.
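
In code, the core of this idea is almost trivially simple. The sketch below (a toy illustration, not any particular knowledge-representation system) pools observed properties at the level of the category, so a brand-new cat inherits them immediately, without any further data.

```python
# Minimal sketch of category-level generalization.

category_properties = {"cat": set(), "dog": set()}

def observe(category, prop):
    # Pool the observation at the level of the category, not the individual.
    category_properties[category].add(prop)

def known_properties(category):
    return category_properties[category]

# A few observations of particular cats and dogs...
observe("cat", "drinks milk")
observe("cat", "purrs")
observe("dog", "chews bones")

# ...are enough to make predictions about a brand-new cat we have never met.
print("What to expect from the neighbour's new cat:", known_properties("cat"))
```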

Categories enable communication

Nevertheless, I think the feature which makes GOFAI fundamentally different from neural networks is that the use of symbols is similar to using some kind of a primitive language. In fact, you can hardly have GOFAI without some kind of a language—perhaps akin to a programming language—in which the symbols and logical rules are expressed. It is equally clear that with humans, language is primarily used for communication between individuals. As each category typically corresponds to a word, humans can communicate associations, or properties of categories, to each other. I can tell my friend that cats drink milk, so she does not need to learn what to feed to cats by trial and error. I have condensed my extensive data on cats’ eating habits into a short verbal message that I transmit to her.

So, it is plausible that the main reason humans are capable of symbolic thinking is that it enables them to communicate with each other. After such a communication system was developed during evolution, humans then started using the same system for various kinds of intelligent processing even when alone. Perhaps we started by telling others, for example, where to find prey.25 This led to the development of symbols and logical operations, which were found useful for abstract thinking: Perhaps you could try to figure out for yourself where you should hunt tomorrow. Eventually, such capabilities ended up producing things such as quantum physics—and the very theory of GOFAI.26

A reflection of the utility of categories in communication may be seen in a recent research line in AI which tries to develop systems whose function is easy to interpret by humans.27 If you use a neural network to recognize a pattern, the output may be clear and comprehensible, but the computations—why the network gave that particular output—are extremely difficult for humans to understand. This is fine in many cases, but sometimes it is necessary to explain the decision to humans. For example, if an AI rejects your loan application, the bank using the AI may be legally obliged to explain the grounds for that decision.28 Researchers developing such interpretable AI usually end up doing something similar to GOFAI boosted by learning, since it gives rules which can be expressed in more or less ordinary language, and thus they can be explained. In fact, in Chapter 4 we saw examples of GOFAI systems whose functioning is easy to understand and to explain.29

Categorization is fuzzy, uncertain, and arbitrary

Now, let us consider the flipside: problems that arise when using categories. We have already seen some problems with logical-symbolic processing, the most typical being the exponential explosion of computation in planning. Here, we focus on the consequences of using categories, and look at the question from a more philosophical angle. Indeed, it has been widely recognized by philosophers over the centuries that dividing the world into “crisp” categories can only be an approximation of the overwhelming complexity of the world. I focus on some issues which will in later chapters be seen to be relevant for suffering.30

Categories are fuzzy

Philosophers have long pointed out that there may not be any clearly defined categories in the world. Granted, the difference between cats and dogs may be rather clear, but what about the category of, say, a “game”? Wittgenstein gave this as an example of a category which has no clear boundaries. Different games have just some vague similarity, which he called “family resemblance”.

This idea has been very influential in AI under the heading of fuzziness. A category is called fuzzy if its boundaries are not clear or well-defined. Consider for example the word “big”. How does one define the category of big things? For simplicity, let us just consider the context of cities. If we say “London is big”, that is clearly true: London definitely belongs to the category of big things, in particular big cities. But if we say “Brussels is big”, is that true or false? How does one define what is big and what is not? In the case of cities, we could define a threshold for the population, but how would we decide what it should be? An AI might learn to categorize cities into big and small ones based on some classification task—in Chapter 4, we discussed how this might happen in categorizing body temperature into “high fever” or not. However, that categorization would depend on the task, and there would always be a grey zone where the division is rather arbitrary.

The consensus in AI research is that many categories are quite fuzzy and have no clear boundaries; there are only different degrees of membership in a category. There is no way of defining a word like “big” (or, say, “nice”, “tall”, “funny”) in a purely binary fashion. There will always be objects that quite clearly belong to the category and objects which clearly do not belong to the category, but for a lot of objects the situation is not clear. In the theory of fuzzy logic, such fuzziness is modelled by giving each object a number between 0 and 1 to express its degree of membership in each category.31
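
A small sketch makes this concrete (the thresholds are invented, and the populations are only approximate): instead of a yes/no answer to "is this city big?", each city gets a degree of membership between 0 and 1, with a grey zone in between.

```python
# Sketch of a fuzzy membership function for the category "big city".

def membership_big_city(population):
    # Below 500k: clearly not "big" (0); above 5 million: clearly "big" (1);
    # in between, membership rises linearly through the grey zone.
    low, high = 500_000, 5_000_000
    if population <= low:
        return 0.0
    if population >= high:
        return 1.0
    return (population - low) / (high - low)

# Approximate populations, for illustration only.
for city, pop in [("London", 8_800_000), ("Brussels", 1_191_604), ("Siena", 54_000)]:
    print(f"{city}: membership in 'big city' = {membership_big_city(pop):.2f}")
```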

Categorization is uncertain

In addition, categorization is always more or less uncertain. Any information gleaned from incoming sensory input is uncertain, for reasons we will consider in more detail in Chapter 10. Partly, this is because the neural network gets limited information, and partly because its information-processing capabilities are limited. If you have a photograph of a cat taken in the dark and from a bad angle, the neural network or indeed any human observer may not be sure about what it is. They might say it is a cat with 60% probability, but it could be something else as well. In other words, any categorization by an AI is very often a matter of probabilities.

It is important to understand that fuzziness and uncertainty are two very different things. Uncertainty is a question of probabilities, and probabilities are about lack of information. If I say that a coin flip is heads with 50% probability and tails with 50% probability, there is no fuzziness about which one it is. After flipping the coin I can say if it is heads or tails, and no reasonable observer would disagree with me (except in some very, very rare cases). In other words, uncertainty is a question of not knowing what will happen or has happened, i.e., a lack of information about the world. In contrast, fuzziness has nothing to do with lack of information; it is about the lack of clear definition. We cannot say if the statement “Brussels is big” is true even if we have every possible piece of information about Brussels, including its exact population count. According to the information I find on Wikipedia, its population is 1,191,604, but knowing that will not help me with the problem if I don’t know how many inhabitants are required for a city to be in the “big” category.

Humans are not good at processing uncertainty. Various experiments show that humans tend to use excessively categorical thinking, where the uncertainty about the category membership is neglected. That is, when you see something which looks to you most probably like a cat, your cognitive system tends to ignore any other possibilities, and think it is a cat for sure.32

An old Buddhist parable about these dangers in categorization is seeing a rope in the dark and thinking it is a snake. You miscategorize the rope, and your brain activates not only the category of a snake, but all the associations related to that category (“animal”, “dangerous”). You get scared, with all the attendant physiological changes, such as an increased heart rate. If you had properly taken the uncertainty of the categorization into account, your reaction might have been more moderate.

Categorization is arbitrary

In some cases, the categories are not just fuzzy or uncertain: Their very existence can be questionable. Consider concepts such as “freedom” or “good”. Even forgetting about any difficulties in programming an AI to understand them, is it even clear what these words mean? Certainly, they mean different things to different people: people from different cultural backgrounds may easily misunderstand each other simply because they use such concepts with slightly different meanings. A great amount of time can be spent in attempting to just describe the meanings of certain words and categories. In fact, we spend more than one chapter on analysing the category called “self” in this book.

Even in rather straightforward biomedical applications of machine learning, we often use categories that are not well-defined. For example, in a medical diagnosis context, it is not clear if what we usually call schizophrenia is a single disease. Perhaps there are a number of different diseases which all lead to the single diagnosis of schizophrenia.33 Developing effective medications may only be possible once we understand all the subtypes, while thinking of all the subtypes as a single disease (a single category) may mislead any treatment attempts.

Moreover, a categorization that works for one purpose might not be suitable for another. We might divide people into different nationalities, which is very useful from the viewpoint of knowing what languages they are likely to understand. However, we can too easily use the same categories to predict all kinds of personality traits of those individuals, and that prediction may go quite wrong. In more general terms, categories and their utility depend on the context; different people use categories in different ways, and the categories are thus subjective.

Such arbitrariness of categories has been well appreciated in some philosophical schools. In the Yogācāra school of Buddhism, it is claimed that “while such objects [as chairs and trees] are admissible as conventions, in more precise terms there are no chairs, or trees. These are merely words and concepts by which we gather and interpret discrete sensations that arise moment by moment in a causal flux.”34 What arises in such a moment-by-moment flux is, in our terminology, activities in neural networks. Categories are created afterwards, by further information-processing.

Overgeneralization

It may be easy to understand that miscategorization leads to problems, as in mistaking a rope for a snake. However, the biggest computational problem caused by all properties just discussed—fuzziness, uncertainty, arbitrariness—may be overgeneralization. Overgeneralization can be difficult to spot, even after the fact, which makes it particularly treacherous.

Overgeneralization means that you consider all instances of a category to have certain properties, even if those properties hold only for some of them. Since categories are fuzzy, anything which is not really firmly inside the category may actually be quite different from its prototype. Related to this, you may not acknowledge the uncertainty of categorization and the ensuing generalization. Even more rarely do people acknowledge that the very categories are arbitrary.

Overgeneralization effects are well documented, for example, in perception of human faces, where gender and race can bias any conclusions you make about the individual involved.35 As an extreme case of overgeneralization, if you have been bitten by a dog, you may develop a fear towards all dogs, which would be called a phobia. Such fear is quite irrational in the sense that it is very unlikely that the other dogs would bite you. This is a very concrete example of how thinking in terms of categories leads to suffering, as will be discussed in more detail in later chapters.

There are actually good computational reasons why overgeneralization occurs. Learning only a limited number of categories and using them without too much reserve means that knowledge gleaned from all the instances of each category can be pooled together; at the same time, the computational load is decreased. If you actually had enough data from all the dogs in the world, and had unlimited computational capacities, you would know which of them are safe and which few are not. However, data and computation are always limited, so some shortcuts may be necessary—even if they increase your suffering. This is another theme that we will return to over and over again in this book.
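
The trade-off can be seen in a toy sketch (all numbers invented): pooling experience over the category "dog" gives an instant answer for any new dog, but after one bad encounter that answer is biased against every dog, including the perfectly safe ones, whereas keeping track of each individual gives no answer at all for a dog never met before.

```python
# Toy sketch of the computational trade-off behind overgeneralization.
from collections import defaultdict

encounters = [                      # (individual, was_I_bitten)
    ("rex", True),                  # one aggressive dog
    ("fido", False), ("fido", False),
    ("bella", False), ("bella", False), ("bella", False),
]

per_dog = defaultdict(list)
for dog, bitten in encounters:
    per_dog[dog].append(bitten)

def individual_estimate(dog):
    data = per_dog.get(dog)
    return None if not data else sum(data) / len(data)   # no data -> no answer

def category_estimate():
    data = [bitten for _, bitten in encounters]
    return sum(data) / len(data)                          # pooled over all dogs

print("new dog, individual estimate:", individual_estimate("new_dog"))  # None
print("new dog, category estimate:  ", round(category_estimate(), 2))   # ~0.17
```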

The many faces of frustration: Summarizing the mechanisms of suffering

With this framework of information-processing in two systems, we can better summarize the previous chapters, in which we saw several computational ideas related to suffering. We started by considering two basic mechanisms: frustration, and threat to the person or the self. Later, we argued that threats to the self can be seen as special cases of frustration, namely frustration of self-needs (Chapter 6). Thus, we obtained a unified view in which suffering is based on error signals typically related to frustration of some kind.36 We first defined frustration as not reaching a goal (Chapter 3) and later in terms of reward loss and reward prediction error (Chapter 5). In fact, these two kinds of frustration align well with the dual-process theory—slow vs. fast or GOFAI vs. neural networks—considered in this chapter.

However, there is much more than just two kinds of frustration. Reality is of course a bit more complex than such a clean division into planning and habit-based actions. Consider a case where you are yourself going to fetch the orange juice from the fridge. You formulate a plan which involves high-level actions such as going to the fridge, opening the door, etc. Once you are in front of the fridge, your habit-based system suggests you open the door by a certain sequence of muscle contractions which you have performed hundreds of times and which has become quite automated.

Now, suppose you follow the habit-based system and pull the door handle, but the door does not quite open. This kind of “frustrates” your habit of opening the door. But do you suffer? Probably not very much; you just pull again with more force, and if it opens, you hardly register that anything out of the ordinary happened. In contrast, if you don’t get the juice at all—because the door is somehow broken and does not open at all—your long-range plan is frustrated, and you will definitely suffer. And you really should suffer: All that planning and even the walking was in vain. A strong error signal has to be sent throughout your brain, and that is suffering.

Frustration on different time scales

This example points out one important aspect of action selection: its temporally hierarchical nature, involving simultaneous computations on different time scales.37 In the brain, there are also processes operating at many different time scales. So, some form of frustration can be operating on many different levels simultaneously. At one extreme, the agent may be planning long action sequences, and if they fail, frustration ensues in the sense of not reaching the goal. At the other extreme, a habit-based reinforcement learning system builds predictions of what kinds of rewards or changes in state-values are associated with different actions, and computes whether there is a reward loss or a reward prediction error (RPE). Predictions are made on a millisecond time scale as well as on the time scale of days if not years. Each such time scale has its own learning mechanism using its own errors.38
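
As a rough illustration of errors living on different time scales (my own sketch with invented numbers, not a model of any specific brain mechanism), the following code tracks the same reward stream with a fast and a slow predictor; when a long-standing reward suddenly stops, the fast predictor adapts within a few steps, while the slow one keeps signalling a large error for a long time (and was also slower to build its expectation in the first place).

```python
# Two predictors of the same reward stream, each with its own time scale.

def run(rewards, rate):
    prediction, errors = 0.0, []
    for r in rewards:
        error = r - prediction              # this scale's prediction error
        prediction += rate * error          # learn at this scale's speed
        errors.append(error)
    return errors

rewards = [1.0] * 50 + [0.0] * 50           # reward arrives reliably, then stops

fast = run(rewards, rate=0.5)               # adapts within a few steps
slow = run(rewards, rate=0.02)              # adapts over many steps

for t in (49, 52, 60, 99):
    print(f"step {t}: fast error {fast[t]:+.2f}, slow error {slow[t]:+.2f}")
```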

Such division into time scales brings us to the concept of intention—defining intention as commitment to a goal, as discussed in Chapter 3. The point of intentions is to partly resolve conflicts between long-term and short-term optimization. I can have many desires simultaneously and spend some time thinking about each of them, and perhaps even planning each of them to some extent. But I’m not really hoping to reach all the goals related to those desires. Once I decide to commit to one of the goals, that commitment is what sets the goal which can then be frustrated. I would argue that in the case of planning, frustration is not so much due to desire itself but to the ensuing intention. This is in line with the more elaborate expositions of the Buddha’s philosophy on suffering which divide desire into initial desire and a later part called attachment (also translated as “clinging” or “grasping”). Attachment is a process where after an initial feeling of desire (“Nice, chocolate, I would like to have it”), you firmly attach to the object of your desire (“I must have that chocolate”). This distinction is, I think, similar to the distinction between desires and intentions: Buddhist philosophy suggests a central role for attachment, or intention, in the process which creates suffering. While such attachment or intention is not necessary for frustration to occur, I propose that it greatly amplifies it. This is logical because intentions consider longer time scales, and thus an error involving an intention is more serious, since more time and energy were lost in formulating and executing the plan that failed.

In fact, there is something that works on an even longer time scale: the frustration of self-needs treated in Chapter 6. Self-needs often work on time scales of days, months, even years. Casual observation suggests that they produce some of the very strongest frustrations. Different time scales may further be related to van Hooft’s different kinds of frustration discussed in Chapter 2: frustration of biological functioning, of desires and emotions (in his terminology, which may differ from that of this book), of more long-term life goals, and even of the sense of the meaning of one’s existence.

Suffering based on desires, expectations, and general errors

Another major difference between the two kinds of frustration—not reaching the goal and computation of a reward loss—is that reward loss is based on violation of expectations, while not reaching the goal is in line with the typical definition of frustration as not getting what one wants, i.e. violation of desires. One way of resolving this is to consider that the term “expectation” may have a slightly different meaning in the case of action selection, and especially planning. The agent is executing a plan in order to get to the goal state, and it is in that sense “expecting” to get to that goal state. Earlier, we saw (🡭) how Epictetus talks about desire “promising” the attainment of its object. Thus, the expectation related to planning could simply be defined as the expectation that the goal state will be reached.39 Then, reward loss would be the same as the basic frustration of not reaching the goal, that is, the object of the desire (by the definition of Chapter 3).

Alternatively, we could see frustration (of desires) and reward loss (compared to expectation) as two distinct, if closely related phenomena, both of which produce suffering. What they have in common is that some kind of error occurred. On the computational level this is mainly an alternative way of defining the word “frustration”. But it opens up the possibility of a very general viewpoint where the connection between suffering and error signalling does not need to be concerned with goals or rewards at all. We all know that it is unpleasant if we expect something and then it does not happen, even if the event we were predicting was neutral in the sense of providing no reward. Thus, it is possible that there is some kind of suffering in almost any prediction error.40 Most interestingly, it has been proposed that dopamine signals prediction errors for events not related to reinforcement, so it might provide a neural mechanism for general signalling of errors.41

In fact, I am tempted to think that desire (or aversion) in itself, especially when combined with intention, can immediately create some kind of suffering even before any frustration. Perhaps the internal representation of a goal state which is different from the current state is an error that automatically leads to the triggering of an error signal, and to suffering. (Alternatively, it could be that the system predicts that there will likely be frustration, and this leads to suffering by some kind of anticipation.) I will not develop this point any further here, but I point out that aversion in the form of irritation is clearly a kind of suffering in itself, while it is somewhat inadequately explained by the developments in earlier chapters.42

Frustration is further modified by the context. If you are deliberately engaged in the learning of, say, a new skill, errors are quite natural and you are likely to feel less frustration; in a sense, you are expecting that there are errors. Or, if your prediction of the reward is uncertain, i.e. only very approximate, the frustration is likely to be weaker. We will have much more to say about such effects in later chapters.

To recapitulate, we see quite a wide spectrum of frustration-related error signalling. Not reaching a goal, not getting an expected reward, or making an error in predicting any event, can all be seen in this same framework. They work on different time scales, and use different systems in the dual-process framework. It seems that particularly strong suffering is obtained by frustration of planning, and even stronger by frustration of self-needs.

Why there is frustration: Outline of the rest of this book

We also need to understand why there is frustration in the first place. On some level, it is obvious that we cannot always reach our goals, or get what we want, if only because of the limitations in our physical skills and strength: We cannot move mountains. The world is also inherently uncertain and unpredictable, so even the perfect plan may fail because something unexpected happens. Yet, more interesting for our purposes are the cognitive limitations. As argued earlier, cognition is something that can be relatively easily intervened on, and modified to some extent. Thus it is more feasible to reduce suffering by focusing on the cognitive mechanisms, instead of trying to develop devices that physically move mountains. Therefore, it is crucial to understand in as much detail as possible how various processes of information-processing contribute to suffering.

We have already seen several information-processing limitations that can produce or amplify frustration. For example, planning is difficult due to the exponential explosion of the number of paths, which means our plans may be far from optimal. We need a lot of data for learning: data may be lacking to build a good model of the world, or to learn quantities such as state values. Categories are often used in action selection—in particular, if the world is divided into states—but these categories may not even be well-defined. The cognitive system may be insatiable and always want more and more rewards. There are several self-related needs which can create particularly strong suffering by mechanisms related to frustration.

Next, Part II goes into more depth regarding such limitations that produce suffering, focusing on the origins of uncontrollability and uncertainty. Later, Part III will consider methods for reducing suffering, mainly by reducing frustration. I will summarize all the different aspects of frustration in a single “equation”, and propose various methods, or interventions, to reduce frustration based on the theory of Parts I and II. Such methods will largely coincide with what Buddhist and Stoic philosophy propose, and include mindfulness meditation as an integral tool.