Chapter 6
Suffering due to self-needs

In addition to frustration, we identified another cause of suffering in Chapter 2: loosely speaking, suffering related to self. Self is a concept with a bewildering array of meanings. Psychology, philosophy, and neuroscience offer a multitude of definitions, and I can make no claim to treat the concept comprehensively.

I focus here on two meanings of “self” directly related to suffering. First, self as the target of an evaluation of the agent’s long-term success. The human brain, in particular, has a system that constantly evaluates the agent, checking whether the goals set were reached or rewards obtained, and seeks to improve its general performance. Second, we have self as the target of self-preservation, or survival instinct: all animals have behavioural tendencies to avoid death or organic damage. (A third meaning of self, related to control, will be treated in Chapter 11, and the concept of self-awareness, in Chapter 12.)

Such self-evaluation and self-preservation are computational mechanisms which are constantly operating in animals, and it is easy to justify their computational utility for any intelligent agent. Although at first sight, these aspects of self may seem to provide a mechanism for suffering which is completely different from frustration, I will also show how they are related to frustration of internal, higher-level goals and rewards. As the title of this chapter indicates, these aspects of self can thus be seen as needs, or desires, and they can be frustrated.

Self as long-term performance evaluation

Let us start with self as something whose performance is being constantly evaluated at different levels. As we saw earlier, in reinforcement learning, every single action is always evaluated to improve future actions. The reward prediction error is computed even in the simplest algorithms. If the reward is incorrectly predicted, the error is used by the learning algorithm to improve the prediction—if the prediction was too high, set it lower in the future, for example. While such computations are crucial for learning to act optimally, the errors also trigger the suffering signal according to the theory of the preceding chapter.
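
To make this concrete, here is a minimal sketch in Python of such an error computation; the numbers, the learning rate, and the function name are purely illustrative assumptions, not any particular algorithm used in the brain or in state-of-the-art AI.

```python
# A minimal sketch (illustration only) of how a reward prediction error drives
# learning: the agent keeps a predicted value for one situation, compares it
# with the reward actually received, and nudges the prediction accordingly.
# A negative error is the kind of signal that, on the theory of the preceding
# chapter, would also trigger suffering.

predicted_reward = 5.0   # what the agent currently expects
learning_rate = 0.1      # how strongly each error changes the prediction

def update_prediction(predicted, actual, lr):
    """Return the prediction error and the corrected prediction."""
    error = actual - predicted              # reward prediction error
    new_predicted = predicted + lr * error  # too high -> lowered, too low -> raised
    return error, new_predicted

error, predicted_reward = update_prediction(predicted_reward, actual=2.0, lr=learning_rate)
print(error, predicted_reward)   # error is -3.0: the reward was worse than expected
```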

However, the situation is complicated by the fact that the learning algorithms themselves contain many parameters describing how the algorithm works. One fundamental parameter is how quickly the system should learn: If it learns too quickly, new information will tend to override old information, thus leading to forgetting. Below, we will see another parameter, which determines how much of its time the agent should spend on relatively random exploration of the environment. There are many such parameters in a sophisticated learning system.

Therefore, a sophisticated AI should be able to adjust such internal parameters by itself. This is called learning to learn.1 Such learning to learn requires constant monitoring of the performance of the basic learning algorithms. If the current internal-parameter settings do not lead to good learning, adjustments have to be made. This requires an internal signalling system, not unlike the suffering signal, but typically working on a longer time scale, since it takes a long time to see whether a learning system learns well.2
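
The following sketch illustrates, under assumed thresholds and an assumed time scale, what such monitoring might look like: a hypothetical MetaMonitor watches the basic learner’s recent errors and halves the learning rate whenever the errors stop shrinking.

```python
# A hedged sketch of "learning to learn": a hypothetical monitor watches the
# basic learner's errors over a long window and adjusts an internal parameter
# (here, the learning rate) when performance stops improving. The thresholds
# and the halving rule are illustrative assumptions, not a standard algorithm.

from collections import deque

class MetaMonitor:
    def __init__(self, learning_rate=0.1, window=100):
        self.learning_rate = learning_rate
        self.window = window
        self.recent_errors = deque(maxlen=window)   # long time scale
        self.previous_average = None
        self.steps = 0

    def observe(self, prediction_error):
        self.recent_errors.append(abs(prediction_error))
        self.steps += 1
        if self.steps % self.window != 0:
            return                                  # evaluate only occasionally
        average = sum(self.recent_errors) / len(self.recent_errors)
        if self.previous_average is not None and average >= self.previous_average:
            # Errors are not shrinking: learning is going badly, so adapt,
            # e.g. learn more slowly to avoid overwriting older knowledge.
            self.learning_rate *= 0.5
        self.previous_average = average

monitor = MetaMonitor()
for step in range(300):
    monitor.observe(prediction_error=1.0)   # errors never shrink...
print(monitor.learning_rate)                # ...so the learning rate has been reduced
```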

Self-esteem and depression

In humans, mood is a signalling system working on a longer time scale. Mood is defined as an emotional state which is more long-lasting than single emotional episodes (such as being angry or feeling afraid). A low mood may take days, if not weeks or months, to change. A psychological concept which works on an even longer time scale is self-esteem: an overall view of the self as worthy or unworthy.3

Depression may in fact be an extreme case of the performance signalling made by the self-evaluation system. One theory proposes that depression occurs when goals are not reached, and moreover, constant attempts to improve performance fail.4 That is, the agent has to admit that whatever it tries, nothing works. In such a case, there is still one last strategy that may help: wait and see. The environment may eventually change by itself, even if you do nothing. Perhaps, after a while, with some luck, the circumstances will be more favourable. Such a “wait and do nothing” program may explain some depressive symptoms, such as passivity and lack of interest in any activities.5

It would clearly make sense to program such a “depressive” mechanism in an AI. If the current algorithms are simply not working at all, it would be better for the agent to just wait and see if the world changes for the better. Such waiting will save energy, and perhaps will also enable the AI to perform some further computations to improve its performance in the meantime.
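
As a purely speculative illustration, with arbitrary thresholds, such a wait-and-see rule could be as simple as the following:

```python
# A speculative sketch of a "depressive" mode: if the long-term performance
# monitor sees no improvement over many evaluations, the agent stops acting
# and waits. The patience parameter and the inputs are invented for illustration.

def choose_mode(improvement_history, patience=10):
    """Return 'act' normally, or 'wait' after repeated failures to improve."""
    recent = improvement_history[-patience:]
    if len(recent) == patience and all(delta <= 0 for delta in recent):
        return "wait"    # nothing the agent tries is working: conserve energy
    return "act"

print(choose_mode([0.2, 0.1, 0.0]))        # 'act': still improving, or too early to tell
print(choose_mode([0.0, -0.1, 0.0] * 4))   # 'wait': no improvement for a long time
```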

Self-destructing systems

What if an AI comes to the conclusion that it is not able to fulfil its task at all? Perhaps something went very much wrong in the design of the learning algorithm, or the task is completely impossible, and the circumstances do not seem to change for the better. The most extreme solution would then be for the AI to “destroy” itself.

Suppose you launch many AI agents, or programs, that work more or less independently inside some computing system. If one of the agents is not achieving anything, it would be natural that you terminate its execution. This would free up computational resources for other agents—assuming all the agents are running on the same shared processors—and other agents might be more successful.

To make this possible, there has to be a system for evaluating each AI agent’s performance as a whole. Importantly, the evaluation does not have to be done by an external mechanism; it could be part of the agent itself, which could then decide to self-destruct. There is nothing paradoxical or impossible in such a self-destruction system. It can be explicitly programmed in the agent by a human programmer—while it may indeed be quite impossible for the agent itself to learn such self-destruction behaviour.
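
A minimal sketch of such an explicitly programmed rule might look as follows; the evaluation period, the performance threshold, and the class name are assumptions made for illustration only.

```python
# A sketch, under assumed thresholds, of an explicitly programmed
# self-termination rule: the agent evaluates its own cumulative performance
# and marks itself as finished if it concludes it is achieving nothing.

import statistics

class ExpendableAgent:
    def __init__(self, minimum_average_reward=0.0, evaluation_period=1000):
        self.rewards = []
        self.minimum_average_reward = minimum_average_reward
        self.evaluation_period = evaluation_period
        self.alive = True

    def record(self, reward):
        self.rewards.append(reward)
        if len(self.rewards) % self.evaluation_period == 0:
            self.self_evaluate()

    def self_evaluate(self):
        # The evaluation is part of the agent itself, not an external judge.
        if statistics.mean(self.rewards) < self.minimum_average_reward:
            self.alive = False   # the surrounding system then frees its resources

agent = ExpendableAgent(evaluation_period=3)
for r in [-1.0, -2.0, -1.0]:
    agent.record(r)
print(agent.alive)   # False: the agent has judged itself not worth running
```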

It is possible that in some cases, even biological organisms engage in such self-destruction sequences. The idea is quite speculative because it is not obvious why evolution would favour such behaviour. The designer of an AI system can clearly create the self-evaluation and self-destruction systems explicitly, but in biological evolution, there is no such explicit designer. It may actually sound completely nonsensical to think that evolution could lead to self-destruction mechanisms, since an organism which destroys itself can no longer spread its genes.

However, evolution is a bit more complicated than just the survival of the fittest individual. It is widely appreciated that in evolutionary arguments, we should take into account not only the survival and reproduction of an individual, but also the survival and reproduction of its closest relatives. This leads to the concept of “inclusive fitness”, where the fitness of an individual takes into account the fitnesses of its relatives, weighted by the proportion of genes they share. Close relatives of an individual spread partly the same genes anyway, so their survival is evolutionarily useful for that individual.

According to one suggestion, if a person is seriously ill, and finds himself a great burden to his relatives, it might actually be evolutionarily advantageous for that person to commit suicide. If this helps the relatives with whom he shares a large proportion of genes, the suicide might actually help in spreading those genes, thus increasing the inclusive fitness.6

Thus, self-destruction programs may be useful not only to maximize the utility of AI agents, but also from an evolutionary perspective. This may sound abhorrent from a moral perspective, but that is often the case with evolution which has no reason to be nice or good from a human perspective—as already argued in the preceding chapter, where I compared evolutionary desires to obsessions.

Self as self-preservation and survival

Another rather obvious reason why some kind of concept of self should be programmed in an AI is that the AI may need to protect itself against anything that might destroy it. A robot must take care not to be run over by a car: This is the concept of self-preservation. There is no doubt we can, and probably want to, program some kind of self-preservation mechanism in an AI agent.

Even the simplest biological organisms have behavioural programs that are activated when their existence is threatened; we talk about self-preservation, or survival instinct. We already encountered related ideas in considering definitions of pain and suffering. The widely-used IASP definition related pain to “tissue damage” (🡭), while Cassell’s definition of suffering talked about the “intactness of the person” (🡭). However, what we are talking about here is threats to the very existence of the agent, not just damage.

While it seems relatively straightforward to program self-preservation behaviours in an AI, an open question is whether an AI can somehow develop a survival instinct by itself. In other words, can self-preservation emerge without being explicitly programmed; can the agent learn to perform certain actions for the main purpose of avoiding its own destruction? This is one of the deepest questions in AI, extremely relevant from the viewpoint of developing safe AI systems, and the subject of intense debate.7 We have seen earlier that learning in AI can have various side-effects and unintended consequences; this would be one of the most extreme ones.

On the one hand, there are those who point out that biological organisms have developed their survival instinct via evolutionary mechanisms. They have been subject to natural selection, which has ruthlessly weeded out those organisms which do not fight for their survival. In contrast—this line of argumentation goes—AI is not subject to natural selection; it has no evolutionary pressures. So, it will not learn a survival instinct, unless perhaps we explicitly decide to program it to learn one.

Other experts disagree and point out that some kind of survival instinct may be automatically created as an unintended side-effect of creating sufficiently intelligent machines. If a super-intelligent robot is given any mundane task, say fetching a bottle of milk from a nearby shop, it would understand that in order to perform that task, it has to stay alive. If the robot were damaged or destroyed in a collision with a car, for example, the task could not be performed. Thus, the robot might decide to destroy the car somehow (let’s assume the robot is really big) to get the milk safely delivered. If everybody in the car gets killed, that is irrelevant if the programmer didn’t tell the robot to avoid human casualties. The idea here is that there is no need to explicitly program a survival instinct, or any reward related to it: the general goal of maximizing future rewards will direct the robot’s behaviour towards avoiding destruction. In fact, this line of thinking means that almost any sufficiently intelligent AI will by logical necessity strive to survive. If it is intelligent enough, it will understand what death is, and how death makes it impossible to obtain any further rewards or accomplish goals. This is the opposite of what has happened in biological evolution, where even the very simplest organisms have a survival instinct, and sophisticated intelligence develops later. In AI, intelligence is programmed first, and later, possibly by chance, the AI might obtain a tendency for self-preservation behaviour and related information processing, which might then be called a survival “instinct”.

Clearly, these two views are based on very different assumptions about the AI. The argument where the robot understands that a car on a crash course has to be destroyed assumes a very, very intelligent robot. The robot must have a sophisticated model of the world, infer that it risks being run over by the car, and understand that being run over by the car will prevent it from delivering the milk. Most current robots would be nowhere near the intelligence required—but we don’t know if they will be in the future. We are even further away from an AI which could intellectually infer, on an abstract level, that there is such a thing as death, and that various measures should be taken to avoid it.

Nevertheless, if an AI learns using evolutionary algorithms instead of the conventional gradient-based algorithms, it might be perfectly possible for it to obtain a survival instinct, even at the current level of AI development. As reviewed earlier (🡭), optimization procedures mimicking evolution are sometimes actually used in AI. Large-scale application of evolutionary methods definitely has the potential of creating a survival instinct in AI agents. If the optimization method is similar enough to biological evolution, such an instinct is a necessary logical consequence of the fundamental evolutionary pressures: To spread its artificial “genes”, an agent has to survive long enough to produce offspring.

Self as desires based on internal rewards

Going back to our main topic, suffering, it is clear that both self-preservation and self-evaluation are important sources of suffering.8 First, it is well known that depression and low self-esteem create suffering, and they are largely produced by the self-evaluation system. It is, in fact, rather easy to see this as a form of frustration, so it is very much in line with the ideas of the preceding chapters. Self-evaluation is based on a set standard of how good the self should be, in terms of how much reward it should be able to obtain. If such self-evaluation returns a negative result, that can be seen as a form of frustration, similar to reward loss. One could say that the agent had a long-term desire or goal to achieve that standard of average rewards, but the agent failed.

Second, self-preservation is obviously behind (physical) pain, which is signalling when damage is happening to the physical organism, according to the IASP definition of pain (🡭). The same idea was extended to suffering by Cassell’s definition (🡭). He emphasizes “loss of the intactness of person” or “threat” thereof, and that this applies not only to physical intactness but to further aspects such as one’s self-image. Replace his term “person” by “self”, and an interpretation related to the discussion in this chapter is clear: self-preservation mechanisms signalling threats to self—even in a very wide sense of the word—directly create suffering.

Thus, in line with the literature review in Chapter 2, we seem to have two different kinds of suffering related to self-needs. One is born from frustration, in this case based on self-evaluation, and easy to understand by the theories of the preceding chapters. The other kind of suffering comes from a threat to the self, and has only been considered in this chapter. From this viewpoint, we would see suffering related to survival as based on a mechanism which is fundamentally different from frustration, and thus rather different from anything we have treated in the preceding chapters. These two mechanisms might only be similar in the sense that both are forms of error signalling.

However, these seemingly different sources of suffering can be brought together by seeing self-preservation as a form of desire as well. In fact, self-preservation can be seen as a long-term goal or desire which can be frustrated: it is a desire to survive. This is in line with van Hooft’s theory of suffering (🡭), where different aspects of one’s being have different needs, ranging from biological survival to meaning of life. This shows how the two different mechanisms of suffering identified in Chapter 2 have a much closer connection than it might first seem.

In more computational terms, a direct way of linking self and desires is based on defining internal rewards (or intrinsic motivation). Reward is, by definition, what an AI agent ultimately wants when it is trained in the conventional framework. As we have seen, just wanting immediate reward is quite short-sighted: If the agent is intelligent enough, it will try to compute the state-value function and thus take future rewards into account. But even the state-value function framework, with discounted future rewards, may not always provide the best practical solution to the problem of maximizing rewards. This is because the value function may be extremely difficult to learn: there may not be enough data to learn it, and even with enough data, it may be incredibly complex to compute.

Therefore, it has been found that it is often useful to program some additional rewards in the agent, in particular rewards that somehow improve its long-term functioning. That is, the system is programmed to receive internally generated reward signals in addition to actual, “external”, rewards. These internally generated reward signals are treated by the learning and planning systems just as if they were real reward signals. Such internal rewards lead to what is called “intrinsic motivation” for behaviour; it could also be called “intrinsic desire”.
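
In the simplest case, this amounts to nothing more than adding the two signals together, as in the following sketch; the weighting factor is an arbitrary assumption.

```python
# A minimal sketch of internal (intrinsic) reward: the learning system simply
# adds an internally generated bonus to the external reward, and everything
# downstream treats the sum as if it were ordinary reward. The weighting
# factor of 0.1 is an assumption made for illustration.

def total_reward(external_reward, internal_reward, weight=0.1):
    """External reward plus a weighted internally generated reward."""
    return external_reward + weight * internal_reward

# e.g. a small external reward plus a curiosity bonus for seeing something new
print(total_reward(external_reward=1.0, internal_reward=3.0))   # 1.3
```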

Curiosity as an example of internal reward

As a practical example of such intrinsic reward, let us consider curiosity, which is widely used in current AI. The starting point here is that when an agent learns in a real environment, the data it receives is strongly influenced by its own actions. If a robot never enters a room, it will not know what is in that room. The decision to enter or not to enter that room will strongly affect the data it gets about that room. This is a problem since, usually, the agent does not know what kind of actions create useful data. Therefore, learning to act intelligently necessarily requires a lot of trial and error. That is, the agent just tries out rather random actions in each possible situation. Some such exploration is in fact built into almost any agent learning by reinforcement learning. A very simple way of achieving it is to randomize the actions: for example, in 1% or 10% of the time steps, the agent could take a completely random action just to see what happens.9
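
This kind of randomized exploration, often called epsilon-greedy in the reinforcement-learning literature, can be sketched in a few lines; the action names and the 10% exploration rate are merely illustrative.

```python
# A sketch of the simple randomized exploration described above: with a small
# probability epsilon (here 10%), the agent ignores its learned values and
# takes a completely random action "just to see what happens".

import random

def epsilon_greedy(action_values, epsilon=0.1):
    """Pick the best-known action most of the time, a random one otherwise."""
    if random.random() < epsilon:
        return random.choice(list(action_values))          # explore
    return max(action_values, key=action_values.get)       # exploit

values = {"enter_room": 0.2, "stay_in_corridor": 0.5}
print(epsilon_greedy(values))
```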

If you want to buy a new electronic gizmo you have never bought before, a basic exploration strategy would mean you just randomly enter different shops, try to buy it, and depending on whether they sold it to you or not, and at what price, you slowly update your value function. Most of your time would probably be spent trying to buy the gizmo in fashion stores that don’t stock any. Because your actions are quite random, you will end up going to the same stores several times, to the great annoyance of the shop assistants. Since you move around randomly, you easily end up going round and round in the same neighbourhood. Gathering data for reinforcement learning is thus particularly difficult because the agent needs to try out different actions, but if this is done completely randomly, much of the time it will take actions that are not very useful for learning, and don’t bring any reward either.

Here we come to the idea of curiosity. It means that the agent does not try out completely random actions, which is very inefficient, but there is an internal mechanism that steers the exploration in an intelligent way. What we are talking about here is designing an intrinsic reward system that leads to particularly intelligent exploration.10 Basically, the agent should try out new actions if they are informative. If the agent has never tried a certain action in a certain state, and it has no information that enables it to infer what such an action would do, it would be useful to just try it out. That is, instead of completely randomly trying out new actions, the agent should try out actions whose effects it does not know and cannot predict. This is a more sophisticated form of exploration, and similar to what we would call curiosity in humans: try out things which you never did before—but don’t repeat them once you’ve seen what happens! An intrinsic reward should then be given to the agent every time it successfully engages in such curious exploration and obtains new information.
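
One simple way of implementing such a curiosity reward, sketched below with invented state names, is to give a bonus that is large for rarely visited situations and fades with repetition; counting visits is only one of several approaches used in practice.

```python
# A hedged sketch of curiosity as an internal reward: the agent gets a bonus
# for visiting situations it has rarely seen before, so exploration is steered
# toward the unfamiliar instead of being purely random.

from collections import Counter

visit_counts = Counter()

def curiosity_reward(state):
    """Internal reward that is large for novel states and fades with repetition."""
    visit_counts[state] += 1
    return 1.0 / visit_counts[state]

print(curiosity_reward("new_neighbourhood"))   # 1.0: never seen before, very interesting
print(curiosity_reward("new_neighbourhood"))   # 0.5: already less interesting
print(curiosity_reward("same_old_shop"))       # 1.0 the first time only
```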

Curiosity enables the agent to better learn the general structure of the world it is living in, since it will more systematically explore as many possibilities of action as possible. Such exploration can greatly improve future planning, since the agent will learn a better model of the world, and thus it indirectly contributes to future reward.11 In the gizmo shopping example above, you would not enter the same store twice, since re-entering the same shop gives little new information. You would actually get an internal reward for going to a different street, even a new neighbourhood, which certainly increases your chances of finding the right kind of store. It is likely that such curiosity has been programmed in animals by evolution.12

Programming self as internal rewards

Some aspects of the self could clearly be programmed as internal rewards. Self-preservation is obviously one: Most reasonable programmers would probably assign a large negative reward to the destruction of the agent, since losing the agent is certainly expensive in most cases. Then, the planning system will not go to states leading to the agent being destroyed. In fact, you would ideally program the agent so that it keeps quite far away from anything like destruction. This is possible by programming an internal reward which gives a negative reward at any state that is even close to destruction. In other words, any perceived threat triggers a negative internal reward signal.13 Thus, the agent tries to keep far away from threatening situations, as if it had a desire for self-preservation—or more generally for safety, meaning the absence of threats. This is how we can connect rewards to the IASP definition of pain and Cassell’s definition of suffering, where not only damage or loss of intactness causes pain and suffering, but a threat as well (“potential damage” in the IASP definition). If an unexpected threat appears, a negative internal reward (or internal “punishment”) is triggered, and that causes reward loss and frustration.14
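
A sketch of such a threat-based internal reward might look as follows; the notion of a numerical distance to destruction and all the particular values are hypothetical simplifications.

```python
# A sketch of programming self-preservation as internal reward: any state
# judged to be close to destruction receives a negative internal reward, so
# the planner learns to keep well away from threats. The distance measure and
# the thresholds are hypothetical.

def threat_penalty(distance_to_destruction, danger_radius=5.0, penalty=-10.0):
    """Negative internal reward that grows as the agent nears destruction."""
    if distance_to_destruction >= danger_radius:
        return 0.0                              # safely far away: no signal
    closeness = 1.0 - distance_to_destruction / danger_radius
    return penalty * closeness                  # the closer, the stronger the punishment

print(threat_penalty(6.0))   #  0.0
print(threat_penalty(1.0))   # -8.0: a nearby threat triggers a strong negative reward
```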

As already mentioned, the self-evaluation system is clearly nothing other than an internal reward and punishment system, which steers the agent’s behaviour in a certain direction. The difference from ordinary rewards is not only that these self-evaluation rewards come from the internal evaluation system: another fundamental difference is that the self-evaluation system gives internal rewards to the “learning to learn” system, which sets internal parameters of the system and works on a longer time scale. That system does not directly affect the plans made by the agent, but it tries to improve the general functioning of the planning system to improve all future planning. In animals, such internal rewards are presumably programmed by evolution, since their utility can only be seen over a very long time horizon.15

Self and suffering in Buddhist philosophy

In line with the concept of internal rewards, the Buddha mentions three different kinds of desires: desire for sense pleasures, desire to be, and desire not to be. While the first one can be interpreted as desire for rewards in the ordinary AI sense, the “desire to be” can be interpreted as desiring the self to simply be in the sense of surviving, and further that the self should be something particular. These two kinds of “desire to be” correspond to the self-needs as defined in this chapter. (The “desire not to be” is, in this interpretation, the desire that the self is not something which is considered bad.) Thus, even in early Buddhist philosophy, suffering related to self has been to some extent reduced to suffering related to desires and frustration.16

In later schools of Buddhism, the importance of self was greatly magnified, and some texts even seem to attribute all desires and all suffering to the existence of the “self” (sometimes translated as the “ego”) or attachment to it. This means viewing the connection between desires and self from the opposite angle, considering the self as the source of all desires—instead of the self being the target of some very specific desires as in this chapter. For example, self-preservation requires certain actions to be performed, certain goals to be set, and thus self-preservation leads to desire towards those particular goals.17

Uncertainty, unpredictability, and uncontrollability as internal frustration

Another important case of internal rewards concerns the properties of uncertainty, unpredictability, and uncontrollability. They are strongly related to frustration, as has been mentioned earlier, and will be considered in detail in later chapters: if the world is, say, uncontrollable, frustration is difficult to avoid. However, there is another remarkable connection: these properties seem to be sources of suffering in themselves as well.

One of the most robust findings in studies of economic decision-making is that humans are willing to pay money to reduce uncertainty. Such “risk aversion” can be evolutionarily advantageous and is observed even in animals.18 In addition to such rational economic calculations, uncertainty also feels unpleasant in the body, and it is an important factor in stress. Likewise, lack of control is usually considered detrimental to mental well-being. Psychological experiments show that lack of control and uncertainty even make physical pain feel worse.19

We can understand this interplay by using, again, the concept of internal rewards. It could very well be that uncertainty, unpredictability, or uncontrollability are suffering in themselves because they lead to frustration of specific internal rewards. If, say, controllability is lower than some expected standard, a frustration signal could be launched. That would be useful for learning because it signals that the agent has failed in learning about the environment; it should not have got itself into a situation where controllability is that low. This is equivalent to a self-evaluation system which considers that the agent should not be in situations that are uncertain, difficult to predict or difficult to control. By such mechanisms, uncontrollability, as well as uncertainty and unpredictability, can directly lead to suffering.20
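
The following sketch illustrates the idea under an arbitrary standard of controllability: when the agent’s actions produce the intended outcomes less often than the standard demands, a frustration-like signal is emitted.

```python
# A speculative sketch of uncontrollability as internal frustration: the agent
# estimates how often its actions actually produce the intended effect, and if
# that estimate falls below an expected standard, a frustration-like signal is
# emitted. The standard of 0.75 is an arbitrary illustrative assumption.

def controllability_frustration(intended_outcomes, achieved_outcomes, standard=0.75):
    """Return a non-negative frustration signal when control falls short of the standard."""
    successes = sum(1 for i, a in zip(intended_outcomes, achieved_outcomes) if i == a)
    controllability = successes / len(intended_outcomes)
    return max(0.0, standard - controllability)

print(controllability_frustration("AAAB", "AAAB"))   # 0.0: full control, no frustration
print(controllability_frustration("AAAA", "ABBB"))   # 0.5: little control, much frustration
```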

Fear, threat, and frustration

Fear is a phenomenon that is central to understanding human suffering. It has an obvious connection to self-needs, in particular survival. In fact, it may seem a bit too abstract to talk about suffering as coming from a survival instinct as I did above: such suffering may always be mediated by a feeling of fear. Fear is actually a multifaceted phenomenon and we will consider various aspects of fear in later chapters (especially Chapter 8). For now, let us just look at fear from the viewpoint of self-needs, and investigate the fundamental mechanism for suffering operating in connection with fear.

Suppose you suddenly find yourself in the presence of a tiger in a jungle. You are likely to suffer at that very moment, but why exactly? It is not that you missed something you wanted to have, or some reward you anticipated, so this is not a case of typical frustration. (Nor is it obviously a case of aversion-based frustration, where you didn’t expect something unpleasant to happen but it did, because the tiger hasn’t yet attacked you.) The case is rather that you are predicting that something terrible will happen with a non-negligible probability. Aristotle proposed that “Fear may be defined as a pain or disturbance due to a mental picture of some destructive or painful evil in the future”21 (my italics). Such a prediction, or a “mental picture”, contains a threat and falls within the scope of Cassell’s definition of suffering due to threat to the intactness of the self or person.

Just as I discussed in connection with self-preservation, it could be argued that we should consider such suffering produced by fear as fundamentally different from frustration. Yet, we can again sketch an interpretation which frames fear as a form of frustration, based on internal rewards. First of all, fear is usually, if not always, accompanied by uncontrollability and unpredictability, which were just seen to produce frustration based on internal rewards and related reward loss. Furthermore, suppose there is another internal reward system based on evaluation of the threats related to different states, as proposed above (🡭): any state which is threatening, i.e. predictive of tissue damage or death, is internally programmed to lead to a negative internal reward. Thus, when entering the state where the tiger unexpectedly appears, such an internal reward system issues a negative reward, and this leads to reward loss, because you didn’t expect any such negative reward. This is a simple way of interpreting fear as a frustration: fear is due to the appearance of a threat which leads to internal reward loss.22

In this example, the planning system can amplify the frustration, because faced with a threat, planning may be launched with the goal state taken as any state where the threat is not present: You are frantically thinking about what to do to be safe. Planning is attempted, but it fails: No plan is found that would get rid of the threat, or if such a plan is found, its execution fails. Thus, any goal state that would be safe is not reached, and there is frustration even in the sense of plans failing.23

Fear and the level of intelligence

A simple AI agent might only generate the suffering signal when something bad happens, such as when it fails in its tasks—this is the basic case of frustration. Suppose a thermostat connected to a heating system tries to keep the room at a constant temperature. (This is actually a task that the nervous systems of many animals face as well.) It continually monitors the room temperature and adjusts its actions accordingly. Its function is based on a simple error signal created when the room gets too hot or too cold. When the temperature is suitable, there would be no error signals whatsoever, and certainly no suffering.

Now, suppose you make the thermostat very intelligent, so that it is able to predict the future and evaluate itself, perhaps even think about its own survival. Then, it might not only suffer when the room temperature is wrong, but also when it anticipates that this might happen. Your hyperintelligent thermostat might be reading the weather forecast on the internet. Suppose the forecast says tomorrow night will be exceptionally cold, beyond the capacities of the heating system; the thermostat anticipates that tomorrow night it will not be able to keep the temperature high enough. Thus, the thermostat suffers due to such a fear—at least in the computational sense.24
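
To make the contrast concrete, here is a sketch of both thermostats; the forecast interface and all the numbers are invented for illustration.

```python
# A sketch of the two thermostats: the simple one reacts only to the current
# temperature error, while the predictive one reads a (hypothetical) forecast
# and generates a suffering-like signal already when it predicts that it will
# fail tomorrow. All values are illustrative assumptions.

TARGET = 21.0            # desired room temperature, degrees Celsius
MAX_HEATING_LIFT = 30.0  # how far the heater can raise the temperature above outdoors

def current_error_signal(room_temperature):
    """The simple thermostat: suffers only when the temperature is wrong now."""
    return abs(room_temperature - TARGET)

def anticipated_error_signal(forecast_outdoor_temperatures):
    """The predictive thermostat: suffers already from forecast failures."""
    worst_shortfall = 0.0
    for outdoor in forecast_outdoor_temperatures:
        achievable = outdoor + MAX_HEATING_LIFT
        worst_shortfall = max(worst_shortfall, TARGET - achievable)
    return worst_shortfall

print(current_error_signal(21.0))                  # 0.0: no suffering right now
print(anticipated_error_signal([-5.0, -15.0]))     # 6.0: tomorrow night looks too cold
```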

The extraordinary thing here is that the hyperintelligent thermostat suffers even long before anything bad happens, before, say, actual frustration is produced, merely by virtue of anticipating a bad outcome. Becoming more intelligent means the agent can predict bad things, suffer based on those predictions, and thus suffer much more than it did earlier. “One who fears suffering is already suffering from what he fears”, according to Michel de Montaigne.25 Humans suffer enormously because they are too intelligent in this sense, and prone to thinking too much about the future—a theme I will return to in Chapter 9, where I talk about simulation of the future.26

Yet, if we humans are so incredibly intelligent, why can we not just decide not to fear anything? Why can we not take Montaigne’s point seriously? He suggested—actually talking about his chronic pain due to kidney stones—that there is no point in imagining or anticipating future pain, since that simply induces more suffering. This is a complex question; part of the answer lies in the dual-process nature of human cognition, which will be treated in the following chapters.