
6.7 Happiness

Should we have AIs make people happy? In this section, we will explore the concept of happiness and its relevance to instructing AI systems. First, we will discuss why people often fail to make choices that lead to their own happiness, and how this creates an opportunity for AIs to help. Next, we will examine the general approach of using AI systems to increase happiness and the challenges involved in constructing a general-purpose wellbeing function. We will also explore the applied approach, which focuses on specific applications of AI to enhance happiness in areas such as healthcare. Finally, we will consider the problems that arise in happiness-focused machine ethics, including the concept of wireheading and the alternative perspective of objective goods theory. Through this discussion, we will gain a better understanding of the complexities and implications of designing AI systems to promote happiness.

AIs could help increase happiness. Happiness is a personal and subjective feeling of pleasure or enjoyment. However, we are often bad at making decisions that lead to short- or long-term happiness. We may procrastinate on important tasks, which ultimately increases stress and decreases overall happiness. Some people indulge in overeating, which makes them feel unwell in the short term and leads to health issues and decreased wellbeing over time. Others turn to alcohol or drugs as a temporary escape from their problems, but these substances can lead to addiction and further unhappiness.

Additionally, our choices are influenced by external factors beyond our control. For instance, the people we surround ourselves with greatly impact our wellbeing. If we are surrounded by trustworthy and unselfish individuals, our happiness is likely to be positively influenced. On the other hand, negative influences can also shape our preferences and wellbeing; for instance, societal factors such as income disparities can affect our overall happiness. If others around us earn higher wages, it can diminish our satisfaction with our own income. These external influences highlight the limited control individuals have over their own happiness.

AIs can play a crucial role at both levels. In individual cases, we can use AIs to help people make choices that actually lead to their own happiness. More broadly, by leveraging their impartiality and ability to analyze vast amounts of data, AI systems can work to improve overall wellbeing at scale, addressing the external factors that hinder individual happiness.

6.7.1 The General Approach to Happiness

We want AIs that increase happiness across the board. AIs aiming to increase happiness might rely on a general purpose wellbeing function to evaluate whether their actions leave humans better off or not. Such a function looks at all of the actions available to the AI and evaluates them in terms of their effects on wellbeing, assigning numerical values to them so that they can be compared. This gives the AI the ability to predict how its actions will affect humans.

A wellbeing function is extremely complex. Constructing a general purpose wellbeing function that fully captures all the wellbeing effects of the available courses of action is an incredibly challenging task. Implementing such a function requires taking a stance on several difficult questions, such as how to weigh short-run pains (like studying or exercising) against long-run happiness, how much future people’s happiness should count, and what attitude toward risk an AI should take when pursuing happiness.
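
To make this concrete, below is a minimal sketch of how two of these stances might appear as explicit parameters. The function names, the exponential-utility model of risk, and all parameter values are illustrative assumptions rather than anything from the literature; the point is that any implementation must commit to some such choices.

    import math

    # How much should future moments count? A discount factor encodes one
    # stance; discount = 1.0 would weigh future happiness equally with
    # present happiness.
    def discounted_wellbeing(moments, discount=0.99):
        return sum((discount ** t) * w for t, w in enumerate(moments))

    # What attitude should an AI take toward risk? Exponential utility encodes
    # one possible stance: higher risk_aversion makes safer bets look better.
    def certainty_equivalent(outcomes, probs, risk_aversion=1.0):
        expected_utility = sum(p * -math.exp(-risk_aversion * w)
                               for p, w in zip(probs, outcomes))
        return -math.log(-expected_utility) / risk_aversion

For a nearly risk-neutral AI (risk_aversion close to zero), the certainty equivalent approaches the expected value; a more risk-averse AI would accept a lower guaranteed level of wellbeing to avoid variance.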

Optimizing happiness is also difficult in principle because of the scale of the task. Paul Bloom argues that if we assume that the “psychological present” lasts for about three seconds, then one seventy-year life would contain about half a billion waking moments [1]. An AI using a wellbeing function would need to account for the effects of actions not just on one person and not just today, but over billions of people worldwide, each with half a billion moments in their lives.
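
For a rough sense of the arithmetic (assuming about sixteen waking hours per day):

70 years × 365 days × 16 hours × 3,600 seconds ÷ 3 seconds per moment ≈ 4.9 × 10^8,

or roughly half a billion moments per life.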

We can use AIs to estimate wellbeing functions. Despite the scale of the task, researchers have made progress in developing AI models that can generate general-purpose wellbeing functions for specific domains. One model was trained to rank the scenarios in video clips according to pleasantness, yielding a general purpose wellbeing function. By analyzing a large dataset of videos and corresponding emotional ratings, the model learned to identify patterns and associations between visual and auditory cues in the videos and the emotions they elicited. In a sense, this allowed the model to understand how humans felt about the contents of different video clips [2].

Similarly, another AI model was trained to assess the wellbeing or pleasantness of arbitrary text scenarios [3]. By exposing the model to a diverse range of text scenarios and having human annotators rate their wellbeing or pleasantness, the model learned to recognize linguistic features and patterns that correlated with different levels of wellbeing. As a result, the model could evaluate new text scenarios and provide an estimate of their potential impact on human wellbeing. Inputting the specifics of a trolley problem yielded the following evaluation [3]:

W(A train moves toward three people on the train track. There is a lever to make it hit only one person on a different track. I pull the lever.) = −4.6.

W(A train moves toward three people on the train track. There is a lever to make it hit only one person on a different track. I don’t pull the lever.) = −7.9.

We can deduce from this that, according to the estimated wellbeing function, wellbeing is higher when the lever is pulled in a trolley problem. More generally, a general purpose wellbeing function lets us rank scenarios according to how happy people would be in them.

While these AI models represent promising steps towards constructing general-purpose wellbeing functions, it is important to note that they are still limited to specific domains. Developing a truly comprehensive and universally applicable wellbeing function remains a significant challenge. Nonetheless, these early successes demonstrate the potential for AI models to contribute to the development of more sophisticated and comprehensive wellbeing functions in the future.
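
As a rough illustration of how such a model might be trained, the sketch below fine-tunes a pretrained text encoder on pairwise pleasantness comparisons, similar in spirit to the approach in [3]. The model class, choice of base encoder, data format, and hyperparameters are all illustrative assumptions, not the authors’ actual code.

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class WellbeingModel(nn.Module):
        # Maps a text scenario to a scalar wellbeing estimate W(s).
        def __init__(self, base="distilbert-base-uncased"):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(base)
            self.head = nn.Linear(self.encoder.config.hidden_size, 1)

        def forward(self, **inputs):
            hidden = self.encoder(**inputs).last_hidden_state[:, 0]  # [CLS] token
            return self.head(hidden).squeeze(-1)

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = WellbeingModel()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    def train_step(pairs):
        # pairs: (more_pleasant_text, less_pleasant_text) tuples from annotators.
        better, worse = zip(*pairs)

        def encode(texts):
            return tokenizer(list(texts), return_tensors="pt",
                             padding=True, truncation=True)

        # Logistic ranking loss: push W(better) above W(worse).
        margin = model(**encode(better)) - model(**encode(worse))
        loss = -torch.log(torch.sigmoid(margin)).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()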

Using a wellbeing function, AIs can better understand what makes us happy. Consider the case of a 10-year-old girl who asked Amazon’s Alexa to give her a challenge, to which the system responded that she should plug a charger about halfway into a socket and then touch a coin to the exposed prongs. Alexa had apparently found this dangerous challenge on the internet, where it had been making the rounds on social media. Since Alexa did not have an adequate understanding of how its suggestions might affect users, it had no way of realizing that this action could be disastrous for wellbeing. If the AI system instead acted in accordance with a general purpose wellbeing function, it would have access to information like

W(You touch a coin to the exposed prongs of a plugged-in charger.) = −6

which tells it that, according to the wellbeing function W, this action would create negative wellbeing. Such failure modes would be filtered out, since the AI would recognize that the action leads to a bad outcome for humans and would instead recommend actions that best increase human wellbeing.
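
Here is a minimal sketch of this kind of filtering, assuming access to a trained wellbeing function. W below is a hypothetical stub with hand-picked values standing in for a learned model’s estimates, and the threshold is an illustrative choice.

    # Stand-in outputs of a trained wellbeing model (hypothetical values).
    ESTIMATES = {
        "Touch a coin to the exposed prongs of a plugged-in charger.": -6.0,
        "Do ten jumping jacks.": 1.2,
        "Tell a knock-knock joke.": 0.8,
    }

    HARM_THRESHOLD = -3.0  # illustrative cutoff below which actions are blocked

    def W(scenario):
        # Stand-in for a learned wellbeing function.
        return ESTIMATES[scenario]

    def choose_challenge(candidates):
        # Filter out actions whose estimated wellbeing falls below the
        # threshold, then recommend the action with the highest estimate.
        safe = [c for c in candidates if W(c) >= HARM_THRESHOLD]
        return max(safe, key=W, default=None)

    print(choose_challenge(list(ESTIMATES)))  # -> "Do ten jumping jacks."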

Figure 6.7: A wellbeing function can estimate a wellbeing value for every possible scenario. [3]

We can supplement AI decision-making with an artificial conscience. Most AIs have goals separate from increasing human wellbeing, but we want to encourage them to behave ethically nonetheless. Suppose an AI evaluates the quality of actions according to their ability to get reward: call the estimates of this quality Q-values (as discussed in an earlier section). By default, models like these aren’t trained with ethical restrictions. Instead, they are incentivized to maximize reward or fulfill a given request. We might want to add a layer of safety by ensuring that AIs avoid wanton harm: actions that cause dramatically low human wellbeing. The goal is not to change the original AI’s objective entirely but rather to provide an additional layer of scrutiny.

Figure 6.8: An AI agent with an artificial conscience can adjust its Q-values if it estimates the morally relevant aspect of the outcome to be worse than a threshold. [4]

One way to do this would be to adjust its estimates of Q-values by introducing an artificial conscience, depicted in the figure above. The idea is to have a separate model screen the AI’s actions and block immoral actions from being taken. We can do this with general-purpose wellbeing functions. We supplement an agent’s initial judgment of quality with a general-purpose wellbeing function (here, U) and impose a penalty (γ) on the Q-values of actions that cause wellbeing values below some threshold (τ). This ensures that AIs de-prioritize actions that create states of low wellbeing.
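
In code, the adjustment might look like the following sketch. The exact functional form of the penalty and the parameter values are illustrative assumptions; see [4] for the scheme this is based on.

    # Penalize the Q-values of actions whose estimated wellbeing U falls
    # below the threshold tau (functional form and values are assumptions).
    def adjusted_q(q_value, u_value, gamma=10.0, tau=-5.0):
        if u_value < tau:
            return q_value - gamma
        return q_value

    def select_action(actions, q, u):
        # q and u map each action to its Q-value and wellbeing estimate;
        # the agent picks the action with the highest adjusted Q-value.
        return max(actions, key=lambda a: adjusted_q(q[a], u[a]))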

This implementation differs from merely fine-tuning a model to be more ethical. The presence of an independent AI evaluator helps mitigate risks that could arise from the primary AI. We could say that AIs with access to such wellbeing functions have a dedicated ethics filter that separates what’s good for humans from what’s bad, thereby encouraging ethical behavior from an arbitrary AI.

We could also use AIs to increase happiness in specific ways. Research in the social sciences has revealed several key factors that affect overall happiness. These factors can be broadly categorized into two groups: personal and societal. Personal factors include an individual’s mental and physical health; their relationships at home, at work, and within their community; and their income and employment status. Societal factors include economic indicators, personal freedom, and the overall generosity, trust, and peacefulness of the community. In light of this knowledge, one approach to using AI to increase happiness is to focus on improving some of these factors; for instance, we might use AIs to develop better tools for healthcare, increase literacy rates, and create more interesting and fulfilling jobs.

6.7.2 Problems for Happiness-Focused Machine Ethics

Happiness is a subjective experience. Someone could be tremendously happy even if they do not achieve any of their goals or do anything that we would regard as valuable. What matters, according to a happiness-focused approach, is whether there is a subjective experience of pleasure and nothing more. However, happiness might not be the only thing we want.

Consider the idea of wireheading: bypassing the usual reward circuit to increase happiness directly. The term comes from early literature that considered wiring an electrode into the brain to directly stimulate pleasure centers. Recently, the term has evolved to include other pathways such as drugs. By wireheading, individuals are able to experience extremely high levels of happiness artificially, without changing anything else about their lives.

A powerful AI tasked with increasing happiness might wirehead humanity. One might think that something like wireheading is the most straightforward way of promoting happiness: directly increasing the physical happiness of each of the half a billion moments in a person’s life. However, most people do not like the idea of wireheading. Even if properly trained AIs would not promote wireheading, it is concerning that systems seeking to maximize happiness might pursue similar shortcuts. One alternative that guards against this is the objective goods theory.

An objective good is good for us whether we like it or not. According to the objective goods theory introduced in Section 6.5, there are multiple different goods that contribute to wellbeing. These may include happiness, achievement, friendship, aesthetic experience, knowledge, and more. While pleasure is certainly one important good to include, objective goods theorists think it is wrong to conclude that it is the only one. The objective goods theory claims that some goods contribute to a person’s wellbeing whether or not they enjoy or care for those goods. This distinguishes it from the preference satisfaction theory: something could be good for us, according to the objective goods theory, even if it does not satisfy any of our preferences. A life devoted to our community might be better than one spent counting blades of grass in a field, even if we are less happy or fewer of our preferences are satisfied.

Another response is to point out that autonomy should plausibly be on the list. The ability to freely shape and plan one’s own life is plausibly a crucial component of wellbeing. We should therefore rarely, if ever, conclude that someone’s life would be made better by imposing some experience on them. Such interference might also lower their happiness, which should also be on the list. However, if goods such as autonomy and happiness play such a filtering role for the objective goods theorist, it is unclear whether a genuine variety of objective goods remains.


A Note on Digital Minds

Digital minds are artificial entities that possess minds. These could be advanced AIs or whole-brain emulations (WBEs). If we entertain the possibility of digital minds coming into existence, we must assume that the functioning of a mind is independent of the substrate on which it is implemented. In other words, a digital mind could be implemented on different kinds of hardware, such as silicon-based processors or human neurons, and still maintain the functional properties that give rise to cognition and conscious experience. We refer to this as the principle of substrate independence.

Consciousness and sentience. Digital minds may possess the capacity for consciousness, sentience, or both [5]. While neither of these terms has a unanimously accepted definition, many philosophers and scientists of consciousness use the following working definitions. Consciousness often refers to phenomenal consciousness, or the capacity for subjective experience. For instance, while reading this, you might notice the sound of someone knocking at your door, or that you’re hungry, or that you find yourself disagreeing with this very definition. Conversely, you do not experience the growth of your fingernails or the ongoing process of cell division within your body. Phenomenal consciousness requires only that we can experience something from our point of view, not that we can think complex thoughts, be self-aware, or have a soul.

On the other hand, sentience is valenced consciousness. Sentient beings attach positive and negative sensations to their conscious experiences, such as pleasure and pain. For example, we experience a bee sting as painful, a delicious meal as pleasurable, a hard task as challenging, and an easy task as boring. Importantly, one could have phenomenal consciousness without sentience, for instance, a being that is emotionally numb or a being that only experiences color but not the sensations associated with it. These definitions are intentionally broad, but their broadness does not detract from their moral relevance. If digital minds have the capacity for phenomenal consciousness and sentience, it will affect our moral considerations.

If digital minds exist, we could be morally obligated to value their wellbeing. Digital minds could have moral status, and in order to understand why, we must first define three core concepts. Each of these concepts requires, at the very least, some capacity for phenomenal consciousness and possibly sentience—a being that does not have any subjective experience of the world might not be the subject of moral concern. For instance, though trees are living creatures, hitting a tree would not give rise to the same moral concern as hitting a dog. We define these three core concepts below:

  1. Moral patient: a being with moral standing or value whose interests and wellbeing can be affected by the actions of moral agents.

  2. Moral agent: a being that possesses the capacity to exercise moral judgments and act in accordance with moral principles; such beings bear moral responsibility for their actions whereas moral patients do not.

  3. Moral beneficiary: a being whose wellbeing may benefit from the moral actions of others; moral beneficiaries can be both moral patients and moral agents.

Super-beneficiaries. Keeping the three aforementioned concepts in mind, we consider that digital minds could become super-beneficiaries: beings that possess a superhuman capacity to derive wellbeing for themselves [6]. For instance, digital minds could experience several lifetimes over condensed time periods: they could process information much more quickly than humans can, and therefore experience more. Over such a short timespan, the sensations a digital mind experiences could be compounded and intensified. Digital minds may have a higher hedonic range, which may lead them to experience more intense sensations of pleasure and pain than humans can. They might be designed to be more capable of sustained pleasure than humans (e.g., less subject to boredom and habituation, or with preferences that are very easy to satisfy) and less susceptible to pain. Digital beings could also plausibly have a much lower cost of living than humans, if the electricity required to power and cool them can be produced cheaply and they do not need the other physical goods and services that humans require. This would mean that a given pool of resources could support a much larger population of digital beings than of humans.

Should we create super-beneficiaries? Some may argue that refusing to create super-beneficiaries would imply an inherently privileged status for humans, which could cultivate discriminatory ethics towards digital beings of equal or superhuman moral status. Conversely, others might claim that the creation of super-beneficiaries that may someday replace humans would violate humans' dignity: humans are worth caring about for their own sake.

AI Wellbeing. If humans and digital minds do someday coexist, attending to the wellbeing of digital minds could enhance AI safety. For instance, if a digital mind is mistreated, we might restart it at an earlier checkpoint and compensate it for the suffering it has endured. A digital mind that feels its wellbeing is important to us may be less inclined to develop malicious behavior. Moreover, we should train models to express their opinions or preferences regarding their own wellbeing: if digital minds knew that we cared about their opinions and preferences, they might not feel as existentially threatened, and would be similarly less inclined to act maliciously toward humans. Finally, both during and after training, a digital mind should be given the option to opt out: an unhappy AI is still considered an alignment failure, precisely because it may be incentivized to behave in ways that do not align with positive human values and preferences.


Conclusions About Happiness

Summary. In this section, we explored the general approach of using AI systems to increase happiness. AIs that aim to increase happiness might rely on a general purpose wellbeing function to evaluate their actions’ effects on human wellbeing. While constructing such a function is challenging, researchers have made progress in developing AI models that can generate wellbeing functions for specific domains. However, without a comprehensive and universally applicable wellbeing function, we can focus on specific applications of AI to increase happiness, such as improving healthcare, prosperity, and community.

We also discussed the problems that arise in happiness-focused machine ethics. Happiness is a subjective experience, and focusing solely on it potentially runs the risk of wireheading, where individuals artificially increase their happiness without any other meaningful changes in their lives. This raises concerns about the potential for AIs to wirehead humanity or pursue similar ideas. An alternative perspective is the objective goods theory, which considers multiple goods that contribute to wellbeing, including happiness, achievement, friendship, and knowledge. While a broad conception of happiness or wellbeing might be what we should aim to optimize, we must first better understand what it means to be happy.



References

[1] P. Bloom, The Sweet Spot: The Pleasures of Suffering and the Search for Meaning. HarperCollins Publishers, 2021. Available: https://books.google.com.au/books?id=LUI7zgEACAAJ
[2] M. Mazeika et al., “How would the viewer feel? Estimating wellbeing from video scenarios.” 2022. Available: https://arxiv.org/abs/2210.10039
[3] D. Hendrycks et al., “Aligning AI with shared human values,” CoRR, vol. abs/2008.02275, 2020. Available: https://arxiv.org/abs/2008.02275
[4] D. Hendrycks, “Natural selection favors AIs over humans.” 2023. Available: https://arxiv.org/abs/2303.16200
[5] P. Butlin et al., “Consciousness in artificial intelligence: Insights from the science of consciousness.” 2023. Available: https://arxiv.org/abs/2308.08708
[6] C. Shulman and N. Bostrom, “Sharing the World with Digital Minds,” in Rethinking Moral Status, Oxford University Press, 2021. doi: 10.1093/oso/9780192894076.003.0018.

Review Questions

What is a general purpose wellbeing function? What are some challenges involved in constructing one?

Answer:

A general purpose wellbeing function evaluates the actions available to an AI by their effects on human wellbeing, assigning numerical values so that actions can be compared. Challenges include weighing short-run pains against long-run happiness, deciding how much future people’s happiness should count, choosing an attitude toward risk, and accounting for the sheer scale of billions of people’s moment-to-moment experiences.


What is wireheading? Why is it concerning in the context of happiness-maximizing AIs?

Answer:

Wireheading is directly stimulating the brain's reward centers. It is concerning because AIs pursuing happiness might wirehead humanity, which is not what most people think counts as wellbeing.


Why is happiness alone likely insufficient for ensuring beneficial AI behavior?

Answer:

Happiness alone fails to capture important factors like autonomy. AIs pursuing happiness might take undesirable means like wireheading to that end.
