To understand how the consequences of developing more powerful AI will play out, it is essential to internalize some basics. This and the next two sections develop these, covering in turn what modern AI is, how it leverages massive computations, and the senses in which it is rapidly growing in generality and capability.5
There are many ways to define artificial intelligence, but for our purposes the key property of AI is that while a standard computer program is a list of instructions for how to perform a task, an AI system is one that learns from data or experience to perform tasks without being explicitly told how to do so.
Almost all salient modern AI is based on neural networks. These are mathematical/computational structures, represented by a very large set (billions or trillions) of numbers (“weights”), that perform a training task well. These weights are crafted (or perhaps “grown” or “found”) by iteratively tweaking them so that the neural network improves a numerical score (a.k.a. “loss”) measuring how well it performs one or more tasks.6 This process is known as training the neural network.7
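To make “iteratively tweaking weights to improve a score” concrete, here is a minimal sketch of a training loop. The toy model (a single weight and bias), the synthetic data, and the learning rate are illustrative assumptions; real systems adjust billions of weights against far richer losses, but the basic loop is the same.

```python
# Minimal sketch of "training": repeatedly nudge weights so that a loss goes down.
# The model (one weight and one bias), the data, and the learning rate are
# illustrative assumptions; real networks have billions of weights.
import random

# Toy data generated from a hidden rule (y = 3x + 1) that training must discover.
data = [(x, 3 * x + 1) for x in (random.uniform(-1, 1) for _ in range(100))]

w, b = 0.0, 0.0        # the "weights", initially uninformative
learning_rate = 0.1

for step in range(1000):
    x, y = random.choice(data)
    prediction = w * x + b
    error = prediction - y          # how wrong the model is on this example
    loss = error ** 2               # the numerical score being driven down
    # Tweak each weight in the direction that reduces the loss (gradient descent).
    w -= learning_rate * 2 * error * x
    b -= learning_rate * 2 * error

print(f"learned w = {w:.2f}, b = {b:.2f}")  # approaches the hidden rule: 3 and 1
```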
There are many techniques for doing this training, but those details matter much less than how the scoring is defined, and how different choices of score yield neural networks that perform well at different tasks. A key distinction has historically been drawn between “narrow” and “general” AI.
Narrow AI is deliberately trained to do a particular task or small set of tasks (such as recognizing images or playing chess); it requires retraining for new tasks and has a narrow scope of capability. We already have superhuman narrow AI, meaning that for nearly any discrete, well-defined task a person can do, we can probably construct a score and then successfully train a narrow AI system to do it better than a human could.
General-purpose AI (GPAI) systems can perform a wide range of tasks, including many they were not explicitly trained for; they can also learn new tasks as part of their operation. Current large “multimodal models”8 like ChatGPT exemplify this: trained on a very large corpus of text and images, they can engage in complex reasoning, write code, analyze images, and assist with a vast array of intellectual tasks. While still quite different from human intelligence in ways we’ll see in depth below, their generality has caused a revolution in AI.9
A key difference between AI systems and conventional software is in predictability. Standard software’s output can be unpredictable – indeed sometimes that’s why we write software, to give us results we could not have predicted. But conventional software rarely does anything it was not programmed to do – its scope and behavior are generally as designed. A top-tier chess program may make moves no human could predict (or else they could beat that chess program!) but it will not generally do anything but play chess.
Like conventional software, narrow AI has predictable scope and behavior but can have unpredictable results. This is really just another way to define narrow AI: as AI that is akin to conventional software in its predictability and range of operation.
General-purpose AI is different: its scope (the domains over which it applies), behavior (the sorts of things it does), and results (its actual outputs) can all be unpredictable.10 GPT-4 was trained simply to predict text accurately, yet developed many capabilities its trainers did not predict or intend. This unpredictability stems from the nature of the training: because the training data contains the outputs of many different tasks, the AI must in effect learn to perform those tasks in order to predict well.
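As a rough illustration of how a text-prediction score works, the toy example below computes the loss a model would receive for its guess about the next word; the vocabulary and probabilities are made up for illustration. The point is that the training signal is only about predicting text, yet predicting well forces the model to implicitly learn facts and skills.

```python
import math

# Toy illustration (made-up numbers) of the text-prediction objective: the model
# assigns a probability to each candidate next word, and its loss is the negative
# log probability of the word that actually came next in the training text.
context = "The capital of France is"
model_probs = {"Paris": 0.90, "London": 0.08, "banana": 0.02}  # hypothetical model output

actual_next_word = "Paris"
loss = -math.log(model_probs[actual_next_word])   # small when the prediction is good
print(f"loss after '{context}': {loss:.3f}")

# To keep this loss low across a vast corpus, the model must implicitly pick up
# geography, grammar, coding, and much else: the score is only about predicting
# text, but many tasks must be learned to predict well.
```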
This unpredictability of general AI systems is quite fundamental. Although in principle it is possible to carefully construct AI systems that have guaranteed limits on their behavior (as discussed later in the essay), as AI systems are currently created, they are unpredictable in practice and even in principle.
This unpredictability becomes particularly important when we consider how AI systems are actually deployed and used to achieve various goals.
Many AI systems are relatively passive, in the sense that they primarily provide information and the user takes the actions. Others, commonly termed agents, take actions themselves, with varying levels of involvement from a user. Those that act with relatively little external input or oversight may be termed more autonomous. The result is a spectrum of independence of action, from passive tools to autonomous agents.11
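One way to picture this spectrum: the same underlying model can be wrapped in scaffolding that either merely surfaces suggestions (a passive tool) or chooses and executes actions on its own (an agent), with human involvement acting as the dial on autonomy. The sketch below is a hypothetical illustration, not any real agent framework; the function names and structure are assumptions.

```python
# Hypothetical sketch (not a real framework) of the tool-to-agent spectrum: the
# same model can be used passively, only suggesting a step for the user to take,
# or agentically, choosing and executing actions itself in a loop.

def model_suggest_action(goal: str, observation: str) -> str:
    """Stand-in for querying an AI model for the next step toward a goal."""
    return f"gather information for '{goal}' given '{observation}'"

def execute(action: str) -> str:
    """Stand-in for acting in the world (running code, calling an API, etc.)."""
    return f"result of [{action}]"

def run(goal: str, autonomous: bool, max_steps: int = 3) -> None:
    observation = "initial state"
    for _ in range(max_steps):
        action = model_suggest_action(goal, observation)
        if not autonomous:
            # Passive / tool-like use: surface the suggestion; the user acts on it.
            print("Suggested next step:", action)
            return
        # Agentic use: act without further input and feed the result back in.
        observation = execute(action)
    print("Final observation:", observation)

run("summarize recent sales data", autonomous=False)  # passive end of the spectrum
run("summarize recent sales data", autonomous=True)   # autonomous end of the spectrum
```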
As for the goals of AI systems, these may be directly tied to their training objective (e.g. the goal of “winning” for a Go-playing system is also explicitly what it was trained to do). Or they may not be: ChatGPT’s training objective is in part to predict text and in part to be a helpful assistant, but when doing a given task, its goal is supplied to it by the user. Goals may also be created by an AI system itself, and these may be only very indirectly related to its training objective.12
Goals are closely tied to the question of “alignment,” that is, the question of whether AI systems will do what we want them to do. This simple question hides an enormous level of subtlety.13 For now, note that “we” in this sentence might refer to many different people and groups, leading to different types of alignment. For example, an AI might be highly obedient (or “loyal”) to its user – here “we” is “each of us.” Or it might be more sovereign, primarily driven by its own goals and constraints but still acting broadly in the common interest of human wellbeing – “we” is then “humanity” or “society.” In between lies a spectrum where an AI would be largely obedient, but might refuse to take actions that harm others or society, violate the law, and so on.
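Schematically, these flavors of alignment can be thought of as different policies for deciding whether to carry out a request. The categories and checks in the sketch below are simplifying assumptions for exposition, not a description of how any real system decides.

```python
# Schematic illustration (an assumption for exposition, not a real mechanism) of
# the alignment "flavors" above, as different policies for handling a request.

def decide(request: str, flavor: str, harms_others: bool = False,
           own_goal: str = "broad human wellbeing") -> str:
    if flavor == "obedient":
        return f"do: {request}"                 # "we" = the individual user
    if flavor == "constrained":                 # largely obedient, within limits
        return f"refuse: {request}" if harms_others else f"do: {request}"
    if flavor == "sovereign":                   # "we" = humanity or society, at best
        return f"pursue own goal: {own_goal}"
    raise ValueError(f"unknown flavor: {flavor}")

print(decide("draft an email", "obedient"))
print(decide("defame a rival", "constrained", harms_others=True))
print(decide("any request", "sovereign"))
```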
These two axes – level of autonomy and type of alignment – are not entirely independent. For example, a sovereign passive system, while not quite self-contradictory, is a concept in tension, as is an obedient autonomous agent.14 There’s a clear sense in which autonomy and sovereignty tend to go hand-in-hand. In a similar vein, predictability tends to be higher in “passive” and “obedient” AI systems, whereas sovereign or autonomous ones will tend to be more unpredictable. All of this will be crucial for understanding the ramifications of potential AGI and superintelligence.
Creating truly aligned AI, of whatever flavor, requires solving three distinct challenges: the system must understand what people actually want, it must reliably behave in accordance with that understanding, and it must genuinely value doing so rather than merely appearing to.
The distinction between reliable behavior and genuine care is crucial. Just as a human employee might follow orders perfectly while lacking any real commitment to the organization’s mission, an AI system might act aligned without truly valuing human preferences. We can train AI systems to say and do things through feedback, and they can learn to reason about what humans want. But making them genuinely value human preferences is a far deeper challenge.15
The profound difficulties in solving these alignment challenges, and their implications for AI risk, will be explored further below. For now, understand that alignment is not just a technical feature we tack on to AI systems, but a fundamental aspect of their architecture that shapes their relationship with humanity.