To understand how the consequences of developing more powerful AI will play out, it is essential to internalize some basics. This and the next two sections develop these, covering in turn what modern AI is, how it leverages massive computations, and the senses in which it is rapidly growing in generality and capability.5
There are many ways to define artificial intelligence, but for our purposes the key property of AI is that while a standard computer program is a list of instructions for how to perform a task, an AI system is one that learns from data or experience to perform tasks without being explicitly told how to do so.
Almost all salient modern AI is based on neural networks. These are mathematical/computational structures, represented by a very large set of numbers (billions or trillions of “weights”), that can be made to perform tasks well. The weights are crafted (or perhaps “grown” or “found”) by iteratively tweaking them so that the neural network improves a numerical score (a.k.a. “loss”) that measures how well it performs one or more training tasks.6 This process is known as training the neural network.7
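As a concrete (and deliberately toy) illustration of this iterative tweaking, the sketch below trains a tiny two-layer network by plain gradient descent. The architecture, data, loss, and learning rate are illustrative choices for this sketch, not those of any production system.

```python
import numpy as np

# Toy data: the "task" is to predict y from x (here, a noisy quadratic).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(256, 1))
y = x**2 + 0.05 * rng.normal(size=x.shape)

# A tiny neural network: its "weights" are just arrays of numbers.
W1, b1 = 0.5 * rng.normal(size=(1, 32)), np.zeros(32)
W2, b2 = 0.5 * rng.normal(size=(32, 1)), np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)          # hidden layer
    return h @ W2 + b2, h             # prediction, plus activations kept for backprop

learning_rate = 0.1
for step in range(2000):
    pred, h = forward(x)
    loss = np.mean((pred - y) ** 2)            # the numerical score ("loss")
    # Compute how a small tweak to each weight would change the loss...
    grad_pred = 2 * (pred - y) / x.shape[0]
    grad_W2 = h.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_h = (grad_pred @ W2.T) * (1 - h**2)   # backpropagate through tanh
    grad_W1 = x.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)
    # ...then tweak every weight a small step in the direction that reduces the loss.
    W1 -= learning_rate * grad_W1; b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2; b2 -= learning_rate * grad_b2
```

Frontier systems follow essentially this recipe, but with billions or trillions of weights, specialized hardware, and vastly more data.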
There are many techniques for doing this training, but those details matter much less than how the score is defined, and how different choices of score yield neural networks that perform different tasks well. A key distinction has historically been drawn between “narrow” and “general” AI.
Narrow AI is deliberately trained to do a particular task or small set of tasks (such as recognizing images or playing chess); it requires retraining for new tasks, and has a narrow scope of capability. We have superhuman narrow AI, meaning that for nearly any discrete well-defined task a person can do, we can probably construct a score and then successfully train a narrow AI system to do it better than a human could.
General-purpose AI (GPAI) systems can perform a wide range of tasks, including many they were not explicitly trained for; they can also learn new tasks as part of their operation. Current large “multimodal models”8 like ChatGPT exemplify this: trained on a very large corpus of text and images, they can engage in complex reasoning, write code, analyze images, and assist with a vast array of intellectual tasks. While still quite different from human intelligence in ways we’ll see in depth below, their generality has caused a revolution in AI.9
A key difference between AI systems and conventional software is in predictability. Standard software’s output can be unpredictable – indeed sometimes that’s why we write software, to give us results we could not have predicted. But conventional software rarely does anything it was not programmed to do – its scope and behavior are generally as designed. A top-tier chess program may make moves no human could predict (or else they could beat that chess program!) but it will not generally do anything but play chess.
Like conventional software, narrow AI has predictable scope and behavior but can have unpredictable results. This is really just another way to define narrow AI: as AI that is akin to conventional software in its predictability and range of operation.
General-purpose AI is different: its scope (the domains over which it applies), behavior (the sorts of things it does), and results (its actual outputs) can all be unpredictable.10 GPT-4 was trained just to predict text accurately, but developed many capabilities its trainers didn’t predict or intend. This unpredictability stems from the nature of the training: because the training data contains the products of many different human tasks, the AI must effectively learn to perform those tasks in order to predict the data well.
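To make that training objective concrete: a language model’s loss is, roughly, the average “surprise” (negative log-probability) it assigns to each successive token of its training text. The sketch below is illustrative only; model_prob is a hypothetical stand-in for whatever the network computes, not a real API.

```python
import math

def next_token_loss(model_prob, tokens):
    """Average negative log-probability of each token given the tokens before it.

    model_prob(context, next_token) -> probability in (0, 1]; a hypothetical stand-in.
    """
    total = 0.0
    for i in range(1, len(tokens)):
        total += -math.log(model_prob(tokens[:i], tokens[i]))
    return total / (len(tokens) - 1)

# The training corpus contains chess games, working code, legal arguments,
# medical advice, and much more; the only way to drive this single number
# down across all of it is to become good at (in effect) doing those tasks.
```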
This unpredictability of general AI systems is quite fundamental. Although in principle it is possible to carefully construct AI systems that have guaranteed limits on their behavior (as mentioned later in the essay), AI systems as they are created now are unpredictable in practice, and in important respects in principle as well.
This unpredictability becomes particularly important when we consider how AI systems are actually deployed and used to achieve various goals.
Many AI systems are relatively passive in the sense that they primarily provide information, and the user takes actions. Others, commonly termed agents, take actions themselves, with varying levels of involvement from a user. Those that take actions with relatively less external input or oversight may be termed more autonomous. This forms a spectrum in terms of independence of action, from passive tools to autonomous agents.11
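The two ends of this spectrum can be sketched schematically. Everything below (the model, environment, and helper functions) is a hypothetical placeholder used only to illustrate the structure, not any real system’s API.

```python
def passive_tool(model, question):
    # The system supplies information; the human decides what (if anything) to do with it.
    return model.answer(question)

def human_approves(action):
    return True  # placeholder for whatever oversight a deployer puts in place

def autonomous_agent(model, environment, goal, oversight_every=10):
    # The system itself chooses and executes actions in pursuit of a goal,
    # checking in with a human only periodically. The larger oversight_every
    # is, the more autonomous the agent.
    history = []
    for step in range(1000):
        action = model.choose_action(goal, history)
        if step % oversight_every == 0 and not human_approves(action):
            break
        observation = environment.execute(action)
        history.append((action, observation))
        if model.goal_achieved(goal, history):
            break
    return history
```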
As for goals of AI systems, these may be directly tied to their training objective (e.g. the goal of “winning” for a Go-playing system is also explicitly what it was trained to do). Or they may not be: ChatGPT’s training objective is in part to predict text, in part to be a helpful assistant. But when doing a given task, its goal is supplied to it by the user. Goals may also be created by an AI system itself, only very indirectly related to its training objective.12
Goals are closely tied to the question of “alignment,” that is the question of whether AI systems will do what we want them to do. This simple question hides an enormous level of subtlety.13 For now, note that “we” in this sentence might refer to many different people and groups, leading to different types of alignment. For example, an AI might be highly obedient (or “loyal”) to its user – here “we” is “each of us.” Or it might be more sovereign, being primarily driven by its own goals and constraints, but still acting broadly in the common interest of human wellbeing – “we” is then “humanity” or “society.” In-between is a spectrum where an AI would be largely obedient, but might refuse to take actions that harm others or society, violate the law, etc.
These two axes – level of autonomy and type of alignment – are not entirely independent. For example, a sovereign passive system, while not quite self-contradictory, is a concept in tension, as is an obedient autonomous agent.14 There’s a clear sense in which autonomy and sovereignty tend to go hand-in-hand. In a similar vein, predictability tends to be higher in “passive” and “obedient” AI systems, whereas sovereign or autonomous ones will tend to be more unpredictable. All of this will be crucial for understanding the ramifications of potential AGI and superintelligence.
Creating truly aligned AI, of whatever flavor, requires solving three distinct challenges: specifying what alignment means and to whom, training systems that reliably behave accordingly, and ensuring that they genuinely care about the underlying aims rather than merely exhibiting the right behavior.
The distinction between reliable behavior and genuine care is crucial. Just as a human employee might follow orders perfectly while lacking any real commitment to the organization’s mission, an AI system might act aligned without truly valuing human preferences. We can train AI systems to say and do things through feedback, and they can learn to reason about what humans want. But making them genuinely value human preferences is a far deeper challenge.15
The profound difficulties in solving these alignment challenges, and their implications for AI risk, will be explored further below. For now, understand that alignment is not just a technical feature we tack on to AI systems, but a fundamental aspect of their architecture that shapes their relationship with humanity.
We may be at the end of the human era.
Something has begun in the past ten years that is unique in the history of our species. Its consequences will, to a great extent, determine the future of humanity. Starting around 2015, researchers have succeeded in developing narrow artificial intelligence (AI) – systems that can win at games like Go, recognize images and speech, and so on, better than any human.1
This is an amazing success, and it is yielding extremely useful systems and products that will empower humanity. But narrow artificial intelligence has never been the true goal of the field. Rather, the aim has been to create general-purpose AI systems, particularly ones often called “artificial general intelligence” (AGI) or “superintelligence,” that are simultaneously as good as or better than humans across nearly all tasks, just as AI is now superhuman at Go, chess, poker, drone racing, etc. This is the stated goal of many major AI companies.2
These efforts are also succeeding. General-purpose AI systems like ChatGPT, Gemini, Llama, Grok, Claude, and DeepSeek, based on massive computations and mountains of data, have reached parity with typical humans across a wide variety of tasks, and even match human experts in some domains. Now AI engineers at some of the largest technology companies are racing to push these giant experiments in machine intelligence to the next levels, at which they match and then exceed the full range of human capabilities, expertise, and autonomy.
This is imminent. Over the last ten years, expert estimates for how long this will take – if we continue our present course – have fallen from decades (or centuries) to single-digit years.
It is also of epochal importance, and transcendent risk. Proponents of AGI see it as a positive transformation that will solve scientific problems, cure disease, develop new technologies, and automate drudgery. And AI could certainly help to achieve all of these things – indeed it already is. But over the decades, many careful thinkers, from Alan Turing to Stephen Hawking to the present-day Geoffrey Hinton and Yoshua Bengio3 have issued a stark warning: building truly smarter-than-human, general, autonomous AI will at minimum completely and irrevocably upend society, and at maximum result in human extinction.4
Superintelligent AI is rapidly approaching on our current path, but is far from inevitable. This essay is an extended argument as to why and how we should close the Gates to this approaching inhuman future, and what we should do instead.
Dramatic advances in artificial intelligence over the past decade (for narrow-purpose AI) and the last several years (for general-purpose AI) have transformed AI from a niche academic field to the core business strategy of many of the world’s largest companies, with hundreds of billions of dollars in annual investment in the techniques and technologies for advancing AI’s capabilities.
We now come to a critical juncture. As the capabilities of new AI systems begin to match and exceed those of humans across many cognitive domains, humanity must decide: how far do we go, and in what direction?
AI, like every technology, started with the goal of improving things for its creator. But our current trajectory, and implicit choice, is an unchecked race toward ever-more powerful systems, driven by economic incentives of a few huge technology companies seeking to automate large swathes of current economic activity and human labor. If this race continues much longer, there is an inevitable winner: AI itself – a faster, smarter, cheaper alternative to people in our economy, our thinking, our decisions, and eventually in control of our civilization.
But we can make another choice: via our governments, we can take control of the AI development process to impose clear limits, lines we won’t cross, and things we simply won’t do – as we have for nuclear technologies, weapons of mass destruction, space weapons, environmentally destructive processes, the bioengineering of humans, and eugenics. Most importantly, we can ensure that AI remains a tool to empower humans, rather than a new species that replaces and eventually supplants us.
This essay argues that we should keep the future human by closing the “gates” to smarter-than-human, autonomous, general-purpose AI – sometimes called “AGI” – and especially to the highly-superhuman version sometimes called “superintelligence.” Instead, we should focus on powerful, trustworthy AI tools that can empower individuals and transformatively improve human societies’ abilities to do what they do best. The structure of this argument follows in brief.
AI systems are fundamentally different from other technologies. While traditional software follows precise instructions, AI systems learn how to achieve goals without being explicitly told how. This makes them powerful: if we can cleanly define the goal or a metric of success, in most cases an AI system can learn to achieve it. But it also makes them inherently unpredictable: we cannot reliably determine what actions they will take to achieve their objectives.
They are also largely unexplainable: although they are partly code, they are mostly an enormous set of inscrutable numbers – neural network “weights” – that cannot be parsed; we are not much better at understanding their inner workings than at discerning thoughts by peering inside a biological brain.
This core method of training digital neural networks is rapidly growing in scale and complexity. The most powerful AI systems are created through massive computational experiments, using specialized hardware to train neural networks on enormous datasets, which are then augmented with software tools and superstructure.
This has led to the creation of very powerful tools for creating and processing text and images, performing mathematical and scientific reasoning, aggregating information, and interactively querying a vast store of human knowledge.
Unfortunately, while development of more powerful, more trustworthy technological tools is what we should do, and what nearly everybody wants and says they want, it is not the trajectory we are actually on.
Since the dawn of the field, AI research has instead pursued a different goal: Artificial General Intelligence. That goal has now become the focus of the titanic companies leading AI development.
What is AGI? It is often vaguely defined as “human-level AI,” but this is problematic: which humans, and at which capabilities, is it human-level? And what about the superhuman capabilities it already has? A more useful way to understand AGI is through the intersection of three key properties: high Autonomy (independence of action), high Generality (broad scope and adaptability), and high Intelligence (competence at cognitive tasks). Current AI systems may be highly capable but narrow, or general but requiring constant human oversight, or autonomous but limited in scope.
Full A-G-I would combine all three properties at levels matching or exceeding top human capability. Critically, it is this combination that makes humans so effective and so different from current software; it is also what would enable people to be wholesale replaced by digital systems.
While human intelligence is special, it is by no means a limit. Artificial “superintelligent” systems could operate hundreds of times faster, parse vastly more data and hold enormous quantities “in mind” at once, and form aggregates that are much larger and more effective than collections of humans. They could supplant not individuals but companies, nations, or our civilization as a whole.
There is a strong scientific consensus that AGI is possible. AI already surpasses human performance in many general tests of intellectual capability, including, recently, high-level reasoning and problem-solving. Lagging capabilities – such as continual learning, planning, self-awareness, and originality – all exist at some level in present AI systems, and known techniques exist that are likely to improve all of them.
While until a few years ago many researchers saw AGI as decades away, the evidence for short timelines to AGI is now strong.
The idea that smarter-than-human AGI is decades away or more is simply no longer tenable to the vast majority of experts in the field. Disagreements now are about how many months or years it will take if we stay on this course. The core question we face is: should we?
The race toward AGI is being driven by multiple forces, each making the situation more dangerous. Major technology companies see AGI as the ultimate automation technology – not just augmenting human workers but replacing them largely or entirely. For companies, the prize is enormous: the opportunity to capture a significant fraction of the world’s $100 trillion annual economic output by automating away human labor costs.
Nations feel compelled to join this race, publicly citing economic and scientific leadership, but privately viewing AGI as a potential revolution in military affairs comparable to nuclear weapons. Fear that rivals might gain a decisive strategic advantage creates a classic arms race dynamic.
Those pursuing superintelligence often cite grand visions: curing all diseases, reversing aging, achieving breakthroughs in energy and space travel, or creating superhuman planning capabilities.
Less charitably, what drives the race is power. Each participant – whether company or country – believes that intelligence equals power, and that they will be the best steward of that power.
I argue that these motivations are real but fundamentally misguided: AGI will absorb and seek power rather than grant it; AI-created technologies will also be strongly double-edged, and where beneficial can be created with AI tools and without AGI; and even insofar as AGI and its outputs remain under control, these racing dynamics – both corporate and geopolitical – make large-scale risks to our society nearly inevitable unless decisively interrupted.
Despite their allure, AGI and superintelligence pose dramatic threats to civilization through multiple reinforcing pathways:
Power concentration: superhuman AI could disempower the vast majority of humanity by absorbing huge swathes of social and economic activity into AI systems run by a handful of giant companies (which may in turn either be taken over by, or effectively take over, governments).
Massive disruption: bulk automation of most cognitive-based jobs, replacement of our current epistemic systems, and rollout of vast numbers of active nonhuman agents would upend most of our current civilizational systems in a relatively short period of time.
Catastrophes: by proliferating the ability – potentially above human level – to create new military and destructive technologies, and by decoupling that ability from the social and legal systems that ground responsibility, such AI would make physical catastrophes from weapons of mass destruction dramatically more likely.
Geopolitics and war: major world powers will not sit idly by if they feel that a technology that could supply a “decisive strategic advantage” is being developed by their adversaries.
Runaway and loss of control: Unless it is specifically prevented, superhuman AI will have every incentive to further improve itself and could far outstrip humans in speed, data processing, and sophistication of thinking. There is no meaningful way in which we can be in control of such a system. Such AI will not grant power to humans; we will grant power to it, or it will take it.
Many of these risks remain even if the technical “alignment” problem – ensuring that advanced AI reliably does what humans want it to do – is solved. AI presents an enormous challenge in how it will be managed, and very many aspects of this management become incredibly difficult or intractable once the threshold of human intelligence is breached.
Most fundamentally, the type of superhuman general-purpose AI currently being pursued would, by its very nature, have goals, agency, and capabilities exceeding our own. It would be inherently uncontrollable – how can we control something that we can neither understand nor predict? It would not be a technological tool for human use, but a second species of intelligence on Earth alongside ours. If allowed to progress further, it would constitute not just a second species but a replacement species.
Perhaps it would treat us well, perhaps not. But the future would belong to it, not us. The human era would be over.
The creation of superhuman AGI is far from inevitable. We can prevent it through a coordinated set of governance measures:
First, we need robust accounting and oversight of AI computation (“compute”), which is a fundamental enabler of, and lever to govern, large-scale AI systems. This in turn requires standardized measurement and reporting of the total compute used in training AI models and running them, and technical methods of tallying, certifying, and verifying computation used.
Second, we should implement hard caps on AI computation, both for training and for operation; these prevent AI both from being too powerful and from operating too quickly. These caps can be implemented through both legal requirements and hardware-based security measures built into AI-specialized chips, analogous to security features in modern phones. Because specialized AI hardware is made by only a handful of companies, verification and enforcement are feasible through the existing supply chain. An illustrative sketch of this kind of compute accounting is given after the fourth measure below.
Third, we need enhanced liability for the most dangerous AI systems. Those developing AI that combines high autonomy, broad generality, and superior intelligence should face strict liability for harms, while safe harbors from this liability would encourage development of more limited and controllable systems.
Fourth, we need tiered regulation based on risk levels. The most capable and dangerous systems would require extensive safety and controllability guarantees before development and deployment, while less powerful or more specialized systems would face proportionate oversight. This regulatory framework should eventually operate at both national and international levels.
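To make the second and fourth measures above more concrete, here is a rough, illustrative sketch. It uses the common rule of thumb that training a dense model takes roughly 6 floating-point operations per parameter per training token, and it maps systems onto oversight tiers by how many of the three A-G-I properties (autonomy, generality, intelligence) they combine at high levels. The specific cap value and tier descriptions are placeholders, not the detailed specification given in the full document.

```python
# Illustrative only: placeholder numbers and tier names, not proposed policy values.

TRAINING_FLOP_CAP = 1e27  # placeholder cap on total training compute (FLOP)

def estimated_training_flops(n_parameters, n_training_tokens):
    """Rule-of-thumb estimate: ~6 FLOPs per parameter per training token."""
    return 6.0 * n_parameters * n_training_tokens

def training_run_permitted(n_parameters, n_training_tokens):
    """Hard cap: a run over the threshold is simply not allowed."""
    return estimated_training_flops(n_parameters, n_training_tokens) <= TRAINING_FLOP_CAP

def regulatory_tier(high_autonomy, high_generality, high_intelligence):
    """Tiered oversight keyed to how many of the A-G-I properties are combined."""
    crossed = sum([high_autonomy, high_generality, high_intelligence])
    if crossed == 3:
        return "top tier: not permitted without strong safety and controllability guarantees"
    if crossed == 2:
        return "high tier: pre-deployment audits, strict liability, ongoing monitoring"
    if crossed == 1:
        return "standard tier: oversight proportionate to the domain of use"
    return "baseline tier: existing product and consumer protections"

# Example: a 1-trillion-parameter model trained on 20 trillion tokens...
print(f"{estimated_training_flops(1e12, 2e13):.1e} FLOP; permitted: {training_run_permitted(1e12, 2e13)}")
# ...and a narrow, non-autonomous system (e.g. a chess engine), however capable:
print(regulatory_tier(high_autonomy=False, high_generality=False, high_intelligence=True))
```

In practice, hardware-based measures of the kind described in the second point would make such tallies verifiable rather than self-reported.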
This approach – with detailed specification given in the full document – is practical: while international coordination will be needed, verification and enforcement can work through the small number of companies controlling the specialized hardware supply chain. It is also flexible: companies can still innovate and profit from AI development, just with clear limits on the most dangerous systems.
Longer-term containment of AI power and risk would require international agreements based on both self- and common-interest, just as controlling nuclear weapon proliferation does now. But we can start immediately with enhanced oversight and liability, while building toward more comprehensive governance.
The key missing ingredient is political and social will to take control of the AI development process. The source of that will, if it comes in time, will be reality itself – that is, widespread realization of the real implications of what we are doing.
Rather than pursuing uncontrollable AGI, we can develop powerful “Tool AI” that enhances human capability while remaining under meaningful human control. Tool AI systems can be extremely capable while avoiding the dangerous triple-intersection of high autonomy, broad generality, and superhuman intelligence, as long as we engineer them to be controllable at a level commensurate with their capability. They can also be combined into sophisticated systems that maintain human oversight while delivering transformative benefits.
Tool AI can revolutionize medicine, accelerate scientific discovery, enhance education, and improve democratic processes. When properly governed, it can make human experts and institutions more effective rather than replacing them. While such systems will still be highly disruptive and require careful management, the risks they pose are fundamentally different from AGI: they are risks we can govern, like those of other powerful technologies, not existential threats to human agency and civilization. And crucially, when wisely developed, AI tools can help people govern powerful AI and manage its effects.
This approach requires rethinking both how AI is developed and how its benefits are distributed. New models of public and non-profit AI development, robust regulatory frameworks, and mechanisms to distribute economic benefits more broadly can help ensure AI empowers humanity as a whole rather than concentrating power in a few hands. AI itself can help build better social and governance institutions, enabling new forms of coordination and discourse that strengthen rather than undermine human society. National security establishments can leverage their expertise to make AI tool systems genuinely secure and trustworthy, and a true source of defense as well as national power.
We may eventually choose to develop yet more powerful and more sovereign systems that are less like tools and – we can hope – more like wise and powerful benefactors. But we should do so only after we have developed the scientific understanding and governance capacity to do so safely. Such a momentous and irreversible decision should be made deliberately by humanity as a whole, not by default in a race between tech companies and nations.
People want the good that comes from AI: useful tools that empower them, supercharge economic opportunities and growth, and promise breakthroughs in science, technology, and education. Why wouldn’t they? But when asked, overwhelming majorities of the general public want slower and more careful AI development, and do not want smarter-than-human AI that will replace them in their jobs and elsewhere, fill their culture and information commons with non-human content, concentrate power in a tiny set of corporations, pose extreme large-scale global risks, and eventually threaten to disempower or replace their species. Why would they?
We can have one without the other. It starts by deciding that our destiny is not in the supposed inevitability of some technology or in the hands of a few CEOs in Silicon Valley, but in the rest of our hands if we take hold of it. Let’s close the Gates, and keep the future human.