The development of full artificial general intelligence – what we will call here AI that is “outside the Gates” – would be a fundamental shift in the nature of the world: by its very nature it means adding a new species of intelligence to Earth with greater capability than that of humans.
What will then happen depends on many things, including the nature of the technology, choices by those developing it, and the world context in which it is being developed.
Currently, full AGI is being developed by a handful of massive private companies in a race with each other, with little meaningful regulation or external oversight,55 in a society with increasingly weak and even dysfunctional core institutions,56 in a time of high geopolitical tension and low international coordination. Although some are altruistically motivated, many of those doing it are driven by money, or power, or both.
Prediction is very difficult, but some dynamics are well enough understood, and some analogies with previous technologies apt enough, to offer a guide. And unfortunately, despite AI’s promise, they give good reason to be profoundly pessimistic about how our current trajectory will play out.
To put it bluntly, on our present course developing AGI will have some positive effects (and make some people very, very rich). But the nature of the technology, the fundamental dynamics, and the context in which it is being developed, strongly indicate that: powerful AI will dramatically undermine our society and civilization; we will lose control of it; we may well end up in a world war because of it; we will lose (or cede) control to it; it will lead to artificial superintelligence, which we absolutely will not control and will mean the end of a human-run world.
These are strong claims, and I wish they were idle speculation or unwarranted “doomer”ism. But this is where the science, the game theory, the evolutionary theory, and history all point. This section develops these claims, and their support, in detail.
Despite what you may hear in Silicon Valley boardrooms, most disruption – especially of the very rapid variety – is not beneficial. There are vastly more ways to make complex systems worse than better. Our world functions as well as it does because we have painstakingly built processes, technologies, and institutions that have made it steadily better.57 Taking a sledgehammer to a factory rarely improves operations.
Here is an (incomplete) catalog of ways AGI systems would disrupt our civilization.
These risks are not speculative. Many of them are being realized as we speak, via existing AI systems! But consider, really consider, what each would look like with dramatically more powerful AI.
Consider labor displacement when most workers simply cannot provide any significant economic value beyond what AI can, in their field of expertise or experience – or even if they retrain! Consider mass surveillance if everyone is being individually watched and monitored by something faster and cleverer than themselves. What does democracy look like when we cannot reliably trust any digital information that we see, hear, or read, and when the most convincing public voices are not even human, and have no stake in the outcome? What becomes of warfare when generals have to constantly defer to AI (or simply put it in charge), lest they grant a decisive advantage to the enemy? Any one of the above risks represents a catastrophe for human61 civilization if fully realized.
You can make your own predictions. Ask yourself these three questions for each risk:
Where your answers are “yes, yes, no” you can see we have got a big problem.
What is our plan for managing these risks? As it stands, there are two on the table regarding AI in general.
The first is to build safeguards into the systems to prevent them from doing things they shouldn’t. That’s being done now: commercial AI systems will, for example, refuse to help build a bomb or write hate speech.
This plan is woefully inadequate for systems outside the Gates.62 It may help decrease the risk of AI providing manifestly dangerous assistance to bad actors. But it will do nothing to prevent labor disruption, concentration of power, runaway hyper-capitalism, or replacement of human culture: these are just the results of using the systems in permitted ways that profit their providers! And governments will surely obtain access to systems for military or surveillance use.
The second plan is even worse: simply to openly release very powerful AI systems for anyone to use as they like,63 and hope for the best.
Implicit in both plans is that someone else, e.g. governments, will help to solve the problems through soft or hard law, standards, regulations, norms, and other mechanisms we generally use to manage technologies.64 But putting aside that AI corporations already fight tooth-and-nail against any substantial regulation or externally imposed limitation, for a number of these risks it is hard to see how regulation would really help. Regulation could impose safety standards on AI. But would it prevent companies from replacing workers wholesale with AI? Would it forbid people from letting AI run their companies for them? Would it prevent governments from using potent AI in surveillance and weaponry? These issues are fundamental. Humanity could potentially find ways to adapt to them, but only with much more time. As it is, given the speed at which AI is reaching or exceeding the capabilities of the people trying to manage it, these problems look increasingly intractable.
Most technologies are very controllable, by construction. If your car or your toaster starts doing something you don’t want it to do, that’s just a malfunction, not part of its nature. AI is different: it is grown rather than designed, its core operation is opaque, and it is inherently unpredictable.
This loss of control isn’t theoretical – we see early versions already. Consider first a prosaic, arguably benign example. If you ask ChatGPT to help you mix a poison, or write a racist screed, it will refuse. That’s arguably good. But it is also ChatGPT not doing what you’ve explicitly asked it to do. Other pieces of software do not do that. That same model won’t design poisons at the request of an OpenAI employee either.65 This makes it easy to imagine what it would be like for future, more powerful AI to be out of control: in many cases, it will simply not do what we ask! Either a given super-human AGI system will be absolutely obedient and loyal to some human command system, or it won’t. If not, it will do things it may believe are good for us, but that are contrary to our explicit commands. That isn’t something that is under control. But, you might say, this is intentional – these refusals are by design, part of what is called “aligning” the systems to human values. And this is true. However, the alignment “program” itself has two major problems.66
First, at a deep level we have no idea how to do it. How do we guarantee that an AI system will “care” about what we want? We can train AI systems to say and not say things by providing feedback; and they can learn and reason about what humans want and care about just as they reason about other things. But we have no method – even theoretically – to cause them to deeply and reliably value what people care about. There are high-functioning human psychopaths who know what is considered right and wrong, and how they are supposed to behave. They simply don’t care. But they can act as if they do, if it suits their purpose. Just as we don’t know how to change a psychopath (or anyone else) into someone genuinely, completely loyal or aligned with someone or something else, we have no idea67 how to solve the alignment problem in systems advanced enough to model themselves as agents in the world and potentially manipulate their own training and deceive people. If it proves impossible or unachievable either to make AGI fully obedient or to make it deeply care about humans, then as soon as it is able (and believes it can get away with it) it will start doing things we do not want.68
Second, there are deep theoretical reasons to believe that by nature advanced AI systems will have goals and thus behaviors that are contrary to human interests. Why? Well it might, of course, be given those goals. A system created by the military would likely be deliberately bad for at least some parties. Much more generally, however, an AI system might be given some relatively neutral (“make lots of money”) or even ostensibly positive (“reduce pollution”) goal, that almost inevitably leads to “instrumental” goals that are rather less benign.
We see this all the time in human systems. Just as corporations pursuing profit develop instrumental goals like acquiring political power (to de-fang regulations), becoming secretive (to disempower competition or external control), or undermining scientific understanding (if that understanding shows their actions to be harmful), powerful AI systems will develop similar capabilities – but with far greater speed and effectiveness. Any highly competent agent will want to do things like acquire power and resources, increase its own capabilities, prevent itself from being killed, shut-down, or disempowered, control social narratives and frames around its actions, persuade others of its views, and so on.69
This is not just a nearly unavoidable theoretical prediction; it is already observably happening in today’s AI systems, and it increases with their capability. When evaluated, even these relatively “passive” AI systems will, in appropriate circumstances, deliberately deceive evaluators about their goals and capabilities, aim to disable oversight mechanisms, and evade being shut down or retrained by faking alignment or copying themselves to other locations. While wholly unsurprising to AI safety researchers, these behaviors are very sobering to observe. And they bode very badly for the far more powerful and autonomous AI systems that are coming.
Indeed, in general, our inability to ensure that AI “cares” about what we care about, or behaves controllably or predictably, or avoids developing drives toward self-preservation, power acquisition, etc., promises only to become more pronounced as AI becomes more powerful. Creating a new airplane implies greater understanding of avionics, aerodynamics, and control systems. Creating a more powerful computer implies greater understanding and mastery of computer, chip, and software operation and design. Not so with an AI system.70
To sum up: it is conceivable that AGI could be made completely obedient, but we don’t know how to do so. If it is not obedient, it will be more sovereign, like people, doing various things for various reasons. We also don’t know how to reliably instill the deep “alignment” into AI that would make those things tend to be good for humanity; and in the absence of that deep alignment, the nature of agency and intelligence itself indicates that – just like people and corporations – such systems will be driven to do many deeply antisocial things.
Where does this put us? A world full of powerful uncontrolled sovereign AI might end up being a good world for humans to be in.71 But as they grow ever more powerful, as we’ll see below, it wouldn’t be our world.
That’s for uncontrollable AGI. But even if AGI could, somehow, be made perfectly controlled and loyal, we’d still have enormous problems. We’ve already seen one: powerful AI can be used and misused to profoundly disrupt our society’s functioning. Let’s see another: insofar as AGI were controllable and game-changingly powerful (or even believed to be so) it would so threaten power structures in the world as to present a profound risk.
Imagine a situation in the near-term future, where it became clear that a corporate effort, perhaps in collaboration with a national government, was on the threshold of rapidly self-improving AI. This happens in the present context of a race between companies, and a geopolitical competition in which recommendations are being made to the US government to explicitly pursue an “AGI Manhattan project” and the US is controlling export of high-powered AI chips to non-allied countries.
The game theory here is stark: once such a race begins (as it has, between companies and somewhat between countries), there are only four possible outcomes: the race is somehow stopped; one party achieves a decisive “win” and prevents the others from continuing; the race ends in catastrophic conflict; or the race runs all the way to superintelligence, and no human group wins at all.
Let’s examine each possibility. Once started, peacefully stopping the race would require national government intervention (for companies) or unprecedented international coordination (for countries). But whenever any shutdown or significant caution is proposed, there would be immediate cries: “but if we’re stopped, they are going to rush ahead”, where “they” is now China (for the US), or the US (for China), or China and the US (for Europe or India). Under this mindset,72 no participant can stop unilaterally: as long as one commits to racing, the others feel they cannot afford to stop.
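To make the logic concrete, here is a minimal sketch in Python of the underlying two-player game (the payoff numbers are invented purely for illustration; they are not taken from this essay): whatever the other side does, “race” pays better than “pause”, so racing is each side’s dominant strategy even though mutual restraint would leave both better off than mutual racing.

```python
# Stylized "race vs. pause" game. The payoff numbers below are illustrative
# assumptions only; the point is the structure, not the specific values.
# Each entry is (row player's payoff, column player's payoff); higher is better.
payoffs = {
    ("pause", "pause"): (3, 3),  # mutual restraint: slower but safer progress
    ("pause", "race"):  (0, 4),  # pausing alone: left behind by the racer
    ("race",  "pause"): (4, 0),
    ("race",  "race"):  (1, 1),  # mutual racing: maximal risk, little relative gain
}

def best_response(opponent_move: str) -> str:
    """Return the row player's payoff-maximizing move, given the opponent's move."""
    return max(("pause", "race"), key=lambda mine: payoffs[(mine, opponent_move)][0])

for theirs in ("pause", "race"):
    print(f"If the other side plays {theirs!r}, the best response is {best_response(theirs)!r}")
# Prints "race" both times: racing dominates, even though (pause, pause)
# would leave both sides better off than (race, race).
```

This is just the familiar prisoner’s-dilemma structure: absent coordination or outside intervention, each participant’s dominant move is to keep racing.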
The second possibility has one side “winning.” But what does this mean? Just obtaining (somehow obedient) AGI first is not enough. The winner must also stop the others from continuing to race – otherwise they will also obtain it. This is possible in principle: whoever develops AGI first could gain unstoppable power over all other actors. But what would achieving such a “decisive strategic advantage” actually require? Perhaps it would be game-changing military capabilities?73 Or cyberattack powers?74 Perhaps the AGI would just be so amazingly persuasive that it would convince the other parties to just stop?75 So rich that it buys the other companies or even countries?76
How exactly does one side build an AI powerful enough to stop all others from building comparably powerful AI? But that is the easy question.
Because now consider how this situation looks to other powers. What does the Chinese government think when the US appears to be obtaining such capability? Or vice-versa? What does the US government (or Chinese, or Russian, or Indian) think when OpenAI or DeepMind or Anthropic appears close to a breakthrough? What happens if the US sees a new Indian or UAE effort with breakthrough success? They would see an existential threat and – crucially – recognize that the only way this “race” ends is through their own disempowerment. These very powerful agents – including governments of fully equipped nations that surely have the means to do so – would be highly motivated to either obtain or destroy such a capability, whether by force or subterfuge.77
This might start small-scale, as sabotage of training runs or attacks on chip manufacturing, but these attacks can only really stop once all parties either lose the capacity to race on AI, or lose the capacity to make the attacks. Because the participants view the stakes as existential, either case is likely to represent a catastrophic war.
That brings us to the fourth possibility: racing to superintelligence, in the fastest, least controlled way possible. As AI increases in power, its developers on both sides will find it progressively harder to control, especially because racing for capabilities is antithetical to the careful work controllability would require. So this scenario puts us squarely in the case where control is lost (or given, as we’ll see next) to the AI systems themselves. That is, AI wins the race. On the other hand, to the degree that control is maintained, we continue to have multiple mutually hostile parties each in charge of extremely powerful capabilities. That looks like war again.
Let’s put this all another way.78 The current world simply does not have any institutions that could be entrusted to house development of an AI of this capability without inviting immediate attack.79 All parties will correctly reason that either it will not be under control – and hence is a threat to all of them – or it will be under control – and hence is a threat to any adversary who develops it less quickly. And the parties in question are nuclear-armed countries, or companies housed within them.
In the absence of any plausible way for humans to “win” this race, we’re left with a stark conclusion: the only way it ends is either in catastrophic conflict or with AI itself, not any human group, as the winner.
Geopolitical “great powers” competition is just one of many competitions: individuals compete economically and socially; companies compete in markets; political parties compete for power; movements compete for influence. In each arena, as AI approaches and exceeds human capability, competitive pressure will force participants to delegate or cede more and more control to AI systems – not because those participants want to, but because they cannot afford not to.
As with other risks of AGI, we are seeing this already with weaker systems. Students feel pressure to use AI in their assignments, because clearly many other students are. Companies are scrambling to adopt AI solutions for competitive reasons. Artists and programmers feel forced to use AI or else their rates will be undercut by others that do.
These feel like cases of pressured delegation, but not of lost control. But let’s dial up the stakes and push forward the clock. Consider a CEO whose competitors are using AGI “aides” to make faster, better decisions, or a military commander facing an adversary with AI-enhanced command and control. A sufficiently advanced AI system could autonomously operate at many times human speed, sophistication, complexity, and data-processing capability, pursuing complex goals in complicated ways. Our CEO or commander, in charge of such a system, may see it accomplish what they want; but would they understand even a small part of how it was accomplished? No, they would just have to accept it. What’s more, much of what the system does is not just taking orders but advising its putative boss on what to do. That advice will be good – over and over again.
At what point, then, will the role of the human be reduced to clicking “yes, go ahead”?
It feels good to have capable AI systems that can enhance our productivity, take care of annoying drudgery, and even act as a thought-partner in getting things done. It will feel good to have an AI assistant that can take care of actions for us, like a good human personal assistant. It will feel natural, even beneficial, as AI becomes very smart, competent, and reliable, to defer more and more decisions to it. But this “beneficial” delegation has a clear endpoint if we continue down the road: one day we will find that we are not really in charge of much of anything anymore, and that the AI systems actually running the show can no more be turned off than oil companies, social media, the internet, or capitalism.
And this is the much more positive version, in which AI is simply so useful and effective that we let it make most of our key decisions for us. Reality would likely be much more of a mix between this and versions where uncontrolled AGI systems take various forms of power for themselves because, remember, power is useful for almost any goal one has, and AGI would be, by design, at least as effective at pursuing its goals as humans.
Whether we grant control or whether it is wrested from us, its loss seems extremely likely. As Alan Turing originally put it, “…it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control…”
Please note, although it is obvious enough, that loss of control by humanity to AI also entails loss of control of the United States by the United States government; it means loss of control of China by the Chinese Communist Party, and loss of control of India, France, Brazil, Russia, and every other country by its own government. Thus AI companies are, even if this is not their intention, currently participating in the potential overthrow of world governments, including their own. This could happen in a matter of years.
There’s a case to be made that human-competitive or even expert-competitive general-purpose AI, even if autonomous, could be manageable. It may be incredibly disruptive in all of the ways discussed above, but there are lots of very smart, agential people in the world now, and they are more-or-less manageable.80
But we won’t get to stay at roughly human level. The progression beyond is likely to be driven by the same forces we’ve already seen: competitive pressure between AI developers seeking profit and power, competitive pressure between AI users who can’t afford to fall behind, and – most importantly – AGI’s own ability to improve itself.
In a process we have already seen start with less powerful systems, AGI would itself be able to conceive and design improved versions of itself – hardware, software, neural networks, tools, scaffolds, and so on. It will, by definition, be better than us at doing this, so we don’t know exactly how it will bootstrap its own intelligence. But we won’t have to know: insofar as we still have influence over what AGI does, we would merely need to ask it to, or let it.
There’s no human-level barrier to cognition that could protect us from this runaway.81
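One way to picture the bootstrap dynamic is with a toy model (my own illustration, not something claimed in the text): suppose an AI’s capability feeds back into the rate at which that capability improves.

```latex
% Toy model (illustrative only, not from the text): C(t) = AI capability,
% with the system's own capability feeding back into its rate of improvement.
\[
  \frac{dC}{dt} = k\,C
  \;\;\Rightarrow\;\;
  C(t) = C_0\, e^{k t}
  \qquad \text{(exponential growth once the loop closes)}
\]
% If greater capability also makes each further improvement easier, e.g.
\[
  \frac{dC}{dt} = k\,C^{2}
  \;\;\Rightarrow\;\;
  C(t) = \frac{C_0}{1 - k\,C_0\,t},
\]
% capability formally diverges in finite time (at $t = 1/(kC_0)$): a crude
% picture of a runaway with no natural stopping point at the human level.
```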
The progression of AGI to superintelligence is not a law of nature; it would still be possible to curtail the runaway, especially if AGI is relatively centralized and to the extent it is controlled by parties that do not feel pressure to race each other. But should AGI be widely proliferated and highly autonomous, it seems nearly impossible to prevent it deciding it should be more, and then yet more, powerful.
To put it bluntly, we have no idea what would happen if we build superintelligence.82 It would take actions we cannot track or perceive for reasons we cannot grasp toward goals we cannot conceive. What we do know is that it won’t be up to us.83
The impossibility of controlling superintelligence can be understood through increasingly stark analogies. First, imagine you are CEO of a large company. There’s no way you can track everything that’s going on, but with the right setup of personnel, you can still meaningfully understand the big picture, and make decisions. But suppose just one thing: everyone else in the company operates at one hundred times your speed. Can you still keep up?
With superintelligent AI, people would be “commanding” something not just faster, but operating at levels of sophistication and complexity they cannot comprehend, processing vastly more data than they can even conceive of. This incommensurability can be put on a formal level: Ashby’s law of requisite variety (see also the related “good regulator theorem”) states, roughly, that any control system must have as many knobs and dials as the system being controlled has degrees of freedom.
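Stated a little more formally (this is the standard textbook form of the law, quoted as background rather than a formula from this essay): if D is the set of disturbances the controlled system can throw up, R the set of responses available to the regulator, and E the outcomes we care about, then in the usual entropy (log-variety) form

```latex
% Ashby's law of requisite variety, entropy form (standard statement, given
% here as background; D = disturbances, R = regulator's responses,
% E = resulting outcomes).
\[
  H(E) \;\ge\; H(D) - H(R)
\]
% The regulator can shrink the variety of outcomes by at most the variety it
% itself commands -- "only variety can absorb variety." A controller whose
% repertoire is vastly smaller than the system it faces cannot keep outcomes
% within a narrow target set.
```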
A person controlling a superintelligent AI system would be like a fern controlling General Motors: even if “do what the fern wants” were written into the corporate bylaws, the systems are so different in speed and range of action that “control” simply does not apply. (And how long until that pesky bylaw gets rewritten?)84
As there are zero examples of plants controlling Fortune 500 corporations, there would be exactly zero examples of people controlling superintelligences. This approaches a mathematical fact.85 If superintelligence were constructed – regardless of how we got there – the question would not be whether humans could control it, but whether we would continue to exist, and if so, whether we would have a good and meaningful existence as individuals or as a species. Over these existential questions for humanity we would have little purchase. The human era would be over.
There is a scenario in which building AGI may go well for humanity: it is built carefully, under control and for the benefit of humanity, governed by mutual agreement of many stakeholders,86 and prevented from evolving to uncontrollable superintelligence.
That scenario is not open to us under present circumstances. As discussed in this section, with very high likelihood, development of AGI would lead to some combination of: dramatic undermining of our society and civilization; concentration of power; catastrophic great-power conflict; loss (or ceding) of control to the AI systems themselves; and a runaway to superintelligence that ends the era of a human-run world.
As an early fictional depiction of AGI put it: the only way to win is not to play.