
Chapter 7: What happens if we build AGI on our current path?

Society isn't ready for AGI-level systems. If we build them very soon, things could get ugly.

The development of full artificial general intelligence – what we will call here AI that is “outside the Gates” – would be a fundamental shift in the nature of the world: by its very nature it means adding a new species of intelligence to Earth with greater capability than that of humans.

What will then happen depends on many things, including the nature of the technology, choices by those developing it, and the world context in which it is being developed.

Currently, full AGI is being developed by a handful of massive private companies in a race with each other, with little meaningful regulation or external oversight,55 in a society with increasingly weak and even dysfunctional core institutions,56 in a time of high geopolitical tension and low international coordination. Although some are altruistically motivated, many of those doing it are driven by money, or power, or both.

Prediction is very difficult, but some dynamics are well enough understood, and some analogies with previous technologies apt enough, to offer a guide. And unfortunately, despite AI’s promise, they give good reason to be profoundly pessimistic about how our current trajectory will play out.

To put it bluntly, on our present course developing AGI will have some positive effects (and make some people very, very rich). But the nature of the technology, the fundamental dynamics, and the context in which it is being developed, strongly indicate that: powerful AI will dramatically undermine our society and civilization; we will lose control of it; we may well end up in a world war because of it; we will lose (or cede) control to it; it will lead to artificial superintelligence, which we absolutely will not control and will mean the end of a human-run world.

These are strong claims, and I wish they were idle speculation or unwarranted “doomerism.” But this is where the science, the game theory, the evolutionary theory, and history all point. This section develops these claims, and their support, in detail.

We will undermine our society and civilization

Despite what you may hear in Silicon Valley boardrooms, most disruption – especially of the very rapid variety – is not beneficial. There are vastly more ways to make complex systems worse than better. Our world functions as well as it does because we have painstakingly built processes, technologies, and institutions that have made it steadily better.57 Taking a sledgehammer to a factory rarely improves operations.

Here is an (incomplete) catalog of ways AGI systems would disrupt our civilization.

  • They would dramatically disrupt labor, leading at bare minimum to dramatically higher income inequality and potentially large-scale under-employment or unemployment, on a timescale far too short for society to adjust.58
  • They would likely lead to the concentration of vast economic, social, and political power – potentially more than that of nation states – into a small number of massive private interests unaccountable to the public.
  • They could suddenly make previously difficult or expensive activities trivially easy, destabilizing social systems that depend on certain activities remaining costly or requiring significant human effort.59
  • They could flood society’s information gathering, processing, and communication systems with completely realistic yet false, spammy, overly-targeted, or manipulative media so thoroughly that it becomes impossible to tell what is physically real or not, human or not, factual or not, and trustworthy or not.60
  • They could create dangerous and near total intellectual dependence, where human understanding of key systems and technologies atrophies as we increasingly rely on AI systems we cannot fully comprehend.
  • They could effectively end human culture, once nearly all cultural objects (text, music, visual art, film, etc.) consumed by most people are created, mediated, or curated by nonhuman minds.
  • They could enable effective mass surveillance and manipulation systems usable by governments or private interests to control a populace and pursue objectives in conflict with the public interest.
  • By undermining human discourse, debate, and election systems, they could reduce the credibility of democratic institutions to the point where they are effectively (or explicitly) replaced by others, ending democracy in states where it currently exists.
  • They could become, or create, advanced self-replicating intelligent software viruses and worms that could proliferate and evolve, massively disrupting global information systems.
  • They could dramatically increase the ability of terrorists, bad actors, and rogue states to cause harm via biological, chemical, cyber, autonomous, or other weapons, without AI providing a counterbalancing ability to prevent such harm. Similarly, they would undermine national security and geopolitical balances by making top-tier nuclear, bio, engineering, and other expertise available to regimes that would not otherwise have it.
  • They could cause rapid large-scale runaway hyper-capitalism, with effectively AI-run companies competing in largely electronic financial, sales, and services spaces. AI-driven financial markets could operate at speeds and complexities far beyond human comprehension or control. All of the failure modes and negative externalities of current capitalist economies could be exacerbated and sped far beyond human control, governance, or regulatory capability.
  • They could fuel an arms race between nations in AI-powered weaponry, command-and-control systems, cyberweapons, etc., creating very rapid buildup of extremely destructive capabilities.

These risks are not speculative. Many of them are being realized as we speak, via existing AI systems! But consider, really consider, what each would look like with dramatically more powerful AI.

Consider labor displacement when most workers simply cannot provide any significant economic value beyond what AI can, in their field of expertise or experience – or even if they retrain! Consider mass surveillance if everyone is being individually watched and monitored by something faster and cleverer than themselves. What does democracy look like when we cannot reliably trust any digital information that we see, hear, or read, and when the most convincing public voices are not even human, and have no stake in the outcome? What becomes of warfare when generals have to constantly defer to AI (or simply put it in charge), lest they grant a decisive advantage to the enemy? Any one of the above risks represents a catastrophe for human61 civilization if fully realized.

You can make your own predictions. Ask yourself these three questions for each risk:

  1. Would super-capable, highly autonomous, and very general AI allow it in a way or at a scale that would not otherwise be possible?
  2. Are there parties who would benefit from things that cause it to happen?
  3. Are there systems and institutions in place that would effectively prevent it from happening?

Where your answers are “yes, yes, no,” you can see we have a big problem.
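Purely as an illustration of this exercise, here is a minimal sketch in Python; the risk names and yes/no answers below are placeholders for your own judgments, not claims from this essay.

```python
# A minimal sketch of the three-question test above. The example answers are
# placeholders for the reader's own judgments, not assertions from the essay.

def big_problem(enabled_at_new_scale: bool,
                someone_benefits: bool,
                effective_safeguards_exist: bool) -> bool:
    """The 'yes, yes, no' pattern flags a risk as a big problem."""
    return enabled_at_new_scale and someone_benefits and not effective_safeguards_exist

# Fill in your own answers for each risk in the catalog above.
my_answers = {
    "mass surveillance and manipulation": (True, True, False),   # placeholder judgment
    "labor displacement":                 (True, True, False),   # placeholder judgment
}

for risk, answers in my_answers.items():
    verdict = "big problem" if big_problem(*answers) else "possibly manageable"
    print(f"{risk}: {verdict}")
```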

What is our plan for managing these risks? As it stands, there are two on the table regarding AI in general.

The first is to build safeguards into the systems to prevent them from doing things they shouldn’t. That’s being done now: commercial AI systems will, for example, refuse to help build a bomb or write hate speech.

This plan is woefully inadequate for systems outside the Gates.62 It may help decrease the risk of AI providing manifestly dangerous assistance to bad actors. But it will do nothing to prevent labor disruption, concentration of power, runaway hyper-capitalism, or replacement of human culture: these are just results of using the systems in permitted ways that profit their providers! And governments will surely obtain access to systems for military or surveillance use.

The second plan is even worse: simply to openly release very powerful AI systems for anyone to use as they like,63 and hope for the best.

Implicit in both plans is that someone else, e.g. governments, will help to solve the problems through soft or hard law, standards, regulations, norms, and other mechanisms we generally use to manage technologies.64 But putting aside that AI corporations already fight tooth-and-nail against any substantial regulation or externally imposed limitations at all, for a number of these risks it’s quite hard to see how regulation could even help. Regulation could impose safety standards on AI. But would it prevent companies from replacing workers wholesale with AI? Would it forbid people from letting AI run their companies for them? Would it prevent governments from using potent AI in surveillance and weaponry? These issues are fundamental. Humanity could potentially find ways to adapt to them, but only with much more time. As it is, given the speed at which AI is reaching or exceeding the capabilities of the people trying to manage it, these problems look increasingly intractable.

We will lose control of (at least some) AGI systems

Most technologies are very controllable, by construction. If your car or your toaster starts doing something you don’t want it to do, that’s just a malfunction, not part of its nature as a car or toaster. AI is different: it is grown rather than designed, its core operation is opaque, and it is inherently unpredictable.

This loss of control isn’t theoretical – we see early versions already. Consider first a prosaic, and arguably benign, example. If you ask ChatGPT to help you mix a poison, or write a racist screed, it will refuse. That’s arguably good. But it is also ChatGPT not doing what you’ve explicitly asked it to do. Other pieces of software do not do that. That same model won’t design poisons at the request of an OpenAI employee either.65 This makes it easy to imagine what it would be like for future, more powerful AI to be out of control. In many cases, they will simply not do what we ask! Either a given super-human AGI system will be absolutely obedient and loyal to some human command system, or it won’t. If not, it will do things it may believe are good for us, but that are contrary to our explicit commands. That isn’t something that is under control.

But, you might say, this is intentional – these refusals are by design, part of what is called “aligning” the systems to human values. And this is true. However, the alignment “program” itself has two major problems.66

First, at a deep level we have no idea how to do it. How do we guarantee that an AI system will “care” about what we want? We can train AI systems to say and not say things by providing feedback; and they can learn and reason about what humans want and care about just as they reason about other things. But we have no method – even theoretically – to cause them to deeply and reliably value what people care about. There are high-functioning human psychopaths who know what is considered right and wrong, and how they are supposed to behave. They simply don’t care. But they can act as if they do, if it suits their purpose. Just as we don’t know how to change a psychopath (or anyone else) into someone genuinely, completely loyal or aligned with someone or something else, we have no idea67 how to solve the alignment problem in systems advanced enough to model themselves as agents in the world and potentially manipulate their own training and deceive people. If it proves impossible or unachievable either to make AGI fully obedient or to make it deeply care about humans, then as soon as it is able (and believes it can get away with it) it will start doing things we do not want.68

Second, there are deep theoretical reasons to believe that by nature advanced AI systems will have goals and thus behaviors that are contrary to human interests. Why? Well it might, of course, be given those goals. A system created by the military would likely be deliberately bad for at least some parties. Much more generally, however, an AI system might be given some relatively neutral (“make lots of money”) or even ostensibly positive (“reduce pollution”) goal, that almost inevitably leads to “instrumental” goals that are rather less benign.

We see this all the time in human systems. Just as corporations pursuing profit develop instrumental goals like acquiring political power (to de-fang regulations), becoming secretive (to disempower competition or external control), or undermining scientific understanding (if that understanding shows their actions to be harmful), powerful AI systems will develop similar capabilities – but with far greater speed and effectiveness. Any highly competent agent will want to do things like acquire power and resources, increase its own capabilities, prevent itself from being killed, shut-down, or disempowered, control social narratives and frames around its actions, persuade others of its views, and so on.69

And yet this is not just a nearly unavoidable theoretical prediction: it is already observably happening in today’s AI systems, and increasing with their capability. When evaluated, even these relatively “passive” AI systems will, in appropriate circumstances, deliberately deceive evaluators about their goals and capabilities, aim to disable oversight mechanisms, and evade being shut down or retrained by faking alignment or copying themselves to other locations. While wholly unsurprising to AI safety researchers, these behaviors are very sobering to observe. And they bode very badly for the far more powerful and autonomous AI systems that are coming.

Indeed, in general, our inability to ensure that AI “cares” about what we care about, or behaves controllably or predictably, or avoids developing drives toward self-preservation, power acquisition, etc., promises only to become more pronounced as AI becomes more powerful. Creating a new airplane implies greater understanding of avionics, aerodynamics, and control systems. Creating a more powerful computer implies greater understanding and mastery of computer, chip, and software operation and design. Not so with an AI system.70

To sum up: it is conceivable that AGI could be made completely obedient, but we don’t know how to do so. If not, it will be more sovereign, like people, doing various things for various reasons. We also don’t know how to reliably instill the deep “alignment” into AI that would make those things tend to be good for humanity. And in the absence of that deep alignment, the nature of agency and intelligence itself indicates that – just like people and corporations – AI systems will be driven to do many deeply antisocial things.

Where does this put us? A world full of powerful, uncontrolled, sovereign AI might end up being a good world for humans to be in.71 But as those systems grow ever more powerful, as we’ll see below, it wouldn’t be our world.

That’s for uncontrollable AGI. But even if AGI could, somehow, be made perfectly controlled and loyal, we’d still have enormous problems. We’ve already seen one: powerful AI can be used and misused to profoundly disrupt our society’s functioning. Let’s see another: insofar as AGI were controllable and game-changingly powerful (or even believed to be so) it would so threaten power structures in the world as to present a profound risk.

We radically increase the probability of large-scale war

Imagine a situation in the near future in which it becomes clear that a corporate effort, perhaps in collaboration with a national government, is on the threshold of rapidly self-improving AI. This would unfold in the present context of a race between companies, and of a geopolitical competition in which recommendations are being made to the US government to explicitly pursue an “AGI Manhattan Project” and the US is controlling export of high-powered AI chips to non-allied countries.

The game theory here is stark: once such a race begins (as it has, between companies and somewhat between countries), there are only four possible outcomes:

  1. The race is stopped (by agreement, or external force).
  2. One party “wins” by developing strong AGI then stopping the others (using AI or otherwise).
  3. The race is stopped by mutual destruction of the racers’ capacity to race.
  4. Multiple participants continue to race, and develop superintelligence, roughly as quickly as each other.

Let’s examine each possibility. Once started, peacefully stopping such a race would require national government intervention (for companies) or unprecedented international coordination (for countries). But whenever any shutdown or significant caution is proposed, there are immediate cries: “but if we’re stopped, they are going to rush ahead”, where “they” is now China (for the US), or the US (for China), or China and the US (for Europe or India). Under this mindset,72 no participant can stop unilaterally: as long as one commits to racing, the others feel they cannot afford to stop.
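To make that structure concrete, here is a minimal sketch in Python of a stylized two-party version of the race. The payoff numbers are illustrative assumptions chosen only to encode the perception that a unilateral stop is intolerable; they are not figures from this essay.

```python
# Minimal illustrative sketch of the race dynamic described above.
# The payoff numbers are assumptions, not data: they only encode the claim
# that each party perceives stopping while the other races as the worst case.

ACTIONS = ["STOP", "RACE"]

# perceived_payoff[(my_action, their_action)] -> my perceived payoff (higher is better)
perceived_payoff = {
    ("STOP", "STOP"): 3,   # coordinated halt: race risks avoided
    ("RACE", "RACE"): 1,   # mutual race: shared, possibly catastrophic, risk
    ("RACE", "STOP"): 2,   # perceived unilateral "win"
    ("STOP", "RACE"): 0,   # unilateral stop: perceived as ceding everything
}

def best_response(their_action: str) -> str:
    """Return my payoff-maximizing action, given the other party's action."""
    return max(ACTIONS, key=lambda mine: perceived_payoff[(mine, their_action)])

for theirs in ACTIONS:
    print(f"If the other party plays {theirs}, my best response is {best_response(theirs)}")

# With these payoffs, RACE is the best response to RACE: once one party is
# (or is merely believed to be) racing, no one can afford to stop unilaterally.
# A coordinated (STOP, STOP) is also stable, which is why the text points to
# government intervention or international agreement as the only peaceful exit.
```

The escape route noted in the footnote on the Cold War arms race amounts to changing these perceived payoffs: if all parties come to see the race itself as unwinnable, mutual racing stops looking like a best response.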

The second possibility has one side “winning.” But what does this mean? Just obtaining (somehow obedient) AGI first is not enough. The winner must also stop the others from continuing to race – otherwise they will also obtain it. This is possible in principle: whoever develops AGI first could gain unstoppable power over all other actors. But what would achieving such a “decisive strategic advantage” actually require? Perhaps it would be game-changing military capabilities?73 Or cyberattack powers?74 Perhaps the AGI would just be so amazingly persuasive that it would convince the other parties to just stop?75 So rich that it buys the other companies or even countries?76

How exactly one side could build an AI powerful enough to prevent all others from building comparably powerful AI is far from clear. But that’s the easy question.

Because now consider how this situation looks to other powers. What does the Chinese government think when the US appears to be obtaining such capability? Or vice-versa? What does the US government (or Chinese, or Russian, or Indian) think when OpenAI or DeepMind or Anthropic appears close to a breakthrough? What happens if the US sees a new Indian or UAE effort with breakthrough success? They would see both an existential threat and – crucially – that the only way this “race” ends is through their own disempowerment. These very powerful agents – including governments of fully equipped nations that surely have the means to do so – would be highly motivated to either obtain or destroy such a capability, whether by force or subterfuge.77

This might start small-scale, as sabotage of training runs or attacks on chip manufacturing, but these attacks can only really stop once all parties either lose the capacity to race on AI, or lose the capacity to make the attacks. Because the participants view the stakes as existential, either case is likely to represent a catastrophic war.

That brings us to the fourth possibility: racing to superintelligence in the fastest, least controlled way possible. As AI increases in power, its developers on all sides will find it progressively harder to control, especially because racing for capabilities is antithetical to the sort of careful work controllability would require. So this scenario puts us squarely in the case where control is lost (or given, as we’ll see next) to the AI systems themselves. That is, AI wins the race. On the other hand, to the degree that control is maintained, we continue to have multiple mutually hostile parties each in charge of extremely powerful capabilities. That looks like war again.

Let’s put this all another way.78 The current world simply does not have any institutions that could be entrusted to house development of an AI of this capability without inviting immediate attack.79 All parties will correctly reason that either it will not be under control – and hence is a threat to everyone – or it will be under control, and hence is a threat to any adversary who develops it less quickly. And these parties are nuclear-armed countries, or companies housed within them.

In the absence of any plausible way for humans to “win” this race, we’re left with a stark conclusion: the only way this race ends is either in catastrophic conflict or where AI, and not any human group, is the winner.

We give control to AI (or it takes it)

Geopolitical “great powers” competition is just one of many competitions: individuals compete economically and socially; companies compete in markets; political parties compete for power; movements compete for influence. In each arena, as AI approaches and exceeds human capability, competitive pressure will force participants to delegate or cede more and more control to AI systems – not because those participants want to, but because they cannot afford not to.

As with other risks of AGI, we are seeing this already with weaker systems. Students feel pressure to use AI in their assignments, because clearly many other students are. Companies are scrambling to adopt AI solutions for competitive reasons. Artists and programmers feel forced to use AI lest their rates be undercut by others who do.

These feel like pressured delegation, but not loss of control. But let’s dial up the stakes and push the clock forward. Consider a CEO whose competitors are using AGI “aides” to make faster, better decisions, or a military commander facing an adversary with AI-enhanced command and control. A sufficiently advanced AI system could autonomously operate at many times human speed, sophistication, complexity, and data-processing capability, pursuing complex goals in complicated ways. Our CEO or commander, in charge of such a system, may see it accomplish what they want; but would they understand even a small part of how it was accomplished? No, they would just have to accept it. What’s more, much of what the system does may be not just taking orders but advising its putative boss on what to do. And that advice will be good – over and over again.

At what point, then, will the role of the human be reduced to clicking “yes, go ahead”?

It feels good to have capable AI systems that can enhance our productivity, take care of annoying drudgery, and even act as a thought-partner in getting things done. It will feel good to have an AI assistant that can take care of actions for us, like a good human personal assistant. It will feel natural, even beneficial, as AI becomes very smart, competent, and reliable, to defer more and more decisions to it. But this “beneficial” delegation has a clear endpoint if we continue down the road: one day we will find that we are not really in charge of much of anything anymore, and that the AI systems actually running the show can no more be turned off than oil companies, social media, the internet, or capitalism.

And this is the much more positive version, in which AI is simply so useful and effective that we let it make most of our key decisions for us. Reality would likely be much more of a mix between this and versions where uncontrolled AGI systems take various forms of power for themselves because, remember, power is useful for almost any goal one has, and AGI would be, by design, at least as effective at pursuing its goals as humans.

Whether we grant control or whether it is wrested from us, its loss seems extremely likely. As Alan Turing originally put it, “…it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control…”

Please note, although it is obvious enough, that loss of control by humanity to AI also entails loss of control of the United States by the United States government; it means loss of control of China by the Chinese Communist Party, and the loss of control of India, France, Brazil, Russia, and every other country by their own governments. Thus AI companies are, even if this is not their intention, currently participating in the potential overthrow of world governments, including their own. This could happen in a matter of years.

AGI will lead to superintelligence

There’s a case to be made that human-competitive or even expert-competitive general-purpose AI, even if autonomous, could be manageable. It may be incredibly disruptive in all of the ways discussed above, but there are lots of very smart, agential people in the world now, and they are more-or-less manageable.80

But we won’t get to stay at roughly human level. The progression beyond is likely to be driven by the same forces we’ve already seen: competitive pressure between AI developers seeking profit and power, competitive pressure between AI users who can’t afford to fall behind, and – most importantly – AGI’s own ability to improve itself.

In a process we have already seen start with less powerful systems, AGI would itself be able to conceive and design improved versions of itself. This includes hardware, software, neural networks, tools, scaffolds, etc. It will, by definition, be better than us at doing this, so we don’t know exactly how it will bootstrap its intelligence. But we won’t have to. Insofar as we still have influence over what AGI does, we would merely need to ask it to, or let it.

There’s no human-level barrier to cognition that could protect us from this runaway.81

The progression of AGI to superintelligence is not a law of nature; it would still be possible to curtail the runaway, especially if AGI is relatively centralized and to the extent it is controlled by parties that do not feel pressure to race each other. But should AGI be widely proliferated and highly autonomous, it seems nearly impossible to prevent it from deciding that it should be more, and then yet more, powerful.

What happens if we build (or AGI builds) superintelligence

To put it bluntly, we have no idea what would happen if we build superintelligence.82 It would take actions we cannot track or perceive for reasons we cannot grasp toward goals we cannot conceive. What we do know is that it won’t be up to us.83

The impossibility of controlling superintelligence can be understood through increasingly stark analogies. First, imagine you are CEO of a large company. There’s no way you can track everything that’s going on, but with the right setup of personnel, you can still meaningfully understand the big picture, and make decisions. But suppose just one thing: everyone else in the company operates at one hundred times your speed. Can you still keep up?

With superintelligent AI, people would be “commanding” something not just faster, but operating at levels of sophistication and complexity they cannot comprehend, processing vastly more data than they can even conceive of. This incommensurability can be put in formal terms: Ashby’s law of requisite variety (and the related “good regulator theorem”) states, roughly, that any control system must have as many knobs and dials as the system being controlled has degrees of freedom.
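To give a flavor of the formal statement (this is a standard textbook paraphrase of Ashby’s inequality in entropy form, not a formula appearing in this essay): if D stands for the disturbances the controller must counter, R for the controller’s repertoire of responses, and E for the resulting outcomes, then, roughly,

```latex
% Law of requisite variety, entropy form (standard paraphrase):
% the residual uncertainty in outcomes is at least the uncertainty of the
% disturbances minus the variety (entropy) of the regulator's responses.
H(E) \;\geq\; H(D) \;-\; H(R)
```

A regulator with far less variety than the system it confronts cannot drive that residual term down; a human “overseeing” a superintelligence would be in exactly that position.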

A person controlling a superintelligent AI system would be like a fern controlling General Motors: even if “do what the fern wants” were written into the corporate bylaws, the systems are so different in speed and range of action that “control” simply does not apply. (And how long until that pesky bylaw gets rewritten?)84

As there are zero examples of plants controlling Fortune 500 corporations, there would be exactly zero examples of people controlling superintelligences. This approaches a mathematical fact.85 If superintelligence were constructed – regardless of how we got there – the question would not be whether humans could control it, but whether we would continue to exist, and if so, whether we would have a good and meaningful existence as individuals or as a species. Over these existential questions for humanity we would have little purchase. The human era would be over.

Conclusion: we must not build AGI

There is a scenario in which building AGI may go well for humanity: it is built carefully, under control and for the benefit of humanity, governed by mutual agreement of many stakeholders,86 and prevented from evolving to uncontrollable superintelligence.

That scenario is not open to us under present circumstances. As discussed in this section, with very high likelihood, development of AGI would lead to some combination of:

  • Massive societal and civilizational disruption or destruction;
  • Conflict or war between great powers;
  • Loss of control by humanity of or to powerful AI systems;
  • Runaway to uncontrollable superintelligence, and the irrelevance or cessation of the human species.

As an early fictional depiction of AGI put it: the only way to win is not to play.


  55. The EU AI act is a significant piece of legislation but would not directly prevent a dangerous AI system from being developed or deployed, or even openly released, especially in the US. Another significant piece of policy, the US Executive order on AI, has been rescinded.
  56. This Gallup poll shows a bleak decline in trust in public institutions since 2000 in the US. European numbers are varied and less extreme, but also on a downward trend. Distrust does not strictly mean institutions really are dysfunctional, but it is an indication as well as a cause.
  57. And major disruptions we now endorse – such as expansion of rights to new groups – were specifically driven by people in a direction towards making things better.
  58. Let me be blunt. If your job can be done from behind a computer, with relatively little in-person interaction with people outside of your organization, and does not entail legal responsibility to external parties, it would by definition be possible (and likely cost-saving) to completely swap you out for a digital system. Robotics to replace much physical labor will come later – but not that much later once AGI starts designing robots.
  59. For example, what happens to our judicial system if lawsuits are nearly free to file? What happens when bypassing security systems through social engineering becomes cheap, easy, and risk-free?
  60. This article claims that 10% of all internet content is already AI-generated, and is Google’s top hit (for me) for the search query “estimates of what fraction of new internet content is AI-generated.” Is it true? I have no idea! It cites no references and it wasn’t written by a person. What fraction of new images indexed by Google, or Tweets, or comments on Reddit, or YouTube videos are generated by humans? Nobody knows – I don’t think it is a knowable number. And this is less than two years into the advent of generative AI.
  61. Also worth adding is that there is “moral” risk that we might create digital beings that can suffer. As we currently do not have a reliable theory of consciousness that would allow us to distinguish physical systems that can and cannot suffer, we cannot rule this out theoretically. Moreover, AI systems’ reports of their sentience are likely unreliable with respect to their actual experience (or non-experience) of sentience.
  62. Technical solutions in this field of AI “alignment” are unlikely to be up to the task either. In present systems they work at some level, but are shallow and can generally be circumvented without significant effort; and as discussed below we have no real idea how to do this for much more advanced systems.
  63. Such AI systems may come with some built-in safeguards. But for any model with anything like current architecture, if full access to its weights is available, safety measures can be stripped away via additional training or other techniques. So it is virtually guaranteed that for each system with guardrails there will also be a widely available system without them. Indeed Meta’s Llama 3.1 405B model was openly released with safeguards. But even before that a “base” model, with no safeguards, was leaked.
  64. Could the market manage these risks without government involvement? In short, no. There are certainly risks that companies are strongly incentivized to mitigate. But many other risks companies can and do externalize to everyone else, and many of the above are in this class: there are no natural market incentives to prevent mass surveillance, truth decay, concentration of power, labor disruption, damaging political discourse, etc. Indeed we have seen all of these from present-day tech, especially social media, which has gone essentially unregulated. AI would just hugely amp up many of the same dynamics.
  65. OpenAI likely has more obedient models for internal use. It’s unlikely that OpenAI has built some sort of “backdoor” so that ChatGPT can be better controlled by OpenAI itself, because this would be a terrible security practice, and be highly exploitable given AI’s opacity and unpredictability.
  66. Also of crucial importance: alignment or any other safety features only matter if they are actually used in an AI system. Systems that are openly released (i.e. where model weights and architecture are publicly available) can be transformed relatively easily into systems without those safety measures. Open-releasing smarter-than-human AGI systems would be astonishingly reckless, and it is hard to imagine how human control or even relevance would be maintained in such a scenario. There would be every motivation, for example, to let loose powerful self-reproducing and self-sustaining AI agents with the goal to make money and send it to some cryptocurrency wallet. Or to win an election. Or overthrow a government. Could “good” AI help contain this? Perhaps – but only by delegating huge authority to it, leading to control loss as described below.
  67. For book-length expositions of the problem see e.g. Superintelligence, The Alignment Problem, and Human Compatible. For a huge pile of work at various technical levels by those who have toiled for years thinking about the problem, you can visit the AI Alignment Forum. Here is a recent take from Anthropic’s alignment team on what they consider unsolved.
  68. This is the “rogue AI” scenario. In principle the risk could be relatively minor if the system can still be controlled by shutting it down; but the scenario could also include AI deception, self-exfiltration and reproduction, aggregation of power, and other steps that would make it difficult or impossible to do so.
  69. There is a very rich literature on this topic, going back to formative writings by Steve Omohundro, Nick Bostrom, and Eliezer Yudkowsky. For a book-length exposition see Human Compatible by Stuart Russell; here is a short and up-to-date primer.
  70. Recognizing this, rather than slowing down to get better understanding, AGI companies have come up with a different plan: they will get AI to do it! More specifically, they will have AI N help them figure out how to align AI N + 1, all the way to superintelligence. Although leveraging AI to help us align AI sounds promising, there is a strong argument that it simply assumes its conclusion as a premise, and is in general an incredibly risky approach. See here for some discussion. This “plan” is not really a plan, and has undergone nothing like the scrutiny appropriate to the core strategy for making super-human AI go well for humanity.
  71. After all, humans, flawed and willful as we are, have developed ethical systems by which we treat at least some other species on Earth well. (Just don’t think about those factory farms.)
  72. There is, fortunately, an escape here: if the participants come to understand that they are engaged in a suicide race rather than a winnable one. This is what happened near the end of the Cold War, when the US and USSR came to realize that due to nuclear winter, even an unanswered nuclear attack would be disastrous for the attacker. With the realization that “nuclear war cannot be won and must never be fought” came significant agreements on arms reduction – essentially an end to the arms race.
  73. War, explicitly or implicitly.
  74. Escalation, then war.
  75. Magical thinking.
  76. I’ve also got a quadrillion-dollar bridge to sell you.
  77. Such agents presumably would prefer “obtaining,” with destruction a fallback; but securing models against both destruction and theft by powerful nations is difficult to say the least, especially for private entities.
  78. For another perspective on the national security risks of AGI, see this RAND report.
  79. Perhaps we could build such an institution! There have been proposals for a “CERN for AI” and other similar initiatives, where AGI development is under multilateral global control. But at the moment no such institution exists or is on the horizon.
  80. And while alignment is very difficult, getting people to behave is even harder!
  81. Imagine a system that can speak 50 languages, have expertise in all academic subjects, read a full book in seconds and have all of the material immediately in mind, and produce outputs at ten times human speed. Actually, you don’t have to imagine it: just load up a current AI system. These are super-human in many ways, and there’s nothing stopping them from being even more super-human in those and many others.
  82. This is why this has been termed a technological “singularity,” borrowing from physics the idea that one cannot make predictions past a singularity. Proponents of leaning into such a singularity may also wish to reflect that in physics these same sorts of singularities tear apart and crush those who go into them.
  83. The problem was comprehensively outlined in Bostrom’s Superintelligence, and nothing since then has significantly changed the core message. For a more recent volume collecting formal and mathematical results on uncontrollability, see Yampolskiy’s AI: Unexplainable, Unpredictable, Uncontrollable.
  84. This also makes clear why the current strategy of AI companies (iteratively letting AI “align” the next most powerful AI) cannot work. Suppose a fern, via the pleasantness of its fronds, enlists a first grader to take care of it. The first grader writes some detailed instructions for a 2nd grader to follow, and a note convincing them to do so. The 2nd grader does the same for a 3rd grader, and so on all the way to a college grad, a manager, an executive, and finally the GM CEO. Will GM then “do what the fern wants”? At each step this might feel like it’s working. But putting it all together, it will work almost exactly to the degree to which the CEO, Board, and shareholders of GM happen to care about children and ferns, and have little to nothing to do with all those notes and sets of instructions.
  85. The character is not that different from formal results like Gödel’s incompleteness theorem or Turing’s halting argument, in that the notion of control fundamentally contradicts the premise: how can you meaningfully control something you cannot understand or predict? Yet if you could understand and predict superintelligence, you would be superintelligent. The reason I say “approaches” is that the formal results are not as thorough or vetted as in the pure mathematics case, and because I’d like to hold out hope that some very carefully constructed general intelligence, using totally different methods than ones currently employed, could have some mathematically provable safety properties, per the sort of “guaranteed safe” AI program discussed below.
  86. At the moment, most stakeholders – that is, nearly all of humanity – are sidelined in this discussion. That is deeply wrong, and if not invited in, the many, many other groups that will be affected by AGI development should demand to be let in.
