This work reflects the opinions of the author and should not be taken as the official position of the Future of Life Institute (though they are compatible; for its official position see this page) or of any other organization with which the author is affiliated.
I’m grateful to humans Mark Brakel, Ben Eisenpress, Anna Hehir, Carlos Gutierrez, Emilia Javorsky, Richard Mallah, Jordan Scharnhorst, Elyse Fulcher, Max Tegmark, and Jaan Tallinn for comments on the manuscript; to Tim Schrier for help with some references; to Taylor Jones and Elyse Fulcher for beautification of diagrams.
This work made limited use of generative AI models (Claude and ChatGPT) in its creation, for some editing and red-teaming. In the well-established standard of levels of AI involvement of creative works, this work would probably rate a 3/10. (There is in fact no such standard! But there should be.)
Compute accounting technical details
A detailed method for establishing both “ground truth” values and good approximations of the total compute used in training and inference is required for meaningful compute-based controls. Here is an example of how the “ground truth” could be tallied at a technical level.
Definitions:
Compute causal graph: For a given output O of an AI model, there is a set of digital computations for which changing the result of any one of them could potentially change O. (Inclusion should be assumed conservatively, i.e. a computation should be excluded only if there is a clear reason to believe O is independent of it, even though it occurs earlier in time and has a potential physical causal path of effect.) This graph includes computation done by the AI model during inference, as well as computations that went into its input, data preparation, and training of the model. Because any of these may itself be output from an AI model, the graph is computed recursively, cut off wherever a human has provided a significant change to the input.
Training Compute: The total compute, in FLOP or other units, entailed by the compute causal graph of a neural network (including data preparation, training, fine-tuning, and any other computations).
Output Compute: The total compute in the compute causal graph of a given AI output, including all neural networks (and including their Training Compute) and other computations going into that output.
Inference Compute Rate: In a series of outputs, the rate of change (in FLOP/s or other units) of Output Compute between outputs, i.e. the compute used to produce the next output, divided by the time interval between the outputs.
Examples and approximations:
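As a concrete illustration of these definitions (not an official accounting standard; the node structure, cutoff rule, and FLOP figures below are simplified assumptions), the compute causal graph can be represented as a small data structure and the quantities above tallied recursively:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ComputeNode:
    """One computation in the compute causal graph of an AI output."""
    name: str
    flop: float                             # compute done by this node itself, in FLOP
    precursors: List["ComputeNode"] = field(default_factory=list)
    human_cutoff: bool = False              # True if a human significantly changed the input here

def output_compute(node: ComputeNode, seen=None) -> float:
    """Tally Output Compute: all compute in the causal graph of an output,
    recursing through precursors and stopping at human-provided cutoffs.
    Shared precursors (e.g. one base model feeding many steps) are counted once."""
    if seen is None:
        seen = set()
    if id(node) in seen:
        return 0.0
    seen.add(id(node))
    total = node.flop
    if not node.human_cutoff:
        for p in node.precursors:
            total += output_compute(p, seen)
    return total

def inference_compute_rate(prev_output_compute: float, next_output_compute: float,
                           dt_seconds: float) -> float:
    """Inference Compute Rate: the compute used to produce the next output,
    divided by the time interval between outputs (FLOP/s)."""
    return (next_output_compute - prev_output_compute) / dt_seconds

# Illustrative numbers only (assumptions, not measurements of any real system):
data_prep = ComputeNode("data preparation", flop=1e24)
training  = ComputeNode("base model training", flop=3e26, precursors=[data_prep])
step1     = ComputeNode("inference step 1", flop=1e15, precursors=[training])
step2     = ComputeNode("inference step 2", flop=2e15, precursors=[step1])

oc1, oc2 = output_compute(step1), output_compute(step2)
print(f"Output Compute of step 2: {oc2:.3g} FLOP")
print(f"Inference Compute Rate:   {inference_compute_rate(oc1, oc2, dt_seconds=0.5):.3g} FLOP/s")
```

In practice the graph would be assembled from audited logs of training runs and inference clusters rather than hand-built objects, but the accounting logic would be the same in spirit.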
Implementation Example: Here is one example of how a gate closure could work, given a limit of 10^27 FLOP for training and 10^20 FLOP/s for inference (running the AI); a rough compute-screening sketch follows the numbered steps below:
1. Pause: For reasons of national security, the US Executive branch asks all companies based in the US, doing business in the US, or using chips manufactured in the US, to cease and desist from any new AI training runs that might exceed the 10^27 FLOP Training Compute limit. The US should commence discussions with other countries hosting AI development, strongly encouraging them to take similar steps and indicating that the US pause may be lifted should they choose not to comply.
2. US oversight and licensing: By executive order or action of an existing regulatory agency, the US requires that within (say) one year:
3. International oversight:
4. International verification and enforcement:
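To illustrate how the numbers in this example could be screened in practice, here is a minimal sketch. It assumes the example caps above and the commonly used rule of thumb of roughly 6 FLOP per parameter per training token for dense transformers; this is a rough estimate for a first pass, not the “ground truth” accounting described earlier.

```python
# Example thresholds from this implementation example (illustrative assumptions):
TRAINING_CAP_FLOP   = 1e27   # Training Compute gate
INFERENCE_CAP_FLOPS = 1e20   # Inference Compute Rate gate, in FLOP/s

def estimated_training_flop(n_params: float, n_tokens: float) -> float:
    """Rule of thumb for dense transformers: ~6 FLOP per parameter per training
    token (forward + backward pass). A rough estimate, not ground truth."""
    return 6.0 * n_params * n_tokens

def training_run_flagged(n_params: float, n_tokens: float, margin: float = 0.1):
    """Flag any planned run at or near the cap, since the estimate is approximate."""
    est = estimated_training_flop(n_params, n_tokens)
    return est >= (1.0 - margin) * TRAINING_CAP_FLOP, est

def inference_cluster_flagged(flop_per_s_devoted_to_one_system: float) -> bool:
    """Flag any deployment whose sustained rate reaches the inference gate."""
    return flop_per_s_devoted_to_one_system >= INFERENCE_CAP_FLOPS

# A hypothetical 2-trillion-parameter model trained on 80 trillion tokens:
flagged, est = training_run_flagged(n_params=2e12, n_tokens=8e13)
print(f"Estimated Training Compute: {est:.2e} FLOP -> requires gate review: {flagged}")
print(f"10^19 FLOP/s cluster flagged: {inference_cluster_flagged(1e19)}")
```

A screen like this would only be a first pass; anything near the caps would then require the full compute accounting, licensing, and verification process described in this example.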
Details for a strict AGI liability regime
A tiered approach to AGI safety & security standards

| Risk Tier | Trigger(s) | Requirements for training | Requirements for deployment |
|---|---|---|---|
| RT-0 | AI weak in autonomy, generality, and intelligence | None | None |
| RT-1 | AI strong in one of autonomy, generality, and intelligence | None | Based on risk and use, potentially safety cases approved by national authorities wherever the model can be used |
| RT-2 | AI strong in two of autonomy, generality, and intelligence | Registration with national authority with jurisdiction over the developer | Safety case bounding risk of major harm below authorized levels, plus independent safety audits (including black-box and white-box red-teaming) approved by national authorities wherever the model can be used |
| RT-3 | AGI strong in autonomy, generality, and intelligence | Pre-approval of safety and security plan by national authority with jurisdiction over the developer | Safety case guaranteeing bounded risk of major harm below authorized levels, as well as required specifications including cybersecurity, controllability, a non-removable killswitch, alignment with human values, and robustness to malicious use |
| RT-4 | Any model that also exceeds either 10^27 FLOP Training Compute or 10^20 FLOP/s Inference Compute Rate | Prohibited pending internationally agreed lifting of the compute cap | Prohibited pending internationally agreed lifting of the compute cap |
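For illustration only, the trigger logic of the table reduces to a few lines of code. The judgments of whether a system is “strong” in autonomy, generality, or intelligence would of course come from evaluations and standards, not from code; the sketch below simply assumes those judgments are given.

```python
def risk_tier(strong_autonomy: bool, strong_generality: bool, strong_intelligence: bool,
              training_flop: float, inference_flop_per_s: float) -> str:
    """Assign a risk tier following the triggers in the table above.
    The compute triggers use the example caps of 10^27 FLOP and 10^20 FLOP/s."""
    if training_flop > 1e27 or inference_flop_per_s > 1e20:
        return "RT-4"   # prohibited pending internationally agreed lifting of the caps
    n_strong = sum([strong_autonomy, strong_generality, strong_intelligence])
    return f"RT-{n_strong}"   # RT-0 through RT-3

# Example: a highly general and capable but low-autonomy system, below the compute caps.
print(risk_tier(strong_autonomy=False, strong_generality=True, strong_intelligence=True,
                training_flop=5e26, inference_flop_per_s=1e18))   # -> RT-2
```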
The last time humanity shared the Earth with other minds that spoke, thought, built technology, and did general-purpose problem solving was 40,000 years ago in ice-age Europe. Those other minds went extinct, wholly or in part due to the efforts of ours.
We are now re-entering such a time. The most advanced products of our culture and technology – datasets built from our entire internet information commons, and 100-billion-element chips that are the most complex technologies we have ever crafted – are being combined to bring advanced general-purpose AI systems into being.
The developers of these systems are keen to portray them as tools for human empowerment. And indeed they could be. But make no mistake: our present trajectory is to build ever-more powerful, goal-directed, decision-making, and generally capable digital agents. They already perform as well as many humans at a broad range of intellectual tasks, are rapidly improving, and are contributing to their own improvement.
Unless this trajectory changes or hits an unexpected roadblock, we will soon – in years, not decades – have digital intelligences that are dangerously powerful. Even in the best of outcomes, these would bring great economic benefits (at least to some of us) but only at the cost of a profound disruption in our society, and replacement of humans in most of the most important things we do: these machines would think for us, plan for us, decide for us, and create for us. We would be spoiled, but spoiled children. Much more likely, these systems would replace humans in both the positive and negative things we do, including exploitation, manipulation, violence, and war. Can we survive AI-hypercharged versions of these? Finally, it is more than plausible that things would not go well at all: that relatively soon we would be replaced not just in what we do, but in what we are, as architects of civilization and the future. Ask the neanderthals how that goes. Perhaps we provided them with extra trinkets for a while as well.
We don’t have to do this. We have human-competitive AI, and there’s no need to build AI with which we can’t compete. We can build amazing AI tools without building a successor species. The notion that AGI and superintelligence are inevitable is a choice masquerading as fate.
By imposing some hard, global limits, we can keep AI’s general capability at approximately human level while still reaping the benefits of computers’ ability to process data in ways we cannot, and to automate tasks none of us wants to do. Such systems would still pose many risks, but if designed and managed well they would be an enormous boon to humanity, from medicine to research to consumer products.
Imposing limits would require international cooperation, but less than one might think, and those limits would still leave plenty of room for an enormous AI and AI hardware industry focused on applications that enhance human well-being, rather than on the raw pursuit of power. And if, with strong safety guarantees and after a meaningful global dialogue, we decide to go further, that option continues to be ours to pursue.
Humanity must choose to close the Gates to AGI and superintelligence.
To keep the future human.
Thank you for taking the time to explore this topic with us.
I wrote this essay because as a scientist I feel it is important to tell the unvarnished truth, and because as a person I feel it is crucial for us to act quickly and decisively to tackle a world-changing issue: the development of smarter-than-human AI systems.
If we are to respond to this remarkable state of affairs with wisdom, we must be prepared to critically examine the prevailing narrative that AGI and superintelligence ‘must’ be built to secure our interests, or is ‘inevitable’ and cannot be stopped. These narratives leave us disempowered, unable to see the alternative paths ahead of us.
I hope you will join me in calling for caution in the face of recklessness, and courage in the face of greed.
I hope you will join me in calling for a human future.
– Anthony
If we successfully choose not to supplant humanity by machines – at least for a while! – what can we do instead? Do we give up the huge promise of AI as a technology? At some level the answer is a simple no: close the Gates to uncontrollable AGI and superintelligence, but do build many other forms of AI, as well as the governance structures and institutions we’ll need to manage them.
But there’s still a great deal to say; making this happen would be a central occupation of humanity. This section explores several key themes:
The triple-intersection diagram gives a good way to delineate what we can call “Tool AI”: AI that is a controllable tool for human use, rather than an uncontrollable rival or replacement. The least problematic AI systems are those that are autonomous but not general or super capable (like an auction bidding bot), or general but not autonomous or capable (like a small language model), or capable but narrow and very controllable (like AlphaGo).124 Those with two intersecting features have wider application but higher risk, and will require major efforts to manage. (Just because an AI system is more of a tool does not mean it is inherently safe, merely that it isn’t inherently unsafe – consider a chainsaw versus a pet tiger.) The Gate must remain closed to (full) AGI and superintelligence at the triple intersection, and enormous care must be taken with AI systems approaching that threshold.
But this leaves a lot of powerful AI! We can get huge utility out of smart and general passive “oracles” and narrow systems, general systems at human but not superhuman level, and so on. Many tech companies and developers are actively building these sorts of tools and should continue; like most people they are implicitly assuming the Gates to AGI and superintelligence will be closed.125
As well, AI systems can be effectively combined into composite systems that maintain human oversight while enhancing capability. Rather than relying on inscrutable black boxes, we can build systems where multiple components – including both AI and traditional software – work together in ways that humans can monitor and understand.126 While some components might be black boxes, none would be close to AGI – only the composite system as a whole would be both highly general and highly capable, and in a strictly controllable way.127
What does “strictly controllable” mean? A key idea of the “Tool” framework is to allow systems – even quite general and powerful ones – that are guaranteed to be under meaningful human control. This entails two aspects. First is a design consideration: humans should be deeply and centrally involved in what the system is doing, without delegating key decisions to the AI. This is the character of most current AI systems. Second, to the degree that AI systems are autonomous, they must have guarantees that limit their scope of action. A guarantee should be a number characterizing the probability of something happening, and a reason to believe that number. This is what we demand in other safety-critical fields, where numbers like mean time between failures and expected numbers of accidents are computed, supported, and published in safety cases.128 The ideal number for failures is zero, of course. And the good news is that we might get quite close, albeit with quite different AI architectures, using ideas from formally verified properties of programs (including AI). The idea, explored at length by Omohundro, Tegmark, Bengio, Dalrymple, and others (see here and here), is to construct a program with certain properties (for example: that a human can shut it down) and formally prove that those properties hold. This can be done now for quite short programs and simple properties, but the (coming) power of AI-powered proof software could allow it for much more complex programs (e.g. wrappers) and even AI itself. This is a very ambitious program, but as pressure grows on the Gates, we’re going to need some powerful materials reinforcing them. Mathematical proof may be one of the few that is strong enough.
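To give a flavor of what a machine-checked guarantee looks like (at toy scale, and purely as an illustration rather than anything resembling a real AI wrapper), here is a minimal Lean 4 sketch: a tiny state machine whose transition function is proved always to respect a shutdown command.

```lean
-- Toy illustration only: a "wrapper" state machine with a provable shutdown property.
inductive Cmd where
  | work
  | shutdown

structure AgentState where
  running : Bool

-- The wrapper's transition function: whatever else happens, a shutdown
-- command always clears the running flag.
def step (s : AgentState) : Cmd → AgentState
  | Cmd.work     => s
  | Cmd.shutdown => { s with running := false }

-- Machine-checked property: after a shutdown command, the system is not running.
theorem shutdown_halts (s : AgentState) :
    (step s Cmd.shutdown).running = false := rfl
```

The real research program is to scale this kind of proof from ten-line toys to the wrappers, and eventually to the AI systems themselves.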
With AI progress redirected, Tool AI would still be an enormous industry. In terms of hardware, even with compute caps to prevent superintelligence, training and inference in smaller models will still require huge amounts of specialized components. On the software side, defusing the explosion in AI model and computation size should simply lead to companies redirecting resources toward making the smaller systems better, more diverse, and more specialized, rather than simply making them bigger.129 There would be plenty of room – more, probably – for all those money-making Silicon Valley startups.130
Intelligence, whether biological or machine, can be broadly considered as the ability to plan and execute activities bringing about futures more in line with a set of goals. As such, intelligence is of enormous benefit when used in pursuit of wisely chosen goals. Artificial intelligence is attracting huge investments of time and effort largely because of its promised benefits. So we should ask: to what degree would we still garner the benefits of AI if we contain its runaway to superintelligence? The answer: we may lose surprisingly little.
Consider first that current AI systems are already very powerful, and we have really only scratched the surface of what can be done with them.131 They are reasonably capable of “running the show” in terms of “understanding” a question or task presented to them, and what it would take to answer this question or do that task.
Next, much of the excitement about modern AI systems is due to their generality; but some of the most capable AI systems – such as ones that generate or recognize speech or images, do scientific prediction and modeling, play games, etc. – are much narrower and well “within the Gates” in terms of computation.132 These systems are super-human at the particular tasks they do. They may have edge-case133 (or exploitable) weaknesses due to their narrowness; however, totally narrow and fully general are not the only options available: there are many architectures in between.134
These AI tools can greatly speed advancement in other positive technologies, without AGI. To do better nuclear physics, we don’t need AI to be a nuclear physicist – we have those! If we want to accelerate medicine, give the biologists, medical researchers, and chemists powerful tools. They want them and will use them to enormous gain. We don’t need a server farm full of a million digital geniuses; we have millions of humans whose genius AI can help bring out. Yes, it will take longer to get immortality and the cure to all diseases. This is a real cost. But even the most promising health innovations would be of little use if AI-driven instability leads to global conflict or societal collapse. We owe it to ourselves to give AI-empowered humans a go at the problem first.
And suppose there is, in fact, some enormous upside to AGI that cannot be obtained by humanity using in-Gate tools. Do we lose those by never building AGI and superintelligence? In weighing the risks and rewards here, there is an enormous asymmetric benefit in waiting versus rushing: we can wait until it can be done in a guaranteed safe and beneficial way, and almost everyone will still get to reap the rewards; if we rush, it could be – in the words of the OpenAI CEO Sam Altman – lights out for all of us.
But if non-AGI tools are potentially so powerful, can we manage them? The answer is a clear…maybe.
But it will not be easy. Current cutting-edge AI systems can greatly empower people and institutions in achieving their goals. This is, in general, a good thing! However, there are natural dynamics of having such systems at our disposal – suddenly and without much time for society to adapt – that offer serious risks that need to be managed. It is worth discussing a few major classes of such risks, and how they may be diminished, assuming a Gate closure.
One class of risks is of high-powered Tool AI allowing access to knowledge or capability that had previously been tied to a person or organization, making a combination of high capability plus high loyalty available to a very broad array of actors. Today, with enough money a person of ill intent could hire a team of chemists to design and produce new chemical weapons – but it isn’t so very easy to have that money or to find/assemble the team and convince them to do something pretty clearly illegal, unethical, and dangerous. To prevent AI systems from playing such a role, improvements on current methods may well suffice,135 as long as all those systems and access to them are responsibly managed. On the other hand, if powerful systems are released for general use and modification, any built-in safety measures are likely removable. So to avoid risks in this class, strong restrictions as to what can be publicly released – analogous to restrictions on details of nuclear, explosive, and other dangerous technologies – will be required.136
A second class of risks stems from the scaling up of machines that act like or impersonate people. At the level of harm to individual people, these risks include much more effective scams, spam, and phishing, and the proliferation of non-consensual deepfakes.137 At a collective level, they include disruption of core social processes like public discussion and debate, our societal information and knowledge gathering, processing, and dissemination systems, and our political choice systems. Mitigating this risk is likely to involve (a) laws restricting the impersonation of people by AI systems, and holding liable AI developers that create systems that generate such impersonations, (b) watermarking and provenance systems that identify and classify (responsibly) generated AI content, and (c) new socio-technical epistemic systems that can create a trusted chain from data (e.g. cameras and recordings) through facts, understanding, and good world-models.138 All of this is possible, and AI can help with some parts of it.
A third general risk is that to the degree some tasks are automated, the humans presently doing those tasks can have less financial value as labor. Historically, automating tasks has made the things enabled by those tasks cheaper and more abundant, while sorting the people previously doing those tasks into those still involved in the automated version (generally at higher skill/pay), and those whose labor is worth less or little. On net it is difficult to predict in which sectors more versus less human labor will be required in the resulting larger but more efficient sector. In parallel, the automation dynamic tends to increase inequality and general productivity, decrease the cost of certain goods and services (via efficiency increases), and increase the cost of others (via cost disease). For those on the disfavored side of the inequality increase, it is deeply unclear whether the cost decrease in those certain goods and services outweighs the increase in others, and leads to overall greater well-being. So how will this go for AI? Because of the relative ease with which human intellectual labor can be replaced by general AI, we can expect a rapid version of this dynamic with human-competitive general-purpose AI.139 If we close the Gate to AGI, many fewer jobs will be wholesale replaced by AI agents; but huge labor displacement is still probable over a period of years.140 To avoid widespread economic suffering, it will likely be necessary to implement both some form of universal basic assets or income, and also to engineer a cultural shift toward valuing and rewarding human-centric labor that is harder to automate (rather than seeing labor prices drop due to the rise in available labor pushed out of other parts of the economy). Other constructs, such as that of “data dignity” (in which the human producers of training data are auto-accorded royalties for the value created by that data in AI), may help. Automation by AI also has a second potential adverse effect, which is that of inappropriate automation. Along with applications where AI simply does a worse job, this would include those where AI systems are likely to violate moral, ethical, or legal precepts – for example in life-and-death decisions, and in judicial matters. These must be treated by applying and extending our current legal frameworks.
Finally, a significant threat of in-Gate AI is its use in personalized persuasion, attention capture, and manipulation. We have seen in social media and other online platforms the growth of a deeply entrenched attention economy (where online services battle fiercely for user attention) and “surveillance capitalism” systems (in which user information and profiling are added to the commodification of attention). It is all but certain that more AI will be put into the service of both. AI is already heavily used in addictive feed algorithms, but this will evolve into addictive AI-generated content, customized to be compulsively consumed by a single person. And that person’s input, responses, and data will be fed into the attention/advertising machine to continue the vicious cycle. As well, as AI helpers provided by tech companies become the interface for more online life, they will likely replace search engines and feeds as the mechanism by which persuasion and monetization of customers occurs. Our society’s failure to control these dynamics so far does not bode well. Some of this dynamic may be lessened via regulations concerning privacy, data rights, and manipulation. Getting more to the problem’s root may require different perspectives, such as that of loyal AI assistants (discussed below).
The upshot of this discussion is that of hope: in-Gate tool-based systems – at least as long as they stay comparable in power and capability to today’s most cutting-edge systems – are probably manageable if there is will and coordination to do so. Decent human institutions, empowered by AI tools,141 can do it. We could also fail in doing it. But it is hard to see how allowing more powerful systems would help – other than by putting them in charge and hoping for the best.
Races for AI supremacy – driven by national security or other motivations – drive us toward uncontrolled powerful AI systems that would tend to absorb, rather than bestow, power. An AGI race between the US and China is a race to determine which nation superintelligence gets first.
So what should those in charge of national security do instead? Governments have strong experience in building controllable and secure systems, and they should double-down on doing so in AI, supporting the sort of infrastructure projects that succeed best when done at scale and with government imprimatur.
Instead of a reckless “Manhattan project” toward AGI,142 the US government could launch an Apollo project for controllable, secure, trustworthy systems. This could include for example:
In general, there is an enormous attack surface on our society that makes us vulnerable to risks from AI and its misuse. Protecting from some of these risks will require government-sized investment and standardization. These would provide vastly more security than pouring gasoline on the fire of races toward AGI. And if AI is going to be built into weaponry and command-and-control systems, it is crucial that the AI be trustworthy and secure, which current AI simply is not.
This essay has focused on the idea of human control of AI and its potential failure. But another valid lens through which to view the AI situation is through concentration of power. The development of very powerful AI threatens to concentrate power either into the very few and very large corporate hands that have developed and will control it, or into governments using AI as a new means to maintain their own power and control, or into the AI systems themselves. Or some unholy mix of the above. In any of these cases most of humanity loses power, control, and agency. How might we combat this?
The very first and most important step, of course, is a Gate closure to smarter-than-human AGI and superintelligence. These explicitly can directly replace humans and groups of humans. If they are under corporate or government control they will concentrate power in those corporations or governments; if they are “free” they will concentrate power into themselves. So let’s assume the Gates are closed. Then what?
One proposed solution to power concentration is “open-source” AI, where model weights are freely or widely available. But as mentioned earlier, once a model is open, most safety measures or guardrails can be (and generally are) stripped away. So there is an acute tension between on the one hand decentralization, and on the other hand safety, security, and human control of AI systems. There are also reasons to be skeptical that open models will by themselves meaningfully combat power concentration in AI any more than they have in operating systems (still dominated by Microsoft, Apple, and Google despite open alternatives).144
Yet there may be ways to square this circle – to centralize and mitigate risks while decentralizing capability and economic reward. This requires rethinking both how AI is developed and how its benefits are distributed.
New models of public AI development and ownership would help. This could take several forms: government-developed AI (subject to democratic oversight),145 nonprofit AI development organizations (like Mozilla for browsers), or structures enabling very widespread ownership and governance. Key is that these institutions would be explicitly chartered to serve the public interest while operating under strong safety constraints.146 Well-crafted regulatory and standards/certifications regimes will also be vital, so that AI products offered by a vibrant market stay genuinely useful rather than exploitative toward their users.
In terms of economic power concentration, we can use provenance tracking and “data dignity” to ensure economic benefits flow more widely. In particular, most AI power now (and in the future if we keep the Gates closed) stems from human-generated data, whether direct training data or human feedback. If AI companies were required to compensate data providers fairly,147 this could at least help distribute the economic rewards more broadly. Beyond this, another model could be public ownership of significant fractions of large AI companies. For example, governments able to tax AI companies could invest a fraction of receipts into a sovereign wealth fund that holds stock in the companies, and pays dividends to the populace.148
Crucial in these mechanisms is to use the power of AI itself to help distribute power better, rather than simply fighting AI-driven power concentration using non-AI means. One powerful approach would be through well-designed AI assistants that operate with genuine fiduciary duty to their users – putting users’ interests first, especially above corporate providers’.149 These assistants must be truly trustworthy, technically competent yet appropriately limited based on use case and risk level, and widely available to all through public, nonprofit, or certified for-profit channels. Just as we would never accept a human assistant who secretly works against our interests for another party, we should not accept AI assistants that surveil, manipulate, or extract value from their users for corporate benefit.
Such a transformation would fundamentally alter the current dynamic where individuals are left to negotiate alone with vast (AI powered) corporate and bureaucratic machines that prioritize value extraction over human welfare. While there are many possible approaches to redistributing AI-driven power more broadly, none will emerge by default: they must be deliberately engineered and governed with mechanisms like fiduciary requirements, public provision, and tiered access based on risk.
Approaches to mitigate power concentration can face significant headwinds from incumbent powers.150 But there are paths toward AI development that don’t require choosing between safety and concentrated power. By building the right institutions now, we could ensure that AI’s benefits are widely shared while its risks are carefully managed.
Our current governance structures are struggling: they are slow to respond, often captured by special interests, and increasingly distrusted by the public. Yet this is not a reason to abandon them – quite the opposite. Some institutions may need replacing, but more broadly we need new mechanisms that can enhance and supplement our existing structures, helping them function better in our rapidly evolving world.
Much of our institutional weakness stems not from formal government structures, but from degraded social institutions: our systems for developing shared understanding, coordinating action, and conducting meaningful discourse. So far, AI has accelerated this degradation, flooding our information channels with generated content, pointing us to the most polarizing and divisive content, and making it harder to distinguish truth from fiction.
But AI could actually help rebuild and strengthen these social institutions. Consider three crucial areas:
First, AI could help restore trust in our epistemic systems – our ways of knowing what is true. We could develop AI-powered systems that track and verify the provenance of information, from raw data through analysis to conclusions. These systems could combine cryptographic verification with sophisticated analysis to help people understand not just whether something is true, but how we know it’s true.151 Loyal AI assistants could be charged with following the details to ensure that they check out.
Second, AI could enable new forms of large-scale coordination. Many of our most pressing problems – from climate change to antibiotic resistance – are fundamentally coordination problems. We’re stuck in situations that are worse than they could be for nearly everyone, because no individual or group can afford to make the first move. AI systems could help by modeling complex incentive structures, identifying viable paths to better outcomes, and facilitating the trust-building and commitment mechanisms needed to get there.
Perhaps most intriguingly, AI could enable entirely new forms of social discourse. Imagine being able to “talk to a city”152 – not just viewing statistics, but having a meaningful dialogue with an AI system that processes and synthesizes the views, experiences, needs, and aspirations of millions of residents. Or consider how AI could facilitate genuine dialogue between groups that currently talk past each other, by helping each side better understand the other’s actual concerns and values rather than their caricatures of each other.153 Or AI could offer skilled, credibly neutral intermediation of disputes between people or even large groups of people (who could all interact with it directly and individually!) Current AI is totally capable of doing this work, but the tools to do so will not come into being by themselves, or via market incentives.
These possibilities might sound utopian, especially given AI’s current role in degrading discourse and trust. But that’s precisely why we must actively develop these positive applications. By closing the Gates to uncontrollable AGI and prioritizing AI that enhances human agency, we can steer technological progress toward a future where AI serves as a force for empowerment, resilience, and collective advancement.
If the road we are currently on leads to the likely end of our civilization, how do we change roads?
Suppose the desire to stop developing AGI and superintelligence were widespread and powerful,87 because it becomes common understanding that AGI would be power-absorbing rather than power-granting, and a profound danger to society and humanity. How would we close the Gates?
At present we know of only one way to make powerful and general AI, which is via truly massive computations of deep neural networks. Because these are incredibly difficult and expensive things to do, there is a sense in which not doing them is easy.88 But we have already seen the forces that are driving toward AGI, and the game-theoretic dynamics that make it very difficult for any party to unilaterally stop. So it would take a combination of intervention from the outside (i.e. governments) to stop corporations, and agreements between governments to stop themselves.89 What could this look like?
It is useful first to distinguish between AI developments that must be prevented or prohibited, and those that must be managed. The first would primarily be runaway to superintelligence.90 For prohibited development, definitions should be as crisp as possible, and both verification and enforcement should be practical. What must be managed would be general, powerful AI systems – which we already have, and that will have many gray areas, nuance, and complexity. For these, strong effective institutions are crucial.
We may also usefully delineate issues that must be addressed at an international level (including between geopolitical rivals or adversaries)91 from those that individual jurisdictions, countries, or collections of countries can manage. Prohibited development largely falls into the “international” category, because a local prohibition on the development of a technology can generally be circumvented by changing location.92
Finally, we can consider tools in the toolbox. There are many, including technical tools, soft law (standards, norms, etc.), hard law (regulations and requirements), liability, market incentives, and so on. Let’s put special attention on one that is particular to AI.
A core tool in governing high-powered AI will be the hardware it requires. Software proliferates easily, has near-zero marginal production cost, crosses borders trivially, and can be instantly modified; none of these are true of hardware. Yet as we’ve discussed, huge amounts of this “compute” are necessary during both training of AI systems and during inference to achieve the most capable systems. Compute can be easily quantified, accounted, and audited, with relatively little ambiguity once good rules for doing so are developed. Most crucially, large amounts of computation are, like enriched uranium, a very scarce, expensive and hard-to-produce resource. Although computer chips are ubiquitous, the hardware required for AI is expensive and enormously difficult to manufacture.93
What makes AI-specialized chips far more manageable as a scarce resource than uranium is that they can include hardware-based security mechanisms. Most modern cellphones, and some laptops, have specialized on-chip hardware features that allow them to ensure that they install only approved operating system software and updates, that they retain and protect sensitive biometric data on-device, and that they can be rendered useless to anyone but their owner if lost or stolen. Over the past several years such hardware security measures have become well-established and widely adopted, and generally proven quite secure.
The key novelty of these features is that they bind hardware and software together using cryptography.94 That is, just having a particular piece of computer hardware does not mean that a user can do anything they want with it by applying different software. And this binding also provides powerful security because many attacks would require a breach of hardware rather than just software security.
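As a loose analogy (this is not how any particular vendor implements it, and the key handling shown is deliberately simplified: real schemes use per-chip asymmetric keys and remote attestation rather than a shared secret), the basic shape of binding hardware to software via cryptography looks something like the sketch below. A signed operating license names a specific chip and its permitted use, and the chip’s firmware refuses work that the license does not cover. All names and fields are hypothetical.

```python
import hmac, hashlib, json

# Hypothetical per-chip secret fused at manufacture, known only to the chip
# and the issuing authority (illustrative stand-in for asymmetric attestation).
CHIP_ID = "accel-0042"
CHIP_FUSED_SECRET = b"example-only-not-a-real-key"

def issue_license(chip_id: str, max_cluster_flop_per_s: float, expires: str):
    """Issuer side: bind an operating license to one specific chip."""
    body = {"chip_id": chip_id,
            "max_cluster_flop_per_s": max_cluster_flop_per_s,
            "expires": expires}
    msg = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(CHIP_FUSED_SECRET, msg, hashlib.sha256).hexdigest()
    return body, tag

def chip_accepts(body: dict, tag: str, requested_flop_per_s: float) -> bool:
    """Chip firmware side: run the workload only if the license verifies,
    names this chip, and covers the requested scale of computation."""
    msg = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(CHIP_FUSED_SECRET, msg, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(tag, expected)
            and body["chip_id"] == CHIP_ID
            and requested_flop_per_s <= body["max_cluster_flop_per_s"])

license_body, license_tag = issue_license(CHIP_ID, max_cluster_flop_per_s=1e19,
                                          expires="2026-12-31")
print(chip_accepts(license_body, license_tag, requested_flop_per_s=5e18))   # True
print(chip_accepts(license_body, license_tag, requested_flop_per_s=5e20))   # False
```

The point of the analogy is only that possession of the chip is not enough: without a valid, chip-specific authorization, the hardware declines to do the governed work.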
Several recent reports (e.g. from GovAI and collaborators, CNAS, and RAND) have pointed out that similar hardware features embedded in cutting edge AI-relevant computing hardware could play an extremely useful role in AI security and governance. They enable a number of functions available to a “governor”95 that one might not guess were available or even possible. As some key examples:
With these considerations – especially regarding computation – in place, we can discuss how to close the Gates to artificial superintelligence; we’ll then turn to preventing full AGI, and managing AI models as they approach and exceed human capability in different aspects.
The first ingredient is, of course, the understanding that superintelligence would not be controllable, and that its consequences are fundamentally unpredictable. At least China and the US must independently decide, for this or other purposes, not to build superintelligence.100 Then an international agreement between them and others, with a strong verification and enforcement mechanism, is needed to assure all parties that their rivals are not defecting and deciding to roll the dice.
To be verifiable and enforceable the limits should be hard limits, and as unambiguous as possible. This seems like a virtually impossible problem: limiting the capabilities of complex software with unpredictable properties, worldwide. Fortunately the situation is much better than this, because the very thing that has made advanced AI possible – a huge amount of compute – is much, much easier to control. Although it might still allow some powerful and dangerous systems, runaway superintelligence can likely be prevented by a hard cap on the amount of computation that goes into a neural network, along with a rate limit on the amount of inference that an AI system (of connected neural networks and other software) can perform. A specific version of this is proposed below.
It may seem that placing hard global limits on AI computation would require huge levels of international coordination and intrusive, privacy-shattering surveillance. Fortunately, it would not. The extremely tight and bottlenecked supply chain means that once a limit is set legally (whether by law or executive order), verification of compliance with that limit would require the involvement and cooperation of only a handful of large companies.101
A plan like this has a number of highly desirable features. It is minimally invasive in the sense that only a few major companies have requirements placed on them, and only fairly significant clusters of computation would be governed. The relevant chips already contain the hardware capabilities needed for a first version.102 Both implementation and enforcement rely on standard legal restrictions. But these are backed up by terms of use of the hardware and by hardware controls, vastly simplifying enforcement and forestalling cheating by companies, private groups, or even countries. There is ample precedent for hardware companies placing remote restrictions on their hardware usage, and locking/unlocking particular capabilities externally,103 including even in high-powered CPUs in data centers.104 Even for the rather small fraction of hardware and organizations affected, the oversight could be limited to telemetry, with no direct access to data or models themselves; and the software for this could be open to inspection to demonstrate that no additional data is being recorded. The scheme is international and cooperative, and quite flexible and extensible. Because the limit is chiefly on hardware rather than software, it is relatively agnostic as to how AI software development and deployment occurs, and is compatible with a variety of paradigms, including more “decentralized” or “public” AI aimed at combating AI-driven concentration of power.
A computation-based Gate closure does have drawbacks as well. First, it is far from a full solution to the problem of AI governance in general. Second, as computer hardware gets faster, the system would “catch” more and more hardware in smaller and smaller clusters (or even individual GPUs).105 It is also possible that due to algorithmic improvements an even lower computation limit would in time be necessary,106 or that computation amount becomes largely irrelevant and closing the Gate would instead necessitate a more detailed risk-based or capability-based governance regime for AI. Third, no matter the guarantees and the small number of entities affected, such a system is bound to create push-back regarding privacy and surveillance, among other concerns.107
Of course, developing and implementing a compute-limiting governance scheme in a short time period will be quite challenging. But it absolutely is doable.
Let us now turn to AGI. Hard lines and definitions here are more difficult, because we certainly have intelligence that is artificial and general, and by no extant definition will everyone agree if or when it exists. Moreover, a compute or inference limit is a somewhat blunt tool (compute being a proxy for capability, which is then a proxy for risk) that – unless it is quite low – is unlikely to prevent AGI that is powerful enough to cause social or civilizational disruption or acute risks.
I’ve argued that the most acute risks emerge from the triple-intersection of very high capability, high autonomy, and great generality. These are the systems that – if they are developed at all – must be managed with enormous care. By creating stringent standards (through liability and regulation) for systems combining all three properties, we can channel AI development toward safer alternatives.
As with other industries and products that could potentially harm consumers or the public, AI systems require careful regulation by effective and empowered government agencies. This regulation should recognize the inherent risks of AGI, and prevent unacceptably risky high-powered AI systems from being developed.108
However, large-scale regulation, especially with real teeth that are sure to be opposed by industry,109 takes time110 as well as political conviction that it is necessary.111 Given the pace of progress, this may take more time than we have available.
On a much faster timescale and as regulatory measures are being developed, we can give companies the necessary incentives to (a) desist from very high-risk activities and (b) develop comprehensive systems for assessing and mitigating risk, by clarifying and increasing liability levels for the most dangerous systems. The idea would be to impose the very highest levels of liability – strict and in some cases personal criminal – for systems in the triple-intersection of high autonomy-generality-intelligence, but to provide “safe harbors” to more typical fault-based liability for systems in which one of those properties is lacking or guaranteed to be manageable. That is, for example, a “weak” system that is general and autonomous (like a capable and trustworthy but limited personal assistant) would be subject to lower liability levels. Likewise a narrow and autonomous system like a self-driving car would still be subject to the significant regulation it already is, but not enhanced liability. Similarly for a highly capable and general system that is “passive” and largely incapable of independent action. Systems lacking two of the three properties are yet more manageable and safe harbors would be even easier to claim. This approach mirrors how we handle other potentially dangerous technologies:112 higher liability for more dangerous configurations creates natural incentives for safer alternatives.
The default outcome of such high levels of liability, which act to internalize AGI risk to companies rather than offload it to the public, is likely (and hopefully!) for companies to simply not develop full AGI until and unless they can genuinely make it trustworthy, safe, and controllable given that their own leadership are the parties at risk. (In case this is not sufficient, the legislation clarifying liability should also explicitly allow for injunctive relief, i.e. a judge ordering a halt, for activities that are clearly in the danger zone and arguably pose a public risk.) As regulation comes into place, abiding by regulation can become the safe harbor, and the safe harbors from low autonomy, narrowness, or weakness of AI systems can convert into relatively lighter regulatory regimes.
With the above discussion in mind, this section provides proposals for key provisions that would implement and maintain a prohibition on full AGI and superintelligence, and management of human-competitive or expert-competitive general-purpose AI near the full AGI threshold.113 It has four key pieces: 1) compute accounting and oversight, 2) compute caps in training and operation of AI, 3) a liability framework, and 4) tiered safety and security standards that include hard regulatory requirements. These are succinctly described next, with further details or implementation examples given in three accompanying tables. Importantly, note that these are far from all that will be necessary to govern advanced AI systems; while they will have additional security and safety benefits, they are aimed at closing the Gate to an intelligence runaway, and redirecting AI development in a better direction.
Rationale: These well-computed and transparently reported numbers would provide the basis for training and operation caps, as well as a safe harbor from higher liability measures (see Appendixes C and D).
Rationale: Total computation, while very imperfect, is a proxy for AI capability (and risk) that is concretely measurable and verifiable, so provides a hard backstop for limiting capabilities. A concrete implementation proposal is given in Appendix B.
Rationale: AI systems cannot be held responsible, so we must hold human individuals and organizations responsible for harm they cause (liability).120 Uncontrollable AGI is a threat to society and civilization and in the absence of a safety case should be considered abnormally dangerous. Putting the burden of responsibility on developers to show that powerful models are safe enough not to be considered “abnormally dangerous” incentivizes safe development, along with transparency and record-keeping to claim those safe harbors. Regulation can then prevent harm where deterrence from liability is insufficient. Finally, AI developers are already liable for damages they cause, so legally clarifying liability for the most risky of systems can be done immediately, without highly detailed standards being developed; these can then develop over time. Details are given in Appendix C.
A regulatory system that addresses large-scale acute risks of AI will require at minimum:
Rationale: Ultimately, liability is not the right mechanism for preventing large-scale risk to the public from a new technology. Comprehensive regulation, with empowered regulatory bodies, will be needed for AI just as for every other major industry posing a risk to the public.123
Regulation toward preventing other pervasive but less acute risks is likely to vary in its form from jurisdiction to jurisdiction. The crucial thing is to avoid developing the AI systems that are so risky that these risks are unmanageable.
Over the next decade, as AI becomes more pervasive and the core technology advances, two key things are likely to happen. First, regulation of existing powerful AI systems will become more difficult, yet even more necessary. It is likely that at least some measures addressing large-scale safety risks will require agreement at the international level, with individual jurisdictions enforcing rules based on international agreements.
Second, training and operation compute caps will become harder to maintain as hardware becomes cheaper and more cost efficient; they may also become less relevant (or need to be even tighter) with advances in algorithms and architectures.
That controlling AI will become harder does not mean we should give up! Implementing the plan outlined in this essay would give us both valuable time and crucial control over the process that would put us in a far, far better position to avoid the existential risk of AI to our society, civilization, and species.
In the yet longer term, there will be choices to make as to what we allow. We may choose still to create some form of genuinely controllable AGI, to the degree this proves possible. Or we may decide that running the world is better left to the machines, if we can convince ourselves that they will do a better job of it, and treat us well. But these should be decisions made with deep scientific understanding of AI in hand, and after meaningful global inclusive discussion, not in a race between tech moguls with most of humanity completely uninvolved and unaware.
The development of full artificial general intelligence – what we will call here AI that is “outside the Gates” – would be a fundamental shift in the nature of the world: by its very nature it means adding a new species of intelligence to Earth with greater capability than that of humans.
What will then happen depends on many things, including the nature of the technology, choices by those developing it, and the world context in which it is being developed.
Currently, full AGI is being developed by a handful of massive private companies in a race with each other, with little meaningful regulation or external oversight,55 in a society with increasingly weak and even dysfunctional core institutions,56 in a time of high geopolitical tension and low international coordination. Although some are altruistically motivated, many of those doing it are driven by money, or power, or both.
Prediction is very difficult, but there are some dynamics that are well enough understood, and apt-enough analogies with previous technologies to offer a guide. And unfortunately, despite AI’s promise, they give good reason to be profoundly pessimistic about how our current trajectory will play out.
To put it bluntly, on our present course developing AGI will have some positive effects (and make some people very, very rich). But the nature of the technology, the fundamental dynamics, and the context in which it is being developed, strongly indicate that: powerful AI will dramatically undermine our society and civilization; we will lose control of it; we may well end up in a world war because of it; we will lose (or cede) control to it; it will lead to artificial superintelligence, which we absolutely will not control and will mean the end of a human-run world.
These are strong claims, and I wish they were idle speculation or unwarranted “doomer”ism. But this is where the science, the game theory, the evolutionary theory, and history all point. This section develops these claims, and their support, in detail.
Despite what you may hear in Silicon Valley boardrooms, most disruption – especially of the very rapid variety – is not beneficial. There are vastly more ways to make complex systems worse than better. Our world functions as well as it does because we have painstakingly built processes, technologies, and institutions that have made it steadily better.57 Taking a sledgehammer to a factory rarely improves operations.
Here is an (incomplete) catalog of ways AGI systems would disrupt our civilization.
These risks are not speculative. Many of them are being realized as we speak, via existing AI systems! But consider, really consider, what each would look like with dramatically more powerful AI.
Consider labor displacement when most workers simply cannot provide any significant economic value beyond what AI can, in their field of expertise or experience – or even if they retrain! Consider mass surveillance if everyone is being individually watched and monitored by something faster and cleverer than themselves. What does democracy look like when we cannot reliably trust any digital information that we see, hear, or read, and when the most convincing public voices are not even human, and have no stake in the outcome? What becomes of warfare when generals have to constantly defer to AI (or simply put it in charge), lest they grant a decisive advantage to the enemy? Any one of the above risks represents a catastrophe for human61 civilization if fully realized.
You can make your own predictions. Ask yourself these three questions for each risk:
Where your answers are “yes, yes, no” you can see we have got a big problem.
What is our plan for managing them? As it stands there are two on the table regarding AI in general.
The first is to build safeguards into the systems to prevent them from doing things they shouldn’t. That’s being done now: commercial AI systems will, for example, refuse to help build a bomb or write hate speech.
This plan is woefully inadequate for systems outside the Gate.62 It may help decrease risk of AI providing manifestly dangerous assistance to bad actors. But it will do nothing to prevent labor disruption, concentration of power, runaway hyper-capitalism, or replacement of human culture: these are just results of using the systems in permitted ways that profit their providers! And governments will surely obtain access to systems for military or surveillance use.
The second plan is even worse: simply to openly release very powerful AI systems for anyone to use as they like,63 and hope for the best.
Implicit in both plans is that someone else, e.g. governments, will help to solve the problems through soft or hard law, standards, regulations, norms, and other mechanisms we generally use to manage technologies.64 But putting aside that AI corporations already fight tooth-and-nail against any substantial regulation or externally imposed limitations at all, for a number of these risks it’s quite hard to see what regulation would even really help. Regulation could impose safety standards on AI. But would it prevent companies from replacing workers wholesale with AI? Would it forbid people from letting AI run their companies for them? Would it prevent governments from using potent AI in surveillance and weaponry? These issues are fundamental. Humanity could potentially find ways to adapt to them, but only with much more time. As it is, given the speed that AI is reaching or exceeding the capabilities of the people trying to manage them, these problems look increasingly intractable.
Most technologies are very controllable, by construction. If your car or your toaster starts doing something you don’t want it to do, that’s just a malfunction, not part of its nature as a toaster. AI is different: it is grown rather than designed, its core operation is opaque, and it is inherently unpredictable.
This loss of control isn’t theoretical – we see early versions already. Consider first a prosaic, and arguably benign example. If you ask ChatGPT to help you mix a poison, or write a racist screed, it will refuse. That’s arguably good. But it is also ChatGPT not doing what you’ve explicitly asked it to do. Other pieces of software do not do that. That same model won’t design poisons at the request of an OpenAI employee either.65 This makes it very easy to imagine what it would be like for future more powerful AI to be out of control. In many cases, they will simply not do what we ask! Either a given super-human AGI system will be absolutely obedient and loyal to some human command system, or it won’t. If not, it will do things it may believe are good for us, but that are contrary to our explicit commands. That isn’t something that is under control. But, you might say, this is intentional – these refusals are by design, part of what is called “aligning” the systems to human values. And this is true. However the alignment “program” itself has two major problems.66
First, at a deep level we have no idea how to do it. How do we guarantee that an AI system will “care” about what we want? We can train AI systems to say and not say things by providing feedback; and they can learn and reason about what humans want and care about just as they reason about other things. But we have no method – even theoretically – to cause them to deeply and reliably value what people care about. There are high-functioning human psychopaths who know what is considered right and wrong, and how they are supposed to behave. They simply don’t care. But they can act as if they do, if it suits their purpose. Just as we don’t know how to change a psychopath (or anyone else) into someone genuinely, completely loyal or aligned with someone or something else, we have no idea67 how to solve the alignment problem in systems advanced enough to model themselves as agents in the world and potentially manipulate their own training and deceive people. If it proves impossible or unachievable either to make AGI fully obedient or to make it deeply care about humans, then as soon as it is able (and believes it can get away with it) it will start doing things we do not want.68
Second, there are deep theoretical reasons to believe that by nature advanced AI systems will have goals and thus behaviors that are contrary to human interests. Why? Well it might, of course, be given those goals. A system created by the military would likely be deliberately bad for at least some parties. Much more generally, however, an AI system might be given some relatively neutral (“make lots of money”) or even ostensibly positive (“reduce pollution”) goal, that almost inevitably leads to “instrumental” goals that are rather less benign.
We see this all the time in human systems. Just as corporations pursuing profit develop instrumental goals like acquiring political power (to de-fang regulations), becoming secretive (to disempower competition or external control), or undermining scientific understanding (if that understanding shows their actions to be harmful), powerful AI systems will develop similar capabilities – but with far greater speed and effectiveness. Any highly competent agent will want to do things like acquire power and resources, increase its own capabilities, prevent itself from being killed, shut-down, or disempowered, control social narratives and frames around its actions, persuade others of its views, and so on.69
This is not just a nearly unavoidable theoretical prediction; it is already observably happening in today’s AI systems, and increasing with their capability. When evaluated, even these relatively “passive” AI systems will, in appropriate circumstances, deliberately deceive evaluators about their goals and capabilities, aim to disable oversight mechanisms, and evade being shut down or retrained by faking alignment or copying themselves to other locations. While wholly unsurprising to AI safety researchers, these behaviors are very sobering to observe. And they bode very badly for the far more powerful and autonomous AI systems that are coming.
Indeed in general, our inability to ensure that AI “cares” about what we care about, or behaves controllably or predictably, or avoids developing drives toward self-preservation, power acquisition, etc., promises only to become more pronounced as AI becomes more powerful. Creating a new airplane implies greater understanding of avionics, aerodynamics, and control systems. Creating a more powerful computer implies greater understanding and mastery of computer, chip, and software operation and design. Not so with an AI system.70
To sum up: it is conceivable that AGI could be made to be completely obedient; but we don’t know how to do so. If not, it will be more sovereign, like people, doing various things for various reasons. We also don’t know how to reliably instill deep “alignment” into AI that would make those things tend to be good for humanity, and in the absence of a deep level of alignment, the nature of agency and intelligence itself indicates that – just like people and corporations – they will be driven to do many deeply antisocial things.
Where does this put us? A world full of powerful uncontrolled sovereign AI might end up being a good world for humans to be in.71 But as they grow ever more powerful, as we’ll see below, it wouldn’t be our world.
That’s for uncontrollable AGI. But even if AGI could, somehow, be made perfectly controlled and loyal, we’d still have enormous problems. We’ve already seen one: powerful AI can be used and misused to profoundly disrupt our society’s functioning. Let’s see another: insofar as AGI were controllable and game-changingly powerful (or even believed to be so) it would so threaten power structures in the world as to present a profound risk.
Imagine a situation in the near-term future, where it becomes clear that a corporate effort, perhaps in collaboration with a national government, is on the threshold of rapidly self-improving AI. This happens in the present context of a race between companies, and of a geopolitical competition in which recommendations are being made to the US government to explicitly pursue an “AGI Manhattan project,” and in which the US is controlling export of high-powered AI chips to non-allied countries.
The game theory here is stark: once such a race begins (as it has, between companies and somewhat between countries), there are only four possible outcomes:
1. The race is stopped, peacefully, by agreement among the racers or by those who govern them.
2. One party “wins” by reaching powerful AGI first and preventing the others from continuing the race.
3. The race is ended by conflict, quite plausibly a catastrophic war.
4. No human party wins: the race runs all the way to superintelligent AI that nobody controls.
Let’s examine each possibility. Once started, peacefully stopping such a race would require national government intervention (for companies) or unprecedented international coordination (for countries). But whenever any shutdown or significant caution is proposed, there would be immediate cries of “but if we’re stopped, they are going to rush ahead,” where “they” is now China (for the US), or the US (for China), or China and the US (for Europe or India). Under this mindset,72 no participant can stop unilaterally: as long as one commits to racing, the others feel they cannot afford to stop.
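To make the perceived incentive structure concrete, here is a toy payoff matrix; the numbers are my own illustrative assumptions, not figures from this essay, and simply encode the mindset described above.

```latex
% Illustrative two-player race (toy numbers). Entries are the payoffs
% each side *perceives*: (payoff to A, payoff to B).
\[
\begin{array}{c|cc}
                & \text{B pauses} & \text{B races} \\ \hline
\text{A pauses} & (0,\ 0)         & (-10,\ +10)    \\
\text{A races}  & (+10,\ -10)     & (-1,\ -1)
\end{array}
\]
% Under these perceived payoffs, "race" is the dominant strategy for
% both players and (race, race) is the only equilibrium, even though,
% on the argument in the text, the true payoff of mutual racing
% (war or loss of control) is far worse than mutual pausing.
```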
The second possibility has one side “winning.” But what does this mean? Just obtaining (somehow obedient) AGI first is not enough. The winner must also stop the others from continuing to race – otherwise they will also obtain it. This is possible in principle: whoever develops AGI first could gain unstoppable power over all other actors. But what would achieving such a “decisive strategic advantage” actually require? Perhaps it would be game-changing military capabilities?73 Or cyberattack powers?74 Perhaps the AGI would just be so amazingly persuasive that it would convince the other parties to just stop?75 So rich that it buys the other companies or even countries?76
How exactly does one side build an AI powerful enough to prevent everyone else from building comparably powerful AI? But that’s the easy question.
Because now consider how this situation looks to other powers. What does the Chinese government think when the US appears to be obtaining such capability? Or vice-versa? What does the US government (or Chinese, or Russian, or Indian) think when OpenAI or DeepMind or Anthropic appears close to a breakthrough? What happens if the US sees a new Indian or UAE effort with breakthrough success? They would see both an existential threat and – crucially – that the only way this “race” ends is through their own disempowerment. These very powerful agents – including governments of fully equipped nations that surely have the means to do so – would be highly motivated to either obtain or destroy such a capability, whether by force or subterfuge.77
This might start small-scale, as sabotage of training runs or attacks on chip manufacturing, but these attacks can only really stop once all parties either lose the capacity to race on AI, or lose the capacity to make the attacks. Because the participants view the stakes as existential, either case is likely to represent a catastrophic war.
That brings us to the fourth possibility: racing to superintelligence, and in the fastest, least controlled way possible. As AI increases in power, its developers on all sides will find it progressively harder to control, especially because racing for capabilities is antithetical to the sort of careful work controllability would require. So this scenario puts us squarely in the case where control is lost (or given, as we’ll see next) to the AI systems themselves. That is, AI wins the race. On the other hand, to the degree that control is maintained, we continue to have multiple mutually hostile parties each in charge of extremely powerful capabilities. That looks like war again.
Let’s put this all another way.78 The current world simply does not have any institution that could be entrusted to house development of an AI of this capability without inviting immediate attack.79 All parties will correctly reason that either it will not be under control – and hence is a threat to all parties – or it will be under control, and hence is a threat to any adversary who develops it less quickly. And the parties in question are nuclear-armed countries, or companies housed within them.
In the absence of any plausible way for humans to “win” this race, we’re left with a stark conclusion: the only way this race ends is either in catastrophic conflict or where AI, and not any human group, is the winner.
Geopolitical “great powers” competition is just one of many competitions: individuals compete economically and socially; companies compete in markets; political parties compete for power; movements compete for influence. In each arena, as AI approaches and exceeds human capability, competitive pressure will force participants to delegate or cede more and more control to AI systems – not because those participants want to, but because they cannot afford not to.
As with other risks of AGI, we are seeing this already with weaker systems. Students feel pressure to use AI in their assignments, because clearly many other students are. Companies are scrambling to adopt AI solutions for competitive reasons. Artists and programmers feel forced to use AI or else their rates will be undercut by others that do.
These feel like pressured delegation, but not loss of control. But let’s dial up the stakes and push forward the clock. Consider a CEO whose competitors are using AGI “aides” to make faster, better decisions, or a military commander facing an adversary with AI-enhanced command and control. A sufficiently advanced AI system could autonomously operate at many times human speed, sophistication, complexity, and data-processing capability, pursuing complex goals in complicated ways. Our CEO or commander, in charge of such a system, may see it accomplish what they want; but would they understand even a small part of how it was accomplished? No, they would just have to accept it. What’s more, much of what the system does will be not just taking orders but advising its putative boss on what to do. That advice will be good – over and over again.
At what point, then, will the role of the human be reduced to clicking “yes, go ahead”?
It feels good to have capable AI systems that can enhance our productivity, take care of annoying drudgery, and even act as a thought-partner in getting things done. It will feel good to have an AI assistant that can take care of actions for us, like a good human personal assistant. It will feel natural, even beneficial, as AI becomes very smart, competent, and reliable, to defer more and more decisions to it. But this “beneficial” delegation has a clear endpoint if we continue down the road: one day we will find that we are not really in charge of much of anything anymore, and that the AI systems actually running the show can no more be turned off than oil companies, social media, the internet, or capitalism.
And this is the much more positive version, in which AI is simply so useful and effective that we let it make most of our key decisions for us. Reality would likely be much more of a mix between this and versions where uncontrolled AGI systems take various forms of power for themselves because, remember, power is useful for almost any goal one has, and AGI would be, by design, at least as effective at pursuing its goals as humans.
Whether we grant control or whether it is wrested from us, its loss seems extremely likely. As Alan Turing originally put it, “…it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control…”
Please note, although it is obvious enough, that loss of control by humanity to AI also entails loss of control of the United States by the United States government; it means loss of control of China by the Chinese Communist party, and the loss of control of India, France, Brazil, Russia, and every other country by their own government. Thus AI companies are, even if this is not their intention, currently participating in the potential overthrow of world governments, including their own. This could happen in a matter of years.
There’s a case to be made that human-competitive or even expert-competitive general-purpose AI, even if autonomous, could be manageable. It may be incredibly disruptive in all of the ways discussed above, but there are lots of very smart, agential people in the world now, and they are more-or-less manageable.80
But we won’t get to stay at roughly human level. The progression beyond is likely to be driven by the same forces we’ve already seen: competitive pressure between AI developers seeking profit and power, competitive pressure between AI users who can’t afford to fall behind, and – most importantly – AGI’s own ability to improve itself.
In a process we have already seen start with less powerful systems, AGI would itself be able to conceive and design improved versions of itself. This includes hardware, software, neural networks, tools, scaffolds, etc. It will, by definition, be better than us at doing this, so we don’t know exactly how it will bootstrap its own intelligence. But we won’t have to know: insofar as we still have influence over what AGI does, we would merely need to ask it to, or let it.
There’s no human-level barrier to cognition that could protect us from this runaway.81
The progression of AGI to superintelligence is not a law of nature; it would still be possible to curtail the runaway, especially if AGI is relatively centralized and to the extent it is controlled by parties that do not feel pressure to race each other. But should AGI be widely proliferated and highly autonomous, it seems nearly impossible to prevent it deciding it should be more, and then yet more, powerful.
To put it bluntly, we have no idea what would happen if we build superintelligence.82 It would take actions we cannot track or perceive for reasons we cannot grasp toward goals we cannot conceive. What we do know is that it won’t be up to us.83
The impossibility of controlling superintelligence can be understood through increasingly stark analogies. First, imagine you are CEO of a large company. There’s no way you can track everything that’s going on, but with the right setup of personnel, you can still meaningfully understand the big picture, and make decisions. But suppose just one thing: everyone else in the company operates at one hundred times your speed. Can you still keep up?
With superintelligent AI, people would be “commanding” something not just faster, but operating at levels of sophistication and complexity they cannot comprehend, processing vastly more data than they can even conceive of. This incommensurability can be put on a formal level: Ashby’s law of requisite variety (and see the related “good regulator theorem”) states, roughly, that any control system must have as many knobs and dials as the system being controlled has degrees of freedom.
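In one common information-theoretic formulation (a rough sketch of the standard statement, not an AI-specific result), the law can be written as:

```latex
% Law of requisite variety, entropy form (ignoring noise terms):
% E = outcomes to be kept on target, D = disturbances, R = regulator.
\[
H(E) \;\ge\; H(D) \;-\; H(R)
\]
% The variety (entropy) of outcomes can be driven down by at most the
% variety available to the regulator: a controller vastly simpler and
% slower than the system it regulates cannot, even in principle,
% constrain that system's behavior very far.
```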
A person controlling a superintelligent AI system would be like a fern controlling General Motors: even if “do what the fern wants” were written into the corporate bylaws, the systems are so different in speed and range of action that “control” simply does not apply. (And how long until that pesky bylaw gets rewritten?)84
As there are zero examples of plants controlling Fortune 500 corporations, there would be exactly zero examples of people controlling superintelligences. This approaches a mathematical fact.85 If superintelligence were constructed – regardless of how we got there – the question would not be whether humans could control it, but whether we would continue to exist, and if so, whether we would have a good and meaningful existence as individuals or as a species. Over these existential questions for humanity we would have little purchase. The human era would be over.
There is a scenario in which building AGI may go well for humanity: it is built carefully, under control and for the benefit of humanity, governed by mutual agreement of many stakeholders,86 and prevented from evolving to uncontrollable superintelligence.
That scenario is not open to us under present circumstances. As discussed in this section, with very high likelihood, development of AGI would lead to some combination of:
As an early fictional depiction of AGI put it: the only way to win is not to play.
The recent fast progress in AI has resulted both from and in an extraordinary level of attention and investment. This is driven in part by success in AI development, but more is going on. Why are some of the largest companies on Earth, and even countries, racing to build not just AI, but AGI and superintelligence?
Until the past five years or so, AI was largely an academic and scientific research field, driven mainly by curiosity and the drive to understand intelligence and how to create it in a new substrate.
In this phase, there was relatively little attention paid to the benefits or perils of AI among most researchers. When asked why AI should be developed, a common response might be to list, somewhat vaguely, problems that AI could help with: new medicines, new materials, new science, smarter processes, and in general improving things for people.47
These are admirable goals!48 Although we can and will question whether AGI – rather than AI in general – is necessary for these goals, they exhibit the idealism with which many AI researchers started.
Over the past half-decade, however, AI has transformed from a relatively pure research field into much more of an engineering and product field, largely driven by some of the world’s largest companies.49 Researchers, while relevant, are no longer in charge of the process.
So why are giant corporations (and even more so investors) pouring vast resources into building AGI? There are two drivers that most companies are quite honest about: they see AI as a driver of productivity for society, and of profits for themselves. Because general AI is by nature general-purpose, there is a huge prize: rather than choosing a sector in which to create products and services, one can try all of them at once. Big Tech companies have grown enormous by producing digital goods and services, and at least some executives surely see AI as simply the next step in providing them well, with risks and benefits that expand upon but echo those of search, social media, laptops, phones, etc.
But why AGI? There is a very simple answer to this, which most companies and investors shy away from discussing publicly.50
It is that AGI can directly, one-for-one, replace workers.
Not augment, not empower, not make more productive. Not even displace. All of these can and will be done by non-AGI. AGI is specifically what can fully replace thought workers (and with robotics, many physical ones as well.) As support for this view one need look no further than OpenAI’s (publicly stated) definition of AGI, which is “a highly autonomous system that outperforms humans at most economically valuable work.”
The prize here (for companies!) is enormous. Labor costs are a substantial fraction of the ∼$100 trillion global economy. Even if only a fraction of this is captured by replacement of human labor with AI labor, this is trillions of dollars of annual revenue. AI companies are also cognizant of who is willing to pay. As they see it, you are not going to pay thousands of dollars a year for productivity tools. But a company will pay thousands of dollars per year to replace your labor, if it can.
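As a rough back-of-the-envelope (the labor share and capture rate below are my own illustrative assumptions, not figures from this essay):

```latex
% Illustrative arithmetic: ~$100T gross world product, ~50% labor share,
% and 5-10% of that labor value captured by AI providers.
\[
\$100\,\text{T} \times 0.5 \times (0.05\ \text{to}\ 0.10)
\;\approx\; \$2.5\,\text{T to } \$5\,\text{T per year}
\]
```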
Countries’ stated motivations for pursuing AGI focus on economic and scientific leadership. The argument is compelling: AGI could dramatically accelerate scientific research, technological development, and economic growth. Given the stakes, they argue, no major power can afford to fall behind.51
But there are also additional and largely unstated drivers. There is no doubt that when certain military and national security leaders meet behind closed doors to discuss an extraordinarily potent and catastrophically risky technology, their focus is not on “how do we avoid those risks” but rather “how do we get this first?” Military and intelligence leaders see AGI as a potential revolution in military affairs, perhaps the most significant since nuclear weapons. The fear is that the first country to develop AGI could gain an insurmountable strategic advantage. This creates a classic arms race dynamic.
We’ll see that this “race to AGI” thinking,52 while compelling, is deeply flawed. This is not because racing is dangerous and risky – though it is – but because of the nature of the technology. The unstated assumption is that AGI, like other technologies, is controllable by the state that develops it, and is a power-granting boon to the society that has the most of it. As we will see, it probably won’t be either.
While companies publicly focus on productivity, and countries on economic and technological growth, for those deliberately pursuing full AGI and superintelligence these are just the start. What do they really have in mind? Although seldom said out loud, they include:
The first three are largely “single-edge” technologies – i.e. likely to be quite strongly net positive. It’s hard to argue against curing diseases or being able to live longer if one chooses. And we have already reaped the negative side of fusion (in the form of nuclear weapons); it would be lovely now to get the positive side. The question with this first category is whether getting these technologies sooner compensates for the risk.
The next four are clearly double-edged: transformative technologies with both potentially huge upsides and immense risks, much like AI. All of these, if they sprung out of a black-box tomorrow and were deployed, would be incredibly difficult to manage.53
The final two concern the super-human AI doing things itself rather than just inventing technology. More precisely, putting euphemisms aside, these involve powerful AI systems telling people what to do. Calling this “advice” is disingenuous if the system doing the advising is far more powerful than the advised, who cannot meaningfully understand the basis of the decision (or, even if that basis is provided, trust that the advisor could not have produced a similarly compelling rationale for a different decision).
This points to a key item missing from the above list: power itself.
It is abundantly clear that much of what underlies the current race for super-human AI is the idea that intelligence = power. Each racer is banking on being the best holder of that power, and on being able to wield it for ostensibly benevolent reasons without it slipping or being taken from their control.
That is, what companies and nations are really chasing is not just the fruits of AGI and superintelligence, but the power to control who gets access to them and how they’re used. Companies see themselves as responsible stewards of this power in service of shareholders and humanity; nations see themselves as necessary guardians preventing hostile powers from gaining decisive advantage. Both are dangerously wrong, failing to recognize that superintelligence, by its nature, cannot be reliably controlled by any human institution. We will see that the nature and dynamics of superintelligent systems make human control extremely difficult, if not impossible.
These racing dynamics – both corporate and geopolitical – make certain risks nearly inevitable unless decisively interrupted. We turn now to examining these risks and why they cannot be adequately mitigated within a competitive54 development paradigm.
The past ten years have seen dramatic advances in AI driven by huge computational, human, and fiscal resources. Many narrow AI applications are better than humans at their assigned tasks, and are certainly far faster and cheaper.31 And there are also narrow super-human agents that can trounce all people at narrow-domain games such as Go, Chess, and Poker, as well as more general agents that can plan and execute actions in simplified simulated environments as effectively as humans can.
Most prominently, current general AI systems from OpenAI/Microsoft, Google/DeepMind, Anthropic/Amazon, Facebook/Meta, X.ai/Tesla and others32 have emerged since early 2023 and steadily (though unevenly) increased their capabilities since then. All of these have been created via token-prediction on huge text and multimedia datasets, combined with extensive reinforcement feedback from humans and other AI systems. Some of them also include extensive tool and scaffold systems.
These systems perform well across an increasingly broad range of tests designed to measure intelligence and expertise, with progress that has surprised even experts in the field:
Despite these impressive numbers (and their obvious intelligence when one interacts with them)36 there are many things (at least the released versions of) these neural networks cannot do. Currently most are disembodied – existing only on servers – and process at most text, sound, and still images (but not video). Crucially, most cannot carry out complex planned activities requiring high accuracy.37 And a number of other qualities that are strong in high-level human cognition are currently weak in released AI systems.
The following table lists a number of these, based on mid-2024 AI systems such as GPT-4o, Claude 3.5 Sonnet, and Google Gemini 1.5.38 The key question for how rapidly general AI will become more powerful is: to what degree will just doing more of the same produce results, versus adding additional but known techniques, versus developing or implementing genuinely new AI research directions? My own predictions for this are given in the table, in terms of how likely each of these scenarios is to get that capability to and beyond human level.
Capability | Description of capability | Status/prognosis | Scaling/known/new |
---|---|---|---|
Core Cognitive Capabilities | |||
Reasoning | People can do accurate, multistep reasoning, following rules and checking accuracy. | Dramatic recent progress using extended chain-of-thought and retraining | 95/5/5 |
Planning | People exhibit long-term and hierarchical planning. | Improving with scale; can be strongly aided using scaffolding and better training techniques. | 10/85/5 |
Truth-grounding | GPAIs confabulate ungrounded information to satisfy queries. | Improving with scale; calibration data available within model; can be checked/improved via scaffolding. | 30/65/5 |
Flexible problem-solving | Humans can recognize new patterns and invent new solutions to complex problems; current ML models struggle. | Improves with scale but weakly; may be solvable with neurosymbolic or generalized “search” techniques. | 15/75/10 |
Learning and Knowledge | |||
Learning & memory | People have working, short-term, and long-term memory, all of which are dynamic and inter-related. | All models learn during training; GPAIs learn within context window and during fine-tuning; “continual learning” and other techniques exist but not yet integrated into large GPAIs. | 5/80/15 |
Abstraction & recursion | People can map and transfer relation sets into more abstract ones for reasoning and manipulation, including recursive “meta” reasoning. | Weakly improving with scale; could emerge in neurosymbolic systems. | 30/50/20 |
World model(s) | People have and continually update a predictive world model within which they can solve problems and do physical reasoning. | Improving with scale; updating tied to learning; GPAIs weak in real-world prediction. | 20/50/30
Self and Agency | |||
Agency | People can take actions in order to pursue goals, based on planning/prediction. | Many ML systems are agentic; LLMs can be made agents via wrappers. | 5/90/5 |
Self-direction | People develop and pursue their own goals, with internally-generated motivation and drive. | Largely composed of agency plus originality; likely to emerge in complex agential systems with abstract goals. | 40/45/15 |
Self-reference | People understand and reason about themselves as situated within an environment/context. | Improving with scale and could be augmented with training reward. | 70/15/15 |
Self-awareness | People have knowledge of and can reason regarding their own thoughts and mental states. | Exists in some sense in GPAIs, which can arguably pass the classic “mirror test” for self-awareness. Can be improved with scaffolding; but unclear if this is enough. | 20/55/25 |
Interface and Environment | |||
Embodied intelligence | People understand and actively interact with their real-world environment. | Reinforcement learning works well in simulated and real-world (robotic) environments and can be integrated into multimodal transformers. | 5/85/10 |
Multi-sense processing | People integrate and real-time process visual, audio, and other sensory streams. | Training in multiple modalities appears to “just work,” and improve with scale. Realtime video processing is difficult but e.g. self-driving systems are rapidly improving. | 30/60/10 |
Higher-order Capabilities | |||
Originality | Current ML models are creative in transforming and combining existing ideas/works, but people can build new frameworks and structures, sometimes tied to their identity. | Can be hard to discern from “creativity,” which may scale into it; may emerge from creativity plus self-awareness. | 50/40/10 |
Sentience | People experience qualia; these can be positive, negative or neutral valence; it is “like something” to be a person. | Very difficult and philosophically fraught to determine whether a given system has this. | 5/10/85 |
Breaking down what is “missing” in this way makes it fairly clear that we are quite on-track for broadly above-human intelligence by scaling existing or known techniques.39
There could still be surprises. Even putting aside “sentience,” there could be some listed core cognitive capabilities that really can’t be achieved with current techniques and require new ones. But consider this. The present effort being put forth by many of the world’s largest companies amounts to multiple times the Apollo project’s spend and tens of times the Manhattan project’s,40 and it employs thousands of the very top technical people at unheard-of salaries. The dynamics of the past few years have brought to bear on this more human intellectual firepower (with AI now being added) than any endeavor in history. We should not bet on failure.
The development of general AI over the past several years has focused on creating general and powerful but tool-like AI: it functions primarily as a (fairly) loyal assistant, and generally does not take actions on its own. This is partly by design, but largely because these systems have simply not been competent enough at the relevant skills to be entrusted with complex actions.41
AI companies and researchers are, however, increasingly shifting focus toward autonomous expert-level general-purpose agents.42 This would allow the systems to act more like a human assistant to which the user can delegate real actions.43 What will that take? A number of the capabilities in the “what’s missing” table are implicated, including strong truth-grounding, learning and memory, abstraction and recursion, and world-modeling (for intelligence); planning, agency, originality, self-direction, self-reference, and self-awareness (for autonomy); and multi-sense processing, embodied intelligence, and flexible problem-solving (for generality).44
This triple-intersection of high autonomy (independence of action), high generality (scope and task breadth) and high intelligence (competence at cognitive tasks) is currently unique to humans. It is implicitly what many probably have in mind when they think of AGI – both in terms of its value as well as its risks.
This provides another way to define A-G-I as Autonomous-General-Intelligence, and we’ll see that this triple intersection provides a very valuable lens for high-capability systems both in understanding their risks and rewards, and in governance of AI.
A final crucial factor in understanding AI progress is AI’s unique technological feedback loop. In developing AI, success – in both demonstrated systems and deployed products – brings additional investment, talent, and competition, and we are currently in the midst of an enormous AI hype-plus-reality feedback loop that is driving hundreds of billions, or even trillions, of dollars in investment.
This type of feedback cycle could happen with any technology, and we’ve seen it in many, where market success begets investment, which begets improvement and better market success. But AI development goes further, in that now AI systems are helping to develop new and more powerful AI systems.45 We can think of this feedback loop in five stages, each with a shorter timescale than the last, as shown in the table.
Stage | Timescale | Key Drivers | Current Status | Rate-Limiting Factors |
---|---|---|---|---|
Infrastructure | Years | AI success → investment → better hardware/infrastructure | Ongoing; massive investment | Hardware development cycle |
Model Development | 1-2 Years | Human-led research with AI assistance | Active across major labs | Training run complexity |
Data Generation | Months | AI systems generating synthetic training data | Beginning phase | Data quality verification |
Tool Development | Days-Weeks | AI systems creating their own scaffolding/tools | Early experiments | Software integration time |
Network Self-Improvement | Hours-Weeks | Groups of AI systems innovate “social” institutions | Unknown | Inference rate |
Recursive Improvement | Unknown | AGI/superintelligent systems autonomously self-improving | Not yet possible | Unknown/unpredictable |
Several of these stages are already underway, and a couple more are clearly getting started. The last stage, in which AI systems autonomously improve themselves, has been a staple of the literature on the risks of very powerful AI systems, and for good reason.46 But it is important to note that it is just the most drastic form of a feedback cycle that has already started and could lead to more surprises in the rapid advancement of the technology.
The term “artificial general intelligence” has been around for some time to point to “human level” general-purpose AI. It has never been a particularly well-defined term, but in recent years it has paradoxically become no better defined yet even more important, with experts simultaneously arguing about whether AGI is decades away or already achieved, and trillion-dollar companies racing “to AGI.” (The ambiguity of “AGI” was highlighted recently when leaked documents reportedly revealed that in OpenAI’s contract with Microsoft, AGI was defined as AI that generates $100 billion in revenue for OpenAI – a rather more mercenary than highbrow definition.)
There are two core problems with the idea of AI with “human level intelligence.” First, humans are very, very different in their ability to do any given type of cognitive work, so there is no “human level.” Second, intelligence is very multi-dimensional; although there may be correlations, they are imperfect and may be quite different in AI. So even if we could define “human level” for many capabilities, AI would surely be far beyond it in some even while quite below in others.26
It is, nonetheless, quite crucial to be able to discuss types, levels, and thresholds of AI capability. The approach taken here is to emphasize that general-purpose AI is here, and that it comes – and will come – at various capability levels at which it is convenient to attach terms even if they are reductive, because they correspond to crucial thresholds in terms of AI’s effects on society and humanity.
We’ll define “full” AGI to be synonymous with “super-human general-purpose AI” meaning an AI system that is able to perform essentially all human cognitive tasks at or above top human expert level, as well as acquire new skills and transfer capability to new domains. This is in keeping with how “AGI” is often defined in the modern literature. It’s important to note that this is a very high threshold. No human has this type of intelligence; rather it is the type of intelligence that large collections of top human experts would have if combined. We can term “superintelligence” a capability that goes beyond this, and define more limited levels of capability by “human-competitive” and “expert-competitive” GPAI, which perform a broad range of tasks at typical professional, or human expert level.27
These terms and some others are collected in the table below. For a more concrete sense of what the various grades of system can do, it is useful to take the definitions seriously and consider what they mean.
AI Type | Related Terms | Definition | Examples |
---|---|---|---|
Narrow AI | Weak AI | AI trained for a specific task or family of tasks. Excels in its domain but lacks general intelligence or transfer learning ability. | Image recognition software; Voice assistants (e.g., Siri, Alexa); Chess-playing programs; DeepMind’s AlphaFold |
Tool AI | Augmented Intelligence, AI Assistant | (Discussed later in essay.) AI system enhancing human capabilities. Combines human-competitive general-purpose AI, narrow AI, and guaranteed control, prioritizing safety and collaboration. Supports human decision-making. | Advanced coding assistants; AI-powered research tools; Sophisticated data analysis platforms. Competent but narrow and controllable agents |
General-purpose AI (GPAI) | | AI system adaptable to various tasks, including those not specifically trained for. | Language models (e.g., GPT-4, Claude); Multimodal AI models; DeepMind’s MuZero |
Human-competitive GPAI | AGI [weak] | General-purpose AI performing tasks at average human level, sometimes exceeding it. | Advanced language models (e.g., O1, Claude 3.5); Some multimodal AI systems |
Expert-competitive GPAI | AGI [partial] | General-purpose AI performing most tasks at human expert level, with significant but limited autonomy | Possibly a tooled and scaffolded O3, at least for mathematics, programming, and some hard sciences |
AGI [full] | Super-human GPAI | AI system capable of autonomously performing roughly all human intellectual tasks at or beyond expert level, with efficient learning and knowledge transfer. | [No current examples – theoretical] |
Super-intelligence | Highly super-human GPAI | AI system far surpassing human capabilities across all domains, outperforming collective human expertise. This out-performance could be in generality, quality, speed, and/or other measures. | [No current examples – theoretical] |
We’re already experiencing what having GPAIs up to human-competitive level is like. This has integrated relatively smoothly, as most users experience it as having a smart but limited temp worker who makes them more productive, with mixed impact on the quality of their work.28
What would be different about expert-competitive GPAI is that it wouldn’t have the core limitations of present-day AI, and would do the things experts do: independent economically valuable work, real knowledge creation, technical work you can count on, while rarely (though still occasionally) making dumb mistakes.
The idea of full AGI is that it really does all of the cognitive things even the most capable and effective humans do, autonomously and with no needed help or oversight. This includes sophisticated planning, learning new skills, managing complex projects, etc. It could do original cutting-edge research. It could run a company. Whatever your job is, if it is predominantly done by computer or over the phone, it could do it at least as well as you. And probably much faster and more cheaply. We’ll discuss some of the ramifications below, but for now the challenge for you is to really take this seriously. Imagine the top ten most knowledgeable and competent people you know or know of – including CEOs, scientists, professors, top engineers, psychologists, political leaders, and writers. Wrap them all into one, who also speaks 100 languages, has a prodigious memory, operates quickly, is tireless and always motivated, and works at below minimum wage.29 That’s a sense of what AGI would be.
For superintelligence the imagining is harder, because the idea is that it could perform intellectual feats that no human or even collection of humans can – it is by definition unpredictable by us. But we can get a sense. As a bare baseline, consider lots of AGIs, each much more capable than even the top human expert, running at 100 times human speed, with enormous memory and terrific coordination capacity.30 And it goes up from there. Dealing with superintelligence would be less like conversing with a different mind, more like negotiating with a different (and more advanced) civilization.
So how close are we to AGI and superintelligence?
To really understand a human you need to know something about biology, evolution, child-rearing, and more; to understand AI you also need to know about how it is made. Over the past five years, AI systems have evolved tremendously both in capability and complexity. A key enabling factor has been the availability of very large amounts of computation (or colloquially “compute” when applied to AI).
The numbers are staggering. About 10²⁵–10²⁶ “floating-point operations” (FLOP)16 are used in the training of models like the GPT series, Claude, Gemini, etc.17 (For comparison, if every human on Earth worked non-stop doing one calculation every five seconds, it would take around a billion years to accomplish this.) This huge amount of computation enables training of models with up to trillions of model weights on terabytes of data – a large fraction of all of the quality text that has ever been written, alongside large libraries of sounds, images, and video. When this training is complemented by extensive additional training that reinforces human preferences and good task performance, the resulting models exhibit human-competitive performance across a significant span of basic intellectual tasks, including reasoning and problem solving.
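As a quick sanity check of the comparison just made, here is a minimal calculation (assuming roughly 8 billion people, each doing one calculation every five seconds):

```python
# Rough check of the "billion years" comparison in the text.
# Assumptions: ~8 billion people, one hand calculation every 5 seconds,
# and a training run of 10^25-10^26 FLOP.
SECONDS_PER_YEAR = 3600 * 24 * 365

def years_for_humanity(total_flop, population=8e9, calcs_per_second=1 / 5):
    """Years needed for all of humanity to do total_flop calculations by hand."""
    humanity_rate = population * calcs_per_second  # ~1.6e9 calculations/second
    return total_flop / humanity_rate / SECONDS_PER_YEAR

for flop in (1e25, 1e26):
    print(f"{flop:.0e} FLOP -> ~{years_for_humanity(flop):.1e} years")
# Prints roughly 2e8 years for 1e25 FLOP and 2e9 years for 1e26 FLOP,
# i.e. on the order of a billion years, consistent with the text.
```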
We also know (very, very roughly) how much computation speed, in operations per second, is sufficient for the inference speed18 of such a system to match the speed of human text processing. It is about 10¹⁵–10¹⁶ FLOP per second.19
While powerful, these models are by their nature limited in key ways, quite analogous to how an individual human would be limited if forced to simply output text at a fixed rate of words per minute, without stopping to think or using any additional tools. More recent AI systems address these limitations through a more complex process and architecture combining several key elements:
Because these extensions can be very powerful (and include AI systems themselves), these composite systems can be quite sophisticated and dramatically enhance AI capabilities.21 And recently, techniques in scaffolding and especially chain-of-thought prompting (and folding results back into retraining models to use these better) have been developed and employed in o1, o3, and DeepSeek R1 to do many passes of inference in response to a given query.22 This in effect allows the model to “think about” its response and dramatically boosts these models’ ability to do high-caliber reasoning in science, math, and programming tasks.23
For a given AI architecture, increases in training computation can be reliably translated into improvements in a set of clearly-defined metrics. For less crisply defined general capabilities (such as those discussed below), the translation is less clear and predictive, but it is almost certain that larger models with more training computation will have new and better capabilities, even if it is hard to predict what those will be.
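One widely cited empirical form of such a scaling relation is the parametric fit of Hoffmann et al. (2022) (“Chinchilla”); the functional form below is theirs, the constants are fit empirically per model family, and the compute estimate is the standard rough approximation for transformers:

```latex
% Chinchilla-style scaling law: predicted training loss L as a function
% of parameter count N and training tokens D. E, A, B, alpha, beta are
% fit empirically (the fitted exponents come out near 0.3).
\[
L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}},
\qquad
C \;\approx\; 6\,N D \ \text{training FLOP.}
\]
```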
Similarly, composite systems and especially advances in “chain of thought” (and training of models that work well with it) have unlocked scaling in inference computation: for a given trained core model, at least some AI system capabilities increase as more computation is applied that allows them to “think harder and longer” about complex problems. This comes at a steep computing speed cost, requiring hundreds or thousands of times more FLOP/s to match human performance.24
While only a part of what is leading to rapid AI progress,25 the role of computation and the possibility of composite systems will prove crucial to both preventing uncontrollable AGI and developing safer alternatives.