To really understand a human you need to know something about biology, evolution, child-rearing, and more; to understand AI you likewise need to know something about how it is made. Over the past five years, AI systems have grown tremendously in both capability and complexity. A key enabling factor has been the availability of very large amounts of computation (or colloquially “compute” when applied to AI).
The numbers are staggering. About 10²⁵–10²⁶ “floating-point operations” (FLOP)16 are used in training models like the GPT series, Claude, Gemini, etc.17 (For comparison, if every human on Earth worked non-stop doing one calculation every five seconds, it would take around a billion years to accomplish this.) This huge amount of computation enables the training of models with up to trillions of model weights on terabytes of data – a large fraction of all of the quality text that has ever been written, alongside large libraries of sounds, images, and video. When this training is complemented by extensive additional training that reinforces human preferences and good task performance, the resulting models exhibit human-competitive performance across a significant span of basic intellectual tasks, including reasoning and problem solving.
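To see where the billion-year figure comes from, here is the back-of-the-envelope arithmetic as a short Python sketch; the population figure is approximate, and the FLOP count takes the upper end of the quoted range.

```python
# Back-of-the-envelope arithmetic behind the comparison above; the population
# figure is approximate and the FLOP count is the upper end of the range.
WORLD_POPULATION = 8e9           # people
OPS_PER_PERSON_PER_SEC = 1 / 5   # one calculation every five seconds
TRAINING_FLOP = 1e26             # upper end of the 10^25 - 10^26 range

total_rate = WORLD_POPULATION * OPS_PER_PERSON_PER_SEC     # ~1.6e9 ops/s
years = TRAINING_FLOP / total_rate / (365.25 * 24 * 3600)
print(f"~{years:.0e} years")     # ~2e9, i.e. on the order of a billion years
```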
We also know (very, very roughly) how much computation speed, in operations per second, is sufficient for the inference speed18 of such a system to match the speed of human text processing. It is about 10¹⁵–10¹⁶ FLOP per second.19
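As a hedged back-of-envelope for how a figure in this vicinity can arise: every input below is an illustrative assumption, not a number taken from the text or its footnotes, and the helper function is hypothetical.

```python
# Hedged back-of-envelope for inference compute; all inputs are illustrative
# assumptions, not figures from the text or its footnotes.

def inference_flop_per_sec(n_params: float, tokens_per_sec: float,
                           flop_per_param_per_token: float = 2.0,
                           overhead: float = 1.0) -> float:
    """FLOP/s to generate text, using the common ~2 FLOP per parameter per
    token rule of thumb for a dense transformer forward pass."""
    return n_params * flop_per_param_per_token * tokens_per_sec * overhead

# A trillion-parameter model at ~10 tokens/s (roughly human reading speed):
print(f"{inference_flop_per_sec(1e12, 10):.0e}")                # ~2e13 FLOP/s
# Long contexts, multiple samples, or reasoning passes can raise the
# effective cost by orders of magnitude, toward the quoted range:
print(f"{inference_flop_per_sec(1e12, 10, overhead=100):.0e}")  # ~2e15 FLOP/s
```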
While powerful, these models are by their nature limited in key ways, quite analogous to how an individual human would be limited if forced to simply output text at a fixed rate of words per minute, without stopping to think or using any additional tools. More recent AI systems address these limitations through a more complex process and architecture combining several key elements:

- Tool use, which gives the model access to external resources such as code execution, search, and other software;
- Scaffolding programs that structure and chain together multiple model calls, potentially including calls to other AI systems;
- Prompting techniques, especially chain-of-thought prompting, that lead the model through intermediate reasoning steps rather than a single pass from question to answer.
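To make the composite-system pattern concrete, here is a minimal, hypothetical scaffold loop. Everything in it is illustrative: `call_model` stands in for a call to any trained core model, and the toy tool set and message format are assumptions, not any specific product’s design.

```python
# A minimal, hypothetical scaffold loop illustrating the composite-system
# pattern. `call_model` is a placeholder for any core-model API; the tool
# set and message format are illustrative assumptions.
from typing import Callable

def call_model(prompt: str) -> str:
    """Placeholder for one inference call to a trained core model."""
    raise NotImplementedError

TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr)),   # toy example only
    "search": lambda query: f"<results for {query!r}>",
}

def run_scaffold(task: str, max_steps: int = 10) -> str:
    transcript = task
    for _ in range(max_steps):
        step = call_model(transcript)             # one inference pass
        if step.startswith("TOOL:"):              # model requests a tool
            name, _, arg = step[5:].partition(" ")
            transcript += f"\n{step}\nRESULT: {TOOLS[name](arg)}"
        elif step.startswith("ANSWER:"):          # model is done
            return step[7:].strip()
        else:                                     # intermediate reasoning step
            transcript += "\n" + step
    return "no answer within step budget"
```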
Because these extensions can be very powerful (and include AI systems themselves), these composite systems can be quite sophisticated and can dramatically enhance AI capabilities.21 Recently, techniques in scaffolding and especially chain-of-thought prompting (with results folded back into retraining, so that models learn to use them better) have been developed and employed in systems such as o1, o3, and DeepSeek R1, which perform many passes of inference in response to a given query.22 This in effect allows the model to “think about” its response, and it dramatically boosts these models’ ability to do high-caliber reasoning in science, math, and programming tasks.23
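One published technique that illustrates spending many inference passes on a single query is self-consistency: sample several chain-of-thought completions and take a majority vote over their final answers. The sketch below shows the idea; `sample_completion` and the answer format are placeholders, and this is not a claim about how o1, o3, or R1 work internally.

```python
# Self-consistency: one simple way to convert extra inference compute into
# better answers. Illustrative only; not how o1/o3/R1 work internally.
from collections import Counter

def sample_completion(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder for one stochastic inference pass of a core model."""
    raise NotImplementedError

def extract_answer(completion: str) -> str:
    """Placeholder: pull the final answer out of a reasoning trace."""
    return completion.rsplit("ANSWER:", 1)[-1].strip()

def self_consistency(prompt: str, n_samples: int = 16) -> str:
    answers = [extract_answer(sample_completion(prompt))
               for _ in range(n_samples)]          # ~n_samples x the compute
    return Counter(answers).most_common(1)[0][0]   # majority-vote answer
```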
For a given AI architecture, increases in training computation can be reliably translated into improvements on a set of clearly defined metrics. For less crisply defined general capabilities (such as those discussed below), the translation is less clear and predictive, but it is almost certain that larger models trained with more computation will have new and better capabilities, even if it is hard to predict what those will be.
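One concrete, well-known instance of such a clearly defined metric is pretraining loss. The sketch below uses approximately the parametric fit reported by Hoffmann et al. (2022) (the “Chinchilla” scaling law) to show how loss falls predictably as parameters N, tokens D, and hence training compute C ≈ 6ND grow; the coefficients are theirs, the example model sizes are arbitrary.

```python
# Training-compute scaling illustration: the parametric loss fit of
# Hoffmann et al. (2022), with approximately their reported coefficients.
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

for N, D in [(1e9, 2e10), (7e10, 1.4e12), (1e12, 2e13)]:
    C = 6 * N * D  # rule-of-thumb training FLOP
    print(f"N={N:.0e}, D={D:.0e}, C~{C:.0e} FLOP, "
          f"loss~{chinchilla_loss(N, D):.2f}")
```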
Similarly, composite systems and especially advances in “chain of thought” (and the training of models that work well with it) have unlocked scaling in inference computation: for a given trained core model, at least some AI system capabilities increase as more computation is applied, allowing the system to “think harder and longer” about complex problems. This comes at a steep cost in computing speed, requiring hundreds or thousands of times more FLOP/s to match human performance.24
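A rough sense of that cost, assuming a single inference pass at human text speed takes the ~10¹⁵ FLOP/s quoted above: expanding each answer into a reasoning trace hundreds or thousands of times longer multiplies the FLOP/s needed to keep pace accordingly. The multipliers below are illustrative.

```python
# Illustrative cost of "thinking harder and longer" at human pace; the base
# rate is the 1e15 FLOP/s figure above, the multipliers are assumed.
BASE_FLOP_PER_SEC = 1e15
for multiplier in (1, 100, 1000):   # relative length of the reasoning trace
    print(f"x{multiplier:>4}: {BASE_FLOP_PER_SEC * multiplier:.0e} FLOP/s")
```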
While computation is only one part of what is driving rapid AI progress,25 its role, and the possibility of composite systems, will prove crucial both to preventing uncontrollable AGI and to developing safer alternatives.