The Goodhart Singularity
Automating R&D is not sufficient for superintelligence
Buyer beware: bombastic claims advanced, often flippantly.
AI systems are starting to build themselves. Because each generation of model will be better at building its successor than the last, it seems plausible that the full automation of AI R&D could rapidly lead to an exponential growth in overall AI capabilities. A natural inference is that domain-general superintelligence arrives shortly after AI research is automated.
This natural inference is, indeed, almost the default story of AI progress, as well as the strategy of those pursuing it. Where the AI industry once distinguished itself by the search for general intelligence, it has in recent years converged on a narrower path: the targeted pursuit of coding and AI research. For Sam Altman, “perhaps nothing is quite as significant as the fact that we can now use [AI] to do faster AI research”.
To many, the (ostensibly imminent) automation of AI research looks sufficient for the rapid subsequent development of other world-changing capabilities. It would constitute, in Jack Clark’s words, the crossing of “a Rubicon into a nearly-impossible-to-forecast future”. The writers of AI-2027 think the full automation of coding will produce systems “vastly better than the best humans at every cognitive task” within just one year.1
On the extreme version of this view, as Ajeya Cotra puts it, “it’ll take only months or weeks or days to create sci-fi technologies… that would have taken centuries at human rates”. Perhaps most importantly, this progress could unfold almost entirely within the confines of datacenters, unconstrained by the rate of AI’s wider diffusion into the economy, and take the rest of the world by surprise.
I do not think this will happen.
I don’t think that automating AI R&D will rapidly produce domain-general superintelligence. I don’t think that a datacenter of geniuses, largely separated from the rest of the economy, will figure out how to cure Alzheimer’s, design mosquito-sized killer drones, or manage a Fortune 500 company.
Of course, I am far from alone in thinking this. Others have explored the possibility that even full R&D automation will be bottlenecked on experimental compute or rapidly hit diminishing returns, or that the main barrier to transformative change is the size of the economy. But I have a sneaking suspicion that there is a further, as-yet-unarticulated reason not to expect automated R&D to rapidly produce a domain-general superintelligence from within the confines of a datacenter.
Put simply, I think it’s impossible to get good at solving problems without access to a source of those problems, and that arduous interaction with the real world in serial time is the only such source. I therefore believe that the development of domain-general superintelligence will depend on the laborious, painful and expensive deployment of AI into the real world. This would, of course, be great news.2
Versions of this view have been articulated by Herbie Bradley (in a scattershot series of excellent tweets) and Dwarkesh Patel (who writes of the necessity of a “broadly deployed intelligence explosion”). But I have yet to see this view laid out in terms clear enough for me to internalise. What follows is my attempt to do so.
I believe the automation of AI R&D will not rapidly lead to domain-general superintelligence because:
It’s impossible to get good at most things without practice.
AI companies lack the data their models would need to practice most things.
This can’t be fixed with “sample efficiency”. In most cases, the relevant data doesn’t exist at all.
This also can’t be fixed with simulations or synthetic data.
In part, this is because the automated AI researchers won’t be able to evaluate success on tasks they have yet to understand.
But it is also because the relevant data can only be generated by interactions with a market whose actors have preferences that are unknown and therefore unsimulable.
This means that the relevant data for superintelligence in most non-coding domains will only become available through deployment of AI models throughout the economy.3
The singularity, therefore, will be bottlenecked on signal. The output of the R&D produced by an isolated datacenter of geniuses would be a mere Goodhart Singularity.4 An isolated AI improving itself against benchmarks would only appear to be approaching superintelligence, while actually optimising for eval performance that fails to generalise beyond the lab.
This suggests that the automation of AI research will not rapidly produce superintelligent capabilities in other domains - their arrival will largely be a function of deployment and data collection in the real world. AI models need real-world deployment for the same reason the body needs pain and corporations need profit: signal is sovereign.
Let’s take these points in turn.
Domain-specific intelligence requires practice
The current trajectory of AI progress belies the notion that AIs acquire capabilities by virtue of a singular g they are accumulating. Instead, it looks to me that AIs become good at the things they are given the chance to practice.5
I think this is one reason AI capabilities have turned out to be far, far more jagged than we had any right to expect. Few in 2016 would have predicted an intelligence that could outscore Stanford Law graduates on the Bar while coming nowhere close to being able to practice law - let alone one that consistently loses at Tic-Tac-Toe.
Stories like AI-2027 sometimes read as if “automating AI R&D” were a phase-shifting inflection point after which jaggedness inexplicably ends. But I don’t know why that would be the case, when jaggedness seems primarily an artifact of the distribution of training data. We should expect jaggedness to last longer than we expect, even after we adjust for jaggedness lasting longer than we expect. Call this Toner’s Law.
To illustrate this point, let’s turn our attention to the current toolkit of capabilities progress. From what I can tell, the elaborate construction of bespoke RL environments is a major engine of such progress. How do these environments get made?
Read an account by one of the providers, and you may be surprised to find that the work of building superintelligence is proceeding one patch at a time. As Mechanize describe it, each environment is the product of a single engineer spending a week patching a single observed failure in a frontier model. “We used to get bundles of capabilities with each scale,” observes Zhengdong Wang, but “now we put in the work to unlock each one”. Are we expecting to solve week-long problems all the way into the singularity?
Mechanize advert advocating for a piecemeal conquest of superintelligence.
Photographed by the author in Berkeley.
The belief that “automating AI research” is the way to rapidly get “everything else” looks like an undercooked bet to me. Companies are aspiring to unlock the singularity through a war of attrition on AI coding and research, after which the magic of “algorithmic progress” and “sample efficiency” imply that further capabilities can be summoned without requiring a week of bespoke schlep from a blackpilled Mechanizer. Dwarkesh has clearly internalised this - the labs’ own actions “hint at a worldview where these models will continue to fare poorly at generalising… making it necessary to build in the skills that they hope will be economically valuable”.
Now of course it’s possible I’m mischaracterising RL. Maybe RL does generalise better than the above Mechanize description would imply. But if this were the case, I’d have expected AI companies to do a much better job of demonstrating this by now! We’ve been pumping the RL paradigm for well over a year, and I still haven’t seen any compelling evidence or narrative that RL will introduce anything other than a piecemeal singularity. When Dwarkesh puts this challenge to Dario - repeatedly! - in their second interview together, every single one of Dario’s counters ends up weakly analogising RL to GPT-2 era generalisation from pretraining.6
AI companies do not have the data to practice the things that matter, even if “sample efficiency” is improved
I don’t think AI companies currently have access to the right kinds of data to let their models practice the things they want those models to do. I also don’t think this data will be cheap or easy to acquire in the future without widespread diffusion of their models throughout the economy.
This assertion is partly a guess. But I feel fairly confident that no AI company has access to datasets that provide much signal on, say, which of several candidate drone designs would be a greater contribution to Ukraine’s war effort.7
The natural rebuttal to the claim that we don’t possess enough data for superintelligence is: “LLMs are wildly sample inefficient - this can be dramatically improved, to match human-level or beyond!” To make this case, AI bulls can smugly point to the ways in which DeepSeek achieved GPT-4-level capabilities at a fraction of the compute.
I’m not convinced this is correct. To begin with, I’m far from convinced that the LLM industry has a strong track record of wild improvements to sample efficiency. As both Anson Ho and Beren Millidge point out, improvements in intelligence-per-FLOP seem largely attributable to improvements in data quality, not algorithm quality. The record for pure algorithmic innovations - particularly ones which reap rewards at a constant compute stock - looks fairly weak.
But more fundamentally, it strikes me that for most tasks-of-interest the data required does not exist at all - at least not in a form that companies can readily feed to their hungry, hungry models. Even OOMs of increase in sample efficiency get you nowhere if the sum of relevant data is 0.
A good test for this intuition is to ask how much coding capability progress we’d get at current levels of talent and compute if the Internet had never existed. My guess is that it’d be a very painful process, requiring tens of billions of hours of human labour. My claim is that we live in this “no Internet” world for most kinds of tasks.
We live in the “no Internet” world for most tasks because the relevant data in the training corpus is of the wrong kind. For most tasks, what we have amounts to descriptions, commentary, or advice - but no record of the steps involved in performing the task itself. So even when there’s a lot of text about a task, that text does not provide the model with what it needs to perform the task. My sense is that, for most economically interesting tasks, there’s a large gulf between predicting tokens describing a task and predicting tokens that perform that task successfully.8
There is no GitHub for closing a Series B, winning a knife fight, or negotiating with Houthis.
This fact may save us all.
One way to make this gulf obvious is to ask LLMs for strategy advice on a boardgame. Ideally this boardgame should be popular enough to have plenty of strategy advice online. Dominion - which Claude isn’t great at - makes for a good example. The training pipeline that has taught Claude to write compellingly about Dominion is ill-suited to producing an agent that could actually win a game. This is because the objective it’s optimising for is “produce advice that looks good”, and not “propose a strategy that wins”. LLMs relate to most tasks as McKinsey does to running a company.
The obvious exception to the above is coding, for which the tokens left to us by our ancestors are constitutive of the task itself. LLMs are best at tasks whose natural substrate is text - tasks whose primitives are naturally represented as tokens, and for which the textual version of a problem is the very problem you want to solve. When a task is one we have, by default, conducted in and through the tokens left behind in the training data, predicting tokens in the pretraining corpus amounts to completing the task itself. But where this is not the case, the relationship between accurately modelling a description of a thing and actually doing that thing breaks down.
For most tasks in the economy, the pretraining corpus contains writing about the task, but not a record of the task itself. This is of course one of many reasons coding has progressed faster than other domains - code is one of the neat cases for which the task itself is almost entirely reducible to its token trace.9
This means that, for most tasks, our approach to training LLMs is akin to trying to train AlphaZero on chess commentary rather than games of chess. But I don’t think that even a 100% success rate at predicting the tokens of a Robert Caro biography generalises to becoming POTUS.
Of course, this suggests two complementary paths:
We find ways to faithfully represent ever-greater portions of the universe in token form, giving everything an API, allowing you to train agents to solve real-world problems in the substrate of tokens.
We give agents the ability to act more directly on the world (Computer Use now, Robotics later), instantly unlocking a broader class of non-textual problems currently incompressible into tokens.
Either way, AI companies won’t be able to get their hands on data that make their models good at non-coding problems without deploying these models into the economy (and successfully enticing their customers to share the relevant data with them).10
The data required for practice cannot be simulated
The natural rebuttal to the above is that you can just simulate all the data you need, such that diffusion is unnecessary. Maybe this is in fact true. But I’m skeptical.
One source of my skepticism is entirely unoriginal: I think it will be difficult for AIs in the datacenter to tell whether they are in fact improving at a skill if they lack ground-truth access to that skill.
Consider that almost half of SWE-bench submissions accepted by AI auto-graders would be rejected by the actual human maintainers of the relevant repositories. The fact that you can pump SWE-bench scores without increasing actual merge rates is, to me, suggestive of the situation the datacenter-genius will find itself in. How is Claude 8 supposed to know whether its latest galaxy-brained algorithm has improved Claude 9’s ability to become POTUS by looking only at the results of the Become-POTUS eval?11 We all know that evals, whilst useful, remain stubbornly incomplete proxies of real-world capabilities. The only eval that matters is the voter, the customer, the economy, and the bottom line. Without access to any of these, your singularity will be mercilessly goodharted.
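To make the Goodhart dynamic concrete, here is a minimal toy sketch in Python - entirely my own illustration, with made-up weights, not anything any lab actually runs. A hypothetical “lab” repeatedly keeps whichever candidate model aces a proxy eval; the eval partly tracks the capability the real world cares about, but it is also exploitable. Select hard enough on the proxy and the scores soar while the underlying capability barely moves:

```python
import random

# Toy illustration of Goodhart's law in eval-driven selection (my own sketch,
# not from the post). Each candidate has a "true skill" (what the customer,
# voter, or economy cares about) and a knack for gaming the eval. The proxy
# rewards both; the real world rewards only the former.

random.seed(0)

def make_candidate():
    return {
        "true_skill": random.gauss(0, 1),  # real-world capability
        "gaming": random.gauss(0, 1),      # knack for exploiting quirks of the benchmark
    }

def proxy_score(candidate):
    # The benchmark partially tracks true skill but is mostly exploitable.
    return 0.3 * candidate["true_skill"] + 1.0 * candidate["gaming"] + random.gauss(0, 0.1)

best = make_candidate()
for generation in range(10):
    candidates = [make_candidate() for _ in range(200)]
    # Keep whichever model aces the eval - the only signal available in the datacenter.
    best = max(candidates + [best], key=proxy_score)
    print(f"gen {generation}: proxy {proxy_score(best):+.2f}, true skill {best['true_skill']:+.2f}")
```

The numbers are arbitrary; the point is simply that optimisation pressure flows to whatever the proxy rewards, and without a ground-truth signal from outside the datacenter there is nothing to pull it back.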
The great Zhengdong makes this point about the progress of AI research itself. Not only are “evals” the only things that models are capable of getting good at, but “the researchers [themselves], they just wanna optimise… they just want an important problem to solve, a clear evaluation that measures progress towards it, and then they just wanna optimise it.” I suggest that AI companies need real-world deployment as a source of problems, or else they will have no good targets for optimisation.12
I have another reason, both more fundamental and more speculative, for asserting the implausibility of reaching domain-general superintelligence through simulation. I suspect that even an arbitrarily intelligent simulator would be incapable of generating relevant practice data for several tasks, because that data can only be generated through interaction with the relevant market. The data simply does not exist anywhere prior to the deployment that would generate it, and that deployment involves modelling the preferences and knowledge of actors in a market that are unknown to the simulator.
From what I understand (and I’m not entirely sure I understand it well), this is basically a Hayekian point.13 The signal that something is good is generated by millions of actors revealing their preferences through their behaviour. You can’t simulate this from inside a datacenter, because the inputs aren’t being withheld through any contingent failure of data-collection - they just don’t exist anywhere yet, and they won’t until the market generates them. Unless your AIs have access to this market, they won’t be going anywhere.
Again, boardgames provide a fruitful analogy here. Twilight Imperium is one of my favourites. Because each game takes around twelve hours, I play only rarely. This gives me weeks and weeks of time to enjoyably theorycraft my strategy ahead of each prospective game. Twilight Imperium is deliberately complex and Byzantine, so the prep is great fun: sitting alone with a google doc and tabs of Reddit strategy threads, gaming out my turns of every round, getting a huge kick out of how absolutely cooked all my opponents are going to be. The reality never lives up.
My experience of playing Twilight Imperium. Credit for this meme goes to the excellently named u/MadCaucasian (no relation).
Time and again, my best-laid plans bear shockingly little resemblance to how the game actually plays out. As it turns out, I really suck at this game, and my model of how the game was supposed to play out approaches zero predictive validity by about minute 5 of the total 12 hours of game time.
I posit that the RSI’d “superintelligence” that breaks free from the datacenter after a year of “algorithmic progress” grindset will feel a little like I do by Round 2 of a game of Twilight Imperium: my fleets destroyed, resources depleted, and the allies I planned to cultivate conquering my home system.
My (optimistic) forecast for the modal experience of self-improved misaligned “super”-intelligences
A further obvious objection to the picture I’ve painted above is, of course, continual learning. Intelligent humans thrive in foreign environments because we can learn as we go along. Is it not at least conceivable that our automated AI researchers, toiling away in their secluded datacenters, would be capable of crafting the perfect continual learning algorithm? They need not know everything before breaking out and taking over; they need only have developed superhuman sample efficiency, such that they rapidly become godlike within mere moments of leaving the datacenter.
This too, I am skeptical of. How can you craft a perfect continual learning algorithm without a proper diverse set of problems to learn over? That’s how evolution did it. It’s just not clear to me that the datacenter of automated AI researchers would be able to judge the relative merits of one continual learning algorithm over another without a real problem set to test it on. As Herbie Bradley puts it, we should not “model AI labs as a black box separate from the world, within which superintelligent AI can improve with no outside dependencies”. Intelligence can’t explode in a vacuum.
Implications
To be absolutely clear, I don’t think any of the above means that superintelligence is impossible, or that the world still has a shot at looking remotely normal 50 years from now. It almost certainly does not. It’s also clear that most people warning about AGI don’t think the only reason to worry is that the automation of AI R&D will rapidly lead to domain-general superintelligence.
But I think that the conversation as a whole has tended to treat “diffusion” as something of an irrelevance to the progression towards superintelligence. The fact that businesses may be slow in adopting AI is only relevant insofar as it means the world outside SF won’t know what’s going on, or insofar as it might make raising money harder for labs. Diffusion isn’t seen as a source of capabilities progress.
Similarly, on the other side, those who have for years harped on about the significance of diffusion have not (to my knowledge!) clearly articulated why exactly the future AI superintelligence would have to be preceded by thousands of deployments into mom & pop shops, middle schools, and Siemens. The above is my attempt to bridge this gap.
If I’m correct, a couple of things follow.
The first is that AI policy’s growing focus on “internal deployment” and “automating AI R&D” may well be misplaced. I don’t think superintelligence is impossible - it just won’t arrive in the shape we currently anticipate, or on the timelines currently forecasted. The next phase of capability progress will look less like a datacenter compounding in private and more like a grind of deployment, customer discovery, and real-world data collection. If that is what the future looks like, the policies we should be pursuing look quite different.
For one, this world would suggest that data-holders hold far, far more power than they currently realise, and they should leverage it (as Ukraine is doing). The asymmetry between “company that can build a frontier model” and “entity that holds proprietary, deployment-grade data on a real economic activity” might shift, and we should think about what a preferable shift could look like. This is probably of particular significance to Europe, a country with an ever-vanishing shot at building superintelligence but a glorious legacy of doing science and industry (and thus a rich stockpile of data).
Another implication is that AI labs may well just set out to conquer verticals one by one. If the relevant signal can only be acquired through deployment, it may well be the case that the winning move is to just become the deployer. David Oks makes the point that labour displacement is more likely to come from something like Dwarkesh’s fully automated firm than from incumbent AI adoption. It strikes me as plausible that Anthropic or OpenAI’s best bet is to be clever about which verticals to serially conquer and own end to end. We should start thinking about the conditions under which that pathway is or is not desirable.
Acknowledgements
I’m grateful to the following people for feedback on this piece: Herbie Bradley, John Halstead, Jack Miller, Stephen Clare, Rebecca Hersman, Markus Anderljung, Jake Steckler, Zaheed Kara, Liam Patell, John Lidiard, Alan Chan, Amelia Michael and Nick Stockton. Any remaining errors are, of course, my own.
Similarly, Aschenbrenner’s Situational Awareness envisions datacenters of automated researchers improving their algorithmic efficiency until they’re capable of solving robotics, drone warfare, and everything else. Dario Amodei also thinks progress in AI research will make the acquisition of other capabilities almost trivial, enabling us to make 50-100 years of biological progress in a decade.
The automation of AI R&D is also by now the default story of AI doom. Where industry safety frameworks once tracked capabilities relevant to Loss of Control through a model’s ability to “autonomously replicate and adapt”, 2024-5 saw a gradual shift towards “AI R&D” as far and away the predominant threat vector. See also Apollo, AI Behind Closed Doors, IAPS, Managing Risks from Internally Deployed AI systems.
You thus have my permission to apply a considerable copium discount to everything that follows.
Or, alternatively, the wholesale recreation of parts of that economy by AI labs themselves. That too will take time (and lots of money).
Goodhart’s law: when a measure becomes a target, it ceases to be a good measure.
The notion that practice makes perfect is, of course, consistent with the way we humans live our lives. Returns to practicing a task are generally monotonic, especially when that task is economically valuable. We expect a Slaughter and May partner in her fifties to decisively mog trainees ten times out of ten.
To be clear, this also appears to be Mechanize’s view. I’m unconvinced.
Note that this is a different claim from the one that current AI could assist in better, faster drone design - I think it could! But I’d be highly skeptical that Claude 8 could end-to-end design a superior drone without takes from someone on the front-lines, or without feedback from seeing its first prototypes crash and fail.
Thus LLMs are good at precisely the things humans happily splurge over the Internet for free. This point is also not novel. Freddie deBoer observes that LLMs “have achieved mastery over the exact domains that were already, by any sane measure, overprovisioned”.
And even this is not perfect, of course, which is why even superb coding agents are not yet hirable as SWEs outright.
One obvious alternative to deployment in the economy involves the AI lab recreating large parts of the economy outright. Anthropic could, for instance, just build factories and hire workers to wear a camera and motion sensors as they go about their job. This may well be what happens. I still think this means the intelligence explosion is rate-limited by an AI lab’s involvement in real-world economy (partially because doing this would be so expensive). There’s also a possibility that task distribution in the economy is inherently non-stationary, and shifts in response to AI deployments, such that it’s difficult to make your models good at the next marginal automatable task without deployment (credit goes entirely to Herbie Bradley for this point).
Per Scott Alexander, the logical endpoint is that evals are replaced entirely by real-world deployments. The only RestaurantBench that provides signal on Claude 8’s attempts to improve Claude 9’s restaurants is an actual, real-world attempt at opening and managing a restaurant.
A clear exception to this case is coding (and some forms of marketing, communications and legal work), for which Anthropic clearly benefits from being its own customer. Particularly in Google’s case, such data may well constitute a crucial resource.
It has been brought to my attention that this point was, in fact, first articulated by Von Mises in Economic Calculation in the Socialist Commonwealth. I’ve stuck to referring to it as “Hayekian” for the sake of the reader (you’re welcome).






> This too, I am skeptical of. How can you craft a perfect continual learning algorithm without a proper diverse set of problems to learn over? That’s how evolution did it. It’s just not clear to me that the datacenter of automated AI researchers would be able to judge the relative merits of one continual learning algorithm over another without a real problem set to test it on.
Doesn’t this depend on the premise that “there will be many continual learning algos that perform equally well at learning skills in any synthetic environments you can possibly set up, but which nonetheless differ significantly in their ability to learn skills in ‘the real world’”?
This is possible I guess, but it seems unlikely to me.
Great post! I’m glad someone wrote this up into a coherent view, and I think it’s an accurate model of LLMs and their near-term successors (“LLMs relate to most tasks as McKinsey does to running a company” is one of many banger lines).
I think the weakest spot in your argument — and the biggest open question here — is the implicit claim about how much real-world data you’d need to properly test your algorithms, especially continual learning algos (Brendan makes a version of this point, as did Tom Davidson in his tweet thread with Bradley).
If it takes a lot of data — the “turn the whole economy into RL environments” world that Mercor et al. imagine — then that’s great news, and the country in the datacenter won’t get very far on its own. Your argument implicitly takes this stance (though correct me if I’m wrong!), and within the LLM paradigm I think that’s a reasonable expectation.
If it doesn’t take much data — either because RL for LLMs starts to generalize, as Dario expects, or because Grok 6 can test its continual learning algos perfectly well with a couple hundred Optimus robots roaming around Colossus’s backyard for a year — then the argument is much less comforting. I’m not sure how to weigh the evidence here, but I’m not as convinced as you seem to be that we’ll end up needing the world economic RL env (particularly for non-LLM paradigms, e.g. Steve Byrnes’s Brain-like AGI).
One intuition that might differentiate our views: evolution needed lots of time and diverse problems to develop the brain’s algorithms, but we have some big advantages over evolution. Obviously we can take much more strategic actions (e.g. deciding to scale up an architecture by many OOMs within a few years), but perhaps more importantly we have a working example of a massively sample-efficient general intelligence algorithm in our brain. If (and it’s a big if!) we could reverse engineer it even partly (e.g. to where we have chimp-level brain algos), that might greatly reduce the amount of data we’d need to test our AGIs.
That said, it’s not clear at all whether that kind of reverse engineering work is on track to being automated (e.g. maybe it involves lots of neuroscience or taste/generalization), or whether hundreds of humans working on it irl have made that much progress (unclear how to measure that).