Alignment Research and Intelligence Enhancement: Discussion on The Failed Strategy of Artificial Intelligence Doomers
The people I critiqued in my Palladium article The Failed Strategy of Artificial Intelligence Doomers have responded, and I’ll reply to some of that here. The article probably got me more emails from that coalition’s people than anything else I’ve written, which led to some good conversations. There’s also been substantial discussion at their hub on Less Wrong. As I write this, the post has 126 points and 73 comments, most of them critical.
I should explain, for those who aren’t familiar with the subculture, that this is a friendly and constructive reception. Rationalists think disagreeing is fun and agreeing is boring—it’s one reason I get along with them. So, people who agree or think the article is insightful will vote it up and move on to something they enjoy, while people who disagree, or think the article is dumb, or think it’s only mostly correct, will spend the time to write up their objection. This is not a new observation; their founding texts explicitly call out their “culture of objections”. It’s fine; you just have to know how to interpret their discourse. “There’s a post with a high score and lots of disagreeing comments” is what it looks like when they’re taking an argument seriously.
The top comment, from Jim Babcock:
The article seems to assume that the primary motivation for wanting to slow down AI is to buy time for institutional progress. Which seems incorrect as an interpretation of the motivation. Most people that I hear talk about buying time are talking about buying time for technical progress in alignment. Technical progress, unlike institution-building, tends to be cumulative at all timescales, which makes it much more strategically relevant.
To which I say: Yes, obviously. This is implicit in my argument. I should have made it explicit, since I also heard the same point from several different people on Twitter. Even if most readers made the connection themselves, a notable minority did not. Mea culpa. I’ll spell it out here, and please forgive me if this seems long or repetitive; the point seems incredibly obvious to me and I’m not sure which part isn’t intuitive to these readers, so I’m going over the basics. I am tempted to just say “The technical progress in alignment has to come from institutions”, but if conveying the point were that easy then Jim could’ve put it together on his own.
If we accept the AI Doomer position that superintelligent AGI will inevitably be built someday, and will inevitably become the most powerful force in the world once it exists, then the big cosmic question is whether or not the eventual AGI does things that match what humans want—whether it’s aligned to human values, to use their term of art. (There’s ongoing debate about exactly what “alignment” means and what “human values” are and whether those are coherent concepts, but for our purposes it’s sufficient to gesture vaguely at the intuitive meaning—the task of aligning the future AGI does not require that we precisely understand these concepts now, only that we can eventually.)
Jim is absolutely correct that this alignment can only come because the people or programs which write the AGI’s code understand, technically, what they are doing, and write the correct code to produce an AGI which does things they want instead of producing an AGI which does something else. If you have “functional institutions”, but the institutions do not write the code for an aligned AGI, then you do not get an aligned AGI. You have to write the correct code. Functionality does not produce an aligned AGI via some quintessence of competence which diffuses into the AGI through the ether. “Institutional functionality” matters to this question only insofar as it affects whether the institutions in question create an aligned AGI or an unaligned AGI or no AGI.
According to the AI Doomers, the current institutions are on track to soon create an unaligned AGI, and are not capable of creating an aligned AGI. So, the institutions must improve if they are to produce the outcome which the AI Doomers seek. The differences between institutions result in differences between the outputs those institutions produce—research, engineering, software, and ultimately perhaps an AGI. Again I’m sorry if all this seems basic, I’m aware this is an old point which many AI Doomers have argued for explicitly and at great length, but Jim’s popularly-endorsed comment makes it clear that many of them didn’t see the mechanism here.
The problems with the “buy time for technical alignment research” strategy become clear when you stop thinking in zoomed-out abstractions and start thinking about specific projects. Most important, of course, is that the AI Doomers’ attempts to “buy time” have in fact achieved exactly the opposite, and by continuing the same strategy they are on track to continue accelerating the events which they claim they want to slow down, as I argued in my article. I doubt Jim disputes this; very few AI Doomers do, when I ask them in person. It is striking that so many of them are pursuing this strategy despite agreeing that it will probably have the opposite of the effect they say they want, but this isn’t the place for me to speculate about why.
We can set aside the actual political efforts pursued by AI Doomers in real life, and imagine there’s a magic wand which “buys time” in the way they wish for, and shuts down all research except for that conducted by AI Doomer institutions. Suppose the wand removes anything that looks like OpenAI, and only things that look like Redwood Research continue researching AI, and also they’re magically shielded from the entryism, gamesmanship, and other consequences that would happen if you did this in reality. Do any AI Doomers think this would actually suffice to solve the technical challenges? Probably there’s someone somewhere, but it’s not a common belief. Even the magic wand doesn’t help with their ostensible goal, unless you have another magic wand which gives you extreme institutional improvements.
Even just identifying technical AI alignment research, and separating it from other technical AI research, is extremely contentious, and the AI Doomers have not reached consensus on what should be included in the category. The AI Doomer ontology holds that “AI research” consists of two subtypes, “AI capabilities research” and “AI alignment research”, but this distinction holds up much worse in practice than in theory. Essentially, AI capabilities research is defined as advancing our understanding of how to make AI that is more powerful and better at doing stuff. AI alignment research advances our understanding of how to make AI choose to do stuff that is aligned to human values.
Conceptually speaking, this distinction is extremely fuzzy at best, and kind of fake at worst. Historically speaking, these labels have mostly been important as a way for AI companies like DeepMind, OpenAI, and Anthropic to recruit AI Doomers to do research and development by branding it as “alignment”. There is no consensus among the AI Doomers about which projects count as “alignment research”. For every AI Doomer holding up a claimed example of technical alignment research progress, you will find another denouncing the same example as “safetywashing” a reckless capabilities advance. Many have championed programs like “benchmarks” and “evals” as ways of gaining better insight into how current AI software works and therefore assessing and improving its alignment, while others decry these as open-ended research tools which bring us closer to apocalypse. And as far as I’ve heard, literally all of the claimed technical alignment research advances were only possible because of the foundations laid by the past decade of capabilities research advances.
The conceptual problem might be solvable in principle. If so, however, the current AI Doomer institutions are evidently not up to the task of solving it. In large part because of this, they do not have any plausible research path to solve the technical problem, as they understand it. An “AI pause” with an exception for technical alignment research would not be enough, even if we limit ourselves to the magic wand version of the plan. In real life, of course, such a plan is even worse than this, because it would get co-opted even harder and faster than their previous alignment research efforts were co-opted by AI companies.
To their credit, most of the AI Doomers openly acknowledge that the plan of buying time for technical alignment research cannot work. Statements like “I'm not sure how much I trust most technical AI safety researchers to make important progress on AI safety now. And I trust most institutions a lot less.” and “I have grown pessimistic about our ability to solve the open technical problems even given 100 years of work on them.” are common. This is why so many of them are putting their chips on human intelligence enhancement via genetic engineering, in hopes of producing people capable of solving the problem which is beyond their coalition’s current abilities. Previously I thought this was a niche plan advanced only by Eliezer Yudkowsky, but the response to my Palladium article showed it’s much more popular than that. I disagree with them about the relative chances of producing institutions with stronger epistemic foundations vs massive near-term breakthroughs in human intelligence enhancement technology, but the latter could indeed help if it were achieved.
Intelligence enhancement research has another great advantage over activism to shut down AI research: If the assumptions behind AI Doom are incorrect, then activism to shut down AI research makes the world worse, but research to improve intelligence still makes the world better. For people operating within the AI Doomer frame, I think this is the best way to solve the problem, rather than dig the hole even deeper as their past attempts have done. It is the Norman Borlaug approach rather than the Club of Rome approach.
While I am reluctant to pour cold water on a good cause, I do think a lot of AI Doomers are overestimating the odds that this project massively succeeds in our lifetime. Many of them are putting a lot of weight on the idea that current incremental progress is on the verge of a giant leap forward which produces engineered megageniuses. I don’t think this is true.
This brings us to the second-most popular comment on my article, from Oliver Habryka (I’m condensing it here but click through for the whole thing):
I feel like intelligence enhancement being pretty solidly in the near-term technological horizon provides strong argument for future governance being much better. There are also maybe 3-5 other technologies that seem likely to be achieved in the next 30 years bar AGI that would all hugely improve future AGI governance.
And then a lot of the post seems to make really quite bad arguments against forecasting AI timelines and other technologies … I will very gladly take all your bets that intelligence augmentation will not take "several centuries". … I see no methodology that suggests anything remotely as long as this, and so many forms of trend extrapolation, first principles argument, reference class forecasting and so many other things that suggest things happen faster than that.
…
We can’t bet on where the tech will be centuries from now, but Oliver, if you think there’s a large chance of achieving big megagenius breakthroughs in human intelligence before I’m old and failing—say, within 30 years—then yes I am very happy to bet you (or others whose honesty the community trust network vouches for).
For my general argument against being able to predict the arrival time of breakthrough technologies, see Against AGI Timelines. Briefly, the history of technology does not support the idea that large breakthroughs can be “timed” beforehand. The reasons apply to genetically engineering megageniuses (or to fusion power, curing cancer, superconductors, etc) as much as to AGI.
However, we can do a pretty good job of predicting incremental improvements in technology, e.g. battery capacity or desalination efficiency. Many at Less Wrong, apparently including Oliver, contend that we do not need large breakthroughs to get megageniuses, and current incremental progress on genetic engineering will be sufficient. I am skeptical.
The Less Wronger argument that big breakthroughs are imminent is best expressed in GeneSmith and kman’s Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible and How to Make Superbabies. The latter argues that genetic edits can “make someone with a higher predisposition towards genius than anyone that has ever lived” and achieve “50 additional years of life expectancy” by stacking together technologies which are on the verge of being proven. The authors have founded a company aiming to do exactly that. At the time I thought the comments on the post were interested-but-skeptical, but now I realize that many of Less Wrong’s people are putting tremendous strategic and psychological weight on projects like this. Perhaps I should have expected that, after putting their chips on superhuman intelligence transforming the world in their lifetimes via AGI and superhuman intelligence transforming the world in their lifetimes via rationality training, they are now putting their chips on superhuman intelligence transforming the world in their lifetimes via genetic engineering.
The lesser objection to this plan is that stacking together technologies which are on the verge of being proven is the sort of thing which often elides decades of finicky engineering work to solve unforeseen difficulties, and not infrequently results in the discovery that one of the hoped-for technologies is not actually as provable as you thought. The greater objection is that, even if the gene editing technologies work more or less as the authors hope, it is not at all clear that the effects will be as large as they claim. Modest effects would still be an enormous success, of course! But we don’t know nearly enough about human intelligence to justify their confidence in extreme effects.
We’re basically dealing with three black-box systems which we understand poorly. We know that genetic code creates the brain, but mostly we don’t know how, with a few exceptions for especially legible mutations. We know that the brain creates human intelligence, but mostly we don’t know how, with some exceptions like understanding that particular regions are associated with particular cognitive functions, or understanding particular chemical pathways which influence cognition in more or less understood ways. And our understanding of what human intelligence even is remains largely an “I know it when I see it” situation. Academic disciplines like psychology, behavioral economics, and ethnic studies offer explanations of how humans think which are incredibly incomplete at best and transparently fraudulent at worst, while introspective traditions like psychotherapy, meditation, and Less Wrong’s own rationality techniques provide mutually contradictory accounts of how human thought works, how it ideally ought to work, and how to improve it in practice.
When we know that some genetic mutation is associated with a higher IQ, usually we don’t know how the mutation affects the brain (for the ones which are in fact causal), or how the brain doing something differently alters the cognitive architecture, or the mechanics of what makes a different cognitive architecture “more intelligent”. If you stack hundreds of these mutations as the authors propose, what happens? I would like to find out but it seems really amazingly incredibly unclear, as several comments on the post point out. When GeneSmith and kman talk of boosting IQs to 170, that’s putting more weight on the concept of “IQ” than today’s IQ tests can bear.
This situation does not permit a precise engineering approach. It permits a statistical brute-force approach, which is what the authors propose. Statistical Challenges with Making Super IQ babies is a better and more informed critique of their statistical assumptions than I would be able to write, especially the “Correlation vs. Causation” and “The Additive Effect of Genetics” sections. For their plan to produce effects large enough to transform the strategic situation of artificial intelligence development as quickly as they hope, a bunch of assumptions would have to go right. (Including assumptions about the regulatory or cultural response to gene editing, which are beyond the scope of the technological argument.)
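To make concrete the kind of assumption that critique targets, here is a toy back-of-the-envelope sketch. It is not the authors’ model, and every number in it is invented for illustration: under a purely additive model the predicted gain is just the sum of per-variant effect sizes, but if only some fraction of the associated variants are actually causal (the rest merely correlate with causal ones), direct editing captures only that fraction, and anything beyond the range of observed human variation is pure extrapolation.

```python
# Toy illustration of the additive-stacking assumption. All numbers are
# invented for illustration; this is not anyone's actual estimate or model.
import numpy as np

rng = np.random.default_rng(0)

n_edits = 500          # number of IQ-associated variants you edit (assumed)
effect_sd = 0.2        # assumed per-variant effect size, in IQ points
causal_fraction = 0.5  # assumed share of associated variants that are causal;
                       # the rest only correlate with a causal variant and do
                       # nothing when edited directly

effects = np.abs(rng.normal(0.0, effect_sd, n_edits))

# Naive prediction: every association is causal and effects add up linearly,
# even far outside the range where the additive model was ever validated.
naive_gain = effects.sum()

# Shrunken prediction: only the causal subset contributes when you edit DNA.
is_causal = rng.random(n_edits) < causal_fraction
realized_gain = effects[is_causal].sum()

print(f"Naive additive prediction: +{naive_gain:.0f} IQ points")
print(f"If only {causal_fraction:.0%} of variants are causal: +{realized_gain:.0f} IQ points")
```

The only point of the sketch is that the headline number is the product of several unverified multipliers, each of which the linked critique questions.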
Let me reiterate that I absolutely one thousand percent support attempts to engineer megageniuses, not because it is easy, but because it is hard. If these attempts merely produce mild improvements in human intelligence or health which we mostly don’t understand mechanistically, perhaps breaking down at the upper tail, that’s an astoundingly great outcome, one of the best uses of a life’s work I can imagine. If I’m wrong and they or their peers make their 125-year-old megageniuses, then so much the better. If the project totally fails, then oh well, the story of human progress is a lot of crazy-sounding projects which mostly crash in ugly fireballs, except a small fraction which actually work and more than justify all the wipeouts along the way. Perhaps the results will help someone come up with a big conceptual breakthrough in understanding genetics or the brain or human intelligence, which makes it possible to do something closer to engineering than statistical brute-force. While most projects which look like this don’t win big, most projects which win big come from something that looks like this. New technology is very frequently achieved by crazy people who are ideologically committed to their particular project. All progress depends on the unreasonable man.
Genetic intelligence augmentation is not a plan so promising that we should institute a global tyranny to shut down technological progress until it has successfully created supermen, as some AI Doomers advocate. In my entire life I have never encountered a plan that promising. But it is promising enough that working on it is a good idea. AI Doomers would do much better to work on intelligence enhancement (genetic or otherwise) rather than pursue the social engineering which historically they have largely focused on, mostly counterproductively. Despite the difficulties, intelligence enhancement is one of the very few plans I’ve ever heard articulated which can actually help the issues they fear, rather than make them worse. If their predictions about future technology prove false, like the past predictions about imminent superhuman AI, then the project will still make the world better rather than worse.