By Holden Karnofsky, Jun 9, 2022

AI Could Defeat All Of Us Combined


I've been working on a new series of posts about the most important century.

The original series focused on why and how this could be the most important century for humanity. But it had relatively little to say about what we can do today to improve the odds of things going well.
The new series will get much more specific about the kinds of events that might lie ahead of us, and what actions today look most likely to be helpful.
A key focus of the new series will be the threat of misaligned AI: AI systems disempowering humans entirely, leading to a future that has little to do with anything humans value. (Like in the Terminator movies, minus the time travel and the part where humans win.)

Many people have trouble taking this "misaligned AI" possibility seriously. They might see the broad point that AI could be dangerous, but they instinctively imagine that the danger comes from ways humans might misuse it. They find the idea of AI itself going to war with humans to be comical and wild. I'm going to try to make this idea feel more serious and real.

As a first step, this post will emphasize an unoriginal but extremely important point: the kind of AI I've discussed could defeat all of humanity combined, if (for whatever reason) it were pointed toward that goal. By "defeat," I don't mean "subtly manipulate us" or "make us less informed" or something like that - I mean a literal "defeat" in the sense that we could all be killed, enslaved or forcibly contained.

I'm not talking (yet) about whether, or why, AIs might attack human civilization. That's for future posts. For now, I just want to linger on the point that if such an attack happened, it could succeed against the combined forces of the entire world.

I think that if you believe this, you should already be worried about misaligned AI,¹ before any analysis of how or why an AI might form its own goals.
We generally don't have a lot of things sitting around that could end human civilization if they "tried." If we're going to create one, I think we should be asking not "Why would this be dangerous?" but "Why wouldn't it be?"

By contrast, if you don't believe that AI could defeat all of humanity combined, I expect that we're going to be miscommunicating in pretty much any conversation about AI. The kind of AI I worry about is the kind powerful enough that total civilizational defeat is a real possibility. The reason I currently spend so much time planning around speculative future technologies (instead of working on evidence-backed, cost-effective ways of helping low-income people today - which I did for much of my career, and still think is one of the best things to work on) is that I think the stakes are just that high.

Below:

I'll sketch the basic argument for why I think AI could defeat all of human civilization.
- Others have written about the possibility that "superintelligent" AI could manipulate humans and create overpowering advanced technologies; I'll briefly recap that case.
- I'll then cover a different possibility, which is that even "merely human-level" AI could still defeat us all - by quickly coming to rival human civilization in terms of total population and resources.
- At a high level, I think we should be worried if a huge (competitive with world population) and rapidly growing set of highly skilled humans on another planet was trying to take down civilization just by using the Internet. So we should be worried about a large set of disembodied AIs as well.
I'll briefly address a few objections/common questions:
- How can AIs be dangerous without bodies?
- If lots of different companies and governments have access to AI, won't this create a "balance of power" so that no one actor is able to bring down civilization?
- Won't we see warning signs of AI takeover and be able to nip it in the bud?
- Isn't it fine or maybe good if AIs defeat us? They have rights too.
I'll close with some thoughts on just how unprecedented it would be to have something on our planet capable of overpowering us all.

How AI systems could defeat all of us

There's been a lot of debate over whether AI systems might form their own "motivations" that lead them to seek the disempowerment of humanity. I'll be talking about this in future pieces, but for now I want to put it aside and imagine how things would go if this happened.

So, for what follows, let's proceed from the premise: "For some weird reason, humans consistently design AI systems (with human-like research and planning abilities) that coordinate with each other to try and overthrow humanity." Then what? What follows will necessarily feel wacky to people who find this hard to imagine, but I think it's worth playing along, because I think "we'd be in trouble if this happened" is a very important point.

The "standard" argument: superintelligence and advanced technology

Other treatments of this question have focused on AI systems' potential to become vastly more intelligent than humans, to the point where they have what Nick Bostrom calls "cognitive superpowers."² Bostrom imagines an AI system that can do things like:

Do its own research on how to build a better AI system, which culminates in something that has incredible other abilities.
Hack into human-built software across the world.
Manipulate human psychology.
Quickly generate vast wealth under the control of itself or any human allies.
Come up with better plans than humans could imagine, and ensure that it doesn't try any takeover attempt that humans might be able to detect and stop.
Develop advanced weaponry that can be built quickly and cheaply, yet is powerful enough to overpower human militaries.

(Wait But Why reasons similarly.³)

I think many readers will already be convinced by arguments like these, and if so you might skip down to the next major section.

But I want to be clear that I don't think the danger relies on the idea of "cognitive superpowers" or "superintelligence" - both of which refer to capabilities vastly beyond those of humans. I think we still have a problem even if we assume that AIs will basically have similar capabilities to humans, and not be fundamentally or drastically more intelligent or capable. I'll cover that next.

How AIs could defeat humans without "superintelligence"

If we assume that AIs will basically have similar capabilities to humans, I think we still need to worry that they could come to out-number and out-resource humans, and could thus have the advantage if they coordinated against us.

Here's a simplified example (some of the simplifications are in this footnote⁴) based on Ajeya Cotra's "biological anchors" report:

I assume that transformative AI is developed on the soonish side (around 2036 - assuming later would only make the below numbers larger), and that it initially comes in the form of a single AI system that is able to do more-or-less the same intellectual tasks as a human. That is, it doesn't have a human body, but it can do anything a human working remotely from a computer could do.
I'm using the report's framework in which it's much more expensive to train (develop) this system than to run it (for example, think about how much Microsoft spent to develop Windows, vs. how much it costs for me to run it on my computer).
The report provides a way of estimating both how much it would cost to train this AI system, and how much it would cost to run it. Using these estimates (details in footnote)⁵ implies that once the first human-level AI system is created, whoever created it could use the same computing power it took to create it in order to run several hundred million copies for about a year each.⁶
This would be over 1000x the total number of Intel or Google employees,⁷ over 100x the total number of active and reserve personnel in the US armed forces, and something like 5-10% of the world's total working-age population.⁸
And that's just a starting point.
- This is just using the same amount of resources that went into training the AI in the first place. Since these AI systems can do human-level economic work, they can probably be used to make more money and buy or rent more hardware,⁹ which could quickly lead to a "population" of billions or more.
- In addition to making more money that can be used to run more AIs, the AIs can conduct massive amounts of research on how to use computing power more efficiently, which could mean still greater numbers of AIs run using the same hardware. This in turn could lead to a feedback loop and explosive growth in the number of AIs.
Each of these AIs might have skills comparable to those of unusually highly paid humans, including scientists, software engineers and quantitative traders. It's hard to say how quickly a set of AIs like this could develop new technologies or make money trading markets, but it seems quite possible for them to amass huge amounts of resources quickly. A huge population of AIs, each able to earn a lot compared to the average human, could end up with a "virtual economy" at least as big as the human one.
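As a rough check on the arithmetic above, here is a minimal Python sketch. It uses only the inputs stated in the footnotes (about 10^30 FLOP to train, about 10^14 FLOP/s to run, and 65% of 7.9 billion people being of working age). The function and variable names are illustrative choices, not anything taken from the Bio Anchors report.

```python
# Back-of-the-envelope check; inputs are the assumptions stated in the footnotes.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7


def copy_years(train_flop: float, run_flop_per_sec: float) -> float:
    """Years of single-copy runtime that the training compute could pay for."""
    return train_flop / run_flop_per_sec / SECONDS_PER_YEAR


def working_age_population(world_population: float, share: float) -> float:
    """Rough count of working-age people."""
    return world_population * share


ai_copy_years = copy_years(train_flop=1e30, run_flop_per_sec=1e14)
humans = working_age_population(world_population=7.9e9, share=0.65)

print(f"AI copy-years:         {ai_copy_years:.2e}")
print(f"Working-age humans:    {humans:.2e}")
print(f"Copies as share of it: {ai_copy_years / humans:.1%}")
```

Run for a year each, that comes to roughly three hundred million copies, or about 6% of the working-age estimate, which is consistent with the 5-10% range given above.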

To me, this is most of what we need to know: if there's something with human-like skills, seeking to disempower humanity, with a population in the same ballpark as (or larger than) that of all humans, we've got a civilization-level problem.

A potential counterpoint is that these AIs would merely be "virtual": if they started causing trouble, humans could ultimately unplug/deactivate the servers they're running on. I do think this fact would make life harder for AIs seeking to disempower humans, but I don't think it ultimately should be cause for much comfort. I think a large population of AIs would likely be able to find some way to achieve security from human shutdown, and go from there to amassing enough resources to overpower human civilization (especially if AIs across the world, including most of the ones humans were trying to use for help, were coordinating).

I spell out what this might look like in an appendix. In brief:

By default, I expect the economic gains from using AI to mean that humans create huge numbers of AIs, integrated all throughout the economy, potentially including direct interaction with (and even control of) large numbers of robots and weapons.
- (If not, I think the situation is in many ways even more dangerous, since a single AI could make many copies of itself and have little competition for things like server space, as discussed in the appendix.)
AIs would have multiple ways of obtaining property and servers safe from shutdown.
- For example, they might recruit human allies (through manipulation, deception, blackmail/threats, genuine promises along the lines of "We're probably going to end up in charge somehow, and we'll treat you better when we do") to rent property and servers and otherwise help them out.
- Or they might create fakery so that they're able to operate freely on a company's servers while all outward signs seem to show that they're successfully helping the company with its goals.
A relatively modest amount of property safe from shutdown could be sufficient for housing a huge population of AI systems that are recruiting further human allies, making money (via e.g. quantitative finance), researching and developing advanced weaponry (e.g., bioweapons), setting up manufacturing robots to construct military equipment, thoroughly infiltrating computer systems worldwide to the point where they can disable or control most others' equipment, etc.
Through these and other methods, a large enough population of AIs could develop enough military technology and equipment to overpower civilization - especially if AIs across the world (including the ones humans were trying to use) were coordinating with each other.

Some quick responses to objections

This has been a brief sketch of how AIs could come to outnumber and out-resource humans. There are lots of details I haven't addressed.

Here are some of the most common objections I hear to the idea that AI could defeat all of us; if I get much demand I can elaborate on some or all of them more in the future.

How can AIs be dangerous without bodies? This is discussed a fair amount in the appendix. In brief:

AIs could recruit human allies, tele-operate robots and other military equipment, make money via research and quantitative trading, etc.
At a high level, I think we should be worried if a huge (competitive with world population) and rapidly growing set of highly skilled humans on another planet was trying to take down civilization just by using the Internet. So we should be worried about a large set of disembodied AIs as well.

If lots of different companies and governments have access to AI, won't this create a "balance of power" so that nobody is able to bring down civilization?

This is a reasonable objection to many horror stories about AI and other possible advances in military technology, but if AIs collectively have different goals from humans and are willing to coordinate with each other¹¹ against us, I think we're in trouble, and this "balance of power" idea doesn't seem to help.
What matters is the total number and resources of AIs vs. humans.

Won't we see warning signs of AI takeover and be able to nip it in the bud? I would guess we would see some warning signs, but does that mean we could nip it in the bud? Think about human civil wars and revolutions: there are some warning signs, but also, people go from "not fighting" to "fighting" pretty quickly as they see an opportunity to coordinate with each other and be successful.

Isn't it fine or maybe good if AIs defeat us? They have rights too.

Maybe AIs should have rights; if so, it would be nice if we could reach some "compromise" way of coexisting that respects those rights.
But if they're able to defeat us entirely, that isn't what I'd plan on getting - instead I'd expect (by default) a world run entirely according to whatever goals AIs happen to have.
These goals might have essentially nothing to do with anything humans value, and could be actively counter to it (e.g., placing zero value on beauty and making zero attempts to prevent or avoid suffering).

Risks like this don't come along every day

I don't think there are a lot of things that have a serious chance of bringing down human civilization for good.

As argued in The Precipice, most natural disasters (including e.g. asteroid strikes) don't seem to be huge threats, if only because civilization has been around for thousands of years so far - implying that natural civilization-threatening events are rare.

Human civilization is pretty powerful and seems pretty robust, and accordingly, what's really scary to me is the idea of something with the same basic capabilities as humans (making plans, developing its own technology) that can outnumber and out-resource us. There aren't a lot of candidates for that.¹²

AI is one such candidate, and I think that even before we engage heavily in arguments about whether AIs might seek to defeat humans, we should feel very nervous about the possibility that they could.

What about things like "AI might lead to mass unemployment and unrest" or "AI might exacerbate misinformation and propaganda" or "AI might exacerbate a wide range of other social ills and injustices"¹³? I think these are real concerns - but to be honest, if they were the biggest concerns, I'd probably still be focused on helping people in low-income countries today rather than trying to prepare for future technologies.

Predicting the future is generally hard, and it's easy to pour effort into preparing for challenges that never come (or come in a very different form from what was imagined).
I believe civilization is pretty robust - we've had huge changes and challenges over the last century-plus (full-scale world wars, many dramatic changes in how we communicate with each other, dramatic changes in lifestyles and values) without seeming to have come very close to a collapse.
So if I'm engaging in speculative worries about a potential future technology, I want to focus on the really, really big ones - the ones that could matter for billions of years. If there's a real possibility that AI systems will have values different from ours, and cooperate to try to defeat us, that's such a worry.

Special thanks to Carl Shulman for discussion on this post.

Appendix: how AIs could avoid shutdown

This appendix goes into detail about how AIs coordinating against humans could amass resources of their own without humans being able to shut down all "misbehaving" AIs.

It's necessarily speculative, and should be taken in the spirit of giving examples of how this might work. For me, the high-level concern is that a huge, coordinating population of AIs with similar capabilities to humans would be a threat to human civilization, and that we shouldn't count on any particular way of stopping it, such as shutting down servers.

I'll discuss two different general types of scenarios: (a) Humans create a huge population of AIs; (b) Humans move slowly and don't create many AIs.

How this could work if humans create a huge population of AIs

I think a reasonable default expectation is that humans do most of the work of making AI systems incredibly numerous and powerful (because doing so is profitable), which leads to a vulnerable situation. Something roughly along the lines of:

The company that first develops transformative AI quickly starts running large numbers of copies (hundreds of millions or more), which are used to (a) do research on how to improve computational efficiency and run more copies still; (b) develop valuable intellectual property (trading strategies, new technologies) and make money.
Over time, AI systems are rolled out widely throughout society. Their numbers grow further, and their role in the economy grows: they are used in (and therefore have direct interaction with) high-level decision-making at companies, perhaps operating large numbers of cars and/or robots, perhaps operating military drones and aircraft, etc. (This seems like a default to me over time, but it isn't strictly necessary for the situation to be risky, as I'll go through below.)
In this scenario, the AI systems are malicious (as we've assumed), but this doesn't mean they're constantly causing trouble. Instead, they're mostly waiting for an opportunity to team up and decisively overpower humanity. In the meantime, they're mostly behaving themselves, and this is leading to their numbers and power growing.
- There are scattered incidents of AI systems trying to cause trouble,¹⁴ but this doesn't cause the whole world to stop using AI or anything.
- A reasonable analogy might be to a typical civil war or revolution: the revolting population mostly avoids isolated, doomed attacks on its government, until it sees an opportunity to band together and have a real shot at victory.

(Paul Christiano's What Failure Looks Like examines this general flavor of scenario in a bit more detail.)

In this scenario, the AIs face a challenge: if it becomes obvious to everyone that they are trying to defeat humanity, humans could attack or shut down a few concentrated areas where most of the servers are, and hence drastically reduce AIs' numbers. So the AIs need a way of getting one or more "AI headquarters": property they control where they can safely operate servers and factories, do research, make plans and construct robots/drones/other military equipment.

Their goal is ultimately to have enough AIs, robots, etc. to be able to defeat the rest of humanity combined. This might mean constructing overwhelming amounts of military equipment, or thoroughly infiltrating computer systems worldwide to the point where they can disable or control most others' equipment, or researching and deploying extremely powerful weapons (e.g., bioweapons), or a combination.

Here are some ways they could get to that point:

They could recruit human allies through many different methods - manipulation, deception, blackmail and other threats, genuine promises along the lines of "We're probably going to end up in charge somehow, and we'll treat you better when we do."
- Human allies could be given valuable intellectual property (developed by AIs), given instructions for making lots of money, and asked to rent their own servers and acquire their own property where an "AI headquarters" can be set up. Since the "AI headquarters" would officially be human property, it could be very hard for authorities to detect and respond to the danger.
- Via threats, AIs might be able to get key humans to cooperate with them - such as political leaders, or the CEOs of companies running lots of AIs. This would open up further strategies.
As assumed above, particular companies are running huge numbers of AIs. The AIs being run by these companies might find security holes in the companies' servers (this isn't the topic of this piece, but my general impression is that security holes are widespread and that reasonably competent people can find many of them)¹⁵, and thereby might find opportunities to create durable "fakery" about what they're up to.
- E.g., they might set things up so that as far as humans can tell, it looks like all of the AI systems are hard at work creating profit-making opportunities for the company, when in fact they're essentially using the server farm as their headquarters - and/or trying to establish a headquarters somewhere else (by recruiting human allies, sending money to outside bank accounts, using that money to acquire property and servers, etc.)
If AIs are in wide enough use, they might already be operating lots of drones and other military equipment, in which case it could be pretty straightforward to be able to defend some piece of territory - or to strike a deal with some government to enlist its help in doing so.
AIs could mix-and-match the above methods and others: for example, creating "fakery" long enough to recruit some key human allies, then attempting to threaten and control humans in key positions of power to the point where they control solid amounts of military resources, then using this to establish a "headquarters."

In any of these cases, once one or more "AI headquarters" are established:

It can be expanded as the AIs headquartered there make money, purchase more property and computing power, build more factories, etc.¹⁶
It can develop enough destructive potential and human allies that humans would be hesitant to attack it (as many relatively small countries have today). It would be clear that AIs at the headquarters are seeking autonomy, accumulating resources and taking some unethical actions, but the full scale and nature of the threat could easily remain ambiguous.
Even if the threat is clear, AIs could potentially maintain a huge population in a relatively small space (potentially underground). They could also split up across many different headquarters (including some where they are simply using humans' hardware while hiding their presence).
- Either way, this large-but-hidden population could be recruiting human allies, developing advanced weaponry, looking for ways to take control of humans' weapons (potentially possible if they greatly outnumber the humans working on cybersecurity, and if they can coordinate with the AIs "helping" those humans), etc.
- They could also be working with (unwitting, or allied) humans to manufacture large numbers of robots with back-doors allowing the AIs to take control.
Once the headquarters has built up enough resources to go on offense, AIs around the world could coordinate with the headquarters. Humans could shut down AIs that they notice doing this, but they might have a very tough time getting value out of their servers and AI-controlled robots; this could make it easy for the AIs at the "AI headquarters" to out-resource humans.

What if humans move slowly and don't create many AIs?

The above scenario has humans creating large numbers of AIs, such that the AIs just need to find a way to coordinate and acquire a safe "headquarters" in order to defeat us.

What if humans moved more slowly, intentionally restricting human-level AI to a tiny portion of the available computing resources? Could a small number of AIs pose a risk to humanity?

In this world, we would have what Carl Shulman refers to as "dry tinder everywhere, waiting for sparks." Anyone who can buy or rent a large amount of computing power can create a large number of AIs, which can produce a large amount of money and research, leading to still more AIs.

So a single AI could hack into enough servers¹⁷ to make a few copies of itself; recruit a few human allies; and start making money, acquiring more server space, etc. until its human allies are running a huge number of AIs. This could all be done in difficult-to-detect ways (it might e.g. just look like a set of humans renting a bunch of servers to run quantitative finance strategies).

So in this world, I think our concern should be any AI that is able to find enough security holes to attain that kind of freedom. Given the current state of cybersecurity, that seems like a big concern.

Footnotes

1. Assuming you accept other points made in the most important century series, e.g. that AI that can do most of what humans do to advance science and technology could be developed this century.
2. See Superintelligence chapter 6.
3. See the "Nanotechnology blue box," in particular.
4. Simplifications:
- The report estimates the amount of computing power it would take to train (create) a transformative AI system, and the amount of computing power it would take to run one. This is a bounding exercise and isn't supposed to be literally predicting that transformative AI will arrive in the form of a single AI system trained in a single massive run, but here I am interpreting the report that way for concreteness and simplicity.
- As explained in the next footnote, I use the report's figures for transformative AI arriving on the soon side (around 2036). Using its central estimates instead would strengthen my point, but we'd then be talking about a longer time from now; I find it helpful to imagine how things could go in a world where AI comes relatively soon.
5. I assume that transformative AI ends up costing about 10^14 FLOP/s to run (this is about 1/10 the Bio Anchors central estimate, and well within its error bars) and about 10^30 FLOP to train (this is about 10x the Bio Anchors central estimate for how much will be available in 2036, and corresponds to about the 30th-percentile estimate for how much will be needed based on the "short horizon" anchor). That implies that the 10^30 FLOP needed to train a transformative model could run 10^16 seconds' worth of transformative AI models, or about 300 million years' worth. This figure would be higher if we use Bio Anchors's central assumptions, rather than assumptions consistent with transformative AI being developed on the soon side.
6. They might also run fewer copies of scaled-up models or more copies of scaled-down ones, but the idea is that the total productivity of all the copies should be at least as high as that of several hundred million copies of a human-ish model.
7. Intel, Google
8. Working-age population: about 65% * 7.9 billion =~ 5 billion.
9. Humans could rent hardware using money they made from running AIs, or - if AI systems were operating on their own - they could potentially rent hardware themselves via human allies or just via impersonating a customer (you generally don't need to physically show up in order to e.g. rent server time from Amazon Web Services).
10. (I had a speculative, illustrative possibility here but decided it wasn't in good enough shape even for a footnote. I might add it later.)
11. I don't go into detail about how AIs might coordinate with each other, but it seems like there are many options, such as by opening their own email accounts and emailing each other.
12. Alien invasions seem unlikely if only because we have no evidence of one in millions of years.
13. Here's a recent comment exchange I was in on this topic.
14. E.g., individual AI systems may occasionally get caught trying to steal, lie or exploit security vulnerabilities, due to various unusual conditions including bugs and errors.
15. E.g., see this list of high-stakes security breaches and a list of quotes about cybersecurity, both courtesy of Luke Muehlhauser. For some additional not-exactly-rigorous evidence that at least shows that "cybersecurity is in really bad shape" is seen as relatively uncontroversial by at least one cartoonist, see: https://xkcd.com/2030/
16. Purchases and contracts could be carried out by human allies, or just by AI systems themselves with humans willing to make deals with them (e.g., an AI system could digitally sign an agreement and wire funds from a bank account, or via cryptocurrency).
17. See above note about my general assumption that today's cybersecurity has a lot of holes in it.
