This is part of a series of lists of projects. The unifying theme is that the projects are not targeted at solving alignment or engineered pandemics but still targeted at worlds where transformative AI is coming in the next 10 years or so. See here for the introductory post.
Commonly discussed motivations cited for why rapid AI progress might be scary are:
Risks from misalignment.
Risk from AI-assisted bioweapons.
But even aside from these risks, it seems likely that advanced AI will lead to explosive technological and economic growth across the board, which could lead to a large number of problems emerging at a frighteningly fast pace.1
The growth speed-ups could be extreme. The basic worry is that AI would let us return to a historical trend of super-exponential growth.2 If this happens, I don’t know any reassuring upper limit for how fast growth could go.
(Illustratively: Paul Christiano’s suggested definition for slow takeoff is “There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles.” If world GDP doubled in a single year, that would mean that growth was ~30x faster than it is right now.)
If technological growth speeds up by, say, 30x, then that suggests that over just a few years, we might have to deal with all the technologies that would (at a “normal” pace) be discovered over 100 years. That’s an intense and scary situation.
This section is about problems that might arise in this situation and governance solutions that could help mitigate them. It’s also about “meta” solutions that could help us deal with all of these issues at once, e.g. by improving our ability to coordinate to slow down the development and deployment of new technologies.
Note: Many of the projects in this section would also be useful for alignment. But I’m not covering any proposals that are purely focused on addressing alignment concerns.
Investigate and publicly make the case for/against explosive growth being likely and risky [Forecasting] [Empirical research] [Philosophical/conceptual] [Writing]
I think there’s substantial value to be had in vetting and describing the case for explosive growth, as well as describing why it could be terrifying. Explosive growth underlies most of the concerns in this section — so establishing the basic risk is very important.
Note: There’s some possible backfire risk, here. Making a persuasive case for explosive growth could motivate people to try harder to get there even faster. (And in particular, to try to get there before other actors do.) Thereby giving humanity even less time to prepare.
I don’t think that’s a crazy concern. On the other hand, it’s plausible that we’re currently in an unfortunate middle ground, where everyone already believes that frontier AI capabilities will translate into a lot of power, but no one expects the crazy-fast growth that would strike fear into their hearts.
On balance, my current take is that it’s better for the world to see what’s coming than to stumble into it blindly.
Related/previous work:
Explosive growth from AI automation: A review of the arguments.
OpenPhil reports.
Carl Shulman episode on Lunar Society.
Examples of how to attack this problem:
One objection to explosive growth is that certain bottlenecks will prevent large factors of speed-up like 10x or 100x. One project would be to look into such bottlenecks (running physical experiments, regulatory hurdles, mining the requisite materials, etc) and assess their plausibility.
For existing work here, see the “arguments against the explosive growth hypothesis” section of Explosive growth from AI automation.
Biological analogies of maximally fast self-replication times and how relevant those analogies are to future growth.
For example, duckweed has a doubling time of a few days.
Laying out the existing case in a more accessible and vivid format.
Investigating concrete scary technologies and detailing scenarios where they emerge too fast to figure out how to deal with. E.g.:
Cheap nukes.
Nanotech.
Cheap and scalable production of small, deadly drones.
Highly powerful surveillance, lie detection, and mind reading.
Extremely powerful persuasion.
Various issues around digital minds. Capability of uploading human minds.
Transitioning from an economy where people make money via labor where almost all income is paid out to those who control capital. (Because of AI automation.)
Painting a picture of a great outcome [Forecasting] [Philosophical/conceptual] [Governance]
Although a fast intelligence explosion would be terrifying, the resulting technology could also be used to create a fantastic world. It would be great to be able to combine (i) appropriate worry about a poorly handled intelligence explosion and (ii) a convincing case for how everyone could get what they want if we just coordinate. That would make for a powerful case for why people should focus on coordinating. (And not risk everything by racing and grabbing for power.)
For a related proposal and some ideas about how to go about it, see Holden Karnofsky’s proposal here.
For some previous work, see the Future of Life Institute’s worldbuilding contest (and follow-up work).
Policy-analysis of issues that could come up with explosive technological growth [Governance] [Forecasting] [Philosophical/conceptual]
Here are three concrete areas that might need solutions in order for humanity to build an excellent world amidst all the new technology that might soon be available to us. Finding policy solutions for these could:
Help people get started with implementing the solutions.
Contribute to “Painting a picture of a great outcome”, mentioned just above.
Address vulnerable world hypothesis with minimal costs
(“Vulnerable world hypothesis” is in reference to this paper.)
If too many actors had access to incredibly destructive tech, we’d probably see a lot of destruction. Because a small fraction of people would choose to use it.
Unfortunately, a good heuristic for what we would get from rapid technological growth is: cheaper production of more effective products. This suggests that sufficiently large technological growth would enable cheap and convenient production of e.g. even more explosive nukes or deadlier pandemics.
Will technology also give us powerful defenses against these technologies? I can easily imagine technological solutions to some issues, e.g. physical defenses against the spread of deadly pandemics. But for others, it seems more difficult. I don’t know what sort of technology would give you a cheap and convenient defense against a nuclear bomb being detonated nearby.
One solution is to prevent the problem at its source: By preventing access to the prerequisite materials or technologies and/or by monitoring people who have the capacity to cause large amounts of destruction.
But in some scenarios, almost anyone could have the capacity to cause widespread destruction. So the monitoring might have to be highly pervasive, which would come with significant costs and risks of misuse.
Exploring potential solutions to this problem hasn’t really been done in depth. It would be great to find solutions that minimize the harms from both destructive technologies and from pervasive surveillance.3
Examples of how to attack this problem:
To what extent would the proposal in Surveil things, not people solve the problem?
For example: Are there plausible technologies that could spread purely via people sharing information?
Better information technology could allow surveillance/monitoring to be much more precise, detecting certain key facts (“Is this person constructing a bomb?”) while not recording or leaking information about anything else. (C.f. the case for privacy optimism / Beyond Privacy Trade-offs with Structured Transparency.) To what extent could variants of this address concerns about intrusive surveillance?
Could this address pragmatic concerns, e.g.: government abuse of surveillance powers, flexible enforcement of unjust laws, etc.
Relatedly: Could this be constructed in a way that allowed citizens to verify that no surveillance was going on beyond what was claimed?
Absent pragmatic concerns, would some variants of this be compatible with people’s psychological desire for privacy?
Which variants of this would be (in)compatible with established legal rights to privacy?
How to handle brinkmanship/threats?
Historically, we’ve taken on large risks from brinkmanship around nukes. For example, during the Cuban missile crisis, President Kennedy thought that the risk of escalation to war was "between 1 in 3 and even".
I would expect similar risks to go up substantially at a time when humanity is rapidly developing new technology. This technology will almost certainly enable new, powerful weapons with unknown strategic implications and without any pre-existing norms about their use.
In addition, AI might enable new commitment mechanisms.
If you had a solution to the alignment problem, you could program an AI system to follow through on certain commitments and then hand over control to that AI system.
If you had accurate lie detection, some people might be able to decide to keep certain commitments and use lie detection technology to make those commitments credible.
This could help solve coordination problems. But it could also create entirely new strategies and risks around brinkmanship and threats.
What policies should people adopt in the presence of strong commitment abilities? My impression is that state-of-the-art theory on this isn’t great.
The main recommendation from traditional game theory is “try hard to commit to something crazy before your opponent, and then it will be rational for them to do whatever you want”. That doesn’t seem like the way we want future decisions to be made.
There are some alternative theoretical approaches under development, such as open-source game theory. But they haven’t gotten very far. And I don’t know of any ambitious attempts to answer the question of how people ought to behave around this in the real world. Taking into account real-world constraints (around credibility, computation power, lack of perfect rationality, etc.) as well as real-world advantages (such as a shared history and some shared human intuitions, which might provide a basis for successfully coordinating on good norms).
Here’s one specific story for why this could be tractable and urgent: Multi-agent interactions often have a ton of equilibria, and which one is picked will depend on people’s expectations about what other people will do, which are informed about their expectations of other people, etc. If you can anticipate a strategic situation before it arrives and suggest a particular way of handling it, that could change people’s expectations about each other and thereby change the rational course of action.
Examples of how to attack this problem:
What will the facts-on-the-ground situation look like? What commitment abilities will people have, and how transparent will they be to other parties?
Given plausible facts on the ground, what are some plausible norms that would (i) make for a good society if everyone followed them, (ii) that would be good to follow if everyone else followed them, and (iii) that aren’t too brittle if people settle on somewhat different norms?
Example of a norm: Never do something specifically because someone else doesn't want it.
This could draw a lot on looking at current norms around geopolitical conflicts and other areas. To get out of abstract land and think about what people do in practice. (For example, for the norm suggested above, you could ask whether it is compatible with or would condemn sanctions? What about the criminal justice system?)
Avoiding AI-assisted human coups
Advanced AI could enable dangerously high concentrations of power. This could happen via at least two different routes.
Firstly, a relatively small group of people (e.g. a company or some executives/employees of a company) who develop the technology could rapidly accumulate a lot of technology and cognitive power compared to the rest of the world. If those people decided to launch a coup against a national government, they may have a good chance of succeeding.
The obvious solution to that problem is for the government and other key stakeholders to have significant oversight over the company’s operations. Up to and including nationalizing frontier labs.
The second problem is that AI might lead power within institutions to be more concentrated in the hands of a few people.
Today, institutions are often coordinated via chain-of-command systems where humans are expected to obey other humans. But hard power is ultimately distributed between a large number of individual humans.
As a consequence: If a leader tries to use their power to launch a coup, their subordinates are capable of noticing the extraordinary circumstances they find themselves in, and use their own judgment about whether to obey orders or not. So Alice can’t necessarily count on Bob’s support in a coup, even if Bob is Alice’s subordinate.
But with AI, it will be technically possible to construct institutions where hard power is controlled by AIs, who could be trained to obey certain humans’ orders without question. And even if those AIs were also programmed to follow certain laws and standards, a conflict between laws/standards and the orders of their human overseers might lead to behavior that is out-of-distribution and unpredictable (rather than the laws/standards overriding the humans’ orders).
To solve this problem: We’d want people to make conscious decisions about what the AIs should do when all the normal sources of authority disagree and explicitly train the AIs for the right behavior in those circumstances. Also, we’d want them to institute controls to prevent a small number of individuals from unilaterally retraining the AIs.
Examples of how to attack this:
Spell out criteria (e.g. certain evaluations) for when AI is becoming powerful enough that it needs strong government oversight. Describe what this oversight would need to look like.
Outline a long list of cases where it might be ambiguous what highly capable AI systems should do (when different sources of authority disagree). Spell out criteria for when AIs have enough power that the relevant lab/government should have taken a position on all those cases (and trained the AI to behave appropriately).
Advocate for labs to set up strict internal access controls to weights, as well as access controls and review of the code used to train large language models. This is to prevent a small group of lab employees from modifying the training of powerful AIs to make the AI loyal to them in particular.
Get similar assurances in place for AIs used in government, military, and law enforcement. This could include giving auditing privileges to opposition parties, international allies, etc.
Spell out technical competencies that auditors (either dedicated organizations or stakeholders like opposition parties and other governments) would need to properly verify what they were being told. Publicly explain this and advocate for those actors to develop those technical competencies.
There’s significant overlap between solutions to this problem and the proposal Develop technical proposals for how to train models in a transparently trustworthy way from the “Epistemics” post in this series.
(Thanks to Carl Shulman for discussion.)
Governance issues raised by digital minds
There are a lot of governance issues raised by the possibility of digital minds. For example, what sort of reform is needed in one-person-one-vote democracies when creating new persons is as easy as copying software? See also Develop candidate regulation from the “Digital Sentience & Rights” post in this series.
Norms/proposals for how to navigate an intelligence explosion [Governance] [Forecasting] [Philosophical/conceptual]
Painting a picture of a great outcome suggested outlining an acceptable “endpoint” to explosive growth.
Separately, there’s a question of what the appropriate norms are for how we get from where we are today to that situation.
Other than risks from misaligned AI along the way, I think the 3 central points here are:
If one actor pushes ahead with an intelligence explosion, they might get a massive power advantage over the rest of the world. That creates a big risk for everyone else, who might find themselves powerless. Instead of going through with a massive gamble like that, could we set up agreements or norms that make it more likely that everyone has some say in the future?
A maximum-speed intelligence explosion will lead to a lot of changes and a lot of new technology really, really fast. That’s a core part of what’s scary, here. Could we somehow coordinate to go slower?
Good post-intelligence-explosion worlds will (at least eventually) look quite different from our world, and that includes governance & politics looking quite different. For example:
There will be more focus on deciding what sorts of crazy technology can be used in what ways.
There will be less need to focus on economic growth to meet the material needs of currently existing people.4
There will be massive necessary changes associated with going from purely biological citizens to also having digital minds with political rights.
There may be changes in what form of government is naturally favored. Worryingly, perhaps away from favoring democracy,5 which could return us to autocratic forms of rule (which are more historically typical) unless we make special efforts to preserve democracy.
There will be strong incentives to hand over various parts of politics to more effective AI systems. (Drafting proposals, negotiating, persuading people about your point of view, etc.)
It would be nice to get a head start on making this governance transition happen smoothly and deliberately.
What follows are some candidate norms and proposals. Each of them could be:
Sketched out in much greater detail.
Evaluated for how plausible it is to form a credible norm around it.
Evaluated for whether such a norm would be good or bad.
Evaluated for whether partial movement in that direction would be good or bad. (So that we don’t propose norms that backfire if they get less than complete support.)
Advocated for.
(For ideas in this section, I’ve especially benefitted from discussion with Carl Shulman, Will MacAskill, Nick Beckstead, Ajeya Cotra, Daniel Kokotajlo, and Joe Carlsmith.)
No “first strike” intelligence explosion
One candidate norm could be: “It’s a grave violation of international norms for a company or a nation to unilaterally start an intelligence explosion because this has a high likelihood of effectively disempowering all other nations. A nation can only start an intelligence explosion if some other company or nation has already started an intelligence explosion of their own or if they have agreement from a majority of other nations that this is in everyone’s best interest.” (Perhaps because they have adequate preparations for reassuring other countries that they won’t be disempowered. Or perhaps because the alternative would be to wait for an even less responsible intelligence explosion from some other country.)
Never go faster than X?
The above proposal was roughly: Only step through an intelligence explosion if you have agreements from other nations about how to do it.
A more opinionated and concrete proposal is: Collectively, as a world, we should probably never grow GDP or innovate technology faster than a certain maximum rate. Perhaps something like: Technological and economic growth should be no more than 5x faster than 2023 levels.
Concrete decision-making proposals
A proposal that would be more opinionated in some ways, and less opinionated in other ways, would be: At some level of AI capabilities, you’re supposed to call for a grand constitutional convention to decide how to navigate the rest of the intelligence explosion, and what to do afterward.
You could make suggestions about this constitutional convention, e.g.:
The first round will last for a year.
There will be 1000 people participating, sampled from nations in proportion to their population.
People will use approval voting.
etc.
The “constitutional convention” proposal resembles something like deliberative democracy insofar as the selected people are randomly chosen, and looks more like typical geopolitical deliberations insofar as nations’ governments decide which representatives to send.
(Thanks to Will MacAskill for discussion.)
Technical proposals for slowing down / coordinating
In order to be especially effective, all 3 above proposals require credible technical proposals for how nations could verify that they were all collectively slowing down. Exactly how such proposals should work is a big open problem. Which, thankfully, some people are working on.
I don’t know any detailed list of open questions here. (And I’d be interested if anyone had such a list!) But some places to look are:
Shavit (2023) is an important paper that brings up some open problems.
Lennart Heim’s list of resources for compute governance.
Naming some other candidate directions:
Research how to design computer chips and rules for computer chips that cleanly distinguish chips that can and can’t be used for AI training. So that restrictions on AI chips can be implemented with minimal consequences for other applications.
Proposing schemes for how competitors (like the US & China) could set up sufficiently intense monitoring of each other to be confident that there’s no illicit AI research happening.
C.f. historical success at spying for detecting large operations.
How feasible would it be to do an international “CERN for AI”-type thing?
In particular: To what extent could this possibly involve competitors who don’t particularly trust each other? Given risks of leaks and risks of backdoors or data poisoning.
Great proposals here should probably be built around capability evaluations and commitments for what to do at certain capability levels. We can already start to practice this by developing better capability evaluations and proposing standards around them. It seems especially good to develop evaluations for AI speeding up ML R&D.
Dubiously enforceable promises
Get nations to make promises early on along the lines of “We won’t use powerful AI to offensively violate other countries’ national sovereignty. Not militarily, and not by weird circumspect means either (e.g. via superhuman AI persuasion).” Maybe promising other nations a seat at the bargaining table that determines the path of the AI-driven future. Maybe deciding on a default way to distribute future resources and promising that all departures from that will require broad agreement.
It is, of course, best if such promises are credible and enforceable.
But I think this would have some value even if it's just something like: The US Congress passes a bill that contains a ton of promises to other nations. And that increases the probability that they'll act according to those promises.
There’s a lot of potential work to do here in drafting suggested promises that would:
Be meaningful in extreme scenarios.
Be plausibly credible in extreme scenarios.
Could plausibly be passed in a political climate that still has serious doubts about where all this AI stuff will lead.
(Thanks to Carl Shulman for discussion.)
Technical proposals for aggregating preferences
A different direction would be for people to explore technical proposals for effectively aggregating people’s preferences. So that, during an intelligence explosion, it’s more convenient to get more accurate pictures of what different constitutions would recommend. Thereby making it harder to legitimately dismiss such demands.
Previous work:
Jan Leike’s proposal for “Building towards Coherent Extrapolated Volition with language models”.
The Polis platform. (Which, notably, Anthropic used when exploring what constitution to choose for their constitutional AI methods.)
Decrease the power of bad actors
Bad actors could make dangerous decisions about what to do with AI technology. Perhaps increasing risks from brinkmanship or other destructive technology — or perhaps seizing and maintaining indefinite power over the future and making poor choices about what to do with it. See also some of the risks described in Reducing long-term risks from malevolent actors.
One line of attack on this problem is to develop policies for avoiding AI-assisted human coups. But here are a few more.
Avoid malevolent individuals getting power within key organizations [Governance]
If you have opportunities to affect the policies of important institutions, it could be valuable to reduce the probability that malevolent individuals get hired and/or are selected for key roles.
Examples of how to attack this question: (h/t Stefan Torges)
Making it more likely that malevolent actors are detected before joining an AI development effort (e.g., screenings, background checks).
Making it more likely that malevolent actors are detected within AI development efforts (e.g., staff training, screenings for key roles).
Making it more likely that staff speak up about / report suspicious behavior (e.g., whistleblower protections, appropriate organizational processes).
Making it more likely that malevolent actors are removed based on credible evidence (e.g., appropriate governance structures).
Setting up appropriate access controls within AI development efforts. E.g. requiring multiple people’s simultaneous approval for crucial types of access and/or reducing the number of people with the power to access the models (unilaterally or with just a couple of people).
Changing promotion guidelines and/or culture in ways that select against rather than for power-seeking individuals.
See also Reducing long-term risks from malevolent actors for more analysis and intervention ideas.
Prevent dangerous external individuals/organizations from having access to AI [Governance] [ML]
There is (thankfully) some significant effort going on in this space already, so I don’t have a lot to add.
Examples of how to attack this question:
Improving security. (I think Securing AI Model Weights is the current state of the art of how labs could improve their security.)
Offering dangerous frontier models via API rather than giving weights access.
Controlling hardware access.
Accelerate good actors
This is a riskier proposition, and it’s extremely easy to accidentally do harm here. But in principle, one way to give bad actors relatively less power is to differentially accelerate good actors. C.f. successful, careful AI lab section of Holden Karnofsky’s playbook for AI risk reduction.
Analyze: What tech could change the landscape? [Forecasting] [Philosophical/conceptual] [Governance]
If an intelligence explosion could lead to 100 years of “normal” technological progress within just a few years, then this is a very unusually valuable time to have some foresight into what technologies are on the horizon.
It seems particularly valuable to anticipate technologies that could (i) pose big risks or (ii) enable novel solutions to other risks.
On (i), some plausible candidates are:
Bioweapons.
Misaligned-by-default, superintelligent AI.
Super-persuasion. (Discussed a bit in the Epistemics post in this series.)
Some candidates that could feature on both (i) and (ii):
Lie detection.
Various new surveillance/monitoring technologies.
Including: Better ability to have existing monitoring technology be privacy-preserving. (Which is more purely (ii).)
Commitment abilities.6
Atomically precise manufacturing.
Cognitive enhancement.
Good results here could influence:
Big list of questions that labs should have answers for [Philosophical/conceptual] [Forecasting] [Governance]
This could be seen as a project or could alternatively be seen as a framing for how to best address many of these issues. (Not just from this post, but also issues from other posts in this series.)
I think it’s plausible that our biggest “value-add” on these topics will be that we see potentially important issues coming before other people. This suggests that our main priority should be to clearly flag all the thorny issues we expect to appear during an intelligence explosion and ask questions about how labs (and possibly other relevant institutions) plan to deal with them.
This could be favorably combined with offering suggestions for how to address all the issues. But separate from any suggestions, it’s valuable to establish something as a problem that needs some answer so that people can’t easily dismiss any one solution without offering an alternative one.
Some of these questions might be framed as “What should your AI do in situation X?”. Interestingly, even if labs don’t engage with a published list of questions, we can already tell what (stated) position the labs’ current AIs have on those questions. They can be presented with dilemmas, the AIs can answer, and the results can be published.
Having a single big list of questions that must be addressed would also make it easier to notice when certain principles conflict. I.e., situations when you can’t fulfill them both at once and are forced to choose.
Example of how to attack this:
Ask yourself: “If the lab never made an intentional decision about how they or their AIs should handle X — how worried would you be?”. Write a list with versions of X, starting from the ones that would make you most worried. Also, write a hypothesized solution to it. (Both because solutions are valuable and because this exercise encourages concreteness.)
As a starting point for what to write about, you could consider many of the ideas in this series of posts. What will you do if your AI systems have morally relevant preferences? What if they care about politics and deserve political rights? What if you find yourself in a situation where you could unilaterally launch an intelligence explosion? Or where control over your AI systems could enable someone to launch a coup? Etc.
(Thanks to Carl Shulman for this idea and discussion.)
End
That’s all I have on this topic! As a reminder: it's very incomplete. But if you're interested in working on projects like this, please feel free to get in touch.
Other posts in series: Introduction, governance during explosive growth, epistemics, sentience and rights of digital minds, backup plans & cooperative AI.
Some of these risks are also fairly commonly discussed. In particular, centralization of power and risks from powerful AI falling into the wrong hands are both reasonably common concerns, and are strongly related to some of the projects I list in this section.
Why could this happen? Historically, the pace of innovation may have been tightly coupled to world GDP, because population size (i.e. the number of potential innovators) was constrained by the supply of food. In semi-endogenous growth models, this makes super-exponential growth plausible. But recently, growth has outpaced population growth, leading to a slower pace of innovation than our current amount of resources could theoretically support. But AGI would make it easy to convert resources into automated scientists, which could return us to the historical state of affairs. For more on this, see e.g. the duplicator.
C.f. this comment from Michael Nielsen’s notes on the vulnerable world hypothesis: “How to develop provably beneficial surveillance? It would require extensive work beyond the scope of these notes. It is worth noting that most existing surveillance regimes are developed with little external oversight, either in conception, or operationally. They also rarely delegate work to actors with different motives in a decentralized fashion. And they often operate without effective competition. I take these facts to be extremely encouraging: they mean that there is a lot of low-hanging fruit to work with here, obvious levers by which many of the worst abuses of surveillance may be reduced. Classic surveillance regimes have typically prioritized the regime, not humanity at large, and that means the design space here is surprisingly unexplored.”
Though new technology would likely enable rapid reproduction for biological humans and super rapid reproduction for digital minds. That’s one of the technologies that we’ll need to decide how to handle. If we allow for an explosively fast increase in population size, then population size and/or per-capita resources would again be limited by economic growth.
As explained by Ben Garfinkel here, it’s plausible that democracy has recently become common because industrialization means that it’s unusually valuable to invest in your population, and unusually dangerous to not give people what they want. Whereas with widespread automation, states would rely less on satisfying the demands of their population.
Here, I think there’s an important difference between:
You can only credibly commit to an action if you have the consent of the person who you want to demonstrate this commitment to.
You can unilaterally make credible commitments.
I think the former is good. I think the latter is quite scary, for reasons mentioned earlier.