Memo on some neglected topics
I originally wrote this for the Meta Coordination Forum. The organizers were interested in a memo on topics other than alignment that might become increasingly important as AI capabilities rapidly grow, in order to inform the degree to which community-building resources should go towards AI safety community building vs. broader capacity building. This is a lightly edited version of that memo. All views are my own.
Some example neglected topics (without much elaboration)
Here are a few example topics that could matter a lot if we’re in the most important century, which aren’t always captured in a normal “AI alignment” narrative:
The potential moral value of AI.[1]
The potential importance of making AI behave cooperatively towards humans, other AIs, or other civilizations (whether AI ends up intent-aligned or not).
Questions about how human governance institutions will keep up if AI leads to explosive growth.
Ways in which AI could cause human deliberation to get derailed, e.g. via powerful persuasion abilities.
Positive visions about how we could end up on a good path towards becoming a society that makes wise and kind decisions about what to do with the resources accessible to us. (Including how AI could help with this.)
(More elaboration on these below.)
Here are a few examples of somewhat-more-concrete things that it might (or might not) be good for some people to do on these (and related) topics:
Develop proposals for how labs could treat digital minds better, and advocate for them to be implemented. (Cf. this nearcasted proposal.)
Advocate for people to try to avoid building AIs with large-scale preferences about the world (at least until we better understand what we’re doing), in order to avoid a scenario where, if some generation of AIs turns out to be sentient and worthy of rights, we’re forced to choose between “freely hand over political power to alien preferences” and “deny rights to AIs on no reasonable basis”.
Differentially accelerate AI being used to improve our ability to find the truth, compared to being used for propaganda or manipulation.
E.g.: Start an organization that uses LLMs to produce epistemically rigorous investigations of many topics. If you’re the first to do a great job of this, and if you’re truth-seeking and even-handed, then you might become a trusted source on controversial topics. And your investigations would just get better as AI got better.
E.g.: Evaluate and write up facts about current LLMs’ forecasting ability, to incentivize labs to make LLMs state correct and calibrated beliefs about the world. (A sketch of what such an evaluation could look like follows this list.)
E.g.: Improve AI’s ability to help with thorny philosophical problems.
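To make the forecasting-evaluation example above a bit more concrete, here is a minimal, hypothetical sketch of what scoring an LLM’s stated probabilities against resolved yes/no questions could look like. Everything here is illustrative: `ResolvedQuestion`, `brier_score`, and the `elicit_probability` callable are assumptions standing in for whatever dataset and model API one would actually use.

```python
# Hypothetical sketch: scoring an LLM's stated probabilities on resolved
# yes/no questions. Names are illustrative, not any particular lab's API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResolvedQuestion:
    text: str      # e.g. "Will X happen before 2025-01-01?"
    outcome: bool  # how the question actually resolved


def brier_score(
    questions: List[ResolvedQuestion],
    elicit_probability: Callable[[str], float],  # placeholder for the model call
) -> float:
    """Mean squared error between stated probabilities and outcomes.

    Lower is better; always answering 0.5 scores 0.25.
    """
    total = 0.0
    for q in questions:
        p = elicit_probability(q.text)  # probability in [0, 1] stated by the model
        total += (p - float(q.outcome)) ** 2
    return total / len(questions)
```

A real evaluation would also want calibration curves and comparisons against human forecaster baselines, but even a simple score like this would make “state correct and calibrated beliefs about the world” into something measurable that labs could be compared on.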
Implications for community building?
…with a focus on “the extent to which community-building resources should go towards AI safety vs. broader capacity building”.
Ethics, philosophy, and prioritization matter more for research on these topics than they do for alignment research.
For some issues in AI alignment, there’s a lot of convergence on what’s important regardless of your ethical perspective, which means that ethics & philosophy aren’t that important for getting people to contribute. By contrast, when thinking about “everything but alignment”, I think we should expect somewhat more divergence, which could raise the importance of those subjects.
For example:
How much to care about digital minds?
How much to focus on “deliberation could get off track forever” (which is of great longtermist importance) vs. short-term events (e.g. the speed at which AI gets deployed to solve all of the world’s current problems).
But to be clear, I wouldn’t want to go hard on any one ethical framework here (e.g. just utilitarianism). Some diversity and pluralism seems good.
In addition, the huge variety of topics especially rewards prioritization and a focus on what matters, which is perhaps more promoted by general EA community building than AI safety community building?
Though: Very similar virtues also seem great for AI safety work, so I’m not sure if this changes much.
And if we find more shovel-ready interventions for some of these topics, then I imagine they would be similar to alignment on these dimensions.
It seems relatively worse to go too hard on just “get technical AI safety researchers”.
But that would have seemed like a mistake anyway. AI governance looks great even if you’re just concerned about alignment. Forecasting AI progress (and generally getting a better understanding of what’s going to happen) looks great even if you’re just concerned about alignment.
It seems relatively worse to go too hard on just “get people to work towards AI alignment” (including via non-technical roles).
But in practice, it’s not clear that you’d talk about very different topics if you were trying to find people to work on alignment, vs. if you were trying to find people to work on these topics.
In order for someone to do good work on alignment-related topics, I think it’s very helpful to have some basic sense of how AI might accelerate innovation and shape society (which is important for the topics listed above).
Conversely, in order for someone to do good work on other ways in which AI could change the world, I still think that it seems very helpful to have some understanding of the alignment problem, and plausible solutions to it.
Relatedly…
Focusing on “the most important century” / “transformative AI is coming” works well for these topics.
Let’s put “just focus on AI safety” to the side, and compare:
“EA”-community building, with
“let’s help deal with the most important century”-community building
I don’t think it’s clear which is better for these topics. Getting the empirics right matters a lot! If explosive technological growth is at our doorstep, then that’s a big deal, and I’m plausibly more optimistic about the contributions of someone who has a good understanding of that but is missing some other EA virtues than about someone who lacks that understanding.
Seems great to communicate that these kinds of questions are important and neglected. (Though also quite poorly scoped and hard to make progress on.)
If there are people who are excited about and able to contribute to some of these topics (and who don’t have a stronger comparative advantage for anything in e.g. alignment) then it seems pretty likely they should work on them.
Elaborating on the example topics
Here is more detail on each of the topics I mentioned above.
Moral value of AI.
What does common sense morality say about how we should treat AIs?
How can we tell whether/which AI systems are conscious?
If we fail to get intent-aligned AI, are there nevertheless certain types of AI that we’d prefer to get over others? See Paul Christiano’s post on this; or my post on what “evidential cooperation in large worlds” has to say about it.
The potential importance of making AI behave cooperatively towards humans, other AIs, or other civilizations (independently of whether it ends up aligned).
E.g. making AIs less likely to be spiteful or more likely to implement good bargaining strategies like safe Pareto improvements.
Ways in which AI could cause human deliberation to get derailed. Such as:
The availability of extremely powerful persuasion (see discussion e.g. here and here).
As an example intervention: It seems plausibly tractable to develop good regulatory proposals for reducing bad AI persuasion, and I think such proposals could gather significant political support.
Availability of irreversible commitment and lock-in abilities.
If all of humanity’s material affairs are managed by AIs (such that people’s competencies and beliefs won’t affect their ability to control resources), then maybe that could remove an important incentive and selection effect towards healthy epistemic practices. Cf. decoupling deliberation from competition.
Questions about how human governance institutions will keep up as AI leads to explosive growth.
If we very quickly develop highly destabilizing technologies,[2] how can the world move quickly enough to the level of coordination necessary to handle them?
Can we reduce the risk of AI-enabled human coups, e.g. by avoiding situations where AIs are trained to obey an individual or small group of humans without having been trained on cases where those humans try to grab power?
Can we wait before creating billions of digital minds who deserve and want to exercise political rights? (At least for as long as our governance still relies on the principle of one-person one-vote.)
Positive visions about how we could end up on a good path towards becoming a society that makes wise and kind decisions about what to do with the resources accessible to us. (Including how AI could help with this.)
E.g.: Elaboration on whether any type of “long reflection” would be a good idea.
E.g.: A vision of a post-AI world that would make everybody decently happy, and that’s sufficiently credible that people can focus on putting in controls that get us something at least that good, instead of personally trying to race and grab power. (Cf. Holden Karnofsky’s research proposal here.)
Nick Bostrom and Carl Shulman’s “Propositions Concerning Digital Minds and Society” has some good discussion of a lot of this stuff.
How ITN are these issues?
How good do these topics look in an importance/neglectedness/tractability framework? In my view, they look comparable to alignment on importance, stronger on neglectedness (if we consider only work that’s been done so far), and pretty unclear on tractability (though probably less tractable than alignment).
For example, let’s consider “human deliberation could go poorly (without misalignment or other blatant x-risks)”.
Importance: I think it’s easy to defend the claim that this puts something like 10% of the future’s value at stake, and reasonable to put it significantly higher.
Neglectedness: Depends on what sort of work you count.
If we restrict ourselves to work done so far that thinks about this in the context of very fast-paced technological progress, it seems tiny: fewer than 10 FTE-years within EA, and I don’t know of anything super relevant outside it.
Tractability:
Very unclear!
In general, most problems fall within a 100x tractability range.
Given how little work has been done here so far, most of the value of additional labor probably comes from information value about how tractable it is. That information value seems pretty great to me, absent specific arguments for why we should expect the problem to not be very tractable.
So let’s briefly talk about a specific argument for why these neglected topics might not be so great: that if we solve alignment, AI will help us deal with these problems. Or phrased differently: why spend precious hours on these problems now, when cognitive resources will be cheap and plentiful soon enough?[3]
I think this argument is pretty good. But I don’t think it’s overwhelmingly strong:
Issues could appear before AI is good enough to obsolete us.
For example: Strong persuasion could appear before AI gets excellent at figuring out countermeasures to strong persuasion.
This is amplified by general uncertainty about the distribution of AI capabilities. Although AI will accelerate progress in many areas, we should have large uncertainty about how much it will accelerate progress in different areas. So for each of the issues, there’s non-negligible probability that AI will accelerate the area that causes the problem (e.g. tech development) before it accelerates progress on the solution (e.g. forecasting potential harms from tech development).
(Note that a plausible candidate intervention here is to “differentially accelerate AI’s ability to provide solutions relative to AI’s ability to cause problems”.)
There might not be enough time for some actions later.
For example: Even with excellent AI advice, it might be impossible for the world’s nations to agree on a form of global governance in less than 1 month. In which case it could have been good to warn about this in advance.
“Getting there first” could get you more ears.
For example: See the LLM-fueled organization with good epistemics that I suggested in the first section, which could earn a good reputation.
For example: Writing about how to deal with a problem early-on could shape the discussion and get you additional credibility.
Expectations about the future shape current actions.
If people think there are broadly acceptable solutions to problems, then they might be more inclined to join a broad coalition and ensure that we get something at least as good as what we know is possible.
If people have no idea what’s going to happen, then they might be more desperate to seek power, to ensure that they have some control over the outcome.
One topic is to come up with candidate backup plans for alignment. That matters in worlds where we don’t succeed at alignment well enough to have AI do the research for us.
See “moral value of AI”, or some topics in cooperative AI, mentioned above.
So I don't think “AI will help us deal with these problems” is decisive. I’d like to see more attempted investigations to learn about these issues’ tractability.
(P.S. Besides Substack, you can also comment on the EA Forum. Also, feel free to DM me there if you’re interested in working on any of these topics.)
[1] Including both: welfare of digital minds, and whether there are any types of misaligned AI that would be relatively better to get, if we fail to get intent-alignment.
[2] This could be: tech that (if proliferated) would give vast destructive power to millions of people, or that would allow & encourage safe “first strikes” against other countries, or that would allow the initial developers of that tech to acquire vast power over the rest of the world. (Cf. the vulnerable world hypothesis.)
[3] Annoyingly, that could be counted as either lower importance (if we restrict our attention to the part of the problem that needs to be addressed before sufficiently good AI), lower neglectedness (if we take into account all of the future labor that will predictably be added to the problem), or lower tractability (it’s hard to make an impact by doing research on questions that will mainly be determined by research that happens later on).