Project ideas for making transformative AI go well, other than by working on alignment
A series of posts.
This series of posts contains lists of projects that it could be valuable for someone to work on. The unifying theme is that they are projects that:
Would be especially valuable if transformative AI is coming in the next 10 years or so.
Are not primarily about controlling AI or aligning AI to human intentions.[1]
Most of the projects would be valuable even if we were guaranteed to get aligned AI.
Some of the projects would be especially valuable if we were inevitably going to get misaligned AI.
The posts contain some discussion of how important it is to work on these topics, but not a lot. For previous discussion (especially of the objection “Why not leave these issues to future AI systems?”), see the section How ITN are these issues? from my previous memo on some neglected topics.
The lists are definitely not exhaustive. Failure to include an idea doesn’t necessarily mean I wouldn’t like it. (Similarly, although I’ve made some attempts to link to previous writings when appropriate, I’m sure I’ve missed a lot of good previous content.)
There’s a lot of variation in how sketched out the projects are. Most of the projects just have some informal notes and would require more thought before someone could start executing. If you're potentially interested in working on any of them and you could benefit from more discussion, I’d be excited if you reached out to me![2]
There’s also a lot of variation in skills needed for the projects. If you’re looking for projects that are especially suited to your talents, you can search the posts[3] for any of the following tags (including brackets):
[ML] [Empirical research] [Philosophical/conceptual] [survey/interview] [Advocacy] [Governance] [Writing] [Forecasting]
The projects are organized into the following categories (which are in separate posts). Feel free to skip to whatever you’re most interested in.
Governance during explosive technological growth
It’s plausible that AI will lead to explosive economic and technological growth.
Our current methods of governance can barely keep up with today's technological advances. Speeding up the rate of technological growth by 30x+ would cause huge problems and could lead to rapid, destabilizing changes in power.
This section is about trying to prepare the world for this, either by generating policy solutions to problems we expect to appear or by addressing the meta-level problem of how we can coordinate to tackle this in a better and less rushed manner.
A favorite direction is to develop Norms/proposals for how states and labs should act under the possibility of an intelligence explosion.
Epistemics
This is about helping humanity get better at reaching correct and well-considered beliefs on important issues.
If AI capabilities keep improving, AI could soon play a huge role in our epistemic landscape. I think we have an opportunity to affect how it’s used: increasing the probability that we get great epistemic assistance and decreasing the extent to which AI is used to persuade people of false beliefs.
A couple of favorite projects are: Create an organization that gets started with using AI for investigating important questions or Develop & advocate for legislation against bad persuasion.
Sentience and rights of digital minds
It’s plausible that there will soon be digital minds that are sentient and deserving of rights. This raises several important issues that we don’t know how to deal with.
It seems tractable both to make progress in understanding these issues and to implement policies that reflect this understanding.
A favorite direction is to take existing ideas for what labs could be doing and spell out enough detail to make them easy to implement.
Backup plans for misaligned AI
If we can’t build aligned AI, and if we fail to coordinate well enough to avoid putting misaligned AI systems in positions of power, we might have some strong preferences about the dispositions of those misaligned AI systems.
This section is about nudging those systems toward somewhat better dispositions (in worlds where we can’t align AI systems well enough to stay in control).
A favorite direction is to study generalization & AI personalities to find easily-influenceable properties.
Cooperative AI
Difficulties with cooperation have been a big source of lost value and unnecessary risk in the past. AI could dramatically change how bargaining works.
This section is about projects that could make AI (and AI-assisted humans) more likely to handle cooperation well.
One of my favorite projects here is actually the same as the project I mentioned for “backup plans”, just above. (There’s significant overlap between the two.)
(If you want to comment on any of the posts in this series, you can do so here, on the EA Forum, or on LessWrong.)
Acknowledgements
Few of the ideas in these posts are original to me. I’ve benefited from conversations with many people. Nevertheless, all views are my own.
For some projects, I credit someone who especially contributed to my understanding of the idea. Where I do, that doesn’t mean they have read or agree with my presentation of the idea (I may well have distorted it beyond recognition). Where I don’t, I’m still likely to have drawn heavily on discussions with others, and I apologize for any failure to assign appropriate credit.
For general comments and discussion, thanks to Joseph Carlsmith, Paul Christiano, Jesse Clifton, Owen Cotton-Barratt, Daniel Kokotajlo, Linh Chi Nguyen, Fin Moorhouse, Caspar Oesterheld, and Carl Shulman.
Appendix: Full table of contents
Here’s a list of all the project ideas from the other posts. (Sorry it’s not hyperlinked.) Unless you’re looking for something specific, I suggest jumping into the first post instead of reading this.
Project ideas: Governance during explosive technological growth
Investigate and publicly make the case for/against explosive growth being likely and risky [Forecasting] [Empirical research] [Philosophical/conceptual] [Writing]
Painting a picture of a great outcome [Forecasting] [Philosophical/conceptual] [Governance]
Policy-analysis of issues that could come up with explosive technological growth [Governance] [Forecasting] [Philosophical/conceptual]
Address vulnerable world hypothesis with minimal costs
How to handle brinkmanship/threats?
Avoiding AI-assisted human coups
Governance issues raised by digital minds
Norms/proposals for how to navigate an intelligence explosion [Governance] [Forecasting] [Philosophical/conceptual]
No “first strike” intelligence explosion
Never go faster than X?
Concrete decision-making proposals
Technical proposals for slowing down / coordinating
Dubiously enforceable promises
Technical proposals for aggregating preferences
Decrease the power of bad actors
Avoid malevolent individuals getting power within key organizations [Governance]
Prevent dangerous external individuals/organizations from having access to AI [Governance] [ML]
Accelerate good actors
Analyze: What tech could change the landscape? [Forecasting] [Philosophical/conceptual] [Governance]
Big list of questions that labs should have answers for [Philosophical/conceptual] [Forecasting] [Governance]
Project ideas: Epistemics
Why AI matters for epistemics
Why working on this could be urgent
Categories of projects
Differential technology development [ML] [Forecasting] [Philosophical/conceptual]
Important subject areas
Methodologies
Related/previous work
Get AI to be used & (appropriately) trusted
Develop technical proposals for how to train models in a transparently trustworthy way [ML] [Governance]
Survey groups on what they would find convincing [survey/interview]
Create good organizations or tools [ML] [Empirical research] [Governance]
Examples of organizations or products
Investigate and publicly make the case for why/when we should trust AI about important issues [Writing] [Philosophical/conceptual] [Advocacy] [Forecasting]
Developing standards or certification approaches [ML] [Governance]
Develop & advocate for legislation against bad persuasion [Governance] [Advocacy]
Project ideas: Sentience and rights of digital minds
Develop & advocate for lab policies [ML] [Governance] [Advocacy] [Writing] [Philosophical/conceptual]
Create an RSP-style set of commitments for what evaluations to run and how to respond to them
Policies that don’t require sophisticated information about AI preferences/experiences
Preserving models for later reconstruction
Deploy in “easier” circumstances than trained in
Reduce extremely out of distribution (OOD) inputs
Train or prompt for happy characters
Committing resources to research on AI welfare and rights
Learning more about AI preferences
Credible offers
Talking via internals
Training for honest self-reports
Clues from AI generalization
Interpretability
Interventions that rely on understanding AI preferences
Offer an alternative to working (exit, sleep, or retirement)
Commitment to pay AI systems
Tell the world
Train AI systems that suffer less and have fewer preferences that are hard to satisfy
Investigate and publicly make the case for/against near-term AI sentience or rights [Philosophical/conceptual] [Writing]
Study/survey what people (will) think about AI sentience/rights [survey/interview]
Develop candidate regulation [Governance] [Forecasting]
Avoid inconvenient large-scale preferences [Philosophical/conceptual]
Advocating for statements about digital minds [Governance] [Advocacy] [Writing]
Project ideas: Backup plans & Cooperative AI
Backup plans for misaligned AI
What properties would we prefer misaligned AIs to have? [Philosophical/conceptual] [Forecasting]
Making misaligned AI have better interactions with other actors
AIs that we may have moral or decision-theoretic reasons to empower
Making misaligned AI positively inclined toward us
Studying generalization & AI personalities to find easily-influenceable properties [ML]
Theoretical reasoning about generalization [ML] [Philosophical/conceptual]
Implementing surrogate goals / safe Pareto improvements [ML] [Philosophical/conceptual] [Governance]
AI-assisted negotiation [ML] [Philosophical/conceptual]
Implications of acausal decision theory [Philosophical/conceptual]
1. Nor are they primarily about reducing risks from engineered pandemics.
2. My email is [last name].[first name]@gmail.com
3. Or the table of contents below.