
> (For some other people’s discussion of that question, see section 3.1 of Oesterheld (2017) and this blogpost.)

I'm confused why you've cited the "Three wagers" blogpost. The "wager in favor of higher correlations" isn't a wager in favor of high correlations with agents with _different values_, relative to those with the same values. The wager is in favor of acting as if you correlate with agents who aren't exact copies, but (as Treutlein himself notes) such agents could still strongly share your values. So in practice this wager doesn't seem to recommend actions different from just fulfilling your own values.

author

I think that the wager (in practice, on average) favors theories that assert high correlations with agents with different values relative to those with the same values — as an indirect consequence of favoring theories that assert high correlations with very different agents with the same values.

I talk more about that here: https://lukasfinnveden.substack.com/p/are-our-actions-evidence-for-ai-decisions#%C2%A7how-to-handle-uncertainty


Okay, hm, I think maybe it's just that the "wager" claim is a bit misleading: the conclusion here seems much less action-guiding than you seem to be suggesting.

Just on priors I’d be very suspicious of a claim that there's a wager for making decisions other than what is optimal by one's own values. “Wagers” are supposed to be of the form, “The stakes of assuming X when not-X are much lower than the stakes of assuming not-X when X.” But the (opportunity) cost of [my acting according to the recommendations of a high-correlations-with-agents-with-very-different-values view when that view is false] doesn’t seem clearly lower than the cost of [the other way around], at least not without digging more into the details. The former could be extremely costly insofar as it’s not just me who does something worse by my values than otherwise; it’s all these other agents who share my values, too.
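To spell out that form a bit more explicitly (my own rough notation, not anything from your post): write C(a | s) for how costly it is, by my values, to act as if a when s is actually the case. Then a wager for acting as if X needs something like

$$ P(\neg X)\, C(\text{act as if } X \mid \neg X) \;\ll\; P(X)\, C(\text{act as if } \neg X \mid X), $$

and my worry is exactly that the left-hand side isn’t obviously small here.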

Looking at your argument in particular, you don’t quantify the payoffs of the best-by-my-values action vs. the best-by-the-values-of-some-average-across-EDT-agents action. It seems like if the sucker payoff is sufficiently bad and/or the gains from trade are sufficiently small, this wager can’t work.
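As a toy illustration of the kind of comparison I have in mind (purely hypothetical numbers and names, not anything you’ve committed to):

```python
# Toy sketch of the comparison above; all names and numbers are hypothetical.
#
# p      : credence that the high-correlations-with-different-values view is true
# gain   : how much better off I am, by my own values, if I act on that view and it's true
#          (the gains from trade, relative to just acting on my own values)
# sucker : how much worse off I am, by my own values, if I act on that view and it's false
#          (the sucker payoff, including the losses of correlated agents who share my values)

def acting_on_view_beats_own_values(p: float, gain: float, sucker: float) -> bool:
    """True iff acting on the high-correlations view has higher expected value
    (by my own values) than just acting on my own values, normalized to 0."""
    return p * gain - (1 - p) * sucker > 0

# Bad sucker payoff and small gains from trade: the wager fails.
print(acting_on_view_beats_own_values(p=0.3, gain=1.0, sucker=10.0))  # False
# Large gains and mild downside: it can go through.
print(acting_on_view_beats_own_values(p=0.3, gain=10.0, sucker=1.0))  # True
```

So whether there’s a “wager” here seems to depend on those magnitudes, not just on the direction of the correlation claim.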

(If you're not actually making the claim I'm critiquing, and are only making the more modest claim that when you average things out the effective correlations with other-values agents are counterintuitively large, then I think I agree.)

author

> when you average things out the effective correlations with other-values agents are counterintuitively large

Yeah, this is what I'm trying to say.
