Are our choices analogous to AIs' choices?
Previously on this blog, I have:
Introduced the question of whether ECL says we should cooperate with distant AIs.
Suggested a general formula for whether ECL recommends any mutually beneficial deal in asymmetric situations.
Discussed how that formula could be applied to our situation.
Key inputs into the formula include:
Whether our choosing to benefit AI values would provide evidence that AIs will analogously benefit our values.
Whether AIs benefitting our values would provide evidence that we will analogously benefit AI values.
This post will speculate about the values of those parameters.
(If your account of decision theory is different from evidential decision theory, you can substitute “provide evidence that” with your preferred metric of acausal influence.)
Summary
In general, ECL suggests that you should be more inclined to benefit the values of actors whom you correlate more with, in the sense that you perceive yourself as having more acausal influence on them, and they perceive themselves as having more acausal influence on you.
In particular, the post on asymmetric ECL suggests that your inclination to benefit distant AIs’ values should be proportional to your perceived acausal influence on them and their perceived acausal influence on you; but inversely proportional to your and AIs’ perceived acausal influence on actors with shared values.
From an EDT perspective, these “correlations” or this “acausal influence” might not be best viewed as objective facts about the world. Instead, they simply reflect the degree to which we consider our actions to be evidence for choices made in pre-AGI civilizations that share our values vs. evidence for choices made by AIs. (And vice versa, for the AIs.)
(I’m not yet fully sold on this perspective. I still feel quite mystified by how to determine the degree of “acausal influence” I have on others. Even if the EDT perspective turns out to be right, I at least expect there to be a lot more to learn about what sort of reasoning does and doesn’t make sense when establishing the relevant conditionals.)
Taking the EDT perspective at face value, there’s an intuitively strong argument that our actions are significantly stronger evidence about actions taken in pre-AGI civilizations than about actions taken by AIs. If I query my brain for intuitive predictions of distant actors, it sure seems like my own actions have more impact on my predictions of people in distant pre-AGI civilizations than on my predictions of misaligned AGI systems.
I think that intuition is worth paying attention to. But I think it’s less important than it naively seems. I’ll now go through a few different reasons why you might have it, and a few different counterarguments. I’ll mostly be talking about this from an EDT perspective.
Different options. You might think that we can’t affect the AIs much because our option-space is very different. We humans get to decide how much to invest in alignment vs. research on ECL, and AGI systems of the future get to decide what to use their light cone for (or something like that). However, this concern seems much reduced by retreating to more abstract questions, like “Should I adopt a policy of impartially optimizing for many different values?”. See the post on asymmetric ECL for a more detailed description of what that abstract question could look like. See footnote for a caveat.1
Abstract reasoning vs. human quirks and biases. When we do decision-theoretic reasoning, we’re doing a type of reasoning where our conclusions correlate with AGIs’ conclusions. If I didn’t know anything about decision theory, and learned that humans upon extensive analysis thought that ECL “works”, then I would think it more likely that AGIs also would think ECL “works”. Analogously: If I didn’t know anything about math, and learned that humans thought that the Riemann hypothesis was true, then I would think it more likely that AGIs would think the same thing.
I think this establishes some basic plausibility that our actions might correlate with AGIs’. On the other hand, it’s less clear that this makes sense when you are the one studying decision theory or math, and the one making the decisions.
If you’ve discovered a lot of good arguments that the Riemann hypothesis is true, then you have already conditioned on the existence of those arguments, which is what influences your belief that AIs will conclude that they’re true. You get no further information based on whether you utter the words “the Riemann hypothesis is true”.
You can make a similar argument in the decision-theoretic case: If you’ve discovered a lot of good arguments that ECL “works”, then that’s what informs your beliefs about what the AIs will do (and that’s something you’ve already conditioned on). When you then act according to ECL, or not, you’re not getting further evidence about where the decision-theoretic arguments point. You’re just learning whether those arguments led to action within your human brain, with all its quirks and biases. And that’s much more evidence for what evolved creatures do than what AGIs do.
I’m currently unconvinced by this argument. For one, even before making a decision, you can also observe a lot about how the quirks and biases of your human brain are interpreting and interacting with the arguments. (Similar to the tickle defense in smoking lesion.) Which means that you also don’t get that much new information about your quirks and biases in the act of making your decision. That puts the quirks and biases on a similar footing with the decision-theoretic reasoning.
I also have an intuition that goes further, which I won't be able to fully describe in this paragraph. But to gesture at it: It seems to me like it’s a mistake for your decision theory to make decisions in order to provide “good news” about parts of your cognition that aren't responsive to decision-theoretic reasoning (such as your human-specific quirks and biases.)
See more here.
AGIs will know about us. The AGIs will naturally have more knowledge of us than we have of them, e.g. they might know how ECL-ish the pre-AGI civilization in their part of the universe was. If you’re an “updateful” EDT agent, then evidence of your counterpart’s actions that doesn’t come from your own actions will typically reduce the evidential power of your own action. So, from their perspective, their actions have little acausal influence on our choices. I think it’s correct that this would reduce correlations if AGIs are updateful. However, if they are sufficiently updateless, they would not take this knowledge into account when assessing their acausal influence — so this consideration mostly shifts our concern to sufficiently updateless AGIs.
See more here.
AGIs will have a deeper understanding of decision theory. I think there are two different versions of this concern, each with separate responses.
AGI’s understanding of decision theory will make us predictable, in the sense that it will understand the principles that we use to make decisions so deeply that it will know what we choose before it makes its own decision. Thus, it will not see its own decisions as giving any evidence for ours.
The response to this is similar to the one just-above: If it’s sufficiently logically updateless about this, then it would not take this knowledge into account when assessing its acausal influence over us.
AGI’s reasoning about decision theory will be so different that it correlates very little with us. If you think this is true, it seems like you’re committed to the idea that AGIs will reason about decision theory in a way that has barely any connection to how we reason about it. But if you expect AGI to be good at reasoning about decision theory, and you think that our own reasoning correlates little with their reasoning, then that would suggest that you’re pessimistic about us reaching any correct conclusions about decision theory. If this is your view, I agree that you should be pessimistic about doing ECL with the AIs. (If nothing else — because all of this reasoning about ECL has little chance of being correct, anyway.)
In the rest of this post, I expand a bit on these points. (Though this topic is very confusing to me, and I can’t promise that the expanded version will be very enlightening.)
There’s also an appendix on How to handle uncertainty? — when we’re uncertain about how much acausal influence we should perceive ourselves as having on various groups.
In more depth…
Abstract reasoning vs. human quirks and biases
When we’re reasoning about decision theory, it seems like we’re doing a type of reasoning that both pre-AGI actors and philosophically ambitious AGIs should be doing. Ultimately, we’re trying to learn what the best way of making decisions is, whatever that means. If we’re doing that well, our conclusions about those abstract questions should correlate with the AGI’s conclusions — by virtue of us both being correct.2 Let’s call this “decision theoretic reasoning”. (Read those quotes as scare quotes — I don’t love the name.)
But there are also many factors that influence our judgments that wouldn’t apply to AI systems. For example, maybe we would irrationally reject ECL because our social instincts kick in to save us from acting crazy. Or maybe we’ll irrationally accept ECL because of wishful thinking and because we want an excuse to act nicely towards everyone. Insofar as factors like these determine our decisions, our decisions really only correlate with humans and human-like species. Let’s call this “human-specific factors”.
When you learn about whether a human does ECL, you get evidence about both these components. Thus, observing a human doing something for ECL reasons is evidence both that (i) good “decision theoretic reasoning” implies that ECL makes sense, and (ii) that “human-specific factors” push humans towards acting according to ECL. The first of these is evidence that everyone is more likely to do ECL; the second is only evidence that human-like actors are likely to do ECL.
But if you are a human, you are in quite a different situation from someone observing a human. You know a lot more about why you’re taking the actions you do.
If you’re an updateful EDT agent, then as you’re thinking about arguments for and against taking some ECL-informed action, you will condition on the existence of those arguments. So insofar as uncertainty about the existence of those arguments were your only source of correlation with certain other agents, you won’t see yourself as having any power over them. Since you continuously condition-away your insights about “decision theoretic reasoning”, it’s not clear that your actions ever give significant evidence about where “decision theoretic reasoning“ points.
But it seems that similar arguments apply to ~all sources of influence on your decision-making — including the human-specific factors. Consider, for a moment, an analogy to smoking lesion and the tickle defense (see for example here for an introduction to the problem).
The tickle defense goes: Once you notice an impulse to smoke, you’ve already received your evidence. Deciding to smoke or not gives you no additional information about your lesion.
Analogously, here: Once you notice an impulse to (not) engage in ECL (due to wishful thinking, or social desirability bias, or anything like that), you’ve already received your evidence about the “human-specific factors”. Deciding to engage in ECL or not after that doesn’t give you any additional evidence about those.
I have a few different reactions to this:
These kinds of arguments are very confusing, and it seems like they can’t work arbitrarily far. Until you’ve made your decision, you should maintain some uncertainty about what your decision will be, so there must be some factor that determines your choice that you don’t condition on.
Though some have argued that EDT agents sometimes can get confident about what their decisions will be before they make them, and that this indeed can have significant (and mostly bad) implications for their behavior. See footnote for examples.3
I’m not sure I find the tickle defense as stated fully persuasive, because it’s not clear to me that humans always do have the requisite type of self-knowledge.
Note that it’s easy to construct hypothetical programs that would lack self-knowledge: E.g. one that first queried something-like-EDT for a recommendation of whether to output “smoke” or not, and that then had some unknown-to-the-program probability of outputting “smoke” regardless of what EDT recommended. When this program observes its own output, it will learn new facts about itself. (A toy version of such a program is sketched at the end of this subsection.)
Despite this, I feel quite strongly that it’s correct to smoke in smoking lesion.
When I consider agents that lack some crucial types of self-knowledge (such as the agents in Abram’s Smoking Lesion Steelman or in my appendix What should you do when you don’t know your DT?) it feels quite clear to me that the agents who refuse to smoke are doing something wrong (and in some sense they would agree, since they’re not stable under self-modification).
In particular, it seems like they’re choosing the output of their decision theory to control something that is ultimately not under the control of their decision-theoretic reasoning.
Reflecting on this, it seems to me that correct reasoning about decision theory should mostly care about correlations from category (i) above (i.e. correlations between agents’ broadly-correct decision-theoretic reasoning) and mostly not care about correlations from category (ii) (i.e., correlations that stem from other sources of influences on our behavior).
I’m not sure what the best way to justify this is…
Perhaps it could be to instead use something like functional decision theory.
Perhaps it could be to be updateless about some types of information. (Such that you treat your actions as if they can provide substantial evidence about facts that you’re in some sense already confident about.)
Perhaps EDT does get the right answer, if one does the detailed analysis exactly right, and correctly understands what types of inputs it should and should not condition on.
Ultimately, the situation seems very confusing in a way that ties into deep decision-theoretic issues. It’s possible that some of it could be short-cutted, but it’s also possible that getting good answers to these questions would require substantial progress in decision theory.
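To make the earlier point about missing self-knowledge concrete, here is a toy version of the kind of program described above: it queries something-like-EDT for a recommendation, but with some probability (unknown to that subroutine) outputs “smoke” regardless. The noise rate and the particular recommendation are just made-up placeholders for illustration.

```python
import random

NOISE_RATE = 0.3  # the decision-making subroutine never gets to condition on this

def edt_recommendation():
    # Stand-in for "something-like-EDT"; what it actually recommends isn't the point here.
    return "don't smoke"

def act():
    recommendation = edt_recommendation()
    if random.random() < NOISE_RATE:
        return "smoke"  # the override fires, regardless of the recommendation
    return recommendation

# Only by observing its own output does this program learn whether the override fired,
# i.e. it learns a new fact about itself that it couldn't have conditioned on in advance.
print(act())
```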
High correlations might require updatelessness
The AGIs will naturally have more knowledge of us than we have of them. For example:
They might know how ECL-ish the pre-AGI civilization in their part of the universe was.
Or they might understand decision theory so well that they can perfectly predict what weak, shallow reasoners such as ourselves would conclude.
If they were to use (updateful) EDT, this would reduce their perceived influence over us, since their own decisions would be less informative for what we do than their other sources of evidence.4
One possible response to this is: If the AI is ever in a situation where it believes that it correlates a lot with people-like-us (e.g. because it doesn’t yet know how ECL-ish pre-AGI civilizations were, and it doesn’t yet understand decision theory much better than we do), and it thought that learning more would ruin this correlation, then that would be a significant cost of learning more. Instead, it would prefer to adopt a kind of updatelessness that preserved its ability to evidentially affect our actions (while still being able to benefit from new information in other ways).
And even if the AI is never in such a situation (e.g. because it is “born” with knowledge of how ECL-ish its preceding civilization was), it might still adopt a kind of updatelessness that would make it act as if it had been in such a situation. Either because that’s the rational thing to do in some deep sense, or — if updatelessness is more like “values” than something prescribed by rationality — just because it is so inclined.
Phrasing this differently:
If an agent is ever in a situation where it correlates a lot with people-like-us, and it finds ECL arguments persuasive, then it would adopt a policy of placing some weight on satisfying our preferences. (To gain evidence that we do the same.)
Any action that would predictably make its future self place less weight on our preferences would be bad according to its adopted policy (since that policy places some weight on satisfying our preferences). So it would place some value on avoiding such information, or even better, on self-modifying itself such as to be able to make use of such information without thereby making its future self place less weight on our preferences.
If a galaxy-brained AGI decides to follow an updateless policy akin to “what would I have committed to back when I was more ignorant about the world”, it would notice that it would have committed to placing some constant weight on our values, and act accordingly.
This response seems fairly plausible to me. But it would mean that the group of misaligned AI systems that are ECL-ish in the right way to cooperate with us is much smaller than we’d have otherwise thought — since they’d be required to be updateless in the right way.
Also: Note that this proposal might require the agents to be logically updateless to some degree, if it requires them to imagine a past where they didn’t yet understand decision theory very well.
What if AIs’ reasoning about decision theory is very different from ours?
Another possible issue is that the AGIs’ reasoning about decision theory will simply be so different from ours that we wouldn’t correlate much with it. I can imagine at least two different versions of this:
Something like: As you think about decision theory, you eventually encounter an insight (or a series of insights) X, that makes you correlate very little with agents who haven’t had insight X.
An especially plausible version of this is: An agent that deeply understands the structure and conclusions of all decision-theoretic reasoning that we’d be capable of doing may have too much knowledge about us to see themselves as being correlated with us.
Something like: As you think about decision theory, you eventually encounter an insight (or a series of insights) X, that changes the structure of your correlation in a way such that you’re no longer incentivised to benefit agents without insight X, even if you correlate with them in some ways.5
The hand-wavy counter-argument here is just that: If we can reason well enough about decision theory to sometimes reach correct conclusions, and AGI also can do that, then that suggests some minimal correlation. But I don’t think this establishes a large correlation.
Another counter-argument is to repeat the point about updatelessness. It might be the case that a superintelligent AI wouldn’t perceive itself as having much acausal influence on us, except that a past self decided to become updateless at a time when it perceived itself as having acausal influence over us.
Appendices
What should you do when you don’t know your DT?
Here’s an exercise that’s relevant for a few different problems: What should an agent do if it’s uncertain about what decision theory (DT) it’s following? There are a few different operationalizations of this, and they all give very different results.
Imagine the setting: prisoner’s dilemma (PD) against a clone, of the particular form “Button C sends $3 to your clone, button D sends $2 to yourself.”
50% chance of your recommendation being followed
Here’s one operationalization:
Inside your brain, there’s a little EDT module, which reasons about its recommendations in an EDT fashion.
Inside your brain, there’s a little CDT module, which reasons about its recommendations in a CDT fashion.
They both submit their recommendation to a randomization module that follows each recommendation with 50% probability. (This randomization is independent of your clone, so you might end up taking different actions.)
When you play PD against a clone, the CDT module will defect, ‘cause that’s what CDT does.
What will the EDT module do? It will reason:
Regardless of what I recommend, my opponent’s EDT module will recommend the same thing.
Both my recommendation and my opponent’s EDT module’s recommendation have a 50% chance of being followed.
So recommending “C” nets me a 50% chance of $3 (if my opponent follows EDT) and recommending “D” nets me a 50% chance of $2 (if I follow EDT). The former is better, so I recommend “C”.
So the EDT module’s recommendation is the same as if it controlled the whole agent!
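To make the arithmetic explicit, here is a minimal sketch (my own illustration, not from the setup above) of the expected selfish payoff the EDT module assigns to each recommendation in this case. It counts both channels (my own button press and my clone’s), so the totals differ from the increments quoted in the bullet above, but the comparison comes out the same way.

```python
# Payoffs are the selfish dollar amounts from the setup:
# pressing D gets me $2, and my clone pressing C sends me $3.
def my_payoff(my_action, opponent_action):
    return (2 if my_action == "D" else 0) + (3 if opponent_action == "C" else 0)

def ev_of_edt_recommendation(edt_rec):
    cdt_rec = "D"  # the CDT module always recommends defecting
    total = 0.0
    # Each agent independently follows its EDT module with probability 0.5.
    for i_follow_edt in (True, False):
        for opp_follows_edt in (True, False):
            my_action = edt_rec if i_follow_edt else cdt_rec
            # My clone's EDT module recommends the same thing as mine.
            opp_action = edt_rec if opp_follows_edt else cdt_rec
            total += 0.25 * my_payoff(my_action, opp_action)
    return total

print(ev_of_edt_recommendation("C"))  # 2.5
print(ev_of_edt_recommendation("D"))  # 2.0 -- so the EDT module recommends "C"
```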
50% chance of your recommendation being asked
Consider the same situation as above, except the randomization happens first, and only then does your brain ask either the EDT or the CDT module.
In this case, when your EDT module is consulted, it can deduce that its decision is going to matter! But it still doesn’t know its opponent’s randomization. So if it’s purely selfish and not updateless, it will reason:
Regardless of what I recommend, my opponent’s EDT module will recommend the same thing.
So if I choose “C”, there’s a 50% chance that my opponent’s EDT module is in charge and gives me $3.
But if I choose “D”, I’m guaranteed to get $2.
That’s better! So I’ll pick “D”.
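The same style of calculation works for this case (reusing my_payoff from the sketch above): now my EDT module fully controls my action, but my clone’s active module is determined by an independent coin flip.

```python
def ev_of_edt_choice(my_choice):
    total = 0.0
    # My clone's active module is EDT or CDT with probability 0.5 each.
    for opp_module in ("EDT", "CDT"):
        # The clone's EDT module mirrors my choice; its CDT module always defects.
        opp_action = my_choice if opp_module == "EDT" else "D"
        total += 0.5 * my_payoff(my_choice, opp_action)
    return total

print(ev_of_edt_choice("C"))  # 1.5
print(ev_of_edt_choice("D"))  # 2.0 -- so this (selfish, updateful) EDT module defects
```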
Ignorance of what your decision-module does
Now let’s consider the most pathological case: The case where your own decision theory doesn’t know what it’s doing, as it's doing it.
Concretely:
Your brain contains both a world-model-module and a decision-algorithm-module.
Your decision algorithm uses your world-model to make decisions, but your world-model is 50/50 on whether the decision-algorithm uses CDT or EDT.
This means that your world-model will be deducing facts about your decision-algorithm from observing your own actions.
(Though note that if it’s been alive for a while, it should already have been able to deduce this. I ignore that here.)
Prisoner’s dilemma
In the PD-case:
CDT will defect, as always. More money is more money!
If you implement EDT, then:
EDT will query the world-model for what the world looks like if the decision-module outputs “C”.
The world-model might reason: “Well, CDT would defect. So if you output ‘C’, then that must mean that you’re EDT. Then I can conclude that EDT outputs ‘C’ in cases like this. The expected utility in this world is 0.5 × $0 [for the world where the opponent is CDT] + 0.5 × $3 [for the world where the opponent is EDT] = $1.5.”
EDT will query the world-model for what the world looks like if the decision-module outputs “D”.
There will be some complicated arguments for how this informs you about whether your opponent’s EDT module is likely to output “C” or “D”, but ultimately, the payoff will be at least $2.
So EDT will also defect.6
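Here is a rough sketch of that comparison, under the same payoffs as before and the simplifying assumption (as in the bullets above) that outputting “C” reveals to the world-model that I run EDT, while for “D” we only use the guaranteed $2 as a lower bound.

```python
def payoff_to_me(my_action, opponent_action):
    return (2 if my_action == "D" else 0) + (3 if opponent_action == "C" else 0)

def ev_of_output(my_output):
    if my_output == "C":
        # CDT would never output "C", so this output reveals that I run EDT,
        # and that EDT outputs "C" in situations like this one.
        # My opponent is still 50/50: their CDT defects, their EDT outputs "C" like me.
        return 0.5 * payoff_to_me("C", "D") + 0.5 * payoff_to_me("C", "C")
    else:
        # Outputting "D" is compatible with either DT; whatever the opponent does,
        # I keep my $2, so use that guaranteed amount as a lower bound.
        return payoff_to_me("D", "D")

print(ev_of_output("C"))  # 1.5
print(ev_of_output("D"))  # 2.0 (a lower bound) -- so EDT also defects here
```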
Smoker’s lesion
Consider a smoker’s-lesion-style situation where:
Everyone enjoys smoking, but has a much stronger desire to avoid cancer.
Smoking has no causal impact on cancer risk.
There are genes that will dispose you towards both CDT and cancer.
Everyone starts out 50/50 on whether they are EDT or CDT agents.
In this case, CDT agents (uncertain about whether they are EDT or CDT agents) will smoke, because they enjoy it and see no reason not to.
Because of that, not smoking is strong evidence that you’re an EDT agent. So EDT agents (uncertain about whether they are EDT or CDT) will avoid smoking.
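Here is a minimal sketch of that reasoning. All the specific numbers (gene and cancer probabilities, utilities) are made up for illustration; the only structural assumptions are the ones above, plus the claim that CDT agents end up smoking and EDT agents end up refraining.

```python
# Genes dispose you towards both CDT and cancer (illustrative numbers).
p_gene_given_cdt = 0.9
p_gene_given_edt = 0.1
p_cancer_given_gene = 0.8
p_cancer_given_no_gene = 0.1
u_smoke, u_cancer = 10, -1000  # enjoying smoking is worth far less than avoiding cancer

def p_cancer(p_gene):
    return p_gene * p_cancer_given_gene + (1 - p_gene) * p_cancer_given_no_gene

# Suppose, as argued above, that CDT agents smoke and EDT agents don't. Then an agent
# who is 50/50 on its own DT treats its action as evidence about which DT it runs:
ev_smoke = u_smoke + p_cancer(p_gene_given_cdt) * u_cancer   # smoking => "I'm CDT"
ev_refrain = 0 + p_cancer(p_gene_given_edt) * u_cancer       # refraining => "I'm EDT"

print(ev_smoke, ev_refrain)  # -720.0 vs -170.0: the EDT agent refrains
```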
Here’s an amusing situation:
Let’s give this agent an opportunity to pay 1 penny to find out whether they are an EDT or a CDT agent before they decide about smoking.7 I think that:
The CDT agent will pay for this information, because they will reason:
There’s a 50% chance that I’m a CDT agent. In that case, this information won’t change my behavior.
There’s a 50% chance that I’m an EDT agent. In that case, this information will cause me to smoke (because I no longer have to signal to myself that I’m an EDT agent). I enjoy smoking far more than a penny. So I should pay for the information.
The EDT agent will, reasoning similarly to the smoking decision above, refuse to pay, because refusing is evidence that they’re not a CDT agent (which is something they really don’t want to be).
Now let’s change the offer to instead give this agent an opportunity to either decline, pay 1 penny for the information, or pay 2 pennies for the same information.
The CDT agent will still pay 1 penny for the information, since the option to pay 2 pennies for the information is strictly worse.
The EDT agent will want to avoid paying 1 penny for the information, since that will tell them that they’re a CDT agent. But they also prefer paying 2 pennies for the information over declining the offer. After all, paying 2 pennies is just as good for signaling that they’re not CDT agents — but they’ll still get to buy the information of what type of agent they are. Which they value — since they think that if they are an EDT agent, this information will allow them to smoke, which they value far more than 2 pennies.
Conclusion
The agent in the first example seems pretty sensible.
The agent in the second example seems slightly less sensible. It would self-modify to become updateless before the dilemma starts, if it could. Also, it’s not really an example of someone who doesn't know what their DT is. They learn what their DT is before they make the decision — they’re just uncertain about their opponent’s DT.
The agent in the third example seems absolutely pathological to me. I don’t want to be an agent like that — and indeed, I think that agent would self-modify to something more sensible given the opportunity. (As long as that self-modification didn’t accidentally signal that they were a CDT agent.)
How to handle uncertainty?
For many of the above considerations, I’m very uncertain about how convinced I should be by them. Do I have almost as much acausal influence on AIs as I have on humans; or do I have near-0 acausal influence on AIs? I don’t know what I would decide on after thinking about this for longer.
What’s the right way to handle this uncertainty? I think there are two plausible perspectives: bargaining-based approaches and expected value.
On bargaining-based approaches, you can imagine all the perspectives starting out with “budgets” proportional to your credence. (E.g. a credence of X% could correspond to that view getting X% of your money+time, or maybe an X% probability of deciding all your decisions.) Then the different views can bargain between themselves to locate some place on the Pareto frontier that distributes gains-from-trade in a fair way. So if we assign 20% credence to the proposition that we correlate similarly much with AGIs and pre-AGI civilizations (who share our values), maybe making AI ECL-ish should be the top priority for 20% of our resources.
On expected-value based approaches, I think the most natural proposal is to:
Have each view assign numbers to how much we correlate with various different groups.
Then compute the expected correlation with each group as the mean of all those correlations, weighted by the credence assigned to each view.
This proposal might differ significantly from the above, because views which assign higher correlations will generally dominate. Most notably: CDT-ish views (which don’t think we have any ability to correlate with other agents) will be totally ignored in sufficiently large worlds, as explained in the evidentialist’s wager.
But the same phenomenon also plays out on a smaller scale. Views that assert higher correlations than others, or correlations with a significantly wider range of people than others, will often dominate views that don’t. (C.f. section “A wager in favor of higher correlation” here.) Consider the following example:
One view is that our correlation with any EDT actor is quite large (maybe because it accepts some views about updatelessness and how to handle self-knowledge that I gesture at above). Let’s say that this view maintains that we correlate 50% with pre-AGI civilizations (who share our values) and 50% with misaligned AGIs.
A different view says that you must be more similar to an agent before you correlate with them. Maybe it says that our correlation with pre-AGI civilizations is 10%, and our correlation with misaligned AGIs is 1%.
If we assign equal credence to both views, that would suggest that the all-things-considered number for pre-AGI civs is (50% + 10%)/2 = 30% and for misaligned AGIs is (50% + 1%)/2 = 25.5%.
Thus, even though we started out with equal credence on a 1:1 ratio and a 10:1 ratio, the ratio of expected correlations is much closer to the former (in particular, 1.18:1), because the view asserting a 1:1 ratio also thinks that the correlations are larger.
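A minimal sketch of that calculation, using the illustrative numbers from the example:

```python
views = [
    # (credence, correlation with pre-AGI civs sharing our values, correlation with misaligned AGIs)
    (0.5, 0.50, 0.50),  # "we correlate a lot with any EDT actor"
    (0.5, 0.10, 0.01),  # "you must be much more similar before you correlate"
]

expected_pre_agi = sum(credence * pre_agi for credence, pre_agi, _ in views)
expected_agi = sum(credence * agi for credence, _, agi in views)

print(expected_pre_agi, expected_agi)   # ~0.30 and ~0.255
print(expected_pre_agi / expected_agi)  # ~1.18 -- much closer to 1:1 than to 10:1
```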
Just like in the example, I think this pattern will tend to favor views that say that our correlation with AGIs and pre-AGI EDT:ers aren’t too different from each other. If a view says that our cognition must be close to others’ along many dimensions in order to have a strong correlation, then that would coincide with having a small average correlation with large groups like “pre-AGI EDT:ers”. (Because most people in those groups wouldn’t be relevantly similar.) Whereas views that think that details of the cognition don’t matter much will tend to both assign higher correlations to groups like “pre-AGI EDT:ers” and assign higher correlations with misaligned AIs.
I’m not sure whether the bargaining-based approach or the expected value approach is better here. My meta-solution is to assign some credence to the bargaining-based approach and some to the EV-based approach, and then use the bargaining-based approach to aggregate those.
As for what credences to use there… I’m generally skeptical of maximizing EV across very different ontologies, which e.g. means that I want to put some credence on “ECL just doesn’t work” that doesn’t get swamped by the evidentialist’s wager. But it seems less objectionable to maximize EV within an ontology — like for different EDT views that propose differently large correlations.
(Another issue, which I don’t talk about here, is how asymmetric deals work under uncertainty about your cooperation partners’ beliefs. Seems tricky!)
Note that deciding to retreat to that abstract question is also a decision, which could be motivated by either abstract reasons or reasons that are more unique to our situation. The main reason why “retreat to the abstract question” would be a good decision is that it would be evidence that other agents do the same — which mainly applies if we have abstract reasons to retreat to the abstract question. In practice, this feels plausible. For example, both this footnote and the paragraph it features in argue that you should retreat to an abstract question without making any reference to the specific options we face.
The degree to which this is compelling depends on the degree to which you endorse some sort of “realism” about the true instrumental rationality. For example, do you believe that intelligent beings are about as likely to converge to a particular decision theory as they are to converge on their description of mathematics or physics? I personally think that decision theory seems less convergent than these other topics, in the sense that I’d be less surprised if different agents ended up disagreeing quite a bit on reflection. But it still seems like reasoning about decision theory has a lot of structure to it — I suspect there’s not that many wildly different end-points, and if some intelligent actor reaches a particular conclusion, I think that’s non-negligible evidence that others will reach a similar conclusion.
Paul’s point number 5 here. (“If an EDT agent becomes almost sure about its behavior then it can become very similar to CDT and for example can two-box on some Newcomb-like problems.”)
Caspar Oesterheld’s discussion in section 6.4 here of Eells’s argument that EDT two-boxes in Newcomb’s problem.
Section 5.5 (“Street-Crossing Scenario: Avoiding Evidentialist Excess”) in Good and Real discusses the reverse problem: that becoming confident that you’ll choose good options will (incorrectly) make you believe that any option you pick will be good.
Based on the reasoning in Asymmetric ECL (summarized here), there would not be any mutually beneficial deal that both parties are incentivized to take if their perceived acausal influence over us was too low.
For example: Perhaps an agent with insight X can exclude agents without insight X from their ECL cooperation without providing any evidence that agents without insight X will exclude those with insight X. In other words: If an agent with insight X does a form of ECL that only benefits those with insight X, perhaps that’s evidence that agents without insight X will do a form of ECL that benefits everyone, including those with insight X. And not significantly less evidence for that than if the agents with insight X had done “normal” ECL.
Then, conversely, if we (without insight X) decide to do a form of ECL that benefits all agents that do ECL, perhaps that doesn’t provide evidence that agents with insight X will benefit us.
I don’t know what insight X could be and I don’t want to claim that this particular structure of correlation is likely. But I notice that there’s nothing about my understanding of decision theory that implies that an insight like this couldn’t exist.
Unless the model expects to encounter a high-stakes ~Newcomb’s problem later on. If so, it will think it’s really good news that it’s EDT! So it might select “C” to get the good news that it’s definitely not a CDT agent.
Let’s assume that they will forget about the decision they make before they choose whether to smoke or not, to ensure that they don’t learn anything about their DT even when not paying. They will only remember something if they pay to learn what they are.