Circling some basic confusions:
- A bunch of ethics bottoms out in “do what the idealised version of you would do”
- And many people’s hope for the long-term future is “do a ton of reflection, get a ton of advice, get corrected not only on epistemic errors but also on values, such that we act on better values in the end”
- Moral progress to date seems good to me
- But I have an intuitive mistrust of idealised me
- And if I could wave a wand right now and become who I’ll be after my values change, I’m not sure I would
- (And this is somewhat common – cf. discussion of value drift over time within EA)
- (And I can’t clearly price values change into EV-based decision theories, if the change is sufficiently basic/fundamental, or if the experience is otherwise transformative? A rough sketch of the problem is below.)
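One minimal way to frame the worry (my own illustrative setup, not drawn from the sources below): standard EV reasoning scores an action with your current utility function, but an action that rewrites that utility function can be scored in at least two ways, and they can disagree.

$$
\mathrm{EV}_{\text{now}}(a) \;=\; \mathbb{E}\big[\,U_{\text{now}}(\mathrm{outcome}(a))\,\big]
\qquad \text{vs.} \qquad
\mathrm{EV}_{\text{after}}(a) \;=\; \mathbb{E}\big[\,U_{\text{after-}a}(\mathrm{outcome}(a))\,\big]
$$

If a = “step into the values-change machine”, U_now may rank it low while U_after-a ranks it high, and EV theory by itself doesn’t say which utility function gets to do the scoring.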
(What are my actual questions? Some attempts:)
- How do I know if values change was good?
- By whose lights?
- How would I become confident enough to step into the values-change machine/take the wise AI advisor’s advice?
- In some sense, I think this values change will be good. So why am I squeamish about it now?
- How do you think sensibly about fundamental changes in your values, actually?
Presumably relevant work (haven’t read all of it)
- The philosophical literature on transformative experiences — mostly LA Paul: https://plato.stanford.edu/entries/transformative-experience/
- Joe on idealisation: https://joecarlsmith.com/2021/06/21/on-the-limits-of-idealized-values
- Opus suggests: Agnes Callard’s Aspiration, Susan Wolf on moral saints, Korsgaard
Some disentangling
- Values change: one possible disentangling
- Epistemic correction. Lots of supposed values differences are just differences in beliefs about the world.
- Conceptual clarification. You care about some X; you later learn that you were confused about its extension, and so what-you-value shifts.
- Deep values change. Something on a basic level changes.
- Shallow values change. From personal taste (you like jazz now) to smaller moral changes.
A better taxonomy?
- Changes to first-order values — probably very rare
- Changes to things like: inclusion criteria, features we take to be morally salient, relevant reasons, etc
- Epistemic updates — including tactics as well as facts about the world
So you’re nervous about values change. What can you endorse?
- Improved consistency/reasoning, maybe? Less cope. Truer to your stated values.
- Epistemic updates (modulo infohazards, if those are actually real)
It’s fine to be nervous about waving a wand/pressing the values change button
- Wand/button change is instant, and slow change is generally better
- And it’s worth distinguishing the general “values change is usually good” from “this particular process is one you should submit to”
- Also, it would be fine if we just had bad intuitions here — we often do