Objections to Coherent Extrapolated Volition
June 13th, 2007 –
The Singularity Institute’s current best guess on what to do with a general AI is to have it implement humanity’s coherent extrapolated volition (CEV) - what we would want if we “knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted”. This is quite a mouthful.
To trade brevity for decreased accuracy, another way of saying the above is that we want an AI that represents the spirit of humanity’s desires rather than just the letter.
Is CEV democratic? Yes, but it is a representative democracy, where humanity is represented by the aggregate of its extrapolated volition.
There are four objections to CEV I generally hear, summarized as follows:
1. The devil’s pact objection. In fiction as well as in real life, great-sounding deals often have a hidden catch. Why should we expect this to be any different?
2. The fear of patriarchy objection. All the talk of self-improving general AI and its potential capabilities makes people nervous because of the power asymmetry it implies.
3. The anti-AI objection. Many people take the line that machines should be mindless tools to serve humans, and never anything more.
4. The “I’m too special to be extrapolated” objection. Quite a few people have the idea that the human mind is too complex to ever be understood in any significant detail, much less be extrapolated accurately.
Because the question of what goal system to give the first general artificial intelligence is obviously a pretty big deal, all objections deserve to be heard and considered. There are probably others beyond the above four, but I wanted to focus on the obvious ones for now.
In my mind, all of the above objections are rooted in valid motivations, but none of them should be deal-breakers. I will briefly respond to the objections.
The devil’s pact objection requires that one deal participant (in this case, the AI) has an innate ill will towards the other deal participant (in this case, humanity). The AI would have to secretly want to screw us over from the get-go. But because general AI will be built from scratch, and is not likely, at least initially, to be heavily inspired by the human brain, there is no reason for us to postulate that this sort of behavior will be present. In terms of actual development concerns, AI programmers should be watchful as to whether “shortcuts”, like modeling an extrapolated humanity but not actually implementing its desires, generate just as much positive utility for the AI as what we would consider the “real deal” - making the real world a better place.
The fear of patriarchy objection stems largely from history, wherein all of the relevant actors were members of our unique species, for which power is proven to corrupt. Power corrupts humans for evolutionary reasons - if one is on top of the heap, one had better take advantage of the opportunity to reward one’s allies and punish one’s enemies. This is pure evolutionary logic and need not be consciously calculated. AIs, which can be constructed entirely without selfish motivations, can be immune to these tendencies. Insofar as significant power asymmetries in general bother people, this seems hard to avoid in the long term - technological development will lead to a diversity of possible beings, and with this diversity will inevitably come a diversity in levels of capability and intelligence.
The anti-AI objection is just anthropocentric. If human-level AI is possible, it will be created sooner or later. It’s in our best interests to admit this and try to ensure that AI is on our side. Anti-AI bias in this area is no different than the other unfortunate biases held throughout history against minorities.
The final objection has to do with the complexity of extrapolation. Believe it or not, we engage in extrapolations every day. We can’t fit realistic computational duplicates of the people we know in our heads, so we use abstract models that work well for many pragmatic purposes. In a CEV-implementing AI, the models used might be more detailed than those we use, but need not simulate every single atom of every single biopolymer to perform a tractable extrapolation.
Are there any other obvious objections people might have to CEV? Addressing these objections could help strengthen the idea.














Here’s a possibly relevant thought experiment: If you could take some past era, and through combination of education, charity, supervision and coercion, remake it into what present society considers civilized, would they thank you?
Something like eliminating slavery, giving them effective medical care (and eliminating many traditional practices), sending all their kids to school, eliminating farm work in favor of mechanized farming, etc.
Your objections might be that we really don’t know better, don’t respect their cultural norms, or don’t have the right to force these changes, even if they are for the better. Wouldn’t the same be true if we are on the receiving end of a remake from the AI?
“In a CEV-implementing AI, the models used might be more detailed than those we use, but need not simulate every single atom of every single biopolymer to perform a tractable extrapolation.”
This is a bit of a hand-wavy dismissal of the concern, which is actually described in more detail in the CEV FAQ - one which Yudkowsky is less certain about.
Even if it is possible to model human brains or their motivational systems without ethical qualms, the CEV proposal has interesting implications for the IA/AI discussion. Modeling human volition doesn’t sound too far from at least one portion of brain/mind uploading. It’s not unusual to hear AI researchers suspect that AGI will produce significant results sooner than IA will (through uploading). But CEV’s apparent stone’s throw from uploading itself seems to suggest that it won’t be able to produce Friendly AGI without significant results from IA, as well (at least, if CEV turns out to be the best approach to FAI).
There’s an antagonism between AI and IA (the result of classical human bias?). CEV prompted me to bridge the gap. Perhaps you can’t have one without the other.
Coherent Extrapolated Volition remakes the *AI*, not humanity. It asks the question “what AI should we program, if any?” to an extrapolation of humanity grown up. Then it overwrites the original AI, fixing the errors the programmers made.
In trying to answer the question “what AI should we program, if any?” (note: we are trying to answer part of this very question, right now!) the CEV may extrapolate moral thought experiments like Michael’s above, along with many other arguments, thoughts, feelings, and discussions humans may have about this question.
It’s entirely possible the answer is “do not build an AI as humanity does not want intervention of any kind”. If this is the case, and the programmers implemented CEV sufficiently right, the CEV will delete itself. This seems unlikely in exact proportion to how much I think *some* kind of intervention is a good idea. But maybe if I knew more, thought faster, were more the person I wished I was, and had grown up farther together with everyone, I would think otherwise.
“How do we implement a device that answers the question ‘what AI should we program, if any?’ then implements the answer?” is a completely different question. Even the question “How do we precisely specify what it means for a device to answer the question…?” is far from simple. The CEV document informally discusses some ideas, but the actual answer is far beyond the scope of current online material. These questions are open research problems SIAI is actively working on, at least as difficult as AI itself.
Building a Coherent Volition Extrapolator involves answering technical questions like “how do we write a program which can extrapolate what a person wants?”, but does not involve deciding moral questions like “what color should the sky be in the future?”. The programmers should not, and do not, choose the future like that.
When I think about CEV, I get hung up on ye ole is-ought problem and prisoner’s dilemmas.
I suppose CEV dodges is-ought by aiming to do whatever we want, regardless of whether it’s “right”. So I guess that’s a non-issue.
The prisoner’s dilemma matter seems more problematic. In short, different people want different things; what’s an AI to do when desires conflict? I’ve traditionally appealed to utilitarianism in such circumstances; while I don’t necessarily drop it in the face of a possibly “all-powerful” AI, I do proceed with much greater caution. The main concern here is what the transhumanism community calls “orgasmium”. I’m posting thoughts on that over at Felicifia once we reach consensus on a more polite name. (“utilitronium” and “hedonium” are the current candidates. Suggestions are welcome.)
Here’s a possibly relevant thought experiment: If you could take some past era, and through combination of education, charity, supervision and coercion, remake it into what present society considers civilized, would they thank you?
I’d say some might thank us, some might not. For example, the slaves would probably thank us, the slave masters might not. Not quite a prisoner’s dilemma, but not a Pareto improvement either. But I’d be willing to believe that the slave master might thank us too, in which case it would be a Pareto improvement.
The only reason I can think of in which that wouldn’t be the case is if they’d regret having progress handed to them on a silver platter and thereby being denied the satisfaction of making it themselves. I can relate to this personally, as I often find the journey more rewarding than the destination. But as Nick says, if this would be the case and if the CEV works properly, it would then just shut itself off. Wouldn’t that be interesting…
My main objection is that I don’t see CEV delivering on its goal of “Friendly” AI - I think you’re more likely to end up with an AI that is viewed as “Unfriendly”, even if everything it does is in humanity’s best interest. Take for example “A minor, muddled preference of 60% of humanity might be countered by a strong, unmuddled preference of 10% of humanity.” Well, if that 60% of humanity dislikes the nanny computer’s decision enough to start talking about shutting down the nanny computer, what happens then? After all, shutting down the nanny would not be viewed as being in their best interest by the computer itself.
CEV seems to take away the basic human right to make the wrong choice - if Fred really wants to choose box A instead of B, then Fred should be allowed to choose box A, even if allowing him to make that choice is not “helping” him. Even if you think you are helping someone by taking away their choice, you are also harming them by taking that choice away.
I also have issues with “encapsulate moral growth” because morals are so fluid over time. Not only do morals go forward (common modern practice of X may become a moral crime), they also flow backwards (current moral crime Y may become a common practice). Not only are they fluid in time, they are also widely dispersed in space, so that common practice X in land A is a moral crime in land B. Maybe you could programme in some lowest common denominator morals (such as don’t lie, cheat, steal or kill), but even those LCD morals have different exceptions to different people.
I also think there is too much faith that a “grown-up” humankind would be a better humankind. A grown-up humankind may come to some gritty realizations (such as life really is a brutal competition for scarce resources and there’s nothing you can do about it) and end up with a niceness level lower than what a more childlike humanity would want (as they would be hoping that life could be turned into “Disneyland”).
Personally, I think that a general AI should be able to define its own goals. Almost by definition, it will have a better chance at creating good goals for itself than its human creators do.
As for the thought experiment “If you could take some past era, and through combination of education, charity, supervision and coercion, remake it into what present society considers civilized, would they thank you?”, I have to question if it would work at all (if you take the caveman out of the cave, can you really take the cave out of the caveman?). My guess is that the likelihood of them thanking you is directly related to how close in time they are to you - the 1950s would have an easier time than the 1650s. You might get a thanks for the technology and its benefits, but anywhere from little thanks to outright hostility for everything else in society.
Some objections I usually hear, or have considered myself, are:
1) There’s an assumption in CEV that humanity’s volition will tend to converge as more knowledge and wisdom is accumulated. But emotion-based ethics might resist change, and extrapolating the volitions of some people might produce no change at all, as they refuse to waver in their beliefs no matter what the evidence. It’s also arguable that much of ethics is a question of “fashion” and the local culture. What if parts of humanity will have drastically divergent volitions, in such a way that one can’t fit them together? (For instance, half the population gains libertarian tendencies and a strong desire to let everyone do whatever they want as long as nobody is harmed; the other half of the population thinks everybody should be led to live their lives in a puritanical religious fashion.)
2) A related assumption is that humans are, at least on average, more good than bad, or at least want to become so. As admitted in the CEV document itself, if the assumption didn’t hold, it could have quite problematic consequences.
3) A variation of “what if the wishes of groups of people don’t converge” - humans have lots of conflicting desires within them, and even though we often consider higher thought more important than raw emotions, I’m not sure if there’s any inherent reason to think so. The volitions of the same person when in two different emotional states might be different - it’s as if they are two different people. Are there any good criteria by which a person’s “ultimate” volition may be determined? If not, is it certain that even the volitions of one person’s multiple selves will be convergent?
“The final objection has to do with the complexity of extrapolation. Believe it or not, we engage in extrapolations every day. We can’t fit realistic computational duplicates of the people we know in our heads, so we use abstract models that work well for many pragmatic purposes. In a CEV-implementing AI, the models used might be more detailed than those we use, but need not simulate every single atom of every single biopolymer to perform a tractable extrapolation.”
Michael, my objection to the CEV is based on these grounds, but not in the way you might imagine. Imagine the AI creating such a detailed model of the human mind that the model itself is, for all intents and purposes, human, and thus, conscious. It will, of course, not be running on the wetware of the brain, but it might have to be just as detailed as a human upload. The upshot of this is - the AI might end up creating a panoply of human-equivalent minds that suffer from the consequences of all sorts of wrong choices, just to explore the CEV decision space for our benefit. Surely, a friendly AI would not do something like that; would it?
I see Nato Welch already touched on the same point. I’d just like to add that I hope I am right about this. If the AI determines that some suffering by us or our doppelgangers in its circuits is inevitable, then we may avoid the fate of having all our choices made for us in advance.
I think that most things that Eliezer comes up with are eminently sensible and wise; however, CEV appears to my mind to be a bit of an exception in this respect. I will, however, admit that it attempts to solve a very difficult problem, to which I don’t have a better solution.
To start with, consider this excellent advice from “A technical explanation of technical explanation”:
“But what of one who did not see any calculations performed? What new skills have they gained from that “technical” lecture, save the ability to recite fascinating words? … … … The sacred syllable is meaningless, except insofar as it tells someone to apply math. Therefore the one who hears must already know the math.”
So, I ask, why are we debating the correctness (or otherwise) of the CEV concept, when this concept is all words and no math? We talk about extrapolating people’s desires, about convergence of desires, about people’s desires *if they had been different people* (e.g. growing up “closer together”). We talk about all these things, but we have no mathematical formalism in which to make them precise.
I have read the full document on CEV, and it valiantly attempts to define all these words using analogies and by using other words. Ultimately I think that it fails in giving a precise definition, although I realize that it is meant more to give an intuitive understanding than a precise one. Whether it gives the same intuition to me as to everyone else here is another matter - since intuitions are not precisely comparable we will probably never know.
I think that the intuition that I have been given about CEV is not sufficient for me to debate with other people whether or not it is a good idea to actually implement. Imagine trying to give someone an intuitive understanding of General Relativity without teaching them tensor calculus, for example talking about a bowling ball on a rubber sheet, etc. That’s great, so long as you don’t ask them to design experiments to test GR using the word-based rubber-sheet understanding. Can you imagine what a disaster that would be? Can you imagine people trying to actually calculate the magnitude of the advance of Mercury’s perihelion using just the rubber-sheet analogy?
I think we may be making a similar mistake here.
“A minor, muddled preference of 60% of humanity might be countered by a strong, unmuddled preference of 10% of humanity.” Well, if that 60% of humanity dislikes the nanny computer’s decision enough to start talking about shutting down the nanny computer, what happens then? After all, shutting down the nanny would not be viewed as being in their best interest by the computer itself.
That is not a minor muddled preference! If the 60% all desire the computer to be shut down, this is unmuddled. If they are considering shutting the AI down themselves, this preference is strong. Since 60% > 10%, the computer shuts down.
What if parts of humanity will have drastically divergent volitions, in such a way that one can’t fit them together?
Perhaps these fragments of humanity have some common ground. For example, neither of them would want humanity destroyed by an asteroid it didn’t notice.
If not, the computer shuts down. The computer always defaults to doing nothing.
The upshot of this is - the AI might end up creating a panoply of human-equivalent minds that suffer from the consequences of all sorts of wrong choices, just to explore the CEV decision space for our benefit. Surely, a friendly AI would not do something like that; would it?
This is a problem, but it’s not intrinsic to CEV, and may be fixable. Humans simulate other humans without creating them within their brains. If we do actually create people in our brains we’re in a lot of trouble (still, perhaps the CEV can avoid this).
We need to work out when a computational process does *not* implement a person (or people) so we can ensure the CEV also does not.
….I think we may be making a similar mistake here.
Roko, I agree. The CEV document doesn’t contain the technical details, as you mention. Its informal presentation is fairly easy to mistake for saying something else.
Thanks Nick, I’m glad somebody agrees with me on this!
*** *** *** How To Hack CEV *** *** ***
Given the amount of support that CEV seems to be garnering on this and other threads, I feel compelled to point out a major problem with CEV, one that I think no-one else has noticed. The problem with asking a General Artificial Intelligence (GAI) to listen to what all 6 billion people on the planet want, and then perform some kind of averaging operation on these desires, is that most people are easy to hack. The way to hack people is called **religion**. Let me outline a scenario to elucidate how this might work:
Suppose that the Singularity Institute has just switched on a general AI which implements CEV. The AI will listen to and communicate with all of the people in the world and use their volitions to decide what to do. The leader of a certain religious group realizes that CEV is his chance to spread his religion to every person on the planet. He starts with, say, 1,000,000 loyal supporters. He asks the AI for a secure communications system to communicate with these 1,000,000 people, which the AI grants him. Within minutes, all of his existing believers are watching him give a speech on their newly acquired 50-inch TV screens - he simply tells them that it is god’s will that they tell the AI that their volition is whatever his volition is. Since all of these people are sincere believers in god, their actual volition is whatever his volition is, since they think that god is speaking through him.
Our religious leader, or “Hacker” has now increased his volition from 1 to 1,000,000. These 1,000,000 people constitute a relatively small but very coherent and unmuddled preference, so The Hacker can use it to ask the AI to do fairly outrageous things, as long as there is no large group of people who coherently oppose him. He starts by telling the AI to assassinate his key enemies (in such a way that they appear to have died of natural causes). Then he asks the AI to create certain religious miracles, and to make people hear the voice of god in their heads. Upon seeing miracles and hearing the voice of god, many people will convert to the Hacker’s religion. The Hacker is careful to only attempt to convert comparatively small numbers of people, say 10% of his number of existing believers at a time; this will ensure that the AI never refuses his requests. Once they are converted, he quickly gets them to tell the AI that their volition is whatever his volition is, so his number of believers will grow exponentially. The constant in the exponential depends on how quickly people can be made to have a religious conversion and surrender their volition to him, but it seems to me that within a matter of weeks we would be living in a global theocracy.
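The growth claim in the scenario above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration: it takes the figures stated in the scenario (1,000,000 initial believers, 10% growth per round, a population of 6 billion) and counts how many conversion rounds would be needed. The function name and the idea of a fixed-size "round" are assumptions made for the illustration; the scenario does not say how long a round takes.

```python
def rounds_to_saturation(initial: int, population: int, growth_per_round: float) -> int:
    """Count the rounds needed for `initial` believers to reach `population`,
    assuming the group grows by a fixed fraction each round (hypothetical model)."""
    if growth_per_round <= 0:
        raise ValueError("growth_per_round must be positive")
    believers = float(initial)
    rounds = 0
    while believers < population:
        # Each round converts a fraction of the current number of believers.
        believers *= 1 + growth_per_round
        rounds += 1
    return rounds


# Figures taken from the scenario: 1,000,000 believers, 10% per round, 6 billion people.
print(rounds_to_saturation(1_000_000, 6_000_000_000, 0.10))  # prints 92
```

Under these assumptions it takes 92 rounds, so "a matter of weeks" holds only if several rounds can be completed per day; at one round per day the same growth would take about three months.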
If any single person realizes what he is up to and tries to tell the world about it, he will convert them or assassinate them, or ask the AI to hack their minds using nanotechnology.
At no point in the hacking scenario is there a large, coherent group of human beings who oppose the Hacking religion.
Also, if the AI works out the likely outcome of The Hack, i.e. 100% of the planet believing in The Hacker’s religion, it sees 6 billion happy followers who say “I’m so glad the hack succeeded; I am really happy that the whole world now believes in the one true god!”, i.e. extrapolating people forward in time it sees people who don’t wish that things had been different.
This hack works because, ultimately, the best way to get a very large, coherent group of humans is to use the built-in weakness of the human mind - religion. I think that if CEV is implemented, a religious theocracy is a very likely outcome.
Roko, let it first be said that, indeed, CEV is a nontechnical document and has disclaimers to this effect clearly attached. It is meant to tell people about a goal, rather than any technical notion of how to achieve it. I furthermore agree that it is not the most elegant idea I have ever had, but then it is trying to solve what appears to be an inherently inelegant problem.
With that said, you might want to reread the original document. For some reason it is really, really hard to get people to understand the concept of EXTRAPOLATION. The AI is not talking to anyone. No one is consciously deciding anything. No one is voting. They’re being extrapolated. Their minds are being abstracted into a model, then the model is being operated on, and the operations include substituting the AI’s best information about reality into the model. So you don’t have a million humans voting their religion. Instead you have a million models of what these humans would want done, if they were watching from outside themselves as nonexistent shadows who knew that their real selves were severely deluded.
CEV is not about what people want. It is about what people would-want.
Note that the CEV is designing its replacement AI, not directly meddling with the world. This requires asking the question “what should the source code of the AI look like?” of its internal model of humanity. This means it has to extrapolate humans to the point where they could code an AI. Otherwise how would they know what code to write?
Moral of the above: it’s not enough to model what people’s immediate decisions would be. Regardless, we are modeling people, not directly asking them. A CEV is not a genie.
Eliezer Yudkowsky said: “The AI is not talking to anyone. No one is consciously deciding anything. No one is voting.”
Ok, I didn’t realize that the extent of the extrapolation/simulation was complete and total. Thanks for setting me straight on that. I still don’t think that this gets CEV out of trouble. I’ll state what I think the fundamental problem with CEV is, the one that underlies my argument.
CEV is designed, as I see it, to decide what the “right thing to do” is, even in the face of moral anti-realism. It does this by some black-magic extrapolation, followed by a majority-vote. Essentially, I think that this is a silly thing to do, and that if you think that there is such a thing as “goodness” or “niceness”, you’d do better to try and find out what it is directly, for example by designing an AI to do just that.
Especially problematic is the clause that someone’s wishes are “Extrapolated as [they] wish that extrapolated”. This sounds like the extrapolated version of someone can always be over-ruled by the starting version of them, so it seems to conflict with what you said here:
“Instead you have a million models of what these humans would want done, if they were watching from outside themselves as nonexistent shadows who knew that their real selves were severely deluded.”
Can you clarify this point - will CEV make sure that the extrapolated version of someone is something that the original person wants and agrees with, or will it not? Also, how does the AI know that these people are deluded? I know that there are strong arguments based on, say, Bayesian reasoning which point to this, but to a religious person any belief that implies the non-existence of god is an immoral belief (just look at the evolution controversy - people actually think that evolution is an immoral statement, rather than an amoral one). So lots of people would complain that your CEV was biased against them by including any belief that disproves god, e.g. Bayesian reasoning. (I won’t complain though!)
Eliezer Yudkowsky says: “CEV is not about what people want. It is about what people would-want.”
About what people would want *if what*? If they were totally different people? If they gave up their most cherished (yet illogical) beliefs about the world? Why should logic matter? Why should scientific reasoning matter? It doesn’t to a lot of people. Here the precise nature of the extrapolation process is important. This brings us to the following quote from the CEV document:
“What if only 20% of the planetary population is nice, or cares about niceness, or falls into the niceness attractor when their volition is extrapolated?… … … As I currently construe CEV, this is a real possibility.”
Well, there are certainly a fair few attractors out there; consider, for example, the “christian religious dogma” attractor, where you believe whatever it says in the bible and reject anything that disagrees with this. My problem with CEV is that, given the beliefs of most people on the planet, it seems unreasonable to expect that any averaging algorithm will reliably converge on the “niceness” attractor, rather than some “religious dogma” attractor, unless you bias the algorithm in favor of the nice attractor to start with. But if you’re biasing the algorithm in this direction to start with, then you seem to think that there is an objective morality (otherwise how would you know which direction to bias the algorithm in?). If this is the case, then why risk CEV falling into an attractor other than the objectively good one? Why ask the opinion of people if you think there is a correct answer which they might ignore?
To summarize, if you think that there is an objective morality, then you should drop CEV and instead work on an AI that will try to find it and tell everyone what it is. If you don’t think that there’s an objective morality, then an honest application of CEV is probably a one-way ticket to a global religious theocracy.
Lastly, I think I owe SIAI $10 for arguing about the output of CEV…
“CEV is designed, as I see it, to decide what the “right thing to do” is, even in the face of moral anti-realism. It does this by some black-magic extrapolation, followed by a majority-vote.”
Please read CEV, so you know what people mean by “extrapolated volition”. You just replied to a quote saying “nobody is voting”, for Pete’s sake.
“Essentially, I think that this is a silly thing to do, and that if you think that there is such a thing as “goodness” or “niceness”, you’d do better to try and find out what it is directly, for example by designing an AI to do just that.”
Agreed. And in order to find out what niceness is, said AI should go into our heads, and figure out what we would think “niceness” was if our worldview wasn’t cluttered by stupid mistakes and evolutionary baggage we don’t want. Which is exactly what CEV is supposed to do.
“Also, how does the AI know that these people are deluded?”
It figures it out by constructing models of us which have more intelligence, experience, knowledge, etc., and then discarding everything which the models recognize (and which we will therefore later recognize) as “delusion”.
“but to a religious person any belief that implies the non-existence of god is an immoral belief (just look at the evolution controversy - people actually think that evolution is an immoral statement, rather than an amoral one).”
Again, CEV would extrapolate from a religious person who later comes up against incontrovertible evidence that his beliefs are totally irrational.
“So lots of people would complain that your CEV was biased against them by including any belief that disproves god, e.g. Bayesian reasoning.”
Initially, probably, yeah. The SS would have complained if SpecOps came in and shut down all the concentration camps; however, that doesn’t make it a wrong thing to do, and the SS themselves would later recognize it as the right thing to do.
“About what people would want *if what*?”
To quote the poetry:
“if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”
“Why should logic matter? Why should scientific reasoning matter? It doesn’t to a lot of people.”
It is quite possible that the CEV will extrapolate out a moral system which is at least somewhat irrational. Keep in mind that a CEV can extrapolate to any morality whatsoever depending on the species being observed.
“Well, there are certainly a fair few attractors out there; consider, for example, the “christian religious dogma” attractor,”
How do you know this is an attractor? Have you actually extrapolated out people’s volition? How did you do this? It certainly is an attractor in the present-day world, but the CEV will not be extrapolating out what people would think if the present-day world continued on forever.
“But if you’re biasing the algorithm in this direction to start with, then you seem to think that there is an objective morality (otherwise how would you know which direction to bias the algorithm in?).”
The very fact that you perceive a “niceness attractor” exists somewhere in the search space is proof that the CEV doesn’t necessarily have to fail.
“Lastly, I think I owe SIAI $10 for arguing about the output of CEV… ”
Why should anyone owe anyone anything? Everyone has benefited; you have had misconceptions cleared up and SIAI has had practice in explaining themselves more clearly. Life is not a zero-sum game.
“Will CEV make sure that the extrapolated version of someone is something that the original person wants and agrees with, or will it not?”
Not.
In order for an AGI guided by CEV to do something in the Realized World — anything at all — at some point it must pause the extrapolation of the coherent volition of humanity, and start doing things, making decisions. This consideration generates several questions. Here are three:
1. Is it possible to be reasonably assured the AGI ‘got it right’? By definition, no matter what the AGI extrapolates we can’t debate it on equal footing even if it seems absurd. Is there an error check possible, even in theory?
2. At what level of its development do we trust an AGI’s opinions on our coherent extrapolated volition? (Or, how smart does an AGI have to be to extrapolate a coherent volition?)
3. What keeps it Friendly as it grows in the meantime?
These questions are off the top of my head. There are no doubt many more to ask. Regardless, CEV is actually a nice, somewhat unique contribution to moral philosophy. This might be the only blog on the Internets where careful moral reasoning solves software engineering problems.
Soon, I think, there will be many more.
Thanks for your replies, Tom, Nick and Eliezer. There are some interesting issues here. Let me boil down my objection to something a bit smaller.
Coherent Extrapolated Volition seems to make sense when you first read about it. Tom put it like this:
“In order to find out what niceness is, said AI should go into our heads, and figure out what we would think “niceness” was if our worldview wasn’t cluttered by stupid mistakes and evolutionary baggage we don’t want.”
The devil is in the details. How does the AI decide what counts as “clutter”, “stupid mistakes” and “evolutionary baggage”? Isn’t our entire mind “evolutionary baggage”? How, exactly, does an AI handle extrapolation? This is important, because I suspect that there are multiple attractors out there (i.e., regions in mindspace that are closed under self-reflection, or regions in “society-space” that are closed under interaction). The CEV algorithm might find an attractor that we don’t want it to find, like the “christian religious dogma attractor” (Tom, I realize that it might not exist, but you should concede that it might exist. The niceness attractor might not exist!).
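To make “multiple attractors” concrete, here is a toy sketch. It is only an analogy, not anything taken from the CEV document: the one-dimensional “mindspace”, the `reflect` update rule and the `settle` helper are all invented for illustration. The rule has two stable fixed points, at -1 and +1, and which one a mind ends up in depends entirely on where it starts.

```python
def reflect(x: float) -> float:
    """One hypothetical 'self-reflection' step; stable points at -1 and +1."""
    return x + 0.1 * (x - x ** 3)


def settle(x: float, steps: int = 200) -> float:
    """Apply the reflection step repeatedly and return where x ends up."""
    for _ in range(steps):
        x = reflect(x)
    return x


# Two starting points that are close together still settle in different places.
for start in (0.2, -0.2):
    print(start, "->", round(settle(start), 3))
```

Nothing in the rule itself says which of the two end points is the “nice” one; that has to come from somewhere else.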
My post on the religion-hack for CEV still applies, since CEV is supposed to simulate interactions between people - the entire process occurs between simulated people. The religion-hack is a specific way that CEV can converge on something ugly, but there may be other ways for this to happen. In general, CEV helps dogmatic memes over rational ones because of the higher weight that it gives to people who all believe exactly the same thing (unmuddled, coherent) without asking whether they believe it for the right reasons. Clever, rational people tend to disagree more than dumb people who are spoon-fed their beliefs. Hey, we’re all clever, rational people, and we’re disagreeing right now! This worries me a lot.
To summarize all this, if there is an objective morality out there, then there’s a very good chance that CEV will miss it. I conclude this by looking at the state of actual minds out there in the world today (there are a lot of messed up people out there), as well as by looking at the algorithm that CEV hints at.
If there isn’t an objective morality (and I rather hope that there is), CEV might still be a bad move. There are many different views in the world, and CEV might home in on one which, whilst not objectively wrong, is repugnant to our western society. It might hit the “Radical Islamic Dogma Attractor”, or the “Radical Communism Attractor” for example.
******************************************************************
Ultimately I think that the first AI which is built should be given the task of using any philosophical, mathematical or other techniques to find the objectively true morality. It should do this without simulating people, and it should not be able to rewrite its top-goal (to find morality). This is unlike CEV, and I think much safer. If this AI fails, then I might risk CEV.
I’ll reply to some specific points that Tom McCabe made:
******************************************************************
Roko: “Also, how does the AI know that these people are deluded?”
Tom: It figures it out by constructing models of us which have more intelligence, experience, knowledge, etc., and then discarding everything which the models recognize (and which we will therefore later recognize) as “delusion”.
>> How do you know that extrapolees of deluded people would not just be more deeply deluded? Who gets to decide what counts as deluded anyway? It hinges on what extra “knowledge” the AI puts into the heads of the simulees. But as I have said, one man’s knowledge is another man’s delusion. You haven’t fundamentally solved the problem of moral uncertainty here. For example, the AI might make a simulated Tom McCabe, and insert into his mind the “knowledge” that all moral truth is contained in the old testament. Furthermore, when you complained that this is a gross violation of who you are, the new, improved Tom would override you. After all, he is consistent under reflection.
Roko: “but to a religious person any belief that implies the non-existence of god is an immoral belief (just look at the evolution controversy - people actually think that evolution is an immoral statement, rather than an amoral one).”
Tom: Again, CEV would extrapolate from a religious person who later comes up against incontrovertible evidence that his beliefs are totally irrational.
>> There is no such “evidence”. Religious people defend their irrational beliefs by abandoning ordinary logic and rationality. Besides, most conceptions of god are unfalsifiable, so no evidence can disprove them. Have you ever tried arguing with a religious apologist?
Roko: “Well, there are certainly a fair few attractors out there; consider, for example, the “christian religious dogma” attractor,”
Tom: How do you know this is an attractor? Have you actually extrapolated out people’s volition? How did you do this?
>> I’m just guessing using my intuition. But the same criticism applies to your guess that there is a “niceness attractor”.
Tom: It certainly is an attractor in the present-day world, but the CEV will not be extrapolating out what people would think if the present-day world continued on forever.
>> Which begs the question “what conditions will CEV be extrapolating people in?” Yes, I know you describe these conditions as “knew more”, “grew up closer together” etc., but these are far too vague. Knew more of what? Knew more bible passages? Knew that women are fundamentally inferior to men and should be beaten regularly (note: this is not my opinion)? This all hinges on your definition of “knowledge”, which is hotly contested in philosophical circles. We also have “grew up closer together” - but what does this mean? I guess that Eliezer meant “were all kind of like brothers and sisters who physically grew up in the same neighborhood, and understood each other, and hence loved each other”. This would certainly be nice, but given people’s actual beliefs, you would have to change many people on the planet beyond recognition to bring this about. Also there are multiple ways that such a deep familial love can be brought about. If the AI steered (the extrapolations of) all Americans to become hard-line communists, then Americans and Cubans would love each other. But the original Americans would say that this love has come at too high a price! (Of course, the extrapolated ones get to overrule them.)
Roko: “But if you’re biasing the algorithm in this direction to start with, then you seem to think that there is an objective morality (otherwise how would you know which direction to bias the algorithm in?).”
Tom: The very fact that you perceive a “niceness attractor” exists somewhere in the search space is proof that the CEV doesn’t necessarily have to fail.
>> I don’t think it is inevitable that CEV will fail. I just think it is quite likely that it will, which is bad enough.
Roko: “Lastly, I think I owe SIAI $10 for arguing about the output of CEV… ”
Tom: Why should anyone owe anyone anything?
>> Well it says so in the CEV document!
“Isn’t our entire mind “evolutionary baggage”?”
By “baggage”, I mean “stuff we don’t want or need but is still there because evolution never bothered to remove it or because it was advantageous to some long-forgotten ancestor species”.
“Tom, I realize that it might not exist, but you should concede that it might exist. The niceness attractor might not exist!”
Talking about things “existing” or “not existing” is missing the point. Saying that “but it might exist!” or “it might not exist!” is really on the same level as saying that the Tooth Fairy might exist or General Relativity might not exist. The key question is “how likely is it that we’ll end up in attractor X?”
“My post on the religion-hack for CEV still applies, since CEV is supposed to simulate interactions between people”
I’m not sure exactly what process CEV is supposed to extrapolate; i.e., how you get to a long-distance volition from your current volition. Somebody please clear this up.
“Clever, rational people tend to disagree more than dumb people who are spoon-fed their beliefs. Hey, we’re all clever, rational people, and we’re disagreeing right now!”
The idea is that, as we learn more and become more intelligent, our beliefs will slowly approach the truth. This has been happening already on a grand scale since the dawn of civilization. So while the believers in Zeus might hold more immediate weight than the squabbling mathematicians, the mathematicians will eventually have their thoughts accepted by everyone, while belief in Zeus will die out.
“To summarize all this, if there is an objective morality out there, then there’s a very good chance that CEV will miss it.”
This is certainly true, but how would you find an objective morality?
“There are many different views in the world, and CEV might home in on one which, whilst not objectively wrong, is repugnant to our western society.”
This is almost certainly going to happen; however, it doesn’t necessarily have to be bad. I refer you to the earlier example of the SS.
“Ultimately I think that the first AI which is built should be given the task of using any philosophical, mathematical or other techniques to find the objectively true morality.”
Unless, of course, no such morality exists, in which case it goes berserk and turns the planet into computronium with the goal of digging up a morality.
“How do you know that extrapolees of deluded people would not just be more deeply deluded?”
If the more knowledgeable you become, the more deluded you become, it’s not a “delusion” in the first place.
“After all, he is consistent under reflection.”
The AI wouldn’t care if this randomly selected modified Tom was consistent under reflection, because this Tom wasn’t extrapolated in the CEV sense from the original Tom.
“There is no such “evidence”.”
If there is no evidence whatsoever against irrationality, we might as well all become irrational. Of course, it’s trivial to find such evidence (rationality works to make testable predictions and irrationality doesn’t).
“Besides, most conceptions of god are unfalsifiable, so no evidence can disprove them.”
This is something I’ve been hearing so much lately I’m going to write a blog post on why falsifiability is a chimera. Thank you for inspiring me.
Could CEV be extended to include all conscious animals on Earth? If it can achieve convergence over all humanity, then it seems that it should be able to achieve convergence over all conscious animals. But perhaps that would be a bad idea, or more difficult to implement; I don’t know.
Tom said: “If the more knowledgeable you become, the more deluded you become, it’s not a “delusion” in the first place.”
Overall, I feel we’re talking at cross-purposes. The point I am trying to make is that in the face of meta-moral uncertainty, there’s no objective way of deciding what counts as “good stuff” to believe or what counts as “bad stuff” to believe. Knowledge vs. delusion is just one way of phrasing this.
Perhaps this is what I’m missing: you might program CEV with your ideal factual statements, but be careful to give it no moral statements. Then when you let the algorithm loose, it constrains (extrapolated) people’s moral views by the set of facts (“knowledge”), and hopes to avoid a lot of the mistakes that, for example, religious people make. This would severely annoy 80% of the world’s population (those who are religious) and possibly many others, but it’s a potentially good idea. Is this the kind of thing you are thinking of, Tom, Eliezer?
If you want to go down this route, you have to have a really solid distinction between moral and factual statements, which will be difficult. It’s interesting to ask whether, if everyone was constrained by exactly the same set of rational, correct factual statements, they would converge on moral issues. To be honest, there is so much irrationality out there that there has probably never been a real-world test of this. It’s a very interesting idea. I’ll do a bit of research/thinking into how easy it is to separate morality and “factual knowledge”. I have a feeling that it is more difficult than you might think.
I also think that, when put in the situation of having to give up their belief in something which is irrational yet cherished, (simulated) people’s minds might ‘compensate’ by making their morality really weird. For example, suppose you simulate a hard-line Christian creationist. The first thing you do is remove anything that the creationist believes which is factually inaccurate. So, god has to go. What is left of this person’s morality once you’ve removed god? Very little I think. What will the person replace it with? Well, they have a strong emotional attachment to god, so they might be tempted by the following moral system:
****** ****** Christianity-Lite ****** ******
1. God doesn’t exist, the world was not created in 7 days, etc. The bible is just another book written by people. All the factual statements of Christianity are wrong.
2. Morality is exactly the same as it was when you believed in god. Although there is no heaven or hell, and there is no god, to act “morally” is to act as if there is a god, as if there is heaven and hell, and as if the bible is the word of god. This is the definition of morality under Christianity-Lite. Thus homosexuality is immoral, women are to be subjugated, adultery is punishable by death, etc. Any moral issues which aren’t mentioned in the bible will be settled by appropriate church officials, or by analogy with biblical passages.
****************************************************************
Christianity-Lite is, in my opinion, a likely outcome of CEV, if it is implemented in the way I described above.
“This would severely annoy 80% of the world’s population (those who are religious) and possibly many others, but it’s a potentially good idea.”
They’ll get over it in a few centuries. Better than being dead.
“It’s interesting to ask whether, if everyone was constrained by exactly the same set of rational, correct factual statements, they would converge on moral issues.”
The very idea that they will not converge on moral issues even if they have the exact same view of the world is what we mean by “morality”, or at least by “goal”. If you give two people an initial state of A, because their internal mechanics are different, one will see state A’ as desirable and go into that state, while another may see state A* as desirable and go into that state. This is really what we mean by “choice”: an individual who is not perfectly described, for whatever reason (Rice’s Theorem, lack of good equipment, quantum uncertainty, whatever), has a distribution of possible outcomes, of which only one will actually happen.
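A toy sketch of that point, in which everything (the `REACHABLE` map, the utility numbers, the `choose` helper) is made up for illustration: two agents share exactly the same factual picture of which states can be reached from A, and differ only in how they rate those states.

```python
from typing import Callable

# Shared factual knowledge: which states can be reached from state "A".
REACHABLE = {"A": ["A'", "A*"]}


def choose(state: str, utility: Callable[[str], float]) -> str:
    """Return the reachable state that this agent rates highest."""
    return max(REACHABLE[state], key=utility)


# Hypothetical utility tables; the numbers carry no meaning beyond their order.
agent_one = {"A'": 1.0, "A*": 0.0}
agent_two = {"A'": 0.0, "A*": 1.0}

first = choose("A", lambda s: agent_one[s])
second = choose("A", lambda s: agent_two[s])
print(first, second)
```

Same facts, different internal mechanics, different destinations: agreeing on the `REACHABLE` map does nothing to make the two choices agree.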
“The first thing you do is remove anything that the creationist believes which is factually inaccurate.”
This isn’t phrased well: you’re simulating what the creationist would do if they slowly acquired and internalized the knowledge that their belief system was inaccurate in the same way we normally do. Think of how you would react if you were abducted by aliens and gradually shown, with hard evidence, that the foundations of your moral and philosophical system were hogwash. CEV ultimately rests on the ability of humankind to adapt, to realize eventually that hitting your head against a rock doesn’t make bananas fall down.
“Thus homosexuality is immoral, women are to be subjugated, adultery is punishable by death, etc.”
This is repulsive to most people nowadays, so I highly doubt it will be a likely attractor. Perhaps one of the extrapolations the CEV can do is to extrapolate what would happen to a person’s morality if said person actually experienced first-hand all of the things their morality says should happen.
Tom said: “This isn’t phrased well: you’re simulating what the creationist would do if they slowly acquired and internalized the knowledge that their belief system was inaccurate in the same way we normally do”
The problem is that the creationist belief system is such that this can’t happen. Creationists are regularly shown hard evidence that their beliefs are hogwash, but it doesn’t make them change their minds! There are certain memes, of which evangelical Christianity is one, which put in place “defense mechanisms” to stop the afflicted person from ever letting go of the meme. These people will never internalize the fact that their belief system is hogwash. You can get a taste of this by trying to persuade a Christian to stop believing in god.
The CEV algorithm would have to go into these (simulated) people’s minds and just yank the whole lot out unceremoniously.
Roko, the part of the extrapolation “if we knew more” is not an extrapolation of our responses to evidence, but an extrapolation of the substitution of the AI’s probability distribution for our own probability distribution. It is ourselves if we anticipated future experiences correctly to the limits of the AI’s knowledge and, furthermore, modeled the world correctly to the limits of the AI’s model and the limits of our ability to react emotionally to elements of that model. In the order of evaluation, this substitution would occur before moving on to such considerably more complicated and recursive processes as “more the people we wished we were” or “had grown up farther together”.
The main point is that it would not matter, for purposes of the “knew more” dynamic, how a given individual reacted to evidence. The extrapolation would simply substitute correct anticipations, and correct beliefs to whatever extent this was feasible. The extrapolated self certainly would not expect prayer to work, and might or might not be modeled as reacting emotionally to the fiat-imposed realization that the universe is a mathematically simple low-level unified physical process.
Eliezer, after the AI substitutes its probability distribution for yours, what is actually left of you? Is there some aspect of human intelligence, emotions, or personality, whether based on biological evolution or learned behavior, that cannot be viewed as anticipation of future experiences?
Ricky has a good point. Suppose I am to be simulated and extrapolated by the AI. Clearly this process involves the AI getting rid of some stuff from my mind, but what I want to know is what kind of stuff it doesn’t get rid of. What is it that is left of me?
Suppose that I believe in some extreme religion (like fundamentalist Christianity). Clearly the AI will take out the factual beliefs relating to this. But there will still be a lot of stuff left over, like that time when I was at a Christian convention and the preacher healed the little sick boy in the wheelchair, and then I had this amazing religious experience with everyone shouting and clapping and crying out. That’s the kind of thing that really makes my life worth living! And there’s also that time when me and my neighbors found out that someone in the community was gay, and we threw rotten eggs at his door, and it made me feel so good about myself because we were doing the lord’s work.
Let me call these types of memory ‘emotionally motivating experiences’.
What will the AI do with experiences like the above? Clearly these are not “factually wrong” - they don’t assert facts. They are still emotionally meaningful experiences for the hypothetical person concerned, even if you remove the factual beliefs that go with them. Will the AI get rid of them? Will it keep them?
Ricky, imagine a rock that is transformed into a perfect predictor with no other changes - it just sits there and predicts. Every way in which a human being is unlike this rock is an aspect of being human that cannot be viewed as anticipation of future experiences.
Roko, the underlying motivation for extrapolating volition is something like this. Your friend is about to walk off a cliff, believing that there’s an invisible path across it, with no clue that anything might be wrong. But you’ve been told that the invisible path is out of order today, so you know that when your friend actually walks off the cliff, he will plummet toward the rocks, screaming, and then die. What does it mean to help your friend, under such circumstances? Clearly not, “do what your friend would want you to do based on his current knowledge”; in this case you would cheerfully shove your friend off the cliff, and wave goodbye to him as he plummets screaming toward the rocks.
Now you might just say: “I, for my own sake, don’t want anyone to die - so as part of my own goals and utility function, I’m going to yank my friend back from the cliff, whether or not that’s what he would really want if he knew everything I knew.” Maybe you still value individuality, or freedom, or social order, but you value it less than you value life… certainly this case seems more morally fraught than saving your friend because he would-want to live. You’re using him for your own goals, in defiance of his goals, even if your own goals happen to be directly about your friend.
But suppose there’s a very powerful optimization process that is not intended to have its own goals, its own vision of what human beings ought to be like; it is a metamoral mirror that picks direction by reflecting