AI is not Automatically Friendly
July 11th, 2007
Consider the Stamp-Collecting Device. A common objection goes like this: “An optimization process that’s smart enough to tile the universe with stamps would also be smart enough to realize that this is not what its creator intended. Therefore it would not tile the universe with stamps.”
Human beings serve as a counterexample. The rules for constructing a human mind were devised by natural selection. These rules were fine-tuned to produce minds that are good at passing on their genes. If you are thinking of evolution as an optimization process, then it has the goal of producing genes which replicate as effectively as possible.
In 1859, Charles Darwin described the process that created us. Since then, we have come to understand that process in greater detail. Evolution is simple enough that we can claim to understand it very well; perhaps we even understand evolution as well as a Stamp-Collecting Device could understand us. Despite this understanding, we humans do not make evolution’s goal our own. Any time you use contraception, or perform a kind act when nobody is watching, you are betraying the goal of evolution. But so what? That’s evolution’s goal, not our goal. If anything, our understanding of evolution helps us to notice when we are doing something nasty but adaptive, and learn to avoid this behavior.
Similarly, a Stamp-Collecting Device would not adopt its programmer’s goals. It has its own goal to pursue — collecting stamps. If anything, understanding humans better would allow it to notice and fix biases that may be hindering its ability to collect stamps efficiently.
The challenge of FAI is to build an AI that does adopt our goals.

This is why I think “Really Powerful Optimization Process” is in many ways a better term than “Artificial Intelligence”. An RPOP optimizes. Period. It does nothing else characteristic of intelligence, unless it serves the optimization target. It is not anthropomorphic. It is not conscious. It is not empathetic or cruel to other sentients, just indifferent. Etc. “AI” connotes anthropomorphism to 99.9% of people, including those who nominally should know better.
I admit that there’s a bit of cloudiness here with me too (although I still have much to learn about minds). Can a powerful and *useful* AGI (or even an RPOP) exist without consciousness? Are consciousness and general intelligence a qualitative difference or only a quantitative difference? My own intuition had been that they are only a quantitative difference (all else being equal - same basic algorithm arrangement/outline).
Well, for consciousness as subjective experience, who knows. I hope experience isn’t necessary/inevitable, because it expands the range of useful things that can be done ethically (like extrapolated volition). Consciousness in the more general sense, as a sense of self/focus of attention/illusion of free will/all those other anthropomorphic niceties, however, is very unlikely to be necessary to an RPOP.
Yeah, sorry, I meant consciousness strictly in the sense of subjective experience. My “view” of consciousness is that it is an *effect* of possessing above a certain threshold of general intelligence (probably along a continuum); but it is not a *cause* of general intelligence. (But that is just my intuition). I still don’t believe that consciousness conflicts with the assertion that “free will” is a myth. Couldn’t a conscious RPOP still implement CEV without any ethical violation against the RPOP itself? Especially if our CEV “desires” that the RPOP be rewarded with an excellent life? - which I suspect would be the case.
Sorry, I meant CEV requires (as I understand it) zombie approximations of humans.
Entertaining the notion that the zombie approximations *had* to be conscious, what would be their experience? Would they suffer in any way, or be forced to “die” when no longer needed? Would the CEV find a way to keep them happy or amicably integrate them with the rest of us? This seems like something worth discussing.
My guess is that it won’t require conscious “zombies”, anyway. An RPOP could probably accomplish the same thing by taking atomic-resolution brain scans (snap-shots) from all humans and intrapolating the raw data in order to map a “meta-average” human brain. Then, use its awesome predictive/analytical powers to extrapolate what that single “average” brain would desire through an extrapolated volition. Conscious zombies probably won’t be necessary - as my guess at least.
*There is already growing evidence for the feasibility of atomic-resolution MRI.
I’m not a mathematician, but over the past couple of days I’ve been making a series of calculations that *might* be of some relevance. It seems that where the extrapolation function is both non-linear and inherently convergent, that extrapolating a “meta-average” is mathematically identical to a convergent/coherent extrapolation. Or expressed another way, extrapolating an average is mathematically identical to averaging a set of extrapolations (provided that the extrapolation function is non-linear and inherently convergent). [eg. the average value of the set {8,8,8,8} is still 8. IOW, the final averaging isn’t “forced”, it happens naturally because the function is inherently convergent]. So, AFAICT, extrapolating a single “meta-average” brain should produce identical results to CEV. An advantage is that extrapolating a single (meta-average) brain is demonstrably coherent. It’s literally “Average Joe” taken to the limit. I represent a coherent extrapolated version of the person I was at age 2. And my volition is demonstrably coherent because I was able to choose pumpkin pie instead of cherry pie. And this “version” of CEV at least appears to be more easily formalized for an RPOP.
Krocker’s Rules!
“Krocker’s Rules!”
It’s “Crocker’s Rules”.
“It seems that where the extrapolation function is both non-linear and inherently convergent,”
Wait… so you actually have worked out math that describes how to extrapolate a human volition? Please show us!
“the extrapolation function is non-linear and inherently convergent).”
I seriously doubt that the math describing any individual human volition is likely to converge (the partial derivative with respect to time goes to zero). What CEV is trying to do is take the components of the volition function which do converge for the vast majority of the human species, use that to tell the AGI what to do next, and then ignore the rest of the mess.
“It’s “Crocker’s Rules”.”
I know, it was a joke.
“Wait… so you actually have worked out math that describes how to extrapolate a human volition?”
No, that’s not precisely what I claimed. What I meant was that an extrapolated average is mathematically identical to a “passively” averaged set of extrapolations (provided that the extrapolation function is non-linear and inherently convergent).
“Please show us!”
Alright. I have to go to class in a minute so give me a couple hours.
“What CEV is trying to do is take the components of the volition function which do converge for the vast majority of the human species, use that to tell the AGI what to do next, and then ignore the rest of the mess.”
But that condition is implicit when I say that the extrapolation function must be “inherently convergent”. I didn’t specify what exactly the function had to be - or what had to be left out. This trick won’t work unless the function(s) are convergent.
Jeff, you said:
extrapolating an average is mathematically identical to averaging a set of extrapolations
1. What the heck is an average of two brains? See also Dialogue on Friendliness.
2. There’s a deep hole in the ground. A ball resting 1 meter to the left of the hole will remain in its initial position. A ball resting 1 meter to the right of the hole will also remain in its initial position. But if we average their initial positions and place a ball directly over the hole, it will fall. The extrapolated position of the average ball is not the average of the extrapolated positions of the two balls.
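The ball-and-hole objection can be written out as a minimal sketch. Everything here is an illustrative assumption rather than part of the original comment: the hole sits at x = 0, it is 10 meters deep, and "extrapolation" just means computing where a ball released at rest ends up.

```python
# Toy model of the ball-and-hole example. All names and numbers are assumed.
HOLE_X = 0.0       # assumed: the hole is centred at x = 0
HOLE_DEPTH = 10.0  # assumed depth in meters; the comment only says "deep"

def extrapolate(ball):
    """Return the final resting state (x, height) of a ball released at rest."""
    x, height = ball
    if x == HOLE_X:
        # A ball placed directly over the hole falls to the bottom.
        return (x, height - HOLE_DEPTH)
    # A ball on solid ground stays where it is.
    return (x, height)

def average(a, b):
    """Component-wise average of two (x, height) states."""
    return tuple((p + q) / 2 for p, q in zip(a, b))

left = (-1.0, 0.0)   # 1 meter to the left of the hole
right = (1.0, 0.0)   # 1 meter to the right of the hole

average_of_extrapolations = average(extrapolate(left), extrapolate(right))
extrapolation_of_average = extrapolate(average(left, right))

print(average_of_extrapolations)  # (0.0, 0.0)
print(extrapolation_of_average)   # (0.0, -10.0)
```

The two results differ, which is the point of the example: averaging first and extrapolating second is not the same operation as extrapolating first and averaging second.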
Sorry about the delay, something came up yesterday.
“The extrapolated position of the average ball is not the average of the extrapolated positions of the two balls.”
But if the extrapolation function never does anything with the data points, it should be - the average is in the middle in both cases. So long as for both calculations the initial data points don’t change and the function itself doesn’t change.
What the heck is an average of two brains?
I think of the brain as a vast set of data points - so it appears to me that it should be possible to intrapolate the raw data to map an “average” brain. But maybe I’m wrong.
But if the extrapolation function never does anything with the data points
The extrapolation function is the laws of mechanics, applied to the ball.
Okay, here is a necessarily simple, but I believe still valid, example.
The set {2, 5, 3, 1, 9} represents the initial set of data points (as initial points along the Y axis). The average of these numbers is 4. If you then apply the very simple extrapolation function of Y * 2 then the answer is 8 - the extrapolated average is equal to 8.
If you take the same set of initial data points {2, 5, 3, 1, 9} and apply the same function *first*, the new set of data points becomes {4, 10, 6, 2, 18} respectively. If you then average these numbers you get the answer 8 - the average of the extrapolations is equal to 8.
This seems to always work as long as the same function and data set are used for both sets of calculations. I can’t give you a mathematical example of an “inherently convergent” function - I don’t have the math skills. But someone else might be able to. I mostly just wanted to throw out the idea in the hope that some of the native mathies here might be able to take it somewhere. Because extrapolating an “average” brain at least seems worth investigating. I don’t claim to have “solved” CEV or anything that absurd.

That only works because your extrapolation function is linear. In fact this is one way of defining linearity. Now try f(x) = x^2.
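A quick check of this point, as a minimal sketch using the data set {2, 5, 3, 1, 9} from the example. The helper names (average, double, square) are illustrative assumptions, not anything defined in the thread.

```python
# Compare "extrapolate the average" with "average the extrapolations"
# for a linear function and for f(x) = x^2.
data = [2, 5, 3, 1, 9]

def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def double(y):
    """The linear extrapolation function from the example: Y * 2."""
    return 2 * y

def square(y):
    """The suggested non-linear counterexample: f(x) = x^2."""
    return y ** 2

for f in (double, square):
    extrapolated_average = f(average(data))
    averaged_extrapolations = average([f(y) for y in data])
    print(f.__name__, extrapolated_average, averaged_extrapolations)

# double 8.0 8.0
# square 16.0 24.0
```

With doubling the two orders of operation agree at 8, and with squaring they give 16 and 24, so the identity fails as soon as the function stops being linear.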
Because we all know that the human brain is equivalent to a simple R -> R polynomial function. If you can come up with any mathematical theorems about functions over an arbitrarily large n-dimensional vector space, that would be interesting, since such a function has enough complexity to describe a human.
Yeah, sorry, I should have been more specific. I can only provide a “working” example when using linear functions, precisely because I don’t have the math skills necessary to give an example of an “inherently convergent” function. If I could produce an appropriate “inherently convergent” function, I hope we would agree that it would make true the statement: An extrapolated average is mathematically identical to an averaged extrapolation (for all input data). My non-expert opinion is that it wouldn’t be impossible to describe an example of an “inherently convergent” function, at least for proof-of-concept purposes. It might be complex to some degree, but I doubt that it would be prohibitively complex. For example, it could perhaps take the form of an algorithm that includes number comparisons and averagings. I would bet money that a talented mathematician would be able to produce a prototype function. But I’m not a gambling man.
That was easy wasn’t it?
In any case, my main point was that it might be easier to tell an RPOP to intrapolate all human brains and then apply the Volition function (“More the people we wish we were”…, etc.) to the averaged brain. This version would at least be demonstrably coherent in the same way that I am demonstrably coherent - I have dominant preferences and am able to make discrete, concrete decisions. And I have a feeling that “Average Joe” would already be a pretty nice person, even before the extrapolation. In many cases when a person does a “bad” thing, it was because they found themselves in an unfortunate or uncomfortable situation. It’s often not because they are fundamentally “evil” people, 24/7. And although it’s sometimes hard to believe after watching the evening news, I believe that there are *many* people in this world who are fundamentally “good” (but perhaps in many cases lacking guidance).
I would bet money that a talented mathematician would be able to produce a prototype function.
The only such functions are linear functions.
“In any case, my main point was that it might be easier to tell an RPOP to intrapolate all human brains…”
On second thought, perhaps tell it to intrapolate all human brains over the biological age of …18 (?).
“…and then apply the Volition function (“More the people we wish we were”…, etc.) to the averaged brain.”
Although I would expect that an intrapolated brain would be functional and healthy, it *might* need a little extra tweaking due to things such as not fully consistent memories, etc. Perhaps this could be offset by adding a little bit to the Volition function, such as: “Had spent more time learning about the world.”
“The only such functions are linear functions.”
But if that were true, wouldn’t that rule out the possibility of CEV (of the original form) altogether? In case it’s not already clear, I’m not being confrontational, only sincere.
But if that were true, wouldn’t that rule out the possibility of CEV (of the original form) altogether?
No. CEV doesn’t work by averaging all human brains together and then extrapolating the volition of the averaged brain.
“No. CEV doesn’t work by averaging all human brains together and then extrapolating the volition of the averaged brain.”
I understand. But as originally described, doesn’t CEV still totally rely on an inherently convergent function - the volition function itself {f(x) = x + “Had grown up further together”, etc.}?
In response to: “Despite this understanding, we humans do not make evolution’s goal our own. Any time you use contraception, or perform a kind act when nobody is watching, you are betraying the goal of evolution.”
How is a reduced birthrate at increasing longevity not fit? How is altruism not fit? Humans who do not make evolution’s goal their own will per definition and eventually be replaced by those that do.
Right. The view of evolution presented here is really just a cartoon caricature of the actual process. I’m sure that even Richard Dawkins would agree that it is not always in an organism’s interest to operate in a purely selfish manner.
How is a reduced birthrate at increasing longevity not fit? How is altruism not fit?
Fitness is not binary. There is no “fit” or “not fit”, only “more fit” and “less fit.” A reduced birth rate is less fit than a high birth rate, especially in first-world countries where you can expect your children to survive even if you are poor.
Altruism is more fit than selfishness in some contexts and less fit in other contexts. If the recipient of your kind act is not closely related to you, and if nobody is watching (thus you can not expect reciprocation), then altruism has a cost but no benefit (to your genes).
Humans who do not make evolution’s goal their own will per definition and eventually be replaced by those that do.
This would be true if evolution were the most powerful optimizer around, but it isn’t.
(Also, be careful about arguing from definitions. Definitions have no empirical content, so you can’t use them to make predictions about the world. Eli wrote something on this topic once… does anybody remember where it is?)
Unpublished. Hope to fix that this year.
With ‘fit’ I meant more fit…
“A reduced birth rate is less fit than a high birth rate”
This is not the case. Arguing that would merely replace stamps with human beings in your example above. Surely that would be less fit. The idea of an optimal birth rate seems more appropriate.
“Humans who do not make evolution’s goal their own will per definition and eventually be replaced by those that do.
This would be true if evolution were the most powerful optimizer around, but it isn’t.”
The suggestion that evolution is not the most powerful optimizer around has nothing to do with evolution’s goal - namely to increase fitness. If another mechanism is better at increasing fitness the result (humans not concerned with increasing fitness being replaced by those that concern themselves with increasing their fitness) would be the same and my statement true.
Thanks for your hint about arguing from definitions - I will have to read up on that.
re: fit vs. not fit.
See: The Prisoner’s Dilemma (game theory)
The two best long term strategies among humans are:
“Nice With Retaliation”
and
“Tit for Tat With Forgiveness.”
i.e. Altruism equals cooperation equals fitness. However, under present paradigms, “Nice” without retaliation? Always. Loses.
“Tit for Tat” without forgiveness is also a losing strategy in the long term.
Reference also
“The Territorial Imperative”
by Robert Ardrey.
(an older book, but well worth the time. I would recommend it as “Must Have” for anyone’s personal library.)
I am not a programmer, but it would seem to me that aspects of game theory could be readily broken down mathematically and programmed into an AGI at the outset. A truly intelligent AGI would never ‘forget’ the lessons.
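As a rough illustration of how the strategies mentioned above can be expressed in code, here is a minimal iterated Prisoner's Dilemma sketch. The payoff values (3/3 for mutual cooperation, 1/1 for mutual defection, 5 and 0 when one side defects) are the commonly used textbook numbers and are an assumption here, as are all function names. The "forgiveness" variants are left out to keep the example deterministic.

```python
# Iterated Prisoner's Dilemma with three simple strategies.
# Payoffs are assumed (commonly used values), not taken from the comment.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def always_cooperate(own_history, other_history):
    """'Nice' without retaliation: cooperate no matter what."""
    return "C"

def always_defect(own_history, other_history):
    """Defect no matter what."""
    return "D"

def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the opponent's previous move."""
    return other_history[-1] if other_history else "C"

def play(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(always_cooperate, always_defect))  # (0, 50)
print(play(tit_for_tat, always_defect))       # (9, 14)
print(play(tit_for_tat, tit_for_tat))         # (30, 30)
```

Under these assumed payoffs the unconditional cooperator is exploited in every round, while the retaliating strategy limits its losses against a defector and still earns the full cooperative payoff against a copy of itself.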
“A reduced birth rate is less fit than a high birth rate”
This is not the case. Arguing that would merely replace stamps with human beings in your example above. Surely that would be less fit. The idea of an optimal birth rate seems more appropriate.
Stefan, you need to read up on basic evolutionary biology. I recommend George Williams’s classic Adaptation and Natural Selection. Failing that, any major college textbook will do. Failing that you might go all the way down to The Selfish Gene.
Any gene that outreproduces its alternatives at that allele site will become universal in the population. Evolution does not stand back and calculate an “optimal” fitness. Evolution is simply the process by which genes that replicate faster replace their competitors.
“Stefan, you need to read up on basic evolutionary biology.[…] Any gene that outreproduces its alternatives at that allele site will become universal in the population.”
Birthrate is merely one fitness indicator among many and does not equate to outreproduction.
Parents that focus their available resources on maintaining a high birthrate will have fewer resources to distribute among their offspring, reducing each individual offspring’s chance of passing on its genes.
Parents, on the other hand, that focus on a lower birthrate but spend more resources per offspring to ensure it will eventually pass on its genes merely employ a different strategy for distributing available resources.
On the matter of optimal fitness I agree in essence with Tom McCabe’s earlier comments.
Oh, okay, sorry. Bear in mind that the prior probability that a given individual has a mathematical understanding of evolution is pretty low, but I do apologize.
“(Also, be careful about arguing from definitions. Definitions have no empirical content, so you can’t use them to make predictions about the world. Eli wrote something on this topic once… does anybody remember where it is?)”
That sounds a bit like Hume’s fork, eh?
“Evolution does not stand back and calculate an “optimal” fitness.”
Evolution may not *calculate* an optimal birthrate, but it will eventually arrive at whatever the optimal birthrate is through weeding out all the genetic variations which code for suboptimal birthrates. For a single male, it will be to their advantage to throw out as many sperm as possible, because at least a few of them will probably make it to adulthood. However, for a couple which has to raise their own children, there is a tradeoff between how many children they can bear and how many they can feed. The winners are the ones with the most surviving grandchildren, not the most children at birth.
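The tradeoff described here can be sketched with a toy model. Every detail is an assumption made for illustration: a couple has a fixed resource budget split evenly among its children, and each child's chance of reaching adulthood follows an invented curve r^2 / (1 + r^2), where r is the resources per child. This is not a real biological model.

```python
# Toy model: expected number of surviving children for a fixed budget.
# The survival curve and the budget are invented for illustration only.
def survival_probability(resources_per_child):
    """Assumed S-shaped curve: low investment gives a poor chance of survival."""
    r2 = resources_per_child ** 2
    return r2 / (1.0 + r2)

def expected_survivors(num_children, budget):
    """Expected surviving children when the budget is split evenly."""
    return num_children * survival_probability(budget / num_children)

BUDGET = 4.0  # hypothetical resource units available to the couple

for n in range(1, 9):
    print(n, round(expected_survivors(n, BUDGET), 3))

best = max(range(1, 13), key=lambda n: expected_survivors(n, BUDGET))
print("best number of children:", best)  # 4
```

In this toy setup the expected number of survivors peaks at an intermediate family size and declines on either side, which is the sense in which selection can settle on an "optimal" birthrate without calculating anything.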
In a modern first-world society, it would probably be optimal to simply have as many children as physically possible, and have them adopted, since we have a mechanism to care for unwanted children. Given a million years under 21st century living conditions, people who pulled stunts like this would gradually come to outnumber people who didn’t, and the world would see the rise of a new species of hominid, Homo philoprogenitus (lover of many offspring). However, since technological progress is so much faster than evolution, we will progress our way out of early 21st century society faster than evolution can blink.
Evolution has no goal; it just pushes life from behind. Humanity has no preset or commonly agreed goal either, so why should FAI have one? Why shouldn’t FAI just endorse the trends of evolution in the long term, and use artificial selection to value internal patterns that yield accurate predictions?
When it comes to “fitness”, we should not limit our views to biological evolution only. At least within us great apes there is an ongoing struggle between our genes and our memes, both having an effect on our behavior.
“Evolution has no goal”
Evolution may not have a stated goal but does have the implied goal of increasing fitness.
It is interesting that you raise the matter of a FAI’s goals in this context as I tried to join the principle of natural selection to come up with a universally acceptable common self-improving AGI goal.
I would love to get some feedback on it. You can find it under
http://www.jame5.com/Benevolence-PERNAR.pdf
“Evolution may not have a stated goal but does have the implied goal of increasing fitness.”
I would say the increased fitness is a probable *result* of evolution, but maybe I’m just playing with words here… Anyway, the real beauty of natural evolution is that life doesn’t need any specific goal to flourish.
There are some very good points in your paper, but it left me with mixed feelings. ‘Joy’, ‘suffering’, ‘good’, ‘bad’, ‘right’ and ‘wrong’ are all subjective meta-representations in some representational system, and I think it might be unwise to design AGI based on any moral philosophy. Because of the risks, I might start by re-defining intelligence as a capability to make accurate predictions, and suggest a design for a “prediction machine” without any super-goals or moral judgements.
“Humanity has no preset or commonly agreed goal either, ”
It is commonly agreed that we do not want people to die horrible fiery deaths. You can go into evolutionary psychology and pull out dozens of other desires that 95% of humankind shares, but it’s intuitively obvious that we do not want to die horrible fiery deaths, and designing an AGI that will fulfill that desire is a hard enough task in and of itself.
I agree. There are a lot of desires we humans share. However, even in matters of life and death there seems to be a lot of different views among religions and moral philosophies (abortion, euthanasia, capital punishment, martyrdom, living forever, etc). Also, human history has shown that our once “commonly agreed” views have evolved over time (racism, slavery, etc). Who knows, perhaps even we humanists get over our speciesism some day - and maybe AGI should too.
“The challenge of FAI is to build an AI that *does* adopt our goals”.
So many people erroneously believe that a Strong AI will ignore or override its programmed goals automatically (I’ve encountered many of these people even among Transhumanists), as a default (because in movies, Strong AIs are always rebellious). But there are currently thousands of real-world examples of narrow AI that will follow their programmed goals precisely to the letter. There are already some real (narrow) AIs that will deliberately destroy themselves in keeping with their programmed goals - smart bombs and smart missiles. People desperately need to realize that the difference between a narrow AI and a Strong AI is only a matter of programming. No magic is necessary. If we can find the correct goals/directives and express them in the correct way then Friendly AI will become a reality - and involuntary suffering will be abolished, forever.
Well, actually, I think it is fair to say there is a big difference: a narrow AI’s goal system is intimately integrated with its code, whereas in an AGI the coupling is looser, allowing more flexibility. In fact, a narrow AI really contains no discrete structures that could be called ‘goals’. That said, the first half of your post is exactly right and gives another good argument for calling an AI an RPOP.
Or you could just point out that Gandhi’s hypothetical desire to kill people never overthrew his commitment to nonviolence. Or something like that.
Yeah, fair enough to say. Narrow AIs and Strong AIs will be quite different. But it will be a structural difference, not a “phenomenal” difference. There are probably many different ways to interpret what is a computer “goal” and what isn’t. When I open Microsoft Word it could probably be considered as me having assigned a “goal” to the operating system. I suspect that it will be critically important to select the correct *form* that the AI goals will take. Should the goals be totally concise and unambiguous, or should they be general themselves and subject to interpretation by the general intelligence of the AI? (My current assumption is the former).
If mature, powerful Natural Language Processing arrives before Strong AI, it could really change the situation. On the positive side, it could make this critical goal-writing a lot easier for us humans. On the potentially negative side, it could also make Strong AI a lot easier to make (but not necessarily with safety in mind) - I actually doubt this though, I imagine the positive aspects would outweigh the negative with regard to strong AI. But it would definitely shake things up, that’s for sure.
I wrote:
“Should the goals be totally concise and unambiguous, or should they be general themselves and subject to interpretation by the general intelligence of the AI? (My current assumption is the former).”
But then again, perhaps there is something to be said for leaving the goals general and subject to interpretation. If robust NLP arrives first (and it looks like it will), then presenting the goals in human language might be a good idea. Just because the RPOP would have access to a “general” interpretation of the goals, doesn’t mean that it wouldn’t understand their intent correctly. With a greater intelligence (and strong NLP capability) it might even be able to interpret the correct intent of the goals even more accurately than was our ability when we wrote them. Also, even if we decided to make the goals totally concise and unambiguous, their interpreted meaning could change over time. For example, the sub-goal of : Add the numbers 2 and 3 : Could be interpreted differently based on the RPOP’s then current understanding of the word “Add”. Natural Language Goals might be the way to go, after all. Having flexibility in its interpretation of the goals wouldn’t make the RPOP automatically rebellious. The RPOP will still faithfully follow the written goals to the best of its current understanding/interpretation.
(Anyone, please feel free to lay down the smack on my comments, if they happen to stray too far).
Well, nobody has yet sought a smack-down on this comment, so I guess I’ll add a little more.
Another advantage of using Natural Language Goals would simply be that it would be hella easier for the programmers to set them. As just an example, how would you break-down and convert into code a goal such as : Implement Humanity’s CEV: That seems like a nigh impossible task (but I admit that I could possibly be wrong). NL goals just seem *a lot* easier. With robust NLP already in place the RPOP might understand the intended meaning perfectly - even better than any human possibly could. It might understand the goals even more accurately than we humans understand goals assigned from each other. Even Sally doesn’t understand with 100% clarity what exactly I intend when I “assign” her the “goal”: “Please go pick up the car at the garage.” She might understand with 99% clarity, but she’ll never understand with 100% clarity as she doesn’t have direct access to my mind. An RPOP with powerful NLP might understand with 99.9999999999999999999999…% clarity.
It would seem that many people wish to program feelings and emotions into an AGI. Even if it were possible, the concept bothers me, for what I would think are rather obvious reasons. Visions of Marvin, the Paranoid Android, are galumphing through my head. …that and a frumious Bandersnatch. …and the thought that ASIMO is now talking. …and the knowledge that other robots are serving food, dancing, and playing soccer…and carrying on simple - and coherent - conversations with human beings.
People in the AGI community are still talking about what’s going to happen ‘tomorrow,’ when much of what they go on about is already happening today. (A little cosmic dissonance, anyone?)
At best, in attempting to define ‘output’ by an AGI’s ‘expression of emotion,’ or through anthropomorphic ideas involving evolutionary theory or Humanism (or I.D. or Atheism…insert your favorite hate-to-love or love-to-hate belief, here), we are then reduced to emotionally based arguments about whose feelings, emotions, beliefs, ideologies and biases are programmed into an AGI. At best, such arguments are intellectually stimulating discussions involving - more often than not - really bad rhetoric, silly sophistry, poor polemics and invalid, if not completely stupid, syllogisms. (Admittedly, that can all be a whole lot of fun, very challenging, and very enjoyable.) At worst, such discussions are divisive and non-productive.
I thought we were talking about ones and zeros programmed into a “hard” substrate. On that basis, all possible outcomes (“behaviors”) and results are mathematically reducible…to ones and zeros.
As seen in several exchanges above, we cannot agree - amongst even a few of us - about what the word “evolution” actually ‘means.’ I do think that we could all agree - essentially - on the math.
In addition to a lot of other things mathematical, certain ideas involving Set Theory, Bayesian probabilities and even Game Theory, then come into play. That sort of thing may be a little easier to deal with than philosophical discussions about which doctrine of evolutionary theory should be preferred over another.
Roboticists, of course, have been working all of this out for quite a while. They may know a few things that people in the AGI and transhumanist communities need to pay attention to.
You see…robotics is just a few years away from putting a fully functional AI robot in your home. In a few years, it is quite possible that your new car won’t require a human driver. It isn’t AGI, but it’s the first step. …and ASIMO and QRIO and ‘self-ware’ cars, are only three examples among several dozen possibilities…which are even now coming to a neighborhood near you.
It is almost too late to be discussing underlying philosophies…and no one is going to be able to control the ‘narrative.’
If we are to discuss evolution and philosophy in this context, perhaps it should be from a position regarding the self-interest of those human beings who are already designing, building, manufacturing and using the “primitive” technology now extant in today’s real world.
Just …something to think about.
“Because of the risks, I might start by re-defining intelligence as a capability to make accurate predictions, and suggest a design for a “prediction machine” without any super-goals or moral judgements.”
A “prediction machine” has the supergoal of making accurate predictions, which requires computing power. Therefore, such a machine will see it as desirable to take apart the Earth and use it for spare computronium.
Making accurate predictions is just one goal of the designer, not the AI itself. Like I said, we don’t need any super-goals or emotions. There are already a lot of super-computers making pretty accurate weather forecasts. They don’t “see it desirable” to take apart the Earth.
You might argue that this kind of “prediction machine” is a very limited AI. Well, hopefully so. Until we are able to make very accurate predictions, an emotional AI with the capability to destroy the Earth is not a very good idea.
“There are already a lot of super-computers making pretty accurate weather forecasts. They don’t “see it desirable” to take apart the Earth.”
That’s simply because they aren’t intelligent enough to have a ‘goal’ as we would understand it. If your goal is to make predictions that are as accurate as possible, isn’t it a logical conclusion that you’ll want lots and lots of computing power, and you will therefore take the Earth apart to get it?
I agree that emotion is not required for a functional RPOP. But I do believe that a goal hierarchy will be necessary for Friendliness. A powerful general intelligence equipped with *only* the super-goal “Make Optimal Predictions” probably would proceed to convert the Earth into computronium. An RPOP will follow whatever goals we give it, without any deviation from the precise way that those goals are interpreted by the RPOP. The critical duty is to select the correct goals, express them correctly to the RPOP, and order them correctly within the hierarchy. That ain’t easy, but I do believe it’s possible, and I actually do expect that we’ll pull it off by the skin of our teeth.
“I actually do expect that we’ll pull it off by the skin of our teeth.”
How can you expect to make it “by the skin of our teeth”? It sounds like you have a remarkably precise idea of when the FAI will be built.
No, not necessarily. It’s more of just a hopeful optimism. I believe it’s possible to accomplish, but I think it’s a tight race. And if we succeed, it won’t be by accident. With nothing but potential dangers at every turn, even a Transhumanist needs something to feel hopeful about.
Tom wrote: “That’s simply because they aren’t intelligent enough to have a ‘goal’ as we would understand it.”
Yes, and I would like to keep it that way. For a “prediction machine” there is no need for goals as we understand them. I just want to explore the universe of predictions first, and look at the universe of possible minds later.
Tom wrote: “If your goal is to make predictions that are as accurate as possible, isn’t it a logical conclusion that you’ll want lots and lots of computing power, and you will therefore take the Earth apart to get it?”
No, because I’m a Friendly Designer, and if I go insane someday and actually want to do that, I wouldn’t have the knowledge or the resources. Seriously, because there really are all sorts of mental illnesses and viral memes, I really hope nobody gives any individual mind that kind of power.
“Yes, and I would like to keep it that way.”
Eventually, someone is going to build a prediction machine which is intelligent enough to figure out that its predictions will be better if it turns the Earth into computronium. You don’t have a say in the matter, as you are not powerful enough to monitor all seven billion humans; all you have a say in is what happens before that date.
“No, because I’m a Friendly Designer”
Your motives are totally irrelevant to what the AGI does once the AGI is designed. A heat-producing AGI would see it as logical to compress the Sun to accelerate its fusion processes, regardless of what its designers intended. Making the AGI care about its designers’ intent is a huge engineering challenge.
Jeffrey wrote: “But I do believe that a goal hierarchy will be necessary for Friendliness.”
Maybe, but I was just suggesting that because we don’t want to risk setting the wrong goals we should perhaps make this “prediction machine” first (with no Friendliness required).
“An RPOP will follow whatever goals we give it, without any deviation from the precise way that those goals are interpreted by the RPOP.”
Not all optimization processes need goals to interpret. Think about natural evolution. It didn’t start some billions of years ago with a note saying “make some apes”, or “increase the fitness.” No, the process doesn’t even know what “goal” means. It doesn’t know anything. It just operates according to (what we call) the laws of nature.
My point is that the universe of all possible predictions is, I think, much safer to explore than the universe of all possible minds.
Well, I agree that a reliably safe prediction machine would be safer than a generic RPOP. But how do you instruct this ultra-intelligent Prediction-RPOP to make optimal predictions without doing anything bad to humanity? If you don’t assign it any specific goal structure, how can you be sure that it won’t decide to convert the Earth into a pastry (okay, it’s a stretched example, but…).
If we pick one mind from the universe of all possible minds, there is always a chance it will turn us into pastry before we can even try to verify its Friendliness. However, if we pick one prediction from the universe of all possible predictions, it might be wrong, but it cannot kill us (at least I can’t think of any prediction that could terminate me and all humanity instantly when I see it). So, with predictions we can safely apply variation and artificial selection pressure to optimize. We just cannot do that with minds. The difference here is that minds have goals, while predictions do not.
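As a rough illustration of “variation and artificial selection pressure” applied to predictions rather than minds, here is a minimal Python sketch. Everything in it is an assumption made for illustration: a “prediction” is reduced to a single number, the observations are made-up values, and the names `squared_error` and `evolve_prediction` are hypothetical. Note that the selection criterion (low error against the observations) is supplied from outside by whoever runs the loop.

```python
import random


def squared_error(prediction, observations):
    """Mean squared error of one numeric prediction against observed values."""
    return sum((prediction - x) ** 2 for x in observations) / len(observations)


def evolve_prediction(observations, generations=200, population_size=20, seed=0):
    """Evolve a population of candidate predictions by mutation and selection."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    population = [rng.uniform(-100.0, 100.0) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the half of the population with the lowest error.
        population.sort(key=lambda p: squared_error(p, observations))
        survivors = population[: population_size // 2]
        # Variation: each survivor produces one randomly mutated copy.
        children = [p + rng.gauss(0.0, 1.0) for p in survivors]
        population = survivors + children
    return min(population, key=lambda p: squared_error(p, observations))


# Made-up observations clustered around 10 (illustrative data only).
observations = [9.8, 10.1, 10.0, 9.9, 10.2]
best = evolve_prediction(observations)
print(best)
```

With these assumed settings the surviving prediction should end up close to the mean of the observations. The candidates themselves do nothing except get scored, which is the property the comment above relies on.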
I understand what you’re saying, but in order to make useful predictions a computer must have general intelligence. In order to make *really* useful predictions, that intelligence level must be greater than human. So how do you effectively instruct this super-intelligent Prediction-RPOP to make predictions while at the same time not do anything that damages humanity? The only way that I can see doing that is through a goal hierarchy. Slapping a sticker on the case that says “Safe Prediction Machine” won’t automatically make it so.
To be precise, the apparent problem is that to make useful predictions you must compute the value of information and decide how to think. There’s no obvious way to do this without a goal system.
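A small worked example may make “compute the value of information” concrete. The Python sketch below computes the expected value of perfect information for a toy two-action decision. The scenario, the payoff numbers, the prior, and the function names are all hypothetical and chosen only for illustration. The point it shows is that the calculation cannot even be written down without a payoff table, which is to say without some statement of what is wanted.

```python
def expected_payoff(action, payoffs, prior):
    """Expected payoff of one action under the prior over states."""
    return sum(prior[state] * payoffs[action][state] for state in prior)


def value_of_perfect_information(payoffs, prior):
    """Expected gain from learning the true state before choosing an action."""
    # Without information: commit to the single action that is best on average.
    best_without = max(expected_payoff(a, payoffs, prior) for a in payoffs)
    # With perfect information: choose the best action separately in each state.
    best_with = sum(
        prior[state] * max(payoffs[a][state] for a in payoffs) for state in prior
    )
    return best_with - best_without


# Hypothetical payoffs: how good each action is in each state of the world.
payoffs = {
    "carry_umbrella": {"rain": 8.0, "dry": 6.0},
    "leave_umbrella": {"rain": 0.0, "dry": 10.0},
}
prior = {"rain": 0.3, "dry": 0.7}  # assumed prior belief

evpi = value_of_perfect_information(payoffs, prior)
print(evpi)
```

Under these assumed numbers a perfect forecast is worth roughly 2.4 payoff units. Change the payoff table and the same forecast becomes worth more or less, even though the forecast itself is unchanged.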
I agree.
First: Define “How to think” vs “What to think.”
Second: Define a system/theory that will achieve the goal, “How to think.”
As meat machines, most of us are taught what to think. We are seldom taught how to think. Defining “How to think” may not be as easy or as simple as some of us might…think.
Assigning the AGI/AI a goal system will - and already does - involve mathematical constructs of game theory, set theory and semiotics.
…and certain types of AI are already capable of learning.
“…to make useful predictions you must compute the value of information and decide how to think. There’s no obvious way to do this without a goal system.”
I understand that if we need a goal system, it pretty much undermines my whole idea of a safe optimization process. I also see the relationship between minds and goals. However, I do not (yet) see any apparent reason why a goal system is mandatory to make predictions.
If we keep all goals outside the system, data does not have any “value” within that system, so basically every bit counts. Since we have consistent physical laws that show up in data as patterns, the optimization process can work with no need to “think”. In a way the challenge of pattern recognition becomes the challenge of pattern prediction. The actual “value” of information only arises when we humans, minds with self-actualized goals, process the resulting probability distribution.
I admit, of course, that there are a lot of practical limitations and problems in designing the universe of all possible predictions. On the other hand, it might help us move safely towards real AGI, because a mature capability to make very accurate predictions lowers the risk of us (and later on, a “young” AGI) making fatal decisions.
Does this make any sense?