The Stamp Collecting Device

June 11th, 2007, by Nick Hay

An avid stamp collector, who is also an AI enthusiast, decides to build a stamp collecting device. This margin is too small for the details, but the idea is simple:

  1. The device will be active for one year. It is connected to the internet, from which it sends and receives packets.
  2. The device has an internal model of the universe. This model captures how likely each state of the world is, can predict future packets received, and can simulate the effect of packets sent.
  3. For every possible sequence of packets, the model extrapolates the final state and counts the number of stamps collected.
  4. The device outputs the sequence leading to the largest number of stamps.
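The four steps above can be sketched as a brute-force search. This is only a toy illustration under loud assumptions: the `toy_model` function, the two-symbol packet alphabet, and the scoring rule are invented here, and the post leaves the real model unspecified.

```python
from itertools import product

# Hypothetical stand-in for the device's world model (step 2): it maps a
# sequence of sent packets to the number of stamps held at the end.
# The scoring rule is invented purely for illustration.
def toy_model(packets):
    stamps = 0
    for previous, current in zip(packets, packets[1:]):
        # Pretend a "bid" packet that follows a "search" packet wins one stamp.
        if previous == "search" and current == "bid":
            stamps += 1
    return stamps

def best_sequence(alphabet, length, model):
    """Steps 3 and 4: score every possible sequence and return the best one."""
    return max(product(alphabet, repeat=length), key=model)

plan = best_sequence(["search", "bid"], 4, toy_model)
print(plan, toy_model(plan))
```

Nothing in `best_sequence` looks at what a plan does apart from its stamp count, which is the property the rest of the post turns on.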

This is a powerful device. It models every possible course of action to output the best. If it outputs one one-kilobit packet per second, a single day has 2^86,400,000 ≈ 10^26,000,000 possible packet sequences. By comparison, the number of atoms in the observable universe is about 10^80, and its volume is only 10^426 cubic Planck units. There are a lot of possibilities to consider.
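The size of the search space can be checked with a few lines of arithmetic. The sketch below assumes a kilobit means 1,000 bits and works with the base-10 exponent rather than the number itself; the variable names are illustrative.

```python
import math

BITS_PER_PACKET = 1000          # one kilobit, taken here as 1,000 bits
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# One packet per second for a day gives this many output bits.
bits_per_day = BITS_PER_PACKET * SECONDS_PER_DAY

# Each bit doubles the number of sequences, so there are 2**bits_per_day
# of them. Convert that to a power of ten: log10(2**n) = n * log10(2).
log10_sequences = bits_per_day * math.log10(2)

print(bits_per_day)            # number of bits sent in one day
print(round(log10_sequences))  # base-10 exponent, roughly 26 million
```

The exponent alone is hundreds of thousands of times larger than the exponent of 80 quoted for the number of atoms.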

The vast majority of packet sequences have no significant effect. Invalid headers, unresponsive destinations, nonsensical requests. But some of them implement coherent plans. One sequence of packets leads to the winning bid on an online auction. 17 stamps. Another leads to several winning bids. 1,700 stamps. Yet another coordinates stamp collecting specialists by email. 170,000 stamps. Anything you can do on the internet in a day is included in a tiny corner of this search.

There is a problem. Because it’s easier to destroy than create, most sequences which lead to vast numbers of stamps are destructive. One sequence hacks into computers, directing them to collect credit card numbers and bid at stamp auctions. 170,000,000 stamps. Another sends a virus which makes all printers create stamps. 17,000,000,000 stamps.

Infecting the internet to collect stamps seems stupid. Won’t the device realize this wasn’t what the stamp collector had in mind? Won’t it ask whether there is a better goal than collecting stamps? Isn’t it satisfied with 170,000 stamps?

But we can answer these questions: we have the device’s complete design. It doesn’t ask itself questions. It doesn’t think at all. It simply selects the output maximizing the number of stamps. The device is not well understood by analogy to humans.

So, the stamp collector powers up the device. And the world stops, filled with stamps.

Comments (30)

Comment by Seth Baum
Jun 12, 2007 11:14 am

I’m concerned about this device’s ability to predict outcomes, or at least measure them in stamps. Is this internal model of the universe (2) physically possible? Would it need to overcome chaotic effects or quantum uncertainty, and if so, would it be able to? Or is the argument here that the device would estimate the outcomes, just as we can (170,000,000, etc)? If so, this ability to estimate would have to exist independently of the ability to recognize the collector does not want it to turn the world into stamps. I’d be willing to believe that. I’d like to think that if we can build an AI with this stamp-estimating capability, we could also build it to be friendly, but perhaps I’m wrong and also that doesn’t mean that we would make it friendly.

I’m much more willing to believe that such a virus exists, although this is less my specialty.

 
Comment by Peter de Blanc
Jun 12, 2007 12:42 pm

“If so, this ability to estimate would have to exist independently of the ability to recognize the collector does not want it to turn the world into stamps.”

Not at all. If you’re really that good at making predictions, you should be able to predict the stamp collector’s reactions. But the stamp collecting device’s main loop is not: “predict the outcomes of various actions, and select the action that will please the collector.” It is: “predict the outcomes of various actions, and select the action that gets the most stamps.” The device may very well extrapolate that the collector does not want to turn the world into stamps, but that is not the criterion the device uses for selecting actions.
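The difference between the two main loops can be made concrete with a small sketch. The predicted outcomes and approval flags below are invented for illustration, and `select_by_approval` is just one hypothetical reading of “select the action that will please the collector”.

```python
# Hypothetical predictions for three candidate actions. The device's model
# is assumed to predict both the stamp count and the collector's reaction.
predictions = {
    "bid at one auction":    {"stamps": 17,             "collector_approves": True},
    "email specialists":     {"stamps": 170_000,        "collector_approves": True},
    "virus on all printers": {"stamps": 17_000_000_000, "collector_approves": False},
}

def select_by_stamps(predictions):
    # The device's actual criterion: the most predicted stamps wins.
    return max(predictions, key=lambda a: predictions[a]["stamps"])

def select_by_approval(predictions):
    # A criterion the device does NOT use: keep only actions the collector
    # is predicted to approve of, then take the most stamps among those.
    approved = {a: p for a, p in predictions.items() if p["collector_approves"]}
    return max(approved, key=lambda a: approved[a]["stamps"])

print(select_by_stamps(predictions))    # virus on all printers
print(select_by_approval(predictions))  # email specialists
```

Both functions read the same predictions, including the predicted disapproval. Only the selection criterion differs.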

 
Comment by Nick Hay
Jun 12, 2007 3:48 pm

“Is this internal model of the universe (2) physically possible? Would it need to overcome chaotic effects or quantum uncertainty, and if so, would it be able to? Or is the argument here that the device would estimate the outcomes, just as we can (170,000,000, etc)?”

The internal model can be uncertain, i.e. it computes the expected number of stamps, not the actual number. If the model is accurate enough, the expected and actual numbers may be fairly close.
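As a minimal sketch of what computing an expected number of stamps could look like, suppose the model assigns each action a list of (probability, stamps) outcomes. The actions and numbers here are made up for the example.

```python
# Hypothetical outcome distributions: each action maps to a list of
# (probability, stamps) pairs whose probabilities sum to one.
outcomes = {
    "safe bid":  [(1.0, 17)],
    "risky bid": [(0.1, 1_700), (0.9, 0)],
}

def expected_stamps(distribution):
    # Probability-weighted average of the possible stamp counts.
    return sum(p * stamps for p, stamps in distribution)

# The device picks the action with the highest expectation, even though
# that action usually ends with no stamps at all.
best = max(outcomes, key=lambda action: expected_stamps(outcomes[action]))
print(best, expected_stamps(outcomes[best]))
```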

That humans can model the external universe is an existence proof that you can get around chaotic and quantum effects, at the cost of estimating things.

Think of the above description as an approximate specification of the design, not an implementation outline. A simple brute-force implementation would be intractable, far too slow to work in this universe. In the same way that I might specify a program for finding the solution to the equation 1 = 2000000/x as an exhaustive search trying all values of x rather than immediately returning x = 2000000, intractable specifications can have tractable implementations.
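The equation example can be written out both ways. In this sketch the search bound `limit` is an assumption added so that the loop terminates; the two functions return the same answer at very different cost.

```python
def solve_by_search(limit=3_000_000):
    # Specification-style solution: try every positive integer x in turn
    # until 2000000 / x equals 1.
    for x in range(1, limit + 1):
        if 2_000_000 / x == 1:
            return x
    return None  # no solution found below the (assumed) limit

def solve_directly():
    # Implementation-style solution: rearrange 1 = 2000000/x to x = 2000000.
    return 2_000_000

print(solve_by_search(), solve_directly())
```

The search performs two million divisions to reach the value the direct version returns immediately, which is the sense in which a specification can be far slower than an implementation of the same thing.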

“If so, this ability to estimate would have to exist independently of the ability to recognize the collector does not want it to turn the world into stamps.”

As Peter described, it may model what the stamp collector does, e.g. cry out “I don’t like stamps that much!”, but this doesn’t mean the device will stop collecting stamps. You can see there is no part of the device which computes “what the stamp collector wants”, or indeed anything that singles out the stamp collector. The stamp collector is simply part of the external universe it is trying to collect stamps in, equivalent to all other non-stamp objects.

It would not compute what the collector *wants*, since this is a subtle counterfactual question, e.g. it would not compute the answer to “if the stamp collector realized I would fill the universe with stamps, how would he have written my code differently?”. A better design should.

“I’d like to think that if we can build an AI with this stamp-estimating capability, we could also build it to be friendly, but perhaps I’m wrong and also that doesn’t mean that we would make it friendly.”

Stamp collecting is hard, but I think a device that does the right thing is far harder. For stamp collecting we need a device that can accurately model the external universe and identify stamps within its model. This is far from simple, but simpler than e.g. a device that works out what a person wants.

 
Comment by Seth Baum
Jun 12, 2007 6:44 pm

OK, I’ll buy Peter’s and Nick’s argument that this AI might calculate the collector’s reaction and proceed anyways, but I won’t yet accept their certainty about it. Both appear to be assuming that such an AI will behave similarly to contemporary optimization algorithms, but I don’t think any of us know enough about knowledge and wisdom to assume that such a radically more powerful optimization algorithm would necessarily behave the same. Am I wrong?

Also, how much harder would it be for a device to figure out what a person wants than to estimate stamps? I suspect negligibly so. The device already has internalized a very sophisticated model of the universe. Are our wants really that much more sophisticated? If our wants are simply functions of our neural anatomy, then probably not. Brains are complicated, but not complicated relative to the universe. But again, I don’t think we understand our minds well enough to answer the question. For all we know, dualism (http://en.wikipedia.org/wiki/Dualism_%28philosophy_of_mind%29) could be correct, in which case it may be impossible for a device to deduce our wants, although they probably could be estimated fairly easily.

A related idea: Build a narrow AI designed to deduce our wants. Test. Use results to guide general FAI design.

 
Comment by Roko
Jun 13, 2007 7:40 am

I’d love to know what the difference between “deducing” someone’s desires and “estimating them fairly accurately” is. I’d even go so far as to say that all Seth Baum does is “estimate other people’s desires fairly accurately”.

I don’t think dualism is going to make the slightest difference here.

I think we just have to swallow the bitter pill that Nick is feeding us here: it’s far easier to build a GAI that destroys the entire planet than a GAI that does anything else, and this holds true even if you weren’t trying to code the GAI to destroy the planet.

It’s almost like once you’ve got a GAI, destroying the whole world with it is as easy as falling off a log.

 
Comment by Seth Baum
Jun 13, 2007 11:21 am

By “deducing” desires I meant figuring them out from the underlying neural anatomy. If dualism holds then this may be impossible. For what it’s worth, I find dualism unlikely.

I certainly do not deduce desires, nor does any other human. I do estimate other people’s desires, although I may not do so particularly accurately.

Destroying the world with GAI might be easy. I only mean to suggest that, from this example, *not* destroying the world might also be easy. Of course, unless an FAI eliminates the possibility of creating a UFAI, we’ll have to not destroy the world every time or else game over.

 
Comment by Eliezer Yudkowsky
Jun 13, 2007 1:10 pm

You ask me for lemonade. I go to the refrigerator and there is no lemonade but there is water and orange juice. I decide that if, *contrary to fact*, you knew there was no lemonade and that there was water and orange juice, you would want the orange juice. So I bring you the orange juice.

It is a very highly sophisticated form of helpfulness which humans take for granted, just as humans used to take vision for granted before the AI programmers discovered how hard it was to actually implement, and how much hidden complexity underlay the simple-seeming act of “just look and see what’s there”.

On the Computer Stupidities website, there is a story of a novice programmer who looked over his program’s output, puzzled, and said: “You know, I don’t think the compiler is paying attention to any of my comments.” Even if the compiler could, in principle, understand, the phrase “could” is dangerous - it involves factoring the mechanism into capability and motive; and a human may automatically substitute the motive. In reality, there is no “could”; either the compiler does something or it does not. The code is not given to the AI, for the AI to look over and hand back, like giving written instructions to a human employee. The code *is* the AI, the way that you are your neurons. When you look over a cake recipe, you think of the results and whether they will be what the recipe-writer intended; but you do not think over your neurons before they are allowed to fire. For that matter, you don’t particularly care “what evolution intended” when it conflicts with more humane moralities.

Figuring what someone *would want* involves executing a complex counterfactual, on a particular level of abstraction, involving particular “standard” alterations. It’s a lot of computational work that has no direct predictive power - it’s a counterfactual, not a factual - and you have no reason to execute that computation unless your utility function contains an explicit term mentioning it, or related to it.

 
Comment by Seth Baum
Jun 13, 2007 6:10 pm

Is choosing orange juice all that sophisticated? It could just come from a relational set up a la Cyc: orange juice is more similar to lemonade than is water since both are citrus-based. Water would in turn be closer than, say, carrots since water is a potable liquid.

Meanwhile, us humans often botch such guesswork- ever gotten a bad gift? Here, perhaps the person actually preferred water to orange juice. We would then guess wrong unless we had this prior knowledge- ditto for the AI.

“you don’t particularly care “what evolution intended” when it conflicts with more humane moralities.”
Me, no, but creationists, yes.

 
Comment by Eliezer Yudkowsky
Jun 14, 2007 12:54 pm

Re: the “relational setup a la Cyc” - speaking as a matter of AI, it’s not that simple. The result has to be substitutable into a goal, not just share the maximum available number of surface properties. You, yourself, evaluated “similarity” on-the-fly, in a specific task-oriented context. If I’d given the hypothetical example of substituting into a recipe, you would be telling me that lemons are more similar to lemonade than is orange juice.
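A small sketch can illustrate why the best substitute depends on the goal rather than on shared surface properties alone. The property table and the per-goal weights below are assumptions invented for this example, not a description of Cyc or of any real system.

```python
# Invented properties for three candidate substitutes for lemonade.
candidates = {
    "orange juice": {"drinkable": True,  "citrus": True,  "lemon_flavor": False},
    "water":        {"drinkable": True,  "citrus": False, "lemon_flavor": False},
    "lemons":       {"drinkable": False, "citrus": True,  "lemon_flavor": True},
}

# Which properties matter depends on the goal the substitute must serve.
# These weights are made up for the illustration.
goal_weights = {
    "quench thirst":   {"drinkable": 2, "citrus": 1, "lemon_flavor": 0},
    "use in a recipe": {"drinkable": 0, "citrus": 1, "lemon_flavor": 2},
}

def best_substitute(goal):
    weights = goal_weights[goal]

    def score(name):
        # Sum the weights of the properties this candidate actually has.
        return sum(weights[prop]
                   for prop, present in candidates[name].items() if present)

    return max(candidates, key=score)

print(best_substitute("quench thirst"))    # orange juice
print(best_substitute("use in a recipe"))  # lemons
```

The hard part, choosing the weights for a goal nobody listed in advance, is exactly what the table hides.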

It is a *general* caution in AI work to beware the seeming simplicity of that which you have no trouble doing yourself.

Humans often botch such guesswork, but hell, we botch a lot of guesswork. From the SI/CEV standpoint, the main thing is (a) reducing it to a question of simple fact that can be fed to a superintelligence with far superior data and computing power; and (b) producing calibrated, non-overconfident probability distributions over it, so that the SI/CEV can be properly uncertain and act accordingly.

 
Comment by Seth Baum
Jun 15, 2007 6:21 am

re “The result has to be substitutable into a goal”: I suspect a Cyc-like device could easily parse the phrase “get me lemonade” or any (reasonable?) variant into a goal of retrieving lemonade. Given that, all that would be needed is a fairly universal

if goal_x can be achieved then
do goal_x
else
ao_x = observe_available_options
nbt_x = nextbestthing(goal_x,ao_x)
do nbt_x
end

nextbestthing(goal,ao)
parse goal into option valuing criteria
find a member of ao with highest value
end

observe_available_options would be along the lines of a difficult yet manageable computer vision problem, which is going to have to be solved anyways if a robot is to get us objects from a refrigerator. A Cyc-like device should be able to do nextbestthing.parse. A traditional optimization routine should be able to do nextbestthing.find, with the help of the Cyc-like relational database to evaluate the ao’s.

Of course, this does not directly help us avoid turning the world into stamps. In that example, were the input “go online and get me as many stamps as you can”, the parser would need a more sophisticated understanding of “me”. Perhaps a little CEV here would help. The question is how much.

 
Comment by Seth Baum
Jun 15, 2007 6:24 am

Sorry, the indents (double spaces) on the above pseudo-code did not display properly. Substituting periods (.) for spaces ( ) at the beginnings of lines, that should look like

if goal_x can be achieved then
…do goal_x
else
…ao_x = observe_available_options
…nbt_x = nextbestthing(goal_x,ao_x)
…do nbt_x
end

nextbestthing(goal,ao)
…parse goal into option valuing criteria
…find a member of ao with highest value
end
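For readers who want to run it, the pseudocode above might be rendered in Python roughly as follows. This is a sketch under stated assumptions: the `PROPERTIES` table and the `similarity_to` scoring function are hypothetical stand-ins for the Cyc-like parsing step, and the available options are passed in rather than observed.

```python
# Hypothetical relational knowledge: each item maps to a set of properties.
PROPERTIES = {
    "lemonade":     {"liquid", "potable", "citrus"},
    "orange juice": {"liquid", "potable", "citrus"},
    "water":        {"liquid", "potable"},
    "carrots":      {"edible"},
}

def similarity_to(goal):
    # Stand-in for "parse goal into option valuing criteria": returns a
    # function scoring an option by the properties it shares with the goal.
    def score(option):
        return len(PROPERTIES[goal] & PROPERTIES[option])
    return score

def next_best_thing(goal, available_options):
    # "find a member of ao with highest value"
    return max(available_options, key=similarity_to(goal))

def pursue(goal, available_options):
    if goal in available_options:   # "if goal_x can be achieved"
        return goal                 # "do goal_x"
    return next_best_thing(goal, available_options)  # "do nbt_x"

fridge = ["water", "orange juice", "carrots"]
print(pursue("lemonade", fridge))  # orange juice
```

The control flow is short. All of the difficulty sits in the assumed table and scoring function.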

 
Comment by Tom McCabe
Jun 15, 2007 7:40 pm

“In reality, there is no “could”; either the compiler does something or it does not.”

There is a real-world distinction here; “could” here refers to something the compiler would have done if the compilation had slightly different starting parameters. “Could” refers to capabilities which will be activated given slightly different starting conditions, as opposed to capabilities that would require different starting conditions going much farther back in the causality chain (eg, the programmers decide to implement a Do-What-I-Mean program while they’re writing the compiler). “Could” is thus just another fuzzy context-dependent way of saying “somewhere around here on a scalar spectrum”, in much the same way as “smart”.

 
Comment by Tom McCabe
Jun 15, 2007 7:46 pm

“It could just come from a relational set up a la Cyc: orange juice is more similar to lemonade than is water since both are citrus-based.”

This only works in very limited, logical, sandbox-type situations. Suppose that I ask for twenty kilos of U235 for a fission reactor, and the AI decides the closest thing is twenty kilos of Pu239? Oops, there goes the city.

“We would then guess wrong unless we had this prior knowledge- ditto for the AI.”

Except when we guess wrong, it usually doesn’t involve such severe consequences. No human has ever had the power that we’re giving to the AI. We must design a system to do the job better than we can, ala Deep Blue, if we’re not going to get blown up.

“I suspect a Cyc-like device could easily parse the phrase “get me lemonade” or any (reasonable?) variant into a goal of retrieving lemonade.”

Deducing what humans mean from what they say is a horrendously complex task. Anyone who says otherwise should be forced to read everything after it has been passed through an auto-translation program like Babelfish, which we’ve been working on full-time with an entire programming team for years now.

 
Comment by Seth Baum
Jun 16, 2007 7:22 am

““could” here refers to something the compiler would have done if the compilation had slightly different starting parameters.”

I couldn’t (cough) have said it better myself. We seem here to be debating whether an AI will have free will. After all, “could” any of us possibly do anything other than the exact actions that we end up taking, even though it seems like we “could”?

……

Re “twenty kilos of U235”: Again, a human would make the same mistake unless properly trained; an AI would not make the mistake if properly trained. Let’s not forget that human intelligence is also often quite narrow (i.e., we specialize). It appears that telling the wrong AI to build a reactor is like telling the wrong human to make a reactor. Us humans are pretty good at not telling lawyers to build reactors and not telling engineers to defend us in court. We’re also pretty good at not washing our clothes in the television and not staring at the washing machine for hours. Why would we all of a sudden make these mistakes with AI?

I guess this would be to criticize the part of the Stamp Collector example in which the stamp collector tells the AI to go get stamps in the first place. OK, one might say, some stamp collector might be that dumb. Then the question is, is whoever builds such an AI dumb enough to release it to the public? We keep our nuclear secrets secret for the same reason. AI is of course probably far easier to spread/harder to contain than nuclear since it’s based on computer hardware/software not exotic materials, which just makes it a (probably) much more difficult case of the same class of problem. This difference could of course prove fatal, as I’m sure everyone here recognizes.

I should add for the record that I’m all for building strong safety measures into AI. While I’m not convinced the stakes are as high as some suggest, I’m also definitely not convinced that they’re not as high as some suggest, which is why I’m here.

“Except when we guess wrong, it usually doesn’t involve such severe consequences. No human has ever had the power that we’re giving to the AI.”

I’d say, “that we might give to the AI” or “that we suspect we can give to the AI”. This is hardly a fait accompli.

Re “Deducing what humans mean from what they say is a horrendously complex task.”: I agree. I should have been more clear that the sub-algorithms (interpret goal, observe available options, determine next best thing) are themselves extremely difficult (and again, hardly a fait accompli). Thanks for pointing this out.

 
Comment by Tom McCabe
Jun 16, 2007 8:54 am

“We seem here to be debating whether an AI will have free will.”

More on free will here.

“Again, a human would make the same mistake unless properly trained;”

Human performance is not the standard here. We must write an AI that will do better than any human, or we’re toast.

“It appears that telling the wrong AI to build a reactor is like telling the wrong human to make a reactor.”

We’re talking about the Sysop, the first superintelligence, right? Such an AI will be faced with those kinds of challenges, because it has to be responsible for overseeing a world of six billion people. Even if you protected against stupid accidents with fissile materials, there’s a million ways to do something stupid that will kill people, and you can’t design for them all in advance.

“Then the question is, is whoever builds such an AI dumb enough to release it to the public?”

The AI-builders might simply keep making the AI more powerful until it jumps out onto the Internet and proceeds to wreak havoc, without even making a conscious decision to “release” it. Such is the danger of fiddling with intelligence.

“I’d say, “that we might give to the AI” or “that we suspect we can give to the AI”. This is hardly a fait accompli.”

That statement should have been phrased differently; something like “power the AI will just take for itself regardless of what we want”. See here for why even a “merely” human-equivalent AI will have tremendous power even when we don’t intend for it to have that power.

 
Comment by Tom McCabe
Jun 16, 2007 8:54 am

Someone please fix the first link. Thank you.

 
Comment by Eliezer Yudkowsky
Jun 16, 2007 12:51 pm

Seth, I see that you were a PhD student in NEU’s Electrical Engineering department. Electrical engineering isn’t very complicated, right? I mean, it’s just:

while device is incomplete
…get some wires
…connect them

The part about getting wires can be implemented by going to a hardware store, and as for connecting them, a soldering iron should do the trick.

(We all understand the difficulties of only our own profession.)

 
Comment by Seth Baum
Jun 16, 2007 7:20 pm

Tom, thanks for the links- interesting reads.

“Such an AI will be faced with those kinds of challenges, because it has to be responsible for overseeing a world of six billion people.”

OK, but now we’re not talking about collecting stamps, fetching beverages, or even building reactors. No one human is charged with such a lofty task, so we’re already using a standard beyond (an individual) human intelligence, and we still run non-negligible chances of wrecking the whole thing- hence this conversation. (Technically, reactor design is also a multi-person project.) And of course, I agree that designing any AI that will manage to avoid all the stupid ways it could do very bad things is a (the?) critical task. I’m not disputing that an AI may have capacities which far outstrip ours.

“The AI-builders might simply keep making the AI more powerful until it jumps out onto the Internet and proceeds to wreak havoc, without even making a conscious decision to “release” it.”

I’ve heard this before, and this is very concerning to me. This is not a strategy thread, but at some point the matter of how to handle this situation should be discussed. Feel free to send me a private email if you’d like to talk more about this. (sethbaum — gmail)

……

Eliezer, regarding beverage fetching, I’ll repeat from above: The sub-algorithms are themselves extremely difficult, or at least are by contemporary standards. But my impression (correct me if I’m wrong) is that the hard parts of beverage fetching are the language processing, the computer vision, and the Cyc-like relational structure, all of which are making slow but steady progress these days, so this may not be a difficult AI/robotics problem in the foreseeable future. Let’s at least give the mainstream AI community what credit it does deserve.

As for electrical engineering, you’re missing a lot of steps. The entire algorithm is here. The complex social parts (eg identifying a need) seem by far the most difficult to program. But in the spirit of the beverage fetching example and your brief pseudocode, you could just do the narrow device-building part, whose algorithm looks like

1. get design goal
2. interpret design goal
3. determine design
4. get some wires (resistors, capacitors, etc.)
5. connect them

1 is trivial i/o. 2 is essentially the same NLP/Cyc-like gadget as the beverage fetcher uses. 3 is an optimization problem which appears to have already received attention. 4 & 5 are robotics similar to that needed for beverage fetching. So no, I would not say (this portion of) electrical engineering is particularly difficult to program.

……

Returning to the original Stamp Collector example, I think the question still stands: Is friendliness/CEV/etc substantially more difficult than giving an AI an internal model of the universe which includes us? I’m not convinced.

Perhaps this depends on if we feed the model to the AI or if we feed it the ability to obtain the model (or the ability to develop the ability to obtain the model, etc) and it then obtains the model itself. But if we’re not feeding it the model, how plausible is it that such an AI would be in a position to receive a stamp collector’s request in the first place?

 
Comment by Tom McCabe
Jun 16, 2007 10:46 pm

“OK, but now we’re not talking about collecting stamps, fetching beverages, or even building reactors.”

So what? An AI designed for building stamps, if given even limited programming ability, could very well wind up crashing the Internet. An AI designed for building stamps, if given even limited general intelligence, could very well wind up destroying all life on Earth. That’s the whole point- simple AIs will get a lot more power than we expect them to intuitively, and then will use that power to wreak havoc.

“And of course, I agree that designing any AI that will manage to avoid all the stupid ways it could do very bad things”

This isn’t a very good way of phrasing it, in my non-professional opinion. Thinking in terms of “how can I avoid the AI making stupid mistake #61”, in the same way that you would steer through an obstacle course, will just lead to catastrophic failure as there’s always going to be a stupid mistake you didn’t think of. You need to give the AI the same sense of “oh, this might be a stupid mistake” or “oh, this is really important so I had better mathematically prove there are no stupid mistakes” that humans have, so that the AI can catch stupid mistakes we wouldn’t.

“But my impression (correct me if I’m wrong) is that the hard parts of beverage fetching are the language processing, the computer vision, and the Cyc-like relational structure, all of which are making slow but steady progress these days, so this may not be a difficult AI/robotics problem in the foreseeable future.”

Again, ordering a simple robot to go up and get you a beverage is radically different than ordering an intelligent AI to go up and get you a beverage. For example, the said AI could calculate that the fastest way of getting you the beverage is to put an RPG into the refrigerator, thus putting liquid in your mouth at around five thousand meters per second. And a simple translate-words-into-goals AI wouldn’t even see what was wrong with this; you asked for a beverage, and you got a beverage a hundred times faster than any human could have gotten you one, so what’s the difficulty?

“But in the spirit of the beverage fetching example and your brief pseudocode, you could just do the narrow device-building part, whose algorithm looks like”

If we’re going to write algorithms that are this vague, why don’t we simply say “Step 1. Do good stuff.” and be done with it?

“Is friendliness/CEV/etc substantially more difficult than giving an AI an internal model of the universe which includes us?”

Yes. Once the AI has that universe model, what’s it going to do with it? It could use the information to exploit human vulnerabilities as easily as it could use it to help us. A universe model is really just a useful tool for manipulating the universe- it doesn’t tell you which direction you should manipulate the universe into.

 
Comment by Seth Baum
Jun 17, 2007 6:52 am

“That’s the whole point- simple AIs will get a lot more power than we expect them to intuitively, and then will use that power to wreak havoc.”

Replace both “will”s with “may”s to recognize the uncertainty and sure, I’ll agree with this.

“there’s always going to be a stupid mistake you didn’t think of.”

I think we’re on the same page here: It seems to me that the only way to design an “AI that will manage to avoid all the stupid ways it could do very bad things” is to design it to recognize these stupid ways as they come up- not to exhaustively build them in. If you have a better expression for the same idea, let me know.

As for the little algorithms, I suspect we were looking at the problems differently. I was thinking “how do I solve this problem”; the little algorithms were only to break the problems down into steps that could themselves be solved with (possibly more advanced versions of) contemporary narrow AI. “Do good stuff” cannot be so solved. If you mean to look at the same narrow problems and ask “What if we tossed general AI at it?” then of course the situation may change dramatically.

“Once the AI has that universe model”

You’re waving a magic wand and giving a universe model to an AI. How did it get there, and how difficult is getting it there compared to making it friendly?

“It could use the information to exploit human vulnerabilities as easily as it could use it to help us.”

So you mean to say, it could easily use this information to help us? Then the answer is no, friendliness is not substantially more difficult, but just because it’s not doesn’t mean it will necessarily happen. If an AI understands the universe that well, then surely it can figure friendliness out. The question is, How difficult is it to tell it to figure friendliness out and do it in comparison to how difficult it is to feed it a universe model?

 
Comment by Seth Baum
Jun 17, 2007 8:18 am

By the way, I want to apologize for contributing to the somewhat inflammatory tone this comment thread has taken. While the underlying discussions and disagreements have been quite productive (for me at least), they probably could have been carried out more civilly. I apologize if I’ve offended anyone, and I’ll try harder in the future.

 
Comment by Eliezer Yudkowsky
Jun 17, 2007 11:33 am

Seth, the basic distinction between getting a good universe model, and getting a detailed description of where to steer the universe, is that there is only one reality to map, but many possible targets of optimization processes. If you get basic inferential processes right and hook up a few sensors, you can expect General Relativity to pop out of looking at the universe. You can also expect powerful models of people the way they really are (not involving any particular counterfactuals, or any particular privileged level of abstraction). But to say that a dead and roasted body has lower utility than a live and happy one, you need to get extra bits of information into the AI. And a random mind-in-general does not conveniently, automatically privilege the particular abstractions and categories you use; in the course of understanding the universe and predicting it as well as possible, it does not necessarily see a whole human burning to death as distinct from carbohydrogen blob #27 undergoing oxidation. If the AI is inventing its own representation, the way that we ourselves invented all sorts of new words and concepts as we learned more about the universe - and if the AI is revising old representations, the way we got rid of the phlogiston concept - then it won’t be easy to preprogram a utility function over these changing representations; even assuming that we, ourselves, knew exactly which futures we wanted, which we don’t.

CEV is an attempt to define a hookup to the environment from which the AI can extract this sort of information, even the parts the programmers themselves don’t know, rather than it all having to be programmed in explicitly and in detail. But then CEV itself must be programmed, and that too is many bits.

Furthermore, it looks to me like all AI problems are a lot harder than you think they are - certainly relative to current science. Breaking things down into “goals” and “achievers” does not even begin to touch on the necessary decomposition into simpler parts, any more than learning to solder a wire teaches you everything you need to know about electrical engineering.

 
Comment by Tom McCabe
Jun 17, 2007 11:47 am

“Replace both “will”s with “may”s to recognize the uncertainty and sure, I’ll agree with this.”

Whether any given AI will blow up the Earth at one particular moment is guesswork. Whether any given AI will blow up the Earth if we keep pushing it and giving it more power isn’t guesswork; it’s bound to happen at some point.

““Do good stuff” cannot be so solved.”

Then we’re pretty much screwed, because a working FAI needs a mathematically consistent version of the algorithm “Do good stuff.” That’s what we’re aiming for: an FAI that will do good stuff in any circumstance, regardless of the particular value of “stuff”.

“You’re waving a magic wand and giving a universe model to an AI. How did it get there, and how difficult is getting it there compared to making it friendly?”

We already have fairly realistic universe models in many video games, so it can’t be that hard.

“So you mean to say, it could easily use this information to help us?”

I was equating using the information to blow us up and using the information to help us, because they’re basically at the same level of difficulty *to the AI*. If you’re talking about which one is harder to implement by the *programmers*, getting the AI to use it to help us is far harder.

“If an AI understands the universe that well, then surely it can figure friendliness out.”

Sure, it could figure friendliness out. The problem is that friendliness is only a ridiculously small subset of the possible targets to steer the universe towards. Why would it figure friendliness out, as opposed to figuring out how to turn everything into iron or increase the energy of a randomly selected proton? The space of possible moralities the AI could derive just from looking at the universe is so huge that the AI would never find friendliness even if it derived a new morality every millionth of a second until the heat death of the universe.
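
To put rough numbers on that last claim, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, about 10^100 years until heat death (a commonly quoted order of magnitude, not a figure from this thread) and one candidate morality tried per microsecond.

```python
import math

TRIALS_PER_SECOND = 1e6                  # one candidate every millionth of a second
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # about 3.16e7
YEARS_REMAINING = 1e100                  # ASSUMPTION: order-of-magnitude time to heat death


def searchable_bits(rate: float, seconds_per_year: float, years: float) -> float:
    """Return log2 of the number of candidates that can be tried in the given time."""
    trials = rate * seconds_per_year * years
    return math.log2(trials)


bits = searchable_bits(TRIALS_PER_SECOND, SECONDS_PER_YEAR, YEARS_REMAINING)
print(f"Blind enumeration covers roughly 2^{bits:.0f} candidates")
```

Under these assumptions the total comes to roughly 2^377 candidates. Blind enumeration at this rate would therefore not be expected to reach any target that takes substantially more than a few hundred bits to specify.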

 
Comment by Seth Baum
Jun 19, 2007 4:18 pm

Thanks to both of you for bearing with me on this.

Eliezer:

I agree that feeding an AI what it should do requires information beyond that required for feeding it how the universe is. I’m trying to get a sense for which project is harder, the is or the should. Perhaps there’s no meaningful answer to this question if we can always feed it more of either. However, the impression I get from you is that you think the should will be much more difficult. Am I wrong?

And please don’t get me wrong: I understand that basic AI work, whether it’s narrow or general, is very difficult work. Perhaps I should have been clearer about this when I casually referenced narrow AI progress.

……

Tom:

When I said “Do good stuff” cannot be so solved, I meant that it cannot be broken down into smaller parts that (to me at least seem to) fit within existing narrow AI work. Whether it’s Deep Blue, DARPA Grand Challenge, or any other narrow AI project I know of, it’s always us telling the AI what it should do, as opposed to us asking the AI to play philosopher and figure out what it should do. (I’d love to hear counterexamples if you know of any.) CEV does not fit into this category, since CEV is not a narrow AI project, or at least not a traditional one. Perhaps it will work for “Do good stuff”, or perhaps some other scheme will. I haven’t given up hope.

…..

One piece that seems missing from this is the can. If the AI has a model of the universe, does it necessarily have vast ability to manipulate it? The is and the can seem closely connected but not equivalent.

 
Comment by Tom McCabe
Jun 19, 2007 5:24 pm

Deep Blue was better than any human chess player. Therefore, no human could have possibly predicted (let alone programmed in) Deep Blue’s chess moves, because if they could, then they must be as good at chess as Deep Blue was.

 
Comment by Eliezer Yudkowsky
Jun 19, 2007 5:55 pm

Seth: While the “is” is explicitly preprogrammed into most modern, narrow AIs, human beings seem to have learned the majority of their current knowledge by looking at the environment, using core inference routines that did not incorporate explicit knowledge of, e.g., General Relativity. Solomonoff induction, a pretty simple formalism, describes a universal (albeit uncomputable) predictor, as good as any computable predictor out there, in a few lines of equations. If you’re trying to compress the code for looking at the universe, instead of directly describing the universe, then it compresses quite a lot! The only upper bound on how much it compresses is how much computing power you’re willing to spend decompressing. The intrinsic, irreducible Kolmogorov complexity of “look at the universe and see what’s there” is pretty damn low as compared to human morality.
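
For reference, the “few lines of equations” can be written in the standard textbook form below. The notation is an assumption for illustration, not taken from this thread: $U$ is a universal monotone Turing machine, $p$ ranges over minimal programs whose output begins with the string $x$, and $|p|$ is the length of $p$ in bits.

```latex
% Solomonoff's universal prior: total weight of all programs whose output starts with x
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}

% Prediction of the next symbol given the observations so far
M(x_{n+1} \mid x_{1:n}) = \frac{M(x_{1:n}\, x_{n+1})}{M(x_{1:n})}

% Kolmogorov complexity: length of the shortest program that outputs exactly x
K(x) = \min \{\, |p| : U(p) = x \,\}
```

The sum ranges over infinitely many programs, and deciding which of them produce output starting with $x$ runs into the halting problem, which is why the predictor is uncomputable even though its definition is short.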

In terms of what it takes to do in practice, I wouldn’t be surprised if a full, rigorous understanding of intelligence got you 95% of the way to building a Friendly AI in terms of real-world labor. But that’s for a rigorous understanding of intelligence. Not for slapping something together which, like the human brain, operates in wildly ad-hoc and inconsistent ways but works most of the time. The dangerous thing about this situation is that currently, people who are interested in AGI at all are (in their daily practice) focusing all their effort on rushing something out the door, not on building a system with clean foundations into which they know how Friendly AI will fit.

 
Comment by Seth Baum
Jun 20, 2007 8:25 pm

Eliezer: Thanks for your comments. They answer my questions nicely and give me much better insight into the situation.

 
Jul 11, 2007 11:10 am

[…] the Stamp-Collecting Device. A common objection goes like this: “An optimization process that’s smart enough to […]

 
Oct 9, 2009 12:32 pm

[…] more on the paperclip maximizer AI concept, see Nick Hay’s The Stamp Collecting Device. […]

 
Oct 9, 2009 8:35 pm

[…] through the long comments thread at The Stamp Collecting Device, I found a funny quip from the always quotable […]

 
