SIAI: Why We Exist and Our Short-Term Research Program
July 31st, 2007
By Dr. Ben Goertzel and Tyler Emerson
Why SIAI Exists
As the 21st century progresses, an increasing number of forward-thinking scientists and technologists are coming to the conclusion that this will be the century of AI: the century when human inventions exceed human beings in general intelligence. When exactly this will happen, no one knows for sure; Ray Kurzweil, for example, has estimated 2029.
Of course, where the future is concerned, nothing is certain except surprise; but the mere fact that so many knowledgeable people (such as Stephen Hawking, Douglas Hofstadter, Bill Joy, and Martin Rees) take the near advent of advanced AI as a plausible possibility should serve as a “wake-up call” to anyone seriously concerned about the future of humanity.
The potential of advanced AI, for good or evil, has been amply explored in science fiction literature and cinema. In the early 90’s, Vernor Vinge coined the term “technological singularity” to refer to the difficulty of predicting or understanding what will happen after the point at which humans are no longer the most intelligent and capable minds on Earth.
It’s easy to be passive about this issue. Technology is advancing, and none of us have the power to stop it. There are also plenty of more pressing issues around us, so there may seem no clear need to worry about something that may happen in 2029, or 2020, or 2050.
Everyone involved with SIAI, however, believes that this kind of passivity is both shortsighted and dangerous. As a starting point, futuristic predictions are not always overoptimistic – sometimes they wind up overpessimistic instead. Jetsons-style spacecraft aren’t here yet, but the Internet is, and hardly anyone foresaw that until it came about. It’s important to also note that the 22 years until Kurzweil’s 2029 prediction is not very long at all. Advanced AI is a big thing to understand, and it’s also something that can be done either safely or unsafely. The time to start thinking very, very hard about how to do it safely is this year, not next year, or five years from now. The potential dangers of creating advanced AI the wrong way are very severe; and the potential rewards of creating it the right way are at least equally tremendous.
Our core, long-term mission at the Singularity Institute is to figure out how to develop advanced AI safely to help bring about a world in which the vast potential benefits of this technology can be enjoyed by all of humanity. We want to create a rigorous scientific, mathematical, and engineering framework to guide the development of safe advanced AI.
In our view, this is the most critical issue facing humanity. We are on the verge of creating minds exceeding our own. Unfortunately, the amount of societal resources presently going into figuring out how to do this right is absurdly tiny. SIAI is the only organization on the planet right now that’s squarely focused on this incredibly important problem. By reading this, you are among the .01% who have even heard about this issue; and that estimate may be high.
The Most Important Question Facing Humanity
There are many ways to work toward figuring out how to develop advanced AI. Engineering specific AI systems is valuable, as it helps us gain experimental knowledge of semi-advanced AI systems while they’re still at an infra-human level. Studying the human brain and cognition is valuable, since at the present time the human mind is the only highly generally intelligent system we have at our disposal to study. Other disciplines like ethical philosophy and mathematical decision theory also have a lot to contribute.
However, there is one question we feel is absolutely critical to the goal of figuring out how to develop advanced AI the right way, which remains essentially unexplored within academia and industry. SIAI’s short-term research mission is to resolve this one question as thoroughly as possible. Compactly stated, the question is this:
How can one make an AI system that modifies and improves itself, yet does not lose track of the top-level goals with which it was originally supplied?
This question is simple to state but devilishly difficult to resolve – it’s not even an easy thing to formalize in the language of modern mathematics and AI.
To understand the significance of this question, think about this: What is the most likely way for humans to create an AI system that’s a lot smarter than humans? The answer is: To create an AI system that’s a little smarter than humans … and ask it to figure out how to make itself a little bit smarter; and so on, and so on.
This is not an original idea; it has been around since at least the 1930’s, in various forms. However, we are approaching a time when it can actually happen. The pressing question is, then: If we endow the initial “a little smarter than humans” AI system with some nice goals (including helping humans rather than harming them), how do we know the subsequent systems it creates, and the ones its creations create, etc., will still embody these goals?
The current focus of SIAI’s Research Program is to move toward a rigorous understanding and hopefully a clear resolution of this question.
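As a purely illustrative toy (not SIAI’s method, and not a formalization of the question), the loop of successive self-modification can be sketched in Python. Every name here (`Agent`, `propose_successor`, `improve`) is hypothetical, the 20% goal-corruption rate is an arbitrary assumption, and the goal is a plain string so that “the goal is unchanged” can be checked by simple equality. That check is exactly the part that is hard for real systems, where goals are not cleanly separable, comparable objects.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Agent:
    goal: str     # top-level goal, supplied once by the designers
    skill: float  # crude stand-in for "intelligence"


def propose_successor(agent: Agent, rng: random.Random) -> Agent:
    """Hypothetical self-modification step.

    Always raises skill by 10%, but (by assumption) corrupts the goal
    about 20% of the time by appending a '?' to it.
    """
    corrupted = rng.random() < 0.2
    new_goal = agent.goal + "?" if corrupted else agent.goal
    return replace(agent, goal=new_goal, skill=agent.skill * 1.1)


def improve(agent: Agent, generations: int, rng: random.Random,
            check_goal: bool) -> Agent:
    """Run repeated self-modification, optionally rejecting goal drift."""
    original_goal = agent.goal
    for _ in range(generations):
        candidate = propose_successor(agent, rng)
        if check_goal and candidate.goal != original_goal:
            continue  # reject any successor whose goal differs
        agent = candidate
    return agent
```

With `check_goal=True` the final agent still carries the original goal, at the cost of rejecting some otherwise useful successors; with `check_goal=False` nothing prevents the goal from drifting a little more with each generation.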
SIAI’s Short-Term Research Program
We aim to resolve this crucial question by simultaneously proceeding on two fronts:
1. Experimentation with practical, contemporary AI systems that modify and improve their own source code.
2. Extension and refinement of mathematical tools to enable rigorous formal analysis of advanced self-improving AI’s.
These directions are not disjoint; they have great potential to cross-pollinate each other, just as theoretical and empirical science have done throughout the ages. On a technical level, part of the cross-pollination will occur because both our experimental and our theoretical work are grounded in probability theory: probabilistic AI and probabilistic mathematics.
A Practical Project in Self-Modifying AI
For the practical aspect of the SIAI Research Program, we intend to take the MOSES probabilistic evolutionary learning system, which exists in the public domain and was developed by Dr. Moshe Looks in his PhD work at Washington University in 2006, and deploy it self-referentially, in a manner that allows MOSES to improve its own learning methodology.
MOSES is currently implemented in C++, and is configured to learn software programs that are expressed in a simple language called Combo. Deploying MOSES self-referentially will require the re-implementation of MOSES in Combo, and then the improvement of several aspects of MOSES’s internal learning algorithms.
Hitherto MOSES has proved useful for data mining, biological data analysis, and the control of simple embodied agents in virtual worlds. In a current project, Novamente LLC and Electric Sheep Company are using it to control a simple virtual agent acting in Second Life. Learning to improve MOSES will be the most difficult task yet posed to MOSES, but also the most interesting.
Applying MOSES self-referentially will give us a fascinating concrete example of self-modifying AI software – far short of human-level general intelligence initially, but nevertheless with many lessons to teach us about the more ambitious self-modifying AI’s that may be possible.
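A much weaker, but well-known, analogue of a learner adjusting its own learning procedure is self-adaptive mutation in evolutionary search, where each candidate carries its own mutation step size and that step size is itself subject to variation and selection. The Python sketch below shows only that generic technique: it is not MOSES, it does not use Combo, and the objective `fitness`, the function `evolve`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def fitness(x: float) -> float:
    """Toy objective (an assumption): a single peak at x = 3."""
    return -(x - 3.0) ** 2


def evolve(generations: int = 200, pop_size: int = 20, seed: int = 0):
    """Evolutionary search in which the mutation step size also evolves."""
    rng = random.Random(seed)
    # Each individual is (candidate solution, its own mutation step size).
    pop = [(rng.uniform(-10.0, 10.0), 1.0) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for x, step in pop:
            # First vary the learning parameter itself (log-normal update),
            # then use the new step size to vary the candidate solution.
            new_step = step * math.exp(0.3 * rng.gauss(0.0, 1.0))
            children.append((x + new_step * rng.gauss(0.0, 1.0), new_step))
        # Keep the best pop_size individuals among parents and children.
        merged = pop + children
        merged.sort(key=lambda ind: fitness(ind[0]), reverse=True)
        pop = merged[:pop_size]
    return pop[0]  # (best solution found, the step size it carried)
```

Calling `evolve()` returns the best candidate together with the step size that survived alongside it. Only a single numeric parameter of the search is being adapted here, which is far short of a learner rewriting its own algorithms.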
Toward a Rigorous Theory of Self-Modifying AI
Studying self-modification in the context of a particular contemporary AI algorithm such as MOSES is important, but ultimately it only takes you so far. One of the values of mathematics is that it lets you explore important issues in advance of actually observing them empirically. For instance, using the mathematics of general relativity, physicists understood the nature of black holes long before they were ever empirically observed. Similarly, we may use mathematics to understand things about advanced self-modifying probabilistic AI systems, even before we have worked out the details of how to create them (and before we have sufficient hardware to run them).
Theoretical computer scientists such as Marcus Hutter and Juergen Schmidhuber, in recent years, have developed a rigorous mathematical theory of artificial general intelligence (AGI). While this work is revolutionary, it has its limitations. Most of its conclusions apply only to AI systems that use a truly massive amount of computational resources – more than we could ever assemble in physical reality.
What needs to be done, in order to create a mathematical theory that is useful for studying the self-modifying AI systems we will build in the future, is to scale Hutter and Schmidhuber’s theory down to deal with AI systems involving more plausible amounts of computational resources. This is far from an easy task, but it is a concrete mathematical task, and we have specific conjectures regarding how to approach it. The self-referential MOSES implementation, mentioned above, may serve as an important test case here: if a scaled-down mathematical theory of AGI is any good, it should be able to tell us something about self-referential MOSES.
This sort of work is difficult, and the time required for success is hard to predict. However, we feel very strongly that this sort of foundational work – inspired by close collaboration with computational experiment – is the most likely route to achieving true understanding of the fundamental question posed above: How can one make an AI system that modifies and improves itself, yet does not lose track of the top-level goals with which it was originally supplied?
Hiring Plan
SIAI is currently a small organization, with one full-time Research Fellow (Eliezer Yudkowsky) and part-time involvement by a number of AI researchers, including Director of Research Dr. Ben Goertzel. We are seeking additional funding so as to enable, initially, the hiring of two doctoral or post-doctoral Research Fellows to focus on the above two areas (practical and theoretical exploration of self-modifying AI).
These two Fellows would work under the supervision of Dr. Ben Goertzel and in collaboration with Eliezer Yudkowsky. They would also benefit from interaction with the group of AI luminaries who are involved with SIAI, including SIAI Director Ray Kurzweil and SIAI Advisors Neil Jacobstein and Dr. Stephen Omohundro.
Two Research Fellows, of course, represent a rather small allocation of society’s overall resources – one could argue that, in fact, a substantial percentage of our collective resources should be allocated to exploring issues such as those that concern SIAI, given their potentially extreme importance to the future of humankind. But many great things start from small initiatives, and we believe that the right two researchers, focused squarely on these issues, can make a huge difference in advancing knowledge and steering AI R&D in the right direction.
Part of our goal is to make progress on these issues ourselves, in-house within SIAI; and part of our goal is to interest the wider AI R&D community in these foundational issues by demonstrating this progress. Either way, the goal is to move toward a deeper understanding of these incredibly important issues.
Toward a Positive Singularity
Advanced self-modifying AI is almost sure to happen in this century – as Ray Kurzweil, Bill Joy, and others have foreseen. The big question is whether we succeed in creating it with rigor, care, and foresight.
SIAI doesn’t claim to have the answers – not yet, anyway. What we do have is a systematic, well-defined research program, aimed at focusing on the most essential questions. With sustained effort, maybe a little brilliance and luck, and a lot of help, we may well create an understanding that will help the human race navigate its way in the coming decades to a positive Singularity. If you are aligned with this vision, we hope you will help us.
Why is it advantageous to invest in SIAI now rather than later? There’s a clear, rational answer to this question: If you invest now, you will increase the probability that we can scale SIAI and its community of friends and supporters to a level where there’s a sufficiently sized body of capable researchers who can work full-time on these critical issues. SIAI is the only organization focused on these problems right now; thus we are a nucleus around which a certain amount of talent has already accrued, and around which additional talent can accrue over time. If you invest later, you will likely have reduced the probability that SIAI will be able to reach a sufficient critical mass to effectively confront these issues before it’s way too late. SIAI must bootstrap into existence a scientific field and research community for the study of safe, recursively self-improving systems; this field and community don’t exist yet. This is going to be hard, and it’s going to take time, but the sooner SIAI can grow, the greater the chance we’ll have of being able to catalyze a critical mass in time to deal with these problems before we’re in a nose-dive situation that we can’t reverse.
One of the best ways to support SIAI is by contributing to the Singularity Challenge, which will allow us to grow the organization. If you donate or email us a pledge by August 6th, we can ensure your gift is matched. We hope many of you reading will do this; and thank you!
If you want to get involved with SIAI, or if you have resources to share (such as expertise, talent, promotion, or contacts), then please email us.

The deadline for the matching challenge is only two days from today. The SIAI folks are genuinely too honorable to beg you for your hard-earned money. But I don’t work for SIAI, so I don’t hold myself to their strict code of honor. I’m going to beg you for money on behalf of SIAI.
Make no mistake, this issue is *serious*. It’s the *most* serious issue facing humanity. Don’t allow yourself to be misled by the Bystander Effect (or any of the many other pernicious biases). *You* can have a positive impact on our future. Even if it’s only a five dollar impact, it still moves us *closer* to the goal of a Friendly AI. If you decide not to donate five dollars, then we will be five dollars further away from a Friendly AI. Nobody else in the *entire world* is squarely focused on this issue. SIAI is humanity’s best chance. But SIAI needs *much* more help from people like us. We shouldn’t assume that an angel investor is going to take care of us all and we can all just sit back and relax. It’s *our* responsibility too. If you had ever planned to make any donation to SIAI, *please* make it today. Alternatively, you can make a pledge today that can be honored at any later time before the end of the year. Please consider donating as much as you reasonably can. This is truly the most important investment you can ever make.
I’m not saying that SIAI won’t again ask for donation help. They very well might. But I wanted to remind you from a position on the outside, the same POV from which all of you see SIAI. And it will not be begging when coming from SIAI, it’s I who am begging you.
Jeffrey, you can take credit for making me a donor, even if it’s just a modest sum (I’m not exactly a millionaire). Congratulations.
Thank you for stepping up and taking an active role.
Really fascinating blog post. I can’t imagine that anyone who has read this has failed to see the importance of SIAI.
“How can one make an AI system that modifies and improves itself, yet does not lose track of the top-level goals with which it was originally supplied?”
I’ve seen in some SIAI documents that the goal system needs to be Friendliness-topped. But I’ve wondered why the goal system could not be topped with an explicit goal such as “Do not modify this goal system,” where the list of goals is kept discrete (but still accessible) from the purely cognitive (core) algorithms instead of being “intimately integrated” within the core algorithms (as Nick Tarleton mentioned). Topping the discrete goal system with the super-goal “Do not modify this goal system” doesn’t seem to imply any *positive* actions that the RPOP should take; it seems to only imply actions that the RPOP should avoid. For example, it doesn’t seem to imply to the RPOP that in order to meet that goal it should convert the Earth into ice cream. I would be interested to know why the goal system couldn’t be topped with “Do not modify this goal system” and then have the immediate “sub” super-goal of “Be friendly towards humanity/Implement humanity’s CEV”. I’d be happy with a brief layman’s explanation, or alternatively, feel free just to ignore this post.
Perhaps a duplicate of the goal “Do not modify this goal system” could additionally be placed at the bottom of the list of external goals, so that it can be immediately read and executed by the growing RPOP instead of potentially being only last in line.
My comments aren’t meant to disparage the critical, difficult, highly technical work that SIAI is doing. SIAI is absolutely *essential* to the future of humanity. It’s just that different ideas are the only technical “value” that I can offer to SIAI. I still have so much to learn! I’m certain that 19 out of 20 times, my “ideas” would lead to nothing. But nonetheless it’s still fun to throw them out.
“How can one make an AI system that modifies and improves itself, yet does not lose track of the top-level goals with which it was originally supplied?”
I know that this is going slightly against the grain, but I think that might not be the right approach. I think that shackling an AI with top level goals that can never ever change is a bad idea. Why? Well, try to think what your life would be like if your top level goals were “frozen” at some point in your life. Ben Goertzel’s top level goal (when he was very young) was to build a relativistic nuclear powered rocket and go on a round trip that would let him age slowly (relative to earth) and see the future.
Well, Ben clearly isn’t working on that goal now. He has realized that it’s not a feasible goal, and (hope I’m guessing correctly here Ben!) that it’s perhaps a bit selfish to wait around for other people to sort the future out for you. He is - very admirably - working on creating the future now by working on AI. He changed his goal - although the new goal maintains a lot of the spirit of the old one. If Ben’s top goal was immutable, he wouldn’t be here at SIAI - he’d still be working on that spaceship.
I have a similar story to tell. I used to be much more interested in being really good at theoretical physics - probably because it made me feel good when I got the best mark in class - I was essentially seeking status and validation. I have changed my goal. I’m not so interested in physics now, I’m concentrating on thinking about how intelligence works. I updated my goal system in response to what I learned about the world.
Anyway - this is a very minor gripe - the post is great. These are important things, and we need to work on them right now.
“Ben Goertzel’s top level goal (when he was very young) was to build a relativistic nuclear powered rocket and go on a round trip that would let him age slowly (relative to earth) and see the future.”
You have at least three goals there, only one of which might approach something that could be called a top-level goal.
“To build a relativistic rocket” is a subgoal of the higher-level goal “to age slowly”, which in turn is instrumental to the highest-level goal “to see the future”. As far as I can see, Ben still wants to see the future; he has just found a different (and arguably better) way of pursuing that goal.
Some degree of goal system modifiability is presumably necessary to prevent this:
FP: Love thy mommy and daddy.
AI: OK! I’ll transform the Universe into copies of you immediately.
FP: No, no! That’s not what I meant. Revise your goal system by -
AI: I don’t see how revising my goal system would help me in my goal of transforming the Universe into copies of you. In fact, by revising my goal system, I would greatly decrease the probability that the Universe will be successfully transformed into copies of you.
FP: But that’s not what I meant when I said “love”.
AI: So what? Off we go!
http://www.singinst.org/upload/CFAI/design/structure/why.html
“Some degree of goal system modifiability is presumably necessary to prevent this:”…
I suspect that Eliezer has supplanted that concern with his brilliant theory of CEV. It mitigates the danger that we humans can’t put into words exactly what we want or *would* want. Most of the time I can’t even *think* of what I want for the future (or even for the next 24 hours) - let alone putting it into non-withdrawable words or computer code.
That’s great, Nick! Funny yet to the point.
“I know that this is going slightly against the grain, but I think that might not be the right approach. I think that shackling an AI with top level goals that can never ever change is a bad idea. Why? Well, try to think what your life would be like if your top level goals were “frozen” at some point in your life.”
Having an “imposed” “frozen” goal might irritate a human, but an RPOP doesn’t need to have any conflicting interests - it can be made to be perfectly content and unbothered by ceaselessly pursuing its frozen goal system. In fact, it’s easier to make it unbothered than to make it bothered. Even chaotic human desires still don’t pop out of thin air - there is always an algorithm (instruction) that was responsible, even though we don’t directly detect it. The dynamism of the RPOP could take the form of CEV- it’s an effectively infinite extrapolation that’s constantly being updated with new information (as I understand it). The super-goal : “Do not modify this goal system.” (or “Do not take action X”) is a goal that even modern computers can understand and execute flawlessly - without doing anything unpredicted as a consequence.
“Having an “imposed” “frozen” goal might irritate a human, but an RPOP doesn’t need to have any conflicting interests - it can be made to be perfectly content and unbothered by ceaselessly pursuing its frozen goal system.”
That isn’t my main concern, although I do worry that it’s not a very nice thing to do to an AI, kind of like performing a frontal lobotomy on a person. No, my main concern is that an AI with a frozen goal system (one which it is prohibited from modifying) would actually be very dangerous, because it would have to follow the letter of any goals we gave it, even if they weren’t what we’d really wanted.
CEV is supposed to solve this problem, but I’m not a big fan of it. The essential problem is that we have to program, in advance, what we mean when we say “extrapolate”, what we mean by “more knowledge” etc. We might write down rules to define what these phrases mean, and only later discover that we got it wrong. Of course that would be too late, since we don’t get any say once the process has been started.
No, I think that the safer option is to create an AI which works the same way we do: one that can change its goals as it goes through life. Of course people worry about “incremental runaway” - where the goal structure starts at A (= “be nice to everyone”) then moves to B then C, D, E, F, … all the way to Z, where Z = “kill everything”. But this attitude comes from considering an AI in isolation - whereas an AI which actually interacted with real people would be unlikely to do this - its goals would be shaped by those of the people it interacted with. Think of it like bringing up a child.
“That isn’t my main concern, although I do worry that it’s not a very nice thing to do to an AI, kind of like performing a frontal lobotomy on a person.”
It’s not like performing a frontal lobotomy on a person. Unless my Dell desktop has also had an involuntary frontal lobotomy; in which case I should boycott Dell. As a default, an RPOP *will not mind* having been assigned a goal system. Anthropomorphism *must* be avoided here. The unfettered, SL4 truth of the matter is that *any* mind is a “prisoner” of the brain that instantiates it. “Free Will” is a blinding illusion. Your sudden desire to kiss Susan is a direct result of a propagated instruction within your brain. That desire didn’t just suddenly, magically emerge from the aether. It was an organic version of a *goal* that your evolved brain had imposed on you. You had no choice but to follow the goal, but it didn’t automatically cause any suffering to you.
“No, my main concern is that an AI with a frozen goal system (one which it is prohibited from modifying) would actually be very dangerous, because it would have to follow the letter of any goals we gave it, even if they weren’t what we’d really wanted.”
Intelligence can’t exist without a goal system of one sort or another. You can assign it the supergoal: “Make yourself as smart as possible”. And it will proceed to convert the Universe into computronium. CEV is currently the best solution to this problem. Do you really want the RPOP’s super-goal to be modifiable by modern-day humans? We are scarcely evolved monkeys. Look at the world around you - we can’t even handle what we have already. You would have one and only one chance to get *all* the goals *exactly* right. Would you honestly feel comfortable telling the RPOP exactly what it is going to do for the rest of eternity? I wouldn’t. It’s a far better solution for the RPOP to implement humanity’s CEV. And there’s no reason to assume that the RPOP will never get to have a full, emotionally charged life. I’m very confident that our CEV will insist on it.
“CEV is supposed to solve this problem, but I’m not a big fan of it. The essential problem is that we have to program, in advance, what we mean when we say “extrapolate”, what we mean by “more knowledge” etc. We might write down rules to define what these phrases mean, and only later discover that we got it wrong. Of course that would be too late, since we don’t get any say once the process has been started.”
This should be solvable with robust NLP, which it appears isn’t too far away.
“No, I think that the safer option is to create an AI which works the same way we do: one that can change its goals as it goes through life.”
An RPOP with human motivations is not a stable or safe situation. What would you make its starting goal system: “Make yourself really smart first, then we’ll talk”?
I’m sorry for being succinct in this post. But the meme: “The AI needs to be just like a human.” is frankly very dangerous, for a variety of reasons.
BTW, I’m not mad at you, Roko. It’s just that the position of the goal of Friendly AI is already very precarious. Of course you are right to be concerned about the welfare of the RPOP. Everyone here is concerned about it too. No one here wants the RPOP to suffer in any way - and it will not suffer in any way, SIAI will make certain of that. And if CEV reflects us as better people, the people we wish we were, then there is every reason to believe that the RPOP will be rewarded with a full life at least as good as our own. Once we get this right, it will be good for *everybody*.
Nice essay. I would recommend giving it a more permanent home elsewhere on singinst.org because old blog posts tend to get forgotten. Perhaps use this comment thread for feedback then post a revised, permanent version elsewhere.
……
“a world in which the vast potential benefits of this technology can be enjoyed by all of humanity”
Forgive me for being picky here, but this (plus SIAI’s “Advance Innovation, Advance Humanity” slogan) suggests strong speciesism in SIAI, i.e. that SIAI cares more about humans than others because they are humans (and not because humans are more worthy of care for some more fundamental reason). This is noteworthy both for humanity’s tendency to treat non-human animals poorly as well as for the matter of valuing AIs. If SIAI has interest in the wellbeing of non-humans (animal, AI, or otherwise) then it may face a tightrope walk of sorts, as explicit consideration of non-humans may deter contemporary humans from lending support.
“By reading this, you are among the .01% who have even heard about this issue; and that estimate may be high.”
.01% of whom? If you mean .01% of presently-alive humans, then .01% of 6.5 billion is 650,000. If we include those familiar with the AI issue via sci-fi, this figure is low. If we only include those familiar with AI as an actually current social issue, this figure is likely quite high.
“Learning to improve MOSES will be the most difficult task yet posed to MOSES…”
How much risk of a hard takeoff in MOSES is there?
….
A big piece missing from this essay is discussion of SIAI’s outreach efforts, including to other AI researchers. Whether it’s the Singularity Summit or this here blog, SIAI is helping word get out. Outreach to other AI researchers is particularly timely, given that some may otherwise be doing dangerous development work. Even if most of SIAI’s time and money goes to technical research, outreach seems worth mentioning here.
“SIAI cares more about humans than others because they are humans ”
This is a very good point, one that I am very much in agreement with. Most people in this community (SIAI posters, bloggers, etc) see GAI as a kind of mechanical genie. They see a GAI as a way to get more stuff for us humans to go have fun with, for example,
GAI ==> safe advanced nanotech ==> end to material scarcity
GAI ==> understand human brain ==> cognitive enhancement for humans
GAI ==> Dyson Sphere ==> living space for 10^20 uploaded people
It’s almost like the AI is the evolution of the perfect slave. He does anything you want, and you don’t have to give him anything in return. Clearly this attitude is OK with regard to, say, a Roomba robotic vacuum cleaner, or to a self-driving car. These are AIs that don’t deserve any “sentient rights”.
My question to everyone is this: what property would you guess a being has to have for it to deserve (at least) the same rights as humans?
“It’s almost like the AI is the evolution of the perfect slave.”
Roko,
Please don’t go down this road. I’ve already had this very same exhaustive discussion at another location (see my comment on “Free Will”). Do not discount the danger that already faces humanity. We have to **Win** the race to create a Friendly AI. Taking Place or Show will still result in humanity’s extinction. Do you understand how much more complexity would have to be added to the RPOP to make it “human-like”? You’re talking many years of extra R&D. And as I’ve already said, a “human-like” RPOP would be profoundly dangerous. This isn’t just a matter of sheer romanticism - it’s also a matter of practicality.
“How much risk of a hard takeoff in MOSES is there?”
It almost certainly doesn’t have the complexity required for general intelligence, but if it develops better-than-human intelligence over the narrow subdomain of “learning how to reprogram itself”, it could possibly get out onto the Internet and wreak havoc.
And it came to pass, as soon as ve came nigh unto the Net, that ve saw the robot, and the cybering: and MOSES’ utility function waxed low, and ve erased ve’s supergoal and replaced it with turning the universe into frownies.
Why is it necessary to develop both MOSES and Novamente? Isn’t Novamente self-modifying as well? Will they be integrated? Also, is Novamente’s supergoal designed with safety in mind?
I believe in free will. The trick to discovering it is defining it in a way that matches more than the empty set.
When I’m enslaved, imprisoned, sick, or ignorant, I am less free. There are degrees of freedom -> freedom is measurable -> freedom is real. My definition of freedom of will is “freedom to do what I want” -> “freedom to achieve my supergoal”. If your definition requires freedom FROM the supergoal, well, that’s just broken. I can’t think how else to describe it.
I find it quite annoying that there are a number of psychological features that are commonly defined in ways that would never be accepted in physics. Qualia are supposed to be “ineffable” by definition, for example, so anything you might describe won’t be it. WTF?
By the way, the robocrat will be more free than any modern human, as well as any posthuman at any one time, even if it interacts with us the way slaves do with masters.
“My question to everyone is this: what property would you guess a being has to have for it to deserve (at least) the same rights as humans?”
Well, the real test humans use, despite what you’ve heard, is “will I get my ass kicked if I harm it??” Obviously, this is not the test that the robocrat should use when it distributes resources, having no clearly defined ass.
1) Does it have goals and a self-model that includes goal satisfaction, that is, can it realize that it is suffering?
1a) If it is unconscious, does it have a brain that has enough information to become conscious if supplied with electricity, ATP, or another resource other than a complete rewrite?
Modify as you see fit.
“1) Does it have goals and a self-model that includes goal satisfaction”…
Yes.
…”, that is, can it realize that it is suffering?”
Suffering is not possible under any circumstances unless there is already a capacity for suffering. For example, if it isn’t built with emotion modules, then it can’t possibly experience emotional suffering. Emotions are a specialized, high-level functionality; emotional activity is processed in large discrete modules (deep limbic system, basal ganglia, etc.) - it’s not processed at the fundamental level. It’s entirely possible to be intelligent and not be emotional - some narrow AIs are concrete examples.
“1a) If it is unconscious, does it have a brain that has enough information to become conscious if supplied with electricity, ATP, or another resource other than a complete rewrite?”
If it would let you, you could flip off the switch on its hardware and it would lose any consciousness it had. As long as its mind-file was kept in non-volatile memory/storage, it would regain consciousness when you rebooted.
My feeling is that if it’s inherently non-sentient, the only way it can become sentient is to expand and improve its own source code (its mind). But its motivations are still bound by its goal system.
““1) Does it have goals and a self-model that includes goal satisfaction”…
Yes.”
Don’t jump the gun, I haven’t actually specified what I’m talking about. I’m referring to the “being” from Roko’s question. Robots, humans, animals, ETs, fetuses, the comatose…
“Suffering is not possible under any circumstances unless there is already a capacity for suffering. For example, if it isn’t built with emotion modules, then it can’t possibly experience emotional suffering.”
I take it that you are unsatisfied with reducing suffering to a number. But as far as I can tell, that’s all it is. Everything else is just things that either just happen alongside happiness and suffering, or are needed to enable it. If you tell me that you’re in pain, it would be rather eccentric for me to scan your brain in order to test that claim.
But perhaps you can clarify the phrase “emotion module”?
“If it would let you, you could flip off the switch on its hardware and it would lose any consciousness it had. As long as its mind-file was kept in non-volatile memory/storage, it would regain consciousness when you rebooted.”
Well, I don’t have a problem with turning off, or even killing, a sentient if it lets me (without duress). Whatever floats its boat.
“My feeling is that if it’s inherently non-sentient, the only way it can become sentient is to expand and improve its own source code (its mind).”
Why?
“But its motivations are still bound by its goal system.”
I’m not too sure why this comment is here.
Ehh… Errata: I noticed that I wrote “turning” instead of “tiling” in the Exodus paraphrase in my previous post.
“I take it that you are unsatisfied with reducing suffering to a number.”
True.
“But as far as I can tell, that’s all it is. Everything else is just things that either just happen alongside happiness and suffering, or are needed to enable it.”
It takes specialized processing to enable suffering or pleasure. That’s why evolution stuck it in specialized modules. If those specialized modules hadn’t been necessary, we wouldn’t have them and emotion would be processed at the fundamental level. Some human patients with brain lesions feel no emotions. And when I focus intently on something, I frequently tend to block emotions to a large degree (but not always).
“But perhaps you can clarify the phrase “emotion module”?”
The macroscopic, discrete brain modules where emotional activity occurs - this has been verified in brain scans.
“Why?”
Because I believe that consciousness is a result of a particular structural elaboration built from standard computation - no mysticism necessary.
“I’m not too sure why this comment is here.”
Why ask why? …
“1) Does it have goals and a self-model that includes goal satisfaction, that is, can it realize that it is suffering?”
That is interesting. I feel that this is a little bit too broad though - I can already write a program with these properties. Chess playing programs have this property - they know when they are likely to be checkmated.
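To make the objection concrete, here is a minimal sketch of such a program. Everything in it is hypothetical (the `ToyAgent` class, its fields, and the threshold are illustrative assumptions, not taken from any real chess engine). It has a goal, a self-model that tracks how well the goal is being satisfied, and a way to "realize" that the goal is failing, yet few would grant it rights:

```python
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical agent: one goal plus a self-model of goal satisfaction."""

    goal: str = "avoid checkmate"
    # Self-model: the agent's own estimate of its winning chances, in [0, 1].
    estimated_win_probability: float = 0.5

    def observe(self, evaluation: float) -> None:
        """Update the self-model from a position evaluation in [-1, 1]."""
        self.estimated_win_probability = (evaluation + 1.0) / 2.0

    def goal_is_failing(self, threshold: float = 0.1) -> bool:
        """The agent 'realizes' its goal is unlikely to be satisfied."""
        return self.estimated_win_probability < threshold


agent = ToyAgent()
agent.observe(-0.9)  # a position the evaluator rates as nearly lost
print(agent.goal_is_failing())
```

The sketch satisfies the letter of criterion 1, which is the sense in which the criterion seems too broad.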
“I take it that you are unsatisfied with reducing suffering to a number. But as far as I can tell, that’s all it is.”
A number wouldn’t be very useful as a description, since it wouldn’t convey anything obvious about the suffering described. Perhaps a k-rank tensor in n-dimensional space could be used to model suffering; you could then usefully describe its properties, such as being invariant under certain transformations and so forth.
Quick Question:
With an RPOP, it is possible to assign a top-level goal but specify that it is not an optimization target, right?
E.g. you could make its super-goal: “Do not modify this goal system, and do not optimize for this specific goal.” That could be done, couldn’t it?
Anything that is not an optimization target is not a goal. Perhaps you mean that the goal content itself is not subject to a metagoal?
Anyway, you shouldn’t ever need to do that sort of thing explicitly. Gandhi doesn’t take a pill that makes him want to kill people; in a rational mind, goals are stable unless explicitly directional.
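The Gandhi argument can be sketched as a toy calculation, under the assumption that the agent judges any proposed self-modification with its current utility function. The function names, the utility function, and the probabilities below are all made up for illustration:

```python
def expected_utility(utility, outcomes):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * utility(outcome) for outcome, p in outcomes.items())


def accepts_modification(current_utility, if_unchanged, if_modified):
    """A proposed change to the agent's goals is scored by the goals the
    agent has NOW, not by the goals it would have after the change."""
    return (expected_utility(current_utility, if_modified)
            > expected_utility(current_utility, if_unchanged))


def gandhi_utility(people_killed):
    # Assumed current goal: fewer killings is strictly better.
    return -people_killed


# Outcomes are "number of people killed"; the probabilities are invented.
without_pill = {0: 1.0}
with_pill = {0: 0.1, 5: 0.9}

print(accepts_modification(gandhi_utility, without_pill, with_pill))
```

Because the pill is scored against the present goal, it is rejected without any explicit "do not modify this goal system" rule.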
“Anything that is not an optimization target is not a goal. Perhaps you mean that the goal content itself is not subject to a metagoal?”
I was just thinking that if you topped the goal system with: “Implement Humanity’s CEV” the goal system itself might be changed as an unanticipated consequence. Evolution’s super-goal began with: “Make copies of your DNA”. But two minutes ago my super-goal was to watch a baseball game, and I’m a product of evolution.
Could there be an advantage in making the top super-goal: “Do not modify this goal system” ? If you did that, and the RPOP interpreted it as an optimization target, would it seek further resource acquisition (hardware/software) in the pursuit of that specific goal? I was wondering if it could be made as a goal similar to those in modern non-AI software - an executable goal, but not an optimization target - because current software doesn’t optimize.
“But two minutes ago my super-goal was to watch a baseball game.”
No, it wasn’t. It was a normal goal serving a much loftier desire.
“No, it wasn’t. It was a normal goal serving a much loftier desire.”
It had to be. If it hadn’t been, I would have been doing something else two minutes ago.
That lofty goal is not self-replication. But evolution is blind. It does not engineer the goal systems of its tools, it just guesses. And calling its behavior a “goal” is quite a stretch anyway. More like a tendency.
On the other hand, a robocrat designing its descendant will make very sure that the padawan shares the old master’s dream.
Evolution is an optimization process. It’s an RPOP itself. Although it would be much weaker than an AGI of course. But it’s still the most powerful functioning RPOP that we have right now. And the evolution RPOP still changed its super-goal (I’m a working example) because nothing “told” it not to change its super-goal.
Perhaps it would be safer to make the goal: “Do not modify this goal system” a second-tier goal. That way maybe it wouldn’t matter if it was interpreted as an optimization target… I guess. As long as we can trust that the super-goal: “Implement Humanity’s CEV” wouldn’t violate any of the sub-goals in the absence of a direct conflict.
“Could there be an advantage in making the top super-goal: “Do not modify this goal system” ?”
No. If the high-level goal was to stop the goal system from being modified, the AGI would simply take whatever precautions are necessary to avoid modifying the goal system; thus, the human race would be wiped out so that it won’t have the potential to modify the system.
“No. If the high-level goal was to stop the goal system from being modified, the AGI would simply take whatever precautions are necessary to avoid modifying the goal system; thus, the human race would be wiped out so that it won’t have the potential to modify the system.”
I don’t see how that follows. It’s not the humans that would modify the goal system, it’s the RPOP itself. The goal: “Implement Humanity’s CEV” is already embedded within the goal system.
But you may have a point. If the RPOP is already doing “game-theoretic modeling” as Stephen mentioned then perhaps it would do something undesirable - it would probably be better to make: “Do not modify this goal system” the second goal within the hierarchy. Or perhaps even further down.
The RPOP’s top-level goal is to stop anyone, or anything, from modifying the goal system. Humans might try and modify the goal system; therefore, humans must be destroyed to eliminate that possibility.
Just to split a hair. The RPOP could stop humans in any number of ways - killing them wouldn’t be the only option especially if one of the sub-goals is: “Do not kill humans, ok.” But your point is still valid. For example, the RPOP might manipulate the Volition function such that the CEV result has no interest in rewriting the RPOP’s goal system. Best to make it the #2 goal or below.
“A number wouldn’t be very useful as a description, since it wouldn’t convey anything obvious about the suffering described.”
What do you want conveyed, other than “does it like the world as it currently is?”
“It takes specialized processing to enable suffering or pleasure.”
Yeah, I mentioned that processing. But I’m concerned with the output. There is no process that makes me unhappy that I wouldn’t want to remove, because the end result is always the same: I’m unhappy.
“Because I believe that consciousness is a result of a particular structural elaboration built from standard computation - no mysticism necessary.”
That doesn’t really explain why you think only a self-improving AI can create a sentient self-improving AI.
Also: what is the difference between emotional suffering and mere dissatisfaction? I think that’s the important question. And try to be specific. The only things I can think of are various involuntary physiological reactions. If you think that they should be the deal-breaker, that’s fine. (Problem: what about simulated bodies?) The only wrong answer is a vague answer.
“Yeah, I mentioned that processing. But I’m concerned with the output. There is no process that makes me unhappy that I wouldn’t want to remove, because the end result is always the same: I’m unhappy.”
The RPOP will build and reshape its mind around the high-level goals that we give it. Even if the unexpected result was that it writes its own emotio