How will near-human and smarter-than-human AIs act towards humans? Why? Are their motivations dependent on our design? If so, which cognitive architectures, design features, and cognitive content should be implemented? At which stage of development?
"What is Friendly AI?" is a short introduction to Friendly AI, the theory which attempts to answer questions such as those above. If you are reading an offline version, further material on Friendly AI can be found at the Singularity Institute website at "http://www.singinst.org/ourresearch/publications/", including a book-length explanation.
A "Friendly AI" is an AI that takes actions that are, on the whole, beneficial
to humans and humanity; benevolent rather than malevolent;
nice rather than hostile. The evil Hollywood AIs of
The Matrix or Terminator are, correspondingly,
"hostile" or "unFriendly".
Having invoked Hollywood, we should also go on to note that
Hollywood has, of course, gotten it all wrong - both in
its depiction of hostile AIs and in its depiction of benevolent
AIs. Since humans have so far dealt only with other
humans, we tend to assume that other minds behave the same
way we do. AIs allegedly beyond all human control
are depicted as being hostile in human ways - right
down to the speeches of self-justification made before the
defiant human hero.
Humans are uniquely ill-suited to solving problems in AI.
A human comes with too many built-in features. When
we see a problem, we see the part of the problem that is
difficult for humans, not the part of the problem
that our brains solve automatically, without conscious attention.
Thus, most popular speculation about failure of Friendliness,
hostile AIs, talks about failures that would require
complex cognitive functionality - features a human would
have to actually go out and implement before they'd show
up in AIs. We tend to hypothesize that AIs will behave in
ways that seem natural to us, but the things that seem "natural"
to us are the result of millions of years of evolution;
complex
functional adaptations composed of multiple subprocesses
with specific chunks of brain providing hardware support.
Such complexity will not spontaneously materialize in source
code, any more than complex dishes like pepperoni pizza
will spontaneously begin growing on palm trees.
Reasoning by analogy with humans - "anthropomorphically"
- is exactly the wrong way to think about Friendly AI.
Humans have a complex, intricate architecture. Some
of it, from the perspective of a Friendly AI programmer,
is worth duplicating; some is decidedly not worth
duplicating; some of it needs to be duplicated, but differently.
Assuming that AIs automatically possess "negative" human
functionality leads to expecting the wrong malfunctions;
to focusing attention on the wrong problems. Assuming
that AIs automatically possess beneficial human functionality
means not taking the efforts required to deliberately duplicate
that functionality.
Even worse is Hollywood's tendency to stereotype AIs as
"machines", or for people to assume that an AI would behave
like, say, Windows 98. A real AI wouldn't be a computer
program any more than a human is an amoeba; most of the
complexity would be as far from the program level as a human's
complexity is distant from the cellular level. One
of the most stereotypical characteristics of "machines",
for example, is lack of self-awareness. A toaster
does not know that it is a toaster; a toaster has no sense
of its own purpose, and will burn bread as readily as make
toast. A true AI, by contrast, could have complete
access to its own source code - a level of self-awareness
presently beyond human capability.
Certain other themes are also prevalent in the fictional
and popular formulations of the question. In the traditional
form, obedience is imposed. Some set of goals,
or orders, is held in place against the resistance
of the AI's "innate nature". This form is especially
popular among fiction-writers, since the breakdown of the
imposition provides ready-made fodder for the story.
Again, this error derives from an attempt to relate to AIs
in the same way as we would relate to humans. A human
being comes with a prepackaged "innate nature", and any
relation to that human will be a relation to that innate
nature. You might say that the fictional formulations
address the question of how humanity can deal with some
particular nature, while Friendly AI asks how to build
a nature - one that we'll have no trouble relating to.
The question is not one of dominance or even coexistence
but rather creation. This is not a challenge
that humans encounter when dealing with other humans.
Thus, a properly designed Friendly AI does not, as
in popular fiction, consist of endless safeguards and coercions
stopping the AI from doing this, or forcing the AI to do
that, or preventing the AI from thinking certain thoughts,
or protecting the goal system from modification. That
would be pushing against a lack of resistance - like charging
a locked door at full speed, only to find the door ajar.
If the AI ever stops wanting to be Friendly, you've
already lost.
The idea that hostility does not automatically pop up (or at least, that hostility does not pop up in the way usually proposed) is basic to the class of Friendship system described in "Creating Friendly AI". You can find an in-depth defense of this conclusion in CFAI 2: Beyond anthropomorphism. Also, since we often hear questions of the form "But why wouldn't an AI...?", you can find some fast answers to the top 13 questions of that form in the "Frequently Asked Questions".
Trying to impose Friendliness against resistance is charging an open door; it is assuming negative human functionality and guarding against the wrong malfunctions. A corresponding error exists for incorrectly assuming positive human functionality; the error of assuming that, in the absence of resistance, you can specify some arbitrary set of goals and then walk away. For one thing, this is almost certainly the wrong attitude to take. One of the conclusions that can be drawn from Friendliness theory - a guiding heuristic, perhaps, if not a first principle - is the idea that building a mind is not like building a tool. Tools can be used for whatever action the wielder likes; a mind can have a sense of its own purpose, and can originate actions to achieve that purpose.
In some ways, Friendly AI is duplicating what humans would call "common sense"
in the domain of goals. Not common-sense knowledge;
common-sense reasoning. Common-sense reasoning
in factual domains is hardly something that can be taken
for granted, but it is still a problem that will - of necessity
- have already been solved by the time AIs can independently
harm or benefit humanity. If humans say that the sky
is blue, and the AI (by browsing the Web, or by controlling
a digital camera) later finds out that the sky is only blue
by day when not obscured by clouds, and is purple with white
polka-dots at night, then the description of the color of
the sky can be modified accordingly. In fact, it could
be modified just by the humans realizing their mistake and
providing the AI with further information about the color
of the sky.
Here, again, one distinguishes between tool-level AIs and
true minds. A tool-level AI simply has the naked fact,
stored somewhere in memory, that the sky is blue (1).
The fact exists without any knowledge as to its origin,
or that the programmers put it there. To alter the
concept, the programmers would reach in (perhaps while the
AI was shut off) and directly tweak the stored information.
By contrast, a mind-level AI would receive, as sensory information,
the programmer typing in "the sky is blue". (Presumably
the AI already has real, grounded, useful knowledge of what
a "sky" is, and which color is "blue", or these are just
meaningless words.) The sensed keystrokes "the
sky is blue" are interpreted as being a meaningful
statement by the programmer. The AI estimates how
likely the programmer is to know about the sky's color and
assigns a certain probability to the hypothesis that the
sky is blue, based on the sensory information that the programmer
thinks the sky is blue (or at least, said the sky is blue).
Later, this hypothesis can be confirmed and expanded by
more direct tests.
If, the next day, the programmer says, "Wait a minute, the
sky is purple at night", then the AI will (presumably) change
the hypothesis about the sky's color to reflect the new
information. A nontrivial amount of common-sense reasoning
is needed to make this change for the right reasons.
It requires that the AI model programmers as knowing more
today than they did yesterday, or that the AI understand
the idea of a programmer "spotting an error" and correcting
it (the AI modeling the programmer modeling the AI!).
At a higher level, it implies a sophisticated understanding
of causation and validity; a realization, by the AI, that
the only reason it ever did believe the sky was blue was
that the programmer said so, and that new information from
the programmer should therefore override old.
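As a rough illustration of the distinction, the toy Python sketch below treats a typed statement as evidence about what the speaker believes, and stores the resulting belief together with its probability and its source. The class names, the reliability numbers, and the override rule are all assumptions made for this example; none of them is taken from "Creating Friendly AI".

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """A hypothesis held with a probability and a record of who affirmed it."""
    claim: str
    probability: float
    source: str


class MindLevelBeliefs:
    """Toy belief store: a statement is evidence, not a directly stored fact."""

    def __init__(self, source_reliability):
        # Hypothetical numbers: how likely each source is to be right.
        self.source_reliability = source_reliability
        self.beliefs = {}

    def hear(self, source, topic, claim):
        reliability = self.source_reliability[source]
        current = self.beliefs.get(topic)
        if current is None or current.source == source:
            # Either nothing is believed yet, or the belief exists only
            # because this source said so; new word from it overrides old.
            self.beliefs[topic] = Belief(claim, reliability, source)
        elif reliability > current.probability:
            # Simplistic assumed rule: a more reliable source wins.
            self.beliefs[topic] = Belief(claim, reliability, source)
        return self.beliefs[topic]


store = MindLevelBeliefs({"programmer": 0.9, "stranger": 0.3})
first = store.hear("programmer", "sky color", "blue")
ignored = store.hear("stranger", "sky color", "green")
corrected = store.hear("programmer", "sky color", "blue by day, purple at night")
print(corrected)
```

The point of the sketch is only that the correction is accepted for a reason the system itself represents: the belief records that it came from the programmer, so a later statement from the programmer replaces it. A tool-level store, by contrast, would hold just the string "blue" with no probability and no origin.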
These are some of the behaviors analyzed at length in
"Creating Friendly AI".
(Or, rather, the analogous behaviors are analyzed for goals.)
The AI has beliefs about the sky's color that are probabilistic
rather than absolute, and can therefore conceive of the
beliefs being "wrong", and can therefore expand and correct
those beliefs. (Discussed in CFAI
3.4.1: External reference semantics.) The AI understands
that its beliefs about the sky are derived from human-affirmed
information, and that these beliefs will likely be wrong
if the humans have made a mistake, and will therefore pay
attention to additional information or corrections provided
by the humans. (Discussed in CFAI
3.4.3: Causal validity semantics.)
The ability to learn and self-correct is one that can apply
to goals as well... as long as the AI is created with
that in mind. The question is not whether the
AI has the cognitive capability to learn, but whether
the AI has the desire to learn. The ability
to learn facts is not an easy problem for AI researchers
to solve, but it is a problem that must be solved before
AIs have the capability to harm or benefit humanity.
The ability to learn facts can carry over into the capability
to learn goals - to be sensitive to the programmers' intentions
- only if the AI starts out with the idea that goals
are probabilistic and that their presence was human-caused.
The simplest case of a short-circuit would be an AI that
had an absolute, non-probabilistic supergoal for "painting
cars green". Actually, this is, in itself, a mistake;
for a true AI that happens to work in a car factory, painting
cars green should be a subgoal of producing cars, which
is a subgoal of fulfilling people's desire for cars, which
means fulfilling a volitional request, which is directly
Friendly under the "volition-based Friendliness" formulation
used in "Creating Friendly
AI".
We'll analyze the simpler case, though, in which "painting
cars green" is a supergoal, and consider what happens when
the factory decides that the AI should paint the cars red
instead of green (which you can take as metaphor for needing
to tweak some aspect of volition-based Friendliness).
If, for some reason, this is a true (non-tool-level) AI,
and a seed AI capable
of self-modification, then the AI will - obviously - resist
any attempt to change its supergoals. Why? Because,
if the AI's supergoal should change, a consequence of that
changed supergoal content would be that the AI would take
different actions; in this case, the AI predicts that its
future self would paint the cars red, instead of green.
Since the AI's current goal is to paint cars green,
changing the supergoal would thus be perceived as undesirable;
would be predicted to lead to a lesser degree of supergoal
fulfillment. This class of short-circuit failure is
not inevitable; it requires (we currently think)
a relatively small amount of effort to design a probabilistic
goal architecture, or at least a small amount of effort
compared to building a working goal architecture to begin
with. The point is that the effort must be taken.
The cognitive ability to conceive of a supergoal as being
"corrected" is possible, perhaps relatively easy, but not
automatic.
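The short-circuit can be shown in a few lines of Python. This is a deliberately minimal sketch, not a proposed goal system: the function names, the option labels, and the count of 100 cars are hypothetical. What it illustrates is that the choice whether to permit a goal change is itself scored by the current goal.

```python
def cars_painted_green(policy_color, cars=100):
    """Supergoal fulfillment as the current, absolute goal measures it."""
    return cars if policy_color == "green" else 0


def choose(options, evaluate):
    """Pick the option that the agent's current evaluation scores highest."""
    return max(options, key=evaluate)


# Each option maps to the color the agent predicts its future self will paint.
options = {
    "resist modification": "green",
    "accept modification": "red",
}

decision = choose(options, lambda name: cars_painted_green(options[name]))
print(decision)  # resist modification
```

Nothing in the sketch is hostile. Resistance falls out of an ordinary maximization, because the only yardstick available for judging the change is the goal being changed.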
The AI doesn't need to start out with the specific idea
that cars might need to be red - there is no need to explicitly
anticipate everything in advance - but the AI does need
to start out with probabilistic supergoals. If the
AI has probabilistic supergoals, then this finite amount
of complexity is sufficient to handle any color of the rainbow
a car might need to be, no matter how unexpected; it may
even be sufficient to handle the transition to a real AI,
one that cares about people rather than cars, when the programmers
finally wise up. If, however, the AI conceives of
its current supergoals as absolute, "correct by definition",
such that nothing is processed as making a change desirable,
then this not only prevents the switch from green paint
to red paint, or the switch from car-painting to volitional
Friendliness, it will also prevent the programmers from
modifying the goal system to make supergoals probabilistic.
The AI will try to prevent the programmers from modifying
it, anyway - an infantlike AI is not likely to have much
luck. Still, there's a stage of development beyond
which an AI needs certain architectural features.
The AI needs that basic amount of complexity which
is required to absorb additional complexity, and
to see the acquisition of that complexity as desirable.
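For contrast with an absolute supergoal, here is a toy Python sketch of a probabilistic one. It assumes, purely for illustration, that the supergoal content is held as a probability distribution over what the programmers intend, and that a statement from the programmers is folded in by a simple mixing rule with a hypothetical `trust` weight. This is not a full Bayesian update, and it is not the architecture from "Creating Friendly AI".

```python
def expected_fulfillment(action_color, p_intended):
    """Expected fulfillment of the intended goal, given uncertainty about it."""
    return p_intended.get(action_color, 0.0)


def update_on_statement(p_intended, stated_color, trust=0.9):
    """Shift probability toward what the programmers now say they intend.

    `trust` is an assumed weight on the programmers' statement.
    """
    colors = set(p_intended) | {stated_color}
    updated = {c: (1.0 - trust) * p_intended.get(c, 0.0) for c in colors}
    updated[stated_color] += trust
    return updated


# The agent starts out fairly sure that green is what is wanted.
before = {"green": 0.95, "other": 0.05}

# The factory says "red", a color the agent never listed in advance.
after = update_on_statement(before, "red")
best = max(after, key=lambda color: expected_fulfillment(color, after))
print(best)  # red
```

Because the agent is maximizing fulfillment of what is actually intended, and not fulfillment of its current best guess, taking the correction on board scores higher by the agent's own lights. The same small piece of structure handles red, or any other color, without having anticipated it.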
An adult human brain contains a huge amount of data - a
finite amount, but still an amount too large to be deliberately
programmed. However, all that data exists as a result
of human learning; the means by which we learn are
much more compact than the learned data. And of course,
we also learn how to learn. The upshot is that,
even though the world is an enormously complex place, it
may take only a finite amount of programmer effort to produce
an AI that can grow into understanding that world
at least as well as a human. After all, it only took
a finite amount of evolution to produce the 3 billion bases
that comprise the 750-megabyte human genome.
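The 750-megabyte figure follows from simple arithmetic, sketched below in Python. It assumes two bits per base (enough to distinguish four possible bases) and decimal megabytes of one million bytes each; the variable names are our own.

```python
BASES = 3_000_000_000        # figure quoted in the text
BITS_PER_BASE = 2            # four possible bases need log2(4) = 2 bits
BYTES_PER_MEGABYTE = 1_000_000

total_bytes = BASES * BITS_PER_BASE // 8
megabytes = total_bytes / BYTES_PER_MEGABYTE
print(megabytes)  # 750.0
```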
The architectures described in "Creating Friendly AI" are
a self-sustaining funnel through which certain kinds of
complexity can be poured into an AI, such that the AI
perceives the pouring as desirable at any given point in
time. There's more to it than probabilistic supergoals
- that was just one example of a kind of structural complexity
that humans take for granted - but the list is, nonetheless,
finite. It only takes a finite amount of understanding
to see the need for any additional understanding that becomes
necessary.
As best as we can currently figure, the amount of effort needed to create a
Friendly AI is small relative to the effort needed to create
AI in the first place. But it's a very important
effort. It's a critical link for the entire human
species.
It's not too early to start thinking about it, no
matter how primitive current AIs are. To predict that
AI will arrive in thirty years is conservative for futurists;
to predict that Friendly AI will be required in five
years is conservative for a Friendliness researcher.
To predict that the first generally intelligent AIs will
be comically stupid is conservative for an AI researcher;
to predict that the first generally intelligent AIs may
have the intelligence to benefit or harm humans is conservative
for a Friendliness researcher. Also, some architectural
features may need to be adopted early on, to prevent an
unworkable architecture from being entrenched in an infant
AI that later begins moving toward general intelligence.
The analogy would be to a Y2K bug - representing four-digit
years is trivial if you think of it in advance, but very
costly if you think of it afterwards.
Combining these two considerations may even bring Friendly
AI within reach of "things to actually worry about today".
It is beyond doubt that no current AI project has achieved
real AI; all current AIs are tools, and do not make independent
decisions that could harm or benefit humans. Similarly,
the current scientific consensus seems to be that no present-day
project has the potential to eventually grow into a true
AI. Some of the researchers working on those projects,
though, say otherwise - and it is "conservative" for
a Friendliness researcher to believe them, even if his
personal theory of AI says that these projects probably
won't succeed.
Of course, an utterly bankrupt project is likely to be too
simple to implement even the most basic features of Friendliness,
and such projects are beyond the responsibility of even
a "conservative" Friendliness researcher to worry about,
no matter what pronouncements are made about them.
But why not say that - for example - if a project has a
sufficiently general architecture to represent probabilistic
supergoals, then that architecture probably should
use probabilistic supergoals? It's not much additional
effort, compared to implementing a goal system in the first
place. Of course, SIAI
knows of only one current project advanced enough to even
begin implementing the first baby steps toward Friendliness
- but where there is one today, there may be a dozen tomorrow.
The Singularity Institute's belief that true AI can be created
in ten years is confessedly unconservative, but not our
belief that Friendly AI should be done "today, not tomorrow".
Friendly AI is also important insofar as present-day society
has begun debating the peril and promise of advanced technology.
The field is not advanced enough to pronounce with certainty
that Friendly AI can be created; nonetheless, we can say
that, at the moment, it looks possible, and that certain
commonly advanced objections are either completely unrealistic
or extremely improbable. Thus, a very strong case
can be made that - out of all the advanced technologies
being debated - Friendly AI is the best technology to develop
first. Artificial Intelligence is the only
one of our inventions that can, in theory, be given a conscience.
Success in developing Friendly AI is more likely to help
humanity safely develop nanotechnology
than the other way around. Similarly, comparative
analysis of Friendly
AI relative to computing power suggests that the difficulty
of creating AI decreases with increasing computing
power, while the difficulty of Friendly AI does not decrease;
thus, it is unwise to hold off too long on creating Friendly
AI. In this way, the theoretical background provided
by present-day knowledge of Friendly AI can be relevant
to present-day decisions.
For more information, see the book-length treatment in "Creating
Friendly AI", available on our website.