Welcome to General Intelligence and Seed AI version 2.3. The purpose of this document is to describe the principles,
paradigms, cognitive architecture, and cognitive components needed to build
a complete mind possessed of general intelligence, capable of self-understanding,
self-modification, and recursive self-enhancement.
Multi-page version: http://singinst.org/GISAI/
Single-page version: http://singinst.org/GISAI.html
Printable version: http://singinst.org/printable-GISAI.html
©2001 by the Singularity Institute for Artificial Intelligence, Inc. All rights reserved.
"General Intelligence and Seed AI" is a publication of the Singularity Institute for Artificial
Intelligence, Inc., a nonprofit corporation. You can contact the
Singularity Institute at institute@singinst.org.
To support the Singularity Institute, visit http://singinst.org/donate.html.
The Singularity Institute is a 501(c)(3) public charity and your donations
are tax-deductible to the full extent of the law. The seed AI project
is presently in the design/conceptualization stage and no code has yet been
written; additional funding is required before the project can be launched.
This is a near-book-length explanation. If you need well-grounded
knowledge of the subject, then we highly recommend reading °GISAI
straight through. However, if you need answers immediately, see the
Singularity Institute pages on AI
for introductory articles.
GISAI is a work in progress. As of version 2.3, the
sections "Paradigms" and "Mind" are complete and self-contained. The
section "Cognition" is in progress and may contain references to unimplemented
topics. As additional topics are published, the minor version number
(second digit) increases.
Words defined in the Glossary are marked with a leading "°".
General intelligence itself is huge. The human brain, created
by millions of years of evolution, is composed of a hundred billion neurons
connected by a hundred trillion synapses, forming more than a hundred neurologically
distinguishable areas. We should not expect the problem of AI to
be easy. Subproblems of cognition include attention, memory, association,
abstraction, symbols, causality, subjunctivity, expectation, goals, actions,
introspection, caching, and learning, to cite a non-exhaustive list.
These features are not "emergent". They are °complex
functional adaptations, evolved systems with multiple components
and sophisticated internal architectures, whose functionality must
be deliberately duplicated within an artificial mind. If done right,
cognition can support the thoughts implementing abilities such as analysis,
design, understanding, invention, self-awareness, and the other facets
which together sum to an intelligent mind. An intelligent mind with
access to its own source code can do all kinds of neat stuff, but we'll
get into that later.
Different schools of AI are distinguished by different kinds of underlying "mindstuff".
Classical AI consists of "predicate calculus" or "propositional logic",
which is to say suggestively named °LISP tokens, plus directly
coded procedures intended to imitate human formal logic. Connectionist
AI consists of neurons implemented on the token level, with each neuron
in the input and output layers having a programmer-determined interpretation,
plus intervening layers which are usually not supposed to have a
direct interpretation, with the overall network being trained by an external
algorithm to perform perceptual tasks. (Although more biologically
realistic implementations are emerging.) Agent-based AI consists
of hundreds of humanly-written pieces of code which do whatever the programmer
wants, with interactions ranging from handing data structures around to
tampering with each other's behaviors.
Seed AI inherits connectionism's belief that error tolerance is a good
thing. Error tolerance leads to the ability to mutate. The
ability to mutate leads to evolution. Evolution leads to rich complexity
- "mindstuff" with lots of tentacles and interconnections. However,
connectionist theory presents a dualistic opposition between °stochastic,
error-tolerant neurons and the crystalline fragility of code or assembly
language. This conflates two logically distinct ideas. It's
possible to have crystalline neural networks in which a single error breaks
the chain of causality, or stochastic code in which (for example) multiple,
mutatable implementations of a function point have tweakable weightings.
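To make "stochastic code" concrete, here is a minimal Python sketch of a function point backed by several implementations with tweakable weightings. The names FunctionPoint, reward, and train, and the specific reward factors, are hypothetical and invented for this illustration; this is a toy showing that error tolerance can live in ordinary code, not a proposed piece of seed AI design.

```python
import random


class FunctionPoint:
    """A call site backed by several interchangeable implementations."""

    def __init__(self, implementations, rng=None):
        # Every implementation starts with the same weighting.
        self.weights = {impl: 1.0 for impl in implementations}
        self.rng = rng if rng is not None else random.Random()

    def call(self, *args):
        # Pick one implementation at random, in proportion to its weighting.
        impls = list(self.weights)
        chosen = self.rng.choices(impls, weights=[self.weights[i] for i in impls])[0]
        return chosen, chosen(*args)

    def reward(self, impl, factor):
        # Tweak the weighting; the floor keeps a variant from vanishing entirely.
        self.weights[impl] = max(0.01, self.weights[impl] * factor)


def sort_builtin(xs):
    return sorted(xs)


def sort_broken(xs):
    # A "mutated" variant that is simply wrong: it returns the input unsorted.
    return list(xs)


def train(point, trials=200):
    # Reinforce implementations that produce the right answer, punish the rest.
    for _ in range(trials):
        impl, result = point.call([3, 1, 2])
        point.reward(impl, 1.1 if result == [1, 2, 3] else 0.5)
    return point


point = train(FunctionPoint([sort_builtin, sort_broken]))
print(point.weights[sort_builtin] > point.weights[sort_broken])
```

The broken variant does not crash the function point; its weighting simply decays until it is almost never chosen, which is the kind of graceful degradation usually credited to neurons alone.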
Seed AI strongly emphasizes the necessity of rich complexity in cognitive
processes, and mistrusts classical AI's direct programmatic implementations.
However, seed AI also mistrusts that connectionist position which holds
higher-level cognitive processes to be sacrosanct and opaque, off-limits
to the human programmer, who is only allowed to fool around with neuron
behaviors and training algorithms, and not the actual network patterns.
Seed AI does prefer learned concepts to preprogrammed ones, since learned
concepts are richer. Nonetheless, I think it's permissible, if risky,
to preprogram concepts in order to bootstrap the AI to the point where
it can learn. More to the point, it's okay to have an architecture
where, even though the higher levels are stochastic or self-organizing
or emergent or learned or whatever, the programmer can still see and modify
what's going on. And it is necessary that the designer know
what's happening on the higher levels, at least in general terms, because
cognitive abilities are not emergent and do not happen by
accident. Both classical AI and connectionist AI propose a kind of
magic that avoids the difficulty of actually implementing the higher layers
of cognition. Classical AI states that a LISP token named "goal"
is a goal. Connectionist AI declares that it can all be done with
neurons and training algorithms. Seed AI admits the necessity of
confronting the problem directly.
In the human brain, there's at least one multilevel system where the
higher levels, though stochastic, still have known interpretations: the
visual processing system. Feature extraction by the visual cortex
and associated areas doesn't proceed in a strict hierarchy with numbered
levels (seed AI mistrusts that sort of thing), but there are definitely
lower-level features (such as retinal pixels), mid-level features (such
as edges and surface textures), and high-level features (such as 3D shapes
and moving objects). Together, the pixels and attached interpretations
constitute the cognitive object that is a visual description. It's
also possible to run the feature-extraction system in reverse, activate
a high-level feature and have it draw in the mid-level features which draw
in the low-level features. Such "reversible patterns" are necessary-but-not-sufficient
to memory recall and directed imagination. Memory and imagination,
when implemented via this method, can hold rich concepts that mutate interestingly
and mix coherently. A mental image of a red sausage can mutate directly
to a mental image of a blue sausage without either storing the perception
of redness in a single crystalline token or mutating the image pixel
by independent pixel. °David Marr's paradigm of the
"two-and-a-half dimensional world", multilevel holistic descriptions, is
writ large and held to apply not just to sensory feature extraction but
to categories, symbols, and other concepts. If seed AI has a "mindstuff",
this is it.
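A toy Python sketch may help pin down what "reversible" means here. Everything in it is an assumption made for illustration: the two-feature description, the tiny palette, and the render and extract functions stand in for a feature hierarchy and bear no resemblance to a real visual cortex. The point is only that a high-level feature can be changed in one place and the low-level pixels redrawn coherently from it.

```python
PALETTE = {"red": (255, 0, 0), "blue": (0, 0, 255)}
BLANK = (0, 0, 0)


def render(description, width=8, height=3):
    """Reverse direction: high-level features draw in the low-level pixels."""
    color = PALETTE[description["color"]]
    length = description["length"]
    row = [color] * length + [BLANK] * (width - length)
    return [list(row) for _ in range(height)]


def extract(pixels):
    """Forward direction: recover high-level features from the pixels."""
    filled = [p for p in pixels[0] if p != BLANK]
    names = {rgb: name for name, rgb in PALETTE.items()}
    return {"color": names[filled[0]], "length": len(filled)}


red_sausage = {"color": "red", "length": 5}
# Mutate one high-level feature; no pixel is edited by hand.
blue_sausage = dict(red_sausage, color="blue")
blue_pixels = render(blue_sausage)
print(extract(blue_pixels) == blue_sausage)
```

The image is neither stored as a single "red" token nor mutated pixel by independent pixel: the change is made at the level where it is simple and propagates downward.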
Seed AI also emphasizes the problem of sensory modalities (such as the
visual cortex, auditory cortex, and sensorimotor cortex in humans), previously
considered a matter for specialized robots. A sensory modality consists
of data structures suited to representing the "pixels" and features of
the target domain, and codelets or processing stages which extract mid-level
and high-level features of that domain. Sensory modalities grant
superior intuitions and visualizational power in the target domain, which
itself is sufficient reason to give a self-modifying AI a sensory modality
for source code. Sensory modalities can also provide useful metaphors
and concrete substrate for abstract reasoning about other domains; you
can play chess using your visual cortex, or imagine a "branching" if-then-else
statement. Sensory modalities provide a source of computational "raw
material" from which concepts can form. Finally, a sensory modality
provides intuitions for understanding concrete problems in a training domain,
such as source code. This makes it possible for the AI to learn
the art of abstraction - moving from concrete problems, to categorizing
sensory data, to conceptualizing complex methods, and so on - instead of
being expected to swallow high-level thought all at once.
Sensory modalities are the foundations of intelligence - a term carefully
selected to reflect necessity but not sufficiency; after you build the
foundations, there's still a lot of house left over. In particular,
a codic modality does not write source code, just as the visual cortex
does not design skyscrapers. When I speak of a "codic" sensory modality,
I am not extending the term "sensory modality" to include an autonomous
facility for writing source code. I am using "modality" in the original
sense to describe a system almost exactly analogous to the visual cortex,
just operating in the domain of source code instead of pixels.
Sensory modalities - visual, spatial, codic - are the bottom layer of
the AI, the layer in which representations and behaviors are specified
directly by the programmer. (Although avoiding the crystalline fragility
of classical AI is still a design goal.) The next layer is concepts.
Concepts are pieces of mindstuff, which can either describe the mental
world, or can be applied to alter the mental world. (Note
that successive concepts can be applied to a single target, building up
a complex visualization.) Concepts are contained in long-term memory.
Categories, symbols, and most varieties of declarative memory are concepts.
Concepts are more powerful if they are learned, trained, or otherwise created
by the AI, but can be created by the programmer for bootstrapping purposes.
(If, of course, the programmer can hack the tools necessary to modify the
concept level.) The underlying substrate of the concept can be code,
assembly language, or neural nets, whichever is least fragile and is easiest
to understand and mutate; this issue is discussed later, but I currently
lean towards code. (Not raw code, of course, but code as it
is understood by the AI.)
Concepts, when retrieved from long-term memory, built into a structure,
and activated, create a thought. The archetypal example of
a thought is building words - symbols - into a grammatical sentence and
"speaking" them within the mind. Thoughts exist in the RAM of the
mind, the "working memory" created by available workspace in the sensory
modalities. During their existence, thoughts can modify that portion
of the world-model currently being examined in working memory. (Not
every sentence spoken within the mind is supposed to describe reality;
thoughts can also create and modify °subjunctive ("what-if")
hypotheses.) Thoughts are identified with - supposed to implement
the functionality of - the human "stream of consciousness".
The three-layer model of intelligence is necessary, but not sufficient.
Building an AI "with sensory modalities, concepts, and thoughts" is no
guarantee of intelligence. The AI must have the right sensory
modalities, the right concepts, and the right thoughts.
Evolution is the cause of intelligence in humans. Intelligence
is an evolutionary advantage because it enables us to model, predict, and
manipulate reality, including that portion of reality consisting of other
humans and ourselves. In our physical Universe, reality tends to
organize itself along lines that might be called "°holistic"
or "°reductionist", depending on whether you're looking up
or looking down. "Which facts are likely to reappear? The simple
facts. How to recognize them? Choose those that seem simple.
Either this simplicity is real or the complex elements are indistinguishable.
In the first case we're likely to meet this simple fact again either alone
or as an element in a complex fact. The second case too has a good
chance of recurring since nature doesn't randomly construct such cases."
(Robert M. Pirsig, "Zen and the Art of Motorcycle Maintenance", p. 238.)
Thought takes place within a causal, goal-oriented, "reductholistic"
world-model, and seeks to better understand the world or invent solutions
to a problem. Some methods include:
°Holistic analysis: Taking a known high-level characteristic of a known high-level
object ("birds fly"), and using °heuristics (thought-level
knowledge learned from experience) to try and construct an explanation
for the characteristic; an explanation consists of a low-level structure
which gives rise to that high-level characteristic in a manner consistent
with all known facts about the high-level object ("a bird's flapping wings
push it upwards").
Causal analysis: Taking a known fact ("my
telephone is ringing") and using heuristics to construct a causal
sequence which results in that fact ("someone wants to speak to me").
Holistic design: Taking a high-level characteristic as a design goal
("go fast"), using heuristics to reduce the search space by reasoning about
constraints and opportunities in possible designs ("use wheels"), and then
testing ideas for specific low-level structures that attempt to satisfy
the goals ("bicycles").
Both understanding and invention are fundamentally and messily recursive;
whether a bicycle works depends on the design of the wheels, and whether
a wheel works depends on whether that wheel consists of steel, rubber or
tapioca pudding. Hence the need for °heuristics that
bind high-level characteristics to low-level properties. Hence the
need to recurse on finding new heuristics or more evidence or better tools
or greater intelligence or higher self-awareness before the ultimate task
can be solved. Solving a problem gives rise to lasting self-development
as well as immediate solutions.
When a sufficiently advanced AI can bind a high-level characteristic
like "word-processing program" through the multiple layers of design to
individual lines of code, °ve can write a word-processing
program given the verbal instruction of "Write a word-processing program."
(Of course, following verbal instructions also assumes speech recognition
and language processing - not to mention a very detailed knowledge of what
a word-processing program is, what it does, what it's for, how humans will
use it, and why the program shouldn't erase the hard drive.) When
the AI, perhaps given a sensory modality for atoms and molecules, can understand
all the extant research on molecular manipulation, °ve can
work out a sequence of steps which will result in the construction of a
general nanotechnological assembler, or tools to build one. When
the AI can bind a high-level characteristic like "useful intelligence"
through the multiple layers of designed cognitive processes to individual
lines of code, ve can redesign °vis own source code and increase
vis intelligence.
Developing such a seed AI may require a tremendous amount of programmer
effort and programmer creativity; it is entirely possible that a seed AI
is the most ambitious software project in history, not just in terms of
the end result, but in terms of the sheer depth of internal design
complexity. To bring the problem into the range of the humanly solvable,
it is necessary that development be broken up into stages, so that the
first stages of the AI can assist with later stages. The usual aphorism
is that 10% of the code implements 90% of the functionality, which suggests
one approach. Seed AI adds the distinction between learned concepts
and programmer-designed concepts. If so, the first stage might be
an AI with simplified modalities, preprogrammed simple concepts, low-level
goal definitions, and perhaps even programmer-assisted development of the
stream-of-consciousness reflexes needed for coherent thought. Such
an AI would hopefully be capable of manipulating code in simple ways, thus
rendering the source code for concepts (and in fact its own source code)
subject to the type of flexible and useful mutations needed to learn rich
concepts or evolve more optimized code. The skeleton AI helps us
fill in the flesh on the skeleton.
...
Have you got all that?
Good.
Take a deep breath.
We're ready to begin.
It is probably impossible to write an AI in immediate possession of human-equivalent
abilities in every field; transhuman abilities even more so, since there's
no working model. The task is not to build an AI with some astronomical
level of intelligence; the task is building an AI which is capable of improving
itself, of understanding and rewriting its own source code.
The task is not to build a mighty oak tree, but a humble seed.
As the AI rewrites itself, it moves along a trajectory of intelligence.
The task is not to build an AI at some specific point on the trajectory,
but to ensure that the trajectory is open-ended, reaching human equivalence
and transcending it. Smarter and smarter AIs become better and better
at rewriting their own code and making themselves even smarter. When
writing a seed AI, it's not just what the AI can do now, but what it will
be able to do later. And the problem isn't just writing good code,
it's writing code that the seed AI can understand, since the eventual goal
is for it to rewrite its own assembly language. (1).
If "recursive self-enhancement" is to avoid running out of steam, it's
necessary for code optimization or architectural changes to result in an
increment of actual intelligence, of smartness, not just speed.
Running an optimizing compiler over its own source code (2) may result in a faster optimizing
compiler. Repeating the procedure a second time accomplishes nothing,
producing an identical set of binaries, since the same algorithm is being
run - only faster. A human who fails to solve a problem in one year
(or solves it suboptimally) may benefit from another ten years to think
about the problem; even so, an individual human may eventually run out
of ideas. An individual human who fails to solve a problem in a hundred
years may, if somehow transformed into an Einstein, solve it within an
hour. Faster unintelligent algorithms accomplish little or nothing;
faster intelligent thought can make a small difference; better intelligent
thought makes the problem new again.
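The compiler case can be modeled in a few lines of Python. This is a deliberately crude sketch: the Binary record, the compile_with function, and the speed figures are all hypothetical, chosen only to show why the second pass changes nothing. The output of a compiler depends on the algorithm it implements, not on how quickly it runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    algorithm: str   # which optimization algorithm this binary implements
    speed: float     # how fast this binary itself runs (arbitrary units)


# Hypothetical figures: how fast the code emitted by each algorithm runs.
EMITTED_SPEED = {"naive": 1.0, "optimizing": 2.0}


def compile_with(compiler: Binary, source_algorithm: str) -> Binary:
    """Compile source implementing `source_algorithm` using `compiler`.

    The result depends only on the source and on the compiler's algorithm,
    never on how fast the compiler itself happens to run.
    """
    return Binary(algorithm=source_algorithm,
                  speed=EMITTED_SPEED[compiler.algorithm])


bootstrap = Binary(algorithm="naive", speed=1.0)
stage0 = compile_with(bootstrap, "optimizing")  # optimizing algorithm, slow binary
stage1 = compile_with(stage0, "optimizing")     # same algorithm, faster binary
stage2 = compile_with(stage1, "optimizing")     # nothing left to gain

print(stage1.speed > stage0.speed, stage2 == stage1)
```

The first self-compilation buys speed; every later one reproduces the same binary. Escaping the fixed point requires changing the algorithm itself, which is exactly the step that demands intelligence rather than speed.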
If each rung on the ladder of recursive self-enhancement involves a
leap of sufficient magnitude, then each rung should open up enough new
vistas of self-improvement for the next rung to be reached. If not,
of course, the seed AI will have optimized itself and used up all perceived
opportunities for improvement without generating the insight needed to
see new kinds of opportunities. In this case the seed AI will
have stalled, and it will be time for the human programmers to go to work
nudging it over the bottleneck. Ultimately, the AI must cross, not
only the gap that separates the mythical average human from Einstein, but
the gap that separates Homo sapiens neanderthalensis from Homo sapiens sapiens.
The leap to true understanding, when it happens,
will open up at least as many possibilities as would be available to a
human researcher with access to vis own neural source code.
A surprisingly frequent objection to self-enhancement is that intelligence,
when defined as "the ability to increase intelligence", is a circular definition
- one which would, they say, result in a sterile and uninteresting AI.
Even if this were the definition (it isn't), and the definition were circular
(it wouldn't be), the cycle could be broken simply by grounding the definition
in chess-playing ability or some similar test of ability. However,
intelligence is not defined as the ability to increase intelligence;
that is simply the form of intelligent behavior we are most interested
in. Intelligence is not defined at all. What intelligence is,
if you look at a human, is more than a hundred °cytoarchitecturally
distinct areas of the brain, all of which work together to create intelligence.
Intelligence is, in short, modular, and the tasks performed by individual
modules are different in kind from the nature of the overall intelligence.
If the overall intelligence can turn around and look at a module as an
isolated process, it can make clearly defined performance improvements
- improvements that eventually sum up to improved overall intelligence
- without ever confronting the circular problem of "making itself more
intelligent". Intelligence, from a design perspective, is a goal
with many, many subgoals. An intelligence seeking the goal of improved
intelligence does not confront "improved intelligence" as a naked fact,
but as a very rich and complicated fact adorned with less complicated subgoals.
Presumably there is an ultimate limit to the intelligence that can be
achieved on a given piece of hardware, but if the seed AI can design better
hardware, the cycle continues. To be concrete, if a seed AI is smart
enough to chart a path from modern technological capabilities to °nanotechnology
- to the hardware described in K. Eric Drexler's Nanosystems
- this should be enough computing power to provide thousands or millions
of times the raw capacity of a human brain. (3). Whether the cognitive
and technological trajectory beyond this point continues forever or tops
out at some ultimate physical limit is basically irrelevant from a human
perspective; nanotechnology plus thousands of times human brainpower should
be far more than enough to accomplish whatever you wanted a transhuman
for in the first place.
This scenario often meets with the objection that a lone AI can accomplish
nothing; that technological advancement requires an entire civilization,
with exchanges between thousands of scientists or millions of humans.
This actually understates the problem. To think a single thought,
it is necessary to duplicate far more than the genetically programmed functionality
of a single human brain. After all, even if the functionality of
a human were perfectly duplicated, the AI might do nothing but burble for
the first year - that's what human infants do.
Perceptions have to coalesce into concepts. The concepts have
to be strung together into thoughts. Enough good thoughts have to
be repeated often enough for the sequences to become °cached,
for the often-repeated subpatterns to become reflex. Enough of these
infrastructural reflexes must accumulate for one thought to give rise to
another thought, in a connected chain, forming a stream of consciousness.
Unless we want to sit around for years listening to the computer go ga-ga,
the functionality of infancy must be either encapsulated in a virtual world
that runs in computer time, or bypassed using a skeleton set of preprogrammed
concepts and thoughts. (Hopefully, the "skeleton thoughts" will be
replaced by real, learned thoughts as the seed AI practices thinking.)
Human scientific thought relies on millennia of accumulated knowledge,
the how-to-think °heuristics discovered by hundreds of geniuses.
While a seed AI may be able to absorb some of this knowledge by surfing
the 'Net, there will be other dilemmas, unique to seed AIs, that it must
solve on its own.
Finally, the autonomic processes of the human mind reflect millions
of years of evolutionary optimization. Unless we want to expend an
equal amount of programming effort, the functionality of evolution itself
must be replaced - either by the seed AI's self-tweaking of those algorithms,
or by replacing processes that are autonomic in humans with the deliberate
decisions of the seed AI.
That's a gargantuan job, but it's matched by equally powerful tools.
The traditional advantages of computer programs
- not "AI", but "computer programs" - are threefold: The ability
to perform repetitive tasks without getting bored; the ability to
perform algorithmic tasks at greater linear speeds than our 200-°hertz
neurons permit; and the ability to perform complex algorithmic tasks
without making mistakes (or rather, without making those classes of mistakes
which are due to distraction or running out of short-term memory).
All of which, of course, has nothing to do with intelligence.
The toolbox of seed AI is yet unknown; nobody has built one. This
page is more about building the first stages, the task of getting the seed
AI to say "Hello, world!" But, if this can be done, what advantages
would we expect of a general intelligence with access to its own source
code?
The ability to design new sensory modalities. In a sense,
any human programmer is a blind painter - worse, a painter born without
a visual cortex. Our programs are painted pixel by pixel, and are
accordingly sensitive to single errors. We need to consciously keep
track of each line of code as an abstract object. A seed AI could
have a "codic cortex", a sensory modality devoted to code, with intuitions
and instincts devoted to code, and the ability to abstract higher-level
concepts from code and intuitively visualize complete models detailed in
code. A human programmer is very far indeed from vis ancestral environment,
but an AI can always be at home. (But remember: A codic modality
doesn't write code, just as a human visual cortex doesn't design skyscrapers.)
The ability to blend conscious and autonomic thought. Combining
°Deep Blue with °Kasparov doesn't yield a being
who can consciously examine a billion moves per second; it yields a Kasparov
who can wonder "How can I put a queen here?" and blink out for a fraction
of a second while a million moves are automatically examined. At
a higher level of integration, Kasparov's conscious perceptions of each
consciously examined chess position may incorporate data culled from a
million possibilities, and Kasparov's dozen examined positions may not
be consciously simulated moves, but "skips" to the dozen most plausible
futures five moves ahead. (5).
Freedom from human failings, and especially human politics.
The tendency to rationalize untenable positions to oneself, in order to
win arguments and gain social status, seems so natural to us; it's
hard to remember that rationalization is a °complex functional adaptation,
one that would have no reason to exist in "minds in general". A synthetic
mind has no political instincts (6);
a synthetic mind could run the course of human civilization without politically-imposed
dead ends, without °observer bias, without the tendency to
rationalize. The reason we humans instinctively think that progress
requires multiple minds is that we're used to human geniuses, who make
one or two breakthroughs, but then get stuck on their Great Idea and oppose
all progress until the next generation of brash young scientists comes
along. A genius-equivalent mind that doesn't age and doesn't rationalize
could encapsulate that cycle within a single entity.
Overpower - the ability to devote more raw computing power, or
more efficient computing power, than is devoted to some module in the original
human mind; the ability to throw more brainpower at the problem to yield
intelligence of higher quality, greater quantity, faster speed, even difference
in kind. Deep Blue eventually beat Kasparov by pouring huge amounts
of computing power into what was essentially a glorified search tree; imagine
if the basic component processes of human intelligence could be similarly
overclocked...
Self-observation - the ability to capture the execution of a
module and play it back in slow motion; the ability to watch one's own
thoughts and trace out chains of causality; the ability to form concepts
about the self based on fine-grained introspection.
Conscious learning - the ability to deliberately construct or
deliberately improve concepts and memories, rather than entrusting them
to autonomic processes; the ability to tweak, optimize, or debug learned
skills based on deliberate analysis.
Self-improvement - the ubiquitous glue that holds a seed AI's
mind together; the means by which the AI moves from crystalline, programmer-implemented skeleton functionality
to rich and flexible thoughts. In the human mind, °stochastic
concepts - combined answers made up of the average of many little answers
- lead to error tolerance; error tolerance lets concepts mutate without
breaking; mutation leads to evolutionary growth and rich complexity.
An AI, by using probabilistic elements, can achieve the same effect; another
route is deliberate observation and manipulation, leading to deliberate
"mutations" with a vastly lower error rate. What are these mutations
or manipulations? A blind search can become a heuristically guided
search and vastly more useful; an autonomic process can become conscious
and vastly richer; a conscious process can become autonomic and vastly
faster - there is no sharp border between conscious learning and tweaking
your own code. And finally, there are high-level redesigns, not "mutations"
at all, alterations which require too many simultaneous, non-backwards-compatible
changes to ever be implemented by evolution.
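The first of those manipulations, turning a blind search into a heuristically guided one, is easy to illustrate in Python. The sketch below is an assumption-laden toy: the 10x10 grid, the search function, and the Manhattan-distance heuristic are mine, and a seed AI's heuristics would be learned rather than hand-coded. The search procedure is identical in both runs; only the priority function, the "knowledge", differs.

```python
import heapq
from itertools import count


def search(start, goal, neighbors, priority):
    """Best-first search; returns how many states were expanded to reach goal."""
    tie = count()  # insertion order breaks ties between equal priorities
    frontier = [(priority(start), next(tie), start)]
    seen = {start}
    expanded = 0
    while frontier:
        _, _, state = heapq.heappop(frontier)
        expanded += 1
        if state == goal:
            return expanded
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (priority(nxt), next(tie), nxt))
    return None


SIZE = 10
GOAL = (SIZE - 1, SIZE - 1)


def grid_neighbors(cell):
    x, y = cell
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in steps if 0 <= a < SIZE and 0 <= b < SIZE]


def blind(cell):
    # No knowledge: every state looks equally promising (breadth-first order).
    return 0


def manhattan(cell):
    # Heuristic knowledge: prefer cells that are closer to the goal.
    return abs(GOAL[0] - cell[0]) + abs(GOAL[1] - cell[1])


blind_cost = search((0, 0), GOAL, grid_neighbors, blind)
guided_cost = search((0, 0), GOAL, grid_neighbors, manhattan)
print(blind_cost, guided_cost)
```

On this open grid the guided search walks almost directly to the goal while the blind search visits nearly every cell first. Swapping in the better priority function is a small, deliberate "mutation" of the code with a large effect on usefulness.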
If all of that works, it gives rise to self-encapsulation and recursive
self-enhancement. When the newborn mind fully understands vis
own source code, when ve fully °understands the intelligent
reasoning that went into vis own creation - and when ve is capable of °inventing
that reason independently, so that the mind contains its own design - the
cycle is closed. The mind causes the design, and the design causes
the mind. Any increase in intelligence, whether sparked by hardware
or software, will result in a better mind; which, since the design was
(or could have been) generated by the mind, will propagate to cause a better
design; which, in turn, will propagate to cause a better mind. (7). And since the seed AI will encapsulate
not only the functionality of human individual intelligence but the functionality
of evolution and society, these causes of intelligence will be subject
to improvement as well. We might call it a "civilization-in-a-box",
an entity with more "hardware" intelligence than Einstein (8) and capable of codifying abstract thought
to run at the linear speed of a modern computer.
A successful seed AI would have power. A genuine civilization-in-a-box,
thinking at a millionfold human speed, might fold centuries of technological
progress into mere hours. I won't beat the point to death.
I've done so in my other writings - Staring into the Singularity, in particular.
It's just that the fundamental purpose
of transhuman AI differs from that of traditional AI.
The academic purpose of modern prehuman AI is to write programs that
demonstrate some aspect of human thought - to hold a mirror up to the brain.
The commercial purpose of prehuman AI is to automate tasks too boring,
too fast, or too expensive for humans. It's possible to dispute whether
an academic implementation actually captures an aspect of human intelligence,
or whether a commercial application performs a task that deserves to be
called "intelligent".
In transhuman AI, if success isn't blatantly obvious to everyone
except trained philosophers, the effort has failed. The ultimate
purpose of transhuman AI is to create a °Transition Guide;
an entity that can safely develop °nanotechnology and any
subsequent ultratechnologies that may be possible, use transhuman °Friendliness
to see what comes next, and use those ultratechnologies to see humanity
safely through to whatever life is like on the other side of the Singularity.
This might consist of assisting all humanity in upgrading to the level
of superintelligent Powers, or creating an operating system for all the
quarks in the Solar System, or something completely unknowable. I
believe that, as the result of creating a °Friendly superintelligence,
involuntary death, pain, coercion, and stupidity will be erased from the
human condition; and that humanity, or whatever we become, will go on to
fulfill to the maximum possible extent whatever greater destiny or higher
goals exist, if any do.
To return to Earth: There will undoubtedly be many milestones,
many interim subgoals and interim successes, along the path to superintelligence.
The key point is that while embodying some aspect of cognition may be useful
or necessary, it is not an end in itself. Treating facets of cognition
as ends in themselves has led traditional AI to develop a sort of "trophy
mentality", a tendency to value programs according to whether they fit
surface descriptions. (One gets the impression that if you asked
certain AI researchers to write the next Great English Novel, they'd write
a 20-page essay on toaster ovens and then tear off through the streets,
shouting: "Eureka! It's in English! It's in English!")
My hope is that the lofty but utilitarian goals of seed AI will lead to
the habit of looking at every piece of the design and saying: "Sure,
it sounds neat, but how does it contribute materially to general intelligence?"
After all, if an aspect of cognition is duplicated faithfully but without
understanding its overall purpose, it's a matter of pure faith to expect
it to contribute anything.
But that brings us to the next section, "Thinking About AI".
AI has, in the past, failed repeatedly. The shadow
cast by this failure falls over all proposals for new AI projects.
The question is always asked: "Why won't your project fail, like
all the other projects? Why did the previous projects fail?
Does your theory of general intelligence explain the previous failures
while predicting success for your own efforts?" Actually, anyone
can explain away previous failures and predict success; all you have to
do is assert that some particular new characteristic is the One Great Idea,
necessary and sufficient to intelligence. The real question is whether
a new approach to AI makes the failure of previous efforts seem massively
inevitable, the predictable result of historical factors; whether the approach
provides a theory of previous failures that is satisfyingly obvious in
retrospect, makes earlier errors look like natural mistakes that any growing
civilization might make, and thus "swallows" the historical failures in
a new theory which leaves no dangling anxieties.
Okay. I won't go quite that far. Still, AI has an embarrassing
tendency to predict success where none materializes, to make mountains
out of molehills, and to assert that some simpleminded pattern of suggestively-named
°LISP tokens completely explains some incredibly high-level
thought process. Why?
Consider the symbol your mind contains for 'light bulb'. In your
mind, the sounds of the spoken words "light bulb" are reconstructed in
your auditory cortex. A picture of a light bulb is loaded into your
visual cortex. Furthermore, the auditory and visual cortices are
far more complex, and intelligent, than the algorithm your computer uses
to play sounds and MPEG files. Your auditory cortex has evolved specifically
to process incoming speech sounds, with better fineness and resolution
than it displays on other auditory tasks. Your visual cortex does
not simply contain a 2D pixel array. The visual cortex has specialized
processes that extract David Marr's "two-and-a-half dimensional world"
- edge detection, corner interpretation, surfaces, shading, movement -
and processes that extract from this a model of 3D objects in a 3D world.
"About 50 percent of the cerebral cortex of primates is devoted exclusively
to visual processing, and the estimated territory for humans is nearly
comparable." (°MITECS, "Mid-Level Vision".)
In the semantic net or Physical Symbol System of classical
AI, a light bulb would be represented by an atomic LISP token named light-bulb.
"General Intelligence and Seed AI" is written in informal style. Academic
readers, readers seeking a more technical explanation, or readers who prefer
a more formal style, may wish to read "Levels of Organization in General Intelligence"
instead.
"A °seed AI is an AI capable of self-understanding,
self-modification, and recursive self-enhancement."
Executive Summary and Introduction
Please bear in mind that the following is an introduction only.
It contains some ideas which must be introduced in advance to avoid circular
dependencies in the actual explanations, and a general summary of the cognitive
architecture so you know where the ideas fit in. In particular, I
am not expecting you to read the introduction and immediately shout:
"Aha! This is the Secret of AI!" Some important ideas are described,
yes, but just because an idea is necessary doesn't make it sufficient.
Too many of AI's past failures have come of the trophy-hunting mentality,
asking which buzzwords the code can be described by, and not asking what
the code actually does.
This document is about general intelligence - what it is, and how to build
one. The desired end result is a self-enhancing mind or "seed AI".
Seed AI means that - rather than trying to build a mind immediately capable
of human-equivalent or transhuman reasoning - the goal is to build a mind
capable of enhancing itself, and then re-enhancing itself with that higher
intelligence, until the goal point is reached. "The task is not to
build an AI with some astronomical level of intelligence; the task is building
an AI which is capable of improving itself, of understanding and
rewriting its own source code. The task is not to build a mighty
oak tree, but a humble seed." (From 1.1: Seed AI.)
1: Paradigms
1.1: Seed AI
1.1.1: The AI Advantage
1.2: Thinking About AI
| NOTE: | I say "LISP tokens", not "LISP symbols", despite convention and accepted usage. Calling the lowest level of the system "symbols" is a horrifically bad habit. |
Some of the problem may be explained by history; back when AI was being invented, in the 1950s and 1960s, researchers had tiny little machines that modern pocket calculators would sneer at. These early researchers chose to believe they could succeed with "symbols" composed of small LISP structures, cognitive "processes" with the complexity of one subroutine in a modern class library. They were wrong, but the need to believe produced approaches and paradigms that sank AI for decades.
Previous AI has been conducted under the Physicist's Paradigm. The development
of physics over the past few centuries - at least, the dramatic, stereotypical
part - has been characterized by the discovery of simple equations that
neatly account for complex phenomena. In physics, the task is finding
a single bright idea that explains everything. Newton took a single
assumption (masses attract each other with a force proportional to the product
of the masses divided by the square of the distance) and churned through
some calculus to show that, if an apple falls towards the ground at a constant
acceleration, then this explains why planets move in elliptical orbits.
The search for a similar fits-on-a-T-Shirt unifying principle to fully
explain a brain with hundreds of °cytoarchitecturally distinct
areas has wreaked havoc on AI.
"Heuristics are compiled hindsight; they are judgemental rules which,
if only we'd had them earlier, would have enabled us to reach our present
state of achievement more rapidly." (Douglas Lenat, 1981.)
The heuristic learned from past failures of AI might be titled "Necessary, But Not Sufficient". Whenever
neural networks are mentioned in press releases, the blurb always includes
the phrase "neural networks, which use the same parallel architecture found
in the human brain". Of course, the "neurons" in neural networks
are usually nothing remotely like biological neurons. But the main
thing that gets overlooked is that it would be equally true (not very)
to say that neural networks use the same parallel architecture found in
an earthworm's brain. Regardless of whether neural networks
are Necessary, they are certainly Not Sufficient. The human brain
requires millions of years of evolution, thousands of modules, hundreds
of thousands of adaptations, on top of the simple bright idea of
"Hey, let's build a neural network!"
The Physicist's Paradigm lends itself easily to our need for drama.
One great principle, one bold new idea, comes along to overthrow the false
gods of the old religion... and set up a new bunch of false gods.
As always when trying to prove a desired result from a flawed premise,
the simplest path involves the Laws of Similarity and Contagion.
For example, the "neurons" in neural networks involve associative links
of activation. Therefore, the extremely subtle and high-level associative
links of human concepts must be explained by this low-level property.
Similarly, any instance of human deduction which can be written down (after
the fact) as a syllogism must be explained by the blind operation of a
ten-line-of-code process - even if the human thoughts blatantly involve
a rich visualization of the subject matter, with the results yielded by
direct examination of the visualization rather than formal deductive reasoning.
In AI, the one great simple idea usually operates on a low level,
in accordance with the Physicist's Paradigm. Reasoning from similarity
of surface properties is used to assert that high-level cognitive phenomena
are explained by the low-level phenomenon, which (it is claimed) is both
Necessary and Sufficient. This cognitive structure is a full-blown
fallacy; it contains the social drama (one brilliant idea, new against
old) and the rationalization (reasoning by similarity of surface properties,
sympathetic magic) necessary to bear any amount of emotional weight.
And that's how AI research goes wrong.
There are several ways to avoid making this class of mistake.
One is to have the words "Necessary, But Not Sufficient" tattooed on your
forehead. One is an intuition of causal analysis that
says "This cause does not have sufficient complexity to explain this effect."
One is to be instinctively wary of attempts to implement cognition on the
token level. (One is learning enough evolutionary psychology to recognize
and counter ideology-based thoughts directly, but that's moving off-topic...)
One is introspection. Human introspection currently has a bad
reputation in cognitive science, looked on as untrustworthy, unscientific,
and easy to abuse. This is totally true. Still, you can't build
a mind without a working model. It is necessary to know, intuitively,
that classical-AI propositional logic - syllogisms, property inheritance,
et cetera - is inadequate to explain your deduction that dropping an anvil
on a car will break it. You should be able to see, introspectively,
that there's more than that going on. You can visualize an anvil
smashing into your car's hood, the metal crumpling, and the windshield
shattering. (9). Clearly visible is vastly more mental material,
more cognitive "stuff", than classical-AI propositional logic involves.
The revolt against the Physicist's Paradigm can be formalized as the
Law of Pragmatism:
The key words are "contribute materially". An architecture can
be necessary to thought without accounting for the substance
of thought. The Law of Pragmatism says that if a neural network's
rules are simple enough to be formalized mathematically, then the substance
of any intelligent answers produced by that network will be attributable
to the specific pattern of weightings. If the pattern of weightings
is created by a mathematically formalizable learning method, then the substance
of intelligence will lie, not in the learning method, but in the intricate
pattern of regularities within the training instances.
We can't be certain that the Law of Pragmatism will hold in the
future, but it's definitely a heuristic in the Lenatian sense; if only
we'd known it in the 1950s, so much error could have been avoided.
The Law of Pragmatism is one of the tools used to determine whether an
idea is Necessary, But Not Sufficient. (11).
°GISAI proposes a mind which contains modules vaguely
analogous to human sensory modalities (auditory cortex, visual cortex,
etc.). This does not mean that you can design any old system which
can be described as "containing modular sensory modalities" and then dash
off a press release about how your company is building an AI containing
modular sensory modalities. That's the trophy mentality I was talking
about earlier. A modular, modality-based system is Necessary, But
Not Sufficient; it is also necessary to have the right modules,
in the right sensory modalities, using the right representation
and the right intuitions to process the right base of experience
to produce the right concepts that support the right thoughts
within the right larger architecture.
When you think of a light bulb, the syllables and phonemes of "light
bulb" are loaded into your auditory cortex; if you're a visual person,
a generic picture of a light bulb - the default exemplar - appears in your
visual cortex. Let's suppose that some AI has reasonably sophisticated
analogues of the auditory cortex and visual cortex, capable of perceiving
higher-level features as well as the raw binary data. This is clearly
necessary; is it sufficient to understand light bulbs in the same way as a human?
No. Not even close. When you hear the phrase "triangular
light bulb", you visualize a triangular light bulb.
The Law of Pragmatism
Any form of cognition which can be mathematically
formalized, or which has a provably correct implementation, is too simple
to contribute materially to intelligence.
| NOTE: | Please halt, close your eyes, and visualize a triangular light bulb. Please? Pretty please with sugar on top? |
How do these two symbols combine? You know that light bulbs are fragile; you have a built-in comprehension of real-world physics - sometimes called "naive" physics - that enables you to understand fragility. You understand that the bulb and the filament are made of different materials; you can somehow attribute non-visual properties to pieces of the three-dimensional shape hanging in your visual cortex. If you try to design a triangular light bulb, you'll design a fluorescent triangular loop, or a pyramid-shaped incandescent bulb; in either case, unlike the default visualization of "triangle", the result will not have sharp edges. You know that sharp edges, on glass, will cut the hand that holds it.
Look at all that! It requires a temporal, four-dimensional understanding of the light bulb. It requires an appreciation, a set of intuitions, for cause and effect. It requires that you be capable of spotting a problem - a conflict with a goal - which requires means for representing conflicts, and cognitive reflexes derived from a goal system.
Look at yourself "looking at all that". It requires introspection, reflection, self-perception. It requires an entire self-sensory modality - representations, intuitions, cached reflexes, expectations - focused on the mind doing the thinking.
For you to read this paragraph, and think about it, requires a stream of consciousness. For you to think about light bulbs implies that you codified your past experiences of actual light bulbs into the representation used by your long-term memory. The visual image of the light bulb, appearing in your visual cortex, implies that a default exemplar for "light bulb" was abstracted from experience, stored under the symbol for "light bulb", and triggered by that symbol's auditory tag of 'light bulb'. And this exemplar can even be combined with the learned symbol for "triangle". You have formed an adjective, "triangular", consisting of characteristics which can be applied to modify the visual and design substance of the light-bulb concept. For you to visualize a light-bulb smashing, with an accompanying tinkling noise, requires synchronization of recollection and reconstruction across multiple sensory modalities.
I've mentioned many features in the last paragraphs; none of them are emergent. None of them will magically pop into existence on the high level "if only the simple low-level equation can be found". In a human, these features are °complex functional adaptations, generated by millions of years of evolution. For an AI, that means you sit down and write the code; that you change the design, or add design elements (special-purpose low-level code that directly implements a high-level case is usually a Bad Thing), specifically to yield the needed result.
In short, the design in GISAI is simply far larger, as a system
architecture, than any design which has been previously attempted.
It's large enough to resemble systems of the complexity described in the
471 articles in °The MIT Encyclopedia of the Cognitive Sciences.
(12). You'll appreciate this better after reading
the rest of the document, of course, but when you have done so, I expect
that seed AI will look too different from past failures for one
to reflect on the other. Fish and fowl, apples and oranges, elephants
and typewriters. There is still the possibility that any given seed
AI project will fail, or even that seed AI itself will fail - but if so,
it will fail for different reasons.
Intelligence is an evolutionary advantage because it enables us to model, predict, and
manipulate reality. This includes not only Joe Caveman (or rather,
Pat Hunter-Gatherer) inventing the bow and arrow, but Chris Tribal-Chief
outwitting his (13) political
rivals and Sandy Spear-Maker realizing that the reason her spears keep
breaking is that she's being too impatient while making them. That
is, the "reality" we model includes not just things, but other humans,
and the self. (14).
A chain of reasoning is important because it
ends with a conclusion about how the world works, or about how the world
can be altered. The "world", for these purposes, includes the internal
world of the AI; when designing a bicycle, the hypothesis "a round object
can traverse ground without bumping" is a statement about the external
world. The hypotheses "it'd be a good idea to think about round objects",
or "the key problem is to figure out how to interface with the ground",
or even "I feel like designing a bicycle", are statements about the internal
world.
From an external perspective, cognitive events matter only insofar as
they affect external behavior. Just so, from an internal perspective,
the effect on the world-model is the punchline, the substance. This
is not to say that every line of code must make a change to the world-model,
or that the world-model is composed exclusively of high-level beliefs about
the real world. The thought sequences that construct a what-if scenario
- a °subjunctive fantasy world - are altering a world-model,
even if it's not the model of the world. A "vague feeling
that there's some kind of as-yet unnamed similarity between two pictures"
is part of the content of the AI's beliefs about the world. The code
that produces that intuition may undergo many internal iterations, acting
on data structures with no obvious correspondence to the world-model, before
producing an understandable output.
What makes a pattern of bytes - or neurons - a "model"? And what
makes a particular statement in that model "true" or "false"? (15).
The best definition I've found is derived from looking at the cause
of our intelligence: "Intelligence is an evolutionary advantage because
it enables us to model, predict, and manipulate reality." Models
are useful because they correspond to external reality.
I distinguish four levels of binding:
The "world-model" for an AI living in that °microworld
consists of everything the AI knows about that world - the positions, velocities,
radii, and masses of the billiard balls. More abstract perceptions,
such as "a group of °three billiard balls", are also part
of the world-model. The prediction that "billiard ball A and billiard
ball B will collide" is part of the world-model. If the AI imagines
a situation where four billiard balls are arranged in a square, then that
imaginary world has its own, °subjunctive world-model.
If the AI believes "'imagining four billiard balls in a square' will prove
useful in solving problem X", then that belief is part of the world-model.
In short, the world-model is not necessarily a programmatic concept
- a unified set of data structures with a common format and °API.
(Although it would be wonderfully convenient, if we could pull it off.)
The "world-model" is a cognitive concept; it refers to the content of all
beliefs, the substance of all mental imagery.
Returning to the billiard-ball world, what is necessary for an AI to
have a "model" of this world?
Suppose that a cue ball travelling south at 4 meters/second, bumping
into a billiard ball travelling south at 2 meters/second, results in the
cue ball and the billiard ball travelling south at 3 meters/second.
Suppose, furthermore, that these rules are contained within the AI's internal
model of the environment, so that if the AI visualizes a cue ball at {8.2,
6} of radius 1 travelling south at 4 m/s, and a ball at {8.2, 10} of radius
1 going south at 2 m/s, the AI will visualize the balls bumping one second
later at {8.2, 11}, and the two balls then travelling south at 3 m/s.
It's a long way from there to knowing - consciously, °declaratively
- that two balls in general bumping at 4 m/s and 2 m/s while going
in the same direction will travel on together at 3 m/s. It's an even
longer way to knowing that "if billiard ball X bumps into billiard ball
Y, then they will continue on together with the average of their velocities".
And it's a still longer way to reversing the rule and knowing that
"to get a group of two balls travelling together with velocity X, given
billiard ball A with velocity Y, bump it with billiard ball B having velocity
(2X - Y)". Finally, to close the loop, this last high-level rule
must be applied to create a particular hypothesized action in the
world-model, and the hypothesized action needs to be taken as a real action
in external reality.
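The steps from a single concrete visualization up to a reversed rule can be illustrated with a minimal sketch. It assumes equal masses, a one-dimensional track with south as the positive direction, and the "travel on together at the average velocity" rule used in the example above. The function names are hypothetical and chosen only for this illustration; nothing here is part of the proposed architecture, and writing these three functions by hand is of course not the same thing as an AI learning them.

```python
def time_to_contact(y_a, v_a, y_b, v_b, radius=1.0):
    """Seconds until ball A (behind) touches ball B (ahead), or None if it never does."""
    gap = (y_b - y_a) - 2 * radius   # distance between the two surfaces
    closing_speed = v_a - v_b        # how fast A is catching up to B
    if closing_speed <= 0:
        return None
    return gap / closing_speed


def collide(v_a, v_b):
    """Forward rule: after bumping, both balls continue at the average velocity."""
    return (v_a + v_b) / 2


def required_velocity(target, v_a):
    """Reversed rule: the velocity ball B needs so that the pair ends up at `target`."""
    return 2 * target - v_a


# The specific case from the text: cue ball at y=6 going 4 m/s, ball at y=10 going 2 m/s.
t = time_to_contact(6.0, 4.0, 10.0, 2.0)
contact_y = 6.0 + 4.0 * t + 1.0      # cue ball centre plus its radius
print(t, contact_y, collide(4.0, 2.0), required_velocity(3.0, 4.0))
```

The forward rule links an action to an outcome; `required_velocity` runs the same relation backwards, from a desired outcome to the action that produces it, which is the step the text describes as "reversing the rule".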
Without jumping too far ahead, there are a number of properties that
a world-model needs to support high-level thought. It needs to support
°time
- multiple frames or a temporal visualization - with accompanying extraction
of temporal features. It needs to support predictions
and expectations (and an expectation isn't real unless the
AI notices when the expectation is fulfilled, and especially when it is
violated). The world-model needs to support hypotheses, °subjunctive
frames of visualization, which are distinct from "real reality" and can
be manipulated freely by high-level thought. (By "freely manipulated",
I mean a direct manipulative binding; choosing to think about a
billiard ball at position {2, 3} should cause a billiard ball to materialize
directly within the representation at {2, 3}, with no careful sequence
of actions required.) And for the visualization to be useful once
it exists, the high-level thought which created the billiard-ball image
must refer to the particular image visualized... and the reference
must run both ways, a two-way linkage.
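As a rough illustration of what "leaving room" for these properties might look like at the data-structure level, here is a small sketch. Every class and function name in it (`Frame`, `Expectation`, `Thought`, `imagine`) is an assumption made for this example and does not come from the design described in this document. It shows only the bare slots for time, expectation, subjunctive frames, and a two-way reference, not the cognition that would use them.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Ball:
    position: tuple
    velocity: tuple = (0.0, 0.0)


@dataclass
class Frame:
    """One snapshot of the world at a given time."""
    time: float
    balls: dict = field(default_factory=dict)
    subjunctive: bool = False   # True for what-if frames, False for "real reality"
    source: object = None       # back-reference to the thought that created the frame


@dataclass
class Expectation:
    ball: str
    time: float                 # when the expectation is supposed to hold
    position: tuple

    def check(self, frame):
        """Compare with an observed frame, so fulfilment or violation is noticed."""
        actual = frame.balls[self.ball].position
        return "fulfilled" if actual == self.position else "violated"


@dataclass
class Thought:
    description: str
    image: object = None        # forward reference to the frame being visualized


def imagine(real, thought, name, position):
    """Direct manipulative binding: the ball simply appears in a subjunctive copy."""
    frame = copy.deepcopy(real)
    frame.subjunctive = True
    frame.balls[name] = Ball(position)
    thought.image = frame       # thought -> image
    frame.source = thought      # image -> thought
    return frame


real = Frame(time=0.0, balls={"cue": Ball((8.2, 6.0), (0.0, 4.0))})
thought = Thought("a billiard ball at {2, 3}")
image = imagine(real, thought, "b", (2.0, 3.0))
```

The real frame is never touched by `imagine`; the hypothesized ball exists only in the copy, and the thought and the image each hold a reference to the other.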
°Time, expectation, comparison, °subjunctivity,
visualization, introspection, and reference. I haven't defined any
of these terms yet. (Most are discussed in 3: Cognition,
although you can jump ahead to Appendix A: Glossary if you're
impatient.) Nonetheless, these are some of the basic attributes that
are present in human world-models, and which are Necessary (But Not Sufficient)
for the existence of high-level features such as causality,
intentionality, °goals, memory, learning, association, focus,
abstraction, categorization, and symbolization.
2: Mind
2.1: World-model
These definitions raise an army of fundamental issues - °time,
causality, °subjunctivity, goals, searching,
invention - but first, let's look at a concrete example. Imagine a °microworld
composed of Newtonian billiard balls - a world of spheres (or circles),
each with a position, radius, mass, and velocity, interacting on some frictionless
surface (or moving in a two-dimensional vacuum). (16).
In the last case, the AI may have been able to manipulate each of the six
billiard balls as a separate object, or each action may have affected multiple
balls simultaneously, requiring a more complex planning process.
The important thing is that "creating two symmetrical groups of three billiard
balls" is not something that would happen by chance, or be uncovered by
a blind search. For the AI to create a structure of billiard balls,
it will need °heuristics - knowledge about rules - that not
only link outcomes to actions, but reverse the process to link actions
to outcomes.
| NOTE: | I mention that list of features to illustrate what will probably be one of the major headaches for AI designers: If you design a system and forget to allow for the possibility of expectation, comparison, °subjunctivity, visualization, or whatever, then you'll either have to go back and redesign every single component to open up space for the new possibilities, or start all over from scratch. Actualities can always be written in later, but the potential has to be there from the beginning, and that means a designer who knows the requirements spec in advance. |
In a rainbow, the physical frequency of the light
changes smoothly and linearly with distance (19). Yet, when you look
at a rainbow, you see colors grouped into bands, with relatively sharp
borders. And it's not just you. Everyone sees the bands.
It gets worse. Consider: The frequency of light is a linear,
scalar, real number. The visible frequencies of light rise linearly
from red to blue, bounded by infrared and ultraviolet. But if you
look at a color wheel on your computer, you'll see that it's a wheel.
Red to orange to yellow to green to blue to... purple? ... and back to
red again. Where does purple come from? It's a color
that doesn't exist, seemingly added on afterwards to turn a linear spectrum
into a circle!
It turns out the color purple and the bands in a rainbow are both artifacts
of the way humans perceive color space, which in turn is a result of the
way our visual cortex has evolved to distinguish objects in the ancestral
environment and maintain color constancy under natural lighting.
(For more about this, see "The Perceptual Organization of Colors" in "The
Adapted Mind". It's definitely a cool article.)
The color purple, and the bands in the rainbow, aren't real.
But everyone sees them, so you can't just call them hallucinations.
I prefer to strike a happy compromise and say that purple and rainbows
exist in the Consensus. Nobody actually lives in external
reality, and we couldn't understand it if we did; too many quarks flying
around. When we walk through a hall, watching the floor and walls
and ceiling moving around us, we're actually walking through our visual
cortex. That's what we see, after all. We don't see the photons
reflected by the walls, and we certainly don't see the walls themselves;
every single detail of our perception is there because a neuron is firing
somewhere in the visual system. If the wrong neuron fired, we'd see
a spot of color that wasn't there; if a neuron failed to fire, we wouldn't
see a spot of color that was there. From this perspective, the actual
photons are almost irrelevant. Furthermore, all the colors in the
hall you're walking through are technically incorrect due to that old color-space
thing. Heck, you might even walk past something purple.
This is the point where the philosopher usually goes off the solipsistic
deep end. "It's all arbitrary! Nothing is real! Everything
is true! I can say whatever I want and nobody can do a thing about
it, bwahaha!" I hate this whole line of thinking. If
I ever start sounding like this, check my forehead for lobotomy scars.
The Consensus usually has an extremely tight °sensory,
°predictive, and °manipulative binding to external
reality. No, it doesn't work 100% of the time, but it works 99.99%
of the time, so the rules are just as strict. Just because you can't
see external reality directly doesn't mean it isn't there.
Everything you see is illusion, the Veil of Maya. Where Eastern
philosophy goes wrong is in assuming that the Veil of Maya is hiding something
big and important. What lies behind the illusion of a brick is the
actual brick. The vast majority of the time, you can forget the Veil
of Maya is even there.
Nor does our residence in the Consensus grant the Consensus primacy
over external reality. The Consensus itself is just another part
of reality. That's how reality binds the Consensus; it's just one
part of reality affecting another part, under the standard rules of interaction
imposed by the laws of physics. External reality existed before the
patterns in reality known as "humans" or "the Consensus". People
who ignore external reality on the grounds that "all truth is subjective"
tend to have their constituent quarks assimilated by the quark-patterns
we call "tigers".
However, sometimes it's important to remember that tigers only exist
in the Consensus. Suppose someone asks you for a definition of a
"tiger", and you give them a definition that works 99.99% of the time -
"big orange cat thingy with stripes". Then whoever it is paints a
tiger green and says, "Ha, ha! Your definition is wrong!" What
I would do in this case is give a more precise definition based on genetics,
behavior patterns, and so on, but then you have cyborg tigers and mutant
tigers. At that point, it becomes important to remember that it's
"just" the Consensus. You shouldn't expect things in the Consensus
to have perfect mathematical definitions. Evolution doesn't select
for tigers, or tiger-perceiving minds, that have philosophically elegant
definitions; evolution selects whatever works most of the time.
So why does the Consensus work? Because of a fundamental rule
of °reductholism: Forget about definitions.
Anything true "by definition" is a tautology, and bears no relation to
external reality - does not even refer to external reality.
Forget about definitions, and if you find that some cognitive perception
is inherently °subjective or °observer-dependent
- that the perception relies on qualities that exist only in the mind of
the observer - then relax and accept it as being useful to intelligence
most of the time, and don't go into philosophical fits. It isn't
real, after all, so why should you worry?
Hey, that's life in the Consensus.
| DEFN: | Consensus: The Consensus is the world of shared perceptions that humanity inhabits. Things in the Consensus aren't really really real, but they usually correspond tightly to reality - enough to make the rules about what you can and can't say just as strict. What distinguishes the Consensus from actual reality is that there is no a priori reason why things should be formalizable, philosophically coherent, or unambiguous. |
A human has a visual cortex, an auditory cortex, a sensorimotor
cortex - areas of the brain specifically devoted to particular senses.
Each such "cortex" is composed of neural modules which extract important
mid-level and high-level features from the low-level data, in a way determined
by the "laws of physics" of that domain. The visual cortex and associated
areas (20)
are by far the best-understood parts of the brain, so that's what we'll
use for an example.
Visual information starts out as light hitting the retina; the resulting
information can be thought of as being analogous to a two-dimensional array
of pixels (although the neural "pixels" aren't rectangular). "Low-level"
feature extraction starts right in the retina, with neurons that respond
to edges, intensity changes, light spots, dark spots, et cetera.
From this new representation - the 2D pixels, plus features like edges,
light spots, and so on - the lateral geniculate nucleus and striate cortex
extract mid-level features such as edge orientation, movement, direction
of moving features, textures, the curvature of textured surfaces, shading,
and binocular perception. This information yields °David Marr's
two-and-a-half-dimensional world, which is composed of scattered facts
about the three-dimensional properties of two-dimensional features - this
is a continuous surface, this surface is curving away and to the left,
these two surfaces meet to form an edge, these three edges meet to form
a corner.
Finally, a 3D representation of moving objects is constructed from the
2.5D world. Constraint propagation: If the 3D interpretation
of one corner requires an edge to be convex, then that edge cannot be concave
in another corner. Object assembly: Multiple surfaces that
move at the same speed, or that move in a fashion consistent with rotation,
are part of a single object. Consistency: An object (or an
edge, or a surface) cannot simultaneously be moving in two directions.
The resulting 3D representation, still bound to the 2.5D features and
the 2D pixels, is sent to the temporal cortex for object recognition and
to the parietal cortex for spatial visualization.
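The object-assembly rule can be sketched in a few lines of Python. This is only an illustration, not a model of the visual cortex: the surface names, the velocity tuples, and the function `assemble_objects` are assumptions made for the example, and only common translation is handled (the rotation case is left out).

```python
from collections import defaultdict


def assemble_objects(surface_velocities):
    """Group surfaces that share a velocity into candidate objects.

    surface_velocities maps a (hypothetical) surface name to its
    (vx, vy) velocity. Surfaces moving identically are assumed to
    belong to a single object.
    """
    groups = defaultdict(list)
    for name, velocity in surface_velocities.items():
        groups[velocity].append(name)
    # Sort for a stable, readable result.
    return sorted(sorted(names) for names in groups.values())


surfaces = {
    "hood": (3, 0),
    "door": (3, 0),
    "roof": (3, 0),
    "wall": (0, 0),
}
objects = assemble_objects(surfaces)
```

Here the three surfaces moving at the same velocity end up in one group, and the stationary wall stays separate.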
The visual cortex is the foundation of one of the seven senses.
(Yes, at least seven. In addition to sight, sound, taste, smell,
and touch, there's proprioception (the nerves that tell us where our arms
and legs are) and the vestibular sense (the inner ear's inertial motion-detectors).
(21).) The neural areas
that are devoted solely to processing one sense or another account
for a huge chunk of the human cortex. In the modular partitioning
of the human brain, the single most common type of module is a sensory
modality, or a piece of one. This demonstrates a fundamental lesson
about minds in general.
Classical AI programs, particularly "expert systems", are often partitioned
into microtheories. A microtheory is a body of knowledge, i.e. a
big semantic net, e.g. propositional logic, a.k.a. suggestively named LISP
tokens. A typical microtheory subject is a human specialty, such
as "cars" or "childhood diseases" or "oil refineries". The content
of knowledge typically consists of what would, in a human, be very high-level,
heuristic statements: "A child that is sick on Saturday is more likely
to be seriously ill than a child who's sick on a schoolday."
How do the microtheory-based modules of classical AI differ from the
sensory modules that are common in the human mind? How does a "microtheory
of vision" differ from a "visual cortex"? Why did the microtheory
approach fail?
There are two fundamental clues that, in retrospect, should have alerted
expert-system theorists ("knowledge engineers") that something was wrong.
First, microtheories attempt to embody high-level rules of reasoning -
heuristics that require a lot of pre-existing content in the world-model.
The visual cortex doesn't know about butterflies; it knows about edge-detection.
The visual cortex doesn't contain a preprogrammed picture of a butterfly;
it contains the feature-extractors that let you look at a butterfly, parse
it as a distinct object standing out against the background, remember that
object apart from the background, and reconstruct a picture of that object
from memory. We are not born with experience of butterflies; we are
born with the visual cortex that gives us the capability to experience
and remember butterflies. The visual cortex is not visual knowledge;
it is the space in which visual knowledge exists.
The second, deeper problem follows from the first. All of an expert
system's microtheories have the same underlying data structures (in this
case, propositional logic), acted on by the same underlying procedures
(in this case, a few rules of °Bayesian reasoning).
Why separate something into distinct modules if they all use the
same data structures and the same functions? Shouldn't a real program
have more than one real module?
I'm not suggesting that data formats and modules be proliferated because
this will magically make the program work better. Any competent programmer
knows not to use two data formats where one will do. But if the data
and processes aren't complex enough to seize the programmer by the throat
and force a modular architecture, then the program is too simple
to give rise to real intelligence.
Besides, a single-module architecture certainly isn't the way
the brain does it. Maybe there's some ingenious way to represent
auditory and visual information using a single underlying data structure.
If we can get away with it, great. But if no act of genius is required
to solve the very deep problem of getting domain-specific representations
to interact usefully, if the problem is "solved" because all the content
of thought takes the form of propositional logic, if all the behaviors
can fit comfortably into a single programmatic module - then the program
doesn't have enough complexity to be a decent video game, much less an
AI. (22).
We shouldn't be too harsh on the classical-AI researchers. Building
an AI that operates on "pure logic" - no sensory modalities, no equivalent
to the visual cortex - was worth trying. As Ed Regis would say, it
had a certain hubristic appeal. Why does human thought use the visual
cortex? Because it's there! After all, if you've already evolved
a visual cortex, further adaptations will naturally take advantage of it.
It doesn't mean that an engineer, working ab initio, must be bound
by the human way of doing things.
But it didn't work. The recipe for intelligence presented by GISAI
assumes an AI that possesses equivalents to the visual cortex, auditory
cortex, and so on. Not necessarily these particular cortices;
after all, Helen Keller (who was blind and deaf, and spoke in hand signs)
learned to think intelligently. But even Helen Keller had proprioception,
and thus a parietal lobe for spatial orientations; she had a sense of touch,
which she could use to "listen" to sign language; she could use the sensory
modalities she had to perceive signed symbols, and form symbols internally,
and string those symbols together to form sentences, and think. (23) Some equivalent of
some type of "cortex" is necessary to the GISAI design.
"Cortex" is a specifically neurological term referring to the surface
area of the brain, and therefore I will use the term "sensory modality",
or "modality", instead of cortex.
| DEFN: | Modality: Modalities in an AI are analogous to human cortices - visual cortex, auditory cortex, et cetera - enabling the AI to visualize processes in the target domain. Modalities capture, not high-level knowledge, but low-level behaviors. A modality has data structures suited to representing the target domain, and °codelets or processing stages which extract higher-level features from raw data. |
Why does an AI need a visual modality? Because the human visual cortex and associated neuroanatomy - our visual modality - is what makes our thoughts of 2D and 3D objects real. Drew McDermott, in Artificial Intelligence Meets Natural Stupidity, pointed out that, just because a LISP token is labeled with the character string "hamburger", it does not mean that the program understands hamburgers. The program has not even noticed hamburgers. If the symbol were called G0025 instead of hamburger, nobody would ever be able to figure out that the token was supposed to represent a hamburger.
When two objects collide, we don't just have a bit of propositional logic that says collide(car, truck); we imagine two moving objects. We model 2D pixels and 3D features and visualize the objects crashing together. The edges touch, not as touch(edge-of(car), edge-of(truck)), but as two curves meeting and deforming at all the individual points along the edge. You could successfully look at a human brain and deduce that the neurons in question were modelling edges and colliding objects; this is, in fact, what visual neuroanatomists do. But if you did the same to a classical AI, if you stripped away the handy English variable names from the propositional logic, you'd be left with G0025(Q0423, U0111) and H0096(D0103(Q0423), D0103(U0111)). No amount of reasoning could bind those cryptic numbers to real-world cars or trucks.
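McDermott's point can be demonstrated with a toy rule engine, sketched here in Python. Everything in it is hypothetical (the `forward_chain` and `rename` helpers, the "damaged" rule, and the token "Z0001" are invented for the example). The sketch shows that a consistent renaming of tokens leaves the program's inferences structurally unchanged, which is to say that the English names were never doing any work.

```python
def forward_chain(facts, rules):
    """Apply rules of the form (premise_predicate, conclusion_predicate)
    until no new facts appear. Tokens are only ever compared for equality."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for fact in list(known):
                if fact[0] == premise:
                    derived = (conclusion,) + fact[1:]
                    if derived not in known:
                        known.add(derived)
                        changed = True
    return known


def rename(structure, table):
    """Consistently replace every token using the renaming table."""
    if isinstance(structure, tuple):
        return tuple(rename(part, table) for part in structure)
    return table[structure]


facts = [("collide", "car", "truck")]
rules = [("collide", "damaged")]  # hypothetical rule for illustration

# G0025, Q0423, U0111 follow the text's example; Z0001 is invented.
table = {"collide": "G0025", "car": "Q0423",
         "truck": "U0111", "damaged": "Z0001"}

readable = forward_chain(facts, rules)
opaque = forward_chain(rename(tuple(facts), table),
                       [rename(rule, table) for rule in rules])

# The opaque run is exactly the readable run with the labels swapped.
same_shape = {rename(fact, table) for fact in readable} == opaque
```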
Furthermore, our visual cortex is useful for more than vision. Philosophy
in the Flesh (George Lakoff and Mark Johnson) talks about the Source-Path-Goal pattern (24) - a trajector that moves, a starting
point, a goal, a route; the position of the trajector at a given time,
the direction at that time, the actual final destination... Philosophy
in the Flesh also talks about "internal spatial 'logic' and built-in
inferences": If you traverse a route, you have been at all locations
along the route; if you travel from A to B and B to C, you have traveled
from A to C; if X and Y are traveling along a direct route from A to B
and X passes Y, then X is further from A and closer to B than Y is.
These are all behaviors of spatial reality. Classical AI
would attempt to capture descriptions of this behavior; e.g. "if
travel(X, A, B) and travel(X, B, C) then travel(X, A, C)".
The problem is that the low-level elements (pixels, trajectors, velocities)
making up the model can yield a nearly infinite number of high-level behaviors,
all of which - under the classical-AI method - must be described independently.
If A is-contained-in B, it can't get out - unless B has-a-hole.
Unless A is-larger-than the hole. Unless A can-turn-on-its-side
or the hole is-flexible. Trying to describe all the possible
behaviors exhibited by the high-level characteristics, without directly
simulating the underlying reality, is like trying to design a CPU that
multiplies two 32-bit numbers using a doubly-indexed lookup table with
2^64 (around eighteen billion billion) entries.
Real CPUs take advantage of the fact that 32-bit numbers are made of bits.
This enables transistors to multiply using the wedding-cake method (or
whatever it is modern CPU designs use). A 32-bit number is not a
monolithic object. The numerical interpretation of 32 binary digits
is not intrinsic, but rather a high-level characteristic, an observation,
an abstraction. The individual bits interact, and yield a 32-bit
(or 64-bit) result which can then be interpreted as the resulting number.
The computer can multiply 9825 by 767 and get 7535775, not because someone
told it that 9825 times 767 is 7535775, but because someone told it how
to multiply the individual bits.
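As a rough illustration in Python, here is multiplication built from operations on individual bits. Shift-and-add is used only because it is easy to read; it is an assumption of this sketch, not a claim about how any real CPU multiplies. The function name `multiply_by_bits` is likewise invented for the example.

```python
def multiply_by_bits(a: int, b: int) -> int:
    """Multiply two non-negative integers by examining b one bit at a time."""
    result = 0
    shift = 0
    while b:
        if b & 1:                  # this bit of b is set...
            result += a << shift   # ...so add a, shifted into place
        b >>= 1
        shift += 1
    return result


product = multiply_by_bits(9825, 767)

# Size of the doubly-indexed lookup table for two 32-bit operands.
LOOKUP_TABLE_ENTRIES = 2 ** 64
```

Nothing in the function mentions 9825 or 767; the answer falls out of the behavior of the bits, which is the property a lookup table lacks.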
A visual modality grants the power to observe, predict, decide, and
manipulate objects moving in trajectories, not because the modality captures
knowledge of high-level characteristics, but because the modality has elements which
behave in the same way as the external reality. An AI with a visual modality
has the potential to understand the concept of "closer", not because it
has vast stores of propositional logic about closer(A, B), but
because the model of A and B is composed of actual pixels which
are actually getting closer. (25).
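A minimal sketch of that idea in Python, under heavy simplifying assumptions: each "frame" is a tiny grid of characters, the objects are the pixels labeled "A" and "B", and all function names are hypothetical. There is no `closer(A, B)` fact anywhere in the code; "closer" is read off from where the pixels actually are.

```python
import math


def find_pixels(frame, label):
    """Return the (row, col) of every pixel carrying the given label."""
    return [(r, c) for r, row in enumerate(frame)
            for c, value in enumerate(row) if value == label]


def centroid(points):
    rows = [p[0] for p in points]
    cols = [p[1] for p in points]
    return (sum(rows) / len(rows), sum(cols) / len(cols))


def separation(frame, a, b):
    """Distance between the centroids of two labeled pixel groups."""
    (r1, c1) = centroid(find_pixels(frame, a))
    (r2, c2) = centroid(find_pixels(frame, b))
    return math.hypot(r1 - r2, c1 - c2)


def getting_closer(frames, a="A", b="B"):
    """True if the separation shrinks from each frame to the next."""
    dists = [separation(frame, a, b) for frame in frames]
    return all(later < earlier for earlier, later in zip(dists, dists[1:]))


frames = [
    ["A.....B"],
    [".A...B."],
    ["..A.B.."],
]
approaching = getting_closer(frames)
```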
Source-Path-Goal is not just a visual pattern. It is a metaphor
that applies to almost any effort. Force and resistance
aren't just people pushing carts, they're companies pushing products.
Source-Path-Goal applies not just to walking to Manhattan, but to a programmer
struggling to write an application that conforms to the requirements spec.
It applies to the progress of these very words, moving across the screen
as I type them, decreasing the distance to the goal of a publishable Web
page. Furthermore, the visual metaphor is in many cases a useful
metaphor, one which binds predictively. (26). A metaphor is useful
when it involves, not just a similarity of high-level characteristics,
but a similarity of low-level elements, or a single underlying cause.
(See previous footnote.) The visual metaphor that maps the behavior
of a programming task to the Source-Path-Goal pattern (a visual object
moving along a visual line) is useful if some measure of "task completed"
can be mapped to the quantitative position of the trajector, and the perceived
velocity used to (correctly!) predict the amount of time remaining on the
task.
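The prediction described here amounts to a linear extrapolation, sketched below in Python. The function name, the choice of "requirements met" as the progress measure, and the assumption of constant velocity are all illustrative; whether the prediction comes out right depends on whether the real task actually behaves like a trajector moving at steady speed.

```python
def estimate_time_remaining(observations, goal):
    """Extrapolate time to reach `goal` from (time, position) observations.

    Uses the average velocity between the first and last observation.
    Returns None when no forward progress has been observed.
    """
    (t0, p0), (t1, p1) = observations[0], observations[-1]
    velocity = (p1 - p0) / (t1 - t0)
    if velocity <= 0:
        return None
    return (goal - p1) / velocity


# Hypothetical task: 100 requirements, 25 met after 10 days.
remaining_days = estimate_time_remaining([(0, 0), (10, 25)], goal=100)
```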
Of course, one must realize that having a visual modality is Necessary,
But Not Sufficient, for pulling off that kind of stunt. In such cases,
noticing the analogy is ninety percent of the creativity. The atomic case
of such noticing would consist of generating models at random, either by
generating random data sets or by randomly mixing previously acquired models,
until some covariance, some similarity, is noticed between the model and
the reality. And then the AI says "Eureka!"
Of course, except for very simple metaphors, the search space is too
large for blind constructs to ever match up with reality. It is more
often necessary to deliberately construct a model - in this case, a visual
model - whose behaviors correspond to reality. Discussion of such
higher-level reasoning doesn't belong in the section on "sensory modalities",
but being able to "deliberately construct" anything requires a way
to manipulate the visual model. In addition to the hardware/code
for taking the external action of "draw a square on the sheet of
paper", a mind requires the hardware/code to take the internal action
of "imagine a square". The consequence, in terms of how sensory modalities
are programmed, is that feature extraction needs to be reversible.
Not all of the features all of the time, of course, but for
the cognitive act of visualization to be possible, there must be
a mechanism whereby the perception that detects the "line" feature has
an inverse function that constructs a line, or transforms something else
into a line.
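A toy version of a reversible feature, sketched in Python. The grid-of-0s-and-1s "modality", the restriction to horizontal lines, and both function names are assumptions of the example. The detector turns pixels into a "line" feature, and the constructor turns a "line" feature back into pixels, so that an imagined line is something the detector would perceive.

```python
def detect_horizontal_line(grid):
    """Feature extraction: return (row, start_col, length) of the longest
    horizontal run of filled pixels, or None if nothing is filled."""
    best = None
    for r, row in enumerate(grid):
        c = 0
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1
                if best is None or c - start > best[2]:
                    best = (r, start, c - start)
            else:
                c += 1
    return best


def construct_horizontal_line(height, width, feature):
    """Inverse function: build a grid that exhibits the given line feature."""
    r, start, length = feature
    grid = [[0] * width for _ in range(height)]
    for c in range(start, start + length):
        grid[r][c] = 1
    return grid


imagined = construct_horizontal_line(4, 6, (1, 2, 3))
perceived = detect_horizontal_line(imagined)
```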
Feature reconstruction is much more difficult to program than feature
extraction. More computationally intensive, too. It's the difference
between multiplying the low-level elements of "7" and "17", and reconstructing
two low-level elements which could have yielded the high-level feature
of "119". This may be one of the reasons why thalamocortical sensory
pathways are always reciprocated by corticothalamic projections of equal
or greater size; for example, a cat has 10^6 neural fibers leading from
the lateral geniculate nucleus to the visual cortex, but 10^7 fibers going
in the reverse direction. (27).
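The asymmetry in the "119" example can be made concrete with a short Python sketch (function names invented for the illustration). Extraction is a single multiplication; reconstruction has to search for low-level elements that could have produced the observed feature, and may find more than one candidate.

```python
import math


def extract(a, b):
    """Forward direction: low-level elements to high-level feature."""
    return a * b


def reconstruct(product):
    """Reverse direction: every pair (a, b) with 1 < a <= b and a * b == product."""
    return [(a, product // a)
            for a in range(2, math.isqrt(product) + 1)
            if product % a == 0]


feature = extract(7, 17)
candidates = reconstruct(feature)
```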
Even a complete sensory modality, capable of perception and visualization,
is useless without the rest of the AI. "Necessary, But Not Sufficient,"
the phrase goes. A modality provides some of the raw material that
concepts are made of - the space in which visualizations exist, but nothing
more. But, granting that the rest of the AI has been done properly,
a visual modality will create the potential to understand the concept of
"closer"; to use the concept of "closer", and heuristics derived from examining
instances of the concept "closer", as a useful visual metaphor for other
tasks; and to use deliberately constructed models, existing in the visual
modality, to ground thinking about generic processes and interactions.
(In other words, when considering a "fork" in chess or an "if" statement
in code, it can be visualized as an object with a Y-shaped trajectory.)
Is a complete visual modality - pixels, edge detectors, surface-texture
decoders, and all - really necessary to engage in spatial reasoning?
Would a world of Newtonian billiard balls, with velocities and collision-detection,
do as well? It would apparently suffice to represent concepts such
as "fork", "if statement", "source-path-goal", "closer", and to create
metaphors for most generic systems composed of discrete objects.
The billiard-ball world has significantly less representative power; it's
harder to understand a "curved trajectory" in spacetime if you can't visualize
a curve in space. (28). But, considering
the sheer programmatic difficulty of coding a visual modality, are metaphors
with billiard balls composed of pixels really that superior to metaphors
with billiard balls implemented directly as low-level elements?
Well, yes. In a visual modality, you can switch from round billiard
balls to square billiard balls, visualize them deforming as they touch,
and otherwise "think outside the box". The potential for thinking
outside the box, in this case, exists because the system being modeled
has elements that are represented by high-level visual objects; these high-level
visual objects in turn are composed of mid-level visual features which
are composed of low-level visual elements. This provides wiggle room
for creativity.
Consider the famous puzzle with nine dots arranged in a square, where
you're supposed to draw four straight lines, without lifting pen from paper,
to connect the dots. (29). To solve
the puzzle one must "think outside the box" - that is, draw lines which
extend beyond the confines of the square. A conventional computer
program written to solve this problem would probably contain the "box"
as an assumption built into the code, which is why computers have a reputation
for lack of creativity. (30).
A billiard-ball metaphor, even assuming that it could represent lines,
might run into the same problem.
I suspect that many solvers of the nine-dot problem reach their insight
because a particular configuration of tried-out lines suggests an incomplete
triangle whose corners lie outside the box. "Seeing" an "incomplete
triangle" is an optical illusion, which is to say that it's the
result of high-level features being triggered and suggesting mid-level
features - in this case, some extra lines that turn out to be the solution
to the problem. Sure, you can make up ways that this could happen
in a billiards modality, but then the billiards modality starts looking
like a visual cortex. The point is that, for our particular human
style of creativity, it is Necessary (But Not Sufficient) to have a modality
with rich "extraneous" perceptions, and where high-level objects in the
metaphor can be made to do unconventional things by mentally manipulating
the low-level elements. (Even so, it would make development sense
to start out with a billiards modality and work up to vision gradually.)
There are two final reasons for giving a seed AI sensory modalities:
First, the possession of a codic modality may improve the AI's understanding
of source code, at least until the AI is smart enough to make its own decisions
about the balance between slow-conscious and fast-autonomic thought.
Second, as will be discussed later, thoughts don't start out as abstract;
they reach what we would consider the "abstract" level by climbing a layer
cake of ideas. That layer cake starts with the non-abstract, autonomic
intuitions and perceptions of the world described by modalities.
The concrete world provided by modalities is what enables the AI to learn
its way up to tackling abstract problems.
| NOTE: | One of the greatest advantages of seed AI - second only to recursive self-improvement - is going beyond the human sensory modalities. It's possible to create a sensory modality for source code. The converse is also true: Various processes that are autonomic in humans - memory storage, symbol formation - can become sensory modalities subject to deliberate manipulation. In programmatic terms, any program module with a coherent set of data structures and an API, which could benefit from higher-level thinking, is a candidate for transformation into a modality with world-model-capable representations, feature extraction, reversible features to allow mental actions, and the other design characteristics required to support concept formation. |
Modalities in the human brain are mostly preprogrammed, as opposed to learned. (Human modalities require external stimuli to grow into their preprogrammed organization, but this is not the same as learning.) Individual neural signals can have meanings that are visible and understandable to an eavesdropper. Programmers may legitimately take the risk of creating modalities through deliberate programming, with low-level elements that correspond to data structures, and human-written procedures for feature extraction.
Within °GISAI, the term concept is used to refer to the kind of mental stuff that exists as a pattern in the modality. A learned sequence of instructions that reconstructs a generic, abstracted "light bulb" in the visual modality is a concept. Symbols, categories, and some memories are concepts. (In common usage, "concept" might not technically refer to non-declarative mental stuff such as a human cognitive reflex or a human motor skill. However, in a seed AI, where everything is open to introspection, it makes sense to call the equivalents of human reflexes or skills "concepts".) Concepts are patterns, learned or preprogrammed, that exist in long-term storage and can be retrieved.
A structure of concepts creates a thought. The archetypal example, in humans, is words coming together to form sentences. Thoughts are visualized; they operate within the RAM of the mind, the "workspace" represented by available content capacity in the sensory modalities, commonly called "short-term memory" or "working memory". (The capacity of working memory in AIs is not determined by available RAM, but by available CPU capacity to perform feature extraction on the contents of memory. If you have the data structures without the feature extraction, the AI won't notice the information.) Thoughts manipulate the world-model.
In humans, at least, it's hard to draw clean boundaries between thoughts
and concepts. (31). The experience of hearing the word for a single
concept, such as "triangle", is not necessarily a mere concept; it may
be more valid to view it as a thought composed of the single concept "triangle".
And, although some concepts are formed by categorizing directly from sense
perception, more abstract concepts such as "three" probably occur first
as deliberate thoughts. We'll be discussing both types in this section.
In chemistry, abstract means remove; to "abstract" an
atom from a molecule means to take it away. Use of the term "abstract"
to describe the process of forming concepts implies two assumptions:
First, to create a concept is to generalize; second, to generalize
is to lose information. It implies that, to form the concept
of "red", it is necessary to ignore other high-level features such as shape
and size, and focus only on color.
This is the classical-AI view of abstraction, and we should therefore
be suspicious of it. On the other hand, our mechanisms for abstraction
can learn the concept for "red". In a being with a visual modality, this
concept would consist of a piece of mindstuff that had learned to distinguish
between red objects and non-red objects. Since redness is detected
directly as a low-level feature, it shouldn't be very hard to train a piece
of mindstuff to thus distinguish - whether the mindstuff is made of trainable
neurons, evolving code, or whatever. A neural net needs to learn
to fire when the "red" feature is present, and not otherwise; a piece of
code only needs to evolve to test for the presence of the redness feature.
At most, "red" might also require testing for solid-color or same-hue groupings.
Given a visual modality, the concept of "red" lies very close to the surface.
Of course, to have a real concept for "red", it's not enough to distinguish
between red and non-red. The concept has to be applicable;
you have to be able to apply it to visualizations, as in "red dog".
You also need a default exemplar (32) for "red"; and an extreme exemplar
for "red"; and memories of experiences that are stereotypically red, such
as stoplights and blood. (For all we know, leaving out any one of
these would be enough to totally hose the flow of cognition.) Again,
these features lie close to the surface of a visual modality. "Red"
would be one of the easiest features to make reversible, with little additional
computational cost involved; just set the hue of all colors to a red value.
(Although hopefully in such a way as to preserve all detected edges, contrasts,
and so on. Making everything exactly the same color would
destroy non-color features.) The default exemplar for red can be
a red blob, or a red light; the extreme exemplar for red may be the same
as the default exemplar, or it may be a more intensely red blob.
And the stereotypically red objects, such as stoplights and blood, are
the objects in which the redness is important, and much remarked upon.
(33).
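A minimal sketch, in Python, of "red" as both a detector and a reversible feature. It assumes pixels are RGB triples of floats in the range 0 to 1, uses the standard-library `colorsys` module for the hue conversion, and picks its thresholds arbitrarily; none of this is meant as the real design. The inverse keeps each pixel's brightness so that contrasts, and therefore edges, survive the recoloring.

```python
import colorsys


def is_red(rgb, hue_tolerance=0.05, min_saturation=0.5, min_value=0.3):
    """Detector: true when the pixel's hue lies near red (hue 0 on the wheel)
    and the pixel is saturated and bright enough to look colored."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    near_red = h <= hue_tolerance or h >= 1.0 - hue_tolerance
    return near_red and s >= min_saturation and v >= min_value


def make_red(rgb):
    """Inverse: set the hue to red while keeping the pixel's brightness.
    Saturation is raised to a floor so that greys become visibly red."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(0.0, max(s, 0.8), v)


# A hypothetical two-pixel "dog": one green pixel, one light grey pixel.
dog = [(0.2, 0.6, 0.3), (0.8, 0.8, 0.8)]
red_dog = [make_red(pixel) for pixel in dog]
```

Applying `make_red` to every pixel gives a "red dog" in which the brighter pixel is still the brighter pixel.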
For the moment, however, let's concentrate on the problem of forming
categories. The conventional wisdom states that categorization consists
of generalization, and that generalization consists of focusing on particular
features at the expense of others.
We'll use the microdomain of letter-strings as an example. To
generalize from the instances {"aaa", "bbb", "ccc"} to form the category
"strings-of-three-equal-letters", the information about which letter
must be abstracted, or lost, from the model. Actually, this misstates
the problem. If you lose that information on a letter-by-letter basis,
then "aaa" and "aab" both look like "***". What's needed is for the
letter-string modality to first extract the features of "group-of-equal-letters",
"number=3", and "letter=b", after which the concept can lose the
last feature or focus on the first two. If the second feature,
"number", is also lost, then the result is an even more general concept,
"strings-of-equal-letters". Of course, this concept is precisely
identical to the modality's built-in feature-detector for "group-of-equal-letters",
which again points up that only very simple conceptual categories, lying
very close to the surface of the modality's preprogrammed assumptions about
which features are important, can be implemented by direct information-loss.
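The letter-string example can be sketched in Python. The feature names come from the text; the functions `naive_mask`, `extract_features`, `generalize`, and `matches` are hypothetical, and the "modality" here recognizes only strings that form a single group of equal letters.

```python
from itertools import groupby


def naive_mask(string):
    """Letter-by-letter information loss: every letter becomes '*'."""
    return "*" * len(string)


def extract_features(string):
    """Modality-level feature extraction for one letter-string."""
    groups = [(letter, len(list(run))) for letter, run in groupby(string)]
    if len(groups) == 1:
        letter, count = groups[0]
        return {"group-of-equal-letters": True,
                "number": count, "letter": letter}
    return {"group-of-equal-letters": False}


def generalize(instances, drop):
    """Form a category from the features all instances share once the
    features named in `drop` are ignored. Returns None if they disagree."""
    kept = [{k: v for k, v in extract_features(s).items() if k not in drop}
            for s in instances]
    first = kept[0]
    return first if all(features == first for features in kept) else None


def matches(category, string):
    features = extract_features(string)
    return all(features.get(k) == v for k, v in category.items())


instances = ["aaa", "bbb", "ccc"]
three_equal = generalize(instances, drop={"letter"})
any_equal = generalize(instances, drop={"letter", "number"})
```

Note that `naive_mask` cannot tell "aaa" from "aab", while the category formed after feature extraction can.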
To examine a more complex concept, we'll look at the example of "three".
To a twenty-first-century human, trained in arithmetic and mathematics,
the concept of "three" has enormous richness. It must therefore be
emphasized that we are dealing solely with the concept of "three", and
that a mind can understand "three" without understanding "two" or "four"
or "number" or "addition" or "multiplication". A mind may have the
concept "three" and the concept "two" without noticing any similarity between
them, much less having the aha! that these concepts should go together
under the heading "number". If a mind somehow manages to pick up
the categories of groups-of-three-dogs and groups-of-three-cats, it doesn't
follow that the mind will generalize to the category of "three".
To think about infant-level or child-level AIs, or for that matter to
teach human children, it's necessary to slow down and forget about what
seems "natural". It's necessary to make a conscious separation between
ideas - ideas that, to humans, seem so close together that it takes a deliberate
effort to see the distance.
Just because the AI exists on a machine performing billions of arithmetical
operations per second doesn't mean that the AI itself must understand arithmetic
or "three". (John Searle, take note!) Even if the AI has a
codic modality which grants it direct access to numerical operations, it
doesn't necessarily understand "three". If every modality were programmed
with feature-extractors that counted up the number of objects in every
grouping, and output the result as (say) the tag "number: three", the AI
might still fail to really understand "three", since such an AI
would be unable to count objects that weren't represented directly in some
modality. An AI that learns the concept of "three" is more
likely to notice not just three apples but that °ve (the
AI) is currently thinking three thoughts. A preprogrammed concept
only notices what the programmer was thinking about when he or she wrote
the program.
What is "three", then? How would the concept of "three" be learned
by an AI whose modalities made no direct reference to numbers - whose modalities,
in fact, were designed by a programmer who wasn't thinking about numbers
at the time? How can such a simple concept be decomposed into something
even simpler?
There's an AI called "°Copycat",
written by Melanie Mitchell and conceived by Douglas
R. Hofstadter, that tries to solve analogy problems in the microdomain
of letter-strings. If you tell Copycat: "'abc' goes to 'abd';
what does 'bcd' go to?", it will answer "'bce'". It can handle much
harder problems, too. (See °Copycat in the glossary.)
Copycat is a really fascinating AI, and you can read about it in Metamagical
Themas, or read the source
code (it's a good read, and available as plain text online - no decompression
required). If you do look at the source code, or even just browse
the list of filenames, you'll see the names of some very fundamental cognitive
entities. There are "bonds", "groups", and "correspondences".
There are "descriptors" (and "distinguishing descriptors") and "mappings",
and all sorts of interesting things.
Without going too far into the details of Copycat, I believe that some
of the mental objects in Copycat are primitive enough to lie very close
to the foundations of cognition. Copycat measures numbers directly
(although it can only count up to five), but that's not the feature we're
interested in. Copycat was designed to understand relations and invent
analogies. It can notice when two letters occupy "the same position"
in a letter-string, and can also notice when two letters occupy "the same
role" in a higher-order mental construct. It can notice that "c"
in "abc" and "d" in "abd" and "d" in "bcd" all occupy the same position.
It can understand the concept of "the same role", if faced by an analogy
problem which forces it to do so. For example: If "abc" goes
to "abd", what does "pqrs" go to? Copycat sees that "c" and "s" occupy
the same role, even though they no longer occupy the same numerical position
in the string, and so replies "pqrt".
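The difference between "same position" and "same role" can be shown with two hard-coded readings of the rule "abc goes to abd", sketched in Python. This is emphatically not Copycat's algorithm: Copycat discovers the relevant description for itself, whereas this sketch simply writes both candidate descriptions down by hand. The function names are invented, and `successor` makes no attempt to handle "z".

```python
def successor(letter):
    """Next letter in the alphabet (no wraparound handling)."""
    return chr(ord(letter) + 1)


def apply_by_position(string, index=2):
    """Reading 1: 'replace the letter in the third position by its successor'."""
    return string[:index] + successor(string[index]) + string[index + 1:]


def apply_by_role(string):
    """Reading 2: 'replace the rightmost letter by its successor'."""
    return string[:-1] + successor(string[-1])


# Both readings agree on the easy problem...
easy = (apply_by_position("bcd"), apply_by_role("bcd"))
# ...and come apart when the string gets longer.
hard = (apply_by_position("pqrs"), apply_by_role("pqrs"))
```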
Correspondences and roles and mappings are probably
autonomically-detected features on the modality-level (as well as being
very advanced concepts in cognitive science). Intuitive, directly
perceived correspondences allow two images in the same modality to be compared,
and that is a basic part of what makes a modality go.
These intuitions obey certain underlying cognitive pressures (also modeled
by the Copycat project): If two high-level structures are equal,
then the low-level structures should be mapped to each other. Symmetry:
Very loosely defined, this is the idea that each of these low-level
mappings should be the same; if one is reflected, they should all
be reflected, and so on. Completeness: You shouldn't
map five elements to each other but leave the sixth elements dangling.
Copycat shows an example of how to implement this class of cognitive
intuitions using conflict-detectors, equality-detectors, and a feature
called a "computational temperature". Roughly speaking, conflicts
raise the temperature and good structures lower the temperature.
The higher the temperature, the more easily cognitive perceptions break
- the more easily groups and bonds and mappings dissolve. Lower temperatures
indicate better answers, and thus answers are more persistent - perceived
pieces of the answer in the cognitive workspace are harder to break.
Copycat's intuitions may not have the same flexibility or insight as a
human consciously trying to solve a "symmetry problem" or a "completeness
problem", but they do arguably match a human's unconscious intuitions
about analogy problems. Each low-level built-in cognitive ability
has its analogue as a high-level thought-based skill, and it is dangerous
to confuse the standards to which the two are held.
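The qualitative behavior of a computational temperature can be sketched in Python. The formulas below are invented for illustration and are not taken from Copycat's source code; the only properties they are meant to preserve are the ones stated above: conflicts raise the temperature, good structures lower it, and higher temperatures make perceived structures easier to break.

```python
def temperature(structure_strengths, conflicts, conflict_penalty=20.0):
    """Toy temperature in [0, 100].

    structure_strengths: strengths in [0, 1] of the currently perceived
    structures (groups, bonds, mappings). conflicts: number of detected
    conflicts. Both the scale and the penalty are assumptions.
    """
    if structure_strengths:
        quality = sum(structure_strengths) / len(structure_strengths)
    else:
        quality = 0.0
    raw = 100.0 * (1.0 - quality) + conflict_penalty * conflicts
    return max(0.0, min(100.0, raw))


def break_probability(strength, temp):
    """Chance that a perceived structure dissolves; rises with temperature
    and falls with the structure's own strength."""
    return (temp / 100.0) * (1.0 - strength)


cold = temperature([0.9, 0.9], conflicts=0)
hot = temperature([0.9, 0.9], conflicts=2)
```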
We now return to the concept of "three". We'll suppose for the
moment that we're operating in a Newtonian billiard-ball modality, and
that we want the AI to learn to recognize three billiard balls.
The first concept learned for "three" might look like this:
The mental image on the left is an "exemplar" (or "prototype"), attached
to the three concept and stored in memory. The mental image
on the right is the target, containing the objects actually being counted.
The concept of "three" is satisfied when correspondences can be drawn between
each object in the three-exemplar and each object in the target image.
If the target image contains two objects, a dangling object will be detected
in the three-exemplar image, and the concept will not be satisfied.
If the target image contains four objects, then a dangling object will
be detected in the target image. (34).
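A minimal sketch of the exemplar-and-correspondence test described above, under heavy simplifying assumptions: objects are already segmented from the background and handed over as list items, and "unique correspondence" is reduced to pairing items off in order. The names `correspond`, `THREE_EXEMPLAR`, and `satisfies_three` are hypothetical.

```python
def correspond(exemplar, target):
    """Pair objects one-to-one.

    Returns (pairs, dangling_in_exemplar, dangling_in_target).
    """
    pairs = list(zip(exemplar, target))
    matched = len(pairs)
    return pairs, exemplar[matched:], target[matched:]

# The stored exemplar image attached to the concept (assumed contents).
THREE_EXEMPLAR = ["exemplar_ball_1", "exemplar_ball_2", "exemplar_ball_3"]

def satisfies_three(target):
    """The concept is satisfied only when neither image has a dangling object."""
    _, dangling_exemplar, dangling_target = correspond(THREE_EXEMPLAR, target)
    return not dangling_exemplar and not dangling_target
```

With two target objects the dangling object shows up on the exemplar side; with four it shows up on the target side. Note that the code never mentions the number 3 except through the length of the exemplar.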
This isn't a full answer to the "problem of three", of course.
A full answer would also consider the question of how to computationally
implement a "unique correspondence" in a non-fragile way; how to distinguish
each object from the background; how to apply the three-concept
to a mental image formerly containing two or four objects to yield a new
mental image containing three objects; how to retrieve the exemplar from
memory; how to extend the intuition of "unique correspondence" across modalities.
And the type of mindstuff needed to implement these instructions in a non-fragile
way; and how the exemplar and concept were created or learned in the first
place.
In fact, the problem of three is so complicated that it would probably
be first solved by conscious thought, and compiled into a concept afterwards.
This adds the problem of figuring out how the thoughts got started; what
types of task would force a mind to notice "three" and evolve a definition
like that above; and how the skill gets compiled into a pattern.
Also, an understanding of three that generalizes from the concept
"three billiard balls" to the concept "three groups of three billiard balls"
means asking what kind of problem would force the generalization.
It means asking how the generalization would take place inside the thought-based
skill or mindstuff-based concept; how the need to generalize would translate
into a cognitive pressure, and how that pressure would apply to a piece
of the mindstuff-code, and how that piece would correctly shift under pressure.
And then there are questions about moving towards the adult-human understanding
of "three", such as noticing that it doesn't matter which particular
billiard ball A corresponds to which billiard ball B.
However, the diagram above does constitute a major leap forward in solving
the problem. It is a functional decomposition of three, one that
invokes more basic forces such as unique correspondence and exemplar retrieval.
It is a concept that could be learned even by an AI whose programmers had
never heard of numbers, or whose programmers weren't thinking about numbers
at the time. It is a concept that can mutate in useful ways.
By relaxing the requirement of no dangling objects in the exemplar, we
get "less than or equal to three". By relaxing the requirement of
no dangling objects in the target image, we get "greater than or equal
to three". By requiring a dangling object in the target image, we
get "more than three". By comparing two images, instead of an exemplar
and an image, we get "same number as" (35),
and from there "less than" or "less than or equal to".
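The mutations listed above can be sketched as small variations on a single dangling-object check. This is an illustrative simplification: images are plain lists, pairing is done in order, and every function name here is hypothetical.

```python
def dangling(left, right):
    """Pair items off one-to-one and return the leftovers on each side."""
    matched = min(len(left), len(right))
    return left[matched:], right[matched:]

EXEMPLAR = ["e1", "e2", "e3"]  # assumed three-exemplar

def at_most_three(target):
    # Relaxed: dangling objects in the exemplar are allowed.
    _, extra_target = dangling(EXEMPLAR, target)
    return not extra_target

def at_least_three(target):
    # Relaxed: dangling objects in the target image are allowed.
    extra_exemplar, _ = dangling(EXEMPLAR, target)
    return not extra_exemplar

def more_than_three(target):
    # Required: at least one dangling object in the target image.
    _, extra_target = dangling(EXEMPLAR, target)
    return bool(extra_target)

def same_number(image_a, image_b):
    # Compare two images directly instead of an exemplar and an image.
    extra_a, extra_b = dangling(image_a, image_b)
    return not extra_a and not extra_b
```

Each variant changes one requirement and leaves the underlying correspondence machinery untouched, which is the sense in which the concept can mutate in useful ways.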
In fact, examining some of these mutations suggests a real-world path
to threeness. The general rule is that concepts don't get invented
until they're useful. Many physical tasks in our world require equal
numbers of something; four pegs for four holes, and so on. The task
of perceiving a particular number of "holes" and selecting, in advance,
the correct number of pegs, might force the AI to develop the concept of
corresponding sets, or sets that contain the same number of objects.
The spatial fact that two pegs can't go in the same hole, and that one
peg can't go in two holes, would be a force acting to create the perception
of unique (one-to-one) correspondences. "Corresponding-sets" would
probably be the first concept formed. After that would come a tendency
to categorize sets into classes of corresponding sets, when it was useful
to do so; after that would come the selection
of a three-exemplar and the concept of three.
The decomposition of three in the above graphic is not the most efficient
concept for three. It is simply the most easily evolved. After
the formation of the exemplar-and-comparison concept for three would come
a more efficient procedure: Counting.
To evolve the counting concept requires that the counting skill
be developed, which occurs on the thought-level, which thought in turn
requires a more sophisticated concept-level depiction of three.
It requires that one and two have also been developed, and
that one and two and three have been generalized into number.
Once this occurs, and the AI has been playing around with numbers for a
while, it may notice that any group of three objects contains a group of
two objects. It may manage to form the concept of "one-more-than",
an insight that would probably be triggered by watching the number of a
group change as additional objects are added. It might even notice
that physical processes which add one object at a time always result in
the same sequence of numerical descriptions: "One, two, three, four..."
If multiple experiences of such physical processes can be generalized,
and an exemplar experience of the process selected and applied, the result
might be a counting procedure like that taught to human children: Tag
an object as counted and say the word 'one'; tag another object as counted
and say 'two'; tag another object as counted and say the word that, in
the learned auditory chanting sequence, comes after 'two'; and so on.
Do not re-count any object that has already been tagged as "counted".
The last word said aloud is the number of the group. This method
is more efficient than checking unique correspondences, and the method
also reflects a deeper understanding of numbers.
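The counting procedure taught to children can be sketched directly. The sketch assumes the chant has already been learned as a fixed word sequence (`CHANT`, truncated here for brevity) and that objects can be told apart well enough to be tagged; both names and the input format are hypothetical.

```python
# The learned auditory chanting sequence (assumed; truncated at six).
CHANT = ["one", "two", "three", "four", "five", "six"]

def count_by_chanting(visits):
    """Count a group by tagging objects and reciting the chant.

    `visits` is the order in which attention lands on objects; an object may
    be visited more than once, but is only counted the first time.
    Returns the last word said aloud, or None if nothing was counted.
    Raises IndexError if the group is larger than the learned chant.
    """
    counted = set()   # objects already tagged as "counted"
    last_word = None
    for obj in visits:
        if obj in counted:
            continue  # do not re-count a tagged object
        counted.add(obj)
        last_word = CHANT[len(counted) - 1]  # the next word in the chant
    return last_word
```

For example, `count_by_chanting(["a", "b", "a", "c"])` returns `"three"`: the second visit to `"a"` is skipped because that object is already tagged.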
Finally, once "three" has been used long enough, it's likely that a
human brain evolves some type of neural substrate for seeing threeness
directly. That is, some piece of the human visual modality - probably
the object-recognition system in the temporal lobe, but that's just a wild
guess - learns to respond to groups of three objects. (Larger numbers
like "five" or "six" are harder to recognize directly - that is, without
counting - unless the objects are arranged in stereotypical five-patterns
and six-patterns, like those on the sides of dice.) The analogue
for an AI might be a piece of code (or assembly language, or a neural net
- you know, mindstuff) that counts items directly.
However, even if the AI eventually creates a highly-optimized counting
method, implemented directly, the previous definitions of the concept will
still exist. When new situations are encountered, new situations
that force the extension of the concept, the mind can switch from the optimized
method to the methods that reflect underlying causes and underlying substrate.
If necessary, the problem can rise all the way to the level of conscious
perception, so that the deliberate, thought-level methods - the thoughts
from which the concepts first arose - are used. The experiences that
underlie the original definition, the experience of noticing the definition,
the experience of using the definition - all can be reviewed. This
is why a concept is so much richer, so much more powerful, if it's learned
instead of preprogrammed. It's why learned, rich concepts are so
much more flexible, so much likelier to mutate and evolve and spin
off interesting specializations and generalizations and variations.
It's why learned concepts are more useful when a mind encounters special
cases and has to resort to high-level reasoning. It's why high-level
cognitive objects are vastly more powerful, more real, than the
flat, naked "predicate calculus" of classical AI.
Thus the idea of "information-loss" or "focus" is cast in a different
light. Sure, calling something a three-group, or placing it into
the three-category, can be said to "lose" a lot of information - in information-theoretical
terms, you've moved from specifying the distinct and individual object
to specifying a member of the class of things that can be described by
"three". In classical-AI terms, you've decided to focus on the feature
called "number" and not any of the other features of the object.
But to label a rich, complex, multi-step act of perception "information
loss" borders on perversion. Seeing the "threeness" of a group doesn't
destroy information, it adds information. One perceives everything
that was previously known about the object, and its threeness as well;
nor could that threeness be "focused" on, until the methods for perceiving
threeness were learned.
Neural networks, when perturbed, are known to seek out what might be
called "minimal-energy states". A network-relaxation model of concept
combination could be computationally realistic - an operation that neurons
can accomplish in the 200 operations-per-second timescale. My current
hypothesis for the basic neural operation in concept-combination is the
resonance.
A neural resonance circuit - perhaps not a physical, synaptic circuit,
but a virtual message-passing circuit, established by one of the higher-level
neural communication methods (binding by neural synchrony, maybe) - can
either resonate positively, reinforcing that part of the concept-combination,
or resonate negatively, generating a conflict. My guess at the network-relaxation
method resembles the "potential energy surface" of chemistry in that multiple,
superposed alternatives are tried out simultaneously, so that the minima-seeking
resembles a flowing liquid rather than a rolling ball.
The high-level, salient facets of the concepts being combined are combined
first. These high-level features then visualize the mid-level features;
if no conflict is detected, the mid-level features visualize the low-level
features. If a conflict is detected at any level, the conflict propagates
back up to the conflicting high-level or mid-level features causing the
problem. Who wins the conflict? The more salient, more important,
or more useful feature - remember, we're talking about combining two concepts,
each with its own set of features along various dimensions - is selected
as dominant, and the network relaxation algorithm proceeds. When
one concept modifies another, the "more salient" feature is the one specified
by the concept doing the modifying. (Note also that, in casual reading,
not all the facets of a concept may be important, just as you don't fully
visualize every word in a sentence. Only the facets that resonate
with the subject of discussion, with the paragraph, will be visualized.)
In the case of "triangular light bulbs", "triangular" is an adjective.
The concept for "triangle" or "triangular" is modifying the concept of
"light bulb", rather than vice versa. The default exemplar for "light
bulb" - that is, an image of the generic light bulb - is loaded into the
mental workspace, including the visual facet of the exemplar being loaded
into the visual cortex. Next, the concept for "triangular" is applied
to this mental image.
The concept of "triangular", as it refers to physical objects, has a
single facet: It alters the physical shape of the target image.
Note that I say "physical shape", not "visual shape". The default
exemplar for "light bulb" is a mental image - not a mental picture,
but a mental image; in GISAI, an "image" means a representation
in any modality or modalities, not just the visual cortex. The "light
bulb" exemplar is an image of a three-dimensional bulb-shaped object, made
of glass, having a metal plug at the bottom, whose purpose is to emit light.
It is this multimodal mental image that "triangular" modifies, not just
the visual component of the image. In particular, the "shape" facet
of the light-bulb concept, the facet being modified, is a high-level feature
describing the shape of the three-dimensional physical object, not the
shape of the visual image. Thus, modifying the light-bulb shape will
modify the mental image of the physical shape, rather than manipulating
the 2-D visual shape in the visual cortex.
The "triangular" concept, when applied along the dimension of "shape",
manipulates the mental image of the light bulb, changing the 3D model to
be triangle-shaped. However, since the image of a flat light bulb
fails to resonate, "triangle" automatically slips to "pyramid".
(I'm not sure whether this conflict is detected at the mid-level feature
of "flat light bulb", or whether a flat light bulb actually begins to visualize
before the conflict is detected. The slippage happens too fast for
me to be sure. I suspect that "triangular" has slipped to "pyramidal"
before, when applied to three-dimensional mental images; for neural entities,
anything that happens once is likely to happen again. Neurons learn,
and neural thinking wears channels in the neurons. It could be that
the non-flatness of light bulbs is salient because of their bulbous shape,
and that this resonance with non-flatness causes "triangular" to slip to
"pyramidal" before the concept is even applied.)
Pyramids are sharp. I know, from introspection, that the "sharp
pyramidal light-bulb" got all the way down to the visual level before the
conflict was noticed. (The conflict rose to the level of conscious
perception, but was resolved more or less intuitively; I didn't have to
"stop and think". So this is probably still a valid example of concept-level
processes.) The particular conflict: Sharp glass cuts the person
who holds it. We've all had visual experience of sharp glass, and
the associated need for visual recognition and avoidance; thus, the mental
image of sharp glass would trigger this recognition and create a conflict.
This conflict, once detected, was also visualized all the way down to the
visual cortex; I briefly saw the mental image of a thumb sliding along
the edge of the pyramid.
The problem of sharp edges is one that is caused by sharpness and can
be solved by rounding, and I've had visual experience of glass with rounded
edges, so the sharp edges on the mental image slipped to rounded edges.
The result was a complete mental image of a pyramidal light bulb, having
four triangular sides, rounded edges and corners, and a square bottom with
a plug in it. (36)
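The sequence of slippages in the last few paragraphs can be caricatured in a few lines. This is a toy, symbolic stand-in for what the text hypothesizes is a neural network-relaxation process: the facets, the conflict rules, and the slippage table are all hand-written assumptions, whereas in the account above they would be learned from experience.

```python
def conflicts(image):
    """Return the facets of the mental image that currently fail to resonate."""
    found = []
    if image.get("dimensions") == 3 and image.get("shape") == "triangle":
        found.append("shape")   # a flat light bulb fails to resonate
    if image.get("material") == "glass" and image.get("edges") == "sharp":
        found.append("edges")   # sharp glass cuts the hand that holds it
    return found

# Learned slippages (assumed): what a conflicting facet value slips to.
SLIPPAGES = {("shape", "triangle"): "pyramid", ("edges", "sharp"): "rounded"}
# Default consequences of a facet value (assumed): pyramids are sharp.
IMPLIED = {"pyramid": {"edges": "sharp"}}

def apply_modifier(image, facet, value, max_steps=10):
    """Apply a modifying concept to an exemplar, slipping until no conflict remains."""
    image = dict(image)                      # leave the stored exemplar untouched
    image[facet] = value
    image.update(IMPLIED.get(value, {}))
    for _ in range(max_steps):
        bad = conflicts(image)
        if not bad:
            return image
        slipped = SLIPPAGES[(bad[0], image[bad[0]])]
        image[bad[0]] = slipped
        image.update(IMPLIED.get(slipped, {}))
    raise RuntimeError("relaxation did not settle")

LIGHT_BULB = {"shape": "bulb", "dimensions": 3, "material": "glass",
              "edges": "rounded", "purpose": "emit light"}
```

Calling `apply_modifier(LIGHT_BULB, "shape", "triangle")` slips "triangle" to "pyramid", picks up sharp edges as a consequence, detects the sharp-glass conflict, and settles on a pyramidal bulb with rounded edges. Unlike the superposed, liquid-like relaxation hypothesized in the text, this sketch resolves one conflict at a time.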
Every sentence in the last five paragraphs, of course, is just begging
the question: "Why? Why? Why?" A full answer is
really beyond the scope of the section on "Mind"; I just want to remind
my readers that often the real answer is "Because it happened that way
at least once before in your lifetime." A human mind is not necessarily
capable of simultaneously inventing all the reflexes, salient pathways,
and slippages necessary to visualize a triangular lightbulb. Neurons
learn, and thoughts wear channels in the network. The first
time I ever had to select which level triangle-imposition should apply
to - visual, spatial, or physical - I may have made a comical mistake.
A seed AI may be able to avoid or shorten this period of infancy by using
deliberate, thought-level reasoning about how concepts should combine;
if so, this is functionality over and above that exhibited by humans.
You'll note that, throughout the entire discussion of concept combination,
I've been talking about humans and even making appeals to specific properties
of neurally based mindstuff, without talking about the problem of implementation
in AIs. Most of the time, the associational, similarity-based architecture
of biological neural structures is a terrible inconvenience. Human
evolution always works with neural structures - no other type of computational
substrate is available - but some computational tasks are so ill-suited
to the architecture that one must jump through incredible hoops to encode them
neurally. (This is why I tend to be instinctively suspicious of someone
who says, "Let's solve this problem with a neural net!" When the
human mind comes up with a solution, it tends to phrase it as code, not
a neural network. "If you really understood the problem," I think
to myself, "you wouldn't be using neural nets.")
Concept combination is one of the few places where neurons really shine.
It's one of the very rare occasions when the associational, similarity-based,
channel-wearing architecture of biological neural structures is so appropriate
that a programmer might reinvent naked neurons, with no features added
or removed, as the correct computational elements for solving the problem.
Neural structures are just very well-suited to "reductionist energy minimization"
or "holistic network relaxation" or whatever you want to call it.
Even so, neural networks are very hard to understand, or debug, or sensibly
modify. I believe in the ideal of mindstuff that both human programmers
and the AI can understand and manipulate. To expect direct human
readability may be a little too much; that goal, if taken literally, tends
to promote fragile, crystalline, simplistic code, like that of a classical
AI. Still, even if concept-level mindstuff doesn't have the direct
semantics of code, we can expect better than the naked incomprehensibility
of assembly language. We can expect the programmer to be able to
see and manipulate what's going on, at least in general terms, perhaps
with the aid of some type of "decompiler". I currently tend to lean
towards code for the final mindstuff, while acknowledging that this code
may tend to organize itself in neural-like patterns which will require
additional tools to decode.
Thoughts are created by structures of concept-level patterns.
The archetypal example is a grammatical sentence: a linear sequence of
words parsed by the brain's linguistic centers into a more-or-less hierarchical
structure, in which the referents of targetable words and phrases (an adjective
needs a target image, for example) have been found, either inside the sentence
or in the most salient part of the current mental image. The inverse
of this process is when a fact is noticed, turned into a concept structure,
translated into a sentence, and articulated out loud within the mind.
(A possible reason for the stream-of-consciousness phenomenon is discussed
in 2.4.3: Thoughts about thoughts.)
The current section has discussed concepts as mindstuff-based patterns
in sensory modalities - that is, the mindstuff is assumed to pay attention
to, or issue instructions to, the sensory modalities and the features therein.
That concepts interact with other concepts, and are influenced by the higher-level
context in which they are invoked, has been largely ignored. This
was deliberate. The farther you go from the mindstuff level, and
the more "abstract" you get, the closer you are to the levels that are
easily accessible to human introspection. These are the introspective
perceptions that come out in words; the qualities that modern culture associates
with above-average intelligence; the levels enormously overemphasized by
classical AI.
Still, there are some thoughts that are so abstract as to appear distant
from any sensory grounding. In that last sentence, for example, only
the term "distant" has an obvious grounding, and since the sentence wasn't
interpreted in a spatial context, it's unlikely that even that term had
any direct visualizational effect. Metaphors do show up more often
than you might think, even in abstract thought (see °Lakoff and Johnson,
Metaphors We Live By or Philosophy in the Flesh).
Still, there are concepts whose definition and
grounding is primarily their effect on other concepts - "abstract concepts".
Why doesn't the classical-AI method work for abstract concepts?
Even abstract concepts, mental images composed entirely of concepts
referring to other concepts, exist within a °reductholistic
system. Abstract concepts may not have reductionist definitions that
ground directly in sensory experience, but they have reductionist definitions
that ground in other concepts. What are apparently high-level object-to-object
interactions between two abstract concepts can, if conflicts appear, be
modeled as mid-level structure-to-structure interactions between two definitions.
Abstract concepts still have lower-level structure, mid-level interactions,
and higher-level context.
Still, defining concepts in terms of other concepts is what classical
AIs do. I can't actually recall, offhand, any (failed!) classical
AIs with explicit holistic structure - I can't recall any classical AIs
that constructed explicitly multilevel models to ground reasoning using
semantic networks - but it seems likely that someone would have tried it
at some point. (Eurisko and Copycat don't count for reasons that
will be discussed in future sections. Besides, they didn't fail.)
So, why doesn't the classical method work for abstract concepts?
Many classical AIs lack even basic quantitative interactions (such as
fuzzy logic), rendering them incapable of using methods such as holistic
network relaxation, and lending all interactions an even more crystalline
feeling. Still, there are classical AIs that use fuzzy logic.
What's missing is flexibility, mutability, and above all richness;
what's missing is the complexity that comes from learning a concept.
Perhaps it would be theoretically possible to select a piece of abstract
reasoning in an adult AI in which the complexity of sensory modalities
played no part at all. Perhaps it would even be possible to remove
all the grounding concepts below a certain level, and most of the modality-level
complexity, without destroying the causal process of the reasoning.
Even so - even if the mind were deprived of its ultimate grounding and
left floating - the result wouldn't be a classical AI. Abstract concepts
are learned, are grown in a world that's almost as rich as a sensory
modality - because the grounding definitions are composed of slightly less
abstract concepts with rich interactions, and those less-abstract concepts
are rich because they grew up in a rich world composed of interactions
between even-less-abstract concepts, and so on, until you reach the level
of sensory modalities. Richness isn't automatic. Once a concept
is created, you have to play around with it for a while before it's rich
enough to support another layer. You can't start from the top and
build down.
Another factor that's missing from classical AIs is the ability to attach
experience to concepts, to gain experience in thinking, to wear
a channel in the mind. Even a concept-combination like "triangular
light bulb" has a dynamic pattern, a flow of cause and effect on the concept
level, that relies on the thinker having done most of the thinking in advance.
That complexity is also absent from classical AIs. (And of course,
most classical AIs just don't support all the other dimensions of cognition
- attention, focus, causality, goals, subjunctivity, et cetera.)
I think this provides an adequate explanation of why classical AI failed.
This is why classical AIs can't support thought-level reasoning or a stream
of consciousness; why sensory modalities are necessary to learn abstract
thought; and why concepts must be learned in order to be rich enough
to support coherent thought.
Rational reasoning is very large,
and very complicated. In trying to duplicate the functionality of
a line of rational reasoning, it's easy to bite off too much, and despair
- or worse, oversimplify. The remedy is an understanding of precedence,
a sequence that tells you when you're getting ahead of yourself and building
the roof before you've laid the foundations; °heuristics
that tell you when to slow down and build the tools to build the tools.
Before you can create a thing, there must be the potential for that thing
to exist, and sometimes you have to recurse on creating the potential.
Drew McDermott, in the classic article "Artificial Intelligence Meets
Natural Stupidity", pointed out that the first task, in AI, is to get the
AI to notice its subject. Not "understand". Notice.
If a classical AI has a LISP token named "hamburger", that doesn't mean
the token is a symbol, or that there's any hamburgerness about it.
For an AI to notice something, its internal behavior must change because
of what is noticed. A LISP token named "hamburger" has no attached
hamburgerness. A philosopher of classical AI would say that the LISP
token has semantics because it refers to hamburgers in external reality,
but the AI has no way of noticing this alleged reference. The "reference"
does not influence the AI's behavior - neither external behavior, nor the
internal flow of program causality.
I've extended McDermott's heuristic to describe a sequence called RNUI,
which stands for Represent, Notice, Understand, and Invent. Represent
comes before Notice; before you can write feature-detectors in a
modality, you need data structures (or non-°crystalline equivalents
thereof) for the data being examined and the features being perceived.
Understand comes before Invent; before an AI can design a
good bicycle, it needs to be able to tell good bicycles from bad bicycles
- perceive the structure of goals and subgoals, understand a human designer's
explanation of why a bicycle was designed a particular way, be capable
of Representing the explanation and Noticing the difference
between explanations and random babbling. Only then can the AI independently
invent a bicycle and explain it to someone else.
Represent is when the skeleton of a cognitive structure, or the
input and output of a function, or a flat description of a real thought,
can be represented within the AI. Represent is about static
data, what remains after dynamic aspects and behaviors have been subtracted.
Represent can't tell the difference between data constituting a thought, and data
that was provided by a random-number generator.
Notice provides the behaviors that enforce internal relations
and internal coherence. Notice adds the dynamic aspect to
the data. Applied to the modality-level, Notice describes
the feature-extractors that annotate the data with simple facts about relations,
simple bits of causal links, obvious similarities, temporal progressions,
small predictions and expectations, and other features created by the "laws
of physics" of that domain. The converse of modality-level Notice
perception is Notice manipulation, the availability of choices and
actions that manipulate the cognitive representations in direct ways.
The RNUI sequence also applies to higher levels, and to the AI as a whole;
it's possible to be capable of Representing and Noticing
threeness without Understanding it, or being able to do anything useful with it.
Understand is about °intentionality and external
relations. Understand is about coherence with respect to other cognitive
structures, and coherence with respect to both upper context and underlying
substance (the upper and lower levels of the °reductholistic
representation). Understanding means knowledge and behaviors
that reflect the goal-oriented aspects of a cognitive structure, and the
purpose of a design feature. Understanding reflects the use
of heuristics that can bind high-level characteristics to low-level characteristics.
Understanding means being able to distinguish a good design from
a bad one. Understanding is the ability to fully represent
the cognitive structures that would be created in the course of designing
a bicycle or inventing an explanation, and to verify that these cognitive
structures represent a good design or a good explanation.
Invent is the ability to design a bicycle, to invent a heuristic,
to analyze a phenomenon, to create a plan for a chess game - in short,
to think.
If you have trouble getting an AI to design a bicycle, ask yourself:
"Could this AI understand a design for a bicycle if it had one? Could
it tell a good design from a bad design?" If you have trouble getting
an AI to understand the design for a bicycle, ask yourself: "Can
this AI notice the pieces of a bicycle? Could it tell the difference
between a bicycle and random static?" If you have trouble getting
an AI to notice the pieces, ask yourself: "Can this AI represent
the pieces of the bicycle? Can it represent what is being noticed
about them?"
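The troubleshooting questions above amount to walking the RNUI sequence from the bottom and stopping at the first stage the AI cannot yet perform. A minimal sketch, assuming each capability has somehow been assessed as a yes-or-no answer (in practice that assessment is the hard part, and the names below are hypothetical):

```python
# The RNUI sequence, in order of precedence.
STAGES = ["represent", "notice", "understand", "invent"]

def first_missing_stage(capabilities):
    """Return the earliest RNUI stage that is not yet supported, or None.

    `capabilities` maps stage names to booleans; a missing key counts as False.
    """
    for stage in STAGES:
        if not capabilities.get(stage, False):
            return stage
    return None
```

For instance, `first_missing_stage({"represent": True, "notice": False})` returns `"notice"`: there is no point working on Understand or Invent until the pieces of the bicycle can be noticed.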
2.3.2: Abstraction is information-loss; abstraction is not information-loss
2.3.3: The concept of "three"

2.3.4: Concept combination and application
"When you hear the phrase "triangular light bulb", you visualize
a triangular light bulb... How do these two symbols combine?
You know that light bulbs are fragile; you have a built-in comprehension
of real-world physics - sometimes called "naive" physics - that enables
you to understand fragility. You understand that the bulb and the
filament are made of different materials; you can somehow attribute non-visual
properties to pieces of the three-dimensional shape hanging in your visual
cortex. If you try to design a triangular light bulb, you'll design
a fluorescent triangular loop, or a pyramid-shaped incandescent bulb; in
either case, unlike the default visualization of "triangle", the result
will not have sharp edges. You know that sharp edges, on glass, will
cut the hand that holds it."
How do the concepts of "triangular" and "light-bulb" combine? My
current hypothesis involves what might be called "°reductionist
energy minimization" or "°holistic network relaxation", a
conflict-resolution method that takes cues from both the "potential energy
surface" of chemistry and the "computational temperature" of Copycat.
-- 1.2: Thinking About AI
2.3.5: Thoughts are created by concept structures
Interlude: Represent, Notice, Understand, Invent
2.4: Thoughts
| NOTE: | This section is about what thoughts do. For an explanation of what thoughts are - how they work, where they come from, and so on - see the previous sections. |
Before the AI can act, it needs to learn. "Learning" can be divided into knowledge-formation and skill-formation. Skill formation happens when mindstuff, reflexes, or other unconscious processes are modified. In humans, the modification is autonomic; in seed AIs, it can be either autonomic or deliberate; but skills are always executed autonomically. (Note that "skill", as used here, includes not only motor reflexes but cognitive reflexes, and that "skill" does not include conscious skills like knowing (in theory!) how to disassemble a motorcycle.) The usual term for the dichotomy between skill and knowledge is "procedural vs. declarative", although this involves an assumption about the underlying representation that isn't necessarily true. In general, "knowledge" is the world-model, the contents of the mind, and "skill" is the stuff the mind is made of. Because skills tend to be located at the concept-level or modality-level, this section focuses on knowledge.
The world-model is holistic or reductionist, depending on whether you're looking up or looking down. We live in a Universe where complex objects are built from simpler structures, and stochastic regularities in the interactions between simple elements become complex elements that can develop their own interactions.
Thus, broadly speaking, there are at least three kinds of knowledge problems. You can look for a regularity in the way an object interacts with another object. You can take an object, an event, or an interaction, and try to analyze it; explain how the visible complexity is embodied in the constituent elements and their interactions. Or you can take elements and interactions that you already know something about, and try to understand the high-level behavior of the system. Starting from what you know, you can look sideways, down, or up.
Actually, this is speaking too broadly. Where, for example, do you fit "taking an object that you know something about, and suddenly understanding its purpose within a higher system"? I suppose you could explain this as a variant of analysis - when the "Aha!" is done, the result is a better understanding of a system in terms of its constituents. But then there are other knowledge problems, like guessing the properties of an element by taking the intentional stance towards the system and assuming the object is well-designed for its purpose. Where does that fit in? The moral, I suppose, is that "reductholism" has its uses as a paradigm, but there are limits.
Maybe we should generalize to generic causal models, regardless of level? Then you could divide activities into noticing a property or interaction, deducing the cause of a property or interaction, or projecting from known causes to the expected results. This model is a little more useful, since it sounds like the three problem types may correspond to three problem-solving methods: (A) Examine the model for unexpected regularities, correspondences, covariances, and so on. (B) Generate and test possible models to explain an effect. (C) Use existing knowledge to fill in the blanks (and, if you're a scientific mind, test the predictions thus created).
Still, even that view has its limitations. For example, asking Why? or looking for an explanation isn't strictly a matter of generate-and-test. In fact, generate-and-test is simply a genteel, thought-level version of that old bugaboo of AI, the search algorithm. It seems likely that some type of "genteel search algorithm" - not "blind", but not really deliberate either, and with a definite random component - is responsible for sudden insights and intuitive leaps and a lot of the go-juice of intelligence on the concept level. On the thought level, however, it's often more efficient to take a step back and think about the problem. One implementation for thinking about the problem is "abstraction is information-loss" classical-AI-type "abstract thought", running the problem through with Unknown Variables substituted in for everything you don't know, to see if there are places where the Unknowns cancel out to yield partial results that would hold true of every possible solution, thus constraining the search space. A more accurate implementation would be "applying heuristics that operate on the general information you have, to build up general information about the answer".
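To make the contrast concrete, here is a minimal sketch in Python. The toy puzzle, the function names, and the parity heuristic are all assumptions made for illustration; nothing here is a proposed implementation. It compares plain generate-and-test against a search that first applies heuristics to the general information available about the answer.

```python
from itertools import product

TARGET_SUM = 100
TARGET_PRODUCT = 2491  # 47 * 53, chosen only for illustration


def is_solution(x, y):
    """The 'test' half of generate-and-test."""
    return x + y == TARGET_SUM and x * y == TARGET_PRODUCT


def blind_search(limit=100):
    """Try every (x, y) pair in order; count how many candidates were tested."""
    tried = 0
    for x, y in product(range(limit), repeat=2):
        tried += 1
        if is_solution(x, y):
            return (x, y), tried
    return None, tried


def constrained_search(limit=100):
    """Reason about the unknowns first, then search only what is left.

    General information about any possible answer:
      * the product is odd, so both unknowns must be odd;
      * the sum is fixed, so y is determined once x is chosen.
    """
    tried = 0
    for x in range(1, limit, 2):      # odd values of x only
        y = TARGET_SUM - x            # y is no longer a free variable
        tried += 1
        if is_solution(x, y):
            return (x, y), tried
    return None, tried


blind_answer, blind_tried = blind_search()
fast_answer, fast_tried = constrained_search()
print(blind_answer, blind_tried)
print(fast_answer, fast_tried)
```

Both searches find the same answer; the second tests far fewer candidates because the partial results hold for every possible solution and therefore shrink the search space before any candidate is generated.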
The thought-level is a genuine layer of the mind. There isn't any simple way to characterize it. There's a complex way to characterize it, which would consist of watching people solve problems while thinking out loud ("protocol analysis"), then figuring out a set of generalizations that corresponded to underlying neurology or underlying functional modules of the problem-solving method, and which categorized all the individual thoughts in the experimental observations. This problem is large, but finite; the set of underlying abilities and mental actions is limited. Still, such a project is beyond the scope of this particular section. (What I will attempt to do, in later topics, is describe enough of the underlying abilities - enough that implementing them would give rise to sustainable thought. Remember, seed AI isn't about perfectly describing the complete functionality of humans, it's about building minds with sufficient functionality to work.)
The thought-level is a genuine layer of the mind, and has around the same amount of internal complexity as might be associated with the modality-level or the concept-level. The difference is that thoughts are open to introspection, and thus, when I make sweeping generalizations, my readers can catch me at it. Nonetheless, I hope that the generalizations that have been offered here are sufficient to convey a vague general image of what goes on in a mind searching for knowledge. Noticing interesting coincidences and covariances and similarities (looking sideways), building and testing and thinking about the reason why something happens (analysis, looking down in the holistic model, looking backwards in the causal model), trying to fill in the blanks from the knowledge you already have (prediction, looking up in the holistic model, looking forwards in the causal model). The goal is a holistic model with good high-level/low-level bindings, or a causal model where the consequences and preconditions of a perturbation are well-understood, or a goal-and-subgoal model with plans and convergences and intentionality. The goal is a model that holds together, on all levels, when you think about changing it; a model rich enough to support what we think of as intelligent thought.
It is literally impossible to draw a sharp line between understanding and creativity. Sometimes the solution to a difficult knowledge question must be invented, almost ab initio. Sometimes the creation of a new entity is not a matter of searching through possibilities but of seeing the one possibility by looking deeper into the information that you already have. But, usually, when building the world-model, you're trying to find a single, unique solution; the answer to the question. When trying to design something new, you're looking for any answer to the question. Understanding is more strongly constrained, but this actually makes the problem easier, since a solution exists and the problem is finding it... the constraints might rather be called clues.
In invention, each constraint eliminates options and makes it less likely that a solution exists. The distinction between understanding and invention is something like the difference between P and NP, between verifying a solution and finding it. Returning to the quadrivium of Sensory, Predictive, Decisive, and Manipulative binding, and to Manipulation's sub-trinity of qualitative, quantitative, and structural bindings, then invention, or high-level manipulation, adds a fourth binding, the holic binding. It's the ability to take a desired high-level characteristic and specify the low-level structure that creates it. It's the ability to engage in hierarchical design, to start from the goal of rapid travel and move to a complete physical design for a bicycle.
The methods of invention are even less clear-cut than the methods of understanding. Unless the problem is one of qualitative manipulation (choice from among a limited number of alternatives), the design space is essentially infinite. An intelligent mind reduces the effective search space through possession of a holistic model that ultimately grounds in heuristics capable of direct backwards manipulation. In other words, if you can choose any real number to specify the width of the wheel, what's needed is a heuristic that binds it - reversibly - to a higher-level design feature, such as desired stability on turns. If desired stability on turns is itself a design variable, a heuristic is needed that binds it to a known quantity, such as the weight range of the rider. And so on.
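A minimal sketch of such a chain of bindings, in Python, might look like the following. The `Binding` class, the function names, and every coefficient are hypothetical, invented purely to show the shape of the reasoning; they say nothing about real bicycles.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Binding:
    """A reversible link between a low-level variable and a higher-level feature."""
    forward: Callable[[float], float]   # low-level value -> high-level feature
    backward: Callable[[float], float]  # desired feature -> low-level value


# Hypothetical heuristic with made-up coefficients:
# wheel width (cm) <-> stability on turns (arbitrary units).
width_to_stability = Binding(
    forward=lambda width_cm: width_cm / 2.0,
    backward=lambda stability: stability * 2.0,
)


def stability_for_rider(weight_kg):
    """Hypothetical heuristic: bind desired stability to a known quantity."""
    return 0.5 + weight_kg / 100.0


def design_wheel_width(rider_weight_kg):
    """Walk down the chain: known quantity -> high-level feature -> low-level value."""
    stability = stability_for_rider(rider_weight_kg)
    return width_to_stability.backward(stability)


width = design_wheel_width(80.0)
print(width)
```

The point of the sketch is that no search over real-valued widths ever takes place: each free variable is pinned by running a heuristic backwards from the level above it.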
Such reasoning acts to reduce the search space from the space of all possible low-level specifications of a design, to the space of cognitive objects constituting reasonable high-level designs. If there are enough heuristics left to constrain the design further, or to specify design features from high-level goals, then the task can be completed without special inspiration. If there's a gap, a high-level feature with no heuristics that directly determine how it might be implemented, then there sometimes comes that special event known as an "insight", an intuitive leap.
Sometimes you try to invent the bicycle without knowing about the wheel. The crucial insight may consist of remembering logs rolling down a hill. It may consist of just suddenly seeing the answer. Or it may lie in finding the right heuristic to attack the problem. The key point is that a wide search space is crossed to find the single right answer, apparently without any guide or heuristic that simplifies the problem. (If the aha! is finding the right heuristic, then the act of creativity lies in crossing the search space of possible heuristics.)
What is creativity? Creativity is the name we assign to the mental shock that occurs when a large and novel load of high-quality mental material is delivered to our perceptions. I would say that it's the perception of "unexpected" material, meaning "unexpected" not in the sense that the delivery comes as a surprise, but in the sense that our mental model can't predict the specific content of the material being delivered. We perceive a thought as "creative", in ourselves or others, on one of two occasions: First, on seeing someone thinking outside the box; second, on perceiving a single good solution selected from a nearly infinite search space. In the first case, a concept is redefined, or what was thought to be a constraint is broken; the answer is unexpected, which creates - to the viewer - the mental shock that we name "creativity". The second case consists of seeing the very large gap between "high-speed travel" and "bicycle" crossed; the viewer - unless ve verself has designed a bicycle - has no single heuristic that can cross a gap of that size, that can anticipate the content of the material presented. There's a nearly infinite space of possible paintings, so when we see any single painting of reasonable quality, a large quantity of unexpected cognitive material is delivered to our eyes and we call it "creativity".
It seems likely to me that the experience of creative insight happens when the mind decides to brute-force, or rather intelligent-force, the search problem. The aha! of wheels comes because, somewhere in the back of your mind, possible memories were tested at random for applicability to the problem until the memory of logs rolling down a hill resonated with the problem and rose to conscious attention. This unconscious "blind" search may employ some of the tricks of deliberation, such as searching through memories of objects that were seen traveling very fast. (Or not. It seems likely to me that only deliberate thought produces that kind of constraint.) Even so, it remains in essence a try-at-random algorithm. If there's anything more to subconscious creative insights than that, I don't know what it is.
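As a rough illustration of that try-at-random picture, here is a sketch in Python. The memory contents, the feature sets, and the overlap-counting "resonance" measure are assumptions made for the example, not claims about how human memory or any planned AI component works.

```python
import random

# Hypothetical memories, each described by a set of made-up features.
MEMORIES = {
    "logs rolling down a hill": {"round", "rolls", "moves fast"},
    "campfire": {"hot", "bright"},
    "river": {"flows", "moves fast"},
}

PROBLEM = {"rolls", "moves fast", "carries load"}


def resonance(memory_features, problem_features):
    """Crude stand-in for 'resonating with the problem': count shared features."""
    return len(memory_features & problem_features)


def insight_search(memories, problem_features, threshold=2, max_tries=1000, seed=0):
    """Test memories at random until one resonates strongly enough.

    Returns (memory_name, attempts), or (None, max_tries) if nothing resonates.
    """
    rng = random.Random(seed)      # seeded so the sketch is repeatable
    names = list(memories)
    for attempt in range(1, max_tries + 1):
        name = rng.choice(names)
        if resonance(memories[name], problem_features) >= threshold:
            return name, attempt   # this is the memory that "rises to attention"
    return None, max_tries


found, attempts = insight_search(MEMORIES, PROBLEM)
print(found, attempts)
```

A deliberate version would filter `names` before sampling (for example, keeping only memories of fast-moving objects); the sketch leaves the sampling unconstrained to match the "blind" reading.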
Since thoughts are reasonably accessible to
the human mind, there's a good deal of existing research on how they work.
The specific methods are important, but what's more important is getting
a working system of thoughts, enough methods that work well enough
that the AI can continue further.
Most important to the system of thoughts is introspection.
Introspection is the glue that holds the thought-level together.
Coherent thoughts don't happen at random. They happen because we
know how to think, and because we have the right reflexes for thinking.
The problem of what to think next is itself a problem domain. To
prevent an infinite-recursion error, our solution to this problem on the
moment-to-moment level is dictated entirely by reflex, the channels worn
into our neural minds. Even when we deliberately stop and say to
ourselves, "Now, what topic should I think about next?", the thinking about
thinking proceeds by reflex. These reflexes are formed during infancy,
and before they exist, coherent thought doesn't happen. To get past
that barrier you'd have to be a seed AI, capable of watching a replay of
your own source code in action, or halting and storing the current state
of high-level thought to recurse on examining the stuff the thought is
made of.
The self is a domain fully as complex as any in external reality.
It consists not just of perceiving the self but of manipulating the self.
The experience you remember of introspection consists of the occasions
when the problems became large enough to require conscious thought.
Beneath that remembered, introspection-accessible experience lie perceptions
and reflexes that have become so invisible we don't even notice them.
The intuitions of introspection are far more basic to thought than
Hamlet's soliloquy. The problem of introspection should be approached
with the same respect, and the same attention to the RNUI method, that
would be given to the problem of designing a bicycle.
Introspection requires introspective senses, perhaps even an introspective
modality. But the idea of an introspective modality is a subtle and
perhaps useless one. The obvious implementation is to have an introspective
modality that reports on all the cognitive elements inside the AI, but
what does this add? The AI has already noticed that the cognitive
elements are there. How does "the introspective modality" differ
from "a useless and static additional copy of all the information inside
the AI"? What can you do with the detected feature of "the feature
of redness" that you can't do with the feature of redness itself?
To answer this question, it is necessary to step back and consider the
problem in context. Sensory modalities don't exist in a vacuum.
They are useful because concepts lie on top. The question, then,
is not how to build an introspective sensory modality, but how to ensure
that concepts about introspection can form. This may involve creating
a new introspective modality, or it may involve attaching a new dimension
to the old modalities and to the other modules of cognition.
Concepts manipulate their referents, as well as extracting information
from them. How would you go about tweaking the visual modality so
that you could imagine "thinking about redness"? How do you get the
AI to notice, declaratively, that a concept has been activated, and how
is this perception reversed to give rise to visualizing the consequences
of activating a concept?
This design problem may go a bit towards explaining that peculiar phenomenon
called "stream of consciousness". You notice a fact, the fact gets
turned into a conceptual structure, the conceptual structure gets turned
into a sentence by your language centers, and then you speak the sentence
"out loud" within your mind. The fascinating thing is this:
If you try to skip the step of "speaking the sentence out loud" within
your mind,
even after you know exactly what the words will be, you
can't go on thinking. Why? What new information is added by
this act?
One possible explanation is that the human mind notices concepts by
noticing the auditory cortex. Humans have no built-in introspective
modality, so concepts become "visible" to our mental reflexes when they
add recognizable content - words - to the auditory cortex. This closes
the loop. Concept activation becomes detectable, and we can form
concepts about concepts. I don't think this is the entire explanation,
but it's a good start.
What about thoughts? On the thought-level, human introspection
is fairly primitive. There's this tendency to lump everything together
under the term "I". When we attribute causality, we say "I remembered"
instead of "the long-term memory-retrieval subsystem reports..."
Perhaps this is because, historically speaking, we didn't know anything
about what was inside the mind until yesterday afternoon. Perhaps
it's because fine-grained introspection doesn't contribute useful complexity
to self-modeling unless you're, oh, writing a paper on AI or something.
There are plenty of useful heuristics about the self that can be learned
by looking at cause and effect, even when all the causal chains start at
a monolithic self-object. A seed AI may have uses for more fine-grained
self-models, but with both design and source code freely accessible, it
shouldn't be too hard for such a self-model to develop.
When can an AI legitimately use the word "I"?
Understand that we are asking about a very limited and purely technical
aspect of self-awareness. We are not talking about the kind of self-awareness
that will cause an ethical system to treat you as a person. We are
not talking about "qualia", the hard problem of conscious experience, what
it means to be a bat, or anything of that sort. These are different
puzzles.
The question being asked is: When can an AI legitimately use the
word "I" in a sentence, such as "I want ice cream", without Drew McDermott
popping up and accusing us of using a word that might as well be translated
as "shmeerp" or G0025?
Consider the SPDM distinction: Sensory, Predictive, Decisive,
Manipulative. A binding between a model and reality starts
when the model "maps" in some way to reality (although this is ultimately
arbitrary), becomes testable when the model can predict experiences, and
becomes useful when the model can be used to decide between alternatives,
with the acid test being manipulation of reality in quantitative or structural
ways. Consider also the distinction between modality-level, concept-level,
and thought-level.
Self-modeling begins when the AI - let's call it Aisa, for "AI, self-aware"
- starts to notice information about itself. Introspective sensations
of sensations are hard to distinguish from the sensations themselves, so
this ball doesn't really get rolling until Aisa forms introspective concepts.
The self-model doesn't begin to generate novel information, information
that can impose a coherent view of internal events, until it can make predictions
- for example: "Skipping from topic to topic, instead of spending
a lot of time on one topic, will result in conceptual structures that are
connected primarily through association." Likewise, this information
doesn't become useful until it plays a part in goal-oriented decisions
- a decisive binding.
When Aisa can create introspective concepts and formulate thought-level
heuristics about the self, it will be able to reason about itself in the
same fashion that it reasons about anything else. Aisa will be able
to manipulate internal reality in the same way that it manipulates external
reality. If Aisa is impressively good at understanding and manipulating
motorcycles, it might be equally impressive when it comes to understanding
and manipulating Aisa.
But to say that "Aisa understands Aisa" is not the same as saying "Aisa
understands itself". Douglas Lenat once said of Cyc that it knows
that there is such a thing as Cyc, and it knows that Cyc is a computer,
but it doesn't know that it is Cyc. That is the key
distinction. A thought-level SPDM binding for the self-model is more
than enough to let Aisa legitimately say "Aisa wants ice cream" - to make
use of the term "Aisa" materially different from use of the term "shmeerp"
or "G0025". There's still one more step required before
Aisa can say: "I want ice cream." But what?
Interestingly, assuming the problem is real is enough to solve the problem.
If another step is required before Aisa can say "I want ice cream", then
there must be a material difference between saying "Aisa wants ice cream"
and "I want ice cream". So that's the answer: You can say "I"
when the behavior generated by modeling yourself is materially different
- because of the self-reference - from the behavior that would be generated
by modeling another AI that happened to look like yourself.
This will never happen with any individual thought - not in humans,
not in AIs - but iterated versions of Aisa-referential thoughts may begin
to exhibit materially different behavior. Any individual thought
will always be a case of A modifying B, but if B then goes on to modify
A, the system-as-a-whole may exhibit behavior that is fundamentally characteristic
of self-awareness. And then Aisa can legitimately say of verself:
"I want an ice-cream cone."
Humans also throw a few extras into the pot. We have observer-biased
social beliefs, a whole view of the world that's skewed toward the mind
at the center, which tends to anchor the perception of the self.
We attribute internal causality to a monolithic object called the "self",
which generates a lot of perceived self-reference because you don't
notice the difference between the thought doing the modifying and the cognitive
object being modified - the source of the thought is the "self", and the
item being modified is part of the "self".
A seed AI will probably be better off without these features.
I mention them because they constitute much of what a human means by "self".
Time
in a digital computer is °discrete and has a single °space
of simultaneity, so anyone who's ever played °Conway's Game
of Life knows everything they need to know about the True Ultimate
Nature of time in the AI. With each tick of the clock, each frame
is derived from the preceding frame by the "laws of physics" of that °ontology.
(Higher-level regularities in the sequence of frames form what we call
causality; more about this in Unimplemented section: Causality.)
A general intelligence needs to be able to perceive
and visualize when two events occur at the same time; when one event precedes
or follows another event; when two sequences of events are identical or
opposite-symmetrical; and when two intervals are equal, lesser, or greater.
Most of this comes under the general heading of having a feel for time
as a quantity and time as a trajectory, which requires both concept-level
and modality-level support.
To support temporal metaphors and temporal concepts - to provide an
°API with sufficient complexity for the °mindstuff
to hook into - the AI needs modality-level support. The most obvious
method would be to tag all events with a 64-bit number indicating the nanoseconds
since 1970 - a plain good-old-fashioned system clock. The problem
is that then the AI can't think about anything that happened before 1970.
Or about °picoseconds.
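The limits can be made explicit with a short Python sketch. It assumes an unsigned 64-bit counter of nanoseconds starting at 1970; the names `span_in_years` and `to_clock_ticks` are hypothetical helpers, not part of any real clock API.

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600
MAX_TICKS = 2**64 - 1  # largest value of an unsigned 64-bit counter (assumption)


def span_in_years():
    """How many years the counter can cover before it runs out of values."""
    return MAX_TICKS / NS_PER_SECOND / SECONDS_PER_YEAR


def to_clock_ticks(seconds_since_1970):
    """Convert a time to clock ticks, or fail if the clock cannot express it."""
    ticks = round(seconds_since_1970 * NS_PER_SECOND)
    if not 0 <= ticks <= MAX_TICKS:
        raise ValueError("time not representable on this clock")
    return ticks


print(span_in_years())          # a few hundred years, all of them after 1970
print(to_clock_ticks(1e-12))    # one picosecond after the epoch
print(to_clock_ticks(2e-12))    # two picoseconds after the epoch: same tick

try:
    to_clock_ticks(-1.0)        # one second before 1970
except ValueError as error:
    print(error)
```

Two events a picosecond apart land on the same tick, and anything before the epoch has no tick at all, which is the problem described above.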
If we humans have a built-in system clock - there are several candidates,
ranging from the heartbeat to a 40-°hertz electrical pulse
in the brain - we don't have conscious, abstract access to it. What
we remember is the relative times; that event A came before event
B, that event C was between A and B, that a lot of stuff happened between
A and B, that D seemed to take a long time, that E seemed to go by very
quickly, that E and F happened at the same time, and so on. If I
know that a particular event happened at 4:58 PM on July 23rd 2000, it's
because I looked at my watch and associated the visual or auditory label
"4:58" with the event. That's why I can think - at least abstractly
- about the age of the Universe or picosecond time frames. Our abstract
concepts for quantitative time aren't really built on our internal modality-level
clocks, but on the external clocks we built. Or rather, the internal
modality-level clocks are used for immediate perceptions only, and the
abstract concepts reach the modality level through a layer of abstraction
that can handle millennia as easily as minutes.
Because it's very easy to derive all the relative perceptions of time
by comparing absolute quantitative times, we'll almost certainly wind up
tagging every event with a 64-bit system-clock time (or equivalent interpreter
token), and building any other modality functions on top of that.
It's just important to remember that the really important concepts
about time should not be founded directly on the underlying, absolute numbers,
because then the AI really can't think about picoseconds or pre-1970
events; the mindstuff making up the concepts will crash. Concepts
about time, if they refer to quantitative numbers at all, should be founded
on the relative times of the cognitive events that occur while thinking
about a temporal problem. Thus the AI can imagine a process that
takes place on picosecond timescales, and because the visualization itself
takes place on nanosecond timescales (or whatever speed the AI's system
clock runs at), there's no crash. It's a kind of automatic scaling.
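As a sketch of how the relative perceptions might be derived from absolute tags, consider the following Python. The `Event` record and the relation names are assumptions for illustration; the tick values could be system-clock nanoseconds or any other monotonic counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    start: int  # absolute clock ticks
    end: int


def relation(a, b):
    """Relative ordering of two events, derived only by comparing absolute ticks."""
    if a.start == b.start and a.end == b.end:
        return "simultaneous"
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    return "overlaps"


def is_between(c, a, b):
    """True when c happened after a finished and before b began."""
    return relation(a, c) == "before" and relation(c, b) == "before"


def compare_durations(a, b):
    """Whether a seemed to take longer than, shorter than, or as long as b."""
    da, db = a.end - a.start, b.end - b.start
    if da > db:
        return "longer"
    if da < db:
        return "shorter"
    return "equal"


A = Event("A", 100, 110)
B = Event("B", 500, 505)
C = Event("C", 200, 300)
E = Event("E", 700, 720)
F = Event("F", 700, 720)

print(relation(A, B), is_between(C, A, B), compare_durations(C, A), relation(E, F))
```

Concepts built on the outputs of functions like these never see the raw tick values, so the same concepts apply unchanged when the ticks are reinterpreted as picoseconds or millennia inside a visualization.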
To put it another way: Generality requires that there be at least
one layer of complete abstraction between temporal concepts and temporal
modalities. Even if stored memories also store the attached system-clock
time, a replay of those memories obviously won't take place at the recorded
time! If all remembered times are purely abstract characteristics,
and only concretely visualized times give rise to temporal intuitions,
then the AI can freely manipulate temporal aspects of a visualized process.
Symbols such as slow and fast (37) can be abstracted
from temporal intuitions and applied to aspects of any visualized temporal
process.
Of course, because we aren't slavishly following human limitations,
a seed AI should probably have some mode of direct access to the system
clock. We've all been in situations where we've wanted to know exactly
what time it is, or exactly what time it was when we had breakfast.
That's why God gave us wristwatches (38). This should be safe as long as the
direct access occurs through the same conceptual filter, the same layer
of abstraction, so that the modality-level system clock time 203840928340
comes out as the abstract characteristic "System-clock time 203840928340".
Another subtlety of human temporal understanding is that our senses
are synchronized even though different senses presumably have different
processing delays. It takes time for the visual cortex to process
an image, and time for the auditory cortex to process a sound - not necessarily
the same amount of time. But a physical sound and a physical
sight that arrive simultaneously should be perceived as simultaneous.
Since a seed AI should be able to tag sensory events as distinct
from the derivative perceptual events, this should be relatively
easy to handle on the modality level... although it's possible to imagine
problems popping up if there are °heuristics or concepts
that act on the derivative and possibly unsynchronized high-level features
of multiple modalities.
For some cases, this problem can be solved by only allowing multimodality
concepts to act on events that have been completely processed by all targeted
modalities. If a vision and a sound arrive at t=10, the sound finishes
feature-extraction at t=20, and vision finishes extraction at t=30, then
no audiovisual concept can begin acting until t=31, with both the sound
and the vision having a perceived time of t=10. In other words, rather
than skimming the cream off the modalities, the perceived now of
the AI will lag a few seconds behind real time.
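The rule can be written down as a small Python sketch. It assumes discrete ticks and assumes that a concept may begin acting on the tick after the slowest targeted modality finishes, which matches the t=31 figure in the example; the function name and data layout are hypothetical.

```python
def synchronize(percepts):
    """Work out when a multimodal concept may act on a set of percepts.

    `percepts` maps a modality name to (arrival_tick, finished_tick).
    Returns (activation_tick, perceived_times, lag).
    """
    last_finished = max(finished for _, finished in percepts.values())
    activation_tick = last_finished + 1  # first tick with every modality done
    # Each percept keeps the time at which it arrived, not when it was processed.
    perceived = {name: arrival for name, (arrival, _) in percepts.items()}
    lag = activation_tick - min(perceived.values())
    return activation_tick, perceived, lag


activation, perceived, lag = synchronize({"sound": (10, 20), "vision": (10, 30)})
print(activation, perceived, lag)
```

The `lag` value is the price of this scheme: the perceived now trails real time by the processing delay of the slowest modality involved.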
This introduces two new problems: One, it may introduce severe
delays into the system. Modalities don't just apply to external sensory
information; modalities are where all the internal thoughts take place
as well. To some extent this problem may be solvable by not requiring
complete
processing before concepts can activate, but only that level of processing
which is necessary to the concept. After all, a concept can't act
on information it doesn't have. But this may still lose some efficiency;
there may be cases where concepts don't need synchronization.
The second problem is synchronization of subjective time. If the
AI's now lags a few seconds behind, when are thoughts perceived
to have taken place? If the AI thinks "foo!" at a time that looks
to the AI like t=10 but is actually t=40, is the concept "foo!" labeled
as having taken place at t=10 or t=40? And what difference does it
make? I can't see that using t=40 makes any difference, so I'm strongly
in favor of labeling all events as occurring when they actually occur.
Still, the AI may eventually find useful °heuristics that
act on "subjective time".
All these modality-level and concept-level
problems are simply echoes of the far more difficult problem of change
propagation on the thought-level - how to ensure that "Aha!"
experiences and "Oops" experiences propagate to all the corners of the
mind, so that beliefs remain in a reasonably consistent state. The
issue of Consistency doesn't belong in this section. However, it
seems likely that issues of concept-level (and thought-level) synchronization
are not problems that should be solved by autonomic processes; concept
synchronization may need to be decided on a case-by-case basis. It
may be that, in the process of learning thought-level reflexes, and finding
concepts that work well, the AI will be forced to invent whatever forms
of synchronization are necessary for each concept. If a multimodal
concept must act on modality-images that began processing at the "same
time" (39), and will
otherwise fail (not generate useful results), it should be a relatively
simple tweak/mutation, of the sort that even °Eurisko could
have performed easily enough. The same goes for whatever concepts
are specified by the programmer during the initial stages.
As a general rule: All derivative perceptual events should be
tagged with their true cognitive time as well as the external-world time
of the derivative event. Human-programmed concepts should enable
the programmer to decide which time should be used; learned concepts won't
even be noticed unless the proper timeframe is used. Try to maintain
the regularities in reality that all intelligence is supposed to represent;
figure out whether the useful regularities represented by a temporal concept
are perceptual/external or cognitive/internal.
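One way to picture the double tagging is the following Python sketch. The `PerceptualEvent` record, the `Timeframe` enumeration, and the `simultaneous` check are illustrative assumptions, not a specification.

```python
from dataclasses import dataclass
from enum import Enum


class Timeframe(Enum):
    EXTERNAL = "external"    # when the thing happened in the world
    COGNITIVE = "cognitive"  # when the derived percept actually appeared in the mind


@dataclass(frozen=True)
class PerceptualEvent:
    label: str
    external_time: int
    cognitive_time: int

    def time_in(self, frame):
        """Return whichever tag the calling concept has chosen to use."""
        if frame is Timeframe.EXTERNAL:
            return self.external_time
        return self.cognitive_time


def simultaneous(a, b, frame):
    """A toy temporal concept that must be told which timeframe to use."""
    return a.time_in(frame) == b.time_in(frame)


sound = PerceptualEvent("clap heard", external_time=10, cognitive_time=20)
sight = PerceptualEvent("clap seen", external_time=10, cognitive_time=30)

print(simultaneous(sound, sight, Timeframe.EXTERNAL))
print(simultaneous(sound, sight, Timeframe.COGNITIVE))
```

The same pair of percepts counts as simultaneous in one frame and not in the other, which is why the choice of frame has to be made per concept rather than once for the whole system.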
When you consider that time is almost always mathematically described
as a real number (41); that one of the words for real number is "quantity"; that
in most trajectories the spatial distance to the target decreases monotonically
with time; and that time "moves forward" at constant velocity; then, the
identity seems so perfect that there is no complexity to be gained by the
metaphor. °Lakoff and Johnson kindly remind us that
"quantity" applies not just to mathematics, but to piles of bricks and
stacks of coins; that "trajectories" are not just simple flights from source
to target, but complex spatial maneuvers, with huge chunks of the visual
subsystems dedicated to their visualization.
By observing that piles of two bricks plus piles of three bricks equal
piles of five bricks, it is possible to guess that two hours plus three
hours will equal five hours. Using the underlying numerical concept
described in 2.3.3: The concept of "three", it can be seen
that this "metaphor" requires the ability to treat temporal intervals as
distinct objects, so that unique correspondences can be drawn between each
of three hours and each of three bricks. To learn (concept-level)
to treat time as a quantity requires that the AI encounter a task with
a uniqueness constraint; one in which it can't do two things in the same
minute (42).
This leads to treating time as a limited resource, which leads to an even
stronger analogy with time-as-material-substance.
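A toy task with such a uniqueness constraint might look like this in Python. The scheduler, the task names, and the minute counts are hypothetical; the only point is that each minute can be handed to at most one task, so minutes behave like bricks in a pile.

```python
def schedule(tasks, minutes_available):
    """Assign each minute to at most one task, in the order the tasks are given.

    `tasks` maps a task name to the number of minutes it needs.
    Returns (timeline, unscheduled), where timeline maps minute -> task name.
    """
    timeline = {}
    unscheduled = []
    next_minute = 0
    for name, needed in tasks.items():
        if next_minute + needed > minutes_available:
            unscheduled.append(name)   # the supply of minutes has run out
            continue
        for minute in range(next_minute, next_minute + needed):
            timeline[minute] = name    # uniqueness: one task per minute
        next_minute += needed
    return timeline, unscheduled


timeline, unscheduled = schedule({"sharpen": 2, "carve": 3, "polish": 1}, 5)
print(timeline)
print(unscheduled)
```

Two minutes plus three minutes use up exactly five minutes, and the third task finds nothing left: the same arithmetic as the piles of bricks, arrived at through a resource that can be spent.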
°Lakoff and Johnson describe the time-is-movement metaphor
in terms of the motion of an observer. The "location" of the observer
is the present, the "space" in front of the observer is the future, the
"space" behind the observer is the past. "Objects" are events or
times, "located" at various "points" along the "line". The time-is-motion
metaphor has two (incompatible) interpretations: The observer can
be thought of as moving forward at a constant speed, passing the events;
or the events can be thought of as moving towards the observer. (L&J
note that this is why "Let's move the meeting ahead a week" is ambiguous.)
Lakoff and Johnson note that we also map time onto body image; in almost
all languages, the observer "faces" the future - although a few languages
(presumably noting that one can see the past, but not the future) have
the observer facing the past. However, this is getting away from
the primary topic - the utility of describing time as a trajectory.
One primary use of time-as-space is to visualize multiple events simultaneously.
That is, by conceptualizing time as a line, we can simultaneously consider
three points/events along the line, where a true temporal visualization
would force us to consider the events sequentially. But this
only applies to humans, with our single and indivisible stream of
consciousness. A seed AI might be able to simultaneously visualize
the dynamic qualities of three different events; in effect, placing three
different moving observers at three different points along the timeline!
Likewise, visualizing time as space makes it easier for humans to perceive
certain types of qualitative relations. Visualizing a quantity plotted
against time - you know, an ordinary 2D graph - enables us to perceive
properties of the curve that would not be visible to a human observer
watching the 1D variable change with time. Humans have one set of
intuitions for static spatial properties, allowing us to stand back and
look at the graph and form compounded perceptions and connected thoughts;
we have another set for dynamic systems in which the sensory images change
at the same rate as our stream of consciousness.
For an AI, the benefit of spatial metaphors might be provided directly
by rewriting the spatial-modality perceptions directly for the temporal
modality - rewriting a visual curve-detector so that it operates on data
in the temporal modality, so that an AI watching a single quantity change
over time has the same set of "smooth curve" or "sharp curve" or "global
maximum" perceptions as a human contemplating a 2D graph.
In conclusion: Time, quantity, and trajectory share certain basic
underlying properties. The primary driver for high-level metaphors
between time and quantity is a task in which time is a limited resource.
In humans, the primary driver for metaphors between time and trajectory
is the greater sophistication of our static visual intuitions, but this
may not apply to seed AIs.
Hofstadter, writing about °Copycat - an
AI that performs analogies in the domain of letter-strings, such as "abc->abd::pqrs->?"
- notes that, despite the simplicity of Copycat's domain, the domain can
contain analogy problems so complex as to embrace a significant chunk of
human thought. A few years back, when I was only beginning to think
about AI, I set out to brainstorm a list of a few hundred perceptions relating
to analogies - "before, next, grow, quantity, add, distance, speed, blockage,
symmetry, interval..." - and noticed that most of them could be represented
on a linear strip of Xs and Os. These perceptions I collectively
name to myself the linear intuitions - the perceptions that apply
to straight lines.
One such perception is reflection: "XXOX" is the reflection of
"XOXX", and the image "XXOXOXX" is bilaterally symmetric. (Note that
it may take you more time to verify that "XXOXOXXO" is the reflection of
"OXXOXOXX", or that "OXOXXOXXOXO" is bilaterally symmetric, and you may
need to do so consciously rather than intuitively; our perceptions have
°horizons, limits to the amount of processing power expended.
Of course, your perceptions are analyzing huge collections of two-dimensional
pixels, not just the on-off "pixels" of a linear image.) Writing
a computational procedure to verify reflection is trivial, but this would
leave out some of the most important design features. On seeing the
letter-strings "ooabaoo", "cxcdcxc", and "rauabauar", the letter-string
"oomemool" would come as rather a surprise, and the "l" would stick out
like a sore thumb. Even without precedents to establish the expectation,
the image "WHMMOW"
has something wrong about it (43).
The perception of reflection is not simply a binary, yes-or-no verification;
once a partial reflection is visible, it establishes an expectation
of complete reflection - a mental image of how the structure "ought" to
look, if the reflection were complete - and if the expectation is violated,
if the actual image conflicts with the imagined, then the violation is
detected, and the violating object becomes more salient ("sticks out like
a sore thumb"). If there is some way to look at the violating object
that preserves perfect reflection, it will resonate strongly with the expectation.
(A more complete discussion of expectation, especially on the concept-level
rather than modality-level, is in Unimplemented section: Causality.)
The point is that the perception of reflection, like most perceptions,
has complex internal structure. In particular, it is possible to
expect reflection, and for the property of "reflection" to be applied
to a previously asymmetric object.
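As a minimal sketch of reflection as an expectation rather than a yes-or-no check, the Python below picks the mirror axis with the most matching pairs and reports the positions that violate the implied expectation. It is an illustration only, not a proposed modality design; the name `best_reflection` and the doubled-axis encoding are assumptions made for the example.

```python
def best_reflection(s):
    """Find the mirror axis with the most matching pairs.

    Returns (axis2, matches, violations). axis2 is the sum of any two
    mirrored indices, so an odd value means the axis falls between two
    characters. violations lists the indices that break the reflection.
    """
    best = (0, -1, [])
    n = len(s)
    for axis2 in range(2 * n - 1):
        matched, violations = 0, []
        for k in range(n):
            m = axis2 - k              # index of the mirror partner of k
            if m == k:
                continue               # the character sitting on the axis
            if 0 <= m < n and s[m] == s[k]:
                matched += 1
            else:
                violations.append(k)   # mismatched or missing partner
        if matched // 2 > best[1]:     # each pair was counted from both sides
            best = (axis2, matched // 2, violations)
    return best

print(best_reflection("oomemool"))  # (6, 3, [7])
print(best_reflection("WHMMOW"))    # (5, 2, [1, 4])
```

In "oomemool" the only violating index is 7, the trailing "l"; in "WHMMOW" the violating pair is "H" and "O". A fuller version would feed these indices into whatever salience mechanism the modality uses.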
And the usual caveats: It is possible to notice reflection within
an image, or to notice reflection of two structures in two different
images; and it is easier to see reflection if you're looking for it in
advance.
Since it would be computationally expensive to compare every possible
set of pixels for reflection, and yet we notice even unexpected reflections
within an image - implying that the detectors are always on - the human
brain probably detects for prerequisites to reflection first, and tries
to perceive reflection per se only if the prerequisites trigger.
If two visual images are related by the property of reflection, they are
likely to have very similar high-level properties, so that the simultaneous
perception of an image and its reflection would lead to perceptual structures
that, in the human neuron-based brain, would resonate very strongly with
each other, suggesting that tests should be performed for both identity
and reflection. If the object is recognizable, then both the object
and its mirror image would usually be classified identically by the temporal
lobe (44)
- a bird and its mirror image are both classified as "bird" - so that the
visual signals from object and mirror image would rendezvous at that point,
and could be backtraced to their origins, and the test for symmetry then
applied.
That's how humans detect visual symmetry, anyway. It is possible
that the human brain uses its underlying electrical properties to detect
neural synchronies on a global scale, a physically based method that it
would be computationally extravagant to match on a von-Neumann-architecture
digital computer. It could be that a Monte Carlo method
would do as well; a million random samplings and comparisons of parts
of the global state might often find local similarities between sufficiently
large similar structures - if not always, then often enough to give perception
a humanlike flavor of spontaneity. A Monte Carlo method that randomly
tried to detect a million possible resonances might do to duplicate almost
all the functionality of neural resonance, without the combinatorial explosion
that would defeat a perfect implementation.
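A toy version of the Monte Carlo idea, assuming the "global state" is just a string and a "resonance" is a pair of equal-length windows that mirror each other. The function name, the window size, and the sample count are all hypothetical choices for illustration; nothing here is claimed to match neural resonance.

```python
import random

def monte_carlo_reflections(state, window=3, samples=10_000, seed=0):
    """Randomly sample pairs of windows; keep the pairs that mirror each other."""
    rng = random.Random(seed)          # seeded so that runs are repeatable
    last = len(state) - window         # last valid window start
    hits = set()
    if last < 0:
        return hits                    # state is shorter than one window
    for _ in range(samples):
        i, j = rng.randint(0, last), rng.randint(0, last)
        if i < j and state[i:i + window] == state[j:j + window][::-1]:
            hits.add((i, j))
    return hits

hits = monte_carlo_reflections("abcqrstcba")
```

The cost is fixed by `samples` rather than by the number of possible window pairs, which is the point of the approach: detection is probabilistic, but large similar structures offer many chances to be sampled.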
But that sort of thing is a major, fundamental, and underlying design
issue, and somewhat beyond the scope of this section, or even 3: Cognition.
The perception of 1D temporal reflection is much simpler than the perception
of true 2D or 3D spatial reflection. The modality-level design requirement
is that the AI should be able to independently notice blatantly obvious
temporal reflections; detecting anything more subtle can be left to heuristics,
concepts, and the full weight of deliberate intelligence. The AI
needs to be able to verify temporal reflections suggested by concept-level
or thought-level considerations, but this, as said, is relatively simple.
The reactivation of the infrequently-used exploding-grape concept (or
perceptual structure, if it doesn't rate a concept) should be enough to
suggest that events are being repeated; enough to draw correspondences
between each unusual pair of events. The computational procedure
for detecting reflection is simple enough that it could conceivably be
run on every consciously perceived event-line where correspondences are
drawn between events - at least, with respect to the events salient enough
to have correspondences drawn between them.
Perhaps this example is a bit outré, but then it's hard
to come up with examples of useful temporal reflections. The
only example that springs to mind would be disassembling and reassembling
a motorcycle (45). A stock-trading
AI might find a temporal-reflection intuition useful, or an AI watching
a light bob up and down and trying to deduce a pattern. "Run the
process backwards" is an incredibly useful heuristic in a wide variety
of circumstances, but such a high-level idea is a thought-level process;
even the concept "backwards" properly belongs under Unimplemented section: Symmetry.
There are still some subtleties remaining in Scenario 1 (the exploding-grape
scenario). First, the correspondences drawn are between high-level
events. The concept of "exploding grape" is not represented directly
in a sensory modality; at most, the sound and sight of the exploding grape
are represented, and no two real-world sights and sounds will ever be precisely
equal. The similarities between the first and second events that
lead both of them to be classified as "exploding grape" are higher-level
- either low-level conceptual or very high-level modality.
However, the modality-level intuition for temporal reflection
can operate on concept-level cognitive events. In humans,
for example, the thought
exploding grape results in the visualization
of the syllables "exploding grape" in the auditory cortex, which - in theory
- could have a time-tag attached. In practice, it seems likely that
the AI architecture will be such as to locate concept-level cognitive events
and label them as objects - so that, among other things, thoughts can be
tagged with the system-clock-time that's used for modality-level temporal
intuitions. In general, thinking about thinking - introspection -
obviously requires some way of observing the temporal sequence of thoughts,
knowing
when you thought something. Either the architecture
needs to explicitly represent the activation of concepts and thoughts (the
likely solution (46)); or,
if it's all a big puddle of mindstuff with higher levels being emergent
(47), the thoughts need to spill over into
modalities in some way that allows evolved concepts and thought-level reflexes
to do things like identify the time of a thought.
The second subtlety is that the temporal reflection is not likely to
be perfect. The intervals between the dropped glass and the
exploding grape are not likely to be exactly 20 seconds apiece. Only
the comparative precedences - which event came first - are tested for reflection.
That said, a reflection which preserves intervals constitutes a much stronger
binding, although human temporal perceptions are too approximate for us
to notice that sort of thing without a stopwatch. (Our spatial intuitions
for reflection do require the preservation of distances.)
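The distinction between order-only reflection and interval-preserving reflection can be sketched as follows. The event labels and clock times are hypothetical stand-ins, and `temporal_reflection` is a name invented for this example.

```python
def temporal_reflection(events, tolerance=None):
    """events: list of (label, clock_time) pairs in order of occurrence.

    With tolerance=None only the order of labels is tested. With a numeric
    tolerance, mirrored gaps between events must also agree to within it,
    which corresponds to the stronger, interval-preserving binding.
    """
    labels = [label for label, _ in events]
    if labels != labels[::-1]:
        return False
    if tolerance is None:
        return True
    times = [t for _, t in events]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return all(abs(g - h) <= tolerance for g, h in zip(gaps, gaps[::-1]))

events = [("glass", 0.0), ("grape", 20.0), ("grape", 50.0), ("glass", 71.5)]
print(temporal_reflection(events))                 # True
print(temporal_reflection(events, tolerance=1.0))  # False
```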
Simultaneity is when two events occur at the same time. Perfect
simultaneity is when two events are tagged as occurring at exactly the
same time, to the limits of the resolution of the modality-level system
clock. Even in AIs that totally avoid parallel processing, sensory
modalities will tag all the components of an incoming image as having arrived
at the same time, so any mind is full of insignificant simultaneities.
Significant simultaneities are those that are unexpected and that occur
in high-level, salient objects. For example, two objects simultaneously
disappearing from a sensory input.
Because a seed AI's system clock will probably run much much faster
than our own, it may be necessary to define intuitions that detect imperfect
simultaneities - for example, any sensory coincidence within 1/40th of
a second, or any internal coincidence within 1/1000th of a second (or some
other time scale chosen to match the speed of the AI's stream of consciousness) (48).
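A minimal sketch of an imperfect-simultaneity detector, assuming events arrive as (name, clock-time) pairs. The 1/40th-of-a-second tolerance comes from the text; the function name and the sample events are hypothetical.

```python
def near_simultaneous(events, tolerance):
    """Return pairs of event names whose time tags differ by at most tolerance."""
    ordered = sorted(events, key=lambda e: e[1])
    pairs = []
    for i, (name_a, t_a) in enumerate(ordered):
        for name_b, t_b in ordered[i + 1:]:
            if t_b - t_a > tolerance:
                break                  # later events are even further away
            pairs.append((name_a, name_b))
    return pairs

events = [("flash", 1.000), ("bang", 1.020), ("echo", 1.300)]
print(near_simultaneous(events, tolerance=1 / 40))  # [('flash', 'bang')]
```

This finds every coincidence; deciding which of them are significant (unexpected, and between salient objects) is a separate step that the sketch does not attempt.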
Aside from that, take all the caveats I listed in 3.1.4.1: Reflection
and apply them to simultaneity. For example, if simultaneity is repeated
often enough to be expected, then the expectation of simultaneity is applied
to sensory inputs to create an image, a violated expectation should be
noticed as a conflict of the real image with the expectation, the violating
stimulus should become salient, and so on. (And if stimulus A appears
without the expected simultaneous stimulus B... and stimulus B still hasn't
appeared after the AI gets over the shock... then both stimulus A and the
absence
of B become salient.)
The human perception of intervals is approximate rather than quantitative.
We divide how long something feels into "less than a second", "a
second", "ten seconds", "a minute", "ten minutes", "an hour", "a few hours",
"a day", "a few days", "a few weeks", "a few months", "a few years", "a
lifetime", and "longer than a lifetime". (That's a guess. I
don't know the actual categories or their boundaries. It would be
an interesting thing to know, if someone has already done the research.)
The human perception of temporal intervals is also at least partially
°subjective, dependent on how much thinking is going on.
A process relatively empty of events, in which our mind processes incoming
data much faster than it becomes available, is paradoxically perceived
as being longer - it is "boring" (49). A process packed full of emotionally
significant events may appear as being longer; when it's over, "it feels
much longer than it was". (Again, with the time-as-pathway metaphor,
passing a lot of events may appear to make the intervals longer.)
There's also the proverb "time flies when you're having fun"; if events
happen so fast that "there's no time to think" or pay attention to underlying
intervals, time may appear to move by much more quickly (50).
However, it appears to me that human subjective intervals implement
no important functionality. If the AI uses system-clock intervals
to control the actual subjective perception, so that perceived intervals
are precise, then the perception of exact intervals is more likely to be
useful - that is, when two processes unexpectedly have the same intervals,
it is more likely to signal a useful underlying correlation. The
AI does need a perception for "approximately the same amount of time",
since this is a useful human perception. (Such a perception might
have a quantitative as well as a qualitative component; in other words,
the perception of "approximately the same amount of time" might be strongly
true or weakly true.)
It may be that we humans have no modality-level "equal interval detectors"
at all - after all, we have to count heartbeats or glance at a watch when
we want to even verify the equality of two intervals. If so,
an AI with a modality-level appreciation for intervals might spot surprises
that a human would miss.
"Temporal Reasoning" in °MITECS notes that comparative
operations on intervals can be more complex than the simple precedence
or simultaneity of instantaneous events: "There are thirteen primitive
possible relationships between a pair of intervals: for example, before (<),
meets (m) (the end of the first corresponds to the beginning of
the second), overlaps (o) and so on." Since these thirteen
possible relationships can be built up from the relationships of the "start"
and "end" events, I don't think they would require architecture-level
support. Overlapping intervals should be intuitively noticed because
salient intervals should be perceived as solid, filling in every point
between the two events, and collisions should be detected in the same way
as collisions of solid objects. Computationally, this can be implemented
either by using a 1D collision-detection algorithm, or by creating an internally
perceived "timeline", with temporal pixels that can be occupied by multiple
events, with a computationally tractable resolution (the system clock might
be too fast) that is nonetheless fine enough to detect overlap (52).
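To illustrate the claim that the thirteen relationships reduce to comparisons of "start" and "end" events, here is a sketch that classifies a pair of intervals and, separately, performs the 1D collision test. Only "before", "meets", and "overlaps" are named in the quoted passage; the remaining ten names are the ones conventionally used for these relations, and the function names are assumptions for the example.

```python
def interval_relation(a, b):
    """Classify intervals a=(start, end) and b=(start, end), with start < end."""
    (a0, a1), (b0, b1) = a, b
    if a1 < b0: return "before"
    if b1 < a0: return "after"
    if a1 == b0: return "meets"
    if b1 == a0: return "met-by"
    if a0 == b0 and a1 == b1: return "equals"
    if a0 == b0: return "starts" if a1 < b1 else "started-by"
    if a1 == b1: return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1: return "during"
    if a0 < b0 and b1 < a1: return "contains"
    return "overlaps" if a0 < b0 else "overlapped-by"

def collide(a, b):
    """1D collision test: do the two solid intervals share any interior point?"""
    return a[0] < b[1] and b[0] < a[1]

print(interval_relation((0, 2), (1, 3)))  # overlaps
print(collide((0, 1), (1, 3)))            # False: they only touch
```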
Finally, intervals have the same caveats as 3.1.4.1: Reflection.
For example, intervals are perceived only for salient events; they aren't
computed for every pair of cognitive events in the mind. (This is,
in fact, impossible, since the perception of an interval is itself a cognitive
event.)
Temporal precedence is which of two events - A or B - came first.
Precedence is the most often-used and most useful temporal perception;
it is the one by which humans order reality. We don't care about
the exact intervals in milliseconds (although an AI might - see above);
we care whether event A or event B came first. Precedence is the
most useful temporal intuition because it is the most deeply intertwined
with causality - effects follow causes. (See Unimplemented section: Causality.)
Mathematically, transitivity of precedence is the defining characteristic
of a linear ordering. If A < B and B < C, then A < C; if
this relation holds true for all events A, B, and C in a group, then that
defines a linear ordering of the group (53).
The set of precedence relations defines a linear string of events.
It is this definition that we humans use, most of the time. Without
access to an actual calendar, we will almost never reconstruct a series
of events by trying to remember the actual temporal labels and performing
a sort(). Rather, we try to reconstruct the series by remembering
that B came after A and before C, that D came after B, and so on.
It is also noteworthy that we tend to remember precedences that have
reasons
behind them - such as the precedence of cause and effect. If the
series is a causal chain, we may be able to rattle off the whole series
without effort. If we're trying to describe the ordering of events
that belong to multiple different causal series, we often have to consciously
reconstruct the complete ordering from intersections in the partial orderings
we remember; from remembering whether something was "a short time ago"
or "a long time ago"; and so on. We do not remember an internal calendar
or timeline, and we do not remember - on the modality level - the times
of events. We remember precedences, and it is from these precedences
that the timeline of our lives is constructed.
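Reconstructing a timeline from remembered precedences, rather than from time tags, amounts to a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); `reconstruct_order` is a name chosen for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def reconstruct_order(precedences):
    """precedences: iterable of (earlier, later) pairs.

    Returns one ordering consistent with every remembered precedence.
    Contradictory memories (a cycle) raise graphlib.CycleError.
    """
    sorter = TopologicalSorter()
    for earlier, later in precedences:
        sorter.add(later, earlier)     # 'later' must come after 'earlier'
    return list(sorter.static_order())

print(reconstruct_order([("B", "C"), ("A", "B"), ("C", "D")]))
# ['A', 'B', 'C', 'D']
```

When the remembered precedences form only a partial ordering (for instance, C and D are both known to follow B but not each other), several orderings are consistent and the function returns just one of them. That is the situation in which a human falls back on cues such as "a short time ago".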
A seed AI should probably use a modality-level clock or a modality-level
timeline, but it will still need to understand precedence.
Precedence in general is ubiquitous; we invoke it every time we say
before
or after. Precedence can be spatial as well as temporal.
Precedence applies to priorities, not just in terms of what must be done
first,
but the first choice. In this sense, we invoke precedence
every time we say better or worse. The metaphors for
precedence apply to every comparator that operates on a linear ordering:
This is why linear and temporal metaphors are ubiquitous in human language.
What all the metaphors have in common is that the comparative operation
on the quantity or trajectory usually reflects an actual temporal precedence
- the first choice is usually the one that is considered first;
the cognitive events associated with extrapolating that choice will take
place earlier. If a simpler theorem comes before a more complex one,
it's because the complex theorems are constructed from simple ones; the
simple ones are learned first or invented first, and the
cognitive event of that learning or invention will have an earlier clock-time
attached.
Comparison is as ubiquitous in modalities as it is in ordinary source
code. The modality-level intuitions for temporal precedence are a
single case of this general rule.
The usual caveats about expecting precedence, broken expectations, and
so on apply.
"Quantity" is invoked with every perception containing a real number,
as ubiquitous as floating-point numbers in ordinary source code.
When I say "quantity", I do not just refer to a continuously divisible
material substance, like water or time; I generalize to the internal use
of floating-point numbers in representations and intuitions - all the perceptions
that can be "stronger" or "weaker".
Given two quantities, we can notice which is more or less; given two
quantitative properties, such as height, we can notice which is higher
or lower; given two quantitative perceptions, we can tell which is stronger
or weaker. This perception can operate statically, in the absence
of a temporal component.
As discussed in Unimplemented section: whenextract, quantities and comparators
are too ubiquitous to initiate thoughts directly, unless the quantities
and comparators are properties of very high-level objects; thus, low-level
quantities and comparisons would be computed either as preludes to feature
extraction, or only when demanded by the context of a higher thought.
Comparisons computed for feature extraction are also generally local.
A human visual pixel is compared with nearby pixels for edge detection,
but not with every other pixel in the image, using O(N) instead of O(N^2)
comparisons. A seed AI should be able to compare arbitrary pixels
in arbitrary modalities - but only on demand. For more about the
differences between on-demand and automatically-computed perceptions, the
difference between low-level and high-level perceptions, and the difference
between thought-initiating and guess-verifying perceptions, see Unimplemented section: whenextract.
The list of basic operations that can be performed on static quantities
is basically the set of useful arithmetical operations: Subtraction
(in other words, interval calculation), comparison, equality testing.
It would also be possible to include addition, multiplication, division,
bit shifting, bitwise & and |, remainder calculations, exponentiation,
and all the other operations that can be performed on integers and floating-point
numbers; however, these operations are less likely to be useful - less
likely to pick out some interesting facet of reality.
A field of quantities, extended across time or space or both, can give
rise to the mid-level features called patterns; patterns are higher-level
than quantities, and richer, and rarer as a perception (a hundred pixels
give rise to one pattern); thus, patterns are more meaningful. Patterns
can be broken, and the high-level feature that constitutes the breaking
of a pattern is rarer, and far more meaningful, than either the
patterns themselves or the low-level quantities. (I speak here of
modality-level patterns; the problem of seeing thought-level patterns is
nearly identical with the problem of intelligence itself.)
One example of a pattern is a rising quantity - "rising" implying either
a single quantity changing with time, or a field of quantities changing
continuously with some spatial dimension.
A modality observing D: 8, 16, 32, 64, 128, 256 should notice
that the numbers are constantly increasing, and that the rate of the increase
is constantly increasing. A human modality would not notice
that the numbers formed a doubling sequence - and neither, in all probability,
should an AI's modality, unless the sequence is examined by a thought-level
process. I say this to emphasize that the problem of modality-level
pattern detection is limited, in contrast to the problem of understanding
patterns in general - if the AI's modality can understand a simple, limited
set of patterns, it should be enough.
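A sketch of the limited, modality-level perception described here: the detector notices "increasing" and "rate of increase is increasing" from first and second differences, and deliberately does not try to recognize a doubling sequence. The function names are assumptions for the example.

```python
def differences(seq):
    """Successive differences: the discrete analogue of a derivative."""
    return [b - a for a, b in zip(seq, seq[1:])]

def describe(seq):
    first = differences(seq)
    second = differences(first)
    return {
        "increasing": all(d > 0 for d in first),
        "rate_increasing": all(d > 0 for d in second),
    }

print(describe([8, 16, 32, 64, 128, 256]))
# {'increasing': True, 'rate_increasing': True}
```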
To notice a pattern is to form an expectation. When this expectation
is violated, the pattern is broken. Observing a single quantity changing,
as in sequence C, the feature "increasing" remains constant. If C
continues but suddenly starts decreasing - 8, 19, 22, 36, 45, 71, 62, 21,
7, 6, 1 - an "edge" has been detected. On a higher level, this is
what is observed: "...greater than, greater than, greater than, less
than, less than, less than..." Thus the presence of the low-level
feature detector for "greater than" or "less than" enables the AI to notice
a pattern it could not otherwise notice, and to detect an edge it could
not otherwise see. That is the function of modality-level feature
detectors: To enable the discovery of regularities in reality that
would otherwise remain hidden.
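The "greater than, greater than, ..., less than, less than" view can be sketched directly: turn the sequence into a stream of comparator outputs, then report where that stream changes. The function names are hypothetical.

```python
def comparator_stream(seq):
    """One symbol per adjacent pair: is the next value >, <, or = the previous?"""
    return [">" if b > a else "<" if b < a else "=" for a, b in zip(seq, seq[1:])]

def edges(seq):
    """Indices into seq at which the comparator output changes (turning points)."""
    signs = comparator_stream(seq)
    return [k for k in range(1, len(signs)) if signs[k] != signs[k - 1]]

seq = [8, 19, 22, 36, 45, 71, 62, 21, 7, 6, 1]
print(edges(seq))  # [5], i.e. the turning point is seq[5] == 71
```

The edge detector never looks at the raw quantities, only at the output of the lower-level comparator, which is the layering the paragraph describes.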
As a general rule, notice equality, continued equality, and broken equality
in the quantity, in the first derivative, and in the second derivative.
We notice when a constant quantity changes and when a constant rate of
change changes, but we humans do not directly perceive changes in acceleration.
We compute the quantity and the quantitative first derivative, but not
the quantitative second derivative. Since the second derivative -
for humans - is not quantitative but qualitative, we can notice it crossing
the zero line, or notice large (order-of-magnitude) changes, but not notice
small internal variances. An AI might find it useful to perceive
the second derivative quantitatively, but computing a quantitative third
derivative (and thus a qualitative fourth derivative) would probably not
contribute significantly to intelligence outside of specialized applications (54).
There is still a question of salience. We would wish a financial
AI, or a human accountant, to notice and wonder if a bank account customarily
showing transactions measured in hundreds of dollars suddenly began showing
transactions measured in millions - the mid-level feature "magnitude",
formerly constant at "hundreds", suddenly jumps to "millions". But
we wouldn't want to notice a change from the mid-level feature "magnitude:
150-155" to the mid-level feature "magnitude: 153-160", even though - on
the surface - both look like equally sharp inequalities. (As a °crystalline
"compare" operation, "hundreds" != "millions" is neither more nor less
unequal than "150-155" != "153-160".) Similarly, we would not notice
a change from the mid-level feature "frequency of numbers ending in 5:
20%" to "frequency of numbers ending in 5: 25%"; or, if we did somehow
notice, we wouldn't attach as much significance.
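One way to make "hundreds" versus "millions" salient while ignoring "150-155" versus "153-160" is to extract order of magnitude as the mid-level feature and test that feature for broken equality. This is a sketch under that assumption; the function names and the sample amounts are hypothetical, and amounts are assumed to be nonzero.

```python
import math

def magnitude(amount):
    """Order of magnitude: 2 for hundreds, 6 for millions. Assumes amount != 0."""
    return math.floor(math.log10(abs(amount)))

def magnitude_breaks(amounts):
    """Indices at which the order of magnitude differs from the previous amount."""
    mags = [magnitude(a) for a in amounts]
    return [i for i in range(1, len(mags)) if mags[i] != mags[i - 1]]

print(magnitude_breaks([150, 155, 153, 160, 2_500_000]))  # [4]
```

Choosing order of magnitude as the feature to watch is itself the kind of learned heuristic the text attributes to experience; the code simply hardwires it.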
We have learned from experience, or from our cultural surroundings,
that money is extremely significant, that people often try to tamper with
it, and that the order-of-magnitude of monetary quantities should be paid
attention to; we have not learned a similar heuristic for shifts in a few
dollars, or shifts in percentage frequency of digits, which is why monitoring
either quantity is a specialized technique used only by auditors.
Learning which patterns and broken patterns to pay attention to is a
concept-level problem; it's not trivial, but °Eurisko-oid
techniques should suffice.
These are the feature extractors that can operate on quantities in general:
On the concept-level, all these features should be computed for all
salient high-level quantities, and for all higher-level features rare enough
that computing all the features is computationally tractable. Figuring
out which features to compute for a quantity, and which features to pay
attention to, is a major learning problem for the AI; learning in this
area contributes significantly to qualitative intelligence as well as efficiency,
since compounding extractors can lead to the computation of entirely new
features.
On the modality level, these feature extractors can be composed to yield
some basic mid-level features, such as edge detection in pixels, although
anything more than that is probably a domain-specific problem. For
example, a problem as simple as computing changes in velocity will not
fit strictly within the domain of quantitative perceptions, unless the
velocity is broken up by domain-specific perceptions into quantitative
components of speed and direction.
°Lakoff and Johnson, arguing that our understanding of
trajectories is fundamentally based on motor functions, offer this list
of the basic elements of a trajectory (quoted from "Philosophy in the Flesh"):
The concept of a trajectory can be represented in the temporal XO modality.
Zooming out from the following frame, "OOOOOOXOOOOOOOOXOOOXOOOOOO", it
could be described as "three points on a line". Given a temporal
sequence of XO frames, the points on the line can "move"; they can have
position, speed, direction, and velocity.
The XO modality suffices to represent an example of a trajectory, e.g.:
"XXOOOX", "XOXOOX", "XOOXOX", "XOOOXX"; an observing human would say that
the middle X has moved from the starting point defined by the first X to
the endpoint defined by the third X. (Note that I do not yet use
the word "target".)
For the sake of form, we should name all the intuitions giving rise
to the start-move-endpoint perception. The largest hurdle is the
perception of each middle X as an instance of the same continuous object
- that is, that the X at position 2 in t1, the X at 3 in t2,
the X at 4 in t3, and the X at 5 in t4, are all instances
of a single object with a continuous existence. A human makes this
interpretation immediately because we have built-in assumptions about the
continued existence of discrete objects - domain-specific instincts that
become visible within a few months after birth.
An AI could probably make the same interpretation, but it would be more
difficult. To establish a strongly bound perception of each X as
a discrete object and the middle X as a continuous object, it would probably
take a trajectory lasting, say, ten frames, instead of four. Assume
for the moment that the sequence is expanded to encompass ten frames and
ten one-unit steps for the middle X. In this case, the following
facts are visible immediately: First, that there are the same number
of Xs in each frame. (I will not say "three Xs in each frame", since
this implies an understanding of "°three".) Second,
that each frame has an X in position 1 and an X in position 12. To
a human, it is "obvious" that the constant number of Xs implies a constant
number of discrete objects; to a human, it is obvious that the three Xs
are each different objects; to a human, it is obvious that an X maintaining
an identical position in each frame is the same object in each frame; therefore,
since the first and last Xs are accounted for, the leftover middle X in
each frame must be the third object. And indeed, the "movement" of
the third object (or "shift in the positional attribute", as an AI might
see it) is incremental and constant.
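The chain of "obvious" inferences can be written out as a deliberately naive procedure: positions occupied by an X in every frame are treated as stationary objects, and the single leftover X in each frame is treated as one moving object. Positions are 1-based to match the text; the function name is hypothetical, and the hard part (deciding that the leftover Xs really are the same object) is simply assumed.

```python
def split_objects(frames):
    """Return (fixed_positions, trajectory) for a non-empty list of XO frames."""
    per_frame = [{i + 1 for i, c in enumerate(f) if c == "X"} for f in frames]
    fixed = set.intersection(*per_frame)       # Xs present in every frame
    trajectory = []
    for xs in per_frame:
        moving = sorted(xs - fixed)
        if len(moving) != 1:
            raise ValueError("expected exactly one moving X per frame")
        trajectory.append(moving[0])
    return sorted(fixed), trajectory

frames = ["XXOOOX", "XOXOOX", "XOOXOX", "XOOOXX"]
fixed, trajectory = split_objects(frames)
print(fixed, trajectory)  # [1, 6] [2, 3, 4, 5]
steps = [b - a for a, b in zip(trajectory, trajectory[1:])]
print(steps)              # [1, 1, 1]: incremental and constant
```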
A tremendous amount of cognition has just flashed by. Getting
the AI to perceive two experiences as belonging to the same object is almost
as deep a problem as that of getting the AI to perceive two objects as
belonging to the same category. Some of the underlying forces are
visible in the source
code of Hofstadter's °Copycat; Copycat can see two different
letters in two different strings as occupying the same role. (Copycat
can also see bonds
formed by "movements" in letterspace; it knows that "c" follows "b".)
The general rule, however, goes much deeper than this.
The Rule of Improbability implies that, the wider the range of possible
values for an attribute, the more strongly equality of values implies
equality of underlying objects. "XOX" binds to "XOX" much more weakly
than "roj" binds to "roj". "3" binds to "3" much more weakly than
"23,083" binds to "23,083".
Thus, even so basic a task as knowing when two experiences are the "same"
object requires that the AI have previously learned which attributes
are good indicators of identity, which in turn requires that the AI have
watched over objects known to be identical so that it can observe which
attributes remain constant. If this were a seminar on logic we'd
be in trouble, but since we're pragmatists we can break the circularity
by cheating, just as the human mind does - it seems highly likely that
equality of visual signatures and continuous change in position are hardwired
into the brain as signals of identity. Similarly, we can start by
identifying a few good attributes to begin with, and giving some sample
sets with pre-identified objects, and letting the seed AI work it out from
there.
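The Rule of Improbability stated above can be given a rough number, under an assumption that is mine and not the text's: treat each attribute value as a string of independent, uniformly random symbols, and measure binding strength as the surprise, in bits, of two such strings matching by chance.

```python
import math

def coincidence_bits(alphabet_size, length):
    """Bits of surprise if two independent, uniformly random strings matched.

    The uniform-and-independent model is an assumption made for illustration.
    """
    return length * math.log2(alphabet_size)

print(coincidence_bits(2, 3))              # "XOX": 3.0 bits
print(round(coincidence_bits(26, 3), 1))   # "roj": 14.1 bits
print(round(coincidence_bits(10, 5), 1))   # "23,083": 16.6 bits
```

Real attributes are neither uniform nor independent, so a learning system would need to estimate these probabilities from experience; the sketch only shows why a wider range of possible values yields a stronger binding.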
What are the consequences of identifying an object?
(Author's note: The discussion of objects should probably be somewhere
other than 3.1.6: Trajectories, probably the section on
categorization, and should have a much longer discussion.)
In what sense does labeling objects as "sources", "trajectors", and
"destinations" - we will not use the term target just yet - differ
from identifying them as "Object 1", "Object 2", and "Object 3"?
In what sense is a "path" different from a "trajectory"? What expectations
are implied by the labels, and what experiences are preconditions for using
the labels?
Conceptually, a path can exist apart from the traversing objects.
If, on multiple occasions, one or more objects is observed to precisely
traverse the same path - perhaps at the same speed - then a generalization
can be made; an observed feature can be extracted from the single experience
and verified to apply across a set of different experiences. To observe
the existence of a path is useful only if the observation is reflected
in external reality - for example, if the reason a rolling ball
follows a path down a mountain is because someone dug a trench. A
seed AI is unlikely to need to deal with physical trajectories of the type
we are familiar with, but the metaphor of "trajectory" extends to the more
important modality of source code - a piece of data can follow a path through
multiple functions.
Similarly, the condition that leads us to identify some object or position
as "source" is that one or more observed trajectories originate from that
source; what leads us to identify a position as "endpoint" is that one
or more observed trajectories terminate at that endpoint. What makes
the perception of "source" useful is if there is a causal reason
why the position is the source of the trajectory, especially if the object
or position is actually generating the trajectors - if a pitcher
throws a ball, for example; or, in AI terms, if a function outputs pieces
of data that then travel through the system. Similarly, the perception
of "endpoint" is especially useful if the endpoint actually halts the trajector,
or consumes it.
One cue that a real cause may exist - that the perception of a position/object
as "source"/"path"/"endpoint" is useful - is if multiple, varying
paths/trajectories have the same source or endpoint. Imagine that
a randomly moving point darts over a screen, and then the movie is played
back three times; the fact that the sources and endpoints were identical
may not mean that the sources and endpoints have any particular significance;
the rest of the path was identical too.
Thus, the perception of "source" or "endpoint" exists whenever multiple
trajectories share a starting position or ending position, and exists
more strongly when multiple different trajectories share a source
or endpoint but not other characteristics. The perception of "source"
and "endpoint" is useful when the perception reflects the underlying
cause of the initiation or termination of the trajectory.
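The distinction between a replayed movie and genuinely varying trajectories can be sketched in a few lines. The function name and the three labels ("none", "weak", "strong") are invented for this example, and trajectories are assumed to be simple lists of positions.

```python
def endpoint_binding(trajectories):
    """Classify how strongly a set of trajectories suggests a shared endpoint.

    Each trajectory is a list of (x, y) positions. Returns:
      "none"   - fewer than two trajectories, or the endpoints differ
      "weak"   - endpoints match, but so does the rest of every path
      "strong" - endpoints match while the rest of the paths vary
    """
    endpoints = {t[-1] for t in trajectories}
    if len(trajectories) < 2 or len(endpoints) != 1:
        return "none"
    # Compare everything except the final position.
    bodies = {tuple(t[:-1]) for t in trajectories}
    return "strong" if len(bodies) > 1 else "weak"


# The same random movie played back three times.
replay = [[(0, 0), (1, 2), (3, 1)]] * 3
# Three different paths that all end at the same place.
varied = [[(0, 0), (1, 2), (3, 1)], [(5, 5), (4, 3), (3, 1)], [(0, 9), (3, 1)]]
# Paths that share a start but not an end.
scattered = [[(0, 0), (1, 1)], [(0, 0), (2, 2)]]
```

The same check applied to the first position of each trajectory would give the corresponding perception of "source".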
A "source" or "endpoint" can be any characteristic shared by multiple
origins or terminating points, not just position. If the trajectory
of a grenade always ends at the location of the blue car, regardless of
where the blue car goes, then it's a good guess that someone is trying
to blow up the blue car - that the blue car is the endpoint. The
greater the variance, the lower the probability that the covariance is coincidence,
and the stronger the binding. The more unique the description of
the endpoints - e.g., the blue car was the only car which shared a location
with all endpoints, and the green car and the purple car were elsewhere
- the stronger the binding. This binding is predictive if
it can be used to predict the position of the next trajectory termination
by reference to the position of the perceived "endpoint", and manipulative
if moving the perceived "endpoint" can change the trajectories - that is,
if you can guess where the grenade will fall by looking at the blue car,
and make the grenade fall in a particular place by driving the blue car
there. If the binding is strong enough, the endpoint may deserve
the name of "target" (see below).
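The blue-car example can be written out as a small sketch. Everything here is an assumption made for illustration: the event format (an impact position plus the positions of the visible objects at that moment), the function names, and exact equality of positions as the test for "sharing a location".

```python
def bound_objects(events):
    """Return the names of objects that shared a location with every impact.

    events is a list of (impact_position, {object_name: position}) pairs.
    """
    candidates = None
    for impact, objects in events:
        here = {name for name, pos in objects.items() if pos == impact}
        candidates = here if candidates is None else candidates & here
    return candidates or set()


def impact_variety(events):
    """Number of distinct impact positions; more variety means a stronger binding."""
    return len({impact for impact, _ in events})


def predict_next_impact(events, current_positions):
    """Predictive use of the binding: guess the next impact from the bound object."""
    bound = bound_objects(events)
    if len(bound) != 1:
        return None  # no unique description of the endpoints
    (name,) = bound
    return current_positions.get(name)


events = [
    ((2, 3), {"blue": (2, 3), "green": (7, 1), "purple": (0, 0)}),
    ((5, 5), {"blue": (5, 5), "green": (7, 1), "purple": (4, 4)}),
    ((9, 0), {"blue": (9, 0), "green": (2, 3), "purple": (4, 4)}),
]
guess = predict_next_impact(events, {"blue": (1, 8), "green": (7, 1), "purple": (4, 4)})
```

Testing whether the binding is manipulative would require actually moving the blue car and observing the next impact, which is outside what a passive sketch like this can show.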
Finally, it is noteworthy that "source" and "endpoint" do not necessarily
imply that the trajector goes into and out of existence. Any interval
which bounds the trajectory, or any conditions which bound the trajectory,
or any sharp changes within the trajectory, may make salient the location
of the trajector during the boundary change. (To perform the computational
operations which check multiple trajectories for binding of sources or
endpoints, it is necessary that the source and endpoint be salient - salient
enough that the additional processing is performed which discovers the
binding.)
When defining what it means to take the intentional stance with respect
to a system, the archetypal example given is usually that of the thermostat.
A thermostat turns on a cooling system when the temperature rises above
a certain point, and turns on a heating system when the temperature falls
below a certain point. A thermostat behaves as though it "wants"
the temperature to stay within a certain range; as if the thermostat had
a goal state and deliberately resisted alterations to that goal state.
In reality, a thermostat possesses no model of reality whatsoever, but
we may still find it convenient to speak of the thermostat's behavior
as goal-oriented or "intentional".
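To make the point concrete, here is a minimal sketch of such a thermostat. The thresholds, the one-degree effect of heating or cooling per step, and the function names are assumptions chosen for the example. Note that the code contains nothing that could be called a model or a goal, only two comparisons.

```python
def thermostat(temperature, low=18.0, high=24.0):
    """Pure threshold rule: no model of the room, no representation of a goal."""
    if temperature > high:
        return "cool"
    if temperature < low:
        return "heat"
    return "off"


def simulate(start, steps, drift=0.0):
    """Run the rule in a toy room where heating or cooling shifts the temperature by one degree."""
    temperature = start
    history = []
    for _ in range(steps):
        action = thermostat(temperature)
        if action == "cool":
            temperature -= 1.0
        elif action == "heat":
            temperature += 1.0
        temperature += drift  # outside influence on the room
        history.append(temperature)
    return history


hot_start = simulate(30.0, 10)
cold_start = simulate(10.0, 10)
```

From either starting point the temperature ends up inside the range and stays there, which is exactly the behavior that tempts an observer to say the thermostat "wants" it.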
To describe a trajectory using the terms source, path, and target,
the trajector's arrival at the target must be non-coincidental. If
the trajector is continuously propelled, then use of the word "target"
usually implies that the trajector's path is self-correcting - that if
an impulse is applied which causes the trajector to depart from
the path, a correction (originating from inside or outside the trajector)
will adjust the trajectory so that the trajector continues to approach
the goal state. A trajector typically approaches the target such
that the distance between trajector and target tends to decrease continuously,
in spite of any interfering impulses. (This is not always true, particularly
in cases where the "trajector" actually is an intelligent or semi-intelligent
entity capable of taking the long way around, but you get the idea.)
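A one-dimensional sketch of a self-correcting path, under stated assumptions: the correction is taken to be proportional to the remaining distance, with a made-up `gain` of 0.5, and impulses are simply added to the position before each correction.

```python
def approach(position, target, impulses, gain=0.5):
    """Simulate a self-correcting trajector on a line.

    At each step an external impulse is applied, then a correction closes
    a fraction (gain) of the remaining distance to the target.
    Returns the distance to the target after every step.
    """
    distances = []
    for impulse in impulses:
        position += impulse                       # interfering impulse
        position += gain * (target - position)    # correction toward the target
        distances.append(abs(target - position))
    return distances


# The third step includes an impulse that knocks the trajector backward.
distances = approach(0.0, 10.0, [0, 0, -4, 0, 0, 0])
# distances == [5.0, 2.5, 3.25, 1.625, 0.8125, 0.40625]
```

The impulse briefly increases the distance, but the trend reasserts itself within a step. That recovery, rather than any single measurement, is what justifies calling the destination a "target".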
In a slightly different usage of the word "target", the trajector moves
at a constant and unalterable velocity, but tends to hit the target - or
at least come close to it - because the trajector was aimed.
(Which is how "aiming" is defined.) (Author's note: Expand
this area.)
Resistance is the name given to an "obstacle" on the way to the
target or goal state. The perception of "resistance" arises when
we observe a trajector hit some type of barrier and bounce, or slow down,
or be pushed back. The implication is that the trajector has not
merely encountered some random impulse, but that there are specific forces
preventing the achievement of a specific goal state or subgoal state.
Forcefulness is the ability to overcome resistance. The
perception of "forcefulness" - force that, to humans, is viscerally impressive
- arises when we see the trajector applying additional forces to overcome
resistance.
All of this applies, not just to actual moving objects, but to goals
in general; to the higher-level metaphor similarity is closeness.
The idea of "closeness" does not apply only to two °quantitative
attributes, but also to two structures built from a number of °qualitative
attributes. If, over time, the qualitative attributes of the first
structure are one by one adjusted so that they match the corresponding
attributes of the second structure, then the first structure is "approaching"
the second.
Mathematically, we might say that one point is approaching a second
in the multi-dimensional °phase space defined by the qualitative
attributes, but this is being overly literal. The perception of similarity
is useful when two objects being more similar means that the two
objects are more likely to behave similarly. The similarity-is-closeness
metaphor is useful and °manipulative when two objects
being "closer" means that less additional work is required to make them
match completely - one object has become closer to the target represented
by the other.
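For structures built from qualitative attributes, "closeness" can be sketched as a count of mismatched attributes, in the style of a Hamming distance. Representing a structure as a dictionary and adjusting attributes in alphabetical order are assumptions made purely for this example.

```python
def mismatches(a, b):
    """Number of attributes on which two structures differ."""
    keys = set(a) | set(b)
    return sum(1 for key in keys if a.get(key) != b.get(key))


def approach_target(current, target):
    """Adjust one differing attribute at a time, recording the distance after each step."""
    current = dict(current)  # work on a copy
    trace = [mismatches(current, target)]
    for key in sorted(target):
        if current.get(key) != target[key]:
            current[key] = target[key]
            trace.append(mismatches(current, target))
    return current, trace


first = {"color": "red", "shape": "cube", "texture": "rough"}
second = {"color": "blue", "shape": "sphere", "texture": "rough"}
final, trace = approach_target(first, second)  # trace == [2, 1, 0]
```

The falling count is the sense in which the first structure is "approaching" the second: each step leaves less work to be done before the two match completely.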
Use of the term "close" to mean "similar" is an astonishingly general
metaphor. "Close" is used to describe almost any object, event, or
situation that can "approach" a goal state. "Approach" is used as
a metaphor to describe goals in general.
The ultimate underpinning of this metaphor, in humans, may actually
be the human emotional state of tension. We feel tension
as we watch something approach a goal; tension rises as the goal comes
closer and closer... The same rising tension applies when we watch a trajector
approach a target. The closer the approach, the sharper our attention,
the more we're on the lookout for something that might go wrong at the
last second. The metaphor between spatial closeness and generalized
similarity is probably a shadow of the much stronger metaphor between approaching
a target and approaching a goal.
Generally speaking, it's a bad idea to weigh down an AI with slavish
imitations of human emotions. It may not even be necessary to duplicate
the metaphor; I'm not all that sure that the space-to-similarity metaphor
contributes to intelligence. It does seem likely that the AI will
experience (or learn) some type of heightened attention as events
approach a goal state.
For us humans, who inhabit a physical world, trying to make an object
achieve a certain position is one of the most common goal states;
position is one of the attributes that is most commonly manipulated to
reach a goal state. Indeed, we might be said to instinctively apply
the metaphor state is position. Perhaps the AI will learn
a similar set of extensive metaphors for source code.
There should probably be some type of modality-level support that indicates
the feeling of approaching a goal, so that the concept of "approaching
a goal" lies very close to the surface, and generalizations across tasks
and modalities are easy to notice. The idea of "approach" is an opening
wedge, a way to split reality along lines that reveal important regularities;
the behavior of the "trajectory" towards the goal in one task is often
usefully similar to the behavior of trajectories in other tasks.
Apr 24, 2001: GISAI 2.3.01. Uploaded printable
version. Some minor suggested bugfixes. Removed most mentions
of the phrase "Eliezer Yudkowsky" to make it clearer that GISAI is a publication
of the Singularity Institute.
Apr 18, 2001: GISAI 2.3.0. (This version number previously
reflected the addition of Creating Friendly AI, which later became
a separate document.) Changed copyright to "2001" and "Singularity
Institute" instead of legacy "2000" and "Eliezer Yudkowsky". Uploaded
multi-page version.
Sep 7, 2000: GISAI 2.2.0. Added 3.1: Time and Linearity and Interlude: The Consensus and the Veil of Maya. Uploaded
old bugfixes. 358K.
Jun 25, 2000: GISAI 2.1.0. Added Appendix A: Glossary
and Version History. Much editing, rewriting,
and wordsmithing. 220K. Not published.
May 18, 2000: GISAI 2.0a. General Intelligence
and Seed AI was originally known as Coding a Transhuman AI.
As the Singularity Institute did not yet exist at that time, CaTAI was
then copyrighted by Eliezer S. Yudkowsky. 180K.
2.4.4: The legitimate use of the word "I"
3: Cognition
3.1: Time and Linearity
3.1.1: The dangers of the system clock
3.1.2: Synchronization
3.1.3: Linear metaphors: Time, quantity, trajectory
"A general intelligence needs to be able to perceive and visualize
when two events occur at the same time; when one event precedes or follows
another event; when two sequences of events are identical or opposite-symmetrical;
and when two intervals are equal, lesser, or greater. Most of this
comes under the general heading of having a feel for time as a quantity
and time as a trajectory..."
Several of the most fundamental domains of cognition are one-dimensional
or monotonically increasing, and thus share certain linear characteristics.
In a sense, any possible use of the word "close" or "far" invokes a kind
of linear intuition. So do the words "more" and "less". Time,
because it is both monotonically increasing and one-dimensional (40), is
one of the linear domains. The linear domains tend to relate very
closely to each other - you can have "more" time or "less" time, treating
time as a quantity; you can be "close" to a given time, treating time as
a trajectory. We freely mix-and-match the words because the target
domains share behaviors and underlying properties. In some sense,
the relation between time and quantity and trajectory is not, as °Lakoff
and Johnson would call it, a "metaphor"; it is a real identity.
-- above
3.1.4: Linear intuitions: Reflection, simultaneity, interval, precedence
3.1.4.1: Reflection
Scenario 1
A glass drops, and grapes explode in the microwave,
and the computer turns itself on - and then, a few minutes later, the computer
turns itself on, grapes explode in the microwave, and a glass drops.
3.1.4.2: Simultaneity
3.1.4.3: Interval
3.1.4.4: Precedence
3.1.5: Quantity in perceptions
3.1.5.1: Zeroth, first, and second derivatives
3.1.5.2: Patterns and broken patterns
A and B are not only monotonically increasing, but steadily increasing.
The only pattern in C is that the numbers are always rising; each number,
when compared to the previous number, is greater than that previous number.
In each case, a pattern at a lower level becomes a constant feature at
a higher level. The first derivative - "increase by 1", "increase
by 2" - is a constant in A and B. In C, the feature "previous number
is less than next number" is a constant.
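A small sketch of how a lower-level pattern becomes a constant feature one level up. The sequences below are stand-ins invented for this example (the original A, B, and C are not reproduced here), and the returned labels are likewise assumptions.

```python
def first_differences(seq):
    """The first derivative of a discrete sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]


def describe(seq):
    """Report the strongest constant feature found at the next level up."""
    if len(seq) < 2:
        return ("none", None)
    diffs = first_differences(seq)
    if len(set(diffs)) == 1:
        return ("constant_step", diffs[0])   # e.g. "increase by 1"
    if all(d > 0 for d in diffs):
        return ("monotonic_increase", None)  # previous number is less than next number
    return ("none", None)


A = [1, 2, 3, 4, 5]    # stand-in: steadily increasing by 1
B = [2, 4, 6, 8, 10]   # stand-in: steadily increasing by 2
C = [1, 3, 4, 9, 11]   # stand-in: always rising, but not steadily
```

In A and B the constant is the value of the first difference; in C the only constant is the sign of the first difference.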
3.1.5.3: Salience of noticed changes
3.1.5.4: Feature extractors for general quantities
The lack of these simple intuitions is one of the reasons why computer
programs look so stupid to humans. We always notice when salient
quantities change; most programs are incapable of noticing anything at
all, unless specifically programmed, and they certainly aren't programmed
to notice the general properties of the things they notice. A bank
account won't notice if you make one deposit a day, then suddenly make
ten deposits in one day, then go back to one deposit a day; it's programmed
to handle financial transactions, but not notice patterns in them.
Since knowing about a deposit is a high-level perception to a human - one
which rises all the way to the level of conscious attention - we automatically
compute the basic quantitative perceptions and notice any unexpected equalities
or unexpected changes.
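The bank-account example suggests how little code the missing intuition would take in its crudest form. The sketch below assumes the only "expectation" is that today's count equals yesterday's; the function name and the tuple format are invented for illustration.

```python
def noticed_changes(daily_counts):
    """Report each day whose count differs from the previous day's count.

    Returns a list of (day_index, expected, observed) tuples.
    """
    notices = []
    for day in range(1, len(daily_counts)):
        expected, observed = daily_counts[day - 1], daily_counts[day]
        if observed != expected:
            notices.append((day, expected, observed))
    return notices


# One deposit a day, then ten in one day, then back to one a day.
deposits_per_day = [1, 1, 1, 10, 1, 1]
notices = noticed_changes(deposits_per_day)  # [(3, 1, 10), (4, 10, 1)]
```

A general feature extractor would apply the same comparison to any salient quantity, not just to deposits, which is precisely what an ordinary banking program is never written to do.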
3.1.6: Trajectories
"Trajectory" can also be generalized to any series of changes to a single
object, any series of modulations to a state, that takes place over time
and has a definite beginning and end; any perception that changes continuously,
and smoothly or monotonically enough to be perceived as a trajectory rather
than a series of unrelated change-events (55).
The trajectory behaviors - especially trajectories with definite beginnings
and ends and directions - intersect planning, which intersects goals, which
is a different topic. However, we will discuss intuitions that have
°intentional aspects - goal-oriented characteristics - such
as force and resistance.
3.1.6.1: Identification of single objects across temporal experiences
Rules of Identification
1. Equality of attributes across experiences, particularly those attributes
that remain constant for constant objects, implies equality of identity.
2. Continuous change in an attribute, particularly those attributes
that can change without changing the underlying object - such as "position"
or "speed" - implies equality of identity.
Rule of Improbability Binding
When two images are equal or very similar, the probability that
there is a shared underlying cause behind the equality is proportional
to the improbability of a coincidental equality.
Rules of Objectification
1. Objects constitute a major source of regularities in reality, and many
heuristics - perhaps even modality-level feature extractors - will operate
on objects rather than experiences.
2. Objects often continue to exist even when they are not directly
experienced, and may require continuous modeling.
3. Objects will often have internal attributes and complex, dynamic
internal structure.
4. All nonvisible attributes of an object remain constant across
experiences, unless there is a reason to expect them to change. (If
the object has intrinsic variability, then the description of the variability
remains constant.)
3.1.6.2: Defining attributes of sources, trajectors, and destinations
Rule of Variance Binding
Multiple, variant experiences sharing a single higher-level characteristic, but not
others, means that the shared characteristic is likely to be significant.
Multiple identical experiences can have any number of possible sources;
only if at least some properties differ is there a reason to focus on a
particular shared characteristic as opposed to others.
3.1.6.3: Source, path, target; impulse, correction, resistance, and forcefulness
Version History
May 18, 2001: GISAI 2.3.02. Split the
original document, "Coding a Transhuman AI", into General Intelligence and Seed AI
and Creating Friendly AI. Minor assorted bugfixes. GISAI
now 349K.
Appendix A: Glossary
| NOTE: | If a referenced item does not appear in this glossary, it may be defined in Creating Friendly AI. |
Since this curve folds in on itself, most "reasonable" images of the local curves for intelligence and efficiency, when combined, are likely to result in a breakthrough-and-bottleneck series at the global level. At least, this is what's likely to happen in the prehuman areas of the curve. Once a breakthrough carries the seed AI past the human level, I would expect the °nanotechnology-to-°Transition Guide "curve" to take over.
Introspection, like evolutionary reasoning, is an incredibly powerful
tool. Like evolutionary reasoning, it takes practice, talent, and
self-awareness to use it on a professional level - to reliably distinguish
between post facto and "pre facto" (10) reasoning, or between original
thought and °cached thought.
Some people, maybe even a majority of readers, may not have needed to
visualize the car smashing before deducing that it would break - or, rather,
accepting that the sentence "Dropping an anvil on a car will break it"
is true - or, rather, continuing to read without noticing that the sentence
was false.
If proprioception does have a separate area of cortex (with distinct representations and extractable features), then it's a distinct sensory modality and should be known as such.
Otherwise, it's like suggesting that translating between Microsoft Word and HTML should be programmatically trivial because both files are really just magnetic patterns in the atoms of the hard disk. What matters is the level where they're different - that's where the Law of Pragmatism says the intelligence is. And if they aren't different anywhere - why, then, there's probably no intelligence.
It [the thalamus] has a simple position in the overall architecture; virtually all information arriving at the cerebral cortex comes from the thalamus, which receives it from subcortical structures... In particular, all visual, auditory, tactile, and proprioceptive information passes through the thalamus on its way to cortex... These facts give rise to the classic view that the thalamus is a passive relay station which generates virtually all the information bearing input to the cortex...
BUT the above picture has omitted one fundamental fact: all projections from thalamus to cortex are reciprocated by feedback projections from cortex to thalamus of the same or even larger size. For instance, Sherman and Koch (1986) estimate that in cat there are roughly 10^6 fibers from the lateral geniculate nucleus in the thalamus to the visual cortex, but 10^7 fibers in the reverse direction! (Italics in original.)
The most popular hypothesis is that these fibers play a gatekeeping role, assisting in focus of attention (why do you need more fibers to do that?); or, more plausibly, top-down constraints in feature extraction. And since this particular statistic is for cats, the latter hypothesis may be mostly correct. Visualization - imagination - is stereotypically associated with minds directed by general intelligence. While cats may need a memory, and thus the ability to reconstruct images from remembered high-level features, they probably don't need the detailed, fine-grained imagination of a human. So I wouldn't be surprised to find an even greater discrepancy in humans!
Or perhaps, even for cats, more fibers go from cortex to thalamus than vice versa because even mnemonic sensory manipulation is just computationally harder than sensory perception.
It may be that this is a genuine instance of a physical property of the underlying neurons that would be very hard to duplicate as an external heuristic, without creating an additional layer of neuronlike interpreted code. However, I think that procedural pattern-detectors, plus the ability to learn heuristics about which pattern-detectors to apply and when, should be able to match the effectiveness of biological neurons at forming expectations and detecting patterns.
Our neural ability to adapt to unexpected new patterns may be simulable by trying to detect identity or covariance in a few thousand entirely random quantities, every now and then.
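One way to picture such a random probe is sketched below. The details are assumptions rather than anything specified above: quantities are named lists of numbers, "covariance" is tested with a Pearson correlation against an arbitrary threshold of 0.95, and the probe samples pairs without replacement.

```python
import itertools
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def random_probe(quantities, samples, rng, threshold=0.95):
    """Check a random selection of quantity pairs for identity or covariance."""
    pairs = list(itertools.combinations(sorted(quantities), 2))
    found = []
    for a, b in rng.sample(pairs, min(samples, len(pairs))):
        xs, ys = quantities[a], quantities[b]
        if xs == ys:
            found.append((a, b, "identical"))
        elif abs(pearson(xs, ys)) >= threshold:
            found.append((a, b, "covariant"))
    return found


quantities = {
    "a": [1, 2, 3, 4],
    "b": [1, 2, 3, 4],      # identical to a
    "c": [2, 4, 6, 8.5],    # closely tracks a and b
    "d": [3, 1, 4, 1],      # unrelated
}
found = random_probe(quantities, samples=6, rng=random.Random(0))
```

With only four quantities the probe can afford to check every pair; with a few thousand, sampling a small number of pairs "every now and then" is what keeps the cost bounded.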