LEVELS OF ORGANIZATION IN GENERAL INTELLIGENCE

Eliezer S. Yudkowsky
Singularity Institute for Artificial Intelligence

To appear in
Ben Goertzel and Cassio Pennachin, eds.
"Artificial General Intelligence"


Abstract

Part I discusses the conceptual foundations of general intelligence as a discipline, orienting it within the Integrated Causal Model of Tooby and Cosmides.  Part II constitutes the bulk of the paper and discusses the functional decomposition of general intelligence into a complex supersystem of interdependent internally specialized processes, and structures the description using five successive levels of functional organization:  Code, sensory modalities, concepts, thoughts, and deliberation.  Part III discusses probable differences between humans and AIs and points out several fundamental advantages that minds-in-general potentially possess relative to current evolved intelligences, especially with respect to recursive self-improvement.


1: Part I: Foundations of general intelligence

What is intelligence?  In humans, intelligence is a brain with a hundred billion neurons and a hundred trillion synapses; a brain in which the cerebral cortex alone is organized into 52 cytoarchitecturally distinct areas per hemisphere.  Intelligence is not the complex expression of a simple principle; intelligence is the complex expression of a complex set of principles.  Intelligence is a supersystem composed of many mutually interdependent subsystems - subsystems specialized not only for particular environmental skills but for particular internal functions.  The heart is not a specialized organ that enables us to run down prey; the heart is a specialized organ that supplies oxygen to the body.  Remove the heart and the result is not a less efficient human, or a less specialized human; the result is a system that ceases to function.

Why is intelligence?  The cause of human intelligence is evolution - the operation of natural selection on a genetic population in which organisms reproduce differentially depending on heritable variation in traits.  Intelligence is an evolutionary advantage because it enables us to model, predict, and manipulate reality.  Evolutionary problems are not limited to stereotypical ancestral contexts such as fleeing lions or chipping spears; our intelligence includes the ability to model social realities consisting of other humans, and the ability to predict and manipulate the internal reality of the mind.  Philosophers of the mind sometimes define "knowledge" as cognitive patterns that map to external reality [Newell80], but a surface mapping has no inherent evolutionary utility.  Intelligence requires more than passive correspondence between internal representations and sensory data, or between sensory data and reality.  Cognition goes beyond passive denotation; it can predict future sensory data from past experience.  Intelligence requires correspondences strong enough for the organism to choose between futures by choosing actions on the basis of their future results.  Intelligence in the fully human sense requires the ability to manipulate the world by reasoning backward from a mental image of the desired outcome to create a mental image of the necessary actions.  (In Part II, these ascending tests of ability are formalized as sensory, predictive, decisive, and manipulative bindings between a model and a referent.)

Understanding the evolution of the human mind requires more than classical Darwinism; it requires the modern "neo-Darwinian" or "population genetics" understanding of evolution - the Integrated Causal Model set forth by [Tooby92].  One of the most important concepts in the ICM is that of "complex functional adaptation".  Evolutionary adaptations are driven by selection pressures acting on genes.  A given gene's contribution to fitness is determined by regularities of the total environment, including both the external environment and the genetic environment.  Adaptation occurs in response to statistically present genetic complexity, not just statistically present environmental contexts.  A new adaptation that requires the presence of a previous adaptation cannot spread unless the prerequisite adaptation is present in the genetic environment with sufficient statistical regularity to make the new adaptation a recurring evolutionary advantage.  Evolution uses existing genetic complexity to build new genetic complexity, but evolution exhibits no foresight.  Evolution does not construct genetic complexity unless it is an immediate advantage, and this is a fundamental constraint on accounts of the evolution of complex systems.

Complex functional adaptations - adaptations that require multiple genetic features to build a complex interdependent system in the phenotype - are usually, and necessarily, universal within a species.  Independent variance in each of the genes making up a complex interdependent system would quickly reduce to insignificance the probability of any phenotype possessing a full functioning system.  To give an example in a simplified world, if independent genes for "retina", "lens", "cornea", "iris", and "optic nerve" each had an independent 20% frequency in the genetic population, the random-chance probability of any individual being born with a complete eyeball would be 1 in 3125.
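
The arithmetic behind the eyeball example can be checked directly.  The following is a minimal sketch, assuming (as the simplified example does) that the five genes are statistically independent; the function name prob_complete_system is introduced here purely for illustration.

```python
from fractions import Fraction

def prob_complete_system(gene_frequencies):
    """Probability that one individual carries every gene in the list,
    assuming each gene is inherited independently of the others."""
    p = Fraction(1)
    for frequency in gene_frequencies:
        p *= frequency
    return p

# The five hypothetical "genes" from the simplified world, each at 20% frequency.
eye_genes = ["retina", "lens", "cornea", "iris", "optic nerve"]
frequencies = [Fraction(1, 5)] * len(eye_genes)

print(prob_complete_system(frequencies))  # 1/3125
```

Under the same independence assumption, the probability falls geometrically with each additional required gene, which is why independent variance in the parts of an interdependent system cannot persist.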

Natural selection, while feeding on variation, uses it up [Sober84].  The bulk of genetic complexity in any single organism consists of a deep pool of panspecies complex functional adaptations, with selection pressures operating on a surface froth of individual variations. The target matter of Artificial Intelligence is not the surface variation that makes one human slightly smarter than another human, but rather the vast store of complexity that separates a human from an amoeba.  We must avoid distraction by the surface variations that occupy the whole of our day-to-day social universe.  The differences between humans are the points on which we compete and the features we use to recognize our fellows, and thus it is easy to slip into paying them too much attention.

A still greater problem for would-be analysts of panhuman complexity is that the foundations of the mind are not open to introspection.  We perceive only the highest levels of organization of the mind.  You can remember a birthday party, but you cannot remember your hippocampus encoding the memory.

Is either introspection or evolutionary argument relevant to AI?  To what extent can truths about humans be used to predict truths about AIs, and to what extent does knowledge about humans enable us to create AI designs?  If the sole purpose of AI as a research field is to test theories about human cognition, then only truths about human cognition are relevant.  But while human cognitive science constitutes a legitimate purpose, it is not the sole reason to pursue AI; one may also pursue AI as a goal in its own right, in the belief that AI will be useful and beneficial.  From this perspective, what matters is the quality of the resulting intelligence, and not the means through which it is achieved.  However, proper use of this egalitarian viewpoint should be distinguished from historical uses of the "bait-and-switch technique" in which "intelligent AI" is redefined away from its intuitive meaning of "AI as recognizable person", simultaneously with the presentation of an AI design which leaves out most of the functional elements of human intelligence and offers no replacement for them.  There is a difference between relaxing constraints on the means by which "intelligence" can permissibly be achieved, and lowering the standards by which we judge the results as "intelligence".  It is thus permitted to depart from the methods adopted by evolution, but is it wise?

Evolution often finds good ways, but rarely the best ways.  Evolution is a useful inspiration but a dangerous template.  Evolution is a good teacher, but it's up to us to apply the lessons wisely.  Humans are not good examples of minds-in-general; humans are an evolved species with a cognitive and emotional architecture adapted to hunter-gatherer contexts and cognitive processes tuned to run on a substrate of massively parallel 200Hz biological neurons.  Humans were created by evolution, an unintelligent process; AI will be created by the intelligent processes that are humans.

Because evolution lacks foresight, complex functions cannot evolve unless their prerequisites are evolutionary advantages for other reasons.  The human evolutionary line did not evolve toward general intelligence; rather, the hominid line evolved smarter and more complex systems that lacked general intelligence, until finally the cumulative store of existing complexity contained all the tools and subsystems needed for evolution to stumble across general intelligence.  Even this is too anthropocentric; we should say rather that primate evolution stumbled across a fitness gradient whose path includes the subspecies Homo sapiens sapiens, which subspecies exhibits one particular kind of general intelligence.

The human designers of an AI, unlike evolution, will possess the ability to plan ahead for general intelligence.  Furthermore, unlike evolution, a human planner can jump sharp fitness gradients by executing multiple simultaneous actions; a human designer can use foresight to plan multiple new system components as part of a coordinated upgrade.  A human can take present actions based on anticipated forward compatibility with future plans.
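
The difference between an optimizer limited to immediately advantageous single changes and a planner able to coordinate several changes at once can be illustrated with a toy fitness landscape.  This is a sketch under stated assumptions, not a model of any real evolutionary process: the two-component design, the fitness values, and the names FITNESS, neighbors, and climb are all hypothetical.

```python
from itertools import product

# Hypothetical landscape: either component alone is harmful,
# but both components together beat the starting design.
FITNESS = {(0, 0): 1.0, (1, 0): 0.0, (0, 1): 0.0, (1, 1): 2.0}

def neighbors(design, max_changes):
    """All designs that differ from `design` in 1..max_changes components."""
    result = []
    for candidate in product((0, 1), repeat=len(design)):
        changed = sum(a != b for a, b in zip(design, candidate))
        if 1 <= changed <= max_changes:
            result.append(candidate)
    return result

def climb(start, max_changes):
    """Greedy ascent that accepts only strict, immediate improvements."""
    current = start
    while True:
        best = max(neighbors(current, max_changes), key=FITNESS.get)
        if FITNESS[best] <= FITNESS[current]:
            return current
        current = best

print(climb((0, 0), max_changes=1))  # (0, 0): stuck, every single change is worse
print(climb((0, 0), max_changes=2))  # (1, 1): a coordinated double change succeeds
```

The single-change climber stands in for a process without foresight; the two-change climber stands in for a designer who plans both components as one upgrade.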

Thus, the ontogeny of an AI need not recapitulate human phylogeny.  Because evolution cannot stumble across grand supersystem designs until the subsystems have evolved for other reasons, the phylogeny of the human line is characterized by development from very complex non-general intelligence to very complex general intelligence through the layered accretion of adaptive complexity lying within successive levels of organization.  In contrast, a deliberately designed AI is likely to begin as a set of subsystems in a relatively primitive and undeveloped state, but nonetheless already designed to form a functioning supersystem[1].  Because human intelligence is evolutionarily recent, the vast bulk of the complexity making up a human evolved in the absence of general intelligence; the rest of the system has not yet had time to adapt.  Once an AI supersystem possesses any degree of intelligence at all, no matter how primitive, that intelligence becomes a tool which can be used in the construction of further complexity.

Where the human line developed from very complex non-general intelligence into very complex general intelligence, a successful AI project is more likely to develop from a primitive general intelligence into a complex general intelligence.  Note that primitive does not mean architecturally simple.  The right set of subsystems, even in a primitive and simplified state, may be able to function together as a complete but imbecilic mind which then provides a framework for further development.  This does not imply that AI can be reduced to a single algorithm containing the "essence of intelligence".  A cognitive supersystem may be "primitive" relative to a human and still require a tremendous amount of functional complexity.

I am admittedly biased against the search for a single essence of intelligence; I believe that the search for a single essence of intelligence lies at the center of AI's previous failures.  Simplicity is the grail of physics, not AI.  Physicists win Nobel Prizes when they discover a previously unknown underlying layer and explain its behaviors.  We already know what the ultimate bottom layer of an Artificial Intelligence looks like; it looks like ones and zeroes.  Our job is to build something interesting out of those ones and zeroes.  The Turing formalism does not solve this problem any more than quantum electrodynamics tells us how to build a bicycle; knowing the abstract fact that a bicycle is built from atoms doesn't tell you how to build a bicycle out of atoms - which atoms to use and where to put them.  Similarly, the abstract knowledge that biological neurons implement human intelligence does not explain human intelligence.  The classical hype of early neural networks, that they used "the same parallel architecture as the human brain", should, at most, have been a claim of using the same parallel architecture as an earthworm's brain.  (And given the complexity of biological neurons, the claim would still have been wrong.)

"The science of understanding living organization is very different from physics or chemistry, where parsimony makes sense as a theoretical criterion.  The study of organisms is more like reverse engineering, where one may be dealing with a large array of very different components whose heterogeneous organization is explained by the way in which they interact to produce a functional outcome.  Evolution, the constructor of living organisms, has no privileged tendency to build into designs principles of operation that are simple and general."
            -- Leda Cosmides and John Tooby, "The Psychological Foundations of Culture" [Tooby92]

The field of Artificial Intelligence suffers from a heavy, lingering dose of genericity and black-box, blank-slate, tabula-rasa concepts seeping in from the Standard Social Science Model (SSSM) identified by [Tooby92].  The general project of liberating AI from the clutches of the SSSM is more work than I wish to undertake in this paper, but one problem that must be dealt with immediately is physics envy.  The development of physics over the last few centuries has been characterized by the discovery of unifying equations which neatly underlie many complex phenomena.  Most of the past fifty years in AI might be described as the search for a similar unifying principle believed to underlie the complex phenomenon of intelligence.

Physics envy in AI is the search for a single, simple underlying process, with the expectation that this one discovery will lay bare all the secrets of intelligence.  The tendency to treat new approaches to AI as if they were new theories of physics may at least partially explain AI's past history of overpromise and oversimplification.  Attributing all the vast functionality of human intelligence to some single descriptive facet - that brains are "parallel", or "distributed", or "stochastic"; that minds use "deduction" or "induction" - results in a failure (an overhyped failure) as the project promises that all the functionality of human intelligence will slide out from some simple principle.

The effects of physics envy can be more subtle; they also appear in the lack of interaction between AI projects.  Physics envy has given rise to a series of AI projects that could only use one idea, as each new hypothesis for the one true essence of intelligence was tested and discarded.  Douglas Lenat's AM and EURISKO programs [Douglas83] - though the results were controversial and may have been mildly exaggerated [Ritchie84] - nonetheless used very intriguing and fundamental design patterns to deliver significant and unprecedented results.  Despite this, the design patterns of EURISKO, such as self-modifying decomposable heuristics, have seen almost no reuse in later AIs.  Even Lenat's subsequent Cyc project [Lenat86] apparently does not reuse the ideas developed in EURISKO.  From the perspective of a modern-day programmer, accustomed to hoarding design patterns and code libraries, the lack of crossfertilization is a surprising anomaly.  One would think that self-optimizing heuristics would be useful as an external tool, e.g. for parameter tuning, even if the overall cognitive architecture did not allow for the internal use of such heuristics.  The AI field seems to have treated EURISKO as a failed hypothesis, or even a competing hypothesis, rather than an incremental success or a reusable tool.

The most common paradigms of traditional AI - search trees, neural networks, genetic algorithms, evolutionary computation, semantic nets - have in common the property that they can be implemented without requiring a store of preexisting complexity.  The processes that have become traditional, that have been reused, are the tools that stand alone and are immediately useful.  A semantic network is a "knowledge" representation so simple that it is literally writable on paper.  An AI project adding a semantic network need not design a hippocampus-equivalent to form memories, nor build a sensory modality to represent mental imagery.  The traditional AI processes accompanying semantic nets - such as theorem proving, case-based reasoning, production systems, and expert systems - are again standalone algorithms.  Neural networks and evolutionary computations are not generally intelligent but they are generically intelligent; they can be trained on any problem that has a sufficiently shallow fitness gradient relative to available computing power.  (Though EURISKO's self-modifying heuristics probably had generality equalling or exceeding these more typical tools, the source code was not open and the system design was far too complex to build over an afternoon, so the design pattern was not reused - or so I would guess.)
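
To make concrete how little preexisting complexity a semantic network demands, here is a minimal sketch of one.  The triples, the relation names, and the function related are illustrative assumptions and are not drawn from any particular AI system; the point is only that the whole "knowledge" representation fits in a few lines and stands alone.

```python
# A hypothetical semantic network: nothing but labeled links between tokens.
TRIPLES = [
    ("light bulb", "is_a", "artifact"),
    ("light bulb", "made_of", "glass"),
    ("glass", "has_property", "fragile"),
]

def related(node, relation):
    """Return every token linked from `node` by `relation`."""
    return [obj for subj, rel, obj in TRIPLES if subj == node and rel == relation]

print(related("light bulb", "made_of"))  # ['glass']
```

Nothing in this structure forms memories, represents imagery, or grounds the token "glass" in anything beyond its links, which is exactly why it can be built in an afternoon.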

The standalone nature of the traditional processes may make them useful tools for shoring up the initial stages of a general AI supersystem - with the exception of the semantic network; I regard semantic nets as poisonous to AI research for reasons which should shortly become clear.  But standalone algorithms are not substitutes for intelligence and they are not complete systems.  Genericity is not the same as generality.

"Physics envy" (trying to replace the human cognitive supersystem with a single process or method) should be distinguished from the less ambitious attempt to clean up the human mind design while leaving the essential architecture intact.  Cleanup is probably inevitable while human programmers are involved, but it is nonetheless a problem to be approached with extreme caution.  Although the population genetics model of evolution admits of many theoretical reasons why the presence of a feature may not imply adaptiveness (much less optimality), in practice the adaptationists usually win.  The spandrels of San Marco may not have been built for decorative elegance [Gould79], but they are still holding the roof up.  Cleanup should be undertaken, not with pride in the greater simplicity of human design relative to evolutionary design, but with a healthy dose of anxiety that we will leave out something important.

An example:  Humans are currently believed to have a modular adaptation for visual face recognition, generally identified with a portion of inferotemporal cortex, though this is a simplification [Rodman99].  At first glance this brainware appears to be an archetypal example of human-specific functionality, an adaptation to an evolutionary context with no obvious analogue for an early-stage AI.  However, [Carey92] has suggested from neuropathological evidence (associated deficits) that face recognition brainware is also responsible for the generalized task of acquiring very fine expertise in the visual domain; thus, the dynamics of face recognition may be of general significance for builders of sensory modalities.

Another example is the sensory modalities themselves.  As described in greater detail in Part II, the human cognitive supersystem is built to require the use of the sensory modalities which we originally evolved for other purposes.  One good reason why the human supersystem uses sensory modalities is that the sensory modalities are there.  Sensory modalities are evolutionarily ancient; they would have existed, in primitive or complex form, during the evolution of all higher levels of organization.  Neural tissue was already dedicated to sensory modalities, and would go on consuming ATP[2] even if inactive, albeit at a lesser rate.  Consider the incremental nature of adaptation, so that in the very beginnings of hominid intelligence only a very small amount of de novo complexity would have been involved; consider that evolution has no inherent drive toward design elegance; consider that adaptation is in response to the total environment, which includes both the external environment and the genetic environment - these are all plausible reasons to suspect evolution of offloading the computational burden onto pre-existing neural circuitry, even where a human designer would have chosen to employ a separate subsystem.  Thus, it was not inherently absurd for AI's first devotees to try for general intelligence that employed no sensory modalities.

Today we have at least one reason to believe that nonsensory intelligence is a bad approach; we tried it and it didn't work.  Of course this is far too general an argument - it applies equally to "we tried non-face-recognizing intelligence and it didn't work" or even "we tried non-bipedal intelligence and it didn't work".  The argument's real force derives from specific hypotheses about the functional role of sensory modalities in general intelligence (discussed in Part II).  But in retrospect we can identify at least one methodological problem:  Rather than identifying the role played by modalities in intelligence, and then attempting to "clean up" the design by substituting a simpler process into the functional role played by modalities[3], the first explorers of AI simply assumed that sensory modalities were irrelevant to general intelligence.

Leaving out key design elements, without replacement, on the basis of the mistaken belief that they are not relevant to general intelligence, is an error that displays a terrifying synergy with "physics envy".  In extreme cases - and most historical cases have been extreme - the design ignores everything about the human mind except one characteristic (logic, distributed parallelism, fuzziness, etc.), which is held to be "the key to intelligence".

I argue strongly for "supersystems", but I do not believe that "supersystems" are the necessary and sufficient Key to AI.  Human general intelligence requires the right supersystem, with the right cognitive subsystems, doing the right things in the right way.  Humans are not intelligent by virtue of being "supersystems", but by virtue of being a particular supersystem which implements human intelligence.  I emphasize supersystem design because I believe that the field of AI has been crippled by the wrong kind of simplicity - a simplicity which, as a design constraint, rules out workable designs for intelligence; a simplicity which, as a methodology, rules out incremental progress toward an understanding of general intelligence; a simplicity which, as a viewpoint, renders most of the mind invisible except for whichever single aspect is currently promoted as the Key to AI.

If the quest for design simplicity is to be "considered harmful"[4], what should replace it?  I believe that rather than simplicity, we should pursue sufficiently complex explanations and usefully deep designs.  In ordinary programming, there is no reason to assume a priori that the task is enormously large.  In AI the rule should be that the problem is always harder and deeper than it looks, even after you take this rule into account.  Knowing that the task is large does not enable us to meet the challenge just by making our designs larger or more complicated; certain specific complexity is required, and complexity for the sake of complexity is worse than useless.  Nonetheless, the presumption that we are more likely to underdesign than overdesign implies a different attitude towards design, in which victory is never declared, and even after a problem appears to be solved, we go on trying to solve it.  If this creed were to be summed up in a single phrase, it would be:  "Necessary but not sufficient."  In accordance with this creed, it should be emphasized that supersystems thinking is only one part of a larger paradigm, and that an open-ended design process is itself "necessary but not sufficient".  These are first steps toward AI, but not the only first steps, and certainly not the last steps.



2: Part II: Levels of organization in deliberative general intelligence

Intelligence in the human cognitive supersystem is the result of the many cognitive processes taking place on multiple levels of organization.  However, this statement is vague without hypotheses about specific levels of organization and specific cognitive phenomena.  The concrete theory presented in Part II goes under the name of "deliberative general intelligence" (DGI).

The human mind, owing to its accretive evolutionary origin, has several major distinct candidates for the mind's "center of gravity".  For example, the limbic system is an evolutionarily ancient part of the brain that now coordinates activities in many of the other systems that later grew up around it.  However, in (cautiously) considering what a more foresightful and less accretive design for intelligence might look like, I find that a single center of gravity stands out as having the most complexity and doing most of the substantive work of intelligence, such that in an AI, to an even greater degree than in humans, this center of gravity would probably become the central supersystem of the mind.  This center of gravity is the cognitive superprocess which is introspectively observed by humans through the internal narrative - the process whose workings are reflected in the mental sentences that we internally "speak" and internally "hear" when thinking about a problem.  To avoid the awkward phrase "stream of consciousness" and the loaded word "consciousness", this cognitive superprocess will hereafter be referred to as deliberation.


2.1: An illustration of principles

My chosen entry point into deliberation is words - that is, the words we mentally speak and mentally hear in our internal narrative.  Let us take the word "lightbulb" (or the wordlike phrase "light bulb") as an example[5].  When you see the letters spelling "light bulb", the phonemes for light bulb flow through your auditory cortex.  If a mental task requires it, a visual exemplar for the "light bulb" category may be retrieved as mental imagery in your visual cortex (and associated visual areas).  Some of your past memories and experiences, such as accidentally breaking a light bulb and carefully sweeping up the sharp pieces, may be associated with or stored under the "light bulb" concept.  "Light bulb" is associated with other concepts; in cognitive priming experiments, it has been shown that hearing a phrase such as "light bulb"[6] will prime associated words such as "fluorescent" or "fragile", increasing the recognition speed or reaction speed when associated words are presented [Meyer71].  The "light bulb" concept can act as a mental category; it describes some referents in perceived sensory experiences or internal mental imagery, but not other referents; and, among the referents it describes, it describes some strongly and others only weakly.

To further expose the internal complexity of the "light bulb" concept, I would like to offer an introspective illustration.  I apologize to any readers who possess strong philosophical prejudices against introspection; I emphasize that the exercise is not intended as evidence for a theory, but rather as a means of introducing and grounding concepts that will be argued in more detail later.  That said:

Close your eyes, and try to immediately (without conscious reasoning) visualize a triangular light bulb - now.  Did you do so?  What did you see?  On personally performing this test for the first time, I saw a pyramidal light bulb, with smoothed edges, with a bulb on the square base.  Perhaps you saw a tetrahedral light bulb instead of a pyramidal one, or a light bulb with sharp edges instead of smooth edges, or even a fluorescent tube bent into an equilateral triangle.  The specific result varies; what matters is the process you used to arrive at the mental imagery.

Our mental image for "triangular light bulb" would intuitively appear to be the result of imposing "triangular", the adjectival form of "triangle", on the "light bulb" concept.  That is, the novel mental image of a triangular light bulb is apparently the result of combining the sensory content of two pre-existing concepts.  (DGI[7] does not hold otherwise, but the assumption deserves to be pointed out explicitly.)  Similarly, the combination of the two concepts is not a collision, but a structured imposition; "triangular" is imposed on "light bulb", and not "light-bulb-like" on "triangle".

The structured combination of two concepts is a major cognitive process.  I emphasize that I am not talking about interesting complexity which is supposedly to be found in the overall pattern of relations between concepts; I am talking about complexity which is directly visible in the specific example of imposing "triangular" on "light bulb".  I am not "zooming out" to look at the overall terrain of concepts, but "zooming in" to look at the cognitive processes needed to handle this single case.  The specific example of imposing "triangular" on "light bulb" is a nontrivial feat of mind; "triangular light bulb" is a trickier concept combination than "green light bulb" or "triangular parking lot".

The mental process of visualizing a "triangular light bulb" flashes through the mind very quickly; it may be possible to glimpse subjective flashes of the concept combination, but the process is not really open to human introspection.  For example, when first imposing "triangular" on "light bulb", I would report a brief subjective flash of a conflict arising from trying to impose the planar 2-D shape of "triangular" on the 3-D "light bulb" concept.  However, before this conflict could take place, it would seem necessary that some cognitive process have already selected the shape facet of "triangular" for imposition - as opposed to, say, the color or line width of the "triangle" exemplar that appears when I try to visualize a "triangle" as such.  However, this initial selection of shape as the key facet did not rise to the level of conscious attention.  I can guess at the underlying selection process - in this case, that past experience with the usage had already "cached" shape as the salient facet for the concept triangular, and that the concept was abstracted from an experiential base in which shape, but not color, was the perceived similarity within the group of experiences.  However, I cannot actually introspect on this selection process.

Likewise, I may have glimpsed the existence of a conflict, and that it was a conflict resulting from the 2D nature of "triangular" versus the 3D nature of "light bulb", but how the conflict was detected is not apparent in the subjective glimpse.  And the resolution of the conflict, the transformation of the 2D triangle shape into a 3D pyramid shape, was apparently instantaneous from my introspective vantage point.  Again, I can guess at the underlying process - in this case, that several already-associated conceptual neighbors of "triangle" were imposed on "light bulb" in parallel, and the best fit selected.  But even if this explanation is correct, the process occurred too fast to be visible to direct introspection.  I cannot rule out the possibility that a more complex, more deeply creative process was involved in the transition from triangle to pyramid, although basic constraints on human information-processing (the 200 spike/second speed limit of the underlying neurons) still apply.  Nor can I rule out the possibility that there was a unique serial route from triangle to pyramid.
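
The guessed-at process, in which several conceptual neighbors of "triangle" are tried against "light bulb" and the best fit is kept, can be written down as a toy sketch.  Everything in it is hypothetical: the Shape record, the association strengths, and the scoring rule are assumptions made for illustration, and the candidates are scored one at a time where the guess posits parallel evaluation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Shape:
    name: str
    dimensions: int
    association: float  # hypothetical strength of association with "triangle"

# Assumed conceptual neighbors of "triangle", including "triangle" itself.
NEIGHBORS_OF_TRIANGLE = [
    Shape("triangle", 2, 1.0),
    Shape("pyramid", 3, 0.8),
    Shape("tetrahedron", 3, 0.7),
    Shape("cone", 3, 0.4),
]

def fit(shape, target_dimensions):
    """Zero on a dimensional conflict, otherwise the association strength."""
    if shape.dimensions != target_dimensions:
        return 0.0
    return shape.association

def impose_on(target_dimensions, candidates):
    """Score every candidate against the target and keep the best fit, if any."""
    scored = [(fit(shape, target_dimensions), shape) for shape in candidates]
    best_score, best_shape = max(scored, key=lambda pair: pair[0])
    return best_shape if best_score > 0 else None

# A light bulb is a 3-D object, so the 2-D "triangle" itself scores zero.
print(impose_on(3, NEIGHBORS_OF_TRIANGLE).name)  # pyramid
```

With different assumed strengths the sketch would return "tetrahedron" instead, matching the observation that the specific result varies between people while the process stays the same.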

The creation of an actual visuospatial image of a pyramidal light bulb is, presumably, a complex visual process - one that implies the ability of the visuospatial modality to reverse the usual flow of information and send commands from high-level features to low-level features, instead of detecting high-level features from low-level features.  DGI hypothesizes that visualization occurs through a flow from high-level feature controllers to low-level feature controllers, creating an articulated mental image within a sensory modality through a multistage process that allows the detection of conflicts at higher levels before proceeding to lower levels.  The final mental imagery is introspectively visible, but the process that creates it is mostly opaque.
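
A multistage flow of this kind, in which each level is checked for conflicts before control passes to the level below, might be sketched as follows.  This is a schematic under stated assumptions and not DGI's actual mechanism: the two stages, the dictionary representation of an image, and all function names are invented for illustration.

```python
def check_shape(image):
    """High-level check: does the imposed shape fit the object's dimensionality?"""
    if image["shape_dims"] != image["object_dims"]:
        return "2-D shape imposed on 3-D object"
    return None

def refine_shape(image):
    """Pass the accepted shape down as an edge specification."""
    edges = "sharp" if image["shape"] == "pyramid" else "smooth"
    return {**image, "edges": edges}

def check_edges(image):
    """Placeholder lower-level check; this sketch never raises a conflict here."""
    return None

def refine_edges(image):
    """Lowest stage in this sketch: mark the image as fully articulated."""
    return {**image, "rendered": True}

# Stages are ordered from high-level features to low-level features.
STAGES = [
    ("shape", check_shape, refine_shape),
    ("edges", check_edges, refine_edges),
]

def visualize(spec):
    """Run the stages top-down, stopping at the first level that reports a conflict."""
    image = dict(spec)
    for level, check, refine in STAGES:
        conflict = check(image)
        if conflict is not None:
            return {"stopped_at": level, "conflict": conflict}
        image = refine(image)
    return image

print(visualize({"shape": "triangle", "shape_dims": 2, "object_dims": 3}))
print(visualize({"shape": "pyramid", "shape_dims": 3, "object_dims": 3}))
```

The first call stops at the shape level without doing any lower-level work, which is the property the hypothesis emphasizes; the second call runs through to a fully articulated image.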

Some theorists defy introspection to assert that our mental imagery is purely abstract [Pylyshyn81].  Yet there exists evidence from neuroanatomy, functional neuroimaging, pathology of neurological disorders, and cognitive psychology to support the contention that mental imagery is directly represented in sensory modalities [Kosslyn94].  [Finke77] show that mental imagery can create visual afterimages similar to, though weaker than, the afterimages resulting from real visual experience.  [Sherman86] estimate that while the cat has roughly 10^6 fibers from the lateral geniculate nucleus to the visual cortex, there are approximately 10^7 fibers running in the opposite direction.  No explanatory consensus currently exists for the existence of the massive corticothalamic feedback projections, though there are many competing theories; the puzzle is of obvious interest to an AI researcher positing a theory in which inventing novel mental imagery is more computationally intensive than sensory perception.

To return to the "triangular lightbulb" example:  Once the visuospatial image of a pyramidal light bulb was fully articulated, the next introspective glimpse was of a conflict in visualizing a glass pyramid - a pyramid has sharp edges, and sharp glass can cut the user.  This implies the mental imagery had semantic content (knowledge about the material composition of the pyramidal light bulb), imported from the original "light bulb" concept, and well-integrated with the visual representation.  Like most modern-day humans, I know from early parental warnings and later real-life confirmation that sharp glass is dangerous.  Thus the rapid visual detection of sharp glass is important when dealing with real-life sensory experience.  I say this to emphasize that no extended line of intelligent reasoning (which would exceed the 200Hz speed limit of biological neurons) is required to react negatively to a fleeting mental image of sharp glass.  This reaction could reasonably happen in a single perceptual step, so long as the same perceptual system which detects the visual signature of sharp glass in real-world sensory experience also reacts to mental imagery.

The conflict detected was resolved by the imposition of smooth edges on the glass pyramid making up the pyramidal light bulb.  Again, this apparently occurred instantly; again, nontrivial hidden complexity is implied.  To frame the problem in the terms suggested by [Hofstadter85], the imaginative process needed to possess or create a "knob" governing the image's transition from sharp edges to rounded edges, and the possession or creation of this knob is the most interesting part of the process, not the selection of one knob from many.  If the "knob" was created on the fly, it implies a much higher degree of systemic creativity than selecting from among pre-existing options.

Once the final conflict was resolved by the perceptual imposition of smoothed edges, the final mental image took on a stable form.  Again, in this example, all of the mental events appeared introspectively to happen automatically and without conscious decisions on my part; I would estimate that the whole process took less than one second.

In concept combination, a few flashes of the intermediate stages of processing may be visible as introspective glimpses - especially those conflicts that rise to the level of conscious attention before being resolved automatically.  But the extreme rapidity of the process means the glimpses are even more unreliable than ordinary introspection - where introspection is traditionally considered unreliable to begin with.  To some extent, this is the point of the illustration narrated above; almost all of the internal complexity of concepts is hidden away from human introspection, and many theories of AI (even in the modern era) thus attempt to implement concepts on the token level, e.g., "lightbulb" as a raw LISP atom.

This traditional problem is why I have carefully avoided using the word symbol in the exposition above.  In AI, the term "symbol" carries implicit connotations about representation - that the symbol is a naked LISP atom (Prolog variable, etc.) whose supposed meaning derives from its relation to the surrounding atoms in a semantic net; or at most a LISP atom whose content is a "frame-based" LISP structure (that is, whose content is another semantic net).  Even attempts to argue against the design assumptions of Good Old-Fashioned AI (GOFAI) are often phrased in GOFAI's terms; for example, the "symbol grounding problem".  Much discussion of the symbol grounding problem has approached the problem as if the design starts out with symbols and "grounding" is then added.  In some cases this viewpoint has directly translated to AI architectures; e.g., a traditional semantic net is loosely coupled to a connectionist sensorimotor system [Hexmoor93].

DGI belongs to the existing tradition that asks, not "How do we ground our semantic nets?", but rather "What is the underlying stuff making up these rich high-level objects we call 'symbols'?" - an approach presented most beautifully in [Hofstadter79]; see also [Chalmers92].  From this viewpoint, without the right underlying "symbolstuff", there are no symbols; merely LISP tokens carved in mockery of real concepts and brought to unholy life by the naming-makes-it-so fallacy.

Imagine sensory modalities as solid objects with a metaphorical surface composed of the layered feature detectors and their inverse functions as feature controllers.  The metaphorical "symbolstuff" is a pattern that interacts with the feature detectors to test for the presence of complex patterns in sensory data, or inversely, interacts with the feature controllers to produce complex mental imagery.  Symbols combine through the faceted combination of their symbolstuffs, using a process that might be called "holonic conflict resolution", where information flows from high-level feature controllers to low-level feature controllers, and conflicts are detected at each layer as the flow proceeds.  ("Holonic" is a useful word to describe the simultaneous application of reductionism and holism, in which a single quality is simultaneously a combination of parts and a part of a greater whole [Koestler67].  Note that "holonic" does not imply strict hierarchy, only a general flow from high-level to low-level and vice versa.  For example, a single feature detector may make use of the output of lower-level feature detectors, and act in turn as an input to higher-level feature detectors.  The information contained in a mid-level feature is then the holistic sum of many lower-level features, and also an element in the sums produced by higher-level features.  If you pick one vantage point in a holonic structure and "look down" (reductionism) you find parts composing the local whole, with simpler behaviors that contribute to local complexity; if you "look up" (holism) you find a greater whole to which local parts contribute, and more complex processes which local behaviors support.  See also [Hofstadter79].)

I apologize for adding yet another term, "holonic conflict resolution", to a namespace already crowded with terms such as "computational temperature" [Mitchell93], "Prägnanz" [Koffka35], "Hopfield networks" [Hopfield85], "constraint propagation" [Kumar92], and many others.  Holonic conflict resolution is certainly not a wholly new idea, and may even be wholly unoriginal on a feature-by-feature basis, but the combination of features I wish to describe does not exactly match the existing common usage of any of the terms above.  "Holonic conflict resolution" is intended to convey the image of a process that flows serially through the layered, holonic structure of perception, with detected conflicts resolved locally or propagated to the level above, with a final solution that satisfices.  Many of the terms above, in their common usage, refer to an iterated annealing process which seeks a global minimum.  Holonic conflict resolution is intended to be biologically plausible; i.e., to involve a smooth flow of visualization which is computationally tractable for parallel but speed-limited neurons.

Holonic conflict resolution is not proposed as a complete solution to perceptual problems, but rather as the active canvas for the interaction of concepts with mental imagery.  In theoretical terms, holonic conflict resolution is a structural framework within which to posit specific conflict-detection and conflict-resolution methods.  Holonic imagery is the artist's medium within which symbolstuff paints mental pictures such as "triangular light bulb".
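As a rough illustration only, the flow described above can be caricatured in a few lines of Python.  Everything in the sketch is an assumption made for the example: the `Layer` structure, the dictionary standing in for mental imagery, and the two toy layers for "shape" and "edges" are hypothetical, and real feature controllers would be far richer than a pair of hand-written functions.  The sketch shows only the control structure: a serial flow from higher to lower layers, conflicts detected at each layer, local resolution first, propagation upward when local resolution fails, and acceptance of the first image that raises no further conflict (satisficing rather than annealing toward a global minimum).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Image = Dict[str, object]  # hypothetical stand-in for mental imagery


@dataclass
class Layer:
    name: str
    detect: Callable[[Image], Optional[str]]          # conflict description, or None
    resolve: Callable[[Image, str], Optional[Image]]  # repaired image, or None if it cannot help


def holonic_resolve(image: Image, layers: List[Layer], max_steps: int = 50):
    """Flow from the highest layer (index 0) to the lowest, repairing conflicts on the way."""
    log: List[Tuple[str, str, str]] = []  # (detecting layer, conflict, resolving layer)
    i = 0
    for _ in range(max_steps):
        if i == len(layers):
            return image, log  # first conflict-free image is accepted (satisficing)
        conflict = layers[i].detect(image)
        if conflict is None:
            i += 1  # nothing wrong here; continue downward
            continue
        # Try to resolve locally, then propagate the conflict to the layers above.
        for j in range(i, -1, -1):
            repaired = layers[j].resolve(image, conflict)
            if repaired is not None:
                log.append((layers[i].name, conflict, layers[j].name))
                image, i = repaired, j  # resume the flow from the layer that changed the image
                break
        else:
            raise ValueError(f"no layer could resolve: {conflict}")
    raise RuntimeError("gave up before the image stabilized")


def detect_shape(img: Image) -> Optional[str]:
    if img["shape"] == "triangle" and img["dimensions"] == 3:
        return "2-D shape on 3-D object"
    return None


def resolve_shape(img: Image, conflict: str) -> Optional[Image]:
    if conflict == "2-D shape on 3-D object":
        return {**img, "shape": "pyramid"}
    return None


def detect_edges(img: Image) -> Optional[str]:
    if img["material"] == "glass" and img["edges"] == "sharp":
        return "sharp glass"
    return None


def resolve_edges(img: Image, conflict: str) -> Optional[Image]:
    if conflict == "sharp glass":
        return {**img, "edges": "rounded"}
    return None


layers = [
    Layer("shape", detect_shape, resolve_shape),  # higher level
    Layer("edges", detect_edges, resolve_edges),  # lower level
]
start = {"shape": "triangle", "dimensions": 3, "material": "glass", "edges": "sharp"}
final_image, log = holonic_resolve(start, layers)
print(final_image)
print(log)
```

The hand-written resolvers are exactly the part this toy cannot explain.  In the terms used earlier, each one is a pre-existing "knob", and the sketch says nothing about how such a knob might be created on the fly.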

A constructive account of concepts and symbolstuff would need to supply:

This is not an exhaustive list of concept functionality; these are just the three most "interesting" challenges.  These challenges are interesting because the difficulty of solving them simultaneously seems to be the multiplicative (rather than additive) product of the difficulties of solving them individually.  Other design requirements for a constructive account of concepts would include: association to nearby concepts; supercategories and subcategories; exemplars stored in memory; prototype and typicality effects [Rosch78]; and many others (see, e.g., [Lakoff87]).

The interaction of concepts with modalities, and the interaction of concepts with each other, illustrate what I believe to be several important rules about how to approach AI.

The first principle is that of multiple levels of organization.  The human phenotype is composed of atoms, molecules, proteins, cells, tissues, organs, organ systems, and finally the complete body - eight distinguishable layers of organization, each successive layer built above the preceding one, each successive layer incorporating evolved adaptive complexity.  Some useful properties of the higher level may emerge naturally from lower-level behaviors, but not all of them; higher-level properties are also subject to selection pressures on heritable variation and the elaboration of complex functional adaptations.  In postulating multiple levels of organization, I am not positing that the behaviors of all higher layers emerge automatically from the lowest layer.

If I had to pick one single mistake that has been the most debilitating in AI, it would be implementing a process too close to the token level - trying to implement a high-level process without implementing the underlying layers of organization.  Many proverbial AI pathologies result at least partially from omitting lower levels of organization from the design.

Take, for example, that version of the "frame problem" - sometimes also considered a form of the "commonsense problem" - in which intelligent reasoning appears to require knowledge of an infinite number of special cases.  Consider a CPU which adds two 32-bit numbers.  The higher level consists of two integers which are added to produce a third integer.  On a lower level, the computational objects are not regarded as opaque "integers", but as ordered structures of 32 bits.  When the CPU performs an arithmetic operation, two structures of 32 bits collide, under certain rules which govern the local interactions between bits, and the result is a new structure of 32 bits.  Now consider the woes of a research team, with no knowledge of the CPU's underlying implementation, that tries to create an arithmetic "expert system" by encoding a vast semantic network containing the "knowledge" that two and two make four, twenty-one and sixteen make thirty-seven, and so on.  This giant lookup table requires eighteen billion billion entries for completion.
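The contrast between the two levels can be made concrete with a short Python sketch.  It assumes a ripple-carry scheme, which is only one simple way to add bit structures (real CPUs use faster carry logic), and the function names are invented for the illustration.  The point is that a handful of local rules between bits covers every case, while the table-based alternative needs one entry per ordered pair of 32-bit operands.

```python
WIDTH = 32  # bits per operand, as in the thought experiment


def to_bits(n: int, width: int = WIDTH) -> list:
    """Ordered structure of bits, least-significant bit first."""
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits: list) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def add_bits(a: list, b: list) -> list:
    """Ripple-carry addition: each output bit depends only on two input bits and a carry."""
    out, carry = [], 0
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out  # the final carry is dropped, so results wrap modulo 2**32


def add(m: int, n: int) -> int:
    return from_bits(add_bits(to_bits(m), to_bits(n)))


# Size of the "expert system" lookup table: one entry per ordered pair of operands.
lookup_entries = (2 ** WIDTH) ** 2

print(add(2, 2), add(21, 16))
print(lookup_entries)
```

Here `lookup_entries` is 2^64, about 1.8 x 10^19, which is the "eighteen billion billion" of the text.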

In this hypothetical world where the lower-level process of addition is not understood, we can imagine the "common-sense" problem for addition; the launching of distributed Internet projects to "encode all the detailed knowledge necessary for addition"; the frame problem for addition; the philosophies of formal semantics under which the LISP token thirty-seven is meaningful because it refers to thirty-seven objects in the external world; the design principle that the token thirty-seven has no internal complexity and is rather given meaning by its network of relations to other tokens; the "number grounding problem"; the hopeful futurists arguing that past projects to create Artificial Addition failed because of inadequate computing power; and so on.

To some extent this is an unfair analogy.  Even if the thought experiment is basically correct, and the woes described would result from an attempt to capture a high-level description of arithmetic without implementing the underlying lower level, this does not prove the analogous mistake is the source of these woes in the real field of AI.  And to some extent the above description is unfair even as a thought experiment; an arithmetical expert system would not be as bankrupt as semantic nets.  The regularities in an "expert system for arithmetic" would be real, noticeable by simple and computationally feasible means, and could be used to deduce that arithmetic was the underlying process being represented, even by a Martian reading the program code with no hint as to the intended purpose of the system.  The gap between the higher level and the lower level is not absolute and uncrossable, as it is in semantic nets.

An arithmetic expert system that leaves out one level of organization may be recoverable.  Semantic nets leave out multiple levels of organization.  Omitting all the experiential and sensory grounding of human symbols leaves no raw material to work with.  If all the LISP tokens in a semantic net were given random new names, there would be no way to deduce whether G0025 formerly meant hamburger or chair.  [Harnad90] describes the symbol grounding problem arising out of semantic nets as similar to learning Chinese as a first language using only a Chinese-to-Chinese dictionary.

I believe that many (though not all) cases of the "commonsense problem" or "frame problem" arise from trying to store all possible descriptions of high-level behaviors that, in the human mind, are modeled by visualizing the lower level of organization from which those behaviors emerge.  For example, [Lakoff99] give a sample list of "built-in inferences" emerging from what they identify as the Source-Path-Goal metaphor:

A general intelligence with a visual modality has no need to explicitly store an infinite number of such statements in a theorem-proving production system.  The above statements can be perceived on the fly by inspecting depictive mental imagery.  Rather than storing knowledge about trajectories, a visual modality actually simulates the behavior of trajectories.  A visual modality uses low-level elements, metaphorical "pixels" and their holonic feature structure, whose behaviors locally correspond to the real-world behaviors of the referent.  There is a mapping from representation to referent, but it is a mapping on a lower level of organization than traditional semantic nets attempt to capture.  The correspondence happens on the level where 13 is the structure 00001101, not on the level where it is the number thirteen.
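A deliberately tiny Python sketch may help separate "storing statements about trajectories" from "simulating a trajectory and inspecting the result".  The one-dimensional track, the function names, and the particular queries are assumptions made for the illustration, not drawn from [Lakoff99].  A real visual modality would operate on layered features rather than a list of integers.

```python
def simulate_path(source: int, goal: int) -> list:
    """Step a point along a 1-D track from source to goal, recording every cell it occupies."""
    step = 1 if goal >= source else -1
    return list(range(source, goal + step, step))


def passed_through(trace: list, cell: int) -> bool:
    """Answered by inspecting the simulated trace, not by consulting a stored rule."""
    return cell in trace


def reached_before(trace: list, first: int, second: int) -> bool:
    """True if the moving point occupied `first` earlier than `second`."""
    return trace.index(first) < trace.index(second)


trace = simulate_path(2, 7)
print(trace)
print(passed_through(trace, 5), passed_through(trace, 9))
print(reached_before(trace, 3, 6))
```

No statement such as "a traveler from 2 to 7 has passed through 5" is stored anywhere.  Each answer is read off the simulated trace, and the same three functions answer the corresponding question for any other pair of endpoints.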

I occasionally encounter some confusion about the difference between a visual modality and a microtheory of vision.  Admittedly, microtheories in theorem-proving systems are well known in AI, so some confusion is understandable.  But layered feature extraction in the visual modality - which is an established fact of neuroscience - is also very well known even in the pure computer science tradition of AI, and has been well-known ever since David Marr's tremendously influential 1982 book Vision [Marr82] and earlier papers.  To make the difference explicit, the human visual cortex "knows" about edge detection, shading, textures of curved surfaces, binocular disparities, color constancy under natural lighting, motion relative to the plane of fixation, and so on.  The visual cortex does not know about butterflies.  In fact, a visual cortex "knows" nothing; a sensory modality contains behaviors which correspond to environmental invariants, not knowledge about environmental regularities.

This illustrates the second-worst error in AI, the failure to distinguish between things that can be hardwired and things that must be learned.  We are not preprogrammed to know about butterflies.  Evolution wired us with visual circuitry that makes sense of the sensory image of the butterfly, and with object-recognition systems that form visual categories.  When we see a butterfly, we are then able to recognize future butterflies as belonging to the same kind.  Sometimes evolution bypasses this system to give us visual instincts, but this constitutes a tiny fraction of visual knowledge.  A modern human recognizes a vast number of visual categories with no analogues in the ancestral environment.

What problems result from failing to distinguish between things that can be hardwired and things that must be learned?  "Hardwiring what should be learned" is so universally combined with "collapsing the levels of organization" that it is difficult to sort out the resulting pathologies.  An expert systems engineer, in addition to acting on the assumption that knowledge of butterflies can be preprogrammed, is also likely to act on the assumption that knowledge about butterflies consists of a butterfly LISP token which derives meaning from relations to other LISP tokens - rather than butterfly being a stored pattern that interacts with the visual modality and recognizes a butterfly.  A semantic net not only lacks richness, it lacks the capacity to represent richness.  Thus, I would attribute the symbol grounding problem to "collapsing the levels of organization", rather than "hardwiring what should be learned".

But even if a programmer who understood the levels of organization tried to create butterfly-recognizing symbolstuff by hand, I would still expect the resulting butterfly pattern to lack the richness of the learned butterfly pattern in a human mind.  When the human visual system creates a butterfly visual category, it does not write an opaque, procedural butterfly-recognition codelet using abstract knowledge about butterflies and then tag the codelet onto a butterfly frame.  Human visual categorization abstracts the butterfly category from a store of visual experiences of butterflies.

Furthermore, visual categorization - the general concept-formation process, not just the temporal visual processing stream - leaves behind an association between the butterfly concept and the stored memories from which "butterfly" was abstracted; it associates one or more exemplars with the butterfly category; it associates the butterfly category through overlapping territory to other visual categories such as fluttering; it creates butterfly symbolstuff that can combine with other symbolstuffs to produce mental imagery of a blue butterfly; and so on.  To the extent that a human lacks the patience to do these things, or to the extent that a human does them in fragile and hand-coded ways rather than using robust abstraction from a messy experiential base, lack of richness will result.  Even if an AI needs programmer-created concepts to bootstrap further concept formation, bootstrap concepts should be created using programmer-directed tool versions of the corresponding AI subsystems, and the bootstrap concepts should be replaced with AI-formed concepts as early as possible.

Two other potential problems emerging from the use of programmer-created content are opacity and isolation.

Opacity refers to the potential inability of an AI's subsystems to modify content that originated outside the AI.  If a programmer is creating cognitive content, it should at least be the kind of content that the AI could have created on its own; it should be content in a form that the AI's cognitive subsystems can manipulate.  The best way to ensure that the AI can modify and use internal content is to have the AI create the content.  If an AI's cognitive subsystems are powerful enough to create content independently, then hopefully those same subsystems will be capable of adding to that content, manipulating it, bending it in response to pressures exerted by a problem, and so on.  What the AI creates, the AI can use and improve.  Whatever the AI accomplishes on its own is a part of the AI's mind; the AI "owns" it and is not simply borrowing it from the programmers.  This is a principle that extends far beyond abstracting concepts!

Isolation means that if a concept, or a piece of knowledge, is handed to the AI on a silver platter, the AI may be isolated from the things that the AI would have needed to learn first in order to acquire that knowledge naturally, in the course of building up successive layers of understanding to handle problems of increasing complexity.  The concept may also be isolated from similar concepts and related concepts that the AI would otherwise have learned at around the same time, denying the AI useful associations and slippages.  Conceivably programmers could try to second-guess isolation by hardwiring many similar "knowledges", but this is no substitute for a natural ecology of cognition.


2.2: Levels of organization in deliberation

The model of intelligence presented in this chapter - "Deliberative General Intelligence" or "DGI" - requires five distinct layers of organization, each layer built on top of the underlying layer.

Although the five-layer model is central to the DGI theory of intelligence, the rule of Necessary But Not Sufficient still holds.  An AI project will not succeed by virtue of "implementing a five-layer model of intelligence, just like the human brain".  It must be the right five layers.  It must be the right modalities, used in the right concepts, coming together to create the right thoughts seeking out the right goals.  (An AI might use different modalities, but will still need a right set of modalities.)

The five-layer model of deliberation is not inclusive of everything in the DGI theory of mind, but it covers substantial territory, and can be extended beyond the deliberation superprocess to provide a loose sense of which level of organization any cognitive process lies upon.  Observing that the human body is composed of molecules, proteins, cells, tissues, and organs is not a complete design for a human body, but it is nonetheless important to know whether something is an organ or a protein.  Blood, for example, is not a prototypical tissue, but it is composed of cells, and is generally said to occupy the tissue level of organization of the human body.  Similarly, the hippocampus, in its role as a memory-formation subsystem, is not a sensory modality, but it can be said to occupy the "modality level":  It is brainware (a discrete, modular chunk of neural circuitry); it lies above the neuron/code level; it has a characteristic tiling/wiring pattern as the result of genetic complexity; it interacts as an equal with the subsystems comprising sensory modalities.

Generalized definitions of the five levels of organization might be as follows:

Even for the generalized levels of organization, not everything fits cleanly into one level or another.  While the hardwired-learned-invented trichotomy usually matches the modality-concept-thought trichotomy, the two are conceptually distinct, and sometimes the correspondence is broken.  But the levels of organization are almost always useful - even exceptions to the rule are more easily seen as partial departures than as complete special cases.


2.3: The code level

The code level is composed of functions, classes, modules, packages; data types, data structures, data repositories; all the purely programmatic challenges of creating AI.  Artificial Intelligence has traditionally been much more intertwined with computer programming than it should be, mostly because of attempts to overcompress the levels of organization and implement thought sequences directly as programmatic procedures, or implement concepts directly as LISP atoms or LISP frames.  The code level lies directly beneath the modality level or brainware level; bleedover from modality-level challenges may show up as legitimate programmatic problems, but little else - not thoughts, cognitive content, or high-level problem-solving methods.

Any good programmer - a programmer with a feeling for aesthetics - knows the tedium of solving the same special case, over and over, in slightly different ways; and also the triumph of thinking through the metaproblem and creating a general solution that solves all the special cases simultaneously.  As the hacker Jargon File observes, "Real hackers generalize uninteresting problems enough to make them interesting and solve them -- thus solving the original problem as a special case (and, it must be admitted, occasionally turning a molehill into a mountain, or a mountain into a tectonic plate)." [Raymond01a].  This idiom does not work for general AI!  A real AI would be the ultimate general solution because it would encapsulate the cognitive processes that human programmers use to write any specific piece of code, but this ultimate solution cannot be obtained through the technique of successively generalizing uninteresting problems into interesting ones.

Programming is the art of translating a human's mental model of a problem-solution into a computer program; that is, the art of translating thoughts into code.  Programming inherently violates the levels of organization; it leads directly into the pitfalls of classical AI.  The underlying low-level processes that implement intelligence are of a fundamentally different character than high-level intelligence itself.  When we translate our thoughts about a problem into code, we are establishing a correspondence between code and the high-level content of our minds, not a correspondence between code and the dynamic process of a human mind.  In ordinary programming, the task is to get a computer to solve a specific problem; it may be an "interesting" problem, with a very large domain, but it will still be a specific problem.  In ordinary programming the problem is solved by taking the human thought process that would be used to solve an instance of the problem, and translating that thought process into code that can also solve instances of the problem.  Programmers are humans who have learned the art of inventing thought processes, called "algorithms", that rely only on capabilities an ordinary computer possesses.

The reflexes learned by a good, artistic programmer represent a fundamental danger when embarking on a general AI project.  Programmers are trained to solve problems, and trying to create general AI means solving the programming problem of creating a mind that solves problems.  There is the danger of a short-circuit, of misinterpreting the problem task as writing code that directly solves some specific challenge posed to the mind, instead of building a mind that can solve the challenge with general intelligence.  Code, when abused, is an excellent tool for creating long-term problems in the guise of short-term solutions.

Having described what we are forbidden to do with code, what legitimate challenges lie on this level of organization?

Some programming challenges are universal.  Any modern programmer should be familiar with the world of compilers, interpreters, debuggers, Integrated Development Environments, multithreaded programming, object orientation, code reuse, code maintenance, and the other tools and traditions of modern-day programming.  It is difficult to imagine anyone successfully coding the brainware level of general intelligence in assembly language - at least if the code is being developed for the first time.  In that sense object orientation and other features of modern-day languages are "required" for AI development; but they are necessary as productivity tools, not because of any deep similarity between the structure of the programming language and the structure of general intelligence.  Good programming tools help with AI development but do not help with AI.

Some programming challenges, although universal, are likely to be unusually severe in AI development.  AI development is exploratory, parallelized, and large.  Writing a great deal of exploratory code means that IDEs with refactoring support and version control are important, and that modular code is even more important than it is usually - or at least, code that is as modular as possible given the highly interconnected nature of the cognitive supersystem.

Parallelism on the hardware level is currently supported by symmetric multiprocessing chip architectures [Hwang98], NOW (network-of-workstations) clustering [Anderson95] and Beowulf clustering [Becker95], and message-passing APIs such as PVM [Geist93] and MPI [Gropp94].  However, software-level parallelism is not handled well by present-day languages and is therefore likely to present one of the greatest challenges.  Even if software parallelism were well-supported, AI developers would still need to spend time explicitly thinking about how to parallelize cognitive processes - human cognition may be massively parallel on the lower levels, but the overall flow of cognition is still serial.

Finally, there are some programming challenges that are likely to be unique to AI.

We know it is possible to evolve a general intelligence that runs on a hundred trillion synapses with characteristic limiting speeds of approximately 200 spikes per second.  An interesting property of human neurobiology is that, at a limiting speed of 150 meters per second for myelinated axons, each neuron is potentially within roughly a single "clock tick" of any other neuron in the brain.  [Sandberg99] describes a quantity S that translates to the wait time, in clock cycles, between different parts of a cognitive system - the minimum time it could take for a signal to travel between the most distant parts of the system, measured in the system's clock ticks.  For the human brain, S is on the rough order of 1 - in theory, at least.  In practice, axons take up space and myelinated axons take up even more space, so the brain uses a highly modular architecture, but there are still long-distance pipes such as the corpus callosum.  Currently, S is much greater than 1 for clustered computing systems.  S is greater than 1 even within a single-processor computer system; Moore's Law for intrasystem communications bandwidth describes a substantially slower doubling time than processor speeds.  Increasingly the limiting resource of modern computing systems is not processor speed but memory bandwidth [Wulf95] (and this problem has gotten worse, rather than better, since 1995).
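The quantity S can be estimated with back-of-the-envelope arithmetic, sketched here in Python.  The 150 m/s and 200 Hz figures come from the text, and the 2 GHz clock is the figure used later in this section.  The 0.15 m brain span, the 10 m cluster span, and the 2 x 10^8 m/s cable signal speed are assumptions chosen only to show orders of magnitude.  The cluster figure is a lower bound at best, since it ignores switching and software latency.

```python
def wait_time_in_ticks(distance_m: float, signal_speed_m_per_s: float, clock_hz: float) -> float:
    """S: time for a signal to cross the system, measured in that system's clock ticks."""
    return distance_m / signal_speed_m_per_s * clock_hz


# Human brain: span of 0.15 m is an assumed round figure.
brain_s = wait_time_in_ticks(distance_m=0.15, signal_speed_m_per_s=150.0, clock_hz=200.0)

# Hypothetical cluster: 10 m of cable at an assumed 2e8 m/s, with 2 GHz processors.
cluster_s = wait_time_in_ticks(distance_m=10.0, signal_speed_m_per_s=2.0e8, clock_hz=2.0e9)

print(f"brain   S ~ {brain_s:.2f}")
print(f"cluster S ~ {cluster_s:.0f} (propagation delay only)")
```

Under these assumptions the brain comes out at a fraction of a tick, consistent with "on the rough order of 1", while the cluster needs on the order of a hundred ticks for signal propagation alone.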

One class of purely programmatic problems unique to AI arises from the need to "port" intelligence from massively parallel neurons to clustered computing systems (or other human-programmable substrate).  It is conceivable, for example, that the human mind handles the cognitive process of memory association by comparing current working imagery to all stored memories, in parallel.  We have no particular evidence that the human mind uses a brute force comparison, but it could be brute-forced.  The human brain acknowledges no distinction between CPU and RAM.  If there are enough neurons to store a memory, then the same neurons may presumably be called upon to compare that memory to current experience.  (This holds true even if the correspondence between neural groups and stored memories is many-to-many instead of one-to-one.)

Memory association may or may not use a "compare" operation (brute force or otherwise) of current imagery against all stored memories, but it seems likely that the brain uses a massively parallel algorithm at one point or another of its operation; memory association is simply a plausible candidate.  Suppose that memory association is a brute-force task, performed by asking all neurons engaged in memory storage to perform a "compare" against patterns broadcast from current working imagery.  Faced with the design requirement of matching the brute force of 10^14 massively parallel synapses with a mere clustered system, a programmer may be tempted to despair.  There is no a priori reason why such a task should be possible.

Faced with a problem of this class, there are two courses the programmer can take.  The first is to implement an analogous "massive compare" as efficiently as possible on the available hardware - an algorithmic challenge worthy of Hercules, but past programmers have overcome massive computational barriers through heroic efforts and the relentless grinding of Moore's Law.  The second road - much scarier, with even less of a guarantee that success is possible - is to redesign the cognitive process for different hardware.

The human brain's most fundamental limit is its speed.  Anything that happens in less than a second perforce must use less than 200 sequential operations, however massively parallelized.  If the human brain really does use a massively parallel brute-force compare against all stored memories to handle the problem of association, it's probably because there isn't time to do anything else!  The human brain is massively parallel because massive parallelism is the only way to do anything in 200 clock ticks.  If modern computers ran at 200Hz instead of 2GHz, PCs would also need 10^14 processors to do anything interesting in realtime.

A sufficiently bold general AI developer, instead of trying to reimplement the cognitive process of association as it developed in humans, might instead ask:  What would this cognitive subsystem look like, if it had evolved on hardware instead of wetware?  If we remove the old constraint of needing to complete in a handful of clock ticks, and add the new constraint of not being able to offhandedly "parallelize against all stored memories", what is the new best algorithm for memory association?  For example, suppose that you find a method of "fuzzy hashing" a memory, such that mostly similar memories automatically collide within a container space, but where the fuzzy hash inherently requires an extended linear series of sequential operations that would have placed "fuzzy hashing" out of reach for realtime neural operations.  "Fuzzy hashing" would then be a strong candidate for an alternative implementation of memory association.
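The "fuzzy hashing" described above is hypothetical, and no specific algorithm is given.  As one possible stand-in, the sketch below uses random-hyperplane locality-sensitive hashing (a SimHash-style technique), in which similar vectors tend to receive the same bit pattern and therefore land in the same bucket.  The class name, parameters, and toy vectors are assumptions made for illustration only.

```python
import random
from collections import defaultdict

# ASSUMPTION: random-hyperplane hashing is swapped in here as a stand-in for
# the text's hypothetical "fuzzy hashing"; memories are toy numeric vectors.

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes, each a dim-dimensional normal vector."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def fuzzy_hash(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

class AssociativeStore:
    """Bucket memories by fuzzy hash so similar memories tend to collide."""

    def __init__(self, num_bits, dim, seed=0):
        self.hyperplanes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(list)

    def store(self, label, vector):
        self.buckets[fuzzy_hash(vector, self.hyperplanes)].append(label)

    def recall(self, vector):
        """Return labels in the cue's bucket, without scanning other buckets."""
        return list(self.buckets.get(fuzzy_hash(vector, self.hyperplanes), []))

store = AssociativeStore(num_bits=8, dim=3, seed=1)
store.store("a", [1.0, 2.0, 3.0])
print(store.recall([2.0, 4.0, 6.0]))     # same direction: same bucket as "a"
print(store.recall([-1.0, -2.0, -3.0]))  # opposite direction: different bucket
```

Recall here costs one hash computation plus a bucket lookup, rather than a compare against every stored memory.  The price is that merely similar vectors collide only with some probability, so some associations will be missed.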

A computationally cheaper association subsystem that exploits serial speed instead of parallel speed, whether based around "fuzzy hashing" or something else entirely, might still be qualitatively less intelligent than the corresponding association system within the human brain.  For example, memory recognition might be limited to clustered contexts rather than being fully general across all past experience, with the AI often missing "obvious" associations (where "obvious" has the anthropocentric meaning of "computationally easy for a human observer").  In this case, the question would be whether the overall general intelligence could function well enough to get by, perhaps compensating for lack of associational breadth by using longer linear chains of reasoning.  The difference between serialism and parallelism, on a low level, would propagate upward to create cognitive differences that compensate for the loss of human advantages or exploit new advantages not shared by humans.

Another class of problem stems from "porting" across the extremely different programming styles of evolution versus human coding.  Human-written programs typically involve a long series of chained dependencies that intersect at single points of failure - "crystalline" is a good term to describe most human code.  Computation in neurons has a different character.  Over time our pictures of biological neurons have evolved from simple integrators of synaptic inputs that fire when a threshold input level is reached, to sophisticated biological processors with mixed analog-digital logics, adaptive plasticity, dendritic computing, and functionally relevant dendritic and synaptic morphologies [Koch00].  What remains true is that, from an algorithmic perspective, neural computing uses roughly arithmetical operations [16] that proceed along multiple intertwining channels in which information is represented redundantly and processed stochastically.  Hence, it is easier to "train" neural networks - even nonbiological connectionist networks - than to train a piece of human-written code.  Flipping a random bit inside the state of a running program, or flipping a random bit in an assembly-language instruction, has a much greater effect than a similar perturbation of a neural network.  For neural networks the fitness landscapes are smoother.  Why is this?  Biological neural networks need to tolerate greater environmental noise (data error) and processor noise (computational error), but this is only the beginning of the explanation.

Smooth fitness landscapes are a useful, necessary, and fundamental outcome of evolution.  Every evolutionary success starts as a mutation - an error - or as a novel genetic combination.  A modern organism, powerfully adaptive with a large reservoir of genetic complexity, necessarily possesses a very long evolutionary history; that is, the genotype has necessarily passed through a very large number of successful mutations and recombinations along the road to its current form.  The "evolution of evolvability" is most commonly justified by reference to this historical constraint [Dawkins96], but there have also been attempts to demonstrate local selection pressures for the characteristics that give rise to evolvability [Wagner96], thus averting the need to invoke the controversial agency of species selection.  Either way, smooth fitness landscapes are part of the design signature of evolution.

"Smooth fitness landscapes" imply, among other things, that a small perturbation in the program code (genetic noise), in the input (environmental noise), or in the state of the executing program (processor noise), is likely to produce at most a small degradation in output quality.  In most human-written code, a small perturbation of any kind usually causes a crash.  Genomes are built by a cumulative series of point mutations and random recombinations.  Human-written programs start out as high-level goals which are translated, by an extended serial thought process, into code.  A perturbation to human-written code perturbs the code's final form, rather than its first cause, and the code's final form has no history of successful mutation.  The thoughts that gave rise to the code probably have a smooth fitness metric, in the sense that a slight perturbation to the programmer's state of mind will probably produce code that is at most a little worse, and possibly a little better.  Human thoughts, which are the original source of human-written code, are resilient; the code itself is fragile.

The dream solution would be a programming language in which human-written, top-down code somehow had the smooth fitness landscapes that are characteristic of accreted evolved complexity, but this is probably far too much to ask of a programming language.  The difference between evolution and design runs deeper than the difference between stochastic neural circuitry and fragile chip architectures.  On the other hand, using fragile building blocks can't possibly help, so a language-level solution might solve at least some of the problem.
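The contrast drawn above between redundant, smoothly degrading computation and "crystalline" code can be illustrated with a toy example.  This is a sketch under stated assumptions: both functions are hypothetical illustrations, not models of real neural circuitry or of any particular program.

```python
# Toy contrast between a redundant computation and a "crystalline" one.
# ASSUMPTION: both functions are hypothetical illustrations only.

def redundant_estimate(readings):
    """Average many redundant channels that all carry the same quantity."""
    return sum(readings) / len(readings)

def follow_chain(table, start, steps):
    """Follow a chain of indices; every step depends exactly on the last."""
    position = start
    for _ in range(steps):
        position = table[position]
    return position

# Perturb one of 100 redundant channels: the output moves only slightly.
readings = [10.0] * 100
perturbed_readings = [11.0] + readings[1:]
smooth_change = redundant_estimate(perturbed_readings) - redundant_estimate(readings)

# Perturb one entry in a chain of dependencies: the result is unrelated.
table = [1, 2, 3, 4, 5, 6, 7, 0]
perturbed_table = table[:2] + [7] + table[3:]
intact_result = follow_chain(table, start=0, steps=5)
broken_result = follow_chain(perturbed_table, start=0, steps=5)

print(f"redundant output changed by {smooth_change:.3f}")
print(f"chained output went from {intact_result} to {broken_result}")
```

The single perturbed channel is diluted by the other ninety-nine, while the single perturbed table entry sits on the only path to the answer - a single point of failure.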

The importance of smooth fitness landscapes holds true for all levels of organization.  Concepts and thoughts should not break as the result of small changes.  The code level is being singled out because smoothness on the code level represents a different kind of problem than smoothness on the higher levels.  On the higher levels, smoothness is a product of correctly designed cognitive processes; a learned concept will apply to messy new data because it was abstracted from a messy experiential base.  Given that AI complexity lying within the concept level requires smooth fitness landscapes, the correct strategy is to duplicate the smoothness on that level - to accept as a high-level design requirement that the AI produce error-tolerant concepts abstracted from messy experiential bases.

On the code level, neural circuitry is smooth and stochastic by the nature of neurons and by the nature of evolutionary design.  Human-written programs are sharp and fragile ("crystalline") by the nature of modern chip architectures and by the nature of human programming.  The distinction is not likely to be erased by programmer effort or new programming languages.  The long-term solution might be an AI with a sensory modality for code (see Part III), but that is not likely to be attainable in the early stages.  The basic code-level "stuff" of the human brain has built-in support for smooth fitness landscapes, and the basic code-level "stuff" of human-written computer programs does not.  Where human processes rely on neural circuitry being automatically error-tolerant and trainable, it will take additional programmatic work to "port" that cognitive process to a new substrate where the built-in support is absent.  The final compromise solution may have error tolerance as one explicit design feature among many, rather than error-tolerance naturally emerging from the code level.

There are other important features that are also supported by biological neural networks - that are "natural" to neural substrate.  These features probably include:

Again, this does not imply an unbeatable advantage for biological neural networks.  In some cases wetware has very poor feature support, relative to contemporary hardware.  Contemporary hardware has better support for:

The challenge is using new advantages to compensate for the loss of old advantages, and replacing substrate-level support with design-level support.

This concludes the account of exceptional issues that arise at the code level.  An enumeration of all issues that arise at the code level - for example, serializing the current contents of a sensory modality for efficient transmission to a duplicate modality on a different node of a distributed network - would constitute at least a third of a complete constructive account of a general AI.  But programming is not all the work of AI, perhaps not even most of the work of AI; much of the effort needed to construct an intelligence will go into prodding the AI into forming certain concepts, undergoing certain experiences, discovering certain beliefs, and learning various high-level skills.  These tasks cannot be accomplished with an IDE.  Coding the wrong thing successfully can mess up an AI project worse than any number of programming failures.  I believe that the most important skill an AI developer can have is knowing what not to program.


2.4: The modality level

2.4.1: The evolutionary design of modalities in humans

Most students of AI are familiar with the high-level computational processes of at least one human sensory modality, vision, at least to the extent of being acquainted with David Marr's "2 1/2-D sketch" and the concept of layered feature extraction [Marr82].  Further investigations in computational neuroscience have both confirmed Marr's theory and rendered it enormously more complex.  Although many writers, including myself, have been known to use the phrase "visual cortex" when talking about the entire visual modality, this is like talking about the United States by referring to New York.  About 50% of the neocortex of nonhuman primates is devoted exclusively to visual processing, with over 30 distinct visual areas identified in the macaque monkey [Felleman91].

The major visual stream is the retinal-geniculate-cortical stream, which goes from the retina to the lateral geniculate nucleus to the striate cortex [17] to the higher visual areas.  Beyond the visual cortex, processing splits into two major secondary streams; the ventral stream heading toward the temporal lobe for object recognition, and the dorsal stream heading toward the parietal lobe for spatial processing.  The visual stream begins in the retina, which contains around 100 million rods and 5 million cones, but feeds into an optic cable containing only around 1 million axons.  Visual preprocessing begins in the first layer of the retina, which converts the raw intensities into center-surround gradients, a representation that forms the basis of all further visual processing.  After several further layers of retinal processing, the final retinal layer is composed of a wide variety of ganglion cell types that include directionally selective motion detectors, slow-moving edge detectors, fast movement detectors, uniformity detectors, and subtractive color channels.  The axons of these ganglion cells form the optic nerve and project to the magnocellular, parvocellular, and koniocellular layers of the lateral geniculate nucleus; currently it appears that each class of ganglion cell projects to only one of these layers.  It is widely assumed that further feature detection takes place in the lateral geniculate nucleus, but the specifics are not currently clear.  From the lateral geniculate nucleus, the visual information stream continues to area V1, the primary visual cortex, which begins feature extraction for information about motion, orientation, color and depth.  From primary visual cortex the information stream continues, making its way to the higher visual areas, V2 through V6.  Beyond the visual cortex, the information stream continues to temporal areas (object recognition) and parietal areas (spatial processing).
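The conversion of raw intensities into center-surround gradients, mentioned above as the first step of visual preprocessing, can be sketched in a few lines.  This is a deliberately simplified one-dimensional toy: real retinal receptive fields are two-dimensional with graded weights, and the function name and clamped-edge handling are assumptions of this sketch.

```python
# One-dimensional toy version of a center-surround transform.
# ASSUMPTION: the "surround" is just the two nearest neighbors, and values
# at the ends of the array are clamped; real receptive fields are 2D.

def center_surround(intensities):
    """Each output is the center intensity minus the mean of its neighbors."""
    n = len(intensities)
    output = []
    for i in range(n):
        left = intensities[max(i - 1, 0)]
        right = intensities[min(i + 1, n - 1)]
        output.append(intensities[i] - (left + right) / 2)
    return output

uniform = [5, 5, 5, 5, 5]
edge = [1, 1, 1, 9, 9, 9]
brighter_edge = [value + 10 for value in edge]

print(center_surround(uniform))        # all zeros: nothing to report
print(center_surround(edge))           # nonzero only at the boundary
print(center_surround(brighter_edge))  # identical to the line above
```

Uniform regions produce no signal, an edge produces a signal only at the boundary, and adding a constant amount of illumination leaves the output unchanged.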

As mentioned earlier, primary visual cortex sends massive corticothalamic feedback projections to the lateral geniculate nucleus [Sherman86].  Corticocortical connections are also typically accompanied by feedback projections of equal strength [Felleman91].  There is currently no standard explanation for these feedback connections.  DGI [18] requires sensory modalities with feature controllers that are the inverse complements of the feature detectors; this fits with the existence of the feedback projections.  However, it should be noted that this assertion is not part of contemporary neuroscience.  The existence of feature controllers is allowed for, but not asserted, by current theory; their existence is asserted, and required, by DGI.  (The hypothesis that feedback projections play a role in mental imagery is not limited to DGI; for example, [Kosslyn94] cites the existence of corticocortical feedback projections as providing an underlying mechanism for higher-level cognitive functions to control depictive mental imagery.)

The general lesson learned from the human visual modality is that modalities are not microtheories, that modalities are not flat representations of the pixel level, and that modalities are functionally characterized by successive layers of successively more elaborate feature structure.  Modalities are one of the best exhibitions of this evolutionary design pattern - ascending layers of adaptive complexity - which also appears, albeit in very different form, in the ascending code-modality-concept-thought-deliberation model of the human mind.  Each ascending layer is more elaborate, more complex, more flexible, and more computationally expensive.  Each layer requires the complexity of the layer underneath - both functionally within a single organism, and evolutionarily within a genetic population.

The concept layer is evolvable in a series of short steps if, and only if, there already exists substantial complexity within the modality layer.  The same design pattern - ascending layers of adaptive complexity - also appears within an evolved sensory modality.  The first features detected are simple, and can evolve in a single step or a small series of adaptive short steps.  The ability to detect these first features can be adaptive even in the absence of a complete sensory modality.  The eye, which is currently believed to have independently evolved in many different species, may have begun, each time, as a single light-sensitive spot on the organism's skin.

In modalities, each additional layer of feature detectors makes use of the information provided by the first layer of feature detectors.  In the absence of the first layer of feature detectors, the "code" for the second layer of feature detectors would be too complex to evolve in one chunk.  With the first layer of feature detectors already present, feature detectors in the second layer can evolve in a single step, or in a short series of locally adaptive steps.  The successive layers of organization in a sensory modality are a beautiful illustration of evolution's design signature, the functional ontogeny of the information recapitulating the evolutionary phylogeny.

Evolution is a good teacher but a poor role model; is this design a bug or a feature?  I would argue that it is generally a feature.  There is a deep correspondence between evolutionarily smooth fitness landscapes and computationally smooth fitness landscapes.  There is a deep correspondence between each successive layer of feature detectors being evolvable, and each successive layer of feature detectors being computable in a way that is "smooth" rather than "fragile", as described in the earlier discussion of the code layer.  Smooth computations are more evolvable, so evolution, in constructing a system incrementally, tends to construct linear sequences or ascending layers of smooth operations.

An AI designer may conceivably discard the requirement that each ascending layer of feature detection be incrementally useful/adaptive - although this may make the subsystem harder to incrementally develop and test!  It is cognitively important, however, that successive layers of feature detectors be computationally "smooth" in one specific sense.  DGI concepts interact with inverse feature detectors, "feature controllers", in order to construct mental imagery.  For the task of imposing a concept and the still more difficult task of abstracting a concept to be simultaneously tractable, it is necessary that sensory modalities be a continuum of locally smooth layers, rather than consisting of enormous, intractable, opaque chunks.  There is a deep correspondence between the smooth design that renders concepts tractable and the smooth architecture emergent from incremental evolution.

The feature controllers used to create mental imagery are evolvable and preadaptive in the absence of mental imagery; feature controllers could begin as top-down constraints in perceptual processing, or even more simply as a perceptual step which happens to be best computed by a recurrent network.  In both cases, the easiest (most evolvable) architecture is generally one in which the feedback connection reciprocates the feedforward connection.  Thus, the feature controller layers are not a separate system independent from the feature detector layers; rather, I expect that what is locally a feature detector is also locally a feature controller.  Again, this smooth reversibility helps render it possible to learn a single concept which can act as a category detector or a category imposer.  It is the simultaneous solution of concept imposition, concept satisfaction, concept faceting, and concept abstraction that requires reversible features - feature controllers which are the local inverses of the feature detectors.  I doubt that feature controllers reach all the way down to the first layers of the retina (I have not heard of any feedback connections reaching this far), but direct evidence from neuroimaging shows that mental imagery activates primary visual cortex [Kosslyn93]; I am not sure whether analogous tests have been performed for the lateral geniculate nucleus, but the feedback connections are there.

2.4.2: The human design of modalities in AI

An AI needs sensory modalities - but which modalities?  How do those modalities contribute materially to general intelligence outside the immediate modality?

Does an AI need a visuospatial system modeled after the grand complexity of the visuospatial system in primates and humans?  We know more about the human visual modality than about any other aspect of human neurology, but that doesn't mean we know enough to build a visual modality from scratch.  Furthermore, the human visual modality is enormously complex, computationally intensive, and fitted to an environment which an AI does not necessarily have an immediate need to comprehend.  Should humanlike 3D vision [19] be one of the first modalities attempted?

I believe it will prove best to discard the human modalities or to use them as inspiration only - to use a completely different set of sensory modalities during the AI's early stages.  An AI occupies a different environment than a human and direct imitation of human modalities would not be appropriate.  For an AI's initial learning experiences, I would advocate placing the AI in complex virtual environments, where the virtual environments are internal to the computer but external to the AI.  The programmers would then attempt to develop sensory modalities corresponding to the virtual environments.  Henceforth I may use the term "microenvironment" to indicate a complex virtual environment.  The term "microworld" is less unwieldy, but should not be taken as having the Good Old-Fashioned AI connotation of "microworlds" in which all features are directly represented by predicate logic, e.g., SHRDLU's simplified world of blocks and tables [Winograd72].

Abandoning the human modalities appears to introduce an additional fragile dependency on the correctness of the AI theory, in that substituting novel sensory modalities for the human ones would appear to require a correct understanding of the nature of sensory modalities and how they contribute to intelligence.  This is true, but I would argue that the existence of an additional dependency is illusory.  An attempt to blindly imitate the human visual modality, without understanding the role of modalities in intelligence, would be unlikely to contribute to general intelligence except by accident.  Our modern understanding of the human visual modality is not so perfect that we could rely on the functional completeness of a neurologically inspired design; for example, a design based only on consensus contemporary theory might omit feature controllers!  However, shifting to microworlds does require that experience in the microworlds reproduce functionally relevant aspects of experience in real life, including unpredictability, uncertainty, real-time process control, holonic (part-whole) organization, et cetera.  I do not believe that this introduces an additional dependency on theoretic understanding, over and above the theoretic understanding that would be required to build an AI that absorbed complexity from these aspects of real-world environments, but it nonetheless represents a strong dependency on theoretic understanding.

Suppose that we are designing, de novo, a sensory modality and virtual environment.  Three possible modalities that come to mind as reasonable for a very primitive and early-stage AI, in ascending order of implementational difficulty, would be:

  1. A modality for Newtonian billiard balls;
  2. A modality for a 100x100 "Go" board;
  3. A modality for some type of interpreted code (a metaphorical "codic cortex").

In human vision, the very first visual neurons are the "rods and cones" which transduce impinging environmental photons to a neural representation as sensory information.  For each of the three modalities above, the "rods and cones" level would probably use essentially the same representation as the data structures used to create the microworld, or virtual environment, in which the AI is embodied.  This is a major departure from the design of naturally evolved modalities, in which the basic level - the quark level, as far as we know - is many layers removed from the high-level objects that give rise to the indirect information that reaches the senses.  Evolved sensory modalities devote most of their complexity to reconstructing the world that gives rise to the incoming sensory impressions - to reconstructing the 3D moving objects that give rise to the photons impinging on the rods-and-cones layer of the retina.  Of course, choosing vision as an example is arguably a biased selection; sound is not as complex as vision, and smell and taste are not as complex as sound.  Nonetheless, eliminating the uncertainty and intervening layers between the true environment and the organism's sensory data is a major step.  It should significantly reduce the challenges of early AI development, but is a dangerous step nonetheless because of its distance from the biological paradigm and its elimination of a significant complexity source.

I recommend eliminating environmental reconstruction as a complexity source in early AI development.  Visualizing the prospect of deliberately degrading the quality of the AI's environmental information on one end, and elaborating the AI's sensory modality on the other end, I find it likely that the entire operation will cancel out, contributing nothing.  An AI that had to learn to reconstruct the environment, in the same way that evolution learned to construct sensory modalities, might produce interesting complexity as a result; but if the same programmer is creating environmental complexity and modality complexity, I would expect the two operations to cancel out.  While environmental reconstruction is a nontrivial complexity source within the human brain, I consider the ratio between the difficulty of programmer development of the complexity, and the contribution of that complexity to general intelligence, to be relatively small.  Adding complexity for environmental reconstruction, by introducing additional layers of complexity in the microworld and deliberately introducing information losses between the topmost layer of the microworld and the AI's sensory receptors, and then attempting to create an AI modality which could reconstruct the original microworld content from the final sensory signal, would require a relatively great investment of effort in return for what I suspect would be a relatively small boost to general intelligence.

Suppose that for each of the three modalities - billiards, Go, code - the "pre-retinal" level consists of true and accurate information about the quark level of the virtual microworld, although perhaps not complete information, and that the essential complexity which renders the model a "sensory modality" rests in the feature structure, the ascending layers of feature detectors and descending layers of feature controllers.  Which features, then, are appropriate?  And how do they contribute materially to general intelligence?
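For the billiards case, the arrangement supposed above might be sketched as follows: the microworld's own data structures serve as the "quark level", the pre-retinal level exposes accurate but incomplete information about them, and a first feature detector is computed from that.  The class, the function names, and the choice of velocity as a first feature are all hypothetical, and the physics omits collisions and friction.

```python
from dataclasses import dataclass

# ASSUMPTION: every name here is hypothetical, and the toy physics has no
# collisions or friction.  Velocity is chosen as a first-layer feature purely
# for illustration; the text does not specify which features are appropriate.

@dataclass(frozen=True)
class Ball:
    """The microworld's own representation: its "quark level"."""
    x: float
    y: float
    vx: float
    vy: float

def step(balls, dt):
    """Advance every ball in a straight line for dt time units."""
    return [Ball(b.x + b.vx * dt, b.y + b.vy * dt, b.vx, b.vy) for b in balls]

def pre_retinal(balls):
    """Accurate but incomplete world state: positions only, no velocities."""
    return [(b.x, b.y) for b in balls]

def velocity_feature(previous, current, dt):
    """A first-layer feature detector: velocity from two successive frames."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(previous, current)]

world = [Ball(0.0, 0.0, 1.0, 2.0)]
frame_before = pre_retinal(world)
frame_after = pre_retinal(step(world, 0.5))
print(velocity_feature(frame_before, frame_after, 0.5))
```

No reconstruction step is needed to get from the signal to the world: the modality's complexity lies entirely in the feature layers built on top of the accurate positions.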

The usual statement is that the complexity in a sensory modality reflects regularities of the environment, but I wish to offer a slightly different viewpoint.  To illustrate this view, I must borrow and severely simplify the punchline of a truly elegant paper, "The Perceptual Organization of Colors" by Roger Shepard [Shepard92].  Among other questions, this paper seeks to answer the question of trichromacy:  Why are there three kinds of cones in the human retina, and not two, or four?  Why is human visual perception organized into a three-dimensional color space?  Historically, it was often theorized that trichromacy represented an arbitrary compromise between chromatic resolution and spatial resolution; that is, between the number of colors perceived and the grain size of visual resolution.  As it turns out, there is a more fundamental reason why three color channels are needed.

To clarify the question, consider that surfaces possess a potentially infinite number of spectral reflectance distributions.  We will focus on spectral reflectance distributions, rather than spectral power distributions, because adaptively relevant objects that emit their own light are environmentally rare.  Hence the physically constant property of most objects is the spectral reflectance distribution, which combines with the spectral power distribution of light impinging on the object to give rise to the spectral power distribution received by the human eye.  The spectral reflectance distribution is defined over the wavelengths from 400nm to 700nm (the visible range), and since wavelength is a continuum, the spectral reflectance distribution can theoretically require an unlimited number of quantities to specify.  Hence, it is not possible to exactly constrain a spectral reflectance distribution using only three quantities, which is the amount of information transduced by human cones.

The human eye is not capable of discriminating among all physically possible reflecting surfaces.  However, it is possible that for "natural" surfaces - surfaces of the kind commonly encountered in the ancestral environment - reflectance for each pure frequency does not vary independently of reflectance for all other frequencies.  For example, there might exist some set of basis reflectance functions, such that the reflectance distributions of almost all natural surfaces could be expressed as a weighted sum of the basis functions.  If so, one possible explanation for the trichromacy of human vision would be that three color channels are just enough to perform adequate discrimination in a "natural" color space of limited dimensionality.

The ability to discriminate between all natural surfaces would be the design recommended by the "environmental regularity" philosophy of sensory modalities.  The dimensionality of the internal model would mirror the dimensionality of the environment.

As it turns out, natural surfaces have spectral reflectance distributions that vary along roughly five to seven dimensions [Maloney86].  There thus exist natural surfaces that, although appearing to trichromatic viewers as "the same color", nonetheless possess different spectral reflectance distributions.

[Shepard92] instead asks how many color channels are needed to ensure that the color we perceive is the same color each time the surface is viewed under different lighting conditions.  The spectral composition of ambient light can also potentially vary along an unlimited number of dimensions, and the actual light reaching the eye is the product of the spectral power distribution and the spectral reflectance distribution.  A reddish object in bluish light may reflect the same number of photons of each wavelength as a bluish object in reddish light.  Similarly, a white object in reddish light may reflect mostly red photons, while the same white object in bluish light may reflect mostly blue photons.  And yet the human visual system manages to maintain the property of color constancy; the same object will appear to be the same color under different lighting conditions.
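The ambiguity described above follows directly from the fact that the light reaching the eye is a wavelength-by-wavelength product.  The sketch below makes this concrete using three coarse wavelength bands (short, medium, long); the band count and all numeric values are illustrative assumptions, not measured spectra.

```python
# Light reaching the eye = illuminant power x surface reflectance, per band.
# ASSUMPTION: three coarse bands (short, medium, long wavelength) and made-up
# values stand in for continuous spectral distributions.

def light_reaching_eye(illuminant, reflectance):
    """Multiply illuminant power by surface reflectance, band by band."""
    return [power * fraction for power, fraction in zip(illuminant, reflectance)]

reddish_light = [0.2, 0.5, 1.0]
bluish_light = [1.0, 0.5, 0.2]

reddish_surface = [0.2, 0.5, 1.0]
bluish_surface = [1.0, 0.5, 0.2]
white_surface = [1.0, 1.0, 1.0]

# Two different surfaces under two different lights send identical signals.
print(light_reaching_eye(bluish_light, reddish_surface))
print(light_reaching_eye(reddish_light, bluish_surface))

# The same white surface sends different signals under different lights.
print(light_reaching_eye(reddish_light, white_surface))
print(light_reaching_eye(bluish_light, white_surface))
```

The raw signal alone therefore cannot identify the surface; color constancy requires separating the contribution of the lighting from the contribution of the surface.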

[Judd64] measured 622 spectral power distributions for natural lighting, under 622 widely varying natural conditions of weather and times of day, and found that variations in natural lighting reduce to three degrees of freedom.  Furthermore, these three degrees of freedom bear a close correspondence to the three dimensions of color opponency that were proposed for the human visual system based on experimental examination [Hurvich57].  The three degrees of freedom are:

The three color channels of the human visual system are precisely the number of channels needed in order to maintain color constancy under natural lighting conditions [20].  Three color channels are not enough to discriminate between all natural surface reflectances, but three color channels are the exact number required to compensate for ambient natural lighting and thereby ensure that the same surface is perceptually the "same color" on any two occasions.  This simplifies the adaptively important task of recognizing a previously experienced object on future encounters.

The lesson I would learn from this tale of color constancy is that sensory modalities are about invariants and not just regularities.  Consider the task of designing a sensory modality for some form of interpreted code.  (This is a very challenging task because human programming languages tend toward non-smooth fitness landscapes, as previously discussed.)  When considering which features to extract, the question I would ask is not "What regularities are found in code?" but rather "What feature structure is needed for the AI to perceive two identical algorithms with slightly different implementations as 'the same piece of code'?"  Or more concretely:  "What features does this modality need to extract to perceive the recursive algorithm for the Fibonacci sequence and the iterative algorithm for the Fibonacci sequence as 'the same piece of code'?"

Tip your head slightly to the left, then slightly to the right.  Every retinal receptor may receive a different signal, but the experienced visual field remains almost exactly the "same".  Hold up a chess pawn, and tip it slightly to the left or slightly to the right.  Despite the changes in retinal reception, we see the "same" pawn with a slightly different orientation.  Could a sensory modality for code look at two sets of interpreted bytecodes (or other program listing), completely different on a byte-by-byte basis, and see these two listings as the "same" algorithm in two slightly different "orientations"?
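To make the Fibonacci question concrete, here is a minimal sketch of the two listings in Python.  Comparing compiled bytecode and spot-checking outputs are assumptions chosen for illustration; agreement on sample inputs is only a crude stand-in for the kind of invariant a codic modality would need to perceive, not a proposal for how to build one.

```python
def fib_recursive(n):
    """Recursive form: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1."""
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    """Iterative form: advance a pair of consecutive Fibonacci numbers n times."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Byte-by-byte, the two compiled listings are different objects of perception.
raw_bytes_match = fib_recursive.__code__.co_code == fib_iterative.__code__.co_code

# Behaviorally, they agree on every input sampled here.
behavior_matches = all(fib_recursive(n) == fib_iterative(n) for n in range(15))
```

A byte-level comparison reports a mismatch while the sampled behavior is identical.  The open design question is what feature structure would let the AI see the second fact directly, as vision sees the same pawn in two orientations.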

The modality level of organization, like the code level, has a characteristic kind of work that it performs.  Formulating a butterfly concept and seeing two butterflies as members of the same category is the work of the concept level, but seeing a chess pawn in two orientations as the same pawn is the work of the modality level.  There is overlap between the modality level and the concept level, just as there is overlap between the code level and the modality level.  But on the whole, the modality level is about invariants rather than regularities and identities rather than categories.

Similarly, the understanding conferred by the modality level should not be confused with the analytic understanding characteristic of thoughts and deliberation.  Returning to the example of a codic modality, one possible indication of a serious design error would be constructing a modality that could analyze any possible piece of code equally well.  The very first layer of the retina - rods and cones - is the only part of the human visual system that will work on all possible pixel fields.  The rest of the visual system will only work for the low-entropy pixel fields experienced by a low-entropy organism in a low-entropy environment.  The very next layer, after rods and cones, already relies on center-surround organization being a useful way to compress visual information; this only holds true in a low-entropy visual environment.
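The dependence of center-surround coding on low-entropy input can be sketched in one dimension.  The operator below (each sample minus the mean of its two neighbors) is a simplifying assumption, not a model of retinal circuitry, and the two test signals are hypothetical.

```python
import random

def center_surround(signal):
    """Each interior sample minus the mean of its two neighbors (a 1-D toy)."""
    return [signal[i] - (signal[i - 1] + signal[i + 1]) / 2
            for i in range(1, len(signal) - 1)]

def total_response(signal):
    """Sum of response magnitudes: a rough proxy for how much is left to encode."""
    return sum(abs(v) for v in center_surround(signal))

rng = random.Random(0)                              # fixed seed for repeatability
smooth_field = [i / 100 for i in range(100)]        # gradual ramp: low entropy
noise_field = [rng.random() for _ in range(100)]    # independent samples: high entropy

smooth_cost = total_response(smooth_field)
noise_cost = total_response(noise_field)
```

On the smooth ramp the responses are essentially zero, so almost nothing remains to be transmitted; on the noise field the responses are about as large as the input itself, and the transform buys nothing.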

Designing a modality that worked equally well for any possible computer program would probably be an indication that the modality was extracting the wrong kind of information.  Thus, one should be wary of an alleged "feature structure" that looks as if it would work equally well for all possible pieces of code.  It may be a valid analytical method but it probably belongs on the deliberation level, not the modality level.  (Admittedly not every local step of a modality must be dependent on low-entropy input; some local stages of processing may have the mathematical nature of a lossless transform that works equally well on any possible input.  Also, hardware is probably better suited than wetware to lossless transforms.)

The human brain is constrained by a characteristic serial speed of 200 sequential steps per second, and by the ubiquitous internal use of the synchronous arrival of associated information, to arrange processing stages that flow smoothly forward.  High-level "if-then" or "switch-case" logic is harder to arrive at neurally, and extended complex "if-then" or "switch-case" logic is probably almost impossible unless implemented through branching parallel circuitry that remains synchronized.  Probably an exceptional condition must be ignored, averaged out, or otherwise handled using the same algorithms that would apply to any other modality content.

Can an AI modality use an architecture that applies different algorithms to different pieces of modality content?  Can an AI modality handle exceptional conditions through special-case code?  I would advise caution, for several reasons.  First, major "if-then" branches are characteristic of deliberative processes, and being tempted to use such a branch may indicate a level confusion.  Second, making exceptions to the smooth flow of processing will probably complicate the meshing of concepts and modalities.  Third, modalities are imperfect but fault-tolerant processes, and the fault tolerance plays a role in smoothing out the fitness landscapes and letting the higher levels of organization be built on top; thus, trying to handle all the data by detecting exceptional conditions and correcting them, a standard pattern in human programming, may indicate that the modality is insufficiently fault-tolerant.  Fourth, handling all exceptions is characteristic of trying to handle all inputs and not just low-entropy inputs.  Hence, on the whole, sensory modalities are characterized by the smooth flow of information through ascending layers of feature detectors.  Of course, detecting an exceptional condition as a feature may turn out to be entirely appropriate!
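As one illustration of fault tolerance without special-case code, consider a processing stage that applies the same operation to every sample and absorbs a glitch as a side effect.  The three-point median filter here is my own choice of example, not something the argument above depends on, and the readings are hypothetical.

```python
def median3(a, b, c):
    """Middle of three values, computed the same way for every input."""
    return sorted((a, b, c))[1]

def smooth(signal):
    """One uniform stage: a 3-point median over every interior sample."""
    return [median3(signal[i - 1], signal[i], signal[i + 1])
            for i in range(1, len(signal) - 1)]

# 99.0 is a hypothetical glitch in an otherwise slowly rising signal.
readings = [1.0, 1.1, 1.2, 99.0, 1.4, 1.5, 1.6]
cleaned = smooth(readings)
```

Nothing in this stage tests for "is this sample an exception?"; the glitch is simply outvoted by its neighbors.  The approach only works because the underlying signal is assumed to be smooth, which is the low-entropy assumption again.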

Another issue which may arise in artificial sensory modalities is that unsophisticated artificial modalities may turn out to be significantly more expensive, computationally, for the effective intelligence they deliver.  Sophisticated evolved modalities conserve computing power in ways that might be very difficult for a human programmer to duplicate.  An example would be the use of partial imagery, modeling only the features that are needed for a high-level task [Hayhoe98]; a simplified modality that does not support partial imagery may consume more computing power.  Another example would be the human visual system's selective concentration on the center of the visual field - the "foveal architecture", in which areas of the visual field closer to the center are allocated a greater number of neurons.  The cortical magnification factor for primates is inverse-linear [Tootell85]; the complex logarithm is the only two-dimensional map function that has this property [Schwartz77], as confirmed experimentally by [Schwartz89].  A constant-resolution version of the visual cortex, with the maximum human visual resolution across the full human visual field, would require 10,000 times as many cells as our actual cortex [Rojer90].
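The inverse-linear property of the complex logarithm can be checked numerically.  This is a minimal sketch that assumes the visual field is represented as the complex plane with the fixation point at zero; the function names are hypothetical, and the sketch avoids the fixation point itself, where the pure logarithm is singular.

```python
import cmath

def cortical_position(z):
    """Map a visual-field point (complex number, fixation at 0) by the complex logarithm."""
    return cmath.log(z)

def magnification(eccentricity, step=1e-6):
    """Map distance covered per unit of visual-field distance, measured radially."""
    z = complex(eccentricity, 0.0)
    return abs(cortical_position(z + step) - cortical_position(z)) / step

# If magnification is inverse-linear in eccentricity, the product
# magnification(r) * r should be the same constant at every r.
eccentricities = (1.0, 2.0, 4.0, 8.0)
products = [magnification(r) * r for r in eccentricities]
```

Each doubling of eccentricity halves the amount of map devoted to a unit of visual field, which is how resolution can be concentrated at the center without spending cells uniformly across the periphery.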

But consider the programmatic problems introduced by the use of a logarithmic map.  Depending on where an object lies in the visual field, its internal representation on a retinotopic map will be completely different; no direct comparison of the data structures would show the identity or even hint at the identity.  That an off-center object in our visual field can rotate without perceptually distorting, as its image distorts wildly within the physical retinotopic map, presents a nontrivial computational problem [21].
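A small sketch shows how differently one object is represented at two positions under such a map.  The pure complex logarithm, the square "object", and the chosen positions are all illustrative assumptions.

```python
import cmath

def square_at(center, half=0.5):
    """Corners of a small square 'object' centered at a visual-field point."""
    offsets = ((-half, -half), (half, -half), (half, half), (-half, half))
    return [center + complex(dx, dy) for dx, dy in offsets]

def to_map(points):
    """Apply the logarithmic map to each corner."""
    return [cmath.log(p) for p in points]

def side_lengths(points):
    """Lengths of the four edges joining consecutive corners."""
    n = len(points)
    return [abs(points[(i + 1) % n] - points[i]) for i in range(n)]

# The same square, once near the center of the field and once farther out.
near_sides = side_lengths(to_map(square_at(2 + 0j)))
far_sides = side_lengths(to_map(square_at(20 + 0j)))

# Ratio of longest to shortest mapped edge: 1.0 would mean no shape distortion.
near_distortion = max(near_sides) / min(near_sides)
far_distortion = max(far_sides) / min(far_sides)
```

Near the center the mapped square is large and visibly skewed; farther out it is many times smaller and nearly undistorted.  A direct comparison of the two mapped corner lists would give little hint that they describe the same object.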

Evolution conserves computing power by complicating the algorithm.  Evolution exerts a steady, equipotential design pressure across all existing complexity; a human programmer wields general intelligence like a scalpel.  Because of this steady pressure, it is not much harder for evolution to "design" and "debug" a logarithmic visual map; further adaptations can build on top of a logarithmic visual map almost as easily as on a constant-resolution map.  A human programmer's general intelligence would run into difficulty keeping track of all the simultaneous design complications created by a logarithmic map.  It might be possible, but it would be difficult, especially in the context of exploratory research; the logarithmic map transforms simple design problems into complex design problems and hence transforms complex design problems into nightmares.

I would suggest using constant-resolution sensory modalities during the early stages of an AI - as implied above by suggesting a sensory modality modeled around a 100x100 Go board - but the implication is that these early modalities will be lower-resolution, will have a smaller field, and will be less efficient computationally.  An opposing theoretic view would be that complex but efficient modalities introduce necessary issues for intelligence.  An opposing pragmatic view would be that complex but efficient modalities are easier to accommodate in a mature AI if they have been included in the architecture from the beginning, so as to avoid metaphorical "Y2K" issues (ubiquitous dependencies on a simplifying assumption which is later invalidated).


2.5: The concept level

DGI uses the term concept to refer to the mental stuffs underlying the words that we combine into sentences; concepts are the combinatorial building blocks of thoughts and mental imagery.  These building blocks are learned complexity, rather than innate complexity; they are abstracted from experience.  Concept structure is absorbed from recurring regularities in perceived reality.

A concept is abstracted from experiences that exist as sensory patterns in one or more modalities.  Once abstracted, a concept can be compared to a new sensory experience to determine whether the new experience satisfies the concept, or equivalently, whether the concept describes a facet of the experience.  Concepts can describe both environmental sensory experience and internally generated mental imagery.  Concepts can also be imposed on current working imagery.  In the simplest case, an exemplar associated with the concept can be loaded into the working imagery, but constructing complex mental imagery requires that a concept target a piece of existing mental imagery, which the concept then transforms.  Concepts are faceted; they have internal structure and associational structure which comes into play when imposition or description encounters a bump in the road.  Faceting can also be invoked purposefully; for example, "tastes like chocolate" versus "looks like chocolate".
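The operations named above (satisfaction, imposition, and faceting) can be pinned down as an interface with a deliberately crude sketch.  Everything here is hypothetical: imagery is reduced to a dictionary of feature values, the class and method names are my own, and a real concept representation would need far richer structure than a fixed feature record.

```python
class Concept:
    """Toy concept: a named bundle of facets, each a feature the concept expects."""

    def __init__(self, name, facets):
        self.name = name
        self.facets = dict(facets)

    def satisfied_by(self, imagery, facet=None):
        """Check whether imagery satisfies the concept, or just one named facet of it."""
        keys = [facet] if facet is not None else list(self.facets)
        return all(imagery.get(k) == self.facets[k] for k in keys)

    def impose(self, imagery):
        """Return new imagery: the targeted imagery transformed to fit the concept."""
        return {**imagery, **self.facets}

# Hypothetical feature values, chosen only for illustration.
chocolate = Concept("chocolate", {"taste": "chocolate", "look": "brown"})
carob_bar = {"taste": "carob", "look": "brown", "shape": "bar"}

fully_satisfied = chocolate.satisfied_by(carob_bar)            # whole concept
looks_like = chocolate.satisfied_by(carob_bar, facet="look")   # "looks like chocolate"
tastes_like = chocolate.satisfied_by(carob_bar, facet="taste") # "tastes like chocolate"
imposed = chocolate.impose({"shape": "bar"})                   # transform existing imagery
```

The sketch separates the three uses cleanly, but only because the hard part has been assumed away: the features arrive already extracted and already named.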

A "concept kernel" is the pseudo-sensory pattern produced by abstracting from sensory experience.  During concept satisfaction, this kernel interacts with the layered feature detectors to determine whether the reported imagery matches the kernel; during concept imposition, the kernel interacts with the layered feature controllers to produce new imagery or alter existing imagery.  A programmer seeking a good representation for concept kernels must find a representation that simultaneously fulfills these requirements:

It would be a serious challenge to solve any one of these problems individually, with sufficient generality and using a computationally tractable method; solving all three problems simultaneously is the fundamental challenge of building a system that learns complexity in combinatorial chunks.

Concepts have other properties besides their complex kernels.  Kernels relate concepts to sensory imagery and hence to the modality level.  Concepts also have complexity that relates to the concept level; i.e., concepts have complexity that derives from their relation to other concepts.  In Good Old-Fashioned AI this aspect of concepts has been emphasized at the expense of all others [22], but this is no excuse for ignoring concept-concept relations in a new theory.  For example, concepts are supercategories and subcategories of each other; there are concepts that describe concepts; there are concepts that describe relations between concepts; there are mutually exclusive concepts which cannot simultaneously describe the same referent.  (Further examples of concept relations are given later.)

In formal logic, the traditional idea of concepts is that concepts are categories defined by a set of individually necessary and together sufficient requisites; that a category's extensional referent is the set of events or objects that are members of the category; and that the combination of two categories is the sum of their requisites and hence the intersection of their sets of referents.  This formulation is inadequate to the complex, messy, overlapping category structure of reality and is incompatible with a wide range of established cognitive effects [Lakoff87].  Properties such as usually necessary and usually sufficient requisites, and concept combinations that are sometimes the sum of their requisites or the intersection of their extensional classes, are emergent from the underlying representation of concepts - along with other important properties, such as prototype effects in which different category members are assigned different degrees of typicality [Rosch78].
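The classical formulation is simple enough to state in a few lines, which helps show what it leaves out.  The toy universe and property names below are illustrative assumptions.

```python
# Hypothetical universe: each object is described by a set of properties.
universe = {
    "traffic light": {"emits_light", "red"},
    "brake light": {"emits_light", "red"},
    "stop sign": {"red", "octagonal"},
    "desk lamp": {"emits_light"},
}

def extension(requisites):
    """Classical category: every object possessing all of the requisites."""
    return {name for name, props in universe.items() if requisites <= props}

red = {"red"}
light_source = {"emits_light"}

# Combining two categories sums their requisites...
red_light_source = red | light_source

# ...and therefore intersects their sets of referents.
combined_extension = extension(red_light_source)
intersection = extension(red) & extension(light_source)
```

In this formulation membership is all-or-nothing: every member of an extension is an equally good member, so there is no place for degrees of typicality or for requisites that are only usually necessary.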

Concepts relate to the thought level primarily in that they are the building blocks of thoughts, but there are other level-crossings as well.  Introspective concepts can describe beliefs and thoughts and even deliberation; the concept "thought" is an example.  Inductive generalizations are often "about" concepts in the sense that they apply to the referents of a concept; for example, "Triangular lightbulbs are red."  Deliberation may focus on a concept in order to arrive at conclusions about the extensional category, and introspective deliberation may focus on a concept in its role as a cognitive object.  Concept structure is ubiquitously invoked within perceptual and cognitive processes because category structure is ubiquitous in the low-entropy processes of our low-entropy universe.

2.5.1: The substance of concepts

One of the meanings of "abstraction" is "removal"; in chemistry, to abstract an atom means subtracting it from a molecular group.  Using the term "abstraction" to describe the process of creating concepts could be taken as implying two views:  First, that to create a concept is to generalize; second, that to generalize is to lose information.  Abstraction as information loss is implicit in the classical view of concepts (that is, the view of concepts under GOFAI and formal logic).  Forming the concept "red" is taken to consist of focusing only on color, at the expense of other features such as size and shape; all concept usag