Time in a digital computer is discrete and has a single space of simultaneity, so anyone who's ever played Conway's Game of Life knows everything they need to know about the True Ultimate Nature of time in the AI. With each tick of the clock, each frame is derived from the preceding frame by the "laws of physics" of that ontology. (Higher-level regularities in the sequence of frames form what we call causality; more about this in Unimplemented section: Causality.)
A general intelligence needs to be able to perceive and visualize when two events occur at the same time; when one event precedes or follows another event; when two sequences of events are identical or opposite-symmetrical; and when two intervals are equal, lesser, or greater. Most of this comes under the general heading of having a feel for time as a quantity and time as a trajectory, which requires both concept-level and modality-level support.
To support temporal metaphors and temporal concepts - to provide an API with sufficient complexity for the mindstuff to hook into - the AI needs modality-level support. The most obvious method would be to tag all events with a 64-bit number indicating the nanoseconds since 1970 - a plain good-old-fashioned system clock. The problem is that then the AI can't think about anything that happened before 1970. Or about picoseconds.
If we humans have a built-in system clock - there are several candidates, ranging from the heartbeat to a 40-hertz electrical pulse in the brain - we don't have conscious, abstract access to it. What we remember is the relative times; that event A came before event B, that event C was between A and B, that a lot of stuff happened between A and B, that D seemed to take a long time, that E seemed to go by very quickly, that E and F happened at the same time, and so on. If I know that a particular event happened at 4:58 PM on July 23rd 2000, it's because I looked at my watch and associated the visual or auditory label "4:58" with the event. That's why I can think - at least abstractly - about the age of the Universe or picosecond time frames. Our abstract concepts for quantitative time aren't really built on our internal modality-level clocks, but on the external clocks we built. Or rather, the internal modality-level clocks are used for immediate perceptions only, and the abstract concepts create the modality level through a layer of abstraction that can handle millennia as easily as minutes.
Because it's very easy to derive all the relative perceptions of time by comparing absolute quantitative times, we'll almost certainly wind up tagging every event with a 64-bit system-clock time (or equivalent interpreter token), and building any other modality functions on top of that. It's just important to remember that the really important concepts about time should not be founded directly on the underlying, absolute numbers, because then the AI really can't think about picoseconds or pre-1970 events; the mindstuff making up the concepts will crash. Concepts about time, if they refer to quantitative numbers at all, should be founded on the relative times of the cognitive events that occur while thinking about a temporal problem. Thus the AI can imagine a process that takes place on picosecond timescales, and because the visualization itself takes place on nanosecond timescales (or whatever speed the AI's system clock runs at), there's no crash. It's a kind of automatic scaling.
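A minimal sketch of that layering, assuming events carry a raw integer clock tag and concepts only ever see the derived relative perceptions. The names `Event`, `precedes`, `simultaneous`, `between`, and `interval` are hypothetical, and Python is used purely for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str
    clock: int  # raw system-clock tag (assumed here to be an integer tick count)


def precedes(a: Event, b: Event) -> bool:
    """Relative perception: did a happen before b?"""
    return a.clock < b.clock


def simultaneous(a: Event, b: Event) -> bool:
    """Relative perception: do a and b carry the same clock tag?"""
    return a.clock == b.clock


def between(a: Event, b: Event, c: Event) -> bool:
    """Relative perception: did c happen after a and before b?"""
    return precedes(a, c) and precedes(c, b)


def interval(a: Event, b: Event) -> int:
    """Signed interval from a to b, in clock ticks."""
    return b.clock - a.clock
```

Nothing above the modality level needs to touch `clock` directly; a concept that asks only `precedes` or `between` works just as well on a visualized picosecond process as on a remembered afternoon.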
To put it another way: Generality requires that there be at least one layer of complete abstraction between temporal concepts and temporal modalities. Even if stored memories also store the attached system-clock time, a replay of those memories obviously won't take place at the recorded time! If all remembered times are purely abstract characteristics, and only concretely visualized times give rise to temporal intuitions, then the AI can freely manipulate temporal aspects of a visualized process. Symbols such as slow and fast (1) can be abstracted from temporal intuitions and applied to aspects of any visualized temporal process.
Of course, because we aren't slavishly following human limitations, a seed AI should probably have some mode of direct access to the system clock. We've all been in situations where we've wanted to know exactly what time it is, or exactly what time it was when we had breakfast. That's why God gave us wristwatches (2). This should be safe as long as the direct access occurs through the same conceptual filter, the same layer of abstraction, so that the modality-level system clock time 203840928340 comes out as the abstract characteristic "System-clock time 203840928340".
Another subtlety of human temporal understanding is that our senses are synchronized even though different senses presumably have different processing delays. It takes time for the visual cortex to process an image, and time for the auditory cortex to process a sound - not necessarily the same amount of time. But a physical sound and a physical sight that arrive simultaneously should be perceived as simultaneous. Since a seed AI should be able to tag sensory events as distinct from the derivative perceptual events, this should be relatively easy to handle on the modality level... although it's possible to imagine problems popping up if there are heuristics or concepts that act on the derivative and possibly unsynchronized high-level features of multiple modalities.
For some cases, this problem can be solved by only allowing multimodality concepts to act on events that have been completely processed by all targeted modalities. If a vision and a sound arrive at t=10, the sound finishes feature-extraction at t=20, and vision finishes extraction at t=30, then no audiovisual concept can begin acting until t=31, with both the sound and the vision having a perceived time of t=10. In other words, rather than skimming the cream off the modalities, the perceived now of the AI will lag a few seconds behind real time.
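The t=10 / t=20 / t=30 example can be sketched as follows. This is only an illustration under the assumptions stated in the paragraph above; `Percept`, `earliest_action_time`, and `perceived_time` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percept:
    modality: str
    arrived: int   # tick at which the raw stimulus arrived
    finished: int  # tick at which feature extraction completed


def earliest_action_time(percepts):
    """First tick at which a concept needing all of these percepts may act."""
    return max(p.finished for p in percepts) + 1


def perceived_time(percepts):
    """Perceived time of the joint event: the shared arrival time."""
    arrivals = {p.arrived for p in percepts}
    if len(arrivals) != 1:
        raise ValueError("stimuli did not arrive simultaneously")
    return arrivals.pop()
```

With a sound that finishes at t=20 and a vision that finishes at t=30, `earliest_action_time` gives 31 while `perceived_time` stays at 10.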
This introduces two new problems: One, it may introduce severe delays into the system. Modalities don't just apply to external sensory information; modalities are where all the internal thoughts take place as well. To some extent this problem may be solvable by not requiring complete processing before concepts can activate, but only that level of processing which is necessary to the concept. After all, a concept can't act on information it doesn't have. But this may still lose some efficiency; there may be cases where concepts don't need synchronization.
The second problem is synchronization of subjective time. If the AI's now lags a few seconds behind, when are thoughts perceived to have taken place? If the AI thinks "foo!" at a time that looks to the AI like t=10 but is actually t=40, is the concept "foo!" labeled as having taken place at t=10 or t=40? And what difference does it make? I can't see that using t=40 makes any difference, so I'm strongly in favor of labeling all events as occurring when they actually occur. Still, the AI may eventually find useful heuristics that act on "subjective time".
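One way to keep both options open is to store both labels and let whatever uses them choose. A minimal sketch, with hypothetical names (`Thought`, `actual_time`, `perceived_now`, `timeline`); the default follows the preference stated above for actual time:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thought:
    content: str
    actual_time: int    # system-clock tick at which the thought really occurred
    perceived_now: int  # the lagging "now" the AI was perceiving at that tick

    @property
    def lag(self) -> int:
        """How far the perceived now trailed real time for this thought."""
        return self.actual_time - self.perceived_now


def timeline(thoughts, subjective=False):
    """Order thoughts by actual time (the default) or by subjective time."""
    if subjective:
        key = lambda t: t.perceived_now
    else:
        key = lambda t: t.actual_time
    return [t.content for t in sorted(thoughts, key=key)]
```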
All these modality-level and concept-level problems are simply echoes of the far more difficult problem of change propagation on the thought-level - how to ensure that "Aha!" experiences and "Oops" experiences propagate to all the corners of the mind, so that beliefs remain in a reasonably consistent state. The issue of Consistency doesn't belong in this section. However, it seems likely that issues of concept-level (and thought-level) synchronization are not problems that should be solved by autonomic processes; concept synchronization may need to be decided on a case-by-case basis. It may be that, in the process of learning thought-level reflexes, and finding concepts that work well, the AI will be forced to invent whatever forms of synchronization are necessary for each concept. If a multimodal concept must act on modality-images that began processing at the "same time" (3), and will otherwise fail (not generate useful results), it should be a relatively simple tweak/mutation, of the sort that even Eurisko could have performed easily enough. The same goes for whatever concepts are specified by the programmer during the initial stages.
As a general rule: All derivative perceptual events should be tagged with their true cognitive time as well as the external-world time of the derivative event. Human-programmed concepts should enable the programmer to decide which time should be used; learned concepts won't even be noticed unless the proper timeframe is used. Try to maintain the regularities in reality that all intelligence is supposed to represent; figure out whether the useful regularities represented by a temporal concept are perceptual/external or cognitive/internal.
"A general intelligence needs to be able to perceive and visualize when two events occur at the same time; when one event precedes or follows another event; when two sequences of events are identical or opposite-symmetrical; and when two intervals are equal, lesser, or greater. Most of this comes under the general heading of having a feel for time as a quantity and time as a trajectory..."
-- above
Several of the most fundamental domains of cognition are one-dimensional or monotonically increasing, and thus share certain linear characteristics. In a sense, any possible use of the word "close" or "far" invokes a kind of linear intuition. So do the words "more" and "less". Time, because it is both monotonically increasing and one-dimensional (4), is one of the linear domains. The linear domains tend to relate very closely to each other - you can have "more" time or "less" time, treating time as a quantity; you can be "close" to a given time, treating time as a trajectory. We freely mix-and-match the words because the target domains share behaviors and underlying properties. In some sense, the relation between time and quantity and trajectory is not, as Lakoff and Johnson would call it, a "metaphor"; it is a real identity.
When you consider that time is almost always mathematically described as a real number (5); that one of the words for real number is "quantity"; that in most trajectories the spatial distance to the target decreases monotonically with time; and that time "moves forward" at constant velocity; then, the identity seems so perfect that there is no complexity to be gained by the metaphor. Lakoff and Johnson kindly remind us that "quantity" applies not just to mathematics, but to piles of bricks and stacks of coins; that "trajectories" are not just simple flights from source to target, but complex spatial maneuvers, with huge chunks of the visual subsystems dedicated to their visualization.
By observing that piles of two bricks plus piles of three bricks equal piles of five bricks, it is possible to guess that two hours plus three hours will equal five hours. Using the underlying numerical concept described in 2.3.3: The concept of "three", it can be seen that this "metaphor" requires the ability to treat temporal intervals as distinct objects, so that unique correspondences can be drawn between each of three hours and each of three bricks. To learn (concept-level) to treat time as a quantity requires that the AI encounter a task with a uniqueness constraint; one in which it can't do two things in the same minute (6). This leads to treating time as a limited resource, which leads to an even stronger analogy with time-as-material-substance.
Lakoff and Johnson describe the time-is-movement metaphor in terms of the motion of an observer. The "location" of the observer is the present, the "space" in front of the observer is the future, the "space" behind the observer is the past. "Objects" are events or times, "located" at various "points" along the "line". The time-is-motion metaphor has two (incompatible) interpretations: The observer can be thought of as moving forward at a constant speed, passing the events; or the events can be thought of as moving towards the observer. (L&J note that this is why "Let's move the meeting ahead a week" is ambiguous.) Lakoff and Johnson note that we also map time onto body image; in almost all languages, the observer "faces" the future - although a few languages (presumably noting that one can see the past, but not the future) have the observer facing the past. However, this is getting away from the primary topic - the utility of describing time as a trajectory.
One primary use of time-as-space is to visualize multiple events simultaneously. That is, by conceptualizing time as a line, we can simultaneously consider three points/events along the line, where a true temporal visualization would force us to consider the events sequentially. But this only applies to humans, with our single and indivisible stream of consciousness. A seed AI might be able to simultaneously visualize the dynamic qualities of three different events; in effect, placing three different moving observers at three different points along the timeline! Likewise, visualizing time as space makes it easier for humans to perceive certain types of qualitative relations. Visualizing a quantity plotted against time - you know, an ordinary 2D graph - enables us to perceive properties of the curve that would not be visible to a human observer watching the 1D variable change with time. Humans have one set of intuitions for static spatial properties, allowing us to stand back and look at the graph and form compounded perceptions and connected thoughts; we have another set for dynamic systems in which the sensory images change at the same rate as our stream of consciousness.
For an AI, the benefit of spatial metaphors might be provided directly by rewriting the spatial-modality perceptions directly for the temporal modality - rewriting a visual curve-detector so that it operates on data in the temporal modality, so that an AI watching a single quantity change over time has the same set of "smooth curve" or "sharp curve" or "global maximum" perceptions as a human contemplating a 2D graph.
In conclusion: Time, quantity, and trajectory share certain basic underlying properties. The primary driver for high-level metaphors between time and quantity is a task in which time is a limited resource. In humans, the primary driver for metaphors between time and trajectory is the greater sophistication of our static visual intuitions, but this may not apply to seed AIs.
Hofstadter, writing about Copycat - an AI that performs analogies in the domain of letter-strings, such as "abc->abd::pqrs->?" - notes that, despite the simplicity of Copycat's domain, the domain can contain analogy problems so complex as to embrace a significant chunk of human thought. A few years back, when I was only beginning to think about AI, I set out to brainstorm a list of a few hundred perceptions relating to analogies - "before, next, grow, quantity, add, distance, speed, blockage, symmetry, interval..." - and noticed that most of them could be represented on a linear strip of Xs and Os. These perceptions I collectively name to myself the linear intuitions - the perceptions that apply to straight lines.
One such perception is reflection: "XXOX" is the reflection of "XOXX", and the image "XXOXOXX" is bilaterally symmetric. (Note that it may take you more time to verify that "XXOXOXXO" is the reflection of "OXXOXOXX", or that "OXOXXOXXOXO" is bilaterally symmetric, and you may need to do so consciously rather than intuitively; our perceptions have horizons, limits to the amount of processing power expended. Of course, your perceptions are analyzing huge collections of two-dimensional pixels, not just the on-off "pixels" of a linear image.) Writing a computational procedure to verify reflection is trivial, but this would leave out some of the most important design features. On seeing the letter-strings "ooabaoo", "cxcdcxc", and "rauabauar", the letter-string "oomemool" would come as rather a surprise, and the "l" would stick out like a sore thumb. Even without precedents to establish the expectation, the image "WHMMOW" has something wrong about it (7).
The perception of reflection is not simply a binary, yes-or-no verification; once a partial reflection is visible, it establishes an expectation of complete reflection - a mental image of how the structure "ought" to look, if the reflection were complete - and if the expectation is violated, if the actual image conflicts with the imagined, then the violation is detected, and the violating object becomes more salient ("sticks out like a sore thumb"). If there is some way to look at the violating object that preserves perfect reflection, it will resonate strongly with the expectation. (A more complete discussion of expectation, especially on the concept-level rather than modality-level, is in Unimplemented section: Causality.) The point is that the perception of reflection, like most perceptions, has complex internal structure. In particular, it is possible to expect reflection, and for the property of "reflection" to be applied to a previously asymmetric object.
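A rough sketch of reflection-with-expectation on a linear strip, as opposed to a bare yes-or-no check. It assumes a brute-force search over mirror axes, which is my simplification and not a claim about how the perception should really be implemented; `best_reflection` is a hypothetical name:

```python
def best_reflection(seq):
    """Find the mirror axis with the most matching pairs.

    Returns (axis, violators). The axis is in half-steps, so axis / 2 is the
    index of the mirror line. Violators are the indices whose mirror partner
    is missing or different: the elements that "stick out like a sore thumb".
    """
    n = len(seq)
    best_matches, best_axis, best_violators = -1, None, None
    for axis in range(2 * n - 1):
        matches, violators = 0, []
        for i in range(n):
            j = axis - i  # index of the mirror partner of i
            if j == i:
                continue  # an element on the axis mirrors itself
            if 0 <= j < n and seq[i] == seq[j]:
                matches += 1
            else:
                violators.append(i)
        if matches > best_matches:
            best_matches, best_axis, best_violators = matches, axis, violators
    return best_axis, best_violators
```

On "oomemool" the best axis runs through the "e", and the only violator is the final "l", which is the element a salience mechanism would want to flag.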
And the usual caveats: It is possible to notice reflection within an image, or to notice reflection of two structures in two different images; and it is easier to see reflection if you're looking for it in advance.
Since it would be computationally expensive to compare every possible set of pixels for reflection, and yet we notice even unexpected reflections within an image - implying that the detectors are always on - the human brain probably detects for prerequisites to reflection first, and tries to perceive reflection per se only if the prerequisites trigger. If two visual images are related by the property of reflection, they are likely to have very similar high-level properties, so that the simultaneous perception of an image and its reflection would lead to perceptual structures that, in the human neuron-based brain, would resonate very strongly with each other, suggesting that tests should be performed for both identity and reflection. If the object is recognizable, then both the object and its mirror image would usually be classified identically by the temporal lobe (8) - a bird and its mirror image are both classified as "bird" - so that the visual signals from object and mirror image would rendezvous at that point, and could be backtraced to their origins, and the test for symmetry then applied.
That's how humans detect visual symmetry, anyway. It is possible that the human brain uses its underlying electrical properties to detect neural synchronies on a global scale, a physically based method that it would be computationally extravagant to match on a von-Neumann-architecture digital computer. It could be that a Monte Carlo method would do as well; a million random samplings and comparisons of parts of the global state might often find local similarities between sufficiently large similar structures - if not always, then often enough to give perception a humanlike flavor of spontaneity. A Monte Carlo method that randomly tried to detect a million possible resonances might do to duplicate almost all the functionality of neural resonance, without the combinatorial explosion that would defeat a perfect implementation.
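To make the Monte Carlo idea concrete, here is a toy sketch on a one-dimensional "global state". Everything in it is an assumption for illustration: the fixed window size, the sample count (far fewer than a million), and the hypothetical name `sample_resonances`. Whether sampling like this could really stand in for neural resonance is exactly the open question.

```python
import random


def sample_resonances(state, window=4, samples=10_000, seed=0):
    """Randomly compare pairs of windows, recording identities and reflections.

    Returns a set of (i, j, kind) with i < j, where kind is "identity" or
    "reflection". Hits are found by chance, not guaranteed.
    """
    n = len(state) - window + 1  # number of possible window start positions
    if n <= 0:
        return set()
    rng = random.Random(seed)
    found = set()
    for _ in range(samples):
        i, j = rng.randrange(n), rng.randrange(n)
        if abs(i - j) < window:
            continue  # skip overlapping (or identical) windows
        a, b = state[i:i + window], state[j:j + window]
        if a == b:
            found.add((min(i, j), max(i, j), "identity"))
        elif a == b[::-1]:
            found.add((min(i, j), max(i, j), "reflection"))
    return found
```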
But that sort of thing is a major, fundamental, and underlying design issue, and somewhat beyond the scope of this section, or even 3: Cognition. The perception of 1D temporal reflection is much simpler than the perception of true 2D or 3D spatial reflection. The modality-level design requirement is that the AI should be able to independently notice blatantly obvious temporal reflections; detecting anything more subtle can be left to heuristics, concepts, and the full weight of deliberate intelligence. The AI needs to be able to verify temporal reflections suggested by concept-level or thought-level considerations, but this, as said, is relatively simple.
| Scenario 1 |
| A glass drops, and grapes explode in the microwave, and the computer turns itself on - and then, a few minutes later, the computer turns itself on, grapes explode in the microwave, and a glass drops. |
The reactivation of the infrequently-used exploding-grape concept (or perceptual structure, if it doesn't rate a concept) should be enough to suggest that events are being repeated; enough to draw correspondences between each unusual pair of events. The computational procedure for detecting reflection is simple enough that it could conceivably be run on every consciously perceived event-line where correspondences are drawn between events - at least, with respect to the events salient enough to have correspondences drawn between them.
Perhaps this example is a bit outré, but then it's hard to come up with examples of useful temporal reflections. The only example that springs to mind would be disassembling and reassembling a motorcycle (9). A stock-trading AI might find a temporal-reflection intuition useful, or an AI watching a light bob up and down and trying to deduce a pattern. "Run the process backwards" is an incredibly useful heuristic in a wide variety of circumstances, but such a high-level idea is a thought-level process; even the concept "backwards" properly belongs under Unimplemented section: Symmetry.
There are still some subtleties remaining in Scenario 1 (the exploding-grape scenario). First, the correspondences drawn are between high-level events. The concept of "exploding grape" is not represented directly in a sensory modality; at most, the sound and sight of the exploding grape are represented, and no two real-world sights and sounds will ever be precisely equal. The similarities between the first and second events that lead both of them to be classified as "exploding grape" are higher-level - either low-level conceptual or very high-level modality.
However, the modality-level intuition for temporal reflection can operate on concept-level cognitive events. In humans, for example, the thought exploding grape results in the visualization of the syllables "exploding grape" in the auditory cortex, which - in theory - could have a time-tag attached. In practice, it seems likely that the AI architecture will be such as to locate concept-level cognitive events and label them as objects - so that, among other things, thoughts can be tagged with the system-clock-time that's used for modality-level temporal intuitions. In general, thinking about thinking - introspection - obviously requires some way of observing the temporal sequence of thoughts, knowing when you thought something. Either the architecture needs to explicitly represent the activation of concepts and thoughts (the likely solution (10)); or, if it's all a big puddle of mindstuff with higher levels being emergent (11), the thoughts need to spill over into modalities in some way that allows evolved concepts and thought-level reflexes to do things like identify the time of a thought.
The second subtlety is that the temporal reflection is not likely to be perfect. The intervals between the dropped glass and the exploding grape are not likely to be exactly 20 seconds apiece. Only the comparative precedences - which event came first - are tested for reflection. That said, a reflection which preserves intervals constitutes a much stronger binding, although human temporal perceptions are too approximate for us to notice that sort of thing without a stopwatch. (Our spatial intuitions for reflection do require the preservation of distances.)
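A sketch of the two strengths of temporal reflection described above: precedence-only, and precedence plus mirrored intervals. The function name, the event timings, and the tolerance parameter are all hypothetical:

```python
def is_temporal_reflection(first, second, tolerance=None):
    """Test whether `second` replays `first` in reverse order.

    Both arguments are lists of (label, time) pairs sorted by time.
    With tolerance=None only the order of labels is compared. Otherwise the
    gaps between consecutive events must also mirror each other to within
    `tolerance` ticks.
    """
    labels_a = [label for label, _ in first]
    labels_b = [label for label, _ in second]
    if labels_a != labels_b[::-1]:
        return False
    if tolerance is None:
        return True
    gaps_a = [t2 - t1 for (_, t1), (_, t2) in zip(first, first[1:])]
    gaps_b = [t2 - t1 for (_, t1), (_, t2) in zip(second, second[1:])]
    return all(abs(x - y) <= tolerance for x, y in zip(gaps_a, reversed(gaps_b)))
```

For Scenario 1 with made-up timings, the precedence-only test passes even when the gaps differ by a few seconds; the interval-preserving test passes only if the tolerance is loose enough.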
Simultaneity is when two events occur at the same time. Perfect simultaneity is when two events are tagged as occurring at exactly the same time, to the limits of the resolution of the modality-level system clock. Even in AIs that totally avoid parallel processing, sensory modalities will tag all the components of an incoming image as having arrived at the same time, so any mind is full of insignificant simultaneities. Significant simultaneities are those that are unexpected and that occur in high-level, salient objects. For example, two objects simultaneously disappearing from a sensory input.
Because a seed AI's system clock will probably run much much faster than our own, it may be necessary to define intuitions that detect imperfect simultaneities - for example, any sensory coincidence within 1/40th of a second, or any internal coincidence within 1/1000th of a second (or some other time scale chosen to match the speed of the AI's stream of consciousness). (12).
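A minimal sketch of an imperfect-simultaneity detector, assuming nanosecond clock tags and using the 1/40th-second figure above as the sensory window. `coincidences` and `SENSORY_WINDOW_NS` are hypothetical names:

```python
SENSORY_WINDOW_NS = 1_000_000_000 // 40  # 1/40th of a second, in nanoseconds


def coincidences(events, window):
    """Return pairs of labels whose clock tags fall within `window` ticks.

    `events` is a list of (label, time) pairs. Sorting first means each event
    is compared only with its near neighbours in time, not with every other.
    """
    ordered = sorted(events, key=lambda e: e[1])
    pairs = []
    for i, (label_a, time_a) in enumerate(ordered):
        for label_b, time_b in ordered[i + 1:]:
            if time_b - time_a > window:
                break  # everything later is even further away
            pairs.append((label_a, label_b))
    return pairs
```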
Aside from that, take all the caveats I listed in 3.1.4.1: Reflection and apply them to simultaneity. For example, if simultaneity is repeated often enough to be expected, then the expectation of simultaneity is applied to sensory inputs to create an image, a violated expectation should be noticed as a conflict of the real image with the expectation, the violating stimulus should become salient, and so on. (And if stimulus A appears without the expected simultaneous stimulus B... and stimulus B still hasn't appeared after the AI gets over the shock... then both stimulus A and the absence of B become salient.)
The human perception of intervals is approximate rather than quantitative. We divide how long something feels into "less than a second", "a second", "ten seconds", "a minute", "ten minutes", "an hour", "a few hours", "a day", "a few days", "a few weeks", "a few months", "a few years", "a lifetime", and "longer than a lifetime". (That's a guess. I don't know the actual categories or their boundaries. It would be an interesting thing to know, if someone has already done the research.)
The human perception of temporal intervals is also at least partially subjective, dependent on how much thinking is going on. A process relatively empty of events, in which our mind processes incoming data much faster than it becomes available, is paradoxically perceived as being longer - it is "boring" (13). A process packed full of emotionally significant events may appear as being longer; when it's over, "it feels much longer than it was". (Again, with the time-as-pathway metaphor, passing a lot of events may appear to make the intervals longer.) There's also the proverb "time flies when you're having fun"; if events happen so fast that "there's no time to think" or pay attention to underlying intervals, time may appear to move by much more quickly. (14).
However, it appears to me that human subjective intervals implement no important functionality. If the AI uses system-clock intervals to control the actual subjective perception, so that perceived intervals are precise, then the perception of exact intervals is more likely to be useful - that is, when two processes unexpectedly have the same intervals, it is more likely to signal a useful underlying correlation. The AI does need a perception for "approximately the same amount of time", since this is a useful human perception. (Such a perception might have a quantitative as well as a qualitative component; in other words, the perception of "approximately the same amount of time" might be strongly true or weakly true.)
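One possible shape for a graded "approximately the same amount of time" perception. The ratio measure and the 0.8 threshold are my assumptions for the sketch, not anything established above:

```python
def same_duration_strength(a, b):
    """Return 1.0 for identical intervals, falling toward 0.0 as they diverge."""
    if a <= 0 or b <= 0:
        raise ValueError("intervals must be positive")
    return min(a, b) / max(a, b)


def approximately_same(a, b, threshold=0.8):
    """Qualitative perception layered on the quantitative strength."""
    return same_duration_strength(a, b) >= threshold
```

The quantitative component is the returned strength; the qualitative component is whether it clears the threshold.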
It may be that we humans have no modality-level "equal interval detectors" at all - after all, we have to count heartbeats or glance at a watch when we want to even verify the equality of two intervals. If so, an AI with a modality-level appreciation for intervals might spot surprises that a human would miss.
"Temporal Reasoning" in MITECS notes that comparative operations on intervals can be more complex than the simple precedence or simultaneity of instantaneous events: "There are thirteen primitive possible relationships between a pair of intervals: for example, before (<) meets (m) (the end of the first corresponds to the beginning of the second), overlaps (o) and so on." Since these thirteen possible relationships can be built up from the relationships of the "start" and "end" events, I don't think they would require architecture-level support. Overlapping intervals should be intuitively noticed because salient intervals should be perceived as solid, filling in every point between the two events, and collisions should be detected in the same way as collisions of solid objects. Computationally, this can be implemented either by using a 1D collision-detection algorithm, or by creating an internally perceived "timeline", with temporal pixels that can be occupied by multiple events, with a computationally tractable resolution (the system clock might be too fast) that is nonetheless fine enough to detect overlap. (16).
Finally, intervals have the same caveats as 3.1.4.1: Reflection. For example, intervals are perceived only for salient events; they aren't computed for every pair of cognitive events in the mind. (This is, in fact, impossible, since the perception of an interval is itself a cognitive event.)
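To illustrate the claim that the thirteen interval relationships can be built up from comparisons of the "start" and "end" events, here is a sketch using the standard relation names from Allen's interval algebra. Intervals are assumed to be (start, end) pairs with start < end; `allen_relation` and `collides` are hypothetical names:

```python
def allen_relation(a, b):
    """Classify interval a relative to interval b using only endpoint comparisons."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    return "overlaps" if s1 < s2 else "overlapped-by"


def collides(a, b):
    """1D collision detection: do the solid interiors of a and b intersect?"""
    return max(a[0], b[0]) < min(a[1], b[1])
```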
Temporal precedence is which of two events - A or B - came first. Precedence is the most often-used and most useful temporal perception; it is the one by which humans order reality. We don't care about the exact intervals in milliseconds (although an AI might - see above); we care whether event A or event B came first. Precedence is the most useful temporal intuition because it is the most deeply intertwined with causality - effects follow causes. (See Unimplemented section: Causality.)
Mathematically, transitivity of precedence is the defining characteristic of a linear ordering. If A < B and B < C, then A < C; if this relation holds true for all events A, B, and C in a group, then that defines a linear ordering of the group (17). The set of precedence relations defines a linear string of events. It is this definition that we humans use, most of the time. Without access to an actual calendar, we will almost never reconstruct a series of events by trying to remember the actual temporal labels and performing a sort(). Rather, we try to reconstruct the series by remembering that B came after A and before C, that D came after B, and so on.
It is also noteworthy that we tend to remember precedences that have reasons behind them - such as the precedence of cause and effect. If the series is a causal chain, we may be able to rattle off the whole series without effort. If we're trying to describe the ordering of events that belong to multiple different causal series, we often have to consciously reconstruct the complete ordering from intersections in the partial orderings we remember; from remembering whether something was "a short time ago" or "a long time ago"; and so on. We do not remember an internal calendar or timeline, and we do not remember - on the modality level - the times of events. We remember precedences, and it is from these precedences that the timeline of our lives is constructed.
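Reconstructing a timeline from remembered precedences, rather than from stored timestamps, is essentially a topological sort. A small sketch using Python's standard `graphlib` module (Python 3.9+); the function name `reconstruct` is hypothetical:

```python
from graphlib import TopologicalSorter


def reconstruct(precedences):
    """Build one consistent ordering from (earlier, later) pairs.

    Raises graphlib.CycleError if the remembered precedences contradict
    each other. If the memories leave the order underdetermined, any one
    consistent ordering is returned.
    """
    sorter = TopologicalSorter()
    for earlier, later in precedences:
        sorter.add(later, earlier)  # `later` must come after `earlier`
    return list(sorter.static_order())
```

Given "B came after A and before C, D came after B", plus the extra memory that D came before C, the only consistent order is A, B, D, C.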
A seed AI should probably use a modality-level clock or a modality-level timeline, but it will still need to understand precedence.
Precedence in general is ubiquitous; we invoke it every time we say before or after. Precedence can be spatial as well as temporal. Precedence applies to priorities, not just in terms of what must be done first, but the first choice. In this sense, we invoke precedence every time we say better or worse. The metaphors for precedence apply to every comparator that operates on a linear ordering: This is why linear and temporal metaphors are ubiquitous in human language.
What all the metaphors have in common is that the comparative operation on the quantity or trajectory usually reflects an actual temporal precedence - the first choice is usually the one that is considered first; the cognitive events associated with extrapolating that choice will take place earlier. If a simpler theorem comes before a more complex one, it's because the complex theorems are constructed from simple ones; the simple ones are learned first or invented first, and the cognitive event of that learning or invention will have an earlier clock-time attached.
Comparison is as ubiquitous in modalities as it is in ordinary source code. The modality-level intuitions for temporal precedence are a single case of this general rule.
Usual caveats about expecting precedence and broken expectations and so on.
"Quantity" is invoked with every perception containing a real number, as ubiquitous as floating-point numbers in ordinary source code. When I say "quantity", I do not just refer to a continuously divisible material substance, like water or time; I generalize to the internal use of floating-point numbers in representations and intuitions - all the perceptions that can be "stronger" or "weaker".
Given two quantities, we can notice which is more or less; given two quantitative properties, such as height, we can notice which is higher or lower; given two quantitative perceptions, we can tell which is stronger or weaker. This perception can operate statically, in the absence of a temporal component.
As discussed in Unimplemented section: whenextract, quantities and comparators are too ubiquitous to initiate thoughts directly, unless the quantities and comparators are properties of very high-level objects; thus, low-level quantities and comparisons would be computed either as preludes to feature extraction, or only when demanded by the context of a higher thought. Comparisons computed for feature extraction are also generally local. A human visual pixel is compared with nearby pixels for edge detection, but not with every other pixel in the image, using O(N) instead of O(N^2) comparisons. A seed AI should be able to compare arbitrary pixels in arbitrary modalities - but only on demand. For more about the differences between on-demand and automatically-computed perceptions, the difference between low-level and high-level perceptions, and the difference between thought-initiating and guess-verifying perceptions, see Unimplemented section: whenextract.
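The distinction between local, automatic comparisons and arbitrary, on-demand comparisons might be sketched as follows for a one-dimensional row of pixels. The function names, the threshold, and the sample row are hypothetical.

```python
def local_edges(pixels, threshold):
    """Compare each pixel only with its right-hand neighbour.

    This costs N - 1 comparisons, rather than the N * (N - 1) / 2
    needed to compare every pixel with every other pixel.
    """
    return [i for i in range(len(pixels) - 1)
            if abs(pixels[i + 1] - pixels[i]) >= threshold]

def compare_on_demand(pixels, i, j):
    """Compare two arbitrary pixels, only when a higher-level process asks.

    Returns 1 if pixel i is greater, -1 if it is lesser, 0 if equal.
    """
    a, b = pixels[i], pixels[j]
    return (a > b) - (a < b)

row = [10, 11, 10, 12, 90, 91, 89, 90]
print(local_edges(row, threshold=20))   # indices where an edge begins
print(compare_on_demand(row, 0, 7))
```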
The list of basic operations that can be performed on static quantities is basically the set of useful arithmetical operations: Subtraction (in other words, interval calculation), comparison, equality testing. It would also be possible to include addition, multiplication, division, bit shifting, bitwise & and |, remainder calculations, exponentiation, and all the other operations that can be performed on integers and floating-point numbers; however, these operations are less likely to be useful - less likely to pick out some interesting facet of reality.
A field of quantities, extended across time or space or both, can give rise to the mid-level features called patterns; patterns are higher-level than quantities, and richer, and rarer as a perception (a hundred pixels give rise to one pattern); thus, patterns are more meaningful. Patterns can be broken, and the high-level feature that constitutes the breaking of a pattern is rarer, and far more meaningful, than either the patterns themselves or the low-level quantities. (I speak here of modality-level patterns; the problem of seeing thought-level patterns is nearly identical with the problem of intelligence itself.)
One example of a pattern is a rising quantity - "rising" implying either a single quantity changing with time, or a field of quantities changing continuously with some spatial dimension.
A modality observing D: 8, 16, 32, 64, 128, 256 should notice that the numbers are constantly increasing, and that the rate of the increase is constantly increasing. A human modality would not notice that the numbers formed a doubling sequence - and neither, in all probability, should an AI's modality, unless the sequence is examined by a thought-level process. I say this to emphasize that the problem of modality-level pattern detection is limited, in contrast to the problem of understanding patterns in general - if the AI's modality can understand a simple, limited set of patterns, it should be enough.
To notice a pattern is to form an expectation. When this expectation is violated, the pattern is broken. Observing a single quantity changing, as in sequence C, the feature "increasing" remains constant. If C continues but suddenly starts decreasing - 8, 19, 22, 36, 45, 71, 62, 21, 7, 6, 1 - an "edge" has been detected. On a higher level, this is what is observed: "...greater than, greater than, greater than, less than, less than, less than..." Thus the presence of the low-level feature detector for "greater than" or "less than" enables the AI to notice a pattern it could not otherwise notice, and to detect an edge it could not otherwise see. That is the function of modality-level feature detectors: To enable the discovery of regularities in reality that would otherwise remain hidden.
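The step from raw quantities to a stream of "greater than" and "less than" features, and from there to an edge, can be sketched roughly like this. The function names are hypothetical; the sequence is the one given in the text.

```python
def comparator_features(values):
    """Turn a sequence of quantities into '>', '<', or '=' features,
    one per pair of successive values."""
    features = []
    for previous, current in zip(values, values[1:]):
        if current > previous:
            features.append(">")
        elif current < previous:
            features.append("<")
        else:
            features.append("=")
    return features

def pattern_breaks(features):
    """Indices at which a feature differs from the one before it."""
    return [i for i in range(1, len(features))
            if features[i] != features[i - 1]]

sequence = [8, 19, 22, 36, 45, 71, 62, 21, 7, 6, 1]
features = comparator_features(sequence)
print("".join(features))
print(pattern_breaks(features))
```

The edge is invisible in the raw numbers taken one at a time; it only appears once the comparator features exist to be compared with each other.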
As a general rule, notice equality, continued equality, and broken equality in the quantity, in the first derivative, and in the second derivative. We notice when a constant quantity changes and when a constant rate of change changes, but we humans do not directly perceive changes in acceleration. We compute the quantity and the quantitative first derivative, but not the quantitative second derivative. Since the second derivative - for humans - is not quantitative but qualitative, we can notice it crossing the zero line, or notice large (order-of-magnitude) changes, but not notice small internal variances. An AI might find it useful to perceive the second derivative quantitatively, but computing a quantitative third derivative (and thus a qualitative fourth derivative) would probably not contribute significantly to intelligence outside of specialized applications.
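One way to read the quantitative-versus-qualitative distinction is that the first derivative is kept as numbers while the second derivative is reduced to its sign. The sketch below assumes discrete differences stand in for derivatives; the function names are hypothetical, and the sample is the doubling sequence mentioned earlier.

```python
def differences(values):
    """Discrete stand-in for a derivative: successive differences."""
    return [b - a for a, b in zip(values, values[1:])]

def sign(x):
    """1 for positive, -1 for negative, 0 for zero."""
    return (x > 0) - (x < 0)

def describe(values):
    first = differences(values)     # kept as quantities
    second = differences(first)     # reduced to a qualitative sign
    return {
        "first_derivative": first,
        "second_derivative_sign": [sign(x) for x in second],
    }

doubling = [8, 16, 32, 64, 128, 256]
print(describe(doubling))
```

Here the modality would register "increasing" and "rate of increase increasing" without registering that the sequence doubles.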
(18).
There is still a question of salience. We would wish a financial AI, or a human accountant, to notice and wonder if a bank account customarily showing transactions measured in hundreds of dollars suddenly began showing transactions measured in millions - the mid-level feature "magnitude", formerly constant at "hundreds", suddenly jumps to "millions". But we wouldn't want to notice a change from the mid-level feature "magnitude: 150-155" to the mid-level feature "magnitude: 153-160", even though - on the surface - both look like equally sharp inequalities. (As a crystalline "compare" operation, "hundreds" != "millions" is neither more nor less unequal than "150-155" != "153-160".) Similarly, we would not notice a change from the mid-level feature "frequency of numbers ending in 5: 20%" to "frequency of numbers ending in 5: 25%"; or, if we did somehow notice, we wouldn't attach as much significance.
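A crude sketch of treating "magnitude" as a mid-level feature, so that a jump from hundreds to millions is salient while a shift from 150-155 to 153-160 is not. The use of the base-10 logarithm, the `min_jump` parameter, and the function names are assumptions for illustration; amounts are assumed to be nonzero.

```python
import math

def magnitude(amount):
    """Order of magnitude of a nonzero amount: 150 -> 2, 2_500_000 -> 6."""
    return math.floor(math.log10(abs(amount)))

def magnitude_break(history, new_amount, min_jump=2):
    """True if the new amount lies at least `min_jump` orders of
    magnitude outside the range of magnitudes seen so far."""
    usual = [magnitude(a) for a in history]
    low, high = min(usual), max(usual)
    m = magnitude(new_amount)
    distance = max(low - m, m - high, 0)
    return distance >= min_jump

print(magnitude_break([150, 420, 310], 2_500_000))
print(magnitude_break([150, 155, 152], 160))
```

Which features deserve this kind of monitoring, and what counts as a large enough jump, is exactly the learned salience the surrounding text describes; the fixed threshold here only marks where that learning would plug in.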
We have learned from experience, or from our cultural surroundings, that money is extremely significant, that people often try to tamper with it, and that the order-of-magnitude of monetary quantities should be paid attention to; we have not learned a similar heuristic for shifts in a few dollars, or shifts in percentage frequency of digits, which is why monitoring either quantity is a specialized technique used only by auditors.
Learning which patterns and broken patterns to pay attention to is a concept-level problem; it's not trivial, but Eurisko-oid techniques should suffice.
These are the feature extractors that can operate on quantities in general:
On the concept-level, all these features should be computed for all salient high-level quantities, and for all higher-level features rare enough that computing all the features is computationally tractable. Figuring out which features to compute for a quantity, and which features to pay attention to, is a major learning problem for the AI; learning in this area contributes significantly to qualitative intelligence as well as efficiency, since compounding extractors can lead to the computation of entirely new features.
On the modality level, these feature extractors can be composed to yield some basic mid-level features, such as edge detection in pixels, although anything more than that is probably a domain-specific problem. For example, a problem as simple as computing changes in velocity will not fit strictly within the domain of quantitative perceptions, unless the velocity is broken up by domain-specific perceptions into quantitative components of speed and direction.
Lakoff and Johnson, arguing that our understanding of trajectories is fundamentally based on motor functions, offer this list of the basic elements of a trajectory (quoted from "Philosophy in the Flesh"):
The concept of a trajectory can be represented in the temporal XO modality. Zooming out from the following frame, "OOOOOOXOOOOOOOOXOOOXOOOOOO", it could be described as "three points on a line". Given a temporal sequence of XO frames, the points on the line can "move"; they can have position, speed, direction, and velocity.
The XO modality suffices to represent an example of a trajectory, e.g.: "XXOOOX", "XOXOOX", "XOOXOX", "XOOOXX"; an observing human would say that the middle X has moved from the starting point defined by the first X to the endpoint defined by the third X. (Note that I do not yet use the word "target".)
For the sake of form, we should name all the intuitions giving rise to the start-move-endpoint perception. The largest hurdle is the perception of each middle X as an instance of the same continuous object - that is, that the X at position 2 in t1, the X at 3 in t2, the X at 4 in t3, and the X at 5 in t4, are all instances of a single object with a continuous existence. A human makes this interpretation immediately because we have built-in assumptions about the continued existence of discrete objects - domain-specific instincts that become visible within a few months after birth.
An AI could probably make the same interpretation, but it would be more difficult. To establish a strongly bound perception of each X as a discrete object and the middle X as a continuous object, it would probably take a trajectory lasting, say, ten frames, instead of four. Assume for the moment that the sequence is expanded to encompass ten frames and ten one-unit steps for the middle X. In this case, the following facts are visible immediately: First, that there are the same number of Xs in each frame. (I will not say "three Xs in each frame", since this implies an understanding of "three".) Second, that each frame has an X in position 1 and an X in position 12. To a human, it is "obvious" that the constant number of Xs implies a constant number of discrete objects; to a human, it is obvious that the three Xs are each different objects; to a human, it is obvious that an X maintaining an identical position in each frame is the same object in each frame; therefore, since the first and last Xs are accounted for, the leftover middle X in each frame must be the third object. And indeed, the "movement" of the third object (or "shift in the positional attribute", as an AI might see it) is incremental and constant.
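The bookkeeping part of this inference can be sketched on the four frames given earlier, using zero-based positions rather than the one-based positions of the text. The sketch assumes the number of Xs is constant and that exactly one X per frame is left over once the stationary ones are accounted for; the function names are hypothetical, and nothing here addresses the harder question of why equal positions should imply equal identity.

```python
def x_positions(frame):
    """Set of zero-based positions holding an X."""
    return {i for i, ch in enumerate(frame) if ch == "X"}

def split_fixed_and_moving(frames):
    """Separate Xs that hold the same position in every frame from
    the single leftover X in each frame."""
    per_frame = [x_positions(f) for f in frames]
    fixed = set.intersection(*per_frame)
    leftovers = [sorted(p - fixed) for p in per_frame]
    if any(len(rest) != 1 for rest in leftovers):
        raise ValueError("expected exactly one moving X per frame")
    trajectory = [rest[0] for rest in leftovers]
    return fixed, trajectory

frames = ["XXOOOX", "XOXOOX", "XOOXOX", "XOOOXX"]
fixed, trajectory = split_fixed_and_moving(frames)
steps = [b - a for a, b in zip(trajectory, trajectory[1:])]
print(fixed, trajectory, steps)
```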
A tremendous amount of cognition has just flashed by. Getting the AI to perceive two experiences as belonging to the same object is almost as deep a problem as that of getting the AI to perceive two objects as belonging to the same category. Some of the underlying forces are visible in the source code of Hofstadter's Copycat; Copycat can see two different letters in two different strings as occupying the same role. (Copycat can also see bonds formed by "movements" in letterspace; it knows that "c" follows "b".) The general rule, however, goes much deeper than this.
| Rules of Identification |
| 1. Equality of attributes across experiences, particularly those attributes that remain constant for constant objects, implies equality of identity. |
| 2. Continuous change in an attribute, particularly those attributes that can change without changing the underlying object - such as "position" or "speed" - implies equality of identity. |
| Rule of Improbability Binding |
| When two images are equal or very similar, the probability that there is a shared underlying cause behind the equality is proportional to the improbability of a coincidental equality. |
The Rule of Improbability Binding implies that the wider the range of possible values for an attribute, the more strongly equality of values implies equality of underlying objects. "XOX" binds to "XOX" much more weakly than "roj" binds to "roj". "3" binds to "3" much more weakly than "23,083" binds to "23,083".
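As a rough illustration of why a wider range of possible values binds more strongly, the improbability of a coincidental match can be measured in bits. The sketch assumes every symbol is drawn independently and uniformly from its alphabet, which real experiences will not satisfy; the function name is hypothetical.

```python
import math

def coincidence_bits(alphabet_size, length):
    """Bits of improbability that two independent, uniformly random
    strings of this length and alphabet would match by chance."""
    return length * math.log2(alphabet_size)

print(coincidence_bits(2, 3))    # "XOX": 3 symbols from {X, O}
print(coincidence_bits(26, 3))   # "roj": 3 lowercase letters
print(coincidence_bits(10, 1))   # "3": 1 decimal digit
print(coincidence_bits(10, 5))   # "23,083": 5 decimal digits
```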
Thus, even so basic a task as knowing when two experiences are the "same" object requires that the AI have previously learned which attributes are good indicators of identity, which in turn requires that the AI have watched over objects known to be identical so that it can observe which attributes remain constant. If this were a seminar on logic we'd be in trouble, but since we're pragmatists we can break the circularity by cheating, just as the human mind does - it seems highly likely that equality of visual signatures and continuous change in position are hardwired into the brain as signals of identity. Similarly, we can start by identifying a few good attributes to begin with, and giving some sample sets with pre-identified objects, and letting the seed AI work it out from there.
What are the consequences of identifying an object?
| Rules of Objectification |
| 1. Objects constitute a major source of regularities in reality, and many heuristics - perhaps even modality-level feature extractors - will operate on objects rather than experiences. |
| 2. Objects often continue to exist even when they are not directly experienced, and may require continuous modeling. |
| 3. Objects will often have internal attributes and complex, dynamic internal structure. |
| 4. All nonvisible attributes of an object remain constant across experiences, unless there is a reason to expect them to change. (If the object has intrinsic variability, then the description of the variability remains constant.) |
(Author's note: The discussion of objects should probably be somewhere other than 3.1.6: Trajectories, probably the section on categorization, and should have a much longer discussion.)
In what sense does labeling objects as "sources", "trajectors", and "destinations" - we will not use the term target just yet - differ from identifying them as "Object 1", "Object 2", and "Object 3"? In what sense is a "path" different from a "trajectory"? What expectations are implied by the labels, and what experiences are preconditions for using the labels?
Conceptually, a path can exist apart from the traversing objects. If, on multiple occasions, one or more objects are observed to precisely traverse the same path - perhaps at the same speed - then a generalization can be made; an observed feature can be extracted from the single experience and verified to apply across a set of different experiences. To observe the existence of a path is useful only if the observation is reflected in external reality - for example, if the reason a rolling ball follows a path down a mountain is that someone dug a trench. A seed AI is unlikely to need to deal with physical trajectories of the type we are familiar with, but the metaphor of "trajectory" extends to the more important modality of source code - a piece of data can follow a path through multiple functions.
Similarly, the condition that leads us to identify some object or position as "source" is that one or more observed trajectories originate from that source; what leads us to identify a position as "endpoint" is that one or more observed trajectories terminate at that endpoint. What makes the perception of "source" useful is if there is a causal reason why the position is the source of the trajectory, especially if the object or position is actually generating the trajectors - if a pitcher throws a ball, for example; or, in AI terms, if a function outputs pieces of data that then travel through the system. Similarly, the perception of "endpoint" is especially useful if the endpoint actually halts the trajector, or consumes it.
One cue that a real cause may exist - that the perception of a position/object as "source"/"path"/"endpoint" is useful - is if multiple, varying paths/trajectories have the same source or endpoint. Imagine that a randomly moving point darts over a screen, and then the movie is played back three times; the fact that the sources and endpoints were identical may not mean that the sources and endpoints have any particular significance; the rest of the path was identical too.
| Rule of Variance Binding |
| Multiple, variant experiences sharing a single higher-level characteristic, but not others, means that the shared characteristic is likely to be significant. Multiple identical experiences can have any number of possible sources; only if at least some properties differ is there a reason to focus on a particular shared characteristic as opposed to others. |
Thus, the perception of "source" or "endpoint" exists whenever multiple trajectories share a starting position or ending position, and exists more strongly when multiple different trajectories share a source or endpoint but not other characteristics. The perception of "source" and "endpoint" is useful when the perception reflects the underlying cause of the initiation or termination of the trajectory.
A "source" or "endpoint" can be any characteristic shared by multiple origins or terminating points, not just position. If the trajectory of a grenade always ends at the location of the blue car, regardless of where the blue car goes, then it's a good guess that someone is trying to blow up the blue car - that the blue car is the endpoint. The greater the variance, the lower the probability that the covariance is coincidence, and the stronger the binding. The more unique the description of the endpoints - e.g., the blue car was the only car which shared a location with all endpoints, and the green car and the purple car were elsewhere - the stronger the binding. This binding is predictive if it can be used to predict the position of the next trajectory termination by reference to the position of the perceived "endpoint", and manipulative if moving the perceived "endpoint" can change the trajectories - that is, if you can guess where the grenade will fall by looking at the blue car, and make the grenade fall in a particular place by driving the blue car there. If the binding is strong enough, the endpoint may deserve the name of "target" (see below).
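The blue-car example can be sketched as a check for which objects share a location with every trajectory endpoint, refusing to bind anything when the endpoints do not vary. The data layout, the function name, and the sample positions are all hypothetical.

```python
def bound_endpoints(trials):
    """Names of objects located at the endpoint in every trial.

    Each trial is a dict with an "endpoint" position and an "objects"
    mapping of name -> position. If all endpoints are identical, the
    trials single out nothing in particular, so nothing is bound.
    """
    endpoints = {t["endpoint"] for t in trials}
    if len(endpoints) < 2:
        return []
    names = set.intersection(*(set(t["objects"]) for t in trials))
    return sorted(n for n in names
                  if all(t["objects"][n] == t["endpoint"] for t in trials))

trials = [
    {"endpoint": (2, 3), "objects": {"blue car": (2, 3), "green car": (9, 9)}},
    {"endpoint": (7, 1), "objects": {"blue car": (7, 1), "green car": (9, 9)}},
    {"endpoint": (4, 8), "objects": {"blue car": (4, 8), "green car": (0, 5)}},
]
print(bound_endpoints(trials))
print(bound_endpoints([trials[0]] * 3))   # the same movie replayed three times
```

A fuller version would grade the strength of the binding by how widely the endpoints vary and by how many other objects also match, rather than returning a yes-or-no answer.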
Finally, it is noteworthy that "source" and "endpoint" do not necessarily imply that the trajector goes into and out of existence. Any interval which bounds the trajectory, or any conditions which bound the trajectory, or any sharp changes within the trajectory, may make salient the location of the trajector during the boundary change. (To perform the computational operations which check multiple trajectories for binding of sources or endpoints, it is necessary that the source and endpoint be salient - salient enough that the additional processing is performed which discovers the binding.)
When defining what it means to take the intentional stance with respect to a system, the archetypal example given is usually that of the thermostat. A thermostat turns on a cooling system when the temperature rises above a certain point, and turns on a heating system when the temperature falls below a certain point. A thermostat behaves as though it "wants" the temperature to stay within a certain range; as if the thermostat had a goal state and deliberately resisted alterations to that goal state. In reality, a thermostat possesses no model of reality whatsoever, but we may still find it convenient to speak of the thermostat's behavior as goal-oriented or "intentional".
To describe a trajectory using the terms source, path, and target, the trajector's arrival at the target must be non-coincidental. If the trajector is continuously propelled, then use of the word "target" usually implies that the trajector's path is self-correcting - that if an impulse is applied which causes the trajector to depart from the path, a correction (originating from inside or outside the trajector) will correct the trajectory so that the trajector continues to approach the goal state. A trajector typically approaches the target such that the distance between trajector and target tends to decrease continuously, in spite of any interfering impulses. (This is not always true, particularly in cases where the "trajector" actually is an intelligent or semi-intelligent entity capable of taking the long way around, but you get the idea.) In a slightly different usage of the word "target", the trajector moves at a constant and unalterable velocity, but tends to hit the target - or at least come close to it - because the trajector was aimed. (Which is how "aiming" is defined.) (Author's note: Expand this area.)
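A self-correcting approach in the continuously propelled sense can be illustrated with a one-dimensional toy: at each step an outside impulse may knock the trajector off course, and a proportional correction then pulls it back toward the target. The proportional rule, the `gain` value, and the function name are assumptions chosen only to make the behavior visible.

```python
def approach(position, target, steps, gain=0.5, impulses=None):
    """Return the distance to the target after each step.

    `impulses` maps a step number to a displacement applied from
    outside before that step's correction.
    """
    impulses = impulses or {}
    distances = []
    for t in range(steps):
        position += impulses.get(t, 0.0)          # interfering impulse
        position += gain * (target - position)    # correction toward target
        distances.append(abs(target - position))
    return distances

distances = approach(0.0, 10.0, steps=6, impulses={3: -4.0})
print(distances)
```

The distance shrinks, jumps when the impulse lands, and then resumes shrinking, which is the signature that separates a "target" from a merely coincidental endpoint.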
Resistance is the name given to an "obstacle" on the way to the target or goal state. The perception of "resistance" arises when we observe a trajector hit some type of barrier and bounce, or slow down, or be pushed back. The implication is that the trajector has not merely encountered some random impulse, but that there are specific forces preventing the achievement of a specific goal state or subgoal state.
Forcefulness is the ability to overcome resistance. The perception of "forcefulness" - force that, to humans, is viscerally impressive - arises when we see the trajector applying additional forces to overcome resistance.
All of this applies, not just to actual moving objects, but to goals in general; to the higher-level metaphor similarity is closeness. The idea of "closeness" does not apply only to two quantitative attributes, but also to two structures built from a number of qualitative attributes. If, over time, the qualitative attributes of the first structure are one by one adjusted so that they match the corresponding attributes of the second structure, then the first structure is "approaching" the second.
Mathematically, we might say that one point is approaching a second in the multi-dimensional phase space defined by the qualitative attributes, but this is being overly literal. The perception of similarity is useful when two objects being more similar means that the two objects are more likely to behave similarly. The similarity-is-closeness metaphor is useful and manipulative when two objects being "closer" means that less additional work is required to make them match completely - one object has become closer to the target represented by the other.
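Read manipulatively, "closeness" between two structures of qualitative attributes can be approximated by the number of attributes still to be changed. The sketch below assumes each structure is a flat mapping of attribute names to values and that every mismatch costs the same amount of work; both assumptions, and all names, are illustrative only.

```python
def mismatches(structure, target):
    """Number of target attributes the structure does not yet match."""
    return sum(1 for key, value in target.items()
               if structure.get(key) != value)

def is_approaching(states, target):
    """True if successive states never move away from the target and
    the last state is closer than the first."""
    d = [mismatches(s, target) for s in states]
    never_recedes = all(later <= earlier for earlier, later in zip(d, d[1:]))
    return never_recedes and d[-1] < d[0]

target = {"color": "red", "shape": "round", "size": "small"}
states = [
    {"color": "blue", "shape": "square", "size": "small"},
    {"color": "red", "shape": "square", "size": "small"},
    {"color": "red", "shape": "round", "size": "small"},
]
print([mismatches(s, target) for s in states])
print(is_approaching(states, target))
```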
Use of the term "close" to mean "similar" is an astonishingly general metaphor. "Close" is used to describe almost any object, event, or situation that can "approach" a goal state. "Approach" is used as a metaphor to describe goals in general.
The ultimate underpinning of this metaphor, in humans, may actually be the human emotional state of tension. We feel tension as we watch something approach a goal; tension rises as the goal comes closer and closer... The same rising tension applies when we watch a trajector approach a target. The closer the approach, the sharper our attention, the more we're on the lookout for something that might go wrong at the last second. The metaphor between spatial closeness and generalized similarity is probably a shadow of the much stronger metaphor between approaching a target and approaching a goal.
Generally speaking, it's a bad idea to weigh down an AI with slavish imitations of human emotions. It may not even be necessary to duplicate the metaphor; I'm not all that sure that the space-to-similarity metaphor contributes to intelligence. It does seem likely that the AI will either experience or learn some type of heightened attention as events approach a goal state.
For us humans, who inhabit a physical world, trying to make an object achieve a certain position is one of the most common goal states; position is one of the attributes that is most commonly manipulated to reach a goal state. Indeed, we might be said to instinctively apply the metaphor state is position. Perhaps the AI will learn a similar set of extensive metaphors for source code.
There should probably be some type of modality-level support that indicates the feeling of approaching a goal, so that the concept of "approaching a goal" lies very close to the surface, and generalizations across tasks and modalities are easy to notice. The idea of "approach" is an opening wedge, a way to split reality along lines that reveal important regularities; the behavior of the "trajectory" towards the goal in one task is often usefully similar to the behavior of trajectories in other tasks.
It may be that this is a genuine instance of a physical property of the underlying neurons that would be very hard to duplicate as an external heuristic, without creating an additional layer of neuronlike interpreted code. However, I think that procedural pattern-detectors, plus the ability to learn heuristics about which pattern-detectors to apply and when, should be able to match the effectiveness of biological neurons at forming expectations and detecting patterns.
Our neural ability to adapt to unexpected new patterns may be simulable by trying to detect identity or covariance in a few thousand entirely random quantities, every now and then.