Robots Using Symbols
David Leech Anderson: Author
Do you remember mechanical typewriters? You don't see them so much anymore now that computers with word-processing programs and printers do that work for us. Consider how a typewriter works. It is a machine built to produce symbols. For example, if you strike the right keys in the proper order, then ink will produce black marks on a page with the following shapes 'f-l-o-w-e-r'. Together they are a symbol that represents a kind of plant. It is the nature of a symbol to stand for or to represent something else. When we speak of the meaning or the content of a symbol, we are speaking of that which the symbol represents.
Let's return to our old-fashioned mechanical typewriter. While it is a machine that produces symbols (on a page), it does not use symbols to accomplish that task. To type the word 'flower,' I strike the appropriate keys and the physical force of my fingers on each key causes the hammer to move in an arc and strike the paper. The typewriter is a physical system made of component parts. But none of its functional parts represents anything. The hammer that imprints the letter 'f' on the paper is simply a lever; it doesn't stand for anything. The letters that are printed on the page can be symbols, of course, but no component of the typewriter itself functions as a symbol.
So we can say of the typewriter that it can produce symbols, but that it doesn't use symbols in the job that it performs. In its design, it is a simple mechanical device, just like a push lawnmower or a wind-up toy. But the computer you are using to view these pages, the computer that many of you use to write letters, is not a simple mechanical device. It not only produces symbols (when you print your letter on a printer), but it uses symbols to accomplish that task. When you press the keys on the computer keyboard, the force of your finger does not push a mechanical lever. Rather, it sends an electrical signal which results in numerous "on/off" switches being thrown inside your computer. Just as certain arrangements of letters form words, which function as symbols, so too certain arrangements of switches (some on, some off) also function as symbols. These arrangements can be thought of as a kind of language -- a binary language, since it consists of only two elements. But they are, nonetheless, symbols. Which is to say that they are representational; they stand for something. One combination of on/off switches might represent the English word 'flower'. Another combination will be an electrical message being sent to the monitor that means "Display the word 'flower' on the screen".
In our pages on "What is a computer?" it is explained that a classical or digital computer is characterized by a kind of information processing in which information is carried in discrete units like the red balls in our animated computer. The on-off switches of a PC or a Mac are similarly digital in character. Switches are either on or off; they can't be anything in between. The on-off state of these switches is usually expressed with the numbers 0 and 1. The number 1 indicates "on", and 0 indicates "off". A standard digital computer will encode distinct strings of 0's and 1's for each letter of the alphabet. It will take lots of 0's and 1's to represent the word 'flower.' But that string of 0's and 1's can (at least potentially) do everything that a symbol could be expected to do. It can represent the roses in your garden just as well as the word 'flower' can.
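To make this concrete, here is a small Python sketch of how the word 'flower' might be turned into strings of 0's and 1's. It assumes the common 8-bit ASCII code for each letter; the discussion above does not say which encoding a particular computer uses, and the function names to_bits and from_bits are ours, chosen just for illustration.

```python
def to_bits(text: str) -> list[str]:
    """Return one 8-character string of 0's and 1's per character of text."""
    # Assumption: 8-bit ASCII codes. The article does not name an encoding.
    return [format(byte, "08b") for byte in text.encode("ascii")]


def from_bits(bit_strings: list[str]) -> str:
    """Rebuild the original text from a list of 8-bit strings."""
    return "".join(chr(int(bits, 2)) for bits in bit_strings)


flower_bits = to_bits("flower")

# Show each letter next to the string of 0's and 1's that stands for it.
for letter, bits in zip("flower", flower_bits):
    print(letter, bits)

# Going the other way recovers the word, so no information was lost.
print(from_bits(flower_bits))
```

Under this assumption, six letters become six strings of eight switches each, which is 48 switch settings for one short word.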
The 0's and 1's provide a kind of "alphabet" from which more complex symbols can be built. But you need more than symbols. If the symbols are going to do work for you, there needs to be a way of arranging and re-arranging them to accomplish particular tasks. Think of this like a board game. A board game has playing pieces and rules that specify the moves that are legal. To play the game, a person must be able to do two things: (1) to recognize the present state of the board, in particular which pieces are occupying which positions, and (2) to make legal moves with the pieces (e.g. to place them on the board, to remove them from the board, or to change their position on the board).
If a machine is going to use a system of symbols to accomplish a task, then it too must be able to accomplish these same two jobs: (1) to recognize the present state of the system -- we'll call this reading the symbols, and (2) to add, to remove, or to rearrange the symbols in the system -- we'll call this writing to the system (even though the removal of an element is more like erasing).
In summary, then, classical computers perform the tasks that they do by this symbol processing method. Symbol processing computers can be used to simulate many abilities, including our ability (A) to recognize objects (through vision), and (B) to make reasoned judgments about how we will behave. Let's explore one of these abilities now.
Machines speaking English
Let's consider how a machine might process symbols in such a way as to produce (as OUTPUT) the words of a language -- and not just any words, but intelligent conversation as we might expect from a competent human language speaker. (A computer program that is designed to simulate "intelligent" behavior is called an artificial intelligence or AI program.) Suppose that we want our program to be able to give reasonable answers to questions that ask for common sense information about the world, questions such as:
"How many legs do most dogs have?"
How is this question represented in a form that the computer program can handle? There is a unique series of 0's and 1's (on/off switches) assigned to each of the letters of the alphabet. For each letter we type, one string of 0's and 1's will be sent to the CPU or central processing unit. (The CPU is the device that actually does all of the "computing.") Our question, then, will be represented by many strings of 0's and 1's that carry information not only about each letter of each word, but also information about punctuation, capital letters, etc. Next, all the separate strings of 0's and 1's that together constitute a "symbolic representation" of the English sentence, must be stored somewhere (at least temporarily).
The next step is to give the program the ability to answer the question "intelligently." Suppose we want it to answer:
"Most dogs have four legs."
What could we do to give it this ability? The simplest way would be to use a brute-force method. That is, we could create a "look-up table" that directly associates the question "How many legs do most dogs have?" with the answer "Most dogs have four legs." And while we're at it, we could add a few other sentences and their associated responses, producing the following table:
INPUT | OUTPUT
How many legs do most dogs have? | Most dogs have four legs.
What is the capital of Illinois? | The capital of Illinois is Springfield.
What is your name? | My name is Iris.
Nice weather we've been having. | Yes, it has been quite pleasant.
[NOTE: This method is similar to the look-up table used to do multiplication in What is a Computer?]
To make use of this table, the program would require only one basic rule:
THE RULE: If the INPUT is one of the sentences found in the INPUT column of the look-up table, then give as OUTPUT the sentence directly across from it in the OUTPUT column. If the INPUT is anything else, give as OUTPUT "I don't understand what you are saying."
Remember, the program can't directly handle English words or sentences, only 0's and 1's. But the required translations are simple. As I type in the question about dogs, the program stores it in the form of strings of 0's and 1's. The sentences in the look-up table are also stored with strings of 0's and 1's. So the program need only compare the string of 0's and 1's representing my question with the strings of 0's and 1's in the INPUT column of the look-up table. If any of the strings of 0's and 1's in the INPUT column matches exactly the string that was produced when I typed in my question, then the program will give as OUTPUT the corresponding string of 0's and 1's in the OUTPUT column. It will then send that string of 0's and 1's to the computer screen, which will display on the monitor "Most dogs have four legs."
This program implements an algorithmic function, which means that it follows a rule that determines precisely what the OUTPUT should be for every possible INPUT. The problem with this particular program, of course, is that for the countless things one might say in a conversation, this program will give an intelligent sounding response for only 4 possible sentences. For all the rest, it will not sound intelligent at all, for it will produce the sentence, "I don't understand what you are saying." Even if you put a thousand such sentences in the look-up table, the chances that you would accidentally hit on one of those sentences in a conversation are pretty slim. (NOTE: A look-up table can only "recognize" an exact copy of the INPUT sentence. If even one letter is different, it would treat it as an entirely different sentence. So if you asked "What's the capital of Illinois?" it would not find that exact sentence in the INPUT column, so it would give as OUTPUT, "I don't understand what you are saying.")
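The look-up table program and its one rule can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions rather than a real conversational program: the sentences are stored as 8-bit ASCII strings of 0's and 1's (an encoding we have chosen), the names to_bits, from_bits, LOOKUP_TABLE and respond are ours, and the wording of the answer to the Illinois question is assumed for the sake of the example.

```python
def to_bits(sentence: str) -> str:
    """Encode a sentence as one long string of 0's and 1's (8 bits per character)."""
    # Assumption: 8-bit ASCII. The article does not specify an encoding.
    return "".join(format(byte, "08b") for byte in sentence.encode("ascii"))


def from_bits(bits: str) -> str:
    """Decode a string of 0's and 1's back into readable text."""
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


FALLBACK = "I don't understand what you are saying."

# INPUT column -> OUTPUT column, both held as strings of 0's and 1's.
LOOKUP_TABLE = {
    to_bits(question): to_bits(answer)
    for question, answer in [
        ("How many legs do most dogs have?", "Most dogs have four legs."),
        # Answer wording below is an assumption made for this sketch.
        ("What is the capital of Illinois?", "The capital of Illinois is Springfield."),
        ("What is your name?", "My name is Iris."),
        ("Nice weather we've been having.", "Yes, it has been quite pleasant."),
    ]
}


def respond(sentence: str) -> str:
    """THE RULE: exact match in the INPUT column, otherwise the fallback sentence."""
    input_bits = to_bits(sentence)
    output_bits = LOOKUP_TABLE.get(input_bits, to_bits(FALLBACK))
    return from_bits(output_bits)


print(respond("How many legs do most dogs have?"))
print(respond("What's the capital of Illinois?"))  # one contraction is enough to miss
```

Notice that the second question fails even though a person would see at once that it asks the same thing. The program compares whole blocks of 0's and 1's, and the two blocks differ.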
Programs like this, that rely exclusively on this simple type of look-up table, are sometimes called block machines. A program that treats the 0's and 1's that constitute each complete sentence as a single, indivisible block of data, and that always produces as OUTPUT a similar indivisible block of data, is very limited in what it can do. For example, it cannot take in any new information in a dynamic way. Most conversations that you have with people require that you remember things that they've told you, because those things often come up later in the discussion. A program with nothing more than a "pre-programmed" look-up table would be incapable of adding anything new to its database in the course of a conversation. To add anything new, a programmer would have to re-write the program.
But this is not the only way to write a computer program that "speaks" English. There are better ways. Let's meet an artificial intelligence program that improves on the "block machine" method.
It is time to take a look at a real artificial intelligence program in action. The program that you will be introduced to is called ProtoThinker. It runs on a classical (digital) computer (a PC, in fact). In common with the "look-up table" program described above, ProtoThinker is a classic symbol-processing program. That is, it applies a set of determinate rules (algorithms) as it manipulates a finite set of discrete symbols within a symbol-system. We shall describe programs of this kind as symbol-processing programs, and what such programs do is symbolic (classical) computation.
ProtoThinker does manipulate symbols, but it does not rely exclusively or even primarily on look-up tables. It uses a variety of clever strategies that allow it to perform in a much more sophisticated way than could be achieved with a block machine. Rather than treating a sentence as a single, indivisible block of data, it analyzes every sentence that it receives as INPUT, attempting to identify the subject and verb of the sentence. In short, it "parses" the sentence. Because it breaks a sentence up into its component parts, it can also store information not only about whole sentences, but about the "entities" that are mentioned in the sentence, as well as the "actions" that they describe. Consequently, this program is no mere block machine.
ProtoThinker: A Model of the Mind
Before going further, let's see ProtoThinker in action. What you will see is a modest little robot that Mind Project student researchers have built. The ProtoThinker software does the cognitive processing (it is "the mind" of the robot), there is a text-to-speech program courtesy of Lucent Technologies (which is "the voice"), there is a Teachmover robotic arm, and there is software written by students that connects the three together. We call this robot, "Iris.1", the simplest of the Mind Project robots.
Meet IRIS 1.0
How is it possible for PT (aka ProtoThinker) to carry on this conversation? Is it programmed with a look-up table to mechanically give whole sentences as OUTPUT if it receives whole sentences as INPUT? This would require that Barker, the programmer, anticipate precisely what I was going to ask in advance and program in an appropriate response. But that is not how it was done. PT is programmed to say very little in advance of a conversation with human beings. Rather, PT gains information about the world by what it is told by human beings. It works like this.
PT is a Windows program and we communicate with it through a computer keyboard. We teach it things the same way you would teach a small child -- we "talk" to it. For example, we might tell it
"Most dogs have four legs."
Just as with any other digital (classical) computer that is running a symbol-processing program (including the "look-up table" program above), at the heart of the PT controlled robot we find unique strings of 0's and 1's that are manipulated according to fixed rules. The result, as with any such program, is that the behavior of the program will be determined by the INPUT that it receives.
Yet, in the case of PT, the system is sophisticated enough that it can take the sentence above, about dogs, and do the following:
- store it in its memory
- classify it as information about "dogs"
- retrieve it as a reasonable answer to the question "How many legs do most dogs have?"
- ... and more
These capacities give PT the ability to do a number of interesting things. But more about that later.
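As a rough illustration only, here is a toy Python sketch of a program that can store a sentence, file it under the words it mentions, and retrieve it later as an answer. This is not how ProtoThinker works internally (its parsing is more sophisticated, and we have not reproduced it here). The class ToyMemory, the stop-word list, and the word-overlap scoring are all assumptions invented for this example.

```python
import re
from typing import Optional

# Hypothetical list of words too common to be useful as topics.
STOP_WORDS = {"how", "many", "do", "does", "most", "have", "has",
              "a", "the", "what", "is", "are"}


def content_words(sentence: str) -> set[str]:
    """Lower-case the sentence and keep only the words that are not stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {word for word in words if word not in STOP_WORDS}


class ToyMemory:
    """Stores told sentences and indexes them by the words they contain."""

    def __init__(self) -> None:
        self.facts_by_word: dict[str, list[str]] = {}

    def tell(self, sentence: str) -> None:
        # "Classify" the sentence: file it under each content word it mentions.
        for word in content_words(sentence):
            self.facts_by_word.setdefault(word, []).append(sentence)

    def ask(self, question: str) -> str:
        # Retrieve the stored sentence sharing the most content words with the question.
        wanted = content_words(question)
        best: Optional[str] = None
        best_score = 0
        for word in wanted:
            for fact in self.facts_by_word.get(word, []):
                score = len(wanted & content_words(fact))
                if score > best_score:
                    best, best_score = fact, score
        return best if best is not None else "I don't know."


memory = ToyMemory()
memory.tell("Most dogs have four legs.")          # learned during the conversation
print(memory.ask("How many legs do most dogs have?"))
```

Unlike the block machine, this program can be told something new while it is running, and it can find that sentence again even though the question is worded differently from the stored sentence.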
Robots that are controlled by symbol-processing programs, like PT, are especially good at capturing the structure of certain forms of reasoning. Suppose you believe the following two statements:
1. All philosophers are cool.
2. I am a philosopher.
Can you see anything that follows logically from those two statements? That is, if it really is the case that all philosophers are cool, and if it really is the case that YOU are a philosopher, then couldn't you logically infer the following statement:
3. I am cool.
Indeed, this is a valid inference. Inferences of this general type are called deductive inferences and they are easily captured by symbol-processing programs. This is because rules (algorithms) can be written that can often recognize when a set of three statements has a valid structure. For example, the previous inference has this structure:
1a. All A's are B.
2a. x is an A.
3a. Therefore, x is a B.
For any two statements that have the structure in 1a. and 2a., it will follow logically from them that a statement with the structure of 3a is true.
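Here is one way such a rule might be written in Python. It is a minimal sketch: the premises are given to the program already broken into parts rather than as English sentences, and the class names AllAre, IsA and Is and the function deduce are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AllAre:
    """A statement with the structure 'All A's are B'."""
    category: str    # A
    predicate: str   # B


@dataclass(frozen=True)
class IsA:
    """A statement with the structure 'x is an A'."""
    individual: str  # x
    category: str    # A


@dataclass(frozen=True)
class Is:
    """A statement with the structure 'x is a B'."""
    individual: str  # x
    predicate: str   # B


def deduce(first: AllAre, second: IsA) -> Optional[Is]:
    """Return the conclusion if the two premises share the same A, else None."""
    # The check is purely a matter of matching symbols. It never asks
    # what "philosopher" or "cool" actually mean.
    if first.category == second.category:
        return Is(second.individual, first.predicate)
    return None


conclusion = deduce(AllAre("philosopher", "cool"), IsA("I", "philosopher"))
print(conclusion)
```

The same function works unchanged for any A, B and x, which is the sense in which the rule captures the structure of the inference and not its subject matter.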
If you want to build a robot that at least simulates the kind of reasoning that humans often do before they decide how to act, deductive inferences work quite well. Let's say we want to build a robot that has three different commitments: to pick up litter, to keep doors closed (to save energy, let's suppose) and to put toy blocks into cups (just for the fun of it). One way to express these commitments is with conditional statements that express a commitment to do the three previously mentioned activities on the condition that the circumstances are appropriate. Here are three such conditionals:
If the soda can is empty, then I will throw the can in the trash.
If the block is on the table, then I will put the block in the cup.
If the door is open, then I will close the door.
In the animation below, you will find a robot that we might describe as "believing" each of the three conditionals above. To say that the robot "believes" these statements is simply to say that the computer that controls the robot has a symbolic representation of each of those conditional statements in its "memory" and that those symbolic representations can influence its behavior. Until the circumstances are appropriate, however, the robot will not act on them. Originally, the animation below was an interactive Flash animation. When Flash was retired, we made a video (.mp4) capturing all of the content. As we click on each of the environments, a door will open, a toy block will appear on a table, and an empty soda can will appear. As each of these situations is detected by the robot, the robot will go into action: closing the door, putting the block in the cup, and throwing the litter in a receptacle. Why? Because those decisions follow logically from the robot's "beliefs."
A DIGITAL SYMBOL PROCESSING ROBOT
The inference rule is called "modus ponens" and it has this structure:
If p, then q.
p.
Therefore, q.
Can you see this pattern in the reasoning that the robot performs? Isn't this rational behavior? That is, if YOU were committed to the statement "If the soda can is empty, then I will throw the can in the trash" and, if you then learned that the soda can was empty -- wouldn't it be reasonable for you to throw the can in the trash?
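The robot's reasoning can be sketched in Python as a direct application of modus ponens: from "If p, then q" together with "p", conclude "q". The three conditionals are taken from the list above. The names RULES and modus_ponens, and the idea of representing what the robot detects as a set of plain strings, are assumptions made for this illustration and not a description of the actual robot software.

```python
# Each pair is (p, q), read as "If p, then I will q".
RULES = [
    ("the soda can is empty", "throw the can in the trash"),
    ("the block is on the table", "put the block in the cup"),
    ("the door is open", "close the door"),
]


def modus_ponens(rules, observations):
    """Return every q for which 'If p, then q' is a rule and p has been observed."""
    return [q for p, q in rules if p in observations]


# Nothing detected yet: the robot holds its "beliefs" but does not act.
print(modus_ponens(RULES, set()))

# The robot detects an open door, so one conditional now applies.
print(modus_ponens(RULES, {"the door is open"}))
```

Until some p shows up among the observations, no q is produced, which matches the point that the robot will not act until the circumstances are appropriate.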
The next question is a profound one:
When you reason about your own actions, are you (or is your brain) manipulating internal symbols according to rules in roughly the same way that the PC robot does?
If you think the answer is "Yes," then you embrace some version of the symbol-processing hypotheses.
The symbol-processing hypotheses
One question that we can ask about PC, the "symbol-processing robot," is:
When WE reason, are we manipulating symbols in something like the way that the PC robot is?
That is, when we reason about what to do next are we in fact manipulating symbols in some way (even if we aren't necessarily aware of it)? Are there states of the brain that function quite literally as symbols? And, if so, when we reason can it be said that we are (at least sometimes) manipulating symbols that have representational content so as to produce behavior that is rational? If you believe that the answer is "Yes," then you probably accept at least one of the following versions of the symbol-processing hypotheses.
The weak symbol-processing hypothesis: Some symbol-processing systems could have mental states. It is (at least theoretically) possible that a symbol-processing system have mental states and have them because of (in virtue of) operations it performs on symbols. According to this hypothesis, it is possible that a symbol-processing machine might have a mind.
The human symbol-processing hypothesis: Human beings have mental states because of (by virtue of) symbol-processing operations performed by the brain. All humans with mental states are symbol-processing systems.
The strong symbol-processing hypothesis: All things that have mental states are symbol-processing systems. It is necessary that a thing with mental states be a symbol-processing system because the very nature of being a mental state (and having a mind) is performing certain types of operations on symbols. According to this hypothesis, nothing in the universe has a mind unless it is a symbol-processing machine.
Are human beings symbol-processing systems? Is the content of my thoughts determined by what is represented by "internal symbols" that my brain is manipulating? If so, then when I have a "belief" about a door, it is because that door is somehow symbolically represented by some state of my brain. Of course, I can be in a variety of different mental states that are all "about" the door. In addition to having beliefs about it, I can also have hopes or fears or doubts about the door. But what is it that determines that my present "door"-thought is a belief rather than something else, like a hope or a fear? It depends on the function or the causal role of the "door"-thought, both in the way that it influences my actions in the world and in the way that it influences (and is influenced by) other of my thoughts. This is precisely what the theory of functionalism says about mental states. And so it is natural for those who believe that human beings are symbol-processing systems to accept a functionalist theory of the mind. (If you haven't already done so, you may want to read our module on Functionalism.)
In an elaborate symbol system, there will not only be symbols that represent physical objects, like the "door" mentioned above. There will also be symbols that stand for properties or predicates (e.g. being red, confused, or built on Tuesday), events (e.g. an avalanche, my throwing a ball, or George W. Bush winning the 2000 election), and abstract entities (e.g. the number "2" or triangularity). In short, a symbol system complex enough to have the rich kinds of thoughts that we have, will of necessity have internal representations of all of the component parts of which those rich thoughts are constituted.
The role of symbols
Before we close this section, let us pause, briefly, to consider an important issue that may already have been on your mind.
From the outset, we've defined symbols in terms of their function of representing or standing for other things. While this is true to a point, we must be cautious. Consider the string of 0's and 1's that PT uses to encode the English word, 'dog'. Does that string of 0's and 1's (or on-off switches) really refer to dogs, the same way our word 'dog' does when we use it? Well, in one important sense, yes, it seems to. After all, John Barker, the programmer, intends that these English words be represented within his program by way of the computer code. Isn't that enough, then, to make that particular string of 0's and 1's refer to dogs? Even the letters of our alphabet (as mere wiggly lines on a page) don't have any meaning by themselves. The black lines that make up the symbol 'dog' have the power to refer to real live animals only because human speakers have invested them with meaning. And human language users seem to accomplish that feat primarily by their intention to use those wiggly lines for the purpose of referring to a particular type of animal, together with their successful association of the words with the animals themselves. So on this line of reasoning, we can say that a particular string of 0's and 1's can come to represent canines.
And yet, even if this is true (and not everyone agrees that it is) it would be a mistake to think that the ability of the program to function as successfully as it does, somehow depends upon the program itself treating the 0's and 1's as symbols representing other things. Quite the contrary. It is in fact the very nature of classical computers that the operations that they perform on the symbols do not (and could not) in any way take account of the meaning of the symbols they manipulate. Instead, the computing machine is able to manipulate language as it does precisely because the 0's and 1's (the on-off switches) can be identified by their broadly structural properties (size, shape, switch-position, etc.) and the rules that the program implements can be accomplished by repeated operations of a strictly mechanical kind. And so, when we describe what the program is doing when it is reading and writing the symbols that are internal to the system, it would be completely misleading to describe its behavior by saying that it writes some string of 0's and 1's because that string means "dog". No. The only reason that a particular piece of computer code will ever write the string, 00100111, is if there is a rule that specifies that the string should be written whenever certain other strings of 0's and 1's are present. The part of the program that actually reads and writes the symbols identifies the symbols by their structural or physical properties (sometimes called their "formal" or "syntactic" properties), not by their meaning (or their "semantic" properties).
What does all of this mean? Well, the rules within the program that direct certain symbols to be read or to be written are NOT rules of the form "find the string of symbols that means dog". Rather, the rules are like, "If there is a string 00100111 at location 129B, then write the string 11110010 at location 4312C".
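A rule of that second kind is easy to write down in Python. In this sketch the computer's memory is modeled as a simple dictionary from location labels to strings of 0's and 1's. That model and the function name apply_rule are assumptions made for illustration, and the location labels are copied from the example rule and treated as opaque keys.

```python
def apply_rule(memory, read_at, expected, write_at, to_write):
    """If `expected` is stored at `read_at`, write `to_write` at `write_at`.

    The comparison is purely structural: two strings of 0's and 1's either
    match character for character or they do not. Nothing here consults
    what either string "means".
    """
    if memory.get(read_at) == expected:   # reading the symbols
        memory[write_at] = to_write       # writing to the system
        return True
    return False


memory = {"129B": "00100111", "4312C": "00000000"}
fired = apply_rule(memory, "129B", "00100111", "4312C", "11110010")
print(fired, memory["4312C"])
```

Whether 00100111 happens to encode part of the word 'dog' or something else entirely makes no difference to whether the rule fires. Only the pattern of switches matters.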
So why does any of this matter? Well, it matters if we are interested in whether or not a machine (say, a robot) can genuinely understand a language. A robot controlled by the ProtoThinker software might sometimes behave as if it has beliefs, intends to refer to a coke can, and hopes that we will give it a block to play with. But does it really have beliefs? Does it really understand the language that it is producing? Does it really have intentions and other mental states? Some argue that it couldn't because the programs that move the 0's and 1's about operate only on the structural properties of the symbols, not on their meanings. (John Searle's "Chinese room argument" is a famous argument that defends this position. Pages on this topic are coming soon.) But there are others who insist that if a complex robot had internal states that performed the same function that our beliefs about doors perform, then we must say that the robot has "beliefs" about doors and that it "intends" to refer to them.
So, could a machine "speaking English" really be said to understand the language? Could it be saying something meaningful? You probably have some thoughts about these questions already. But before you make a final judgment about the matter, you may want to explore some more of the central issues that could shed light on the question.