
Connectionism: An Introduction (page 3)

Additional Credits:
Funding
This module was supported by National Science Foundation Grants #9981217 and #0127561.

Network behavior

Connectionist networks consist of units and connections between units. The following image captures a 3-layered feedforward network; that is, a network consisting of 3 layers of units, where each unit is connected to each unit above it, and where information flows "forward" from the network's INPUT units, through its "hidden" units, to its OUTPUT units.

This is not the only way that a connectionist network can be designed. In fact, because a connectionist network can be composed of any number of units, and units can be connected to one another in an unlimited number of ways, there is an infinite number of possible connectionist architectures. Generally speaking, however, there are two types of connectionist architectures: feedforward networks, such as the one above, and feedback (or recurrent) networks, such as the 3-layered one below.

Feedforward networks never contain feedback connections between units. Feedback (recurrent) networks always do. The presence of feedback connections in a network typically results in a network whose behavior is far more interesting and dynamic than a network composed of feedforward connections alone. Moreover, because biological neural networks typically have recurrent connections, artificial neural networks with recurrent connections tend to be more accurate models of information processing within the brain.

Regardless of whether a particular network has feedback connections, properly configured (or "trained") connectionist networks have some interesting features. Before we take a look at how learning occurs, let's explore what some of those features are.

Emergence

If you completed the previous page, then you know quite a bit about units. For instance, you know that the purpose of a unit is to compute an OUTPUT activation. An OUTPUT activation value is analogous to the firing rate of a neuron (or the probability that a particular feature is present). Units send their activation values as "signals" through weighted connections to other units. And you also know that for a unit to do this, it first computes its COMBINED INPUT value (the sum of each INPUT activation multiplied by its connection weight). This value is then put through an activation function that "squashes" it into a value between 0 and 1, which becomes its OUTPUT activation.
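In code, a single unit's computation can be sketched as follows. This is a minimal illustration assuming a logistic ("squashing") activation function; the activation and weight values are made up:

```python
import math

def unit_output(inputs, weights):
    """Compute a unit's OUTPUT activation from its INPUT activations.

    The COMBINED INPUT is the sum of each INPUT activation multiplied
    by its connection weight; a logistic activation function then
    "squashes" that sum into a value between 0 and 1.
    """
    combined_input = sum(a * w for a, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-combined_input))

# A unit receiving three signals through weighted connections
# (hypothetical values, chosen only for illustration):
print(unit_output([0.31, 0.67, 0.85], [0.5, -0.2, 0.9]))
```

Whatever the inputs and weights, the result always lands strictly between 0 and 1, which is what makes it usable as the next layer's INPUT activation.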

Is a properly configured connectionist network "merely" a collection of signal-processing units and connections through which signals are sent? No, indeed! One of the most interesting aspects of properly configured (or "trained") connectionist networks is the emergence of behavior that does NOT reduce to any particular unit. Many things in the world have emergent properties. For instance, consider a drop of water. Drops of water have the property of liquidity, which means they are a liquid. Cups of water, glasses of water, and barrels of water all have this property too. But does a single water molecule? No. Liquidity emerges as a property of a collection of water molecules, just as a flock emerges only at the level of a collection of birds. While the emergent properties found in collections of water molecules or birds may or may not be all that interesting, the fact that connectionist networks have emergent properties certainly is -- at least if understanding how brains work matters. After all, biological neural networks have emergent properties too. Consciousness itself is very likely to be one of them.

What emergent properties do connectionist networks have? Put another way, what sort of properties or behavior arise NOT because they are "built into" the network, but because of the ways in which activations are spread throughout the network? Here are two: graceful degradation and pattern completion.

Graceful degradation

Graceful degradation is the property of a network whose performance worsens progressively as more and more of its units or connections are randomly destroyed. The alternative might be called catastrophic degradation -- the property of a system whose performance plummets to zero when even a single component is destroyed. That the performance of biological neural networks degrades gradually -- "gracefully" -- rather than catastrophically is another reason why artificial neural networks are more accurate information-processing models than classical ones. After all, altering even a single rule in a classical computer model tends to bring the computer implementing the damaged program to a "crashing" halt.
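The contrast can be sketched in a few lines of code. Here a hypothetical layer of 30 weighted connections (random values, not a trained network) each carries a small share of the signal; cutting more and more randomly chosen connections makes the output drift from its intact value bit by bit rather than crash all at once:

```python
import random

def damaged_output(weights, inputs, destroyed):
    """Output of a weighted sum after the connections in `destroyed` are cut."""
    kept = [0.0 if i in destroyed else w for i, w in enumerate(weights)]
    return sum(w * a for w, a in zip(kept, inputs))

random.seed(0)
# Hypothetical connection weights and input activations, for illustration.
weights = [random.uniform(-1, 1) for _ in range(30)]
inputs = [random.uniform(0, 1) for _ in range(30)]
intact = damaged_output(weights, inputs, set())

# Destroy 0, 3, 6, then 9 randomly chosen connections and measure how far
# the output drifts from the intact value: the error grows gradually.
for n in (0, 3, 6, 9):
    out = damaged_output(weights, inputs, set(random.sample(range(30), n)))
    print(n, round(abs(out - intact), 3))
```

Because the signal is distributed over many connections, no single cut carries the whole burden -- which is precisely the point of the "graceful" in graceful degradation.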

Pattern completion

Connectionist networks are very good at performing tasks that require associating one pattern with another. Typically, each unit in a connectionist network represents a feature of some sort, but what particular feature a unit represents is hardly ever determined by the network designer in advance. Rather, it emerges as a result of training. So too do the patterns of activation within a network that represent objects. Both of these aspects of emergence depend on learning. And how connectionist networks learn is arguably their most important emergent property. Hence, rather than discuss pattern completion in the abstract, let me introduce you to how connectionist networks learn.

How do connectionist networks learn?

Outside the cognitive science community, many people consider the idea that a computer can learn to be rather strange. After all, something learns only if its behavior changes as a result of experience. Computers don't have experiences. Rather, they run programs. Although what a computer does may be called behavior, noncatastrophic changes in a computer's behavior occur only when it was programmed to behave that way by some programmer or other. Hence, computers do not learn.

But this is not true. Ask Garry Kasparov, arguably the best chess player who has ever existed -- the best human player, that is. In the famous 1997 rematch against the IBM supercomputer Deep Blue, Kasparov lost. And he lost not to a computer that played chess by simply searching its database for the best possible moves (although it could consider 200 million moves per second). He lost to a computer (program) that was adaptive; in other words, Deep Blue altered its play as a result of its "experience" playing Kasparov. Nor is Deep Blue the only classical computer capable of learning. In any event, it is possible for a classical computer to learn.

Of course, connectionist networks are not classical computers; they are nonclassical ones. All things being equal, nonclassical computers do not do what they do because they are programmed to behave a certain way. And that's the rub: Although connectionist networks are computers, a connectionist network's behavior does not arise because it is manipulating symbols in accordance with an algorithm. Rather, networks learn to behave the way they do. What follows is a description of how training occurs.

Consider once again the following 3-layered feedforward network.

Because information processing in such networks begins with an INPUT pattern and ends with an OUTPUT pattern, this architecture is very well suited for a host of perception-like tasks requiring object recognition. Hence, suppose we wanted to train this network to "recognize" some objects, say, a cup, a can, and a door.

Of course, since this network has no sensory systems, it cannot really recognize objects. For that matter, it cannot really do much of anything. Then again, it does not have to. It is only a computer model.

Nevertheless, in much the same way that you can produce the word 'cup' (as OUTPUT) if you were asked to name the object you see in a picture of a cup (INPUT), this network will "recognize" a cup only if it learns how to produce an OUTPUT pattern that represents a cup when it is presented with an INPUT pattern that represents a cup. Because units in a connectionist network process activations as INPUT and produce activations as OUTPUT, each INPUT pattern and each OUTPUT pattern will be a set of activation values, NOT an image or a word.

An OUTPUT pattern is the set of activation values of the units in the OUTPUT layer. An INPUT pattern determines the activation values of units in the INPUT layer. Think about it: the information processing within a network has to begin somewhere, and since none of the units in the INPUT layer has any INPUT connections, none of them has a COMBINED INPUT value to compute and put through an activation function. As you might suspect, a set of activation values looks like this: {.31, .67, .85}. If this were an INPUT pattern to our network, the first value would correspond to the activation of the left unit in the INPUT layer, the second to the activation of the center unit, and the third to the activation of the right unit.
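The forward flow of such a pattern through a 3-layered feedforward network can be sketched like this. The INPUT pattern is the one from the text; the connection weights are hypothetical, chosen only for illustration:

```python
import math

def sigmoid(x):
    """Squash a COMBINED INPUT value into an activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows):
    """One layer's OUTPUT activations; each row holds one unit's incoming weights."""
    return [sigmoid(sum(a * w for a, w in zip(inputs, row)))
            for row in weight_rows]

# Hypothetical weights for a 3-3-3 feedforward network (illustration only).
w_hidden = [[0.2, -0.4, 0.7], [0.5, 0.1, -0.3], [-0.6, 0.8, 0.2]]
w_output = [[0.3, 0.3, -0.5], [0.9, -0.2, 0.4], [-0.1, 0.6, 0.6]]

input_pattern = [0.31, 0.67, 0.85]          # e.g. the "cup" INPUT pattern
hidden_pattern = layer(input_pattern, w_hidden)
output_pattern = layer(hidden_pattern, w_output)
print([round(a, 2) for a in output_pattern])
```

Notice that the INPUT units do no computing of their own: the INPUT pattern simply *is* their activation values, and everything downstream is computed from it.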

How does a particular set of INPUT or OUTPUT activations come to represent a particular object such as a cup? Do such patterns inherently represent objects?

No. As a result, what a particular INPUT or OUTPUT pattern represents is determined by the network designer. Thus, for this network, let's say that the INPUT pattern {.31, .67, .85} represents a prototypical cup. Let's say that the OUTPUT pattern we want associated with this INPUT is {.50, .48, .72}. And let's say that the objects can and door have their own patterns. How does the network learn to produce the correct OUTPUT pattern on the basis of a particular INPUT pattern?

Well, at the beginning of training, each connection weight is set to a random value. During training, the network is presented again and again (for some number n of epochs, where an epoch is one complete pass through the training set) with distorted versions of the prototypical INPUT patterns corresponding to the objects cup, can, and door. Then, after each epoch, the connection weights are gradually adjusted via a learning algorithm (usually back propagation). Basically, the learning algorithm compares what the OUTPUT patterns should have been with what they actually were, then "tweaks" the connection weight values by incrementally increasing some and decreasing others.
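The training procedure just described can be sketched as follows. The cup patterns come from the discussion above; the can and door patterns, the learning rate, the epoch count, and the constant bias inputs are assumptions made for illustration (and for simplicity the prototypes themselves, rather than distorted versions, are used as training data):

```python
import math
import random

random.seed(1)

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inp, w_h, w_o):
    """Forward pass through a 3-3-3 network; a constant bias input of 1.0
    is appended at each layer (a common addition, assumed here)."""
    hid = [sig(sum(a * w for a, w in zip(inp + [1.0], row))) for row in w_h]
    out = [sig(sum(a * w for a, w in zip(hid + [1.0], row))) for row in w_o]
    return hid, out

# Weights begin as random values, as described in the text.
w_h = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
w_o = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

# The cup INPUT/OUTPUT pair is from the text; can and door are made up.
patterns = [([0.31, 0.67, 0.85], [0.50, 0.48, 0.72]),   # cup
            ([0.90, 0.10, 0.40], [0.20, 0.80, 0.30]),   # can
            ([0.05, 0.95, 0.50], [0.70, 0.10, 0.90])]   # door

rate = 0.5
for epoch in range(5000):          # one epoch = one pass through all patterns
    for inp, target in patterns:
        hid, out = forward(inp, w_h, w_o)
        # Back propagation: compare actual and target OUTPUTs, then nudge
        # each weight up or down a little to shrink the difference.
        d_out = [(t - o) * o * (1 - o) for t, o in zip(target, out)]
        d_hid = [h * (1 - h) * sum(d_out[k] * w_o[k][j] for k in range(3))
                 for j, h in enumerate(hid)]
        for k in range(3):
            for j, a in enumerate(hid + [1.0]):
                w_o[k][j] += rate * d_out[k] * a
        for j in range(3):
            for i, a in enumerate(inp + [1.0]):
                w_h[j][i] += rate * d_hid[j] * a

# After training, presenting the cup INPUT should yield roughly the cup OUTPUT.
_, out = forward([0.31, 0.67, 0.85], w_h, w_o)
print([round(o, 2) for o in out])
```

The "tweaking" is visible in the two innermost loops: every weight is incremented or decremented in proportion to its share of the blame for the error, which is exactly the incremental increase-some, decrease-others adjustment the paragraph describes.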

After completing the desired number of epochs, it is time to test the network to see whether it has learned. To test the network, it is presented for the first time with the prototypical INPUT patterns that correspond to the objects cup, can, and door. If the network has learned to recognize these objects, then its OUTPUT pattern will correspond within some accepted range to the target OUTPUT patterns. And if this occurs, then the ability of the network to generalize demonstrates that it has learned.

To see how a real connectionist network works, training and all, now would be a good time to explore GNNV -- a connectionist network that recognizes faces. Before doing so, there are a couple of things to note about connectionist networks.

One is that although artificial neural networks are generally plausible models of how biological neural networks work, learning algorithms (such as back propagation) are NOT biologically plausible. Hence, how artificial neural networks learn and how biological neural networks learn are two different things. So, besides the fact that connectionist models process numbers as activations, not action potentials or neurotransmitters, how connectionist models learn is another reason why artificial neural networks only simulate how biological neural networks work.

The other is that in connectionist networks, the processing necessary to complete a task occurs in parallel and is distributed over many processing units. As such, each unit, like each neuron, can be active for more than one feature or item of interest, and different items of interest can be stored as patterns of activity over the same set of units. Thus, a parallel, distributed, nonsymbolic manner of processing and storing information is a very natural, fast, and relatively cheap way of achieving what classical processing does, but with far fewer processing units.


Copyright: 2006
