[Revised 14/11/98]

Ullin T. Place

School of Philosophy, University of Leeds
School of Psychology, University of Wales, Bangor

Address for Correspondence:

Willowtree Cottage
North Yorkshire, YO7 2DY



This paper has four sections. Section A sets out four principles which should guide any attempt to reconstruct the evolution of an existing biological characteristic. Section B sets out thirteen principles specific to a reconstruction of the evolution of language. Section C sets out eleven pieces of evidence for the view that vocal language must have been preceded by an earlier language of gesture. Based on those principles and evidence, Section D sets out seven proposed stages in the process whereby language evolved: (1) the use of mimed movement to indicate an action to be performed, (2) the development of referential pointing which, when combined with mimed movement, leads to a language of gesture, (3) the development of vocalization, initially as a way of imitating the calls of animals, (4) counting on the fingers leading into (5) the development of symbolic as distinct from iconic representation, (6) the introduction of the practice of question and answer, and (7) the emergence of syntax as a way of disambiguating utterances that can otherwise be disambiguated only by gesture.

Keywords: gesture; miming; pointing; referring; sentence; vocalization; homesigning; iconic; symbolic; equivalence; protolanguage; syntax

A. The Gestural Theory of the Origin of Language.

In this paper I advocate a version of what Hewes (1973a; 1973b; 1976) calls "the gestural theory of the origin of language" which he traces back to Condillac (1746/1947), Tylor (1868; 1871), Morgan (1877), Wallace (1881; 1895), Romanes (1888), Wundt (1900) and Jóhannesson (1949; 1950). This version differs from earlier versions of the theory, including that of Hewes himself, in that it incorporates some more recent work from two sources: behavioral psychology and neuroscience.

A1. Principles for Reconstructing the Evolution of Biological Phenomena. Until the development of writing, language leaves no direct trace in the archaeological record. Consequently the reconstruction of its evolution is necessarily speculative. There are, however, certain principles to which we can appeal in deciding which of two alternative reconstructions is the more probable. In developing the reconstruction presented here, I have been guided by a number of such principles. Four of these apply to any such evolutionary reconstruction. The remainder are specific to the evolution of language. The general principles are:

A2. The Selectionist Principle. The evolution of any complex biological characteristic proceeds in a sequence of small stages. Each stage comes about through the selection, from a population of chance mutations, of one that gives the group in which it occurs a selective advantage over groups in which it has not occurred, in the particular niche occupied by the species at that stage in the evolutionary process. This principle would rule out the sudden emergence of an innate language faculty in a single step, as envisaged by Chomsky (1965, etc.).

A3. The Principle of the Replication of Phylogeny by Ontogeny. Although phylogenetic development cannot be simply "read off" from an examination of the process of ontogenetic development, it is a reasonable assumption that the stages that are recognizable in the process of ontogenetic development correspond to stages that punctuated the phylogenetic development of the characteristic in question.

A4. The Principle of Regression to Earlier Adaptations. When the manifestation of an adaptation in its fully fledged form is blocked, organisms will revert to a form of adaptation that preceded it in the process of evolution.

A5. The Principle of Structural Traces Left by Functional Developments. Every mutation that is selected in the course of the evolution of a complex biological characteristic will leave its mark on the anatomical structure of members of the species in which it develops.

B. Principles Specific to Reconstructing the Evolution of Language

The principles that are specific to the evolution of language are:

B1. The Sentence as the Unit of Linguistic Communication. Language is primarily a means of communication. Contrary to the opinion of Fodor (1975), it is only secondarily and derivatively a vehicle for thought. Like other interpersonal communication systems, it consists of responses emitted by one individual, the sign-producer or speaker, which act as signs or, as Skinner (1938) calls them, discriminative stimuli, and which have a consistent and predictable behavior-orientating effect on another individual, the sign-receiver or listener, to whom the speaker's utterance is directed. What distinguishes language from other forms of interpersonal communication is that, in order to exercise effective control over the behavior of the listener and secure the reinforcement that only the listener can provide, the speaker must construct and utter a sentence. Sentences of the kind that convey information, as distinct from those whose function is to facilitate the process of communication, differ from other, non-linguistic response-produced signs in two respects. Whereas other response-produced signs are repeated over and over again, information-providing sentences, as Chomsky (1957, etc.) has always insisted, are seldom repeated word for word, but are typically constructed anew on each occasion of utterance, albeit out of words, phrases and sentence frames that are repeated.
The phenomenon of novel sentence construction arises from the fact that, whereas the units of which sentences are composed derive their behavior-orientating function by generalization from, or repeated association with, the natural signs of the presence or impending presence of the kind of action or object they "stand for," sentences, provided they are constructed in accordance with the syntactic conventions accepted within the verbal community, have the ability to orientate the behavior of the listener towards the potential or actual presence, beyond her current stimulus environment, of a contingency the like of which she need never have experienced in her own case.

B2. The Principle of the Ontogenetic Primacy of the Mand. There can be little doubt that in the evolution of language, as in the linguistic development of the child, the earliest sentences to which listeners responded and which speakers produced were of the type which B. F. Skinner (1957) calls a mand (command, request or question). In emitting a mand the speaker specifies an action to be performed by the listener, the subsequent and consequent emission of which reinforces the speaker's mand and, in turn, secures for the listener reinforcement from the speaker in the form of either an expression of gratitude or the withdrawal of a threat. According to Skinner (1957, p. 85), "behavior in the form of the mand operates primarily for the benefit of the speaker", whereas what he calls "behavior in the form of the tact", an information-providing sentence in other words, "works for the benefit of the listener by extending his contact with the environment." Tacts are typically emitted in response to an interrogative mand or question. They are reinforced, not, as in the case of the mand, by the behavior they call for from the listener, but by a variety of specialized reinforcers: responses such as gratitude for information supplied, agreement with opinions given, sympathy for troubles told, surprise at and interest in news reported, or laughter at jokes. The tact is thus a more sophisticated form of utterance than the mand. In the evolution of language it must have developed later, as it does in the child. Moreover, since interrogative mands presuppose the availability of the tacts they solicit from the listener, it follows that the first sentences must all have been imperatives.

B3. Argument Structure. A sentence performs its function of acting as a discriminative stimulus for the message-receiver or listener by depicting what Barwise and Perry (1983) call a situation. A situation is either a state of affairs, in which a property of an entity or a relation between two or more entities remains constant over a period of time, or an event, in which a property or relation changes at a moment of time (instantaneous event) or over a period of time (process). In the case of a mand, the situation depicted by the sentence is either an event, a change which the listener is being asked to bring about, or a state of affairs she is being asked to preserve. In the case of a tact, the situation depicted is typically one whose occurrence or existence outside the listener's current stimulus environment is being reported or predicted. In either case the sentence consists of a predicate or verb phrase with one or more argument places and as many arguments or noun phrases as are needed to depict the situation. The function of the predicate is to depict an action which, in the case of a mand, is to be performed by the listener or, in the case of a tact, either has been or is being performed by some other person or object (the agent). The arguments represent the various objects involved in the situation depicted, of which the agent (the listener in the case of a mand) is one. The object to be or being manipulated, the destination to which it is to be or is being conveyed, and the consequence for the sake of which the action is to be or is being performed are others. Their nature and number are determined by the nature of the action represented by the predicate. Where a predicate represents a change in or persistence of a property, it is monadic: it has only one argument place to be filled, that of the property-bearer. Other predicates depict a change in or persistence of a relation and are polyadic.
They need two or more argument places to be filled, one for each of the various objects between which the relation holds.
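The distinction between monadic and polyadic predicates, and the requirement that every argument place be filled, can be made concrete with a small data structure. The following is a minimal illustrative sketch, not a model proposed in the text: the class names `Predicate` and `Sentence`, the method `missing_roles`, and the example verbs are hypothetical, while the role names (agent, manipulandum, destination, property-bearer) follow the discussion above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Predicate:
    """Hypothetical model of a predicate (verb phrase) and its argument places."""
    name: str
    roles: tuple  # names of the argument places, e.g. ("agent", "manipulandum")

    def kind(self) -> str:
        # One argument place (the property-bearer) -> monadic; two or more -> polyadic.
        return "monadic" if len(self.roles) == 1 else "polyadic"


@dataclass
class Sentence:
    """A sentence = a predicate plus noun phrases filling its argument places."""
    mood: str  # "mand" or "tact"
    predicate: Predicate
    arguments: dict = field(default_factory=dict)  # role -> object named

    def missing_roles(self) -> list:
        # Argument places the predicate requires that no noun phrase yet fills.
        return [r for r in self.predicate.roles if r not in self.arguments]


# Illustrative predicates: the role names follow the text, the verbs are invented.
ripen = Predicate("ripen", ("property_bearer",))
push = Predicate("push", ("agent", "manipulandum"))
carry = Predicate("carry", ("agent", "manipulandum", "destination"))

# In a mand the agent is the listener herself.
request = Sentence("mand", carry, {"agent": "listener", "manipulandum": "stone"})
print(ripen.kind(), push.kind())   # monadic polyadic
print(request.missing_roles())     # ['destination']
```

The request to carry a stone is incomplete as a depiction of the situation until a destination is supplied, which is the sense in which the number of arguments is fixed by the action the predicate represents.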

B4. Evidence of Sentence Interpretation and Sentence Construction in Infra-Humans. It is clear from work that has been done with a number of infra-human species that the ability both to respond to and to generate sequences of stimuli which conform to the argument structure characteristic of a sentence is not exclusive to the human species. Clear evidence of the ability to respond to simple imperative sentences organized in this way has been provided in the case of suitably trained bottlenose dolphins (Tursiops truncatus) by Louis Herman and his colleagues (Herman 1987; Herman et al. 1984; Herman, Kuczaj & Holder 1993) and in the case of similarly trained sea lions (Zalophus californianus) by Ronald Schusterman and his colleagues (Schusterman & Krieger 1984; Schusterman & Gisiner 1988). The evidence for the ability of infra-humans to produce such sentences is less clear-cut. Both in the case of Irene Pepperberg's (1987) African Grey parrot (Psittacus erithacus), Alex, and in the case of the bonobo chimpanzee Kanzi, studied by Sue Savage-Rumbaugh and her co-workers (Greenfield & Savage-Rumbaugh 1990), it consists of combinations of two elements, one a predicate and the other an argument. Evidence that these combinations are ones that the organism, the parrot or chimpanzee, has put together for himself, rather than ones he has imitated from the utterances of his human caregivers, is much stronger in the case of Kanzi than it is in the case of Alex. This is partly because Kanzi is using artificial lexigrams, combined in some cases with gestures, rather than the vocal imitations of English used by Alex.
But it is also because Greenfield and Savage-Rumbaugh (op. cit.) took great care to exclude cases where a particular combination could have been imitated from the behavior of the caregivers, and because, in many cases, the utterance combines the predicate either with another lexigram or with a gesture which is used to indicate the agent who is to perform the action specified by the predicate. This practice is common in the utterances of young children whose sentences conform to what Bickerton (1990) calls "protolanguage", as in the child's sentence cited by Horne and Lowe (1996):

Daddy push car

That said, it is clear from the absence of reports of combinations with more than one argument that Kanzi's ability to construct novel sentences is very much more restricted than his ability to understand such sentences when formulated in either Yerkish or English, and much more restricted than the structures understood by Herman's dolphins and Schusterman's sea lions. This suggests that what a theory of the evolution of human language needs to explain is not how our ancestors acquired the ability to decipher complex combinations of abstract symbols organized in the form of the argument structure characteristic of the sentence. That is an ability that they and their mammalian predecessors had long possessed. What needs to be explained is, first, how they acquired the ability, which their ape ancestors apparently did not possess, to use abstract symbols to construct sentences as complex as those they could understand and respond to and, second, how these two components, sentence comprehension and sentence construction, came to be welded together in such a way as to satisfy Alvin Liberman's (1993) requirement (quoted by Rizzolatti & Arbib 1998) that "in all communication ... the processes of production and perception must somehow be linked; their representation must, at some point, be the same."

B5. The Selection of Linguistic Functions through their Technological Utility. Once we begin to view semantics as the handmaiden of pragmatics, and syntactics as the handmaiden of semantics, it becomes increasingly difficult to endorse Chomsky's (op. cit.) belief that, in order to explain the human ability to construct and construe complex sentences, we must postulate an innate language faculty which appeared deus ex machina in a single gigantic mutation at the dawn of human prehistory. Mutations there must have been. How else can we explain the fact that we talk and animals, even with the best human instruction, barely do so? But what we must look for is not one mutation, but a number of mutations spread out over millions of years, each one building on what has gone before, and each one giving the group in which it occurred a selective advantage which enabled its members to survive and pass on their genes, when those who lacked that mutation went to the wall. Nor should we expect to find that the selective advantages which have promoted the survival of the groups in which such mutations have occurred have always been advantages conferred by improvements in interpersonal communication. It seems likely that the earliest mutations whose establishment made the development of language possible were selected in the first instance, not so much for their utility in relation to the process of interpersonal communication as for their utility in relation to a Stone Age hunting-and-gathering technology. Two mutations which may well have been selected in this way are that which made referential pointing possible and that or those which produced the changes in the mouth and larynx which made possible the production of vocal speech.
An impressive body of evidence to be reviewed in Section C below makes it tolerably certain that the earliest form of human linguistic communication took the form of a "language of gesture" (Piaget 1926/1932) in which imperatives are constructed by combining a predicate in the form of a mimed action (something of which, as we know from Köhler's (1921/1927, pp. 307-8) observations, chimpanzees are already capable) with one or more arguments (agent, manipulandum, destination) indicated by referential pointing at the objects in question. In addition to their ability to use miming to indicate the action to be performed, there is evidence (Savage-Rumbaugh 1986) that chimpanzees can learn to use pointing to indicate a particular food which they wish the human trainer to give them, and that the bonobo Kanzi (Greenfield & Savage-Rumbaugh op. cit.) has learnt to use referential gestures to indicate who is to do what, what object is to be manipulated or where it is to be placed, but in both cases only when the utterance is directed to a human caregiver. That, it would seem, is because, although they can learn to point, the evidence (Savage-Rumbaugh 1986, pp. 13, 180-2; Povinelli & Davis 1994) suggests that they cannot respond to such gestures when made by others. If so, referential pointing directed to a conspecific would be of no avail,(1) a circumstance which would both confirm and explain Wundt's (1900) contention that referential pointing is something that apes in their natural state do not do. As Louis Herman (1998) points out, this inability of chimpanzees to respond to referential pointing is in marked contrast to the bottlenose dolphins he has trained who, as described in Herman, Pack and Morrel-Samuels (1993), readily learn to take an object to a place indicated by referential pointing on the part of the human trainer.
Herman suggests that the ability to respond to referential pointing may be "a generalization from the directionality of their 'searchlight beam' echolocation signal which 'illuminates' remote objects sonically, a behavior also associated with a linear posture of the rostrum pointing at the echolocated object."

He also cites some recent evidence (Xitco & Roitblat 1996) "that one dolphin can identify the target being sonically illuminated by another dolphin." If this is correct, it would add weight to the suggestion of Noble and Davidson (1996) that in the human case referential pointing at objects may have evolved from the ability to aim weapons at a common target, something a group of hunters would need to do in order to dispatch large prey. A similar story can be told with respect to the other component of the "language of gesture", the mimed movement used to indicate an action to be performed by the sign-receiver. This ability, as we have seen, is already present in the repertoire of chimpanzees. It follows that this use of gesture must have evolved before the lineage leading to the chimpanzee diverged from that leading to the hominids and ultimately to Homo sapiens, and thus before the emergence of any substantial technological development. Nevertheless, it is still possible to argue that the relevant mutation was selected in the first instance by virtue of the contribution which the ability to imitate the movements of others makes to the acquisition of manipulative skills of the type in which chimpanzees readily become proficient. A similar argument can be deployed in relation to the selection of the mutation or series of mutations which altered the conformation of the human mouth and larynx in such a way as to make vocal speech possible. In this case it seems entirely plausible that this change was selected in the first instance, not by virtue of its utility in interpersonal communication, but because it allowed humans to imitate the calls of animals and thus lure prey into the traps that the emerging technology had provided (Hewes 1973b, p. 19).(2)

B6. The Tower of Babel Principle. As the Biblical story of the Tower of Babel implies, there is an intimate connection between, on the one hand, the initial evolution of human language and the subsequent development of mutually unintelligible natural languages and, on the other, the human reliance on technology, rather than on the development of physical characteristics, in adapting to a new environment. The precise role of language in the exploitation, communication and development of a new technology is debatable. But, as the Tower of Babel story suggests, the utility which led to its selection was probably connected with technological projects, such as hunting, building traps for large animals or providing shelter where no natural cave was available, which would require the coordinated activity of a number of individuals.

B7. The Behaviorist or Language-Learning Principle. The communication systems found in pre-linguistic organisms which have been analyzed by the ethologists depend on the propensity to produce and respond to innate releasing stimuli (Tinbergen 1948; 1951). In so far as learning is involved, it is narrowly constrained to particular contexts and to particular stages in the organism's development or life cycle, as in the well-known phenomenon of imprinting (Lorenz 1935/1957). Unlike these pre-linguistic communication systems, which have their human counterparts in our 'instinctive' emotional responses to such things as other people's facial expressions, tone of voice and 'body language', language proper is a form of learned behavior. In language, arbitrary stimuli acquire their stimulus function by virtue of social conventions which vary from one natural language to another. Moreover, the evidence of phenomena such as the so-called "back channels" and the conventions of politeness shows that conformity to these social conventions is acquired, maintained and modified by the same process of operant or instrumental learning which we observe in the acquisition, maintenance and modification of the motor skills of all free-moving living organisms. Were it not so, language would not be able to adapt in the way that it evidently does to new technologies and new environments. Nor could it support the kind of creative thought process which is needed to create and disseminate a new technology. But if, as this evidence suggests, the acquisition of linguistic competence follows the same principles as those found in the learning of animals, we need to explain why it is that animals have not also developed language, and why, when they are taught language by humans, even the anthropoid apes seem unable to progress much beyond the level of a human two-year-old.
In order to explain that, we have to suppose that mutations have been selected in the course of human evolution which make the acquisition of the kinds of associative links involved in the interpretation and production of speech very much easier for humans than it is for members of other species.

B8. The Principle of Natural Signs and Pre-Linguistic Concepts. In order to adapt effectively to its environment, any free-moving living organism with a complex behavioral repertoire must be able to recognize objects and situations of the different kinds which it regularly encounters in its environment, if it is to select from the various behavioral strategies available to it those which suit the particular context and the needs of the moment. An organism which has at its disposal an armory of behavioral dispositions appropriate to a variety of different circumstances under which an object or situation of a particular kind may be encountered can be said to have a pre-linguistic concept of things of that kind. The stimuli which, by virtue of their regular association with objects and situations of that kind, acquire the ability to bring such a concept into play (if they do not already possess it by virtue of the innate constitution of the nervous system) constitute the natural signs of the presence of things of that kind.

B9. The Principle of the Determination of Concepts by Environmental Contingencies. The concepts formed by organisms of the complexity of a bird or a mammal which has a rich variety of behavioral strategies at its disposal derive in part from the genetic constitution of the species and are partly acquired by learning from the past experience of the individual concerned. The effect of both processes is to ensure that the various features of the environment, both those that are common to all environments that members of the species are likely to encounter and those that are peculiar to the current circumstances of the individual, are classified in a way that yields reliable predictions of the consequences of behaving in one way rather than another. It follows that, despite differences due to differences in the ecological niche occupied by the species, all living organisms that depend for their survival on being able to conceptualize whatever problematic stimuli they encounter will tend to share a similar conceptual scheme, one which "carves nature at its joints". Without such a shared conceptual scheme at its foundation, linguistic communication in which the same arbitrary symbol becomes attached to the same concept for all members of the linguistic community in question could never have developed.

B10. The Principle of the Development of the Symbolic from the Iconic. Evidence from the study of "homesigning" in congenitally deaf children who have no access to a regular sign language in their early years (Tervoort 1961; Morford et al. 1993; Morford 1996), from the history of American Sign Language (Frishberg 1975) and from the history of the development of Chinese pictograms shows that in the development of a system of linguistic communication which is independent of vocal speech, the earliest signs are invariably iconic in the sense that they imitate the visual appearance of the object they represent. We have also seen reason to think that the same may have been true of the earliest vocal signs, which in this case imitated the sound made by the object referred to. In all these cases the evidence shows a tendency for the sign system, as it develops, to move away from the iconic towards the use of arbitrary symbols which have no iconic resemblance to what they stand for. The use of arbitrary symbols not only allows for the representation of objects whose features cannot easily be imitated or depicted, it also eliminates the possibility of misunderstandings due to focusing on aspects of the image presented which are irrelevant to the communicator's purpose. In the face of this evidence the conclusion that a similar progression from the iconic to the symbolic must have taken place in the evolution of language as a whole is irresistible. However, it has been suggested to me by Heng-syung Jeng and, indirectly, by Rob Burling (personal communications) that symbolic representation could only have developed in the course of evolution with the emergence of a form of vocalization which included consonants as well as vowels. It is argued that with vowels alone vocalization is restricted in its representation of objects and events to an analogue imitation of the sounds that they make.
With consonants it becomes possible to generate a string of discrete digital units (syllables and words) to which any arbitrary significance can be attached.

B11. The Principle of Stimulus Equivalence. Recent research within the behavior-analytic tradition on the formation of stimulus equivalence classes (Sidman 1971; Sidman & Tailby 1982; Sidman 1986; 1990), though difficult to interpret clearly and uncontroversially, is evidently exploring a fundamental process whereby arbitrary stimuli acquire symbolic function. Suppose that a three-year-old child is taught, when presented with an arbitrarily selected stimulus A (the sample), to select from a group of stimuli (the comparisons) another arbitrarily selected stimulus B (the target). If, after this initial training, the child is presented with stimulus B and asked to select from a group of stimuli which includes A, it will spontaneously pick A. A child who spontaneously generalizes in this symmetrical way is said to have formed a stimulus equivalence class whose members in this case are A and B. Figure 1, taken from Sidman (1990), illustrates the formation of a stimulus equivalence class using the matching-to-sample procedure. In both the upper and lower figures the sample is in the center of the cross and the comparisons are in the arms. It is assumed that in previous linguistic training the behavior of picking (nota bene) the drawing of a car (the target in the case of the upper figure), when presented with the written word CAR as sample, and the behavior of picking the written word CAR (the target in the case of the lower figure), when presented with the drawing of a car as sample, have both been reinforced. As a consequence these two stimuli are treated as equivalent and are said to have become members of the same stimulus equivalence class. Although in this particular example the equivalence class has been formed by linguistic training outside the laboratory, there is a wealth of experimental evidence showing that in human children and adults any two arbitrarily selected stimuli can be formed into an equivalence class by this procedure.
Despite the fact that the experimental technology tends to restrict research to the investigation of arbitrary associative links between static visual shapes,(3) it is generally agreed by workers in this field that the ability to form stimulus equivalence classes in this sense is intimately associated with the early stages of language development in the human infant. According to the view endorsed here (Place 1995/6), an arbitrary response-produced stimulus becomes a symbol for or name of some individual object or kind of object, property, relation or event when, as illustrated in Figure 1, it becomes a member of a stimulus equivalence class which includes amongst its members one or more natural signs of the presence of the individual or kind which it thereby symbolizes. The propensity of the child who is developing language to form such stimulus equivalence classes is seen as a result of having repeatedly learned both, as speaker, to produce the symbol or name in the presence of a natural sign of the thing it 'stands for' and, as listener, to pick out the natural sign when presented with the symbol or name. Despite many attempts to demonstrate it, there is no convincing evidence that any animal species, including apes who have been taught to use sign language or other symbols, has spontaneously developed a stimulus equivalence class in the way human children invariably do, i.e., without the individual having been specifically trained to respond to each of the possible combinations of sample and comparison. There is evidence, moreover (Beasty 1987; Dugdale & Lowe 1990; Horne & Lowe 1996), which links the emergence of spontaneous stimulus equivalence class formation with the use of names to distinguish the stimuli the child is learning to associate. Animals who have been taught to use symbols not only fail to show the spontaneous formation of stimulus equivalence classes.
They also fail to show the exponential increase in vocabulary size which has been referred to as the "naming explosion" and which would seem to begin in the child at about the same time (around the age of two). It seems that a mutation has been selected which gives human beings the ability to form the kind of associations involved in giving significance to arbitrary symbols far more readily than any other species.
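The "spontaneous" selections described in the matching-to-sample account can be stated precisely. Sidman's definition of an equivalence relation requires reflexivity, symmetry and transitivity; the discussion above concentrates on symmetry. The following is a minimal sketch, assuming an idealized learner who always shows every derived relation, of which untrained sample-to-comparison selections such a learner would make. It says nothing about the psychological mechanism, and the function names `equivalence_classes` and `derived_relations` are hypothetical. The grouping uses a standard union-find structure.

```python
from itertools import product


def equivalence_classes(trained_pairs):
    """Group stimuli into classes under the reflexive, symmetric and transitive
    closure of the trained sample -> comparison relations (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for sample, comparison in trained_pairs:
        parent[find(sample)] = find(comparison)  # merge the two classes

    classes = {}
    for stimulus in list(parent):
        classes.setdefault(find(stimulus), set()).add(stimulus)
    return list(classes.values())


def derived_relations(trained_pairs):
    """Untrained sample -> comparison selections an idealized learner would make."""
    trained = set(trained_pairs)
    derived = set()
    for cls in equivalence_classes(trained_pairs):
        for a, b in product(cls, repeat=2):
            if a != b and (a, b) not in trained:  # reflexive pairs omitted
                derived.add((a, b))
    return derived


# Train A -> B only: the symmetric selection B -> A is "spontaneous".
print(derived_relations([("A", "B")]))  # {('B', 'A')}
# Train A -> B and B -> C: symmetry, transitivity and their combination emerge.
print(sorted(derived_relations([("A", "B"), ("B", "C")])))
# [('A', 'C'), ('B', 'A'), ('C', 'A'), ('C', 'B')]
```

On the view taken above, an animal trained on the same pairs would, without further training, produce none of these derived selections; the sketch models only the human pattern.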

B12. Bickerton's Proto-Language. Because the work that has been done on the formation of stimulus equivalence classes has focused on static visual stimuli, it is directly relevant only to the acquisition of object-names. How action-names are acquired has not been studied from this perspective. However, it seems likely that this ability grew out of the ancient practice of representing actions by mimed movement, just as the ability to acquire object-names appears to have grown out of the practice of pointing at objects in order to establish reference to them. Once object-names and action-names have been acquired it becomes possible to construct sentences in what Bickerton (op. cit.) has called "proto-language" in which sentences consist of an object-name or noun specifying the agent, an action-name or verb specifying the action to be performed, and a second object-name or noun specifying the manipulandum. Horne and Lowe's (op. cit.)

Daddy push car

is a typical example of just such a sentence. Apart from the distinction between verb and noun and the order in which the different components of the argument structure occur, such sentences are devoid of syntax. Nevertheless, within their limitations, they provide the rudiments of a working symbolic language.
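A proto-language of this kind needs only two resources to be interpreted: a lexicon that distinguishes nouns from verbs and a fixed order of argument places. The following is a minimal sketch under that assumption; the lexicon and the function `interpret` are hypothetical illustrations, not a model drawn from Bickerton or from Horne and Lowe. Roles are assigned purely by position, so the sketch also shows what such a language cannot do without syntax: it accepts only the single noun-verb-noun frame.

```python
# Hypothetical toy lexicon: the only grammatical category marked is noun vs verb.
LEXICON = {
    "daddy": "noun", "mummy": "noun", "car": "noun", "ball": "noun",
    "push": "verb", "throw": "verb",
}


def interpret(utterance: str) -> dict:
    """Assign argument roles to a noun-verb-noun utterance by word order alone."""
    words = utterance.lower().split()
    tags = [LEXICON.get(w, "unknown") for w in words]
    if tags != ["noun", "verb", "noun"]:
        raise ValueError(f"not a noun-verb-noun frame: {utterance!r}")
    agent, action, manipulandum = words
    return {"agent": agent, "action": action, "manipulandum": manipulandum}


print(interpret("Daddy push car"))
# {'agent': 'daddy', 'action': 'push', 'manipulandum': 'car'}
print(interpret("car push Daddy")["agent"])  # car: position, not plausibility, fixes the role
```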

B13. The Principle of the Progressive Extension of Referential Scope. As language develops in the child, and as it presumably developed in the species, reference is initially restricted to objects in the current common stimulus environment of sign-producer and sign-receiver, objects to which the sign-producer refers by pointing at them. With the introduction of iconic representation, reference is extended to objects which are absent from the common stimulus environment of both speaker and listener, but only insofar as either their shape can be depicted by means of a mimed movement or their sound can be vocally imitated. With the introduction of symbolic representation, reference is extended to absent objects, both individuals and kinds, to which a name has been assigned by the conventions of the language. With the introduction of syntax, particularly of embedded clauses, it becomes possible to refer to absent objects by description.

C. Evidence for the Role of the Human Hand in the Evolution of Language.

Before proceeding to a detailed reconstruction of the evolution of language based on these principles, it will be helpful to review some of the evidence which supports the view that the freeing of the human forelimb from its locomotor functions, and the consequent development of manipulative skills, is as important for the evolution of language as it clearly is for the evolution of technology. The following pieces of evidence are relevant in this connection:

C1. A Good Vocal Apparatus is not Enough. Many birds have a vocal apparatus as good as that of humans; yet they have not developed language. This suggests that the crucial difference between birds and humans in this respect may be that, while both are bipedal, the forelimbs of birds are still specialized for locomotion, rather than, as in the human case, for manipulation.

C2. Gesticulation as the Invariable Accompaniment of Speech. The occurrence of gesticulation as an invariable accompaniment of speech strongly suggests that gesticulation played a much more important role in the early stages of language evolution than it does today.

C3. Gesticulation as the Invariable Default when Speech is Blocked. Whenever vocal communication is blocked, either because it cannot be heard or, if heard, cannot be understood, human beings of every culture invariably fall back on gesticulation.

C4. Sign-Language. The ease with which the deaf learn sign-language, particularly if brought up in an environment in which signing is in constant use by others, and the spontaneous development of "homesigning" by those who are not, suggests that the ability to use and respond to manual signs is an integral part of our human linguistic heritage.

C5. Referring to an Object by Pointing at it. The practice of pointing with the index finger as a way of establishing reference to objects in the common stimulus environment of speaker and listener is a linguistic universal which by common consent plays an essential role in the acquisition of word-meanings.

C6. Sentences in the Language of Gesture. The earliest form of sentence seems to have been one in which the function (action) is indicated by means of a mimed movement and the arguments by pointing at the objects concerned. Communication which relies exclusively on sentences of this type constitutes a "language of gesture" (Piaget op.cit.; Hewes 1973a; 1973b; 1976) on which human beings invariably fall back when vocal communication is blocked.

C7. The Association between Handedness and Language in Brain-Lateralization. The concentration of areas specialized for language in the same hemisphere of the cerebral cortex as that which controls the hand which is preferred for precise manipulative tasks demonstrates the intimate connection between the two functions (Cf. Hewes 1973b, p. 9).

C8. Reading Ability cannot have Evolved to Decipher Writing. There is a part of the human cerebral cortex, the angular gyrus in the dominant (usually left) hemisphere, which is specialized for deciphering linguistic stimuli in the visual modality (Thompson 1993, pp. 399-402). Since writing and reading developed far too recently, and are still far from being universal human accomplishments, the need to decipher a written text cannot explain the development of this ability to process visually presented linguistic signs. It must have been selected, probably before the development of speech, to facilitate the interpretation of a language of gesture.

C9. Rizzolatti and Arbib's `Language within our grasp'. In a recent paper entitled `Language within our grasp', Rizzolatti and Arbib (op.cit.) have reached a similar conclusion in the light of evidence that Broca's area in the human left frontal cortex, long known as the area involved in the production and interpretation of syntactically articulated sentences, is homologous with an area in the monkey's brain (F5) where neurons ("mirror neurons") have been found which respond both to the production of visually-controlled hand-movements and to the visual perception of the corresponding movements when made by others. Although Rizzolatti and Arbib do not make these points, it is evident that this link between the execution of a voluntary hand-movement and the visual perception of similar movements made by others is (a) a by-product of the visual feedback-control of voluntary movement, (b) the foundation for the ability to imitate the hand-movements of others, without which a human technology based on the manufacture and use of tools would have been impossible, and (c) the foundation of the ability, found as we have seen in chimpanzees, to communicate by miming the action to be performed by the sign-receiver.

C10. Counting and the Communication of Number Using the Fingers of Two Hands. No one would seriously dispute the claim that the earliest form of counting consisted in the practice which is found in every human culture of counting up to ten on the fingers of the two hands, and displaying the result to others by holding up the relevant number of fingers. This practice can, perhaps, be seen as an outgrowth of the ability to refer to objects by pointing at them. But since what is pointed at are the fingers rather than the objects being counted, this form of counting is an iconic representation of the number of the objects.(4) Furthermore since you can only count things of a kind, counting presupposes a pre-existing ability to classify objects into kinds and, in the case of communicating the results of a count, a pre-existing ability to indicate the kind of object being counted.

C11. Pointing and Picking in the Learning of Object-Names. The recent work on the process whereby arbitrary response-produced stimuli become symbols for (names of) objects, described in Section B11 above, argues that the manual responses of pointing at and picking out the relevant stimuli play a key role in this process.

C12. The Role of Mimed Action in the Learning of Action-Names. Little appears to be known about the process whereby action-names are learned. What is known (Köhler op. cit.) is that chimpanzees communicate what they want a conspecific to do by miming the action in question; moreover, such miming is a conspicuous feature both of the gesticulation that invariably accompanies speech, unless the hands are otherwise engaged, and of sign-languages, whether officially recognized or devised by the individual. This suggests that miming of the action by the caregiver and its imitation by the child must play a key role in the acquisition of action-names.

D. A Hypothetical Scenario for the Evolution of Language

In the light of this evidence and the principles outlined in Sections A and B above, I propose the following scenario for the sequence of stages involved in the evolution of language:

D1. Mimed Action. The first stage in the evolution of language appears to have occurred at a time when chimpanzees and humans had a common ancestor. Three interconnected abilities would seem to have developed at this stage:

(a) the ability to use sticks and stones as tools and weapons,

(b) the ability to imitate the movements of others in the context of learning to perform the manipulations involved in the effective use of tools and weapons,

(c) the ability to communicate what one wants someone else to do by miming the action required.

There is some reason to think that the concentration of the manipulative and communicatory functions in one hemisphere of the cerebral cortex (the left in those who are right-handed) may have begun at this stage, perhaps with the specialization of such structures as the angular gyrus and the pre-motor cortex in the dominant hemisphere for the visual interpretation of hand-movements in general and gesture in particular.

D2. The Language of Gesture. The second stage culminates in the emergence of the first true sentences formulated in the "language of gesture". It begins with the emergence of the practice of pointing referentially at objects: at the individual who is to do something, at an object to be manipulated, and at a destination or location to which the individual is to move or to which the object is to be moved. As we have seen (Section B4 above), this ability is lacking in chimpanzees, not because pointing is something they cannot learn to do, but because, unlike dolphins, they cannot learn to respond to referential pointing. We have also seen reason to agree with Noble and Davidson's (op.cit.) suggestion that this ability may have evolved with the development of the ability of a group of hunters to aim their weapons at the same target. Given the ability to use pointing to distinguish (a) who is to perform the action, (b) the object to be manipulated and (c) the individual to whom the object is to be transferred, it becomes possible for the first time in the history of communication between living organisms to construct novel sentences in what may justly be described as "the language of gesture", in which different mimed actions are combined with different combinations of arguments (agent, object and recipient) identified by pointing at them.(5) But significant though it is, the practice of referring to objects by pointing at them is severely limited in its scope. Whereas mimed action allows the communicator to refer to what has not yet occurred (the action to be performed by the respondent), pointing allows the communicator to refer only to conspicuous objects in the stimulus environment of both parties. The effect of subsequent developments is to extend that scope beyond what is indexically present.

D3. Iconic Vocalization. In Stage 3 vocalization is added to the language of gesture. It depends on changes to the conformation of the mouth and larynx which are selected in the first instance by their effect in allowing human beings to imitate the sounds made by animals, for example the calls by which the male or female of an animal species attracts a potential mate, thereby enticing the animal into the traps which the technology provides. Once established, such calls are introduced into otherwise gestural sentences as an alternative to pointing at an instance of the object when no such instance is present. Since there is no obvious trace, in the gesticulations of those without auditory impairment, of the kind of iconic gestures used by "homesigners" to represent objects (Morford et al. op.cit.), I am inclined to think that vocal imitations of the sounds made by animals were the first iconic representations of objects, as distinct from the iconic representations of actions by means of mimed movement which have been used since the days of our ape ancestors to represent the action to be performed by the sign-recipient. They make it possible for the first time to talk about absent objects as well as about actions not yet performed.

D4. Counting. The position of this fourth stage in the sequence of evolutionary events leading to fully developed language is unclear. It is placed here because it can be plausibly seen as the first step in the move away from the iconic towards a symbolic system of representation. It is the development of the ability to count up to ten on the fingers of the two hands and communicate the result by holding up the appropriate number of fingers. Considered as a representation of the number of objects in a group, holding up the corresponding number of fingers may be considered iconic. But, once they progress beyond the number of fingers on the two hands, counting systems inevitably become symbolic. Vocal counting is invariably symbolic from the outset.

D5. Symbols. In Stage 5 the first representations of objects using arbitrary symbols (names) begin to appear. Once the use of symbols is well established in the repertoire of a human child, all that is required for the child to learn a new name or other lexical word is for the instructor to point to one or two instances of the kind of object the word is used to refer to while uttering the word in question. However, the evidence reviewed in Section B11 above suggests that in its early stages learning the names of things is a much more complex process, one in which there is reinforcement both of the response of producing the name in the presence of an instance of the kind in question and of the response of picking out an instance of the kind in the presence of the name. Although apes, and possibly members of other animal species, can be taught to use symbols, they never progress to the point where there is spontaneous generalization in both directions between the word or symbol and the natural signs of the presence of the object for which it stands. To be able to learn word-meanings as easily as a human child does from about the age of two requires a mutation which has occurred and been selected only in the human species. Apes who have been taught sign language or some other form of symbolic communication can construct sentences in what Bickerton (op.cit.) calls "proto-language". But without the rapidly expanding vocabulary which seems to develop only with the spontaneous emergence of stimulus equivalence classes, language can never "take off" as it does in the human child. Even so, consisting as they do entirely of names (lexical words), proto-language sentences have no syntax other than the verb/noun distinction. That distinction, and perhaps some of the others that are later drawn by means of syntax, are indicated by gesture, which at this stage still forms an integral part of the process of linguistic communication.
This is the first stage in the evolution of language where the increased efficiency of language as a medium for interpersonal and intrapersonal communication is unquestionably what determines the selection of the mutation that provides it, rather than its utility in relation to some purely technological adaptation. It is at this stage presumably that Wernicke's area evolves as a center for the interpretation and production of names. With the development of symbols (proper names) referring to particular persons and places, unambiguous reference to individuals in their absence becomes possible for the first time.

D6. Question and Answer. As argued in Section B2 above, the developmental evidence suggests that the first sentences produced and responded to by our ancestors in the course of language evolution were all imperatives. It also seems likely that the earliest declarative sentences were answers to questions and that questions and answers evolved simultaneously as part of a single practice. As in the case of counting, it is unclear at what stage in the evolution of language this development took place. The best guess is that it was associated, as it seems to be in children, with the so-called "naming explosion" which occurs around the age of two or three and consists in a rapid increase in the child's vocabulary, particularly the names of kinds of object. This event appears to coincide with the child's discovery of the practice of asking questions of the caregiver, particularly questions about the names of things, a practice which, like the "naming explosion" it triggers, seems to be absent from the behavior of the most intelligent of those apes who have been taught a form of sign-language.

D7. Syntax. The development of syntax is the final stage in the evolution of language. It is selected by virtue of its effect in releasing linguistic communication from dependence on the listener's paying attention to the context of utterance and the gestures of the speaker in order to disambiguate what a speaker is saying. It thus allows speakers to talk intelligibly about situations which are not part of the current stimulus environment of either speaker or listener, whether in the past, in the future or at some place geographically remote from the context of utterance. Once syntax is fully developed, gesture, though still a valuable aid to the speaker's eloquence, ceases to perform any essential communicatory function as far as the listener is concerned. But, if gesture itself has been made redundant for all but the deaf by the introduction of syntax, it seems that the connection between language and manual and other forms of motor skill still survives in the remarkable parallel, to which Horgan and Tienson (1996) have drawn attention, between the syntactic organization of sentences and the syntactic (no metaphor) organization of a motor skill such as basketball playing. It is an open question whether syntax evolved, as Chomsky would have us believe, through a single mutation, or whether the emergence of each class of syntactic operator required the selection of a separate mutation. In favor of the former view is the existence of a single area in the human cerebral cortex, Broca's area, which is specialized for its interpretation and production, damage to which appears to affect all types of syntactic operator more or less equally (Thompson op.cit., p. 398). In favor of the latter view is the observation that the order in which the different classes of syntactic operator are acquired by the child is a linguistic universal (Slobin 1985; Aitchison 1989).
With the introduction of syntax, particularly the definite article and the relative clause, it becomes possible for the first time to refer to absent objects by description as well as by proper name.


I am indebted to Bernard Bichakjian, Paul Bloom, Rob Burling, Annabel Cormack, Tom Dickins, Heng-syung Jeng, Harry Jerison and Jill Morford for their stimulating comments and for additional references. Since I have not otherwise cited his work, I should also acknowledge my debt to Lev Vygotsky's (1934/1986) Thought and Language, to which, among other things, I owe the crucial references to Köhler, Piaget and Wundt.


References

Aitchison, J. (1989) The articulate mammal: An introduction to psycholinguistics, Routledge.

Barwise, J. & Perry, J. (1983) Situations and attitudes, MIT Press.

Beasty, A. (1987) The role of language in the emergence of equivalence relations: A developmental study. Unpublished Ph.D. thesis, University of Wales, Bangor, U.K.

Bickerton, D. (1990) Language and species, University of Chicago Press.

Chomsky, N. (1957) Syntactic structures, Mouton.

Chomsky, N. (1965) Aspects of the theory of syntax, MIT Press.

Condillac, É. B. de (1746/1947) Essai sur l'origine des connaissances humaines, ouvrage où l'on réduit à un seul principe tout ce qui concerne l'entendement. In: Oeuvres Philosophiques de Condillac. Paris: Georges LeRoy.

Dugdale, N. & Lowe, C.F. (1990) Naming and stimulus equivalence. In: Behavior analysis in theory and practice: Contributions and controversies, ed. D. E. Blackman & H. Lejeune, Erlbaum.

Fodor, J. (1975) The language of thought, MIT Press.

Frishberg, N. (1975) Arbitrariness and iconicity: Historical change in American Sign Language, Language 51:696-719.

Greenfield, P. M. & Savage-Rumbaugh, E. S. (1990) Grammatical combinations in Pan paniscus: Processes of learning and invention in the evolution and development of language. In: "Language" and intelligence in monkeys and apes: Comparative developmental perspectives, ed. S. T. Parker & K. R. Gibson, Cambridge University Press.

Harzem, P. & Miles, T. R. (1978) Conceptual issues in operant psychology, Wiley.

Herman, L. M. (1987) Receptive competencies of language-trained animals. In: Advances in the study of behavior, ed. J. S. Rosenblatt, C. Beer, M.-C. Busnel, & P. J. B. Slater, Academic Press.

Herman, L. M. (1998) The dolphin's grammatical competency: Comments on Elements of syntax in the systems of three language-trained animals, E. Kako. Animal Learning and Behavior.

Herman, L. M., Kuczaj, S. A. & Holder, M. D. (1993) Responses to anomalous gestural sequences by a language-trained dolphin: Evidence for processing of semantic relations and syntactic information. Journal of Experimental Psychology: General 122(2):184-194.

Herman, L. M., Pack, A. A. & Morrel-Samuels, P. (1993) Representational and conceptual skills of dolphins. In: Language and communication: Comparative perspectives, ed. H. R. Roitblat, L. M. Herman & P. Nachtigall, Erlbaum.

Herman, L. M., Richards, D. G. & Wolz, J. P. (1984) Comprehension of sentences by bottlenosed dolphins, Cognition 16:129-219.

Hewes, G. W. (1973a) An explicit formulation of the relationship between tool-using, tool-making and the emergence of language, Visible Language 7:101-127.

Hewes, G. W. (1973b) Primate communication and the gestural origin of language. Current Anthropology 14:5-24.

Hewes, G. W. (1976) The current status of the gestural theory of language origins. In: Origins and evolution of language and speech, ed. S. R. Harnad, H. D. Steklis, & J. Lancaster, New York Academy of Sciences.

Horgan, T. & Tienson, J. (1996) Connectionism and the philosophy of psychology, MIT Press.

Horne, P. J. & Lowe, C. F. (1996) On the origins of naming and other symbolic behavior. Journal of the Experimental Analysis of Behavior 65:185-241.

Jóhannesson, A. (1949) Origins of language: Four essays. Leiftur.

Jóhannesson, A. (1950) The gestural origins of language, Nature 166:60-61.

Köhler, W. (1921/1927) Intelligenzprüfungen auf Menschenaffen. Springer. English translation by E. Winter as The mentality of apes, 2nd Ed. Routledge & Kegan Paul.

Liberman, A. M. (1993) Haskins Laboratories Status Report on Speech Research 113:1-32.

Lorenz, K. (1935/1957) Der Kumpan in der Umwelt des Vogels. Der Artgenosse als auslösendes Moment sozialer Verhaltungsweisen. Journal für Ornithologie 83:137-213 & 289-413. English translation as `Companionship in Bird Life: Fellow Members of the Species as Releasers of Social Behavior'. In: Instinctive Behavior, ed. C. H. Schiller, International Universities Press.

Morford, J. P. (1996) Insights into language from the study of gesture: A review of research on the gestural communication of non-signing deaf people, Language and Communication 16:165-178.

Morford, J. P., Singleton, J. L. & Goldin-Meadow, S. (1993) The role of iconicity in manual communication, In: K. Beals, G. Cooke, D. Kathman, S. Kita, K.-E. McCullough & D. Testen, Papers from the Chicago Linguistic Society 29, Vol 2: The Parasession:243-253.

Morgan, L. H. (1877) Ancient society, Holt.

Noble, W. & Davidson, I. (1996) Human evolution, language and mind: A psychological and archaeological inquiry. Cambridge University Press.

Pepperberg, I. M. (1987) Interspecies communication: A tool for assessing conceptual abilities in the African Grey parrot. In: Cognition, language and consciousness: Interactive levels, ed. G. Greenberg & E. Tobach, Erlbaum.

Piaget, J. (1926/1932) The language and thought of the child, 2nd Ed. Routledge & Kegan Paul.

Place, U. T. (1985) Three senses of the word "tact". Behaviorism, 13:63-74.

Place, U. T. (1995/6) Symbolic processes and stimulus equivalence. Behavior and Philosophy, 23/24:13-30.

Povinelli, D. J. & Davis, D. R. (1994) Differences between chimpanzees (Pan troglodytes) and humans (Homo sapiens) in the resting state of the finger: Implications for pointing. Journal of Comparative Psychology, 108:134-139.

Rizzolatti, G. & Arbib, M. A. (1998) Language within our grasp. Trends in Neurosciences, 21:188-194.

Romanes, G. J. (1888) Mental evolution in man: Origin of human faculty, Kegan Paul.

Savage-Rumbaugh, E. S. (1986) Ape language: From conditioned response to symbol, Columbia University Press.

Schusterman, R. J. & Gisiner, R. C. (1988) Artificial language comprehension in dolphins and sea lions: The essential cognitive skills. The Psychological Record 38:311-348.

Schusterman, R. J. & Krieger, K. (1984) California sea lions are capable of semantic comprehension. The Psychological Record 34:3-23.

Sidman, M. (1971) Reading and audio-visual equivalences. Journal of Speech and Hearing Research, 14:5-13.

Sidman, M. (1986) Functional analysis of emergent verbal classes. In: Analysis and integration of behavioral units, ed. T. Thompson & M. D. Zeiler, Erlbaum.

Sidman, M. (1990) Equivalence relations: Where do they come from? In: Behaviour analysis in theory and practice: Contributions and controversies, ed. D. E. Blackman & H. Lejeune, Erlbaum.

Sidman, M. & Tailby, W. (1982) Conditional discrimination vs. matching to sample: an expansion of the testing paradigm. Journal of the Experimental Analysis of Behavior, 37:5-22.

Skinner, B. F. (1938) The behavior of organisms, Appleton-Century.

Skinner, B. F. (1957) Verbal behavior, Appleton-Century-Crofts.

Slobin, D. I., ed. (1985) The crosslinguistic study of language acquisition, 2 vols., Erlbaum.

Tervoort, B. T. (1961) Esoteric symbolism in the communication behavior of young deaf children. American Annals of the Deaf, 106:436-480.

Thompson, R. F. (1993) The brain: A neuroscience primer, 2nd Ed., Freeman.

Tinbergen, N. (1948) Social releasers and the experimental method required for their study. Wilson Bulletin 60:6-52.

Tinbergen, N. (1951) A study of instinct, Clarendon Press.

Tylor, E. B. (1868) On the origin of language. Fortnightly Review, 1:22.

Tylor, E. B. (1871) Primitive culture. John Murray.

Vygotsky, L. (1934/1986) Thought and language. English translation by A. Kozulin, MIT Press.

Wallace, A. R. (1881) Review of Anthropology by Edward B. Tylor. Nature 24:242-245.

Wallace, A. R. (1895) Expressiveness of speech, the mouth gesture as a factor in the origin of language. Fortnightly Review 64:528-543.

Wundt, W. (1900) Völkerpsychologie, Vol. I: Die Sprache, Engelmann.

Xitco, M. J. & Roitblat, H. R. (1996) Object recognition through eavesdropping: Passive echolocation in bottlenose dolphins. Animal Learning and Behavior 24:355-365.


Notes

1. Sue Savage-Rumbaugh (1986, p. 315) reports "food-sharing sessions with Sherman and Austin", the two chimpanzees (Pan troglodytes) she had trained before she began working with Kanzi, the bonobo (Pan paniscus), in which "When the table was between them [my italics] they could communicate which food to give simply by pointing to it, and at times they did, though pointing tended to be used to clarify symbolic [i.e., using lexigram] requests." The fact that this behavior seems to have occurred only when the two chimpanzees were sitting opposite one another suggests to me that it was the direction of gaze rather than the pointing to which they were responding.

2. The ability to imitate the calls of wild animals seems to have fallen into disuse, at least in those cultures affected by the neolithic revolution and the domestication of animals. That it must once have existed is shown by the ease with which children in our culture learn to make a passable imitation of a dog barking, a cat miaowing, a lamb bleating, a cow mooing, a pig grunting, a hen clucking and a cock crowing.

3. But see Sidman (1990), Figure 4.2, p. 95, for an experiment in which the sample is a spoken word.

4. I am indebted to Professor Robbins Burling of the University of Michigan (personal communication April 1998) for convincing me that, unlike vocal counting which is inevitably symbolic from the outset, digital counting, together with some written number systems such as the Roman (before the practice of writing IV instead of IIII was introduced), is a form of iconic rather than, as I had previously thought, a form of symbolic representation.

5. Dr. Marina Sbisà of the Department of Philosophy, University of Trieste (personal communication, June 1998), has drawn my attention to the fact that human infants frequently indicate the object to be manipulated by an adult, in the case of a small portable object such as a bowl, by bringing it to the adult or, in the case of a larger object, by dragging the adult towards it. It is not clear to me whether this behavior is part of the miming of the action to be performed, which is already present in the behavior of chimpanzees, or whether it is a separate development, possibly connected to the technology of using containers to collect, store and distribute liquids such as water and milk and solids such as fruit and grain.