The Book

Uncovering the mystery of language

The first two-thirds of the book deals with the state of the art in linguistics and in neuroanatomical, neurophysiological and neurosurgical language research, with a review of the achievements of the last decades. The most significant topics in these fields are the following:


The development of language, from the generation of affective sounds to babbling and the production of first words and phrases, resembles the unfolding of an innate disposition that depends on hearing and on learning the statistics of language. Babies absorb the statistics of language in their first 10 months of life through a domain-specific ability of categorical generalization and binding of phonemes, which may be called the innate auditory magnet. This mechanism warps the perceptual auditory space in favour of understanding the ambient language and prepares the brain areas involved in the production of syllables, protowords and words for the imitation of articulate speech.

The progress in understanding the evolution of language suggests that language arose from animal vocalization. The neural correlates of human language consist of mixtures of shared and unique structures, reflecting the evolutionary process in which parts of an ancestral primate design are preserved, modified, augmented or lost in the human lineage. Humans and large animals share a network of perceptive-motor pacemaker neurons that respond to minimal inner and/or environmental stimuli with the production of meaningful vocalization.

Language seems to have arisen from quasi-voluntary affective and play-like vocalizations of non-human primates, through the achievement of voluntary control of the vocal tract musculature and sequencing of voice. During the period of canonical babbling, the will to vocalize becomes associated with the ability to sequence the emotional vocalizations into articulate syllabification through interaction of the midcingulate cortex-supplementary motor area (MCC-SMA) complex with the basal ganglia and anterior insula. The activity in the anterior insula can easily propagate to the neighbouring laryngeal motor cortex. The enlargement of the lexicon and symbolic capacity of the human brain may be related to the enlarged prefrontal cortex (PFC) and functionally differentiated neurons in the planum temporale, which were probably already lateralized to the left hemisphere in the common ancestor of humans and apes living around 12 million years ago. The signaling system of affiliative communicative lipsmacking, originally used by monkeys to express the good taste of food, might have been a gestural-vocal precursor of the dynamic organization of syllables in human speech.

Accumulated data reveal that many primate vocalizations are functionally referential, providing listeners with information about food, predators and social issues, though without any intention on the part of the vocalizers. In a few monkeys, this feature is associated with vocal learning ability. Campbell's monkeys have a system of six calls with some similar prefixes and suffixes that may be considered "protowords". They can combine these "word formations" in certain orders that imply different meanings. This may be the most complex example of a "proto-syntax" in animal communication. Consequently, these call combinations could be considered a form of "proto-speech". The vocal behaviour of marmoset monkey infants demonstrates a transition from cries to phee-calls resembling babbling in human infants. Vocalization is one of the elementary behavioural movements induced in the anterior mid-cingulate cortex harbouring the cingulate motor areas (CMAs). There are close connections and interactions between the insula, basal ganglia and mid-cingulate gyrus subserving voice production. Language-related activities encompass large areas of the pre-SMA and SMA, the anterior and posterior MCC, including the rostral and dorsal CMAs, and a limited area of the posterior cingulate cortex (PCC). The MCC language-implementing regions show a large overlap with areas involved in attention and memory functions and action execution, as well as those involved in the processing of pain and emotions. Bilateral lesions of the CMA and SMA, usually due to bilateral occlusion of the anterior cerebral artery, are reported to produce long-lasting akinetic mutism, a state of lack or poverty of speech and profound apathy with evidence of preserved awareness.

The core neural structures involved in the speech of young children and the vocalization of monkeys and apes are identical. They include the ACC-SMA complex, the periaqueductal gray (PAG) and brainstem motoneurons, the basal ganglia, cerebellum and insula. Real progress in understanding the neuroanatomy of human language lies in the recognition that the integration of the MCC, harbouring the rostral and caudal CMAs, with the presupplementary motor area-supplementary motor area (pre-SMA-SMA) complex may have been a major milestone in the development of human language. Several researchers have emphasized this feature of the human language system, but it has apparently been underestimated. Through this fusion, the cingulate circuitries involved in instinctive and quasi-voluntary vocalization are connected with the SMA complex involved in learned articulation. The cingulate-SMA complex is then incorporated in the developing cortical speech network through the frontal aslant tract (FAT), which connects the MCC, SMA and pre-SMA to the posterior Broca's area. The FAT is suggested to provide monosynaptic connectivity between the anterior cingulate gyrus and the posterior inferior frontal gyrus (IFG). Damage to the left CMA-SMA and/or interruption of their connections with Broca's area through the FAT may cause “transcortical motor aphasia”. This nonfluent speech disorder, characterized by a lack of spontaneous speech production with preserved ability in object naming and repetition and with normal or mildly reduced speech comprehension, may also be called limbic aphasia. Other limbic pathways involved in connecting the PFC with the insular, temporal and striatal language zones are the uncinate fasciculus and the fronto-striate tract. The strengthening of the CMA-SMA-PFC connections subordinated vocalization to the voluntary activity of the SMA, Broca's area, and the premotor and motor cortex. Most likely, these changes occurred in the same evolutionary period in which the larynx descended to the lower pharynx. The enlargement, elongation and descent of the vocal tract has been a significant factor enabling the production of articulate speech in humans.

The cingulate vocalization area remains active in adulthood. It plays an important role in effortful and deliberate speech, and in the automatization of speech by exerting controlled processing over the performance until the process is governed automatically by the SMA, basal ganglia, cerebellum, anterior insula, and Wernicke's and Broca's areas. When motivation and sustained attention to master effortful and concentrated speech production are no longer required, limbic activity returns to the baseline level. When initiating a complex discourse, naming novel objects, using an idiom not quite familiar to us or seeking the words to discuss sophisticated concepts, language use feels more conscious and effortful, partially through switching in the cingulate cortex.

Limbic speech is a set of affectively charged utterances mediated by connections between vocalization areas in the CMAs, paralimbic areas, in particular the pre-SMA-SMA complex, Broca's area and other speech centres, including the anterior insula. The vocalization area can stimulate, through its outputs to the SMA complex and FAT, Broca's area and the ventral motor cortex, resulting in the production of certain words and phrases. Some patients with profound aphasia can sometimes utter words or short sentences that clearly have a high affective load. These vestigial limbic utterances include emotional expressions, words or phrases of deep personal significance, affirmations, denials and profane and vulgar curse words. They are characteristically expressed during periods of excitement.

The mid-cingulate cortex has connections with all major nodes of language perception and production. These connections maintain their initiating and energizing functions in humans. The limbic motivation in infancy and early childhood is an incredible resource for speech acquisition. The withdrawal of the limbic facilitation produces mutism, and heightened limbic facilitation results in qualitative and quantitative improvement in non-fluent dysphasias and may increase the rate and degree of ultimate recovery. These observations have important therapeutic consequences for the treatment and rehabilitation of aphasic patients. Although the plasticity of limbic pathways is reduced in adulthood, limbic motivational, stimulating and facilitatory resources can still contribute to the recovery of partially damaged speech areas and the recruitment of new neuronal assemblies for speech processing.


Results of research in the genetics of language show that only about two dozen genes are estimated to be present in humans and not in chimpanzees. The human and chimpanzee lineages diverged about 4.6-6.2 million years ago. During this time, two fixed amino-acid changes occurred in FOXP2 on the human lineage, whereas none occurred on the chimpanzee or other primate lineages, except for one change on the orangutan lineage. These two known mutations in the human version of FOXP2 have enhanced the synaptic plasticity and dendritic connectivity of the basal ganglia neurons, which have a key role in linguistic learning and speech production. A third mutation, involving the factor POU3F2 on FOXP2, appears to have enhanced the expression and efficiency of transcription by FOXP2. This last genetic change is absent in Neanderthals and earlier humans. Thus, it appears that the evolution of language is supported by incremental steps of genetic complexity guided by natural selection, leading to the enlargement of the human brain and the emergence of human language.


For a long time, it has been assumed that the processing of syntax is implemented by Broca's area, whereas semantic processing is accomplished by the interaction of the left PFC and the left posterior temporal cortex. In a series of well-designed neuroimaging investigations, researchers at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig, Germany, have shown that in children younger than three to four years, the syntactic-semantic processing of sentences is not yet segregated. These processes are initially implemented undifferentiated in an area in the left posterior/middle superior temporal gyrus (STG). This pattern seems to be correlated with the behavioural observation that at 3-4 years of age, children rely more on semantic than on syntactic cues in language comprehension. Between the ages of 5 and 6 years, children activate areas in the left and right IFG during semantic-syntactic sentence processing, in addition to the largely overlapping regions in the left STG and frontal operculum. There is a distinct main effect for syntax and semantics in six- to seven-year-old children, with the segregation of the related operations and a shift of the hemodynamic correlates of syntax towards the posterior STG, and those of semantics towards the anterior STG. In 9- to 10-year-olds, the hemodynamic activity underlying syntax processing shifts into the left IFG. The recruitment of Broca's area for the processing of complex grammatical relations is thus the result of a long-lasting process of construction of reciprocal connections between the left posterior STG and IFG. This discovery is of high heuristic significance. The fact that syntax is born in the left STG in close relationship to the archaic auditory cortex and planum temporale, and not in the neocortical left IFG, demonstrates that we have to look for the roots of grammar in the paleocortical brain structures that are present, myelinated and functioning at birth and soon thereafter.


Language is vocalization plus learning by imitation. Vocal imitation in its simplest form is the ability to reproduce a novel version of a vocal signal produced by another individual. It is present in three groups of birds, including parrots and songbirds, and in four distantly related groups of primates and mammals. There are astonishing similarities between the genetic basis, neural structure and development of communication behaviour of these groups of distantly related animals. Deep homology describes the existence of typical and specific correspondences between the anatomy and physiology of members of natural groups of living organisms that have no close phylogenetic relationship. The identification of deep homologies among birds and mammals with vocal learning capacity may play a central role in understanding the phylogenetic trajectory of the evolution of language.

The author has suggested, on the basis of convergent neuroimaging and neurophysiological evidence, that early infantile vocal imitation and learning are mediated by an archaic copy device deep in the brain. The cingulate motor areas, in particular the anterior CMA, provide the neural correlates for elementary behaviours including vocalization, mimicry, gesticulation and articulation of vocalized gestures, be they barking, braying or human speech. The face areas in both the rostral and caudal CMAs contain neuronal collections that can induce eye and orofacial movements in response to minimal environmental sensory data. The rich visual, auditory and sensory input to the CMAs makes it possible to translate what is observed and heard into executed behaviour.


Imitation supports not only the learning of the morphological features of words, but also their order. Syntax refers to the rules for arranging sounds, words, word parts and phrases into their permissible combinations, determining the answer to the question: Who did what to whom? The message is embedded in the deep structure of sentences, where the relations between the verb and its arguments are clearly defined. The verb mediates agency. There is evidence that learning syntax is guided not only by imitation of linguistic procedures, but also by the innate human ability to identify agency as the animate form of evident causality.

Speech contains a host of statistical regularities that are sufficient to support robust computational learning. The brain of human infants computes statistical regularities present in a set of words, pays attention to the statistical distributions, extracts information revealing a novelty preference and categorizes words by grammatical class. Infants are capable of sophisticated language learning from seven months of age. They understand that repetition is something special, and can identify the position of syllables in relation to the beginning and end of artificial words as an edge-based positional code. Segmentation of words from fluent speech can be accomplished by eight-month-old infants based solely on transitional probabilities. Twelve-month-old infants can learn more complex transitional probabilities, and can distinguish new grammatical from non-grammatical sequences, suggesting that they are able to generalize the acquired knowledge to new sequences. The looking behaviour of infants changes during a period of exposure to artificial and natural languages, suggesting that they have become familiar with certain regularities.
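
The idea of segmenting words by transitional probabilities can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions, not a model of infant learning: the three nonsense words, the fixed presentation order and the 0.75 threshold are all invented for the example. The transitional probability of a syllable pair is taken as the number of times the pair occurs divided by the number of times its first syllable occurs, and a word boundary is placed wherever that probability drops below the threshold.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) for each adjacent pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # the last syllable has no successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, threshold):
    """Cut the stream wherever the transitional probability falls below threshold."""
    if not syllables:
        return []
    tp = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tp[(prev, nxt)] < threshold:  # low predictability: assume a word boundary
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# Hypothetical artificial language: three nonsense words (assumed for illustration).
WORDS = [["tu", "pi", "ro"], ["go", "la", "bu"], ["bi", "da", "ku"]]
# Assumed presentation order with no word repeated twice in a row.
ORDER = [0, 1, 2, 0, 2, 1, 0, 1, 2, 1, 0, 2]
stream = [syllable for i in ORDER for syllable in WORDS[i]]

if __name__ == "__main__":
    print(sorted(set(segment(stream, threshold=0.75))))
```

In this toy stream every within-word transition is fully predictable, while transitions across word boundaries are not, so the dips in predictability alone are enough to recover the three words without any pauses or other cues.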

In 1980, Chomsky coined the term "poverty of the stimulus" to describe a significant mechanism underlying language acquisition, where an "impoverished and unstructured" linguistic input leads to easy and rapid unsupervised learning of language by young children. The argument of the "poverty of the stimulus" is closely related to the concept of Universal Grammar (UG), which was introduced by Chomsky in 1965 to characterize an intrinsic underlying rationale common to all languages, causing a familiar deep structural similarity among them, which is well known to translators. UG was considered the genetic component of the language faculty, fixing the principles, conditions and rules by which grammars operate. These ideas and the controversy over them have grown into a serious challenge for neuroscience.

The discoveries of Chomsky are associated with his research on the hierarchical structure of human language. Syntax may be conceptualized as a tree diagram in which noun, verb and object phrases have several components. Chomsky's computational hierarchy implies a general-to-specific pyramidal order. He showed that, in contrast to bird songs, the English constituent structure could not be generated by a finite state grammar, because it includes phrase structures with multiple long-distance dependencies and recursive sentence structures.
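
A toy example may help to show why recursive, long-distance dependencies go beyond a finite state grammar. The Python sketch below is an illustration only and does not reproduce Chomsky's own argument about English; the function names and the example words are invented for this purpose. It builds center-embedded sentences in which each noun must be paired with a verb that appears later, and it checks the resulting pattern (n nouns followed by n verbs) with a counter that can grow without bound, which is exactly the resource a device with a fixed number of states lacks.

```python
def embed(pairs):
    """Build a center-embedded sentence from (noun, verb) pairs.

    pairs[0] is the outermost clause: its noun comes first and its verb last,
    so every noun is separated from its own verb by all inner clauses.
    """
    if not pairs:
        return []
    (noun, verb), rest = pairs[0], pairs[1:]
    return ["the", noun] + embed(rest) + [verb]

def is_center_embedded(tags):
    """Accept sequences of n 'N' tags followed by n 'V' tags, with n >= 1."""
    depth = 0          # number of nouns still waiting for their verb
    seen_verb = False
    for tag in tags:
        if tag == "N":
            if seen_verb:      # a noun after a verb breaks the nesting pattern
                return False
            depth += 1
        elif tag == "V":
            seen_verb = True
            depth -= 1
            if depth < 0:      # more verbs than nouns
                return False
        else:
            return False
    return seen_verb and depth == 0

if __name__ == "__main__":
    sentence = embed([("rat", "died"), ("cat", "bit"), ("dog", "chased")])
    print(" ".join(sentence))
```

The recursion depth in `embed` and the value of `depth` in the checker both grow with the number of embedded clauses, which is the formal sense in which such structures require more than a finite set of states.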

Chomskian phrase hierarchy is not a "nested hierarchy". One can remove one or more phrases from a recursive expression without damaging the structure of the sentence. In a grammatical nested hierarchy, by contrast, the removal of either the verb or its arguments changes the meaning or renders the sentence incomplete and incomprehensible. The critics argue that the sequential grammatical structure is more fundamental to language use than the hierarchical structure. The horizontal or linear display of the deep semantic hierarchical relationship between actor, verb and patient determines the close and distant dependencies in sentences according to the familiar causality and agency rules of the sequences of actions and events. It corresponds, in addition, to the anatomical and functional hierarchy of sentence production in semantic, syntactic and phonological order in the prefrontal cortices described in the "cascade theory" of action. It is the grammatical hierarchy that renders sentences drivable only in one direction, like a one-way street. Long-distance dependencies are typical features of hierarchical nested systems.

Indeed, neuroimaging studies have shown that the identification of changes and violations of hierarchical structures is associated with an undifferentiated and unspecific increase in activity in brain areas involved in working memory. Syntactic processing, including that of changes in computational hierarchical structures, takes place within the framework of verbal working memory. The manipulation of the embedding dimension, which includes Chomskian hierarchy and recursion, appears to be dependent on verbal working memory, instead of being related to specific linguistic operations. Thus, neuroimaging studies cannot objectively differentiate the mathematical branching operations underlying the processing of the Chomskian type of hierarchy from other mental computational operations.

On the other hand, functional imaging studies are specifically sensitive to syntactic dependencies, verb-based argument hierarchy, and agreement dependency in person, gender and number. The manipulation of argument order, verb class and morphological ambiguity revealed that the left posterior STS is sensitive to morphological information and the syntactic realization of the verb-based argument hierarchy, while the activation of the left pars opercularis corresponded to linearization demands of verb arguments. The verbs are processed in the left posterior superior temporal sulcus (STS) and the bilateral temporoparietal junction (TPJ), whereas Broca's area is involved in determining the correct order of the arguments. These findings suggest that the human brain is sensitive to the grammatical hierarchical relationships between the verb and its arguments, which in turn are determined by underlying agency relationships.

Nevertheless, the grammatical hierarchy does not interfere with the vertical tree-like hierarchical organization of sentences. On the contrary, these two types of hierarchy have a complementary function in communicating the meaning of a sentence. Since systems that are highly organized are more learnable than those that are not, the generation of the hierarchical structure presumably maximizes cognitive economy, facilitating the encoding and transmission of more complex information than could otherwise be conveyed in a serial channel. These kinds of organizational principles are the human engineering solution to the problem of serial order.

The grammatical verb-argument hierarchy, in which the agency relationships define dependencies between the top (the subject) and the bottom (the verb phrase) of the sentence system, is based on semantic links that may be defined as nesting relationships in systems theory. In a compositional or nested hierarchy, the elements composing the lower levels of a system are physically combined or nested within different levels to create complex wholes.

There is a consensus that sentences are tree-like hierarchical structures of nested phrases, characterized by grammatical dependencies of the elements on each other.

Verbs convey the central set of events and relationships in which the nouns participate. The basic order of the elements in sentence building in English and Arabic is Subject-Verb-Object (SVO), in Mandarin and Japanese SOV. Together, these two types constitute approximately 90 percent of living languages. Since the verb defines the object regardless of whether it precedes or follows it, the brain does not differentiate between the SVO and SOV basic types. The integration of the noun phrase with the predicate is implemented after the verb phrase is determined. Chomsky has called this terminal integration of sentences "Merge". Neuroimaging studies have confirmed that "Merge" is implemented through the interaction of the posterior and anterior main language neural nodes. Evidence suggests that children have a strong predisposition in favour of speaking an SVO language. This is not an accident. The SVO type of sentence conveys more naturally the subject-action-patient order of agency.

Verbs describe the causal, temporal, spatial and prepositional relationships of objects and events in a plastic and three-dimensional token. The verbs are the "functional core" of syntax, and the gateway to grammar. They define the agency relations in a sentence.

Infants begin to understand the actional properties of agents at 3 to 4 months of age. Six-month-old infants appear to already perceive a cause-effect relationship. Infants learn the causality relationships, based on the perceptual categorization of moving objects, even before understanding that the things in the world are named. The animate and moving causal agents are easier and earlier to comprehend, and therefore they are used as paradigms for sentence construction. The development of affective self-consciousness includes the generation of affective awareness of having a body that acts as an agent. This milestone of infantile mental development is achieved around the age of 6 months, and plays an important role in the metamorphosis of the affective calls of young infants into babbling and the production of first words. Seven- to nine-month-olds represent actions in terms of intentional and agent-object relationships. By the end of their first year, infants conceptualize the difference between agents and recipients of action. In the course of language acquisition, the simple naming of objects precedes the expression of their causal and agency relationships, first by gestural expression and then by concrete verbs.

Toddlers initially produce verbs that signal concrete actions, such as "play" or "eat", or concrete desires with self-agency, such as "want water" or "like cars". Children's verbs all encode only self-action for several months, and then observed actions. In the next stage, children learn verbs that are related to mental states such as intention, and this knowledge helps them later to learn the mental states of others. Thus, learning animate self-agency relationships provides toddlers with a paradigm required for learning verbs pertaining to observed actions and the less transparent aspects of causality. Using this paradigm, humans can convert every existing and even non-existing object and agent, act, state, topic and item into potential subject matter of verbal knowledge and conceptualization, treating them as subjects, action-words, verbs, and action-related objects.

Infant cognitive development demonstrates that at least by 9 months of age, infants have developed a conceptual system sufficiently rich to allow engagement in preverbal problem-solving, inductive inference of argument hierarchy and language acquisition. The data above suggest that originally, the deep structure of sentences has been a linguistic copy of the experienced or observed overt causality-agency relationships in animate actions. This paradigm may also have been used during evolution in the gestural-verbal protospeech of Homo erectus. The learning of syntax is guided by the innate capacity to conceive the causal and agency relationships, initially in biological and then in mechanical movements.

In the animal kingdom, survival depends on the ability to extract biological motion from other movements in order to identify the intentions of predators and the behaviour of prey and mates. An inborn predisposition to attend to biological motions is presumably part of an evolutionarily ancient and non-species-specific system predisposing animals, including reptiles, birds and primates, to preferentially attend to the sights and sounds of conspecifics and other animate beings. This predisposition is present in human infants at birth. Two-day-old babies can discriminate between biological and non-biological point-light animations. This ability grows within a few months into an innate knowledge about argument hierarchy that predicts subject/object asymmetry and dependency. By 8-9 months, infants can differentiate biological motion and understand simple intentional actions. This early social perception ability soon develops into an incredible toolbox for social cognition, associated with the identification of agents, their goals and intentions. Based on this ability, children can, by their third birthday, correctly use verbs and their arguments that describe actions and events following the rules of causality and agency relationships.


The last third of the book deals with the relations between the brain areas involved in language perception and production and those engaged in the identification of agency and causality relationships. Here the author reports the results of his own research on these topics.

A network involving the STS, frontal and prefrontal motor, premotor and association, parietal, and temporo-occipital cortices in both hemispheres is involved in processing sights and sounds pertaining to biological motions, gestures, imitation, body language and their underlying intentions. The activation of the posterior temporal-occipital cortex, extended particularly into the TPJ and STS, plays a crucial role in the integration of the inputs related to the features of biological motion. The STS is a sensory integration area, closely interconnected with the inferior parietal lobule (IPL) and the CMA-SMA complex, and involved in the highest level of cortical integration of sensory and limbic information. The identification of biological motions in STS is associated with recognition of agency relationships.

In a linguistic context, the left posterior STS-STG, IPL, TPJ and the left IFG are major nodes in a network that is sensitive to verbs and argument hierarchy. These areas are implicated in language processing mechanisms subserving the semantics-to-syntax interface. In children under 5-6 years, the IFG is not yet sufficiently mature to be recruited for this function. We have to assume that the early gauging of this network is implemented by the left superior temporal and inferior parietal areas. The pregenual and supragenual segments of the ACC, the insula and the SMA have been demonstrated to respond preferentially to biological motions. The perigenual sections of the ACC are known to process stimuli of vital significance for the biological self. The more anterior part of the medial PFC is a major neural node in the network underlying the understanding of the mental states of others in the framework of social cognition. The activation of the bilateral insula, both anterior and posterior, is highly correlated with the sense of agency. The anterior MCC is a "limbic premotor cortex" that is interconnected with the dorsolateral PFC. Both areas are, like the TPJ and insula, major components of the attentional-executive systems, involved in the immediate "instinctual" drawing of attention to biological motions. The MCC has widespread connections with the auditory cortex, the superior temporal and the temporopolar regions, as well as the parietal and prefrontal areas. It is also heavily connected to the subgenual and perigenual ACC, the posterior cingulate and retrosplenial cortices, as well as the entorhinal and hippocampal limbic association cortices. These areas are interconnected with all other brain regions involved in the processing of agency and language. The association cortices of the perigenual ACC and MCC funnel the integrated sensory inputs to the CMAs, allowing them to guide the motor system and the speech areas, for example in effortful verb generation from visually presented nouns. These connections subserve both the identification of agency relationships and the initiation of self-stimulated speech.

Interestingly, the CMA-SMA complex, which is involved in the generation of communicative gestures and vocalizations, is also engaged in the production of the sense of intention, self-agency and the triggering of voluntary actions. The anterior and middle parts of the medial PFC appear to be involved in feeling the will to talk and the initiation of voluntary speech. Other areas involved in deciphering the meaning of biological movements as well as the preparation and initiation of speech are the lateral PFC, including Broca's area, and the precentral gyrus. Broca's area is activated by observation, imitation, imagination and the execution of biological movements.

The anatomical picture above suggests that the brain regions involved in syntactic-semantic and morphological processing in the first decade of life have an extensive overlap with those involved in the identification of agency. No less importantly, these areas are interconnected with, and can be stimulated and energized by, the vocalization area in the MCC. Given that the development of the network involved in the identification of biological agents, actions and patients precedes the development of the major neural nodes involved in the perception and production of language, and of their connecting pathways, there remains little doubt that the former guides the development of the latter. The instinctive knowledge of infants and toddlers about biological agency facilitates the development of syntax. It is the Universal Grammar!