
Speech perception

Speech perception is the process by which the sounds of language are heard, interpreted, and understood. The study of speech perception is closely linked to the fields of phonology and phonetics in linguistics and cognitive psychology and perception in psychology. Research in speech perception seeks to understand how human listeners recognize speech sounds and use this information to understand spoken language. Speech perception research has applications in building computer systems that can recognize speech, in improving speech recognition for hearing- and language-impaired listeners, and in foreign-language teaching.

The process of perceiving speech begins at the level of the sound signal and the process of audition. (For a complete description of the process of audition see Hearing.) After processing the initial auditory signal, speech sounds are further processed to extract acoustic cues and phonetic information. This speech information can then be used for higher-level language processes, such as word recognition.

Top-down influences

In a classic experiment, Richard M. Warren (1970) replaced one phoneme of a word with a cough-like sound. Perceptually, his subjects restored the missing speech sound without any difficulty and could not accurately identify which phoneme had been disturbed,[17] a phenomenon known as the phonemic restoration effect. Therefore, the process of speech perception is not necessarily uni-directional.


Another basic experiment compared recognition of naturally spoken words within a phrase versus the same words in isolation, finding that perception accuracy usually drops in the latter condition. To probe the influence of semantic knowledge on perception, Garnes and Bond (1976) similarly used carrier sentences where target words only differed in a single phoneme (bay/day/gay, for example) whose quality changed along a continuum. When put into different sentences that each naturally led to one interpretation, listeners tended to judge ambiguous words according to the meaning of the whole sentence.[18][19] That is, higher-level language processes connected with morphology, syntax, or semantics may interact with basic speech perception processes to aid in recognition of speech sounds.


It may be the case that it is not necessary, and maybe not even possible, for a listener to recognize phonemes before recognizing higher units such as words. After obtaining at least a fundamental piece of information about the phonemic structure of the perceived entity from the acoustic signal, listeners can compensate for missing or noise-masked phonemes using their knowledge of the spoken language. Compensatory mechanisms might even operate at the sentence level, such as in learned songs, phrases and verses, an effect backed up by neural coding patterns consistent with the missed continuous speech fragments,[20] despite the lack of all relevant bottom-up sensory input.

Infant speech perception

Infants begin the process of language acquisition by being able to detect very small differences between speech sounds. They can discriminate all possible speech contrasts (phonemes). Gradually, as they are exposed to their native language, their perception becomes language-specific, i.e., they learn to ignore the differences within phonemic categories of the language (differences that may well be contrastive in other languages – for example, English distinguishes two voicing categories of plosives, whereas Thai has three categories; infants must learn which differences are distinctive in their native language and which are not). As infants learn how to sort incoming speech sounds into categories, ignoring irrelevant differences and reinforcing the contrastive ones, their perception becomes categorical. Infants learn to contrast different vowel phonemes of their native language by approximately 6 months of age. The native consonantal contrasts are acquired by 11 or 12 months of age.[28] Some researchers have proposed that infants may be able to learn the sound categories of their native language through passive listening, using a process called statistical learning. Others even claim that certain sound categories are innate, that is, they are genetically specified (see discussion about innate vs. acquired categorical distinctiveness).


If day-old babies are presented with their mother's voice speaking normally, their mother's voice speaking abnormally (in monotone), and a stranger's voice, they react only to their mother's voice speaking normally. When a human and a non-human sound are played, babies turn their heads only toward the source of the human sound. It has been suggested that auditory learning begins as early as the prenatal period.[29]


One of the techniques used to examine how infants perceive speech, besides the head-turn procedure mentioned above, is measuring their sucking rate. In such an experiment, a baby sucks on a special nipple while being presented with sounds. First, the baby's normal sucking rate is established. Then a stimulus is played repeatedly. When the baby hears the stimulus for the first time, the sucking rate increases, but as the baby becomes habituated to the stimulation, the sucking rate decreases and levels off. Then, a new stimulus is played to the baby. If the baby perceives the newly introduced stimulus as different from the background stimulus, the sucking rate will show an increase.[29] The sucking-rate and head-turn methods are some of the more traditional, behavioral methods for studying speech perception. Among the newer methods (see Research methods below) that help us to study speech perception, near-infrared spectroscopy is widely used in infants.[28]
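The decision logic of the sucking-rate procedure can be sketched in a few lines of Python. This is only an illustrative sketch, not the scoring rule of any published study: the function name `shows_recovery`, the two-measurement `window`, the 20% `min_increase` criterion, and the sample rates are all assumptions made up for the example.

```python
from statistics import mean


def shows_recovery(habituation_rates, test_rates, window=2, min_increase=0.2):
    """Return True if the sucking rate rises after the stimulus change.

    habituation_rates: sucks per minute while the familiar stimulus repeats.
    test_rates: sucks per minute after the new stimulus is introduced.
    window and min_increase are hypothetical criteria chosen for illustration.
    """
    if len(habituation_rates) < window or len(test_rates) < window:
        raise ValueError("need at least `window` measurements in each phase")
    # Habituated level: the rate once it has decreased and levelled off.
    habituated = mean(habituation_rates[-window:])
    # Response to the new stimulus: the first measurements after the change.
    after_change = mean(test_rates[:window])
    return after_change >= habituated * (1 + min_increase)


# Made-up data: the rate rises at first exposure, then declines and levels off.
familiar = [30, 48, 44, 36, 29, 28]
new_different = [40, 42, 35]  # rate recovers: stimulus perceived as different
new_same = [28, 27, 29]       # no recovery: stimulus not distinguished

print(shows_recovery(familiar, new_different))  # True
print(shows_recovery(familiar, new_same))       # False
```

Comparing the end of the habituation phase with the start of the test phase mirrors the description above: an increase relative to the levelled-off rate is taken as a sign that the infant perceives the new stimulus as different.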


It has also been discovered that even though infants' ability to distinguish between the different phonetic properties of various languages begins to decline around the age of nine months, this process can be reversed by sufficient exposure to a new language. In a research study by Patricia K. Kuhl, Feng-Ming Tsao, and Huei-Mei Liu, it was discovered that if infants are spoken to and interacted with by a native speaker of Mandarin Chinese, they can be conditioned to retain their ability to distinguish different speech sounds within Mandarin that are very different from speech sounds found within the English language. The study thus showed that, given the right conditions, it is possible to prevent infants' loss of the ability to distinguish speech sounds in languages other than those found in the native language.[30]

Cross-language and second-language

A large amount of research has studied how users of a language perceive foreign speech (referred to as cross-language speech perception) or second-language speech (second-language speech perception). The latter falls within the domain of second language acquisition.


Languages differ in their phonemic inventories. Naturally, this creates difficulties when a foreign language is encountered. For example, if two foreign-language sounds are assimilated to a single mother-tongue category, the difference between them will be very difficult to discern. A classic example of this situation is the observation that Japanese learners of English have problems identifying or distinguishing the English liquid consonants /l/ and /r/ (see Perception of English /r/ and /l/ by Japanese speakers).[31]


Best (1995) proposed a Perceptual Assimilation Model which describes possible cross-language category assimilation patterns and predicts their consequences.[32] Flege (1995) formulated a Speech Learning Model which combines several hypotheses about second-language (L2) speech acquisition and which predicts, in simple terms, that an L2 sound that is not too similar to a native-language (L1) sound will be easier to acquire than an L2 sound that is relatively similar to an L1 sound (because it will be perceived as more obviously "different" by the learner).[33]

Speech phenomenology

The experience of speech

Casey O'Callaghan, in his article Experiencing Speech, analyzes whether "the perceptual experience of listening to speech differs in phenomenal character"[39] with regard to understanding the language being heard. He argues that an individual's experience when hearing a language they comprehend, as opposed to their experience when hearing a language they have no knowledge of, displays a difference in phenomenal features, which he defines as "aspects of what an experience is like"[39] for an individual.


If a subject who is a monolingual native English speaker is presented with a stimulus of speech in German, the string of phonemes will appear as mere sounds and will produce a very different experience from the one produced if exactly the same stimulus were presented to a subject who speaks German.


He also examines how speech perception changes when one is learning a language. If a subject with no knowledge of the Japanese language were presented with a stimulus of Japanese speech, and then were given the exact same stimulus after being taught Japanese, this same individual would have an extremely different experience.

Weak version – listening to speech engages previous knowledge of language.

Strong version – listening to speech engages specialized speech mechanisms for perceiving speech.

Related to the case study of Genie (feral child)

Neurocomputational speech processing

Multisensory integration

Origin of speech

Speech-Language Pathology

Motor theory of speech perception

Dedicated issue of Philosophical Transactions B on the Perception of Speech. Some articles are freely available.