Proto-language
In the tree model of historical linguistics, a proto-language is a postulated ancestral language from which a number of attested languages are believed to have descended by evolution, forming a language family. Proto-languages are usually unattested, or partially attested at best. They are reconstructed by way of the comparative method.[1]
In the family tree metaphor, a proto-language can be called a mother language. Occasionally, the German term Ursprache (from Ur- "primordial, original", and Sprache "language", pronounced [ˈuːɐ̯ʃpʁaːxə]) is used instead. It is also sometimes called the common or primitive form of a language (e.g. Common Germanic, Primitive Norse).[1]
In the strict sense, a proto-language is the most recent common ancestor of a language family, as it existed immediately before the family started to diverge into the attested daughter languages. It is therefore equivalent to the ancestral language or parental language of a language family.[2]
Moreover, a group of varieties (such as a dialect cluster) that are not considered separate languages, for whatever reason, may also be described as descending from a unitary proto-language.
Definition and verification
Typically, the proto-language is not known directly. It is by definition a linguistic reconstruction formulated by applying the comparative method to a group of languages featuring similar characteristics.[3] The tree is a statement of similarity and a hypothesis that the similarity results from descent from a common language.
The comparative method, a process of deduction, begins from a set of characteristics, or characters, found in the attested languages. If the entire set can be accounted for by descent from the proto-language, which must contain the proto-forms of them all, the tree, or phylogeny, is regarded as a complete explanation and, by Occam's razor, is given credibility. More recently, such a tree has been termed "perfect" and the characters labelled "compatible".
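The notion of compatible characters can be illustrated with a minimal sketch. It assumes the simplest case, binary characters (1 if a language shows a feature, 0 if not), for which a standard result is that a set of characters fits a single perfect tree exactly when every pair passes the "four-gamete" test. The language labels, character names and values below are invented for illustration and are not drawn from any real dataset.

```python
from itertools import combinations

# Hypothetical data: one row per character, one column per language A, B, C, D.
# 1 = the language shows the feature, 0 = it does not.
CHARACTERS = {
    "c1": [1, 1, 0, 0],
    "c2": [1, 1, 1, 0],
    "c3": [0, 1, 1, 0],
}


def pair_compatible(x, y):
    """Four-gamete test for two binary characters.

    The pair can evolve on one tree without parallel change or
    borrowing unless all four combinations 00, 01, 10 and 11 occur.
    """
    return len(set(zip(x, y))) < 4


def incompatible_pairs(characters):
    """Return the names of every pair of characters that clash."""
    return [
        (a, b)
        for a, b in combinations(characters, 2)
        if not pair_compatible(characters[a], characters[b])
    ]


def is_perfect(characters):
    """True if all characters are pairwise compatible (binary case only)."""
    return not incompatible_pairs(characters)


print(incompatible_pairs(CHARACTERS))  # [('c1', 'c3')]
print(is_perfect(CHARACTERS))          # False
```

In this toy dataset c1 and c3 conflict, so no single tree accounts for all three characters. Multistate characters, which are common in real linguistic datasets, need a more involved test than this pairwise check.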
No trees but the smallest branches are ever found to be perfect, in part because languages also evolve through horizontal transfer with their neighbours. Typically, credibility is given to the hypotheses of highest compatibility. The differences in compatibility must be explained by various applications of the wave model. The level of completeness of the reconstruction achieved varies, depending on how complete the evidence is from the descendant languages and on the formulation of the characters by the linguists working on it. Not all characters are suitable for the comparative method. For example, lexical items that are loans from a different language do not reflect the phylogeny to be tested, and, if used, will detract from the compatibility. Getting the right dataset for the comparative method is a major task in historical linguistics.
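As a rough illustration of preferring the hypothesis of highest compatibility, the sketch below scores two candidate trees by counting how many characters each can explain without conflict, and shows how a character flagged as a loan changes the count. It rests on simplifying assumptions that are not part of the text above: each character is a single innovation whose absence is the ancestral state, a tree is given as a collection of clades, and the language labels, character names and the loan flag are all hypothetical.

```python
# Candidate trees over hypothetical languages A, B, C, D, written as clades.
TREE_1 = [frozenset("AB"), frozenset("ABC")]   # ((A, B), C), D
TREE_2 = [frozenset("BC"), frozenset("ABC")]   # (A, (B, C)), D

# Each character lists the languages that share one innovation.
CHARACTERS = {
    "c1": {"A", "B"},
    "c2": {"A", "B", "C"},
    "c3": {"B", "C"},   # assumed here to be a loanword shared by B and C
    "c4": {"A", "B"},
}


def compatible_with_tree(innovators, clades):
    """A character fits the tree if its innovating set is nested in,
    contains, or is disjoint from every clade of the tree."""
    s = frozenset(innovators)
    return all(s <= c or c <= s or not (s & c) for c in clades)


def score(clades, characters, loans=()):
    """Return (compatible characters, characters used), skipping loans."""
    usable = {k: v for k, v in characters.items() if k not in loans}
    hits = sum(compatible_with_tree(v, clades) for v in usable.values())
    return hits, len(usable)


print(score(TREE_1, CHARACTERS))                  # (3, 4)
print(score(TREE_2, CHARACTERS))                  # (2, 4)
print(score(TREE_1, CHARACTERS, loans=("c3",)))   # (3, 3)
```

With the loan included, the first tree explains three of four characters. Once the loan is set aside, it explains all the remaining ones, which is the sense in which unsuitable characters lower the measured compatibility.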
Some widely accepted proto-languages are Proto-Afroasiatic, Proto-Indo-European, Proto-Uralic, and Proto-Dravidian.
In a few fortuitous instances, which have been used to verify the method and the model (and probably ultimately inspired it), a literary history exists from as early as a few millennia ago, allowing the descent to be traced in detail. The early daughter languages, and even the proto-language itself, may be attested in surviving texts. For example, Latin is the proto-language of the Romance language family, which includes such modern languages as French, Italian, Portuguese, Romanian, Catalan and Spanish. Likewise, Proto-Norse, the ancestor of the modern Scandinavian languages, is attested, albeit in fragmentary form, in inscriptions written in the Elder Futhark. Although there are no very early Indo-Aryan inscriptions, the Indo-Aryan languages of modern India all go back to Vedic Sanskrit (or dialects very closely related to it), which has been preserved in texts accurately handed down by parallel oral and written traditions for many centuries.
The first person to offer systematic reconstructions of an unattested proto-language was August Schleicher; he did so for Proto-Indo-European in 1861.[4]
Proto-X vs. Pre-X
Normally, the term "Proto-X" refers to the last common ancestor of a group of languages, occasionally attested but most commonly reconstructed through the comparative method, as with Proto-Indo-European and Proto-Germanic. An earlier stage of a single language X, reconstructed through the method of internal reconstruction, is termed "Pre-X", as in Pre–Old Japanese.[5] It is also possible to apply internal reconstruction to a proto-language, obtaining a pre-proto-language, such as Pre-Proto-Indo-European.[6]
Both prefixes are sometimes used for an unattested stage of a language without reference to comparative or internal reconstruction. "Pre-X" is sometimes also used for a postulated substratum, as in the Pre-Indo-European languages believed to have been spoken in Europe and South Asia before the arrival there of Indo-European languages.
When multiple historical stages of a single language exist, the oldest attested stage is normally termed "Old X" (e.g. Old English and Old Japanese). In other cases, such as Old Irish and Old Norse, the term refers to the language of the oldest known significant texts. Each of these languages has an older stage (Primitive Irish and Proto-Norse respectively) that is attested only fragmentarily.