Molecular clock

The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleotide sequences for DNA, RNA, or amino acid sequences for proteins.

Not to be confused with Chemical clock or Biological clock.

Early discovery and genetic equidistance

The notion of a "molecular clock" is attributed to Émile Zuckerkandl and Linus Pauling who, in 1962, noticed that the number of amino acid differences in hemoglobin between different lineages changes roughly linearly with time, as estimated from fossil evidence.^[1] They generalized this observation to assert that the rate of evolutionary change of any specified protein is approximately constant over time and over different lineages (known as the molecular clock hypothesis).

The genetic equidistance phenomenon was first noted in 1963 by Emanuel Margoliash, who wrote: "It appears that the number of residue differences between cytochrome c of any two species is mostly conditioned by the time elapsed since the lines of evolution leading to these two species originally diverged. If this is correct, the cytochrome c of all mammals should be equally different from the cytochrome c of all birds. Since fish diverges from the main stem of vertebrate evolution earlier than either birds or mammals, the cytochrome c of both mammals and birds should be equally different from the cytochrome c of fish. Similarly, all vertebrate cytochrome c should be equally different from the yeast protein."^[2] For example, the difference between the cytochrome c of a carp and that of a frog, turtle, chicken, rabbit, and horse is consistently 13% to 14%. Similarly, the difference between the cytochrome c of a bacterium and that of yeast, wheat, moth, tuna, pigeon, and horse ranges from 64% to 69%. Together with the work of Émile Zuckerkandl and Linus Pauling, the genetic equidistance result led directly to the formal postulation of the molecular clock hypothesis in the early 1960s.^[3]

Similarly, in 1967 Vincent Sarich and Allan Wilson demonstrated that differences in albumin proteins among modern primates were consistent with approximately constant rates of change in all the lineages they assessed.^[4] The basic logic of their analysis is that if one species lineage had evolved more quickly than a sister lineage since their common ancestor, then the molecular differences between an outgroup (more distantly related) species and the faster-evolving species should be larger than those between the outgroup and the slower-evolving species, since more molecular changes would have accumulated on the faster lineage. This method is known as the relative rate test. Sarich and Wilson's paper reported, for example, that human (Homo sapiens) and chimpanzee (Pan troglodytes) albumin immunological cross-reactions suggested they were about equally different from Ceboidea (New World monkey) species (within experimental error). This meant that both had accumulated approximately equal changes in albumin since their shared common ancestor. The same pattern was found for all the primate comparisons they tested. When calibrated with the few well-documented fossil branch points (such as the absence of primate fossils of modern aspect before the K-T boundary), this led Sarich and Wilson to argue that the human-chimp divergence probably occurred only ~4–6 million years ago.^[5]
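The logic of the relative rate test can be sketched in a few lines of Python. This is only an illustration: the function names, the tolerance parameter, and the example distances are hypothetical and are not taken from Sarich and Wilson's paper.

```python
def relative_rate_difference(d_out_a: float, d_out_b: float) -> float:
    """Return d(outgroup, A) - d(outgroup, B).

    If sister lineages A and B have changed at the same rate since their
    common ancestor, both should be about equally distant from the outgroup,
    so the result should be close to zero. A clearly positive value suggests
    that lineage A has accumulated more change than lineage B.
    """
    return d_out_a - d_out_b


def is_clock_like(d_out_a: float, d_out_b: float, tolerance: float) -> bool:
    """True if the two outgroup distances agree within the given tolerance.

    The tolerance stands in for experimental error (an assumption here;
    a real analysis would use a statistical test).
    """
    return abs(relative_rate_difference(d_out_a, d_out_b)) <= tolerance


# Hypothetical distances in arbitrary units, for illustration only.
print(is_clock_like(0.30, 0.31, tolerance=0.02))  # similar distances
print(is_clock_like(0.40, 0.30, tolerance=0.02))  # lineage A looks faster
```

The test needs no fossil dates: it compares only relative amounts of change, which is why it can be applied before any calibration is available.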

Relationship with neutral theory

The observation of a clock-like rate of molecular change was originally purely phenomenological. Later, Motoo Kimura^[6] developed the neutral theory of molecular evolution, which predicted a molecular clock. Let there be $N$ individuals, and to keep the calculation simple, let the individuals be haploid (i.e. have one copy of each gene). Let the rate of neutral mutations (i.e. mutations with no effect on fitness) in a new individual be $\mu$. The probability that a given new neutral mutation will become fixed in the population is then $1/N$, since each copy of the gene is as good as any other. Every generation, each individual can have new mutations, so there are $N\mu$ new neutral mutations in the population as a whole. That means that, on average, $\mu$ new neutral mutations become fixed each generation. If most changes seen during molecular evolution are neutral, then fixations in a population will accumulate at a clock-rate that is equal to the rate of neutral mutations in an individual.
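Written out as a single step, with $k$ introduced here (it is not a symbol from the text above) for the number of neutral fixations per generation:

$$
k \;=\; \underbrace{N\mu}_{\text{new neutral mutations per generation}} \times \underbrace{\frac{1}{N}}_{\text{fixation probability of each}} \;=\; \mu
$$

The population size $N$ cancels, so under these simplifying assumptions the rate of the clock depends only on the neutral mutation rate.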

Nevertheless, there is some controversy regarding this view. Radrizzani et al. proposed the "unwanted transcript hypothesis" to explain differential substitution at synonymous sites,^[7] a well-known issue for the molecular clock in eukaryote evolution.^[8]

Sometimes only a single divergence date can be estimated from fossils, with all other dates inferred from that. Other sets of species have abundant fossils available, allowing the hypothesis of constant divergence rates to be tested. DNA sequences experiencing low levels of negative selection showed divergence rates of 0.7–0.8% per Myr in bacteria, mammals, invertebrates, and plants.^[27] In the same study, genomic regions experiencing very high negative or purifying selection (encoding rRNA) were considerably slower (1% per 50 Myr).

In addition to such variation in rate with genomic position, since the early 1990s variation among taxa has proven fertile ground for research too,^[28] even over comparatively short periods of evolutionary time (for example mockingbirds^[29]). Tube-nosed seabirds have molecular clocks that on average run at half the speed of those of many other birds,^[30] possibly due to long generation times, and many turtles have a molecular clock running at one-eighth the speed it does in small mammals, or even slower.^[31] Effects of small population size are also likely to confound molecular clock analyses. Researchers such as Francisco J. Ayala have more fundamentally challenged the molecular clock hypothesis.^[32]^[33]^[34] According to Ayala's 1999 study, five factors combine to limit the application of molecular clock models:

Changing generation times (if the rate of new mutations depends at least partly on the number of generations rather than the number of years)

Population size (genetic drift is stronger in small populations, and so more mutations are effectively neutral)

Species-specific differences (due to differing metabolism, ecology, evolutionary history, ...)

Change in function of the protein studied (can be avoided in closely related species by utilizing non-coding DNA sequences or emphasizing silent mutations)

Changes in the intensity of natural selection.

Molecular clock users have developed workarounds using a number of statistical approaches, including maximum likelihood techniques and, later, Bayesian modeling. In particular, models that take into account rate variation across lineages have been proposed in order to obtain better estimates of divergence times. These models are called relaxed molecular clocks^[35] because they represent an intermediate position between the 'strict' molecular clock hypothesis and Joseph Felsenstein's many-rates model.^[36] They are made possible through MCMC techniques that explore a weighted range of tree topologies and simultaneously estimate parameters of the chosen substitution model. It must be remembered that divergence dates inferred using a molecular clock are based on statistical inference and not on direct evidence.
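To make the idea of rate variation across lineages concrete, the sketch below draws one rate per branch from a lognormal distribution centred on a shared mean rate. The lognormal choice is an assumption made for illustration (the text above does not specify a distribution), and the function name and parameters are hypothetical. The sketch only generates rates; it does not perform the MCMC inference described above.

```python
import math
import random


def draw_branch_rates(n_branches, mean_rate, sigma, seed=None):
    """Draw one substitution rate per branch from a lognormal distribution.

    sigma is the standard deviation on the log scale. With sigma = 0 every
    branch gets exactly mean_rate, which corresponds to a strict clock;
    larger values of sigma allow more variation among lineages.
    """
    rng = random.Random(seed)
    # Shift the log-scale location so the distribution's mean is mean_rate.
    mu_log = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu_log, sigma) for _ in range(n_branches)]


# Hypothetical values: 5 branches, mean rate 0.01 substitutions/site/Myr.
print(draw_branch_rates(5, 0.01, 0.0, seed=1))  # strict clock: all equal
print(draw_branch_rates(5, 0.01, 0.5, seed=1))  # relaxed: rates vary
```

Setting `sigma` to zero recovers the strict clock, which shows in what sense a relaxed clock sits between a single shared rate and a fully independent rate for every branch.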

The molecular clock runs into particular challenges at very short and very long timescales. At long timescales, the problem is saturation. When enough time has passed, many sites have undergone more than one change, but it is impossible to detect more than one. This means that the observed number of changes is no longer linear with time, but instead flattens out. Even at intermediate genetic distances, with phylogenetic data still sufficient to estimate topology, signal for the overall scale of the tree can be weak under complex likelihood models, leading to highly uncertain molecular clock estimates.^[37]
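Saturation can be illustrated with the Jukes-Cantor substitution model, which is used here only as a simple, well-known example and is not named in the text above. Under that model, the expected proportion of differing sites approaches a ceiling of 0.75 as the true number of substitutions per site grows. Function names are hypothetical.

```python
import math


def observed_proportion(d: float) -> float:
    """Expected proportion of differing sites under Jukes-Cantor,
    given d actual substitutions per site between two sequences."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """Invert the relationship: estimate substitutions per site from the
    observed proportion p of differing sites. Undefined once p reaches
    0.75, i.e. when the sequences are fully saturated."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed differences grow ever more slowly as true change accumulates.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"true d = {d:.1f}  observed p = {observed_proportion(d):.3f}")
```

Near the ceiling, small errors in the observed proportion translate into very large errors in the corrected distance, which is one reason dates on long timescales are so uncertain.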

At very short time scales, many differences between samples do not represent fixation of different sequences in the different populations. Instead, they represent alternative alleles that were both present as part of a polymorphism in the common ancestor. The inclusion of differences that have not yet become fixed leads to a potentially dramatic inflation of the apparent rate of the molecular clock at very short timescales.^[23]^[38]

Uses

The molecular clock technique is an important tool in molecular systematics, macroevolution, and phylogenetic comparative methods. Estimation of the dates of phylogenetic events, including those not documented by fossils, such as the divergences between living taxa, has allowed the study of macroevolutionary processes in organisms that have limited fossil records. Phylogenetic comparative methods rely heavily on calibrated phylogenies.
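Under a strict clock, date estimation reduces to simple arithmetic, sketched below. The sketch assumes a single constant rate per lineage and a genetic distance already corrected for multiple substitutions. The function names and all numbers are hypothetical. The factor of 2 reflects that both lineages accumulate changes after they split.

```python
def calibrate_rate(pairwise_distance: float, calibration_age: float) -> float:
    """Estimate the per-lineage rate from a pair of taxa whose divergence
    age is known (for example from a fossil). Distance is in substitutions
    per site, age in Myr, so the rate is in substitutions/site/Myr."""
    return pairwise_distance / (2.0 * calibration_age)


def divergence_time(pairwise_distance: float, rate_per_lineage: float) -> float:
    """Estimate the time since two lineages split, assuming both have
    evolved at the same constant rate (strict clock)."""
    return pairwise_distance / (2.0 * rate_per_lineage)


# Hypothetical calibration: distance 0.10 between taxa that split 10 Myr ago.
rate = calibrate_rate(0.10, 10.0)
# Apply the calibrated rate to an undated pair with distance 0.04.
print(divergence_time(0.04, rate))
```

Every date obtained this way inherits the uncertainty of the calibration point and of the constant-rate assumption.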

See also

Charles Darwin

Gene orders

Human mitochondrial molecular clock

Mitochondrial Eve and Y-chromosomal Adam

Models of DNA evolution

Molecular evolution

Neutral theory of molecular evolution

Glottochronology

Ho, S.Y.W., ed. (2020). The Molecular Evolutionary Clock: Theory and Practice. Springer, Cham. doi:10.1007/978-3-030-60181-2. ISBN 978-3-030-60180-5. S2CID 231672167.

Kumar S (August 2005). "Molecular clocks: four decades of evolution". Nature Reviews. Genetics. 6 (8): 654–662. doi:10.1038/nrg1659. PMID 16136655. S2CID 14261833.

Morgan GJ (1998). "Emile Zuckerkandl, Linus Pauling, and the molecular evolutionary clock, 1959-1965". Journal of the History of Biology. 31 (2): 155–178. doi:10.1023/A:1004394418084. PMID 11620303. S2CID 5660841.

Zuckerkandl E, Pauling LB (1965). "Evolutionary divergence and convergence in proteins". In Bryson V, Vogel HJ (eds.). Evolving Genes and Proteins. Academic Press, New York. pp. 97–166.