DNA methylation
DNA methylation is a biological process by which methyl groups are added to the DNA molecule. Methylation can change the activity of a DNA segment without changing the sequence. When located in a gene promoter, DNA methylation typically acts to repress gene transcription. In mammals, DNA methylation is essential for normal development and is associated with a number of key processes including genomic imprinting, X-chromosome inactivation, repression of transposable elements, aging, and carcinogenesis.
As of 2016, two nucleobases have been found on which natural, enzymatic DNA methylation takes place: adenine and cytosine. The modified bases are N6-methyladenine,[1] 5-methylcytosine[2] and N4-methylcytosine.[3]
Cytosine methylation is widespread in both eukaryotes and prokaryotes, even though the rate of cytosine DNA methylation can differ greatly between species: 14% of cytosines are methylated in Arabidopsis thaliana, 4% to 8% in Physarum,[4] 7.6% in Mus musculus, 2.3% in Escherichia coli, 0.03% in Drosophila; methylation is essentially undetectable in Dictyostelium;[5][6] and virtually absent (0.0002 to 0.0003%) from Caenorhabditis[7] or fungi such as Saccharomyces cerevisiae and S. pombe (but not N. crassa).[8][9]: 3699 Adenine methylation has been observed in bacterial, plant, and recently in mammalian DNA,[10][11] but has received considerably less attention.
Methylation of cytosine to form 5-methylcytosine occurs at the same 5 position on the pyrimidine ring where the DNA base thymine's methyl group is located; the same position distinguishes thymine from the analogous RNA base uracil, which has no methyl group. Spontaneous deamination of 5-methylcytosine converts it to thymine, which results in a T:G mismatch. Repair mechanisms can then correct it back to the original C:G pair; alternatively, they may substitute A for G, turning the original C:G pair into a T:A pair, effectively changing a base and introducing a mutation. This misincorporated base will not be corrected during DNA replication, because thymine is a normal DNA base. If the mismatch is not repaired and the cell replicates its DNA, the strand carrying the T will be complemented by an A in one of the daughter cells, such that the mutation becomes permanent. The near-universal use of thymine exclusively in DNA and uracil exclusively in RNA may have evolved as an error-control mechanism, to facilitate the removal of uracils generated by the spontaneous deamination of cytosine.[12] DNA methylation, as well as many of the contemporary DNA methyltransferases, is thought to have evolved from primitive, early-world RNA methylation activity, a view supported by several lines of evidence.[13]
In plants and other organisms, DNA methylation is found in three different sequence contexts: CG (or CpG), CHG and CHH (where H corresponds to A, T or C). In mammals, however, DNA methylation is found almost exclusively in CpG dinucleotides, with the cytosines on both strands usually being methylated. Non-CpG methylation can nevertheless be observed in embryonic stem cells,[14][15][16] and has also been indicated in neural development.[17] Non-CpG methylation has also been observed in hematopoietic progenitor cells, where it occurred mainly in a CpApC sequence context.[18]
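The three sequence contexts can be made concrete with a short Python sketch that labels each cytosine on one strand as CG, CHG or CHH. The function names and the handling of sequence ends and ambiguous bases are assumptions made for illustration; they are not taken from any particular methylation-calling tool.

```python
from collections import Counter
from typing import Optional

# H stands for any base other than G (A, C or T).
H_BASES = {"A", "C", "T"}


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i of seq as 'CG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not a cytosine, or if the downstream bases
    are missing or ambiguous so that the context cannot be decided.
    """
    seq = seq.upper()
    if i < 0 or i >= len(seq) or seq[i] != "C":
        return None
    first = seq[i + 1:i + 2]   # base immediately downstream ('' at the end)
    second = seq[i + 2:i + 3]  # second downstream base ('' near the end)
    if first == "G":
        return "CG"
    if first not in H_BASES:
        return None
    if second == "G":
        return "CHG"
    if second in H_BASES:
        return "CHH"
    return None


def count_contexts(seq: str) -> Counter:
    """Count classifiable cytosine contexts along one strand."""
    counts = Counter()
    for i in range(len(seq)):
        context = cytosine_context(seq, i)
        if context is not None:
            counts[context] += 1
    return counts
```

For the sequence ACGTCAGCTTC, the cytosine at index 1 is in a CG context, the one at index 4 is CHG, the one at index 7 is CHH, and the final cytosine is left unclassified because no downstream bases are available.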
In plants
Significant progress has been made in understanding DNA methylation in the model plant Arabidopsis thaliana. DNA methylation in plants differs from that of mammals: while DNA methylation in mammals mainly occurs on the cytosine nucleotide in a CpG site, in plants the cytosine can be methylated at CpG, CpHpG, and CpHpH sites, where H represents any nucleotide other than guanine.[74] Overall, Arabidopsis DNA is highly methylated: mass spectrometry analysis estimated 14% of cytosines to be modified.[9]: abstract Later, bisulfite sequencing data estimated that around 25% of Arabidopsis CG sites are methylated, but these levels vary with the geographic origin of Arabidopsis accessions (accessions from the north are more highly methylated than southern accessions).[75]
The principal Arabidopsis DNA methyltransferase enzymes, which transfer and covalently attach methyl groups onto DNA, are DRM2, MET1, and CMT3. The DRM2 and MET1 proteins share significant homology with the mammalian methyltransferases DNMT3 and DNMT1, respectively, whereas the CMT3 protein is unique to the plant kingdom. Two classes of DNA methyltransferases are currently recognized: 1) a de novo class of enzymes that create new methylation marks on the DNA; 2) a maintenance class that recognizes the methylation marks on the parental strand of DNA and transfers new methylation to the daughter strands after DNA replication. DRM2 is the only enzyme that has been implicated as a de novo DNA methyltransferase. DRM2 has also been shown, along with MET1 and CMT3, to be involved in maintaining methylation marks through DNA replication.[76] Other DNA methyltransferases are expressed in plants but have no known function (see the Chromatin Database).
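The idea behind the maintenance class can be illustrated with a minimal Python sketch. It assumes that only symmetric CG sites are copied and that the parental strand is the one whose sequence is given; the function name and the coordinate convention are hypothetical and do not model any specific enzyme.

```python
from typing import Set


def maintain_cg_methylation(parental_strand: str,
                            parental_methylated: Set[int]) -> Set[int]:
    """Return the daughter-strand cytosines that would be methylated.

    parental_strand is read 5' to 3'. parental_methylated holds the indices
    of methylated cytosines on that strand. The result uses the same
    coordinates: each returned index is the parental position whose paired
    base on the daughter strand is the cytosine receiving the new mark.
    """
    strand = parental_strand.upper()
    daughter_methylated = set()
    for i in parental_methylated:
        # A CG dinucleotide reads CG on both strands, so the cytosine on
        # the opposite strand is the one paired with the G at i + 1.
        if strand[i:i + 2] == "CG":
            daughter_methylated.add(i + 1)
    return daughter_methylated
```

In this simplified model a methylated cytosine outside a CG site is not copied, because the opposite strand has no cytosine at the mirrored position.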
Genome-wide levels of DNA methylation vary widely between plant species, and Arabidopsis cytosines tend to be less densely methylated than those in other plants. For example, ~92.5% of CpG cytosines are methylated in Beta vulgaris.[77] The patterns of methylation also differ between cytosine sequence contexts; universally, CpG methylation is higher than CHG and CHH methylation, and CpG methylation can be found in both active genes and transposable elements, while CHG and CHH are usually relegated to silenced transposable elements.[78][74]
It is not clear how the cell determines the locations of de novo DNA methylation, but evidence suggests that for many (though not all) locations, RNA-directed DNA methylation (RdDM) is involved. In RdDM, specific RNA transcripts are produced from a genomic DNA template, and this RNA forms secondary structures called double-stranded RNA molecules.[79] The double-stranded RNAs, through either the small interfering RNA (siRNA) or microRNA (miRNA) pathways, direct de novo DNA methylation of the original genomic location that produced the RNA.[79] This sort of mechanism is thought to be important in cellular defense against RNA viruses and/or transposons, both of which often form a double-stranded RNA that can be mutagenic to the host genome. Methylation of their genomic locations, through an as yet poorly understood mechanism, shuts them off so that they are no longer active in the cell, protecting the genome from their mutagenic effect. More recently, DNA methylation was described as the main determinant of embryogenic culture formation from explants in woody plants, and it is regarded as the main mechanism explaining the poor response of mature explants to somatic embryogenesis in these plants (Isah 2016).
In fungi
Many fungi have low levels (0.1 to 0.5%) of cytosine methylation, whereas other fungi have as much as 5% of the genome methylated.[88] This value seems to vary both among species and among isolates of the same species.[89] There is also evidence that DNA methylation may be involved in state-specific control of gene expression in fungi. However, ultra-high-sensitivity mass spectrometry with a detection limit of 250 attomoles did not confirm DNA methylation in single-celled yeast species such as Saccharomyces cerevisiae or Schizosaccharomyces pombe, indicating that yeasts do not possess this DNA modification.[9]: abstract
Although brewers' yeast (Saccharomyces), fission yeast (Schizosaccharomyces), and Aspergillus flavus[90] have no detectable DNA methylation, the model filamentous fungus Neurospora crassa has a well-characterized methylation system.[91] Several genes control methylation in Neurospora, and mutation of the DNA methyltransferase, dim-2, eliminates all DNA methylation but does not affect growth or sexual reproduction. While the Neurospora genome has very little repeated DNA, half of the methylation occurs in repeated DNA, including transposon relics and centromeric DNA. The ability to evaluate other important phenomena in a DNA methylase-deficient genetic background makes Neurospora an important system in which to study DNA methylation.
DNA methylation can be detected by the following assays currently used in scientific research:[99]
Differentially methylated regions (DMRs)
Differentially methylated regions (DMRs), which are genomic regions with different methylation statuses among multiple samples (tissues, cells, individuals or others), are regarded as possible functional regions involved in gene transcriptional regulation. The identification of DMRs among multiple tissues (T-DMRs) provides a comprehensive survey of epigenetic differences among human tissues.[111] For example, methylated regions that are unique to a particular tissue make it possible to differentiate between tissue types, such as semen and vaginal fluid. Research by Lee et al. showed, by examining T-DMRs, that DACT1 and USP49 positively identified semen.[112] T-DMRs have proven useful in the identification of various body fluids found at crime scenes, and researchers in the forensic field are seeking novel T-DMRs in genes to use as markers in forensic DNA analysis. DMRs between cancer and normal samples (C-DMRs) demonstrate the aberrant methylation in cancers.[113] DNA methylation is well known to be associated with cell differentiation and proliferation.[114] Many DMRs have been found across developmental stages (D-DMRs)[115] and during reprogramming (R-DMRs).[116] In addition, there are intra-individual DMRs (Intra-DMRs), which show longitudinal changes in global DNA methylation with increasing age in a given individual.[117] There are also inter-individual DMRs (Inter-DMRs), which show different methylation patterns among multiple individuals.[118]
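As a rough illustration of what a DMR is, the following Python sketch scans two per-site methylation profiles and reports runs of consecutive sites whose levels differ by at least a threshold. This is a toy, threshold-based scan under assumed parameters; the function name, the default threshold and the minimum run length are all hypothetical, and practical DMR callers rely on statistical testing rather than a fixed cutoff.

```python
from typing import List, Sequence, Tuple


def find_dmrs(sample_a: Sequence[float],
              sample_b: Sequence[float],
              min_diff: float = 0.3,
              min_sites: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of candidate DMRs.

    sample_a and sample_b hold methylation levels (0.0 to 1.0) for the same
    ordered list of sites. A candidate region is a run of at least min_sites
    consecutive sites whose absolute difference is at least min_diff.
    """
    n = min(len(sample_a), len(sample_b))
    regions = []
    start = None  # index where the current run of differing sites began
    for i in range(n):
        if abs(sample_a[i] - sample_b[i]) >= min_diff:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_sites:
                regions.append((start, i))
            start = None
    # Close a run that extends to the last site.
    if start is not None and n - start >= min_sites:
        regions.append((start, n))
    return regions
```

Isolated differing sites are ignored because they do not reach the minimum run length, which reflects the idea that a DMR is a region rather than a single site.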
QDMR (Quantitative Differentially Methylated Regions) is an approach that quantifies methylation differences and identifies DMRs from genome-wide methylation profiles by adapting Shannon entropy.[119] Because QDMR is platform-free and species-free, it is potentially applicable to various kinds of methylation data. It provides an effective tool for quantifying methylation differences across multiple samples and for the high-throughput identification of functional regions involved in epigenetic regulation.[120]
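The intuition behind an entropy-based measure can be sketched in Python as follows. This is plain Shannon entropy over normalized methylation levels, not the adapted formula used by QDMR itself; the function name and the treatment of an all-zero region are assumptions made for illustration.

```python
import math
from typing import Sequence


def methylation_entropy(levels: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one region's methylation across samples.

    levels holds one non-negative methylation level per sample. The levels
    are normalized to sum to 1 and treated as a probability distribution.
    High entropy means the region is methylated similarly in all samples;
    low entropy means methylation is concentrated in a few samples.
    """
    if len(levels) == 0:
        raise ValueError("at least one sample is required")
    total = sum(levels)
    if total == 0:
        # Assumption: a region unmethylated everywhere shows no difference,
        # so it is given the maximum entropy for this number of samples.
        return math.log2(len(levels))
    entropy = 0.0
    for level in levels:
        p = level / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy
```

With four samples, a region methylated equally in all of them reaches the maximum of 2 bits, whereas a region methylated in only one sample has an entropy of 0, marking it as a candidate for sample-specific methylation.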
Gene-set analysis (also known as pathway analysis; usually performed with tools such as DAVID, GoSeq or GSEA) has been shown to be severely biased when applied to high-throughput methylation data (e.g. MeDIP-seq, MeDIP-ChIP, HELP-seq), and a wide range of studies have thus mistakenly reported hyper-methylation of genes related to development and differentiation. It has been suggested that this bias can be corrected by using sample label permutations or by using a statistical model to control for differences in the number of CpG probes or CpG sites that target each gene.[121]
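A sample label permutation test can be sketched in Python as below. The sketch assumes a two-group design and a simple, hypothetical gene-set statistic (the mean absolute difference in group means over the probes in the set), and it enumerates every relabelling exactly rather than drawing random permutations, which is only feasible for small sample counts. It is not the procedure of any specific published tool.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def set_statistic(values: Sequence[Sequence[float]],
                  group_a: Sequence[int],
                  group_b: Sequence[int]) -> float:
    """Mean absolute difference in group means over all probes in the set.

    values has one row per probe and one column per sample.
    """
    return mean(
        abs(mean(row[i] for i in group_a) - mean(row[i] for i in group_b))
        for row in values
    )


def label_permutation_pvalue(values: Sequence[Sequence[float]],
                             group_a: Sequence[int]) -> float:
    """Exact p-value for the gene set under relabelling of samples.

    Because whole samples are relabelled, the probes belonging to the set
    (and therefore the number of probes per gene) are identical in every
    relabelled data set.
    """
    n_samples = len(values[0])
    observed_a = tuple(sorted(group_a))
    observed_b = tuple(i for i in range(n_samples) if i not in observed_a)
    observed = set_statistic(values, observed_a, observed_b)
    hits = 0
    total = 0
    for perm_a in combinations(range(n_samples), len(observed_a)):
        perm_b = tuple(i for i in range(n_samples) if i not in perm_a)
        total += 1
        # Small tolerance so that ties with the observed value are counted.
        if set_statistic(values, perm_a, perm_b) >= observed - 1e-12:
            hits += 1
    return hits / total
```

The design point is that the null distribution is built from the same probes as the observed statistic, so a gene set covered by many probes is compared only against relabelled versions of itself.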
Computational prediction
DNA methylation can also be predicted by computational models. Such models can facilitate the global profiling of DNA methylation across chromosomes, and they are often faster and cheaper than biological assays. Examples include the models of Bhasin et al.,[130] Bock et al.,[131] and Zheng et al.[132][133] Together with biological assays, these methods greatly facilitate DNA methylation analysis.