Non-coding DNA
Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules (e.g. transfer RNA, microRNA, piRNA, ribosomal RNA, and regulatory RNAs). Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional, such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses. Regions that are completely nonfunctional are called junk DNA.
Fraction of non-coding genomic DNA
In bacteria, the coding regions typically take up 88% of the genome.[1] The remaining 12% does not encode proteins, but much of it still has biological function through genes where the RNA transcript is functional (non-coding genes) and regulatory sequences, which means that almost all of the bacterial genome has a function.[1] The amount of coding DNA in eukaryotes is usually a much smaller fraction of the genome because eukaryotic genomes contain large amounts of repetitive DNA not found in prokaryotes. The human genome contains somewhere between 1–2% coding DNA.[2][3] The exact number is not known because there are disputes over the number of functional coding exons and over the total size of the human genome. This means that 98–99% of the human genome consists of non-coding DNA and this includes many functional elements such as non-coding genes and regulatory sequences.
Genome size in eukaryotes can vary over a wide range, even between closely related species. This puzzling observation was originally known as the C-value Paradox where "C" refers to the haploid genome size.[4] The paradox was resolved with the discovery that most of the differences were due to the expansion and contraction of repetitive DNA and not the number of genes. Some researchers speculated that this repetitive DNA was mostly junk DNA. The reasons for the changes in genome size are still being worked out and this problem is called the C-value Enigma.[5]
The resolution of the C-value Paradox led to the observation that the number of genes does not seem to correlate with perceived notions of complexity, because the number of genes seems to be relatively constant; this issue is termed the G-value Paradox.[6] For example, the genome of the unicellular Polychaos dubium (formerly known as Amoeba dubia) has been reported to contain more than 200 times the amount of DNA in humans (i.e. more than 600 billion base pairs vs a bit more than 3 billion in humans).[7] The genome of the pufferfish Takifugu rubripes is only about one eighth the size of the human genome, yet seems to have a comparable number of genes. Genes take up about 30% of the pufferfish genome and coding DNA about 10%, so about 90% of the genome is non-coding DNA. The reduced size of the pufferfish genome is due to shorter introns and less repetitive DNA.[8][9]
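The size comparisons above reduce to simple arithmetic. The following Python sketch reproduces them under two stated assumptions: the human genome is rounded to 3 billion base pairs (the text says "a bit more than 3 billion"), and the ratios "more than 200 times" and "about one eighth" are treated as exact. All variable names are illustrative only.

```python
# Rounded figures taken from the paragraph above; all names are illustrative.
HUMAN_BP = 3_000_000_000  # assumption: "a bit more than 3 billion", rounded down

# Polychaos dubium: reported as MORE than 200 times the human amount,
# so this value is a lower bound, not a measurement.
amoeba_bp_lower_bound = 200 * HUMAN_BP

# Takifugu rubripes: about one eighth the size of the human genome.
pufferfish_bp = HUMAN_BP // 8

# Pufferfish composition: genes ~30% of the genome, coding DNA ~10%.
pufferfish_gene_bp = pufferfish_bp * 30 // 100
pufferfish_coding_bp = pufferfish_bp * 10 // 100
pufferfish_noncoding_bp = pufferfish_bp - pufferfish_coding_bp

print(f"Amoeba (lower bound): {amoeba_bp_lower_bound:,} bp")
print(f"Pufferfish genome:    {pufferfish_bp:,} bp")
print(f"  genes (~30%):       {pufferfish_gene_bp:,} bp")
print(f"  coding (~10%):      {pufferfish_coding_bp:,} bp")
print(f"  non-coding (~90%):  {pufferfish_noncoding_bp:,} bp")
```

With these rounded inputs, the pufferfish genome comes to about 375 million base pairs, of which roughly 37.5 million are coding.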
Utricularia gibba, a bladderwort plant, has a very small nuclear genome (100.7 Mb) compared to most plants.[10][11] It likely evolved from an ancestral genome that was 1,500 Mb in size.[11] The bladderwort genome has roughly the same number of genes as other plants but the total amount of coding DNA comes to about 30% of the genome.[10][11]
The remainder of the genome (70% non-coding DNA) consists of promoters and regulatory sequences that are shorter than those in other plant species.[10] The genes contain introns but there are fewer of them and they are smaller than the introns in other plant genomes.[10] There are noncoding genes, including many copies of ribosomal RNA genes.[11] The genome also contains telomere sequences and centromeres as expected.[11] Much of the repetitive DNA seen in other eukaryotes has been deleted from the bladderwort genome since that lineage split from those of other plants. About 59% of the bladderwort genome consists of transposon-related sequences but since the genome is so much smaller than other genomes, this represents a considerable reduction in the amount of this DNA.[11] The authors of the original 2013 article note that claims of additional functional elements in the non-coding DNA of animals do not seem to apply to plant genomes.[10]
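The percentages quoted for Utricularia gibba can be converted into approximate amounts of DNA. The sketch below is illustrative only: it takes the 100.7 Mb genome size, the 1,500 Mb ancestral estimate, and the 30% and 59% fractions at face value, and it does not assume that the transposon-related and non-coding categories are mutually exclusive or that they add up to the whole genome. The variable names are not from the cited papers.

```python
# Figures quoted in the text for Utricularia gibba; names are illustrative.
GENOME_MB = 100.7           # nuclear genome size
ANCESTRAL_MB = 1500.0       # estimated ancestral genome size
CODING_FRACTION = 0.30      # coding DNA as a share of the genome
TRANSPOSON_FRACTION = 0.59  # transposon-related sequences as a share of the genome

# Convert the quoted fractions into megabases.
coding_mb = GENOME_MB * CODING_FRACTION
noncoding_mb = GENOME_MB - coding_mb
transposon_mb = GENOME_MB * TRANSPOSON_FRACTION

# How many times smaller the present genome is than the ancestral estimate.
shrink_factor = ANCESTRAL_MB / GENOME_MB

print(f"Coding DNA:         {coding_mb:.1f} Mb")
print(f"Non-coding DNA:     {noncoding_mb:.1f} Mb")
print(f"Transposon-related: {transposon_mb:.1f} Mb")
print(f"Ancestral / present genome size: {shrink_factor:.1f}x")
```

Read this way, the present genome is roughly one fifteenth the size of the estimated ancestral genome.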
According to a New York Times article, during the evolution of this species, "... genetic junk that didn't serve a purpose was expunged, and the necessary stuff was kept."[12] According to Victor Albert of the University at Buffalo, the plant is able to expunge its so-called junk DNA and "have a perfectly good multicellular plant with lots of different cells, organs, tissue types and flowers, and you can do it without the junk. Junk is not needed."[13]
Genome-wide association studies (GWAS) and non-coding DNA
Genome-wide association studies (GWAS) identify linkages between alleles and observable traits such as phenotypes and diseases. Most of the associations are between single-nucleotide polymorphisms (SNPs) and the trait being examined, and most of these SNPs are located in non-functional DNA. The association establishes a linkage that helps map the DNA region responsible for the trait, but it does not necessarily identify the mutations causing the disease or phenotypic difference.[50][51][52][53][54]
SNPs that are tightly linked to traits are the ones most likely to identify a causal mutation. (The association is referred to as tight linkage disequilibrium.) About 12% of these polymorphisms are found in coding regions; about 40% are located in introns; and most of the rest are found in intergenic regions, including regulatory sequences.[51]
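The breakdown above leaves one share implicit. As a minimal sketch, assuming the quoted percentages are exact and non-overlapping, the remainder can be computed directly; the variable names are illustrative.

```python
# Approximate shares of trait-associated SNPs quoted in the text.
CODING_PCT = 12   # SNPs in coding regions
INTRON_PCT = 40   # SNPs in introns

# Everything else; the text says most (not all) of this share is
# intergenic, including regulatory sequences.
remainder_pct = 100 - CODING_PCT - INTRON_PCT

# Introns are non-coding, so every SNP outside coding regions
# counts toward the non-coding total.
noncoding_pct = 100 - CODING_PCT

print(f"Remainder (mostly intergenic): {remainder_pct}%")
print(f"Total in non-coding regions:   {noncoding_pct}%")
```

On these figures, about 48% of the polymorphisms fall outside both coding regions and introns, and about 88% lie in non-coding DNA overall.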