Single-nucleotide polymorphism
In genetics and bioinformatics, a single-nucleotide polymorphism (SNP /snɪp/; plural SNPs /snɪps/) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population (e.g. 1% or more),[1] many publications[2][3][4] do not apply such a frequency threshold.
"SNPs" redirects here. For the singular, see SNP (disambiguation).
For example, a G nucleotide present at a specific location in a reference genome may be replaced by an A in a minority of individuals. The two possible nucleotide variations of this SNP – G or A – are called alleles.[5]
SNPs can help explain differences in susceptibility to a wide range of diseases across a population. For example, a common SNP in the CFH gene is associated with increased risk of age-related macular degeneration.[6] Differences in the severity of an illness or response to treatments may also be manifestations of genetic variations caused by SNPs. For example, two common SNPs in the APOE gene, rs429358 and rs7412, lead to three major APO-E alleles with different associated risks for development of Alzheimer's disease and age at onset of the disease.[7]
Single nucleotide substitutions with an allele frequency of less than 1% are sometimes called single-nucleotide variants (SNVs).[8] "Variant" may also be used as a general term for any single nucleotide change in a DNA sequence,[9] encompassing both common SNPs and rare mutations, whether germline or somatic.[10][11] The term SNV has therefore been used to refer to point mutations found in cancer cells.[12] DNA variants must also commonly be taken into consideration in molecular diagnostics applications such as designing PCR primers to detect viruses, in which the viral RNA or DNA sample may contain SNVs. However, this nomenclature uses arbitrary distinctions (such as an allele frequency of 1%) and is not used consistently across all fields; the resulting disagreement has prompted calls for a more consistent framework for naming differences in DNA sequences between two samples.[13][14]
Single-nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or in the intergenic regions (regions between genes). SNPs within a coding sequence do not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code.[15]
SNPs in the coding region are of two types: synonymous SNPs and nonsynonymous SNPs. Synonymous SNPs do not affect the protein sequence, while nonsynonymous SNPs change the amino acid sequence of protein.[16]
SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding, messenger RNA degradation, or the sequence of noncoding RNA. Gene expression affected by this type of SNP is referred to as an eSNP (expression SNP) and may be upstream or downstream from the gene.
As there are for genes, bioinformatics databases exist for SNPs.
The International SNP Map working group mapped the sequence flanking each SNP by alignment to the genomic sequence of large-insert clones in Genebank. These alignments were converted to chromosomal coordinates that is shown in Table 1.[59] This list has greatly increased since, with, for instance, the Kaviar database now listing 162 million single nucleotide variants (SNVs).
The nomenclature for SNPs include several variations for an individual SNP, while lacking a common consensus.
The rs### standard is that which has been adopted by dbSNP and uses the prefix "rs", for "reference SNP", followed by a unique and arbitrary number.[60] SNPs are frequently referred to by their dbSNP rs number, as in the examples above.
The Human Genome Variation Society (HGVS) uses a standard which conveys more information about the SNP. Examples are:
SNP analysis[edit]
SNPs can be easily assayed due to only containing two possible alleles and three possible genotypes involving the two alleles: homozygous A, homozygous B and heterozygous AB, leading to many possible techniques for analysis. Some include: DNA sequencing; capillary electrophoresis; mass spectrometry; single-strand conformation polymorphism (SSCP); single base extension; electrochemical analysis; denaturating HPLC and gel electrophoresis; restriction fragment length polymorphism; and hybridization analysis.
An important group of SNPs are those that corresponds to missense mutations causing amino acid change on protein level. Point mutation of particular residue can have different effect on protein function (from no effect to complete disruption its function). Usually, change in amino acids with similar size and physico-chemical properties (e.g. substitution from leucine to valine) has mild effect, and opposite. Similarly, if SNP disrupts secondary structure elements (e.g. substitution to proline in alpha helix region) such mutation usually may affect whole protein structure and function. Using those simple and many other machine learning derived rules a group of programs for the prediction of SNP effect was developed:[65]