Single-nucleotide polymorphism

In genetics and bioinformatics, a single-nucleotide polymorphism (SNP /snɪp/; plural SNPs /snɪps/) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population (e.g. 1% or more),^[1] many publications^[2]^[3]^[4] do not apply such a frequency threshold.

For example, a G nucleotide present at a specific location in a reference genome may be replaced by an A in a minority of individuals. The two possible nucleotide variations of this SNP – G or A – are called alleles.^[5]

SNPs can help explain differences in susceptibility to a wide range of diseases across a population. For example, a common SNP in the CFH gene is associated with increased risk of age-related macular degeneration.^[6] Differences in the severity of an illness or response to treatments may also be manifestations of genetic variations caused by SNPs. For example, two common SNPs in the APOE gene, rs429358 and rs7412, lead to three major APO-E alleles with different associated risks for development of Alzheimer's disease and age at onset of the disease.^[7]
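
The APOE case shows how two SNPs on the same chromosome combine into more than two alleles. The sketch below makes some assumptions. The input is phased, so both alleles come from one chromosome. Alleles are reported on the forward strand. The lookup uses the standard correspondence (ε2 = T/T, ε3 = T/C, ε4 = C/C at rs429358/rs7412), which the text does not spell out. The function names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical lookup: alleles at (rs429358, rs7412) on a single chromosome,
# reported on the forward strand, mapped to the major APOE allele they define.
APOE_HAPLOTYPES = {
    ("T", "T"): "ε2",
    ("T", "C"): "ε3",
    ("C", "C"): "ε4",
}


def apoe_allele(rs429358: str, rs7412: str) -> Optional[str]:
    """Return the APOE allele for one phased haplotype, or None if the
    combination is not one of the three major alleles."""
    return APOE_HAPLOTYPES.get((rs429358.upper(), rs7412.upper()))


def apoe_genotype(hap1: Tuple[str, str], hap2: Tuple[str, str]) -> Optional[str]:
    """Combine two phased haplotypes into a genotype label such as 'ε3/ε4'."""
    a1, a2 = apoe_allele(*hap1), apoe_allele(*hap2)
    if a1 is None or a2 is None:
        return None
    return "/".join(sorted((a1, a2)))


print(apoe_genotype(("T", "C"), ("C", "C")))  # ε3/ε4
```

Returning None for unlisted combinations keeps the sketch from silently mislabeling rare haplotypes.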

Single nucleotide substitutions with an allele frequency of less than 1% are sometimes called single-nucleotide variants (SNVs).^[8] "Variant" may also be used as a general term for any single nucleotide change in a DNA sequence,^[9] encompassing both common SNPs and rare mutations, whether germline or somatic.^[10]^[11] The term SNV has therefore been used to refer to point mutations found in cancer cells.^[12] DNA variants must also be taken into account in molecular diagnostics, for example when designing PCR primers to detect viruses, because the viral RNA or DNA in a sample may contain SNVs. However, this nomenclature relies on arbitrary distinctions (such as an allele frequency of 1%) and is not used consistently across fields; the resulting disagreement has prompted calls for a more consistent framework for naming differences in DNA sequences between two samples.^[13]^[14]
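
The 1% convention can be made concrete by computing the minor allele frequency from genotype counts at one site. This is a minimal sketch. It assumes a biallelic site and diploid genotypes written as two-character strings. The function names and the default threshold are illustrative. As noted above, the threshold itself is arbitrary and not applied consistently.

```python
from collections import Counter
from typing import Iterable


def minor_allele_frequency(genotypes: Iterable[str]) -> float:
    """Minor allele frequency at one biallelic site.

    Each genotype is a two-character string such as 'GG', 'GA' or 'AA'
    (one character per allele copy in a diploid individual).
    """
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # adds one count per allele character
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # monomorphic (or empty) site: no minor allele observed
    return min(counts.values()) / total


def label_variant(maf: float, threshold: float = 0.01) -> str:
    """Apply the (arbitrary) 1% convention: common -> 'SNP', rare -> 'SNV'."""
    return "SNP" if maf >= threshold else "SNV"


cohort = ["GG"] * 90 + ["GA"] * 10  # 200 allele copies, 10 of them A
maf = minor_allele_frequency(cohort)
print(maf, label_variant(maf))  # 0.05 SNP
```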

Single-nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or in the intergenic regions (regions between genes). SNPs within a coding sequence do not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code.^[15]

SNPs in the coding region are of two types: synonymous SNPs and nonsynonymous SNPs. Synonymous SNPs do not affect the protein sequence, while nonsynonymous SNPs change the amino acid sequence of the protein.^[16]

SNPs in non-coding regions can manifest in a higher risk of cancer,^[17] and may affect mRNA structure and disease susceptibility.^[18] Non-coding SNPs can also alter the level of expression of a gene, as an eQTL (expression quantitative trait locus).

SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding, messenger RNA degradation, or the sequence of noncoding RNA. A SNP that affects gene expression in this way is referred to as an eSNP (expression SNP) and may lie upstream or downstream of the gene.
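
Whether a coding SNP is synonymous or nonsynonymous depends on how the altered codon is translated. This is a minimal sketch. It assumes the standard genetic code (NCBI translation table 1) and that the affected codon and reading frame are already known. The function name is hypothetical. The split of nonsynonymous changes into missense, nonsense and stop-loss is standard terminology that the text above does not use.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order with the third position varying fastest. '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(codon: str, position: int, alt: str) -> str:
    """Classify a single-base change at `position` (0, 1 or 2) of a DNA codon."""
    codon, alt = codon.upper(), alt.upper()
    if (codon not in CODON_TABLE or len(alt) != 1 or alt not in BASES
            or position not in (0, 1, 2)):
        raise ValueError("expected a DNA codon, one base A/C/G/T and a position 0-2")
    mutant = codon[:position] + alt + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if ref_aa == "*":
        return "nonsynonymous (stop-loss)"
    if alt_aa == "*":
        return "nonsynonymous (nonsense)"
    return "nonsynonymous (missense)"


print(classify_coding_snp("CTG", 2, "A"))  # synonymous: CTG and CTA both encode Leu
print(classify_coding_snp("CTG", 0, "G"))  # nonsynonymous (missense): Leu -> Val
print(classify_coding_snp("GAA", 0, "T"))  # nonsynonymous (nonsense): Glu -> stop
```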

Association studies can determine whether a genetic variant is associated with a disease or trait.^[34]
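
The text does not say which statistical test is used. In a simple case-control design, one common choice is an allelic 2×2 chi-square test on allele counts in cases versus controls. This sketch assumes that test, with one degree of freedom and no continuity correction, together with the allelic odds ratio. Real association studies also handle covariates, population stratification and multiple testing, which this sketch omits. The function name and counts are hypothetical.

```python
import math
from typing import Tuple


def allelic_association(case_alt: int, case_ref: int,
                        ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float, float]:
    """Allelic 2x2 chi-square test (1 degree of freedom, no continuity
    correction) plus the allelic odds ratio.

    Inputs are allele counts (two per diploid individual); all must be > 0.
    Returns (chi2, p_value, odds_ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    if any(count <= 0 for row in table for count in row):
        raise ValueError("this sketch requires all four counts to be positive")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


chi2, p, odds = allelic_association(60, 40, 40, 60)
print(round(chi2, 2), round(p, 4), round(odds, 2))  # 8.0 0.0047 2.25
```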

A tag SNP is a representative single-nucleotide polymorphism in a region of the genome with high linkage disequilibrium (the non-random association of alleles at two or more loci). Tag SNPs are useful in whole-genome SNP association studies, in which hundreds of thousands of SNPs across the entire genome are genotyped.

Haplotype mapping: sets of alleles or DNA sequences can be clustered so that a single SNP can identify many linked SNPs.
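
The idea that one tag SNP can stand in for many linked SNPs can be illustrated with a simplified greedy heuristic. At each step it picks the SNP that captures the most not-yet-tagged SNPs at or above a pairwise r² cutoff. This is a sketch under stated assumptions, not the method of the HapMap project or of any specific tool. The r² values, the 0.8 default cutoff and the function name are illustrative.

```python
from typing import Dict, FrozenSet, List


def select_tag_snps(snps: List[str], r2: Dict[FrozenSet[str], float],
                    threshold: float = 0.8) -> List[str]:
    """Greedy pairwise tag-SNP selection.

    r2 maps frozenset({a, b}) to the r^2 between SNPs a and b; missing pairs
    are treated as r^2 = 0. Every SNP is considered to tag itself.
    """
    def covered_by(tag: str) -> set:
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP capturing the most still-untagged SNPs;
        # max() keeps the first candidate in input order on ties.
        best = max(snps, key=lambda s: len(covered_by(s) & untagged))
        tags.append(best)
        untagged -= covered_by(best)
    return tags


pairs = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.2,
    frozenset(("B", "C")): 0.1,
}
print(select_tag_snps(["A", "B", "C"], pairs))  # ['A', 'C']
```

Genotyping only the returned tags then captures, through LD, the alleles of the SNPs they tag.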

Linkage disequilibrium (LD), a term used in population genetics, indicates non-random association of alleles at two or more loci, not necessarily on the same chromosome. It refers to the phenomenon that SNP alleles or DNA sequences that are close together in the genome tend to be inherited together. LD can be affected by two parameters (among other factors, such as population stratification): 1) the distance between the SNPs (the larger the distance, the lower the LD); and 2) the recombination rate (the lower the recombination rate, the higher the LD).^[35]
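
LD is commonly quantified with the measures D, D′ (Lewontin's normalized D) and r², which the text above does not define. This is a minimal sketch. It assumes two biallelic loci with known haplotype frequencies; estimating those frequencies from unphased genotypes is a separate problem. The function name is hypothetical, and D′ is returned as an absolute value, as many tools report it.

```python
from typing import Tuple


def ld_measures(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, |D'|, r2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(tuple(round(x, 3) for x in ld_measures(0.3, 0.5, 0.4)))  # (0.1, 0.5, 0.167)
```

Repeated recombination over generations pushes p_ab toward p_a·p_b, so all three measures shrink toward zero. This is one way to see why LD is lower between distant SNPs.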

In genetic epidemiology, SNPs are used to estimate transmission clusters.^[36]
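
One common approach, assumed here rather than taken from the cited work, compares pathogen isolates by the number of SNPs at which they differ. Isolates whose distance is at or below a cutoff are then linked. This minimal sketch uses single-linkage clustering via union-find. The isolate names, aligned SNP strings and threshold are hypothetical; real studies choose pathogen-specific cutoffs.

```python
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Number of positions at which two aligned SNP strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def transmission_clusters(isolates: Dict[str, str], threshold: int) -> List[List[str]]:
    """Single-linkage clusters: isolates within `threshold` SNPs of each other,
    directly or through a chain of such links, share a cluster."""
    names = list(isolates)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if snp_distance(isolates[a], isolates[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


isolates = {"iso1": "ACGTACGT", "iso2": "ACGTACGA",
            "iso3": "TTTTACGA", "iso4": "ACGAACGA"}
print(transmission_clusters(isolates, threshold=1))  # [['iso1', 'iso2', 'iso4'], ['iso3']]
```

Single linkage chains iso1 and iso4 together through iso2, even though iso1 and iso4 differ at two SNPs.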

rs6311 and rs6313 are SNPs in the serotonin 5-HT2A receptor gene on human chromosome 13.^[48]

The SNP −3279C/A (rs3761548) is among the SNPs located in the promoter region of the Foxp3 gene and might be involved in cancer progression.^[49]

A SNP in the F5 gene causes Factor V Leiden thrombophilia.^[50]

rs3091244 is an example of a triallelic SNP in the CRP gene on human chromosome 1.^[51]

TAS2R38 codes for PTC tasting ability and contains 6 annotated SNPs.^[52]

rs148649884 and rs138055828 in the FCN1 gene encoding M-ficolin crippled the ligand-binding capability of the recombinant M-ficolin.^[53]

An SNP in DNA mismatch repair gene PMS2 (rs1059060, Ser775Asn) is associated with increased sperm DNA damage and risk of male infertility.^[54]

As with genes, bioinformatics databases exist for SNPs.

dbSNP is a SNP database from the National Center for Biotechnology Information (NCBI). As of June 8, 2015, dbSNP listed 149,735,377 SNPs in humans.^[55]^[56]

Kaviar^[57] is a compendium of SNPs from multiple data sources including dbSNP.

SNPedia is a wiki-style database supporting personal genome annotation, interpretation and analysis.

The OMIM database describes the association between polymorphisms and diseases (e.g., describing diseases in text form).

dbSAP is a single amino-acid polymorphism database for protein variation detection.^[58]

The Human Gene Mutation Database provides gene mutations causing or associated with human inherited diseases, as well as functional SNPs.

In the International HapMap Project, researchers identify tag SNPs in order to determine the collection of haplotypes present in each subject.

GWAS Central allows users to visually interrogate the actual summary-level association data in one or more genome-wide association studies.

The International SNP Map working group mapped the sequence flanking each SNP by alignment to the genomic sequence of large-insert clones in GenBank. These alignments were converted to chromosomal coordinates, which are shown in Table 1.^[59] The number of catalogued SNPs has greatly increased since then; for instance, the Kaviar database now lists 162 million single-nucleotide variants (SNVs).

SNP nomenclature includes several naming variations for an individual SNP and lacks a common consensus.

The rs### standard, adopted by dbSNP, uses the prefix "rs", for "reference SNP", followed by a unique and arbitrary number.^[60] SNPs are frequently referred to by their dbSNP rs number, as in the examples above.

The Human Genome Variation Society (HGVS) uses a standard which conveys more information about the SNP. Examples are:

c.76A>T: "c." for coding region, followed by a number for the position of the nucleotide, followed by a one-letter abbreviation for the nucleotide (A, C, G, T or U), followed by a greater-than sign (">") to indicate substitution, followed by the abbreviation of the nucleotide which replaces the former.^[61]^[62]^[63]

p.Ser123Arg: "p." for protein, followed by a three-letter abbreviation for the amino acid, followed by a number for the position of the amino acid, followed by the abbreviation of the amino acid which replaces the former.^[64]
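
The rs identifiers and the two simple HGVS substitution forms described in this section are regular enough to parse with regular expressions. This minimal sketch covers only those three patterns. Full HGVS syntax is far richer and is not handled. The function and field names are hypothetical.

```python
import re
from typing import Optional

RS_ID = re.compile(r"rs(\d+)")
HGVS_CODING_SUB = re.compile(r"c\.(\d+)([ACGTU])>([ACGTU])")
HGVS_PROTEIN_SUB = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def parse_variant_name(name: str) -> Optional[dict]:
    """Parse an rs ID or one of the two simple HGVS substitution forms.

    Anything else (including most real HGVS syntax) returns None.
    """
    m = RS_ID.fullmatch(name)
    if m:
        return {"type": "dbSNP", "rs": int(m.group(1))}
    m = HGVS_CODING_SUB.fullmatch(name)
    if m:
        return {"type": "coding", "position": int(m.group(1)),
                "ref": m.group(2), "alt": m.group(3)}
    m = HGVS_PROTEIN_SUB.fullmatch(name)
    if m:
        return {"type": "protein", "position": int(m.group(2)),
                "ref": m.group(1), "alt": m.group(3)}
    return None


print(parse_variant_name("c.76A>T"))
print(parse_variant_name("p.Ser123Arg"))
```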

SNP analysis

Because a SNP usually has only two possible alleles, and therefore three possible genotypes (homozygous A, homozygous B and heterozygous AB), SNPs are relatively easy to assay, and many analysis techniques are available. These include DNA sequencing; capillary electrophoresis; mass spectrometry; single-strand conformation polymorphism (SSCP); single base extension; electrochemical analysis; denaturing HPLC and gel electrophoresis; restriction fragment length polymorphism (RFLP); and hybridization analysis.
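
Restriction fragment length polymorphism analysis works when one allele of a SNP creates or abolishes a restriction enzyme recognition site. The two alleles then give different fragment sizes on a gel. This minimal sketch uses a short hypothetical amplicon and the EcoRI site GAATTC, which is cut after the G. The sequences and function name are illustrative, and only one strand is scanned, which suffices for palindromic sites like this one.

```python
from typing import List


def restriction_fragments(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting `seq` at every occurrence of `site`.

    cut_offset is the number of bases from the start of the recognition site
    to the cut on the given strand. Only this strand is scanned, which is
    sufficient for palindromic sites such as EcoRI's GAATTC.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical amplicon: the alternate allele (T -> A) destroys an EcoRI site (G^AATTC).
ref_allele = "AAGAATTCAA"
alt_allele = "AAGAATACAA"
print(restriction_fragments(ref_allele, "GAATTC", 1))  # [3, 7]
print(restriction_fragments(alt_allele, "GAATTC", 1))  # [10]
```

In this example the three genotypes would give distinct band patterns. A homozygous reference sample would show the 3- and 7-base fragments, and a homozygous alternate sample only the uncut 10-base product. A heterozygote would show all three.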

An important group of SNPs are those that correspond to missense mutations, which change an amino acid at the protein level. A point mutation at a particular residue can have different effects on protein function, from no effect to complete disruption of its function. Usually, a change between amino acids of similar size and physicochemical properties (e.g. substitution of leucine by valine) has a mild effect, while a change between dissimilar amino acids tends to have a stronger one. Similarly, if a SNP disrupts secondary structure elements (e.g. substitution to proline in an alpha-helix region), the mutation may affect the structure and function of the whole protein. Using these simple rules and many other machine-learning-derived rules, a group of programs for predicting SNP effects has been developed:^[65]

SIFT: This program provides insight into how a laboratory-induced missense or nonsynonymous mutation will affect protein function based on physical properties of the amino acid and sequence homology.

LIST^[66]^[67] (Local Identity and Shared Taxa) estimates the potential deleteriousness of mutations that alter protein function. It is based on the assumption that variations observed in closely related species are more significant when assessing conservation than those in distantly related species.

SNAP2

SuSPect

PolyPhen-2

PredictSNP

MutationTaster

Variant Effect Predictor from the Ensembl project

SNPViz: This program provides a 3D representation of the protein affected, highlighting the amino acid change so doctors can determine pathogenicity of the mutant protein.^[68]

PROVEAN

PhyreRisk is a database which maps variants to experimental and predicted protein structures.^[69]

Missense3D is a tool which provides a stereochemical report on the effect of missense variants on protein structure.^[70]
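
The two rules of thumb described above can be expressed as a toy classifier. Substitutions between amino acids of similar size and properties are usually mild, while proline introduced into a helix can be disruptive. This is an illustrative sketch only. The amino-acid groupings are a simplification chosen here, and the function names are hypothetical. It does not reproduce the method of SIFT, PolyPhen-2 or any other listed program.

```python
# Toy physicochemical groups; a simplification chosen for this illustration.
AA_GROUPS = {
    "aliphatic": set("AVLIM"),
    "aromatic": set("FWY"),
    "polar": set("STNQC"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "glycine": {"G"},
    "proline": {"P"},
}


def group_of(aa: str) -> str:
    """Return the toy group name for a one-letter amino-acid code."""
    for name, members in AA_GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown one-letter amino acid code: {aa!r}")


def toy_missense_effect(ref: str, alt: str, in_helix: bool = False) -> str:
    """Apply two rules of thumb to a one-letter amino-acid substitution."""
    ref, alt = ref.upper(), alt.upper()
    ref_group, alt_group = group_of(ref), group_of(alt)  # also validates input
    if ref == alt:
        return "no change"
    if alt == "P" and in_helix:
        return "possibly damaging (proline introduced into a helix)"
    if ref_group == alt_group:
        return "likely mild (similar properties)"
    return "possibly damaging (dissimilar properties)"


print(toy_missense_effect("L", "V"))                 # likely mild (similar properties)
print(toy_missense_effect("L", "P", in_helix=True))  # possibly damaging (proline introduced into a helix)
```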

NCBI resources – Introduction to SNPs from NCBI

The SNP Consortium LTD – SNP search

NCBI dbSNP database – "a central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms"

HGMD – the Human Gene Mutation Database, includes rare mutations and functional SNPs

GWAS Central – a central database of summary-level genetic association findings

1000 Genomes Project – A Deep Catalog of Human Genetic Variation

WatCut – an online tool for the design of SNP-RFLP assays

SNPStats – a web tool for analysis of genetic association studies

Restriction HomePage – a set of tools for DNA restriction and SNP detection, including design of mutagenic primers

American Association for Cancer Research Cancer Concepts Factsheet on SNPs

PharmGKB – The Pharmacogenetics and Pharmacogenomics Knowledge Base, a resource for SNPs associated with drug response and disease outcomes

GEN-SNiP – an online tool that identifies polymorphisms in test DNA sequences

Rules for Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat

HGNC Guidelines for Human Gene Nomenclature

SNP effect predictor with galaxy integration

Open SNP – a portal for sharing personal SNP test results

– SNP database for protein variation detection