DNA sequencer

A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This is then reported as a text string, called a read. Some DNA sequencers can be also considered optical instruments as they analyze light signals originating from fluorochromes attached to nucleotides.

Manufacturers

Roche, Illumina, Life Technologies, Beckman Coulter, Pacific Biosciences, MGI/BGI, Oxford Nanopore Technologies

The first automated DNA sequencer, invented by Lloyd M. Smith, was introduced by Applied Biosystems in 1987.^[1] It used the Sanger sequencing method, a technology which formed the basis of the "first generation" of DNA sequencers^[2]^[3] and enabled the completion of the human genome project in 2001.^[4] This first generation of DNA sequencers are essentially automated electrophoresis systems that detect the migration of labelled DNA fragments. Therefore, these sequencers can also be used in the genotyping of genetic markers where only the length of a DNA fragment(s) needs to be determined (e.g. microsatellites, AFLPs).

The Human Genome Project spurred the development of cheaper, high throughput and more accurate platforms known as Next Generation Sequencers (NGS) to sequence the human genome. These include the 454, SOLiD and Illumina DNA sequencing platforms. Next generation sequencing machines have increased the rate of DNA sequencing substantially, as compared with the previous Sanger methods. DNA samples can be prepared automatically in as little as 90 mins,^[5] while a human genome can be sequenced at 15 times coverage in a matter of days.^[6]

More recent, third-generation DNA sequencers such as PacBio SMRT and Oxford Nanopore offer the possibility of sequencing long molecules, compared to short-read technologies such as Illumina SBS or MGI Tech's DNBSEQ.

Because of limitations in DNA sequencer technology, the reads of many of these technologies are short, compared to the length of a genome therefore the reads must be assembled into longer contigs.^[7] The data may also contain errors, caused by limitations in the DNA sequencing technique or by errors during PCR amplification. DNA sequencer manufacturers use a number of different methods to detect which DNA bases are present. The specific protocols applied in different sequencing platforms have an impact in the final data that is generated. Therefore, comparing data quality and cost across different technologies can be a daunting task. Each manufacturer provides their own ways to inform sequencing errors and scores. However, errors and scores between different platforms cannot always be compared directly. Since these systems rely on different DNA sequencing approaches, choosing the best DNA sequencer and method will typically depend on the experiment objectives and available budget.^[2]

History[edit]

The first DNA sequencing methods were developed by Gilbert (1973)^[8] and Sanger (1975).^[9] Gilbert introduced a sequencing method based on chemical modification of DNA followed by cleavage at specific bases whereas Sanger's technique is based on dideoxynucleotide chain termination. The Sanger method became popular due to its increased efficiency and low radioactivity. The first automated DNA sequencer was the AB370A, introduced in 1986 by Applied Biosystems. The AB370A was able to sequence 96 samples simultaneously, 500 kilobases per day, and reaching read lengths up to 600 bases. This was the beginning of the "first generation" of DNA sequencers,^[2]^[3] which implemented Sanger sequencing, fluorescent dideoxy nucleotides and polyacrylamide gel sandwiched between glass plates - slab gels. The next major advance was the release in 1995 of the AB310 which utilized a linear polymer in a capillary in place of the slab gel for DNA strand separation by electrophoresis. These techniques formed the base for the completion of the human genome project in 2001.^[4] The human genome project spurred the development of cheaper, high throughput and more accurate platforms known as Next Generation Sequencers (NGS). In 2005, 454 Life Sciences released the 454 sequencer, followed by Solexa Genome Analyzer and SOLiD (Supported Oligo Ligation Detection) by Agencourt in 2006. Applied Biosystems acquired Agencourt in 2006, and in 2007, Roche bought 454 Life Sciences, while Illumina purchased Solexa. Ion Torrent entered the market in 2010 and was acquired by Life Technologies (now Thermo Fisher Scientific). And BGI started manufacturing sequencers in China after acquiring Complete Genomics under their MGI arm. These are still the most common NGS systems due to their competitive cost, accuracy, and performance.

More recently, a third generation of DNA sequencers was introduced. The sequencing methods applied by these sequencers do not require DNA amplification (polymerase chain reaction – PCR), which speeds up the sample preparation before sequencing and reduces errors. In addition, sequencing data is collected from the reactions caused by the addition of nucleotides in the complementary strand in real time. Two companies introduced different approaches in their third-generation sequencers. Pacific Biosciences sequencers utilize a method called Single-molecule real-time (SMRT), where sequencing data is produced by light (captured by a camera) emitted when a nucleotide is added to the complementary strand by enzymes containing fluorescent dyes. Oxford Nanopore Technologies is another company developing third-generation sequencers using electronic systems based on nanopore sensing technologies.

GS Run Processor converts raw images generated by a sequencing run into intensity values. The process consists of two main steps: image processing and signal processing. The software also applies normalization, signal correction, base-calling and quality scores for individual reads. The software outputs data in Standard Flowgram Format (or SFF) files to be used in data analysis applications (GS De Novo Assembler, GS Reference Mapper or GS Amplicon Variant Analyzer).

[18]

GS De Novo Assembler is a tool for de novo assembly of whole-genomes up to 3GB in size from shotgun reads alone or combined with paired end data generated by 454 sequencers. It also supports de novo assembly of transcripts (including analysis), and also isoform variant detection.

[17]

GS Reference Mapper maps short reads to a reference genome, generating a consensus sequence. The software is able to generate output files for assessment, indicating insertions, deletions and SNPs. Can handle large and complex genomes of any size.

[17]

Finally, the GS Amplicon Variant Analyzer aligns reads from amplicon samples against a reference, identifying variants (linked or not) and their frequencies. It can also be used to detect unknown and low-frequency variants. It includes graphical tools for analysis of alignments.