Error detection and correction

In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communication channels. Many communication channels are subject to channel noise, and thus errors may be introduced during transmission from the source to a receiver. Error detection techniques allow detecting such errors, while error correction enables reconstruction of the original data in many cases.


Definitions

Error detection is the detection of errors caused by noise or other impairments during transmission from the transmitter to the receiver.

Error correction is the detection of errors and reconstruction of the original, error-free data.

History

In classical antiquity, copyists of the Hebrew Bible were paid for their work according to the number of stichs (lines of verse). As the prose books of the Bible were hardly ever written in stichs, the copyists, in order to estimate the amount of work, had to count the letters.^[1] This also helped ensure accuracy in the transmission of the text with the production of subsequent copies.^[2]^[3] Between the 7th and 10th centuries CE a group of Jewish scribes formalized and expanded this to create the Numerical Masorah to ensure accurate reproduction of the sacred text. It included counts of the number of words in a line, section, book and groups of books, noting the middle stich of a book, word use statistics, and commentary.^[1] Standards became such that a deviation in even a single letter in a Torah scroll was considered unacceptable.^[4] The effectiveness of their error correction method was verified by the accuracy of copying through the centuries demonstrated by discovery of the Dead Sea Scrolls in 1947–1956, dating from c. 150 BCE-75 CE.^[5]

The modern development of error correction codes is credited to Richard Hamming in 1947.^[6] A description of Hamming's code appeared in Claude Shannon's A Mathematical Theory of Communication^[7] and was quickly generalized by Marcel J. E. Golay.^[8]

Principles

All error-detection and correction schemes add some redundancy (i.e., some extra data) to a message, which receivers can use to check consistency of the delivered message and to recover data that has been determined to be corrupted. Error detection and correction schemes can be either systematic or non-systematic. In a systematic scheme, the transmitter sends the original (error-free) data and attaches a fixed number of check bits (or parity data), which are derived from the data bits by some encoding algorithm. If error detection is required, a receiver can simply apply the same algorithm to the received data bits and compare its output with the received check bits; if the values do not match, an error has occurred at some point during the transmission. If error correction is required, a receiver can apply the decoding algorithm to the received data bits and the received check bits to recover the original error-free data. In a system that uses a non-systematic code, the original message is transformed into an encoded message carrying the same information and that has at least as many bits as the original message.
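As a minimal sketch of a systematic scheme, the following uses a single even-parity bit as the check data. The function names `encode` and `check` are illustrative only, and a single parity bit is just one simple choice of encoding algorithm: it detects any odd number of bit errors but cannot correct them.

```python
def encode(data_bits):
    """Systematic encoding: the original bits followed by one even-parity check bit."""
    parity = sum(data_bits) % 2
    return list(data_bits) + [parity]


def check(received):
    """Recompute the check bit from the received data bits and compare it
    with the received check bit. False means an error was detected."""
    data_bits, received_parity = received[:-1], received[-1]
    return sum(data_bits) % 2 == received_parity


codeword = encode([1, 0, 1, 1])
print(codeword)  # [1, 0, 1, 1, 1]

corrupted = codeword.copy()
corrupted[2] ^= 1  # flip one bit "in transit"
print(check(codeword), check(corrupted))  # True False
```

Because the data bits appear unchanged at the front of the codeword, a receiver that only needs the payload can read it directly once the check passes.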

Good error control performance requires the scheme to be selected based on the characteristics of the communication channel. Common channel models include memoryless models where errors occur randomly and with a certain probability, and dynamic models where errors occur primarily in bursts. Consequently, error-detecting and correcting codes can be generally distinguished between random-error-detecting/correcting and burst-error-detecting/correcting. Some codes can also be suitable for a mixture of random errors and burst errors.

If the channel characteristics cannot be determined, or are highly variable, an error-detection scheme may be combined with a system for retransmissions of erroneous data. This is known as automatic repeat request (ARQ), and is most notably used in the Internet. An alternate approach for error control is hybrid automatic repeat request (HARQ), which is a combination of ARQ and error-correction coding.
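The retransmission idea can be sketched as a stop-and-wait loop. This is an illustration under stated assumptions, not a real protocol implementation: `send_with_arq` and `flaky_channel` are hypothetical names, the channel is simulated in-process, and CRC-32 (via Python's standard `zlib.crc32`) stands in for whatever error-detection scheme a real link would use.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload as check data."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_ok(frame: bytes) -> bool:
    """Receiver-side check: recompute the CRC and compare."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs


def send_with_arq(payload, channel, max_tries=5):
    """Stop-and-wait ARQ: retransmit until the receiver's check passes."""
    frame = make_frame(payload)
    for attempt in range(1, max_tries + 1):
        received = channel(frame)
        if frame_ok(received):  # a real receiver would acknowledge here
            return received[:-4], attempt
        # otherwise no acknowledgement arrives and the sender tries again
    raise RuntimeError("gave up after max_tries transmissions")


def flaky_channel(corrupt_first_n):
    """Hypothetical channel that flips one bit in the first n transmissions."""
    state = {"sent": 0}

    def channel(frame):
        state["sent"] += 1
        if state["sent"] <= corrupt_first_n:
            damaged = bytearray(frame)
            damaged[0] ^= 0x01
            return bytes(damaged)
        return frame

    return channel


data, tries = send_with_arq(b"hello", flaky_channel(2))
print(data, tries)  # b'hello' 3
```

The sender needs no knowledge of the channel's error statistics; it only needs the receiver to detect a bad frame, which is why ARQ suits channels whose characteristics are unknown or variable.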

Convolutional codes are processed on a bit-by-bit basis. They are particularly suitable for implementation in hardware, and the Viterbi decoder allows optimal decoding.
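To make the bit-by-bit processing concrete, here is a small sketch of a rate-1/2 convolutional encoder together with a hard-decision Viterbi decoder. The specific code (constraint length 3, generator polynomials 7 and 5 in octal) is a common textbook choice assumed here for illustration; the source does not name a particular code, and the function names are hypothetical.

```python
G = (0b111, 0b101)  # generator polynomials (7, 5 octal), an assumed example code


def parity(x):
    return bin(x).count("1") % 2


def step(state, bit):
    """Shift one input bit into the 2-bit state; return (next_state, output_pair)."""
    reg = (bit << 2) | state
    out = tuple(parity(reg & g) for g in G)
    return reg >> 1, out


def conv_encode(bits):
    """Emit two output bits per input bit; two zero tail bits flush the register."""
    state, out = 0, []
    for b in list(bits) + [0, 0]:
        state, pair = step(state, b)
        out.extend(pair)
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi: keep, per state, the path with the fewest bit mismatches."""
    metrics, paths = {0: 0}, {0: []}  # the encoder starts in state 0
    for i in range(0, len(received), 2):
        pair = tuple(received[i:i + 2])
        new_metrics, new_paths = {}, {}
        for state, metric in metrics.items():
            for bit in (0, 1):
                nxt, out = step(state, bit)
                cost = metric + sum(a != b for a, b in zip(out, pair))
                if cost < new_metrics.get(nxt, float("inf")):
                    new_metrics[nxt] = cost
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    return paths[0][:-2]  # the tail bits force the final state to 0; drop them


message = [1, 0, 1, 1, 0, 0, 1]
noisy = conv_encode(message)
noisy[3] ^= 1   # two channel errors
noisy[10] ^= 1
print(viterbi_decode(noisy) == message)  # True
```

The decoder's work per received pair is fixed (four states, two branches each), which is part of why this structure maps well onto hardware.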

Block codes are processed on a block-by-block basis. Early examples of block codes are repetition codes, Hamming codes and multidimensional parity-check codes. They were followed by a number of efficient codes, Reed–Solomon codes being the most notable due to their current widespread use. Turbo codes and low-density parity-check codes (LDPC) are relatively new constructions that can provide almost optimal efficiency.
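As an example of block-by-block processing, the sketch below implements the Hamming(7,4) code, which maps each 4-bit block to a 7-bit codeword and corrects any single bit error within that block. The bit layout (parity bits at positions 1, 2 and 4) is the conventional one; the function names are illustrative.

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Return (data_bits, syndrome). A non-zero syndrome is the 1-based
    position of the single bit that was flipped, which is corrected."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


codeword = hamming74_encode([1, 0, 1, 1])
print(codeword)  # [0, 1, 1, 0, 0, 1, 1]

damaged = codeword.copy()
damaged[4] ^= 1  # flip position 5
print(hamming74_decode(damaged))  # ([1, 0, 1, 1], 5)
```

Each block is decoded independently of its neighbours, in contrast to a convolutional code, where every output bit depends on a sliding window of earlier input bits.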

Each Ethernet frame uses CRC-32 error detection. Frames with detected errors are discarded by the receiver hardware.
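The following sketch computes CRC-32 bit by bit, using the reflected form of the polynomial (0xEDB88320) together with the usual initial value and final inversion, and then mimics a receiver that drops frames whose check fails. The helper names `add_fcs` and `receive` are hypothetical, the "frame" is just a byte string rather than a real Ethernet frame layout, and the little-endian placement of the check value is an assumption of this sketch.

```python
def crc32_bitwise(data: bytes) -> int:
    """Bitwise CRC-32 with reflected polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def add_fcs(frame: bytes) -> bytes:
    """Sender: append the 4-byte check value (byte order assumed for this sketch)."""
    return frame + crc32_bitwise(frame).to_bytes(4, "little")


def receive(frame_with_fcs: bytes):
    """Receiver: return the frame body, or None if the frame is discarded."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    if crc32_bitwise(body).to_bytes(4, "little") != fcs:
        return None
    return body


print(hex(crc32_bitwise(b"123456789")))  # 0xcbf43926

sent = add_fcs(b"payload bytes")
damaged = bytes([sent[0] ^ 0x80]) + sent[1:]
print(receive(sent), receive(damaged))  # b'payload bytes' None
```

Real hardware computes the same function with a shift register as the bits arrive, so the check adds no per-frame software cost.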

The IPv4 header contains a checksum protecting the contents of the header. Packets with incorrect checksums are dropped within the network or at the receiver.
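The header checksum is a 16-bit ones' complement sum of the header's 16-bit words, as described in RFC 1071. The sketch below computes it for an example 20-byte header; the header bytes are made-up sample values, and `internet_checksum` is an illustrative name.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = internet_checksum(header)
print(hex(checksum))  # 0xb861

# A receiver or router sums the header including the checksum; 0 means "consistent".
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))  # 0
```

Because the time-to-live field changes at every hop, each router has to update this checksum before forwarding the packet.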

The checksum was omitted from the IPv6 header in order to minimize processing costs in network routing and because current link layer technology is assumed to provide sufficient error detection (see also RFC 3819).

UDP has an optional checksum covering the payload and addressing information in the UDP and IP headers. Packets with incorrect checksums are discarded by the network stack. The checksum is optional under IPv4, and required under IPv6. When omitted, it is assumed the data-link layer provides the desired level of error protection.
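The "addressing information in the IP header" enters the UDP checksum through a pseudo-header that is prepended for the calculation only and never transmitted. A rough sketch for UDP over IPv4 follows; the addresses, ports and payload are made-up values and the function names are illustrative.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def build_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    csum = csum or 0xFFFF  # under IPv4 a transmitted value of 0 means "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload


def verify(src_ip, dst_ip, datagram: bytes) -> bool:
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(datagram)) + datagram) == 0


dgram = build_datagram("192.168.0.1", "192.168.0.199", 5000, 53, b"example")
print(verify("192.168.0.1", "192.168.0.199", dgram))  # True
print(verify("192.168.0.1", "192.168.0.200", dgram))  # False
```

The second check fails even though the datagram bytes are intact, which shows how the pseudo-header lets the receiver detect a datagram delivered to the wrong address.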

TCP provides a checksum for protecting the payload and addressing information in the TCP and IP headers. Packets with incorrect checksums are discarded by the network stack and eventually get retransmitted using ARQ, either explicitly (such as through three-way handshake) or implicitly due to a timeout.

Berger code

Burst error-correcting code

ECC memory, a type of computer data storage

Link adaptation

List of algorithms § Error detection and correction

List of hash functions

Shu Lin; Daniel J. Costello, Jr. (1983). Error Control Coding: Fundamentals and Applications. Prentice Hall. ISBN 0-13-283796-X.

SoftECC: A System for Software Memory Integrity Checking

A Tunable, Software-based DRAM Error Detection and Correction Library for HPC

Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing

The on-line textbook: Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay, contains chapters on elementary error-correcting codes; on the theoretical limits of error-correction; and on the latest state-of-the-art error-correcting codes, including low-density parity-check codes, turbo codes, and fountain codes.

Implementations of popular ECC encoding and decoding routines