Discrete cosine transform
A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression. It is used in most digital media, including digital images (such as JPEG and HEIF), digital video (such as MPEG and H.26x), digital audio (such as Dolby Digital, MP3 and AAC), digital television (such as SDTV, HDTV and VOD), digital radio (such as AAC+ and DAB+), and speech coding (such as AAC-LD, Siren and Opus). DCTs are also important to numerous other applications in science and engineering, such as digital signal processing, telecommunication devices, reducing network bandwidth usage, and spectral methods for the numerical solution of partial differential equations.
A DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. The DCTs are generally related to Fourier series coefficients of a periodically and symmetrically extended sequence whereas DFTs are related to Fourier series coefficients of only periodically extended sequences. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), whereas in some variants the input or output data are shifted by half a sample.
There are eight standard DCT variants, of which four are common.
The most common variant of discrete cosine transform is the type-II DCT, which is often called simply the DCT. This was the original DCT as first proposed by Ahmed. Its inverse, the type-III DCT, is correspondingly often called simply the inverse DCT or the IDCT. Two related transforms are the discrete sine transform (DST), which is equivalent to a DFT of real and odd functions, and the modified discrete cosine transform (MDCT), which is based on a DCT of overlapping data. Multidimensional DCTs (MD DCTs) are developed to extend the concept of DCT to multidimensional signals. A variety of fast algorithms have been developed to reduce the computational complexity of implementing DCT. One of these is the integer DCT (IntDCT),[1] an integer approximation of the standard DCT,[2]: ix, xiii, 1, 141–304 used in several ISO/IEC and ITU-T international standards.[1][2]
DCT compression, also known as block compression, compresses data in sets of discrete DCT blocks.[3] DCT blocks sizes including 8x8 pixels for the standard DCT, and varied integer DCT sizes between 4x4 and 32x32 pixels.[1][4] The DCT has a strong energy compaction property,[5][6] capable of achieving high quality at high data compression ratios.[7][8] However, blocky compression artifacts can appear when heavy DCT compression is applied.
History[edit]
The DCT was first conceived by Nasir Ahmed, T. Natarajan and K. R. Rao while working at Kansas State University. The concept was proposed to the National Science Foundation in 1972. The DCT was originally intended for image compression.[9][1] Ahmed developed a practical DCT algorithm with his PhD students T. Raj Natarajan, Wills Dietrich, and Jeremy Fries, and his friend Dr. K. R. Rao at the University of Texas at Arlington in 1973.[9] They presented their results in a January 1974 paper, titled Discrete Cosine Transform.[5][6][10] It described what is now called the type-II DCT (DCT-II),[2]: 51 as well as the type-III inverse DCT (IDCT).[5]
Since its introduction in 1974, there has been significant research on the DCT.[10] In 1977, Wen-Hsiung Chen published a paper with C. Harrison Smith and Stanley C. Fralick presenting a fast DCT algorithm.[11][10] Further developments include a 1978 paper by M. J. Narasimha and A. M. Peterson, and a 1984 paper by B. G. Lee.[10] These research papers, along with the original 1974 Ahmed paper and the 1977 Chen paper, were cited by the Joint Photographic Experts Group as the basis for JPEG's lossy image compression algorithm in 1992.[10][12]
The discrete sine transform (DST) was derived from the DCT, by replacing the Neumann condition at x=0 with a Dirichlet condition.[2]: 35-36 The DST was described in the 1974 DCT paper by Ahmed, Natarajan and Rao.[5] A type-I DST (DST-I) was later described by Anil K. Jain in 1976, and a type-II DST (DST-II) was then described by H.B. Kekra and J.K. Solanka in 1978.[13]
In 1975, John A. Roese and Guner S. Robinson adapted the DCT for inter-frame motion-compensated video coding. They experimented with the DCT and the fast Fourier transform (FFT), developing inter-frame hybrid coders for both, and found that the DCT is the most efficient due to its reduced complexity, capable of compressing image data down to 0.25-bit per pixel for a videotelephone scene with image quality comparable to an intra-frame coder requiring 2-bit per pixel.[14][15] In 1979, Anil K. Jain and Jaswant R. Jain further developed motion-compensated DCT video compression,[16][17] also called block motion compensation.[17] This led to Chen developing a practical video compression algorithm, called motion-compensated DCT or adaptive scene coding, in 1981.[17] Motion-compensated DCT later became the standard coding technique for video compression from the late 1980s onwards.[18][19]
A DCT variant, the modified discrete cosine transform (MDCT), was developed by John P. Princen, A.W. Johnson and Alan B. Bradley at the University of Surrey in 1987,[20] following earlier work by Princen and Bradley in 1986.[21] The MDCT is used in most modern audio compression formats, such as Dolby Digital (AC-3),[22][23] MP3 (which uses a hybrid DCT-FFT algorithm),[24] Advanced Audio Coding (AAC),[25] and Vorbis (Ogg).[26]
Nasir Ahmed also developed a lossless DCT algorithm with Giridhar Mandyam and Neeraj Magotra at the University of New Mexico in 1995. This allows the DCT technique to be used for lossless compression of images. It is a modification of the original DCT algorithm, and incorporates elements of inverse DCT and delta modulation. It is a more effective lossless compression algorithm than entropy coding.[27] Lossless DCT is also known as LDCT.[28]
Inverse transforms[edit]
Using the normalization conventions above, the inverse of DCT-I is DCT-I multiplied by 2/(N − 1). The inverse of DCT-IV is DCT-IV multiplied by 2/N. The inverse of DCT-II is DCT-III multiplied by 2/N and vice versa.[6]
Like for the DFT, the normalization factor in front of these transform definitions is merely a convention and differs between treatments. For example, some authors multiply the transforms by so that the inverse does not require any additional multiplicative factor. Combined with appropriate factors of √2 (see above), this can be used to make the transform matrix orthogonal.
Computation[edit]
Although the direct application of these formulas would require operations, it is possible to compute the same thing with only complexity by factorizing the computation similarly to the fast Fourier transform (FFT). One can also compute DCTs via FFTs combined with pre- and post-processing steps. In general, methods to compute DCTs are known as fast cosine transform (FCT) algorithms.
The most efficient algorithms, in principle, are usually those that are specialized directly for the DCT, as opposed to using an ordinary FFT plus extra operations (see below for an exception). However, even "specialized" DCT algorithms (including all of those that achieve the lowest known arithmetic counts, at least for power-of-two sizes) are typically closely related to FFT algorithms – since DCTs are essentially DFTs of real-even data, one can design a fast DCT algorithm by taking an FFT and eliminating the redundant operations due to this symmetry. This can even be done automatically (Frigo & Johnson 2005). Algorithms based on the Cooley–Tukey FFT algorithm are most common, but any other FFT algorithm is also applicable. For example, the Winograd FFT algorithm leads to minimal-multiplication algorithms for the DFT, albeit generally at the cost of more additions, and a similar algorithm was proposed by (Feig & Winograd 1992a) for the DCT. Because the algorithms for DFTs, DCTs, and similar transforms are all so closely related, any improvement in algorithms for one transform will theoretically lead to immediate gains for the other transforms as well (Duhamel & Vetterli 1990).
While DCT algorithms that employ an unmodified FFT often have some theoretical overhead compared to the best specialized DCT algorithms, the former also have a distinct advantage: Highly optimized FFT programs are widely available. Thus, in practice, it is often easier to obtain high performance for general lengths N with FFT-based algorithms.[a]
Specialized DCT algorithms, on the other hand, see widespread use for transforms of small, fixed sizes such as the 8 × 8 DCT-II used in JPEG compression, or the small DCTs (or MDCTs) typically used in audio compression. (Reduced code size may also be a reason to use a specialized DCT for embedded-device applications.)
In fact, even the DCT algorithms using an ordinary FFT are sometimes equivalent to pruning the redundant operations from a larger FFT of real-symmetric data, and they can even be optimal from the perspective of arithmetic counts. For example, a type-II DCT is equivalent to a DFT of size with real-even symmetry whose even-indexed elements are zero. One of the most common methods for computing this via an FFT (e.g. the method used in FFTPACK and FFTW) was described by Narasimha & Peterson (1978) and Makhoul (1980), and this method in hindsight can be seen as one step of a radix-4 decimation-in-time Cooley–Tukey algorithm applied to the "logical" real-even DFT corresponding to the DCT-II.[b]
Because the even-indexed elements are zero, this radix-4 step is exactly the same as a split-radix step. If the subsequent size real-data FFT is also performed by a real-data split-radix algorithm (as in Sorensen et al. (1987)), then the resulting algorithm actually matches what was long the lowest published arithmetic count for the power-of-two DCT-II ( real-arithmetic operations[c]).
A recent reduction in the operation count to also uses a real-data FFT.[110] So, there is nothing intrinsically bad about computing the DCT via an FFT from an arithmetic perspective – it is sometimes merely a question of whether the corresponding FFT algorithm is optimal. (As a practical matter, the function-call overhead in invoking a separate FFT routine might be significant for small but this is an implementation rather than an algorithmic question since it can be solved by unrolling or inlining.)