Advanced Audio Coding

Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. It was designed to be the successor of the MP3 format and generally achieves higher sound quality than MP3 at the same bit rate.[4]

Filename extension: varies by container (MPEG/3GPP container, Apple container, ADTS stream)

Internet media type: audio/aac, audio/aacp, audio/3gpp, audio/3gpp2, audio/mp4, audio/mp4a-latm, audio/mpeg4-generic

Initial release: December 1997 (1997-12)[2]

Latest release: ISO/IEC 14496-3:2019, December 2019 (2019-12)

Contained by: MPEG-4 Part 14, 3GP and 3G2, ISO base media file format and Audio Data Interchange Format (ADIF)

AAC has been standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications.[5][6] HE-AAC ("AAC+"), an extension of AAC, is part of MPEG-4 Audio and has been adopted into the digital radio standards DAB+ and Digital Radio Mondiale, and into the mobile television standards DVB-H and ATSC-M/H.


AAC supports up to 48 full-bandwidth (up to 96 kHz) audio channels in one stream, plus up to 16 low-frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels, and up to 16 data streams. For stereo, 96 kbit/s in joint stereo mode gives quality sufficient for modest requirements; hi-fi transparency, however, demands data rates of at least 128 kbit/s (VBR). Tests of MPEG-4 audio have shown that AAC meets the ITU's requirements for "transparent" quality at 128 kbit/s for stereo and at 384 kbit/s for 5.1 audio.[7] AAC uses only a modified discrete cosine transform (MDCT) algorithm, giving it higher compression efficiency than MP3, which uses a hybrid coding algorithm that is part MDCT and part FFT.[4]
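The transparency figures above can be turned into per-channel and per-frame bit budgets with simple arithmetic. This is a minimal sketch that assumes (none of this is stated in the text) an AAC-LC frame of 1024 samples per channel, a 48 kHz sampling rate, and that the 5.1 LFE channel is counted as a full channel when averaging; the function names are hypothetical.

```python
# Bit budgets implied by the transparency figures quoted in the text.
# Assumptions (not from the text): an AAC-LC frame carries 1024 samples per
# channel, the sampling rate is 48 kHz, and the 5.1 LFE channel is counted
# as a full channel when averaging.

FRAME_SAMPLES = 1024  # samples per channel per AAC-LC frame (assumed)


def kbps_per_channel(total_kbps: float, channels: int) -> float:
    """Average bit rate available to each channel of the stream."""
    return total_kbps / channels


def bits_per_frame(total_kbps: float, sample_rate_hz: int,
                   frame_samples: int = FRAME_SAMPLES) -> float:
    """Average number of bits available to one frame (all channels together)."""
    frame_duration_s = frame_samples / sample_rate_hz
    return total_kbps * 1000 * frame_duration_s


if __name__ == "__main__":
    print("stereo at 128 kbit/s:", kbps_per_channel(128, 2), "kbit/s per channel")
    print("5.1 at 384 kbit/s:   ", kbps_per_channel(384, 6), "kbit/s per channel")
    print("stereo bits per frame at 48 kHz:", round(bits_per_frame(128, 48000)))
```

Both transparency figures work out to 64 kbit/s per channel, which suggests the 5.1 figure is essentially the stereo per-channel rate applied to six channels; at 48 kHz the stereo rate leaves roughly 2,731 bits per frame under these assumptions.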


AAC is the default or standard audio format for iPhone, iPod, iPad, Nintendo DSi, Nintendo 3DS, Apple Music,[a] iTunes, DivX Plus Web Player, PlayStation 4 and various Nokia Series 40 phones. It is supported on a wide range of devices and software such as PlayStation Vita, Wii, digital audio players like Sony Walkman or SanDisk Clip, Android and BlackBerry devices, various in-dash car audio systems, and is also one of the audio formats used on the Spotify web player.[8]

History

Background

The discrete cosine transform (DCT), a type of transform coding for lossy compression, was proposed by Nasir Ahmed in 1972, and developed by Ahmed with T. Natarajan and K. R. Rao in 1973, publishing their results in 1974.[9][10][11] This led to the development of the modified discrete cosine transform (MDCT), proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987,[12] following earlier work by Princen and Bradley in 1986.[13] The MP3 audio coding standard introduced in 1994 used a hybrid coding algorithm that is part MDCT and part FFT.[14] AAC uses a purely MDCT algorithm, giving it higher compression efficiency than MP3.[4] Development further advanced when Lars Liljeryd introduced a method that radically shrank the amount of information needed to store the digitized form of a song or speech.[15]
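The MDCT's key property, time-domain aliasing cancellation (TDAC) as developed by Princen, Johnson and Bradley, can be checked numerically. The following is a minimal, direct O(N^2) sketch of the textbook MDCT and inverse MDCT with a sine window and 50% overlap-add. It is not AAC's normative filterbank (which uses much larger blocks, window switching and fast algorithms); the block size N = 8 and all function names are illustrative assumptions.

```python
import math
import random


def sine_window(two_n: int) -> list[float]:
    """Sine window of length 2N; it satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def _kernel(n: int, k: int, n_half: int) -> float:
    """MDCT basis function cos[(pi/N) * (n + 1/2 + N/2) * (k + 1/2)]."""
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def mdct(block: list[float]) -> list[float]:
    """Direct O(N^2) MDCT: 2N (windowed) samples -> N coefficients."""
    n_half = len(block) // 2
    return [sum(block[n] * _kernel(n, k, n_half) for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(coeffs: list[float]) -> list[float]:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples, scaled by 2/N."""
    n_half = len(coeffs)
    return [(2.0 / n_half) * sum(coeffs[k] * _kernel(n, k, n_half) for k in range(n_half))
            for n in range(2 * n_half)]


def analyze_synthesize(signal: list[float], n_half: int) -> list[float]:
    """Window -> MDCT -> IMDCT -> window -> overlap-add with a hop of N samples."""
    if len(signal) % n_half:
        raise ValueError("signal length must be a multiple of N")
    window = sine_window(2 * n_half)
    # Pad N zeros on each side so every input sample is covered by two blocks.
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        block = [padded[start + i] * window[i] for i in range(2 * n_half)]
        aliased = imdct(mdct(block))
        for i in range(2 * n_half):
            # The time-domain aliasing of neighbouring blocks cancels here (TDAC).
            out[start + i] += aliased[i] * window[i]
    return out[n_half:n_half + len(signal)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(32)]
    y = analyze_synthesize(x, n_half=8)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, y)))
```

The inverse transform on its own returns aliased samples; only after windowing and overlap-adding neighbouring blocks does the aliasing cancel, so the signal is recovered to within floating-point rounding. The 2/N scaling in `imdct` is chosen so that a window satisfying the Princen-Bradley condition gives unity overall gain.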


AAC was developed with the cooperation and contributions of companies including Bell Labs, Fraunhofer IIS, Dolby Laboratories, LG Electronics, NEC, Panasonic, Sony Corporation,[1] ETRI, JVC Kenwood, Philips, Microsoft, and NTT.[16] It was officially declared an international standard by the Moving Picture Experts Group in April 1997. It is specified both as Part 7 of the MPEG-2 standard, and Subpart 4 in Part 3 of the MPEG-4 standard.[17]

Standardization

In 1997, AAC was first introduced as MPEG-2 Part 7, formally known as ISO/IEC 13818-7:1997. This was a new part of MPEG-2, separate from the existing MPEG-2 Part 3, formally known as ISO/IEC 13818-3: MPEG-2 BC (Backwards Compatible).[18][19] MPEG-2 Part 7 is therefore also known as MPEG-2 NBC (Non-Backward Compatible), because it is not compatible with the MPEG-1 audio formats (MP1, MP2 and MP3).[18][20][21][22]


MPEG-2 Part 7 defined three profiles: Low-Complexity profile (AAC-LC / LC-AAC), Main profile (AAC Main) and Scalable Sampling Rate profile (AAC-SSR). The AAC-LC profile consists of a base format closely resembling AT&T's Perceptual Audio Coding (PAC) format,[23][24][25] with the addition of temporal noise shaping (TNS),[26] the Kaiser window (described below), a nonuniform quantizer, and a reworked bitstream format that handles up to 16 stereo channels, 16 mono channels, 16 low-frequency effects (LFE) channels and 16 commentary channels in one bitstream. The Main profile adds a set of recursive predictors that are calculated on each tap of the filterbank. The SSR profile uses a 4-band PQMF filterbank, followed by four shorter filterbanks, to allow for scalable sampling rates.
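In AAC the Kaiser window is a Kaiser-Bessel-derived (KBD) window, selectable alongside a sine window: a Kaiser window is cumulatively summed and square-rooted so that the result satisfies the Princen-Bradley condition needed for MDCT aliasing cancellation. A minimal sketch follows; the alpha values of 4 (2048-sample long blocks) and 6 (256-sample short blocks) are commonly cited for AAC but are assumptions here, as are the helper names.

```python
import math


def bessel_i0(x: float) -> float:
    """Zeroth-order modified Bessel function of the first kind, via its power series."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2 * k)) ** 2  # ratio of consecutive series terms
        total += term
    return total


def kbd_window(two_n: int, alpha: float) -> list[float]:
    """Kaiser-Bessel-derived window of even length 2N with Kaiser parameter alpha."""
    n_half = two_n // 2
    # Kaiser window of length N + 1 (its I0(pi * alpha) normalisation cancels below).
    kaiser = [bessel_i0(math.pi * alpha * math.sqrt(1.0 - (2.0 * j / n_half - 1.0) ** 2))
              for j in range(n_half + 1)]
    total = sum(kaiser)
    first_half, running = [], 0.0
    for n in range(n_half):
        running += kaiser[n]
        first_half.append(math.sqrt(running / total))
    # The second half mirrors the first: w[2N - 1 - n] == w[n].
    return first_half + first_half[::-1]


def princen_bradley_error(window: list[float]) -> float:
    """Largest deviation from w[n]**2 + w[n + N]**2 == 1 over the first half."""
    n_half = len(window) // 2
    return max(abs(window[n] ** 2 + window[n + n_half] ** 2 - 1.0) for n in range(n_half))


if __name__ == "__main__":
    long_window = kbd_window(2048, alpha=4.0)   # assumed long-block parameters
    short_window = kbd_window(256, alpha=6.0)   # assumed short-block parameters
    print(princen_bradley_error(long_window), princen_bradley_error(short_window))
```

Because the Kaiser window is symmetric, the partial sums for positions n and n + N always add up to the full sum, which is why the condition holds for any alpha; alpha only trades main-lobe width against side-lobe rejection.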


In 1999, MPEG-2 Part 7 was updated and included in the MPEG-4 family of standards, becoming known as MPEG-4 Part 3, MPEG-4 Audio or ISO/IEC 14496-3:1999. This update included several improvements. One was the addition of Audio Object Types, which allow interoperability with a diverse range of other audio formats such as TwinVQ, CELP, HVXC, Text-To-Speech Interface and MPEG-4 Structured Audio. Another notable addition in this version of the AAC standard is Perceptual Noise Substitution (PNS). The AAC profiles (AAC-LC, AAC Main and AAC-SSR), combined with perceptual noise substitution, are defined in the MPEG-4 audio standard as Audio Object Types.[27] MPEG-4 Audio Object Types are combined in four MPEG-4 Audio profiles: Main (which includes most of the MPEG-4 Audio Object Types), Scalable (AAC LC, AAC LTP, CELP, HVXC, TwinVQ, Wavetable Synthesis, TTSI), Speech (CELP, HVXC, TTSI) and Low Rate Synthesis (Wavetable Synthesis, TTSI).[27][28]
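Audio Object Types are signalled in the MPEG-4 AudioSpecificConfig, whose first fields are a 5-bit audioObjectType (with 31 as an escape value), a 4-bit samplingFrequencyIndex (15 meaning an explicit 24-bit rate follows) and a 4-bit channelConfiguration. The sketch below decodes only those leading fields and ignores everything after them. The field layout, object type numbers and sampling-rate table come from the MPEG-4 Audio specification rather than from this article, and the function and class names are hypothetical.

```python
# Sampling frequencies addressed by samplingFrequencyIndex 0..12 (13 and 14 are
# reserved; 15 means an explicit 24-bit frequency follows).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]

# A few audioObjectType values related to formats named in the text.
OBJECT_TYPES = {1: "AAC Main", 2: "AAC LC", 3: "AAC SSR", 4: "AAC LTP",
                5: "SBR", 6: "AAC Scalable", 7: "TwinVQ", 8: "CELP", 9: "HVXC",
                29: "PS"}


class BitReader:
    """Reads big-endian bit fields from a byte string, most significant bit first."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            if self.pos >= 8 * len(self.data):
                raise ValueError("AudioSpecificConfig is truncated")
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def parse_audio_specific_config(data: bytes) -> dict:
    """Decode the leading fields of an MPEG-4 AudioSpecificConfig."""
    reader = BitReader(data)
    aot = reader.read(5)
    if aot == 31:  # escape value: the real object type follows in 6 more bits
        aot = 32 + reader.read(6)
    sf_index = reader.read(4)
    if sf_index == 0xF:
        sample_rate = reader.read(24)
    elif sf_index < len(SAMPLE_RATES):
        sample_rate = SAMPLE_RATES[sf_index]
    else:
        raise ValueError(f"reserved samplingFrequencyIndex {sf_index}")
    channel_config = reader.read(4)
    return {
        "object_type": aot,
        "object_type_name": OBJECT_TYPES.get(aot, "other"),
        "sample_rate": sample_rate,
        "channel_configuration": channel_config,
    }


if __name__ == "__main__":
    # 0x12 0x10 is commonly seen for AAC LC, 44.1 kHz, stereo.
    print(parse_audio_specific_config(bytes([0x12, 0x10])))
```

For the two bytes 0x12 0x10 the sketch reports object type 2 (AAC LC), 44100 Hz and channel configuration 2 (stereo).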


The reference software for MPEG-4 Part 3 is specified in MPEG-4 Part 5 and the conformance bit-streams are specified in MPEG-4 Part 4. MPEG-4 Audio remains backward-compatible with MPEG-2 Part 7.[29]


MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000) defined new audio object types: the low delay AAC (AAC-LD) object type, the bit-sliced arithmetic coding (BSAC) object type, parametric audio coding using harmonic and individual lines plus noise (HILN), and error resilient (ER) versions of object types.[30][31][32] It also defined four new audio profiles: High Quality Audio Profile, Low Delay Audio Profile, Natural Audio Profile and Mobile Audio Internetworking Profile.[33]


The HE-AAC Profile (AAC LC with SBR) and AAC Profile (AAC LC) were first standardized in ISO/IEC 14496-3:2001/Amd 1:2003.[34] The HE-AAC v2 Profile (AAC LC with SBR and Parametric Stereo) was first specified in ISO/IEC 14496-3:2005/Amd 2:2006.[35][36][37] The Parametric Stereo audio object type used in HE-AAC v2 was first defined in ISO/IEC 14496-3:2001/Amd 2:2004.[38][39][40]
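The three profiles described here are nested: each adds one tool to the previous one. A minimal sketch of that relationship follows, assuming the MPEG-4 audioObjectType numbers 2 (AAC LC), 5 (SBR) and 29 (PS), which are not given in this article; the function name is hypothetical.

```python
# MPEG-4 audioObjectType numbers for the three tools involved (assumed values).
AAC_LC, SBR, PS = 2, 5, 29

# Tools covered by each profile, as described in the text.
PROFILE_TOOLS = {
    "AAC Profile": {AAC_LC},
    "HE-AAC Profile": {AAC_LC, SBR},
    "HE-AAC v2 Profile": {AAC_LC, SBR, PS},
}


def smallest_profile(object_types: set[int]) -> str:
    """Return the smallest of the three nested profiles whose tools cover `object_types`."""
    if AAC_LC not in object_types:
        raise ValueError("all three profiles are built on AAC LC")
    for name in ("AAC Profile", "HE-AAC Profile", "HE-AAC v2 Profile"):
        if object_types <= PROFILE_TOOLS[name]:
            return name
    raise ValueError(f"object types {sorted(object_types)} are outside these profiles")


if __name__ == "__main__":
    print(smallest_profile({AAC_LC}))
    print(smallest_profile({AAC_LC, SBR}))
    print(smallest_profile({AAC_LC, SBR, PS}))
```

Because the tool sets are nested, a decoder for a larger profile is generally expected to handle streams of the smaller ones, for example an HE-AAC v2 decoder playing a plain AAC LC stream.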


The current version of the AAC standard is defined in ISO/IEC 14496-3:2009.[41]


AAC+ v2 is also standardized by ETSI (European Telecommunications Standards Institute) as TS 102 005.[38]


The MPEG-4 Part 3 standard also contains other ways of compressing sound. These include lossless compression formats, synthetic audio and low bit-rate compression formats generally used for speech.

AAC's improvements over MP3

Advanced Audio Coding is designed to be the successor of MPEG-1 Audio Layer 3, known as the MP3 format, which is specified by ISO/IEC in 11172-3 (MPEG-1 Audio) and 13818-3 (MPEG-2 Audio).


Blind tests in the late 1990s showed that AAC demonstrated greater sound quality and transparency than MP3 for files coded at the same bit rate.[4]


Improvements include:

Overall, the AAC format gives developers more flexibility in designing codecs than MP3 does, and corrects many of the design choices made in the original MPEG-1 audio specification. This increased flexibility often allows encoders to combine more coding strategies and, as a result, to compress more efficiently. This is especially true at very low bit rates, where AAC's superior stereo coding, pure MDCT, and better transform window sizes leave MP3 unable to compete.


While the MP3 format has near-universal hardware and software support, primarily because MP3 was the format of choice during the crucial first few years of widespread music file-sharing and distribution over the internet, AAC is a strong contender thanks to sustained industry support.[42]

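AAC uses two primary coding strategies to reduce the amount of data needed to represent digital audio:

Signal components that are perceptually irrelevant are discarded.

Redundancies in the coded audio signal are eliminated.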
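Discarding perceptually irrelevant detail happens largely in quantization. AAC's nonuniform quantizer raises each spectral magnitude to the 3/4 power before rounding, and the decoder applies the inverse 4/3 power, so large coefficients get proportionally coarser steps. The sketch below assumes a scalefactor offset of 100 and an encoder rounding offset of 0.4054, figures commonly quoted for AAC implementations but not stated in this article. It quantizes single values and omits the Huffman coding that removes the remaining redundancy; all names are hypothetical.

```python
import math

# Assumed constants: decoders commonly rescale with an offset of 100 on the
# scalefactor, and 0.4054 is a rounding offset often used by encoders.
SF_OFFSET = 100
ROUNDING_OFFSET = 0.4054


def step_size(scalefactor: int) -> float:
    """Step multiplier 2^((sf - offset) / 4): each +4 doubles the step (about 6 dB)."""
    return 2.0 ** (0.25 * (scalefactor - SF_OFFSET))


def quantize(x: float, scalefactor: int) -> int:
    """Power-law (|x|^(3/4)) quantization of one spectral coefficient."""
    magnitude = int((abs(x) / step_size(scalefactor)) ** 0.75 + ROUNDING_OFFSET)
    return -magnitude if x < 0 else magnitude


def dequantize(q: int, scalefactor: int) -> float:
    """Inverse mapping: sign(q) * |q|^(4/3) * step."""
    return math.copysign(abs(q) ** (4.0 / 3.0), q) * step_size(scalefactor)


if __name__ == "__main__":
    coefficients = [0.2, 3.0, 25.0, 100.0, -42.0]
    for sf in (100, 120):  # a larger scalefactor means a coarser step
        q = [quantize(c, sf) for c in coefficients]
        r = [round(dequantize(v, sf), 2) for v in q]
        print(sf, q, r)
```

With scalefactor 100 the smallest coefficient already quantizes to 0 and is discarded; raising the scalefactor to 120 enlarges the step by a factor of 32 and zeroes or shrinks more of the values, trading accuracy for fewer bits.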

Licensing and patents

No licenses or payments are required for a user to stream or distribute audio in AAC format.[51] This alone may have made AAC more attractive than its predecessor MP3 for distributing audio, particularly for streaming audio (such as Internet radio), depending on the use case.


However, a patent license is required for all manufacturers or developers of AAC "end-user" codecs.[52] The terms (as disclosed to the SEC) use per-unit pricing. In the case of software, each computer running the software is considered a separate "unit".[53]


It used to be common for free and open-source software implementations such as FFmpeg and FAAC to distribute only in source code form, so as not to "otherwise supply" an AAC codec. However, FFmpeg has since become more lenient on patent matters: the "gyan.dev" builds recommended by the official site now contain its AAC codec, and the FFmpeg legal page states that patent law conformance is the user's responsibility.[54] (See below under Products that support AAC, Software.) The Fedora Project, a community backed by Red Hat, imported the "Third-Party Modified Version of the Fraunhofer FDK AAC Codec Library for Android" to its repositories on September 25, 2018,[55] and enabled FFmpeg's native AAC encoder and decoder for its ffmpeg-free package on January 31, 2023.[56]


The AAC patent holders include Bell Labs, Dolby, ETRI, Fraunhofer, JVC Kenwood, LG Electronics, Microsoft, NEC, NTT (and its subsidiary NTT Docomo), Panasonic, Philips, and Sony Corporation.[16][1] Based on the list of patents from the SEC terms, the last baseline AAC patent expires in 2028, and the last patent for all AAC extensions mentioned expires in 2031.[57]

Some extensions have been added to the first AAC standard (defined in MPEG-2 Part 7 in 1997):

Perceptual Noise Substitution (PNS), added in MPEG-4 in 1999. It allows the coding of noise as pseudorandom data.

Long Term Predictor (LTP), added in MPEG-4 in 1999. It is a forward predictor with lower computational complexity.[29]

Error Resilience (ER), added in MPEG-4 Audio version 2 in 2000, used for transport over error-prone channels.[58]

AAC-LD (Low Delay), defined in 2000, used for real-time conversation applications.

High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1 or AAC+, the combination of SBR (Spectral Band Replication) and AAC LC. Used for low bitrates. Defined in 2003.

HE-AAC v2, a.k.a. aacPlus v2, eAAC+ or Enhanced aacPlus, the combination of Parametric Stereo (PS) and HE-AAC; used for even lower bitrates. Defined in 2004 and 2006.

MPEG-4 Scalable To Lossless (SLS), not yet published,[59] can supplement an AAC stream to provide a lossless decoding option, such as in Fraunhofer IIS's "HD-AAC" product.
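Perceptual Noise Substitution can be illustrated by its core idea: for a noise-like band, the encoder sends only the band's energy, and the decoder fills the band with pseudorandom values scaled to that energy. This is a minimal sketch under simplifying assumptions: energies are passed as plain floats (the real bitstream carries a quantized noise energy), the band width and data are made up, and Python's `random` module stands in for the decoder's noise generator. Function names are hypothetical.

```python
import math
import random


def band_energy(coefficients: list[float]) -> float:
    """Encoder side: the energy of one band (sum of squared coefficients)."""
    return sum(c * c for c in coefficients)


def pns_fill(width: int, energy: float, rng: random.Random) -> list[float]:
    """Decoder side: pseudorandom values rescaled so their energy equals `energy`."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    current = band_energy(noise)
    # Guard against the (practically impossible) all-zero draw.
    scale = math.sqrt(energy / current) if current > 0.0 else 0.0
    return [v * scale for v in noise]


if __name__ == "__main__":
    rng = random.Random(42)
    # A noise-like band of 16 spectral coefficients (made-up data).
    original = [rng.gauss(0.0, 0.5) for _ in range(16)]
    energy = band_energy(original)  # the only value "sent" for this band
    substitute = pns_fill(len(original), energy, rng)
    print(round(energy, 6), round(band_energy(substitute), 6))
```

The substituted values differ sample by sample from the original, but their energy matches, which is the property PNS relies on for noise-like content.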

Portable players with AAC support include:

Archos

Cowon (unofficially supported on some models)

Creative Zen Portable

Fiio (all current models)

Nintendo 3DS

Nintendo DSi

Philips GoGear Muse

PlayStation Portable (PSP) with firmware 2.0 or greater

Samsung YEPP

SanDisk Sansa (some models)

Walkman

Zune

Any portable player that fully supports the Rockbox third-party firmware

See also

Comparison of audio coding formats

AAC-LD

MPEG-4 Part 14 (container format)

ALAC: a lossless codec developed by Apple

Vorbis: a royalty-free competitor to AAC and MP3

Opus: an open, royalty-free codec for both pre-encoded and interactive use, standardized in 2012

External links

Fraunhofer audio codecs

AudioCoding.com: home of FAAC and FAAD2

Official MPEG web site

AAC improvements and extensions (2004)

RFC 3016 - RTP Payload Format for MPEG-4 Audio/Visual Streams

RFC 3640 - RTP Payload Format for Transport of MPEG-4 Elementary Streams

RFC 4281 - The Codecs Parameter for "Bucket" Media Types

RFC 4337 - MIME Type Registration for MPEG-4