Binaural recording
Binaural recording is a method of recording sound that uses two microphones, arranged to give the listener a three-dimensional stereo sensation of actually being in the room with the performers or instruments. The effect is often created with a technique known as dummy head recording, in which a mannequin head is fitted with a microphone in each ear. Binaural recording is intended for replay over headphones and does not translate properly to stereo speakers. The idea of three-dimensional or "internal" sound has also been applied in other technologies, such as stethoscopes that create "in-head" acoustics and IMAX films that provide a three-dimensional acoustic experience.
The term "binaural" has frequently been confused with "stereo", due in part to systematic misuse as a marketing buzzword by the recording industry in the mid-1950s. Conventional stereo recordings do not factor in natural ear spacing or the "head shadow" of the head and ears, since these effects arise naturally as a person listens, generating interaural time differences (ITDs) and interaural level differences (ILDs) specific to the listening position. With conventional stereo loudspeakers, the sound from each channel's speaker reaches both ears rather than only the ear on the corresponding side, as it would with headphones. Because this loudspeaker crosstalk interferes with binaural reproduction, either headphones or crosstalk cancellation of the loudspeaker signals, such as Ambiophonics, is required. For listening over conventional stereo speakers or MP3 players, a pinna-less dummy head, such as the sphere microphone or Ambiophone, may be preferable for quasi-binaural recordings. As a general rule, for true binaural results, an audio recording and reproduction chain, from the microphone to the listener's brain, should contain one and only one set of pinnae (preferably the listener's own) and one head shadow.
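To give a sense of the size of the interaural time differences mentioned above, the following minimal sketch evaluates Woodworth's classic spherical-head approximation, ITD = (a / c)(θ + sin θ), for a distant source. The function name, the 8.75 cm head radius, and the 343 m/s speed of sound are illustrative assumptions rather than values taken from this article, and a real head deviates from a rigid sphere.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate ITD in seconds for a distant source (rigid-sphere head model).

    azimuth_deg: 0 = straight ahead, 90 = directly to one side.
    head_radius_m and speed_of_sound are assumed typical values.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must lie between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    # Extra path to the far ear: arc around the sphere plus straight segment.
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

Under these assumptions the ITD grows from zero for a frontal source to roughly two thirds of a millisecond for a source directly to one side. ILDs are strongly frequency dependent and are not captured by this simple model.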
History
The history of binaural recording goes back to 1881.[1] The first binaural unit, the théâtrophone, was invented by Clément Ader.[1] It consisted of an array of carbon telephone microphones installed along the front edge of the Opera Garnier. The signal was sent to subscribers through the telephone system, and required that they wear a special headset, which had a tiny speaker for each ear.
Dummy head recording is associated with the physical synthetic head known as the Kunstkopf. The Kunstkopf could be placed in a concert hall during the recording of a live orchestra, or, in the film industry, actors could stand around the head while recording their dialogue. The dummy head could also be used to imprint positional information on prerecorded sound effects by playing them through a loudspeaker suitably oriented relative to the head. For example, thunder and birdsong could be played from above the dummy head.
Demolition (1973) was the first radio drama recorded using a dummy head.[2]
In 1974, Virgin Records issued Aqua, the first solo album by Tangerine Dream's leader Edgar Froese. The brief sleeve notes inform listeners that side 2 of the disc (i.e. the tracks NGC 891 and Upland) was recorded using the artificial head system developed by Gunther Brunschen. Listeners were advised to use stereo headphones for that side of the album.
Although Froese was keen to continue using and promoting this system for subsequent recordings, it was abandoned because, although it worked well through headphones, the improved sound quality did not translate adequately to a hi-fi speaker system.
In 1978, Lou Reed released the first commercially produced binaural pop record, Street Hassle, a combination of live and studio recordings.[3]
Binaural stayed in the background because of the expensive, specialized equipment required for quality recordings and the need for headphones for proper reproduction. Particularly in pre-Walkman days, most consumers considered headphones an inconvenience and were only interested in recordings that could be played on a home stereo system or in a car. In addition, the material best suited to binaural recording does not typically have a high market value. Studio recordings benefit little from a binaural setup beyond natural cross-feed, because the spatial quality of a studio is not very dynamic or interesting. The recordings that are of interest are live orchestral performances and ambient "environmental" recordings of city sounds, nature, and similar subjects.
During the 1990s, electronic devices that used digital signal processing (DSP) to reproduce HRTFs became commercially available. Using dial-in parameters, a sound engineer could adjust the apparent direction of sounds in real time. The devices were unusual and expensive, but they let engineers alter effects on prerecorded sounds quickly and conveniently. By manipulating the parameters, an engineer could take a monophonic recording of a passing car and make it sound as if it were passing behind the listener in real time. Achieving the same outcome with an actual dummy head would require a recording booth and a moving speaker, or an array of speakers together with multiple panning or switching devices.
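The HRTF processing described above amounts to filtering a mono signal with one impulse response per ear. The sketch below shows that step with NumPy. The function names are hypothetical, and `toy_hrir_pair` is only a stand-in for measured head-related impulse responses: it models each ear as a pure delay and gain, with an assumed spherical-head delay and an assumed level difference of up to 6 dB.

```python
import numpy as np


def toy_hrir_pair(azimuth_deg, sample_rate=48000, length=64):
    """Hypothetical stand-in for measured HRIRs: one delayed, scaled impulse per ear.

    Positive azimuth places the source on the listener's right.
    Returns (hrir_left, hrir_right).
    """
    if abs(azimuth_deg) > 90:
        raise ValueError("this toy model only covers -90 to 90 degrees")
    theta = np.radians(abs(azimuth_deg))
    itd = 0.0875 / 343.0 * (theta + np.sin(theta))  # assumed spherical-head delay
    delay = int(round(itd * sample_rate))           # far-ear delay in samples
    far_gain = 10 ** (-6.0 * np.sin(theta) / 20)    # assumed ILD of up to 6 dB
    near = np.zeros(length)
    near[0] = 1.0
    far = np.zeros(length)
    far[delay] = far_gain
    return (far, near) if azimuth_deg >= 0 else (near, far)


def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with each ear's impulse response."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)  # shape: (samples, 2)


# A single click placed 90 degrees to the right of the listener.
click = np.zeros(4)
click[0] = 1.0
stereo = render_binaural(click, *toy_hrir_pair(90))
print(stereo.shape)
```

A moving source, such as the passing car, could be approximated by processing the signal in short blocks and choosing a new impulse-response pair for each block, ideally with crossfading between blocks.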
The modern era has seen a resurgence of interest in binaural, partially due to the widespread availability of headphones, cheaper methods of recording and the general increased commercial interest in 360° audio technology.
The online ASMR community is another movement that has widely employed binaural recordings.
The rise of Dolby Atmos and other 360° audio technologies in commercial entertainment has made binaural simulation more popular as a way of adapting a 360° soundtrack for headphones and earphones. Users can ostensibly enjoy 360° films and music with the immersive surround experience intact despite using just the two headset speakers. Notably, a full 360° multi-channel soundtrack is typically converted to simulated binaural audio when it is played back over headphones.
In 2005, Aqua was remixed for limited edition reissue in Germany and Japan, with an additional track Upland Dawn appended to the end of the CD.
In 2015, Singaporean singer-songwriter JJ Lin released his debut experimental album From M.E. to Myself, which was made using dummy head recording. This is also the first album in the pop music industry to use this technology.[4]
Re-recording techniques
The technique of binaural re-recording is simple but not well established. It follows the same principles as worldizing,[10] a technique used by film sound designers in which sound is played over a loudspeaker in a real-world location and then re-recorded, so that it takes on the characteristics of the real-world environment.[11]
Using a space to manipulate a sound and then re-recording it has long been done with echo chambers in recording studios. Famously, Irving Townsend used an echo chamber during post-production of Miles Davis's 1959 album Kind of Blue. The effect has been described as follows: "[The effect of the echo chamber on Kind of Blue is] just a bit of sweetening. At 30th Street, a line was run from the mixing console down into a low-ceilinged, concrete basement room, about 12 by 15 feet in size, anywhere we set up a speaker and a good omnidirectional microphone."[12]
In binaural re-recording, a binaural microphone is used to record content being played over a multi-channel speaker setup. The binaural head, or microphone, therefore in theory records how humans would hear the multi-channel content. The soundtrack to a film, for example, is recorded by the binaural microphone with all the environmental cues of the given location, as well as reverberations, including those commonly created by the human torso (assuming a HATS[13] model is used). This method is comparable to certain binaural recordings made with a Neumann KU 100.[14]
Using an MRI scanner, Brüel & Kjær and DTU collected the geometries of a large population of human ears. The full ear canal geometry, including the bony part adjoining the eardrum, was captured, and the data were post-processed to determine the average human ear canal geometry. Based on this, the High-frequency Head and Torso Simulator (HATS) Type 5128 provides a very realistic reproduction of the acoustic properties of the ear, covering the full audible frequency range (up to 20 kHz).[15]
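Assuming the room and loudspeakers behave linearly, what each ear of the dummy head picks up is the sum of every loudspeaker channel filtered by the impulse response of the path from that speaker to that ear. The following sketch simulates this with NumPy. The function name, the channel labels, and the tiny three-sample impulse responses are hypothetical placeholders for binaural room impulse responses (BRIRs) that would be measured in practice.

```python
import numpy as np


def binaural_downmix(channels, brirs):
    """Simulate re-recording a multi-channel mix with a binaural head.

    channels: dict mapping speaker name -> 1-D signal (all the same length).
    brirs:    dict mapping speaker name -> (left_ir, right_ir), all the same length.
    Returns an array of shape (samples, 2) holding the left and right ear signals.
    """
    names = list(channels)
    n_out = len(channels[names[0]]) + len(brirs[names[0]][0]) - 1
    ears = np.zeros((n_out, 2))
    for name in names:
        left_ir, right_ir = brirs[name]
        # Each speaker contributes to both ears through its own acoustic path.
        ears[:, 0] += np.convolve(channels[name], left_ir)
        ears[:, 1] += np.convolve(channels[name], right_ir)
    return ears


# Toy two-speaker layout: the opposite ear hears each speaker later and quieter.
brirs = {
    "L": (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.5])),
    "R": (np.array([0.0, 0.0, 0.5]), np.array([1.0, 0.0, 0.0])),
}
channels = {"L": np.array([1.0, 0.0]), "R": np.array([0.0, 0.0])}
print(binaural_downmix(channels, brirs))
```

The same loop extends to a 5.1 layout by adding one entry per loudspeaker. A physical re-recording additionally captures any non-linear behaviour of the playback chain, which this simulation ignores.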
Known issues
Timbral issues
In January 2012, BBC R&D worked with BBC Radio 4 to produce a binaural production of Private Peaceful, the book by Michael Morpurgo.[18] The 88-minute dramatization featured a reproduction of a 5.1 speaker system and came in four variations. At the start of each variation, the listener heard a series of test signals, allowing them to choose the version that gave the best spatial experience. In doing so, BBC R&D acknowledged that the success of binaural reproduction varies between listeners, and provided different mixes based on different sets of HRTF data. The release of Private Peaceful was accompanied by a survey that all listeners were asked to complete. It asked how well the binaural reproduction had worked for them and which version (1-4) they thought was most successful.
In a September 2012 interview, Chris Pike of BBC R&D stated that "you may get good spatial impression but timbral coloration is often an issue".[19] Timbral coloration is mentioned in much of the spatial enhancement research. It is sometimes attributed to misused or insufficient HRTF data in the binaural reproduction, or to the fact that a given end user simply does not respond well to the HRTF data that was collected. Francis Rumsey states in the 2011 article "Whose head is it anyway?" that "badly implemented HRTFs can give rise to poor timbral quality, poor externalisation, and a host of other unwanted results".[20] Getting the HRTF data right is therefore key to a successful final product, and making the HRTF data as extensive as possible may leave less room for errors such as timbral issues. The HRTFs used for Private Peaceful[18] were derived from impulse responses measured in a reverberant room in order to capture a sense of space, but the result is not well externalised and has obvious timbral issues, as pointed out by Pike.[19]
Juha Merimaa of Sennheiser Research Laboratories found that HRTF filters designed to reduce timbral issues did not affect the spatial localization previously achieved with the data when tested on a panel of listeners.[21] This suggests that the timbral effects of HRTF processing can be reduced, but only through further EQ manipulation of the audio. If this route is explored further, researchers will have to accept that the audio is being heavily manipulated to achieve a greater sense of spatial awareness, and that this manipulation causes irreversible changes to the audio, something content creators may not be happy with. Consideration will have to be given to how much manipulation is appropriate and to what extent, if any, it affects the end user's experience.
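One way to see why an equalization step need not disturb localization is to apply the same real, positive gain to both ears in every frequency bin: the level ratio between the ears (ILD) and the phase of each ear (and hence the ITD) are then unchanged, while the overall spectrum is flattened. The sketch below illustrates this idea with NumPy. It is an illustrative assumption, not the procedure from the cited study, and the function name and sample values are hypothetical.

```python
import numpy as np


def equalize_pair_power(h_left, h_right, eps=1e-12):
    """Flatten the combined spectrum of an HRTF pair with a shared per-bin gain.

    h_left, h_right: complex frequency responses of equal shape.
    After scaling, the mean power of the two ears is about 1 in every bin.
    """
    h_left = np.asarray(h_left, dtype=complex)
    h_right = np.asarray(h_right, dtype=complex)
    mean_power = (np.abs(h_left) ** 2 + np.abs(h_right) ** 2) / 2.0
    gain = 1.0 / np.sqrt(mean_power + eps)  # real and positive: phase is untouched
    return h_left * gain, h_right * gain


# Made-up two-bin example: the left ear is louder in both bins.
h_l = np.array([2.0 + 0.0j, 0.0 + 1.0j])
h_r = np.array([1.0 + 0.0j, 0.5 + 0.0j])
eq_l, eq_r = equalize_pair_power(h_l, h_r)
print(np.abs(eq_l) / np.abs(eq_r))  # same level ratio as before equalization
```

The trade-off noted in the text still applies: the gain curve is an additional EQ stage that alters the programme material itself.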
Ideal listening conditions will most likely be experienced with headphones designed and calibrated to give as flat a frequency response as possible, in order to reduce colouration of the audio. In most circumstances, colouration has not seemed enough of a problem for end users to invest in headphones that let them hear audio exactly as the content creator intended. Instead, they continue to use bundled headphones or, in some cases, invest in headphones endorsed and branded by certain artists. As previously discussed, timbral effects arise when BRIR and HRTF data are used to create spatially improved audio, the technique used by Chris Pike and BBC R&D.[19] Because the results suffered from timbral issues, this method may not yet be a successful way of creating spatially enhanced audio for headphones, but similar timbral differences also arise from the choice of headphones. As Pike put it: "[Are timbral issues brought about by the use of BRIR and HRTF data] any worse than the difference between some cheap headphones that you get with an mp3 player versus some nice Sennheisers?"[19]