History

Despite the growth of IVR technology during the 1970s, it was still considered complex and expensive for automating tasks in call centers.[3] Early voice response systems were based on digital signal processing (DSP) technology and were limited to small vocabularies. In the early 1980s, Leon Ferber's Perception Technology became the first mainstream market competitor, after hard drive technology (read/write random access to digitized voice data) had reached a cost-effective price point. At that time, a system could store digitized speech on disk, play the appropriate spoken message, and process the caller's DTMF response.


As call centers began to migrate to multimedia in the late 1990s, companies started to invest in computer telephony integration (CTI) with IVR systems. IVR became vital for call centers deploying universal queuing and routing solutions, acting as an agent that collected customer data to enable intelligent routing decisions. As the technology improved, systems could use speaker-independent voice recognition[4] of a limited vocabulary instead of requiring callers to use DTMF signaling.


Starting in the 2000s, voice response became more common and cheaper to deploy. This was due to increased CPU power and the migration of speech applications from proprietary code to the VXML standard.

DTMF decoding and speech recognition are used to interpret the caller's response to voice prompts. DTMF tones are entered via the telephone keypad.


Other technologies include using text-to-speech (TTS) to speak complex and dynamic information, such as e-mails, news reports or weather information. IVR technology is also being introduced into automobile systems for hands-free operation. TTS is computer-generated synthesized speech that no longer sounds like the robotic voice traditionally associated with computers: recordings of real voices are split into fragments that are spliced together (concatenated) and smoothed before being played to the caller.


An IVR can be deployed in several ways:

Equipment installed on the customer premises

Equipment installed in the PSTN (public switched telephone network)

Application service provider (ASP) / hosted IVR
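Whatever the deployment model, the IVR has to turn keypad tones into digits. A common way to implement the DTMF decoding mentioned above is the Goertzel algorithm, which measures signal energy at each of the eight standard DTMF frequencies; the strongest low-group tone selects the keypad row and the strongest high-group tone selects the column. The sketch below assumes 8 kHz audio already available as a list of floats; the function names and the `min_power` silence threshold are illustrative, and a production detector would also check twist, tone duration and harmonics.

```python
import math

# Standard DTMF keypad: rows are low-group tones, columns are high-group tones.
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Return the squared magnitude of `samples` at `freq` (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, sample_rate=8000, min_power=1.0):
    """Return the key whose tone pair is strongest in `samples`, or None."""
    low = [goertzel_power(samples, f, sample_rate) for f in LOW_FREQS]
    high = [goertzel_power(samples, f, sample_rate) for f in HIGH_FREQS]
    row = max(range(4), key=low.__getitem__)
    col = max(range(4), key=high.__getitem__)
    if low[row] < min_power or high[col] < min_power:
        return None  # no tone energy: treat the block as silence
    return KEYPAD[row][col]


if __name__ == "__main__":
    fs = 8000
    # 205 samples at 8 kHz is a block size often used for DTMF detection.
    tone = [math.sin(2 * math.pi * 770 * n / fs) + math.sin(2 * math.pi * 1336 * n / fs)
            for n in range(205)]
    print(detect_dtmf(tone))  # 770 Hz + 1336 Hz is the "5" key
```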


An automatic call distributor (ACD) is often the first point of contact when calling many larger businesses. An ACD uses digital storage devices to play greetings or announcements, but typically routes a caller without prompting for input. An IVR can play announcements and request input from the caller. This information can be used to profile the caller, and an ACD can then use that profile to route the call to an agent with a particular skill set.
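The hand-off from IVR to ACD can be sketched as skills-based routing: the caller's menu selection is mapped to a required skill, and the call goes to an available agent who has that skill. Everything here is hypothetical (the `Agent` class, the `MENU_TO_SKILL` table, and the longest-idle tie-breaker); real ACDs apply many more rules.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    skills: set
    available: bool = True
    idle_seconds: int = 0


# Hypothetical mapping from an IVR menu selection to the skill an agent needs.
MENU_TO_SKILL = {"1": "billing", "2": "technical_support", "3": "sales"}


def route_call(ivr_selection, agents):
    """Pick an available agent with the required skill, longest-idle first."""
    skill = MENU_TO_SKILL.get(ivr_selection, "general")
    candidates = [a for a in agents if a.available and skill in a.skills]
    if not candidates:
        return None  # no match: a real ACD would queue the call instead
    return max(candidates, key=lambda a: a.idle_seconds)
```

Choosing the longest-idle agent is only one possible policy; it spreads work evenly, whereas other ACDs rank agents by proficiency in the skill.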


Interactive voice response can be used to front-end a call center operation by identifying the needs of the caller. Information such as an account number can be obtained from the caller. Simple requests, such as account balance inquiries or pre-recorded information, can be answered without operator intervention. Account numbers from the IVR are often compared to caller ID data for security reasons, and additional IVR responses are required if the caller ID does not match the account record.[5]
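The caller ID check described above reduces to a small decision step. In this sketch the `ACCOUNTS` table, the phone numbers and the returned action names are all hypothetical placeholders for a real account lookup and call flow.

```python
# Hypothetical account records: account number -> phone number on file.
ACCOUNTS = {"1234567": "+15551230000", "7654321": "+15559870000"}


def required_verification(account_number, caller_id):
    """Decide the next IVR step after the caller enters an account number."""
    on_file = ACCOUNTS.get(account_number)
    if on_file is None:
        return "reprompt_account"        # unknown account number
    if caller_id == on_file:
        return "proceed"                 # caller ID matches the account record
    return "ask_security_question"       # mismatch or withheld ID: ask for more
```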


IVR call flows are created in a variety of ways. A traditional IVR depended upon proprietary programming or scripting languages, whereas modern IVR applications are generated in a similar way to Web pages, using standards such as VoiceXML,[6] CCXML,[7] SRGS[8] and SSML.[9] The ability to use XML-driven applications allows a web server to act as the application server, freeing the IVR developer to focus on the call flow.
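Because VoiceXML documents are ordinary XML served over HTTP, a web application can build them with any XML library. The sketch below uses Python's standard `xml.etree.ElementTree` module to generate a simple VoiceXML 2.1 `<menu>` in which each `<choice>` can be selected either by a DTMF key or by speaking its text; the function name, menu id and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu_vxml(prompt_text, choices):
    """Return a VoiceXML menu document as a string.

    choices: list of (dtmf_key, spoken_label, next_url) tuples.
    """
    root = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    menu = ET.SubElement(root, "menu", {"id": "main"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for key, label, next_url in choices:
        choice = ET.SubElement(menu, "choice", {"dtmf": key, "next": next_url})
        # The text content of a <choice> also serves as its speech grammar.
        choice.text = label
    return ET.tostring(root, encoding="unicode")
```

A web server would return this string as the response body, typically with the `application/voicexml+xml` media type, and the IVR's VoiceXML interpreter would fetch the `next` URL of whichever option the caller chooses.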


IVR speech recognition interactions (call flows) are designed using three approaches to prompt for and recognize user input: directed, open-ended, and mixed dialogue.[10][11][12]


A directed dialogue prompt communicates a set of valid responses to the user (e.g. "How can I help you? ... Say something like, account balance, order status, or more options"). An open-ended prompt does not communicate a set of valid responses (e.g. "How can I help you?"). In both cases, the goal is to glean a valid spoken response from the user. The key difference is that with directed dialogue, the user is more likely to speak an option exactly as the prompt phrased it (e.g. "account balance"). With an open-ended prompt, however, the user is likely to include extraneous words or phrases (e.g. "I was just looking at my bill and saw that my balance was wrong."). The open-ended prompt requires a greater degree of natural language processing to extract the relevant information from the phrase (i.e. "balance"). Open-ended recognition also requires a larger grammar set, which accounts for a wider array of permutations of a given response (e.g. "balance was wrong", "wrong balance", "balance is high", "high balance"). Despite the greater amount of data and processing required, open-ended prompts are more interactively efficient, as the prompts themselves are typically much shorter.[10]
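The contrast between the two approaches can be sketched on recognized text. Directed recognition only has to match the few options the prompt read out, while open-ended recognition must pull an intent out of a longer utterance. Here simple keyword spotting stands in for the statistical natural language processing a real system would use, and the grammar and intent tables are hypothetical.

```python
import re

# Directed dialogue: the grammar is just the options the prompt read out.
DIRECTED_GRAMMAR = {"account balance", "order status", "more options"}

# Open-ended dialogue: hypothetical keyword sets, each covering many phrasings.
INTENT_KEYWORDS = {
    "account_balance": {"balance", "owe", "bill"},
    "order_status": {"order", "delivery", "shipped", "package"},
}


def recognize_directed(utterance):
    """Accept only an option spoken essentially as the prompt phrased it."""
    text = utterance.strip().lower()
    return text if text in DIRECTED_GRAMMAR else None


def recognize_open_ended(utterance):
    """Pick the intent whose keywords overlap most with the caller's words."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

The utterance "I was just looking at my bill and saw that my balance was wrong." fails the directed grammar but maps to `account_balance` under keyword spotting.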


A mixed dialogue approach involves shifting from open-ended to directed dialogue, or vice versa, within the same interaction, as one type of prompt may be more effective in a given situation. Mixed dialogue prompts must also be able to recognize responses that are not relevant to the immediate prompt, for instance when a user decides to switch to a function other than the current one.[12][11]
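Recognizing responses outside the current prompt usually means keeping two grammars active at once: one for the question being asked and one for top-level functions the caller may jump to. VoiceXML expresses this idea with grammars of wider scope, such as those attached to a `<link>` element. The following sketch shows the same logic with hypothetical prompt names and phrases.

```python
# Hypothetical grammars: what each prompt expects, plus top-level functions
# that stay active everywhere so the caller can change tasks mid-dialogue.
LOCAL_GRAMMARS = {
    "ask_payment_amount": {"full balance", "minimum payment"},
    "ask_order_number": {"i don't know it", "repeat that"},
}
GLOBAL_COMMANDS = {
    "order status": "ask_order_number",
    "make a payment": "ask_payment_amount",
    "agent": "transfer_to_agent",
}


def handle_response(current_prompt, utterance):
    """Classify a response as an answer, a switch of function, or no match."""
    text = utterance.strip().lower()
    if text in LOCAL_GRAMMARS.get(current_prompt, set()):
        return ("answer", text)                     # answers the current question
    if text in GLOBAL_COMMANDS:
        return ("switch", GLOBAL_COMMANDS[text])    # caller changed function
    return ("nomatch", current_prompt)              # reprompt the same question
```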


Higher-level IVR development tools are available to further simplify the application development process. A call flow diagram can be drawn with a GUI tool, and the presentation layer (typically VoiceXML) can be generated automatically. In addition, these tools normally provide extension mechanisms for software integration, such as an HTTP interface to a website and a Java interface for connecting to a database.


In telecommunications, an audio response unit (ARU) (often included in IVR systems) is a device that provides synthesized voice responses to DTMF keypresses by processing calls based on (a) the call-originator input, (b) information received from a database, and (c) information in the incoming call, such as the time of day. ARUs increase the number of information calls handled and provide consistent quality in information retrieval.
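The three inputs an ARU draws on can be combined in a few lines. This is a hypothetical sketch: the order database is an in-memory dict, the time-of-day rule only changes the greeting, and the returned string stands in for the text the ARU would pass to its speech synthesizer.

```python
from datetime import time

# Hypothetical order database keyed by the digits the caller keys in.
ORDER_DB = {"1001": "shipped", "1002": "processing"}


def greeting_for(call_time):
    """Pick a greeting from the time of day the call arrived."""
    if call_time < time(12, 0):
        return "Good morning"
    if call_time < time(18, 0):
        return "Good afternoon"
    return "Good evening"


def aru_response(digits, call_time):
    """Build a response from caller input, database data and call time."""
    greeting = greeting_for(call_time)
    status = ORDER_DB.get(digits)
    if status is None:
        return f"{greeting}. Order {digits} was not found."
    return f"{greeting}. Order {digits} is {status}."
```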

Sangeet Swara: a voice-based singing platform for low-literate users in India. Although the platform was intended for a broader audience, it saw large participation from visually impaired people.[19]

Gurgaon Idol: a singing competition run over a voice system, in which users could sing and vote by calling a number announced on the radio.[20]

Polly: a voice-based viral entertainment system that allowed users to modify their voice and share it with their contacts. The authors used its virality to play relevant job advertisements to a low-literate population. Polly's entertainment model has since been adapted to spread maternal health information to fathers, to share agricultural information, and to distribute community-generated content.[21][22]

Developments

Video

The introduction of Session Initiation Protocol (SIP) means that point-to-point communications are no longer restricted to voice calls but can now be extended to multimedia technologies such as video. IVR manufacturers have extended their systems into IVVR (interactive voice and video response), especially for the mobile phone networks. The use of video gives IVR systems the ability to implement multimodal interaction with the caller.


In the future, full-duplex video IVR could allow systems to read emotions and facial expressions. It may also be used to identify the caller, using technology such as iris scanning or other biometric means. Recordings of the caller may be stored to monitor certain transactions and can be used to reduce identity fraud.[26]

SIP contact center

In a SIP contact center, call control can be implemented with CCXML scripting; CCXML is an adjunct to VoiceXML (VXML), the language used to generate modern IVR dialogues. While calls are queued in the SIP contact center, the IVR system can provide treatment or automation, wait for a fixed period, or play music. Inbound calls to a SIP contact center must be queued or terminated against a SIP endpoint; SIP IVR systems can replace agents directly through applications deployed as back-to-back user agents (B2BUAs).
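Queue treatment of the kind described above is often configured as a time-based plan. The sketch below uses a hypothetical plan and hypothetical action names rather than CCXML syntax, and simply looks up which treatment applies after a given wait.

```python
# Hypothetical treatment plan: (start_second, action) pairs in ascending order.
# The last entry stays in effect until an agent answers the call.
TREATMENT_PLAN = [
    (0, "play_announcement:thanks_for_calling"),
    (5, "play_music"),
    (60, "offer_self_service"),
    (90, "play_music"),
]


def treatment_at(wait_seconds):
    """Return the queue treatment that applies after `wait_seconds` of waiting."""
    current = TREATMENT_PLAN[0][1]
    for start, action in TREATMENT_PLAN:
        if wait_seconds >= start:
            current = action
        else:
            break
    return current
```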

Interactive messaging response (IMR)

With the introduction of instant messaging (IM) in contact centers, agents can handle up to six IM conversations at the same time, which increases agent productivity. IVR technology is being used to automate IM conversations using existing natural language processing software. This differs from email handling: automated email response is based on keyword spotting, whereas IM exchanges are conversational. Text messaging abbreviations and smileys require different grammars from those currently used for speech recognition. IM is also starting to replace text messaging on multimedia mobile handsets.
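One way to cope with texting shorthand is to normalize each message before it reaches the intent-matching stage. This is a minimal sketch under stated assumptions: the abbreviation and smiley tables are tiny hypothetical examples, and whitespace tokenization is deliberately simple.

```python
# Hypothetical lookup tables; a production system would use much larger ones.
ABBREVIATIONS = {"u": "you", "r": "are", "pls": "please", "thx": "thanks",
                 "acct": "account", "bal": "balance"}
SMILEYS = {":)": "<positive>", ":-)": "<positive>", ":(": "<negative>", ":-(": "<negative>"}


def normalize_im(message):
    """Expand texting shorthand and tag smileys before intent matching."""
    out = []
    for tok in message.lower().split():
        if tok in SMILEYS:
            out.append(SMILEYS[tok])   # keep the sentiment as a special token
            continue
        word = tok.strip(".,!?")       # drop trailing punctuation
        if word:
            out.append(ABBREVIATIONS.get(word, word))
    return " ".join(out)
```

For example, "what r my acct bal? thx :)" becomes "what are my account balance thanks <positive>", which a keyword- or grammar-based matcher can then handle.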

Hosted vs. on-premises IVR

With the introduction of web services into the contact center, host integration has been simplified, allowing IVR applications to be hosted remotely from the contact center. As a result, hosted speech-enabled IVR applications are now available to smaller contact centers around the world, and application service providers (ASPs) have expanded.


IVR applications can also be hosted on the public network, without contact center integration. Services include public announcement messages and message services for small businesses. It is also possible to deploy two-prong IVR services, in which an initial IVR application routes the call to the appropriate contact center. This can be used to balance load across multiple contact centers or to provide business continuity in the event of a system outage.
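The first prong of such a service can be sketched as a selection over contact centers: skip any that are down (business continuity) and prefer the one with the lightest load. The `ContactCenter` fields and the queued-calls-per-agent load measure are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContactCenter:
    name: str
    online: bool
    calls_in_queue: int
    agents_available: int


def pick_contact_center(centers):
    """Skip centers that are down or unstaffed, then pick the one with the
    fewest queued calls per available agent."""
    candidates = [c for c in centers if c.online and c.agents_available > 0]
    if not candidates:
        return None  # e.g. play an outage announcement instead
    return min(candidates, key=lambda c: c.calls_in_queue / c.agents_available)
```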

Criticism

Surveys show that IVR is generally unpopular with customers, who find it difficult to use and unresponsive. Many customers object to talking to an automated system. There is a perception that IVR is adopted because it lets companies save money by hiring fewer employees to answer the phone.[27] Additionally, because basic information is now available online, the calls coming into a call center are more likely to involve complex problems that cannot be resolved in an automated fashion and therefore require the attention of a live agent.

See also

Automatic number identification

Call avoidance

Call whisper

Dialog system

Dialed Number Identification Service (DNIS)

Dual-tone multi-frequency (DTMF)

Electronic patient-reported outcome

Natural language

Radix economy

Speech recognition

Speech synthesis

Voice portal

Voder

Voice-based marketing automation

Voice user interface

Speech Technology / Telephony at Curlie