Voice user interface
A voice-user interface (VUI) enables spoken human interaction with computers, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device is a device controlled with a voice user interface.
Voice user interfaces have been added to automobiles, home automation systems, computer operating systems, home appliances like washing machines and microwave ovens, and television remote controls. They are the primary way of interacting with virtual assistants on smartphones and smart speakers. Older automated attendants (which route phone calls to the correct extension) and interactive voice response systems (which conduct more complicated transactions over the phone) can respond to the pressing of keypad buttons via DTMF tones, but those with a full voice user interface allow callers to speak requests and responses without having to press any buttons.
Newer voice command devices are speaker-independent, so they can respond to multiple voices, regardless of accent or dialectal influences. They are also capable of responding to several commands at once, separating vocal messages, and providing appropriate feedback, accurately imitating a natural conversation.[1]
$_$_$DEEZ_NUTS#0__titleDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#0__subtitleDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#0__call_to_action.textDEEZ_NUTS$_$_$Overview[edit]
A VUI is the interface to any speech application. Only a short time ago, controlling a machine by simply talking to it was only possible in science fiction. Until recently, this area was considered to be artificial intelligence. However, advances in technologies like text-to-speech, speech-to-text, natural language processing, and cloud services contributed to the mass adoption of these types of interfaces. VUIs have become more commonplace, and people are taking advantage of the value that these hands-free, eyes-free interfaces provide in many situations.
VUIs need to respond to input reliably, or they will be rejected and often ridiculed by their users. Designing a good VUI requires interdisciplinary talents of computer science, linguistics and human factors psychology – all of which are skills that are expensive and hard to come by. Even with advanced development tools, constructing an effective VUI requires an in-depth understanding of both the tasks to be performed, as well as the target audience that will use the final system. The closer the VUI matches the user's mental model of the task, the easier it will be to use with little or no training, resulting in both higher efficiency and higher user satisfaction.
A VUI designed for the general public should emphasize ease of use and provide a lot of help and guidance for first-time callers. In contrast, a VUI designed for a small group of power users (including field service workers), should focus more on productivity and less on help and guidance. Such applications should streamline the call flows, minimize prompts, eliminate unnecessary iterations and allow elaborate "mixed initiative dialogs", which enable callers to enter several pieces of information in a single utterance and in any order or combination. In short, speech applications have to be carefully crafted for the specific business process that is being automated.
Not all business processes render themselves equally well for speech automation. In general, the more complex the inquiries and transactions are, the more challenging they will be to automate, and the more likely they will be to fail with the general public. In some scenarios, automation is simply not applicable, so live agent assistance is the only option. A legal advice hotline, for example, would be very difficult to automate. On the flip side, speech is perfect for handling quick and routine transactions, like changing the status of a work order, completing a time or expense entry, or transferring funds between accounts.
History[edit]
Early applications for VUI included voice-activated dialing of phones, either directly or through a (typically Bluetooth) headset or vehicle audio system.
In 2007, a CNN business article reported that voice command was over a billion dollar industry and that companies like Google and Apple were trying to create speech recognition features.[2] In the years since the article was published, the world has witnessed a variety of voice command devices. Additionally, Google has created a speech recognition engine called Pico TTS and Apple released Siri. Voice command devices are becoming more widely available, and innovative ways for using the human voice are always being created. For example, Business Week suggests that the future remote controller is going to be the human voice. Currently Xbox Live allows such features and Jobs hinted at such a feature on the new Apple TV.[3]
As car technology improves, more features will be added to cars and these features could potentially distract a driver. Voice commands for cars, according to CNET, should allow a driver to issue commands and not be distracted. CNET stated that Nuance was suggesting that in the future they would create a software that resembled Siri, but for cars.[17] Most speech recognition software on the market in 2011 had only about 50 to 60 voice commands, but Ford Sync had 10,000.[17] However, CNET suggested that even 10,000 voice commands was not sufficient given the complexity and the variety of tasks a user may want to do while driving.[17] Voice command for cars is different from voice command for mobile phones and for computers because a driver may use the feature to look for nearby restaurants, look for gas, driving directions, road conditions, and the location of the nearest hotel.[17] Currently, technology allows a driver to issue voice commands on both a portable GPS like a Garmin and a car manufacturer navigation system.[18]
List of Voice Command Systems Provided By Motor Manufacturers:
$_$_$DEEZ_NUTS#5__titleDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#5__descriptionDEEZ_NUTS$_$_$
Non-verbal input[edit]
While most voice user interfaces are designed to support interaction through spoken human language, there have also been recent explorations in designing interfaces take non-verbal human sounds as input. In these systems, the user controls the interface by emitting non-speech sounds such as humming, whistling, or blowing into a microphone.[19]
One such example of a non-verbal voice user interface is Blendie,[20][21] an interactive art installation created by Kelly Dobson. The piece comprised a classic 1950s-era blender which was retrofitted to respond to microphone input. To control the blender, the user must mimic the whirring mechanical sounds that a blender typically makes: the blender will spin slowly in response to a user's low-pitched growl, and increase in speed as the user makes higher-pitched vocal sounds.
Another example is VoiceDraw,[22] a research system that enables digital drawing for individuals with limited motor abilities. VoiceDraw allows users to "paint" strokes on a digital canvas by modulating vowel sounds, which are mapped to brush directions. Modulating other paralinguistic features (e.g. the loudness of their voice) allows the user to control different features of the drawing, such as the thickness of the brush stroke.
Other approaches include adopting non-verbal sounds to augment touch-based interfaces (e.g. on a mobile phone) to support new types of gestures that wouldn't be possible with finger input alone.[19]
Privacy implications[edit]
Privacy concerns are raised by the fact that voice commands are available to the providers of voice-user interfaces in unencrypted form, and can thus be shared with third parties and be processed in an unauthorized or unexpected manner.[31][32] Additionally to the linguistic content of recorded speech, a user's manner of expression and voice characteristics can implicitly contain information about his or her biometric identity, personality traits, body shape, physical and mental health condition, sex, gender, moods and emotions, socioeconomic status and geographical origin.[33]
$_$_$DEEZ_NUTS#3__titleDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#3__subtextDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#7__titleDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#7__subtextDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#2__titleDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#2__subtextDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#4__titleDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#4__subtextDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__titleDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__subtextDEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__answer--0DEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__answer--1DEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__answer--2DEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__answer--3DEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__answer--4DEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__answer--5DEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__answer--6DEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__answer--7DEEZ_NUTS$_$_$
$_$_$DEEZ_NUTS#1__answer--8DEEZ_NUTS$_$_$