
Virtual assistant

A virtual assistant (VA) is a software agent that can perform a range of tasks or services for a user based on user input such as commands or questions, including verbal ones. Such technologies often incorporate chatbot capabilities to simulate human conversation, such as via online chat, to facilitate interaction with their users. The interaction may be via text, graphical interface, or voice - as some virtual assistants are able to interpret human speech and respond via synthesized voices.

For the human occupation, see Virtual assistant (occupation).

In many cases users can ask their virtual assistants questions, control home automation devices and media playback, and manage other basic tasks such as email, to-do lists, and calendars - all with verbal commands.[1] In recent years, prominent virtual assistants for direct consumer use have included Apple's Siri, Amazon Alexa, Google Assistant, and Samsung's Bixby.[2] Also, companies in various industries often incorporate some kind of virtual assistant technology into their customer service or support.[3]


Recently, the emergence of artificial intelligence-based chatbots, such as ChatGPT, has brought increased capability and interest to the field of virtual assistant products and services.[4][5][6]

History[edit]

Experimental decades: 1910s–1980s[edit]

Radio Rex was the first voice-activated toy, patented in 1916[7] and released in 1922.[8] It was a wooden toy in the shape of a dog that would come out of its house when its name was called.


In 1952, Bell Labs presented "Audrey", the Automatic Digit Recognition machine. It occupied a six-foot-high relay rack, consumed substantial power, had streams of cables and exhibited the myriad maintenance problems associated with complex vacuum-tube circuitry. It could recognize the fundamental units of speech, phonemes. It was limited to accurate recognition of digits spoken by designated talkers. It could therefore be used for voice dialing, but in most cases push-button dialing was cheaper and faster than speaking the consecutive digits.[9]


Another early tool capable of digital speech recognition was the IBM Shoebox voice-activated calculator, presented to the general public during the 1962 Seattle World's Fair after its initial market launch in 1961. This early computer, developed almost 20 years before the introduction of the first IBM Personal Computer in 1981, was able to recognize 16 spoken words and the digits 0 to 9.


The first natural language processing computer program, the chatbot ELIZA, was developed by MIT professor Joseph Weizenbaum in the 1960s. It was created to "demonstrate that the communication between man and machine was superficial".[10] ELIZA used pattern matching and substitution against scripted responses to simulate conversation, which gave an illusion of understanding on the part of the program.
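ELIZA's pattern-matching-and-substitution technique can be illustrated with a minimal sketch. The rules below are hypothetical stand-ins in the spirit of Weizenbaum's DOCTOR script, not his original rules:

```python
import re

# A few illustrative rules: each pattern captures part of the user's
# input and reflects it back inside a scripted response template.
RULES = [
    (re.compile(r"i need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"i am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"my (.+)", re.I), "Tell me more about your {0}."),
]

def respond(text: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Substitute the captured fragment into the scripted response.
            return template.format(match.group(1).rstrip(".!?"))
    return "Please go on."  # generic fallback when no rule matches

print(respond("I need a vacation"))  # Why do you need a vacation?
print(respond("Hello there"))        # Please go on.
```

As in the original program, nothing here models meaning: the illusion of understanding comes entirely from reflecting the user's own words back in a new frame.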


Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."[11]


This gave rise to the name of the ELIZA effect, the tendency to unconsciously assume computer behaviors are analogous to human behaviors; that is, anthropomorphisation, a phenomenon present in human interactions with virtual assistants.


The next milestone in the development of voice recognition technology was achieved in the 1970s at Carnegie Mellon University in Pittsburgh, Pennsylvania, with substantial support from the United States Department of Defense, whose DARPA agency funded a five-year Speech Understanding Research program aiming to reach a minimum vocabulary of 1,000 words. Companies and academic institutions including IBM, Carnegie Mellon University (CMU) and Stanford Research Institute took part in the program.


The result was "Harpy", which mastered about 1,000 words (roughly the vocabulary of a three-year-old) and could understand sentences. It could process speech that followed pre-programmed vocabulary, pronunciation, and grammar structures to determine which sequences of words made sense together, thus reducing speech recognition errors.


In 1986, Tangora, an upgrade of the Shoebox, was a voice-recognizing typewriter. Named after the world's fastest typist at the time, it had a vocabulary of 20,000 words and used prediction to decide the most likely result based on what had been said previously. IBM's approach was based on a hidden Markov model, which adds statistics to digital signal processing techniques. The method makes it possible to predict the most likely phonemes to follow a given phoneme. Still, each speaker had to individually train the typewriter to recognize his or her voice, and pause between each word.
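The core idea of the hidden-Markov-model approach described above, predicting the most likely phoneme to follow a given phoneme, can be sketched with a toy transition table. The phonemes and probabilities here are made up purely for illustration, not taken from any real acoustic model:

```python
# Toy sketch of the Markov idea behind Tangora-era recognizers:
# given the current phoneme, pick the most likely successor from a
# table of transition probabilities (hypothetical values).
TRANSITIONS = {
    "k":  {"ae": 0.6, "ih": 0.3, "t": 0.1},
    "ae": {"t": 0.7, "n": 0.3},
    "n":  {"t": 0.8, "ih": 0.2},
}

def most_likely_next(phoneme: str) -> str:
    """Return the successor phoneme with the highest transition probability."""
    successors = TRANSITIONS[phoneme]
    return max(successors, key=successors.get)

print(most_likely_next("k"))   # ae
print(most_likely_next("ae"))  # t
```

A real recognizer combines such transition statistics with acoustic observation probabilities and searches over whole word sequences (e.g. with the Viterbi algorithm); this sketch shows only the statistical-prediction step.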

Method of interaction[edit]

Virtual assistants work via text, voice, or other input methods.


Many virtual assistants are accessible via multiple methods, offering versatility in how users can interact with them, whether through chat, voice commands, or other integrated technologies.


Virtual assistants use natural language processing (NLP) to match user text or voice input to executable commands. Some continually learn using artificial intelligence techniques including machine learning and ambient intelligence.
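The mapping from natural-language input to executable commands can be illustrated with a minimal keyword-based dispatcher. This is a deliberate simplification, with made-up intents and responses; production assistants use trained NLP models rather than keyword lookup:

```python
# Minimal sketch of the input -> intent -> action pipeline.
def set_alarm() -> str:
    return "Alarm set."

def get_weather() -> str:
    return "Fetching the weather forecast."

# Hypothetical keyword-to-command table standing in for a trained model.
INTENTS = {
    "alarm": set_alarm,
    "weather": get_weather,
}

def handle(utterance: str) -> str:
    words = utterance.lower().split()
    for keyword, action in INTENTS.items():
        if keyword in words:
            return action()  # dispatch to the matched executable command
    return "Sorry, I didn't understand that."

print(handle("Please set an alarm"))  # Alarm set.
print(handle("tell me the weather"))  # Fetching the weather forecast.
```

The learning techniques mentioned above would, in effect, refine this mapping over time from user behavior instead of relying on a fixed table.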


To activate a virtual assistant using voice, a wake word might be used. This is a word or group of words, such as "Hey Siri", "OK Google" or "Hey Google", "Alexa", and "Hey Microsoft".[20] As virtual assistants become more popular, there are increasing legal risks involved.[21]: 815 

Devices and objects where found[edit]

Into devices like smart speakers such as Amazon Echo, Google Home and Apple HomePod

In instant messaging applications on both smartphones and via the Web, e.g. M (virtual assistant) on the Facebook and Facebook Messenger apps or via the Web

Built into a mobile operating system (OS), as are Apple's Siri on iOS devices and BlackBerry Assistant on BlackBerry 10 devices, or into a desktop OS such as Cortana on Microsoft Windows

Built into a smartphone independent of the OS, as is Bixby on the Samsung Galaxy S8 and Note 8.[22]

Within instant messaging platforms, assistants from specific organizations, such as Aeromexico's Aerobot on Facebook Messenger or WeChat Secretary.

Within mobile apps from specific companies and other organizations, such as Dom from Domino's Pizza.[23]

In appliances,[24] cars,[25] and wearable technology.[26]

Previous generations of virtual assistants often worked on websites, such as Alaska Airlines' Ask Jenn,[27] or on interactive voice response (IVR) systems such as American Airlines' IVR by Nuance.[28]

Services[edit]

Provide information such as weather, facts from e.g. Wikipedia or IMDb, set an alarm, make to-do lists and shopping lists

Play music from streaming services such as Spotify and Pandora; play radio stations; read audiobooks

Play videos, TV shows or movies on televisions, streaming from e.g. Netflix

Conversational commerce (see below)

Assist public interactions with government (see Artificial intelligence in government)

Complement and/or replace human customer service specialists in domains like healthcare, sales, and banking.[30] One report estimated that an automated online assistant produced a 30% decrease in the workload for a human-provided call centre.[31]

Enhance the driving experience by enabling interaction with virtual assistants like Siri and Alexa while in the car.

Controversies[edit]

Virtual assistants spur the filter bubble: as with social media, virtual assistants' algorithms are trained to show pertinent data and discard other data based on the consumer's previous activities, the pertinent data being that which will interest or please the consumer. As a result, consumers become isolated from data that disagrees with their viewpoints, effectively enclosed in their own intellectual bubble, which reinforces their opinions. This phenomenon is known to reinforce fake news and echo chambers.[42]

Virtual assistants are also sometimes criticized for being overrated. In particular, A. Casilli points out that the AI of virtual assistants is neither intelligent nor artificial, for two reasons:

Developer platforms[edit]

Amazon Lex was opened to developers in April 2017. It involves natural language understanding technology combined with automatic speech recognition and had been introduced in November 2016.[47]

Google provides the Actions on Google and Dialogflow platforms for developers to create "Actions" for Google Assistant[48]

Apple provides SiriKit for developers to create extensions for Siri

IBM's Watson, while sometimes spoken of as a virtual assistant, is in fact an entire artificial intelligence platform and community powering some virtual assistants, chatbots, and many other types of solutions.[49][50]

Economic relevance[edit]

For individuals[edit]

Digital experiences enabled by virtual assistants are considered to be among the major recent technological advances and most promising consumer trends. Experts claim that digital experiences will achieve a status-weight comparable to 'real' experiences, if not become more sought-after and prized.[51] The trend is verified by the high number of frequent users and the substantial growth of worldwide user numbers of virtual digital assistants. In mid-2017, the number of frequent users of digital virtual assistants was estimated at around one billion worldwide.[52] In addition, virtual digital assistant technology is no longer restricted to smartphone applications, but is present across many industry sectors (including automotive, telecommunications, retail, healthcare and education).[53] In response to the significant R&D expenses of firms across all sectors and the increasing implementation of mobile devices, the market for speech recognition technology is predicted to grow at a CAGR of 34.9% globally over the period of 2016 to 2024, thereby surpassing a global market size of US$7.5 billion by 2024.[53] According to an Ovum study, the "native digital assistant installed base" is projected to exceed the world's population by 2021, with 7.5 billion active voice AI–capable devices.[54] According to Ovum, by that time "Google Assistant will dominate the voice AI–capable device market with 23.3% market share, followed by Samsung's Bixby (14.5%), Apple's Siri (13.1%), Amazon's Alexa (3.9%), and Microsoft's Cortana (2.3%)."[54]


Taking into consideration the regional distribution of market leaders, North American companies (e.g. Nuance Communications, IBM, eGain) are expected to dominate the industry over the next years, due to the significant impact of BYOD (Bring Your Own Device) and enterprise mobility business models. Furthermore, the increasing demand for smartphone-assisted platforms is expected to further boost the growth of the North American intelligent virtual assistant (IVA) industry. Despite its smaller size in comparison to the North American market, the intelligent virtual assistant industry of the Asia-Pacific region, with its main players located in India and China, is predicted to grow at an annual growth rate of 40% (above the global average) over the 2016–2024 period.[53]

Economic opportunity for enterprises[edit]

Virtual assistants should not be seen only as a gadget for individuals, as they can have real economic utility for enterprises. For example, a virtual assistant can take the role of an always-available assistant with encyclopedic knowledge that can organize meetings, check inventories, and verify information. Virtual assistants are all the more important because their integration into small and medium-sized enterprises often constitutes an easy first step in the broader adoption of the Internet of Things (IoT). Indeed, IoT technologies are often perceived by small and medium-sized enterprises as being of critical importance, but too complicated, risky, or costly to adopt.[55]