Existential risk from artificial general intelligence
Existential risk from artificial general intelligence refers to the idea that substantial progress in artificial general intelligence (AGI) could lead to human extinction or an irreversible global catastrophe.[1][2][3]
One argument for taking this risk seriously is that human beings dominate other species because the human brain possesses distinctive capabilities that other animals lack. If AI were to surpass human intelligence and become superintelligent, it might become uncontrollable. Just as the fate of the mountain gorilla depends on human goodwill, the fate of humanity could depend on the actions of a future machine superintelligence.[4]
The plausibility of existential catastrophe due to AI is widely debated. It hinges in part on whether AGI or superintelligence is achievable, the speed at which dangerous capabilities and behaviors emerge,[5] and whether practical scenarios for AI takeovers exist.[6] Concerns about superintelligence have been voiced by leading computer scientists and tech CEOs such as Geoffrey Hinton,[7] Yoshua Bengio,[8] Alan Turing,[a] Elon Musk,[11] and OpenAI CEO Sam Altman.[12] In 2022, a survey of AI researchers with a 17% response rate found that the majority believed there is a 10 percent or greater chance that our inability to control AI will cause an existential catastrophe.[13][14] In 2023, hundreds of AI experts and other notable figures signed a statement declaring that "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war".[15] Following increased concern over AI risks, government leaders such as United Kingdom Prime Minister Rishi Sunak[16] and United Nations Secretary-General António Guterres[17] called for an increased focus on global AI regulation.
Two sources of concern stem from the problems of AI control and alignment: controlling a superintelligent machine or instilling it with human-compatible values may be difficult. Many researchers believe that a superintelligent machine would likely resist attempts to disable it or change its goals, as that would prevent it from accomplishing its present goals. It would be extremely challenging to align a superintelligence with the full breadth of significant human values and constraints.[1][18][19] In contrast, skeptics such as computer scientist Yann LeCun argue that superintelligent machines will have no desire for self-preservation.[20]
A third source of concern is the possibility of a sudden "intelligence explosion" catching humanity unprepared. In this scenario, an AI more intelligent than its creators would be able to recursively improve itself at an exponentially increasing rate, improving too quickly for its handlers and society at large to control.[1][18] Empirically, examples like AlphaZero, which taught itself to play Go and quickly surpassed human ability, show that domain-specific AI systems can sometimes progress from subhuman to superhuman ability very quickly, although such systems do not alter their fundamental architecture.[21]
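A stylized toy model (an illustration only, not drawn from the cited sources) conveys the intuition behind this concern. Suppose an AI's capability $C(t)$ improves at a rate proportional to its current capability:

\[ \frac{dC}{dt} = kC \quad\Longrightarrow\quad C(t) = C_0 e^{kt}. \]

Under this assumption capability doubles every $\ln 2 / k$ units of time, and if each round of self-improvement also raises the constant $k$, growth becomes faster than exponential. The quantities $C$, $C_0$, and $k$ are hypothetical and serve only to illustrate the dynamic.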
AI capabilities
General intelligence
Artificial general intelligence (AGI) is typically defined as a system that performs at least as well as humans in most or all intellectual tasks.[41] A 2022 survey of AI researchers found that 90% of respondents expected AGI would be achieved in the next 100 years, and half expected the same by 2061.[42] Meanwhile, some researchers dismiss existential risks from AGI as "science fiction" based on their high confidence that AGI will not be created anytime soon.[43]
Breakthroughs in large language models have led some researchers to reassess their expectations. Notably, Geoffrey Hinton said in 2023 that he had recently changed his estimate from "20 to 50 years before we have general purpose A.I." to "20 years or less".[44]
Superintelligence
In contrast with AGI, philosopher Nick Bostrom defines a superintelligence as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest", including scientific creativity, strategic planning, and social skills.[45][4] He argues that a superintelligence can outmaneuver humans whenever its goals conflict with theirs, and that it may choose to hide its true intent until humanity cannot stop it.[46][4] Bostrom writes that in order to be safe for humanity, a superintelligence must be aligned with human values and morality, so that it is "fundamentally on our side".[47]
Stephen Hawking argued that superintelligence is physically possible because "there is no physical law precluding particles from being organised in ways that perform even more advanced computations than the arrangements of particles in human brains".[31]
Predictions of when artificial superintelligence (ASI) might be achieved, if ever, are necessarily less certain than predictions for AGI. In 2023, OpenAI leaders said that not only AGI but also superintelligence may be achieved in less than 10 years.[48]