Wayback Machine
The Wayback Machine is a digital archive of the World Wide Web founded by the Internet Archive, an American nonprofit organization based in San Francisco, California. Created in 1996 and launched to the public in 2001, it lets users go "back in time" to see how websites looked in the past. Its founders, Brewster Kahle and Bruce Gilliat, developed the Wayback Machine to provide "universal access to all knowledge" by preserving archived copies of defunct web pages.[1]
For the time machine from Peabody's Improbable History, see Wayback Machine (Peabody's Improbable History).
Type of site: Archive
Founded: May 10, 1996 (private); October 24, 2001 (public)
Area served: Worldwide (except China, Russia, and Bahrain)
Commercial: No
Registration: Optional
Current status: Active
Written in: HTML, CSS, JavaScript, Java, Python
History
The Wayback Machine began archiving cached web pages in 1996. One of the earliest known pages was archived on May 10, 1996 (UTC).[4]
Internet Archive founders Brewster Kahle and Bruce Gilliat launched the Wayback Machine in San Francisco, California,[5] in October 2001,[6][7] primarily to address the problem of web content vanishing whenever it is changed or a website is shut down.[8] The service lets users see archived versions of web pages across time, which the archive calls a "three-dimensional index".[9] Kahle and Gilliat created the machine hoping to archive the entire Internet and provide "universal access to all knowledge".[10] The name "Wayback Machine" refers to a fictional time-traveling and translation device, the "Wayback Machine", used by the characters Mister Peabody and Sherman in the animated cartoon The Adventures of Rocky and Bullwinkle and Friends.[11][12] In one of the cartoon's segments, "Peabody's Improbable History", the characters used the machine to witness, participate in, and often alter famous events in history.
From 1996 to 2001, the information was kept on digital tape, with Kahle occasionally allowing researchers and scientists to tap into the "clunky" database.[13] When the archive reached its fifth anniversary in 2001, it was unveiled and opened to the public in a ceremony at the University of California, Berkeley.[14] By the time the Wayback Machine launched, it already contained over 10 billion archived pages.[15] The data is stored on the Internet Archive's large cluster of Linux nodes.[10] The Wayback Machine occasionally revisits websites and archives new versions of them (see technical details below).[16] Sites can also be captured manually by entering a website's URL into the search box, provided that the website allows the Wayback Machine to "crawl" it and save the data.[17]
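As a rough illustration of the "three-dimensional index" idea, in which a capture is identified by both a page address and a point in time, the sketch below builds a replay address for a given URL and moment. The `https://web.archive.org/web/<timestamp>/<url>` pattern and its 14-digit UTC timestamp are not stated in this article and are assumed here from the service's public URL scheme; the helper name `wayback_url` is hypothetical.

```python
from datetime import datetime, timezone

# Assumed replay prefix; the article itself does not give this address.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: datetime) -> str:
    """Build a replay address for page_url at the given moment (hypothetical helper)."""
    # Convert timezone-aware datetimes to UTC; naive ones are treated as already UTC.
    if when.tzinfo is not None:
        when = when.astimezone(timezone.utc)
    # Assumed timestamp layout: YYYYMMDDhhmmss (14 digits).
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


if __name__ == "__main__":
    # Example: the public launch date mentioned above.
    print(wayback_url("http://example.com/", datetime(2001, 10, 24, tzinfo=timezone.utc)))
```

This only constructs an address string and makes no network request; whether a capture exists for that page and date depends on what the archive actually holds.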
On October 30, 2020, the Wayback Machine began fact-checking content.[18] As of January 2022, captures of ad-server domains are disabled.[19]
In May 2021, for the Internet Archive's 25th anniversary, the Wayback Machine introduced the "Wayforward Machine", which allows users to "travel to the Internet in 2046, where knowledge is under siege".[20][21]
Legal status
In Europe, the Wayback Machine could be interpreted as violating copyright laws. Only the content creator can decide where their content is published or duplicated, so the Archive would have to delete pages from its system upon request of the creator.[84] The exclusion policies for the Wayback Machine may be found in the FAQ section of the site.[85]
Some cases have been brought against the Internet Archive specifically for its Wayback Machine archiving efforts.
Censorship and other threats
Archive.org is blocked in China.[100][101][102] The Internet Archive was blocked in its entirety in Russia in 2015–16, ostensibly for hosting a Jihad outreach video.[65][103][104] Since 2016, the website has again been available in its entirety, although in 2016 Russian commercial lobbyists were suing the Internet Archive to ban it on copyright grounds.[105]
In March 2015, it was reported that security researchers had become aware of the threat posed by the service's unintentional hosting of malicious binaries from archived sites.[106][107]
Alison Macrina, director of the Library Freedom Project, notes that "while librarians deeply value individual privacy, we also strongly oppose censorship".[65]
There is at least one case in which an article was removed from the archive shortly after it had been removed from its original website. In 2016, a Daily Beast reporter wrote an article that outed several gay Olympic athletes after he had made a fake profile posing as a gay man on a dating app. The Daily Beast removed the article after it was met with widespread furor; not long after, the Internet Archive did as well, but emphatically stated that it did so for no other reason than to protect the safety of the outed athletes.[65]
Other threats include natural disasters,[108] destruction (remote or physical),[109] manipulation of the archive's contents (see also: cyberattack, backup), problematic copyright laws[110] and surveillance of the site's users.[111]
Alexander Rose, executive director of the Long Now Foundation, suspects that over the long term of multiple generations "next to nothing" will survive in a useful way, stating, "If we have continuity in our technological civilization, I suspect a lot of the bare data will remain findable and searchable. But I suspect almost nothing of the format in which it was delivered will be recognizable" because sites "with deep back-ends of content-management systems like Drupal and Ruby and Django" are harder to archive.[112]
In an article reflecting on the preservation of human knowledge, The Atlantic has commented that the Internet Archive, which describes itself as built for the long term,[113] "is working furiously to capture data before it disappears without any long-term infrastructure to speak of."[114]