Description of AWA
A web archive is described by the NLA as a "collection of snapshots of websites captured while they are accessible on the web, and then preserved in a static copy". The collection archived in the AWA is "relevant to the cultural, social, political, research and commercial life and activities of Australia and Australians". It collects web material through both scheduled archiving of selected websites and publications and some ad hoc harvesting relating to significant events.[9]
When it launched in March 2019, the AWA already contained around 600 terabytes of data, comprising 9 billion records.[5][13] It offers more functionality than the Wayback Machine, hosted by the Internet Archive, allowing full-text searching through a search engine built in-house. The developers also devised techniques to filter out unwanted "noise". The data remains on the Library's servers, although a move to the cloud is envisaged in the future as content grows.[5] Usability by a wide range of users, and in particular the search functionality, were major focuses during development.[9]
The archive is fully searchable, based on a combination of techniques used by the developers. The team created a unique and complex search algorithm by adapting a version of Google’s page ranking algorithm (based on frequency of clicks on a page), modified to lead to better, high-quality resources. Other technologies include a Bayesian filter (effectively a spam filter), a Not Safe For Work classifier from Yahoo, and machine learning.[14]
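As an illustration of how a Bayesian filter of the kind mentioned above can separate unwanted "noise" from useful pages, the following is a minimal naive Bayes sketch in Python. It is not the NLA's implementation: the labels `noise` and `keep`, the function names `train` and `classify`, and the whitespace tokenisation are all assumptions made for this example.

```python
import math
from collections import Counter


def train(docs):
    """Count words per label from (text, label) pairs.

    Labels are assumed to be 'noise' or 'keep' (hypothetical names).
    """
    word_counts = {"noise": Counter(), "keep": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the more probable label for text.

    Assumes each label has at least one training document.
    """
    vocab = set(word_counts["noise"]) | set(word_counts["keep"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("noise", "keep"):
        # Start from the log prior probability of the label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing avoids zero probabilities
            # for words not seen under this label.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

Working in log probabilities keeps the product of many small per-word probabilities from underflowing. A production filter would need a much larger training set and better tokenisation than this sketch uses.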
There is a "Limit to the gov.au web domain" option before searching,[15] and government websites archived via AGWA can still be searched separately using the "Advanced Search" option.[9] Other options in Advanced Search are to limit by timespan of the snapshots, domain and file type.[16]
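The Advanced Search limits described above (domain, timespan and file type) amount to filtering snapshot records on a few fields. The sketch below shows one way such filtering could work. The `Snapshot` record, its field names and the `filter_snapshots` function are hypothetical and are not taken from the archive's actual software or interface.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass
class Snapshot:
    # Hypothetical record for one archived capture.
    url: str
    captured: date
    file_type: str


def filter_snapshots(snapshots, domain=None, start=None, end=None,
                     file_type=None):
    """Return the snapshots that satisfy every limit that was supplied."""
    results = []
    for snap in snapshots:
        host = urlparse(snap.url).hostname or ""
        # Match the domain itself or any subdomain of it.
        if domain and not (host == domain or host.endswith("." + domain)):
            continue
        if start and snap.captured < start:
            continue
        if end and snap.captured > end:
            continue
        if file_type and snap.file_type.lower() != file_type.lower():
            continue
        results.append(snap)
    return results
```

For example, calling `filter_snapshots(snaps, domain="gov.au")` would keep only captures whose host is `gov.au` or ends in `.gov.au`, mirroring the "Limit to the gov.au web domain" option.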
With many of the earlier websites from the 1990s now lost, mainly because of the frequent change of web platforms, the Australian Web Archive is a significant initiative that will help to save current and future web pages, especially Australian content.[4] Material will continue to be added to the Archive, and other online material collected in accordance with the National Library Act 1960, the legal deposit provisions of the Copyright Act 1968 and the NLA's digital collections selection policy.[9]
Asia/Pacific websites
Websites in the Asia/Pacific region are not included in the AWA, but the NLA partners with the Internet Archive to collect and preserve "selected Asia/Pacific websites related to specific events or socio-political groups".[17]