Google Books

Google Books (previously known as Google Book Search, Google Print, and by its code-name Project Ocean)^[1] is a service from Google that searches the full text of books and magazines that Google has scanned, converted to text using optical character recognition (OCR), and stored in its digital database.^[2] Books are provided either by publishers and authors through the Google Books Partner Program, or by Google's library partners through the Library Project.^[3] Additionally, Google has partnered with a number of magazine publishers to digitize their archives.^[4]^[5]

Type of site: Digital library
Owner: Google
URL: books.google.com
Launched: October 2004 (as Google Print)
Current status: Active

The Publisher Program was first known as Google Print when it was introduced at the Frankfurt Book Fair in October 2004. The Google Books Library Project, which scans works in the collections of library partners and adds them to the digital inventory, was announced in December 2004.

The Google Books initiative has been hailed for its potential to offer unprecedented access to what may become the largest online body of human knowledge^[6]^[7] and to promote the democratization of knowledge.^[8] However, it has also been criticized for potential copyright violations^[8]^[9] and for the lack of editing to correct the many errors introduced into the scanned texts by the OCR process.

In October 2019, Google celebrated 15 years of Google Books and reported that more than 40 million titles had been scanned.^[10] Google estimated in 2010 that there were about 130 million distinct titles in the world and stated that it intended to scan all of them.^[11] However, the pace of scanning in American academic libraries has slowed since the 2000s.^[12]^[13] Google Books' scanning efforts have been subject to litigation, including Authors Guild v. Google, a class-action lawsuit in the United States that was decided in Google's favor. This was a major case that came close to changing copyright practices for orphan works in the United States.^[14] A 2023 study by scholars from the business schools of the University of California, Berkeley and Northeastern University found that Google Books' digitization of books has led to increased sales of the physical versions of those books.^[15]

Results from Google Books show up both in the universal Google Search and on the dedicated Google Books search website (books.google.com).

In response to search queries, Google Books allows users to view full pages from books in which the search terms appear if the book is out of copyright or if the copyright owner has given permission. If Google believes the book is still under copyright, a user sees "snippets" of text around the queried search terms. All instances of the search terms in the book text are highlighted in yellow.

The four access levels used on Google Books are:^[16]

Full view: Books in the public domain are available for "full view" and can be downloaded for free. In-print books acquired through the Partner Program are also available for full view if the publisher has given permission, although this is rare.

Preview: For in-print books where permission has been granted, the number of viewable pages is limited to a "preview" set by a variety of access restrictions and security measures, some based on user tracking. Usually, the publisher can set the percentage of the book available for preview. Users are restricted from copying, downloading, or printing book previews. A watermark reading "Copyrighted material" appears at the bottom of pages. All books acquired through the Partner Program are available for preview.^[17]

Snippet view: A "snippet view", two to three lines of text surrounding the queried search term, is displayed in cases where Google does not have permission from the copyright owner to display a preview. This could be because Google cannot identify the owner or because the owner declined permission. If a search term appears many times in a book, Google displays no more than three snippets, preventing the user from viewing too much of the book. Google also does not display any snippets for certain reference books, such as dictionaries, where the display of even snippets can harm the market for the work. Google maintains that no permission is required under copyright law to display the snippet view.^[18]

No preview: Google also displays search results for books that have not been digitized. As these books have not been scanned, their text is not searchable, and only the metadata, such as the title, author, publisher, number of pages, ISBN, subject and copyright information, and in some cases a table of contents and book summary, is available. In effect, this is similar to an online library card catalog.^[3]

In response to criticism from groups such as the Association of American Publishers and the Authors Guild, Google announced an opt-out policy in August 2005, through which copyright owners could provide a list of titles that they did not want scanned, and Google would respect the request. The company also stated that it would not scan any in-copyright books between August and 1 November 2005, giving owners the opportunity to decide which books to exclude from the Project. Thus, copyright owners have three choices with respect to any work:^[18]

Most scanned works are no longer in print or commercially available.^[19]

In addition to procuring books from libraries, Google also obtains books from its publisher partners through the "Partner Program", which is designed to help publishers and authors promote their books. Publishers and authors submit to Google either a digital copy of their book in EPUB or PDF format or a print copy, which is then made available for preview on Google Books. The publisher can control the percentage of the book available for preview, with a minimum of 20%. Publishers can also choose to make the book fully viewable, and even allow users to download a PDF copy. Books can also be made available for sale on Google Play.^[3] Unlike the Library Project, this program does not raise copyright concerns, because it is conducted under an agreement with the publisher. The publisher can withdraw from the agreement at any time.^[18]

For many books, Google Books displays the original page numbers. However, Tim Parks, writing in The New York Review of Books in 2014, noted that Google had stopped providing page numbers for many recent publications (likely the ones acquired through the Partner Program) "presumably in alliance with the publishers, in order to force those of us who need to prepare footnotes to buy paper editions."^[20]

Scanning of books

The project began in 2002 under the codename Project Ocean. Google co-founder Larry Page had long been interested in digitizing books. When he and Marissa Mayer began experimenting with book scanning in 2002, it took them 40 minutes to digitize a 300-page book. Soon afterwards, however, the technology had been developed to the point where scanning operators could scan up to 6,000 pages an hour.^[14]

Google established dedicated scanning centers, to which books were transported by truck. The stations could digitize at a rate of 1,000 pages per hour. The books were placed in a custom-built mechanical cradle that held the book's spine in place while an array of lights and optical instruments scanned the two open pages. Two cameras were directed at each page to capture its image, while a LIDAR rangefinder overlaid a three-dimensional laser grid on the book's surface to capture the curvature of the paper. A human operator turned the pages by hand and used a foot pedal to take the photographs. Because the pages did not need to be flattened or perfectly aligned, Google's system not only achieved remarkable efficiency and speed but also helped protect fragile collections from excessive handling. Afterwards, the raw images went through three levels of processing: first, de-warping algorithms used the LIDAR data to correct the pages' curvature; then, optical character recognition (OCR) software converted the images into text; and lastly, another round of algorithms extracted page numbers, footnotes, illustrations, and diagrams.^[14]

Many of the books are scanned using a customized Elphel 323 camera^[21]^[22] at a rate of 1,000 pages per hour.^[23] A patent awarded to Google in 2009 revealed an innovative book-scanning system that uses two cameras and infrared light to automatically correct for the curvature of pages. By constructing a 3D model of each page and then "de-warping" it, Google can present flat-looking pages without physically flattening them. Physical flattening would require destructive methods such as unbinding, or glass plates to flatten each page individually, approaches that are inefficient for large-scale scanning.^[24]^[25]

Google decided to omit color information in favor of better spatial resolution, as most out-of-copyright books at the time did not contain color. Each page image was passed through algorithms that distinguished text regions from illustration regions. Text regions were then processed with OCR to enable full-text searching. Google expended considerable resources on developing optimal compression techniques, aiming for high image quality while keeping file sizes small enough for access by internet users with low bandwidth.^[26]

Website functionality

For each work, Google Books automatically generates an overview page. This page displays information extracted from the book (its publishing details, a high-frequency word map, and the table of contents) as well as secondary material, such as summaries, reader reviews (not readable in the mobile version of the website), and links to other relevant texts. A visitor to the page might, for instance, see a list of books that share a similar genre and theme, or a list of current scholarship on the book. Users signed into their Google account can also interact with this content: they can export the bibliographic data and citations in standard formats, write their own reviews, and add the book to their library, where it can be tagged, organized, and shared with other people.^[27]^[28] Google Books thus collects these more interpretive elements from a range of sources, including users, third-party sites like Goodreads, and often the book's author and publisher.^[29]

To encourage authors to upload their own books, Google has added several features to the website. Authors can allow visitors to download their ebook for free, or they can set their own purchase price, which they can change at any time, offering discounts whenever it suits them. If a book's author adds an ISBN, LCCN, or OCLC record number, the service updates the book's URL to include it. The author can then set a specific page as the link's anchor, which makes the book more easily discoverable.

Library partners

Harvard University Library^[48]
The Harvard University Library and Google conducted a pilot throughout 2005. The project continued, with the aim of increasing online access to the holdings of the Harvard University Library, which includes more than 15.8 million volumes. While physical access to Harvard's library materials is generally restricted to current Harvard students, faculty, and researchers, or to scholars who can come to Cambridge, the Harvard-Google Project has been designed to enable both members of the Harvard community and users everywhere to discover works in the Harvard collection.

University of Michigan Library^[49]
As of March 2012, 5.5 million volumes had been scanned.^[50]

New York Public Library^[51]
In this pilot program, NYPL is working with Google to offer a collection of its public domain books, which will be scanned in their entirety and made available for free to the public online. Users will be able to search and browse the full text of these works. When the scanning process is complete, the books may be accessed from both The New York Public Library's website and from the Google search engine.^[51]

Bodleian Library, University of Oxford^[52]

Stanford University Libraries (SULAIR)^[53]

Similar projects

Project Gutenberg is a volunteer effort to digitize and archive cultural works, to "encourage the creation and distribution of eBooks". It was founded in 1971 by Michael S. Hart and is the oldest digital library. As of October 3, 2015, Project Gutenberg had reached 50,000 items in its collection.

Internet Archive is a non-profit organization that digitizes over 1,000 books a day and mirrors books from Google Books and other sources. As of May 2011, it hosted over 2.8 million public domain books, more than the approximately 1 million public domain books at Google Books.^[132] Open Library, a sister project of Internet Archive, lends 80,000 scanned and purchased commercial ebooks to the visitors of 150 libraries.^[133]

HathiTrust has maintained the HathiTrust Digital Library since October 13, 2008,^[134] which preserves and provides access to material scanned by Google, some of the Internet Archive books, and some material scanned locally by partner institutions. As of May 2010, it included about 6 million volumes, over 1 million of which are in the public domain (at least in the US).

ACLS Humanities E-Book is an online collection of over 5,400 high-quality books in the humanities and related social sciences, accessible through institutional subscription.

Live Search Books: Microsoft funded the scanning of 300,000 books to create Live Search Books in late 2006. It ran until May 2008, when the project was abandoned^[135] and the books were made freely available on the Internet Archive.^[136]

The National Digital Library of India (NDLI) is a project under the Ministry of Human Resource Development, India. Its objective is to integrate several national and international digital libraries into a single web portal. The NDLI provides free access to many books in English and in Indian languages.

Europeana links to roughly 10 million digital objects as of 2010, including video, photos, paintings, audio, maps, manuscripts, printed books, and newspapers from the past 2,000 years of European history, drawn from over 1,000 archives in the European Union.^[137]^[138]

Gallica, from the French National Library, links to about 4,000,000 digitized books, newspapers, manuscripts, maps, drawings, and other documents. Created in 1997, the digital library continues to expand at a rate of about 5,000 new documents per month. Since the end of 2008, most newly scanned documents have been available in both image and text formats. Most of these documents are written in French.

Wikisource

Runivers

See also

Amazon.com's book search

A9.com

Book Rights Registry

Digital library

List of digital library projects

Universal library

National Electronic Library

Further reading

Hoffmann, Anna Lauren (2016). "Google Books, Libraries, and Self-Respect: Information Justice beyond Distributions". Library Quarterly. 86: 76–92. doi:10.1086/684141. S2CID 146482065.

Jeanneney, Jean-Noël (2008). Google and the Myth of Universal Knowledge: A View from Europe. Chicago, IL: University of Chicago Press.

External links

Official website

Jones, Elisabeth (May 14, 2013). "New Google Books Library Project Timeline: Now With (more) Citations!".

Darnton, Robert (February 12, 2009). "Google & the Future of Books". New York Review of Books. Vol. 56, no. 2. Archived from the original on January 25, 2009.

"Public Domain Archive and Reprints Service". Public Domain Reprints.

Somers, James (April 20, 2017). "Torching the Modern-Day Library of Alexandria". The Atlantic.

Toobin, Jeffrey (February 5, 2007). "Google's Moon Shot". The New Yorker. Archived from the original on February 2, 2007.