Google Books
Google Books (previously known as Google Book Search, Google Print, and by its code-name Project Ocean)[1] is a service from Google that searches the full text of books and magazines that Google has scanned, converted to text using optical character recognition (OCR), and stored in its digital database.[2] Books are provided either by publishers and authors through the Google Books Partner Program, or by Google's library partners through the Library Project.[3] Additionally, Google has partnered with a number of magazine publishers to digitize their archives.[4][5]
The Publisher Program was first known as Google Print when it was introduced at the Frankfurt Book Fair in October 2004. The Google Books Library Project, which scans works in the collections of library partners and adds them to the digital inventory, was announced in December 2004.
The Google Books initiative has been hailed for its potential to offer unprecedented access to what may become the largest online body of human knowledge[6][7] and to promote the democratization of knowledge.[8] However, it has also been criticized for potential copyright violations[8][9] and for the lack of editing to correct the many errors introduced into the scanned texts by the OCR process.
In October 2019, Google celebrated 15 years of Google Books and reported that it had scanned more than 40 million titles.[10]
Google estimated in 2010 that there were about 130 million distinct titles in the world,[11] and stated that it intended to scan all of them.[11] However, the scanning process in American academic libraries has slowed since the 2000s.[12][13] Google Books's scanning efforts have been subject to litigation, including Authors Guild v. Google, a class-action lawsuit in the United States that was decided in Google's favor (see below). This was a major case that came close to changing copyright practices for orphan works in the United States.[14] A 2023 study by scholars from the business schools of the University of California, Berkeley and Northeastern University found that Google Books's digitization of books has led to increased sales of the physical versions of those books.[15]
Results from Google Books show up in both the universal Google Search and in the dedicated Google Books search website (books.google.com).
In response to search queries, Google Books allows users to view full pages from books in which the search terms appear if the book is out of copyright or if the copyright owner has given permission. If Google believes the book is still under copyright, a user sees "snippets" of text around the queried search terms. All instances of the search terms in the book text appear with a yellow highlight.
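The snippet behavior described above can be illustrated with a minimal sketch. It is not Google's implementation: the function name `snippets`, the window width, and the `[[ ]]` markers standing in for the yellow highlight are all assumptions made for this example.

```python
import re

def snippets(text, term, width=40, mark=("[[", "]]")):
    """Return a short window of text around each match of `term`.

    Matching is case-insensitive. The matched words are wrapped in
    `mark` to stand in for the on-screen highlight (hypothetical).
    """
    results = []
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, m.start() - width)       # clamp window to the text start
        end = min(len(text), m.end() + width)   # clamp window to the text end
        results.append(
            text[start:m.start()] + mark[0] + m.group(0) + mark[1] + text[m.end():end]
        )
    return results

if __name__ == "__main__":
    sample = "The whale surfaced. Call me Ishmael. The Whale dove."
    for s in snippets(sample, "whale", width=4):
        print(s)
```

A full-view book would simply return the whole page with the same markers applied, while a snippet-view book would return only these short windows.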
The four access levels used on Google Books are:[16]
In response to criticism from groups such as the Association of American Publishers and the Authors Guild, Google announced an opt-out policy in August 2005, through which copyright owners could provide a list of titles that they did not want scanned, and the request would be respected. The company also stated that it would not scan any in-copyright books between August and 1 November 2005, to give owners the opportunity to decide which books to exclude from the Project. Thus, copyright owners have three choices with respect to any work:[18]
Most scanned works are no longer in print or commercially available.[19]
In addition to procuring books from libraries, Google also obtains books from its publisher partners, through the "Partner Program" – designed to help publishers and authors promote their books. Publishers and authors submit either a digital copy of their book in EPUB or PDF format, or a print copy to Google, which is made available on Google Books for preview. The publisher can control the percentage of the book available for preview, with the minimum being 20%. They can also choose to make the book fully viewable, and even allow users to download a PDF copy. Books can also be made available for sale on Google Play.[3] Unlike the Library Project, this does not raise any copyright concerns as it is conducted pursuant to an agreement with the publisher. The publisher can choose to withdraw from the agreement at any time.[18]
For many books, Google Books displays the original page numbers. However, Tim Parks, writing in The New York Review of Books in 2014, noted that Google had stopped providing page numbers for many recent publications (likely the ones acquired through the Partner Program) "presumably in alliance with the publishers, in order to force those of us who need to prepare footnotes to buy paper editions."[20]
Scanning of books
The project began in 2002 under the codename Project Ocean. Google co-founder Larry Page had always had an interest in digitizing books. When he and Marissa Mayer began experimenting with book scanning in 2002, it took them 40 minutes to digitize a 300-page book. Soon afterwards, the technology had been developed to the point that scanning operators could scan up to 6,000 pages an hour.[14]
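To put the two figures above on the same scale, the following short calculation converts the early experiment into pages per hour and compares it with the later rate. The helper names are introduced only for this illustration.

```python
def pages_per_hour(pages, minutes):
    """Convert a scan of `pages` pages in `minutes` minutes to an hourly rate."""
    return pages * 60 / minutes

def minutes_per_book(pages, rate_per_hour):
    """Minutes needed to scan `pages` pages at `rate_per_hour` pages per hour."""
    return pages * 60 / rate_per_hour

early_rate = pages_per_hour(300, 40)   # the 2002 experiment: 300 pages in 40 minutes
later_rate = 6000                      # the later rate quoted in the text
speedup = later_rate / early_rate

print(f"early rate: {early_rate:.0f} pages/hour")
print(f"speedup:    {speedup:.1f}x")
print(f"300-page book at later rate: {minutes_per_book(300, later_rate):.0f} minutes")
```

The early experiment works out to 450 pages per hour, so the later figure is roughly a thirteenfold improvement, with a 300-page book taking about 3 minutes instead of 40.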
Google established designated scanning centers to which books were transported by trucks. The stations could digitize at the rate of 1,000 pages per hour. The books were placed in a custom-built mechanical cradle that adjusted the book spine in place while an array of lights and optical instruments scanned the two open pages. Each page had two cameras directed at it to capture the image, while a LIDAR range finder overlaid a three-dimensional laser grid on the book's surface to capture the curvature of the paper. A human operator turned the pages by hand and used a foot pedal to take the photographs. With no need to flatten the pages or align them perfectly, Google's system not only achieved remarkable efficiency and speed but also helped protect fragile collections from being over-handled. Afterwards, the raw images went through three levels of processing: first, de-warping algorithms used the LIDAR data to correct the pages' curvature; then, optical character recognition (OCR) software transformed the images into text; and lastly, another round of algorithms extracted page numbers, footnotes, illustrations and diagrams.[14]
Many of the books are scanned using a customized Elphel 323 camera[21][22] at a rate of 1,000 pages per hour.[23] A patent awarded to Google in 2009 revealed that Google had developed a system for scanning books that uses two cameras and infrared light to automatically correct for the curvature of pages in a book. By constructing a 3D model of each page and then "de-warping" it, Google is able to present flat-looking pages without physically flattening them. Physical flattening would otherwise require methods such as unbinding the book or pressing each page under a glass plate, approaches that are destructive or inefficient for large-scale scanning.[24][25]
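The idea of de-warping from a 3D model can be shown with a deliberately simplified, one-dimensional sketch. It is not the patented method: it assumes the page curves only along its width, that the surface height has already been measured at a few sample positions, and that the flattened position of each sample is the arc length along the paper up to that point. The function name `flatten_positions` is hypothetical.

```python
import math

def flatten_positions(xs, heights):
    """Map sampled positions on a curved page to positions on the flat page.

    xs       -- horizontal sample positions as seen from above
    heights  -- measured surface height of the paper at each sample
    Returns the cumulative arc length along the paper surface, which is
    where each sample would sit if the page were laid flat.
    """
    if len(xs) != len(heights):
        raise ValueError("xs and heights must have the same length")
    if not xs:
        return []
    flat = [0.0]
    for i in range(1, len(xs)):
        dx = xs[i] - xs[i - 1]
        dz = heights[i] - heights[i - 1]
        # Length of the paper between two neighbouring samples.
        flat.append(flat[-1] + math.hypot(dx, dz))
    return flat

if __name__ == "__main__":
    # A page that rises toward the middle: the paper is longer than it looks from above.
    print(flatten_positions([0, 3, 6], [0, 4, 0]))
```

Resampling the photographed pixels at these flattened positions stretches the compressed regions near the spine back to their true width. A real system would work on a two-dimensional surface and interpolate pixel values.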
Google decided to omit color information in favor of better spatial resolution, as most out-of-copyright books at the time were not printed in color. Each page image was passed through algorithms that distinguished the text regions from the illustration regions. Text regions were then processed via OCR to enable full-text searching. Google expended considerable resources on developing compression techniques, aiming for high image quality while keeping file sizes small enough for internet users with low bandwidth.[26]
Website functionality
For each work, Google Books automatically generates an overview page. This page displays information extracted from the book (its publishing details, a high-frequency word map, and the table of contents) as well as secondary material, such as summaries, reader reviews (not readable in the mobile version of the website), and links to other relevant texts. A visitor to the page, for instance, might see a list of books that share a similar genre and theme, or a list of current scholarship on the book. This content also offers interactive possibilities for users signed into their Google account. They can export the bibliographic data and citations in standard formats, write their own reviews, and add the book to their library to be tagged, organized, and shared with other people.[27][28] Thus, Google Books collects these more interpretive elements from a range of sources, including the users, third-party sites like Goodreads, and often the book's author and publisher.[29]
To encourage authors to upload their own books, Google has added several features to the website. Authors can allow visitors to download their ebook for free, or they can set their own purchase price. They can change the price at any time, offering discounts whenever it suits them. Also, if a book's author chooses to add an ISBN, LCCN or OCLC record number, the service will update the book's URL to include it. The author can then set a specific page as the link's anchor. This option makes the book more easily discoverable.
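As a rough illustration of a URL that carries a book identifier and a page anchor, the sketch below builds such a link. The query parameter names `vid` and `pg`, the `PA` page prefix, and the base address are assumptions based on commonly seen Google Books links, not something stated in the text above, and the identifier value is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "https://books.google.com/books"   # assumed base address
ID_TYPES = {"ISBN", "LCCN", "OCLC"}           # identifier kinds named in the text

def book_link(id_type, id_value, page=None):
    """Build a link containing a book identifier and an optional page anchor.

    The `vid` and `pg` parameter names are assumptions for illustration.
    """
    id_type = id_type.upper()
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    params = {"vid": f"{id_type}{id_value}"}
    if page is not None:
        params["pg"] = f"PA{page}"   # assumed form of a page anchor
    return f"{BASE_URL}?{urlencode(params)}"

if __name__ == "__main__":
    # "1234567890" is a placeholder, not a real record number.
    print(book_link("isbn", "1234567890"))
    print(book_link("oclc", "1234567890", page=25))
```

Keeping the identifier in the URL means the same link can be derived from a catalog record without first looking up an internal volume ID.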