CiteSeerX

CiteSeerX (formerly called CiteSeer) is a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science.

Type of site: Search engine and digital library
Owner: Pennsylvania State University College of Information Sciences and Technology
Registration: Optional
Launched: 1997 (as CiteSeer); 2008 (as CiteSeerX)
Current status: Active

CiteSeer's goal is to improve the dissemination of, and access to, academic and scientific literature. As a non-profit service that anyone can use freely, it has been considered part of the open access movement, which seeks to change academic and scientific publishing to allow greater access to scientific literature. CiteSeer freely provided Open Archives Initiative metadata for all indexed documents and, when possible, linked indexed documents to other sources of metadata such as DBLP and the ACM Portal. To promote open data, CiteSeerX shares its data for non-commercial purposes under a Creative Commons license.[1]


CiteSeer is considered a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search.[2] CiteSeer-like engines and archives usually harvest documents only from publicly available websites and do not crawl publisher websites. For this reason, authors whose documents are freely available are more likely to be represented in the index.


CiteSeer changed its name to ResearchIndex at one point and then changed it back.[3]

History

CiteSeer and CiteSeer.IST

CiteSeer was created by researchers Lee Giles, Kurt Bollacker and Steve Lawrence in 1997 while they were at the NEC Research Institute (now NEC Labs) in Princeton, New Jersey, US. CiteSeer's goal was to actively crawl and harvest academic and scientific documents on the web and to use autonomous citation indexing to permit querying by citation or by document, with results ranked by citation impact. At one point, it was called ResearchIndex.
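The idea of querying by citation and ranking by citation impact can be illustrated with a small inverted citation index. This is a minimal sketch, not CiteSeer's implementation. The document and work identifiers are hypothetical. References are assumed to be already matched to canonical IDs. "Citation impact" is approximated by a raw citation count.

```python
from collections import defaultdict

# Hypothetical corpus: each indexed document maps to canonical IDs of the works
# it cites. Real autonomous citation indexing must first parse free-text
# references and decide which ones refer to the same work.
documents = {
    "doc_a": ["work_x", "work_y"],
    "doc_b": ["work_x"],
    "doc_c": ["work_y", "work_z"],
}

def build_citation_index(docs):
    """Invert document -> references into cited work -> set of citing documents."""
    cited_by = defaultdict(set)
    for doc_id, refs in docs.items():
        for ref in refs:
            cited_by[ref].add(doc_id)
    return dict(cited_by)

def rank_by_citations(cited_by):
    """Order cited works by citation count, breaking ties alphabetically."""
    return sorted(cited_by, key=lambda work: (-len(cited_by[work]), work))

index = build_citation_index(documents)
print(sorted(index["work_x"]))   # query by citation → ['doc_a', 'doc_b']
print(documents["doc_c"])        # query by document → ['work_y', 'work_z']
print(rank_by_citations(index))  # → ['work_x', 'work_y', 'work_z']
```

Inverting the reference lists once means that "who cites this work?" becomes a single dictionary lookup rather than a scan over every document.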


CiteSeer became public in 1998 and introduced many features that were unavailable in other academic search engines at the time.

Current features

Automated information extraction

CiteSeerX uses automated information extraction tools such as ParsCit, which are usually built on machine learning methods, to extract scholarly document metadata such as titles, authors, abstracts and citations. As a result, author names and titles sometimes contain errors. Other academic search engines have similar errors.
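A deliberately simple extractor shows how extracted fields can go wrong. It handles only one citation style. It replaces ParsCit's learned model with a fixed regular expression, which is a simplification made for brevity and not how ParsCit works. It therefore only accepts references shaped like Authors (Year). "Title". Venue. The input below is the 1998 CiteSeer paper as it appears in this article's reference. The other strings are hypothetical.

```python
import re

# Assumed citation style: Authors (Year). "Title". Venue. ...
# This fixed pattern is a stand-in for ParsCit, which learns field boundaries
# from labeled data instead of relying on exact punctuation.
REFERENCE_PATTERN = re.compile(
    r'^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.\s*'
    r'"(?P<title>[^"]+)"\.\s*'
    r'(?P<venue>[^.]+)'
)

def parse_reference(text):
    """Extract authors, year, title and venue, or return None if the style differs."""
    match = REFERENCE_PATTERN.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["authors"] = [name.strip() for name in fields["authors"].split(";")]
    fields["year"] = int(fields["year"])
    return fields

ref = ('Giles, C. Lee; Bollacker, Kurt D.; Lawrence, Steve (1998). '
       '"CiteSeer: an automatic citation indexing system". '
       'Proceedings of the Third ACM Conference on Digital Libraries. pp. 89–98.')
print(parse_reference(ref)["title"])  # → CiteSeer: an automatic citation indexing system
print(parse_reference("A. Author. Untitled notes. Unpublished manuscript."))  # → None

# Hypothetical reference with an abbreviated venue: the pattern silently
# truncates the venue at the first period.
print(parse_reference('Doe, Jane (2001). "A title". Proc. Example Conf.')["venue"])  # → Proc
```

The last line shows the kind of silent error the paragraph describes. The string matches the expected shape, but an abbreviation splits a field in the wrong place. Learned extractors such as ParsCit are more robust than this, but they can still misplace boundaries.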

Focused crawling

CiteSeerX crawls publicly available scholarly documents, primarily from author webpages and other open resources, and does not have access to publisher metadata. As a result, its citation counts are usually lower than those in Google Scholar and Microsoft Academic Search, which have access to publisher metadata.
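A focused crawler spends its fetch budget on links that look likely to lead to papers. Below is a minimal best-first crawler that runs over a hypothetical in-memory link graph. The URLs, the scoring heuristic and the `min_score` threshold are illustrative assumptions, not CiteSeerX's actual crawler. The sketch also shows the consequence described above: only documents reachable from public seed pages are ever harvested.

```python
import heapq

# Hypothetical link graph: page URL -> outgoing links. A real crawler would
# fetch pages over HTTP and honor robots.txt instead of reading a dict.
WEB = {
    "https://example.edu/~alice/": [
        "https://example.edu/~alice/publications.html",
        "https://example.edu/~alice/teaching.html",
    ],
    "https://example.edu/~alice/publications.html": [
        "https://example.edu/~alice/papers/p1.pdf",
        "https://example.edu/~alice/papers/p2.pdf",
        "https://publisher.example.com/article/123",
    ],
    "https://example.edu/~alice/teaching.html": [
        "https://example.edu/~alice/slides/intro.pdf",
    ],
}

def score(url):
    """Assumed relevance heuristic: favor PDFs and paper or publication pages."""
    s = 2 if url.endswith(".pdf") else 0
    if "paper" in url or "publication" in url:
        s += 1
    return s

def focused_crawl(seeds, min_score=1, max_fetches=100):
    """Best-first crawl that never enqueues links scoring below min_score."""
    frontier = [(0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    harvested = []
    fetches = 0
    while frontier and fetches < max_fetches:
        _, url = heapq.heappop(frontier)  # highest score first (scores are negated)
        fetches += 1
        if url.endswith(".pdf"):
            harvested.append(url)  # candidate scholarly document
            continue
        for link in WEB.get(url, []):
            if link not in seen and score(link) >= min_score:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return harvested

print(focused_crawl(["https://example.edu/~alice/"]))
```

With the default threshold, the teaching page and the publisher link both score 0, so they are never fetched. The crawl returns only `p1.pdf` and `p2.pdf`. Pruning keeps the crawl cheap, but anything not linked from the public seed pages never enters the index.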

Usage

CiteSeerX has nearly one million users worldwide, as estimated from unique IP addresses, and receives millions of hits daily. Annual downloads of document PDFs were nearly 200 million in 2015.

Data

CiteSeerX data is regularly shared with researchers worldwide under a Creative Commons BY-NC-SA license and has been used in many experiments and competitions.


Because it provides an OAI-PMH endpoint,[9] CiteSeerX is an open archive, and its content is indexed like that of an institutional repository by academic search engines such as BASE and by Unpaywall consumers.
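OAI-PMH is a simple HTTP and XML protocol, so a harvester needs very little code. The sketch below builds `ListRecords` requests for `oai_dc`, the Dublin Core format that every OAI-PMH repository must support. It then parses one page of results, including the resumption token used for paging. `BASE_URL` is a placeholder rather than the actual CiteSeerX endpoint. The response is a simplified, hand-written sample, so no network access is needed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
BASE_URL = "https://example.org/oai2"  # placeholder, not the real CiteSeerX endpoint

def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], next_token or None) for one response page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token marks the last page

# Simplified hand-written response (hypothetical identifier and title).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:doc-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_RESPONSE)
print(list_records_url(BASE_URL))
print(records, token)
print(list_records_url(BASE_URL, token))
```

A complete harvester would repeat this in a loop. It fetches `list_records_url(BASE_URL)`, parses the page, and requests the next page with the returned token until the token is `None`. This paging pattern lets services such as BASE index a repository incrementally.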

Other SeerSuite-based search engines

The CiteSeer model was extended to cover academic documents in business with SmealSearch and in e-business with eBizSearch. However, neither was maintained by its sponsors. An older version of both could once be found at BizSeer.IST, which is no longer in service.


Other Seer-like search and repository systems have been built for chemistry (ChemXSeer) and archaeology (ArchSeer), and another, BotSeer, was built for searching robots.txt files. All of these are built on the open-source tool SeerSuite, which uses the open-source indexer Lucene.

Giles, C. Lee; Bollacker, Kurt D.; Lawrence, Steve (1998). "CiteSeer: an automatic citation indexing system". Proceedings of the Third ACM Conference on Digital Libraries. pp. 89–98. CiteSeerX 10.1.1.30.6847. doi:10.1145/276675.276685. ISBN 978-0-89791-965-4. S2CID 514080.
