WordNet

WordNet is a lexical database that links words through semantic relations including synonymy, hyponymy, and meronymy. Synonyms are grouped into synsets with short definitions and usage examples. It can thus be seen as a combination and extension of a dictionary and a thesaurus. While it is accessible to human users via a web browser,^[2] its primary use is in automatic text analysis and artificial intelligence applications. It was first created in the English language,^[3] and the English WordNet database and software tools have been released under a BSD-style license and are freely available for download from the WordNet website. There are now WordNets in more than 200 languages.^[4]

Developer(s): Princeton University

Initial release: mid 1980s

Stable release: 3.1 / June 2011^[1]

Written in: Prolog

Operating system: Unix, Linux, Solaris, Windows

Size: 16MB (including 155,327 words organized in 175,979 synsets for a total of 207,016 word-sense pairs)

Available in: More than 200 languages

Type: Lexical database

Licence: BSD-like

Website: wordnet.princeton.edu

History and team members

WordNet was first created in 1985, in English only, in the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George Armitage Miller. It was later directed by Christiane Fellbaum. The project was initially funded by the U.S. Office of Naval Research, and later also by other U.S. government agencies including DARPA, the National Science Foundation, the Disruptive Technology Office (formerly the Advanced Research and Development Activity), and REFLEX. George Miller and Christiane Fellbaum received the 2006 Antonio Zampolli Prize for their work with WordNet.

The Global WordNet Association is a non-commercial organization that provides a platform for discussing, sharing and connecting WordNets for all languages in the world. Christiane Fellbaum and Piek Th.J.M. Vossen are its co-presidents.^[5]

The database contains 155,327 words organized in 175,979 synsets for a total of 207,016 word-sense pairs; in compressed form, it is about 12 megabytes in size.^[6]

It includes the lexical categories nouns, verbs, adjectives, and adverbs but ignores prepositions, determiners, and other function words.

Words from the same lexical category that are roughly synonymous are grouped into synsets, which include simplex words as well as collocations like "eat out" and "car pool." The different senses of a polysemous word form are assigned to different synsets. A synset's meaning is further clarified with a short defining gloss and one or more usage examples.

All synsets are connected to other synsets by means of semantic relations. These relations, which are not all shared by all lexical categories, include hypernymy among nouns and troponymy among verbs.

These semantic relations hold among all members of the linked synsets. Individual synset members (words) can also be connected with lexical relations. For example, (one sense of) the noun "director" is linked to (one sense of) the verb "direct" from which it is derived via a "morphosemantic" link.
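The distinction between synset-level semantic relations and word-level lexical relations can be illustrated with a small data structure. The following is a minimal sketch, not WordNet's actual storage format: the `Synset` class, the synset names such as `director.n`, the glosses, and the `derived_from` label are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    name: str            # hypothetical identifier, not a real WordNet ID
    pos: str             # lexical category: "n", "v", "a", or "r"
    lemmas: list         # member word forms, including collocations
    gloss: str = ""      # short definition (invented for this sketch)
    relations: dict = field(default_factory=dict)  # semantic relation -> synset names


# Toy data: a polysemous form ("direct") appears in one synset per sense.
SYNSETS = {
    s.name: s
    for s in [
        Synset("eat_out.v", "v", ["eat out", "dine out"], "toy gloss: eat at a restaurant"),
        Synset("direct.v", "v", ["direct"], "toy gloss: be in charge of"),
        Synset("direct.a", "a", ["direct"], "toy gloss: without detours"),
        Synset("director.n", "n", ["director"], "toy gloss: someone who directs",
               {"hypernym": ["person.n"]}),
        Synset("person.n", "n", ["person"], "toy gloss: a human being"),
    ]
}

# Lexical relations connect individual word senses, not whole synsets.
LEXICAL_LINKS = {
    ("director", "director.n"): [("derived_from", ("direct", "direct.v"))],
}


def senses(word):
    """Return the names of all synsets that contain the word form."""
    return sorted(name for name, s in SYNSETS.items() if word in s.lemmas)


def related(synset_name, relation):
    """Follow one semantic relation from a synset to its target synsets."""
    return SYNSETS[synset_name].relations.get(relation, [])
```

In this sketch, `senses("direct")` returns one synset per sense, while the morphosemantic link is stored against the specific pair of word form and synset rather than against the synset as a whole.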

The morphology functions of the software distributed with the database try to deduce the lemma or stem form of a word from the user's input. Irregular forms are stored in a list, and looking up "ate" will return "eat," for example.
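A lookup of this kind can be sketched as an exception list consulted before a set of suffix rules. The sketch below is an assumption about how such a routine might be organized, not the code shipped with WordNet: the `IRREGULAR` table, the `LEXICON` set, the `SUFFIX_RULES` list, and the `lemmatize` function are hypothetical and deliberately tiny.

```python
# Hypothetical irregular-form list; the real list is far larger.
IRREGULAR = {"ate": "eat", "mice": "mouse", "went": "go"}

# Hypothetical set of known base forms used to validate candidates.
LEXICON = {"eat", "mouse", "go", "walk", "canary", "bird"}

# Toy suffix rules: (ending to strip, replacement), tried in order.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]


def lemmatize(word):
    """Return a base form for the word, or None if no rule produces a known form."""
    w = word.lower()
    # Irregular forms are resolved by direct lookup.
    if w in IRREGULAR:
        return IRREGULAR[w]
    # A word that is already a base form is returned unchanged.
    if w in LEXICON:
        return w
    # Otherwise strip endings and accept the first candidate found in the lexicon.
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix):
            candidate = w[: -len(suffix)] + replacement
            if candidate in LEXICON:
                return candidate
    return None
```

Checking each candidate against the lexicon keeps the suffix rules from producing non-words; an input that matches no rule yields `None` rather than a guess.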

Psycholinguistic aspects

The initial goal of the WordNet project was to build a lexical database that would be consistent with theories of human semantic memory developed in the late 1960s. Psychological experiments indicated that speakers organized their knowledge of concepts in an economical, hierarchical fashion. The retrieval time required to access conceptual knowledge seemed to be directly related to the number of hierarchy levels the speaker needed to "traverse" to access the knowledge. Thus, speakers could more quickly verify that canaries can sing because a canary is a songbird, but required slightly more time to verify that canaries can fly (where they had to access the concept "bird" on the superordinate level) and even more time to verify that canaries have skin (requiring look-up across multiple levels of hyponymy, up to "animal").^[7]

While such psycholinguistic experiments and the underlying theories have been subject to criticism, some of WordNet's organization is consistent with experimental evidence. For example, anomic aphasia selectively affects speakers' ability to produce words from a specific semantic category, which corresponds to a WordNet hierarchy. Antonymous adjectives (WordNet's central adjectives in the dumbbell structure) are found to co-occur far more frequently than chance, a fact that has been found to hold for many languages.
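The canary example can be made concrete by counting how many hypernym links must be followed before a property is found. This is only a toy model of the hierarchical-storage idea, under the assumption that each property is stored once at a single level; the `HYPERNYM` and `PROPERTIES` tables and the `levels_to_verify` function are hypothetical.

```python
# Toy hierarchy: each concept points to its immediate superordinate.
HYPERNYM = {"canary": "songbird", "songbird": "bird", "bird": "animal", "animal": None}

# Assumption: each property is stored once, at the most general level it applies to.
PROPERTIES = {
    "songbird": {"can sing"},
    "bird": {"can fly"},
    "animal": {"has skin"},
}


def levels_to_verify(concept, prop):
    """Count the hypernym links followed before the property is found."""
    levels = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return levels
        node = HYPERNYM.get(node)
        levels += 1
    return None  # property not stored anywhere on the chain
```

Under these assumptions, verifying "can sing" for a canary takes fewer steps than "can fly", which in turn takes fewer than "has skin", mirroring the ordering of retrieval times described above.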

As a lexical ontology

WordNet is sometimes called an ontology, a persistent claim that its creators do not make. The hypernym/hyponym relationships among the noun synsets can be interpreted as specialization relations among conceptual categories. In other words, WordNet can be interpreted and used as a lexical ontology in the computer science sense. However, such an ontology should be corrected before being used, because it contains hundreds of basic semantic inconsistencies; for example, there are (i) common specializations for exclusive categories and (ii) redundancies in the specialization hierarchy. Furthermore, transforming WordNet into a lexical ontology usable for knowledge representation should normally also involve (i) distinguishing the specialization relations into subtypeOf and instanceOf relations, and (ii) associating intuitive unique identifiers with each category. Although such corrections and transformations have been performed and documented as part of the integration of WordNet 1.7 into the cooperatively updatable knowledge base of WebKB-2,^[8] most projects claiming to reuse WordNet for knowledge-based applications (typically, knowledge-oriented information retrieval) simply reuse it directly.
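One kind of redundancy in a specialization hierarchy can be checked mechanically. The sketch below assumes that "redundant" means a direct hypernym link that is already implied by a longer chain through another direct hypernym; this reading, the toy graph, and the function names are assumptions made for illustration and are not taken from the WebKB-2 work.

```python
def ancestors(graph, node):
    """Return every category reachable from node by following hypernym links."""
    seen = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


def redundant_links(graph):
    """List (child, parent) links already implied via another direct parent."""
    redundant = []
    for child, parents in graph.items():
        for parent in parents:
            others = [p for p in parents if p != parent]
            # The link is redundant if another direct parent already leads to it.
            if any(parent in ancestors(graph, other) for other in others):
                redundant.append((child, parent))
    return sorted(redundant)


# Toy hierarchy: "canary" lists "bird" directly although "songbird" implies it.
TOY_HIERARCHY = {
    "canary": ["songbird", "bird"],
    "songbird": ["bird"],
    "bird": ["animal"],
    "animal": [],
}
```

On the toy hierarchy, `redundant_links(TOY_HIERARCHY)` reports only the direct link from "canary" to "bird", since that link adds nothing beyond the chain through "songbird".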

WordNet has also been converted to a formal specification, by means of a hybrid bottom-up top-down methodology to automatically extract association relations from it and interpret these associations in terms of a set of conceptual relations, formally defined in the DOLCE foundational ontology.^[9]

In most works that claim to have integrated WordNet into ontologies, the content of WordNet has not simply been corrected when it seemed necessary; instead, it has been heavily reinterpreted and updated whenever suitable. This was the case when, for example, the top-level ontology of WordNet was restructured^[10] according to the OntoClean-based approach, or when it was used as a primary source for constructing the lower classes of the SENSUS ontology.

Applications

WordNet has been used for a number of purposes in information systems, including word-sense disambiguation, information retrieval, automatic text classification, automatic text summarization, machine translation and even automatic crossword puzzle generation.

A common use of WordNet is to determine the similarity between words. Various algorithms have been proposed, including measuring the distance among words and synsets in WordNet's graph structure, such as by counting the number of edges among synsets. The intuition is that the closer two words or synsets are, the closer their meaning. A number of WordNet-based word similarity algorithms are implemented in a Perl package called WordNet::Similarity,^[21] and in a Python package called NLTK.^[22] Other more sophisticated WordNet-based similarity techniques include ADW,^[23] whose implementation is available in Java. WordNet can also be used to inter-link other vocabularies.^[24]
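The edge-counting idea can be sketched on a toy hierarchy without any external library. The code below is an illustrative sketch, not the implementation used by WordNet::Similarity or NLTK: the `HYPERNYMS` graph (including the "dog" and "mammal" entries) is invented, and the score 1 / (1 + distance) is one simple way to turn an edge count into a similarity value.

```python
from collections import deque

# Toy hypernym graph: each concept lists its direct hypernyms.
HYPERNYMS = {
    "canary": ["songbird"],
    "songbird": ["bird"],
    "bird": ["animal"],
    "dog": ["mammal"],
    "mammal": ["animal"],
    "animal": [],
}


def neighbors(graph):
    """Build an undirected adjacency map from the hypernym links."""
    adjacency = {node: set() for node in graph}
    for child, parents in graph.items():
        for parent in parents:
            adjacency[child].add(parent)
            adjacency.setdefault(parent, set()).add(child)
    return adjacency


def edge_distance(graph, a, b):
    """Return the number of edges on the shortest path between a and b, or None."""
    adjacency = neighbors(graph)
    if a not in adjacency or b not in adjacency:
        return None
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, distance = queue.popleft()
        if node == b:
            return distance
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None


def path_similarity(graph, a, b):
    """Map an edge count to a score in (0, 1]; identical concepts score 1.0."""
    distance = edge_distance(graph, a, b)
    return None if distance is None else 1.0 / (1 + distance)
```

Here "canary" and "songbird" are one edge apart, while "canary" and "dog" are connected only through "animal", so the first pair receives the higher score.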

Interfaces

Princeton maintains a list of related projects^[25] that includes links to some of the widely used application programming interfaces available for accessing WordNet using various programming languages and environments.

Arabic WordNet:^[28]^[29] WordNet for the Arabic language.

Arabic Ontology, a linguistic ontology that has the same structure as WordNet and is mapped to it.

The BalkaNet project has produced WordNets for six European languages (Bulgarian, Czech, Greek, Romanian, Turkish and Serbian). For this project, a freely available XML-based WordNet editor was developed. This editor, VisDic, is no longer in active development but is still used for the creation of various WordNets. Its successor, DEBVisDic, is a client-server application and is currently used for the editing of several WordNets (Dutch in the Cornetto project, Polish, Hungarian, several African languages, Chinese).^[30]

BulNet is a Bulgarian version of the WordNet developed at the Department of Computational Linguistics of the Institute for Bulgarian Language, Bulgarian Academy of Sciences.^[31]

CWN (Chinese Wordnet or 中文詞彙網路), supported by National Taiwan University.^[32]

The EuroWordNet project^[33] has produced WordNets for several European languages and linked them together; these are not freely available, however. The Global Wordnet project attempts to coordinate the production and linking of "wordnets" for all languages.^[34] Oxford University Press, the publisher of the Oxford English Dictionary, has voiced plans to produce its own online competitor to WordNet.

FinnWordNet is a Finnish version of the WordNet where all entries of the original English WordNet were translated.^[35]

GermaNet is a German version of the WordNet developed by the University of Tübingen.^[36]

The IndoWordNet^[37] is a linked lexical knowledge base of wordnets of 18 scheduled languages of India, viz. Assamese, Bangla, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Meitei (Manipuri), Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu and Urdu.

JAWS (Just Another WordNet Subset), another French version of WordNet built using the Wiktionary and semantic spaces.^[38]

WordNet Bahasa: WordNet for the Malay and Indonesian languages, developed by Nanyang Technological University.

Malayalam WordNet, developed by Cochin University of Science and Technology.^[39]

Multilingual Central Repository (MCR) integrates, in the same EuroWordNet framework, wordnets for Spanish, Catalan, Basque, Galician and Portuguese linked to English.^[40]

The MultiWordNet project, a multilingual WordNet aimed at producing an Italian WordNet strongly aligned with the Princeton WordNet.^[41]

OpenDutchWordNet is a Dutch lexical semantic database.^[42]

OpenWN-PT is a Brazilian Portuguese version of the original WordNet, freely available for download under a CC-BY-SA license.^[43]

plWordNet^[44] is a Polish-language version of WordNet developed by Wrocław University of Technology.

PolNet is a Polish-language version of WordNet developed by Adam Mickiewicz University in Poznań (distributed under the CC BY-NC-ND 3.0 license).^[45]

WordNet Database is distributed as a dictionary package (usually a single file) for the following software:

Babylon^[65]

GoldenDict^[66]

Lingoes^[67]

LexSemantic: Digital platform for publishing reference works (dictionaries, encyclopedias, etc.). Includes WordnetPlus.


Lexical Markup Framework

Machine-readable dictionary

Synonym Ring

Taxonomy

Official website

"Malayalam WordNet"

"Adjectives, Intensifiers, Negations (AIN) Thesaurus"