Semantic Web

The Semantic Web, sometimes known as Web 3.0 (not to be confused with Web3), is an extension of the World Wide Web through standards^[1] set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.


To enable the encoding of semantics with the data, technologies such as Resource Description Framework (RDF)^[2] and Web Ontology Language (OWL)^[3] are used. These technologies are used to formally represent metadata. For example, an ontology can describe concepts, relationships between entities, and categories of things. These embedded semantics offer significant advantages such as reasoning over data and operating with heterogeneous data sources.^[4]

These standards promote common data formats and exchange protocols on the Web, most fundamentally RDF. According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries."^[5] The Semantic Web is therefore regarded as an integrator across different content and information applications and systems.
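To make the idea of RDF as a common data format more concrete, the following minimal sketch models RDF-style statements as subject, predicate, object triples in plain Python, serializes them in an N-Triples-like form, and filters them by pattern. The `example.org` IRIs, the `Literal` helper, and the function names are assumptions made for illustration; the property IRIs come from the FOAF vocabulary. A real application would normally use an RDF library rather than this hand-rolled model.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple, Union


class Literal(NamedTuple):
    """A plain string literal (hypothetical helper, no datatype or language tag)."""
    value: str


Node = Union[str, Literal]       # an IRI (as str) or a literal
Triple = Tuple[str, str, Node]   # (subject IRI, predicate IRI, object)

EX = "http://example.org/"           # placeholder namespace for illustration
FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF vocabulary namespace

TRIPLES = [
    (EX + "alice", FOAF + "name", Literal("Alice")),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", Literal("Bob")),
]


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples one statement per line, in the style of N-Triples.

    Simplification: only backslashes and double quotes are escaped.
    """
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            escaped = obj.value.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[Node] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    for triple in match(TRIPLES, predicate=FOAF + "name"):
        print(triple)
```

Because every statement has the same three-part shape, data from different sources can be merged by simply concatenating triple lists, which is the property that makes the format useful for integration.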

The term was coined by Tim Berners-Lee for a web of data (or data web)^[6] that can be processed by machines^[7]—that is, one in which much of the meaning is machine-readable. While its critics have questioned its feasibility, proponents argue that applications in library and information science, industry, biology and human sciences research have already proven the validity of the original concept.^[8]

Berners-Lee originally expressed his vision of the Semantic Web in 1999.

The 2001 Scientific American article by Berners-Lee, Hendler, and Lassila described an expected evolution of the existing Web to a Semantic Web.^[10] In 2006, Berners-Lee and colleagues stated that: "This simple idea…remains largely unrealized".^[11] In 2013, more than four million Web domains (out of roughly 250 million total) contained Semantic Web markup.^[12]

Some of the challenges for the Semantic Web include vastness, vagueness, uncertainty, inconsistency, and deceit. Automated reasoning systems will have to deal with all of these issues in order to deliver on the promise of the Semantic Web.

Vastness: The World Wide Web contains many billions of pages. The SNOMED CT medical terminology ontology alone contains 370,000 class names, and existing technology has not yet been able to eliminate all semantically duplicated terms. Any automated reasoning system will have to deal with truly huge inputs.

Vagueness: These are imprecise concepts like "young" or "tall". Vagueness arises from user queries, from concepts represented by content providers, from matching query terms to provider terms, and from trying to combine different knowledge bases with overlapping but subtly different concepts. Fuzzy logic is the most common technique for dealing with vagueness.

Uncertainty: These are precise concepts with uncertain values. For example, a patient might present a set of symptoms that correspond to a number of distinct diagnoses, each with a different probability. Probabilistic reasoning techniques are generally employed to address uncertainty.

Inconsistency: These are logical contradictions that will inevitably arise during the development of large ontologies, and when ontologies from separate sources are combined. Deductive reasoning fails catastrophically when faced with inconsistency, because "anything follows from a contradiction". Defeasible reasoning and paraconsistent reasoning are two techniques that can be employed to deal with inconsistency.

Deceit: This is when the producer of the information is intentionally misleading the consumer of the information. Cryptography techniques are currently used to alleviate this threat by providing a means to determine the information's integrity, including that which relates to the identity of the entity that produced or published the information. However, credibility issues still have to be addressed in cases of potential deceit.
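As a small illustration of how fuzzy logic treats a vague concept such as "tall", the sketch below assigns a degree of membership between 0 and 1 instead of a yes/no answer. The linear ramp and the 160 cm and 190 cm thresholds are arbitrary assumptions chosen for the example, and the `min`/`max` connectives are one common choice among several.

```python
def tall_membership(height_cm: float, lower: float = 160.0, upper: float = 190.0) -> float:
    """Degree (0.0 to 1.0) to which a height counts as "tall".

    Assumed shape: 0 below `lower`, 1 above `upper`, linear in between.
    """
    if height_cm <= lower:
        return 0.0
    if height_cm >= upper:
        return 1.0
    return (height_cm - lower) / (upper - lower)


def fuzzy_and(a: float, b: float) -> float:
    """Conjunction of two membership degrees (minimum t-norm)."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Disjunction of two membership degrees (maximum t-conorm)."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Standard fuzzy complement."""
    return 1.0 - a


if __name__ == "__main__":
    for height in (150, 175, 195):
        print(height, tall_membership(height))
```

Under these assumptions a height of 175 cm is "tall" to degree 0.5, so a query for tall people can rank results by degree rather than drawing a hard boundary.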

This list of challenges is illustrative rather than exhaustive, and it focuses on the challenges to the "unifying logic" and "proof" layers of the Semantic Web. The World Wide Web Consortium (W3C) Incubator Group for Uncertainty Reasoning for the World Wide Web^[24] (URW3-XG) final report lumps these problems together under the single heading of "uncertainty".^[25] Many of the techniques mentioned here will require extensions to the Web Ontology Language (OWL), for example to annotate conditional probabilities. This is an area of active research.^[26]
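The role that conditional probabilities play in reasoning under uncertainty can be sketched with Bayes' rule applied to the diagnosis example. The diagnoses, priors, and likelihood values below are invented purely for illustration, and the function name `posterior` is an assumption, not part of any Semantic Web standard.

```python
from typing import Dict


def posterior(priors: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' rule over mutually exclusive hypotheses.

    priors[h]      = P(h)
    likelihoods[h] = P(evidence | h)
    returns        P(h | evidence) for every hypothesis h
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: value / total for h, value in unnormalized.items()}


# Invented numbers: prior probability of each diagnosis, and the
# probability of observing a fever given each diagnosis.
PRIORS = {"flu": 0.1, "cold": 0.3, "allergy": 0.6}
FEVER_GIVEN = {"flu": 0.9, "cold": 0.2, "allergy": 0.05}

if __name__ == "__main__":
    for diagnosis, probability in posterior(PRIORS, FEVER_GIVEN).items():
        print(f"{diagnosis}: {probability:.3f}")
```

An ontology language that could annotate statements with values like those in `FEVER_GIVEN` would let a reasoner perform this kind of update over published data, which is the sort of extension the paragraph above refers to.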

Resource Description Framework (RDF), a general method for describing information

RDF Schema (RDFS)

Simple Knowledge Organization System (SKOS)

SPARQL, an RDF query language

Notation3 (N3), designed with human readability in mind

N-Triples, a format for storing and transmitting data

Turtle (Terse RDF Triple Language)

Web Ontology Language (OWL), a family of knowledge representation languages

Rule Interchange Format (RIF), a framework of web rule language dialects supporting rule interchange on the Web

JavaScript Object Notation for Linked Data (JSON-LD), a JSON-based method to describe data

ActivityPub, a generic way for client and server to communicate with each other. This is used by the popular decentralized social network Mastodon.
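As an illustration of one of the formats listed above, the sketch below parses a small JSON-LD document with Python's standard `json` module and resolves its short property names to full IRIs through the `@context`. The document content and the `example.org` IRIs are invented for the example, and `simple_statements` is a hypothetical helper that handles only this flat case; it is not the JSON-LD expansion algorithm defined by the specification.

```python
import json
from typing import Any, Dict, List, Tuple

# A flat JSON-LD document. The @context maps short keys to property IRIs
# from the FOAF vocabulary; the example.org IRIs are placeholders.
DOC_TEXT = """
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": {"@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id"}
  },
  "@id": "http://example.org/alice",
  "name": "Alice",
  "knows": "http://example.org/bob"
}
"""


def simple_statements(doc: Dict[str, Any]) -> List[Tuple[str, str, Any]]:
    """Turn a flat JSON-LD object into (subject, predicate IRI, value) tuples.

    Simplification: assumes an inline @context, a top-level @id, and no
    nesting, arrays, or language tags.
    """
    context = doc.get("@context", {})
    subject = doc["@id"]
    statements = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        definition = context[key]
        # A term definition is either an IRI string or an object with "@id".
        predicate = definition["@id"] if isinstance(definition, dict) else definition
        statements.append((subject, predicate, value))
    return statements


doc = json.loads(DOC_TEXT)
statements = simple_statements(doc)

if __name__ == "__main__":
    for statement in statements:
        print(statement)
```

The point of the context is that ordinary JSON consumers can keep reading `name` and `knows` as plain keys, while Linked Data consumers can recover globally unambiguous property identifiers from the same document.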

Servers that expose existing data systems using the RDF and SPARQL standards. Many converters to RDF exist from different applications. Relational databases are an important source. The semantic web server attaches to the existing system without affecting its operation.^[31]
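The idea of exposing a relational source as RDF-style data without modifying it can be sketched as a read-only mapping from rows to triples. The table, the `example.org` base IRI, and the naming scheme (one subject IRI per row, one predicate IRI per column) are assumptions for illustration; production converters typically follow a declared mapping rather than this ad hoc convention.

```python
import sqlite3
from typing import Any, List, Tuple

BASE = "http://example.org/"  # placeholder namespace for illustration


def rows_to_triples(
    conn: sqlite3.Connection, table: str, key_column: str
) -> List[Tuple[str, str, Any]]:
    """Read a table and emit one triple per non-null, non-key cell.

    Only SELECT is issued, so the source database is left unchanged.
    `table` and `key_column` are interpolated into SQL and must be trusted.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key_column}"')
    columns = [description[0] for description in cursor.description]
    key_index = columns.index(key_column)
    triples = []
    for row in cursor:
        subject = f"{BASE}{table}/{row[key_index]}"
        for column, value in zip(columns, row):
            if column == key_column or value is None:
                continue  # the key is encoded in the subject; NULL yields no triple
            triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


# Hypothetical in-memory database standing in for an existing system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Alice', 'Research')")
conn.execute("INSERT INTO employee VALUES (2, 'Bob', NULL)")
triples = rows_to_triples(conn, "employee", "id")

if __name__ == "__main__":
    for triple in triples:
        print(triple)
```

Skipping NULL cells reflects a common modelling choice: in a triple-based model, missing information is usually represented by the absence of a statement rather than by an explicit null value.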

Documents "marked up" with semantic information (an extension of the HTML <meta> tags used in today's Web pages to supply information for Web search engines using web crawlers). This could be machine-understandable information about the human-understandable content of the document (such as the creator, title, description, etc.) or it could be purely metadata representing a set of facts (such as resources and services elsewhere on the site). Note that anything that can be identified with a Uniform Resource Identifier (URI) can be described, so the semantic web can reason about animals, people, places, ideas, etc. There are four semantic annotation formats that can be used in HTML documents: Microformat, RDFa, Microdata and JSON-LD.^[32] Semantic markup is often generated automatically, rather than manually.
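For the JSON-LD case, semantic markup is commonly embedded in a page inside a `script` element of type `application/ld+json`. The sketch below uses Python's standard `html.parser` to pull such blocks out of a page and decode them. The sample page and the `JsonLdExtractor` class name are invented for illustration, and the markup uses schema.org terms as an assumed vocabulary.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdExtractor(HTMLParser):
    """Collect the decoded contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.documents: List[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Invented sample page: human-readable content plus machine-readable metadata.
PAGE = """
<html>
  <head>
    <title>Example article</title>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Example article",
      "author": {"@type": "Person", "name": "Jane Doe"}
    }
    </script>
  </head>
  <body><h1>Example article</h1><p>By Jane Doe</p></body>
</html>
"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)

if __name__ == "__main__":
    for document in extractor.documents:
        print(document["@type"], "-", document["headline"])
```

The same facts appear twice in the page: once as text for human readers and once as structured data that a crawler can consume without having to interpret the visible layout.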

The intent is to enhance the usability and usefulness of the Web and its interconnected resources by creating semantic web services, such as:

Such services could be useful to public search engines, or could be used for knowledge management within an organization. Business applications include:

In a corporation, there is a closed group of users, and management is able to enforce company guidelines such as the adoption of specific ontologies and the use of semantic annotation. Compared to the public Semantic Web, there are fewer requirements on scalability, and the information circulating within a company can generally be more trusted; privacy is less of an issue outside of the handling of customer data.

Skeptical reactions

Practical feasibility

Critics question the basic feasibility of a complete or even partial fulfillment of the Semantic Web, pointing out both difficulties in setting it up and a lack of general-purpose usefulness that prevents the required effort from being invested. In a 2003 paper, Marshall and Shipman point out the cognitive overhead inherent in formalizing knowledge, compared to the authoring of traditional web hypertext.^[43]

