Extraction from structured sources to RDF

1:1 Mapping from RDB Tables/Views to RDF Entities/Attributes/Values

When building an RDB representation of a problem domain, the starting point is frequently an entity-relationship diagram (ERD). Typically, each entity is represented as a database table, each attribute of the entity becomes a column in that table, and relationships between entities are indicated by foreign keys. Each table thus defines a particular class of entity, and each row describes an entity instance, uniquely identified by a primary key. The table rows collectively describe an entity set. In an equivalent RDF representation of the same entity set, each row key serves as the subject (entity ID), each column as a predicate (attribute), and each column value as an object (attribute value), so that every row is represented by a collection of triples sharing a common subject.
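The correspondence between table rows and RDF triples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `person` table, its columns, and the base IRI `http://example.org/` are hypothetical, and the IRI layout (`table/key` for subjects, `table#column` for predicates) is one possible convention rather than a prescribed one.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base IRI, chosen for illustration


def rows_to_triples(conn, table, primary_key):
    """Map every row of `table` to (subject, predicate, object) triples.

    The table name is interpolated into the SQL string, so this sketch
    must only be called with trusted table names.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        # The primary key identifies the entity instance: it becomes the subject.
        subject = f"{BASE}{table}/{record[primary_key]}"
        # The table defines the class of the entity (assumed rdf:type convention).
        triples.append((subject, "rdf:type", f"{BASE}{table}"))
        for column, value in record.items():
            if column == primary_key or value is None:
                continue  # the key is already in the subject; NULLs yield no triple
            # Each column is a predicate, each column value an object.
            triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO person VALUES (1, 'Alice', 30)")

triples = rows_to_triples(conn, "person", "id")
for triple in triples:
    print(triple)
```

All triples produced for one row share the same subject, which is what makes the row recoverable as a single entity instance. Foreign keys are not handled here; in a fuller mapping they would typically become triples whose object is the subject IRI of the referenced row.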

part-of-speech (POS) tagging

lemmatization (LEMMA) or stemming (STEM)

word sense disambiguation (WSD, related to semantic annotation below)

named entity recognition (NER, also see IE below)

syntactic parsing, often adopting syntactic dependencies (DEP)

shallow syntactic parsing (CHUNK): if performance is an issue, chunking yields a fast extraction of nominal and other phrases

anaphor resolution (see coreference resolution in IE below, but seen here as the task of creating links between textual mentions rather than between the mention of an entity and an abstract representation of the entity)

semantic role labelling (SRL, related to relation extraction; not to be confused with semantic annotation as described below)

discourse parsing (relations between different sentences, rarely used in real-world applications)
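To make the shallow parsing (CHUNK) item in the list above more concrete, the following toy sketch extracts nominal phrases from tokens that have already been POS-tagged. It is an assumption-laden illustration, not a reference implementation: the tag names follow the Penn Treebank convention, the hand-tagged sentence is made up, and the pattern (optional determiners and adjectives followed by one or more nouns) is deliberately crude.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags (assumed tag set)
MODIFIER_TAGS = {"DT", "JJ"}              # determiners and adjectives


def noun_chunks(tagged):
    """Return phrases matching (DT|JJ)* (NN...)+ in a list of (word, tag) pairs."""
    chunks = []
    current = []
    has_noun = False

    def flush():
        nonlocal current, has_noun
        if has_noun:  # modifiers with no following noun are discarded
            chunks.append(" ".join(current))
        current, has_noun = [], False

    for word, tag in tagged:
        if tag in NOUN_TAGS:
            current.append(word)
            has_noun = True
        elif tag in MODIFIER_TAGS:
            if has_noun:
                flush()  # a modifier after a noun starts a new phrase
            current.append(word)
        else:
            flush()      # any other tag ends the current phrase
    flush()
    return chunks


sentence = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"),
    ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
phrases = noun_chunks(sentence)
print(phrases)
```

Because the chunker makes a single left-to-right pass and builds no tree, it illustrates why chunking is attractive when full syntactic parsing is too slow.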

Databases

Relational data

Software

Source code

Text

Concept mining

Graphs

Molecule mining

Sequences

Data stream mining

Web

Cluster analysis

Data archaeology

Chicco, D; Masseroli, M (2016). "Ontology-based prediction and prioritization of gene functional annotations". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 13 (2): 248–260. doi:10.1109/TCBB.2015.2459694. PMID 27045825. S2CID 2795344.