Metadata

Metadata (or metainformation) is "data that provides information about other data",^[1] but not the content of the data itself, such as the text of a message or the image itself.^[2] There are many distinct types of metadata, including:

Metadata is not strictly bound to one of these categories, as it can describe a piece of data in many other ways.

History

Metadata has various purposes. It can help users find relevant information and discover resources. It can also help organize electronic resources, provide digital identification, and archive and preserve resources. Metadata allows users to access resources by "allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information".^[8] Metadata of telecommunication activities including Internet traffic is very widely collected by various national governmental organizations. This data is used for the purposes of traffic analysis and can be used for mass surveillance.^[9]

Metadata was traditionally used in the card catalogs of libraries until the 1980s when libraries converted their catalog data to digital databases.^[10] In the 2000s, as data and information were increasingly stored digitally, this digital data was described using metadata standards.^[11]

The first description of "meta data" for computer systems is purportedly noted by MIT's Center for International Studies experts David Griffel and Stuart McIntosh in 1967: "In summary then, we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data."^[12]

Unique metadata standards exist for different disciplines (e.g., museum collections, digital audio files, websites, etc.). Describing the contents and context of data or data files increases its usefulness. For example, a web page may include metadata specifying what software language the page is written in (e.g., HTML), what tools were used to create it, what subjects the page is about, and where to find more information about the subject. This metadata can automatically improve the reader's experience and make it easier for users to find the web page online.^[13] A CD may include metadata providing information about the musicians, singers, and songwriters whose work appears on the disc.

In many countries, government organizations routinely store metadata about emails, telephone calls, web pages, video traffic, IP connections, and cell phone locations.

Metadata means "data about data". Metadata is defined as the data providing information about one or more aspects of the data; it is used to summarize basic information about data that can make tracking and working with specific data easier.^[14] Some examples include:

Means of creation of the data

Purpose of the data

Time and date of creation

Creator or author of the data

Location on a computer network where the data was created

Standards used

File size

Data quality

Source of the data

Process used to create the data

For example, a digital image may include metadata that describes the size of the image, its color depth, resolution, when it was created, the shutter speed, and other data.^[15] A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document. Metadata within web pages can also contain descriptions of page content, as well as key words linked to the content.^[16] These links are often called "metatags", which were the primary factor in determining the order of web search results until the late 1990s.^[16] Reliance on metatags in web searches decreased in the late 1990s because of "keyword stuffing",^[16] whereby metatags were largely misused to trick search engines into thinking some websites had more relevance in the search than they really did.^[16]
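As a rough illustration of the web page case described above, the following sketch reads the description and keyword metatags from a page using only Python's standard `html.parser` module. The sample page, the class name `MetaTagParser`, and the variable names are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Card catalogs</title>
<meta name="description" content="A short history of library card catalogs">
<meta name="keywords" content="library, catalog, metadata">
</head><body><p>Page content goes here.</p></body></html>
"""

parser = MetaTagParser()
parser.feed(SAMPLE_PAGE)
page_metadata = parser.meta

# The keywords metatag is conventionally a comma-separated list.
keywords = [k.strip() for k in page_metadata["keywords"].split(",")]

print(page_metadata["description"])
print(keywords)
```

Note that the parser never looks at the body text: the metatags describe the page without being part of its visible content.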

Metadata can be stored and managed in a database, often called a metadata registry or metadata repository.^[17] However, without context and a point of reference, it might be impossible to identify metadata just by looking at it.^[18] For example: by itself, a database containing several numbers, all 13 digits long could be the results of calculations or a list of numbers to plug into an equation – without any other context, the numbers themselves can be perceived as the data. But if given the context that this database is a log of a book collection, those 13-digit numbers may now be identified as ISBNs – information that refers to the book, but is not itself the information within the book.

The term "metadata" was coined in 1968 by Philip Bagley, in his book "Extension of Programming Language Concepts" where it is clear that he uses the term in the ISO 11179 "traditional" sense, which is "structural metadata" i.e. "data about the containers of data"; rather than the alternative sense "content about individual instances of data content" or metacontent, the type of data usually found in library catalogs.^[19]^[20] Since then the fields of information management, information science, information technology, librarianship, and GIS have widely adopted the term. In these fields, the word metadata is defined as "data about data".^[21] While this is the generally accepted definition, various disciplines have adopted their own more specific explanations and uses of the term.
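The 13-digit example above can be made concrete with a small sketch. An ISBN-13 carries a check digit: weighting its digits alternately by 1 and 3 gives a sum divisible by 10. The function name `is_valid_isbn13` and the sample column are assumptions for this illustration; the first two values are ISBNs that appear elsewhere in this article, and the third is made up. Passing the check is only a hint, since an arbitrary number can pass it by chance, so context is still needed to know what the numbers describe.

```python
def is_valid_isbn13(value: str) -> bool:
    """Return True if value has the shape and check digit of an ISBN-13."""
    digits = value.replace("-", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


# A column of 13-digit values with no accompanying context.
column = ["9783319408910", "9781783300525", "1234567890123"]

plausible_isbns = [value for value in column if is_valid_isbn13(value)]
print(plausible_isbns)
```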

Slate reported in 2013 that the United States government's interpretation of "metadata" could be broad, and might include message content such as the subject lines of emails.^[22]

Types

While the applications of metadata are manifold, covering a large variety of fields, there are specialized and well-accepted models to specify types of metadata. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata.^[23] Structural metadata describes the structure of database objects such as tables, columns, keys and indexes. Guide metadata helps humans find specific items and is usually expressed as a set of keywords in a natural language. According to Ralph Kimball, metadata can be divided into three categories: technical metadata (or internal metadata), business metadata (or external metadata), and process metadata.
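Structural metadata in the database sense described above (tables, columns, keys, indexes) can be inspected directly in many database systems. The sketch below uses Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info` and `PRAGMA index_list` statements; the `books` table and the index name are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
)
conn.execute("CREATE INDEX idx_books_year ON books (year)")

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    {
        "name": name,
        "type": col_type,
        "not_null": bool(notnull),
        "primary_key": bool(pk),
    }
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(books)"
    )
]

# PRAGMA index_list yields one row per index; the second field is its name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list(books)")]

for column in columns:
    print(column)
print(index_names)
conn.close()
```

The table holds no rows at all, yet the database can still report its columns, key, and indexes: that description is metadata about the container rather than about any stored value.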

NISO distinguishes three types of metadata: descriptive, structural, and administrative.^[21] Descriptive metadata is typically used for discovery and identification, as information to search and locate an object, such as title, authors, subjects, keywords, and publisher. Structural metadata describes how the components of an object are organized. An example of structural metadata would be how pages are ordered to form chapters of a book. Finally, administrative metadata gives information to help manage the source. Administrative metadata refers to the technical information, such as file type, or when and how the file was created. Two sub-types of administrative metadata are rights management metadata and preservation metadata. Rights management metadata explains intellectual property rights, while preservation metadata contains information to preserve and save a resource.^[8]
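One way to picture the three NISO types is as three separate parts of a single record. The sketch below is an assumption-laden illustration, not a NISO schema: every class name, field name, and sample value is hypothetical, and the fields are simply taken from the examples in the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptiveMetadata:
    """Used for discovery and identification."""
    title: str
    authors: list[str]
    publisher: str = ""
    keywords: list[str] = field(default_factory=list)


@dataclass
class StructuralMetadata:
    """How components are organized: chapter title -> (first page, last page)."""
    chapters: dict[str, tuple[int, int]] = field(default_factory=dict)

    def chapter_order(self) -> list[str]:
        # Order chapters by their first page.
        return sorted(self.chapters, key=lambda name: self.chapters[name][0])


@dataclass
class AdministrativeMetadata:
    """Technical information that helps manage the resource."""
    file_type: str
    created: str
    rights: str = ""             # rights management sub-type
    preservation_note: str = ""  # preservation sub-type


@dataclass
class ResourceRecord:
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata


# Hypothetical values throughout.
record = ResourceRecord(
    descriptive=DescriptiveMetadata(
        title="An Example Book",
        authors=["A. Author"],
        publisher="Example Press",
        keywords=["metadata"],
    ),
    structural=StructuralMetadata(
        chapters={"Standards": (25, 60), "Introduction": (1, 24)}
    ),
    administrative=AdministrativeMetadata(
        file_type="application/pdf",
        created="2016-01-01",
        rights="All rights reserved",
        preservation_note="Keep an uncompressed master copy",
    ),
)

print(record.structural.chapter_order())
```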

Statistical data repositories have their own requirements for metadata in order to describe not only the source and quality of the data^[6] but also what statistical processes were used to create the data, which is of particular importance to the statistical community in order to both validate and improve the process of statistical data production.^[7]

An additional type of metadata beginning to be more developed is accessibility metadata. Accessibility metadata is not a new concept to libraries; however, advances in universal design have raised its profile.^[24]^{: 213–214} Projects like Cloud4All and GPII identified the lack of common terminologies and models to describe the needs and preferences of users and information that fits those needs as a major gap in providing universal access solutions.^[24]^{: 210–211} Those types of information are accessibility metadata.^[24]^{: 214} Schema.org has incorporated several accessibility properties based on IMS Global Access for All Information Model Data Element Specification.^[24]^{: 214} The Wiki page WebSchemas/Accessibility lists several properties and their values. While the efforts to describe and standardize the varied accessibility needs of information seekers are beginning to become more robust, their adoption into established metadata schemas has not been as developed. For example, while Dublin Core (DC)'s "audience" and MARC 21's "reading level" could be used to identify resources suitable for users with dyslexia and DC's "format" could be used to identify resources available in braille, audio, or large print formats, there is more work to be done.^[24]^{: 214}

International standards apply to metadata. Much work is being accomplished in the national and international standards communities, especially ANSI (American National Standards Institute) and ISO (International Organization for Standardization), to reach a consensus on standardizing metadata and registries. The core metadata registry standard is ISO/IEC 11179 Metadata Registries (MDR); the framework for the standard is described in ISO/IEC 11179-1:2004.^[30] A new edition of Part 1 is in its final stage for publication in 2015 or early 2016. It has been revised to align with the current edition of Part 3, ISO/IEC 11179-3:2013,^[31] which extends the MDR to support the registration of Concept Systems (see ISO/IEC 11179). This standard specifies a schema for recording both the meaning and technical structure of the data for unambiguous usage by humans and computers. The ISO/IEC 11179 standard refers to metadata as information objects about data, or "data about data".

In ISO/IEC 11179 Part 3, the information objects are data about Data Elements, Value Domains, and other reusable semantic and representational information objects that describe the meaning and technical details of a data item. This standard also prescribes the details for a metadata registry, and for registering and administering the information objects within a Metadata Registry. ISO/IEC 11179 Part 3 also has provisions for describing compound structures that are derivations of other data elements, for example through calculations, collections of one or more data elements, or other forms of derived data. While this standard describes itself originally as a "data element" registry, its purpose is to support describing and registering metadata content independently of any particular application, so that the descriptions can be discovered and reused by humans or computers in developing new applications, databases, or for analysis of data collected in accordance with the registered metadata content. This standard has become the general basis for other kinds of metadata registries, reusing and extending the registration and administration portion of the standard.

The geospatial community has a tradition of specialized geospatial metadata standards, particularly building on traditions of map and image libraries and catalogs. Formal metadata is usually essential for geospatial data, as common text-processing approaches are not applicable.

The Dublin Core metadata terms are a set of vocabulary terms that can be used to describe resources for the purposes of discovery. The original set of 15 classic^[32] metadata terms, known as the Dublin Core Metadata Element Set,^[33] are endorsed in the following standards documents:

IETF RFC 5013^[34]

ISO Standard 15836-2009^[35]

NISO Standard Z39.85^[36]

The W3C Data Catalog Vocabulary (DCAT)^[37] is an RDF vocabulary that supplements Dublin Core with classes for Dataset, Data Service, Catalog, and Catalog Record. DCAT also uses elements from FOAF, PROV-O, and OWL-Time. DCAT provides an RDF model to support the typical structure of a catalog that contains records, each describing a dataset or service.
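As a minimal sketch of how the 15-term Dublin Core Metadata Element Set mentioned above might be used in practice, the following code describes a resource with a plain dictionary and flags any keys that are not among the 15 element names. The helper `unknown_elements` and the dictionary layout are assumptions for this example; the sample record reuses the bibliographic details of the Gartner book listed elsewhere in this article. It does not model DCAT or RDF.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DCMES_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record: dict) -> list[str]:
    """Return the keys of record that are not Dublin Core elements, sorted."""
    return sorted(set(record) - DCMES_ELEMENTS)


book = {
    "title": "Metadata: Shaping Knowledge from Antiquity to the Semantic Web",
    "creator": "Gartner, Richard",
    "date": "2016",
    "publisher": "Springer",
    "identifier": "ISBN 9783319408910",
}

# "author" is a common slip; the Dublin Core element is "creator".
draft = {"title": "Untitled draft", "author": "Someone"}

print(unknown_elements(book))
print(unknown_elements(draft))
```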

Although not a standard, Microformat (also mentioned in the section metadata on the internet below) is a web-based approach to semantic markup which seeks to re-use existing HTML/XHTML tags to convey metadata. Microformat follows XHTML and HTML standards but is not a standard in itself. One advocate of microformats, Tantek Çelik, characterized a problem with alternative approaches:

Use

Photographs

Metadata may be written into a digital photo file to identify who owns it, its copyright and contact information, and what brand or model of camera created the file, along with exposure information (shutter speed, f-stop, etc.) and descriptive information, such as keywords about the photo, making the file or image searchable on a computer and/or the Internet. Some metadata is created by the camera, such as color space, color channels, exposure time, and aperture (EXIF), while some is input by the photographer and/or software after downloading to a computer.^[39] Most digital cameras write metadata about the model number, shutter speed, etc., and some enable you to edit it;^[40] this functionality has been available on most Nikon DSLRs since the Nikon D3, on most new Canon cameras since the Canon EOS 7D, and on most Pentax DSLRs since the Pentax K-3. Metadata can make organizing in post-production easier through keywording. Filters can be used to analyze a specific set of photographs and create selections based on criteria like rating or capture time. On devices with geolocation capabilities like GPS (smartphones in particular), the location where the photo was taken may also be included.
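The filtering step described above, selecting photographs by rating or capture time, might look like the sketch below. It assumes the metadata has already been read from the files into simple records; the `PhotoMetadata` class, the `select_photos` function, and all sample values are hypothetical, and no real EXIF library is used.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PhotoMetadata:
    filename: str
    camera_model: str
    captured_at: datetime
    rating: int  # e.g. stars assigned in post-production


def select_photos(
    photos: list[PhotoMetadata],
    min_rating: int = 0,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> list[PhotoMetadata]:
    """Keep photos with at least min_rating, captured within [start, end]."""
    selected = []
    for photo in photos:
        if photo.rating < min_rating:
            continue
        if start is not None and photo.captured_at < start:
            continue
        if end is not None and photo.captured_at > end:
            continue
        selected.append(photo)
    return selected


photos = [
    PhotoMetadata("img_001.jpg", "Nikon D3", datetime(2013, 6, 12, 9, 30), 4),
    PhotoMetadata("img_002.jpg", "Canon EOS 7D", datetime(2013, 6, 12, 18, 5), 2),
    PhotoMetadata("img_003.jpg", "Pentax K-3", datetime(2013, 6, 13, 7, 45), 5),
]

best_of_june_12 = select_photos(
    photos,
    min_rating=3,
    start=datetime(2013, 6, 12),
    end=datetime(2013, 6, 12, 23, 59, 59),
)
print([photo.filename for photo in best_of_june_12])
```

The selection never opens the image data itself; it works entirely on the descriptive and technical fields attached to each file.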

Photographic Metadata Standards are governed by organizations that develop the following standards. They include, but are not limited to:

A physical item such as a book, CD, DVD, a paper map, chair, table, flower pot, etc.

An electronic file such as a digital image, digital photo, electronic document, program file, database table, etc.

Administration and management

Storage

Metadata can be stored either internally,^[107] in the same file or structure as the data (this is also called embedded metadata), or externally, in a separate file or field from the described data. A data repository typically stores the metadata detached from the data but can be designed to support embedded metadata approaches. Each option has advantages and disadvantages:
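The two storage approaches described above can be contrasted in a small sketch. Here "embedded" means a single JSON file holding both the data and its metadata, and "external" means a data file accompanied by a separate sidecar file. The file names, the `.meta.json` suffix, and the helper functions are all assumptions chosen for this illustration rather than an established convention.

```python
import json
import tempfile
from pathlib import Path


def write_embedded(path: Path, data: str, metadata: dict) -> None:
    # Data and metadata travel together in one file.
    path.write_text(json.dumps({"metadata": metadata, "data": data}), encoding="utf-8")


def read_embedded(path: Path) -> tuple[str, dict]:
    document = json.loads(path.read_text(encoding="utf-8"))
    return document["data"], document["metadata"]


def sidecar_path(path: Path) -> Path:
    # Hypothetical naming rule: notes.txt -> notes.txt.meta.json
    return path.with_name(path.name + ".meta.json")


def write_external(path: Path, data: str, metadata: dict) -> None:
    # The data file stays untouched; metadata lives in a separate file.
    path.write_text(data, encoding="utf-8")
    sidecar_path(path).write_text(json.dumps(metadata), encoding="utf-8")


def read_external(path: Path) -> tuple[str, dict]:
    data = path.read_text(encoding="utf-8")
    metadata = json.loads(sidecar_path(path).read_text(encoding="utf-8"))
    return data, metadata


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    metadata = {"author": "A. Writer", "created": "2013-06-12"}  # hypothetical
    write_embedded(root / "embedded.json", "hello", metadata)
    write_external(root / "notes.txt", "hello", metadata)
    embedded_result = read_embedded(root / "embedded.json")
    external_result = read_external(root / "notes.txt")
    files_written = sorted(p.name for p in root.iterdir())

print(files_written)
```

In this sketch, copying only `notes.txt` would leave its sidecar behind, whereas the embedded file cannot be separated from its metadata but must be parsed before the data can be read.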

Popular culture

One of the first satirical examinations of the concept of metadata as we understand it today is American science fiction author Hal Draper's short story "MS Fnd in a Lbry" (1961). Here, the knowledge of all mankind is condensed into an object the size of a desk drawer; however, the magnitude of the metadata (e.g. catalog of catalogs of..., as well as indexes and histories) eventually leads to dire yet humorous consequences for the human race. As a cautionary tale, the story prefigures the modern consequences of allowing metadata to become more important than the real data it is concerned with, and the risks inherent in that eventuality.

Gartner, Richard. 2016. Metadata: Shaping Knowledge from Antiquity to the Semantic Web. Springer. ISBN 9783319408910.

Zeng, Marcia & Qin, Jian. 2016. Metadata. Facet. ISBN 9781783300525.

Understanding Metadata: What is metadata, and what is it for? – NISO, 2017

"A Guardian guide to your metadata" – The Guardian, Wednesday 12 June 2013.

Metacrap: Putting the torch to 7 straw-men of the meta-utopia – Cory Doctorow's opinion on the limitations of metadata on the Internet, 2001

DataONE Investigator Toolkit

Journal of Library Metadata, Routledge, Taylor & Francis Group, ISSN 1937-5034

International Journal of Metadata, Semantics and Ontologies (IJMSO), Inderscience Publishers, ISSN 1744-263X

"Metadata and metacontent" (PDF). Retrieved 25 June 2011.

LPR Standards (PDF), Department of Homeland Security (October 2012)