
Open data

Open data is data that is openly accessible, exploitable, editable and shared by anyone for any purpose. Open data is licensed under an open license.[1][2][3]

The goals of the open data movement are similar to those of other "open(-source)" movements such as open-source software, open-source hardware, open content, open specifications, open education, open educational resources, open government, open knowledge, open access, open science, and the open web. The growth of the open data movement is paralleled by a rise in intellectual property rights.[4] The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives Data.gov, Data.gov.uk and Data.gov.in.


Open data can also be linked data, in which case it is referred to as linked open data.


One of the most important forms of open data is open government data (OGD), which is a form of open data created by ruling government institutions. Open government data's importance is born from it being a part of citizens' everyday lives, down to the most routine/mundane tasks that are seemingly far removed from government.


The abbreviation FAIR/O data is sometimes used to indicate that the dataset or database in question complies with the principles of FAIR data and carries an explicit data‑capable open license.

Overview

The concept of open data is not new, but a formalized definition is relatively new. Open data as a phenomenon denotes that governmental data should be available to anyone with a possibility of redistribution in any form without any copyright restriction.[5] Another definition is the Open Definition, which can be summarized as "a piece of data is open if anyone is free to use, reuse, and redistribute it – subject only, at most, to the requirement to attribute and/or share-alike."[6] Other definitions, including the Open Data Institute's "open data is data that anyone can access, use or share," offer an accessible short version but refer back to the formal definition.[7] Open data may include non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity.
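The summarized Open Definition can be read as a simple test: use, reuse, and redistribution must all be permitted, and the only conditions attached may be attribution and/or share-alike. The Python sketch below models that reading under simplifying assumptions. The `License` class, its field names, the condition labels, and the two example licenses are hypothetical illustrations, not part of the Open Definition or of any real licensing library.

```python
from dataclasses import dataclass
from typing import Optional

# Freedoms that the summarized definition requires for everyone.
REQUIRED_FREEDOMS = frozenset({"use", "reuse", "redistribute"})
# The only conditions the summary allows a license to impose.
ALLOWED_CONDITIONS = frozenset({"attribution", "share-alike"})


@dataclass(frozen=True)
class License:
    """Hypothetical, simplified model of a data license."""
    name: str
    freedoms: frozenset
    conditions: frozenset


def is_open(lic: Optional[License]) -> bool:
    """Return True if the license passes the summarized Open Definition test."""
    if lic is None:
        # No license stated: the status cannot be determined, so treat as not open.
        return False
    grants_all = REQUIRED_FREEDOMS <= lic.freedoms
    only_allowed = lic.conditions <= ALLOWED_CONDITIONS
    return grants_all and only_allowed


if __name__ == "__main__":
    # Both example licenses are made up for illustration.
    by_sa = License("Example-BY-SA", REQUIRED_FREEDOMS,
                    frozenset({"attribution", "share-alike"}))
    nc = License("Example-NC", REQUIRED_FREEDOMS,
                 frozenset({"attribution", "non-commercial"}))
    for lic in (by_sa, nc, None):
        label = lic.name if lic else "no license stated"
        print(label, "->", "open" if is_open(lic) else "not open")
```

Treating a missing license as "not open" is a deliberately conservative choice in this sketch: without an explicit statement, a re-user has no way to confirm which freedoms were granted.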


A major barrier to the open data movement is the commercial value of data. Access to, or re-use of, data is often controlled by public or private organizations. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions detract from the common good and that data should be available without restrictions or fees.


Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use; instead, they presume that not asserting copyright enters the data into the public domain. For example, many scientists do not consider the data published with their work to be theirs to control and consider the act of publication in a journal to be an implicit release of data into the commons. The lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an "Open" spirit. Because of this uncertainty, it is possible for public or private organizations to aggregate such data, claim that it is protected by copyright, and then resell it.

data.uni-muenster.de – Open data about scientific artifacts from the University of Muenster, Germany. Launched in 2011.

Dataverse Network Project – archival repository software promoting data sharing, persistent data citation, and reproducible research.[15]

linkedscience.org/data – Open scientific datasets encoded as Linked Data. Launched in 2011, ended 2018.[16][17]

systemanaturae.org – Open scientific datasets related to wildlife classified by animal species. Launched in 2015.[18]

At a small level, a business or research organization's policies and strategies towards open data will vary, sometimes greatly. One common strategy employed is the use of a data commons. A data commons is an interoperable software and hardware platform that aggregates (or collocates) data, data infrastructure, and data-producing and data-managing applications in order to better allow a community of users to manage, analyze, and share their data with others over both short- and long-term timelines.[46][47][48] Ideally, this interoperable cyberinfrastructure should be robust enough "to facilitate transitions between stages in the life cycle of a collection" of data and information resources[46] while still being driven by common data models and workspace tools enabling and supporting robust data analysis.[48] The policies and strategies underlying a data commons will ideally involve numerous stakeholders, including the data commons service provider, data contributors, and data users.[47]

Grossman et al.[47] suggest six major considerations for a data commons strategy that better enables open data in businesses and research organizations. Such a strategy should address the need for:

permanent, persistent digital IDs, which enable access controls for datasets;

permanent, discoverable metadata associated with each digital ID;

application programming interface (API)-based access, tied to an authentication and authorization service;

data portability;

data "peering," without access, egress, and ingress charges; and

a rationed approach to users computing data over the data commons.
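The six considerations listed in this section can be treated as a checklist against which a draft data commons strategy is reviewed. The sketch below is one possible way to encode that checklist in Python; the `Consideration` enum, the `unaddressed` helper, and the example draft are hypothetical names introduced for illustration and do not come from the cited work.

```python
from enum import Enum


class Consideration(Enum):
    """The six considerations, paraphrased from the list in this section."""
    PERSISTENT_IDS = "permanent, persistent digital IDs enabling access controls"
    DISCOVERABLE_METADATA = "permanent, discoverable metadata for each digital ID"
    API_ACCESS = "API-based access tied to authentication and authorization"
    PORTABILITY = "data portability"
    PEERING = "data peering without access, egress, or ingress charges"
    RATIONED_COMPUTE = "rationed approach to users computing over the commons"


def unaddressed(strategy: set) -> list:
    """Return the considerations a draft strategy does not yet cover, in list order."""
    return [c for c in Consideration if c not in strategy]


if __name__ == "__main__":
    # A made-up draft strategy that covers only three of the six items.
    draft = {
        Consideration.PERSISTENT_IDS,
        Consideration.API_ACCESS,
        Consideration.PORTABILITY,
    }
    for gap in unaddressed(draft):
        print("Not yet addressed:", gap.value)
```

An enum keeps the checklist closed and ordered, so a review always reports gaps in the same order as the original list.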


Beyond individual businesses and research centers, and at a more macro level, countries like Germany[49] have launched their own official nationwide open data strategies, detailing how data management systems and data commons should be developed, used, and maintained for the greater public good.

Opening government data is only a waypoint on the road to improving education, improving government, and building tools to solve other real-world problems. While many arguments have been made categorically, the following discussion of arguments for and against open data highlights that these arguments often depend highly on the type of data and its potential uses.

Arguments made on behalf of open data include the following:

"Data belongs to the human race". Typical examples are genomes, data on organisms, medical science, environmental data following the Aarhus Convention.

Public money was used to fund the work, and so it should be universally available.[50]

It was created by or at a government institution (this is common in US National Laboratories and government agencies).

Facts cannot legally be copyrighted.

Sponsors of research do not get full value unless the resulting data are freely available.

Restrictions on data re-use create an anticommons.

Data are required for the smooth process of running communal human activities and are an important enabler of socio-economic development (health care, education, economic productivity, etc.).[51]

In scientific research, the rate of discovery is accelerated by better access to data.[52]

Making data open helps combat "data rot" and ensure that scientific research data are preserved over time.[53][54]

Statistical literacy benefits from open data. Instructors can use locally relevant data sets to teach statistical concepts to their students.[55][56]

Allowing open data in the scientific community is essential for increasing the rate of discoveries and recognizing significant patterns.[57]


It is generally held that factual data cannot be copyrighted.[58] Publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.


While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.


Unlike open access, where groups of publishers have stated their concerns, open data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.


Arguments against making all data available as open data include the following:


The paper entitled "Optimization of Soft Mobility Localization with Sustainable Policies and Open Data"[62] argues that open data is a valuable tool for improving the sustainability and equity of soft mobility in cities. The author argues that open data can be used to identify the needs of different areas of a city, develop algorithms that are fair and equitable, and justify the installation of soft mobility resources.

The goals of the Open Data movement are similar to those of other "Open" movements.

Open access is concerned with making scholarly publications freely available on the internet. In some cases, these articles include open datasets as well.

Open specifications are documents describing file types or protocols, where the documents are openly licensed. These specifications are primarily meant to improve different software handling the same file types or protocols, but monopolists forced by law into open specifications might make it more difficult.

Open content is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.

Open knowledge. Open Knowledge International argues for openness in a range of issues including, but not limited to, those of open data. It covers (a) data, whether scientific, historical, geographic or otherwise; (b) content such as music, films, and books; and (c) government and other administrative information. Open data is included within the scope of the Open Knowledge Definition, which is alluded to in Science Commons' Protocol for Implementing Open Access Data.[63]

Open notebook science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.

Open-source software is concerned with the open-source licenses under which computer programs can be distributed and is not normally concerned primarily with data.

Open educational resources are freely accessible, openly licensed documents and media that are useful for teaching, learning, and assessing as well as for research purposes.

Open research/open science/open science data (linked open science) means an approach to open and interconnect scientific assets like data, methods and tools with linked data techniques to enable transparent, reproducible and interdisciplinary research.[64]

Open-GLAM (Galleries, Library, Archives, and Museums)[65] is an initiative and network that supports exchange and collaboration between cultural institutions that support open access to their digitalized collections. The GLAM-Wiki Initiative helps cultural institutions share their openly licensed resources with the world through collaborative projects with experienced Wikipedia editors. Open Heritage Data is associated with Open GLAM, as openly licensed data in the heritage sector is now frequently used in research, publishing, and programming,[66] particularly in the Digital Humanities.

Open Data as commons

Ideas and definitions

Formally, the definitions of both Open Data and commons revolve around the concept of shared resources with a low barrier to access. In substance, digital commons include Open Data, in that they comprise resources maintained online, such as data.[67] The operational principles of Open Data show how Open Data and (digital) commons overlap in practice. These principles sometimes differ depending on the type of data under scrutiny.[68] Nonetheless, they largely overlap, and their key rationale is the lack of barriers to the re-use of data(sets).[68] Regardless of their origin, principles across types of Open Data hint at the key elements of the definition of commons, for instance accessibility, re-use, findability, and non-proprietary status.[68] Additionally, although to a lesser extent, the threats and opportunities associated with Open Data and with commons are similar. In short, they revolve around the (risks and) benefits associated with the (uncontrolled) use of common resources by a large variety of actors.

The System

Both commons and Open Data can be defined by the features of the resources that fit under these concepts, but they can also be defined by the characteristics of the systems their advocates push for. Governance is a focus for both Open Data and commons scholars.[68][67] The key element that sets commons and Open Data apart is their difference from (and perhaps opposition to) the dominant market logics as shaped by capitalism.[67] Perhaps it is this feature that emerges in the recent surge of the concept of commons as related to a more social look at digital technologies in the specific forms of digital and, especially, data commons.

Real-life case

Application of open data for societal good has been demonstrated in academic research works.[69] The paper "Optimization of Soft Mobility Localization with Sustainable Policies and Open Data" uses open data in two ways. First, it uses open data to identify the needs of different areas of a city. For example, it might use data on population density, traffic congestion, and air quality to determine where soft mobility resources, such as bike racks and charging stations for electric vehicles, are most needed. Second, it uses open data to develop algorithms that are fair and equitable. For example, it might use data on the demographics of a city to ensure that soft mobility resources are distributed in a way that is accessible to everyone, regardless of age, disability, or gender. The paper also discusses the challenges of using open data for soft mobility optimization. One challenge is that open data is often incomplete or inaccurate. Another challenge is that it can be difficult to integrate open data from different sources. Despite these challenges, the paper argues that open data is a valuable tool for improving the sustainability and equity of soft mobility in cities.
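The first use described above, identifying where soft mobility resources are most needed, can be illustrated with a weighted need score computed from open indicators. The Python sketch below is not the paper's actual method. The `need_scores` function, the min-max normalization, the equal weights, the indicator names, and the three districts with their figures are all hypothetical placeholders chosen only to show the idea.

```python
def min_max(values):
    """Rescale a list of numbers to the range [0, 1]; constant lists map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def need_scores(areas, weights):
    """Combine normalized indicators into one need score per area.

    areas:   dict mapping area name -> dict of indicator name -> value,
             where a higher value is assumed to mean greater need.
    weights: dict mapping indicator name -> weight.
    """
    names = list(areas)
    scores = {name: 0.0 for name in names}
    for indicator, weight in weights.items():
        normalized = min_max([areas[name][indicator] for name in names])
        for name, value in zip(names, normalized):
            scores[name] += weight * value
    return scores


if __name__ == "__main__":
    # Made-up figures for three made-up districts.
    districts = {
        "District A": {"population_density": 9000, "congestion": 0.8, "pm25": 30},
        "District B": {"population_density": 5000, "congestion": 0.5, "pm25": 20},
        "District C": {"population_density": 1000, "congestion": 0.2, "pm25": 10},
    }
    equal_weights = {"population_density": 1 / 3, "congestion": 1 / 3, "pm25": 1 / 3}
    ranked = sorted(need_scores(districts, equal_weights).items(),
                    key=lambda item: item[1], reverse=True)
    for name, score in ranked:
        print(f"{name}: {score:.2f}")
```

Normalizing each indicator before weighting keeps one large-valued indicator, such as population density, from dominating the score. Incomplete or inaccurate open data, the challenge the paper raises, would have to be handled before a step like this.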


An example of the relationship between Open Data and commons, and of how their governance can potentially disrupt the market logic otherwise dominating big data, is a project conducted by Human Ecosystem Relazioni in Bologna (Italy). See: https://www.he-r.it/wp-content/uploads/2017/01/HUB-report-impaginato_v1_small.pdf.


This project aimed at extrapolating and identifying online social relations surrounding “collaboration” in Bologna. Data was collected from social networks and online platforms for citizen collaboration. The data was then analyzed for content, meaning, location, timeframe, and other variables. Overall, online social relations for collaboration were analyzed based on network theory. The resulting datasets have been made available online as Open Data (aggregated and anonymized); nonetheless, individuals can reclaim all their data. This has been done with the idea of making data into a commons. This project exemplifies the relationship between Open Data and commons, and how they can disrupt the market logic driving big data use, in two ways. First, it shows how projects that follow the rationale of Open Data can trigger the creation of effective data commons. The project itself offered different types of support to social network platform users to have content removed. Second, opening data regarding online social network interactions has the potential to significantly reduce the monopolistic power of social network platforms over those data.

Several funding bodies which mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR):[70]

to deposit bioinformatics, atomic and molecular coordinate data, experimental data into the appropriate public database immediately upon publication of research results.

to retain original data sets for a minimum of five years after the grant. This applies to all data, whether published or not.


Other bodies active in promoting the deposition of data as well as full text include the Wellcome Trust. An academic paper published in 2013 advocated that Horizon 2020 (the science funding mechanism of the EU) should mandate that funded projects hand in their databases as "deliverables" at the end of the project, so that they can be checked for third party usability then shared.[71]

Open knowledge

Free content

Openness

Creative Commons license

Data curation

Data governance

Data management

Data publishing

Data sharing

Demand-responsive transport

Digital preservation

FAIR data principles

International Open Data Day

Linked data and Linked open data

Open energy system databases

Urban informatics

Wikidata

List of datasets for machine-learning research

Open Standard

Open Data – An Introduction – from the Open Knowledge Foundation

Video of Tim Berners-Lee at TED (conference) 2009 calling for "Raw Data Now" (archived 10 April 2011 at the Wayback Machine)

Six minute Video of Tim Berners-Lee at TED (conference) 2010 showing examples of open data (archived 6 May 2011 at the Wayback Machine)

G8 Open Data Charter

Towards a Genealogy of Open Data – research paper tracing different historical threads contributing to current conceptions of open data.