DMP-10: Persistent and resolvable identifiers
The concept
Data will be assigned appropriate persistent, unique and resolvable identifiers to enable documents to cite the data on which they are based and to enable data providers to receive acknowledgement for use of their data.
Related terms: A persistent, unique and resolvable identifier, Persistence, Resolution to Location, Unique Identity.
Category: Curation
Explanation of the principle
Assigning a persistent, unique and resolvable digital identifier to data allows researchers and other users to communicate unambiguously the data that were used in the published research and contributes to the transparency and reproducibility of research. Persistent, unique and resolvable identifiers are an important component in the mechanism and practice of citation. They remove ambiguity about which work or data has been cited and easily allow citations to be counted and used as a metric for research contributions.
Data citations allow the user to locate the evidence underpinning a research statement, which is critical for scientific practice and the process of verification, and they provide acknowledgment of a source, which has become culturally important in the practice of attributing intellectual debt and as one of the metrics for assessing research contributions.
Improving data citation practice is an important step to ensure that contributions of data creators and data curators are acknowledged. In turn, such recognition should lead to proper financial support for data sharing and data stewardship, which are essential research lifecycle activities.
Thus, the Joint Declaration of Data Citation Principles states:
Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.
All the Data Citation Principles are relevant to this Data Management Principle.
Relatedly, the San Francisco Declaration on Research Assessment (DORA) calls for metrics relating to the value and impact of all research outputs, including datasets and software, to be included in the assessment of research contributions.
Guidance on Implementation, with Examples
The persistence, resolvability and uniqueness of an identifier depend on responsibility being taken to enact and maintain a series of key functions.
* Persistence and uniqueness of the identifier: a registration authority must ensure that the identifier is unique and that information is maintained that unambiguously associates the identifier with the resource. The identifier itself (the string of numbers or letters in whatever format) must be maintained and must not change;
* Persistence of resolution of identifier to location: a mechanism must be provided that enables the resource to be found at a specific location on a network. As noted above, this will generally be to a freely accessible ‘landing page’ providing detailed metadata relating to the data resource. If the data resource is moved, steps must be taken to ensure that the identifier resolves to the new location;
* Persistence of landing page: if for whatever reason the data holder needs to remove the data (that is, de-accession or destroy the data itself), the landing page must be maintained and must state that this step has been taken. The identifier and metadata must persist even if the data resource has been destroyed;
* Persistence checking: to maintain these functions regular checking of link resolution, resource persistence and location should be undertaken.
Organizations that maintain and provide access to data resources should ensure that these functions are carried out, whether by the organization itself or by a third party.
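The persistence-checking function above can be sketched as a small link checker. This is a minimal illustration, not a prescribed implementation: the names `CheckResult`, `http_fetch` and `check_identifiers` are hypothetical, and it assumes that each identifier is available as an HTTP(S) resolver URL and that a 2xx response means the landing page was reached.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple
import urllib.error
import urllib.request


@dataclass
class CheckResult:
    identifier_url: str
    resolved: bool
    final_url: Optional[str]  # location reached after redirects, if any
    detail: str


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Request the URL, following redirects; return (HTTP status, final URL).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.geturl()


def check_identifiers(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]] = http_fetch,
) -> List[CheckResult]:
    """Check that each identifier URL still resolves (assumed: 2xx = resolved)."""
    results = []
    for url in urls:
        try:
            status, final_url = fetch(url)
        except OSError as exc:  # covers URLError, HTTPError and socket timeouts
            results.append(CheckResult(url, False, None, f"error: {exc}"))
            continue
        resolved = 200 <= status < 300
        results.append(CheckResult(url, resolved, final_url, f"HTTP {status}"))
    return results
```

Run on a schedule, a report of this kind lets the organization (or a third party acting for it) spot identifiers that no longer resolve and repair the resolution record. The `fetch` parameter is injectable so that the logic can be exercised without network access.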
The key words here are persistence and responsibility. Clark et al. 2015 recommend that all organizations endorsing the Joint Declaration of Data Citation Principles adopt a ‘Persistence Guarantee’:
[Organization/Institution Name] is committed to maintaining persistent identifiers in [Repository Name] so that they will continue to resolve to a landing page providing metadata describing the data, including elements of stewardship, provenance, and availability.
[Organization/Institution Name] has made the following plan for organizational persistence and succession: [plan].
The capacity to deliver such a guarantee corresponds to some of the criteria for being a Trusted Digital Repository (TDR) [see DMP-7 and reference DSA/WDS]
Persistent Identifier Schemes
A number of persistent identifier schemes exist. The principal ones, summarized in Clark et al. 2015, include PURLs (Permanent Uniform Resource Locators), the Handle System, ARKs (Archival Resource Keys), CrossRef and DataCite DOIs (Digital Object Identifiers). Some databases and data archives use their own identifier system and maintain the resolution between these identifiers and a location themselves.
DOIs are built on the Handle System. CrossRef and DataCite are Registration Agencies that provide services for registering and resolving DOIs; they ensure persistence by requiring specific commitments from registering organizations and by actively monitoring compliance.
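As an illustration of resolution, a DOI name can be turned into an actionable link by prefixing it with the public DOI resolver, `https://doi.org/`. The sketch below is an assumption-laden helper rather than part of any registration agency's API: the function names are hypothetical and the validity check is deliberately loose (a `10.` prefix and a `/` separator).

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Prefixes commonly found in front of a DOI name in citations and metadata.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI name (e.g. 10.1234/abc) without any resolver prefix."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Loose sanity check only (an assumption of this sketch, not a full validator).
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build the resolver URL, percent-encoding characters unsafe in a URL path."""
    return DOI_RESOLVER + quote(normalize_doi(raw), safe="/")
```

For example, `doi_to_url("doi:10.1234/Example-Data.v1")` yields `https://doi.org/10.1234/Example-Data.v1`; the DOI shown is made up for illustration and is not registered.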
The following table, adapted from Clark et al. 2015, summarises how the most important identifier schemes used for data maintain persistence.
| Scheme | Authority | Resolution URI | Achieving Persistence | Enforcing Persistence | Action on Removal of Data Resource |
|---|---|---|---|---|---|
| PURL | Online Computer Library Centre (OCLC) | | Registration | None | Domain owner responsibility |
| ARK | Various Name Assigning or Mapping Authorities | Name Mapping Authorities | User-defined policies | Hosting server | Host-dependent; metadata should persist |
| Handle | Corporation for National Research Initiatives (CNRI) | | Registration | None | Identifier should persist |
| DataCite DOI | DataCite | | Registration with contract | Link checking | DataCite contacts owners; metadata should persist |
Data contributed to GEOSS should be assigned appropriate persistent, unique and resolvable identifiers. Both the organisation holding the data and GEOSS should indicate clearly how the data should be cited by those using the data in published work.
Metrics to measure level of adherence to the principle
Measures of adherence are as follows:
1. Assigning appropriate, persistent, unique and resolvable identifiers to data sets contributed to GEOSS;
2. Resolution of the identifier to the data landing page;
3. Clear statement on the landing page and in the GEOSS entry of how to cite the data; and
4. Good practice data citation in the GEO community.
Resource Implications of Implementation
Data archives should subscribe to a service that generates unique persistent identifiers for data and should assign an identifier to each data product that is released to the public. The data identifier assignments may be initiated automatically or manually by the archive. The recommended citation for each data product should include the data product identifier.
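A recommended citation that embeds the data product identifier could be generated along the following lines. This is a sketch under stated assumptions: the `DataProduct` fields, the function name and the citation layout (creators, year, title, version, publisher, identifier) are illustrative choices, not a format prescribed by these guidelines, and the example values are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataProduct:
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str                       # bare DOI name, e.g. "10.1234/example"
    version: Optional[str] = None


def recommended_citation(product: DataProduct) -> str:
    """Assemble a citation string that always ends with the resolvable identifier."""
    authors = "; ".join(product.creators)
    parts = [f"{authors} ({product.year}): {product.title}."]
    if product.version:
        parts.append(f"Version {product.version}.")
    parts.append(f"{product.publisher}.")
    parts.append(f"https://doi.org/{product.doi}")
    return " ".join(parts)


# Hypothetical data product; none of these values refer to a real dataset.
example = DataProduct(
    creators=["Example, A.", "Sample, B."],
    year=2015,
    title="Hypothetical Sea Surface Temperature Grid",
    publisher="Example Data Archive",
    doi="10.1234/example.sst",
    version="2.0",
)
print(recommended_citation(example))
```

Generating the citation from the same metadata record that is registered with the identifier service keeps the landing page, the GEOSS entry and the citation text consistent with one another.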
Text extracted from the Data Management Principles Implementation Guidelines