

DMP-4: Data documentation. Metadata



The concept


Data will be comprehensively documented, including all elements necessary to access, use, understand, and process, preferably via formal structured metadata based on international or community-approved standards. To the extent possible, data will also be described in peer-reviewed publications referenced in the metadata record.


Related terms: Community-Approved Standards, Documented, International Standards.


Category: Usability


Explanation of the principle


Using metadata properly to document data helps ensure that data users can access, use, understand, and process the data. Usability is maximized when all appropriate metadata elements are populated. Partial documentation reduces usability in two main ways. First, an aspect of documentation can be handled only partially: some, but not all, of the appropriate metadata elements for that aspect are populated, even while other aspects are documented completely. Second, an aspect of documentation can be ignored entirely, meaning that none of the metadata elements for that aspect are populated.
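The two failure modes above can be made concrete with a small check. The following Python sketch is illustrative only: it assumes a hypothetical grouping of metadata elements into documentation aspects (`discovery`, `access`, `provenance`) with made-up element names that do not come from any particular standard, and it labels each aspect as complete, partially documented, or ignored.

```python
from typing import Dict, List

# Hypothetical grouping of metadata elements into documentation aspects.
# A real profile would be derived from the metadata standard in use.
ASPECTS: Dict[str, List[str]] = {
    "discovery": ["title", "abstract", "keywords"],
    "access": ["distribution_url", "access_constraints"],
    "provenance": ["lineage", "processing_steps"],
}


def is_populated(value) -> bool:
    """Treat None, blank strings, and empty collections as unpopulated."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def classify_aspects(record: dict) -> Dict[str, str]:
    """Label each aspect 'complete', 'partial', or 'ignored'."""
    result = {}
    for aspect, elements in ASPECTS.items():
        filled = sum(is_populated(record.get(name)) for name in elements)
        if filled == len(elements):
            result[aspect] = "complete"
        elif filled == 0:
            result[aspect] = "ignored"   # second failure mode
        else:
            result[aspect] = "partial"   # first failure mode
    return result


record = {
    "title": "Sea surface temperature, 2010-2020",
    "abstract": "Gridded monthly means.",
    "keywords": ["oceanography", "SST"],
    "distribution_url": "https://example.org/sst",
    "access_constraints": "",
}
print(classify_aspects(record))
# {'discovery': 'complete', 'access': 'partial', 'provenance': 'ignored'}
```

The record values and URL are placeholders; the point is that an aspect can fail either by having some of its elements missing or by having all of them missing.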


The purpose of using formal, standards-based metadata for data documentation is to maximize use and reuse of the metadata across community and disciplinary boundaries. Standards facilitate the sharing of metadata between data providers and data users, either directly or through mediation technology.


When applicable, data producers should publish the methods used to create and validate the data in the peer-reviewed literature. Such publications, and other descriptions, can help users understand aspects of the data that formal metadata does not easily capture, and they should cite the data. However, publications are not a substitute for formal metadata; instead, the metadata should reference such works so that users can discover the additional documentation they contain.
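One way to make referenced publications discoverable is to record them as DOIs rather than free text. The Python sketch below assumes a hypothetical `related_publications` element and applies the regular expression Crossref suggests for matching modern DOIs. It is a syntax check only: a match does not prove that the DOI is registered or resolves, and entries written as `doi.org` URLs would need their prefix removed first.

```python
import re
from typing import List

# Pattern suggested by Crossref for modern DOIs. It covers the vast majority
# of DOIs but is not a complete grammar, and it does not check registration.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def publication_dois(record: dict) -> List[str]:
    """Return syntactically plausible DOIs from the hypothetical
    'related_publications' element, skipping free-text entries."""
    cleaned = (ref.strip() for ref in record.get("related_publications", []))
    return [ref for ref in cleaned if DOI_PATTERN.match(ref)]


record = {
    "title": "Sea surface temperature, 2010-2020",
    "related_publications": [
        "10.1000/xyz123",         # DOI-style reference (illustrative value)
        "see the methods paper",  # free text: not machine-discoverable
    ],
}
print(publication_dois(record))  # ['10.1000/xyz123']
```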


Guidance on Implementation, with Examples


Implementation requires populating metadata elements with appropriate content. Formal metadata standards for comprehensive data documentation include, among others, ISO 19115-1 (Standards ISO 2014), ISO 19115-2 (Standards ISO 2009), ISO 19139 (Standards ISO 2007), ISO 19157 (Standards ISO 2013), Dublin Core (Standards ISO-2 2009), Darwin Core, Directory Interchange Format (DIF), and Climate and Forecast (CF) metadata conventions.


Each metadata standard defines a set of suggested elements, or fields, that should be populated to cover three categories of metadata: descriptive, structural, and administrative. Data providers are responsible for creating and populating the metadata according to the standard used. Data users can then expect that, if the standard has been followed, the dataset metadata can be read and used appropriately.
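As a small illustration of populating a record "according to the standard used" and reading it back on the user side, the Python sketch below serializes a Dublin Core record with the standard library's `xml.etree.ElementTree`. The namespace and the 15 element names are those of the Dublin Core Metadata Element Set; the `metadata` wrapper element and all field values are assumptions made for this sketch, not part of any prescribed encoding.

```python
import xml.etree.ElementTree as ET

# Namespace and element names of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields: dict) -> str:
    """Serialize {element: value or [values]} as XML, rejecting non-DC names."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    # 'metadata' is a hypothetical wrapper element chosen for this sketch.
    root = ET.Element("metadata")
    for name, values in fields.items():
        if isinstance(values, str):
            values = [values]  # Dublin Core elements are repeatable
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def read_dc_record(xml_text: str) -> dict:
    """Parse the record back into {element: [values]}, as a data user would."""
    prefix = f"{{{DC_NS}}}"
    out: dict = {}
    for el in ET.fromstring(xml_text):
        if el.tag.startswith(prefix):
            out.setdefault(el.tag[len(prefix):], []).append(el.text or "")
    return out


xml_text = build_dc_record({
    "title": "Sea surface temperature, 2010-2020",
    "creator": "Example Ocean Lab",  # hypothetical creator
    "subject": ["oceanography", "sea surface temperature"],
    "rights": "CC BY 4.0",
})
print(read_dc_record(xml_text)["subject"])
# ['oceanography', 'sea surface temperature']
```

Rejecting unknown element names at write time is one simple way for a provider to stay within the standard; richer standards such as ISO 19115-1 would need a schema-aware tool rather than a hand-written list.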


Metrics to measure level of adherence to the principle


Measuring consistent adherence to metadata creation and population guidelines can be difficult. It is relatively easy to determine whether the suggested metadata fields have been left empty or populated, but much harder to determine whether populated fields contain correct and meaningful content. For example, a field that points to the location of the associated data may be populated incorrectly, or with a link that resolves to a location where the data cannot be accessed or used. The question then becomes whether the link itself is wrong or whether the metadata describing how the data can be accessed and used is incomplete or wrong. Finally, even if the link to the data and the associated fields explaining how to access it are populated correctly, users may still misunderstand the data if appropriate semantic metadata is not available.
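The gap between "populated" and "meaningfully populated" can be narrowed, though not closed, with simple heuristics. The Python sketch below separates empty fields from fields holding obvious filler text. The placeholder list and field names are hypothetical, and a result of `POPULATED` only means "not obviously filler"; whether the content is correct still requires human or domain review.

```python
from enum import Enum
from typing import Dict, Iterable


class FieldState(Enum):
    EMPTY = "empty"
    PLACEHOLDER = "placeholder"
    POPULATED = "populated"


# Hypothetical filler values; a real list would reflect local habits.
PLACEHOLDERS = {"n/a", "na", "tbd", "todo", "unknown", "none", "null", "-"}


def field_state(value) -> FieldState:
    """Classify one field value. POPULATED does not imply correct."""
    if value is None:
        return FieldState.EMPTY
    if isinstance(value, (list, tuple, set, dict)) and not value:
        return FieldState.EMPTY
    text = str(value).strip()
    if not text:
        return FieldState.EMPTY
    if text.lower() in PLACEHOLDERS:
        return FieldState.PLACEHOLDER
    return FieldState.POPULATED


def audit(record: dict, fields: Iterable[str]) -> Dict[str, FieldState]:
    """Report the state of each suggested field in a metadata record."""
    return {name: field_state(record.get(name)) for name in fields}


report = audit(
    {"title": "Sea surface temperature", "lineage": "TBD"},
    ["title", "lineage", "license"],
)
for name, state in report.items():
    print(name, state.value)
# title populated
# lineage placeholder
# license empty
```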


Four levels of metrics should be used to determine adherence to DMP-4:


  • Measure the completeness of the suggested metadata fields for the standard used, reporting the percentage of fields meaningfully populated;

  • Count the number of metadata references to other sources of documentation that describe the associated data;

  • Measure whether links work correctly, reflecting dependencies between metadata fields and information on the accessibility of other documentation; and

  • Measure the semantic success of the metadata, indicating the level at which the associated data can be understood and used in a meaningful manner.
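The first three metrics lend themselves to automation; the fourth, semantic success, generally does not. The Python sketch below computes a completeness percentage, a reference count, and per-link status, and leaves the semantic score empty for expert review. The field names `references` and `links`, the placeholder list, and the use of an HTTP `HEAD` request as the link test are all assumptions of this sketch, not requirements of DMP-4.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical values treated as "not meaningfully populated".
PLACEHOLDERS = {"", "n/a", "na", "tbd", "todo", "unknown", "none", "null"}


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request ends in a 2xx/3xx status.
    Some servers reject HEAD, so False merits a manual recheck."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):  # URLError/HTTPError subclass OSError
        return False


@dataclass
class AdherenceReport:
    completeness_pct: float                 # metric 1
    reference_count: int                    # metric 2
    links_ok: Dict[str, bool]               # metric 3
    semantic_score: Optional[float] = None  # metric 4: left to expert review


def assess(record: dict, suggested_fields: List[str],
           check: Callable[[str], bool] = http_ok) -> AdherenceReport:
    """Compute the automatable DMP-4 metrics for one metadata record."""
    if not suggested_fields:
        raise ValueError("the standard's suggested field list is empty")
    filled = sum(
        1 for name in suggested_fields
        if str(record.get(name) or "").strip().lower() not in PLACEHOLDERS
    )
    return AdherenceReport(
        completeness_pct=100.0 * filled / len(suggested_fields),
        reference_count=len(record.get("references", [])),
        links_ok={url: check(url) for url in record.get("links", [])},
    )


record = {
    "title": "Sea surface temperature, 2010-2020",
    "abstract": "TBD",
    "references": ["10.1000/xyz123"],
    "links": ["https://example.org/sst"],
}
# A stub checker keeps this example offline; omit `check` to use http_ok.
report = assess(record, ["title", "abstract", "lineage", "license"],
                check=lambda url: True)
print(report.completeness_pct, report.reference_count, report.links_ok)
# 25.0 1 {'https://example.org/sst': True}
```

Passing the link checker as a parameter keeps the network-dependent part replaceable, for example with a checker that also confirms the response is the expected data format rather than an error page.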


Resource Implications of Implementation


Organizational, administrative, financial, technical, and operational resources are needed to implement the guidelines and the metrics for measuring adherence to DMP-4. Organizational resources include formulating policy that reflects adherence and its value to the organization. Administrative resources include workflow definitions and reviews to validate adherence. Financial resources include budgets for the people, software, and hardware needed for implementation; hardware costs may be minimal compared with the cost of professional development in metadata generation, software creation and maintenance, process improvement, and evaluation. Technical resources include the tools and documents needed to implement metadata generation, its testing, and the adherence metrics. Operational resources include the time and people needed to integrate metadata generation and adherence metrics into the data provider's routine processes. Both commercial and open-source tools for capturing metadata are available.



Text extracted from the Data Management Principles Implementation Guidelines