
DMP-1: Metadata for discovery



The concept


Data and all associated metadata will be discoverable, through catalogues and search engines, and data access and use conditions, including licenses, will be clearly indicated.


Related terms: Broker, Catalogue, Clearinghouse, Core Elements, Discovery, Discovery Services, License, Metadata, Metadata Element, Network Services, Queryable, Search Engine, Use Conditions


Category: Discovery


Explanation of the principle


A visitor to a library should be able to find a desired book without having to look at every book in the library. The library’s catalogue allows the visitor to search information about the books (e.g. author, ISBN, genre), and to discover where to find a book and under what conditions or restrictions it may be read or borrowed. This “information about the book” is its metadata. Likewise, a user looking for Earth Observation resources (data, web services, models, etc.) should be able to find what they want by searching the metadata associated with each resource, including how the resource can be accessed and whether there are restrictions or conditions placed on its use. GEOSS maintains a catalogue of resources. Like a library catalogue, it does not keep copies of the resources but instead manages the metadata that allows users to locate and access them.


Not all users begin a search for resources by going to a catalogue. Instead, they may use general-purpose search engines. For this reason, catalogues may have portals, such as the geoportal, for use by humans, as well as programmatic interfaces (APIs) meant for access by search engines, metadata harvesters and the portals of other communities.


Guidance on Implementation, with Examples


The following types of metadata are particularly important for discoverability and reuse:

  • - a descriptive title and abstract;

  • - identity and contact information (e.g. ORCIDs) for the individuals responsible for the creation of the resource;

  • - identity and contact information for the individuals responsible for the management of the resource;

  • - geographic location or boundaries;

  • - temporal coverage;

  • - keywords describing the resource and the scientific or practical domain to which it applies;

  • - information on conditions and restrictions on use, in particular license information; and

  • - web links to the resource and to further information about the resource.
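
As a minimal sketch of how the list above might be applied, the following Python snippet checks a metadata record for the discovery-relevant fields. The field names (`title`, `abstract`, `bounding_box`, and so on), the function name and the sample record are illustrative assumptions; they are not taken from DataCite, Dublin Core, ISO 19115 or any other standard.

```python
# Hypothetical field names, one per metadata type listed above.
DISCOVERY_FIELDS = (
    "title",
    "abstract",
    "creators",
    "managers",
    "bounding_box",
    "temporal_coverage",
    "keywords",
    "license",
    "links",
)


def missing_discovery_fields(record):
    """Return the discovery fields that are absent or empty in `record`."""
    return [name for name in DISCOVERY_FIELDS if not record.get(name)]


# Illustrative record; every value here is made up for the example.
example_record = {
    "title": "Example sea surface temperature grid",
    "abstract": "Daily gridded sea surface temperature for an example region.",
    "creators": [{"name": "A. Researcher", "orcid": "0000-0000-0000-0000"}],  # placeholder ORCID
    "managers": [{"name": "Example Data Centre", "email": "data@example.org"}],
    "bounding_box": {"west": -10.0, "south": 35.0, "east": 5.0, "north": 45.0},
    "temporal_coverage": {"start": "2010-01-01", "end": "2010-12-31"},
    "keywords": ["sea surface temperature", "oceanography"],
    "license": "",  # left empty on purpose to show a failed check
    "links": ["https://example.org/data/sst"],
}

print(missing_discovery_fields(example_record))  # ['license']
```

A check of this kind only confirms that a field is present; whether the abstract is actually descriptive or the license correct still needs human review.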


The following guidelines will help ensure that data and services are discoverable and usable. Adherence to these guidelines is checked and assessed through certification of the data repository or service using baseline certifications such as Data Seal of Approval or World Data System certifications.


    - Catalogue entries should conform to an accepted international or community-agreed standard (e.g. DataCite, Dublin Core, ISO 19115), and all core elements of the standard should be completed;

    - The catalogue should be accessible via an accepted international or community-agreed standard protocol (e.g. OAI-PMH, OpenSearch, OGC CSW);

    - The metadata kept in the catalogue should be checked periodically to confirm that they are valid and that links to the resources still resolve and respond. If metadata are maintained in the catalogue for resources that no longer exist, a mechanism should be provided to point to updated versions, if any, or an explanation should be given of why the resources no longer exist;

    - The catalogue should provide search capabilities and the search results should display in a relevance-ranked order to reflect the user’s query;

    - GEOSS Data/Resource Providers are encouraged to register catalogues rather than individual resources where multiple resources are to be made discoverable;

    - As an alternative to creating a catalogue with a search interface, a data provider may post metadata, with links to the associated data, in a web-accessible location, which can then be harvested by search engines or metadata aggregators; and

    - Since some resources may have restrictions or other conditions of use, these should be clearly indicated in the metadata. Examples include limits on distribution, intended use, as well as licenses.
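
To illustrate what access via a standard protocol can look like, here is a minimal Python sketch that builds an OAI-PMH `ListRecords` request URL asking for Dublin Core records (`oai_dc` is the metadata prefix every OAI-PMH repository must support). The endpoint `https://catalogue.example.org/oai`, the set name `climate` and the helper name are hypothetical; the snippet only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional selective harvesting by set, if the repository defines sets.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; a real catalogue publishes its own base URL.
url = oai_list_records_url("https://catalogue.example.org/oai")
print(url)
# https://catalogue.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A harvester would fetch this URL, parse the XML response and follow any resumption token the repository returns for further pages of records; those steps are omitted here.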


See also Data Management Principle 4 ‘Metadata’, which gives additional guidelines regarding documentation that allows data to be used, understood and processed.


Metrics to measure level of adherence to the principle


Appropriate metrics relate to: 1) whether the metadata provides appropriate information for discovery and about reuse conditions; 2) whether the system providing the catalogue information follows good practice in terms of standards and performance; and, 3) whether the repository is certified as a trusted digital repository.


There are many projects and components that contribute to the implementation and measuring of metrics. Some examples of these include:


  • * Projects:

    • - The GeoViQua project utilizes metadata quality indicators.

  • * Service checkers:

    • - FGDC Service status checker;

    • - JRC Service checker.

  • * Performance indicators and availability:

    • - A catalogue should be a system or service with high availability (e.g. > 99%), engineered so that:

      • · there are no single points of failure;

      • · crossover is reliable;

      • · failures are detected as they occur.

    • - Communities indicate the need for tools to validate resources (metadata, services, data):

      • · Some tools interpret standards differently;

      • · Compliant resources should have undergone the certification process;

      • · Reference implementations are needed, along with consistent, widely publicized, community-accepted implementation guidelines.

    • - There should be a mechanism for data users to supply feedback as to the level of metadata adoption. In many cases, this is best known by the data user, and can serve as a qualitative metric.
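
The availability figure mentioned above can be made concrete with a little arithmetic. The Python sketch below converts an availability target into permitted downtime and estimates availability from a series of probe results. Both function names, the one-year period and the sample probe data are assumptions made for illustration; a real monitoring setup would define its own probing interval and success criteria.

```python
def allowed_downtime_hours(availability, period_hours=365 * 24):
    """Hours of downtime permitted in a period at a given availability (0 to 1)."""
    return (1.0 - availability) * period_hours


def measured_availability(probe_results):
    """Fraction of successful probes; `probe_results` is a list of booleans."""
    if not probe_results:
        raise ValueError("no probe results")
    return sum(probe_results) / len(probe_results)


# A 99% target leaves roughly 87.6 hours of downtime per year.
print(round(allowed_downtime_hours(0.99), 1))

# Made-up probe history: 995 successful checks out of 1000.
probes = [True] * 995 + [False] * 5
print(measured_availability(probes))
```

Under these assumptions, the sample history gives an availability of 0.995, which would meet a 99% target.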


Resource Implications of Implementation


Key resource consumers include metadata authoring and maintenance, and standing up and maintaining a catalogue service. Cost estimates for these activities have been made by many data management organizations, such as the Italian National Research Council (CNR) and various EC member states. In particular, CNR has estimated the cost of operating the GEO Discovery and Access Broker (DAB), and several EC member states have estimated the cost of operating a Spatial Data Infrastructure (SDI).



Text extracted from the Data Management Principles Implementation Guidelines