DMP-8: Data and metadata verification
The concept
Data and associated metadata held in data management systems will be periodically verified to ensure integrity, authenticity and readability.
Related terms: Authenticity, Integrity, Readability.
Category: Preservation
Explanation of the principle
Among the actions performed by TDRs, described above in DMP-7, periodic checking and transformation (file migration) of data is particularly important, because it ensures that the data do not become obsolete. Constant and careful maintenance of the preserved data sets (data and associated knowledge) is necessary to ensure data integrity, authenticity and readability, and thus usability over the long term. Curation and maintenance of Archives and Data Management Systems consist of all the activities aimed at guaranteeing the integrity, authenticity and readability of the archived and preserved data. This covers the storage of equipment, media and hard disk arrays in secured and environmentally controlled rooms, together with a set of defined activities to be performed on a routine basis: migration to new systems and media in line with the evolution of technology and the consumer market, data compacting, and conversion of data formats and packaging. Data holders and archive owners need to design a maintenance scheme for their Archives and Data Management Systems to guarantee the integrity of the archived and collected data.
Guidance on Implementation, with Examples
1. Archived data refreshment: Periodically perform a migration of the archived data (“media refreshment”) to the most adequate proven technology for data storage, to ensure data access preservation. Technology selection should not only be based on technical and cost aspects, but should also aim at the minimization of environmental impact (e.g. in terms of power consumption, thermal dissipation, etc.);
2. Archived data formats description: Provide a formal description of old archiving formats to allow conversion to new standard formats, which will increase technical compatibility and reduce the diversity of formats and interfaces between archives;
3. Archived data duplication: Maintain identical copies of all archived data applying one of the security levels defined below:
a. Dual copy in the same geographical location (but in different buildings) to avoid data loss due to media degradation or obsolescence, or
b. Dual copy in the same geographical location (but in different buildings), based on different technologies to avoid failures inherent to a single technology, or
c. Dual copy in two different geographical locations to safeguard the archive from external hazards (e.g. floods and other natural and technological hazards), or
d. Dual copy in two different geographical locations, based on different technologies to avoid failures inherent to a single technology.
4. Archive system components migration (hardware): Perform periodic migration of archive system components to new hardware platforms.
5. Media readability and accessibility tests: Perform periodic tests of media readability and accessibility on a representative set of the archived data.
6. Archive content integrity: Periodically verify the integrity of the archive collection/content through integrity checks on a representative set of the archived data.
7. Data content integrity: Ensure that archived content and associated information remain unchanged and, if changes are made, that these are documented and that this documentation is preserved and made available as well (provenance information).
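Items 5 and 6 above are often implemented as a checksum-based "fixity" check. The following Python sketch is illustrative only and is not part of the Guidelines: it assumes a file-based archive, SHA-256 checksums recorded at ingest time, and simple random sampling as the "representative set". The function names (build_manifest, verify_sample) and the 10% default sample fraction are hypothetical choices.

```python
import hashlib
import random
from pathlib import Path
from typing import Dict, Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_root: Path) -> Dict[str, str]:
    """Record a checksum for every file under archive_root (relative path -> digest)."""
    return {
        str(p.relative_to(archive_root)): sha256_of(p)
        for p in sorted(archive_root.rglob("*"))
        if p.is_file()
    }


def verify_sample(
    archive_root: Path,
    manifest: Dict[str, str],
    sample_fraction: float = 0.1,  # assumed sampling rate, not from the Guidelines
    seed: Optional[int] = None,
) -> Dict[str, str]:
    """Check a random sample of manifest entries and return {path: problem}."""
    names = sorted(manifest)
    if not names:
        return {}
    sample_size = min(len(names), max(1, round(len(names) * sample_fraction)))
    problems: Dict[str, str] = {}
    for name in random.Random(seed).sample(names, sample_size):
        try:
            actual = sha256_of(archive_root / name)
        except OSError as exc:
            # Readability/accessibility failure (item 5).
            problems[name] = f"unreadable: {exc}"
            continue
        if actual != manifest[name]:
            # Content no longer matches what was ingested (item 6).
            problems[name] = "checksum mismatch"
    return problems
```

In such a scheme the manifest itself would be stored with each copy of the archive, and any problem reported by verify_sample would be repaired from the duplicate copy and recorded as provenance information.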
Metrics to measure level of adherence to the principle
The level of adherence can be measured by conformance to the Data Preservation Guidelines in point C above, or to ISO 16363:2012 - Space data and information transfer systems - Audit and certification of trustworthy digital repositories (CCSDS 652.0-M-1), the standard used to assess the trustworthiness of a generic digital repository.
Resource Implications of Implementation
Estimating the resource cost of long-term digital preservation has received much attention from the many organisations (e.g. companies, digital libraries, research data centres) interested in preserving their data. The cost depends on the organisation and on the data to be preserved (e.g. volume, format, etc.), and can therefore only be modelled here. Cost modelling techniques are used to estimate the costs involved in digital asset preservation and their economic impact on the organisation. Generic cost models follow two main steps:
1. Identifying resource costs and activities
Activities identified for the Archiving process include managing storage, refreshment, migration, reporting, back-up, reformatting/repackaging, test and integrity verification, and reporting on archived data formats. Resources needed to complete the cost analysis include human resources and equipment, office/work space, IT services and technology, and other utilities. Usability and integrity are core parameters for quantifying impact.
| Activities | Parameters | Impact |
|---|---|---|
| Manage Storage | Usability (Readability, Authenticity); Integrity | This activity is very important in order to ensure the physical preservation of digital data and consequently the physical access to it, that is, to maintain the data and the technologies (HW, SW) used for accessing the data. If this activity is incorrectly performed, the risk of losing the data, as well as the ability to access the data, is very high. |
| Manage Refreshment; Manage Migration; Manage Reporting; Manage Backup; Manage Reformatting/Repackaging; Manage Test and Integrity Verification | Usability (Readability, Authenticity); Integrity | These activities are very important in order to ensure the physical preservation of digital data and consequently the physical access to it, and its availability over time. Without such activities, the data can be lost in the long term without the possibility of recovering it or, if the activities are not correctly managed, access to the data could be lost. |
| Report on archived data format | Integrity | This activity is relevant in order to ensure the traceability of each action on the data. It can support the integrity and completeness of the data and information provided to the data users. |
2. Assigning resource costs to activities and assigning activity costs to cost objects
This step should be carried out using simulated and estimated values.
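The two-stage allocation in step 2 can be sketched as simple arithmetic. The Python example below is a minimal illustration, not a cost model from the Guidelines: every figure, resource name, activity share and cost object (here, two hypothetical data sets) is a placeholder that an organisation would replace with its own simulated or estimated values.

```python
from typing import Dict

# Hypothetical annual resource costs (placeholder figures).
resource_costs: Dict[str, float] = {
    "staff": 120_000.0,
    "storage_hardware": 40_000.0,
    "it_services": 20_000.0,
}

# Estimated share of each resource consumed by each activity (each row sums to 1).
resource_to_activity: Dict[str, Dict[str, float]] = {
    "staff": {"manage_storage": 0.25, "manage_migration": 0.25, "integrity_verification": 0.5},
    "storage_hardware": {"manage_storage": 0.75, "manage_migration": 0.25},
    "it_services": {"manage_storage": 0.5, "integrity_verification": 0.5},
}

# Estimated share of each activity attributed to each cost object (each row sums to 1).
activity_to_object: Dict[str, Dict[str, float]] = {
    "manage_storage": {"dataset_A": 0.5, "dataset_B": 0.5},
    "manage_migration": {"dataset_A": 0.25, "dataset_B": 0.75},
    "integrity_verification": {"dataset_A": 0.5, "dataset_B": 0.5},
}


def allocate(costs: Dict[str, float], shares: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Distribute each source cost over its targets according to the given shares."""
    allocated: Dict[str, float] = {}
    for source, amount in costs.items():
        for target, share in shares[source].items():
            allocated[target] = allocated.get(target, 0.0) + amount * share
    return allocated


# Stage 1: resource costs -> activity costs.
activity_costs = allocate(resource_costs, resource_to_activity)
# Stage 2: activity costs -> cost objects.
object_costs = allocate(activity_costs, activity_to_object)

for name, cost in sorted(object_costs.items()):
    print(f"{name}: {cost:,.0f}")
```

Because every row of shares sums to 1, the total cost is conserved at each stage, which gives a quick sanity check on the estimated shares.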
Text extracted from the Data Management Principles Implementation Guidelines