Archiving and Preservation for Research Environments

Assessing the FAIRness of the ARCHIVER long-term data preservation services

The European Open Science Cloud (EOSC) initiative has worked extensively to promote and enable access to Open Science data, with the stated aim of ensuring that researchers can maximise the value of their research processes while sharing large-scale Research Infrastructures (RIs). The importance of advanced long-term preservation in enabling the reproducibility of research results is emphasised by the EOSC Strategic Research and Innovation Agenda (SRIA) and by reports of relevant bodies such as the Digital Preservation Coalition. ARCHIVER is a unique initiative currently running in the EOSC framework that competitively procured R&D services for archiving and digital preservation, with the tenderers selected through an open procurement process. Between December 2020 and August 2021, three consortia worked on innovative prototype solutions for long-term data preservation, in close collaboration with CERN, EMBL-EBI, DESY and PIC. The procured R&D services address long-term preservation needs across the entire research data management cycle.

The resulting ARCHIVER services provide the research community with a set of trustworthy scientific data repository services that follow best practices, operate at scale, serve FAIR data and are commercialised under transparent business models in conformance with current European legislation (e.g. GDPR, Free Flow of Data), to be made available through the European Open Science Cloud marketplace. These trustworthy repositories follow recognised certification levels such as ISO 14721, CoreTrustSeal and ISO 16363, support FAIR data with field-tested exit strategies across preservation environments, and are based on open APIs and adherent to open standards, as a combined strategy to ensure long-term value and to enlarge the market for preservation service providers.

 

Assessing the FAIRness of the ARCHIVER long-term data preservation services

As part of its R&D validation process, ARCHIVER assessed the FAIRness of the resulting repository services.

The F-UJI tool, developed in the context of the FAIRsFAIR project, responds to this need: it provides a programmatic assessment of the FAIRness of research data objects based on metrics developed by FAIRsFAIR, breaking them down into concrete tests that could be included in ARCHIVER.
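
As an illustration of what such a programmatic assessment looks like, the sketch below calls a deployed F-UJI instance through its REST evaluation endpoint to score a single data object. The endpoint path, port, credentials and object identifier are assumptions based on a typical local F-UJI deployment, not the actual ARCHIVER test setup.

    # Minimal sketch, assuming a locally running F-UJI service; the endpoint,
    # credentials and identifier below are illustrative placeholders, not the
    # ARCHIVER configuration.
    import json
    import requests

    FUJI_ENDPOINT = "http://localhost:1071/fuji/api/v1/evaluate"  # assumed local deployment
    AUTH = ("username", "password")  # replace with the credentials of the deployed service

    payload = {
        # Persistent identifier or landing-page URL of the data object to assess
        "object_identifier": "https://doi.org/10.xxxx/example",  # hypothetical identifier
        "use_datacite": True,
    }

    response = requests.post(FUJI_ENDPOINT, json=payload, auth=AUTH, timeout=300)
    response.raise_for_status()
    report = response.json()

    # The report contains one entry per FAIR metric test plus an overall summary;
    # the exact field names depend on the F-UJI version in use.
    print(json.dumps(report.get("summary", report), indent=2))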


The initial assessment started by gathering some basic information about the current repositories from the organisations involved in the ARCHIVER project, namely EMBL-EBI, DESY, PIC and CERN, in order to become familiar with the tool.

The following information was shared:

  • Data domains (scientific discipline, community)
  • Assessment Target (e.g. subset of data holdings)
  • Data access level (e.g. if restrictions in place)
  • Meta(data) dissemination (OAI-PMH, REST, Content Negotiation, Schema.org); a content-negotiation sketch follows this list
  • Metadata standards (e.g. DDI, Dublin Core, schema.org)
  • Semantics (SPARQL endpoint, Vocabularies)
  • Data formats (e.g. discipline specific formats)
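
As a small, hedged illustration of the meta(data) dissemination item above, the sketch below probes whether a repository record exposes machine-readable metadata through HTTP content negotiation; the record URL is a hypothetical placeholder, not one of the ARCHIVER repositories.

    # Illustrative probe of metadata dissemination via content negotiation.
    # The landing-page URL is a hypothetical placeholder.
    import requests

    landing_page = "https://repository.example.org/record/1234"  # hypothetical record

    # Request JSON-LD; repositories supporting content negotiation return
    # schema.org (or similar) metadata instead of the HTML landing page.
    resp = requests.get(landing_page,
                        headers={"Accept": "application/ld+json"},
                        timeout=30)

    if resp.ok and "json" in resp.headers.get("Content-Type", ""):
        print("Machine-readable metadata returned via content negotiation:")
        print(resp.text[:500])
    else:
        print("No JSON-LD returned; metadata may instead be exposed via OAI-PMH, "
              "a REST API or markup embedded in the HTML page.")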

The following datasets were used for a preliminary test of the FAIR assessment tool:

  • EMBL-EBI: the ‘1000 genomes’ dataset, containing 1000 human genomes, all publicly available with no restriction
  • DESY: serial femtosecond crystallography data and metadata, including links to the CrystFEL Beam File, CrystFEL Geometry File, processing scripts and diffraction patterns
  • CERN: audiovisual recordings of a conference talk; an example of a CMS collision dataset in AOD format; an example of a CMS simulated dataset in AODSIM format; a simple example of an OPERA neutrino event dataset
  • PIC: a fake dataset mimicking one night of raw data from the MAGIC Telescopes
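
One possible way to run the preliminary test over all four datasets is sketched below; the identifiers are placeholders rather than the actual landing pages or DOIs used in ARCHIVER, and the F-UJI endpoint is the same assumed local deployment as in the earlier sketch.

    # Batch sketch over the preliminary test datasets; all identifiers are
    # placeholders, and the endpoint/credentials follow the earlier assumption
    # of a local F-UJI deployment.
    import requests

    FUJI_ENDPOINT = "http://localhost:1071/fuji/api/v1/evaluate"
    AUTH = ("username", "password")

    test_objects = {
        "EMBL-EBI": "https://example.org/1000-genomes",      # placeholder identifier
        "DESY": "https://example.org/sfx-crystallography",   # placeholder identifier
        "CERN": "https://example.org/cms-aod-sample",         # placeholder identifier
        "PIC": "https://example.org/magic-raw-night",          # placeholder identifier
    }

    for provider, identifier in test_objects.items():
        resp = requests.post(FUJI_ENDPOINT,
                             json={"object_identifier": identifier},
                             auth=AUTH, timeout=300)
        resp.raise_for_status()
        summary = resp.json().get("summary", {})
        # Field names vary with the F-UJI version; "score_percent" typically holds
        # percentages per FAIR principle (F, A, I, R) and an overall score.
        print(provider, summary.get("score_percent", summary))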
