Archiving and preservation for research environments

Arkivum SaaS solution

Arkivum was founded in 2011 out of the University of Southampton, with an initial focus on higher-education research data, and is headquartered in Reading, UK. During the ARCHIVER project, Arkivum developed a SaaS solution for long-term data archiving, digital preservation and online access to large research datasets. The resulting services can be deployed into public or private cloud environments. For cloud deployment, Arkivum partnered with Google in the ARCHIVER project to deploy the solution onto the Google Cloud Platform (GCP). In addition, Arkivum deployed the solution on-premise at CERN to demonstrate portability and to show how this can address data sovereignty use cases.

Arkivum Archiving and Preservation Solution

Arkivum enables content, which in the case of ARCHIVER included scientific data from a range of sources, to be uploaded, ingested, preserved and made accessible to those who need to use it in the future. Content can be ingested, validated, organised and managed as it comes into an archive. It then goes through appropriate preservation and safeguarding processes, including the generation of OAIS archival packages, to ensure it is properly protected and remains usable. Content is indexed and searchable, and can be downloaded and exported on demand, so that users both today and in the future can find and use the content they need, when they need it.

Arkivum can provide organizations with a fully hosted SaaS solution that enables them to apply good practices from the Long Term Digital Preservation (LTDP) and Research Data Management (RDM) domains and to use the service to help achieve their research data management objectives. This includes applying the Open Archival Information System (OAIS) reference model, following Trusted Digital Repository guidelines and certifications (TDR, CoreTrustSeal, TRUST) and helping to ensure that research data assets remain Findable, Accessible, Interoperable and Reusable (FAIR).

Arkivum R&D within ARCHIVER

The resulting Arkivum services were designed to meet all the R&D layers defined by the ARCHIVER project:

  • Layer 1 (storage/basic archiving/secure backup): fulfilled by GCP or on-premise infrastructure with high-volume data storage in the petabyte range with fast ingest and access.

  • Layer 2 (preservation): addresses the need for long-term digital preservation following the OAIS model, including obsolescence management, file fixity, authenticity checks, and packaging for preservation and access.

  • Layer 3 (baseline user services): ability to organise, describe, index, search and share large and complex research datasets.

  • Layer 4 (advanced services): either GCP or on-premise infrastructure provides the basis for hosting and running scientific applications that can be executed directly against archived and preserved datasets.
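
As an illustration of the file fixity checks named under Layer 2, the following is a minimal sketch and not Arkivum's implementation. It assumes SHA-256 checksums and a simple path-to-digest manifest; the function names (sha256_of, build_manifest, verify_manifest) are hypothetical. A manifest is recorded at ingest time and re-verified later to detect missing or altered files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (hypothetical ingest step)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose content changed."""
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

An empty list from verify_manifest means every recorded file is still present with an unchanged checksum. Reading in chunks keeps memory use flat, which matters for the very large files typical of research datasets.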

The ARCHIVER pilot solution developed by Arkivum has been tested and shown to address the above requirements for archiving and preserving very large datasets, with high-speed ingest and access and with the ability to host and run scientific applications against this data. The features developed were validated against a range of use cases from the ARCHIVER Buyers, the Early Adopters, and the wider scientific research community, including large research-intensive institutions and the Long-Tail of Science (LTOS).

In addition, the resulting services support sustainable long-term digital preservation through:

  • economic sustainability, with a solution that is cost-effective at scale when working with very large datasets;
  • environmental sustainability, by minimizing the carbon footprint of the solution and applying good practice in the digital preservation and archiving of research data to ensure sustainable access and reuse of research datasets for the scientific community;
  • Total Cost of Ownership (TCO) modeling calculators that make it possible to optimize cost, taking into consideration data volumes, access frequencies, data safety, data processing and retention periods;
  • the use of open standards, open specifications, open source and open APIs to ensure portability, interoperability and exit strategies, and to reduce vendor and solution lock-in.
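
To make the cost-modeling idea concrete, here is a rough sketch of how the factors listed above could be combined into a single estimate. It is not the ARCHIVER or Arkivum calculator: the linear formula, the CostRates fields and every rate are assumptions chosen for illustration. Data safety is modeled as the number of stored copies, and access frequency as the fraction of the archive retrieved per year.

```python
from dataclasses import dataclass


@dataclass
class CostRates:
    # Placeholder rates for illustration only; not Arkivum or GCP prices.
    storage_per_tb_year: float  # keeping 1 TB for one year, per copy
    egress_per_tb: float        # retrieving 1 TB from the archive
    processing_per_tb: float    # one-off ingest/preservation processing per TB


def total_cost(volume_tb: float, retention_years: float, copies: int,
               reads_per_year: float, rates: CostRates) -> float:
    """Estimate total cost over the retention period (simple linear model)."""
    # Storage scales with volume, number of copies and retention period.
    storage = volume_tb * copies * retention_years * rates.storage_per_tb_year
    # reads_per_year is the fraction of the archive retrieved each year.
    access = volume_tb * reads_per_year * retention_years * rates.egress_per_tb
    # Processing is charged once, when the data is ingested.
    processing = volume_tb * rates.processing_per_tb
    return storage + access + processing
```

Calling total_cost with different copy counts or retention periods shows how each factor shifts the estimate. A real calculator would also need to account for tiered pricing and price changes over time, which this sketch ignores.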

Detailed models and commercialization plans that include service level agreements (SLAs), user support, licensing, service configuration and pricing, as well as self-assessment as TDR service providers, were developed during the ARCHIVER project and can be used in the context of the European Open Science Cloud (EOSC) and beyond.