The IEO is a comprehensive cancer centre dedicated to adult oncology that integrates prevention, diagnosis, treatment and research through a multidisciplinary approach. At IEO, clinical and research activities are fully integrated so that scientific results can be translated into therapy as quickly as possible. Basic and translational research takes place at the Department of Experimental Oncology (DEO), located on a campus that is also home to the European School of Molecular Medicine (SEMM), the Center for Genomic Science of the Italian Institute of Technology and the FIRC Institute of Molecular Oncology. IEO’s DEO comprises about 300 scientists whose research aims to uncover the molecular mechanisms involved in cancer development. Specifically, research at DEO rests on five principles: independent research; strong interaction with clinicians; cutting-edge technology (including Technological Units with state-of-the-art equipment and expertise, and Clinical Technoshots for disseminating technologies that may favor specific translational-research projects); an open, collaborative and participatory research environment; and intense educational activity.
The European Institute of Oncology handles a growing volume of omics data generated in its own laboratories or by external collaborators, including, for instance, genomics, epigenomics, metabolomics, proteomics, imaging and clinical data.
The data generated can be used more than once:
In this context, the European Institute of Oncology faces two problems: providing sufficient computational resources and controlling access to the data.
The adopted solution should also facilitate data sharing with external collaborators, who should be able to access the data through the same workflow.
Finally, it should be possible to bill individual groups or units for their usage of the resources.
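The billing requirement amounts to aggregating metered usage per group and pricing it. A minimal sketch follows, assuming (hypothetically) that usage is metered as stored TB-months, CPU hours and TB of data egress; the `UsageRecord` type, the `bill_by_group` function and the unit prices in `RATES` are illustrative placeholders, not figures from IEO or any provider.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One metered usage entry (hypothetical schema)."""
    group: str               # research group or unit to be billed
    storage_tb_month: float  # TB stored multiplied by months stored
    cpu_hours: float         # compute consumed
    egress_tb: float         # data downloaded out of the platform


# Hypothetical unit prices in EUR; real values depend on the chosen solution.
RATES = {"storage_tb_month": 10.0, "cpu_hours": 0.05, "egress_tb": 20.0}


def bill_by_group(records, rates=RATES):
    """Sum the cost of every usage record, grouped by the billed group."""
    totals = defaultdict(float)
    for r in records:
        totals[r.group] += (
            r.storage_tb_month * rates["storage_tb_month"]
            + r.cpu_hours * rates["cpu_hours"]
            + r.egress_tb * rates["egress_tb"]
        )
    return dict(totals)


if __name__ == "__main__":
    usage = [
        UsageRecord("genomics", storage_tb_month=10, cpu_hours=1000, egress_tb=0.5),
        UsageRecord("proteomics", storage_tb_month=2, cpu_hours=0, egress_tb=0),
    ]
    for group, cost in bill_by_group(usage).items():
        print(f"{group}: EUR {cost:.2f}")
```

Keeping the price table separate from the aggregation makes it easy to compare alternative offers by swapping in a different `rates` dictionary.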
Until now, IEO has relied on an on-premise infrastructure (an HPC cluster with associated on-premise storage). A new solution should be identified and implemented within the next 12 months.
During the last 10 years, almost 1 PB of data has been generated. With the adoption of new technologies, the institute estimates a production of ~250 TB/year starting in 2020.
IEO generates omics data from both cell lines and patients. In particular:
The figures provided are rough estimates. Other data types, such as radiomics data, could also be integrated. For each dataset, processed data occupy additional space, up to the same size as the raw data.
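Combining these figures gives an upper bound on the storage footprint the new solution must absorb. The sketch below assumes (this is an assumption, not stated in the source) that the ~1 PB already generated includes its processed data, that 1 PB = 1000 TB, and that each TB of new raw data needs up to one more TB for processed outputs; `projected_storage_tb` is a hypothetical helper.

```python
# Upper-bound storage projection from the estimates given above.
# Assumptions: existing ~1 PB already includes processed data; decimal units.

EXISTING_TB = 1000.0     # "almost 1 PB" generated over the last 10 years
RAW_TB_PER_YEAR = 250.0  # estimated raw production starting in 2020
PROCESSED_RATIO = 1.0    # processed data take up to the same size as raw data


def projected_storage_tb(years, existing_tb=EXISTING_TB,
                         raw_per_year=RAW_TB_PER_YEAR,
                         processed_ratio=PROCESSED_RATIO):
    """Total footprint in TB after `years` full years of production from 2020."""
    yearly_total = raw_per_year * (1.0 + processed_ratio)  # raw + processed
    return existing_tb + years * yearly_total


if __name__ == "__main__":
    for years in range(1, 6):
        print(f"end of {2019 + years}: {projected_storage_tb(years):,.0f} TB")
```

Under these assumptions the footprint grows by up to 500 TB per year, reaching roughly 3.5 PB by the end of 2024; setting `processed_ratio=0` gives the raw-only lower bound.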
Cost requirements will be estimated by comparing the cost of extending the on-premise solution with that of third-party resources offered by national and international organizations. The solution should provide cost-effective long-term storage of the data while still allowing large datasets to be accessed at a reasonable price.
The ARCHIVER solution would have the following impact:
Such a solution would benefit not only IEO researchers: many collaborations bring together researchers from different institutes and hospitals who need to access and process the same data.