Cloud Computational Platform for Earth and Environmental Sciences – A pilot study proposal to Digitaliseringsrådet at UiT
Despite its significance, open-access research data is often undervalued by researchers. Most researchers regard preparing their data for deposit in open-access repositories as a time-consuming process and overlook the scientific community's need to reuse these data. This proposal suggests a pilot project to build a cloud platform (CloudEARTH) that allows researchers to upload, handle, recalculate, and plot their research data (i.e., a tool for handling the data during the active phase of a research project). The research data will be uploaded to CloudEARTH as raw data, and the researchers will benefit from using the online tool to analyze their data and prepare it for publication. The research data can then easily be transferred to an open-access repository. Consequently, researchers will no longer regard uploading their research data to open-access repositories as a time-consuming and pointless task.
During the pilot phase, CloudEARTH will be tested using research data from the Solid Earth Science research group, Department of Geosciences; the platform can, however, potentially be extended to cover other disciplines. In this context, a simple set of CAGE's published research data will be used to test the possibility of extending CloudEARTH to cover environmental/climate research data. CloudEARTH will be hosted at the Department of Geosciences and will be developed in close collaboration with the Department of Information Technology (IT) and the University Library (UB). During the data-handling stage, the research data will be stored at the forskningsdatasenter i sky (i.e., Microsoft Azure cloud) operated by the Department of IT, while the finalized data will be published using DataverseNO, operated by UB.
The pilot project aims to gain experience in arranging the technical/programming infrastructure for developing the suggested cloud platform, map the need for competence building and support services for researchers who will use such a tool, propose a management model for research data at the Department of Geosciences, and describe what is needed to implement such a service at full scale. The pilot project will run for nine months.
Content of the proposal
Background and challenges
Before 2003, the term “Open Access” referred only to free access to peer-reviewed literature (e.g., Budapest Open Access Initiative, 2002). In 2003, the “Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities” widened the definition to include raw research data, metadata, source materials and scholarly multimedia material. In line with the goal of facilitating open access for researchers, a set of policies and methods for publishing, archiving, and disseminating data was established.
However, setting up roles and policies is not enough to encourage researchers to make their research data openly available to the scientific community. Some senior researchers lack the training to upload research data to the various open-access repositories (e.g., as discussed during the brainstorming session on the digitalization of research on 23 Sep. 2020), while junior researchers see little credit in making their data openly available. As a result, the most challenging barrier facing open science is encouraging researchers to make their research data available through open-access repositories. Moreover, the future of open science lies not only in making research data openly available, but also in using, sharing and working collaboratively on the same data set, even remotely (i.e., cloud computation). This is true for all research disciplines, but in particular for the Earth and Environmental Sciences, where research data plays a major role in helping society face the challenges of sustainable development.
CloudEARTH – objectives of the pilot study
The main objective of the proposal is to encourage researchers to make their research data openly available by giving them access to an online platform for handling their raw data. This platform (i.e., CloudEARTH) will go beyond archiving and storing research data by allowing cloud processing of the raw data. Researchers will see direct benefits from using such a platform, so preparing their research data for upload to CloudEARTH will not be regarded as a time-consuming process. Moreover, CloudEARTH will allow the research data to be used remotely and collaboratively, which is another factor that encourages researchers to upload their data.
The pilot study aims to:
- gain experience in arranging technical/programming infrastructure for developing the suggested cloud platform
- map the need for competence building and support services for researchers who will use such a service
- propose a management model for research data at the Department of Geosciences
- describe what is needed to implement such a full-scale service
Department of Geosciences/CAGE – a case study
The suggested pilot project will be hosted and executed at the Department of Geosciences, UiT. Research data from the Department of Geosciences is chosen here as a case study for three reasons: 1) there are no cloud-based tools for handling raw geochemical, mineralogical and petrological data (i.e., the data types that will be used to test CloudEARTH during the pilot phase); 2) the Earth and Environmental Sciences are by nature a dynamic field in which new issues continue to arise and old ones often evolve, so making Earth and Environmental Sciences data openly available to the scientific community will help address the challenges of sustainable development in society; 3) the Department of Geosciences has several large research projects financed by the Research Council of Norway, the EU, and various oil and mining companies, and it includes several research centers, among them CAGE and ARCEx (Research Centre for Arctic Petroleum Exploration). Extending the platform to other disciplines will therefore encourage many researchers in different fields to become involved in open science. During the pilot phase, CloudEARTH will rely on published research data from the Solid Earth Science research group as well as unpublished data from the principal investigator. However, the suggested service can potentially be developed to cover other research disciplines. To test the opportunities for developing CloudEARTH to cover multidisciplinary research data, an agreement has been made with CAGE to use simple sets of published environmental data (e.g., Conductivity-Temperature-Depth (CTD) datasets).
Meeting the strategic plans of UiT and the Norwegian Research Council
The suggested pilot project lies at the core of the strategic plans of both UiT and the Norwegian Research Council. Developing new cutting-edge technologies that help researchers advance their research is the central point of UiT's 2014-2022 strategic plan. Applying the developed technology to data from the Earth and Environmental Sciences will help to achieve the strategic aims in the research fields of: 1) sustainable use of resources and 2) energy, climate, society and environment.
Moreover, developing such a research tool in-house at UiT means that the technology will be hosted locally, which will help to reduce infrastructure costs in the long run and will build a pool of skilled staff and researchers.
Strengthening existing infrastructure – collaboration across different units and institutes
The suggested project will use, link, and strengthen existing infrastructures at UiT. Moreover, the suggested service is a cross-unit project (Fig. 1). The target users are the different research groups at the Department of Geosciences, including CAGE researchers. The data-handling phase of the service (i.e., the cloud computation part of CloudEARTH) will be implemented using MySQL databases, PHP code and the Microsoft Azure cloud (i.e., forskningsdatasenter i sky, operated by the section for Enterprise Digital Services for Research and Dissemination at the Department of IT). Archived/published data will be stored at DataverseNO, which is a service run by UB. For both the pilot project and the full-scale model, the plan is to automate the transfer of data from the forskningsdatasenter i sky to DataverseNO. It is worth mentioning that the Department of IT will hire a permanent engineer who will be responsible for moving DataverseNO from the local servers at UiT to the Azure cloud, which will give CloudEARTH more flexibility. The CloudEARTH user interface can be linked to and embedded as an app in RSpace, an electronic lab notebook service used by UiT.
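The flow described above, from raw upload through cloud processing to archiving, can be pictured as a simple state machine. The following Python sketch is illustrative only and is not the project's implementation: the names `Stage`, `Dataset` and `advance` are hypothetical, Python is used for brevity although the proposal names PHP and MySQL, and no actual Azure or DataverseNO calls are made.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical lifecycle stages of a dataset in the platform."""
    RAW = "raw"              # uploaded as raw data to the working storage
    PROCESSED = "processed"  # recalculated/plotted during the active phase
    ARCHIVED = "archived"    # handed over to the open-access repository


# Allowed transitions: a dataset can only move forward, one stage at a time.
_NEXT = {Stage.RAW: Stage.PROCESSED, Stage.PROCESSED: Stage.ARCHIVED}


@dataclass
class Dataset:
    name: str
    stage: Stage = Stage.RAW
    history: list = field(default_factory=list)  # stages already passed


def advance(ds: Dataset) -> Dataset:
    """Move a dataset to its next stage; archived datasets cannot advance."""
    if ds.stage not in _NEXT:
        raise ValueError(f"{ds.name} is already archived")
    ds.history.append(ds.stage)
    ds.stage = _NEXT[ds.stage]
    return ds


if __name__ == "__main__":
    ds = Dataset("example-geochemistry-table")  # placeholder name
    advance(ds)  # raw -> processed
    advance(ds)  # processed -> archived (where an automated transfer would occur)
    print(ds.stage.value, [s.value for s in ds.history])
```

In a real deployment, the final transition is the point at which the automated transfer to the repository would be triggered; how that transfer is carried out is left open here.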
Implementation plan, Organization and PI
The project will be affiliated with the Department of Geosciences and carried out in collaboration with the Department of IT (i.e., the section for Enterprise Digital Services for Research and Dissemination; Steinar Trædal-Henden), UB (e.g., the research data team and the IT team at UB) and CAGE/Department of Geosciences (e.g., Fabio Sarti). The project will run for nine months and comprises two phases. Phase I (four months) aims to adopt and adapt the programming code and the algorithm, and to create the database that will serve as the basis for the cloud computation. Phase II (five months) aims to have different researchers at the Department of Geosciences, UiT, use CloudEARTH. To attract more researchers and obtain reliable feedback on the service, researchers from outside UiT will also be invited to test CloudEARTH. The pilot project will conclude with a report summarizing the results and recommendations.
Principal investigator: responsibility for the project will be assigned to Tamer Abu-Alam. Abu-Alam is a geochemist and petrologist (i.e., a geoscientist) with extensive experience in handling research data. Moreover, Abu-Alam has extensive experience in creating databases and writing code in different programming languages. He is a project manager (temporary position) at UB for a project that aims to promote open science and open-access research data published on the Polar Regions (i.e., the Open Polar project).
Project risk and future opportunities
Several cloud-based computation tools already exist (e.g., MathWorks Cloud, Wolfram Cloud, OpenStack). The underlying technology is therefore proven; in other words, there is no major risk of failure in developing and adapting it. However, most of these tools require a subscription and advanced programming skills from their users. CloudEARTH, by contrast, will be tailored to the Earth and Environmental Sciences and will provide users with a free-to-use tool with an interactive graphical user interface (a code-free interface).
EarthCube is an ongoing project hosted at the University in San Diego, California, and funded by the National Science Foundation (NSF). EarthCube aims to share geosciences data interactively based on the FAIR principles. However, EarthCube does not cover Solid Earth data (i.e., geochemistry, petrology and mineralogy) and lacks a cloud computation option. This presents a future opportunity for cooperation between UiT (i.e., CloudEARTH) and the University in San Diego (i.e., EarthCube) to develop the CloudEARTH tools at the international level.
For information on this section, please contact Tamer Abu-Alam at email@example.com
Data sharing policy during the pilot study
During the pilot study, all research data will be kept in private mode (i.e., only the researchers have access to the data). Uploading research data to the open-access repository will be voluntary. Researchers who decide to upload their data to the open-access repository will have the option of using an embargo period (according to the UiT rules).
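The policy above amounts to three rules: data are private by default, publication is an explicit choice by the researcher, and a published dataset may remain closed until an embargo date has passed. A minimal Python sketch of that logic follows, under stated assumptions: the names `SharingChoice` and `is_public` are hypothetical, and the actual embargo length would follow the UiT rules, which are not encoded here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SharingChoice:
    """Hypothetical record of a researcher's sharing decision for one dataset."""
    publish: bool = False                  # private mode is the default
    embargo_until: Optional[date] = None   # optional embargo end date


def is_public(choice: SharingChoice, today: date) -> bool:
    """Return True only if the dataset should be openly visible on `today`."""
    if not choice.publish:
        # Uploading to the open-access repository is voluntary.
        return False
    if choice.embargo_until is not None and today < choice.embargo_until:
        # Published, but still within the embargo period.
        return False
    return True


if __name__ == "__main__":
    # The dates below are arbitrary illustration values.
    embargoed = SharingChoice(publish=True, embargo_until=date(2022, 1, 1))
    print(is_public(SharingChoice(), date(2021, 6, 1)))  # private by default
    print(is_public(embargoed, date(2021, 6, 1)))        # still under embargo
    print(is_public(embargoed, date(2022, 1, 1)))        # embargo has ended
```

Keeping the decision in a single function makes the default explicit: nothing becomes visible unless the researcher has opted in and any embargo has expired.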