Introduction to the DKRZ Data Pool (/pool/data)

Datasets stored in the DKRZ Data Pool are accessible to and usable by any DKRZ user. The DKRZ Data Pool comprises modeling and observational datasets for which widespread, long-term use across different DKRZ user groups is expected.

The datasets could be, for example,

  • initial and boundary conditions for numerical models,

  • reanalysis datasets,

  • standard evaluation products or

  • project-related datasets of interest to a wide user base.

Datasets made available via the DKRZ Data Pool are accessed using the path prefix /pool/data/. Each dataset is accompanied by a README file documenting the salient characteristics of the data to enable efficient reuse.
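As a rough illustration of how the path prefix and the README files fit together, the following Python sketch lists the project directories below /pool/data and the README files found at the top level of each. The function name list_pool_projects and the file name pattern README* are assumptions made for this example; the actual README file names may differ.

```python
from pathlib import Path


def list_pool_projects(pool_root="/pool/data", readme_pattern="README*"):
    """Map each project directory under pool_root to its top-level README file names.

    The pattern "README*" is an assumption for this sketch, not a documented
    naming rule.
    """
    projects = {}
    for entry in sorted(Path(pool_root).iterdir()):
        # is_dir() follows symbolic links, so linked project directories count too.
        if entry.is_dir():
            projects[entry.name] = sorted(
                p.name for p in entry.glob(readme_pattern) if p.is_file()
            )
    return projects
```

Calling list_pool_projects() on a system where /pool/data is mounted returns a dictionary such as {"SOME_PROJECT": ["README.txt"]}; an empty list indicates that no file matching the assumed pattern was found.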

Sharing datasets via /pool/data

Obtaining storage resources

Datasets made available via /pool/data are associated with so-called pool projects. With the migration to HLR-4 Levante at the end of 2021, pool projects are asked to fill in an application-like form describing the data and the scope of its use at DKRZ. The README version of this document will be made public on the DKRZ website.

The storage resources required for pool projects will be awarded in equal parts from the shareholder and community shares.

Obtaining storage resources follows one of two paths:

  1. Data volume to be shared less than 80TB:

  • users fill out the /pool/data storage allocation form and submit it to DKRZ staff at any time via support@dkrz.de

  • following a review process, which may involve amendments to the document, the required resources are granted

  • a DKRZ project number is assigned, the data are moved to the corresponding project-space on the file system and the project is soft-linked to /pool/data/THE_PROJECT

  • disk space on Levante is allocated for a period of up to 5 years, after which the process has to be repeated

  2. Data volume to be shared larger than 80TB:

  • users fill out the /pool/data resource application form and submit it to DKRZ as part of the regular resource application rounds of the DKRZ Scientific Steering Committee (WLA)

  • the application is reviewed by the WLA

  • if resources are granted, a DKRZ project number is assigned, the data are moved to the corresponding project-space on the file system and the project is soft-linked to /pool/data/THE_PROJECT

  • disk space on Levante is allocated for a period of up to 5 years, after which the process has to be repeated
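The choice between the two paths above depends only on the data volume to be shared. A minimal Python sketch of that decision follows; the function name allocation_route and the returned strings are hypothetical, and because the text does not say which path applies at exactly 80TB, the sketch reports that case separately rather than guessing.

```python
THRESHOLD_TB = 80.0  # threshold stated in the text

# Hypothetical labels for the two paths described above.
SMALL = "storage allocation form, submitted to support@dkrz.de at any time"
LARGE = "resource application form, submitted in a regular WLA application round"
UNSPECIFIED = "exactly at the threshold: not covered by the text, ask support@dkrz.de"


def allocation_route(volume_tb):
    """Return which application path applies to a data volume given in TB."""
    if volume_tb < 0:
        raise ValueError("data volume must be non-negative")
    if volume_tb < THRESHOLD_TB:
        return SMALL
    if volume_tb > THRESHOLD_TB:
        return LARGE
    # The text only covers "less than" and "larger than" 80TB.
    return UNSPECIFIED
```

For example, allocation_route(12.5) selects the storage allocation form, while allocation_route(250) selects the resource application reviewed by the WLA.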

In the storage allocation/application document, it is critical to provide sound arguments and use cases for the DKRZ-wide use of the datasets that are to be made available via /pool/data.

Please note: the owners of all data collections which are currently (August 2021) stored directly at /pool/data/PROJECT_ACRONYM and made available to DKRZ users still have to fill out a storage allocation form and provide a README file for submission to DKRZ staff prior to the data migration to Levante. All of these projects should lie below the threshold of 80TB and will be contacted by DKRZ in a separate email. If no application for resources is submitted, the data will not be migrated to Levante.

Data documentation and reuse license

All /pool/data projects must be accompanied by a README file containing the salient characteristics of the data required for reuse. The README file shall be provided as a write-protected .txt document on disk in the top-level directory of the datasets it describes. A copy of the README is also made available online in DKRZ’s user portal. This implies that the names and contact details of the persons responsible for the data are visible online.
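Data owners may want to verify these README requirements before submitting. The Python sketch below checks that a README exists in the top-level directory and that it is write-protected. It rests on two assumptions not stated in the text: the file is named README.txt, and "write-protected" means that no write permission bit is set for user, group, or others. The function name check_readme is hypothetical.

```python
import stat
from pathlib import Path


def check_readme(dataset_dir, name="README.txt"):
    """Return a list of problems found; an empty list means all checks passed.

    The file name "README.txt" is an assumption made for this sketch.
    """
    readme = Path(dataset_dir) / name
    if not readme.is_file():
        return [f"{name} not found in top-level directory {dataset_dir}"]

    problems = []
    info = readme.stat()
    # Assumed meaning of "write-protected": no write bit for user, group, or others.
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    if info.st_mode & write_bits:
        problems.append(f"{name} is writable (mode {stat.filemode(info.st_mode)})")
    if info.st_size == 0:
        problems.append(f"{name} is empty")
    return problems
```

If the first check fails because of the permissions, a mode such as 0o444 (read-only for everyone) would satisfy the interpretation used here.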

Furthermore, the data must be associated with a license which is expected to follow international standards (see CC licenses for some examples) to facilitate reuse and ensure credit to the data creators. If datasets made available via /pool/data are to be associated with more restrictive licenses which limit the reuse to a subset of DKRZ users, the data owners should present sound arguments supporting their case.

Templates for the README and /pool/data storage allocation documents are available for download and examples are provided below.

List of current projects

Here, an overview of current /pool/data projects will be provided.

Examples for README and /pool/data resource application

Here, examples of README files and resource applications for /pool/data projects will be uploaded.

Example 1 (ECHAM5-related datasets):

Example 2 (HAPPI-MIP related datasets):

Support

If you have any further questions, please do not hesitate to contact DKRZ staff at support@dkrz.de