The DKRZ operates a hierarchical storage management system (HSM) used for the storage of all relevant data created and post-processed on DKRZ systems. The hardware of the HSM consists of a disc cache and two tape libraries. The primary tape archive is located in the DKRZ building in Hamburg. Selected files are mirrored to the secondary tape archive located at the Max Planck Computing and Data Facility (MPCDF) in Garching. The software installed to operate the HSM is StrongLink. All command-line user interaction with the tape archive goes through StrongLink and its command line tool slk.
If you have questions that are not answered on this page or on the pages linked in Further reading, please have a look at our FAQ. If you do not find an answer there, please contact us via email@example.com.
The amount of data that a project can store in the tape archive is limited by the storage quota of that project. Individual users do not have a quota. Each compute time project has to apply for its quota as part of the application for computing time. There are two kinds of quota: the normal tape archive quota, denoted as arch, and the quota for long-term archival, denoted as docu. Additionally, users may select very important files to be stored twice. The following table provides an overview:
How to achieve this
Past location (HPSS)
New location (StrongLink)
single copy on tape
1 year after expiration of project
default storage type
second copy on separate tape
arch quota; double file size used
Manually set metadata field backup to true
10 years after expiration of project
Please contact firstname.lastname@example.org
The duplicates are stored at the tape archive in Garching. The output of slk list contains an extra column which indicates whether a file meant for duplication has already been copied to Garching.
The StrongLink software comes with a command line tool suite slk. slk is the user interface to the StrongLink software and allows the user to interact with the HSM. The available commands are:
help: displays the slk help page
login: log in to the system with LDAP credentials
archive: copy files to the HSM
chmod: modify permissions of archived files (same as chmod on the Linux shell)
group: change group ownership of archived files
owner: change ownership of archived files; for admins only
tag: modify metadata of archived files
search: search archived files based on metadata
list: list searched files and some of their metadata (similar to ls on the Linux shell)
retrieve: retrieve files based on search result or based on absolute path
Important: slk does not provide its own shell. After logging in, the user still navigates through the local file system, i.e. the parallel file system of mistral, and slk behaves more like the cp command on the Linux shell. It is not possible to navigate through the emulated directory structure of the HSM using slk.
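Because slk works like cp rather than like an interactive shell, every call has to name both a local path and an HSM path. The following Python sketch only builds such command lines and does not execute them. It assumes that slk archive and slk retrieve take the source first and the destination second, as cp does; this order is not confirmed on this page, so please check the slk manual. The function names and all paths are hypothetical.

```python
import shlex
from typing import List


def build_archive_command(local_path: str, hsm_target_dir: str) -> List[str]:
    """Build an 'slk archive' call.

    Assumption: the argument order is <local source> <HSM destination>.
    """
    return ["slk", "archive", local_path, hsm_target_dir]


def build_retrieve_command(hsm_path: str, local_target_dir: str) -> List[str]:
    """Build an 'slk retrieve' call for an absolute HSM path.

    Assumption: the argument order is <HSM source> <local destination>.
    """
    return ["slk", "retrieve", hsm_path, local_target_dir]


# Hypothetical paths: a file on the local parallel file system and
# a project directory in the HSM.
archive_cmd = build_archive_command("/work/ab1234/output.tar", "/arch/ab1234/run1")
retrieve_cmd = build_retrieve_command("/arch/ab1234/run1/output.tar", "/work/ab1234/restore")

# Print the command lines in a form that could be pasted into a shell.
print(shlex.join(archive_cmd))
print(shlex.join(retrieve_cmd))
```

The commands are kept as argument lists so that they could be passed to subprocess.run without shell quoting issues once the actual syntax has been verified.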
Please have a look at the slk manual (on doc.dkrz.de) or the StrongLink Command Line Interface Guide v3.0 beta for a detailed description of the individual commands. Alternatively, the page use cases contains several usage examples.
StrongLink Command Line Interface Guide v3.0 beta
See the extra page on external access.
The StrongLink software reads the headers of archived netCDF files and extracts extended file metadata from them. This feature already works in the test environment and will be available in the production environment. Users may edit some of these metadata and add further metadata via slk tag. The saved metadata are described in detail on our metadata manual page (metadata). Extended metadata are also extracted for some common file formats such as JPEG. Harvesting extended metadata from further file formats used in Earth System modeling is planned.
Files can be searched and found based on their metadata. The StrongLink software provides the command line tools slk search, slk list and slk retrieve to search, list and retrieve files based on their metadata, respectively. Retrieval of files based on their absolute path in the HSM is also possible. Please see the metadata manual page (metadata), use cases and slk manual for details on the usage of slk in this context.
The tape archive delivers its best performance if the files to be archived are sufficiently large. DKRZ will revise its recommendations on the packing of files based on initial operational experience with StrongLink. Until then, the packing recommendations developed for HPSS remain in effect. As with HPSS, used quota is accounted in increments of 1 GB per archived file.
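The effect of the 1 GB accounting increment on many small files can be illustrated with a short Python sketch. It assumes that the size of each file is rounded up to the next whole GB, that 1 GB means 10^9 bytes, and that a second copy doubles the accounted size; none of these details are confirmed on this page. tar is used only as one common way of packing, and the function names are hypothetical.

```python
import tarfile
from pathlib import Path
from typing import Iterable

GB = 10**9  # assumption: 1 GB = 10^9 bytes


def accounted_quota_gb(file_sizes_bytes: Iterable[int], second_copy: bool = False) -> int:
    """Return the accounted quota in GB for a set of archived files.

    Assumption: each file is rounded up to the next whole GB.
    Assumption: a second copy on tape doubles the accounted size.
    """
    # Integer ceiling division avoids floating point rounding for large files.
    total = sum((size + GB - 1) // GB for size in file_sizes_bytes)
    return 2 * total if second_copy else total


def pack_directory(source_dir: Path, tar_path: Path) -> Path:
    """Pack a directory into one uncompressed tar file before archiving."""
    with tarfile.open(str(tar_path), "w") as tar:
        tar.add(str(source_dir), arcname=source_dir.name)
    return tar_path


# 1000 files of 10 MB each, archived one by one versus as a single packed file.
small_files = [10 * 10**6] * 1000
unpacked_gb = accounted_quota_gb(small_files)
# The size of the packed file is approximated by the sum of the file sizes;
# the small overhead of the tar format is ignored here.
packed_gb = accounted_quota_gb([sum(small_files)])
print(f"unpacked: {unpacked_gb} GB, packed: {packed_gb} GB")
```

Under these assumptions, archiving the small files individually is accounted far higher than archiving one packed file of the same total size, which is why packing matters for quota as well as for performance.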
As with the previous HSM system HPSS, a fast disc cache is installed upstream of the tape system. Files selected for archival are first copied to the disc cache and then successively written to tape. Files selected for retrieval are first copied from tape to the cache and then copied to the specified target location. Retrieving files that are present in the disc cache, either because they have not yet been removed from it after archival or because they have already been copied back from tape, is considerably faster than retrieving files that are located on tape only.
The distribution of files across the disc cache, the primary tape archive and the secondary tape archive is controlled automatically by StrongLink. Users have no control over the storage location of their data.