Introduction

file version: 16 August 2021

The preparation and testing of the new HSM to meet the specific needs of DKRZ is still in progress. Users will be informed by email three weeks before the new HSM goes into operation.

DKRZ operates a hierarchical storage management system (HSM) for storing all relevant data created and post-processed on DKRZ systems. The HSM hardware consists of a disk cache and two tape libraries. The primary tape archive is located in the DKRZ building in Hamburg. Selected files are mirrored to the secondary tape archive located at the Max Planck Computing and Data Facility (MPCDF) in Garching. The software that operates the HSM is StrongLink. All command-line user interaction with the tape archive goes through StrongLink and its command line tool slk.

If you have questions that are not answered on this page or on the pages linked in Further reading, please have a look at our FAQ. If you do not find an answer there, please contact us via support@dkrz.de.

Storage options and quota

The amount of data that can be stored in the tape archive per project is limited by the available storage quota of that project. Individual users do not have a quota. Storage space on the HSM is applied for in conjunction with the (bi-)annual application for DKRZ compute and storage resources. There are two kinds of quota: regular tape archive quota, denoted arch, and quota for long-term archival, denoted docu. Additionally, users can select very important files to be stored twice, i.e. one copy in Hamburg and one copy in Garching. The following table provides an overview:

Storage location, time and quota

| File storage | Storage time | Used quota | How to achieve this | Past location (HPSS) | New location (StrongLink) |
|---|---|---|---|---|---|
| single copy on tape | 1 year after expiration of DKRZ project | arch quota | default storage type | /hpss/arch/<prj> | /arch/<prj> |
| second copy on separate tape | 1 year after expiration of project | arch quota; double file size used | store data in specific root namespace (see columns to the right) | /hpss/double/<prj> | /double/<prj> |
| long-term storage for reference purposes | 10 years after expiration of project | docu quota | Please contact data@dkrz.de | /hpss/doku/<prj> | /doku/<prj> |
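The relation between the past and new locations listed above appears to be mechanical: the leading /hpss component is dropped. The following is a minimal Python sketch of that translation, assuming the rule holds for every path below the three listed root namespaces. The function name hpss_to_stronglink and the project id ab1234 are hypothetical and only serve as illustration.

```python
from pathlib import PurePosixPath

# Root namespaces as listed in the overview of storage locations.
KNOWN_ROOTS = ("arch", "double", "doku")


def hpss_to_stronglink(path: str) -> str:
    """Translate a former HPSS path into the matching StrongLink path.

    Assumption: the mapping is simply "drop the leading /hpss component",
    as the listed locations suggest for /hpss/arch, /hpss/double and /hpss/doku.
    """
    parts = PurePosixPath(path).parts  # e.g. ('/', 'hpss', 'arch', 'ab1234')
    if len(parts) < 3 or parts[0] != "/" or parts[1] != "hpss":
        raise ValueError(f"not an absolute HPSS path: {path!r}")
    if parts[2] not in KNOWN_ROOTS:
        raise ValueError(f"unknown root namespace in {path!r}")
    return str(PurePosixPath("/", *parts[2:]))


# 'ab1234' is a made-up project id.
print(hpss_to_stronglink("/hpss/arch/ab1234/run1/output.tar"))
```

Rejecting unknown root namespaces instead of passing them through is a deliberate choice in this sketch, so that typos in old paths surface early.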

The output of slk list contains an extra column which indicates whether a file meant for duplication has already been copied to Garching.

slk: Command Line Tool for HSM access within the DKRZ network

Note

slk stores a login token in the home directory of each user (~/.slk/config.json). The login token is valid for 30 days. By default, this file can only be accessed by the respective user (permissions: -rw-------/600). However, users should be careful when running commands such as chmod 755 * in their home directory. If you suspect that your slk login token has been compromised, please contact support@dkrz.de.
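To verify that the token file is still private, its permission bits can be inspected. The following Python sketch uses only the standard library; the helper name is_private is hypothetical, and the check (no permission bits set for group or others) is one reasonable reading of the 600 default mentioned above, not an official DKRZ tool.

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if neither group nor others have any permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    # Token location as stated in the note above.
    token_file = os.path.expanduser("~/.slk/config.json")
    if not os.path.exists(token_file):
        print("no slk login token found")
    elif is_private(token_file):
        print("token file is private")
    else:
        print("token file is accessible by others; consider: chmod 600", token_file)
```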

The StrongLink software comes with a command line tool suite slk. slk is the user interface to the StrongLink software and allows the user to interact with the HSM. The available commands are:

  • help: displays the slk help page

  • version: print the slk version

  • login: log in to the system with LDAP credentials

  • archive: copy files to the HSM

  • chmod: modify permissions of archived files (same as chmod on the Linux shell)

  • delete: delete a namespace (and all child objects for the namespace) or a specific file

  • group: change group ownership of archived files; for file owners and admins only

  • owner: change ownership of archived files; for admins only

  • tag: modify metadata of archived files

  • search: search archived files based on metadata

  • list: list searched files and some of their metadata (similar to ls on the Linux shell)

  • retrieve: retrieve files based on search result or based on absolute path

  • recall: recall files based on search result or based on absolute path (needed for External Access only)

  • move: move a file or a namespace from one namespace to another namespace (might be merged with slk rename in future)

  • rename: rename a file or a namespace (might be merged with slk move in future)

Note

StrongLink uses the term “namespace” or “global namespace” (gns). A “namespace” is comparable to a “directory” or “path” on a common file system.

Note

slk does not provide its own shell after login; the user continues to navigate the local file system, i.e. the parallel file system of mistral. slk therefore behaves more like the cp command on the Linux shell. It is also not possible to navigate through the emulated directory structure of the HSM using slk.

Please read slk pitfalls before you use slk for the first time. For a detailed description of the individual commands, please have a look at StrongLink Command Line Interface (slk) (on doc.dkrz.de) or at the StrongLink Command Line Interface Guide v3.1. Alternatively, the sections Switching: pftp to slk and slk Usage Examples contain several usage examples.

slk helpers

slk lacks a few minor but very useful features. The slk_helpers program adds these features. Its usage is very similar to that of slk. The following commands are available:

  • checksum: prints one or all checksums of a resource

  • exists: check if a resource exists; the resource id is returned

  • help: print help / usage information

  • hostname: prints the hostname to which slk is currently connected or to which slk will connect

  • mkdir: create a namespace in an already existing namespace (like mkdir on the Linux shell)

  • metadata: get metadata of a resource

  • resourcepath: get path for a resource id

  • session: prints until when the current slk session is valid

  • size: returns the file size in bytes

For a detailed description of the individual commands, please have a look at slk helpers (slk_helpers) (on doc.dkrz.de). Alternatively, the section slk Usage Examples contains several usage examples.

External access via sftp

See the extra page on External Access.

Packing of data

The tape archive delivers its best performance if the files to be archived are sufficiently large. The recommendations on packing developed on the basis of HPSS therefore remain in effect for the time being. As with the HPSS system, used quota is accounted in increments of 1 GB per archived file.
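The effect of per-file accounting on many small files can be illustrated with a short calculation. The Python sketch below rests on several assumptions that the text does not confirm: each file is rounded up to the next full GB on its own, 1 GB means 10^9 bytes, and a double copy is charged twice the rounded size. The function name accounted_quota_gb is hypothetical.

```python
GB = 10**9  # assumption: 1 GB = 10^9 bytes; the accounting may use 2^30 instead


def accounted_quota_gb(file_sizes, copies=1):
    """Quota charged for the given file sizes (in bytes), in GB.

    Assumption: every file is rounded up to the next full GB individually.
    copies=2 models files stored twice, which use double the file size.
    """
    per_file = ((size + GB - 1) // GB for size in file_sizes)  # integer ceiling
    return copies * sum(per_file)


small_files = [10 * 10**6] * 500   # 500 files of 10 MB each, 5 GB in total
packed = [sum(small_files)]        # the same data packed into a single file

print(accounted_quota_gb(small_files))        # 500
print(accounted_quota_gb(packed))             # 5
print(accounted_quota_gb(packed, copies=2))   # 10
```

Under these assumptions, packing the 500 small files into one archive reduces the charged quota by a factor of 100, which is the motivation for the packing recommendations.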

The package packems, which was developed by MPI-M and DKRZ for HPSS, is to be adapted to the new HSM system. It simplifies packing and archiving multiple data files to tape as well as retrieving them. It consists of three command line programs:

  • a pack-&-archive tool packems,

  • a list-archived-content tool listems and

  • a retrieve-&-unpack tool unpackems.

Please use module load packems on mistral to load the packems package. Currently, the HPSS-compatible packems version is loaded. It will be replaced once the migration from HPSS to StrongLink has been completed successfully. For details on the usage of packems, please have a look at the packems manual.

Backend data handling

As with the previous HSM system HPSS, a fast disk cache is installed upstream of the tape system. Files selected for archival are first copied to the disk cache and then successively written onto tape. Files selected for retrieval are first copied from tape to the cache and then copied to the specified target locations. The retrieval of files that are still or already stored in the disk cache is considerably faster than the retrieval of files that are located on tape only.

The distribution of files across the disk cache, the primary tape archive and the secondary tape archive is controlled automatically by StrongLink. Users have no control over the storage location of their data.

A - or t is appended to the permissions string of each file in the output of slk list. The - indicates that the file is stored in the cache.
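When processing slk list output in a script, the appended flag can be evaluated as in the following Python sketch. It assumes that the permissions string consists of the usual ten characters plus the one appended flag, and that t means the file is located on tape only; the text above only states the meaning of -. The function name in_cache and the sample strings are hypothetical.

```python
def in_cache(permissions: str) -> bool:
    """Interpret the flag appended to a permissions string from slk list.

    Assumptions: the string has 10 regular characters plus 1 appended flag;
    '-' means the file is in the disk cache (as documented), and 't' is
    taken to mean that the file is on tape only.
    """
    if len(permissions) != 11:
        raise ValueError(f"expected 11 characters, got {permissions!r}")
    flag = permissions[-1]
    if flag == "-":
        return True
    if flag == "t":
        return False
    raise ValueError(f"unexpected storage flag {flag!r}")


# Hypothetical sample strings, not copied from real slk output.
print(in_cache("-rw-r--r---"))  # True
print(in_cache("-rw-r--r--t"))  # False
```

The length check guards against a plain ten-character permissions string, which also ends in - when others have no execute permission and would otherwise be misread as a cache flag.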

Further reading