FAQ

v1.22, 08 October 2021

General information about the HSM system

What does HSM mean?

Hierarchical Storage Management. Here, it refers to the DKRZ tape archive.

What type of HSM system will be installed?

The software is called StrongLink and it is developed and supplied by StrongBox Data Solutions (https://www.strongboxdata.com/stronglink).

Does the tape archive hardware also change?

The hardware of the tape archive remains unchanged. The metadata servers and disk cache will be replaced by new powerful servers.

Why is DKRZ getting a new system?

The contract for the current HSM system HPSS ends in 2021, and HPSS is not designed to cope with the data volumes expected to be produced by DKRZ’s upcoming HPC system Levante. Therefore, DKRZ ran a tender to purchase a new HSM system. The company Cristie Data was chosen to deliver the new HSM system, called StrongLink. The new system allows for a higher data throughput to and from tape and will be able to cope with the larger volumes of data expected to be produced on Levante.

What are the main differences compared to the old system?

From a user’s perspective, a different command line tool has to be used to transfer data into and out of the tape archive. The new tool provides powerful metadata features. For details, have a look at “Which new features does the HSM System provide?”. Technically, the new HSM system StrongLink allows for higher data throughput, is more scalable and is more resilient towards hardware failures than the current/old HPSS system.

Will the new HSM system be accessible via pftp?

No. pftp will be replaced by a new command line tool slk. Additionally, a command line tool slk_helpers is provided which has some features that slk is lacking. slk is developed and maintained by StrongBox Data Solutions, whereas slk_helpers are developed and maintained at DKRZ.

Would it be possible/desirable to use only one command for slk and slk_helpers main classes?

Basically: yes. We decided otherwise because the development of slk is still ongoing and we do not want to clash with the development by StrongBox: features that we implement now might later be implemented by StrongBox with slight functional changes.

Data Migration

When does the new HSM system go online?

The new HSM system is planned to go online on Monday, 25 Oct 2021 - ten days after the HPSS went offline. Updates to the schedule will be reported. A test system is provided for adapting and testing workflows (please see “Will DKRZ users be able to test their archiving workflows before the new system goes online?”).

When does the HPSS go offline?

The HPSS was taken offline on Friday, 15 Oct 2021 at 8 AM and will remain offline.

When will an exact time schedule for the migration be published?

Please see “Why is no exact time schedule for training and migration published yet?”.

Will I be able to see what the new HSM system will look like before it becomes productive?

Yes. A test system is available for this purpose; please see “Will DKRZ users be able to test their archiving workflows before the new system goes online?”.

Will all my archived data be available on the new system?

All data will be migrated automatically from HPSS to StrongLink. The exception are files which originate from the DXUL (Disc EXtended Unix Linux) system and are located in the directory /dxul and below. These files will not be migrated.

How do I find out whether I have data from DXUL that have to be copied manually?

If you own or use legacy data created on HLRE-1 (hurrikan) and earlier (before 2010), please check if there are data in /dxul and below. If you are working with data produced on HLRE-2 (blizzard) and HLRE-3 (mistral) then you are probably not affected. However, there might be config or forcing files of more recent simulations still located in /dxul.

How do I access DXUL data after the HPSS is shut down?

It will no longer be possible to access the DXUL data via the new HSM system.

Where do I find data from the DXUL archive now?

Please have a look into the directory /dxul. The directory /dxul is on the same level as the /hpss directory in pftp or on xtape (read-only). /dxul does not follow the same strict structure as /hpss. The most likely place to check for old project data is /dxul/prj/.

How to proceed if I still have DXUL data that need to be kept?

Please copy or move the data from /dxul to your project in the HPSS archive (/hpss) well before the migration. You can contact beratung@dkrz.de in case you need help to move directories into an existing HPSS project.

Training, Questions and Adaption of Workflows

Will there be an introduction session to the new HSM system and its usage?

Yes, a DKRZ Tech Talk took place on 6 July 2021, in which the new HSM system and the new command line tool slk were presented. A recording of the Tech Talk is available at https://youtu.be/JtmelPQ3ypw.

Will DKRZ users be able to test their archiving workflows before the new system goes online?

DKRZ users can access a StrongLink test system during the time period between the DKRZ Tech Talk and the new HSM system going online (please see “Will there be an introduction session to the new HSM system and its usage?” and “When does the new HSM system go online?”). Users can test their modified scripts on the test system. The amount of transferable data is limited due to the size of the test system. Please see Using the HSM test system in the documentation for details.

Where can I find written documentation about the new HSM system?

The user documentation is available at https://doc.dkrz.de.

Why is no exact time schedule for training and migration published yet?

A TechTalk giving a broad overview of the StrongLink system and of slk took place on 6 July 2021 (https://youtu.be/JtmelPQ3ypw). All mistral users are welcome to use a StrongLink test instance for training and for testing adapted scripts (see How do I use the HSM/StrongLink test system?). We are currently in the final testing phase of the StrongLink system. The HPSS was taken offline on 15 Oct (user access deactivated at 8 AM). StrongLink is expected to go online on Monday, 25 Oct, ten days after the HPSS went down. We offer an HSM Q&A session each Thursday, 11:30 AM - 12:30 PM, from 30 Sept onwards (meeting URL: https://global.gotomeeting.com/join/681975669; details: Important Dates).

Who do I contact when I have questions or issues regarding the new HSM system and its usage?

Please contact us via support@dkrz.de or join our HSM Q&A session each Thursday from 11:30 AM to 12:30 PM, from 30 Sept onwards (details: Important Dates).

Archiving and Retrieval

How will I interact with the new system?

A command line tool for archival, retrieval and search of data will be provided. The tool is called slk. slk is missing a few small but very useful features. Therefore, a tool called slk_helpers was written at DKRZ to add these features. Details on these two tools are provided in the HSM documentation at https://doc.dkrz.de.

Where can I use slk and slk_helpers?

slk and slk_helpers will be installed as the modules slk and slk_helpers on all mistral login and compute nodes and, hence, can be used everywhere on mistral.

Can I still use pftp to interact with the new HSM system?

No. A new command line tool will be provided (please see “How will I interact with the new system?”).

How do I login to the HSM system?

Login is done via the command line tool using your DKRZ credentials (LDAP; like mistral/luv/…). The command line tool stores a login token for a specific time period (currently 30 days) so that you do not even need to go through the process of logging in for that period.
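A minimal sketch of the first login on mistral (the module name is taken from “Where can I use slk and slk_helpers?”; the listed path /arch is just an example):

```shell
# load the slk module on mistral
module load slk

# interactive login with your DKRZ (LDAP) credentials;
# a login token valid for a fixed period (currently 30 days) is stored under ~/.slk/
slk login

# subsequent calls within the token's validity period need no new login
slk list /arch
```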

Will Kerberos authentication work on the new HSM system?

No, Kerberos will not work anymore. You only need to provide your login data to the command line tool in certain time intervals.

Do I have to provide my login credentials each time I use the command line tool?

No, a login token is generated at first login. This token is valid for a fixed period of time (currently 30 days) and will then have to be renewed by performing a login operation.

Can I use the command line tool non-interactively?

Yes, it can be used non-interactively when a login token exists. From time to time an interactive session of the command line tool is necessary in order to renew the login token. The command line tool returns proper exit codes so that the success or failure of a program call can automatically be evaluated.
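Since slk returns proper exit codes, a script can branch on success or failure. A minimal sketch (the source and target paths are hypothetical examples):

```shell
#!/bin/bash
# slk returns a proper exit code, so a script can react to failures;
# the paths below are hypothetical examples
if slk archive /work/xz0123/data.tar /arch/xz0123/; then
    echo "archival succeeded"
else
    # a non-zero exit code signals failure; keep the local copy
    echo "archival failed (exit code $?)" >&2
fi
```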

Can I access archived data from outside the DKRZ?

Yes. The current method to access data via sftp/scp will remain unchanged for the first months, except that the name of the server is different. Details are provided in the External Access section of the HSM documentation. New interfaces for access from outside of the DKRZ infrastructure are planned.

Do I have write access to the archive from outside the DKRZ?

No. slk is not made for data transfer via the internet. The sftp/scp access which we offer is read-only. An exception are institutions which have a direct network connection to DKRZ and currently access the HPSS via pftp. Users from these institutions will be able to use slk.

Will the new system be available as a Globus endpoint for external transfers?

No, not at the moment and not in the near future.

Does the tape quota (/arch, /doku), which was assigned to my computing time project, remain unchanged?

Yes, your tape quota remains the same.

How do I create directories in the HSM?

slk itself does not provide a mkdir command, but the slk_helpers do. Use slk_helpers mkdir /ex/am/ple/dir if /ex/am/ple already exists and you only want to create dir. If you want to create several nested folders (like mkdir -p does), please use slk_helpers mkdirs /ex/am/ple/dir. If you want to use only slk and not the slk_helpers, please proceed as follows: create empty directories locally, fill them with non-empty dummy files and archive them via slk archive -R. An example for this process is given in the Use Case section of the HSM documentation.
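The slk-only workaround can be sketched as follows (the target path in the final, commented-out command is a hypothetical example):

```shell
# build the directory tree locally; each directory must not be empty,
# so put a small non-empty dummy file into every level
mkdir -p ex/am/ple/dir
for d in ex ex/am ex/am/ple ex/am/ple/dir; do
    echo "dummy" > "$d/dummy.txt"
done

# finally, archive the whole tree recursively (not run here):
#   slk archive -R ex /arch/xz0123/
```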

Do I manually need to check the integrity of archived and retrieved files?

No, StrongLink automatically checks the integrity of archived and retrieved files. However, you can do so manually as described in the answer to “Does StrongLink automatically check the integrity of archived and retrieved files?”.

Is there an option to continue archiving if it was interrupted?

If the archival of several files was interrupted, slk archive will not upload all files a second time after its restart but only those files that are not already present in the target folder. If only part of a file was uploaded at the time of interruption, the upload of the whole file is restarted when the archival process is restarted. After stopping an archival process, a file fragment remains in the archive. Either the archival process has to be resumed (the fragment is then overwritten) or the file has to be deleted manually. File fragments have basic metadata attached to them but no checksums. If you can get a checksum for a particular file from StrongLink (slk_helpers checksum ...), the file was archived successfully.

Does any command exist for deleting files immediately from /work in case of successful archival?

No, such a tool does not exist. We currently do not plan to provide such a tool.

Will it be possible to archive into my existing folder structure created on HPSS?

Yes, the folder structure and write permissions will remain untouched. The root folder(s) will change.

Will there be a “double” storage feature as for HPSS?

Yes, there will be a “double” storage feature. slk list will indicate via an additional column whether a duplication has taken place already. Please see the chapter “Storage options and quota” in the new HSM documentation for details.

What does “namespace”, “global namespace” or “gns” mean?

StrongLink uses the term “namespace” or “global namespace” (=”gns”). A “(global) namespace” is comparable to a “directory” or “path” on a common file system.

How do I automatically/non-interactively check whether I own a valid slk login token?

slk itself does not yet provide a command that returns the status of the login token as true/false, valid/invalid or similar. However, you can check the validity of your login token via slk_helpers session. If you do not want to use the slk_helpers but still want to check the status of the login token, please use one of the following two commands:
# command 1:
$ slk list /dummy_input < /dev/null > /dev/null 2>&1

# command 2:
$ test `jq .expireDate ~/.slk/config.json` -gt `date +%s`

$? will be 0 if the login token is valid and 1 if it is not. Thanks to Karl-Hermann Wieners for the first command.

You need to have the program jq available for the second command. jq is installed in /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/jq. You might add /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/ to your PATH or set an alias.

Is my slk login token still valid?

The quickest check is slk_helpers session; please also see the next question.

How do I check for how long my login token is still valid?

The simplest way to do this is to call slk_helpers session. Alternatively, the date/time until which the login token is valid is stored in the slk config file (~/.slk/config.json). The key is expireDate. The expiration date is given in seconds since 1970-01-01 00:00:00 UTC. You can convert it into a human-readable form via date -d @SECONDS. You might open the config file with a text editor or print its content with tools like cat or less.
date -d @`jq .expireDate ~/.slk/config.json`

You need to have the program jq available. jq is installed in /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/jq. You might add /sw/rhel6-x64/devtools/jq-1.6-gcc48/bin/ to your PATH or set an alias.
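For illustration, the epoch conversion can be done by hand as well (the expireDate value below is a made-up example; GNU date is assumed):

```shell
# a hypothetical expireDate value read from ~/.slk/config.json
expire=1635120000

# convert seconds since 1970-01-01 00:00:00 UTC into a readable date
date -u -d @"$expire" +"%Y-%m-%d %H:%M:%S UTC"   # prints: 2021-10-25 00:00:00 UTC

# the token is valid as long as the expiry date lies in the future
if [ "$expire" -gt "$(date +%s)" ]; then
    echo "login token still valid"
else
    echo "login token expired"
fi
```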

Can I provide a file list to “slk archive” such as “-T” for “tar”?

Currently, this is not possible.

Can a user run multiple archival and retrieval requests at a time?

Yes, that is possible. However, on interactive nodes (mlogin10X, mistralppY) or in interactive sessions (via salloc), we suggest running only one slk call per user and node to avoid memory issues.

Where on mistral should I run slk?

slk uses much CPU time and memory. Therefore, slk archive and slk retrieve should only be used for small amounts of data on the login nodes (mlogin10X). For large amounts of data, we suggest using the compute/compute2 nodes or the interactive mistralpp nodes. For details see Where and how to use slk.
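A batch job for a large retrieval on the compute partition might look roughly like this (a sketch only; partition, account, resource values, search ID and target path are hypothetical examples):

```shell
#!/bin/bash
#SBATCH --partition=compute      # hypothetical partition name
#SBATCH --account=xz0123         # hypothetical project account
#SBATCH --time=08:00:00
#SBATCH --mem=6GB

module load slk
# retrieve data on a compute node instead of a login node
slk retrieve 856 /work/xz0123/retrieved/
```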

How does slk archive the files: does it tar them itself (similar to packems) or should we tar the files beforehand?

slk does not pack/tar files. Metadata from netCDF files will automatically be imported into the StrongLink database to simplify search and retrieval later on. Direct archival of nc-files is preferable with respect to the metadata import feature. However, many small files are bad for tape performance and might cost additional storage space (see StrongLink HSM -> Packing of data for details). Therefore, the usage of packems is reasonable in the case of large amounts of very small files.

Are there requirements on the file size for the tape archival?

Preferred file size: 10 GB to 100 GB; Lower size limit: small files are not optimal for tape storage. Therefore, we encourage users to pack small files if there is no need to use the netCDF metadata features of StrongLink. Upper size limit: We are testing bigger file sizes right now, but for the first weeks we recommend the same sizes as for HPSS (max. 500 GB). However, this might change soon.

Additional features

Which new features does the HSM System provide?

Extended metadata are harvested from many archived files. Individual files can be searched for via slk search based on these extended metadata. A more user-friendly command line tool is planned to be made available in the future.

From which file types will extended metadata be harvested?

Harvesting from netCDF files will be provided by StrongBox. Further formats are being investigated and could be introduced later.

Which metadata fields will be harvested from netCDF files?

All global attributes and variable names of netCDF files will be stored in a metadata database. It will be possible to search for each of these global attributes. Hence, properly self-described and standardized files will be easier to find later on. These metadata are read-only. Metadata from a standardized subset of global attributes will be copied into an indexed metadata database. These can be modified and searched more efficiently.

Is there a python interface available?

No, there is no python interface available.

Can the user apply the slk chmod and slk group (=chgrp) commands recursively?

Yes, it is possible. Please provide -R to apply these commands recursively.

Are the search IDs user specific?

No, the search IDs are assigned globally. E.g. the search ID 423 exists only once. Each search ID can be used by every user. Thus, you can share your search IDs with your colleagues. However, the output of slk list SEARCH_ID or retrieval of slk retrieve SEARCH_ID ... depends on the read permissions of the executing user.

How long are the search IDs stored?

This is not decided yet. This is configurable by the administrators. We will monitor whether (or when) a performance degradation takes place and act accordingly.

Is a search ID automatically updated when new files are archived which match the original search query?

No, the IDs of files matching the search query are stored once when the search is performed. This list of file IDs will not be updated afterwards – except if files on the list are deleted. However, file-specific metadata, such as file size or permissions, are retrieved at the time when the search ID is used: slk list SEARCH_ID will show today’s sizes of the files covered by the search ID SEARCH_ID. Files that once matched the search query are still listed by slk list even if they no longer match the original search query. This might happen if a file is renamed.

Can I share my search’s search ID with other DKRZ users?

Yes, you can. Please see “Are the search IDs user specific?” for details.

What does “RQL” mean?

RQL stands for “resource query language” and is another name for the “StrongLink Query Language”.

Why does slk search show more search results than slk list lists for this search id?

We see this behaviour of slk search and slk list:
$ slk search '{"resources.name": {"$regex": "hops"}}'
Total resources found: 2. Search complete.
Search ID: 735

$ slk list 735 | cat
/ex/am/ple/testing/testing/test03   -rw-r--r--  k204221  group1076   4 B   08 Apr 2021  hops_$.txt
Files: 1

slk search counts all files and namespaces matching the search query that the user is allowed to see/read. slk list search_id prints only files (no namespaces) that the user is allowed to see/read. In contrast, slk list namespace lists files and sub-namespaces in a namespace. The example below clarifies the situation. In the example we assume that the sub-namespaces test1 and test2 do not contain any files.

$ slk search '{"path": {"$gte": "/ex/am/ple/testing/testing/test03/test"}}'
Total resources found: 3. Search complete.
Search ID: 856

$ slk list 856
-rw-r--r--  k204221    bm0146    16.1M  01 Apr 2021   some_file.nc
Files: 1

$ slk list /ex/am/ple/testing/testing/test03/test
drwxr-xr-x  k204221    bm0146           06 Apr 2021   test1
drwxr-xr-x  k204221    bm0146           06 Apr 2021   test2
-rw-r--r--  k204221    bm0146   16.1M   01 Apr 2021   some_file.nc
Files: 3

Is there any possibility to move around in the filesystem with something like the cd command?

No, this is not possible. slk does not start its own shell like pftp or plain ftp do. It rather works like scp.

When slk list shows a file with “-” (not “t”), which means it exists in the cache: does that mean it is not yet on the tape?

Right now it means that the file is in the cache; it can additionally be on tape. If the t is shown, the file is only on tape. We are working on displaying both locations at some point.

For a better overview of the archived files, is there a possibility to list only folders, not all files?

When you use slk list with a specific directory path, it shows all files and directories in that directory. If you use the -R flag, it recursively shows all files and folders below that directory path. So if you want a clean overview, omitting -R is the way to go. You might use slk list GNS_PATH | grep -E "^d" to print only folders.

Is it possible to remove files from the archive?

Yes, you can use slk delete for removing files and slk delete -R for removing directories.

How do I print the version of slk?

Please run slk version to print the version of slk. A --version flag or similar does not exist.

Advanced Technical Aspects

Can a user influence if data is written into the HSM cache or onto tape?

No. Fresh data (meant for archival) is first copied into the disc cache and then slowly written onto tape. When data is retrieved from tape, it is first copied into the disc cache and from there to the user-defined target file system.

How long does a file stay in the cache?

We cannot give any numbers. The residence time in the cache depends on the size of the files and the usage of the cache. We will run clean-up jobs regularly.

How fast can data be read from the HSM?

The target transfer rate between single nodes on mistral and the HSM cache is 1 GB/s. It might be reduced when the traffic is high. The retrieval rate from tape depends considerably on how many read and write operations of other users are performed in parallel.

How do I determine the id (uid) of a DKRZ user?

Please use one of the following commands:
# get your user id
$ id -u

# get the id of any user
$ id USER_NAME -u

# get the id of any user
$ getent passwd USER_NAME
#  OR
$ getent passwd USER_NAME | awk -F: '{ print $3 }'

How do I determine the id (gid) of a DKRZ group?

Please use one of the following commands:
# get group ID and group members
$ getent group GROUP_NAME
#  OR
$ getent group GROUP_NAME | awk -F: '{ print $3 }'

# get groups and their ids of all groups of which member you are
$ id

How do I determine the username of a DKRZ user when I have her/his id (uid)?

Please use the following command:
# get the name of a user with uid USER_ID
$ getent passwd USER_ID
#  OR
$ getent passwd USER_ID | awk -F: '{ print $1 }'

How do I determine the group name of a DKRZ group when I have its id (gid)?

Please use one of the following commands:
# get group name of a group with gid GROUP_ID
$ getent group GROUP_ID
#  OR
$ getent group GROUP_ID | awk -F: '{ print $1 }'

How do I determine the MIME type of a file?

You could use file --mime-type FILE or file -b --mime-type FILE to determine the MIME type on the Linux shell. Please be aware that different tools determine the MIME type differently (e.g. by file header or by file extension) and MIME type databases might differ. It might be better not to search for a specific MIME type but for a particular file extension – e.g. via {"resources.name": {"$regex": ".*nc$"}}. StrongLink assigns the MIME type application/x-netcdf to netCDF files.
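A quick worked example on the shell (the sample file is created on the spot; typical file implementations report plain text as text/plain):

```shell
# create a small text file and ask `file` for its MIME type
printf 'hello world\n' > sample.txt
file -b --mime-type sample.txt    # typically prints: text/plain
```

To search the archive by file extension instead, the query from above can be passed directly to the search tool, e.g. slk search '{"resources.name": {"$regex": ".*nc$"}}'.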

Can the search ID of slk search be captured by a shell variable?

slk search does not provide this feature out of the box. Currently (might change in future versions), the search ID is printed in columns >= 12 of the second row of the text output of slk search. We can use tail and sed to get the second line and extract a number or use tail and cut to get the second line and drop the first 11 characters. Example:
# normal call of slk search
$ slk search '{"resources.posix_uid": 23501}'
Total resources found: 11. Search complete.
Search ID: 466

# get ID using sed:
$ search_id=`slk search '{"resources.posix_uid": 23501}' | tail -n 1 | sed 's/[^0-9]*//g'`
$ echo $search_id
470

# get ID by dropping first 11 characters of the second line
$ search_id=`slk search '{"resources.posix_uid": 23501}' | tail -n 1 | cut -c12-20`
$ echo $search_id
471

# use awk pattern matching to get the correct line and correct column
$ search_id=`slk search '{"resources.posix_uid": 25301}' | awk '/Search ID/ {print($3)}'`
$ echo $search_id
507

Note

This is an example for bash. When using csh, you need to prepend set to the assignments of the shell variables: set search_id=....

Is the metadata of files within zip/tar files evaluated/ingested?

No, the metadata of packed files will not be ingested. The names of packed files might be extracted and stored somewhere. This is an optional feature that will only be implemented when the capacity of StrongBox Data Solutions allows it.

Will the packems package work with the new HSM system?

Packems is a joint development between the MPI-M and the DKRZ. The developers of both institutions plan to adapt packems to the new HSM system. Please have a look into the packems manual for details and usage of packems: https://code.mpimet.mpg.de/projects/esmenv/wiki/Packems.

Will it be possible to use listems to list files that were archived with packems on the HPSS?

Yes. All files archived with packems onto the HPSS can be listed with listems.

Will it be possible to use unpackems to retrieve files that were archived with packems on the HPSS?

Yes. All files archived with packems onto the HPSS can be retrieved with unpackems.

Can you work directly with files in the archive (e.g. with Python)?

No, you have to download files to change them and archive them again.

Common issues

error “conflict with jdk/…” when the slk module is loaded

slk needs a specific Java version that is automatically loaded with slk. Having other Java versions loaded in parallel might cause unwanted side effects. Therefore, the system throws an error message and aborts.

slk needs a specific Java version

You might encounter an error like this:

$ slk list 12
CLI tools require Java 13 (found 1)

slk needs a specific Java version. This Java version is automatically loaded when you load the slk module. If you have another Java module loaded explicitly, please unload it prior to loading the slk module. If you have already loaded slk, please: (1) unload slk, (2) unload all Java modules and (3) load slk again.
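The three recovery steps can be sketched as shell commands (the name of the conflicting Java module is an assumption and may differ on your system):

```shell
module unload slk    # (1) unload slk
module unload java   # (2) unload the conflicting Java module (name may differ)
module load slk      # (3) load slk again; this pulls in the required Java
```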

slk search yields RQL parse error

ERROR: Search failed. Reason: RQL parse error: No period found in collection field name ().

Either: please use ' around your search query instead of " to prevent operators starting with $ from being evaluated as bash variables.

Or: please escape the $’s belonging to query operators when you use " as delimiters of the query string.

slk login asks me to provide a hostname and/or a domain

If you are asked for this information, the configuration is faulty. Please contact support@dkrz.de and tell us on which machine you are working.

Session key has expired

The error message WARNING: Session key has expired. Please login again: (interactive usage) or ERROR Session key has expired, unable to login in non-interactive mode (non-interactive usage, e.g. in a SLURM batch script) is printed although the session key is not yet 30 days old.

This error/warning might be printed if the connection to StrongLink is not stable or if StrongLink is overloaded. Please contact support@dkrz.de if you experience it.

Login Unsuccessful - Incorrect Credentials

You log in to slk with the correct credentials but get the error Login Unsuccessful - Incorrect Credentials. There is an internal issue in StrongLink that prevents new logins at the moment. Please contact support@dkrz.de if you experience it.

Archival fails and Java NullPointerException in the log

This error message is printed in the log:

2021-07-13 08:33:03 ERROR Unexpected exception
java.lang.NullPointerException: null
    at com.stronglink.slkcli.api.websocket.NodeThreadPools.getBestPool(NodeThreadPools.kt:28) ~[slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.archive.Archive.upload(Archive.kt:191) ~[slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.archive.Archive.uploadResource(Archive.kt:165) ~[slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.archive.Archive.archive(Archive.kt:77) [slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.SlkCliMain.run(SlkCliMain.kt:169) [slk-cli-tools-3.1.62.jar:?]
    at com.stronglink.slkcli.SlkCliMainKt.main(SlkCliMain.kt:103) [slk-cli-tools-3.1.62.jar:?]
2021-07-13 08:33:03 INFO

This error indicates an API issue. A reason might be that one or more StrongLink nodes went offline and the other nodes have not taken over their connections yet. Please notify support@dkrz.de if you experience this error.

Changelog

v1.19, 20 September 2021

  • changed title of FAQ

  • corrected FAQ’s Changelog

v1.18, 17 September 2021

  • added cross-references

  • minor layout changes

v1.17, 17 September 2021

v1.14, 12 July 2021

v1.11, 06 May 2021

v1.06, 08 March 2021

v1.01, 28 January 2021

  • first public version