HSM News for Oct 2023#

pyslk release 1.9.5#

  • will be installed in module python3/2023.01-gcc-11.2.0 on Oct 17th

  • major changes in pyslk 1.9.3, 1.9.4 and 1.9.5:
    • new function pyslk.construct_dst_from_src(), which accepts source file(s) and a destination root path as input and constructs the destination path each source file would have if it were archived by slk archive

    • function json_str2hsm is now implemented (it was not implemented before)

    • improved type checking and corrected output types

    • extended deprecation warnings of functions

    • updated error types and error messages

    • major updates to the pyslk documentation (https://hsm-tools.gitlab-pages.dkrz.de/pyslk/index.html)

  • detailed changes: https://hsm-tools.gitlab-pages.dkrz.de/pyslk/changelog.html

  • the next release will be 2.0.0; it will be installed in python3/unstable in Nov/Dec 2023

slk retrieval wrapper#

We provide a few wrapper scripts for slk retrieve and slk recall as part of the slk module on Levante. The core wrapper script is called slk_wrapper_recall_wait_retrieve. The argument --help prints details on the usage:

$ slk_wrapper_recall_wait_retrieve --help
usage:
slk_wrapper_recall_wait_retrieve <account> <source_path> <destination_path> <suffix_logfile>

useful log files:
    slk log file: ~/.slk/slk-cli.log
    wrapper log file: rwr_log_<suffix_logfile>.log
  • <account> has to be a DKRZ project account with allocated compute time. Your account has to be allowed to run SLURM jobs on Levante.

  • <source_path> can be a search id, a path pointing to a namespace or a path pointing to a resource. The wrapper script automatically starts recursive recalls and retrievals. However, it does not split the files by tape. If you wish to combine this wrapper with slk_helpers group_files_by_tape, please have a look at this example.

  • <destination_path>: destination path of the retrieval.

  • <suffix_logfile>: The script automatically creates a log file rwr_log_<suffix_logfile>.log into which relevant output from this script and from child scripts is written.
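
As an illustration of how the four positional arguments fit together, the following minimal Python sketch assembles the wrapper call and derives the name of the wrapper log file from the suffix. The helper name build_wrapper_call and all example values (account ab1234, the paths and the suffix run01) are hypothetical and only serve as placeholders; the sketch prints the command instead of running it.

```python
import shlex

WRAPPER = "slk_wrapper_recall_wait_retrieve"


def build_wrapper_call(account, source_path, destination_path, suffix_logfile):
    """Return the argument list for the wrapper and the expected log file name.

    Hypothetical helper for illustration; it is not part of slk or pyslk.
    """
    argv = [WRAPPER, account, source_path, destination_path, suffix_logfile]
    # The wrapper writes its log to rwr_log_<suffix_logfile>.log
    log_file = f"rwr_log_{suffix_logfile}.log"
    return argv, log_file


# Placeholder values; replace them with your own project account and paths.
argv, log_file = build_wrapper_call(
    "ab1234", "/arch/ab1234/data", "/work/ab1234/data", "run01"
)
print(shlex.join(argv))  # command line as it would be typed in a shell
print(log_file)          # name of the wrapper log file
```

On Levante, the printed command could be pasted into a shell in which the slk module is loaded.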

What does this script do? If the files do not need to be recalled but are stored in the HSM cache, a retrieval is directly started. Otherwise, it submits a SLURM job, which runs slk recall. The ID of the StrongLink recall job is extracted and a new “waiter job” is submitted to SLURM which is delayed by one hour. After one hour this “waiter job” starts and checks the status of the recall job with the given ID. If the recall job …

  • … was successful, a retrieval job is started to copy the file(s) from HSM cache to the Lustre filesystem.

  • … failed, error information is printed to the log file and the script terminates.

  • … is still running or queued, the waiter job submits itself again with another hour of delay.
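
The decision logic described above can be sketched as a small state machine. This is only a simplified model under stated assumptions, not the wrapper's actual implementation: the status names, the action labels and the functions first_step, waiter_step and simulate are hypothetical, and no SLURM or StrongLink calls are made.

```python
from enum import Enum

WAIT_SECONDS = 3600  # the waiter job is delayed by one hour


class RecallStatus(Enum):
    # Hypothetical status names for the StrongLink recall job
    SUCCESSFUL = "successful"
    FAILED = "failed"
    RUNNING = "running"
    QUEUED = "queued"


def first_step(files_in_cache):
    """Decide what happens when the wrapper is started."""
    if files_in_cache:
        # No recall needed: retrieve directly from the HSM cache
        return "retrieve"
    # Otherwise a recall is submitted and a delayed waiter job follows
    return "submit_recall_then_waiter"


def waiter_step(status):
    """Return (action, delay_in_seconds) for one run of the waiter job."""
    if status is RecallStatus.SUCCESSFUL:
        return ("retrieve", 0)
    if status is RecallStatus.FAILED:
        return ("log_error_and_exit", 0)
    # Still running or queued: the waiter resubmits itself with one hour delay
    return ("resubmit_waiter", WAIT_SECONDS)


def simulate(statuses):
    """Apply waiter_step to a sequence of observed statuses until a final action."""
    actions = []
    for status in statuses:
        action, _delay = waiter_step(status)
        actions.append(action)
        if action != "resubmit_waiter":
            break
    return actions


print(first_step(files_in_cache=False))
print(simulate([RecallStatus.QUEUED, RecallStatus.RUNNING, RecallStatus.SUCCESSFUL]))
```

In this model, a recall that is queued, then running, then successful leads to two resubmissions of the waiter followed by one retrieval.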

Note

If the StrongLink system is under high load and many timeouts occur, the slk_wrapper_recall_wait_retrieve wrapper might fail partway through.

See also

This example on using the wrapper script in combination with slk_helpers group_files_by_tape