HSM News for Dec 2023

Hohoho,

From out in the forest I have come;
I must tell you, it is snowing heavily!

While StrongLink is still far from perfection
I bring you a nice new tool collection.

(loosely based on “Knecht Ruprecht” by Theodor Storm)

new version slk_helpers (1.10.2)

We have released a new slk_helpers version, 1.10.2. The major changes with respect to version 1.9.x are:

  • slk_helpers can now run verify jobs, which check whether archived files are smaller than expected

  • important new commands:
    • is_admin_session: check whether the user is currently logged in as a normal user or as an admin user

    • job_report: fetch raw verify job report; please use result_verify_job instead if possible

    • print_rcrs: print size and checksums of file parts; some HPSS files are stored as two parts on two tapes

    • result_verify_job: list relevant errors found by a verify job

    • search_incomplete: print whether the search is incomplete (still running)

    • search_successful: print whether the search was successful

    • submit_verify_job: run a verify job for a provided set of files

  • new job states: BLOCKED, PAUSED, STOPPED, WAITING, OTHER

  • extended command size by new parameters:
    • -R / --recursive for requesting the size of the content of namespaces recursively

    • --pad-spaces-left for space padding to the left in order to align file/namespace sizes when the command is called multiple times

  • improved handling of connection timeouts of StrongLink

  • changes which might break established workflows:
    • slk_helpers size no longer returns exit code 2 when a namespace is targeted; if -R / --recursive is not set, it now returns exit code 0 and prints 0 as the size.

The module with the new slk_helpers version is slk/3.3.91_h1.10.2_w1.2.1. It is not set as the default module yet but will be on 18 Dec. pyslk has not been updated yet, but existing functions will work with the new slk_helpers version.

new version slk wrappers (1.2.1)

The slk wrapper collection has been updated to version 1.2.1:

  • higher stability with respect to timeouts of StrongLink

  • improved error and warning output

  • two new slk wrapper scripts:
    • slk_wrapper_daily_login_check: starts SLURM jobs which check daily whether the user’s login token is still valid; if the token is due to expire, an email is sent to the user

    • slk_wrapper_weekly_verify_job: starts SLURM jobs which run a weekly verify job for all files of the user in the HSM/StrongLink cache; sends a summary email after each job has finished

Note

If the StrongLink system is under high load, timeout errors might occur. The new slk_helpers and slk_wrappers versions have been made more robust with respect to these errors. However, the wrappers might still fail due to timeouts.

new version pyslk

No new pyslk version has been released. The current pyslk version (1.9.5) works with the new version of the slk_helpers, except that no wrappers exist for the new slk_helpers commands.

cleanup and renaming of slk modules

  • many modules containing old slk / slk_helpers versions have been removed

  • modules which will be removed at the end of Dec 2023:
    • slk/3.3.91

    • slk_helpers/1.9.9

    • slk_helpers/1.9.7

    • slk_helpers/1.9.6

    • slk_helpers/1.9.5

  • new combined module for slk, slk_helpers and slk wrappers:
    • slk/<SLK_VERSION>_h<SLK_HELPERS_VERSION>_w<SLK_WRAPPERS_VERSION>

    • current default module: slk/3.3.91_h1.9.7_w1.0.0

    • next default module: slk/3.3.91_h1.10.2_w1.2.1

    • alternative modules (please switch to slk/3.3.91_h1.10.2_w1.2.1 as soon as possible)
      • slk/3.3.91_h1.10.1_w1.2.0

      • slk/3.3.91_h1.10.2_w1.2.0
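The naming scheme of the combined modules can be taken apart in a script, for example to log which slk_helpers version a job ran with. The following is only a minimal sketch in POSIX shell; parse_slk_module is a hypothetical helper name and not part of slk, slk_helpers or the slk wrappers.

```shell
# Split a combined module name of the form
#   slk/<SLK_VERSION>_h<SLK_HELPERS_VERSION>_w<SLK_WRAPPERS_VERSION>
# into its three version numbers.
# parse_slk_module is a hypothetical helper, not an official tool.
parse_slk_module() {
    name="$1"
    # Reject names which do not follow the combined naming scheme,
    # e.g. the old modules slk/3.3.91 or slk_helpers/1.9.9.
    case "$name" in
        slk/*_h*_w*) ;;
        *) echo "not a combined slk module name: $name" >&2; return 1 ;;
    esac
    rest="${name#slk/}"               # drop the leading "slk/"
    slk_version="${rest%%_h*}"        # text before "_h"
    rest="${rest#*_h}"                # drop everything up to and including "_h"
    helpers_version="${rest%%_w*}"    # text before "_w"
    wrappers_version="${rest#*_w}"    # text after "_w"
    echo "slk=$slk_version slk_helpers=$helpers_version slk_wrappers=$wrappers_version"
}
```

Called as parse_slk_module slk/3.3.91_h1.10.2_w1.2.1, the function prints the three versions on one line; for names of the old, separate modules it returns a non-zero exit code.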

file size verification

The size of files can be verified by verify jobs. This is particularly useful for identifying files which have been archived incompletely. These jobs can only target files that are stored in the HSM cache. Files that have already been written to tape have passed an internal file size verification and are not tested.

Please start a verify job as follows:

$ slk_helpers submit_verify_job /dkrz_test/netcdf/20230925a -R
Submitting up to 1 verify job(s) based on results of search id 576002:
search results: pages 1 to 1 of 1; visible search results: 10; submitted verify job: 176395
Number of submitted verify jobs: 1

A verify job with the id 176395 was submitted. It shares a queue with recall jobs. Thus, if many files are being recalled and the StrongLink queue is full, verify jobs might have to wait some time before they are processed. The job status is checked as follows:

$ slk_helpers job_status 176395
PROCESSING

# wait a few seconds or minutes ...
$ slk_helpers job_status 176395
COMPLETED
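In a script, the submit and status steps above could be automated roughly as follows. This is a sketch under assumptions, not an official workflow: extract_verify_job_ids and wait_for_job are hypothetical helper names, the job id is parsed from the "submitted verify job: <id>" text shown in the example output above, and only the state COMPLETED is treated as finished. A job which ends in any other state is polled until the maximum number of checks is reached.

```shell
# Extract job ids from the output of "slk_helpers submit_verify_job" (read from stdin).
# Assumes each submitted job is reported as "submitted verify job: <id>".
extract_verify_job_ids() {
    sed -n 's/.*submitted verify job: \([0-9][0-9]*\).*/\1/p'
}

# Poll "slk_helpers job_status <id>" until it prints COMPLETED.
# Arguments: job id, seconds between checks (default 60), maximum number of checks (default 60).
# Returns 0 once the job is COMPLETED and 1 if it is not COMPLETED after the last check.
wait_for_job() {
    job_id="$1"
    interval="${2:-60}"
    max_checks="${3:-60}"
    checks=0
    while [ "$checks" -lt "$max_checks" ]; do
        state=$(slk_helpers job_status "$job_id")
        echo "job $job_id: $state"
        if [ "$state" = "COMPLETED" ]; then
            return 0
        fi
        checks=$((checks + 1))
        # Only sleep if another check will follow.
        if [ "$checks" -lt "$max_checks" ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```

A possible use: store the output of submit_verify_job in a variable, pipe it through extract_verify_job_ids, and call wait_for_job for each id that is printed. Please choose a polling interval of a minute or more so that the StrongLink system is not burdened by frequent status requests.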

The results of the verify job can be fetched via slk_helpers result_verify_job:

$ slk_helpers result_verify_job 176395
Errors:
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_b.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_c.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_a.nc
Resource content size does not match record: /dkrz_test/netcdf/20230925a/file_001gb_f.nc
Erroneous files: 4

Four size-mismatch errors were detected. In this case, these files should be re-archived or deleted from the archive.
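To collect the affected files for re-archiving, the paths can be cut out of the result_verify_job output. The sketch below assumes that every affected file is reported on a line starting with "Resource content size does not match record: ", as in the example output above; list_size_mismatches is a hypothetical helper name.

```shell
# Print the paths of all files with a size mismatch, one path per line.
# Reads the output of "slk_helpers result_verify_job <job_id>" from stdin.
# Lines which do not start with the expected error text (such as "Errors:"
# and "Erroneous files: 4") are dropped.
list_size_mismatches() {
    sed -n 's/^Resource content size does not match record: //p'
}
```

For example, slk_helpers result_verify_job 176395 | list_size_mismatches > files_to_rearchive.txt would write the four paths from the output above into a text file.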
