Using nearline storage

Nearline is a filesystem virtualized onto tape

Nearline storage is a disk-tape hybrid filesystem with a layout like Project, except that it can virtualize files by moving them to tape-based storage according to criteria such as age and size, and then back to disk upon read or recall operations. It is intended for managing less frequently used files. While on tape, files do not consume your disk quota, but they can still be accessed, albeit more slowly than files on the home, scratch and project filesystems.

This is useful because the capacity of our tape libraries is both large and expandable. When a file has been moved to tape (or virtualized), it still appears in the directory listing. If the virtual file is read, the reading process will block for some time, probably a few minutes, while the file contents are copied from tape to disk.

Expected use

Because of the delay in reading from tape, nearline is not intended to be used by running jobs, whose allocated time would be wasted while they wait for files to be recalled. It is only accessible as a directory on certain nodes of the clusters, and never on compute nodes.

Nearline is intended for use with relatively large files and should not be used for a large number of small files. In fact, files smaller than a certain threshold size may not be moved to tape at all. Files smaller than about 200 MB should be combined into archive files (tarballs) using tar or a similar tool.
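As an illustration of combining small files, here is a minimal sketch of a shell function that bundles one directory into a single tarball. The function name bundle_dir and the paths in the usage line are hypothetical; only the tar command itself is taken from the recommendation above.

```shell
# Bundle a directory of small files into one tarball, so that nearline
# holds a single large file instead of many small ones.
# Usage: bundle_dir SOURCE_DIR DEST_DIR   (hypothetical helper)
bundle_dir() {
    local src="$1" dest="$2"
    local name
    name="$(basename "$src")"
    # -C switches to the parent directory so the archive stores relative paths
    tar -cf "$dest/$name.tar" -C "$(dirname "$src")" "$name" || return 1
    # Listing the archive checks that it is readable before any originals are removed
    tar -tf "$dest/$name.tar" > /dev/null
}
```

For example, bundle_dir ~/scratch/results ~/nearline/PROJECT would create ~/nearline/PROJECT/results.tar, assuming a directory named results exists in your scratch space.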

Access

Nearline is only accessible as a directory on login nodes and on DTNs (Data Transfer Nodes).

To use nearline, just put files into your ~/nearline/PROJECT directory. After a period of time (24 hours as of February 2019), they will be copied onto tape. If the file remains unchanged for another period (24 hours as of February 2019), the copy on disk will be removed, making the file virtualized on tape.

If you remove a file from ~/nearline, the tape copy will be retained for up to 60 days. To restore such a file, contact technical support with the full path for the file(s) and desired version (by date), just as you would for restoring a backup. Note that since you will need the full path for the file, it is important for you to retain a copy of the complete directory structure of your nearline space. For example, you can run the command ls -R > ~/nearline_contents.txt from the ~/nearline/PROJECT directory so that you have a copy of the location of all the files.
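As an alternative to ls -R, the following sketch records one full path per line, which may be easier to quote in a restore request. The function name save_listing is hypothetical, and the use of find instead of ls -R is a substitution made here for illustration, not the documented procedure.

```shell
# Record the full path of every file under a nearline directory.
# Usage: save_listing NEARLINE_DIR OUTPUT_FILE   (hypothetical helper)
save_listing() {
    local dir="$1" out="$2"
    local real
    # pwd -P resolves symbolic links, in case the directory is reached through one
    real="$(cd "$dir" && pwd -P)" || return 1
    # find prints each file with its complete path; sort keeps the list stable
    find "$real" -type f | sort > "$out"
}
```

For example, save_listing ~/nearline/PROJECT ~/nearline_contents.txt writes the list to your home directory. Keeping the output file outside ~/nearline means it stays available if the nearline files are removed.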

Nearline service similar to that on Graham will be available soon.

HPSS is the nearline service on Niagara.
There are three methods to access the service:

1. By submitting HPSS-specific commands htar or hsi to the Slurm scheduler as a job in one of the archive partitions; see the HPSS documentation for detailed examples. Using job scripts offers the benefit of automating nearline transfers and is the best method if you use HPSS regularly. Your HPSS files can be found in the $ARCHIVE directory, which is like $PROJECT but with /project replaced by /archive.

2. To manage a small number of files in HPSS, you can use the VFS (Virtual File System) node, which is accessed with the command salloc --time=1:00:00 -pvfsshort. Your HPSS files can be found in the $ARCHIVE directory, which is like $PROJECT but with /project replaced by /archive.

3. By using Globus for transfers to and from HPSS using the endpoint computecanada#hpss. This is useful for occasional usage and for transfers to and from other sites.
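The relationship between $PROJECT and $ARCHIVE described in methods 1 and 2 can be sketched as a simple path substitution. This is only an illustration: the function name to_archive_path and the example path are hypothetical, and on Niagara you would normally use the $ARCHIVE variable itself.

```shell
# Map a path under /project to the corresponding path under /archive.
# Usage: to_archive_path PATH   (hypothetical helper)
to_archive_path() {
    local path="$1"
    case "$path" in
        # Replace only the leading /project component
        /project/*) printf '%s\n' "/archive/${path#/project/}" ;;
        /project)   printf '%s\n' "/archive" ;;
        # Paths outside /project are returned unchanged
        *)          printf '%s\n' "$path" ;;
    esac
}
```

For example, to_archive_path /project/g/mygroup/myuser prints /archive/g/mygroup/myuser.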
