Translations:Data management at Niagara/2/en

From Alliance Doc

Performance

The filesystems on SciNet, with the exception of archive, are GPFS, a high-performance filesystem that provides rapid reads and writes to large datasets in parallel from many nodes. As a consequence of this design, however, the filesystem performs quite poorly when accessing datasets that consist of many small files. For instance, you will find that reading data from one 16 MB file is enormously faster than reading the same data from 400 files of 40 KB each. Such small files are also quite wasteful of space, as the block size for the scratch and project filesystems is 16 MB. Keep this in mind when planning your input/output strategy for runs on SciNet.
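
One common way to avoid the many-small-files pattern is to bundle the files into a single archive and read individual members from it. The following is a minimal sketch using only Python's standard `tarfile` module, not a SciNet-recommended tool. The function names `pack_small_files` and `read_member` and any paths passed to them are hypothetical and chosen for illustration.

```python
import tarfile
from pathlib import Path


def pack_small_files(src_dir, archive_path):
    """Bundle every regular file under src_dir into one uncompressed tar file.

    Returns the number of files added. Names and paths here are
    illustrative assumptions, not part of any SciNet tooling.
    """
    src = Path(src_dir)
    # List the inputs before creating the archive, so a new archive written
    # inside src_dir is not picked up as one of its own members.
    files = sorted(p for p in src.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w") as tar:
        for path in files:
            # Store paths relative to src_dir, with forward slashes.
            tar.add(path, arcname=path.relative_to(src).as_posix())
    return len(files)


def read_member(archive_path, member_name):
    """Return the bytes of one member without unpacking the whole archive."""
    with tarfile.open(archive_path, "r") as tar:
        fileobj = tar.extractfile(member_name)
        if fileobj is None:
            raise ValueError(f"{member_name} is not a regular file")
        return fileobj.read()
```

With this approach a job opens one large file on the parallel filesystem instead of hundreds of small ones. For example, `pack_small_files("inputs", "inputs.tar")` followed by `read_member("inputs.tar", "sub/g.dat")` would fetch a single member. Whether this helps a given workload depends on its access pattern, so it is worth measuring before adopting it.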