File system performance tuning
The following recommendations can help improve throughput and metadata performance on the parallel filesystems, i.e. on $HOME, $PROJECT, LSDF, workspaces and BeeOND.
Improving Throughput Performance
When designing your application, keep in mind that parallel filesystems generally perform better when data is transferred in large blocks and stored in a few large files. To increase the throughput of a parallel application, consider the following:
- collect data into large chunks and write them sequentially in a single operation
- use several clients to exploit the full filesystem bandwidth
- avoid concurrent access to the same file from different tasks or clients
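The first recommendation can be sketched in Python as a small wrapper that accumulates records in memory and only issues a write once a large chunk has been collected, so the filesystem sees a few large sequential writes instead of many tiny ones. The class name `ChunkedWriter` and the 64 MiB default chunk size are illustrative assumptions, not values tuned for any particular filesystem.

```python
import os


class ChunkedWriter:
    """Collect small records in memory and write them in large sequential chunks."""

    def __init__(self, path: str, chunk_size: int = 64 * 1024 * 1024) -> None:
        # Hypothetical default of 64 MiB per write; adjust for your system.
        self._file = open(path, "wb")
        self._buffer = bytearray()
        self._chunk_size = chunk_size
        self.write_calls = 0  # number of large writes issued to the file

    def write(self, record: bytes) -> None:
        """Append a record; flush once at least chunk_size bytes are pending."""
        self._buffer += record
        if len(self._buffer) >= self._chunk_size:
            self._flush()

    def _flush(self) -> None:
        if self._buffer:
            self._file.write(self._buffer)  # one large sequential write
            self.write_calls += 1
            self._buffer.clear()

    def close(self) -> None:
        self._flush()
        self._file.close()

    def __enter__(self) -> "ChunkedWriter":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "output.bin")
        with ChunkedWriter(path, chunk_size=1024 * 1024) as writer:
            for i in range(100_000):
                writer.write(i.to_bytes(4, "little") * 25)  # 100-byte record
        print(writer.write_calls, os.path.getsize(path))  # prints: 10 10000000
```

Here 100,000 small records reach the file through only 10 write operations. The `write_calls` counter exists only to make that reduction visible; in a real application each task would write its own data through such a buffer rather than sharing one file handle between tasks.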
Spectrum Scale normally distributes the data of very large files across all disks, so no adjustments are required by the user. Other parallel filesystems, such as BeeOND, use a fixed stripe count that determines how many disks store a single file. Therefore, if many tasks access a few very large files on BeeOND, select a directory with a high stripe count at the root of the BeeOND filesystem.
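Why the stripe count matters can be illustrated with a simplified model: assume a file is cut into fixed-size chunks that are placed round-robin on its stripe set, so chunk k lands on target k mod stripe_count. The function `targets_touched` and the 512 KiB chunk size below are hypothetical and only serve the illustration; BeeOND's real target selection and chunk sizes are not modelled.

```python
def targets_touched(offset: int, length: int, chunk_size: int, stripe_count: int) -> set[int]:
    """Return the indices (0 .. stripe_count - 1) of the storage targets that
    hold bytes [offset, offset + length) of a file striped round-robin."""
    if length <= 0:
        return set()
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    if last_chunk - first_chunk + 1 >= stripe_count:
        # The range covers a full stripe, so every target is involved.
        return set(range(stripe_count))
    # Chunk k of the file is stored on target k mod stripe_count.
    return {k % stripe_count for k in range(first_chunk, last_chunk + 1)}


if __name__ == "__main__":
    CHUNK = 512 * 1024   # assumed chunk size (illustrative only)
    SLICE = 1024 * 1024  # each task reads its own 1 MiB slice of one shared file
    for stripes in (4, 32):
        busy = set()
        for task in range(16):
            busy |= targets_touched(task * SLICE, SLICE, CHUNK, stripes)
        print(f"stripe count {stripes:2d}: 16 tasks share {len(busy)} targets")
```

Under this model, 16 tasks reading one file with a stripe count of 4 all compete for the same 4 disks, whereas a stripe count of 32 spreads the same accesses over 32 disks.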
Improving Metadata Performance
Metadata performance on parallel filesystems is usually worse than on local filesystems. Therefore, avoid metadata operations wherever possible; for example, a few large files are much better than many small files. To increase the metadata performance of a parallel application, consider the following:
- avoid creating many small files
- avoid concurrent access to the same directory, e.g. by creating files in a separate subdirectory for each task
- if many small files are used by only one node, store them on $TMP
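One way to apply the first two recommendations together is sketched below: a task that would otherwise produce many small files packs them into a single tar archive inside its own subdirectory, so the filesystem sees one file creation per task instead of one per result. The functions `write_task_results` and `read_task_result`, the `task_NNNNN` directory naming and the choice of tar as container format are illustrative assumptions; any format that stores many records in one file serves the same purpose.

```python
import io
import os
import tarfile


def write_task_results(base_dir: str, task_id: int, results: dict[str, bytes]) -> str:
    """Store all small results of one task in a single tar archive
    inside a subdirectory that only this task writes to."""
    task_dir = os.path.join(base_dir, f"task_{task_id:05d}")  # hypothetical naming
    os.makedirs(task_dir, exist_ok=True)
    archive_path = os.path.join(task_dir, "results.tar")
    with tarfile.open(archive_path, "w") as tar:
        for name, payload in results.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))  # archive member, not a separate file
    return archive_path


def read_task_result(archive_path: str, name: str) -> bytes:
    """Read one member back from the archive without unpacking the rest."""
    with tarfile.open(archive_path, "r") as tar:
        member = tar.extractfile(name)  # raises KeyError if the name is missing
        if member is None:
            raise KeyError(name)
        return member.read()
```

If the small files are only needed by a single node, `base_dir` can instead point to $TMP, e.g. `os.environ["TMP"]`, assuming the variable is set in the job environment.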