Frequently Asked Questions
"Disk quota exceeded" on /project filesystem
Some users have seen this message or some similar quota error on their project folders. Other users have reported obscure failures while transferring files into their
/project folder from another cluster. Many of the problems reported are due to bad file ownership.
Use diskusage_report to see if you are at or over your quota:
[ymartin@cedar5 ~]$ diskusage_report
Description                     Space          # of files
Home (user ymartin)             345M/50G       9518/500k
Scratch (user ymartin)          93M/20T        6532/1000k
Project (group ymartin)         5472k/2048k    158/5000k
Project (group def-zrichard)    20k/1000G      4/5000k
The example above illustrates a frequent problem: /project for user ymartin contains too much data in files belonging to group ymartin. The data should instead be in files belonging to group def-zrichard.
Note the two lines labelled Project:
- Project (group ymartin) describes files belonging to group ymartin, which has the same name as the user. This user is the only member of this group, which has a very small quota (2048k).
- Project (group def-zrichard) describes files belonging to a project group. Your account may be associated with one or more project groups, and they will typically have names like the def-zrichard group shown here.
In this example, files have somehow been created belonging to group
ymartin instead of group
def-zrichard. This is neither the desired nor the expected behaviour.
By design, new files and directories in
/project will normally be created belonging to a project group. The two main reasons why files may be associated with the wrong group are that
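- files were moved from another location into /project with the mv command, which keeps their original group ownership; to avoid this, copy the files instead of moving them;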
- files were transferred from another cluster using rsync or scp with an option to preserve the original group ownership. If you have a recurring problem with ownership, check the options you are using with your file transfer program.
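As an illustration of the second point, here is a minimal bash sketch of an rsync call that does not carry group ownership over to the destination. The function name copy_to_project is hypothetical, and the choice of flags is an assumption about what suits a typical transfer: -a implies -g (preserve group) and -p (preserve permissions), so --no-g and --no-p are placed after it to switch those two off again.

```shell
# Hypothetical wrapper: copy a directory tree with rsync without
# preserving the source group or permission bits.
copy_to_project() {
    local src="$1" dest="$2"
    # -a        archive mode (recursion, symlinks, times, and more)
    # -v        list the files as they are transferred
    # --no-g    do not preserve the group of the source files
    # --no-p    do not preserve the permissions of the source files
    rsync -av --no-g --no-p "$src" "$dest"
}
```

With this sketch, new files at the destination should pick up their group from the destination directory rather than from the source cluster. The paths passed in are up to you; none are assumed here.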
To see the project groups you may use, run the following command:
[name@server ~]$ stat -c %G $HOME/projects/*/
If you are the owner of the files, you can run the
chgrp command to change their group ownership to the appropriate project group. To ask us to change the group owner for several users, contact technical support.
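For a directory tree with many files, the chgrp step can be combined with find. The following bash sketch is one possible way to do it, not a site-provided tool: the function name fix_group is hypothetical, and the path in the usage comment is an assumption based on the $HOME/projects layout shown above.

```shell
# Hypothetical helper: reassign every file under a directory that belongs
# to the wrong group to the given project group.
fix_group() {
    local dir="$1" wrong_group="$2" project_group="$3"
    # -group selects files owned by the wrong group,
    # -print lists each one, and chgrp -h changes the group
    # (for symbolic links, the link itself rather than its target).
    find "$dir" -group "$wrong_group" -print \
        -exec chgrp -h "$project_group" {} +
}

# Usage with the names from the report above (the path is an assumption):
# fix_group "$HOME/projects/def-zrichard" ymartin def-zrichard
```

This only works for files you own. Running diskusage_report again afterwards should show the usage moving from the user group to the project group.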
See Project layout for further explanations.
"sbatch: error: Batch job submission failed: Socket timed out on send/recv operation"
You may see this message when the load on the Slurm manager or scheduler process is too high. We are working both to improve Slurm's tolerance of that and to identify and eliminate the sources of load spikes, but that is a long-term project. The best advice we have currently is to wait a minute or so. Then run
squeue -u $USER and see if the job you were trying to submit appears: in some cases the error message is delivered even though the job was accepted by Slurm. If it doesn't appear, simply submit it again.
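The wait, check and resubmit sequence can be sketched as a small bash function. This is an illustration under stated assumptions rather than a recommended tool: the name submit_with_check is hypothetical, the job is given an explicit name so that squeue can look it up, and the sketch reacts to any sbatch failure, not only to the timeout message.

```shell
# Hypothetical helper: submit a job script and, if sbatch reports an error,
# check whether the job was accepted anyway before submitting again.
submit_with_check() {
    local script="$1" job_name="$2" wait_seconds="${3:-60}"

    # Normal case: sbatch succeeds and nothing else is needed.
    if sbatch --job-name="$job_name" "$script"; then
        return 0
    fi

    # Give the scheduler a minute or so to recover.
    sleep "$wait_seconds"

    # -h drops the header line, -u and -n filter by user and job name.
    if [ -n "$(squeue -h -u "$USER" -n "$job_name")" ]; then
        echo "Job '$job_name' was accepted despite the error message."
        return 0
    fi

    echo "Job '$job_name' is not in the queue; submitting it again."
    sbatch --job-name="$job_name" "$script"
}

# Usage (script and job names are placeholders):
# submit_with_check my_job.sh my_job_name
```

Choosing a job name that is unique among your queued jobs keeps the squeue check from matching an older job.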