For more information, see the [[Project layout/fr|Project layout]] page.
==''sbatch: error: Batch job submission failed: Socket timed out on send/recv operation'' ==
You may see this message when the load on the [[Running jobs|Slurm]] manager or scheduler process is too high. We are working both to improve Slurm's tolerance of high load and to identify and eliminate the sources of load spikes, but that is a long-term project. For now, the best advice is to wait a minute or so, then run <code>squeue -u $USER</code> and check whether the job you were trying to submit appears; in some cases the error message is delivered even though Slurm accepted the job. If the job does not appear, simply submit it again.
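The wait, check, and resubmit procedure described above could be scripted roughly as follows. This is only a sketch under stated assumptions, not an official tool: the function name <code>submit_with_check</code>, its arguments, and its return codes are hypothetical, and it assumes you give each job a unique name so that it can be recognized in the <code>squeue</code> output. It also retries after any <code>sbatch</code> failure, not only the socket timeout, so an error in the job script itself would be retried as well.

```shell
# Hypothetical helper (not part of Slurm): submit a job, and if sbatch
# reports an error, wait and check the queue before submitting again.
# Assumption: "jobname" is unique among your jobs, so finding it in the
# squeue output means the failed submission was in fact accepted.
submit_with_check() {
    local script="$1"          # path to the job script
    local jobname="$2"         # unique job name used to find the job later
    local wait_secs="${3:-60}" # how long to wait before checking the queue
    local max_tries="${4:-3}"  # give up after this many sbatch attempts
    local try=1
    local queued

    while [ "$try" -le "$max_tries" ]; do
        if sbatch --job-name="$jobname" "$script"; then
            return 0
        fi
        echo "sbatch failed (attempt $try); waiting ${wait_secs}s" >&2
        sleep "$wait_secs"

        # -h: no header, -u: only this user, -n: only this job name,
        # -o %i: print just the job ID
        if queued=$(squeue -h -u "$USER" -n "$jobname" -o %i); then
            if [ -n "$queued" ]; then
                echo "job '$jobname' is queued as $queued; not resubmitting" >&2
                return 0
            fi
        else
            # squeue failed as well, so we cannot tell whether the job was
            # accepted; stop here rather than risk a duplicate submission.
            echo "squeue failed; check the queue manually later" >&2
            return 2
        fi
        try=$((try + 1))
    done
    return 1
}
```

For example, <code>submit_with_check job.sh myjob-001</code> would try up to three submissions, one minute apart. Waiting longer between attempts is gentler on an already overloaded scheduler than retrying quickly.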