Frequently Asked Questions/fr: Difference between revisions

Created page with "==''sbatch: error: Batch job submission failed: Socket timed out on send/recv operation'' =="
For more information, see the [[Project layout/fr|Project layout]] page.


== ''sbatch: error: Batch job submission failed: Socket timed out on send/recv operation'' ==


You may see this message when the load on the [[Running jobs|Slurm]] manager or scheduler process is too high. We are working both to improve Slurm's tolerance of high load and to identify and eliminate the sources of load spikes, but that is a long-term project. For now, the best advice is to wait about a minute, then run <code>squeue -u $USER</code> and check whether the job you were trying to submit appears: in some cases the error message is delivered even though Slurm accepted the job. If it does not appear, simply submit it again.
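The wait, check, and resubmit advice above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name <code>submit_checked</code>, the idea of tagging the job with a unique name so it can be found in <code>squeue</code>, and the default wait time and retry limit are all hypothetical choices, not part of the advice above.

```shell
#!/bin/bash
# Sketch only: submit a job and, if sbatch reports an error, check whether
# Slurm accepted the job anyway before submitting it again.
# Assumptions (hypothetical, not from the FAQ): the function name, the use of
# a unique job name to look the job up, the 60 s wait and the 3-attempt limit.

submit_checked() {
    local script="$1"          # path to the job script
    local name="$2"            # unique job name, used to find the job in squeue
    local wait_s="${3:-60}"    # seconds to wait after a failed submission
    local max_tries="${4:-3}"  # upper bound on submission attempts
    local try=1

    while [ "$try" -le "$max_tries" ]; do
        # A zero exit status means sbatch itself reported success.
        if sbatch --job-name="$name" "$script"; then
            return 0
        fi

        # Give the scheduler time to recover from the load spike.
        sleep "$wait_s"

        # -h drops the header and -o %i prints only job IDs, so any output
        # means a job with this name is already known to Slurm.
        if [ -n "$(squeue -u "$USER" --name="$name" -h -o %i)" ]; then
            echo "Job '$name' was accepted despite the error." >&2
            return 0
        fi

        try=$((try + 1))
    done

    echo "Job '$name' could not be submitted after $max_tries attempts." >&2
    return 1
}
```

A call might look like <code>submit_checked job.sh myjob_20240101</code>. The retry limit matters because <code>sbatch</code> can also fail for reasons unrelated to scheduler load, such as an invalid job script, in which case resubmitting will not help. If <code>squeue</code> itself times out, its output is empty and the sketch would resubmit, so it is worth checking for duplicate jobs afterwards.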