Running jobs/fr: Difference between revisions

Jump to navigation Jump to search
Updating to match new version of source page
No edit summary
(Updating to match new version of source page)
Line 213: Line 213:
  #SBATCH --mail-type=REQUEUE
  #SBATCH --mail-type=REQUEUE
  #SBATCH --mail-type=ALL
  #SBATCH --mail-type=ALL
=== Attaching to a running job ===
It is possible to connect to the node running a job and execute new processes there. You might want to do this for troubleshooting or to monitor the progress of a job.
Suppose you want to run the utility [https://developer.nvidia.com/nvidia-system-management-interface <code>nvidia-smi</code>] to monitor GPU usage on a node where you have a job running. The following command runs <code>watch</code> on the node assigned to the given job, which in turn runs <code>nvidia-smi</code> every 30 seconds, displaying the output on your terminal.
{{Command2
|srun --jobid 123456 --pty watch -n 30 nvidia-smi}}
It is possible to launch multiple monitoring commands using [https://en.wikipedia.org/wiki/Tmux <code>tmux</code>]. The following command launches <code>htop</code> and <code>nvidia-smi</code> in separate panes to monitor the activity on a node assigned to the given job.
{{Command2
|srun --jobid 123456 --pty tmux new-session -d 'htop -u $USER' \; split-window -h 'watch nvidia-smi' \; attach}}
Processes launched with <code>srun</code> share the resources with the job specified. You should therefore be careful not to launch processes that would use a significant portion of the resources allocated for the job. Using too much memory, for example, might result in the job being killed; using too many CPU cycles will slow down the job.
'''Note''' the <code>srun</code> commands shown above work only to monitor a job submitted with <code>sbatch</code>. To monitor an interactive job, create multiple panes with <code>tmux</code> and start each process in its own pane.


== Annulation d'une tâche ==
== Annulation d'une tâche ==
35,719

edits

Navigation menu