No edit summary |
(Updating to match new version of source page) |
||
Line 213: | Line 213: | ||
#SBATCH --mail-type=REQUEUE | #SBATCH --mail-type=REQUEUE | ||
#SBATCH --mail-type=ALL | #SBATCH --mail-type=ALL | ||
=== Attaching to a running job === | |||
It is possible to connect to the node running a job and execute new processes there. You might want to do this for troubleshooting or to monitor the progress of a job. | |||
Suppose you want to run the utility [https://developer.nvidia.com/nvidia-system-management-interface <code>nvidia-smi</code>] to monitor GPU usage on a node where you have a job running. The following command runs <code>watch</code> on the node assigned to the given job, which in turn runs <code>nvidia-smi</code> every 30 seconds, displaying the output on your terminal. | |||
{{Command2 | |||
|srun --jobid 123456 --pty watch -n 30 nvidia-smi}} | |||
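As a minimal sketch of how the command above could be parameterized, the following shell function prints the <code>srun</code> line for a given job ID and refresh interval instead of running it, so you can review it first. The function name <code>attach_cmd</code> is hypothetical and not part of Slurm; the sketch also assumes plain numeric job IDs, so array-style IDs such as <code>123456_1</code> are rejected.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print the srun command that
# attaches a periodic nvidia-smi monitor to a running job.
# Usage: attach_cmd JOBID [INTERVAL_SECONDS]
attach_cmd() {
    jobid=$1
    interval=${2:-30}   # default to the 30-second interval used above

    # Assumption: job IDs are plain integers; reject anything else.
    case $jobid in
        ''|*[!0-9]*) echo "invalid job ID: $jobid" >&2; return 1 ;;
    esac
    # The interval must also be a whole number of seconds.
    case $interval in
        ''|*[!0-9]*) echo "invalid interval: $interval" >&2; return 1 ;;
    esac

    # Only print the command; nothing is launched on the node here.
    echo "srun --jobid $jobid --pty watch -n $interval nvidia-smi"
}

# Print the command for job 123456 with a 30-second refresh.
attach_cmd 123456 30
```

Printing rather than executing keeps the helper free of side effects; once the output looks right, paste it into your terminal to attach to the job.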
You can launch several monitoring commands at once using [https://en.wikipedia.org/wiki/Tmux <code>tmux</code>]. The following command launches <code>htop</code> and <code>watch nvidia-smi</code> in separate panes to monitor the activity on a node assigned to the given job. Because no interval is given, <code>watch</code> refreshes at its default of every 2 seconds. | |||
{{Command2 | |||
|srun --jobid 123456 --pty tmux new-session -d 'htop -u $USER' \; split-window -h 'watch nvidia-smi' \; attach}} | |||
Processes launched with <code>srun</code> share the resources of the specified job. Be careful not to launch processes that would use a significant portion of the resources allocated to the job. Using too much memory, for example, might result in the job being killed; using too many CPU cycles will slow down the job. | |||
'''Note:''' The <code>srun</code> commands shown above work only for monitoring a job submitted with <code>sbatch</code>. To monitor an interactive job, create multiple panes with <code>tmux</code> and start each process in its own pane. | |||
== Cancelling a job == | == Cancelling a job ==