Translations:Best practices for job submission/10/en

From Alliance Doc
  • Increase the estimated duration by 5% to 10% as a safety margin.
    • It's natural to leave some room for error in the estimate, but beyond that it's in your interest for your estimate of the job's duration to be as accurate as possible.
  • For longer jobs, such as those running for more than 48 hours, consider using checkpoints if the software permits this.
    • With a checkpoint, the program writes a snapshot of its state to a file on disk, and it can later be restarted from this file at that precise point in the calculation. If your program writes a checkpoint file every six or eight hours, then even if a power outage or some other interruption affects the compute node(s) used by your job, you lose at most the work done since the last checkpoint.
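
The checkpoint pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism of any particular application: the file name checkpoint.json, the six-hour interval, the toy state dictionary and the function names are all hypothetical, and real software usually provides its own checkpoint or restart options.

```python
import json
import os
import tempfile
import time

CHECKPOINT_PATH = "checkpoint.json"   # hypothetical file name
CHECKPOINT_INTERVAL_S = 6 * 3600      # assumed interval: every six hours


def save_checkpoint(state, path=CHECKPOINT_PATH):
    """Write the state to a temporary file, then rename it into place.

    The rename is atomic, so an interruption during the write cannot
    leave a half-written checkpoint under the final name.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(n_steps, path=CHECKPOINT_PATH, interval_s=CHECKPOINT_INTERVAL_S):
    """Toy calculation (a running sum) that resumes from the last checkpoint."""
    state = load_checkpoint(path)
    last_save = time.monotonic()
    while state["step"] < n_steps:
        state["total"] += state["step"]   # stand-in for one unit of real work
        state["step"] += 1
        if time.monotonic() - last_save >= interval_s:
            save_checkpoint(state, path)
            last_save = time.monotonic()
    save_checkpoint(state, path)          # final snapshot when the work is done
    return state
```

If the job is interrupted and resubmitted, calling run() again picks up at the step recorded in the checkpoint file instead of starting from zero. A shorter interval loses less work after an interruption but spends more time writing to disk.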