
5.6) Checkpointing


As mentioned before, jobs that exceed their requested run time are terminated by the scheduler. It is good practice for a long-running program to periodically save its state to a file on disk, so that it can be restarted from the most recently saved data.

There is currently no system-level automatic checkpointing on ARC2, so checkpointing has to be done at the user or application level. Some applications have a built-in restart/checkpoint feature, which should be used where available. If you are developing your own code, consider implementing a checkpoint feature. For example:

  • After each main iteration, save the data needed to restart the run to a file.
  • Have a version of the code that can read in the restart file and resume from the last completed iteration.
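The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an ARC2-provided interface: the file name restart.json, the functions save_checkpoint, load_checkpoint and run, and the {"total": ...} state are hypothetical stand-ins for your own program's data. The restart file is first written to a temporary file and then renamed over the old one, so a job killed in the middle of a write still leaves the previous checkpoint intact.

```python
import json
import os
import tempfile

CHECKPOINT_FILE = "restart.json"  # hypothetical restart file name


def save_checkpoint(path, iteration, state):
    """Write the restart data safely: write a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"iteration": iteration, "state": state}, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches disk before renaming
    # os.replace swaps the file in atomically on POSIX (same filesystem)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (next_iteration, state), or (0, None) if no restart file exists."""
    if not os.path.exists(path):
        return 0, None
    with open(path) as f:
        data = json.load(f)
    # Resume from the iteration after the last completed one
    return data["iteration"] + 1, data["state"]


def run(n_iterations, path=CHECKPOINT_FILE):
    start, state = load_checkpoint(path)
    if state is None:
        state = {"total": 0}  # hypothetical initial state for a fresh run
    for i in range(start, n_iterations):
        state["total"] += i  # stand-in for one main iteration of real work
        save_checkpoint(path, i, state)  # checkpoint after every iteration
    return state
```

If the job is terminated after, say, iteration 2 of 5, running the same program again reads restart.json and continues from iteration 3. JSON is used here only to keep the sketch readable; large numerical state may be better suited to a binary format.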

Checkpointing acts as a safeguard against system crashes, and also provides a way to run jobs that need more than the maximum allowed run time of 48 hours: each job stops within the limit, and the next job resumes from the latest checkpoint.
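To run a computation for longer than 48 hours, the program can also watch its own elapsed time and stop cleanly, with a checkpoint already on disk, shortly before the limit, so that a new job can pick up where it left off. The Python sketch below assumes this approach; the 15-minute safety margin, the pickle file format and the names run_segment, save and load are illustrative assumptions rather than anything ARC2 provides. In practice the time budget should be based on the run time actually requested for the job, which may be less than 48 hours.

```python
import os
import pickle
import time

# Assumed limits: 48 hours is the maximum run time; the margin (an assumption)
# leaves time to finish the current iteration and write the checkpoint.
WALLTIME_LIMIT_S = 48 * 3600
SAFETY_MARGIN_S = 15 * 60


def save(path, data):
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(data, f)
    # Rename over the old file, so a kill mid-write keeps the previous checkpoint
    os.replace(tmp, path)


def load(path):
    if not os.path.exists(path):
        return {"iteration": 0, "value": 0.0}  # hypothetical fresh state
    with open(path, "rb") as f:
        return pickle.load(f)


def run_segment(path, n_iterations,
                budget_s=WALLTIME_LIMIT_S - SAFETY_MARGIN_S,
                clock=time.monotonic):
    """Advance the computation until it finishes or the time budget runs out.

    Returns True if all iterations are complete, or False if another job
    is needed to continue from the checkpoint.
    """
    start = clock()
    data = load(path)
    while data["iteration"] < n_iterations:
        if clock() - start > budget_s:
            return False  # out of time; the last checkpoint is already on disk
        data["value"] += 1.0  # stand-in for one real iteration
        data["iteration"] += 1
        save(path, data)  # checkpoint after every completed iteration
    return True
```

run_segment returns False when it stopped early, which the program could turn into a non-zero exit status; how the next job is then submitted (manually or from the job script) depends on the scheduler and is not shown here. The clock parameter exists only so the time limit can be tested without waiting; in normal use the default time.monotonic is used.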