PetaVision  Alpha
PV::Checkpointer Class Reference
Inheritance diagram for PV::Checkpointer:
PV::Subject

Data Structures

struct  TimeInfo
 

Checkpointer Parameters

List of parameters needed from the Checkpointer class

virtual void ioParam_verifyWrites (enum ParamsIOFlag ioFlag, PVParams *params)
 verifyWrites: If true, calls to FileStream::write are checked by opening the file in read mode and reading back the data and comparing it to the data just written.
 
virtual void ioParam_outputPath (enum ParamsIOFlag ioFlag, PVParams *params)
 mOutputPath: Specifies the absolute or relative output path of the run
 
void ioParam_checkpointWrite (enum ParamsIOFlag ioFlag, PVParams *params)
 checkpointWrite: Flag to determine if the run writes checkpoints.
 
void ioParam_checkpointWriteDir (enum ParamsIOFlag ioFlag, PVParams *params)
 checkpointWriteDir: If checkpointWrite is set, specifies the output checkpoint directory.
 
void ioParam_checkpointWriteTriggerMode (enum ParamsIOFlag ioFlag, PVParams *params)
 mCheckpointWriteTriggerMode: If checkpointWrite is set, specifies the method to checkpoint. More...
 
void ioParam_checkpointWriteStepInterval (enum ParamsIOFlag ioFlag, PVParams *params)
 checkpointWriteStepInterval: If checkpointWrite on step, specifies the number of steps between checkpoints.
 
void ioParam_checkpointWriteTimeInterval (enum ParamsIOFlag ioFlag, PVParams *params)
 checkpointWriteTimeInterval: If checkpointWrite on time, specifies the amount of simulation time between checkpoints.
 
void ioParam_checkpointWriteClockInterval (enum ParamsIOFlag ioFlag, PVParams *params)
 checkpointWriteClockInterval: If checkpointWrite on clock, specifies the amount of clock time between checkpoints. The units are specified using the parameter checkpointWriteClockUnit.
 
void ioParam_checkpointWriteClockUnit (enum ParamsIOFlag ioFlag, PVParams *params)
 checkpointWriteClockUnit: If checkpointWrite on clock, specifies the units used in checkpointWriteClockInterval.
 
void ioParam_checkpointIndexWidth (enum ParamsIOFlag ioFlag, PVParams *params)
 If checkpointWrite is true, checkpointIndexWidth specifies the minimum width for the step number appearing in the checkpoint directory. More...
 
void ioParam_suppressNonplasticCheckpoints (enum ParamsIOFlag ioFlag, PVParams *params)
 
void ioParam_deleteOlderCheckpoints (enum ParamsIOFlag ioFlag, PVParams *params)
 deleteOlderCheckpoints: If checkpointWrite, specifies if the run should delete older checkpoints when writing new ones.
 
void ioParam_numCheckpointsKept (enum ParamsIOFlag ioFlag, PVParams *params)
 mNumCheckpointsKept: If mDeleteOlderCheckpoints is set, keep this many checkpoints before deleting the oldest one. Default is 1 (delete a checkpoint when a newer checkpoint is written).
 
void ioParam_initializeFromCheckpointDir (enum ParamsIOFlag ioFlag, PVParams *params)
 initializeFromCheckpointDir: Sets directory used by Checkpointer::initializeFromCheckpoint(). Layers and connections use this directory if they set their initializeFromCheckpointFlag parameter.
 
void ioParam_lastCheckpointDir (enum ParamsIOFlag ioFlag, PVParams *params)
 lastCheckpointDir: If checkpointWrite is not set, this required parameter specifies the directory in which to write a final checkpoint at the end of the run. Writing the last checkpoint can be suppressed by setting this parameter to the empty string. Relative paths are relative to the working directory.
 

Public Member Functions

 Checkpointer (std::string const &name, MPIBlock const *globalMPIBlock, Arguments const *arguments)
 
virtual void addObserver (Observer *observer) override
 
void checkpointRead (double *simTimePointer, long int *currentStepPointer)
 
void checkpointWrite (double simTime)
 
bool doesVerifyWrites ()
 
void finalCheckpoint (double simTime)
 
std::string const & getBlockDirectoryName () const
 
std::string const & getCheckpointReadDirectory () const
 
char const * getCheckpointWriteDir () const
 
bool getCheckpointWriteFlag () const
 
double getCheckpointWriteSimtimeInterval () const
 
long int getCheckpointWriteStepInterval () const
 
enum CheckpointWriteTriggerMode getCheckpointWriteTriggerMode () const
 
char const * getInitializeFromCheckpointDir () const
 
char const * getLastCheckpointDir () const
 
MPIBlock const * getMPIBlock ()
 
std::string const & getOutputPath ()
 
bool getSuppressNonplasticCheckpoints () const
 
void ioParams (enum ParamsIOFlag ioFlag, PVParams *params)
 
std::string makeOutputPathFilename (std::string const &path)
 
void provideFinalStep (long int finalStep)
 
void readNamedCheckpointEntry (std::string const &objName, std::string const &dataName, bool constantEntireRun)
 
void readNamedCheckpointEntry (std::string const &checkpointEntryName, bool constantEntireRun=false)
 
void readStateFromCheckpoint ()
 
template<typename T >
bool registerCheckpointData (std::string const &objName, std::string const &dataName, T *dataPointer, size_t numValues, bool broadcast, bool constantEntireRun)
 
bool registerCheckpointEntry (std::shared_ptr< CheckpointEntry > checkpointEntry, bool constantEntireRun)
 
void registerTimer (Timer const *timer)
 
void writeTimers (PrintStream &stream) const
 

Protected Member Functions

Response::Status notify (ObserverTable const &table, std::vector< std::shared_ptr< BaseMessage const >> messages, bool printFlag)
 
Response::Status notify (ObserverTable const &table, std::shared_ptr< BaseMessage const > message, bool printFlag)
 
void notifyLoop (ObserverTable const &table, std::vector< std::shared_ptr< BaseMessage const >> messages, bool printFlag, std::string const &description)
 
void notifyLoop (ObserverTable const &table, std::shared_ptr< BaseMessage const > message, bool printFlag, std::string const &description)
 

Private Types

enum  CheckpointWriteTriggerMode { NONE, STEP, SIMTIME, WALLCLOCK }
 
enum  WallClockUnit { SECOND, MINUTE, HOUR, DAY }
 

Private Member Functions

void checkpointNow ()
 
void checkpointToDirectory (std::string const &checkpointDirectory)
 
void checkpointWriteSignal ()
 
void extractCheckpointReadDirectory ()
 
void findWarmStartDirectory ()
 
std::string generateBlockPath (std::string const &baseDirectory)
 
void initBlockDirectoryName ()
 
void initMPIBlock (MPIBlock const *globalMPIBlock, Arguments const *arguments)
 
void ioParamsFillGroup (enum ParamsIOFlag ioFlag, PVParams *params)
 
std::string makeCheckpointDirectoryFromCurrentStep ()
 
bool receivedSignal ()
 
void rotateOldCheckpoints (std::string const &newCheckpointDirectory)
 
bool scheduledCheckpoint ()
 
bool scheduledSimTime ()
 
bool scheduledStep ()
 
bool scheduledWallclock ()
 
void verifyDirectory (char const *directory, std::string const &description)
 
void writeTimers (std::string const &directory)
 

Private Attributes

std::string mBlockDirectoryName
 
int mCheckpointIndexWidth = -1
 
std::string mCheckpointReadDirectory
 
std::vector< std::shared_ptr< CheckpointEntry > > mCheckpointRegistry
 
Timer * mCheckpointTimer = nullptr
 
char * mCheckpointWriteDir = nullptr
 
bool mCheckpointWriteFlag = false
 
double mCheckpointWriteSimtimeInterval = 1.0
 
long int mCheckpointWriteStepInterval = 1L
 
enum CheckpointWriteTriggerMode mCheckpointWriteTriggerMode = NONE
 
char * mCheckpointWriteTriggerModeString = nullptr
 
std::time_t mCheckpointWriteWallclockInterval = 1L
 
std::time_t mCheckpointWriteWallclockIntervalSeconds = 1L
 
char * mCheckpointWriteWallclockUnit = nullptr
 
bool mDeleteOlderCheckpoints = false
 
char * mInitializeFromCheckpointDir = nullptr
 
char * mLastCheckpointDir = nullptr
 
std::time_t mLastCheckpointWallclock = (std::time_t)0
 
MPIBlock * mMPIBlock = nullptr
 
std::string mName
 
double mNextCheckpointSimtime = 0.0
 
long int mNextCheckpointStep = 0L
 
std::time_t mNextCheckpointWallclock = (std::time_t)0
 
int mNumCheckpointsKept = 2
 
ObserverTable mObserverTable
 
std::vector< std::string > mOldCheckpointDirectories
 
int mOldCheckpointDirectoriesIndex
 
std::string mOutputPath = ""
 
bool mSuppressNonplasticCheckpoints = false
 
TimeInfo mTimeInfo
 
std::shared_ptr< CheckpointEntryData< TimeInfo > > mTimeInfoCheckpointEntry = nullptr
 
std::vector< Timer const * > mTimers
 
bool mVerifyWrites = true
 
bool mWarmStart = false
 
int mWidthOfFinalStepNumber = 0
 

Static Private Attributes

static std::string const mDefaultOutputPath = "output"
 

Detailed Description

Definition at line 25 of file Checkpointer.hpp.

Member Function Documentation

void PV::Checkpointer::addObserver ( Observer * observer)
overridevirtual

The virtual method for adding an Observer-derived object. Derived classes must override this method to add the object to their hierarchy.

Reimplemented from PV::Subject.

Definition at line 456 of file Checkpointer.cpp.

void PV::Checkpointer::checkpointNow ( )
private

Called by checkpointWrite() (if there was not a SIGUSR1 signal pending) and finalCheckpoint(). Writes a checkpoint, indexed by the current timestep. If the deleteOlderCheckpoints param was set, and the number of checkpoints exceeds numCheckpointsKept, the oldest checkpoint is deleted, and the just-written checkpoint is rotated onto the list of checkpoints that will be deleted.

Definition at line 772 of file Checkpointer.cpp.

void PV::Checkpointer::checkpointToDirectory ( std::string const &  checkpointDirectory)
private

Creates a checkpoint based at the given directory. If the checkpoint directory already exists, it issues a warning and deletes the timeinfo.bin file in the checkpoint. This way, the presence of the timeinfo.bin file indicates that the checkpoint is complete.

Definition at line 798 of file Checkpointer.cpp.

void PV::Checkpointer::checkpointWriteSignal ( )
private

Called by checkpointWrite if a SIGUSR1 signal was sent (as reported by receivedSignal). It writes a checkpoint, indexed by the current timestep. If the deleteOlderCheckpoints param was set, it does not cause a checkpoint to be deleted, and does not rotate the checkpoint into the list of directories that will be deleted.

Definition at line 752 of file Checkpointer.cpp.

void PV::Checkpointer::extractCheckpointReadDirectory ( )
private

If called when mCheckpointReadDirectory is a colon-separated list of paths, extracts the entry corresponding to the process's batch index and replaces mCheckpointReadDirectory with that entry. Called by configCheckpointReadDirectory. If mCheckpointReadDirectory is not a colon-separated list, it is left unchanged. If it is a colon-separated list, the number of entries must agree with
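The colon-separated selection described above can be sketched as follows. This is a minimal illustration, not the library's implementation: selectReadDirectory is a hypothetical free function, and its out-of-range behavior (throwing std::out_of_range) is an assumption standing in for whatever check the real method performs.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: pick the entry for this process's batch index from a
// colon-separated list. A string without a colon is returned unchanged.
std::string selectReadDirectory(std::string const &directories, std::size_t batchIndex) {
    if (directories.find(':') == std::string::npos) {
        return directories; // not a list; leave as-is
    }
    std::vector<std::string> entries;
    std::istringstream stream(directories);
    std::string entry;
    while (std::getline(stream, entry, ':')) {
        entries.push_back(entry);
    }
    // Assumption: an index past the end is an error (at() throws std::out_of_range).
    return entries.at(batchIndex);
}
```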

Definition at line 595 of file Checkpointer.cpp.

void PV::Checkpointer::ioParam_checkpointIndexWidth ( enum ParamsIOFlag  ioFlag,
PVParams * params 
)
private

If checkpointWrite is true, checkpointIndexWidth specifies the minimum width for the step number appearing in the checkpoint directory.

If the step number needs fewer digits than checkpointIndexWidth, it is padded with zeroes. If the step number needs more, the full step number is still printed. Hence, setting checkpointIndexWidth to zero means that there are never any padded zeroes. If set to a negative number, the width will be inferred from startTime, stopTime and dt. The default value is -1 (infer the width).
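The padding rule can be illustrated with a short sketch. Both functions are hypothetical helpers, not part of PV::Checkpointer, and the inference shown (the digit count of the final step number) is an assumption about how the width is derived from startTime, stopTime and dt.

```cpp
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: digits needed to print the final step number.
// Assumes the final step is (stopTime - startTime) / dt, rounded.
int inferIndexWidth(double startTime, double stopTime, double dt) {
    long finalStep = std::lround((stopTime - startTime) / dt);
    int width = 1;
    while (finalStep >= 10) {
        finalStep /= 10;
        ++width;
    }
    return width;
}

// Hypothetical helper: zero-pad the step number to at least `width` digits.
// A width of zero (or less) adds no padding; longer numbers print in full.
std::string formatStepIndex(long step, int width) {
    std::ostringstream out;
    out << std::setfill('0') << std::setw(width > 0 ? width : 0) << step;
    return out.str();
}
```

With a negative checkpointIndexWidth, a caller would first obtain the width from inferIndexWidth and then pass it to formatStepIndex.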

Definition at line 402 of file Checkpointer.cpp.

void PV::Checkpointer::ioParam_checkpointWriteTriggerMode ( enum ParamsIOFlag  ioFlag,
PVParams * params 
)
private

mCheckpointWriteTriggerMode: If checkpointWrite is set, specifies the method to checkpoint.

Possible choices include

  • step: Checkpoint off of timesteps
  • time: Checkpoint off of simulation time
  • clock: Checkpoint off of clock time. Not implemented yet.
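As an illustration of how the three strings might map onto the CheckpointWriteTriggerMode enum listed under Private Types, here is a hedged sketch. parseTriggerMode is a hypothetical function, and both the case-sensitive matching and the exception on an unknown string are assumptions.

```cpp
#include <stdexcept>
#include <string>

// Enumerators as listed in the class's Private Types section.
enum CheckpointWriteTriggerMode { NONE, STEP, SIMTIME, WALLCLOCK };

// Hypothetical helper: translate the params-file string to the enum.
// Assumption: matching is case-sensitive and unknown strings are errors.
CheckpointWriteTriggerMode parseTriggerMode(std::string const &mode) {
    if (mode == "step") {
        return STEP;
    }
    if (mode == "time") {
        return SIMTIME;
    }
    if (mode == "clock") {
        return WALLCLOCK;
    }
    throw std::invalid_argument("unrecognized checkpointWriteTriggerMode: " + mode);
}
```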

Definition at line 179 of file Checkpointer.cpp.

void PV::Checkpointer::ioParam_suppressNonplasticCheckpoints ( enum ParamsIOFlag  ioFlag,
PVParams * params 
)
private

If checkpointWrite is true and this flag is true, connections will only checkpoint if plasticityFlag=true.

Definition at line 414 of file Checkpointer.cpp.

std::string PV::Checkpointer::makeOutputPathFilename ( std::string const &  path)

Given a relative path, returns a full path consisting of the effective output directory for the process's checkpoint cell, followed by "/", followed by the given relative path. It is a fatal error for the path to be an absolute path (i.e. starting with '/').
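A minimal sketch of the path rule, under stated assumptions: joinOutputPath is a hypothetical free function that takes the effective output directory as an argument, and it throws where the real method reports a fatal error.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the member function. The real method looks up the
// effective output directory itself and exits with a fatal error on misuse.
std::string joinOutputPath(std::string const &outputDir, std::string const &path) {
    if (!path.empty() && path[0] == '/') {
        throw std::invalid_argument("path must be relative: " + path);
    }
    return outputDir + "/" + path;
}
```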

Definition at line 87 of file Checkpointer.cpp.

Response::Status PV::Subject::notify ( ObserverTable const &  table,
std::vector< std::shared_ptr< BaseMessage const >>  messages,
bool  printFlag 
)
protectedinherited

This method calls the respond() method of each object in the given table, using the given vector of messages. If the table consists of objects A, B, and C; and the messages vector consists of messages X and Y, the order is A->X, A->Y, B->X, B->Y, C->X, C->Y.

The objects' respond() method returns one of the Response::Status types: SUCCESS, NO_ACTION, PARTIAL, or POSTPONE.

  • SUCCESS: the object completed the task requested by the messages.
  • NO_ACTION: the object has nothing to do in response to the message, or had already done it.
  • PARTIAL: the object has not yet completed but is making progress. It is expected that there are a small number of discrete tasks, so that an object will not return PARTIAL a large number of times in response to the same message.
  • POSTPONE: the object needs to act but cannot do so until an event outside its control occurs.

If all objects return NO_ACTION, then notify() returns NO_ACTION. If all objects return either SUCCESS or NO_ACTION and there is at least one SUCCESS, then notify() returns SUCCESS. If all objects return either POSTPONE or NO_ACTION and there is at least one POSTPONE, then notify() returns POSTPONE. Otherwise, notify() returns PARTIAL.

Generally each message in the messages vector is sent to each object in the table. However, if an object returns POSTPONE in response to a message, the loop skips to the next object, and does not send any remaining messages to the postponing object.

The rationale behind these rules is so that if the objects in the table are themselves derived from the Subject class, the messages can be passed down the tree and the return values passed up it, and the return value at the top can be interpreted as being over all the components at the bottom, without the topmost object needing to know details of the composition of the objects below it.

If printFlag is true, the method prints information regarding postponement to standard output.
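The rules for combining per-object results can be sketched as below. This is an illustration only: Status is a local stand-in for Response::Status, and combineStatuses is a hypothetical helper that assumes each object has already been reduced to a single status.

```cpp
#include <vector>

// Local stand-in for the four Response::Status values named above.
enum class Status { SUCCESS, NO_ACTION, PARTIAL, POSTPONE };

// Hypothetical helper: reduce one status per object to notify()'s return value.
Status combineStatuses(std::vector<Status> const &perObject) {
    bool anySuccess  = false;
    bool anyPostpone = false;
    bool anyPartial  = false;
    for (Status s : perObject) {
        if (s == Status::SUCCESS) {
            anySuccess = true;
        } else if (s == Status::POSTPONE) {
            anyPostpone = true;
        } else if (s == Status::PARTIAL) {
            anyPartial = true;
        }
    }
    // Any PARTIAL, or a mix of SUCCESS and POSTPONE, falls under "otherwise".
    if (anyPartial || (anySuccess && anyPostpone)) {
        return Status::PARTIAL;
    }
    if (anySuccess) {
        return Status::SUCCESS;
    }
    if (anyPostpone) {
        return Status::POSTPONE;
    }
    return Status::NO_ACTION;
}
```

Because the combined value follows the same four-way scheme as the inputs, the same reduction can be applied again one level up when the objects are themselves Subject-derived.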

Definition at line 15 of file Subject.cpp.

Response::Status PV::Subject::notify ( ObserverTable const &  table,
std::shared_ptr< BaseMessage const >  message,
bool  printFlag 
)
inlineprotectedinherited

A convenience overload of the basic notify method where there is only one message to send to the objects. This overloading handles enclosing the message in a vector of length one.

Definition at line 91 of file Subject.hpp.

void PV::Subject::notifyLoop ( ObserverTable const &  table,
std::vector< std::shared_ptr< BaseMessage const >>  messages,
bool  printFlag,
std::string const &  description 
)
protectedinherited

This method calls the notify() method in a loop until the result is not PARTIAL. If the result is POSTPONE, it exits with a fatal error. The description argument is used in the error message to report which message vector failed. If the result is SUCCESS, notifyLoop() returns to the calling function.

notifyLoop should only be used if the objects that might cause a postponement are themselves in the table of objects; otherwise the routine will hang.
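The loop can be sketched as follows, assuming a callable that performs one notify() pass. notifyUntilSettled is a hypothetical name. It throws where the real method exits with a fatal error, and it returns the number of passes only so the behavior is observable (the real method returns void).

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Local stand-in for Response::Status.
enum class Status { SUCCESS, NO_ACTION, PARTIAL, POSTPONE };

// Hypothetical helper: repeat one notify pass until the result is not PARTIAL.
int notifyUntilSettled(
      std::function<Status()> const &notifyOnce, std::string const &description) {
    int passes = 0;
    Status result;
    do {
        result = notifyOnce();
        ++passes;
    } while (result == Status::PARTIAL);
    if (result == Status::POSTPONE) {
        // The description identifies which message vector failed.
        throw std::runtime_error(description + " was postponed");
    }
    return passes;
}
```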

Definition at line 57 of file Subject.cpp.

void PV::Subject::notifyLoop ( ObserverTable const &  table,
std::shared_ptr< BaseMessage const >  message,
bool  printFlag,
std::string const &  description 
)
inlineprotectedinherited

A convenience overload of the basic notifyLoop method where there is only one message to send to the objects. This overloading handles enclosing the message in a vector of length one.

Definition at line 114 of file Subject.hpp.

bool PV::Checkpointer::receivedSignal ( )
private

If a SIGUSR1 signal has been received by the global root process, clears the signal and returns true. Otherwise returns false.

Definition at line 676 of file Checkpointer.cpp.

void PV::Checkpointer::rotateOldCheckpoints ( std::string const &  newCheckpointDirectory)
private

Called if deleteOlderCheckpoints is true. It deletes the oldest checkpoint in the list of old checkpoint directories, and adds the new checkpoint directory to the list.
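One way to picture the rotation is a fixed-size circular list with a moving index, in the spirit of mOldCheckpointDirectories and mOldCheckpointDirectoriesIndex. The sketch below is an assumption about the bookkeeping, not the actual implementation. CheckpointRotation is a hypothetical type, and it only reports which directory would be deleted instead of touching the file system.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the rotation list. Empty slots mean "nothing to delete yet".
struct CheckpointRotation {
    std::vector<std::string> slots;
    std::size_t next = 0;

    // Assumption: at least one slot is always kept.
    explicit CheckpointRotation(std::size_t numKept) : slots(numKept > 0 ? numKept : 1) {}

    // Stores newDirectory in the oldest slot and returns the directory it
    // displaced (empty if the slot was unused). The caller would delete it.
    std::string rotate(std::string const &newDirectory) {
        std::string displaced = slots[next];
        slots[next]           = newDirectory;
        next                  = (next + 1) % slots.size();
        return displaced;
    }
};
```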

Definition at line 843 of file Checkpointer.cpp.

bool PV::Checkpointer::scheduledCheckpoint ( )
private

Returns true if the params file settings indicate a checkpoint should occur at this timestep. It also advances the appropriate data member for the trigger mode to the next scheduled checkpoint. Returns false otherwise.

Definition at line 700 of file Checkpointer.cpp.

bool PV::Checkpointer::scheduledSimTime ( )
private

Called by scheduledCheckpoint if checkpointWriteTriggerMode is "time". If the simTime is >= the current value of mNextCheckpointSimtime, it advances mNextCheckpointSimtime by the time interval and returns true. Otherwise it returns false.
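A minimal sketch of this check, using a hypothetical SimTimeSchedule struct whose fields stand in for mNextCheckpointSimtime and mCheckpointWriteSimtimeInterval:

```cpp
// Hypothetical stand-in for the two data members involved in the check.
struct SimTimeSchedule {
    double nextCheckpoint = 0.0; // plays the role of mNextCheckpointSimtime
    double interval       = 1.0; // plays the role of mCheckpointWriteSimtimeInterval

    // Returns true and advances the schedule once simTime reaches the next checkpoint.
    bool due(double simTime) {
        if (simTime >= nextCheckpoint) {
            nextCheckpoint += interval;
            return true;
        }
        return false;
    }
};
```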

Definition at line 725 of file Checkpointer.cpp.

bool PV::Checkpointer::scheduledStep ( )
private

Called by scheduledCheckpoint if checkpointWriteTriggerMode is "step". If the step number is an integral multiple of checkpointWriteStepInterval, it advances mNextCheckpointStep by the step interval and returns true. Otherwise it returns false.
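The step-based check can be sketched the same way. StepSchedule is a hypothetical struct, and the modulo test follows the description above rather than the actual source.

```cpp
// Hypothetical stand-in for the two data members involved in the check.
struct StepSchedule {
    long interval       = 1L; // plays the role of mCheckpointWriteStepInterval
    long nextCheckpoint = 0L; // plays the role of mNextCheckpointStep

    // Returns true on steps that are integral multiples of the interval,
    // advancing the next scheduled step by one interval each time.
    bool due(long step) {
        if (step % interval == 0) {
            nextCheckpoint += interval;
            return true;
        }
        return false;
    }
};
```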

Definition at line 715 of file Checkpointer.cpp.

bool PV::Checkpointer::scheduledWallclock ( )
private

Called by scheduledCheckpoint if checkpointWriteTriggerMode is "clock". If the elapsed time between the wall clock time and mLastCheckpointWallclock exceeds mCheckpointWriteWallclockInterval, it sets mLastCheckpointWallclock to the current wall clock time and returns true. Otherwise it returns false.
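A hedged sketch of the wall-clock check and of converting an interval to seconds. WallClockUnit mirrors the enum listed under Private Types. toSeconds and WallclockSchedule are hypothetical, the current time is passed in as an argument so the logic can be exercised deterministically, and the strict comparison follows the word "exceeds" above.

```cpp
#include <ctime>

// Enumerators as listed in the class's Private Types section.
enum class WallClockUnit { SECOND, MINUTE, HOUR, DAY };

// Hypothetical helper: express an interval in seconds.
std::time_t toSeconds(std::time_t interval, WallClockUnit unit) {
    switch (unit) {
        case WallClockUnit::MINUTE: return interval * 60;
        case WallClockUnit::HOUR: return interval * 3600;
        case WallClockUnit::DAY: return interval * 86400;
        default: return interval; // SECOND
    }
}

// Hypothetical stand-in for the data members involved in the check.
struct WallclockSchedule {
    std::time_t lastCheckpoint  = 0; // plays the role of mLastCheckpointWallclock
    std::time_t intervalSeconds = 1;

    // Returns true and records `now` once more than the interval has elapsed.
    bool due(std::time_t now) {
        if (std::difftime(now, lastCheckpoint) > static_cast<double>(intervalSeconds)) {
            lastCheckpoint = now;
            return true;
        }
        return false;
    }
};
```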

Definition at line 734 of file Checkpointer.cpp.

Field Documentation

int PV::Checkpointer::mOldCheckpointDirectoriesIndex
private
Initial value: 0

Definition at line 334 of file Checkpointer.hpp.


The documentation for this class was generated from the following files: