PetaVision
Alpha
|
Data Structures | |
struct | TimeInfo |
Checkpointer Parameters | |
List of parameters needed from the Checkpointer class | |
virtual void | ioParam_verifyWrites (enum ParamsIOFlag ioFlag, PVParams *params) |
verifyWrites: If true, calls to FileStream::write are checked by opening the file in read mode and reading back the data and comparing it to the data just written. | |
virtual void | ioParam_outputPath (enum ParamsIOFlag ioFlag, PVParams *params) |
mOutputPath: Specifies the absolute or relative output path of the run | |
void | ioParam_checkpointWrite (enum ParamsIOFlag ioFlag, PVParams *params) |
checkpointWrite: Flag to determine if the run writes checkpoints. | |
void | ioParam_checkpointWriteDir (enum ParamsIOFlag ioFlag, PVParams *params) |
checkpointWriteDir: If checkpointWrite is set, specifies the output checkpoint directory. | |
void | ioParam_checkpointWriteTriggerMode (enum ParamsIOFlag ioFlag, PVParams *params) |
mCheckpointWriteTriggerMode: If checkpointWrite is set, specifies the method to checkpoint. | |
void | ioParam_checkpointWriteStepInterval (enum ParamsIOFlag ioFlag, PVParams *params) |
checkpointWriteStepInterval: If checkpointWrite on step, specifies the number of steps between checkpoints. | |
void | ioParam_checkpointWriteTimeInterval (enum ParamsIOFlag ioFlag, PVParams *params) |
checkpointWriteTimeInterval: If checkpointWrite on time, specifies the amount of simulation time between checkpoints. | |
void | ioParam_checkpointWriteClockInterval (enum ParamsIOFlag ioFlag, PVParams *params) |
checkpointWriteClockInterval: If checkpointWrite on clock, specifies the amount of clock time between checkpoints. The units are specified using the parameter checkpointWriteClockUnit. | |
void | ioParam_checkpointWriteClockUnit (enum ParamsIOFlag ioFlag, PVParams *params) |
checkpointWriteClockUnit: If checkpointWrite on clock, specifies the units used in checkpointWriteClockInterval. | |
void | ioParam_checkpointIndexWidth (enum ParamsIOFlag ioFlag, PVParams *params) |
If checkpointWrite is true, checkpointIndexWidth specifies the minimum width for the step number appearing in the checkpoint directory. | |
void | ioParam_suppressNonplasticCheckpoints (enum ParamsIOFlag ioFlag, PVParams *params) |
void | ioParam_deleteOlderCheckpoints (enum ParamsIOFlag ioFlag, PVParams *params) |
deleteOlderCheckpoints: If checkpointWrite, specifies if the run should delete older checkpoints when writing new ones. | |
void | ioParam_numCheckpointsKept (enum ParamsIOFlag ioFlag, PVParams *params) |
mNumCheckpointsKept: If mDeleteOlderCheckpoints is set, keep this many checkpoints before deleting the oldest one. Default is 1 (delete a checkpoint when a newer checkpoint is written). | |
void | ioParam_initializeFromCheckpointDir (enum ParamsIOFlag ioFlag, PVParams *params) |
initializeFromCheckpointDir: Sets directory used by Checkpointer::initializeFromCheckpoint(). Layers and connections use this directory if they set their initializeFromCheckpointFlag parameter. | |
void | ioParam_lastCheckpointDir (enum ParamsIOFlag ioFlag, PVParams *params) |
lastCheckpointDir: If checkpointWrite is not set, this required parameter specifies the directory in which to write a final checkpoint at the end of the run. Writing the last checkpoint can be suppressed by setting this parameter to the empty string. Relative paths are relative to the working directory. | |
Public Member Functions | |
Checkpointer (std::string const &name, MPIBlock const *globalMPIBlock, Arguments const *arguments) | |
virtual void | addObserver (Observer *observer) override |
void | checkpointRead (double *simTimePointer, long int *currentStepPointer) |
void | checkpointWrite (double simTime) |
bool | doesVerifyWrites () |
void | finalCheckpoint (double simTime) |
std::string const & | getBlockDirectoryName () const |
std::string const & | getCheckpointReadDirectory () const |
char const * | getCheckpointWriteDir () const |
bool | getCheckpointWriteFlag () const |
double | getCheckpointWriteSimtimeInterval () const |
long int | getCheckpointWriteStepInterval () const |
enum CheckpointWriteTriggerMode | getCheckpointWriteTriggerMode () const |
char const * | getInitializeFromCheckpointDir () const |
char const * | getLastCheckpointDir () const |
MPIBlock const * | getMPIBlock () |
std::string const & | getOutputPath () |
bool | getSuppressNonplasticCheckpoints () const |
void | ioParams (enum ParamsIOFlag ioFlag, PVParams *params) |
std::string | makeOutputPathFilename (std::string const &path) |
void | provideFinalStep (long int finalStep) |
void | readNamedCheckpointEntry (std::string const &objName, std::string const &dataName, bool constantEntireRun) |
void | readNamedCheckpointEntry (std::string const &checkpointEntryName, bool constantEntireRun=false) |
void | readStateFromCheckpoint () |
template<typename T > | |
bool | registerCheckpointData (std::string const &objName, std::string const &dataName, T *dataPointer, size_t numValues, bool broadcast, bool constantEntireRun) |
bool | registerCheckpointEntry (std::shared_ptr< CheckpointEntry > checkpointEntry, bool constantEntireRun) |
void | registerTimer (Timer const *timer) |
void | writeTimers (PrintStream &stream) const |
Protected Member Functions | |
Response::Status | notify (ObserverTable const &table, std::vector< std::shared_ptr< BaseMessage const >> messages, bool printFlag) |
Response::Status | notify (ObserverTable const &table, std::shared_ptr< BaseMessage const > message, bool printFlag) |
void | notifyLoop (ObserverTable const &table, std::vector< std::shared_ptr< BaseMessage const >> messages, bool printFlag, std::string const &description) |
void | notifyLoop (ObserverTable const &table, std::shared_ptr< BaseMessage const > message, bool printFlag, std::string const &description) |
Private Types | |
enum | CheckpointWriteTriggerMode { NONE, STEP, SIMTIME, WALLCLOCK } |
enum | WallClockUnit { SECOND, MINUTE, HOUR, DAY } |
Private Member Functions | |
void | checkpointNow () |
void | checkpointToDirectory (std::string const &checkpointDirectory) |
void | checkpointWriteSignal () |
void | extractCheckpointReadDirectory () |
void | findWarmStartDirectory () |
std::string | generateBlockPath (std::string const &baseDirectory) |
void | initBlockDirectoryName () |
void | initMPIBlock (MPIBlock const *globalMPIBlock, Arguments const *arguments) |
void | ioParamsFillGroup (enum ParamsIOFlag ioFlag, PVParams *params) |
std::string | makeCheckpointDirectoryFromCurrentStep () |
bool | receivedSignal () |
void | rotateOldCheckpoints (std::string const &newCheckpointDirectory) |
bool | scheduledCheckpoint () |
bool | scheduledSimTime () |
bool | scheduledStep () |
bool | scheduledWallclock () |
void | verifyDirectory (char const *directory, std::string const &description) |
void | writeTimers (std::string const &directory) |
Private Attributes | |
std::string | mBlockDirectoryName |
int | mCheckpointIndexWidth = -1 |
std::string | mCheckpointReadDirectory |
std::vector< std::shared_ptr< CheckpointEntry > > | mCheckpointRegistry |
Timer * | mCheckpointTimer = nullptr |
char * | mCheckpointWriteDir = nullptr |
bool | mCheckpointWriteFlag = false |
double | mCheckpointWriteSimtimeInterval = 1.0 |
long int | mCheckpointWriteStepInterval = 1L |
enum CheckpointWriteTriggerMode | mCheckpointWriteTriggerMode = NONE |
char * | mCheckpointWriteTriggerModeString = nullptr |
std::time_t | mCheckpointWriteWallclockInterval = 1L |
std::time_t | mCheckpointWriteWallclockIntervalSeconds = 1L |
char * | mCheckpointWriteWallclockUnit = nullptr |
bool | mDeleteOlderCheckpoints = false |
char * | mInitializeFromCheckpointDir = nullptr |
char * | mLastCheckpointDir = nullptr |
std::time_t | mLastCheckpointWallclock = (std::time_t)0 |
MPIBlock * | mMPIBlock = nullptr |
std::string | mName |
double | mNextCheckpointSimtime = 0.0 |
long int | mNextCheckpointStep = 0L |
std::time_t | mNextCheckpointWallclock = (std::time_t)0 |
int | mNumCheckpointsKept = 2 |
ObserverTable | mObserverTable |
std::vector< std::string > | mOldCheckpointDirectories |
int | mOldCheckpointDirectoriesIndex |
std::string | mOutputPath = "" |
bool | mSuppressNonplasticCheckpoints = false |
TimeInfo | mTimeInfo |
std::shared_ptr< CheckpointEntryData< TimeInfo > > | mTimeInfoCheckpointEntry = nullptr |
std::vector< Timer const * > | mTimers |
bool | mVerifyWrites = true |
bool | mWarmStart = false |
int | mWidthOfFinalStepNumber = 0 |
Static Private Attributes | |
static std::string const | mDefaultOutputPath = "output" |
Definition at line 25 of file Checkpointer.hpp.
|
override virtual |
The virtual method for adding an Observer-derived object. Derived classes must override this method to add the object to their hierarchy.
Reimplemented from PV::Subject.
Definition at line 456 of file Checkpointer.cpp.
|
private |
Called by checkpointWrite() (if there was not a SIGUSR1 signal pending) and finalCheckpoint(). Writes a checkpoint, indexed by the current timestep. If the deleteOlderCheckpoints param was set, and the number of checkpoints exceeds numCheckpointsKept, the oldest checkpoint is deleted, and the just-written checkpoint is rotated onto the list of checkpoints that will be deleted.
Definition at line 772 of file Checkpointer.cpp.
|
private |
Creates a checkpoint based at the given directory. If the checkpoint directory already exists, it issues a warning and deletes the timeinfo.bin file in the checkpoint. This way, the presence of the timeinfo.bin file indicates that the checkpoint is complete.
Definition at line 798 of file Checkpointer.cpp.
|
private |
Called by checkpointWrite if a SIGUSR1 signal was sent (as reported by receivedSignal). It writes a checkpoint, indexed by the current timestep. If the deleteOlderCheckpoints param was set, it does not cause a checkpoint to be deleted, and does not rotate the checkpoint into the list of directories that will be deleted.
Definition at line 752 of file Checkpointer.cpp.
|
private |
If called when mCheckpointReadDirectory is a colon-separated list of paths, extracts the entry corresponding to the process's batch index and replaces mCheckpointReadDirectory with that entry. Called by configCheckpointReadDirectory. If mCheckpointReadDirectory is not a colon-separated list, it is left unchanged. If it is a colon-separated list, the number of entries must agree with
Definition at line 595 of file Checkpointer.cpp.
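The selection described above can be sketched as follows. This is a minimal illustration, not the PetaVision implementation: the function name selectCheckpointReadDirectory and the batchIndex argument are hypothetical, and the sketch does not model the check on the number of entries.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: picks one entry from a colon-separated list of paths.
// A string without a colon is returned unchanged.
std::string selectCheckpointReadDirectory(
      std::string const &directories, std::size_t batchIndex) {
   if (directories.find(':') == std::string::npos) {
      return directories;
   }
   std::vector<std::string> entries;
   std::istringstream stream(directories);
   std::string entry;
   while (std::getline(stream, entry, ':')) {
      entries.push_back(entry);
   }
   // at() throws std::out_of_range if batchIndex has no matching entry.
   return entries.at(batchIndex);
}
```

With the list "run0:run1:run2" and batch index 1, the sketch returns "run1".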
|
private |
If checkpointWrite is true, checkpointIndexWidth specifies the minimum width for the step number appearing in the checkpoint directory.
If the step number needs fewer digits than checkpointIndexWidth, it is padded with zeroes. If the step number needs more, the full step number is still printed. Hence, setting checkpointIndexWidth to zero means that there are never any padded zeroes. If set to a negative number, the width will be inferred from startTime, stopTime and dt. The default value is -1 (infer the width).
Definition at line 402 of file Checkpointer.cpp.
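The padding rule can be illustrated with a short sketch. The helper names countDigits and formatStepNumber are hypothetical, and the sketch assumes that the inferred width for a negative checkpointIndexWidth equals the number of digits in the final step number; the actual inference from startTime, stopTime and dt may differ.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: number of decimal digits in a non-negative integer.
int countDigits(long int n) {
   int digits = 1;
   while (n >= 10) {
      n /= 10;
      ++digits;
   }
   return digits;
}

// Hypothetical helper: formats a non-negative step number with at least
// indexWidth digits. A negative indexWidth means "infer the width", which
// this sketch assumes to be the digit count of finalStep.
std::string formatStepNumber(long int step, int indexWidth, long int finalStep) {
   int width = indexWidth < 0 ? countDigits(finalStep) : indexWidth;
   std::ostringstream out;
   // std::setw is a minimum width only, so longer numbers print in full.
   out << std::setfill('0') << std::setw(width) << step;
   return out.str();
}
```

Under these assumptions, step 42 with a width of 6 is printed as 000042, while a width of 0 leaves it as 42.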
|
private |
mCheckpointWriteTriggerMode: If checkpointWrite is set, specifies the method to checkpoint.
Possible choices include
Definition at line 179 of file Checkpointer.cpp.
|
private |
If checkpointWrite is true and this flag is true, connections will only checkpoint if plasticityFlag=true.
Definition at line 414 of file Checkpointer.cpp.
std::string PV::Checkpointer::makeOutputPathFilename | ( | std::string const & | path | ) |
Given a relative path, returns a full path consisting of the effective output directory for the process's checkpoint cell, followed by "/", followed by the given relative path. It is a fatal error for the path to be an absolute path (i.e. starting with '/').
Definition at line 87 of file Checkpointer.cpp.
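A minimal sketch of this path construction follows. The free function makeOutputPathFilenameSketch and its effectiveOutputDir argument are hypothetical stand-ins for the member function and the directory it computes internally, and the sketch throws an exception where the real method reports a fatal error.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Checkpointer::makeOutputPathFilename.
// effectiveOutputDir is assumed to be supplied by the caller.
std::string makeOutputPathFilenameSketch(
      std::string const &effectiveOutputDir, std::string const &path) {
   // Absolute paths are rejected (an exception here, a fatal error in the
   // documented method).
   if (!path.empty() && path[0] == '/') {
      throw std::invalid_argument("path must be relative: " + path);
   }
   return effectiveOutputDir + "/" + path;
}
```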
|
protected inherited |
This method calls the respond() method of each object in the given table, using the given vector of messages. If the table consists of objects A, B, and C; and the messages vector consists of messages X and Y, the order is A->X, A->Y, B->X, B->Y, C->X, C->Y.
The objects' respond() method returns one of the Response::Status types: SUCCESS, NO_ACTION, PARTIAL, or POSTPONE.
SUCCESS: the object completed the task requested by the messages.
NO_ACTION: the object has nothing to do in response to the message, or had already done it.
PARTIAL: the object has not yet completed but is making progress. It is expected that there are a small number of discrete tasks, so that an object will not return PARTIAL a large number of times in response to the same message.
POSTPONE: the object needs to act but cannot do so until an event outside its control occurs.
If all objects return NO_ACTION, then notify() returns NO_ACTION. If all objects return either SUCCESS or NO_ACTION and there is at least one SUCCESS, then notify() returns SUCCESS. If all objects return either POSTPONE or NO_ACTION and there is at least one POSTPONE, then notify() returns POSTPONE. Otherwise, notify() returns PARTIAL.
Generally each message in the messages vector is sent to each object in the table. However, if an object returns POSTPONE in response to a message, the loop skips to the next object, and does not send any remaining messages to the postponing object.
The rationale behind these rules is so that if the objects in the table are themselves derived from the Subject class, the messages can be passed down the tree and the return values passed up it, and the return value at the top can be interpreted as being over all the components at the bottom, without the topmost object needing to know details of the composition of the objects below it.
If printFlag is true, the method prints information regarding postponement to standard output.
Definition at line 15 of file Subject.cpp.
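The rules for combining return values can be written as a small reduction. The sketch below is illustrative only: the local Status enum and the function combineStatuses are hypothetical, and the sketch assumes one status per object has already been collected.

```cpp
#include <vector>

// Local stand-in for Response::Status; not the PetaVision type.
enum class Status { SUCCESS, NO_ACTION, PARTIAL, POSTPONE };

// Hypothetical helper: combines one status per object into a single result,
// following the rules described for notify().
Status combineStatuses(std::vector<Status> const &statuses) {
   bool anySuccess  = false;
   bool anyPostpone = false;
   bool anyPartial  = false;
   for (Status s : statuses) {
      if (s == Status::SUCCESS) { anySuccess = true; }
      if (s == Status::POSTPONE) { anyPostpone = true; }
      if (s == Status::PARTIAL) { anyPartial = true; }
   }
   // Any PARTIAL, or a mix of SUCCESS and POSTPONE, falls under "otherwise".
   if (anyPartial || (anySuccess && anyPostpone)) { return Status::PARTIAL; }
   if (anySuccess) { return Status::SUCCESS; }
   if (anyPostpone) { return Status::POSTPONE; }
   return Status::NO_ACTION;
}
```

Because the combined value obeys the same four-state convention, a Subject-derived object can return it to its own parent unchanged.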
|
inline protected inherited |
A convenience overload of the basic notify method where there is only one message to send to the objects. This overloading handles enclosing the message in a vector of length one.
Definition at line 91 of file Subject.hpp.
|
protected inherited |
This method calls the notify() method in a loop until the result is not PARTIAL. If the result is POSTPONE, it exits with a fatal error. The description argument is used in the error message to report which message vector failed. If the result is SUCCESS, notifyLoop() returns to the calling function.
notifyLoop should only be used if the objects that might cause a postponement are themselves in the table of objects; otherwise the routine will hang.
Definition at line 57 of file Subject.cpp.
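The control flow of the loop can be sketched as below. This is an assumption-laden illustration: notifyLoopSketch and the notifyOnce callback are hypothetical, the Status enum is a local stand-in for Response::Status, and the sketch throws an exception where the documented method exits with a fatal error.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Local stand-in for Response::Status; not the PetaVision type.
enum class Status { SUCCESS, NO_ACTION, PARTIAL, POSTPONE };

// Hypothetical sketch: notifyOnce plays the role of one call to notify().
void notifyLoopSketch(
      std::function<Status()> const &notifyOnce, std::string const &description) {
   Status status = Status::PARTIAL;
   // Keep notifying while the objects report partial progress.
   while (status == Status::PARTIAL) {
      status = notifyOnce();
   }
   // A postponement at this point cannot be resolved by the loop itself.
   if (status == Status::POSTPONE) {
      throw std::runtime_error(description + " was postponed");
   }
}
```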
|
inline protected inherited |
A convenience overload of the basic notifyLoop method where there is only one message to send to the objects. This overloading handles enclosing the message in a vector of length one.
Definition at line 114 of file Subject.hpp.
|
private |
If a SIGUSR1 signal has been received by the global root process, clears the signal and returns true. Otherwise returns false.
Definition at line 676 of file Checkpointer.cpp.
|
private |
Called if deleteOlderCheckpoints is true. It deletes the oldest checkpoint in the list of old checkpoint directories, and adds the new checkpoint directory to the list.
Definition at line 843 of file Checkpointer.cpp.
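One way to picture the rotation is a fixed-size ring of directory names, in the spirit of the mOldCheckpointDirectories and mOldCheckpointDirectoriesIndex members. The class below is a hypothetical sketch: it only reports which directory should be deleted and leaves the actual file system work to the caller.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of checkpoint rotation; not the PetaVision class.
class CheckpointRotation {
  public:
   // numKept is clamped to at least 1 so the ring is never empty.
   explicit CheckpointRotation(std::size_t numKept)
         : mSlots(numKept > 0 ? numKept : 1), mIndex(0) {}

   // Records newDirectory in the ring and returns the directory it displaced.
   // An empty string means nothing needs to be deleted yet.
   std::string rotate(std::string const &newDirectory) {
      std::string oldest = mSlots[mIndex];
      mSlots[mIndex]     = newDirectory;
      mIndex             = (mIndex + 1) % mSlots.size();
      return oldest;
   }

  private:
   std::vector<std::string> mSlots;
   std::size_t mIndex;
};
```

With two slots, the third call to rotate() returns the name passed to the first call, which is then the checkpoint to delete.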
|
private |
Returns true if the params file settings indicate a checkpoint should occur at this timestep. It also advances the appropriate data member for the trigger mode to the next scheduled checkpoint. Returns false otherwise.
Definition at line 700 of file Checkpointer.cpp.
|
private |
Called by scheduledCheckpoint if checkpointWriteTriggerMode is "time". If the simTime is >= the current value of mNextCheckpointSimtime, it advances mNextCheckpointSimtime by the time interval and returns true. Otherwise it returns false.
Definition at line 725 of file Checkpointer.cpp.
|
private |
Called by scheduledCheckpoint if checkpointWriteTriggerMode is "step". If the step number is an integral multiple of checkpointWriteStepInterval, it advances mNextCheckpointStep by the step interval and returns true. Otherwise it returns false.
Definition at line 715 of file Checkpointer.cpp.
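The simulation-time and step triggers described above can be sketched together. The struct ScheduleSketch is hypothetical; its fields mirror the documented defaults of mNextCheckpointSimtime, mCheckpointWriteSimtimeInterval, mNextCheckpointStep and mCheckpointWriteStepInterval, and the current time and step are passed as arguments rather than read from the object's state.

```cpp
// Hypothetical sketch of the "time" and "step" trigger checks.
struct ScheduleSketch {
   double nextSimtime     = 0.0;
   double simtimeInterval = 1.0;
   long int nextStep      = 0L;
   long int stepInterval  = 1L;

   // True once simTime has reached the next scheduled simulation time;
   // the schedule is then advanced by one interval.
   bool scheduledSimTime(double simTime) {
      if (simTime >= nextSimtime) {
         nextSimtime += simtimeInterval;
         return true;
      }
      return false;
   }

   // True when step is an integer multiple of the step interval;
   // the next scheduled step is then advanced by one interval.
   bool scheduledStep(long int step) {
      if (step % stepInterval == 0) {
         nextStep += stepInterval;
         return true;
      }
      return false;
   }
};
```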
|
private |
Called by scheduledCheckpoint if checkpointWriteTriggerMode is "clock". If the elapsed time between the wall clock time and mLastCheckpointWallclock exceeds mCheckpointWriteWallclockInterval, it sets mLastCheckpointWallclock to the current wall clock time and returns true. Otherwise it returns false.
Definition at line 734 of file Checkpointer.cpp.
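The wall-clock trigger can be sketched in two parts: converting the interval to seconds and comparing elapsed time. The function names below are hypothetical, the WallClockUnit enum is a local copy of the private enum listed for this class, and comparing against the interval in seconds is an assumption suggested by the mCheckpointWriteWallclockIntervalSeconds member.

```cpp
#include <ctime>

// Local copy of the unit names; not the PetaVision type.
enum class WallClockUnit { SECOND, MINUTE, HOUR, DAY };

// Hypothetical helper: converts an interval in the given unit to seconds.
std::time_t intervalInSeconds(std::time_t interval, WallClockUnit unit) {
   switch (unit) {
      case WallClockUnit::SECOND: return interval;
      case WallClockUnit::MINUTE: return interval * 60;
      case WallClockUnit::HOUR: return interval * 3600;
      case WallClockUnit::DAY: return interval * 86400;
   }
   return interval;
}

// Hypothetical sketch of the "clock" trigger: returns true and updates
// lastCheckpoint when the elapsed time exceeds the interval.
bool scheduledWallclock(
      std::time_t now, std::time_t &lastCheckpoint, std::time_t intervalSeconds) {
   if (std::difftime(now, lastCheckpoint) > static_cast<double>(intervalSeconds)) {
      lastCheckpoint = now;
      return true;
   }
   return false;
}
```

In a running simulation the now argument would typically come from std::time(nullptr); passing it explicitly keeps the sketch deterministic.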
|
private |
Definition at line 334 of file Checkpointer.hpp.