Prevent Dumping and Loading Checkpoints During Run Mode #1727

Open: wants to merge 27 commits into master

Commits
1eb292c
Trick prints warning for checkpoints outside of freeze mode.
Mrockwell2 Mar 28, 2024
ec6a437
Merge branch 'master' into 1461-only-allow-dumping-and-loading-checkp…
Mrockwell2 Mar 28, 2024
b7a65bb
Fixed small syntax error
Mrockwell2 Mar 28, 2024
9b9c830
Fixed small syntax error
Mrockwell2 Mar 28, 2024
7cb2e5f
Warnings no longer abort checkpoint actions due to test conflict
Mrockwell2 Mar 29, 2024
f71ce44
Moved sim_mode_to_string function to a more generic place.
Mrockwell2 Apr 2, 2024
0218314
Fixing simModeCharString
Mrockwell2 Apr 5, 2024
3604312
Cancel checkpoint if sim is running
Mrockwell2 Apr 5, 2024
ae586c1
Define dependencies for simModeCharString
Mrockwell2 Apr 5, 2024
155e3c5
Merge branch 'master' into 1461-edit-checkpoint-at-time
May 16, 2024
92fbacb
Fixing simModeCharString(SIM_MODE)
May 17, 2024
5b4e82c
Update checkpointing and the freeze loop
May 17, 2024
2638f88
Correct the ref_logs to properly reflect the expected outcomes.
May 17, 2024
ac82444
Updated SIM_stls test to pass with new checkpoint restrictions
May 17, 2024
08ed1e2
Limit automatic freeze and re-run to when simulation is in run mode
Mrockwell2 May 31, 2024
3520b1e
Restored specific checkpoint time in SIM_stls input file
Jun 20, 2024
61e9294
Added ability to checkpoint to a named file at a specific time.
Jun 20, 2024
7d7fd63
Updated tests to increase coverage
Jun 27, 2024
9fc718f
Merge branch 'master' into 1461-edit-checkpoint-at-time
Mrockwell2 Jun 28, 2024
50dc60e
Updated Checkpointing Documentation
Mrockwell2 Jun 28, 2024
7821695
Few little documentation changes
Jul 1, 2024
0926010
Updated checkpointing best practices
Mrockwell2 Jul 1, 2024
cebc83f
Merge branch 'master' into 1461-edit-checkpoint-at-time
Mrockwell2 Jul 24, 2024
deafb9f
Update CheckPointRestart.cpp to appease John
Mrockwell2 Dec 11, 2024
3e568f8
Update CheckPointRestart.cpp so John can pad his resume
Mrockwell2 Dec 11, 2024
fd0b1e2
Update CheckPointRestart.cpp to preemptively appease John
Mrockwell2 Dec 11, 2024
080b6ec
Update CheckPointRestart.cpp to add comments and appease Hong
Mrockwell2 Dec 11, 2024
@@ -254,7 +254,7 @@ requires a job class.
<tr><td bgcolor="#ffff99">SELF-SCHEDULING</td><td>jobs in the scheduled job loop that must schedule themselves when to run</td></tr>
<tr><td bgcolor="#ccffcc">CYCLIC</td><td>jobs in the scheduled job loop that run cyclically according to their specified cycle time</td></tr>
<tr><td bgcolor="#e6ceff">FREEZE</td><td>jobs that are run during the freeze execution mode</td></tr>
<tr><td bgcolor="#ffc285">CHECKPOINT</td><td>jobs that are run when a checkpoint is dumpded or loaded</td></tr>
<tr><td bgcolor="#ffc285">CHECKPOINT</td><td>jobs that are run when a checkpoint is dumped or loaded</td></tr>
</table>

<b>Table SD_1 Trick-Provided Job Classes</b>
18 changes: 15 additions & 3 deletions docs/documentation/running_a_simulation/Input-File.md
@@ -375,19 +375,31 @@ trick.freeze(trick.exec_get_sim_time() + 5.0)

## Checkpoint the Simulation

To checkpoint a simulation call `trick.checkpoint([<checkpoint_time>])`. `trick.checkpoint()` called with no
arguments will checkpoint immediately. An optional checkpoint time may be provided to checkpoint some time
in the future.
`trick.checkpoint()` called with no arguments will checkpoint immediately. To checkpoint at a specific time, call `trick.checkpoint(<checkpoint_time>)`, where the time is an absolute simulation time in seconds; a time in the past is rejected with an error.
To save the checkpoint to a specific file, supply a file name: `trick.checkpoint(<file_name>)` checkpoints immediately to the named file, and `trick.checkpoint(<checkpoint_time>, <file_name>)` checkpoints to the named file at the given time.

```python
# Checkpoints immediately
trick.checkpoint()

# Checkpoints immediately, saving to 'checkpoint_save'
trick.checkpoint("checkpoint_save")

# Checkpoints at an absolute time
trick.checkpoint(100.0)

# Checkpoints at an absolute time, saving to 'late_checkpoint'
trick.checkpoint(100.0, "late_checkpoint")

# Checkpoints 5 seconds relative from the current sim_time
trick.checkpoint(trick.exec_get_sim_time() + 5.0)

# Checkpoints to 'checkpoint.txt' immediately
trick.checkpoint("checkpoint.txt")

# Checkpoints to 'checkpoint2.txt' at an absolute time
trick.checkpoint(50.0, "checkpoint2.txt")
```

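A minimal C++ sketch of what scheduling `trick.checkpoint(<checkpoint_time>, <file_name>)` amounts to, assuming simulation time is tracked as integer tics. `CheckpointScheduler`, `tic_value` and the other names are hypothetical; the real logic lives in `Trick::CheckPointRestart::checkpoint()`, which also asks the executive to freeze at the requested time.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the scheduling half of
// Trick::CheckPointRestart::checkpoint(double in_time, std::string file_name).
// Assumption: one second is 1,000,000 tics; Trick takes this value from the executive.
struct CheckpointScheduler {
    long long tic_value = 1000000;            // tics per second (assumed)
    long long curr_tics = 0;                  // current simulation time in tics
    std::set<long long> pending;              // tics with a scheduled checkpoint
    std::map<long long, std::string> names;   // tic -> custom file name, if any

    // Schedules a checkpoint at in_time seconds. Returns false and schedules
    // nothing if the requested time is in the past.
    bool schedule(double in_time, const std::string& file_name = "") {
        // Round rather than truncate so a time like 0.1 s maps to the nearest tic.
        long long new_time = std::llround(in_time * static_cast<double>(tic_value));
        if (new_time < curr_tics) {
            return false;
        }
        pending.insert(new_time);
        if (!file_name.empty()) {
            names[new_time] = file_name;      // remembered until the checkpoint is written
        }
        return true;
    }
};
```

Keying custom names by integer tic, as the PR's `chkpnt_names` map does, avoids comparing floating-point times when the checkpoint later fires.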
## Stopping the Simulation
4 changes: 4 additions & 0 deletions docs/documentation/simulation_capabilities/Checkpoints.md
@@ -13,8 +13,12 @@ trick.checkpoint_post_init(True|False)
trick.checkpoint_end(True|False)
# Save a checkpoint at a time in the future
trick.checkpoint(<time>)
# Save a checkpoint at a time in the future to a specified file
trick.checkpoint(<time>, <file_name>)
# Save a checkpoint now
trick.checkpoint()
# Save a checkpoint now to a specified file
trick.checkpoint(<file_name>)

# Set the CPU to use for checkpoints
trick.checkpoint_cpu(<cpu_num>)
4 changes: 2 additions & 2 deletions docs/howto_guides/Checkpointing-Best-Practices.md
@@ -98,7 +98,7 @@ A checkpoint of a simulation is usually initiated from the Input Processor. That
1. The input file, or
2. The variable server.

```trick.checkpoint( <time> )``` is called from Python. This Python function is bound to the corresponding C++ function. At a simulation frame boundary (so that data is time-homogeneous), three things happen:
```trick.checkpoint( <time> )``` is called from Python. This Python function is bound to the corresponding C++ function. At a simulation frame boundary (so that data is time-homogeneous), the simulation freezes and then three things happen:

1. The ```"checkpoint"``` jobs in the S_define file are executed. These job-classes allow you to prepare your sim to be checkpointed. Perhaps you want to transform simulation state data into a different form for checkpointing. This is up to you.

@@ -109,7 +109,7 @@ A checkpoint of a simulation is usually initiated from the Input Processor. That
<a id=loading-a-checkpoint></a>
### What Happens When You Load a Checkpoint.
```trick.load_checkpoint()``` is called from Python.
At a simulation frame boundary, three things happen:
At a simulation frame boundary, the simulation freezes and then three things happen:

1. The ```"preload_checkpoint"``` jobs are called. These job-classes allow you to prepare your sim for a checkpoint-restore, in whatever way you see fit.

4 changes: 4 additions & 0 deletions docs/not_referenced/Input-File-Quick-Reference.md
@@ -84,8 +84,12 @@ trick.checkpoint_post_init(True|False)
trick.checkpoint_end(True|False)
# Save a checkpoint at a time in the future
trick.checkpoint(<time>)
# Save a checkpoint at a time in the future to a specified file
trick.checkpoint(<time>, <file_name>)
# Save a checkpoint now
trick.checkpoint()
# Save a checkpoint now to a specified file
trick.checkpoint(<file_name>)

# Set the CPU to use for checkpoints
trick.checkpoint_cpu(<cpu_num>)
13 changes: 11 additions & 2 deletions include/trick/CheckPointRestart.hh
@@ -9,6 +9,7 @@
#include <string>
#include <vector>
#include <queue>
#include <map>

#include "trick/Scheduler.hh"

@@ -22,6 +23,13 @@ namespace Trick {
*/
class CheckPointRestart : public Trick::Scheduler {

private:
/** Flag to track if an automatic freeze has been triggered */
bool auto_freeze = false; /* ** */

/** Map to track custom named checkpoints based on the scheduled times */
std::map<long long, std::string> chkpnt_names; /* ** */

protected:
/** queue to hold jobs to be called before a checkpoint is dumped. */
Trick::ScheduledJobQueue checkpoint_queue ; /* ** */
@@ -194,13 +202,14 @@ namespace Trick {
/**
@brief @userdesc Command to dump a checkpoint at in_time. (Sets checkpoint_time to the integral time tic value corresponding
to the incoming in_time so that checkpoint occurs once at that time at the end of the execution frame.)
The checkpointed file name is @e chkpnt_<in_time>.
The checkpointed file name is @e chkpnt_<in_time> or @e <file_name>.
@par Python Usage:
@code trick.checkpoint(<in_time>)
trick.checkpoint(<in_time>, <file_name>) @endcode
@param in_time - desired checkpoint time in seconds.
@param file_name - checkpoint file name. Defaults to an empty string, in which case the file is named chkpnt_<in_time>.
@return always 0
*/
virtual int checkpoint(double in_time) ;
virtual int checkpoint(double in_time, std::string file_name = "") ;

/**
* Executes the pre_init_checkpoint
10 changes: 10 additions & 0 deletions include/trick/sim_mode.h
@@ -14,6 +14,10 @@
#ifndef SIMMODE_HH
#define SIMMODE_HH

#ifdef __cplusplus
extern "C" {
#endif

typedef enum {

NoCmd = 0 , /* NoCmd */
@@ -38,4 +42,10 @@ typedef enum {

} SIM_MODE ;

const char * simModeCharString(SIM_MODE mode);

#ifdef __cplusplus
}
#endif

#endif
@@ -1,4 +1,5 @@
sys.exec.out.time {s},testSimObject.my_foo.a {1},testSimObject.my_foo.b {1}
5,6,12
5.1,6,12
5.2,6,12
5.3,6,12
@@ -1,4 +1,5 @@
sys.exec.out.time {s},testSimObject.my_foo.a {1},testSimObject.my_foo.b {1}
5,6,12
5.1,6,12
5.2,6,12
5.3,6,12
@@ -1,4 +1,5 @@
sys.exec.out.time {s},testSimObject.my_foo.a {1},testSimObject.my_foo.b {1}
2,3,6
2.1,3,6
2.2,3,6
2.3,3,6
@@ -1,4 +1,5 @@
sys.exec.out.time {s},testSimObject.my_foo.a {1},testSimObject.my_foo.b {1}
7,8,16
7.1,8,16
7.2,8,16
7.3,8,16
@@ -1,4 +1,5 @@
sys.exec.out.time {s},testSimObject.my_foo.b {1}
7,16
8,18
9,20
10,22
@@ -1,4 +1,5 @@
sys.exec.out.time {s},testSimObject.my_foo.a {1},testSimObject.my_foo.b {1},testSimObject.my_foo.q {1}
5,6,12,2
8,9,18,3
11,12,24,4
14,15,30,5
2 changes: 2 additions & 0 deletions test/SIM_checkpoint_data_recording/RUN_test8/dump.py
@@ -6,6 +6,8 @@
def main():
exec(open("Modified_data/fooChange2.dr").read())

# Testing that separately scheduling a freeze and checkpoint at the same time still results in a checkpoint
trick.freeze(5.0)
trick.checkpoint(5.0)

trick.stop(20.0)
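The test above relies on the checkpoint being written once the sim is frozen. A hedged C++ model of that ordering, assuming a set of pending checkpoint tics (the real `checkpoint_times` container and `write_checkpoint()` are only partly shown in this diff): `do_checkpoint()` declines to write in Run mode, and entering the freeze loop writes whatever checkpoint is due at the current tic. `Sim`, `SimMode` and `enter_freeze()` are hypothetical, and the file name is simplified; the real `write_checkpoint()` then calls `the_exec->run()`, which this sketch omits.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class SimMode { Run, Freeze };

// Hypothetical model: checkpoints are only written while frozen, and the
// freeze-loop entry point writes any checkpoint scheduled for the current tic.
struct Sim {
    SimMode mode = SimMode::Run;
    long long tics = 0;
    std::set<long long> checkpoint_times;   // pending checkpoint tics
    std::vector<std::string> written;       // names of checkpoints "saved"
    std::vector<std::string> warnings;      // warnings that would be published

    // Mirrors the Run-mode guard in do_checkpoint(): warn and skip the save.
    int do_checkpoint(const std::string& name) {
        if (mode == SimMode::Run) {
            warnings.push_back("checkpoint '" + name + "' skipped in Run mode");
            return 0;
        }
        written.push_back(name);
        return 0;
    }

    // Called when the executive enters the freeze loop.
    void enter_freeze() {
        mode = SimMode::Freeze;
        auto it = checkpoint_times.find(tics);
        if (it != checkpoint_times.end()) {
            checkpoint_times.erase(it);
            // Simplified name; Trick uses chkpnt_<sim_time> with six decimals.
            do_checkpoint("chkpnt_" + std::to_string(tics));
        }
    }
};
```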
@@ -1,4 +1,5 @@
sys.exec.out.time {s},testSimObject.my_foo.q {1}
5,2
8,3
11,4
14,5
2 changes: 1 addition & 1 deletion test/SIM_stls/RUN_test/setup.py
@@ -5,7 +5,7 @@ def main():
trick.exec_set_job_onoff("the_object.stlc.test", 1, False)
trick.exec_set_job_onoff("the_object.stlc.print", 1, False)

trick.add_read( 0.5, 'trick.checkpoint("chkpnt_in")')
trick.checkpoint(0.5, "chkpnt_in")

trick.exec_set_freeze_frame(0.10)
trick.stop(1.0)
60 changes: 55 additions & 5 deletions trick_source/sim_services/CheckPointRestart/CheckPointRestart.cpp
@@ -19,6 +19,7 @@
#include "trick/message_proto.h"
#include "trick/message_type.h"
#include "trick/TrickConstant.hh"
#include "trick/sim_mode.h"

Trick::CheckPointRestart * the_cpr ;

@@ -109,7 +110,12 @@ int Trick::CheckPointRestart::find_write_checkpoint_jobs(std::string sim_object_
return(0) ;
}

int Trick::CheckPointRestart::checkpoint(double in_time) {
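/**
* @brief Schedule a checkpoint to be written at a given time.
* @param in_time The time, in seconds, at which the checkpoint should be dumped
* @param file_name Optional checkpoint file name; if empty, the default chkpnt_<time> name is used
* @return always 0
* @see write_checkpoint()
*/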
int Trick::CheckPointRestart::checkpoint(double in_time, std::string file_name) {

long long curr_time = exec_get_time_tics() ;
long long new_time ;
@@ -121,6 +127,10 @@ int Trick::CheckPointRestart::checkpoint(double in_time) {
if ( new_time < write_checkpoint_job->next_tics ) {
write_checkpoint_job->next_tics = new_time ;
}

if (!file_name.empty()) chkpnt_names[new_time] = file_name;

the_exec->freeze(in_time);
//std::cout << "\033[33mSET CHECKPOINT TIME " << in_time << " " << new_time << "\033[0m" << std::endl ;
} else {
message_publish(MSG_ERROR, "Checkpoint time specified in the past. specified %f, current_time %f\n",
@@ -171,6 +181,19 @@ int Trick::CheckPointRestart::do_checkpoint(std::string file_name, bool print_st

JobData * curr_job ;
pid_t pid;
SIM_MODE mode;

mode = the_exec->get_mode();

if (mode == Run) {
std::string msg_format = "WARNING: Saving a checkpoint in 'Run Mode' may cause non time-homogeneous data; checkpoint not saved. ";
msg_format += "Current Mode: %s (%d)\n";
message_publish(MSG_WARNING, msg_format.c_str(),
simModeCharString(mode), mode);

return 0;
}


if ( ! file_name.compare("") ) {
std::stringstream file_name_stream ;
@@ -227,6 +250,10 @@ int Trick::CheckPointRestart::do_checkpoint(std::string file_name, bool print_st
return 0 ;
}

/**
* @brief Writes a scheduled checkpoint if it is the correct time.
* @see checkpoint(double in_time, std::string file_name)
*/
int Trick::CheckPointRestart::write_checkpoint() {

long long curr_time = exec_get_time_tics() ;
@@ -246,12 +273,20 @@ int Trick::CheckPointRestart::write_checkpoint() {
}

double sim_time = exec_get_sim_time() ;
std::stringstream chk_name_stream ;
std::string file_name = "";

chk_name_stream << "chkpnt_" << std::fixed << std::setprecision(6) << sim_time ;
if (chkpnt_names.find(curr_time) == chkpnt_names.end()) {
std::stringstream chk_name_stream ;
chk_name_stream << "chkpnt_" << std::fixed << std::setprecision(6) << sim_time ;
file_name = chk_name_stream.str();
} else {
file_name = chkpnt_names[curr_time];
chkpnt_names.erase(curr_time);
}

checkpoint( chk_name_stream.str() );
checkpoint( file_name );

the_exec->run();
}

return(0) ;
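The file-name choice in `write_checkpoint()` above can be isolated into a small function. This is a sketch, not Trick code: `resolve_checkpoint_name` is a hypothetical helper that takes the tic-keyed name map by reference, returns a custom name (and consumes it) if one was registered for the current tic, and otherwise builds the default `chkpnt_<sim_time>` name with six decimal places.

```cpp
#include <cassert>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper mirroring the name lookup in write_checkpoint():
// a custom name registered for this tic wins and is erased after use;
// otherwise fall back to "chkpnt_<sim_time>" with six decimal places.
std::string resolve_checkpoint_name(std::map<long long, std::string>& names,
                                    long long curr_tics, double sim_time) {
    auto it = names.find(curr_tics);
    if (it == names.end()) {
        std::ostringstream os;
        os << "chkpnt_" << std::fixed << std::setprecision(6) << sim_time;
        return os.str();
    }
    std::string name = it->second;
    names.erase(it);   // each custom name is used only once
    return name;
}
```

Erasing the entry after use keeps a later checkpoint that happens to land on the same tic from silently reusing the custom name.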
@@ -293,6 +328,20 @@ int Trick::CheckPointRestart::safestore_checkpoint() {
}

void Trick::CheckPointRestart::load_checkpoint(std::string file_name) {
SIM_MODE mode = the_exec->get_mode();

if (mode == Run) {
std::string msg_format = "WARNING: Loading a checkpoint in 'Run Mode' may cause non time-homogeneous data. ";
msg_format += "Current Mode: %s (%d)\n";

message_publish(MSG_WARNING, msg_format.c_str(),
simModeCharString(mode), mode);
// If in RUN mode, freeze the simulation and set auto_freeze so load_checkpoint_job() resumes it after the load.
// To forbid loading a checkpoint in RUN mode, remove the following two lines and the auto_freeze check at the end of load_checkpoint_job()
the_exec->freeze();
auto_freeze = true;
}

load_checkpoint_file_name = file_name ;
}

@@ -306,7 +355,7 @@ int Trick::CheckPointRestart::load_checkpoint_job() {
JobData * curr_job ;
struct stat temp_buf ;

if ( ! load_checkpoint_file_name.empty() ) {
if ( ! load_checkpoint_file_name.empty() && the_exec->get_mode() != Run) {

if ( stat( load_checkpoint_file_name.c_str() , &temp_buf) == 0 ) {
preload_checkpoint_queue.reset_curr_index() ;
@@ -338,6 +387,7 @@ int Trick::CheckPointRestart::load_checkpoint_job() {
message_publish(MSG_INFO, "Could not find checkpoint file %s.\n", load_checkpoint_file_name.c_str()) ;
}
load_checkpoint_file_name.clear() ;
if (auto_freeze) { auto_freeze = false; the_exec->run(); }
}

return(0) ;
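The load path splits work between `load_checkpoint()`, which may request an automatic freeze, and `load_checkpoint_job()`, which only restores state outside Run mode and resumes the sim if the freeze was automatic. The following is a minimal model of that handshake with hypothetical names (`LoadController`, `Mode`; restoring state is reduced to recording the file name). It also clears the flag after resuming, so a later load issued while already frozen does not unexpectedly restart the sim.

```cpp
#include <cassert>
#include <string>

enum class Mode { Initialization, Run, Step, Freeze, ExitMode };

// Hypothetical model of load_checkpoint() / load_checkpoint_job().
struct LoadController {
    Mode mode = Mode::Run;
    bool freeze_requested = false;   // stands in for the_exec->freeze()
    bool auto_freeze = false;        // true if the load itself caused the freeze
    std::string pending_file;        // stands in for load_checkpoint_file_name
    std::string loaded_file;         // stands in for actually restoring state

    void load_checkpoint(const std::string& file_name) {
        if (mode == Mode::Run) {
            freeze_requested = true; // executive enters Freeze at a frame boundary
            auto_freeze = true;
        }
        pending_file = file_name;
    }

    void load_checkpoint_job() {
        if (pending_file.empty() || mode == Mode::Run) {
            return;                  // wait until the sim is no longer running
        }
        loaded_file = pending_file;
        pending_file.clear();
        if (auto_freeze) {
            auto_freeze = false;     // reset so a later manual load does not resume the sim
            mode = Mode::Run;        // stands in for the_exec->run()
        }
    }
};
```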
7 changes: 7 additions & 0 deletions trick_source/sim_services/Executive/Executive_freeze_loop.cpp
@@ -4,10 +4,13 @@

#include "trick/Executive.hh"
#include "trick/ExecutiveException.hh"
#include "trick/CheckPointRestart.hh"
#include "trick/exec_proto.h"
#include "trick/message_proto.h"
#include "trick/message_type.h"

extern Trick::CheckPointRestart * the_cpr ;

/**
@details
-# Set the mode to Freeze. Requirement [@ref r_exec_mode_2]
@@ -32,6 +35,10 @@ int Trick::Executive::freeze_loop() {
}

message_publish(MSG_INFO, "Freeze ON. Simulation time holding at %f seconds.\n" , get_sim_time()) ;

if (!the_cpr->checkpoint_times.empty()) {
the_cpr->write_checkpoint();
}

while (mode == Freeze) {

3 changes: 3 additions & 0 deletions trick_source/sim_services/VariableServer/Makefile_deps
@@ -1,3 +1,6 @@

object_${TRICK_HOST_CPU}/simModeCharString.o: simModeCharString.c \
${TRICK_HOME}/include/trick/sim_mode.h
object_${TRICK_HOST_CPU}/VariableServerSessionThread_loop.o: VariableServerSessionThread_loop.cpp \
${TRICK_HOME}/include/trick/VariableServer.hh \
${TRICK_HOME}/include/trick/tc.h \
14 changes: 14 additions & 0 deletions trick_source/sim_services/VariableServer/simModeCharString.c
@@ -0,0 +1,14 @@

#include "trick/sim_mode.h"

const char * simModeCharString(SIM_MODE mode) {
switch (mode)
{
case Initialization: return "Initialization";
case Run: return "Run";
case Step: return "Step";
case Freeze: return "Freeze";
case ExitMode: return "ExitMode";
default: return "InvalidMode";
}
}
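`simModeCharString()` supplies the mode name in the Run-mode warnings. With printf-style formatting, each conversion must consume exactly one argument of the matching type. An extra leading argument, such as a file name, shifts every argument by one and hands a pointer to `%d`. The sketch below uses `std::snprintf`. `run_mode_warning` is a hypothetical helper, and the mode name and value are passed in because the numeric values of `SIM_MODE` are not shown in this diff.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper that formats the Run-mode checkpoint warning. The format
// has three conversions (%s, %s, %d) and the arguments follow in that order.
std::string run_mode_warning(const char* action, const char* mode_name, int mode_value) {
    char buf[256];
    std::snprintf(buf, sizeof buf,
                  "WARNING: %s a checkpoint in 'Run Mode' may cause non time-homogeneous data. "
                  "Current Mode: %s (%d)\n",
                  action, mode_name, mode_value);
    return std::string(buf);
}
```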