462 checkpointing dev #473

Pale-Blue-Dot-97 · 2024-04-23T22:52:33Z

This PR adds full checkpointing support to Trainer.

New functionality:

To enable checkpointing, set checkpoint_experiment to true in the config.
To resume an experiment, set resume to true and set exp_name in the config to the name of the experiment to reload.
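
As a rough illustration of the two settings above, the sketch below writes the config as a plain Python dict and applies the kind of check this PR mentions (exp_name must be specified when resume is true). The keys checkpoint_experiment, resume and exp_name come from the description. The function name validate_resume_config, the dict layout and the example experiment name are hypothetical and are not taken from the codebase.

```python
from typing import Any, Dict


def validate_resume_config(config: Dict[str, Any]) -> None:
    """Raise ValueError if resume is requested without an experiment name.

    Hypothetical helper: it illustrates the check, not the project's actual code.
    """
    if config.get("resume", False) and not config.get("exp_name"):
        raise ValueError("resume is true but exp_name is not set in the config")


# Example config (hypothetical layout and experiment name).
config = {
    "checkpoint_experiment": True,  # save checkpoints during training
    "resume": True,  # reload a previous experiment
    "exp_name": "my_experiment",  # which experiment to reload
}

validate_resume_config(config)  # passes because exp_name is set
```

Failing early on a missing exp_name avoids starting a run that cannot find anything to reload.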

Note

The early stopper now also uses the checkpointing function. As a result, it saves a full checkpoint that includes the model state dict, rather than saving only the weights.
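
To make the difference between a weights-only save and a checkpoint concrete, here is a minimal sketch, not the implementation in this PR. The names save_checkpoint and load_checkpoint match the commit message below, but the signatures, the dictionary keys, and the choice to store the optimiser state and epoch alongside the model state dict are all assumptions. It uses pickle from the standard library so that it runs without dependencies; a PyTorch project would more likely use torch.save and torch.load.

```python
import pickle
from pathlib import Path
from typing import Any, Dict


def save_checkpoint(path: Path, model: Any, optimiser: Any, epoch: int) -> None:
    """Write model state, optimiser state and epoch to a single file.

    Assumes model and optimiser expose state_dict(), as PyTorch objects do.
    """
    checkpoint: Dict[str, Any] = {
        "epoch": epoch,  # assumed field, not confirmed by the PR
        "model_state_dict": model.state_dict(),
        "optimiser_state_dict": optimiser.state_dict(),  # assumed field
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)


def load_checkpoint(path: Path, model: Any, optimiser: Any) -> int:
    """Restore model and optimiser state in place and return the saved epoch."""
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimiser.load_state_dict(checkpoint["optimiser_state_dict"])
    return checkpoint["epoch"]
```

Under these assumptions, an early stopper that calls save_checkpoint leaves behind a file from which training can be resumed, not just a set of weights for inference.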

Pale-Blue-Dot-97 added 6 commits April 23, 2024 00:39

Added save_checkpoint and load_checkpoint

96a3ec2

Added option to save the model externally to stopper

b8223c7

Added checkpointing

e91e69d

Added check that exp_name is specified if resume==True

09e35b9

Added test_trainer_resume

dda7b32

Fixed checkpointing

01b78a1

Pale-Blue-Dot-97 added the enhancement (New feature or request), testing (New tests needed) and python (Pull requests that update Python code) labels Apr 23, 2024

Pale-Blue-Dot-97 self-assigned this Apr 23, 2024

Updated correct answer for Vatican City test

e3e968f

Pale-Blue-Dot-97 merged commit e18b40c into beta Apr 23, 2024
2 of 12 checks passed
