Commit

Add inspect_ckpts.py instructions to README.md
gkielian committed Apr 17, 2024
1 parent 8de824f commit 386bd36
Showing 2 changed files with 24 additions and 2 deletions.
26 changes: 24 additions & 2 deletions README.md
@@ -157,11 +157,33 @@ logs/

and save checkpoints for inference in `out_test`

### Inspect best losses

Often we want to run a large number of experiments and find the best validation
loss (a metric for how well the model does on next token prediction on a given
dataset).

The included `inspect_ckpts.py` script recursively checks the best validation
loss and associated iteration number for all `ckpt.pt` files in a given directory:
```bash
python3 inspect_ckpts.py --directory ./out --sort loss
```

![image](./images/inspect_ckpts.png)
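
As a rough illustration of the idea (not the actual contents of `inspect_ckpts.py`), the recursive scan could be sketched as below. The checkpoint keys `best_val_loss` and `iter_num`, the helper names `load_with_torch` and `collect_best_losses`, and the `"iter"` sort option are assumptions made for this example; adjust them to match what your checkpoints really contain.

```python
from pathlib import Path


def load_with_torch(path):
    """Load a checkpoint with PyTorch (imported lazily, so the scan logic works without it)."""
    import torch
    return torch.load(path, map_location="cpu")


def collect_best_losses(directory, load_fn=load_with_torch, sort_by="loss"):
    """Return (path, best_val_loss, iter_num) tuples for every ckpt.pt under `directory`.

    Hypothetical helper: the dictionary keys below are assumed, not confirmed
    against the real script.
    """
    rows = []
    # rglob walks the directory tree recursively, matching files named ckpt.pt
    for ckpt_path in Path(directory).rglob("ckpt.pt"):
        ckpt = load_fn(ckpt_path)
        rows.append(
            (str(ckpt_path), float(ckpt["best_val_loss"]), int(ckpt["iter_num"]))
        )
    # Sort ascending by loss (index 1) or by iteration number (index 2)
    key_index = 1 if sort_by == "loss" else 2
    return sorted(rows, key=lambda row: row[key_index])


if __name__ == "__main__":
    for path, loss, iteration in collect_best_losses("./out"):
        print(f"{loss:.4f}\t{iteration}\t{path}")
```

Passing the loader in as `load_fn` keeps the directory walk and sorting testable without PyTorch installed; the default still reads real checkpoints via `torch.load`.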

This can be wrapped in the `watch` command, with color enabled, for a real-time dashboard.

For example, to look at all checkpoint files in the `out` directory:
```bash
watch --color 'python3 inspect_ckpts.py --directory ./out --sort loss'
```

As with the remainder of the repo, this script is provided as a base for
additional community contributions.

### Start Tensorboard Logging

If using tensorboard for logging (recommended as this is the means tested by the
development team), we have provided a convenience script:
If using tensorboard for logging, we have provided a convenience script:

```bash
bash start_tensorboard.sh
Binary file added images/inspect_ckpts.png
