
training stats event-handler (for printing training losses) #7

Closed
wyli opened this issue Jan 10, 2020 · 13 comments · Fixed by #71

wyli (Contributor) commented Jan 10, 2020

No description provided.

wyli assigned wyli, atbenmurray, yanchengnv and Nic-Ma and unassigned wyli and yanchengnv Jan 10, 2020
Nic-Ma (Contributor) commented Jan 14, 2020

Hi @yanchengnv and @wyli,

I found a similar example in @ericspod's notebook:
from ignite.engine import Events

@trainer.on(Events.EPOCH_COMPLETED)
def log_training_loss(engine):
    print("Epoch", engine.state.epoch, "Loss:", engine.state.output)

For this task, do you mean something like this?
Thanks.

ericspod (Member) commented

We would want something a bit more generalized, with optional outputs for end of iteration, end of step, end of training, etc. We ought to log to the engine's log object instead of printing to stdout; of course, we can add a printing handler to that log object to do both.
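As a rough sketch of that direction (the class and method names here are hypothetical, not a settled API), a handler could register for several events and write through the engine's standard Python logger; attaching a plain logging.StreamHandler to that logger would then also print to stdout:

from ignite.engine import Engine, Events

class StatsLogger:
    """Log losses and metrics through engine.logger at the chosen events."""

    def attach(self, engine: Engine) -> None:
        engine.add_event_handler(Events.ITERATION_COMPLETED, self.iteration_completed)
        engine.add_event_handler(Events.EPOCH_COMPLETED, self.epoch_completed)

    def iteration_completed(self, engine: Engine) -> None:
        engine.logger.info("iteration %d, loss: %s", engine.state.iteration, engine.state.output)

    def epoch_completed(self, engine: Engine) -> None:
        engine.logger.info("epoch %d, metrics: %s", engine.state.epoch, engine.state.metrics)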

Nic-Ma (Contributor) commented Jan 14, 2020

Hi @ericspod,

Thanks for your suggestion about a general StatsLogger.
Instead of engine.log, can we put all the useful outputs into engine.state.metrics?
We can also get "iteration", "epoch", "max_epochs", "epoch_length", etc. from engine.state.
I think engine.state is designed as a unified API that stores useful information for all kinds of event handlers.
If we're aligned on this direction, I can try to make a PR based on engine.state.
Thanks.
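For illustration, a minimal self-contained sketch of that engine.state-based idea (the update function and data below are dummies, just enough to fire the events):

from ignite.engine import Engine, Events

trainer = Engine(lambda engine, batch: 0.0)  # stand-in update function

def log_stats(engine):
    s = engine.state
    print(f"epoch {s.epoch}/{s.max_epochs}, iteration {s.iteration}, metrics: {s.metrics}")

trainer.add_event_handler(Events.EPOCH_COMPLETED, log_stats)
trainer.run([None] * 4, max_epochs=2)  # dummy data: two epochs of four iterations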

vfdev-5 (Member) commented Jan 14, 2020

Sorry for jumping into your conversation; I'd just like to help and clarify what ignite provides out of the box for this:

  • TensorboardLogger if you would like to write any output parameters (e.g. batch losses), computed metrics, optimizer params etc.
  • VisdomLogger, same as TensorboardLogger but for visdom.
  • ProgressBar is a tqdm bar for displaying the progress and showing some values of interest.
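For instance, the first and third of these attach roughly like this (a sketch against ignite's contrib module paths of that era; the stand-in trainer and its fake loss output are just for illustration):

from ignite.contrib.handlers import ProgressBar
from ignite.contrib.handlers.tensorboard_logger import OutputHandler, TensorboardLogger
from ignite.engine import Engine, Events

trainer = Engine(lambda engine, batch: 0.123)  # stand-in update returning a fake loss

tb_logger = TensorboardLogger(log_dir="runs/exp1")
tb_logger.attach(
    trainer,
    log_handler=OutputHandler(tag="training", output_transform=lambda loss: {"loss": loss}),
    event_name=Events.ITERATION_COMPLETED,
)
ProgressBar().attach(trainer, output_transform=lambda loss: {"loss": loss})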

In my experience, using experiment tracking systems like MLflow or Polyaxon, we can either log to the system via their API (using ignite's wrappers like MLflowLogger or PolyaxonLogger), write events to TensorBoard, or simply print values to stdout, which is automatically written to a log file. The first and second approaches are obviously more interesting if we would like to compare different runs, etc.

HTH

Nic-Ma (Contributor) commented Jan 15, 2020

Hi @vfdev-5,

Thanks very much for your detailed sharing!
I will take a deep dive into your examples.

And @wyli, @ericspod, @yanchengnv,

Regarding the usage of Ignite, have we agreed to use only Ignite's official code, or both the official code and third-party contrib code?
Thanks.

ericspod (Member) commented

@vfdev-5 @Nic-Ma I've used the log file for logging just messages and such; the SessionSaver class in ptproto creates a new directory in a given parent directory for every new run and sends the log to a file there, along with the checkpoints and saved networks. My subclasses of Engine add extra fields to the state, and we could add more things to it; I would think that metrics should only be the output from metric handlers and shouldn't contain anything else. Returning to the idea of session handling: if we're saving the whole engine state (or everything except large tensors), then these other things we add will get saved as well.
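To make the session idea concrete, a hypothetical fragment for persisting the lightweight parts of the state (the field choice and path are illustrative only, not ptproto's actual behavior; trainer is an existing engine):

import torch

# save only the small fields of engine.state, next to the checkpoints in the run directory
session_state = {
    "epoch": trainer.state.epoch,
    "iteration": trainer.state.iteration,
    "metrics": trainer.state.metrics,
}
torch.save(session_state, "runs/exp1/session_state.pt")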

ericspod (Member) commented

@vfdev-5 One thing to mention is that tqdm doesn't play well with JupyterLab for some reason; I believe it's a known bug. I had written a super primitive text progress bar that works; I don't know if we collectively want to investigate anything else. I really like doing things through Jupyter a lot, so stuff that doesn't rely on tensorboard/visdom is what I would prefer.
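A guess at what such a primitive text bar could look like (not the actual ptproto code): a handler that rewrites a single stdout line on every iteration, with a stand-in engine just to make the sketch runnable:

import sys

from ignite.engine import Engine, Events

trainer = Engine(lambda engine, batch: 0.123)  # stand-in update returning a fake loss

def print_progress(engine):
    s = engine.state
    # "\r" returns to the start of the line so each iteration overwrites the last
    sys.stdout.write(f"\repoch {s.epoch}, iteration {s.iteration}, loss: {s.output}")
    sys.stdout.flush()

trainer.add_event_handler(Events.ITERATION_COMPLETED, print_progress)
trainer.run([None] * 8, max_epochs=2)  # dummy data just to drive the bar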

fepegar commented Jan 15, 2020

There is a tqdm_notebook that works quite well: https://pypi.org/project/tqdm/#ipython-jupyter-integration
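For reference, a minimal notebook cell using it (assumes a recent tqdm with the ipywidgets extension installed; older versions expose it as tqdm.tqdm_notebook instead):

from tqdm.notebook import tqdm

for step in tqdm(range(100), desc="training"):
    pass  # a real training step would go here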

ericspod (Member) commented

@fepegar I think that does have issues with JupyterLab; vanilla Jupyter Notebook, I think, is fine. I don't know why, but they're different.

fepegar commented Jan 15, 2020

Yes, I've had trouble before on JupyterLab. But I think installing the widgets extension solves it: https://ipywidgets.readthedocs.io/en/latest/user_install.html#installing-the-jupyterlab-extension

vfdev-5 (Member) commented Jan 15, 2020

@ericspod I'm also using jupyterlab for development, it provides a cool environment for research/prototyping/testing etc.

so stuff that doesn't rely on tensorboard/visdom is what I would prefer.

However, how do you plan to run, organize, and compare various trainings for the same task?

ericspod (Member) commented

@fepegar I thought I had tried that and it didn't fix the issue; maybe it didn't load correctly for me? I'll try again.

@vfdev-5 That is something I wasn't doing in a great way, so we should definitely be targeting ways of supporting JupyterLab and tensorboard/visdom.

pdogra89 commented Feb 7, 2020

Yan - set up time to discuss the design choice here.

wyli closed this as completed in #71 Feb 12, 2020
Nic-Ma added a commit that referenced this issue Jun 26, 2020
* adds network

* adds basic training

* update loading

* working prototype

* update validation set

* [MONAI] Add author; paper info; PDDCA18 (#6)

+ Author
+ Early accept
+ PDDCA18 link

* Update README.md

* adds network

* adds basic training

* update loading

* working prototype

* update validation set

* [MONAI] Update TRAIN_PATH, VAL_PATH (#8)

+ Update TRAIN_PATH, VAL_PATH

* [MONAI] Add data link (#7)

+ Add data link https://drive.google.com/file/d/1A2zpVlR3CkvtkJPvtAF3-MH0nr1WZ2Mn/view?usp=sharing

* fixes typos

* tested new dataset

* print more info, checked new dataset

* [MONAI] Add paper link (#9)

Add paper link https://arxiv.org/abs/2006.12575

* [MONAI] Use dice loss + focal loss to train (#10)

Use dice loss + focal loss to train

* [MONAI] Support non-one-hot ground truth (#11)

Support non-one-hot ground truth

* fixes format and docstrings, adds argparser options

* resume the focal_loss

* adds tests

* [MONAI] Support non-one-hot ground truth (#11)

Support non-one-hot ground truth

* adds tests

* update docstring

* [MONAI] Keep track of best validation scores (#12)

Keep track of best validation scores

* model saving

* adds window sampling

* update readme

* update docs

* fixes flake8 error

* update window sampling

* fixes model name

* fixes channel size issue

* [MONAI] Update --pretrain, --lr (#13)

+ lr from 5e-4 to 1e-3 because we use mean for class channel instead of sum for class channel.
+ pretrain path is consistent with current model_name.

* [MONAI] Pad image; elastic; best class model (#14)

* [MONAI] Pad image; elastic; best class model

+ Pad image bigger than crop_size, avoid potential issues in RandCropByPosNegLabeld
+ Use Rand3DElasticd
+ Save best model for each class

* Update train.py

Co-authored-by: Wenqi Li <wenqil@nvidia.com>

* flake8 fixes

* removes -1 cropsize deform

* testing commands

* fixes unit tests

* update spatial padding

* [MONAI] Add full image deform augmentation (#15)

+ Add full image deform augmentation by Rand3DElasticd
+ Please use latest MONAI in #623

* Adding py.typed

* updating setup.py to comply with black

* update based on comments

* excluding research from packaging

* update tests

* update setup.py

Co-authored-by: Wentao Zhu <wentaozhu1991@gmail.com>
Co-authored-by: Neil Tenenholtz <ntenenz@users.noreply.github.com>
Co-authored-by: Nic Ma <nma@nvidia.com>