
training stats event-handler (for printing training losses) #7

Closed
wyli opened this issue Jan 10, 2020 · 13 comments · Fixed by #71

wyli (Contributor) commented Jan 10, 2020

No description provided.

wyli assigned wyli, atbenmurray, yanchengnv and Nic-Ma and unassigned wyli and yanchengnv Jan 10, 2020
Nic-Ma (Contributor) commented Jan 14, 2020

Hi @yanchengnv and @wyli,

I found a similar example in @ericspod's notebook:
from ignite.engine import Events

@trainer.on(Events.EPOCH_COMPLETED)
def log_training_loss(engine):
    print("Epoch", engine.state.epoch, "Loss:", engine.state.output)

For this task, do you mean something like this?
Thanks.

ericspod (Member) commented

We would want something a bit more generalized, with optional outputs for end of iteration, end of step, end of training, etc. We ought to log to the engine's log object instead of printing to stdout; of course, we can add a printing handler to that log object to do both.
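As a rough sketch of that direction (the class and method names here are hypothetical, not a settled API), a handler could register for several events and write through the engine's standard Python logger; attaching a plain logging.StreamHandler to that logger would then also print to stdout:

from ignite.engine import Engine, Events

class StatsLogger:
    """Log losses and metrics through engine.logger at the chosen events."""

    def attach(self, engine: Engine) -> None:
        engine.add_event_handler(Events.ITERATION_COMPLETED, self.iteration_completed)
        engine.add_event_handler(Events.EPOCH_COMPLETED, self.epoch_completed)

    def iteration_completed(self, engine: Engine) -> None:
        engine.logger.info("iteration %d, loss: %s", engine.state.iteration, engine.state.output)

    def epoch_completed(self, engine: Engine) -> None:
        engine.logger.info("epoch %d, metrics: %s", engine.state.epoch, engine.state.metrics)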

Nic-Ma (Contributor) commented Jan 14, 2020

Hi @ericspod,

Thanks for your suggestion about a general StatsLogger.
Instead of engine.log, can we put all the useful outputs into engine.state.metrics?
We can also get "iteration", "epoch", "max_epochs", "epoch_length", etc. from engine.state.
I think engine.state is designed as a unified API that stores useful information for all kinds of event handlers.
If we're aligned on this direction, I can try to make a PR based on engine.state.
Thanks.
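For illustration, a minimal self-contained sketch of that engine.state-based idea (the update function and data below are dummies, just enough to fire the events):

from ignite.engine import Engine, Events

trainer = Engine(lambda engine, batch: 0.0)  # stand-in update function

def log_stats(engine):
    s = engine.state
    print(f"epoch {s.epoch}/{s.max_epochs}, iteration {s.iteration}, metrics: {s.metrics}")

trainer.add_event_handler(Events.EPOCH_COMPLETED, log_stats)
trainer.run([None] * 4, max_epochs=2)  # dummy data: two epochs of four iterations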

vfdev-5 (Member) commented Jan 14, 2020

Sorry for jumping into your conversation; I'd just like to help and clarify what ignite provides out of the box for this:

  • TensorboardLogger if you would like to write any output parameters (e.g. batch losses), computed metrics, optimizer params etc.
  • VisdomLogger, same as TensorboardLogger but for visdom.
  • ProgressBar is a tqdm bar for displaying the progress and showing some values of interest.
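For instance, the first and third of these attach roughly like this (a sketch against ignite's contrib module paths of that era; the stand-in trainer and its fake loss output are just for illustration):

from ignite.contrib.handlers import ProgressBar
from ignite.contrib.handlers.tensorboard_logger import OutputHandler, TensorboardLogger
from ignite.engine import Engine, Events

trainer = Engine(lambda engine, batch: 0.123)  # stand-in update returning a fake loss

tb_logger = TensorboardLogger(log_dir="runs/exp1")
tb_logger.attach(
    trainer,
    log_handler=OutputHandler(tag="training", output_transform=lambda loss: {"loss": loss}),
    event_name=Events.ITERATION_COMPLETED,
)
ProgressBar().attach(trainer, output_transform=lambda loss: {"loss": loss})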

In my experience, using experiment tracking systems like MLflow or Polyaxon, we can either log to the system via their API (using ignite's wrappers like MLflowLogger or PolyaxonLogger), write events to TensorBoard, or simply print values to stdout, which is automatically written to a log file. The first and second approaches are obviously more interesting if we would like to compare different runs, etc.

HTH

Nic-Ma (Contributor) commented Jan 15, 2020

Hi @vfdev-5,

Thanks very much for your detailed sharing!
I will take a deep dive into your examples.

And @wyli, @ericspod, @yanchengnv,

Regarding the usage of Ignite, have we agreed to use only Ignite's official code, or both the official code and third-party contrib code?
Thanks.

ericspod (Member) commented

@vfdev-5 @Nic-Ma I've used the log file for logging just messages and such; the SessionSaver class in ptproto creates a new directory in a given parent directory for every new run and sends the log to a file there, along with the checkpoints and saved networks. My subclasses of Engine add extra fields to the state, and we could add more things to it; I would think that metrics should only be the output from metric handlers and shouldn't contain anything else. Returning to the idea of session handling: if we're saving the whole engine state (or everything except large tensors), then these other things we add will get saved as well.
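To make the session idea concrete, a hypothetical fragment for persisting the lightweight parts of the state (the field choice and path are illustrative only, not ptproto's actual behavior; trainer is an existing engine):

import torch

# save only the small fields of engine.state, next to the checkpoints in the run directory
session_state = {
    "epoch": trainer.state.epoch,
    "iteration": trainer.state.iteration,
    "metrics": trainer.state.metrics,
}
torch.save(session_state, "runs/exp1/session_state.pt")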

ericspod (Member) commented

@vfdev-5 One thing to mention is that tqdm doesn't play well with JupyterLab for some reason; I believe it's a known bug. I had written a super primitive text progress bar that works; I don't know if we collectively want to investigate anything else. I really like doing things through Jupyter a lot, so stuff that doesn't rely on tensorboard/visdom is what I would prefer.
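A guess at what such a primitive text bar could look like (not the actual ptproto code): a handler that rewrites a single stdout line on every iteration, with a stand-in engine just to make the sketch runnable:

import sys

from ignite.engine import Engine, Events

trainer = Engine(lambda engine, batch: 0.123)  # stand-in update returning a fake loss

def print_progress(engine):
    s = engine.state
    # "\r" returns to the start of the line so each iteration overwrites the last
    sys.stdout.write(f"\repoch {s.epoch}, iteration {s.iteration}, loss: {s.output}")
    sys.stdout.flush()

trainer.add_event_handler(Events.ITERATION_COMPLETED, print_progress)
trainer.run([None] * 8, max_epochs=2)  # dummy data just to drive the bar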

fepegar commented Jan 15, 2020

There is a tqdm_notebook that works quite well: https://pypi.org/project/tqdm/#ipython-jupyter-integration
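For reference, a minimal notebook cell using it (assumes a recent tqdm with the ipywidgets extension installed; older versions expose it as tqdm.tqdm_notebook instead):

from tqdm.notebook import tqdm

for step in tqdm(range(100), desc="training"):
    pass  # a real training step would go here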

ericspod (Member) commented

@fepegar I think that does have issues with JupyterLab; vanilla Jupyter Notebook, I think, is fine. I don't know why, but they're different.

fepegar commented Jan 15, 2020

Yes, I've had trouble before on JupyterLab. But I think installing the widgets extension solves it: https://ipywidgets.readthedocs.io/en/latest/user_install.html#installing-the-jupyterlab-extension

vfdev-5 (Member) commented Jan 15, 2020

@ericspod I'm also using jupyterlab for development, it provides a cool environment for research/prototyping/testing etc.

so stuff that doesn't rely on tensorboard/visdom is what I would prefer.

However, how do you plan to run, organize, and compare various trainings for the same task?

ericspod (Member) commented

@fepegar I thought I had tried that and it didn't fix the issue; maybe it didn't load correctly for me? I'll try again.

@vfdev-5 That is something I wasn't doing in a great way, so we should definitely be targeting ways of supporting JupyterLab and tensorboard/visdom.

pdogra89 commented Feb 7, 2020

Yan - set up time to discuss the design choice here.

wyli closed this as completed in #71 Feb 12, 2020
Nic-Ma added a commit that referenced this issue Jun 26, 2020
* adds network

* adds basic training

* update loading

* working prototype

* update validation set

* [MONAI] Add author; paper info; PDDCA18 (#6)

+ Author
+ Early accept
+ PDDCA18 link

* Update README.md

* adds network

* adds basic training

* update loading

* working prototype

* update validation set

* [MONAI] Update TRAIN_PATH, VAL_PATH (#8)

+ Update TRAIN_PATH, VAL_PATH

* [MONAI] Add data link (#7)

+ Add data link https://drive.google.com/file/d/1A2zpVlR3CkvtkJPvtAF3-MH0nr1WZ2Mn/view?usp=sharing

* fixes typos

* tested new dataset

* print more info, checked new dataset

* [MONAI] Add paper link (#9)

Add paper link https://arxiv.org/abs/2006.12575

* [MONAI] Use dice loss + focal loss to train (#10)

Use dice loss + focal loss to train

* [MONAI] Support non-one-hot ground truth (#11)

Support non-one-hot ground truth

* fixes format and docstrings, adds argparser options

* resume the focal_loss

* adds tests

* [MONAI] Support non-one-hot ground truth (#11)

Support non-one-hot ground truth

* adds tests

* update docstring

* [MONAI] Keep track of best validation scores (#12)

Keep track of best validation scores

* model saving

* adds window sampling

* update readme

* update docs

* fixes flake8 error

* update window sampling

* fixes model name

* fixes channel size issue

* [MONAI] Update --pretrain, --lr (#13)

+ lr from 5e-4 to 1e-3 because we use mean for class channel instead of sum for class channel.
+ pretrain path is consistent with current model_name.

* [MONAI] Pad image; elastic; best class model (#14)

* [MONAI] Pad image; elastic; best class model

+ Pad image bigger than crop_size, avoid potential issues in RandCropByPosNegLabeld
+ Use Rand3DElasticd
+ Save best model for each class

* Update train.py

Co-authored-by: Wenqi Li <wenqil@nvidia.com>

* flake8 fixes

* removes -1 cropsize deform

* testing commands

* fixes unit tests

* update spatial padding

* [MONAI] Add full image deform augmentation (#15)

+ Add full image deform augmentation by Rand3DElasticd
+ Please use latest MONAI in #623

* Adding py.typed

* updating setup.py to comply with black

* update based on comments

* excluding research from packaging

* update tests

* update setup.py

Co-authored-by: Wentao Zhu <wentaozhu1991@gmail.com>
Co-authored-by: Neil Tenenholtz <ntenenz@users.noreply.github.com>
Co-authored-by: Nic Ma <nma@nvidia.com>