4116 Add support for advanced args of AMP #4132

Merged
merged 12 commits into from
Apr 19, 2022

Conversation

Nic-Ma
Contributor

@Nic-Ma Nic-Ma commented Apr 14, 2022

Fixes #4116 .

Description

This PR adds support for the advanced arguments of the PyTorch AMP module.

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@Nic-Ma
Contributor Author

Nic-Ma commented Apr 15, 2022

Hi @ericspod @wyli @vfdev-5 ,

I feel maybe it's better to change the _iteration() function of engines to be a staticmethod or classmethod:
https://github.com/Project-MONAI/MONAI/blob/dev/monai/engines/trainer.py#L164
I have 2 reasons:

  1. Currently, the self and engine args in this function refer to the same object; some of the code uses self.XXX and some uses engine.XXX, so the self arg is not needed at all.
  2. If users want to pass their own iteration logic to the engine, it will be a regular function without a self arg, and they may need to refer to our default implementation.

Another thing: I want to change all the type hints from engine: Engine to engine: Workflow, because our engine functions only work with MONAI engine workflows, and the ignite Engine type causes many mypy errors.

What do you think?

Thanks in advance.

Signed-off-by: Nic Ma <nma@nvidia.com>
@Nic-Ma
Contributor Author

Nic-Ma commented Apr 15, 2022

/black

monai-bot and others added 2 commits April 15, 2022 15:03
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
Signed-off-by: Nic Ma <nma@nvidia.com>
@Nic-Ma Nic-Ma marked this pull request as ready for review April 16, 2022 01:15
@Nic-Ma
Contributor Author

Nic-Ma commented Apr 16, 2022

/black

@Nic-Ma
Contributor Author

Nic-Ma commented Apr 16, 2022

/build

@Nic-Ma Nic-Ma changed the title [WIP] 4116 Add support for advanced args of AMP 4116 Add support for advanced args of AMP Apr 16, 2022
@Nic-Ma Nic-Ma requested review from ericspod, rijobro and wyli April 16, 2022 01:17
@Nic-Ma
Contributor Author

Nic-Ma commented Apr 18, 2022

Hi @ericspod ,

What do you think about these 2 points? If you don't have concerns, I will do it in a separate PR in case we need to revert.

Thanks in advance.

@ericspod
Member

The value of _iteration being a regular method is that it allows easy overriding in subclasses. This could be done with classmethod as well, but it's a slightly more complex mechanism that is a bit confusing to less advanced Python users; explaining the difference between staticmethod and classmethod can be difficult. I don't think staticmethod should really be used anyway, since we can define functions outside classes, unlike in pure OO languages.

For using self or engine I would normally want to use self to make it clear we're using the current object, even if the reader forgets that the engine argument and self are the same they'll be directed away from using the argument. However if anyone wants to define their own function rather than override the method it's easier to copy-paste code that doesn't reference self, so I'd go with engine.

Changing the type hint to Workflow is probably the right change now if we've totally broken compatibility with the original Engine class.
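
The trade-off discussed above can be sketched as follows. This is only a minimal illustration, not MONAI or ignite code: `MiniEngine`, `TripleEngine`, `custom_iteration`, `scale`, `iteration_update` and `run_once` are hypothetical names, and the only behavior borrowed from the thread is that the engine calls its process function as `process_function(engine, batch)`. It shows a regular `_iteration` method that can be overridden in a subclass, while its body refers only to `engine` so that it can be copied into a standalone function.

```python
from typing import Any, Callable, Optional


class MiniEngine:
    """Tiny stand-in for an ignite-style engine (hypothetical, not the real class)."""

    def __init__(self, iteration_update: Optional[Callable[["MiniEngine", Any], Any]] = None):
        self.scale = 2
        # Use the user-supplied function if given, otherwise the bound default method.
        self._process_function = iteration_update if iteration_update is not None else self._iteration

    def _iteration(self, engine: "MiniEngine", batch: Any) -> Any:
        # `engine` is the same object as `self` here; referring only to `engine`
        # keeps this body copy-pasteable into a standalone function.
        return batch * engine.scale

    def run_once(self, batch: Any) -> Any:
        # The engine passes itself as the first argument of the process function.
        return self._process_function(self, batch)


class TripleEngine(MiniEngine):
    # Overriding works because `_iteration` is a regular method.
    def _iteration(self, engine: "MiniEngine", batch: Any) -> Any:
        return batch * 3


def custom_iteration(engine: MiniEngine, batch: Any) -> Any:
    # A plain function has no `self`; it only sees the `engine` argument.
    return batch * engine.scale + 1


default_result = MiniEngine().run_once(5)
override_result = TripleEngine().run_once(5)
custom_result = MiniEngine(custom_iteration).run_once(5)
```

Both extension routes coexist in this sketch: subclassing replaces the default method, while passing a function replaces it per instance.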

@Nic-Ma
Contributor Author

Nic-Ma commented Apr 19, 2022

Hi @ericspod ,

Thanks very much for your detailed analysis and comments.
How about making the changes below in a separate PR:

  1. Keep _iteration() as a regular method, but unify all the self.XXX and engine.XXX references to engine.XXX, because (1) as you said, users can easily copy-paste our code, and (2) someone may override the engine run logic someday and pass another engine as the parameter instead of self: https://github.com/pytorch/ignite/blob/master/ignite/engine/engine.py#L859.
  2. Check all the functions typed engine: Engine and, where they contain MONAI-specific logic, change the type hint to Workflow.

What do you think?
This PR only covers the AMP parameters, based on user feedback, and is ready for review.

Thanks in advance.
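
The type hint change proposed in point 2 can be illustrated with a small sketch. The classes below are hypothetical stand-ins, not the real ignite `Engine` or MONAI `Workflow`, and the `amp` attribute and `describe` function are assumptions chosen only for the example. The point is that a function touching subclass-only attributes should be annotated with the subclass.

```python
class Engine:
    """Stand-in for a generic base engine (hypothetical)."""

    def __init__(self) -> None:
        self.should_terminate = False


class Workflow(Engine):
    """Stand-in for a workflow that adds its own attributes (hypothetical)."""

    def __init__(self, amp: bool = False) -> None:
        super().__init__()
        self.amp = amp


def describe(engine: Workflow) -> str:
    # If this were annotated `engine: Engine`, a type checker such as mypy
    # would flag `engine.amp`, because the base class does not define it.
    return "amp on" if engine.amp else "amp off"


status_on = describe(Workflow(amp=True))
status_off = describe(Workflow())
```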

@ericspod
Member

Sounds good to me. I'll approve here, but we should work out what the failures are about; there's no error output.

@Nic-Ma
Contributor Author

Nic-Ma commented Apr 19, 2022

Hi @ericspod ,

Thanks for your review.
The failure is because GitHub changed its CI authentication policy this week; our remote machine is offline and needs to sign in again (@IsaacYangSLA or @wyli may know more). Currently we mainly use the blossom CI environment, so the GitHub CI tests are optional, and we can merge PRs once the blossom tests pass.

Thanks.

@Nic-Ma
Contributor Author

Nic-Ma commented Apr 19, 2022

/build

@Nic-Ma Nic-Ma enabled auto-merge (squash) April 19, 2022 22:12
@Nic-Ma Nic-Ma merged commit 4a8f815 into Project-MONAI:dev Apr 19, 2022
@wyli
Contributor

wyli commented Apr 26, 2022

I think these integration test errors come from this PR (for PyTorch versions earlier than 1.10):

01:54:41  Current run is terminating due to exception: __init__() got an unexpected keyword argument 'dtype'
01:54:41  Exception: __init__() got an unexpected keyword argument 'dtype'
01:54:41  Traceback (most recent call last):
01:54:41    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 840, in _run_once_on_dataset
01:54:41      self.state.output = self._process_function(self, self.state.batch)
01:54:41    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 209, in _iteration
01:54:41      with torch.cuda.amp.autocast(**engine.amp_kwargs):
01:54:41  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  Engine run is terminating due to exception: __init__() got an unexpected keyword argument 'dtype'
01:54:42  Exception: __init__() got an unexpected keyword argument 'dtype'
01:54:42  Traceback (most recent call last):
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 753, in _internal_run
01:54:42      time_taken = self._run_once_on_dataset()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 854, in _run_once_on_dataset
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 840, in _run_once_on_dataset
01:54:42      self.state.output = self._process_function(self, self.state.batch)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 209, in _iteration
01:54:42      with torch.cuda.amp.autocast(**engine.amp_kwargs):
01:54:42  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  E
01:54:42  ======================================================================
01:54:42  ERROR: test_timing (__main__.IntegrationWorkflows)
01:54:42  ----------------------------------------------------------------------
01:54:42  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  
01:54:42  The above exception was the direct cause of the following exception:
01:54:42  
01:54:42  Traceback (most recent call last):
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/utils.py", line 575, in _wrapper
01:54:42      raise RuntimeError(res.traceback) from res
01:54:42  RuntimeError: Traceback (most recent call last):
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/utils.py", line 526, in run_process
01:54:42      output = func(*args, **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/utils.py", line 609, in _call_original_func
01:54:42      return f(*args, **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/test_integration_workflows.py", line 361, in test_timing
01:54:42      self.train_and_infer(idx=2)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/test_integration_workflows.py", line 308, in train_and_infer
01:54:42      best_metric = run_training_test(self.data_dir, device=self.device, amp=(idx == 2))
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/test_integration_workflows.py", line 210, in run_training_test
01:54:42      trainer.run()
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 58, in run
01:54:42      super().run()
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/workflow.py", line 290, in run
01:54:42      super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 704, in run
01:54:42      return self._internal_run()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 783, in _internal_run
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 753, in _internal_run
01:54:42      time_taken = self._run_once_on_dataset()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 854, in _run_once_on_dataset
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 840, in _run_once_on_dataset
01:54:42      self.state.output = self._process_function(self, self.state.batch)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 209, in _iteration
01:54:42      with torch.cuda.amp.autocast(**engine.amp_kwargs):
01:54:42  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  
01:54:42  
01:54:42  ======================================================================
01:54:42  ERROR: test_training (__main__.IntegrationWorkflows)
01:54:42  ----------------------------------------------------------------------
01:54:42  Traceback (most recent call last):
01:54:42    File "tests/test_integration_workflows.py", line 355, in test_training
01:54:42      results = self.train_and_infer(idx=i)
01:54:42    File "tests/test_integration_workflows.py", line 308, in train_and_infer
01:54:42      best_metric = run_training_test(self.data_dir, device=self.device, amp=(idx == 2))
01:54:42    File "tests/test_integration_workflows.py", line 210, in run_training_test
01:54:42      trainer.run()
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 58, in run
01:54:42      super().run()
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/workflow.py", line 290, in run
01:54:42      super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 704, in run
01:54:42      return self._internal_run()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 783, in _internal_run
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 753, in _internal_run
01:54:42      time_taken = self._run_once_on_dataset()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 854, in _run_once_on_dataset
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 840, in _run_once_on_dataset
01:54:42      self.state.output = self._process_function(self, self.state.batch)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 209, in _iteration
01:54:42      with torch.cuda.amp.autocast(**engine.amp_kwargs):
01:54:42  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  
01:54:42  ----------------------------------------------------------------------
01:54:42  Ran 2 tests in 208.820s
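
The failure mode in the log above (a keyword such as `dtype` being forwarded to a constructor that does not accept it) can be reproduced and guarded against in a small stdlib-only sketch. Everything here is an assumption for illustration: `OldAutocast` is a hypothetical stand-in for an autocast class without a `dtype` parameter, and `supported_kwargs` is a hypothetical helper. It is not the fix proposed in the follow-up PR.

```python
import inspect
from typing import Any, Callable, Dict


class OldAutocast:
    """Stand-in for an autocast class whose __init__ has no `dtype` parameter (hypothetical)."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled


def supported_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that `func` declares."""
    params = inspect.signature(func).parameters
    # If the callable accepts **kwargs, let everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


amp_kwargs = {"enabled": True, "dtype": "float16"}

# Forwarding the kwargs blindly reproduces the TypeError seen in the log.
error_message = ""
try:
    OldAutocast(**amp_kwargs)
except TypeError as exc:
    error_message = str(exc)

# Filtering first lets the call succeed, at the cost of dropping `dtype`.
safe_kwargs = supported_kwargs(OldAutocast, amp_kwargs)
ctx = OldAutocast(**safe_kwargs)
```

Silently dropping an unsupported argument is only one possible design; raising a clearer error or gating on the installed library version are alternatives that avoid hiding a user's setting.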

@ericspod
Member

I've suggested a fix here: #4178 I'll run the tests there to see if it works.

Can-Zhao added a commit to Can-Zhao/MONAI that referenced this pull request May 10, 2022
Add padding to filter to ensure same size after anti-aliasing

Use replicate padding instead of zero padding to avoid artifacts for non-zero boundary

Reuse GaussianSmooth

4073 Enhance DynUNet doc-strings (Project-MONAI#4102)

* Fix doc strings error

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

* remove duplicate places

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

4105 drops pt16 support (Project-MONAI#4106)

* update sys req

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* temp test

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* update code for torch>=1.7

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* temp tests

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* fixes tests

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* autofix

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* fixes import

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* clear cache

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* update based on comments

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* remove temp cmd

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

Make `pixelshuffle` scriptable (Project-MONAI#4109)

* Update the existing functionality to comply with the `torchscript.jit.script` function.

Signed-off-by: Ramon Emiliani <ramon@afxmedical.com>

meta tensor (Project-MONAI#4077)

* meta tensor

Signed-off-by: Richard Brown <33289025+rijobro@users.noreply.github.com>

4084 Add kwargs for `Tensor.to()` in engines (Project-MONAI#4112)

* [DLMED] add kwargs for to() API

Signed-off-by: Nic Ma <nma@nvidia.com>

* [MONAI] python code formatting

Signed-off-by: monai-bot <monai.miccai2019@gmail.com>

* [DLMED] fix typo

Signed-off-by: Nic Ma <nma@nvidia.com>

* [DLMED] fix flake8

Signed-off-by: Nic Ma <nma@nvidia.com>

* [DLMED] update according to comments

Signed-off-by: Nic Ma <nma@nvidia.com>

Co-authored-by: monai-bot <monai.miccai2019@gmail.com>

fixes pytorch version tests (Project-MONAI#4127)

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

update meta tensor api (Project-MONAI#4131)

* update meta tensor api

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

* update based on comments

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

runtests.sh isort (Project-MONAI#4134)

Signed-off-by: Richard Brown <33289025+rijobro@users.noreply.github.com>

update citation (Project-MONAI#4133)

Signed-off-by: Wenqi Li <wenqil@nvidia.com>

`ToMetaTensor` and `FromMetaTensor` transforms (Project-MONAI#4115)

to and from meta

no skip if before pytorch 1.7 (Project-MONAI#4139)

* no skip if before pytorch 1.7

Signed-off-by: Richard Brown <33289025+rijobro@users.noreply.github.com>

* fix

Signed-off-by: Richard Brown <33289025+rijobro@users.noreply.github.com>

* fix

Signed-off-by: Richard Brown <33289025+rijobro@users.noreply.github.com>

[DLMED] fix file name in meta (Project-MONAI#4145)

Signed-off-by: Nic Ma <nma@nvidia.com>

4116 Add support for advanced args of AMP (Project-MONAI#4132)

* [DLMED] fix typo in bundle scripts

Signed-off-by: Nic Ma <nma@nvidia.com>

* [DLMED] add support for AMP args

Signed-off-by: Nic Ma <nma@nvidia.com>

* [MONAI] python code formatting

Signed-off-by: monai-bot <monai.miccai2019@gmail.com>

* [DLMED] fix flake8

Signed-off-by: Nic Ma <nma@nvidia.com>

Co-authored-by: monai-bot <monai.miccai2019@gmail.com>

New wsireader (Project-MONAI#4147)

`MetaTensor`: collate; decollate; dataset; dataloader; out=; indexing and iterating across batches (Project-MONAI#4137)
Successfully merging this pull request may close these issues.

Extend AMP selection choices