4116 Add support for advanced args of AMP #4132

Nic-Ma · 2022-04-14T04:05:55Z

Fixes #4116 .

Description

This PR adds support for the advanced arguments of the PyTorch AMP module.

Status

Ready

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
In-line docstrings updated.
Documentation updated, tested make html command in the docs/ folder.

merge master

Signed-off-by: Nic Ma <nma@nvidia.com>

Nic-Ma · 2022-04-15T14:24:31Z

Hi @ericspod @wyli @vfdev-5 ,

I think it may be better to change the _iteration() function of the engines to a staticmethod or classmethod:
https://github.com/Project-MONAI/MONAI/blob/dev/monai/engines/trainer.py#L164
I have 2 reasons:

1. Currently, the self and engine args in this function refer to the same object; some of the code uses self.XXX and some uses engine.XXX, so the self arg is not needed at all.
2. If users want to pass their own iteration logic to the engine, it will be a regular function without a self arg, and users may need to refer to our default implementation.

Another thing: I want to change all the typehints from engine: Engine to engine: Workflow, because our engine functions actually only work with MONAI engine workflows, and the ignite Engine type causes many mypy errors.
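To illustrate the typehint point, here is a minimal sketch. The `Engine` and `Workflow` classes below are simplified stand-ins written for this example (they are not ignite's or MONAI's real classes), and the `amp` attribute is only an assumed example of workflow-specific state.

```python
class Engine:
    """Stand-in for a generic base engine (hypothetical, not ignite's real Engine)."""

    def __init__(self) -> None:
        self.should_terminate = False


class Workflow(Engine):
    """Stand-in for a workflow subclass that adds attributes the base class lacks."""

    def __init__(self, amp: bool = False) -> None:
        super().__init__()
        self.amp = amp


def uses_amp_loose(engine: Engine) -> bool:
    # A static checker such as mypy reports an attribute error here, because
    # `Engine` declares no `amp`; the ignore comment silences it for this sketch.
    return engine.amp  # type: ignore[attr-defined]


def uses_amp(engine: Workflow) -> bool:
    # With the narrower annotation the attribute access is visible to the checker.
    return engine.amp


print(uses_amp(Workflow(amp=True)))
```

Both functions behave the same at runtime when given a `Workflow`; only the annotation on the second one matches what the body actually requires.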

What do you think?

Thanks in advance.

Signed-off-by: Nic Ma <nma@nvidia.com>

Nic-Ma · 2022-04-15T14:59:32Z

/black

Signed-off-by: monai-bot <monai.miccai2019@gmail.com>

Signed-off-by: Nic Ma <nma@nvidia.com>

Nic-Ma · 2022-04-16T01:15:45Z

/black

Nic-Ma · 2022-04-16T01:16:00Z

/build

Nic-Ma · 2022-04-18T02:17:18Z

Hi @ericspod ,

What do you think about these 2 points? If you don't have concerns, I will do it in a separate PR in case we need to revert.

Thanks in advance.

ericspod · 2022-04-19T13:09:14Z

The value of _iteration being a regular method is that it can be overridden easily in subclasses. This could be done with classmethod as well, but that is a slightly more complex mechanism that is a bit confusing to less advanced Python users; explaining the difference between staticmethod and classmethod can be difficult. I don't think staticmethod should really be used anyway, since we can define functions outside classes, unlike in pure OO languages.

As for using self or engine, I would normally want to use self to make it clear we're using the current object; even if the reader forgets that the engine argument and self are the same, they'll be directed away from using the argument. However, if anyone wants to define their own function rather than override the method, it's easier to copy-paste code that doesn't reference self, so I'd go with engine.

Changing the type hint to Workflow is probably the right change now if we've totally broken compatibility with the original Engine class.
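A minimal sketch of the two customization routes discussed above (overriding the method in a subclass versus passing a plain function). `MiniEngine`, `TripleEngine`, `custom_iteration`, `run_step` and the `scale` attribute are hypothetical names invented for this illustration; the real engines are considerably more involved.

```python
from typing import Any, Callable, Optional


class MiniEngine:
    """Hypothetical, heavily simplified engine; not ignite's or MONAI's real class."""

    def __init__(self, iteration_update: Optional[Callable[["MiniEngine", Any], Any]] = None) -> None:
        self.scale = 2
        # Use the caller's function if one is given, otherwise the bound default method.
        self._process_function = iteration_update if iteration_update is not None else self._iteration

    def _iteration(self, engine: "MiniEngine", batch: Any) -> Any:
        # `engine` is the same object as `self` when called through run_step().
        # The body only touches `engine`, so it can be copied into a plain function unchanged.
        return batch * engine.scale

    def run_step(self, batch: Any) -> Any:
        # The engine passes itself as the first explicit argument.
        return self._process_function(self, batch)


class TripleEngine(MiniEngine):
    # Route 1: override the regular method in a subclass.
    def _iteration(self, engine: "MiniEngine", batch: Any) -> Any:
        return batch * 3


def custom_iteration(engine: MiniEngine, batch: Any) -> Any:
    # Route 2: a plain function with no `self`, written like the default body.
    return batch * engine.scale + 1


print(MiniEngine().run_step(5))                  # default method
print(TripleEngine().run_step(5))                # subclass override
print(MiniEngine(custom_iteration).run_step(5))  # user-supplied function
```

Because the default body refers only to `engine`, route 2 needs no edits beyond dropping `self` from the signature, which is the copy-paste argument made above.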

Nic-Ma · 2022-04-19T14:56:30Z

Hi @ericspod ,

Thanks very much for your detailed analysis and comments.
How about making the changes below in a separate PR:

1. Still define _iteration() as a regular method, but unify all the self.XXX and engine.XXX usages to engine.XXX, because (1) as you said, users can easily copy-paste our code, and (2) someone may override the engine run logic someday and pass another engine as the parameter instead of self: https://github.com/pytorch/ignite/blob/master/ignite/engine/engine.py#L859.
2. Check all the engine: Engine functions and, if they contain MONAI-specific logic, change the typehint to Workflow.

What do you think?
This PR covers only the AMP parameters, based on user feedback, and is ready for review.

Thanks in advance.

ericspod · 2022-04-19T16:21:13Z

Sounds good to me. I'll approve here, but we should work out what the failures are about; there's no error output.

Nic-Ma · 2022-04-19T22:07:34Z

Hi @ericspod ,

Thanks for your review.
The failure is because GitHub changed its CI authentication policy this week, so our remote machine is offline and needs to sign in again (@IsaacYangSLA or @wyli may know more). Currently we mainly use the blossom CI environment, so the GitHub CI runs are just optional tests, and we can merge PRs once the blossom tests have passed.

Thanks.

Nic-Ma · 2022-04-19T22:08:28Z

/build

wyli · 2022-04-26T12:28:17Z

I think these integration test errors are caused by this PR (for PyTorch versions earlier than 1.10.x):

01:54:41  Current run is terminating due to exception: __init__() got an unexpected keyword argument 'dtype'
01:54:41  Exception: __init__() got an unexpected keyword argument 'dtype'
01:54:41  Traceback (most recent call last):
01:54:41    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 840, in _run_once_on_dataset
01:54:41      self.state.output = self._process_function(self, self.state.batch)
01:54:41    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 209, in _iteration
01:54:41      with torch.cuda.amp.autocast(**engine.amp_kwargs):
01:54:41  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  Engine run is terminating due to exception: __init__() got an unexpected keyword argument 'dtype'
01:54:42  Exception: __init__() got an unexpected keyword argument 'dtype'
01:54:42  Traceback (most recent call last):
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 753, in _internal_run
01:54:42      time_taken = self._run_once_on_dataset()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 854, in _run_once_on_dataset
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 840, in _run_once_on_dataset
01:54:42      self.state.output = self._process_function(self, self.state.batch)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 209, in _iteration
01:54:42      with torch.cuda.amp.autocast(**engine.amp_kwargs):
01:54:42  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  E
01:54:42  ======================================================================
01:54:42  ERROR: test_timing (__main__.IntegrationWorkflows)
01:54:42  ----------------------------------------------------------------------
01:54:42  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  
01:54:42  The above exception was the direct cause of the following exception:
01:54:42  
01:54:42  Traceback (most recent call last):
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/utils.py", line 575, in _wrapper
01:54:42      raise RuntimeError(res.traceback) from res
01:54:42  RuntimeError: Traceback (most recent call last):
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/utils.py", line 526, in run_process
01:54:42      output = func(*args, **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/utils.py", line 609, in _call_original_func
01:54:42      return f(*args, **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/test_integration_workflows.py", line 361, in test_timing
01:54:42      self.train_and_infer(idx=2)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/test_integration_workflows.py", line 308, in train_and_infer
01:54:42      best_metric = run_training_test(self.data_dir, device=self.device, amp=(idx == 2))
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/tests/test_integration_workflows.py", line 210, in run_training_test
01:54:42      trainer.run()
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 58, in run
01:54:42      super().run()
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/workflow.py", line 290, in run
01:54:42      super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 704, in run
01:54:42      return self._internal_run()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 783, in _internal_run
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 753, in _internal_run
01:54:42      time_taken = self._run_once_on_dataset()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 854, in _run_once_on_dataset
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 840, in _run_once_on_dataset
01:54:42      self.state.output = self._process_function(self, self.state.batch)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 209, in _iteration
01:54:42      with torch.cuda.amp.autocast(**engine.amp_kwargs):
01:54:42  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  
01:54:42  
01:54:42  ======================================================================
01:54:42  ERROR: test_training (__main__.IntegrationWorkflows)
01:54:42  ----------------------------------------------------------------------
01:54:42  Traceback (most recent call last):
01:54:42    File "tests/test_integration_workflows.py", line 355, in test_training
01:54:42      results = self.train_and_infer(idx=i)
01:54:42    File "tests/test_integration_workflows.py", line 308, in train_and_infer
01:54:42      best_metric = run_training_test(self.data_dir, device=self.device, amp=(idx == 2))
01:54:42    File "tests/test_integration_workflows.py", line 210, in run_training_test
01:54:42      trainer.run()
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 58, in run
01:54:42      super().run()
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/workflow.py", line 290, in run
01:54:42      super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 704, in run
01:54:42      return self._internal_run()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 783, in _internal_run
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 753, in _internal_run
01:54:42      time_taken = self._run_once_on_dataset()
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 854, in _run_once_on_dataset
01:54:42      self._handle_exception(e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 464, in _handle_exception
01:54:42      self._fire_event(Events.EXCEPTION_RAISED, e)
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 421, in _fire_event
01:54:42      func(*first, *(event_args + others), **kwargs)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/handlers/stats_handler.py", line 179, in exception_raised
01:54:42      raise e
01:54:42    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 840, in _run_once_on_dataset
01:54:42      self.state.output = self._process_function(self, self.state.batch)
01:54:42    File "/home/jenkins/agent/workspace/Monai-latest-image/monai/engines/trainer.py", line 209, in _iteration
01:54:42      with torch.cuda.amp.autocast(**engine.amp_kwargs):
01:54:42  TypeError: __init__() got an unexpected keyword argument 'dtype'
01:54:42  
01:54:42  ----------------------------------------------------------------------
01:54:42  Ran 2 tests in 208.820s

ericspod · 2022-04-26T13:33:14Z

I've suggested a fix here: #4178 I'll run the tests there to see if it works.
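For readers following the traceback, one possible way to guard against an unexpected keyword argument is to filter the kwargs against the callee's signature before unpacking them. This is only a sketch under stated assumptions and is not necessarily what #4178 does: `supported_kwargs` is a hypothetical helper, and `OldAutocast` is a stand-in class that imitates a context manager accepting only `enabled`, so the example runs without PyTorch.

```python
import inspect
from typing import Any, Callable, Dict


def supported_kwargs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that `func` declares (hypothetical helper)."""
    params = inspect.signature(func).parameters
    # If the callee accepts **kwargs, nothing needs to be filtered out.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class OldAutocast:
    """Stand-in for a context manager whose __init__ only accepts `enabled`."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled

    def __enter__(self) -> "OldAutocast":
        return self

    def __exit__(self, *exc: Any) -> bool:
        return False


amp_kwargs = {"enabled": True, "dtype": "float16"}
safe_kwargs = supported_kwargs(OldAutocast, amp_kwargs)

# Unpacking amp_kwargs directly would raise TypeError because of `dtype`;
# the filtered dict only contains arguments the stand-in understands.
with OldAutocast(**safe_kwargs) as ctx:
    print(safe_kwargs)
```

Silently dropping arguments can hide a misconfiguration, so an explicit version check or a warning for each dropped key may be preferable in practice.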

Nic-Ma and others added 8 commits February 1, 2021 19:15

Merge pull request #19 from Project-MONAI/master

42a45e0

merge master

Merge pull request #32 from Project-MONAI/master

cd16a13

merge master

Merge pull request #180 from Project-MONAI/dev

6f87afd

merge master

Merge pull request #214 from Project-MONAI/dev

f398298

merge master

Merge pull request #397 from Project-MONAI/dev

ec463d6

merge master

Merge pull request #402 from Project-MONAI/dev

ddea3c0

merge master

[DLMED] fix typo in bundle scripts

756a9a4

Signed-off-by: Nic Ma <nma@nvidia.com>

Merge branch 'dev' into 4116-extend-amp-args

3eee718

[DLMED] add support for AMP args

21edab5

Signed-off-by: Nic Ma <nma@nvidia.com>

monai-bot and others added 2 commits April 15, 2022 15:03

[MONAI] python code formatting

db4d91a

Signed-off-by: monai-bot <monai.miccai2019@gmail.com>

[DLMED] fix flake8

6c0cf78

Signed-off-by: Nic Ma <nma@nvidia.com>

Nic-Ma marked this pull request as ready for review April 16, 2022 01:15

Nic-Ma changed the title ~~[WIP] 4116 Add support for advanced args of AMP~~ 4116 Add support for advanced args of AMP Apr 16, 2022

Nic-Ma requested review from ericspod, rijobro and wyli April 16, 2022 01:17

ericspod approved these changes Apr 19, 2022

Merge branch 'dev' into 4116-extend-amp-args

e9e7567

Nic-Ma enabled auto-merge (squash) April 19, 2022 22:12

Nic-Ma mentioned this pull request Apr 19, 2022

Update engines according to discussion #4148

Closed

Nic-Ma merged commit 4a8f815 into Project-MONAI:dev Apr 19, 2022

Nic-Ma mentioned this pull request Apr 20, 2022

4148 Enhance engine iteration logic and the typehints #4150

Merged

7 tasks
