data transfer model hook (+ refactor) #1756

Merged
merged 18 commits into from
Jun 3, 2020

Conversation

awaelchli
Contributor

@awaelchli awaelchli commented May 7, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

This PR implements part 1 of feature #1245 by introducing a hook.

  1. Refactored the data transfer method so that all device information is contained in a torch.device object (necessary for the model hook).
  2. Removed the .cuda() call, which is obsolete because .to() works for every tensor.
  3. Added a model hook that is called for all device transfers and, by default, handles the supported data types. Users can override this hook to implement transfers of custom batch types.
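
The idea behind item 3 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the hook name `transfer_batch_to_device` is only one of the candidates from the open question below, `move_to_device`, `BaseModel`, `MyModel` and `CustomBatch` are hypothetical names, and `StandInTensor` is a stand-in for `torch.Tensor` so that the sketch runs without PyTorch.

```python
from dataclasses import dataclass


@dataclass
class StandInTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""
    values: list
    device: str = "cpu"

    def to(self, device):
        # Mirrors Tensor.to(device): returns a copy placed on the target device.
        return StandInTensor(list(self.values), device)


def move_to_device(batch, device):
    """Hypothetical default transfer: recurse through built-in collections."""
    if hasattr(batch, "to"):
        return batch.to(device)
    if isinstance(batch, dict):
        return {k: move_to_device(v, device) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(x, device) for x in batch)
    return batch  # unknown types pass through unchanged


class BaseModel:
    def transfer_batch_to_device(self, batch, device):
        # Default hook (assumed name): handles the supported types only.
        return move_to_device(batch, device)


@dataclass
class CustomBatch:
    """A user-defined batch type the default transfer does not understand."""
    samples: StandInTensor
    targets: StandInTensor


class MyModel(BaseModel):
    def transfer_batch_to_device(self, batch, device):
        # Override: move the custom batch field by field, else fall back.
        if isinstance(batch, CustomBatch):
            return CustomBatch(batch.samples.to(device), batch.targets.to(device))
        return super().transfer_batch_to_device(batch, device)
```

With this layout the trainer would only ever call the hook, so a user who needs custom behavior overrides a single method and can still delegate to the default for ordinary tensors, lists and dicts.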

Open Questions:

  • What do we call the hook? transfer_data_to_device? transfer_batch_to_device? transfer_batch?

Other PRs that are blocked by this: #1729, #1526

Colab notebook testing that TPU works with this branch:
https://colab.research.google.com/drive/1wy6sbl8Bh6S3QaHzBsJNbhQ6UX1IC864?usp=sharing

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team May 7, 2020 16:46
@codecov

codecov bot commented May 9, 2020

Codecov Report

Merging #1756 into master will increase coverage by 0%.
The diff coverage is 76%.

@@          Coverage Diff           @@
##           master   #1756   +/-   ##
======================================
  Coverage      86%     86%           
======================================
  Files          74      75    +1     
  Lines        4713    4705    -8     
======================================
- Hits         4070    4064    -6     
+ Misses        643     641    -2     

@awaelchli awaelchli changed the title refactor data transfer + hook [blocked by #1729, #1526] refactor data transfer + hook May 9, 2020
@awaelchli awaelchli added the feature Is an improvement or enhancement label May 9, 2020
@awaelchli awaelchli marked this pull request as ready for review May 9, 2020 02:58
@mergify
Contributor

mergify bot commented May 9, 2020

This pull request is now in conflict... :(

@awaelchli awaelchli force-pushed the feature/data-transfer-hook branch from 06c1ac2 to 7aa093d Compare May 14, 2020 21:47
@mergify
Contributor

mergify bot commented May 14, 2020

This pull request is now in conflict... :(

@awaelchli awaelchli force-pushed the feature/data-transfer-hook branch from 7aa093d to cbdefc8 Compare May 14, 2020 22:40
@mergify mergify bot requested a review from a team May 15, 2020 05:58
@awaelchli awaelchli changed the title [blocked by #1729, #1526] refactor data transfer + hook [blocked by #1729, #1526] refactor data transfer + hook [wip] May 15, 2020
@awaelchli awaelchli changed the title [blocked by #1729, #1526] refactor data transfer + hook [wip] [blocked by #1729, #1526] refactor data transfer + hook May 15, 2020
@mergify
Contributor

mergify bot commented May 17, 2020

This pull request is now in conflict... :(

@awaelchli awaelchli changed the title [blocked by #1729, #1526] refactor data transfer + hook [blocked by #1526] refactor data transfer + hook May 18, 2020
@awaelchli awaelchli force-pushed the feature/data-transfer-hook branch from 18dff98 to c7e4493 Compare May 18, 2020 03:20
@awaelchli awaelchli changed the title [blocked by #1526] refactor data transfer + hook refactor data transfer + hook May 18, 2020
@awaelchli awaelchli changed the title refactor data transfer + hook data transfer model hook (+ refactor) May 18, 2020
@mergify mergify bot requested a review from a team May 19, 2020 06:04
@awaelchli awaelchli force-pushed the feature/data-transfer-hook branch from c7e4493 to c884f68 Compare May 19, 2020 22:41
@awaelchli awaelchli requested a review from justusschock May 19, 2020 23:47
@awaelchli
Contributor Author

awaelchli commented May 19, 2020

@justusschock @Borda I factored out the transfer function into a utility. Is next to apply_to_collection a good place for it?
The motivation was to make it accessible to users when they implement their custom batch transfer in the model hook.
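
A rough sketch of how such a transfer utility could be built on a generic collection helper is given below. The names `apply_to_collection_sketch` and `move_data_to_device_sketch` are hypothetical and the helper is a simplified stand-in, not the library's implementation; `StandInTensor` again replaces `torch.Tensor` so the example runs without PyTorch.

```python
class StandInTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, values, device="cpu"):
        self.values = values
        self.device = device

    def to(self, device):
        # Mirrors Tensor.to(device): returns a copy on the target device.
        return StandInTensor(list(self.values), device)


def apply_to_collection_sketch(data, dtype, function, *args, **kwargs):
    """Apply `function` to every element of type `dtype` in a nested collection."""
    if isinstance(data, dtype):
        return function(data, *args, **kwargs)
    if isinstance(data, dict):
        return {
            key: apply_to_collection_sketch(value, dtype, function, *args, **kwargs)
            for key, value in data.items()
        }
    if isinstance(data, (list, tuple)):
        return type(data)(
            apply_to_collection_sketch(item, dtype, function, *args, **kwargs)
            for item in data
        )
    return data  # anything else is returned unchanged


def move_data_to_device_sketch(batch, device):
    """Hypothetical transfer utility: move every tensor in `batch` to `device`."""
    return apply_to_collection_sketch(batch, StandInTensor, lambda t: t.to(device))
```

A user-defined hook could then call the utility on the parts of a custom batch that are ordinary collections, for example `move_data_to_device_sketch(batch.inputs, device)`, where `batch.inputs` is an assumed attribute.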

Member

@justusschock justusschock left a comment


LGTM, just some missing docstrings :)

pytorch_lightning/utilities/apply_func.py (outdated, resolved)
pytorch_lightning/trainer/distrib_parts.py (outdated, resolved)
@mergify mergify bot requested a review from a team May 20, 2020 06:31
Member

@Borda Borda left a comment


LGTM 🚀

@mergify mergify bot requested a review from a team May 20, 2020 12:55
@awaelchli awaelchli requested a review from williamFalcon May 20, 2020 17:33
@awaelchli awaelchli force-pushed the feature/data-transfer-hook branch from 97c9fdb to ea9fab3 Compare June 2, 2020 23:10
@awaelchli
Contributor Author

@williamFalcon True, that is confusing.
I have renamed it now.

@williamFalcon williamFalcon merged commit 8211256 into master Jun 3, 2020
@Borda Borda deleted the feature/data-transfer-hook branch June 3, 2020 05:59
@ZhaofengWu
Contributor

To confirm, this does not work for DDP, because in DDP we use the default scatter to move the tensors, correct? Is there a way to similarly customize this behavior for DDP?
https://github.com/PyTorchLightning/pytorch-lightning/blob/16a7326e5259a3cdd20a508c34a0f84806d88f8e/pytorch_lightning/trainer/training_loop.py#L736-L737

@awaelchli
Contributor Author

@ZhaofengWu I did not know this. I thought the Trainer always called the same function to move data to the device. I searched in the PyTorch docs for DP and DDP but it seems to me it is not possible to override the scattering of custom batch objects. I guess the best we could do is add a note to our docs?
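
The limitation discussed here can be illustrated with a toy model of a scatter step. This is a simplified sketch under the assumption that the scatter only recognizes tensors and built-in containers; it is not PyTorch's actual scatter code (which also splits tensors into chunks per device), and `scatter_like`, `StandInTensor` and `CustomBatch` are hypothetical names.

```python
class StandInTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, values, device="cpu"):
        self.values = values
        self.device = device

    def to(self, device):
        return StandInTensor(list(self.values), device)


class CustomBatch:
    """A user-defined batch type that the scatter step knows nothing about."""

    def __init__(self, samples):
        self.samples = samples


def scatter_like(obj, devices):
    """Return one copy of `obj` per device, moving only the types it recognizes."""
    if isinstance(obj, StandInTensor):
        return [obj.to(device) for device in devices]
    if isinstance(obj, (list, tuple)) and len(obj) > 0:
        per_item = [scatter_like(item, devices) for item in obj]
        return [type(obj)(group) for group in zip(*per_item)]
    if isinstance(obj, dict) and len(obj) > 0:
        per_item = [scatter_like(value, devices) for value in obj.values()]
        return [dict(zip(obj.keys(), group)) for group in zip(*per_item)]
    # Unrecognized objects are handed to every device as-is, never moved.
    return [obj for _ in devices]
```

Because the type dispatch happens inside the scatter step and no model hook is consulted, a tensor wrapped in `CustomBatch` stays on its original device in this sketch, which is the behavior the question above describes.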

@ZhaofengWu
Contributor

I don't know enough about this to know if there's any workaround. It'd be great if this override consistently works in all scenarios, but I guess if it doesn't work, it doesn't work.

@ZhaofengWu
Contributor

But yes, in any case, at least there should be a note.

@ZhaofengWu
Contributor

@awaelchli Do you want me to open a separate issue?

@awaelchli
Contributor Author

Yes, good idea. Could you do that, please?

justusschock pushed a commit that referenced this pull request Jun 29, 2020
* refactor and added hook
  variant a
  variant b
  add test
  revert rename
  add changelog
  docs

* resolve merge duplication

* overridden typo

* fix test

* tpu id

* raise if TPU not available

* re-use apply_to_collection function for parsing collections

* comment

* make utility function available to user

* documentation

* move changelog entry to top

* fix tpu transfer call

* fix call

* remove hardcoded string

* improve test

* call model hook by default

* Apply suggestions from code review

* rename utility function

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>