
Bugfix/fix gan example #2019

Merged: 3 commits merged into Lightning-AI:master from the bugfix/fix-gan-example branch on May 31, 2020
Conversation

@lobantseff (Contributor) commented May 30, 2020

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)

What does this PR do?

Fixes #1223

The GAN example in the repo was not working. Fixed the typos so it now works in dp and ddp mode.
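For context, here is a simplified sketch of the alternating-optimizer structure the example follows. It is illustrative only: the actual generative_adversarial_net.py differs in the network architectures, logging, and other details, and the exact Lightning API has changed across versions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl


class GAN(pl.LightningModule):
    """Simplified GAN skeleton with two optimizers (not the repo example verbatim)."""

    def __init__(self, latent_dim: int = 100, img_dim: int = 28 * 28, lr: float = 2e-4):
        super().__init__()
        self.latent_dim = latent_dim
        self.lr = lr
        self.generator = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh()
        )
        self.discriminator = nn.Sequential(
            nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid()
        )

    def forward(self, z):
        return self.generator(z)

    def training_step(self, batch, batch_idx, optimizer_idx):
        imgs, _ = batch
        imgs = imgs.view(imgs.size(0), -1)
        # type_as() keeps the sampled noise on the same device as the data,
        # which is what lets the step run under dp/ddp
        z = torch.randn(imgs.size(0), self.latent_dim).type_as(imgs)
        valid = torch.ones(imgs.size(0), 1).type_as(imgs)
        fake_label = torch.zeros(imgs.size(0), 1).type_as(imgs)

        if optimizer_idx == 0:  # generator step
            g_loss = F.binary_cross_entropy(self.discriminator(self(z)), valid)
            return {"loss": g_loss}

        # discriminator step: note the second self(z) call, which is what the
        # caching discussion below is about
        real_loss = F.binary_cross_entropy(self.discriminator(imgs), valid)
        fake_loss = F.binary_cross_entropy(self.discriminator(self(z).detach()), fake_label)
        return {"loss": (real_loss + fake_loss) / 2}

    def configure_optimizers(self):
        opt_g = torch.optim.Adam(self.generator.parameters(), lr=self.lr)
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=self.lr)
        return [opt_g, opt_d], []
```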

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

@mergify bot requested a review from a team May 30, 2020 15:45
@Borda added the "bug: Something isn't working" label May 30, 2020
@Borda added this to the 0.8.0 milestone May 30, 2020
@ternaus commented May 30, 2020

Is there a way to implement caching of the generated images?

Without it, the code runs twice as slowly, since the images have to be generated in both the generator and the discriminator phases.

@williamFalcon (Contributor)

@armavox we moved away from hparams.
I made the changes; can you verify?

@ternaus commented May 31, 2020

@williamFalcon is there a way to avoid calling self(z) in the discriminator phase and instead reuse the images generated in the generator phase?

@lobantseff (Contributor, Author)

@williamFalcon ok. Good.

@lobantseff (Contributor, Author) commented May 31, 2020

@ternaus you’re right. However, the problem is that under data-parallel training the model is replicated on every forward() call, and training_step() is then executed on each replica. So the only apparent way to store buffered variables is to save them on the original module before replication, e.g. initialize self.generated_images = None in __init__ and then write the value produced by one of the replicas into it. But each replica generates its own generated_images, so sharing a common buffer seems incorrect.

PyTorch developers warn about that: https://pytorch.org/docs/stable/nn.html#dataparallel-layers-multi-gpu-distributed
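To make that concrete, here is a hypothetical sketch of the buffering idea, reusing the GAN skeleton from the PR description above (the class name and details are illustrative, not code from the example):

```python
class CachedGAN(GAN):
    """Hypothetical caching attempt: works per-process under ddp, but NOT under dp."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.generated_imgs = None  # buffer created on the base module

    def training_step(self, batch, batch_idx, optimizer_idx):
        imgs, _ = batch
        imgs = imgs.view(imgs.size(0), -1)
        z = torch.randn(imgs.size(0), self.latent_dim).type_as(imgs)
        valid = torch.ones(imgs.size(0), 1).type_as(imgs)
        fake_label = torch.zeros(imgs.size(0), 1).type_as(imgs)

        if optimizer_idx == 0:  # generator step: fill the cache
            # Under dp this assignment lands on a throwaway replica, not on the
            # base module, so the cached tensor is lost after the step.
            self.generated_imgs = self(z)
            g_loss = F.binary_cross_entropy(self.discriminator(self.generated_imgs), valid)
            return {"loss": g_loss}

        # discriminator step: reuse the cache instead of calling self(z) again.
        # Under dp self.generated_imgs is still None here, because the replica
        # that set it was discarded; under ddp each process keeps its own module,
        # so the cached tensor is available.
        fake = self.generated_imgs.detach()
        real_loss = F.binary_cross_entropy(self.discriminator(imgs), valid)
        fake_loss = F.binary_cross_entropy(self.discriminator(fake), fake_label)
        return {"loss": (real_loss + fake_loss) / 2}
```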

Maybe someone can propose a good way to overcome this and give each replica its own buffer.

Best regards, Artem.

@williamFalcon (Contributor)

@ternaus yes, the easiest option is to use ddp and NOT dp (dp is not recommended anyhow).

However, we're working on a fix to maintain state in dp. @ananyahjha93
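As an illustration of that suggestion: under ddp every GPU runs its own process with its own module instance, so an attribute cache like the hypothetical CachedGAN sketched above survives between the generator and discriminator steps. Roughly, using the 0.8-era Trainer flags (treat the exact arguments as an assumption):

```python
import pytorch_lightning as pl

# CachedGAN is the hypothetical class from the sketch above; a train
# dataloader is assumed to be defined on the module or passed to fit().
model = CachedGAN()
trainer = pl.Trainer(gpus=2, distributed_backend="ddp", max_epochs=5)
trainer.fit(model)
```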

@williamFalcon merged commit 55fdfe3 into Lightning-AI:master May 31, 2020
@lobantseff deleted the bugfix/fix-gan-example branch May 31, 2020 12:55
@rohitgr7 mentioned this pull request May 31, 2020
@ananyahjha93 (Contributor)

@williamFalcon thanks for pointing me to this updated GAN example.

@lobantseff (Contributor, Author)

@williamFalcon @ananyahjha93 may I see the branch where you are working on that problem?

@ananyahjha93 (Contributor)

@armavox This is a [wip] commit, but the current solution looks quite different from it, so it does not represent what we are working on right now.

#1895

justusschock pushed a commit that referenced this pull request Jun 29, 2020
* 🐛 fixed fake example type assigning and hparams arg

* fixed GAN example to work with dp, ddp, ddp_cpu

* Update generative_adversarial_net.py

Co-authored-by: William Falcon <waf2107@columbia.edu>
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

gan.py multi-gpu running problems