
Testing in dp mode uses only one of the GPUs #1213

Closed
Ir1d opened this issue Mar 23, 2020 · 13 comments · Fixed by #1260
Labels
bug Something isn't working help wanted Open to be worked on priority: 0 High priority task
Comments

@Ir1d
Contributor

Ir1d commented Mar 23, 2020

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

Run a test without training


Code sample

Modified from the conference-seed repo

trainer = Trainer(
    gpus="-1",
    distributed_backend='dp',
)
trainer.test(model)

Expected behavior

Environment

  • pl version: 0.6.0
  • PyTorch Version (e.g., 1.0): 1.2
  • OS (e.g., Linux): Ubuntu
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.6
  • CUDA/cuDNN version: 10.1
  • GPU models and configuration:
  • Any other relevant information:

Additional context

@Ir1d Ir1d added bug Something isn't working help wanted Open to be worked on labels Mar 23, 2020
@williamFalcon
Contributor

ummm yeah, that's a bug. it should run via dp. @Ir1d want to submit a PR?
@PyTorchLightning/core-contributors any one else experience this?

@Ir1d
Contributor Author

Ir1d commented Mar 23, 2020

@williamFalcon I tried wrapping the model in LightningDataParallel like the training loop does, but it tells me that the LightningDataParallel object doesn't have a test_step. How do I debug this?

@williamFalcon
Contributor

you wouldn’t wrap it yourself ever haha.
the trainer does the wrapping for you.

the trainer needs to be modified to run the test on the correct method when done this way
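The dispatch problem described here can be sketched without any framework code. In this minimal sketch (all names are hypothetical, not Lightning's actual API), `Wrapper` plays the role of `LightningDataParallel`: it holds the real model in `.module` but defines no `test_step` of its own, which is why calling `test_step` on the wrapper fails; the trainer-side fix is to unwrap before dispatching:

```python
class Model:
    """Stand-in for a LightningModule that defines test_step."""
    def test_step(self, batch):
        return {"loss": sum(batch)}

class Wrapper:
    """Stand-in for LightningDataParallel: wraps a module, has no test_step."""
    def __init__(self, module):
        self.module = module

def run_test_step(model, batch):
    # Unwrap before dispatching, mirroring what the trainer must do
    # when the model was wrapped for dp mode.
    target = model.module if isinstance(model, Wrapper) else model
    return target.test_step(batch)

print(run_test_step(Wrapper(Model()), [1, 2, 3]))  # {'loss': 6}
```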

@Ir1d
Contributor Author

Ir1d commented Mar 23, 2020

I was trying to wrap it in evaluate in pytorch_lightning/trainer/evaluation_loop.py. Do you have any idea where to wrap this function?

@Ir1d
Contributor Author

Ir1d commented Mar 23, 2020

Anyway, we've found one possible workaround here:
After defining a torch model, and before passing it into the PL model, wrap it with nn.DataParallel.
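A minimal sketch of that workaround on a toy model (the backbone here is purely illustrative and assumes only `torch` is installed). On a machine with no visible GPUs, `nn.DataParallel` falls back to calling the wrapped module directly, so the snippet also runs on CPU:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real torch model; purely illustrative.
backbone = nn.Linear(8, 2)

# The workaround: wrap the raw torch model in nn.DataParallel *before*
# storing it inside the LightningModule, rather than relying on the
# trainer to wrap it at test time.
parallel_backbone = nn.DataParallel(backbone)

x = torch.randn(4, 8)
out = parallel_backbone(x)
print(tuple(out.shape))  # (4, 2)
```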

@williamFalcon
Contributor

evaluate is private... you're not meant to call it directly.

call .test()

@williamFalcon
Contributor

lightning does the wrapping by itself...

the fact that this doesn't work is a bug.

model = MyLightningModule.load_from_checkpoint(...)
trainer = Trainer(
    gpus="-1",
    distributed_backend='dp',
)
trainer.test(model)

The bug needs to be addressed correctly.

It's weird because we have tests for this... double check that this is really not working for you.

@Borda
Member

Borda commented Mar 23, 2020

evaluate is private... you're not meant to call it directly.
call .test()

so let's rename it to start with _ so that its name makes clear it is private

@Ir1d
Contributor Author

Ir1d commented Mar 23, 2020

I was calling .test and it's not working

@Borda
Member

Borda commented Mar 27, 2020

@neggert could you have a look at this multi-GPU issue?

@Borda Borda reopened this Mar 30, 2020
@Borda Borda added the priority: 0 High priority task label Apr 8, 2020
@Borda Borda added this to the 0.7.3 milestone Apr 8, 2020
@Borda Borda modified the milestones: 0.7.4, 0.7.5 Apr 24, 2020
@Borda Borda modified the milestones: 0.7.6, 0.8.0, 0.7.7 May 12, 2020
@Borda Borda modified the milestones: 0.7.7, 0.8.0 May 26, 2020
@edenlightning
Contributor

@neggert ping :)

@Borda Borda modified the milestones: 0.8.0, 0.8.x Jun 17, 2020
@williamFalcon
Contributor

looking at this with next sprint

@williamFalcon
Contributor

williamFalcon commented Jul 10, 2020

fixed! (0.8.5)
