CUDA_VISIBLE_DEVICES environ setting not work in ver. 0.5.0 #2161

Closed
ChenglongWang opened this issue May 8, 2021 · 5 comments · Fixed by #2174
Labels
bug Something isn't working

Comments

@ChenglongWang

Describe the bug
I'm using os.environ["CUDA_VISIBLE_DEVICES"] = '0' to select the available GPU instead of specifying a device such as cuda:0.
Everything worked fine until I updated to ver. 0.5.0. Maybe some GPU-related operations were added to the import phase in the latest version.

To Reproduce
Steps to reproduce the behavior:

  1. Take one of the tutorials as an example.
  2. Default behavior: All GPUs are detected and only one GPU is actually used.
  3. Add os.environ["CUDA_VISIBLE_DEVICES"] = '0' after def main(tempdir): on line 31.
  • ver. 0.4.x : Only GPU-0 is detected and used.
  • ver. 0.5.0 : All GPUs are detected.
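
A small helper along these lines could make "detected" concrete when comparing the two versions. This is only a sketch: `visible_gpu_report` is a hypothetical name, not part of MONAI or PyTorch, and it assumes that `torch.cuda.device_count()` is an acceptable measure of which GPUs the process can see.

```python
import os


def visible_gpu_report():
    """Return (CUDA_VISIBLE_DEVICES value, GPU count seen by PyTorch).

    The count is None when torch is not installed, so the helper can
    still be used to inspect the environment variable on its own.
    """
    env_value = os.environ.get("CUDA_VISIBLE_DEVICES")
    try:
        import torch  # imported lazily, only when the report is requested
    except ImportError:
        return env_value, None
    return env_value, torch.cuda.device_count()


if __name__ == "__main__":
    env_value, count = visible_gpu_report()
    print(f"CUDA_VISIBLE_DEVICES={env_value!r}, visible GPUs: {count}")
```

Calling it right after the os.environ assignment in step 3, once under 0.4.x and once under 0.5.0, should show whether the setting took effect.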

Environment
MONAI version: 0.5.0
Numpy version: 1.19.4
Pytorch version: 1.6.0
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 2707407

Optional dependencies:
Pytorch Ignite version: 0.4.4
Nibabel version: 3.2.1
scikit-image version: 0.17.2
Pillow version: 8.0.1
Tensorboard version: 2.5.0a20201221
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.7.0
ITK version: 5.1.2
tqdm version: 4.54.1
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: 5.7.3

@rijobro
Contributor

rijobro commented May 10, 2021

Hi @ChenglongWang what metric are you using to determine which GPU is being used? Are you simply using nvidia-smi and observing that both GPUs are being used?

@rijobro
Contributor

rijobro commented May 10, 2021

Also I think it's unlikely that MONAI is the cause of your problem, since MONAI doesn't (and I don't think ever has) use the variable CUDA_VISIBLE_DEVICES. It can be set with monai.config.deviceconfig.set_visible_devices(0), but this will just call os.environ["CUDA_VISIBLE_DEVICES"] = '0'. The rest gets handled by pytorch.

If you think I've missed something, would it be possible to create a minimum example that highlights your point? Preferably using all the same versions for the dependencies and simply switching between 0.4.x and 0.5.x? Thanks.

@rijobro
Contributor

rijobro commented May 10, 2021

See this slightly related issue: #2167.

I think if you add to the top of your script:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = x

and then leave the rest of your script as it is, that should work. If you set CUDA_VISIBLE_DEVICES after torch has been imported (and/or monai, since monai imports torch), your changes won't be used.

@ChenglongWang
Author

Hi @rijobro. Thank you for your kind help. Putting os.environ["CUDA_VISIBLE_DEVICES"] = x at the top will definitely fix this problem. However, in my case the GPU is specified at run time: the GPU id is an argument of the program, so it cannot be hard-coded at the top of the script.
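
One way to reconcile a run-time GPU id with the "set it before the GPU-related imports" advice is to parse just that one argument first and defer the heavy imports. The sketch below is an assumption-laden illustration, not the tutorial's code: the `--gpu` flag and the `select_gpu` and `main` names are hypothetical.

```python
import argparse
import os


def select_gpu(argv=None):
    """Parse only --gpu and export it before any GPU-related import.

    Returns the chosen id (or None) and the arguments that were not
    consumed, so the real argument parser can handle them later.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--gpu", default=None, help="value for CUDA_VISIBLE_DEVICES")
    args, remaining = parser.parse_known_args(argv)
    if args.gpu is not None:
        os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
    return args.gpu, remaining


def main(argv=None):
    gpu, remaining = select_gpu(argv)
    # Deferred on purpose: these imports run only after the variable is set.
    import torch
    import monai  # noqa: F401

    print(f"gpu arg: {gpu}, visible GPUs: {torch.cuda.device_count()}")
```

With this layout the script could be started as `python train.py --gpu 1` (script name hypothetical) without hard-coding the id at the top of the file.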

To reproduce, you can simply test with one of the tutorials (e.g. https://github.com/Project-MONAI/tutorials/blob/master/2d_segmentation/torch/unet_training_array.py).

> See this slightly related issue: #2167.
>
> I think if you add to the top of your script:
>
> import os
> os.environ["CUDA_VISIBLE_DEVICES"] = x
>
> and then leave the rest of your script as it is, that should work. If you set CUDA_VISIBLE_DEVICES after torch has been imported (and/or monai, since monai imports torch), your changes won't be used.

BTW, putting os.environ["CUDA_VISIBLE_DEVICES"] = x after import torch does not cause the problem. Only GPU-related imports make the CUDA environment setting ineffective. For example, you can simply move the monai-related imports after os.environ["CUDA_VISIBLE_DEVICES"] = x and leave the torch-related imports at the top, like:

import os

import torch
from PIL import Image
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

def main(tempdir):
    # Set the variable before any monai import runs.
    os.environ["CUDA_VISIBLE_DEVICES"] = '0'

    import monai
    from monai.data import ArrayDataset, create_test_image_2d
    from monai.inferers import sliding_window_inference
    from monai.metrics import DiceMetric
    from monai.transforms import (
        Activations,
        AddChannel,
        AsDiscrete,
        Compose,
        LoadImage,
        RandRotate90,
        RandSpatialCrop,
        ScaleIntensity,
        ToTensor,
    )

This also fixes the problem in v0.5.0, so I think the problem is caused by MONAI.
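
To narrow down which import is responsible, one could check PyTorch's CUDA state before and after importing a module. This is a rough diagnostic sketch under stated assumptions: `import_initializes_cuda` is a hypothetical helper, and it relies on `torch.cuda.is_initialized()`, which only reflects PyTorch's own lazy initialization. A library that queries the driver some other way might fix the visible device list without flipping that flag, so a False result is not conclusive.

```python
import importlib
import importlib.util


def import_initializes_cuda(module_name):
    """Report whether importing a top-level module initializes CUDA in PyTorch.

    Returns True or False, or None when torch or the module is not
    installed. Run it in a fresh interpreter: a module that is already
    imported will not be executed again.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    if importlib.util.find_spec(module_name) is None:
        return None
    import torch

    before = torch.cuda.is_initialized()
    importlib.import_module(module_name)
    after = torch.cuda.is_initialized()
    return (not before) and after


if __name__ == "__main__":
    print("import monai initializes CUDA:", import_initializes_cuda("monai"))
```

Comparing the result between 0.4.x and 0.5.0 in otherwise identical environments could support or rule out the import-time explanation.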

@rijobro
Contributor

rijobro commented May 11, 2021

I think I found the bug and fixed it; see #2174.

@Nic-Ma Nic-Ma added the bug Something isn't working label May 11, 2021