
Fix configs to properly use pytorch-lightning==1.6 with GPU #234

Merged: 11 commits into development from fix/sa/configs on Apr 20, 2022

Conversation

@samet-akcay (Contributor) commented Apr 14, 2022

Description

Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist

  • My code follows the pre-commit style and check guidelines of this project.
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes

@samet-akcay changed the title from "Fix configs" to "Fix configs to properly use pytorch-lightning==1.6" on Apr 14, 2022
@samet-akcay changed the title from "Fix configs to properly use pytorch-lightning==1.6" to "Fix configs to properly use pytorch-lightning==1.6 with GPU" on Apr 14, 2022
@ashwinvaidya17 (Collaborator) left a comment:


Thanks! One minor comment, but the rest looks good.

fast_dev_run: false
-gpus: 1
+gpus: null # Set automatically
Collaborator:


Does it mean that pytorch-lightning will distribute the job across all available GPUs? In that case, we might have to look at the performance.

@samet-akcay (Contributor, Author) commented Apr 19, 2022:


I'm not sure what the best approach is here: setting it to 1, or assigning it automatically?
Any thoughts @ashwinvaidya17, @djdameln?

Collaborator:


I would prefer using only a single GPU by default. I feel users should use distributed training only if they know what they are doing. If we distribute the training, then we will have to look at the learning rate, and experiments might not be reproducible across different numbers of GPUs.

Contributor:


What if no GPUs are available? Will it break or just switch to CPU training?

Contributor (Author):


> What if no GPUs are available? Will it break or just switch to CPU training?

It switches to CPU.

> I would prefer using only a single GPU by default. I feel users should use distributed training only if they know what they are doing. If we distribute the training, then we will have to look at the learning rate, and experiments might not be reproducible across different numbers of GPUs.

As far as I understand, this doesn't enable distributed training, because strategy is still null. It automatically uses a single GPU.

Collaborator:


In that case I am fine with how it is now.
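
For context, here is a minimal sketch of the single-device behavior discussed above, against pytorch-lightning 1.6's Trainer API. The resolve_gpus helper is hypothetical, not anomalib's actual code:

import torch
from pytorch_lightning import Trainer

def resolve_gpus(configured):
    """Resolve a `gpus` config value: keep an explicit setting; otherwise
    use 1 GPU when one is available, and fall back to CPU (None) when not."""
    if configured is not None:
        return configured
    return 1 if torch.cuda.is_available() else None

# strategy=None (the default) means no distributed strategy, so even when
# gpus resolves to 1 this remains single-device training, not DDP.
trainer = Trainer(gpus=resolve_gpus(None), strategy=None)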

@aj2563 commented Apr 19, 2022:

Since this PR is not merged yet, I'm adding a comment here rather than creating an issue. I checked out the fix/sa/configs branch on a fresh install of anomalib and tried to run the following command:

python tools/train.py --model cflow

And got the following error:

Traceback (most recent call last):
  File "tools/train.py", line 28, in <module>
    from anomalib.models import get_model
  File "/data/home/epi/AnomalibTest/lib/python3.8/site-packages/anomalib/models/__init__.py", line 24, in <module>
    from anomalib.models.components import AnomalyModule
  File "/data/home/epi/AnomalibTest/lib/python3.8/site-packages/anomalib/models/components/__init__.py", line 17, in <module>
    from .base import AnomalyModule, DynamicBufferModule
  File "/data/home/epi/AnomalibTest/lib/python3.8/site-packages/anomalib/models/components/base/__init__.py", line 17, in <module>
    from .anomaly_module import AnomalyModule
  File "/data/home/epi/AnomalibTest/lib/python3.8/site-packages/anomalib/models/components/base/anomaly_module.py", line 24, in <module>
    from torchmetrics import F1, MetricCollection
ImportError: cannot import name 'F1' from 'torchmetrics' (/data/home/epi/AnomalibTest/lib/python3.8/site-packages/torchmetrics/__init__.py)

The error is coming from <path>/anomalib/anomalib/models/components/base/anomaly_module.py.
However, if you use from torchmetrics.classification.f_beta import F1Score, it does not throw the error above (with F1 replaced by F1Score elsewhere in the file as well).
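
For illustration, a minimal sketch of that import change, assuming torchmetrics 0.8 (where the old F1 alias is gone); the num_classes value is only an example:

# Old import, which breaks on torchmetrics 0.8:
# from torchmetrics import F1, MetricCollection

# Replacement that works on torchmetrics 0.8:
from torchmetrics import MetricCollection
from torchmetrics.classification.f_beta import F1Score

metrics = MetricCollection([F1Score(num_classes=1)])  # illustrative usage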

Package list (installed using the pip install -e . command):

pytorch-lightning       1.6.1 
torch                   1.11.0   
torchmetrics            0.8.0    
torchtext               0.12.0   
torchvision             0.12.0 

Out of the box it is failing for me; do you see the same issue on your side?

@ashwinvaidya17 (Collaborator) commented:

@aj2563 This is a known issue and is addressed in a different open PR. For now, you can use pip install torchmetrics==0.6.0 to make it work. Here is a Colab link which uses this branch for training: https://colab.research.google.com/drive/18NciqtQwlrUIiuwWn8ld_Tp_5K6kgRPH?usp=sharing The first 7 cells are relevant in your case.
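
As a quick, illustrative check that the pinned workaround is active (version numbers taken from this thread):

import torchmetrics

# The 0.6 series still exports the old F1 alias, so both imports succeed.
assert torchmetrics.__version__.startswith("0.6"), torchmetrics.__version__
from torchmetrics import F1, MetricCollection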

@samet-akcay merged commit 01370ef into development on Apr 20, 2022
@samet-akcay deleted the fix/sa/configs branch on April 20, 2022 at 21:20
Successfully merging this pull request may close these issues:

GPU available but not used