[Fix] Convert SyncBN to BN when training on DP #772
Codecov Report
@@ Coverage Diff @@
## master #772 +/- ##
==========================================
+ Coverage 89.02% 89.05% +0.03%
==========================================
Files 111 111
Lines 6043 6051 +8
Branches 969 969
==========================================
+ Hits 5380 5389 +9
Misses 467 467
+ Partials 196 195 -1
May move to |
Please fix the conflict.
We can import from mmcv after it is merged.
Can import |
Please fix the conflict.
LGTM
Please upgrade the mmcv requirement.
https://github.com/open-mmlab/mmsegmentation/blob/master/docs/get_started.md#installation
Update the mmcv dependency of the master branch.
LGTM
* [Fix] Convert SyncBN to BN when training on DP.
* Modify SyncBN2BN.
* Add SyncBN2BN unit test.
* Resolve some comments.
* use mmcv official revert_sync_batchnorm
* Remove local syncbn2bn unit tests.
* Update mmcv version.
* Fix bugs of gather model tools.
* Modify warnings.
* Modify docker mmcv version.
* Update mmcv version table.
* modify stat.py merge_docs * unify merge_docs style * fix bugs * fix bugs
Incompatibility between DP and SyncBN
Run training without setting `--launcher`, and the Python environment reports an error about the process group.
This error is caused by `SyncBN`. When `SyncBN.training` is True, `SyncBN` needs to initialize `process_group`. However, `process_group` is only valid under `DDP`.
`SyncBN` is valid in either of these situations:
- `process_group` is not None, or `torch.distributed.group.WORLD` is not None;
- `SyncBN.training` is False, i.e. after `SyncBN.eval()`.
Based on the above, we convert SyncBN to BN when training on DP.
This PR is blocked by MMCV #1253.
MMCV PR #1253 is expected to be released in MMCV 1.3.13, so mmseg 0.18 will require mmcv 1.3.13 or later.
The CI will pass once MMCV 1.3.13 is released.
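The version requirement above amounts to a minimum-version check at import time. A rough standard-library sketch of such a check is shown here; the names `parse_version`, `mmcv_is_compatible` and `MMCV_MINIMUM` are hypothetical and this is not the check mmseg actually ships.

```python
import re

# Assumed minimum, taken from the version mentioned in this PR.
MMCV_MINIMUM = "1.3.13"


def parse_version(version: str) -> tuple:
    """Turn '1.3.13' into (1, 3, 13).

    Only the leading digits of each dot-separated piece are used, so a
    pre-release suffix such as 'rc1' is ignored in this sketch.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def mmcv_is_compatible(installed: str, minimum: str = MMCV_MINIMUM) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.3.9" would sort after "1.3.13".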