MonaiAlgo's convert_global_weights broken for multi-gpu clients #6303
Comments
Maybe this condition should raise an error rather than just a warning. Then the test would have failed.
Hi @holgerroth, I think this issue is caused by wrong test code instead of Thanks.
Sure, changed to Thanks.
Part of #5821

Fixes #6303

### Description

This PR simplifies the MONAI FL `MonaiAlgo` module to leverage `BundleWorkflow`. The main point is to decouple the bundle read/write logic from the FL module and to use predefined required properties.

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

Signed-off-by: Nic Ma <nma@nvidia.com>
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
Co-authored-by: Holger Roth <hroth@nvidia.com>
Co-authored-by: monai-bot <monai.miccai2019@gmail.com>
Describe the bug
If the server initializes the model in a single-GPU/CPU environment, but clients use multi-gpu training, we get a key mismatch in this function.
Ideally, we can also update the below test to fail when this warning happens.
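To make the mismatch concrete, here is a minimal sketch using plain dictionaries in place of real state dicts. It assumes the mismatch comes from the `module.` prefix that PyTorch's `DistributedDataParallel` wrapper adds to parameter names on multi-GPU clients. The helper `convert_weights` and its `strict` flag are hypothetical names for illustration only; this is not MONAI's implementation of `convert_global_weights`.

```python
def convert_weights(global_weights, local_var_dict, strict=True):
    """Map server-side weights onto the client's parameter names.

    Hypothetical helper for illustration; not MONAI's implementation.
    """
    prefix = "module."  # assumed prefix added by a DDP-style wrapper
    converted = {}
    for name, value in global_weights.items():
        if name in local_var_dict:
            # Keys already agree (same wrapping on server and client).
            converted[name] = value
        elif prefix + name in local_var_dict:
            # Client model is wrapped, server model is not.
            converted[prefix + name] = value
        elif name.startswith(prefix) and name[len(prefix):] in local_var_dict:
            # Server model is wrapped, client model is not.
            converted[name[len(prefix):]] = value
    if strict and not converted:
        # Fail loudly instead of only warning, so a test would catch it.
        raise RuntimeError("No global weights matched the local model keys.")
    return converted


# Server model built on a single GPU/CPU: keys are unprefixed.
server_weights = {"conv.weight": [0.1, 0.2], "conv.bias": [0.0]}
# Multi-GPU client: keys carry the "module." prefix.
client_state = {"module.conv.weight": None, "module.conv.bias": None}

# A plain key lookup finds nothing in common between the two sides.
naive_matches = [k for k in server_weights if k in client_state]
# The prefix-aware conversion recovers both parameters.
converted = convert_weights(server_weights, client_state)
```

With `strict=True`, a conversion that matches no keys raises instead of logging a warning, which is the behavior that would let the distributed test fail.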
To Reproduce
Steps to reproduce the behavior:
Run test
python3 -m tests.test_fl_monai_algo_dist
Expected behavior
Should add a check similar to the one in the copy_model_state() util.
Screenshots
If applicable, add screenshots to help explain your problem.
Environment
Ensuring you use the relevant python executable, please paste the output of:
Additional context
n/a