MLFlowHandler records into the wrong run when running two runs without a run name #6415
Comments
Hi @binliunls, what do you mean by "running two bundles"? Do you run 2 bundles in parallel with multiple threads, or just run 2 bundles one by one? If one by one, I think we should set Thanks.
Hi @Nic-Ma,
Fixes #6415.

### Description
Fix the MLflow handler bug. When running a bundle with `MLFlowHandler` back to back without assigning a run name, the later run's info is recorded into the former run, even though the former run has already finished. This PR checks the status of the runs and filters out the finished ones.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [x] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

---------
Signed-off-by: binliu <binliu@nvidia.com>
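To make the run-selection behavior discussed in this issue concrete, here is a minimal sketch in plain Python. It is a simplified model, not the actual `MLFlowHandler` code and not the MLflow API: the `Run` dataclass, the `"RUNNING"`/`"FINISHED"` status strings, and the `select_run_buggy` / `select_run_fixed` helpers are all hypothetical names introduced for illustration. It only assumes what the issue and the PR description state: a handler without a run name treats every earlier run as a candidate and picks the latest one, and the fix filters out runs that have already finished.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a tracked run (not the MLflow Run class)."""
    run_id: str
    name: str
    status: str  # assumed to be either "RUNNING" or "FINISHED"
    start_time: int


def _latest(candidates: List[Run]) -> Optional[Run]:
    # Pick the most recently started candidate, or None if there is none.
    return max(candidates, key=lambda r: r.start_time) if candidates else None


def select_run_buggy(runs: List[Run], run_name: Optional[str]) -> Optional[Run]:
    # Models the reported logic: with no run name, every previous run matches,
    # including runs that have already finished.
    return _latest([r for r in runs if r.name == run_name or not run_name])


def select_run_fixed(runs: List[Run], run_name: Optional[str]) -> Optional[Run]:
    # Models the fix described in the PR: same matching rule, but finished
    # runs are filtered out, so they cannot receive new records.
    matching = [r for r in runs if r.name == run_name or not run_name]
    return _latest([r for r in matching if r.status != "FINISHED"])


# Back-to-back bundles: the first run is finished before the second starts.
history = [Run("first", "run_a", "FINISHED", start_time=1)]
print(select_run_buggy(history, None))  # reuses the finished run
print(select_run_fixed(history, None))  # None, so the caller creates a new run

# Two handlers in one workflow: the trainer's run is still active,
# so a validator handler without a run name joins it.
active = [Run("train", "run_b", "RUNNING", start_time=2)]
print(select_run_fixed(active, None).run_id)  # train
```

In this model, returning `None` stands for "no reusable run, create a new one". Keeping the "no run name matches everything" rule while filtering on status is what lets two handlers in the same workflow share a run without leaking records into a run from an earlier workflow.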
Describe the bug
This line in the MLFlowHandler with the `or not self.run_name` logic will lead to a bug. When running two bundles with `run_name=None` passed to the MLFlowHandler, the first one runs successfully. When the second one reaches this line, since it has no run name, the line puts all previous runs into a list. The logic below it then fetches the latest run and records the second bundle workflow's info into it, i.e. into the first run.

Simply removing the `or not self.run_name` logic causes another issue. For example, there can be two MLFlowHandlers in one workflow, like the train workflow, which has one MLFlowHandler for the trainer and one for the validator. Since these two handlers are initialized at different times, they get two different default `run_name` values, which splits the training information and the validation information of the same bundle workflow into two different runs.

To Reproduce
Steps to reproduce the behavior:
python -m monai.bundle run --config_file configs/train.json --bundle_root ./ --train#trainer#max_epochs 10 --tracking ../tracking.json
The `tracking.json` file is like:

Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
If applicable, add screenshots to help explain your problem.
Environment
Ensuring you use the relevant python executable, please paste the output of:
Additional context
Add any other context about the problem here.