FedCV Decentralized FL object detection #918
Unanswered
TiagoPinaC
asked this question in
Q&A
Replies: 0 comments
Hi! I'm currently trying to implement a Decentralized FL version of YOLOv5 for object detection, and I would also like to compare its performance with the Vanilla FL implementation. I have run into some problems when trying to use FedML to this end.
A) Problems related to Vanilla FL
This is certainly caused by the call to MPI.COMM_WORLD.Abort() in the fedml_comm_manager.py file, which aborts all processes even when they have not finished their tasks. This is confirmed by the fact that all but one of the logs in runs/train/exp are incomplete. I wonder whether this interruption also affects the weights saved by the procedure. Is there another way to end the MPI simulation that lets all processes finish their tasks?
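To illustrate the kind of shutdown I have in mind, here is a minimal sketch, not FedML's actual code. It assumes an mpi4py-style communicator that exposes `Barrier()`; the names `CommLike`, `finish_gracefully` and `flush_outputs` are hypothetical. Each rank first writes its own outputs and then waits at a barrier, so no process leaves while others are still working.

```python
from typing import Callable, Protocol


class CommLike(Protocol):
    """Anything with an mpi4py-style Barrier(), e.g. MPI.COMM_WORLD."""

    def Barrier(self) -> None: ...


def finish_gracefully(comm: CommLike, flush_outputs: Callable[[], None]) -> None:
    """Hypothetical helper: let this rank finish its work, then synchronize.

    Unlike comm.Abort(), which terminates every process in the communicator
    immediately, Barrier() blocks until all ranks have reached it.
    """
    flush_outputs()  # write the logs / weights owned by this rank
    comm.Barrier()   # wait until every other rank has done the same
```

With mpi4py this would be called as `finish_gracefully(MPI.COMM_WORLD, save_fn)` at the end of each rank's work, where `save_fn` is whatever writes that rank's results. Note that a barrier deadlocks if some rank never reaches it, so every process needs a code path that ends in this call.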
Looking at the code, I believe the reason is that yolov5_trainer.py saves the model in the following way
whereas the train.py script of the YOLOv5 model saves it in the following way
That is, by explicitly defining certain fields in the dictionary to be saved, including the 'model' field that generated the error. This field has to be included, since the init_yolo.py file loads the pretrained weights in the following manner
I would like to know how I can use the generated weights to perform detection or further training.
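The mismatch described above can be sketched as follows. This is a minimal illustration, not the actual FedML or YOLOv5 code: it only assumes that the loader indexes the checkpoint with `ckpt['model']`. The helper names `has_model_field` and `wrap_for_yolo_loader`, as well as the extra `epoch` and `optimizer` fields, are assumptions made for the example.

```python
from typing import Any, Dict


def has_model_field(ckpt: Any) -> bool:
    """True if ckpt is a dict that a loader can index with ckpt['model']."""
    return isinstance(ckpt, dict) and "model" in ckpt


def wrap_for_yolo_loader(saved: Any, epoch: int = -1) -> Dict[str, Any]:
    """Hypothetical helper: put bare saved weights under a 'model' key.

    Checkpoints that already have a 'model' field are returned unchanged.
    """
    if has_model_field(saved):
        return saved
    return {"epoch": epoch, "model": saved, "optimizer": None}
```

In practice the object would be read with `torch.load` and written back with `torch.save`. Whether the loader expects a full module or a state dict under 'model' has to be checked against init_yolo.py, so the wrapping alone may not be enough.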
B) Decentralized FL
Does this mean that, to implement the decentralized version of the model, the only thing left to do would be to include the trainer in this script?
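For context, the step that distinguishes decentralized FL from the vanilla server-based setting can be sketched generically. The following is a minimal illustration of one round of neighbor averaging on a ring topology. It is not FedML's implementation, and `ring_neighbors` and `gossip_round` are hypothetical names. Each client's parameters are represented as a flat list of floats, and every client replaces them with the mean over itself and its two ring neighbors, with no central server involved.

```python
from typing import List


def ring_neighbors(i: int, n: int) -> List[int]:
    """Indices client i averages over on a ring: left neighbor, itself, right neighbor."""
    return sorted({(i - 1) % n, i, (i + 1) % n})


def gossip_round(params: List[List[float]]) -> List[List[float]]:
    """One synchronous round: each client takes the mean of its neighborhood."""
    n = len(params)
    new_params = []
    for i in range(n):
        group = ring_neighbors(i, n)
        dim = len(params[i])
        # Average each coordinate over the clients in this neighborhood.
        avg = [sum(params[j][k] for j in group) / len(group) for k in range(dim)]
        new_params.append(avg)
    return new_params
```

In a real run, each client would perform local training between such rounds, and the flat lists would be the model's weight tensors. The sketch only shows the aggregation step that a trainer would need to be plugged into.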
I hope the explanations of the problems were clear! Thank you in advance!