FedCV Decentralized FL object detection #918

TiagoPinaC · 2023-04-20T16:47:22Z


Hi! I'm currently trying to implement a Decentralized FL version of YOLOv5 for object detection. I would also like to compare its performance with the Vanilla FL implementation. I have run into some problems when trying to use FedML to this end.

A) Problems related to Vanilla FL

When running the simulation .sh file from the object detection folder I get the following error message

This is almost certainly due to the call to MPI.COMM_WORLD.Abort() in the fedml_comm_manager.py file, which aborts all processes even when they have not finished their tasks. This is supported by the fact that all but one of the logs in runs/train/exp are incomplete. I wonder whether this interruption also affects the weights saved by the procedure. Is there another way to end the MPI simulation that lets all processes finish their tasks?
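To illustrate the shutdown behaviour being asked for, here is a minimal sketch of "wait for everyone, then stop". It is not FedML code: it uses `threading.Barrier` from the standard library as a stand-in for MPI processes, and `run_workers` and its arguments are hypothetical names. With mpi4py, the analogous call would be `MPI.COMM_WORLD.Barrier()` before a normal exit, though whether the FedML comm manager can be changed this way is exactly the open question.

```python
import threading
import time


def run_workers(durations):
    """Run one thread per worker; no worker proceeds to shutdown before all are done."""
    n = len(durations)
    barrier = threading.Barrier(n)
    lock = threading.Lock()
    finished = []            # ranks that have completed their local work
    seen_after_barrier = {}  # how many finished ranks each worker observes after the barrier

    def worker(rank, seconds):
        time.sleep(seconds)  # stand-in for local training and log writing
        with lock:
            finished.append(rank)
        barrier.wait()       # analogous to MPI.COMM_WORLD.Barrier() in mpi4py
        with lock:
            seen_after_barrier[rank] = len(finished)

    threads = [
        threading.Thread(target=worker, args=(rank, seconds))
        for rank, seconds in enumerate(durations)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_after_barrier
```

Every worker passes the barrier only after all of them have recorded their work, so a fast worker cannot trigger shutdown while slower ones are still writing logs or weights.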

When trying to run a simulation using weights produced by a previous simulation (or when trying to use the weights produced by the simulation with the detect.py script of YOLOv5) I get the error

Looking at the code, I believe the reason is that in yolov5_trainer.py the model is saved in the following way

whereas in the train.py script of the YOLOv5 model, the saving is done in the following way

That is, by explicitly defining certain fields in the dictionary to be saved, including the 'model' field that triggered the error. This field has to be present, since in the init_yolo.py file the pretrained weights are loaded in the following manner

I would like to know how I can use the generated weights to perform detection or further training.
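One possible workaround can be sketched as a small conversion step between the two layouts. This is only a sketch under assumptions: it assumes the FedML trainer saves the weights object directly, while the loader looks up a `'model'` key in a dictionary. The helper names `extract_weights` and `wrap_for_yolov5` are hypothetical, the extra dictionary fields are illustrative, and plain dictionaries stand in for real checkpoints (in practice they would be read and written with `torch.load` and `torch.save`).

```python
from collections.abc import Mapping


def extract_weights(ckpt):
    """Return the weights from either checkpoint layout."""
    if isinstance(ckpt, Mapping) and "model" in ckpt:
        # Dictionary layout: the weights are stored under the 'model' key.
        return ckpt["model"]
    # Assumed bare layout: the saved object itself is the weights.
    return ckpt


def wrap_for_yolov5(weights, epoch=-1):
    """Wrap bare weights in a dictionary that has the 'model' key the loader looks up."""
    # 'epoch' and 'optimizer' are illustrative fields, not a complete checkpoint.
    return {"epoch": epoch, "model": weights, "optimizer": None}
```

What kind of object the loader expects under `'model'` (a full module or a state dict) would still have to be checked against the loading code before relying on this.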

I would like to know how to collect the training statistics in wandb. I've managed to connect my project to the running code, but I don't receive any statistics for the loss or the accuracy. I searched the code for the lines that perform the logging to wandb but couldn't find them. Is it implemented? If not, how did you collect these statistics for the object detection benchmarks presented in the FedCV paper? I would like to reproduce those results.
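If the logging turns out not to be implemented, the kind of hook that would be needed can be sketched as follows. This is a hypothetical helper, not something found in FedML: `log_round_metrics`, the `train/` prefix and the metric names are assumptions. The logger is passed in as `log_fn`, so `wandb.log` can be supplied after `wandb.init(...)` without the sketch itself depending on wandb.

```python
def log_round_metrics(round_idx, metrics, log_fn=print):
    """Send one round's metrics to a logger; pass wandb.log as log_fn to log to wandb."""
    # Prefix the metric names so they are grouped together in the dashboard.
    payload = {f"train/{name}": float(value) for name, value in metrics.items()}
    payload["round"] = round_idx
    log_fn(payload)
    return payload
```

A call such as `log_round_metrics(round_idx, {"loss": loss, "mAP": map50}, log_fn=wandb.log)` at the end of each round would then be enough, assuming the trainer exposes those values.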

B) Decentralized FL

I would like to understand whether there is already an implementation of Decentralized YOLOv5 in FedCV. From what I understood, the backbone is enabled by setting federated_optimizer to "decentralized_fl" in the config file. But it doesn't seem to use the YOLOv5 model specified in the same file. Reading the code, I've seen that the decentralized_worker.py script implements a predefined function in its train() method, instead of calling the train() function of the appropriate trainer (yolov5_trainer in this case).

Does this mean that, to implement the decentralized version of the model, the only thing to be done would be to include the trainer in this script?

I hope the explanations of the problems were clear! Thank you in advance!
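The change being asked about can be sketched as a worker that delegates local training to whatever trainer it is given and then averages parameters with its neighbours. This is only an illustration under assumptions: `StubTrainer`, `DecentralizedWorker`, `mix` and the method names are hypothetical and are not taken from decentralized_worker.py. Plain floats stand in for model tensors, and uniform averaging stands in for whatever mixing rule the real topology uses.

```python
class StubTrainer:
    """Hypothetical stand-in for a model trainer such as the YOLOv5 one."""

    def __init__(self, params):
        self.params = dict(params)

    def get_model_params(self):
        return dict(self.params)

    def set_model_params(self, params):
        self.params = dict(params)

    def train(self):
        # Placeholder local update; a real trainer would run local epochs here.
        self.params = {k: v + 1.0 for k, v in self.params.items()}


class DecentralizedWorker:
    """Worker that delegates training to its trainer instead of a built-in routine."""

    def __init__(self, trainer):
        self.trainer = trainer

    def train(self):
        self.trainer.train()
        return self.trainer.get_model_params()

    def mix(self, neighbor_params):
        # Uniform average of own and neighbours' parameters (an assumed mixing rule).
        own = self.trainer.get_model_params()
        all_params = [own] + list(neighbor_params)
        mixed = {k: sum(p[k] for p in all_params) / len(all_params) for k in own}
        self.trainer.set_model_params(mixed)
        return mixed
```

In this shape the worker never needs to know which model it is training, so swapping in a YOLOv5 trainer would only require that it expose the same three methods.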
