SVT lowers performance extremely #824

LSnyd · 2022-09-02T08:22:33Z

Hi all,
I added SVT to my current FL training round, but it lowers the performance tremendously. I am aware that DP will generally lower the performance; however, mine drops by almost 45%, even when I am sharing all weights. I am still trying out different SVT configurations, but I want to make sure that I am not missing anything important for making it work in general.

I am training with FedProx, and my learner is based on the CIFAR10 learner provided in the examples. For the SVT parameters, I am primarily varying the epsilon and noise_var values for now, keeping fraction at 1.0.
Is there anything wrong with my current approach, or do I just have to be patient to find a good SVT configuration?

This is my current client config file:

{
  "format_version": 2,
  "executors": [
    {
      "tasks": ["train", "submit_model", "validate"],
      "executor": {
        "id": "Executor",
        "path": "nvflare.app_common.executors.learner_executor.LearnerExecutor",
        "args": {
          "learner_id": "learner"
        }
      }
    }
  ],
  "task_result_filters": [
    {
      "tasks": ["train"],
      "filters": [
        {
          "path": "nvflare.app_common.filters.svt_privacy.SVTPrivacy",
          "args": {
            "fraction": 1.0,
            "epsilon": 0.01,
            "noise_var": 0.1,
            "gamma": 1e-05,
            "tau": 1e-06
          }
        }
      ]
    }
  ],
  "task_data_filters": [],
  "components": [
    {
      "id": "learner",
      "path": "custom.learner.MedicalLearner",
      "args": {
        "dataset_root": "{DATASET_ROOT}",
        "aggregation_epochs": 1,
        "lr": 0.0001,
        "fedproxloss_mu": 1e-5
      }
    }
  ]
}

holgerroth · 2022-09-07T00:32:39Z · Maintainer

The gamma value might be too low. It causes the gradients to be clipped, which might severely hinder the convergence of your model. Here's an example of values that worked well for one particular task, the BraTS task. These values might not be optimal for yours, though.
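To illustrate the clipping point, here is a minimal sketch of how much of an update survives when every entry is clamped to [-gamma, gamma]. It only models the clipping step, not the full SVT filter, and the update values (Gaussian with a magnitude of about 1e-3) are a made-up assumption rather than numbers from this thread.

```python
import math
import random


def clip(values, gamma):
    """Clamp every entry to the interval [-gamma, gamma]."""
    return [max(-gamma, min(gamma, v)) for v in values]


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def retained_norm_fraction(values, gamma):
    """Fraction of the update's L2 norm that survives clipping."""
    original = l2_norm(values)
    if original == 0:
        return 1.0
    return l2_norm(clip(values, gamma)) / original


rng = random.Random(0)
# Hypothetical weight differences with a typical magnitude of about 1e-3.
delta_w = [rng.gauss(0.0, 1e-3) for _ in range(10_000)]

for gamma in (1e-5, 1e-4, 1e-3, 1e-2):
    fraction = retained_norm_fraction(delta_w, gamma)
    print(f"gamma={gamma:g}: retained norm fraction = {fraction:.3f}")
```

If the printed fraction for your own updates is tiny at the configured gamma, the clip bound is probably much smaller than the typical update magnitude, which would be consistent with the slow convergence described above.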

5 replies

holgerroth Sep 22, 2022
Maintainer

@LSnyd did you have a chance to test some other parameters?

LSnyd Sep 22, 2022
Author

Hi @holgerroth,
yes, I tried a couple of different configurations, but unfortunately the results were not any better. I also tried training longer and using more than 1 local epoch, but the performance was still worse by over 40%. I am still trying to figure out why.
I started a run with only 2 clients this afternoon to make sure the performance gap is not caused by the larger number of clients, as I am training with 9 clients. I can probably report on that tomorrow. Do you have any other possible causes in mind?

Update: The performance is indeed much better with fewer clients.

holgerroth Sep 26, 2022
Maintainer

@ZiyueXu77 and @Can-Zhao might be able to give more pointers here.

ZiyueXu77 Sep 27, 2022
Maintainer

We did observe a similarly significant negative impact on performance, and it can be sensitive to the combination of DP parameters.

Can-Zhao Sep 27, 2022

Maybe a bigger gamma and a smaller epsilon could help.
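One way to try several gamma and epsilon combinations systematically is to generate the `task_result_filters` entry for each pair. The sketch below reuses the filter path and argument names from the config posted above; the helper names (`svt_filter_entry`, `sweep`) and the grid values are hypothetical placeholders, not recommended settings.

```python
import itertools
import json

# Filter path and argument names are copied from the client config in this thread.
SVT_PATH = "nvflare.app_common.filters.svt_privacy.SVTPrivacy"


def svt_filter_entry(gamma, epsilon, noise_var=0.1, fraction=1.0, tau=1e-6):
    """Build one task_result_filters entry for a given parameter combination."""
    return {
        "tasks": ["train"],
        "filters": [
            {
                "path": SVT_PATH,
                "args": {
                    "fraction": fraction,
                    "epsilon": epsilon,
                    "noise_var": noise_var,
                    "gamma": gamma,
                    "tau": tau,
                },
            }
        ],
    }


def sweep(gammas, epsilons):
    """Return one filter entry per (gamma, epsilon) pair."""
    return [svt_filter_entry(g, e) for g, e in itertools.product(gammas, epsilons)]


# Placeholder grid; the values are arbitrary and only show the mechanics.
candidates = sweep(gammas=[1e-5, 1e-4, 1e-3], epsilons=[0.01, 0.001])
print(f"{len(candidates)} configurations generated")
print(json.dumps(candidates[0], indent=2))
```

Each generated entry can be written into a copy of the client config, so that every run differs only in the SVT arguments.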

LSnyd · 2022-10-05T06:17:22Z · Author

Hi @holgerroth, @Can-Zhao and @ZiyueXu77,
thanks for your help so far. Unfortunately, I still haven't found an SVT configuration that leads to acceptable performance.
Therefore, I was wondering how federated averaging works within NVFlare with SVT. I can't completely figure it out by reading the InTimeAccumulateWeightedAggregator class.

Let's assume 3 of my 9 clients share the noisy gradients of one specific parameter (after SVT). Are these gradients averaged over 9 clients or over 3 clients? It seems like there is only one global weighting available for the parameters, which may lead to much noisier averaged gradients than originally intended by SVT.
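The two readings of this question can be made concrete with a small sketch. It is not NVFlare's aggregator; the function names and the toy values are hypothetical, and all clients are assumed to have equal weight.

```python
def average_over_all(updates, name, num_clients):
    """Sum the values that were sent and divide by the total number of clients."""
    sent = [u[name] for u in updates if name in u]
    return sum(sent) / num_clients


def average_over_senders(updates, name):
    """Divide only by the number of clients that actually sent the parameter."""
    sent = [u[name] for u in updates if name in u]
    return sum(sent) / len(sent) if sent else 0.0


# 9 hypothetical clients; only the first 3 share a value for parameter "w".
updates = [{"w": 0.3}] * 3 + [{}] * 6

print("divided by 9 clients:", average_over_all(updates, "w", num_clients=9))
print("divided by 3 senders:", average_over_senders(updates, "w"))
```

With a single per-client weight, the first variant shrinks the averaged update by the share of clients that did not send the parameter, which is the effect the question is about.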

4 replies

holgerroth Oct 5, 2022
Maintainer

One question: are you sending full weights or weight differences? I think our BraTS example is based on weight differences. This might affect which parameters work with SVT. @ZiyueXu77 to confirm.
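The distinction matters because a clip bound such as gamma is applied to whatever is sent, and full weights are usually much larger in magnitude than weight differences. A minimal sketch with made-up numbers (the parameter name and values are hypothetical):

```python
def weight_diff(local_weights, global_weights):
    """Per-parameter difference between the locally trained and the global model."""
    return {
        name: [l - g for l, g in zip(local_weights[name], global_weights[name])]
        for name in global_weights
    }


def max_abs(weights):
    """Largest absolute entry across all parameters."""
    return max(abs(v) for values in weights.values() for v in values)


global_weights = {"conv.weight": [0.50, -0.25, 0.125]}
# Hypothetical weights after one local epoch.
local_weights = {"conv.weight": [0.5005, -0.2495, 0.1245]}

diff = weight_diff(local_weights, global_weights)
print("largest full weight:      ", max_abs(local_weights))
print("largest weight difference:", max_abs(diff))
```

A gamma tuned for one of the two magnitudes would clip very differently when applied to the other.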

LSnyd Oct 5, 2022
Author

No, I am also sending weight differences (edited in the message above).

YuanTingHsieh Oct 3, 2023
Maintainer

@holgerroth I think we will aggregate all 9 in this case, right?

holgerroth Oct 3, 2023
Maintainer

Yes, everything sent by the clients is being aggregated.
