FedOpt algorithm not working as expected in cifar10 example #2314
-
Thank you for trying this out and raising the issue! It would be nice if you could share the figures from your other experiments to benefit other people. @holgerroth, can you help answer this question? Thanks.
-
Interesting. Just to confirm, are you using momentum on the server when using FedOpt (see here)? That could explain the different behavior from FedAvg.
-
Yes @holgerroth, I tried momentum with different values and I also tried not using it at all. Although the results changed and some values gave better results than others, they were still poor: I reached at most 0.5 accuracy, which is quite low compared with the other algorithms.
-
That's interesting. So, the problem only comes up when using the pretrained CNN? FedOpt seems to be more sensitive to this initialization. Have you tried reducing the local aggregation_epochs?
-
Yes, even when reducing the local epochs of each client the behaviour stays the same (obviously worse due to the fewer epochs). I also tried MobileNetV2 and ResNet18 with the same settings explained before but without pretrained parameters.
-
Hi @LeandroDiL, do you have any updates on this topic?
-
Hi @holgerroth, I can confirm there is problematic behaviour when using anything other than the ModerateCNN and SimpleCNN. Global model validation metrics get stuck at 0.1 from the first round of aggregation.
-
I see. Can you specify what models and alpha setting you are using? Are the same models working fine with FedAvg and the same alpha setting on CIFAR-10?
-
Yes, this is with alpha 0.6. FedAvg & FedProx work fine. It's a dozen models, from a ResNet-20 to a couple of Transformers; all of them break under FedOpt except for ModerateCNN and, to some extent, SimpleCNN. SimpleCNN underperforms, but at least it converges. The rest get stuck in terms of global model validation accuracy, but locally they do learn (local validation accuracy increases between aggregations). All trained from scratch.
-
Ok. Have you tried different learning rates and momentum for the FedOpt optimizer, maybe even optimizers other than SGD? lr=1 and momentum=0 should behave identically to FedAvg with the SGD optimizer.
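To see why, here is a minimal sketch (illustrative tensors, not NVFlare code): the server treats the negative of (average - global) as a gradient, so a plain SGD step with lr=1 lands exactly on the averaged weights.
```python
import torch

global_w = torch.randn(5)
avg_client_w = torch.randn(5)          # aggregated client weights (the FedAvg result)

param = torch.nn.Parameter(global_w.clone())
opt = torch.optim.SGD([param], lr=1.0, momentum=0.0)
param.grad = global_w - avg_client_w   # pseudo-gradient = -(average - global)
opt.step()                             # param <- param - 1.0 * grad == avg_client_w
assert torch.allclose(param.detach(), avg_client_w)
```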
-
@siomvas, thanks for the additional information.
-
Hi @LeandroDiL, @siomvas, I'm looking into this issue now. Just to confirm, have you also changed the model configuration in config_fed_server.json when running these experiments? Please attach your job configurations and code if possible.
-
Okay, I was able to reproduce the behavior. It has to do with the batch norm layers of these more complex models. When updating the global model using SGD, the batch norm statistics are not included in the parameters the optimizer updates. The FedOpt paper also uses group norm instead of batch norm to avoid these kinds of issues. I provided a workaround for this issue by updating the batch norm parameters using FedAvg and only updating the trainable parameters using the FedOpt optimizer for the global model: #1851
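The idea can be sketched roughly as follows (an illustrative sketch with assumed names, not the actual code in #1851):
```python
import torch

def fedopt_server_update(model, avg_weights, optimizer, device="cpu"):
    """FedOpt step on trainable parameters; FedAvg copy for everything else."""
    trainable = dict(model.named_parameters())
    optimizer.zero_grad()
    for name, param in trainable.items():
        # pseudo-gradient: negative of (average - global)
        param.grad = param.data - avg_weights[name].to(device)
    optimizer.step()
    # copy non-trainable state (e.g. running_mean, running_var) FedAvg-style
    with torch.no_grad():
        state = model.state_dict()
        for name, tensor in avg_weights.items():
            if name not in trainable:
                state[name].copy_(tensor.to(device))
```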
-
I pinpointed this issue/bug when trying to use SCAFFOLD, since that actually (conveniently) breaks, so I could see where the error was; I will open a new bug report for that. This is what I found: the issue is not with batch norm itself, but with the running stats:
```python
>>> [k for k, v in mobile.named_parameters()][:5]
['conv1.weight', 'bn1.weight', 'bn1.bias', 'layers.0.conv1.weight', 'layers.0.bn1.weight']
>>> [k for k in mobile.state_dict()][:7]
['conv1.weight', 'bn1.weight', 'bn1.bias', 'bn1.running_mean', 'bn1.running_var', 'bn1.num_batches_tracked', 'layers.0.conv1.weight']
```
The weight and bias of BN are getting averaged, but the running stats are not, producing a nonsensical layer, since these are learned in tandem client-side (note that num_batches_tracked is not used in any calculation in the default setting, where BN uses momentum instead). This also applies to other architectural elements; SWIN has a similar non-trainable buffer. Correct me if I'm wrong, but it seems that with #1851 the weights and biases are still getting "FedOpted" while the running stats get averaged, so this should not be expected behaviour, as there will be a mismatch. A quick test with FedAdam using the proposed hparams from the FedOpt paper (client lr=0.03, server lr=0.01) together with #1851 shows there is convergence, but how the mismatch in the affected layers affects model performance is unclear.

To investigate further, I tried combining FedOpt with FedBN, implemented via the task filter mechanism (adding exclude_vars for the BN parameters). But it seems there is currently another bug where FedOpt does not respect task filters. See the fix, which can be added in #1851:
```python
for name, param in self.model.named_parameters():
    param.grad = torch.tensor(-1.0 * model_diff[name]).to(self.device)
    updated_params.append(name)
```
should be
```python
for name, param in self.model.named_parameters():
    if name in model_diff:
        param.grad = torch.tensor(-1.0 * model_diff[name]).to(self.device)
        updated_params.append(name)
```
I believe this should remain open, not as a bug but as a documented issue.
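For clarity, the FedBN-style filtering I mean amounts to dropping the batch norm entries from the weights a client ships back; a hypothetical name-based helper (not NVFlare's actual filter implementation):
```python
import re

def exclude_bn_vars(weights: dict,
                    pattern: str = r"bn|running_(mean|var)|num_batches_tracked") -> dict:
    """Drop entries whose names match the exclude pattern (FedBN-style)."""
    regex = re.compile(pattern)
    return {k: v for k, v in weights.items() if not regex.search(k)}
```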
-
Hi @siomvas, thanks for your test and the additional info. Yes, the desired behavior of batch norm layers with FedOpt is somewhat unclear. That's why many try to avoid using batch norm in FL settings, as in the FedOpt paper, and why I used "workaround" to describe #1851: it uses FedOpt to optimize the global trainable parameters but FedAvg to update any other layers such as batch norm statistics. It remains to be seen whether this approach also works with SWIN architectures. I know it's inconvenient, as most of the pretrained torchvision models use batch norm, but I would recommend looking into models that use group norm instead. Thanks for pointing out the issue with using filters; I added that fix to the PR. I also updated the docstring to document the behavior when using batch norm. It's acceptable to me, as we can match the performance of FedAvg using this workaround and the equivalent SGD settings (lr=1, momentum=0).
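One could also swap batch norm for group norm in an existing model rather than switching architecture; a rough sketch, where the group count of 8 is an arbitrary choice (channel counts must be divisible by it):
```python
import torch.nn as nn
from torchvision import models

def bn_to_gn(module: nn.Module, num_groups: int = 8) -> None:
    """Recursively replace BatchNorm2d layers with GroupNorm."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            bn_to_gn(child, num_groups)

model = models.resnet18(weights=None)
bn_to_gn(model)  # the state_dict no longer contains running stats
```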
-
Hi @siomvas, I ran into a similar situation. I used Adam as the optimizer and Swin as the model on CIFAR-10. After the first epoch, however, the loss, acc1, and acc5 never improved. When I switched to a much smaller model, ResNet56 from timm, the results were very good, as expected.
-
@BitCalSaul, I converted this to an open discussion around FedOpt. Did you confirm that the SWIN architecture can train a good model in centralized training? From your curves, it looks like it doesn't converge at all.
-
Hi @holgerroth, I googled this situation and found this issue. I'm not sure if I understand centralized training correctly, but I guess it means training a model on one GPU?
-
Hello, I can attest to i) the nvflare bug having been resolved, ii) SWIN-T with randomly initialized weights performing around the 40% mark on CIFAR-10 with the inputs upscaled to (224, 224), and iii) SWIN-T with ImageNet weights getting >90%. Without any code snippet it's difficult to comment on what is wrong with your implementation. Does FedAvg work? I am not familiar with the paper or the repo you mentioned, and they don't seem to be placed in the FL context. If you are looking to understand the learning dynamics of SotA architectures when used for FL, you might be interested in the following recent work (disclaimer, the second paper is mine):
-
Hi @holgerroth @siomvas, the issue has been addressed. The code for the implementation of Swin came from the official repo; I tried to use it on CIFAR-10. The bad results, i.e. the stuck loss, came from the LayerNorm in the PatchMerging module. When I removed this LayerNorm, the loss no longer got stuck. This suggests that a model designed for large datasets doesn't necessarily work well on a small dataset. Thank you all for your attention.
-
Hi @holgerroth, I even froze all the batch normalization layers during the model update in FedOpt and updated them using the FedAvg strategy. However, the performance drop still occurs. It is weird.
-
@falibabaei, yes, FedOpt can be a bit tricky to get to work. I would first recommend setting lr=1 and momentum=0 for the server optimizer, which should match FedAvg, and tuning from there.
-
Describe the bug
The FedOpt algorithm is not working as expected in the cifar10 example when I change the model from the pre-existing ModerateCNN to another model like MobileNetV2 or ResNet18. The problem is that the accuracy of the global model is not increasing, or is increasing too slowly, with the FedOpt algorithm, while the other algorithms work just fine even after changing the model.
To Reproduce
Add the new model in 'cifar10_nets.py':
```python
import torch.nn as nn
from torchvision import models

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        model = models.mobilenet_v2(weights='DEFAULT')
        model.classifier = nn.Sequential(
            nn.Dropout(0.4),
            nn.Linear(1280, 10),
        )
        self.model = model

    def forward(self, x):
        return self.model(x)
```
Import and change the model in 'cifar10_learner.py'.
Launch the example with:
```
./run_simulator.sh cifar10_fedopt 0.1 8 8
```
See the results under the section 'val_acc_global_model' in TensorBoard with:
```
tensorboard --logdir=/tmp/nvflare/sim_cifar10
```
Expected behavior
Reading the algorithm proposed in Reddi, Sashank, et al., "Adaptive Federated Optimization," arXiv preprint arXiv:2003.00295 (2020), I expect to obtain the same performance as FedAvg when using the SGD optimizer with lr = 1.0 and no scheduler, and to obtain better results when changing the optimizer and adding a scheduler.
Screenshots
(Purple = FedAvg, Pink = FedOpt)
Thanks in advance!