Different dataset for each client #1156
-
Hi everybody, and thanks for the support. I ran a simple real-world NVFlare example: hello-pt-tb, on a simple infrastructure made up of a server, an overseer, an admin, and two clients. The example went well, but I was wondering whether it is possible (not necessarily in hello-pt-tb, maybe in other examples) to run a training in which the two clients have different datasets (for example, client1 uses CIFAR10 and client2 uses CIFAR100). If yes, how can I do that? Is there a configuration file in which I can specify the dataset associated with each client? While studying the file pt_learner.py, I noticed that a few instructions specify the dataset to use, so can I just modify the dataset set in those lines on one of the two clients? Thank you very much for your support and availability. Sorry for any errors or imprecision, I'm trying to learn. I'm available if more details are needed.
Replies: 3 comments 6 replies
-
Yes, it's totally possible from NVFLARE's perspective. But from a federated learning perspective, it might be slightly complicated. The complication relates to the feature distribution in the datasets. For example, if both datasets have the same set of features and differ only in the data, this belongs to the category of horizontal FL. In this case, using two different datasets is no different from taking one dataset, splitting it into two, and copying the first part to site-1 and the second part to site-2. If the features differ between the two sites, such as site-1 with features A, B, C and site-2 with E, F, G, X, Y, Z, then the problem becomes vertical FL, which needs different FL algorithms. We will provide vertical FL examples in the next release.
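To make the horizontal/vertical distinction concrete, here is a minimal sketch of the rule described above. It is not part of NVFLARE; `fl_setting` is a hypothetical helper, and it simplifies the rule to "same feature set everywhere means horizontal, anything else means vertical".

```python
def fl_setting(site_features):
    """Classify a setup from the feature names available at each site.

    Hypothetical helper, not an NVFLARE API. Returns "horizontal" when every
    site has the same feature set, and "vertical" otherwise.
    """
    feature_sets = {frozenset(features) for features in site_features.values()}
    return "horizontal" if len(feature_sets) == 1 else "vertical"


# Same features, different samples: equivalent to splitting one dataset.
same = fl_setting({"site-1": ["A", "B", "C"], "site-2": ["A", "B", "C"]})

# Different features per site, as in the example above.
different = fl_setting(
    {"site-1": ["A", "B", "C"], "site-2": ["E", "F", "G", "X", "Y", "Z"]}
)

print(same, different)
```

Real setups can be less clear-cut (for example, partially overlapping features), so treat this only as an illustration of the two categories.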
-
For CIFAR-10, each client takes a different set of indices of the original CIFAR-10 dataset (indices are non-overlapping across clients). The selection of the indices is based on client_id here. In examples where we actually use different datasets from different sources, like the prostate example, we again assume there's a datalist keyed by client name that can be used to load the corresponding data indices. See here.
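As a rough illustration of the index-based approach, the sketch below deals out non-overlapping indices per client name. It is not the code from the CIFAR-10 example; `split_indices`, the site names, and the seed are assumptions made for illustration, and it uses only the standard library.

```python
import random


def split_indices(num_samples, client_names, seed=0):
    """Return a dict mapping each client name to its own shard of indices.

    Hypothetical helper: indices are shuffled once with a fixed seed so that
    every client computes the same partition, then dealt out round-robin.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    n = len(client_names)
    return {
        name: sorted(indices[i::n]) for i, name in enumerate(client_names)
    }


# Each client would look up its own shard by its client_id.
shards = split_indices(50000, ["site-1", "site-2"])
site1_indices = shards["site-1"]
site2_indices = shards["site-2"]
```

Each client could then wrap the full dataset with its own indices, for example with `torch.utils.data.Subset(dataset, site1_indices)`.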
-
FYI, it's also possible to use the deploy_map in meta.json to send different configurations to different clients. Hence, you can just pass a different folder name as an argument to your client Executor/Learner.
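A minimal sketch of what such a meta.json could look like, generated from Python. The job name and the app folder names (`app_server`, `app_site1`, `app_site2`) are hypothetical, and a real job may need additional fields beyond the two shown here, so check the NVFLARE job documentation for your version.

```python
import json


def build_meta(job_name, server_app, client_apps):
    """Build a meta.json-style dict whose deploy_map sends one app per site.

    Hypothetical helper: client_apps maps a site name to the app folder that
    should be deployed to it. Each app folder can carry its own client config,
    which is where a different dataset folder could be passed as an argument.
    """
    deploy_map = {server_app: ["server"]}
    for site, app in client_apps.items():
        deploy_map.setdefault(app, []).append(site)
    return {"name": job_name, "deploy_map": deploy_map}


meta = build_meta(
    "different_datasets",  # assumed job name
    "app_server",
    {"site-1": "app_site1", "site-2": "app_site2"},
)
meta_json = json.dumps(meta, indent=2)
print(meta_json)
```

With this layout, the client config inside `app_site1` and `app_site2` could each point the Executor/Learner at a different data folder.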