We first compare average accuracy over three independent runs on each dataset under clean and noisy client-data scenarios. In the noisy scenario, 5 clients hold clean data and 15 clients hold fully corrupted data (100% noise level); a sketch of this client setup follows the table. All models are trained with FedAvg.
| Data | CIFAR-10 IID | CIFAR-10 Non-IID | CIFAR-100 IID | CIFAR-100 Non-IID | PathMNIST IID | PathMNIST Non-IID | FMNIST IID | FMNIST Non-IID | EuroSAT IID | EuroSAT Non-IID | Tiny-ImageNet IID | Tiny-ImageNet Non-IID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clean | 90.14% | 85.52% | 64.79% | 62.36% | 87.74% | 82.55% | 92.34% | 89.37% | 94.72% | 95.12% | 53.26% | 52.88% |
| Noisy | 78.62% | 73.51% | 44.58% | 42.10% | 54.80% | 52.14% | 88.14% | 84.67% | 67.39% | 75.06% | 24.32% | 22.90% |
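The client split above can be pictured with the minimal sketch below. It assumes additive Gaussian pixel noise as the corruption; `corrupt`, `build_client_data`, and the noise parameters are illustrative placeholders rather than the exact corruption pipeline used in the experiments.

```python
import numpy as np

# Sketch of the noisy-client setup: 20 clients in total, 5 clean and 15 whose
# inputs are fully corrupted (100% noise level). Gaussian noise is an assumption.
NUM_CLIENTS, NUM_CLEAN = 20, 5
rng = np.random.default_rng(0)

def corrupt(images, noise_level=1.0, sigma=0.5):
    """Corrupt a `noise_level` fraction of a client's images with additive Gaussian noise."""
    images = images.copy()  # float images in [0, 1]
    idx = rng.choice(len(images), size=int(noise_level * len(images)), replace=False)
    images[idx] = np.clip(images[idx] + rng.normal(0.0, sigma, images[idx].shape), 0.0, 1.0)
    return images

def build_client_data(client_shards):
    """`client_shards`: one image array per client (from an IID or non-IID split)."""
    assert len(client_shards) == NUM_CLIENTS
    return [shard if cid < NUM_CLEAN else corrupt(shard, noise_level=1.0)
            for cid, shard in enumerate(client_shards)]
```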
As shown in the table below, all aggregation methods show a general trend of improvement simply by plugging FedNS into them; a sketch of what this plug-in step looks like follows the table.
| Methods | CIFAR-10 IID | CIFAR-10 Non-IID | CIFAR-100 IID | CIFAR-100 Non-IID | PathMNIST IID | PathMNIST Non-IID | FMNIST IID | FMNIST Non-IID | EuroSAT IID | EuroSAT Non-IID | Tiny-ImageNet IID | Tiny-ImageNet Non-IID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FedAvg | 78.62% | 73.51% | 44.58% | 42.10% | 54.80% | 52.14% | 88.14% | 84.67% | 67.39% | 75.06% | 24.32% | 22.90% |
| + FedNS (Ours) | 81.67% | 78.44% | 48.14% | 45.94% | 63.89% | 62.92% | 89.61% | 88.53% | 78.22% | 80.12% | 27.85% | 25.93% |
| FedProx | 79.89% | 78.13% | 46.75% | 45.17% | 57.28% | 56.27% | 87.15% | 86.96% | 70.83% | 76.64% | 24.90% | 23.76% |
| + FedNS (Ours) | 82.31% | 81.18% | 48.27% | 46.80% | 60.18% | 63.11% | 89.12% | 87.48% | 76.94% | 81.20% | 26.48% | 25.98% |
| FedTrimmedAvg | 78.92% | 77.24% | 41.81% | 41.25% | 56.34% | 54.50% | 90.09% | 89.95% | 68.30% | 74.39% | 16.97% | 15.48% |
| + FedNS (Ours) | 82.63% | 82.47% | 49.11% | 48.32% | 64.27% | 63.04% | 90.29% | 91.57% | 83.81% | 80.50% | 29.43% | 27.46% |
| FedNova | 81.45% | 82.16% | 49.48% | 48.24% | 55.36% | 51.04% | 90.65% | 89.68% | 73.54% | 66.29% | 28.62% | 27.24% |
| + FedNS (Ours) | 88.65% | 88.34% | 59.19% | 59.17% | 80.82% | 81.89% | 90.57% | 91.50% | 93.31% | 92.70% | 48.50% | 46.16% |
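The "plug-in" usage can be pictured as a per-client reweighting step applied before the base aggregator combines the updates. The sketch below is only an illustration under that assumption: `noise_scores` stands in for whatever per-client score FedNS derives (its computation is not described in this section), and `aggregate_with_plugin` is a hypothetical helper, not the authors' implementation.

```python
import torch

def aggregate_with_plugin(client_updates, client_sizes, noise_scores):
    """FedAvg-style aggregation with a pluggable per-client reweighting step.

    client_updates: list of state dicts (parameter name -> tensor), one per client.
    client_sizes:   number of local samples per client (standard FedAvg weighting).
    noise_scores:   per-client score in [0, 1] (higher = cleaner); a placeholder
                    for whatever FedNS computes -- not shown here.
    """
    w = torch.tensor([s * q for s, q in zip(client_sizes, noise_scores)], dtype=torch.float32)
    w = w / w.sum()  # normalized aggregation weights
    return {name: sum(wi * upd[name].float() for wi, upd in zip(w, client_updates))
            for name in client_updates[0]}
```

The same reweighting can sit in front of FedProx, FedTrimmedAvg, or FedNova by replacing the weighted sum with the corresponding base aggregator, which is what "plugging in" refers to above.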
We further evaluate FedNS under more complex noise conditions, including distortions and patch-based noise, and observe consistent improvements across all datasets (an illustrative corruption sketch follows the first table below). FedNS also mitigates real-world human annotation label noise, improving accuracy in every experiment.
Mixed noise conditions

| Data | Clean | FedNova (Noisy) | FedNova+NS (Noisy) |
|---|---|---|---|
| CIFAR-10 | 90.14% | 80.10% | 85.83% (↑ 5.73%) |
| CIFAR-100 | 64.79% | 45.28% | 51.47% (↑ 6.19%) |
| PathMNIST | 92.34% | 65.19% | 87.81% (↑ 22.62%) |
| Tiny-ImageNet | 53.26% | 29.26% | 43.30% (↑ 14.04%) |
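For concreteness, the mixed-noise setting can be pictured as each noisy image receiving either a distortion or a patch-style occlusion. The snippet below is only an illustration of that idea; the specific corruption types, patch sizes, and probabilities used in the experiments are not specified here, and the values shown are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_corruption(image, distort_prob=0.5, patch_size=8):
    """Apply either a distortion or a patch-based occlusion to one H x W x C image in [0, 1].

    Illustrative only: Gaussian noise stands in for the distortion family and a black
    square stands in for the patch-based noise; the actual corruptions may differ.
    """
    image = image.copy()
    if rng.random() < distort_prob:
        image = np.clip(image + rng.normal(0.0, 0.3, image.shape), 0.0, 1.0)
    else:
        h, w = image.shape[:2]
        y = rng.integers(0, h - patch_size)
        x = rng.integers(0, w - patch_size)
        image[y:y + patch_size, x:x + patch_size] = 0.0
    return image
```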
Real-world label noise

| Methods | CIFAR-10N IID | CIFAR-10N Non-IID | CIFAR-100N IID | CIFAR-100N Non-IID |
|---|---|---|---|---|
| FedAvg | 76.06% | 69.52% | 52.61% | 52.57% |
| + FedNS (Ours) | 78.87% | 71.14% | 54.06% | 53.85% |
| FedNova | 76.06% | 69.26% | 53.32% | 52.41% |
| + FedNS (Ours) | 84.38% | 78.65% | 55.54% | 54.80% |
By mitigating both input corruption and label noise, FedNS proves to be a valuable tool for practitioners managing real-world datasets with multiple types of data imperfections.