Three Pillars of FedNS:

  • Noise Identification: FedNS identifies noisy clients in the first training round (one-shot).
  • Resilient Aggregation: An aggregation strategy that minimizes the impact of noisy clients, ensuring robust global-model performance.
  • Data Confidentiality: Shares only scalar gradient norms to keep data confidential.

Abstract

Federated learning (FL) has emerged as a prominent method for collaboratively training machine learning models using local data from edge devices, all while keeping data decentralized. However, accounting for the quality of data contributed by local clients remains a critical challenge in FL, as local data are often susceptible to corruption by various forms of noise and perturbations, which compromise the aggregation process and lead to a subpar global model. In this work, we focus on addressing the problem of noisy data in the input space, an under-explored area compared to label noise. We propose a comprehensive assessment of client input in the gradient space, inspired by the distinct disparity observed between the gradient norm distributions of models trained on noisy and clean input data. Based on this observation, we introduce a straightforward yet effective approach to identify clients with low-quality data at the initial stage of FL. Furthermore, we propose a noise-aware FL aggregation method, namely Federated Noise-Sifting (FedNS), which can be used as a plug-in approach in conjunction with widely used FL strategies. Our extensive evaluation on diverse benchmark datasets under different federated settings demonstrates the efficacy of FedNS. Our method effortlessly integrates with existing FL strategies, enhancing the global model’s performance by up to 13.68% in IID and 15.85% in non-IID settings when learning from noisy decentralized data.

Problem Definition: Noisy Federated Learning

Problem Formulation

In an FL setup with $K$ clients, $M$ of them have partially noisy local datasets. Let $m$ index a noisy client with local data $D_m$ and $n$ index a clean client with local data $D_n$. Each noisy client's dataset is partitioned into a clean part $D_{m}^{\text{clean}}$ and a noisy part $D_{m}^{\text{noisy}}$. The global objective is:

$$
\begin{aligned}
\min_{\theta}\mathcal{L}(\theta)
&= \min_{\theta} \sum_{\substack{n \in [K] \\ D_n \text{ clean}}} w_n\,\ell(\theta; D_n)
 + \sum_{\substack{m \in [K] \\ D_m \text{ noisy}}} w_m\,\ell(\theta; D_m) \\
&= \min_{\theta} \sum_{\substack{n \in [K] \\ D_n \text{ clean}}} w_n\,\ell(\theta; D_n)
 + \sum_{\substack{m \in [K] \\ D_m \text{ noisy}}} w_m \left( \frac{\mid D_{m}^{\text{clean}}\mid}{\mid D_m\mid}\,\ell(\theta; D_{m}^{\text{clean}})
 + \frac{\mid D_{m}^{\text{noisy}}\mid}{\mid D_m\mid}\,\ell(\theta; D_{m}^{\text{noisy}}) \right).
\end{aligned}
$$
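To make the decomposition concrete, the following minimal sketch evaluates this objective given per-client datasets; the per-sample loss, the client weights $w_k$ (e.g., proportional to local dataset size), and the data structures are illustrative placeholders rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def local_loss(model, dataset):
    """Average cross-entropy loss of the model over one client's (inputs, labels) pair."""
    xs, ys = dataset
    return F.cross_entropy(model(xs), ys)

def global_objective(model, clean_clients, noisy_clients, weights):
    """L(theta) = sum_n w_n * l(theta; D_n)
                + sum_m w_m * (|D_m^clean|/|D_m| * l_clean + |D_m^noisy|/|D_m| * l_noisy).

    clean_clients: {client_id: (xs, ys)}
    noisy_clients: {client_id: ((xs_clean, ys_clean), (xs_noisy, ys_noisy))}
    weights:       {client_id: w_k}  -- assumed, e.g., proportional to dataset size
    """
    loss = sum(weights[n] * local_loss(model, D_n)
               for n, D_n in clean_clients.items())
    for m, (D_clean, D_noisy) in noisy_clients.items():
        n_clean, n_noisy = len(D_clean[1]), len(D_noisy[1])
        total = n_clean + n_noisy
        loss = loss + weights[m] * (
            (n_clean / total) * local_loss(model, D_clean)
            + (n_noisy / total) * local_loss(model, D_noisy))
    return loss
```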

The local dataset $D_m$ of the $m$-th noisy client contains corrupted samples obtained by applying a randomly selected transformation $\tau$ from the set of data transformations $\mathbf{T}$ at a specific severity level $\xi \in \{\text{low}, \text{medium}, \text{high}\}$:

$$ {D}_m^{\textrm{noisy}} = \left\{ \eta(x, \tau, \xi) \mid x \in D_m, \tau \in \mathbf{T} \right\}. $$

The noise level of client $m$ is defined as

$$ \text{NL}_m = \frac{\mid{D}_m^{\textrm{noisy}}\mid}{\mid D_m\mid}, $$

i.e., the fraction of noisy samples in the entire local dataset of client $m$.
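The corruption process can be sketched as follows, assuming a generic list of transformation callables `T` and a hypothetical mapping from severity labels to corruption strength; the concrete transformations and severities are those defined in the paper, not the ones hard-coded here.

```python
import random

SEVERITY = {"low": 1, "medium": 3, "high": 5}  # hypothetical severity-to-strength mapping

def corrupt_client_data(D_m, T, xi, noise_fraction):
    """Apply a randomly chosen transformation tau from T at severity xi
    to a `noise_fraction` share of client m's samples.

    D_m: list of (x, y) samples; T: list of callables tau(x, severity).
    Returns (clean part, noisy part, NL_m)."""
    n_noisy = int(noise_fraction * len(D_m))
    noisy_idx = set(random.sample(range(len(D_m)), n_noisy))
    D_clean, D_noisy = [], []
    for i, (x, y) in enumerate(D_m):
        if i in noisy_idx:
            tau = random.choice(T)                      # randomly selected transformation
            D_noisy.append((tau(x, SEVERITY[xi]), y))   # eta(x, tau, xi)
        else:
            D_clean.append((x, y))
    noise_level = len(D_noisy) / len(D_m)               # NL_m
    return D_clean, D_noisy, noise_level
```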

Gradient Norm Analysis

Figures: visualization of gradient norms for models trained on clean vs. noisy data, on the CIFAR-10 and PathMNIST datasets.

As shown in the figures above, the gradient norm effectively captures the relationship between the input features and the model’s output: models trained on noisy inputs exhibit a gradient-norm distribution clearly distinct from that of models trained on clean data. We apply K-means clustering to all client gradient-norm vectors to form two clusters, compute each cluster's centroid, and label the cluster with the higher centroid value as 'clean' and the other as 'noisy'.
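A minimal sketch of this one-shot identification step, assuming the server has collected a fixed-length vector of scalar gradient norms from each client during the first round; names and data structures are illustrative, not the released implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def identify_noisy_clients(client_grad_norms):
    """client_grad_norms: {client_id: 1-D array of scalar gradient norms}
    (assumed to have the same length for every client).
    Returns the set of client ids flagged as noisy."""
    ids = list(client_grad_norms)
    X = np.stack([client_grad_norms[c] for c in ids])   # one norm vector per client
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    centroid_mag = km.cluster_centers_.mean(axis=1)      # average norm per cluster
    clean_cluster = int(np.argmax(centroid_mag))         # higher-centroid cluster -> 'clean'
    return {c for c, lbl in zip(ids, km.labels_) if lbl != clean_cluster}
```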

Results

We first compare the average accuracy over three independent runs on different datasets under clean and noisy client-data scenarios. In the noisy scenario, we use 5 clean clients and 15 noisy clients with a 100% noise level; all models are trained with FedAvg.

| Data | CIFAR-10 (IID) | CIFAR-10 (Non-IID) | CIFAR-100 (IID) | CIFAR-100 (Non-IID) | PathMNIST (IID) | PathMNIST (Non-IID) | FMNIST (IID) | FMNIST (Non-IID) | EuroSAT (IID) | EuroSAT (Non-IID) | Tiny-ImageNet (IID) | Tiny-ImageNet (Non-IID) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clean | 90.14% | 85.52% | 64.79% | 62.36% | 87.74% | 82.55% | 92.34% | 89.37% | 94.72% | 95.12% | 53.26% | 52.88% |
| Noisy | 78.62% | 73.51% | 44.58% | 42.10% | 54.80% | 52.14% | 88.14% | 84.67% | 67.39% | 75.06% | 24.32% | 22.90% |

As shown in the table below, all of the considered aggregation strategies exhibit a general trend of improvement simply by plugging FedNS into them; a minimal sketch of this plug-in aggregation follows the table.

| Methods | CIFAR-10 (IID) | CIFAR-10 (Non-IID) | CIFAR-100 (IID) | CIFAR-100 (Non-IID) | PathMNIST (IID) | PathMNIST (Non-IID) | FMNIST (IID) | FMNIST (Non-IID) | EuroSAT (IID) | EuroSAT (Non-IID) | Tiny-ImageNet (IID) | Tiny-ImageNet (Non-IID) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FedAvg | 78.62% | 73.51% | 44.58% | 42.10% | 54.80% | 52.14% | 88.14% | 84.67% | 67.39% | 75.06% | 24.32% | 22.90% |
| + FedNS (Ours) | 81.67% | 78.44% | 48.14% | 45.94% | 63.89% | 62.92% | 89.61% | 88.53% | 78.22% | 80.12% | 27.85% | 25.93% |
| FedProx | 79.89% | 78.13% | 46.75% | 45.17% | 57.28% | 56.27% | 87.15% | 86.96% | 70.83% | 76.64% | 24.90% | 23.76% |
| + FedNS (Ours) | 82.31% | 81.18% | 48.27% | 46.80% | 60.18% | 63.11% | 89.12% | 87.48% | 76.94% | 81.20% | 26.48% | 25.98% |
| FedTrimmedAvg | 78.92% | 77.24% | 41.81% | 41.25% | 56.34% | 54.50% | 90.09% | 89.95% | 68.30% | 74.39% | 16.97% | 15.48% |
| + FedNS (Ours) | 82.63% | 82.47% | 49.11% | 48.32% | 64.27% | 63.04% | 90.29% | 91.57% | 83.81% | 80.50% | 29.43% | 27.46% |
| FedNova | 81.45% | 82.16% | 49.48% | 48.24% | 55.36% | 51.04% | 90.65% | 89.68% | 73.54% | 66.29% | 28.62% | 27.24% |
| + FedNS (Ours) | 88.65% | 88.34% | 59.19% | 59.17% | 80.82% | 81.89% | 90.57% | 91.50% | 93.31% | 92.70% | 48.50% | 46.16% |
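As a rough illustration of how such a plug-in can sit on top of FedAvg-style aggregation, the sketch below simply excludes clients flagged as noisy before the size-weighted parameter average; the actual re-weighting used by FedNS may differ, so treat this as an assumption-laden simplification rather than the paper's method.

```python
import numpy as np

def noise_aware_aggregate(client_updates, client_sizes, noisy_clients):
    """client_updates: {client_id: list of np.ndarray model parameters},
    client_sizes:   {client_id: local dataset size},
    noisy_clients:  set of ids flagged by the gradient-norm step.
    Returns the aggregated parameter list."""
    kept = [c for c in client_updates if c not in noisy_clients] or list(client_updates)
    total = sum(client_sizes[c] for c in kept)
    weights = {c: client_sizes[c] / total for c in kept}   # FedAvg-style size weights
    agg = [np.zeros_like(p) for p in next(iter(client_updates.values()))]
    for c in kept:
        for i, p in enumerate(client_updates[c]):
            agg[i] += weights[c] * p
    return agg
```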

We also evaluate FedNS under more complex noise conditions, including mixed distortions and patch-based noise, and observe consistent performance improvements across all datasets. Additionally, FedNS effectively mitigates real-world human annotation label noise, improving accuracy in all experiments.

Mixed noise conditions

| Dataset | Clean | Noisy (FedNova) | Noisy (FedNova + NS) |
|---|---|---|---|
| CIFAR-10 | 90.14% | 80.10% | 85.83% (↑ 5.73%) |
| CIFAR-100 | 64.79% | 45.28% | 51.47% (↑ 6.19%) |
| PathMNIST | 92.34% | 65.19% | 87.81% (↑ 22.62%) |
| Tiny-ImageNet | 53.26% | 29.26% | 43.30% (↑ 14.04%) |
Real-world Label Noise

| Methods | CIFAR-10N (IID) | CIFAR-10N (Non-IID) | CIFAR-100N (IID) | CIFAR-100N (Non-IID) |
|---|---|---|---|---|
| FedAvg | 76.06% | 69.52% | 52.61% | 52.57% |
| + FedNS (Ours) | 78.87% | 71.14% | 54.06% | 53.85% |
| FedNova | 76.06% | 69.26% | 53.32% | 52.41% |
| + FedNS (Ours) | 84.38% | 78.65% | 55.54% | 54.80% |

By mitigating both input corruption and label noise, FedNS proves to be a valuable tool for practitioners managing real-world datasets with multiple types of data imperfections.

BibTeX

@inproceedings{li2024collaboratively,
  title={Collaboratively Learning Federated Models from Noisy Decentralized Data},
  author={Li, Haoyuan and Funk, Mathias and G{\"u}rel, Nezihe Merve and Saeed, Aaqib},
  booktitle={2024 IEEE International Conference on Big Data (BigData)},
  pages={7879--7888},
  year={2024},
  organization={IEEE}
}