We first compare average accuracy over three independent runs on each dataset under clean and noisy client-data scenarios. In the noisy scenario, 5 clients hold clean data and 15 clients hold fully corrupted data (100% noise level); a sketch of this client setup follows the table. All models are trained with FedAvg.
| Data | CIFAR-10 IID | CIFAR-10 Non-IID | CIFAR-100 IID | CIFAR-100 Non-IID | PathMNIST IID | PathMNIST Non-IID | FMNIST IID | FMNIST Non-IID | EuroSAT IID | EuroSAT Non-IID | Tiny-ImageNet IID | Tiny-ImageNet Non-IID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clean | 90.14% | 85.52% | 64.79% | 62.36% | 87.74% | 82.55% | 92.34% | 89.37% | 94.72% | 95.12% | 53.26% | 52.88% |
| Noisy | 78.62% | 73.51% | 44.58% | 42.10% | 54.80% | 52.14% | 88.14% | 84.67% | 67.39% | 75.06% | 24.32% | 22.90% |
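The client split above can be pictured with the minimal sketch below. It assumes additive Gaussian pixel noise as the corruption; `corrupt`, `build_client_data`, and the noise parameters are illustrative placeholders rather than the exact corruption pipeline used in the experiments.

```python
import numpy as np

# Sketch of the noisy-client setup: 20 clients in total, 5 clean and 15 whose
# inputs are fully corrupted (100% noise level). Gaussian noise is an assumption.
NUM_CLIENTS, NUM_CLEAN = 20, 5
rng = np.random.default_rng(0)

def corrupt(images, noise_level=1.0, sigma=0.5):
    """Corrupt a `noise_level` fraction of a client's images with additive Gaussian noise."""
    images = images.copy()  # float images in [0, 1]
    idx = rng.choice(len(images), size=int(noise_level * len(images)), replace=False)
    images[idx] = np.clip(images[idx] + rng.normal(0.0, sigma, images[idx].shape), 0.0, 1.0)
    return images

def build_client_data(client_shards):
    """`client_shards`: one image array per client (from an IID or non-IID split)."""
    assert len(client_shards) == NUM_CLIENTS
    return [shard if cid < NUM_CLEAN else corrupt(shard, noise_level=1.0)
            for cid, shard in enumerate(client_shards)]
```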
As shown in the table below, all aggregation methods show a general trend of improvement simply by plugging FedNS into them; a sketch of what this plug-in step looks like follows the table.
| Methods | CIFAR-10 IID | CIFAR-10 Non-IID | CIFAR-100 IID | CIFAR-100 Non-IID | PathMNIST IID | PathMNIST Non-IID | FMNIST IID | FMNIST Non-IID | EuroSAT IID | EuroSAT Non-IID | Tiny-ImageNet IID | Tiny-ImageNet Non-IID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FedAvg | 78.62% | 73.51% | 44.58% | 42.10% | 54.80% | 52.14% | 88.14% | 84.67% | 67.39% | 75.06% | 24.32% | 22.90% |
| + FedNS (Ours) | 81.67% | 78.44% | 48.14% | 45.94% | 63.89% | 62.92% | 89.61% | 88.53% | 78.22% | 80.12% | 27.85% | 25.93% |
| FedProx | 79.89% | 78.13% | 46.75% | 45.17% | 57.28% | 56.27% | 87.15% | 86.96% | 70.83% | 76.64% | 24.90% | 23.76% |
| + FedNS (Ours) | 82.31% | 81.18% | 48.27% | 46.80% | 60.18% | 63.11% | 89.12% | 87.48% | 76.94% | 81.20% | 26.48% | 25.98% |
| FedTrimmedAvg | 78.92% | 77.24% | 41.81% | 41.25% | 56.34% | 54.50% | 90.09% | 89.95% | 68.30% | 74.39% | 16.97% | 15.48% |
| + FedNS (Ours) | 82.63% | 82.47% | 49.11% | 48.32% | 64.27% | 63.04% | 90.29% | 91.57% | 83.81% | 80.50% | 29.43% | 27.46% |
| FedNova | 81.45% | 82.16% | 49.48% | 48.24% | 55.36% | 51.04% | 90.65% | 89.68% | 73.54% | 66.29% | 28.62% | 27.24% |
| + FedNS (Ours) | 88.65% | 88.34% | 59.19% | 59.17% | 80.82% | 81.89% | 90.57% | 91.50% | 93.31% | 92.70% | 48.50% | 46.16% |
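The "plug-in" usage can be pictured as a per-client reweighting step applied before the base aggregator combines the updates. The sketch below is only an illustration under that assumption: `noise_scores` stands in for whatever per-client score FedNS derives (its computation is not described in this section), and `aggregate_with_plugin` is a hypothetical helper, not the authors' implementation.

```python
import torch

def aggregate_with_plugin(client_updates, client_sizes, noise_scores):
    """FedAvg-style aggregation with a pluggable per-client reweighting step.

    client_updates: list of state dicts (parameter name -> tensor), one per client.
    client_sizes:   number of local samples per client (standard FedAvg weighting).
    noise_scores:   per-client score in [0, 1] (higher = cleaner); a placeholder
                    for whatever FedNS computes -- not shown here.
    """
    w = torch.tensor([s * q for s, q in zip(client_sizes, noise_scores)], dtype=torch.float32)
    w = w / w.sum()  # normalized aggregation weights
    return {name: sum(wi * upd[name].float() for wi, upd in zip(w, client_updates))
            for name in client_updates[0]}
```

The same reweighting can sit in front of FedProx, FedTrimmedAvg, or FedNova by replacing the weighted sum with the corresponding base aggregator, which is what "plugging in" refers to above.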
We further evaluate FedNS under more complex noise conditions, including distortions and patch-based noise, and observe consistent improvements across all datasets (an illustrative corruption sketch follows the first table below). FedNS also mitigates real-world human annotation label noise, improving accuracy in every experiment.
Mixed noise conditions

| Data | Clean | FedNova (Noisy) | FedNova+NS (Noisy) |
|---|---|---|---|
| CIFAR-10 | 90.14% | 80.10% | 85.83% (↑ 5.73%) |
| CIFAR-100 | 64.79% | 45.28% | 51.47% (↑ 6.19%) |
| PathMNIST | 92.34% | 65.19% | 87.81% (↑ 22.62%) |
| Tiny-ImageNet | 53.26% | 29.26% | 43.30% (↑ 14.04%) |
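For concreteness, the mixed-noise setting can be pictured as each noisy image receiving either a distortion or a patch-style occlusion. The snippet below is only an illustration of that idea; the specific corruption types, patch sizes, and probabilities used in the experiments are not specified here, and the values shown are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_corruption(image, distort_prob=0.5, patch_size=8):
    """Apply either a distortion or a patch-based occlusion to one H x W x C image in [0, 1].

    Illustrative only: Gaussian noise stands in for the distortion family and a black
    square stands in for the patch-based noise; the actual corruptions may differ.
    """
    image = image.copy()
    if rng.random() < distort_prob:
        image = np.clip(image + rng.normal(0.0, 0.3, image.shape), 0.0, 1.0)
    else:
        h, w = image.shape[:2]
        y = rng.integers(0, h - patch_size)
        x = rng.integers(0, w - patch_size)
        image[y:y + patch_size, x:x + patch_size] = 0.0
    return image
```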
Real-world label noise

| Methods | CIFAR-10N IID | CIFAR-10N Non-IID | CIFAR-100N IID | CIFAR-100N Non-IID |
|---|---|---|---|---|
| FedAvg | 76.06% | 69.52% | 52.61% | 52.57% |
| + FedNS (Ours) | 78.87% | 71.14% | 54.06% | 53.85% |
| FedNova | 76.06% | 69.26% | 53.32% | 52.41% |
| + FedNS (Ours) | 84.38% | 78.65% | 55.54% | 54.80% |
By mitigating both input corruption and label noise, FedNS proves to be a valuable tool for practitioners managing real-world datasets with multiple types of data imperfections.