Scalable Federated Learning for Massive Medical Image Classification: Tackling Noisy and Imbalanced Data
Received: 10 May 2025 | Revised: 7 June 2025, 29 June 2025, and 3 July 2025 | Accepted: 5 July 2025 | Online: 9 August 2025
Corresponding author: Hadjir Zemmouri
Abstract
The proliferation of medical imaging data, coupled with stringent privacy regulations, calls for scalable and reliable classification methods that can cope with big data. Motivated by the need for accurate pneumonia detection from chest X-rays (CXRs) across diverse clinical settings, this research addresses the five fundamental challenges of big data: volume, velocity, variety, value, and veracity. In such scenarios, traditional centralized methods are often constrained by data heterogeneity, computational bottlenecks, and privacy risks. To overcome these constraints, we propose a Federated Learning (FL) system that distributes model training across multiple clients, so that raw images remain on each client while large-scale datasets are still handled efficiently. Our method leverages transfer learning by fine-tuning a pre-trained VGG11 model and employs FedProx regularization to mitigate client drift arising from non-Independent and Identically Distributed (non-IID) data. Furthermore, we introduce a data partitioning technique that simulates real-world conditions by generating imbalanced label distributions with a Dirichlet distribution and injecting Gaussian noise to mimic variations in image quality. Through distributed local training and dynamic learning rate adjustment, our approach manages high-volume, high-velocity data while preserving data privacy. Experimental results show that the proposed method efficiently aggregates diverse and noisy client updates while achieving competitive performance in pneumonia classification.
Keywords:
federated learning, big data classification, distributed training, class imbalance, pneumonia
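As a rough illustration of the three ingredients named in the abstract (Dirichlet-based label skew, Gaussian noise injection, and the FedProx proximal term), the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the function names, the toy labels, and the values of alpha, sigma, and mu are all assumptions chosen for illustration, and a real experiment would operate on image tensors and model parameters rather than plain lists.

```python
import random


def dirichlet_proportions(num_clients, alpha, rng):
    """Sample one symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
    total = sum(draws) or 1.0  # guard against an all-zero draw for tiny alpha
    return [d / total for d in draws]


def partition_by_label(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients; smaller alpha gives stronger label skew."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        props = dirichlet_proportions(num_clients, alpha, rng)
        start, cum = 0, 0.0
        for c, p in enumerate(props):
            cum += p
            # The last client takes the remainder so every index is assigned once.
            end = len(idx) if c == num_clients - 1 else round(cum * len(idx))
            clients[c].extend(idx[start:end])
            start = end
    return clients


def add_gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise to pixel values in [0, 1] and clip the result."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def fedprox_penalty(local_weights, global_weights, mu):
    """FedProx proximal term (mu / 2) * ||w - w_global||^2, added to the local loss."""
    sq_dist = sum((w - g) ** 2 for w, g in zip(local_weights, global_weights))
    return 0.5 * mu * sq_dist


# Toy data (assumed): 30 "normal" and 70 "pneumonia" labels, split over 4 clients.
labels = [0] * 30 + [1] * 70
clients = partition_by_label(labels, num_clients=4, alpha=0.5, seed=42)

# Hypothetical noisy copy of a flattened 4-pixel image.
noisy = add_gaussian_noise([0.0, 0.5, 0.9, 1.0], sigma=0.1, rng=random.Random(1))

# Penalty for a client whose weights drifted away from the global model.
penalty = fedprox_penalty([1.0, 2.0], [0.0, 0.0], mu=0.1)
```

In this sketch, lowering alpha concentrates each class on fewer clients, raising sigma degrades the simulated image quality, and a larger mu penalizes local weights more strongly for moving away from the global model.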
License
Copyright (c) 2025 Hadjir Zemmouri, Akram Kout, Said Labed

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.