A Scalable Big Data-Driven Distributed Deep Learning Framework for Breast Cancer Diagnosis Using Big Data Analytics

Authors

  • Muhammad Babar, Robotics and Internet-of-Things Laboratory, Prince Sultan University, Saudi Arabia
  • Sarah Kaleem, EIAS Data Science Lab, Prince Sultan University, Saudi Arabia
  • Mohammed El-Affendi, College of Computer and Information Sciences, Prince Sultan University, Saudi Arabia
  • Zahid Khan, Robotics and Internet-of-Things Laboratory, Prince Sultan University, Saudi Arabia
Volume: 15 | Issue: 5 | Pages: 27557-27562 | October 2025 | https://doi.org/10.48084/etasr.12485

Abstract

The accurate and early detection of breast cancer remains a significant challenge in medical diagnostics, primarily due to the complexity of histopathological images and the large volume of data involved. This paper presents a novel hybrid deep learning framework that combines Big Data Analytics (BDA) and Convolutional Neural Networks (CNNs) to improve the accuracy of breast cancer detection. The proposed system integrates three deep learning architectures (VGG16, VGG19, and ResNet50) trained in parallel across distributed nodes using Apache Spark, thereby accelerating computation and enabling scalable learning. This study used the BreakHis dataset, which contains 15,918 original images collected at four magnifications. To improve generalization and class balance, extensive data augmentation and patch extraction were applied, which expanded the dataset to approximately 275,000 training samples. The hybrid model achieved high precision, recall, and F1-scores relative to existing benchmarks. Key performance indicators, such as accuracy, specificity, and sensitivity, confirm the effectiveness of the model in distinguishing between benign and malignant cases. Unlike traditional monolithic CNN approaches, the proposed system uses distributed processing to reduce training time while efficiently handling massive datasets.
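
The evaluation measures named in the abstract (accuracy, sensitivity, specificity, precision, and F1-score) can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The names BinaryMetrics and binary_metrics, and the label encoding (malignant = 1, benign = 0), are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # recall on the positive (assumed malignant) class
    specificity: float  # recall on the negative (assumed benign) class
    precision: float
    f1: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> BinaryMetrics:
    """Compute standard binary classification metrics.

    Assumption: `positive` marks the malignant class; every other
    label value is treated as benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1

    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    # F1 is the harmonic mean of precision and sensitivity (recall).
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if (precision + sensitivity)
        else 0.0
    )
    return BinaryMetrics(accuracy, sensitivity, specificity, precision, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    truth = [1, 1, 1, 1, 0, 0]
    predicted = [1, 1, 1, 0, 1, 0]
    print(binary_metrics(truth, predicted))
```

In a diagnostic setting, sensitivity and specificity are usually reported alongside accuracy because a missed malignant case (false negative) and a false alarm on a benign case (false positive) carry different costs.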

Keywords:

big data, breast cancer, distributed learning, deep CNN

How to Cite

[1] M. Babar, S. Kaleem, M. El-Affendi, and Z. Khan, “A Scalable Big Data-Driven Distributed Deep Learning Framework for Breast Cancer Diagnosis Using Big Data Analytics,” Eng. Technol. Appl. Sci. Res., vol. 15, no. 5, pp. 27557–27562, Oct. 2025.
