Enhancing Regression Accuracy and Reliability with Ensemble Learning and Iterative Hyperparameter Optimization
Received: 3 April 2025 | Revised: 23 April 2025, 5 May 2025, and 12 May 2025 | Accepted: 15 May 2025 | Online: 30 May 2025
Corresponding author: Thanh Ngoc Tran
Abstract
Ensemble learning models such as XGBoost, LightGBM, and CatBoost are gaining popularity for regression prediction. Because their accuracy depends heavily on hyperparameters, identifying the optimal hyperparameter values plays a crucial role. This study proposes a new approach that integrates ensemble learning models with hyperparameter optimization (HPO) algorithms and assesses their effectiveness against traditional machine learning models such as Random Forest (RF), Support Vector Regression (SVR), and K-Nearest Neighbors (KNN). The hyperparameter optimization process was also repeated multiple times on the Boston Housing Price and Bike Sharing Demand datasets, supported by statistical analysis of the results, to make the findings more reliable than those of a single run. The results demonstrate that the ensemble learning models reduce prediction errors and thus achieve higher accuracy than the traditional machine learning models. Furthermore, the comparisons indicate that repeating the optimization n times improves the reliability of the results.
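The protocol the abstract describes pairs a boosted-tree regressor with an HPO algorithm and repeats the entire search n times, then summarizes the error statistically across runs. The following is a minimal sketch of that idea, assuming XGBoost with scikit-learn's randomized search; the search space, number of repetitions, and synthetic stand-in data are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: repeat hyperparameter optimization n times and summarize
# the test error statistically, rather than trusting a single run.
# The search space, optimizer, and data below are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from xgboost import XGBRegressor

# Synthetic stand-in for the paper's datasets (Boston Housing, Bike Sharing).
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical search space; the paper's exact ranges are not reproduced here.
param_space = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.7, 0.85, 1.0],
}

n_repeats = 10  # repeat the whole optimization n times
rmse_runs = []
for seed in range(n_repeats):
    search = RandomizedSearchCV(
        XGBRegressor(random_state=seed),
        param_distributions=param_space,
        n_iter=20,
        cv=5,
        scoring="neg_mean_squared_error",
        random_state=seed,
    )
    search.fit(X_train, y_train)
    y_pred = search.best_estimator_.predict(X_test)
    rmse_runs.append(np.sqrt(mean_squared_error(y_test, y_pred)))

# Statistics across repetitions, as the study advocates over a single run.
print(f"RMSE over {n_repeats} runs: mean={np.mean(rmse_runs):.3f}, "
      f"std={np.std(rmse_runs):.3f}, min={np.min(rmse_runs):.3f}")
```

Reporting the mean and standard deviation across the n repetitions, instead of a single best score, is what distinguishes this protocol from a one-shot optimization.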
Keywords:
XGBoost, LightGBM, CatBoost, HPO Algorithms, ML
License
Copyright (c) 2025 Thi Giang Thanh Tran, Thanh Ngoc Tran

This work is licensed under a Creative Commons Attribution 4.0 International License.
