An Automatic Grading System for Arabic Language Short-Answer Questions Using Deep Learning

Afnan Alqurashi; Basma Alharbi; Sahar Sabbeh

doi:10.48084/etasr.10917

Authors

Afnan Alqurashi, Computer Science and Artificial Intelligence Department, College of Computer Science and Engineering, University of Jeddah, Saudi Arabia
Basma Alharbi, Computer Science and Artificial Intelligence Department, College of Computer Science and Engineering, University of Jeddah, Saudi Arabia
Sahar Sabbeh, Information Science and Technology Department, College of Computer Science and Engineering, University of Jeddah, Saudi Arabia; Department of Information Systems, Faculty of Computer Science and Artificial Intelligence, Benha University, Egypt

Volume: 15 | Issue: 5 | Pages: 26665-26675 | October 2025 | https://doi.org/10.48084/etasr.10917

Received: 11 March 2025 | Revised: 30 April 2025 and 23 June 2025 | Accepted: 27 June 2025 | Online: 6 October 2025

Corresponding author: Afnan Alqurashi

Abstract

Assessing students' acquired knowledge is a core objective of the educational process. Automating this task can improve the quality of the evaluation and reduce the time, effort, and cost associated with manual grading. Automated Short Answer Grading (ASAG) systems aim to support this goal by providing accurate scoring and clear feedback to students. The ability to explain assigned scores is crucial for the real-world deployment of automated systems, ensuring transparency and trust. This work introduces an Arabic ASAG system designed to combine automated scoring with explainability. Several deep learning models and embedding techniques are evaluated, including BERT, LSTM, and attention mechanisms. Experiments are conducted on three datasets: AR-ASAG, a dedicated Arabic ASAG dataset; a manually translated Arabic version of the PT-ASAG dataset (originally in Portuguese); and a merged dataset combining both. The results highlight the effectiveness of the BERT model, which achieved Pearson correlation coefficients ranging from 0.811 to 0.923 and a minimum RMSE of 0.182. Prediction results are also interpreted to improve both the explainability and reliability of the system.
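The two metrics reported above, the Pearson correlation coefficient and the root mean squared error (RMSE), both compare model-predicted scores with human-assigned scores. The following is a minimal sketch of how they can be computed, not the authors' evaluation code. The function names and the sample scores are hypothetical, and scaling the scores to the range [0, 1] is an assumption made only for illustration.

```python
import math


def pearson_r(human, predicted):
    """Pearson correlation between two equal-length score lists."""
    n = len(human)
    mean_h = sum(human) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((h - mean_h) * (p - mean_p) for h, p in zip(human, predicted))
    var_h = sum((h - mean_h) ** 2 for h in human)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_h * var_p)


def rmse(human, predicted):
    """Root mean squared error between two equal-length score lists."""
    n = len(human)
    return math.sqrt(sum((h - p) ** 2 for h, p in zip(human, predicted)) / n)


# Hypothetical scores (assumed to be scaled to [0, 1]), for illustration only.
human_scores = [0.0, 0.25, 0.5, 0.75, 1.0]
model_scores = [0.1, 0.2, 0.55, 0.7, 0.9]

print("Pearson r:", pearson_r(human_scores, model_scores))
print("RMSE:", rmse(human_scores, model_scores))
```

A higher Pearson coefficient indicates that the model ranks and spaces answers similarly to human graders, while a lower RMSE indicates smaller absolute deviations from the human scores. The two are complementary: a model can correlate well with human scores while still being systematically offset from them.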

Keywords:

ASAG, short answer questions, Arabic language, short answer grading, deep learning, RNN, LSTM, BERT, explainable AI


