An Automatic Grading System for Arabic Language Short-Answer Questions Using Deep Learning
Received: 11 March 2025 | Revised: 30 April 2025 and 23 June 2025 | Accepted: 27 June 2025 | Online: 6 October 2025
Corresponding author: Afnan Alqurashi
Abstract
Assessing students' acquired knowledge is a core objective of the educational process. Automating this task can improve the quality of the evaluation and reduce the time, effort, and cost associated with manual grading. Automated Short Answer Grading (ASAG) systems aim to support this goal by providing accurate scoring and clear feedback to students. The ability to explain assigned scores is crucial for the real-world deployment of automated systems, ensuring transparency and trust. This work introduces an Arabic ASAG system designed to combine automated scoring with explainability. Several deep learning models and embedding techniques are evaluated, including BERT, LSTM, and attention mechanisms. Experiments are conducted on three datasets: AR-ASAG, a dedicated Arabic ASAG dataset; a manually translated Arabic version of the PT-ASAG dataset (originally in Portuguese); and a merged dataset combining both. The results highlight the effectiveness of the BERT model, which achieved strong performance with Pearson correlation coefficients from 0.811 to 0.923 and a minimum RMSE of 0.182. Prediction results are also interpreted to improve both the explainability and reliability of the system.
Keywords:
ASAG, short answer questions, Arabic language, short answer grading, deep learning, RNN, LSTM, BERT, explainable AI
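The abstract reports results as Pearson correlation coefficients and RMSE between predicted and human-assigned scores. As a minimal sketch of how these two metrics are typically computed, the following uses plain Python. The function names and the sample scores are illustrative assumptions and do not come from the paper; the paper's own score scale and evaluation code are not given here.

```python
import math


def pearson(predicted, reference):
    """Pearson correlation between two equal-length score lists."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Sum of products of deviations from each mean (unnormalized covariance).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_r)


def rmse(predicted, reference):
    """Root mean squared error between two equal-length score lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Hypothetical scores on an assumed 0-1 scale, for illustration only.
    human = [0.2, 0.5, 0.8, 1.0, 0.4]
    model = [0.3, 0.5, 0.7, 0.9, 0.5]
    print("Pearson r:", pearson(model, human))
    print("RMSE:", rmse(model, human))
```

A higher Pearson coefficient indicates that predicted scores rise and fall with the human scores, while a lower RMSE indicates that the predictions are close to the human scores in absolute terms, so the two metrics are usually read together.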
License
Copyright (c) 2025 Afnan Alqurashi, Basma Alharbi, Sahar Sabbeh

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.