MSAFF: Multi-Script Adaptive Feature Fusion for Script Identification
Received: 26 April 2025 | Revised: 15 May 2025, 28 May 2025, and 2 June 2025 | Accepted: 6 June 2025 | Online: 2 August 2025
Corresponding author: Suneel C. Shinde
Abstract
Script identification in multilingual documents remains a critical challenge in document analysis, especially for Indian languages, where multiple scripts often coexist within the same document. This paper proposes MSAFF (Multi-Script Adaptive Feature Fusion), a novel framework that addresses this complexity by dynamically integrating three discriminative feature sets: Local Binary Patterns (LBP), Horizontal Projection Profiles (HPP), and Histogram of Oriented Gradients (HOG). MSAFF employs an adaptive fusion mechanism that adjusts feature weights according to the granularity of the input, enabling robust script recognition at several textual levels, including blocks, lines, words, numerals, and alphanumeric strings. To classify scripts and handle transitions between them in mixed-script environments, MSAFF uses a hybrid classification strategy that combines Support Vector Machines (SVMs) for initial script identification with Hidden Markov Models (HMMs) for modeling sequential script transitions. Evaluations on the MDIW-13 dataset show an overall accuracy of 92%, with the strongest results in text block-level identification (96%) and mixed-script transition detection (>85%). The method is also resilient to document degradation, maintaining accuracy of 90% under noise and 88% under skew. Additionally, MSAFF is computationally efficient, outperforming state-of-the-art techniques in processing speed across varying input sizes.
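The two ideas named in the abstract, granularity-dependent feature weighting and HMM-based modeling of script transitions, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the weight table `GRANULARITY_WEIGHTS`, the function names `fuse_features` and `viterbi_smooth`, and all numeric values are hypothetical. Feature extraction and SVM training are not shown; the per-segment log scores passed to `viterbi_smooth` stand in for the output of an SVM stage.

```python
import numpy as np

# Hypothetical per-granularity weights for (LBP, HPP, HOG).
# Illustrative values only; they are not taken from the paper.
GRANULARITY_WEIGHTS = {
    "block": (0.5, 0.2, 0.3),
    "line": (0.3, 0.4, 0.3),
    "word": (0.3, 0.2, 0.5),
}


def l2_normalize(v):
    """Scale a feature vector to unit L2 norm (zero vectors are returned unchanged)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def fuse_features(lbp, hpp, hog, granularity):
    """Concatenate normalized feature sets, each scaled by a granularity-specific weight."""
    w_lbp, w_hpp, w_hog = GRANULARITY_WEIGHTS[granularity]
    return np.concatenate([
        w_lbp * l2_normalize(lbp),
        w_hpp * l2_normalize(hpp),
        w_hog * l2_normalize(hog),
    ])


def viterbi_smooth(log_emissions, log_transitions, log_prior):
    """Return the most likely script sequence for T segments and K scripts.

    log_emissions: (T, K) per-segment log scores (assumed to come from a classifier).
    log_transitions: (K, K) log probabilities, row = previous script, column = next.
    log_prior: (K,) log probabilities of the first segment's script.
    """
    log_emissions = np.asarray(log_emissions, dtype=float)
    T, K = log_emissions.shape
    score = log_prior + log_emissions[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_transitions  # cand[i, j]: best path ending i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_emissions[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

In this sketch, a fused vector would be passed to a classifier, and the classifier's per-segment scores would then be smoothed by `viterbi_smooth`, so that a single weakly scored segment does not introduce a spurious script switch when the transition matrix favors staying in the same script.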
Keywords:
script identification, feature fusion, document analysis, multilingual documents, Indian scripts, SVM, HMM
License
Copyright (c) 2025 Suneel C. Shinde, V. S. Malemath, Joshi B. Vinayak, Anilkumar C. Korishetti, R. Gogulan

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
