Enhanced Square Fiducial Marker Recognition under Challenging Visual Environments Using Multi-Scale CNN-Transformer Fusion
Received: 11 June 2025 | Revised: 11 July 2025, 23 July 2025, and 4 August 2025 | Accepted: 11 August 2025 | Online: 2 September 2025
Corresponding author: Liliek Triyono
Abstract
Recent deep learning methods have demonstrated promising results in object recognition in low-light images. However, existing techniques often struggle with distortion and occlusion, and many rely on convolutional neural network (CNN) architectures, which are limited in their ability to capture long-range dependencies. This frequently leads to inadequate recovery of very dark regions in low-light images. This work introduces a novel Transformer-based method for ArUco marker recognition in low-light environments, termed the Extreme ArUco Vision Transformer (XAViT). We present a Transformer–CNN hybrid block that uses mixed attention to capture both global and local information, combining the Transformer's capacity to model long-range dependencies with the CNN's proficiency at extracting fine-grained features, and thereby enabling reliable detection of ArUco markers even under extreme lighting conditions. Additionally, we employ a Swin-Transformer discriminator to selectively enhance different regions of low-light images, alleviating overexposure, underexposure, and noise. Comprehensive experiments show that XAViT achieves 99.16% accuracy, 97.86% recall, 97.95% precision, and a 97.89% F1-score on a realistic low-light dataset, outperforming state-of-the-art CNN and Transformer models. Moreover, its application to additional vision-based tasks underscores its potential for broader use in advanced vision applications.
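The abstract does not spell out the internals of the mixed-attention hybrid block, so the following is only a minimal PyTorch sketch of one plausible wiring: a depthwise-convolutional branch for local detail, a multi-head self-attention branch for global context, and a learned gate fusing the two. Every name here (MixedAttentionBlock, the gate, the projection) is an illustrative assumption, not the published XAViT design.

```python
# Minimal sketch of a Transformer-CNN hybrid block with mixed attention.
# All module and parameter names are illustrative assumptions; the actual
# XAViT architecture is not specified in the abstract.
import torch
import torch.nn as nn

class MixedAttentionBlock(nn.Module):
    """Fuses a convolutional (local) branch with a self-attention (global) branch."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depthwise conv captures fine-grained spatial detail.
        self.local = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),
            nn.BatchNorm2d(dim),
            nn.GELU(),
        )
        # Global branch: multi-head self-attention over flattened patches
        # models long-range dependencies across the whole feature map.
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable per-channel gate blends the two branches.
        self.gate = nn.Parameter(torch.zeros(1, dim, 1, 1))
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        # (B, C, H, W) -> (B, H*W, C) token sequence for attention.
        tokens = self.norm(x.flatten(2).transpose(1, 2))
        glob, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        # Sigmoid-gated fusion of local detail and global context,
        # added back to the input as a residual.
        g = torch.sigmoid(self.gate)
        return x + self.proj(g * glob + (1 - g) * local)

# Example: a hypothetical 64-channel feature map from a low-light encoder.
feat = torch.randn(2, 64, 32, 32)
out = MixedAttentionBlock(dim=64)(feat)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

The gated residual fusion is one common way to let the network weight global context against local detail per channel; the paper itself may combine the branches differently.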
Keywords:
low-light image, indoor navigation, computer vision, markers, assistive technology
License
Copyright (c) 2025 Liliek Triyono, Rahmat Gernowo, Prayitno, Eko Harry Pratisto

This work is licensed under a Creative Commons Attribution 4.0 International License.