Classification of Macromolecules Based on Amino Acid Sequences Using Deep Learning

S. Khan; I. Ali; F. Ghaffar; Q. Mazhar-ul-Haq

doi:10.48084/etasr.5230

Authors

S. Khan, Department of Computer Science, National Chengchi University, Taiwan
I. Ali, Department of Computer Science, University of Swat, Pakistan
F. Ghaffar, System Design Engineering Department, University of Waterloo, Canada
Q. Mazhar-ul-Haq, National Taipei University of Technology, Taiwan

Volume: 12 | Issue: 6 | Pages: 9491-9495 | December 2022 | https://doi.org/10.48084/etasr.5230

Received: 31 July 2022 | Revised: 26 August 2022 | Accepted: 28 August 2022 | Online: 20 September 2022

Corresponding author: S. Khan

Abstract

The classification of amino acids and their sequence analysis play a vital role in the life sciences and remain challenging tasks. Compared to traditional machine learning techniques, deep learning models offer well-established frameworks for solving a broad spectrum of complex learning problems. This article applies and compares state-of-the-art deep learning models, namely Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRUs), to solve macromolecule classification problems using amino acid sequences. The CNN extracts features from amino acid sequences, which are represented as vectors using word embeddings. These vectors are fed to the above-mentioned models to train robust classifiers. The results show that word2vec embedding combined with VGG-16 performs better than LSTM and GRU. The proposed approach achieves an error rate of 1.5%.
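As an illustration of how an amino acid sequence might be turned into "words" before embedding, the following minimal sketch splits sequences into overlapping k-mers and encodes them as padded integer ids, the usual input to an embedding layer. The k-mer size of 3, the padding scheme, and the helper names `kmer_tokens`, `build_vocab`, and `encode` are assumptions made for this example; the abstract does not specify the tokenization the authors used.

```python
from typing import Dict, List


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (k=3 is an assumed value)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(corpus: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an integer id; id 0 is reserved for padding."""
    vocab: Dict[str, int] = {}
    for tokens in corpus:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, then truncate or right-pad with 0 up to max_len.

    Tokens missing from the vocabulary also map to 0 in this sketch.
    """
    ids = [vocab.get(tok, 0) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


# Two short, made-up sequences in one-letter amino acid code.
sequences = ["MKTAYIAK", "MKTAGG"]
corpus = [kmer_tokens(s) for s in sequences]
vocab = build_vocab(corpus)
encoded = [encode(tokens, vocab, max_len=6) for tokens in corpus]

print(corpus[1])   # ['MKT', 'KTA', 'TAG', 'AGG']
print(encoded[1])  # [1, 2, 7, 8, 0, 0]
```

The integer ids carry no meaning on their own. In a pipeline like the one the abstract describes, an embedding such as word2vec would map each id to a dense vector, and the resulting matrix would be passed to the CNN, LSTM, or GRU classifier.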

Keywords:

CNN, LSTM, macromolecules, amino acid

