Transformer Encoder with Protein Language Model for Protein Secondary Structure Prediction

In bioinformatics, protein secondary structure prediction plays a significant role in understanding protein function and interactions. This study presents the TE_SS approach, which uses a transformer encoder-based model and the Ankh protein language model to predict protein secondary structures. The research focuses on predicting the nine structure classes defined by version 4 of the Dictionary of Secondary Structure of Proteins (DSSP). The model's performance was rigorously evaluated on various datasets. Additionally, this study compares the model with state-of-the-art methods for predicting eight structure classes. The findings reveal that TE_SS excels in nine- and three-class structure prediction while also performing strongly in the eight-class category. This is underscored by its Qs and SOV scores, demonstrating its capability to discern complex patterns in protein sequences. This advancement provides a significant tool for protein structure analysis, thereby enriching the field of bioinformatics.


INTRODUCTION
Proteins are made up of chains of amino acids. By varying their arrangement, 20 different types of amino acids can create a wide range of proteins. The primary structure of a protein is the one-dimensional (1D) sequence giving the specific order in which its amino acids are arranged [1]. Tertiary structures, also referred to as three-dimensional (3D) shapes, form in living organisms through interactions among amino acids, and these interactions play a crucial role in determining protein function [2]. Predicting secondary structure is an important step toward understanding the relationship between a protein's primary and tertiary structures [3]. Because experimental procedures are time-consuming and costly, efficient computational techniques for predicting protein structures have become essential for closing the gap between the number of known protein sequences and the number of determined structures [4]. These predictive models improve our understanding of protein functions and may be used in applications such as drug development and disease control [5].
Secondary structure in proteins refers to the local folding patterns that occur within a chain of amino acids as a result of forces such as hydrogen bonds and van der Waals interactions. To define secondary structure precisely, the Dictionary of Secondary Structure of Proteins (DSSP) was devised [6]. This program analyzes the atomic coordinates of proteins with known structures to identify hydrogen-bonding patterns and geometric characteristics, and it assigns a secondary structure type to each residue. The original classification consisted of eight classes: G (3₁₀-helix), H (α-helix), I (π-helix), B (isolated β-bridge), E (extended strand), S (bend), T (turn), and L (irregular structure). These classes are often grouped into three: H, G, and I into helix (H); B and E into strand (E); and T, S, and L into coil (C). The latest version, DSSP 4.0, released in 2021, is a significant update to protein secondary structure classification: it extends the conventional eight types with a ninth, the poly-proline helix (P) [7]. The task of protein secondary structure prediction (PSSP) involves assigning structure classes, such as alpha helices, beta sheets, and coils, to each individual amino acid in a protein chain. For computational methods to predict structure, amino acids must be represented as numeric vectors. The one-hot approach encodes each amino acid in a protein sequence as a 21-dimensional vector covering the 20 standard amino acids plus one symbol, X, for an unknown or unspecified amino acid. However, this representation has shown limited prediction accuracy. Another widely used technique relies on PSSM profile features [8] or HHM profile features [9], which incorporate information derived from sequence alignments against a large protein sequence database.
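The eight-to-three grouping and the 21-dimensional one-hot encoding described above can be sketched in plain Python. This is an illustrative sketch, not code from this study: the residue ordering in `ALPHABET` and the function names are assumptions, and because the grouping above covers only the eight original DSSP classes, the sketch deliberately raises an error for the ninth class (P) rather than guessing its three-state mapping.

```python
# Illustrative helpers (hypothetical names): DSSP 8-to-3 reduction and one-hot encoding.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # alpha-, 3-10- and pi-helix -> helix
    "B": "E", "E": "E",            # beta-bridge and extended strand -> strand
    "T": "C", "S": "C", "L": "C",  # turn, bend, irregular -> coil
}

# Assumed ordering: the 20 standard amino acids followed by X (unknown/nonstandard).
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def reduce_to_three(labels: str) -> str:
    """Map a string of eight-class DSSP labels to three-class H/E/C labels."""
    try:
        return "".join(EIGHT_TO_THREE[label] for label in labels)
    except KeyError as err:
        # P (DSSP 4) and X are deliberately left unmapped in this sketch.
        raise ValueError(f"no three-state mapping defined for label {err}") from None


def one_hot(sequence: str) -> list[list[int]]:
    """Encode each residue as a 21-dimensional one-hot vector."""
    unknown = ALPHABET.index("X")
    vectors = []
    for residue in sequence.upper():
        idx = ALPHABET.find(residue)
        vec = [0] * len(ALPHABET)
        vec[idx if idx >= 0 else unknown] = 1  # unrecognized letters map to X
        vectors.append(vec)
    return vectors


print(reduce_to_three("HGIBETSL"))  # HHHEECCC
print(len(one_hot("MKX")[0]))       # 21
```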
Generating Hidden Markov Models (HMMs) or Position-Specific Scoring Matrices (PSSMs) for each template sequence can be time-consuming, especially for proteins with few or no homologous sequences. To overcome this hurdle, recent advances have introduced protein representation techniques inspired by methods from natural language processing [10][11][12][13]. These techniques use pretrained protein language models that are then fine-tuned for specific tasks. Such models can perform well even when little task-specific data are available, because embeddings from a language model pretrained on a large corpus of protein sequences can effectively replace evolutionary information. This approach has produced encouraging results in several subsequent protein-related studies [4], [7], [14]-[19].
In the early stages of PSSP research, statistical approaches were predominant. These methods estimated the likelihood of amino acids appearing in particular structures [20]. Initially, these predictors were designed for three-class secondary structure prediction because of the training data and computational resources available at the time. However, they struggled to achieve high Q3 accuracy because they could extract only limited information from the primary sequence alone. To overcome this limitation, researchers began incorporating evolutionary information and position-specific scoring matrices into the prediction process. This advance proved significant, raising Q3 accuracy above 70% [21].
Various machine learning techniques have been used for coarse-grained prediction, including decision trees [22], support vector machines [23], neural networks (NNs) [24], HMMs [25], probabilistic graphical models [26], and k-nearest neighbors [27]. These methods primarily use a fixed-size sliding window to predict the secondary structure class of the central amino acid residue in the window. JPred4 [28] and PSIPRED V3.0 [29] were notable among these early prediction algorithms. These techniques laid the foundation for further progress in the field, demonstrating the effectiveness of machine learning in understanding and predicting protein structures. The increased availability of data has led to the dominance of sequence-to-sequence deep learning models, which have achieved state-of-the-art performance. Innovations in this area include DCRNN [30], which uses cascaded convolutional and recurrent NNs to extract both multiscale local and global contextual features. Another significant contribution is a multiscale chained convolutional architecture for improved eight-state prediction [31]. SPIDER3 [32] uses LSTM-BRNNs to capture complex amino acid interactions, and DeepACLSTM [33] integrates convolutional networks with LSTM units and exploits specific dimensions of protein sequence feature vectors. MUFOLD-SS [34] and SAINT [35] both employ deep inception-inside-inception networks; MUFOLD-SS emphasizes inception modules, while SAINT also incorporates self-attention mechanisms. SPOT-1D [36] combines LSTM-BRNN and ResNet models with residue contact maps. NetSurfP-2.0 [37] employs convolutional and LSTM networks, while ShuffleNet_SS [5] uses a lightweight convolutional NN. Another important development is the protein encoder [38], which follows a two-step process: an unsupervised autoencoder for feature extraction, followed by an ensemble of feature selection methods.
A common element of these earlier prediction models is their reliance on profile features, which are primarily obtained from multiple sequence alignments (MSAs). However, generating MSAs is time-consuming, especially given the rapidly expanding protein sequence databases. In response, recent research has shifted toward embedding features extracted from pretrained protein language models. For instance, DML_SS [4] makes its predictions with a deep centroid model, SPOT-1D-LM [19] combines language model embeddings with one-hot encoding, and LIFT_SS [7] focuses on fine-tuning pretrained protein language models.
In protein secondary structure prediction, it is notable that most current predictors, apart from LIFT_SS [7], rely on the eight-class structure assignments from the previous version of DSSP for their training and evaluation data. Even studies published after the release of DSSP 4.0 have continued to use eight-class secondary structure information rather than the newer nine-class classification. These methods are therefore trained and evaluated with potentially outdated labels. In this study, the latest version, DSSP 4, was used to assign secondary structure, so that the training and evaluation data are based on the more detailed classification of protein structures. In addition, the Ankh protein language model [39] was adopted to obtain protein embeddings, leveraging its ability to represent protein sequences accurately and to replace more computationally intensive evolutionary information. TE_SS, a deep transformer-based model [40], is proposed. It is designed to discern complex relationships between local and nonlocal amino acids in protein sequences, and it processes sequence features in parallel, in contrast to existing models that extract features sequentially. This architecture enables the model to capture patterns and interactions within protein structures, improving secondary structure prediction.

A. Dataset
In this research, a protein training dataset was generated using the PISCES server [38], which is well known for creating curated lists of sequence subsets from the Protein Data Bank (PDB). To obtain data suitable for assessing protein structure prediction algorithms, criteria and parameters related to sequence identity were applied. The PISCES server filters PDB entries according to predefined parameters and sends the resulting lists and sequence files to the email address provided. To ensure the dataset's reliability and usefulness, the server was configured with a maximum resolution of 2.0 Å, a maximum R-value of 2.0, and a maximum pairwise sequence identity of 50%. The initial dataset consisted of 16,225 proteins, of which 188 were excluded because they lacked the corresponding information. In addition, to maintain the integrity of the analysis and avoid data contamination, any proteins that overlapped with the test datasets were eliminated. After this filtering, a refined dataset of 16,037 proteins (referred to as 16,037) was obtained. It was partitioned into a training set of 15,037 proteins and a validation set of the remaining 1,000 proteins.
This study uses secondary structure datasets based on the nine-class classification provided by DSSP 4. The DSSP software generates a DSSP file for each protein of known structure, containing detailed secondary structure information derived from the protein's three-dimensional structural data recorded in the PDB. In the methodology followed here, the Biopython library was first used to retrieve the PDB file of a given protein chain from the PDB website, using the PDB ID and the chain ID. These files were observed to contain nonstandard amino acids, including modified residues; for example, both methionine (MET) and selenomethionine (MSE) are represented by the one-letter code M. To address this issue, the three-letter amino acid codes in the PDB file were converted to their one-letter equivalents, with nonstandard amino acids denoted as X. The resulting sequence is referred to as the target primary sequence. Next, the corresponding DSSP file was obtained from the PDB file. Contiguous fragments of amino acid residues and their associated secondary structures were extracted from this file using the chain ID and the residue sequence numbers. However, it is often not possible to extract a primary sequence from the DSSP file that exactly matches the target primary sequence in composition or length [6]. Therefore, to represent the secondary structure sequence accurately, the target primary sequence is aligned with the sequence obtained from the DSSP file, and any gaps in the alignment are filled with the letter X to indicate unassigned structure types. This alignment was performed with the pairwise2 module of the Biopython package [39].
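The sequence preparation steps above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the pipeline used in this study: it replaces Biopython's pairwise2 with a plain Needleman-Wunsch global alignment using assumed scores (match +1, mismatch -1, gap -1), and the function names `to_target_sequence`, `global_align`, and `transfer_labels` are hypothetical. Target residues with no aligned DSSP residue receive the label X, as described above.

```python
# Hypothetical sketch: PDB residue-name conversion and DSSP label transfer by global alignment.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_target_sequence(residue_names):
    """Three-letter PDB residue names -> one-letter sequence; nonstandard (e.g. MSE) -> X."""
    return "".join(THREE_TO_ONE.get(name.upper(), "X") for name in residue_names)


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with linear gap penalties; returns gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def transfer_labels(target, dssp_seq, dssp_labels, gap_label="X"):
    """Project DSSP labels onto the target sequence; unmatched target residues get gap_label."""
    aligned_target, aligned_dssp = global_align(target, dssp_seq)
    labels = iter(dssp_labels)
    result = []
    for t, d in zip(aligned_target, aligned_dssp):
        label = next(labels) if d != "-" else gap_label  # one label per DSSP residue
        if t != "-":  # keep only positions present in the target sequence
            result.append(label)
    return "".join(result)


target = to_target_sequence(["MET", "LYS", "THR", "ALA", "TYR", "MSE"])
print(target)                                   # MKTAYX
print(transfer_labels(target, "KTAY", "HHEE"))  # XHHEEX
```

In practice, recent Biopython releases recommend `Bio.Align.PairwiseAligner` over the older pairwise2 module; the scoring scheme used in this study is not stated, so the values above are placeholders.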
Performance metrics for nine-class prediction were assessed on several datasets. Three editions of the CASP competition, CASP12, CASP13, and CASP14, were used; they contain 47, 41, and 33 protein chains, respectively, chosen to represent real-world challenges in protein structure prediction. In addition, the CB433 test set [4], a curated and filtered subset of the widely used CB513 dataset comprising 433 protein structures, was considered. A fair comparison with existing models requires datasets that follow the same eight-class system. For this purpose, two well-known test datasets, TEST2016 and TEST2018 [34], containing 1,213 and 250 protein sequences, respectively, were chosen. These are consistent with the corresponding training and validation sets, which contain 10,029 and 983 proteins, respectively. Across these four datasets (training, validation, TEST2016, and TEST2018), no protein sequence is longer than 700 residues. The primary and secondary structure sequence data provided as standard with these datasets were also used.

B. Embedding
Pretrained protein language models (PLMs) have become an important tool in biological applications, serving as a strong foundation for modeling protein-related tasks. Consistent with the common practice of using these models as feature extractors, this study uses the Ankh model [39], a large unsupervised PLM. Ankh is built on a transformer-based architecture and was trained on the BFD [41] and UniRef50 [42] datasets. It achieves state-of-the-art performance while using less than 10% of the parameters of comparable models, which makes protein modeling applications more accessible and scalable. One strength of the Ankh model is its ability to extract high-quality embedding features that represent proteins accurately; these representations capture information about protein structure, function, and evolutionary relationships. To obtain the embedding of a specific protein chain, its sequence is fed into the Ankh encoder and the encoder output is retrieved. The output embedding assigns a 1536-dimensional feature vector to each amino acid in the sequence, so a protein sequence of length L yields an embedding matrix of size L × 1536. These embeddings were used as the input to the proposed model.

C. Model Architecture
A novel model that relies heavily on the transformer architecture has been developed to predict protein secondary structures. Transformers use self-attention mechanisms and feed-forward layers, which make them well suited to identifying both short-range and long-range relationships within protein sequences. The input to the model is a two-dimensional embedding of size L × 1536, generated by the pretrained Ankh model. The model additionally incorporates a positional encoding for each amino acid in the sequence to enrich the representation. The input is then processed by a stack of N transformer encoders, followed by a 1D convolutional layer and a final fully connected layer, as shown in Figure 1. This design is intended to capture the sequence patterns that determine secondary structure.

Fig. 1. Transformer-based model architecture for protein secondary structure prediction.

1) Positional Encoding
To ensure that the proposed model takes the sequential nature of protein sequences into account, positional encoding was incorporated. It provides information about the precise location of each amino acid in the protein sequence. By combining positional encoding with amino acid embeddings, the model can capture not only the characteristics of each amino acid but also its context within the sequence. This positional information is crucial for accurately predicting protein secondary structure. The positional encoding (PE_SS) is defined as follows [43]:

$$PE_{SS}(psn, 2i) = \sin\left(\frac{psn}{10000^{2i/d_{\text{model}}}}\right) \tag{1}$$

$$PE_{SS}(psn, 2i+1) = \cos\left(\frac{psn}{10000^{2i/d_{\text{model}}}}\right) \tag{2}$$

where psn is the position of an amino acid in the sequence, i indexes the dimensions of the encoding space, and d_model is the dimensionality of the model. Equation (1) gives the encoding for the even dimensions and (2) the encoding for the odd dimensions.

In (3), the positional encoding obtained from (1) and (2) is added to the input embeddings E:

$$\tilde{E} = E + PE_{SS} \tag{3}$$
By including this information, the proposed model can account for the arrangement of residues in the protein, which improves its ability to predict secondary structure.
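The positional encoding in (1) and (2) and its addition to the embeddings in (3) can be sketched in plain Python as follows. The sketch assumes the standard sinusoidal formulation of [43] and uses nested lists in place of tensors; for the addition to be defined, d_model must equal the width of the input embeddings (1536 for Ankh). The function names are illustrative.

```python
import math


def positional_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Sinusoidal encoding: sin on even dimensions (Eq. 1), cos on odd dimensions (Eq. 2)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for psn in range(seq_len):
        for two_i in range(0, d_model, 2):  # two_i plays the role of 2i
            angle = psn / (10000 ** (two_i / d_model))
            pe[psn][two_i] = math.sin(angle)
            if two_i + 1 < d_model:
                pe[psn][two_i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(embeddings: list[list[float]]) -> list[list[float]]:
    """Eq. (3): add the encoding element-wise to an L x d_model embedding matrix."""
    seq_len, d_model = len(embeddings), len(embeddings[0])
    pe = positional_encoding(seq_len, d_model)
    return [[e + p for e, p in zip(row, pe_row)] for row, pe_row in zip(embeddings, pe)]


pe = positional_encoding(seq_len=3, d_model=4)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0]
```

Because the encoding depends only on position and dimension, it can be computed once for the maximum sequence length (700 in the datasets used here) and sliced per protein.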

2) Transformer Encoder in Protein Secondary Structure Prediction
The transformer encoder is a component of the transformer architecture [43] that processes all positions of a sequence in parallel. It consists of a stack of identical layers, each with two sublayers: a multi-head self-attention mechanism and a position-wise feed-forward network.
The key elements of the Transformer Encoder are:

a) Multi-head Attention
The multi-head self-attention mechanism plays a central role in the encoder by allowing the model to weigh the importance of different segments of the input sequence. It creates three vector representations for each input element: a query (Q), a key (K), and a value (V). Attention scores are computed from the similarity between queries and keys and are used to form a weighted sum of the value vectors that highlights the most relevant information. This process is performed in parallel across multiple heads, each of which can focus on different aspects of the sequence. The mathematical formulation of this process is [43]:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{4}$$

where d_k is the dimensionality of the key vectors. Dividing by √d_k acts as a scaling factor that keeps the dot products from growing too large, which would otherwise push the softmax into regions with very small gradients.
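A single head of the scaled dot-product attention described above can be sketched in plain Python. In a full multi-head layer, Q, K, and V are produced by learned linear projections for each head, and the head outputs are concatenated and projected again [43]; this sketch omits those projections and treats Q, K, and V as given lists of row vectors. The names are illustrative.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors (one head)."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


# One query attending to two identical keys gives equal weights, i.e. the mean of V.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 2.0]]))
# [[3.0, 1.0]]
```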

b) Feed-Forward Networks
After the self-attention mechanism, the data pass through a feed-forward NN, which is applied to each position separately and identically.This network consists of fully connected layers with activation functions and is responsible for further transforming the representation.
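A position-wise feed-forward network can be sketched as below. The text does not specify the activation function, so the sketch assumes ReLU, as in the original transformer [43]; in the configuration reported in the Implementation Details, the inner dimension would be 2048. Weights are plain nested lists, and the helper names are hypothetical.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def linear(x, W, b):
    """Affine map: x has length d_in, W is d_in x d_out, b has length d_out."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def position_wise_ffn(X, W1, b1, W2, b2):
    """Apply the same two-layer network (ReLU assumed) to every position independently."""
    return [linear([relu(h) for h in linear(x, W1, b1)], W2, b2) for x in X]


X = [[1.0, -1.0], [2.0, 0.0]]                  # two positions, d_model = 2
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]  # inner dimension 2 (2048 in the reported setup)
W2, b2 = [[1.0], [1.0]], [0.5]
print(position_wise_ffn(X, W1, b1, W2, b2))    # [[1.5], [2.5]]
```

Because the same weights are applied at every position, the positions can be processed in parallel, which matches the encoder's parallel treatment of the sequence.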

c) Layer Normalization
Each sublayer in the transformer encoder, whether self-attention or feed-forward, is wrapped in a residual connection followed by layer normalization, so that its output is LayerNorm(x + Sublayer(x)). The residual connections help mitigate the vanishing gradient problem, enabling the training of very deep models.
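The post-norm residual pattern LayerNorm(x + Sublayer(x)) can be sketched as follows. The learnable gain and bias of layer normalization are omitted (equivalent to a gain of 1 and a bias of 0), and eps = 1e-5 is an assumed value; the sublayer is passed in as any function that maps the sequence to a sequence of the same shape.

```python
import math


def layer_norm(x: list[float], eps: float = 1e-5) -> list[float]:
    """Normalize a vector to zero mean and unit variance (gain=1, bias=0 assumed)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_sublayer(X, sublayer):
    """Post-norm residual block: LayerNorm(x + Sublayer(x)) at every position."""
    S = sublayer(X)
    return [layer_norm([a + b for a, b in zip(x, s)]) for x, s in zip(X, S)]


# Identity sublayer for illustration: x + x is normalized per position.
out = residual_sublayer([[1.0, 2.0, 3.0]], lambda X: X)
print([round(v, 4) for v in out[0]])  # [-1.2247, 0.0, 1.2247]
```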

d) Stacking of Layers
The encoder layers, each combining multi-head self-attention and feed-forward sublayers, are stacked to form the encoder. Each layer builds on the output of the previous one, gradually extracting more complex, higher-level representations of the sequence.

3) Convolution 1D Layer
To further enhance the model's ability to extract informative features, a 1D convolutional layer follows the transformer encoder. This layer operates along the feature dimension, applying learnable filters to capture local patterns and dependencies within the feature space. The one-dimensional convolution operation can be written as:

$$Y = f(X \circledast w + b) \tag{5}$$

where Y is the output feature map, X is the input feature map, w is the convolutional filter, b is the bias term, ⊛ is the convolution operator, and f is the activation function.
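The convolution defined above can be sketched in plain Python. As in most deep-learning libraries, the operation is implemented as a cross-correlation (the kernel is not flipped). The kernel size, the zero "same" padding, and the ReLU default for f are assumptions, since the text does not state them; the sketch treats the feature dimension as input channels and slides the kernel along the residue axis, a common arrangement that is also an assumption here.

```python
def conv1d(X, W, b, activation=lambda v: max(0.0, v)):
    """1D convolution (cross-correlation) over an L x C_in input.

    W has shape K x C_in x C_out (K odd), b has length C_out; zero 'same' padding keeps length L.
    """
    L, K = len(X), len(W)
    c_in, c_out = len(X[0]), len(b)
    pad = K // 2
    Y = []
    for t in range(L):
        row = []
        for o in range(c_out):
            acc = b[o]
            for k in range(K):
                src = t + k - pad
                if 0 <= src < L:  # positions outside the sequence contribute zero
                    acc += sum(W[k][c][o] * X[src][c] for c in range(c_in))
            row.append(activation(acc))
        Y.append(row)
    return Y


# A width-3 all-ones kernel computes a local sum over neighboring residues.
print(conv1d([[1.0], [2.0], [3.0]], [[[1.0]], [[1.0]], [[1.0]]], [0.0]))  # [[3.0], [6.0], [5.0]]
```

In the configuration reported in the Implementation Details, C_out would be 1024.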

4) Final Fully Connected Layer
The architecture concludes with a fully connected layer, which serves as the classification component of the model.This layer translates the processed features into predictions of the protein's secondary structure.

D. Evaluation Metrics
To evaluate the effectiveness of the proposed approach, two widely used metrics were employed: Qs accuracy and Segment Overlap (SOV) [44]. Qs accuracy measures how well the predicted secondary structure agrees with the experimentally determined secondary structure, specifically the proportion of residues whose states match. SOV, in contrast, assesses how closely the predicted and experimentally determined secondary structure segments resemble each other. In addition to these metrics, F1 score, precision, and recall were used to evaluate the proposed model's performance on the selected test datasets. Qs accuracy plays a central role in assessing a model's ability to correctly classify the types of secondary structure found in proteins, as it reflects how many residues are assigned to their correct secondary structure states. It extends the conventional Q3 measure, defined over S = {H, E, C}, to nine categories: S = {H, G, I, P, B, E, T, S, L}. To compute Qs, the number of correctly predicted residues in state s (n_s) is divided by the total number of residues actually in state s (N_s), where s ranges over the states in S.
This is formally represented in (6):

$$Q_s = \frac{n_s}{N_s}, \quad s \in S \tag{6}$$

To calculate the overall per-residue accuracy, the n_s values of all states s in S are summed and divided by the sum of the N_s values over all states in S [4]:

$$Q = \frac{\sum_{s \in S} n_s}{\sum_{s \in S} N_s} \tag{7}$$

The SOV metric is essential for evaluating the quality of protein secondary structure predictions. Unlike per-residue accuracy measures, SOV considers both the lengths of the predicted and observed segments and their overlap. This makes it useful for assessing predictions of structural elements such as alpha helices and beta sheets, which can vary considerably in length. Because it accounts for variations in segment size, SOV provides a comprehensive and realistic measure of prediction performance for proteins with diverse secondary structures.
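Equation (6) and the overall per-residue accuracy can be computed directly from label strings, as sketched below. The sketch is illustrative: it counts only residues whose true label belongs to the state set S, so positions marked X (unassigned) are excluded. This follows from summing over s in S but is an assumption about how X residues were handled in this study. SOV is not included because its segment-matching definition [44] is considerably more involved.

```python
from collections import Counter

NINE_STATES = "HGIPBETSL"  # DSSP 4 classes; X (unassigned) is not a state


def per_state_accuracy(y_true: str, y_pred: str, states: str = NINE_STATES) -> dict:
    """Q_s = n_s / N_s for every state s in `states` that occurs in the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label strings must have the same length")
    N = Counter(t for t in y_true if t in states)
    n = Counter(t for t, p in zip(y_true, y_pred) if t == p and t in states)
    return {s: n[s] / N[s] for s in states if N[s] > 0}


def overall_accuracy(y_true: str, y_pred: str, states: str = NINE_STATES) -> float:
    """Sum of n_s over S divided by sum of N_s over S (Q9 for nine states, Q3 for H/E/C)."""
    N_total = sum(1 for t in y_true if t in states)
    n_total = sum(1 for t, p in zip(y_true, y_pred) if t == p and t in states)
    return n_total / N_total


y_true, y_pred = "HHHEELX", "HHEEELH"
print(per_state_accuracy(y_true, y_pred))  # {'H': 0.6666666666666666, 'E': 1.0, 'L': 1.0}
print(overall_accuracy(y_true, y_pred))    # 0.8333333333333334
```

Passing `states="HEC"` gives Q3 on three-state labels.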

E. Implementation Details
The PyTorch framework was used because it offers dynamic computation graphs, an imperative execution style, and a wide range of tools and libraries. To promote stable training and reduce overfitting, the minibatch size was set to 8, and minibatches were formed by random sampling. The models were optimized with the AdamW optimizer with a weight decay of 0.0001, and a fixed learning rate of 0.00005 was used throughout training. To handle class imbalance, a custom cross-entropy loss function with optional per-class weights was implemented. It computes the loss for each instance without reduction and then averages it across the minibatch, taking the class weights into account, so that each class influences learning according to its weight. In addition, an early-stopping criterion halts training if Qs accuracy on the validation set does not improve for 5 consecutive epochs. The experiments were conducted on an NVIDIA Tesla V100 GPU with 16 GB of VRAM and 32 GB of system memory.

The transformer encoder used in this study consisted of 5 layers, each with 8 attention heads and a dropout rate of 0.2. The dimension of the feed-forward network was set to 2048, and the convolutional layer produced an output with 1024 channels.
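The class-weighted loss and the early-stopping rule described above can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation: logits are lists of raw class scores, targets are integer class indices, and the weighted per-instance losses are averaged with a plain mean, which is one reading of the description. PyTorch's built-in `torch.nn.CrossEntropyLoss` with a `weight` argument and `reduction="mean"` instead divides by the sum of the target weights, so the two can differ.

```python
import math


def weighted_cross_entropy(logits, targets, class_weights=None):
    """Per-residue weighted cross-entropy (no reduction), then a plain mean over the minibatch."""
    losses = []
    for z, y in zip(logits, targets):
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        loss = log_sum_exp - z[y]  # equals -log softmax(z)[y]
        weight = 1.0 if class_weights is None else class_weights[y]
        losses.append(weight * loss)
    return sum(losses) / len(losses)


def early_stopping_epoch(val_scores, patience=5):
    """Return the 0-based epoch at which training stops: the first epoch that completes
    `patience` consecutive epochs without improving on the best validation score."""
    best, since_best = float("-inf"), 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, since_best = score, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(val_scores) - 1


print(weighted_cross_entropy([[0.0, 0.0]], [0], [2.0, 1.0]))  # 2 * log(2)
print(early_stopping_epoch([0.5, 0.6, 0.6, 0.59, 0.58, 0.6, 0.55, 0.7]))  # 6
```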

A. Ablation Study
We comprehensively evaluated the performance of the proposed method through a series of experiments on the CB433 test set and our validation set.These experiments were meticulously designed to analyze the influence of key hyperparameters, specifically the number of transformer encoder layers, the number of attention heads, and the learning rate, on the model's effectiveness.

1) Number of Encoder Layers
To examine the effect of encoder depth on model performance, architectures with 1 to 7 encoder layers were tested, each with a fixed configuration of 8 attention heads. The validation and CB433 test results, depicted in Figure 2, indicate that the architecture with 5 encoder layers achieved the highest nine-class accuracy (Q9). Although deeper models can learn more complex contextual representations and better capture long-range dependencies in protein sequences, performance plateaued and declined slightly beyond 5 layers. This may indicate overfitting or vanishing gradients, which would affect the model's generalizability and learning efficacy.

2) Number of Attention Heads
To find the optimal configuration for protein secondary structure prediction, the number of attention heads in the transformer encoder layer was varied. Configurations with 1, 2, 3, 4, 6, 8, 12, and 16 attention heads were tested to determine their impact on performance. As shown in Figure 3, the model with 8 attention heads was the most effective on both the validation and test sets, particularly on CB433, where it achieved the highest nine-class accuracy, suggesting a good balance between the granularity and breadth of the attention mechanism. While increasing the number of attention heads generally improves the model's ability to discern intricate relationships within protein structures, there is a threshold beyond which additional heads do not help and can even reduce predictive accuracy. This highlights the importance of tuning attention mechanisms in transformer models for specialized bioinformatics tasks.

3) Learning Rate
The study examined the best hyperparameters for protein secondary structure prediction and found that the learning rate had a significant impact on model accuracy. The model's performance was assessed across a range of learning rates, as shown in Figure 4: 0.001, 0.0005, 0.0001, 0.00005, 0.00001, and 0.000005. In both validation and testing, a learning rate of 0.00005 produced the best nine-class accuracy, especially on the CB433 dataset. The results show a clear trend: accuracy decreased dramatically at the higher learning rates (0.001 and 0.0005), pointing to the detrimental effect of overly large weight updates, and improved as the learning rate was reduced, peaking at 0.00005.

B. Comparative Analysis on Eight-State Prediction
This section compares the proposed method with a selection of state-of-the-art predictors on two test datasets, TEST2016 and TEST2018. To ensure a fair comparison, results for existing predictors were taken from the literature [4,7,9]. The comparison covers 10 predictors based on profile features and 3 predictors based on embedding features (Table I). For LIFT_SS, the best results among its three lightweight fine-tuning approaches were selected. TE_SS was evaluated against these 13 predictors on both test datasets using Q8 and SOV8 for eight-class prediction and Q3 and SOV3 for three-class prediction. Table I reports the performance of the proposed method alongside the 13 methods for each metric. It shows that TE_SS outperforms the other methods in predicting protein secondary structures in both the eight- and three-class formats, indicating an advance over existing state-of-the-art methods.

C. Comparative Analysis on Nine-State Prediction
To assess the effectiveness of the TE_SS framework, experiments were conducted on four widely used benchmark datasets in protein structure analysis. TE_SS was compared against two leading methods for nine-class protein secondary structure prediction: DML_SSembed and LIFT_SS. Both methods use embeddings derived from ProtTrans, a pretrained PLM, and were selected because they provide nine-class predictions based on DSSP 4.
DML_SSembed uses a centroid model for sequence-to-sequence prediction. It assigns a centroid in the embedding space to each structure class and aims to maximize the similarity between each amino acid and the centroid of its class, which improves secondary structure prediction accuracy. In contrast, LIFT_SS fine-tunes the pretrained PLM using 7 state-of-the-art fine-tuning techniques, which introduce new parameters during the embedding process to enable accurate structure prediction. The results of these comparisons for both nine- and three-class prediction are presented in Tables II and III. For LIFT_SS, the highest value for each metric among its 7 fine-tuning techniques is reported. The results for the existing predictors were obtained from [7]. The TE_SS model consistently outperformed both DML_SSembed and LIFT_SS, demonstrating its accuracy and effectiveness in predicting protein secondary structure.

D. Multi-Metric Evaluation
To evaluate the model's performance more thoroughly, additional metrics, namely F1 score, precision, and recall, were considered. These metrics were computed on the CB433, CASP12, CASP13, and CASP14 datasets to verify that the model's effectiveness in predicting protein secondary structure is reliable across a range of protein sequences. Table IV summarizes the model's performance on these metrics. The proposed model performed consistently well across all datasets, indicating its potential for various protein structure prediction tasks.
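Per-class precision, recall, and F1 can be computed from paired label strings as sketched below. The averaging scheme used for Table IV is not stated in the text, so the sketch reports per-class values plus a macro average as an assumption. Residues whose true label lies outside the chosen state set (such as X) are skipped. Per-class recall, tp / (tp + fn), coincides with Qs as defined in (6).

```python
def precision_recall_f1(y_true: str, y_pred: str, states: str) -> dict:
    """Per-class (precision, recall, F1); residues whose true label is not in `states` are skipped."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t in states]
    metrics = {}
    for s in states:
        tp = sum(1 for t, p in pairs if t == s and p == s)
        fp = sum(1 for t, p in pairs if t != s and p == s)
        fn = sum(1 for t, p in pairs if t == s and p != s)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[s] = (precision, recall, f1)
    return metrics


def macro_average(metrics: dict) -> tuple:
    """Unweighted mean of precision, recall and F1 over classes (averaging scheme assumed)."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


m = precision_recall_f1("HHEL", "HELL", states="HEL")
print(m["H"])  # (1.0, 0.5, 0.6666666666666666)
```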

IV. CONCLUSIONS
In this study, the effectiveness of the transformer-based TE_SS model in predicting protein secondary structures has been demonstrated. Using the Ankh protein language model for feature embedding, TE_SS achieves accurate predictions in both the nine- and eight-class systems. Its nine-class performance was evaluated on the CASP12, CASP13, CASP14, and CB433 test datasets, and a version trained on eight-class data was evaluated on two publicly available test datasets, TEST2016 and TEST2018. The experimental results indicate improved accuracy compared with the other models. A notable strength of TE_SS is its ability to capture both short-range and long-range dependencies among residues. Its ability to process sequence data in parallel makes it efficient and effective for analyzing complex protein structures. However, the proposed method has limitations: it requires substantial computational resources and GPU memory. Moreover, it currently cannot estimate the reliability or confidence of its predictions. This shortcoming is especially relevant for specific types of proteins or for disordered regions within proteins, where its predictions may be less accurate or reliable. Future work should address these limitations, for example by developing methods to estimate prediction reliability and by optimizing the model to reduce resource consumption.

Fig. 2. Performance evaluation of transformer models with varying numbers of encoder layers.

Fig. 3. Comparative analysis of prediction accuracy across different numbers of attention heads.

Fig. 4. Impact of learning rate on model accuracy.

TABLE III. COMPARATIVE 3-CLASS PSSP RESULTS ON THE TEST DATASETS