Aim: Predicting the secondary structure of proteins from their amino acid sequences is one of the most significant open problems in bioinformatics. High accuracy in secondary structure prediction is key to computationally determining the 3D structure of proteins and to individualized drug applications of programmable proteins. Success rates in secondary structure prediction (Q3 score) were around 0.60 when research in this area began and have now reached a limit of 0.80.
Material and Methods: In this study, secondary structure was predicted in three states (helix, strand and turn). Artificial neural networks and machine learning algorithms were combined in a hybrid model, and a framework was developed. Amino acid sequences were encoded numerically using the probability of the paired occurrence of amino acids in sequences. These probabilities were calculated separately for each secondary structural element, and a cascade mean filter was used as a threshold method to sharpen the differences between them. The resulting matrices were used to encode the protein sequences. Secondary structure was first predicted pairwise (helix-strand, helix-turn and strand-turn), and a final decision among helix, strand and turn was then reached via machine learning models.
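The pair-based encoding can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names `pair_probabilities` and `encode`, the state labels `H`/`E`/`T`, the restriction to adjacent residue pairs that share one state, and the fallback of 0.0 for unseen pairs are all assumptions made for illustration. The cascade mean filter is not reproduced, since its details are not given here.

```python
from collections import defaultdict

STATES = "HET"  # assumed labels: H = helix, E = strand, T = turn


def pair_probabilities(sequences, structures):
    """Estimate, per state, the relative frequency of each adjacent residue pair."""
    counts = {s: defaultdict(int) for s in STATES}
    totals = {s: 0 for s in STATES}
    for seq, ss in zip(sequences, structures):
        for i in range(len(seq) - 1):
            # Assumption: a pair is counted only if both residues share one state.
            if ss[i] == ss[i + 1] and ss[i] in counts:
                counts[ss[i]][seq[i:i + 2]] += 1
                totals[ss[i]] += 1
    # States with no observed pairs yield an empty table (no division occurs).
    return {
        s: {pair: n / totals[s] for pair, n in counts[s].items()}
        for s in STATES
    }


def encode(seq, probs):
    """Encode each adjacent pair as one value per state (0.0 if the pair was unseen)."""
    return [
        [probs[s].get(seq[i:i + 2], 0.0) for s in STATES]
        for i in range(len(seq) - 1)
    ]


# Toy usage with a single labeled sequence (made-up data).
probs = pair_probabilities(["AAAG"], ["HHHT"])
features = encode("AAG", probs)
```

Each row of `features` holds one value per state for a residue pair, which is the kind of numeric input a pairwise classifier (for example helix versus strand) could consume.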
Results: The success rates of the pairwise predictions of secondary structural elements were 0.797 for helix-strand, 0.848 for helix-turn and 0.829 for strand-turn. The average success rate of the pairwise predictions was 0.824. In the proposed model, accuracy was 0.742 for helix, 0.703 for strand and 0.880 for turn. The Q3 score was 0.775.
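For reference, the Q3 score is the fraction of residues whose three-state label is predicted correctly. The sketch below computes it in Python, together with a per-state accuracy. The function names and the `H`/`E`/`T` labels are hypothetical, and treating per-state accuracy as the fraction of residues of a given true state that are predicted correctly is an assumption; the exact definition used in the study is not stated here.

```python
def q3_score(true_ss, pred_ss):
    """Fraction of residues whose predicted state matches the true state."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return correct / len(true_ss)


def per_state_accuracy(true_ss, pred_ss, states="HET"):
    """Per state: correctly predicted residues / residues truly in that state."""
    result = {}
    for s in states:
        idx = [i for i, t in enumerate(true_ss) if t == s]
        if idx:  # skip states that do not occur in the true labels
            result[s] = sum(pred_ss[i] == s for i in idx) / len(idx)
    return result


# Toy usage with made-up labels (H = helix, E = strand, T = turn).
q3 = q3_score("HHEETT", "HHETTT")
by_state = per_state_accuracy("HHEETT", "HHETTT")
```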
Key words: Protein secondary structure prediction; amino acid encoding; neural networks; machine learning methods