DeepFold Server: Protein Structure Prediction with Enhanced Sidechain and Backbone Accuracy

Minsoo Kim1, Hanjin Bae1, Gyeongpil Jo1, Jejoong Yoo1,*, and Keehyoung Joo2,*

1 Department of Physics, Sungkyunkwan University, Suwon 16419, Korea

2 School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea

* To whom correspondence should be addressed. Email: newton@kias.re.kr; jejoong@skku.edu

Despite the advances achieved by AlphaFold2 (1), which is powered by deep neural networks, the life sciences and industrial communities still have a strong demand for higher-quality protein structure predictions, particularly for detailed features such as sidechains and backbone geometry. To address this need, we developed the DeepFold protocol (2), which improves sidechain prediction through training with more sophisticated loss functions and introduces a new template feature generation step that leverages advanced structure alignment methods. The protocol was rigorously validated in the CASP15 experiment and published in a peer-reviewed article in December 2023 (Bioinformatics 2023;39(12):btad712, PubMed ID: 37995286). Here, we implement the DeepFold protocol as the DeepFold server, accessible at https://model.deepfold.org. Although the code is also available as open-source software at https://github.com/newtonjoo/deepfold, the server is designed to make the protocol easier to use. The website is free and open to all users, and there is no login requirement.

Description of the DeepFold web server

• Input: Protein sequence.

• Outputs: Protein structure predictions complemented with additional analysis data, including secondary structure information, position-dependent solvent accessibility, a 3D visualization window displaying the predicted models colored by pLDDT score, an interactive distogram plot linked to the 3D visualization window, Ramachandran plots, and multiple sequence alignments annotated with Neff values and sequence similarity.

• Processing method: DeepFold generates multiple sequence alignments (MSAs) using HHblits (3) or JackHMMER (4) and identifies template candidates with HHpred (5). To improve the quality of the MSA and template features, templates are re-ranked using CRFalign (6), a sequence-structure alignment method that combines pairwise conditional random fields with gradient-boosted regression trees. To further enhance sidechain quality, new loss functions were introduced, including terms for sequentially conditioned torsion angles and sidechain confidence. The frame-aligned point error (FAPE) loss was reweighted, and a new loss function for secondary structure prediction was added. The network was trained with these loss functions on the PDB database as of February 2022. Structure templates are then processed as input to the DeepFold network, where the Evoformer network (1) generates single and pair representations for the sequence. These representations are subsequently fed into the structure modules to produce high-quality 3D protein structures.

• Performance in a recent CASP experiment (CASP15): In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold secured fourth place among 132 groups, demonstrating significant improvements in structural details, including backbone, sidechain, and MolProbity scores (Table 1). Based on the official CASP15 scores, DeepFold achieved a mean GDT-TS score of 80.33, outperforming AlphaFold2’s score of 75.81 (Table 1). Notably, for TBM-easy/hard targets, DeepFold ranked at the top according to Z-scores for GDT-TS, highlighting its ability to enhance the structural details of predicted models. Furthermore, an in-depth analysis of 55 domains from 39 targets with publicly available structures demonstrated that DeepFold delivers superior sidechain accuracy and MolProbity scores compared to other top-performing groups.
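The Neff values listed among the server outputs summarize how many effectively independent sequences an MSA contains. The abstract does not state which Neff definition the server uses, so the following is only a minimal sketch of one common convention: each sequence is weighted by the inverse of the number of sequences that share at least a given identity with it. The function names and the 80% identity cutoff are assumptions made for illustration, not part of DeepFold.

```python
from typing import List


def sequence_identity(a: str, b: str) -> float:
    """Fraction of matching residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def neff(msa: List[str], identity_cutoff: float = 0.8) -> float:
    """Effective number of sequences under an inverse-cluster-size weighting.

    Hypothetical helper: the 0.8 cutoff is an assumed convention, not a
    value taken from the DeepFold server.
    """
    total = 0.0
    for a in msa:
        # Count the sequences (including `a` itself) at or above the cutoff.
        neighbours = sum(sequence_identity(a, b) >= identity_cutoff for b in msa)
        # Guard against an all-gap row, which has no identity even with itself.
        total += 1.0 / max(neighbours, 1)
    return total


if __name__ == "__main__":
    toy_msa = ["ACDEFGHIKL", "ACDEFGHIKL", "MNPQRSTVWY"]
    print(neff(toy_msa))
```

In the toy alignment, the two identical rows share one unit of weight and the unrelated row contributes a full unit, so redundant sequences do not inflate the count. This pairwise version scales quadratically with the number of sequences, so it is suitable only for small alignments.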

Table 1. Official CASP15 scores of the top 10 performing groups based on the assessor’s formula. For comparison, three AlphaFold2-based methods are included at the bottom of the table as references.
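For readers unfamiliar with the GDT-TS score quoted above, it averages the percentage of residues whose Cα atoms lie within 1, 2, 4, and 8 Å of their positions in the experimental structure. The sketch below illustrates only that averaging step, assuming the per-residue Cα distances have already been computed from a single fixed superposition. The official score searches over many superpositions for each cutoff, so this simplified version can only underestimate it. The function name and input format are hypothetical.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT-TS (0 to 100) from per-residue C-alpha distances in angstroms.

    Assumption: `distances` come from one fixed superposition of model and
    reference; the official score maximizes each cutoff over superpositions.
    """
    if len(distances) == 0:
        raise ValueError("at least one residue distance is required")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    print(gdt_ts([0.5, 1.5, 3.0, 9.0]))
```

Because the four cutoffs are averaged, a model with a correct overall fold but imprecise details still scores well at 4 and 8 Å, while the 1 and 2 Å terms reward fine backbone accuracy.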

References

1. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A. et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583-589.

2. Lee, J.-W., Won, J.-H., Jeon, S., Choo, Y., Yeon, Y., Oh, J.-S., Kim, M., Kim, S., Joung, I., Jang, C. et al. (2023) DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function. Bioinformatics, 39, btad712.

3. Remmert, M., Biegert, A., Hauser, A. and Söding, J. (2012) HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nature Methods, 9, 173-175.

4. Johnson, L.S., Eddy, S.R. and Portugaly, E. (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics, 11, 431.

5. Söding, J. (2005) Protein homology detection by HMM–HMM comparison. Bioinformatics, 21, 951-960.

6. Lee, S.J., Joo, K., Sim, S., Lee, J., Lee, I.-H. and Lee, J. (2022) CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields. Molecules, 27, 3711.