Automated structure prediction of weakly homologous proteins on a genomic scale

AUTHOR(S)
SOURCE

National Academy of Sciences

ABSTRACT

We have developed TASSER, a hierarchical approach to protein structure prediction. It consists of template identification by threading, followed by tertiary structure assembly via the rearrangement of continuous template fragments. The assembly is guided by an optimized Cα and side-chain-based potential driven by threading-based, predicted tertiary restraints. TASSER was applied to a comprehensive benchmark set of 1,489 medium-sized proteins in the Protein Data Bank. With homologues excluded, in 927 cases the templates identified by our threading algorithm PROSPECTOR_3 have a root-mean-square (rms) deviation from native of <6.5 Å with ≈80% alignment coverage. After template reassembly, this number increases to 1,172, a significant and systematic improvement of the final models with respect to the initial template alignments. Furthermore, significant improvements in loop modeling are demonstrated. We then applied TASSER to the 1,360 medium-sized ORFs in the Escherichia coli genome; ≈920 can be predicted with high accuracy based on confidence criteria established in the Protein Data Bank benchmark. These results from our unprecedented comprehensive folding benchmark on all protein categories provide a reliable basis for the application of TASSER to structural genomics, especially to proteins of low sequence identity to solved protein structures.
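To make the benchmark criterion concrete, the following is a minimal Python sketch of how an rms deviation over aligned Cα atoms and an alignment coverage could be computed. It is an illustration only, not the TASSER or PROSPECTOR_3 code. The function names (`rmsd`, `alignment_coverage`, `passes_rmsd_cutoff`) are hypothetical, the coordinates are assumed to be already optimally superposed, and only the 6.5 Å cutoff is taken from the abstract. The ≈80% coverage is reported in the abstract as an observed value, so the sketch computes coverage but does not apply it as a threshold.

```python
import math

def rmsd(model, native):
    """RMS deviation (in Å) over paired C-alpha coordinates.

    Assumption: both lists hold (x, y, z) tuples for the same aligned
    residues and the two structures are already superposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))

def alignment_coverage(n_aligned, target_length):
    """Fraction of target residues covered by the template alignment."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return n_aligned / target_length

def passes_rmsd_cutoff(model, native, cutoff=6.5):
    """True if the rms deviation from native is below the cutoff (Å)."""
    return rmsd(model, native) < cutoff
```

For example, two aligned residues that are each displaced by 3 Å from their native positions give an rms deviation of 3 Å, which falls under the 6.5 Å cutoff. A real pipeline would first superpose the model onto the native structure (for instance with the Kabsch algorithm) before calling `rmsd`.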
