Given the native structure of a protein, SSFold predicts the interaction order of secondary structure elements (SSEs) during folding. By treating each intermediate conformation as a collection of fully folded structures, each containing a set of interacting SSEs, SSFold significantly reduces the conformation space.
We show that SSFold is able to accurately predict the most energetically favorable folding pathway of large proteins with hundreds of residues at the mesoscopic level, including the pig muscle phosphoglycerate kinase with 416 residues. The model is detailed enough to distinguish between different folding pathways of structurally very similar proteins, including the streptococcal protein G and the peptostreptococcal protein L, and two variants NuG1 and NuG2 of protein G.
The SSFold source code consists of a single file, ssfold.c. It can be compiled under Unix, Linux, or Windows (Cygwin) with the command "gcc -O3 -o ssfold ssfold.c -lm".
Input:
Each SSE occupies two lines: the first line starts with ">" followed by the name of the SSE, and the second line lists the starting position, the ending position, and the type of the SSE, separated by commas (H denotes a helix, S denotes a strand, and T denotes a turn).
Example:
>1S
1, 8, S
>2T
9, 12, T
>3S
13, 19, S
>4T
20, 22, T
>5H
23, 35, H
>6T
36, 41, T
>7S
42, 46, S
>8T
47, 50, T
>9S
51, 56, S
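The two-line record layout described above can be read with a few lines of C. The following is a minimal sketch, not the parser used in ssfold.c: the Sse struct and the functions parse_sse and read_sse_file are hypothetical names introduced here for illustration, and the sketch assumes comma-separated fields as in the example.

```c
#include <stdio.h>
#include <string.h>

#define SSE_NAME_LEN 32 /* hypothetical limit on the SSE name length */

/* One secondary structure element (hypothetical type, not from ssfold.c). */
typedef struct {
    char name[SSE_NAME_LEN]; /* text after '>' on the first line */
    int  start;              /* starting position */
    int  end;                /* ending position */
    char type;               /* 'H' helix, 'S' strand, 'T' turn */
} Sse;

/* Parse one two-line record; returns 1 on success, 0 on malformed input. */
int parse_sse(const char *header, const char *data, Sse *out)
{
    if (header[0] != '>')
        return 0;
    if (sscanf(header + 1, "%31s", out->name) != 1)
        return 0;
    /* Whitespace in the format matches any amount of blanks around commas. */
    if (sscanf(data, " %d , %d , %c", &out->start, &out->end, &out->type) != 3)
        return 0;
    if (out->type != 'H' && out->type != 'S' && out->type != 'T')
        return 0;
    return out->start <= out->end;
}

/* Read records from fp into list (capacity max).
   Returns the number of records read, or -1 if a record is malformed. */
int read_sse_file(FILE *fp, Sse *list, int max)
{
    char header[256], data[256];
    int n = 0;
    while (n < max && fgets(header, sizeof header, fp)) {
        if (header[strspn(header, " \t\r\n")] == '\0')
            continue; /* skip blank lines */
        if (!fgets(data, sizeof data, fp) || !parse_sse(header, data, &list[n]))
            return -1;
        n++;
    }
    return n;
}
```

Running read_sse_file on a file containing the example above would be expected to fill nine entries, alternating strands, turns, and one helix.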
By default, SSFold assumes that Rosetta has been installed and that the rosetta.gcc executable is in the search path, in which case it is used automatically. A paths.txt file that specifies directory information for Rosetta should be in the current directory. The original Rosetta energy function is used by default. To compute free energy by another method, change the function
double cal_tot_energy_2(char *f);
in ssfold.c, where f is the name of a file that contains the three-dimensional coordinates of the atoms in a partially folded protein, and the function returns the free energy of that structure.
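As an illustration of what a replacement for cal_tot_energy_2 might look like, the sketch below keeps the signature given above but swaps in a toy contact-count score in place of the Rosetta energy function. It is a placeholder only and is not physically meaningful. Everything except the function name and signature is an assumption: the helper toy_contact_energy, the TOY_ constants, the requirement that f is a PDB-format file, the use of file order as sequence order, and the choice to return HUGE_VAL when the file cannot be opened.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_CA  4096 /* hypothetical cap on the number of residues read */
#define TOY_CUTOFF  7.0  /* contact distance in angstroms (assumption) */
#define TOY_MIN_SEP 3    /* skip pairs closer than this in file order */

/* Hypothetical helper: contributes -1 for every C-alpha pair within the cutoff. */
double toy_contact_energy(double xyz[][3], int n)
{
    double e = 0.0;
    for (int i = 0; i < n; i++) {
        for (int j = i + TOY_MIN_SEP; j < n; j++) {
            double dx = xyz[i][0] - xyz[j][0];
            double dy = xyz[i][1] - xyz[j][1];
            double dz = xyz[i][2] - xyz[j][2];
            if (dx * dx + dy * dy + dz * dz <= TOY_CUTOFF * TOY_CUTOFF)
                e -= 1.0;
        }
    }
    return e;
}

/* Same signature as the function named in ssfold.c; the body is a placeholder. */
double cal_tot_energy_2(char *f)
{
    static double xyz[TOY_MAX_CA][3];
    char line[128], field[9];
    int n = 0;
    FILE *fp = fopen(f, "r");
    if (!fp)
        return HUGE_VAL; /* assumption: an unreadable file scores as unfavorable */
    while (n < TOY_MAX_CA && fgets(line, sizeof line, fp)) {
        /* PDB ATOM records: atom name in columns 13-16, x/y/z in columns 31-54. */
        if (strncmp(line, "ATOM  ", 6) != 0 || strlen(line) < 54)
            continue;
        if (strncmp(line + 12, " CA ", 4) != 0)
            continue;
        for (int k = 0; k < 3; k++) {
            memcpy(field, line + 30 + 8 * k, 8); /* one 8-character coordinate */
            field[8] = '\0';
            xyz[n][k] = atof(field);
        }
        n++;
    }
    fclose(fp);
    return toy_contact_energy(xyz, n);
}
```

Separating the scoring helper from the file reading keeps the scoring logic easy to check on small coordinate arrays before it is wired into ssfold.c.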
Rosetta implicitly imposes an upper bound on the size of a protein that can be processed. To allow for large proteins, change the parameter MAX_RES in param.cc within the Rosetta source code and re-compile Rosetta.
Usage:
ssfold -p 1GB1.pdb -s 1GB1SS -o 1GB1Fold
Command line parameters:
-p "name of file that contains the three-dimensional structure"
-s "name of file that contains the SSEs"
-o "name of output file"
Output:
Each row shows an intermediate conformation during folding. SSEs that are enclosed within a pair of parentheses are fully folded. The first number is the free energy of the conformation, and the second number is the proportion of native contacts of the conformation (expressed as a fraction between 0 and 1), where a native contact is defined to be a pair of amino acids that have their α-carbon atoms within 7 Å of each other.
Example:
( 1S )( 3S )( 5H )( 7S )( 9S ) 43.94 0.50
( 1S )( 3S )( 5H )( 7S 9S ) 21.43 0.61
( 1S )( 3S )( 5H 7S 9S ) 0.29 0.70
( 1S 5H 7S 9S )( 3S ) -23.52 0.78
( 1S 3S 5H 7S 9S ) -66.40 1.00
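The native-contact definition above (α-carbon atoms within 7 Å) can be written down directly. The sketch below is one possible reading, not the computation in ssfold.c: it assumes that a contact counts as formed when both residues belong to the same fully folded group, and that a distance of exactly 7 Å counts as a contact. The names is_native_contact and native_contact_fraction, and the per-residue group array, are hypothetical.

```c
#define CONTACT_CUTOFF 7.0 /* angstroms, from the definition above */

/* Returns 1 if two alpha-carbon positions are within the contact cutoff. */
int is_native_contact(const double a[3], const double b[3])
{
    double dx = a[0] - b[0];
    double dy = a[1] - b[1];
    double dz = a[2] - b[2];
    return dx * dx + dy * dy + dz * dz <= CONTACT_CUTOFF * CONTACT_CUTOFF;
}

/* ca:    alpha-carbon coordinates of the native structure, one per residue.
   group: hypothetical label per residue; residues in the same folded group
          share a non-negative label, and -1 marks a residue in no group.
   Returns formed contacts divided by all native contacts (0.0 if none). */
double native_contact_fraction(double ca[][3], const int *group, int n)
{
    int total = 0, formed = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (!is_native_contact(ca[i], ca[j]))
                continue;
            total++;
            /* Assumption: a contact is formed only within one folded group. */
            if (group[i] >= 0 && group[i] == group[j])
                formed++;
        }
    }
    return total > 0 ? (double)formed / total : 0.0;
}
```

Under this reading, a conformation in which all SSEs share one group gives 1.00, which agrees with the last row of the example output.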
Yang Q. and Sze S.-H. (2008) Predicting protein folding pathways at the mesoscopic level based on native interactions between secondary structure elements. BMC Bioinformatics, 9:320.