Skeleton-based analysis of 3D Cryo-EM protein structures at intermediate resolutions

Tao Ju, 2007

Background
Cryo-electron microscopy (cryo-EM) has assumed an increasingly important role in determining subnanometer structures of macromolecular machines, such as viruses, that are difficult to study by conventional methods such as X-ray crystallography and NMR. Despite the fast-growing availability of cryo-EM data at intermediate resolutions (6-8 angstroms), efficient and accurate methods for visualizing large molecular assemblies in such volumetric data and for analyzing their structures are still in great demand. Developing such methods is difficult for two reasons: the complexity of the volume, which often makes traditional iso-surface visualization more confusing than illuminating, and the insufficient resolution of the data, which is much lower than what is required to build atomic protein models directly (as is done in X-ray crystallography). The ultimate computational challenge is to obtain atomic models from complex data that is not at, or even close to, atomic resolution.



Methods
To address the complexity and resolution problems inherent in cryo-EM data, our approach is to develop geometric algorithms that first reduce the complexity of the cryo-EM data and then use the reduced representation to constrain traditional sequence-based modeling techniques, such as homology and ab initio modeling, to obtain near-atomic models that agree with the cryo-EM data. In particular, our methods are based on computing a skeleton of the cryo-EM density map that carries both its connectivity and shape information. The simplicity of the skeleton allows us to devise faster and more robust methods for predicting protein structures from cryo-EM data. Our approach, as outlined in the picture below, consists of four steps: creating the skeleton, identifying the protein secondary structures using the geometry of the skeleton, identifying the correspondence between these structures and those in the sequence using the topology of the skeleton, and refining the tertiary structure by placing pseudo-atoms and atoms using global and local energy minimization.



Computing 3D skeletons from irregular volumes [CAD 07]

A skeleton is a thin, centered structure that preserves the topology and shape of a 3D object. As the foundation of our protein modeling methodology, we first developed a novel skeletonization technique that automatically generates a simplified skeleton of the protein from noisy cryo-EM data. The skeleton consists of 1D curves and 2D surfaces that correspond to tube-like and plate-like shapes in the cryo-EM density. At intermediate resolutions, these shapes often correspond to alpha-helices and beta-sheets of the protein.
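To make the distinction between skeleton curves and skeleton surfaces concrete, here is a minimal sketch in Python. It is not the skeletonization algorithm from [CAD 07]; it assumes a one-voxel-thin skeleton is already available as a set of integer (x, y, z) coordinates, and it applies a crude, hypothetical heuristic: a voxel with at most two occupied neighbors among its 26 grid neighbors is treated as part of a curve, and any other voxel as part of a surface. The function name classify_skeleton and the toy data are assumptions made for illustration.

```python
from itertools import product

# All 26 offsets to the neighbors of a voxel in a 3D grid
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def classify_skeleton(voxels):
    """Label each skeleton voxel 'curve' or 'surface' by its neighbor count.

    Assumption (not from the source): the skeleton is one voxel thin, so
    curve voxels have at most two occupied 26-neighbors.
    """
    labels = {}
    for (x, y, z) in voxels:
        n = sum((x + dx, y + dy, z + dz) in voxels for dx, dy, dz in OFFSETS)
        labels[(x, y, z)] = "curve" if n <= 2 else "surface"
    return labels

# Toy skeleton: a straight 5-voxel curve and a separate flat 4x4 plate
line = {(x, 0, 0) for x in range(5)}
plate = {(x, y, 0) for x in range(10, 14) for y in range(4)}
labels = classify_skeleton(line | plate)
```

In this toy input every voxel of the line is labeled as a curve and every voxel of the plate as a surface. Voxels where a curve meets a surface would need more careful handling than this heuristic provides.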



Identifying secondary structures [Structure 07]

As observed above, the skeleton carries important shape information about the protein, in particular about its secondary structures. Using the geometry of the skeleton (its curves and surfaces), we developed an automated method, known as SSEhunter, for identifying both alpha-helices and beta-sheets in cryo-EM density maps at intermediate resolutions. In addition to the skeleton, the method uses density gradients and cross-correlation with prototypical helices.
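The cross-correlation idea mentioned above can be illustrated with a short Python sketch. It does not reproduce SSEhunter's actual scoring, which is not described here; it only shows a standard normalized cross-correlation between a density patch and a template, both given as flat lists of voxel values sampled on the same grid. The function name and the sample values are assumptions for illustration.

```python
import math

def normalized_cross_correlation(patch, template):
    """Return the normalized cross-correlation of two equal-length value lists.

    The result lies in [-1, 1]; it is 0.0 when either input has no variation.
    """
    if len(patch) != len(template) or not patch:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p = sum(patch) / len(patch)
    mean_t = sum(template) / len(template)
    # Covariance-like numerator over mean-centered values
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(patch, template))
    # Product of the two centered sums of squares
    var_p = sum((p - mean_p) ** 2 for p in patch)
    var_t = sum((t - mean_t) ** 2 for t in template)
    den = math.sqrt(var_p * var_t)
    return num / den if den > 0 else 0.0

# A patch that is a scaled copy of the template correlates perfectly
score = normalized_cross_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Because the score is invariant to scaling and offset of the density values, a patch can match a helix template even when the two differ in overall intensity.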



Identifying secondary structure correspondences [SPM 07]

While SSEhunter (and several other methods with similar purposes) can identify the secondary structures in a cryo-EM map with reasonable accuracy, what is still lacking is the correspondence between the structures identified in the cryo-EM map and those predicted from the protein sequence. Once again using the skeleton, this time for its topological information, we developed an efficient graph-matching-based method that establishes this correspondence with reasonable accuracy given only the cryo-EM density map and the amino acid sequence. Because the skeleton constrains the search space, our method runs 3-4 orders of magnitude faster than the best available method.
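The following Python sketch illustrates how skeleton connectivity can prune a correspondence search. It is not the graph-matching algorithm from [SPM 07]; it is a brute-force toy that assumes only helices are matched, that helix lengths are the only cost term, and that an alpha-helix rises about 1.5 angstroms per residue. The names match_helices and skeleton_links, and all input values, are hypothetical.

```python
from itertools import permutations

RISE_PER_RESIDUE = 1.5  # approximate alpha-helix rise, angstroms per residue

def match_helices(seq_lengths, map_lengths, skeleton_links):
    """Assign sequence helices (residue counts) to map helices (angstroms).

    skeleton_links is a set of frozenset({i, j}) pairs of map helices that
    the skeleton connects. Returns (cost, assignment) or None, where
    assignment[k] is the map helix chosen for sequence helix k.
    """
    best = None
    for perm in permutations(range(len(map_lengths)), len(seq_lengths)):
        # Topology constraint: helices that are consecutive in the sequence
        # must be connected by the skeleton in the density map.
        if any(frozenset(pair) not in skeleton_links
               for pair in zip(perm, perm[1:])):
            continue
        # Geometry cost: mismatch between expected and observed lengths
        cost = sum(abs(n * RISE_PER_RESIDUE - map_lengths[j])
                   for n, j in zip(seq_lengths, perm))
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best

# Three helices; the skeleton links map helix 0 to helices 1 and 2 only
links = {frozenset((0, 1)), frozenset((0, 2))}
result = match_helices([10, 20, 12], [29.0, 16.0, 17.0], links)
```

Here the connectivity test rejects four of the six possible assignments before any cost is computed. A practical method would avoid enumerating permutations at all, but the pruning effect of the skeleton is the same in spirit.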




Talks
  • Computing a family of skeletons of volumetric models for shape description
    Paper presentation at GMP (July 2006). (Download PPT 13MB)


  • Building Skeletons for Analyzing 3D Cryo-EM Protein Structures at Intermediate Resolutions
    Seminar lecture at the Center for Computational Biology, Washington University Medical School (Oct 2005). (Download PPT 45MB)

Collaborators


Comments or suggestions: taoju at cs.wustl.edu