The Root-Mean-Square Deviation (RMSD) of the least-squares rigid-body superposition of two vector sets is a reliable measure of structural similarity in proteins. Structural biologists and crystallographers use this measure for the common task of detecting structurally similar three-dimensional oligopeptide fragments in large protein structural databases. Routine applications of this task include:
The Super web server permits the rapid extraction of structurally similar fragments from the entire Protein Data Bank (PDB). In particular, for any submitted structural query, Super detects ALL loci in the PDB where the query orthogonally superposes within a user-stipulated RMSD threshold.
Behind the scenes, our method relies on quickly computing a mathematically guaranteed lower bound on the RMSD of the superposition of any two vector sets. Because this bound is cheap to compute, any candidate whose lower bound already exceeds the stipulated RMSD threshold can be discarded without computing the exact superposition; this filters out the vast majority of candidate matches in the PDB.
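The lower bound Super actually uses is not specified here. As a minimal sketch of the filtering idea, assuming one simple bound that is easy to verify: after optimal superposition both vector sets are centred at the origin, so by the triangle inequality the RMSD can never be smaller than the difference of the two radii of gyration, |Rg(A) - Rg(B)|. The function names below are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(points: Sequence[Point]) -> float:
    """Root-mean-square distance of the points from their centroid."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in points) / n
    )


def rmsd_lower_bound(a: Sequence[Point], b: Sequence[Point]) -> float:
    """|Rg(a) - Rg(b)| never exceeds the RMSD of the optimal superposition.

    Both sets are centred at the origin after optimal superposition, and
    ||A - RB|| >= | ||A|| - ||RB|| | with ||RB|| = ||B|| for any rotation R.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vector sets must be non-empty and of equal size")
    return abs(radius_of_gyration(a) - radius_of_gyration(b))


def can_discard(query: Sequence[Point], candidate: Sequence[Point],
                threshold: float) -> bool:
    """Hypothetical filter: True means the exact RMSD cannot be <= threshold."""
    return rmsd_lower_bound(query, candidate) > threshold
```

Only candidates for which `can_discard` returns False need the full, more expensive superposition; a tighter bound would discard more candidates at the same cost.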
Beyond this bound, the underlying superposition program benefits from a very efficient implementation of Kearsley’s method, which solves the least-squares superposition problem as an eigenvalue problem in quaternion parameters. In particular, the eigenvalue decomposition can be terminated early using Gershgorin’s circle theorem, which bounds the spectrum of a square matrix. This, together with some engineering optimizations, makes Super very fast: an average search across the entire PDB takes about 30 seconds, while still guaranteeing that all superposable fragments are found.
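Super’s own implementation is not reproduced here. The pure-Python sketch below illustrates the two ideas under stated assumptions: it builds Kearsley’s 4×4 quaternion matrix, whose smallest eigenvalue λ_min equals the minimal sum of squared deviations (so RMSD = sqrt(λ_min / n)); it diagonalises that matrix with the cyclic Jacobi method (our choice of eigensolver, not necessarily Super’s); and after every sweep it applies Gershgorin’s theorem to stop as soon as every eigenvalue provably exceeds n·t² for threshold t. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = List[List[float]]


def kearsley_matrix(a: Sequence[Point], b: Sequence[Point]) -> Matrix:
    """Kearsley's 4x4 matrix; its smallest eigenvalue is min_R sum |a_i - R b_i|^2."""
    n = len(a)
    ca = [sum(p[k] for p in a) / n for k in range(3)]
    cb = [sum(p[k] for p in b) / n for k in range(3)]
    m = [[0.0] * 4 for _ in range(4)]
    for pa, pb in zip(a, b):
        xa, ya, za = (pa[k] - ca[k] for k in range(3))  # centred coordinates
        xb, yb, zb = (pb[k] - cb[k] for k in range(3))
        xm, ym, zm = xa - xb, ya - yb, za - zb          # differences
        xp, yp, zp = xa + xb, ya + yb, za + zb          # sums
        m[0][0] += xm * xm + ym * ym + zm * zm
        m[1][1] += yp * yp + zp * zp + xm * xm
        m[2][2] += xp * xp + zp * zp + ym * ym
        m[3][3] += xp * xp + yp * yp + zm * zm
        m[0][1] += yp * zm - ym * zp
        m[0][2] += xm * zp - xp * zm
        m[0][3] += xp * ym - xm * yp
        m[1][2] += xm * ym - xp * yp
        m[1][3] += xm * zm - xp * zp
        m[2][3] += ym * zm - yp * zp
    for i in range(4):                                  # fill lower triangle
        for j in range(i):
            m[i][j] = m[j][i]
    return m


def gershgorin_lower_bound(m: Matrix) -> float:
    """No eigenvalue of the symmetric matrix m lies below this value."""
    return min(m[i][i] - sum(abs(m[i][j]) for j in range(len(m)) if j != i)
               for i in range(len(m)))


def jacobi_rotate(m: Matrix, p: int, q: int) -> None:
    """One Jacobi rotation that zeroes m[p][q] (symmetric update in place)."""
    apq = m[p][q]
    if apq == 0.0:
        return
    theta = (m[q][q] - m[p][p]) / (2.0 * apq)
    t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
    c = 1.0 / math.sqrt(t * t + 1.0)
    s = t * c
    m[p][p] -= t * apq
    m[q][q] += t * apq
    m[p][q] = m[q][p] = 0.0
    for r in range(len(m)):
        if r != p and r != q:
            arp, arq = m[r][p], m[r][q]
            m[r][p] = m[p][r] = c * arp - s * arq
            m[r][q] = m[q][r] = s * arp + c * arq


def rmsd_if_within(a: Sequence[Point], b: Sequence[Point], threshold: float,
                   max_sweeps: int = 50) -> Optional[float]:
    """RMSD of the optimal superposition, or None if it exceeds threshold."""
    n = len(a)
    limit = n * threshold * threshold   # RMSD <= t  <=>  lambda_min <= n t^2
    m = kearsley_matrix(a, b)
    for _ in range(max_sweeps):
        if gershgorin_lower_bound(m) > limit:
            return None                 # every eigenvalue is too large: stop early
        off = sum(abs(m[i][j]) for i in range(4) for j in range(4) if i != j)
        if off <= 1e-12 * (1.0 + sum(abs(m[i][i]) for i in range(4))):
            break                       # numerically diagonal: eigenvalues found
        for p in range(3):
            for q in range(p + 1, 4):
                jacobi_rotate(m, p, q)
    rmsd = math.sqrt(max(min(m[i][i] for i in range(4)), 0.0) / n)
    return rmsd if rmsd <= threshold else None
```

As the off-diagonal entries are driven towards zero, the Gershgorin intervals tighten around the eigenvalues, so a clearly non-superposable candidate can often be rejected before the decomposition has converged.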
The program behind Super is freely available from the following link (distributed under the GNU General Public License version 3): http://gitorious.org/super
Super supports two types of queries:
In the first option (see figure above), the user can search using a contiguous fragment as a search query and the server will search the entire PDB and extract all fragments that superpose with the query within a specified threshold of RMSD.
The second option (see below) generalizes the first and allows the user to search using a query fragment with a prespecified gap within it. In other words, the user can search the PDB for superposable fragments (within the specified threshold), ignoring a stretch of residues in between. The query comes in two parts with a fixed gap length in between.
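Super’s scanning strategy is not described here. As a hypothetical sketch, assuming each PDB chain is available as a list of Cα coordinates in residue order (chain breaks and missing residues are ignored), both query types reduce to sliding a window along the chain: a contiguous query uses a window of the query length, while a gapped query uses a window of length stump 1 + gap + stump 2 and discards the gap residues before superposition. The function name and parameters are illustrative.

```python
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float, float]


def candidate_fragments(chain: Sequence[Point], stump1_len: int,
                        gap: int = 0, stump2_len: int = 0
                        ) -> Iterator[Tuple[int, List[Point]]]:
    """Yield (start_index, coordinates_to_superpose) for every window.

    With gap == 0 and stump2_len == 0 this is a contiguous query of length
    stump1_len. Otherwise each window spans stump1 + gap + stump2 residues
    and the gap residues are dropped, so only the two stumps are compared.
    """
    span = stump1_len + gap + stump2_len
    for start in range(len(chain) - span + 1):
        window = chain[start:start + span]
        kept = list(window[:stump1_len]) + list(window[stump1_len + gap:])
        yield start, kept
```

For a query of length 11 that compares residues 1-4 and 8-11 while ignoring residues 5-7, the call would be `candidate_fragments(chain, 4, gap=3, stump2_len=4)`.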
Super offers users two simple methods to submit fragment search queries:
The “Load test data” button at the top of Super’s submission page provides an example of each of the two input methods. The following describes these two input methods in more detail.
The coordinate records of the query fragment can be pasted directly into the text box (shown in the figure above). The coordinate records must be provided in the standard Brookhaven PDB format, with atom-level records for the residues of the query fragment. Alternatively, only the Cα atom records of the residues in the fragment can be provided, omitting all other atoms.
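A minimal sketch of reading a query’s Cα coordinates from pasted Brookhaven-format text, using the fixed column positions of the PDB ATOM record (atom name in columns 13-16, residue name 18-20, chain ID 22, residue number 23-26, insertion code 27, coordinates 31-54). This is not Super’s parser; the function name is hypothetical, and keeping only the first model and the first alternate location are assumptions made for illustration.

```python
from typing import List, Tuple

Residue = Tuple[str, str, Tuple[float, float, float]]  # (chain, residue ID, xyz)


def ca_coordinates(pdb_text: str) -> List[Residue]:
    """Extract C-alpha coordinates from ATOM records (PDB fixed-column format)."""
    residues: List[Residue] = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                         # assumption: use only the first model
        # HETATM records are skipped, so calcium ions (atom name "CA") are ignored.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[16:17] not in (" ", "A"):
            continue                      # assumption: keep first alternate location
        res_name = line[17:20].strip()    # columns 18-20
        chain_id = line[21:22]            # column 22
        res_seq = line[22:26].strip()     # columns 23-26
        i_code = line[26:27].strip()      # column 27 (insertion code)
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.append((chain_id, f"{res_name}{res_seq}{i_code}", xyz))
    return residues
```

Restricting the scan to ATOM records matters because a calcium ion and a Cα atom share the atom name “CA” once whitespace is stripped.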
The user can specify the RMSD threshold (a real number; default = 2.0 Å) for the search. The search process will detect ALL structural fragments in the PDB that superpose with the query fragment within the stated RMSD. Since the list of results returned by the server can be very long, the user can specify the number of annotated results to be displayed (default = 70 hits). (See figure below.) A flat text file containing the entire search results can always be downloaded for each query via a link on the results page.
Alternatively, the query fragment can be specified using the wwPDB accession number of the structure containing it, along with its locus in terms of chain ID and residue range. wwPDB accession numbers (PDB IDs) are unique four-character codes (e.g. 2IC7) for structures in the PDB. On entering a PDB ID in the text box provided (see figure below), Super automatically communicates with the RCSB PDB server, downloads the PDB file and parses it so that the user can interactively select a chain ID within it.
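How Super validates IDs and communicates with the RCSB server is not described. As a hypothetical sketch: check the classic four-character format locally (a digit 1-9 followed by three alphanumerics), build a download URL for the RCSB file server, and list the chain IDs found in the ATOM records, which is the kind of information a chain drop-down list needs. Some very large entries are distributed only in mmCIF format, so a real client would need a fallback; the function names are illustrative.

```python
import re
import urllib.request
from typing import List

# Classic four-character PDB IDs: a digit 1-9 followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def normalise_pdb_id(code: str) -> str:
    """Return the upper-cased PDB ID, or raise ValueError if it is malformed."""
    code = code.strip()
    if PDB_ID_PATTERN.fullmatch(code) is None:
        raise ValueError(f"not a four-character PDB ID: {code!r}")
    return code.upper()


def rcsb_pdb_url(code: str) -> str:
    """Download URL for the legacy PDB-format file on the RCSB file server."""
    return f"https://files.rcsb.org/download/{normalise_pdb_id(code)}.pdb"


def fetch_pdb_text(code: str, timeout: float = 10.0) -> str:
    """Fetch the PDB file over the network (requires connectivity)."""
    with urllib.request.urlopen(rcsb_pdb_url(code), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def chain_ids(pdb_text: str) -> List[str]:
    """Distinct chain IDs (column 22 of ATOM records) in order of appearance."""
    seen: List[str] = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) > 21 and line[21] not in seen:
            seen.append(line[21])
    return seen
```

Validating the format locally first avoids a network round trip for obviously malformed input; only well-formed IDs incur the latency of contacting the remote server.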
The user will experience some latency (a few seconds) while the entered PDB ID is validated. During validation, the PDB ID text box is colored red, as shown in the figure below.
Upon validation, the text box turns green and the chain IDs of the structure become available in the drop-down list. (See figure above.) The user then specifies the start and end residues of the query range. Each residue ID is specified either as an alphanumeric string composed of the three-letter amino acid code followed by the residue number, or simply as the residue number used in the PDB file, as shown above. An insertion code may follow the residue number if necessary. As before, this input option allows users to specify the RMSD threshold and control the number of annotated hits to be displayed.
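The exact parsing rules Super applies are not given. The sketch below assumes a residue ID consists of an optional three-letter residue name, a possibly negative residue number and an optional one-letter insertion code, and that the residues of the chosen chain are available as (name, number, insertion code) tuples in file order. All names are hypothetical.

```python
import re
from typing import Optional, Sequence, Tuple

# Optional three-letter residue name, a (possibly negative) residue number,
# and an optional one-letter insertion code, e.g. "ALA23", "23", "23A".
RESIDUE_ID = re.compile(r"([A-Za-z]{3})?(-?\d+)([A-Za-z]?)")

ResidueSpec = Tuple[Optional[str], int, str]
Residue = Tuple[str, int, str]  # (residue name, residue number, insertion code)


def parse_residue_id(text: str) -> ResidueSpec:
    """Split a residue ID into (name or None, number, insertion code)."""
    m = RESIDUE_ID.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised residue ID: {text!r}")
    name, number, icode = m.groups()
    return (name.upper() if name else None, int(number), icode.upper())


def _matches(res: Residue, spec: ResidueSpec) -> bool:
    name, number, icode = spec
    return res[1] == number and res[2] == icode and name in (None, res[0])


def residue_range(chain: Sequence[Residue], start: str, end: str) -> Tuple[int, int]:
    """Inclusive indices of the start and end residues, located in file order."""
    s, e = parse_residue_id(start), parse_residue_id(end)
    i = next((k for k, r in enumerate(chain) if _matches(r, s)), None)
    if i is None:
        raise ValueError(f"start residue {start!r} not found")
    j = next((k for k in range(i, len(chain)) if _matches(chain[k], e)), None)
    if j is None:
        raise ValueError(f"end residue {end!r} not found after {start!r}")
    return i, j
```

Residues are located by position in the file rather than by comparing residue numbers, because insertion codes (for example 52, 52A, 52B, 53) make purely numeric ordering unreliable.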
As mentioned earlier, Super also supports queries that act, loosely speaking, as ‘wild cards’. Suppose, for example, that the user wants to identify all superposable fragments of length 11 in the PDB, where similarity is judged only on residues 1-4 (stump 1) and 8-11 (stump 2), entirely ignoring the fit of residues 5-7. Such searches are particularly useful for constructing loop regions in homology modelling, a structure prediction method.
The input method for such queries requires the user to paste the PDB records (in Brookhaven format) of stump 1 and stump 2 in the designated text boxes and specify the number of residues in between these two stumps to ignore.
Submitting a search with either of the two input methods described above opens the results in a new browser tab. Each search has an editable title so that users can describe it in their own words. This is especially useful when a user submits multiple queries and wants to review the results later. (See figure below.)
Each web browser is associated with a unique user (browser) identifier. This lets the server keep track of all the searches performed from that browser, which can be accessed later.
During the search, a status bar shows the percentage of the PDB searched so far. An interactive visualization of the query fragment is available while the search runs, as shown in the figure below.
When the search is complete, a fixed number of annotated hits is displayed on the results page. This number can be adjusted via a number box at the top of the results page. (See figure below.)
At the top right of the results page (below the status bar) is a link to “download the complete search results as a text file”.
For each fragment in the PDB that is superposable with the query within the RMSD threshold, the search results page displays its PDB ID (hyperlinked to its RCSB PDB page), the title of the structure extracted from its PDB file, its chain ID, residue IDs of start and end points in the PDB where the match occurred, and the RMSD of superposition. (See figure below.) The fragments are displayed in increasing order of RMSD.
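As a small illustration of how such a results page might be assembled (not Super’s code; the `Hit` record and its field names are hypothetical), the following keeps the hits within the threshold and returns the first N in increasing order of RMSD, using the defaults of 2.0 Å and 70 hits mentioned above.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    pdb_id: str
    chain_id: str
    start_residue: str
    end_residue: str
    rmsd: float


def hits_to_display(hits: Iterable[Hit], threshold: float = 2.0,
                    limit: int = 70) -> List[Hit]:
    """The `limit` lowest-RMSD hits within `threshold`, in increasing RMSD."""
    within = (h for h in hits if h.rmsd <= threshold)
    # nsmallest avoids sorting the full (possibly very long) hit list.
    return heapq.nsmallest(limit, within, key=lambda h: h.rmsd)
```

`heapq.nsmallest` returns the same result as sorting and slicing, but only keeps `limit` items in memory at a time, which suits very long hit lists.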
Next to each result is a “view” button that lets the user interactively visualize the superposition of the query onto the PDB fragment. On clicking the view button, Super automatically downloads the corresponding protein coordinates from the RCSB PDB site and orthogonally transforms the query coordinates to superpose them onto the specified locus (chain and residue IDs).
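The orthogonal transformation applied to the query can be written as x' = R(x - c_query) + c_target, where R is the optimal rotation and c_query, c_target are the centroids of the query and of the matched locus. The sketch below is a hypothetical illustration, not Super’s code: it assumes the rotation is given as a unit quaternion (w, x, y, z) already oriented to carry the centred query onto the centred target, and quaternion sign and ordering conventions differ between programs.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z)


def quat_to_matrix(q: Quat) -> List[List[float]]:
    """3x3 rotation matrix of a quaternion (normalised first)."""
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def transform_query(query: Sequence[Point], q: Quat,
                    target_centroid: Point) -> List[Point]:
    """Apply x' = R (x - c_query) + c_target to every query coordinate."""
    n = len(query)
    c = [sum(p[k] for p in query) / n for k in range(3)]
    r = quat_to_matrix(q)
    out: List[Point] = []
    for p in query:
        d = [p[k] - c[k] for k in range(3)]          # centre on the query
        out.append(tuple(sum(r[i][j] * d[j] for j in range(3)) + target_centroid[i]
                         for i in range(3)))         # rotate, then translate
    return out
```

Because R is orthogonal, all interatomic distances within the query are preserved; only its position and orientation change.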
Upon completion, the visualization frame on the left is refreshed to display the entire PDB structure (in blue) with the query superposed on the listed locus (in red). The visualization is interactive; right-clicking provides many options that control how the superposition is viewed.
As indicated previously, each results page displays a specified number of search hits, while a full list of hits can be downloaded as a text file from the link provided at the top of the page. A snippet containing the top few lines of the text file is provided below.
The text file lists results in the internal order in which the search was conducted (unlike the results page, which sorts hits by increasing RMSD).
Super maintains a cookie with each browser from which queries are submitted. This cookie stores links to the results of all successful searches from that browser. Clicking the ‘view previous searches’ link at the top of the submission page at any time (see figure above) opens a page with links to all previous searches undertaken from that browser. Each link shows the date and time of the search and the search title (either the default title prepared by the server or a user-updated title, as described earlier), as shown in the figure above.
Clicking a link takes the user to the completed search results, which are stored on the server for up to 30 days from the date of the search.
While the results of all searches from a browser can be accessed using the ‘view previous searches’ link, the user can also bookmark individual results using the ‘bookmark this link’ option at the top right of any results page. On most browsers, right-clicking that option allows the user to bookmark the page. Alternatively, left-clicking it places the full URL in the browser’s address bar, which the user can use to return to the results page.