Super web server help

Introduction

The root-mean-square deviation (RMSD) after least-squares rigid-body superposition of two vector sets serves as a reliable measure of structural similarity in proteins. Structural biologists and crystallographers use this measure for the common task of detecting three-dimensional oligopeptide fragments in large protein structural databases. Routine applications of this task include:

  1. understanding the general principles of protein architecture;
  2. bridging gaps in an incomplete structural model by searching the database for fragments that link the given start and end points;
  3. more generally, predicting the three-dimensional structure of a protein.

The Super web server permits the rapid extraction of structurally similar fragments from the entire Protein Data Bank (PDB). In particular, for the submitted structural queries, users of Super can detect ALL loci in the PDB where the query will orthogonally superpose within a user-stipulated threshold RMSD.
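For reference, the quantity being thresholded can be written out explicitly. This is the standard textbook definition of the RMSD after optimal rigid-body superposition, given here only as a reading aid. The symbols $a_i$ and $b_i$ (the $n$ paired points of the two vector sets), $R$ (a rotation) and $t$ (a translation) are notation introduced for this note.

```latex
\mathrm{RMSD}(A, B) \;=\; \min_{R \in SO(3),\; t \in \mathbb{R}^3}
\sqrt{\frac{1}{n} \sum_{i=1}^{n} \left\lVert a_i - (R\, b_i + t) \right\rVert^{2}}
```

A locus in the PDB is reported as a hit when this value, computed between the query and the fragment at that locus, does not exceed the user-stipulated threshold.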

Superposition Algorithm Overview

Behind the scenes, our method relies on quickly computing a mathematically guaranteed lower bound on the RMSD of the superposition of any two vector sets. Whenever this cheaply computable lower bound exceeds the stipulated RMSD threshold, the candidate cannot match and is discarded without a full superposition. This filters out the vast majority of candidate matches in the PDB.
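The filtering idea can be illustrated with a bound that is easy to check by hand. The sketch below does not reproduce the bound Super uses, which this page does not specify. It substitutes a well-known one: after centring both sets, the Cauchy-Schwarz inequality gives RMSD >= |Rg(A) - Rg(B)|, where Rg is the radius of gyration. All function names are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    total = sum((p[k] - centroid[k]) ** 2 for p in coords for k in range(3))
    return math.sqrt(total / n)

def rmsd_lower_bound(query, candidate):
    """Illustrative bound (an assumption, not Super's): RMSD >= |Rg(query) - Rg(candidate)|."""
    return abs(radius_of_gyration(query) - radius_of_gyration(candidate))

def may_match(query, candidate, threshold):
    """False means the candidate can be discarded without superposing it."""
    return rmsd_lower_bound(query, candidate) <= threshold
```

Because the bound never overestimates the true RMSD, discarding a candidate when `may_match` returns False cannot lose a genuine hit; candidates that pass still need the full superposition.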

Beyond the mathematics, the underlying superposition program benefits from a very efficient implementation of Kearsley’s method, which solves the least-squares superposition problem as an eigenvalue problem in quaternion parameters. In particular, the eigenvalue decomposition can be terminated early using Gershgorin’s circle theorem, which bounds the spectrum of a square matrix. Together with some engineering optimizations, this makes Super very fast: an average search across the entire PDB takes about 30 seconds, while still guaranteeing that all superposable fragments are found.
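The following is a minimal sketch of these two ingredients, not Super's actual implementation. It builds Kearsley's symmetric 4x4 matrix from the sums and differences of the centred coordinates; the smallest eigenvalue of that matrix divided by the number of points is the squared RMSD. How Super wires Gershgorin's theorem into its eigenvalue solver is not described on this page, so the sketch assumes one plausible arrangement: a Jacobi iteration that checks the Gershgorin lower bound on the smallest eigenvalue before each sweep and stops as soon as the bound rules out a match. All function names are hypothetical.

```python
import math

def kearsley_matrix(ref, mob):
    """Symmetric 4x4 matrix whose smallest eigenvalue equals the minimal sum of
    squared distances between the centred point sets over all rotations."""
    n = len(ref)
    c_ref = [sum(p[k] for p in ref) / n for k in range(3)]
    c_mob = [sum(p[k] for p in mob) / n for k in range(3)]
    K = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(ref, mob):
        # d = difference and s = sum of the centred coordinates
        dx, dy, dz = ((b[k] - c_mob[k]) - (a[k] - c_ref[k]) for k in range(3))
        sx, sy, sz = ((b[k] - c_mob[k]) + (a[k] - c_ref[k]) for k in range(3))
        # For a unit quaternion q, |rotate(q, b) - a| equals |M q|
        M = [[0.0, -dx, -dy, -dz],
             [dx, 0.0, sz, -sy],
             [dy, -sz, 0.0, sx],
             [dz, sy, -sx, 0.0]]
        for i in range(4):
            for j in range(4):
                K[i][j] += sum(row[i] * row[j] for row in M)
    return K

def gershgorin_lower_bound(K):
    """Every eigenvalue is at least min_i (K[i][i] - sum_{j != i} |K[i][j]|)."""
    return min(K[i][i] - sum(abs(K[i][j]) for j in range(4) if j != i)
               for i in range(4))

def jacobi_sweep(K):
    """One cyclic sweep of Jacobi rotations, in place; eigenvalues are preserved."""
    for p in range(3):
        for q in range(p + 1, 4):
            if K[p][q] == 0.0:
                continue
            diff = K[q][q] - K[p][p]
            theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * K[p][q] / diff)
            c, s = math.cos(theta), math.sin(theta)
            for k in range(4):  # rotate columns p and q
                kp, kq = K[k][p], K[k][q]
                K[k][p], K[k][q] = c * kp - s * kq, s * kp + c * kq
            for k in range(4):  # rotate rows p and q
                kp, kq = K[p][k], K[q][k]
                K[p][k], K[q][k] = c * kp - s * kq, s * kp + c * kq

def superposed_rmsd(ref, mob, threshold, max_sweeps=50):
    """RMSD after optimal superposition, or None once it provably exceeds threshold."""
    n = len(ref)
    K = kearsley_matrix(ref, mob)
    limit = n * threshold ** 2  # the RMSD threshold expressed as an eigenvalue
    for _ in range(max_sweeps):
        if gershgorin_lower_bound(K) > limit:
            return None  # smallest eigenvalue is already too large: stop early
        off = sum(K[i][j] ** 2 for i in range(4) for j in range(4) if i != j)
        trace = sum(K[i][i] for i in range(4))
        if off <= 1e-24 * trace ** 2:
            break  # matrix is numerically diagonal
        jacobi_sweep(K)
    smallest = max(min(K[i][i] for i in range(4)), 0.0)
    return math.sqrt(smallest / n)
```

Since the Gershgorin bound never exceeds the true smallest eigenvalue, a `None` result is always a correct rejection. The bound tightens as the sweeps shrink the off-diagonal entries, so poor candidates tend to be rejected after little or no diagonalization work.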

The program behind Super is freely available from the following link (distributed using the GNU General Public License version 3): http://gitorious.org/super

Query types

Super supports two types of queries:


  1. Search PDB using a contiguous fragment.
  2. Search PDB using two fragments with a prespecified gap in between.

In the first option (see figure above), the user submits a contiguous fragment as the query, and the server searches the entire PDB and extracts all fragments that superpose with the query within the specified RMSD threshold.

The second option (see below) generalizes the first and allows the user to search using a query fragment with a prespecified gap within it. In other words, the user can search the PDB for superposable fragments (within the specified threshold), ignoring a stretch of residues in between. The query comes in two parts with a fixed gap length in between.

Input methods to the Web server

Super offers users two simple methods to submit fragment search queries:


  1. By pasting the query coordinates in the standard Brookhaven PDB format.
  2. By specifying the wwPDB accession number, followed by the chain ID and residue range of the query.

The “Load test data” button at the top of Super’s submission page provides an example of each of these two input methods. The following sections describe them in more detail.

Pasted input in Brookhaven PDB format

The coordinate records from a PDB file can be pasted directly into the text box (shown in the figure above). The records must be in the standard Brookhaven PDB format and contain atomic-level records for the residues of the query fragment. Alternatively, only the Cα atom records of the residues in the fragment can be provided, omitting all other atoms.
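As an illustration of what the pasted records contain, the sketch below pulls one coordinate triple per residue out of Brookhaven-format text using the fixed column positions of the ATOM record. It assumes the atoms of interest are the Cα atoms (atom name CA), and it is not the parser Super uses. The function name and the dictionary keys are hypothetical.

```python
def parse_ca_atoms(pdb_text):
    """Return one dict per C-alpha ATOM record found in Brookhaven-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Only ATOM records are read, so HETATM calcium ions (also named CA) are skipped.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        atoms.append({
            "res_name": line[17:20].strip(),   # columns 18-20
            "chain": line[21],                 # column 22
            "res_seq": int(line[22:26]),       # columns 23-26
            "icode": line[26].strip(),         # column 27, insertion code
            "xyz": (float(line[30:38]),        # columns 31-38
                    float(line[38:46]),        # columns 39-46
                    float(line[46:54])),       # columns 47-54
        })
    return atoms
```

Because the format is column-based rather than whitespace-delimited, pasted records should keep their original spacing; records whose columns have shifted will be misread or rejected by a parser of this kind.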

The user can specify the RMSD threshold (real number; default = 2.0 Angstroms) for the search. The search process will detect ALL structural fragments in the PDB that superpose with the query fragment within the stated RMSD. Since the list of results returned by the server can be very long, the user has an option to specify the number of annotated results to be displayed (default=70 hits). (See figure below.) However, a flat text file containing the entire search results can always be downloaded for each query via a link in the results page.

Specified input using the wwPDB accession number and loci

Alternatively, the query fragment can be specified using the wwPDB accession number of the query structure along with its locus, given as a chain ID and residue range. wwPDB accession numbers, or PDB IDs, are unique four-character codes (e.g. 2IC7) for structures in the PDB. On entering the PDB ID in the text box provided (see figure below), Super automatically communicates with the RCSB PDB server, downloads the PDB file and parses it so that the user can interactively select a chain ID from that file.
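A rough sketch of the kind of check that can be done before contacting the server is shown below. It assumes the classic form of a PDB ID (a digit followed by three alphanumeric characters) and builds a download address on the public RCSB file service. Whether Super validates IDs this way or uses this address is not stated on this page; both function names are hypothetical.

```python
import re

# Classic PDB IDs: one digit followed by three letters or digits.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")

def normalise_pdb_id(text):
    """Strip whitespace, check the four-character form, and upper-case the ID."""
    candidate = text.strip()
    if not _PDB_ID.fullmatch(candidate):
        raise ValueError("not a four-character PDB ID: %r" % text)
    return candidate.upper()

def rcsb_download_url(pdb_id):
    """Address of the PDB-format file on the RCSB file service (assumed source)."""
    return "https://files.rcsb.org/download/%s.pdb" % normalise_pdb_id(pdb_id)
```

A syntactically valid ID may still not exist in the PDB, so a check like this only catches typing errors early; the download itself remains the real validation step.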

Validating the entered PDB ID takes a few seconds. During the validation process the PDB ID text box is coloured red, as shown in the figure below.

Upon validation, the text box turns green and the chain IDs of the structure become available in the drop-down list. (See figure above.) The user then specifies the start and end residues of the query. Each residue ID is given either as an alphanumeric string composed of the three-letter amino acid code followed by the residue number, or simply as the residue number in the PDB file, as shown above. An insertion code may follow the residue number if necessary. As before, this input option allows users to specify the RMSD threshold and control the number of annotated hits to be displayed.
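To make the accepted residue ID forms concrete, here is a small sketch that splits such a string into its parts. The exact syntax Super accepts is shown only in the figure, so the pattern below is an assumption: an optional three-letter residue name, an integer residue number, and an optional single-letter insertion code, with no separators. The function name is hypothetical.

```python
import re

# Assumed form: [three-letter name] + residue number + [insertion code]
_RESIDUE_ID = re.compile(r"([A-Za-z]{3})?(-?\d+)([A-Za-z]?)")

def parse_residue_id(text):
    """Split e.g. 'ALA123', '123' or '123A' into (name or None, number, icode)."""
    match = _RESIDUE_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognised residue ID: %r" % text)
    name, number, icode = match.groups()
    return (name.upper() if name else None, int(number), icode)
```

Under this assumed syntax, `parse_residue_id("ALA123")` yields the name, number and an empty insertion code, while `parse_residue_id("123A")` yields no name, the number 123 and insertion code A.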

Gapped query fragment input format

As mentioned earlier, Super also supports queries that behave like ‘wild cards’ (in a manner of speaking). Assume, for example, that the user wants to identify all superposable fragments in the PDB of length (say) 11, where the criterion of similarity is based only on the regions that match residues 1-4 (stump 1) and 8-11 (stump 2), entirely ignoring the fit of residues 5-7. Such searches are very useful for constructing loop regions in homology modelling, a structure prediction method.

The input method for such queries requires the user to paste the PDB records (in Brookhaven format) of stump 1 and stump 2 in the designated text boxes and specify the number of residues in between these two stumps to ignore.
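The way a gapped query lines up against a chain can be sketched as a sliding window that keeps the two stumps and drops the residues in between. This is only an illustration of the indexing, under the assumption that candidates are contiguous stretches of one chain; it is not Super's search code, and the function and parameter names are hypothetical.

```python
def gapped_windows(chain, stump1_len, gap, stump2_len):
    """Yield (start index, stump-1 items + stump-2 items) for every window
    of length stump1_len + gap + stump2_len along the chain."""
    span = stump1_len + gap + stump2_len
    for start in range(len(chain) - span + 1):
        first = chain[start:start + stump1_len]
        second = chain[start + stump1_len + gap:start + span]
        yield start, first + second
```

For the example in the text (stumps of 4 residues separated by 3 ignored residues), each window spans 11 residues but only 8 of them take part in the superposition. On a chain of 13 residues numbered 1 to 13, the first window keeps residues 1-4 and 8-11.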

Search Results

Submitting a search using either of the two input methods described above opens the results in a new browser tab. An adjustable title text is provided for each search so that users can describe the search in their own words. This is especially useful when a user submits multiple search queries and wants to review the results at a later stage. (See figure below.)

Each web browser is associated with a unique user (browser) identifier. This helps the server keep track of all the searches performed from that browser, so that they can be accessed at a later stage.

During the search process, a status bar shows the percentage completion of the search (within the PDB). An interactive visualization of the query fragment is available at the time of the search as shown in the figure below.

When the search is completed, a fixed number of annotated search hits is displayed on the results page. This number is adjustable via a number box at the top of the results page. (See figure below.)

At the top right of the results page (below the status bar) is a link to “download the complete search results as a text file”.

For each fragment in the PDB that is superposable with the query within the RMSD threshold, the search results page displays its PDB ID (hyperlinked to its RCSB PDB page), the title of the structure extracted from its PDB file, its chain ID, residue IDs of start and end points in the PDB where the match occurred, and the RMSD of superposition. (See figure below.) The fragments are displayed in increasing order of RMSD.

Against each returned result is a “view” button which allows the user to interactively visualize the superposition of the query with the PDB fragment. On clicking the view button, Super automatically downloads the corresponding protein coordinate data from the RCSB PDB site and orthogonally transforms the query PDB coordinates to superpose them onto the specified locus (chain and residue IDs).

Upon completion, the visualization frame on the left is refreshed to display the entire structure from the PDB (in blue) and the query superposed on the listed locus (in red). This visualization is interactive, and right-clicking provides many options that control the viewing of the superposition.


Below the visualization, a direct amino acid sequence comparison between the fragments of the query and the search result is provided as shown in the figure above. In the sequence comparison display, the single letter codes of each amino acid are coloured according to their chemical nature. Specifically:

As indicated previously, each results page displays a specified number of search hits, while a full list of hits can be downloaded as a text file from the link provided at the top of the page. A snippet containing the top few lines of the text file is provided below.

The text file contains results in the internal order in which the search was conducted and (unlike the results page) is NOT sorted in increasing order of RMSD. The legend of the columns is given below (left to right):
  1. PDB ID
  2. Chain ID
  3. Start and end residues (both inclusive)
  4. RMSD of superposition with the query
  5. PDB fragment sequence : query sequence
  6. Sequence identity as a percentage
  7. Sequence similarity as a percentage
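Since the downloaded file is not sorted, a user may want to order it by RMSD locally. The sketch below assumes that the fields on each line are separated by whitespace, which this page does not confirm, and it leaves the position of the RMSD field as a parameter to be read off the actual file. The function and parameter names are hypothetical.

```python
def sort_hits_by_rmsd(lines, rmsd_field):
    """Return the result lines ordered by increasing RMSD.

    rmsd_field is the zero-based index of the RMSD among the
    whitespace-separated fields; lines without a number there are skipped."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) <= rmsd_field:
            continue
        try:
            rmsd = float(fields[rmsd_field])
        except ValueError:
            continue  # header or malformed line
        hits.append((rmsd, line.rstrip("\n")))
    hits.sort(key=lambda hit: hit[0])
    return [line for _, line in hits]
```

Typical use would be to read the file with `open(path).readlines()` and pass the resulting list in, choosing `rmsd_field` after inspecting the first few lines of the download.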

Retrieving previous search results



Super maintains a cookie in each browser from which queries are submitted. This cookie stores the information needed to link to the results of all successful searches from that browser. Clicking the ‘view previous searches’ link at the top of the submission page at any time (see figure above) opens a page with links to all previous searches undertaken using that browser. Each link shows the date and time of the search and the search title (either the default title prepared by the server or a user-updated title as described earlier), as shown in the figure above.

Clicking on a link takes the user to the completed search results, which are stored on the server for up to 30 days from the date of the search.

Bookmarking results page

While the results of all searches from a browser can be accessed using the ‘view previous searches’ link, the user can also bookmark individual results using the ‘bookmark this link’ option at the top right of any results page. On most browsers, right-clicking on that option allows the user to bookmark the page. Alternatively, left-clicking places the full URL in the browser’s address bar, with which the user can return to the results page.

Known Issues

Reporting problems with the web server

Email
James Collier