SuperPose Version 1.0

Online Help

Introduction
SuperPose Input
Options
How SuperPose Works
How SuperPose Measures Up

Introduction

Because 3D structure is actually much more conserved than sequence, 3D structure comparisons allow us to look even further back into biological prehistory. The most common method for 3D structure comparison is called structure superposition. Superposition (or superimposition) is simply the process of rotating or orienting an object until it can be directly overlaid on top of a similar object. The SuperPose web server uses a combination of sequence alignment, difference distance matrix comparison and quaternion eigenvalue superposition to allow users to superimpose 3D structures in 5 different ways:

1) Two molecules of identical sequence and structure (e.g. chains of 2TRX). See example

2) Two molecules of identical sequence but profoundly different structure (e.g. open and closed forms of calmodulin). See example

3) Two molecules of modestly dissimilar sequence, length and structure (e.g. hemoglobin A and B chains). See example

4) Two molecules that are profoundly different in sequence but similar in structure (e.g. ubiquitin and elongin). See example

5) Two or more molecules of identical sequence but slightly different structure (e.g. crystal subunit isoforms, NMR structure ensembles). See example

SuperPose Input

SuperPose requires one or more properly formatted PDB files, which can be retrieved from the Protein Data Bank at www.rcsb.org or generated using most kinds of molecular modeling software (CNS, Xplor, CCP4, DeepView, Gromacs, Babel, MOLMOL) or homology modeling software (DeepModel or other structure prediction web servers). These files may be edited by the user to remove chains, polypeptide segments, atoms or comments that are not needed. The most common reason for SuperPose to fail is that the input PDB file has been corrupted or improperly prepared. SuperPose accepts input either as:

1) PDB files (txt format) from the user’s hard drive

2) PDB accession numbers

If a user chooses to use PDB accession numbers, the program automatically retrieves the necessary files from the PDB website. This is the safest way of getting a properly formatted PDB file. SuperPose also allows users to interactively select chains within PDB files if they are not familiar with the chain structure or chain content of their chosen PDB file. This is done simply by clicking on the name of the chain in the scroll boxes that SuperPose generates after it has read each PDB file. Models and chains can also be selected directly from the main input page by following the PDB accession with an underscore (or colon) and the model/chain identifier, as illustrated in the following examples:

1BQV: 1BQV is an NMR ensemble containing 28 models. Each model contains a single chain. To select the chain in model 5, enter: 1BQV_5 or 1BQV:5

1HBB: 1HBB contains a single model with four chains (A, B, C, and D). To select chain B, enter: 1HBB_B or 1HBB:B

1A03: 1A03 is an NMR ensemble containing 20 models. Each model contains 2 chains (A and B). To select chain A from model 12, enter: 1A03_12_A or 1A03:12:A
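The selector syntax in the three examples above can be sketched as a small parser. This is only an illustration of the input format, not SuperPose's own code: the function name parse_selector is hypothetical, and it assumes that a purely numeric token is a model number, that any other token is a chain identifier, and that the model (if given) comes before the chain.

```python
import re

def parse_selector(text):
    """Split 'ACCESSION[_MODEL][_CHAIN]' (or ':'-separated) into its parts.

    Assumption: numeric token = model number, other token = chain identifier.
    Returns (accession, model or None, chain or None).
    """
    tokens = re.split(r"[_:]", text.strip())
    accession, extras = tokens[0].upper(), tokens[1:]
    if len(accession) != 4 or len(extras) > 2 or "" in extras:
        raise ValueError(f"unrecognised selector: {text!r}")
    model = chain = None
    for token in extras:
        if token.isdigit() and model is None and chain is None:
            model = int(token)      # e.g. the 5 in 1BQV_5
        elif chain is None:
            chain = token           # e.g. the B in 1HBB:B
        else:
            raise ValueError(f"unrecognised selector: {text!r}")
    return accession, model, chain

print(parse_selector("1A03_12_A"))
```

Note that a PDB file whose chains are labelled with digits would be ambiguous under this assumption; how SuperPose resolves that case is not described here.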

Inputting Multiple Models

SuperPose accepts only 2 PDB files as input, but each PDB file may contain multiple models and/or chains. If you wish to superimpose structures contained in more than 2 PDB files, you will have to concatenate them into at most 2 files. A handy way to do this is to use SuperPose initially to superimpose the structures from 2 PDB files. SuperPose provides a PDB file of the superimposed structures as part of its output. This file can be used as input together with another file, and the process repeated until all the desired files are concatenated and superimposed. Alternatively, you can manually concatenate the structures into a single PDB file using a text editor or, preferably, molecular modeling software.

Options

Output Options

Figure 2: Output Options

SuperPose features a number of options for customizing the still image output, including:

Style: Choose from Backbone or Ribbon styles.

Colour: Choose from Greyscale or Colour.

View: Choose from Mono or Stereo View.

Background: Choose from Black or White.

Alignment Options

Figure 7: Alignment Options

In order to perform a superposition, SuperPose needs to know which pairs of atoms to 'match' (see How SuperPose Works). You may specify the alignment by entering residue numbers (PDB numbering) directly in the Alignment Options box. Specify comma-separated ranges of residues in ascending order (e.g. 5-14, 35-100, 115-140). If you are superposing structures from 2 different PDB entries, ensure that the total number of residues is the same in each textbox. For automated alignments, simply leave these boxes blank.
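Before submitting, the range format and the equal-count rule described above can be checked with a few lines of Python. This is a minimal sketch, not part of SuperPose: parse_ranges and residue_count are hypothetical helper names, residue numbers are assumed to be positive integers, and accepting a lone number such as "56" as a one-residue range is an assumption of this sketch.

```python
def parse_ranges(text):
    """Parse '5-14, 35-100' into [(5, 14), (35, 100)], enforcing ascending order."""
    ranges = []
    for chunk in text.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        start, _, end = chunk.partition("-")
        first = int(start)
        last = int(end) if end else first   # lone number: assumed one-residue range
        if last < first or (ranges and first <= ranges[-1][1]):
            raise ValueError(f"ranges must be ascending and non-overlapping: {chunk!r}")
        ranges.append((first, last))
    return ranges

def residue_count(text):
    """Total number of residues covered by a range string."""
    return sum(last - first + 1 for first, last in parse_ranges(text))

print(parse_ranges("5-14, 35-100, 115-140"))
print(residue_count("5-14, 35-100, 115-140"))
```

Comparing residue_count for the two textboxes gives a quick check that the same number of residues has been entered for each PDB entry.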

Example: Global Alignment

Say you want to use the following alignment for superposition:

PDB_Entry_A      1 SDKIIHLTDDSFDTDVLKA--DGAILVDFWAEWCGPCKMIAPILDEIADE     48
                     .:..:...:...:.|.|  |..::|||.|.||||||||.|....::::
PDB_Entry_B      1   MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEK     48

PDB_Entry_A     49 YQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQ     98
                   |...:.: ::::|.....|.:..::..||...||.|:    |||..| |.
PDB_Entry_B     49 YSNVIFL-EVDVDDCQDVASECEVKCTPTFQFFKKGQ----KVGEFS-GA     92

PDB_Entry_A     99 LKEFLDANLA       108
                   .||.|:|.:.   
PDB_Entry_B     93 NKEKLEATINELV    105

Identify the residues that are aligned to a gap, and therefore unmatched. Exclude these residues from the list of residue numbers for the alignment. For PDB Entry 'A', residues 1-2 (SD), 56 (A), 86-89 (VAAT), and 96 (K) align to gaps. Enter the residue ranges for PDB entry 'A' as follows:

     PDB_Entry_A:  3-55, 57-85, 90-95, 97-108

For PDB Entry 'B', residues 18-19 (AG), and 103-105 (ELV) align to gaps. Enter the residue ranges for PDB entry 'B' as follows:

     PDB_Entry_B:   1-17, 20-102
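The bookkeeping in this example can also be done programmatically. The sketch below walks a gapped pairwise alignment column by column and collects the residue numbers that are matched in both sequences. It assumes that residue numbering in both PDB entries starts at 1 and has no breaks (often not true for real PDB files), and that the leading end gap of PDB_Entry_B, shown as blank space in the alignment above, is written as "--". The function names are hypothetical.

```python
def compress(numbers):
    """Turn [3, 4, 5, 7, 8] into '3-5, 7-8'."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n          # extend the current run
        else:
            runs.append([n, n])      # start a new run
    return ", ".join(f"{a}-{b}" if a != b else str(a) for a, b in runs)

def matched_ranges(aligned_a, aligned_b, gap="-"):
    """Return range strings for the residues of A and B that align to each other."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    res_a = res_b = 0
    keep_a, keep_b = [], []
    for x, y in zip(aligned_a, aligned_b):
        res_a += x != gap            # advance the residue counter of A
        res_b += y != gap            # advance the residue counter of B
        if x != gap and y != gap:    # matched column: keep both residues
            keep_a.append(res_a)
            keep_b.append(res_b)
    return compress(keep_a), compress(keep_b)

entry_a = ("SDKIIHLTDDSFDTDVLKA--DGAILVDFWAEWCGPCKMIAPILDEIADE"
           "YQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQ"
           "LKEFLDANLA---")
entry_b = ("--MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEK"
           "YSNVIFL-EVDVDDCQDVASECEVKCTPTFQFFKKGQ----KVGEFS-GA"
           "NKEKLEATINELV")

ranges_a, ranges_b = matched_ranges(entry_a, entry_b)
print("PDB_Entry_A:", ranges_a)   # 3-55, 57-85, 90-95, 97-108
print("PDB_Entry_B:", ranges_b)   # 1-17, 20-102
```

Both range lists cover 100 residues, which satisfies the requirement that the two textboxes contain the same number of residues.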

Example: Local Alignment

Say you wish to superpose an NMR ensemble of structures contained within a single PDB Entry, and you wish to restrict the superposition to the N-terminal 50 residues. Enter the residue list as follows:

     PDB_Entry_A:  1-50

Advanced Options

Figure 8: Advanced Options

The advanced options allow you to adjust certain parameters that SuperPose uses to decide how to best superimpose two or more structures. Don't tinker with these options unless you understand How SuperPose Works!

Secondary Structure: For structures sharing only remote homology, sequence alignments can be poor even though the structural similarity is well preserved. For alignments with low sequence identity, SuperPose will attempt a secondary structure alignment and use this information to guide the superposition. The Secondary Structure Alignment option allows you to set the percent-identity cutoff below which a secondary structure alignment is used instead of a sequence alignment.

Subdomain Matching

SuperPose can look for structurally similar and dissimilar regions between aligned protein chains. This is useful in identifying hinge motions, mobile segments, open and closed forms of protein structures, etc. If SuperPose finds structurally dissimilar regions, it will superpose the structures based on the single longest structurally similar region shared by the two structures.

Subdomain Matching: This option toggles the subdomain matching algorithm on or off. This is handy if you wish to ignore the structural differences in your set of structures, and instead superpose them based on a global sequence alignment.

Minimum Sequence Similarity: SuperPose assumes that you want to perform subdomain matching primarily to superpose the structurally similar domains of highly related structures. It therefore uses a high default minimum pairwise sequence identity cutoff of 80 percent. Only alignments with an identity greater than this value are considered for subdomain superposition.

Similarity Cutoff: When performing a subdomain match, SuperPose identifies the longest 'similar' stretch of residue pairs and uses this subdomain for the superposition. This option sets the maximum RMSD allowed within such a stretch of residue pairs.

Dissimilarity Cutoff: SuperPose identifies candidates for subdomain superposition by checking the difference distance matrix for stretches of residue pairs with an RMSD greater than the dissimilarity cutoff. Stretches of residue pairs with an RMSD above this cutoff that also exceed a minimum length (see Dissimilar Subdomain) will trigger a subdomain superposition.

Dissimilar Subdomain: The length (number of alpha carbon pairs) that a stretch of residue pairs with an RMSD greater than the dissimilarity cutoff (above) must exceed in order to be treated as a candidate for subdomain superposition.

SuperPose Output

SuperPose produces up to seven kinds of output:

1) a PDB file containing the coordinates of the superimposed molecules;

2) a PDB file containing the backbone coordinates of a single averaged structure (if all sequences match identically);

Outputs #1 and #2 are plain text (txt) files that can be used directly in other kinds of modeling or viewing software.

3) a sequence and/or secondary structure alignment file (pairwise or multiple) of the sequences used in the alignment;

Output #3 is also a plain text (txt) file; it may be used to track which residues have been matched in which chains.

4) a difference distance matrix (if only 2 molecules are superimposed);

The difference distance matrix (output #4) is generated as a PNG image that may be used to visually identify regions where there are significant differences between any two structures. The lighter the region, the more similar the structures are; the darker the region, the more different they are. The default display for SuperPose’s difference distance plot shows 6 graded cutoffs: differences between 0 and 1.5 Angstroms are white, between 1.5 and 3.0 Angstroms yellow, between 3.0 and 5.0 Angstroms light green, between 5.0 and 7.0 Angstroms dark turquoise, between 7.0 and 9.0 Angstroms dark blue, and greater than 9.0 Angstroms black. To learn more about difference distance plots please read the following paper: Richards, F.M. and Kundrot, C.E. (1988) Identification of structural motifs from protein coordinate data: secondary structure and first-level supersecondary structure. Proteins. 3, 71-84.

5) the calculated RMSD values (in Angstroms) between the superimposed molecules;

6) a still image (PNG) of the superimposed molecules generated using MolScript;

7) a WebMol applet view of the superimposed molecules.

The WebMol viewer is a Java applet for simple structure viewing. It may not work if a user’s site is protected by a firewall or if a user’s browser doesn’t have the Java run time library installed (especially recent versions of Internet Explorer).
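For users who want to reproduce the colour banding of the difference distance plot (output #4) in their own figures, the six default bands described above can be expressed as a simple lookup. This is a sketch only: the function name dd_colour is hypothetical, and the handling of values that fall exactly on a band boundary (assigned here to the darker band) is an assumption, since the text does not specify it.

```python
import bisect

# Upper bounds (in Angstroms) of the first five bands; anything larger is black.
UPPER_BOUNDS = [1.5, 3.0, 5.0, 7.0, 9.0]
COLOURS = ["white", "yellow", "light green", "dark turquoise", "dark blue", "black"]

def dd_colour(difference):
    """Map a difference distance (Angstroms) to its default plot colour."""
    # bisect_right puts a value equal to a bound into the next (darker) band.
    return COLOURS[bisect.bisect_right(UPPER_BOUNDS, abs(difference))]

for value in (0.4, 2.0, 4.2, 6.0, 8.5, 12.0):
    print(value, dd_colour(value))
```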

How SuperPose Works

SuperPose uses a combination of sequence (or secondary structure) alignment, difference distance matrix analysis, and quaternion superposition to superpose two or more protein chains. The procedure for multiple-chain (3 or more) superposition differs somewhat from simple 2-chain superposition, so these procedures are described separately.

2-Chain Superposition

Alignment: SuperPose extracts the amino acid sequence from the atomic coordinate data for each chain, and then aligns the two sequences using a Needleman-Wunsch global alignment (BLOSUM62). The sequence identity is examined to determine the relatedness of the two chains. If the identity is low (default <25%), SuperPose extracts the secondary structure information for the two chains and performs a secondary structure alignment. If the sequence identity is high (default >80%), SuperPose looks for structural dissimilarity (hinge regions, open/closed forms, etc.) by analysis of the difference distance matrix. Alignments with identities falling between these two cutoffs are superposed using all the matched C-alpha atoms in both structures.
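The three-way decision described in this step can be summarised in a few lines. This is an illustrative sketch of the logic only; the function name and the returned labels are hypothetical, and the two default cutoffs (25% and 80%) are the ones quoted in the text.

```python
def choose_strategy(percent_identity, low_cutoff=25.0, high_cutoff=80.0):
    """Pick the alignment/superposition route from the pairwise sequence identity."""
    if percent_identity < low_cutoff:
        # Remote homologues: fall back on a secondary structure alignment.
        return "secondary structure alignment"
    if percent_identity > high_cutoff:
        # Near-identical sequences: check the DD matrix for hinge/domain motions.
        return "difference distance analysis"
    # Everything in between: superpose on all matched C-alpha atoms.
    return "all matched C-alpha atoms"

for identity in (7.0, 43.0, 98.6):
    print(identity, "->", choose_strategy(identity))
```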

Difference Distance Matrix: A difference distance (DD) matrix can be generated by first calculating the distances between all pairs of C-alpha atoms in one molecule to generate an initial distance matrix. A second pair-wise distance matrix is generated for the second molecule and, for equivalent/aligned C-alpha atoms, the two matrices are subtracted from one another, yielding the DD matrix. From the DD matrix it is possible to quantitatively assess the structural similarity/dissimilarity between two structures. In fact, the difference distance method is particularly good at detecting domain or hinge motions in proteins. For sequences with high identity (default >80%), SuperPose analyzes the DD matrix to look for dissimilar regions, defined by a stretch of 7-or-more aligned C-alpha pairs with RMSD over a certain cutoff (default 3 Angstroms). If a significant dissimilarity is found, SuperPose will perform a 'subdomain superposition'. This is accomplished by finding from the DD matrix the longest stretch of C-alpha pairs with RMSDs below a certain cutoff (default 2 Angstroms). SuperPose restricts the superposition to the C-alpha carbons contained in this stretch.
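The construction of the DD matrix can be sketched in plain Python as follows. The inputs are assumed to be two equal-length lists of (x, y, z) C-alpha coordinates that have already been paired by the alignment; the function names are hypothetical, and storing the absolute value of each difference is an assumption made here so that the result maps directly onto the colour bands of the plot.

```python
import math

def distance_matrix(ca_atoms):
    """All-against-all C-alpha distances within one molecule."""
    return [[math.dist(p, q) for q in ca_atoms] for p in ca_atoms]

def difference_distance_matrix(ca_a, ca_b):
    """Element-wise |D_A - D_B| for two lists of aligned C-alpha coordinates."""
    if len(ca_a) != len(ca_b):
        raise ValueError("both molecules must contribute the same number of aligned atoms")
    d_a, d_b = distance_matrix(ca_a), distance_matrix(ca_b)
    n = len(ca_a)
    return [[abs(d_a[i][j] - d_b[i][j]) for j in range(n)] for i in range(n)]

# Toy example: a straight three-atom chain versus the same chain with a bend.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
dd = difference_distance_matrix(straight, bent)
for row in dd:
    print(["%.3f" % value for value in row])
```

Because the matrix is built from internal distances only, it does not depend on how the two molecules are oriented, which is why it can be computed before any superposition is performed. In the toy example only the distance between the first and last atoms changes, so only that pair shows a non-zero entry.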

Quaternion Superposition: Once SuperPose has identified the optimal matching of pairs of C-alpha atoms between the two protein structures, it proceeds with the quaternion superposition, as described by Kearsley [1]. Superposition is achieved by rotating one chain around its geometric center to orient it with the other structure, then translating it so that the two structures share a common geometric center.
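To make the idea concrete, here is a minimal pure-Python sketch of a quaternion-based superposition. It is not SuperPose's code and it does not use Kearsley's matrix: it uses the closely related closed-form unit-quaternion formulation due to Horn, in which the optimal rotation is the eigenvector with the largest eigenvalue of a symmetric 4x4 matrix. To stay dependency-free, the eigenvector is found by shifted power iteration, which is an implementation shortcut chosen for this sketch (a library eigen-solver would normally be used). All names are hypothetical.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def superpose(moving, reference, iterations=1000):
    """Fit `moving` onto `reference` (equal-length lists of matched xyz tuples).

    Returns (fitted_coordinates, rmsd).
    """
    cm, cr = centroid(moving), centroid(reference)
    a = [tuple(p[k] - cm[k] for k in range(3)) for p in moving]      # centred moving
    b = [tuple(p[k] - cr[k] for k in range(3)) for p in reference]   # centred reference

    # Cross-covariance sums: s[j][k] = sum over atoms of a_j * b_k.
    s = [[sum(p[j] * q[k] for p, q in zip(a, b)) for k in range(3)] for j in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s

    # Symmetric 4x4 matrix whose top eigenvector is the optimal unit quaternion.
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]

    # Power iteration on (N + shift*I); the shift makes all eigenvalues non-negative.
    shift = math.sqrt(sum(v * v for row in n_mat for v in row)) or 1.0
    q = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        q = [shift * q[i] + sum(n_mat[i][j] * q[j] for j in range(4)) for i in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    q0, qx, qy, qz = q

    # Rotation matrix corresponding to the unit quaternion.
    rot = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]

    # Rotate about the geometric centre, then move onto the reference centre.
    fitted = [tuple(sum(rot[r][c] * p[c] for c in range(3)) + cr[r] for r in range(3))
              for p in a]
    rmsd = math.sqrt(sum(math.dist(f, t) ** 2 for f, t in zip(fitted, reference))
                     / len(fitted))
    return fitted, rmsd

# Toy check: the reference is the moving set rotated 90 degrees about z and shifted.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
target = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in points]
fitted, rmsd = superpose(points, target)
print("RMSD after superposition: %.6f" % rmsd)
```

Power iteration converges slowly when the two largest eigenvalues are nearly equal (for example, for almost collinear atom sets), so this sketch is suitable for illustration rather than production use.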

Multiple Chain Superposition

Pileup: Multiple chain superpositions generally follow the procedure described for two chain superpositions, but with some important differences. For multiple superpositions, SuperPose extracts the sequence information for all the structures, and performs an all-against-all alignment and DD matrix analysis. An average RMSD value is determined for the alpha-carbons of each aligned pair by examination of the DD matrix. The alignments are then sorted by average RMSD, from lowest to highest. A pileup is performed as follows: the aligned pair with the lowest RMSD is selected to start the pileup. The other alignments that contain one structure that is represented in the pileup and one structure that is not represented in the pileup are examined for their RMSD values. From this set, the alignment with the lowest RMSD is identified and added to the pileup. This process is iterated until all structures are represented in the pileup. The pileup determines the order of superposition for the multiple structures. At this stage, SuperPose examines the DD matrix for each pair to detect structural dissimilarity (described above). If dissimilarity is found, SuperPose will perform a subdomain superposition for all the structures in the collection. If all the structures are sufficiently similar, SuperPose will perform a global superposition.
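The pileup ordering described above is a greedy procedure, and can be sketched as follows. The input is assumed to be a dictionary mapping each pair of structure names to its average RMSD from the all-against-all comparison; the function name, the data layout and the RMSD values in the toy example are all hypothetical.

```python
def pileup_order(pair_rmsd):
    """Return the aligned pairs in the order they are added to the pileup.

    pair_rmsd: dict mapping (name_a, name_b) -> average RMSD, covering all pairs.
    """
    all_names = {name for pair in pair_rmsd for name in pair}
    first = min(pair_rmsd, key=pair_rmsd.get)     # lowest-RMSD pair starts the pileup
    order = [first]
    in_pileup = set(first)
    while in_pileup != all_names:
        # Pairs linking exactly one structure inside the pileup to one outside it.
        candidates = [p for p in pair_rmsd if (p[0] in in_pileup) != (p[1] in in_pileup)]
        best = min(candidates, key=pair_rmsd.get)
        order.append(best)
        in_pileup.update(best)
    return order

# Made-up average RMSD values for four structures.
rmsd_table = {
    ("A", "B"): 1.0, ("A", "C"): 2.5, ("A", "D"): 3.0,
    ("B", "C"): 0.5, ("B", "D"): 4.0, ("C", "D"): 2.0,
}
print(pileup_order(rmsd_table))   # [('B', 'C'), ('A', 'B'), ('C', 'D')]
```

In the toy example, B and C start the pileup, A joins through its low-RMSD alignment with B, and D joins last through C.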

Superposition

Before superposition, SuperPose examines the sequences in the pileup and determines if they all have the same sequence or if the pileup contains a collection of different sequences. If all the sequences are the same, SuperPose uses a 'superpose to average' strategy, otherwise SuperPose uses a 'pairwise superposition' strategy.

Superpose to Average: For collections of structures with a common sequence, SuperPose proceeds as follows: it superimposes the first two structures from the pileup, creates an average structure, and then superposes the subsequent structure onto the average structure. A new average structure is created from the collection of superposed structures, and the process is repeated until all structures have been superposed. RMSD values for the collection are calculated from the mean (i.e. a mean structure is calculated from the ensemble, the RMSD from each structure to the mean is determined, and the reported RMSD is the mean of these RMSDs).

Multiple Pairwise Superposition: If the collection of structures to be superposed contains dissimilar sequences, SuperPose superimposes the first two structures and adds them to a collection of 'superposed structures'. The next structure is superposed onto the most similar structure from the collection of superposed structures, and the process is repeated until all the structures are superposed. RMSD values for the collection are calculated pairwise (i.e. the RMSD is calculated for each pairwise superposition, and the mean of these values is reported).
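The RMSD reported for a same-sequence ensemble can be illustrated with a short sketch. It assumes the structures have already been superposed and are given as equal-length lists of (x, y, z) coordinates in the same atom order; the function names are hypothetical.

```python
import math

def mean_structure(structures):
    """Average the coordinates of each atom over all structures."""
    count = len(structures)
    atoms = len(structures[0])
    return [tuple(sum(s[i][k] for s in structures) / count for k in range(3))
            for i in range(atoms)]

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def ensemble_rmsd(structures):
    """Mean of the RMSDs from each structure to the ensemble mean structure."""
    mean = mean_structure(structures)
    return sum(rmsd(s, mean) for s in structures) / len(structures)

# Two toy two-atom "structures" that differ by a 2 Angstrom shift along y.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]
print(ensemble_rmsd([model_1, model_2]))   # 1.0
```

Each toy model lies 1 Angstrom from the mean structure, so the ensemble value is 1.0 even though the two models are 2 Angstroms apart; RMSDs to the mean are generally smaller than pairwise RMSDs.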

[1] Kearsley, S.K. (1989) On the orthogonal transformation used for structural comparisons. Acta Cryst. A45, 208.

How SuperPose Measures Up

This table presents a comparison of RMSD values between SuperPose, DeepView, and MolMol, using matching alignments.

Structure(s), Sequence ID & Category              SuperPose             DeepView              MolMol
                                                  Backbone RMSD (Å)     Backbone RMSD (Å)     Backbone RMSD (Å)
                                                  & match residues      & match residues      & match residues

Same Sequence + Similar Structure (pair)
Thioredoxin (2TRX_A vs. 2TRX_B) 100% ID           0.77 (1-108)          0.77 (1-108)          0.66 (1-108)
Hemoglobin (4HHB_A vs. 1DKE_A) 100% ID            0.37 (1-141)          0.37 (1-141)          0.36 (1-141)
P21 Oncogene (6Q21_A vs. 6Q21_B) 100% ID          1.27 (1-171)          1.27 (1-171)          1.16 (1-171)

~Same Sequence + Different Structure (pair)
Calmodulin (1A29 vs. 1CLL) 98.6% ID               14.97 (4-146)         15.02 (4-146)         14.94 (4-146)
Maltose Bind Prot. (1OMP vs. 1ANF) 100% ID        3.76 (1-370)          3.76 (1-370)          3.76 (1-370)

Similar Structure + Different Length (pair)
Hemoglobin (4HHB_A vs. 4HHB_B) 43% ID             1.21                  1.21                  1.16
                                                  4HHB_A (51-141)       4HHB_A (51-141)       4HHB_A (51-141)
                                                  4HHB_B (56-146)       4HHB_B (56-146)       4HHB_B (56-146)
Thioredoxin (3TRX vs. 2TRX_A) 29% ID              1.70                  1.70                  1.63
                                                  3TRX (23-49)          3TRX (23-49)          3TRX (23-49)
                                                  2TRX_A (23-49)        2TRX_A (23-49)        2TRX_A (23-49)
Lysozyme/Lactalbumin (1DPX vs. 1A4V) 36% ID       2.05                  2.05                  1.91
                                                  1DPX (32-99)          1DPX (32-99)          1DPX (32-99)
                                                  1A4V (29-96)          1A4V (29-96)          1A4V (29-96)
Calmodulin/TnC (1CLL vs. 5TNC) 47% ID             5.24                  5.24                  5.21
                                                  1CLL (4-78)           1CLL (4-78)           1CLL (4-78)
                                                  5TNC (14-88)          5TNC (14-88)          5TNC (14-88)

Similar Structure + Very Different Sequence
Ubiquitin/Elongin (1UBI vs. 1VCB_A) 26% ID        2.19                  2.19                  2.11
                                                  1UBI (12-66)          1UBI (12-66)          1UBI (12-66)
                                                  1VCB_A (13-67)        1VCB_A (13-67)        1VCB_A (13-67)
Thio/Glutaredoxin (3TRX vs. 3GRX_A) 7% ID         1.75                  1.76                  1.51
                                                  3TRX (78-89)          3TRX (78-89)          3TRX (78-89)
                                                  3GRX_A (54-65)        3GRX_A (54-65)        3GRX_A (54-65)
Hemoglobins (1ASH vs. 2LHB) 17% ID                1.90                  1.90                  1.76
                                                  1ASH (26-54)          1ASH (26-54)          1ASH (26-54)
                                                  2LHB (34-62)          2LHB (34-62)          2LHB (34-62)
Thioredoxins (1NHO_A vs. 1DE2_A) 22% ID           4.04                  4.04                  3.85
                                                  1NHO_A (13-30)        1NHO_A (13-30)        1NHO_A (13-30)
                                                  1DE2_A (14-31)        1DE2_A (14-31)        1DE2_A (14-31)

Multiple Structures + Same Sequence
Pointed Domain (1BQV, 28 chains) 100% ID          6.28 (all atoms)      Failed                5.88 (all atoms)
                                                  Subdomain match off
                                                  7.87 (all atoms)
                                                  Subdomain match on
Trypsin Inhibitor (1PIT, 20 chains) 100% ID       1.32 (all atoms)      Failed                1.30 (all atoms)
Oligomerization domain (1OLG, 4 chains) 100% ID   0.57 (all atoms)      Failed                0.58 (all atoms)
Oxidoreductase (1NHO, 20 chains) 100% ID          0.96 (all atoms)      Failed                0.96 (all atoms)

Problems? Questions? Suggestions? Contact Canadian Bioinformatics Help Desk

Acknowledgements:

SuperPose uses VADAR, BioPerl, WebMol, EMBOSS, ClustalW, MolScript, ImageMagick, and Gnuplot.

SuperPose v1.0 (2004) Rajarshi Maiti, Gary Van Domselaar, Haiyan Zhang, and David Wishart
