Using an appropriate measure for structural similarity, protein structures can be arranged in distinct groups whose members fulfil a certain similarity threshold relative to a proxy structure, the representative of the group. Performing this clustering procedure with different threshold values, proceeding from a coarse-grained similarity level to a fine-grained level, results in distinct layers, with each of these layers being the complete set of clusters obtained upon a certain similarity threshold. Using these layers, all clusters are associated in a hierarchical way which allows to generate a tree-like structure containing all currently known protein structures. TopTree allows to search and explore this hierarchy of structures upon input of a certain query protein.
(1) Change between Details and Summary view.
(2) Sort table by that column.
(3) Add/remove columns.
(4) Search the table.
(5) Select rows, copy to clipboard, save the table or get a fullscreen view.
(6) Go to TopSearch or TopMatch.
(7) Expand table to the left, or tree to the right.
The structural similarity between a group's representative and one of its members is characterized with a single score S, which combines the number of structurally equivalent residues and the root mean square error (see Sippl & Wiederstein, 2012). In order to deal with the variation in protein size, the relative similarity Sr is used for the clustering procedure:
with Ql and Tl being the lengths of the proteins in comparison. The clustering is then performed with different values for the relative similarity, starting at 30%, which means a very remote structural similarity, and ending at 90%, which characterizes very similar proteins. This leads to the construction of distinct layers which are defined as sets of clusters obtained upon a certain Sr value, e.g. the term L30 denotes the layer resulting from a clustering run with Sr = 30%.
TopTree expects the PDB code of the query protein as input. Alternatively, a keyword can be given to start a text search against the database, such as the name or the function of a protein. Though any text might be specified, be aware that very general terms (e.g. "protein") will result in a server timeout.
The search against the tree of protein structures can be performed on three different levels of structural organization:
A click on either a biological unit, a chain or an asymmetric unit generates the hierarchical tree for that structure. In addition, the table widget gets initiated with the L90 cluster of the query.
The input of a keyword initiates a text search against our database of protein structures which returns all clusters containing at least one member that corresponds to the search term. The clusters are organized by their respective layer and are denoted by their representative and the number of search hits in that specific cluster. A click on the code of the representative then generates the corresponding tree for one of the search hits in this cluster. Additionally, the table widget shows information about the cluster in the chosen layer containing the search hit.
The relevant part of the hierarchical tree is presented: Beginning at L30, the cluster containing the query structure as well as all children clusters are shown. Each cluster is presented by its size (the amount of members), the PDB code of the representative and its compound information as given in the PDB file. A click on the folder symbols opens or closes the underlying branch of the structure tree, while clicking the representative shows further information about the cluster members in the table widget on the right. If a PDB code was given as input, all query clusters are highlighted in bold, while in case of a keyword search all clusters that contain matching structures are highlighted.
The table provides detailed information about the cluster chosen on the tree widget. This information is presented in two ways: