MotiveValidator

From WebChem Wiki

Background

The advancement of research in structural biology has produced a large body of structural data deposited in various databases. A prominent example is the Protein Data Bank (PDB), which has been growing exponentially and currently contains more than 100,000 structures of biomolecules and their complexes. Such large bodies of data, especially when accumulated over a short period of time using high-throughput techniques, are inevitably affected by a variety of problems.

Validation became a major issue in the structural biology community once it became apparent that some published structures contained serious errors, whether documented (e.g., due to insufficient electron density in a certain region) or not. Structural databases generally require new submissions to be checked prior to acceptance. The tools employed for pre-submission validation work fairly well for well-studied residues such as amino acids or nucleotides. However, checking the ligand structure is an essential step in the validation process, both because ligands play a key role in protein function and because they are the main source of errors in structures. A notable case of ligand validation is the analysis of carbohydrate structures[1], which have complex topology and many chiral atoms. At the same time, carbohydrates are involved in a variety of fundamental biological processes and have large pharmaceutical and diagnostic potential. Moreover, more than 60% of nontrivial-sized ligands (> 10 atoms) in the Protein Data Bank contain a carbohydrate. In recent years, many algorithms for general structure validation, ligand validation and carbohydrate validation have been developed. Nonetheless, significant limitations persist, such as insufficient coverage of ligands and time inefficiency (i.e., calculations are time-demanding and only one entry can be validated per run).

Functionality

We have developed MotiveValidator[2], a user-friendly, interactive and platform-independent environment for the fast validation of ligands, residues and fragments (collectively denoted as structural motifs). MotiveValidator covers all standard and custom residues and ligands, and was successfully tested in six research labs on more than 50,000 input samples.

The main purpose of the software is to help you determine whether a residue, ligand, or fragment in a biomolecule or biomolecular complex is structurally complete and correctly annotated. Specifically, MotiveValidator checks whether its topology and chirality are correct. Validation is performed against model residues from the wwPDB Chemical Component Dictionary. MotiveValidator covers all residues and ligands accessible via LigandExpo[3].

Additionally, you may upload your own custom model in case the residue, ligand or fragment of interest has no model available in LigandExpo. Please note, however, that the user bears full responsibility for the correctness of uploaded models.

In its most basic form, MotiveValidator takes a residue and validates it by comparing its structure against the structure of a model residue that shares the same name (3-letter code) with the residue of interest. Therefore, MotiveValidator generally requires two input files: a file containing the residue to be validated (input structure), and a file containing the model to be used as reference. Note that the input structure containing the validated residue most commonly includes entire biomolecules (or their fragments) and other ligands. For this reason, the MotiveValidator interface uses the term biomolecule to refer to the input file containing validated residues. This simply means that you need not extract the validated residue yourself before validating it; MotiveValidator will do that for you.

MotiveValidator identifies the validated residue in the input structure based on the name (3-letter code) of the model residue. It then extracts the validated residue in the form of an input motif, containing all the atoms of the validated residue together with all atoms found within one or two bonds of any atom of the validated residue (surroundings). The validated motif is the subset of atoms in the input motif that correspond to atoms in the model residue.
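
The extraction of the input motif described above amounts to a breadth-first search over the bond graph, stopped at a fixed bond distance. The following Python sketch is illustrative only and is not MotiveValidator's implementation; it assumes that atoms are identified by arbitrary hashable IDs and that the bonds are already known as a list of atom pairs. The function name extract_input_motif is hypothetical.

```python
from collections import deque


def extract_input_motif(residue_atoms, bonds, max_depth=2):
    """Return the residue atoms plus every atom within `max_depth` bonds of them.

    Hypothetical helper: `residue_atoms` is an iterable of atom IDs belonging to
    the validated residue, `bonds` an iterable of (atom_id, atom_id) pairs.
    """
    # Build an undirected adjacency list from the bond list.
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Multi-source breadth-first search: every residue atom starts at depth 0.
    start = list(residue_atoms)
    depth = {atom: 0 for atom in start}
    queue = deque(start)
    while queue:
        atom = queue.popleft()
        if depth[atom] == max_depth:
            continue  # do not expand beyond the surroundings limit
        for neighbor in adjacency.get(atom, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[atom] + 1
                queue.append(neighbor)
    return set(depth)
```

For a linear chain of atoms 1-2-3-4-5 where the residue consists of atom 1 only, extract_input_motif([1], [(1, 2), (2, 3), (3, 4), (4, 5)]) returns {1, 2, 3}: the residue atom plus the atoms one and two bonds away.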


Availability and technical details

Where to find MotiveValidator

MotiveValidator has been freely available online since September 2013 on our web pages. No login is required to use MotiveValidator. If you find this tool useful for your work, please cite it as [2].

What you need in order to run MotiveValidator

  • An up-to-date web browser with WebGL support.
  • JavaScript enabled.

Check whether your browser supports WebGL and JavaScript.

How to get around the web page

For a quick tour of the MotiveValidator service, click the Guide button in the upper right corner of the page and follow the instructions.


Input file requirements

Before you move on, please consult the Terminology used within MotiveValidator.

MotiveValidator is meant for residue-based validation against LigandExpo models, and therefore accepts input files in PDB format[4]. The PDB format is necessary because it contains residue information (the 3-letter residue name and the residue identifier). However, especially for unusual residues or user-defined motifs, it is often useful to also submit your structures in SD/SDF/MOL format along with the PDB format. This ensures that MotiveValidator identifies inter-atomic bonds correctly. Additionally, since the representation of large biomacromolecules and their complexes is moving towards a more general format, MotiveValidator also allows you to upload the structure to be validated in PDBx/mmCIF format.
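
A MOL file lists every bond explicitly, together with its bond type, in a fixed-column bond block. As an illustration only (this is not MotiveValidator's reader), the following sketch reads the element symbols and the bond table from a V2000 MOL block using that column layout. V3000 files and SDF property blocks are not handled, and the function name read_mol_v2000 is hypothetical.

```python
def read_mol_v2000(mol_text):
    """Return (element symbols, bonds) from a V2000 MOL block.

    Bonds are (first_atom, second_atom, bond_type) tuples using the 1-based atom
    numbering of the file; bond_type 1 = single, 2 = double, 3 = triple, 4 = aromatic.
    """
    lines = mol_text.splitlines()
    # Lines 1-3 form the header block; line 4 is the counts line.
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    # Atom block: x, y, z fill columns 1-30; the element symbol sits in columns 32-34.
    atom_lines = lines[4:4 + n_atoms]
    elements = [row[31:34].strip() for row in atom_lines]

    # Bond block: first atom (columns 1-3), second atom (4-6), bond type (7-9).
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    bonds = [(int(row[0:3]), int(row[3:6]), int(row[6:9])) for row in bond_lines]
    return elements, bonds
```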

The PDB file format is well established. The following fields must appear correctly in your input PDB files: atom index, atom name, element symbol, residue name, residue index, and 3D coordinates. Alternate locations of atoms are ignored. If MotiveValidator finds any issues in the input files, it reports them as warnings or processing errors.
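
The required fields all sit at fixed column positions of the ATOM/HETATM records defined in the PDB format specification[4]. The Python sketch below extracts them; it is a hypothetical reader, not MotiveValidator's parser. How MotiveValidator handles alternate locations internally is not described here, so the sketch simply assumes that only atoms with no alternate-location indicator, or with indicator A, are kept.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int      # atom index
    name: str        # atom name
    res_name: str    # residue name (3-letter code)
    chain: str       # chain identifier
    res_seq: int     # residue index
    x: float
    y: float
    z: float
    element: str     # element symbol


def parse_pdb_atoms(pdb_text):
    """Read ATOM/HETATM records from PDB text using the fixed column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        # Assumption: keep atoms without an alternate location, or the 'A' conformer.
        if line[16] not in (" ", "A"):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            element=line[76:78].strip(),
        ))
    return atoms
```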

The input files may be uploaded from your computer, or retrieved from the corresponding databases. The model may be retrieved from LigandExpo by specifying its 3-letter identifier according to the PDB residue naming convention. Similarly, the biomolecule may be retrieved from the Protein Data Bank by specifying its 4-letter PDB ID. Capitalization does not matter, so 1tqn, 1TQN and 1Tqn all retrieve the same structure. Note that we keep mirrors of LigandExpo and the Protein Data Bank on our server, so that file retrieval is as fast as possible. The date the database mirrors were last updated is always displayed at the bottom of the submission page.
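
Because identifiers are case-insensitive, a client preparing requests can normalize them before lookup. The helpers below are a hypothetical sketch, not part of MotiveValidator: they check the classic four-character PDB ID shape (a digit followed by three letters or digits) and the 1 to 3 character residue codes used in this manual, and return upper-case forms.

```python
import re

# Classic PDB ID: one digit followed by three letters or digits (e.g. 1tqn).
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")
# Residue code as used in this manual: up to three letters or digits.
RESIDUE_CODE_PATTERN = re.compile(r"[A-Za-z0-9]{1,3}")


def normalize_pdb_id(raw):
    """Return the upper-case PDB ID, rejecting strings of the wrong shape."""
    value = raw.strip()
    if not PDB_ID_PATTERN.fullmatch(value):
        raise ValueError(f"not a 4-character PDB ID: {raw!r}")
    return value.upper()


def normalize_residue_code(raw):
    """Return the upper-case residue code (e.g. 'nag' -> 'NAG')."""
    value = raw.strip()
    if not RESIDUE_CODE_PATTERN.fullmatch(value):
        raise ValueError(f"not a 1-3 character residue code: {raw!r}")
    return value.upper()
```

With these helpers, normalize_pdb_id("1Tqn"), normalize_pdb_id("1tqn") and normalize_pdb_id("1TQN") all yield "1TQN".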


Validation modes

One feature that makes MotiveValidator very efficient is that it allows the validation of multiple residues in multiple files in a single run. If you need to validate more than one residue at a time, simply retrieve several models from LigandExpo by specifying a list of 3-letter codes separated by commas. Alternatively, you may upload multiple model residue files in a .zip archive. You may also wish to validate a residue in more than one biomolecule; in that case, simply upload a .zip archive of all your input structures, or retrieve several structures from the Protein Data Bank by specifying a list of 4-letter PDB IDs separated by commas. Thousands of biomolecules can be scanned at a time.
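
The comma-separated lists described above can be thought of as defining a set of (model, biomolecule) pairs. The sketch below is a hypothetical illustration of that idea, assuming every requested model is searched for in every requested biomolecule; MotiveValidator's own job handling is not documented here. It trims whitespace, ignores empty entries, upper-cases identifiers and removes duplicates while preserving order.

```python
from itertools import product


def parse_id_list(text):
    """Split a comma-separated list into unique, upper-case identifiers (order kept)."""
    ids = []
    for item in text.split(","):
        item = item.strip().upper()
        if item and item not in ids:
            ids.append(item)
    return ids


def plan_validations(model_codes, pdb_ids):
    """Pair every requested model with every requested biomolecule (assumption)."""
    return list(product(parse_id_list(model_codes), parse_id_list(pdb_ids)))
```

For example, plan_validations("NAG, MAN", "1tqn") gives [("NAG", "1TQN"), ("MAN", "1TQN")].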

Residue Validation

Remember that the term residue refers here to any collection of atoms bound by chemical bonds (covalent, coordinative or ionic), as long as all the atoms have the same residue name (3-letter code) and residue identifier (index internal to the input PDB file). Therefore any set of atoms that can be defined as a residue (or ligand) within the MotiveValidator environment can be validated using Residue Validation. MotiveValidator will first read in the model file and establish the residue name and structural information for all its atoms. MotiveValidator will then scan the entire input biomolecule and extract all instances of residues with the same name (3-letter code) as the model. Each of these instances will be considered a separate motif, receive a unique motif identifier and subsequently be validated against the structure of the model residue.
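
The scanning step can be pictured as grouping atoms by residue name and residue identifier and keeping only the groups whose name matches the model. In this hypothetical sketch (not MotiveValidator's code), the residue identifier is assumed to be the pair (chain, residue number), and the motif identifier format NAME_NUMBER_CHAIN is invented for illustration.

```python
from collections import defaultdict


def residue_instances(atoms, model_name):
    """Collect every instance of `model_name` in a structure.

    `atoms` is a list of (atom_name, res_name, chain, res_seq) tuples. Returns a
    dict mapping a hypothetical motif identifier to the atom names of that instance.
    """
    groups = defaultdict(list)
    for atom_name, res_name, chain, res_seq in atoms:
        if res_name == model_name:
            # Atoms sharing the residue name and identifier form one instance.
            groups[(chain, res_seq)].append(atom_name)
    # Give each instance its own (illustrative) motif identifier.
    return {f"{model_name}_{res_seq}_{chain}": names
            for (chain, res_seq), names in groups.items()}
```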

Sugar Validation

Sugar validation is a special case of residue validation where the residues to be validated are carbohydrates. This special analysis mode allows MotiveValidator to perform fully automated validations. Specifically, MotiveValidator keeps a pre-established list of sugar residues and their models from LigandExpo, containing the residue name and structural information for all atoms in each sugar residue. As a result, you need not provide the sugar models yourself and only need to specify the biomolecule file. MotiveValidator will automatically scan all residues in the biomolecule file and extract all instances of all sugars based on residue names and residue identifiers. Each instance of each sugar will be considered a separate motif, receive a unique motif identifier and subsequently be validated against its corresponding sugar model. All sugar validations take place in a single run, making sugar validation very efficient.
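
Because the list of sugar residues is kept on the server, detecting sugars reduces to looking up each residue name. The set below is only an illustrative handful of common carbohydrate codes from the Chemical Component Dictionary (NAG, MAN, BMA, GAL, GLC and FUC); it is not MotiveValidator's actual list, and the function name sugar_motifs is hypothetical.

```python
# Illustrative subset only; NOT the list that MotiveValidator maintains.
SUGAR_CODES = {"NAG", "MAN", "BMA", "GAL", "GLC", "FUC"}


def sugar_motifs(residues, sugar_codes=SUGAR_CODES):
    """Group sugar residue instances by residue name.

    `residues` is an iterable of unique (res_name, chain, res_seq) tuples.
    Returns {res_name: [(chain, res_seq), ...]} for residues found in `sugar_codes`;
    each listed instance would then be validated against the model of that name.
    """
    found = {}
    for res_name, chain, res_seq in residues:
        if res_name in sugar_codes:
            found.setdefault(res_name, []).append((chain, res_seq))
    return found
```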

Motif/Fragment Validation

The third type of validation that can be performed using MotiveValidator can be initiated via the Motif/Fragment Validation tab on the submission page. The Motif/Fragment Validation tab provides a brief description of the validation process, and several options for input file submission.

Remember that the term motif refers here to a fragment of a biomolecule or ligand, made up of one or more residues or parts of residues. Therefore, any set of atoms that can be defined as a motif within the MotiveValidator environment can be validated using Motif/Fragment Validation. MotiveValidator will first read in the model file and collect the structural information for all its atoms. MotiveValidator will then read in the motif file and assign it a unique motif identifier. Then, the entire structure of the motif as a whole is validated against the model, regardless of the residue names and identifiers in the motif file.
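
One ingredient of comparing a motif against its model is checking the handedness of chiral centers. A common way to do this, shown here as an illustrative technique and not necessarily the one MotiveValidator uses, is to compare the sign of the signed volume (scalar triple product) spanned by a chiral atom and three of its neighbors in both structures: opposite signs indicate opposite handedness. The sketch assumes that the correspondence between motif and model atoms is already known and is represented by shared atom labels; all names are hypothetical.

```python
def signed_volume(center, a, b, c):
    """Scalar triple product (a - center) . ((b - center) x (c - center))."""
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    return ax * (by * cz - bz * cy) - ay * (bx * cz - bz * cx) + az * (bx * cy - by * cx)


def compare_to_model(motif, model, chiral_centers):
    """Report model atoms missing from the motif and chiral centers with flipped handedness.

    motif, model: {atom_label: (x, y, z)} with shared labels for corresponding atoms.
    chiral_centers: {center_label: (neighbor1, neighbor2, neighbor3)} defined on the model.
    """
    missing = sorted(set(model) - set(motif))
    flipped = []
    for center, neighbors in chiral_centers.items():
        labels = (center, *neighbors)
        if any(label not in motif or label not in model for label in labels):
            continue  # handedness cannot be judged without all four atoms
        v_motif = signed_volume(*(motif[label] for label in labels))
        v_model = signed_volume(*(model[label] for label in labels))
        if v_motif * v_model < 0:
            flipped.append(center)
    return missing, flipped
```

Using the same neighbor order in both structures is what makes the sign comparison meaningful; a mirror image of the model flips the sign of every such volume.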


Results Analysis and Output

As soon as you initiate a validation by providing the necessary input files and clicking the Upload and Compute button at the bottom of any of the validation tabs on the MotiveValidator submission page, you will be redirected to the results page. If you are working with large volumes of data, MotiveValidator displays the progress of the calculation. However, even when thousands of structures are processed, the total duration of a validation is on the order of seconds. The output provided by MotiveValidator and its analysis are described on a separate page.

Command line version

In this manual we have described in detail how to use MotiveValidator directly from our server, since that is the most straightforward, comfortable and effective way to take advantage of it. Nonetheless, should you wish to run the validations on your own system instead of on our server, you may do so. Simply click on the Command line version tab and download the command line version of MotiveValidator. Instructions on how to run MotiveValidator on your system can be found on our web page.


MotifExtractor

For Residue Validation and Sugar Validation, input files may contain any structural elements in addition to the validated residue. MotiveValidator will identify and extract the validated residues based on the annotations in the model file. Thus you need not process the biomolecule input files before running MotiveValidator in either of these two validation modes. However, in the case of Motif/Fragment Validation, the entire structure of each input motif is compared to the model without checking the annotations (residue name and identifier) in the input files. Therefore, you will need to provide the input motifs yourself. To save you time in this task, we provide MotifExtractor, a simple automated script for extracting the motifs of interest from biomolecule structure files. Instructions on how to run MotifExtractor on your system can be found on our web page.
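
The following hypothetical sketch illustrates the kind of task MotifExtractor automates; it is not MotifExtractor itself. It copies the ATOM/HETATM records of selected residues, identified by the assumed key (chain, residue number), from a biomolecule PDB file into a separate motif file that could then be submitted for Motif/Fragment Validation.

```python
def extract_motif_pdb(pdb_text, wanted):
    """Return PDB text holding only the ATOM/HETATM records of the `wanted` residues.

    `wanted` is a set of (chain_id, residue_number) pairs, e.g. {("A", 501)}.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            # Chain ID is column 22; residue number occupies columns 23-26.
            key = (line[21], int(line[22:26]))
            if key in wanted:
                kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"
```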

References

  1. Lütteke,T. and von der Lieth,C.-W. (2004) pdb-care (PDB carbohydrate residue check): a program to support annotation of complex carbohydrate structures in PDB files. BMC Bioinformatics, 5, 69.
  2. Vařeková,R.S., Jaiswal,D., Sehnal,D., Ionescu,C.-M., Geidl,S., Pravda,L., Horský,V., Wimmerová,M. and Koča,J. (2014) MotiveValidator: interactive web-based validation of ligand and residue structure in biomolecular complexes. Nucleic Acids Res., 42, W227–W233.
  3. Feng,Z., Chen,L., Maddula,H., Akcan,O., Oughtred,R., Berman,H.M. and Westbrook,J. (2004) Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics, 20, 2153–2155.
  4. Read more about the PDB format at http://wwpdb.org/documentation/format33/v3.3.htm