Advances in structural biology have produced a large body of structural data deposited in various databases. A prominent example is the Protein Data Bank (PDB), which has grown exponentially and currently contains more than 100,000 structures of biomolecules and their complexes. Such large bodies of data, especially when accumulated over a short period using high-throughput techniques, are inevitably affected by various problems.
Validation emerged as a major issue in the structural biology community when it became apparent that some published structures contained serious errors. Structural databases generally require that new submissions be checked before acceptance. The tools used for presubmission validation work fairly well for standard residues such as amino acids and nucleotides. However, checking ligand structures is an essential step in the validation process, both because ligands play a key role in protein function and because they are the main source of errors in structures. Validating ligands and non-standard residues is very challenging because of their high structural diversity and complexity and the general lack of information about their correct structures. Consequently, software tools focused on ligand validation have been developed only relatively recently, and the topic is still under active development. These tools can validate one or more structures (even thousands of them), but they cannot give the broad scientific community a comprehensive picture of structure quality in dedicated, well-established structural databases. For example, no general overview or statistical evaluation of validation results for the entire PDB is yet available, despite the exponential growth of the PDB and the development of structural validation tools in recent years.
We recently developed MotiveValidator, an interactive platform for the rapid validation of ligands, residues and fragments using a novel, straightforward approach based on validating residue annotation. MotiveValidator employs advanced algorithms for the detection and comparison of structural motifs, along with tools for chirality verification and interactive visualization of 3D structures. We extended the validation algorithm used by MotiveValidator and subsequently applied it to validate all ligands and non-standard residues in the PDB.
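The chirality verification mentioned above is not specified in detail here. As an illustration of the general principle only, and not MotiveValidator's actual implementation, the sketch below compares the handedness of a single stereocenter in a model residue and a validated residue using the sign of a scalar triple product. Atoms are paired purely by name, which mirrors the idea of trusting the residue annotation. The coordinate layout (dictionaries keyed by atom name), the function names, the atom names in the demo and the `planar_tol` threshold are all hypothetical.

```python
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def signed_volume(center: Vec, n1: Vec, n2: Vec, n3: Vec) -> float:
    """Scalar triple product (n1-c) . ((n2-c) x (n3-c)); its sign encodes handedness."""
    a, b, c = _sub(n1, center), _sub(n2, center), _sub(n3, center)
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]


def compare_chirality(
    model: Dict[str, Vec],
    validated: Dict[str, Vec],
    center: str,
    neighbors: Sequence[str],
    planar_tol: float = 1e-3,  # hypothetical threshold for a flattened centre
) -> str:
    """Return 'ok', 'inverted' or 'planar' for one stereocenter.

    Atoms are paired by name only; the same three neighbour names are used
    in the same order for both structures. A missing atom raises KeyError,
    which a real validator would report as a completeness problem instead.
    """
    n1, n2, n3 = neighbors
    v_model = signed_volume(model[center], model[n1], model[n2], model[n3])
    v_valid = signed_volume(validated[center], validated[n1], validated[n2], validated[n3])
    # A (near-)zero volume means the centre is flat in one of the structures.
    if abs(v_model) < planar_tol or abs(v_valid) < planar_tol:
        return "planar"
    # Rotations and translations preserve the sign; only a mirror image flips it.
    return "ok" if (v_model > 0) == (v_valid > 0) else "inverted"


if __name__ == "__main__":
    # Hypothetical atom names and coordinates, for illustration only.
    model = {"C1": (0.0, 0.0, 0.0), "O1": (1.0, 0.0, 0.0),
             "C2": (0.0, 1.0, 0.0), "H1": (0.0, 0.0, 1.0)}
    mirror = dict(model, H1=(0.0, 0.0, -1.0))  # reflect one substituent
    print(compare_chirality(model, mirror, "C1", ["O1", "C2", "H1"]))
```

Because the sign of the triple product is invariant under rotation and translation, this particular check needs no superposition; a full structural motif comparison would additionally superimpose the model onto the validated structure.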
We then created ValidatorDB, a comprehensive resource of validation results for ligands and non-standard residues in the Protein Data Bank. ValidatorDB offers several advantages over currently available tools (ValLigURL, pdb-care and MotiveValidator):
- ValidatorDB is a database of precomputed validation results for all ligands and non-standard residues in the PDB (except small molecules with fewer than 7 heavy atoms). The other tools require the user to provide input files or to reference the input molecules in another database. Although these tools accept many input molecules (up to thousands), database-wide validation of a resource such as the PDB is infeasible for a single user.
- ValidatorDB provides summaries of the validation results for the entire PDB, for sets of molecules sharing the same annotation (3-letter code) or the same PDB entry, and for user-defined selections of annotations or PDB entries. Other tools provide results only for the specific PDB entries or molecules submitted for validation.
- ValidatorDB provides systematic insight into validation results. The validation analyses are classified into three main categories (Completeness, Chirality and Advanced), each containing several related analyses. Other tools provide only a set of independent analyses.
- Unlike other tools, ValidatorDB classifies the types of chirality errors, enabling the user to distinguish between serious chirality issues and minor inconsistencies.
- ValidatorDB performs novel analyses and can report, among other things, completely erroneous ligands, alternate conformations and identity with the model molecule. Such analyses provide information valuable for further data processing.
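The summaries listed above (per 3-letter annotation, per PDB entry, and per user-defined selection) amount to grouping per-molecule validation results and counting issues. The sketch below illustrates that grouping under stated assumptions: the `Result` record layout and the issue labels are hypothetical and are not ValidatorDB's data model; only the exclusion of molecules with fewer than 7 heavy atoms comes from the text.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Set

MIN_HEAVY_ATOMS = 7  # molecules with fewer heavy atoms are not validated


@dataclass(frozen=True)
class Result:
    """Hypothetical per-molecule validation record."""
    pdb_id: str
    annotation: str  # 3-letter residue code
    heavy_atoms: int
    issues: FrozenSet[str] = frozenset()  # hypothetical issue labels


def summarize(
    results: Iterable[Result],
    key: str,  # "annotation" or "pdb_id"
    selection: Optional[Set[str]] = None,  # optional user-defined selection
) -> Dict[str, Counter]:
    """Count validated molecules and issue occurrences per group."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for r in results:
        if r.heavy_atoms < MIN_HEAVY_ATOMS:
            continue  # small molecules are excluded from validation
        group_key = getattr(r, key)
        if selection is not None and group_key not in selection:
            continue
        counts = summary[group_key]
        counts["molecules"] += 1
        if not r.issues:
            counts["no_issues"] += 1
        counts.update(r.issues)  # each distinct issue counted once per molecule
    return dict(summary)


if __name__ == "__main__":
    # Made-up records; the PDB IDs are placeholders, not real results.
    demo = [
        Result("1ABC", "NAG", 15, frozenset({"wrong_chirality"})),
        Result("1ABC", "NAG", 15),
        Result("2XYZ", "NAG", 15, frozenset({"missing_atoms"})),
        Result("2XYZ", "SO4", 5),  # skipped: fewer than 7 heavy atoms
    ]
    for code, counts in summarize(demo, "annotation").items():
        print(code, dict(counts))
```

A summary for the entire database is then simply the sum of all group counters, e.g. `sum(summary.values(), Counter())`.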
References:
- Lütteke,T. and von der Lieth,C.-W. (2004) pdb-care (PDB carbohydrate residue check): a program to support annotation of complex carbohydrate structures in PDB files. BMC Bioinformatics, 5, 69.
- Kleywegt,G.J. and Harris,M.R. (2007) ValLigURL: a server for ligand-structure comparison and validation. Acta Crystallogr. D. Biol. Crystallogr., 63, 935–8.
- Berman,H.M., Kleywegt,G.J., Nakamura,H. and Markley,J.L. (2014) The Protein Data Bank archive as an open data resource. J. Comput. Aided Mol. Des.
- Vařeková,R.S., Jaiswal,D., Sehnal,D., Ionescu,C.-M., Geidl,S., Pravda,L., Horský,V., Wimmerová,M. and Koča,J. (2014) MotiveValidator: interactive web-based validation of ligand and residue structure in biomolecular complexes. Nucleic Acids Res., 42, 227–233.
- Sehnal,D., Vařeková,R.S., Huber,H.J., Geidl,S., Ionescu,C.-M., Wimmerová,M. and Koča,J. (2012) SiteBinder: an improved approach for comparing multiple protein structural motifs. J. Chem. Inf. Model., 52, 343–59.
- O’Boyle,N.M., Banck,M., James,C.A., Morley,C., Vandermeersch,T. and Hutchison,G.R. (2011) Open Babel: An open chemical toolbox. J. Cheminform., 3, 33.