ValidatorDB is a database of validation results for ligands and non-standard residues in the Protein Data Bank. Before describing its features in more detail, we first establish the meaning of a few key terms and principles used within the ValidatorDB environment.
We generally use the term residue to refer to any component of a biomacromolecule or a biomacromolecular complex. Within the ValidatorDB environment, a collection of atoms bound by chemical bonds (covalent, coordinative or ionic) can be considered a residue as long as this fact is appropriately indicated in the input PDB file. Specifically, all the atoms that make up a residue should have the same residue annotation (3-letter code) and residue identifier (index internal to the PDB file).
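The grouping rule above can be illustrated with a minimal Python sketch that collects atoms sharing the same residue annotation and residue identifier. It assumes well-formed, fixed-column ATOM/HETATM records in the standard PDB layout. The function name `group_atoms_by_residue` is hypothetical, and including the chain identifier in the key and ignoring insertion codes are assumptions made for illustration, not a description of how ValidatorDB is implemented.

```python
from collections import defaultdict


def group_atoms_by_residue(pdb_lines):
    """Group atom names by (residue annotation, chain, residue identifier).

    Hypothetical helper: assumes fixed-column PDB records and ignores
    insertion codes and alternate locations.
    """
    residues = defaultdict(list)
    for line in pdb_lines:
        # Only coordinate records describe atoms.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20, the 3-letter code
        chain_id = line[21]               # column 22 (assumed part of the key)
        res_seq = int(line[22:26])        # columns 23-26, the residue identifier
        residues[(res_name, chain_id, res_seq)].append(atom_name)
    return dict(residues)
```

Each key of the returned dictionary then corresponds to one residue in the sense used here, and its value lists the atoms annotated as belonging to it.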
With respect to the building blocks of biomacromolecules, ValidatorDB denotes as standard residues the 5 standard nucleotides (A, C, G, T, U) together with their 5 common deoxy forms (DA, DC, DG, DT, DU), and the 20 standard amino acids. Validation results for these standard building blocks are not included in our database because many existing tools already cover them. Selenomethionine (MSE) is also considered a standard residue here, due to its extremely high occurrence in the Protein Data Bank (markedly higher than that of other ligands and non-standard residues) and its frequent circumstantial inclusion in biomacromolecules (to aid X-ray crystallography experiments). Therefore, ValidatorDB covers only non-standard residues, i.e., residues which cannot be denoted as standard by the above definition.
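As a rough sketch, the definition of standard residues could be encoded as a simple set lookup. The nucleotide codes and MSE come from the definition above, while the 20 amino acid codes are the usual 3-letter abbreviations. The names `STANDARD_RESIDUES` and `is_standard_residue` are hypothetical and used only for illustration.

```python
# Residue annotations treated as standard by the definition above.
STANDARD_NUCLEOTIDES = {"A", "C", "G", "T", "U", "DA", "DC", "DG", "DT", "DU"}
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
# Selenomethionine is counted as standard because of its very high occurrence.
TREATED_AS_STANDARD = {"MSE"}

STANDARD_RESIDUES = STANDARD_NUCLEOTIDES | STANDARD_AMINO_ACIDS | TREATED_AS_STANDARD


def is_standard_residue(res_name):
    """Return True if the residue annotation is standard (hypothetical helper)."""
    return res_name.strip().upper() in STANDARD_RESIDUES


if __name__ == "__main__":
    for name in ("GLY", "MSE", "DA", "HEM", "NAG"):
        print(name, "standard" if is_standard_residue(name) else "non-standard")
```

Under this sketch, anything for which `is_standard_residue` returns `False` would count as a non-standard residue.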
We use the term ligand to refer to a chemical compound which forms a complex with a biomacromolecule (e.g., sugar, drug, heme). Ions can also function as standalone ligands, or they can be part of a residue (such as Fe in heme). In the PDB format, a ligand has its own residue identifier and annotation (3-letter code), and is composed of HETATM records. The ValidatorDB term residue thus fully covers ligands.
ValidatorDB uses molecules as an umbrella term for ligands and non-standard residues. Therefore, all properties of ligands and non-standard residues apply to molecules as well (PDB entry of origin, residue annotation, residue identifier, number of heavy atoms, properties assigned after the validation, etc.). Moreover, a single occurrence of a ligand or non-standard residue is also referred to as a molecule.
Note that ValidatorDB contains validation results for all ligands and non-standard residues containing 7 or more heavy atoms. These are denoted as molecules relevant for validation, or simply molecules. ValidatorDB focuses on these molecules because their structures are highly diverse and nontrivial.
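The size criterion can be sketched as a small filter over element symbols. The threshold of 7 comes from the text above. Treating a heavy atom as any atom other than hydrogen, and also excluding deuterium, is an assumption of this sketch, and the function names are hypothetical.

```python
# Element symbols not counted as heavy atoms (deuterium is an assumption here).
LIGHT_ELEMENTS = {"H", "D"}
MIN_HEAVY_ATOMS = 7  # threshold stated in the text


def count_heavy_atoms(elements):
    """Count atoms whose element symbol is not hydrogen (or deuterium)."""
    return sum(1 for e in elements if e.strip().upper() not in LIGHT_ELEMENTS)


def is_relevant_for_validation(elements):
    """Return True if the molecule has at least MIN_HEAVY_ATOMS heavy atoms."""
    return count_heavy_atoms(elements) >= MIN_HEAVY_ATOMS
```

For example, a molecule given as six carbon atoms plus any number of hydrogens would fall below the threshold, while adding one more non-hydrogen atom would make it relevant for validation.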
We use the term model to refer to a particular structure that is known to be correct. This structure is then used as the reference in the validation process. A model is identified by its residue annotation (3-letter code). The models used by ValidatorDB originate from the wwPDB Chemical Component Dictionary (wwPDB CCD).
The term motif refers here to a fragment of a biomacromolecule, biomacromolecular complex or ligand, made up of one or more residues or parts of residues. Specifically, the term input motif refers to the individual molecule being validated, together with its surroundings (i.e., atoms from neighboring residues, within two bonds of any atom of the validated molecule). Each input motif in ValidatorDB is assigned a unique motif identifier based on its PDB entry of origin. In contrast, the term validated motif (or validated molecule) refers strictly to the subset of atoms in the input motif which were successfully mapped to atoms in the model.
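The notion of surroundings within two bonds can be sketched as a breadth-first search over the bond graph, started from all atoms of the validated molecule at once. This is only an illustration under stated assumptions: atoms are represented by arbitrary hashable identifiers, bonds by pairs of such identifiers, and the function name `input_motif_atoms` is hypothetical rather than part of ValidatorDB.

```python
from collections import deque


def input_motif_atoms(molecule_atoms, bonds, max_bonds=2):
    """Return the molecule's atoms plus all atoms within max_bonds bonds of them.

    molecule_atoms: iterable of atom identifiers of the validated molecule.
    bonds: iterable of (atom_a, atom_b) pairs describing chemical bonds.
    """
    # Build an undirected adjacency map from the bond list.
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Multi-source BFS: every atom of the molecule starts at distance 0.
    distance = {atom: 0 for atom in molecule_atoms}
    queue = deque(distance)
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_bonds:
            continue  # do not expand beyond the allowed number of bonds
        for neighbor in adjacency.get(atom, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return set(distance)
```

Starting the search from the whole molecule rather than from a single atom means that an atom is included as soon as it is within two bonds of any atom of the validated molecule, which matches the wording of the definition.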