Difference between revisions of "ValidatorDB:Terminology"

From WebChem Wiki
 
(7 intermediate revisions by 2 users not shown)
Line 1: Line 1:
'''ValidatorDB''' is a database of validation results for ligands and non-standard residues in the Protein Data Bank. Before moving on to more extensive descriptions of features, it is important to clearly establish the meaning of a few key terms and principles within the '''ValidatorDB''' environment.
+
The key terms used within the '''MotiveValidator''' and '''ValidatorDB''' environment are defined below. It is important to establish these terms before moving on to the description of the [[ValidatorDB:Principles | '''ValidatorDB''' principles]], or [[MotiveValidator:Functionality | '''MotiveValidator''' functionality]] on which both validation tools are built.
  
 
=Residue=
 
=Residue=
We generally use the term ''residue'' to refer to any component of a biomacromolecule or a biomacromolecular complex. Within the '''ValidatorDB''' environment, a collection of atoms bound by chemical bonds (covalent, coordinative or ionic) can be considered a residue as long as this fact is appropriately indicated in the input PDB file. Specifically, all the atoms that make up a residue should have the same ''residue annotation'' (3-letter code) and ''residue identifier'' (index internal to the PDB file).
+
We generally use the term ''residue'' to refer to any component of a biomacromolecule or a biomacromolecular complex. Within the validation environment, a collection of atoms bound by chemical bonds (covalent, coordinative or ionic) can be considered a residue as long as this fact is appropriately indicated in the input PDB file. Specifically, all the atoms that make up a residue should have the same ''residue annotation'' (3-letter code) and ''residue identifier'' (index internal to the PDB file).
  
 
=Non-standard residue=
 
=Non-standard residue=
Line 8: Line 8:
  
 
=Ligand=
 
=Ligand=
We use the term ''ligand'' to refer to a chemical compound which forms a complex with a biomacromolecule (e.g., sugar, drug, heme). Ions can also function as self standing ligands, or they can be part of a residue (such as Fe in heme). In the PDB format, a ligand has its own residue identifier and annotation (3-letter code), and is composed from HETATM records. The '''ValidatorDB''' term ''residue'' thus fully covers ligands.
+
We use the term ''ligand'' to refer to a chemical compound which forms a complex with a biomacromolecule (e.g., sugar, drug, heme). Ions can also function as self standing ligands, or they can be part of a residue (such as Fe in heme). In the PDB format, a ligand has its own residue identifier and annotation (3-letter code), and is composed from HETATM records. The '''validation''' term ''residue'' thus fully covers ligands.
  
 
=Molecule=
 
=Molecule=
'''ValidatorDB''' uses ''molecules'' as an umbrella term for ''ligands and non-standard residues''. Therefore, all properties of ligands and non-standard residues are valid for molecules as well (PDB entry of origin, residue annotation, residue identifier, number of heavy atoms, properties assigned after the validation, etc.). Moreover, a single occurrence of a ligand or non-standard residue is also a ''molecule''. Furthermore, it is essential to note that '''ValidatorDB''' contains validation results for all ligands and non-standard residues containing ''7 or more heavy atoms''. These are denoted as ''molecules relevant for validation'', or simply ''molecules''.
+
In the validation, ''molecules'' are used as an umbrella term for ''ligands and non-standard residues''. Therefore, all properties of ligands and non-standard residues are valid for molecules as well (PDB entry of origin, residue annotation, residue identifier, number of heavy atoms, properties assigned after the validation, etc.). Moreover, a single occurrence of a ligand or non-standard residue is also a ''molecule''.  
 +
 
 +
It is essential to note that [[ValidatorDB:UserManual | ValidatorDB]] contains validation results for all ligands and non-standard residues containing ''7 or more heavy atoms''. These are denoted as ''molecules relevant for validation'', or simply ''molecules''. The reason [[ValidatorDB:UserManual | ValidatorDB]] focuses on these types of molecules is that they exhibit high diversity and nontriviality in their structure.
  
 
=Model=
 
=Model=
We use the term ''model'' to refer to a particular structure that is known to be correct. This structure will then be used as reference in the validation process. A model is identified by its residue annotation (3-letter code). The origin of the models used by '''ValidatorDB''' is the wwPDB Chemical Component Dictionary (wwPDB CCD).
+
We use the term ''model'' to refer to a particular structure that is known to be correct. This structure will then be used as reference in the validation process. A model is identified by its residue annotation (3-letter code). The origin of the models used both by [[MotiveValidator:UserManual | MotiveValidator]] and [[ValidatorDB:UserManual | ValidatorDB]] is the wwPDB Chemical Component Dictionary (wwPDB CCD)<ref name="Sen_2014"/>.
 
 
=Motif=
 
The term ''motif'' is used here as a fragment of a biomacromolecule, biomacromolecular complex or ligand, made up of
 
one or more residues or parts of residues. Specifically, the term ''input motif'' refers to the individual molecule being validated, together with its surroundings (i.e., atoms from neighboring residues, within two bonds of any atom of the validated molecule). Each ''input motif'' in '''ValidatorDB''' is assigned a unique motif identifier based on its PDB entry of origin. On the other hand, the term ''validated motif'' (or ''validated molecule'') refers strictly to the subset of atoms in the ''input motif'' which were successfully mapped to atoms in the ''model''.
 
 
 
=Validation procedure=
 
'''ValidatorDB''' implements the ''validation of annotation'' approach, which consists of several steps. First, for each molecule under investigation, the ''input motif'' is extracted from the respective PDB entry. At the same time, the appropriate ''model'' is retrieved from wwPDB CCD. Then, the ''validated molecule'' (or ''validated motif'') is identified as the subset of atoms common in the ''model'' and the ''input motif''. Subsequently, the ''validated molecule'' is compared against the ''model'', atom by atom. All the validation analyses in '''ValidatorDB''' are based on this comparison of atom properties (presence, chirality, element symbol, PDB name, etc.). Other unusual aspects encountered during validation are reported as processing warnings (e.g., which conformer was validated if several conformers were present). Refer to figure..................
 
 
 
=Validation analyses=
 
The validation analyses performed by ValidatorDB cover all main issues which have been observed in the topology (2D structure) and geometry (3D structure) of ligands and non-standard residues. These validation analyses, along with their respective results, can be classified into three categories, namely ''Completeness'', ''Chirality'' and ''Advanced'' analyses.
 
 
 
The ''Completeness'' analyses attempt to find which atoms are missing, whether these atoms are part of rings, or the structure is degenerate, i.e., the molecule contains very severe errors. These may refer to residues overlapping in the 3D space, or atoms which are disconnected from the rest of the structure.
 
 
 
The ''Chirality'' analyses are performed only on complete structures, and aim to evaluate the chirality of each atom in the validated molecule. We distinguish between several types of chirality errors: on carbon atoms (C chirality), on metal atoms (Metal chirality), on atoms with 4 substituents in one plane (Planar chirality), on atoms connected to at least one substituent by a bond of higher order (High order chirality), and the remaining chirality issues (Other chirality).
 
 
 
The ''Advanced'' analyses are focused on issues which are not real chemical problems, but which can complicate further processing and exploration of data, and thus should be noted. The Substitution analysis reports the replacement of some atom by an atom of a different chemical element. The Foreign atom analysis detects atoms which originate from the neighborhood of the validated molecule (i.e., having different PDB residue ID than the majority of the validated molecule), and generally marks sites of inter-molecular linkage. The Different naming analysis identifies atoms whose name in PDB format is different than the standard convention for the validated molecule. The Zero RMSD analysis reports molecules whose structure is identical (root mean square deviation = 0 Å) to the model from wwPDB CCD. The Alternate conformations analysis informs about the occurrence of alternate conformations in the validated PDB entry.
 
  
=Validation results=
+
=Input motif=
Each molecule is evaluated depending on how it fares during the validation analyses described above. If no issues are found during the validation analyses, the molecule is marked as having ''complete structure and correct chirality''. Validated molecules exhibiting an error in at least one of the ''Completeness'' analyses are denoted as ''incomplete'', whereas the remaining molecules are reported as ''complete''. If no issues are detected during the ''Chirality'' analyses, the validated molecule is marked as having ''Correct chirality'', whereas the remaining molecules are marked as having ''Wrong chirality''.  
+
The term ''motif'' is used here as a fragment of a biomacromolecule, biomacromolecular complex or ligand, made up of one or more residues or parts of residues. Specifically, the term ''input motif'' refers to the individual molecule being validated, together with its surroundings (i.e., atoms from neighboring residues, within two bonds of any atom of the validated molecule). Each ''input motif'' is assigned a unique motif identifier during the validation based on its PDB entry of origin. On the other hand, the term ''validated motif'' (or ''validated molecule'') refers strictly to the subset of atoms in the ''input motif'' which were successfully mapped to atoms in the ''model''.
  
Some types of chirality errors do not constitute real issues, but are artifacts of the automated chirality determination procedure. Specifically, an error in planar chirality may just mean that the chiral atom is situated slightly above or below the plane compared to its equivalent in the model from wwPDB CCD. Further, an error in high order chirality often marks the involvement of phosphate O atoms in salt or ester formation, or merely a different PDB format identification of phosphate O atoms of the validated molecule compared to the model. Therefore, if the validated molecule is found to have planar or high order chirality errors, but no other type of chirality issues, the molecule is marked as having ''Correct chirality (tolerant)''.
 
  
When issues are found during an ''Advanced'' analysis, a warning is reported: Substitution, Foreign atom, Different naming, Zero RMSD or Alternate conformations. While the results of the ''Advanced'' analyses have no bearing over the chemical soundness of the validated molecules, they indicate that further, especially automated processing of these structures can be very problematic. Comparison between the structures of molecules with the same annotation (3-letter code) from different PDB entries might even be impossible in the presence of a substitution, as the corresponding atoms have different chemical elements. PDB atom names cannot be used straightforwardly, since even element symbols can differ and atoms can be formally included in neighboring residues.
+
'''Continue with reading about the [[ValidatorDB:Principles | ValidatorDB principles]], [[MotiveValidator:Principles | MotiveValidator functionality]], or return to the [[ValidatorDB:UserManual | ValidatorDB]], or [[MotiveValidator:UserManual | MotiveValidator]] manuals.'''
  
=Validation reports=
 
In '''ValidatorDB''', the results of the validation analyses are organized systematically:
 
* Validation overview for the entire PDB: summarizes the results of all validation analyses for all molecules
 
* Summary of validation results for sets of molecules sharing the same annotation
 
* Summary of validation results for sets of molecules originating from the same PDB entry
 
* Detailed validation report for a set of molecules sharing a particular annotation
 
* Detailed validation report for a particular PDB entry
 
* Detailed validation report for a particular molecule
 
* Custom validation report
 
  
Each type of validation report is accessible via different sections of the web interface (.................)
+
=References=
 +
<references>
 +
<ref name="Sen_2014">Sen,S., Young,J., Berrisford,J.M., Chen,M., Conroy,M.J., Dutta,S., Di Costanzo,L., Gao,G., Ghosh,S., Hudson,B.P., et al. (2014) [http://dx.doi.org/10.1093/database/bau116 Small molecule annotation for the Protein Data Bank]. Database (Oxford)., 2014, 1–11.</ref>
 +
</references>

Latest revision as of 18:13, 10 August 2015

The key terms used within the MotiveValidator and ValidatorDB environments are defined below. These terms should be established before moving on to the description of the ValidatorDB principles or the MotiveValidator functionality, on which both validation tools are built.

Residue

We generally use the term residue to refer to any component of a biomacromolecule or a biomacromolecular complex. Within the validation environment, a collection of atoms bound by chemical bonds (covalent, coordinative or ionic) can be considered a residue as long as this fact is appropriately indicated in the input PDB file. Specifically, all the atoms that make up a residue should have the same residue annotation (3-letter code) and residue identifier (index internal to the PDB file).
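
As an illustration of this definition, the following minimal Python sketch groups the atoms of a PDB file into residues by their residue annotation and residue identifier. It relies on the fixed-column layout of ATOM and HETATM records in the PDB format. The function names are hypothetical and are not part of ValidatorDB or MotiveValidator. Including the chain identifier in the grouping key is an assumption made here, since residue numbers can repeat between chains.

```python
from collections import defaultdict


def format_atom_record(record, serial, atom_name, res_name, chain_id, res_seq, element):
    """Build a fixed-column PDB coordinate record (hypothetical helper, dummy coordinates)."""
    return (
        f"{record:<6}{serial:>5} {atom_name:<4} {res_name:>3} {chain_id}{res_seq:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{0.0:>6.2f}          {element:>2}"
    )


def group_atoms_by_residue(pdb_lines):
    """Map (annotation, chain, residue identifier) to the atom names of that residue."""
    residues = defaultdict(list)
    for line in pdb_lines:
        # Only coordinate records describe atoms.
        if line[:6] not in ("ATOM  ", "HETATM"):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20: residue annotation
        chain_id = line[21]               # column 22 (assumption: part of the key)
        res_seq = int(line[22:26])        # columns 23-26: residue identifier
        residues[(res_name, chain_id, res_seq)].append(atom_name)
    return dict(residues)


lines = [
    format_atom_record("ATOM", 1, "N", "ALA", "A", 10, "N"),
    format_atom_record("ATOM", 2, "CA", "ALA", "A", 10, "C"),
    format_atom_record("HETATM", 3, "C1", "NAG", "A", 501, "C"),
    format_atom_record("HETATM", 4, "O5", "NAG", "A", 501, "O"),
]
groups = group_atoms_by_residue(lines)
print(groups)
```

Atoms that share the same key end up in the same residue, regardless of whether they come from ATOM or HETATM records.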

Non-standard residue

With respect to the building blocks of biomacromolecules, ValidatorDB denotes as standard residues the 5 standard nucleotides (A, C, G, T, U) together with their 5 common deoxy forms (DA, DC, DG, DT, DU), and the 20 standard amino acids. Validation results for these standard building blocks are not included in our database because many existing tools already cover them. Selenomethionine (MSE) is also considered a standard residue here, due to its extremely high occurrence in the Protein Data Bank (markedly higher than that of other ligands and non-standard residues) and its frequent incidental inclusion in biomacromolecules (to aid X-ray crystallography experiments). Therefore, ValidatorDB covers only non-standard residues, i.e., residues which cannot be denoted as standard by the above definition.
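
The definition above amounts to a membership test against a fixed list of residue annotations. A minimal Python sketch follows; the constant and function names are hypothetical and do not come from the ValidatorDB code base, and the 20 amino acids are written using their usual 3-letter codes.

```python
# The 20 standard amino acids (usual 3-letter codes).
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
# The 5 standard nucleotides and their 5 deoxy forms.
STANDARD_NUCLEOTIDES = {"A", "C", "G", "T", "U", "DA", "DC", "DG", "DT", "DU"}
# Selenomethionine is treated as standard, as described in the text.
TREATED_AS_STANDARD = {"MSE"}

STANDARD_RESIDUES = STANDARD_AMINO_ACIDS | STANDARD_NUCLEOTIDES | TREATED_AS_STANDARD


def is_non_standard(annotation):
    """Return True if the residue annotation is not in the standard set."""
    return annotation.strip().upper() not in STANDARD_RESIDUES


print(is_non_standard("MSE"))  # False: treated as a standard residue
print(is_non_standard("NAG"))  # True: not in the standard set
```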

Ligand

We use the term ligand to refer to a chemical compound which forms a complex with a biomacromolecule (e.g., sugar, drug, heme). Ions can also function as standalone ligands, or they can be part of a residue (such as Fe in heme). In the PDB format, a ligand has its own residue identifier and annotation (3-letter code), and is composed of HETATM records. The term residue, as used in the validation, thus fully covers ligands.

Molecule

In the validation, the term molecule is used as an umbrella term for ligands and non-standard residues. Therefore, all properties of ligands and non-standard residues are valid for molecules as well (PDB entry of origin, residue annotation, residue identifier, number of heavy atoms, properties assigned after the validation, etc.). Moreover, a single occurrence of a ligand or non-standard residue is also a molecule.

It is essential to note that ValidatorDB contains validation results for all ligands and non-standard residues containing 7 or more heavy atoms. These are denoted as molecules relevant for validation, or simply molecules. The reason ValidatorDB focuses on these types of molecules is that they exhibit high diversity and nontriviality in their structure.
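
The size criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a heavy atom is taken to be any atom other than hydrogen (deuterium, symbol D, is also counted as hydrogen here), and the names `MIN_HEAVY_ATOMS`, `count_heavy_atoms` and `is_relevant_for_validation` are hypothetical.

```python
MIN_HEAVY_ATOMS = 7  # threshold stated in the text

# Assumption: hydrogen and deuterium are the only non-heavy atoms.
HYDROGEN_SYMBOLS = {"H", "D"}


def count_heavy_atoms(elements):
    """Count atoms whose element symbol is not hydrogen."""
    return sum(1 for symbol in elements if symbol.strip().upper() not in HYDROGEN_SYMBOLS)


def is_relevant_for_validation(elements):
    """Return True if the molecule has 7 or more heavy atoms."""
    return count_heavy_atoms(elements) >= MIN_HEAVY_ATOMS


# Glycerol (C3H8O3) has 6 heavy atoms, so it falls below the threshold.
glycerol = ["C"] * 3 + ["O"] * 3 + ["H"] * 8
# N-acetylglucosamine (C8H15NO6) has 15 heavy atoms.
nag = ["C"] * 8 + ["N"] + ["O"] * 6 + ["H"] * 15

print(is_relevant_for_validation(glycerol))
print(is_relevant_for_validation(nag))
```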

Model

We use the term model to refer to a particular structure that is known to be correct. This structure will then be used as reference in the validation process. A model is identified by its residue annotation (3-letter code). The origin of the models used both by MotiveValidator and ValidatorDB is the wwPDB Chemical Component Dictionary (wwPDB CCD)[1].

Input motif

The term motif is used here to denote a fragment of a biomacromolecule, biomacromolecular complex or ligand, made up of one or more residues or parts of residues. Specifically, the term input motif refers to the individual molecule being validated, together with its surroundings (i.e., atoms from neighboring residues, within two bonds of any atom of the validated molecule). During the validation, each input motif is assigned a unique motif identifier based on its PDB entry of origin. In contrast, the term validated motif (or validated molecule) refers strictly to the subset of atoms in the input motif which were successfully mapped to atoms in the model.
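
The "within two bonds" rule for the surroundings can be illustrated with a breadth-first search over a bond graph. The Python sketch below is not the procedure used by the validation tools. It assumes that bonds are already available as pairs of atom labels, and the function name and the atom labels are hypothetical.

```python
from collections import deque


def input_motif_atoms(bonds, molecule_atoms, max_bonds=2):
    """Return the molecule atoms plus all atoms within max_bonds bonds of any of them."""
    # Build an undirected adjacency map from the list of bonded atom pairs.
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Multi-source breadth-first search starting from every atom of the molecule.
    distance = {atom: 0 for atom in molecule_atoms}
    queue = deque(molecule_atoms)
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_bonds:
            continue  # do not expand beyond the allowed number of bonds
        for neighbor in adjacency.get(atom, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return set(distance)


# A sugar (NAG) linked to an asparagine side chain; labels are illustrative.
bonds = [
    ("NAG:C1", "NAG:O5"),
    ("NAG:C1", "ASN:ND2"),
    ("ASN:ND2", "ASN:CG"),
    ("ASN:CG", "ASN:CB"),
    ("ASN:CG", "ASN:OD1"),
]
motif = input_motif_atoms(bonds, {"NAG:C1", "NAG:O5"})
print(sorted(motif))
```

In this example ASN:ND2 (one bond away) and ASN:CG (two bonds away) are part of the input motif, whereas ASN:CB and ASN:OD1 (three bonds away) are not.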


Continue reading about the ValidatorDB principles or MotiveValidator functionality, or return to the ValidatorDB or MotiveValidator manuals.


References

  1. Sen, S., Young, J., Berrisford, J.M., Chen, M., Conroy, M.J., Dutta, S., Di Costanzo, L., Gao, G., Ghosh, S., Hudson, B.P., et al. (2014) Small molecule annotation for the Protein Data Bank. Database (Oxford), 2014, 1–11.