ChargeCalculator:Job submission


The ACC submission page contains a brief description of ACC, and is further organized into several tabs. Since this is your first contact with ACC, the support tabs Quick Help and Samples help you get started, with basic information and examples of ACC use cases.

Examples of interesting computation setups accessible via the Submit a computation tab of the ACC submission page. The setup and results of interesting use cases accessible via the Samples tab of the ACC submission page.

You may submit an atomic charge calculation in three easy steps: upload your molecule(s), set up the computation, and start the calculation.

Upload molecules

ACC can read the molecular structure and charge information from the most common file formats. Nonetheless, because it was designed to handle molecules of all kinds and sizes, ACC generally requires that the input files follow the formal guidelines established for each format.

Checklist for uploading molecules
  • File format: PDB, PDBx/mmCIF, PQR, MOL, MOL2, SDF, or .zip with multiple files of a suitable format; 50MB limit for each upload
  • All H should be present, according to the appropriate protonation states. The computation can run even when H are missing, but the absolute values of the atomic charges will not be chemically relevant.
  • No limitation regarding the size, number or nature of the chemical entities in each file (proteins, nucleic acids, ligands, water, etc.): everything will be loaded and identified as a single molecule
  • If a file contains multiple molecules annotated as such (typically NMR models), only the first one will be loaded. If you want to load more NMR models (e.g., to later compare the results), you need to upload them in separate files.
  • If a molecule contains multiple chains, the chain IDs should be present at their appropriate location.
ACC supports all popular molecular structure formats (PDB, PDBx/mmCIF, PQR, MOL, MOL2, SDF). For multiple molecules, upload a .zip archive. All structural elements in one file are assigned to one molecule (protein, ligands, water, etc.). Make sure the input molecules are complete. If necessary, use an external program to add H before uploading the molecule(s).
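The checklist above can be partly automated before upload. The following is a minimal sketch, not part of ACC: the function name, the list of file extensions and the interpretation of the 50 MB limit as 50 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
import io
import zipfile

# Assumed spellings of the extensions for the formats listed in the checklist.
ALLOWED_EXTENSIONS = {".pdb", ".cif", ".pqr", ".mol", ".mol2", ".sdf"}
# 50 MB upload limit; the exact byte count is an assumption.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def build_upload_archive(files):
    """Pack {file name: text content} into an in-memory .zip and return its bytes.

    Raises ValueError for an unsupported extension or an oversized archive.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            dot = name.rfind(".")
            extension = name[dot:].lower() if dot != -1 else ""
            if extension not in ALLOWED_EXTENSIONS:
                raise ValueError(f"unsupported format: {name}")
            archive.writestr(name, content)
    data = buffer.getvalue()
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("archive exceeds the 50 MB upload limit")
    return data
```

The returned bytes can be written to a file such as molecules.zip and uploaded in one step.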

All the chemical entities in each file (proteins, nucleic acids, ligands, water...) will be loaded and identified as a single molecule. If you do not wish to include part of the system in your calculation, you must remove it from the input file. Once you have uploaded the file, you may choose to include/ignore water, but all other chemical entities will be present.

By default, only the first NMR model is loaded from files with multiple models annotated as such. If you wish to load all NMR models, you must produce a separate input file for each model. The same holds true for multiple molecules in SDF format: only the first molecule will be loaded.
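Producing one file per model can be scripted. The sketch below assumes a PDB-format file in which each model is delimited by MODEL and ENDMDL records; the function name is hypothetical, and records that follow the last ENDMDL (such as CONECT or MASTER) are dropped for simplicity.

```python
def split_pdb_models(pdb_text):
    """Return one PDB-formatted string per MODEL/ENDMDL block.

    Records that precede the first MODEL (HEADER, CRYST1, ...) are copied
    into every output. A file without MODEL records is returned unchanged.
    """
    header, models, current = [], [], None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []                 # start collecting a new model
        elif record == "ENDMDL":
            if current is not None:
                models.append(current)   # close the current model
            current = None
        elif current is not None:
            current.append(line)         # record inside a model
        elif not models:
            header.append(line)          # record before the first model
    if not models:
        return [pdb_text]
    return ["\n".join(header + body + ["END"]) + "\n" for body in models]
```

Each returned string can then be saved under its own file name, for example 2ABC_model1.pdb, 2ABC_model2.pdb, and so on (these names are only illustrative).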

To produce chemically relevant atomic charges using EEM, the structure of the molecule must be complete. No crucial parts should be missing. If parts of the structure are missing, appropriate capping groups should be included. All protons should be present according to the relevant protonation state. Since ACC does not currently include functionality for editing the molecular structure, you must address these issues before uploading the molecule into ACC. For example, you may use a server like pdb2pqr to assign protonation states, add protons and subsequently estimate the total molecular charge.

ACC produces a missing H warning if no H are found in the input file. Despite the missing H warning, ACC allows you to proceed with the charge calculation step, as it might not always be possible to obtain a perfect structure (e.g., when working with low resolution structures of extremely large complexes). The results from such calculations may not have chemical meaning in their absolute values, but they can be very useful when comparing sets of charges (open vs closed conformation, free vs bound state, etc.).
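You can anticipate this warning before uploading. The sketch below is not ACC code: it assumes a PDB-format file whose ATOM and HETATM records carry the element symbol in columns 77-78, and it only mirrors the rule stated above (no H found at all). The function names are hypothetical.

```python
from collections import Counter


def count_elements(pdb_text):
    """Count element symbols read from columns 77-78 of ATOM/HETATM records."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            element = line[76:78].strip().upper()
            counts[element or "?"] += 1   # "?" marks a blank element column
    return counts


def hydrogen_report(pdb_text):
    """Summarise atom and hydrogen counts; flag files without any hydrogen."""
    counts = count_elements(pdb_text)
    total = sum(counts.values())
    hydrogens = counts.get("H", 0)
    return {
        "atoms": total,
        "hydrogens": hydrogens,
        "missing_h_suspected": total > 0 and hydrogens == 0,
    }
```

A file that contains some but not all hydrogens passes this simple check, so a complete structure still has to be verified by other means.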

Input files generally contain atom type information. Many different atom type schemes are used in different modeling projects. Moreover, the output is often not standardized even between different applications implementing the same atom type scheme. ACC attempts to be a general utility, and currently implements only the detection of chemical elements. If the atom types in the input file differ from chemical elements, ACC will report them as unknown chemical elements, and these atoms will be skipped during the EEM calculation (they will not contribute to the EEM matrix). A similar problem will arise if the atom type information is not found at the expected place in the file. In the future, a more complex parsing algorithm may be implemented in ACC in order to cover the most common atom type schemes (e.g., AMBER, OPLS, etc.). Currently, the atom type parsing problem can be worked around either by uploading input files which adhere to the formal guidelines for their respective formats and contain atom types according to chemical elements, or by creating an EEM parameter set with special parameters for those atom types which ACC finds problematic.

If the chain ID is not explicitly included in the input file, but the molecule contains multiple chains with overlapping residue serial numbers, the results will not be meaningful for the affected residues, and possibly even in the vicinity of these residues. ACC provides check chain ID warnings both before and after the computation if this problem is detected, so that the input file can be corrected.
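A quick local check can reveal this problem before upload. The sketch below is an illustration only and does not reproduce the ACC check: it assumes a PDB-format file (chain ID in column 22, residue number and insertion code in columns 23-27) and reports every residue identifier that appears in more than one separate run of records, which is what happens when two chains share a blank or identical chain ID.

```python
def find_overlapping_residues(pdb_text):
    """Return sorted (chain ID, residue number) pairs seen in separate runs.

    A pair that reappears after records of a different residue suggests
    that two chains share the same chain ID and residue numbering.
    """
    seen = set()
    overlaps = set()
    previous = None
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (line[21:22], line[22:27].strip())
        if key != previous:
            if key in seen:
                overlaps.add(key)   # this residue was already closed earlier
            seen.add(key)
            previous = key
    return sorted(overlaps)
```

An empty list means no overlap was detected by this heuristic; a non-empty list points to the residues whose chain IDs should be corrected.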

If bond information is not explicitly included in the input file, ACC will attempt to compute this information based on the molecular structure. This algorithm may assign wrong bond information when interatomic distances vary significantly from the expected norms. This behavior may only affect calculations using EEM parameter sets which distinguish between atom types based on bond information.

For a given molecule, you may wish to visualize and analyze atomic charges which you obtained by means other than ACC. You can use one of the dedicated molecular structure formats that include atomic charges (MOL2, PQR). Alternatively, you can use one or multiple .wprop files or .*chrg files to supply ACC with several sets of precomputed charges for one or more molecules. The format of a .wprop file with atomic charges for a molecule with 7 atoms is:

molecule name
name of set of charges
Charges
RealAtomProperties
7
1 -0.303792368142346
2 0.172805849402877
3 0.850934507393088
4 -0.665816273614449
5 0.155665455717068
6 0.0563610400635921
7 0.205393963483477

To let ACC know which .wprop or .*chrg files should be assigned to which molecule, you must name these files MOLECULENAME_ref_UNIQUENAME.wprop or MOLECULENAME_ref_UNIQUENAME.*chrg. The first part of the file name, MOLECULENAME, must be the same as the name of the molecular structure file containing the molecule to which the set of charges will be assigned. The designation ref marks this set of charges as user defined. The last part of the file name, UNIQUENAME, is any group of characters which will allow for a unique identification of this set of charges. Thus, if you are uploading the file 4DRP.pdb and wish to include 3 sets of charges that you previously computed using other means, simply name these files 4DRP_ref_1.wprop, 4DRP_ref_2.wprop and 4DRP_ref_3.wprop.
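Writing such files by hand is error prone, so a small helper may be useful. The sketch below follows the .wprop layout and the naming rule described above; it assumes that the fifth line is the number of atoms and that the indices follow the atom order of the structure file, neither of which is stated explicitly here. Both function names are hypothetical.

```python
def format_wprop(molecule_name, charge_set_name, charges):
    """Return the text of a .wprop file for one set of atomic charges.

    Assumption: line 5 holds the atom count and indices are 1-based,
    following the atom order of the uploaded structure file.
    """
    lines = [
        molecule_name,
        charge_set_name,
        "Charges",
        "RealAtomProperties",
        str(len(charges)),
    ]
    lines += [f"{index} {charge!r}" for index, charge in enumerate(charges, start=1)]
    return "\n".join(lines) + "\n"


def reference_file_name(structure_file, unique_name, extension="wprop"):
    """Build MOLECULENAME_ref_UNIQUENAME.extension from the structure file name."""
    molecule_name = structure_file.rsplit(".", 1)[0]
    return f"{molecule_name}_ref_{unique_name}.{extension}"
```

For example, reference_file_name("4DRP.pdb", 1) gives 4DRP_ref_1.wprop, and the text returned by format_wprop can be written to that file.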


Setup computation

Once you have uploaded your molecule(s), ACC parses the molecular structure and redirects you to the ACC Setup page. Note that at this point a unique URL has been generated on the ACC server, and assigned to your computation. You may use this URL during calculation setup, execution or completion to access all the data associated with your ACC computation.

Upon uploading the molecule(s), ACC parses the molecular structure and tries to prefill the submission form with suitable default settings. Check and adjust whatever default settings you find unsuitable.

ACC generally tries to prefill the setup form with suitable options based on the molecule(s) you uploaded. It commonly happens that you can start the calculation immediately after loading the molecule. Nonetheless, the setup has a few critical sections you should check and maybe adjust before proceeding.

Set total charge

The total molecular charge correlates with the number of unsatisfied valences in the molecule. Make sure to assign the appropriate total charge for all non-neutral molecules you uploaded.

The total molecular charge quantifies the amount of electron density that will be distributed across the molecule during the EEM calculation. The total molecular charge plays an important role in the quality of the ACC results. By default, ACC assumes that all molecules are neutral. If this is not the case for your molecule(s), you must indicate so in the appropriate field. For each non-neutral molecule you uploaded, write the appropriate total charge value in the Total charge column. If all molecules you uploaded have the same charge, use the Total Charge for All Molecules field, and click Apply.

The total molecular charge must be consistent with the structure of the molecule. Specifically, it correlates with the number of unsatisfied valences in the molecule. For this reason, ACC checks valences, and produces a missing H warning if they are not satisfied, suggesting the molecule may not be neutral. You should always check the Message column.

Pick EEM parameters

The Electronegativity Equalization Method (EEM) is the procedure by which atomic charges are calculated. EEM employs special parameters for each type of atom. An EEM parameter set, or simply set, was developed for a certain charge definition, and contains parameters for certain atom types (H, C, N, O, halogens, metals, etc., depending on the target molecules).

Many EEM parameter sets have been published in the literature, and are available here as built-in sets. ACC tries to recommend an EEM parameter set suitable for the molecules you uploaded. The default ACC selection may not be optimal for you, or you may wish to try several EEM parameter sets. Click more or Show sets for a table with the full list of sets available in ACC.

The applicability domain of a given EEM parameter set is generally limited to the target molecules, and closely related to the applicability domain of that particular approach. This is why the table with EEM sets is organized according to the class of target molecules (organic molecules, biomolecules), and the charge definition, or approach used during the development of the parameters (population analysis, QM theory level). Click on the name of any set to get more info (citation, target molecules, approach) in the panel on the right side of the table. In order to view the parameters themselves, click View XML.

In the table with EEM sets, check the column Atoms to see which atom types are covered by each set. ACC will let you know if a given set does not contain parameters for some atom types present in the molecule(s) you uploaded. Look for the orange highlight which marks missing parameters in the column Missing atoms. If parameters are not available for a certain atom type, these atoms will be skipped during the EEM calculation. Note that if built-in EEM sets with all necessary parameters are available, the rest of the sets are hidden by default. To reveal them, untick the option Show only sets with all required parameters at the top of the table with sets.

Select one or more EEM parameter sets for your computation. Pick sets which contain EEM parameters for all or most atoms in your input molecule(s), and which have lower values of the priority descriptor. Each EEM parameter set will be associated with a different ACC job or set of jobs.

EEM employs special parameters for each atom type. ACC tries to recommend an EEM parameter set suitable for the molecules you uploaded. You may pick one or more EEM parameter sets for your calculation.

You may also add your own set of EEM parameters if you are unhappy with the built-in sets available. Click the Add button at the top of the EEM Parameters section to open a separate window where you can fill in the values of the parameters into a predefined XML template suitable for ACC. Make sure to keep the established XML syntax and give your set a unique name, then click the Add button. ACC will verify the syntax and let you know if you need to change anything. Your new set will be available in the table, along with the built-in sets.

You can use the add function in order to generate modified versions of built-in sets. Click on the built-in set of interest, then View XML on the panel to the right. Copy/paste the content into the Add window and make your modifications. For example, if you copy the parameters for H into the same set, and modify the Element name tag to say P, you have added EEM parameters for phosphorus which are identical to the EEM parameters used for hydrogen. Save this EEM parameter set under a unique name, and it will be available in the table.

This strategy is very useful if the EEM parameter set you wish to use does not contain parameters for certain atom types present in your molecule(s), or if ACC could not read chemical elements from the atom type information in the input file, and reported an unknown chemical elements warning. You must always check the Message column in the table with molecules, to ensure that all relevant atoms are included appropriately in the calculation.


Pick computation method

In addition to classical EEM, ACC offers two time- and memory-efficient EEM implementations. Please use EEM Cutoff for systems with tens of thousands of atoms, and EEM Cutoff Cover for larger systems, in both cases with a Cutoff Radius parameter of at least 8.

The default computation includes all atoms in the system and solves the entire EEM matrix in double precision. For very large systems you may need to resort to one of the time and memory efficient EEM implementations specifically tailored for such systems (check the Theoretical background for more details). By default, ACC tries to suggest a computation method suitable for the molecule(s) you uploaded.
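To make the idea of solving the entire EEM matrix concrete, here is a minimal sketch of the commonly published EEM formulation: for every atom i, A_i + B_i·q_i + κ·Σ_{j≠i} q_j/R_ij equals a common molecular electronegativity, and the charges sum to the total charge Q. This is not the ACC implementation. The function names, the parameter values used in the example and the plain Gaussian elimination are assumptions chosen for readability.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        known = sum(rows[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (rows[r][n] - known) / rows[r][r]
    return solution


def eem_charges(elements, coordinates, parameters, kappa, total_charge=0.0):
    """Return (charges, molecular electronegativity) for one molecule.

    parameters maps an element symbol to its (A, B) pair; the values are
    hypothetical inputs here, not taken from any published parameter set.
    """
    n = len(elements)
    matrix = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] * (n + 1)
    for i in range(n):
        a_i, b_i = parameters[elements[i]]
        for j in range(n):
            if i == j:
                matrix[i][j] = b_i
            else:
                matrix[i][j] = kappa / math.dist(coordinates[i], coordinates[j])
        matrix[i][n] = -1.0          # coefficient of the unknown electronegativity
        rhs[i] = -a_i
    matrix[n][:n] = [1.0] * n        # charge conservation: sum of q_i = Q
    rhs[n] = total_charge
    *charges, electronegativity = solve_linear_system(matrix, rhs)
    return charges, electronegativity
```

The matrix has (N + 1) × (N + 1) entries for N atoms, which illustrates why the full solution becomes demanding in time and memory for very large systems.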

You may change the method chosen by default, and add one or more methods of your choice using the Add button at the top of the Computation Methods section. Note that each unique method configuration added to the list will result in an additional ACC job, or set of jobs.

Even for molecules containing several thousand atoms, our implementation of EEM is very time efficient, and you may use the method labeled as Full EEM. For systems with tens of thousands of atoms, you will probably need to pick EEM Cutoff, and a reasonable value for the Cutoff Radius parameter (we recommend a minimum of 10). A biomolecular complex with hundreds of thousands of atoms makes good use of the method EEM Cutoff Cover with a Cutoff Radius of 8 and single Precision.
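The guidance in the preceding paragraph can be summarised as a simple rule of thumb. In the sketch below, the numeric boundaries of 10,000 and 100,000 atoms are assumptions standing in for "several thousand", "tens of thousands" and "hundreds of thousands"; the function name and dictionary keys are hypothetical, and the suggestion made by ACC itself may differ.

```python
def suggest_method(atom_count):
    """Suggest a computation method from the size of the system.

    The thresholds are illustrative assumptions, not values used by ACC.
    """
    if atom_count < 10_000:
        return {"method": "Full EEM"}
    if atom_count < 100_000:
        return {"method": "EEM Cutoff", "cutoff_radius": 10}
    return {"method": "EEM Cutoff Cover", "cutoff_radius": 8, "precision": "single"}
```

For instance, suggest_method(250_000) selects EEM Cutoff Cover with a Cutoff Radius of 8 and single precision.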

If your system is solvated or includes a few key water molecules, you may want to run parallel calculations with and without the water molecules (tick the Ignore Waters option), to see how water can affect the charge distribution in the biomolecule. Note that ACC identifies water atoms if they are annotated by a residue name typically associated with water (HOH, WAT, H2O). Other residue names sometimes associated with water, such as SOL and TIP, are not considered here because the Protein Data Bank contains instances of such residues which are chemical components different from water. Thus, if you wish to ignore water which is annotated in your file as SOL or TIPx, you will need to remove these records before submitting to ACC.
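Removing such records can be scripted. The sketch below assumes a PDB-format file with the residue name in columns 18-21 and deletes every ATOM or HETATM record whose residue name starts with SOL or TIP. Because such names can also denote components other than water, inspect your file before applying it. The function name is hypothetical, and atom serial numbers and CONECT records are not updated.

```python
def strip_unrecognized_water(pdb_text, residue_prefixes=("SOL", "TIP")):
    """Drop ATOM/HETATM records whose residue name starts with a given prefix."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            residue_name = line[17:21].strip().upper()
            if residue_name.startswith(residue_prefixes):
                continue             # skip water annotated as SOL or TIPx
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Water annotated as HOH, WAT or H2O is left in place, so the Ignore Waters option can still be used to compare calculations with and without it.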


Start computation

Before you launch the calculation, check out the summary of the setup. One computation may require more than one job. Each job is uniquely defined by the molecule and its total charge, the set of EEM parameters, and the computation method. The number of jobs thus depends on the number of molecules uploaded, along with the number of sets and methods chosen. Within a single computation, each job may use a different amount of time and memory resources, depending on the size of the molecule and the complexity of the computation method employed.
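The way jobs multiply can be illustrated with a short sketch. It assumes that every combination of molecule, EEM parameter set and computation method becomes one job, which is consistent with the description above but may not match the exact grouping shown in the ACC summary. The function name and the set and method labels in the example are hypothetical.

```python
from itertools import product


def enumerate_jobs(molecules, parameter_sets, methods):
    """List one job per (molecule, parameter set, method) combination.

    molecules maps each molecule name to its total charge.
    """
    return [
        {
            "molecule": name,
            "total_charge": charge,
            "parameter_set": parameter_set,
            "method": method,
        }
        for (name, charge), parameter_set, method in product(
            molecules.items(), parameter_sets, methods
        )
    ]
```

With two molecules, two parameter sets and three methods, this yields 2 × 2 × 3 = 12 jobs.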

If you are satisfied with the computation setup, click Compute to launch the calculation. You may always check the progress of the calculation in real time, or at a later time using the same URL as for the Computation Setup page.

Each computation may consist of several jobs, each with its own memory and time requirements. To find out what the settings mean, why they are required and how they influence your calculation, take advantage of all the help available as interactive guides and tool tips.

Finally, if you don't know what something means or how it works, don't forget that there is plenty of help available directly on the ACC web pages. Hover the mouse over any keyword you do not understand to reveal a useful tool tip. Click on any of the green buttons marking the interactive guides. Note that such guides are available for the entire Setup page, and for any of its sections. Additional specialized guides are provided for the Sample calculations.

The complete list of keywords involved in the setup of ACC calculations is available in the Index.