NEEMP:Examples



This section shows several use case examples. All of them use only data from the examples directory.

NB: NEEMP is case-sensitive. Please follow this section carefully for the correct syntax. Additional information can be found here.

Example 1 - Training set information

./neemp -m info --atom-types-by ElemBond --sdf-file examples/set01.sdf
Prints information about the training set set01.sdf, passed as the argument of the --sdf-file option, grouping atoms by chemical element and bond order. To override this behaviour and group atoms by element only, call NEEMP with the --atom-types-by Element option.


Figure 9: Information about the structure set examples/set01.sdf.
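
The two grouping schemes named in Example 1 can be compared side by side. The following is a minimal sketch that only prints the two commands by default. The helper name info_cmd and the RUN variable are illustrative and not part of NEEMP; the option values ElemBond and Element are the ones given in the text above.

```shell
#!/bin/sh
# info_cmd and RUN are hypothetical helpers for this sketch, not NEEMP features.
info_cmd() {
    # $1: grouping scheme, either ElemBond or Element
    echo "./neemp -m info --atom-types-by $1 --sdf-file examples/set01.sdf"
}

for scheme in ElemBond Element; do
    cmd=$(info_cmd "$scheme")
    echo "$cmd"
    # Execute only when RUN=1 and the binary is present in the current directory.
    if [ "${RUN:-0}" = "1" ] && [ -x ./neemp ]; then
        $cmd
    fi
done
```

Running the script with RUN=1 from the NEEMP directory executes both commands in turn, so the two atom type listings can be compared directly.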

Example 2 - Calculation mode

./neemp -m charges --sdf-file examples/set01.sdf --par-file examples/ElemBond.par --chg-out-file eem_charges --max-threads 8
Compute EEM charges and store them in the file eem_charges. Use up to 8 threads for the computation.

For more information on this mode and its output, refer to the Calculation mode section.

For details on the eem_charges output file, see the CHG file paragraph, as the format is the same. The only difference concerns the third column, which in this case lists the EEM charges in place of the ab-initio charges.

Example 3 - Parametrization mode

./neemp -m params -p lr-full --sdf-file examples/set01.sdf --chg-file examples/set01.chg --par-out-file new_parameters.par --chg-stats-out-file stats
Perform EEM parametrization using the LR approach, save the parameters into the file new_parameters.par, and output charge statistics for each molecule into the file stats. No discarding is used.

NB: the parameter set file new_parameters.par has the same format and layout as described here.

For additional information click here.

Example 4 - Parametrization mode k search

./neemp -m params -p lr-full --sdf-file examples/set01.sdf --chg-file examples/set01.chg --kappa-max 1.0 --fs-precision 0.02 --sort-by RMSD -vv
Similar to the previous example, but with a custom range for k. Select the best parameters according to RMSD. Print the k search progress.

Refer to the parametrization paragraph and the options list if necessary.

Figure 10: Detailed view of the parametrization settings from the execution of the above command. Note in particular the use of a different evaluation statistic (RMSD) and the custom range for the k search.

Figure 11: Abridged output of the k search progress. For each k value a set of parameters is generated and evaluated. In this particular instance, the parameter set with the lowest RMSD value is chosen.

Example 5 - Parametrization mode simple discard

./neemp -m params -p lr-full --sdf-file examples/set01.sdf --chg-file examples/set01.chg --discard simple -v --limit-iters 600 --limit-time 00:30:00
Perform simple discarding with the LR approach and print its progress. Issue warnings about molecules with abnormal values of statistical descriptors. The discarding procedure runs for at most 600 iterations or 30 minutes (time format HH:MM:SS), whichever is reached first.


Figure 12: Abridged output of the simple discard progress. Simple discarding requires as many iterations as there are molecules in the set, unless the iteration or time limit is reached first. In this case the training set contains only 500 structures, so the 600-iteration limit does not affect the discarding. The time limit, on the other hand, still applies. Refer to the parametrization paragraph for details.
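
The "whichever is reached first" rule from Example 5 can be illustrated with a small shell sketch. It is not NEEMP's implementation: the function names hms_to_seconds and should_stop are hypothetical, and only the HH:MM:SS format and the two limit values come from the example above.

```shell
#!/bin/sh
# Convert an HH:MM:SS string to a number of seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# should_stop ITERS_DONE SECONDS_ELAPSED MAX_ITERS MAX_TIME
# Succeeds (exit status 0) as soon as either limit has been reached.
should_stop() {
    max_seconds=$(hms_to_seconds "$4")
    [ "$1" -ge "$3" ] || [ "$2" -ge "$max_seconds" ]
}

hms_to_seconds 00:30:00   # prints 1800

# 500 iterations done after 1800 s: the time limit triggers first.
if should_stop 500 1800 600 00:30:00; then
    echo "stop"
else
    echo "continue"
fi
```

With a 500-structure set the iteration counter cannot reach 600, so in this sketch only the elapsed time can end the run early.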

Example 6 - Parametrization mode DE-MIN

./neemp -m params -p de --sdf-file examples/set01.sdf --chg-file examples/set01.chg --om-pop-size 50 --om-iters 500 --random-seed 1234 -vv
Compute parameters for the given molecules (set01.sdf) and ab-initio charges (set01.chg). The chosen optimization method, differential evolution followed by minimization, acts on a population of 50 parameter vectors. The optimization procedure is carried out for at most 500 iterations. A user-defined random seed is used, together with a high level of verbosity.

For additional information click here.

Example 7 - Coverage validation

./neemp -m cover --sdf-file examples/set02.sdf --par-file examples/ElemBond.par
Calculate the coverage of the supplied set of molecules for a given parameter set. For more information on this mode, refer to the Coverage validation section.


Figure 13: Output from Cover mode. In this case the parameter set ElemBond.par lacks parameters for all the marked atomic types in set02.sdf, so 1690 molecules are discarded. Note the resemblance to the Calculation mode output in a similar low-coverage situation.

Example 8 - Quality validation

./neemp -m quality --sdf-file examples/set01.sdf --chg-file examples/set01.chg --par-file examples/ElemBond.par --chg-stats-out-file stats 2> warns > log
Perform quality validation of EEM parameters for atoms grouped by atomic type. Save standard output into the file log and warnings into the file warns. Additionally, output charge statistics for each molecule into the file stats.
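
The redirection at the end of the command is standard shell syntax rather than a NEEMP option: `2> warns` sends standard error to the file warns, and `> log` sends standard output to the file log. A minimal sketch with a stand-in function (demo_tool is hypothetical and merely imitates a program that writes to both streams):

```shell
#!/bin/sh
# demo_tool is a hypothetical stand-in; it is not NEEMP.
demo_tool() {
    echo "regular output"          # goes to standard output
    echo "a warning" >&2           # goes to standard error
}

workdir=$(mktemp -d)
demo_tool 2> "$workdir/warns" > "$workdir/log"

cat "$workdir/log"     # contains only the standard output line
cat "$workdir/warns"   # contains only the standard error line
```

Keeping the two streams in separate files makes it easy to check the warnings without scrolling through the regular output.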

For more information on this mode click here.

Figure 14: Close-up of the stats file. Along with the statistics for each molecule, the ab-initio charges (3rd column), the EEM charges (4th column) and their difference (5th column) are also printed.
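
As a rough illustration of how the columns described in Figure 14 relate to the RMSD statistic, the sketch below recomputes an RMSD from the 3rd and 4th columns. The sample rows are made up for illustration, and the assumption that atom rows are whitespace-separated, with an index and an element symbol in the first two columns, is not confirmed by this page. The real stats file also contains per-molecule statistics, which would need to be filtered out first.

```shell
#!/bin/sh
# Hypothetical sample rows: index, element, ab-initio charge, EEM charge, difference.
sample_rows() {
    cat <<'EOF'
1 C -0.10 -0.12 0.02
2 H 0.05 0.03 0.02
EOF
}

# RMSD between column 3 (ab-initio) and column 4 (EEM) over all rows.
rmsd_from_rows() {
    awk '{ d = $3 - $4; sum += d * d; n++ }
         END { if (n > 0) printf "%.3f\n", sqrt(sum / n) }'
}

sample_rows | rmsd_from_rows
```

The function names sample_rows and rmsd_from_rows are hypothetical; to apply the same idea to a real stats file, replace sample_rows with a filter that selects only the atom rows.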