Potential of mean force

(2.120) |

Where , the is the average force and therfore is the so-called Potential of Mean Force (PMF). A particular example would be that describes the interaction between two molecules held a fixed distance when the remaining molecules are canonically averaged over all configurations.

In a more practical way, the PMF can be used to know how the free energy changes as a function of a coordinate of the system. It can be a geometrical coordinate or a more general energetic (solvent) coordinate. Unlike the mutations used frequently in free energy perturbation calculations which are often along non-physical pathways, the PMF is usually calculated for a physical achievable process. In particular it is useful for predicting the rates in chemical solution and for elucidating the reaction mechanism of condensed phase reaction such as enzymatic processes.

The PMF along some coordinate is defined from the average distribution function

(2.121) |

where and are arbitrary reference values. The average distribution function along the coordinate is obtained from a Boltzmann weighted average.

Where is the Dirac delta function for the coordinate . We are assuming that the chosen coordinate is a geometrical coordinate , but can have any other functionality. For processes with an activation barrier higher than kT the distribution function cannot be computed by a straight molecular dynamics simulation. Such computation would not converge due to low sampling in higher-energy configurations. Special sampling techniques (non-Boltzmann sampling) have been developed to obtain a PMF along a coordinate . Although PMF of enzymatic reactions can be calculated using free energy perturbation [112,188,195], the method used in this thesis and explained here is the Umbrella Sampling technique [196,197].

**Umbrella Sampling:**

In this method, the microscopic system is simulated in the presence of an artificial biasing window potential
that is added to the potential energy
.

(2.123) |

This forces the system to compute an ensemble average over a non-Boltzmann distribution within a small interval of a prescribed value of .

Unless the entire range of coordinate is spanned in a single simulation, multiple simulations (windows) are performed with different biasing umbrella potentials that center the sampling in different overlapping regions or windows of .

A reasonable choice to produce the biased ensembles, though not the unique^{2.12},
is to use for every window an harmonic function of the form

(2.124) |

At every window the distribution function is efficiently converged
through the expression of the distribution function in equation 1.122
substituting
for
.
The superindices and indicate *biased* and *unbiased* respectively.
The distribution functions from the various windows need to be unbiased
(the non-Boltzmann factor is removed)
and then recombined together to obtain the final estimate PMF . Otherwise, obtaining the unbiased PMF for the
window

(2.125) |

where are also undetermined constant that represents the free energy associated with introducing the window potential.

(2.126) |

These undetermined constants are obtained by adjusting the various adjacent windows . The process of unbias and recombine the different simulation windows is the main difficulty in Umbrella Sampling. There are different strategies to do it, the most intuitive is the adjusting the various adjacent windows (manually or automatically by least-squares) in the region in which they overlap until they match.

A more sophisticated strategy is the Weighted Histogram Analysis Method (WHAM) [198] that makes usage of all the information in the umbrella sampling, and does not discard the overlapping regions. In addition, WHAM does not require a significant amount of overlap and it can be easily extended to multi-dimensional PMF [199,200].

In particular the WHAM technique computes the total unbiased distribution function as a weighted sum of the unbiased distribution functions . This weighting function can be expressed in terms of the known biased distribution functions

where is the number of windows and is the number of data points (number of steps in a MD) in the window. The free energy constants need to be obtained from an optimal estimate of the total distribution function

Equations 1.127 and 1.128 need to be solved iteratively until self-consistence. A common procedure is giving a guess to the set of free energy constants , obtain the total distribution function by equation 1.127 and use it to compute in equation 1.128. The procedure is stopped when the difference between the constants in two consecutive iterations is below a threshold.

See reference [201] for a comparative study between the different techniques to unbias and recombine the data extracted from Umbrella Sampling.