Monday, December 29, 2014

Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 2


2015.01.25:  I have summarized discussion of this and related issues in this paper.  I have also made some changes in the post to reflect what I have learned since I wrote it.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in a previous post. In addition I'll talk about further improvements that could potentially increase the accuracy further.

The general approach is based on the following equation

$\Delta G^\circ = \Delta E + \Delta G^\circ_{\mathrm{RRHO}}+\Delta \delta G_{\mathrm{solv}}$

where $\Delta E$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E$. $\Delta \delta G_{\mathrm{solv}}$ is the change in solvation free energy computed using a polarizable continuum model such as COSMO.

pH and molecular charge. Virtually all binding measurements in aqueous solution are performed in buffer with a constant pH and many ligands and or receptors contain one or more ionizable groups. The charge of an ionizable (acid/base) group in aqueous solution depends on its pK$_a$ and the pH:

$q= \frac{1}{1+10^{\mathrm{pH}-\mathrm{p}K_a}}-\delta$

where $\delta$ is 1 for an acid and 0 for a base.  In most cases, selecting the wrong charge for the ligand and/or host will result in a significant error in the computed binding free energy.  The pK$_a$ can be computed using electronic structure theory or empirically using software such Marvin. However, if the pK$_a$ value is perturbed by the binding the situation may be complicated further. Here I illustrate this point for a simple example where the ligand (L) has a basic group that is neutral when deprotonated and the receptor (R) is non-ionizable.

$\mathrm{R + L(H^+) \rightleftharpoons RL(H^+)}$

The (apparent) experimental binding constant is then

$K^\prime=\mathrm{\frac{[RL]+[RLH^+]}{[R]([L]+[LH^+])}}$

and the corresponding binding free energy is

$\Delta G^{\prime\circ}=\Delta G^{\circ}(+)-RT\ln \left( \frac{1+10^{\mathrm{pH}-\mathrm{p}K_a^c}}{1+10^{\mathrm{pH}-\mathrm{p}K_a^f}} \right)$   (1)

where $\Delta G^{\circ}(+)$ is the binding free energy computed using the charged (protonated) form of the ligand and pK$_a^c$ and pK$_a^f$ are the pK$_a$ values the ligand bound to the receptor and the free ligand, respectively.

If the pK$_a$ is unaffected by the binding, the binding free energy is unaffected by pH and the chosen protonation state of the ligand. For the remaining scenarios it is instructive to plug in some numbers. For example, for pH = 7, pK$_a^f$ = 9 and pK$_a^c$ = 11, the ligand is protonated before and after binding and $\Delta G^{\prime\circ}=\Delta G^{\circ}(+)$ is a good approximation. However, if pH = 7, pK$_a^f$ = 5 and pK$_a^c$ = 3, the ligand is neutral before and after binding and assuming it is charged leads to a 2.7 kcal/mol error in the binding free energy (unless corrected by this equation).  Finally, if pH = 7, pK$_a^f$ = 8 and pK$_a^c$ = 6, the ligand is (91%) charged before and (91%) neutral after binding and assuming it remains charged leads to a 1.4 kcal/mol error in the binding free energy.

For many ligands of interest the pK$_a^f$ can be estimated fairly accurately in a matter of second using programs such as Marvin. The effect of binding on pK$_a^f$ can often be estimated by chemical intuition since hydrogen bonds to charged acid and basic groups tend to, respectively, lower or raise the pK$_a$ even further.  For example, if an amine with pK$_a^f$ = 9 binds to the receptor via hydrogen bonding, then pK$_a^c$ is likely higher than 9 and $\Delta G^{\prime\circ}=\Delta G^{\circ}(+)$ is a good approximation.  However, if pK$_a^f$ is close to 7 then pK$_a^c$ should be computed.  Also, it is possible for charged ligands to change to their neutral state if they bind to hydrophobic or similarly charged receptors.

If pK$_a^f$ is known with some degree of confidence then pK$_a^c$ can be estimated by

$\mathrm{p}K_a^c=\mathrm{p}K_a^f-\Delta G^\circ /RT\ln(10)$

where $\Delta A^\circ$ is the free energy change for

$\mathrm{RLH^+ + L \rightleftharpoons  RL + LH^+}$

If there are several ionizable groups then Eq (1) generalizes to

$\Delta G^{\prime\circ}=\Delta G^{\circ}(-/+)-RT\ln \left(\sum_i \frac{1+10^{n_i(\mathrm{pH}-\mathrm{p}K_{a,i}^c)}}{1+10^{n_i(\mathrm{pH}-\mathrm{p}K_{a,i}^f)}} \right)$

where $\Delta G^{\circ}(-/+)$ is the binding free energy when all acids and bases are deprotonated and protonated, respectively, the sum runs over all ionizable groups and $n_i$ is 1 and -1 if $i$ is a base or acid, respectively.

However, this assumes that the ionizable groups titrate independently of one another, i.e. that the pK$_a$ value of one group is independent of the protonation states of all other ionizable groups.  If that is not the case then it is difficult to give a general expression for the pH-dependent free energy correction in terms of pK$_a$ values (though it can easily be derived for a specific case).

Instead a general expression can be written in terms of Legendre transformed free energies as suggested by Alberty (modified here to electronic structure calculations):

$G^{\prime\circ}(\overline{X})=-RT\ln \left( \sum_i \exp{(-G^{\prime\circ}(X_i)/RT)} \right)$

Here the sum runs over all possible protonation states and

$G^{\prime\circ}(X_i)=G^{\circ}(X_i)-n(\mathrm{H^+})[\delta G^\circ (\mathrm{H^+}) -RT\ln(10)\mathrm{pH}]$

where $G^{\circ}(X_i)$ is the usual standard free energy of protonation state $i$, $n(\mathrm{H^+})$ is the number of ionizable proton in pronation state $i$, and $\delta G^\circ (\mathrm{H^+})$ is the solvation free energy of the proton.  So in the case of ligand L considered above, $n(\mathrm{H^+})$ is 0 and 1 for L and LH$^+$, respectively. $\delta G^\circ (\mathrm{H^+})$ is usually taken from the literature where estimates vary between -265.8 and -268.6 kcal/mol.

Thus, Eq (1) can be rewritten as

$\Delta G^{\prime\circ}=G^{\prime\circ}(\overline{RL})-G^{\prime\circ}(\overline{L})-G^{\circ}(R)$

Since the electronic energy contribution to the standard free energy can be very large in magnitude this form is more easily evaluated

$G^{\prime\circ}(\overline{X})=G^{\prime\circ}({X_0})-RT\ln \left( 1+\sum_{i\ne0} \exp{(-G^{\prime\circ}(X_i)/RT)} \right)$

where $X_0$ is some arbitrarily chosen reference protonation state, for example that for which $n(\mathrm{H^+})$ = 0.  The sum can be combined with that over different conformations, discussed in a previous post.

Other ions
The buffers that are commonly used to regulate the pH also contain other ions, such as Na$^+$, Mg$^{2+}$, Cl$^-$.  At high ion concentrations, it is possible that these ions bind at certain sites in the ligand, receptor, or ligand-receptor complex with sufficient probability that they must be included in the thermodynamics. If so the exact same equations and considerations outlined above for H$^+$ also apply to, e.g. Cl$^-$ and pCl$^-$ (computed from the specified buffer concentration) is used instead of pH.



This work is licensed under a Creative Commons Attribution 4.0

Saturday, December 27, 2014

Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 1


2015.01.25:  I have summarized discussion of this and related issues in this paper.  I have also made some changes in the post to reflect what I have learned since I wrote it.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in subsequent posts. In addition I'll talk about further improvements that could potentially increase the accuracy further.

The approach is based on the following equation

$\Delta G^\circ = \Delta E + \Delta G^\circ_{\mathrm{RRHO}}+\Delta \delta G_{\mathrm{solv}}$

where $\Delta E$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E$. $\Delta \delta G_{\mathrm{solv}}$ is the change in solvation free energy computed using a polarizable continuum model such as COSMO.

Electronic energy
Grimme has shown that dispersion typically makes a very big (>10 kcal/mol) contribution to binding free energies of host-guest complexes. Dispersion corrections are therefore a must if DFT is used to compute the electronic binding energy. Furthermore, Grimme has shown that three-body dispersion makes a non-negligible (2-3 kcal/mol) contribution to the electronic binding energy.  For convergent methods this effect is only included in rather expensive methods that involve triple-excitations such as MP4 and CCSD(T).

Molecular Thermodynamics
The translational, rotational and vibrational thermodynamic contribution to the binding free energy is very large (>10 kcal/mol) and must be included for accurate results.  Some years ago there was a bit of confusion in the literature about whether the RRHO approach was appropriate for condenses phase systems, but Zhou and Gilson have clarified this beautifully.

Standard state.  Most electronic structure codes compute the RRHO energy corrections for an ideal gas, where the standard state is a pressure of 1 bar.  The standard state for solution is 1 M, so the free energy of a molecule X must be corrected accordingly

$G^\circ_{\mathrm{RRHO}}(\mathrm{X}) = G^{\circ,\mathrm{1 bar}}_{\mathrm{RRHO}}(\mathrm{X})-RT\ln (V^{-1})$

where $V$ is the volume of an ideal gas at 1 bar and temperature $T$. At 298K $-RT\ln (V^{-1})$ = 1.90 kcal/mol.  This correction is already included in the solvation energy $\delta G_{\mathrm{solv}}$

Symmetry. Many host molecules and some guest molecules are symmetric, which affects the rotational entropy through the symmetry number ($\sigma$) which is a function of the point group.

$S_{rot}=R\ln\left(\frac{8\pi^2}{\sigma}\left(\frac{2\pi ekT}{h^2}\right)^{3/2}\sqrt{I_1I_2I_3}\right)$

It can be very difficult to build large molecules with the correct point group and most studies use $C_1$ symmetry.  In this case the effect of symmetry must be added manually to the free energy

$G^\circ_{\mathrm{RRHO}}(\mathrm{X}) \rightarrow G^\circ_{\mathrm{RRHO}}(\mathrm{X})+RT \ln(\sigma_X)$

As an example, the popular host molecule corcubit[7] has $D_{7h}$ symmetry and a corresponding $\sigma$ value of 14, in which case the correction contributes 1.56 kcal/mol to the free energy at 298K.

Anharmonicity and low frequency modes. Host-guest complexes can exhibit very low frequency vibrations on the order of 50 cm$^{-1}$ or less, which tend to dominate the vibrational entropy contribution.  Grimme (and many others) have questioned whether the harmonic approximation is valid for such low frequency modes and this is an open research question.  The main problem is that it is very difficult to compute the vibrational entropy exactly.  Most methods for computing anharmonic vibrational are developed to obtain the 2 or 3 lowest energy states, but for very low frequency modes 10-20 states likely significantly populated at room temperature and therefore contribute to the entropy.

In the absence of theoretical benchmarks, comparison to experiment can prove constructive. Kjærgaard and co-workers have recently measured standard binding free energies for small gas phase compounds and compared them to CCSD(T)/aug-cc-pV(T+d) calculations.  For example, in the case of acetronitrile-HCl the measured binding free energy at 295K is between 1.2 and 1.9 kcal/mol, while the predicted value is 2.3 kcal/mol using the harmonic approximation.  Since the errors in $\Delta E$ and the rigid-rotor approximation presumably are quite low, this suggest and error in the vibrational free energy of at most 1.1 kcal/mol, despite the fact that the lowest vibrational frequency is only about 30 cm$^{-1}$.  Furthermore, the error can be reduced by 0.4 kcal/mol by scaling the harmonic frequencies by anharmonic scaling factors suggested by Shields and co-workers.  Similar results were found for dimethylsulfide-HCl.  So there are some indications that the harmonic approximation yields free energy corrections that are reasonable and that can be improved upon by relatively minor corrections.

Grimme has taken a different approach by arguing that low-frequency modes resemble free rotations and using the corresponding entropy term for low frequency modes.  This changes the RRHO free energy correction by 0.5 - 4 kcal/mol, depending on the system.

Imaginary frequencies. Low frequencies are especially susceptible to numerical error and it is not unusual to see 1 or 2 imaginary frequencies of low magnitude in a vibrational analysis of a host-guest complex.  Since imaginary frequencies are excluded from the vibrational free energy this effectively removes 1 or 2 low frequency contributions to the vibrational free energy. For example, a 30 cm$^{-1}$ frequency contributes about 1.7 kcal/mol to the free energy at 298K.

Imaginary frequencies resulting from a flat PES and numerical errors can often be removed by making the convergence criteria for the geometry optimization and electronic energy minimization more stringent and making the grid size finer in the case of DFT calculations. If the Hessian is computed using finite difference it is important to use double-differencing.  If all else fails, it is probably better to pretend that the imaginary frequency is real and add the corresponding vibrational free energy contribution.

Conformations. One of the main problems in computing accurate binding free energies is to identify the structures of the host, guest and (especially) the host-guest complex with the lowest free energy. Because both the RRHO and solvation energy contributions contribute greatly to the binding free energy change, simply finding the structure with the lowest electronic energy and computing the free energy only for that conformation is probably not enough.

The free energy for a molecule (X) with N conformations the standard free energy is

$G^\circ(\mathrm{X})= G^\circ _0(\mathrm{X})-RT\ln\left(\sum^N_{i=1} \left( 1+\exp{\left( -(G^\circ _i- G^\circ _0)/RT\right)} \right)\right)$

where $G^\circ _0(\mathrm{X})$ is the conformation with lowest free energy.  Conformations with free energies higher than 1.36 kcal/mol contribute less than 0.1 to the sum at 298K.  So a significant number of very low free energy structures is needed to make even a 0.5 kcal/mol contribution to the free energy.  Conformations related by symmetry should not be included here as their effects are accounted for in the rotational entropy (see above).  Also, conformationally flexible regions of the ligand or host that are unaffected by binding need not be explored since the effect on the binding free energy will cancel.



This work is licensed under a Creative Commons Attribution 4.0

Tuesday, December 9, 2014

QM computed standard free energy changes and pH


2015.01.25:  I have summarized discussion of this and related issues in this paper.  I have also made some changes in the post to reflect what I have learned since I wrote it.

The problem
Thermodynamics involving ionizable functional group at constant pH require special considerations. Robert Alberty has written extensively on this topic (example) but I confess I had some trouble relating the work to quantum chemical calculations.  This post aims to do just that.

Let's say you want to compute the standard free energy change at pH 7 for this equilibrium

$\mathrm{B \rightleftharpoons HA}$

where molecule $\mathrm{B}$ can be converted to an acid $\mathrm{HA}$, which is in equilibrium with it conjugate base and thus pH-dependent.

$\mathrm{HA \rightleftharpoons A^- + H^+}$

The apparent equilibrium constant ($K^\prime$) is

$K^\prime=\mathrm{\frac{[HA]+[A^-]}{[B]}}$

How to compute the corresponding standard free energy change?:

$\Delta G^{\prime\circ}=-RT\ln(K^\prime)$

The Solution
$K^\prime$ can be rewritten as

$K^\prime=K+\frac{KK_a}{[\mathrm{H^+}]}$

where

$K=\mathrm{\frac{[HA]}{[B]}}=\exp{(-\Delta G^\circ/RT)}$

and

$K_a=\mathrm{\frac{[A^-][H^+]}{[HA]}}=\exp{(-\Delta G^\circ_a/RT)}$

Thus,

$K^\prime=\exp{(-\Delta G^\circ/RT)} + \exp{(-\Delta G^\circ/RT)}\exp{(-\Delta G^\circ_a/RT)}10^{\mathrm{pH}}$

$=\exp{(-\Delta G^\circ/RT)} + \exp{(-(\Delta G^\circ+\Delta G^\circ_a-RT\ln(10)\mathrm{pH})/RT)}$

$=\frac{\exp{(-G^\circ(\mathrm{HA})/RT)}+\exp{(-G^{\prime\circ}(\mathrm{A^-})/RT)} }{\exp{(-G^\circ(\mathrm{B})/RT)}}$

where

$G^{\prime\circ}(\mathrm{A^-})=G^{\circ}(\mathrm{A^-})+[G^{\circ}(\mathrm{H^+})-RT\ln(10)\mathrm{pH}]$  (1)

Thus

$\Delta G^{\prime\circ}= G^{\prime\circ}(\mathrm{\overline{HA}})-G^\circ(\mathrm{B})$ (2)

where

$G^{\prime\circ}(\mathrm{\overline{HA}})=-RT\ln \left( \exp{(-G^\circ(\mathrm{HA})/RT)}+\exp{(-G^{\prime\circ}(\mathrm{A^-})/RT)}  \right)$ (3)

Here $\mathrm{\overline{HA}}$ refers to "$\text{HA and A}$"

An example
How do you compute the standard free energy change in aqueous solution at pH 7 using QM for this reaction?:

$\mathrm{\text{HC(=O)-}NH_2+ H_2O \rightleftharpoons HCOOH + NH_3}$

First you optimize each molecule and perform the vibrational analysis to get the free energies.  You can account for solvation effects using a method like PCM or COSMO.  $\mathrm{HCOOH}$ and $\mathrm{NH_3}$ are acids and basis, respectively so you will also need the energies for $\mathrm{HCOO^-}$ and $\mathrm{NH_4^+}$

Most QM programs assume the molecules are in the gas phase when computing the Gibbs free energies so the standard state is 1 bar.  The free energies must be corrected for the solution standard state of 1 M:

$G^\circ(\mathrm{X}) = G^{\circ,\mathrm{1 bar}}(\mathrm{X})-RT\ln (V^{-1})$

where $V$ is the molar volume of an ideal at gas at your chosen temperature.  The exception is water since that is also the solvent.  Here the standard state is 55.34 M at 25C.

$G^\circ(\mathrm{H_2O}) = G^{\circ,\mathrm{1 bar}}(\mathrm{H_2O})-RT\ln ((55.34V)^{-1})$

$G^{\prime\circ}(\mathrm{HCOO^-})$ and is then computed using Eq (1). $G^{\circ}(\mathrm{H^+})$ is usually taken from the literature though estimates vary between -265.8 and -268.6 kcal/mol.

$G^{\prime\circ}(\mathrm{\overline{HCOOH}})$ can then be computed using Eq (3).  The electronic energy component of $G^{\circ}(\mathrm{X})$ can be quite large in magnitude and give some numerical problems when computing the exponential function so Eq (3) and be rewritten as

$G^{\prime\circ}(\mathrm{\overline{HA}})= G^\circ(\mathrm{HA})-RT\ln \left(1+\exp{(-(G^{\prime\circ}(\mathrm{A^-})-G^\circ(\mathrm{HA}))/RT)}  \right)$

Following a derivation similar to the one at the beginning of this post,

$G^{\prime\circ}(\mathrm{NH_4^+})=G^{\circ}(\mathrm{NH_4^+})-[G^{\circ}(\mathrm{H^+})-RT\ln(10)\mathrm{pH}]$

and $G^{\prime\circ}(\mathrm{\overline{NH_3}})$ is computed by Eq (3)

Finally, we put everything together

$\Delta G^{\prime\circ}=G^{\prime\circ}(\mathrm{\overline{HCOOH}})+G^{\prime\circ}(\mathrm{\overline{NH_3}})- G^\circ(\mathrm{\text{HC(=O)-}NH_2})-G^\circ(\mathrm{H_2O})  $

Another way
If you happen to know the pKa values (e.g. from experiment) of the ionizable species you can use this expression

$\Delta G^{\prime\circ}=\Delta A^{\circ}-RT\ln \left( 1+10^{\mathrm{pH}- pK_{a,\mathrm{HCOOH}}} \right)-RT\ln \left( 1+10^{ pK_{a,\mathrm{NH_4^+}}-\mathrm{pH}} \right)$

where

$\Delta G^{\circ}=G^{\circ}(\mathrm{HCOOH})+G^{\circ}(\mathrm{NH_3})- G^\circ(\mathrm{\text{HC(=O)-}NH_2})-G^\circ(\mathrm{H_2O})$



This work is licensed under a Creative Commons Attribution 4.0

Wednesday, December 3, 2014

Errors in the book

It seems the comment section for pages is no longer working on Blogger. So I am creating this post as a complement to this page.  Please leave comments below. Please remember to tag me (+jan jensen) so I am alerted to your comment.

Tuesday, December 2, 2014

Saturday, November 22, 2014

Computing a solvent accessible surface area using Jmol


Someone asked me how to compute a surface area of a molecule.  Here's how to compute a solvent accessible surface area using Jmol.

1. Load your molecule into Jmol (File > Open)

2. Open the Script Editor and Output Console (File > Script Editor, etc)

3. Type the following three lines into the Script Editor window and hit run

set solventproberadius 1.2
isosurface solvent
isosurface area set 0

1.2 Å is the probe radius for water solvent.  If you are using another solvent input another value.  It will correspond roughly to the radius of the solvent molecule.

4. The are will be printed out in the Console.  The units are not specified but I'm sure it's Å2 (square Ångstrom)

Questions about the book?

I have a sneaking suspicion that at least one of you have questions about somethings that I wrote in the book.  So ask away below in the comments section and please remember to tag me in the post (+jan jensen - see picture below) so I get alerted.

Wednesday, November 19, 2014

Introducing the Computational Chemistry Daily


Yesterday I started the Computational Chemistry Daily which aggregates posts related to computational chemistry on social media such as Twitter, Google+ and Facebook on a daily basis.  It identifies the posts by certain words or hashtags such as #compchem and everyone can contribute. Hope you find it useful.


This work is licensed under a Creative Commons Attribution 4.0

Sunday, November 9, 2014

Calculation of Solvation Free Energies of Charged Solutes Using Mixed Cluster/Continuum Models


2015.01.25:  I have summarized discussion of this and related issues in this paper.  Point 1 is wrong and Gibbs free energies should be used throughout.

This post takes it title from this paper by Bryantsev et al., which provides an excellent discussion of the issue.  The point of this post is to suggest four modifications to the one of the approaches discussed in the paper:

1. The use of standard Helmholz free energies instead of standard Gibbs free energies

2. Correction for indistinguishability of water molecules

3. Parameterization of a water oxygen solvent radius

4. The general use of the cluster model

Background
Continuum solvation models do not account for strong and specific interactions of solvent molecules with the solute (notably ions) so one must include some explicit solvent molecules in the continuum calculations.  The question is how many explicit solvent molecules to use.

Pliego and co-workers suggested that the optimum number of water molecules is the one for which $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$, computed using Scheme 1, is most negative (refer to the paper for a definition of terms).

Taken from 10.1021/jp802665d. (c) American Chemical Society

However, Bryantsev et al. argue that $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ should be computed according to Scheme 2 and that computed in this way, $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ will converge smoothly to the best value.  Thus, the optimum number of explicit solvent molecules is that for which $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ changes very little when you add another one.

In principle, Scheme 1 and 2 should yield the same $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ but Bryantsev et al. argue that Scheme 2 provides better cancellation of error compared to Scheme 1. The authors demonstrate this by point showing that the computed free energy of water cluster formation in bulk water, computed using Scheme 3, deviates significantly from 0 with increasing cluster size.

Taken from 10.1021/jp802665d. (c) American Chemical Society

In Figure 1 I show a plot of the residual error ($RE$, large dots), 

$RE=-n\Delta G^*_{\text{solv}}(\text{H}_2\text{O}) + \Delta G^\circ_{\text{g,bind}}-(n-1)\Delta G^{\circ-*}$
$ + \Delta G^*_{\text{solv}}((\text{H}_2\text{O})_n)-RT\ln (n[\text{H}_2\text{O}]^{n-1})$

computed using two different continuum solvation models (SM6 red and COSMO blue) for increasing number of $n$.

Figure 1. Plot of RE without (large dots) and with (small dots) the Helmholz free energy and correction for indistinguishability of water molecules


Next I'll suggest some changes that significantly reduce $RE$. 

1. The use of standard Helmholz free energies instead of standard Gibbs free energies
The change in Helmholz free energy ($\Delta A^\circ$) is related to the change in Gibbs free energy ($\Delta G^\circ$) by

$\Delta G^\circ=\Delta A^\circ+p^\circ \Delta V$

In the gas phase $p^\circ \Delta V = \Delta n RT$ where $\Delta n$ is the difference between the number of product and reactant molecules.  In solution $\Delta V$ is hard to estimate, but the best approximation is $\Delta V \approx 0$ and $\Delta G^\circ \approx \Delta A^\circ$.  So, I suggest using 

$\Delta A^\circ_{\text{g,bind}}=\Delta G^\circ_{\text{g,bind}}-(n-1)RT$

2. Correction for indistinguishability of water molecules
Let's consider formation of the water dimer in solution, i.e. Scheme 3 for $n=2$.  As I have written about earlier, there are two ways of making the water dimer: one in which molecule $A$ is the H-donor and one where molecule $B$ is the H-donor (I assume the gas phase Hessian calculation for the water molecule is done in $C_{2v}$ symmetry).  In general, a (H$_2$O)$_n$ cluster can be made $n!$ ways so I suggest using 

$\Delta A^\circ_{\text{g,bind}}=\Delta G^\circ_{\text{g,bind}}-(n-1)RT-RT\ln(n!)$ 

With these corrections $RE$ vs $n$ is significantly reduced as shown in Figure 1 (small dots). For example, for $n=18$ and COSMO $RE$ has been reduced from 41 to 10 kcal/mol.  The remaining error comes from errors in the solvation energies, hydrogen bond strengths, anharmonicity, and conformational entropy.  The $n=18$ cluster has around 30 hydrogen bonds to the 10 kcal/mol error could easily be explained by a 0.3 kcal/mol error in the computed hydrogen bond strength. 

3. Parameterization of a water oxygen solvent radius
The solvation energies are clearly another source of error.  For example, the COSMO value for $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ is 0.35 kcal/mol lower than the experimental value, corresponding to an error of 6.3 kcal/mol for 18 water molecules.  Part of that error is probably cancelled by similar errors in $\Delta G^*_{\text{solv}}((\text{H}_2\text{O})_n)$ but not all.  

In fact, the higher errors for SM6 must be due to its underestimating $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ by 2.51 kcal/mol (corresponding to ca 45 kcal/mol in the gas phase for $n=18$!).  Thus if SM6 has to be used it is important to re-parameterize it so that $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ is reproduced reasonably well.  The easiest way is probably to change the radius of the oxygen water molecule sphere.

Notice that results are not improved simply by using the experimental value of $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ instead of the computed one.  For example, for $n=18$ the difference in $RE$ for SM6 and COSMO (where the error in the solvation energy is quite small) is "only" ca 20 kcal/mol so introducing a 45 kcal/mol correction will not improve things for the right reasons. 

Changes in Scheme 1 and 2
So, I suggest the following corrections to Scheme 1 and 2, respectively

$\Delta A^\circ_{\text{g,bind}}(I)=\Delta G^\circ_{\text{g,bind}}(I)-nRT-RT\ln(n!)$ 

$\Delta A^\circ_{\text{g,bind}}(II)=\Delta G^\circ_{\text{g,bind}}(II)-RT$ 

Notice that there is a $RT\ln(n!)$ term for both $(\text{H}_2\text{O})_n$ and $[\text{A}(\text{H}_2\text{O})_n]^{m\pm}$ leading to cancellation of this term for Scheme 2.

Clearly the corrections are much more important for Scheme 1 and Scheme 2 is still the more accurate approach due to better cancellation of errors.

4. The general use of the cluster model
While Bryantsev et al. advocate the use of the cluster cycle to find the optimum cluster size for the ions they still seem to advocate the use of a monomer cycle for pKa predictions (Scheme 4).

Taken from 10.1021/jp802665d. (c) American Chemical Society

I think better cancellation would be achieved by using two water clusters of size $n$ and $m$ instead of $n+m$ water monomers.  Thus

$\Delta G^\circ_{\text{g,deprot}}-(n+m-1)\Delta G^{\circ-*} \rightarrow \Delta A^\circ_{\text{g,deprot}} - \Delta G^{\circ-*}$

$\Delta A^\circ_{\text{g,deprot}}=\Delta G^\circ_{\text{g,deprot}}-RT$ 

$(n+m)\Delta G^*_{\text{solv}}(\text{H}_2\text{O}) \rightarrow \Delta G^*_{\text{solv}}((\text{H}_2\text{O})_n)+ \Delta G^*_{\text{solv}}((\text{H}_2\text{O})_m)$

$(n+m)RT\ln([\text{H}_2\text{O}]) \rightarrow 2RT\ln([\text{H}_2\text{O}]/(nm))$



This work is licensed under a Creative Commons Attribution 4.0

Saturday, October 4, 2014

Tight binding DFT


Tight binding DFT (DFTB) is a semi-empirical method with speed and accuracy similar to NDDO-based semiempirical methods such as AM1, PM3, and PM6.  Currently there are three types of DFTB methods called DFT1, DFTB2, and DFTB3. DFTB1 and DFTB2 are sometimes called non-SCC DFTB (non-selfconsistent charge) and SCC-DFTB, respectively.  DFTB3 is generally considered the most accurate for molecules and there are several parameter sets for DFTB2 and DFTB3 for different elements.  Compared to PM6, DFTB has so far been parameterized for relatively few elements.

DFTB1
The closed shell DFTB1 energy is computed from the following equation
$$E^{\text{DFTB1}}=\sum_i^{N/2} \sum_\mu^K \sum_\nu^K 2 C_{\mu i} C_{\nu i} H^0_{\mu\nu}+\sum_A \sum_{B>A} E^{\text{rep}}_{AB}$$
where $C_{\mu i}$ is the molecular orbital coefficient for MO $i$ and basis function $\mu$.
$$H^0_{\mu\nu}= \begin{cases}
\varepsilon^{\text{free atom}}_{\mu\mu} & \text{if } \mu=\nu \\
0 & \text{if } A=B, \mu\ne\nu\\
\langle \chi_\mu | \hat{T}+V_{\text{eff}}[\rho_0^A+\rho_0^B] | \chi_\nu \rangle & \text{if } A\ne B
\end{cases}$$
Here,  $\varepsilon^{\text{free atom}}_{\mu\mu}$ is an orbital energy of a free atom, $\chi$ is a valence Slater-type orbital (STO) or numerical orbital, $\hat{T}$ is the electronic kinetic energy operator, $V_{\text{eff}}$ is the Kohn-Sham potential (electron-nuclear attraction, electron-electron repulsion, and exchange correlation), and $\rho_0^A$ is the electron density of neutral atom $A$.

DFT calculations on free atoms using some functional yield $\left\{\varepsilon^{\text{free atom}}_{\mu\mu} \right\}$, $\left\{\chi\right\}$, and $\rho_0$, which are then used to compute $H^0_{\mu\nu}$ for A-B atom pairs at various separations $R_{AB}$ and stored. When performing DFTB calculations $H^0_{\mu\nu}$ is simply computed for each atom pair A-B by interpolation using this precomputed data set.

Similarly, the overlap matrix $\left\{ \langle \chi_\mu | \chi_\nu \rangle  \right\}$ need to orthonormalize the MOs are computed for various distances and stored for future use.

$E^{\text{rep}}_{AB}$ is an empirical repulsive pairwise atom-atom potential with parameters adjusted to minimize the difference in atomization energies, geometries, and vibrational frequencies computed using DFTB and DFT or electronic structure calculations for set of molecules.

So, a DFTB1 calculation is performed by constructing $\mathbf{H}^0$, diagonalizing it to yield $\mathbf{C}$, and then computing $E^{\text{DFTB1}}$.

DFTB2
$$E^{\text{DFTB2}}=E^{\text{DFTB1}}+\sum_A \sum_{B>A} \gamma_{AB}(R_{AB})\Delta q_A\Delta q_B$$
where $\Delta q_A$ is the Mulliken charge on atom $A$ and $\gamma_{AB}$ is a function of $R_{AB}$ that tends to $1/R_{AB}$ at long distances.

The Mulliken charges depend on $\mathbf{C}$ so a selfconsistent calculation is required:
1. Compute DFTB1 MO coefficients, $\mathbf{C}$
2. Use $\mathbf{C}$ to compute $\left\{ \Delta q \right\}$
3. Construct and diagonalize $H_{\mu \nu}$ to get new MO coefficients, $\mathbf{C}$
$$H_{\mu \nu}=H_{\mu \nu}^0 + \frac{1}{2} S_{\mu\nu} \sum_C (\gamma_{AC}+\gamma_{BC})\Delta q_C, \mu \in A, \nu \in B$$
4. Repeat steps 2 and 3 until selfconsistency.

DFTB3
$$E^{\text{DFTB3}}=E^{\text{DFTB2}}+\sum_A \sum_{B>A} \sum_{C>B>A} \Gamma_{AB}\Delta q_A^2\Delta q_B$$
$\Gamma_{AB}$ is computed using interpolation using precomputed data. A SCF calculation is required.

Parameter sets and availability
DFTB is available in a variety of software packages.  I don't believe DFTB3 is currently in Gaussian and DFTB is also available in CHARMM and CP2K.  DFTB will soon be available in GAMESS.

Note that each user/lab must download the parameter file separately here.  There are several parameter sets.  The most popular sets for molecules are the MIO (materials and biological systems) for DFTB2 and 3OB (DFT3 organic and biological applications).

Dispersion and hydrogen bond corrections
Just like DFT and PM6, the DFTB can be corrected for dispersion and hydrogen bond effect.



This work is licensed under a Creative Commons Attribution 4.0

Saturday, September 6, 2014

Four Practical Comments Regarding RHF Calculations

1. If the program you use to perform a geometry optimization at the HF level complains of convergence problems, the first step is to determine whether it is the geometry optimization that failed to converge (i.e. the program could not find a geometry with a low gradient given a certain number of steps) or whether it is the SCF-procedure at a particular geometry that failed to converge (i.e. the program could not find a set or orbitals that have stopped changing given a certain number of steps). It is my experience that if the SCF does not converge in the default number of steps, then simply increasing the allowed number of SCF steps does not solve the problem. The problem is usually either in the initial guess orbitals or there is an error in the molecular structure.

2. Most quantum chemistry programs will have RHF as the default method, which implies a singlet wave function. So if you provide the program with a molecule that has an odd number of electrons (e.g. NH4) the program will complain that it cannot do a singlet calculation on a molecule with an odd number of electrons, and asks you to change either the charge or the multiplicity, i.e. to specify whether you want to do an UHF or ROHF calculation on the doublet state of the NH4 radical, or whether you meant to do a singlet RHF calculation on NH4+. You will get very different results depending on your choice, so the object here is not to use trial and error to find a combination of charge and multiplicity that satisfies the program. Rather you have to decide what makes sense chemically, before you do the calculation.

3. Once you have defined the position of the nuclei and the number of electrons (by specifying the overall charge of the molecule) the SCF procedure does the rest. Therefore, you don’t have to specify that the positive charge “is on the NH3 group” in CH3NH3+. Each orbital has access to all the basis functions in the molecule and the electrons generally “go where they want”. Also, changing the CN bond to a double bond in a graphics program does not affect the result (as long as the CN bond length is the same).

4. This should be obvious, but just in case. It makes no sense to compare the total energy (E) of two molecules (or collection of molecules) that have a different number and kinds of atoms. For example, substituting a hydrogen atom in a molecule with a methyl group will lead to a considerable lowering of the RHF energy. Most of this energy lowering will some from the electron-nuclear attraction energy of the electron in the 1s orbital of the C atom, but says nothing about the relative stability of the two molecules. In the Introduction to Molecular Modeling course I used to teach, I used to subtract 20 points (out of a hundred) for such a comparison in a laboratory report

Tuesday, September 2, 2014

Computational Chemistry Highlights: August issue

The August issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, David Bowler, Mario Barbatti, and Jan Jensen:

Interactive Chemical Reactivity Exploration





Saturday, August 9, 2014

Eight practical comments regarding geometry optimizations.

 1. An energy minimization can stop either because a stationary point is found or the program ran out of steps. It’s important that you know why it stopped. If it ran out of steps you have to do another energy minimization starting with the geometry that has the lowest energy. This is not always the last geometry!

2. Larger molecules will require more steps to reach a minimum than smaller molecules. The relationship is roughly linear, so if you double the size of the molecules you will probably take at least twice as many steps, and maybe more. The reason is that each step explores one degree of freedom the molecules, all degrees of freedom must be explored to find the minimum (unless you get lucky), and the number of degrees of freedom increases linearly with the number of atoms (3N-6). This is not quite true for Newton-Raphson energy minimizations if the Hessian is computed, but this is rarely done in practice (see point 5).

3. Floppy molecules with relative flat PESs require more steps than rigid molecules, e.g. a linear hydrocarbon takes longer to optimize than the corresponding cyclic structure. The reason is that flat surfaces mean small gradients and therefore small steps. Also, flat surfaces tend to be non-quadratic so the Newton-Raphson steps have to be scaled back.

4. It is sometimes more efficient to use internal coordinates (bond lengths and angles) for the optimization than Cartesian coordinates, since internal coordinates are more quadratic than Cartesian coordinates. However, if the molecule has many fused rings it might be better to use Cartesians coordinates because of linear-dependence issues.

5. For Newton-Raphson minimizations the Hessian is usually estimated rather than computed exactly because it is expensive to calculate. Attempts are usually made to improve or update the Hessian after each step. So, if you run 50 Newton-Raphson steps from an initial geometry, take the last geometry as the new guess, and run another 50 Newton-Raphson steps you will take a different path towards the minimum than if you run 100 Newton-Raphson steps from the initial geometry. This is because you start with a new guess at the Hessian in step “51” in the first approach. This usually works better in my experience.

6. If you want to find a transition state you must use the Newton-Raphson method, since only the Hessian can tell you in which direction you have to go uphill (the direction is defined by the normal mode corresponding to the imaginary frequency). For the same reason you must compute the Hessian for the initial guess geometry of your transition state.

7. For molecules with more than ~1000 atoms the algorithm needed to invert the Hessian (i.e. produce the inverse of H) becomes very expensive, so steepest descent or conjugate gradient methods are usually used to energy minimize such big molecules.

8. If your starting geometry for ammonium is planar (e.g. in the xy plane), it will stay planar. The reason for this is that the gradient perpendicular to the plane is zero because of symmetry. In essence there is two opposing ways of making ammonia non-planar and the net effect is a cancellation of the force out-of-the plane. Notice that the gradient components in the x and y directions are small (below the optimization threshold) but non-zero, while the gradient in the z-direction is exactly zero. Of course, a zero gradient means no displacement in the out-of-plane direction and ammonia stays planar during the energy minimization.

More generally, the molecule will retain the symmetry of the starting guess structure during the minimization, whether that symmetry corresponds to an energy minimum or not.

Monday, June 2, 2014

Tuesday, April 1, 2014

Computational Chemistry Highlights: March issue

The March issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors David Bowler, Esteban Vohringer-Martinez, François-Xavier Coudert, Mario Barbatti, Jan Jensen and Steven Bachrach:

Validation Challenge of Density-Functional Theory for Peptides—Example of Ac-Phe-Ala5-LysH+







Saturday, January 25, 2014

Protein unfolding at high temperature happens because of entropy


Here is a video in which I use Molecular Workbench to illustrate protein unfolding.  I then go on to explain how conformational entropy contributes to the spontaneous unfolding with increasing temperature. The videos are part of a series that I am working on.


This work is licensed under a Creative Commons Attribution 4.0 International License.

Wednesday, January 15, 2014

ChemTube3D goes inorganic on the iPad

Two exciting developments over at on of my favorite chemistry sites: ChemTube3D.  They have switched to JSmol so that it now works on iOS devises and doesn't require you to Java-enable your browser.

Furthermore, they have greatly expanded their inorganic chemistry section with interactive versions of figures from an inorganic chemistry textbook.

Thursday, January 2, 2014

Computational Chemistry Highlights: December issue

The December issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editor Steven Bachrach:


This work is licensed under a Creative Commons Attribution 4.0 International License.