Transition states¶

Basics¶

A transition state (TS) is an energy maximum along a reaction pathway that connects reactant and product structures. It is a critical point on the potential energy surface (PES) characterized by one negative eigenvalue in the Hessian matrix that corresponds to a vibrational mode along which the PES is concave down. The TS is essential to theoretical estimates of reaction rates and testing of mechanistic hypotheses due to the exponential decay of the reaction rate with increasing activation energy (the energy difference between the TS and reactant structures).

In contrast to energy minimization, TS optimization is designed to maximize the energy along the lowest eigenvalue mode of the Hessian. Performance is best when the initial structure already has a negative eigenvalue in the Hessian matrix with an eigenvector that points along the direction of the expected reaction. Therefore, it is important for TS optimizations to start with a high-quality initial guess that is already close to the desired optimized structure. There are many ways to obtain an initial guess structure for TS optimizations, such as careful modeling based on chemical intuition, coordinate driving (i.e. scanning a constraint value), internal coordinate interpolation, reaction path optimization (such as nudged elastic band or the string method), and others. The estimated TS structures generated by these approximate methods are sometimes treated as the actual TS structures in the literature, but they are not as precise as “true” TS optimization.

At the start of a TS optimization in geomeTRIC, the full Hessian matrix is calculated by central finite difference of the gradient. The local calculation may take a long time as the number of gradients needed is \(6 \times N_{\mathrm{atoms}}\). Using the Work Queue library or BigChem, you could parallelize the gradient calculations by distributing the calculations on remote machines. Alternatively, if you have an exact or approximate Hessian matrix computed externally, you may provide it as a text file and skip the finite difference Hessian calculation. At the conclusion of the optimization, the user may optionally request an additional Hessian calculation and vibrational analysis to check the number of imaginary modes in the final optimized structure; in most cases, the desired number of imaginary modes is 1.

Usage¶

To enable TS optimization in geomeTRIC, pass --transition yes on the command line when running geometric-optimize. Enabling this switch will change the optimization step, Hessian update method, and trust radii, as described below.

In addition, the Cartesian Hessian will be computed before any optimization steps, along with a vibrational analysis to determine the normal modes. If you want to automatically perform a second vibrational analysis after the optimization is complete, pass --hessian first+last on the command line. The gradient calculations in the finite difference Hessian can be parallelized.

The initial and maximum trust radii may be adjusted by passing (for example) --trust 0.02 --tmax 0.06 on the command line, which is double the default values of 0.01 and 0.03 respectively. Increasing these parameters beyond 0.03 and 0.10 respectively is not recommended.

Example¶

See Examples page.

Theory¶

TS optimization in geomeTRIC uses much of the same code and logic as energy minimization, but a few critical steps are different. The implementation is largely based on established methods in the literature.

Setting up the internal coordinates¶

The initial structure of TS optimizations often contain interatomic distances that are right on the threshold of being bonded vs. nonbonded. The TRIC coordinate system depends on the division of the system into one or more fragments, and works best when there is >1 fragment with no isolated atoms. To facilitate the setting up of TRIC, the Mayer bond order matrix is calculated for the initial structure if implemented in the selected Engine, and two atoms are considered to be bonded if their bond order is greater than the threshold specified by --bothre (the threshold is halved if one or both atoms is a transition metal). If this results in a single fragment, the threshold is increased until at least two fragments are found with >1 atom each, or until the maximum of 0.75 is reached. Any isolated atoms remaining at the end of this process are connected to their nearest neighbor before the internal coordinate system is built.

Hessian transformation¶

The Hessian is first transformed into internal coordinates following Peng et al.:

(1)¶\[\mathbf{H}^{(q)} = \mathbf{B}^{-1} \left( \mathbf{H}^{(x)} - \frac{d\mathbf{B}}{d\mathbf{x}} \cdot \mathbf{g}^{(q)} \right) (\mathbf{B}^{-1})^T\]

where \(B_{ij}=d q_i/d x_j\) is the Wilson B-matrix, \(\mathbf{B}^{-1}=(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}\), and \(\mathbf{g}^{(q)}=\mathbf{B}^{-1}\mathbf{g}^{(x)}\) is the internal coordinate gradient. The second derivatives of the primitive ICs, including the rotation ICs, are calculated analytically.

Obtaining the optimization step¶

TS uses the restricted-step partitioned rational function optimization (RS-P-RFO) method to obtain the proposed optimization step in internal coordinates.

In this method, the local expansion of the PES around the current point \(\Delta E\) as a function of the internal coordinate displacement \(\mathbf{y} = \Delta \mathbf{q} = \mathbf{q}_{\textrm{next}} - \mathbf{q}_{\textrm{curr}}\) is approximated by a [2/2] Padé approximant following Banerjee et al.:

(2)¶\[\begin{split}\Delta E (\mathbf{y}) = \frac{\frac{1}{2} \begin{pmatrix} 0 & \mathbf{y}^T \end{pmatrix} \begin{pmatrix} 0 & \mathbf{g}^T\\ \mathbf{g} & \mathbf{H} \end{pmatrix} \begin{pmatrix} 0 \\ \mathbf{y} \end{pmatrix}}{ \begin{pmatrix} 0 & \mathbf{y}^T \end{pmatrix} \begin{pmatrix} 1 & \mathbf{0}^T\\ \mathbf{0} & \alpha \mathbf{I} \end{pmatrix} \begin{pmatrix} 0 \\ \mathbf{y} \end{pmatrix}}\end{split}\]

where \(\mathbf{g}\) and \(\mathbf{H}\) are the gradient and Hessian in internal coordinates and \(\alpha\) is a parameter that restricts the step size, to be discussed later. (In this section we have dropped the superscript \((q)\), because all quantities are in internal coordinates.) To find the stationary points of the Padé approximant, one solves the generalized eigenvalue problem

(3)¶\[\begin{split}\begin{pmatrix} 0 & \mathbf{g}^T\\ \mathbf{g} & \mathbf{H} \end{pmatrix} \begin{pmatrix} v^0_i \\ \mathbf{v}_i \end{pmatrix} = \lambda_i \begin{pmatrix} 1 & \mathbf{0}^T\\ \mathbf{0} & \alpha \mathbf{I} \end{pmatrix} \begin{pmatrix} v^0_i \\ \mathbf{v}_i \end{pmatrix}\end{split}\]

This gives a set of orthogonal directions \(\mathbf{y}_i\) and corresponding eigenvalues \(\mathbf{\lambda}_i\) that satisfy:

(4)¶\[\begin{split}\begin{aligned} & \mathbf{g}^T \mathbf{y}_i = \lambda_i \\ & \mathbf{g} + (\mathbf{H} - \lambda_i \alpha \mathbf{I}) \mathbf{y}_i = 0 \end{aligned}\end{split}\]

where \(\mathbf{y}_i = \mathbf{v}_i/v^0_i\). For minimization or maximization, one chooses the eigenvector corresponding to the smallest or highest eigenvalue respectively, and the eigenvalues in between correspond to saddle points with increasing numbers of negative eigenvalues (however, the second eigenpair is not directly used in RS-P-RFO). In the following equations the index \(i\) will be omitted for clarity assuming a particular eigenvalue / eigenvector pair has been chosen.

In the eigenbasis of the Hessian, i.e. \(\mathbf{H} \mathbf{w}_k = \omega_k \mathbf{w}_k\), the second equation in Eq. (4) simplifies into individual equations for the step projected into each eigenvector, i.e. \(\tilde{y}_k \equiv \mathbf{w}_k \cdot \mathbf{y}\):

(5)¶\[\tilde{y}_k = -\frac{\tilde{g}_k}{\omega_k - \lambda \alpha}\]

where \(\tilde{g}_k \equiv \mathbf{w}_k \cdot \mathbf{g}\).

In partitioned RFO, the optimization space is partitioned into two subspaces that correspond to maximization and minimization respectively, and a separate generalized EVP is solved for each subspace. The partitioning is performed by diagonalizing the Hessian and separating out the lowest eigenvalue / eigenvector pair for maximization, and minimization is carried out for the rest. Therefore, in normal mode coordinates, we have the following for the lowest eigenvalue:

(6)¶\[\begin{split}\begin{pmatrix} 0 & \tilde{g}_{tv}\\ \tilde{g}_{tv} & \omega_{tv} \end{pmatrix} \begin{pmatrix} v^0_{tv} \\ \tilde{\mathrm{v}}_{tv} \end{pmatrix} = \lambda_{tv} \begin{pmatrix} 1 & 0\\ 0 & \alpha \end{pmatrix} \begin{pmatrix} v^0_{tv} \\ \tilde{\mathrm{v}}_{tv} \end{pmatrix}\end{split}\]

where the subscript tv or “transition vector” indicates that the lowest eigenvalue/eigenvector pair of the Hessian is chosen. (Note that the generalized EVP is just a \(2 \times 2\) matrix.) Because we are maximizing within this subspace, we pick the highest eigenvalue/eigenvector pair from the generalized EVP, such that the step along the transition vector is:

(7)¶\[\tilde{y}_{tv} = -\frac{\tilde{g}_{tv}}{\omega_{tv} - \alpha \lambda_{tv;\ max}}\]

The rest of the optimization space corresponding to the other Hessian eigenvalues are set to be minimized. Therefore we have:

(8)¶\[\tilde{y}_{k} = -\frac{\tilde{g}_{k}}{\omega_{k} - \alpha \lambda_{ot;\ min}}\]

where \(k \neq tv\) is the index of any other Hessian eigenvalue/eigenvector pair, and the subscript ot indicates that the generalized EVP was set up using the “other” part of the Hessian, i.e. \(k \in ot\).

Transforming back from normal mode coordinates, we have:

(9)¶\[\mathbf{y} = \mathbf{w}_{tv} \tilde{y}_{tv} + \sum_{k \in ot} \mathbf{w}_k \tilde{y}_k\]

The parameter \(\alpha \geq 1\) is solved for in order to ensure that the P-RFO step is restricted to fall within the internal coordinate trust radius (hence the prefix “RS”), following Bofill et al. If the unrestricted step where \(\alpha = 1\) already falls within the IC trust radius \(R\), then all is good. Otherwise, the squared norm of the step \(|\mathbf{y}^2|\) and its derivative \(d|\mathbf{y}^2|/d\alpha\) are used to optimize the value of \(\alpha\) iteratively until \(|\mathbf{y}^2|(\alpha_{opt}) = R^2\) within a tolerance of 0.001, similar to what is done for energy minimization. The internal coordinate trust radius is itself optimized numerically so that the Cartesian step size falls within a trust radius that is controlled by the step history, same as for energy minimization.

Updating the Hessian matrix¶

When TS optimization is specified, the Hessian matrix is updated between optimization steps using Bofill’s formula which is a linear combination of the Murtagh-Sargent (also called symmetric rank-1) and Powell symmetric Broyden updates:

(10)¶\[\begin{split}\begin{aligned} & \mathbf{H}_{n+1} = (1-\phi) \mathbf{H}_{n+1; MS} + \phi \mathbf{H}_{n+1; PSB} \\ & \mathbf{H}_{n+1; MS} = \mathbf{H}_n + \frac{\boldsymbol{\xi} \otimes \boldsymbol{\xi}}{\boldsymbol{\delta}\cdot\boldsymbol{\xi}} \\ & \mathbf{H}_{n+1; PSB} = \mathbf{H}_n - \frac{(\boldsymbol{\delta}\cdot\boldsymbol{\xi})(\boldsymbol{\delta}\otimes\boldsymbol{\delta})}{|\boldsymbol{\delta}|^4} + \frac{\boldsymbol{\delta}\otimes\boldsymbol{\xi}+\boldsymbol{\xi}\otimes\boldsymbol{\delta}}{|\boldsymbol{\delta}|^2} \end{aligned}\end{split}\]

where

(11)¶\[\begin{split}\begin{aligned} & \boldsymbol{\delta} \equiv \mathbf{q}_{n+1} - \mathbf{q}_n \\ & \boldsymbol{\xi} \equiv (\mathbf{g}_{n+1} - \mathbf{g}_{n}) - \mathbf{H}_n \cdot \boldsymbol{\delta} \\ & \phi = 1 - \frac{(\boldsymbol\delta \cdot \boldsymbol\xi)^2}{|\boldsymbol\delta|^2|\boldsymbol\xi|^2} \end{aligned}\end{split}\]

i.e. \(\boldsymbol\delta\) is the current optimization step, \(\boldsymbol\xi\) is the difference between the actual gradient change and the predicted gradient change using the previous structure’s Hessian, and \(\phi\) measures the alignment between the two vectors, being equal to 1 when they are orthogonal and 0 when parallel. The PSB update is mixed in when the two vectors are almost orthogonal, as the MS update approaches zero and becomes unstable. This updating method is used in TS optimization because it does not preserve positive-definiteness of the Hessian matrix, as opposed to BFGS (which is used in energy minimization). All of the Hessian updates are carried out in internal coordinates.

Step size control¶

The initial and maximum trust radii for TS optimization are set to 0.01 and 0.03 by default (\(0.1 \times\) the values used for energy minimization) in order to ensure that the optimization step stays within the neighborhood of validity of the Hessian. The default values are chosen to maximize job success rates while keeping the number of cycles manageable, as increasing the trust radius by a factor of 2-3 only decreases the number of cycles by 20-30% in tests.

During the optimization the trust radius is adjusted based on the step quality, i.e. a step is “good” if the actual energy change is close to the predicted energy change. Energy changes that are more positive or more negative than the prediction are both considered to detract from the step quality, as opposed to energy minimization, in which more negative energy changes than predicted are considered good. The quality factor is calculated as:

(12)¶\[\begin{split}\begin{aligned} & Q = 1 - \left| \frac{\Delta E_{\mathrm{actual}}}{\Delta E_{\mathrm{pred}}} - 1\right| \\ & \textrm{where }\Delta E_{\mathrm{pred}} = \frac{1}{2} \boldsymbol \delta \cdot \mathbf{H}_n \cdot \boldsymbol \delta + \boldsymbol \delta \cdot \mathbf{g}_n \end{aligned}\end{split}\]

The trust radius is adjusted based on the value of \(Q\) as follows:

\(Q \geq 0.75\) : “Good” step, trust radius is increased by a factor of \(\sqrt{2}\), but not greater than the maximum.
\(0.75 > Q \geq 0.50\) : “Okay” step, trust radius is unchanged.
\(0.5 > Q \geq 0.0\) : “Poor” step, trust radius is decreased by setting it to \(0.5 \times \mathrm{min}(trust\_rad, step\_size)\), but not lower than the minimum.
\(Q < 0.0\) : Step is rejected in addition to decreasing the trust radius as above.

These thresholds are more conservative than in the case of energy minimization due to the need to stay in the valid neighborhood of the Hessian matrix.

Hessian calculations & vibrational analysis¶

At present, geomeTRIC only supports the calculation of Hessians via finite difference of the gradient, and it cannot call the QC software to compute the analytic Hessian. If you want to use an analytic Hessian from running the QC software separately, save it to a text file and pass --hessian file:/path/to/file on the command line. The Hessian file must be stored as a square matrix in Numpy-readable text format (not binary) with dimension \(3 \times N_{\mathrm{atoms}}\).

The gradient calculations may be parallelized by distributing the jobs to remote “worker” nodes using the Work Queue distributed computing library; this can greatly reduce the wall time relative to performing the gradient calculations serially. To enable this behavior, first ensure that the Work Queue library and Python module are installed, then pass --wqport #### on the command line where #### is a custom port number (I usually use a large four-digit number, such as 7953, that is not commonly used by other services). Then run the work_queue_worker program on the worker node, providing the host name that is running geometric-optimize and the port number. The worker node must have the QC software installed with the environment variables properly set when work_queue_worker is executed; one common approach is to write a batch script to execute workers on clusters managed by Slurm or similar job schedulers. If successful, the worker will establish a connection to the master and begin to accept gradient jobs. Parallelization is achieved by running multiple workers on one or more nodes (you can run workers locally too).

Similarly, BigChem can be used to parallelize the gradient calculations for the Hessian. First, the Redis server can be started on the head node:

redis-server --bind 0.0.0.0 --daemonize yes --logfile redis.log

Before we start the workers, the environment variables need to be set:

export BIGCHEM_BROKER_URL="redis://your_head_node_hostname_or_ip/0"
export BIGCHEM_BACKEND_URL="redis://your_head_node_hostname_or_ip/0"

Now the workers can be started on any computing nodes that can reach the head node:

celery -A bigchem.tasks worker --without-heartbeat --without-mingle --without-gossip --loglevel=INFO

While both the server and workers are active, use the --bigchem yes flag to enable BigChem for parallel Hessian calculations.

The Hessian calculation and vibrational analyses should give the same results as if you had requested them directly from the quantum chemistry code. After the vibrational analysis, the Gibbs free energy corrections are computed using an ideal gas / rigid rotor / harmonic oscillator approximation (imaginary frequency modes are ignored). The free energy calculation may be customized by passing --thermo <temp> <pres> and providing the temperature and pressure. Additionally, by passing --wigner <num_samples> one could obtain a desired number of sample points from the Wigner phase space distribution.