.. _transition:

Transition states
=================

Basics
------

A transition state (TS) is an energy maximum along a reaction pathway that
connects reactant and product structures. It is a critical point on the
potential energy surface (PES) characterized by one negative eigenvalue in the
Hessian matrix, corresponding to a vibrational mode along which the PES is
concave down. The TS is essential for theoretical estimates of reaction rates
and for testing mechanistic hypotheses, because the reaction rate decays
exponentially with increasing activation energy (the energy difference between
the TS and reactant structures).

In contrast to energy minimization, TS optimization is designed to maximize
the energy along the lowest-eigenvalue mode of the Hessian while minimizing it
along all others. Performance is best when the initial structure already has a
negative eigenvalue in the Hessian matrix whose eigenvector points along the
direction of the expected reaction. It is therefore important for TS
optimizations to start from a high-quality initial guess that is already close
to the desired optimized structure. There are many ways to obtain an initial
guess structure, such as careful modeling based on chemical intuition,
coordinate driving (i.e. scanning a constraint value), internal coordinate
interpolation, reaction path optimization (such as the nudged elastic band or
string methods), and others. The estimated TS structures generated by these
approximate methods are sometimes treated as the actual TS structures in the
literature, but they are not as precise as "true" TS optimization.

At the start of a TS optimization in geomeTRIC, the full Hessian matrix is
calculated by central finite difference of the gradient. The local calculation
may take a long time, as the number of gradients needed is
:math:`6 \times N_{\mathrm{atoms}}`. Using the :ref:`Work Queue ` library, you
can parallelize the gradient calculations by distributing them to remote
machines.
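As an illustration of the central-difference scheme, the following NumPy
sketch builds a Hessian from a gradient function using two gradient calls per
Cartesian degree of freedom. This is a simplified stand-in, not geomeTRIC's
implementation; the ``gradient`` callable and the step size ``h`` are assumed
placeholders for whichever engine computes the gradient.

```python
import numpy as np

def finite_difference_hessian(gradient, x0, h=1e-4):
    """Central-difference Hessian: 2 gradient calls per coordinate,
    i.e. 6 * N_atoms gradient evaluations for 3 * N_atoms coordinates."""
    n = x0.size
    H = np.zeros((n, n))
    for i in range(n):
        xp, xm = x0.copy(), x0.copy()
        xp[i] += h
        xm[i] -= h
        H[:, i] = (gradient(xp) - gradient(xm)) / (2 * h)
    return 0.5 * (H + H.T)  # symmetrize to remove numerical noise

# Example: a quadratic surface V(x) = 1/2 x^T A x has exact Hessian A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
grad = lambda x: A @ x
H = finite_difference_hessian(grad, np.array([0.3, -0.7]))
```

For a quadratic surface the central difference recovers the Hessian exactly
(up to roundoff), which makes this a convenient sanity check.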
Alternatively, if you have an exact or approximate Hessian matrix computed
externally, you may provide it as a text file and skip the finite difference
Hessian calculation. At the conclusion of the optimization, the user may
optionally request an additional Hessian calculation and vibrational analysis
to check the number of imaginary modes in the final optimized structure; in
most cases, the desired number of imaginary modes is 1.

Usage
-----

To enable TS optimization in geomeTRIC, pass ``--transition yes`` on the
command line when running ``geometric-optimize``. Enabling this switch changes
the optimization step, Hessian update method, and trust radii, as described
below. In addition, the Cartesian Hessian will be computed before any
optimization steps, along with a vibrational analysis to determine the normal
modes. If you want to automatically perform a second vibrational analysis
after the optimization is complete, pass ``--hessian first+last`` on the
command line. The gradient calculations in the finite difference Hessian can
be :ref:`parallelized `.

The initial and maximum trust radii may be adjusted by passing (for example)
``--trust 0.02 --tmax 0.06`` on the command line, which is double the default
values of 0.01 and 0.03 respectively. Increasing these parameters beyond 0.03
and 0.10 respectively is not recommended.

Example
-------

See :ref:`Examples page `.

Theory
------

TS optimization in geomeTRIC uses much of the same code and logic as energy
minimization, but a few critical steps are different. The implementation is
largely based on established methods in the literature.

.. _transition_tric:

Setting up the internal coordinates
"""""""""""""""""""""""""""""""""""

The initial structures of TS optimizations often contain interatomic distances
that are right on the threshold of being bonded vs. nonbonded.
The TRIC coordinate system depends on the division of the system into one or
more fragments, and works best when there is more than one fragment and no
isolated atoms. To facilitate the setup of TRIC, the Mayer bond order matrix
is calculated for the initial structure if implemented in the selected
:ref:`Engine `, and two atoms are considered to be bonded if their bond order
is greater than the threshold specified by ``--bothre`` (the threshold is
halved if one or both atoms is a transition metal). If this results in a
single fragment, the threshold is increased until at least two fragments are
found with more than one atom each, or until the maximum of 0.75 is reached.
Any isolated atoms remaining at the end of this process are connected to their
nearest neighbor before the internal coordinate system is built.

Hessian transformation
""""""""""""""""""""""

.. Created TinyURL because the super long DOI caused syntax errors

The Hessian is first transformed into internal coordinates following
`Peng et al. `_:

.. math::
   :label: ic_hessian

   \mathbf{H}^{(q)} = \mathbf{B}^{-1} \left( \mathbf{H}^{(x)} - \frac{d\mathbf{B}}{d\mathbf{x}} \cdot \mathbf{g}^{(q)} \right) (\mathbf{B}^{-1})^T

where :math:`B_{ij}=d q_i/d x_j` is the Wilson B-matrix,
:math:`\mathbf{B}^{-1}=(\mathbf{B}\mathbf{B}^T)^{-1}\mathbf{B}` is its
generalized inverse, and
:math:`\mathbf{g}^{(q)}=\mathbf{B}^{-1}\mathbf{g}^{(x)}` is the internal
coordinate gradient. The second derivatives of the primitive ICs, including
the rotation ICs, are calculated analytically.

Obtaining the optimization step
"""""""""""""""""""""""""""""""

TS optimization uses the restricted-step partitioned rational function
optimization (RS-P-RFO) method to obtain the proposed optimization step in
internal coordinates.
In this method, the local expansion of the PES around the current point,
:math:`\Delta E`, as a function of the internal coordinate displacement
:math:`\mathbf{y} = \Delta \mathbf{q} = \mathbf{q}_{\textrm{next}} - \mathbf{q}_{\textrm{curr}}`,
is approximated by a [2/2] Padé approximant following `Banerjee et al. `_:

.. math::
   :label: eq_pade

   \Delta E (\mathbf{y}) = \frac{\frac{1}{2} \begin{pmatrix} 1 & \mathbf{y}^T \end{pmatrix} \begin{pmatrix} 0 & \mathbf{g}^T\\ \mathbf{g} & \mathbf{H} \end{pmatrix} \begin{pmatrix} 1 \\ \mathbf{y} \end{pmatrix}}{ \begin{pmatrix} 1 & \mathbf{y}^T \end{pmatrix} \begin{pmatrix} 1 & \mathbf{0}^T\\ \mathbf{0} & \alpha \mathbf{I} \end{pmatrix} \begin{pmatrix} 1 \\ \mathbf{y} \end{pmatrix}}

where :math:`\mathbf{g}` and :math:`\mathbf{H}` are the gradient and Hessian
in internal coordinates and :math:`\alpha` is a parameter that restricts the
step size, to be discussed later. (In this section we have dropped the
superscript :math:`(q)`, because all quantities are in internal coordinates.)
To find the stationary points of the Padé approximant, one solves the
generalized eigenvalue problem

.. math::
   :label: eq_genevp

   \begin{pmatrix} 0 & \mathbf{g}^T\\ \mathbf{g} & \mathbf{H} \end{pmatrix} \begin{pmatrix} v^0_i \\ \mathbf{v}_i \end{pmatrix} = \lambda_i \begin{pmatrix} 1 & \mathbf{0}^T\\ \mathbf{0} & \alpha \mathbf{I} \end{pmatrix} \begin{pmatrix} v^0_i \\ \mathbf{v}_i \end{pmatrix}

This gives a set of orthogonal directions :math:`\mathbf{y}_i` and
corresponding eigenvalues :math:`\lambda_i` that satisfy:

.. math::
   :label: eq_split

   \begin{aligned} & \mathbf{g}^T \mathbf{y}_i = \lambda_i \\ & \mathbf{g} + (\mathbf{H} - \lambda_i \alpha \mathbf{I}) \mathbf{y}_i = 0 \end{aligned}

where :math:`\mathbf{y}_i = \mathbf{v}_i/v^0_i`.
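These eigenvalue relations can be checked numerically. The sketch below is
illustrative only, using made-up data: it builds the augmented RFO matrix for
the unrestricted case :math:`\alpha = 1`, where the generalized EVP reduces to
an ordinary symmetric eigenvalue problem.

```python
import numpy as np

# Made-up gradient and symmetric model Hessian in a 4-dimensional space.
rng = np.random.default_rng(0)
n = 4
g = rng.normal(size=n)
M = rng.normal(size=(n, n))
H = 0.5 * (M + M.T)

# Augmented (n+1) x (n+1) RFO matrix [[0, g^T], [g, H]].
A = np.zeros((n + 1, n + 1))
A[0, 1:] = g
A[1:, 0] = g
A[1:, 1:] = H

lams, V = np.linalg.eigh(A)  # alpha = 1: ordinary symmetric EVP

# For every eigenvector with nonzero first component, the direction
# y = v / v0 satisfies g.y = lambda and g + (H - lambda*I) y = 0.
checked = 0
for lam, v in zip(lams, V.T):
    v0, v_rest = v[0], v[1:]
    if abs(v0) < 1e-8:
        continue
    y = v_rest / v0
    assert np.isclose(g @ y, lam)
    assert np.allclose(g + (H - lam * np.eye(n)) @ y, 0.0, atol=1e-8)
    checked += 1
```

Both relations follow directly from the rows of the augmented eigenvalue
equation after dividing by :math:`v^0_i`, so they hold for any data.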
For minimization or maximization, one chooses the eigenvector corresponding to
the lowest or highest eigenvalue respectively; the eigenvalues in between
correspond to saddle points with increasing numbers of negative eigenvalues
(however, the second eigenpair is not directly used in RS-P-RFO). In the
following equations the index :math:`i` is omitted for clarity, assuming a
particular eigenvalue / eigenvector pair has been chosen.

In the eigenbasis of the Hessian, i.e.
:math:`\mathbf{H} \mathbf{w}_k = \omega_k \mathbf{w}_k`, the second equation
in Eq. :eq:`eq_split` simplifies into individual equations for the step
projected onto each eigenvector, i.e.
:math:`\tilde{y}_k \equiv \mathbf{w}_k \cdot \mathbf{y}`:

.. math::
   :label: rfo_eigenbasis

   \tilde{y}_k = -\frac{\tilde{g}_k}{\omega_k - \lambda \alpha}

where :math:`\tilde{g}_k \equiv \mathbf{w}_k \cdot \mathbf{g}`.

In partitioned RFO, the optimization space is partitioned into two subspaces
that correspond to maximization and minimization respectively, and a separate
generalized EVP is solved for each subspace. The partitioning is performed by
diagonalizing the Hessian and separating out the lowest eigenvalue /
eigenvector pair for maximization; minimization is carried out for the rest.
Therefore, in normal mode coordinates, we have the following for the lowest
eigenvalue:

.. math::
   :label: eq_genevp_tv

   \begin{pmatrix} 0 & \tilde{g}_{tv}\\ \tilde{g}_{tv} & \omega_{tv} \end{pmatrix} \begin{pmatrix} v^0_{tv} \\ \tilde{v}_{tv} \end{pmatrix} = \lambda_{tv} \begin{pmatrix} 1 & 0\\ 0 & \alpha \end{pmatrix} \begin{pmatrix} v^0_{tv} \\ \tilde{v}_{tv} \end{pmatrix}

where the subscript *tv* ("transition vector") indicates that the lowest
eigenvalue/eigenvector pair of the Hessian is chosen. (Note that the
generalized EVP involves just a :math:`2 \times 2` matrix.)
Because we are maximizing within this subspace, we pick the *highest*
eigenvalue/eigenvector pair from the generalized EVP, such that the step along
the transition vector is:

.. math::
   :label: step_tv

   \tilde{y}_{tv} = -\frac{\tilde{g}_{tv}}{\omega_{tv} - \alpha \lambda_{tv;\ max}}

The rest of the optimization space, corresponding to the other Hessian
eigenvalues, is minimized. Therefore we have:

.. math::
   :label: step_ot

   \tilde{y}_{k} = -\frac{\tilde{g}_{k}}{\omega_{k} - \alpha \lambda_{ot;\ min}}

where :math:`k \neq tv` is the index of any other Hessian
eigenvalue/eigenvector pair, and the subscript *ot* indicates that the
generalized EVP was set up using the "other" part of the Hessian, i.e.
:math:`k \in ot`. Transforming back from normal mode coordinates, we have:

.. math::
   :label: step_combine

   \mathbf{y} = \mathbf{w}_{tv} \tilde{y}_{tv} + \sum_{k \in ot} \mathbf{w}_k \tilde{y}_k

The parameter :math:`\alpha \geq 1` is solved for in order to ensure that the
P-RFO step is *restricted* to fall within the internal coordinate trust radius
(hence the prefix "RS"), following `Bofill et al. `_ If the unrestricted step
with :math:`\alpha = 1` already falls within the IC trust radius :math:`R`, it
is used as-is. Otherwise, the squared norm of the step :math:`|\mathbf{y}|^2`
and its derivative :math:`d|\mathbf{y}|^2/d\alpha` are used to optimize the
value of :math:`\alpha` iteratively until
:math:`|\mathbf{y}|^2(\alpha_{opt}) = R^2` within a tolerance of 0.001,
similar to what is done for energy minimization. The internal coordinate trust
radius is itself optimized numerically so that the Cartesian step size falls
within a trust radius that is controlled by the step history, the same as for
energy minimization.
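The partitioning described above can be sketched for the unrestricted case
(:math:`\alpha = 1`) in a few lines of NumPy. This is an illustration of the
P-RFO step construction, not geomeTRIC's implementation; the function name is
made up, and the restricted-step search over :math:`\alpha` is omitted.

```python
import numpy as np

def prfo_step(g, H):
    """Unrestricted (alpha = 1) P-RFO step: maximize along the lowest
    Hessian eigenvector, minimize along all others.  Illustrative sketch."""
    omega, W = np.linalg.eigh(H)  # ascending eigenvalues / normal modes
    gt = W.T @ g                  # gradient in the Hessian eigenbasis

    # 2x2 EVP for the transition vector subspace; pick the *highest*
    # eigenvalue since we maximize along this mode.
    A_tv = np.array([[0.0, gt[0]],
                     [gt[0], omega[0]]])
    lam_tv = np.linalg.eigvalsh(A_tv)[-1]
    y = np.empty_like(gt)
    y[0] = -gt[0] / (omega[0] - lam_tv)

    # RFO minimization in the remaining subspace; pick the *lowest*
    # eigenvalue of its augmented matrix.
    n = len(g)
    A_ot = np.zeros((n, n))
    A_ot[0, 1:] = gt[1:]
    A_ot[1:, 0] = gt[1:]
    A_ot[1:, 1:] = np.diag(omega[1:])
    lam_ot = np.linalg.eigvalsh(A_ot)[0]
    y[1:] = -gt[1:] / (omega[1:] - lam_ot)

    return W @ y  # transform back from normal mode coordinates
```

The eigenvalue shifts guarantee the denominators have the right signs, so the
step moves uphill along the transition vector and downhill along every other
mode, and it vanishes at a stationary point.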
Updating the Hessian matrix
"""""""""""""""""""""""""""

When TS optimization is specified, the Hessian matrix is updated between
optimization steps using `Bofill's formula `_, which is a linear combination
of the Murtagh-Sargent (also called symmetric rank-1) and Powell symmetric
Broyden updates:

.. math::
   :label: full_ts_update

   \begin{aligned} & \mathbf{H}_{n+1} = (1-\phi) \mathbf{H}_{n+1; MS} + \phi \mathbf{H}_{n+1; PSB} \\ & \mathbf{H}_{n+1; MS} = \mathbf{H}_n + \frac{\boldsymbol{\xi} \otimes \boldsymbol{\xi}}{\boldsymbol{\delta}\cdot\boldsymbol{\xi}} \\ & \mathbf{H}_{n+1; PSB} = \mathbf{H}_n - \frac{(\boldsymbol{\delta}\cdot\boldsymbol{\xi})(\boldsymbol{\delta}\otimes\boldsymbol{\delta})}{|\boldsymbol{\delta}|^4} + \frac{\boldsymbol{\delta}\otimes\boldsymbol{\xi}+\boldsymbol{\xi}\otimes\boldsymbol{\delta}}{|\boldsymbol{\delta}|^2} \end{aligned}

where

.. math::
   :label: ts_update_defs

   \begin{aligned} & \boldsymbol{\delta} \equiv \mathbf{q}_{n+1} - \mathbf{q}_n \\ & \boldsymbol{\xi} \equiv (\mathbf{g}_{n+1} - \mathbf{g}_{n}) - \mathbf{H}_n \cdot \boldsymbol{\delta} \\ & \phi = 1 - \frac{(\boldsymbol\delta \cdot \boldsymbol\xi)^2}{|\boldsymbol\delta|^2|\boldsymbol\xi|^2} \end{aligned}

i.e. :math:`\boldsymbol\delta` is the current optimization step,
:math:`\boldsymbol\xi` is the difference between the actual gradient change
and the *predicted* gradient change using the previous structure's Hessian,
and :math:`\phi` measures the alignment between the two vectors, being equal
to 1 when they are orthogonal and 0 when they are parallel. The PSB update is
mixed in when the two vectors are almost orthogonal, because the denominator
of the MS update approaches zero and the update becomes unstable. This
updating method is used in TS optimization because it does not force the
Hessian matrix to remain positive definite, as opposed to BFGS (which is used
in energy minimization). All of the Hessian updates are carried out in
internal coordinates.
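The update above can be written directly in NumPy. This is an illustrative
sketch of Eqs. above with made-up data, not geomeTRIC's code; a useful check
is that both the MS and PSB components (and hence their mixture) satisfy the
secant condition :math:`\mathbf{H}_{n+1} \boldsymbol\delta = \mathbf{g}_{n+1} - \mathbf{g}_n`.

```python
import numpy as np

def bofill_update(H, delta, dgrad):
    """Bofill Hessian update: phi-weighted mix of the Murtagh-Sargent (SR1)
    and Powell symmetric Broyden (PSB) updates.  delta is the step taken,
    dgrad the observed gradient change."""
    xi = dgrad - H @ delta            # actual minus predicted gradient change
    dx = delta @ xi
    d2 = delta @ delta
    H_ms = H + np.outer(xi, xi) / dx
    H_psb = (H
             - dx * np.outer(delta, delta) / d2**2
             + (np.outer(delta, xi) + np.outer(xi, delta)) / d2)
    phi = 1.0 - dx**2 / (d2 * (xi @ xi))  # 1 = orthogonal, 0 = parallel
    return (1.0 - phi) * H_ms + phi * H_psb

# Demonstration with arbitrary symmetric data.
rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
H0 = 0.5 * (M + M.T)
delta = rng.normal(size=5)
dgrad = rng.normal(size=5)
H1 = bofill_update(H0, delta, dgrad)
```

Note the sketch does not guard against the degenerate cases
:math:`\boldsymbol\delta \cdot \boldsymbol\xi = 0` or
:math:`|\boldsymbol\xi| = 0`, which a production implementation would need to
handle.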
Step size control
"""""""""""""""""

The initial and maximum trust radii for TS optimization are set to 0.01 and
0.03 by default (:math:`0.1 \times` the values used for energy minimization)
in order to ensure that the optimization step stays within the neighborhood of
validity of the Hessian. The default values are chosen to maximize job success
rates while keeping the number of cycles manageable, as increasing the trust
radius by a factor of 2-3 only decreases the number of cycles by 20-30% in
tests.

During the optimization the trust radius is adjusted based on the step
quality, i.e. a step is "good" if the actual energy change is close to the
predicted energy change. Energy changes that are more positive or more
negative than the prediction both detract from the step quality, as opposed to
energy minimization, in which energy changes more negative than predicted are
considered good. The quality factor is calculated as:

.. math::
   :label: ts_quality

   \begin{aligned} & Q = 1 - \left| \frac{\Delta E_{\mathrm{actual}}}{\Delta E_{\mathrm{pred}}} - 1\right| \\ & \textrm{where }\Delta E_{\mathrm{pred}} = \frac{1}{2} \boldsymbol \delta \cdot \mathbf{H}_n \cdot \boldsymbol \delta + \boldsymbol \delta \cdot \mathbf{g}_n \end{aligned}

The trust radius is adjusted based on the value of :math:`Q` as follows:

* :math:`Q \geq 0.75` : "Good" step; the trust radius is increased by a factor of :math:`\sqrt{2}`, but not above the maximum.
* :math:`0.75 > Q \geq 0.50` : "Okay" step; the trust radius is unchanged.
* :math:`0.5 > Q \geq 0.0` : "Poor" step; the trust radius is decreased by setting it to :math:`0.5 \times \mathrm{min}(trust\_rad, step\_size)`, but not below the minimum.
* :math:`Q < 0.0` : The step is rejected, in addition to decreasing the trust radius as above.

These thresholds are more conservative than in the case of energy minimization
due to the need to stay in the valid neighborhood of the Hessian matrix.
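The adjustment rules above can be sketched as a small function. This is an
illustration of the logic, not geomeTRIC's code; the function names and the
value of the ``tmin`` floor are assumptions (the text specifies only that a
minimum exists).

```python
import numpy as np

def step_quality(dE_actual, dE_pred):
    """Quality factor Q: 1 when the actual energy change matches the
    prediction, lower when it overshoots in either direction."""
    return 1.0 - abs(dE_actual / dE_pred - 1.0)

def adjust_trust(trust, step_size, Q, tmin=1e-4, tmax=0.03):
    """Trust-radius update for TS steps; tmin is an assumed floor.
    Returns the new trust radius and whether the step is rejected."""
    reject = Q < 0.0
    if Q >= 0.75:                                   # "good" step
        trust = min(trust * np.sqrt(2.0), tmax)
    elif Q < 0.5:                                   # "poor" or rejected step
        trust = max(0.5 * min(trust, step_size), tmin)
    return trust, reject                            # "okay": unchanged
```

For example, a poor step with trust radius 0.01 and actual step size 0.008
would reduce the trust radius to 0.004, while a rejected step additionally
discards the new geometry.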
.. _hessian_calculation:

Hessian calculations & vibrational analysis
-------------------------------------------

At present, geomeTRIC only supports the calculation of Hessians via finite
difference of the gradient; it cannot call the QC software to compute the
analytic Hessian. If you want to use an analytic Hessian from running the QC
software separately, save it to a text file and pass
``--hessian file:/path/to/file`` on the command line. The Hessian file must be
stored as a square matrix in NumPy-readable text format (not binary) with
dimension :math:`3 N_{\mathrm{atoms}} \times 3 N_{\mathrm{atoms}}`.

The gradient calculations may be parallelized by distributing the jobs to
remote "worker" nodes using the `Work Queue distributed computing library `_;
this can greatly reduce the wall time relative to performing the gradient
calculations serially. To enable this behavior, first ensure that the Work
Queue library and Python module are installed, then pass ``--port ####`` on
the command line, where ``####`` is a custom port number (I usually use a
large four-digit number, such as 7953, that is not commonly used by other
services). Then run the ``work_queue_worker`` program on the worker node,
providing the host name of the machine running ``geometric-optimize`` and the
port number. The worker node must have the QC software installed with the
environment variables properly set when ``work_queue_worker`` is executed; one
common approach is to write a batch script that executes workers on clusters
managed by Slurm or similar job schedulers. If successful, the worker will
establish a connection to the master and begin to accept gradient jobs.
Parallelization is achieved by running multiple workers on one or more nodes
(you can run workers locally too). The Hessian calculation and vibrational
analyses should give the same results as if you had requested them directly
from the quantum chemistry code.
After the vibrational analysis, the Gibbs free energy corrections are computed
using an ideal gas / rigid rotor / harmonic oscillator approximation
(imaginary frequency modes are ignored). The free energy calculation may be
customized by passing ``--thermo`` and providing the temperature and pressure.
Additionally, by passing ``--wigner`` one can obtain a desired number of
sample points from the Wigner phase space distribution.
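To illustrate just the harmonic-oscillator piece of these corrections, the
sketch below computes the vibrational contribution (zero-point energy plus the
thermal term) to the free energy from a list of wavenumbers. This is an
illustrative sketch, not geomeTRIC's implementation: the function name is made
up, imaginary modes are represented here as negative wavenumbers, and the
rotational and translational contributions are omitted.

```python
import numpy as np

H_PLANCK = 6.62607015e-34  # J s
K_B = 1.380649e-23         # J / K
C_LIGHT = 2.99792458e10    # cm / s

def harmonic_vibrational_free_energy(freqs_wavenumber, T):
    """Vibrational free energy contribution (J) in the harmonic oscillator
    approximation: ZPE plus kT * sum(ln(1 - exp(-h*c*nu / kT))).
    Imaginary modes (negative wavenumbers here) are discarded, mirroring
    the behavior described in the text."""
    nu = np.array([f for f in freqs_wavenumber if f > 0], dtype=float)
    hv = H_PLANCK * C_LIGHT * nu          # vibrational quanta h*c*nu in J
    zpe = 0.5 * np.sum(hv)                # zero-point energy
    if T == 0:
        return zpe
    thermal = K_B * T * np.sum(np.log(1.0 - np.exp(-hv / (K_B * T))))
    return zpe + thermal
```

At :math:`T = 0` only the zero-point energy survives, and the thermal term is
always negative, so raising the temperature lowers the vibrational free
energy.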