
Density Functional Theory

Lecture Note

Preamble

Density Functional Theory (DFT) is one of the most powerful theoretical frameworks for studying the electronic structure of atoms, molecules, and solids. It is widely used in condensed matter physics, quantum chemistry, and materials science to predict structural, electronic, magnetic, and optical properties of materials.

Course Curriculum

This course bridges the gap between rigorous quantum mechanical theory and state-of-the-art computational materials science, moving from fundamental equations to real-world software applications.

Part I: Theoretical Foundations of DFT

Starting from the quantum many-body problem, we detail the exact mathematical proofs behind the Hohenberg–Kohn theorems and the derivation of the effective single-particle Kohn–Sham equations. With the exact theory established, you will systematically unpack the approximations required for real-world materials. This includes navigating the hierarchy of exchange-correlation functionals (from LDA to hybrids), constructing basis sets and pseudopotentials (such as the projector augmented-wave, PAW, method), and treating magnetic ordering via spin-polarised DFT. Finally, we address the self-interaction error in strongly correlated \(d\)- and \(f\)-electron systems using the DFT+U (Hubbard) method.

Part II: Numerical Implementations & Practical DFT

Theoretical knowledge is only half the battle. The second half of the course opens the black box of industry-standard DFT codes like VASP and Quantum ESPRESSO. You will learn how to translate continuous mathematics into robust iterative algorithms by mastering Self-Consistent Field (SCF) Convergence—including how to diagnose charge sloshing in metals and apply advanced density mixing schemes (Pulay/DIIS, Broyden). We also cover rigorous Brillouin Zone Integration, detailing \(k\)-point sampling and the specific smearing methods (Methfessel-Paxton, tetrahedron) required to stabilise metallic Fermi surfaces.

By the end of this course, you will possess the practical expertise to run, troubleshoot, and analyze high-accuracy simulations for contemporary materials science research.



Prerequisites

Participants should be familiar with:

  • Basic Quantum Mechanics
  • Thermodynamics and Statistical Mechanics
  • Solid State Physics
  • Elementary linear algebra and differential equations

Course Objectives

By the end of this course, participants will:

  • Understand the theoretical foundations of Density Functional Theory
  • Learn the derivation and significance of the Hohenberg–Kohn theorems
  • Understand the Kohn–Sham formalism and its practical implementation
  • Become familiar with common exchange–correlation functionals
  • Learn how to perform and analyze DFT calculations
  • Explore advanced extensions such as DFT+U and time-dependent DFT

Target Audience

This course is designed for:

  • Graduate students in Physics, Chemistry, and Materials Science
  • Researchers entering the field of computational materials science
  • Students interested in electronic structure theory

Introduction

Density functional theory (DFT) has established itself as the primary tool for calculating the electronic structure of materials.

At its heart, the method minimizes the total energy using the variational principle: the ground-state energy of a system is a functional of the electron density, and minimizing this functional yields the equilibrium density.

In other words, DFT for electronic structure calculations treats the ground-state energy of the Schrödinger equation as a functional of the charge density.

Schrödinger’s Equation: Revisited for Many-Particle Systems

The Schrödinger equation is the governing equation for quantum mechanical systems:

\[ \left[-\frac{\hbar^2}{2m_e}\nabla^2+V(\mathbf{r})\right]\Psi(\mathbf{r})=E\Psi(\mathbf{r}) \]

The problem grows intractable when many particles are present. The many-body Hamiltonian for \(N\) nuclei and \(n\) electrons is:

\begin{align*} \hat{H} =&-\frac{\hbar^2}{2m_e}\sum_{i}^n\nabla^2_i+ \frac{1}{2}\sum_{i\neq j}\frac{e^2}{|\mathbf{r}_i-\mathbf{r}_j|} & \qquad \mbox{Electrons} \newline & -\sum_j^N\frac{\hbar^2}{2M_j}\nabla^2_j + \frac{1}{2}\sum_{i\neq j}\frac{Z_iZ_je^2}{|\mathbf{R}_i-\mathbf{R}_j|} & \mbox{Nuclei}\newline & -\sum_{i,j}\frac{Z_je^2}{|\mathbf{r}_i-\mathbf{R}_j|} & \mbox{Electron--Nucleus} \end{align*}

The time-independent Hamiltonian \(\hat{H}=H(\hat{p},\hat{q})\) is a function of the momentum and position operators \((\hat{p},\hat{q})\). Solving the eigenvalue problem becomes rapidly more difficult as the number of electrons grows.

Even for a simple atom like oxygen, with 8 electrons, and taking a coarse grid of just 10 points in each of the \(3\times 8 = 24\) spatial dimensions, we need \(10^{24}\) data points. Storing each value as a 64-bit (8-byte) real number, the wavefunction of a single oxygen atom would occupy \(8\times 10^{24}\) bytes \(\approx 8\times 10^{18}\) MB. (To appreciate how large this is: the age of the universe is roughly \(4\times 10^{17}\) seconds, so writing one megabyte every second since the Big Bang would have stored only a few per cent of it.)
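This scaling estimate is easy to reproduce in a few lines of Python (a sketch of the arithmetic above; the 10-point grid and 8-byte reals are the assumptions stated in the text, and a complex wavefunction would double the total):

```python
# Back-of-the-envelope storage cost of tabulating a many-electron
# wavefunction Psi(r_1, ..., r_n) on a naive real-space grid.

def wavefunction_storage_bytes(n_electrons, points_per_dim=10, bytes_per_value=8):
    """Bytes needed for one real value per grid point in 3*n dimensions."""
    dimensions = 3 * n_electrons          # each electron has 3 coordinates
    grid_points = points_per_dim ** dimensions
    return grid_points * bytes_per_value

oxygen = wavefunction_storage_bytes(8)    # 8 electrons -> 10**24 grid points
print(f"{oxygen:.1e} bytes")              # 8.0e+24 bytes
```

The exponential growth with electron number (each added electron multiplies the cost by \(10^3\)) is the "curse of dimensionality" that motivates working with the three-dimensional density instead.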

So we need systematic approximations to solve the problem.

Born–Oppenheimer Approximation

Because nuclei are at least \(\sim 2\times 10^3\) times heavier than electrons (the proton-to-electron mass ratio is about 1836), they move far more slowly under the same force. This motivates the Born–Oppenheimer (BO) approximation: we freeze the nuclear degrees of freedom and solve only for the electronic structure at fixed nuclear positions.

Validity condition. The BO approximation is justified when the electronic excitation energy \(\hbar\omega_{\rm el}\) greatly exceeds the nuclear vibrational energy \(\hbar\omega_{\rm nuc}\):

\[ \omega_{\rm el} \gg \omega_{\rm nuc}, \qquad \frac{\omega_{\rm nuc}}{\omega_{\rm el}} \sim \left(\frac{m_e}{M}\right)^{1/2} \approx 10^{-2}. \]

This ratio is small for all atoms, which is why BO works so broadly. It breaks down near conical intersections — geometries where two electronic potential energy surfaces become degenerate — leading to non-adiabatic dynamics that are important in photochemistry and certain charge-transfer processes. These are beyond the scope of ground-state DFT.
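A quick numerical check of the ratio for a few representative nuclei (the nuclear masses below, in units of the electron mass, are approximate assumed values for illustration):

```python
# Rough adiabaticity ratios (m_e / M)^(1/2) for a few nuclei.
# Masses are approximate, in units of the electron mass.
nuclear_mass = {"H": 1836.2, "C": 21894.0, "Si": 51196.0}

for symbol, M in nuclear_mass.items():
    ratio = (1.0 / M) ** 0.5
    print(f"{symbol}: (m_e/M)^(1/2) = {ratio:.4f}")
```

All ratios come out below \(3\times 10^{-2}\), consistent with the estimate above: the BO separation of time scales holds even for the lightest nucleus.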

Under this approximation:

  • The nuclear kinetic energy term \(-\sum_j^N\frac{\hbar^2}{2M_j}\nabla^2_j\) is dropped (the nuclei are frozen).
  • The nucleus–nucleus repulsion \(\frac{1}{2}\sum_{i\neq j}\frac{Z_iZ_je^2}{|\mathbf{R}_i-\mathbf{R}_j|}\) becomes a constant that shifts the total energy but does not affect the electronic wavefunction.
  • The electron–nucleus attraction \(-\sum_{i,j}\frac{Z_je^2}{|\mathbf{r}_i-\mathbf{R}_j|}\) depends only on the electronic coordinates \(\mathbf{r}_i\) (the nuclear positions \(\mathbf{R}_j\) enter as fixed parameters); we absorb it into the external potential \(V_{\rm ext}\).

The resulting electronic Hamiltonian is:

\begin{align*} \hat{H}_e =&-\frac{\hbar^2}{2m_e}\sum_{i=1}^n\nabla^2_i & \mbox{Kinetic energy of electrons}\newline &+ \frac{1}{2}\sum_{i\neq j}\frac{e^2}{|\mathbf{r}_i-\mathbf{r}_j|} &\mbox{Electron--electron repulsion}\newline & +\sum_{i=1}^n V_{\rm ext}(\mathbf{r}_i) & \mbox{Electron--nucleus attraction} \end{align*}

The electron–electron repulsion term is still a many-body interaction — handling it requires further approximation.

Mean-Field Approximations

Hartree Approximation

After invoking the Born–Oppenheimer approximation, the remaining challenge is the electron–electron repulsion \(\frac{1}{2}\sum_{i\neq j}\frac{1}{|\mathbf{r}_i-\mathbf{r}_j|}\). The simplest treatment sets this to zero (non-interacting electrons), but this gives grossly incorrect results for most properties.

A better strategy is to assume that each electron does not interact with individual others, but instead moves in the mean (average) field generated by all the other electrons combined. This is called the Hartree approximation.

In this picture, the electron–electron repulsion felt by electron \(i\) is replaced by the electrostatic potential of the continuous charge density \(\rho(\mathbf{r}')\) of all other electrons:

\[ \frac{1}{2}\sum_{i\neq j}\frac{e^2}{|\mathbf{r}_i-\mathbf{r}_j|} \approx \sum_i V_{\rm H}(\mathbf{r}_i), \qquad V_{\rm H}(\mathbf{r}_i) = \int \frac{e^2\,\rho(\mathbf{r}')}{|\mathbf{r}_i-\mathbf{r}'|}\,d\mathbf{r}', \qquad E_{\rm H}[\rho] = \frac{1}{2}\iint\frac{e^2\,\rho(\mathbf{r})\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' \]

where \(\rho(\mathbf{r}') = \sum_j |\phi_j(\mathbf{r}')|^2\) is the electron number density. The quantity \(E_{\rm H}[\rho]\) is the Hartree energy — the classical electrostatic self-energy of the charge distribution. The corresponding single-electron potential \(V_{\rm H}(\mathbf{r}_i)\) enters the effective single-particle Hamiltonian.

The full Hartree Hamiltonian then reads:

\[ \hat{H}_{\rm Hartree} = -\frac{\hbar^2}{2m_e}\sum_{i=1}^n\nabla^2_i + \sum_i V_{\rm H}(\mathbf{r}_i) + \sum_i V_{\rm ext}(\mathbf{r}_i) \]
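As a numerical sanity check, \(E_{\rm H}[\rho]\) can be evaluated on a radial grid for a density whose electrostatic self-energy is known in closed form. A minimal sketch (assuming a normalised spherical Gaussian density and atomic units, \(e=1\); the grid parameters are arbitrary choices), for which the analytic answer is \(E_{\rm H} = 1/(2\sqrt{\pi}\,\sigma)\):

```python
import numpy as np

# Hartree energy of a normalised Gaussian density, computed on a radial grid
# and compared against the analytic result 1/(2*sqrt(pi)*sigma).
sigma = 1.0
r = np.linspace(1e-6, 12.0, 20001)             # radial grid
dr = r[1] - r[0]
rho = (2*np.pi*sigma**2)**-1.5 * np.exp(-r**2 / (2*sigma**2))

# Hartree potential of a spherical density via Gauss's law:
# V_H(r) = Q(<r)/r + 4*pi * integral_r^inf rho(s) s ds
q_inside = 4*np.pi * np.cumsum(rho * r**2) * dr
outer = 4*np.pi * np.cumsum((rho * r)[::-1])[::-1] * dr
v_h = q_inside / r + outer

# E_H = (1/2) * integral rho(r) V_H(r) over all space
e_h = 0.5 * np.sum(4*np.pi * r**2 * rho * v_h) * dr
print(e_h, 1/(2*np.sqrt(np.pi)*sigma))          # both ~ 0.282
```

The agreement (to roughly the grid-spacing accuracy of the simple Riemann sums used here) illustrates that \(E_{\rm H}\) is nothing more than classical electrostatics of the charge cloud.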

Hartree–Fock Approximation

The Hartree approximation is incomplete because it ignores the fact that electrons are fermions: identical spin-\( \frac{1}{2} \) particles that obey the Pauli exclusion principle, which requires the many-body wavefunction to be antisymmetric under the exchange of any two electrons.

Fock incorporated this antisymmetry into the Hartree scheme, yielding the Hartree–Fock (HF) approximation. The wavefunction is constructed as a Slater determinant, which automatically satisfies antisymmetry:

\[ \Psi(r_1,r_2\cdots r_N)= \frac{1}{\sqrt{N!}}\begin{vmatrix} \psi_1(r_1) &\cdots &\psi_1(r_N)\\ \vdots & \ddots & \vdots\\ \psi_N(r_1) &\cdots &\psi_N(r_N) \end{vmatrix} \]

Swapping any two columns (i.e., exchanging two electrons) changes the sign of the determinant, ensuring the required antisymmetry. The single-particle orbitals \(\psi_i\) are then determined by the variational condition:

\[ \frac{\delta}{\delta \psi_j^*(r)}\left[\langle\Psi|\hat{H}|\Psi\rangle-\sum_i^N\epsilon_i\int d^3r |\psi_i(r)|^2\right]=0 \]

This procedure introduces an exchange energy term — a purely quantum mechanical contribution absent in the Hartree scheme — which lowers the total energy by keeping electrons of the same spin spatially separated. The correlation energy (everything beyond HF) remains unaccounted for and is the key quantity that DFT’s exchange-correlation functional \(E_{\rm xc}\) aims to capture.

Example: Hartree–Fock Calculation for the Helium Atom

The helium atom (two electrons, nucleus with charge \(Z=2\)) is the simplest system where many-body effects matter, making it an ideal test case.

Hamiltonian

\[ \hat{H} = -\frac{\hbar^2}{2m_e} (\nabla_1^2 + \nabla_2^2) - \frac{2e^2}{r_1} - \frac{2e^2}{r_2} + \frac{e^2}{|\mathbf{r}_1 - \mathbf{r}_2|} \]

Hartree approximation

Each electron feels the nuclear attraction plus the average repulsion from the other electron. The total wavefunction is a simple product of single-electron orbitals: \(\Psi(r_1, r_2) = \phi(r_1)\phi(r_2)\), which is not antisymmetric.

Hartree–Fock improvement

The wavefunction is antisymmetrized via a Slater determinant:

\[ \Psi_{HF}(r_1, r_2) = \frac{1}{\sqrt{2}} [\phi_a(r_1)\phi_b(r_2) - \phi_a(r_2)\phi_b(r_1)] \]

For the ground state, both electrons occupy the \(1s\) orbital with opposite spins.

Physical consequences

  • The Hartree method overestimates the energy because it ignores exchange.
  • The Hartree–Fock method lowers the energy by including exchange, but still neglects electron correlation (all effects beyond a single Slater determinant).

Numerical comparison

| Method | Ground-state energy (eV) |
|---|---|
| Hartree | \(> -77.5\) |
| Hartree–Fock | \(-77.5\) |
| Experiment | \(-78.98\) |

The gap between Hartree–Fock (\(-77.5\) eV) and experiment (\(-78.98\) eV) is called the correlation energy. Though small in absolute terms (\(\approx 1.5\) eV), it is chemically significant — chemical bond energies are typically of this magnitude. Capturing this missing correlation is the central motivation for DFT and its exchange-correlation functionals.

Hohenberg-Kohn Theorem

The Hohenberg–Kohn (HK) theorems, proposed in 1964 by Pierre Hohenberg and Walter Kohn, are the fundamental theoretical pillars of Density Functional Theory (DFT). They establish that the ground-state properties of a many-electron system are uniquely determined by its electron density, i.e. \(\rho(r)\), rather than by the complex many-body wavefunction \(\Psi\).

This is a remarkable result: it implies that a function of three spatial variables \(\rho(r)\) — rather than a wavefunction in \(3N\) dimensions — contains all the information needed to determine the ground state of an \(N\)-electron system. The HK theorems provide the rigorous justification for replacing the many-body Schrödinger equation with a variational problem over electron densities, forming the conceptual basis for all practical implementations of DFT.

Assumptions

We consider a system of \(N\) interacting electrons subject to an external scalar potential \(V_{\rm ext}(r)\), at zero temperature, in its ground state, with no time-dependent fields. The many-body Hamiltonian is:

\[ \begin{align} \hat{H} =& \hat{T}+\hat{V}_{ee}+\hat{V}_{ext}\\ \hat{H}\Psi =& E\Psi \end{align} \]

where

  • \(\hat{T}\) is the kinetic energy operator of the electrons,
  • \(\hat{V}_{ee}\) is the Coulomb electron–electron repulsion operator,
  • \(\hat{V}_{ext} = \sum_{i=1}^N V_{\rm ext}(r_i) \) is the external potential due to the nuclei or any applied field — it is the same for all systems considered in the proofs below.

The electron density \(\rho(r)\) is defined as the probability of finding any one electron at position \(r\), integrated over all other electron coordinates:

\begin{equation} \rho(r)=N\int |\Psi(r_1,r_2\cdots r_N)|^2 \,dr_2\,dr_3\cdots dr_N \label{eq:charge-wf} \end{equation}

From \eqref{eq:charge-wf}, note that \(\int \rho(r)\,dr = N\).

The First Hohenberg–Kohn Theorem

Theorem: For any system of interacting particles in an external potential \(V_{\rm ext}(r)\), the external potential is uniquely determined — up to a constant — by the ground-state electron density \(\rho_0(r)\).

Because \(V_{\rm ext}(r)\) uniquely determines \(\hat{H}\) (and hence all eigenstates and eigenvalues), the ground-state density \(\rho_0(r)\) alone determines everything about the system. The wavefunction \(\Psi_0\) is therefore a functional of \(\rho_0\), written \(\Psi_0 = \Psi[\rho_0]\).

Proof (by contradiction)

Assume the contrary: suppose there exist two different external potentials \(V_{\rm ext}(\mathbf{r})\) and \(V'_{\rm ext}(\mathbf{r})\), differing by more than a constant,

\begin{equation} V_{\rm ext}(\mathbf{r}) \neq V'_{\rm ext}(\mathbf{r}) + \text{const.}, \end{equation}

that nevertheless yield the same ground-state density: \(\rho(\mathbf{r})=\rho'(\mathbf{r})\).

Let the two Hamiltonians and their ground states be:

\[ \begin{aligned} \hat{H} & = \hat{T} + \hat{V}_{\rm ee} + \hat{V}_{\rm ext}, & \hat{H}|\Psi\rangle = E_0|\Psi\rangle, \\ \hat{H}' & = \hat{T} + \hat{V}_{\rm ee} + \hat{V}'_{\rm ext}, & \hat{H}'|\Psi'\rangle = E'_0|\Psi'\rangle. \end{aligned} \]

We apply the Rayleigh–Ritz variational principle, which states that the true ground-state energy minimises the energy expectation value: any trial state \(|\tilde{\Psi}\rangle \neq |\Psi_0\rangle\) satisfies \(\langle\tilde{\Psi}|\hat{H}|\tilde{\Psi}\rangle > E_0\).

Using \(\Psi'\) as a trial state for \(\hat{H}\) (the inequality is strict because \(\Psi'\), the ground state of a different potential, cannot equal \(\Psi\) when the ground states are non-degenerate):

\begin{align} E_0 <& \langle \Psi' | \hat{H} | \Psi' \rangle = \langle \Psi' | \hat{H}' | \Psi' \rangle + \langle \Psi' | \hat{H} - \hat{H}' | \Psi' \rangle \notag\\ =& \; E'_0 + \int \left[ V_{\rm ext}(\mathbf{r}) - V'_{\rm ext}(\mathbf{r}) \right] \rho(\mathbf{r}) \, d\mathbf{r}. \label{eq:ineq1} \end{align}

Symmetrically, using \(\Psi\) as a trial state for \(\hat{H}'\):

\begin{equation} E'_0 < \langle \Psi | \hat{H}' | \Psi \rangle = E_0 + \int \left[ V'_{\rm ext}(\mathbf{r}) - V_{\rm ext}(\mathbf{r}) \right] \rho(\mathbf{r}) \, d\mathbf{r}. \label{eq:ineq2} \end{equation}

Adding the two inequalities \eqref{eq:ineq1} and \eqref{eq:ineq2}:

\[ \begin{align} E_0 + E'_0 & < E'_0 + E_0 + \int \underbrace{\left[ V_{\rm ext} - V'_{\rm ext} + V'_{\rm ext} - V_{\rm ext} \right]}_{=\,0} \rho(\mathbf{r}) \, d\mathbf{r}, \\ E_0 + E'_0 & < E_0 + E'_0. \end{align} \]

This is a contradiction. Our assumption must therefore be false: no two different external potentials (differing by more than a constant) can share the same ground-state density. The map \(\rho_0(r) \mapsto V_{\rm ext}(r)\) is therefore one-to-one (up to a constant). \(\blacksquare\)

Consequence: the universal energy functional

Since \(\rho_0 \to V_{\rm ext} \to \hat{H} \to \Psi_0\), the ground-state energy can be written as a functional of the density:

\[ \begin{align*} E[\rho] =& F[\rho]+\int V_{\rm ext}(r)\rho(r)\,dr \\ F[\rho] =& \langle\Psi[\rho]|\, \hat{T}+\hat{V}_{ee}\,| \Psi[\rho]\rangle \end{align*} \]

The quantity \(F[\rho]\) is called the universal functional of the density. It is “universal” because it depends only on the electron density — not on the specific external potential — making it, in principle, applicable to any system of interacting electrons.


The Second Hohenberg–Kohn Theorem

Theorem: The true ground-state density \(\rho_0(\mathbf{r})\) minimises the total energy functional \(E[\rho]\). That is, for any trial density \(\tilde{\rho}(\mathbf{r})\) satisfying

\begin{equation} \tilde{\rho}(\mathbf{r}) \geq 0, \quad \int \tilde{\rho}(\mathbf{r}) \, d\mathbf{r} = N, \end{equation}

we have

\begin{equation} E_0 \leq E[\tilde{\rho}], \end{equation}

with equality if and only if \(\tilde{\rho}(\mathbf{r}) = \rho_0(\mathbf{r})\).

The functional is:

\begin{equation} E[\rho] = F[\rho] + \int V_{\rm ext}(\mathbf{r}) \rho(\mathbf{r}) \, d\mathbf{r}, \end{equation}

where \(F[\rho]\) is the universal functional defined above.

Proof sketch: By the first HK theorem, any trial density \(\tilde{\rho} \neq \rho_0\) corresponds to some other external potential and thus to some other wavefunction \(\tilde{\Psi} \neq \Psi_0\). By the Rayleigh–Ritz principle applied to \(\hat{H}\), \(\langle\tilde{\Psi}|\hat{H}|\tilde{\Psi}\rangle \geq E_0\), which translates directly to \(E[\tilde{\rho}] \geq E_0\). \(\blacksquare\)

The second theorem transforms the problem of solving the many-body Schrödinger equation into a variational problem over electron densities — a function in three dimensions rather than \(3N\) dimensions. This is the promise of DFT.


Outlook

While the HK theorems establish that the ground-state density contains all physical information and that a universal energy functional \(E[\rho]\) exists, they do not provide an explicit form for the universal functional \(F[\rho]\). In particular, the kinetic energy \(T[\rho]\) and the electron–electron interaction \(V_{ee}[\rho]\) are unknown functionals of \(\rho\).

A note on \(v\)-representability. The HK proof assumes that the trial density \(\tilde{\rho}\) is \(v\)-representable — that it is the ground-state density of some external potential \(\tilde{V}_{\rm ext}\). Not all non-negative, normalised densities satisfy this condition. Lieb (1983) resolved this by reformulating DFT via a Legendre transform, defining the universal functional as:

\[ F[\rho] = \sup_{V_{\rm ext}} \left( E[V_{\rm ext}] - \int V_{\rm ext}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r} \right), \]

which is well-defined for all non-negative \(\rho\) with \(\int\rho d\mathbf{r} = N\) — a larger class called ensemble \(v\)-representable densities. For the Kohn–Sham scheme specifically, the relevant condition is non-interacting \(v\)-representability: the ground-state density of the interacting system must also be the ground-state density of some non-interacting system in an effective potential. This is assumed to hold throughout the KS construction (Chapter 3) and is believed to be satisfied for all physically relevant ground-state densities, though a general proof remains an open problem.

Computing \(F[\rho]\) accurately is the central challenge of DFT in practice. The Kohn–Sham equations, developed in the next chapter, provide a practical path forward by introducing a fictitious system of non-interacting electrons designed to reproduce the exact ground-state density — sidestepping the need to evaluate \(F[\rho]\) directly.

Kohn-Sham Equations

The Hohenberg–Kohn theorems establish that the ground-state density \(\rho_0(\mathbf{r})\) uniquely determines all ground-state properties, and that the total energy \(E[\rho]\) is minimised by \(\rho_0\). However, they say nothing about how to compute the universal functional \(F[\rho]\), and in particular how to evaluate the kinetic energy \(T[\rho]\) as a functional of the density alone. Attempts to write \(T\) purely in terms of \(\rho\) (as in the Thomas–Fermi model) lead to poor accuracy.

The Kohn–Sham (KS) formalism, introduced by Walter Kohn and Lu Jeu Sham in 1965, provides an elegant and highly accurate solution: introduce a fictitious system of non-interacting electrons that, by construction, reproduces the exact ground-state density of the true interacting system. This maps the intractable many-body problem onto a set of effective single-particle equations — the Kohn–Sham equations — which are computationally feasible while remaining formally exact.

Decomposition of the Energy Functional

The key idea is to split the unknown universal functional \(F[\rho]\) into parts that can be handled accurately and a remainder that must be approximated:

\begin{equation} E[\rho] = T_s[\rho] + E_{\mathrm{H}}[\rho] + E_{\mathrm{xc}}[\rho] + \int V_{\mathrm{ext}}(\mathbf{r})\, \rho(\mathbf{r}) \, d\mathbf{r}, \label{eq:KS-energy} \end{equation}

where each term has a specific physical meaning:

  • \(T_s[\rho]\) is the kinetic energy of a fictitious non-interacting system with the same density \(\rho(\mathbf{r})\) as the real interacting system. Unlike the full kinetic energy \(T[\rho]\), this quantity can be computed exactly from the single-particle orbitals (see below).

  • \(E_{\mathrm{H}}[\rho]\) is the classical Hartree energy — the electrostatic self-energy of the electron charge distribution, treating it as a classical continuous fluid:

\begin{equation} E_{\mathrm{H}}[\rho] = \frac{1}{2} \iint \frac{\rho(\mathbf{r})\, \rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, d\mathbf{r} \, d\mathbf{r}'. \end{equation}

This is the dominant part of the electron–electron repulsion and is treated exactly.

  • \(E_{\mathrm{xc}}[\rho]\) is the exchange-correlation (XC) energy. It collects everything that is missing from the above terms:
\begin{equation} E_{\mathrm{xc}}[\rho] = \underbrace{(T[\rho] - T_s[\rho])}_{\text{kinetic correlation}} + \underbrace{(V_{ee}[\rho] - E_{\mathrm{H}}[\rho])}_{\text{exchange + correlation}}. \end{equation}

This includes the quantum exchange energy (from the antisymmetry of the wavefunction, as seen in the Hartree–Fock discussion in Chapter 1), all Coulomb correlation effects beyond the Hartree level, and the correction to the kinetic energy from the interacting nature of the true system. \(E_{\mathrm{xc}}\) is the only term that must be approximated; its exact form is unknown.

  • \(\int V_{\mathrm{ext}}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r}\) is the interaction with the external potential (the nuclear attraction and any applied fields), treated exactly.

The Kohn–Sham ansatz reduces the DFT problem to finding a good approximation for \(E_{\mathrm{xc}}[\rho]\) — a functional of three-dimensional \(\rho\) alone — rather than solving the full \(3N\)-dimensional Schrödinger equation.

The Kohn–Sham Ansatz

The central assumption of the Kohn–Sham approach is:

There exists a system of non-interacting electrons — the KS reference system — whose ground-state density is identical to that of the true interacting system.

Non-interacting \(v\)-representability. This assumption is not trivially guaranteed. It requires that the interacting ground-state density \(\rho_0(\mathbf{r})\) can be reproduced as the ground-state density of some non-interacting system moving in a local effective potential \(V_{\rm eff}(\mathbf{r})\). Densities for which this holds are called non-interacting \(v\)-representable. While a general proof is lacking, no physically relevant counterexample has been found, and the assumption is accepted as holding for all practical ground-state densities encountered in electronic structure calculations. The Kohn–Sham equations derived below are exact under this assumption.

The density is represented in terms of Kohn–Sham orbitals \(\{\phi_i(\mathbf{r})\}\):

\begin{equation} \rho(\mathbf{r}) = \sum_{i=1}^{N} |\phi_i(\mathbf{r})|^2. \label{eq:KS-density} \end{equation}

The kinetic energy of the non-interacting reference system is then computed exactly from these orbitals:

\begin{equation} T_s[\rho] = -\frac{1}{2}\sum_{i=1}^{N} \langle \phi_i | \nabla^2 | \phi_i \rangle = -\frac{1}{2}\sum_{i=1}^{N} \int \phi_i^*(\mathbf{r})\,\nabla^2\phi_i(\mathbf{r})\,d\mathbf{r}. \end{equation}

This is the key advantage over orbital-free approaches like Thomas–Fermi: by working with orbitals we recover the exact non-interacting kinetic energy at the cost of introducing \(N\) single-particle functions.
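As a sanity check, \(T_s\) can be evaluated numerically from any known set of orbitals. A sketch for particle-in-a-box orbitals — a toy non-interacting system chosen here because the exact answer, \(\sum_{n=1}^{N} n^2\pi^2/2L^2\) in atomic units, is known (the grid resolution is an arbitrary choice):

```python
import numpy as np

# Evaluate T_s = -1/2 * sum_i <phi_i|nabla^2|phi_i> on a grid for N
# particle-in-a-box orbitals phi_n(x) = sqrt(2/L) sin(n pi x / L).
L, N = 1.0, 3
x = np.linspace(0.0, L, 2001)
dx = x[1] - x[0]

t_s = 0.0
for n in range(1, N + 1):
    phi = np.sqrt(2.0 / L) * np.sin(n * np.pi * x / L)
    lap = np.gradient(np.gradient(phi, dx), dx)   # finite-difference Laplacian
    t_s += -0.5 * np.sum(phi * lap) * dx          # -1/2 <phi|phi''>

exact = sum(n**2 * np.pi**2 / (2 * L**2) for n in range(1, N + 1))
print(t_s, exact)
```

The grid value reproduces the analytic sum to well under a percent, illustrating why orbital-based kinetic energies are so much more reliable than explicit density functionals of \(T\).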

Derivation of the Kohn–Sham Equations

We seek the set of orbitals \(\{\phi_i\}\) that minimises the total KS energy functional \eqref{eq:KS-energy}, subject to the constraint that the orbitals remain orthonormal:

\begin{equation} \langle \phi_i | \phi_j \rangle = \int \phi_i^*(\mathbf{r})\phi_j(\mathbf{r})\,d\mathbf{r} = \delta_{ij}. \end{equation}

This is a constrained minimisation problem, which we solve using the method of Lagrange multipliers. We construct the Lagrangian:

\begin{equation} \mathcal{L}[\{\phi_i\}] = E[\rho] - \sum_{i,j} \epsilon_{ij} \left( \langle \phi_i | \phi_j \rangle - \delta_{ij} \right), \end{equation}

where \(\epsilon_{ij}\) are the Lagrange multipliers enforcing orthonormality. Taking the functional derivative with respect to \(\phi_i^*(\mathbf{r})\) and setting it to zero yields:

\begin{equation} \frac{\delta E[\rho]}{\delta \phi_i^*(\mathbf{r})} = \sum_j \epsilon_{ij} \phi_j(\mathbf{r}). \end{equation}

Applying the chain rule to the left-hand side:

\begin{equation} \frac{\delta E[\rho]}{\delta \phi_i^*(\mathbf{r})} = \frac{\delta T_s}{\delta \phi_i^*} + \int \frac{\delta (E_{\mathrm{H}} + E_{\mathrm{xc}} + E_{\mathrm{ext}})}{\delta \rho(\mathbf{r}')} \frac{\delta \rho(\mathbf{r}')}{\delta \phi_i^*(\mathbf{r})} \, d\mathbf{r}'. \end{equation}

Evaluating the individual derivatives:

  1. \(\frac{\delta T_s}{\delta \phi_i^*(\mathbf{r})} = -\frac{1}{2}\nabla^2 \phi_i(\mathbf{r})\)
  2. \(\frac{\delta \rho(\mathbf{r}')}{\delta \phi_i^*(\mathbf{r})} = \delta(\mathbf{r} - \mathbf{r}')\phi_i(\mathbf{r}')\)
  3. \(\frac{\delta E_{\mathrm{ext}}}{\delta \rho(\mathbf{r})} = V_{\mathrm{ext}}(\mathbf{r})\)
  4. \(\frac{\delta E_{\mathrm{H}}}{\delta \rho(\mathbf{r})} = \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}' \equiv V_{\mathrm{H}}(\mathbf{r})\)
  5. \(\frac{\delta E_{\mathrm{xc}}}{\delta \rho(\mathbf{r})} \equiv V_{\mathrm{xc}}(\mathbf{r})\)

This defines the Kohn–Sham effective potential:

\begin{equation} V_{\mathrm{eff}}(\mathbf{r}) = V_{\mathrm{ext}}(\mathbf{r}) + V_{\mathrm{H}}(\mathbf{r}) + V_{\mathrm{xc}}(\mathbf{r}). \end{equation}

The minimisation condition becomes:

\begin{equation} \left[ -\frac{1}{2}\nabla^2 + V_{\mathrm{eff}}(\mathbf{r}) \right] \phi_i(\mathbf{r}) = \sum_j \epsilon_{ij} \phi_j(\mathbf{r}). \end{equation}

Because the Hamiltonian \(\hat{h}_{\mathrm{KS}} = -\frac{1}{2}\nabla^2 + V_{\mathrm{eff}}\) is Hermitian, the matrix of Lagrange multipliers \(\epsilon_{ij}\) is also Hermitian and can be diagonalised by a unitary transformation of the orbitals. This transformation leaves the total density \(\rho(\mathbf{r})\) unchanged. In this diagonal representation, we obtain the canonical Kohn–Sham equations:

\begin{equation} \boxed{ \left[ -\frac{1}{2}\nabla^2 + V_{\mathrm{eff}}(\mathbf{r}) \right] \phi_i(\mathbf{r}) = \epsilon_i \phi_i(\mathbf{r}) } \label{eq:KS-eqn} \end{equation}

These are single-particle Schrödinger-like equations. The eigenvalues \(\epsilon_i\) are the Kohn–Sham orbital energies.

The Self-Consistent Field (SCF) Cycle

The Kohn–Sham equations are non-linear: the effective potential \(V_{\mathrm{eff}}(\mathbf{r})\) depends on \(V_{\mathrm{H}}\) and \(V_{\mathrm{xc}}\), which in turn depend on the density \(\rho(\mathbf{r})\), which is computed from the orbitals \(\phi_i(\mathbf{r})\) that are the solutions to the KS equations themselves.

Therefore, they must be solved iteratively using the Self-Consistent Field (SCF) method:

  1. Initial guess: Provide a starting guess for the electron density \(\rho^{(0)}(\mathbf{r})\) (often a superposition of atomic densities).
  2. Construct potential: Calculate \(V_{\mathrm{H}}[\rho]\) and \(V_{\mathrm{xc}}[\rho]\), and form \(V_{\mathrm{eff}}(\mathbf{r})\).
  3. Solve KS equations: Diagonalise the KS Hamiltonian to find the new orbitals \(\phi_i^{(1)}(\mathbf{r})\).
  4. Calculate new density: Form the new density \(\rho^{(1)}(\mathbf{r}) = \sum_i |\phi_i^{(1)}|^2\).
  5. Check convergence: If \(|\rho^{(1)} - \rho^{(0)}| < \text{tolerance}\), stop.
  6. Mixing: If not converged, generate a new input density by mixing the old and new densities (e.g. using Broyden or Pulay mixing) to prevent numerical oscillations, and return to Step 2.
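The six steps above can be sketched in a few dozen lines. The following is a minimal illustrative loop for a 1D toy system, not a production DFT code: the harmonic external potential, the soft-Coulomb interaction, the grid, and the mixing weight are all arbitrary choices, the two "electrons" are spinless, and exchange–correlation is omitted entirely:

```python
import numpy as np

# Toy SCF loop: 2 spinless particles in a 1D harmonic trap with a
# soft-Coulomb Hartree term and no XC functional.
n_grid, n_elec, alpha = 201, 2, 0.3             # alpha: linear-mixing weight
x = np.linspace(-8.0, 8.0, n_grid)
dx = x[1] - x[0]
v_ext = 0.5 * x**2                              # external potential
soft = 1.0 / np.sqrt((x[:, None] - x[None, :])**2 + 1.0)  # softened 1/|x-x'|

# Kinetic operator: second-order finite-difference Laplacian
lap = (np.diag(np.ones(n_grid - 1), -1) - 2.0 * np.eye(n_grid)
       + np.diag(np.ones(n_grid - 1), 1)) / dx**2
t_op = -0.5 * lap

rho = np.full(n_grid, n_elec / (n_grid * dx))   # 1. initial density guess
for it in range(500):
    v_h = soft @ rho * dx                       # 2. build V_H (no XC here)
    eps, phi = np.linalg.eigh(t_op + np.diag(v_ext + v_h))  # 3. solve KS eqs
    phi /= np.sqrt(dx)                          #    normalise on the grid
    rho_new = np.sum(phi[:, :n_elec]**2, axis=1)  # 4. new density
    delta = np.sum(np.abs(rho_new - rho)) * dx  # 5. convergence measure
    if delta < 1e-8:
        break
    rho = (1 - alpha) * rho + alpha * rho_new   # 6. linear density mixing

print(f"delta = {delta:.1e}, electrons = {np.sum(rho) * dx:.4f}")
```

Even this toy shows the essential behaviour: with no mixing (\(\alpha = 1\)) the iteration can oscillate, while a modest mixing weight damps the update and drives \(\delta\) down geometrically. Production codes replace the linear mixing here with Broyden or Pulay schemes.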

Physical Meaning of Kohn–Sham Quantities

It is crucial to understand what the KS framework does and does not claim to represent physically.

The Total Energy

The total ground-state energy is not simply the sum of the KS orbital energies. Summing the eigenvalues gives:

\begin{equation} \sum_i \epsilon_i = \sum_i \langle \phi_i | -\frac{1}{2}\nabla^2 + V_{\mathrm{eff}} | \phi_i \rangle = T_s[\rho] + \int V_{\mathrm{eff}}(\mathbf{r})\rho(\mathbf{r})\,d\mathbf{r}. \end{equation}

Substituting the definition of \(V_{\mathrm{eff}}\):

\begin{equation} \sum_i \epsilon_i = T_s[\rho] + \int V_{\mathrm{ext}}\rho\,d\mathbf{r} + \int V_{\mathrm{H}}\rho\,d\mathbf{r} + \int V_{\mathrm{xc}}\rho\,d\mathbf{r}. \end{equation}

Comparing this to the exact total energy \eqref{eq:KS-energy}, we see that the Hartree and XC interactions have been double-counted in the eigenvalue sum. The correct total energy must be reconstructed by subtracting the double-counting terms:

\begin{equation} E[\rho] = \sum_i \epsilon_i - E_{\mathrm{H}}[\rho] - \int V_{\mathrm{xc}}(\mathbf{r})\rho(\mathbf{r})\,d\mathbf{r} + E_{\mathrm{xc}}[\rho]. \end{equation}

The KS Orbitals and Eigenvalues

In exact DFT, the fictitious KS orbitals \(\phi_i\) and their energies \(\epsilon_i\) have no strict physical meaning, with one exception: rigorous results (Janak's theorem and the ionisation-potential theorem) show that the highest occupied KS eigenvalue equals the negative of the first ionisation energy of the exact many-body system:

\begin{equation} \epsilon_{\mathrm{HOMO}}^{\mathrm{exact}} = -I. \end{equation}

The other eigenvalues \(\epsilon_i\) do not formally correspond to electron removal or addition energies (quasiparticle excitations). The KS “band gap” (\(\epsilon_{\mathrm{LUMO}} - \epsilon_{\mathrm{HOMO}}\)) systematically underestimates the true fundamental gap, even if the exact XC functional were used. This is due to the derivative discontinuity of the exact XC functional with respect to particle number.

Despite this, KS orbitals and eigenvalues are widely (and somewhat informally) used to interpret band structures, density of states, and orbital energies, and often agree qualitatively with experiment, but this agreement is not guaranteed by the theory.

Formally correct excitation spectra require going beyond ground-state DFT, e.g. via Time-Dependent DFT (TDDFT) or many-body perturbation theory (GW approximation).

Summary

The Kohn–Sham scheme reduces the interacting many-body problem to a set of effective single-particle equations \eqref{eq:KS-eqn} that can be solved efficiently on modern computers. The key steps and ideas are:

| Concept | Role |
|---|---|
| Non-interacting reference system | Enables exact computation of \(T_s[\rho]\) via orbitals |
| \(V_{\mathrm{H}}\) | Classical mean-field Coulomb repulsion, computed exactly |
| \(V_{\mathrm{xc}}\) | All many-body quantum effects, must be approximated |
| SCF loop | Resolves the self-consistency between density and potential |
| XC approximation (LDA, GGA, …) | The only uncontrolled approximation in the KS framework |

The formal exactness of the KS framework — all errors traceable to a single approximated term — combined with its single-particle structure makes DFT the workhorse of electronic structure theory. In the chapters that follow, we will examine how this framework is implemented in practice: basis sets, pseudopotentials, \(k\)-point sampling, and the computational details of solving the KS equations for real materials.

Exchange-Correlation Functionals

In the Kohn–Sham framework, all the complexity of many-body quantum mechanics is concentrated in a single term: the exchange-correlation energy \(E_{\rm xc}[\rho]\). If the exact \(E_{\rm xc}\) were known, DFT would be formally exact. In practice it must be approximated, and six decades of development have produced a rich hierarchy of approximations.

Rather than presenting this hierarchy as a list of progressively better recipes, this chapter derives it from two systematic expansions, each anchored at a different exactly-known limit:

  1. A spatial gradient expansion around the uniform electron gas (UEG) — the only interacting electron system for which \(E_{\rm xc}\) is known exactly. Expanding in powers of the dimensionless density gradient generates the semi-local hierarchy: LDA → GGA → meta-GGA.

  2. A coupling-constant expansion around the non-interacting KS limit — the point at which the exact exchange is known analytically. Expanding in powers of the interaction strength generates the non-local hierarchy: GGA → hybrid → double-hybrid.

These two series are orthogonal: the first improves the local description of the XC hole shape; the second improves its non-local content. Understanding their structure turns the choice of functional from a matter of recipe-following into a matter of physics.

Formal Definition and the Two Reference Points

The XC Hole

The XC energy is defined by the decomposition introduced in Chapter 3:

\[ E_{\rm xc}[\rho] = (T[\rho] - T_s[\rho]) + (V_{ee}[\rho] - E_{\rm H}[\rho]). \]

An equivalent and physically transparent form expresses \(E_{\rm xc}\) in terms of the exchange-correlation hole \(n_{\rm xc}(\mathbf{r}, \mathbf{r}')\) — the depletion of electron density at \(\mathbf{r}'\) caused by the presence of an electron at \(\mathbf{r}\):

\[ E_{\rm xc}[\rho] = \frac{1}{2}\iint \frac{\rho(\mathbf{r})\,n_{\rm xc}(\mathbf{r},\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}'. \]

The exact XC hole satisfies two rigorous constraints:

\begin{equation} n_{\rm xc}(\mathbf{r},\mathbf{r}') \leq 0, \qquad \int n_{\rm xc}(\mathbf{r},\mathbf{r}')\,d\mathbf{r}' = -1. \label{eq:hole-sumrule} \end{equation}

The second relation — the sum rule — encodes the fact that each electron excludes exactly one unit of charge from its neighbourhood. Approximations that satisfy this constraint tend to benefit from systematic error cancellation even when the detailed shape of the hole is wrong, which explains the surprising accuracy of LDA despite its crude derivation.
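This error cancellation can be checked numerically. The UEG exchange hole has the standard closed form \(n_{\rm x}(u) = -(9\rho/2)\,[j_1(k_F u)/(k_F u)]^2\) (a textbook result assumed here, not derived in this chapter), and integrating it over all space should give exactly \(-1\). A minimal numpy sketch:

```python
import numpy as np

# Numerically verify the exchange-hole sum rule for the uniform electron gas.
# The (spin-unpolarised) UEG exchange hole is
#   n_x(u) = -(9 rho / 2) * (j1(kF u) / (kF u))**2,
# where j1 is the spherical Bessel function of order 1 and u = |r - r'|.

rho = 0.03                           # electron density (bohr^-3), arbitrary
kF = (3.0 * np.pi**2 * rho)**(1/3)   # local Fermi wavevector

x = np.linspace(1e-6, 2000.0, 2_000_000)   # x = kF * u
u = x / kF
j1 = np.sin(x) / x**2 - np.cos(x) / x      # spherical Bessel j1
n_x = -(9.0 * rho / 2.0) * (j1 / x)**2     # exchange hole

du = u[1] - u[0]
sum_rule = np.sum(4.0 * np.pi * u**2 * n_x) * du   # ∫ n_x d^3u
print(sum_rule)   # ≈ -1
```

The tail of the integrand decays slowly (\(\sim 1/x^2\)), so a long radial grid is needed to reach the exact value of \(-1\).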

The Two Exactly-Known Limits

Two limits of \(E_{\rm xc}\) are known exactly and serve as expansion points.

Limit 1 — Uniform electron gas (\(s = 0\)): When the density is spatially uniform, \(\rho(\mathbf{r}) = \bar\rho\), the XC energy per electron \(\varepsilon_{\rm xc}(\bar\rho)\) is known with essentially arbitrary precision from quantum Monte Carlo simulations (Ceperley–Alder, 1980). The exchange component is analytic (Dirac, 1930). This is the natural anchor for a Taylor series in density variations.

Limit 2 — Non-interacting KS limit (\(\lambda = 0\)): Via the adiabatic connection, the XC energy is written as an integral over a coupling constant \(\lambda \in [0,1]\) that smoothly scales the electron–electron interaction from zero (the KS system) to its physical value:

\begin{equation} E_{\rm xc}[\rho] = \int_0^1 W_{\rm xc}^\lambda[\rho]\,d\lambda, \qquad W_{\rm xc}^\lambda = \langle\Psi^\lambda|\hat{V}_{ee}|\Psi^\lambda\rangle - E_{\rm H}[\rho], \label{eq:adiabatic} \end{equation}

where \(\Psi^\lambda\) is the ground state with the same density \(\rho\) but interaction scaled by \(\lambda\). At \(\lambda = 0\), \(\Psi^0\) is the KS Slater determinant and \(W_{\rm xc}^0 = E_{\rm x}^{\rm exact}\) is known analytically. This is the natural anchor for a Taylor series in interaction strength.

The First Taylor Series: Gradient Expansion and Semi-Local Functionals

The Dimensionless Gradient and the GEA

For a slowly varying density, we measure spatial variation by the reduced density gradient:

\[ s(\mathbf{r}) = \frac{|\nabla\rho(\mathbf{r})|}{2k_F(\mathbf{r})\,\rho(\mathbf{r})}, \qquad k_F(\mathbf{r}) = (3\pi^2\rho(\mathbf{r}))^{1/3}, \]

where \(k_F\) is the local Fermi wavevector. The parameter \(s\) measures the density gradient in units of the local electron wavelength: \(s = 0\) for the UEG; \(s \sim 0.5\)–\(3\) in typical valence regions; \(s \to \infty\) in exponentially decaying density tails.
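These magnitudes can be made concrete with a small sketch evaluating \(s(r)\) for a hydrogen-like model density \(\rho = e^{-2r}/\pi\) (an illustrative choice, not from the text), for which \(|\nabla\rho| = 2\rho\) and hence \(s = 1/k_F\):

```python
import numpy as np

# Reduced gradient s for a hydrogen-like 1s density rho = exp(-2r)/pi.
# Here |grad rho| = 2 rho, so s = 1/kF = (3 pi^2 rho)^(-1/3): s is O(1) in
# the valence region but grows exponentially (~ e^{2r/3}) in the tail,
# which is why the raw gradient expansion breaks down there.

def s_of_r(r):
    rho = np.exp(-2.0 * r) / np.pi
    grad_rho = 2.0 * rho                    # |d(rho)/dr|
    kF = (3.0 * np.pi**2 * rho)**(1/3)
    return grad_rho / (2.0 * kF * rho)

print(s_of_r(1.0))    # valence region: s of order 1
print(s_of_r(10.0))   # density tail: s >> 1
```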

The formal gradient expansion approximation (GEA) from many-body perturbation theory gives the systematic Taylor series in \(s^2\) (odd powers vanish by symmetry):

\begin{equation} E_{\rm xc}^{\rm GEA}[\rho] = \int \rho\,\varepsilon_{\rm xc}^{\rm UEG}(\rho)\,d\mathbf{r} + \int C_{\rm xc}(\rho)\,s^2\,\rho\,d\mathbf{r} + \mathcal{O}(s^4), \label{eq:GEA} \end{equation}

where \(C_{\rm xc}(\rho)\) is a density-dependent coefficient computable from perturbation theory.

Rung 0: LDA — Zeroth Order

Truncating \eqref{eq:GEA} at zeroth order (\(s^0\)) gives the local density approximation:

\[ \boxed{E_{\rm xc}^{\rm LDA}[\rho] = \int \varepsilon_{\rm xc}^{\rm UEG}(\rho(\mathbf{r}))\,\rho(\mathbf{r})\,d\mathbf{r}.} \]

The exchange part is analytic (Dirac, 1930):

\[ \varepsilon_{\rm x}^{\rm UEG}(\rho) = -\frac{3}{4}\left(\frac{3}{\pi}\right)^{1/3}\rho^{1/3}. \]

The correlation \(\varepsilon_{\rm c}(\rho)\) is parameterised from QMC data: the Perdew–Wang (PW92) and Vosko–Wilk–Nusair (VWN) parameterisations are most common.
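The exchange part is simple enough to evaluate directly. A standard benchmark value (assumed here) is \(\varepsilon_{\rm x} = -0.4582/r_s\) hartree, with the Wigner–Seitz radius defined by \(\rho = 3/(4\pi r_s^3)\):

```python
import numpy as np

# Dirac exchange energy per electron of the UEG, evaluated at a typical
# metallic density (rs = 2 bohr) and compared with the closed-form
# benchmark eps_x = -0.4582 / rs hartree.

def eps_x_dirac(rho):
    return -0.75 * (3.0 / np.pi)**(1/3) * rho**(1/3)

rs = 2.0                                # Wigner-Seitz radius (bohr)
rho = 3.0 / (4.0 * np.pi * rs**3)
print(eps_x_dirac(rho))                 # ≈ -0.4582/2 = -0.2291 Ha
```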

Why LDA works better than a zeroth-order truncation deserves. The LDA XC hole is that of the UEG, which satisfies the sum rule \eqref{eq:hole-sumrule} by construction. Even when the detailed angular shape of the hole is wrong, the spherically averaged hole — which is what determines \(E_{\rm xc}\) via a radial integral — is described well. This is systematic error cancellation rooted in constraint satisfaction, not coincidence.

Known systematic errors: overbinds (cohesive energies \(+1\)–\(2\) eV/atom); underestimates lattice constants (\(-1\) to \(-3\%\)); underestimates band gaps (\(\sim-40\%\)); no dispersion.

When to use LDA: Phonons and structural properties of simple and alkali metals, where the electron gas is nearly uniform and the error cancellation is most favourable.

Why the Raw GEA Fails

The \(\mathcal{O}(s^2)\) term in \eqref{eq:GEA} does not improve on LDA — it makes things worse. The GEA correction introduces oscillations in the XC hole at large \(|\mathbf{r}-\mathbf{r}'|\) (where \(s\) is large), causing the hole to become positive and violating \(n_{\rm xc} \leq 0\) and the \(-1\) sum rule in the density tails. This is a classic example of a Taylor series that fails when the expansion parameter is not uniformly small: \(s \to \infty\) in the tail of any finite system.

Rung 1: GGA — Constrained Resummation of the Gradient Series

Rather than truncating \eqref{eq:GEA}, the generalised gradient approximation writes the XC energy in the enhancement factor form:

\[ \boxed{E_{\rm xc}^{\rm GGA}[\rho] = \int \varepsilon_{\rm xc}^{\rm UEG}(\rho)\,F_{\rm xc}(\rho, s)\,\rho\,d\mathbf{r},} \]

where \(F_{\rm xc}(\rho, s)\) is a dimensionless enhancement factor. GGA is not the \(\mathcal{O}(s^2)\) truncation — it is a constrained resummation that reproduces the correct \(s^2\) behaviour near \(s = 0\) while satisfying exact constraints for all \(s\).

PBE (Perdew–Burke–Ernzerhof, 1996) determines \(F_{\rm xc}\) by imposing:

  1. \(F_{\rm xc} \to 1\) as \(s \to 0\): recovers LDA, consistent with the UEG limit.
  2. \(\partial F_{\rm xc}/\partial(s^2)|_0 = \mu\): matches the GEA gradient coefficient, ensuring consistency with perturbation theory at small \(s\).
  3. Cutoff at large \(s\): \(F_{\rm xc}\) saturates rather than growing without bound, enforcing the sum rule \(\int n_{\rm x}^{\rm GGA}\,d\mathbf{r}' = -1\) even in the density tails. This is precisely the correction to the GEA failure above.
  4. Lieb–Oxford bound: \(E_{\rm xc}^{\rm GGA} \geq -1.679\int\rho^{4/3}\,d\mathbf{r}\), a rigorous lower bound satisfied for all \(s\).

The explicit PBE exchange enhancement factor is:

\[ F_{\rm x}^{\rm PBE}(s) = 1 + \kappa - \frac{\kappa}{1 + \mu s^2/\kappa}, \qquad \kappa = 0.804,\quad \mu = 0.2195, \]

where both parameters are fixed by constraints, not by fitting to data. PBE has zero empirical parameters.
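The stated limits are easy to verify numerically from the formula above:

```python
import numpy as np

# PBE exchange enhancement factor with its two constraint-fixed parameters.
# Check the three limits quoted in the text: F -> 1 as s -> 0, initial
# slope mu in s^2, and saturation at 1 + kappa for large s.

KAPPA, MU = 0.804, 0.2195

def f_x_pbe(s):
    return 1.0 + KAPPA - KAPPA / (1.0 + MU * s**2 / KAPPA)

print(f_x_pbe(0.0))                      # 1.0: recovers LDA in the UEG limit
print((f_x_pbe(1e-4) - 1.0) / 1e-8)      # ≈ mu: GEA slope at small s
print(f_x_pbe(1e6))                      # ≈ 1 + kappa = 1.804: saturation
```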

LDA vs. PBE:

| Property | LDA (typical error) | PBE (typical error) |
|---|---|---|
| Lattice constants | \(-1\) to \(-3\%\) | \(+1\) to \(+2\%\) |
| Atomisation energies | \(+30\) kcal/mol | \(+10\) kcal/mol |
| Band gaps | \(-40\%\) | \(-40\%\) |
| Hydrogen bonds | Reasonable | Good |
| Van der Waals | Very poor | Poor |

The band gap error is essentially unchanged between LDA and PBE: it originates not in the gradient expansion but in the missing derivative discontinuity of the exact XC potential at integer electron number, which no semi-local functional can capture.

Lieb–Oxford bound. A rigorous lower bound on \(E_{\rm xc}\) for any \(N\)-electron density follows from the Lieb–Oxford inequality (Lieb and Oxford, 1981):

\[ E_{\rm xc}[\rho] \geq -C_{\rm LO}\int \rho(\mathbf{r})^{4/3}\,d\mathbf{r}, \qquad C_{\rm LO} \leq 1.679. \]

Physically this bound says that the XC energy cannot be more negative than a multiple of the exchange energy of a fully spin-polarised UEG at the same density — it bounds how much the XC hole can deepen the energy. The numerical constant \(C_{\rm LO} = 1.679\) was obtained by Lieb and Oxford from analysis of the indirect part of the Coulomb energy. PBE’s exchange enhancement factor is explicitly constructed to satisfy this bound for all \(s\) — one of the exact constraints built into its derivation. LDA satisfies the bound locally; GEA violates it at large \(s\).

The GL4 argument for \(a = 1/4\). The fraction of exact exchange in PBE0 is not empirical but follows from fourth-order Görling–Levy (GL) perturbation theory (Perdew, Ernzerhof, Burke, 1996). The GL expansion of the adiabatic connection integrand gives:

\[ W_{\rm xc}^\lambda = E_{\rm x}^{\rm exact} + 2\lambda E_{\rm c}^{\rm GL2} + 3\lambda^2 E_{\rm c}^{\rm GL3} + \cdots \]

Integrating over \(\lambda \in [0,1]\) and truncating the expansion at first order in \(\lambda\):

\[ E_{\rm xc}^{\rm GL4} \approx E_{\rm x}^{\rm exact} + E_{\rm c}^{\rm GL2}. \]

Since GGA already approximates \(E_{\rm c}^{\rm GL2}\) partially (capturing the short-range correlation at the GGA level), the optimal fraction of exact exchange that avoids double-counting the correlation already in the GGA is estimated by requiring that the hybrid reproduces GL4 for weakly correlated systems. This gives \(a = 1/4\) as the parameter-free choice. Systems with stronger correlation (larger \(\lambda\) dependence of \(W_{\rm xc}^\lambda\)) would prefer smaller \(a\), while weakly correlated systems near the UEG prefer larger \(a\) — this is the physical reason why one value of \(a\) cannot be universally optimal.

Rung 2: Meta-GGA — Encoding \(\mathcal{O}(s^4)\) Through \(\tau\)

The gradient expansion can in principle be extended to \(\mathcal{O}(s^4)\), which involves second spatial derivatives \(\nabla^2\rho\). However, \(\nabla^2\rho\) included directly reintroduces oscillatory behaviour in the density tails. The kinetic energy density:

\[ \tau(\mathbf{r}) = \frac{1}{2}\sum_i f_i\,|\nabla\phi_i(\mathbf{r})|^2 \]

encodes the same fourth-order gradient information without the oscillation pathology. To see why, consider the two limiting forms of \(\tau\).

The von Weizsäcker kinetic energy density \(\tau^W\) is the exact \(\tau\) for a system described by a single orbital \(\phi = \sqrt{\rho}\,e^{i\theta}\). For a real, nodeless orbital, \(\phi = \sqrt{\rho}\), so:

\[ \tau^W = \frac{1}{2}|\nabla\sqrt{\rho}|^2 = \frac{1}{2}\left|\frac{\nabla\rho}{2\sqrt{\rho}}\right|^2 = \frac{|\nabla\rho|^2}{8\rho}. \]

This is the minimum possible kinetic energy density consistent with a given \(\rho\); any multi-orbital system has \(\tau \geq \tau^W\) (the Pauli kinetic energy is non-negative). For the uniform electron gas with \(N\) occupied plane-wave orbitals, the kinetic energy density is:

\[ \tau^{\rm UEG}(\rho) = \frac{3}{10}(3\pi^2)^{2/3}\rho^{5/3}, \]

which is the Thomas–Fermi kinetic energy density. The gradient expansion then gives:

\[ \tau = \tau^W + \tau^{\rm UEG} + \mathcal{O}(\nabla^4\rho), \]

showing that \(\tau - \tau^W\) measures the deviation from single-orbital character, i.e. the degree of multi-orbital kinetic energy. The meta-GGA uses this through the iso-orbital indicator:

\[ \alpha(\mathbf{r}) = \frac{\tau(\mathbf{r}) - \tau^W(\mathbf{r})}{\tau^{\rm UEG}(\mathbf{r})}, \]

whose physical content follows directly:

  • \(\alpha = 0\): \(\tau = \tau^W\), exact for a single orbital — covalent bonds, core regions.
  • \(\alpha = 1\): \(\tau = \tau^W + \tau^{\rm UEG}\), the UEG limit — metallic regions, LDA should be recovered.
  • \(\alpha \gg 1\): many orbitals, slowly varying density — van der Waals region between closed-shell fragments.
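The \(\alpha = 1\) limit admits a quick numerical sanity check: fill a Fermi sphere of plane waves in a cubic box (a finite-size model of the UEG, with illustrative parameters) and compare \(\tau\) with the Thomas–Fermi form. Here \(\tau^W = 0\) because the density is uniform:

```python
import numpy as np

# Filled Fermi sphere of plane waves in a cubic box of side L: the summed
# kinetic energy density should reproduce the Thomas-Fermi expression,
# i.e. alpha = (tau - tau^W)/tau^UEG -> 1, with tau^W = 0 (uniform density).

L, kF = 60.0, 1.0                          # box size and Fermi wavevector (a.u.)
nmax = int(kF * L / (2.0 * np.pi)) + 1
n = np.arange(-nmax, nmax + 1)
nx, ny, nz = np.meshgrid(n, n, n, indexing="ij")
k2 = (2.0 * np.pi / L)**2 * (nx**2 + ny**2 + nz**2)
occ = k2 <= kF**2                          # plane waves inside the Fermi sphere

N = 2 * occ.sum()                          # spin factor 2
rho = N / L**3
tau = (2.0 / L**3) * 0.5 * k2[occ].sum()   # (1/2) sum_i f_i |grad phi_i|^2
tau_ueg = 0.3 * (3.0 * np.pi**2)**(2/3) * rho**(5/3)

alpha = tau / tau_ueg                      # tau^W = 0 here
print(alpha)                               # -> 1 as L -> infinity
```

The residual deviation from 1 is a finite-size (shell-filling) effect that shrinks as the box grows.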

The meta-GGA energy functional is:

\[ E_{\rm xc}^{\rm mGGA}[\rho] = \int \varepsilon_{\rm xc}(\rho, s, \alpha)\,\rho\,d\mathbf{r}. \]

SCAN (Strongly Constrained and Appropriately Normed; Sun, Ruzsinszky, Perdew, 2015) satisfies all 17 known exact constraints applicable to a semi-local functional simultaneously. r²SCAN (Furness et al., 2020) is a numerically regularised variant preferred in plane-wave codes.

The Second Taylor Series: Coupling-Constant Expansion and Non-Local Functionals

The semi-local functionals above improve the spatial description of the XC hole but remain fundamentally local: \(\varepsilon_{\rm xc}\) at \(\mathbf{r}\) depends only on quantities at \(\mathbf{r}\). A qualitatively different class of improvement comes from the second Taylor series.

Expanding \(W_{\rm xc}^\lambda\) Around \(\lambda = 0\)

We expand the adiabatic connection integrand \eqref{eq:adiabatic} in \(\lambda\):

\begin{equation} W_{\rm xc}^\lambda = W_{\rm xc}^0 + \lambda\left.\frac{dW_{\rm xc}^\lambda}{d\lambda}\right|_{\lambda=0} + \frac{\lambda^2}{2}\left.\frac{d^2W_{\rm xc}^\lambda}{d\lambda^2}\right|_{\lambda=0} + \cdots \label{eq:AC-Taylor} \end{equation}

The zeroth-order term is exactly known:

\[ W_{\rm xc}^0 = E_{\rm x}^{\rm exact}[\{\phi_i\}] = -\frac{1}{2}\sum_{i,j}\iint\frac{\phi_i^*(\mathbf{r})\,\phi_j(\mathbf{r})\,\phi_j^*(\mathbf{r}')\,\phi_i(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}'. \]

The first derivative is identified by Görling–Levy (GL) perturbation theory as twice the second-order GL correlation energy \(E_{\rm c}^{\rm GL2}\):

\begin{equation} \left.\frac{dW_{\rm xc}^\lambda}{d\lambda}\right|_{\lambda=0} = 2E_{\rm c}^{\rm GL2} = -2\sum_{\substack{i \lt j \\ a \lt b}} \frac{|\langle\phi_i\phi_j\|\phi_a\phi_b\rangle|^2}{\epsilon_a + \epsilon_b - \epsilon_i - \epsilon_j}, \label{eq:GL2} \end{equation}

where \(i,j\) label occupied and \(a,b\) unoccupied KS orbitals. This has the same algebraic structure as MP2 correlation but is evaluated on KS orbitals. Integrating \eqref{eq:AC-Taylor}:

\begin{equation} E_{\rm xc} = E_{\rm x}^{\rm exact} + E_{\rm c}^{\rm GL2} + \frac{1}{6}\left.\frac{d^2W_{\rm xc}^\lambda}{d\lambda^2}\right|_{\lambda=0} + \cdots \label{eq:AC-integrated} \end{equation}

Each successive term is more expensive and more accurate.
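The algebraic structure of \eqref{eq:GL2} can be illustrated on synthetic data — random stand-ins for the antisymmetrised integrals \(\langle ij\|ab\rangle\) and made-up orbital energies (nothing here is a real electronic-structure calculation). The sum is manifestly non-positive because every denominator is positive:

```python
import numpy as np

# Toy evaluation of the GL2/MP2-like second-order sum: random stand-ins for
# the antisymmetrised integrals <ij||ab> and invented occupied/virtual
# eigenvalues. Illustrates the formula's structure only.

rng = np.random.default_rng(0)
n_occ, n_vir = 4, 6
eps_occ = np.sort(rng.uniform(-2.0, -0.5, n_occ))   # occupied eigenvalues
eps_vir = np.sort(rng.uniform(0.5, 3.0, n_vir))     # virtual eigenvalues
V = rng.normal(size=(n_occ, n_occ, n_vir, n_vir))   # stand-in for <ij||ab>

e_c = 0.0
for i in range(n_occ):
    for j in range(i + 1, n_occ):           # i < j over occupied pairs
        for a in range(n_vir):
            for b in range(a + 1, n_vir):   # a < b over virtual pairs
                denom = eps_vir[a] + eps_vir[b] - eps_occ[i] - eps_occ[j]
                e_c -= V[i, j, a, b]**2 / denom

print(e_c)   # strictly negative: second-order PT always lowers the energy
```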

Rung 3: Global Hybrid Functionals — Zeroth-Order Plus Remainder

Retaining only the zeroth-order term \(E_{\rm x}^{\rm exact}\) and approximating the remainder of the \(\lambda\) integral by a GGA gives the global hybrid:

\begin{equation} \boxed{E_{\rm xc}^{\rm hybrid} = a\,E_{\rm x}^{\rm exact} + (1-a)\,E_{\rm x}^{\rm GGA} + E_{\rm c}^{\rm GGA}.} \label{eq:hybrid} \end{equation}

The fraction \(a\) is not a free parameter: fourth-order GL perturbation theory (Perdew, Ernzerhof, Burke, 1996) shows that for systems near the weakly correlated limit, the integrand \(W_{\rm xc}^\lambda\) is nearly linear in \(\lambda\), and the trapezoidal rule \((W_0 + W_1)/2\) gives \(a = 1/2\). Since GGA already captures part of the \(\lambda = 1\) end, the effective exact exchange weight is smaller, and GL4 gives \(a = 1/4\) as the optimal parameter-free estimate.

PBE0

PBE0 implements \eqref{eq:hybrid} with \(a = 1/4\) from GL4 perturbation theory and PBE as the semi-local component:

\[ E_{\rm xc}^{\rm PBE0} = \frac{1}{4}E_{\rm x}^{\rm exact} + \frac{3}{4}E_{\rm x}^{\rm PBE} + E_{\rm c}^{\rm PBE}. \]

PBE0 has no empirical parameters. In the limit \(a \to 0\) it recovers PBE exactly. Band gaps improve from \(\sim-40\%\) (PBE) to \(\sim -20\%\) (PBE0), and reaction barriers improve substantially. The cost in a periodic code scales as \(\mathcal{O}(N^3)\) with density fitting, making it expensive for large unit cells.

B3LYP

B3LYP (Becke, 1993) occupies the same position in the coupling-constant series but determines its three parameters \((a_0, a_x, a_c)\) empirically by fitting to the G2 molecular thermochemistry database rather than from perturbation theory:

\[ E_{\rm xc}^{\rm B3LYP} = (1-a_0)E_{\rm x}^{\rm LSDA} + a_0\,E_{\rm x}^{\rm exact} + a_x\,\Delta E_{\rm x}^{\rm B88} + (1-a_c)E_{\rm c}^{\rm VWN} + a_c\,E_{\rm c}^{\rm LYP}, \]

with \(a_0 = 0.20\), \(a_x = 0.72\), \(a_c = 0.81\). The LYP correlation (Lee–Yang–Parr) is derived from the Colle–Salvetti helium atom expression, not from the UEG, so it does not reduce to \(\varepsilon_{\rm c}^{\rm UEG}\) in the uniform limit. This makes B3LYP physically inconsistent for metallic systems but very well-suited to organic molecules (its fitting set): thermochemical accuracy \(\sim 3\) kcal/mol, bond lengths \(\sim 0.01\) Å. It should not be used for periodic metals or large-gap insulators.

The contrast between PBE0 and B3LYP is instructive: both sit at the same rung of the coupling-constant ladder, but PBE0 is constraint-derived and transferable while B3LYP is empirically optimal within its training domain.

The Spatial Structure of the Exchange Hole: Motivating Range Separation

Global hybrids apply the same fraction \(a\) of exact exchange uniformly at all length scales. But the exchange hole \(n_{\rm x}(\mathbf{r},\mathbf{r}')\) behaves fundamentally differently at short and long range.

At short range (\(|\mathbf{r}-\mathbf{r}'| \to 0\)): the hole is deep, localised, and cusp-like. Semi-local functionals describe this region reasonably well.

At long range (\(|\mathbf{r}-\mathbf{r}'| \to \infty\)): the exact exchange hole of a finite system must decay as \(-1/r\) to satisfy the \(-1\) sum rule. This \(-1/r\) tail implies that the exchange potential \(V_{\rm x}(\mathbf{r}) \to -1/r\) asymptotically — a non-local behaviour that only exact exchange can reproduce; semi-local functionals decay exponentially instead.

For solids, the situation is reversed at long range: the dielectric response of the crystal screens the exchange hole so that the effective long-range decay is faster than \(-1/r\). In a perfect metal, the exchange interaction is fully screened beyond the Thomas–Fermi length (\(\sim 0.5\) Å), so exact long-range exchange is not only unnecessary but harmful — it produces a logarithmically divergent band velocity (and hence a vanishing density of states) at the Fermi surface.

This physics suggests splitting the Coulomb operator into short-range (SR) and long-range (LR) parts using the error function:

\[ \frac{1}{r_{12}} = \underbrace{\frac{\mathrm{erfc}(\omega r_{12})}{r_{12}}}_{\mathrm{SR}:\ \to\, 0\ \text{as}\ r_{12}\to\infty} + \underbrace{\frac{\mathrm{erf}(\omega r_{12})}{r_{12}}}_{\mathrm{LR}:\ \to\, 1/r_{12}\ \text{as}\ r_{12}\to\infty}, \]

where \(\omega\) controls the crossover: \(r_c \approx 1/\omega\). Different exact exchange fractions \(a_{\rm SR}\) and \(a_{\rm LR}\) are then applied to each part.
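The split is an exact identity, easy to confirm numerically with the HSE06 value \(\omega = 0.11\) bohr\(^{-1}\):

```python
import math

# Error-function splitting of the Coulomb operator (HSE06 screening
# omega = 0.11 bohr^-1). The two pieces recombine to 1/r exactly, and the
# short-range part is negligible beyond a few multiples of 1/omega.

omega = 0.11  # bohr^-1

def sr(r):   # short-range part: erfc(omega r) / r
    return math.erfc(omega * r) / r

def lr(r):   # long-range part: erf(omega r) / r
    return math.erf(omega * r) / r

r = 3.0
print(sr(r) + lr(r), 1.0 / r)     # identical by construction

r_far = 40.0                       # ~ 4.4 / omega
print(sr(r_far) / (1.0 / r_far))   # SR fraction of the bare 1/r: tiny
```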

HSE06 — Short-Range Exact Exchange for Solids

HSE06 (Heyd–Scuseria–Ernzerhof, 2003/2006) applies exact exchange at short range only:

\[ E_{\rm xc}^{\rm HSE06} = \frac{1}{4}E_{\rm x}^{\rm exact,SR}(\omega) + \frac{3}{4}E_{\rm x}^{\rm PBE,SR}(\omega) + E_{\rm x}^{\rm PBE,LR}(\omega) + E_{\rm c}^{\rm PBE}, \]

with \(\omega = 0.11\) bohr\(^{-1}\) (\(\approx 0.21\) Å\(^{-1}\), crossover \(\sim 5\)–\(10\) Å).

The limits confirm the physical motivation: \(\omega \to 0\) recovers PBE0 (no screening, appropriate for molecules); \(\omega \to \infty\) recovers PBE (full screening, appropriate for metals). At \(\omega = 0.11\) bohr\(^{-1}\), HSE06 describes the screened exchange of a typical semiconductor. Computational advantages are significant: the short-range Fock operator decays exponentially in real space, enabling \(\mathcal{O}(N)\) scaling, making HSE06 tractable for unit cells of hundreds of atoms.

Performance: band gaps within \(\sim 10\)–\(15\%\) of experiment for most semiconductors; correct metallic behaviour where PBE0 fails; standard functional for band gap calculations in periodic solids.

LC-\(\omega\)PBE and \(\omega\)B97X-V — Long-Range Exact Exchange for Molecules

For isolated molecules, the physical argument is opposite: the density tails are unscreened, and the correct \(-1/r\) asymptote of \(V_{\rm x}\) is essential for Rydberg states and charge-transfer excitations. Long-range corrected (LC) functionals apply exact exchange at long range:

\[ E_{\rm xc}^{\rm LC} = E_{\rm x}^{\rm GGA,SR}(\omega) + E_{\rm x}^{\rm exact,LR}(\omega) + E_{\rm c}^{\rm GGA}. \]

\(\omega\)B97X-V (Mardirossian–Head-Gordon, 2014; \(a_{\rm SR} = 0.167\), \(a_{\rm LR} = 1.0\), \(\omega = 0.30\) Å\(^{-1}\)) further combines long-range correction with the non-local VV10 dispersion functional, treating van der Waals at the functional level. It ranks among the best single functionals for molecular benchmarks.

CAM-B3LYP (Yanai et al., 2004) interpolates smoothly between \(a_{\rm SR} = 0.19\) and \(a_{\rm LR} = 0.65\) using \(\omega = 0.33\) Å\(^{-1}\), retaining B3LYP’s ground-state accuracy while improving excited states.

Summary: Global and Range-Separated Hybrids

| Functional | \(a_{\rm SR}\) | \(a_{\rm LR}\) | \(\omega\) (Å\(^{-1}\)) | Primary use |
|---|---|---|---|---|
| PBE0 | 0.25 | 0.25 | — | Molecules and solids (no screening) |
| B3LYP | 0.20 | 0.20 | — | Organic molecules |
| HSE06 | 0.25 | 0.00 | 0.21 | Periodic solids, large cells |
| LC-\(\omega\)PBE | 0.00 | 1.00 | 0.40 | Charge transfer, Rydberg |
| CAM-B3LYP | 0.19 | 0.65 | 0.33 | Molecular excited states |
| \(\omega\)B97X-V | 0.167 | 1.00 | 0.30 | Molecular thermochemistry + vdW |

Rung 4: Double-Hybrid Functionals — First-Order Term

Retaining both the zeroth- and first-order terms of \eqref{eq:AC-integrated} and approximating the GGA remainder for the higher orders gives the double-hybrid:

\[ \boxed{E_{\rm xc}^{\rm DH} = a\,E_{\rm x}^{\rm exact} + (1-a)\,E_{\rm x}^{\rm GGA} + b\,E_{\rm c}^{\rm GL2} + (1-b)\,E_{\rm c}^{\rm GGA},} \]

where \(E_{\rm c}^{\rm GL2}\) is the second-order Görling–Levy correlation \eqref{eq:GL2}. This costs \(\mathcal{O}(N^5)\) in system size due to the four-index two-electron integrals, restricting applications to systems of tens to a few hundred atoms.

B2PLYP (Grimme, 2006; \(a = 0.53\), \(b = 0.27\), empirical) and PBE-QIDH (Brémond et al., 2014; \(a = (1/3)^{1/3}\), \(b = 1/3\), derived from GL perturbation theory without fitting) achieve thermochemical accuracy of \(\sim 1\) kcal/mol, rivalling wavefunction methods at substantially lower cost. Double hybrids are rarely applied to periodic solids due to the \(\mathcal{O}(N^5)\) cost.

The Two Expansions Together: Jacob’s Ladder Revisited

The full hierarchy is now organised by both Taylor series simultaneously:

| Rung | Series | Order | New ingredient | Example |
|---|---|---|---|---|
| LDA | Gradient, \(s^0\) | Zeroth | UEG: \(\rho\) | PW92, VWN |
| GGA | Gradient, resummation of \(s^2\) | First (resummed) | \(\nabla\rho\) | PBE, PBEsol |
| meta-GGA | Gradient, encodes \(s^4\) | Second (via \(\tau\)) | \(\tau\), \(\alpha\) | SCAN, r²SCAN |
| Hybrid | Coupling constant, \(\lambda^0\) | Zeroth | \(E_{\rm x}^{\rm exact}\) | PBE0, HSE06 |
| Double-hybrid | Coupling constant, \(\lambda^1\) | First | \(E_{\rm c}^{\rm GL2}\) | B2PLYP, PBE-QIDH |

Meta-hybrid functionals (SCAN0, r²SCAN0: \(a = 1/4\) exact exchange added to SCAN/r²SCAN) sit at the intersection — exploiting both expansions simultaneously for improved band gaps and magnetic exchange couplings.

Van der Waals: Beyond Both Series

London dispersion — the \(-C_6/R^6\) attraction between distant closed-shell fragments — arises from non-local correlated density fluctuations at distances \(R\) far exceeding an inter-atomic spacing. Neither the spatial gradient expansion (local by construction) nor the GL expansion at low order captures it: \(E_{\rm c}^{\rm GL2}\) underestimates medium-range correlation, and higher-order terms would be needed.

Practical remedies are dispersion corrections added to the DFT total energy:

  • DFT-D3(BJ) (Grimme et al., 2010): atom-pairwise \(-C_6/r^6 - C_8/r^8\) with Becke–Johnson damping. Cheap, widely available, recommended default.
  • Tkatchenko–Scheffler (TS): \(C_6\) coefficients from Hirshfeld density partitioning; captures chemical environment dependence.
  • Many-body dispersion (MBD): Collective dipole fluctuations via coupled quantum harmonic oscillators; important for molecular crystals and large biomolecules.
  • vdW-DF / VV10: Non-local correlation functional via a double spatial integral; dispersion at the functional level without atom-pairwise parameterisation.

For periodic systems, PBE-D3(BJ) or r²SCAN+rVV10 are recommended.

Self-Interaction Error

The self-interaction error (SIE) — the incomplete cancellation between \(E_{\rm H}[\rho]\) and \(E_{\rm xc}[\rho]\) for the unphysical self-repulsion of each electron — is illuminated by both Taylor series. From the gradient expansion perspective, semi-local functionals built on the UEG reference cannot reproduce the single-electron limit because the UEG has many electrons and no self-interaction to cancel. From the coupling-constant perspective, the SIE is progressively reduced as more exact exchange (which is exactly self-interaction-free) is included. This is why each rung from hybrid upward improves reaction barriers, charge-transfer energies, and localisation of \(d\)/\(f\) electrons — all dominated by regions where the self-interaction is severe.

Practical Decision Guide

| Property / System | Recommended functional | Key reason |
|---|---|---|
| Phonons, lattice constants (metals) | LDA or PBEsol | Near-UEG electron gas; error cancellation |
| Geometry optimisation (general) | PBE or r²SCAN | PBE default; r²SCAN for improved quality |
| Band gaps (semiconductors) | HSE06 | Screened SR exchange, tractable for large cells |
| Band gaps (insulators, molecules) | PBE0 | Global hybrid sufficient; no screening needed |
| Magnetic exchange coupling \(J\) | PBE0 or HSE06 | Reduced SIE on \(d\)/\(f\) orbitals |
| Molecular thermochemistry | B3LYP or \(\omega\)B97X-V | Fitted or constraint-based with LR correction |
| Charge-transfer / Rydberg (TDDFT) | CAM-B3LYP or LC-\(\omega\)PBE | LR exact exchange essential |
| Layered / vdW systems | PBE-D3(BJ) or r²SCAN+rVV10 | Explicit dispersion mandatory |
| Large periodic cells (\(>200\) atoms) | PBE, r²SCAN, or HSE06 | Hybrid cost prohibitive beyond HSE06 |
| High-accuracy thermochemistry | B2PLYP-D3 or PBE-QIDH | GL2 correlation, \(\sim 1\) kcal/mol |

Outlook

The two Taylor series give a unified account of the XC landscape but neither converges uniformly. The gradient expansion fails for \(s \gg 1\) (exponential tails, surfaces), and the coupling-constant expansion is slow for strongly correlated systems (large \(\lambda\) dependence, near-degenerate ground states). The next major failure mode — the treatment of localised \(d\) and \(f\) electrons in Mott–Hubbard materials — and the DFT+U correction that targets it directly are the subject of a later chapter.

Basis Sets and Pseudopotentials

Solving the Kohn–Sham equations requires representing the single-particle orbitals \(\phi_i(\mathbf{r})\) numerically. There are two complementary choices to be made:

  1. How to expand the orbitals — the choice of basis set.
  2. How to treat the core electrons — full all-electron treatment, or replace the core with a pseudopotential (PP) or projector augmented wave (PAW) potential.

These choices profoundly affect computational cost, accuracy, and the type of system tractable. This chapter develops the theory behind the most common approaches and provides practical guidance on convergence.

Basis Set Expansion

The KS orbital \(\phi_i(\mathbf{r})\) is an element of an infinite-dimensional Hilbert space. In practice, it is expanded in a finite basis \(\{f_\mu(\mathbf{r})\}\):

\begin{equation} \phi_i(\mathbf{r}) = \sum_\mu c_{i\mu}\, f_\mu(\mathbf{r}), \end{equation}

where the coefficients \(c_{i\mu}\) are determined by diagonalising the KS Hamiltonian in this basis. The KS eigenvalue problem \eqref{eq:KS-eqn} becomes the generalised matrix eigenvalue problem:

\begin{equation} \mathbf{H}_{\rm KS}\,\mathbf{c}_i = \epsilon_i\,\mathbf{S}\,\mathbf{c}_i, \end{equation}

where \(H_{{\rm KS},\mu\nu} = \langle f_\mu | \hat{h}_{\rm KS} | f_\nu\rangle\) and \(S_{\mu\nu} = \langle f_\mu | f_\nu\rangle\) is the overlap matrix. The two dominant choices are plane waves and localised (atomic-like) basis functions.

Plane-Wave Basis Sets

For crystalline solids, Bloch’s theorem dictates that KS orbitals have the form:

\begin{equation} \phi_{i\mathbf{k}}(\mathbf{r}) = e^{i\mathbf{k}\cdot\mathbf{r}}\,u_{i\mathbf{k}}(\mathbf{r}), \end{equation}

where \(u_{i\mathbf{k}}(\mathbf{r})\) is lattice-periodic. The periodic part is naturally expanded in plane waves labelled by reciprocal lattice vectors \(\mathbf{G}\):

\begin{equation} u_{i\mathbf{k}}(\mathbf{r}) = \frac{1}{\sqrt{\Omega}}\sum_{\mathbf{G}} c_{i,\mathbf{k}+\mathbf{G}}\,e^{i\mathbf{G}\cdot\mathbf{r}}, \end{equation}

where \(\Omega\) is the unit cell volume. The basis is truncated at a kinetic energy cutoff \(E_{\rm cut}\). In atomic units (Hartree), the condition is:

\[ \frac{1}{2}|\mathbf{k}+\mathbf{G}|^2 \leq E_{\rm cut}, \]

equivalently \(\frac{\hbar^2}{2m_e}|\mathbf{k}+\mathbf{G}|^2 \leq E_{\rm cut}\) in SI notation. Codes typically quote \(E_{\rm cut}\) in eV (VASP) or Ry (Quantum ESPRESSO); 1 Hartree = 27.2 eV = 2 Ry.
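The size of the truncated basis follows by direct counting. For an illustrative simple cubic cell at \(\Gamma\) (\(\mathbf{k} = 0\)), the number of plane waves below the cutoff approaches the sphere-volume estimate \(N_{\rm PW} \approx \Omega\,(2E_{\rm cut})^{3/2}/(6\pi^2)\) in hartree atomic units:

```python
import numpy as np

# Count the plane waves satisfying |k+G|^2 / 2 <= E_cut for a simple cubic
# cell at the Gamma point (k = 0). The count grows as E_cut^(3/2) and
# linearly with the cell volume Omega = a^3.

a = 10.0                      # lattice constant (bohr)
b = 2.0 * np.pi / a           # reciprocal lattice spacing

def count_pw(e_cut_ha):
    g_max = np.sqrt(2.0 * e_cut_ha)
    nmax = int(g_max / b) + 1
    n = np.arange(-nmax, nmax + 1)
    nx, ny, nz = np.meshgrid(n, n, n, indexing="ij")
    g2 = b**2 * (nx**2 + ny**2 + nz**2)
    return int((0.5 * g2 <= e_cut_ha).sum())

for e_cut in (5.0, 10.0, 20.0):   # cutoffs in hartree
    estimate = a**3 * (2.0 * e_cut)**1.5 / (6.0 * np.pi**2)
    print(e_cut, count_pw(e_cut), round(estimate))
```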

Advantages of plane waves:

  • Systematic completeness: increasing \(E_{\rm cut}\) improves the basis in a controlled, unbiased way. Convergence is monitored by a single parameter.
  • No basis set superposition error (BSSE): Plane waves are not centred on atoms, so the basis does not artificially lower the energy when two atoms approach.
  • Efficient evaluation: The Hartree potential and charge density are related by Poisson’s equation, which is diagonal in reciprocal space. Fast Fourier transforms (FFTs) enable \(\mathcal{O}(N \log N)\) switching between real and reciprocal space.
  • Implemented in major codes: VASP, Quantum ESPRESSO, ABINIT, CP2K (mixed Gaussian/PW).

Disadvantages:

  • All-electron calculations are impractical: Near the nuclei, KS orbitals oscillate rapidly (due to orthogonality to core states), requiring an extremely large \(E_{\rm cut}\). This motivates the use of pseudopotentials or PAW (see below).
  • Poor for isolated molecules: A large vacuum supercell is needed to prevent periodic images from interacting, wasting computational effort.
  • Not naturally localised: Analysis (e.g. Bader charges, Wannier functions) requires post-processing.

Convergence with Respect to \(E_{\rm cut}\)

A convergence test is mandatory in every plane-wave calculation. The procedure is:

  1. Fix all other parameters (cell, \(k\)-points, functional).
  2. Compute the total energy \(E_{\rm tot}(E_{\rm cut})\) for a series of cutoff values.
  3. Identify the cutoff where \(\Delta E_{\rm tot} < \epsilon_{\rm tol}\) (typically \(1\)–\(5\) meV/atom).
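The loop above is simple to script. In the sketch below, `total_energy` is a mock with an exponentially decaying basis-set error, standing in for a real DFT call (e.g. a VASP or Quantum ESPRESSO run); only the convergence logic is meant literally:

```python
import math

def total_energy(ecut_ev):
    """Mock DFT total energy per atom (eV): converged value plus decaying basis error."""
    e_inf = -10.0                                     # illustrative converged energy
    return e_inf + 5.0 * math.exp(-ecut_ev / 80.0)    # illustrative basis-set error

def converged_cutoff(cutoffs, tol_ev=0.001):
    """Return the first cutoff whose energy change to the next step is below tol."""
    energies = [total_energy(ec) for ec in cutoffs]
    for ec, e_lo, e_hi in zip(cutoffs, energies, energies[1:]):
        if abs(e_hi - e_lo) < tol_ev:
            return ec
    return None   # not converged within the scanned range

cutoffs = list(range(200, 1001, 50))   # eV
print(converged_cutoff(cutoffs, tol_ev=0.001))
```

In production the same driver would dispatch actual SCF runs and parse their output; the stopping criterion is unchanged.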

Typical converged cutoffs: \(\sim 400\)–\(600\) eV for GGA+PAW; \(\sim 700\)–\(1200\) eV for norm-conserving PPs; transition metal oxides and \(f\)-electron systems often require higher values.

\(k\)-Point Sampling

In a periodic solid, the KS equations must be solved at every \(\mathbf{k}\)-point in the first Brillouin zone (BZ). Physical observables involve BZ integrals, e.g.:

\begin{equation} \rho(\mathbf{r}) = \frac{\Omega}{(2\pi)^3}\sum_i \int_{\rm BZ} |\phi_{i\mathbf{k}}(\mathbf{r})|^2\,d\mathbf{k}. \end{equation}

In practice, the integral is replaced by a discrete sum over a finite mesh. The standard choice is the Monkhorst–Pack (MP) mesh: a uniform \(N_1 \times N_2 \times N_3\) grid in reciprocal space, optionally shifted to include or exclude \(\Gamma\). The total number of \(k\)-points is reduced by the point-group symmetry of the crystal.
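A sketch of the original MP prescription, \(u_r = (2r - q - 1)/(2q)\) for \(r = 1, \ldots, q\) along each axis, before any symmetry reduction (the optional shift argument is a convenience of this toy, not a specific code's convention):

```python
import itertools

def monkhorst_pack(n1, n2, n3, shift=(0.0, 0.0, 0.0)):
    """Fractional k-point coordinates of an N1 x N2 x N3 Monkhorst-Pack mesh."""
    def axis(q, s):
        # Original MP prescription: u_r = (2r - q - 1) / (2q), r = 1..q
        return [(2 * r - q - 1) / (2 * q) + s for r in range(1, q + 1)]
    return list(itertools.product(axis(n1, shift[0]),
                                  axis(n2, shift[1]),
                                  axis(n3, shift[2])))

print(len(monkhorst_pack(4, 4, 4)))                  # 64 points before symmetry reduction
print((0.0, 0.0, 0.0) in monkhorst_pack(3, 3, 3))    # True: odd grids contain Gamma
```

Note that even unshifted grids exclude \(\Gamma\), while odd grids include it; symmetry reduction to the irreducible wedge is then applied by the DFT code.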

Convergence with \(k\)-mesh density:

  • Metals require denser meshes (many \(k\)-points near the Fermi surface): \(20\times 20\times 20\) or more for simple metals.
  • Semiconductors and insulators with gaps converge faster: \(6\times 6\times 6\) is often sufficient.
  • A smearing method (Methfessel–Paxton, Fermi–Dirac, or Gaussian) is applied to the occupation numbers for metals to accelerate convergence.

The \(k\)-point density and \(E_{\rm cut}\) must both be converged. Although they are often tested one at a time, the two parameters interact through the density, so a final joint check (varying both together) is the most rigorous approach.

Localised Basis Sets

An alternative to plane waves is to expand the orbitals in atom-centred functions that mimic atomic orbitals. Each basis function is centred on an atom \(A\) and factorises into a radial part and a spherical harmonic, \(f_\mu(\mathbf{r}) = R_\mu(|\mathbf{r}-\mathbf{R}_A|)\,Y_l^m(\widehat{\mathbf{r}-\mathbf{R}_A})\).

Gaussian-Type Orbitals (GTOs)

GTOs take the form \(f(r) = r^l e^{-\alpha r^2} Y_l^m(\hat{\mathbf{r}})\), where \(\alpha\) is the exponent and \(Y_l^m\) is a real spherical harmonic. Multi-Gaussian contractions are used:

\begin{equation} f_\mu(r) = \sum_k d_{\mu k}\,g_k(r,\alpha_k). \end{equation}

GTOs are the standard in quantum chemistry codes (Gaussian, ORCA, CRYSTAL). Common basis set families: 6-31G, cc-pVDZ/TZ, def2-TZVP. The acronyms encode the contraction scheme; “triple-zeta” basis sets use three groups of Gaussians per angular momentum channel and are typically converged for production calculations. Diffuse functions (denoted by \(+\) or “aug-”) are important for anions and excited states.

Numerical Atomic Orbitals (NAOs)

NAOs use numerically tabulated radial functions, typically derived from atomic DFT calculations. They are compact (strictly localised beyond a cutoff radius) and efficient for large-scale calculations. Used in FHI-aims (numeric all-electron), SIESTA, OpenMX.

Advantages of localised bases:

  • Efficient for molecules and non-periodic systems (no vacuum padding needed).
  • Sparse Hamiltonian and overlap matrices enable \(\mathcal{O}(N)\) scaling algorithms.
  • Natural interface with chemical intuition (orbital populations, Mulliken/Löwdin charges).

Disadvantages:

  • Basis set superposition error (BSSE): When two fragments approach, each borrows basis functions from the other, artificially lowering the energy. The counterpoise correction (Boys–Bernardi) partially remedies this.
  • Incompleteness: The basis is not systematically improvable by a single parameter; basis quality depends on the choice of exponents and contraction.
  • Slow approach to completeness: Converging to the complete basis set (CBS) limit is difficult and typically requires extrapolation schemes rather than a single tunable parameter.

Core Electrons: Three Strategies

Core orbitals oscillate strongly near the nucleus, often require a relativistic treatment, and barely participate in chemistry or bonding, yet they are expensive to represent. Three strategies exist for handling them.

All-Electron (AE) Calculations

All electrons, including core, are treated explicitly. Mandatory when:

  • Core–valence interactions are chemically important (e.g. hyperfine coupling, NMR shifts, X-ray spectra).
  • High-accuracy benchmarks are required.
  • Working with very light elements (H, Li) where “core” electrons don’t exist.

Examples: FHI-aims (NAO), WIEN2k (LAPW), FLEUR.

Norm-Conserving Pseudopotentials (NCPP)

The core electrons are replaced by an effective potential \(V_{\rm PP}(\mathbf{r})\) that reproduces the scattering properties of the full ionic potential in the valence region. The pseudo-wavefunction \(\tilde{\phi}_l(r)\) agrees with the true all-electron wavefunction \(\phi_l(r)\) outside a cutoff radius \(r_c\):

\begin{equation} \tilde{\phi}_l(r) = \phi_l(r), \quad r > r_c. \end{equation}

The norm-conservation condition requires:

\begin{equation} \int_0^{r_c} |\tilde{\phi}_l(r)|^2 r^2\,dr = \int_0^{r_c} |\phi_l(r)|^2 r^2\,dr, \end{equation}

ensuring that the charge inside \(r_c\) is preserved and that the logarithmic derivatives (which control scattering) match at \(r_c\). The Hamann–Schlüter–Chiang (HSC) and Troullier–Martins (TM) schemes are standard constructions.
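The norm-conservation condition is easy to check numerically on a radial grid. The sketch below uses the hydrogen \(1s\) radial function \(R_{10}(r) = 2e^{-r}\) (atomic units) as a stand-in all-electron orbital and computes the charge below \(r_c\) that any norm-conserving pseudo-orbital must reproduce exactly:

```python
import numpy as np

def trap(f, x):
    """Trapezoidal integral of f over the grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def charge_inside(radial_phi, r, rc):
    """Integral of |phi_l(r)|^2 r^2 dr from 0 to rc on the radial grid."""
    mask = r <= rc
    return trap(radial_phi[mask]**2 * r[mask]**2, r[mask])

r = np.linspace(1e-6, 30.0, 60001)
phi_ae = 2.0 * np.exp(-r)                  # hydrogen 1s radial function R_10(r)

print(charge_inside(phi_ae, r, rc=30.0))   # full norm, ~1.0
print(charge_inside(phi_ae, r, rc=1.0))    # charge a pseudo-orbital must match below r_c = 1 bohr
```

The same integral evaluated for a candidate \(\tilde{\phi}_l\) provides a direct numerical test of the norm-conservation condition.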

NCPPs typically require \(E_{\rm cut} \sim 60\)–\(100\) Ry and are used in codes such as Quantum ESPRESSO and ABINIT.

Ultrasoft Pseudopotentials (USPP)

The ultrasoft pseudopotential (Vanderbilt, 1990) relaxes the norm-conservation condition, allowing smoother pseudo-wavefunctions that require much lower plane-wave cutoffs (\(E_{\rm cut} \sim 25\)–\(40\) Ry). The pseudo-wavefunction \(\tilde{\phi}_l(r)\) is permitted to deviate from the all-electron \(\phi_l(r)\) even for \(r > r_c^{\rm soft}\), producing a softer nodeless function.

The missing charge that would violate the norm-conservation sum rule is compensated by augmentation charges \(Q_{nm}(\mathbf{r})\) added to the density:

\begin{equation} \rho(\mathbf{r}) = \sum_i |\tilde{\phi}_i(\mathbf{r})|^2 + \sum_a\sum_{nm} \rho_{nm}^a Q_{nm}^a(\mathbf{r}), \end{equation}

where \(\rho_{nm}^a = \sum_i \langle\tilde{\phi}_i|\tilde{p}_n^a\rangle\langle\tilde{p}_m^a|\tilde{\phi}_i\rangle\) are the occupancy matrices for atom \(a\), and \(Q_{nm}^a(\mathbf{r})\) are localised functions inside the augmentation sphere that carry the missing charge. The resulting eigenvalue problem is generalised — a non-trivial overlap operator replaces the standard orthonormality condition — but in practice this adds little overhead.

USPP is available in Quantum ESPRESSO and CASTEP and offers an excellent cost-to-accuracy ratio for most elements.

Projector Augmented Wave (PAW) Method

The PAW method (Blöchl, 1994) is the most rigorous of the pseudopotential-like approaches and is now the default in most major plane-wave codes. It can be understood as a generalisation of USPP that is formally equivalent to an all-electron calculation.

The PAW Transformation

PAW introduces a linear transformation operator \(\hat{\mathcal{T}}\) that maps smooth, computationally convenient pseudo-wavefunctions \(|\tilde{\Psi}_i\rangle\) onto the true all-electron wavefunctions \(|\Psi_i\rangle\):

\begin{equation} |\Psi_i\rangle = \hat{\mathcal{T}}\,|\tilde{\Psi}_i\rangle. \label{eq:PAW-transform} \end{equation}

Inside the augmentation sphere of radius \(r_c^a\) centred on each atom \(a\), the smooth pseudo-wavefunction is a poor approximation to the oscillatory all-electron wavefunction. The transformation \(\hat{\mathcal{T}}\) corrects for this, atom by atom:

\begin{equation} \hat{\mathcal{T}} = \mathbf{1} + \sum_a \hat{\mathcal{T}}^a, \qquad \hat{\mathcal{T}}^a = \sum_{n} \left(|\phi_n^a\rangle - |\tilde{\phi}_n^a\rangle\right)\langle\tilde{p}_n^a|, \label{eq:PAW-T} \end{equation}

where:

  • \(|\phi_n^a\rangle\) are all-electron partial waves — solutions of the radial Schrödinger equation for the isolated atom \(a\) at reference energies, labelled by \(n = (l, m, \epsilon_{\rm ref})\).
  • \(|\tilde{\phi}_n^a\rangle\) are smooth pseudo partial waves that match \(|\phi_n^a\rangle\) outside \(r_c^a\) but are nodeless and smooth inside.
  • \(\langle\tilde{p}_n^a|\) are projector functions localised inside \(r_c^a\), dual to the pseudo partial waves: \(\langle\tilde{p}_n^a|\tilde{\phi}_m^a\rangle = \delta_{nm}\).

The transformation acts as follows: when \(\hat{\mathcal{T}}\) is applied to \(|\tilde{\Psi}_i\rangle\), the projectors \(\langle\tilde{p}_n^a|\) measure the overlap of the smooth wavefunction with the partial waves in each augmentation sphere, and the difference \((|\phi_n^a\rangle - |\tilde{\phi}_n^a\rangle)\) restores the all-electron character inside that sphere. Outside all augmentation spheres, \(\hat{\mathcal{T}}^a = 0\) and the smooth wavefunction is used directly.

Reconstruction of the All-Electron Density

The all-electron charge density is:

\begin{equation} \rho(\mathbf{r}) = \tilde{\rho}(\mathbf{r}) + \rho^1(\mathbf{r}) - \tilde{\rho}^1(\mathbf{r}), \label{eq:PAW-density} \end{equation}

where each term has a specific role:

  • \(\tilde{\rho}(\mathbf{r}) = \sum_i f_i\,|\tilde{\Psi}_i(\mathbf{r})|^2\) is the smooth pseudo-density, computed from the plane-wave-expanded pseudo-wavefunctions throughout the entire cell. This is the computationally cheap term, evaluated with FFTs.

  • \(\rho^1(\mathbf{r}) = \sum_a\sum_{nm}\rho_{nm}^a\,\phi_n^a(\mathbf{r})\phi_m^a(\mathbf{r})\) is the on-site all-electron density, reconstructed analytically from the all-electron partial waves inside the augmentation spheres. Here \(\rho_{nm}^a = \sum_i f_i\langle\tilde{\Psi}_i|\tilde{p}_n^a\rangle\langle\tilde{p}_m^a|\tilde{\Psi}_i\rangle\) are the occupation matrices that quantify the partial-wave character of each KS orbital at each atom.

  • \(\tilde{\rho}^1(\mathbf{r}) = \sum_a\sum_{nm}\rho_{nm}^a\,\tilde{\phi}_n^a(\mathbf{r})\tilde{\phi}_m^a(\mathbf{r})\) is the on-site pseudo-density, which subtracts the smooth-wavefunction contribution inside the augmentation spheres so it is not double-counted with \(\tilde{\rho}\).

The total energy is decomposed in the same three-part way. The smooth part is evaluated in reciprocal space via FFTs; the on-site parts are computed analytically on radial grids inside each augmentation sphere. This decomposition is the key to PAW’s efficiency: the expensive oscillatory core region is handled on a small radial grid, while the smooth interstitial region uses the standard plane-wave machinery.
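The three-term bookkeeping can be illustrated with a 1D toy model. All shapes below are invented for illustration (one "augmentation region", one projector/partial-wave channel, one occupied orbital) and are not a real PAW dataset; the point is the decomposition \(\rho = \tilde{\rho} + \rho^1 - \tilde{\rho}^1\) and the fact that the correction vanishes identically outside the augmentation region:

```python
import numpy as np

def trap(f, x):
    """Trapezoidal integral of f over the grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

x = np.linspace(-5.0, 5.0, 4001)
rc = 1.0
inside = np.abs(x) < rc                      # 1D "augmentation sphere"

# One occupied smooth orbital and one projector/partial-wave channel (toy shapes):
psi = np.exp(-0.25 * x**2)
psi /= np.sqrt(trap(psi**2, x))              # normalised pseudo-orbital
p = np.where(inside, np.cos(np.pi * x / (2 * rc)), 0.0)        # projector, zero outside rc
phi_ps = np.where(inside, np.cos(np.pi * x / (2 * rc)), 0.0)   # smooth pseudo partial wave
phi_ae = np.where(inside, np.cos(np.pi * x / (2 * rc))
                  * (1.0 + 2.0 * np.cos(4.0 * np.pi * x)), 0.0)  # oscillatory AE partial wave

f_occ = 2.0
c = trap(p * psi, x)                         # <p_n | psi>, the partial-wave amplitude
rho_nm = f_occ * c * c                       # 1x1 occupation "matrix"

rho_smooth = f_occ * psi**2                  # rho_tilde
rho1_ae = rho_nm * phi_ae**2                 # on-site all-electron term
rho1_ps = rho_nm * phi_ps**2                 # on-site pseudo term
rho = rho_smooth + rho1_ae - rho1_ps

print(np.allclose(rho[~inside], rho_smooth[~inside]))   # True: correction is zero outside rc
```

Inside the region the reconstructed density recovers oscillatory (all-electron-like) structure that the smooth orbital alone cannot represent.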

The PAW Total Energy

The total energy functional in PAW is:

\begin{equation} E = \tilde{E} + E^1 - \tilde{E}^1, \end{equation}

where \(\tilde{E}\) is the energy of the pseudo-system (computed with plane waves), \(E^1\) is the all-electron on-site energy inside the augmentation spheres, and \(\tilde{E}^1\) is the corresponding pseudo on-site energy that prevents double-counting. The KS equations derived from this functional take the form of a generalised eigenvalue problem:

\begin{equation} \hat{\tilde{H}}\,|\tilde{\Psi}_i\rangle = \epsilon_i\,\hat{S}\,|\tilde{\Psi}_i\rangle, \end{equation}

where \(\hat{\tilde{H}}\) is the smooth KS Hamiltonian (including on-site corrections) and \(\hat{S} = \mathbf{1} + \sum_a\sum_{nm}\left(\langle\phi_n^a|\phi_m^a\rangle - \langle\tilde{\phi}_n^a|\tilde{\phi}_m^a\rangle\right)|\tilde{p}_n^a\rangle\langle\tilde{p}_m^a|\) is the overlap operator that enforces the correct normalisation of the all-electron wavefunctions.

Strengths of PAW

  • Formally all-electron: The full all-electron density is available in the augmentation spheres through equation \eqref{eq:PAW-density}. This gives direct access to hyperfine parameters, electric field gradients, NMR chemical shifts, and core-level binding energies.
  • Low \(E_{\rm cut}\): The smooth part \(|\tilde{\Psi}_i\rangle\) requires \(E_{\rm cut} \sim 400\)–\(600\) eV (VASP), similar to USPP, far below what a true all-electron plane-wave calculation would need.
  • Transferability: The partial waves are atom-specific but not system-specific; the same PAW dataset works well across different chemical environments, unlike empirically fitted NCPP.
  • Extensibility: The occupation matrices \(\rho_{nm}^a\) are a natural handle for DFT+U and hybrid functional implementations.

Limitations of PAW

  • Fixed augmentation spheres: If \(r_c^a\) is too small, the smooth part is not genuinely smooth; if too large, augmentation spheres on adjacent atoms overlap, invalidating the non-overlapping sphere assumption. Standard VASP PAW datasets are designed to avoid overlap for typical bond lengths, but this must be verified under compression.
  • Linearisation of the partial wave expansion: PAW is exact only if the partial wave basis \(\{|\phi_n^a\rangle\}\) is complete within the augmentation sphere. For highly ionic environments or unusual oxidation states, a larger partial wave set or scattering energies closer to the occupied states may be needed. VASP provides both standard and “hard” PAW datasets for such cases.

Comparison of core-electron strategies:

| Method | \(E_{\rm cut}\) (typical) | Core accuracy | Access to core density | Cost | Default in |
|---|---|---|---|---|---|
| NCPP | \(60\)–\(100\) Ry | Good (outside \(r_c\)) | No | Moderate | QE, ABINIT |
| USPP | \(25\)–\(40\) Ry | Good (valence) | No | Low | QE, CASTEP |
| PAW | \(30\)–\(50\) Ry | Excellent (AE in spheres) | Yes | Low–moderate | VASP, QE |
| All-electron | \(\gg 100\) Ry | Exact | Yes | High | FHI-aims, WIEN2k |

Practical Convergence Protocol

A rigorous DFT calculation requires the following checks, performed in order:

  1. \(E_{\rm cut}\) convergence: Increase the plane-wave cutoff until the total energy changes by less than \(1\)–\(2\) meV/atom. Use the converged cutoff for all subsequent calculations.

  2. \(k\)-mesh convergence: Increase the Monkhorst–Pack grid density until the total energy and the quantity of interest (e.g. magnetic moment, band gap) are converged. Report the converged grid as \(N_1 \times N_2 \times N_3\).

  3. Supercell size (for defects, surfaces, molecules): Increase the cell size until the interaction between periodic images is negligible (\(\Delta E \lt 1\) meV/defect).

  4. SCF convergence criterion: Tighten the SCF energy tolerance to \(\sim 10^{-6}\) eV for energy differences; \(\sim 10^{-7}\) eV for forces or stress calculations.

  5. Structural convergence: Relax ionic positions until forces are below \(0.01\)–\(0.02\) eV/Å; relax the cell until stresses are below \(0.1\) kbar.

Failure to converge any one of these can produce errors that dwarf those from the choice of XC functional. Convergence tests must be performed for each new system class; results from one system cannot be assumed transferable.

Outlook

With the KS equations specified, the XC functional chosen, and the basis and pseudopotential converged, we have all the ingredients for a standard spin-unpolarised DFT calculation. The next chapter extends the framework to spin-polarised systems, which are essential for magnetic materials and the study of exchange coupling in Heusler alloys and related compounds.

Spin-Polarised DFT and Magnetism

The standard Kohn–Sham DFT developed in Chapter 3 treats electrons as spinless particles: the density \(\rho(\mathbf{r})\) carries no spin label, and the KS equations yield doubly degenerate orbital energies. This is adequate for non-magnetic systems, but fails entirely when the ground state breaks time-reversal symmetry through a net electron spin polarisation.

To describe magnetic materials — from ferromagnetic iron to the half-metallic Heusler alloys and antiferromagnetic insulators — we must extend DFT to spin-polarised DFT (SDFT), and, for materials with strong spin–orbit coupling, to the full non-collinear formalism. This chapter develops the theoretical framework and connects it to the key observables: magnetic moments, exchange coupling constants \(J\), and magnetic anisotropy energies (MAE).

The Spin-Density Functional

Collinear SDFT

In collinear SDFT, spins are quantised along a fixed axis (conventionally \(z\)). The fundamental variable is replaced by the spin-resolved density:

\begin{equation} \rho_\sigma(\mathbf{r}) = \sum_{i} f_{i\sigma}\,|\phi_{i\sigma}(\mathbf{r})|^2, \qquad \sigma \in \{\uparrow, \downarrow\}, \end{equation}

where \(f_{i\sigma}\) are the orbital occupation numbers and the sum runs over all KS orbitals of spin \(\sigma\). The total and magnetisation densities are:

\begin{equation} \rho(\mathbf{r}) = \rho_\uparrow(\mathbf{r}) + \rho_\downarrow(\mathbf{r}), \qquad m(\mathbf{r}) = \rho_\uparrow(\mathbf{r}) - \rho_\downarrow(\mathbf{r}). \end{equation}

The HK theorem is generalised to show that the ground state is uniquely determined by the pair \((\rho(\mathbf{r}), m(\mathbf{r}))\), or equivalently by \((\rho_\uparrow, \rho_\downarrow)\). The total energy functional becomes:

\begin{equation} E[\rho_\uparrow, \rho_\downarrow] = T_s[\rho_\uparrow, \rho_\downarrow] + E_{\rm H}[\rho] + E_{\rm xc}[\rho_\uparrow, \rho_\downarrow] + \int V_{\rm ext}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r}. \end{equation}

The two sets of spin-channel KS equations decouple:

\begin{equation} \left[-\frac{1}{2}\nabla^2 + V_{\rm eff}^\sigma(\mathbf{r})\right]\phi_{i\sigma}(\mathbf{r}) = \epsilon_{i\sigma}\,\phi_{i\sigma}(\mathbf{r}), \label{eq:SDFT-KS} \end{equation}

with spin-dependent effective potentials:

\begin{equation} V_{\rm eff}^\sigma(\mathbf{r}) = V_{\rm ext}(\mathbf{r}) + V_{\rm H}(\mathbf{r}) + V_{\rm xc}^\sigma(\mathbf{r}), \qquad V_{\rm xc}^\sigma = \frac{\delta E_{\rm xc}}{\delta \rho_\sigma(\mathbf{r})}. \end{equation}

The Hartree potential \(V_{\rm H}\) is spin-independent (it depends on the total \(\rho\)), while \(V_{\rm xc}^\sigma\) differs between spin channels, producing a spin-splitting of the eigenvalues \(\epsilon_{i\uparrow} \neq \epsilon_{i\downarrow}\). This is the quantum-mechanical origin of the exchange splitting that drives ferromagnetism.

The Spin-Polarised XC Functional

The XC functional must be generalised to take both \(\rho_\uparrow\) and \(\rho_\downarrow\) as arguments. For the LDA, the local spin-density approximation (LSDA) uses the XC energy of a spin-polarised uniform electron gas:

\[ E_{\rm xc}^{\rm LSDA}[\rho_\uparrow, \rho_\downarrow] = \int \varepsilon_{\rm xc}^{\rm UEG}(\rho_\uparrow(\mathbf{r}), \rho_\downarrow(\mathbf{r}))\,\rho(\mathbf{r})\,d\mathbf{r}. \]

Derivation of the spin-scaling relation (von Barth–Hedin). The key step is to express \(\varepsilon_{\rm xc}(\rho_\uparrow, \rho_\downarrow)\) in terms of the unpolarised value \(\varepsilon_{\rm xc}(\rho/2, \rho/2)\) and the fully polarised value \(\varepsilon_{\rm xc}(\rho, 0)\). For exchange, the exact spin-scaling follows from the Dirac formula: each spin channel behaves as an unpolarised gas of doubled density \(2\rho_\sigma\), weighted by its share \(\rho_\sigma/\rho\) of the total density. Adding both spin channels:

\[ \varepsilon_{\rm x}(\rho_\uparrow, \rho_\downarrow) = \frac{\rho_\uparrow\,\varepsilon_{\rm x}^{\rm UEG}(2\rho_\uparrow) + \rho_\downarrow\,\varepsilon_{\rm x}^{\rm UEG}(2\rho_\downarrow)}{\rho}. \]

Since \(\varepsilon_{\rm x}^{\rm UEG}(\rho) \propto \rho^{1/3}\), this simplifies to the exact spin-scaling relation \(\varepsilon_{\rm x}(\rho_\uparrow,\rho_\downarrow) = \frac{1}{2}\left[(1+\zeta)^{4/3}+(1-\zeta)^{4/3}\right]\varepsilon_{\rm x}^{\rm UEG}(\rho)\). For correlation, no exact spin-scaling exists; von Barth and Hedin (1972) proposed interpolating between the paramagnetic (\(\zeta=0\)) and ferromagnetic (\(\zeta=1\)) limits with a function \(f(\zeta)\) satisfying \(f(0)=0\) and \(f(1)=1\):

\[ \varepsilon_{\rm xc}(\rho_\uparrow, \rho_\downarrow) = \varepsilon_{\rm xc}(\rho/2, \rho/2) + \left[\varepsilon_{\rm xc}(\rho, 0) - \varepsilon_{\rm xc}(\rho/2, \rho/2)\right] f(\zeta), \]

where \(\zeta = (\rho_\uparrow - \rho_\downarrow)/\rho\) is the spin polarisation and the standard choice is:

\[ f(\zeta) = \frac{(1+\zeta)^{4/3} + (1-\zeta)^{4/3} - 2}{2(2^{1/3}-1)}. \]

This interpolation is exact for exchange and approximate for correlation; it reduces to the correct limits at \(\zeta = 0\) (paramagnetic UEG) and \(\zeta = 1\) (fully polarised UEG). GGA and meta-GGA functionals are extended analogously using the same spin-scaling for exchange and analogous interpolations for correlation.
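The interpolation function is a one-liner; a quick numerical check confirms the two limits:

```python
def f_vbh(zeta):
    """von Barth-Hedin interpolation function between zeta = 0 and zeta = 1."""
    return ((1 + zeta)**(4 / 3) + (1 - zeta)**(4 / 3) - 2) / (2 * (2**(1 / 3) - 1))

print(f_vbh(0.0))   # 0.0  (paramagnetic limit)
print(f_vbh(1.0))   # ~1.0 (fully polarised limit)
print(f_vbh(0.5))   # intermediate polarisation
```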

Magnetic Moments

The local magnetic moment on site \(A\) is:

\begin{equation} \mu_A = \int_{\Omega_A} m(\mathbf{r})\,d\mathbf{r} = \int_{\Omega_A} \left[\rho_\uparrow(\mathbf{r}) - \rho_\downarrow(\mathbf{r})\right]d\mathbf{r}, \end{equation}

where \(\Omega_A\) is the Wigner–Seitz (or PAW augmentation) sphere of atom \(A\). The total magnetisation is:

\begin{equation} M = \int m(\mathbf{r})\,d\mathbf{r} = N_\uparrow - N_\downarrow, \end{equation}

where \(N_\sigma = \int \rho_\sigma\,d\mathbf{r}\) is the total number of electrons of spin \(\sigma\).
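These definitions translate directly into grid arithmetic. The sketch below uses toy 1D spin densities (Gaussians, purely illustrative) to verify \(M = N_\uparrow - N_\downarrow\):

```python
import numpy as np

r = np.linspace(0.0, 10.0, 10001)
dr = r[1] - r[0]
rho_up = 1.2 * np.exp(-r**2)           # toy spin-up density
rho_dn = 0.8 * np.exp(-r**2)           # toy spin-down density

m = rho_up - rho_dn                     # magnetisation density m(r)
M = np.sum(m) * dr                      # total moment (1D stand-in for the volume integral)
N_up, N_dn = np.sum(rho_up) * dr, np.sum(rho_dn) * dr
print(np.isclose(M, N_up - N_dn))       # True, by linearity of the integral
```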

For elemental ferromagnets, SDFT recovers the experimental moments well: Fe (\(2.2\,\mu_B\) exp., \(2.1\)–\(2.3\,\mu_B\) DFT), Co (\(1.7\,\mu_B\)), Ni (\(0.6\,\mu_B\)). Orbital contributions (from spin–orbit coupling) are not included at the collinear SDFT level.

Magnetic Ordering: Ferro-, Antiferro-, and Ferrimagnetic States

By setting up different initial spin configurations in the SCF cycle, collinear SDFT can target different magnetic orderings:

  • Ferromagnetic (FM): all local moments aligned parallel, \(M \neq 0\).
  • Antiferromagnetic (AFM): moments on sublattices aligned antiparallel, \(M = 0\) globally. Requires a supercell large enough to contain both sublattices.
  • Ferrimagnetic (FiM): unequal antiparallel moments on distinct sublattices, \(M \neq 0\). Common in half-Heusler and spinel compounds.

To determine which ordering is the ground state, compute the total energy for each configuration and compare: \(\Delta E = E_{\rm AFM} - E_{\rm FM}\).

Exchange Coupling: The Heisenberg Model

The energy difference between magnetic configurations can be mapped onto the Heisenberg spin Hamiltonian:

\begin{equation} \hat{\mathcal{H}}_{\rm Heis} = -\sum_{\langle i,j\rangle} J_{ij}\,\hat{\mathbf{S}}_i \cdot \hat{\mathbf{S}}_j, \end{equation}

where \(\hat{\mathbf{S}}_i\) is the spin operator on site \(i\) and \(J_{ij}\) is the exchange coupling constant between sites \(i\) and \(j\). The sign convention: \(J > 0\) favours FM alignment; \(J \lt 0\) favours AFM.

Extraction of \(J\) from Total Energies

For a two-sublattice system with spin \(S\) per site, the energy difference between FM and AFM configurations (with \(z\) equivalent nearest-neighbour pairs per formula unit) is:

\begin{equation} \Delta E = E_{\rm AFM} - E_{\rm FM} = 2JzS^2. \label{eq:J-extraction} \end{equation}

Hence:

\begin{equation} J = \frac{E_{\rm AFM} - E_{\rm FM}}{2zS^2}. \end{equation}
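In script form (all numbers are illustrative, not results for any specific material; the Curie temperature estimate uses the mean-field convention \(k_B T_C^{\rm MF} = \frac{2}{3}\,zJS^2\), one of several in the literature):

```python
K_B_EV = 8.617333e-5   # Boltzmann constant in eV/K

def exchange_J(e_afm_ev, e_fm_ev, z, S):
    """J = (E_AFM - E_FM) / (2 z S^2), the total-energy mapping above."""
    return (e_afm_ev - e_fm_ev) / (2.0 * z * S**2)

def tc_mean_field(J_ev, z, S):
    """Mean-field Curie temperature, k_B T_C = (2/3) z J S^2."""
    return (2.0 / 3.0) * z * J_ev * S**2 / K_B_EV

J = exchange_J(e_afm_ev=0.120, e_fm_ev=0.0, z=8, S=1.0)   # FM lower by 120 meV/f.u.
print(f"J = {1000.0 * J:.2f} meV")                         # 7.50 meV
print(f"T_C(MF) ~ {tc_mean_field(J, z=8, S=1.0):.0f} K")
```

Mean-field theory systematically overestimates \(T_C\); the RPA expression gives lower, usually more realistic values for itinerant magnets.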

For more complex magnetic structures with multiple shells of neighbours, the four-state method (Xiang et al.) or the spin-spiral approach provides a systematic way to disentangle \(J_1, J_2, J_3, \ldots\).

Liechtenstein–Katsnelson–Antropov–Gubanov (LKAG) Method

An alternative to total energy differences is the LKAG (linear response) method, which computes \(J_{ij}\) directly from the Green’s function:

\begin{equation} J_{ij} = \frac{1}{4\pi}\int_{-\infty}^{E_F} {\rm Im}\,{\rm Tr}_L\left[\Delta_i\, G_{ij}^\uparrow(E)\,\Delta_j\, G_{ji}^\downarrow(E)\right]dE, \end{equation}

where \(\Delta_i = V_{\rm xc,i}^\uparrow - V_{\rm xc,i}^\downarrow\) is the local exchange splitting on site \(i\) and \(G_{ij}^\sigma\) are the spin-resolved intersite Green’s functions. This method gives the full \(J_{ij}\) tensor in a single DFT calculation and is available in codes such as FLEUR, SPR-KKR, and Questaal.

Practical note: Total energy methods (equation \eqref{eq:J-extraction}) are simpler and widely used. They require DFT+U or hybrid functionals for correlated \(d\)/\(f\)-electron systems where GGA overdelocalises the magnetic orbitals and underestimates \(J\).

Non-Collinear SDFT

In the collinear formalism, the magnetisation is constrained to lie along \(z\). This is insufficient for:

  • Spin spirals and incommensurate magnetic structures.
  • Spin frustration in triangular or kagome lattices.
  • Spin–orbit coupling (SOC) effects: the magnetisation direction matters when SOC is present, and it may vary in space.
  • Dzyaloshinskii–Moriya interaction (DMI) and skyrmions.

In non-collinear SDFT, the density variable is replaced by the full \(2\times 2\) density matrix in spin space:

\begin{equation} n_{\alpha\beta}(\mathbf{r}) = \sum_i f_i\,\phi_i^\alpha(\mathbf{r})\left(\phi_i^\beta(\mathbf{r})\right)^*, \qquad \alpha,\beta \in \{\uparrow,\downarrow\}, \end{equation}

which is related to the magnetisation density vector \(\mathbf{m}(\mathbf{r})\) via:

\begin{equation} n_{\alpha\beta} = \frac{1}{2}\left[\rho\,\delta_{\alpha\beta} + \mathbf{m}\cdot\boldsymbol{\sigma}_{\alpha\beta}\right], \end{equation}

where \(\boldsymbol{\sigma} = (\sigma_x, \sigma_y, \sigma_z)\) are the Pauli matrices.
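The mapping between \((\rho, \mathbf{m})\) and \(n_{\alpha\beta}\), and its inverse \(\rho = {\rm Tr}\,n\), \(m_i = {\rm Tr}(\sigma_i n)\), can be checked at a single point \(\mathbf{r}\) (numbers illustrative):

```python
import numpy as np

# Pauli matrices and the 2x2 identity
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def density_matrix(rho, m):
    """n_ab = (1/2) [rho * delta_ab + m . sigma_ab] at one point r."""
    mx, my, mz = m
    return 0.5 * (rho * I2 + mx * sx + my * sy + mz * sz)

rho, m = 1.0, (0.1, 0.2, 0.3)
n = density_matrix(rho, m)
print(np.isclose(np.trace(n).real, rho))                             # True
print(np.allclose([np.trace(s @ n).real for s in (sx, sy, sz)], m))  # True
```

The traces recover \(\rho\) and \(\mathbf{m}\) exactly because the Pauli matrices are traceless and satisfy \({\rm Tr}(\sigma_i\sigma_j) = 2\delta_{ij}\).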

The KS orbitals become two-component spinors \(\boldsymbol{\phi}_i = (\phi_i^\uparrow, \phi_i^\downarrow)^T\), and the KS equations take the matrix form:

\begin{equation} \left[-\frac{1}{2}\nabla^2\,\mathbf{I} + \mathbf{V}_{\rm eff}(\mathbf{r})\right]\boldsymbol{\phi}_i(\mathbf{r}) = \epsilon_i\,\boldsymbol{\phi}_i(\mathbf{r}), \end{equation}

with the \(2\times 2\) effective potential:

\begin{equation} \mathbf{V}_{\rm eff}(\mathbf{r}) = \left[V_{\rm ext} + V_{\rm H} + V_{\rm xc}^{(0)}\right]\mathbf{I} - \mathbf{B}_{\rm xc}(\mathbf{r})\cdot\boldsymbol{\sigma}, \end{equation}

where \(V_{\rm xc}^{(0)} = \frac{1}{2}(V_{\rm xc}^\uparrow + V_{\rm xc}^\downarrow)\) is the spin-averaged XC potential and \(\mathbf{B}_{\rm xc} = \frac{1}{2}(V_{\rm xc}^\uparrow - V_{\rm xc}^\downarrow)\hat{\mathbf{m}}\) is the XC magnetic field aligned along the local magnetisation direction \(\hat{\mathbf{m}}(\mathbf{r})\).

Spin–Orbit Coupling (SOC)

Spin–orbit coupling is a relativistic effect arising from the interaction of an electron’s spin magnetic moment with the magnetic field it experiences due to its orbital motion around the nucleus. In the Pauli approximation it takes the form:

\begin{equation} \hat{H}_{\rm SOC} = \frac{\hbar^2}{4m_e^2c^2}\frac{1}{r}\frac{dV}{dr}\,\hat{\mathbf{L}}\cdot\hat{\mathbf{S}}, \end{equation}

where \(\hat{\mathbf{L}}\) and \(\hat{\mathbf{S}}\) are the orbital and spin angular momentum operators, and \(dV/dr\) is the radial derivative of the ionic potential. SOC is strong near heavy nuclei (scales as \(Z^4\)), making it important for \(4d\), \(5d\), \(4f\), and \(5f\) elements.

In SDFT, SOC is most commonly included via the second-variational method:

  1. Solve the scalar-relativistic (collinear) SDFT problem first.
  2. Construct the SOC matrix elements in the basis of scalar-relativistic eigenstates.
  3. Diagonalise the full Hamiltonian \(\hat{H}_{\rm SR} + \hat{H}_{\rm SOC}\).

Full self-consistent inclusion of SOC requires the non-collinear spinor formalism.
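A toy numerical sketch of the three steps above (the SOC block is a random small Hermitian matrix standing in for the true matrix elements \(\langle i\sigma|\hat{H}_{\rm SOC}|j\sigma'\rangle\); the eigenvalues are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
eps_sr = np.array([-1.0, -0.5, 0.2])       # scalar-relativistic eigenvalues (eV), illustrative
n = 2 * len(eps_sr)                         # each state doubles into two spinor components

h_sr = np.kron(np.diag(eps_sr), np.eye(2))  # step 1: spin-degenerate SR Hamiltonian
a = 0.01 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
h_soc = 0.5 * (a + a.conj().T)              # step 2: Hermitian SOC matrix elements (mock)
eps_full = np.linalg.eigvalsh(h_sr + h_soc) # step 3: diagonalise the full Hamiltonian

print(eps_full)   # spin degeneracy of each SR level is lifted by SOC
```

The method is accurate when the SOC matrix elements are small compared with the spacing of the scalar-relativistic levels retained in the basis.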

Magnetic Anisotropy Energy (MAE)

The magnetic anisotropy energy is the difference in total energy between magnetisation oriented along the easy axis and a hard axis:

\begin{equation} {\rm MAE} = E(\hat{\mathbf{n}}_{\rm hard}) - E(\hat{\mathbf{n}}_{\rm easy}), \end{equation}

typically measured in \(\mu\)eV/atom for \(3d\) metals or meV/atom for heavy-element compounds. MAE determines the magnetic hardness of a material: a large positive MAE (perpendicular easy axis) is required for permanent magnets and perpendicular magnetic recording media.

MAE arises entirely from spin–orbit coupling: without SOC, all directions are degenerate. It vanishes in spin-orbit-free collinear SDFT and requires either:

  • Self-consistent non-collinear calculation with SOC for large MAE (e.g. \(L1_0\)-FePt, \(> 1\) meV/atom).
  • Force theorem (perturbative SOC): For small MAE, compute the non-self-consistent energy with SOC applied to the converged scalar-relativistic density. Computationally cheaper but breaks down when SOC significantly modifies the density.

The MAE is dominated by the band energy contribution (summed KS eigenvalue difference):

\begin{equation} {\rm MAE} \approx \sum_i \left(f_i^{\rm hard}\,\epsilon_i^{\rm hard} - f_i^{\rm easy}\,\epsilon_i^{\rm easy}\right), \end{equation}

which requires very dense \(k\)-meshes to converge (fine Brillouin zone features near the Fermi level). Typical convergence requires \(40\times 40\times 40\) or finer grids for transition metal systems.

Contribution from Orbital Moments: Bruno Relation

The connection between MAE and orbital moments was derived by Bruno (1989) using second-order perturbation theory in the SOC strength \(\xi\). The SOC Hamiltonian is:

\[ \hat{H}_{\rm SOC} = \xi\,\hat{\mathbf{L}}\cdot\hat{\mathbf{S}}. \]

For a system with spin quantised along \(\hat{\mathbf{n}}\), the second-order energy correction is:

\[ E^{(2)}(\hat{\mathbf{n}}) = -\sum_{o,u} \frac{|\langle u|\hat{H}_{\rm SOC}^{(\hat{\mathbf{n}})}|o\rangle|^2}{\epsilon_u - \epsilon_o}, \]

where \(o\) and \(u\) label occupied and unoccupied states. Bruno showed that for a system with uniaxial symmetry and spin fixed along \(\hat{\mathbf{n}}\), the dominant contribution to this sum is proportional to the expectation value of the orbital angular momentum along \(\hat{\mathbf{n}}\):

\[ E^{(2)}(\hat{\mathbf{n}}) \approx -\frac{\xi}{4\mu_B}\,m_L(\hat{\mathbf{n}}), \]

where \(m_L(\hat{\mathbf{n}}) = \mu_B\langle \hat{L}_{\hat{\mathbf{n}}} \rangle\) is the orbital moment along \(\hat{\mathbf{n}}\) (with \(\langle \hat{L}_{\hat{\mathbf{n}}} \rangle\) in units of \(\hbar\)). The MAE then follows as the difference between two magnetisation directions:

\[ {\rm MAE} = E(\hat{\mathbf{n}}_{\rm hard}) - E(\hat{\mathbf{n}}_{\rm easy}) \approx -\frac{\xi}{4\mu_B}\left(m_L^{\rm hard} - m_L^{\rm easy}\right). \]

Interpretation: The easy axis is the direction along which the orbital moment is largest (most negative \(E^{(2)}\)), since a larger orbital moment means more SOC energy lowering. Materials with large orbital moments — rare-earth \(4f\) ions, \(5d\) transition metals, or low-symmetry environments that quench the orbital moment less — consequently have large MAE. This is the microscopic basis for the empirical rule that heavier elements with unquenched orbital moments make better permanent magnets.

Caveats: Bruno’s derivation assumes (i) perturbative SOC (\(\xi \ll\) bandwidth), (ii) rigid band approximation (same orbitals for both directions), and (iii) collinear spin. It is quantitatively reliable for \(3d\) metals but breaks down for \(4f\) systems where SOC is comparable to the crystal field splitting, or for systems near band crossings where the denominator \(\epsilon_u - \epsilon_o\) becomes small. In those cases the full non-perturbative calculation is required.
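As an order-of-magnitude sketch of the Bruno estimate (\(\xi\) and the orbital moments below are invented, typical-scale numbers for a \(3d\) system, not data for any specific material):

```python
XI_EV = 0.05          # SOC strength xi ~ 50 meV, a typical 3d-metal order of magnitude

def mae_bruno(mL_hard, mL_easy, xi_ev=XI_EV):
    """MAE ~ -(xi/4)(m_L^hard - m_L^easy); orbital moments in units of mu_B, result in eV."""
    return -0.25 * xi_ev * (mL_hard - mL_easy)

mae = mae_bruno(mL_hard=0.08, mL_easy=0.10)   # the easy axis carries the larger m_L
print(f"MAE ~ {1e6 * mae:.0f} micro-eV/atom")
```

Because the orbital moments enter in units of \(\mu_B\), the factor \(\mu_B\) cancels and only \(\xi/4\) times the dimensionless anisotropy of \(m_L\) remains.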

Application to Heusler Alloys

Heusler alloys (\(X_2YZ\) full Heusler or \(XYZ\) half-Heusler) are a rich playground for SDFT because they exhibit a wide range of magnetic phenomena:

Half-metallic ferromagnetism: In compounds like Co\(_2\)MnSi, one spin channel (\(\uparrow\)) is metallic while the other (\(\downarrow\)) has a band gap at \(E_F\), giving a spin polarisation of 100% in principle. SDFT with GGA predicts this correctly for many Heuslers; a scissor correction or hybrid functional is needed to accurately reproduce the minority-spin gap.

Slater–Pauling rule: The total magnetic moment per formula unit follows \(M = N_v - 24\) (full Heusler) or \(M = N_v - 18\) (half-Heusler), where \(N_v\) is the total valence electron count. This rule, reproduced by SDFT, provides a rapid screening criterion for half-metallicity.
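The rule makes screening trivial. A sketch with common valence-electron counts (the counts below are standard choices, e.g. \(3d+4s\) for the transition metals and \(3s+3p\) for Si and Al):

```python
# Slater-Pauling screening for full Heuslers X2YZ: M = N_v - 24 (in mu_B per f.u.)
valence = {"Co": 9, "Mn": 7, "Fe": 8, "Si": 4, "Al": 3, "Ga": 3}

def slater_pauling_full_heusler(x, y, z):
    n_v = 2 * valence[x] + valence[y] + valence[z]
    return n_v - 24

print(slater_pauling_full_heusler("Co", "Mn", "Si"))   # 5: the well-known 5 mu_B of Co2MnSi
```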

Exchange coupling and \(T_C\): The Curie temperature can be estimated from \(J_{ij}\) values via mean-field theory (\(k_B T_C^{\rm MF} = \frac{2}{3}J_0\)) or random phase approximation (RPA), with RPA giving more accurate results for itinerant magnets. SDFT+LKAG is the standard approach.

MAE in Heuslers: Most cubic Heuslers have small MAE (\(\sim \mu\)eV/atom) due to cubic symmetry. Tetragonally distorted Heuslers (e.g. Mn\(_3\)Ga) or inverse Heuslers can have significantly larger MAE, relevant for spin-transfer torque applications.
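The Slater–Pauling count above lends itself to rapid automated screening. A minimal sketch (the valence-electron counts in the table are the standard \(s+d\) (\(+p\)) counts and are assumptions of this example, not taken from a database):

```python
# Slater-Pauling screening: predicted moment per formula unit (in mu_B)
# from the total valence electron count, M = N_v - 24 (full Heusler)
# or M = N_v - 18 (half Heusler).

VALENCE = {"Co": 9, "Mn": 7, "Ni": 10, "Fe": 8,
           "Si": 4, "Ge": 4, "Sb": 5, "Al": 3, "Ga": 3}

def slater_pauling_moment(elements, full_heusler=True):
    """Predicted moment in mu_B for a full or half Heusler composition."""
    n_v = sum(VALENCE[el] for el in elements)
    return n_v - (24 if full_heusler else 18)

# Co2MnSi (full Heusler): N_v = 2*9 + 7 + 4 = 29  ->  M = 5 mu_B
print(slater_pauling_moment(["Co", "Co", "Mn", "Si"]))                # 5
# NiMnSb (half Heusler): N_v = 10 + 7 + 5 = 22    ->  M = 4 mu_B
print(slater_pauling_moment(["Ni", "Mn", "Sb"], full_heusler=False))  # 4
```

Both predictions match the integer moments expected for half-metals; a non-integer DFT moment is a quick flag that half-metallicity is lost.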

Practical Checklist for Magnetic DFT Calculations

  1. Initialise spin correctly: Supply a non-zero initial magnetic moment to each magnetic site; the SCF loop will not spontaneously break spin symmetry from a non-magnetic starting point.
  2. Check for multiple magnetic minima: Try several initial configurations (FM, AFM, ferrimagnetic) and compare final energies. Collinear SDFT can get trapped in local minima.
  3. Use appropriate XC: PBE is standard; consider PBE+U or hybrid functionals for correlated \(d/f\) systems where GGA under-localises the magnetic orbitals.
  4. For MAE: Use a non-collinear calculation with SOC; converge the \(k\)-mesh carefully (MAE is extremely sensitive to \(k\)-point density).
  5. For \(J\): Cross-check total-energy and Green’s function methods; the two should agree within \(\sim 10\)–\(20\%\).
  6. Report orbital moments: When SOC is included, report both spin and orbital magnetic moments for comparison with XMCD experiments.

Outlook

Collinear SDFT provides an accurate and efficient framework for most magnetic materials, with non-collinear SDFT required for frustrated systems, SOC-driven anisotropy, and DMI. A systematic limitation shared by all SDFT methods is the treatment of strongly correlated \(d\) and \(f\) electrons, where the mean-field XC potential fails to capture Mott–Hubbard physics. The DFT+U method, which introduces an on-site Hubbard correction for localised orbitals, is the subject of the next chapter.

DFT+U: Hubbard Correction for Correlated Electrons

Standard DFT — whether LDA or GGA — describes electrons as moving in an effective single-particle potential constructed from the average charge density. This mean-field picture works well when electrons are itinerant, delocalised across the crystal in broad bands. It fails systematically for materials containing partially filled \(d\) or \(f\) shells, where electrons are spatially localised and strongly repel each other on the same atomic site. The canonical failures are well-known: GGA predicts NiO, CoO, and MnO to be metals; their experimental ground states are magnetic insulators with gaps of \(3\)–\(4\) eV. Rare-earth oxides, actinide compounds, and many Heusler alloys with localised Mn or Co \(d\) states share the same pathology.

The root cause is the self-interaction error (SIE) and the failure of the semi-local XC functional to reproduce the Coulomb blockade — the large energy cost \(U\) of placing two electrons on the same atomic orbital simultaneously. The DFT+U method (Anisimov, Zaanen, Andersen, 1991) corrects this by appending a Hubbard-like penalty term to the DFT energy functional, adding the cost of double occupation explicitly for a chosen set of localised orbitals. It is the most widely used method for strongly correlated materials in a plane-wave or PAW framework.

The Hubbard Model: Physical Motivation

The Hubbard model (Hubbard, 1963) is the minimal lattice model capturing the competition between kinetic energy (electron hopping, bandwidth \(W\)) and on-site Coulomb repulsion:

\begin{equation} \hat{H}_{\rm Hub} = -t\sum_{\langle i,j\rangle,\sigma}\hat{c}_{i\sigma}^\dagger\hat{c}_{j\sigma} + U\sum_i \hat{n}_{i\uparrow}\hat{n}_{i\downarrow}, \label{eq:Hubbard} \end{equation}

where \(\hat{c}_{i\sigma}^\dagger\) creates an electron of spin \(\sigma\) on site \(i\), \(t\) is the nearest-neighbour hopping integral, \(U\) is the on-site Coulomb repulsion, and \(\hat{n}_{i\sigma} = \hat{c}_{i\sigma}^\dagger\hat{c}_{i\sigma}\) is the occupation operator.

The two limits are illuminating:

  • \(U/W \ll 1\) (itinerant limit): The bandwidth \(W \sim zt\) (where \(z\) is the coordination number) dominates. Electrons delocalise and form a metallic band. Standard DFT handles this regime well.

  • \(U/W \gg 1\) (Mott–Hubbard limit): The on-site repulsion dominates. For a half-filled band (one electron per site), double occupancy costs energy \(U\), so electrons localise on individual sites and the system becomes a Mott insulator, even though band theory would predict a metal. DFT with semi-local XC functionals fails here because the mean-field treatment of electron–electron interaction cannot produce this correlation-driven localisation.

The Mott–Hubbard gap is approximately \(U - W\): it opens when the repulsion energy exceeds the kinetic energy gain from delocalisation. For transition metal oxides, \(U \sim 4\)–\(10\) eV and \(W \sim 1\)–\(4\) eV, placing them firmly in the correlated regime.

Why Standard DFT Fails for Correlated Systems

The failure of LDA/GGA for Mott insulators can be traced to two related deficiencies:

1. Self-interaction error (SIE). As discussed in Chapter 4, approximate XC functionals do not fully cancel the spurious self-repulsion of an electron with itself. For a localised \(d\) orbital occupied by a single electron, the SIE artificially delocalises the orbital, lowering its energy and pushing it into the valence band. The tendency of LDA/GGA to over-delocalise \(d\) and \(f\) electrons is a direct consequence.

2. Missing derivative discontinuity. The exact XC potential has a finite discontinuity at integer electron numbers \(N\):

\begin{equation} \Delta_{\rm xc} = V_{\rm xc}^+(N) - V_{\rm xc}^-(N) > 0, \label{eq:disc} \end{equation}

where \(V_{\rm xc}^\pm\) denote the XC potential just above and below integer \(N\). This discontinuity, combined with the KS band gap, gives the true quasiparticle gap: \(E_g^{\rm true} = E_g^{\rm KS} + \Delta_{\rm xc}\). Semi-local functionals give \(\Delta_{\rm xc} = 0\) exactly, so they systematically underestimate gaps. For Mott insulators the KS gap is zero (they are predicted metallic), and \(\Delta_{\rm xc}\) accounts for the entire physical gap.

DFT+U corrects both deficiencies for the targeted orbitals: the Hubbard \(U\) term penalises fractional occupations and introduces a step-like potential that mimics the derivative discontinuity.

The DFT+U Energy Functional

Occupation Matrix

The central object in DFT+U is the occupation matrix \(n_{mm'}^{I\sigma}\) of the correlated subspace on each atom \(I\):

\begin{equation} n_{mm'}^{I\sigma} = \sum_{\mathbf{k},\nu} f_{\mathbf{k}\nu\sigma} \langle\phi_{\mathbf{k}\nu\sigma}|\chi_{m'}^{I}\rangle \langle\chi_m^{I}|\phi_{\mathbf{k}\nu\sigma}\rangle, \label{eq:occmat} \end{equation}

where \(f_{\mathbf{k}\nu\sigma}\) are KS orbital occupation numbers, \(|\phi_{\mathbf{k}\nu\sigma}\rangle\) are the KS spinor states, and \(\hat{P}_m^I = |\chi_m^I\rangle\langle\chi_m^I|\) is the projector onto the localised orbital \(|\chi_m^I\rangle\) (e.g. the \(d_{m}\) orbital on atom \(I\), with \(m = -l, \ldots, l\)). In PAW, these projectors are the on-site partial waves; in LCAO codes, they are the basis functions themselves.

The occupation matrix is Hermitian (\(n_{mm'}^{I\sigma} = (n_{m'm}^{I\sigma})^*\)) and its trace gives the total occupation of spin channel \(\sigma\) on site \(I\):

\begin{equation} N^{I\sigma} = \sum_m n_{mm}^{I\sigma} = \mathrm{Tr}[\mathbf{n}^{I\sigma}]. \end{equation}

The eigenvalues \(\{f_\alpha^{I\sigma}\}\) of \(\mathbf{n}^{I\sigma}\) lie in \([0,1]\) and are the natural orbital occupation numbers of the correlated subspace.
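These properties are easy to verify numerically. A short sketch with an illustrative (made-up) \(5\times 5\) \(d\)-shell occupation matrix for one spin channel:

```python
import numpy as np

# Illustrative d-shell occupation matrix (5x5, one spin channel):
# diagonal occupations plus one small off-diagonal coherence.
n = np.diag([1.0, 1.0, 0.9, 0.1, 0.0]).astype(complex)
n[2, 3] = n[3, 2] = 0.2  # Hermitian off-diagonal element

assert np.allclose(n, n.conj().T)   # Hermiticity
N_sigma = np.trace(n).real          # total occupation of this spin channel
f_alpha = np.linalg.eigvalsh(n)     # natural orbital occupation numbers

print(N_sigma)                                             # 3.0
print(bool(np.all((f_alpha > -1e-9) & (f_alpha < 1 + 1e-9))))  # True
```

The off-diagonal element mixes two orbitals but leaves the trace, and hence \(N^{I\sigma}\), unchanged.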

The Full DFT+U Functional

The total DFT+U energy is:

\begin{equation} E_{\rm DFT+U} = E_{\rm DFT}[\rho] + E_U[\{n_{mm'}^{I\sigma}\}] - E_{\rm dc}[\{N^{I\sigma}\}], \label{eq:DFTU-total} \end{equation}

where \(E_{\rm DFT}\) is the standard LDA or GGA energy, \(E_U\) is the Hubbard correction, and \(E_{\rm dc}\) is the double-counting correction — a term that must be subtracted because the Coulomb interaction among the correlated electrons is already partially included in the DFT XC energy and must not be counted twice.

Two Formulations

Liechtenstein Formulation (Full Rotationally Invariant)

The Liechtenstein formulation (Liechtenstein, Anisimov, Zaanen, 1995) uses the full rotationally invariant Coulomb interaction characterised by two parameters: the average on-site Coulomb repulsion \(U\) and the average exchange interaction \(J\):

\begin{equation} E_U^{\rm Lich} = \frac{1}{2}\sum_{I}\sum_{\sigma\sigma'}\sum_{\{m\}} \left[U_{m_1m_3}^{m_2m_4} - \delta_{\sigma\sigma'}J_{m_1m_3}^{m_2m_4}\right] n_{m_1m_2}^{I\sigma}\, n_{m_3m_4}^{I\sigma'}, \end{equation}

where \(U_{m_1m_3}^{m_2m_4}\) and \(J_{m_1m_3}^{m_2m_4}\) are the screened Slater integrals parametrising the full Coulomb tensor. In practice these are reduced to a small number of radial integrals \(F^0, F^2, F^4\) (for \(d\) orbitals) related to \(U\) and \(J\) by:

\begin{equation} U = F^0, \qquad J = \frac{F^2 + F^4}{14}. \end{equation}

The double-counting correction in the Liechtenstein scheme conventionally uses the fully localised limit (FLL) form:

\begin{equation} E_{\rm dc}^{\rm AMF} = \frac{U}{2}N(N-1) - \frac{J}{2}\sum_\sigma N^\sigma(N^\sigma - 1), \end{equation}

where \(N = \sum_\sigma N^\sigma\) is the total occupation of the correlated shell.

Dudarev Formulation (Simplified, \(U_{\rm eff}\))

For most practical purposes, the simpler Dudarev formulation (Dudarev et al., 1998) is preferred. It collapses \(U\) and \(J\) into a single effective parameter \(U_{\rm eff} = U - J\) and writes the correction as a penalty on fractional orbital occupations:

\begin{equation} \boxed{E_U^{\rm Dur} = \frac{U_{\rm eff}}{2}\sum_{I,\sigma} \mathrm{Tr}\left[\mathbf{n}^{I\sigma}\left(\mathbf{1} - \mathbf{n}^{I\sigma}\right)\right].} \label{eq:Dudarev} \end{equation}

This elegant form has a transparent physical interpretation: the factor \(n_{mm}^{I\sigma}(1 - n_{mm}^{I\sigma})\) vanishes when orbital \(m\) is either fully occupied (\(n = 1\)) or fully empty (\(n = 0\)), and is maximised at \(n = 1/2\) (half occupation). The correction thus adds a penalty for fractional occupations, driving the occupation matrix towards integer eigenvalues. This is precisely the correction for the SIE, which tends to stabilise unphysical non-integer occupations.
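The penalty structure of the Dudarev functional can be checked directly; a minimal sketch (the occupation matrices and \(U_{\rm eff}\) value are illustrative):

```python
import numpy as np

def dudarev_energy(n, U_eff):
    """Hubbard penalty (U_eff/2) Tr[n(1 - n)] for one site and spin channel."""
    return 0.5 * U_eff * np.trace(n @ (np.eye(len(n)) - n)).real

U_eff = 4.0  # eV, illustrative

n_integer = np.diag([1.0, 1.0, 1.0, 0.0, 0.0])  # integer occupations: no penalty
n_half    = 0.5 * np.eye(5)                     # half occupation: maximal penalty

print(dudarev_energy(n_integer, U_eff))  # 0.0
print(dudarev_energy(n_half, U_eff))     # 0.5 * 4 * 5 * 0.25 = 2.5 eV
```

The penalty vanishes exactly for integer occupations and grows as occupations move towards \(1/2\), which is the step-like behaviour that mimics the derivative discontinuity.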

The corresponding correction to the KS potential is:

\begin{equation} V_{mm’}^{I\sigma,\rm DFT+U} = \frac{U_{\rm eff}}{2}\left(\delta_{mm’} - 2n_{mm’}^{I\sigma}\right), \label{eq:Vpot} \end{equation}

which shifts occupied orbitals (large \(n_{mm}\)) downward in energy by \(-U_{\rm eff}/2\) and empty orbitals (small \(n_{mm}\)) upward by \(+U_{\rm eff}/2\). The net effect is to open or widen the gap between occupied and empty states in the correlated subspace by approximately \(U_{\rm eff}\), as intended.

The double-counting in Dudarev is implicit: the functional is constructed so that the \(E_U - E_{\rm dc}\) combination reduces exactly to equation \eqref{eq:Dudarev}, with the double-counting absorbed algebraically. No separate choice of double-counting scheme is needed.

Comparison of Formulations and Equivalence Conditions

| Feature | Liechtenstein | Dudarev |
|---|---|---|
| Parameters | \(U\) and \(J\) separately | \(U_{\rm eff} = U - J\) only |
| Rotational invariance | Full | Approximate |
| Orbital polarisation | Captured | Not captured |
| Double-counting | AMF or FLL, explicit choice | Implicit, absorbed |
| Typical use | \(f\)-electron systems, full anisotropy | \(d\)-electron oxides, most practical work |
| Implementation in VASP | `LDAUTYPE = 1` | `LDAUTYPE = 2` (default) |

Conditions under which Dudarev and Liechtenstein are equivalent. The Dudarev functional is obtained from the Liechtenstein functional by making two approximations:

  1. Isotropic (spherically averaged) Coulomb interaction: replace the full Slater integrals \(U_{m_1m_3}^{m_2m_4}\) by their spherical average \(U\,\delta_{m_1m_2}\delta_{m_3m_4} - J\,\delta_{m_1m_4}\delta_{m_2m_3}\).

  2. FLL double-counting with \(J = 0\) in the double-counting term: when the FLL double-counting correction is applied and the exchange term \(J\) is absorbed into \(U_{\rm eff}\), the Liechtenstein energy reduces exactly to the Dudarev form.

Concretely, inserting the spherically averaged interaction into the Liechtenstein energy and subtracting the FLL double counting gives:

\[ E_U^{\rm Lich,spherical} - E_{\rm dc}^{\rm FLL} = \frac{U-J}{2}\sum_{I,\sigma}\mathrm{Tr}\left[\mathbf{n}^{I\sigma}(\mathbf{1}-\mathbf{n}^{I\sigma})\right] \equiv E_U^{\rm Dur}. \]

The two formulations therefore give identical results when: (a) the occupation matrix \(\mathbf{n}^{I\sigma}\) is diagonal (no off-diagonal orbital coherences), and (b) the Slater integrals satisfy the spherical approximation \(F^2/F^4 = 14/9\) (exact for a hydrogen-like atom, approximately satisfied for \(3d\) transition metals). They differ when orbital polarisation within the correlated shell is important — i.e. when different \(m\) orbitals have different occupations that couple through the full Coulomb tensor. This is the regime where the Liechtenstein formulation is necessary: rare-earth \(f\) systems, orbitally ordered manganites, and systems near the Mott transition where the orbital degree of freedom is active.

The Double-Counting Problem

The double-counting term \(E_{\rm dc}\) in equation \eqref{eq:DFTU-total} has no unique exact form because the DFT XC energy does not separate cleanly into contributions from the correlated subspace and the rest. Two schemes are in common use:

Fully localised limit (FLL), also called the atomic limit:

\begin{equation} E_{\rm dc}^{\rm FLL} = \frac{U}{2}N(N-1) - \frac{J}{2}\left[N^\uparrow\left(N^\uparrow-1\right) + N^\downarrow\left(N^\downarrow-1\right)\right], \label{eq:FLL} \end{equation}

valid when the correlated orbital is nearly fully occupied or nearly empty (close to the atomic limit). This is appropriate for \(f\)-electron systems and late transition metal oxides.

Around mean-field (AMF):

\begin{equation} E_{\rm dc}^{\rm AMF} = \frac{U}{2}\bar{n}(N-1) - \frac{J}{2}\sum_\sigma\bar{n}^\sigma(N^\sigma - 1), \end{equation}

where \(\bar{n} = N/(2l+1)\) is the average occupation per orbital. This is appropriate when the occupations are close to the average (near the itinerant limit).

The FLL correction gives a larger energy shift for occupied orbitals and is the standard choice in most codes. However, the double-counting error is a fundamental limitation of DFT+U: since it cannot be derived from first principles, it introduces an uncontrolled approximation that depends on how much of \(U\) is already captured in the DFT functional. This is one motivation for methods like DFT+DMFT, which treat the correlated subspace exactly.

Determining \(U\)

The Hubbard \(U\) parameter is not a unique material property — it depends on the DFT functional, the choice of localised basis, and the degree of screening in the material. Three approaches are used in practice.

Empirical Fitting

The simplest approach is to choose \(U_{\rm eff}\) to reproduce an experimentally known quantity — typically the band gap, lattice constant, or magnetic moment. For NiO, \(U_{\rm eff} \approx 5\)–\(6\) eV reproduces the experimental gap of \(\sim 4\) eV; for FeO, \(U_{\rm eff} \approx 4\) eV. Empirical \(U\) values are system-specific and should not be transferred between materials without validation.

A compilation of commonly used values:

| Material / Ion | Orbital | \(U_{\rm eff}\) (eV) | Fitted property |
|---|---|---|---|
| Fe\(^{2+/3+}\) in oxides | \(3d\) | \(3.5\)–\(5.0\) | Gap, moment |
| Co\(^{2+/3+}\) | \(3d\) | \(3.0\)–\(5.5\) | Gap |
| Ni\(^{2+}\) | \(3d\) | \(5.0\)–\(6.5\) | Gap |
| Mn\(^{2+/3+/4+}\) | \(3d\) | \(3.0\)–\(4.5\) | Gap, structure |
| Cu\(^{2+}\) | \(3d\) | \(6.0\)–\(9.0\) | Gap |
| Ce\(^{3+/4+}\) | \(4f\) | \(5.0\)–\(7.0\) | \(f\) occupancy |
| U\(^{4+/5+}\) | \(5f\) | \(2.0\)–\(4.5\) | Structure, gap |

Linear Response (Cococcioni–de Gironcoli)

A first-principles approach computes \(U\) from the linear response of the occupation matrix to a localised perturbation potential \(\alpha^I\) applied to site \(I\) (Cococcioni and de Gironcoli, 2005):

\begin{equation} U^I = \left(\chi_0^{-1} - \chi^{-1}\right)^{II}, \label{eq:LR-U} \end{equation}

where \(\chi_0^{II} = \partial N^I/\partial\alpha^I|_{\rm bare}\) is the bare (non-self-consistent) response and \(\chi^{II} = \partial N^I/\partial\alpha^I|_{\rm SCF}\) is the fully self-consistent response. The difference captures the screening of the bare Coulomb interaction by all other electrons. The parameter \(U\) computed this way is self-consistent with the chosen DFT functional and projector, requiring no experimental input.

The procedure is: (1) apply a small perturbation \(\pm\alpha\) to the Hubbard projector of site \(I\); (2) compute the SCF and non-SCF changes in \(N^I\); (3) extract \(U\) from equation \eqref{eq:LR-U}. This is available as a post-processing step in Quantum ESPRESSO (hp.x) and can be scripted in VASP.
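The extraction step reduces to two finite differences. A minimal sketch, with made-up occupation numbers standing in for the outputs of the perturbed calculations:

```python
# Linear-response U (Cococcioni-de Gironcoli): U = 1/chi0 - 1/chi, where
# chi0 and chi are the bare and self-consistent responses dN/dalpha.
# The occupation numbers below are illustrative, not from a real calculation.

alphas = [-0.05, +0.05]   # perturbation strengths applied to the projector (eV)
N_bare = [8.10, 7.90]     # d occupation after one non-SCF step
N_scf  = [8.01, 7.99]     # d occupation at full self-consistency (screened)

d_alpha = alphas[1] - alphas[0]
chi0 = (N_bare[1] - N_bare[0]) / d_alpha   # bare response: -2.0 eV^-1
chi  = (N_scf[1]  - N_scf[0])  / d_alpha   # screened response: -0.2 eV^-1

U = 1.0 / chi0 - 1.0 / chi
print(round(U, 3))  # 4.5 eV
```

Note that the screened response \(\chi\) is much smaller in magnitude than \(\chi_0\), which is what produces a large positive \(U\).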

Limitations: Linear response \(U\) values tend to be smaller than empirically fitted values because they include screening by itinerant valence electrons. The result also depends on the choice of projectors and may differ between PAW and LCAO implementations.

Constrained Random Phase Approximation (cRPA)

The most rigorous approach is the constrained RPA (Aryasetiawan et al., 2004), which computes the partially screened Coulomb interaction \(W(\omega)\) by excluding the screening from the correlated subspace itself (to avoid double counting):

\begin{equation} U(\omega) = \langle\chi_m\chi_{m'}|W_r(\omega)|\chi_m\chi_{m'}\rangle, \end{equation}

where \(W_r\) is the RPA-screened interaction with the polarisation of the correlated \(d/f\) subspace removed. cRPA gives frequency-dependent \(U(\omega)\); the static limit \(U(0)\) is used in DFT+U, while the full frequency dependence feeds into DFT+DMFT. cRPA requires a GW-like calculation and is available in VASP (ALGO = CRPA) and the WIEN2k+DMFT interface.

Effect of \(U\) on the Electronic Structure

The potential correction \eqref{eq:Vpot} produces several characteristic changes to the electronic structure that can be verified against experiment:

Band gap opening. Occupied \(d/f\) states are shifted down by \(\sim U_{\rm eff}/2\) and empty states up by \(\sim U_{\rm eff}/2\), opening a gap of approximately \(U_{\rm eff}\) in the correlated subspace. For a Mott insulator like NiO, DFT gives a metallic state while DFT+U with \(U_{\rm eff} \approx 5\) eV correctly gives an insulating gap \(\sim 3\)–\(4\) eV.

Orbital polarisation. The integer-driving nature of DFT+U can lift orbital degeneracy via orbital ordering — a spontaneous symmetry breaking in which one particular \(d\) orbital (e.g. \(d_{z^2}\) vs. \(d_{x^2-y^2}\)) becomes preferentially occupied. This is relevant for manganites and other Jahn–Teller active systems.

Magnetic moments. By localising the \(d/f\) electrons, DFT+U increases the local spin moment towards the ionic limit. For Fe\(^{3+}\) in an oxide (half-filled \(3d\), \(S = 5/2\)), GGA gives moments of \(\sim 3.5\,\mu_B\); DFT+U with appropriate \(U\) gives \(\sim 4.2\,\mu_B\), closer to the ionic value of \(5.0\,\mu_B\).

Exchange coupling \(J\). The exchange coupling constants extracted from DFT+U are generally more reliable than those from GGA for Mott insulators, because the localised occupations improve the description of the superexchange mechanism. For half-metallic Heusler alloys, DFT+U may be needed when one or more magnetic sublattices contain strongly correlated \(d\) states.

Lattice parameters and forces. The occupation-dependent potential modifies the forces on ions, generally increasing bond lengths for correlated oxides by \(\sim 0.5\)–\(2\%\) towards experiment.

Multiple Correlated Sites and Inter-Site \(U\)

When a material contains more than one type of correlated atom (e.g. both Fe and Co in a double perovskite, or both a \(3d\) transition metal and a \(4f\) rare earth), a separate \(U\) value must be applied to each correlated subspace.
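In VASP, for example, this is done with per-species arrays in the INCAR. A sketch for a hypothetical Fe/Ce oxide with species ordered Fe, Ce, O in the POSCAR; the \(U\) and \(J\) values are placeholders, not recommendations:

```
LDAU     = .TRUE.
LDAUTYPE = 2            # Dudarev formulation (U_eff = U - J)
LDAUL    =  2   3  -1   # l of the correlated shell per species (-1 = none)
LDAUU    = 4.0 6.0 0.0  # U per species (eV)
LDAUJ    = 0.0 0.0 0.0  # J per species (eV); absorbed into U_eff for LDAUTYPE = 2
```

Each entry in the arrays corresponds to one species in POSCAR order, so oxygen receives no correction here.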

SCF Convergence and Density Mixing

The Part II overview described the SCF cycle as a sequence of six steps and catalogued the common failure modes. This chapter addresses the most fundamental of those steps: the density update — how the output density from one KS cycle is combined with the input density to produce the next iterate, and under what conditions this process converges.

Consider a concrete example. A spin-polarised calculation on a \(4 \times 4 \times 4\) supercell of bcc Fe (\(128\) atoms, \(\sim 50,000\) plane waves per \(\mathbf{k}\)-point) will not converge with naive density iteration at any mixing parameter. The total energy oscillates indefinitely, swinging by several eV per iteration, because long-wavelength charge fluctuations slosh back and forth across the metallic Fermi surface with no restoring force. The same system converges smoothly in \(\sim 25\) iterations once Kerker preconditioning and Pulay mixing are applied. The difference is not a matter of patience — it is a qualitative change from divergence to convergence, rooted in the dielectric response of the electron gas.

We begin by recasting the SCF cycle as a fixed-point problem and deriving the convergence condition from the Jacobian of the KS map. The charge-sloshing instability in metals emerges naturally from the Lindhard dielectric function. With the problem precisely formulated, we derive the main mixing schemes — linear, Kerker-preconditioned, Pulay/DIIS, and Broyden — as progressively more sophisticated strategies for reaching the fixed point.

The Self-Consistency Problem Revisited

The Fixed-Point Formulation

The SCF cycle of Chapter 3 (Section 3.3) consists of three computational steps: constructing \(V_{\rm eff}[\rho]\) from equation (3.xx), diagonalising \(\hat{h}_{\rm KS}\), and assembling \(\rho_{\rm out} = \sum_i f_i |\phi_i|^2\). We now recast this cycle as a fixed-point problem whose convergence properties can be analysed rigorously. Define the map \(F\) that takes an input density \(\rho_{\rm in}\) and returns the output density obtained by one full KS cycle:

\begin{equation}     \rho_{\rm in} \;\xrightarrow{V_{\rm eff}[\rho_{\rm in}]}\;     \hat{h}_{\rm KS} \;\xrightarrow{\text{diagonalise}}\;     \{\phi_i, \epsilon_i\} \;\xrightarrow{\sum_i f_i|\phi_i|^2}\;     \rho_{\rm out} = F[\rho_{\rm in}].     \label{eq:F-map} \end{equation}

Self-consistency means finding the density \(\rho^*\) that is a fixed point of \(F\):

\begin{equation}     \boxed{F[\rho^*] = \rho^*.}     \label{eq:fixed-point} \end{equation}

The residual at iteration \(n\) measures the distance from self-consistency:

\begin{equation}     R^{(n)} = F[\rho_{\rm in}^{(n)}] - \rho_{\rm in}^{(n)} = \rho_{\rm out}^{(n)} - \rho_{\rm in}^{(n)}.     \label{eq:residual} \end{equation}

All mixing schemes can be understood as strategies to drive \(|R^{(n)}| \to 0\) as efficiently as possible. The simplest strategy — feeding the output density directly back as the next input, \(\rho_{\rm in}^{(n+1)} = \rho_{\rm out}^{(n)}\) — corresponds to a naive fixed-point iteration. Whether this converges depends on the properties of \(F\).

The Jacobian and Convergence Conditions

Linearising \(F\) around the fixed point \(\rho^*\), the residual evolves as:

\begin{equation}     R^{(n+1)} \approx J \cdot R^{(n)}, \qquad J_{ij} = \left.\frac{\delta F[\rho]_i}{\delta \rho_j}\right|_{\rho = \rho^*},     \label{eq:jacobian} \end{equation}

where \(J\) is the Jacobian of the map \(F\), and the indices \(i, j\) run over the discretisation of the density (e.g., plane-wave coefficients \(\rho(\mathbf{G})\)). The iteration converges if and only if the spectral radius of \(J\) is less than unity:

\begin{equation}     \boxed{|\lambda_{\rm max}(J)| \lt 1,}     \label{eq:convergence-condition} \end{equation}

where \(\lambda_{\rm max}\) is the eigenvalue of \(J\) with the largest absolute value. When this condition is violated, the iteration diverges: small perturbations in the density are amplified rather than damped by successive KS cycles.
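The spectral-radius condition is easy to see in a toy linearised model where \(J\) is taken diagonal; a sketch:

```python
import numpy as np

def iterate_residual(J, R0, n_steps):
    """Evolve R^(n+1) = J R^(n) for a linearised KS map; return final norm."""
    R = R0.copy()
    for _ in range(n_steps):
        R = J @ R
    return np.linalg.norm(R)

R0 = np.ones(2)
J_stable   = np.diag([0.5, 0.3])   # spectral radius 0.5 < 1: residual decays
J_unstable = np.diag([1.05, 0.3])  # spectral radius 1.05 > 1: residual grows

print(iterate_residual(J_stable, R0, 50) < 1e-10)   # True
print(iterate_residual(J_unstable, R0, 50) > 1.0)   # True (1.05^50 ~ 11.5)
```

Only the largest eigenvalue matters asymptotically: one unstable mode is enough to make the whole iteration diverge.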

Charge Sloshing: Why Metals Are Difficult

The physical content of the Jacobian becomes transparent in reciprocal space. For a homogeneous electron gas, the density response to a perturbation of wavevector \(\mathbf{G}\) is governed by the Lindhard dielectric function:

\begin{equation}     \varepsilon(\mathbf{G}) = 1 - \frac{4\pi e^2}{G^2}\chi_0(\mathbf{G}),     \label{eq:lindhard} \end{equation}

where \(\chi_0(\mathbf{G})\) is the non-interacting density–density response function of the KS system. The KS map \(F\) relates the output density perturbation to the input perturbation through the inverse dielectric function: \(\delta\rho_{\rm out}(\mathbf{G}) = \varepsilon^{-1}(\mathbf{G})\,\delta\rho_{\rm in}(\mathbf{G})\). For the Jacobian, this means:

\begin{equation}     J(\mathbf{G}) \approx 1 - \varepsilon^{-1}(\mathbf{G}).     \label{eq:J-dielectric} \end{equation}

For an insulator, there is a finite gap \(\Delta\) in the spectrum. The response function \(\chi_0\) remains bounded as \(G \to 0\), the dielectric constant \(\varepsilon(0)\) is finite (typically \(5\)–\(20\)), and all eigenvalues of \(J\) satisfy \(|\lambda| \lt 1\). The naive iteration converges, often rapidly.

For a metal, the situation is qualitatively different. The density of states at the Fermi level is finite, so \(\chi_0(\mathbf{G}) \to -N(\epsilon_F)\) as \(G \to 0\) (the static Lindhard limit). The dielectric function diverges as:

\begin{equation}     \varepsilon(\mathbf{G}) \approx 1 + \frac{k_{\rm TF}^2}{G^2}, \qquad G \to 0,     \label{eq:eps-metal} \end{equation}

where \(k_{\rm TF} = \sqrt{4\pi e^2 N(\epsilon_F)}\) is the Thomas–Fermi screening wavevector. This means \(\varepsilon^{-1}(\mathbf{G}) \to 0\) for small \(G\), and from equation \eqref{eq:J-dielectric}:

\begin{equation}     J(\mathbf{G}) \to 1 \quad \text{as} \quad G \to 0 \quad \text{(metals)}.     \label{eq:charge-sloshing} \end{equation}

The Jacobian has eigenvalues arbitrarily close to \(1\) for long-wavelength density fluctuations. These modes are neither damped nor amplified — they slosh back and forth between iterations without converging. This is the charge-sloshing instability, the dominant convergence problem in metallic DFT calculations. Physically, long-wavelength charge fluctuations in a metal cost very little energy (they are screened only weakly at long range), so the SCF cycle has no restoring force to drive them toward the self-consistent solution.

The key insight is that the convergence difficulty is wavevector-dependent: small-\(G\) components of the density converge slowly or not at all, while large-\(G\) components (short wavelength, within the screening cloud) converge quickly. An effective mixing scheme must therefore treat different Fourier components differently.

To make this concrete: for fcc Al (a simple metal with \(k_{\rm TF} \approx 1.6\,\)Å\(^{-1}\)), the Jacobian eigenvalue for the smallest accessible \(\mathbf{G}\) in a conventional cell is \(\lambda \approx 0.97\). With simple mixing at \(\alpha = 0.3\), the convergence rate for this mode is \(|1 - 0.3(1 - 0.97)| = 0.991\) — meaning each iteration reduces the error by less than \(1\%\). Reaching \(10^{-6}\) convergence in this mode alone would require \(\sim 1500\) iterations. For bcc Fe with a denser Fermi surface, the situation is worse. By contrast, an insulator like MgO has \(\varepsilon(0) \approx 10\), giving \(\lambda \approx 0.9\) for the slowest mode and convergence in \(\sim 50\) iterations even with simple mixing — and much faster with Pulay extrapolation.
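These estimates follow from the per-mode error factor \(\mu = 1 - \alpha(1-\lambda)\); a sketch reproducing the fcc Al numbers:

```python
import math

def mixing_rate(lam, alpha):
    """Per-iteration error factor for a Jacobian eigenvalue lam under linear mixing."""
    return abs(1.0 - alpha * (1.0 - lam))

def iterations_to(tol, rate):
    """Iterations needed to shrink the residual by a factor tol."""
    return math.ceil(math.log(tol) / math.log(rate))

mu_Al = mixing_rate(0.97, 0.3)     # slow charge-sloshing mode in fcc Al
print(round(mu_Al, 6))             # 0.991
print(iterations_to(1e-6, mu_Al))  # ~1.5e3 iterations for this mode alone
```

The same two functions applied to the MgO numbers (\(\lambda \approx 0.9\), larger \(\alpha\)) give the far smaller iteration counts quoted above.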

Density Mixing Schemes

With the fixed-point problem and its convergence difficulties established, we now derive the main mixing strategies used in practice. Each can be understood as a progressively more sophisticated approach to the same problem: find the fixed point \(\rho^* = F[\rho^*]\) given a sequence of input–output pairs \((\rho_{\rm in}^{(n)}, \rho_{\rm out}^{(n)})\).

Simple (Linear) Mixing

The simplest improvement over naive iteration is to mix the input and output densities:

\begin{equation}     \boxed{\rho_{\rm in}^{(n+1)} = \rho_{\rm in}^{(n)} + \alpha\, R^{(n)},}     \label{eq:linear-mixing} \end{equation}

where \(\alpha \in (0, 1]\) is the mixing parameter and \(R^{(n)} = \rho_{\rm out}^{(n)} - \rho_{\rm in}^{(n)}\) is the residual. Setting \(\alpha = 1\) recovers naive iteration; smaller \(\alpha\) damps the update.

Linearising around the fixed point, the residual evolves as:

\begin{equation}     R^{(n+1)} = [1 - \alpha(1 - J)]\,R^{(n)}.     \label{eq:linear-residual} \end{equation}

The effective iteration matrix is \(M = 1 - \alpha(1 - J)\), with eigenvalues \(\mu = 1 - \alpha(1 - \lambda)\) where \(\lambda\) is an eigenvalue of \(J\). Convergence requires \(|\mu| \lt 1\) for all eigenvalues, which gives the condition:

\begin{equation}     \alpha \lt \frac{2}{1 + \lambda_{\rm max}}.     \label{eq:alpha-bound} \end{equation}

For a metal with \(\lambda_{\rm max} \to 1\) (the charge-sloshing modes), this bound gives \(\alpha \lt 1\) — but the convergence rate for these modes is \(|1 - \alpha(1 - \lambda)| \approx 1 - \alpha(1 - \lambda_{\rm max})\), which is very close to \(1\) for small \(\alpha\). Convergence is guaranteed but extremely slow: the long-wavelength modes creep toward self-consistency over hundreds of iterations.

Typical values: \(\alpha \sim 0.01\)–\(0.1\) for metals and magnetic systems; \(\alpha \sim 0.3\)–\(0.7\) for insulators. The smaller values needed for metals reflect the charge-sloshing problem directly.
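The wavevector dependence of the convergence is visible in a two-mode toy model of the linearised iteration \(R^{(n+1)} = [1 - \alpha(1-J)]R^{(n)}\); a sketch:

```python
import numpy as np

# Toy linearised SCF: diagonal Jacobian with one sloshing mode (lambda ~ 1)
# and one well-screened mode, iterated with linear mixing at alpha = 0.3.
J = np.diag([0.97, 0.2])   # small-G (sloshing) and large-G eigenvalues
alpha = 0.3
R = np.ones(2)             # initial residual amplitude in each mode
M = np.eye(2) - alpha * (np.eye(2) - J)   # effective iteration matrix

for _ in range(100):
    R = M @ R

print(R[0] > 0.3)    # slow mode barely decayed: 0.991^100 ~ 0.40
print(R[1] < 1e-10)  # fast mode long gone:      0.76^100  ~ 1e-12
```

After a hundred iterations the short-wavelength mode is converged to machine noise while the sloshing mode has lost barely half its amplitude, which is exactly the pathology Kerker preconditioning targets.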

Kerker Preconditioning

The analysis of charge sloshing reveals that the convergence problem is concentrated at small \(\mathbf{G}\): long-wavelength components of the residual are barely damped, while short-wavelength components converge quickly. The Kerker preconditioner (Kerker, 1981) addresses this by applying a \(\mathbf{G}\)-dependent damping to the residual before mixing.

The idea is to approximate the inverse dielectric function and use it to precondition the residual. From equation \eqref{eq:eps-metal}, the metallic dielectric function at small \(G\) is dominated by the Thomas–Fermi screening term \(k_{\rm TF}^2/G^2\). The Kerker preconditioner multiplies the residual in reciprocal space by a filter that suppresses the problematic small-\(G\) components:

\begin{equation}     \boxed{\tilde{R}^{(n)}(\mathbf{G}) = \frac{G^2}{G^2 + k_0^2}\,R^{(n)}(\mathbf{G}),}     \label{eq:kerker} \end{equation}

where \(k_0\) is a parameter with dimensions of inverse length, typically chosen close to \(k_{\rm TF}\). The preconditioned residual \(\tilde{R}\) is then used in place of \(R\) in the mixing formula:

\begin{equation}     \rho_{\rm in}^{(n+1)} = \rho_{\rm in}^{(n)} + \alpha\,\tilde{R}^{(n)}.     \label{eq:kerker-mixing} \end{equation}

The physical interpretation is immediate: the factor \(G^2/(G^2 + k_0^2)\) vanishes as \(G \to 0\) and approaches unity for \(G \gg k_0\). It therefore suppresses the long-wavelength charge sloshing modes while leaving the well-behaved short-wavelength modes unchanged. At intermediate wavelengths (\(G \sim k_0\)), the suppression is partial — the preconditioner interpolates smoothly between complete damping and no damping.

This is equivalent to multiplying the residual by an approximate inverse dielectric function. If \(k_0 = k_{\rm TF}\), the preconditioner matches the Thomas–Fermi screening exactly in the long-wavelength limit, making the preconditioned iteration matrix nearly independent of \(G\) and restoring uniform convergence across all wavevectors.

Computationally, the Kerker preconditioner is very cheap: one forward FFT to go to reciprocal space, a pointwise multiplication by \(G^2/(G^2 + k_0^2)\), and one inverse FFT to return to real space. No additional KS cycles or matrix operations are required.
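In one dimension the whole preconditioner is a few lines; a sketch using numpy FFTs (grid size and \(k_0\) are illustrative):

```python
import numpy as np

def kerker_precondition(R, box_length, k0):
    """Apply the Kerker filter G^2/(G^2 + k0^2) to a real-space residual R."""
    n = R.size
    G = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)  # wavevectors
    R_G = np.fft.fft(R)
    R_G *= G**2 / (G**2 + k0**2)   # damps G -> 0, leaves G >> k0 untouched
    return np.real(np.fft.ifft(R_G))

x = np.linspace(0.0, 1.0, 64, endpoint=False)
R_slosh = np.ones(64)                # pure G = 0 component: fully suppressed
R_short = np.cos(2 * np.pi * 8 * x)  # G = 16*pi >> k0: passes almost unchanged

print(np.allclose(kerker_precondition(R_slosh, 1.0, 1.0), 0.0))            # True
print(np.allclose(kerker_precondition(R_short, 1.0, 1.0), R_short, atol=1e-2))  # True
```

Note the filter is exactly zero at \(G = 0\), so a uniform residual component is removed entirely, consistent with charge conservation during mixing.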

Limitation: Kerker preconditioning is designed for metallic screening and can slow down convergence in insulators, where the long-wavelength response is already well-behaved. For insulators, the factor \(G^2/(G^2 + k_0^2)\) unnecessarily suppresses the update at small \(G\), requiring more iterations to reach self-consistency. Most codes therefore apply Kerker preconditioning only when the system is metallic or when charge sloshing is detected.

Pulay/DIIS Mixing

Linear mixing, even with Kerker preconditioning, uses only information from the current iteration to construct the update. This is wasteful: by iteration \(n\), the SCF cycle has accumulated \(n\) input–output pairs, each containing information about the map \(F\) near the fixed point. Pulay mixing (Pulay, 1980), also known as Direct Inversion in the Iterative Subspace (DIIS), exploits this history to construct a much better estimate of the self-consistent density.

The idea is to form a linear combination of the \(m\) most recent input densities and find the coefficients that minimise the norm of the corresponding residual. Suppose we have stored the last \(m\) residuals \(\{R^{(n)}, R^{(n-1)}, \ldots, R^{(n-m+1)}\}\) and the corresponding input densities \(\{\rho_{\rm in}^{(n)}, \ldots, \rho_{\rm in}^{(n-m+1)}\}\). Construct a trial input density as:

\begin{equation}     \bar{\rho}_{\rm in} = \sum_{i=1}^{m} c_i\, \rho_{\rm in}^{(n-m+i)},     \label{eq:pulay-trial} \end{equation}

with the constraint \(\sum_i c_i = 1\) (so that \(\bar{\rho}\) integrates to \(N\) electrons). If the map \(F\) is approximately linear near the fixed point, the residual of this trial density is approximately:

\begin{equation}     \bar{R} \approx \sum_{i=1}^{m} c_i\, R^{(n-m+i)}.     \label{eq:pulay-residual} \end{equation}

DIIS minimises \(|\bar{R}|^2 = \sum_{ij} c_i c_j \langle R^{(i)} | R^{(j)} \rangle\) subject to the constraint \(\sum_i c_i = 1\). Introducing a Lagrange multiplier \(\mu\), the stationarity conditions give the \((m+1) \times (m+1)\) linear system:

\begin{equation}     \begin{pmatrix}         B_{11} & B_{12} & \cdots & B_{1m} & 1 \\         B_{21} & B_{22} & \cdots & B_{2m} & 1 \\         \vdots & \vdots & \ddots & \vdots & \vdots \\         B_{m1} & B_{m2} & \cdots & B_{mm} & 1 \\         1 & 1 & \cdots & 1 & 0     \end{pmatrix}     \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \\ -\mu \end{pmatrix}     =     \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix},     \label{eq:DIIS-system} \end{equation}

where the DIIS matrix is:

\begin{equation}     \boxed{B_{ij} = \langle R^{(i)} | R^{(j)} \rangle = \int R^{(i)}(\mathbf{r})\, R^{(j)}(\mathbf{r})\, d\mathbf{r}.}     \label{eq:DIIS-matrix} \end{equation}

This bordered system has dimension \((m+1) \times (m+1)\) with typically \(m = 4\)–\(8\), and is trivially cheap to solve. The choice of subspace size \(m\) involves a trade-off: too few stored residuals (\(m \lt 4\)) and the extrapolation lacks sufficient information to capture the curvature of the SCF map; too many (\(m \gt 10\)–\(12\)) and the stored residuals become nearly linearly dependent, causing the DIIS matrix \(B_{ij}\) to become ill-conditioned and the coefficients \(c_i\) to develop large, oscillatory values. In practice, \(m = 6\)–\(8\) is a robust default for most systems. The optimal coefficients \(\{c_i\}\) are then used to form the next input density:

\begin{equation}     \rho_{\rm in}^{(n+1)} = \sum_{i=1}^{m} c_i \left[\rho_{\rm in}^{(n-m+i)} + \alpha\, R^{(n-m+i)}\right],     \label{eq:pulay-update} \end{equation}

where the mixing parameter \(\alpha\) is still present but its precise value is much less critical than in simple mixing — the DIIS extrapolation handles the difficult modes.
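The whole DIIS machinery fits in a short routine. The sketch below is illustrative (the toy linear map and all names are invented for the demonstration): it assembles the bordered system of equation \eqref{eq:DIIS-system} and forms the update of equation \eqref{eq:pulay-update}.

```python
import numpy as np

def diis_update(rho_hist, res_hist, alpha=0.5):
    """One Pulay/DIIS step from m stored input densities and residuals."""
    m = len(res_hist)
    B = np.empty((m + 1, m + 1))              # bordered DIIS matrix
    for i in range(m):
        for j in range(m):
            B[i, j] = np.dot(res_hist[i], res_hist[j])
    B[:m, m] = 1.0                            # constraint row/column: sum_i c_i = 1
    B[m, :m] = 1.0
    B[m, m] = 0.0
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    c = np.linalg.solve(B, rhs)[:m]           # last entry is the Lagrange multiplier
    return sum(ci * (rho + alpha * R)
               for ci, rho, R in zip(c, rho_hist, res_hist))

# Toy "SCF" with an exactly linear map F[rho] = A rho + b
A = np.diag([0.9, 0.5, -0.3])
b = np.ones(3)
rho = np.zeros(3)
rho_hist, res_hist = [], []
for it in range(20):
    R = A @ rho + b - rho                     # residual F[rho] - rho
    if np.linalg.norm(R) < 1e-10:
        break
    rho_hist.append(rho.copy())
    res_hist.append(R)
    rho_hist, res_hist = rho_hist[-4:], res_hist[-4:]   # subspace size m <= 4
    rho = diis_update(rho_hist, res_hist)
```

Because the toy map is exactly linear, the residual vanishes once the stored subspace affinely spans it — here within a handful of iterations, versus hundreds for plain linear mixing against the slow \(0.9\) mode.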

Physical interpretation: DIIS effectively extrapolates from the sequence of residuals to estimate where the residual would vanish. If the residuals were evolving linearly, the DIIS solution would give the exact fixed point in \(m\) steps. In practice the map \(F\) is nonlinear, so the extrapolation is approximate, but it converges superlinearly near the solution — much faster than the geometric convergence of linear mixing.

Practical note: Pulay mixing combined with Kerker preconditioning is the default in most modern plane-wave codes. The Kerker filter handles the wavevector-dependent convergence rates, while DIIS handles the multi-step extrapolation. Together, they typically converge metallic systems in \(15\)–\(30\) SCF iterations.

Broyden Mixing

An alternative to DIIS is to build an approximate inverse Jacobian from the accumulated input–output history, then use it to compute a quasi-Newton update. This is Broyden mixing (Broyden, 1965; adapted for DFT by Vanderbilt and Louie, 1984, and Johnson, 1988).

The Newton step for the fixed-point equation \(\rho - F[\rho] = 0\), written in terms of the residual \(R = F[\rho] - \rho\), would be:

\begin{equation}     \rho^{(n+1)} = \rho^{(n)} + (1 - J)^{-1} R^{(n)} = \rho^{(n)} + G\, R^{(n)},     \label{eq:newton-step} \end{equation}

where \(G = (1 - J)^{-1}\) is the inverse of the “SCF Jacobian” \((1 - J)\), with \(J = \delta F/\delta\rho\). Computing \(G\) exactly would require the full Jacobian \(J\), which is an \(N_{\rm PW} \times N_{\rm PW}\) matrix — completely intractable. The Broyden method builds an approximation \(G_n\) to \(G\) by requiring it to satisfy the secant condition: the approximate inverse Jacobian must reproduce the last observed input–residual relationship exactly.

Define the changes between successive iterations:

\begin{equation}     \Delta\rho^{(n)} = \rho_{\rm in}^{(n)} - \rho_{\rm in}^{(n-1)}, \qquad     \Delta R^{(n)} = R^{(n)} - R^{(n-1)}.     \label{eq:broyden-deltas} \end{equation}

The secant condition requires \(G_n \Delta R^{(n)} = \Delta\rho^{(n)}\). Among all matrices satisfying this condition, the Broyden rank-1 update chooses the one closest to \(G_{n-1}\) in the Frobenius norm:

\begin{equation}     \boxed{G_n = G_{n-1} + \frac{(\Delta\rho^{(n)} - G_{n-1}\,\Delta R^{(n)})\,(\Delta R^{(n)})^T}{(\Delta R^{(n)})^T \Delta R^{(n)}}.}     \label{eq:broyden-update} \end{equation}

This is a rank-1 correction to the previous approximate inverse Jacobian. Starting from the initial guess \(G_0 = \alpha\,\mathbf{1}\) (equivalent to simple mixing), each iteration adds one rank-1 correction, so \(G_n - G_0\) has rank at most \(n\), and the update step is:

\begin{equation}     \rho_{\rm in}^{(n+1)} = \rho_{\rm in}^{(n)} + G_n\, R^{(n)}.     \label{eq:broyden-step} \end{equation}

Storage: The rank-1 structure means that \(G_n\) never needs to be stored as a full matrix. Instead, one stores the \(n\) vectors \(\{\Delta\rho^{(k)}, \Delta R^{(k)}\}\) and applies \(G_n\) via a sequence of inner products and vector additions. For \(m\) stored iterations, this requires \(\mathcal{O}(m \cdot N_{\rm PW})\) storage and \(\mathcal{O}(m \cdot N_{\rm PW})\) operations per update — negligible compared to the KS diagonalisation.
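A limited-memory implementation stores only the update vectors and never forms the matrix. The sketch below is illustrative (class and variable names are our assumptions); it represents \(G_n\) as \(G_0\) plus a sum of rank-1 terms:

```python
import numpy as np

class BroydenMixer:
    """Limited-memory Broyden update of the inverse SCF Jacobian.

    G_n is represented as G_0 + sum_k u_k v_k^T with G_0 = alpha * identity;
    applying G_n to a vector costs O(m N) inner products, never O(N^2).
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.us, self.vs = [], []

    def apply(self, vec):
        """Return G_n @ vec without forming G_n."""
        out = self.alpha * vec
        for u, v in zip(self.us, self.vs):
            out += u * np.dot(v, vec)
        return out

    def update(self, d_rho, d_res):
        """Add the rank-1 correction enforcing the secant condition G_n d_res = d_rho."""
        g_dres = self.apply(d_res)                      # G_{n-1} @ dR
        self.us.append((d_rho - g_dres) / np.dot(d_res, d_res))
        self.vs.append(d_res.copy())
```

Each SCF step calls `mixer.update(d_rho, d_res)` and then `rho += mixer.apply(R)`; the secant condition \(G_n\,\Delta R^{(n)} = \Delta\rho^{(n)}\) holds exactly for the most recent pair by construction.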

Broyden’s second method modifies the update formula so that \(G_n\) remains symmetric. The physical motivation is that the true inverse Jacobian \(G\) of the SCF map inherits the symmetry of the density–density response function \(\chi\), which satisfies \(\chi(\mathbf{r}, \mathbf{r}') = \chi(\mathbf{r}', \mathbf{r})\) by time-reversal symmetry. A non-symmetric \(G_n\) can develop spurious antisymmetric components that couple charge and magnetisation modes or mix unrelated \(\mathbf{G}\)-vectors, leading to erratic convergence. The symmetric Broyden-2 update avoids this:

\begin{equation}     G_n = G_{n-1} + \frac{(\Delta\rho^{(n)} - G_{n-1}\,\Delta R^{(n)})\,(\Delta\rho^{(n)})^T G_{n-1}}{(\Delta\rho^{(n)})^T G_{n-1}\,\Delta R^{(n)}}.     \label{eq:broyden2} \end{equation}

In practice, Broyden-2 produces smoother convergence trajectories than Broyden-1, particularly for spin-polarised systems where the charge and magnetisation densities are mixed simultaneously.

Modified Broyden with metric: In practice, the inner products in the Broyden update can be weighted by a metric \(w(\mathbf{G})\) in reciprocal space, chosen to downweight high-\(G\) components (which are noisy and less important for convergence). This modified Broyden method with Kerker-like weighting is used as the default mixing scheme in several major plane-wave codes.

Convergence: Broyden mixing converges superlinearly, similar to DIIS. The two methods are closely related mathematically: both use a subspace of previous iterations to improve the update, and in certain limits they produce identical iterates. In practice, Pulay/DIIS is somewhat more robust for difficult cases (magnetic metals, near-degenerate ground states), while Broyden can be slightly faster when convergence is smooth.

Practical Guidance

The choice of mixing scheme depends on the system type. The following table summarises the recommended settings:

| System type | Recommended scheme | Typical parameters | Typical SCF iterations |
|---|---|---|---|
| Simple insulator (Si, NaCl) | Linear or Pulay | \(\alpha = 0.4\)–\(0.7\); no Kerker needed | \(10\)–\(15\) |
| Semiconductor with narrow gap | Pulay + mild Kerker | \(\alpha = 0.2\)–\(0.4\); \(k_0 \sim 1.0\) Å\(^{-1}\) | \(15\)–\(25\) |
| Simple metal (Al, Cu) | Pulay + Kerker | \(\alpha = 0.05\)–\(0.2\); \(k_0 \sim 1.5\) Å\(^{-1}\) | \(15\)–\(30\) |
| Magnetic metal (Fe, Ni) | Pulay + Kerker or Broyden | \(\alpha = 0.01\)–\(0.1\); \(k_0 \sim 1.5\) Å\(^{-1}\) | \(25\)–\(50\) |
| Antiferromagnetic insulator (NiO) | Pulay + Kerker | \(\alpha = 0.01\)–\(0.05\); tighten \(k_0\) | \(30\)–\(60\) |
| Slab / surface (metal) | Pulay + Kerker | \(\alpha = 0.02\)–\(0.1\); increase subspace | \(30\)–\(60\) |
| Charged defect in large cell | Pulay + Kerker | \(\alpha = 0.02\)–\(0.1\); non-DIIS initial steps | \(40\)–\(80\) |

Signs of convergence problems in the SCF output:

Oscillating total energy. The energy alternates between high and low values without trending downward. This is the hallmark of charge sloshing — reduce the mixing parameter \(\alpha\) and ensure Kerker preconditioning is active.

Residual decreasing then stalling. The SCF converges rapidly for the first \(5\)–\(10\) iterations, then the residual plateaus. This often indicates that the DIIS subspace is contaminated by early, poor-quality residuals. Allow a few initial iterations with simple mixing to bring the density into the basin of convergence before DIIS takes over.

Spin channels oscillating out of phase. In spin-polarised calculations, the up and down spin densities may trade magnitude between iterations. This signals an antiferromagnetic instability or an incorrect initial magnetic configuration. Fix the initial moments and consider using a more robust diagonalisation algorithm (Section 8.4) for the first few iterations to stabilise the spin order.

Code Notes: VASP and Quantum ESPRESSO Parameters

VASP:

| Parameter | Controls | Recommended range |
|---|---|---|
| AMIX | Linear mixing parameter \(\alpha\) for charge density | \(0.02\)–\(0.8\) (default \(0.4\)) |
| BMIX | Kerker wavevector \(k_0\) (in units of \(2\pi/a\)) | \(0.1\)–\(3.0\) (default \(1.0\)) |
| AMIX_MAG | Mixing parameter for magnetisation density | Often \(2\)–\(4 \times\) AMIX |
| BMIX_MAG | Kerker parameter for magnetisation | Often \(\sim\) BMIX |
| MAXMIX | Number of previous steps stored for Pulay/Broyden | \(20\)–\(60\) (default \(-45\)) |
| IMIX | Mixing scheme: \(0\) = no mixing, \(1\) = simple, \(4\) = Broyden | Default \(4\) |
| NELMDL | Number of initial non-DIIS steps (simple mixing) | \(-5\) to \(-12\) (negative = auto) |

Negative MAXMIX indicates Broyden mixing with \(|\)MAXMIX\(|\) stored steps. Positive values select Pulay/DIIS. The sign convention is code-specific; consult the VASP manual for the version in use.

Note on ALGO: The ALGO tag primarily controls the diagonalisation method (Section 8.4), but it also affects mixing behaviour indirectly. ALGO = All forces a full Davidson diagonalisation at every SCF step, which is more robust for difficult cases — particularly spin-polarised metals where the RMM-DIIS default (ALGO = Fast) can lose track of the correct eigenstate ordering, causing spin channels to oscillate. For systems that fail to converge with default settings, switching to ALGO = All for the first \(10\)–\(20\) SCF steps (via NELMDL) before reverting to ALGO = Fast is a common and effective strategy.
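Putting the table together, an illustrative INCAR fragment for a hard-to-converge magnetic metal might look as follows. The specific values are one reasonable choice from the ranges above, not universal defaults:

```
ISPIN    = 2
AMIX     = 0.05    ! gentle linear mixing for the charge density
BMIX     = 1.5     ! Kerker wavevector
AMIX_MAG = 0.2     ! magnetisation mixed more aggressively (4 x AMIX)
BMIX_MAG = 1.5
MAXMIX   = -45     ! Broyden mixing with 45 stored steps
NELMDL   = -8      ! initial non-DIIS steps with simple mixing
```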

Quantum ESPRESSO:

| Parameter | Controls | Recommended range |
|---|---|---|
| mixing_beta | Mixing parameter \(\alpha\) | \(0.1\)–\(0.7\) (default \(0.7\)) |
| mixing_mode | 'plain', 'TF' (Thomas–Fermi/Kerker), 'local-TF' | 'local-TF' for metals |
| mixing_ndim | Number of previous iterations for Broyden/DIIS | \(4\)–\(12\) (default \(8\)) |
| electron_maxstep | Maximum number of SCF iterations | \(100\)–\(200\) |

In QE, mixing_mode = 'local-TF' applies a local Thomas–Fermi preconditioning that is analogous to the Kerker preconditioner in reciprocal space. It is the recommended choice for metallic or small-gap systems.
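An equivalent &ELECTRONS fragment for a metallic system, with illustrative values chosen from the ranges above:

```
&ELECTRONS
    mixing_mode      = 'local-TF'
    mixing_beta      = 0.2
    mixing_ndim      = 8
    electron_maxstep = 150
/
```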

Further Reading
  • Pulay mixing (DIIS): P. Pulay, Convergence acceleration of iterative sequences. The case   of SCF iteration, Chem. Phys. Lett. 73, 393 (1980). The paper that introduced the DIIS   method in the context of Hartree–Fock calculations; the adaptation to DFT density mixing is   straightforward.

  • Kerker preconditioning: G. P. Kerker, Efficient iteration scheme for self-consistent   pseudopotential calculations, Phys. Rev. B 23, 3082 (1981). Derives the \(G^2/(G^2+k_0^2)\)   preconditioner from the Thomas–Fermi dielectric function.

  • Broyden mixing: C. G. Broyden, A class of methods for solving nonlinear simultaneous   equations, Math. Comp. 19, 577 (1965). The general quasi-Newton framework. The adaptation   to DFT is described in: D. D. Johnson, Modified Broyden’s method for accelerating convergence   in self-consistent calculations, Phys. Rev. B 38, 12807 (1988).

  • Vanderbilt–Louie scheme: D. Vanderbilt and S. G. Louie, Total energies of diamond (111)   surface reconstructions by a linear combination of bulk bands, Phys. Rev. B 30, 6118   (1984). Early application of Broyden-type mixing to plane-wave DFT.

  • VASP implementation details: G. Kresse and J. Furthmüller, Efficient iterative schemes   for ab initio total-energy calculations using a plane-wave basis set, Phys. Rev. B 54,   11169 (1996). Sections III–IV describe the mixing and diagonalisation machinery used in VASP.

  • General review of SCF convergence: The treatment in R. M. Martin, Electronic Structure:   Basic Theory and Practical Methods (Cambridge, 2004), Chapter 9, gives an accessible overview.   A more mathematical treatment is in E. Cancès, SCF algorithms for Hartree–Fock electronic   calculations, Lecture Notes in Chemistry 74, 17 (2000).

Outlook

The mixing schemes developed in this chapter solve the outer SCF convergence problem: given a sequence of input–output density pairs, they find the self-consistent fixed point efficiently even for metallic systems with charge-sloshing instabilities. But the SCF loop contains two other pieces of numerical machinery that have been treated as black boxes so far: the eigenvalue solver that produces the KS orbitals and eigenvalues at each iteration, and the occupation scheme that assigns electrons to states near the Fermi level. Both can fail independently of the mixing, and both introduce their own convergence parameters.

The next chapter develops these two topics. Section 9.1 treats smearing and partial occupancies — the regularisation of Brillouin zone integrals over the discontinuous Fermi surface. Section 9.2 derives the iterative diagonalisation methods (Davidson and RMM-DIIS) that make the KS eigenvalue problem tractable for large systems. Section 9.3 ties everything together with convergence diagnostics and a worked example on bcc Fe.

Brillouin Zone Integration

Chapter 8 addressed the outer loop of the SCF cycle: how to update the density from one iteration to the next so that the fixed-point equation \(F[\rho^*] = \rho^*\) is satisfied efficiently. This chapter addresses the two pieces of numerical machinery that operate within each SCF step: how the KS eigenvalue problem is solved (Section 9.2), and how the resulting eigenvalues are used to assign occupations and perform Brillouin zone integrals (Section 9.1). We treat BZ integration first because the choice of smearing scheme affects the effective smoothness of the energy landscape seen by the eigenvalue solver.

Smearing and Partial Occupancies

The Problem with Sharp Occupation

The KS total energy involves a sum over occupied states weighted by occupation numbers \(f_i\). For an insulator at zero temperature, every state below the gap has \(f_i = 1\) and every state above has \(f_i = 0\). The Brillouin zone (BZ) integral for any smooth quantity — the total energy, charge density, or density of states — converges exponentially with the density of the \(\mathbf{k}\)-point mesh, because the integrand is an analytic function of \(\mathbf{k}\) throughout the BZ. A \(6 \times 6 \times 6\) Monkhorst–Pack grid is often sufficient for a well-converged insulator calculation.

For a metal, the situation is qualitatively different. The occupation function is a step function at the Fermi energy:

\begin{equation}     f_i(\mathbf{k}) = \theta(\epsilon_F - \epsilon_i(\mathbf{k})),     \label{eq:step-occupation} \end{equation}

which is discontinuous at the Fermi surface — the set of \(\mathbf{k}\)-points where \(\epsilon_i(\mathbf{k}) = \epsilon_F\). The BZ integral of a function with a discontinuity converges only as \(\mathcal{O}(1/N_k)\) with the number of \(\mathbf{k}\)-points \(N_k\), and the approach to convergence is oscillatory (the Gibbs phenomenon). In practice, this means that a metallic calculation with sharp occupations requires an extremely dense \(\mathbf{k}\)-mesh — often \(20 \times 20 \times 20\) or finer — to achieve meV-level convergence, making the calculation prohibitively expensive.

The physical origin is clear: the Fermi surface is a measure-zero set in the BZ, but the energy integral samples it through a finite \(\mathbf{k}\)-mesh. States that are just above or just below \(\epsilon_F\) make discontinuous contributions to the energy as the mesh is refined — a state that is occupied on one mesh may become unoccupied on a slightly denser mesh, causing the total energy to jump.
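The \(\mathcal{O}(1/N_k)\) convergence is easy to reproduce in a one-dimensional toy band \(\epsilon(k) = -\cos k\), for which the occupied fraction at \(\epsilon_F = 0.3\) is known analytically. This is a minimal sketch; the model and function name are invented for illustration:

```python
import numpy as np

def occupied_fraction_sharp(nk, ef=0.3):
    """BZ average of the step occupation for eps(k) = -cos k on an nk-point mesh."""
    k = -np.pi + 2.0 * np.pi * (np.arange(nk) + 0.5) / nk   # uniform 1D mesh
    return np.mean(-np.cos(k) < ef)

exact = np.arccos(-0.3) / np.pi        # analytic occupied fraction
for nk in (8, 16, 32, 64, 128):
    err = abs(occupied_fraction_sharp(nk) - exact)
    print(f"nk = {nk:4d}    |error| = {err:.5f}")
```

The error decays only linearly in \(1/N_k\) and oscillates as individual states hop across \(\epsilon_F\) when the mesh is refined; replacing the sharp comparison with a smooth occupation function restores rapid, monotonic convergence.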

All smearing methods address this by replacing the discontinuous step function with a smooth approximation, effectively broadening the Fermi surface over an energy window of width \(\sigma\). This converts the BZ integrand from a discontinuous to a smooth (or even analytic) function, restoring rapid convergence with \(\mathbf{k}\)-point density — at the cost of introducing a systematic error that must be controlled or corrected.

Gaussian Smearing

The simplest approach replaces the step function with a smooth occupation based on the complementary error function:

\begin{equation}     \boxed{f_i = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\epsilon_i - \epsilon_F}{\sigma}\right),}     \label{eq:gaussian-occ} \end{equation}

where \(\sigma \gt 0\) is the smearing width and \(\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_x^\infty e^{-t^2}\,dt\). This corresponds to convolving the step function with a Gaussian of width \(\sigma\): states more than \(\sim 2\sigma\) below \(\epsilon_F\) have \(f_i \approx 1\); states more than \(\sim 2\sigma\) above have \(f_i \approx 0\); and states within \(\sim \sigma\) of \(\epsilon_F\) have fractional occupations.

The smearing introduces a systematic error in the total energy. For a system with a slowly varying density of states near \(\epsilon_F\), the error scales as:

\begin{equation}     E[\sigma] - E_0 = \mathcal{O}(\sigma^2),     \label{eq:gaussian-error} \end{equation}

where \(E_0\) is the true zero-temperature energy. To obtain \(E_0\), one must either use a very small \(\sigma\) (which reintroduces the convergence problem) or extrapolate.

The free energy and the \(\sigma \to 0\) extrapolation. The smeared system can be interpreted as a fictitious finite-temperature ensemble with an electronic entropy:

\begin{equation}     S = -\sum_{i,\mathbf{k}} w_\mathbf{k}\left[f_i \ln f_i + (1 - f_i)\ln(1 - f_i)\right],     \label{eq:entropy} \end{equation}

where \(w_\mathbf{k}\) is the \(\mathbf{k}\)-point weight. The corresponding free energy is:

\begin{equation}     F[\sigma] = E[\sigma] - \sigma\, S.     \label{eq:free-energy} \end{equation}

For a system with a parabolic density of states near \(\epsilon_F\) (a good approximation for simple metals), the total energy and free energy bracket the true \(E_0\):

\begin{equation}     E[\sigma] = E_0 + a\sigma^2 + \mathcal{O}(\sigma^4), \qquad     F[\sigma] = E_0 - a\sigma^2 + \mathcal{O}(\sigma^4),     \label{eq:E-F-expansion} \end{equation}

with the same coefficient \(a\) but opposite signs. Averaging eliminates the \(\sigma^2\) error:

\begin{equation}     \boxed{E_0 \approx \frac{1}{2}\left(E[\sigma] + F[\sigma]\right) + \mathcal{O}(\sigma^4).}     \label{eq:sigma-extrapolation} \end{equation}

This is the \(\sigma \to 0\) extrapolation. Most plane-wave codes report this corrected energy alongside the raw total energy and the free energy; it is the value that should be used for comparing total energies, computing energy differences, and extracting formation energies — not the raw \(E[\sigma]\) or \(F[\sigma]\) individually.

Limitation: the extrapolation assumes a smoothly varying density of states. For systems with sharp features near \(\epsilon_F\) (van Hove singularities, narrow \(d\)-bands), the quadratic approximation breaks down and larger errors can persist even after extrapolation.

Fermi–Dirac Smearing

A physically motivated alternative uses the Fermi–Dirac distribution directly:

\begin{equation}     \boxed{f_i = \frac{1}{1 + \exp\!\left(\frac{\epsilon_i - \epsilon_F}{k_B T}\right)},}     \label{eq:FD-occ} \end{equation}

where \(T\) is interpreted as the electronic temperature and \(\sigma = k_BT\) plays the same role as the Gaussian smearing width. The advantage is physical transparency: Fermi–Dirac smearing corresponds to the exact equilibrium occupation at temperature \(T\). The variational quantity is the Mermin free energy:

\begin{equation}     \Omega = E - TS,     \label{eq:mermin} \end{equation}

where \(S\) is the exact Fermi–Dirac entropy. For genuine finite-temperature calculations — ab initio molecular dynamics above \(\sim 1000\,\)K, warm dense matter, or the electronic contribution to free energies — Fermi–Dirac smearing is the physically correct choice.

For ground-state calculations at \(T = 0\), Fermi–Dirac smearing has a disadvantage: the occupation function has exponential tails extending to \(\pm\infty\), meaning that states far from \(\epsilon_F\) acquire small but non-zero fractional occupations. This is physically correct at finite \(T\) but introduces an \(\mathcal{O}(\sigma^2)\) error at \(T = 0\) that is larger than the corresponding error from Methfessel–Paxton smearing at the same \(\sigma\). The \(\sigma \to 0\) extrapolation of equation \eqref{eq:sigma-extrapolation} can still be applied, but with lower accuracy than Methfessel–Paxton for ground-state energy differences.

Methfessel–Paxton Smearing

The Gaussian and Fermi–Dirac methods both introduce an \(\mathcal{O}(\sigma^2)\) error in the total energy. Methfessel–Paxton (MP) smearing (Methfessel and Paxton, 1989) systematically reduces this error by expanding the \(\delta\)-function approximation in a complete set of Hermite polynomials, achieving \(\mathcal{O}(\sigma^{2N+2})\) accuracy at order \(N\).

The construction starts from the identity: the step function \(\theta(x)\) can be written as \(\frac{1}{2}\,\mathrm{erfc}(x)\) plus a correction expressible as a series in Hermite polynomials \(H_n(x)\). The key mathematical result is that the Hermite polynomials, weighted by the Gaussian \(e^{-x^2}\), form a complete orthogonal set on \((-\infty, \infty)\). Expanding the difference \(\theta(x) - \frac{1}{2}\,\mathrm{erfc}(x)\) in this basis and truncating at order \(N\) gives the MP approximation to the \(\delta\)-function:

\begin{equation}     \tilde{\delta}_N(x) = e^{-x^2} \sum_{n=0}^{N} A_n H_{2n}(x),     \label{eq:MP-delta} \end{equation}

where \(x = (\epsilon_i - \epsilon_F)/\sigma\) and the coefficients are \(A_n = (-1)^n / (n!\, 4^n\sqrt{\pi})\). The corresponding occupation function is obtained by integration:

\begin{equation}     f_N(x) = \int_x^\infty \tilde{\delta}_N(t)\,dt.     \label{eq:MP-occupation-general} \end{equation}

The \(N = 0\) case recovers Gaussian smearing: \(\tilde{\delta}_0(x) = e^{-x^2}/\sqrt{\pi}\), \(f_0(x) = \frac{1}{2}\,\mathrm{erfc}(x)\), with error \(\mathcal{O}(\sigma^2)\).

The \(N = 1\) case is derived explicitly. The first Hermite correction adds \(A_1 H_2(x)\,e^{-x^2} = -\frac{1}{4\sqrt{\pi}}(4x^2 - 2)\,e^{-x^2}\) to the \(\delta\)-function approximation. Integrating from \(x\) to \(\infty\) gives:

\begin{equation}     \boxed{f_1(x) = \frac{1}{2}\,\mathrm{erfc}(x) - \frac{x\,e^{-x^2}}{2\sqrt{\pi}},     \qquad x = \frac{\epsilon_i - \epsilon_F}{\sigma}.}     \label{eq:MP-N1} \end{equation}

Note that \(f_1(x)\) is not monotonic and can take values slightly outside \([0,1]\): for \(x \approx -1\), the occupation exceeds \(1\) slightly, and for \(x \approx 1\), it dips below \(0\). This is not a pathology — the occupation numbers in MP smearing are not probabilities but mathematical weights designed to minimise the BZ integration error. The total electron count \(\sum_i f_i = N\) is still exactly conserved.
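Both occupation functions are one-liners, which makes the out-of-bounds behaviour easy to verify directly. A sketch using only the standard library (the function names are ours):

```python
import math

SQRT_PI = math.sqrt(math.pi)

def f_gaussian(x):
    """Gaussian (MP N = 0) occupation, x = (eps_i - eps_F)/sigma."""
    return 0.5 * math.erfc(x)

def f_mp1(x):
    """Methfessel-Paxton N = 1 occupation: Gaussian plus first Hermite term."""
    return 0.5 * math.erfc(x) - x * math.exp(-x * x) / (2.0 * SQRT_PI)

# The Gaussian form stays within [0, 1]; the MP form overshoots slightly:
# f_mp1(-1) ~ 1.025 and f_mp1(+1) ~ -0.025, while f_mp1(0) = 0.5 exactly.
```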

The crucial property is that the energy error now scales as \(\mathcal{O}(\sigma^4)\) rather than \(\mathcal{O}(\sigma^2)\):

\begin{equation}     E_{N=1}[\sigma] = E_0 + \mathcal{O}(\sigma^4).     \label{eq:MP-N1-error} \end{equation}

This means that for \(N \geq 1\), the raw energy \(E[\sigma]\) is already a good approximation to \(E_0\) without the \(\sigma \to 0\) extrapolation. In practice, the extrapolated energy is still computed and should still be used, but the correction it applies is much smaller than for Gaussian smearing at the same \(\sigma\).

The \(N = 2\) case adds the next Hermite correction, achieving \(\mathcal{O}(\sigma^6)\) accuracy. For most applications, \(N = 1\) is sufficient; \(N = 2\) provides a useful cross-check but rarely changes results by more than \(\sim 0.1\,\)meV/atom at typical smearing widths.

Why not \(N \gt 2\)? Higher-order MP approximations produce occupation functions with increasingly large oscillations outside \([0,1]\), which can cause numerical instabilities in the SCF cycle (large negative occupations can make the charge density locally negative). The practical consensus is that \(N = 1\) or \(N = 2\) gives the optimal balance between accuracy and numerical stability.

The Tetrahedron Method

All smearing methods introduce a fictitious broadening that must be controlled or corrected. The tetrahedron method (Jepsen and Andersen, 1971; Blöchl, Jepsen, and Andersen, 1994) takes a fundamentally different approach: it performs the BZ integral exactly for a piecewise linear interpolation of the band energies \(\epsilon_i(\mathbf{k})\), without introducing any broadening parameter.

The BZ is decomposed into tetrahedra (typically by subdividing each parallelepiped of the \(\mathbf{k}\)-mesh into six tetrahedra). Within each tetrahedron, \(\epsilon_i(\mathbf{k})\) is interpolated linearly from its values at the four vertices. The integral of the step function \(\theta(\epsilon_F - \epsilon_i(\mathbf{k}))\) over a tetrahedron with a linearly varying integrand can be evaluated analytically — the result is a known function of \(\epsilon_F\) and the four vertex energies, involving only elementary algebra.
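The analytic occupied-volume weight for a single tetrahedron is a short piecewise-cubic function of \(\epsilon_F\) and the sorted vertex energies. The sketch below is illustrative: it assumes non-degenerate vertex energies and implements only the linear-tetrahedron result, without the Blöchl correction:

```python
import numpy as np

def occupied_fraction(vertex_energies, ef):
    """Fraction of one tetrahedron's volume with eps(k) <= ef,
    for eps interpolated linearly from the four vertex values."""
    e1, e2, e3, e4 = np.sort(vertex_energies)
    if ef <= e1:
        return 0.0
    if ef >= e4:
        return 1.0
    if ef < e2:
        return (ef - e1) ** 3 / ((e2 - e1) * (e3 - e1) * (e4 - e1))
    if ef < e3:
        e21, e31, e41 = e2 - e1, e3 - e1, e4 - e1
        e32, e42 = e3 - e2, e4 - e2
        d = ef - e2
        return (e21 ** 2 + 3.0 * e21 * d + 3.0 * d ** 2
                - (e31 + e42) / (e32 * e42) * d ** 3) / (e31 * e41)
    return 1.0 - (e4 - ef) ** 3 / ((e4 - e1) * (e4 - e2) * (e4 - e3))
```

For vertex energies \((0,1,2,3)\) the weight is \(1/2\) at \(\epsilon_F = 1.5\) by symmetry and \(5/6\) at \(\epsilon_F = 2\); summing such weights over all tetrahedra and bands gives the total electron count as an analytic, smearing-free function of \(\epsilon_F\).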

The Blöchl correction (Blöchl et al., 1994) adds a quadratic correction term that accounts for the curvature of \(\epsilon_i(\mathbf{k})\) beyond the linear interpolation, improving the accuracy from \(\mathcal{O}(1/N_k^2)\) to \(\mathcal{O}(1/N_k^4)\) for the same mesh density. This corrected tetrahedron method is the gold standard for BZ integration accuracy.

Advantages: No smearing parameter to choose or converge. No entropy correction needed. Exact for any quantity that depends linearly on \(\epsilon_i(\mathbf{k})\) within each tetrahedron. Produces smooth, non-oscillatory convergence with \(\mathbf{k}\)-mesh density.

Limitations: The tetrahedron method requires a regular \(\Gamma\)-centred \(\mathbf{k}\)-mesh that can be decomposed into tetrahedra — it cannot be used with arbitrary or randomly shifted \(\mathbf{k}\)-point sets. More importantly, the analytic integration assumes a fixed band structure: the Fermi energy and the tetrahedron weights are computed after the eigenvalues are known, so there is no smooth dependence of the total energy on atomic positions. This means the forces computed from a tetrahedron-method calculation contain small but non-zero discontinuities when an eigenvalue crosses \(\epsilon_F\) as atoms move, making the tetrahedron method unsuitable for geometry optimisation or molecular dynamics of metallic systems. For these tasks, MP smearing (\(N = 1\)) is preferred.

The tetrahedron method is, however, the optimal choice for: fixed-geometry total energy calculations, density of states (DOS) and projected DOS, optical properties and spectral functions, and any post-processing quantity requiring accurate spectral weight near the Fermi level.

Practical Guidance

The choice of smearing method and width depends on both the system type and the type of calculation:

| Calculation type | System | Recommended method | \(\sigma\) |
|---|---|---|---|
| Geometry optimisation | Metal | MP \(N=1\) | \(0.1\)–\(0.2\) eV |
| Geometry optimisation | Insulator | Gaussian or tetrahedron | \(0.05\)–\(0.1\) eV (or none) |
| Single-point energy | Metal | Tetrahedron (Blöchl) | – |
| Single-point energy | Insulator | Tetrahedron (Blöchl) | – |
| DOS / optical properties | Any | Tetrahedron (Blöchl) | – |
| AIMD (\(T \gt 1000\,\)K) | Metal | Fermi–Dirac | \(k_BT\) |
| Phonons (DFPT) | Metal | MP \(N=1\) | \(0.1\)–\(0.2\) eV |

Choosing \(\sigma\): The smearing width must be large enough to smooth the Fermi surface sufficiently for the given \(\mathbf{k}\)-mesh, but small enough that the smearing error is acceptable. A useful diagnostic: if the entropy term \(\sigma S\) exceeds \(\sim 1\,\)meV/atom, the smearing is too large and either \(\sigma\) should be reduced or the \(\mathbf{k}\)-mesh should be made denser.

The interdependence of \(\sigma\) and \(\mathbf{k}\)-mesh: These two parameters are not independent. A denser \(\mathbf{k}\)-mesh resolves the Fermi surface better, allowing a smaller \(\sigma\); conversely, a larger \(\sigma\) compensates for a coarser mesh. The product \(\sigma \times N_k^{1/3}\) should remain approximately constant for a given target accuracy. In practice, one converges both simultaneously: fix \(\sigma\), increase the \(\mathbf{k}\)-mesh until the energy is stable, then reduce \(\sigma\) and repeat until both are converged.

Marzari–Vanderbilt cold smearing: An alternative to MP smearing that ensures the occupation function remains non-negative while achieving comparable accuracy (Marzari et al., 1999). It avoids the slight negative occupations of MP \(N \geq 1\) that can occasionally cause numerical difficulties. Available in several codes as an option alongside Gaussian and MP smearing.

Code Notes: VASP and Quantum ESPRESSO Parameters

VASP:

| Parameter | Controls | Values |
|---|---|---|
| ISMEAR | Smearing method | \(-5\): tetrahedron (Blöchl); \(-1\): Fermi–Dirac; \(0\): Gaussian; \(1\): MP \(N=1\); \(2\): MP \(N=2\) |
| SIGMA | Smearing width \(\sigma\) (eV) | \(0.01\)–\(0.3\); default \(0.2\) |

The extrapolated energy, reported as energy(sigma->0) in the OUTCAR, corresponds to equation \eqref{eq:sigma-extrapolation} for Gaussian smearing and to the raw energy (with negligible correction) for MP \(N \geq 1\). Always use this value for energy comparisons. The entropy contribution is reported as EENTRO.

ISMEAR = -5 (tetrahedron) must not be used with fewer than \(3\) \(\mathbf{k}\)-points in any direction, or for calculations where forces are needed (geometry optimisation, MD). VASP will run but the forces will contain discontinuities that prevent smooth convergence of the ionic relaxation.
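In practice this leads to a two-step workflow: relax with MP smearing, then run a final static calculation with the tetrahedron method. An illustrative INCAR pair, with values chosen from the recommendations above:

```
! Step 1: geometry optimisation of a metal
ISMEAR = 1        ! Methfessel-Paxton N = 1 (smooth forces)
SIGMA  = 0.1      ! smearing width in eV

! Step 2: static run at the relaxed geometry
! ISMEAR = -5     ! Blochl tetrahedron: accurate energies and DOS, no forces
```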

Quantum ESPRESSO:

| Parameter | Controls | Values |
|---|---|---|
| occupations | Occupation scheme | 'smearing', 'tetrahedra' (Blöchl), 'tetrahedra_opt', 'fixed' |
| smearing | Smearing function | 'gaussian', 'fermi-dirac', 'methfessel-paxton', 'marzari-vanderbilt' |
| degauss | Smearing width \(\sigma\) (Ry) | Typical: \(0.005\)–\(0.02\) Ry (\(0.07\)–\(0.27\) eV) |

Note that QE uses Rydberg units for degauss (\(1\,\)Ry \(= 13.606\,\)eV), while VASP uses eV for SIGMA. A common source of error is confusing the two: degauss = 0.02 Ry \(\approx 0.27\,\)eV, which is a reasonable smearing for metals; SIGMA = 0.02 eV is extremely small and may cause convergence problems on a coarse \(\mathbf{k}\)-mesh.

Marzari–Vanderbilt cold smearing is accessed via `smearing = 'marzari-vanderbilt'` or `smearing = 'cold'`.

Further Reading
  • Methfessel–Paxton smearing: M. Methfessel and A. T. Paxton, High-precision sampling for Brillouin-zone integration in metals, Phys. Rev. B 40, 3616 (1989). The original paper deriving the Hermite polynomial expansion and the error scaling \(\mathcal{O}(\sigma^{2N+2})\).

  • Tetrahedron method with Blöchl correction: P. E. Blöchl, O. Jepsen, and O. K. Andersen, Improved tetrahedron method for Brillouin-zone integrations, Phys. Rev. B 49, 16223 (1994). Derives the quadratic correction that improves convergence from \(\mathcal{O}(1/N_k^2)\) to \(\mathcal{O}(1/N_k^4)\).

  • Original tetrahedron method: O. Jepsen and O. K. Andersen, The electronic structure of h.c.p. ytterbium, Solid State Commun. 9, 1763 (1971).

  • Marzari–Vanderbilt cold smearing: N. Marzari, D. Vanderbilt, A. De Vita, and M. C. Payne, Thermal contraction and disordering of the Al(110) surface, Phys. Rev. Lett. 82, 3296 (1999). Introduces the cold smearing function with improved convergence properties for metals.

  • Mermin finite-temperature DFT: N. D. Mermin, Thermal properties of the inhomogeneous electron gas, Phys. Rev. 137, A1441 (1965). The formal foundation for Fermi–Dirac smearing as a finite-temperature variational principle.

  • General discussion: The interplay between \(\mathbf{k}\)-mesh density and smearing is discussed clearly in G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169 (1996), Section II.D.

Eigenvalue Solvers

Iterative Diagonalisation

The Eigenvalue Problem and Its Cost

At each SCF step, we must solve the Kohn–Sham eigenvalue equation:

\begin{equation}     \hat{h}_{\rm KS} |\phi_i\rangle = \epsilon_i |\phi_i\rangle,     \qquad     \hat{h}_{\rm KS} = -\frac{1}{2}\nabla^2 + V_{\rm eff}(\mathbf{r}),     \label{eq:KS-eigen} \end{equation}

for the lowest \(N_{\rm bands}\) eigenpairs \(\{(\epsilon_i, |\phi_i\rangle)\}\). In a plane-wave basis, \(\hat{h}_{\rm KS}\) is represented as a matrix of dimension \(N_{\rm PW} \times N_{\rm PW}\), where \(N_{\rm PW}\) is the number of plane waves within the energy cutoff \(E_{\rm cut}\). For a typical calculation:

A \(50\)-atom cell with \(E_{\rm cut} = 500\) eV gives \(N_{\rm PW} \sim 30,000\) per \(\mathbf{k}\)-point. A \(200\)-atom cell gives \(N_{\rm PW} \sim 120,000\). Full diagonalisation of an \(N_{\rm PW} \times N_{\rm PW}\) matrix costs \(\mathcal{O}(N_{\rm PW}^3)\) operations. For \(N_{\rm PW} = 30,000\), this is \(\sim 2.7 \times 10^{13}\) floating-point operations — already at the limit of practicality for a single diagonalisation at a single \(\mathbf{k}\)-point. Since the SCF loop requires \(\sim 20\)–\(50\) diagonalisations, and the BZ integral requires tens to hundreds of \(\mathbf{k}\)-points, full diagonalisation is completely intractable for any system of practical interest.
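
The order-of-magnitude arithmetic behind these numbers is easy to reproduce; a quick sanity check comparing the full-diagonalisation cost against the iterative subspace cost for the \(50\)-atom example:

```python
n_pw, n_bands = 30_000, 200          # typical 50-atom cell at 500 eV cutoff

full_diag = n_pw**3                  # dense diagonalisation, O(N_PW^3)
iterative = n_bands**2 * n_pw        # subspace method, O(N_bands^2 * N_PW)

print(f"full: {full_diag:.1e} flops, iterative: {iterative:.1e} flops, "
      f"ratio ~{full_diag // iterative}x")
```

The \(\sim 2 \times 10^4\) ratio per diagonalisation, multiplied over tens of SCF steps and \(\mathbf{k}\)-points, is what makes the iterative approach not merely faster but essential.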

The key observation is that we do not need all \(N_{\rm PW}\) eigenpairs — only the lowest \(N_{\rm bands}\), which is proportional to the number of electrons and hence to \(N_{\rm atom}\). For a \(50\)-atom cell, \(N_{\rm bands} \sim 200\), which is \(\ll N_{\rm PW}\). Iterative subspace methods exploit this by working in a small subspace of dimension \(m \sim 2\)–\(3 \times N_{\rm bands}\), reducing the cost from \(\mathcal{O}(N_{\rm PW}^3)\) to \(\mathcal{O}(N_{\rm bands}^2 \cdot N_{\rm PW})\) per SCF step — a reduction of several orders of magnitude.

The Davidson Algorithm

The Davidson algorithm (Davidson, 1975; adapted for plane-wave DFT by Wood and Zunger, 1985) is the most widely used iterative eigensolver in electronic structure codes. It belongs to the family of Krylov subspace methods but uses a preconditioner to accelerate convergence dramatically compared to the Lanczos algorithm.

The basic idea. Start with a set of \(N_{\rm bands}\) trial vectors \(\{|v_1\rangle, \ldots, |v_{N_{\rm bands}}\rangle\}\) — typically the eigenvectors from the previous SCF step, or random vectors for the first step. These span an initial subspace \(\mathcal{V}\). The algorithm proceeds iteratively:

Step 1 — Project and diagonalise. Form the projected Hamiltonian and overlap matrices within the current subspace:

\begin{equation}     H_{ij}^{\rm sub} = \langle v_i | \hat{h}_{\rm KS} | v_j \rangle, \qquad     S_{ij}^{\rm sub} = \langle v_i | v_j \rangle.     \label{eq:davidson-project} \end{equation}

Diagonalise the small \(m \times m\) generalised eigenvalue problem \(\mathbf{H}^{\rm sub}\mathbf{c} = \epsilon\,\mathbf{S}^{\rm sub}\mathbf{c}\) to obtain approximate eigenvalues \(\{\epsilon_i^{\rm sub}\}\) and eigenvectors expressed as linear combinations of the subspace basis: \(|\tilde{\phi}_i\rangle = \sum_j c_{ij} |v_j\rangle\). This small diagonalisation costs \(\mathcal{O}(m^3)\), which is negligible.

Step 2 — Compute residuals. For each approximate eigenpair, compute the residual:

\begin{equation}     |r_i\rangle = (\hat{h}_{\rm KS} - \epsilon_i^{\rm sub})|\tilde{\phi}_i\rangle.     \label{eq:davidson-residual} \end{equation}

If \(\|r_i\| \leq \delta\) for all desired eigenpairs, the algorithm has converged. The threshold \(\delta\) is typically set to \(10^{-2}\)–\(10^{-3}\) times the SCF energy convergence criterion, since the eigenvalues need not be converged more tightly than the density at each SCF step.

Step 3 — Precondition and expand. The residual \(|r_i\rangle\) points in the direction of steepest descent toward the true eigenvector, but its components at different \(\mathbf{G}\) converge at very different rates. High-\(G\) components (large kinetic energy) have large eigenvalue separations and converge quickly; low-\(G\) components are nearly degenerate and converge slowly. The kinetic energy preconditioner corrects this disparity by approximately inverting the dominant part of the Hamiltonian:

\begin{equation}     \boxed{|t_i\rangle = \hat{P}\,|r_i\rangle, \qquad     P(\mathbf{G}) = \frac{1}{\frac{1}{2}|\mathbf{k}+\mathbf{G}|^2 - \epsilon_i^{\rm sub}}.}     \label{eq:preconditioner} \end{equation}

In reciprocal space, this simply divides each plane-wave component of the residual by the difference between its kinetic energy and the current eigenvalue estimate. The effect is to rotate the residual toward the true eigenspace: components that are far from the eigenvalue (large \(|\frac{1}{2}G^2 - \epsilon_i|\)) are suppressed, while components near the eigenvalue are amplified. This is the same principle as Kerker preconditioning for density mixing (Chapter 8, Section 8.2), applied here to the eigenvalue problem rather than the fixed-point problem.

In practice, a regularisation is applied to prevent the denominator from vanishing when \(\frac{1}{2}G^2 \approx \epsilon_i\). The standard choice is:

\begin{equation}     P(\mathbf{G}) = \frac{1}{\max\!\left(\frac{1}{2}|\mathbf{k}+\mathbf{G}|^2 - \epsilon_i^{\rm sub},\; \epsilon_{\rm ref}\right)},     \label{eq:preconditioner-regularised} \end{equation}

where \(\epsilon_{\rm ref}\) is a small positive constant (typically a fraction of the bandwidth).
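
In reciprocal space, applying \eqref{eq:preconditioner-regularised} is an element-wise division. A minimal numpy sketch (the array values and the \(\epsilon_{\rm ref}\) choice are illustrative, not code defaults):

```python
import numpy as np

def precondition(residual_G, kin_G, eps_i, eps_ref=0.1):
    """Regularised kinetic-energy preconditioner, eq. (preconditioner-regularised).

    residual_G : residual plane-wave coefficients for one band
    kin_G      : kinetic energies |k+G|^2 / 2 of each plane wave
    eps_i      : current eigenvalue estimate for this band
    eps_ref    : floor preventing the denominator from vanishing
    """
    denom = np.maximum(kin_G - eps_i, eps_ref)
    return residual_G / denom

# High-G components (kinetic energy far above eps_i) are suppressed,
# components near the eigenvalue are amplified:
kin = np.array([0.1, 1.0, 10.0, 100.0])
t = precondition(np.ones(4), kin, eps_i=0.2)
```

Here the \(G\)-component with kinetic energy \(100\) is damped by four orders of magnitude relative to the near-resonant one, which is exactly the rebalancing the preconditioner is meant to achieve.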

The preconditioned residuals \(\{|t_i\rangle\}\) are orthogonalised against the existing subspace and added as new basis vectors: \(\mathcal{V} \leftarrow \mathcal{V} \oplus \mathrm{span}\{|t_i\rangle\}\). The subspace dimension grows from \(m\) to \(m + N_{\rm bands}\).

Step 4 — Restart. When the subspace becomes too large (typically \(m \gt 3 N_{\rm bands}\)), the oldest basis vectors are discarded, keeping only the current best eigenvector approximations. This prevents the subspace from growing indefinitely while retaining the most useful information.

Convergence. With preconditioning, Davidson converges quadratically near the solution — each iteration roughly doubles the number of correct digits in the eigenvalues. In practice, \(3\)–\(5\) Davidson iterations per SCF step suffice to converge the eigenpairs to the required accuracy.

Block Davidson. Rather than processing one eigenpair at a time, the block variant processes all \(N_{\rm bands}\) eigenpairs simultaneously. The subspace expansion adds \(N_{\rm bands}\) preconditioned residuals at each step. This is more efficient because the Hamiltonian action \(\hat{h}_{\rm KS}|v\rangle\) — the most expensive operation, requiring FFTs between real and reciprocal space — can be batched over all vectors in the block.
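
Steps 1–4 can be condensed into a small dense-matrix demonstration. This is a toy, not the production algorithm: numpy stands in for the plane-wave machinery, and the matrix diagonal plays the role of the kinetic energy in the preconditioner.

```python
import numpy as np

def block_davidson(H, n_bands, max_iter=100, tol=1e-8, eps_ref=1e-2):
    """Minimal block Davidson for the lowest n_bands eigenpairs of a
    symmetric matrix H, with a diagonal preconditioner (a toy stand-in
    for the kinetic-energy preconditioner of a plane-wave code)."""
    n = H.shape[0]
    diag = np.diag(H)
    # Step 0: initial subspace = unit vectors on the smallest diagonal entries
    V = np.eye(n)[:, np.argsort(diag)[:n_bands]]
    for _ in range(max_iter):
        V, _ = np.linalg.qr(V)                    # orthonormalise the subspace
        eps, C = np.linalg.eigh(V.T @ H @ V)      # Step 1: project + diagonalise
        eps, C = eps[:n_bands], C[:, :n_bands]
        Phi = V @ C                               # Ritz (approximate eigen-) vectors
        R = H @ Phi - Phi * eps                   # Step 2: residuals
        if np.linalg.norm(R, axis=0).max() < tol:
            break
        denom = np.maximum(np.abs(diag[:, None] - eps[None, :]), eps_ref)
        T = R / denom                             # Step 3: precondition
        # Step 4: restart with the best Ritz vectors once m exceeds 3*n_bands
        V = np.hstack([Phi, T]) if V.shape[1] > 3 * n_bands else np.hstack([V, T])
    return eps, Phi
```

On a diagonally dominant test matrix this converges in a handful of iterations, illustrating the near-quadratic convergence discussed above.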

RMM-DIIS

The Residual Minimisation Method — Direct Inversion in the Iterative Subspace (RMM-DIIS) (Pulay, 1980; Wood and Zunger, 1985; Kresse and Furthmüller, 1996) takes a different approach: instead of expanding a subspace and projecting, it directly minimises the residual norm \(\|(\hat{h}_{\rm KS} - \epsilon_i)|\phi_i\rangle\|^2\) for each band individually, using the DIIS extrapolation from Section 8.2.

The method. For each band \(i\), maintain a history of trial vectors and their residuals from the last \(m\) iterations: \(\{|\phi_i^{(n)}\rangle, |r_i^{(n)}\rangle\}\). Construct the optimal linear combination:

\begin{equation}     |\bar{\phi}_i\rangle = \sum_{k=1}^{m} c_k |\phi_i^{(n-m+k)}\rangle,     \label{eq:rmm-trial} \end{equation}

where the coefficients \(\{c_k\}\) minimise the residual norm, exactly as in the density-mixing DIIS of Chapter 8:

\begin{equation}     \min_{\{c_k\}} \left\|\sum_k c_k |r_i^{(n-m+k)}\rangle\right\|^2, \qquad \sum_k c_k = 1.     \label{eq:rmm-minimise} \end{equation}

This leads to the same small linear system with the overlap matrix \(B_{jk} = \langle r_i^{(j)} | r_i^{(k)} \rangle\) as in equation \eqref{eq:DIIS-system}. The optimal trial vector is then updated by applying the preconditioned residual:

\begin{equation}     |\phi_i^{(n+1)}\rangle = |\bar{\phi}_i\rangle + \hat{P}\,|\bar{r}_i\rangle.     \label{eq:rmm-update} \end{equation}
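
The constrained minimisation \eqref{eq:rmm-minimise} reduces to a small linear solve once the constraint is imposed via a Lagrange multiplier; a minimal numpy sketch:

```python
import numpy as np

def diis_coefficients(residuals):
    """Coefficients c_k minimising ||sum_k c_k r_k||^2 subject to
    sum_k c_k = 1, via the standard Lagrange-multiplier linear system.
    The last unknown is the multiplier and is discarded."""
    m = len(residuals)
    B = np.zeros((m + 1, m + 1))
    for j in range(m):
        for k in range(m):
            B[j, k] = np.dot(residuals[j], residuals[k])  # B_jk = <r_j|r_k>
    B[:m, m] = B[m, :m] = 1.0   # constraint row/column
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    return np.linalg.solve(B, rhs)[:m]

# Two orthogonal residuals of equal norm are weighted equally:
c = diis_coefficients([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
```

The same bordered-matrix structure appears in the density-mixing DIIS of equation \eqref{eq:DIIS-system}; only the vectors whose overlaps fill \(B\) differ.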

Key difference from Davidson: RMM-DIIS does not require the trial vectors to be mutually orthogonal at every step. Orthogonalisation is the most expensive operation in Davidson (\(\mathcal{O}(N_{\rm bands}^2 \cdot N_{\rm PW})\) per step), and RMM-DIIS avoids it almost entirely. This makes RMM-DIIS faster per iteration — typically \(30\)–\(50\%\) cheaper than Davidson for the same system.

The risk: subspace collapse. Without explicit orthogonalisation, two or more bands can converge to the same eigenstate, losing one or more eigenpairs from the solution. This is the subspace collapse problem. It manifests as a sudden jump in the total energy when a previously tracked eigenstate “disappears” and is replaced by a duplicate. The risk is highest when eigenvalues are nearly degenerate (e.g., \(d\)-bands in transition metals) or when the initial guess is poor.

Practical mitigation. Occasional re-orthogonalisation — typically every \(5\)–\(10\) RMM-DIIS steps — prevents subspace collapse at minimal cost. Most implementations also monitor the overlap matrix of the trial vectors and trigger re-orthogonalisation when near-linear-dependence is detected.

The hybrid strategy. The most effective approach, used as the default in several major codes, combines both methods: start with a few Davidson steps to establish a well-separated, orthogonal initial subspace, then switch to RMM-DIIS for the remaining iterations where the eigenpairs are already approximately correct and only need refinement. The Davidson phase provides robustness; the RMM-DIIS phase provides speed.

Conjugate Gradient Minimisation

An alternative to the subspace methods above is direct minimisation of the KS total energy functional with respect to the orbital coefficients, using a conjugate gradient (CG) algorithm. Rather than solving the eigenvalue equation \eqref{eq:KS-eigen} iteratively, CG treats the orbitals as variational parameters and minimises:

\begin{equation}     E[\{\phi_i\}] = \sum_i f_i \langle\phi_i|\hat{h}_{\rm KS}|\phi_i\rangle + E_{\rm H}[\rho] + E_{\rm xc}[\rho] - \int V_{\rm H}\rho\,d\mathbf{r} - \int V_{\rm xc}\rho\,d\mathbf{r},     \label{eq:CG-functional} \end{equation}

subject to the orthonormality constraints \(\langle\phi_i|\phi_j\rangle = \delta_{ij}\).

CG is the most robust of the three methods: it is guaranteed to decrease the total energy at every step (it is a true minimisation, not an eigenvalue iteration), and it cannot suffer from subspace collapse. The price is speed — CG converges more slowly than Davidson or RMM-DIIS because it does not exploit the eigenvalue structure of the problem. It is typically \(2\)–\(5 \times\) slower per SCF step.

CG is the method of choice when: the other methods fail to converge (e.g., systems with extremely flat potential energy surfaces or near-degenerate states), when memory is limited (CG requires minimal subspace storage), or for the first few SCF steps of a difficult system where a reliable initial eigenspace must be established before switching to faster methods.

Computational Scaling

The dominant costs at each SCF step are:

| Operation | Scaling | Bottleneck for |
|---|---|---|
| Hamiltonian action \(\hat{h}_{\rm KS}\|\phi\rangle\) (FFTs) | \(\mathcal{O}(N_{\rm PW} \log N_{\rm PW})\) per band | Small systems |
| Orthogonalisation (Davidson) | \(\mathcal{O}(N_{\rm bands}^2 \cdot N_{\rm PW})\) | Medium systems (\(100\)–\(500\) atoms) |
| Subspace diagonalisation | \(\mathcal{O}(N_{\rm bands}^3)\) | Large systems (\(\gt 500\) atoms) |

The crossover between regimes depends on the system and the parallelisation strategy. For most DFT calculations on \(50\)–\(200\) atom cells, the orthogonalisation and subspace diagonalisation costs dominate, giving an effective scaling of \(\mathcal{O}(N^3)\) with system size \(N\). This cubic wall is the fundamental limit of conventional KS-DFT and is the motivation for linear-scaling (\(\mathcal{O}(N)\)) methods, which require localised basis sets and are not yet standard for plane-wave calculations.

Parallelisation. The three levels of parallelism available in a plane-wave DFT calculation are: over \(\mathbf{k}\)-points (trivially parallel — no communication), over bands (moderate communication for orthogonalisation), and over plane waves (requires FFT communication). The optimal decomposition depends on the system: \(\mathbf{k}\)-point parallelism is most efficient for small unit cells with dense \(\mathbf{k}\)-meshes; band/plane-wave parallelism is needed for large supercells with few \(\mathbf{k}\)-points.

Code Notes: VASP and Quantum ESPRESSO Parameters

VASP:

| Parameter | Controls | Values |
|---|---|---|
| `ALGO` | Diagonalisation algorithm | `Normal`: Davidson; `Fast`: Davidson + RMM-DIIS (default); `All`: Davidson every step; `Conjugate`: CG; `Damped`: damped velocity CG |
| `NBANDS` | Number of bands to compute | Auto (typically \(N_{\rm elec}/2 + \) padding); increase for metals or unoccupied-state properties |
| `NELM` | Maximum SCF iterations | \(40\)–\(200\) (default \(60\)) |
| `NELMIN` | Minimum SCF iterations | \(2\)–\(6\) (default \(2\)); increase for difficult systems |
| `NELMDL` | Non-self-consistent initial steps | \(-5\) to \(-12\); negative = auto; builds initial subspace without updating potential |

`ALGO = Fast` (the default) runs Davidson for the first few iterations to establish the eigenspace, then switches to RMM-DIIS. This is the optimal choice for most systems. Use `ALGO = All` (Davidson every step) for difficult cases where RMM-DIIS causes instabilities — particularly magnetic metals, systems with nearly degenerate states, or the first calculation on a new system type where robustness matters more than speed. Use `ALGO = Damped` for molecular dynamics at elevated temperatures where the eigenspectrum changes significantly between ionic steps.

Quantum ESPRESSO:

| Parameter | Controls | Values |
|---|---|---|
| `diagonalization` | Algorithm | `'david'`: Davidson (default); `'cg'`: conjugate gradient; `'ppcg'`: parallel-preconditioned CG |
| `diago_david_ndim` | Davidson subspace dimension per band | \(2\)–\(8\) (default \(2\)); increase for difficult convergence |
| `diago_full_acc` | Force full accuracy at every SCF step | `.true.` or `.false.` (default); use `.true.` for phonons and response properties |
| `nbnd` | Number of bands | Auto; increase for metals or when unoccupied states are needed |
| `electron_maxstep` | Maximum SCF iterations | \(100\)–\(200\) (default \(100\)) |

In QE, `diagonalization = 'david'` is the default and works well for most systems. The `'cg'` option is more robust but slower, and is recommended when Davidson fails to converge. The `'ppcg'` option (parallel-preconditioned CG) is designed for massively parallel runs where band parallelism is dominant.

Further Reading
  • Davidson algorithm (original): E. R. Davidson, The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices, J. Comput. Phys. 17, 87 (1975). The foundational paper for subspace iteration with preconditioning.

  • Adaptation to DFT: D. M. Wood and A. Zunger, A new method for diagonalising large matrices, J. Phys. A 18, 1343 (1985). Adapts the Davidson method to the plane-wave pseudopotential framework.

  • VASP implementation (Davidson + RMM-DIIS): G. Kresse and J. Furthmüller, Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set, Phys. Rev. B 54, 11169 (1996). Sections III and IV describe the block Davidson and RMM-DIIS implementations in detail, including the hybrid strategy.

  • Conjugate gradient for DFT: M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias, and J. D. Joannopoulos, Iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients, Rev. Mod. Phys. 64, 1045 (1992). The standard reference for CG-based total energy minimisation.

  • Linear-scaling methods: S. Goedecker, Linear scaling electronic structure methods, Rev. Mod. Phys. 71, 1085 (1999). Review of \(\mathcal{O}(N)\) approaches that circumvent the cubic scaling wall.


Convergence Diagnostics and Practical Workflow

Reading the SCF Output

A converging SCF calculation produces a characteristic pattern in its output: the total energy decreases monotonically (or nearly so) toward a limiting value, the energy change per iteration \(\Delta E^{(n)} = E^{(n)} - E^{(n-1)}\) decreases geometrically, and the density residual norm \(\|R^{(n)}\|\) decreases in parallel. Understanding what each output quantity means is essential for diagnosing problems.

Total energy. The KS total energy \(E^{(n)}\) at iteration \(n\) is not variational with respect to the density — it is only variational at self-consistency. During the SCF cycle, \(E^{(n)}\) can increase between iterations (especially in the first few steps when the density is far from self-consistent). A sustained increase after \(\sim 10\) iterations is a sign of divergence.

The Harris–Foulkes energy. An alternative energy estimate evaluated from the input density \(\rho_{\rm in}^{(n)}\) rather than the self-consistent density for the XC and Hartree contributions; its error relative to the true ground-state energy is second order in the deviation from self-consistency. The difference between the Harris–Foulkes energy and the KS energy provides a measure of how far the current density is from self-consistency. When both agree to within the convergence threshold, the SCF cycle has converged.

Energy change \(\Delta E\). This is the primary convergence indicator. A converged calculation shows \(|\Delta E|\) decreasing by roughly one order of magnitude every \(2\)–\(5\) iterations (for Pulay/Broyden mixing). If \(|\Delta E|\) stalls at a plateau or oscillates, see the diagnostic table in the Part II overview.

Density residual. Some codes report the norm of the density residual \(\|R^{(n)}\| = \|\rho_{\rm out}^{(n)} - \rho_{\rm in}^{(n)}\|\) or the RMS change in the charge density. This should decrease in tandem with \(|\Delta E|\). A large residual at apparent energy convergence indicates that the energy is accidentally stationary (a saddle point of the SCF map) rather than truly self-consistent — increase the maximum number of SCF iterations and tighten the convergence threshold.
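
A convergence check should therefore test both quantities together; a minimal sketch (function name and threshold values are illustrative, not any code's API):

```python
def scf_converged(delta_e, res_norm, e_tol=1e-7, rho_tol=1e-5):
    """Require BOTH the energy change and the density residual below
    threshold: an energy that is accidentally stationary while the
    density residual is still large is not self-consistent."""
    return abs(delta_e) < e_tol and res_norm < rho_tol

scf_converged(5e-8, 2e-6)   # genuinely self-consistent
scf_converged(5e-8, 3e-3)   # accidental energy stationarity: not converged
```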

Convergence Thresholds

The required SCF convergence depends on the property being computed:

| Target property | Required energy convergence | Rationale |
|---|---|---|
| Qualitative band structure | \(10^{-4}\) eV | Band positions change by \(\leq 10\) meV |
| Total energy differences | \(10^{-6}\) eV | Formation energies accurate to \(\sim 1\) meV/atom |
| Magnetic moments | \(10^{-6}\) eV | Moment stable to \(\leq 0.01\,\mu_B\) |
| Forces (geometry optimisation) | \(10^{-7}\) eV | Forces accurate to \(\sim 1\) meV/Å |
| Stress tensor / elastic constants | \(10^{-8}\) eV | Stress accurate to \(\sim 0.1\) kbar |
| Phonons (DFPT / finite differences) | \(10^{-8}\) eV | Force constants require tight force convergence |

Why tighter is not always better. Beyond a certain point, tightening the SCF threshold does not improve results because other sources of error dominate: finite \(\mathbf{k}\)-mesh, finite \(E_{\rm cut}\), pseudopotential transferability, and the XC functional approximation itself. A calculation converged to \(10^{-10}\) eV is not more accurate than one converged to \(10^{-7}\) eV if the \(\mathbf{k}\)-mesh introduces \(1\,\)meV/atom errors. Additionally, round-off errors in double-precision arithmetic set a floor at \(\sim 10^{-12}\)–\(10^{-14}\) eV for typical system sizes.

Common Failure Modes and Remedies

The Part II overview provided a quick-reference table. Here we expand the most important failure modes with their underlying causes and systematic remedies.

Failure 1: Total energy oscillates without decreasing. The density is sloshing between iterations — the mixing parameter \(\alpha\) is too large for the system’s dielectric response (Chapter 8, Section 8.1). Reduce \(\alpha\) by a factor of \(2\)–\(5\). If Kerker preconditioning is not active, enable it. If the system is a metal or small-gap semiconductor, ensure that an appropriate smearing is used (Section 9.1) — sharp occupations exacerbate charge sloshing by making the density response discontinuous.
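
The remedy — small \(\alpha\) plus Kerker preconditioning — amounts to damping the long-wavelength part of the residual before mixing. A reciprocal-space sketch of one such mixing step (the grid and parameter values are illustrative; the Kerker form \(G^2/(G^2+k_0^2)\) is the one from Chapter 8):

```python
import numpy as np

def kerker_mix(rho_in_G, rho_out_G, G2, alpha=0.1, k0=1.5):
    """One Kerker-preconditioned linear mixing step in reciprocal space.
    G2 holds |G|^2 for each Fourier component of the density."""
    residual = rho_out_G - rho_in_G
    damping = G2 / (G2 + k0**2)   # -> G^2/k0^2 for small G: sloshing modes damped
    return rho_in_G + alpha * damping * residual

# The small-G (long-wavelength) component of the residual is mixed in
# far more cautiously than the large-G component:
G2 = np.array([0.01, 100.0])
mixed = kerker_mix(np.zeros(2), np.ones(2), G2)
```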

Failure 2: SCF converges slowly then stalls. The DIIS/Broyden subspace is contaminated by poor early iterates. Increase the number of initial non-DIIS steps (simple mixing with small \(\alpha\)) to bring the density into the basin of convergence before the subspace extrapolation takes over. Alternatively, increase the subspace size to allow more history, or clear the subspace and restart the mixing.

Failure 3: Spin channels oscillate out of phase. In spin-polarised calculations, the up and down densities can trade magnitude between iterations, producing an apparent antiferromagnetic oscillation that prevents convergence. This usually indicates either an incorrect initial magnetic configuration or a genuine competition between ferromagnetic and antiferromagnetic solutions. Fix the initial moments to match the expected ground state. Use a more robust diagonaliser (Davidson rather than RMM-DIIS) for the first \(\sim 10\) SCF steps to prevent subspace collapse from coupling to the spin oscillation. If both FM and AFM solutions converge, compare their total energies to determine the ground state.

Failure 4: Eigenvalue solver converges to wrong states. RMM-DIIS has lost track of one or more bands due to subspace collapse. The symptom is a sudden jump in the total energy mid-SCF, often accompanied by a change in the number of occupied states or the magnetic moment. Switch to Davidson for the problematic calculation. If the problem persists, increase the number of bands to provide a larger buffer between occupied and unoccupied states.

Failure 5: Energy converged but forces are noisy. The SCF threshold is too loose for the force accuracy required. Forces depend on the density through the Hellmann–Feynman theorem (Chapter 10), and small density errors that do not affect the total energy can produce significant force errors. Tighten the energy convergence by one to two orders of magnitude. Also check that the smearing is appropriate — tetrahedron method forces contain discontinuities for metals (Section 9.1).

Worked Example: SCF Convergence for bcc Fe

To tie the chapter together, consider the SCF convergence of ferromagnetic bcc Fe — a system that exercises every piece of machinery developed in Chapters 8 and 9.

System: bcc Fe, \(a = 2.87\,\)Å, \(1\)-atom primitive cell, ferromagnetic with initial moment \(5.0\,\mu_B\), \(E_{\rm cut} = 500\,\)eV, \(12 \times 12 \times 12\) \(\mathbf{k}\)-mesh.

Smearing: Methfessel–Paxton \(N = 1\) with \(\sigma = 0.2\,\)eV — this is a metal, so the tetrahedron method would give discontinuous forces, and Gaussian smearing would require the \(\sigma \to 0\) extrapolation.

Mixing: Pulay/DIIS with Kerker preconditioning (\(\alpha = 0.05\), \(k_0 \sim 1.5\,\)Å\(^{-1}\)) — a small \(\alpha\) because Fe is a magnetic metal with both charge-sloshing and spin-oscillation risks.

Diagonalisation: Davidson for the first \(5\) steps to establish a clean eigenspace, then RMM-DIIS for speed.

Expected convergence behaviour:

Iterations \(1\)–\(5\): The total energy drops rapidly (by \(\sim 1\) eV) as the initial atomic superposition density relaxes toward the crystalline density. The magnetic moment adjusts from \(5.0\,\mu_B\) toward the converged value of \(\sim 2.2\,\mu_B\). Davidson diagonalisation ensures the correct band ordering is established.

Iterations \(5\)–\(15\): The energy change decreases geometrically, roughly one order of magnitude every \(3\)–\(4\) iterations. The DIIS extrapolation begins to accelerate convergence. The switch to RMM-DIIS at iteration \(5\) speeds up each step by \(\sim 40\%\).

Iterations \(15\)–\(25\): Final convergence to \(10^{-7}\) eV. The magnetic moment is stable at \(2.17\)–\(2.22\,\mu_B\) (depending on the functional). The entropy correction \(TS \approx 0.3\,\)meV/atom confirms that \(\sigma = 0.2\,\)eV is acceptable.

What would go wrong with poor settings: With simple mixing (\(\alpha = 0.3\), no Kerker), the same system would oscillate for \(\gt 200\) iterations without converging — the charge-sloshing modes at small \(\mathbf{G}\) are barely damped. With Gaussian smearing (rather than MP \(N=1\)), the total energy would converge but the \(\sigma \to 0\) extrapolation would introduce an additional \(\sim 5\,\)meV error. With RMM-DIIS from the start (no initial Davidson), there is a risk of subspace collapse in the \(d\)-bands, which are nearly five-fold degenerate and prone to mixing.

Code Notes: VASP and Quantum ESPRESSO Parameters — Full Reference

VASP — Complete SCF parameter list:

| Parameter | Controls | Recommended | Notes |
|---|---|---|---|
| `ALGO` | Diagonalisation | `Fast` (default) | `All` for difficult systems |
| `NELM` | Max SCF iterations | \(60\)–\(200\) | Increase for slabs, defects |
| `NELMIN` | Min SCF iterations | \(2\)–\(6\) | Prevents premature exit |
| `NELMDL` | Initial non-DIIS steps | \(-5\) to \(-12\) | Negative = auto |
| `EDIFF` | Energy convergence (eV) | \(10^{-6}\)–\(10^{-8}\) | Match to target property |
| `ISMEAR` | Smearing method | \(1\) (MP) for metals; \(-5\) (tet) for DOS | See Section 9.1 |
| `SIGMA` | Smearing width (eV) | \(0.1\)–\(0.2\) | Check `EENTRO` \(\lt 1\) meV/atom |
| `AMIX` | Charge mixing \(\alpha\) | \(0.02\)–\(0.4\) | Small for metals |
| `BMIX` | Kerker \(k_0\) | \(0.5\)–\(3.0\) | Enable for metals |
| `AMIX_MAG` | Spin mixing \(\alpha\) | \(2\)–\(4 \times\) `AMIX` | Often larger than charge mixing |
| `BMIX_MAG` | Kerker for spin | \(\sim\) `BMIX` | |
| `MAXMIX` | Mixing history size | \(20\)–\(60\) | Increase for slabs |
| `IMIX` | Mixing scheme | \(4\) (Broyden, default) | |
| `NBANDS` | Number of bands | Auto + padding | Increase for metals |
| `LREAL` | Real-space projection | `.FALSE.` for small cells | `.TRUE.` for \(\gt 20\) atoms |

Quantum ESPRESSO — Complete SCF parameter list:

| Parameter | Controls | Recommended | Notes |
|---|---|---|---|
| `diagonalization` | Algorithm | `'david'` (default) | `'cg'` for robustness |
| `electron_maxstep` | Max SCF iterations | \(100\)–\(200\) | |
| `conv_thr` | Energy convergence (Ry) | \(10^{-8}\)–\(10^{-10}\) | Note: Ry units |
| `occupations` | Occupation scheme | `'smearing'` or `'tetrahedra_opt'` | |
| `smearing` | Smearing function | `'methfessel-paxton'` for metals | |
| `degauss` | Smearing width (Ry) | \(0.005\)–\(0.02\) | \(1\,\)Ry \(= 13.6\,\)eV |
| `mixing_beta` | Mixing parameter | \(0.1\)–\(0.7\) | Small for metals |
| `mixing_mode` | Preconditioning | `'local-TF'` for metals | |
| `mixing_ndim` | Mixing history | \(4\)–\(12\) | |
| `nbnd` | Number of bands | Auto | Increase for metals |
| `diago_david_ndim` | Davidson subspace | \(2\)–\(8\) | |

Outlook

Chapters 8 and 9 together provide the complete numerical machinery for solving the Kohn–Sham equations self-consistently: density mixing drives the outer SCF loop to convergence (Chapter 8), while smearing regularises the BZ integrals and iterative diagonalisation makes the eigenvalue problem tractable (this chapter). With a converged, self-consistent ground-state density in hand, the next step is to extract physical observables.

The most immediate observable beyond the total energy is the force on each atom — the negative gradient of the total energy with respect to atomic positions. The Hellmann–Feynman theorem provides an exact expression for this force, but its practical evaluation requires careful treatment of the Pulay corrections that arise when the basis set depends on atomic positions. Forces, in turn, enable geometry optimisation (finding the equilibrium structure) and ab initio molecular dynamics (following the time evolution of the nuclei on the DFT potential energy surface). These topics are the subject of the next chapter.