Cover

Density Functional Theory

Lecture Note

v1.0 (2026)

Preamble

Density Functional Theory (DFT) is one of the most powerful theoretical frameworks for studying the electronic structure of atoms, molecules, and solids. It is widely used in condensed matter physics, quantum chemistry, and materials science to predict structural, electronic, magnetic, and optical properties of materials.

Course Curriculum

This course bridges the gap between rigorous quantum mechanical theory and state-of-the-art computational materials science, moving from fundamental equations to real-world software applications.

Part I: Theoretical Foundations of DFT

Starting from the quantum many-body problem, we detail the exact mathematical proofs behind the Hohenberg–Kohn theorems and the derivation of the effective single-particle Kohn–Sham equations. With the exact theory established, you will systematically unpack the approximations required for real-world materials. This includes navigating the hierarchy of exchange-correlation functionals (from LDA to hybrids), constructing exact basis sets and pseudopotentials (such as PAW methods), and treating magnetic ordering via Spin-Polarised DFT. Finally, we address the self-interaction error in strongly correlated \(d\) and \(f\) electron systems using the DFT+U (Hubbard) method.

Part II: Numerical Implementations & Practical DFT

Theoretical knowledge is only half the battle. The second half of the course opens the black box of industry-standard DFT codes like VASP and Quantum ESPRESSO. You will learn how to translate continuous mathematics into robust iterative algorithms by mastering Self-Consistent Field (SCF) Convergence—including how to diagnose charge sloshing in metals and apply advanced density mixing schemes (Pulay/DIIS, Broyden). We also cover rigorous Brillouin Zone Integration, detailing \(k\)-point sampling and the specific smearing methods (Methfessel-Paxton, tetrahedron) required to stabilise metallic Fermi surfaces.

By the end of this course, you will possess the practical expertise to run, troubleshoot, and analyze high-accuracy simulations for contemporary materials science research.

Meet your instructor

Prerequisites

Participants should be familiar with:

Basic Quantum Mechanics
Thermodynamics and Statistical Mechanics
Solid State Physics
Elementary linear algebra and differential equations

Course Objectives

By the end of this course, participants will:

Understand the theoretical foundations of Density Functional Theory
Learn the derivation and significance of the Hohenberg–Kohn theorems
Understand the Kohn–Sham formalism and its practical implementation
Become familiar with common exchange–correlation functionals
Learn how to perform and analyze DFT calculations
Explore advanced extensions such as DFT+U and time-dependent DFT

Target Audience

This course is designed for:

Graduate students in Physics, Chemistry, and Materials Science
Researchers entering the field of computational materials science
Students interested in electronic structure theory

Introduction

The density functional theory (DFT) has established itself as the primary tool to calculate the electronic structure of materials.

At its heart, the method reformulates the many-electron problem as a variational problem over the electron density \(\rho(\mathbf{r})\) — a function of three spatial coordinates — rather than the many-body wavefunction \(\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_N)\), which lives in \(3N\) dimensions. This chapter sets up the problem the way a student typically meets it: starting from the Schrödinger equation, invoking the Born–Oppenheimer approximation, and then working through the Hartree and Hartree–Fock (HF) mean-field approximations using the variational principle as the central tool. By the end we will see exactly where these methods succeed, where they fail, and why DFT — built on a different fundamental variable — is the natural next step.

A note on units. Sections 1.1–1.3 retain the explicit \(\hbar^2/2m_e\) and \(e^2\) factors familiar from undergraduate quantum mechanics. From Chapter 3 onward we switch to Hartree atomic units (\(\hbar = m_e = e = 4\pi\varepsilon_0 = 1\)), in which \(-\tfrac{\hbar^2}{2m_e}\nabla^2 \to -\tfrac{1}{2}\nabla^2\) and the Coulomb repulsion is simply \(1/|\mathbf{r}_i - \mathbf{r}_j|\). Conversions: \(1,\mathrm{Ha} = 27.2114,\mathrm{eV} = 2,\mathrm{Ry}\); \(1,\mathrm{bohr} = 0.52918,\text{Å}\).

Schrödinger’s Equation: Revisited for Many-Particle Systems

The Schrödinger equation is the governing equation for quantum mechanical systems:

\[ \left[-\frac{\hbar^2}{2m_e}\nabla^2+V(\mathbf{r})\right]\Psi(\mathbf{r})=E\Psi(\mathbf{r}) \]

The problem grows intractable when many particles are present. The many-body Hamiltonian for \(N\) nuclei and \(n\) electrons is:

\begin{align*} \hat{H} =&-\frac{\hbar^2}{2m}\sum_{i}^n\nabla^2_i+ \frac{1}{2}\sum_{i\neq j}\frac{e^2}{|\mathbf{r}_i-\mathbf{r}_j|} & \qquad \mbox{Electron} \newline & -\frac{\hbar^2}{2M_j}\sum_j^N\nabla^2_j + \frac{1}{2}\sum_{i\neq j}\frac{Z_iZ_j e^2}{|{R_i-R_j}|} & \mbox{Nuclei}\newline & -\sum_{i,j}\frac{Z_j e^2}{|\mathbf{r}_i-\mathbf{R}_j|} & \mbox{Electron-Nucleus} \end{align*}

The time-independent Hamiltonian \(\hat{H}=H(p,q)\) is a function of the generalised momenta and coordinates \((p,q)\). Solving it becomes increasingly difficult for systems with more than one electron.

Even for a simple atom like oxygen, with 8 electrons, and taking a coarse grid of just 10 points in each spatial dimension, we need \(10^{24}\) data points. On modern computers, each real number requires 64 bits of memory. To store the wavefunction of a single oxygen atom we would need \(64\times 10^{24}\) bits \(\approx 8\times 10^{24}\) bytes \(\approx 10^{17}\) MB. (To appreciate how large this is, consider that the age of the universe is roughly \(4\times 10^{17}\) seconds — the numbers are comparable!)

So we need systematic approximations to solve the problem.

Born–Oppenheimer Approximation

Because protons are roughly \(10^3\) times heavier than electrons, they move far more slowly under the same force. This motivates the Born–Oppenheimer (BO) approximation: we freeze the nuclear degrees of freedom and solve only for the electronic structure at fixed nuclear positions.

Validity condition. The BO approximation is justified when the electronic excitation energy \(\hbar\omega_{\rm el}\) greatly exceeds the nuclear vibrational energy \(\hbar\omega_{\rm nuc}\):

\[ \omega_{\rm el} \gg \omega_{\rm nuc}, \qquad \frac{\omega_{\rm nuc}}{\omega_{\rm el}} \sim \left(\frac{m_e}{M}\right)^{1/2} \approx 10^{-2}. \]

This ratio is small for all atoms, which is why BO works so broadly. It breaks down near conical intersections — geometries where two electronic potential energy surfaces become degenerate — leading to non-adiabatic dynamics that are important in photochemistry and certain charge-transfer processes. These are beyond the scope of ground-state DFT.

Under this approximation:

The nuclear kinetic energy term vanishes: \(\frac{\hbar^2}{2M_j}\sum_j^N\nabla^2_j = 0\).
The nucleus–nucleus repulsion \(\frac{1}{2}\sum_{i\neq j}\frac{Z_iZ_j e^2}{|{R_i-R_j}|}\) becomes a constant that shifts the total energy but does not affect the electronic wavefunction.
The electron–nucleus attraction \(\sum_{i,j}\frac{Z_j e^2}{|\mathbf{r}_i-\mathbf{R}_j|}\) depends only on the electronic coordinates \(r\) (nuclei are fixed parameters).

The resulting electronic Hamiltonian is:

\begin{align*} \hat{H}_e =&-\frac{\hbar^2}{2m_e}\sum_{i=1}^n\nabla^2_i & \mbox{Kinetic energy of electrons}\newline &+ \frac{1}{2}\sum_{i\neq j}\frac{e^2}{|\mathbf{r}_i-\mathbf{r}_j|} &\mbox{Electron--electron repulsion}\newline & +V_{\rm ext}(\mathbf{r}) & \mbox{Electron--nucleus attraction} \end{align*}

The electron–electron repulsion term is still a many-body interaction — handling it requires further approximation. Before tackling it, we need the central tool that powers every approximation scheme we will meet: the variational principle.

The Variational Principle

The variational principle is the workhorse of approximate quantum mechanics. Every method in this chapter (Hartree, Hartree–Fock), the entire Kohn–Sham framework of Chapter 3, and the proof of the Hohenberg–Kohn theorems in Chapter 2 all rest on it.

For any normalised trial state \(|\tilde{\Psi}\rangle\) in the Hilbert space of the system, the expectation value of the Hamiltonian is an upper bound on the ground-state energy:

\[ E_0 \leq \langle\tilde{\Psi}|\hat{H}|\tilde{\Psi}\rangle, \]

with equality if and only if \(|\tilde{\Psi}\rangle = |\Psi_0\rangle\), the true ground state.

Proof (by completeness)

Expand the trial state in the complete orthonormal set of energy eigenstates \({|\Psi_n\rangle}\) of \(\hat{H}\) with eigenvalues \(E_n\) (ordered \(E_0 \leq E_1 \leq E_2 \leq \cdots\)):

\[ |\tilde{\Psi}\rangle = \sum_n c_n |\Psi_n\rangle, \qquad \sum_n |c_n|^2 = 1. \]

Then:

\[ \langle\tilde{\Psi}|\hat{H}|\tilde{\Psi}\rangle = \sum_n |c_n|^2 E_n \geq \sum_n |c_n|^2 E_0 = E_0. \]

Equality holds only when \(c_n = 0\) for all \(n\) with \(E_n > E_0\), i.e. when \(|\tilde\Psi\rangle\) is the (possibly degenerate) ground state. \(\blacksquare\)

Strategy. Choose a parametrised family of trial states \(|\tilde{\Psi}_\alpha\rangle\) depending on one or more parameters \(\alpha\), compute \(E(\alpha) = \langle\tilde{\Psi}_\alpha|\hat{H}|\tilde{\Psi}_\alpha\rangle\), and minimise over \(\alpha\). The result is the best estimate of \(E_0\) accessible within the chosen family. The narrower the family, the looser the bound; the richer the family, the closer to \(E_0\) — at greater computational cost.

Worked Example: Hydrogen Atom with a Gaussian Trial Wavefunction

The hydrogen atom is the natural test case because the exact answer is known (\(E_0^{\rm exact} = -\tfrac{1}{2},\mathrm{Ha} = -13.606,\mathrm{eV}\), with ground state \(\psi_0(r) = \pi^{-1/2}e^{-r}\) in atomic units). The exact ground state is exponential, not Gaussian, so a Gaussian trial wavefunction cannot reach \(E_0\) exactly — making it a clean example of how the variational principle bounds without saturating.

Take the normalised Gaussian trial wavefunction in atomic units:

\[ \tilde{\psi}_\alpha(\mathbf{r}) = \left(\frac{2\alpha}{\pi}\right)^{3/4} e^{-\alpha r^2}, \qquad \alpha > 0, \]

with \(\alpha\) the width parameter. The hydrogen-atom Hamiltonian (atomic units, \(Z=1\)) is \(\hat{H} = -\tfrac{1}{2}\nabla^2 - 1/r\).

Kinetic energy. Using \(\langle\hat{T}\rangle = \tfrac{1}{2}\int|\nabla\tilde\psi|^2,d\mathbf{r}\) and the standard Gaussian integral:

\[ \langle\hat{T}\rangle_\alpha = \frac{3\alpha}{2}. \]

Potential energy. Performing the angular integration and using \(\int_0^\infty r,e^{-2\alpha r^2},dr = 1/(4\alpha)\):

\[ \langle\hat{V}\rangle_\alpha = -\left(\frac{2\alpha}{\pi}\right)^{3/2}! \int e^{-2\alpha r^2},\frac{1}{r},d\mathbf{r} = -2\sqrt{\frac{2\alpha}{\pi}}. \]

Total energy.

\[ E(\alpha) = \frac{3\alpha}{2} - 2\sqrt{\frac{2\alpha}{\pi}}. \]

Minimisation. Setting \(dE/d\alpha = 0\):

\[ \frac{3}{2} - \sqrt{\frac{2}{\pi}}\cdot\frac{1}{\sqrt{\alpha}} = 0 \quad\Longrightarrow\quad \alpha_{\rm opt} = \frac{8}{9\pi} \approx 0.2829. \]

Substituting back:

\[ E_{\rm var} = E(\alpha_{\rm opt}) = -\frac{4}{3\pi},\mathrm{Ha} \approx -0.4244,\mathrm{Ha} \approx -11.5,\mathrm{eV}. \]

Compare with the exact \(E_0 = -0.5,\mathrm{Ha} = -13.6,\mathrm{eV}\): the Gaussian trial gives a variational bound \(E_{\rm var} > E_0\), with an error of \(\sim 15%\). The bound is loose because the Gaussian lacks the cusp at \(r=0\) and the exponential tail of the exact eigenstate — but the principle worked: minimising the energy expectation value over a parameter family produces the best estimate within that family.

This is precisely how every method below works. The Hartree approximation restricts the trial wavefunction to a product of single-particle orbitals; Hartree–Fock restricts it to a Slater determinant; Kohn–Sham DFT (Chapter 3) restricts the search to densities representable by a non-interacting reference system. Each is a variational approximation over a different restricted class — and the rigour of the bound \(E_0 \leq \langle\tilde\Psi|\hat{H}|\tilde\Psi\rangle\) is what guarantees the methods are systematically improvable.

Mean-Field Approximations

Hartree Approximation

After invoking the Born–Oppenheimer approximation, the remaining challenge is the electron–electron repulsion \(\frac{1}{2}\sum_{i\neq j}\frac{e^2}{|\mathbf{r}_i-\mathbf{r}_j|}\). The simplest treatment sets this to zero (non-interacting electrons), but this gives grossly incorrect results for most properties.

A better strategy is to assume that each electron does not interact with individual others, but instead moves in the mean (average) field generated by all the other electrons combined. This is the Hartree approximation, and it is variational: we restrict the trial wavefunction to a simple product of single-particle orbitals,

\[ \tilde\Psi_{\rm H}(\mathbf{r}_1,\ldots,\mathbf{r}_N) = \phi_1(\mathbf{r}_1)\phi_2(\mathbf{r}_2)\cdots\phi_N(\mathbf{r}_N), \]

and minimise \(\langle\tilde\Psi_{\rm H}|\hat{H}|\tilde\Psi_{\rm H}\rangle\) with respect to each orbital \(\phi_i\) subject to the normalisation constraint \(\langle\phi_i|\phi_i\rangle = 1\).

The electron–electron repulsion felt by electron \(i\) is replaced by the electrostatic potential of the continuous charge density \(\rho(\mathbf{r}’)\) of all other electrons:

\begin{align*} \frac{1}{2}\sum_{i\neq j}\frac{e^2}{|\mathbf{r}_i-\mathbf{r}_j|} &\approx \int \frac{e^2\,\rho(\mathbf{r}')}{|\mathbf{r}_i-\mathbf{r}'|}\,d\mathbf{r}' \equiv V_{\rm H}(\mathbf{r}_i), \\ E_{\rm H}[\rho] &= \frac{1}{2}\iint\frac{e^2\,\rho(\mathbf{r})\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}'. \end{align*}

where \(\rho(\mathbf{r}‘) = \sum_j |\phi_j(\mathbf{r}’)|^2\) is the electron number density. The quantity \(E_{\rm H}[\rho]\) is the Hartree energy — the classical electrostatic self-energy of the charge distribution. The corresponding single-electron potential \(V_{\rm H}(\mathbf{r}_i)\) enters the effective single-particle Hamiltonian.

The full Hartree Hamiltonian then reads:

\begin{equation*} \hat{H}_{\rm Hartree} = -\frac{\hbar^2}{2m_e}\sum_{i=1}^n\nabla^2_i + \sum_i V_{\rm H}(\mathbf{r}_i) + V_{\rm ext}(\mathbf{r}). \end{equation*}

The Hartree approximation has two glaring deficiencies. First, the product wavefunction is not antisymmetric, so it violates the Pauli principle. Second, the Hartree potential includes the electron’s own contribution to \(\rho\) — a self-interaction error. Both are fixed (the first exactly, the second only partially) by Hartree–Fock.

Hartree–Fock Approximation

The Hartree approximation is incomplete because it ignores the fact that electrons are fermions: identical spin-\(\tfrac{1}{2}\) particles that obey the Pauli exclusion principle, which requires the many-body wavefunction to be antisymmetric under the exchange of any two electrons.

Fock incorporated this antisymmetry into the Hartree scheme, yielding the Hartree–Fock (HF) approximation. The variational ansatz is broadened from a product to a Slater determinant, which automatically satisfies antisymmetry:

\[ \Psi_{\rm HF}(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N)= \frac{1}{\sqrt{N!}}\begin{vmatrix} \psi_1(\mathbf{x}_1) &\cdots &\psi_1(\mathbf{x}_N)\\ \vdots & \ddots & \vdots\\ \psi_N(\mathbf{x}_1) &\cdots &\psi_N(\mathbf{x}_N)\\ \end{vmatrix} \]

where \(\mathbf{x}_i = (\mathbf{r}_i, s_i)\) combines spatial and spin coordinates and the \(\psi_i\) are orthonormal spin orbitals. Swapping any two columns (i.e. exchanging two electrons) changes the sign of the determinant, ensuring the required antisymmetry.

Evaluating \(\langle\Psi_{\rm HF}|\hat{H}_e|\Psi_{\rm HF}\rangle\) using the Slater determinant gives the Hartree–Fock energy functional:

\begin{align} E_{\rm HF} =& \sum_i \langle\psi_i|\hat{h}_0|\psi_i\rangle + \frac{1}{2}\sum_{i,j}\left[J_{ij} - K_{ij}\right], \\ \hat{h}_0 =& -\frac{\hbar^2}{2m_e}\nabla^2 + V_{\rm ext}, \end{align}

where \(\hat{h}_0\) is the one-electron core Hamiltonian and the two-electron integrals are:

\begin{align} J_{ij} =& \iint \frac{e^2\,|\psi_i(\mathbf{x})|^2\,|\psi_j(\mathbf{x}')|^2}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{x}\,d\mathbf{x}', &&\text{Coulomb integral} \\ K_{ij} =& \iint \frac{e^2\,\psi_i^*(\mathbf{x})\psi_j^*(\mathbf{x}')\psi_j(\mathbf{x})\psi_i(\mathbf{x}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{x}\,d\mathbf{x}'. &&\text{Exchange integral} \label{eq:Kij} \end{align}

The integral \(K_{ij}\) — the exchange integral — has no classical analogue. It arises from the antisymmetry of the determinant: when two electrons of the same spin “exchange” their labels, the resulting cross-term in \(|\Psi_{\rm HF}|^2\) lowers the energy by keeping same-spin electrons spatially separated. \(K_{ij}\) vanishes when spin orbitals \(\psi_i\) and \(\psi_j\) have opposite spins. We will encounter \(K_{ij}\) again in Chapter 4 as the exact exchange term that hybrid DFT functionals add back into the energy.

Applying the variational principle (minimise \(E_{\rm HF}\) over the orbitals subject to orthonormality \(\langle\psi_i|\psi_j\rangle = \delta_{ij}\)) yields the canonical Hartree–Fock equations:

\begin{equation} \hat{F}\psi_i(\mathbf{x}) = \epsilon_i\,\psi_i(\mathbf{x}), \label{eq:HF} \end{equation}

where the Fock operator is:

\[ \hat{F} = \hat{h}_0 + \sum_j\left[\hat{J}_j - \hat{K}_j\right], \]

with the Coulomb operator \(\hat{J}_j\psi_i = \left(\int |\psi_j(\mathbf{x}‘)|^2/|\mathbf{r}-\mathbf{r}’|,d\mathbf{x}‘\right)\psi_i\) and the exchange operator \(\hat{K}_j\psi_i = \left(\int \psi_j^*(\mathbf{x}’)\psi_i(\mathbf{x}‘)/|\mathbf{r}-\mathbf{r}’|,d\mathbf{x}’\right)\psi_j\). The exchange operator is non-local: it mixes the values of \(\psi_i\) at different points through \(\psi_j\), in contrast to the local multiplicative Coulomb operator \(\hat{J}_j\). This non-locality is what makes evaluating exact exchange computationally expensive — a recurring issue in Chapter 4.

The HF equations are non-linear (each \(\hat{F}\) depends on the \(\psi_j\) it produces) and must be solved self-consistently, anticipating the SCF loop structure we will see for the Kohn–Sham equations in Chapter 3.

What HF captures and what it misses. HF includes exchange exactly (by construction) and treats kinetic energy exactly within the single-determinant ansatz. It does not capture electron correlation — the part of the interaction missed when the wavefunction is restricted to a single Slater determinant. Correlation energy is defined as:

\[ E_{\rm corr} \equiv E_{\rm exact} - E_{\rm HF}, \]

and \(E_{\rm corr} < 0\) (HF is an upper bound by the variational principle). Although small in absolute terms (\(\sim 1\)–\(2\) eV for typical molecules), correlation is precisely the scale of chemical bond energies, so a quantitative theory must include it. Capturing correlation cheaply is the central motivation for DFT and its exchange-correlation functional \(E_{\rm xc}\).

Worked Example: The Helium Atom

The helium atom (two electrons, nucleus with charge \(Z = 2\)) is the simplest system where electron–electron interaction matters. We work it through in detail because the variational treatment reveals both the power and the limitation of mean-field methods, and the gap to experiment is exactly the correlation energy that DFT will need to recover.

In atomic units the Hamiltonian is:

\[ \hat{H} = -\frac{1}{2}(\nabla_1^2 + \nabla_2^2) - \frac{Z}{r_1} - \frac{Z}{r_2} + \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}. \]

Trial wavefunction. Take both electrons in a \(1s\)-type orbital with effective nuclear charge \(\alpha\) (rather than \(Z = 2\)) — the parameter that captures screening: each electron sees a nucleus partially shielded by the other. Spin-singlet, so the spatial part is symmetric:

\[ \tilde\Psi(\mathbf{r}1,\mathbf{r}2) = \phi\alpha(r_1)\phi\alpha(r_2), \qquad \phi_\alpha(r) = \left(\frac{\alpha^3}{\pi}\right)^{1/2}e^{-\alpha r}. \]

This is a product wavefunction (Hartree-style); the singlet spin state makes it equivalent to a properly antisymmetrised closed-shell Slater determinant for this two-electron system.

Kinetic energy. Each electron contributes \(\langle -\tfrac{1}{2}\nabla^2\rangle_\phi = \alpha^2/2\) (standard hydrogenic result), so:

\[ \langle T\rangle = \alpha^2. \]

Electron–nucleus attraction. Each electron contributes \(\langle -Z/r\rangle_\phi = -Z\alpha\):

\[ \langle V_{en}\rangle = -2Z\alpha. \]

Electron–electron repulsion. This is the non-trivial integral:

\[ \langle V_{ee}\rangle = \iint \frac{|\phi_\alpha(r_1)|^2 |\phi_\alpha(r_2)|^2}{|\mathbf{r}_1-\mathbf{r}_2|},d\mathbf{r}_1,d\mathbf{r}_2. \]

Evaluating using the multipole expansion of \(1/|\mathbf{r}_1-\mathbf{r}_2|\) (only the \(\ell = 0\) term survives for spherical \(\phi\)) gives the classical result:

\[ \langle V_{ee}\rangle = \frac{5\alpha}{8}. \]

This is the key integral: it tells us how much energy two electrons in the same hydrogenic orbital cost each other.

Total energy. Summing:

\[ E(\alpha) = \alpha^2 - 2Z\alpha + \frac{5\alpha}{8}. \]

Minimisation. Setting \(dE/d\alpha = 0\):

\[ 2\alpha - 2Z + \frac{5}{8} = 0 \quad\Longrightarrow\quad \boxed{\alpha_{\rm opt} = Z - \frac{5}{16}.} \]

For \(Z = 2\) (helium), \(\alpha_{\rm opt} = 27/16 = 1.6875\) — each electron effectively sees a nuclear charge reduced from 2 to 1.6875, the screening by the other electron quantified.

Substituting back:

\[ E_{\rm var}(He) = -\left(Z - \frac{5}{16}\right)^2 = -\left(\frac{27}{16}\right)^2 \approx -2.848,\mathrm{Ha} \approx -77.49,\mathrm{eV}. \]

Comparison with experiment and full HF.

Method	Ground-state energy	Comment
Non-interacting (\(\alpha = Z\))	\(-2.750,\mathrm{Ha} = -74.83,\mathrm{eV}\)	Ignores \(V_{ee}\) entirely
Hartree (product, no exchange)	\(\sim -77.5,\mathrm{eV}\)	Mean-field, no Pauli
Variational (single-\(\zeta\), \(\alpha\) optimised)	\(-77.49,\mathrm{eV}\)	The calculation above
Full HF (numerically exact orbital)	\(-77.87,\mathrm{eV}\)	Best single-determinant
Experiment	\(-79.01,\mathrm{eV}\)	True ground-state energy

The correlation energy for helium is therefore:

\[ E_{\rm corr} = E_{\rm exact} - E_{\rm HF} \approx -1.14,\mathrm{eV}. \]

Though small in absolute terms, this gap is the magnitude of typical chemical bond energies — and it is precisely what was lost when we restricted the wavefunction to a product (or single-determinant) form. The exact two-electron ground state has the electrons correlated: when one is near the nucleus, the other tends to be on the opposite side, an angular correlation no product wavefunction can capture. Recovering this correlation energy cheaply — without returning to the full \(3N\)-dimensional wavefunction — is the entire motivation for DFT.

Thomas–Fermi: The First Density Functional

Before Kohn and Sham, a natural question had already been asked: can we write the entire energy as a functional of \(\rho(\mathbf{r})\) alone, with no reference to orbitals? The Thomas–Fermi (TF) model (Thomas 1927; Fermi 1928) attempts exactly this, and although it fails quantitatively, the failure is instructive — it tells us precisely what the Kohn–Sham construction of Chapter 3 must restore.

The TF idea is to use the uniform electron gas (UEG) kinetic energy as a local functional of the density. For a UEG of density \(\rho\), the kinetic energy per unit volume is (derivation deferred to Chapter 4):

\[ t^{\rm UEG}(\rho) = \frac{3}{10}(3\pi^2)^{2/3},\rho^{5/3}. \]

The TF ansatz applies this locally:

\begin{equation} T^{\rm TF}[\rho] = \frac{3}{10}(3\pi^2)^{2/3}\int \rho(\mathbf{r})^{5/3}\,d\mathbf{r}. \label{eq:TFkin} \end{equation}

Combined with the classical Hartree energy and the external potential interaction, the TF total energy functional is:

\[ E^{\rm TF}[\rho] = T^{\rm TF}[\rho] + \int V_{\rm ext}\rho,d\mathbf{r} + E_{\rm H}[\rho]. \]

Minimising with respect to \(\rho\) subject to \(\int\rho,d\mathbf{r} = N\) yields the Thomas–Fermi equation:

\[ \frac{5}{3}\cdot\frac{3}{10}(3\pi^2)^{2/3}\rho^{2/3}(\mathbf{r}) + V_{\rm ext}(\mathbf{r}) + V_{\rm H}(\mathbf{r}) = \mu, \]

where \(\mu\) is the chemical potential (Lagrange multiplier). This is a self-consistent equation for the density alone — no orbitals, no wavefunctions, no \(3N\)-dimensional object.

Why it fails. The TF model is qualitatively wrong about several things:

No binding. Teller (1962) proved that TF theory cannot describe molecular binding: the energy of a molecule is always greater than the sum of its atomic constituents. This is a theorem, not a numerical artifact.
Wrong asymptotic behaviour. The TF density decays as \(r^{-6}\) at large distances rather than exponentially as required for bound states.
No shell structure. The TF density is smooth and structureless; it does not show the atomic shell oscillations seen in Hartree–Fock calculations or experiment.
Kinetic energy errors of \(\sim 10%\). Even with the Dirac exchange correction added (TF+D, also called Thomas–Fermi–Dirac), atomic energies are off by tens of eV.

The root cause is the kinetic energy \eqref{eq:TFkin}. The UEG-based local form is fine for slowly varying densities, but real atomic and molecular densities vary rapidly — there is no length scale on which a UEG is a good local approximation. The shell structure, the cusps at nuclei, and the exponential tails are all features that a local kinetic energy functional cannot reproduce.

The lesson for DFT. TF taught us two things:

Encoding all the physics in \(\rho(\mathbf{r})\) alone is in principle possible — this is what the Hohenberg–Kohn theorems (Chapter 2) will prove rigorously, three decades after TF.
Writing the kinetic energy as an explicit functional of \(\rho\) is hard. Any local approximation misses essential physics.

The Kohn–Sham scheme of Chapter 3 sidesteps the second problem with a single elegant trick: reintroduce orbitals, but only enough to compute the kinetic energy of a fictitious non-interacting reference system exactly. Everything else — the corrections that distinguish the real interacting system from the fictitious one — is packaged into the exchange-correlation functional \(E_{\rm xc}[\rho]\), which is small enough that even crude approximations work reasonably well.

The Path to DFT

We can now state the situation precisely. The methods of this chapter have given us a hierarchy of approximations to the many-electron problem, each anchored by the variational principle:

Method	Trial wavefunction	Variables	Captures	Misses
Hartree	Product \(\phi_1\cdots\phi_N\)	\(N\) orbitals	Mean field	Exchange, correlation, antisymmetry
Hartree–Fock	Single Slater determinant	\(N\) orbitals	+ Exchange, antisymmetry	Correlation
Thomas–Fermi	Density only	\(\rho(\mathbf{r})\)	Variational over density	Kinetic energy, shell structure
Full CI / exact	All determinants	\(3N\)-dim. \(\Psi\)	Everything	(Intractable)

The trade-off is stark. Hartree–Fock scales as \(\mathcal{O}(N^4)\) and captures exchange exactly but misses correlation. The exact many-body wavefunction lives in \(3N\) dimensions and is intractable for any \(N > 10\) or so. Thomas–Fermi reduces the problem to a function of three variables but loses essential physics in the kinetic energy.

What if we could change the fundamental variable? Instead of the wavefunction \(\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_N)\) — a function in \(3N\) dimensions — what if the ground-state energy is exactly a functional of the electron density \(\rho(\mathbf{r})\) alone, a function in only three dimensions?

This is not a vague aspiration. The Hohenberg–Kohn theorems of Chapter 2 prove it rigorously: every ground-state property of an interacting many-electron system in any external potential is uniquely determined by its ground-state density \(\rho_0(\mathbf{r})\). The many-body wavefunction is a functional of the density, \(\Psi_0 = \Psi[\rho_0]\), and so is the energy:

\[ E_0 = E[\rho_0] = \min_\rho E[\rho]. \]

The minimisation is over densities — a function of three variables — not over wavefunctions in \(3N\) dimensions. This is the promise of DFT: combine the rigour of the variational principle with the computational tractability of the Thomas–Fermi reduction, while avoiding TF’s pitfalls.

The two theorems of Chapter 2 establish this rigorously. The Kohn–Sham construction of Chapter 3 turns the theorems into an actual computational method by reintroducing orbitals just for the kinetic energy — solving Thomas–Fermi’s worst problem — and packaging everything else into a small correction term \(E_{\rm xc}[\rho]\) that we will spend Chapter 4 learning to approximate.

Hohenberg-Kohn (HK) Theorems

The Hohenberg–Kohn (HK) theorems, proposed in 1964 by Pierre Hohenberg and Walter Kohn, are the fundamental theoretical pillars of Density Functional Theory (DFT). They establish that the ground-state properties of a many-electron system are uniquely determined by its electron density, i.e. \(\rho(r)\), rather than by the complex many-body wavefunction \(\Psi\).

This is a remarkable result: it implies that a function of three spatial variables \(\rho(r)\) — rather than a wavefunction in \(3N\) dimensions — contains all the information needed to determine the ground state of an \(N\)-electron system. The HK theorems provide the rigorous justification for replacing the many-body Schrödinger equation with a variational problem over electron densities, forming the conceptual basis for all practical implementations of DFT.

Assumptions

We consider a system of \(N\) interacting electrons subject to an external scalar potential \(V_{\rm ext}(r)\), at zero temperature, in its ground state, with no time-dependent fields. The many-body Hamiltonian is:

\[ \begin{align} \hat{H} =& \hat{T}+\hat{V}_{ee}+\hat{V}_{ext}\\ \hat{H}\Psi =& E\Psi \end{align} \]

where

\(\hat{T}\) is the kinetic energy operator of the electrons,
\(\hat{V}_{ee}\) is the Coulomb electron–electron repulsion operator,
\(\hat{V}_{ext} = \sum_{i=1}^N V_{\rm ext}(r_i) \) is the external potential due to the nuclei or any applied field — it is the same for all systems considered in the proofs below.

The electron density \(\rho(\mathbf{r})\) is the number density of electrons at position \(\mathbf{r}\) — not the probability density \(|\Psi|^2\) itself, but the marginal single-particle density obtained by integrating out the coordinates of all other electrons and multiplying by \(N\) to account for the fact that any of the \(N\) indistinguishable electrons can be found at \(\mathbf{r}\):

\begin{equation} \rho(\mathbf{r}) = N\int |\Psi(\mathbf{r},\mathbf{r}_2,\ldots,\mathbf{r}_N)|^2\,d\mathbf{r}_2\,d\mathbf{r}_3\cdots d\mathbf{r}_N. \label{eq:charge-wf} \end{equation}

By construction \(\int \rho(\mathbf{r}),d\mathbf{r} = N\): the density integrates to the total number of electrons, not unity. \(\rho(\mathbf{r})\) is the central variable of DFT.

The First Hohenberg–Kohn Theorem

For any system of interacting particles in an external potential \(V_{\rm ext}(r)\), the external potential is uniquely determined — up to a constant — by the ground-state electron density \\(\rho_0(r)\\).

Because \(V_{\rm ext}(r)\) uniquely determines \(\hat{H}\) (and hence all eigenstates and eigenvalues), the ground-state density \(\rho_0(r)\) alone determines everything about the system. The wavefunction \(\Psi_0\) is therefore a functional of \(\rho_0\), written \(\Psi_0 = \Psi[\rho_0]\).

Proof (by contradiction)

Assume the contrary: suppose there exist two different external potentials \(V_{\rm ext}(\mathbf{r})\) and \(V’_{\rm ext}(\mathbf{r})\), differing by more than a constant,

\begin{equation} V_{\rm ext}(\mathbf{r}) \neq V'_{\rm ext}(\mathbf{r}) + \text{const.}, \end{equation}

that nevertheless yield the same ground-state density: \(\rho(\mathbf{r})=\rho’(\mathbf{r}) \equiv \rho(\mathbf{r})\).

Let the two Hamiltonians and their ground states be:

\[ \begin{aligned} \hat{H} & = \hat{T} + \hat{V}_{\rm ee} + \hat{V}_{\rm ext}, & \hat{H}|\Psi\rangle = E_0|\Psi\rangle, \\ \hat{H}’ & = \hat{T} + \hat{V}_{\rm ee} + \hat{V}‘_{\rm ext}, & \hat{H}’|\Psi’\rangle = E’_0|\Psi’\rangle. \end{aligned} \]

We apply the Rayleigh–Ritz variational principle (Chapter 1.4), which states that the true ground-state energy minimises the energy expectation value: any trial state \(|\tilde{\Psi}\rangle \neq |\Psi_0\rangle\) satisfies \(\langle\tilde{\Psi}|\hat{H}|\tilde{\Psi}\rangle > E_0\).

Using \(\Psi’\) as a trial state for \(\hat{H}\):

\begin{align} E_0 \leq& \langle \Psi' | \hat{H} | \Psi' \rangle = \langle \Psi' | \hat{H}' | \Psi' \rangle + \langle \Psi' | \hat{H} - \hat{H}' | \Psi' \rangle \notag\\\\ =& \; E'\_0 + \int \left[ V\_{\rm ext}(\mathbf{r}) - V'\_{\rm ext}(\mathbf{r}) \right] \rho(\mathbf{r}) \, d\mathbf{r}. \label{eq:ineq1} \end{align}

Symmetrically, using \(\Psi\) as a trial state for \(\hat{H}’\):

\begin{equation} E'\_0 \leq \langle \Psi | \hat{H}' | \Psi \rangle = E\_0 + \int \left[ V'\_{\rm ext}(\mathbf{r}) - V\_{\rm ext}(\mathbf{r}) \right] \rho(\mathbf{r}) \, d\mathbf{r}. \label{eq:ineq2} \end{equation}

Adding the two inequalities \eqref{eq:ineq1} and \eqref{eq:ineq2}:

\[ \begin{align} E_0 + E’_0 & \leq E’_0 + E_0 + \int \underbrace{\left[ V_{\rm ext} - V’_{\rm ext} + V’_{\rm ext} - V_{\rm ext} \right]}_{=,0} \rho(\mathbf{r}) , d\mathbf{r}, \\ E_0 + E’_0 & \leq E_0 + E’_0. \end{align} \]

This is a contradiction. Our assumption must therefore be false: no two different external potentials (differing by more than a constant) can share the same ground-state density. The map \(\rho_0(r) \mapsto V_{\rm ext}(r)\) is therefore one-to-one (up to a constant). \(\blacksquare\)

Consequence: the universal energy functional

Since \(\rho_0 \to V_{\rm ext} \to \hat{H} \to \Psi_0\), the ground-state energy can be written as a functional of the density:

\[ \begin{align*} E[\rho] =& F[\rho]+\int V_{\rm ext}(r)\rho(r),dr\\ F[\rho] =& \langle\Psi[\rho]|, \hat{T}+\hat{V}_{ee},| \Psi[\rho]\rangle \end{align*} \]

The quantity \(F[\rho]\) is called the universal functional of the density. It is “universal” because it depends only on the electron density — not on the specific external potential — making it, in principle, applicable to any system of interacting electrons.

The Second Hohenberg–Kohn Theorem

The true ground-state density \(\rho_0(\mathbf{r})\) minimises the total energy functional \(E[\rho]\).

That is, for any trial density \(\tilde{\rho}(\mathbf{r})\) satisfying

\begin{equation} \tilde{\rho}(\mathbf{r}) \geq 0, \quad \int \tilde{\rho}(\mathbf{r}) \, d\mathbf{r} = N, \end{equation}

we have

\begin{equation} E_0 \leq E[\tilde{\rho}], \end{equation}

with equality if and only if \(\tilde{\rho}(\mathbf{r}) = \rho_0(\mathbf{r})\).

The functional is:

\begin{equation} E[\rho] = F[\rho] + \int V_{\rm ext}(\mathbf{r}) \rho(\mathbf{r}) \, d\mathbf{r}, \end{equation}

where \(F[\rho]\) is the universal functional defined above.

Proof of the 2nd HK Theorem

By the First HK Theorem, every v-representable density \(\tilde{\rho}\) uniquely determines an external potential \(\tilde{V}_{\rm ext}\) (up to a constant), and hence a Hamiltonian \(\tilde{H}\) and a ground-state wavefunction \(\tilde{\Psi}\). The ground-state energy functional is therefore well-defined:

\[ E[\tilde{\rho}] = \langle \tilde{\Psi}[\tilde{\rho}] | \hat{T} + \hat{V}_{ee} + \hat{V}_{\rm ext} | \tilde{\Psi}[\tilde{\rho}] \rangle = F[\tilde{\rho}] + \int V_{\rm ext}(\mathbf{r})\tilde{\rho}(\mathbf{r}) d\mathbf{r} \]

Now consider the true ground state \(|\Psi_0\rangle\) with density \(\rho_0\) and energy \(E_0\). For any trial density \(\tilde{\rho} \neq \rho_0\), the associated wavefunction \(\tilde{\Psi}\) is different from \(\Psi_0\) (again by the First HK Theorem). Since \(\Psi_0\) is the true ground state of \(\hat{H}\), the Rayleigh–Ritz variational principle (Chapter 1.4) gives:

\[ E_0 = \langle \Psi_0 | \hat{H} | \Psi_0 \rangle \leq \langle \tilde{\Psi} | \hat{H} | \tilde{\Psi} \rangle. \]

The right-hand side is precisely \(E[\tilde{\rho}]\) as defined above (using \(\hat{V}_{\rm ext}\) of the original system). Therefore:

\[ E_0 \leq E[\tilde{\rho}], \quad \forall, \tilde{\rho} \neq \rho_0, \]

with equality if and only if \(\tilde{\Psi} = \Psi_0\), i.e. \(\tilde{\rho} = \rho_0\). \(\blacksquare\)

The key logical chain is:

The First HK Theorem guarantees the map \(\tilde{\rho} \mapsto \tilde{\Psi}\) is one-to-one, so \(\tilde{\rho} \neq \rho_0 \Rightarrow \tilde{\Psi} \neq \Psi_0\).
The Rayleigh–Ritz principle then directly delivers the inequality \(E[\tilde{\rho}] \geq E_0\).

The second theorem transforms the problem of solving the many-body Schrödinger equation into a variational problem over electron densities — a function in three dimensions rather than \(3N\) dimensions. This is the promise of DFT.

Outlook

While the HK theorems establish that the ground-state density contains all physical information and that a universal energy functional \(E[\rho]\) exists, they do not provide an explicit form for the universal functional \(F[\rho]\). In particular, the kinetic energy \(T[\rho]\) and the electron–electron interaction \(V_{ee}[\rho]\) are unknown functionals of \(\rho\).

The Levy constrained-search formulation. The original HK derivation defines \(F[\rho]\) only for densities that are ground-state densities of some external potential — the \(v\)-representable densities. Not every smooth non-negative \(\rho\) with \(\int\rho,d\mathbf{r} = N\) satisfies this; characterising the set of \(v\)-representable densities is in fact an open problem. Levy (1979) and Lieb (1983) eliminated this concern by recasting the universal functional as a constrained search over wavefunctions:

\begin{equation} F[\rho] = \min_{\Psi \to \rho}\,\langle\Psi|\hat{T} + \hat{V}_{ee}|\Psi\rangle, \label{eq:Levy} \end{equation}

where the minimisation runs over all antisymmetric \(N\)-electron wavefunctions \(\Psi\) that yield the given density \(\rho(\mathbf{r})\) via equation \eqref{eq:charge-wf}. Two features make this definition powerful:

It is well-defined for any non-negative \(\rho\) integrating to \(N\) — the larger class of \(N\)-representable densities — sidestepping the \(v\)-representability question entirely.
The minimisation principle of the second HK theorem follows directly: for any trial density \(\tilde\rho\), \(E[\tilde\rho] = F[\tilde\rho] + \int V_{\rm ext}\tilde\rho,d\mathbf{r} \geq E_0\), because the inner search over \(\Psi\) is itself a Rayleigh–Ritz minimisation.

In the Kohn–Sham scheme (Chapter 3), an analogous non-interacting constrained search defines \(T_s[\rho] = \min_{\Psi \to \rho}\langle\Psi|\hat{T}|\Psi\rangle\) over Slater determinants yielding \(\rho\). The existence of a Slater determinant for any reasonable \(\rho\) is called non-interacting \(v\)-representability and is believed to hold for all physically relevant ground-state densities, though a general proof remains an open problem.

Practical takeaway. The universal functional \(F[\rho]\) is well-defined and unique, but its explicit form is unknown. Computing it accurately is the central challenge of DFT in practice. The Kohn–Sham equations, developed in the next chapter, provide a practical path forward by introducing a fictitious system of non-interacting electrons designed to reproduce the exact ground-state density — sidestepping the need to evaluate \(F[\rho]\) directly.

Kohn-Sham Equations

The Hohenberg–Kohn theorems establish that the ground-state density \(\rho_0(\mathbf{r})\) uniquely determines all ground-state properties, and that the total energy \(E[\rho]\) is minimised by \(\rho_0\). However, they say nothing about how to compute the universal functional \(F[\rho]\), and in particular how to evaluate the kinetic energy \(T[\rho]\) as a functional of the density alone. As we saw with the Thomas–Fermi model in Chapter 1.6, attempts to write \(T\) purely in terms of \(\rho\) using a local UEG ansatz lead to qualitative failures: no molecular binding, wrong asymptotic decay, no shell structure.

The Kohn–Sham (KS) formalism, introduced by Walter Kohn and Lu Jeu Sham in 1965, provides an elegant and highly accurate solution: introduce a fictitious system of non-interacting electrons that, by construction, reproduces the exact ground-state density of the true interacting system. By reintroducing single-particle orbitals — but only as a computational device for the kinetic energy — KS theory salvages the rigour of the Hohenberg–Kohn theorems while sidestepping the local-functional pitfalls of Thomas–Fermi. The result is a set of effective single-particle equations — the Kohn–Sham equations — that are computationally feasible while remaining formally exact.

Decomposition of the Energy Functional

The key idea is to split the unknown universal functional \(F[\rho]\) into parts that can be handled accurately and a remainder that must be approximated:

\begin{equation} E[\rho] = T_s[\rho] + E_{\mathrm{H}}[\rho] + E_{\mathrm{xc}}[\rho] + \int V_{\mathrm{ext}}(\mathbf{r})\, \rho(\mathbf{r}) \, d\mathbf{r}, \label{eq:KS-energy} \end{equation}

where each term has a specific physical meaning:

\(T_s[\rho]\) is the kinetic energy of a fictitious non-interacting system with the same density \(\rho(\mathbf{r})\) as the real interacting system. Unlike the full kinetic energy \(T[\rho]\), this quantity can be computed exactly from the single-particle orbitals (see below).
\(E_{\mathrm{H}}[\rho]\) is the classical Hartree energy — the electrostatic self-energy of the electron charge distribution, treating it as a classical continuous fluid:

\begin{equation} E_{\mathrm{H}}[\rho] = \frac{1}{2} \iint \frac{\rho(\mathbf{r})\, \rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, d\mathbf{r} \, d\mathbf{r}'. \end{equation}

This is the dominant part of the electron–electron repulsion and is treated exactly.

\(E_{\mathrm{xc}}[\rho]\) is the exchange-correlation (XC) energy. It collects everything that is missing from the above terms:

\begin{equation} E_{\mathrm{xc}}[\rho] = \underbrace{(T[\rho] - T_s[\rho])}_{\text{kinetic correlation}} + \underbrace{(V_{ee}[\rho] - E_{\mathrm{H}}[\rho])}_{\text{exchange + correlation}}. \end{equation}

This includes the quantum exchange energy (the \(K_{ij}\) integrals introduced in Chapter 1.5.2), all Coulomb correlation effects beyond the Hartree level, and the correction to the kinetic energy from the interacting nature of the true system. \(E_{\mathrm{xc}}\) is the only term that must be approximated; its exact form is unknown.

\(\int V_{\mathrm{ext}}(\mathbf{r}),\rho(\mathbf{r}),d\mathbf{r}\) is the interaction with the external potential (the nuclear attraction and any applied fields), treated exactly.

The Kohn–Sham ansatz reduces the DFT problem to finding a good approximation for \(E_{\mathrm{xc}}[\rho]\) — a functional of three-dimensional \(\rho\) alone — rather than solving the full \(3N\)-dimensional Schrödinger equation.

The Kohn–Sham Ansatz

The central assumption of the Kohn–Sham approach is:

There exists a system of non-interacting electrons — the KS reference system — whose ground-state density is identical to that of the true interacting system.

Non-interacting \(v\)-representability. This assumption is not trivially guaranteed. It requires that the interacting ground-state density \(\rho_0(\mathbf{r})\) can be reproduced as the ground-state density of some non-interacting system moving in a local effective potential \(V_{\rm eff}(\mathbf{r})\). Densities for which this holds are called non-interacting \(v\)-representable. While a general proof is lacking, no physically relevant counterexample has been found, and the assumption is accepted as holding for all practical ground-state densities encountered in electronic structure calculations. The Kohn–Sham equations derived below are exact under this assumption.

The density is represented in terms of Kohn–Sham orbitals \({\phi_i(\mathbf{r})}\):

\begin{equation} \rho(\mathbf{r}) = \sum_{i=1}^{N} |\phi_i(\mathbf{r})|^2. \label{eq:KS-density} \end{equation}

The kinetic energy of the non-interacting reference system is then computed exactly from these orbitals:

\begin{equation} T_s[\rho] = -\frac{1}{2}\sum_{i=1}^{N} \langle \phi_i | \nabla^2 | \phi_i \rangle = -\frac{1}{2}\sum_{i=1}^{N} \int \phi_i^*(\mathbf{r})\,\nabla^2\phi_i(\mathbf{r})\,d\mathbf{r}. \end{equation}

This is the key advantage over orbital-free approaches like Thomas–Fermi: by working with orbitals we recover the exact non-interacting kinetic energy at the cost of introducing \(N\) single-particle functions.

Derivation of the Kohn–Sham Equations

We seek the set of orbitals \({\phi_i}\) that minimises the total KS energy functional \eqref{eq:KS-energy}, subject to the constraint that the orbitals remain orthonormal:

\begin{equation} \langle \phi_i | \phi_j \rangle = \int \phi_i^*(\mathbf{r})\phi_j(\mathbf{r})\,d\mathbf{r} = \delta_{ij}. \end{equation}

This is a constrained minimisation problem, which we solve using the method of Lagrange multipliers. We construct the Lagrangian:

\begin{equation} \mathcal{L}[\{\phi_i\}] = E[\rho] - \sum_{i,j} \epsilon_{ij} \left( \langle \phi_i | \phi_j \rangle - \delta_{ij} \right), \end{equation}

where \(\epsilon_{ij}\) are the Lagrange multipliers enforcing orthonormality. Taking the functional derivative with respect to \(\phi_i^*(\mathbf{r})\) and setting it to zero yields:

\begin{equation} \frac{\delta E[\rho]}{\delta \phi_i^*(\mathbf{r})} = \sum_j \epsilon_{ij} \phi_j(\mathbf{r}). \end{equation}

Applying the chain rule to the left-hand side — \(T_s\) is an explicit functional of the orbitals, while \(E_{\rm H}\), \(E_{\rm xc}\), and \(E_{\rm ext}\) depend on the orbitals only through the density \(\rho(\mathbf{r}‘) = \sum_j|\phi_j(\mathbf{r}’)|^2\):

\begin{equation} \frac{\delta E[\rho]}{\delta \phi_i^*(\mathbf{r})} = \frac{\delta T_s}{\delta \phi_i^*(\mathbf{r})} + \int \frac{\delta (E_{\mathrm{H}} + E_{\mathrm{xc}} + E_{\mathrm{ext}})}{\delta \rho(\mathbf{r}')} \frac{\delta \rho(\mathbf{r}')}{\delta \phi_i^*(\mathbf{r})} \, d\mathbf{r}'. \end{equation}

Evaluating the individual derivatives:

\(\frac{\delta T_s}{\delta \phi_i^(\mathbf{r})} = -\frac{1}{2}\nabla^2 \phi_i(\mathbf{r})\) — standard variation of \(T_s = -\tfrac{1}{2}\sum_j \int \phi_j^\nabla^2\phi_j,d\mathbf{r}\).
\(\frac{\delta \rho(\mathbf{r}‘)}{\delta \phi_i^(\mathbf{r})} = \phi_i(\mathbf{r}’)\delta(\mathbf{r}-\mathbf{r}‘)\) — from \(\rho(\mathbf{r}’) = \sum_j |\phi_j(\mathbf{r}‘)|^2\), differentiating with respect to \(\phi_i^(\mathbf{r})\) picks out the \(j = i\) term and localises it at \(\mathbf{r}’ = \mathbf{r}\).
\(\frac{\delta E_{\mathrm{ext}}}{\delta \rho(\mathbf{r})} = V_{\mathrm{ext}}(\mathbf{r})\) — from \(E_{\rm ext} = \int V_{\rm ext}\rho,d\mathbf{r}\).
\(\frac{\delta E_{\mathrm{H}}}{\delta \rho(\mathbf{r})} = \int \frac{\rho(\mathbf{r}‘)}{|\mathbf{r}-\mathbf{r}’|},d\mathbf{r}’ \equiv V_{\mathrm{H}}(\mathbf{r})\) — differentiating the symmetric double integral and exploiting symmetry to cancel the factor \(\tfrac{1}{2}\).
\(\frac{\delta E_{\mathrm{xc}}}{\delta \rho(\mathbf{r})} \equiv V_{\mathrm{xc}}(\mathbf{r})\) — by definition; this is the exchange-correlation potential whose explicit form is the subject of Chapter 4.

This defines the Kohn–Sham effective potential:

\begin{equation} V_{\mathrm{eff}}(\mathbf{r}) = V_{\mathrm{ext}}(\mathbf{r}) + V_{\mathrm{H}}(\mathbf{r}) + V_{\mathrm{xc}}(\mathbf{r}). \end{equation}

The minimisation condition becomes:

\begin{equation} \left[ -\frac{1}{2}\nabla^2 + V_{\mathrm{eff}}(\mathbf{r}) \right] \phi_i(\mathbf{r}) = \sum_j \epsilon_{ij} \phi_j(\mathbf{r}). \end{equation}

Because the Hamiltonian \(\hat{h}{\mathrm{KS}} = -\frac{1}{2}\nabla^2 + V{\mathrm{eff}}\) is Hermitian, the matrix of Lagrange multipliers \(\epsilon_{ij}\) is also Hermitian and can be diagonalised by a unitary transformation of the orbitals. This transformation leaves the total density \(\rho(\mathbf{r})\) unchanged. In this diagonal representation, we obtain the canonical Kohn–Sham equations:

\begin{equation} \boxed{ \left[ -\frac{1}{2}\nabla^2 + V_{\mathrm{eff}}(\mathbf{r}) \right] \phi_i(\mathbf{r}) = \epsilon_i \phi_i(\mathbf{r}) } \label{eq:KS-eqn} \end{equation}

These are single-particle Schrödinger-like equations. The eigenvalues \(\epsilon_i\) are the Kohn–Sham orbital energies.

The Self-Consistent Field (SCF) Cycle

The Kohn–Sham equations are non-linear: the effective potential \(V_{\mathrm{eff}}(\mathbf{r})\) depends on \(V_{\mathrm{H}}\) and \(V_{\mathrm{xc}}\), which in turn depend on the density \(\rho(\mathbf{r})\), which is computed from the orbitals \(\phi_i(\mathbf{r})\) that are the solutions to the KS equations themselves. There is no way to write down the solution in closed form: the equations must be solved self-consistently through iteration.

SCF vs. Iterative Methods: Two Different Iterations

The term “iterative” is often used loosely in DFT, but two fundamentally different iterative procedures are involved, and distinguishing them is essential for understanding how a calculation actually runs.

The SCF cycle is a fixed-point iteration. The mathematical problem is to find a density \(\rho^*(\mathbf{r})\) such that

\[ \rho^* = F[\rho^*], \qquad F[\rho] \equiv \sum_i |\phi_i\rho|^2, \]

where the right-hand side is constructed by (i) building \(V_{\rm eff}[\rho]\), (ii) solving the KS equations for the orbitals, and (iii) summing the orbital densities. The map \(F\) is non-linear in \(\rho\) — \(V_{\rm H}\) and \(V_{\rm xc}\) are non-linear functionals of the density — so the fixed point cannot be found by direct linear algebra. Instead it is found by iteration: \(\rho^{(n+1)} = F[\rho^{(n)}]\), possibly with mixing to ensure convergence (Chapter 8). The number of SCF iterations is typically 20–80 for a converged calculation.

The eigenvalue solver is a linear iterative method. Inside step (ii) of each SCF iteration, the linear problem \[ \hat{h}{\rm KS},\phi_i = \epsilon_i,\phi_i, \qquad \hat{h}{\rm KS} = -\tfrac{1}{2}\nabla^2 + V_{\rm eff}(\mathbf{r}), \] must be solved at fixed \(V_{\rm eff}\). This is a standard generalised eigenvalue problem, but the matrix dimension \(N_{\rm PW} \sim 10^4\)–\(10^5\) is too large for direct diagonalisation when only the lowest \(N_{\rm bands} \sim 10^2\)–\(10^3\) eigenpairs are needed. The practical algorithms — Davidson, RMM-DIIS, conjugate gradients — are iterative subspace methods (Chapter 10) that build up the desired eigenspace through successive matrix-vector products \(\hat{h}_{\rm KS}|\psi\rangle\). They typically converge in 3–8 inner iterations per SCF step.

The two iterations are nested: the outer SCF loop calls the inner eigenvalue solver as a subroutine at every step. They share the property of being iterative and producing convergence plots, but they solve different mathematical problems with different convergence theories:

Feature	SCF (outer)	Eigenvalue solver (inner)
Mathematical problem	Non-linear fixed point \(\rho = F[\rho]\)	Linear eigenvalue \(A\mathbf{x} = \lambda\mathbf{x}\)
Variable	Density \(\rho(\mathbf{r})\)	Orbital coefficients \({c_{\mathbf{G},i}}\)
Convergence theory	Banach fixed-point + spectral analysis of Jacobian	Krylov subspace / Rayleigh–Ritz
Acceleration	Pulay/Broyden mixing (Chapter 8)	Davidson/RMM-DIIS subspace expansion (Chapter 10)
Diagnostic	“SCF iteration N: ΔE = …”	“RMM-DIIS: rms = …” per band
Failure mode	Charge sloshing, oscillation	Band mixing, eigenvalue stalling
Typical iter. count	20–80	3–8 per SCF step

A typical DFT calculation therefore performs \(\mathcal{O}(100–500)\) inner eigenvalue iterations in total. Both must converge for the calculation to succeed; failure of one manifests differently from failure of the other.

The SCF Algorithm

In practice, the SCF procedure is:

Initial guess: Provide a starting density \(\rho^{(0)}(\mathbf{r})\), typically a superposition of atomic densities (VASP ICHARG = 2, QE startingpot = 'atomic').
Construct effective potential: Calculate \(V_{\mathrm{H}}[\rho]\) (via the Poisson equation, usually in reciprocal space) and \(V_{\mathrm{xc}}[\rho]\) (by evaluating the XC functional on the real-space grid). Assemble \(V_{\rm eff}(\mathbf{r}) = V_{\rm ext} + V_{\rm H} + V_{\rm xc}\).
Solve KS equations (inner loop): Diagonalise the KS Hamiltonian at each \(\mathbf{k}\)-point via an iterative subspace method (Davidson, RMM-DIIS), obtaining the orbitals \(\phi_i^{(n+1)}(\mathbf{r})\) and energies \(\epsilon_i\). This step is itself iterative — see the “inner loop” terminology above.
Construct new density: Determine occupations \(f_i\) from the Fermi level (Chapter 9 smearing), and form \(\rho_{\rm out}^{(n+1)}(\mathbf{r}) = \sum_i f_i,|\phi_i^{(n+1)}(\mathbf{r})|^2\).
Check convergence: If the total energy change \(|E^{(n+1)} - E^{(n)}|\) and the density change \(|\rho_{\rm out} - \rho_{\rm in}|\) are both below their respective tolerances (typically \(10^{-5}\)–\(10^{-8}\) eV for energy), stop and report the converged ground state.
Mix (if not converged): Construct the input density for the next SCF iteration as a weighted combination of input and output densities using Pulay, Broyden, or Kerker mixing (Chapter 8), then return to step 2.

This procedure is shown schematically in Figure 3.1.

Flowchart of the DFT SCF cycle showing nested outer (density self-consistency) and inner (eigenvalue solver) iterative loops — **Figure 3.1.** The DFT SCF cycle. The **outer loop** (purple dashed border) implements density self-consistency: a non-linear fixed-point iteration \(\rho = F[\rho]\) accelerated by mixing schemes. The **inner loop** (blue dashed border) implements the linear eigenvalue solver \(\hat{h}_{\rm KS}\phi = \epsilon\phi\) at fixed effective potential, itself an iterative method (Davidson, RMM-DIIS) that builds the eigenspace through matrix-vector products. The two iterations are nested: a typical calculation requires 20–80 outer SCF steps, each invoking the inner solver for 3–8 iterations per \(\mathbf{k}\)-point. The blocks are colour-coded by role: green for initialisation and output, gold for density/potential operations, blue for the eigenvalue solver, plum for SCF mixing, and terracotta for the convergence test.

The details of the mixing algorithms in step 6 and the inner eigenvalue solver in step 3 are sufficiently involved that they receive their own chapters: Chapter 8 develops density mixing schemes (linear, Kerker, Pulay/DIIS, Broyden), and Chapter 10 develops iterative diagonalisation (Davidson, RMM-DIIS, conjugate gradients).

Physical Meaning of Kohn–Sham Quantities

It is crucial to understand what the KS framework does and does not claim to represent physically.

The Total Energy

The total ground-state energy is not simply the sum of the KS orbital energies. Summing the eigenvalues gives:

\begin{equation} \sum_i \epsilon_i = \sum_i \langle \phi_i | -\frac{1}{2}\nabla^2 + V_{\mathrm{eff}} | \phi_i \rangle = T_s[\rho] + \int V_{\mathrm{eff}}(\mathbf{r})\rho(\mathbf{r})\,d\mathbf{r}. \end{equation}

Substituting the definition of \(V_{\mathrm{eff}}\):

\begin{equation} \sum_i \epsilon_i = T_s[\rho] + \int V_{\mathrm{ext}}\rho\,d\mathbf{r} + \int V_{\mathrm{H}}\rho\,d\mathbf{r} + \int V_{\mathrm{xc}}\rho\,d\mathbf{r}. \end{equation}

Comparing this to the exact total energy \eqref{eq:KS-energy}, we see that the Hartree and XC interactions have been double-counted in the eigenvalue sum. The correct total energy must be reconstructed by subtracting the double-counting terms:

\begin{equation} E[\rho] = \sum_i \epsilon_i - E_{\mathrm{H}}[\rho] - \int V_{\mathrm{xc}}(\mathbf{r})\rho(\mathbf{r})\,d\mathbf{r} + E_{\mathrm{xc}}[\rho]. \end{equation}

The KS Orbitals and Eigenvalues

In exact DFT, the fictitious KS orbitals \(\phi_i\) and their energies \(\epsilon_i\) have no strict physical meaning, with one exception: the highest occupied KS eigenvalue equals the negative of the first ionisation energy of the exact many-body system:

\begin{equation} \epsilon_{\mathrm{HOMO}}^{\mathrm{exact}} = -I. \end{equation}

This IP theorem was established independently by Almbladh and von Barth (1985) and Levy–Perdew–Sahni (1984) using the asymptotic form of the exact density at large \(|\mathbf{r}|\). The result is not a direct consequence of Janak’s theorem (Janak 1978), which states the distinct identity \(\epsilon_i = \partial E/\partial f_i\) (the KS eigenvalue is the derivative of the total energy with respect to its occupation number). Janak’s theorem applied at integer \(N\) gives the chemical potential, and combined with the long-range behaviour of the exact density yields \(\epsilon_{\rm HOMO} = -I\); both ingredients are required.

The other eigenvalues \(\epsilon_i\) do not formally correspond to electron removal or addition energies (quasiparticle excitations). The KS “band gap” (\(\epsilon_{\mathrm{LUMO}} - \epsilon_{\mathrm{HOMO}}\)) systematically underestimates the true fundamental gap, even if the exact XC functional were used. This is due to the derivative discontinuity of the exact XC functional with respect to particle number.

Despite this, KS orbitals and eigenvalues are widely (and somewhat informally) used to interpret band structures, density of states, and orbital energies, and often agree qualitatively with experiment, but this agreement is not guaranteed by the theory.

Formally correct excitation spectra require going beyond ground-state DFT, e.g. via Time-Dependent DFT (TDDFT) or many-body perturbation theory (GW approximation).

Summary

The Kohn–Sham scheme reduces the interacting many-body problem to a set of effective single-particle equations \eqref{eq:KS-eqn} that can be solved efficiently on modern computers. The key steps and ideas are:

Concept	Role
Non-interacting reference system	Enables exact computation of \(T_s[\rho]\) via orbitals
\(V_{\mathrm{H}}\)	Classical mean-field Coulomb repulsion, computed exactly
\(V_{\mathrm{xc}}\)	All many-body quantum effects, must be approximated
SCF loop	Resolves the self-consistency between density and potential
XC approximation (LDA, GGA, …)	The only uncontrolled approximation in the KS framework

The formal exactness of the KS framework — all errors traceable to a single approximated term — combined with its single-particle structure makes DFT the workhorse of electronic structure theory. In the chapters that follow, we will examine how this framework is implemented in practice: basis sets, pseudopotentials, \(k\)-point sampling, and the computational details of solving the KS equations for real materials.

Exchange-Correlation Functionals

In the Kohn–Sham framework, all the complexity of many-body quantum mechanics is concentrated in a single term: the exchange-correlation energy \(E_{\rm xc}[\rho]\). If the exact \(E_{\rm xc}\) were known, DFT would be formally exact. In practice it must be approximated, and six decades of development have produced a rich hierarchy of approximations.

Rather than presenting this hierarchy as a list of progressively better recipes, this chapter derives it from two systematic expansions, each anchored at a different exactly-known limit:

A spatial gradient expansion around the uniform electron gas (UEG) — the only interacting electron system for which \(E_{\rm xc}\) is known exactly. Expanding in powers of the dimensionless density gradient generates the semi-local hierarchy: LDA → GGA → meta-GGA.
A coupling-constant expansion around the non-interacting KS limit — the point at which the exact exchange is known analytically. Expanding in powers of the interaction strength generates the non-local hierarchy: GGA → hybrid → double-hybrid.

These two series are orthogonal: the first improves the local description of the XC hole shape; the second improves its non-local content. Understanding their structure turns the choice of functional from a matter of recipe-following into a matter of physics.

Jacob’s Ladder of Functionals

Jacob's ladder of XC functionals showing five rungs from LDA to fully non-local RPA, organised by gradient and coupling-constant series — **Figure 4.1.** Jacob’s ladder of XC functionals. Left rail: gradient expansion series (LDA → GGA → meta-GGA). Right rail: coupling-constant series (hybrid → double-hybrid). The two rails are orthogonal improvements; meta-hybrid functionals (SCAN0) sit at their crossing. Computational cost increases by roughly one order of magnitude per rung.

Perdew’s Jacob’s ladder organises XC functionals by the local ingredients each rung is allowed to use — and parallel to it, the coupling-constant series organises them by how much of the exact \(\lambda = 0\) endpoint they retain. The chapter develops both:

  Rung 5  ┃ Fully non-local correlation         ┃ ↑ Heaven (exact XC)
          ┃   (RPA, ACFD)                        ┃
  ────────┃──────────────────────────────────────┃
  Rung 4  ┃ Hyper-GGA / double-hybrid            ┃ + E_c^{GL2}     [coupling]
          ┃   (B2PLYP, PBE-QIDH)                 ┃
  ────────┃──────────────────────────────────────┃
  Rung 3  ┃ Hybrid (global / range-separated)    ┃ + E_x^{exact}   [coupling]
          ┃   (PBE0, B3LYP, HSE06, ωB97X-V)      ┃
  ────────┃──────────────────────────────────────┃
  Rung 2  ┃ meta-GGA                             ┃ + τ, α          [gradient]
          ┃   (SCAN, r²SCAN, TPSS)               ┃
  ────────┃──────────────────────────────────────┃
  Rung 1  ┃ GGA                                  ┃ + ∇ρ            [gradient]
          ┃   (PBE, PBEsol, BLYP)                ┃
  ────────┃──────────────────────────────────────┃
  Rung 0  ┃ LDA / LSDA                           ┃   ρ             [gradient]
          ┃   (PW92, VWN)                        ┃
          ┃                                      ┃ ↓ Earth (Hartree)

The left half (rungs 0–2) ascends the gradient series; the right half (rungs 3–4) ascends the coupling-constant series. Meta-hybrid functionals (e.g. SCAN0) live at the crossing of the two. Computational cost grows roughly tenfold per rung; predictive accuracy improves accordingly for the right class of problem, but no rung is uniformly best.

Formal Definition and the Two Reference Points

The XC Hole

The XC energy is defined by the decomposition introduced in Chapter 3:

\[ E_{\rm xc}[\rho] = (T[\rho] - T_s[\rho]) + (V_{ee}[\rho] - E_{\rm H}[\rho]). \]

An equivalent and physically transparent form expresses \(E_{\rm xc}\) in terms of the exchange-correlation hole \(n_{\rm xc}(\mathbf{r}, \mathbf{r}‘)\) — the depletion of electron density at \(\mathbf{r}’\) caused by the presence of an electron at \(\mathbf{r}\):

\[ E_{\rm xc}[\rho] = \frac{1}{2}\iint \frac{\rho(\mathbf{r}),n_{\rm xc}(\mathbf{r},\mathbf{r}‘)}{|\mathbf{r}-\mathbf{r}’|},d\mathbf{r},d\mathbf{r}’. \]

The exact XC hole satisfies two rigorous constraints:

\begin{equation} n_{\rm xc}(\mathbf{r},\mathbf{r}') \leq 0, \qquad \int n_{\rm xc}(\mathbf{r},\mathbf{r}')\,d\mathbf{r}' = -1. \label{eq:hole-sumrule} \end{equation}

The second relation — the sum rule — encodes the fact that each electron excludes exactly one unit of charge from its neighbourhood. Approximations that satisfy this constraint tend to benefit from systematic error cancellation even when the detailed shape of the hole is wrong, which explains the surprising accuracy of LDA despite its crude derivation.

Schematic of the exchange hole, correlation hole, and XC hole as a function of electron-electron separation — **Figure 4.2.** Schematic XC hole \(n_{\rm xc}(\mathbf{r},\mathbf{r}‘)\) as a function of \(|\mathbf{r}’-\mathbf{r}|\). The exchange hole (blue dashed) has a cusp at \(r’=r\) and a long \(-1/r\) tail for finite systems. The correlation hole (terracotta dotted) deepens the total hole at short range and integrates to zero. The XC hole (plum solid) satisfies the exact sum rule \(\int n_{\rm xc},d\mathbf{r}’ = -1\).

### The Two Exactly-Known Limits

Two limits of \(E_{\rm xc}\) are known exactly and serve as expansion points.

Limit 1 — Uniform electron gas (\(s = 0\)): When the density is spatially uniform, \(\rho(\mathbf{r}) = \bar\rho\), the XC energy per electron \(\varepsilon_{\rm xc}(\bar\rho)\) is known with essentially arbitrary precision from quantum Monte Carlo simulations (Ceperley–Alder, 1980). The exchange component is analytic (Dirac, 1930). This is the natural anchor for a Taylor series in density variations.

Limit 2 — Non-interacting KS limit (\(\lambda = 0\)): Via the adiabatic connection, the XC energy is written as an integral over a coupling constant \(\lambda \in [0,1]\) that smoothly scales the electron–electron interaction from zero (the KS system) to its physical value:

\begin{equation} E_{\rm xc}[\rho] = \int_0^1 W_{\rm xc}^\lambda[\rho]\,d\lambda, \qquad W_{\rm xc}^\lambda = \langle\Psi^\lambda|\hat{V}_{ee}|\Psi^\lambda\rangle - E_{\rm H}[\rho], \label{eq:adiabatic} \end{equation}

where \(\Psi^\lambda\) is the ground state with the same density \(\rho\) but interaction scaled by \(\lambda\). At \(\lambda = 0\), \(\Psi^0\) is the KS Slater determinant and \(W_{\rm xc}^0 = E_{\rm x}^{\rm exact}\) is known analytically. This is the natural anchor for a Taylor series in interaction strength.

Derivation. Define a one-parameter family of Hamiltonians \(\hat{H}^\lambda = \hat{T} + \lambda\hat{V}{ee} + \hat{V}{\rm ext}^\lambda\), where the external potential \(V_{\rm ext}^\lambda\) is adjusted at each \(\lambda\) so that the ground-state density remains fixed at the physical \(\rho(\mathbf{r})\). The endpoints are the KS system (\(\lambda=0\)) and the physical system (\(\lambda=1\)). Applying the Hellmann–Feynman theorem to \(E^\lambda = \langle\Psi^\lambda|\hat{H}^\lambda|\Psi^\lambda\rangle\):

\[ \frac{dE^\lambda}{d\lambda} = \langle\Psi^\lambda|\hat{V}{ee}|\Psi^\lambda\rangle + \int \frac{dV{\rm ext}^\lambda}{d\lambda},\rho(\mathbf{r}),d\mathbf{r}. \]

Integrating from \(\lambda = 0\) to \(\lambda = 1\) gives \(E^1 - E^0 = \int_0^1\langle\hat{V}{ee}\rangle^\lambda,d\lambda + \int(V{\rm ext}^1 - V_{\rm ext}^0)\rho,d\mathbf{r}\). Using \(E^1 = T + V_{ee} + \int V_{\rm ext}\rho\) and \(E^0 = T_s + \int V_{\rm KS}\rho\), and the KS decomposition \(V_{ee} = E_{\rm H} + E_{\rm xc} - (T - T_s)\), the external-potential boundary terms cancel exactly, leaving:

\[ E_{\rm xc} = \int_0^1 \left(\langle\Psi^\lambda|\hat{V}{ee}|\Psi^\lambda\rangle - E{\rm H}[\rho]\right)d\lambda \equiv \int_0^1 W_{\rm xc}^\lambda,d\lambda, \]

which is \eqref{eq:adiabatic}. The kinetic energy correction \(T - T_s\) is absorbed into the integral by the way the external potential is adjusted along the path — this is what makes \(W_{\rm xc}^\lambda\), rather than just \(\langle V_{ee}\rangle^\lambda\), the right quantity to integrate.

The First Taylor Series: Gradient Expansion and Semi-Local Functionals

The Dimensionless Gradient and the GEA

For a slowly varying density, we measure spatial variation by the reduced density gradient:

\[ s(\mathbf{r}) = \frac{|\nabla\rho(\mathbf{r})|}{2k_F(\mathbf{r}),\rho(\mathbf{r})}, \qquad k_F(\mathbf{r}) = (3\pi^2\rho(\mathbf{r}))^{1/3}, \]

where \(k_F\) is the local Fermi wavevector. The parameter \(s\) measures the density gradient in units of the local electron wavelength: \(s = 0\) for the UEG; \(s \sim 0.5\)–\(3\) in typical valence regions; \(s \to \infty\) in exponentially decaying density tails.

The formal gradient expansion approximation (GEA) from many-body perturbation theory gives the systematic Taylor series in \(s^2\) (odd powers vanish by symmetry):

\begin{equation} E_{\rm xc}^{\rm GEA}[\rho] = \int \rho\,\varepsilon_{\rm xc}^{\rm UEG}(\rho)\,d\mathbf{r} + \int C_{\rm xc}(\rho)\,s^2\,\rho\,d\mathbf{r} + \mathcal{O}(s^4), \label{eq:GEA} \end{equation}

where \(C_{\rm xc}(\rho)\) is a density-dependent coefficient computable from perturbation theory.

Rung 0: LDA — Zeroth Order

Truncating \eqref{eq:GEA} at zeroth order (\(s^0\)) gives the local density approximation:

\[ \boxed{E_{\rm xc}^{\rm LDA}[\rho] = \int \varepsilon_{\rm xc}^{\rm UEG}(\rho(\mathbf{r})),\rho(\mathbf{r}),d\mathbf{r}.} \]

The exchange part is analytic (Dirac, 1930):

\[ \varepsilon_{\rm x}^{\rm UEG}(\rho) = -\frac{3}{4}!\left(\frac{3}{\pi}\right)^{!1/3}!\rho^{1/3}. \]

The correlation \(\varepsilon_{\rm c}(\rho)\) is parameterised from QMC data: the Perdew–Wang (PW92) and Vosko–Wilk–Nusair (VWN) parameterisations are most common.

Why LDA works better than a zeroth-order truncation deserves. The LDA XC hole is that of the UEG, which satisfies the sum rule \eqref{eq:hole-sumrule} by construction. Even when the detailed angular shape of the hole is wrong, the spherically averaged hole — which is what determines \(E_{\rm xc}\) via a radial integral — is described well. This is systematic error cancellation rooted in constraint satisfaction, not coincidence.

Known systematic errors: overbinds (cohesive energies \(+1\)–\(2\) eV/atom); underestimates lattice constants (\(-1\)–\(3%\)); underestimates band gaps (\(-40%\)); no dispersion.

When to use LDA: Phonons and structural properties of simple and alkali metals, where the electron gas is nearly uniform and the error cancellation is most favourable.

Why the Raw GEA Fails

The \(\mathcal{O}(s^2)\) term in \eqref{eq:GEA} does not improve on LDA — it makes things worse. The GEA correction introduces oscillations in the XC hole at large \(|\mathbf{r}-\mathbf{r}’|\) (where \(s\) is large), causing the hole to become positive and violating \(n_{\rm xc} \leq 0\) and the \(-1\) sum rule in the density tails. This is a classical example of a Taylor series that diverges when the expansion parameter is not uniformly small: \(s \to \infty\) in the tail of any finite system.

Rung 1: GGA — Constrained Resummation of the Gradient Series

Rather than truncating \eqref{eq:GEA}, the generalised gradient approximation writes the XC energy in the enhancement factor form:

\[ \boxed{E_{\rm xc}^{\rm GGA}[\rho] = \int \varepsilon_{\rm xc}^{\rm UEG}(\rho),F_{\rm xc}(\rho, s),\rho,d\mathbf{r},} \]

where \(F_{\rm xc}(\rho, s)\) is a dimensionless enhancement factor. GGA is not the \(\mathcal{O}(s^2)\) truncation — it is a constrained resummation that reproduces the correct \(s^2\) behaviour near \(s = 0\) while satisfying exact constraints for all \(s\).

PBE (Perdew–Burke–Ernzerhof, 1996) determines \(F_{\rm xc}\) by imposing:

\(F_{\rm xc} \to 1\) as \(s \to 0\): recovers LDA, consistent with the UEG limit.
\(\partial F_{\rm xc}/\partial(s^2)|_0 = \mu\): matches the GEA gradient coefficient, ensuring consistency with perturbation theory at small \(s\).
Cutoff at large \(s\): \(F_{\rm xc}\) saturates rather than growing without bound, enforcing the sum rule \(\int n_{\rm x}^{\rm GGA},d\mathbf{r}’ = -1\) even in the density tails. This is precisely the correction to the GEA failure above.
Lieb–Oxford bound: \(E_{\rm xc}^{\rm GGA} \geq -1.679\int\rho^{4/3},d\mathbf{r}\), a rigorous lower bound satisfied for all \(s\).

The explicit PBE exchange enhancement factor is:

\[ F_{\rm x}^{\rm PBE}(s) = 1 + \kappa - \frac{\kappa}{1 + \mu s^2/\kappa}, \qquad \kappa = 0.804,\quad \mu = 0.2195, \]

where both parameters are fixed by constraints, not by fitting to data. PBE has zero empirical parameters.

LDA vs. PBE:

Property	LDA	PBE
Lattice constants	\(-1\) to \(-3%\)	\(+1\) to \(+2%\)
Atomisation energies	\(+30\) kcal/mol	\(+10\) kcal/mol
Band gaps	\(-40%\)	\(-40%\)
Hydrogen bonds	Reasonable	Good
Van der Waals	Very poor	Poor

The band gap error is essentially unchanged between LDA and PBE: it originates not in the gradient expansion but in the missing derivative discontinuity of the exact XC potential at integer electron number, which no semi-local functional can capture.

Lieb–Oxford bound. A rigorous lower bound on \(E_{\rm xc}\) for any \(N\)-electron density follows from the Lieb–Oxford inequality (Lieb and Oxford, 1981):

\[ E_{\rm xc}[\rho] \geq -C_{\rm LO}\int \rho(\mathbf{r})^{4/3},d\mathbf{r}, \qquad C_{\rm LO} \leq 1.679. \]

Physically this bound says that the XC energy cannot be more negative than a multiple of the exchange energy of a fully spin-polarised UEG at the same density — it bounds how much the XC hole can deepen the energy. The numerical constant \(C_{\rm LO} = 1.679\) was obtained by Lieb and Oxford from analysis of the indirect part of the Coulomb energy. PBE’s exchange enhancement factor is explicitly constructed to satisfy this bound for all \(s\), which is one of its 11 exact constraints. LDA satisfies the bound locally; GEA violates it at large \(s\).

Rung 2: Meta-GGA — Encoding \(\mathcal{O}(s^4)\) Through \(\tau\)

The gradient expansion can in principle be extended to \(\mathcal{O}(s^4)\), which involves second spatial derivatives \(\nabla^2\rho\). However, \(\nabla^2\rho\) included directly reintroduces oscillatory behaviour in the density tails. The kinetic energy density:

\[ \tau(\mathbf{r}) = \frac{1}{2}\sum_i f_i,|\nabla\phi_i(\mathbf{r})|^2 \]

encodes the same fourth-order gradient information without the oscillation pathology. To see why, consider the two limiting forms of \(\tau\).

The von Weizsäcker kinetic energy density \(\tau^W\) is the exact \(\tau\) for a system described by a single orbital \(\phi = \sqrt{\rho},e^{i\theta}\). For a real, nodeless orbital, \(\phi = \sqrt{\rho}\), so:

\[ \tau^W = \frac{1}{2}|\nabla\sqrt{\rho}|^2 = \frac{1}{2}\left|\frac{\nabla\rho}{2\sqrt{\rho}}\right|^2 = \frac{|\nabla\rho|^2}{8\rho}. \]

This is the minimum possible kinetic energy density consistent with a given \(\rho\); any multi-orbital system has \(\tau \geq \tau^W\) (the Pauli kinetic energy is non-negative). For the uniform electron gas with \(N\) occupied plane-wave orbitals, the kinetic energy density is:

\[ \tau^{\rm UEG}(\rho) = \frac{3}{10}(3\pi^2)^{2/3}\rho^{5/3}, \]

which is the Thomas–Fermi kinetic energy density. The gradient expansion then gives:

\[ \tau = \tau^W + \tau^{\rm UEG} + \mathcal{O}(\nabla^4\rho), \]

showing that \(\tau - \tau^W\) measures the deviation from single-orbital character, i.e. the degree of multi-orbital kinetic energy. The meta-GGA uses this through the iso-orbital indicator:

\[ \alpha(\mathbf{r}) = \frac{\tau(\mathbf{r}) - \tau^W(\mathbf{r})}{\tau^{\rm UEG}(\mathbf{r})}, \]

whose physical content follows directly:

\(\alpha = 0\): \(\tau = \tau^W\), exact for a single orbital — covalent bonds, core regions.
\(\alpha = 1\): \(\tau = \tau^W + \tau^{\rm UEG}\), the UEG limit — metallic regions, LDA should be recovered.
\(\alpha \gg 1\): many orbitals, slowly varying density — van der Waals region between closed-shell fragments.

The meta-GGA energy functional is:

\[ E_{\rm xc}^{\rm mGGA}[\rho] = \int \varepsilon_{\rm xc}(\rho,,s,,\alpha),\rho,d\mathbf{r}. \]

SCAN (Strongly Constrained and Appropriately Normed; Sun, Ruzsinszky, Perdew, 2015) satisfies all 17 known exact constraints applicable to a semi-local functional simultaneously — more than any earlier functional. The constraints are conditions the exact \(E_{\rm xc}\) provably obeys; satisfying them forces SCAN to approximate the exact functional in mathematically controlled ways. A representative set:

Correct UEG limit: \(E_{\rm xc} \to E_{\rm xc}^{\rm UEG}\) when \(\rho\) is uniform.
Lieb–Oxford bound: \(E_{\rm xc}[\rho] \geq -C_{\rm LO}\int\rho^{4/3}d\mathbf{r}\) for all densities and all spin polarisations.
Uniform coordinate scaling: under \(\rho(\mathbf{r}) \to \gamma^3\rho(\gamma\mathbf{r})\), the exchange energy scales as \(E_{\rm x}[\rho_\gamma] = \gamma E_{\rm x}[\rho]\) — a rigorous consequence of the form of the Coulomb operator.
One-electron self-interaction freedom in the appropriate norm: for any one-electron density (signalled by \(\alpha = 0\) at every \(\mathbf{r}\)), SCAN’s exchange–correlation hole reduces to the exact single-electron form, eliminating most of the LDA/GGA self-interaction error in this limit.
\(\tau^W/\tau \to 1\) for one-orbital regions: the iso-orbital indicator \(\alpha\) goes to zero in single-orbital regions (covalent bonds, core), and SCAN’s enhancement factor responds correctly.

The remaining constraints govern asymptotic behaviour at low and high density, second-order gradient expansion coefficients, and spin-scaling. Crucially, the constraints fix all of SCAN’s parameters: it is constrained, not fitted. The “appropriately normed” half of the name refers to choosing the small number of free parameters to reproduce norms — exactly soluble or benchmark systems like the H atom, the He isoelectronic series, and the jellium surface — rather than fitting to a large empirical database. r²SCAN (Furness et al., 2020) is a numerically regularised variant that preserves all the constraints but removes numerical oscillations in the exchange–correlation potential; it is the preferred form in plane-wave codes (VASP, QE).

The Second Taylor Series: Coupling-Constant Expansion and Non-Local Functionals

The semi-local functionals above improve the spatial description of the XC hole but remain fundamentally local: \(\varepsilon_{\rm xc}\) at \(\mathbf{r}\) depends only on quantities at \(\mathbf{r}\). A qualitatively different class of improvement comes from the second Taylor series.

Expanding \(W_{\rm xc}^\lambda\) Around \(\lambda = 0\)

We expand the adiabatic connection integrand \eqref{eq:adiabatic} in \(\lambda\):

\begin{equation} W_{\rm xc}^\lambda = W_{\rm xc}^0 + \lambda\left.\frac{dW_{\rm xc}^\lambda}{d\lambda}\right|_{\lambda=0} + \frac{\lambda^2}{2}\left.\frac{d^2W_{\rm xc}^\lambda}{d\lambda^2}\right|_{\lambda=0} + \cdots \label{eq:AC-Taylor} \end{equation}

The zeroth-order term is exactly known:

\[ W_{\rm xc}^0 = E_{\rm x}^{\rm exact}[{\phi_i}] = -\frac{1}{2}\sum_{i,j}\iint\frac{\phi_i^(\mathbf{r})\phi_j(\mathbf{r})\phi_j^(\mathbf{r}‘)\phi_i(\mathbf{r}’)}{|\mathbf{r}-\mathbf{r}‘|},d\mathbf{r},d\mathbf{r}’. \]

The first derivative is identified by Görling–Levy (GL) perturbation theory as twice the second-order GL correlation energy \(E_{\rm c}^{\rm GL2}\):

\begin{equation} \left.\frac{dW_{\rm xc}^\lambda}{d\lambda}\right|_{\lambda=0} = 2E_{\rm c}^{\rm GL2} = -2\sum_{\substack{i \lt j \\ a \lt b}} \frac{|\langle\phi_i\phi_j\|\phi_a\phi_b\rangle|^2}{\epsilon_a + \epsilon_b - \epsilon_i - \epsilon_j}, \label{eq:GL2} \end{equation}

where \(i,j\) label occupied and \(a,b\) unoccupied KS orbitals. This has the same algebraic structure as MP2 correlation but is evaluated on KS orbitals. Integrating \eqref{eq:AC-Taylor}:

\begin{equation} E_{\rm xc} = E_{\rm x}^{\rm exact} + E_{\rm c}^{\rm GL2} + \frac{1}{6}\left.\frac{d^2W_{\rm xc}^\lambda}{d\lambda^2}\right|_{\lambda=0} + \cdots \label{eq:AC-integrated} \end{equation}

Each successive term is more expensive and more accurate.

Rung 3: Global Hybrid Functionals — Zeroth-Order Plus Remainder

Retaining only the zeroth-order term \(E_{\rm x}^{\rm exact}\) and approximating the remainder of the \(\lambda\) integral by a GGA gives the global hybrid:

\begin{equation} \boxed{E_{\rm xc}^{\rm hybrid} = a\,E_{\rm x}^{\rm exact} + (1-a)\,E_{\rm x}^{\rm GGA} + E_{\rm c}^{\rm GGA}.} \label{eq:hybrid} \end{equation}

The fraction \(a\) is not a free parameter: fourth-order GL perturbation theory (Perdew, Ernzerhof, Burke, 1996) shows that for systems near the weakly correlated limit, the integrand \(W_{\rm xc}^\lambda\) is nearly linear in \(\lambda\), and the trapezoidal rule \((W_0 + W_1)/2\) gives \(a = 1/2\). Since GGA already captures part of the \(\lambda = 1\) end, the effective exact exchange weight is smaller, and GL4 gives \(a = 1/4\) as the optimal parameter-free estimate.

PBE0

PBE0 implements \eqref{eq:hybrid} with \(a = 1/4\) from GL4 perturbation theory and PBE as the semi-local component:

\[ E_{\rm xc}^{\rm PBE0} = \frac{1}{4}E_{\rm x}^{\rm exact} + \frac{3}{4}E_{\rm x}^{\rm PBE} + E_{\rm c}^{\rm PBE}. \]

PBE0 has no empirical parameters. In the limit \(a \to 0\) it recovers PBE exactly. Band gaps improve from \(-40%\) (PBE) to \(\sim -20%\) (PBE0), and reaction barriers improve substantially. The cost in a periodic code scales as \(\mathcal{O}(N^3)\) with density fitting, making it expensive for large unit cells.

B3LYP

B3LYP (Becke, 1993) occupies the same position in the coupling-constant series but determines its three parameters \((a_0, a_x, a_c)\) empirically by fitting to the G2 molecular thermochemistry database rather than from perturbation theory:

\[ E_{\rm xc}^{\rm B3LYP} = (1-a_0)E_{\rm x}^{\rm LSDA} + a_0,E_{\rm x}^{\rm exact} + a_x\Delta E_{\rm x}^{\rm B88} + (1-a_c)E_{\rm c}^{\rm VWN} + a_c,E_{\rm c}^{\rm LYP}, \]

with \(a_0 = 0.20\), \(a_x = 0.72\), \(a_c = 0.81\). The LYP correlation (Lee–Yang–Parr) is derived from the Colle–Salvetti helium atom expression, not from the UEG, so it does not reduce to \(\varepsilon_{\rm c}^{\rm UEG}\) in the uniform limit. This makes B3LYP physically inconsistent for metallic systems but very well-suited to organic molecules (its fitting set): thermochemical accuracy \(\sim 3\) kcal/mol, bond lengths \(\sim 0.01\) Å. It should not be used for periodic metals or large-gap insulators.

The contrast between PBE0 and B3LYP is instructive: both sit at the same rung of the coupling-constant ladder, but PBE0 is constraint-derived and transferable while B3LYP is empirically optimal within its training domain.

The Spatial Structure of the Exchange Hole: Motivating Range Separation

Global hybrids apply the same fraction \(a\) of exact exchange uniformly at all length scales. But the exchange hole \(n_{\rm x}(\mathbf{r},\mathbf{r}’)\) behaves fundamentally differently at short and long range.

At short range (\(|\mathbf{r}-\mathbf{r}’| \to 0\)): the hole is deep, localised, and cusp-like. Semi-local functionals describe this region reasonably well.

At long range (\(|\mathbf{r}-\mathbf{r}’| \to \infty\)): the exact exchange hole of a finite system must decay as \(-1/r\) to satisfy the \(-1\) sum rule. This \(-1/r\) tail implies that the exchange potential \(V_{\rm x}(\mathbf{r}) \to -1/r\) asymptotically — a non-local behaviour that only exact exchange can reproduce; semi-local functionals decay exponentially instead.

For solids, the situation is reversed at long range: the dielectric response of the crystal screens the exchange hole so that the effective long-range decay is faster than \(-1/r\). In a perfect metal, the exchange interaction is fully screened beyond the Thomas–Fermi length (\(\sim 0.5\) Å), so exact long-range exchange is not only unnecessary but harmful — it produces a logarithmic divergence of the exchange potential at the Fermi surface.

This physics suggests splitting the Coulomb operator into short-range (SR) and long-range (LR) parts using the error function:

\[ \frac{1}{r_{12}} = \underbrace{\frac{\mathrm{erfc}(\omega r_{12})}{r_{12}}}{\mathrm{SR}: ;\to 0\text{ as }r{12}\to\infty} + \underbrace{\frac{\mathrm{erf}(\omega r_{12})}{r_{12}}}{\mathrm{LR}: ;\to 1/r{12}\text{ as }r_{12}\to\infty}, \]

where \(\omega\) controls the crossover: \(r_c \approx 1/\omega\). Different exact exchange fractions \(a_{\rm SR}\) and \(a_{\rm LR}\) are then applied to each part.

HSE06 — Short-Range Exact Exchange for Solids

HSE06 (Heyd–Scuseria–Ernzerhof, 2003/2006) applies exact exchange at short range only:

\[ E_{\rm xc}^{\rm HSE06} = \frac{1}{4}E_{\rm x}^{\rm exact,SR}(\omega) + \frac{3}{4}E_{\rm x}^{\rm PBE,SR}(\omega) + E_{\rm x}^{\rm PBE,LR}(\omega) + E_{\rm c}^{\rm PBE}, \]

with \(\omega = 0.11\) bohr\(^{-1}\) (\(\approx 0.21\) Å\(^{-1}\), crossover \(\sim 5\)–\(10\) Å).

The limits confirm the physical motivation: \(\omega \to 0\) recovers PBE0 (no screening, appropriate for molecules); \(\omega \to \infty\) recovers PBE (full screening, appropriate for metals). At \(\omega = 0.11\) bohr\(^{-1}\), HSE06 describes the screened exchange of a typical semiconductor. Computational advantages are significant: the short-range Fock operator decays exponentially in real space, enabling \(\mathcal{O}(N)\) scaling, making HSE06 tractable for unit cells of hundreds of atoms.

Performance: band gaps within \(\sim 10\)–\(15%\) of experiment for most semiconductors; correct metallic behaviour where PBE0 fails; standard functional for band gap calculations in periodic solids.

LC-\(\omega\)PBE and \(\omega\)B97X-V — Long-Range Exact Exchange for Molecules

For isolated molecules, the physical argument is opposite: the density tails are unscreened, and the correct \(-1/r\) asymptote of \(V_{\rm x}\) is essential for Rydberg states and charge-transfer excitations. Long-range corrected (LC) functionals apply exact exchange at long range:

\[ E_{\rm xc}^{\rm LC} = E_{\rm x}^{\rm GGA,SR}(\omega) + E_{\rm x}^{\rm exact,LR}(\omega) + E_{\rm c}^{\rm GGA}. \]

\(\omega\)B97X-V (Mardirossian–Head-Gordon, 2014; \(a_{\rm SR} = 0.167\), \(a_{\rm LR} = 1.0\), \(\omega = 0.30\) Å\(^{-1}\)) further combines long-range correction with the non-local VV10 dispersion functional, treating van der Waals at the functional level. It ranks among the best single functionals for molecular benchmarks.

CAM-B3LYP (Yanai et al., 2004) interpolates smoothly between \(a_{\rm SR} = 0.19\) and \(a_{\rm LR} = 0.65\) using \(\omega = 0.33\) Å\(^{-1}\), retaining B3LYP’s ground-state accuracy while improving excited states.

Summary: Global and Range-Separated Hybrids

Functional	\(a_{\rm SR}\)	\(a_{\rm LR}\)	\(\omega\) (Å\(^{-1}\))	Primary use
PBE0	0.25	0.25	—	Molecules and solids (no screening)
B3LYP	0.20	0.20	—	Organic molecules
HSE06	0.25	0.00	0.21	Periodic solids, large cells
LC-\(\omega\)PBE	0.00	1.00	0.40	Charge transfer, Rydberg
CAM-B3LYP	0.19	0.65	0.33	Molecular excited states
\(\omega\)B97X-V	0.167	1.00	0.30	Molecular thermochemistry + vdW

Rung 4: Double-Hybrid Functionals — First-Order Term

Retaining both the zeroth- and first-order terms of \eqref{eq:AC-integrated} and approximating the GGA remainder for the higher orders gives the double-hybrid:

\[ \boxed{E_{\rm xc}^{\rm DH} = a,E_{\rm x}^{\rm exact} + (1-a),E_{\rm x}^{\rm GGA} + b,E_{\rm c}^{\rm GL2} + (1-b),E_{\rm c}^{\rm GGA},} \]

where \(E_{\rm c}^{\rm GL2}\) is the second-order Görling–Levy correlation \eqref{eq:GL2}. This costs \(\mathcal{O}(N^5)\) in system size due to the four-index two-electron integrals, restricting applications to systems of tens to a few hundred atoms.

B2PLYP (Grimme, 2006; \(a = 0.53\), \(b = 0.27\), empirical) and PBE-QIDH (Chai and Mao, 2012; \(a = (1/3)^{1/3}\), \(b = 1/3\), derived from GL perturbation theory without fitting) achieve thermochemical accuracy of \(\sim 1\) kcal/mol, rivalling wavefunction methods at substantially lower cost. Double hybrids are rarely applied to periodic solids due to the \(\mathcal{O}(N^5)\) cost.

The Two Expansions Together: Jacob’s Ladder Revisited

The full hierarchy is now organised by both Taylor series simultaneously:

Rung	Series	Order	New ingredient	Example
LDA	Gradient, \(s^0\)	Zeroth	UEG: \(\rho\)	PW92, VWN
GGA	Gradient, resummation of \(s^2\)	First (resummed)	\(\nabla\rho\)	PBE, PBEsol
meta-GGA	Gradient, encodes \(s^4\)	Second (via \(\tau\))	\(\tau\), \(\alpha\)	SCAN, r²SCAN
Hybrid	Coupling constant, \(\lambda^0\)	Zeroth	\(E_{\rm x}^{\rm exact}\)	PBE0, HSE06
Double-hybrid	Coupling constant, \(\lambda^1\)	First	\(E_{\rm c}^{\rm GL2}\)	B2PLYP, PBE-QIDH

Meta-hybrid functionals (SCAN0, r²SCAN0: \(a = 1/4\) exact exchange added to SCAN/r²SCAN) sit at the intersection — exploiting both expansions simultaneously for improved band gaps and magnetic exchange couplings.

Van der Waals: Beyond Both Series

London dispersion — the \(-C_6/R^6\) attraction between distant closed-shell fragments — arises from non-local correlated density fluctuations at distances \(R\) far exceeding an inter-atomic spacing. Neither the spatial gradient expansion (local by construction) nor the GL expansion at low order captures it: \(E_{\rm c}^{\rm GL2}\) underestimates medium-range correlation, and higher-order terms would be needed.

Practical remedies are dispersion corrections added to the DFT total energy:

DFT-D3(BJ) (Grimme et al., 2010): atom-pairwise \(-C_6/r^6 - C_8/r^8\) with Becke–Johnson damping. Cheap, widely available, recommended default.
Tkatchenko–Scheffler (TS): \(C_6\) coefficients from Hirshfeld density partitioning; captures chemical environment dependence.
Many-body dispersion (MBD): Collective dipole fluctuations via coupled quantum harmonic oscillators; important for molecular crystals and large biomolecules.
vdW-DF / VV10: Non-local correlation functional via a double spatial integral; dispersion at the functional level without atom-pairwise parameterisation.

For periodic systems, PBE-D3(BJ) or r²SCAN+rVV10 are recommended.

Self-Interaction Error

The self-interaction error (SIE) — the incomplete cancellation between \(E_{\rm H}[\rho]\) and \(E_{\rm xc}[\rho]\) for the unphysical self-repulsion of each electron — is illuminated by both Taylor series. From the gradient expansion perspective, semi-local functionals built on the UEG reference cannot reproduce the single-electron limit because the UEG has many electrons and no self-interaction to cancel. From the coupling-constant perspective, the SIE is progressively reduced as more exact exchange (which is exactly self-interaction-free) is included. This is why each rung from hybrid upward improves reaction barriers, charge-transfer energies, and localisation of \(d\)/\(f\) electrons — all dominated by regions where the self-interaction is severe.

Practical Decision Guide

Property / System	Recommended functional	Key reason
Phonons, lattice constants (metals)	LDA or PBEsol	Near-UEG electron gas; error cancellation
Geometry optimisation (general)	PBE or r²SCAN	PBE default; r²SCAN for improved quality
Band gaps (semiconductors)	HSE06	Screened SR exchange, tractable for large cells
Band gaps (insulators, molecules)	PBE0	Global hybrid sufficient; no screening needed
Magnetic exchange coupling \(J\)	PBE0 or HSE06	Reduced SIE on \(d\)/\(f\) orbitals
Molecular thermochemistry	B3LYP or \(\omega\)B97X-V	Fitted or constraint-based with LR correction
Charge-transfer / Rydberg (TDDFT)	CAM-B3LYP or LC-\(\omega\)PBE	LR exact exchange essential
Layered / vdW systems	PBE-D3(BJ) or r²SCAN+rVV10	Explicit dispersion mandatory
Large periodic cells (\(>200\) atoms)	PBE, r²SCAN, or HSE06	Hybrid cost prohibitive beyond HSE06
High-accuracy thermochemistry	B2PLYP-D3 or PBE-QIDH	GL2 correlation, \(\sim 1\) kcal/mol

Outlook

The two Taylor series give a unified account of the XC landscape but neither converges uniformly. The gradient expansion fails for \(s \gg 1\) (exponential tails, surfaces), and the coupling-constant expansion is slow for strongly correlated systems (large \(\lambda\) dependence, near-degenerate ground states). The next major failure mode — the treatment of localised \(d\) and \(f\) electrons in Mott–Hubbard materials — and the DFT+U correction that targets it directly are the subject of a later chapter.

Basis Sets and Pseudopotentials

Solving the Kohn–Sham equations requires representing the single-particle orbitals \(\phi_i(\mathbf{r})\) numerically. There are two complementary choices to be made:

How to expand the orbitals — the choice of basis set.
How to treat the core electrons — full all-electron treatment, or replace the core with a pseudopotential (PP) or projector augmented wave (PAW) potential.

These choices profoundly affect computational cost, accuracy, and the type of system tractable. This chapter develops the theory behind the most common approaches and provides practical guidance on convergence.

Basis Set Expansion

The KS orbital \(\phi_i(\mathbf{r})\) is an element of an infinite-dimensional Hilbert space. In practice, it is expanded in a finite basis \({f_\mu(\mathbf{r})}\):

\begin{equation} \phi_i(\mathbf{r}) = \sum_\mu c_{i\mu}\, f_\mu(\mathbf{r}), \end{equation}

where the coefficients \(c_{i\mu}\) are determined by diagonalising the KS Hamiltonian in this basis. The KS eigenvalue problem \eqref{eq:KS-eqn} becomes the generalised matrix eigenvalue problem:

\begin{equation} \mathbf{H}_{\rm KS}\,\mathbf{c}_i = \epsilon_i\,\mathbf{S}\,\mathbf{c}_i, \end{equation}

where \(H_{\rm KS,\mu\nu} = \langle f_\mu | \hat{h}{\rm KS} | f\nu\rangle\) and \(S_{\mu\nu} = \langle f_\mu | f_\nu\rangle\) is the overlap matrix. The two dominant choices are plane waves and localised (atomic-like) basis functions.

Plane-Wave Basis Sets

For crystalline solids, Bloch’s theorem dictates that KS orbitals have the form:

\begin{equation} \phi_{i\mathbf{k}}(\mathbf{r}) = e^{i\mathbf{k}\cdot\mathbf{r}}\,u_{i\mathbf{k}}(\mathbf{r}), \end{equation}

where \(u_{i\mathbf{k}}(\mathbf{r})\) is lattice-periodic. The periodic part is naturally expanded in plane waves labelled by reciprocal lattice vectors \(\mathbf{G}\):

\begin{equation} u_{i\mathbf{k}}(\mathbf{r}) = \frac{1}{\sqrt{\Omega}}\sum_{\mathbf{G}} c_{i,\mathbf{k}+\mathbf{G}}\,e^{i\mathbf{G}\cdot\mathbf{r}}, \end{equation}

where \(\Omega\) is the unit cell volume. The basis is truncated at a kinetic energy cutoff \(E_{\rm cut}\). In Hartree atomic units the condition is:

\[ \frac{1}{2}|\mathbf{k}+\mathbf{G}|^2 \leq E_{\rm cut}, \]

equivalently \(\frac{\hbar^2}{2m_e}|\mathbf{k}+\mathbf{G}|^2 \leq E_{\rm cut}\) in SI notation. Codes quote \(E_{\rm cut}\) in different units: VASP uses eV, Quantum ESPRESSO uses Ry, and the underlying expressions assume Hartree. Conversions: \(1,\mathrm{Ha} = 27.2114,\mathrm{eV} = 2,\mathrm{Ry}\), so a VASP input of \(\texttt{ENCUT}=500\) eV corresponds to \(\sim 18.4,\mathrm{Ha}\), while \(\texttt{ecutwfc}=60\) Ry in QE corresponds to \(30,\mathrm{Ha} = 816,\mathrm{eV}\). When evaluating \(\tfrac{1}{2}|\mathbf{G}|^2\) by hand, keep \(\mathbf{G}\) in inverse bohr and \(E_{\rm cut}\) in Hartree to avoid factor-of-13.6 errors.

Advantages of plane waves:

Systematic completeness: increasing \(E_{\rm cut}\) improves the basis in a controlled, unbiased way. Convergence is monitored by a single parameter.
No basis set superposition error (BSSE): Plane waves are not centred on atoms, so the basis does not artificially lower the energy when two atoms approach.
Efficient evaluation: The Hartree potential and charge density are related by Poisson’s equation, which is diagonal in reciprocal space. Fast Fourier transforms (FFTs) enable \(\mathcal{O}(N \log N)\) switching between real and reciprocal space.
Implemented in major codes: VASP, Quantum ESPRESSO, ABINIT, CP2K (mixed Gaussian/PW).

Disadvantages:

All-electron calculations are impractical: Near the nuclei, KS orbitals oscillate rapidly (due to orthogonality to core states), requiring an extremely large \(E_{\rm cut}\). This motivates the use of pseudopotentials or PAW (see below).
Poor for isolated molecules: A large vacuum supercell is needed to prevent periodic images from interacting, wasting computational effort.
Not naturally localised: Analysis (e.g. Bader charges, wannier functions) requires post-processing.

Convergence with Respect to \(E_{\rm cut}\)

A convergence test is mandatory in every plane-wave calculation. The procedure is:

Fix all other parameters (cell, \(k\)-points, functional).
Compute the total energy \(E_{\rm tot}(E_{\rm cut})\) for a series of cutoff values.
Identify the cutoff where \(\Delta E_{\rm tot} < \epsilon_{\rm tol}\) (typically \(1\)–\(5\) meV/atom).

Typical converged cutoffs: \(\sim 400\)–\(600\) eV for GGA+PAW; \(\sim 700\)–\(1200\) eV for norm-conserving PPs; transition metal oxides and \(f\)-electron systems often require higher values.

\(k\)-Point Sampling

In a periodic solid, the KS equations must be solved at every \(\mathbf{k}\)-point in the first Brillouin zone (BZ). This section summarises the essential ideas needed to set up a basis-set calculation; the detailed treatment of smearing, the tetrahedron method, and the convergence rates of BZ integrals for metals is deferred to Chapter 9.

Physical observables involve BZ integrals, e.g.:

\begin{equation} \rho(\mathbf{r}) = \frac{\Omega}{(2\pi)^3}\sum_i \int_{\rm BZ} |\phi_{i\mathbf{k}}(\mathbf{r})|^2\,d\mathbf{k}. \end{equation}

In practice, the integral is replaced by a discrete sum over a finite mesh. The standard choice is the Monkhorst–Pack (MP) mesh: a uniform \(N_1 \times N_2 \times N_3\) grid in reciprocal space, optionally shifted to include or exclude \(\Gamma\). The total number of \(k\)-points is reduced by the point-group symmetry of the crystal.

Convergence with \(k\)-mesh density:

Metals require denser meshes (many \(k\)-points near the Fermi surface): \(20\times 20\times 20\) or more for simple metals.
Semiconductors and insulators with gaps converge faster: \(6\times 6\times 6\) is often sufficient.
A smearing method (Methfessel–Paxton, Fermi–Dirac, or Gaussian) is applied to the occupation numbers for metals to accelerate convergence.

The \(k\)-point density and \(E_{\rm cut}\) must be converged independently and jointly: the two parameters interact through the density, so a joint convergence test (varying both) is the most rigorous approach.

Localised Basis Sets

An alternative to plane waves is to expand orbitals in atom-centred functions that mimic the shape of atomic orbitals. Each basis function \(f_\mu(\mathbf{r}) = f_\mu(|\mathbf{r}-\mathbf{R}_A|)\) is centred on atom \(A\).

Gaussian-Type Orbitals (GTOs)

GTOs take the form \(f(r) = r^l e^{-\alpha r^2} Y_l^m(\hat{\mathbf{r}})\), where \(\alpha\) is the exponent and \(Y_l^m\) is a real spherical harmonic. Multi-Gaussian contractions are used:

\begin{equation} f_\mu(r) = \sum_k d_{\mu k}\,g_k(r,\alpha_k). \end{equation}

GTOs are the standard in quantum chemistry codes (Gaussian, CRYSTAL, FHI-aims). Common basis set families: 6-31G, cc-pVDZ/TZ, def2-TZVP. The acronyms encode the contraction scheme; “triple-zeta” basis sets use three groups of Gaussians per angular momentum channel and are typically converged for production calculations. Diffuse functions (denoted by \(+\) or “aug-”) are important for anions and excited states.

Numerical Atomic Orbitals (NAOs)

NAOs use numerically tabulated radial functions, typically derived from atomic DFT calculations. They are compact (strictly localised beyond a cutoff radius) and efficient for large-scale calculations. Used in FHI-aims (numeric all-electron), SIESTA, OpenMX.

Advantages of localised bases:

Efficient for molecules and non-periodic systems (no vacuum padding needed).
Sparse Hamiltonian and overlap matrices enable \(\mathcal{O}(N)\) scaling algorithms.
Natural interface with chemical intuition (orbital populations, Mulliken/Löwdin charges).

Disadvantages:

Basis set superposition error (BSSE): When two fragments approach, each borrows basis functions from the other, artificially lowering the energy. The counterpoise correction (Boys–Bernardi) partially remedies this.
Incompleteness: The basis is not systematically improvable by a single parameter; basis quality depends on the choice of exponents and contraction.
Egg-shaped convergence: Difficult to converge to the complete basis set (CBS) limit.

Core Electrons: Three Strategies

Relativistic and strongly oscillating core orbitals are expensive to represent and barely participate in chemistry or bonding. Three strategies exist for handling them.

All-Electron (AE) Calculations

All electrons, including core, are treated explicitly. Mandatory when:

Core–valence interactions are chemically important (e.g. hyperfine coupling, NMR shifts, X-ray spectra).
High-accuracy benchmarks are required.
Working with very light elements (H, Li) where “core” electrons don’t exist.

Examples: FHI-aims (NAO), WIEN2k (LAPW), FLEUR.

Norm-Conserving Pseudopotentials (NCPP)

The core electrons are replaced by an effective potential \(V_{\rm PP}(\mathbf{r})\) that reproduces the scattering properties of the full ionic potential in the valence region. The pseudo-wavefunction \(\tilde{\phi}_l(r)\) agrees with the true all-electron wavefunction \(\phi_l(r)\) outside a cutoff radius \(r_c\):

\begin{equation} \tilde{\phi}_l(r) = \phi_l(r), \quad r > r_c. \end{equation}

The norm-conservation condition requires:

\begin{equation} \int_0^{r_c} |\tilde{\phi}_l(r)|^2 r^2\,dr = \int_0^{r_c} |\phi_l(r)|^2 r^2\,dr, \end{equation}

ensuring that the charge inside \(r_c\) is preserved and that the logarithmic derivatives (which control scattering) match at \(r_c\). The Hamann–Schlüter–Chiang (HSC) and Troullier–Martins (TM) schemes are standard constructions.

NCPPs typically require \(E_{\rm cut} \sim 60\)–\(100\) Ry and are used in codes such as Quantum ESPRESSO and ABINIT.

Ultrasoft Pseudopotentials (USPP)

The ultrasoft pseudopotential (Vanderbilt, 1990) relaxes the norm-conservation condition, allowing smoother pseudo-wavefunctions that require much lower plane-wave cutoffs (\(E_{\rm cut} \sim 25\)–\(40\) Ry). The pseudo-wavefunction \(\tilde{\phi}_l(r)\) is permitted to deviate from the all-electron \(\phi_l(r)\) even for \(r > r_c^{\rm soft}\), producing a softer nodeless function.

The missing charge that would violate the norm-conservation sum rule is compensated by augmentation charges \(Q_{nm}(\mathbf{r})\) added to the density:

\begin{equation} \rho(\mathbf{r}) = \sum_i |\tilde{\phi}_i(\mathbf{r})|^2 + \sum_a\sum_{nm} \rho_{nm}^a Q_{nm}^a(\mathbf{r}), \end{equation}

where \(\rho_{nm}^a = \sum_i \langle\tilde{\phi}_i|\tilde{p}_n^a\rangle\langle\tilde{p}_m^a|\tilde{\phi}i\rangle\) are the occupancy matrices for atom \(a\), and \(Q{nm}^a(\mathbf{r})\) are localised functions inside the augmentation sphere that carry the missing charge. The generalised eigenvalue problem that results from USPP differs from the standard KS equation, requiring a modified orthonormality condition; however, this adds negligible overhead in practice.

USPP is available in Quantum ESPRESSO and CASTEP and offers an excellent cost-to-accuracy ratio for most elements.

Projector Augmented Wave (PAW) Method

PAW method showing all-electron and pseudo wavefunctions and the augmentation sphere — **Figure 5.1.** *(a)* All-electron (AE, terracotta) and pseudo (PS, blue dashed) wavefunctions as a function of \(r\). Inside the augmentation sphere \(r < r_c\), the AE wavefunction has nodes and a Coulomb singularity at the nucleus; the PS wavefunction is nodeless and smooth. Outside \(r_c\), they are identical. *(b)* The PAW transformation \(\hat{T} = 1 + \sum_n (|\phi_n\rangle - |\tilde\phi_n\rangle)\langle\tilde{p}_n|\) maps the smooth PS density \(\tilde\rho\) to the full AE density \(\rho = \tilde\rho + \rho^1 - \tilde\rho^1\), where the on-site corrections are computed inside the augmentation sphere from the AE and PS partial waves.

The **PAW method** (Blöchl, 1994) is the most rigorous of the pseudopotential-like approaches and is now the default in most major plane-wave codes. It can be understood as a generalisation of USPP that is formally equivalent to an all-electron calculation.

The PAW Transformation

PAW introduces a linear transformation operator \(\hat{\mathcal{T}}\) that maps smooth, computationally convenient pseudo-wavefunctions \(|\tilde{\Psi}_i\rangle\) onto the true all-electron wavefunctions \(|\Psi_i\rangle\):

\begin{equation} |\Psi_i\rangle = \hat{\mathcal{T}}\,|\tilde{\Psi}_i\rangle. \label{eq:PAW-transform} \end{equation}

Inside the augmentation sphere of radius \(r_c^a\) centred on each atom \(a\), the smooth pseudo-wavefunction is a poor approximation to the oscillatory all-electron wavefunction. The transformation \(\hat{\mathcal{T}}\) corrects for this, atom by atom:

\begin{equation} \hat{\mathcal{T}} = \mathbf{1} + \sum_a \hat{\mathcal{T}}^a, \qquad \hat{\mathcal{T}}^a = \sum_{n} \left(|\phi_n^a\rangle - |\tilde{\phi}_n^a\rangle\right)\langle\tilde{p}_n^a|, \label{eq:PAW-T} \end{equation}

where:

\(|\phi_n^a\rangle\) are all-electron partial waves — solutions of the radial Schrödinger equation for the isolated atom \(a\) at reference energies, labelled by \(n = (l, m, \epsilon_{\rm ref})\).
\(|\tilde{\phi}_n^a\rangle\) are smooth pseudo partial waves that match \(|\phi_n^a\rangle\) outside \(r_c^a\) but are nodeless and smooth inside.
\(\langle\tilde{p}_n^a|\) are projector functions localised inside \(r_c^a\), dual to the pseudo partial waves: \(\langle\tilde{p}_n^a|\tilde{\phi}m^a\rangle = \delta{nm}\).

Constructing the projectors. The duality relation does not uniquely fix \(|\tilde{p}_n^a\rangle\); it is a biorthogonality condition with the pseudo partial waves. The standard construction (Blöchl 1994; Kresse–Joubert 1999) starts from auxiliary functions \(|\chi_n^a\rangle\) that are localised inside \(r_c^a\) and have the same angular character as \(|\tilde{\phi}_n^a\rangle\) — typically obtained by acting on \(|\tilde{\phi}n^a\rangle\) with the operator \((\epsilon{\rm ref} - \hat{T} - \tilde{V})\), which annihilates the pseudo partial wave outside the sphere. The projectors are then biorthogonalised:

\[ |\tilde{p}n^a\rangle = \sum_m |\chi_m^a\rangle (B^{-1}){mn}, \qquad B_{mn} = \langle\chi_m^a|\tilde{\phi}_n^a\rangle, \]

which guarantees \(\langle\tilde{p}_n^a|\tilde{\phi}m^a\rangle = \delta{nm}\) by construction. The construction is performed once per element during PAW dataset generation; the user only sees the finished projector–partial-wave set in the POTCAR (VASP) or UPF (QE) file.

The transformation acts as follows: when \(\hat{\mathcal{T}}\) is applied to \(|\tilde{\Psi}_i\rangle\), the projectors \(\langle\tilde{p}_n^a|\) measure the overlap of the smooth wavefunction with the partial waves in each augmentation sphere, and the difference \((|\phi_n^a\rangle - |\tilde{\phi}_n^a\rangle)\) restores the all-electron character inside that sphere. Outside all augmentation spheres, \(\hat{\mathcal{T}}^a = 0\) and the smooth wavefunction is used directly.

Reconstruction of the All-Electron Density

The all-electron charge density is:

\begin{equation} \rho(\mathbf{r}) = \tilde{\rho}(\mathbf{r}) + \rho^1(\mathbf{r}) - \tilde{\rho}^1(\mathbf{r}), \label{eq:PAW-density} \end{equation}

where each term has a specific role:

\(\tilde{\rho}(\mathbf{r}) = \sum_i f_i,|\tilde{\Psi}_i(\mathbf{r})|^2\) is the smooth pseudo-density, computed from the plane-wave-expanded pseudo-wavefunctions throughout the entire cell. This is the computationally cheap term, evaluated with FFTs.
\(\rho^1(\mathbf{r}) = \sum_a\sum_{nm}\rho_{nm}^a,\phi_n^a(\mathbf{r})\phi_m^a(\mathbf{r})\) is the on-site all-electron density, reconstructed analytically from the all-electron partial waves inside the augmentation spheres. Here \(\rho_{nm}^a = \sum_i f_i\langle\tilde{\Psi}_i|\tilde{p}_n^a\rangle\langle\tilde{p}_m^a|\tilde{\Psi}_i\rangle\) are the occupation matrices that quantify the partial-wave character of each KS orbital at each atom.
\(\tilde{\rho}^1(\mathbf{r}) = \sum_a\sum_{nm}\rho_{nm}^a,\tilde{\phi}_n^a(\mathbf{r})\tilde{\phi}_m^a(\mathbf{r})\) is the on-site pseudo-density, which subtracts the smooth-wavefunction contribution inside the augmentation spheres so it is not double-counted with \(\tilde{\rho}\).

The total energy is decomposed in the same three-part way. The smooth part is evaluated in reciprocal space via FFTs; the on-site parts are computed analytically on radial grids inside each augmentation sphere. This decomposition is the key to PAW’s efficiency: the expensive oscillatory core region is handled on a small radial grid, while the smooth interstitial region uses the standard plane-wave machinery.

The PAW Total Energy

The total energy functional in PAW is:

\begin{equation} E = \tilde{E} + E^1 - \tilde{E}^1, \end{equation}

where \(\tilde{E}\) is the energy of the pseudo-system (computed with plane waves), \(E^1\) is the all-electron on-site energy inside the augmentation spheres, and \(\tilde{E}^1\) is the corresponding pseudo on-site energy that prevents double-counting. The KS equations derived from this functional take the form of a generalised eigenvalue problem:

\begin{equation} \hat{\tilde{H}}\,|\tilde{\Psi}_i\rangle = \epsilon_i\,\hat{S}\,|\tilde{\Psi}_i\rangle, \end{equation}

where \(\hat{\tilde{H}}\) is the smooth KS Hamiltonian (including on-site corrections) and \(\hat{S} = \mathbf{1} + \sum_a\sum_{nm}({\langle\tilde{p}n^a|\tilde{p}m^a\rangle{\rm AE} - \delta{nm}})|\tilde{p}_m^a\rangle\langle\tilde{p}_n^a|\) is the overlap operator that enforces the correct normalisation of the all-electron wavefunctions.

Frozen Core and Semi-Core States

A PAW dataset (or any pseudopotential) requires a decision about which electrons are “core” (treated as a frozen reference contribution to the potential) and which are “valence” (treated explicitly in the SCF loop). The choice is not always obvious: a state labelled chemically as “core” may polarise significantly under chemical bonding or magnetic ordering, and must then be promoted to valence. These promoted states are called semi-core states.

The most common cases requiring semi-core treatment are:

\(3d\) transition metals: the \(3p\) (and sometimes \(3s\)) states overlap energetically with the \(3d\) and \(4s\) valence states. In compounds with short metal–oxygen bonds, with large magnetic moments, or under high pressure, the \(3p\) electrons polarise and must be included. VASP provides Fe_pv (\(3p\) in valence) and Fe_sv (both \(3s\) and \(3p\) in valence) for such cases.
\(4d\) and \(5d\) transition metals: the \(4p\)/\(5p\) semi-core states are even more important. Standard VASP recommendations: use Mo_pv, Nb_pv, Ta_pv, W_pv rather than the plain potentials for most production work.
Alkali and alkaline earths: \(2s/2p\) semi-core states are needed for Na, Mg, K, Ca. Standard names: Na_pv, K_sv, Ca_sv.
Rare earths and actinides: \(5s5p\) (for \(4f\) elements) or \(6s6p\) (for \(5f\) elements) are routinely promoted to valence to capture the contraction of the \(f\) shell under bonding.

The trade-off is computational: each promoted semi-core state increases the number of bands \(N_{\rm bands}\) and typically requires a larger \(E_{\rm cut}\) because the semi-core wavefunctions oscillate more rapidly than valence states. Typical numbers: Fe (standard) needs \(E_{\rm cut} = 270\) eV; Fe_pv needs \(\sim 295\) eV; Fe_sv needs \(\sim 320\) eV. The accuracy gain is largest for total-energy differences (cohesive energies, phase stability, magnetic ordering energies); single-point band structure is less sensitive.

Recommendation: Start with the standard PAW potential and switch to _pv/_sv variants if (i) computing energy differences sensitive to inner-shell polarisation, (ii) treating compressed or strongly hybridised environments, or (iii) the VASP wiki / QE pseudopotential library explicitly recommends the semi-core variant for your element.

Strengths of PAW

Formally all-electron: The full all-electron density is available in the augmentation spheres through equation \eqref{eq:PAW-density}. This gives direct access to hyperfine parameters, electric field gradients, NMR chemical shifts, and core-level binding energies.
Low \(E_{\rm cut}\): The smooth part \(|\tilde{\Psi}i\rangle\) requires \(E{\rm cut} \sim 400\)–\(600\) eV (VASP), similar to USPP, far below what a true all-electron plane-wave calculation would need.
Transferability: The partial waves are atom-specific but not system-specific; the same PAW dataset works well across different chemical environments, unlike empirically fitted NCPP.
Extensibility: The occupation matrices \(\rho_{nm}^a\) are a natural handle for DFT+U and hybrid functional implementations.

Limitations of PAW

Fixed augmentation spheres: If \(r_c^a\) is too small, the smooth part is not genuinely smooth; if too large, augmentation spheres on adjacent atoms overlap, invalidating the non-overlapping sphere assumption. Standard VASP PAW datasets are designed to avoid overlap for typical bond lengths, but this must be verified under compression.
Linearisation of the partial wave expansion: PAW is exact only if the partial wave basis \({|\phi_n^a\rangle}\) is complete within the augmentation sphere. For highly ionic environments or unusual oxidation states, a larger partial wave set or scattering energies closer to the occupied states may be needed. VASP provides both standard and “hard” PAW datasets for such cases.

Comparison of core-electron strategies:

Method	\(E_{\rm cut}\) (typical)	Core accuracy	Access to core density	Cost	Default in
NCPP	\(60\)–\(100\) Ry	Good (outside \(r_c\))	No	Moderate	QE, ABINIT
USPP	\(25\)–\(40\) Ry	Good (valence)	No	Low	QE, CASTEP
PAW	\(30\)–\(50\) Ry	Excellent (AE in spheres)	Yes	Low–moderate	VASP, QE
All-electron	\(\gg 100\) Ry	Exact	Yes	High	FHI-aims, WIEN2k

Practical Convergence Protocol

A rigorous DFT calculation requires the following checks, performed in order:

\(E_{\rm cut}\) convergence: Increase the plane-wave cutoff until the total energy changes by less than \(1\)–\(2\) meV/atom. Use the converged cutoff for all subsequent calculations.
\(k\)-mesh convergence: Increase the Monkhorst–Pack grid density until the total energy and the quantity of interest (e.g. magnetic moment, band gap) are converged. Report the converged grid as \(N_1 \times N_2 \times N_3\).
Supercell size (for defects, surfaces, molecules): Increase the cell size until the interaction between periodic images is negligible (\(\Delta E \lt 1\) meV/defect).
SCF convergence criterion: Tighten the SCF energy tolerance to \(\sim 10^{-6}\) eV for energy differences; \(\sim 10^{-7}\) eV for forces or stress calculations.
Structural convergence: Relax ionic positions until forces are below \(0.01\)–\(0.02\) eV/Å; relax the cell until stresses are below \(0.1\) kbar.

Failure to converge any one of these can produce errors that dwarf those from the choice of XC functional. Convergence tests must be performed for each new system class; results from one system cannot be assumed transferable.

Outlook

With the KS equations specified, the XC functional chosen, and the basis and pseudopotential converged, we have all the ingredients for a standard spin-unpolarised DFT calculation. The next chapter extends the framework to spin-polarised systems, which are essential for magnetic materials and the study of exchange coupling in Heusler alloys and related compounds.

Spin-Polarised DFT and Magnetism

The standard Kohn–Sham DFT developed in Chapter 3 treats electrons as spinless particles: the density \(\rho(\mathbf{r})\) carries no spin label, and the KS equations yield doubly degenerate orbital energies. This is adequate for non-magnetic systems, but fails entirely when the ground state breaks time-reversal symmetry through a net electron spin polarisation.

To describe magnetic materials — from ferromagnetic iron to the half-metallic Heusler alloys and antiferromagnetic insulators — we must extend DFT to spin-polarised DFT (SDFT), and, for materials with strong spin–orbit coupling, to the full non-collinear formalism. This chapter develops the theoretical framework and connects it to the key observables: magnetic moments, exchange coupling constants \(J\), and magnetic anisotropy energies (MAE).

The Spin-Density Functional

Collinear SDFT

In collinear SDFT, spins are quantised along a fixed axis (conventionally \(z\)). The fundamental variable is replaced by the spin-resolved density:

\begin{equation} \rho\_\sigma(\mathbf{r}) = \sum\_{i} f\_{i\sigma}\,|\phi\_{i\sigma}(\mathbf{r})|^2, \qquad \sigma \in \{\uparrow, \downarrow\}, \end{equation}

where \(f_{i\sigma}\) are the orbital occupation numbers and the sum runs over all KS orbitals of spin \(\sigma\). The total and magnetisation densities are:

\begin{equation} \rho(\mathbf{r}) = \rho\_\uparrow(\mathbf{r}) + \rho\_\downarrow(\mathbf{r}), \qquad m(\mathbf{r}) = \rho\_\uparrow(\mathbf{r}) - \rho\_\downarrow(\mathbf{r}). \end{equation}

The HK theorem is generalised to show that the ground state is uniquely determined by the pair \((\rho(\mathbf{r}), m(\mathbf{r}))\), or equivalently by \((\rho_\uparrow, \rho_\downarrow)\). The total energy functional becomes:

\begin{equation} E[\rho\_\uparrow, \rho\_\downarrow] = T\_s[\rho\_\uparrow, \rho\_\downarrow] + E\_{\rm H}[\rho] + E\_{\rm xc}[\rho\_\uparrow, \rho\_\downarrow] + \int V\_{\rm ext}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r}. \end{equation}

The two sets of spin-channel KS equations decouple:

\begin{equation} \left[-\frac{1}{2}\nabla^2 + V\_{\rm eff}^\sigma(\mathbf{r})\right]\phi\_{i\sigma}(\mathbf{r}) = \epsilon\_{i\sigma}\,\phi\_{i\sigma}(\mathbf{r}), \label{eq:SDFT-KS} \end{equation}

with spin-dependent effective potentials:

\begin{equation} V\_{\rm eff}^\sigma(\mathbf{r}) = V\_{\rm ext}(\mathbf{r}) + V\_{\rm H}(\mathbf{r}) + V\_{\rm xc}^\sigma(\mathbf{r}), \qquad V\_{\rm xc}^\sigma = \frac{\delta E\_{\rm xc}}{\delta \rho\_\sigma(\mathbf{r})}. \end{equation}

The Hartree potential \(V_{\rm H}\) is spin-independent (it depends on the total \(\rho\)), while \(V_{\rm xc}^\sigma\) differs between spin channels, producing a spin-splitting of the eigenvalues \(\epsilon_{i\uparrow} \neq \epsilon_{i\downarrow}\). This is the quantum-mechanical origin of the exchange splitting that drives ferromagnetism.

The Spin-Polarised XC Functional

The XC functional must be generalised to take both \(\rho_\uparrow\) and \(\rho_\downarrow\) as arguments. For the LDA, the local spin-density approximation (LSDA) uses the XC energy of a spin-polarised uniform electron gas:

\[ E_{\rm xc}^{\rm LSDA}[\rho_\uparrow, \rho_\downarrow] = \int \varepsilon_{\rm xc}^{\rm UEG}(\rho_\uparrow(\mathbf{r}), \rho_\downarrow(\mathbf{r})),\rho(\mathbf{r}),d\mathbf{r}. \]

The Oliver–Perdew spin-scaling relation for exchange. For exchange, the spin-polarised energy can be written exactly in terms of the spin-unpolarised functional. Note first that a fully polarised gas of density \(\rho\) (all electrons in one spin channel) has the same exchange energy as a spin-unpolarised gas of density \(2\rho\) per spin channel — because each spin channel separately is a closed UEG, and exchange only acts between same-spin electrons. This gives the spin-scaling relation (Oliver and Perdew, 1979):

\[ E_{\rm x}[\rho_\uparrow, \rho_\downarrow] = \frac{1}{2}!\left( E_{\rm x}[2\rho_\uparrow] + E_{\rm x}[2\rho_\downarrow] \right), \]

where \(E_{\rm x}[\rho]\) on the right is the spin-unpolarised exchange functional evaluated at twice the density of each spin channel. Substituting the Dirac form \(\varepsilon_{\rm x}^{\rm UEG}(\rho) = -\tfrac{3}{4}(3/\pi)^{1/3}\rho^{1/3}\) and writing \(\zeta = (\rho_\uparrow - \rho_\downarrow)/\rho\), the result simplifies to:

\[ \varepsilon_{\rm x}(\rho_\uparrow, \rho_\downarrow) = \varepsilon_{\rm x}^{\rm UEG}(\rho)\cdot\frac{(1+\zeta)^{4/3} + (1-\zeta)^{4/3}}{2}. \]

The factor \([(1+\zeta)^{4/3} + (1-\zeta)^{4/3}]/2\) is the exchange spin-enhancement factor: it equals 1 at \(\zeta = 0\) (paramagnetic) and \(2^{1/3} \approx 1.26\) at \(\zeta = \pm 1\) (fully polarised), reflecting that full spin polarisation lowers the exchange energy by 26%.

Correlation is harder. No exact spin-scaling exists for the correlation energy because correlation arises from interactions between opposite-spin electrons (in addition to same-spin correlation, which is partially included in exchange). Von Barth and Hedin (1972) proposed interpolating between the paramagnetic (\(\zeta = 0\)) and ferromagnetic (\(\zeta = 1\)) limits:

\[ \varepsilon_{\rm c}(\rho, \zeta) = \varepsilon_{\rm c}(\rho, 0) + \left[\varepsilon_{\rm c}(\rho, 1) - \varepsilon_{\rm c}(\rho, 0)\right] f(\zeta), \]

with the interpolation function:

\[ f(\zeta) = \frac{(1+\zeta)^{4/3} + (1-\zeta)^{4/3} - 2}{2(2^{1/3}-1)}, \]

which satisfies \(f(0)=0\), \(f(1)=1\), and reproduces the leading \(\zeta^2\) behaviour expected from perturbation theory in the spin polarisation. The values \(\varepsilon_{\rm c}(\rho, 0)\) and \(\varepsilon_{\rm c}(\rho, 1)\) are parametrised separately from QMC (Perdew–Wang 1992; Vosko–Wilk–Nusair 1980 use the same structure).

GGA and meta-GGA functionals extend this scheme by treating exchange via the exact spin-scaling relation applied to the enhancement factor \(F_{\rm x}(\rho, s)\), and correlation via analogous interpolations parametrised against UEG data.

Magnetic Moments

The local magnetic moment on site \(A\) is:

\begin{equation} \mu\_A = \int\_{\Omega\_A} m(\mathbf{r})\,d\mathbf{r} = \int\_{\Omega\_A} [\rho\_\uparrow(\mathbf{r}) - \rho\_\downarrow(\mathbf{r})]\,d\mathbf{r}, \end{equation}

where \(\Omega_A\) is the Wigner–Seitz (or PAW augmentation) sphere of atom \(A\). The total magnetisation is:

\begin{equation} M = \int m(\mathbf{r})\,d\mathbf{r} = N\_\uparrow - N\_\downarrow, \end{equation}

where \(N_\sigma = \int \rho_\sigma,d\mathbf{r}\) is the total number of electrons of spin \(\sigma\).

For elemental ferromagnets, SDFT recovers the experimental moments well: Fe (\(2.2,\mu_B\) exp., \(2.1\)–\(2.3,\mu_B\) DFT), Co (\(1.7,\mu_B\)), Ni (\(0.6,\mu_B\)). Orbital contributions (from spin–orbit coupling) are not included at the collinear SDFT level.

The Stoner Criterion

A natural question is: when will SDFT spontaneously break spin symmetry to produce a magnetic ground state? The answer is given by the Stoner criterion — the microscopic condition for itinerant ferromagnetism, derivable from SDFT in the linear-response limit.

Consider a small spin polarisation \(\delta m\) imposed on a non-magnetic metal. The exchange splitting that develops is approximately \(\Delta_{\rm xc} = I,\delta m\), where \(I\) is the Stoner exchange integral — essentially the magnitude of the spin-asymmetric XC potential per unit magnetisation. The number of electrons that flip from \(\downarrow\) to \(\uparrow\) in response to this splitting is \(\delta m = N(\epsilon_F) \Delta_{\rm xc} = N(\epsilon_F) I,\delta m\), where \(N(\epsilon_F)\) is the density of states at the Fermi level per spin channel. Self-consistency \(\delta m = N(\epsilon_F) I,\delta m\) admits a non-trivial solution \(\delta m \neq 0\) only when:

\begin{equation} I \, N(\epsilon\_F) > 1. \label{eq:Stoner} \end{equation}

This is the Stoner criterion: spontaneous itinerant ferromagnetism occurs when the product of the Stoner parameter and the paramagnetic DOS at \(\epsilon_F\) exceeds unity. Physically, \(I\) favours spin polarisation (exchange energy gain) while a low \(N(\epsilon_F)\) opposes it (kinetic energy cost). For the elemental \(3d\) ferromagnets, calculated values are:

Element	\(I\) (eV)	\(N(\epsilon_F)\) (eV\(^{-1}\)/spin)	\(IN(\epsilon_F)\)	Magnetic?
Fe (bcc)	0.93	1.54	1.43	Yes (FM)
Co (hcp)	0.99	1.72	1.70	Yes (FM)
Ni (fcc)	1.01	2.02	2.04	Yes (FM)
Pd (fcc)	0.68	1.14	0.78	No (enhanced paramagnet)
Pt (fcc)	0.63	0.79	0.50	No

Pd narrowly fails the criterion and is consequently an enhanced Pauli paramagnet with strongly amplified spin susceptibility — sometimes called “nearly magnetic.” This single inequality captures, at the mean-field SDFT level, why Fe/Co/Ni are ferromagnetic and Pd/Pt are not, despite all five elements having narrow \(d\) bands.

Magnetic Ordering: Ferro-, Antiferro-, and Ferrimagnetic States

By setting up different initial spin configurations in the SCF cycle, collinear SDFT can target different magnetic orderings:

Ferromagnetic (FM): all local moments aligned parallel, \(M \neq 0\).
Antiferromagnetic (AFM): moments on sublattices aligned antiparallel, \(M = 0\) globally. Requires a supercell large enough to contain both sublattices.
Ferrimagnetic (FiM): unequal antiparallel moments on distinct sublattices, \(M \neq 0\). Common in half-Heusler and spinel compounds.

To determine which ordering is the ground state, compute the total energy for each configuration and compare: \(\Delta E = E_{\rm AFM} - E_{\rm FM}\).

Exchange Coupling: The Heisenberg Model

The energy difference between magnetic configurations can be mapped onto the Heisenberg spin Hamiltonian:

\begin{equation} \hat{\mathcal{H}}\_{\rm Heis} = -\sum\_{\langle i,j\rangle} J\_{ij}\,\hat{\mathbf{S}}\_i \cdot \hat{\mathbf{S}}\_j, \end{equation}

where \(\hat{\mathbf{S}}_i\) is the spin operator on site \(i\) and \(J_{ij}\) is the exchange coupling constant between sites \(i\) and \(j\). The sign convention: \(J > 0\) favours FM alignment; \(J \lt 0\) favours AFM.

Extraction of \(J\) from Total Energies

For a two-sublattice system with spin \(S\) per site, the energy difference between FM and AFM configurations (with \(z\) equivalent nearest-neighbour pairs per formula unit) is:

\begin{equation} \Delta E = E\_{\rm AFM} - E\_{\rm FM} = 2J\,z\,S^2. \label{eq:J-extraction} \end{equation}

Hence:

\begin{equation} J = \frac{E\_{\rm AFM} - E\_{\rm FM}}{2\,z\,S^2}. \end{equation}

For more complex magnetic structures with multiple shells of neighbours, the four-state method (Xiang et al.) or the spin-spiral approach (Section below) provides a systematic way to disentangle \(J_1, J_2, J_3, \ldots\)

Liechtenstein–Katsnelson–Antropov–Gubanov (LKAG) Method

An alternative to total energy differences is the LKAG (linear response) method, which computes \(J_{ij}\) directly from the Green’s function:

\begin{equation} J\_{ij} = \frac{1}{4\pi}\int\_{-\infty}^{E\_F} {\rm Im}\,{\rm Tr}\_L\left[\Delta\_i G\_{ij}^\uparrow(E)\,\Delta\_j G\_{ji}^\downarrow(E)\right]\,dE, \end{equation}

where \(\Delta_i = V_{\rm xc,i}^\uparrow - V_{\rm xc,i}^\downarrow\) is the local exchange splitting on site \(i\) and \(G_{ij}^\sigma\) are the spin-resolved intersite Green’s functions. This method gives the full \(J_{ij}\) tensor in a single DFT calculation and is available in codes such as FLEUR, SPR-KKR, and Questaal.

Practical note: Total energy methods (equation \eqref{eq:J-extraction}) are simpler and widely used. They require DFT+U or hybrid functionals for correlated \(d\)/\(f\)-electron systems where GGA overdelocalises the magnetic orbitals and underestimates \(J\).

Non-Collinear SDFT

In the collinear formalism, the magnetisation is constrained to lie along \(z\). This is insufficient for:

Spin spirals and incommensurate magnetic structures.
Spin frustration in triangular or kagome lattices.
Spin–orbit coupling (SOC) effects: the magnetisation direction matters when SOC is present, and it may vary in space.
Dzyaloshinskii–Moriya interaction (DMI) and skyrmions.

Dzyaloshinskii–Moriya Interaction

The DMI is an antisymmetric exchange interaction between two spins that prefers a canted (non-collinear) alignment, in contrast to the Heisenberg coupling that prefers parallel (\(J > 0\)) or antiparallel (\(J < 0\)) configurations. Its form is:

\begin{equation} \hat{\mathcal{H}}\_{\rm DMI} = \sum\_{\langle i,j\rangle} \mathbf{D}\_{ij}\cdot\left(\hat{\mathbf{S}}\_i \times \hat{\mathbf{S}}\_j\right), \label{eq:DMI} \end{equation}

where \(\mathbf{D}_{ij}\) is the DM vector characterising the bond \(i\text{–}j\). The microscopic origin is the combination of spin–orbit coupling and broken inversion symmetry: the Moriya symmetry rules state that \(\mathbf{D}_{ij} = 0\) when the midpoint of the bond is an inversion centre, so DMI vanishes by symmetry in centrosymmetric crystals. It is enabled by:

Bulk inversion-symmetry breaking in non-centrosymmetric crystals (e.g. B20 compounds like MnSi, FeGe — sources of bulk skyrmion lattices).
Interface inversion-symmetry breaking at heavy-metal / ferromagnet interfaces (e.g. Pt/Co, Ir/Fe — sources of interfacial DMI driving Néel-type skyrmions in thin films).

The minimum-energy configuration of \(\hat{\mathcal{H}}_{\rm DMI}\) alone is a rotation of spins around \(\mathbf{D}_{ij}\); competition with Heisenberg \(J\) and magnetic anisotropy produces helical spin spirals, cycloids, and topologically nontrivial skyrmions. Computing \(\mathbf{D}_{ij}\) from DFT requires non-collinear SDFT with SOC; the standard approach is to compute energies of spin spirals as a function of wavevector \(\mathbf{q}\) and extract the linear-in-\(q\) term, or to use the Liechtenstein-style Green’s function approach generalised to the off-diagonal spin-flip channels.

Non-Collinear Density Matrix

In non-collinear SDFT, the density variable is replaced by the full \(2\times 2\) density matrix in spin space:

\begin{equation} n\_{\alpha\beta}(\mathbf{r}) = \sum\_i f\_i\,\phi\_i^\alpha(\mathbf{r})\left(\phi\_i^\beta(\mathbf{r})\right)^*, \qquad \alpha,\beta \in \{\uparrow,\downarrow\}, \end{equation}

which is related to the magnetisation density vector \(\mathbf{m}(\mathbf{r})\) via:

\begin{equation} n\_{\alpha\beta} = \frac{1}{2}\left[\rho\,\delta\_{\alpha\beta} + \mathbf{m}\cdot\boldsymbol{\sigma}\_{\alpha\beta}\right], \end{equation}

where \(\boldsymbol{\sigma} = (\sigma_x, \sigma_y, \sigma_z)\) are the Pauli matrices.

The KS orbitals become two-component spinors \(\boldsymbol{\phi}_i = (\phi_i^\uparrow, \phi_i^\downarrow)^T\), and the KS equations take the matrix form:

\begin{equation} \left[-\frac{1}{2}\nabla^2\,\mathbf{I} + \mathbf{V}\_{\rm eff}(\mathbf{r})\right]\boldsymbol{\phi}\_i(\mathbf{r}) = \epsilon\_i\,\boldsymbol{\phi}\_i(\mathbf{r}), \end{equation}

with the \(2\times 2\) effective potential:

\begin{equation} \mathbf{V}\_{\rm eff}(\mathbf{r}) = \left[V\_{\rm ext} + V\_{\rm H} + V\_{\rm xc}^{(0)}\right]\mathbf{I} - \mathbf{B}\_{\rm xc}(\mathbf{r})\cdot\boldsymbol{\sigma}, \end{equation}

where \(V_{\rm xc}^{(0)} = \frac{1}{2}(V_{\rm xc}^\uparrow + V_{\rm xc}^\downarrow)\) is the spin-averaged XC potential and \(\mathbf{B}_{\rm xc} = \frac{1}{2}(V_{\rm xc}^\uparrow - V_{\rm xc}^\downarrow)\hat{\mathbf{m}}\) is the XC magnetic field aligned along the local magnetisation direction \(\hat{\mathbf{m}}(\mathbf{r})\).

Spin–Orbit Coupling (SOC)

Spin–orbit coupling is a relativistic effect arising from the interaction of an electron’s spin magnetic moment with the magnetic field it experiences due to its orbital motion around the nucleus. In the Pauli approximation it takes the form:

\begin{equation} \hat{H}\_{\rm SOC} = \frac{\hbar^2}{4m\_e^2c^2}\frac{1}{r}\frac{dV}{dr}\,\hat{\mathbf{L}}\cdot\hat{\mathbf{S}}, \end{equation}

where \(\hat{\mathbf{L}}\) and \(\hat{\mathbf{S}}\) are the orbital and spin angular momentum operators, and \(dV/dr\) is the radial derivative of the ionic potential. SOC is strong near heavy nuclei (scales as \(Z^4\)), making it important for \(4d\), \(5d\), \(4f\), and \(5f\) elements.

In SDFT, SOC is most commonly included via the second-variational method:

Solve the scalar-relativistic (collinear) SDFT problem first.
Construct the SOC matrix elements in the basis of scalar-relativistic eigenstates.
Diagonalise the full Hamiltonian \(\hat{H}_{\rm SR} + \hat{H}_{\rm SOC}\).

Full self-consistent inclusion of SOC requires the non-collinear spinor formalism.

Magnetic Anisotropy Energy (MAE)

The magnetic anisotropy energy is the difference in total energy between magnetisation oriented along the easy axis and a hard axis:

\begin{equation} {\rm MAE} = E(\hat{\mathbf{n}}\_{\rm hard}) - E(\hat{\mathbf{n}}\_{\rm easy}), \end{equation}

typically measured in \(\mu\)eV/atom for \(3d\) metals or meV/atom for heavy-element compounds. MAE determines the magnetic hardness of a material: a large positive MAE (perpendicular easy axis) is required for permanent magnets and perpendicular magnetic recording media.

MAE arises entirely from spin–orbit coupling: without SOC, all directions are degenerate. It vanishes in spin-orbit-free collinear SDFT and requires either:

Self-consistent non-collinear calculation with SOC for large MAE (e.g. \(L1_0\)-FePt, \(> 1\) meV/atom).
Force theorem (perturbative SOC): For small MAE, compute the non-self-consistent energy with SOC applied to the converged scalar-relativistic density. Computationally cheaper but breaks down when SOC significantly modifies the density.

The MAE is dominated by the band energy contribution (summed KS eigenvalue difference):

\begin{equation} {\rm MAE} \approx \sum\_i (f\_i^{\rm hard}\,\epsilon\_i^{\rm hard} - f\_i^{\rm easy}\,\epsilon\_i^{\rm easy}), \end{equation}

which requires very dense \(k\)-meshes to converge (fine Brillouin zone features near the Fermi level). Typical convergence requires \(40\times 40\times 40\) or finer grids for transition metal systems.

Contribution from Orbital Moments: Bruno Relation

The connection between MAE and orbital moments was derived by Bruno (1989) using second-order perturbation theory in the SOC strength \(\xi\). The SOC Hamiltonian is:

\[ \hat{H}_{\rm SOC} = \xi,\hat{\mathbf{L}}\cdot\hat{\mathbf{S}}. \]

For a system with spin quantised along \(\hat{\mathbf{n}}\), the second-order energy correction is:

\[ E^{(2)}(\hat{\mathbf{n}}) = -\xi^2 \sum_{o,u} \frac{|\langle u|\hat{H}_{\rm SOC}^{(\hat{\mathbf{n}})}|o\rangle|^2}{\epsilon_u - \epsilon_o}, \]

where \(o\) and \(u\) label occupied and unoccupied states. Bruno showed that for a system with uniaxial symmetry and spin fixed along \(\hat{\mathbf{n}}\), the dominant contribution to this sum is proportional to the expectation value of the orbital angular momentum along \(\hat{\mathbf{n}}\):

\[ E^{(2)}(\hat{\mathbf{n}}) \approx -\frac{\xi}{4\mu_B}\langle \hat{L}_{\hat{\mathbf{n}}} \rangle, \]

where \(\langle \hat{L}_{\hat{\mathbf{n}}} \rangle = \mu_B^{-1}\langle\hat{m}_L^{\hat{\mathbf{n}}}\rangle\) is the orbital moment along \(\hat{\mathbf{n}}\) in units of \(\mu_B\). The MAE then follows as the difference between two magnetisation directions:

\[ {\rm MAE} = E(\hat{\mathbf{n}}_{\rm hard}) - E(\hat{\mathbf{n}}_{\rm easy}) \approx -\frac{\xi}{4\mu_B}\left(\langle L_z^{\rm hard}\rangle - \langle L_z^{\rm easy}\rangle\right). \]

Interpretation: The easy axis is the direction along which the orbital moment is largest (most negative \(E^{(2)}\)), since a larger orbital moment means more SOC energy lowering. Materials with large orbital moments — rare-earth \(4f\) ions, \(5d\) transition metals, or low-symmetry environments that quench the orbital moment less — consequently have large MAE. This is the microscopic basis for the empirical rule that heavier elements with unquenched orbital moments make better permanent magnets.

Caveats: Bruno’s derivation assumes (i) perturbative SOC (\(\xi \ll\) bandwidth), (ii) rigid band approximation (same orbitals for both directions), and (iii) collinear spin. It is quantitatively reliable for \(3d\) metals but breaks down for \(4f\) systems where SOC is comparable to the crystal field splitting, or for systems near band crossings where the denominator \(\epsilon_u - \epsilon_o\) becomes small. In those cases the full non-perturbative calculation is required.

Application to Heusler Alloys

Heusler alloys (\(X_2YZ\) full Heusler or \(XYZ\) half-Heusler) are a rich playground for SDFT because they exhibit a wide range of magnetic phenomena:

Half-metallic ferromagnetism: In compounds like Co\(_2\)MnSi, one spin channel (\(\uparrow\)) is metallic while the other (\(\downarrow\)) has a band gap at \(E_F\), giving a spin polarisation of 100% in principle. SDFT with GGA predicts this correctly for many Heuslers; a scissor correction or hybrid functional is needed to accurately reproduce the minority-spin gap.

Slater–Pauling rule: The total magnetic moment per formula unit follows \(M = N_v - 24\) (full Heusler) or \(M = N_v - 18\) (half-Heusler), where \(N_v\) is the total valence electron count. This rule, reproduced by SDFT, provides a rapid screening criterion for half-metallicity.

Exchange coupling and \(T_C\): The Curie temperature can be estimated from \(J_{ij}\) values via mean-field theory (\(k_B T_C^{\rm MF} = \frac{2}{3}J_0\)) or random phase approximation (RPA), with RPA giving more accurate results for itinerant magnets. SDFT+LKAG is the standard approach.

MAE in Heuslers: Most cubic Heuslers have small MAE (\(\sim \mu\)eV/atom) due to cubic symmetry. Tetragonally distorted Heuslers (e.g. Mn\(_3\)Ga) or inverse Heuslers can have significantly larger MAE, relevant for spin-transfer torque applications.

Practical Checklist for Magnetic DFT Calculations

Initialise spin correctly: Supply a non-zero initial magnetic moment to each magnetic site; the SCF loop will not spontaneously break spin symmetry from a non-magnetic starting point.
Check for multiple magnetic minima: Try several initial configurations (FM, AFM, ferrimagnetic) and compare final energies. Collinear SDFT can get trapped in local minima.
Use appropriate XC: PBE is standard; consider PBE+U or hybrid functionals for correlated \(d/f\) systems where GGA under-localises the magnetic orbitals.
For MAE: Use a non-collinear calculation with SOC; converge the \(k\)-mesh carefully (MAE is extremely sensitive to \(k\)-point density).
For \(J\): Cross-check total-energy and Green’s function methods; the two should agree within \(\sim 10\)–\(20%\).
Report orbital moments: When SOC is included, report both spin and orbital magnetic moments for comparison with XMCD experiments.

Outlook

Collinear SDFT provides an accurate and efficient framework for most magnetic materials, with non-collinear SDFT required for frustrated systems, SOC-driven anisotropy, and DMI. A systematic limitation shared by all SDFT methods is the treatment of strongly correlated \(d\) and \(f\) electrons, where the mean-field XC potential fails to capture Mott–Hubbard physics. The DFT+U method, which introduces an on-site Hubbard correction for localised orbitals, is the subject of the next chapter.

DFT+U: Hubbard Correction for Correlated Electrons

Standard DFT — whether LDA or GGA — describes electrons as moving in an effective single-particle potential constructed from the average charge density. This mean-field picture works well when electrons are itinerant, delocalised across the crystal in broad bands. It fails systematically for materials containing partially filled \(d\) or \(f\) shells, where electrons are spatially localised and strongly repel each other on the same atomic site. The canonical failures are well-known: GGA predicts NiO, CoO, and MnO to be metals; their experimental ground states are magnetic insulators with gaps of \(3\)–\(4\) eV. Rare-earth oxides, actinide compounds, and many Heusler alloys with localised Mn or Co \(d\) states share the same pathology.

The root cause is the self-interaction error (SIE) and the failure of the semi-local XC functional to reproduce the Coulomb blockade — the large energy cost \(U\) of placing two electrons on the same atomic orbital simultaneously. The DFT+U method (Anisimov, Zaanen, Andersen, 1991) corrects this by appending a Hubbard-like penalty term to the DFT energy functional, adding the cost of double occupation explicitly for a chosen set of localised orbitals. It is the most widely used method for strongly correlated materials in a plane-wave or PAW framework.

The Hubbard Model: Physical Motivation

The Hubbard model (Hubbard, 1963) is the minimal lattice model capturing the competition between kinetic energy (electron hopping, bandwidth \(W\)) and on-site Coulomb repulsion:

\begin{equation} \hat{H}\_{\rm Hub} = -t\sum\_{\langle i,j\rangle,\sigma}\hat{c}\_{i\sigma}^\dagger\hat{c}\_{j\sigma} + U\sum\_i \hat{n}\_{i\uparrow}\hat{n}\_{i\downarrow}, \label{eq:Hubbard} \end{equation}

where \(\hat{c}_{i\sigma}^\dagger\) creates an electron of spin \(\sigma\) on site \(i\), \(t\) is the nearest-neighbour hopping integral, \(U\) is the on-site Coulomb repulsion, and \(\hat{n}_{i\sigma} = \hat{c}_{i\sigma}^\dagger\hat{c}_{i\sigma}\) is the occupation operator.

The two limits are illuminating:

\(U/W \ll 1\) (itinerant limit): The bandwidth \(W \sim zt\) (where \(z\) is the coordination number) dominates. Electrons delocalise and form a metallic band. Standard DFT handles this regime well.
\(U/W \gg 1\) (Mott–Hubbard limit): The on-site repulsion dominates. For a half-filled band (one electron per site), double occupancy costs energy \(U\), so electrons localise on individual sites and the system becomes a Mott insulator, even though band theory would predict a metal. DFT with semi-local XC functionals fails here because the mean-field treatment of electron–electron interaction cannot produce this correlation-driven localisation.

The Mott–Hubbard gap is approximately \(U - W\): it opens when the repulsion energy exceeds the kinetic energy gain from delocalisation. For transition metal oxides, \(U \sim 4\)–\(10\) eV and \(W \sim 1\)–\(4\) eV, placing them firmly in the correlated regime.

Why Standard DFT Fails for Correlated Systems

The failure of LDA/GGA for Mott insulators can be traced to two related deficiencies:

1. Self-interaction error (SIE). As discussed in Chapter 4, approximate XC functionals do not fully cancel the spurious self-repulsion of an electron with itself. For a localised \(d\) orbital occupied by a single electron, the SIE artificially delocalises the orbital, lowering its energy and pushing it into the valence band. The tendency of LDA/GGA to over-delocalise \(d\) and \(f\) electrons is a direct consequence.

2. Missing derivative discontinuity. The exact XC potential has a finite discontinuity at integer electron numbers \(N\):

\begin{equation} \Delta\_{\rm xc} = V\_{\rm xc}^+(N) - V\_{\rm xc}^-(N) > 0, \label{eq:disc} \end{equation}

where \(V_{\rm xc}^\pm\) denote the XC potential just above and below integer \(N\). This discontinuity, combined with the KS band gap, gives the true quasiparticle gap: \(E_g^{\rm true} = E_g^{\rm KS} + \Delta_{\rm xc}\). Semi-local functionals give \(\Delta_{\rm xc} = 0\) exactly, so they systematically underestimate gaps. For Mott insulators the KS gap is zero (they are predicted metallic), and \(\Delta_{\rm xc}\) accounts for the entire physical gap.

DFT+U corrects both deficiencies for the targeted orbitals: the Hubbard \(U\) term penalises fractional occupations and introduces a step-like potential that mimics the derivative discontinuity.

The DFT+U Energy Functional

Occupation Matrix

The central object in DFT+U is the occupation matrix \(n_{mm’}^\sigma\) of the correlated subspace on each atom \(I\):

\begin{equation} n\_{mm'}^{I\sigma} = \sum\_{\mathbf{k},\nu} f\_{\mathbf{k}\nu\sigma} \langle\phi\_{\mathbf{k}\nu\sigma}|\hat{P}\_{m'}^{I}\rangle \langle\hat{P}\_m^{I}|\phi\_{\mathbf{k}\nu\sigma}\rangle, \label{eq:occmat} \end{equation}

where \(f_{\mathbf{k}\nu\sigma}\) are KS orbital occupation numbers, \(|\phi_{\mathbf{k}\nu\sigma}\rangle\) are the KS spinor states, and \(\hat{P}_m^I = |\chi_m^I\rangle\langle\chi_m^I|\) is the projector onto the localised orbital \(|\chi_m^I\rangle\) (e.g. the \(d_{m}\) orbital on atom \(I\), with \(m = -l, \ldots, l\)). In PAW, these projectors are the on-site partial waves; in LCAO codes, they are the basis functions themselves.

The occupation matrix is Hermitian (\(n_{mm’}^{I\sigma} = (n_{m’m}^{I\sigma})^*\)) and its trace gives the total occupation of spin channel \(\sigma\) on site \(I\):

\begin{equation} N^{I\sigma} = \sum\_m n\_{mm}^{I\sigma} = \mathrm{Tr}[\mathbf{n}^{I\sigma}]. \end{equation}

The eigenvalues \({f_\alpha^{I\sigma}}\) of \(\mathbf{n}^{I\sigma}\) lie in \([0,1]\) and are the natural orbital occupation numbers of the correlated subspace.

The Full DFT+U Functional

The total DFT+U energy is:

\begin{equation} E\_{\rm DFT+U} = E\_{\rm DFT}[\rho] + E\_U[\{n\_{mm'}^{I\sigma}\}] - E\_{\rm dc}[\{N^{I\sigma}\}], \label{eq:DFTU-total} \end{equation}

where \(E_{\rm DFT}\) is the standard LDA or GGA energy, \(E_U\) is the Hubbard correction, and \(E_{\rm dc}\) is the double-counting correction — a term that must be subtracted because the Coulomb interaction among the correlated electrons is already partially included in the DFT XC energy and must not be counted twice.

Two Formulations

Liechtenstein Formulation (Full Rotationally Invariant)

The Liechtenstein formulation (Liechtenstein, Anisimov, Zaanen, 1995) uses the full rotationally invariant Coulomb interaction characterised by two parameters: the average on-site Coulomb repulsion \(U\) and the average exchange interaction \(J\):

\begin{equation} E\_U^{\rm Lich} = \frac{1}{2}\sum\_{I,\sigma}\sum\_{\{m\}} \left[U\_{m\_1m\_3}^{m\_2m\_4} - \delta\_{\sigma\sigma'}J\_{m\_1m\_3}^{m\_2m\_4}\right] n\_{m\_1m\_2}^{I\sigma}\,n\_{m\_3m\_4}^{I\sigma'}, \end{equation}

where \(U_{m_1m_3}^{m_2m_4}\) and \(J_{m_1m_3}^{m_2m_4}\) are the screened Slater integrals parametrising the full Coulomb tensor. In practice these are reduced to a small number of radial integrals \(F^0, F^2, F^4\) (for \(d\) orbitals) related to \(U\) and \(J\) by:

\begin{equation} U = F^0, \qquad J = \frac{F^2 + F^4}{14}. \end{equation}

The double-counting correction in the Liechtenstein scheme uses the around mean-field (AMF) form:

\begin{equation} E\_{\rm dc}^{\rm AMF} = \frac{U}{2}N(N-1) - \frac{J}{2}\sum\_\sigma N^\sigma(N^\sigma - 1), \end{equation}

where \(N = \sum_\sigma N^\sigma\) is the total occupation of the correlated shell.

Dudarev Formulation (Simplified, \(U_{\rm eff}\))

For most practical purposes, the simpler Dudarev formulation (Dudarev et al., 1998) is preferred. It collapses \(U\) and \(J\) into a single effective parameter \(U_{\rm eff} = U - J\) and writes the correction as a penalty on fractional orbital occupations:

\begin{equation} \boxed{E\_U^{\rm Dur} = \frac{U\_{\rm eff}}{2}\sum\_{I,\sigma} \mathrm{Tr}\!\left[\mathbf{n}^{I\sigma}\!\left(\mathbf{1} - \mathbf{n}^{I\sigma}\right)\right].} \label{eq:Dudarev} \end{equation}

This elegant form has a transparent physical interpretation: the factor \(n_{mm}^{I\sigma}(1 - n_{mm}^{I\sigma})\) vanishes when orbital \(m\) is either fully occupied (\(n = 1\)) or fully empty (\(n = 0\)), and is maximised at \(n = 1/2\) (half occupation). The correction thus adds a penalty for fractional occupations, driving the occupation matrix towards integer eigenvalues. This is precisely the correction for the SIE, which tends to stabilise unphysical non-integer occupations.

The corresponding correction to the KS potential is:

\begin{equation} V\_{mm'}^{I\sigma,\rm DFT+U} = \frac{U\_{\rm eff}}{2}\left(\delta\_{mm'} - 2n\_{mm'}^{I\sigma}\right), \label{eq:Vpot} \end{equation}

which shifts occupied orbitals (large \(n_{mm}\)) downward in energy by \(-U_{\rm eff}/2\) and empty orbitals (small \(n_{mm}\)) upward by \(+U_{\rm eff}/2\). The net effect is to open or widen the gap between occupied and empty states in the correlated subspace by approximately \(U_{\rm eff}\), as intended.

The double-counting in Dudarev is implicit: the functional is constructed so that the \(E_U - E_{\rm dc}\) combination reduces exactly to equation \eqref{eq:Dudarev}, with the double-counting absorbed algebraically. No separate choice of double-counting scheme is needed.

Comparison of Formulations and Equivalence Conditions

Feature	Liechtenstein	Dudarev
Parameters	\(U\) and \(J\) separately	\(U_{\rm eff} = U - J\) only
Rotational invariance	Full	Approximate
Orbital polarisation	Captured	Not captured
Double-counting	AMF or FLL, explicit choice	Implicit, absorbed
Typical use	\(f\)-electron systems, full anisotropy	\(d\)-electron oxides, most practical work
Implementation in VASP	`LDAUTYPE = 1`	`LDAUTYPE = 2` (default)

Conditions under which Dudarev and Liechtenstein are equivalent. The Dudarev functional is obtained from the Liechtenstein functional by making two approximations:

Isotropic (spherically averaged) Coulomb interaction: replace the full Slater integrals \(U_{m_1m_3}^{m_2m_4}\) by their spherical average \(U,\delta_{m_1m_2}\delta_{m_3m_4} - J,\delta_{m_1m_4}\delta_{m_2m_3}\).
FLL double-counting with \(J = 0\) in the double-counting term: when the FLL double-counting correction is applied and the exchange term \(J\) is absorbed into \(U_{\rm eff}\), the Liechtenstein energy reduces exactly to the Dudarev form.

Concretely, inserting the spherically averaged interaction into the Liechtenstein energy and subtracting the FLL double counting gives:

\[ E_U^{\rm Lich,spherical} - E_{\rm dc}^{\rm FLL} = \frac{U-J}{2}\sum_{I,\sigma}\mathrm{Tr}\left[\mathbf{n}^{I\sigma}(\mathbf{1}-\mathbf{n}^{I\sigma})\right] \equiv E_U^{\rm Dur}. \]

The two formulations therefore give identical results when: (a) the occupation matrix \(\mathbf{n}^{I\sigma}\) is diagonal (no off-diagonal orbital coherences), and (b) the Slater integrals satisfy the spherical approximation \(F^2/F^4 = 14/9\) (exact for a hydrogen-like atom, approximately satisfied for \(3d\) transition metals). They differ when orbital polarisation within the correlated shell is important — i.e. when different \(m\) orbitals have different occupations that couple through the full Coulomb tensor. This is the regime where the Liechtenstein formulation is necessary: rare-earth \(f\) systems, orbitally ordered manganites, and systems near the Mott transition where the orbital degree of freedom is active.

The Double-Counting Problem

The double-counting term \(E_{\rm dc}\) in equation \eqref{eq:DFTU-total} has no unique exact form because the DFT XC energy does not separate cleanly into contributions from the correlated subspace and the rest. Two schemes are in common use:

Fully localised limit (FLL), also called the atomic limit:

\begin{equation} E\_{\rm dc}^{\rm FLL} = \frac{U}{2}N(N-1) - \frac{J}{2}\left[\frac{N}{2}\!\left(\frac{N}{2}-1\right) + \frac{N}{2}\!\left(\frac{N}{2}-1\right)\right], \label{eq:FLL} \end{equation}

valid when the correlated orbital is nearly fully occupied or nearly empty (close to the atomic limit). This is appropriate for \(f\)-electron systems and late transition metal oxides.

Around mean-field (AMF):

\begin{equation} E\_{\rm dc}^{\rm AMF} = \frac{U}{2}\bar{n}(N-1) - \frac{J}{2}\sum\_\sigma\bar{n}^\sigma(N^\sigma - 1), \end{equation}

where \(\bar{n} = N/(2l+1)\) is the average occupation per orbital. This is appropriate when the occupations are close to the average (near the itinerant limit).

The FLL correction gives a larger energy shift for occupied orbitals and is the standard choice in most codes. However, the double-counting error is a fundamental limitation of DFT+U: since it cannot be derived from first principles, it introduces an uncontrolled approximation that depends on how much of \(U\) is already captured in the DFT functional. This is one motivation for methods like DFT+DMFT, which treat the correlated subspace exactly.

Determining \(U\)

The Hubbard \(U\) parameter is not a unique material property — it depends on the DFT functional, the choice of localised basis, and the degree of screening in the material. Three approaches are used in practice.

Empirical Fitting

The simplest approach is to choose \(U_{\rm eff}\) to reproduce an experimentally known quantity — typically the band gap, lattice constant, or magnetic moment. For NiO, \(U_{\rm eff} \approx 5\)–\(6\) eV reproduces the experimental gap of \(\sim 4\) eV; for FeO, \(U_{\rm eff} \approx 4\) eV. Empirical \(U\) values are system-specific and should not be transferred between materials without validation.

A compilation of commonly used values:

Material / Ion	Orbital	\(U_{\rm eff}\) (eV)	Fitted property
Fe\(^{2+/3+}\) in oxides	\(3d\)	\(3.5\)–\(5.0\)	Gap, moment
Co\(^{2+/3+}\)	\(3d\)	\(3.0\)–\(5.5\)	Gap
Ni\(^{2+}\)	\(3d\)	\(5.0\)–\(6.5\)	Gap
Mn\(^{2+/3+/4+}\)	\(3d\)	\(3.0\)–\(4.5\)	Gap, structure
Cu\(^{2+}\)	\(3d\)	\(6.0\)–\(9.0\)	Gap
Ce\(^{3+/4+}\)	\(4f\)	\(5.0\)–\(7.0\)	\(f\) occupancy
U\(^{4+/5+}\)	\(5f\)	\(2.0\)–\(4.5\)	Structure, gap

Linear Response (Cococcioni–de Gironcoli)

A first-principles approach computes \(U\) from the linear response of the occupation matrix to a localised perturbation potential \(\alpha^I\) applied to site \(I\) (Cococcioni and de Gironcoli, 2005):

\begin{equation} U^I = \left(\chi\_0^{-1} - \chi^{-1}\right)^{II}, \label{eq:LR-U} \end{equation}

where \(\chi_0^{II} = \partial N^I/\partial\alpha^I|_{\rm bare}\) is the bare (non-self-consistent) response and \(\chi^{II} = \partial N^I/\partial\alpha^I|_{\rm SCF}\) is the fully self-consistent response. The difference captures the screening of the bare Coulomb interaction by all other electrons. The parameter \(U\) computed this way is self-consistent with the chosen DFT functional and projector, requiring no experimental input.

The procedure is: (1) apply a small perturbation \(\pm\alpha\) to the Hubbard projector of site \(I\); (2) compute the SCF and non-SCF changes in \(N^I\); (3) extract \(U\) from equation \eqref{eq:LR-U}. This is available as a post-processing step in Quantum ESPRESSO (hp.x) and can be scripted in VASP.

Limitations: Linear response \(U\) values tend to be smaller than empirically fitted values because they include screening by itinerant valence electrons. The result also depends on the choice of projectors and may differ between PAW and LCAO implementations.

Constrained Random Phase Approximation (cRPA)

The most rigorous approach is the constrained RPA (Aryasetiawan et al., 2004), which computes the partially screened Coulomb interaction \(W(\omega)\) by excluding the screening from the correlated subspace itself (to avoid double counting):

\begin{equation} U(\omega) = \langle\chi\_m\chi\_{m'}|W\_r(\omega)|\chi\_m\chi\_{m'}\rangle, \end{equation}

where \(W_r\) is the RPA-screened interaction with the polarisation of the correlated \(d/f\) subspace removed. cRPA gives frequency-dependent \(U(\omega)\); the static limit \(U(0)\) is used in DFT+U, while the full frequency dependence feeds into DFT+DMFT. cRPA requires a GW-like calculation and is available in VASP (ALGO = CRPA) and the WIEN2k+DMFT interface.

Effect of \(U\) on the Electronic Structure

The potential correction \eqref{eq:Vpot} produces several characteristic changes to the electronic structure that can be verified against experiment:

Band gap opening. Occupied \(d/f\) states are shifted down by \(\sim U_{\rm eff}/2\) and empty states up by \(\sim U_{\rm eff}/2\), opening a gap of approximately \(U_{\rm eff}\) in the correlated subspace. For a Mott insulator like NiO, DFT gives a metallic state while DFT+U with \(U_{\rm eff} \approx 5\) eV correctly gives an insulating gap \(\sim 3\)–\(4\) eV.

Orbital polarisation. The integer-driving nature of DFT+U can lift orbital degeneracy via orbital ordering — a spontaneous symmetry breaking in which one particular \(d\) orbital (e.g. \(d_{z^2}\) vs. \(d_{x^2-y^2}\)) becomes preferentially occupied. This is relevant for manganites and other Jahn–Teller active systems.

Magnetic moments. By localising the \(d/f\) electrons, DFT+U increases the local spin moment towards the ionic limit. For Fe\(^{3+}\) in an oxide (half-filled \(3d\), \(S = 5/2\)), GGA gives moments of \(\sim 3.5,\mu_B\); DFT+U with appropriate \(U\) gives \(\sim 4.2,\mu_B\), closer to the ionic value of \(5.0,\mu_B\).

Exchange coupling \(J\). The exchange coupling constants extracted from DFT+U are generally more reliable than from GGA for Mott insulators, because the localised occupations improve the description of the superexchange mechanism. For half-metallic Heusler alloys, DFT+U may be needed when one or more magnetic sublattice contains strongly correlated \(d\) states.

Lattice parameters and forces. The occupation-dependent potential modifies the forces on ions, generally increasing bond lengths for correlated oxides by \(\sim 0.5\)–\(2%\) towards experiment.

Multiple Correlated Sites and Inter-Site \(U\)

When a material contains more than one type of correlated atom (e.g. both Fe and Co in a double perovskite, or both \(3d\) transition metal and \(4f\) rare earth), separate \(U\) values must be applied to each:

\begin{equation} E\_U^{\rm multi} = \sum\_{I \in \{ \rm correlated\}} \frac{U\_{\rm eff}^{(I)}}{2}\sum\_\sigma\mathrm{Tr}\!\left[\mathbf{n}^{I\sigma}(\mathbf{1} - \mathbf{n}^{I\sigma})\right], \label{eq:multi-U} \end{equation}

with each \(U_{\rm eff}^{(I)}\) chosen from linear-response, cRPA, or empirical fitting on the corresponding sublattice. In VASP this is done by providing per-element LDAUL (target \(\ell\)) and LDAUU/LDAUJ arrays; in Quantum ESPRESSO, by listing Hubbard_U(itype) per atomic species. A worked example is LaMnO\(_3\) with both Mn (\(3d\), \(U_{\rm eff} \approx 4\) eV) and La (\(4f\), \(U_{\rm eff} \approx 7\) eV) treated explicitly — necessary in the rare-earth manganites where the \(4f\) shell is partially filled and hybridises with the Mn states.

DFT+U+V: Inter-Site Hubbard Term

The Hubbard \(U\) corrects on-site Coulomb repulsion but ignores the residual inter-site electron–electron interaction \(V_{IJ}\) between neighbouring correlated atoms. This nearest-neighbour \(V\) is small (typically \(0.5\)–\(1.5\) eV) but qualitatively important for systems where covalent hybridisation competes with on-site localisation — e.g. transition-metal oxides with strong \(p\text{–}d\) covalency such as NiO, MnO, the cuprates, and topological materials at the boundary between band insulators and Mott insulators.

The DFT+U+V extension (Campo and Cococcioni, 2010) adds a Hubbard-like penalty for inter-site occupation matrices \(n_{mm’}^{IJ\sigma} = \sum_{\mathbf{k}\nu} f_{\mathbf{k}\nu\sigma}\langle\phi_{\mathbf{k}\nu\sigma}|\hat{P}_{m’}^J\rangle\langle\hat{P}_m^I|\phi_{\mathbf{k}\nu\sigma}\rangle\):

\begin{equation} E\_{U+V} = E\_U - \sum\_{\langle I,J\rangle, \sigma} \frac{V\_{IJ}}{2}\,\mathrm{Tr}\!\left[\mathbf{n}^{IJ\sigma}\mathbf{n}^{JI\sigma}\right], \label{eq:DFT-U-V} \end{equation}

where the sum runs over nearest-neighbour pairs \(\langle I, J\rangle\) of correlated sites. The inter-site \(V_{IJ}\) parameters are computed by the same linear-response procedure as \(U\) (Cococcioni–de Gironcoli 2005, extended), now perturbing site \(I\) and reading the response of site \(J\)’s occupation matrix.

When DFT+U+V matters. For ionic Mott insulators (NiO, MnO, FeO with strong O–\(d\) charge transfer), DFT+U+V improves the description of the charge-transfer gap — the energetic separation between O \(2p\) and \(d\) bands — without requiring an artificially large \(U\) on the metal site. For NiO, DFT+U gives the correct \(3d\)–\(3d\) Mott gap but underestimates the charge-transfer character; DFT+U+V with \(V_{Ni-O} \approx 1\) eV gives both correctly. DFT+U+V is available in Quantum ESPRESSO (Hubbard_V) and in development for VASP.

Worked Example: NiO from GGA to DFT+U

NiO is the canonical test case for DFT+U. Its experimental ground state is an antiferromagnetic (AFM-II ordering) insulator with a charge-transfer gap of \(\sim 4.0\) eV and a Ni magnetic moment of \(1.6\)–\(1.9,\mu_B\). The Ni\(^{2+}\) ion has the \(3d^8\) configuration with \(S = 1\), in an octahedral O environment that splits the \(d\) shell into \(t_{2g}\) (fully occupied, 6 electrons) and \(e_g\) (half occupied, 2 electrons spin-polarised).

Standard GGA (PBE). A standard PBE calculation of antiferromagnetic NiO predicts:

A metallic ground state (or, depending on cell setup, a tiny gap of \(0.2\)–\(0.4\) eV).
A Ni moment of \(\sim 1.3,\mu_B\), much smaller than experiment.
A density of states (DOS) at the Fermi level dominated by Ni \(e_g\) states hybridised with O \(2p\), with no separation between the upper and lower Hubbard subbands.

Both errors trace back to the same cause: PBE overdelocalises the Ni \(3d\) electrons because the SIE artificially lowers their on-site energy.

DFT+U with \(U_{\rm eff} = 5.0\) eV. Applying the Dudarev correction to the Ni \(3d\) orbitals opens a gap of \(\sim 3.0\)–\(3.5\) eV and increases the Ni moment to \(\sim 1.7,\mu_B\) — in good agreement with experiment. Schematically, the projected DOS reorganises as follows:

  Energy ↑
  ────────────────────────────────────────
         │     │       ┃     │  ←─ Ni e_g↑ (empty)    } Upper Hubbard
   +2 eV │  Ni │       ┃     │                      } band (Mott gap)
         │  t  │   ━━━━╋━━━━━│  ←─ E_F (in the gap)
   −2 eV │  2g │       ┃ Ni  │  ←─ Ni e_g↓ (filled)   } Lower Hubbard
         │  ↑↓ │       ┃ e_g │                      } band
         │ ──  │       ┃ ↓   │  ←─ O 2p              } Anion p-band
   −5 eV │  O  │ O 2p  ┃     │     (charge-transfer)
         │  2p │       ┃     │
  ────────────────────────────────────────
         GGA (metal)   DFT+U (insulator)

The Ni \(e_g\) majority-spin states (filled in the AFM ground state) shift downward by \(\sim U_{\rm eff}/2\); the empty \(e_g\) minority-spin states shift up by the same amount, opening the Mott gap. The O \(2p\) states are only weakly affected because they are not in the Hubbard manifold — but the relative position of O \(2p\) and Ni \(3d\) determines whether NiO is a Mott or a charge-transfer insulator. The Zaanen–Sawatzky–Allen classification places NiO at the crossover, which is why both \(U\) (for the Ni–Ni gap) and \(V_{\rm Ni-O}\) (for the Ni–O charge transfer) are needed for full quantitative agreement.

\(U\) dependence. Varying \(U_{\rm eff}\) from 3 to 8 eV produces:

\(U_{\rm eff}\) (eV)	Gap (eV)	Moment (\(\mu_B\))	Character
0 (PBE)	0.0	1.3	Metal
3.0	1.8	1.55	Insulator, gap too small
5.0	3.0	1.70	Best match to experiment
6.5	3.7	1.80	Gap slightly too large
8.0	4.5	1.90	Over-correction

The result is sensitive to \(U\), confirming that a quantitative DFT+U calculation requires either careful empirical fitting or a first-principles \(U\) from linear response / cRPA. The linear-response \(U\) for NiO from Cococcioni–de Gironcoli is \(\sim 4.6\) eV, in good agreement with the empirical fit.

Limitations and the Path to DFT+DMFT

DFT+U is a mean-field treatment of the on-site Hubbard \(U\). It corrects three errors of semi-local DFT — the SIE on localised orbitals, the missing derivative discontinuity, and the absence of orbital polarisation — but at the level of a static, Hartree-like decoupling of the Hubbard term. It cannot describe phenomena that depend on the dynamical nature of correlations:

Quasiparticle satellites and Hubbard bands. The exact spectral function of a Mott insulator has two broad Hubbard bands at energies \(\pm U/2\) plus, in the metallic phase near the Mott transition, a narrow quasiparticle peak at \(\epsilon_F\). DFT+U produces the two Hubbard bands but cannot generate the quasiparticle peak — it has no mechanism for fractional spectral weight redistribution. Photoemission spectra of vanadates (V\(_2\)O\(_3\), V\(_2\)O\(_5\)) and the iron pnictides cannot be reproduced.
Mass renormalisation. In a correlated metal near the Mott transition, the effective mass is enhanced by \(m^*/m = 1/Z\) with \(Z\) the quasiparticle weight. DFT+U gives \(Z = 1\) by construction; this is qualitatively wrong for heavy-fermion compounds and bad metals.
Finite-temperature physics. Mott transitions, paramagnetic insulators, and Kondo screening are intrinsically finite-\(T\) effects involving entropy. DFT+U is a \(T = 0\) method.
Dynamical screening. The frequency dependence of \(U(\omega)\) — captured by cRPA but collapsed to its static value in DFT+U — is important for plasmon satellites and high-energy photoemission features.

The next level of sophistication is DFT+DMFT (dynamical mean-field theory; Kotliar et al., 2006), which treats the local correlated subspace exactly by solving an effective Anderson impurity problem self-consistently with the DFT band structure. The Hubbard \(U\) becomes a matrix in occupation space rather than a single number, and the resulting self-energy \(\Sigma_{mm’}(\omega)\) is fully frequency-dependent. DFT+DMFT captures all four phenomena listed above and is the de-facto standard for compounds like V\(_2\)O\(_3\), SrVO\(_3\), heavy fermions (CeRu\(_2\)Si\(_2\), URu\(_2\)Si\(_2\)), and iron pnictides. The computational cost is substantial — an impurity solver (continuous-time QMC, NRG, exact diagonalisation) is called many times per DFT iteration — and the implementations are still less mature than DFT+U (TRIQS+DFTtools, EDMFT+QSGW, AMULET, DCore are research-grade codes).

Method	Treatment of \(U\)	Self-energy	Captures Hubbard bands	Captures QP peak	Cost
GGA	Implicit, semi-local	Static, local	No	No	\(\mathcal{O}(N^3)\)
Hybrid (HSE06)	Partial via exact exchange	Static, non-local	Partially	No	\(\mathcal{O}(N^3\)–\(N^4)\)
DFT+U	Static, on-site	Static, orbital-dependent	Yes	No	\(\mathcal{O}(N^3)\)
DFT+DMFT	Dynamical, on-site	\(\Sigma(\omega)\), local	Yes	Yes	\(\mathcal{O}(N^3)\times N_{\rm imp}\)
DFT+GW	Non-local, dynamical	\(\Sigma(\mathbf{r},\mathbf{r}’,\omega)\)	Yes (qualitatively)	Yes (qualitatively)	\(\mathcal{O}(N^4)\)

When to escalate from DFT+U to DMFT. Use DFT+U when (a) the system is a clear Mott or charge-transfer insulator at \(T = 0\), (b) you need ground-state properties (energies, forces, moments) rather than spectra, and (c) the static-\(U\) approximation is justified. Move to DFT+DMFT when (a) you need photoemission or optical spectra including satellites, (b) the system is near a Mott transition, (c) it is a heavy fermion or bad metal, or (d) finite-\(T\) effects are essential.

Practical Recipe and Pitfalls

A practical DFT+U workflow:

Identify the correlated subspace. For \(3d\) transition metals, the \(3d\) shell; for \(4f\)/\(5f\), the \(f\) shell. Set LDAUL (VASP) or Hubbard_U per species (QE) accordingly.
Choose \(U_{\rm eff}\). For exploratory work, use literature values from the table above. For publication-quality results, compute \(U\) via linear response (Cococcioni–de Gironcoli) or cRPA — and report the value alongside results.
Initial occupation matrix matters. The DFT+U energy landscape has multiple local minima corresponding to different occupation patterns (orbital orderings). Default initial occupations may converge to a metastable state. Use LDAUTYPE = 2 with OCCEXT (VASP) or manually set starting_ns in QE to bias toward the expected physical ground state.
Check the converged occupations. Print the occupation matrix at convergence; physical solutions have integer eigenvalues (\(\sim 0\) or \(\sim 1\)). Eigenvalues near \(0.5\) signal either a metastable solution or genuinely fractional occupation (mixed-valence systems).
Energy comparison caveat. DFT+U total energies cannot be compared directly to standard DFT energies — they reference different functionals. Always compare DFT+U energies of different structures or configurations using the same \(U\) value and projector.
Forces and relaxation. DFT+U forces include the derivative of the Hubbard term and are well-defined, but bond lengths typically increase by \(0.5\)–\(2%\). Re-relax structures with DFT+U if the geometry depends on the correlated-electron localisation.

Common pitfalls:

Using the same \(U\) for the same element in different oxidation states or coordination environments: \(U_{\rm eff}(\rm Fe^{2+})\) and \(U_{\rm eff}(\rm Fe^{3+})\) differ by \(\sim 1\) eV in linear-response calculations.
Forgetting that \(U\) values reported in the literature are projector-dependent: a VASP PAW \(U\) is not directly transferable to an LCAO code without retabulation.
Applying DFT+U to weakly correlated states (e.g. \(d^0\) configurations like Ti\(^{4+}\) in TiO\(_2\)): the correction is unnecessary and can degrade results for empty \(d\) shells with no Hubbard physics.

Outlook

DFT+U is the workhorse correction for strongly correlated electrons in plane-wave DFT, offering qualitative-to-quantitative improvement at marginal extra cost over standard GGA. It is, however, a static and atomic-orbital-specific correction; the systematic path to a complete description of correlation runs through DFT+DMFT and ultimately many-body Green’s function methods (GW, GW+DMFT, vertex corrections).

The chapters that follow turn from the physics of the XC functional to the numerical methods needed to solve the Kohn–Sham equations in practice: self-consistent field iteration and density mixing (Chapter 8), Brillouin zone integration and smearing (Chapter 9), and iterative diagonalisation of the KS Hamiltonian (Chapter 10).

SCF Convergence and Density Mixing

The Part II overview described the SCF cycle as a sequence of six steps and catalogued the common failure modes. This chapter addresses the most fundamental of those steps: the density update — how the output density from one KS cycle is combined with the input density to produce the next iterate, and under what conditions this process converges.

Consider a concrete example. A spin-polarised calculation on a \(4 \times 4 \times 4\) supercell of bcc Fe (\(128\) atoms, \(\sim 50,000\) plane waves per \(\mathbf{k}\)-point) will not converge with naive density iteration at any mixing parameter. The total energy oscillates indefinitely, swinging by several eV per iteration, because long-wavelength charge fluctuations slosh back and forth across the metallic Fermi surface with no restoring force. The same system converges smoothly in \(\sim 25\) iterations once Kerker preconditioning and Pulay mixing are applied. The difference is not a matter of patience — it is a qualitative change from divergence to convergence, rooted in the dielectric response of the electron gas.

We begin by recasting the SCF cycle as a fixed-point problem and deriving the convergence condition from the Jacobian of the KS map. The charge-sloshing instability in metals emerges naturally from the Lindhard dielectric function. With the problem precisely formulated, we derive the main mixing schemes — linear, Kerker-preconditioned, Pulay/DIIS, and Broyden — as progressively more sophisticated strategies for reaching the fixed point.

The Self-Consistency Problem Revisited

The Fixed-Point Formulation

The SCF cycle of Chapter 3 (Section 3.3) consists of three computational steps: constructing \(V_{\rm eff}[\rho]\) from the Kohn–Sham equation \eqref{eq:KS-eqn}, diagonalising \(\hat{h}_{\rm KS}\), and assembling \(\rho_{\rm out} = \sum_i f_i |\phi_i|^2\). We now recast this cycle as a fixed-point problem whose convergence properties can be analysed rigorously. Define the map \(F\) that takes an input density \(\rho_{\rm in}\) and returns the output density obtained by one full KS cycle:

\begin{equation} \rho_{\rm in} \;\xrightarrow{V_{\rm eff}[\rho_{\rm in}]}\; \hat{h}_{\rm KS} \;\xrightarrow{\text{diagonalise}}\; \{\phi_i, \epsilon_i\} \;\xrightarrow{\sum_i f_i|\phi_i|^2}\; \rho_{\rm out} = F[\rho_{\rm in}]. \label{eq:F-map} \end{equation}

Self-consistency means finding the density \(\rho^*\) that is a fixed point of \(F\):

\begin{equation} \boxed{F[\rho^*] = \rho^*.} \label{eq:fixed-point} \end{equation}

The residual at iteration \(n\) measures the distance from self-consistency:

\begin{equation} R^{(n)} = F[\rho_{\rm in}^{(n)}] - \rho_{\rm in}^{(n)} = \rho_{\rm out}^{(n)} - \rho_{\rm in}^{(n)}. \label{eq:residual} \end{equation}

All mixing schemes can be understood as strategies to drive \(|R^{(n)}| \to 0\) as efficiently as possible. The simplest strategy — feeding the output density directly back as the next input, \(\rho_{\rm in}^{(n+1)} = \rho_{\rm out}^{(n)}\) — corresponds to a naive fixed-point iteration. Whether this converges depends on the properties of \(F\).

The Jacobian and Convergence Conditions

Linearising \(F\) around the fixed point \(\rho^*\), the residual evolves as:

\begin{equation} R^{(n+1)} \approx J \cdot R^{(n)}, \qquad J_{ij} = \left.\frac{\delta F[\rho]_i}{\delta \rho_j}\right|_{\rho = \rho^*}, \label{eq:jacobian} \end{equation}

where \(J\) is the Jacobian of the map \(F\), and the indices \(i, j\) run over the discretisation of the density (e.g., plane-wave coefficients \(\rho(\mathbf{G})\)). The iteration converges if and only if the spectral radius of \(J\) is less than unity:

\begin{equation} \boxed{|\lambda_{\rm max}(J)| \lt 1,} \label{eq:convergence-condition} \end{equation}

where \(\lambda_{\rm max}\) is the eigenvalue of \(J\) with the largest absolute value. When this condition is violated, the iteration diverges: small perturbations in the density are amplified rather than damped by successive KS cycles.

Charge Sloshing: Why Metals Are Difficult

The physical content of the Jacobian becomes transparent in reciprocal space. For a homogeneous electron gas, the density response to a perturbation of wavevector \(\mathbf{G}\) is governed by the Lindhard dielectric function:

\begin{equation} \varepsilon(\mathbf{G}) = 1 - \frac{4\pi e^2}{G^2}\chi_0(\mathbf{G}), \label{eq:lindhard} \end{equation}

where \(\chi_0(\mathbf{G})\) is the non-interacting density–density response function of the KS system. The KS map \(F\) relates the output density perturbation to the input perturbation through the inverse dielectric function: \(\delta\rho_{\rm out}(\mathbf{G}) = \varepsilon^{-1}(\mathbf{G}),\delta\rho_{\rm in}(\mathbf{G})\). For the Jacobian, this means:

\begin{equation} J(\mathbf{G}) \approx 1 - \varepsilon^{-1}(\mathbf{G}). \label{eq:J-dielectric} \end{equation}

For an insulator, there is a finite gap \(\Delta\) in the spectrum. The response function \(\chi_0\) remains bounded as \(G \to 0\), the dielectric constant \(\varepsilon(0)\) is finite (typically \(5\)–\(20\)), and all eigenvalues of \(J\) satisfy \(|\lambda| \lt 1\). The naive iteration converges, often rapidly.

For a metal, the situation is qualitatively different. The density of states at the Fermi level is finite, so \(\chi_0(\mathbf{G}) \to -N(\epsilon_F)\) as \(G \to 0\) (the static Lindhard limit). The dielectric function diverges as:

\begin{equation} \varepsilon(\mathbf{G}) \approx 1 + \frac{k_{\rm TF}^2}{G^2}, \qquad G \to 0, \label{eq:eps-metal} \end{equation}

where \(k_{\rm TF} = \sqrt{4\pi e^2 N(\epsilon_F)}\) is the Thomas–Fermi screening wavevector. This means \(\varepsilon^{-1}(\mathbf{G}) \to 0\) for small \(G\), and from equation \eqref{eq:J-dielectric}:

\begin{equation} J(\mathbf{G}) \to 1 \quad \text{as} \quad G \to 0 \quad \text{(metals)}. \label{eq:charge-sloshing} \end{equation}

The Jacobian has eigenvalues arbitrarily close to \(1\) for long-wavelength density fluctuations. These modes are neither damped nor amplified — they slosh back and forth between iterations without converging. This is the charge-sloshing instability, the dominant convergence problem in metallic DFT calculations. Physically, long-wavelength charge fluctuations in a metal cost very little energy (they are screened only weakly at long range), so the SCF cycle has no restoring force to drive them toward the self-consistent solution.

The key insight is that the convergence difficulty is wavevector-dependent: small-\(G\) components of the density converge slowly or not at all, while large-\(G\) components (short wavelength, within the screening cloud) converge quickly. An effective mixing scheme must therefore treat different Fourier components differently.

To make this concrete: for fcc Al (a simple metal with \(k_{\rm TF} \approx 1.6,\)Å\(^{-1}\)), the Jacobian eigenvalue for the smallest accessible \(\mathbf{G}\) in a conventional cell is \(\lambda \approx 0.97\). With simple mixing at \(\alpha = 0.3\), the convergence rate for this mode is \(|1 - 0.3(1 - 0.97)| = 0.991\) — meaning each iteration reduces the error by less than \(1%\). Reaching \(10^{-6}\) convergence in this mode alone would require \(\sim 1400\) iterations. For bcc Fe with a denser Fermi surface, the situation is worse. By contrast, an insulator like MgO has \(\varepsilon(0) \approx 10\), giving \(\lambda \approx 0.9\) for the slowest mode and convergence in \(\sim 50\) iterations even with simple mixing — and much faster with Pulay extrapolation.

Density Mixing Schemes

With the fixed-point problem and its convergence difficulties established, we now derive the main mixing strategies used in practice. Each can be understood as a progressively more sophisticated approach to the same problem: find the fixed point \(\rho^* = F[\rho^*]\) given a sequence of input–output pairs \((\rho_{\rm in}^{(n)}, \rho_{\rm out}^{(n)})\).

Simple (Linear) Mixing

The simplest improvement over naive iteration is to mix the input and output densities:

\begin{equation} \boxed{\rho_{\rm in}^{(n+1)} = \rho_{\rm in}^{(n)} + \alpha\, R^{(n)},} \label{eq:linear-mixing} \end{equation}

where \(\alpha \in (0, 1]\) is the mixing parameter and \(R^{(n)} = \rho_{\rm out}^{(n)} - \rho_{\rm in}^{(n)}\) is the residual. Setting \(\alpha = 1\) recovers naive iteration; smaller \(\alpha\) damps the update.

Linearising around the fixed point, the residual evolves as:

\begin{equation} R^{(n+1)} = [1 - \alpha(1 - J)]\,R^{(n)}. \label{eq:linear-residual} \end{equation}

The effective iteration matrix is \(M = 1 - \alpha(1 - J)\), with eigenvalues \(\mu = 1 - \alpha(1 - \lambda)\) where \(\lambda\) is an eigenvalue of \(J\). Convergence requires \(|\mu| \lt 1\) for all eigenvalues, which gives the condition:

\begin{equation} \alpha \lt \frac{2}{1 + \lambda_{\rm max}}. \label{eq:alpha-bound} \end{equation}

For a metal with \(\lambda_{\rm max} \to 1\) (the charge-sloshing modes), this bound gives \(\alpha \lt 1\) — but the convergence rate for these modes is \(|1 - \alpha(1 - \lambda)| \approx 1 - \alpha(1 - \lambda_{\rm max})\), which is very close to \(1\) for small \(\alpha\). Convergence is guaranteed but extremely slow: the long-wavelength modes creep toward self-consistency over hundreds of iterations.

Typical values: \(\alpha \sim 0.01\)–\(0.1\) for metals and magnetic systems; \(\alpha \sim 0.3\)–\(0.7\) for insulators. The smaller values needed for metals reflect the charge-sloshing problem directly.

Kerker Preconditioning

The analysis of charge sloshing reveals that the convergence problem is concentrated at small \(\mathbf{G}\): long-wavelength components of the residual are barely damped, while short-wavelength components converge quickly. The Kerker preconditioner (Kerker, 1981) addresses this by applying a \(\mathbf{G}\)-dependent damping to the residual before mixing.

The idea is to approximate the inverse dielectric function and use it to precondition the residual. From equation \eqref{eq:eps-metal}, the metallic dielectric function at small \(G\) is dominated by the Thomas–Fermi screening term \(k_{\rm TF}^2/G^2\). The Kerker preconditioner multiplies the residual in reciprocal space by a filter that suppresses the problematic small-\(G\) components:

\begin{equation} \boxed{\tilde{R}^{(n)}(\mathbf{G}) = \frac{G^2}{G^2 + k_0^2}\,R^{(n)}(\mathbf{G}),} \label{eq:kerker} \end{equation}

where \(k_0\) is a parameter with dimensions of inverse length, typically chosen close to \(k_{\rm TF}\). The preconditioned residual \(\tilde{R}\) is then used in place of \(R\) in the mixing formula:

\begin{equation} \rho_{\rm in}^{(n+1)} = \rho_{\rm in}^{(n)} + \alpha\,\tilde{R}^{(n)}. \label{eq:kerker-mixing} \end{equation}

The physical interpretation is immediate: the factor \(G^2/(G^2 + k_0^2)\) vanishes as \(G \to 0\) and approaches unity for \(G \gg k_0\). It therefore suppresses the long-wavelength charge sloshing modes while leaving the well-behaved short-wavelength modes unchanged. At intermediate wavelengths (\(G \sim k_0\)), the suppression is partial — the preconditioner interpolates smoothly between complete damping and no damping.

This is equivalent to multiplying the residual by an approximate inverse dielectric function. If \(k_0 = k_{\rm TF}\), the preconditioner matches the Thomas–Fermi screening exactly in the long-wavelength limit, making the preconditioned iteration matrix nearly independent of \(G\) and restoring uniform convergence across all wavevectors.

Computationally, the Kerker preconditioner is very cheap: one forward FFT to go to reciprocal space, a pointwise multiplication by \(G^2/(G^2 + k_0^2)\), and one inverse FFT to return to real space. No additional KS cycles or matrix operations are required.

Limitation: Kerker preconditioning is designed for metallic screening and can slow down convergence in insulators, where the long-wavelength response is already well-behaved. For insulators, the factor \(G^2/(G^2 + k_0^2)\) unnecessarily suppresses the update at small \(G\), requiring more iterations to reach self-consistency. Most codes therefore apply Kerker preconditioning only when the system is metallic or when charge sloshing is detected.

Pulay/DIIS Mixing

Linear mixing, even with Kerker preconditioning, uses only information from the current iteration to construct the update. This is wasteful: by iteration \(n\), the SCF cycle has accumulated \(n\) input–output pairs, each containing information about the map \(F\) near the fixed point. Pulay mixing (Pulay, 1980), also known as Direct Inversion in the Iterative Subspace (DIIS), exploits this history to construct a much better estimate of the self-consistent density.

The idea is to form a linear combination of the \(m\) most recent input densities and find the coefficients that minimise the norm of the corresponding residual. Suppose we have stored the last \(m\) residuals \({R^{(n)}, R^{(n-1)}, \ldots, R^{(n-m+1)}}\) and the corresponding input densities \({\rho_{\rm in}^{(n)}, \ldots, \rho_{\rm in}^{(n-m+1)}}\). Construct a trial input density as:

\begin{equation} \bar{\rho}_{\rm in} = \sum_{i=1}^{m} c_i\, \rho_{\rm in}^{(n-m+i)}, \label{eq:pulay-trial} \end{equation}

with the constraint \(\sum_i c_i = 1\) (so that \(\bar{\rho}\) integrates to \(N\) electrons). If the map \(F\) is approximately linear near the fixed point, the residual of this trial density is approximately:

\begin{equation} \bar{R} \approx \sum_{i=1}^{m} c_i\, R^{(n-m+i)}. \label{eq:pulay-residual} \end{equation}

DIIS minimises \(|\bar{R}|^2 = \sum_{ij} c_i c_j \langle R^{(i)} | R^{(j)} \rangle\) subject to the constraint \(\sum_i c_i = 1\). Introducing a Lagrange multiplier \(\mu\), the stationarity conditions give the \((m+1) \times (m+1)\) linear system:

\begin{equation} \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1m} & 1 \\ B_{21} & B_{22} & \cdots & B_{2m} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ B_{m1} & B_{m2} & \cdots & B_{mm} & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \\ -\mu \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}, \label{eq:DIIS-system} \end{equation}

where the DIIS matrix is:

\begin{equation} \boxed{B_{ij} = \langle R^{(i)} | R^{(j)} \rangle = \int R^{(i)}(\mathbf{r})\, R^{(j)}(\mathbf{r})\, d\mathbf{r}.} \label{eq:DIIS-matrix} \end{equation}

This is a small \(m \times m\) system (typically \(m = 4\)–\(8\)) and is trivially cheap to solve. The choice of subspace size \(m\) involves a trade-off: too few stored residuals (\(m \lt 4\)) and the extrapolation lacks sufficient information to capture the curvature of the SCF map; too many (\(m \gt 10\)–\(12\)) and the stored residuals become nearly linearly dependent, causing the DIIS matrix \(B_{ij}\) to become ill-conditioned and the coefficients \(c_i\) to develop large, oscillatory values. In practice, \(m = 6\)–\(8\) is a robust default for most systems. The optimal coefficients \({c_i}\) are then used to form the next input density:

\begin{equation} \rho_{\rm in}^{(n+1)} = \sum_{i=1}^{m} c_i \left[\rho_{\rm in}^{(n-m+i)} + \alpha\, R^{(n-m+i)}\right], \label{eq:pulay-update} \end{equation}

where the mixing parameter \(\alpha\) is still present but its precise value is much less critical than in simple mixing — the DIIS extrapolation handles the difficult modes.

Physical interpretation: DIIS effectively extrapolates from the sequence of residuals to estimate where the residual would vanish. If the residuals were evolving linearly, the DIIS solution would give the exact fixed point in \(m\) steps. In practice the map \(F\) is nonlinear, so the extrapolation is approximate, but it converges superlinearly near the solution — much faster than the geometric convergence of linear mixing.

Practical note: Pulay mixing combined with Kerker preconditioning is the default in most modern plane-wave codes. The Kerker filter handles the wavevector-dependent convergence rates, while DIIS handles the multi-step extrapolation. Together, they typically converge metallic systems in \(15\)–\(30\) SCF iterations.

Broyden Mixing

An alternative to DIIS is to build an approximate inverse Jacobian from the accumulated input–output history, then use it to compute a quasi-Newton update. This is Broyden mixing (Broyden, 1965; adapted for DFT by Vanderbilt and Louie, 1984, and Johnson, 1988).

The Newton step for the fixed-point equation \(\rho - F[\rho] = 0\) would be:

\begin{equation} \rho^{(n+1)} = \rho^{(n)} - (1 - J)^{-1} R^{(n)} = \rho^{(n)} + G\, R^{(n)}, \label{eq:newton-step} \end{equation}

where \(G = -(1 - J)^{-1}\) is the inverse of the “SCF Jacobian” \((1 - J)\). Computing \(G\) exactly would require the full Jacobian \(J\), which is an \(N_{\rm PW} \times N_{\rm PW}\) matrix — completely intractable. The Broyden method builds an approximation \(G_n\) to \(G\) by requiring it to satisfy the secant condition: the approximate Jacobian must reproduce the last observed input–residual relationship exactly.

Define the changes between successive iterations:

\begin{equation} \Delta\rho^{(n)} = \rho_{\rm in}^{(n)} - \rho_{\rm in}^{(n-1)}, \qquad \Delta R^{(n)} = R^{(n)} - R^{(n-1)}. \label{eq:broyden-deltas} \end{equation}

The secant condition requires \(G_n \Delta R^{(n)} = \Delta\rho^{(n)}\). Among all matrices satisfying this condition, the Broyden rank-1 update chooses the one closest to \(G_{n-1}\) in the Frobenius norm:

\begin{equation} \boxed{G_n = G_{n-1} + \frac{(\Delta\rho^{(n)} - G_{n-1}\,\Delta R^{(n)})\,(\Delta R^{(n)})^T}{(\Delta R^{(n)})^T \Delta R^{(n)}}.} \label{eq:broyden-update} \end{equation}

This is a rank-1 correction to the previous approximate Jacobian inverse. Starting from an initial guess \(G_0 = -\alpha, \mathbf{1}\) (equivalent to simple mixing), each iteration adds one rank-1 correction. After \(n\) iterations, \(G_n\) has rank at most \(n\) plus the rank of \(G_0\), and the update step is:

\begin{equation} \rho_{\rm in}^{(n+1)} = \rho_{\rm in}^{(n)} + G_n\, R^{(n)}. \label{eq:broyden-step} \end{equation}

Storage: The rank-1 structure means that \(G_n\) never needs to be stored as a full matrix. Instead, one stores the \(n\) vectors \({\Delta\rho^{(k)}, \Delta R^{(k)}}\) and applies \(G_n\) via a sequence of inner products and vector additions. For \(m\) stored iterations, this requires \(\mathcal{O}(m \cdot N_{\rm PW})\) storage and \(\mathcal{O}(m \cdot N_{\rm PW})\) operations per update — negligible compared to the KS diagonalisation.

Broyden’s second method (the “good Broyden” update) modifies the update formula to ensure that \(G_n\) remains symmetric. The physical motivation is that the true inverse Jacobian \(G\) of the SCF map inherits the symmetry of the density–density response function \(\chi\), which satisfies \(\chi(\mathbf{r}, \mathbf{r}‘) = \chi(\mathbf{r}’, \mathbf{r})\) by time-reversal symmetry. A non-symmetric \(G_n\) can develop spurious antisymmetric components that couple charge and magnetisation modes or mix unrelated \(\mathbf{G}\)-vectors, leading to erratic convergence. The symmetric Broyden-2 update avoids this:

\begin{equation} G_n = G_{n-1} + \frac{(\Delta\rho^{(n)} - G_{n-1}\,\Delta R^{(n)})\,(\Delta\rho^{(n)})^T G_{n-1}}{(\Delta\rho^{(n)})^T G_{n-1}\,\Delta R^{(n)}}. \label{eq:broyden2} \end{equation}

In practice, Broyden-2 produces smoother convergence trajectories than Broyden-1, particularly for spin-polarised systems where the charge and magnetisation densities are mixed simultaneously.

Modified Broyden with metric: In practice, the inner products in the Broyden update can be weighted by a metric \(w(\mathbf{G})\) in reciprocal space, chosen to downweight high-\(G\) components (which are noisy and less important for convergence). This modified Broyden method with Kerker-like weighting is used as the default mixing scheme in several major plane-wave codes.

Convergence: Broyden mixing converges superlinearly, similar to DIIS. The two methods are closely related mathematically: both use a subspace of previous iterations to improve the update, and in certain limits they produce identical iterates. In practice, Pulay/DIIS is somewhat more robust for difficult cases (magnetic metals, near-degenerate ground states), while Broyden can be slightly faster when convergence is smooth.

Practical Guidance

The choice of mixing scheme depends on the system type. The following table summarises the recommended settings:

System type	Recommended scheme	Typical parameters	Typical SCF iterations
Simple insulator (Si, NaCl)	Linear or Pulay	\(\alpha = 0.4\)–\(0.7\); no Kerker needed	\(10\)–\(15\)
Semiconductor with narrow gap	Pulay + mild Kerker	\(\alpha = 0.2\)–\(0.4\); \(k_0 \sim 1.0\) Å\(^{-1}\)	\(15\)–\(25\)
Simple metal (Al, Cu)	Pulay + Kerker	\(\alpha = 0.05\)–\(0.2\); \(k_0 \sim 1.5\) Å\(^{-1}\)	\(15\)–\(30\)
Magnetic metal (Fe, Ni)	Pulay + Kerker or Broyden	\(\alpha = 0.01\)–\(0.1\); \(k_0 \sim 1.5\) Å\(^{-1}\)	\(25\)–\(50\)
Antiferromagnetic insulator (NiO)	Pulay + Kerker	\(\alpha = 0.01\)–\(0.05\); tighten \(k_0\)	\(30\)–\(60\)
Slab / surface (metal)	Pulay + Kerker	\(\alpha = 0.02\)–\(0.1\); increase subspace	\(30\)–\(60\)
Charged defect in large cell	Pulay + Kerker	\(\alpha = 0.02\)–\(0.1\); non-DIIS initial steps	\(40\)–\(80\)

Signs of convergence problems in the SCF output:

Oscillating total energy. The energy alternates between high and low values without trending downward. This is the hallmark of charge sloshing — reduce the mixing parameter \(\alpha\) and ensure Kerker preconditioning is active.

Residual decreasing then stalling. The SCF converges rapidly for the first \(5\)–\(10\) iterations, then the residual plateaus. This often indicates that the DIIS subspace is contaminated by early, poor-quality residuals. Allow a few initial iterations with simple mixing to bring the density into the basin of convergence before DIIS takes over.

Spin channels oscillating out of phase. In spin-polarised calculations, the up and down spin densities may trade magnitude between iterations. This signals an antiferromagnetic instability or an incorrect initial magnetic configuration. Fix the initial moments and consider using a more robust diagonalisation algorithm (Section 8.4) for the first few iterations to stabilise the spin order.

Code Notes: VASP and Quantum ESPRESSO Parameters

VASP:

Parameter	Controls	Recommended range
`AMIX`	Linear mixing parameter \(\alpha\) for charge density	\(0.02\)–\(0.8\) (default \(0.4\))
`BMIX`	Kerker wavevector \(k_0\) (in units of \(2\pi/a\))	\(0.1\)–\(3.0\) (default \(1.0\))
`AMIX_MAG`	Mixing parameter for magnetisation density	Often \(2\)–\(4 \times\) `AMIX`
`BMIX_MAG`	Kerker parameter for magnetisation	Often \(\sim\) `BMIX`
`MAXMIX`	Number of previous steps stored for Pulay/Broyden	\(20\)–\(60\) (default \(-45\))
`IMIX`	Mixing scheme: \(0\) = no mixing, \(1\) = simple, \(4\) = Broyden	Default \(4\)
`NELMDL`	Number of initial non-DIIS steps (simple mixing)	\(-5\) to \(-12\) (negative = auto)

Negative MAXMIX indicates Broyden mixing with \(|\)MAXMIX\(|\) stored steps. Positive values select Pulay/DIIS. The sign convention is code-specific; consult the VASP manual for the version in use.

Note on ALGO: The ALGO tag primarily controls the diagonalisation method (Section 8.4), but it also affects mixing behaviour indirectly. ALGO = All forces a full Davidson diagonalisation at every SCF step, which is more robust for difficult cases — particularly spin-polarised metals where the RMM-DIIS default (ALGO = Fast) can lose track of the correct eigenstate ordering, causing spin channels to oscillate. For systems that fail to converge with default settings, switching to ALGO = All for the first \(10\)–\(20\) SCF steps (via NELMDL) before reverting to ALGO = Fast is a common and effective strategy.

Quantum ESPRESSO:

Parameter	Controls	Recommended range
`mixing_beta`	Mixing parameter \(\alpha\)	\(0.1\)–\(0.7\) (default \(0.7\))
`mixing_mode`	`'plain'`, `'TF'` (Thomas–Fermi/Kerker), `'local-TF'`	`'local-TF'` for metals
`mixing_ndim`	Number of previous iterations for Broyden/DIIS	\(4\)–\(12\) (default \(8\))
`electron_maxstep`	Maximum number of SCF iterations	\(100\)–\(200\)

In QE, mixing_mode = 'local-TF' applies a local Thomas–Fermi preconditioning that is analogous to the Kerker preconditioner in reciprocal space. It is the recommended choice for metallic or small-gap systems.

Further Reading

Pulay mixing (DIIS): P. Pulay, Convergence acceleration of iterative sequences. The case of SCF iteration, Chem. Phys. Lett. 73, 393 (1980). The paper that introduced the DIIS method in the context of Hartree–Fock calculations; the adaptation to DFT density mixing is straightforward.
Kerker preconditioning: G. P. Kerker, Efficient iteration scheme for self-consistent pseudopotential calculations, Phys. Rev. B 23, 3082 (1981). Derives the \(G^2/(G^2+k_0^2)\) preconditioner from the Thomas–Fermi dielectric function.
Broyden mixing: C. G. Broyden, A class of methods for solving nonlinear simultaneous equations, Math. Comp. 19, 577 (1965). The general quasi-Newton framework. The adaptation to DFT is described in: D. D. Johnson, Modified Broyden’s method for accelerating convergence in self-consistent calculations, Phys. Rev. B 38, 12807 (1988).
Vanderbilt–Louie scheme: D. Vanderbilt and S. G. Louie, Total energies of diamond (111) surface reconstructions by a linear combination of bulk bands, Phys. Rev. B 30, 6118 (1984). Early application of Broyden-type mixing to plane-wave DFT.
VASP implementation details: G. Kresse and J. Furthmüller, Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set, Phys. Rev. B 54, 11169 (1996). Sections III–IV describe the mixing and diagonalisation machinery used in VASP.
General review of SCF convergence: The treatment in R. M. Martin, Electronic Structure: Basic Theory and Practical Methods (Cambridge, 2004), Chapter 9, gives an accessible overview. A more mathematical treatment is in E. Cancès, SCF algorithms for Hartree–Fock electronic calculations, Lecture Notes in Chemistry 74, 17 (2000).

Outlook

The mixing schemes developed in this chapter solve the outer SCF convergence problem: given a sequence of input–output density pairs, they find the self-consistent fixed point efficiently even for metallic systems with charge-sloshing instabilities. But the SCF loop contains two other pieces of numerical machinery that have been treated as black boxes so far: the eigenvalue solver that produces the KS orbitals and eigenvalues at each iteration, and the occupation scheme that assigns electrons to states near the Fermi level. Both can fail independently of the mixing, and both introduce their own convergence parameters.

The next two chapters develop these two topics. Chapter 9 treats smearing and partial occupancies — the regularisation of Brillouin zone integrals over the discontinuous Fermi surface — through Gaussian, Fermi–Dirac, Methfessel–Paxton, and tetrahedron schemes. Chapter 10 derives the iterative diagonalisation methods (Davidson and RMM-DIIS) that make the KS eigenvalue problem tractable for large systems, and closes with convergence diagnostics and a worked example on bcc Fe that ties together all the numerical machinery developed in Chapters 8–10.

Brillouin Zone Integration

Chapter 8 addressed the outer loop of the SCF cycle: how to update the density from one iteration to the next so that the fixed-point equation \(F[\rho^] = \rho^\) is satisfied efficiently. This chapter addresses two pieces of numerical machinery that operate within each SCF step: how the BZ integral is discretised onto a finite \(\mathbf{k}\)-point mesh (Section: Monkhorst–Pack Sampling), and how the occupation function near the Fermi level is regularised to achieve rapid convergence for metals (Section: Smearing). The eigenvalue solver that produces the orbitals and energies at each \(\mathbf{k}\)-point is treated separately in Chapter 10. BZ integration is covered first because the choice of \(\mathbf{k}\)-mesh and smearing scheme affects the effective smoothness of the energy landscape seen by the SCF cycle.

Monkhorst–Pack \(k\)-Point Sampling

Monkhorst-Pack k-point mesh in the Brillouin zone and smearing of the Fermi level — **Figure 9.1.** *(a)* A 6×6 Monkhorst–Pack mesh in a 2D square Brillouin zone. Green points are the irreducible k-points (related by symmetry to the full set); blue points are the symmetry-equivalent reducible set. The irreducible wedge (shaded green) reduces the computational cost by the order of the point group. *(b)* Occupation function \(f(\epsilon)\) and density of states \(D(\epsilon)\) near the Fermi level for a metal. The step function (grey dashed) requires infinite k-point density; Fermi–Dirac (terracotta) and Methfessel–Paxton (plum) smearings regularise the integral at the cost of a finite broadening \(\sigma\), which must be corrected or extrapolated to zero.

In a periodic solid, all physical observables require integration over the first Brillouin zone. The electron density, total energy, and density of states are all BZ integrals of the form:

\begin{equation} \langle A \rangle = \frac{\Omega_{\rm BZ}}{(2\pi)^3}\int_{\rm BZ} A(\mathbf{k})\,d\mathbf{k} \approx \sum_{\mathbf{k} \in \mathcal{S}} w_{\mathbf{k}}\,A(\mathbf{k}), \label{eq:BZ-integral} \end{equation}

where the continuous integral is replaced by a weighted sum over a discrete set \(\mathcal{S}\) of \(\mathbf{k}\)-points with weights \(w_{\mathbf{k}}\) summing to unity. The central question is: how to choose \(\mathcal{S}\) so that the sum converges to the integral as rapidly as possible?

The Monkhorst–Pack Grid

The Monkhorst–Pack (MP) scheme (Monkhorst and Pack, 1976) places a uniform grid of \(N_1 \times N_2 \times N_3\) \(\mathbf{k}\)-points in the BZ, centred so as to avoid the \(\Gamma\) point (which is often a high-symmetry point requiring special treatment):

\begin{equation} \mathbf{k}_{n_1 n_2 n_3} = \sum_{i=1}^{3} \frac{2n_i - N_i - 1}{2N_i}\,\mathbf{b}_i, \qquad n_i = 1,\ldots,N_i, \label{eq:MP-grid} \end{equation}

where \(\mathbf{b}i\) are the primitive reciprocal lattice vectors. For an odd \(N_i\), the grid includes \(\Gamma\); for even \(N_i\), it is shifted by half a grid spacing and avoids \(\Gamma\). The total number of \(\mathbf{k}\)-points in the full BZ is \(N{\rm tot} = N_1 N_2 N_3\).

Reducing by symmetry. The crystal point group \(G\) maps each \(\mathbf{k}\) to a set of symmetry-equivalent points. Only the irreducible \(\mathbf{k}\)-points — one representative from each equivalence class — need to be computed explicitly; the rest contribute through their multiplicity weights \(w_{\mathbf{k}} = |\text{orbit}(\mathbf{k})|/N_{\rm tot}\). For a cubic crystal with \(N \times N \times N\) MP grid, the full BZ has \(N^3\) points but the irreducible wedge contains only \(\sim N^3/48\) (for Oh symmetry). A \(12 \times 12 \times 12\) mesh has 1728 points total but only \(\sim 72\) irreducible \(\mathbf{k}\)-points for bcc Fe — a factor-of-24 saving.

Convergence rate. For a smooth, periodic integrand (insulators, gapped systems), the trapezoidal rule on a uniform grid converges exponentially with \(N\). For metals with a Fermi surface discontinuity, convergence degrades to \(\mathcal{O}(1/N^2)\) at best and is oscillatory — this is the problem that smearing methods resolve.

Convergence Testing

The \(\mathbf{k}\)-point convergence of a calculation must always be tested explicitly. The standard protocol:

Fix all other parameters (\(E_{\rm cut}\), \(\sigma\), geometry).
Compute the total energy for a sequence of meshes: \(4^3, 6^3, 8^3, 10^3, 12^3, \ldots\)
Plot the energy per atom vs. \(1/N_k\) (or \(1/N_k^{1/3}\)). For insulators, convergence is exponential and the curve drops quickly; for metals, it is a slower power law.
Declare convergence when the energy change between successive meshes is below your target threshold (typically \(1\)–\(5\) meV/atom for structural properties; \(<1\) meV/atom for energy differences and phase stability).

Typical converged meshes by system type:

System type	Typical mesh	k-points (irreducible)
Insulator / semiconductor (primitive cell)	\(6\times6\times6\)	\(\sim 16\)–\(32\)
Simple metal (primitive cell)	\(12\times12\times12\)	\(\sim 72\)–\(120\)
Transition metal (bcc/fcc primitive)	\(16\times16\times16\)	\(\sim 120\)–\(200\)
Magnetic supercell (\(\sim 50\) atoms)	\(4\times4\times4\)	\(\sim 8\)–\(20\)
Surface slab	\(12\times12\times1\)	\(\sim 40\)–\(80\)
Molecular (large box)	\(\Gamma\)-only	\(1\)

For supercells, the BZ folds: a \(2\times2\times2\) supercell of a primitive-cell calculation with an \(N\times N\times N\) mesh is equivalent in k-point density to an \((N/2)\times(N/2)\times(N/2)\) mesh of the primitive cell. Always compare meshes at the same k-point density (points per unit length in reciprocal space), not the same grid indices.

Code Notes: k-point setup in VASP and Quantum ESPRESSO

VASP — the KPOINTS file controls the mesh. The most common format for an automatic MP grid:

Automatic
0
Monkhorst-Pack
 12 12 12
  0  0  0

The last line is the shift (0 0 0 = grid includes Γ; 1 1 1 = shifted half a step, avoids Γ for even-numbered grids). For hexagonal systems, use Gamma-centred grids (replace Monkhorst-Pack with Gamma) to preserve the 3-fold symmetry.

Quantum ESPRESSO — set in the &K_POINTS card of pw.x:

K_POINTS automatic
12 12 12  0 0 0

The three integers are \(N_1 N_2 N_3\); the last three are shifts (0 = no shift, 1 = shift by \(1/(2N_i)\)). Use K_POINTS gamma for \(\Gamma\)-only calculations (molecules, large supercells).

Smearing and Partial Occupancies

The Problem with Sharp Occupation

The KS total energy involves a sum over occupied states weighted by occupation numbers \(f_i\). For an insulator at zero temperature, every state below the gap has \(f_i = 1\) and every state above has \(f_i = 0\). The Brillouin zone (BZ) integral for any smooth quantity — the total energy, charge density, or density of states — converges exponentially with the density of the \(\mathbf{k}\)-point mesh, because the integrand is an analytic function of \(\mathbf{k}\) throughout the BZ. A \(6 \times 6 \times 6\) Monkhorst–Pack grid is often sufficient for a well-converged insulator calculation.

For a metal, the situation is qualitatively different. The occupation function is a step function at the Fermi energy:

\begin{equation} f_i(\mathbf{k}) = \theta(\epsilon_F - \epsilon_i(\mathbf{k})), \label{eq:step-occupation} \end{equation}

which is discontinuous at the Fermi surface — the set of \(\mathbf{k}\)-points where \(\epsilon_i(\mathbf{k}) = \epsilon_F\). The BZ integral of a function with a discontinuity converges only as \(\mathcal{O}(1/N_k)\) with the number of \(\mathbf{k}\)-points \(N_k\), and the approach to convergence is oscillatory (the Gibbs phenomenon). In practice, this means that a metallic calculation with sharp occupations requires an extremely dense \(\mathbf{k}\)-mesh — often \(20 \times 20 \times 20\) or finer — to achieve meV-level convergence, making the calculation prohibitively expensive.

The physical origin is clear: the Fermi surface is a measure-zero set in the BZ, but the energy integral samples it through a finite \(\mathbf{k}\)-mesh. States that are just above or just below \(\epsilon_F\) make discontinuous contributions to the energy as the mesh is refined — a state that is occupied on one mesh may become unoccupied on a slightly denser mesh, causing the total energy to jump.

All smearing methods address this by replacing the discontinuous step function with a smooth approximation, effectively broadening the Fermi surface over an energy window of width \(\sigma\). This converts the BZ integrand from a discontinuous to a smooth (or even analytic) function, restoring rapid convergence with \(\mathbf{k}\)-point density — at the cost of introducing a systematic error that must be controlled or corrected.

Gaussian Smearing

The simplest approach replaces the step function with a smooth occupation based on the complementary error function:

\begin{equation} \boxed{f_i = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\epsilon_i - \epsilon_F}{\sigma}\right),} \label{eq:gaussian-occ} \end{equation}

where \(\sigma \gt 0\) is the smearing width and \(\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_x^\infty e^{-t^2},dt\). This corresponds to convolving the step function with a Gaussian of width \(\sigma\): states more than \(\sim 2\sigma\) below \(\epsilon_F\) have \(f_i \approx 1\); states more than \(\sim 2\sigma\) above have \(f_i \approx 0\); and states within \(\sim \sigma\) of \(\epsilon_F\) have fractional occupations.

The smearing introduces a systematic error in the total energy. For a system with a slowly varying density of states near \(\epsilon_F\), the error scales as:

\begin{equation} E[\sigma] - E_0 = \mathcal{O}(\sigma^2), \label{eq:gaussian-error} \end{equation}

where \(E_0\) is the true zero-temperature energy. To obtain \(E_0\), one must either use a very small \(\sigma\) (which reintroduces the convergence problem) or extrapolate.

The free energy and the \(\sigma \to 0\) extrapolation. The smeared system can be interpreted as a fictitious finite-temperature ensemble with an electronic entropy:

\begin{equation} S = -k_B \sum_{i,\mathbf{k}} w_\mathbf{k}\left[f_i \ln f_i + (1 - f_i)\ln(1 - f_i)\right], \label{eq:entropy} \end{equation}

where \(w_\mathbf{k}\) is the \(\mathbf{k}\)-point weight. The corresponding free energy is:

\begin{equation} F[\sigma] = E[\sigma] - \sigma\, S. \label{eq:free-energy} \end{equation}

For a system with a parabolic density of states near \(\epsilon_F\) (a good approximation for simple metals), the total energy and free energy bracket the true \(E_0\):

\begin{equation} E[\sigma] = E_0 + a\sigma^2 + \mathcal{O}(\sigma^4), \qquad F[\sigma] = E_0 - a\sigma^2 + \mathcal{O}(\sigma^4), \label{eq:E-F-expansion} \end{equation}

with the same coefficient \(a\) but opposite signs. Averaging eliminates the \(\sigma^2\) error:

\begin{equation} \boxed{E_0 \approx \frac{1}{2}\left(E[\sigma] + F[\sigma]\right) + \mathcal{O}(\sigma^4).} \label{eq:sigma-extrapolation} \end{equation}

This is the \(\sigma \to 0\) extrapolation. Most plane-wave codes report this corrected energy alongside the raw total energy and the free energy; it is the value that should be used for comparing total energies, computing energy differences, and extracting formation energies — not the raw \(E[\sigma]\) or \(F[\sigma]\) individually.

Limitation: the extrapolation assumes a smoothly varying density of states. For systems with sharp features near \(\epsilon_F\) (van Hove singularities, narrow \(d\)-bands), the quadratic approximation breaks down and larger errors can persist even after extrapolation.

Fermi–Dirac Smearing

A physically motivated alternative uses the Fermi–Dirac distribution directly:

\begin{equation} \boxed{f_i = \frac{1}{1 + \exp\!\left(\frac{\epsilon_i - \epsilon_F}{k_B T}\right)},} \label{eq:FD-occ} \end{equation}

where \(T\) is interpreted as the electronic temperature and \(\sigma = k_BT\) plays the same role as the Gaussian smearing width. The advantage is physical transparency: Fermi–Dirac smearing corresponds to the exact equilibrium occupation at temperature \(T\). The variational quantity is the Mermin free energy:

\begin{equation} \Omega = E - TS, \label{eq:mermin} \end{equation}

where \(S\) is the exact Fermi–Dirac entropy. For genuine finite-temperature calculations — ab initio molecular dynamics above \(\sim 1000,\)K, warm dense matter, or the electronic contribution to free energies — Fermi–Dirac smearing is the physically correct choice.

For ground-state calculations at \(T = 0\), Fermi–Dirac smearing has a disadvantage: the occupation function has exponential tails extending to \(\pm\infty\), meaning that states far from \(\epsilon_F\) acquire small but non-zero fractional occupations. This is physically correct at finite \(T\) but introduces an \(\mathcal{O}(\sigma^2)\) error at \(T = 0\) that is larger than the corresponding error from Methfessel–Paxton smearing at the same \(\sigma\). The \(\sigma \to 0\) extrapolation of equation \eqref{eq:sigma-extrapolation} can still be applied, but with lower accuracy than Methfessel–Paxton for ground-state energy differences.

Methfessel–Paxton Smearing

The Gaussian and Fermi–Dirac methods both introduce an \(\mathcal{O}(\sigma^2)\) error in the total energy. Methfessel–Paxton (MP) smearing (Methfessel and Paxton, 1989) systematically reduces this error by expanding the \(\delta\)-function approximation in a complete set of Hermite polynomials, achieving \(\mathcal{O}(\sigma^{2N+2})\) accuracy at order \(N\).

The construction starts from the identity: the step function \(\theta(x)\) can be written as \(\frac{1}{2},\mathrm{erfc}(x)\) plus a correction expressible as a series in Hermite polynomials \(H_n(x)\). The key mathematical result is that the Hermite polynomials, weighted by the Gaussian \(e^{-x^2}\), form a complete orthogonal set on \((-\infty, \infty)\). Expanding the difference \(\theta(x) - \frac{1}{2},\mathrm{erfc}(x)\) in this basis and truncating at order \(N\) gives the MP approximation to the \(\delta\)-function:

\begin{equation} \tilde{\delta}_N(x) = \frac{e^{-x^2}}{\sqrt{\pi}} \sum_{n=0}^{N} A_n H_{2n}(x), \label{eq:MP-delta} \end{equation}

where \(x = (\epsilon_i - \epsilon_F)/\sigma\) and the coefficients are \(A_n = (-1)^n / (n!, 4^n\sqrt{\pi})\). The corresponding occupation function is obtained by integration:

\begin{equation} f_N(x) = \int_x^\infty \tilde{\delta}_N(t)\,dt. \label{eq:MP-occupation-general} \end{equation}

The \(N = 0\) case recovers Gaussian smearing: \(\tilde{\delta}_0(x) = e^{-x^2}/\sqrt{\pi}\), \(f_0(x) = \frac{1}{2},\mathrm{erfc}(x)\), with error \(\mathcal{O}(\sigma^2)\).

The \(N = 1\) case is derived explicitly. The first Hermite polynomial correction adds \(A_1 H_2(x) = -\frac{1}{4\sqrt{\pi}}(4x^2 - 2)\) to the \(\delta\)-function. Integrating gives:

\begin{equation} \boxed{f_1(x) = \frac{1}{2}\,\mathrm{erfc}(x) + \frac{1}{\sqrt{\pi}}\,x\,e^{-x^2}, \qquad x = \frac{\epsilon_i - \epsilon_F}{\sigma}.} \label{eq:MP-N1} \end{equation}

Note that \(f_1(x)\) is not monotonic and can take values slightly outside \([0,1]\): for \(x \approx -1\), the occupation exceeds \(1\) slightly, and for \(x \approx 1\), it dips below \(0\). This is not a pathology — the occupation numbers in MP smearing are not probabilities but mathematical weights designed to minimise the BZ integration error. The total electron count \(\sum_i f_i = N\) is still exactly conserved.

The crucial property is that the energy error now scales as \(\mathcal{O}(\sigma^4)\) rather than \(\mathcal{O}(\sigma^2)\):

\begin{equation} E_{N=1}[\sigma] = E_0 + \mathcal{O}(\sigma^4). \label{eq:MP-N1-error} \end{equation}

This means that for \(N \geq 1\), the raw energy \(E[\sigma]\) is already a good approximation to \(E_0\) without the \(\sigma \to 0\) extrapolation. In practice, the extrapolated energy is still computed and should still be used, but the correction it applies is much smaller than for Gaussian smearing at the same \(\sigma\).

The \(N = 2\) case adds the next Hermite correction, achieving \(\mathcal{O}(\sigma^6)\) accuracy. For most applications, \(N = 1\) is sufficient; \(N = 2\) provides a useful cross-check but rarely changes results by more than \(\sim 0.1,\)meV/atom at typical smearing widths.

Why not \(N \gt 2\)? Higher-order MP approximations produce occupation functions with increasingly large oscillations outside \([0,1]\), which can cause numerical instabilities in the SCF cycle (large negative occupations can make the charge density locally negative). The practical consensus is that \(N = 1\) or \(N = 2\) gives the optimal balance between accuracy and numerical stability.

The Tetrahedron Method

All smearing methods introduce a fictitious broadening that must be controlled or corrected. The tetrahedron method (Jepsen and Andersen, 1971; Blöchl, Jepsen, and Andersen, 1994) takes a fundamentally different approach: it performs the BZ integral exactly for a piecewise linear interpolation of the band energies \(\epsilon_i(\mathbf{k})\), without introducing any broadening parameter.

Decomposition. The BZ is partitioned into a regular \(\mathbf{k}\)-mesh. Each parallelepiped of the mesh is subdivided into six congruent tetrahedra (the natural simplicial decomposition of a 3D cell). Within each tetrahedron \(T\) with vertices \(\mathbf{k}_1,\ldots,\mathbf{k}_4\) and band energies \(\epsilon_n = \epsilon_i(\mathbf{k}_n)\) (sorted so \(\epsilon_1 \leq \epsilon_2 \leq \epsilon_3 \leq \epsilon_4\)), the band energy is linearly interpolated:

\[ \tilde\epsilon_i(\mathbf{k}) = \sum_{n=1}^{4} \alpha_n(\mathbf{k}),\epsilon_n, \qquad \mathbf{k} \in T, \]

where \(\alpha_n(\mathbf{k})\) are the barycentric coordinates of \(\mathbf{k}\) in \(T\), with \(\sum_n \alpha_n = 1\). The integration of the step function \(\theta(\epsilon_F - \tilde\epsilon)\) over the linear interpolant inside \(T\) is then a purely geometric problem: the level set \(\tilde\epsilon = \epsilon_F\) is a plane that cuts \(T\) into two polyhedra.

Analytic weights. Carrying out the integral analytically gives the tetrahedron weights \(w_n(\epsilon_F)\), which depend on where \(\epsilon_F\) sits relative to the four vertex energies \(\epsilon_n\). The standard results (Blöchl 1994; for the volume integral \(\int_T \theta(\epsilon_F - \tilde\epsilon)/V_T\)) are:

Regime	Filled fraction of \(T\)
\(\epsilon_F < \epsilon_1\)	\(0\) (tetrahedron empty)
\(\epsilon_1 \leq \epsilon_F < \epsilon_2\)	\(\dfrac{(\epsilon_F - \epsilon_1)^3}{(\epsilon_2-\epsilon_1)(\epsilon_3-\epsilon_1)(\epsilon_4-\epsilon_1)}\)
\(\epsilon_2 \leq \epsilon_F < \epsilon_3\)	piecewise cubic in \(\epsilon_F\) (one truncated corner)
\(\epsilon_3 \leq \epsilon_F < \epsilon_4\)	\(1 - \dfrac{(\epsilon_4 - \epsilon_F)^3}{(\epsilon_4-\epsilon_1)(\epsilon_4-\epsilon_2)(\epsilon_4-\epsilon_3)}\)
\(\epsilon_F \geq \epsilon_4\)	\(1\) (tetrahedron full)

The vertex weights \(w_n\) (the fraction of the integration measure attributable to vertex \(n\)) follow by differentiation: \(w_n = \partial[\text{filled fraction}]/\partial\epsilon_n\). The total electron count is then \(N_{\rm el} = \sum_{i,T}\sum_n w_n^{i,T}(\epsilon_F)\), and the Fermi energy \(\epsilon_F\) is determined by the implicit equation \(N_{\rm el}(\epsilon_F) = N\). Solving for \(\epsilon_F\) is done by bisection over a sorted list of all vertex energies; the resulting integration is exact for any function that is piecewise linear on the tetrahedral mesh.

The Blöchl correction. The linear interpolant misses the curvature of \(\epsilon_i(\mathbf{k})\) within each tetrahedron. The dominant correction is the second-derivative contribution that arises from the Taylor expansion of the true band energy around the tetrahedron centroid. Blöchl, Jepsen, and Andersen (1994) showed that adding the correction

\[ \Delta w_n = \frac{1}{40}\sum_T D_T(\epsilon_F)!\sum_{m=1}^4 (\epsilon_m - \epsilon_n), \]

where \(D_T(\epsilon_F)\) is the density of states from tetrahedron \(T\) at \(\epsilon_F\), restores fourth-order accuracy: the integration error scales as \(\mathcal{O}(N_k^{-4})\) rather than the \(\mathcal{O}(N_k^{-2})\) of the uncorrected linear interpolation. The correction is the second-derivative analogue of Simpson’s rule for the BZ integral — it costs essentially nothing extra (the DOS at \(\epsilon_F\) is already computed) and reduces the \(\mathbf{k}\)-mesh density needed for a target accuracy by a factor of 2–3 in each direction. This corrected tetrahedron method is the gold standard for BZ integration accuracy and is selected in VASP by ISMEAR = -5 (tetrahedra_opt in QE).

Advantages: No smearing parameter to choose or converge. No entropy correction needed. Exact for any quantity that depends linearly on \(\epsilon_i(\mathbf{k})\) within each tetrahedron; fourth-order accurate with the Blöchl correction. Produces smooth, non-oscillatory convergence with \(\mathbf{k}\)-mesh density.

Limitations: The tetrahedron method requires the BZ integral to be performed over a regular mesh — it cannot be used with symmetry-reduced or randomly shifted \(\mathbf{k}\)-sets without additional care. More importantly, the analytic integration assumes a fixed band structure: the Fermi energy and the tetrahedron weights are computed after the eigenvalues are known, so there is no smooth dependence of the total energy on atomic positions. This means the forces computed from a tetrahedron-method calculation contain small but non-zero discontinuities when an eigenvalue crosses \(\epsilon_F\) as atoms move, making the tetrahedron method unsuitable for geometry optimisation or molecular dynamics of metallic systems. For these tasks, MP smearing (\(N = 1\)) is preferred.

The tetrahedron method is, however, the optimal choice for: fixed-geometry total energy calculations, density of states (DOS) and projected DOS, optical properties and spectral functions, and any post-processing quantity requiring accurate spectral weight near the Fermi level.

Practical Guidance

The choice of smearing method and width depends on both the system type and the type of calculation:

Calculation type	System	Recommended method	\(\sigma\)
Geometry optimisation	Metal	MP \(N=1\)	\(0.1\)–\(0.2\) eV
Geometry optimisation	Insulator	Gaussian or tetrahedron	\(0.05\)–\(0.1\) eV (or none)
Single-point energy	Metal	Tetrahedron (Blöchl)	—
Single-point energy	Insulator	Tetrahedron (Blöchl)	—
DOS / optical properties	Any	Tetrahedron (Blöchl)	—
AIMD (\(T \gt 1000,\)K)	Metal	Fermi–Dirac	\(k_BT\)
Phonons (DFPT)	Metal	MP \(N=1\)	\(0.1\)–\(0.2\) eV

Choosing \(\sigma\): The smearing width must be large enough to smooth the Fermi surface sufficiently for the given \(\mathbf{k}\)-mesh, but small enough that the smearing error is acceptable. A useful diagnostic: if the entropy term \(TS\) exceeds \(\sim 1,\)meV/atom, the smearing is too large and either \(\sigma\) should be reduced or the \(\mathbf{k}\)-mesh should be made denser.

The interdependence of \(\sigma\) and \(\mathbf{k}\)-mesh: These two parameters are not independent. A denser \(\mathbf{k}\)-mesh resolves the Fermi surface better, allowing a smaller \(\sigma\); conversely, a larger \(\sigma\) compensates for a coarser mesh. The product \(\sigma \times N_k^{1/3}\) should remain approximately constant for a given target accuracy. In practice, one converges both simultaneously: fix \(\sigma\), increase the \(\mathbf{k}\)-mesh until the energy is stable, then reduce \(\sigma\) and repeat until both are converged.

Marzari–Vanderbilt cold smearing: An alternative to MP smearing that ensures the occupation function remains non-negative while achieving comparable accuracy (Marzari et al., 1999). It avoids the slight negative occupations of MP \(N \geq 1\) that can occasionally cause numerical difficulties. Available in several codes as an option alongside Gaussian and MP smearing.

Code Notes: VASP and Quantum ESPRESSO Parameters

VASP:

Parameter	Controls	Values
`ISMEAR`	Smearing method	\(-5\): tetrahedron (Blöchl); \(-1\): Fermi–Dirac; \(0\): Gaussian; \(1\): MP \(N=1\); \(2\): MP \(N=2\)
`SIGMA`	Smearing width \(\sigma\) (eV)	\(0.01\)–\(0.3\); default \(0.2\)

The extrapolated energy energy(sigma->0) in the OUTCAR corresponds to equation \eqref{eq:sigma-extrapolation} for Gaussian smearing and to the raw energy (with negligible correction) for MP \(N \geq 1\). Always use this value for energy comparisons. The entropy contribution is reported as EENTRO.

ISMEAR = -5 (tetrahedron) must not be used with fewer than \(3\) \(\mathbf{k}\)-points in any direction, or for calculations where forces are needed (geometry optimisation, MD). VASP will run but the forces will contain discontinuities that prevent smooth convergence of the ionic relaxation.

Quantum ESPRESSO:

Parameter	Controls	Values
`occupations`	Occupation scheme	`'smearing'`, `'tetrahedra'`, `'tetrahedra_opt'` (Blöchl), `'fixed'`
`smearing`	Smearing function	`'gaussian'`, `'fermi-dirac'`, `'methfessel-paxton'`, `'marzari-vanderbilt'`
`degauss`	Smearing width \(\sigma\) (Ry)	Typical: \(0.005\)–\(0.02\) Ry (\(0.07\)–\(0.27\) eV)

Note that QE uses Rydberg units for degauss (\(1,\)Ry \(= 13.606,\)eV), while VASP uses eV for SIGMA. A common source of error is confusing the two: degauss = 0.02 Ry \(\approx 0.27,\)eV, which is a reasonable smearing for metals; SIGMA = 0.02 eV is extremely small and may cause convergence problems on a coarse \(\mathbf{k}\)-mesh.

Marzari–Vanderbilt cold smearing is accessed via smearing = 'marzari-vanderbilt' or smearing = 'cold'.

Further Reading

Methfessel–Paxton smearing: M. Methfessel and A. T. Paxton, High-precision sampling for Brillouin-zone integration in metals, Phys. Rev. B 40, 3616 (1989). The original paper deriving the Hermite polynomial expansion and the error scaling \(\mathcal{O}(\sigma^{2N+2})\).
Tetrahedron method with Blöchl correction: P. E. Blöchl, O. Jepsen, and O. K. Andersen, Improved tetrahedron method for Brillouin-zone integrations, Phys. Rev. B 49, 16223 (1994). Derives the quadratic correction that improves convergence from \(\mathcal{O}(1/N_k^2)\) to \(\mathcal{O}(1/N_k^4)\).
Original tetrahedron method: O. Jepsen and O. K. Andersen, The electronic structure of h.c.p. ytterbium, Solid State Commun. 9, 1763 (1971).
Marzari–Vanderbilt cold smearing: N. Marzari, D. Vanderbilt, A. De Vita, and M. C. Payne, Thermal contraction and disordering of the Al(110) surface, Phys. Rev. Lett. 82, 3296 (1999). Introduces the cold smearing function with improved convergence properties for metals.
Mermin finite-temperature DFT: N. D. Mermin, Thermal properties of the inhomogeneous electron gas, Phys. Rev. 137, A1441 (1965). The formal foundation for Fermi–Dirac smearing as a finite-temperature variational principle.
General discussion: The interplay between \(\mathbf{k}\)-mesh density and smearing is discussed clearly in G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169 (1996), Section II.D.

Eigenvalue Solvers

The Kohn–Sham SCF loop developed in Chapter 8 has three computational hot spots, each handled by a specialised algorithm:

Density mixing (Chapter 8) — converges the outer self-consistency cycle.
Brillouin zone integration (Chapter 9) — assembles the density from band energies and occupations at each \(\mathbf{k}\)-point.
Eigenvalue solver — at every SCF iteration and every \(\mathbf{k}\)-point, solves \(\hat{h}{\rm KS}\phi_i = \epsilon_i\phi_i\) for the lowest \(N{\rm bands}\) eigenpairs.

This chapter develops the third. The Kohn–Sham eigenvalue problem is the inner loop of DFT: it is called \(N_{\rm SCF} \times N_{\mathbf{k}}\) times per total-energy evaluation, so even factor-of-two improvements in its cost translate into substantial wall-clock savings. Because the matrix is too large to diagonalise directly (\(N_{\rm PW} \sim 10^4\)–\(10^5\)) but only the lowest \(N_{\rm bands} \sim 10^2\)–\(10^3\) eigenpairs are needed, the methods used in practice are iterative subspace algorithms: Davidson, RMM-DIIS, and conjugate-gradient minimisation. These trade off robustness, memory, and convergence rate, and the choice depends on system size and band-structure character.

The chapter concludes with a section on parallelisation, which is essential for any production calculation, and with the diagnostic toolkit and worked-example workflow on bcc Fe that ties together everything from Chapters 8–10.

Iterative Diagonalisation

The Eigenvalue Problem and Its Cost

At each SCF step, we must solve the Kohn–Sham eigenvalue equation:

\begin{equation} \hat{h}_{\rm KS} |\phi_i\rangle = \epsilon_i |\phi_i\rangle, \qquad \hat{h}_{\rm KS} = -\frac{1}{2}\nabla^2 + V_{\rm eff}(\mathbf{r}), \label{eq:KS-eigen} \end{equation}

for the lowest \(N_{\rm bands}\) eigenpairs \({(\epsilon_i, |\phi_i\rangle)}\). In a plane-wave basis, \(\hat{h}{\rm KS}\) is represented as a matrix of dimension \(N{\rm PW} \times N_{\rm PW}\), where \(N_{\rm PW}\) is the number of plane waves within the energy cutoff \(E_{\rm cut}\). For a typical calculation:

A \(50\)-atom cell with \(E_{\rm cut} = 500\) eV gives \(N_{\rm PW} \sim 30,000\) per \(\mathbf{k}\)-point. A \(200\)-atom cell gives \(N_{\rm PW} \sim 120,000\). Full diagonalisation of an \(N_{\rm PW} \times N_{\rm PW}\) matrix costs \(\mathcal{O}(N_{\rm PW}^3)\) operations. For \(N_{\rm PW} = 30,000\), this is \(\sim 2.7 \times 10^{13}\) floating-point operations — already at the limit of practicality for a single diagonalisation at a single \(\mathbf{k}\)-point. Since the SCF loop requires \(\sim 20\)–\(50\) diagonalisations, and the BZ integral requires tens to hundreds of \(\mathbf{k}\)-points, full diagonalisation is completely intractable for any system of practical interest.

The key observation is that we do not need all \(N_{\rm PW}\) eigenpairs — only the lowest \(N_{\rm bands}\), which is proportional to the number of electrons and hence to \(N_{\rm atom}\). For a \(50\)-atom cell, \(N_{\rm bands} \sim 200\), which is \(\ll N_{\rm PW}\). Iterative subspace methods exploit this by working in a small subspace of dimension \(m \sim 2\)–\(3 \times N_{\rm bands}\), reducing the cost from \(\mathcal{O}(N_{\rm PW}^3)\) to \(\mathcal{O}(N_{\rm bands}^2 \cdot N_{\rm PW})\) per SCF step — a reduction of several orders of magnitude.

The Davidson Algorithm

The Davidson algorithm (Davidson, 1975; adapted for plane-wave DFT by Wood and Zunger, 1985) is the most widely used iterative eigensolver in electronic structure codes. It belongs to the family of Krylov subspace methods but uses a preconditioner to accelerate convergence dramatically compared to the Lanczos algorithm.

The basic idea. Start with a set of \(N_{\rm bands}\) trial vectors \({|v_1\rangle, \ldots, |v_{N_{\rm bands}}\rangle}\) — typically the eigenvectors from the previous SCF step, or random vectors for the first step. These span an initial subspace \(\mathcal{V}\). The algorithm proceeds iteratively:

Step 1 — Project and diagonalise. Form the projected Hamiltonian and overlap matrices within the current subspace:

\begin{equation} H_{ij}^{\rm sub} = \langle v_i | \hat{h}_{\rm KS} | v_j \rangle, \qquad S_{ij}^{\rm sub} = \langle v_i | v_j \rangle. \label{eq:davidson-project} \end{equation}

Diagonalise the small \(m \times m\) generalised eigenvalue problem \(\mathbf{H}^{\rm sub}\mathbf{c} = \epsilon, \mathbf{S}^{\rm sub}\mathbf{c}\) to obtain approximate eigenvalues \({\epsilon_i^{\rm sub}}\) and eigenvectors expressed as linear combinations of the subspace basis: \(|\tilde{\phi}i\rangle = \sum_j c{ij} |v_j\rangle\). This small diagonalisation costs \(\mathcal{O}(m^3)\), which is negligible.

Step 2 — Compute residuals. For each approximate eigenpair, compute the residual:

\begin{equation} |r_i\rangle = (\hat{h}_{\rm KS} - \epsilon_i^{\rm sub})|\tilde{\phi}_i\rangle. \label{eq:davidson-residual} \end{equation}

If \(|r_i| \leq \delta\) for all desired eigenpairs, the algorithm has converged. The threshold \(\delta\) is typically set to \(10^{-2}\)–\(10^{-3}\) times the SCF energy convergence criterion, since the eigenvalues need not be converged more tightly than the density at each SCF step.

Step 3 — Precondition and expand. The residual \(|r_i\rangle\) points in the direction of steepest descent toward the true eigenvector, but its components at different \(\mathbf{G}\) converge at very different rates. High-\(G\) components (large kinetic energy) have large eigenvalue separations and converge quickly; low-\(G\) components are nearly degenerate and converge slowly. The kinetic energy preconditioner corrects this disparity by approximately inverting the dominant part of the Hamiltonian:

\begin{equation} \boxed{|t_i\rangle = \hat{P}\,|r_i\rangle, \qquad P(\mathbf{G}) = \frac{1}{\frac{1}{2}|\mathbf{k}+\mathbf{G}|^2 - \epsilon_i^{\rm sub}}.} \label{eq:preconditioner} \end{equation}

In reciprocal space, this simply divides each plane-wave component of the residual by the difference between its kinetic energy and the current eigenvalue estimate. The effect is to rotate the residual toward the true eigenspace: components that are far from the eigenvalue (large \(|\frac{1}{2}G^2 - \epsilon_i|\)) are suppressed, while components near the eigenvalue are amplified. This is the same principle as Kerker preconditioning for density mixing (Chapter 8, Section 8.2), applied here to the eigenvalue problem rather than the fixed-point problem.

In practice, a regularisation is applied to prevent the denominator from vanishing when \(\frac{1}{2}G^2 \approx \epsilon_i\). The standard choice is:

\begin{equation} P(\mathbf{G}) = \frac{1}{\max\!\left(\frac{1}{2}|\mathbf{k}+\mathbf{G}|^2 - \epsilon_i^{\rm sub},\; \epsilon_{\rm ref}\right)}, \label{eq:preconditioner-regularised} \end{equation}

where \(\epsilon_{\rm ref}\) is a small positive constant (typically a fraction of the bandwidth).

The preconditioned residuals \({|t_i\rangle}\) are orthogonalised against the existing subspace and added as new basis vectors: \(\mathcal{V} \leftarrow \mathcal{V} \oplus \mathrm{span}{|t_i\rangle}\). The subspace dimension grows from \(m\) to \(m + N_{\rm bands}\).

Step 4 — Restart. When the subspace becomes too large (typically \(m \gt 3 N_{\rm bands}\)), the oldest basis vectors are discarded, keeping only the current best eigenvector approximations. This prevents the subspace from growing indefinitely while retaining the most useful information.

Convergence. With preconditioning, Davidson converges quadratically near the solution — each iteration roughly doubles the number of correct digits in the eigenvalues. In practice, \(3\)–\(5\) Davidson iterations per SCF step suffice to converge the eigenpairs to the required accuracy.

Block Davidson. Rather than processing one eigenpair at a time, the block variant processes all \(N_{\rm bands}\) eigenpairs simultaneously. The subspace expansion adds \(N_{\rm bands}\) preconditioned residuals at each step. This is more efficient because the Hamiltonian action \(\hat{h}_{\rm KS}|v\rangle\) — the most expensive operation, requiring FFTs between real and reciprocal space — can be batched over all vectors in the block.

RMM-DIIS

The Residual Minimisation Method — Direct Inversion in the Iterative Subspace (RMM-DIIS) (Pulay, 1980; Wood and Zunger, 1985; Kresse and Furthmüller, 1996) takes a different approach: instead of expanding a subspace and projecting, it directly minimises the residual norm \(|(\hat{h}_{\rm KS} - \epsilon_i)|\phi_i\rangle|^2\) for each band individually, using the DIIS extrapolation from Section 8.2.

The method. For each band \(i\), maintain a history of trial vectors and their residuals from the last \(m\) iterations: \({|\phi_i^{(n)}\rangle, |r_i^{(n)}\rangle}\). Construct the optimal linear combination:

\begin{equation} |\bar{\phi}_i\rangle = \sum_{k=1}^{m} c_k |\phi_i^{(n-m+k)}\rangle, \label{eq:rmm-trial} \end{equation}

where the coefficients \({c_k}\) minimise the residual norm, exactly as in the density-mixing DIIS of Chapter 8:

\begin{equation} \min_{\{c_k\}} \left\|\sum_k c_k |r_i^{(n-m+k)}\rangle\right\|^2, \qquad \sum_k c_k = 1. \label{eq:rmm-minimise} \end{equation}

This leads to the same small linear system with the overlap matrix \(B_{jk} = \langle r_i^{(j)} | r_i^{(k)} \rangle\) as in equation \eqref{eq:DIIS-system}. The optimal trial vector is then updated by applying the preconditioned residual:

\begin{equation} |\phi_i^{(n+1)}\rangle = |\bar{\phi}_i\rangle + \hat{P}\,|\bar{r}_i\rangle. \label{eq:rmm-update} \end{equation}

Key difference from Davidson: RMM-DIIS does not require the trial vectors to be mutually orthogonal at every step. Orthogonalisation is the most expensive operation in Davidson (\(\mathcal{O}(N_{\rm bands}^2 \cdot N_{\rm PW})\) per step), and RMM-DIIS avoids it almost entirely. This makes RMM-DIIS faster per iteration — typically \(30\)–\(50%\) cheaper than Davidson for the same system.

The risk: subspace collapse. Without explicit orthogonalisation, two or more bands can converge to the same eigenstate, losing one or more eigenpairs from the solution. This is the subspace collapse problem. It manifests as a sudden jump in the total energy when a previously tracked eigenstate “disappears” and is replaced by a duplicate. The risk is highest when eigenvalues are nearly degenerate (e.g., \(d\)-bands in transition metals) or when the initial guess is poor.

Practical mitigation. Occasional re-orthogonalisation — typically every \(5\)–\(10\) RMM-DIIS steps — prevents subspace collapse at minimal cost. Most implementations also monitor the overlap matrix of the trial vectors and trigger re-orthogonalisation when near-linear-dependence is detected.

The hybrid strategy. The most effective approach, used as the default in several major codes, combines both methods: start with a few Davidson steps to establish a well-separated, orthogonal initial subspace, then switch to RMM-DIIS for the remaining iterations where the eigenpairs are already approximately correct and only need refinement. The Davidson phase provides robustness; the RMM-DIIS phase provides speed.

Conjugate Gradient Minimisation

An alternative to the subspace methods above is direct minimisation of the KS total energy functional with respect to the orbital coefficients, using a conjugate gradient (CG) algorithm. Rather than solving the eigenvalue equation \eqref{eq:KS-eigen} iteratively, CG treats the orbitals as variational parameters and minimises:

\begin{equation} E[\{\phi_i\}] = \sum_i f_i \langle\phi_i|\hat{h}_{\rm KS}|\phi_i\rangle + E_{\rm H}[\rho] + E_{\rm xc}[\rho] - \int V_{\rm H}\rho\,d\mathbf{r}, \label{eq:CG-functional} \end{equation}

subject to the orthonormality constraints \(\langle\phi_i|\phi_j\rangle = \delta_{ij}\).

CG is the most robust of the three methods: it is guaranteed to decrease the total energy at every step (it is a true minimisation, not an eigenvalue iteration), and it cannot suffer from subspace collapse. The price is speed — CG converges more slowly than Davidson or RMM-DIIS because it does not exploit the eigenvalue structure of the problem. It is typically \(2\)–\(5 \times\) slower per SCF step.

CG is the method of choice when: the other methods fail to converge (e.g., systems with extremely flat potential energy surfaces or near-degenerate states), when memory is limited (CG requires minimal subspace storage), or for the first few SCF steps of a difficult system where a reliable initial eigenspace must be established before switching to faster methods.

Parallelisation

The eigenvalue solver is also where DFT calculations spend most of their wall-clock time, so its parallelisation determines the scalability of the whole calculation. Plane-wave DFT codes exploit four levels of parallelism, listed in roughly the order they should be increased as the number of cores grows:

1. \(\mathbf{k}\)-point parallelism. Different \(\mathbf{k}\)-points are independent of each other at fixed potential — they communicate only through the density and total energy. This is the most efficient parallelisation: nearly perfect scaling, minimal communication. Limited by the number of irreducible \(\mathbf{k}\)-points in the BZ. Controlled by KPAR (VASP) or -npool (Quantum ESPRESSO).

2. Band parallelism. Different bands (eigenpairs) within a single \(\mathbf{k}\)-point are distributed across cores. Requires communication for orthogonalisation (\(\mathcal{O}(N_{\rm bands}^2 \cdot N_{\rm PW})\)) and for subspace diagonalisation. Scales well up to a few dozen cores per \(\mathbf{k}\)-point, then orthogonalisation overhead becomes limiting. Controlled by NCORE and NPAR (VASP) or -nband (QE).

3. Plane-wave (\(\mathbf{G}\)-vector) parallelism. The plane-wave coefficients of each band are distributed across cores. Required for very large unit cells where bands cannot fit in single-node memory. Requires FFT communication at every Hamiltonian application, which is the dominant cost; the FFT slabs are distributed and 3D transposes are performed at each step. Typically used in combination with band parallelism: cores are arranged in a 2D grid, with \(P_{\rm band} \times P_{\mathbf{G}}\) processes per \(\mathbf{k}\)-point. In QE this is -ntg (task groups for FFT).

4. Subspace-diagonalisation parallelism. The dense \(N_{\rm bands} \times N_{\rm bands}\) subspace matrix is diagonalised at each iteration. For \(N_{\rm bands} > 1000\), this becomes a bottleneck and is parallelised via ScaLAPACK or ELPA. Controlled by LSCALAPACK and LSCAAWARE (VASP) or -ndiag (QE).

Choosing the decomposition. A practical rule of thumb for \(N_{\rm cores}\) total cores:

Set \(P_{\mathbf{k}} = N_{\rm irr.,k}\) or a divisor thereof (one pool per \(\mathbf{k}\)).
Set \(P_{\rm band} \cdot P_{\mathbf{G}} = N_{\rm cores} / P_{\mathbf{k}}\) per pool, with \(P_{\rm band}\) chosen so that each rank holds \(\sim 5\)–\(20\) bands.
Use -ndiag (QE) or the LSCALAPACK route (VASP) only when the subspace diagonalisation shows up significantly in profiling.

For example, a 200-atom magnetic supercell with 8 irreducible \(\mathbf{k}\)-points and \(N_{\rm bands} = 600\), running on 256 cores, might use \(P_{\mathbf{k}} = 8\), \(P_{\rm band} = 8\), \(P_{\mathbf{G}} = 4\), giving \(\sim 75\) bands per band-group and \(\sim 15,000\) \(\mathbf{G}\)-vectors per FFT-group — a balanced point.

Concrete code settings:

Parallelism level	VASP	Quantum ESPRESSO
\(\mathbf{k}\)-points	`KPAR = P_k`	`-npool P_k`
Bands	`NCORE = P_b / NPAR` (NCORE = cores per band group)	`-nband P_b`
Plane waves / FFT	Coupled to `NCORE`	`-ntg P_g`
Subspace diag.	`LSCALAPACK = .TRUE.`	`-ndiag P_d` (must be a perfect square)

For hybrid-functional or DFT+U calculations, the exact-exchange and Hubbard contributions add additional MPI/OpenMP communication patterns; LHFCALC = .TRUE. calculations in VASP often benefit from larger NCORE (more cores per band group) to overlap communication with FFT work.

Computational Scaling

The dominant costs at each SCF step are:

Operation	Scaling	Bottleneck for
Hamiltonian action \(\hat{h}_{\rm KS}\|\phi\rangle\) (FFTs)	\(\mathcal{O}(N_{\rm PW} \log N_{\rm PW})\) per band	Small systems
Orthogonalisation (Davidson)	\(\mathcal{O}(N_{\rm bands}^2 \cdot N_{\rm PW})\)	Medium systems (\(100\)–\(500\) atoms)
Subspace diagonalisation	\(\mathcal{O}(N_{\rm bands}^3)\)	Large systems (\(\gt 500\) atoms)

The crossover between regimes depends on the system and the parallelisation strategy. For most DFT calculations on \(50\)–\(200\) atom cells, the orthogonalisation and subspace diagonalisation costs dominate, giving an effective scaling of \(\mathcal{O}(N^3)\) with system size \(N\). This cubic wall is the fundamental limit of conventional KS-DFT and is the motivation for linear-scaling (\(\mathcal{O}(N)\)) methods, which require localised basis sets and are not yet standard for plane-wave calculations.

Code Notes: VASP and Quantum ESPRESSO Parameters

VASP:

Parameter	Controls	Values
`ALGO`	Diagonalisation algorithm	`Normal`: Davidson; `Fast`: Davidson + RMM-DIIS (default); `All`: Davidson every step; `Conjugate`: CG; `Damped`: damped velocity CG
`NBANDS`	Number of bands to compute	Auto (typically \(N_{\rm elec}/2 + \) padding); increase for metals or unoccupied-state properties
`NELM`	Maximum SCF iterations	\(40\)–\(200\) (default \(60\))
`NELMIN`	Minimum SCF iterations	\(2\)–\(6\) (default \(2\)); increase for difficult systems
`NELMDL`	Non-self-consistent initial steps	\(-5\) to \(-12\); negative = auto; builds initial subspace without updating potential

ALGO = Fast (the default) runs Davidson for the first few iterations to establish the eigenspace, then switches to RMM-DIIS. This is the optimal choice for most systems. Use ALGO = All (Davidson every step) for difficult cases where RMM-DIIS causes instabilities — particularly magnetic metals, systems with nearly degenerate states, or the first calculation on a new system type where robustness matters more than speed. Use ALGO = Damped for molecular dynamics at elevated temperatures where the eigenspectrum changes significantly between ionic steps.

Quantum ESPRESSO:

Parameter	Controls	Values
`diagonalization`	Algorithm	`'david'`: Davidson (default); `'cg'`: conjugate gradient; `'ppcg'`: parallel-preconditioned CG
`diago_david_ndim`	Davidson subspace dimension per band	\(2\)–\(8\) (default \(2\)); increase for difficult convergence
`diago_full_acc`	Force full accuracy at every SCF step	`.true.` or `.false.` (default); use `.true.` for phonons and response properties
`nbnd`	Number of bands	Auto; increase for metals or when unoccupied states are needed
`electron_maxstep`	Maximum SCF iterations	\(100\)–\(200\) (default \(100\))

In QE, diagonalization = 'david' is the default and works well for most systems. The 'cg' option is more robust but slower, and is recommended when Davidson fails to converge. The 'ppcg' option (parallel-preconditioned CG) is designed for massively parallel runs where band parallelism is dominant.

Further Reading

Davidson algorithm (original): E. R. Davidson, The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices, J. Comput. Phys. 17, 87 (1975). The foundational paper for subspace iteration with preconditioning.
Adaptation to DFT: D. M. Wood and A. Zunger, A new method for diagonalising large matrices, J. Phys. A 18, 1343 (1985). Adapts the Davidson method to the plane-wave pseudopotential framework.
VASP implementation (Davidson + RMM-DIIS): G. Kresse and J. Furthmüller, Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set, Phys. Rev. B 54, 11169 (1996). Sections III and IV describe the block Davidson and RMM-DIIS implementations in detail, including the hybrid strategy.
Conjugate gradient for DFT: M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias, and J. D. Joannopoulos, Iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients, Rev. Mod. Phys. 64, 1045 (1992). The standard reference for CG-based total energy minimisation.
Linear-scaling methods: S. Goedecker, Linear scaling electronic structure methods, Rev. Mod. Phys. 71, 1085 (1999). Review of \(\mathcal{O}(N)\) approaches that circumvent the cubic scaling wall.

Convergence Diagnostics and Practical Workflow

Reading the SCF Output

A converging SCF calculation produces a characteristic pattern in its output: the total energy decreases monotonically (or nearly so) toward a limiting value, the energy change per iteration \(\Delta E^{(n)} = E^{(n)} - E^{(n-1)}\) decreases geometrically, and the density residual norm \(|R^{(n)}|\) decreases in parallel. Understanding what each output quantity means is essential for diagnosing problems.

Total energy. The KS total energy \(E^{(n)}\) at iteration \(n\) is not variational with respect to the density — it is only variational at self-consistency. During the SCF cycle, \(E^{(n)}\) can increase between iterations (especially in the first few steps when the density is far from self-consistent). A sustained increase after \(\sim 10\) iterations is a sign of divergence.

The Harris–Foulkes energy. An alternative energy estimate that is variational at each step. It uses the input density \(\rho_{\rm in}^{(n)}\) rather than the self-consistent density to evaluate the XC and Hartree contributions, giving an upper bound to the true ground-state energy. The difference between the Harris–Foulkes energy and the KS energy provides a measure of how far the current density is from self-consistency. When both agree to within the convergence threshold, the SCF cycle has converged.

Energy change \(\Delta E\). This is the primary convergence indicator. A converged calculation shows \(|\Delta E|\) decreasing by roughly one order of magnitude every \(2\)–\(5\) iterations (for Pulay/Broyden mixing). If \(|\Delta E|\) stalls at a plateau or oscillates, see the diagnostic table in the Part II overview.

Density residual. Some codes report the norm of the density residual \(|R^{(n)}| = |\rho_{\rm out}^{(n)} - \rho_{\rm in}^{(n)}|\) or the RMS change in the charge density. This should decrease in tandem with \(|\Delta E|\). A large residual at apparent energy convergence indicates that the energy is accidentally stationary (a saddle point of the SCF map) rather than truly self-consistent — increase the maximum number of SCF iterations and tighten the convergence threshold.

Convergence Thresholds

The required SCF convergence depends on the property being computed:

Target property	Required energy convergence	Rationale
Qualitative band structure	\(10^{-4}\) eV	Band positions change by \(\leq 10\) meV
Total energy differences	\(10^{-6}\) eV	Formation energies accurate to \(\sim 1\) meV/atom
Magnetic moments	\(10^{-6}\) eV	Moment stable to \(\leq 0.01,\mu_B\)
Forces (geometry optimisation)	\(10^{-7}\) eV	Forces accurate to \(\sim 1\) meV/Å
Stress tensor / elastic constants	\(10^{-8}\) eV	Stress accurate to \(\sim 0.1\) kbar
Phonons (DFPT / finite differences)	\(10^{-8}\) eV	Force constants require tight force convergence

Why tighter is not always better. Beyond a certain point, tightening the SCF threshold does not improve results because other sources of error dominate: finite \(\mathbf{k}\)-mesh, finite \(E_{\rm cut}\), pseudopotential transferability, and the XC functional approximation itself. A calculation converged to \(10^{-10}\) eV is not more accurate than one converged to \(10^{-7}\) eV if the \(\mathbf{k}\)-mesh introduces \(1,\)meV/atom errors. Additionally, round-off errors in double-precision arithmetic set a floor at \(\sim 10^{-12}\)–\(10^{-14}\) eV for typical system sizes.

Common Failure Modes and Remedies

The Part II overview provided a quick-reference table. Here we expand the most important failure modes with their underlying causes and systematic remedies.

Failure 1: Total energy oscillates without decreasing. The density is sloshing between iterations — the mixing parameter \(\alpha\) is too large for the system’s dielectric response (Chapter 8, Section 8.1). Reduce \(\alpha\) by a factor of \(2\)–\(5\). If Kerker preconditioning is not active, enable it. If the system is a metal or small-gap semiconductor, ensure that an appropriate smearing is used (Section 9.1) — sharp occupations exacerbate charge sloshing by making the density response discontinuous.

Failure 2: SCF converges slowly then stalls. The DIIS/Broyden subspace is contaminated by poor early iterates. Increase the number of initial non-DIIS steps (simple mixing with small \(\alpha\)) to bring the density into the basin of convergence before the subspace extrapolation takes over. Alternatively, increase the subspace size to allow more history, or clear the subspace and restart the mixing.

Failure 3: Spin channels oscillate out of phase. In spin-polarised calculations, the up and down densities can trade magnitude between iterations, producing an apparent antiferromagnetic oscillation that prevents convergence. This usually indicates either an incorrect initial magnetic configuration or a genuine competition between ferromagnetic and antiferromagnetic solutions. Fix the initial moments to match the expected ground state. Use a more robust diagonaliser (Davidson rather than RMM-DIIS) for the first \(\sim 10\) SCF steps to prevent subspace collapse from coupling to the spin oscillation. If both FM and AFM solutions converge, compare their total energies to determine the ground state.

Failure 4: Eigenvalue solver converges to wrong states. RMM-DIIS has lost track of one or more bands due to subspace collapse. The symptom is a sudden jump in the total energy mid-SCF, often accompanied by a change in the number of occupied states or the magnetic moment. Switch to Davidson for the problematic calculation. If the problem persists, increase the number of bands to provide a larger buffer between occupied and unoccupied states.

Failure 5: Energy converged but forces are noisy. The SCF threshold is too loose for the force accuracy required. Forces depend on the density through the Hellmann–Feynman theorem (Chapter 10), and small density errors that do not affect the total energy can produce significant force errors. Tighten the energy convergence by one to two orders of magnitude. Also check that the smearing is appropriate — tetrahedron method forces contain discontinuities for metals (Section 9.1).

Worked Example: SCF Convergence for bcc Fe

To tie the chapter together, consider the SCF convergence of ferromagnetic bcc Fe — a system that exercises every piece of machinery developed in Chapters 8 and 9.

System: bcc Fe, \(a = 2.87,\)Å, \(1\)-atom primitive cell, ferromagnetic with initial moment \(5.0,\mu_B\), \(E_{\rm cut} = 500,\)eV, \(12 \times 12 \times 12\) \(\mathbf{k}\)-mesh.

Smearing: Methfessel–Paxton \(N = 1\) with \(\sigma = 0.2,\)eV — this is a metal, so the tetrahedron method would give discontinuous forces, and Gaussian smearing would require the \(\sigma \to 0\) extrapolation.

Mixing: Pulay/DIIS with Kerker preconditioning (\(\alpha = 0.05\), \(k_0 \sim 1.5,\)Å\(^{-1}\)) — a small \(\alpha\) because Fe is a magnetic metal with both charge-sloshing and spin-oscillation risks.

Diagonalisation: Davidson for the first \(5\) steps to establish a clean eigenspace, then RMM-DIIS for speed.

Expected convergence behaviour:

Iterations \(1\)–\(5\): The total energy drops rapidly (by \(\sim 1\) eV) as the initial atomic superposition density relaxes toward the crystalline density. The magnetic moment adjusts from \(5.0,\mu_B\) toward the converged value of \(\sim 2.2,\mu_B\). Davidson diagonalisation ensures the correct band ordering is established.

Iterations \(5\)–\(15\): The energy change decreases geometrically, roughly one order of magnitude every \(3\)–\(4\) iterations. The DIIS extrapolation begins to accelerate convergence. The switch to RMM-DIIS at iteration \(5\) speeds up each step by \(\sim 40%\).

Iterations \(15\)–\(25\): Final convergence to \(10^{-7}\) eV. The magnetic moment is stable at \(2.17\)–\(2.22,\mu_B\) (depending on the functional). The entropy correction \(TS \approx 0.3,\)meV/atom confirms that \(\sigma = 0.2,\)eV is acceptable.

What would go wrong with poor settings: With simple mixing (\(\alpha = 0.3\), no Kerker), the same system would oscillate for \(\gt 200\) iterations without converging — the charge-sloshing modes at small \(\mathbf{G}\) are barely damped. With Gaussian smearing (rather than MP \(N=1\)), the total energy would converge but the \(\sigma \to 0\) extrapolation would introduce an additional \(\sim 5,\)meV error. With RMM-DIIS from the start (no initial Davidson), there is a risk of subspace collapse in the \(d\)-bands, which are nearly five-fold degenerate and prone to mixing.

Code Notes: VASP and Quantum ESPRESSO Parameters — Full Reference

VASP — Complete SCF parameter list:

Parameter	Controls	Recommended	Notes
`ALGO`	Diagonalisation	`Fast` (default)	`All` for difficult systems
`NELM`	Max SCF iterations	\(60\)–\(200\)	Increase for slabs, defects
`NELMIN`	Min SCF iterations	\(2\)–\(6\)	Prevents premature exit
`NELMDL`	Initial non-DIIS steps	\(-5\) to \(-12\)	Negative = auto
`EDIFF`	Energy convergence (eV)	\(10^{-6}\)–\(10^{-8}\)	Match to target property
`ISMEAR`	Smearing method	\(1\) (MP) for metals; \(-5\) (tet) for DOS	See Section 9.1
`SIGMA`	Smearing width (eV)	\(0.1\)–\(0.2\)	Check `EENTRO` \(\lt 1\) meV/atom
`AMIX`	Charge mixing \(\alpha\)	\(0.02\)–\(0.4\)	Small for metals
`BMIX`	Kerker \(k_0\)	\(0.5\)–\(3.0\)	Enable for metals
`AMIX_MAG`	Spin mixing \(\alpha\)	\(2\)–\(4 \times\) `AMIX`	Often larger than charge mixing
`BMIX_MAG`	Kerker for spin	\(\sim\) `BMIX`
`MAXMIX`	Mixing history size	\(20\)–\(60\)	Increase for slabs
`IMIX`	Mixing scheme	\(4\) (Broyden, default)
`NBANDS`	Number of bands	Auto + padding	Increase for metals
`LREAL`	Real-space projection	`.FALSE.` for small cells	`.TRUE.` for \(\gt 20\) atoms

Quantum ESPRESSO — Complete SCF parameter list:

Parameter	Controls	Recommended	Notes
`diagonalization`	Algorithm	`'david'` (default)	`'cg'` for robustness
`electron_maxstep`	Max SCF iterations	\(100\)–\(200\)
`conv_thr`	Energy convergence (Ry)	\(10^{-8}\)–\(10^{-10}\)	Note: Ry units
`occupations`	Occupation scheme	`'smearing'` or `'tetrahedra_opt'`
`smearing`	Smearing function	`'methfessel-paxton'` for metals
`degauss`	Smearing width (Ry)	\(0.005\)–\(0.02\)	\(1,\)Ry \(= 13.6,\)eV
`mixing_beta`	Mixing parameter	\(0.1\)–\(0.7\)	Small for metals
`mixing_mode`	Preconditioning	`'local-TF'` for metals
`mixing_ndim`	Mixing history	\(4\)–\(12\)
`nbnd`	Number of bands	Auto	Increase for metals
`diago_david_ndim`	Davidson subspace	\(2\)–\(8\)

Outlook

Chapters 8 and 9 together provide the complete numerical machinery for solving the Kohn–Sham equations self-consistently: density mixing drives the outer SCF loop to convergence (Chapter 8), while smearing regularises the BZ integrals and iterative diagonalisation makes the eigenvalue problem tractable (this chapter). With a converged, self-consistent ground-state density in hand, the next step is to extract physical observables.

The most immediate observable beyond the total energy is the force on each atom — the negative gradient of the total energy with respect to atomic positions. The Hellmann–Feynman theorem provides an exact expression for this force, but its practical evaluation requires careful treatment of the Pulay corrections that arise when the basis set depends on atomic positions. Forces, in turn, enable geometry optimisation (finding the equilibrium structure) and ab initio molecular dynamics (following the time evolution of the nuclei on the DFT potential energy surface). These topics are the subject of the next chapter.

Examination Questions and Answer Hints

This chapter collects representative examination questions drawn from all parts of the course, pitched at the M.Sc. level. Each question is followed by a structured Answer Hint — not a model answer, but a map of the concepts, logical steps, and equations an examiner would expect to see. Working through these before an examination is most effective when you attempt the question before reading the hint.

The questions are grouped by chapter and marked with an estimated difficulty: [S] short answer (5–10 min), [M] medium (15–20 min), [L] long / proof-based (30+ min).

Tip

Note on prerequisites. Q1.3 (variational He atom) and Q2.1 (HK proof) both rely on the Rayleigh–Ritz variational principle. These questions assume familiarity with the variational theorem: for any normalised trial state \(|\tilde\Psi\rangle\), the expectation value of the Hamiltonian satisfies \(\langle\tilde\Psi|\hat{H}|\tilde\Psi\rangle \geq E_0\), with equality iff \(|\tilde\Psi\rangle\) is the true ground state. This result underpins both the justification for DFT energy minimisation and the proof of the HK theorems, and is a thread that runs through the entire course.

Part I: Theoretical Foundations

Chapter 1 — The Many-Body Problem

Q1.1 [S] Write down the full many-body Hamiltonian for a system of \(N\) electrons and \(M\) nuclei. Identify each term and state which terms are neglected under the Born–Oppenheimer approximation. Why is this approximation justified?

**Answer Hint.** The Hamiltonian has five terms: nuclear kinetic energy \\(\hat{T}_n\\), electronic kinetic energy \\(\hat{T}_e\\), electron–electron repulsion \\(\hat{V}_{ee}\\), nuclear–nuclear repulsion \\(\hat{V}_{nn}\\), and electron–nuclear attraction \\(\hat{V}_{en}\\). Under Born–Oppenheimer, \\(\hat{T}_n\\) is neglected and \\(\hat{V}_{nn}\\) becomes a constant. Justification: nuclei are \\(\sim 10^3\\)–\\(10^5\\) times heavier than electrons, so they move on a far slower timescale. Electrons adiabatically follow the nuclear configuration, decoupling the two equations.

Q1.2 [S] Why does the number of variables in \(\Psi(r_1, r_2, \ldots, r_N)\) scale as \(3N\) rather than 3? What practical consequence does this have for \(N = 100\)?

**Answer Hint.** Each electron has three spatial coordinates; the wavefunction lives in \\(3N\\)-dimensional configuration space. For \\(N=100\\), storing \\(\Psi\\) on a grid of even 10 points per dimension requires \\(10^{300}\\) numbers — vastly more than the estimated \\(10^{80}\\) atoms in the observable universe. This exponential wall motivates the move to density-based (DFT) or other reduced descriptions.

Q1.3 [M] Use a hydrogenic trial wavefunction \(\phi(\mathbf{r}) = (\alpha^3/\pi)^{1/2} e^{-\alpha r}\) to estimate the ground-state energy of helium variationally. Find the optimal \(\alpha_{\rm opt}\), compute \(E(\alpha_{\rm opt})\), and compare to the Hartree–Fock result (\(-77.9\) eV) and experiment (\(-79.0\) eV). What physics is missing from this trial function?

**Answer Hint.**

Setup: Write the He Hamiltonian as \(\hat{H} = \hat{h}1 + \hat{h}2 + \hat{V}{ee}\), where \(\hat{h}i = -\tfrac{1}{2}\nabla_i^2 - Z/r_i\) (\(Z=2\)) and \(\hat{V}{ee} = 1/r{12}\). Use the product ansatz \(\Phi(\mathbf{r}_1, \mathbf{r}_2) = \phi(\mathbf{r}_1)\phi(\mathbf{r}_2)\) — two electrons in the same hydrogenic orbital with variational exponent \(\alpha\).

Three expectation values:

Kinetic energy per electron: \(\langle\phi|-\tfrac{1}{2}\nabla^2|\phi\rangle = \alpha^2/2\).
Electron–nuclear attraction: \(\langle\phi|-Z/r|\phi\rangle = -Z\alpha\).
Electron–electron repulsion (the key integral): for two hydrogenic densities, the standard result is \(\langle \hat{V}_{ee}\rangle = 5\alpha/8\).

Total energy:

\[ E(\alpha) = \alpha^2 - 2Z\alpha + \frac{5\alpha}{8} = \alpha^2 - \left(2Z - \frac{5}{8}\right)\alpha. \]

Minimisation: \(\partial E/\partial\alpha = 0\) gives

\[ \alpha_{\rm opt} = Z - \frac{5}{16} = 2 - \frac{5}{16} = \frac{27}{16} \approx 1.6875. \]

The physical interpretation: \(\alpha_{\rm opt} < Z = 2\) because each electron partially screens the nucleus from the other, reducing the effective nuclear charge it feels.

Optimal energy:

\[ E(\alpha_{\rm opt}) = -\left(Z - \frac{5}{16}\right)^2 = -\left(\frac{27}{16}\right)^2 \approx -2.848 \text{ Ha} \approx -77.5 \text{ eV}. \]

Comparison: This is above the HF limit (\(-77.9\) eV) and experiment (\(-79.0\) eV), consistent with the variational principle — no approximation can go below the true ground state. The remaining \(\sim 1.1\) eV gap from experiment is electron correlation: the instantaneous \(r_{12}\)-dependence of the wavefunction, absent in any single-product ansatz. This is precisely the correlation energy \(E_c\) that DFT’s \(E_{\rm xc}\) must capture.

Chapter 2 — Hohenberg–Kohn Theorems

Q2.1 [L] State and prove the First Hohenberg–Kohn Theorem. Your proof must clearly identify where the Rayleigh–Ritz variational principle is used.

**Answer Hint.**

Statement: The external potential \(V_{\rm ext}(\mathbf{r})\) is determined uniquely (up to a constant) by the ground-state electron density \(\rho_0(\mathbf{r})\).

Proof structure (by contradiction): Assume two potentials \(V\) and \(V’\) (differing by more than a constant) give the same \(\rho_0\). They define two Hamiltonians \(\hat{H}\), \(\hat{H}‘\) with ground states \(\Psi_0\), \(\Psi_0’\) and energies \(E_0\), \(E_0’\). Apply the variational principle using \(\Psi_0’\) as a trial state for \(\hat{H}\):

\[ E_0 < \langle \Psi_0’ | \hat{H} | \Psi_0’ \rangle = E_0’ + \int [V - V’]\rho_0, d\mathbf{r}. \]

By symmetry (swap primed and unprimed):

\[ E_0’ < E_0 + \int [V’ - V]\rho_0, d\mathbf{r}. \]

Adding gives \(E_0 + E_0’ < E_0 + E_0’\) — a contradiction. Note: the strict inequality requires the ground states to be non-degenerate; address this subtlety if asked.

Q2.2 [M] State the Second Hohenberg–Kohn Theorem and prove it using the result of the First. What role does \(v\)-representability play?

**Answer Hint.**

Statement: The true ground-state density \(\rho_0\) minimises the total energy functional \(E[\rho] = F[\rho] + \int V_{\rm ext}\rho, d\mathbf{r}\) among all \(v\)-representable densities.

Proof: By HK1, every trial density \(\tilde\rho \neq \rho_0\) maps to a unique \(\tilde\Psi \neq \Psi_0\). Since \(\Psi_0\) is the true ground state, the variational principle gives \(E_0 = \langle\Psi_0|\hat{H}|\Psi_0\rangle \leq \langle\tilde\Psi|\hat{H}|\tilde\Psi\rangle = E[\tilde\rho]\), with equality iff \(\tilde\rho = \rho_0\).

\(v\)-representability: The proof requires \(\tilde\rho\) to be the ground-state density of some external potential. Not all non-negative, \(N\)-normalised densities satisfy this condition — it is not trivially obvious which densities are physically realisable as ground states. Lieb’s constrained-search (Legendre-transform) formulation defines \(F[\rho]\) via

\[ F[\rho] = \inf_{\Psi \to \rho} \langle\Psi|\hat{T}+\hat{V}_{ee}|\Psi\rangle, \]

a minimisation over all antisymmetric \(\Psi\) yielding density \(\rho\). This is well-defined for any \(N\)-representable \(\rho\) (a much larger class), sidestepping the \(v\)-representability restriction entirely. The KS scheme then requires the weaker condition of non-interacting \(v\)-representability: the ground-state density of the interacting system must also be achievable as the ground-state density of some non-interacting system in a local potential. This is assumed throughout KS theory and is believed to hold for all physical ground-state densities, but remains unproven in general.

Q2.3 [S] What is the universal functional \(F[\rho]\)? Why is it called “universal,” and why is computing it accurately the central challenge of DFT?

**Answer Hint.** \\(F[\rho] = \langle\Psi[\rho]|\hat{T}+\hat{V}_{ee}|\Psi[\rho]\rangle\\). It is "universal" because it depends only on the density, not on \\(V_{\rm ext}\\) — it is the same functional for all systems. Computing it is hard because it encodes all many-body kinetic and correlation effects. In particular, the kinetic energy \\(T[\rho]\\) as a functional of density alone (as in Thomas–Fermi theory) is highly inaccurate; this motivates the Kohn–Sham trick of introducing orbitals.

Q2.4 [S] A student claims: “The HK theorem tells us DFT is exact, so any DFT calculation gives the exact answer.” Identify two reasons why this claim is wrong.

**Answer Hint.** (i) The HK theorems guarantee *existence* of a universal functional, not its explicit form. The exchange-correlation functional \\(E_{\rm xc}[\rho]\\) must be approximated in every practical calculation; LDA, GGA etc. introduce uncontrolled errors. (ii) The HK theorems apply only to the *ground state* at zero temperature. Excited states, finite temperature, and time-dependent processes require extensions (TDDFT, finite-temperature DFT).

Chapter 3 — Kohn–Sham Equations

Q3.1 [M] Derive the Kohn–Sham equations from the variational minimisation of the total energy functional. Clearly define the Kohn–Sham effective potential \(V_{\rm eff}\).

**Answer Hint.** Start from

\[ E[\rho] = T_s[\rho] + E_{\rm H}[\rho] + E_{\rm xc}[\rho] + \int V_{\rm ext}\rho, d\mathbf{r}. \]

Introduce \(T_s\) as an explicit orbital functional: \(T_s = \sum_i f_i \langle\phi_i|-\tfrac{1}{2}\nabla^2|\phi_i\rangle\), with \(\rho(\mathbf{r}) = \sum_i f_i |\phi_i(\mathbf{r})|^2\). Impose orthonormality \(\langle\phi_i|\phi_j\rangle = \delta_{ij}\) via Lagrange multipliers \(\epsilon_{ij}\). The stationarity condition \(\delta\mathcal{L}/\delta\phi_i^* = 0\) gives

\[ -\frac{1}{2}\nabla^2\phi_i + \frac{\delta(E_{\rm H}+E_{\rm xc})}{\delta\rho},\phi_i + V_{\rm ext}\phi_i = \sum_j \epsilon_{ij}\phi_j. \]

The functional derivatives are: \(\delta E_{\rm H}/\delta\rho = V_{\rm H}(\mathbf{r}) = \int\rho(\mathbf{r}‘)/|\mathbf{r}-\mathbf{r}’|, d\mathbf{r}’\) and \(\delta E_{\rm xc}/\delta\rho = V_{\rm xc}(\mathbf{r})\). Choosing the unitary transformation that diagonalises the multiplier matrix \(\epsilon_{ij} \to \epsilon_i \delta_{ij}\) (canonical KS orbitals) yields the KS equations in their standard form:

\[ \left[-\frac{1}{2}\nabla^2 + V_{\rm eff}(\mathbf{r})\right]\phi_i(\mathbf{r}) = \epsilon_i,\phi_i(\mathbf{r}), \qquad V_{\rm eff} = V_{\rm ext} + V_{\rm H} + V_{\rm xc}. \]

Note: \(V_{\rm eff}\) depends on \(\rho\) which depends on the \(\phi_i\) — the equations are nonlinear and must be solved self-consistently.

Q3.2 [S] The KS scheme introduces a fictitious non-interacting system. In what precise sense is this system “fictitious,” and what physical quantity does it share with the true interacting system?

**Answer Hint.** The non-interacting system has no electron–electron interaction; it moves in an effective one-body potential \\(V_{\rm eff}\\) that is designed so that its ground-state density equals that of the true interacting system. The *density* \\(\rho(\mathbf{r})\\) is exact (by construction); the individual KS eigenvalues \\(\epsilon_i\\) and orbitals \\(\phi_i\\) are *not* observables of the real system (with the notable exception of the highest occupied eigenvalue, which equals the negative of the ionisation energy under the exact XC functional).

Q3.3 [S] Write out the self-consistent field (SCF) cycle for solving the KS equations. What quantity is converged, and why is iterative solution necessary?

**Answer Hint.** The SCF loop: (1) Start with an initial guess \\(\rho^{(0)}\\). (2) Construct \\(V_{\rm eff}[\rho]\\). (3) Solve the KS eigenvalue problem to get \\(\{\phi_i, \epsilon_i\}\\). (4) Compute the output density \\(\rho^{\rm out} = \sum_i f_i |\phi_i|^2\\). (5) Mix \\(\rho^{\rm in}\\) and \\(\rho^{\rm out}\\) to form the new input. Repeat until \\(|\rho^{\rm out} - \rho^{\rm in}| \lt \epsilon_{\rm tol}\\). Iteration is necessary because \\(V_{\rm eff}\\) depends on \\(\rho\\), which in turn depends on the eigenstates of \\(V_{\rm eff}\\) — a nonlinear self-consistency condition.

Chapter 4 — Exchange-Correlation Functionals

Q4.1 [S] Write down the LDA expression for \(E_{\rm xc}\) and state the physical system from which \(\varepsilon_{\rm xc}^{\rm UEG}(\rho)\) is derived.

**Answer Hint.**

\[ E_{\rm xc}^{\rm LDA}[\rho] = \int \varepsilon_{\rm xc}^{\rm UEG}(\rho(\mathbf{r})),\rho(\mathbf{r}), d\mathbf{r}. \]

\(\varepsilon_{\rm xc}^{\rm UEG}\) is the XC energy per particle of the uniform electron gas (UEG), a model of interacting electrons in a uniform positive background. The exchange part is analytic (Dirac): \(\varepsilon_{\rm x}^{\rm UEG} = -(3/4)(3/\pi)^{1/3}\rho^{1/3}\). The correlation part is obtained from quantum Monte Carlo simulations (Ceperley–Alder data, parameterised by Perdew–Wang or Vosko–Wilk–Nusair).

Q4.2 [M] Explain why LDA often gives surprisingly good results despite being based on the uniform electron gas. What is the role of the XC hole sum rule?

**Answer Hint.** The XC energy depends on the *spherically averaged* XC hole \\(\bar{n}_{\rm xc}(r)\\), not its angular details. The LDA hole (borrowed from the UEG) satisfies the exact sum rule \\(\int n_{\rm xc}(\mathbf{r},\mathbf{r}')\, d\mathbf{r}' = -1\\) by construction, so its spherical average integrates correctly. Even if the shape of the hole is wrong in detail, this constraint produces systematic error cancellation. The overbinding tendency of LDA (too negative \\(E_{\rm xc}\\)) is a known and quantifiable error rather than random noise.

Q4.3 [M] Describe the GGA improvement over LDA. Why does the raw gradient expansion approximation (GEA) fail, and how does PBE correct for this?

**Answer Hint.** GEA adds a correction proportional to \\(|\nabla\rho|^2/\rho^{4/3}\\) (the dimensionless reduced gradient \\(s^2\\)), but it violates the XC hole sum rule in the density tails (where \\(s \to \infty\\)) — the hole becomes positive and the \\(-1\\) sum rule is broken.

PBE uses an enhancement factor form \(E_{\rm xc}^{\rm GGA} = \int \varepsilon_{\rm xc}^{\rm UEG}(\rho) F_{\rm xc}(\rho, s)\rho, d\mathbf{r}\) and determines \(F_{\rm xc}\) by imposing exact constraints: recovery of LDA at \(s=0\), correct GEA gradient coefficient near \(s=0\), saturation at large \(s\) to restore the sum rule, and the Lieb–Oxford bound. PBE has zero empirical parameters.

Q4.4 [S] What is Jacob’s Ladder in the context of XC functionals? Name one functional at each of the first four rungs and the key new ingredient added at each rung.

**Answer Hint.**

Rung	Ingredient added	Example
LDA	UEG energy density \(\rho\)	PW92
GGA	Reduced gradient \(s =	\nabla\rho
meta-GGA	Kinetic energy density \(\tau\) or \(\nabla^2\rho\)	SCAN
Hybrid	Fraction of exact (HF) exchange	PBE0, HSE06

Double hybrids add perturbative correlation (GL2). Each rung adds non-locality of a different kind.

Q4.5 [S] What is the self-interaction error (SIE) in DFT, and why does it not appear in Hartree–Fock theory?

**Answer Hint.** In exact theory, the Hartree self-repulsion of each electron \\(E_{\rm H}[\rho_i]\\) is exactly cancelled by a corresponding exchange term. In DFT with approximate \\(E_{\rm xc}\\), this cancellation is incomplete, leaving a spurious self-repulsion. Consequence: delocalisation error, incorrect dissociation of \\(H_2^+\\), underestimated barriers. In Hartree–Fock, the exact exchange integral \\(\langle ij|ji\rangle\\) cancels the self-interaction in \\(E_{\rm H}\\) exactly for each orbital, by construction.

Q4.6 [S] A DFT+PBE calculation of a van der Waals bonded molecular crystal (e.g. graphite or a layered organic solid) predicts a near-zero interlayer binding energy. Explain precisely why PBE fails here, and describe two physically motivated remedies.

**Answer Hint.** London dispersion forces arise from *non-local, correlated density fluctuations* at large inter-fragment separations \\(R\\), producing a \\(-C_6/R^6\\) attraction. PBE is semi-local: \\(E_{\rm xc}^{\rm PBE}\\) at point \\(\mathbf{r}\\) depends only on \\(\rho\\) and \\(\nabla\rho\\) at that same point. It cannot describe the instantaneous dipole–dipole correlations between distant fragments; the interaction energy decays exponentially rather than as \\(R^{-6}\\). The GEA and its constrained resummations cannot cure this — it is a fundamental non-locality, not a gradient correction problem.

Remedies:

DFT-D3(BJ) (Grimme et al., 2010): Add a pairwise atom–atom correction \(-C_6^{AB}/r_{AB}^6 - C_8^{AB}/r_{AB}^8\) with Becke–Johnson damping. Cheap, widely implemented, recommended default for molecular crystals. Invoke in VASP via IVDW = 12.
Non-local correlation functional (vdW-DF / VV10): Replace \(E_c^{\rm GGA}\) with a functional that includes a double spatial integral coupling densities at different points \(\mathbf{r}\) and \(\mathbf{r}’\). Physically includes the \(C_6/R^6\) tail without empirical atom parameters. Computationally more expensive but more transferable, especially for heterogeneous systems.

The key point for the examiner: the failure is not due to missing gradient information (adding \(\nabla\rho\) does not help) but due to missing non-local correlation.

Q4.7 [S] For each of the following materials, state the most appropriate class of XC functional and justify your choice: (a) bcc iron; (b) a hybrid organic–inorganic perovskite with a band gap of \(\sim 1.6\) eV; (c) a weakly correlated simple metal (Al); (d) a Mott insulator (VO\(_2\)).

**Answer Hint.**

(a) bcc Fe — spin-polarised GGA (PBE or PBEsol): Fe is a transition metal with itinerant magnetism; GGA captures the exchange splitting adequately, and the \(3d\) states are not as strongly correlated as in oxides. DFT+U is sometimes added for oxides but is not standard for pure Fe.

(b) Hybrid perovskite — range-separated hybrid (HSE06): Band gap is in the semiconductor regime where GGA underestimates by \(\sim 40%\). PBE0 is too expensive for large unit cells; HSE06 provides similar accuracy with tractable cost by limiting exact exchange to short range. DFT-D3 should also be added for vdW stacking in layered perovskites.

(c) Al — LDA or GGA: Al is a nearly-free-electron metal; its electron density is close to uniform, so LDA’s errors cancel well. PBE slightly overbinds but both work. Hybrids add cost with no accuracy benefit for nearly free-electron metals.

(d) VO\(_2\) — DFT+U or hybrid (HSE06): V \(3d\) states are localised and GGA fails to reproduce the Mott gap. DFT+U with \(U_{\rm eff} \sim 3\)–\(4\) eV on V \(d\) is the standard workhorse; HSE06 is an alternative if a parameter-free approach is needed but at higher cost.

Chapter 5 — Basis Sets and Pseudopotentials

Q5.1 [S] What is the kinetic energy cutoff \(E_{\rm cut}\) in a plane-wave calculation? Write the condition that a reciprocal lattice vector \(\mathbf{G}\) must satisfy to be included in the basis.

**Answer Hint.** The plane-wave basis includes all \\(\mathbf{G}\\) satisfying

\[ \frac{1}{2}|\mathbf{k}+\mathbf{G}|^2 \leq E_{\rm cut}. \]

(atomic units; in SI replace \(1/2\) with \(\hbar^2/2m_e\)). Increasing \(E_{\rm cut}\) adds more plane waves and systematically improves the basis — a key advantage over atom-centred bases. VASP quotes \(E_{\rm cut}\) in eV; Quantum ESPRESSO in Ry (1 Ry = 13.6 eV).

Q5.2 [M] Compare norm-conserving pseudopotentials (NCPP), ultrasoft pseudopotentials (USPP), and the PAW method. What physical quantity does norm conservation enforce, and what does relaxing it require?

**Answer Hint.**

NCPP: The pseudo-wavefunction is normalised identically to the all-electron wavefunction inside \(r_c\): \(\int_0^{r_c}|\tilde\phi|^2, dr = \int_0^{r_c}|\phi|^2, dr\). This ensures correct scattering properties (logarithmic derivatives). Requires \(E_{\rm cut} \sim 60\)–\(100\) Ry.
USPP: Relaxes norm conservation, allowing smoother pseudo-wavefunctions. Missing charge is compensated by augmentation charges added to the density. Requires only \(\sim 25\)–\(40\) Ry.
PAW: Formally equivalent to an all-electron calculation via a linear transformation \(\hat{\mathcal{T}}\) that maps smooth pseudo-wavefunctions onto full oscillatory all-electron wavefunctions inside augmentation spheres. Provides access to the full all-electron density. \(E_{\rm cut} \sim 400\)–\(600\) eV for GGA+PAW. Default in VASP.

Q5.3 [S] A student runs a calculation doubling the plane-wave cutoff but finds no change in the total energy. What does this confirm, and what should be checked next?

**Answer Hint.** It confirms that the total energy is converged with respect to \\(E_{\rm cut}\\) — the basis is complete enough for this property. Next convergence checks: (1) \\(k\\)-point mesh density (especially for metals); (2) supercell size if periodic images of a defect or molecule are a concern; (3) SCF convergence threshold to ensure the self-consistency loop has actually converged; (4) force convergence for structural relaxation.

Q5.4 [S] What is basis set superposition error (BSSE), and in which type of basis does it arise? How can it be corrected?

**Answer Hint.** BSSE arises in **atom-centred (localised) basis sets**: when two atoms approach, each atom borrows basis functions from the other's basis, artificially lowering the total energy and overestimating the binding. It does *not* occur in plane-wave bases because the basis is not atom-centred. Correction: the **Boys–Bernardi counterpoise correction** — compute the energy of each fragment using the full dimer basis and subtract the fragment-only energies.

Chapter 6 — Spin-Polarised DFT

Q6.1 [S] Why must the electron density be spin-resolved \(\rho_\uparrow(\mathbf{r}), \rho_\downarrow(\mathbf{r})\) for magnetic systems? What goes wrong if spin polarisation is neglected for, say, iron?

**Answer Hint.** In a magnetic system, spin-up and spin-down electrons experience different effective potentials due to exchange splitting. A spin-unpolarised calculation forces \\(\rho_\uparrow = \rho_\downarrow\\) at every point, preventing the system from lowering its energy by spin polarisation — it constraints the variational search to a subspace that excludes the magnetic ground state. For bcc Fe, the spin-unpolarised calculation gives the wrong crystal structure (predicts fcc, because GGA without magnetism favours the more closely-packed fcc minimum) and zero magnetic moment; spin-polarised LDA/GGA correctly gives bcc with \\(\sim 2.2\,\mu_B\\) per atom and the correct cohesive energy.

Q6.2 [S] Define the spin magnetisation density \(m(\mathbf{r})\) and the local spin density approximation (LSDA). How does LSDA reduce to LDA in the non-magnetic limit?

**Answer Hint.** \\(m(\mathbf{r}) = \rho_\uparrow(\mathbf{r}) - \rho_\downarrow(\mathbf{r})\\). LSDA:

\[ E_{\rm xc}^{\rm LSDA}[\rho_\uparrow,\rho_\downarrow] = \int \varepsilon_{\rm xc}^{\rm UEG}(\rho_\uparrow(\mathbf{r}),\rho_\downarrow(\mathbf{r})),\rho(\mathbf{r}), d\mathbf{r}, \]

where \(\varepsilon_{\rm xc}^{\rm UEG}(\rho_\uparrow, \rho_\downarrow)\) is the XC energy per particle of the spin-polarised UEG, parameterised from quantum Monte Carlo (Ceperley–Alder). The spin-polarisation enters through the relative spin polarisation \(\zeta = (\rho_\uparrow - \rho_\downarrow)/\rho\), which interpolates between the unpolarised (\(\zeta=0\)) and fully polarised (\(\zeta=1\)) limits via a spin-scaling relation due to von Barth and Hedin. When \(\rho_\uparrow = \rho_\downarrow = \rho/2\) (non-magnetic limit), \(\zeta = 0\) and \(\varepsilon_{\rm xc}^{\rm UEG}\) reduces to the unpolarised UEG value — recovering LDA exactly.

Q6.3 [M] Derive the spin-dependent Kohn–Sham equations for collinear spin-polarised DFT. Show explicitly that the exchange splitting of eigenvalues arises from \(V_{\rm xc}^\uparrow \neq V_{\rm xc}^\downarrow\), and write down what \(V_{\rm xc}^\sigma\) is in LSDA.

**Answer Hint.**

Starting point: In collinear SDFT, the total energy is a functional of both spin densities:

\[ E[\rho_\uparrow,\rho_\downarrow] = T_s[\rho_\uparrow,\rho_\downarrow] + E_{\rm H}[\rho] + E_{\rm xc}[\rho_\uparrow,\rho_\downarrow] + \int V_{\rm ext}(\mathbf{r}),\rho(\mathbf{r}),d\mathbf{r}. \]

Variational minimisation with respect to \(\phi_{i\sigma}^\)* (subject to orthonormality in each spin channel) gives two independent sets of KS equations — one per spin channel \(\sigma \in {\uparrow,\downarrow}\):

\[ \left[-\frac{1}{2}\nabla^2 + V_{\rm eff}^\sigma(\mathbf{r})\right]\phi_{i\sigma}(\mathbf{r}) = \epsilon_{i\sigma},\phi_{i\sigma}(\mathbf{r}), \]

with the spin-dependent effective potential

\[ V_{\rm eff}^\sigma = V_{\rm ext} + V_{\rm H}[\rho] + V_{\rm xc}^\sigma[\rho_\uparrow, \rho_\downarrow], \]

where \(V_{\rm H}\) depends only on the total density \(\rho = \rho_\uparrow + \rho_\downarrow\) (same for both spins), and

\[ V_{\rm xc}^\sigma(\mathbf{r}) = \frac{\delta E_{\rm xc}[\rho_\uparrow, \rho_\downarrow]}{\delta \rho_\sigma(\mathbf{r})}. \]

Exchange splitting: In LSDA, \(\varepsilon_{\rm xc}^{\rm UEG}\) is an asymmetric function of \((\rho_\uparrow, \rho_\downarrow)\) whenever \(\rho_\uparrow \neq \rho_\downarrow\). Therefore \(\delta E_{\rm xc}/\delta\rho_\uparrow \neq \delta E_{\rm xc}/\delta\rho_\downarrow\), i.e., \(V_{\rm xc}^\uparrow \neq V_{\rm xc}^\downarrow\). This spin-asymmetric potential shifts the spin-up eigenvalues \(\epsilon_{i\uparrow}\) relative to spin-down \(\epsilon_{i\downarrow}\) — the exchange splitting \(\Delta_{\rm xc} = V_{\rm xc}^\uparrow - V_{\rm xc}^\downarrow\). In LSDA explicitly:

\[ V_{\rm xc}^{\sigma,\rm LSDA}(\mathbf{r}) = \varepsilon_{\rm xc}^{\rm UEG}(\rho_\uparrow,\rho_\downarrow) + \rho,\frac{\partial \varepsilon_{\rm xc}^{\rm UEG}}{\partial \rho_\sigma}\bigg|_{\mathbf{r}}. \]

The two equations are coupled only through the shared \(V_{\rm H}[\rho]\) and the LSDA XC potential — they are solved self-consistently together within the same SCF loop.

Chapter 7 — DFT+U

Q7.1 [M] Explain the self-interaction error in the context of strongly correlated systems. Why does standard GGA predict NiO to be a metal rather than a Mott insulator?

**Answer Hint.** In NiO, the Ni \\(3d\\) states are narrow and strongly localised. Standard GGA treats the on-site Coulomb repulsion \\(U\\) among \\(d\\) electrons at the mean-field level, which underestimates the Mott gap. Because the self-interaction error in GGA artificially delocalises the \\(d\\) electrons, the predicted density of states shows \\(d\\) bands straddling the Fermi level — i.e., metallic behaviour — even though experiment shows a gap of \\(\sim 4\\) eV. The DFT+U correction adds an explicit penalty for fractional occupation of the \\(d\\) subspace, driving the occupations toward integer values and opening the gap.

Q7.2 [S] Write the Dudarev DFT+U energy correction and explain the role of the occupation matrix \(\mathbf{n}^{I\sigma}\). What does the term \(\mathrm{Tr}[\mathbf{n}(1-\mathbf{n})]\) penalise?

**Answer Hint.**

\[ E_U^{\rm Dur} = \frac{U_{\rm eff}}{2}\sum_{I,\sigma}\mathrm{Tr}!\left[\mathbf{n}^{I\sigma}(\mathbf{1}-\mathbf{n}^{I\sigma})\right], \]

where \((n^{I\sigma}){mm’} = \sum_i f_i \langle\phi_i|\tilde{p}m^I\rangle\langle\tilde{p}{m’}^I|\phi_i\rangle\) is the occupation of the correlated \(d\) (or \(f\)) subspace on atom \(I\). The trace \(\mathrm{Tr}[\mathbf{n}(1-\mathbf{n})]\) is zero when all eigenvalues of \(\mathbf{n}\) are 0 or 1 (integer occupations) and positive otherwise — so the correction penalises fractional occupations, driving localisation. Only one parameter \(U{\rm eff} = U - J\) is needed, making this formulation simpler than the Liechtenstein approach.

Q7.3 [S] What is the double-counting correction in DFT+U, and why is it necessary? Name the two most common schemes.

**Answer Hint.** The standard GGA already partially includes Coulomb interactions among the correlated electrons through \\(E_{\rm H}\\) and \\(E_{\rm xc}\\). Adding \\(E_U\\) directly would count these interactions twice. The double-counting correction \\(E_{\rm dc}\\) is subtracted to avoid this. Two common schemes: (1) **Fully localised limit (FLL)**, appropriate for strongly correlated insulators; (2) **Around mean-field (AMF)**, appropriate for weakly correlated metals. Unfortunately, the exact form of \\(E_{\rm dc}\\) is not known; this ambiguity is a fundamental limitation of DFT+U.

Part II: Numerical Implementation

Chapter 8 — SCF Convergence and Density Mixing

Q8.1 [S] What is charge sloshing in a metallic DFT calculation? Identify the physical origin of the instability and state which mixing scheme was designed to suppress it.

Answer Hint. In metals, a small perturbation of the density at long wavelength (small \\(\mathbf{G}\\)) produces a large response (diverging dielectric screening as \\(G \to 0\\)). Simple linear mixing \\(\rho^{\rm in}\_{n+1} = \rho^{\rm in}\_n + \alpha(\rho^{\rm out}\_n - \rho^{\rm in}\_n)\\) amplifies these long-wavelength modes: the correction at small \\(G\\) is too large, causing the density to oscillate between iterations. **Kerker preconditioning** suppresses this by multiplying the residual by \\(G^2/(G^2 + k\_0^2)\\), damping corrections at \\(G \lt k\_0\\), where \\(k_0\\) is the Thomas–Fermi screening wavevector.

Q8.2 [M] Describe the Pulay (DIIS) mixing scheme. What quantity is minimised, and how does it use information from previous iterations?

**Answer Hint.** DIIS (Direct Inversion in the Iterative Subspace) stores the last \\(m\\) input densities \\(\{\rho_n^{\rm in}\}\\) and residuals \\(\{R_n = \rho_n^{\rm out} - \rho_n^{\rm in}\}\\). The next input is a linear combination

\[ \rho^{\rm in}{n+1} = \sum{i=1}^m \alpha_i \rho_i^{\rm in} \]

where the coefficients \({\alpha_i}\) minimise \(|\sum_i \alpha_i R_i|^2\) subject to \(\sum_i \alpha_i = 1\). This is solved by a small linear system. The method is quasi-Newton: it builds a model of the Jacobian of the residual from the stored history, achieving superlinear convergence near the fixed point, compared to linear convergence for simple mixing.

Q8.3 [S] A VASP calculation for a metallic slab fails to converge after 200 iterations with AMIX = 0.4. Suggest three modifications to the INCAR and explain the physical reasoning for each.

**Answer Hint.** (1) **Reduce `AMIX`** (e.g. to 0.02–0.1): smaller mixing step damps the sloshing, at the cost of slower convergence per iteration. (2) **Set `BMIX = 1.0`–`3.0`**: enables Kerker preconditioning, directly targeting the long-wavelength instability in the metal. (3) **Increase `MAXMIX`** (e.g. to 40–60): gives Broyden/DIIS a larger history to build a better Jacobian model, which is important for slabs that have many degrees of freedom in the charge distribution. Optionally: switch to `ALGO = All` to use subspace rotation.

Chapter 9 — Brillouin Zone Integration and Smearing

Q9.1 [S] Why do metals require special treatment in Brillouin zone integration? Write the BZ integral for the electron density and explain the origin of the numerical difficulty.

**Answer Hint.** Physical observables involve integrals of the form

\[ \langle A \rangle = \frac{\Omega}{(2\pi)^3}\sum_n \int_{\rm BZ} A_{n\mathbf{k}}, f_{n\mathbf{k}}, d\mathbf{k}, \]

where \(f_{n\mathbf{k}} = \theta(\mu - \epsilon_{n\mathbf{k}})\) is the Fermi–Dirac occupation at \(T=0\). The Heaviside function is discontinuous at the Fermi surface. On a discrete \(k\)-mesh, the Fermi surface may pass between grid points, causing the integral to oscillate wildly with mesh density. This is why insulators converge far faster than metals: there is no Fermi surface discontinuity.

Q9.2 [M] Compare Gaussian smearing, Methfessel–Paxton smearing, and the linear tetrahedron method. For which systems and properties is each recommended?

**Answer Hint.**

Gaussian: Replaces \(\theta(\mu-\epsilon)\) by a Gaussian broadened step. Simple but introduces a \(\sigma\)-dependent error in the free energy; the \(\sigma \to 0\) extrapolation can be unreliable. Use: only for quick tests.
Methfessel–Paxton (MP): Expands the step function in a Hermite polynomial series; the \(N=1\) variant gives a broadened occupation with negative oscillations that cancel the smearing error to higher order. The \(\sigma\to 0\) energy extrapolation is \(E(\sigma) = E_0 + \mathcal{O}(\sigma^{2N+2})\). Use: structural relaxations and total energies of metals with moderate \(\sigma \sim 0.1\)–\(0.2\) eV.
Tetrahedron method (ISMEAR = -5 in VASP): Linear interpolation of band energies between \(k\)-points within each tetrahedron. Exact integration in the limit of a fine mesh; no \(\sigma\) parameter. Use: DOS, optical spectra, and properties requiring high accuracy in the Fermi surface topology. Do not use for structural relaxations (forces are not analytic).

Chapter 10 — Eigenvalue Solvers

Q10.1 [S] Why is the full diagonalisation of the KS Hamiltonian impractical for large systems? What is the computational scaling of full diagonalisation, and how do iterative methods improve on this?

**Answer Hint.** The KS Hamiltonian has dimension \\(N_{\rm PW} \times N_{\rm PW}\\), where \\(N_{\rm PW}\\) grows with the system size and \\(E_{\rm cut}\\). Full diagonalisation (e.g., LAPACK) scales as \\(\mathcal{O}(N_{\rm PW}^3)\\), which becomes prohibitive for \\(N_{\rm PW} \sim 10^5\\)–\\(10^6\\). Iterative methods (Davidson, RMM-DIIS) exploit the fact that only the \\(N_{\rm occ} \ll N_{\rm PW}\\) lowest eigenpairs are needed. They build a small Krylov/Rayleigh–Ritz subspace, diagonalise within it, and refine — achieving \\(\mathcal{O}(N_{\rm PW} \times N_{\rm occ})\\) per iteration.

Q10.2 [M] Describe the Davidson diagonalisation algorithm. What is the subspace expansion step, and how does it avoid recomputing the full Hamiltonian?

**Answer Hint.** Starting from a trial subspace \\(\{|\psi_i\rangle\}\\): (1) Compute the Rayleigh quotient \\(\epsilon_i = \langle\psi_i|\hat{H}|\psi_i\rangle/\langle\psi_i|\psi_i\rangle\\) — only matrix-vector products \\(\hat{H}|\psi_i\rangle\\) are needed, not matrix elements. In a plane-wave code this is done via FFT in \\(\mathcal{O}(N_{\rm PW}\log N_{\rm PW})\\). (2) Compute residuals \\(|r_i\rangle = (\hat{H} - \epsilon_i)|\psi_i\rangle\\). (3) Apply a **preconditioner** \\(\hat{D}^{-1}\\) (diagonal in \\(\mathbf{G}\\)-space: \\(D_{\mathbf{G}} = |\mathbf{k}+\mathbf{G}|^2/2 + \epsilon_0\\)) to obtain search directions that are steepest-descent steps in kinetic-energy-weighted distance. (4) Orthogonalise and add to the subspace; diagonalise the small projected Hamiltonian; iterate. Convergence is quadratic near the solution.

Cross-Cutting Conceptual Questions

Q11.1 [M] A colleague applies DFT+GGA to a cobalt oxide (CoO) surface and reports a magnetic moment of 0 \(\mu_B\) per Co, which contradicts experiment. Walk through the chain of possible failures, from theory choice to numerical implementation.

**Answer Hint.** Work from first principles outward: (1) *Spin polarisation:* Is `ISPIN = 2` set? A spin-unpolarised calculation cannot produce a magnetic moment by construction. (2) *Initial magnetic moments:* Were reasonable initial moments provided (e.g., `MAGMOM`)? DFT can converge to a local minimum (non-magnetic) if started too close to zero. (3) *GGA self-interaction error:* For Co \\(3d\\) states, GGA may underestimate localisation. Try DFT+U with reasonable \\(U_{\rm eff} \sim 3\\)–\\(5\\) eV for Co. (4) *k\\)-mesh:* A coarse mesh for a metallic oxide surface may smear out the magnetic splitting. (5) *Surface vs. bulk:* Surface CoO may have quenched moments; compare to bulk CoO first.

Q11.2 [S] Rank the following functionals in increasing computational cost for a periodic solid with 100 atoms: LDA, PBE, SCAN, PBE0, HSE06. Briefly justify each step.

**Answer Hint.**

LDA \(<\) PBE \(\approx\) SCAN \(<\) HSE06 \(<\) PBE0.

LDA and PBE are semi-local: same cost, \(\mathcal{O}(N^3)\) dominated by diagonalisation.
SCAN is meta-GGA; needs \(\tau(\mathbf{r})\), slightly more expensive but same scaling.
HSE06 includes short-range exact exchange only: the Fock operator is localised in real space (exponential decay), so its evaluation scales \(\sim\mathcal{O}(N)\) for large systems and is tractable for 100-atom cells.
PBE0 includes long-range exact exchange: the Fock operator is non-local, requiring 4-index integrals over all reciprocal space. Scales \(\mathcal{O}(N^3)\) with a large prefactor, impractical for 100-atom periodic cells.

Q11.3 [L] A student is computing the vacancy formation energy in rutile TiO\(_2\) using VASP. List all the numerical convergence parameters that must be checked, the order in which to check them, and a physically motivated reason for each.

**Answer Hint.** In order:

\(E_{\rm cut}\): Plane-wave basis completeness. For TiO\(_2\) with PAW, converge to \(\sim 1\) meV/atom; typical value \(\sim 500\)–\(600\) eV.
\(k\)-mesh (bulk and supercell): BZ sampling. Use a \(\Gamma\)-centred mesh; the supercell mesh is coarser than bulk but must be tested independently.
Supercell size: Vacancy–vacancy interaction through periodic images. Converge the formation energy with cell size (\(2\times2\times2\), \(3\times3\times2\), etc.); electrostatic correction schemes (Freysoldt–Neugebauer–Van de Walle) may be needed for charged vacancies.
SCF convergence (EDIFF): Set to \(10^{-6}\)–\(10^{-7}\) eV for energy differences; forces need \(10^{-7}\) eV or tighter.
Ionic relaxation (EDIFFG): Forces below \(0.01\)–\(0.02\) eV/Å; the vacancy geometry relaxes significantly.
Smearing and \(\sigma\): TiO\(_2\) is a semiconductor; use ISMEAR = 0 (Gaussian) or ISMEAR = -5 (tetrahedron). Check that EENTRO \(< 1\) meV/atom.
Spin polarisation: Ti \(3d^0\) nominally; but oxygen vacancies can create \(d^1\) Ti, which is magnetic. Run spin-polarised (ISPIN = 2) to be safe.
(Optional) DFT+U: GGA underestimates the TiO\(2\) band gap (\(\sim 1.8\) vs \(3.0\) eV experimental); a \(U{\rm eff}\) on Ti \(d\) states may be needed for reliable defect levels.

Quick-Reference Formula Sheet

The following identities are cited frequently in examination answers. You should be able to derive each from first principles.

Quantity	Expression
Electron density from KS orbitals	\(\rho(\mathbf{r}) = \sum_i f_i \|\phi_i(\mathbf{r})\|^2\)
KS effective potential	\(V_{\rm eff} = V_{\rm ext} + V_{\rm H} + V_{\rm xc}\)
Spin-dependent \(V_{\rm eff}\)	\(V_{\rm eff}^\sigma = V_{\rm ext} + V_{\rm H}[\rho] + V_{\rm xc}^\sigma[\rho_\uparrow,\rho_\downarrow]\)
Hartree potential	\(V_{\rm H}(\mathbf{r}) = \int \rho(\mathbf{r}’)/
XC potential	\(V_{\rm xc}(\mathbf{r}) = \delta E_{\rm xc}[\rho]/\delta\rho(\mathbf{r})\)
Variational He trial energy	\(E(\alpha) = \alpha^2 - (2Z - 5/8)\alpha\), minimum at \(\alpha_{\rm opt} = Z - 5/16\)
LDA exchange (Dirac)	\(\varepsilon_{\rm x}^{\rm UEG} = -(3/4)(3/\pi)^{1/3}\rho^{1/3}\)
XC hole sum rule	\(\int n_{\rm xc}(\mathbf{r},\mathbf{r}‘), d\mathbf{r}’ = -1\)
Dudarev DFT+U correction	\(E_U = \frac{U_{\rm eff}}{2}\sum_{I,\sigma}\mathrm{Tr}[\mathbf{n}^{I\sigma}(\mathbf{1}-\mathbf{n}^{I\sigma})]\)
Plane-wave cutoff condition	\(\frac{1}{2}
Kerker preconditioner	\(G^2/(G^2 + k_0^2)\)
PAW density	\(\rho = \tilde{\rho} + \rho^1 - \tilde{\rho}^1\)

Keyboard shortcuts

Density Functional Theory

Density Functional Theory