Cover
Foundations of Probability & Randomness
1.1 Probability Spaces and Random Variables
A probability space is a triple \((\Omega, \mathcal{F}, P)\) where \(\Omega\) is the sample space, \(\mathcal{F}\) is a \(\sigma\)-algebra of events, and \(P: \mathcal{F} \to [0,1]\) is a probability measure satisfying \(P(\Omega) = 1\).
A random variable \(X: \Omega \to \mathbb{R}\) maps outcomes to real numbers. For a continuous random variable its probability density function (PDF) \(p(x)\) satisfies:
$$P(a \leq X \leq b) = \int_a^b p(x), dx, \qquad \int_{-\infty}^{\infty} p(x), dx = 1. \tag{1.1}$$
The cumulative distribution function (CDF) is:
$$F(x) = \int_{-\infty}^{x} p(x’), dx’. \tag{1.2}$$
Key Distributions in Physics
Gaussian (Normal): $$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \tag{1.3}$$ Arises in measurement errors, Brownian motion, and the CLT (see §1.2).
Poisson: $$P(n; \lambda) = \frac{\lambda^n e^{-\lambda}}{n!}, \quad n = 0, 1, 2, \ldots \tag{1.4}$$ Governs photon counts in a detector, nuclear decay events, and cosmic ray arrivals in a given time window.
Exponential: $$p(t) = \lambda e^{-\lambda t}, \quad t \geq 0. \tag{1.5}$$ The waiting time between successive Poisson events — equivalently, the lifetime of a radioactive nucleus with decay constant \(\lambda\).
Maxwell-Boltzmann speed distribution (to be sampled in Ch. 2): $$p(v) = 4\pi n \left(\frac{m}{2\pi k_B T}\right)^{3/2} v^2 \exp!\left(-\frac{mv^2}{2k_B T}\right). \tag{1.6}$$
Note
Radioactive Decay A sample of \(^{14}\)C has decay constant \(\lambda = 3.84 \times 10^{-12}\) s\(^{-1}\). The number of decays in a 1-second window follows a Poisson distribution. If the initial activity is \(A_0 = 10^4\) Bq, the standard deviation in the count is \(\sqrt{A_0} \approx 100\) — illustrating how counting statistics limit detector precision.
Moments and Cumulants
The mean and variance of \(X\): $$\mu = \langle X \rangle = \int x, p(x), dx, \qquad \sigma^2 = \langle (X - \mu)^2 \rangle. \tag{1.7}$$
The moment generating function \(M(t) = \langle e^{tX} \rangle\) and characteristic function \(\phi(k) = \langle e^{ikX} \rangle\) encode all moments.
1.2 The Law of Large Numbers and Central Limit Theorem
Law of Large Numbers
Let \(X_1, X_2, \ldots, X_N\) be independent, identically distributed (i.i.d.) random variables with mean \(\mu\). Define the sample mean:
$$\bar{X}N = \frac{1}{N} \sum{i=1}^N X_i. \tag{1.8}$$
The strong law of large numbers states:
$$\bar{X}_N \xrightarrow{a.s.} \mu \quad \text{as } N \to \infty. \tag{1.9}$$
This underpins every Monte Carlo estimator: replace an ensemble average by a sample average.
Central Limit Theorem
Regardless of the parent distribution \(p(x)\) (provided \(\sigma^2 < \infty\)), the distribution of \(\bar{X}_N\) converges to a Gaussian:
$$\frac{\bar{X}_N - \mu}{\sigma/\sqrt{N}} \xrightarrow{d} \mathcal{N}(0,1) \quad \text{as } N \to \infty. \tag{1.10}$$
Consequence: the statistical error of any MC estimator scales as:
$$\epsilon \sim \frac{\sigma}{\sqrt{N}}. \tag{1.11}$$
To halve the error, one must quadruple the number of samples. This \(1/\sqrt{N}\) convergence rate is universal — it does not depend on dimensionality, unlike grid-based quadrature.
Note
Why MC beats quadrature in high dimensions A regular grid with \(k\) points per dimension in \(d\) dimensions requires \(N = k^d\) points. The error scales as \(k^{-p}\) for a \(p\)-th order method, i.e. \(N^{-p/d}\). For \(d = 10\) and \(p = 4\), the error goes as \(N^{-0.4}\), slower than MC’s \(N^{-1/2}\). Above \(d \approx 8\), MC wins unconditionally.
1.3 Generating Random Numbers
True vs. Pseudo-Random Numbers
A true random number generator (TRNG) harvests physical entropy — thermal noise, radioactive decay, atmospheric noise. Pseudo-random number generators (PRNGs) are deterministic algorithms producing sequences that pass statistical tests of randomness. For scientific computing, PRNGs are preferable: they are fast, reproducible (given the seed), and well-characterised.
Linear Congruential Generators (LCGs)
$$x_{n+1} = (a, x_n + c) \bmod m. \tag{1.12}$$
The Hull–Dobell theorem gives conditions on \(a, c, m\) for full period \(m\). A classic choice (ANSI C rand): \(a = 1103515245\), \(c = 12345\), \(m = 2^{31}\).
Warning
LCGs are not suitable for serious MC work LCGs have short periods and strong lattice correlations. RANDU (\(a=65539\), \(m=2^{31}\)) places all triples \((x_n, x_{n+1}, x_{n+2})\) on 15 parallel planes — catastrophic for 3D integration. Never use it.
Mersenne Twister (MT19937)
The default PRNG in Python (numpy.random). Key properties:
- Period: \(2^{19937} - 1\) (a Mersenne prime, hence the name)
- Passes all Diehard and TestU01 statistical tests
- 623-dimensional equidistribution
- State: 624 32-bit integers
import numpy as np
rng = np.random.default_rng(seed=42) # modern API, reproducible
u = rng.uniform(0, 1, size=10_000) # 10000 uniform samples
PCG and Xoshiro Generators
Modern alternatives to MT for performance-critical code:
- PCG64: default in
numpy.random.default_rng. Better statistical properties than MT, smaller state. - Xoshiro256**: extremely fast, suitable for parallel streams.
Random Numbers on the Unit Sphere
Uniformly distributed points on \(S^2\) require:
$$\theta = \arccos(1 - 2u_1), \quad \phi = 2\pi u_2, \tag{1.13}$$
where \(u_1, u_2 \sim \mathcal{U}[0,1]\). Do not sample \(\theta\) uniformly — this clusters points at the poles.
1.4 Testing Randomness
A sequence \({x_n}\) that fails statistical tests may bias MC results. Standard tests:
Chi-Squared Uniformity Test
Divide \([0,1]\) into \(k\) equal bins with expected count \(E = N/k\) each. The test statistic:
$$\chi^2 = \sum_{i=1}^k \frac{(O_i - E)^2}{E} \tag{1.14}$$
follows a \(\chi^2\) distribution with \(k-1\) degrees of freedom under \(H_0\) (perfect uniformity). Reject if \(p < 0.05\).
Autocorrelation Test
The lag-\(\tau\) autocorrelation:
$$C(\tau) = \frac{\langle x_n x_{n+\tau} \rangle - \langle x_n \rangle^2}{\langle x_n^2 \rangle - \langle x_n \rangle^2}. \tag{1.15}$$
For a good PRNG, \(C(\tau) \approx 0\) for all \(\tau > 0\). In MCMC, non-zero \(C(\tau)\) reflects physical correlations and must be accounted for in error estimation (see Ch. 4).
Spectral Test
In \(d\) dimensions, plot \(d\)-tuples \((x_n, x_{n+1}, \ldots, x_{n+d-1})\). Ideally these fill the unit hypercube uniformly; LCGs reveal planes.
Tip
The TestU01 Library The most comprehensive battery of statistical tests is TestU01 (L’Ecuyer & Simard). It includes BigCrush (over 100 tests). MT19937 passes; many simple generators fail.
1.5 Computational Lab — Class 2
Objectives:
- Implement a simple LCG and compare its \(\chi^2\) and spectral behaviour to MT19937.
- Generate \(N = 10^5\) exponential random variates using the inverse CDF (see Ch. 2) and compare the empirical histogram to equation (1.5).
- Simulate radioactive decay of \(^{60}\)Co (\(t_{1/2} = 5.27\) yr) for 1000 atoms over 20 years using the waiting-time method. Plot the activity as a function of time and compare to the analytic result \(A(t) = A_0 e^{-\lambda t}\).
Note
Skeleton Code See Appendix A for the Python skeleton. You will need
numpy,scipy.stats, andmatplotlib.
Summary
- A Monte Carlo estimator of \(\langle f \rangle = \int f(x) p(x), dx\) is \(\hat{f} = \frac{1}{N}\sum_{i=1}^N f(x_i)\) with \(x_i \sim p(x)\).
- The error is \(\sigma/\sqrt{N}\), universal and dimension-independent.
- Use MT19937 or PCG64; never use LCGs or RANDU for scientific work.
- Always test your RNG and seed it for reproducibility.
Sampling Methods
2.1 The Sampling Problem
We want to draw samples \(x_1, \ldots, x_N\) from a target distribution \(p(x)\). If we can do this efficiently, we can estimate any expectation value:
$$\langle f \rangle = \int f(x), p(x), dx \approx \frac{1}{N} \sum_{i=1}^N f(x_i). \tag{2.1}$$
This chapter treats methods that work when \(p(x)\) is known analytically and can be sampled directly. Chapter 4 handles the harder case (MCMC) when only the unnormalised density is available.
2.2 Inverse Transform Sampling
If \(F(x) = \int_{-\infty}^x p(x’), dx’\) is the CDF and \(u \sim \mathcal{U}[0,1]\), then:
$$x = F^{-1}(u) \tag{2.2}$$
is distributed according to \(p(x)\). This follows from the probability integral transform: \(F(X) \sim \mathcal{U}[0,1]\) for any continuous \(X\).
Derivation
$$P(x \leq x_0) = P(F^{-1}(u) \leq x_0) = P(u \leq F(x_0)) = F(x_0). \tag{2.3}$$
Examples
Exponential distribution (radioactive decay lifetimes): $$F(t) = 1 - e^{-\lambda t} \implies t = -\frac{1}{\lambda}\ln(1-u) = -\frac{1}{\lambda}\ln u. \tag{2.4}$$
Since \(1 - u \sim \mathcal{U}[0,1]\) when \(u \sim \mathcal{U}[0,1]\), the last equality holds.
Cauchy distribution (resonance lineshapes, Lorentzian profiles): $$p(x) = \frac{1}{\pi}\frac{\gamma}{(x-x_0)^2 + \gamma^2} \implies x = x_0 + \gamma \tan!\left(\pi\left(u - \tfrac{1}{2}\right)\right). \tag{2.5}$$
Planck spectrum (photon energy sampling in radiative transfer):
The Planck energy density \(u(\nu) \propto \nu^3/(e^{h\nu/k_BT}-1)\) has no closed-form CDF inverse, but can be sampled via the method in §2.3 or by a dedicated algorithm (Baluev 2014).
Note
Stellar Atmosphere Simulation Monte Carlo codes for stellar atmosphere modelling (e.g., CMFGEN, PHOENIX) sample photon packets from the Planck distribution at each atmospheric layer. The inverse CDF for the Planck spectrum is approximated numerically, then applied millions of times per model.
2.3 Rejection Sampling (von Neumann Method)
When \(F^{-1}\) is unavailable, find a proposal distribution \(q(x)\) and constant \(M\) such that:
$$p(x) \leq M, q(x) \quad \forall x. \tag{2.6}$$
Algorithm:
- Draw \(x \sim q(x)\).
- Draw \(u \sim \mathcal{U}[0,1]\).
- If \(u \leq p(x) / (M q(x))\), accept \(x\); else reject and return to 1.
The acceptance rate is \(1/M\), so we want \(M\) as small as possible — the tightest envelope.
Note
Why This Works The joint distribution of accepted \((x, u)\) pairs is uniform under the curve \(p(x)\). Marginalising over \(u\) gives \(p(x)\).
Example: Maxwell-Boltzmann Speed Distribution
The 3D speed distribution (eq. 1.6) is awkward to invert. With proposal \(q(v) = \text{Gamma}(3/2, v_p)\) (where \(v_p = \sqrt{2k_BT/m}\)), one can achieve \(M \approx 1.1\), giving a 91% acceptance rate.
Note
Molecular Dynamics Initialisation MD simulations of noble gases or Lennard-Jones fluids initialise particle speeds by rejection-sampling the Maxwell-Boltzmann distribution. For argon at 300 K, \(v_p \approx 352\) m/s and the most probable speed drawn should match this within \(\sim 1%\) for \(N = 10^4\) particles.
2.4 Importance Sampling
The idea: instead of sampling from \(p(x)\), sample from a proposal \(q(x)\) that concentrates where the integrand \(f(x)p(x)\) is large, and reweight:
$$\langle f \rangle = \int f(x), p(x), dx = \int f(x), \frac{p(x)}{q(x)}, q(x), dx \approx \frac{1}{N}\sum_{i=1}^N f(x_i), w(x_i), \tag{2.7}$$
where \(x_i \sim q(x)\) and the importance weight is:
$$w(x) = \frac{p(x)}{q(x)}. \tag{2.8}$$
Optimal Proposal Distribution
The variance of the estimator (2.7) is minimised when:
$$q^*(x) = \frac{|f(x)|, p(x)}{\int |f(x’)|, p(x’), dx’}, \tag{2.9}$$
i.e., \(q^*(x) \propto |f(x)|, p(x)\). This is rarely available in closed form but guides the choice of \(q\).
Variance Reduction
Define the naive variance \(\sigma_0^2 = \text{Var}{p}[f]\) and the importance-sampled variance \(\sigma{IS}^2\). The variance reduction factor is \(\sigma_0^2 / \sigma_{IS}^2\).
Note
Tail Probabilities in Particle Detectors Estimating the probability of a particle depositing energy \(E > E_{\text{threshold}}\) in a thin detector requires sampling the far tail of \(p(E)\). Naive MC is hopelessly inefficient: if \(P(E > E_{\text{th}}) = 10^{-6}\), one needs \(\sim 10^8\) events for even 0.1% precision. Importance sampling with a shifted exponential proposal reduces this to \(\sim 10^4\) events — a factor \(10^4\) speedup.
Note
Virial Coefficients of Dense Gases The second virial coefficient of a gas with pair potential \(\phi(r)\) is: $$B_2(T) = -2\pi \int_0^\infty (e^{-\beta\phi(r)} - 1), r^2, dr.$$ For a hard-sphere potential, the integrand is concentrated near \(r = \sigma\). Using \(q(r) \propto r^2 e^{-r/\sigma}\) as proposal reduces variance by a factor \(\sim 5\) compared to uniform sampling.
2.5 Stratified and Quasi-Monte Carlo Sampling
Stratified Sampling
Divide the integration domain into \(k\) strata \(S_1, \ldots, S_k\). Draw \(n_j\) samples from each stratum and combine:
$$\hat{I} = \sum_{j=1}^k \frac{|S_j|}{n_j} \sum_{i \in S_j} f(x_i). \tag{2.10}$$
Optimal allocation: \(n_j \propto |S_j| \sigma_j\) where \(\sigma_j\) is the standard deviation of \(f\) within stratum \(j\). Variance is always less than or equal to naive MC.
Quasi-Monte Carlo (QMC)
Replace pseudorandom sequences with low-discrepancy sequences that fill space more uniformly:
Halton sequence (base \(b\) van der Corput sequence): $$h_n^{(b)} = \sum_{k=0}^K d_k b^{-(k+1)}, \tag{2.11}$$ where \(d_k\) are the digits of \(n\) in base \(b\).
Sobol sequences: higher-dimensional generalisations with better uniformity.
QMC achieves error \(\mathcal{O}((\log N)^d / N)\) for smooth functions, beating MC’s \(1/\sqrt{N}\) for moderate \(d\).
Warning
QMC sequences are deterministic and do not admit straightforward error estimation. Use randomised QMC (scrambled Sobol) to recover statistical error bars.
Note
Classical Partition Functions The configurational partition function of a \(d\)-dimensional harmonic oscillator lattice: $$Z = \int e^{-\beta \sum_{i} \frac{1}{2}k x_i^2}, d^N x$$ can be evaluated exactly, making it a useful benchmark. QMC (Sobol) converges in \(N = 10^4\) evaluations to the same accuracy as plain MC with \(N = 10^6\) for \(d = 20\).
2.6 Computational Lab — Class 4
Objectives:
-
Sample the Maxwell-Boltzmann speed distribution for N₂ at 300 K using rejection sampling. Plot the histogram and compare to equation (1.6). Compute \(\langle v \rangle\), \(\langle v^2 \rangle^{1/2}\), and \(v_{\rm mp}\) from your samples.
-
Use importance sampling to estimate the integral $$I = \int_0^\infty x^4 e^{-x}, dx = 24$$ using the proposal \(q(x) = e^{-x}\). Compare the variance to naive uniform-domain MC.
-
Compare stratified MC, Sobol QMC, and plain MC for integrating \(f(x,y) = \sin(\pi x)\sin(\pi y)\) over \([0,1]^2\). Plot error vs. \(N\) on a log-log scale.
Note
Skeleton Code Sobol sequences are available in
scipy.stats.qmc.Sobol. See Appendix A.
Summary
| Method | When to use | Error scaling |
|---|---|---|
| Inverse CDF | Closed-form \(F^{-1}\) exists | \(1/\sqrt{N}\) |
| Rejection sampling | \(p(x)\) bounded above by \(Mq(x)\) | \(1/\sqrt{N}\) |
| Importance sampling | Integrand concentrated in a small region | \(1/\sqrt{N}\) (reduced variance) |
| Stratified MC | Smooth integrands, known structure | Better than \(1/\sqrt{N}\) |
| Quasi-MC | Smooth, moderate dimension | \(\sim (\log N)^d / N\) |
Monte Carlo Integration
3.1 The Basic Estimator
Given an integral \(I = \int_\Omega f(\mathbf{x}), d\mathbf{x}\) over a domain \(\Omega \subset \mathbb{R}^d\) with volume \(|\Omega|\), draw \(\mathbf{x}_i \sim \mathcal{U}(\Omega)\) and estimate:
$$\hat{I} = \frac{|\Omega|}{N} \sum_{i=1}^N f(\mathbf{x}_i). \tag{3.1}$$
The standard error of this estimator is:
$$\sigma_{\hat{I}} = \frac{|\Omega|}{\sqrt{N}} \sqrt{\frac{1}{N-1}\sum_{i=1}^N \left(f(\mathbf{x}_i) - \bar{f}\right)^2}. \tag{3.2}$$
This is the fundamental MC integration formula. The error scales as \(N^{-1/2}\) regardless of dimension — the key advantage over deterministic quadrature in high dimensions.
Note
Estimating π \(\pi = 4 \times P(\text{point in unit circle})\). Draw \(N\) points uniform in \([-1,1]^2\), accept those with \(x^2 + y^2 \leq 1\): $$\hat{\pi} = 4 \times \frac{N_{\rm in}}{N}.$$ For \(N = 10^6\): error \(\approx 4\sigma/\sqrt{N} \approx 0.002\). This is pedagogically useful but inefficient — better methods exist.
3.2 Variance Reduction Techniques
3.2.1 Antithetic Variates
If \(f\) is monotone, the pairs \((f(u), f(1-u))\) are negatively correlated. Replace naive estimator with:
$$\hat{I}{\rm AV} = \frac{1}{N}\sum{i=1}^{N/2} \frac{f(u_i) + f(1-u_i)}{2}. \tag{3.3}$$
The variance is reduced by a factor \((1 + \rho)/2\) where \(\rho < 0\) is the correlation between \(f(u)\) and \(f(1-u)\).
3.2.2 Control Variates
Choose a function \(g(\mathbf{x})\) with known mean \(\mu_g = \int g(\mathbf{x})p(\mathbf{x}),d\mathbf{x}\) and high correlation with \(f\). Define:
$$h(\mathbf{x}) = f(\mathbf{x}) - c(g(\mathbf{x}) - \mu_g). \tag{3.4}$$
Since \(\langle h \rangle = \langle f \rangle\), we can estimate \(\langle f \rangle\) via \(\hat{h}\). The optimal coefficient is:
$$c^* = \frac{\text{Cov}(f, g)}{\text{Var}(g)}, \tag{3.5}$$
giving variance reduction factor \((1 - \rho_{fg}^2)\). If \(|\rho_{fg}| = 0.99\), variance drops by \(\sim 10^4\).
Note
Second Virial Coefficient of Hard Spheres For the pair potential \(\phi(r) = 0\) for \(r > \sigma\) and \(\infty\) for \(r < \sigma\): $$B_2 = -2\pi \int_0^\infty (e^{-\beta\phi(r)} - 1), r^2, dr = \frac{2\pi\sigma^3}{3}.$$ This is analytically known. When estimating \(B_2\) for a soft potential (e.g., Lennard-Jones), use the hard-sphere \(B_2\) as a control variate: \(g(r) = \Theta(\sigma - r)\). This can reduce variance by a factor of 3–5 near the Boyle temperature.
3.2.3 Rao-Blackwellisation
Analytically integrate out some variables and MC-sample only the remainder. If \(f(\mathbf{x}) = f(x_1, x_2)\) and \(\int f(x_1, x_2) p_2(x_2),dx_2 = \tilde{f}(x_1)\) is tractable:
$$\hat{I} = \frac{1}{N}\sum_{i=1}^N \tilde{f}(x_{1,i}), \quad \text{Var}[\tilde{f}(X_1)] \leq \text{Var}[f(X_1, X_2)]. \tag{3.6}$$
3.3 Multi-Dimensional Integration
The Curse of Dimensionality
For a \(d\)-dimensional integral with regular quadrature using \(k\) points per dimension:
| Method | Error | Points needed for \(\epsilon = 10^{-3}\), \(d = 10\) |
|---|---|---|
| Trapezoidal | \(k^{-2}\) | \(10^{15}\) |
| Simpson’s | \(k^{-4}\) | \(10^{7.5}\) |
| Monte Carlo | \(N^{-1/2}\) | \(10^6\) |
MC requires \(N = \sigma^2/\epsilon^2\) samples — independent of \(d\).
Phase Space Integrals in Particle Physics
The \(n\)-body phase space integral for a decay \(A \to 1 + 2 + \cdots + n\) involves a \(3n - 4\) dimensional integral. For \(n = 5\) (11-dimensional), MC is the only practical approach. Tools like RAMBO or VEGAS are standard.
VEGAS algorithm (Lepage 1978): adaptive importance sampling that iteratively refines a grid to concentrate points where \(|f|\) is large. Widely used in particle physics for cross-section calculations.
Note
Atomic Form Factors The electron density form factor for atomic hydrogen: $$f(\mathbf{q}) = \int |\psi_{1s}(\mathbf{r})|^2 e^{i\mathbf{q}\cdot\mathbf{r}}, d^3r = \frac{1}{(1 + q^2 a_0^2/4)^2}$$ is analytically known. For multi-electron atoms, the 3\(Z\)-dimensional integral \(\int |\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_Z)|^2 e^{i\mathbf{q}\cdot\mathbf{r}_1}, d^{3Z}r\) requires MC. For iron (\(Z=26\)), this is a 78-dimensional integral.
3.4 Estimating Partition Functions
The canonical partition function: $$Z = \int e^{-\beta H(\mathbf{q},\mathbf{p})}, d\mathbf{q}, d\mathbf{p} \tag{3.7}$$
is a \(6N\)-dimensional integral for \(N\) particles. Direct MC evaluation is only possible for small systems. In practice, we compute ratios and differences of free energies (see Ch. 7).
Thermodynamic Averages
Any observable \(A\) in the canonical ensemble:
$$\langle A \rangle = \frac{\int A(\mathbf{x}), e^{-\beta H(\mathbf{x})}, d\mathbf{x}}{\int e^{-\beta H(\mathbf{x})}, d\mathbf{x}}. \tag{3.8}$$
Naive MC estimates numerator and denominator separately — both are exponentially small in \(N\) and heavily dominated by rare configurations. Importance sampling with \(p(\mathbf{x}) \propto e^{-\beta H(\mathbf{x})}\) is the solution — this is the Metropolis algorithm (Ch. 4).
Note
Configurational Energy of Argon For 108 argon atoms in a cubic box at 85 K (near the triple point), the configurational partition function is a \(324\)-dimensional integral. Naive MC requires \(\sim 10^{100}\) samples to find the relevant configuration space. Metropolis sampling (Ch. 4) visits configurations according to their Boltzmann weight and makes the problem tractable with \(\sim 10^6\) steps.
3.5 Error Analysis for MC Integrals
Standard Error and Confidence Intervals
Given \(N\) samples \({f_i}\):
$$\hat{I} = \frac{1}{N}\sum_i f_i, \qquad s^2 = \frac{1}{N-1}\sum_i (f_i - \hat{I})^2, \qquad \sigma_{\hat{I}} = \frac{s}{\sqrt{N}}. \tag{3.9}$$
A 95% confidence interval: \(\hat{I} \pm 1.96,\sigma_{\hat{I}}\).
Multiple Independent Runs
Run \(M\) independent simulations, each giving \(\hat{I}_k\). The grand estimate and its error:
$$\bar{I} = \frac{1}{M}\sum_{k=1}^M \hat{I}k, \qquad \sigma{\bar{I}} = \frac{s_M}{\sqrt{M}}. \tag{3.10}$$
This is the most robust approach when \(M \geq 5\).
Warning
Do not underestimate your error Publishing MC results without statistical errors is unacceptable. Always report \(\hat{I} \pm \sigma_{\hat{I}}\) and state \(N\). The error should decrease as \(N^{-1/2}\) — if it does not, check for bugs or autocorrelation (relevant in MCMC, Ch. 4).
3.6 Computational Lab — Class 6
Objectives:
-
Compute \(B_2(T)\) for a Lennard-Jones gas with \(\epsilon/k_B = 120\) K and \(\sigma = 3.4\) Å (argon parameters) at \(T = 300\) K using plain MC, antithetic variates, and importance sampling. Compare the variance of the three estimators.
-
Estimate the configurational energy \(\langle U \rangle\) of 4 hard spheres in a 2D box using uniform MC sampling. How many samples are needed before a single configuration with no overlaps is found? This motivates the Metropolis algorithm.
-
Use VEGAS (via
scipy.integrate.quadorvegaspackage) to compute a 4D integral of your choice. Compare to Sobol QMC.
Summary
- The MC estimator has error \(\sigma/\sqrt{N}\), dimension-independent.
- Antithetic variates, control variates, and importance sampling all reduce variance without changing the estimator’s mean.
- Multi-dimensional integrals arising in statistical mechanics and particle physics are the natural domain of MC.
- Always quote statistical errors.
Markov Chain Monte Carlo
4.1 The Idea Behind MCMC
In statistical mechanics we want averages of the form:
$$\langle A \rangle = \frac{\sum_{\mathbf{x}} A(\mathbf{x}), e^{-\beta H(\mathbf{x})}}{\sum_{\mathbf{x}} e^{-\beta H(\mathbf{x})}}. \tag{4.1}$$
We cannot enumerate all microstates \(\mathbf{x}\) (there are \(\sim 2^N\) for a spin system). We cannot directly draw from \(p(\mathbf{x}) \propto e^{-\beta H(\mathbf{x})}\) because \(Z\) is unknown.
The solution: construct a Markov chain with stationary distribution \(p(\mathbf{x})\). After an initial burn-in, the chain samples from \(p(\mathbf{x})\) and we can average along its trajectory.
4.2 Markov Chains
A Markov chain is a sequence \(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots\) where the transition \(\mathbf{x}^{(t)} \to \mathbf{x}^{(t+1)}\) depends only on \(\mathbf{x}^{(t)}\). The transition kernel \(T(\mathbf{x} \to \mathbf{x}’)\) satisfies:
$$\sum_{\mathbf{x}‘} T(\mathbf{x} \to \mathbf{x}’) = 1. \tag{4.2}$$
Detailed Balance
A sufficient condition for \(p(\mathbf{x})\) to be the stationary distribution is detailed balance:
$$p(\mathbf{x}), T(\mathbf{x} \to \mathbf{x}‘) = p(\mathbf{x}’), T(\mathbf{x}’ \to \mathbf{x}). \tag{4.3}$$
This is a microscopic reversibility condition: in equilibrium, probability flux from \(\mathbf{x}\) to \(\mathbf{x}’\) equals the reverse flux.
Ergodicity
For the chain to converge to \(p\) from any initial state, we need ergodicity: every state must be reachable from every other state in a finite number of steps. This requires the chain to be:
- Irreducible: all states communicate.
- Aperiodic: no trapping in cycles.
Note
For physical systems, ergodicity can fail due to broken ergodicity — e.g., below \(T_c\) in an Ising ferromagnet, the chain can get trapped in a single magnetisation sector. Cluster algorithms (Ch. 5) partially address this.
4.3 The Metropolis–Hastings Algorithm
Proposed independently by Metropolis et al. (1953, Los Alamos) and generalised by Hastings (1970).
Setup: propose a move \(\mathbf{x} \to \mathbf{x}‘\) with probability \(Q(\mathbf{x} \to \mathbf{x}’)\). Accept with probability:
$$A(\mathbf{x} \to \mathbf{x}‘) = \min!\left(1,, \frac{p(\mathbf{x}’), Q(\mathbf{x}’ \to \mathbf{x})}{p(\mathbf{x}), Q(\mathbf{x} \to \mathbf{x}’)}\right). \tag{4.4}$$
If \(Q\) is symmetric (\(Q(\mathbf{x} \to \mathbf{x}‘) = Q(\mathbf{x}’ \to \mathbf{x})\)), this reduces to the Metropolis criterion:
$$A = \min!\left(1,, \frac{p(\mathbf{x}’)}{p(\mathbf{x})}\right) = \min!\left(1,, e^{-\beta \Delta H}\right), \tag{4.5}$$
where \(\Delta H = H(\mathbf{x}’) - H(\mathbf{x})\).
Algorithm (Metropolis):
initialise x
for t = 1 to N_steps:
propose x' from Q(x → x')
compute ΔH = H(x') - H(x)
if ΔH < 0: accept x' → x
else: accept x' with probability exp(-β ΔH)
record x
Verification of Detailed Balance
$$p(\mathbf{x}), T(\mathbf{x} \to \mathbf{x}‘) = p(\mathbf{x}), Q(\mathbf{x} \to \mathbf{x}’), \min!\left(1, \frac{p(\mathbf{x}‘)}{p(\mathbf{x})}\right) = \min(p(\mathbf{x}’), p(\mathbf{x})), Q. \tag{4.6}$$
This is symmetric in \(\mathbf{x}, \mathbf{x}’\), confirming detailed balance.
Note
Lennard-Jones Fluid For \(N\) particles in a box with pair potential \(\phi(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^6]\), a standard MC move displaces one particle by \(\delta \mathbf{r}\) drawn from a uniform cube of side \(2\delta_{\max}\). The energy change \(\Delta H\) only involves the moved particle’s neighbours — \(\mathcal{O}(1)\) work per step. Tune \(\delta_{\max}\) to achieve 40–50% acceptance rate for efficient sampling.
Note
Polymer Chain Configurational Sampling A freely jointed chain of \(n\) monomers has \(3(n-1)\) degrees of freedom. A pivot move rotates a random sub-chain about a random bond by a random angle. The Metropolis criterion with the Boltzmann weight for bending energy and excluded volume efficiently samples chain configurations. This is the basis of polymer Monte Carlo in biophysics and materials science.
4.4 Gibbs Sampling
In Gibbs sampling, update each variable \(x_i\) in turn by sampling from the conditional distribution:
$$x_i^{(t+1)} \sim p(x_i \mid x_1^{(t+1)}, \ldots, x_{i-1}^{(t+1)}, x_{i+1}^{(t)}, \ldots, x_d^{(t)}). \tag{4.7}$$
No acceptance/rejection step — all proposals are accepted. Requires the conditionals to be tractable.
Note
Ising Model via Gibbs Sampling For the Ising model, the conditional distribution of spin \(s_i\) given all others is: $$P(s_i = +1 \mid {s_{j \neq i}}) = \frac{1}{1 + e^{-2\beta J \sum_j s_j}},$$ where the sum is over nearest neighbours. This is a single Bernoulli draw — \(\mathcal{O}(1)\) cost. Gibbs sampling of the Ising model is equivalent to the heat bath algorithm.
4.5 Convergence Diagnostics
The chain must reach equilibrium before measurements begin. Never skip this step.
Burn-in
Discard the first \(N_{\rm burn}\) steps. Plot the running average of an observable vs. step number; burn-in is adequate when the average has stabilised.
Autocorrelation Time
Consecutive samples are correlated. The integrated autocorrelation time:
$$\tau_{\rm int} = \frac{1}{2} + \sum_{\tau=1}^{\infty} C(\tau), \tag{4.8}$$
where \(C(\tau) = \langle s_t s_{t+\tau} \rangle_c / \langle s_t^2 \rangle_c\) is the normalised autocorrelation. The effective sample size is:
$$N_{\rm eff} = \frac{N}{2\tau_{\rm int}}. \tag{4.9}$$
The true statistical error is \(\sigma / \sqrt{N_{\rm eff}}\), not \(\sigma / \sqrt{N}\).
Warning
Critical Slowing Down Near a continuous phase transition at \(T_c\), the autocorrelation time diverges as: $$\tau_{\rm int} \sim \xi^z \sim |T - T_c|^{-\nu z},$$ where \(\xi\) is the correlation length and \(z \approx 2\) for local algorithms (Metropolis). For a 100×100 Ising lattice at \(T_c\), \(\tau_{\rm int} \sim 10^3\) MC sweeps. Cluster algorithms (Ch. 5) have \(z \approx 0.2\).
Gelman-Rubin Diagnostic
Run \(M \geq 4\) independent chains. The potential scale reduction factor:
$$\hat{R} = \sqrt{\frac{\hat{V}}{W}}, \tag{4.10}$$
where \(W\) is the within-chain variance and \(\hat{V}\) is the pooled variance estimate. Convergence: \(\hat{R} < 1.01\).
4.6 Computational Lab — Class 8
Objectives:
-
Implement the Metropolis algorithm for a 1D double-well potential \(V(x) = (x^2 - 1)^2\) at inverse temperature \(\beta = 2\). Plot the trajectory, histogram, and autocorrelation function \(C(\tau)\). Compute \(\tau_{\rm int}\).
-
Implement MC for a Lennard-Jones dimer in 2D. Compare the mean inter-particle separation to the analytic result from the pair distribution function.
-
Run the Metropolis algorithm for the 1D Ising model (\(N=50\) spins) and compute the specific heat \(C_V = \beta^2(\langle E^2 \rangle - \langle E \rangle^2)\). Compare to the exact result.
Note
The exact specific heat for the 1D Ising model (no field): \(C_V / k_B = (\beta J)^2 / \cosh^2(\beta J)\) per spin.
Summary
- MCMC constructs a Markov chain with the target distribution \(p(\mathbf{x}) \propto e^{-\beta H}\) as its stationary distribution.
- Metropolis-Hastings: detailed balance is satisfied by the acceptance criterion \(A = \min(1, e^{-\beta \Delta H})\).
- Gibbs sampling: sample each variable from its conditional; always accepted.
- Statistical errors require \(N_{\rm eff} = N / (2\tau_{\rm int})\), not \(N\).
- Use Gelman-Rubin \(\hat{R}\) and autocorrelation plots to verify convergence.
The Ising Model & Statistical Mechanics
5.1 The Ising Model
The Ising model is the workhorse of computational statistical mechanics. Despite its simplicity, it captures the physics of ferromagnetic phase transitions, liquid-gas criticality, binary alloys, and lattice gases.
Hamiltonian
$$H = -J \sum_{\langle i,j \rangle} s_i s_j - h \sum_i s_i, \qquad s_i = \pm 1. \tag{5.1}$$
Here \(\langle i,j \rangle\) denotes nearest-neighbour pairs, \(J > 0\) is the ferromagnetic coupling, and \(h\) is an external field.
Key observables:
$$M = \frac{1}{N}\sum_i s_i \quad (\text{magnetisation per spin}), \tag{5.2}$$ $$E = \langle H \rangle / N \quad (\text{energy per spin}), \tag{5.3}$$ $$C_V = \frac{\partial \langle E \rangle}{\partial T} = \frac{\beta^2}{N}(\langle H^2 \rangle - \langle H \rangle^2), \tag{5.4}$$ $$\chi = \frac{\partial \langle M \rangle}{\partial h}\bigg|_{h=0} = \frac{\beta}{N}(\langle M^2 N^2\rangle - \langle MN\rangle^2). \tag{5.5}$$
Exact Results
1D Ising (Ising 1925): no phase transition at finite \(T\). Exact free energy per spin: $$f = -k_B T \ln\left(e^{\beta J} \cosh(\beta h) + \sqrt{e^{2\beta J}\sinh^2(\beta h) + e^{-2\beta J}}\right). \tag{5.6}$$
2D Ising, zero field (Onsager 1944): exact solution gives a phase transition at: $$k_B T_c = \frac{2J}{\ln(1+\sqrt{2})} \approx 2.269,J. \tag{5.7}$$
The exact free energy, magnetisation, and specific heat are known analytically. This makes the 2D Ising model the ideal benchmark for any MC code.
Note
Ferromagnetic Transition in Iron Iron has \(T_c \approx 1044\) K. Near \(T_c\), the spontaneous magnetisation vanishes as \(M \sim |T - T_c|^\beta\) with \(\beta \approx 0.326\) (3D Ising universality class). The 3D Ising model belongs to this class; its \(T_c\) and critical exponents are accessible via MC.
Note
Binary Alloys: Order-Disorder Transitions The Cu-Zn (brass) alloy system undergoes a B2 → A2 order-disorder transition at \(\sim 470\)°C. The two sub-lattices in the BCC structure map exactly onto an Ising model with \(s_i = +1\) (Cu) or \(-1\) (Zn). The MC-computed transition temperature and specific heat anomaly match experiment.
5.2 Metropolis Algorithm for the Ising Model
One Monte Carlo sweep = \(N\) attempted single-spin flips (each spin on average visited once).
Single-spin flip update:
- Select spin \(i\) uniformly at random.
- Compute the energy change if \(s_i \to -s_i\): $$\Delta H = 2J s_i \sum_{j \in \partial i} s_j + 2h s_i. \tag{5.8}$$
- Accept with probability \(\min(1, e^{-\beta \Delta H})\).
Tip
Lookup Table Optimisation For the 2D square lattice with no field, \(\Delta H\) takes only 5 distinct values: \({-8J, -4J, 0, 4J, 8J}\). Pre-compute the acceptance probabilities in a table. This gives a \(\sim 3\times\) speedup over computing exponentials on the fly.
Measuring Physical Quantities
Average over \(N_{\rm meas}\) uncorrelated measurements (separated by \(\sim 2\tau_{\rm int}\) sweeps):
for t in range(N_sweeps):
for i in range(N_spins): # one sweep
flip spin i with Metropolis acceptance
if t > N_burn and t % delta_t == 0:
record E, M
compute mean, variance, error
Finite-Size Scaling
For a finite lattice of linear size \(L\), the apparent critical temperature \(T_c(L)\) shifts as:
$$T_c(L) = T_c(\infty) + a L^{-1/\nu}, \tag{5.9}$$
where \(\nu\) is the correlation length exponent (\(\nu = 1\) in 2D). The susceptibility and specific heat peak scale as:
$$\chi_{\max}(L) \sim L^{\gamma/\nu}, \qquad C_{V,\max}(L) \sim L^{\alpha/\nu}. \tag{5.10}$$
Note
2D Ising Critical Exponents Exact Onsager values: \(\nu = 1\), \(\gamma = 7/4\), \(\alpha = 0\) (log divergence), \(\beta = 1/8\). These are landmarks to verify your MC code against.
5.3 Phase Transitions and Critical Slowing Down
Near \(T_c\), spatial correlations grow as \(\xi \sim |T - T_c|^{-\nu}\). The autocorrelation time:
$$\tau \sim \xi^z \sim |T - T_c|^{-\nu z}. \tag{5.11}$$
For Metropolis (local updates), \(z \approx 2.17\) in 2D. On a \(256 \times 256\) lattice at \(T_c\), \(\tau_{\rm int} \sim 10^4\) sweeps — rendering precision measurements impractical without cluster moves.
Note
Liquid-Gas Criticality The critical point of CO₂ (\(T_c = 304.1\) K, \(P_c = 73.8\) bar) belongs to the 3D Ising universality class. MC simulations of the lattice gas model — where \(s_i = +1\) means “occupied” and \(s_i = -1\) means “empty” — reproduce the critical density fluctuations and coexistence curve with \(\beta = 0.326\), in agreement with experiment.
5.4 Cluster Algorithms
Wolff Algorithm
- Choose a seed spin \(i\) at random.
- Add neighbouring spin \(j\) to the cluster if \(s_j = s_i\) with probability \(p = 1 - e^{-2\beta J}\).
- Repeat for all cluster boundary spins (breadth-first or depth-first).
- Flip all spins in the cluster.
The Wolff algorithm satisfies detailed balance and has dynamic exponent \(z \approx 0.25\) in 2D — nearly 10× smaller than Metropolis near \(T_c\).
Swendsen-Wang Algorithm
Build all clusters simultaneously using the Fortuin-Kasteleyn mapping; colour each cluster independently. Less efficient per step than Wolff but offers different statistical properties.
Tip
When to use cluster vs. Metropolis
- Below \(T_c\): Metropolis is efficient (small \(\tau_{\rm int}\), small clusters formed in Wolff).
- Near \(T_c\): Wolff dramatically outperforms Metropolis.
- With an external field: Wolff is inefficient (clusters don’t flip easily). Use Metropolis.
5.5 Extensions: Potts Model and Lattice Gas
q-State Potts Model
Generalise the Ising model to \(q\) states:
$$H = -J \sum_{\langle i,j \rangle} \delta_{\sigma_i, \sigma_j}, \qquad \sigma_i \in {1, 2, \ldots, q}. \tag{5.12}$$
For \(q = 2\) this is the Ising model. For \(q = 3\): second-order transition in 2D. For \(q \geq 5\) in 2D: first-order transition.
Note
Grain Growth in Polycrystalline Materials The \(q\)-state Potts model (large \(q\)) is a standard model for grain microstructure in metals. Each grain orientation corresponds to a Potts state. MC simulations with \(q = 48\) and \(N \sim 10^6\) sites reproduce the experimentally observed grain size distribution and von Neumann law for grain growth.
Lattice Gas
Map \(s_i = (1 + n_i)/2\) where \(n_i \in {0, 1}\) is the occupancy. The Ising Hamiltonian becomes:
$$H_{\rm LG} = -4J \sum_{\langle i,j \rangle} n_i n_j - \mu \sum_i n_i + \text{const.} \tag{5.13}$$
This maps the liquid-gas transition to the ferromagnetic Ising transition.
5.6 Computational Lab — Class 10
Objectives:
-
Implement the 2D Ising model (\(L = 32, 64\)) with periodic boundary conditions. Using Metropolis, measure \(\langle |M| \rangle\), \(\chi\), and \(C_V\) as a function of \(T \in [1.5J, 3.5J]\). Locate \(T_c\) and compare to the Onsager value.
-
Implement the Wolff cluster algorithm. Compare \(\tau_{\rm int}(M)\) for Metropolis vs. Wolff at \(T = T_c\) for \(L = 32\). Plot \(\tau_{\rm int}\) as a function of \(L\) and extract \(z\).
-
Perform a finite-size scaling collapse of \(\chi(T, L)\) using the 2D Ising exponents. Do the curves for \(L = 16, 32, 64\) collapse onto a single universal curve?
Note
Skeleton Code A minimal Ising Metropolis implementation in NumPy runs \(\sim 10^7\) steps/second on a modern CPU. For \(L = 64\) (4096 spins), this gives \(\sim 2400\) sweeps/second — enough for the lab in minutes.
Summary
- The Ising model provides exact benchmarks (Onsager) and connects to diverse physical systems.
- Metropolis single-spin flips satisfy detailed balance; the energy change involves only nearest neighbours.
- Finite-size scaling reveals the universality class and allows extraction of critical exponents.
- Wolff cluster algorithm: dynamic exponent \(z \approx 0.25\) vs. \(z \approx 2.17\) for Metropolis.
- The same MC framework applies to Potts models, lattice gases, and adsorption problems.
The Heisenberg Model & Classical Spin Systems
6.1 Classical Spin Models
The Ising model restricts spins to two states. Real magnetic materials often have continuous spin degrees of freedom. Classical spin models capture this:
| Model | Spin \(\mathbf{S}_i\) | Dimensionality | Example system |
|---|---|---|---|
| Ising | \(\pm 1\) | \(n=1\) | Uniaxial magnets |
| XY | \((\cos\theta_i, \sin\theta_i)\) | \(n=2\) | Planar magnets, superfluids |
| Heisenberg | \((S_i^x, S_i^y, S_i^z)\), \( | \mathbf{S} | =1\) |
These are the \(O(n)\) models, characterised by the symmetry group of the spin space.
6.2 The Classical Heisenberg Model
Hamiltonian
$$H = -J \sum_{\langle i,j \rangle} \mathbf{S}_i \cdot \mathbf{S}_j - \mathbf{h} \cdot \sum_i \mathbf{S}_i, \quad |\mathbf{S}_i| = 1. \tag{6.1}$$
Each spin is a unit vector on \(S^2\). For \(J > 0\) (ferromagnetic) in 3D, there is a phase transition at finite \(T_c\). In 1D and 2D, the Mermin-Wagner theorem forbids continuous symmetry breaking at any \(T > 0\).
Note
Mermin-Wagner Theorem In a 2D system with a continuous symmetry (\(n \geq 2\)) and short-range interactions, there is no spontaneous symmetry breaking at any finite temperature. Long-range order is destroyed by spin-wave fluctuations. The 2D Ising model (discrete symmetry) is exempt.
Physical Examples
Note
MnO and FeF₂ MnO has a rock-salt structure with Mn²⁺ ions carrying spin \(S = 5/2\). Below the Néel temperature \(T_N = 122\) K, spins order antiferromagnetically (\(J < 0\)) in (111) planes — well described by the classical Heisenberg model. FeF₂ is a nearly ideal Ising antiferromagnet (\(T_N = 78\) K) due to strong single-ion anisotropy.
Note
Spin-Ice Materials: Dy₂Ti₂O₇ In spin-ice, magnetic moments sit on a pyrochlore lattice and are constrained to point along local ⟨111⟩ axes. The ice rule (2-in, 2-out per tetrahedron) maps to the proton disorder in water ice. MC simulations with dipolar interactions reproduce the broad specific heat anomaly, diffuse neutron scattering, and emergent magnetic monopole excitations.
6.3 MC Updates for Continuous Spins
Random Rotation on \(S^2\)
A Metropolis update must propose \(\mathbf{S}’\) from the surface of the unit sphere. Do not sample angles uniformly — this oversamples the poles.
Correct uniform sampling on \(S^2\): $$\cos\theta \sim \mathcal{U}[-1,1], \quad \phi \sim \mathcal{U}[0, 2\pi]. \tag{6.2}$$
In practice, propose a small rotation:
$$\mathbf{S}’_i = \mathbf{S}_i + \delta \mathbf{v}, \quad \text{then normalise}, \tag{6.3}$$
where \(\delta \mathbf{v}\) is drawn from a small cube. Tune \(\delta\) for 40–50% acceptance.
The energy change: $$\Delta H = -J (\mathbf{S}’i - \mathbf{S}i) \cdot \mathbf{h}{\rm eff}, \quad \mathbf{h}{\rm eff} = \sum_{j \in \partial i} \mathbf{S}_j + \mathbf{h}/J. \tag{6.4}$$
Over-Relaxation
The over-relaxation (or microcanonical) step reflects the spin about its local field direction:
$$\mathbf{S}’i = 2(\hat{\mathbf{h}}{\rm eff} \cdot \mathbf{S}i)\hat{\mathbf{h}}{\rm eff} - \mathbf{S}_i. \tag{6.5}$$
This is always accepted (\(\Delta H = 0\)) and moves the spin to the “opposite side” of its local field. Combined with Metropolis steps in ratio \(\sim 4:1\) (over-relaxation: Metropolis), it dramatically reduces autocorrelation time while maintaining ergodicity.
6.4 The XY Model and the Kosterlitz-Thouless Transition
The 2D XY model has spins \(\mathbf{S}_i = (\cos\theta_i, \sin\theta_i)\):
$$H = -J \sum_{\langle i,j \rangle} \cos(\theta_i - \theta_j). \tag{6.6}$$
Despite Mermin-Wagner (no long-range order), the 2D XY model has a remarkable topological phase transition at \(T_{\rm KT} = \pi J / 2\) (Kosterlitz-Thouless 1973, Nobel Prize 2016).
Vortices
A vortex is a topological defect where the spin winds by \(2\pi\) around a loop: $$\oint \nabla \theta \cdot d\mathbf{l} = 2\pi n, \quad n \in \mathbb{Z}. \tag{6.7}$$
Below \(T_{\rm KT}\): vortices bound in pairs (vanishing net topological charge). Algebraic long-range order.
Above \(T_{\rm KT}\): free vortices proliferate, destroying quasi-long-range order. The transition is driven by vortex unbinding.
Note
Superfluid Helium Films The 2D superfluid transition of \(^4\)He adsorbed on a substrate (e.g., graphite) belongs to the XY universality class. The superfluid stiffness \(\rho_s\) jumps discontinuously at \(T_{\rm KT}\) — the “universal jump” predicted by Nelson and Kosterlitz and confirmed by Rudnick (1978). MC simulations of the 2D XY model reproduce this jump.
Measuring the KT Transition
The helicity modulus (spin-wave stiffness): $$\Upsilon = J\left[\langle \cos(\theta_i - \theta_j) \rangle - \frac{J}{T} \langle \sin^2(\theta_i - \theta_j) \rangle\right] \tag{6.8}$$
(averaged over horizontal bonds). The universal jump condition: \(\Upsilon(T_{\rm KT}) = 2T_{\rm KT}/\pi\). This gives a sharp MC signature of \(T_{\rm KT}\).
6.5 Frustrated Spin Systems
When competing interactions cannot all be simultaneously satisfied, the system is frustrated.
Triangular Antiferromagnet
For \(J < 0\) on a triangular lattice, not all pairs of neighbours can be antiparallel. The ground state is the 120° structure; entropy remains finite at \(T = 0\). MC reveals a continuous spectrum of degenerate ground states.
Heisenberg Antiferromagnet on the Kagome Lattice
Extreme frustration — the classical ground-state degeneracy is exponentially large. The system remains disordered (a classical spin liquid) down to \(T = 0\). MC simulations show no peak in \(C_V\), a signature of the absence of order.
Note
Spin Ice: Emergent Monopoles In Dy₂Ti₂O₇, the “2-in, 2-out” ice rule is violated at finite \(T\) by topological excitations that act as magnetic monopoles (Castelnovo, Moessner & Sondhi 2008). MC simulations with dipolar interactions reproduce the monopole density, their diffusion constant, and the anomalous low-\(T\) specific heat.
6.6 Spin-Spin Correlation Functions
The spin-spin correlation function:
$$G(\mathbf{r}) = \langle \mathbf{S}0 \cdot \mathbf{S}{\mathbf{r}} \rangle \tag{6.9}$$
characterises spatial ordering. In the ordered phase: \(G(r) \to M^2\) as \(r \to \infty\). At \(T_c\): \(G(r) \sim r^{-(d-2+\eta)}\) with anomalous exponent \(\eta\).
The correlation length \(\xi\) is extracted from the exponential decay in the disordered phase:
$$G(r) \sim e^{-r/\xi} \quad (T > T_c). \tag{6.10}$$
6.7 Computational Lab — Class 11
Objectives:
-
Simulate the 3D classical Heisenberg model on an \(L = 16\) cubic lattice. Using Metropolis + over-relaxation, compute \(\langle |\mathbf{M}| \rangle\) and \(\chi\) as a function of \(T\). Compare \(T_c\) to the literature value \(T_c \approx 1.443 J/k_B\).
-
Simulate the 2D XY model on an \(L = 64\) lattice. Compute the helicity modulus \(\Upsilon(T)\) and locate \(T_{\rm KT}\). Compare to the theoretical prediction \(T_{\rm KT} = \pi J / 2 \approx 1.571 J/k_B\).
-
Visualise vortices in the 2D XY model above and below \(T_{\rm KT}\) by plotting the local winding number on each plaquette. Count free vortices as a function of \(T\).
Summary
- The classical Heisenberg model (continuous spins on \(S^2\)) requires correct uniform sampling of \(S^2\).
- Over-relaxation dramatically reduces \(\tau_{\rm int}\) at low cost.
- The 2D XY model: KT transition at \(T_{\rm KT} = \pi J/2\), driven by vortex unbinding.
- Frustration (triangular AF, kagomé, spin ice) leads to macroscopic ground-state degeneracy and exotic physics.
- Spin-spin correlation functions and helicity modulus are key MC observables.
Advanced MCMC & Free Energy Methods
7.1 The Free Energy Problem
Most MC methods sample the canonical distribution and give direct access to \(\langle A \rangle\) but not to the free energy \(F = -k_B T \ln Z\). The free energy requires the partition function \(Z\), which is not a thermal average. Yet \(F\) is central to:
- Phase equilibria (which phase is stable at a given \(T, P\)?).
- Chemical reaction rates (via the activation free energy).
- Protein folding (stability of folded vs. unfolded states).
- Alloy phase diagrams.
This chapter presents three families of methods to access \(F\).
7.2 Parallel Tempering (Replica Exchange MC)
Motivation
At low temperatures, the energy landscape has many local minima separated by high barriers. Ordinary MC takes exponentially long to escape. Parallel tempering runs multiple replicas simultaneously at different temperatures and swaps their configurations.
Algorithm
Run \(M\) replicas at temperatures \(T_1 < T_2 < \cdots < T_M\). After \(n_{\rm swap}\) MC sweeps at each temperature, attempt to swap configurations \(\mathbf{x}k \leftrightarrow \mathbf{x}{k+1}\) between adjacent replicas with acceptance probability:
$$A_{\rm swap} = \min!\left(1,, \exp!\left[(\beta_{k+1} - \beta_k)(H(\mathbf{x}k) - H(\mathbf{x}{k+1}))\right]\right). \tag{7.1}$$
Detailed balance is satisfied. High-temperature replicas explore freely; accepted swaps inject these configurations into the low-temperature replicas.
Choosing Temperatures
Target a swap acceptance rate of 20–40%. This requires overlapping energy distributions. A rule of thumb for the spacing:
$$\Delta\beta = \beta_{k+1} - \beta_k \approx \sqrt{2 / N_{\rm dof}} \cdot \beta_k, \tag{7.2}$$
where \(N_{\rm dof}\) is the number of degrees of freedom.
Note
Protein Folding Free Energy Landscapes Parallel tempering (also called Replica Exchange MD or REMD in the MD context) is the standard method for sampling biomolecular free energy landscapes. For a 50-residue peptide, 32–64 replicas spanning 280–600 K allow exploration of the folded, partially folded, and unfolded states. The free energy profile \(F(\text{RMSD from native})\) reveals the folding funnel and barriers.
Note
Nucleation in First-Order Transitions For a supersaturated Lennard-Jones liquid below \(T_c\), parallel tempering can cross the nucleation barrier (\(\sim 10^2 k_B T\)) that traps ordinary MC in the metastable phase. The critical nucleus size and nucleation rate are extracted from the sampled free energy surface.
7.3 Wang-Landau Algorithm
Instead of sampling at fixed \(T\), estimate the density of states \(g(E)\) defined by:
$$Z(T) = \int g(E), e^{-\beta E}, dE. \tag{7.3}$$
Once \(g(E)\) is known, the partition function and free energy are available at all temperatures from a single simulation.
Algorithm
- Start with \(g(E) = 1\) and a flat histogram \(H(E) = 0\. Set modification factor \(f = e^1\).
- Perform random walk in energy space. At each step, propose a new state \(E’\) and accept with: $$A = \min!\left(1, \frac{g(E)}{g(E’)}\right). \tag{7.4}$$
- Update: \(g(E) \to g(E) \times f\) and \(H(E) \to H(E) + 1\).
- When \(H(E)\) is flat (all energies visited equally), set \(f \to \sqrt{f}\) and reset \(H(E) = 0\).
- Stop when \(f < e^{10^{-8}}\) (convergence criterion).
Note
Extracting Thermodynamics From \(g(E)\), compute: $$Z(T) = \sum_E g(E) e^{-\beta E}, \quad F(T) = -k_B T \ln Z(T), \quad C_V(T) = \frac{\partial^2 (F/T)}{\partial(1/T)^2}.$$ This gives the complete thermal behaviour from a single WL run.
Note
Ising Model Density of States For the \(L = 16\) 2D Ising model (4096 energies), the Wang-Landau algorithm converges to \(g(E)\) in \(\sim 10^8\) steps. The resulting \(F(T)\) matches the Onsager exact result to \(\sim 0.1%\) across the full temperature range, including the peak in \(C_V\) at \(T_c\).
7.4 Umbrella Sampling and WHAM
The Problem
Some physical processes involve a rare event between state A and state B separated by a free energy barrier \(\Delta F^\ddagger \gg k_B T\). Ordinary MC never crosses the barrier.
Umbrella Sampling
Introduce a biasing potential \(V_{\rm bias}(\xi)\) centred on a value of the reaction coordinate \(\xi\):
$$H_{\rm biased}(\mathbf{x}) = H(\mathbf{x}) + V_{\rm bias}(\xi(\mathbf{x})). \tag{7.5}$$
Common choice: harmonic umbrella \(V_{\rm bias}(\xi) = \frac{1}{2}K(\xi - \xi_0)^2\). Run independent simulations at each \(\xi_0\) (each “window”), covering the reaction coordinate from A to B.
WHAM: Weighted Histogram Analysis Method
Combine the biased histograms from all windows into the unbiased free energy profile:
$$F(\xi) = -k_B T \ln P(\xi) + \text{const.} \tag{7.6}$$
WHAM solves a self-consistent set of equations to extract \(F(\xi)\) with optimal statistical efficiency.
Note
Solvation Free Energy of NaCl in Water The free energy of dissolving NaCl: \(\Delta F_{\rm solv} = F_{\rm ion pair in water} - F_{\rm ion pair in vacuum}\). Umbrella sampling along the Na-Cl distance \(r\) from 2.4 Å to 10 Å, with 16 windows and \(K = 40\) kcal/mol/Ų, gives the full potential of mean force. The minimum at \(r \approx 2.7\) Å (contact ion pair) and barrier at \(r \approx 4\) Å (solvent-separated pair) match experiment.
Note
Alloy Phase Diagrams via Free Energy Integration For a Cu-Au alloy, the free energy difference between ordered (L1₀) and disordered (A1) phases as a function of composition is computed by thermodynamic integration (see §7.5). The calculated phase diagram matches the experimental one to within 30 K in \(T_c\).
7.5 Free Energy Perturbation and Thermodynamic Integration
Free Energy Perturbation (FEP)
Define a parameter \(\lambda\) that interpolates between two Hamiltonians: \(H_0\) (state A) and \(H_1\) (state B).
$$\Delta F = F_B - F_A = -k_B T \ln \left\langle e^{-\beta(H_1 - H_0)} \right\rangle_A. \tag{7.7}$$
Equation (7.7) is exact but only converges when the energy distributions of A and B significantly overlap. For large \(\Delta F\), one must use intermediate states \(\lambda = 0, 0.1, 0.2, \ldots, 1.0\).
Thermodynamic Integration (TI)
$$\Delta F = \int_0^1 \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_\lambda d\lambda. \tag{7.8}$$
Compute the integrand at discrete \(\lambda\) values by MC simulations; integrate numerically. More stable than FEP for large \(\Delta F\).
Note
Drug Binding Affinity Calculations The relative binding affinity of two drug candidates A and B to a protein target: $$\Delta\Delta F_{\rm bind} = \Delta F_{\rm bind}^B - \Delta F_{\rm bind}^A$$ is computed via a thermodynamic cycle using FEP or TI. This is standard practice in pharmaceutical MC/MD, with GROMACS and NAMD implementing TI natively. Typical accuracy: 1–2 kcal/mol.
7.6 Computational Lab — Class 12
Objectives:
-
Implement Wang-Landau for the 1D Ising model (\(N = 20\) spins). Recover \(g(E)\) and compute \(F(T)\), \(C_V(T)\), \(M(T)\). Compare to the exact transfer-matrix result.
-
Use umbrella sampling to compute the potential of mean force between two Lennard-Jones particles in 2D. Compare to the pair distribution function \(g(r)\) from a direct Metropolis simulation.
-
(Bonus) Implement parallel tempering for the 2D Ising model at \(L = 16\) with 4 temperatures bracketing \(T_c\). Monitor the swap acceptance rate and the replica diffusion in temperature space.
Summary
- Parallel tempering: multiple replicas at different \(T\); swap configurations to overcome barriers.
- Wang-Landau: iteratively flatten the energy histogram to estimate \(g(E)\); gives \(F(T)\) at all temperatures.
- Umbrella sampling + WHAM: sample rare events with a biasing potential; remove the bias analytically.
- FEP / TI: compute free energy differences via perturbation theory or numerical integration over \(\lambda\).
- All methods obey detailed balance; errors can be estimated by standard MC techniques.
Kinetic Monte Carlo
8.1 From Equilibrium to Dynamics
Standard MCMC samples the equilibrium distribution. It does not provide physical time: the MC “time” (number of sweeps) is not proportional to real time, and the sequence of configurations is not a physical trajectory.
Kinetic Monte Carlo (KMC) is different: it is a stochastic simulation of the master equation, giving a physically correct time evolution of a system between rare events (hops, reactions, emissions). KMC is the method of choice when:
- Dynamics is governed by thermally activated processes with known rates.
- The system dwells for long periods in a state, then transitions rapidly to another.
- One wants to access timescales far beyond MD (ns to hours or years).
The Master Equation
Let \(P_\alpha(t)\) be the probability of being in state \(\alpha\) at time \(t\). The master equation is:
$$\frac{dP_\alpha}{dt} = \sum_{\beta \neq \alpha} \left[ k_{\beta \to \alpha} P_\beta - k_{\alpha \to \beta} P_\alpha \right], \tag{8.1}$$
where \(k_{\alpha \to \beta}\) is the transition rate from state \(\alpha\) to state \(\beta\). KMC is an exact stochastic simulation of equation (8.1).
Note
KMC vs. MD MD integrates Newton’s equations and is limited to \(\sim\)ns timescales by the MD timestep (\(\sim\)1 fs). KMC jumps between metastable states, accessing \(\mu\)s–s timescales. The price: rates \(k_{\alpha\to\beta}\) must be provided as input.
8.2 Transition State Theory and Arrhenius Rates
The rate of a thermally activated process is given by Arrhenius theory:
$$k = \nu_0, e^{-E_a / k_B T}, \tag{8.2}$$
where \(E_a\) is the activation energy (energy barrier) and \(\nu_0\) is an attempt frequency (typically \(\sim 10^{12}\)–\(10^{13}\) s\(^{-1}\) for solids, related to the Debye frequency).
More precisely, Transition State Theory (Eyring) gives:
$$k = \frac{k_B T}{h} e^{-\Delta G^\ddagger / k_B T} = \frac{k_B T}{h} e^{\Delta S^\ddagger / k_B} e^{-\Delta H^\ddagger / k_B T}, \tag{8.3}$$
where \(\Delta G^\ddagger\), \(\Delta H^\ddagger\), \(\Delta S^\ddagger\) are the free energy, enthalpy, and entropy of activation.
Note
Surface Diffusion: Adatom Hopping on Cu(100) A Cu adatom on a Cu(100) surface hops between four-fold hollow sites. The barrier \(E_a = 0.49\) eV (from DFT/NEB) and \(\nu_0 = 3 \times 10^{12}\) s\(^{-1}\) give a hop rate at 300 K: $$k = 3\times10^{12} \times e^{-0.49/0.026} \approx 10^{-6} \text{ hops/s}.$$ At 700 K: \(k \approx 3\times10^8\) hops/s. KMC bridges these 14 decades in rate.
Computing Barriers: NEB Method
The Nudged Elastic Band (NEB) method finds the minimum energy path between two states by relaxing a chain of images connected by spring forces:
$$\mathbf{F}_i = -\nabla E(\mathbf{r}_i) + \mathbf{F}^{\rm spring}_i. \tag{8.4}$$
The saddle point (transition state) gives \(E_a\). NEB is implemented in VASP, GPAW, and ASE.
8.3 The BKL Algorithm (Bortz-Kalos-Lebowitz)
Also called the residence-time algorithm or Gillespie algorithm (Gillespie 1977 in chemistry), the n-fold way (Bortz et al. 1975 in physics).
Setup: at each state \(\alpha\), there are \(M\) possible processes with rates \(k_1, k_2, \ldots, k_M\). Define the total rate:
$$K_{\rm tot} = \sum_{j=1}^M k_j. \tag{8.5}$$
Algorithm:
- Compute \(K_{\rm tot}\) and cumulative rates \(R_j = \sum_{i \leq j} k_i\).
- Draw \(u_1 \sim \mathcal{U}[0,1]\). Execute event \(j^\) where \(R_{j^-1} < u_1 K_{\rm tot} \leq R_{j^*}\).
- Advance time by \(\Delta t = -\ln(u_2) / K_{\rm tot}\) where \(u_2 \sim \mathcal{U}[0,1]\).
- Update the system state and all affected rates.
- Repeat.
This is rejection-free: every step executes a physical event. There is no wasted computation.
Note
Physical Time in KMC The time increment \(\Delta t = -\ln u / K_{\rm tot}\) is an exponentially distributed waiting time. Summing these gives physical time in seconds (if rates are in s\(^{-1}\)). This is exact — not an approximation.
8.4 KMC for Surface Processes: Crystal Growth
A paradigmatic KMC application is thin-film deposition and crystal growth. Processes included:
| Process | Rate |
|---|---|
| Adsorption from gas phase | \(k_{\rm ads} = F \cdot \exp(-E_{\rm ads}/k_BT) / \nu_0\) |
| Terrace diffusion | \(k_{\rm diff} = \nu_0, e^{-E_{\rm diff}/k_BT}\) |
| Edge diffusion | \(k_{\rm edge} = \nu_0, e^{-E_{\rm edge}/k_BT}\) |
| Desorption | \(k_{\rm des} = \nu_0, e^{-E_{\rm des}/k_BT}\) |
| Island nucleation | \(k_{\rm nuc} = \nu_0, e^{-E_{\rm nuc}/k_BT}\) |
Note
Silicon MBE: Growth Mode Selection KMC simulations of Si(001) molecular beam epitaxy (MBE) with DFT-computed barriers reproduce the experimentally observed crossover from layer-by-layer (Frank-van der Merwe) to island (Volmer-Weber) growth as a function of deposition rate \(F\) and temperature \(T\). The critical island density \(N_x \sim (F/\nu_0)^{1/3} e^{E_{\rm diff}/3k_BT}\) is directly measurable in KMC.
Note
Thin-Film Deposition: TiN Coatings KMC models of TiN(001) growth (hard coating for cutting tools) capture the competition between N₂ dissociation, Ti surface diffusion (\(E_a = 0.3\) eV), and incorporation at step edges. Simulations over \(10^{10}\) events (physical time \(\sim\) 1 ms) reproduce the experimentally observed columnar microstructure.
8.5 KMC in Heterogeneous Catalysis
Catalytic surface reactions are ideally suited to KMC: individual elementary steps have well-defined rates; experiments access steady-state turnover frequencies (TOF); timescales are far beyond MD.
CO Oxidation on Pt(111)
The Langmuir-Hinshelwood mechanism:
$$\text{CO} + * \xrightarrow{k_{\rm ads}^{\rm CO}} \text{CO}^, \quad \text{O}_2 + 2 \xrightarrow{k_{\rm ads}^{\rm O_2}} 2\text{O}^, \quad \text{CO}^ + \text{O}^* \xrightarrow{k_{\rm rxn}} \text{CO}_2 + 2*. \tag{8.6}$$
KMC processes:
- CO adsorption/desorption (\(E_a^{\rm des} = 1.45\) eV)
- O₂ dissociative adsorption (\(E_a = 0\))
- CO diffusion (\(E_a = 0.5\) eV)
- O diffusion (\(E_a = 0.7\) eV)
- CO + O reaction (\(E_a = 1.0\) eV, Brønsted-Evans-Polanyi)
Note
Reproducing the Ertl Oscillations At certain \(T\) and \(P_{\rm CO}/P_{\rm O_2}\) ratios, the CO oxidation rate on Pt oscillates (discovered by Ertl, Nobel Prize 2007). KMC simulations on a \(128 \times 128\) Pt(110) lattice with CO, O, and surface reconstruction reproduce the kinetic oscillations, bistability, and the CO-poisoning transition — purely from elementary rate constants.
8.6 KMC for Vacancy Migration and Diffusion in Solids
Vacancy Diffusion in FCC Metals
In a pure FCC metal (e.g., Cu, Ni), a vacancy diffuses by hopping between nearest-neighbour sites. The rate for a single hop:
$$k_{\rm hop} = \nu_0, e^{-E_m / k_B T}, \tag{8.7}$$
where \(E_m\) is the migration energy. The diffusion coefficient:
$$D = a^2 k_{\rm hop} / z, \tag{8.8}$$
where \(a\) is the lattice constant and \(z\) is the coordination number.
Note
Alloy Ordering Kinetics In a binary A-B alloy, the vacancy-mediated ordering kinetics (e.g., Ni₃Al) involve dozens of distinct hop rates depending on the local chemical environment. KMC with pair interaction energies from cluster expansion (fitted to DFT) reproduces the experimentally observed ordering time at 700°C to within a factor of 2.
Note
Radioactive Transmutation Chains KMC naturally handles decay chains: ²³⁸U → ²³⁴Th → ²³⁴Pa → ²³⁴U → … → ²⁰⁶Pb. Each decay is a Poisson process with rate \(\lambda = \ln 2 / t_{1/2}\). KMC gives the exact time evolution of each isotope’s population, including correlations — more informative than the Bateman equations when starting from small atom counts (e.g., a single ²³⁸U nucleus).
8.7 Advanced KMC: Adaptive and Off-Lattice Methods
On-Lattice vs. Off-Lattice KMC
Standard KMC uses a fixed lattice. Off-lattice KMC (also: adaptive KMC, AKMC) is needed when the transition states are not known a priori (e.g., glasses, amorphous materials).
AKMC Algorithm:
- From current state, use a saddle-point search method (dimer, ART) to find transition states.
- Compute rates via harmonic TST.
- Execute one event via BKL.
- Repeat.
Note
Diffusion in Amorphous Silicon In a-Si:H (hydrogenated amorphous silicon), H atoms diffuse through a disordered network of Si-H bonds, Si-Si bonds, and voids. AKMC identifies thousands of distinct hop environments. The simulated diffusion coefficient \(D(T)\) agrees with experiment over 10 decades in \(D\).
Parallel KMC
For large systems (\(10^6\) sites), divide the lattice into domains and run KMC in each domain in parallel, synchronising at regular intervals (Synchronous Sublattice KMC, Shim & Amar 2005).
8.8 Computational Lab — Class 14
Objectives:
-
Implement lattice KMC for adatom diffusion on a 2D square lattice with 4 degenerate hop rates \(k = \nu_0 e^{-E_a/k_BT}\). Compute the mean-square displacement \(\langle r^2(t) \rangle\) and extract the diffusion coefficient \(D(T)\). Compare to \(D = k a^2 / 4\). Plot \(\ln D\) vs. \(1/T\) (Arrhenius plot).
-
Add deposition from a beam (rate \(F\) per site) and desorption. Run KMC for 10⁷ events. Visualise island nucleation and growth. Measure island density \(N_x\) as a function of \(T\) and \(F\).
-
Implement a simple 2-species reaction A + B → C on a lattice with diffusion and adsorption. Measure the steady-state TOF as a function of temperature and compare to the mean-field rate equation.
Note
Skeleton Code Use a sorted list or a Fenwick tree for efficient event selection in the BKL algorithm. See Appendix C for pseudocode. NumPy’s
np.searchsortedimplements \(\mathcal{O}(\log M)\) event selection.
Summary
- KMC solves the master equation exactly for systems with known transition rates.
- BKL (Gillespie) algorithm: rejection-free, physical time, scales as \(\mathcal{O}(M)\) or \(\mathcal{O}(\log M)\) per event.
- Rates come from Arrhenius/TST theory and DFT/NEB barrier calculations.
- Applications: surface growth (MBE, CVD), heterogeneous catalysis (CO oxidation), vacancy diffusion in alloys, decay chain kinetics.
- Off-lattice / adaptive KMC extends the method to systems with unknown transition states.
Applications, Error Analysis & Special Topics
9.1 Statistical Error Analysis in MC
Reliable error quantification is as important as the simulation itself. This section collects the essential tools.
9.1.1 Blocking Analysis
Divide \(N\) correlated MC samples into \(B\) blocks of size \(b = N/B\). Compute block averages \(\bar{A}_k\). The block variance:
$$\sigma_B^2 = \frac{1}{B(B-1)} \sum_{k=1}^B (\bar{A}_k - \bar{A})^2. \tag{9.1}$$
Plot \(\sigma_B^2\) vs. block size \(b\). When \(b \gg \tau_{\rm int}\), the block averages are independent and \(\sigma_B^2\) plateaus. This plateau gives the true statistical error.
Warning
Never trust naive variance If you ignore autocorrelations, your error bars will be underestimated by a factor \(\sqrt{2\tau_{\rm int}}\). For Metropolis near \(T_c\), this can be a factor of 100.
9.1.2 Bootstrap Resampling
Draw \(N^*\) bootstrap samples (with replacement) from the \(N\) MC data points. Compute the estimator for each bootstrap sample. The bootstrap distribution of the estimator approximates its sampling distribution.
For a function \(\theta = g(\langle A \rangle, \langle B \rangle)\) of correlated averages (e.g., \(\chi = \beta(\langle M^2 \rangle - \langle M \rangle^2)\)):
$$\hat{\sigma}\theta = \text{std}[\theta_1^, \theta_2^, \ldots, \theta{N^}^]. \tag{9.2}$$
Bootstrap handles nonlinear estimators naturally, unlike error propagation.
9.1.3 Jackknife Resampling
Leave out the \(k\)-th data point and compute the estimator on the remaining \(N-1\) points:
$$\hat{\theta}k = g({A_i}{i \neq k}). \tag{9.3}$$
The jackknife variance:
$$\sigma_{\rm JK}^2 = \frac{N-1}{N}\sum_{k=1}^N (\hat{\theta}_k - \bar{\theta})^2. \tag{9.4}$$
Jackknife is \(\mathcal{O}(N)\) faster than bootstrap for expensive estimators. It is exact for linear estimators and a good approximation for smooth nonlinear functions.
Note
Extracting Critical Exponents The Binder cumulant \(U_4 = 1 - \langle M^4 \rangle / (3 \langle M^2 \rangle^2)\) is a nonlinear function of MC data. Jackknife correctly propagates errors through the nonlinearity. For the 2D Ising model with \(L = 64\) and \(N = 10^6\) sweeps, jackknife gives \(\sigma_{U_4} \approx 0.002\), enabling precise location of \(T_c\).
9.2 Bayesian Inference and MCMC
Bayesian inference treats model parameters \(\theta\) as random variables with a posterior distribution:
$$p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta), p(\theta)}{p(\mathcal{D})} \propto \mathcal{L}(\theta), \pi(\theta), \tag{9.5}$$
where \(\mathcal{L}(\theta) = p(\mathcal{D} \mid \theta)\) is the likelihood and \(\pi(\theta)\) is the prior. The posterior is precisely the distribution one wants to sample — and MCMC is the tool.
Application to Equation of State Fitting
Given MD/MC simulation data \({(T_i, P_i, V_i)}\) with uncertainties \(\sigma_i\), fit a van der Waals or Redlich-Kwong EOS:
$$p(\mathbf{r}) = \frac{Nk_BT}{V - Nb} - \frac{aN^2}{V^2}. \tag{9.6}$$
The posterior \(p(a, b \mid {P_i})\) is sampled via Metropolis. The resulting marginal posteriors give \(a) and \(b\) with full uncertainty quantification — more informative than a least-squares fit.
Note
Astrophysical Parameter Estimation The posterior distribution of cosmological parameters (\(\Omega_m, \Omega_\Lambda, H_0, \ldots\)) given CMB power spectrum data is sampled by MCMC — specifically, the COSMOMC and Cobaya codes. The 6-parameter \(\Lambda\)CDM model posterior lives in a curved, banana-shaped manifold best explored by Hamiltonian Monte Carlo.
Note
Neutron Star Equation of State Gravitational wave events (e.g., GW170817) constrain the neutron star EOS via Bayesian inference. MCMC samples the posterior over EOS parameters (nuclear symmetry energy, skewness) given the measured tidal deformabilities. Each likelihood evaluation requires solving the Tolman-Oppenheimer-Volkoff equations — making efficiency critical.
9.3 Monte Carlo in Particle Physics
Detector Simulation: Geant4
Geant4 (CERN) is the standard toolkit for simulating particle interactions with matter. It uses MC to:
- Propagate particles through detector geometry (ionisation energy loss, Bethe-Bloch formula).
- Simulate electromagnetic showers (pair production, bremsstrahlung, Compton scattering).
- Simulate hadronic showers (nuclear interactions).
Each particle track is an independent MC trajectory. The LHC experiments run \(\sim 10^9\) simulated events per year — the dominant computational cost.
Note
Calorimeter Response In the ATLAS electromagnetic calorimeter (liquid argon), an incoming 100 GeV electron produces a shower of \(\sim 10^5\) particles. Geant4 tracks each particle using MC sampling of the cross-sections for bremsstrahlung and pair production. The simulated energy deposit matches the measured 1.5% energy resolution.
Phase Space Sampling: RAMBO
For \(n\)-body final states in particle decays, the phase space measure is:
$$d\Phi_n = \prod_{i=1}^n \frac{d^3 p_i}{(2\pi)^3 2E_i} \cdot (2\pi)^4 \delta^{(4)}!\left(\sum p_i - P\right). \tag{9.7}$$
RAMBO (Kleiss, Stirling & Ellis 1985) samples this uniformly in \(3n - 4\) dimensions. For \(n = 10\) (26 dimensions), MC is the only feasible approach.
9.4 Neutron and Photon Transport
The radiative transfer equation for photon specific intensity \(I_\nu\):
$$\frac{dI_\nu}{ds} = -\kappa_\nu I_\nu + j_\nu, \tag{9.8}$$
where \(\kappa_\nu\) is the absorption coefficient and \(j_\nu\) is the emission coefficient, is solved by MC by tracking photon packets:
- Emit a photon packet with frequency \(\nu\) sampled from the source spectrum.
- Sample the free path \(\ell = -\ln u / \kappa_\nu\).
- At the interaction site: scatter (sample new direction from the phase function) or absorb.
- Repeat until the packet escapes or is absorbed.
Note
Stellar Interior Opacity MC transport codes (e.g., SEDONA, Cloudy) compute radiative opacities in stellar atmospheres by tracking \(10^6\)–\(10^7\) photon packets through a model atmosphere with thousands of atomic lines. The resulting synthetic spectrum is compared to the observed one to determine stellar parameters (\(T_{\rm eff}\), \(\log g\), metallicity).
Note
Nuclear Reactor Shielding: MCNP MCNP (Los Alamos) uses MC to simulate neutron and photon transport through reactor shielding. Neutrons are tracked through the geometry, with cross-sections sampled from the ENDF library at each interaction point. Variance reduction (importance sampling, splitting) is essential to reach the deep-penetration regime where attenuation is \(10^{-10}\) of the source.
9.5 Computational Best Practices
Reproducibility
Tip
Always do these
- Seed your RNG and record the seed in your output.
- Save the full configuration at intervals for restart.
- Version-control your code (git).
- Record all parameters (\(N, T, L, N_{\rm steps}, N_{\rm burn}\)) in the output file header.
- Report error bars on every published number.
Parallelisation
- Trivially parallel: run \(M\) independent replicas with different seeds. Combine for better statistics.
- Domain decomposition: split the lattice across CPUs; communicate boundary spins (MPI).
- GPU acceleration: for Ising/Heisenberg on large lattices, checkerboard updates allow full vectorisation. A modern GPU achieves \(\sim 10^{10}\) spin flips/second.
Performance Tips for Python
# Use vectorised NumPy, avoid Python loops over spins
# For L=64 Ising, full-lattice checkerboard update:
sublattice_A = spins[::2, ::2] # even-even sites
local_field = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0) +
np.roll(spins, 1, 1) + np.roll(spins, -1, 1))[::2, ::2]
delta_E = 2 * J * sublattice_A * local_field
accept = np.random.random(sublattice_A.shape) < np.exp(-beta * delta_E)
spins[::2, ::2] = np.where(accept, -sublattice_A, sublattice_A)
For production runs, use Numba JIT or C extensions for 10–100× speedup over pure Python.
9.6 Final Project Discussion
The course concludes with student project presentations. Suggested project topics:
- 3D Ising model: phase diagram and critical exponents vs. series expansion results.
- Heisenberg model on the Pyrochlore lattice: spin ice signatures.
- KMC of CO₂ → CO + O dissociation on a Ru catalyst surface.
- Parallel tempering for a frustrated spin glass (\(J_{ij} = \pm J\) random).
- Bayesian EOS fitting to LJ simulation data.
- Wang-Landau calculation of the Potts model density of states and the order of the transition for \(q = 4, 5, 6\).
Summary
This chapter collected tools for:
- Error analysis: blocking, bootstrap, jackknife — essential for autocorrelated MC data.
- Bayesian inference: MCMC samples posterior distributions; fundamental to modern data analysis in physics.
- Particle physics: Geant4 detector simulation; phase space MC.
- Transport: photon and neutron MC; the radiative transfer equation.
- Best practices: reproducibility, parallelisation, GPU acceleration.
Note
The Road Ahead The methods in this course underlie essentially all quantitative computational physics. The natural extensions are: Hamiltonian Monte Carlo (Ch. 5 of Betancourt’s review), normalising flows for enhanced sampling, and machine-learning-accelerated MC (using neural network potentials as fast surrogates for DFT). The foundation you have built is directly applicable to all of these.
Appendix A: Python Reference for Monte Carlo
This appendix collects skeleton code for the computational labs and a reference for the key libraries.
A.1 Environment Setup
# Recommended: create a dedicated conda environment
conda create -n mc_course python=3.11
conda activate mc_course
pip install numpy scipy matplotlib numba tqdm
pip install vegas # for adaptive MC integration
A.2 Random Number Generation
import numpy as np
# Modern API: always use default_rng with a seed
rng = np.random.default_rng(seed=42)
# Uniform samples
u = rng.uniform(0, 1, size=10_000)
# Standard normal
z = rng.standard_normal(size=10_000)
# Exponential (radioactive decay lifetimes)
lam = 1.0 # decay constant
t = rng.exponential(scale=1/lam, size=10_000)
# or via inverse CDF:
t_inv = -np.log(rng.uniform(size=10_000)) / lam
# Uniform on S^2 (correct method)
cos_theta = rng.uniform(-1, 1, size=1000)
phi = rng.uniform(0, 2*np.pi, size=1000)
sin_theta = np.sqrt(1 - cos_theta**2)
x = sin_theta * np.cos(phi)
y = sin_theta * np.sin(phi)
z = cos_theta # unit vectors on sphere
A.3 Maxwell-Boltzmann Sampling (Chapter 2 Lab)
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi
def sample_maxwell_boltzmann(T, m, N, rng=None):
"""Sample N speeds from Maxwell-Boltzmann at temperature T."""
if rng is None:
rng = np.random.default_rng()
kB = 1.380649e-23
sigma = np.sqrt(kB * T / m)
# v = sqrt(vx^2 + vy^2 + vz^2), vx ~ N(0, sigma)
v = np.sqrt(np.sum(rng.normal(0, sigma, size=(N, 3))**2, axis=1))
return v
# Argon parameters
T = 300.0 # K
m = 6.634e-26 # kg (Ar mass)
N = 100_000
rng = np.random.default_rng(seed=0)
v = sample_maxwell_boltzmann(T, m, N, rng)
kB = 1.380649e-23
v_p = np.sqrt(2 * kB * T / m) # most probable speed
print(f"<v> = {np.mean(v):.1f} m/s (theory: {np.sqrt(8*kB*T/(np.pi*m)):.1f})")
print(f"v_rms = {np.sqrt(np.mean(v**2)):.1f} m/s (theory: {np.sqrt(3*kB*T/m):.1f})")
print(f"v_mp = {v_p:.1f} m/s (bin peak ~ {v[np.argmax(np.histogram(v,100)[0])]:.1f})")
A.4 Metropolis for the 1D Double-Well (Chapter 4 Lab)
import numpy as np
def double_well(x):
return (x**2 - 1)**2
def metropolis_1d(beta, N_steps, delta, x0=0.0, seed=42):
rng = np.random.default_rng(seed)
x = x0
trajectory = np.zeros(N_steps)
n_accept = 0
for t in range(N_steps):
x_new = x + rng.uniform(-delta, delta)
dE = double_well(x_new) - double_well(x)
if np.log(rng.uniform()) < -beta * dE:
x = x_new
n_accept += 1
trajectory[t] = x
return trajectory, n_accept / N_steps
beta = 2.0
traj, acc = metropolis_1d(beta, N_steps=500_000, delta=0.5)
print(f"Acceptance rate: {acc:.2f}")
A.5 2D Ising Model (Chapter 5 Lab)
import numpy as np
def ising_energy(spins, J=1.0):
"""Total energy of 2D Ising model with periodic BC."""
E = -J * np.sum(spins * (
np.roll(spins, 1, 0) + np.roll(spins, -1, 0) +
np.roll(spins, 1, 1) + np.roll(spins, -1, 1)
))
return E / 2 # avoid double-counting
def ising_metropolis(L, T, N_sweeps, J=1.0, seed=42):
rng = np.random.default_rng(seed)
beta = 1.0 / T
spins = rng.choice([-1, 1], size=(L, L))
# Pre-compute acceptance probabilities
dE_vals = np.array([-8, -4, 0, 4, 8]) * J
acc_prob = {dE: min(1.0, np.exp(-beta * dE)) for dE in dE_vals}
E_list, M_list = [], []
for sweep in range(N_sweeps):
# N random single-spin flips per sweep
for _ in range(L * L):
i, j = rng.integers(0, L, size=2)
nb_sum = (spins[(i+1)%L, j] + spins[(i-1)%L, j] +
spins[i, (j+1)%L] + spins[i, (j-1)%L])
dE = 2 * J * spins[i, j] * nb_sum
if rng.uniform() < acc_prob[dE]:
spins[i, j] *= -1
E_list.append(ising_energy(spins, J) / (L*L))
M_list.append(np.mean(spins))
return np.array(E_list), np.array(M_list)
# Example: L=32, T=2.27 (near Tc)
E, M = ising_metropolis(32, T=2.27, N_sweeps=5000)
print(f"<E>/N = {np.mean(E[1000:]):.4f} ± {np.std(E[1000:])/np.sqrt(4000):.4f}")
print(f"<|M|> = {np.mean(np.abs(M[1000:])):.4f}")
A.6 Wolff Cluster Algorithm (Chapter 5 Lab)
import numpy as np
from collections import deque
def wolff_sweep(spins, beta, J=1.0, rng=None):
if rng is None:
rng = np.random.default_rng()
L = spins.shape[0]
p_add = 1 - np.exp(-2 * beta * J)
# Pick random seed spin
i0, j0 = rng.integers(0, L, size=2)
cluster_spin = spins[i0, j0]
in_cluster = np.zeros((L, L), dtype=bool)
in_cluster[i0, j0] = True
queue = deque([(i0, j0)])
while queue:
i, j = queue.popleft()
for di, dj in [(-1,0),(1,0),(0,-1),(0,1)]:
ni, nj = (i+di)%L, (j+dj)%L
if not in_cluster[ni, nj] and spins[ni, nj] == cluster_spin:
if rng.uniform() < p_add:
in_cluster[ni, nj] = True
queue.append((ni, nj))
spins[in_cluster] *= -1
return np.sum(in_cluster) # cluster size
A.7 BKL / Gillespie KMC (Chapter 8 Lab)
import numpy as np
def kmc_adatom_diffusion(L, T, E_a, nu0, N_events, seed=42):
"""
KMC for single adatom diffusion on 2D square lattice.
Returns trajectory (positions) and times.
"""
rng = np.random.default_rng(seed)
kB = 8.617e-5 # eV/K
k_hop = nu0 * np.exp(-E_a / (kB * T))
K_tot = 4 * k_hop # 4 neighbours
x, y = L // 2, L // 2 # start at centre
t = 0.0
times = np.zeros(N_events)
positions = np.zeros((N_events, 2), dtype=int)
moves = np.array([[1,0],[-1,0],[0,1],[0,-1]])
for n in range(N_events):
# Time advance
dt = -np.log(rng.uniform()) / K_tot
t += dt
# Choose event (uniform since all rates equal)
move = moves[rng.integers(4)]
x = (x + move[0]) % L
y = (y + move[1]) % L
times[n] = t
positions[n] = [x, y]
return times, positions
# Example: Cu adatom, E_a = 0.49 eV, T = 600 K
times, pos = kmc_adatom_diffusion(100, T=600, E_a=0.49,
nu0=3e12, N_events=100_000)
# Compute MSD
origin = pos[0]
msd = np.mean((pos - pos[0])**2, axis=1)
D_fit = np.polyfit(times[100:], msd[100:], 1)[0] / 4 # 2D: MSD = 4Dt
print(f"D = {D_fit:.4e} m^2/s")
A.8 Blocking Analysis for Autocorrelated Data
import numpy as np
def blocking_analysis(data, max_block_size=None):
"""
Compute standard error as a function of block size.
Returns block_sizes, standard_errors.
"""
N = len(data)
if max_block_size is None:
max_block_size = N // 4
block_sizes = []
std_errors = []
b = 1
while b <= max_block_size:
n_blocks = N // b
blocks = data[:n_blocks*b].reshape(n_blocks, b).mean(axis=1)
se = np.std(blocks, ddof=1) / np.sqrt(n_blocks)
block_sizes.append(b)
std_errors.append(se)
b = max(b + 1, int(b * 1.5))
return np.array(block_sizes), np.array(std_errors)
# Usage:
# bs, se = blocking_analysis(magnetisation_timeseries)
# plt.semilogx(bs, se); plt.xlabel('Block size'); plt.ylabel('SE')
# True error = plateau value
Appendix B: Key Statistical Mechanics Formulas
B.1 Ensembles
| Ensemble | Fixed variables | Partition function | Free energy |
|---|---|---|---|
| Microcanonical (NVE) | \(N, V, E\) | \(\Omega(E)\) | \(S = k_B \ln \Omega\) |
| Canonical (NVT) | \(N, V, T\) | \(Z = \sum_i e^{-\beta E_i}\) | \(F = -k_BT \ln Z\) |
| Grand canonical (\(\mu\)VT) | \(\mu, V, T\) | \(\mathcal{Z} = \sum_{N,i} e^{-\beta(E_i - \mu N)}\) | \(\Omega = -k_BT \ln \mathcal{Z}\) |
| Isothermal-isobaric (NPT) | \(N, P, T\) | \(\Delta = \sum_i e^{-\beta(E_i + PV_i)}\) | \(G = -k_BT \ln \Delta\) |
B.2 Thermodynamic Relations from MC Data
Given canonical MC samples \({E_k, M_k}\):
$$\langle E \rangle = \frac{1}{N}\sum_k E_k, \qquad \langle M \rangle = \frac{1}{N}\sum_k M_k$$
$$C_V = \frac{\langle E^2 \rangle - \langle E \rangle^2}{k_B T^2} = \frac{\beta^2}{N_{\rm spins}}\left(\langle H^2 \rangle - \langle H \rangle^2\right)$$
$$\chi = \frac{\langle M^2 \rangle - \langle M \rangle^2}{k_B T} = \frac{\beta}{N_{\rm spins}}\left(\langle (MN)^2 \rangle - \langle MN \rangle^2\right)$$
$$P = \frac{Nk_BT}{V} + \frac{\langle \mathcal{W} \rangle}{3V}, \quad \mathcal{W} = -\sum_{i<j} r_{ij} \frac{\partial \phi}{\partial r_{ij}} \quad (\text{virial})$$
B.3 Ising Model Reference Values
2D Square Lattice (Onsager exact solution, \(h = 0\)):
$$k_BT_c = \frac{2J}{\ln(1+\sqrt{2})} \approx 2.2692,J$$
$$M(T) = \left[1 - \sinh^{-4}(2\beta J)\right]^{1/8} \quad (T < T_c)$$
Critical exponents (2D Ising):
| Exponent | Symbol | Value |
|---|---|---|
| Correlation length | \(\nu\) | 1 |
| Magnetisation | \(\beta\) | 1/8 |
| Susceptibility | \(\gamma\) | 7/4 |
| Specific heat | \(\alpha\) | 0 (log divergence) |
| Anomalous dimension | \(\eta\) | 1/4 |
| Dynamic (Metropolis) | \(z\) | ≈ 2.17 |
| Dynamic (Wolff) | \(z\) | ≈ 0.25 |
3D Ising critical exponents (numerical):
| Exponent | Value |
|---|---|
| \(\nu\) | 0.6301 |
| \(\beta\) | 0.3265 |
| \(\gamma\) | 1.2372 |
| \(\eta\) | 0.0364 |
B.4 Heisenberg and XY Models
3D Classical Heisenberg (\(J > 0\)): $$k_B T_c \approx 1.4432,J \quad (\text{simple cubic, MC})$$
2D XY model — Kosterlitz-Thouless transition: $$k_B T_{\rm KT} = \frac{\pi J}{2} \approx 1.5708,J$$
Universal helicity modulus jump: $$\lim_{T \to T_{\rm KT}^-} \Upsilon(T) = \frac{2}{\pi} k_B T_{\rm KT}$$
Mermin-Wagner theorem: \(T_c = 0\) for \(n \geq 2\) in \(d \leq 2\).
B.5 Lennard-Jones Fluid Reference
Lennard-Jones potential: $$\phi(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^6\right]$$
Critical point (reduced units \(T^* = k_BT/\varepsilon\), \(\rho^* = \rho\sigma^3\)): $$T_c^* \approx 1.326, \quad \rho_c^* \approx 0.316, \quad P_c^* \approx 0.129$$
Triple point: \(T_{\rm tp}^* \approx 0.694\), \(\rho_{\rm tp,liq}^* \approx 0.845\).
Argon parameters: \(\varepsilon/k_B = 119.8\) K, \(\sigma = 3.405\) Å.
B.6 Arrhenius and Transition State Theory
$$k = \nu_0, e^{-E_a/k_BT} \quad \text{(Arrhenius)}$$
$$k = \frac{k_BT}{h} e^{-\Delta G^\ddagger/k_BT} = \frac{k_BT}{h} e^{\Delta S^\ddagger/k_B} e^{-\Delta H^\ddagger/k_BT} \quad \text{(Eyring TST)}$$
Typical attempt frequency for solids: \(\nu_0 \sim 10^{12}\text{–}10^{13}\) s\(^{-1}\).
B.7 Statistical Error Formulas
| Quantity | Formula |
|---|---|
| Sample mean | \(\bar{A} = N^{-1}\sum_i A_i\) |
| Sample variance | \(s^2 = (N-1)^{-1}\sum_i(A_i - \bar{A})^2\) |
| Standard error (i.i.d.) | \(\text{SE} = s/\sqrt{N}\) |
| Integrated autocorr. time | \(\tau_{\rm int} = \frac{1}{2} + \sum_{\tau=1}^\infty C(\tau)\) |
| Effective sample size | \(N_{\rm eff} = N/(2\tau_{\rm int})\) |
| True standard error | \(\text{SE}{\rm true} = s/\sqrt{N{\rm eff}}\) |
| Jackknife variance | \(\sigma_{\rm JK}^2 = \frac{N-1}{N}\sum_k(\hat{\theta}_k - \bar{\theta})^2\) |
B.8 Free Energy Methods Reference
Free energy perturbation: $$\Delta F_{A\to B} = -k_BT \ln \langle e^{-\beta(H_B - H_A)} \rangle_A$$
Thermodynamic integration: $$\Delta F_{A\to B} = \int_0^1 \left\langle \frac{\partial H(\lambda)}{\partial\lambda} \right\rangle_\lambda d\lambda$$
Wang-Landau: \(Z(T) = \sum_E g(E) e^{-\beta E}\) from flat-histogram sampling of \(g(E)\).
Parallel tempering swap acceptance: $$A = \min!\left(1, e^{(\beta_k - \beta_{k+1})(E_k - E_{k+1})}\right)$$
Appendix C: Algorithm Pseudocode
Clean, language-agnostic pseudocode for all major algorithms in this course.
C.1 Metropolis–Hastings
INPUT: target p(x), proposal Q(x→x'), initial x, N_steps, β
OUTPUT: trajectory {x^(t)}
x ← x_0
for t = 1 to N_steps:
propose x' ~ Q(x → x')
α ← p(x') Q(x'→x) / [p(x) Q(x→x')]
u ~ Uniform(0, 1)
if u < min(1, α):
x ← x' # accept
record x^(t) ← x
For symmetric proposals (\(Q\) symmetric): \(\alpha = p(x’)/p(x) = e^{-\beta \Delta H}\).
C.2 Gibbs Sampler
INPUT: conditional distributions p(x_i | x_{-i}), initial x, N_steps
OUTPUT: trajectory {x^(t)}
for t = 1 to N_steps:
for i = 1 to d:
x_i ← sample from p(x_i | x_1,...,x_{i-1}, x_{i+1},...,x_d)
record x^(t) ← x
C.3 Ising Metropolis (Single-Spin Flip)
INPUT: L × L spin lattice, β, J, N_sweeps
OUTPUT: time series of E, M
initialise spins randomly ±1
precompute: acc[ΔE] = min(1, exp(-β ΔE)) for ΔE in {-8J,-4J,0,4J,8J}
for sweep = 1 to N_sweeps:
for k = 1 to L²:
pick spin (i,j) at random
ΔE ← 2J · s[i,j] · (s[i+1,j] + s[i-1,j] + s[i,j+1] + s[i,j-1])
u ~ Uniform(0,1)
if u < acc[ΔE]:
s[i,j] ← -s[i,j]
record E ← Σ_<ij> (-J s_i s_j), M ← Σ_i s_i / L²
C.4 Wolff Cluster Algorithm
INPUT: L × L spin lattice, β, J
OUTPUT: updated spins, cluster size
pick random seed (i₀, j₀)
cluster_spin ← s[i₀, j₀]
p_add ← 1 - exp(-2βJ)
cluster ← {(i₀, j₀)}
queue ← {(i₀, j₀)}
while queue not empty:
(i, j) ← pop from queue
for each neighbour (i', j') of (i, j):
if s[i',j'] == cluster_spin and (i',j') not in cluster:
u ~ Uniform(0,1)
if u < p_add:
add (i',j') to cluster and queue
for each (i,j) in cluster:
s[i,j] ← -s[i,j]
return |cluster|
C.5 Wang-Landau Algorithm
INPUT: system, energy range [E_min, E_max], bin width ΔE
OUTPUT: density of states g(E)
initialise: g(E) ← 1 for all E bins
H(E) ← 0 (histogram)
f ← exp(1) ≈ 2.718
flatness_criterion ← 0.8
x ← random initial configuration
E ← energy(x)
while f > exp(1e-8):
propose x' (random spin flip or move)
E' ← energy(x')
u ~ Uniform(0,1)
if u < g(E) / g(E'):
x ← x'
E ← E'
g(E) ← g(E) · f
H(E) ← H(E) + 1
if histogram is flat (min(H) > flatness_criterion · mean(H)):
f ← sqrt(f)
H(E) ← 0 for all E
return g(E)
Flatness check: \(\min_E H(E) > 0.8 \times \langle H \rangle\).
C.6 BKL / Gillespie KMC
INPUT: initial state α, rate catalog {k_j(α)}, T_max
OUTPUT: trajectory of states and times
t ← 0
state ← α
while t < T_max:
# 1. Compute all rates from current state
rates ← [k_1(state), k_2(state), ..., k_M(state)]
K_tot ← sum(rates)
# 2. Draw time increment
u1 ~ Uniform(0,1)
dt ← -ln(u1) / K_tot
t ← t + dt
# 3. Select event
u2 ~ Uniform(0,1)
cumulative ← 0
for j = 1 to M:
cumulative ← cumulative + rates[j]
if u2 · K_tot ≤ cumulative:
execute event j
break
# 4. Update state and rate catalog
state ← new_state_after_event_j
record (t, state)
Event selection can be done in \(\mathcal{O}(\log M)\) using binary search on the cumulative rate array.
C.7 Parallel Tempering
INPUT: M replicas at temperatures T_1 < T_2 < ... < T_M
N_swap steps between swap attempts
OUTPUT: sampled configurations at each temperature
initialise: x_k ~ random, for k = 1...M
for step = 1 to N_total:
# Standard MC/MD at each temperature (in parallel)
for k = 1 to M (parallel):
perform N_swap sweeps of x_k at β_k
# Attempt swaps between adjacent replicas
for k = 1 to M-1 (sequential or checkerboard):
Δ ← (β_{k+1} - β_k) · (H(x_k) - H(x_{k+1}))
u ~ Uniform(0,1)
if u < min(1, exp(Δ)):
swap x_k ↔ x_{k+1}
record x_k for all k
C.8 Umbrella Sampling + WHAM
INPUT: M windows with centres {ξ₀^(m)}, spring constants {K^(m)}
N_steps per window
OUTPUT: free energy profile F(ξ)
# Step 1: Run biased simulations
for m = 1 to M:
H_biased(x) ← H(x) + ½ K^(m) (ξ(x) - ξ₀^(m))²
run N_steps Metropolis with H_biased
record histogram n_m(ξ)
# Step 2: WHAM self-consistent equations
initialise F_m ← 0 for m = 1...M
repeat until convergence:
# Unbiased density
ρ(ξ) ← Σ_m n_m(ξ) / Σ_m N_m exp(β F_m - β V_m(ξ))
# Update free energy offsets
F_m ← -k_BT ln Σ_ξ ρ(ξ) exp(-β V_m(ξ))
F(ξ) ← -k_BT ln ρ(ξ)
C.9 Blocking Analysis
INPUT: time series {A_t}, t = 1...N
OUTPUT: standard error as function of block size b
for b = 1, 2, 4, 8, ..., N/4:
n_blocks ← floor(N / b)
block_means ← [mean(A[k*b : (k+1)*b]) for k in 0..n_blocks-1]
SE(b) ← std(block_means) / sqrt(n_blocks)
True error ← plateau value of SE(b) for large b
τ_int ← SE(b_plateau)² · N / (2 · SE(b=1)²)
Appendix D: Further Reading & References
D.1 Textbooks
Monte Carlo Methods (General)
- Newman, M. E. J. & Barkema, G. T. — Monte Carlo Methods in Statistical Physics (Oxford, 1999). The standard graduate text. Covers Ising, Potts, cluster algorithms, and finite-size scaling in depth.
- Frenkel, D. & Smit, B. — Understanding Molecular Simulation (Academic Press, 3rd ed. 2023). The reference for MC and MD in chemical physics. Comprehensive treatment of free energy methods.
- Landau, D. P. & Binder, K. — A Guide to Monte Carlo Simulations in Statistical Physics (Cambridge, 4th ed. 2014). Encyclopaedic coverage of algorithms and applications.
- Krauth, W. — Statistical Mechanics: Algorithms and Computations (Oxford, 2006). Elegant and rigorous; excellent on cluster algorithms and MCMC theory.
Probability and Statistics
- Gelman, A. et al. — Bayesian Data Analysis (CRC Press, 3rd ed. 2013). Standard reference for Bayesian inference and MCMC.
- Robert, C. P. & Casella, G. — Monte Carlo Statistical Methods (Springer, 2nd ed. 2004). Mathematically rigorous treatment of MCMC convergence.
Kinetic Monte Carlo
- Voter, A. F. — “Introduction to the Kinetic Monte Carlo Method”, in Radiation Effects in Solids (Springer, 2007). The clearest introductory review.
- Reuter, K. — “First-Principles Kinetic Monte Carlo Simulations for Heterogeneous Catalysis”, in Modelling and Simulation of Heterogeneous Catalytic Reactions (Wiley-VCH, 2011).
D.2 Key Papers
Algorithms
- Metropolis, N. et al. — “Equation of State Calculations by Fast Computing Machines”, J. Chem. Phys. 21, 1087 (1953). The original Metropolis paper.
- Hastings, W. K. — “Monte Carlo Sampling Methods Using Markov Chains and Their Applications”, Biometrika 57, 97 (1970).
- Swendsen, R. H. & Wang, J.-S. — “Nonuniversal Critical Dynamics in Monte Carlo Simulations”, Phys. Rev. Lett. 58, 86 (1987). Cluster algorithm.
- Wolff, U. — “Collective Monte Carlo Updating for Spin Systems”, Phys. Rev. Lett. 62, 361 (1989).
- Wang, F. & Landau, D. P. — “Efficient, Multiple-Range Random Walk Algorithm to Calculate the Density of States”, Phys. Rev. Lett. 86, 2050 (2001).
- Geyer, C. J. — “Markov Chain Monte Carlo Maximum Likelihood”, Proc. 23rd Symp. Interface (1991). Parallel tempering.
- Bortz, A. B., Kalos, M. H. & Lebowitz, J. L. — “A New Algorithm for Monte Carlo Simulation of Ising Spin Systems”, J. Comput. Phys. 17, 10 (1975). The BKL algorithm.
- Gillespie, D. T. — “Exact Stochastic Simulation of Coupled Chemical Reactions”, J. Phys. Chem. 81, 2340 (1977).
Exact Results
- Onsager, L. — “Crystal Statistics. I. A Two-Dimensional Model with an Order-Disorder Transition”, Phys. Rev. 65, 117 (1944).
- Kosterlitz, J. M. & Thouless, D. J. — “Ordering, Metastability and Phase Transitions in Two-Dimensional Systems”, J. Phys. C 6, 1181 (1973).
- Mermin, N. D. & Wagner, H. — “Absence of Ferromagnetism or Antiferromagnetism in One- or Two-Dimensional Isotropic Heisenberg Models”, Phys. Rev. Lett. 17, 1133 (1966).
Applications
- Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. — “Optimization by Simulated Annealing”, Science 220, 671 (1983).
- Reuter, K. & Scheffler, M. — “Composition, Structure, and Stability of RuO₂(110) as a Function of Oxygen Pressure”, Phys. Rev. B 65, 035406 (2001). First-principles KMC.
- Castelnovo, C., Moessner, R. & Sondhi, S. L. — “Magnetic Monopoles in Spin Ice”, Nature 451, 42 (2008).
D.3 Review Articles
- Binder, K. — “Monte Carlo Simulations in Statistical Physics”, Am. J. Phys. 80, 1099 (2012). Excellent overview for physicists.
- Betancourt, M. — “A Conceptual Introduction to Hamiltonian Monte Carlo”, arXiv:1701.02434 (2017). Best pedagogical treatment of HMC.
- Voter, A. F. — “Hyperdynamics: Accelerated Molecular Dynamics of Infrequent Events”, Phys. Rev. Lett. 78, 3908 (1997). Foundation of accelerated dynamics methods.
- Rohwedder, T. & Schneider, R. — “Error Estimates for the Coupled Cluster Method”, ESAIM (2013). For context on quantum chemistry methods.
D.4 Software Packages
| Package | Purpose | Language |
|---|---|---|
| LAMMPS | Molecular dynamics + MC | C++ |
| GROMACS | MD with free energy | C++/CUDA |
| HOOMD-blue | GPU MC/MD | Python/C++ |
| PyMC | Bayesian MCMC | Python |
| emcee | Affine-invariant MCMC | Python |
| MCNP6 | Neutron/photon transport | Fortran |
| Geant4 | Particle detector simulation | C++ |
| SPPARKS | KMC for materials | C++ |
| KMCLib | Lattice KMC | Python/C++ |
| ASE | Atomic simulation environment | Python |
| vegas | Adaptive MC integration | Python |
| scipy.stats.qmc | Quasi-MC sequences | Python |
D.5 Online Resources
- Computational Physics course notes, Krauth (ENS Paris):
https://www.lps.ens.fr/~krauth/ - NIST Digital Library of Mathematical Functions:
https://dlmf.nist.gov - TestU01 RNG testing library:
https://simul.iro.umontreal.ca/testu01/ - OpenKIM: verified interatomic potentials for KMC/MD:
https://openkim.org - Materials Project: DFT-computed properties for KMC input:
https://materialsproject.org