Foundations of Probability & Randomness

1.1 Probability Spaces and Random Variables

A probability space is a triple $(\Omega, \mathcal{F}, P)$ where $\Omega$ is the sample space, $\mathcal{F}$ is a $\sigma$-algebra of events, and $P: \mathcal{F} \to [0,1]$ is a probability measure satisfying $P(\Omega) = 1$.

A random variable $X: \Omega \to \mathbb{R}$ maps outcomes to real numbers. For a continuous random variable its probability density function (PDF) $p(x)$ satisfies:

$$P(a \leq X \leq b) = \int_a^b p(x), dx, \qquad \int_{-\infty}^{\infty} p(x), dx = 1. \tag{1.1}$$

The cumulative distribution function (CDF) is:

$$F(x) = \int_{-\infty}^{x} p(x’), dx’. \tag{1.2}$$

Key Distributions in Physics

Gaussian (Normal): $$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \tag{1.3}$$ Arises in measurement errors, Brownian motion, and the CLT (see §1.2).

Poisson: $$P(n; \lambda) = \frac{\lambda^n e^{-\lambda}}{n!}, \quad n = 0, 1, 2, \ldots \tag{1.4}$$ Governs photon counts in a detector, nuclear decay events, and cosmic ray arrivals in a given time window.

Exponential: $$p(t) = \lambda e^{-\lambda t}, \quad t \geq 0. \tag{1.5}$$ The waiting time between successive Poisson events — equivalently, the lifetime of a radioactive nucleus with decay constant $\lambda$.

Maxwell-Boltzmann speed distribution (to be sampled in Ch. 2): $$p(v) = 4\pi n \left(\frac{m}{2\pi k_B T}\right)^{3/2} v^2 \exp!\left(-\frac{mv^2}{2k_B T}\right). \tag{1.6}$$

Note

Radioactive Decay A sample of $^{14}$C has decay constant $\lambda = 3.84 \times 10^{-12}$ s$^{-1}$. The number of decays in a 1-second window follows a Poisson distribution. If the initial activity is $A_0 = 10^4$ Bq, the standard deviation in the count is $\sqrt{A_0} \approx 100$ — illustrating how counting statistics limit detector precision.

Moments and Cumulants

The mean and variance of $X$: $$\mu = \langle X \rangle = \int x, p(x), dx, \qquad \sigma^2 = \langle (X - \mu)^2 \rangle. \tag{1.7}$$

The moment generating function $M(t) = \langle e^{tX} \rangle$ and characteristic function $\phi(k) = \langle e^{ikX} \rangle$ encode all moments.

1.2 The Law of Large Numbers and Central Limit Theorem

Law of Large Numbers

Let $X_1, X_2, \ldots, X_N$ be independent, identically distributed (i.i.d.) random variables with mean $\mu$. Define the sample mean:

$$\bar{X}N = \frac{1}{N} \sum{i=1}^N X_i. \tag{1.8}$$

The strong law of large numbers states:

$$\bar{X}_N \xrightarrow{a.s.} \mu \quad \text{as } N \to \infty. \tag{1.9}$$

This underpins every Monte Carlo estimator: replace an ensemble average by a sample average.

Central Limit Theorem

Regardless of the parent distribution $p(x)$ (provided $\sigma^2 < \infty$), the distribution of $\bar{X}_N$ converges to a Gaussian:

$$\frac{\bar{X}_N - \mu}{\sigma/\sqrt{N}} \xrightarrow{d} \mathcal{N}(0,1) \quad \text{as } N \to \infty. \tag{1.10}$$

Consequence: the statistical error of any MC estimator scales as:

$$\epsilon \sim \frac{\sigma}{\sqrt{N}}. \tag{1.11}$$

To halve the error, one must quadruple the number of samples. This $1/\sqrt{N}$ convergence rate is universal — it does not depend on dimensionality, unlike grid-based quadrature.

Note

Why MC beats quadrature in high dimensions A regular grid with $k$ points per dimension in $d$ dimensions requires $N = k^d$ points. The error scales as $k^{-p}$ for a $p$-th order method, i.e. $N^{-p/d}$. For $d = 10$ and $p = 4$, the error goes as $N^{-0.4}$, slower than MC’s $N^{-1/2}$. Above $d \approx 8$, MC wins unconditionally.

1.3 Generating Random Numbers

True vs. Pseudo-Random Numbers

A true random number generator (TRNG) harvests physical entropy — thermal noise, radioactive decay, atmospheric noise. Pseudo-random number generators (PRNGs) are deterministic algorithms producing sequences that pass statistical tests of randomness. For scientific computing, PRNGs are preferable: they are fast, reproducible (given the seed), and well-characterised.

Linear Congruential Generators (LCGs)

$$x_{n+1} = (a, x_n + c) \bmod m. \tag{1.12}$$

The Hull–Dobell theorem gives conditions on $a, c, m$ for full period $m$. A classic choice (ANSI C rand): $a = 1103515245$, $c = 12345$, $m = 2^{31}$.

Warning

LCGs are not suitable for serious MC work LCGs have short periods and strong lattice correlations. RANDU ($a=65539$, $m=2^{31}$) places all triples $(x_n, x_{n+1}, x_{n+2})$ on 15 parallel planes — catastrophic for 3D integration. Never use it.

Mersenne Twister (MT19937)

The default PRNG in Python (numpy.random). Key properties:

Period: $2^{19937} - 1$ (a Mersenne prime, hence the name)
Passes all Diehard and TestU01 statistical tests
623-dimensional equidistribution
State: 624 32-bit integers

import numpy as np
rng = np.random.default_rng(seed=42)   # modern API, reproducible
u = rng.uniform(0, 1, size=10_000)     # 10000 uniform samples

PCG and Xoshiro Generators

Modern alternatives to MT for performance-critical code:

PCG64: default in numpy.random.default_rng. Better statistical properties than MT, smaller state.
Xoshiro256**: extremely fast, suitable for parallel streams.

Random Numbers on the Unit Sphere

Uniformly distributed points on $S^2$ require:

$$\theta = \arccos(1 - 2u_1), \quad \phi = 2\pi u_2, \tag{1.13}$$

where $u_1, u_2 \sim \mathcal{U}[0,1]$. Do not sample $\theta$ uniformly — this clusters points at the poles.

1.4 Testing Randomness

A sequence ${x_n}$ that fails statistical tests may bias MC results. Standard tests:

Chi-Squared Uniformity Test

Divide $[0,1]$ into $k$ equal bins with expected count $E = N/k$ each. The test statistic:

$$\chi^2 = \sum_{i=1}^k \frac{(O_i - E)^2}{E} \tag{1.14}$$

follows a $\chi^2$ distribution with $k-1$ degrees of freedom under $H_0$ (perfect uniformity). Reject if $p < 0.05$.

Autocorrelation Test

The lag-$\tau$ autocorrelation:

$$C(\tau) = \frac{\langle x_n x_{n+\tau} \rangle - \langle x_n \rangle^2}{\langle x_n^2 \rangle - \langle x_n \rangle^2}. \tag{1.15}$$

For a good PRNG, $C(\tau) \approx 0$ for all $\tau > 0$. In MCMC, non-zero $C(\tau)$ reflects physical correlations and must be accounted for in error estimation (see Ch. 4).

Spectral Test

In $d$ dimensions, plot $d$-tuples $(x_n, x_{n+1}, \ldots, x_{n+d-1})$. Ideally these fill the unit hypercube uniformly; LCGs reveal planes.

Tip

The TestU01 Library The most comprehensive battery of statistical tests is TestU01 (L’Ecuyer & Simard). It includes BigCrush (over 100 tests). MT19937 passes; many simple generators fail.

1.5 Computational Lab — Class 2

Objectives:

Implement a simple LCG and compare its $\chi^2$ and spectral behaviour to MT19937.
Generate $N = 10^5$ exponential random variates using the inverse CDF (see Ch. 2) and compare the empirical histogram to equation (1.5).
Simulate radioactive decay of $^{60}$Co ($t_{1/2} = 5.27$ yr) for 1000 atoms over 20 years using the waiting-time method. Plot the activity as a function of time and compare to the analytic result $A(t) = A_0 e^{-\lambda t}$.

Note

Skeleton Code See Appendix A for the Python skeleton. You will need numpy, scipy.stats, and matplotlib.

Summary

A Monte Carlo estimator of $\langle f \rangle = \int f(x) p(x), dx$ is $\hat{f} = \frac{1}{N}\sum_{i=1}^N f(x_i)$ with $x_i \sim p(x)$.
The error is $\sigma/\sqrt{N}$, universal and dimension-independent.
Use MT19937 or PCG64; never use LCGs or RANDU for scientific work.
Always test your RNG and seed it for reproducibility.

Keyboard shortcuts

Monte Carlo Methods for Physicists