Landscape Statistics at Scale#
What’s in this notebook? This notebook demonstrates how to use JAXVacua to study the statistical properties of the flux landscape across multiple Calabi-Yau geometries. We reproduce key results from the literature on \(W_0\) distributions and landscape observables.
In this notebook, you will learn:
How to run ISD sampling campaigns across multiple geometries
The universal Gaussian distribution of \(W_0\) and the scaling \(\sigma \sim \sqrt{N_{\mathrm{flux}}}\)
How vacuum statistics depend on the geometry (\(h^{2,1}\), \(Q_{\mathrm{D3}}\))
Mass hierarchy distributions across the landscape
Prerequisites: NB06: ISD sampling principle, NB07: ISD sampling workflows.
References:
\(W_0\) distributions: 2307.15749
Deep landscape observations: 2501.03984
(Created: March 2026)
Outline#
Setup#
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import jax
import jax.numpy as jnp
jax.config.update("jax_enable_x64", True)
from jax import vmap
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.gridspec import GridSpec
from matplotlib.colors import Normalize
from scipy.stats import gaussian_kde, norm, rayleigh
from scipy.optimize import curve_fit
mpl.rcParams.update({"figure.dpi": 200, "font.size": 10, "axes.labelsize": 11,
"figure.figsize": (3.4, 2.8), "legend.frameon": False,
"font.family": "serif",
"xtick.direction": "in", "ytick.direction": "in"})
import jaxvacua as jvc
Part 1: Single-geometry statistics#
We start with a single geometry and sample a large ensemble of SUSY flux vacua, then characterise the resulting \(W_0\) distribution.
Sampling campaign#
# Load model: degree-18 hypersurface, h12=2
model = jvc.FluxVacuaFinder(h12=2, model_ID=1, maximum_degree=2, limit="LCS", model_type="KS",prange=10)
print(f"Geometry: h12={model.h12}, Q_D3={model.D3_tadpole}")
# Sample SUSY vacua
N_target = 1000 # increase for better statistics (published results use ~10^6)
sampler = jvc.data_sampler(model, flux_bounds=[-5, 5], moduli_bounds=[1, 5], dilaton_bounds=[1, 5])
rns_key = jvc.PRNGSequence(42)
moduli, tau, fluxes, residuals = model.sample_SUSY_flux_vacua(
N=N_target, sampler=sampler, rns_key=rns_key,
max_iters=100, tol=1e-10, max_tadpole=model.D3_tadpole,
mode="ISD", vmap_dim=10**2, print_progress=True,
)
# Compute observables
W0 = np.array(vmap(model.superpotential_gauge_invariant)(moduli, tau, fluxes))
Nflux = np.array(vmap(model.tadpole)(fluxes))
g_s = np.array(1.0 / tau.imag)
print(f"\nSampled {len(W0)} vacua")
print(f"|W0| range: [{np.abs(W0).min():.3f}, {np.abs(W0).max():.1f}]")
The Gaussian distribution of \(W_0\)#
The key result of 2307.15749: the distribution of \(W_0 = \mathrm{Re}(W_0) + i\,\mathrm{Im}(W_0)\) is well approximated by a 2D Gaussian:
This means \(|W_0|\) follows a Rayleigh distribution with parameter \(\sigma\).
fig, axes = plt.subplots(1, 3, figsize=(12.0, 3.8), constrained_layout=True)
# (a) W0 in the complex plane
sc = axes[0].scatter(W0.real, W0.imag, s=5, alpha=0.45, c=Nflux, cmap="viridis",
edgecolors="none", rasterized=True)
axes[0].set_xlabel(r"Re$(W_0)$")
axes[0].set_ylabel(r"Im$(W_0)$")
axes[0].set_aspect("equal", adjustable="box")
axes[0].text(0.05, 0.92, "(a)", transform=axes[0].transAxes, fontsize=12, fontweight="bold")
cb = fig.colorbar(sc, ax=axes[0], shrink=0.85, pad=0.02)
cb.set_label(r"$N_{\mathrm{flux}}$", fontsize=9)
# (b) Histogram of Re(W0) and Im(W0) with Gaussian fit
sigma_re = np.std(W0.real)
sigma_im = np.std(W0.imag)
sigma_avg = np.sqrt((sigma_re**2 + sigma_im**2) / 2)
x_hist = np.linspace(-3 * sigma_avg, 3 * sigma_avg, 240)
axes[1].hist(W0.real, bins=32, density=True, alpha=0.5, color="#2563eb",
edgecolor="white", linewidth=0.3, label=r"Re$(W_0)$")
axes[1].hist(W0.imag, bins=32, density=True, alpha=0.5, color="#f97316",
edgecolor="white", linewidth=0.3, label=r"Im$(W_0)$")
axes[1].plot(x_hist, norm.pdf(x_hist, 0, sigma_avg), 'k-', lw=1.5,
label=f"$\mathcal{{N}}(0, {sigma_avg:.1f}^2)$")
axes[1].set_xlabel(r"Re/Im$(W_0)$")
axes[1].set_ylabel("Density")
axes[1].legend(fontsize=7)
axes[1].text(0.05, 0.92, "(b)", transform=axes[1].transAxes, fontsize=12, fontweight="bold")
# (c) |W0| distribution with Rayleigh fit on a logarithmic x-axis
W0_abs = np.abs(W0)
W0_pos = W0_abs[W0_abs > 0]
x_min = max(float(W0_pos.min()), 1e-3)
x_max = float(W0_pos.max())
bins_abs = np.logspace(np.log10(x_min), np.log10(x_max), 34)
x_ray = np.logspace(np.log10(x_min), np.log10(x_max), 240)
axes[2].hist(W0_pos, bins=bins_abs, density=True, alpha=0.55, color="#7c3aed",
edgecolor="white", linewidth=0.3, label=r"$|W_0|$ data")
axes[2].plot(x_ray, rayleigh.pdf(x_ray, scale=sigma_avg), 'k-', lw=1.5,
label=f"Rayleigh($\sigma={sigma_avg:.1f}$)")
axes[2].set_xscale("log")
axes[2].set_xlim(x_min/2, x_max)
axes[2].set_xlabel(r"$|W_0|$")
axes[2].set_ylabel("Density")
axes[2].legend(fontsize=7)
axes[2].text(0.05, 0.92, "(c)", transform=axes[2].transAxes, fontsize=12, fontweight="bold")
plt.show()
print(f"sigma_Re = {sigma_re:.2f}, sigma_Im = {sigma_im:.2f}, sigma_avg = {sigma_avg:.2f}")
Scaling of \(\sigma\) with \(N_{\mathrm{flux}}\)#
The width of the \(W_0\) distribution scales as \(\sigma \sim \sqrt{N_{\mathrm{flux}}}\). We verify this by binning vacua by their tadpole value and computing \(\sigma\) in each bin.
# Bin by Nflux ranges rather than exact integer values; exact-N bins are too sparse
# for a small notebook-scale sample.
Nflux_int = Nflux.astype(int)
n_bins = min(8, max(3, len(Nflux_int) // 35))
edges = np.linspace(Nflux_int.min(), Nflux_int.max(), n_bins + 1)
min_count = max(5, len(Nflux_int) // 40)
bin_N = []
bin_sigma = []
bin_count = []
for j, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
if j == len(edges) - 2:
mask = (Nflux_int >= lo) & (Nflux_int <= hi)
else:
mask = (Nflux_int >= lo) & (Nflux_int < hi)
if mask.sum() >= min_count:
W_bin = W0[mask]
sig = np.sqrt((np.var(W_bin.real) + np.var(W_bin.imag)) / 2)
bin_N.append(np.mean(Nflux_int[mask]))
bin_sigma.append(sig)
bin_count.append(mask.sum())
bin_N = np.asarray(bin_N)
bin_sigma = np.asarray(bin_sigma)
bin_count = np.asarray(bin_count)
fig, ax = plt.subplots(figsize=(4.2, 3.0))
sizes = 18 + 1.5 * bin_count if len(bin_count) else 20
ax.scatter(bin_N, bin_sigma, s=sizes, c="#2563eb", edgecolors="white",
linewidth=0.3, zorder=3, label="Binned sample")
# Fit sigma = a * sqrt(N) when there are enough non-empty bins.
if len(bin_N) > 2:
def sqrt_model(N, a):
return a * np.sqrt(N)
popt, _ = curve_fit(sqrt_model, bin_N, bin_sigma)
N_fit = np.linspace(bin_N.min(), bin_N.max(), 100)
ax.plot(N_fit, sqrt_model(N_fit, *popt), 'k--', lw=1,
label=f"$\sigma = {popt[0]:.2f}\sqrt{{N_{{\mathrm{{flux}}}}}}$")
ax.set_xlabel(r"$N_{\mathrm{flux}}$ (bin centre)")
ax.set_ylabel(r"$\sigma(W_0)$")
ax.legend(fontsize=8)
fig.tight_layout()
plt.show()
print(f"Used {len(bin_N)} non-empty N_flux bins; minimum count per bin = {min_count}.")
Part 2: Multi-geometry comparison#
We now repeat the sampling across several geometries with different \(h^{2,1}\) to test the universality of the Gaussian distribution.
# We reuse the h12=2 (model 1) data from Part 1 and sample a second h12=2 model.
# For a full multi-geometry campaign with h12=3,4,5, run as a standalone script
# (JIT compilation for larger models can take several minutes).
N_secondary = max(300, len(W0))
results = {}
# h12=2, model 1 from Part 1
sig_1 = np.sqrt((np.var(W0.real) + np.var(W0.imag)) / 2)
results["Model 1 ($h^{2,1}=2$)"] = {
"W0": W0, "Nflux": Nflux, "sigma": sig_1,
"Q": model.D3_tadpole, "h12": 2, "n_vacua": len(W0)
}
print(f"Model 1 (h12=2): {len(W0)} vacua, sigma = {sig_1:.2f}, Q = {model.D3_tadpole}")
# h12=2, model 2 -- different geometry, same h12.
print(f"\nSampling Model 2 (h12=2, ID=2) with N={N_secondary}...")
model_2 = jvc.FluxVacuaFinder(h12=2, model_ID=2, maximum_degree=2, limit="LCS", model_type="KS")
sampler_2 = jvc.data_sampler(model_2, flux_bounds=[-10, 10], moduli_bounds=[1, 5], dilaton_bounds=[1, 5])
mod_2, tau_2, fl_2, res_2 = model_2.sample_SUSY_flux_vacua(
N=N_secondary, sampler=sampler_2, rns_key=jvc.PRNGSequence(123),
max_iters=100, tol=1e-10, max_tadpole=model_2.D3_tadpole,
mode="ISD", vmap_dim=2*10**3, print_progress=True,
)
W0_2 = np.array(vmap(model_2.superpotential_gauge_invariant)(mod_2, tau_2, fl_2))
Nf_2 = np.array(vmap(model_2.tadpole)(fl_2))
sig_2 = np.sqrt((np.var(W0_2.real) + np.var(W0_2.imag)) / 2)
results["Model 2 ($h^{2,1}=2$)"] = {
"W0": W0_2, "Nflux": Nf_2, "sigma": sig_2,
"Q": model_2.D3_tadpole, "h12": 2, "n_vacua": len(W0_2)
}
print(f"Model 2 (h12=2): {len(W0_2)} vacua, sigma = {sig_2:.2f}, Q = {model_2.D3_tadpole}")
print(f"\nSuccessfully sampled {len(results)} geometries")
Universal Gaussian behaviour#
colors = ["#2563eb", "#f97316", "#16a34a", "#dc2626", "#7c3aed"]
fig, axes = plt.subplots(1, 2, figsize=(8.2, 3.2), constrained_layout=True)
# (a) Normalised Re(W0) distributions -- should all collapse to N(0,1).
common_bins = np.linspace(-4, 4, 33)
for i, (label, data) in enumerate(results.items()):
W_norm = data["W0"].real / data["sigma"]
axes[0].hist(W_norm, bins=common_bins, density=True, histtype="stepfilled",
alpha=0.35, color=colors[i], edgecolor=colors[i], linewidth=0.8,
label=f'{label} ({data["n_vacua"]})')
x_g = np.linspace(-4, 4, 240)
axes[0].plot(x_g, norm.pdf(x_g), 'k-', lw=1.5, label=r"$\mathcal{N}(0,1)$")
axes[0].set_xlabel(r"Re$(W_0) / \sigma$")
axes[0].set_ylabel("Density")
axes[0].set_xlim(-4, 4)
axes[0].legend(fontsize=7, loc="upper right")
axes[0].text(0.05, 0.92, "(a)", transform=axes[0].transAxes, fontsize=12, fontweight="bold")
# (b) sigma vs Q_D3.
Q_vals = np.array([data["Q"] for data in results.values()], dtype=float)
sig_vals = np.array([data["sigma"] for data in results.values()], dtype=float)
axes[1].scatter(Q_vals, sig_vals, s=55, c=colors[:len(results)],
edgecolors="black", linewidth=0.5, zorder=3)
for i, (label, data) in enumerate(results.items()):
axes[1].annotate(label, (data["Q"], data["sigma"]), fontsize=7,
xytext=(5, 5), textcoords="offset points")
if len(Q_vals) > 1 and np.ptp(Q_vals) == 0:
axes[1].set_xlim(Q_vals[0] - 1, Q_vals[0] + 1)
axes[1].set_xlabel(r"$Q_{\mathrm{D3}}$")
axes[1].set_ylabel(r"$\sigma(W_0)$")
axes[1].text(0.05, 0.92, "(b)", transform=axes[1].transAxes, fontsize=12, fontweight="bold")
plt.show()
Part 3: Landscape observables#
Beyond \(W_0\), we can extract further physical observables from the vacuum ensemble.
String coupling and tadpole distributions#
fig, axes = plt.subplots(1, 2, figsize=(7, 3))
# Use the largest dataset (first geometry)
label0 = list(results.keys())[0]
data0 = results[label0]
W0_0 = data0["W0"]
Nf_0 = data0["Nflux"]
# Re-compute g_s from the first geometry (need tau)
# We already have it from Part 1
axes[0].hist(g_s, bins=30, color="#06b6d4", alpha=0.6, edgecolor="white", linewidth=0.3)
axes[0].set_xlabel(r"$g_s = 1/\mathrm{Im}(\tau)$")
axes[0].set_ylabel("Count")
axes[0].text(0.05, 0.92, "(a)", transform=axes[0].transAxes, fontsize=12, fontweight="bold")
axes[0].set_xscale("log")
axes[0].set_yscale("log")
axes[1].hist(Nf_0.astype(int), bins=range(int(Nf_0.min()), int(Nf_0.max())+2),
color="#ec4899", alpha=0.6, edgecolor="white", linewidth=0.3)
axes[1].set_xlabel(r"$N_{\mathrm{flux}}$")
axes[1].set_ylabel("Count")
axes[1].text(0.05, 0.92, "(b)", transform=axes[1].transAxes, fontsize=12, fontweight="bold")
fig.tight_layout()
plt.show()
Probability of small \(|W_0|\)#
For KKLT-type moduli stabilisation, we need \(|W_0| \ll 1\). The fraction of vacua with \(|W_0| < \epsilon\) is:
for the Rayleigh distribution. We compare this prediction against the data.
fig, ax = plt.subplots(figsize=(3.4, 2.8))
epsilons = np.logspace(-2, np.log10(np.abs(W0).max()), 100)
# Empirical CDF
W0_abs_sorted = np.sort(np.abs(W0))
ecdf = np.arange(1, len(W0_abs_sorted)+1) / len(W0_abs_sorted)
ax.plot(W0_abs_sorted, ecdf, color="#2563eb", lw=1.5, label="Data (ECDF)")
# Rayleigh prediction
rayleigh_cdf = 1 - np.exp(-epsilons**2 / (2 * sigma_avg**2))
ax.plot(epsilons, rayleigh_cdf, 'k--', lw=1, label=f"Rayleigh($\sigma={sigma_avg:.1f}$)")
ax.set_xscale("log")
ax.set_xlabel(r"$\epsilon$")
ax.set_ylabel(r"$P(|W_0| < \epsilon)$")
ax.legend(fontsize=8)
fig.tight_layout()
plt.show()
# Probability of |W0| < 1
frac_small = np.mean(np.abs(W0) < 1)
frac_pred = 1 - np.exp(-1 / (2 * sigma_avg**2))
print(f"P(|W0| < 1): data = {frac_small:.4f}, Rayleigh = {frac_pred:.4f}")
Take-aways#
Observable |
Result |
Reference |
|---|---|---|
\(W_0\) distribution |
Universal 2D Gaussian \(\mathcal{N}(0, \sigma^2)\) |
|
Width scaling |
\(\sigma \sim \sqrt{N_{\mathrm{flux}}}\) |
|
Universality |
Same shape across \(h^{2,1} = 2, 3, 4, 5\) |
|
Small $ |
W_0 |
$ |
Caveats:
This notebook uses \(N \sim 200\)–\(500\) vacua per geometry for speed. Published results use \(\mathcal{O}(10^6)\).
The Gaussian approximation breaks down at the tails (\(|W_0| \ll \sigma\)) — see 2307.15749 for a detailed discussion.
For exhaustive enumeration (rather than sampling), see the
flux_boundingmodule and 2501.03984.
Further reading#
NB07 — ISD sampling (sample-generation workflow)
NB08 — Flux bounding (complete enumeration alternative)
NB13 — Visualisation cookbook (plotting recipes used here)
NB14 — Hessian analysis (mass-spectrum analysis at sampled vacua)
arXiv:2307.15749 — Dubey, Krippendorf, Schachner: \(W_0\) distributions
arXiv:2501.03984 — Deep observations of the Type IIB flux landscape