Landscape Statistics at Scale

Landscape Statistics at Scale#

What’s in this notebook? This notebook shows how to summarise flux-vacuum ensembles statistically once the vacua have already been generated. It demonstrates the workflow on small demonstration ensembles, while pointing to the much larger samples needed for publication-level landscape statements.

In this notebook, you will learn:

How to run compact ISD sampling campaigns across one or two geometries.

How to visualise the approximate Gaussian distribution of $W_0$ and the scaling of its width with tadpole.

How ensemble statistics depend on geometry, $h^{2,1}$ and $Q_{\mathrm{D3}}$.

How to collect observables such as $g_s$, $N_{\mathrm{flux}}$, residuals and mass-hierarchy diagnostics.

Prerequisites: NB06 — ISD sampling principle, NB07 — ISD sampling workflows.

References: $W_0$ distributions: arXiv:2307.15749; deep landscape observations: arXiv:2501.03984.

Runtime note: the defaults are intentionally small demonstration ensembles. They demonstrate the analysis workflow, not publication-level landscape statistics.

Setup#

import numpy as np
import warnings
warnings.filterwarnings('ignore')

import jax
import jax.numpy as jnp
jax.config.update("jax_enable_x64", True)

import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.gridspec import GridSpec
from matplotlib.colors import Normalize
from scipy.stats import gaussian_kde, norm, rayleigh
from scipy.optimize import curve_fit

mpl.rcParams.update({"figure.dpi": 200, "font.size": 10, "axes.labelsize": 11,
                      "figure.figsize": (3.4, 2.8), "legend.frameon": False,
                      "font.family": "serif",
                      "xtick.direction": "in", "ytick.direction": "in"})

import jaxvacua as jvc

Part 1: Single-geometry statistics#

We start with a single geometry and sample a large ensemble of SUSY flux vacua, then characterise the resulting $W_0$ distribution.

Sampling campaign#

# Load model: degree-18 hypersurface, h12=2
model = jvc.FluxVacuaFinder(h12=2, model_ID=1, maximum_degree=2, limit="LCS", model_type="KS",prange=10)
print(f"Geometry: h12={model.h12}, Q_D3={model.Q()}")

# Sample SUSY vacua
N_target = 100  # increase for better statistics (published results use much larger ensembles)
sampler = jvc.data_sampler(model, flux_bounds=[-5, 5], moduli_bounds=[1, 5], dilaton_bounds=[1, 5])
rns_key = jvc.PRNGSequence(42)

moduli, tau, fluxes, residuals = model.sample_SUSY_flux_vacua(
    N=N_target, sampler=sampler, rns_key=rns_key,
    max_iters=30, tol=1e-10, Q=model.Q(),
    mode="ISD", vmap_dim=100, print_progress=True,
)

# Compute observables
W0 = np.array(model.superpotential_gauge_invariant(moduli, tau, fluxes))
Nflux = np.array(model.tadpole(fluxes))
g_s = np.array(1.0 / tau.imag)

print(f"\nSampled {len(W0)} vacua")
print(f"|W0| range: [{np.abs(W0).min():.3f}, {np.abs(W0).max():.1f}]")

The Gaussian distribution of $W_0$#

The key result of 2307.15749: the distribution of $W_0 = \mathrm{Re}(W_0) + i\,\mathrm{Im}(W_0)$ is well approximated by a 2D Gaussian:

(1)#\[W_0 \sim \mathcal{N}(0, \sigma^2)\]

This means $|W_0|$ follows a Rayleigh distribution with parameter $\sigma$.

fig, axes = plt.subplots(1, 3, figsize=(12.0, 3.8), constrained_layout=True)

# (a) W0 in the complex plane
sc = axes[0].scatter(W0.real, W0.imag, s=5, alpha=0.45, c=Nflux, cmap="viridis",
                     edgecolors="none", rasterized=True)
axes[0].set_xlabel(r"Re$(W_0)$")
axes[0].set_ylabel(r"Im$(W_0)$")
axes[0].set_aspect("equal", adjustable="box")
axes[0].text(0.05, 0.92, "(a)", transform=axes[0].transAxes, fontsize=12, fontweight="bold")
cb = fig.colorbar(sc, ax=axes[0], shrink=0.85, pad=0.02)
cb.set_label(r"$N_{\mathrm{flux}}$", fontsize=9)

# (b) Histogram of Re(W0) and Im(W0) with Gaussian fit
sigma_re = np.std(W0.real)
sigma_im = np.std(W0.imag)
sigma_avg = np.sqrt((sigma_re**2 + sigma_im**2) / 2)

x_hist = np.linspace(-3 * sigma_avg, 3 * sigma_avg, 240)

axes[1].hist(W0.real, bins=32, density=True, alpha=0.5, color="#2563eb",
             edgecolor="white", linewidth=0.3, label=r"Re$(W_0)$")
axes[1].hist(W0.imag, bins=32, density=True, alpha=0.5, color="#f97316",
             edgecolor="white", linewidth=0.3, label=r"Im$(W_0)$")
axes[1].plot(x_hist, norm.pdf(x_hist, 0, sigma_avg), 'k-', lw=1.5,
             label=f"$\mathcal{{N}}(0, {sigma_avg:.1f}^2)$")
axes[1].set_xlabel(r"Re/Im$(W_0)$")
axes[1].set_ylabel("Density")
axes[1].legend(fontsize=7)
axes[1].text(0.05, 0.92, "(b)", transform=axes[1].transAxes, fontsize=12, fontweight="bold")

# (c) |W0| distribution with Rayleigh fit on a logarithmic x-axis
W0_abs = np.abs(W0)
W0_pos = W0_abs[W0_abs > 0]
x_min = max(float(W0_pos.min()), 1e-3)
x_max = float(W0_pos.max())
bins_abs = np.logspace(np.log10(x_min), np.log10(x_max), 34)
x_ray = np.logspace(np.log10(x_min), np.log10(x_max), 240)

axes[2].hist(W0_pos, bins=bins_abs, density=True, alpha=0.55, color="#7c3aed",
             edgecolor="white", linewidth=0.3, label=r"$|W_0|$ data")
axes[2].plot(x_ray, rayleigh.pdf(x_ray, scale=sigma_avg), 'k-', lw=1.5,
             label=f"Rayleigh($\sigma={sigma_avg:.1f}$)")
axes[2].set_xscale("log")
axes[2].set_xlim(x_min/2, x_max)
axes[2].set_xlabel(r"$|W_0|$")
axes[2].set_ylabel("Density")
axes[2].legend(fontsize=7)
axes[2].text(0.05, 0.92, "(c)", transform=axes[2].transAxes, fontsize=12, fontweight="bold")

plt.show()

print(f"sigma_Re = {sigma_re:.2f},  sigma_Im = {sigma_im:.2f},  sigma_avg = {sigma_avg:.2f}")

Scaling of $\sigma$ with $N_{\mathrm{flux}}$#

The width of the $W_0$ distribution scales as $\sigma \sim \sqrt{N_{\mathrm{flux}}}$. We verify this by binning vacua by their tadpole value and computing $\sigma$ in each bin.

# Bin by Nflux ranges rather than exact integer values; exact-N bins are too sparse
# for a small notebook-scale sample.
Nflux_int = Nflux.astype(int)
n_bins = min(8, max(3, len(Nflux_int) // 35))
edges = np.linspace(Nflux_int.min(), Nflux_int.max(), n_bins + 1)
min_count = max(5, len(Nflux_int) // 40)

bin_N = []
bin_sigma = []
bin_count = []

for j, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
    if j == len(edges) - 2:
        mask = (Nflux_int >= lo) & (Nflux_int <= hi)
    else:
        mask = (Nflux_int >= lo) & (Nflux_int < hi)
    if mask.sum() >= min_count:
        W_bin = W0[mask]
        sig = np.sqrt((np.var(W_bin.real) + np.var(W_bin.imag)) / 2)
        bin_N.append(np.mean(Nflux_int[mask]))
        bin_sigma.append(sig)
        bin_count.append(mask.sum())

bin_N = np.asarray(bin_N)
bin_sigma = np.asarray(bin_sigma)
bin_count = np.asarray(bin_count)

fig, ax = plt.subplots(figsize=(4.2, 3.0))

sizes = 18 + 1.5 * bin_count if len(bin_count) else 20
ax.scatter(bin_N, bin_sigma, s=sizes, c="#2563eb", edgecolors="white",
           linewidth=0.3, zorder=3, label="Binned sample")

# Fit sigma = a * sqrt(N) when there are enough non-empty bins.
if len(bin_N) > 2:
    def sqrt_model(N, a):
        return a * np.sqrt(N)
    popt, _ = curve_fit(sqrt_model, bin_N, bin_sigma)
    N_fit = np.linspace(bin_N.min(), bin_N.max(), 100)
    ax.plot(N_fit, sqrt_model(N_fit, *popt), 'k--', lw=1,
            label=f"$\sigma = {popt[0]:.2f}\sqrt{{N_{{\mathrm{{flux}}}}}}$")

ax.set_xlabel(r"$N_{\mathrm{flux}}$ (bin centre)")
ax.set_ylabel(r"$\sigma(W_0)$")
ax.legend(fontsize=8)
fig.tight_layout()
plt.show()

print(f"Used {len(bin_N)} non-empty N_flux bins; minimum count per bin = {min_count}.")

Part 2: Multi-geometry comparison#

We now repeat the sampling across several geometries with different $h^{2,1}$ to test the universality of the Gaussian distribution.

# We reuse the h12=2 (model 1) data from Part 1 and sample a second h12=2 model.
# For a full multi-geometry campaign with h12=3,4,5, run as a standalone script
# (JIT compilation for larger models can take several minutes).

N_secondary = max(50, len(W0))
results = {}

# h12=2, model 1 from Part 1
sig_1 = np.sqrt((np.var(W0.real) + np.var(W0.imag)) / 2)
results["Model 1 ($h^{2,1}=2$)"] = {
    "W0": W0, "Nflux": Nflux, "sigma": sig_1,
    "Q": model.Q(), "h12": 2, "n_vacua": len(W0)
}
print(f"Model 1 (h12=2): {len(W0)} vacua, sigma = {sig_1:.2f}, Q = {model.Q()}")

# h12=2, model 2 -- different geometry, same h12.
print(f"\nSampling Model 2 (h12=2, ID=2) with N={N_secondary}...")
model_2 = jvc.FluxVacuaFinder(h12=2, model_ID=2, maximum_degree=2, limit="LCS", model_type="KS")
sampler_2 = jvc.data_sampler(model_2, flux_bounds=[-10, 10], moduli_bounds=[1, 5], dilaton_bounds=[1, 5])

mod_2, tau_2, fl_2, res_2 = model_2.sample_SUSY_flux_vacua(
    N=N_secondary, sampler=sampler_2, rns_key=jvc.PRNGSequence(123),
    max_iters=30, tol=1e-10, Q=model_2.Q(),
    mode="ISD", vmap_dim=100, print_progress=True,
)

W0_2 = np.array(model_2.superpotential_gauge_invariant(mod_2, tau_2, fl_2))
Nf_2 = np.array(model_2.tadpole(fl_2))
sig_2 = np.sqrt((np.var(W0_2.real) + np.var(W0_2.imag)) / 2)

results["Model 2 ($h^{2,1}=2$)"] = {
    "W0": W0_2, "Nflux": Nf_2, "sigma": sig_2,
    "Q": model_2.Q(), "h12": 2, "n_vacua": len(W0_2)
}
print(f"Model 2 (h12=2): {len(W0_2)} vacua, sigma = {sig_2:.2f}, Q = {model_2.Q()}")

print(f"\nSuccessfully sampled {len(results)} geometries")

Universal Gaussian behaviour#

colors = ["#2563eb", "#f97316", "#16a34a", "#dc2626", "#7c3aed"]

fig, axes = plt.subplots(1, 2, figsize=(8.2, 3.2), constrained_layout=True)

# (a) Normalised Re(W0) distributions -- should all collapse to N(0,1).
common_bins = np.linspace(-4, 4, 33)
for i, (label, data) in enumerate(results.items()):
    W_norm = data["W0"].real / data["sigma"]
    axes[0].hist(W_norm, bins=common_bins, density=True, histtype="stepfilled",
                 alpha=0.35, color=colors[i], edgecolor=colors[i], linewidth=0.8,
                 label=f'{label} ({data["n_vacua"]})')

x_g = np.linspace(-4, 4, 240)
axes[0].plot(x_g, norm.pdf(x_g), 'k-', lw=1.5, label=r"$\mathcal{N}(0,1)$")
axes[0].set_xlabel(r"Re$(W_0) / \sigma$")
axes[0].set_ylabel("Density")
axes[0].set_xlim(-4, 4)
axes[0].legend(fontsize=7, loc="upper right")
axes[0].text(0.05, 0.92, "(a)", transform=axes[0].transAxes, fontsize=12, fontweight="bold")

# (b) sigma vs Q_D3.
Q_vals = np.array([data["Q"] for data in results.values()], dtype=float)
sig_vals = np.array([data["sigma"] for data in results.values()], dtype=float)

axes[1].scatter(Q_vals, sig_vals, s=55, c=colors[:len(results)],
                edgecolors="black", linewidth=0.5, zorder=3)

for i, (label, data) in enumerate(results.items()):
    axes[1].annotate(label, (data["Q"], data["sigma"]), fontsize=7,
                     xytext=(5, 5), textcoords="offset points")

if len(Q_vals) > 1 and np.ptp(Q_vals) == 0:
    axes[1].set_xlim(Q_vals[0] - 1, Q_vals[0] + 1)
axes[1].set_xlabel(r"$Q_{\mathrm{D3}}$")
axes[1].set_ylabel(r"$\sigma(W_0)$")
axes[1].text(0.05, 0.92, "(b)", transform=axes[1].transAxes, fontsize=12, fontweight="bold")

plt.show()

Part 3: Landscape observables#

Beyond $W_0$, we can extract further physical observables from the vacuum ensemble.

String coupling and tadpole distributions#

fig, axes = plt.subplots(1, 2, figsize=(7, 3))

# Use the largest dataset (first geometry)
label0 = list(results.keys())[0]
data0 = results[label0]
W0_0 = data0["W0"]
Nf_0 = data0["Nflux"]

# Re-compute g_s from the first geometry (need tau)
# We already have it from Part 1
axes[0].hist(g_s, bins=30, color="#06b6d4", alpha=0.6, edgecolor="white", linewidth=0.3)
axes[0].set_xlabel(r"$g_s = 1/\mathrm{Im}(\tau)$")
axes[0].set_ylabel("Count")
axes[0].text(0.05, 0.92, "(a)", transform=axes[0].transAxes, fontsize=12, fontweight="bold")
axes[0].set_xscale("log")
axes[0].set_yscale("log")

axes[1].hist(Nf_0.astype(int), bins=range(int(Nf_0.min()), int(Nf_0.max())+2),
             color="#ec4899", alpha=0.6, edgecolor="white", linewidth=0.3)
axes[1].set_xlabel(r"$N_{\mathrm{flux}}$")
axes[1].set_ylabel("Count")
axes[1].text(0.05, 0.92, "(b)", transform=axes[1].transAxes, fontsize=12, fontweight="bold")

fig.tight_layout()
plt.show()

Probability of small $|W_0|$#

For KKLT-type moduli stabilisation, we need $|W_0| \ll 1$. The fraction of vacua with $|W_0| < \epsilon$ is:

(2)#\[P(|W_0| < \epsilon) \approx 1 - e^{-\epsilon^2 / (2\sigma^2)}\]

for the Rayleigh distribution. We compare this prediction against the data.

fig, ax = plt.subplots(figsize=(3.4, 2.8))

epsilons = np.logspace(-2, np.log10(np.abs(W0).max()), 100)

# Empirical CDF
W0_abs_sorted = np.sort(np.abs(W0))
ecdf = np.arange(1, len(W0_abs_sorted)+1) / len(W0_abs_sorted)

ax.plot(W0_abs_sorted, ecdf, color="#2563eb", lw=1.5, label="Data (ECDF)")

# Rayleigh prediction
rayleigh_cdf = 1 - np.exp(-epsilons**2 / (2 * sigma_avg**2))
ax.plot(epsilons, rayleigh_cdf, 'k--', lw=1, label=f"Rayleigh($\sigma={sigma_avg:.1f}$)")

ax.set_xscale("log")
ax.set_xlabel(r"$\epsilon$")
ax.set_ylabel(r"$P(|W_0| < \epsilon)$")
ax.legend(fontsize=8)

fig.tight_layout()
plt.show()

# Probability of |W0| < 1
frac_small = np.mean(np.abs(W0) < 1)
frac_pred = 1 - np.exp(-1 / (2 * sigma_avg**2))
print(f"P(|W0| < 1): data = {frac_small:.4f}, Rayleigh = {frac_pred:.4f}")

Take-aways#

Observable	Result	Reference
$W_0$ distribution	Universal 2D Gaussian $\mathcal{N}(0, \sigma^2)$	2307.15749
Width scaling	$\sigma \sim \sqrt{N_{\mathrm{flux}}}$	2307.15749
Universality	Same shape across $h^{2,1} = 2, 3, 4, 5$	2307.15749
Small $	W_0	$

Caveats:

This notebook uses $N \sim 100$ vacua per geometry for speed. Published results use $\mathcal{O}(10^6)$.
The Gaussian approximation breaks down at the tails ($|W_0| \ll \sigma$) — see 2307.15749 for a detailed discussion.
For exhaustive enumeration (rather than sampling), see the flux_bounding module and 2501.03984.

Landscape Statistics at Scale

Contents