
Two-point Correlation Function

To extract the Baryon Acoustic Oscillation (BAO) signal from a survey of millions of galaxies, cosmologists need a robust statistical tool to measure how those galaxies cluster together. The primary tool for this is the two-point correlation function, denoted as $\xi(r)$.

Fundamentally, $\xi(r)$ quantifies the excess probability of finding two galaxies separated by a specific distance $r$, compared to a completely random, uniform distribution of galaxies.

Mathematical Definition

Imagine a Universe with a mean galaxy number density of $\bar{n}$. If we randomly drop two infinitesimal volume elements, $dV_1$ and $dV_2$, separated by a distance $r$, the probability $dP$ of finding one galaxy in $dV_1$ and another in $dV_2$ is given by:

$$dP = \bar{n}^2 [1 + \xi(r)] dV_1 dV_2$$

If $\xi(r) = 0$: The galaxies are distributed completely randomly (a Poisson distribution). The probability is just $\bar{n}^2 dV_1 dV_2$.
If $\xi(r) > 0$: The galaxies are clustered. You are more likely to find a pair of galaxies at this separation than you would by chance. Gravity naturally drives $\xi(r)$ to be positive at small scales.
If $\xi(r) < 0$: The galaxies are anti-correlated. You are less likely to find a pair at this distance than in a random distribution (e.g., due to vast cosmic voids).
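A direct consequence of this definition, included here as a worked step and assuming $\xi$ depends only on the separation $r$: given a galaxy at the origin, the probability of finding a neighbor in a volume element $dV$ at distance $r$ is $\bar{n}[1 + \xi(r)]\,dV$. Integrating over a sphere of radius $R$ gives the mean number of neighbors:

$$\langle N(<R) \rangle = 4\pi \bar{n} \int_0^R [1 + \xi(r)]\, r^2\, dr = \frac{4}{3}\pi R^3 \bar{n} + 4\pi \bar{n} \int_0^R \xi(r)\, r^2\, dr$$

The first term is the count expected for an unclustered distribution; the second is the excess (or deficit) contributed by clustering.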

Practical Estimation: The Landy-Szalay Estimator

In practice, cosmologists do not have an infinite Universe to measure; they have a finite survey with complex boundaries, varying observation depths, and instrumental artifacts.

To calculate $\xi(r)$ from real data, observers generate a “Random” catalog—a simulated dataset of points distributed completely randomly, but matching the exact 3D geometry and selection effects of the actual “Data” survey. They then count the number of pairs separated by a distance $r$.

The standard method used in modern cosmology is the Landy-Szalay estimator:

$$\hat{\xi}(r) = \frac{DD(r) - 2DR(r) + RR(r)}{RR(r)}$$

Where:

* $DD(r)$ is the number of Data-Data pairs separated by distance $r$.
* $DR(r)$ is the number of Data-Random pairs.
* $RR(r)$ is the number of Random-Random pairs.

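*(Note: each pair count is normalized by the total number of pairs that can be formed for that combination: $N_D(N_D-1)/2$ for $DD$, $N_D N_R$ for $DR$, and $N_R(N_R-1)/2$ for $RR$, where $N_D$ and $N_R$ are the sizes of the Data and Random catalogs.)*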
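The estimator can be sketched in a few lines of Python. This is a minimal illustration, not a survey pipeline: it assumes Cartesian comoving positions, no galaxy weights, and a toy unit-box geometry. The function names `normalized_pair_counts` and `landy_szalay` are made up for this example. Pairs are counted with SciPy's `cKDTree.count_neighbors`, which returns cumulative counts of ordered pairs, so auto-counts are normalized by $N(N-1)$ rather than $N(N-1)/2$; the two conventions give the same normalized value.

```python
import numpy as np
from scipy.spatial import cKDTree


def normalized_pair_counts(a, b, edges, same_catalog):
    """Fraction of all possible pairs that fall in each separation bin."""
    tree_a, tree_b = cKDTree(a), cKDTree(b)
    # Cumulative number of ordered pairs (x1 in a, x2 in b) with distance <= edge.
    cumulative = tree_a.count_neighbors(tree_b, edges).astype(float)
    # Differencing gives per-bin counts. For an auto-count it also cancels the
    # zero-distance self-pairs, which are present at every (non-negative) edge.
    per_bin = np.diff(cumulative)
    if same_catalog:
        total = len(a) * (len(a) - 1)  # ordered pairs, matching the double counting
    else:
        total = len(a) * len(b)
    return per_bin / total


def landy_szalay(data, randoms, edges):
    """Landy-Szalay estimate of xi in each bin defined by `edges`."""
    dd = normalized_pair_counts(data, data, edges, same_catalog=True)
    dr = normalized_pair_counts(data, randoms, edges, same_catalog=False)
    rr = normalized_pair_counts(randoms, randoms, edges, same_catalog=True)
    return (dd - 2.0 * dr + rr) / rr


# Toy check: unclustered "Data" in a unit box should give xi scattered around zero.
rng = np.random.default_rng(0)
data = rng.random((2000, 3))
randoms = rng.random((6000, 3))      # randoms are usually denser than the data
edges = np.linspace(0.05, 0.30, 6)   # separation bin edges, in box units
xi_hat = landy_szalay(data, randoms, edges)
```

Because the random catalog shares the box geometry, pairs lost at the boundaries are lost in $DD$, $DR$, and $RR$ alike, which is the reason the estimator handles finite survey volumes.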

Relationship to the Power Spectrum $P(k)$

In cosmology, it is often mathematically convenient to work in Fourier space rather than configuration (real) space. The Fourier-space counterpart of the two-point correlation function is the matter power spectrum, $P(k)$, where $k$ is the wavenumber; a separation $r$ corresponds roughly to $k \sim 2\pi/r$.

For a statistically isotropic Universe (where clustering depends only on the magnitude of the distance $r$, not the direction), the two-point correlation function is the Fourier transform of the power spectrum:

$$\xi(r) = \frac{1}{2\pi^2} \int_0^{\infty} P(k) \frac{\sin(kr)}{kr} k^2 dk$$

Both $\xi(r)$ and $P(k)$ encode the same two-point information, but they are affected by different observational systematics, so cosmologists usually measure both to cross-check their results.
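The integral relating $P(k)$ to $\xi(r)$ can be evaluated numerically. The sketch below uses a plain trapezoidal rule on a linear $k$ grid and checks it against a Gaussian toy spectrum, $P(k) = e^{-a^2 k^2}$, whose transform is known in closed form: $\xi(r) = e^{-r^2/4a^2} / (8\pi^{3/2} a^3)$. The function name `xi_from_pk`, the grid, and the value of $a$ are assumptions made for illustration. A realistic power spectrum makes the integrand oscillate and decay slowly, so production codes typically rely on dedicated methods such as FFTLog instead.

```python
import numpy as np


def xi_from_pk(r, k, pk):
    """xi(r) = 1/(2 pi^2) * integral of P(k) * sin(kr)/(kr) * k^2 dk (trapezoid rule)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    kr = np.outer(r, k)
    # np.sinc(x) is sin(pi x)/(pi x), so sinc(kr/pi) equals sin(kr)/(kr),
    # with the correct limit of 1 at kr = 0.
    j0 = np.sinc(kr / np.pi)
    integrand = k**2 * pk * j0
    # Trapezoidal rule along the k axis.
    area = 0.5 * (integrand[:, 1:] + integrand[:, :-1]) * np.diff(k)
    return area.sum(axis=1) / (2.0 * np.pi**2)


# Gaussian toy spectrum (not a cosmological P(k)) with a known analytic transform.
a = 5.0                                  # arbitrary length scale
k = np.linspace(0.0, 3.0, 20001)         # P(k) is negligible well before k = 3
pk = np.exp(-(a * k) ** 2)
r = np.array([1.0, 5.0, 10.0, 20.0])

xi_numeric = xi_from_pk(r, k, pk)
xi_exact = np.exp(-r**2 / (4.0 * a**2)) / (8.0 * np.pi**1.5 * a**3)
```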

The BAO Peak in the Correlation Function

As established in the previous section, the physics of the early Universe created a spherical shell of baryons at the comoving sound horizon, $r_s \approx 147$ Mpc.

Because galaxies preferentially form in regions of high dark matter and baryon density, this primordial shell leaves a distinct imprint on the late-time distribution of galaxies. When we plot $\xi(r)$ for millions of galaxies, we see a smooth, roughly power-law decline with increasing separation (due to standard gravitational clustering), interrupted by a distinct, localized “bump” or peak near $r \approx 147$ Mpc.

This single peak in the two-point correlation function is the statistical manifestation of the acoustic waves stalling at recombination. Precisely locating its center at different redshifts is how the Universe’s expansion history is mapped.

Fig 1: BAO peak from the 2-point correlation function.
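To make "locating the center of the peak" concrete, the sketch below fits a toy model to a mock correlation function. Everything in it is an assumption for illustration: the model `toy_xi` (a power law plus a Gaussian bump), the parameter values, and the noise level are invented, and the mock data are generated from the same model. Real BAO analyses instead fit a physically motivated template with nuisance parameters and a full covariance matrix.

```python
import numpy as np
from scipy.optimize import curve_fit


def toy_xi(r, r0, gamma, amp, r_peak, width):
    """Hypothetical toy model: power-law clustering plus a Gaussian BAO bump."""
    smooth = (r / r0) ** (-gamma)
    bump = amp * np.exp(-0.5 * ((r - r_peak) / width) ** 2)
    return smooth + bump


rng = np.random.default_rng(1)
r = np.linspace(60.0, 200.0, 71)                 # separations in Mpc
truth = (7.0, 1.8, 0.004, 147.0, 10.0)           # illustrative values only
xi_obs = toy_xi(r, *truth) + rng.normal(0.0, 1e-4, r.size)  # mock measurement

# Least-squares fit starting from a deliberately offset initial guess.
guess = (6.0, 1.8, 0.003, 140.0, 12.0)
best, cov = curve_fit(toy_xi, r, xi_obs, p0=guess, maxfev=20000)
r_peak_fit = best[3]
r_peak_err = np.sqrt(cov[3, 3])                  # 1-sigma uncertainty from the fit
```

Fitting the smooth component and the bump together matters: on a declining background, the raw maximum of the curve sits slightly off the true bump center, so simply reading off the highest point would bias the inferred scale.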