Gaussian Measures, Part 1 - The Univariate Case

A brief introduction to Gaussian measures in one dimension, serving to provide the setup for an extension to multiple, and eventually infinite, dimensions.

I intend for this to be part one of an (at least) three-part series on Gaussian measures, with the ultimate goal being to understand Gaussian processes as random elements in some suitable infinite-dimensional space. Defining a rigorous infinite-dimensional analog of the familiar Gaussian distribution is no small task, and texts on this subject can be quite intimidating. Personally, I've found that the key to making these references more approachable is to first develop a deep understanding of Gaussian measures in finite dimensions. Indeed, many of the concepts in the infinite-dimensional case are directly motivated by their finite-dimensional analogs. In particular, I found the parallels between the transitions from one-to-multiple and multiple-to-infinite dimensions to be quite enlightening. Therefore, we start here with the simplest case: Gaussian measures in one dimension. This basic case is likely worth exploring even for those well acquainted with the Gaussian distribution, as it requires a shift from thinking about densities to thinking more abstractly in terms of measures. While the former seems perfectly sufficient in one dimension, we will find that the measure-theoretic approach becomes a necessity when generalizing to infinite dimensions. This post also serves to establish notation and introduce some key concepts that will be used throughout this series, including Fourier transforms (characteristic functions), Radon-Nikodym derivatives, and the change-of-variables formula.

Density Function

We start by recalling that the univariate Gaussian density takes the form

\begin{align}
\mathcal{N}(x|m, \sigma^2) := \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x - m)^2 \right\}. \tag{1}
\end{align}

We’re typically used to defining the Gaussian as a random variable with density equal to $\mathcal{N}(x|m, \sigma^2)$. Since we’re interested in measures here, we can simply define the corresponding measure by integrating this density.

Definition. A probability measure $\mu$ defined on the Borel measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is called Gaussian provided that, for any Borel set $B \in \mathcal{B}(\mathbb{R})$, either \begin{align} \mu(B) = \int_{B} \mathcal{N}(x|m, \sigma^2) dx \end{align} for some fixed $m \in \mathbb{R}$ and $\sigma^2 > 0$; or \begin{align} \mu(B) = \delta_m(B) \end{align} for some fixed $m \in \mathbb{R}$.

Note that a Gaussian measure is a Borel measure; that is, we define it on the Borel sets $\mathcal{B}(\mathbb{R})$. This will remain true as we extend to multiple, and even infinite, dimensions. The first case in the above definition is the familiar one, seeing as we're simply integrating over the Gaussian density. The notation $dx$ in the integral formally means that the integration is with respect to the Lebesgue measure $\lambda$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Another way we could phrase this is to say that a probability measure $\mu$ is Gaussian provided that its Radon-Nikodym derivative with respect to $\lambda$ is $\mathcal{N}(x|m, \sigma^2)$; i.e., \begin{align} \frac{d\mu}{d\lambda}(x) = \mathcal{N}(x|m, \sigma^2). \end{align}

The density, of course, is only defined if $\sigma^2 > 0$. It turns out to be nice to also allow for the $\sigma^2 = 0$ case. While $\mathcal{N}(x|m, 0)$ is not defined, we can formalize this notion as a Dirac measure $\delta_m$, which is defined by \begin{align} \delta_m(B) := 1[m \in B]. \end{align} In this case the Gaussian measure is simply a point mass - all of the probability is concentrated at the mean $m$. We call such a Gaussian measure degenerate, while Gaussian measures that admit densities are labelled non-degenerate.

We write $\mu = \mathcal{N}(m, \sigma^2)$ to signify that $\mu$ is a Gaussian measure with density $\mathcal{N}(x|m, \sigma^2)$ if $\sigma^2 > 0$, or $\mu = \delta_m$ if $\sigma^2 = 0$. When $m = 0$ we call $\mu$ centered or symmetric. Note that in this case the measure $\mu$ is symmetric in the sense that $\mu(B) = \mu(-B)$ for any Borel set $B$. If, moreover, $\sigma^2 = 1$ then we call $\mu = \mathcal{N}(0, 1)$ the standard Gaussian.
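To make this concrete, here is a minimal numerical sketch (in Python, assuming numpy and scipy are available; the variable names are my own) that evaluates $\mu(B)$ for an interval $B = [a, b]$ by integrating the density in (1), and checks the result against the difference of Gaussian CDF values.

```python
# Sketch: evaluate mu(B) = integral over B of N(x|m, sigma^2) dx for B = [a, b].
import numpy as np
from scipy import integrate, stats

m, sigma2 = 1.0, 4.0
sigma = np.sqrt(sigma2)

def gaussian_density(x):
    """The density N(x|m, sigma^2) from equation (1)."""
    return np.exp(-0.5 * (x - m) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

a, b = -1.0, 2.0  # the Borel set B = [a, b]
mu_B, _ = integrate.quad(gaussian_density, a, b)

# Sanity check: mu([a, b]) = F(b) - F(a), where F is the Gaussian CDF.
print(mu_B)
print(stats.norm.cdf(b, loc=m, scale=sigma) - stats.norm.cdf(a, loc=m, scale=sigma))
```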

Up to now, we have been treating $m$ and $\sigma^2$ as generic numbers, but one can show that they correspond to the mean and variance of $\mu$, respectively.

Proposition. Let $\mu = \mathcal{N}(m, \sigma^2)$ be a Gaussian measure. Then, \begin{align} m &= \int x \mu(dx), && \sigma^2 = \int [x - m]^2 \mu(dx). \end{align}

The proof in the $\mathcal{N}(m, 0)$ case is trivial given the fact that integrating a measurable function with respect to $\delta_m$ is equivalent to evaluating that function at $m$. Thus, \begin{align} &\int x \delta_m(dx) = m, &&\int [x - m]^2 \delta_m(dx) = [m - m]^2 = 0 = \sigma^2. \end{align} The derivations in the non-degenerate case are quite standard results, so we won't take the time to prove them here.
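For the non-degenerate case, a quick numerical check (a sketch in the same setup as above, not a proof) recovers $m$ and $\sigma^2$ by integrating $x$ and $[x - m]^2$ against the density:

```python
# Sketch: verify that m and sigma^2 are the mean and variance of N(m, sigma^2).
import numpy as np
from scipy import integrate

m, sigma2 = 1.0, 4.0

def gaussian_density(x):
    return np.exp(-0.5 * (x - m) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

mean, _ = integrate.quad(lambda x: x * gaussian_density(x), -np.inf, np.inf)
var, _ = integrate.quad(lambda x: (x - m) ** 2 * gaussian_density(x), -np.inf, np.inf)
print(mean, var)  # approximately 1.0 and 4.0
```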

Fourier Transform

A Gaussian measure can alternatively be defined via its Fourier transform \begin{align} \hat{\mu}(t) := (\mathcal{F}(\mu))(t) := \int e^{its} \mu(ds). \end{align} The notation $\mathcal{F}(\mu)$ makes it clear that the Fourier transform is an operator that acts on the measure $\mu$, though we will typically stick with the more succinct notation $\hat{\mu}$. Note that this is a generalization of the standard Fourier transform, which acts on functions, to an operator which instead acts on measures. Probability theorists draw a distinction between the two by referring to $\hat{\mu}$ as the characteristic function of $\mu$. A classical result is that the Fourier transform of a Gaussian density is itself a Gaussian density (up to scaling). The following result captures this case, as well as the degenerate one.

Proposition. Let $\mu = \mathcal{N}(m, \sigma^2)$ be a Gaussian measure. Then its Fourier transform is given by \begin{align} \hat{\mu}(t) &= \exp\left(itm - \frac{1}{2}t^2 \sigma^2 \right). \tag{2} \end{align}

The Fourier transform completely characterizes $\mu$ and hence we could have taken (2) as an alternative definition of a Gaussian measure. Indeed, it is this definition that ends up proving much more useful, in that it can be easily generalized to Gaussian measures in multiple, and infinite, dimensions. We also note that $\hat{\mu}$ conveniently captures both the degenerate and non-degenerate cases in one expression. In the degenerate case, we have \begin{align} \hat{\delta}_m(t) = \int e^{its} \delta_m(ds) = e^{itm}, \end{align} which indeed agrees with (2) with $\sigma^2 = 0$. The complete result can be derived in many different ways; a quick Google search should satisfy the curious reader.
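For the skeptical reader, here is a small numerical check of (2) (again a sketch, with an arbitrary choice of parameters): we compute the real and imaginary parts of $\hat{\mu}(t)$ by integrating $\cos(ts)$ and $\sin(ts)$ against the density, and compare with the closed form.

```python
# Sketch: check mu-hat(t) = exp(i t m - t^2 sigma^2 / 2) numerically.
import numpy as np
from scipy import integrate

m, sigma2 = 1.0, 4.0

def gaussian_density(s):
    return np.exp(-0.5 * (s - m) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def mu_hat(t):
    # e^{its} = cos(ts) + i sin(ts); integrate the two parts separately.
    re, _ = integrate.quad(lambda s: np.cos(t * s) * gaussian_density(s), -np.inf, np.inf)
    im, _ = integrate.quad(lambda s: np.sin(t * s) * gaussian_density(s), -np.inf, np.inf)
    return re + 1j * im

t = 0.7
print(mu_hat(t))
print(np.exp(1j * t * m - 0.5 * t ** 2 * sigma2))
```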

Random Variables

We have so far focused our discussion on measures $\mu$ defined on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. We now extend our discussion to include Gaussian random variables. In short, a random variable $X$ is Gaussian if its distribution (i.e., law) is Gaussian. Let's be a bit more precise though.

Definition. Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space, and $X: \Omega \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ a random variable. The distribution (or law) of $X$ is defined to be the probability measure $\mathbb{P} \circ X^{-1}$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. We write $\mathcal{L}(X) = \mu$ ($\mathcal{L}$ for "law") or $X \sim \mu$ to mean that the random variable $X$ has distribution $\mu$.

Definition. We say that $X$ is a Gaussian random variable if $\mathcal{L}(X) = \mathbb{P} \circ X^{-1}$ is a Gaussian measure.

To be clear on notation, we write $\mathbb{P} \circ X^{-1}$ to denote the pushforward of the measure $\mathbb{P}$ under the map $X$, which is given by \begin{align} &(\mathbb{P} \circ X^{-1})(B) := \mathbb{P}(X^{-1}(B)), &&B \in \mathcal{B}(\mathbb{R}). \end{align} Here, $X^{-1}(B) := \{\omega \in \Omega : X(\omega) \in B\}$ denotes the inverse image (i.e., pre-image) of $B$ under $X$.
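One concrete instance of this setup, sketched below: take $\Omega = (0, 1)$ with $\mathbb{P}$ the uniform (Lebesgue) measure and $X$ the Gaussian quantile function, in which case $\mathbb{P} \circ X^{-1} = \mathcal{N}(m, \sigma^2)$. We can then estimate $(\mathbb{P} \circ X^{-1})(B)$ by sampling points $\omega \in \Omega$ and counting how often $X(\omega) \in B$. (This is the standard inverse-CDF construction, not something specific to these notes.)

```python
# Sketch: the pushforward of the uniform measure on (0, 1) under the
# Gaussian quantile map is the Gaussian measure N(m, sigma^2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, sigma = 1.0, 2.0

omega = rng.uniform(size=100_000)              # draws from (Omega, P)
X = stats.norm.ppf(omega, loc=m, scale=sigma)  # X(omega), the quantile map

a, b = -1.0, 2.0  # the Borel set B = [a, b]
est = np.mean((a <= X) & (X <= b))  # fraction of samples with X(omega) in B
print(est)
print(stats.norm.cdf(b, loc=m, scale=sigma) - stats.norm.cdf(a, loc=m, scale=sigma))
```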

The introduction of random variables provides a new language to express the concepts introduced above. For example, suppose that $X \sim \mu$. Then we can write the expectation of $X$ in a few different ways: \begin{align} \mathbb{E}_{\mu}[X] := \int_{\mathbb{R}} x \ \mu(dx) = \int_{\mathbb{R}} x \ (\mathbb{P} \circ X^{-1})(dx) = \int_{\Omega} X(\omega) \ \mathbb{P}(d\omega). \end{align} The final equality is courtesy of the change-of-variables formula, a result that we will be using repeatedly throughout these notes. Following the above notation, we can also write the Fourier transform $\hat{\mu}$ in terms of the random variable $X$ as \begin{align} \hat{\mu}(t) = \mathbb{E}_{\mu}\left[e^{itX}\right]. \end{align}
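The change-of-variables formula is exactly what justifies everyday Monte Carlo: averaging $X(\omega)$ over draws from $\mathbb{P}$ estimates the $x$-space integral against $\mu$. A minimal sketch (parameters are my own):

```python
# Sketch: Monte Carlo estimates of E[X] and mu-hat(t) from samples of X.
import numpy as np

rng = np.random.default_rng(1)
m, sigma = 1.0, 2.0
x = rng.normal(loc=m, scale=sigma, size=100_000)  # draws X(omega) ~ N(m, sigma^2)

print(x.mean())  # estimates E[X] = m, approximately 1.0

t = 0.7
print(np.mean(np.exp(1j * t * x)))                     # Monte Carlo estimate of mu-hat(t)
print(np.exp(1j * t * m - 0.5 * t ** 2 * sigma ** 2))  # closed form (2)
```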

The Central Limit Theorem

While it is not the focus of these notes, a post on Gaussian measures seems incomplete without mentioning the central limit theorem (CLT). We just state the basic result here.

Theorem. Let $X_1, X_2, \dots$ be independent and identically distributed random variables with mean $m$ and variance $0 < \sigma^2 < \infty$. Let $S_n := X_1 + \dots + X_n$ and $Z \sim \mathcal{N}(0,1)$. Then the following convergence result holds, which can be stated equivalently in terms of weak convergence of measures or distributional convergence of random variables: \begin{align} &\mathcal{L}\left(\frac{S_n - nm}{\sigma\sqrt{n}}\right) \overset{w}{\to} \mathcal{L}(Z), &&\frac{S_n - nm}{\sigma\sqrt{n}} \overset{d}{\to} Z, \end{align} as $n \to \infty$.
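A small simulation makes the theorem tangible (a sketch with an arbitrary choice of distribution): normalized sums of iid Exponential(1) variables, which have mean $m = 1$ and variance $\sigma^2 = 1$, should look approximately standard Gaussian for large $n$.

```python
# Sketch: compare quantiles of the normalized sum (S_n - n m) / (sigma sqrt(n))
# against standard Gaussian quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 500, 10_000
m, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)

S_n = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
Z_n = (S_n - n * m) / (sigma * np.sqrt(n))

# Empirical quantiles of Z_n should be close to those of N(0, 1).
for q in (0.1, 0.5, 0.9):
    print(q, np.quantile(Z_n, q), stats.norm.ppf(q))
```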