Gaussian Measures, Part 2 - The Multivariate Case
A fairly deep dive into Gaussian measures in finitely many dimensions. The next step in building up to the infinite-dimensional case.
Preliminaries
Let $n \in \mathbb{N}$. Throughout this post, we write $\langle \cdot, \cdot \rangle$ for the standard inner product on $\mathbb{R}^n$, and $\lVert \cdot \rVert$ the norm induced by this inner product. We write $L(\mathbb{R}^n, \mathbb{R}^m)$ to denote the set of linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. We will frequently consider linear functions of the form $\ell: \mathbb{R}^n \to \mathbb{R}$, and denote the set of all such functions as $(\mathbb{R}^n)^* := L(\mathbb{R}^n, \mathbb{R})$. Every $\ell \in (\mathbb{R}^n)^*$ can be uniquely represented as an inner product with some vector $y \in \mathbb{R}^n$. When we wish to make this identification explicit, we will write $\ell_y$ to denote the linear map given by $\ell_y(x) := \langle y, x \rangle$. Likewise, if we are working with a generic linear map $\ell \in (\mathbb{R}^n)^*$, then we will write $y_\ell$ to denote the unique vector satisfying $\ell(x) = \langle y_\ell, x \rangle$. We will also loosely refer to $\ell_y(x)$ as a projection of $x$ onto $y$. Note that if $y$ has unit norm, then this is precisely the magnitude of the orthogonal projection of $x$ onto $y$.
Sigma Algebra
We recall from the previous post that a univariate Gaussian measure is defined on the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$. Analogously, we will define an $n$-dimensional Gaussian measure on the Borel sets $\mathcal{B}(\mathbb{R}^n)$. There are two reasonable approaches to define $\mathcal{B}(\mathbb{R}^n)$, and I want to take a moment to highlight them since the same two options will present themselves when we consider defining the Borel sets over infinite-dimensional spaces.
Option 1: Leverage the Standard Topology on $\mathbb{R}^n$
A Borel $\sigma$-algebra can be defined for any space that comes equipped with a topology; i.e., a collection of open sets. The Borel $\sigma$-algebra is then defined as the smallest $\sigma$-algebra that contains all of these open sets. In the present setting, this means $\mathcal{B}(\mathbb{R}^n) := \sigma(\mathcal{T})$, where $\mathcal{T}$ denotes the collection of open sets in the standard topology on $\mathbb{R}^n$, and $\sigma(\mathcal{A})$ denotes the $\sigma$-algebra generated by a collection of sets $\mathcal{A}$. A nice perspective on $\mathcal{B}(\mathbb{R}^n)$ is that it is the smallest $\sigma$-algebra that ensures all continuous functions $f: \mathbb{R}^n \to \mathbb{R}$ are measurable. We note that this is not a property of Borel $\sigma$-algebras more generally, but one that does hold in the special case of $\mathbb{R}^n$; see this StackExchange post for some details.
Option 2: Product of One-Dimensional Borel Sets
A second reasonable approach is to try extending what we have already defined in one dimension, which means simply taking Cartesian products of one-dimensional Borel sets: $$ \mathcal{B}(\mathbb{R})^{\otimes n} := \sigma\left(\left\{B_1 \times \cdots \times B_n : B_1, \dots, B_n \in \mathcal{B}(\mathbb{R}) \right\}\right). $$ It turns out that the resulting $\sigma$-algebra agrees with that defined in option 1, so there is no ambiguity in the notation $\mathcal{B}(\mathbb{R}^n)$.
Definition: One-Dimensional Projections
With the $\sigma$-algebra defined, we now consider how to define a Gaussian measure on the measurable space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. We will explore a few different equivalent definitions, starting with this: a measure is Gaussian if all of its one-dimensional projections are (univariate) Gaussians.
Definition. A probability measure $\mu$ defined on the Borel measurable space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ is called Gaussian if, for all linear maps $\ell \in (\mathbb{R}^n)^*$, the pushforward measure $\mu \circ \ell^{-1}$ is Gaussian on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
As in the univariate setting, we define a random variable as Gaussian if its law is a Gaussian measure. Recall that each linear map $\ell \in (\mathbb{R}^n)^*$ can be identified with a unique $y \in \mathbb{R}^n$ such that $\ell(x) = \langle y, x\rangle$, which we indicate by writing $\ell_y$. We thus see that $\mu \circ \ell_y^{-1}$ is the distribution of the random variable $\ell_y(X) = \langle y, X\rangle$. The previous definition can therefore be re-stated in the language of random variables as follows: an $n$-dimensional random variable $X$ is Gaussian if every linear combination of the entries of $X$ is univariate Gaussian. More precisely:
Definition. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $X: \Omega \to \mathbb{R}^n$ a random vector. Then $X$ is called Gaussian if $\langle y, X\rangle$ is a univariate Gaussian random variable for all $y \in \mathbb{R}^n$.
Notice that by choosing $y = e_i$ (the vector with a $1$ in its $i^{\text{th}}$ entry and zeros everywhere else), this definition immediately tells us that a Gaussian random vector has univariate Gaussian marginal distributions. That is, if $X = (X_1, \dots, X_n)$ is Gaussian, then $X_i$ is univariate Gaussian for all $i = 1, \dots, n$.
Fourier Transform
Just as in the univariate case, the Fourier transform provides an alternative, equivalent characterization of Gaussian measures. First, we recall how the Fourier transform is defined in the multivariate setting.
Definition. Let $\mu$ be a measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. Then the Fourier transform of $\mu$ is defined as $$ \hat{\mu}(y) := \int e^{i\langle y, x\rangle} \mu(dx), \qquad y \in \mathbb{R}^n. $$
We can alternatively view the Fourier transform as a function of the linear functional $\ell_y$ rather than of the vector $y$; that is, $$ \hat{\mu}(\ell_y) := \int e^{i \ell_y(x)} \mu(dx). $$ Note that this is similar in spirit to the definition of the $n$-dimensional Gaussian measure, in the sense that the extension from one to multiple dimensions is achieved by considering one-dimensional linear projections. This idea will also provide the basis for an extension to infinite dimensions.
With this background established, we can state the following, which gives an alternate definition of Gaussian measures.
Theorem. A probability measure $\mu$ defined on the Borel measurable space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ is Gaussian if and only if its Fourier transform is of the form $$ \hat{\mu}(y) = \exp\left(i\langle m, y\rangle - \frac{1}{2}\langle Cy, y\rangle \right) \tag{4} $$ for some fixed vector $m \in \mathbb{R}^n$ and symmetric, positive semi-definite matrix $C \in \mathbb{R}^{n \times n}$.
The proof, which is given in the appendix, also provides the expressions for the mean and covariance of $\mu$ as a byproduct.
Corollary. Let $\mu$ be a Gaussian measure with Fourier transform (4). Then the mean vector $m$ and covariance matrix $C$ of $\mu$ are given by \begin{align} m &= \int x \mu(dx) \tag{5} \newline C &= \int (x-m)(x-m)^\top \mu(dx). \end{align}
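As a quick numerical sanity check of (4) and (5) (a sketch only; the mean vector, covariance matrix, and test direction below are arbitrary choices, and NumPy is assumed), we can estimate the characteristic function $\mathbb{E}[e^{i\langle y, X\rangle}]$ by Monte Carlo and compare it to the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary mean vector and symmetric positive definite covariance matrix.
m = np.array([1.0, -2.0])
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Draw samples X ~ N(m, C) and pick an arbitrary test direction y.
X = rng.multivariate_normal(m, C, size=200_000)
y = np.array([0.3, -1.2])

# Empirical characteristic function: average of exp(i <y, X>).
empirical = np.mean(np.exp(1j * (X @ y)))

# Closed form from (4): exp(i <m, y> - 0.5 <Cy, y>).
closed_form = np.exp(1j * (m @ y) - 0.5 * (y @ C @ y))

# The two values should agree up to Monte Carlo error.
print(empirical, closed_form)
```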
Density Function
The one-dimensional projections and Fourier transform provide equivalent definitions of multivariate Gaussian measures. The more familiar notion of the Gaussian density provides a third characterization, with the caveat that it only pertains to the case that the covariance matrix $C$ is positive definite.
Proposition. Let $\mu$ be a Gaussian measure with mean vector $m$ and covariance matrix $C$, as in (5). Then $\mu$ admits a Lebesgue density if and only if $C$ is positive definite, in which case $$ \frac{d\mu}{d\lambda}(x) = \text{det}(2\pi C)^{-1/2}\exp\left\{-\frac{1}{2} \langle C^{-1}(x-m), x-m\rangle\right\}. \tag{6} $$
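To illustrate (6) concretely, here is a small sketch (with arbitrary numbers, assuming NumPy and SciPy are available) that evaluates the density formula directly and compares it against `scipy.stats.multivariate_normal`, which implements the same density.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary mean, positive definite covariance, and evaluation point.
m = np.array([0.5, -1.0, 2.0])
C = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 0.5]])
x = np.array([1.0, -0.5, 1.5])

# Density formula (6): det(2*pi*C)^{-1/2} * exp(-0.5 <C^{-1}(x - m), x - m>).
diff = x - m
quad_form = diff @ np.linalg.solve(C, diff)
density = np.linalg.det(2.0 * np.pi * C) ** (-0.5) * np.exp(-0.5 * quad_form)

# Reference implementation; the two printed values should match.
print(density)
print(multivariate_normal(mean=m, cov=C).pdf(x))
```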
Transformation of Standard Gaussian Random Variables
In this section we provide yet another characterization of Gaussian measures. We consider a generative perspective, whereby a Gaussian random vector $X \in \mathbb{R}^n$ arises via a linear transformation of $n$ iid $\mathcal{N}(0,1)$ random variables.
Proposition. Let $Z_1, \dots, Z_n$ be iid $\mathcal{N}(0, 1)$ random variables stacked into the column vector $Z \in \mathbb{R}^n$. Then, for any fixed vector $m \in \mathbb{R}^n$ and matrix $A \in \mathbb{R}^{n \times n}$, the random variable given by $$ X := m + AZ \tag{7} $$ has a Gaussian distribution $\mathcal{N}(m, AA^\top)$. Conversely, let $X \in \mathbb{R}^n$ be a Gaussian random variable. Then there exist a vector $m \in \mathbb{R}^n$ and a matrix $A \in \mathbb{R}^{n \times n}$ such that $X$ is equal in distribution to $m + AZ$, with $Z$ as above.
Another way to think about this is that we have defined a transport map $T: \mathbb{R}^n \to \mathbb{R}^n$ such that \begin{align} T(Z) &= X, &&\text{where } T(z) = m + Az. \end{align} That is, we feed in vectors with iid standard Gaussian components, and get out vectors with distribution $\mathcal{N}(m, AA^\top)$. This is a very practical way to look at multivariate Gaussians, immediately providing the basis for a sampling algorithm. Indeed, suppose we want to draw iid samples from the distribution $\mathcal{N}(m, C)$. Then the above proposition gives us a way to do so, provided that we can (1) draw univariate $\mathcal{N}(0, 1)$ samples; and (2) factorize the matrix $C$ as $C = AA^\top$ for some $A \in \mathbb{R}^{n \times n}$. This procedure is summarized in the below corollary.
Corollary. The following algorithm produces a sample $X \sim \mathcal{N}(m, C)$.
1. Draw iid samples $Z_1, \dots, Z_n \sim \mathcal{N}(0, 1)$ and stack them in a column vector $Z := (Z_1, \dots, Z_n)^\top$.
2. Compute a factorization $C = AA^\top$.
3. Return $X := m + AZ$.
Repeating steps 1 and 3 will produce independent samples from $\mathcal{N}(m, C)$ (the matrix factorization need not be re-computed each time).
As for the factorization, the Cholesky decomposition $C = LL^\top$ is a standard choice when $C$ is positive definite. When $C$ is only positive semidefinite, the eigendecomposition $C = U \Lambda U^\top$ provides another option, since $$ C = U \Lambda U^\top = \left(U \Lambda^{1/2}\right)\left(U \Lambda^{1/2}\right)^\top, $$ so setting $A := U \Lambda^{1/2}$ does the trick. Note that $C$ is positive semidefinite, so $\Lambda$ is just a diagonal matrix with nonnegative values on the diagonal, and $\Lambda^{1/2}$ is obtained by taking square roots of these diagonal entries.
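Here is a minimal NumPy sketch of the sampling algorithm in the corollary (the function name, example numbers, and the fallback logic are my own choices, not part of the formal development): it tries the Cholesky factor first and falls back to the eigendecomposition factor $A = U\Lambda^{1/2}$ when $C$ is only positive semidefinite.

```python
import numpy as np

def sample_gaussian(m, C, n_samples, rng):
    """Draw samples from N(m, C) via a factorization C = A A^T (a sketch)."""
    n = len(m)
    try:
        A = np.linalg.cholesky(C)              # C = A A^T, A lower triangular
    except np.linalg.LinAlgError:
        eigvals, U = np.linalg.eigh(C)         # C = U diag(eigvals) U^T
        eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
        A = U * np.sqrt(eigvals)               # A = U Lambda^{1/2}
    # Step 1: iid N(0, 1) draws, one column per sample.
    Z = rng.standard_normal((n, n_samples))
    # Step 3: X = m + A Z (the factorization is computed only once).
    return (m[:, None] + A @ Z).T

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
samples = sample_gaussian(m, C, 100_000, rng)
print(samples.mean(axis=0))           # should be close to m
print(np.cov(samples, rowvar=False))  # should be close to C
```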
Covariance Operator
As shown in (5) (and derived in the appendix), the covariance matrix $C$ associated with a Gaussian measure $\mu$ satisfies \begin{align} C &= \int (x - m)(x - m)^\top \mu(dx), \end{align} where $m$ and $C$ are the quantities given in the Fourier transform (4). We go a step further in this section by viewing the covariance as an operator rather than a matrix. Definitions of the covariance operator differ slightly across various textbooks and literature; we will try to touch on the different conventions here and explain their connections. As a starting point, we consider the following definition.
Definition. Let $\mu$ be a Gaussian measure with Fourier transform given by (4). Then the covariance operator of $\mu$ is defined as the function $\mathcal{C}: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by $$ \mathcal{C}(y, y^\prime) := \langle Cy, y^\prime\rangle. \tag{9} $$
We immediately have a variety of equivalent expressions for this operator: \begin{align} \mathcal{C}\left(y, y^\prime \right) &= \langle Cy, y^\prime\rangle \newline &= y^\top \left[\int (x - m)(x - m)^\top \mu(dx)\right] y^\prime \newline &= \int \langle y, x - m\rangle \langle y^\prime, x - m \rangle \mu(dx). \tag{10} \end{align} In terms of the random variable $X \sim \mu$, we can also write this as \begin{align} \mathcal{C}\left(y, y^\prime \right) &= \int \langle y, x - m\rangle \langle y^\prime, x - m \rangle \mu(dx) \newline &= \int \left(\langle y, x\rangle - \langle y, \mathbb{E}[X]\rangle\right) \left(\langle y^\prime, x\rangle - \langle y^\prime, \mathbb{E}[X]\rangle\right) \mu(dx) \newline &= \mathbb{E}\left[\left(\langle y, X\rangle - \mathbb{E} \langle y,X\rangle\right) \left(\langle y^\prime, X\rangle - \mathbb{E} \langle y^\prime,X\rangle\right)\right] \newline &= \text{Cov}\left[\langle y,X\rangle, \langle y^\prime,X\rangle \right]. \end{align} In words, the covariance operator outputs the covariance between the one-dimensional projections of $X$ along the directions $y$ and $y^\prime$. Given that the multivariate Gaussian measure is defined in terms of its one-dimensional projections, this should feel fairly natural. In fact, we see that the Fourier transform of $\mu$ can be written as $$ \hat{\mu}(y) = \exp\left(i\langle m, y\rangle - \frac{1}{2}\mathcal{C}(y, y)\right). $$ When the same argument is fed into both slots of the covariance operator (as is the case in the Fourier transform expression above), the result is seen to correspond to the variance of the one-dimensional projection: $$ \mathcal{C}(y, y) = \text{Var}\left[\langle y, X\rangle\right]. $$
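As a small numerical illustration of (10) (a sketch with arbitrary choices of $m$, $C$, $y$, and $y^\prime$, assuming NumPy), we can check that the exact value $\langle Cy, y^\prime\rangle$ agrees with a Monte Carlo estimate of $\text{Cov}[\langle y, X\rangle, \langle y^\prime, X\rangle]$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary mean, covariance, and test directions.
m = np.array([0.0, 1.0, -1.0])
C = np.array([[1.5, 0.4, 0.0],
              [0.4, 2.0, -0.3],
              [0.0, -0.3, 0.8]])
y = np.array([1.0, -0.5, 2.0])
y_prime = np.array([0.2, 1.0, 0.0])

# Covariance operator evaluated exactly: C(y, y') = <Cy, y'>.
exact = y_prime @ (C @ y)

# Monte Carlo: covariance of the one-dimensional projections <y, X> and <y', X>.
X = rng.multivariate_normal(m, C, size=500_000)
estimate = np.cov(X @ y, X @ y_prime)[0, 1]

# The two values should agree up to Monte Carlo error.
print(exact, estimate)
```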
Inner Products
One feature that makes the covariance operator a convenient mathematical object to study is the inner product structure it provides. Indeed, the following result states that the covariance operator is almost an inner product, and is a true inner product when the covariance matrix $C$ is positive definite.
Proposition. Let $\mu$ be a Gaussian measure with Fourier transform given by (4). Then the covariance operator (9) is symmetric, bilinear, and positive semidefinite. If $C$, the covariance matrix of $\mu$, is positive definite, then the covariance operator is also positive definite and thus defines an inner product.
Proof. Bilinearity follows immediately from definition (9). Symmetry similarly follows, and is more immediately obvious in expression (10). Since $C$ is positive semidefinite, $\mathcal{C}(y, y) = \langle Cy, y\rangle \geq 0$ for all $y$, so $\mathcal{C}$ is also positive semidefinite. The inequality is strict when $C$ is positive definite and $y \neq 0$, in which case $\mathcal{C}$ is an inner product.
We can therefore think of $\mathcal{C}$ as defining a new inner product $\langle y, y^\prime \rangle_C := \langle Cy, y^\prime\rangle$, obtained by weighting the Euclidean inner product by a positive definite matrix $C$.
A Closely Related Operator
Our definition for the covariance operator arises from looking at the quadratic form $\langle Cy, y\rangle$ (the expression that appears in the Fourier transform) in a new way. In particular, we viewed this as a function of two arguments, such that the above quadratic form is the value the function takes when both arguments happen to be $y$. We could look at this from yet another perspective by considering the quadratic form as a function of only one of its arguments, say, the left one. This gives another useful operator that is closely related to $\mathcal{C}$.
Definition. Let $\mu$ be a Gaussian measure with mean $m$ and covariance matrix $C$. We define the operator $\tilde{\mathcal{C}}: \mathbb{R}^n \to \mathbb{R}^n$ by $$ \tilde{\mathcal{C}}(y) := Cy. $$
By plugging in the definition of the covariance matrix, we see that this is equivalent to $$ \tilde{\mathcal{C}}(y) = \int (x - m)\langle y, x - m\rangle \mu(dx). \tag{13} $$ We thus have the connection between $\mathcal{C}$, $\tilde{\mathcal{C}}$, and $C$: $$ \mathcal{C}(y, y^\prime) = \langle \tilde{\mathcal{C}}(y), y^\prime\rangle = \langle Cy, y^\prime\rangle. $$ While some sources also refer to $\tilde{\mathcal{C}}$ as the covariance operator, we will reserve this term for $\mathcal{C}$. The following result is immediate, since $\tilde{\mathcal{C}}$ inherits the claimed properties from $C$.
Proposition. The linear operator $\tilde{\mathcal{C}}$ is self-adjoint and positive semidefinite.
At this point, the definition of $\tilde{\mathcal{C}}$ seems rather unnecessary given its similarity to $\mathcal{C}$. These are, after all, essentially the same objects, aside from the fact that we view $\mathcal{C}$ as a bilinear form on $\mathbb{R}^n \times \mathbb{R}^n$ and $\tilde{\mathcal{C}}$ as an element of $L(\mathbb{R}^n, \mathbb{R}^n)$, the set of linear maps from $\mathbb{R}^n$ to $\mathbb{R}^n$. These distinctions will become more consequential when we start considering Gaussian measures in more abstract settings.
Alternative Definition
As mentioned above, definitions of the covariance operator vary slightly in the literature. One basic modification commonly seen is to assume that $\mu$ is centered (zero mean) and thus define the covariance operator as $$ \mathcal{C}(y, y^\prime) := \int \langle y, x\rangle \langle y^\prime, x\rangle \mu(dx). \tag{14} $$ This is done primarily for convenience, as one can always center a Gaussian measure and then add back the mean when needed. Indeed, assume we are working with a Gaussian measure $\mu$ with mean $m$. To apply (14), we center the measure, which formally means considering the pushforward $\nu := \mu \circ T^{-1}$, where $T(x) := x - m$. Using subscripts to indicate the measure associated with each operator, we apply the change-of-variables theorem to obtain \begin{align} \mathcal{C}_{\nu}(y, y^\prime) &= \int \langle y, x\rangle \langle y^\prime, x\rangle (\mu \circ T^{-1})(dx) \newline &= \int \langle y, T(x)\rangle \langle y^\prime, T(x)\rangle \mu(dx) \newline &= \int \langle y, x-m \rangle \langle y^\prime, x-m \rangle \mu(dx), \end{align} which we see agrees with (10), our (uncentered) definition of $\mathcal{C}_{\mu}$. Thus, our original definition (10) can be thought of as first centering the measure and then applying (14). We could similarly have defined $\tilde{\mathcal{C}}$ in this way, via $$ \tilde{\mathcal{C}}(y) := \int x \langle y, x\rangle \mu(dx). $$ This is simply (13) with $m = 0$.
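To see numerically why the centering step matters (again just a sketch with arbitrary numbers, assuming NumPy): applying the centered formula (14) directly to samples from a measure with nonzero mean yields $\langle Cy, y^\prime\rangle + \langle y, m\rangle\langle y^\prime, m\rangle$ rather than the covariance, while centering the samples first recovers $\langle Cy, y^\prime\rangle$, in agreement with (10).

```python
import numpy as np

rng = np.random.default_rng(2)

m = np.array([3.0, -1.0])
C = np.array([[1.0, 0.6],
              [0.6, 2.0]])
y = np.array([1.0, 2.0])
y_prime = np.array([-0.5, 1.0])

X = rng.multivariate_normal(m, C, size=500_000)

# Naive application of (14) to uncentered samples: E[<y, X><y', X>].
naive = np.mean((X @ y) * ((X @ y_prime)))

# Center first (pushforward under T(x) = x - m), then apply (14).
centered = np.mean(((X - m) @ y) * ((X - m) @ y_prime))

# 'centered' should match <Cy, y'>; 'naive' overshoots by <y, m><y', m>.
print(naive, centered, y_prime @ (C @ y), y_prime @ (C @ y) + (y @ m) * (y_prime @ m))
```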
Dual Space Interpretation
As we have done repeatedly throughout this post, we can identify $\mathbb{R}^n$ with its dual $(\mathbb{R}^n)^*$. This may seem needlessly pedantic in the present context, but becomes necessary when defining Gaussian measures on infinite-dimensional spaces. The expression (10) provides the natural jumping-off point for reinterpreting the covariance operator as acting on linear functionals. To this end, we can consider re-defining the covariance operator as $\mathcal{C}: (\mathbb{R}^n)^* \times (\mathbb{R}^n)^* \to \mathbb{R}$, where $$ \mathcal{C}(\ell, \ell^\prime) := \int \ell(x - m)\, \ell^\prime(x - m)\, \mu(dx). $$ By identifying each $\ell \in (\mathbb{R}^n)^*$ with its dual vector $y_\ell \in \mathbb{R}^n$, this definition is seen to agree with (9). Note that $\ell$ and $\ell^\prime$ are linear, so we could have equivalently defined $\mathcal{C}$ as $$ \mathcal{C}(\ell, \ell^\prime) := \int \left[\ell(x) - \ell(m)\right]\left[\ell^\prime(x) - \ell^\prime(m)\right] \mu(dx). $$
We can similarly apply the dual space interpretation to $\tilde{\mathcal{C}}$. There are a few different ways we can think about this. Let's start by identifying the codomain of $\tilde{\mathcal{C}}$ with its dual and hence re-define this operator as $\tilde{\mathcal{C}}: \mathbb{R}^n \to (\mathbb{R}^n)^*$, where $$ \left[\tilde{\mathcal{C}}(y)\right](y^\prime) := \langle Cy, y^\prime\rangle. $$ Under this definition, $\tilde{\mathcal{C}}$ maps an input $y \in \mathbb{R}^n$ to a linear functional $\langle Cy, \cdot\rangle \in (\mathbb{R}^n)^*$. Alternatively, we could identify the domain with its dual, and instead consider the operator $\tilde{\mathcal{C}}: (\mathbb{R}^n)^* \to \mathbb{R}^n$, where $$ \tilde{\mathcal{C}}(\ell) := C y_\ell. $$ We can of course combine these two ideas and consider the map $\tilde{\mathcal{C}}: (\mathbb{R}^n)^* \to (\mathbb{R}^n)^*$. However, thinking ahead to more abstract settings, it is actually a bit more interesting to consider $\tilde{\mathcal{C}}: (\mathbb{R}^n)^* \to (\mathbb{R}^n)^{**}$ by identifying the codomain with its double dual. From this perspective, the operator is defined by $$ \left[\tilde{\mathcal{C}}(\ell)\right](\ell^\prime) := \int \ell(x - m)\, \ell^\prime(x - m)\, \mu(dx). $$ Notice that in this case $\tilde{\mathcal{C}}$ maps a dual vector to a double dual vector (i.e., the output is itself a function that accepts a linear functional as input). Since $\mathbb{R}^n$, $(\mathbb{R}^n)^*$, and $(\mathbb{R}^n)^{**}$ are all isomorphic, in the present setting these various perspectives are interesting but perhaps a bit overkill. When we consider the infinite-dimensional setting in the subsequent post, not all of these perspectives will generalize. The key will be identifying the perspective that does actually generalize to infinite-dimensional settings.
Conditional Distributions
Appendix
Proof of (4): Fourier Transform Characterization
Assume that the probability measure $\mu$ has a Fourier transform given by (4) for some nonrandom vector $m \in \mathbb{R}^n$ and symmetric positive semidefinite matrix $C \in \mathbb{R}^{n \times n}$. We must show that the pushforward $\mu \circ \ell_y^{-1}$ is Gaussian for an arbitrary $y \in \mathbb{R}^n$. We will do so by invoking the known form of the Fourier transform for univariate Gaussians. To this end, let $t \in \mathbb{R}$ and consider \begin{align} \mathcal{F}\left(\mu \circ \ell_y^{-1}\right)(t) &= \int e^{its} \left(\mu \circ \ell_y^{-1} \right)(ds) \newline &= \int e^{it \ell_y(x)} \mu(dx) \newline &= \int e^{i \langle ty, x\rangle} \mu(dx) \newline &= \hat{\mu}(ty) \newline &= \exp\left(i \langle m, ty\rangle - \frac{1}{2}\langle C(ty), ty\rangle \right) \newline &= \exp\left(it \langle m, y\rangle - \frac{1}{2}t^2\langle Cy, y\rangle \right), \end{align} where the second equality uses the change-of-variables formula, and the final uses the assumed form of $\hat{\mu}$. Also recall the alternate notation $\mathcal{F}(\nu) := \hat{\nu}$ for the Fourier transform of a measure $\nu$. We recognize the final expression above as the Fourier transform of a univariate Gaussian measure with mean $\langle m, y\rangle$ and variance $\langle Cy, y\rangle$, evaluated at frequency $t$. This implies that $\mu \circ \ell_y^{-1}$ is Gaussian. Since $y \in \mathbb{R}^n$ was arbitrary, it follows by definition that $\mu$ is Gaussian.
Conversely, assume that $\mu$ is Gaussian. Then $\mu \circ \ell_y^{-1}$ is
univariate Gaussian for all $y \in \mathbb{R}^n$. We must
show that $\hat{\mu}$ assumes the claimed form. Letting $y \in \mathbb{R}^n$,
we have
\begin{align}
\hat{\mu}(y)
&= \int e^{i \langle y, x\rangle} \mu(dx) \newline
&= \int e^{is} \left(\mu \circ \ell_y^{-1}\right)(ds) \newline
&= \mathcal{F}\left(\mu \circ \ell_y^{-1}\right)(1) \newline
&= \exp\left(i m(y) - \frac{1}{2}\sigma^2(y) \right),
\end{align}
where $m(y)$ and $\sigma^2(y)$ are the mean and variance of $\mu \circ \ell_y^{-1}$,
respectively. The second equality again uses the change-of-variables formula, while
the last expression follows from the assumption that $\mu \circ \ell_y^{-1}$
is Gaussian, and hence must have a Fourier transform of this form. It remains
to verify that $m(y) = \langle y, m\rangle$ and $\sigma^2(y) = \langle Cy, y\rangle$
to complete the proof. By definition, the
mean of $\mu \circ \ell_y^{-1}$ is given by
\begin{align}
m(y) &= \int \ell_y(x) \mu(dx) \newline
&= \int \langle y, x\rangle \mu(dx) \newline
&= \left\langle y, \int x \mu(dx) \right\rangle \newline
&=: \langle y, m \rangle,
\end{align}
where we have used the linearity of integration and defined the nonrandom
vector $m := \int x \mu(dx)$. Now, for the variance we have
\begin{align}
\sigma^2(y)
&= \int \left[\ell_y(x) - m(y) \right]^2 \mu(dx) \newline
&= \int \left[\langle y, x\rangle - \langle y, m \rangle \right]^2 \mu(dx) \newline
&= \int \langle y, x-m\rangle^2 \mu(dx) \newline
&= y^\top \left[\int (x-m)(x-m)^\top \mu(dx) \right] y \newline
&=: y^\top C y \newline
&= \langle Cy, y \rangle.
\end{align}
Note that $\sigma^2(y)$ is the expectation of a nonnegative quantity, so
$\langle Cy, y\rangle = \sigma^2(y) \geq 0$ for all $y \in \mathbb{R}^n$; i.e.,
$C$ is positive semidefinite. We have thus shown that
\begin{align}
\hat{\mu}(y) &= \exp\left(i\langle y, m\rangle - \frac{1}{2}\langle Cy,y\rangle \right),
\end{align}
with $C$ a symmetric, positive semidefinite matrix, as required.
Proof of (6): Density Function
Let's start by assuming $\mu$ is a Gaussian measure with mean $m$ and positive definite covariance matrix $C$. Then $C$ admits an eigendecomposition $C = U \Lambda U^\top$, where the columns $u_1, \dots, u_n$ of $U$ are orthonormal and $\Lambda = \text{diag}(\lambda_1, \dots, \lambda_n)$ with $\lambda_i > 0$. By the definition of a Gaussian measure, the one-dimensional projections $\mu \circ \ell_{u_i}^{-1}$ are Gaussian, with respective means $\langle u_i, m\rangle$ and variances $\langle Cu_i, u_i\rangle = \lambda_i$ (see the above proof for the derivation of the mean and variance). Note that the positive definite assumption ensures that the variances are all strictly positive. Since the variances are positive, each of these univariate Gaussians admits a density, $\mathcal{N}(\langle u_i, m\rangle, \lambda_i)$ for $i = 1, \dots, n$. We will now show that $\mu$ can be written as the product of these independent univariate Gaussian measures. We will leverage the Fourier transform to establish this fact. Letting $y \in \mathbb{R}^n$, we will lighten notation by writing $\alpha_i := \langle y, u_i\rangle$ and $\beta_i := \langle m, u_i\rangle$; $y$ and $m$ can thus be represented with respect to the eigenbasis as \begin{align} &y = \sum_{i=1}^{n} \alpha_i u_i, &m = \sum_{i=1}^{n} \beta_i u_i. \end{align} Taking the Fourier transform of $\mu$, we have \begin{align} \hat{\mu}(y) &= \exp\left(i\langle y,m\rangle - \frac{1}{2}\langle Cy,y\rangle \right) \newline &= \exp\left(i\left\langle \sum_{i=1}^{n} \alpha_i u_i, \sum_{i=1}^{n} \beta_i u_i \right\rangle - \frac{1}{2}\left\langle \sum_{i=1}^{n} \alpha_i Cu_i, \sum_{i=1}^{n} \alpha_i u_i \right\rangle \right) \newline &= \exp\left(i\sum_{i=1}^{n} \alpha_i \beta_i - \frac{1}{2}\sum_{i=1}^{n}\lambda_i \alpha_i^2 \right) \newline &= \prod_{i=1}^{n} \exp\left(i\alpha_i \beta_i - \frac{1}{2}\lambda_i \alpha_i^2 \right) \newline &= \prod_{i=1}^{n} \mathcal{F}\left(\mathcal{N}(\beta_i, \lambda_i) \right)(\alpha_i). \end{align} In other words, in the coordinates $s_i := \langle u_i, x\rangle$, the measure $\mu$ factors as the product of the independent univariate Gaussians $\mathcal{N}(\beta_i, \lambda_i)$. Its Lebesgue density is therefore the product of the corresponding univariate densities; multiplying these together and changing back to the original coordinates (an orthogonal change of variables, which has unit Jacobian) yields expression (6).
Proof of (7): Transformation of Standard Gaussian
For completeness, we start by proving the following basic fact.
Lemma. Let $Z_1, \dots, Z_n \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)$ and define the random vector $Z := (Z_1, \dots, Z_n)^\top$. Then the law of $Z$ is multivariate Gaussian; in particular, $Z \sim \mathcal{N}(0, I)$.
Proof. Let $\mu$ and $\nu$ denote the laws of $Z$ and $Z_1$, respectively. Observe that $\mu$ is the product measure constructed from $n$ copies of $\nu$; that is, $\mu = \nu \otimes \cdots \otimes \nu$. We will establish the Gaussianity of $\mu$ by appealing to the Fourier transform. Let $y \in \mathbb{R}^n$ and consider \begin{align} \hat{\mu}(y) &= \int e^{i \langle y, x\rangle} \mu(dx) \newline &= \int \prod_{i=1}^{n} \exp\left(i y_i x_i\right) (\nu \otimes \cdots \otimes \nu)(dx_1, \dots, dx_n) \newline &= \prod_{i=1}^{n} \int \exp\left(iy_i x_i \right) \nu(dx_i) \newline &= \prod_{i=1}^{n} \hat{\nu}(y_i) \newline &= \prod_{i=1}^{n} \exp\left(-\frac{1}{2}y_i^2\right) \newline &= \exp\left(-\frac{1}{2} \sum_{i=1}^{n} y_i^2 \right) \newline &= \exp\left(-\frac{1}{2} \langle Iy, y \rangle \right), \end{align} where we have used the Fourier transform of the univariate Gaussian measure $\nu = \mathcal{N}(0, 1)$. We recognize the final expression to be the Fourier transform of a Gaussian measure with mean vector $0$ and covariance matrix $I$.
Proof of (7). Proceeding with the main result, we first show that the
random variable $X := m + AZ$ has law $\mathcal{N}(m, AA^\top)$. This follows
immediately from the above lemma and basic facts about Fourier transforms.
Recall that we write $\mathcal{L}(Y)$ to denote the law of a random variable
$Y$, and thus $\hat{\mathcal{L}}(Y)$ is the Fourier transform of this law. We are interested
in the Fourier transform $\hat{\mathcal{L}}(m + AZ)$,
which is easily derived if one recalls the effect of affine transformations on
Fourier transforms. To be self-contained, we derive the required results here;
let $Y$ be an arbitrary $n$-dimensional random vector, and $m$, $A$ be non-random
as above. Then,
\begin{align}
\hat{\mathcal{L}}(AY)(x)
&= \mathbb{E}\left[\exp\left(i\langle x, AY\rangle \right) \right]
= \mathbb{E}\left[\exp\left(i\langle A^\top x, Y\rangle \right) \right]
= \hat{\mathcal{L}}(Y)(A^\top x)
\end{align}
and
\begin{align}
\hat{\mathcal{L}}(m + Y)(x)
&= \mathbb{E}\left[\exp\left(i\langle x, m+Y\rangle \right) \right]
= \exp\left(i\langle x, m\rangle \right)\mathbb{E}\left[\exp\left(i\langle x, Y\rangle \right) \right]
= \exp\left(i\langle x, m\rangle \right) \hat{\mathcal{L}}(Y)(x).
\end{align}
Applying these two results to the present problem, we obtain
\begin{align}
\hat{\mathcal{L}}(X)(x)
&= \hat{\mathcal{L}}(m + AZ)(x) \newline
&= \exp\left(i\langle x, m\rangle \right) \hat{\mathcal{L}}(Z)(A^\top x) \newline
&= \exp\left(i\langle x, m\rangle \right)\exp\left(-\frac{1}{2}\langle A^\top x, A^\top x\rangle \right) \newline
&= \exp\left(i\langle x, m\rangle \right)\exp\left(-\frac{1}{2}\langle AA^\top x, x\rangle \right) \newline
&= \exp\left(i\langle x, m\rangle - \frac{1}{2}\langle AA^\top x, x\rangle \right),
\end{align}
where we have used the fact $\hat{\mathcal{L}}(Z)(y) = \exp\left(-\frac{1}{2}\langle y, y\rangle\right)$ from the lemma above, together with $\langle A^\top x, A^\top x\rangle = \langle AA^\top x, x\rangle$. The final expression is the Fourier transform of $\mathcal{N}(m, AA^\top)$, which establishes the first claim. For the converse, let $X$ be Gaussian with mean $m$ and covariance matrix $C$, and let $A$ be any matrix satisfying $C = AA^\top$ (e.g., obtained from the eigendecomposition of $C$, as discussed above). The computation above shows that $m + AZ$ has the same Fourier transform, and hence the same law, as $X$.
References
- Gaussian Measures (Vladimir Bogachev)
- An Introduction to Stochastic PDEs (Martin Hairer)
TODOs
- Proof that zeros in covariance matrix imply independence.
- Proof that zeros in precision matrix imply conditional independence.