\[ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\Pr}{\mathbb{P}} \newcommand{\given}{\mid} \newcommand{\Def}{:=} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Gaussian}{\mathcal{N}} \newcommand{\fwd}{\mathcal{G}} \newcommand{\u}{u} \newcommand{\yobs}{y^{\dagger}} \newcommand{\y}{y} \newcommand{\noise}{\epsilon} \newcommand{\covNoise}{\Sigma} \newcommand{\meanVec}{m} \newcommand{\covMat}{C} \newcommand{\dimObs}{n} \newcommand{\dimPar}{d} \newcommand{\parSpace}{\mathcal{U}} \newcommand{\misfit}{\Phi} \newcommand{\misfitReg}{\Phi_R} \newcommand{\misfitPost}{\Phi_{\pi}} \newcommand{\covPrior}{\covMat} \newcommand{\meanPrior}{\meanVec} \newcommand{\dens}{\pi} \newcommand{\priorDens}{\pi_0} \newcommand{\postDens}{\pi} \newcommand{\normCst}{Z} \newcommand{\joint}{\overline{\pi}} \newcommand{\meanObs}{\meanVec^{\y}} \newcommand{\covObs}{\covMat^{\y}} \newcommand{\covCross}{\covMat^{\u \y}} \newcommand{\tcovCross}{\covMat^{\y \u}} \newcommand{\GaussProj}{\mathcal{P}_{\Gaussian}} \newcommand{\meanPost}{\meanVec_{\star}} \newcommand{\covPost}{\covMat_{\star}} \newcommand{\transport}{T} \newcommand{\nens}{J} \]
The ensemble Kalman filter (EnKF) is a well-established algorithm for state estimation in high-dimensional state space models. More recently, it has gained popularity as a general-purpose derivative-free tool for optimization and approximate posterior sampling; i.e., for the solution of inverse problems. The label ensemble Kalman inversion (EKI) is generally used to refer to the class of algorithms that adapt the EnKF methodology for such purposes. While these algorithms are typically quite simple – mostly relying on slight modifications of the standard EnKF update formula – there are quite a few subtleties involved in designing and analyzing EKI methods. In particular, while much of the EKI literature is focused on optimization, small modifications of optimization-focused algorithms can be made to instead target posterior sampling. In a series of posts, we will walk through these subtleties, exploring the potential of the EnKF both as a derivative-free (approximate) optimizer and sampler. We start in this post by outlining the basic setup and goals, and then proceed to introduce a basic EnKF algorithm for approximate posterior sampling.
1 Setup: Inverse Problems
This section serves to introduce the notation that will be used throughout the entire series of posts. Our focus will be on inverse problems, with the goal being to recover a latent parameter \(\u \in \R^{\dimPar}\) from indirect, and potentially noisy, observations \(\yobs \in \R^{\dimObs}\). We assume the parameter and data are related via a forward model (i.e., parameter-to-observable map) \(\fwd: \R^{\dimPar} \to \R^{\dimObs}\), giving the relationship \(\yobs \approx \fwd(\u)\).
1.1 Optimization
We start by formulating the solution to the inverse problem as an optimization problem. One of the most basic approaches we might take is to seek the value of the parameter that minimizes the quadratic error between the data and the model prediction.
Define the least squares model-data misfit function \[ \misfit(u) := \frac{1}{2}\lVert \yobs - \fwd(\u)\rVert^2_{\covNoise} := \frac{1}{2}(\yobs - \fwd(\u))^\top \covNoise^{-1}(\yobs - \fwd(\u)), \tag{1}\] weighted by a positive definite matrix \(\covNoise\). The (nonlinear) least squares minimization problem is then given by \[ u_{\star} \in \text{argmin}_{u \in \parSpace} \ \misfit(\u). \tag{2}\]
The above definition also serves to define the notation we will be using throughout this series to denote weighted Euclidean norms. Note also that \(\misfit(u)\) depends on the observed data \(\yobs\), but we suppress this dependence in the notation. A natural extension to the least-squares problem is to add a regularization term to the objective function. We will focus on quadratic regularization terms, which is referred to as Tikhonov regularization and ridge regression in the inverse problems and statistical literatures, respectively.
Define the Tikhonov-regularized least squares function by \[ \misfitReg(\u) := \frac{1}{2}\lVert \yobs - \fwd(\u)\rVert^2_{\covNoise} + \frac{1}{2}\lVert \u - \meanPrior\rVert^2_{\covPrior}. \tag{3}\] The Tikhonov-regularized least squares optimization problem is given by \[ u_{\star} \in \text{argmin}_{u \in \parSpace} \misfitReg(\u). \tag{4}\]
The Tikhonov loss function balances the model fit to the data with the requirement to keep \(\u\) “close” to \(\meanPrior\), where the relative weights of these objectives are determined by the (positive definite) covariances \(\covNoise\) and \(\covPrior\).
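To make the optimization objectives concrete, here is a minimal numpy sketch of the misfit and Tikhonov-regularized misfit in Equation 1 and Equation 3. The quadratic forms are evaluated via linear solves rather than explicit matrix inverses, and the forward model, data, and covariances are illustrative placeholders rather than anything from a real application.

```python
import numpy as np

def misfit(u, y_obs, fwd, noise_cov):
    """Least squares misfit Phi(u) = 0.5 * ||y_obs - G(u)||^2_Sigma (Equation 1)."""
    r = y_obs - fwd(u)
    return 0.5 * r @ np.linalg.solve(noise_cov, r)

def misfit_reg(u, y_obs, fwd, noise_cov, prior_mean, prior_cov):
    """Tikhonov-regularized misfit Phi_R(u) (Equation 3)."""
    d = u - prior_mean
    return misfit(u, y_obs, fwd, noise_cov) + 0.5 * d @ np.linalg.solve(prior_cov, d)

# Placeholder problem: nonlinear forward model G: R^2 -> R^3.
fwd = lambda u: np.array([u[0] ** 2, u[0] * u[1], np.sin(u[1])])
y_obs = np.array([1.0, 0.5, 0.2])
noise_cov = 0.1 * np.eye(3)
prior_mean, prior_cov = np.zeros(2), np.eye(2)

print(misfit_reg(np.array([0.9, 0.4]), y_obs, fwd, noise_cov, prior_mean, prior_cov))
```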
1.2 Sampling
We next consider the Bayesian formulation of the inverse problem, whereby the goal is no longer to identify a single value \(\u_{\star}\), but instead to construct a probability distribution over all possible \(\u\). The Bayesian approach requires the definition of a joint distribution over the data and the parameter, \((\u,\y)\). We view the observed data \(\yobs\) as a particular realization of the random variable \(\y\). The solution of the Bayesian inverse problem is given by the conditional distribution \(\u \given [\y=\yobs]\), known as the posterior distribution. We will often shorten this notation by writing \(\u \given \yobs\).
Throughout this series, we will primarily focus on the joint distribution on \((\u, \y)\) induced by the following model: \[ \begin{align} \y &= \fwd(\u) + \noise \newline \u &\sim \priorDens \newline \noise &\sim \Gaussian(0, \covNoise), \end{align} \tag{5}\] where \(\priorDens\) is a prior distribution on the parameter, \(\covNoise\) is the fixed (known) covariance of the additive Gaussian noise, and \(\u\) and \(\noise\) are independent. The EnKF methodology we will discuss is particularly well-suited to such additive Gaussian models with known noise covariance, but there has been work on relaxing these restrictions. The above model defines a joint distribution \(\joint\) on \((\u,\y)\) via the product of densities \[ p(\u,\y) := p(\y \given \u) \priorDens(\u) = \Gaussian(\y \given \fwd(\u), \covNoise)\priorDens(\u), \tag{6}\] with the posterior density given by Bayes’ theorem \[ \begin{align} \postDens(\u) &:= p(\u \given \yobs) = \frac{1}{\normCst}\Gaussian(\yobs \given \fwd(\u), \covNoise)\priorDens(\u), &&\normCst := \int_{\parSpace} \Gaussian(\yobs \given \fwd(\u), \covNoise)\priorDens(\u) d\u. \end{align} \tag{7}\] We omit the dependence on \(\yobs\) in the notation \(\postDens(\u)\) and \(Z\).
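The following sketch simulates \((\u, \y)\) pairs from the joint model in Equation 5. The Gaussian prior, forward model, and noise covariance are placeholders chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_joint(n_samples, fwd, prior_sample, noise_cov):
    """Draw (u, y) pairs from the joint model of Equation 5: y = G(u) + eps."""
    d_obs = noise_cov.shape[0]
    U = np.array([prior_sample() for _ in range(n_samples)])                # u ~ pi_0
    noise = rng.multivariate_normal(np.zeros(d_obs), noise_cov, n_samples)  # eps ~ N(0, Sigma)
    Y = np.array([fwd(u) for u in U]) + noise                               # y = G(u) + eps
    return U, Y

# Placeholder choices: standard Gaussian prior on R^2, nonlinear G: R^2 -> R^3.
fwd = lambda u: np.array([u[0] ** 2, u[0] * u[1], np.sin(u[1])])
prior_sample = lambda: rng.normal(size=2)
noise_cov = 0.1 * np.eye(3)

U, Y = sample_joint(1000, fwd, prior_sample, noise_cov)
print(U.shape, Y.shape)  # (1000, 2) (1000, 3)
```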
We seek to draw samples from the posterior distribution \(\u \given \yobs\) under the model in Equation 5. We can phrase this as the task of sampling the probability distribution with density \[ \postDens(\u) \propto \exp\left\{-\misfitPost(\u)\right\}, \tag{8}\] where \[ \begin{align} \misfitPost(\u) &:= -\log p(\yobs \given \u) - \log \priorDens(\u) \\ &= \frac{1}{2}\lVert \yobs - \fwd(\u)\rVert^2_{\covNoise} - \log \priorDens(\u) + C, \end{align} \tag{9}\] is the negative log posterior density \(-\log \postDens(\u)\), up to additive constants (such as \(C\)) that are independent of \(\u\).
We introduce the notation \(\misfitPost(\u)\) in order to draw a connection with the optimization goals. Indeed, note that the negative log-likelihood term in Equation 9 is precisely the least squares misfit function from Equation 1 (up to an additive constant). Moreover, if we choose a Gaussian prior \(\priorDens(\u) = \Gaussian(\u \given \meanPrior, \covPrior)\), then \(\misfitPost(\u)\) agrees with \(\misfitReg(\u)\) (again, up to an additive constant). We will explore certain algorithms that assume the prior is Gaussian, but in general allow \(\priorDens\) to be non-Gaussian.
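This connection is easy to check numerically. The sketch below (using the same placeholder model as above, with a standard Gaussian prior) evaluates the negative log posterior and \(\misfitReg\) at two parameter values and confirms that their difference is the same additive constant at both points.

```python
import numpy as np

# Placeholder model, matching the earlier sketches.
fwd = lambda u: np.array([u[0] ** 2, u[0] * u[1], np.sin(u[1])])
y_obs = np.array([1.0, 0.5, 0.2])
noise_cov = 0.1 * np.eye(3)
prior_mean, prior_cov = np.zeros(2), np.eye(2)

def quad(x, A):
    """Weighted quadratic form 0.5 * x^T A^{-1} x."""
    return 0.5 * x @ np.linalg.solve(A, x)

def neg_log_post(u):
    """Phi_pi(u): negative log of the unnormalized posterior with a Gaussian prior (Equation 9)."""
    return (quad(y_obs - fwd(u), noise_cov) + 0.5 * np.log(np.linalg.det(2 * np.pi * noise_cov))
            + quad(u - prior_mean, prior_cov) + 0.5 * np.log(np.linalg.det(2 * np.pi * prior_cov)))

def misfit_reg(u):
    """Phi_R(u): Tikhonov-regularized least squares objective (Equation 3)."""
    return quad(y_obs - fwd(u), noise_cov) + quad(u - prior_mean, prior_cov)

u1, u2 = np.array([0.9, 0.4]), np.array([-0.3, 1.2])
print(neg_log_post(u1) - misfit_reg(u1), neg_log_post(u2) - misfit_reg(u2))  # same constant
```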
1.3 Roadmap
With the setup and goals established, we will now take steps towards practical algorithms. It is important to recognize that the application of EnKF methodology to the optimization and sampling problems will yield approximate algorithms in general. The methods will be exact (in a manner which will be made precise) in the linear Gaussian setting, where the forward model \(\fwd\) is linear and the prior \(\priorDens\) is Gaussian. The EKI algorithms we consider are derivative-free, and thus suitable for the black-box setting where we can only evaluate \(\fwd(\cdot)\) pointwise. In typical applications, function evaluations \(\fwd(\cdot)\) may be quite computationally expensive; e.g., they might require numerically solving partial differential equations. Another benefit of the EnKF methodology is that it allows many model evaluations to be performed in parallel. These features will become clear as we dive into the methods. We start in this post by focusing on the sampling problem; the optimization setting will be explored in future posts.
2 Joint Gaussian Approximation
We start by introducing the notion of approximate conditioning using Gaussian distributions, a core idea underlying many Kalman methods. A slight extension of this notion yields the classic EnKF update.
2.1 Gaussian Projection
Recall that we write \(\joint\) to denote the joint distribution of \((\u,\y)\) under Equation 5. In general, this distribution will be non-Gaussian, rendering the conditioning \(u \given \yobs\) a challenging task. A simple idea is to consider approximating \(\joint\) with a Gaussian, which is a distribution for which conditioning is easy. To this end, consider the approximation \[ \begin{align} \GaussProj\joint \Def \Gaussian\left( \begin{bmatrix} \meanPrior \newline \meanObs \end{bmatrix}, \begin{bmatrix} \covPrior & \covCross \newline \tcovCross & \covObs \end{bmatrix} \right) \end{align} \tag{10}\] where the means and covariances are simply given by the first two moments of \((\u, \y)\). In particular, \[ \begin{align} &\meanPrior \Def \E[\u], &&\covPrior \Def \Cov[\u] \end{align} \tag{11}\]
are the moments of the \(\u\)-marginal, and \[ \begin{align} &\meanObs \Def \E[\y], &&\covObs \Def \Cov[\y] \end{align} \tag{12}\]
are the moments of the \(\y\)-marginal. Finally, \[ \begin{align} &\covCross \Def \Cov[\u,\y], &&\tcovCross \Def [\covCross]^\top \end{align} \tag{13}\]
are the cross-covariances between \(\u\) and \(\y\). Following Calvello, Reich, and Stuart (2024), we refer to \(\GaussProj\joint\) as the Gaussian “projection” of \(\joint\). This terminology is motivated by the fact that \(\GaussProj\joint\) can be seen to minimize the Kullback-Leibler divergence \(\text{KL}(\joint \parallel q)\) over the space of Gaussians \(q = \Gaussian(\meanVec^\prime, \covMat^\prime)\) (see Sanz-Alonso, Stuart, and Taeb (2023), chapter 4 for details).
Having invoked the joint Gaussian approximation in Equation 10, we can now approximate \(\u \given \yobs\) with the corresponding Gaussian conditional. The conditionals of a Gaussian are themselves Gaussian, and are thus characterized by the well-known conditional mean and covariance formulas given below.
Let \((\tilde{\u}, \tilde{\y})\) be a random vector with distribution \(\GaussProj\joint\). We consider approximating the posterior \(\u \given (\y=\yobs)\) with \(\tilde{\u} \given (\tilde{\y}=\yobs)\). As the conditional of a Gaussian distribution, this posterior approximation is itself Gaussian, with moments \[ \begin{align} \meanPost &= \meanPrior + \covCross [\covObs]^{-1} (\yobs - \meanObs) \newline \covPost &= \covPrior - \covCross [\covObs]^{-1} \tcovCross. \end{align} \tag{14}\]
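A short numpy sketch of the conditional moment formulas in Equation 14 is given below; the joint Gaussian moments are randomly generated placeholders, and the inverse of \(\covObs\) is applied via a linear solve rather than formed explicitly.

```python
import numpy as np

def gaussian_condition(m_u, C_u, m_y, C_y, C_uy, y_obs):
    """Conditional mean and covariance of the Gaussian projection (Equation 14)."""
    gain = np.linalg.solve(C_y, C_uy.T).T       # K = C^{uy} [C^y]^{-1}
    m_post = m_u + gain @ (y_obs - m_y)
    C_post = C_u - gain @ C_uy.T
    return m_post, C_post

# Placeholder joint Gaussian over (u, y), with u in R^2 and y in R^3.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
joint_cov = A @ A.T + np.eye(5)                 # random positive definite joint covariance
C_u, C_y, C_uy = joint_cov[:2, :2], joint_cov[2:, 2:], joint_cov[:2, 2:]
m_u, m_y = np.zeros(2), np.zeros(3)
y_obs = np.array([1.0, 0.5, 0.2])

m_post, C_post = gaussian_condition(m_u, C_u, m_y, C_y, C_uy, y_obs)
print(m_post, np.diag(C_post))
```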
2.2 Monte Carlo Approximations
The Gaussian projection from the preceding section yielded the analytically tractable approximation in Equation 14. The algorithms we are building towards will require a Monte Carlo representation of this posterior approximation, so let's first review how we might obtain one for the Gaussian posterior in Equation 14. With the closed-form moments in hand, we can of course simply compute \(\meanPost\) and \(\covPost\) using the above equations and then sample \(\Gaussian(\meanPost, \covPost)\) using standard methods. Interestingly, we can bypass the step of computing conditional moments and directly sample from the Gaussian conditional using a result known as Matheron’s rule.
Let \((\tilde{\u}, \tilde{\y})\) be random variables with distribution \(\GaussProj\joint\). Then the following equality holds in distribution: \[ \begin{align} (\tilde{\u} \given [\tilde{\y} = \yobs]) &\overset{d}{=} \tilde{\u} + \covCross [\covObs]^{-1} (\yobs - \tilde{\y}). \end{align} \tag{15}\]
This implies that independent samples from \(\tilde{\u} \given [\tilde{\y} = \yobs]\) can be simulated via the following algorithm.
- Sample \((\u^\prime, \y^\prime) \sim \GaussProj\joint\)
- Return \(\transport(\u^\prime, \y^\prime)\)
where \[ \transport(\u^\prime, \y^\prime) \Def \u^\prime + \covCross [\covObs]^{-1} (\yobs - \y^\prime) \tag{16}\]
The distribution of the left-hand side of Equation 15 is given in Equation 14. Notice that the right-hand side is a linear function of the Gaussian random vector \((\tilde{\u}, \tilde{\y})\), and is thus Gaussian. It remains to verify that the mean and covariance of the right-hand side agree with Equation 14. The mean is given by \[ \begin{align} \E[\transport(\tilde{\u}, \tilde{\y})] &= \E[\tilde{\u}] + \covCross [\covObs]^{-1} (\yobs - \E[\tilde{\y}]) \newline &= \meanPrior + \covCross [\covObs]^{-1} (\yobs - \meanObs) \newline &= \E[\tilde{\u} \given \tilde{\y} = \yobs]. \end{align} \] Similarly, since \(\yobs\) is a constant shift, the covariance is \[ \begin{align} \Cov[\transport(\tilde{\u}, \tilde{\y})] &= \Cov[\tilde{\u} - \covCross [\covObs]^{-1} \tilde{\y}] \\ &= \covPrior + \covCross [\covObs]^{-1} \tcovCross - 2\covCross [\covObs]^{-1} \tcovCross \\ &= \covPrior - \covCross [\covObs]^{-1} \tcovCross \\ &= \Cov[\tilde{\u} \given \tilde{\y} = \yobs], \end{align} \] where the second equality follows from expanding the covariance of the difference and simplifying the two cross terms. □
The map \(\transport(\cdot, \cdot)\) is a deterministic function that transports samples from the joint Gaussian to its conditional distribution. Note that this map depends only on the first two moments of \(\joint\).
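The sketch below illustrates Matheron's rule on a placeholder joint Gaussian: we draw samples \((\u^\prime, \y^\prime)\), push them through the transport map of Equation 16, and check that the resulting sample moments match the closed-form conditional moments of Equation 14.

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder joint Gaussian over (u, y), with u in R^2 and y in R^3.
A = rng.normal(size=(5, 5))
joint_cov = A @ A.T + np.eye(5)
joint_mean = np.zeros(5)
C_u, C_y, C_uy = joint_cov[:2, :2], joint_cov[2:, 2:], joint_cov[:2, 2:]
y_obs = np.array([1.0, 0.5, 0.2])

gain = np.linalg.solve(C_y, C_uy.T).T          # Kalman gain K = C^{uy} [C^y]^{-1}

def transport(U, Y):
    """Matheron transport T(u', y') = u' + K (y_obs - y') (Equation 16), vectorized over samples."""
    return U + (y_obs - Y) @ gain.T

# Sample (u', y') from the joint Gaussian and transport to conditional samples.
Z = rng.multivariate_normal(joint_mean, joint_cov, 100_000)
U_cond = transport(Z[:, :2], Z[:, 2:])

# Compare sample moments to the closed-form conditional moments (Equation 14).
m_star = gain @ y_obs                          # conditional mean (here m_u = m_y = 0)
C_star = C_u - gain @ C_uy.T                   # conditional covariance
print(np.max(np.abs(U_cond.mean(axis=0) - m_star)))
print(np.max(np.abs(np.cov(U_cond, rowvar=False) - C_star)))
```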
We next consider a slight adjustment to the Matheron update, which results in (potentially) non-Gaussian approximate posterior samples. This yields the classical EnKF update equation.
An alternative Monte Carlo posterior approximation can be obtained by modifying the above sampling strategy as follows:
- Sample \((u^\prime, \y^\prime) \sim \joint\)
- Return \(\transport(\u^\prime, \y^\prime)\)
Here, \(\transport(\cdot, \cdot)\) is the same transport map as defined in Equation 16. Sampling \((u^\prime, \y^\prime)\) entails sampling a parameter from the prior, then sampling from the likelihood: \[ \begin{align} &\u^\prime \sim \priorDens \newline &\y^\prime \Def \fwd(\u^\prime) + \noise^\prime, &&\noise^\prime \sim \Gaussian(0, \covNoise) \end{align} \tag{17}\]
Note that the difference between the two above algorithms is that the former samples \((\u^\prime, \y^\prime)\) from the Gaussian projection \(\GaussProj\joint\), while the latter samples from the true joint distribution \(\joint\). In both cases, the form of the transport map \(\transport\) is derived from the Gaussian approximation \(\GaussProj\joint\). The EnKF update thus combines exact Monte Carlo sampling from the joint distribution with approximate conditioning motivated by a Gaussian ansatz. Since the samples \((\u^\prime, \y^\prime)\) are no longer Gaussian in general, the approximate conditional samples \(\transport(\u^\prime, \y^\prime)\) can also be non-Gaussian. One might hope that this additional flexibility leads to an improved approximation. We conclude this section by defining the Kalman gain, which forms the core of the Matheron transport map \(\transport\).
The Kalman gain associated with the inverse problem in Equation 5 is defined as \[ K := \covCross [\covObs]^{-1}. \tag{18}\] The transport map in Equation 16 can thus be written as \[ \transport(\u^\prime, \y^\prime) = \u^\prime + K(\yobs - \y^\prime). \tag{19}\]
We thus see that the transport map takes the prior sample \(\u^\prime\) and adds a “correction” term based on the data. The correction is the linear map \(K\) applied to the residual \(\yobs - \y^\prime\) (i.e., the difference between the observed and predicted data). Note that the Kalman gain is a linear map from the observation space \(\R^{\dimObs}\) to the parameter space \(\R^{\dimPar}\), converting residuals in data space into corrections in parameter space.
2.3 Practical Algorithms
The methods presented above do not yet constitute algorithms, as we have not specified how to compute the moments defining the Gaussian projection Equation 10. By replacing the exact moments with Monte Carlo estimates, we obtain an algorithm that we will refer to as single-step ensemble Kalman inversion (EKI). Given samples \(\{(\u^{(j)}, \y^{(j)})\}_{j=1}^{\nens} \sim \joint\), we can estimate the required moments via the standard Monte Carlo estimators: \[ \begin{align} &\meanPrior \Def \frac{1}{\nens} \sum_{j=1}^{\nens} \u^{(j)}, &&\covPrior \Def \frac{1}{\nens-1} \sum_{j=1}^{\nens} (\u^{(j)}-\meanPrior)(\u^{(j)}-\meanPrior)^\top \newline &\meanObs \Def \frac{1}{\nens} \sum_{j=1}^{\nens} \y^{(j)}, &&\covObs \Def \frac{1}{\nens-1} \sum_{j=1}^{\nens} (\y^{(j)}-\meanObs)(\y^{(j)}-\meanObs)^\top \end{align} \tag{20}\]
\[ \begin{equation} \covCross \Def \frac{1}{\nens-1} \sum_{j=1}^{\nens} (\u^{(j)}-\meanPrior)(\y^{(j)}-\meanObs)^\top \end{equation} \]
Note that we utilize the same notation for the exact moments and their empirical estimates; the precise meaning of the notation will be made clear from context.
Algorithm: Single Step EKI \[ \begin{array}{lll} \textbf{Input:} & \text{Sample size } \nens & \\ \textbf{Output:} & \text{Approximate posterior samples } \{\u_{\star}^{(j)}\}_{j=1}^{\nens} & \\ \hline \textbf{1:} & \text{Sample prior: } & \u^{(j)} \overset{\text{iid}}{\sim} \priorDens, \ j = 1, \dots, \nens \\ \textbf{2:} & \text{Sample likelihood: } & \y^{(j)} \Def \fwd(\u^{(j)}) + \noise^{(j)}, \ \noise^{(j)} \overset{\text{iid}}{\sim} \Gaussian(0, \covNoise) \\ \textbf{3:} & \text{Estimate moments: } & \meanPrior, \covPrior, \meanObs, \covObs, \covCross \\ \textbf{4:} & \text{Transport samples: } & \u_{\star}^{(j)} \Def \u^{(j)} + \covCross [\covObs]^{-1}(\yobs - \y^{(j)}) \end{array} \]
The definitions of the empirical moments are given in Equation 20, along with the cross-covariance estimate displayed just below it.
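Here is a minimal numpy implementation of the single-step EKI algorithm, again using the placeholder prior, forward model, and data from the earlier sketches; nothing about these specific choices is essential.

```python
import numpy as np

rng = np.random.default_rng(3)

def single_step_eki(y_obs, fwd, prior_sample, noise_cov, n_ens):
    """Single-step EKI: sample the joint model, then apply the EnKF/Matheron transport."""
    d_obs = noise_cov.shape[0]
    # Steps 1-2: sample prior and likelihood to obtain (u^(j), y^(j)) from the joint model.
    U = np.array([prior_sample() for _ in range(n_ens)])
    Y = np.array([fwd(u) for u in U]) + rng.multivariate_normal(np.zeros(d_obs), noise_cov, n_ens)
    # Step 3: empirical moments (Equation 20 and the cross-covariance estimate).
    dU, dY = U - U.mean(axis=0), Y - Y.mean(axis=0)
    C_y = dY.T @ dY / (n_ens - 1)
    C_uy = dU.T @ dY / (n_ens - 1)
    # Step 4: transport each ensemble member, u_star^(j) = u^(j) + K (y_obs - y^(j)).
    gain = np.linalg.solve(C_y, C_uy.T).T
    return U + (y_obs - Y) @ gain.T

# Placeholder inverse problem: standard Gaussian prior on R^2, nonlinear G: R^2 -> R^3.
fwd = lambda u: np.array([u[0] ** 2, u[0] * u[1], np.sin(u[1])])
prior_sample = lambda: rng.normal(size=2)
noise_cov = 0.1 * np.eye(3)
y_obs = np.array([1.0, 0.5, 0.2])

U_post = single_step_eki(y_obs, fwd, prior_sample, noise_cov, n_ens=500)
print(U_post.mean(axis=0))
print(np.cov(U_post, rowvar=False))
```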
3 Generalizing to Iterative Algorithms
In the previous section, we developed an algorithm for approximate posterior sampling that essentially represents a single application of the EnKF update formula. Since this update stems from a Gaussian ansatz, we would expect its performance to degrade for highly nonlinear \(\fwd\) or non-Gaussian \(\priorDens\). One approach to deal with this issue is to break the problem into a sequence of easier problems. In particular, we will consider breaking the prior-to-posterior map \(\priorDens \mapsto \postDens\) into a composition of maps \[ \priorDens \mapsto \dens_1 \mapsto \cdots \mapsto \dens_K = \postDens. \tag{21}\]
We will now apply the EnKF update \(K\) times in order to track these intermediate distributions, and hopefully end up with a better approximation of the posterior. (We reuse the symbol \(K\) here for the number of steps; context will distinguish it from the Kalman gain in Equation 18.) The intuition is that the intermediate maps \(\dens_k \mapsto \dens_{k+1}\) represent comparatively small changes, and small changes may be more amenable to linear-Gaussian approximation.
3.1 Interpolating Between Probability Densities
There are many ways to construct the sequence Equation 21; i.e., to interpolate between two probability distributions \(\priorDens\) and \(\postDens\). We consider the following basic tempering schedule, starting with a generic result for two arbitrary densities, and then specializing to our particular setting.
Let \(\priorDens(\u)\) and \(\postDens(\u) = \frac{1}{\normCst}\tilde{\dens}(\u)\) be two probability densities on \(\parSpace\). For a positive integer \(K\), define the sequence of densities \(\dens_0, \dens_1, \dots, \dens_K\) recursively by
\[ \begin{align} \dens_{k+1}(\u) &:= \frac{1}{\normCst_{k+1}}\dens_k(\u)L(\u)^{1/K}, &&\normCst_{k+1} := \int \dens_k(\u)L(\u)^{1/K} d\u, \end{align} \tag{22}\]
for \(k = 0, \dots, K-1\), where \[ L(\u) \Def \frac{\tilde{\dens}(\u)}{\priorDens(\u)}. \tag{23}\] Then the final density satisfies \(\dens_K = \postDens\).
To start, note that the density ratio Equation 23 can be written as \[ L(\u) = \frac{\tilde{\dens}(\u)}{\priorDens(\u)} = \frac{Z \postDens(\u)}{\priorDens(\u)}. \]
We use this fact and the recursion Equation 22 to obtain \[ \begin{align} \normCst_K &= \int \dens_{K-1}(\u)L(\u)^{1/K} d\u \\ &= \int \dens_{0}(\u)L(\u)^{1/K} \prod_{k=1}^{K-1} \frac{L(\u)^{1/K}}{\normCst_k} d\u \\ &= \frac{1}{\normCst_1 \cdots \normCst_{K-1}} \int \priorDens(\u)L(\u) d\u \\ &= \frac{1}{\normCst_1 \cdots \normCst_{K-1}} \int \priorDens(\u)\frac{\normCst \postDens(\u)}{\priorDens(\u)} d\u \\ &= \frac{\normCst}{\normCst_1 \cdots \normCst_{K-1}} \int \postDens(\u) d\u \\ &= \frac{\normCst}{\normCst_1 \cdots \normCst_{K-1}}. \end{align} \]
The density recursion similarly yields \[ \begin{align} \dens_K(\u) &= \frac{1}{\normCst_{K}}\dens_{K-1}(\u)L(\u)^{1/K} \\ &= \frac{\normCst_1 \cdots \normCst_{K-1}}{\normCst} \priorDens(\u)\frac{L(\u)}{\normCst_1 \cdots \normCst_{K-1}} \\ &= \frac{1}{\normCst} \priorDens(\u) L(\u) \\ &= \postDens(\u), \end{align} \] where we have plugged in the expressions for \(\normCst_K\) and \(L(\u)\) derived above. □
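The identity can also be verified numerically. The sketch below discretizes a one-dimensional example on a grid, applies \(K\) steps of the recursion in Equation 22 to a standard Gaussian \(\priorDens\) and a placeholder unnormalized target \(\tilde{\dens}\), and checks that the final density matches the normalized target.

```python
import numpy as np

# Grid-based check of the tempering recursion (Equation 22) in one dimension.
u = np.linspace(-10.0, 10.0, 4001)
du = u[1] - u[0]

prior = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)     # pi_0: standard Gaussian density
target_unnorm = np.exp(-0.25 * (u - 1.0) ** 4)       # placeholder unnormalized target tilde{pi}
target = target_unnorm / (target_unnorm.sum() * du)  # normalized target pi

K = 10
L = target_unnorm / prior                            # density ratio L(u) = tilde{pi}(u) / pi_0(u)
dens = prior.copy()
for _ in range(K):
    dens = dens * L ** (1.0 / K)                     # pi_{k+1} proportional to pi_k * L^{1/K}
    dens = dens / (dens.sum() * du)                  # divide by Z_{k+1} (Riemann-sum approximation)

print(np.max(np.abs(dens - target)))                 # ~ 0 up to grid error, so pi_K = pi
```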
The above result constructs a sequence between two arbitrary densities \(\priorDens\) and \(\postDens\). If we choose these to be the prior and posterior distributions, then we obtain a prior-to-posterior map as a corollary.
Consider a Bayesian joint distribution \(p(\u, \y) = \priorDens(\u)p(\y \given \u)\) with posterior \(\postDens(\u) \Def \frac{1}{\normCst} \priorDens(\u) p(\yobs \given \u)\), where \(\normCst = p(\yobs)\). Then the sequence \(\dens_0, \dens_1, \dots, \dens_K\) defined by \[ \begin{align} \dens_{k+1}(\u) &:= \frac{1}{\normCst_{k+1}}\dens_k(\u)p(\yobs \given \u)^{1/K}, &&\normCst_{k+1} := \int \dens_k(\u)p(\yobs \given \u)^{1/K} d\u, \end{align} \tag{24}\] for \(k = 0, \dots, K-1\), satisfies \(\dens_{K} = \postDens\).
This is a special case of the result above, with \(\tilde{\dens}(\u) = \priorDens(\u)p(\yobs \given \u)\). The density ratio Equation 23 thus simplifies to \[ L(\u) = \frac{\priorDens(\u)p(\yobs \given \u)}{\priorDens(\u)} = p(\yobs \given \u). \] □
The following corollary specializes the result even further to the particular Bayesian inverse problem Equation 5.
Consider the particular Bayesian joint distribution \(p(\u, \y)\) defined by the model in Equation 5. Then the updates in Equation 24 take the particular form \[ \begin{align} \dens_{k+1}(\u) &:= \frac{1}{\normCst_{k+1}}\dens_k(\u)\Gaussian(\yobs \given \fwd(\u), K\covNoise), &&\normCst_{k+1} := \int \dens_k(\u)\Gaussian(\yobs \given \fwd(\u), K\covNoise) d\u. \end{align} \tag{25}\]
This update can equivalently be written as \[ \begin{align} \dens_{k+1}(\u) &:= \frac{1}{\normCst_{k+1}}\dens_k(\u)\exp\left\{-\frac{1}{K}\misfit(\u) \right\}, &&\normCst_{k+1} := \int \dens_k(\u)\exp\left\{-\frac{1}{K}\misfit(\u) \right\} d\u, \end{align} \tag{26}\] where \(\misfit(\u)\) is defined in Equation 1. In either case, \(\dens_{K} = \postDens\).
Recall that the Gaussian likelihood can be written as \[ \begin{align} &\Gaussian(\yobs \given \fwd(\u), \covNoise) = \text{det}(2\pi\covNoise)^{-1/2} \exp\left\{-\misfit(\u) \right\}, &&\misfit(\u) = \frac{1}{2} \lVert \yobs - \fwd(\u)\rVert^2_{\covNoise}. \end{align} \]
Raising this density to the power \(1/K\) thus gives \[ \begin{align} \Gaussian(\yobs \given \fwd(\u), \covNoise)^{1/K} &= \text{det}(2\pi\covNoise)^{-1/(2K)} \exp\left\{-\frac{1}{K}\misfit(\u) \right\} \\ &= \text{det}(2\pi\covNoise)^{-1/(2K)} \exp\left\{-\frac{1}{2} \lVert \yobs - \fwd(\u)\rVert^2_{K\covNoise} \right\} \\ &\propto \Gaussian(\yobs \given \fwd(\u), K \covNoise), \end{align} \] where the omitted proportionality constant is independent of \(\u\) and is therefore absorbed into the normalizing constants. The updates in Equation 25 and Equation 26 thus follow, differing only in whether this constant factor is grouped with the normalizing constant \(\normCst_{k+1}\) or with the likelihood term. □
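Equation 25 also hints at how the single-step algorithm might be iterated: each tempering step looks like a Bayesian update with the inflated noise covariance \(K\covNoise\), so one natural idea is to apply the EnKF update \(K\) times with \(\covNoise\) replaced by \(K\covNoise\). The sketch below illustrates this idea on the placeholder problem from earlier; it is meant only as a preview of the iterative algorithms developed in later posts, not a definitive implementation, since each intermediate step again relies on a Gaussian approximation.

```python
import numpy as np

rng = np.random.default_rng(4)

def enkf_update(U, y_obs, fwd, noise_cov):
    """One EnKF update of the ensemble U against data y_obs with noise covariance noise_cov."""
    n_ens, d_obs = U.shape[0], noise_cov.shape[0]
    Y = np.array([fwd(u) for u in U]) + rng.multivariate_normal(np.zeros(d_obs), noise_cov, n_ens)
    dU, dY = U - U.mean(axis=0), Y - Y.mean(axis=0)
    C_y = dY.T @ dY / (n_ens - 1)
    C_uy = dU.T @ dY / (n_ens - 1)
    gain = np.linalg.solve(C_y, C_uy.T).T
    return U + (y_obs - Y) @ gain.T

def tempered_eki(y_obs, fwd, prior_sample, noise_cov, n_ens, n_steps):
    """Apply the EnKF update n_steps times with inflated noise covariance n_steps * Sigma (Equation 25)."""
    U = np.array([prior_sample() for _ in range(n_ens)])
    for _ in range(n_steps):
        U = enkf_update(U, y_obs, fwd, n_steps * noise_cov)
    return U

# Placeholder inverse problem, matching the earlier sketches.
fwd = lambda u: np.array([u[0] ** 2, u[0] * u[1], np.sin(u[1])])
prior_sample = lambda: rng.normal(size=2)
noise_cov = 0.1 * np.eye(3)
y_obs = np.array([1.0, 0.5, 0.2])

U_post = tempered_eki(y_obs, fwd, prior_sample, noise_cov, n_ens=500, n_steps=10)
print(U_post.mean(axis=0))
print(np.cov(U_post, rowvar=False))
```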