\[ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\Pr}{\mathbb{P}} \newcommand{\given}{\mid} \newcommand{\Def}{:=} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Gaussian}{\mathcal{N}} \newcommand{\fwd}{\mathcal{G}} \newcommand{\u}{u} \newcommand{\yobs}{y^{\dagger}} \newcommand{\y}{y} \newcommand{\noise}{\epsilon} \newcommand{\covNoise}{\Sigma} \newcommand{\meanVec}{m} \newcommand{\covMat}{C} \newcommand{\dimObs}{n} \newcommand{\dimPar}{d} \newcommand{\parSpace}{\mathcal{U}} \newcommand{\misfit}{\Phi} \newcommand{\misfitReg}{\Phi_R} \newcommand{\misfitPost}{\Phi_{\pi}} \newcommand{\covPrior}{\covMat} \newcommand{\meanPrior}{\meanVec} \newcommand{\dens}{\pi} \newcommand{\priorDens}{\pi_0} \newcommand{\postDens}{\pi} \newcommand{\normCst}{Z} \newcommand{\joint}{\overline{\pi}} \newcommand{\meanObs}{\meanVec^{\y}} \newcommand{\covObs}{\covMat^{\y}} \newcommand{\covCross}{\covMat^{\u \y}} \newcommand{\tcovCross}{\covMat^{\y \u}} \newcommand{\GaussProj}{\mathcal{P}_{\Gaussian}} \newcommand{\meanPost}{\meanVec_{\star}} \newcommand{\covPost}{\covMat_{\star}} \newcommand{\transport}{T} \newcommand{\nens}{J} \]
The ensemble Kalman filter (EnKF) is a well-established algorithm for state estimation in high-dimensional state space models. More recently, it has gained popularity as a general-purpose derivative-free tool for optimization and approximate posterior sampling; i.e., for the solution of inverse problems. The label ensemble Kalman inversion (EKI) is generally used to refer to the class of algorithms that adapt the EnKF methodology for such purposes. While these algorithms are typically quite simple – mostly relying on slight modifications of the standard EnKF update formula – there are quite a few subtleties required in designing and analyzing EKI methods. In particular, while much of the EKI literature is focused on optimization, small modifications of optimization-focused algorithms can be made to instead target the goal of posterior sampling. In a series of posts, we will walk through these subtleties, exploring the potential of the EnKF both as a derivative-free (approximate) optimizer and sampler. We start in this post by outlining the basic setup and goals, and then proceed to introduce a basic EnKF algorithm for approximate posterior sampling.
1 Setup: Inverse Problems
This section serves to introduce the notation that will be used throughout the entire series of posts. Our focus will be on inverse problems, with the goal being to recover a latent parameter \(\u \in \R^{\dimPar}\) from indirect, and potentially noisy, observations \(\yobs \in \R^{\dimObs}\). We assume the parameter and data are related via a forward model (i.e., parameter-to-observable map) \(\fwd: \R^{\dimPar} \to \R^{\dimObs}\), giving the relationship \(\yobs \approx \fwd(\u)\).
1.1 Optimization
We start by formulating the solution to the inverse problem as an optimization problem. One of the most basic approaches we might take is to seek the value of the parameter that minimizes the quadratic error between the data and the model prediction.
Define the least squares model-data misfit function \[ \misfit(u) := \frac{1}{2}\lVert \yobs - \fwd(\u)\rVert^2_{\covNoise} := \frac{1}{2}(\yobs - \fwd(\u))^\top \covNoise^{-1}(\yobs - \fwd(\u)), \tag{1}\] weighted by a positive definite matrix \(\covNoise\). The (nonlinear) least squares minimization problem is then given by \[ u_{\star} \in \text{argmin}_{u \in \parSpace} \ \misfit(\u). \tag{2}\]
The above definition also serves to define the notation we will be using throughout this series to denote weighted Euclidean norms. Note also that \(\misfit(u)\) depends on the observed data \(\yobs\), but we suppress this dependence in the notation. A natural extension to the least-squares problem is to add a regularization term to the objective function. We will focus on quadratic regularization terms, which is referred to as Tikhonov regularization and ridge regression in the inverse problems and statistical literatures, respectively.
Define the Tikhonov-regularized least squares function by \[ \misfitReg(\u) := \frac{1}{2}\lVert \yobs - \fwd(\u)\rVert^2_{\covNoise} + \frac{1}{2}\lVert \u - \meanPrior\rVert^2_{\covPrior}. \tag{3}\] The Tikhonov-regularized least squares optimization problem is given by \[ u_{\star} \in \text{argmin}_{u \in \parSpace} \misfitReg(\u). \tag{4}\]
The Tikhonov loss function balances the model fit to the data with the requirement to keep \(\u\) “close” to \(\meanPrior\), where the relative weights of these objectives are determined by the (positive definite) covariances \(\covNoise\) and \(\covPrior\).
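To keep the notation concrete, here is a minimal NumPy sketch of the two objective functions. The forward model `G`, data `y_obs`, and covariance matrices are placeholder inputs of my own naming; the weighted norms are evaluated with linear solves rather than explicit matrix inverses.

```python
import numpy as np

def misfit(u, y_obs, G, Sigma):
    """Least-squares misfit Phi(u) = 0.5 * ||y_obs - G(u)||^2_Sigma (Equation 1)."""
    r = y_obs - G(u)                           # residual in data space
    return 0.5 * float(r @ np.linalg.solve(Sigma, r))

def misfit_reg(u, y_obs, G, Sigma, m, C):
    """Tikhonov-regularized misfit Phi_R(u) (Equation 3)."""
    d = u - m                                  # deviation from the prior mean m
    return misfit(u, y_obs, G, Sigma) + 0.5 * float(d @ np.linalg.solve(C, d))
```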
1.2 Sampling
We next consider the Bayesian formulation of the inverse problem, whereby the goal is no longer to identify a single value \(\u_{\star}\), but instead to construct a probability distribution over all possible \(\u\). The Bayesian approach requires the definition of a joint distribution over the data and the parameter, \((\u,\y)\). We view the observed data \(\yobs\) as a particular realization of the random variable \(\y\). The solution of the Bayesian inverse problem is given by the conditional distribution \(\u \given [\y=\yobs]\), known as the posterior distribution. We will often shorten this notation by writing \(\u \given \yobs\).
Throughout this series, we will primarily focus on the joint distribution on \((\u, \y)\) induced by the following model: \[ \begin{align} \y &= \fwd(\u) + \noise \newline \u &\sim \priorDens \newline \noise &\sim \Gaussian(0, \covNoise), \end{align} \tag{5}\] where \(\priorDens\) is a prior distribution on the parameter, \(\covNoise\) is the fixed (known) covariance of the additive Gaussian noise, and \(\u\) and \(\noise\) are independent. The EnKF methodology we will discuss is particularly well-suited to such additive Gaussian models with known noise covariance, but there has been work on relaxing these restrictions. The above model defines a joint distribution \(\joint\) on \((\u,\y)\) via the product of densities \[ p(\u,\y) := p(\y \given \u) \priorDens(\u) = \Gaussian(\y \given \fwd(\u), \covNoise)\priorDens(\u), \tag{6}\] with the posterior density given by Bayes’ theorem \[ \begin{align} \postDens(\u) &:= p(\u \given \yobs) = \frac{1}{\normCst}\Gaussian(\yobs \given \fwd(\u), \covNoise)\priorDens(\u), &&\normCst := \int_{\parSpace} \Gaussian(\yobs \given \fwd(\u), \covNoise)\priorDens(\u) d\u. \end{align} \tag{7}\] We omit the dependence on \(\yobs\) in the notation \(\postDens(\u)\) and \(Z\).
We seek to draw samples from the posterior distribution \(\u \given \yobs\) under the model Equation 5. We can phrase this as the task of sampling the probability distribution with density \[ \postDens(\u) \propto \exp\left\{-\misfitPost(\u)\right\}, \tag{8}\] where \[ \misfitPost(\u) := -\log p(\yobs \given \u) - \log \priorDens(\u) = \frac{1}{2}\lVert \yobs - \fwd(\u)\rVert^2_{\covNoise} - \log \priorDens(\u) + C, \tag{9}\] is the negative log posterior density, up to an additive constant \(C\) that is independent of \(\u\).
We introduce the notation \(\misfitPost(\u)\) in order to draw a connection with the optimization goals. Indeed, note that the log-likelihood term in Equation 9 is precisely the least squares misfit function Equation 1 (up to an additive constant). Moreover, if we choose Gaussian prior \(\priorDens(\u) = \Gaussian(\u \given \meanPrior, \covPrior)\), then \(\misfitPost(\u)\) agrees with \(\misfitReg(\u)\) (again, up to an additive constant). We will explore certain algorithms that assume the prior is Gaussian, but in general we will allow \(\priorDens\) to be non-Gaussian.
1.3 Roadmap
With the setup and goals established, we will now take steps towards practical algorithms. It is important to recognize that the application of EnKF methodology to the optimization and sampling problems will yield approximate algorithms in general. The methods will be exact (in a manner which will be made precise) in the linear Gaussian setting, where the forward model \(\fwd\) is linear and the prior \(\priorDens\) is Gaussian. The EKI algorithms we consider are derivative-free, suitable for the black-box setting whereby we can only evaluate \(\fwd(\cdot)\) pointwise. In typical applications, function evaluations \(\fwd(\cdot)\) may be quite computationally expensive; e.g., they might require numerically solving partial differential equations. Another benefit of the EnKF methodology is that it allows many model evaluations to be performed in parallel. These features will become clear as we dive into the methods. We start in this post by focusing on the sampling problem; the optimization setting will be explored in future posts.
2 Joint Gaussian Approximation
We start by introducing the notion of approximate conditioning using Gaussian distributions, a core idea underlying many Kalman methods. A slight extension of this notion yields the classic EnKF update.
2.1 Gaussian Projection
Recall that we write \(\joint\) to denote the joint distribution of \((\u,\y)\) under Equation 5. In general, this distribution will be non-Gaussian, rendering the conditioning \(u \given \yobs\) a challenging task. A simple idea is to consider approximating \(\joint\) with a Gaussian, which is a distribution for which conditioning is easy. To this end, consider the approximation \[ \begin{align} \GaussProj\joint \Def \Gaussian\left( \begin{bmatrix} \meanPrior \newline \meanObs \end{bmatrix}, \begin{bmatrix} \covPrior & \covCross \newline \tcovCross & \covObs \end{bmatrix} \right) \end{align} \tag{10}\] where the means and covariances are simply given by the first two moments of \((\u, \y)\). In particular, \[ \begin{align} &\meanPrior \Def \E[\u], &&\covPrior \Def \Cov[\u] \end{align} \tag{11}\]
are the moments of the \(\u\)-marginal, and \[ \begin{align} &\meanObs \Def \E[\y], &&\covObs \Def \Cov[\y] \end{align} \tag{12}\]
are the moments of the \(\y\)-marginal. Finally, \[ \begin{align} &\covCross \Def \Cov[\u,\y], &&\tcovCross \Def [\covCross]^\top \end{align} \tag{13}\]
is the cross-covariance between \(\u\) and \(\y\). Following (Calvello, Reich, and Stuart 2024), we refer to \(\GaussProj\joint\) as the Gaussian “projection” of \(\joint\). This terminology is motivated by the fact that \(\GaussProj\joint\) can be seen to minimize the Kullback-Leibler divergence \(\text{KL}(\joint \parallel q)\) over the space of Gaussians \(q = \Gaussian(\meanVec^\prime, \covMat^\prime)\) (see (Sanz-Alonso, Stuart, and Taeb 2023), chapter 4 for details).
Having invoked the joint Gaussian approximation in Equation 10, we can now approximate \(\u \given \yobs\) with the corresponding Gaussian conditional. Conditionals of a joint Gaussian are themselves Gaussian, and are thus characterized by the well-known conditional mean and covariance formulas given below.
Let \((\tilde{\u}, \tilde{\y})\) be a random vector with distribution \(\GaussProj\joint\). We consider approximating the posterior \(\u \given (\y=\yobs)\) with \(\tilde{\u} \given (\tilde{\y}=\yobs)\). As it is the conditional of a Gaussian distribution, this posterior approximation is Gaussian, with moments \[ \begin{align} \meanPost &= \meanPrior + \covCross [\covObs]^{-1} (\yobs - \meanObs) \newline \covPost &= \covPrior - \covCross [\covObs]^{-1} \tcovCross. \end{align} \tag{14}\]
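The conditional moment formulas in Equation 14 translate directly into code. The sketch below assumes the moments of the Gaussian projection are available as NumPy arrays; the function and argument names are illustrative.

```python
import numpy as np

def gaussian_condition(m, C, m_y, C_y, C_uy, y_obs):
    """Condition the Gaussian projection on y = y_obs (Equation 14).

    Returns the mean and covariance of u | y = y_obs under the joint
    Gaussian approximation with the supplied moments.
    """
    # The factor C^{uy} [C^y]^{-1}, computed via a linear solve rather than an inverse.
    K = np.linalg.solve(C_y, C_uy.T).T
    m_post = m + K @ (y_obs - m_y)
    C_post = C - K @ C_uy.T
    return m_post, C_post
```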
2.2 Monte Carlo Approximations
With the closed-form approximation Equation 14 in hand, we can simulate approximate posterior draws by constructing \(\meanPost\) and \(\covPost\), then sampling from \(\Gaussian(\meanPost, \covPost)\). Interestingly, we can actually bypass the step of computing conditional moments and directly sample from the Gaussian conditional using a result known as Matheron’s rule.
Let \((\tilde{\u}, \tilde{y})\) be random variables with distribution \(\GaussProj\joint\). Then the following equality holds in distribution. \[ \begin{align} (\tilde{\u} \given [\tilde{\y} = \yobs]) &\overset{d}{=} \tilde{\u} + \covCross [\covObs]^{-1} (\yobs - \tilde{\y}). \end{align} \tag{15}\]
This implies that independent samples from \(\tilde{\u} \given [\tilde{\y} = \yobs]\) can be simulated via the following algorithm.
- Sample \((\u^\prime, \y^\prime) \sim \GaussProj\joint\)
- Return \(\transport(\u^\prime, \y^\prime)\)
where \[ \transport(\u^\prime, \y^\prime) \Def \u^\prime + \covCross [\covObs]^{-1} (\yobs - \y^\prime) \tag{16}\]
The distribution of the lefthand side of Equation 15 is given in Equation 14. Notice that the righthand side is a linear function of the Gaussian random vector \((\tilde{\u}, \tilde{\y})\), and is thus Gaussian. It remains to verify that the mean and covariance of the righthand side agree with Equation 14. The mean is given by \[ \begin{align} \E[\transport(\tilde{\u}, \tilde{\y})] &= \E[\tilde{\u}] + \covCross [\covObs]^{-1} (\yobs - \E[\tilde{\y}]) \newline &= \meanPrior + \covCross [\covObs]^{-1} (\yobs - \meanObs) \newline &= \meanPost. \end{align} \] Similarly, writing \(K \Def \covCross [\covObs]^{-1}\) and noting that \(\yobs\) is deterministic, the covariance is \[ \begin{align} \Cov[\transport(\tilde{\u}, \tilde{\y})] &= \Cov[\tilde{\u} - K\tilde{\y}] \newline &= \Cov[\tilde{\u}] - \Cov[\tilde{\u},\tilde{\y}]K^\top - K\Cov[\tilde{\y},\tilde{\u}] + K\Cov[\tilde{\y}]K^\top \newline &= \covPrior - \covCross[\covObs]^{-1}\tcovCross - \covCross[\covObs]^{-1}\tcovCross + \covCross[\covObs]^{-1}\covObs[\covObs]^{-1}\tcovcross \newline &= \covPrior - \covCross[\covObs]^{-1}\tcovCross = \covPost. \end{align} \] □
The map \(\transport(\cdot, \cdot)\) is a deterministic function that transports samples from the joint Gaussian to its conditional distribution. Note that this map depends on the first two moments of \(\GaussProj\joint\).
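As a concrete illustration, the following sketch implements Matheron's rule for a Gaussian projection with known moments. The helper name and argument layout are my own; it simply stacks the moments into a joint Gaussian, samples it, and applies the transport map of Equation 16 to each draw.

```python
import numpy as np

rng = np.random.default_rng(0)

def matheron_sample(m, C, m_y, C_y, C_uy, y_obs, J):
    """Draw J iid samples from the conditional of the Gaussian projection
    via Matheron's rule (Equations 15-16)."""
    d = C.shape[0]
    mean = np.concatenate([m, m_y])
    cov = np.block([[C, C_uy], [C_uy.T, C_y]])
    joint = rng.multivariate_normal(mean, cov, size=J)   # rows are samples (u', y')
    U, Y = joint[:, :d], joint[:, d:]
    K = np.linalg.solve(C_y, C_uy.T).T                   # C^{uy} [C^y]^{-1}
    return U + (y_obs - Y) @ K.T                         # T(u', y') applied row-wise
```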
We next consider a slight adjustment to the Matheron update, which results in (potentially) non-Gaussian approximate posterior samples. This yields the classical EnKF update equation.
An alternative Monte Carlo posterior approximation can be obtained by modifying the above sampling strategy as follows:
- Sample \((u^\prime, \y^\prime) \sim \joint\)
- Return \(\transport(\u^\prime, \y^\prime)\)
Here, \(\transport(\cdot, \cdot)\) is the same transport map as defined in Equation 16. Sampling from the joint distribution in step one above entails sampling a parameter from the prior, then sampling from the likelihood: \[ \begin{align} &\u^\prime \sim \priorDens \newline &\y^\prime \Def \fwd(\u^\prime) + \noise^\prime, &&\noise^\prime \sim \Gaussian(0, \covNoise) \end{align} \tag{17}\]
Note that the difference between the two above algorithms is that the former samples \((\u^\prime, \y^\prime)\) from the Gaussian projection \(\GaussProj\joint\), while the latter samples from the true joint distribution \(\joint\). In both cases, the form of the transport map \(\transport\) is derived from the Gaussian approximation \(\GaussProj\joint\). The EnKF update thus combines exact Monte Carlo sampling from the joint distribution with approximate conditioning motivated by a Gaussian ansatz. Since the samples \((\u^\prime, \y^\prime)\) are no longer Gaussian in general, the approximate conditional samples \(\transport(\u^\prime, \y^\prime)\) can also be non-Gaussian. One might hope that this additional flexibility leads to an improved approximation. We conclude this section by defining the Kalman gain, which forms the core of the Matheron transport map \(\transport\).
The Kalman gain associated with the inverse problem in Equation 5 is defined as \[ K := \covCross [\covObs]^{-1}. \tag{18}\] The transport map in Equation 16 can thus be written as \[ \transport(\u^\prime, \y^\prime) = \u^\prime + K(\yobs - \y^\prime). \tag{19}\]
We thus see that the transport map takes the prior sample \(\u^\prime\) and adds a “correction” term based on the data. The correction is the linear map \(K\) applied to the residual \(\yobs - \y^\prime\) (i.e., the difference between the observed and predicted data).
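Numerically, it is preferable not to form \([\covObs]^{-1}\) explicitly. Below is a hedged sketch of this design choice, computing the Kalman gain through a Cholesky solve (SciPy is assumed to be available) and applying the correction of Equation 19.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def kalman_gain(C_uy, C_y):
    """Kalman gain K = C^{uy} [C^y]^{-1} (Equation 18).

    Solves C^y X = [C^{uy}]^T via a Cholesky factorization of the SPD
    matrix C^y, then transposes, avoiding an explicit matrix inverse.
    """
    chol = cho_factor(C_y, lower=True)
    return cho_solve(chol, C_uy.T).T

def transport(u, y, y_obs, K):
    """Transport map T(u', y') = u' + K (y_obs - y') (Equation 19)."""
    return u + K @ (y_obs - y)
```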
2.3 Practical Algorithms
The methods presented above do not yet constitute algorithms, as we have not specified how to compute the moments defining the Gaussian projection Equation 10. By replacing the exact moments with Monte Carlo estimates, we obtain an algorithm which we will refer to as single-step ensemble Kalman inversion (EKI). Given samples \(\{(\u^{(j)}, \y^{(j)})\}_{j=1}^{\nens} \overset{\text{iid}}{\sim} \joint\), we can estimate the required moments via the standard Monte Carlo estimators: \[ \begin{align} &\meanPrior \Def \frac{1}{\nens} \sum_{j=1}^{\nens} \u^{(j)}, &&\covPrior \Def \frac{1}{\nens-1} \sum_{j=1}^{\nens} (\u^{(j)}-\meanPrior)(\u^{(j)}-\meanPrior)^\top \newline &\meanObs \Def \frac{1}{\nens} \sum_{j=1}^{\nens} \y^{(j)}, &&\covObs \Def \frac{1}{\nens-1} \sum_{j=1}^{\nens} (\y^{(j)}-\meanObs)(\y^{(j)}-\meanObs)^\top \newline &\covCross \Def \frac{1}{\nens-1} \sum_{j=1}^{\nens} (\u^{(j)}-\meanPrior)(\y^{(j)}-\meanObs)^\top. \end{align} \tag{20}\]
Note that we utilize the same notation for the exact moments and their empirical estimates; the precise meaning of the notation will be made clear from context.
Algorithm: Single Step EKI \[ \begin{array}{lll} \textbf{Input:} & \text{Sample size } \nens & \\ \textbf{Output:} & \text{Approximate posterior samples } \{\u_{\star}^{(j)}\}_{j=1}^{\nens} & \\ \hline \textbf{1:} & \text{Sample prior: } & \u^{(j)} \overset{\text{iid}}{\sim} \priorDens, \ j = 1, \dots, \nens \\ \textbf{2:} & \text{Sample likelihood: } & \y^{(j)} \Def \fwd(\u^{(j)}) + \noise^{(j)}, \ \noise^{(j)} \overset{\text{iid}}{\sim} \Gaussian(0, \covNoise) \\ \textbf{3:} & \text{Estimate moments: } & \meanPrior, \covPrior, \meanObs, \covObs, \covCross \\ \textbf{4:} & \text{Transport samples: } & \u_{\star}^{(j)} \Def \u^{(j)} + \covCross [\covObs]^{-1}(\yobs - \y^{(j)}), \ j = 1, \dots, \nens \end{array} \]
The definition of the empirical moments is given in Equation 20.
A few additional remarks on this algorithm:
- If the prior \(\priorDens\) is Gaussian with known moments, there is no need to estimate \(\meanPrior\) and \(\covPrior\); they can be set to their exact values.
- The sampling error in the estimate of \(\covObs\) can be reduced by exploiting the additive noise structure of Equation 5: estimate only \(\Cov[\fwd(\u)]\) empirically and add \(\covNoise\) exactly. This construction is spelled out in the moment-estimation discussion later in this post.
- A purely Gaussian variant of the algorithm can be defined by returning the estimated conditional moments in Equation 14, i.e., the approximation \(\Gaussian(\meanPost, \covPost)\), instead of transported samples.
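To tie the pieces together, here is a compact sketch of the single-step EKI algorithm above. The helper `sample_prior` is a hypothetical stand-in for whatever routine draws from \(\priorDens\); everything else follows Equations 16, 17, and 20.

```python
import numpy as np

def single_step_eki(sample_prior, G, Sigma, y_obs, J, rng=None):
    """One-shot EKI / EnKF update for approximate posterior sampling.

    `sample_prior(J, rng)` is assumed to return a (J, d) array of prior draws;
    G maps R^d -> R^n; Sigma is the (n, n) noise covariance.
    """
    rng = np.random.default_rng() if rng is None else rng
    U = sample_prior(J, rng)                               # (J, d) prior ensemble
    noise = rng.multivariate_normal(np.zeros(len(y_obs)), Sigma, size=J)
    Y = np.array([G(u) for u in U]) + noise                # (J, n) simulated data

    # Empirical moments (Equation 20).
    dU = U - U.mean(axis=0)
    dY = Y - Y.mean(axis=0)
    C_y = dY.T @ dY / (J - 1)
    C_uy = dU.T @ dY / (J - 1)

    K = np.linalg.solve(C_y, C_uy.T).T                     # Kalman gain (Equation 18)
    return U + (y_obs - Y) @ K.T                           # transported ensemble
```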
3 Generalizing to Iterative Algorithms
In the previous section, we developed an algorithm for approximate posterior sampling that essentially represents a single application of the EnKF update formula. Since this update stems from a Gaussian ansatz, we would expect its performance to degrade for highly nonlinear \(\fwd\) or non-Gaussian \(\priorDens\). One approach to deal with this issue is to break the problem into a sequence of easier problems. In particular, we will consider breaking the prior-to-posterior map \(\priorDens \mapsto \postDens\) into a composition of maps \[ \priorDens \mapsto \dens_1 \mapsto \cdots \mapsto \dens_K = \postDens. \tag{21}\]
We will now apply the EnKF update \(K\) times in order to track these intermediate distributions, and hopefully end up with a better approximation of the posterior. The intuition here is that the intermediate maps \(\dens_k \mapsto \dens_{k+1}\) should represent relatively smaller changes, and small changes may be more amenable to linear-Gaussian approximation.
3.1 Interpolating between probability densities
There are many ways to construct the sequence Equation 21; i.e., to interpolate between two probability distributions \(\priorDens\) and \(\postDens\). We will consider a basic likelihood-tempering schedule, starting with a generic result for two arbitrary densities and then specializing to our particular setting.
Let \(\priorDens(\u)\) and \(\postDens(\u) = \frac{1}{\normCst}\tilde{\dens}(\u)\) be two probability densities on \(\parSpace\). For a positive integer \(K\), define the sequence of densities \(\dens_0, \dens_1, \dots, \dens_K\) recursively by
\[ \begin{align} \dens_{k+1}(\u) &:= \frac{1}{\normCst_{k+1}}\dens_k(\u)L(\u)^{1/K}, &&\normCst_{k+1} := \int \dens_k(\u)L(\u)^{1/K} d\u, \end{align} \tag{22}\]
for \(k = 0, \dots, K-1\), where \[ L(\u) \Def \frac{\tilde{\dens}(\u)}{\priorDens(\u)}. \tag{23}\] Then the final density satisfies \(\dens_K = \postDens\).
To start, note that the density ratio Equation 23 can be written as \[ L(\u) = \frac{\tilde{\dens}(\u)}{\priorDens(\u)} = \frac{Z \postDens(\u)}{\priorDens(\u)}. \]
We use this fact, and the recursion Equation 22 to obtain \[ \begin{align} \normCst_K &= \int \dens_{K-1}(\u)L(\u)^{1/K} d\u \\ &= \int \dens_{0}(\u)L(\u)^{1/K} \prod_{k=1}^{K-1} \frac{L(\u)^{1/K}}{Z_k} d\u \\ &= \frac{1}{\normCst_1 \cdots \normCst_{K-1}} \int \priorDens(\u)L(\u) d\u \\ &= \frac{1}{\normCst_1 \cdots \normCst_{K-1}} \int \priorDens(\u)\frac{\normCst \postDens(\u)}{\priorDens(\u)} d\u \\ &= \frac{\normCst}{\normCst_1 \cdots \normCst_{K-1}} \int \postDens(\u) d\u \\ &= \frac{\normCst}{\normCst_1 \cdots \normCst_{K-1}}. \end{align} \]
The density recursion similarly yields \[ \begin{align} \dens_K(\u) &= \frac{1}{\normCst_{K}}\dens_{K-1}(\u)L(\u)^{1/K} \\ &= \frac{\normCst_1 \cdots \normCst_{K-1}}{\normCst} \priorDens(\u)\frac{L(\u)}{\normCst_1 \cdots \normCst_{K-1}} \\ &= \frac{1}{\normCst} \priorDens(\u) L(\u) \\ &= \postDens(\u), \end{align} \] where we have plugged in the expressions for \(\normCst_K\) and \(L(\u)\) derived above. □
The above result constructs a sequence between two arbitrary densities \(\priorDens\) and \(\postDens\). If we choose these to be the prior and posterior distributions, then we obtain a prior-to-posterior map as a corollary.
Consider a Bayesian joint distribution \(p(\u, \y) = \priorDens(\u)p(\y \given \u)\) with posterior \(\postDens(\u) \Def \frac{1}{\normCst} \priorDens(\u) p(\yobs \given \u)\) where \(\normCst = p(\yobs)\). Then the sequence \(\dens_0, \dens_1, \dots, \dens_K\) defined by \[ \begin{align} \dens_{k+1}(\u) &:= \frac{1}{\normCst_{k+1}}\dens_k(\u)p(\yobs \given \u)^{1/K}, &&\normCst_{k+1} := \int \dens_k(\u)p(\yobs \given \u)^{1/K} d\u, \end{align} \tag{24}\] for \(k = 0, \dots, K-1\), satisfies \(\dens_{K} = \postDens\).
This is a special case of Equation 22 where \(\tilde{\dens}(\u) = \priorDens(\u)p(\yobs \given \u)\). Thus, the density ratio simplifies to \[ L(\u) = \frac{\priorDens(\u)p(\yobs \given \u)}{\priorDens(\u)} = p(\yobs \given \u). \] □
We conclude this section with another corollary that specializes the result even further to the particular Bayesian inverse problem Equation 5.
Consider the particular Bayesian joint distribution \(p(\u, \y)\) defined by the model in Equation 5. Then the updates in Equation 24 take the particular form \[ \begin{align} \dens_{k+1}(\u) &:= \frac{1}{\normCst_{k+1}}\dens_k(\u)\Gaussian(\yobs \given \fwd(\u), K\covNoise), &&\normCst_{k+1} := \int \dens_k(\u)\Gaussian(\yobs \given \fwd(\u), K\covNoise) d\u, \end{align} \tag{25}\]
This update can equivalently be written as \[ \begin{align} \dens_{k+1}(\u) &:= \frac{1}{\normCst_{k+1}}\dens_k(\u)\exp\left\{-\frac{1}{K}\misfit(\u) \right\}, &&\normCst_{k+1} := \int \dens_k(\u)\exp\left\{-\frac{1}{K}\misfit(\u) \right\} d\u, \end{align} \tag{26}\] where \(\misfit(\u)\) is defined in Equation 1. In either case, \(\dens_{K} = \postDens\).
Recall the Gaussian density \[ \Gaussian(\yobs \given \fwd(\u), \covNoise) = \text{det}(2\pi\covNoise)^{-1/2} \exp\left\{-\misfit(\u) \right\}, \] where \(\misfit(\u) = \frac{1}{2} \lVert \yobs - \fwd(\u)\rVert^2_{\covNoise}\). Thus, \[ \begin{align} \Gaussian(\yobs \given \fwd(\u), \covNoise)^{1/K} &= \text{det}(2\pi\covNoise)^{-1/(2K)} \exp\left\{-\frac{1}{K}\misfit(\u) \right\} \\ &= \text{det}(2\pi\covNoise)^{-1/(2K)} \exp\left\{-\frac{1}{2} \lVert \yobs - \fwd(\u)\rVert^2_{K\covNoise} \right\} \\ &\propto \Gaussian(\yobs \given \fwd(\u), K\covNoise), \end{align} \] where we have used the fact that \(\lVert x \rVert^2_{K\covNoise} = \frac{1}{K}\lVert x \rVert^2_{\covNoise}\), and the proportionality constant is independent of \(\u\). The two updates Equation 25 and Equation 26 thus follow from Equation 24, since any factor independent of \(\u\) cancels with the normalizing constant \(\normCst_{k+1}\); the two forms differ only in whether we treat this constant as part of the normalizing constant or the likelihood. □
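The tempering identity is easy to check numerically. The short script below compares \(\dens_K\) obtained from the recursion Equation 26 against the posterior computed directly on a one-dimensional grid; the forward model, prior, and noise level are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative 1D setup: these choices of G, prior, and noise level are arbitrary.
G = lambda u: np.sin(u) + 0.5 * u
sigma2, y_obs, K = 0.3**2, 0.7, 10
u = np.linspace(-4.0, 4.0, 2001)
du = u[1] - u[0]

prior = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)      # N(0, 1) prior density on the grid
misfit_grid = 0.5 * (y_obs - G(u))**2 / sigma2        # Phi(u) on the grid

# Exact posterior (Equation 7), normalized numerically on the grid.
post = prior * np.exp(-misfit_grid)
post /= post.sum() * du

# Tempering recursion (Equation 26): multiply by exp(-Phi/K) and renormalize, K times.
dens = prior.copy()
for _ in range(K):
    dens *= np.exp(-misfit_grid / K)
    dens /= dens.sum() * du

print(np.max(np.abs(dens - post)))   # ~0 up to floating-point error
```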
4 Appendix
4.1 Notation
We write \[ \lVert x \rVert^2_A := \langle x, x\rangle_A := x^\top A^{-1}x \] to denote the Euclidean norm weighted by the inverse of a positive definite matrix \(A\).
5 Background: Joint Gaussian Conditioning
5.1 Joint Gaussian Assumption
We now begin by considering approximation of the posterior \(u|y\) by way of a certain Gaussian approximation. In particular, we will assume that \((u,y)\) are jointly Gaussian distributed, in which case standard Gaussian conditioning identities can be applied to yield an approximation of \(u|y\). Given that conditionals of Gaussians are also Gaussian, this approach produces a Gaussian approximation to the posterior \(u|y\). However, borrowing an idea from EnKF methodology, we will consider a slight modification with the ability to produce non-Gaussian approximations. To avoid notational confusion between exact and approximate distributions, we will denote by \((\hat{u}, \hat{y})\) the random variables defining the joint Gaussian approximation. The Gaussian assumption thus takes the form \[ \begin{align} \begin{bmatrix} \hat{u} \newline \hat{y} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \overline{u} \newline \overline{y} \end{bmatrix}, \begin{bmatrix} \hat{C} & \hat{C}^{uy} \newline \hat{C}^{yu} & \hat{C}^{y} \end{bmatrix} \right) \tag{6} \end{align} \] where the moments \(\overline{u}\), \(\overline{y}\), \(\hat{C}\), \(\hat{C}^y\), and \(\hat{C}^{uy}\) defining this approximation are quantities that we must specify. We use the notation \(\hat{C}^{yu} := [\hat{C}^{uy}]^\top\). Note that if the forward model \(\mathcal{G}\) is linear and the prior distribution \(\pi_0\) is Gaussian, then the joint Gaussian approximation (6) is actually exact, and the moments can be computed in closed form. In other words, with the moments properly defined, \((\hat{u},\hat{y}) \overset{d}{=} (u,y)\) and therefore \((\hat{u}|\hat{y}) \overset{d}{=} (u|y)\); that is, the posterior approximation is exact. This special case is typically referred to as the linear Gaussian setting, which I discuss in depth in this post. When \(\mathcal{G}\) is nonlinear and/or \(\pi_0\) is non-Gaussian, then (6) will truly be an approximation and the above equalities will not hold.
In the following subsections, we briefly review some properties of joint Gaussians and their conditional distributions. We work with the joint distribution (6), assuming the means and covariances are known. With the necessary background established, we then discuss practical algorithms for estimating these moments and producing approximations of \(u|y\).
5.2 Gaussian Conditional Moments
Regardless of whether or not we are truly in the linear Gaussian setting, let us suppose that we have constructed the joint distribution (6). Using standard facts about Gaussian distributions, we know the conditional is also Gaussian \[ \hat{u}|[\hat{y}=y] \sim \mathcal{N}(\hat{m}_*, \hat{C}_*), \tag{7} \] with moments given by
\[ \hat{m}_* = \overline{u} + \hat{C}^{uy}[\hat{C}^y]^{-1}(y - \overline{y}) \tag{8} \] \[ \hat{C}_* = \hat{C} - \hat{C}^{uy}[\hat{C}^y]^{-1}\hat{C}^{yu}. \tag{9} \]
5.3 Gaussian Conditional Simulation
As an alternative to explicitly computing the conditional moments (8) and (9), we can consider a Monte Carlo representation of \(\hat{u}|\hat{y}\). The conditional distribution can be directly simulated (without computing (8) and (9)) using the fact \[ (\hat{u}|[\hat{y}=y]) \overset{d}{=} \hat{u} + \hat{C}^{uy}[\hat{C}^y]^{-1}(y-\hat{y}), \tag{10} \] which can be quickly verified by computing the mean and covariance of each side. Note that the randomness in the righthand side is inherited from the random variables \(\hat{u}\) and \(\hat{y}\), while \(y\) here is viewed as a specific realization of the data (and is thus non-random). The result (10), known as Matheron’s Rule, provides the basis for the following algorithm to draw independent samples from the conditional distribution \(\hat{u}|[\hat{y}=y]\).
Gaussian Conditional Simulation via Matheron’s Rule.
Independent samples \(\hat{u}_*^{(1)}, \dots, \hat{u}_*^{(J)}\) can be simulated from the distribution \(\hat{u}|[\hat{y}=y]\) by repeating the following procedure for each \(j=1,\dots,J\):
- Draw independent samples \(\hat{u}^{(j)}\) and \(\hat{y}^{(j)}\) from the marginal distributions of \(\hat{u}\) and \(\hat{y}\), respectively. That is, \[\begin{align} &\hat{u}^{(j)} \sim \mathcal{N}(\overline{u},\hat{C}) &&\hat{y}^{(j)} \sim \mathcal{N}(\overline{y},\hat{C}^y). \tag{11} \end{align}\]
- Return \[ \hat{u}^{(j)}_* := \hat{u}^{(j)} + \hat{C}^{uy}[\hat{C}^y]^{-1}(y-\hat{y}^{(j)}). \tag{12} \]
6 Posterior Approximation Algorithm
Up to this point, we have not discussed the choice of the moments defining the joint Gaussian approximation (6). We now provide these definitions, leading to concrete algorithms for posterior approximation. We adopt a Monte Carlo strategy: sample independently from the true joint distribution of \((u,y)\) defined by (1), i.e., \((u^{(j)}, y^{(j)}) \sim p(u,y)\), compute the empirical means and covariances of these draws, and insert them into the joint Gaussian approximation (6). The subsection below focuses on estimating these moments; we then apply the Gaussian conditioning results above to derive various approximations of \(u|y\).
6.1 Estimating the Gaussian Moments
The first step in our approach requires generating the prior ensemble \[ \{(u^{(j)}, \epsilon^{(j)})\}, \qquad j = 1, \dots, J \tag{13} \] constructed by sampling according to model (1); i.e., \[ \begin{align} &u^{(j)} \sim \pi_0, &&\epsilon^{(j)} \sim \mathcal{N}(0, \Sigma). \tag{14} \end{align} \] We now consider estimating the first two moments of this joint distribution. Starting with the \(u\) marginal, we define the sample estimates \[ \begin{align} \overline{u} &:= \frac{1}{J}\sum_{j=1}^{J} u^{(j)} \tag{15} \newline \hat{C} &:= \frac{1}{J-1}\sum_{j=1}^{J} (u^{(j)}-\overline{u})(u^{(j)}-\overline{u})^\top. \tag{16} \end{align} \] Alternatively, if the prior \(\pi_0\) takes the form of a well-known distribution, then we can simply set \(\overline{u}\) and/or \(\hat{C}\) to the known moments of this distribution. We can likewise consider such mean and covariance estimates for the \(\hat{y}\) portion of (6), defined with respect to the ensemble \[ \{y^{(j)}\}_{j=1}^{J}, \qquad\qquad y^{(j)} := \mathcal{G}(u^{(j)}) + \epsilon^{(j)}. \tag{17} \] However, we can simplify matters a bit by performing part of the calculation analytically, owing to the simple additive Gaussian error structure. Noting that under (1) we have \[ \begin{align} \mathbb{E}[y] &= \mathbb{E}[\mathcal{G}(u) + \epsilon] = \mathbb{E}[\mathcal{G}(u)] \tag{18} \newline \text{Cov}[y] &= \text{Cov}[\mathcal{G}(u) + \epsilon] = \text{Cov}[\mathcal{G}(u)] + \text{Cov}[\epsilon] = \text{Cov}[\mathcal{G}(u)] + \Sigma, \tag{19} \end{align} \] we can focus our efforts on substituting sample-based estimates for the first term in both (18) and (19). Doing so yields \[ \begin{align} \overline{y} &:= \frac{1}{J} \sum_{j=1}^{J} \mathcal{G}(u^{(j)}) \tag{20} \newline \hat{C}^y &:= \frac{1}{J-1} \sum_{j=1}^{J} (\mathcal{G}(u^{(j)})-\overline{y})(\mathcal{G}(u^{(j)})-\overline{y})^\top + \Sigma. \tag{21} \end{align} \] We similarly define the last remaining quantity \(\hat{C}^{uy}\) by noting that \[ \text{Cov}[u,y] = \text{Cov}[u,\mathcal{G}(u)+\epsilon] = \text{Cov}[u,\mathcal{G}(u)] + \text{Cov}[u,\epsilon] = \text{Cov}[u,\mathcal{G}(u)]. \tag{22} \] We thus consider the estimator \[ \hat{C}^{uy} := \frac{1}{J-1} \sum_{j=1}^{J} (u^{(j)}-\overline{u})(\mathcal{G}(u^{(j)})-\overline{y})^\top. \tag{23} \]
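A sketch of these moment estimators, with \(\Sigma\) entering \(\hat{C}^y\) analytically as in (21); the array names and shapes are assumptions for illustration.

```python
import numpy as np

def estimate_moments(U, GU, Sigma):
    """Sample moments (15)-(16) and (20)-(23), exploiting the additive
    Gaussian noise structure so that Sigma enters C^y exactly.

    U  : (J, d) array of prior draws u^(j)
    GU : (J, p) array of forward-model evaluations G(u^(j))
    """
    J = U.shape[0]
    u_bar, y_bar = U.mean(axis=0), GU.mean(axis=0)
    dU, dG = U - u_bar, GU - y_bar
    C_hat = dU.T @ dU / (J - 1)            # (16)
    C_y = dG.T @ dG / (J - 1) + Sigma      # (21): noise covariance added analytically
    C_uy = dU.T @ dG / (J - 1)             # (23)
    return u_bar, y_bar, C_hat, C_y, C_uy
```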
6.2 Gaussian Approximations
At this point, we have obtained the joint approximation (6) derived by computing the sample moments as described in the previous subsection. The question is now how best to use this joint approximation in approximating the true posterior. The most straightforward approach is to simply define our approximate posterior to be \(\hat{u}|[\hat{y}=y]\), the Gaussian conditional. This conditional is available in closed-form and is defined in equations (7), (8), and (9). Suppose, for whatever reason, we are only interested in a sample-based representation of this Gaussian conditional. In this case, we can apply the algorithm summarized in (11) and (12) based on Matheron’s Rule. Explicitly, we construct an approximate posterior ensemble \(\{\hat{u}_*^{(j)}\}_{j=1}^{J^\prime}\) via the update equation in (12), \[ \hat{u}^{(j)}_* := \hat{u}^{(j)} + \hat{C}^{uy}[\hat{C}^y]^{-1}(y-\hat{y}^{(j)}), \tag{24} \] for \(j = 1, \dots, J^\prime\). At this point we should be quite clear about the definition of each term in (24). The mean and covariance estimates appearing in this expression are all derived from the ensemble \(\{(u^{(j)}, \epsilon^{(j)})\}_{j=1}^{J}\) sampled from the true joint distribution implied by (1). On the other hand, \(\{\hat{u}^{(j)}, \hat{y}^{(j)}\}_{j=1}^{J^\prime}\) are samples from the approximate joint distribution defined in (6). This is simply an exact implementation of Matheron’s rule under the joint distribution (6); i.e., the resulting ensemble \(\{\hat{u}^{(j)}_*\}_{j=1}^{J^\prime}\) contains iid samples from \(\hat{u}|[\hat{y}=y]\). If it seems odd to dwell on this point, we note that we are primarily doing so to provide motivation for the alternative method discussed below.
6.3 Non-Gaussian Approximation via EnKF Update
As discussed above, a direct application of Matheron’s rule to (6) results in an ensemble of samples from a Gaussian distribution, which we interpret as a sample-based approximation to the true posterior. We now consider constructing the ensemble \(\{\hat{u}_*^{(j)}\}_{j=1}^{J}\) via the slightly different update rule \[ \hat{u}^{(j)}_* := u^{(j)} + \hat{C}^{uy}[\hat{C}^y]^{-1}(y-y^{(j)}), \tag{25} \] for \(j = 1, \dots, J\). This is precisely the update equation used in the analysis (i.e., update) step of the EnKF, so we refer to (25) as the EnKF update. The only difference from (24) is that we have replaced the samples \((\hat{u}^{(j)},\hat{y}^{(j)})\) from (6) with the samples \((u^{(j)}, y^{(j)})\) from the true joint distribution, as defined in (14) and (17). In other words, we now utilize the samples from the initial ensemble (the same samples used to compute the mean and covariance estimates). Thus, equation (25) can be viewed as a particle-by-particle update to the initial ensemble, \[ \{u^{(j)}\}_{j=1}^{J} \mapsto \{\hat{u}^{(j)}_*\}_{j=1}^{J}. \tag{26} \] The update (25) is somewhat heuristic, and it is difficult to assess the properties of the updated ensemble. Since the initial ensemble will not in general consist of samples from a Gaussian (as \(p(u,y)\) is in general non-Gaussian), the updated ensemble will also generally be non-Gaussian distributed. If the inverse problem (1) is linear Gaussian, then the updates (24) and (25) are essentially equivalent. Beyond this special case, the hope is that the use of the true prior samples in (25) will better capture non-Gaussianity in the posterior distribution, as opposed to the Gaussian approximation implied by (24). The EnKF update can alternatively be justified from an optimization perspective, which we won’t discuss here; see my post on the EnKF for details. We conclude this section by summarizing the final algorithm.

Posterior Approximation via EnKF Update.
1. Generate the initial ensembles \(\{u^{(j)}\}_{j=1}^{J}\) and \(\{\mathcal{G}(u^{(j)})\}_{j=1}^{J}\), where \(u^{(j)} \sim \pi_0\).
2. Compute the sample estimates \(\overline{u}\), \(\overline{y}\), \(\hat{C}\), \(\hat{C}^y\), and \(\hat{C}^{uy}\) as defined in (20)-(23).
3. Return the updated ensemble \(\{\hat{u}^{(j)}_*\}_{j=1}^{J}\) by applying the EnKF update \[\begin{align} \hat{u}^{(j)}_* &:= u^{(j)} + \hat{C}^{uy}[\hat{C}^y]^{-1}(y-y^{(j)}), &&y^{(j)} \sim \mathcal{N}(\mathcal{G}(u^{(j)}), \Sigma). \tag{27} \end{align}\]
7 Artificial Dynamics
As noted above, the approximation algorithm summarized in (27) can be viewed as performing an iteration of the EnKF algorithm. In this section, we take this idea further by encoding the structure of the inverse problem (1) in a dynamical system. Once these artificial dynamics are introduced, we can then consider applying standard algorithms for Bayesian estimation of time-evolving systems (e.g., the EnKF) to solve the inverse problem. We emphasize that (1), the original problem we are trying to solve, is static. The dynamics we introduce here are purely artificial, introduced for the purpose of making a mathematical connection with the vast field of time-varying systems. This allows us to port methods that have demonstrated success in the time-varying domain to our current problem of interest.
7.1 Prior-to-Posterior Map in Finite Time
We start by generically considering how to construct a sequence of probability distributions that maps from the prior \(\pi_0\) to the posterior \(\pi\) in a finite number of steps. We will find it convenient to write the likelihood with respect to the potential \(\Phi(u)\) defined in (3), such that the posterior distribution solving (1) is given by
\[ \begin{align} \pi(u) &= \frac{1}{Z} \pi_0(u)\exp(-\Phi(u)), &&Z = \int \pi_0(u)\exp(-\Phi(u)) du. \tag{28} \end{align} \]
The following result describes a likelihood tempering approach to constructing the desired sequence of densities.
Prior-to-Posterior Sequence of Densities.
For a positive integer \(K\), define the sequence of densities \(\pi_0, \pi_1, \dots, \pi_K\) recursively by \[\begin{align} \pi_{k+1}(u) &:= \frac{1}{Z_{k+1}}\pi_k(u)\exp\left(-\frac{1}{K}\Phi(u)\right), &&Z_{k+1} := \int \pi_k(u)\exp\left(-\frac{1}{K}\Phi(u)\right) du, \tag{29} \end{align}\] for \(k = 0, \dots, K-1\). Then the final density satisfies \(\pi_K = \pi\), where \(\pi\) is defined in (28).
Proof. We first consider the normalizing constant at the final iteration: \[ \begin{align} Z_K &= \int \pi_{K-1}(u)\exp\left(-\frac{1}{K}\Phi(u)\right) du \newline &= \int \pi_{0}(u)\frac{1}{Z_{K-1} \cdots Z_1}\exp\left(-\frac{1}{K}\Phi(u)\right)^K du \newline &= \frac{1}{Z_{K-1} \cdots Z_1} \int \pi_{0}(u)\exp\left(-\Phi(u)\right) du \newline &= \frac{Z}{Z_1 \cdots Z_{K-1}}, \end{align} \] where the second equality simply iterates the recursion (29). Rearranging, we see that \[ Z = \prod_{k=1}^{K} Z_k. \tag{30} \]
We similarly iterate the recursion for \(\pi_K\): \[ \begin{align} \pi_K(u) &= \frac{1}{Z_K} \pi_{K-1}(u) \exp\left(-\frac{1}{K}\Phi(u)\right) \newline &= \frac{1}{Z_K Z_{K-1} \cdots Z_1} \pi_{0}(u) \exp\left(-\frac{1}{K}\Phi(u)\right)^K \newline &= \frac{1}{Z} \pi_{0}(u) \exp\left(-\Phi(u)\right) \newline &= \pi(u) \qquad\qquad\qquad\qquad\qquad \blacksquare \end{align} \]
The sequence defined by (29) essentially splits the single Bayesian inverse problem (1) into a sequence of \(K\) inverse problems. In particular, the update \(\pi_k \mapsto \pi_{k+1}\) amounts to solving the inverse problem \[ \begin{align} y|u &\sim \tilde{p}(y|u) \tag{31} \newline u &\sim \pi_k(u), \end{align} \] with the tempered likelihood \[ \tilde{p}(y|u) \propto \exp\left(-\frac{1}{K}\Phi(u)\right). \tag{32} \] Each update thus encodes the action of conditioning on the data \(y\), but under the modified likelihood \(\tilde{p}(y|u)\), which “spreads out” the information in the data over the \(K\) time steps. If we were to use the true likelihood \(p(y|u)\) in each time step, we would artificially inflate the information content in \(y\), as if we had \(K\) independent data vectors instead of just one.
7.1.1 Gaussian Special Case
Suppose the likelihood is Gaussian \(\mathcal{N}(y|\mathcal{G}(u), \Sigma)\), with associated potential \(\Phi(u) = \frac{1}{2}\lVert y - \mathcal{G}(u)\rVert^2_{\Sigma}\). The tempered likelihood in this case is \[ \exp\left(-\frac{1}{K}\Phi(u)\right) = \exp\left(-\frac{1}{2K}\lVert y - \mathcal{G}(u)\rVert^2_{\Sigma}\right) \propto \mathcal{N}(y|\mathcal{G}(u), K\Sigma). \tag{33} \] The modified likelihood remains Gaussian, and is simply the original likelihood with the variance inflated by a factor of \(K\). This matches the intuition from above; the variance is increased to account for the fact that we are conditioning on the same data vector \(K\) times.
If, in addition to the Gaussian likelihood, the prior \(\pi_0\) is Gaussian and the map \(\mathcal{G}\) is linear then the final posterior \(\pi\) and each intermediate distribution \(\pi_k\) will also be Gaussian. In this case, the recursion (29) defines a sequence of \(K\) linear Gaussian Bayesian inverse problems.
7.2 Introducing Artificial Dynamics
The previous section considered a discrete process on the level of densities; i.e., the dynamics (29) describe the evolution of \(\pi_k\). Our goal is now to design an artificial dynamical system that treats \(u\) as the state variable, such that \(\pi_k\) describes the filtering distribution of the state at iteration \(k\). In theory, we can then draw samples from \(\pi\) by applying standard filtering algorithms to this artificial dynamical system.
There are many approaches we could take here, but let us start with the simplest. We know that the update \(\pi_k \mapsto \pi_{k+1}\) should encode the action of conditioning on \(y\) with respect to the tempered likelihood. Thus, let’s consider the following dynamics and observation model: \[ \begin{align} u_{k+1} &= u_k \tag{34} \newline y_{k+1} &= \mathcal{G}(u_{k+1}) + \epsilon_{k+1}, &&\epsilon_{k+1} \sim \mathcal{N}(0, K\Sigma) \tag{35} \newline u_0 &\sim \pi_0 \tag{36} \end{align} \] The lines (34) and (36) define our artificial dynamics in \(u\), with the former providing the evolution equation and the latter the initial condition. These dynamics are rather uninteresting; the evolution operator is the identity, meaning that the state remains fixed at its initial condition. All of the interesting bits come into play in the observation model (35). We observe that, by construction, the filtering distribution of this dynamical system at time step \(k\) is given by \(\pi_k\): \[ \pi_k(u_k) = p(u_k | y_1 = y, \dots, y_k = y). \tag{37} \] To be clear, we emphasize that the quantities \(y_1, \dots, y_K\) in the observation model (35) are random variables, and \(y_k = y\) denotes the conditioning event that the random variable \(y_k\) equals the fixed data realization \(y\).
7.2.1 Extending the State Space
We now provide an alternative, but equivalent, formulation that gives another useful perspective. Observe that the observation model (35) can be written as \[ y_k = \mathcal{G}(u_k) + \epsilon_{k} = \begin{bmatrix} 0 & I \end{bmatrix} \begin{bmatrix} u_k \\ \mathcal{G}(u_k) \end{bmatrix} + \epsilon_{k} =: Hv_k + \epsilon_k, \tag{38} \] where we have defined \[\begin{align} H &:= \begin{bmatrix} 0 & I \end{bmatrix} \in \mathbb{R}^{p \times (d+p)}, &&v_k := \begin{bmatrix} u_k \newline \mathcal{G}(u_k) \end{bmatrix} \in \mathbb{R}^{d+p}. \tag{39} \end{align}\] We will now adjust the dynamical system (34)-(36) to describe the dynamics with respect to the state vector \(v_k\): \[ \begin{align} v_{k+1} &= v_k \tag{40} \newline y_{k+1} &= Hv_{k+1} + \epsilon_{k+1}, &&\epsilon_{k+1} \sim \mathcal{N}(0, K\Sigma) \tag{41} \newline u_0 &\sim \pi_0. \tag{42} \end{align} \] We continue to write the initial condition (42) with respect to \(u\), but note that the distribution on \(u_0\) induces an initial distribution for \(v_0\). Why extend the state space in this way? For one, the observation operator in (41) is now linear. Linearity of the observation operator is a common assumption in the data assimilation literature, so satisfying this assumption gives us more flexibility in choosing a filtering algorithm. The main reason I opt to include the state space extension here is that this is the approach taken in Iglesias et al (2012), the first paper to systematically propose and analyze the application of the EnKF to inverse problems. In effect, if you have defined the EnKF with respect to a linear observation operator (as is commonly done), then the extended state space formulation allows you to extend the algorithm to the nonlinear case. As we will see, what you ultimately get is identical to the joint Gaussian approximation viewpoint used in deriving (27).
This extended state space formulation still gives rise to the sequence \(\pi_0, \dots, \pi_K\) as before. However, the distribution \(\pi_k\) is now a marginal of the filtering distribution of \(v_k\) (the marginal corresponding to the first \(d\) entries of \(v_k\)).
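For concreteness, here is a small sketch of the extended-state construction in (38)-(39); the helper name is illustrative.

```python
import numpy as np

def extend_state(U, G):
    """Form the extended state v^(j) = [u^(j), G(u^(j))] of (38)-(39).

    U is a (J, d) array of particles; G maps R^d -> R^p. Returns the
    (J, d+p) extended ensemble and the linear observation operator H = [0 I].
    """
    GU = np.array([G(u) for u in U])               # (J, p) forward-model evaluations
    V = np.hstack([U, GU])                         # (J, d + p) extended particles
    d, p = U.shape[1], GU.shape[1]
    H = np.hstack([np.zeros((p, d)), np.eye(p)])   # observation operator H
    return V, H
```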
7.3 Applying Filtering Algorithms
We have now considered a couple of options for formulating the Bayesian inverse problem (1) as a discrete dynamical system, such that the filtering distributions of the artificial system are given by \(\pi_1, \dots, \pi_K\), as defined in (29). The distribution at the final time step satisfies \(\pi_K = \pi\). This points to a possible algorithmic approach to posterior inference. If we can propagate an ensemble of particles such that they are distributed according to \(\pi_k\) at time step \(k\), then the ensemble will represent the posterior at time step \(K\). Particle filtering methods could be used to implement this Monte Carlo scheme exactly. However, in this post we consider approximate methods rooted in Kalman methodology. Specifically, let’s consider applying the EnKF to the model given in (40)-(42). We could also consider (34)-(36) and get the same result, but I’ll follow Iglesias et al (2012) in using the extended state space formulation to generalize the linear observation operator EnKF to nonlinear operators. A direct application of the EnKF to the system (40)-(42) gives the following recursive algorithm.

Posterior Approximation via EnKF Update in Finite Time.
Time k=0:
Generate the initial ensemble \(\{v_0^{(j)}\}_{j=1}^{J}\), where \[ \begin{align} &v^{(j)}_0 := \begin{bmatrix} u^{(j)}_0 & \mathcal{G}(u^{(j)}_0) \end{bmatrix}^\top, &&u^{(j)}_0 \overset{iid}{\sim} \pi_0. \tag{43} \end{align} \]

Time k+1:
1. Perform the forecast step: \[ \hat{v}^{(j)}_{k+1} := v^{(j)}_{k}, \qquad j = 1, \dots, J \tag{44} \]
2. Compute the sample estimates: \[ \begin{align} \overline{v}_{k+1} &:= \frac{1}{J} \sum_{j=1}^{J} \hat{v}^{(j)}_{k+1} \tag{45} \newline \hat{C}^v_{k+1} &:= \frac{1}{J-1} \sum_{j=1}^{J} (\hat{v}^{(j)}_{k+1} - \overline{v}_{k+1}) (\hat{v}^{(j)}_{k+1} - \overline{v}_{k+1})^\top. \end{align} \]
3. Perform the analysis step: \[ \begin{align} v^{(j)}_{k+1} &:= \hat{v}^{(j)}_{k+1} + \hat{C}^{v}_{k+1}H^\top[H\hat{C}^{v}_{k+1}H^\top + K\Sigma]^{-1}(y-y^{(j)}), &&y^{(j)} \sim \mathcal{N}(H\hat{v}_{k+1}^{(j)}, K\Sigma). \tag{46} \end{align} \]
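A hedged sketch of the resulting iterative scheme is given below. It implements only the \(u\)-block of the update (46), which is all that is needed to produce posterior samples, and re-evaluates \(\mathcal{G}\) at each iteration; this is a common practical choice rather than a literal transcription of the extended-state recursion, which would instead carry the \(\mathcal{G}\)-block of \(v\) forward. The helper `sample_prior` is a hypothetical stand-in for a routine that draws from \(\pi_0\).

```python
import numpy as np

def eki_finite_time(sample_prior, G, Sigma, y_obs, J, K, rng=None):
    """EnKF applied to the artificial dynamics (40)-(42): K analysis steps
    with inflated noise covariance K * Sigma, following (43)-(46)."""
    rng = np.random.default_rng() if rng is None else rng
    p = len(y_obs)
    U = sample_prior(J, rng)                                # u-part of the ensemble v_0
    for _ in range(K):
        GU = np.array([G(u) for u in U])                    # re-evaluate G(u^(j))
        dU, dG = U - U.mean(axis=0), GU - GU.mean(axis=0)
        C_uy = dU.T @ dG / (J - 1)                          # u-block of C^v H^T
        C_y = dG.T @ dG / (J - 1) + K * Sigma               # H C^v H^T + K Sigma
        Kgain = np.linalg.solve(C_y, C_uy.T).T
        Y = GU + rng.multivariate_normal(np.zeros(p), K * Sigma, size=J)
        U = U + (y_obs - Y) @ Kgain.T                       # analysis step (46), u-block
    return U                                                # approximate posterior ensemble
```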
8 References
- The Ensemble Kalman Filter for Inverse Problems (Iglesias et al, 2012)