The Ensemble Kalman Filter

I introduce the ensemble Kalman filter as a Monte Carlo approximation to the Kalman filter in the linear Gaussian state space setting, then discuss how it is applied as an approximation even when these assumptions don't hold.

As discussed in a previous post, the Kalman filter (KF) provides the closed-form mean and covariance recursions characterizing the Gaussian filtering distributions for linear Gaussian hidden Markov models. In a follow-up post we relaxed the linearity assumption, and considered approximating nonlinear dynamics or observation operators using analytical linearizations, which in turn allowed us to fall back on the standard KF recursions. In this post, we introduce the Ensemble Kalman Filter (EnKF), another approximation scheme rooted in KF methodology. In place of gradient-based linearizations, the EnKF relies on Monte Carlo or particle-based approximations to address the nonlinearity.

In practice, the EnKF is commonly used as an approximation to the true Bayesian solution in settings with nonlinear dynamics or observation operators, and is particularly favored over competitors when the dimension of the state space is very high (potentially in the millions or billions). In the linear Gaussian setting, it can be viewed as a Monte Carlo approximation to the KF. In this post our main focus will be on the discrete-time, nonlinear Gaussian state space model

\begin{align} v_{k+1} &= g(v_k) + \eta_{k+1}, && \eta_{k+1} \sim \mathcal{N}(0, Q) \tag{1} \newline y_{k+1} &= h(v_{k+1}) + \epsilon_{k+1}, && \epsilon_{k+1} \sim \mathcal{N}(0, R) \newline v_0 &\sim \mathcal{N}(m_0, C_0), &&\{\epsilon_k\} \perp \{\eta_k\} \perp v_0, \end{align} with states $v_k \in \mathbb{R}^d$, observations $y_k \in \mathbb{R}^n$, forward dynamics operator $g: \mathbb{R}^d \to \mathbb{R}^d$, and observation operator $h: \mathbb{R}^d \to \mathbb{R}^n$. In general, we will consider $g$ and $h$ to be nonlinear, though we will discuss simplifications when one or the other is linear. As in previous posts, we will use the notation $Y_k = \{y_1, \dots, y_k\}$ to denote the collection of observations through time $k$. We denote the forecast and filtering densities at time $k+1$ by $\hat{\pi}_{k+1}(v_{k+1}) := p(v_{k+1}|Y_k)$ and $\pi_{k+1}(v_{k+1}) := p(v_{k+1}|Y_{k+1})$, respectively.
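To make the setup concrete, here is a minimal NumPy sketch that simulates a trajectory and observations from model (1). The particular choices of $g$, $h$, $Q$, and $R$ below are illustrative assumptions of mine, not anything prescribed by the model.

```python
import numpy as np

def simulate_ssm(g, h, Q, R, m0, C0, K, rng=None):
    """Simulate states v_1,...,v_K and observations y_1,...,y_K from model (1)."""
    rng = np.random.default_rng() if rng is None else rng
    d, n = Q.shape[0], R.shape[0]
    v = rng.multivariate_normal(m0, C0)  # v_0 ~ N(m_0, C_0)
    states, obs = [], []
    for _ in range(K):
        v = g(v) + rng.multivariate_normal(np.zeros(d), Q)  # v_{k+1} = g(v_k) + eta_{k+1}
        y = h(v) + rng.multivariate_normal(np.zeros(n), R)  # y_{k+1} = h(v_{k+1}) + eps_{k+1}
        states.append(v)
        obs.append(y)
    return np.array(states), np.array(obs)

# Illustrative choices: d = 2, n = 1, mildly nonlinear dynamics and observation operator.
g = lambda v: np.array([v[0] + 0.1 * np.sin(v[1]), 0.9 * v[1]])
h = lambda v: np.array([v[0] ** 2])
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(1)
states, obs = simulate_ssm(g, h, Q, R, m0=np.zeros(2), C0=np.eye(2), K=50)
```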

Kalman Filter Review

Before introducing the EnKF we very briefly review the KF, which I discuss in detail in this post. The KF is applicable in the special case when the state space model (1) has linear dynamics and a linear observation operator; that is, $g(v) = Gv$ and $h(v) = Hv$ for some matrices $G \in \mathbb{R}^{d \times d}$ and $H \in \mathbb{R}^{n \times d}$. We recall that under these assumptions, the forecast $\hat{\pi}_k$ and filtering $\pi_k$ distributions are both Gaussian and are given by

\begin{align} &v_{k+1}|Y_{k} \sim \mathcal{N}\left(\hat{m}_{k+1}, \hat{C}_{k+1} \right), &&v_{k+1}|Y_{k+1} \sim \mathcal{N}\left(m_{k+1}, C_{k+1} \right) \tag{2} \end{align}

where

\begin{align} \hat{C}_{k+1} &= G C_k G^\top + Q \newline \tag{3} \hat{m}_{k+1} &= Gm_k \newline C_{k+1} &= \left(H^\top R^{-1} H + \hat{C}^{-1}_{k+1}\right)^{-1} \newline m_{k+1} &= C_{k+1}\left(H^\top R^{-1}y_{k+1} + \hat{C}^{-1}_{k+1} \hat{m}_{k+1} \right). \end{align}
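For reference, the recursions in (3) translate directly into code. The sketch below (my own transcription, written to mirror the notation rather than for numerical efficiency) performs one full forecast/analysis cycle of the KF.

```python
import numpy as np

def kf_step(m, C, y, G, H, Q, R):
    """One Kalman filter cycle following (3); returns forecast and filtering moments."""
    # Forecast moments.
    m_hat = G @ m
    C_hat = G @ C @ G.T + Q
    # Analysis moments, written in the precision (inverse covariance) form of (3).
    R_inv = np.linalg.inv(R)
    C_new = np.linalg.inv(H.T @ R_inv @ H + np.linalg.inv(C_hat))
    m_new = C_new @ (H.T @ R_inv @ y + np.linalg.inv(C_hat) @ m_hat)
    return m_hat, C_hat, m_new, C_new
```

In practice one would avoid forming explicit inverses (e.g., via Cholesky solves or the gain form of the update), but the direct transcription makes the correspondence with (3) clear.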

The Big Picture

There are many different perspectives on the EnKF, so it is easy to get lost in the equations and terminology. We therefore take a moment to orient ourselves before proceeding with the EnKF algorithm. The problem we are trying to solve here is no different from that of the last couple of posts; namely, we want to characterize the filtering distributions $\pi_k$ at each time step $k$ in such a way that we can relatively cheaply compute the update \pi_k \mapsto \pi_{k+1} \tag{4} once the new observation $y_{k+1}$ becomes available at the next time step. Given that the filtering distribution $\pi_{k+1}$ encodes a compromise between (1) the dynamical model's prediction and (2) the observed data, it is not surprising that this update boils down to (i) using the dynamical model to forecast one time step ahead, and then (ii) conditioning on the new data point. We can thus decompose the map $\pi_k \mapsto \pi_{k+1}$ as \pi_k \mapsto \hat{\pi}_{k+1} \mapsto \pi_{k+1}, \tag{5} with the two arrows encoding steps (i) and (ii), respectively.

In the linear Gaussian setting, all of these distributions turned out to be Gaussian, and hence characterizing these two updates reduced to deriving update equations for the mean and covariance. Under our current assumptions, this will not be the case in general. Despite this, the EnKF still leverages methodology rooted in the Kalman filter, and thus makes some implicit Gaussian assumptions. However, instead of propagating means and covariances, it propagates an ensemble of samples. Our update maps from above will thus assume the form \{v_k^{(j)}\}_{j=1}^{J} \mapsto \{\hat{v}_{k+1}^{(j)}\}_{j=1}^{J} \mapsto \{v_{k+1}^{(j)}\}_{j=1}^{J}, \tag{6} for an ensemble of samples indexed by $j = 1,2, \dots, J$. This sample-based approximation is in the spirit of the particle filter, but instead of trying to ensure the samples have exactly the correct distributions, the EnKF updates rely on approximations.

Given that the EnKF provides results that do not in general align with the exact Bayesian solution, there is a justifiable argument for abandoning the Bayesian interpretation of the EnKF and instead viewing it as a derivative-free optimization algorithm. Indeed, the commonly cited metrics of the EnKF's superior performance over alternatives concern its ability to produce point predictions of $\mathbb{E}[v_k|Y_k]$. We will explore both of these perspectives throughout this post. The next section introduces the EnKF as a Monte Carlo approximation of the Bayesian filtering distributions, and a later section reinterprets the algorithm from an optimization perspective.

Introducing the EnKF: a Monte Carlo Approximation

Recall from the previous post that one way to view the challenge imposed by the nonlinear operators in (1) is that of approximating the distribution of nonlinear maps of Gaussian random variables. The extended Kalman filter addresses this challenge via derivative-based linearization. The EnKF is a derivative-free approach, instead opting for Monte Carlo based approximations. I will synonymously refer to such approximations as sample-based, ensemble-based, or particle-based. We now consider how to go about approximating both the forecast and analysis steps in this fashion.

Forecast

Suppose that we have access to an ensemble $(v_k^{(j)})_{j=1}^{J}$ providing a particle-based approximation of the filtering distribution $\pi_k$. The goal is now to approximate the forecast distribution $\hat{\pi}_{k+1}$. To do so, we can define a new ensemble $(\hat{v}_{k+1}^{(j)})_{j=1}^{J}$ by \begin{align} &\hat{v}_{k+1}^{(j)} := g(v_k^{(j)}) + \eta_{k+1}^{(j)}, &&\eta_{k+1}^{(j)} \overset{iid}{\sim} \mathcal{N}(0, Q). \tag{7} \end{align} This is a straightforward Monte Carlo approximation of $\hat{\pi}_{k+1}$; we have just fed the samples through the forward map $g(\cdot)$. At this point, there are two sources of error in using $(\hat{v}_{k+1}^{(j)})_{j=1}^{J}$ to approximate $\hat{\pi}_{k+1}$:

  1. Errors in the input ensemble $(v_k^{(j)})_{j=1}^{J}$ accumulated from earlier steps of the algorithm; i.e., it may be that $v_k^{(j)}$ is not distributed according to $\pi_k$.
  2. Monte Carlo error stemming from the fact that we are using a finite ensemble size $J$ to represent the distribution.

The first source of error is systematic, while the second can be reduced by increasing the ensemble size. At this point, we should note that the forecast step has not contributed any new systematic errors, instead just propagating existing ones. In other words, if $v_k^{(j)} \sim \pi_k$ for all $j$, then the only source of error in $(\hat{v}_{k+1}^{(j)})_{j=1}^{J}$ would be of the Monte Carlo variety.

We let $\hat{m}_{k+1}$ and $\hat{C}_{k+1}$ denote the empirical (i.e., sample) mean and covariance matrix of the forecast ensemble. Note that since $\hat{\pi}_{k+1}$ may be non-Gaussian, it is likely not characterized by its first two moments alone. Nonetheless, we may consider a Gaussian approximation to the forecast distribution given by $\mathcal{N}(\hat{m}_{k+1}, \hat{C}_{k+1})$. Adopting such an approximation would introduce another source of systematic error, stemming from the Gaussian approximation of a potentially non-Gaussian distribution.
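The forecast step (7) and the associated sample moments amount to only a few lines of code. A minimal sketch (the helper names and array conventions are my own; particles are stored as rows):

```python
import numpy as np

def forecast_ensemble(ensemble, g, Q, rng):
    """Map the filtering ensemble {v_k^(j)} to the forecast ensemble via (7)."""
    J, d = ensemble.shape
    noise = rng.multivariate_normal(np.zeros(d), Q, size=J)
    return np.array([g(v) for v in ensemble]) + noise

def sample_moments(ensemble):
    """Empirical mean and covariance of an ensemble (rows are particles)."""
    m_hat = ensemble.mean(axis=0)
    C_hat = np.cov(ensemble, rowvar=False)  # uses the 1/(J-1) normalization
    return m_hat, C_hat
```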

Analysis

We now focus on transforming the forecast ensemble $(\hat{v}_{k+1}^{(j)})_{j=1}^{J}$ into a new ensemble $(v_{k+1}^{(j)})_{j=1}^{J}$ that (approximately) encodes the operation of conditioning on the data $y_{k+1}$. We will pursue the same general strategy as we did with the extended Kalman filter in the previous post. Specifically, notice that the filtering distribution $\pi_{k+1}$ is a conditional of the joint distribution

\begin{align} &\begin{bmatrix} \hat{v}_{k+1} \newline \hat{y}_{k+1} \end{bmatrix} := \begin{bmatrix} v_{k+1} \newline y_{k+1} \end{bmatrix} \bigg| Y_k = \begin{bmatrix} v_{k+1} \newline h(v_{k+1}) + \epsilon_{k+1} \end{bmatrix} \bigg| Y_k. \tag{8} \end{align}

Note that $\hat{y}_{k+1}$ is the random variable equal in distribution to $h(\hat{v}_{k+1}) + \epsilon_{k+1}$, the data distribution implied by the model forecast $\hat{v}_{k+1}$. The strategy is thus to approximate the joint distribution (8) with a Gaussian, and then use the standard Gaussian conditioning identities to obtain the conditional. We will utilize the forecast ensemble $(\hat{v}_{k+1}^{(j)})_{j=1}^{J}$ in deriving the Gaussian approximation. In particular, we consider a Gaussian approximation of (8) with mean and covariance set to the empirical mean and covariance of the particles $\{(\hat{v}_{k+1}^{(j)}, \hat{y}_{k+1}^{(j)})\}_{j=1}^{J}$, where

\begin{align} \hat{y}_{k+1}^{(j)} &:= h(\hat{v}^{(j)}_{k+1}) + \epsilon^{(j)}_{k+1}, &&\epsilon^{(j)}_{k+1} \overset{iid}{\sim} \mathcal{N}(0, R). \tag{9} \end{align}

Note that the notation $\hat{y}_{k+1}^{(j)}$ indicates that this quantity can be interpreted as a simulated observation at time $k+1$ under the assumption that the current state is equal to the forecast $\hat{v}^{(j)}_{k+1}$. Of course, this will typically differ from the observation $y_{k+1}$ that is actually observed at this time step, and the discrepancy between these two quantities will help inform the adjustments to the forecast ensemble.

The Gaussian assumption yields the approximation

\begin{align} \begin{bmatrix} v_{k+1} \newline \hat{y}_{k+1} \end{bmatrix} \bigg| Y_k &\overset{d}{\approx} \mathcal{N}(\tilde{m}_{k+1}, \tilde{C}_{k+1}) := \mathcal{N}\left( \begin{bmatrix} \hat{m}_{k+1} \newline \hat{y}_{k+1} \end{bmatrix}, \begin{bmatrix} \hat{C}_{k+1} & \hat{C}^{vy}_{k+1} \newline \hat{C}^{yv}_{k+1} & \hat{C}^y_{k+1} \end{bmatrix} \right), \tag{10} \end{align}

where

\begin{align} \hat{m}_{k+1} &:= \frac{1}{J} \sum_{j=1}^{J} \hat{v}^{(j)}_{k+1} \tag{11} \newline \hat{y}_{k+1} &:= \frac{1}{J} \sum_{j=1}^{J} \hat{y}^{(j)}_{k+1} \newline \hat{C}_{k+1} &:= \frac{1}{J-1} \sum_{j=1}^{J} (\hat{v}_{k+1}^{(j)}-\hat{m}_{k+1})(\hat{v}_{k+1}^{(j)}-\hat{m}_{k+1})^\top \newline \hat{C}_{k+1}^{vy} &:= \frac{1}{J-1} \sum_{j=1}^{J} (\hat{v}_{k+1}^{(j)}-\hat{m}_{k+1})(\hat{y}_{k+1}^{(j)}-\hat{y}_{k+1})^\top, \newline \hat{C}_{k+1}^{y} &:= \frac{1}{J-1} \sum_{j=1}^{J} (\hat{y}_{k+1}^{(j)}-\hat{y}_{k+1})(\hat{y}_{k+1}^{(j)}-\hat{y}_{k+1})^\top, \end{align} and $\hat{C}^{yv}_{k+1} := (\hat{C}^{vy}_{k+1})^\top$. We have approximated the joint distribution $p(v_{k+1},y_{k+1}|Y_k)$ with the Gaussian (10), whose mean and covariance have been furnished from empirical estimates derived from the forecast ensemble. Naturally, this approximation introduces a new source of systematic error into the algorithm. For the Gaussian approximation to be exact we would need both (1) the forecast distribution $\hat{\pi}_{k+1}$ to be Gaussian; and (2) the observation operator $h(\cdot)$ to be linear. Each of these assumptions represents a potential source of systematic error. On top of this, there is of course the Monte Carlo error in the sample mean and covariance estimates.
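The empirical quantities in (11) can be computed directly from the forecast particles and their simulated observations. A short sketch, with my own array conventions (particles stored as rows of `v_hat` and `y_hat`):

```python
import numpy as np

def joint_moments(v_hat, y_hat):
    """Empirical moments (11) from forecast particles v_hat (J x d) and simulated obs y_hat (J x n)."""
    J = v_hat.shape[0]
    m_hat = v_hat.mean(axis=0)   # \hat{m}_{k+1}
    y_bar = y_hat.mean(axis=0)   # \hat{y}_{k+1}
    dv = v_hat - m_hat
    dy = y_hat - y_bar
    C_hat = dv.T @ dv / (J - 1)  # \hat{C}_{k+1}
    C_vy = dv.T @ dy / (J - 1)   # \hat{C}^{vy}_{k+1}
    C_y = dy.T @ dy / (J - 1)    # \hat{C}^{y}_{k+1}
    return m_hat, y_bar, C_hat, C_vy, C_y
```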

At this point, let's consider a few different ways we might leverage the joint Gaussian approximation (10) to generate the filtering ensemble $(v^{(j)}_{k+1})_{j=1}^{J}$. The third method is the one actually employed by the EnKF.

Gaussian Conditional Moments

Perhaps the most straightforward idea is to apply the typical Gaussian conditioning identities to the joint Gaussian $\mathcal{N}(\tilde{m}_{k+1}, \tilde{C}_{k+1})$ defined in (10). The conditional mean and covariance are given by

\begin{align} m_{k+1} &= \hat{m}_{k+1} + \hat{C}^{vy}_{k+1}[\hat{C}^y_{k+1}]^{-1}(y_{k+1}-\hat{y}_{k+1}) \tag{12} \newline C_{k+1} &= \hat{C}_{k+1} - \hat{C}^{vy}_{k+1}[\hat{C}^y_{k+1}]^{-1}\hat{C}^{yv}_{k+1}. \end{align}

Denoting the resulting conditional mean and covariance $m_{k+1}$ and $C_{k+1}$, respectively, we could then define the Gaussian approximation $\mathcal{N}(m_{k+1}, C_{k+1})$ to the filtering distribution $\pi_{k+1}$. We could use this Gaussian to study the state of the system at time $k+1$, and then draw samples from it to obtain the ensemble $(v_{k+1}^{(j)})_{j=1}^{J}$, which can then be fed through the dynamical model to start the cycle over again at time $k+2$.

Sample from Gaussian Conditional

Another approach is to essentially follow the same idea, but apply a neat trick to avoid having to explicitly compute $m_{k+1}$ and $C_{k+1}$, and instead sample from the Gaussian conditional without having to compute its moments. Matheron's rule provides a way to map samples from the joint distribution $\mathcal{N}(\tilde{m}_{k+1}, \tilde{C}_{k+1})$ to conditional samples: v^{(j)}_{k+1} := \tilde{v}^{(j)} + \hat{C}_{k+1}^{vy}[\hat{C}_{k+1}^{y}]^{-1}(y_{k+1} - \tilde{y}^{(j)}), \qquad (\tilde{v}^{(j)},\tilde{y}^{(j)}) \sim \mathcal{N}(\tilde{m}_{k+1}, \tilde{C}_{k+1}). The resulting ensemble $(v_{k+1}^{(j)})_{j=1}^{J}$ consists of particles distributed according to the Gaussian with moments (12); in other words, this is simply a different algorithm for obtaining the same ensemble as in the previous method.
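As a quick illustration of Matheron's rule for a generic joint Gaussian, the following sketch (my own helper, not part of the post) maps joint samples to conditional samples; the resulting particles are distributed according to the conditional with moments (12).

```python
import numpy as np

def matheron_conditional_samples(m_v, m_y, C_vv, C_vy, C_yy, y_obs, J, rng):
    """Draw J samples from v | y = y_obs for a joint Gaussian, via Matheron's rule."""
    d = len(m_v)
    mean = np.concatenate([m_v, m_y])
    cov = np.block([[C_vv, C_vy], [C_vy.T, C_yy]])
    joint = rng.multivariate_normal(mean, cov, size=J)  # rows are (v_tilde, y_tilde) pairs
    v_tilde, y_tilde = joint[:, :d], joint[:, d:]
    K = C_vy @ np.linalg.inv(C_yy)                      # "gain" matrix
    return v_tilde + (y_obs - y_tilde) @ K.T            # v_tilde + K (y_obs - y_tilde), row-wise
```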

The EnKF approach

The EnKF employs a slight modification of Matheron's rule: instead of drawing fresh samples $(\tilde{v}^{(j)}, \tilde{y}^{(j)})$ from the Gaussian approximation $\mathcal{N}(\tilde{m}_{k+1}, \tilde{C}_{k+1})$, it plugs the existing forecast particles and their simulated observations $(\hat{v}^{(j)}_{k+1}, \hat{y}^{(j)}_{k+1})$ directly into the update. This avoids generating a second set of samples, but since the forecast particles are not in general Gaussian distributed, the resulting particles are no longer exact draws from the Gaussian conditional with moments (12). One can check that the empirical mean of the updated ensemble still agrees with the conditional mean in (12), while the empirical covariance matches the conditional covariance only in expectation over the simulated-observation noise. The resulting algorithm is summarized below.

Summary

We have derived (a particular version of) the EnKF by opting for an ensemble representation of the forecast and filtering distributions, and dealing with the nonlinear updates required by the analysis step through the use of Gaussian approximations. Our derivations are summarized below.

EnKF Algorithm: Version 1. Given the current ensemble $\{v^{(j)}_k\}_{j=1}^{J}$ at time $k$, the forecast and filtering ensembles at time $k+1$ are generated as follows.

Forecast. \begin{align} &\hat{v}_{k+1}^{(j)} := g(v_k^{(j)}) + \eta_{k+1}^{(j)}, &&\eta_{k+1}^{(j)} \overset{iid}{\sim} \mathcal{N}(0, Q). \tag{13} \end{align}
Analysis. \begin{align} &v^{(j)}_{k+1} := \hat{v}^{(j)}_{k+1} + \hat{C}_{k+1}^{vy}[\hat{C}_{k+1}^{y}]^{-1}(y_{k+1} - \hat{y}_{k+1}^{(j)}), &&\hat{y}_{k+1}^{(j)} \sim \mathcal{N}(h(\hat{v}_{k+1}^{(j)}),R) \tag{14} \end{align} where $\hat{C}_{k+1}^{vy}$ and $\hat{C}_{k+1}^{y}$ are defined in (11).
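Putting the pieces together, a minimal NumPy sketch of one cycle of the algorithm above (the stochastic, simulated-observation variant) might look as follows; the function and variable names are my own, not a standard API.

```python
import numpy as np

def enkf_step(ensemble, y_obs, g, h, Q, R, rng):
    """One EnKF cycle: forecast step (13) followed by the stochastic analysis update (14)."""
    J, d = ensemble.shape
    n = R.shape[0]
    # Forecast: push each particle through the dynamics and add model noise.
    v_hat = np.array([g(v) for v in ensemble]) + rng.multivariate_normal(np.zeros(d), Q, size=J)
    # Simulated observations, as in (9).
    y_hat = np.array([h(v) for v in v_hat]) + rng.multivariate_normal(np.zeros(n), R, size=J)
    # Empirical covariances (11).
    dv = v_hat - v_hat.mean(axis=0)
    dy = y_hat - y_hat.mean(axis=0)
    C_vy = dv.T @ dy / (J - 1)
    C_y = dy.T @ dy / (J - 1)
    K = C_vy @ np.linalg.inv(C_y)
    # Analysis: shift each particle by the gain times its innovation.
    return v_hat + (y_obs - y_hat) @ K.T
```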

Special Case: Linear Observation Operator

Let's now specialize to the common setting where the observation operator is linear: $h(v) = Hv$. Indeed, this assumption is typical when presenting the EnKF in textbooks and the literature (e.g., Sanz-Alonso et al., 2018; Evensen, 2009). This assumption has no effect on the forecast step, but allows for some analytical computations to replace ensemble approximations in the analysis step. Consider the choice of covariance $\hat{C}^y_{k+1}$ for the $y$ portion of the vector in the Gaussian approximation (10). Previously we considered a Monte Carlo estimate of the covariance based on propagating samples through the nonlinear observation operator $h(\cdot)$. Under the linear operator $H$ we now see that

\begin{align} \text{Cov}[y_{k+1}|Y_k] &= \text{Cov}[Hv_{k+1}|Y_k] + \text{Cov}[\epsilon_{k+1}|Y_k] \newline &= H \text{Cov}[v_{k+1}|Y_k] H^\top + R \newline &\approx H\hat{C}_{k+1} H^\top + R, \tag{15} \end{align}

where we have inserted the empirical estimate $\hat{C}_{k+1}$ for the covariance of the forecast distribution at time $k+1$. Similarly, \begin{align} \text{Cov}[v_{k+1}, y_{k+1}|Y_k] &= \text{Cov}[v_{k+1},Hv_{k+1}|Y_k] + \text{Cov}[v_{k+1},\epsilon_{k+1}|Y_k] \newline &= \text{Cov}[v_{k+1},v_{k+1}|Y_k]H^\top \newline &\approx \hat{C}_{k+1} H^\top. \tag{16} \end{align}

Expression (10) now simplifies to \begin{align} \begin{bmatrix} v_{k+1} \newline y_{k+1} \end{bmatrix} \bigg| Y_k &\overset{d}{\approx} \mathcal{N}\left( \begin{bmatrix} \hat{m}_{k+1} \newline H\hat{m}_{k+1} \end{bmatrix}, \begin{bmatrix} \hat{C}_{k+1} & \hat{C}_{k+1}H^\top \newline H\hat{C}_{k+1} & H\hat{C}_{k+1}H^\top + R \end{bmatrix} \right). \tag{17} \end{align}

The moments still rely on the empirical estimates $\hat{m}_{k+1}$ and $\hat{C}_{k+1}$, but (17) inserts some partial analytic computations in place of the extra sample-based approximations in (10). Recall from the discussion following (11) that the Gaussian approximation (10) in general introduces errors stemming from (1) the non-Gaussianity of the samples $\{\hat{v}^{(j)}_{k+1}\}_{j=1}^{J}$; and (2) the nonlinearity of $h(\cdot)$. The assumption that the observation operator is linear eliminates this second source of error. However, in general the Gaussian assumption will still represent an approximation, due to the first source of error.

We can now once again apply Matheron’s rule to sample from the conditional implied by (17) to obtain the analysis update. This update, the analog of (14), is summarized in the modified algorithm below.

EnKF Algorithm: Linear Observation Operator. Given the current ensemble $\{v^{(j)}_k\}_{j=1}^{J}$ at time $k$, the forecast and filtering ensembles at time $k+1$ are generated as follows.

Forecast. \begin{align} &\hat{v}_{k+1}^{(j)} := g(v_k^{(j)}) + \eta_{k+1}^{(j)}, &&\eta_{k+1}^{(j)} \overset{iid}{\sim} \mathcal{N}(0, Q). \tag{18} \end{align}
Analysis. \begin{align} &v^{(j)}_{k+1} := \hat{v}^{(j)}_{k+1} + \hat{C}_{k+1}H^\top[H\hat{C}_{k+1}H^\top + R]^{-1}(y_{k+1} - \hat{y}_{k+1}^{(j)}), &&\hat{y}_{k+1}^{(j)} \sim \mathcal{N}(H\hat{v}_{k+1}^{(j)},R) \tag{19} \end{align} where $\hat{C}_{k+1}$ is defined in (11).
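A sketch of the analysis update (19) for a linear observation operator, with the gain built from $\hat{C}_{k+1}H^\top$ and $H\hat{C}_{k+1}H^\top + R$ rather than the fully sample-based covariances (again, the helper name is my own):

```python
import numpy as np

def enkf_analysis_linear(v_hat, y_obs, H, R, rng):
    """Analysis update (19) for h(v) = Hv; v_hat holds the forecast particles as rows."""
    J, d = v_hat.shape
    n = R.shape[0]
    C_hat = np.cov(v_hat, rowvar=False)                   # empirical forecast covariance
    K = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + R)  # Kalman-like gain
    # Simulated observations using the linear operator.
    y_hat = v_hat @ H.T + rng.multivariate_normal(np.zeros(n), R, size=J)
    return v_hat + (y_obs - y_hat) @ K.T
```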

Considerations for Including Noise in the Analysis Step

Stochastic Update

We now take a moment to consider how the data appears in the analysis updates (14) and (19). These updates rely on the quantity $y_{k+1} - \hat{y}_{k+1}^{(j)}$, which measures the discrepancy between the observed data $y_{k+1}$ and a simulated observation $\hat{y}_{k+1}^{(j)}$ generated under the assumption that the underlying state is drawn from the forecast distribution at time $k+1$. In other words, this is the difference between the observation and the model forecast, with the latter adjusted to account for the noise in the observation. There are other ways we can interpret this; consider: \begin{align} y_{k+1} - \hat{y}_{k+1}^{(j)} &= y_{k+1} - [h(\hat{v}_{k+1}^{(j)}) + \epsilon_{k+1}^{(j)}] \newline &= [y_{k+1} - \epsilon_{k+1}^{(j)}] - h(\hat{v}_{k+1}^{(j)}) \newline &=: y^{(j)}_{k+1} - h(\hat{v}_{k+1}^{(j)}), \tag{20} \end{align} where we have defined the perturbed observation y^{(j)}_{k+1} := y_{k+1} - \epsilon_{k+1}^{(j)}. \tag{21} All we have done here is rewrite the expression so as to view the observation $y_{k+1}$ as being perturbed by the noise, instead of the model forecast $h(\hat{v}_{k+1}^{(j)})$. Note that since the noise has a symmetric distribution about zero, y_{k+1} - \epsilon_{k+1}^{(j)} \overset{d}{=} y_{k+1} + \epsilon_{k+1}^{(j)}. \tag{22} So the alternative definition $y^{(j)}_{k+1} := y_{k+1} + \epsilon_{k+1}^{(j)}$ is equivalent from a probabilistic perspective, and indeed both conventions exist in the literature.

The perturbations to the data in the EnKF update (14) fell right out of our derivation; specifically, this followed from the viewpoint we adopted of the update (14) as simply drawing a sample from the conditional distribution implied by (10). Using terminology favored by the spatial statistics community, this perspective on the EnKF might be termed approximate conditional simulation (Katzfuss et al., 2016), with the "approximate" part highlighting that the joint Gaussian assumption (10) is generally an approximation. Even in the linear Gaussian setting the conditional simulation is still approximate, owing to the fact that the covariances in (10) are estimated from a finite set of samples. Regardless of the interpretation, it is important to note that the update (14) is stochastic; i.e., even when viewed as conditional on the forecast ensemble, additional noise is introduced by sampling the simulated (or perturbed, depending on the perspective) observations. The particle updates (14) are then given by an affine transformation of the forecast particles, with the magnitude of the adjustment controlled by the difference between the simulated and observed data.
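The algebra in (20)–(21) is a pointwise identity: given the same noise draw, perturbing the simulated observation or perturbing the data produces exactly the same innovation. A tiny numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=3)            # observed data y_{k+1}
hv = rng.normal(size=3)           # model forecast h(v_hat_{k+1}^{(j)})
eps = rng.multivariate_normal(np.zeros(3), 0.1 * np.eye(3))
innov_simulated = y - (hv + eps)  # y_{k+1} - \hat{y}_{k+1}^{(j)}, as in (14)
innov_perturbed = (y - eps) - hv  # y_{k+1}^{(j)} - h(\hat{v}_{k+1}^{(j)}), as in (20)
assert np.allclose(innov_simulated, innov_perturbed)
```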

Deterministic Update

While the stochastic update naturally falls out of the approximate conditional simulation derivation we originally considered, this is but one of many perspectives on the EnKF. A popular alternative to the stochastic update (14) eliminates the use of simulated or perturbed observations, instead opting for a deterministic update. There are many variations on the deterministic update idea, but most fall under the heading of "ensemble square-root filters" (Tippett et al., 2003).

TODO:

The Optimization Perspective

The SDE/MCMC Perspective

See Evensen (2009)

The Ensemble Span

References

  1. G. Evensen. Data Assimilation: The Ensemble Kalman Filter. Springer, New York, 2007.
  2. G. Evensen. The ensemble Kalman filter for combined state and parameter estimation: Monte Carlo techniques for data assimilation in large systems. IEEE Control Systems Magazine, 29(3), 2009.
  3. D. Sanz-Alonso, A. M. Stuart, and A. Taeb. Inverse Problems and Data Assimilation. arXiv preprint, 2018.
  4. D. Kelly, K. J. H. Law, and A. M. Stuart. Well-posedness and accuracy of the ensemble Kalman filter in discrete and continuous time. Nonlinearity, 27(10):2579, 2014.
  5. K. J. H. Law, A. M. Stuart, and K. Zygalakis. Data Assimilation. Springer, 2015.
  6. Y. Chen and D. Oliver. Ensemble randomized maximum likelihood method as an iterative ensemble smoother. Mathematical Geosciences, 44(1):1–26, 2012.
  7. J. Carrillo, F. Hoffmann, A. Stuart, and U. Vaes. The ensemble Kalman filter in the near-Gaussian setting. arXiv preprint arXiv:2212.13239, 2022.
  8. A. Carrassi, M. Bocquet, L. Bertino, and G. Evensen. Data assimilation in the geosciences: an overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change, 9(5), 2018.
  9. E. Calvello, S. Reich, and A. M. Stuart. Ensemble Kalman methods: a mean field perspective. arXiv preprint, 2022.
  10. M. Roth, G. Hendeby, C. Fritsche, and F. Gustafsson. The ensemble Kalman filter: a signal processing perspective. EURASIP Journal on Advances in Signal Processing, 2017.
  11. M. Katzfuss, J. R. Stroud, and C. K. Wikle. Understanding the ensemble Kalman filter. The American Statistician, 70(4), 2016.
  12. M. K. Tippett, J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker. Ensemble square-root filters. Monthly Weather Review, 131:1485–1490, 2003.