Optimization Viewpoints on Kalman Methodology
Viewing the Kalman Filter and its variants from an optimization perspective.
Throughout this post, we use the following notation for inner products and norms weighted by a positive definite matrix $C$: \begin{align} \langle v, v^\prime \rangle_C &:= \langle C^{-1}v, v^\prime\rangle = v^\top C^{-1}v^\prime \newline \lVert v \rVert_C^2 &:= \langle v, v\rangle_C. \end{align} We will also need to generalize these expressions to the case where $C$ is only positive semidefinite. The notation $\lVert \cdot \rVert$ and $\langle \cdot, \cdot \rangle$ denotes the standard Euclidean norm and inner product, respectively. For a matrix $A$, we will denote by $\mathcal{R}(A)$ and $\mathcal{N}(A)$ its range (column space) and kernel (null space), respectively.
Setup
We start by considering the following optimization problem
Optimization Formulation with Positive Definite $C$.
\begin{align} v_{\star} &:= \text{argmin}_{v \in \mathbb{R}^d} J(v) \tag{1} \newline J(v) &= \frac{1}{2} \lVert y - h(v)\rVert_R^2 + \frac{1}{2} \lVert v - \hat{v}\rVert_C^2, \tag{2} \end{align}
where the objective is defined with respect to a map $h: \mathbb{R}^d \to \mathbb{R}^n$, fixed vectors $y \in \mathbb{R}^n$ and $\hat{v} \in \mathbb{R}^d$, and positive definite matrices $R \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{d \times d}$. A solution to (1) can be viewed as a maximum a posteriori (MAP) estimate under the Bayesian model \begin{align} y|v &\sim \mathcal{N}(h(v), R) \tag{3} \newline v &\sim \mathcal{N}(\hat{v}, C). \tag{4} \end{align} In the case that $h$ is linear (i.e., $h(v) = Hv$ for some matrix $H$), (1) assumes the form of a standard regularized least squares problem, admitting a closed-form solution. In this special case, (1) also defines the optimization problem that is solved to derive the analysis update of the standard Kalman filter. Under this interpretation, $\hat{v}$ is the mean of the current forecast distribution, $H$ is the linear observation operator, $R$ is the observation covariance, $C$ is the covariance of the current forecast distribution, and the solution $v_{\star}$ gives the mean of the Gaussian filtering distribution. Formulation (1) also provides the basis for deriving variants of the Kalman filter, including the ensemble Kalman filter (EnKF). For now, we maintain a generic view of (1) in order to investigate its mathematical properties. Later we will make explicit links with Kalman filter methods.
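To make the linear special case concrete, here is a minimal NumPy sketch (illustrative dimensions and placeholder names such as `H`, `R`, `C`, `v_hat`; none of these are tied to a specific application) that evaluates the objective (2) and computes the closed-form minimizer by solving the associated normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 3                          # state and observation dimensions (illustrative)
H = rng.normal(size=(n, d))          # linear observation operator, h(v) = H v
R = 0.5 * np.eye(n)                  # observation covariance (positive definite)
C = np.eye(d)                        # forecast/prior covariance (positive definite)
v_hat = rng.normal(size=d)           # forecast/prior mean
y = H @ v_hat + rng.normal(size=n)   # synthetic observation

def J(v):
    """Objective (2): weighted data misfit plus weighted prior penalty."""
    r1 = y - H @ v
    r2 = v - v_hat
    return 0.5 * r1 @ np.linalg.solve(R, r1) + 0.5 * r2 @ np.linalg.solve(C, r2)

# Closed-form minimizer in the linear case: solve the normal equations
# (H^T R^{-1} H + C^{-1}) v = H^T R^{-1} y + C^{-1} v_hat.
lhs = H.T @ np.linalg.solve(R, H) + np.linalg.inv(C)
rhs = H.T @ np.linalg.solve(R, y) + np.linalg.solve(C, v_hat)
v_star = np.linalg.solve(lhs, rhs)

print(J(v_star), J(v_hat))           # the minimizer attains a smaller objective value
```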
Generalizing the Optimization Problem
Motivation and Setup
While the formulation (1) is perfectly adequate when $C$ is positive definite, it is not well-defined when $C$ may be only positive semidefinite. In particular, the inverse of $C$ may not exist, and hence the term $\lVert v - \hat{v}\rVert_C^2$ is no longer well-defined. Our first goal is to consider a suitable modification of the optimization problem (1) that is well-defined in this more general setting. The reason for considering this is that various filtering algorithms consider setups similar to (1) in which the matrix $C$ is replaced by a low-rank approximation. For example, the EnKF defines $C$ using a sample covariance estimator, whose rank cannot exceed the number of samples used in the estimator. Thus, if the state dimension exceeds the number of samples, then this matrix will fail to be positive definite. We will discuss specifics regarding the EnKF later on, but for the time being keep the discussion generic. Throughout this section, we will consider a few different optimization formulations that are valid when $C$ is positive semidefinite; in the following section, we will pursue solutions of the problems formulated here.
Let $A \in \mathbb{R}^{d \times r}$ be a matrix such that \begin{align} C = AA^\top. \tag{5} \end{align} This implies that \begin{align} \langle Cv, v\rangle = \langle A^\top v, A^\top v\rangle = \lVert A^\top v\rVert^2 \geq 0, \tag{6} \end{align} so $C$ is indeed positive semidefinite (as we have assumed). However, when $r < d$ the $d$ columns of $A^\top$ (which live in $\mathbb{R}^r$) must be linearly dependent, so there is some $v \neq 0$ such that $A^\top v = 0$. This implies that $C$ is not positive definite when $r < d$. Note that the range of $C$ is equal to that of $A$; i.e., \begin{align} \mathcal{R}(C) = \mathcal{R}(AA^\top) = \mathcal{R}(A). \tag{7} \end{align}
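As a quick numerical illustration (a self-contained sketch with arbitrary choices of $d$ and $r$), the following constructs $C = AA^\top$ from a factor with $r < d$ columns and confirms that the resulting matrix is rank-deficient, hence only positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 5, 2                          # state dimension exceeds the number of columns of A
A = rng.normal(size=(d, r))
C = A @ A.T                          # positive semidefinite by construction

print(np.round(np.linalg.eigvalsh(C), 8))  # d - r eigenvalues are (numerically) zero
print(np.linalg.matrix_rank(C))            # rank is at most r, so C is singular when r < d
```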
Redefining the Objective Function
Our goal is now to extend (1) to the case where $C$ need not be strictly positive definite. To start, note that when $C$ is positive definite, we have \begin{align} \lVert v - \hat{v}\rVert_C^2 = \langle C^{-1}(v - \hat{v}), v - \hat{v}\rangle = \langle b, v - \hat{v}\rangle, \tag{8} \end{align} where $b$ solves \begin{align} Cb = v - \hat{v}. \tag{9} \end{align} When $C$ is invertible, the unique $b$ solving (9) is simply found by multiplying both sides by $C^{-1}$. When $C$ is not invertible, there may be zero or infinitely many such solutions. As long as there is at least one solution to (9), we will see that we can give meaning to the expression $\lVert v - \hat{v}\rVert_C^2$ even when $C$ is only positive semidefinite.
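The identity (8) is easy to check numerically. The following sketch (with a randomly generated positive definite $C$; purely illustrative) verifies that evaluating $\langle b, v - \hat{v}\rangle$ with $b$ solving (9) reproduces $\lVert v - \hat{v}\rVert_C^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
M = rng.normal(size=(d, d))
C = M @ M.T + np.eye(d)              # positive definite, so (9) has a unique solution
v_hat = rng.normal(size=d)
v = rng.normal(size=d)

b = np.linalg.solve(C, v - v_hat)                   # unique solution of C b = v - v_hat
lhs = (v - v_hat) @ np.linalg.solve(C, v - v_hat)   # ||v - v_hat||_C^2
rhs = b @ (v - v_hat)                               # <b, v - v_hat>
print(np.isclose(lhs, rhs))                         # True
```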
Constrained optimization over $(v, b)$
Let’s consider the case where there is at least one solution to (9); i.e., $v - \hat{v} \in \mathcal{R}(C)$. We define the set of all solutions \begin{align} \mathcal{B}(v) &:= \\{b \in \mathbb{R}^d : Cb = v - \hat{v}\\} \tag{10} \end{align} with respect to the input vector $v$. We therefore might consider generalizing (1) to \begin{align} v_{\star} &:= \text{argmin}_{v \in \mathbb{R}^d} \left[\frac{1}{2} \lVert y - h(v)\rVert_R^2 + \frac{1}{2} \langle b, v - \hat{v}\rangle\right], \qquad b \in \mathcal{B}(v), \tag{11} \end{align} where we are implicitly restricting the solution space to $v$ such that $\mathcal{B}(v)$ is not empty. To encode this constraint more explicitly, we can modify (11) as \begin{align} (v_{\star}, b_{\star}) &:= \text{argmin}_{(v, b) \in \mathcal{D}} \tilde{J}(v, b), \qquad \tilde{J}(v, b) := \frac{1}{2} \lVert y - h(v)\rVert_R^2 + \frac{1}{2} \langle b, v - \hat{v}\rangle, \tag{12} \newline \mathcal{D} &:= \\{(v, b) \in \mathbb{R}^d \times \mathbb{R}^d : Cb = v - \hat{v}\\}, \tag{13} \end{align} where the constraint defining $\mathcal{D}$ simply requires that $b \in \mathcal{B}(v)$, and hence that $\mathcal{B}(v)$ is non-empty. Note that if $C$ is positive definite, then (12) reduces to (1). We can encode this constraint in an unconstrained optimization problem by leveraging the method of Lagrange multipliers. Introducing the Lagrange multiplier $\lambda \in \mathbb{R}^d$ yields the following formulation.
Lagrange Multiplier Formulation.
\begin{align} \tilde{J}(v,b,\lambda) = \frac{1}{2} \lVert y - h(v)\rVert_R^2 + \frac{1}{2} \langle b, v-\hat{v}\rangle + \langle \lambda, Cb - v + \hat{v} \rangle. \tag{14} \end{align}
Removing dependence on $b$
In the previous section, we extended the optimization problem to consider optimizing over pairs $(v, b)$ in order to deal with the fact that $C$ may not be invertible. The below result shows that, for a fixed $v$, the objective in (12) is actually independent of the particular choice of $b \in \mathcal{B}(v)$. This fact will allow us to remove the need to jointly optimize over $(v, b)$.
Proposition.
Let $v$ be a vector such that there is at least one solution $b$ to $Cb = v - \hat{v}$. Then $\tilde{J}(v, b)$, defined in (12), is constant for any choice of $b$ solving this linear system.
Proof. Let $b, b^\prime \in \mathcal{B}(v)$, so that $Cb = Cb^\prime = v - \hat{v}$. It thus follows that \begin{align} \langle b, v - \hat{v}\rangle = \langle b, Cb^\prime\rangle = \langle Cb, b^\prime\rangle = \langle v - \hat{v}, b^\prime\rangle = \langle b^\prime, v - \hat{v}\rangle, \end{align} where we have used the symmetry of the inner product and the fact that $C$ is symmetric. Since the inner product term in (12) is the only portion with dependence on $b$, it follows that $\tilde{J}(v, b) = \tilde{J}(v, b^\prime)$. $\blacksquare$
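The proposition is also easy to sanity-check numerically. The sketch below (an arbitrary low-rank setup, nothing specific to any filter) constructs two distinct solutions of $Cb = v - \hat{v}$ by perturbing along the null space of $C$ and confirms that the inner product term agrees for both; the last line previews the fact, used next, that the pseudoinverse returns the minimal-norm solution.

```python
import numpy as np

rng = np.random.default_rng(3)
d, r = 5, 2
A = rng.normal(size=(d, r))
C = A @ A.T                              # rank r < d, so C b = v - v_hat has many solutions
v_hat = rng.normal(size=d)
v = v_hat + C @ rng.normal(size=d)       # ensures v - v_hat lies in the range of C

b_min = np.linalg.pinv(C) @ (v - v_hat)  # minimal-norm solution via the pseudoinverse

# Build a second solution by adding an element of the null space of C.
U, s, _ = np.linalg.svd(C)
null_vec = U[:, -1]                      # singular value ~ 0, so C @ null_vec ~ 0
b_other = b_min + 3.0 * null_vec

print(np.allclose(C @ b_min, v - v_hat), np.allclose(C @ b_other, v - v_hat))
print(np.isclose(b_min @ (v - v_hat), b_other @ (v - v_hat)))   # same inner product term
print(np.linalg.norm(b_min) <= np.linalg.norm(b_other))         # pinv gives minimal norm
```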
Thus, for each $v$ with $\mathcal{B}(v)$ non-empty, we can simply pick any element of $\mathcal{B}(v)$ to insert into $\tilde{J}$. The objective will be well-defined, since the above result verifies that the specific choice of $b$ is inconsequential. A natural choice is to choose the $b \in \mathcal{B}(v)$ of minimal norm. That is, for $\mathcal{B}(v)$ non-empty, set \begin{align} b(v) &:= \text{argmin}_{b \in \mathcal{B}(v)} \lVert b \rVert. \tag{15} \end{align} This unique minimal norm solution is guaranteed to exist and is conveniently given by the Moore-Penrose pseudoinverse: \begin{align} b(v) &= C^{\dagger}(v - \hat{v}) = (AA^\top)^{\dagger}(v - \hat{v}). \tag{16} \end{align} Note that when $C$ is positive definite, $C^{\dagger} = C^{-1}$. We can now eliminate the requirement to optimize over $b$ in (12), (13), and (14). The optimization problem now assumes the following form.
Constrained Optimization using Pseudoinverse
\begin{align} v_{\star} &:= \text{argmin}_{v \in \mathcal{V}} \tilde{J}(v) \tag{17} \newline \tilde{J}(v) &:= \frac{1}{2} \lVert y - h(v)\rVert_R^2 + \frac{1}{2} \langle (AA^\top)^{\dagger}(v - \hat{v}), v-\hat{v}\rangle \tag{18} \newline \mathcal{V} &:= \\{v \in \mathbb{R}^d : v-\hat{v} \in \mathcal{R}(A)\\}. \tag{19} \end{align}
An important consequence of this formulation is that any solution must lie in the affine space $\hat{v} + \mathcal{R}(A)$; this is immediate from the definition of $\mathcal{V}$. In particular, if $C = AA^\top$ is positive definite then $\mathcal{R}(A) = \mathbb{R}^d$, so the solution space is unconstrained. Otherwise, the rank of $A$ imposes a constraint on the solution space.
Proposition.
A solution to the optimization problem (17) is constrained to the affine space $\hat{v} + \mathcal{R}(A)$, which has dimension $\text{rank}(A) \leq r$.
Unconstrained Optimization
We go one step further in the problem formulation before turning to solutions in the next section. Note that, in contrast to (1), (17) is a constrained optimization problem. We now re-formulate (17) as an unconstrained problem, which will be more convenient when considering solutions. As noted above, the solution space of (17) is given by the affine space $\hat{v} + \mathcal{R}(A)$. Any vector in this space can thus be written as \begin{align} v = \hat{v} + Aw = \hat{v} + \sum_{j=1}^{r} w_j a_j \tag{20} \end{align} for some weight vector $w \in \mathbb{R}^r$. We have denoted the $j$th column of $A$ by $a_j$. If the columns of $A$ are linearly independent, then $a_1, \dots, a_r$ provide a basis for $\mathcal{R}(A)$. If not, then there will be some redundancy in the weights and the representation (20) will not be unique. We can now substitute (20) for $v$ in (18) and thus formulate the optimization over the weights $w$: \begin{align} \tilde{J}(\hat{v} + Aw) = \frac{1}{2} \lVert y - h(\hat{v} + Aw)\rVert_R^2 + \frac{1}{2} \langle (AA^\top)^{\dagger}Aw, Aw\rangle. \tag{21} \end{align}
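The substitution leading to (21) can be checked directly. The sketch below (with an arbitrary nonlinear map standing in for $h$; all names are illustrative) evaluates the objective (18) at $v = \hat{v} + Aw$ and the weight-space objective (21) at $w$, and confirms the two values coincide.

```python
import numpy as np

rng = np.random.default_rng(4)
d, r, n = 6, 3, 4
A = rng.normal(size=(d, r))
R = np.eye(n)
G = rng.normal(size=(n, d))
h = lambda v: np.tanh(G @ v)         # a generic nonlinear observation map (illustrative)
v_hat = rng.normal(size=d)
y = rng.normal(size=n)

def J_tilde(v):
    """Objective (18), using the pseudoinverse of A A^T."""
    r1 = y - h(v)
    r2 = v - v_hat
    return 0.5 * r1 @ np.linalg.solve(R, r1) + 0.5 * r2 @ (np.linalg.pinv(A @ A.T) @ r2)

def J_weights(w):
    """Weight-space objective (21), obtained by substituting v = v_hat + A w."""
    r1 = y - h(v_hat + A @ w)
    Aw = A @ w
    return 0.5 * r1 @ np.linalg.solve(R, r1) + 0.5 * Aw @ (np.linalg.pinv(A @ A.T) @ Aw)

w = rng.normal(size=r)
print(np.isclose(J_tilde(v_hat + A @ w), J_weights(w)))   # True
```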
We can simplify this even further using the following properties of the pseudoinverse:
- By definition, the pseudoinverse satisfies $AA^{\dagger}A = A$.
- The matrix $A^{\dagger}A$ is an orthogonal projection onto the rowspace of $A$; in particular, it is idempotent (i.e., $(A^{\dagger}A)^2 = A^{\dagger}A$) and symmetric.
- The following identity holds: $A^\top(AA^\top)^{\dagger} = A^{\dagger}$. (These properties are verified numerically in the sketch below.)
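Here is the promised numerical check of the pseudoinverse properties, using a generic rank-deficient matrix unrelated to any particular filtering setup; the last line also checks the consequence, derived next, that the weighted inner product in (21) reduces to a squared Euclidean norm of projected weights.

```python
import numpy as np

rng = np.random.default_rng(5)
d, r = 6, 4
A = rng.normal(size=(d, r))
A[:, -1] = A[:, 0]                   # force linearly dependent columns (rank-deficient A)

A_pinv = np.linalg.pinv(A)
P = A_pinv @ A                       # orthogonal projection onto the rowspace of A

print(np.allclose(A @ A_pinv @ A, A))                        # A A^+ A = A
print(np.allclose(P @ P, P), np.allclose(P, P.T))            # idempotent and symmetric
print(np.allclose(A.T @ np.linalg.pinv(A @ A.T), A_pinv))    # A^T (A A^T)^+ = A^+

w = rng.normal(size=r)
lhs = (A @ w) @ (np.linalg.pinv(A @ A.T) @ (A @ w))          # <(A A^T)^+ A w, A w>
print(np.isclose(lhs, np.linalg.norm(P @ w) ** 2))           # equals ||A^+ A w||^2
```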
We can thus simplify the second term in (21) as \begin{align} \langle (AA^\top)^{\dagger}Aw, Aw\rangle = \langle A^\top(AA^\top)^{\dagger}Aw, w\rangle = \langle (A^{\dagger}A)w, w\rangle &= \langle (A^{\dagger}A)^2w, w\rangle \newline &= \langle (A^{\dagger}A)w, (A^{\dagger}A)w\rangle \newline &= \lVert (A^{\dagger}A)w \rVert^2, \tag{22} \end{align} where the second equality uses the third pseudoinverse property above. The third and fourth equalities follow from the fact that $A^{\dagger}A$ is idempotent and symmetric. We can thus re-parameterize as \begin{align} \tilde{w} := (A^{\dagger}A)w \tag{23} \end{align} and note that by the first pseudoinverse property we have \begin{align} A\tilde{w} = (AA^{\dagger}A)w = Aw, \tag{24} \end{align} which allows us to replace $Aw$ by $A\tilde{w}$ in the first term in (21). We obtain \begin{align} \tilde{J}(\hat{v} + A\tilde{w}) = \frac{1}{2} \lVert y - h(\hat{v} + A\tilde{w})\rVert_R^2 + \frac{1}{2} \lVert \tilde{w} \rVert^2. \tag{25} \end{align} We see that (25) is the sum of a model-data fit term plus a simple penalty on the weights. This final formulation is summarized below, where we have re-labelled $\tilde{w}$ as $w$ to lighten notation.
Unconstrained Formulation.
The optimization problem in (17) can equivalently be formulated as the unconstrained problem \begin{align} w_{\star} &:= \text{argmin}_{w \in \mathbb{R}^r} J_w(w) \tag{26} \end{align} with objective function \begin{align} J_w(w) &:= \frac{1}{2} \lVert y - h(\hat{v} + Aw)\rVert_R^2 + \frac{1}{2} \lVert w \rVert^2. \tag{27} \end{align} A solution of (17) is then recovered as $v_{\star} = \hat{v} + Aw_{\star}$.
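As a sketch of how (26)–(27) might be minimized numerically when $h$ is nonlinear (all names are illustrative; `scipy.optimize.least_squares` is used purely as a generic off-the-shelf solver, not as part of the formulation), note that after whitening the data misfit with a square root of $R^{-1}$ the objective is exactly half a sum of squared residuals.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(7)
d, r, n = 6, 3, 4
A = rng.normal(size=(d, r))
v_hat = rng.normal(size=d)
G = rng.normal(size=(n, d))
h = lambda v: np.tanh(G @ v)         # illustrative nonlinear observation map
R = 0.5 * np.eye(n)
y = h(v_hat + A @ rng.normal(size=r)) + 0.1 * rng.normal(size=n)

# Whitening matrix W with W^T W = R^{-1}, so that ||W x||^2 = x^T R^{-1} x.
W = np.linalg.cholesky(np.linalg.inv(R)).T

def residuals(w):
    """Stacked residuals whose squared norm (times 1/2) equals the objective (27)."""
    return np.concatenate([W @ (y - h(v_hat + A @ w)), w])

sol = least_squares(residuals, x0=np.zeros(r))
v_star = v_hat + A @ sol.x           # map the optimal weights back to the state space
print(sol.cost, v_star)              # sol.cost is the minimized value of (27)
```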
One might initially take issue with the claim that (26) is actually unconstrained, given that it relies on the re-parameterization (23), which constrains the weights to lie in the range of $A^{\dagger}A$. However, recalling that $A^{\dagger}A$ is an orthogonal projection, we have the following projection property: \begin{align} \lVert (A^{\dagger}A)w \rVert \leq \lVert w \rVert \quad \text{for all } w \in \mathbb{R}^r. \tag{28} \end{align} In other words, replacing any $w$ by its projection $(A^{\dagger}A)w$ leaves the first term in (27) unchanged (since $A(A^{\dagger}A)w = Aw$) and can only decrease the penalty term. Dropping the constraint therefore does not change the minimum, so we are justified in treating the weights as unconstrained in (26). In fact, we could have jumped right to this conclusion from (22) and avoided needing to define $\tilde{w}$ at all.
Solving the Optimization Problem.
With the optimization problem (17) defined, we now consider the characterization of its solution. In general, this is challenging due to the potential nonlinearity of $h$. We start by considering the simplified linear case, before returning to the more general setting.
Linear $h$
Assume that $h(v) = Hv$ for some matrix $H \in \mathbb{R}^{n \times d}$. Under a data assimilation interpretation, this corresponds to the common assumption of a linear observation operator.
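Anticipating the derivation, the linear case admits a closed-form solution. The sketch below (illustrative names) computes the minimizer of the weight-space problem (26)–(27) via its normal equations and checks that mapping back to the state space reproduces the familiar Kalman-gain form of the update with $C = AA^\top$.

```python
import numpy as np

rng = np.random.default_rng(8)
d, r, n = 6, 3, 4
A = rng.normal(size=(d, r))
C = A @ A.T                          # (possibly singular) forecast covariance, C = A A^T
H = rng.normal(size=(n, d))          # linear observation operator
R = 0.4 * np.eye(n)
v_hat = rng.normal(size=d)
y = H @ v_hat + rng.normal(size=n)

# Weight-space normal equations for the linear case of (26)-(27):
# (A^T H^T R^{-1} H A + I) w = A^T H^T R^{-1} (y - H v_hat).
B = H @ A
lhs = B.T @ np.linalg.solve(R, B) + np.eye(r)
rhs = B.T @ np.linalg.solve(R, y - H @ v_hat)
w_star = np.linalg.solve(lhs, rhs)
v_star = v_hat + A @ w_star

# Equivalent Kalman-gain form of the same update: v_star = v_hat + K (y - H v_hat),
# with K = C H^T (H C H^T + R)^{-1}.
K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)
print(np.allclose(v_star, v_hat + K @ (y - H @ v_hat)))   # True
```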
References
- Data Assimilation Fundamentals (Vossepoel, Evensen, and van Leeuwen; 2022)
- Inverse Problems and Data Assimilation (Stuart, Taeb, and Sanz-Alonso)
- Ensemble Kalman Methods with Constraints (Albers et al., 2019)
TODOs
Introduce the EnKF optimization by extending the state space.