Basis Expansions for Black-Box Function Emulation
A discussion of the popular output dimension reduction strategy for emulating multi-output functions.
Setup
Suppose we are working with a function \begin{align} \mathcal{G}: \mathcal{U} \subset \mathbb{R}^d \to \mathbb{R}^p, \tag{1} \end{align} where the output dimension $p$ is presumed to be "large", potentially in the hundreds or thousands. We treat $\mathcal{G}$ generically as a black-box function, but in common applications of interest it takes the form of an expensive computer simulation model. We will therefore use the terms black-box function, computer model, and simulator interchangeably throughout this post. Our primary goal is to construct a model that approximates the map $u \mapsto \mathcal{G}(u)$. We will refer to such a model as an emulator or surrogate model. The implicit assumption here is that computing $\mathcal{G}(u)$ is quite expensive, so the idea is to replace $\mathcal{G}$ with a computationally cheaper approximation. If you don't care about emulating expensive computer models, you can also view this generically as a regression (or interpolation) problem with a very high-dimensional output space. To further elucidate the connection, suppose that we evaluate the function at a set of points $u_1, \dots, u_n \in \mathcal{U}$, resulting in the input-output pairs $\{(u_i, \mathcal{G}(u_i))\}_{i=1}^{n}$. We can now treat these pairs as training examples that can be used to fit a predictive model. From this point of view, the primary distinction from more traditional regression is that in this case we get to choose the input points at which the function is evaluated.
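To make the setup concrete, here is a small numpy sketch with a hypothetical toy "simulator" standing in for $\mathcal{G}$; the function, dimensions, and design are purely illustrative.

```python
import numpy as np

# Hypothetical toy simulator: maps d = 2 inputs to a p = 500 dimensional output
# (e.g., a time series on a fine grid). A stand-in for an expensive computer model.
def simulator(u, t=np.linspace(0.0, 1.0, 500)):
    return u[0] * np.sin(2 * np.pi * t) + np.exp(-u[1] * t)

# Evaluate the function at n design inputs to obtain the training pairs {(u_i, G(u_i))}.
rng = np.random.default_rng(0)
U = rng.uniform(size=(50, 2))                # n x d matrix of design inputs
G = np.stack([simulator(u) for u in U])      # n x p matrix of simulator outputs
print(U.shape, G.shape)                      # (50, 2) (50, 500)
```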
The methods we discuss below attempt to address two primary challenges:
- Function evaluations are expensive, and thus we seek to minimize the number of points at which $\mathcal{G}$ is evaluated.
- The output space of $\mathcal{G}$ is very high-dimensional.
The second challenge can be problematic for standard regression methods, which
are often tailored to scalar-valued outputs. The obvious solution here might
be to fit a separate model to predict each individual output; i.e., a model
to emulate the map $u \mapsto \mathcal{G}_j(u)$ for each $j = 1, \dots, p$.
With parallel computing resources, such an approach might even be feasible for
very large values of $p$. However, larger numbers of outputs typically come with more
structure; e.g., the outputs may consist of a time series or spatial fields.
The independent output-by-output regression approach completely fails to
leverage this structure. The method discussed in this post seeks to take
advantage of such structure by finding a set of basis vectors that explain the
majority of the variation in the outputs. We proceed by first generically
discussing basis representations of the output space, and how such structure
can be leveraged to address the two challenges noted above. With the general
method defined, we conclude by discussing details for a specific choice of
basis approximation (principal components analysis) and a specific choice
of emulator model (Gaussian process regression).
Basis Representation
From a very generic perspective, the approach discussed here can be thought of as a form of dimension reduction applied to the output space of a regression problem. It is perhaps more typical to see dimensionality reduction applied to the input space in such settings, with principal component regression being the obvious example. In this section, we start by discussing a low-dimensional basis representation of the output space, and then explore how such a representation can be leveraged in solving the regression problem. Throughout this introduction, I will assume that the output of $\mathcal{G}$ is centered; this is made explicit later on when considering a concrete application of PCA.
A Basis Representation of the Output Space
Let's start by considering approximately representing vectors in the range of $\mathcal{G}$, denoted $\mathcal{R}(\mathcal{G}) \subseteq \mathbb{R}^p$, with respect to a set of orthonormal basis vectors $b_1, \dots, b_r \in \mathbb{R}^p$, where $r \ll p$. Given such a set of vectors, we can approximate $g \in \mathcal{R}(\mathcal{G})$ by its projection onto the subspace $\mathrm{span}(b_1, \dots, b_r)$:
\begin{align}
g &\approx \sum_{j=1}^{r} \langle g, b_j \rangle b_j. \tag{2}
\end{align}
If we stack the basis vectors as columns in a matrix $B := [b_1, \dots, b_r] \in \mathbb{R}^{p \times r}$, then we can write this projection compactly as
\begin{align}
g &\approx BB^\top g. \tag{3}
\end{align}
We see that $BB^\top$ is the projection matrix that projects onto the span of the basis vectors. With regards to dimensionality reduction, the benefit here is that the simulator output can now be (approximately) represented using the $r$ numbers $\langle g, b_1 \rangle, \dots, \langle g, b_r \rangle$. We can now ask the question: how do we find the basis vectors $b_1, \dots, b_r$? If we are given a set of example vectors $g_1, \dots, g_n \in \mathcal{R}(\mathcal{G})$, we can take an empirical approach and try to use these examples to determine a basis $B$ that is optimal in some well-defined sense. Assuming that $\mathcal{R}(\mathcal{G})$ is indeed a subspace and we define "optimal" in an average squared error sense, the problem we have laid out here is exactly that of principal components analysis (PCA), a topic I discuss in depth in this post. The only difference is that we are applying PCA on the subspace $\mathcal{R}(\mathcal{G})$ rather than on all of $\mathbb{R}^p$. At this point, we should emphasize that in practice $\mathcal{R}(\mathcal{G})$ will often not be a subspace. Computer simulations may produce outputs that are subject to certain constraints, and thus $\mathcal{R}(\mathcal{G})$ may represent a more complicated subset of $\mathbb{R}^p$. In these cases, one can still typically apply the PCA algorithm to obtain $B$, but the result may be sub-optimal. Alternate methods of basis construction may be warranted depending on the problem at hand.
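As a small numerical illustration of the projection (2)-(3), here is a numpy sketch using a random orthonormal basis as a stand-in for $B$:

```python
import numpy as np

# Minimal sketch: approximate a vector g in R^p by its projection onto the span
# of r orthonormal basis vectors (the columns of B). Here B is a random
# orthonormal matrix; in practice it would come from PCA, as discussed below.
p, r = 500, 5
rng = np.random.default_rng(1)
B, _ = np.linalg.qr(rng.standard_normal((p, r)))   # p x r, orthonormal columns

g = rng.standard_normal(p)
w = B.T @ g              # r-dimensional summary: w_j = <g, b_j>
g_approx = B @ w         # projection B B^T g onto span(b_1, ..., b_r)
```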
Linking the Basis Representation with the Input Parameters
In the previous subsection, we considered approximating vectors in the range of the simulator with respect to a set of basis vectors $b_1, \dots, b_r$. However, recall that our underlying goal here is to approximate the map $u \mapsto \mathcal{G}(u)$. We thus need to consider how to leverage the basis representation of the output space in achieving this goal. Assuming we have already constructed the basis vectors, the above map can be approximated as
\begin{align}
u \mapsto BB^\top \mathcal{G}(u). \tag{4}
\end{align}
In words: feed $u$ through the simulator and project the resulting output onto the low-dimensional subspace spanned by the basis vectors. Note that $B^\top \mathcal{G}(u) \in \mathbb{R}^r$ stores the weights defining the projection of $\mathcal{G}(u)$ onto the subspace generated by $b_1, \dots, b_r$, thus providing a low-dimensional summary of the simulator output. Let's introduce the notation
\begin{align} w(u) &:= B^\top \mathcal{G}(u) = \left[w_1(u), \dots, w_r(u) \right]^\top \in \mathbb{R}^r, && w_j(u) := \langle \mathcal{G}(u), b_j \rangle \end{align}
to denote these weights. The basis function approximation to the simulator can thus be written as
\begin{align}
\mathcal{G}(u) &\approx B w(u) = \sum_{j=1}^{r} w_j(u) b_j. \tag{5}
\end{align}
At this point, this isn't helpful on its own, since the expensive simulator still needs to be run every time $w(u)$ is evaluated. To address this, we now turn back to the idea of using emulators. Recall that such an approach was originally hindered by the high-dimensional output space of the simulator. However, under the approximation (5), the dependence on $u$ has been reduced to $w(u)$, which effectively reduces the output dimension from $p$ to $r$. The idea is thus to use some sort of statistical model to emulate the map $u \mapsto w(u)$. Suppressing all details for now, let's suppose we have fit such a model $\hat{w}(u) \approx w(u)$. We can now plug $\hat{w}$ in place of $w$ in (5) to obtain the approximation
\begin{align}
\mathcal{G}(u) &\approx B \hat{w}(u) = \sum_{j=1}^{r} \hat{w}_j(u) b_j. \tag{6}
\end{align}
This approximation no longer requires running the full simulator, since evaluating (6) just requires (1) computing the emulator prediction $\hat{w}(u)$ at $u$; and (2) applying $B$ to the emulator prediction. It is worth emphasizing how this approach compares to the direct emulation method. In place of directly trying to approximate the map from $u$ to the $p$-dimensional model output $\mathcal{G}(u)$, we are now considering approximating the map from $u$ to inner products of $\mathcal{G}(u)$ with a small number of basis vectors. The hope is that these inner products are sufficient to capture the majority of information in the model response.
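Schematically, the plug-in surrogate (6) is just a composition of a fitted weight emulator with the basis matrix. In the sketch below, `w_hat` is a placeholder for any regression model exposing a scikit-learn-style `predict` method that returns the $r$ weights; the specific emulator model is discussed later.

```python
import numpy as np

# Schematic of the plug-in surrogate (6): predict the r weights with a fitted
# emulator, then map back to the p-dimensional output space via B.
def surrogate(u, w_hat, B):
    w_pred = np.asarray(w_hat.predict(u.reshape(1, -1)))  # shape (1, r)
    return (B @ w_pred.T).ravel()                          # approximate G(u), shape (p,)
```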
The General Emulation Model
In the preceding subsections, we introduced the idea of representing the output space (range) of with respect to a low-dimensional basis in order to facilitate emulation of the map . For concreteness, we considered an orthogonal basis, whereby approximations of take the form of orthogonal projections onto the basis vectors. In this section, we take a step back and define the general model, which encompasses basis methods beyond the orthogonal projection framework.
The Basis Function Emulation Model. Given a set of vectors $b_1, \dots, b_r \in \mathbb{R}^p$, we refer to a decomposition of the form
\begin{align}
\mathcal{G}(u) &= \sum_{j=1}^{r} w_j(u) b_j + \epsilon(u) \tag{7}
\end{align}
as the basis function emulation model. We write $\hat{w}(u) = [\hat{w}_1(u), \dots, \hat{w}_r(u)]^\top$ to denote an emulator model that approximates the map $u \mapsto w(u) := [w_1(u), \dots, w_r(u)]^\top$.
The basis function emulation model decomposes the computer model output $\mathcal{G}(u)$ into
- a piece $\sum_{j=1}^{r} w_j(u) b_j$ that can be explained by a linear combination of basis functions that are independent of $u$;
- the residual $\epsilon(u)$, representing all variation unaccounted for by the basis functions.
We emphasize that the basis functions $b_1, \dots, b_r$ are independent of the input $u$; the effect of the inputs is restricted to the coefficients $w_1(u), \dots, w_r(u)$, with unaccounted-for $u$-dependence absorbed by the residual term $\epsilon(u)$. As noted in the previous section, if we opt for an orthogonal projection approach, then the true weights assume the form
\begin{align}
w_j(u) &= \langle \mathcal{G}(u), b_j \rangle, \qquad j = 1, \dots, r, \tag{8}
\end{align}
but the general model (7) allows for other approaches as well. Under different decomposition strategies, the true weights may not be given by the inner products (8). Nonetheless, we can still consider applying statistical models to approximate the underlying weight maps $u \mapsto w_j(u)$, regardless of what form these maps may take.
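A quick numerical check of the orthogonal-projection special case: under (8), the residual in (7) carries no component along the basis vectors. The basis below is just a random orthonormal stand-in.

```python
import numpy as np

# Under the orthogonal-projection choice (8), the residual eps(u) = G(u) - B w(u)
# is orthogonal to every basis vector. Quick check with a random orthonormal basis
# standing in for B and a random vector standing in for the output G(u).
rng = np.random.default_rng(3)
p, r = 200, 4
B, _ = np.linalg.qr(rng.standard_normal((p, r)))

g = rng.standard_normal(p)       # stand-in for a simulator output G(u)
w = B.T @ g                      # weights (8)
eps = g - B @ w                  # residual in the decomposition (7)
print(np.allclose(B.T @ eps, 0)) # True: residual has no component along the basis
```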
Concrete Details
Having laid out the general model, we now provide concrete details for specific choices of the basis construction and the emulator model. For the former, we consider the popular PCA/SVD approach, which we have already hinted at above. For the latter, we consider the use of Gaussian processes (GPs). The combination of these two choices was first presented in the seminal paper Higdon et al. (2008). I have a whole in-depth post on PCA, so I will assume general background knowledge on this topic.
Constructing the PCA Basis
Let us consider the construction of the basis vectors using a principal components analysis (PCA) approach. Depending on the field, this strategy might also be termed singular value decomposition (SVD), proper orthogonal decomposition (POD), or empirical orthogonal functions (EOF).
Initial Design
In the absence of prior knowledge about the structure of $\mathcal{R}(\mathcal{G})$, PCA takes a purely empirical approach; the simulator is evaluated at a variety of different inputs in order to understand the patterns of variability induced in the model outputs. We refer to the resulting input-output pairs $\{(u_i, \mathcal{G}(u_i))\}_{i=1}^{n}$ as the design or design points. In practice, the availability of parallel computing resources typically means that the model simulations can be run in parallel, thus reducing the computational cost of generating the design. The input design points $u_1, \dots, u_n$ are often chosen to vary "uniformly" over the input space $\mathcal{U}$ in some sense. More generally, we might assume that the inputs are governed by some prior distribution $\rho$ on $\mathcal{U}$, which might encode prior knowledge about model parameters or serve to place higher weight on regions of the input space that are deemed more important. In this case, the design inputs might be sampled independently according to $\rho$. In any case, the spread of the inputs ought to be chosen so that the set of corresponding outputs $\mathcal{G}(u_1), \dots, \mathcal{G}(u_n)$ is representative of variation in the simulator output under typical use cases.
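As one concrete (and common, though by no means required) choice, a Latin hypercube design spreads points evenly over a box-shaped input space. Below is a sketch using scipy's QMC module, with hypothetical box bounds standing in for $\mathcal{U}$.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical design sketch: a Latin hypercube over a box-shaped input space,
# one common way of spreading design inputs "uniformly" over U.
d, n = 3, 100
sampler = qmc.LatinHypercube(d=d, seed=0)
U_unit = sampler.random(n)                      # n points in [0, 1]^d
lower, upper = np.zeros(d), np.full(d, 10.0)    # placeholder box bounds for U
U = qmc.scale(U_unit, lower, upper)             # n x d matrix of design inputs
```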
PCA
We denote the design outputs by $g_i := \mathcal{G}(u_i)$, and consider stacking these vectors into the rows of a matrix
\begin{align}
G &:= [g_1, \dots, g_n]^\top \in \mathbb{R}^{n \times p}. \tag{9}
\end{align}
Define the empirical mean
\begin{align}
\bar{g} &:= \frac{1}{n} \sum_{i=1}^{n} g_i. \tag{10}
\end{align}
Since PCA is typically defined with respect to a centered data matrix, let $G_c$ denote the analog of $G$ constructed using the centered outputs $g_1 - \bar{g}, \dots, g_n - \bar{g}$. In other words, $G_c$ results from subtracting $\bar{g}$ from each row of $G$. The rows of $G_c$ provide a summary of the variation in the simulator outputs due to variation in the inputs. Each column of $G_c$ represents a single dimension in the output space. We seek to identify a small number of vectors $b_1, \dots, b_r$ such that the observed outputs can be explained as a linear combination of these vectors. PCA produces these vectors by computing an eigendecomposition of the (unnormalized) empirical covariance matrix
\begin{align}
\hat{C} &:= G_c^\top G_c \in \mathbb{R}^{p \times p} \tag{11}
\end{align}
and defining $B$ to consist of the $r$ dominant eigenvectors of $\hat{C}$. We denote the eigendecomposition by
\begin{align}
\hat{C} &= V \Lambda V^\top, \tag{12}
\end{align}
where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ contains the eigenvalues sorted in decreasing order of their magnitude, with $V = [v_1, \dots, v_p]$ storing the respective normalized eigenvectors as columns. If we truncate to the $r$ dominant eigenvectors,
\begin{align}
&V_r := [v_1, \dots, v_r], && \Lambda_r := \mathrm{diag}(\lambda_1, \dots, \lambda_r), \tag{13}
\end{align}
then we define our basis vectors by \begin{align} &B := V_r, &&b_j := v_j, \qquad j = 1, \dots, r. \tag{14} \end{align} Note that $\hat{C}$ is positive semidefinite, so $\lambda_j \geq 0$ for all $j$. We summarize these ideas below.
PCA Approximation. Given the design $\{(u_i, g_i)\}_{i=1}^{n}$, the PCA-induced approximation to $\mathcal{G}$ is given by
\begin{align}
\mathcal{G}(u) &\approx \bar{g} + \sum_{j=1}^{r} w_j(u) b_j, && w_j(u) := \langle \mathcal{G}(u) - \bar{g}, b_j \rangle, \tag{15}
\end{align}
where $\bar{g}$ is defined in (10) and $b_1 = v_1, \dots, b_r = v_r$ denote the $r$ dominant normalized eigenvectors of the matrix $\hat{C}$ defined in (11). The residual term is given by
\begin{align}
\epsilon(u) &:= \mathcal{G}(u) - \bar{g} - \sum_{j=1}^{r} w_j(u) b_j = (I - BB^\top)\left(\mathcal{G}(u) - \bar{g}\right). \tag{16}
\end{align}
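The construction above is straightforward to implement. Below is a minimal numpy sketch, assuming the design outputs have already been stacked into an $n \times p$ array `G`; the helper name `pca_basis` is purely illustrative and is reused in later snippets.

```python
import numpy as np

# Minimal sketch of the PCA basis construction (9)-(14). For large p it is often
# cheaper to compute a truncated SVD of the centered matrix Gc rather than forming
# the p x p matrix C explicitly; the result is the same.
def pca_basis(G, r):
    g_bar = G.mean(axis=0)                  # empirical mean (10)
    Gc = G - g_bar                          # centered design matrix
    C = Gc.T @ Gc                           # unnormalized empirical covariance (11)
    eigvals, eigvecs = np.linalg.eigh(C)    # eigendecomposition (12), ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort eigenvalues in decreasing order
    lam = eigvals[order][:r]                # dominant eigenvalues, diagonal of Lambda_r (13)
    B = eigvecs[:, order[:r]]               # dominant eigenvectors, B := V_r (14)
    return g_bar, B, lam
```

Given a new simulator output $g = \mathcal{G}(u)$, the associated weights are then computed as `B.T @ (g - g_bar)`, matching (15).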
Gaussian Process Emulators
The above section shows that the PCA approach yields the basis vectors $b_1, \dots, b_r$ with associated weight maps $w_1(\cdot), \dots, w_r(\cdot)$. In this section we consider approximating the maps $u \mapsto w_j(u)$ with Gaussian processes (GPs). The specific features of the PCA basis construction provide useful information in designing a reasonable GP model. In particular, we will end up with a set of independent GPs, each with prior mean zero and prior marginal variance (scale) one. We justify each of these choices in turn, and then summarize the GP prior below. We then conclude by deriving how the GP predictive distributions induce a stochastic emulator for $\mathcal{G}$.
Zero Prior Mean
Start by noting that the sample mean of the weights $w_j(u_1), \dots, w_j(u_n)$ is zero for each $j$. Indeed,
\begin{align}
\frac{1}{n} \sum_{i=1}^{n} w_j(u_i) &= \frac{1}{n} \sum_{i=1}^{n} \langle g_i - \bar{g}, b_j \rangle = \left\langle \frac{1}{n} \sum_{i=1}^{n} (g_i - \bar{g}), b_j \right\rangle = \langle 0, b_j \rangle = 0.
\end{align}
The weight functions are thus centered about zero with respect to the design points. If the design points $u_1, \dots, u_n$ are indicative of typical variation in the outputs, then the $w_j(\cdot)$ should also be approximately centered with respect to $\rho$, not just over the design points $u_1, \dots, u_n$. Therefore, it appears reasonable to define zero-mean prior distributions on the $w_j(\cdot)$.
Independent GPs
Next, we justify the choice to model each $w_j(\cdot)$ separately as an independent GP. For this, we point to a result proved in the PCA post; namely, if we define $w_i := B^\top (g_i - \bar{g})$, then the vectors $w_1, \dots, w_n$ have empirical covariance
\begin{align}
\frac{1}{n} \sum_{i=1}^{n} w_i w_i^\top &= \frac{1}{n} \Lambda_r, \tag{17}
\end{align}
with $\Lambda_r$ given in (13). In words, the weight vectors have zero sample covariance across dimensions. Similar in spirit to the zero-mean case, we hope that this fact approximately holds true beyond the sample of design points. Thus, modeling the processes $w_j(\cdot)$ independently for each $j$ seems to be a reasonable choice.
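Both facts used so far (zero sample mean of the weights, and diagonal empirical covariance) are easy to check numerically. The sketch below reuses the hypothetical `pca_basis` helper from the earlier snippet, with a random stand-in for the design outputs.

```python
import numpy as np

# Numerical check: the PCA weights have zero sample mean and empirical covariance
# equal to Lambda_r / n, i.e. zero sample covariance across dimensions.
rng = np.random.default_rng(2)
G = rng.standard_normal((50, 200))             # stand-in for the n x p design outputs
g_bar, B, lam = pca_basis(G, r=5)              # helper from the earlier sketch

W = (G - g_bar) @ B                            # n x r weights, rows w_i = B^T (g_i - g_bar)
C_w = W.T @ W / len(G)                         # empirical covariance of the weights
print(np.allclose(W.mean(axis=0), 0.0))        # True: zero sample mean for each weight map
print(np.allclose(C_w, np.diag(lam) / len(G))) # True: diagonal covariance, matching (17)
```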
Unit Scale
Finally, we justify the decision to define a unit-scale prior distribution for each $w_j(\cdot)$. From (17), we see that the empirical variance of $w_j(\cdot)$ over the design is $\lambda_j / n$, which is proportional to $\lambda_j$. To make things simpler, let's define the scaled weights
\begin{align}
\tilde{w}_j(u) &:= \sqrt{\frac{n}{\lambda_j}} \, w_j(u),
\end{align}
so that $\tilde{w}_j(\cdot)$ now has unit sample variance over the design. If we emulate the scaled weights $\tilde{w}_j(\cdot)$, a prior marginal variance of one is then a natural default; the factor $\sqrt{\lambda_j / n}$ is simply reintroduced when mapping emulator predictions back to the output space.
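Putting the pieces together, here is a minimal sketch of the resulting emulator, again assuming the `pca_basis` helper from above: one independent, zero-mean GP per scaled weight, with predictions mapped back to the output space via the basis vectors and the empirical mean. scikit-learn is used purely for illustration, with kernel choice and hyperparameter handling kept deliberately simple, and the retained eigenvalues are assumed strictly positive.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_basis_gp_emulator(U, G, r):
    # Construct the PCA basis and scale the weights to unit sample variance.
    g_bar, B, lam = pca_basis(G, r)
    scale = np.sqrt(len(G) / lam)                 # shape (r,); assumes lam > 0
    W_scaled = ((G - g_bar) @ B) * scale          # n x r scaled weights
    # One independent GP per weight; the default zero prior mean matches the argument above.
    gps = [GaussianProcessRegressor(kernel=RBF()).fit(U, W_scaled[:, j]) for j in range(r)]
    return {"g_bar": g_bar, "B": B, "scale": scale, "gps": gps}

def emulate(emulator, u):
    # Predict the scaled weights, undo the scaling, and map back to output space.
    u = np.atleast_2d(u)
    w_scaled = np.column_stack([gp.predict(u) for gp in emulator["gps"]])
    w = w_scaled / emulator["scale"]
    return emulator["g_bar"] + w @ emulator["B"].T    # shape (m, p)
```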
References
- Higdon, D., et al. (2008). "Computer Model Calibration using High Dimensional Output".
- JMS & Williamson, D. B. (2020). "Efficient calibration for high-dimensional computer model output using basis methods". arXiv preprint arXiv:1906.05758.
- JMS, Dodwell, T. J., et al. (2021). "A History Matching Approach to Building Full-Field Emulators in Composite Analysis".
- JMS, Williamson, D. B., Scinocca, J., and Kharin, V. (2019). "Uncertainty quantification for computer models with spatial output using calibration-optimal bases". Journal of the American Statistical Association, 114(528), 1800-1814.
- Chang, W., et al. (2014). "Probabilistic calibration of a Greenland Ice Sheet model using spatially resolved synthetic observations: toward projections of ice mass loss with uncertainties". Geoscientific Model Development, 7(5), 1933-1943.