Precision Matrices, Partial Correlations, and Conditional Independence
Interpreting the inverse covariance matrix for Gaussian distributions.
In this post, we explore how the precision (inverse covariance) matrix for a Gaussian distribution encodes conditional dependence relations between the underlying variables. We also introduce the notions of conditional and partial correlation, and prove their equivalence in the Gaussian setting. Ideas along these lines are widely used in areas such as probabilistic graphical models (e.g., Gaussian Markov random fields and scalable Gaussian processes) and high-dimensional covariance estimation. A natural complement to this topic is my post on the Cholesky decomposition of a covariance matrix, which provides an alternate route to analyzing conditional dependence structure. The main source for this post is (Lauritzen, 2019), which is freely available online and provides an excellent rigorous introduction to graphical models in statistics.
Setup and Notation
Throughout this post we consider a set of random variables $x_1, \dots, x_n$, each taking values in $\mathbb{R}$. For an (ordered) index set $A \subseteq \{1, \dots, n\}$, we write $x_A := (x_i)_{i \in A}$ to denote the column vector that retains the ordering of $A$. We will use the convention that $B := \{1, \dots, n\} \setminus A$ is the complement of $A$, and use the shorthand $x_{-i} := x_{\{1, \dots, n\} \setminus \{i\}}$ for the vector of all variables excluding $x_i$. We also write $x_i \perp x_j \mid x_B$ to mean that $x_i$ and $x_j$ are conditionally independent given $x_B$.
The Precision Matrix of a Gaussian
In this section we will explore how the precision matrix of a Gaussian distribution is closely related to the conditional dependence structure of the random variables $x_1, \dots, x_n$. Throughout this section, we assume \begin{align} x := (x_1, \dots, x_n)^\top &\sim \mathcal{N}(m, C), \tag{1} \end{align} where $C$ is positive definite. Hence, it is invertible, and we denote its inverse by \begin{align} P := C^{-1}, \tag{2} \end{align} which we refer to as the precision matrix. The precision inherits positive definiteness from $C$. In some contexts (e.g., (Lauritzen, 2019)), $P$ is also called the concentration matrix.
Throughout this section, our focus will be on the dependence between two variables $x_i$ and $x_j$, conditional on all others. Thus, let’s define the index sets $A := \{i, j\}$ and $B := \{1, \dots, n\} \setminus A$. We partition the joint Gaussian (1) (after possibly reordering the variables) as \begin{align} \begin{bmatrix} x_A \newline x_B \end{bmatrix} &\sim \mathcal{N}\left( \begin{bmatrix} m_A \newline m_B \end{bmatrix}, \begin{bmatrix} C_A & C_{AB} \newline C_{BA} & C_B \end{bmatrix} \right). \tag{3} \end{align}
Our focus is thus on the conditional distribution of $x_A$ given $x_B$. We recall that conditionals of Gaussians are themselves Gaussian, and that the conditional covariance takes the form \begin{align} C_{A|B} := \text{Cov}[x_A|x_B] = C_A - C_{AB}(C_B)^{-1}C_{BA}. \tag{4} \end{align} I go through these derivations in depth in this post. For our present purposes, it is important to appreciate the connection between the conditional covariance (4) and the joint precision $P$. To this end, let’s consider partitioning the precision in the same manner as the covariance: \begin{align} P &= C^{-1} = \begin{bmatrix} P_A & P_{AB} \newline P_{BA} & P_B \end{bmatrix}. \tag{5} \end{align} The above blocks of $P$ can be obtained via a direct application of partitioned matrix inverse identities from linear algebra (see, e.g., James E. Pustejovsky’s post for some nice background). Applying the partitioned matrix identity to (5) yields \begin{align} P_A &= [C_A - C_{AB}(C_B)^{-1}C_{BA}]^{-1} \tag{6} \newline P_{BA} &= -(C_B)^{-1} C_{BA}P_A. \tag{7} \end{align} Notice in (6) that $P_A$, the upper-left block of the joint precision, is precisely equal to the inverse of the conditional covariance $C_{A|B}$ given in (4). We denote this conditional precision by \begin{align} P_{A|B} := (C_{A|B})^{-1}. \tag{8} \end{align} We summarize this important connection below.
Joint and Conditional Precision.
The precision matrix of the conditional distribution $x_A | x_B$ is given by \begin{align} P_{A|B} = P_A. \tag{9} \end{align} In words, the conditional precision is obtained by deleting the rows and columns of the joint precision $P$ that involve the conditioning variables $x_B$.
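As a quick numerical sanity check, here is a small numpy sketch (the dimension, random covariance, and index sets are arbitrary choices for illustration, not anything from the derivation above) confirming that the upper-left block $P_A$ of the joint precision equals the inverse of the conditional covariance (4).

```python
import numpy as np

rng = np.random.default_rng(0)

# A random positive definite covariance matrix for n = 5 variables.
n = 5
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)
P = np.linalg.inv(C)  # joint precision

# Partition with A = {0, 1} and B = {2, 3, 4} (zero-based indices).
A, B = [0, 1], [2, 3, 4]

# Conditional covariance C_{A|B} = C_A - C_{AB} C_B^{-1} C_{BA}, as in (4).
C_cond = C[np.ix_(A, A)] - C[np.ix_(A, B)] @ np.linalg.solve(C[np.ix_(B, B)], C[np.ix_(B, A)])

# Its inverse matches the sub-block of the precision obtained by deleting
# the rows and columns indexed by B, as claimed in (9).
assert np.allclose(np.linalg.inv(C_cond), P[np.ix_(A, A)])
```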
This connection also leads us to our main result, which states that conditional independence can be inferred from zero entries of the precision matrix. This result follows immediately by rearranging (9) to $C_{A|B} = (P_A)^{-1}$ and noting that $P_A$ is simply a two-by-two matrix that we can consider inverting by hand.
Zeros in Precision imply Conditional Independence.
An entry $P_{ij}$, $i \neq j$, of the joint precision matrix is zero if and only if $x_i$ and $x_j$ are conditionally independent, given all other variables. That is, \begin{align} P_{ij} = 0 \iff x_i \perp x_j \mid x_B, \end{align} where $B := \{1, \dots, n\} \setminus \{i, j\}$.
Proof. Setting $A := \{i, j\}$, the above derivation showed $C_{A|B} = (P_A)^{-1}$, where $P_A$ is the two-by-two block of the joint precision corresponding to the variables $x_i$ and $x_j$. Thus, $\text{Cov}[x_i, x_j|x_B]$ is the off-diagonal entry of $(P_A)^{-1}$. We use the well-known formula for the inverse of a two-by-two matrix to obtain \begin{align} (P_A)^{-1} &= \begin{bmatrix} P_{ii} & P_{ij} \newline P_{ji} & P_{jj} \end{bmatrix}^{-1} = \frac{1}{P_{ii}P_{jj}- P_{ij}^2} \begin{bmatrix} P_{jj} & -P_{ji} \newline -P_{ij} & P_{ii}\end{bmatrix}. \end{align} Notice that the off-diagonal entries of $(P_A)^{-1}$ agree with those of $P_A$ up to a sign flip and a positive scaling factor (positive since $P_A$ is positive definite). This means \begin{align} \text{Cov}[x_i, x_j|x_B] = 0 \iff P_{ij} = 0, \end{align} so the result is proved. The fact that conditional uncorrelatedness implies conditional independence follows from the fact that $x_A | x_B$ is Gaussian.
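To see the result in action, the following numpy sketch (the tridiagonal precision is an arbitrary illustrative choice; indices are zero-based) constructs a Gaussian in which $x_0$ and $x_3$ have a zero precision entry, and verifies that their conditional covariance given the remaining variables vanishes even though they are marginally correlated.

```python
import numpy as np

# Tridiagonal precision: only "neighboring" variables are conditionally dependent.
n = 5
P = (
    np.diag(np.full(n, 2.0))
    + np.diag(np.full(n - 1, -0.8), k=1)
    + np.diag(np.full(n - 1, -0.8), k=-1)
)
C = np.linalg.inv(P)

# P[0, 3] == 0, so x_0 and x_3 should be conditionally independent given the rest.
i, j = 0, 3
A = [i, j]
B = [k for k in range(n) if k not in A]
C_cond = C[np.ix_(A, A)] - C[np.ix_(A, B)] @ np.linalg.solve(C[np.ix_(B, B)], C[np.ix_(B, A)])

print(C_cond[0, 1])  # conditional covariance: ~0 (up to floating point error)
print(C[i, j])       # marginal covariance: clearly nonzero
```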
The above result interprets the zero values of off-diagonal elements of the precision matrix. Later in this post we will revisit the precision, and see that non-zero values can be interpreted through the lens of partial correlation. An interpretation of the magnitude of the diagonal elements is more straightforward, and is given in the below result.
Diagonal Entries of Precision.
The diagonal entry $P_{ii}$ of the precision gives the reciprocal of the variance of $x_i$, conditional on all other variables; that is, \begin{align} \text{Var}[x_i|x_{-i}] = \frac{1}{P_{ii}}. \end{align}
Proof. We know from (9) that $C_{A|B} = (P_A)^{-1}$. Let $A := \{i\}$ and $B := \{1, \dots, n\} \setminus \{i\}$. It follows that \begin{align} \text{Var}[x_i|x_{-i}] = C_{A|B} = (P_A)^{-1} = \frac{1}{P_{ii}}. \end{align}
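And a matching check for the diagonal (again a numpy sketch with an arbitrary random covariance): the conditional variance obtained from (4) with $A = \{i\}$ agrees with $1/P_{ii}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)
P = np.linalg.inv(C)

i = 2
B = [k for k in range(n) if k != i]

# Conditional variance of x_i given all other variables, via (4) with A = {i}.
var_cond = C[i, i] - C[i, B] @ np.linalg.solve(C[np.ix_(B, B)], C[B, i])

assert np.isclose(var_cond, 1.0 / P[i, i])
```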
Notions of Linear Conditional Dependence
In this section, we introduce two analogs of the correlation coefficient that quantify the linear dependence between two variables, conditional on another set of variables. We show that if all random variables are jointly Gaussian, then the two notions coincide.
Conditional Correlation
We start by defining conditional correlation, which is nothing more than the typical notion of correlation but defined with respect to a conditional probability measure. We give the definition with respect to our current setup, but of course the notion can be generalized.
Definition: conditional correlation.
For a pair of indices $\{i, j\}$ and its complement $B := \{1, \dots, n\} \setminus \{i, j\}$, we define the conditional covariance between $x_i$ and $x_j$, given $x_B$, as \begin{align} \text{Cov}[x_i, x_j|x_B] &:= \mathbb{E}\left[(x_i - m_{i|B})(x_j - m_{j|B}) \mid x_B\right], \tag{13} \end{align} where we denote $m_{i|B} := \mathbb{E}[x_i|x_B]$. The conditional correlation is then defined as usual by normalizing the covariance (13): \begin{align} \text{Cor}[x_i, x_j|x_B] &:= \frac{\text{Cov}[x_i, x_j|x_B]}{\sqrt{\text{Var}[x_i|x_B]\text{Var}[x_j|x_B]}}. \tag{14} \end{align}
The conditional correlation is simply a correlation where all expectations involved are conditional on $x_B$. It is sometimes also denoted as $\rho_{ij \mid B}$.
Partial Correlation
We now consider an alternative notion of conditional linear dependence that is defined with respect to underlying linear regression models. For generality, we provide a definition that quantifies dependence between sets of variables $x_{A_1}$ and $x_{A_2}$, after removing the confounding effect of a third set $x_B$. However, our primary interest will be in the special case $A_1 = \{i\}$, $A_2 = \{j\}$ and $B = \{1, \dots, n\} \setminus \{i, j\}$, which aligns with the conditional correlation definition given above.
Definition: partial correlation.
For a set of random variables $x_1, \dots, x_n$, let $A_1, A_2 \subseteq \{1, \dots, n\}$ be index sets and $B \subseteq \{1, \dots, n\}$ a third index set disjoint from the other two. Define the linear regression coefficients \begin{align} \alpha_{A_1}^\star, \beta_{A_1}^\star &:= \text{argmin}_{\alpha,\beta} \mathbb{E}\lVert x_{A_1} - (\alpha + \beta^\top x_B)\rVert^2 \tag{15} \newline \alpha_{A_2}^\star, \beta_{A_2}^\star &:= \text{argmin}_{\alpha,\beta} \mathbb{E} \lVert x_{A_2} - (\alpha + \beta^\top x_B)\rVert^2, \end{align} and associated residuals \begin{align} e_{A_1} &:= x_{A_1} - [\alpha_{A_1}^\star + (\beta_{A_1}^\star)^\top x_B] \tag{16} \newline e_{A_2} &:= x_{A_2} - [\alpha_{A_2}^\star + (\beta_{A_2}^\star)^\top x_B]. \end{align} The partial correlation coefficient between $x_{A_1}$ and $x_{A_2}$ given $x_B$ is defined as \begin{align} \rho_{A_1 A_2 \cdot B} := \text{Cor}[e_{A_1}, e_{A_2}]. \tag{17} \end{align}
The intuition here is that the residuals from the regressions contain variation that is unexplained by $x_B$, so that (17) quantifies the linear dependence between $x_{A_1}$ and $x_{A_2}$ after removing the effect of $x_B$. We emphasize the importance of the word “linear” here, as linearity plays a role in two different ways in the above definition. Recall that the typical correlation coefficient measures linear dependence, so the statement in (17) is a measure of linear dependence between the residuals. Moreover, the residuals themselves are defined via linear regressions, and hence the sense in which the effect of $x_B$ is “removed” also relies on linearity assumptions. Note also that we are allowing $x_{A_1}$ and $x_{A_2}$ to be sets of variables, so the linear regressions considered above are multi-output regressions, which can essentially be thought of as a set of independent univariate regressions, since the loss function simply sums the error across the outputs. Thus, in general, the $\alpha$’s and $\beta$’s are vectors and matrices, respectively.
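Here is a small sample-based illustration of definition (17), assuming numpy; the data-generating process, sample size, and the helper `residuals` are arbitrary choices of mine rather than part of the definition. Both variables are driven by a shared pair of regressors $x_B$, so they are noticeably correlated marginally, but their partial correlation given $x_B$ is approximately zero.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate data in which x_B drives both x_i and x_j, with no direct link between them.
N = 200_000
x_B = rng.standard_normal((N, 2))
x_i = x_B @ np.array([1.0, -0.5]) + 0.3 * rng.standard_normal(N)
x_j = x_B @ np.array([0.8, 0.4]) + 0.3 * rng.standard_normal(N)

def residuals(y, X):
    """Residuals of a least-squares regression of y on X (with an intercept)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

# Partial correlation: ordinary correlation of the two residual series, as in (17).
e_i, e_j = residuals(x_i, x_B), residuals(x_j, x_B)

print(np.corrcoef(x_i, x_j)[0, 1])  # marginal correlation: clearly nonzero, induced by x_B
print(np.corrcoef(e_i, e_j)[0, 1])  # partial correlation: ~0 after removing x_B
```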
Equivalence for Gaussian Distributions
We now introduce the additional assumption that $x_1, \dots, x_n$ are jointly Gaussian and show that in this setting the definitions of conditional and partial correlation are equivalent. For more detailed discussion on these connections, see (Baba et al., 2004) and (Lawrance, 1976).
Conditional and Partial Correlation for Gaussians.
Suppose that $x_1, \dots, x_n$ are jointly Gaussian, with positive definite covariance $C$. For a pair of indices $\{i, j\}$ and its complement $B := \{1, \dots, n\} \setminus \{i, j\}$, we have \begin{align} \rho_{ij \cdot B} = \text{Cor}[x_i, x_j|x_B]. \tag{18} \end{align} That is, under joint Gaussianity the notions of conditional and partial correlation coincide.
Proof. The result will be proved by establishing that \begin{align} \text{Cov}[e_i, e_j] = \text{Cov}[x_i, x_j|x_B] \tag{19} \end{align} for any $i$ and $j$ (possibly equal), where $e_i$ and $e_j$ are the residual random variables defined in (16) with $A_1 := \{i\}$ and $A_2 := \{j\}$. The case $i = j$ establishes the equality of the variances, meaning that the correlations will also be equal. To this end, we start by noting that the righthand side in (19) is given by the relevant entry of the matrix $C_{A|B}$, where $A := \{i, j\}$. Recall that $C_{A|B}$ is defined in (4). By extracting the relevant entry of $C_{A|B}$ we have \begin{align} \text{Cov}[x_i, x_j|x_B] = C_{ij} - C_{iB}C_B^{-1}C_{Bj}. \tag{20} \end{align} We now show that $\text{Cov}[e_i, e_j]$ reduces to (20). We recall that the conditional expectation for a square-integrable random variable is given by the projection \begin{align} \mathbb{E}[x_i|x_B] = \text{argmin}_{f} \mathbb{E}\left[(x_i - f(x_B))^2\right], \tag{21} \end{align} where the minimum is considered over all Borel-measurable functions $f$. In our present Gaussian setting, this is solved by \begin{align} \mathbb{E}[x_i|x_B] = m_i + k_i^\top(x_B - m_B), \tag{22} \end{align} where $k_i := C_B^{-1}C_{Bi}$. See this post for the derivation of (22). The important thing to notice is that (22) is a linear function of $x_B$, which means that it solves the linear regression problem (15); i.e., $\alpha_i^\star + (\beta_i^\star)^\top x_B = \mathbb{E}[x_i|x_B]$. Thus, \begin{align} e_i = x_i - \mathbb{E}[x_i|x_B] = x_i - m_i - k_i^\top(x_B - m_B), \end{align} and similarly for $e_j$. Notice that the constants $m_i$ and $k_i^\top m_B$ will be dropped when taking covariances, so it suffices to treat these as zero. We thus have \begin{align} \text{Cov}[e_i,e_j] &= \text{Cov}[x_i-\langle k_i, x_B\rangle,x_j-\langle k_j, x_B\rangle] \newline &= C_{ij} + k_i^\top C_B k_j - C_{iB}k_j - k_i^\top C_{Bj} \newline &= C_{ij} + C_{iB}C_B^{-1}C_B C_B^{-1}C_{Bj} - 2C_{iB}C_B^{-1}C_{Bj} \newline &= C_{ij} - C_{iB}C_B^{-1}C_{Bj}. \end{align} We see that the final expression agrees with (20), so the result is proved.
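The equivalence can also be checked numerically. The numpy sketch below (covariance, indices, and Monte Carlo sample size are arbitrary choices) computes the conditional correlation exactly from the Schur complement (4), then estimates the partial correlation by drawing Gaussian samples, regressing $x_i$ and $x_j$ on $x_B$, and correlating the residuals; the two quantities agree up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(3)

# A fixed positive definite covariance for a 4-dimensional Gaussian.
n = 4
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)

i, j = 0, 1
A, B = [i, j], [2, 3]

# Exact conditional correlation from the Schur complement (4).
S = C[np.ix_(A, A)] - C[np.ix_(A, B)] @ np.linalg.solve(C[np.ix_(B, B)], C[np.ix_(B, A)])
cond_corr = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])

# Sample-based partial correlation: regress x_i and x_j on x_B, correlate residuals.
X = rng.multivariate_normal(np.zeros(n), C, size=200_000)
XB = np.column_stack([np.ones(len(X)), X[:, B]])
beta, *_ = np.linalg.lstsq(XB, X[:, A], rcond=None)
E = X[:, A] - XB @ beta
partial_corr = np.corrcoef(E[:, 0], E[:, 1])[0, 1]

print(cond_corr, partial_corr)  # agree up to Monte Carlo error
```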
Precision Matrix and Partial Correlation
Having defined partial and conditional correlation, we now return to the question of interpreting the off-diagonal elements of a Gaussian precision matrix. Throughout this section we assume that $x_1, \dots, x_n$ are jointly Gaussian with positive definite covariance $C$ and precision matrix $P := C^{-1}$. The following definition normalizes the precision matrix, analogous to the way a covariance matrix is normalized to produce a correlation matrix.
Normalized Precision.
Define the normalized precision as the matrix $R$ with elements \begin{align} R_{ij} := \frac{P_{ij}}{\sqrt{P_{ii}P_{jj}}}. \end{align}
We are now ready to establish the connection between the (normalized) precision and the notions of conditional and partial correlation. Note that the diagonal elements of $R$ satisfy $R_{ii} = 1$. The following result interprets the off-diagonal elements.
Off-Diagonal Entries of Precision.
Assume that $x_1, \dots, x_n$ are jointly Gaussian with normalized precision matrix $R$. Let $\{i, j\}$ be a pair of distinct indices and $B := \{1, \dots, n\} \setminus \{i, j\}$ its complement. Then \begin{align} R_{ij} = -\rho_{ij \cdot B} = -\text{Cor}[x_i, x_j|x_B]. \end{align} In words, the off-diagonal entry $R_{ij}$ is equal to the negated partial correlation between $x_i$ and $x_j$ given all other variables. It is likewise equal to the negated conditional correlation.
Proof. Let $A := \{i, j\}$. Recall from (9) that \begin{align} C_{A|B} &= (P_A)^{-1} = \begin{bmatrix} P_{ii} & P_{ij} \newline P_{ji} & P_{jj} \end{bmatrix}^{-1} = \gamma \begin{bmatrix} P_{jj} & -P_{ij} \newline -P_{ji} & P_{ii} \end{bmatrix}, \tag{26} \end{align} with $\gamma := (P_{ii}P_{jj} - P_{ij}^2)^{-1}$. We have again used the expression for the inverse of a two-by-two matrix. The equality in (26) gives \begin{align} \text{Var}[x_i|x_B] &= \gamma P_{jj} \newline \text{Var}[x_j|x_B] &= \gamma P_{ii} \newline \text{Cov}[x_i,x_j|x_B] &= -\gamma P_{ij}. \end{align} We thus have \begin{align} \text{Cor}[x_i, x_j|x_B] = \frac{\text{Cov}[x_i, x_j|x_B]}{\sqrt{\text{Var}[x_i|x_B]\text{Var}[x_j|x_B]}} = \frac{-\gamma P_{ij}}{\sqrt{\gamma P_{jj} \cdot \gamma P_{ii}}} = -\frac{P_{ij}}{\sqrt{P_{ii}P_{jj}}} = -R_{ij}. \end{align} This establishes the relationship between the precision elements and conditional correlation. Owing to the Gaussian assumption, the equivalence with the partial correlation follows from (18).
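As a final numpy sketch (random covariance and indices chosen arbitrarily, using $R$ for the normalized precision as above): negating the off-diagonal entries of the normalized precision recovers the conditional (equivalently, partial) correlations computed from the Schur complement.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)
P = np.linalg.inv(C)

# Normalized precision R_ij = P_ij / sqrt(P_ii P_jj).
d = np.sqrt(np.diag(P))
R = P / np.outer(d, d)

# Conditional correlation of x_i, x_j given the rest, from the Schur complement (4).
i, j = 1, 3
A = [i, j]
B = [k for k in range(n) if k not in A]
S = C[np.ix_(A, A)] - C[np.ix_(A, B)] @ np.linalg.solve(C[np.ix_(B, B)], C[np.ix_(B, A)])
cond_corr = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])

assert np.isclose(-R[i, j], cond_corr)
```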
Additional Resources
In addition to (Lauritzen, 2019), (Baba et al., 2004), and (Lawrance, 1976), here is a list of some other resources that cover similar topics.
- Graphical Models in Applied Multivariate Statistics (Whittaker, 1991)
- Dichotomization, Partial Correlation, and Conditional Independence (Vargha et al., 1996)
- A note on the partial correlation coefficient (Fleiss and Tanur, 1971)
- Kernel Partial Correlation Coefficient — a Measure of Conditional Dependence (Huang et al., 2022)
- Back to the basics: Rethinking partial correlation network methodology (Williams and Rast, 2019)
- Some StackExchange posts on precision matrices and partial correlation: here and here
- Wikipedia article on partial correlation
- Lauritzen, S. L. (2019). Lectures on Graphical Models, 3rd edition. Department of Mathematical Sciences, Faculty of Science, University of Copenhagen.
- Baba, K., Shibata, R., & Sibuya, M. (2004). Partial Correlation and Conditional Correlation as Measures of Conditional Independence. Australian & New Zealand Journal of Statistics, 46(4), 657–664. https://doi.org/10.1111/j.1467-842X.2004.00360.x
- Lawrance, A. (1976). On Conditional and Partial Correlation. American Statistician, 30, 146–149. https://doi.org/10.1080/00031305.1976.10479163