Precision Matrices, Partial Correlations, and Conditional Independence

Interpreting the inverse covariance matrix for Gaussian distributions.

In this post, we explore how the precision (inverse covariance) matrix for a Gaussian distribution encodes conditional dependence relations between the underlying variables. We also introduce the notions of conditional and partial correlation, and prove their equivalence in the Gaussian setting. Ideas along these lines are widely used in areas such as probabilistic graphical models (e.g., Gaussian Markov random fields and scalable Gaussian processes) and high-dimensional covariance estimation. A natural complement to this topic is my post on the Cholesky decomposition of a covariance matrix, which provides an alternate route to analyzing conditional dependence structure. The main source for this post is (Lauritzen, 2019), which is freely available online and provides an excellent rigorous introduction to graphical models in statistics.

Setup and Notation

Throughout this post we consider a set of random variables $x_1, \dots, x_n$, each taking values in $\mathbb{R}$. For an (ordered) index set $A \subseteq \{1,\dots,n\}$, we write $x_A$ to denote the column vector $[x_i]_{i \in A}$ that retains the ordering of $A$. We will use the convention that $B := \{1,\dots,n\} \setminus A$ is the complement of $A$, and use the shorthand $x_{\sim i}$ for the vector of all variables excluding $x_i$. We also write $x_i \perp x_j \mid x_B$ to mean that $x_i$ and $x_j$ are conditionally independent given $x_B$.

The Precision Matrix of a Gaussian

In this section we will explore how the precision matrix of a Gaussian distribution is closely related to the conditional dependence structure of the random variables $x_1, \dots, x_n$. Throughout this section, we assume
$$
x = (x_1, \dots, x_n)^\top \sim \mathcal{N}(m, C), \tag{1}
$$
where $C$ is positive definite. Hence, it is invertible, and we denote its inverse by
$$
P := C^{-1}, \tag{2}
$$
which we refer to as the precision matrix. The precision inherits positive definiteness from $C$. In some contexts (e.g., (Lauritzen, 2019)), $P$ is also called the concentration matrix.

Throughout this section, our focus will be on the dependence between two variables $x_i$ and $x_j$, conditional on all others. Thus, let's define the index sets $A := \{i,j\}$ and $B := \{1,\dots,n\} \setminus \{i,j\}$. We partition the joint Gaussian (1) (after possibly reordering the variables) as
\begin{align}
\begin{bmatrix} x_A \newline x_B \end{bmatrix}
&\sim \mathcal{N}\left(
\begin{bmatrix} m_A \newline m_B \end{bmatrix},
\begin{bmatrix} C_A & C_{AB} \newline C_{BA} & C_B \end{bmatrix}
\right). \tag{3}
\end{align}

Our focus is thus on the conditional distribution of $x_A|x_B$. We recall that conditionals of Gaussians are themselves Gaussian, and that the conditional covariance takes the form
$$
C_{A|B} := \text{Cov}[x_A|x_B] = C_A - C_{AB}C_B^{-1}C_{BA}. \tag{4}
$$
I go through these derivations in depth in this post. For our present purposes, it is important to appreciate the connection between the conditional covariance (4) and the joint precision $P$. To this end, let's consider partitioning the precision in the same manner as the covariance:
\begin{align}
P &= C^{-1} = \begin{bmatrix} P_A & P_{AB} \newline P_{BA} & P_B \end{bmatrix}. \tag{5}
\end{align}
The above blocks of $P$ can be obtained via a direct application of partitioned matrix inverse identities from linear algebra (see, e.g., James E. Pustejovsky's post for some nice background). Applying the partitioned matrix identity to (5) yields
\begin{align}
P_A &= [C_A - C_{AB}C_B^{-1}C_{BA}]^{-1} \tag{6} \newline
P_{BA} &= -C_B^{-1} C_{BA}P_A. \tag{7}
\end{align}
Notice in (6) that $P_A$, the upper-left block of the joint precision, is precisely equal to the inverse of the conditional covariance $C_{A|B}$ given in (4). We denote this conditional precision by
$$
P_{A|B} := (C_{A|B})^{-1}. \tag{8}
$$
We summarize this important connection below.

Joint and Conditional Precision.
The precision matrix of the conditional distribution $x_A|x_B$ is given by
$$
P_{A|B} = (C_{A|B})^{-1} = P_A. \tag{9}
$$
In words, the conditional precision is obtained by deleting the rows and columns of the joint precision $P$ that involve the conditioning variables $x_B$.
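This identity is easy to check numerically. The following is a minimal sketch (assuming NumPy; the variable names are illustrative) that draws a random positive definite covariance, forms the conditional covariance via (4), and verifies that its inverse matches the corresponding block of the joint precision, as claimed in (9).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
G = rng.standard_normal((n, n))
C = G @ G.T + n * np.eye(n)   # random positive definite covariance
P = np.linalg.inv(C)          # joint precision

i, j = 0, 1
A = [i, j]
B = [k for k in range(n) if k not in A]

# Conditional covariance C_{A|B} = C_A - C_{AB} C_B^{-1} C_{BA}, as in (4).
C_A  = C[np.ix_(A, A)]
C_AB = C[np.ix_(A, B)]
C_B  = C[np.ix_(B, B)]
C_cond = C_A - C_AB @ np.linalg.solve(C_B, C_AB.T)

# Claim (9): the conditional precision equals the A-block of the joint precision.
print(np.allclose(np.linalg.inv(C_cond), P[np.ix_(A, A)]))  # True
```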

This connection also leads us to our main result, which states that conditional independence can be inferred from zero entries of the precision matrix. This result follows immediately by rearranging (9) to
$$
C_{A|B} = (P_A)^{-1} \tag{10}
$$
and noting that $P_A$ is simply a two-by-two matrix that we can consider inverting by hand.

Zeros in Precision imply Conditional Independence.
An entry $P_{ij}$, $i \neq j$, of the joint precision matrix is zero if and only if $x_i$ and $x_j$ are conditionally independent, given all other variables. That is,
$$
P_{ij} = 0 \iff \text{Cov}[x_i,x_j|x_B] = 0 \iff x_i \perp x_j \mid x_B, \tag{11}
$$
where $B := \{1, \dots, n\} \setminus \{i,j\}$.

Proof. Setting $A := \{i,j\}$, the above derivation showed $C_{A|B} = (P_A)^{-1}$, where $P_A$ is the two-by-two block of the joint precision $P$ corresponding to the variables $x_i$ and $x_j$. Thus,
$$
\text{Cov}[x_i,x_j|x_B] = [C_{A|B}]_{12} = 0 \iff [(P_A)^{-1}]_{12} = 0.
$$
We use the well-known formula for the inverse of a two-by-two matrix to obtain
\begin{align}
(P_A)^{-1} &= \begin{bmatrix} P_{ii} & P_{ij} \newline P_{ji} & P_{jj} \end{bmatrix}^{-1} = \frac{1}{P_{ii}P_{jj}- P_{ij}^2} \begin{bmatrix} P_{jj} & -P_{ji} \newline -P_{ij} & P_{ii}\end{bmatrix}.
\end{align}
Notice that the off-diagonal entries of $(P_A)^{-1}$ are proportional to those of $P_A$, with a negative constant of proportionality. This means
$$
\text{Cov}[x_i,x_j|x_B] = [C_{A|B}]_{12} = 0 \iff P_{ij} = 0,
$$
so the result is proved. The fact that conditional uncorrelatedness implies conditional independence follows from the fact that $x_A|x_B$ is Gaussian. $\qquad \blacksquare$
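To see the result in action, consider a Gaussian whose precision is tridiagonal, so that non-adjacent variables should be conditionally independent given the rest. A small NumPy sketch (the specific matrix is just an example):

```python
import numpy as np

n = 6
# Tridiagonal precision: only adjacent variables are directly coupled.
P = 2.0 * np.eye(n) - 0.8 * (np.eye(n, k=1) + np.eye(n, k=-1))
C = np.linalg.inv(P)

i, j = 0, 3  # P[i, j] == 0 since |i - j| > 1
A = [i, j]
B = [k for k in range(n) if k not in A]
C_cond = C[np.ix_(A, A)] - C[np.ix_(A, B)] @ np.linalg.solve(C[np.ix_(B, B)], C[np.ix_(B, A)])

print(np.isclose(C_cond[0, 1], 0.0))  # True: x_0 and x_3 are conditionally independent
print(np.isclose(C[i, j], 0.0))       # False: they are still marginally correlated
```

Note that the marginal covariance between the two variables is nonzero; the zero pattern of the precision speaks to conditional, not marginal, independence.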

The above result interprets zero values of the off-diagonal elements of the precision matrix. Later in this post we will revisit the precision and see that non-zero values can be interpreted through the lens of partial correlation. An interpretation of the magnitude of the diagonal elements is more straightforward, and is given in the result below.

Diagonal Entries of Precision.
The diagonal entry $P_{ii}$ of the precision gives the reciprocal of the variance of $x_i$, conditional on all other variables; that is,
$$
P_{ii} = \text{Var}[x_i|x_{\sim i}]^{-1}. \tag{12}
$$

Proof. Let $A := \{i\}$ and $B := \{1, \dots, n\} \setminus \{i\}$. The derivation of (9) applies to any partition of the variables, so $C_{A|B} = (P_A)^{-1}$, which here is an equality of scalars. It follows that
$$
\text{Var}[x_i|x_{\sim i}] = C_{A|B} = (P_A)^{-1} = P_{ii}^{-1}. \qquad \blacksquare
$$
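The same kind of numerical check verifies (12); a sketch assuming NumPy, with an arbitrary random covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
G = rng.standard_normal((n, n))
C = G @ G.T + n * np.eye(n)
P = np.linalg.inv(C)

i = 2
B = [k for k in range(n) if k != i]
# Var[x_i | x_{~i}] via the scalar Schur complement, as in (4).
var_cond = C[i, i] - C[i, B] @ np.linalg.solve(C[np.ix_(B, B)], C[B, i])
print(np.isclose(P[i, i], 1.0 / var_cond))  # True: P_ii = 1 / Var[x_i | x_{~i}]
```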

Notions of Linear Conditional Dependence

In this section, we introduce two analogs of the correlation coefficient that quantify the linear dependence between two variables, conditional on another set of variables. We show that if all random variables are jointly Gaussian, then the two notions coincide.

Conditional Correlation

We start by defining conditional correlation, which is nothing more than the typical notion of correlation but defined with respect to a conditional probability measure. We give the definition with respect to our current setup, but of course the notion can be generalized.

Definition: conditional correlation.
For a pair of indices $\{i,j\}$ and its complement $B := \{1, \dots, n\} \setminus \{i,j\}$, we define the conditional covariance between $x_i$ and $x_j$, given $x_B$, as
$$
\text{Cov}[x_i,x_j|x_B] := \mathbb{E}^B\left[(x_i-\mathbb{E}^B[x_i])(x_j-\mathbb{E}^B[x_j]) \right], \tag{13}
$$
where we denote $\mathbb{E}^B[\cdot] := \mathbb{E}[\cdot|x_B]$. The conditional correlation is then defined as usual by normalizing the covariance (13):
$$
\text{Cor}[x_i,x_j|x_B] := \frac{\text{Cov}[x_i,x_j|x_B]}{\sqrt{\text{Var}[x_i|x_B]\text{Var}[x_j|x_B]}}. \tag{14}
$$

The conditional correlation is simply a correlation where all expectations involved are conditional on $x_B$. It is sometimes also denoted as $\rho_{ij|B} := \text{Cor}[x_i,x_j|x_B]$.
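For the Gaussian setting of the previous section, the conditional correlation can be computed in closed form from the covariance blocks by combining (4) with (14). Here is a small NumPy sketch (the function name and setup are illustrative) that we will reuse below.

```python
import numpy as np

def conditional_correlation(C, i, j):
    """Cor[x_i, x_j | x_B] for a jointly Gaussian vector with covariance C."""
    n = C.shape[0]
    A = [i, j]
    B = [k for k in range(n) if k not in A]
    # Conditional covariance of (x_i, x_j) given x_B, as in (4).
    S = C[np.ix_(A, A)] - C[np.ix_(A, B)] @ np.linalg.solve(C[np.ix_(B, B)], C[np.ix_(B, A)])
    return S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])

# Example with an arbitrary positive definite covariance.
rng = np.random.default_rng(2)
G = rng.standard_normal((4, 4))
C = G @ G.T + 4 * np.eye(4)
print(conditional_correlation(C, 0, 1))
```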

Partial Correlation

We now consider an alternative notion of conditional linear dependence that is defined with respect to underlying linear regression models. For generality, we provide a definition that quantifies dependence between sets of variables $x_{A_1}$ and $x_{A_2}$, after removing the confounding effect of a third set $x_B$. However, our primary interest will be in the special case $A_1 = \{i\}$ and $A_2 = \{j\}$, which aligns with the conditional correlation definition given above.

Definition: partial correlation.
For a set of random variables $x_1, \dots, x_n$, let $A_1,A_2 \subset \{1, \dots, n\}$ be index sets and $B$ a third index set disjoint from the other two. Define the linear regression coefficients
\begin{align}
\alpha_{A_1}^\star, \beta_{A_1}^\star &:= \text{argmin}_{\alpha,\beta} \mathbb{E}\lVert x_{A_1} - (\alpha + \beta^\top x_B)\rVert^2 \tag{15} \newline
\alpha_{A_2}^\star, \beta_{A_2}^\star &:= \text{argmin}_{\alpha,\beta} \mathbb{E}\lVert x_{A_2} - (\alpha + \beta^\top x_B)\rVert^2,
\end{align}
and associated residuals
\begin{align}
e_{A_1} &:= x_{A_1} - [\alpha_{A_1}^\star + (\beta_{A_1}^\star)^\top x_B] \tag{16} \newline
e_{A_2} &:= x_{A_2} - [\alpha_{A_2}^\star + (\beta_{A_2}^\star)^\top x_B].
\end{align}
The partial correlation coefficient between $x_{A_1}$ and $x_{A_2}$ given $x_B$ is defined as
$$
\rho_{A_{1}A_{2} \cdot B} := \text{Cor}[e_{A_1}, e_{A_2}]. \tag{17}
$$

The intuition here is that the residuals from the regressions contain variation that is unexplained by $x_B$, so that $\rho_{A_{1}A_{2} \cdot B}$ quantifies the linear dependence between $x_{A_1}$ and $x_{A_2}$ after removing the effect of $x_B$. We emphasize the importance of the word “linear” here, as linearity plays a role in two different ways in the above definition. Recall that the typical correlation coefficient measures linear dependence, so the statement in (17) is a measure of linear dependence between the residuals. Moreover, the residuals themselves are defined via linear regressions, and hence the sense in which the effect of $x_B$ is “removed” also relies on linearity assumptions. Note also that we are allowing $x_{A_1}$ and $x_{A_2}$ to be sets of variables, so the linear regressions considered above are multi-output regressions, which can essentially be thought of as a set of independent univariate regressions, since the loss function simply sums the error across the outputs. Thus, in general, the $\alpha$s and $\beta$s are vectors and matrices, respectively.
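For the scalar case $A_1 = \{i\}$, $A_2 = \{j\}$, the definition translates directly into a sample-based procedure: regress each variable on $x_B$ (with an intercept), then correlate the residuals. A minimal NumPy sketch, with illustrative names:

```python
import numpy as np

def partial_correlation(xi, xj, XB):
    """Sample partial correlation of xi, xj (shape (N,)) given XB (shape (N, p))."""
    N = len(xi)
    Z = np.column_stack([np.ones(N), XB])                 # design matrix with intercept
    ei = xi - Z @ np.linalg.lstsq(Z, xi, rcond=None)[0]   # residual of xi regressed on x_B
    ej = xj - Z @ np.linalg.lstsq(Z, xj, rcond=None)[0]   # residual of xj regressed on x_B
    return np.corrcoef(ei, ej)[0, 1]                      # correlation of the residuals
```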

Equivalence for Gaussian Distributions

We now introduce the additional assumption that $x_1, \dots, x_n$ are jointly Gaussian and show that in this setting the definitions of conditional and partial correlation are equivalent. For more detailed discussion on these connections, see (Baba et al., 2004) and (Lawrance, 1976).

Conditional and Partial Correlation for Gaussians.
Suppose that $x_1, \dots, x_n$ are jointly Gaussian, with positive definite covariance $C$. For a pair of indices $\{i,j\}$ and its complement $B := \{1, \dots, n\} \setminus \{i,j\}$, we have
$$
\rho_{ij \cdot B} = \text{Cor}[x_i,x_j|x_B]. \tag{18}
$$
That is, under joint Gaussianity the notions of conditional and partial correlation coincide.

Proof. The result will be proved by establishing that
$$
\text{Cov}[e_i,e_j] = \text{Cov}[x_i,x_j|x_B] \tag{19}
$$
for any $i$ and $j$ (possibly equal), where $e_i$ and $e_j$ are the residual random variables defined in (16) with $A_1 := \{i\}$ and $A_2 := \{j\}$. The $i=j$ case establishes the equality of the variances, meaning that the correlations will also be equal. To this end, we start by noting that the righthand side in (19) is given by the relevant entry of the matrix
$$
C_{A|B} = C_A - C_{AB}C_B^{-1}C_{BA},
$$
where $A := \{i,j\}$. Recall that $C_{A|B}$ is defined in (4). By extracting the relevant entry of $C_{A|B}$ we have
$$
\text{Cov}[x_i,x_j|x_B] = C_{ij} - C_{iB}C_B^{-1}C_{Bj}. \tag{20}
$$
We now show that $\text{Cov}[e_i,e_j]$ reduces to (20). We recall that the conditional expectation for a square-integrable random variable is given by the projection
$$
\mathbb{E}[x_i|x_B] = \text{argmin}_{g} \lVert x_i - g(x_B)\rVert^2, \tag{21}
$$
where $\lVert \cdot \rVert$ denotes the $L^2$ norm and the minimum is considered over all $x_B$-measurable functions $g$. In our present Gaussian setting, this is solved by
$$
\mathbb{E}[x_i|x_B] = \mathbb{E}[x_i] + \langle k_i, x_B-\mathbb{E}[x_B]\rangle, \tag{22}
$$
where $k_i = \text{Cov}[x_B]^{-1} \text{Cov}[x_B,x_i] = C_B^{-1}C_{Bi}$. See this post for the derivation of (22). The important thing to notice is that (22) is an affine function of $x_B$, which means that it solves the linear regression problem (15); i.e., $\alpha^{\star}_{A_1} + (\beta^{\star}_{A_1})^\top x_B = \mathbb{E}[x_i|x_B]$. Thus,
$$
e_i = x_i - \mathbb{E}[x_i|x_B] = x_i - \mathbb{E}[x_i] - \langle k_i, x_B-\mathbb{E}[x_B]\rangle, \tag{23}
$$
and similarly for $e_j$. Notice that the constants $\mathbb{E}[x_i]$ and $\mathbb{E}[x_B]$ will be dropped when taking covariances, so it suffices to treat them as zero. We thus have
\begin{align}
\text{Cov}[e_i,e_j] &= \text{Cov}[x_i-\langle k_i, x_B\rangle,\, x_j-\langle k_j, x_B\rangle] \newline
&= C_{ij} + k_i^\top C_B k_j - C_{iB}k_j - k_i^\top C_{Bj} \newline
&= C_{ij} + C_{iB}C_B^{-1}C_B C_B^{-1}C_{Bj} - 2C_{iB}C_B^{-1}C_{Bj} \newline
&= C_{ij} - C_{iB}C_B^{-1}C_{Bj}.
\end{align}
We see that the final expression agrees with (20), so the result is proved. $\qquad \blacksquare$
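A quick simulation illustrates the equivalence (18). The sketch below reuses the `conditional_correlation` and `partial_correlation` helpers from the earlier snippets (both assumed, along with NumPy): the residual-based estimate computed from Gaussian samples should approach the population conditional correlation computed from the covariance blocks.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 4, 200_000
G = rng.standard_normal((n, n))
C = G @ G.T + n * np.eye(n)
X = rng.multivariate_normal(np.zeros(n), C, size=N)    # samples from N(0, C)

i, j = 0, 1
B = [k for k in range(n) if k not in (i, j)]
print(partial_correlation(X[:, i], X[:, j], X[:, B]))  # Monte Carlo estimate
print(conditional_correlation(C, i, j))                # population value; the two should agree
```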

Precision Matrix and Partial Correlation

Having defined partial and conditional correlation, we now return to the question of interpreting the off-diagonal elements of a Gaussian precision matrix. Throughout this section we assume that $x_1, \dots, x_n$ are jointly Gaussian with positive definite covariance $C$ and precision matrix $P$. The following definition normalizes the precision matrix, analogous to the way a covariance matrix is normalized to produce a correlation matrix.

Normalized Precision.
Define the normalized precision $\bar{P}$ as the matrix with elements
$$
\bar{P}_{ij} := \frac{P_{ij}}{\sqrt{P_{ii}P_{jj}}}. \tag{24}
$$

We are now ready to establish the connection between the (normalized) precision and the notions of conditional and partial correlation. Note that the diagonal elements of $\bar{P}$ satisfy $\bar{P}_{ii}=1$. The following result interprets the off-diagonal elements.

Off-Diagonal Entries of Precision.
Assume that $x_1, \dots, x_n$ are jointly Gaussian with normalized precision matrix $\bar{P}$. Let $\{i,j\}$ be a pair of distinct indices and $B := \{1,\dots,n\} \setminus \{i,j\}$ its complement. Then
$$
\bar{P}_{ij} = -\rho_{ij\cdot B} = -\text{Cor}[x_i,x_j|x_B]. \tag{25}
$$
In words, the off-diagonal entry $\bar{P}_{ij}$ is equal to the negated partial correlation between $x_i$ and $x_j$ given all other variables. It is likewise equal to the negated conditional correlation.

Proof. Let $A := \{i,j\}$. Recall from (9) that
\begin{align}
C_{A|B} &= (P_A)^{-1} = \begin{bmatrix} P_{ii} & P_{ij} \newline P_{ji} & P_{jj} \end{bmatrix}^{-1} = \gamma \begin{bmatrix} P_{jj} & -P_{ij} \newline -P_{ji} & P_{ii} \end{bmatrix}, \tag{26}
\end{align}
with $\gamma := (P_{ii}P_{jj}-P^2_{ij})^{-1}$, which is positive since $P_A$ is positive definite. We have again used the expression for the inverse of a two-by-two matrix. The equality in (26) gives
\begin{align}
\text{Var}[x_i|x_B] &= \gamma P_{jj} \newline
\text{Var}[x_j|x_B] &= \gamma P_{ii} \newline
\text{Cov}[x_i,x_j|x_B] &= -\gamma P_{ij}.
\end{align}
We thus have
$$
\bar{P}_{ij} = \frac{P_{ij}}{\sqrt{P_{ii}P_{jj}}} = -\frac{\text{Cov}[x_i,x_j|x_B]}{\sqrt{\text{Var}[x_i|x_B]\text{Var}[x_j|x_B]}} = -\text{Cor}[x_i,x_j|x_B].
$$
This establishes the relationship between the precision elements and conditional correlation. Owing to the Gaussian assumption, the equivalence with the partial correlation follows from (18). $\qquad \blacksquare$
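As a final check, the sketch below (assuming NumPy and the `conditional_correlation` helper from earlier) forms the normalized precision (24) and confirms that its off-diagonal entries are the negated conditional (equivalently, partial) correlations, as stated in (25).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
G = rng.standard_normal((n, n))
C = G @ G.T + n * np.eye(n)
P = np.linalg.inv(C)

# Normalized precision (24): divide each entry by the square roots of the diagonals.
P_bar = P / np.sqrt(np.outer(np.diag(P), np.diag(P)))

i, j = 1, 3
print(np.isclose(P_bar[i, j], -conditional_correlation(C, i, j)))  # True
```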

Additional Resources

The works cited throughout this post are listed below. (Lauritzen, 2019) is freely available online and provides a rigorous introduction to graphical models; (Baba et al., 2004) and (Lawrance, 1976) discuss the relationship between partial and conditional correlation in more detail.

  1. Lauritzen, S. L. (2019). Lectures on Graphical Models, 3rd edition. Department of Mathematical Sciences, Faculty of Science, University of Copenhagen.
  2. Baba, K., Shibata, R., & Sibuya, M. (2004). Partial Correlation and Conditional Correlation as Measures of Conditional Independence. Australian & New Zealand Journal of Statistics, 46(4), 657–664. https://doi.org/10.1111/j.1467-842X.2004.00360.x
  3. Lawrance, A. (1976). On Conditional and Partial Correlation. The American Statistician, 30, 146–149. https://doi.org/10.1080/00031305.1976.10479163