[This article was first published on R on Fixing the bridge between biologists and statisticians, and kindly contributed to R-bloggers.]

The variance-covariance and the correlation matrices are two entities that describe the association between the columns of a two-way data matrix. They are widely used, e.g., in agriculture, biology and ecology, and they can be easily calculated with base R, as shown in the box below.

data(mtcars)
matr <- mtcars[,1:4]

# Covariances
Sigma <- cov(matr)

# Correlations
R <- cor(matr)

Sigma
##              mpg        cyl       disp        hp
## mpg    36.324103  -9.172379  -633.0972 -320.7321
## cyl    -9.172379   3.189516   199.6603  101.9315
## disp -633.097208 199.660282 15360.7998 6721.1587
## hp   -320.732056 101.931452  6721.1587 4700.8669
R
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000

It is useful to be able to go back and forth between covariance and correlation without going back to the original data matrix. Recall that the covariance between two variables X and Y is:

\[\textrm{cov}(X, Y) = \frac{1}{n - 1} \sum\limits_{i=1}^{n} {(X_i - \hat{X})(Y_i - \hat{Y})}\]

where \(\hat{X}\) and \(\hat{Y}\) are the means of the two variables and \(n\) is the number of observations. The correlation is:

\[\textrm{cor}(X, Y) = \frac{\textrm{cov}(X, Y)}{\sigma_x \sigma_y}\]

where \(\sigma_x\) and \(\sigma_y\) are the standard deviations for X and Y.

The opposite relationship is clear:

\[\textrm{cov}(X, Y) = \textrm{cor}(X, Y) \, \sigma_x \sigma_y\]
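
These definitions can be checked numerically. The sketch below assumes the 'mpg' and 'disp' columns of mtcars and the n - 1 denominator that cov() and sd() use; the object names (cov_xy, cor_xy) are mine and only serve the illustration.

```r
data(mtcars)
X <- mtcars$mpg
Y <- mtcars$disp
n <- length(X)

# Covariance from the definition (n - 1 denominator, as in cov())
cov_xy <- sum((X - mean(X)) * (Y - mean(Y))) / (n - 1)

# Correlation: covariance divided by the product of the standard deviations
cor_xy <- cov_xy / (sd(X) * sd(Y))

# Both should agree with the built-in functions
all.equal(cov_xy, cov(X, Y))
all.equal(cor_xy, cor(X, Y))
```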

Therefore, converting from covariance to correlation is pretty easy. For example, take the covariance between ‘disp’ and ‘mpg’ above (-633.097208) and divide it by the square roots of the two variances (36.324103 and 15360.7998); the correlation is:

-633.097208 / (sqrt(36.324103) * sqrt(15360.7998))
## [1] -0.8475514

Conversely, if we have the correlation (-0.8475514) and the two variances, the covariance is:

-0.8475514 * sqrt(36.324103) * sqrt(15360.7998)
## [1] -633.0972
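
The same pair-wise conversion can be written without retyping the numbers, by indexing the covariance matrix by name. This is only a sketch under the assumption that the matrix comes from the first four columns of mtcars, as above; s_xy, sd_x, sd_y and r_xy are illustrative names.

```r
data(mtcars)
Sigma <- cov(mtcars[, 1:4])

# Covariance between mpg and disp, and the two standard deviations
s_xy <- Sigma["mpg", "disp"]
sd_x <- sqrt(Sigma["mpg", "mpg"])
sd_y <- sqrt(Sigma["disp", "disp"])

# Covariance -> correlation
r_xy <- s_xy / (sd_x * sd_y)
r_xy

# Correlation -> covariance (needs the standard deviations as well)
r_xy * sd_x * sd_y
```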

If we consider the whole covariance matrix, we have to take each element in this matrix and divide it by the square roots of the diagonal elements in the same column and in the same row (see figure below).

The question is: how can we do all these calculations in one single step, for all elements in the covariance matrix, to calculate the corresponding correlation matrix?

If we have some memories of matrix algebra, we might remember that if we post-multiply a square matrix of order \(n \times n\) by a diagonal matrix of the same order, all elements in each column are multiplied by the diagonal element in the corresponding column:

\[\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1
\end{pmatrix}
\times
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 3 & 0 \\
0 & 0 & 0 & 4
\end{pmatrix}
=
\begin{pmatrix}
1 & 2 & 3 & 4 \\
1 & 2 & 3 & 4 \\
1 & 2 & 3 & 4 \\
1 & 2 & 3 & 4
\end{pmatrix}\]

If we reverse the order of factors, all elements in each row are multiplied by the diagonal element in the corresponding row:

\[
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 3 & 0 \\
0 & 0 & 0 & 4
\end{pmatrix}
\times
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 \\
2 & 2 & 2 & 2 \\
3 & 3 & 3 & 3 \\
4 & 4 & 4 & 4
\end{pmatrix}
\]
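
The two products are easy to reproduce in R. The sketch below uses a matrix of ones and a diagonal matrix with 1 to 4 on the diagonal; the names A and D are mine.

```r
A <- matrix(1, nrow = 4, ncol = 4)  # square matrix of ones
D <- diag(1:4)                      # diagonal matrix with 1, 2, 3, 4

# Post-multiplication: column j of A is multiplied by D[j, j]
A %*% D

# Pre-multiplication: row i of A is multiplied by D[i, i]
D %*% A

# Both at once: element (i, j) is multiplied by D[i, i] * D[j, j]
D %*% A %*% D
```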

Therefore, if we take a covariance matrix \(\Sigma\) of order \(n \times n\) and pre-multiply and post-multiply it by the same diagonal matrix of order \(n \times n\), each element in \(\Sigma\) is multiplied by both the diagonal elements in the same row and same column. If the diagonal matrix contains the inverses of the standard deviations, this is exactly what we are looking for.

In the code below, we:

1. Create a covariance matrix
2. Take the square roots of the diagonal elements (standard deviations) and load them in a diagonal matrix
3. Invert this diagonal matrix
4. Pre-multiply and post-multiply the covariance matrix by this diagonal matrix of inverse standard deviations

StDev <- sqrt(diag(Sigma))
StDevMat <- diag(StDev)
InvStDev <- solve(StDevMat)
InvStDev %*% Sigma %*% InvStDev
##            [,1]       [,2]       [,3]       [,4]
## [1,]  1.0000000 -0.8521620 -0.8475514 -0.7761684
## [2,] -0.8521620  1.0000000  0.9020329  0.8324475
## [3,] -0.8475514  0.9020329  1.0000000  0.7909486
## [4,] -0.7761684  0.8324475  0.7909486  1.0000000
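
Since the inverse of a diagonal matrix is simply the diagonal matrix of the reciprocals, the call to solve() can be skipped. The variant below is a sketch along those lines; it also restores the variable names, which are lost in the matrix products above. R2 is an illustrative name.

```r
data(mtcars)
Sigma <- cov(mtcars[, 1:4])
StDev <- sqrt(diag(Sigma))

# Diagonal matrix of inverse standard deviations, built directly
InvStDev <- diag(1 / StDev)

R2 <- InvStDev %*% Sigma %*% InvStDev
dimnames(R2) <- dimnames(Sigma)  # put the variable names back
R2

# Should agree with the correlations computed from the raw data
all.equal(R2, cor(mtcars[, 1:4]))
```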

Going from correlation to covariance can be done similarly, although, in this case, together with the correlation matrix we also need to have the standard deviations of the original variables, because they are not included in the matrix under transformation:

StDevMat %*% R %*% StDevMat
##             [,1]       [,2]       [,3]      [,4]
## [1,]   36.324103  -9.172379  -633.0972 -320.7321
## [2,]   -9.172379   3.189516   199.6603  101.9315
## [3,] -633.097208 199.660282 15360.7998 6721.1587
## [4,] -320.732056 101.931452  6721.1587 4700.8669
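
For repeated use, the reverse conversion can be wrapped in a small function. The helper below is hypothetical (cor2cov() is my own name, not a base R function) and is only a sketch of the pre- and post-multiplication shown above.

```r
# Hypothetical helper (not part of base R): correlation -> covariance
cor2cov <- function(R, sds) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R), length(sds) == nrow(R))
  D <- diag(sds, nrow = length(sds))  # nrow guards against the length-1 case
  V <- D %*% R %*% D
  dimnames(V) <- dimnames(R)          # keep the variable names
  V
}

data(mtcars)
matr <- mtcars[, 1:4]
Sigma2 <- cor2cov(cor(matr), sapply(matr, sd))

# Should agree with the covariances computed from the raw data
all.equal(Sigma2, cov(matr))
```

The standard deviations are passed as a separate argument because, as noted above, they cannot be recovered from the correlation matrix itself.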

Solutions with R

Are there other solutions for those who are not accustomed to matrix algebra? The easiest way to go from covariance to correlation is to use the cov2cor() function in the ‘stats’ package, which is loaded by default in every R session.

# From covariance to correlation
# (cov2cor() is in the 'stats' package; no library() call is needed)
cov2cor(Sigma)
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000

Another option, which works in both directions, is to sweep() twice:

# From covariance to correlation
sweep(sweep(Sigma, 1, StDev, FUN = "/"), 2, StDev, FUN = "/")
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000
# From correlation to covariance
sweep(sweep(R, 1, StDev, FUN = "*"), 2, StDev, FUN = "*")
##              mpg        cyl       disp        hp
## mpg    36.324103  -9.172379  -633.0972 -320.7321
## cyl    -9.172379   3.189516   199.6603  101.9315
## disp -633.097208 199.660282 15360.7998 6721.1587
## hp   -320.732056 101.931452  6721.1587 4700.8669

We can also scale() and t() twice, but it looks far less neat:

# From covariance to correlation
scale(t(scale(t(Sigma), center = FALSE, scale = StDev)),
      center = FALSE, scale = StDev)
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000
## attr(,"scaled:scale")
##        mpg        cyl       disp         hp
##   6.026948   1.785922 123.938694  68.562868
# From correlation to covariance
scale(t(scale(t(R), center = FALSE, scale = 1/StDev)),
      center = FALSE, scale = 1/StDev)
##              mpg        cyl       disp        hp
## mpg    36.324103  -9.172379  -633.0972 -320.7321
## cyl    -9.172379   3.189516   199.6603  101.9315
## disp -633.097208 199.660282 15360.7998 6721.1587
## hp   -320.732056 101.931452  6721.1587 4700.8669
## attr(,"scaled:scale")
##         mpg         cyl        disp          hp
## 0.165921457 0.559934979 0.008068505 0.014585154
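
One more possibility, offered only as a sketch, is to build the matrix of products of standard deviations with outer() and then divide or multiply element by element. SdProd, R_elem and Sigma_elem are illustrative names.

```r
data(mtcars)
Sigma <- cov(mtcars[, 1:4])
R <- cor(mtcars[, 1:4])
StDev <- sqrt(diag(Sigma))

# Element (i, j) of SdProd is the product of the i-th and j-th standard deviations
SdProd <- outer(StDev, StDev)

R_elem <- Sigma / SdProd    # covariance -> correlation
Sigma_elem <- R * SdProd    # correlation -> covariance

# Both should agree with the results obtained above
all.equal(R_elem, cov2cor(Sigma))
all.equal(Sigma_elem, Sigma)
```

Here the division and the multiplication are element-wise (/ and *, not %*%), so no matrix inversion is involved and the variable names are kept.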

Just curious whether you young students have some better solution; I am sure you have one! Please, drop me a line!

Happy coding!

Prof. Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
Send comments to: andrea.onofri@unipg.it

Continue reading: A trip from variance-covariance to correlation and back

Key Points

The primary focus of the article is on a fundamental aspect of statistical data analysis – the variance-covariance and the correlation matrices. These matrices are often used to describe the association between columns of a two-way data matrix, and their application is broad, spanning across sectors such as agriculture, biology, and ecology. The author discusses the ease with which these matrices can be calculated using the programming language, R.

The relationship between variance-covariance and correlation matrices is highlighted, with the author explaining how to convert one matrix into the other without reverting to the original data. This can be done using simple mathematical formulae. The article then delves into how this conversion can be achieved for the entire covariance matrix, providing a step-by-step guide through the process. The process involves matrix algebra and operations such as pre-multiplication and post-multiplication of the covariance matrix.

Long-term Implications and Future Developments

The understanding of how to work with variance-covariance and correlation matrices is essential for many data science, statistical, and research roles. Therefore, regular use and mastering of these tools can have significant long-term implications for workers in these fields. It can improve efficiency, provide a deeper understanding of the data, and contribute to more accurate research and data analysis results.

The article also properly documents the method, which benefits those looking to automate the process or incorporate it into a larger analytical framework. Future developments can include creating more efficient algorithms and R packages that handle these operations, saving time and computational power. Furthermore, as new statistical techniques emerge, the connections and interactions between variance-covariance and correlation might become even more critical.

Actionable Advice

As data professionals, it’s imperative to understand these mathematical concepts and their applications in R. Practice using the code illustrated in the article on your own dataset to understand how variance-covariance and correlation matrices work and how one can be derived from the other.

For those already familiar with the matrix algebra, it’s a great chance to reinforce and implement your knowledge. If you’re not, consider the opportunity to learn – understanding matrix operations will significantly benefit your future work in data science and statistics.

Lastly, check what R already offers before writing your own code. In this case, the cov2cor() function (in the ‘stats’ package, which is loaded by default) does the conversion from covariance to correlation in a single call. Built-in functions and well-tested packages can greatly simplify your work and make your code easier to read.

