library(ESRM64503)
library(kableExtra)
show_table(head(dataSAT))| id | SATV | SATM |
|---|---|---|
| 1 | 520 | 580 |
| 2 | 520 | 550 |
| 3 | 460 | 440 |
| 4 | 560 | 530 |
| 5 | 430 | 440 |
| 6 | 490 | 530 |
Matrix Algebra in R
Jihong Zhang*, Ph.D
Educational Statistics and Research Methods (ESRM) Program* University of Arkansas
October 7, 2024
🟢 Ready!
See the online syllabus here
Recommended for ESRM students and others interested in applied statistics in the social sciences.
| id | SATV | SATM |
|---|---|---|
| 1 | 520 | 580 |
| 2 | 520 | 550 |
| 3 | 460 | 440 |
| 4 | 560 | 530 |
| 5 | 430 | 440 |
| 6 | 490 | 530 |
Last several rows of the data:
| id | SATV | SATM | |
|---|---|---|---|
| 995 | 995 | 570 | 560 |
| 996 | 996 | 480 | 420 |
| 997 | 997 | 430 | 330 |
| 998 | 998 | 560 | 540 |
| 999 | 999 | 470 | 410 |
| 1000 | 1000 | 540 | 660 |
Relationship between SATV and SATM:
Matrix operations are fundamental to all modern statistical software.
When you installed R, it also came with required matrix algorithm libraries. Two popular ones are BLAS and LAPACK.
Other optimized libraries include OpenBLAS, AtlasBLAS, GotoBLAS, and Intel MKL.
{bash} Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
From the LAPACK website,
LAPACK is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems.
LAPACK routines are written so that as much as possible of the computation is performed by calls to the Basic Linear Algebra Subprograms (BLAS).
A matrix (denoted with a capital X) is composed of a set of elements.
matrix[rowIndex, columnIndex] to extract the element at rowIndex and columnIndex.[1] 3
[1] 5
[1] 3 4
[1] 1 3 5
In statistics, we use xij to represent the element in the ith row and jth column. For example, consider a matrix X with 1,000 rows and 2 columns:
The first subscript is the index of the rows
The second subscript is the index of the columns
X=x11x21…x1000,1x12x22…x1000,2
A scalar is just a single number
The name scalar is important: the number “scales” a vector—it can make a vector longer or shorter.
Scalars are typically written without boldface:
x11=520
Each element of a matrix is a scalar.
Matrices can be multiplied by a scalar so that each element is multiplied by that scalar.
X=520520⋮540580550⋮660
XT=[520580520550⋯⋯540660]
An element xij in the original matrix X becomes xji in the transposed matrix XT.
Transposes are used to align matrices for operations where the sizes of matrices matter (such as matrix multiplication)
Square Matrix: A square matrix has the same number of rows and columns.
Diagonal Matrix: A diagonal matrix is a square matrix with nonzero diagonal elements (xij=0 for i=j) and zeros on the off-diagonal elements (xij=0 for i=j):
A=2.7580001.6430000.879
Symmetric Matrix: A symmetric matrix is a square matrix where all elements are reflected across the diagonal (xij=xji).
y=a1x1+a2x2+⋯+akxk=A−1X
Here, y is the linear combination
For all k vectors, the set of all possible linear combinations is called their span
This is typically not emphasized in most analyses—but when working with latent variables it becomes important.
In data, linear combinations occur frequently:
Linear models (i.e., Regression and ANOVA)
Principal components analysis
Question: Do generalized linear models contain linear combinations? True; link function + linear predictor.
An important concept in vector geometry is the inner product of two vectors.
a⋅b=a11b11+a21b21+⋯+aN1bN1=i=1∑Nai1bi1
[,1]
[1,] 1
[2,] 2
[,1]
[1,] 2
[2,] 3
[,1]
[1,] 8
[,1]
[1,] 8
Using our example data dataSAT,
Use cbind() to combine by columns and rbind() to combine by rows. Dimensions must be conformable:
cbind requires the same number of rows.rbind requires the same number of columns.Compute the dot product of two column vectors and verify it equals the corresponding element of a matrix product.
Define two vectors a = [1, 3, 5]^T and b = [2, 4, 6]^T as 3x1 matrices.
Compute the dot product a ⋅ b in two ways:
crossprod(a, b)t(a) %*% bA = [a a] (3x2) and B = [b b] (3x2). Compute t(A) %*% B. Which entry of this 2x2 result equals a ⋅ b?The dot product is a ⋅ b = 1*2 + 3*4 + 5*6 = 44.
Both crossprod(a, b) and t(a) %*% b return 44.
Forming A = [a a] and B = [b b], the product t(A) %*% B is a 2x2 matrix where every entry equals a ⋅ b = 44, because each entry is the dot product between a column of A and a column of B.
A matrix can be thought of as a collection of vectors.
df$[name] or matrix[, index] to extract a single vector.Matrix algebra defines a set of operations and entities on matrices.
Definitions:
Identity matrix
Zero vector
Ones vector
Basic Operations:
Addition
Subtraction
Multiplication
“Division”
Matrix addition and subtraction are much like vector addition and subtraction.
Rules: Matrices must be the same size (rows and columns).
Be careful! R may not produce an error message when adding a matrix and a vector.
Method: the new matrix is constructed by element-by-element addition/subtraction of the previous matrices.
Order: the order of the matrices (pre- and post-) does not matter.
[,1] [,2]
[1,] 1 2
[2,] 3 4
[,1] [,2]
[1,] 5 6
[2,] 7 8
[,1] [,2]
[1,] 6 8
[2,] 10 12
[,1] [,2]
[1,] -4 -4
[2,] -4 -4
Error in A + C: non-conformable arrays
A(r×c)B(c×k)=C(r×k)
Rules: The pre-multiplying matrix must have a number of columns equal to the number of rows of the post-multiplying matrix.
Method: the elements of the new matrix consist of the inner (dot) products of the row vectors of the pre-multiplying matrix and the column vectors of the post-multiplying matrix.
Order: The order of the matrices matters.
R: Use the %*% operator or crossprod to perform matrix multiplication.
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[,1] [,2]
[1,] 5 6
[2,] 7 8
[3,] 9 10
[,1] [,2]
[1,] 46 52
[2,] 109 124
[,1] [,2] [,3]
[1,] 29 40 51
[2,] 39 54 69
[3,] 49 68 87
X(n×k)c=X(n×k)−I(n×1)∗Xˉ(1×k) where n is the number of cases (sample size) and k is the number of variables.
Sect1 Sect2
[1,] 41.4 -20
[2,] 23.4 -14
[3,] -16.6 -28
[4,] 26.4 5
[5,] 33.4 36
[6,] 6.4 -31
[7,] -45.6 -27
[8,] -26.6 52
[9,] -11.6 49
[10,] -30.6 -22
datSAT.X_centerround(colMeans(X_center), 3) to see if the means are 0sThe identity matrix (denoted as I) is a matrix that, when pre- or post-multiplied by another matrix, results in the original matrix:
AI=A
IA=A
The identity matrix is a square matrix that has:
Diagonal elements = 1
Off-diagonal elements = 0
I(3×3)=100010001
R: Create an identity matrix using the diag() function.
The zero and one vectors are column vectors of zeros and ones, respectively:
0(3×1)=000
1(3×1)=111
When pre- or post-multiplied with a matrix (A), the result with the zero vector is:
A0=0
0TA=0
R:
Division from algebra:
First: ba=b−1a
Second: ba=1
“Division” in matrices serves a similar role.
For square symmetric matrices, an inverse matrix is a matrix that, when pre- or post-multiplied with another matrix, produces the identity matrix:
A−1A=I
AA−1=I
R: Use solve() to calculate the matrix inverse.
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
In data analysis, the inverse shows up constantly in statistics.
Using our SAT example:
Our data matrix has size (1000×2), which is not invertible.
However, XTX has size (2×2)—square and symmetric.
SATV SATM
SATV 251797800 251928400
SATM 251928400 254862700
(A+B)+C=A+(B+C)
A+B=B+A
c(A+B)=cA+cB
(c+d)A=cA+dA
(A+B)T=AT+BT
(cd)A=c(dA)
(cA)T=cAT
c(AB)=(cA)B
A(BC)=(AB)C
We end our matrix discussion with some advanced topics.
To help us throughout, consider the correlation matrix of our SAT data:
SATV SATM
SATV 1.0000000 0.7752238
SATM 0.7752238 1.0000000
R=[1.000.780.781.00]
For a square matrix A with p rows/columns, the matrix trace is the sum of the diagonal elements:
trA=i=1∑paii
In R, we can use tr() in psych package to calculate matrix trace
For our data, the trace of the correlation matrix is 2
For all correlation matrices, the trace is equal to the number of variables
The trace is considered as the total variance in multivariate statistics
A square matrix can be characterized by a scalar value called a determinant:
detA=∣A∣
Manual calculation of the determinant is tedious. In R, we use det() to calculate matrix determinant
The determinant is useful in statistics:
Shows up in multivariate statistical distributions
Is a measure of “generalized” variance of multiple variables
If the determinant is positive, the matrix is called positive definite → the matrix has an inverse
If the determinant is zero, the matrix is called singular matrix → the matrix does not have an inverse, cannot used for multivariate analysis.
If the determinant is not positive, the matrix is called non-positive definite → the matrix has an inverse but very rare in multivariate analysis, maybe due to the mistakes in your data.
Matrices show up nearly anytime multivariate statistics are used, often in the help/manual pages of the package you intend to use for analysis
You don’t have to do matrix algebra, but please do try to understand the concepts underlying matrices
Your working with multivariate statistics will be better off because of even a small amount of understanding
The covariance matrix S is found by:
S=N−11(X−1⋅xˉT)T(X−1⋅xˉT)
SATV SATM
SATV 2479.817 3135.359
SATM 3135.359 6596.303
SATV SATM
SATV 2479.817 3135.359
SATM 3135.359 6596.303
R=D−1SD−1 S=DRD
SATV SATM
SATV 2479.817 3135.359
SATM 3135.359 6596.303
[,1] [,2]
[1,] 49.79777 0.00000
[2,] 0.00000 81.21763
[,1] [,2]
[1,] 1.0000000 0.7752238
[2,] 0.7752238 1.0000000
SATV SATM
SATV 1.0000000 0.7752238
SATM 0.7752238 1.0000000
Generalized Sample Variance=∣S∣
It is a measure of spread across all variables
Reflecting how much overlapping area (covariance) across variables relative to the total variances occurs in the sample
Amount of overlap reduces the generalized sample variance
[1] 6527152
SATV SATM
SATV 2479.817 0.000
SATM 0.000 6596.303
[1] 16357628
[1] 0.399028
SATV SATM
SATV 2479.817 16357628.070
SATM 16357628.070 6596.303
[1] -2.67572e+14
The generalized sample variance is:
The total sample variance is the sum of the variances of each variable in the sample.
Total Sample Variance=v=1∑Vsxi2=trS
Total sample variance for our SAT example:
The total sample variance does not take into consideration the covariances among the variables.
f(xp)=(2π)2V∣Σ∣211exp[−2(xpT−μ)TΣ−1(xpT−μ)]
Where V represents the number of variables and the highlighted term is the Mahalanobis distance.
We use MVN(μ,Σ) to represent a multivariate normal distribution with mean vector μ and covariance matrix Σ.
Similar to the squared error in the univariate case, we can calculate the squared Mahalanobis distance for each observed individual in the context of a multivariate distribution.
d2(xp)=(xpT−μ)TΣ−1(xpT−μ)
mahalanobis with a data vector (x), mean vector (center), and covariance matrix (cov) to calculate the squared Mahalanobis distance for one individual.SATV SATM
520 580
[1] 1.346228
[1] 0.4211176
[1] 0.6512687
[,1]
[1,] 1.346228
The multivariate normal distribution has useful properties that appear in statistical methods:
If X is distributed multivariate normally:
Similar to other distribution functions, use dmvnorm to get the density given the observations and the parameters (mean vector and covariance matrix). rmvnorm generates multiple samples from the distribution.
SATV SATM
499.32 498.27
SATV SATM
SATV 2479.817 3135.359
SATM 3135.359 6596.303
[1] 3.177814e-05
[1] 5.046773e-05
[1] -10682.62
| SATV | SATM |
|---|---|
| 537.0160 | 547.8028 |
| 387.0979 | 229.2198 |
| 571.3216 | 487.6030 |
| 535.4205 | 419.4245 |
| 497.6253 | 531.8480 |
| 419.4548 | 433.5945 |
| 467.1004 | 465.8834 |
| 587.4870 | 650.3022 |
| 517.0927 | 445.2122 |
| 467.8157 | 551.3610 |
| 499.3387 | 511.4012 |
| 517.7979 | 551.5509 |
| 549.7992 | 614.0548 |
| 521.5178 | 428.4098 |
| 495.9478 | 332.4840 |
| 496.0289 | 557.1868 |
| 439.6495 | 294.9847 |
| 577.3656 | 610.7236 |
| 510.8110 | 510.8675 |
| 512.4274 | 455.9697 |
Using a small practice matrix and the tools from this lecture, complete the following steps. Fill in the underlined placeholders where indicated.
X with columns SATV and SATM (as a numeric matrix). Compute:XBAR and covariance matrix S.R and its trace tr(R).X to Xc = X - ____ %*% t(____) using a ones vector and the mean vector.colMeans(Xc) is (approximately) ____.t(Xc) %*% Xc / (____ - 1) equals ____.XtX = crossprod(____) and its inverse XtX_inv = solve(____).round(XtX_inv %*% ____, 6) equals the identity.X using mahalanobis with center ____ and cov ____.XBAR is the 2x1 mean vector of SATV and SATM; S is their 2x2 sample covariance matrix; R is the correlation matrix and tr(R) equals the sum of its diagonal (2.0 for perfectly standardized variables; here it depends on scaling).colMeans(Xc) approximately zero; and t(Xc) %*% Xc / (n - 1) equals S by definition of sample covariance.XtX_inv %*% XtX equals the identity (up to rounding) when XtX is invertible.mahalanobis(X, XBAR, S) returns squared distances used in multivariate normal contexts; the density plot visualizes their distribution.We are now ready to discuss multivariate models and the art/science of multivariate modeling.
Many of the concepts from univariate models carry over:
Matrix algebra was necessary to concisely describe our distributions (which will soon be models).
Understanding the multivariate normal distribution is essential, as it is the most commonly used distribution for estimating multivariate models.
Next class, we will return to data analysis—for multivariate observations—using R’s lavaan package for path analysis.