---
title: "Lecture 07: Matrix Algebra"
subtitle: "Matrix Algebra in R"
author: "Jihong Zhang*, Ph.D"
institute: |
Educational Statistics and Research Methods (ESRM) Program*
University of Arkansas
date: "2024-10-07"
sidebar: false
execute:
echo: true
warning: false
output-location: column
code-annotations: below
format:
uark-revealjs:
scrollable: true
chalkboard: true
embed-resources: false
code-fold: false
number-sections: false
footer: "ESRM 64503 - Lecture 07: Matrix Algebra"
slide-number: c/t
tbl-colwidths: auto
output-file: slides-index.html
html:
page-layout: full
toc: true
toc-depth: 2
toc-expand: true
lightbox: true
code-fold: false
fig-align: center
filters:
- quarto
- line-highlight
---
## Today's Class
- Matrix Algebra
- Multivariate Normal Distribution
- Multivariate Linear Analysis
## Graduate Certificate in ESRM Program
1. See link [here](https://esrm.uark.edu/certificates/index.php)
# A Brief Introduction to Matrices
## Today's Example Data
- Imagine that I collected SAT test scores for both the Math (SATM) and Verbal (SATV) sections from 1,000 students
```{r}
#| output-location: default
library(ESRM64503)
library(kableExtra)
show_table(head(dataSAT))
show_table(tail(dataSAT))
```
```{r}
plot(dataSAT$SATV, dataSAT$SATM)
```
## Background
- Matrix operations are fundamental to all modern statistical software.
- When you install R, it also comes with the required matrix algebra **libraries**. Two popular ones are **BLAS** and **LAPACK**
- Other optimized libraries include OpenBLAS, ATLAS, GotoBLAS, and Intel MKL
```
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
```
- From the LAPACK [website](https://www.netlib.org/lapack/),
> **LAPACK** is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems.
>
> LAPACK routines are written so that as much as possible of the computation is performed by calls to the Basic Linear Algebra Subprograms (**BLAS**).
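To check which BLAS/LAPACK build your own installation links to, `sessionInfo()` typically reports it near the top of its output (the path above is from one macOS setup; yours will differ):

```{r}
#| eval: false
sessionInfo() # look for the "Matrix products:", "BLAS:" and "LAPACK:" lines
```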
## Matrix Elements
- A matrix (denoted by a capital letter, e.g., **X**) is composed of a set of elements
- Each element is denoted by its position in the matrix (row and column)
```{r}
X = matrix(c(
1, 2,
3, 4,
5, 6
), nrow = 3, byrow = TRUE)
X
```
```{r}
dim(X) # Number of rows and columns
```
- In R, we use `matrix[rowIndex, columnIndex]` to extract the element with the position of rowIndex and columnIndex
```{r}
#| results: hold
#| output-location: column
X[2, 1]
X[3] # No comma in the bracket will output the element in column-wise order
X[2, ] # 2nd row vector
X[, 1] # 1st column vector
```
- In statistics, we use $x_{ij}$ to represent the element in the *i*th row and *j*th column. For an example matrix $\mathbf{X}$ of size 1000 rows by 2 columns.
- The first subscript is the index of the rows
- The second subscript is the index of the columns
$$
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12}\\
x_{21} & x_{22}\\
\dots & \dots \\
x_{1000, 1} & x_{1000,2}
\end{bmatrix}
$$
## Scalars
- A scalar is just a single number
- The name scalar is important: the number "scales" a vector – it can make a vector "longer" or "shorter".
- Scalars are typically written without boldface:
$$
x_{11} = 520
$$
- Each element of a matrix is a scalar.
- Matrices can be multiplied by a scalar so that each element is multiplied by that scalar
```{r}
3 * X
```
## Matrix Transpose
- The transpose of a matrix is a reorganization of the matrix by switching the indices for the rows and columns
$$
\mathbf{X} = \begin{bmatrix}
520 & 580\\
520 & 550\\
\vdots & \vdots\\
540 & 660\\
\end{bmatrix}
$$
$$
\mathbf{X}^T = \begin{bmatrix}
520 & 520 & \cdots & 540\\
580 & 550 & \cdots & 660
\end{bmatrix}
$$
- An element $x_{ij}$ in the original matrix $\mathbf{X}$ becomes element $x_{ji}$ in the transposed matrix $\mathbf{X}^T$
- **Transposes are used to align matrices for operations where the sizes of matrices matter (such as matrix multiplication)**
```{r}
t(X)
```
## Types of Matrices
- **Square Matrix:** A square matrix has the same number of rows and columns
- Correlation / covariance matrices are square matrices
- **Diagonal Matrix**: A diagonal matrix is a square matrix with non-zero diagonal elements ($x_{ij}\neq0$ for $i=j$) and zeros on the off-diagonal elements ($x_{ij} =0$ for $i\neq j$):
$$
\mathbf{A} = \begin{bmatrix}
2.758 & 0 & 0 \\
0 & 1.643 & 0 \\
0 & 0 & 0.879\\
\end{bmatrix}
$$
- We will use diagonal matrices to transform correlation matrices to covariance matrices
```{r}
vars = c(2.758, 1.643, 0.879)
diag(vars)
```
- **Symmetric Matrix**: A symmetric matrix is a square matrix where all elements are reflected across the diagonal ($x_{ij} = x_{ji}$)
- Correlation and covariance matrices are symmetric matrices
- [**Question**: Is a diagonal matrix always a symmetric matrix?]{.underline} [True]{.mohu} (see the quick check below)
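A quick check in R (a minimal sketch with a hypothetical matrix; `isSymmetric()` is base R):

```{r}
S_demo <- matrix(c(1.0, 0.5,
                   0.5, 1.0), nrow = 2, byrow = TRUE) # hypothetical symmetric matrix
isSymmetric(S_demo)      # TRUE: the matrix equals its own transpose
all(S_demo == t(S_demo)) # the same check done by hand
```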
## Linear Combinations
- Addition of a set of vectors (all multiplied by scalars) is called a linear combination:
$$
\mathbf{y} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_k\mathbf{x}_k
$$
- Here, $\mathbf{y}$ is the linear combination
- For all *k* vectors, the set of all possible linear combinations is called their **span**
- The span is typically not thought about in most analyses, but it becomes somewhat important when working with things that don't exist directly in the data (latent variables)
- **In Data**, linear combinations happen frequently:
- Linear models (i.e., Regression and ANOVA)
- Principal components analysis
- **Question**: Does a generalized linear model contain linear combinations? [Yes: a link function applied to a linear combination]{.mohu}. (A minimal sketch of a linear combination follows.)
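A minimal sketch of a linear combination in R (the scalars `a1`, `a2` and vectors `x1`, `x2` below are arbitrary, chosen only for illustration):

```{r}
a1 <- 2; a2 <- -1                  # scalars
x1 <- c(1, 0, 3); x2 <- c(2, 2, 2) # vectors
y <- a1 * x1 + a2 * x2             # the linear combination a1*x1 + a2*x2
y
```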
## Inner (Dot/Cross-) Product of Vectors
- An important concept in vector geometry is that of the inner product of two vectors
- The inner product is also called the dot product
$$
\mathbf{a} \cdot \mathbf{b} = a_{11}b_{11}+a_{21}b_{21}+\cdots+ a_{N1}b_{N1} = \sum_{i=1}^N{a_{i1}b_{i1}}
$$
```{r}
#| results: hold
x = matrix(c(1, 2), ncol = 1)
y = matrix(c(2, 3), ncol = 1)
crossprod(x, y) # R function for dot product of x and y
t(x) %*% y
```
> This is formally equivalent to (but usually slightly faster than) the call `t(x) %*% y` (`crossprod`) or `x %*% t(y)` (`tcrossprod`).
Using our **example data `dataSAT`**,
```{r}
crossprod(dataSAT$SATV, dataSAT$SATM) # x and y could be variables in our data
```
- **In Data**: the angle between vectors is related to the correlation between variables and the projection is related to regression/ANOVA/linear models
# Matrix Algebra
## Moving from Vectors to Matrices
- A matrix can be thought of as a collection of vectors
- In R, we use `df$name` or `matrix[, index]` to extract a single vector (see the sketch after this list)
- Matrix algebra defines a set of operations and entities on matrices
- I will present a version meant to mirror your previous algebra experiences
- Definitions:
- Identity matrix
- Zero vector
- Ones vector
- Basic Operations:
- Addition
- Subtraction
- Multiplication
- "Division"
## Matrix Addition and Subtraction
- Matrix addition and subtraction are much like vector addition / subtraction
- **Rules**: Matrices must be the same size (rows and columns)
- [Be careful!! R may not show an error message for matrix + vector; it recycles the vector instead!]{style="color: red"}
```{r}
#| output-location: column
#| results: hold
A = matrix(c(1, 2, 3, 4), nrow = 2, byrow = T)
B = c(1, 2)
A
B
A+B
```
- **Method**: the new matrix is constructed of element-by-element addition/subtraction of the previous matrices
- **Order**: the order of the matrices (pre- and post-) does not matter
```{r}
#| error: true
#| output-location: default
A = matrix(c(1, 2, 3, 4), nrow = 2, byrow = T)
B = matrix(c(5, 6, 7, 8), nrow = 2, byrow = T)
A
B
A + B
A - B
```
## Matrix Multiplication
- **The new matrix** has the same [number of rows as the pre-multiplying]{style="color: tomato; font-weight: bold"} matrix and the same [number of columns as the post-multiplying]{style="color: royalblue; font-weight: bold"} matrix
$$
\mathbf{A}_{(r \times c)} \mathbf{B}_{(c\times k)} = \mathbf{C}_{(r\times k)}
$$
- **Rules**: The pre-multiplying matrix must have a number of columns equal to the number of rows of the post-multiplying matrix
- **Method**: the elements of the new matrix consist of the inner (dot) product of [the row vectors of the pre-multiplying matrix]{style="color: tomato; font-weight: bold"} and [the column vectors of the post-multiplying matrix]{style="color: royalblue; font-weight: bold"}
- **Order**: The order of the matrices matters
- **R**: use the `%*%` operator (or `crossprod(A, B)`, which computes $\mathbf{A}^T\mathbf{B}$) to perform matrix multiplication
```{r}
#| output-location: default
A = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = T)
B = matrix(c(5, 6, 7, 8, 9, 10), nrow = 3, byrow = T)
A
B
A %*% B
B %*% A
```
- **Example**: The inner product of A's 1st row vector and B's 1st column vector equals AB's first element
```{r}
#| output-location: default
crossprod(A[1, ], B[, 1])
(A%*%B)[1, 1]
```
## Identity Matrix
- The identity matrix (denoted as $\mathbf{I}$) is a matrix that, when pre- or post-multiplied by another matrix, returns that original matrix:
$$
\mathbf{A}\mathbf{I} = \mathbf{A}
$$
$$
\mathbf{I}\mathbf{A}=\mathbf{A}
$$
- The identity matrix is a square matrix that has:
- Diagonal elements = 1
- Off-diagonal elements = 0
$$
\mathbf{I}_{(3 \times 3)} = \begin{bmatrix}
1&0&0\\
0&1&0\\
0&0&1\\
\end{bmatrix}
$$
- **R**: we can create an identity matrix using `diag`
```{r}
diag(nrow = 3)
```
## Zero and One Vector
- The zero vector and the ones vector are column vectors of all zeros and all ones, respectively:
$$
\mathbf{0}_{(3\times 1)} = \begin{bmatrix}0\\0\\0\end{bmatrix}
$$
$$
\mathbf{1}_{(3\times 1)} = \begin{bmatrix}1\\1\\1\end{bmatrix}
$$
- When a matrix ($\mathbf{A}$) is pre- or post-multiplied by the zero vector, the result is zero:
$$
\mathbf{A0=0}
$$
$$
\mathbf{0^TA=0}
$$
- **R:**
```{r}
#| output-location: default
zero_vec <- matrix(0, nrow = 3, ncol = 1)
crossprod(B, zero_vec)
one_vec <- matrix(1, nrow = 3, ncol = 1)
crossprod(B, one_vec) # column-wise sums
```
## Matrix "Division": The Inverse Matrix
- Division from algebra:
  - First: $\frac{a}{b} = b^{-1}a$
  - Second: $b^{-1}b = 1$
- "Division" in matrices serves a similar role
- For [**square symmetric**]{style="color: tomato; font-weight: bold"} matrices, an inverse matrix is a matrix that, when pre- or post-multiplied with the original matrix, produces the identity matrix:
$$
\mathbf{A^{-1}A=I}
$$
$$
\mathbf{AA^{-1}=I}
$$
- **R:** use `solve()` to calculate the matrix inverse
```{r}
A <- matrix(rlnorm(9), 3, 3, byrow = T)
round(solve(A) %*% A, 3)
```
- **Caution**: The calculation is complicated; even computers can have a tough time. Not all matrices can be inverted:
```{r}
#| error: true
#| results: hold
A <- matrix(2:10, nrow = 3, ncol = 3, byrow = T)
A
solve(A)%*%A
```
## Example: the inverse of a variance-covariance matrix
- In data: the inverse shows up constantly in statistics
- Models which assume some type of (multivariate) normality need an inverse covariance matrix
- Using our SAT example
- Our data matrix was size ($1000\times 2$), which is not invertible
- However, $\mathbf{X^TX}$ was size ($2\times 2$) – square and symmetric
```{r}
X = as.matrix(dataSAT[, c("SATV", "SATM")])
crossprod(X, X)
```
- The inverse $\mathbf{(X^TX)^{-1}}$ is
```{r}
solve(crossprod(X, X))
```
## Matrix Algebra Operations
::: columns
::: column
- $\mathbf{(A+B)+C=A+(B+C)}$
- $\mathbf{A+B=B+A}$
- $c(\mathbf{A+B})=c\mathbf{A}+c\mathbf{B}$
- $(c+d)\mathbf{A} = c\mathbf{A} + d\mathbf{A}$
- $\mathbf{(A+B)^T=A^T+B^T}$
- $(cd)\mathbf{A}=c(d\mathbf{A})$
- $(c\mathbf{A})^{T}=c\mathbf{A}^T$
- $c\mathbf{(AB)} = (c\mathbf{A})\mathbf{B}$
- $\mathbf{A(BC) = (AB)C}$
:::
::: column
- $\mathbf{A(B+C)=AB+AC}$
- $\mathbf{(AB)}^T=\mathbf{B}^T\mathbf{A}^T$
:::
:::
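The identities above can be spot-checked numerically; here is a minimal sketch (small arbitrary matrices, not from the slides) verifying $\mathbf{(AB)^T=B^TA^T}$ and $\mathbf{A(BC)=(AB)C}$:

```{r}
A <- matrix(1:6,  nrow = 2)  # 2 x 3
B <- matrix(1:12, nrow = 3)  # 3 x 4
C <- matrix(1:8,  nrow = 4)  # 4 x 2
all.equal(t(A %*% B), t(B) %*% t(A))        # (AB)^T = B^T A^T
all.equal(A %*% (B %*% C), (A %*% B) %*% C) # A(BC) = (AB)C
```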
## Advanced Matrix Functions/Operations
- We end our matrix discussion with some advanced topics
- To help us throughout, let's consider the correlation matrix of our SAT data:
```{r}
R <- cor(dataSAT[, c("SATV", "SATM")])
R
```
$$
R = \begin{bmatrix}1.00 & 0.78 \\ 0.78 & 1.00\end{bmatrix}
$$
## Matrix Trace
- For a square matrix $\mathbf{A}$ with *p* rows/columns, the matrix trace is the sum of the diagonal elements:
$$
\text{tr}\mathbf{A} = \sum_{i=1}^{p} a_{ii}
$$
- In R, we can use `tr()` in `psych` package to calculate matrix trace
- For our data, the trace of the correlation matrix is 2
- For all correlation matrices, **the trace is equal to the number of variables**
```{r}
psych::tr(R)
```
- The trace is considered the total variance in multivariate statistics
- Used as a target to recover when applying statistical models
## Matrix Determinants
- A square matrix can be characterized by a scalar value called a determinant:
$$
\text{det}\mathbf{A} =|\mathbf{A}|
$$
- Manual calculation of the determinant is tedious. In R, we use `det()` to calculate matrix determinant
```{r}
det(R)
```
- The determinant is useful in statistics:
- Shows up in multivariate statistical distributions
- Is a measure of "generalized" variance of multiple variables
- For a covariance or correlation matrix, if the determinant is positive, the matrix is called **positive definite** $\rightarrow$ the matrix has an inverse
- If the determinant is not positive, the matrix is called **non-positive definite** $\rightarrow$ the matrix does not have an inverse (see the sketch below)
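To build some intuition (a small sketch with hypothetical correlation values): for a $2 \times 2$ correlation matrix the determinant is $1 - r^2$, so it shrinks toward zero (no inverse in the limit of perfect correlation) as the correlation approaches 1:

```{r}
# determinant of a 2 x 2 correlation matrix for several hypothetical r values
sapply(c(0, 0.5, 0.78, 0.99), function(r) det(matrix(c(1, r, r, 1), nrow = 2)))
```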
## Wrap Up
1. Matrices show up nearly anytime multivariate statistics are used, often in the help/manual pages of the package you intend to use for analysis
2. You don't have to do matrix algebra, but please do try to understand the concepts underlying matrices
3. Your work with multivariate statistics will be better off with even a small amount of understanding
# Multivariate Normal Distribution
## Covariance and Correlation in Matrices
- The covariance matrix $\mathbf{S}$ is found by:
$$
\mathbf{S}=\frac{1}{N-1} \mathbf{(X-1\cdot\bar x^T)^T(X-1\cdot\bar x^T)}
$$
```{r}
X = as.matrix(dataSAT[,c("SATV", "SATM")])
N = nrow(X)
XBAR = matrix(colMeans(X), ncol = 1)
ONES = matrix(1, nrow = nrow(X))
S = 1/(N-1) * t(X - ONES%*% t(XBAR)) %*% (X - ONES%*% t(XBAR))
S
cov(X)
```
## From Covariance to Correlation
- If we take the SDs (the square roots of the diagonal of the covariance matrix) and put them into a diagonal matrix $\mathbf{D}$, the correlation matrix is found by:
$$
\mathbf{R = D^{-1}SD^{-1}}
$$ $$
\mathbf{S = DRD}
$$
```{r}
#| output-location: default
S
D = sqrt(diag(diag(S)))
D
R = solve(D) %*% S %*% solve(D)
R
cor(X)
```
## Generalized Variance
- The determinant of the covariance matrix is called **generalized variance**
$$
\text{Generalized Sample Variance} = |\mathbf{S}|
$$
- It is a measure of spread across all variables
- It reflects how much overlap (covariance) across variables, relative to the total variances, occurs in the sample
- A greater amount of overlap reduces the generalized sample variance
```{r}
#| output-location: default
gsv = det(S)
gsv
# If no correlation
S_noCorr = S
S_noCorr[upper.tri(S_noCorr)] = S_noCorr[lower.tri(S_noCorr)] = 0
S_noCorr
gsv_noCorr <- det(S_noCorr)
gsv_noCorr
gsv / gsv_noCorr
# If correlation = 1
S_PerfCorr = S
S_PerfCorr[upper.tri(S_PerfCorr)] = S_PerfCorr[lower.tri(S_PerfCorr)] = sqrt(prod(diag(S))) # covariance when r = 1 is the product of the SDs, i.e., sqrt(var1 * var2)
S_PerfCorr
gsv_PefCorr <- det(S_PerfCorr)
gsv_PefCorr
```
- The generalized sample variance is:
- Largest when variables are uncorrelated
- Zero when variables form a linear dependency
## Total Sample Variance
- The total sample variance is the sum of the variances of each variable in the sample
- The sum of the diagonal elements of the sample covariance matrix
- The trace of the sample covariance matrix
$$
\text{Total Sample Variance} = \sum_{v=1}^{V} s^2_{x_v} = \text{tr}\mathbf{S}
$$
Total sample variance for our SAT example:
```{r}
sum(diag(S))
```
- The total sample variance does not take into consideration the covariances among the variables (quick check below)
- It will not equal zero if a linear dependency exists
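A quick check using the `S` and `S_noCorr` matrices computed on the generalized variance slide: zeroing the covariances changes the determinant but leaves the trace untouched:

```{r}
sum(diag(S))        # total sample variance from the full covariance matrix
sum(diag(S_noCorr)) # identical: covariances never enter the trace
```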
## Multivariate Normal Distribution and Mahalanobis Distance
- The PDF of the multivariate normal distribution is very similar to that of the univariate normal distribution
$$
f(\mathbf{x}_p) = \frac{1}{(2\pi)^{\frac{V}2}|\mathbf{\Sigma}|^{\frac12}}\exp[-\frac{\color{tomato}{(x_p^T - \mu)^T \mathbf{\Sigma}^{-1}(x_p^T-\mu)}}{2}]
$$
Where $V$ represents the number of variables and the highlighted term is the (squared) [Mahalanobis Distance]{style="color: tomato"}.
- We use $MVN(\mathbf{\mu, \Sigma})$ to represent a multivariate normal distribution with mean vector as $\mathbf{\mu}$ and covariance matrix as $\mathbf{\Sigma}$
- Similar to the squared error from the mean in a univariate distribution, we can calculate the squared Mahalanobis distance for each observed individual in the context of a multivariate distribution
$$
d^2(x_p) = (x_p^T - \mu)^T \Sigma^{-1}(x_p^T-\mu)
$$
- In R, we can use `mahalanobis()` with the data vector (`x`), mean vector (`center`), and covariance matrix (`cov`) to calculate the **squared Mahalanobis Distance** for one individual
```{r}
#| output-location: default
x_p <- X[1, ]
x_p
mahalanobis(x = x_p, center = XBAR, cov = S)
mahalanobis(x = X[2, ], center = XBAR, cov = S)
mahalanobis(x = X[3, ], center = XBAR, cov = S)
# Alternatively,
t(x_p - XBAR) %*% solve(S) %*% (x_p - XBAR)
```
```{r}
mh_dist_all <- apply(X, 1, \(x) mahalanobis(x, center = XBAR, cov = S))
plot(density(mh_dist_all))
```
## Multivariate Normal Properties
- The multivariate normal distribution has some useful properties that show up in statistical methods
- If $\mathbf{X}$ is distributed multivariate normally:
1. Linear combinations of $\mathbf{X}$ are normally distributed (see the sketch after this list)
2. All subsets of $\mathbf{X}$ are multivariate normally distributed
3. A zero covariance between a pair of variables of $\mathbf{X}$ implies that the variables are independent
4. Conditional distributions of $\mathbf{X}$ are multivariate normal
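A minimal simulation sketch illustrating property 1 (the parameters `mu_demo` and `Sig_demo` are arbitrary; the `mvtnorm` package used on the next slide is assumed):

```{r}
library(mvtnorm)
mu_demo  <- c(0, 0)                                  # hypothetical mean vector
Sig_demo <- matrix(c(1, 0.7, 0.7, 1), nrow = 2)      # hypothetical covariance matrix
Z <- rmvnorm(5000, mean = mu_demo, sigma = Sig_demo) # draws from the MVN
comb <- 2 * Z[, 1] - Z[, 2]                          # a linear combination of the variables
hist(comb, breaks = 50, main = "A linear combination of MVN variables is normal")
```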
## How to use Multivariate Normal Distribution in R
Similar to other distribution functions, we use `dmvnorm` (from the `mvtnorm` package) to get the density given the observations and the parameters (mean vector and covariance matrix); `rmvnorm` generates random samples from the distribution
```{r}
#| output-location: default
library(mvtnorm)
(mu <- colMeans(dataSAT[, 2:3]))
S
dmvnorm(X[1, ], mean = mu, sigma = S)
dmvnorm(X[2, ], mean = mu, sigma = S)
## Total Log Likelihood
LL <- sum(log(apply(X, 1, \(x) dmvnorm(x, mean = mu, sigma = S))))
LL
## Generate samples from MVN
rmvnorm(20, mean = mu, sigma = S) |> show_table()
```
## Wrapping Up
1. We are now ready to discuss multivariate models and the art/science of multivariate modeling
2. Many of the concepts of univariate models carry over
- Maximum likelihood
- Model building via nested models
- All of the concepts involve multivariate distributions
3. Matrix algebra was necessary so as to concisely talk about our distributions (which will soon be models)
4. The multivariate normal distribution will be necessary to understand as it is the most commonly used distribution for estimation of multivariate models
5. Next class we will get back into data analysis – but for multivariate observations…using R’s lavaan package for path analysis