Lecture 01: Introduction and Overview
Introduction to R and RStudio
1 Presentation Outline
Introduce ourselves
Go through the syllabus
Introduce R and RStudio
2 Self Introduction
3 It is your turn
Tell me:
 Your name
 Department name
 Which year in your program
 Anything else you want to share
4 Class Time
 Unit 1: 17:00-17:50
 Unit 2: 18:00-18:50
 Unit 3: 19:00-19:45 (Practice with R code on your laptop and ask questions)
5 What To Expect This Semester
I believe that everyone is capable and can significantly benefit from learning varied types of multivariate analysis
 The hard part is the working memory load, not the math
Philosophy: Focus on accessibility + learning-by-doing
The AMS class heavily emphasizes hands-on, task-oriented practice
No anxiety-prone tasks (e.g., hand calculations, memorizing formulas)
No anxiety-prone methods of evaluation (e.g., timed tests)
Materials:
Lecture slides present concepts—the what and the why
Example documents: reinforce the concepts and demonstrate the how using software (R packages)
All available at the course website (hosted outside of Blackboard)
 Let me show you how to use the website
6 Course Requirements
8 in-class quizzes (16 points): formative assessments that refresh our memory of last week’s material
An opportunity for you to request topics for further clarification and review
Typically 2-3 questions that can be finished in 15 minutes
4 homework assignments (HW1-4, 84 points): Practice doing data analysis
Based directly on the examples given (no googling or ChatGPT needed, ever)
Homework 1-3: unique canned dataset
 Answered through the online homework portal
Homework 4: individual data analysis + written results section
 Get my feedback for using a model of your choice on your data + optional revision
Extra points (10 points, HW0)
 I will show you how to use the system using Homework 0 as the example
7 Our Other Responsibility
My job (besides providing materials and assignments):
Answer questions via email, in individual meetings, or in group-based Zoom office hours; you can each work on homework during office hours and get immediate assistance (and then keep working)
 Email me first
Your job (in descending order of timely importance):
Ask questions—preferably in class, but any time is better than none
Frequently review the class material, focusing on mastering the vocabulary, logic, and procedural skills
Don’t wait until the last minute to start homework, and don’t be afraid to ask for help if you get stuck on one thing for more than 15 minutes
 Please email me (jzhang@uark.edu) a screenshot of your code+error so I can respond easily
Practice using the software to implement the techniques you are learning on data you care about
Do the readings for a broader perspective and additional examples (best after the lecture)
8 More About Your Experience in this Class
Attendance: Strongly recommended but not required
Please do not attend in person if you might be sick!
You can also join the class via Zoom
You won’t miss out: I will post YouTube recordings (audio + screen share) on request at the course website.
Changes will be sent via email by 9 am on class days
I will update the homework and in-class quiz links on class days. If a link has not been uploaded, there are two possibilities: (1) I forgot, in which case I will upload it later and notify you by email; or (2) I decided not to upload it, or removed it.
I may change to Zoom-only for dangerous weather or if I am sick.
9 Statistical Software
I will show examples primarily using R and R packages. Some important R packages include:
tidyverse: a comprehensive collection of R packages for data cleaning and data transformation
ggplot2: a popular package for data visualization
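As a small illustration of what these packages look like in use, the sketch below draws a scatter plot with ggplot2. It relies on R's built-in mtcars data (an illustrative choice, not a course dataset), and the plotting code only runs if ggplot2 is already installed.

```r
# Sketch: a scatter plot with ggplot2, using the built-in mtcars data.
# The dataset and the variables (wt, mpg) are illustrative assumptions.
p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
  print(p)  # draws the plot on the active graphics device
} else {
  message("ggplot2 is not installed; install the tidyverse first.")
}
```

The requireNamespace() guard is only there so the example fails gently on a machine without the package; in day-to-day work a plain library(ggplot2) is typical.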
Why not SPSS?
SPSS could only be used for some, but not all, of our content
More importantly, it doesn’t have as much room to grow; R has many new packages being developed via CRAN and GitHub
Why not SAS?
SAS is not open source, meaning that we cannot check the source code if something goes wrong
SAS is also commercial, but R is free
Why not Python, Julia, Stata, or other software (e.g., MATLAB)?
They are not so popular in the relevant fields of educational statistics, which means you cannot get much support from the community.
But I think Python and Julia are worth learning! If you have Python / Julia code or tips that relate to our course, please share them with me! I will give you extra points.
My story: I am a heavy-duty R user:
I use R for almost everything (like course/conference slides, website building, data analysis, paper writing)
I am also learning Julia and Python but do not plan to add them to my classes
I know how to use Mplus but do not use it very often
10 Why R?
There are some points to consider
R packages are only as good as their authors (so little quality control)
Syntax and capabilities are idiosyncratic to the packages
The good things are:
If you really master R, you can do things yourself (write your own algorithm for a complex model)
You can check the source code of R packages and know where issues come from
You can communicate with R package authors and provide some suggestions
You can become an R package author yourself and be famous
11 Working with Programs Through Syntax
If you feel the class is not so challenging for you, you can see materials for my other class, ESRM 6553: Advanced Multivariate Analysis
I DO NOT need you to memorize syntax
Instead, you can do exactly what I (still) do:
Find the example source file for what you need to do
Figure out how to modify it to work for your homework
Copy (CTRL+C), paste (CTRL+V), and find and replace are your friends
Don’t hesitate to ask for help (i.e., email me a screenshot)
12 What You Are Supposed To Know Already
Recommended prerequisite:
ESRM 64103. Experimental Design in Education or similar classes
ESRM 64203. Multiple Regression Techniques for Education or similar classes
What we will cover this semester:
Basic usage of R and RStudio (i.e., R download, package installation, package loading)
Descriptive statistics and bivariate associations (e.g., correlation)
Statistical concepts (e.g., null hypothesis testing, maximum likelihood estimation)
Generalized linear models (i.e., regression, ANOVA, path analysis, factor analysis)
This class will focus on multivariate analysis… so what is that?
13 What is Multivariate Analysis?
 Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable (Dependent Variables / DVs).
– Wikipedia
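To make "more than one outcome variable" concrete, here is a minimal sketch that models two outcomes at once. It uses R's built-in mtcars data, and the choice of mpg and hp as outcomes and wt as the predictor is purely an illustrative assumption.

```r
# Sketch: one predictor (wt), two outcome variables (mpg and hp) analyzed together.
# cbind() on the left-hand side tells lm() to fit a multivariate linear model.
fit <- lm(cbind(mpg, hp) ~ wt, data = mtcars)

class(fit)  # "mlm" "lm": a multivariate linear model object
coef(fit)   # one column of coefficients per outcome variable
```

A univariate analysis would instead fit mpg and hp in two separate lm() calls and ignore how the two outcomes relate to each other.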
Many different multivariate analysis models use multivariate statistics:
Generalized linear regression
Principal components analysis (PCA)
Path analysis
Factor analysis (FA)
Network analysis (NA)
14 Why Use Multivariate Statistics
Compared to univariate analysis, multivariate statistics provides a more thorough picture of reality.
Imagine our theory of depression supports the following two models:
Note: Not all theoretical models can be estimated via statistics
Stanford-Washington University Eating Disorder Screen: a measurement model is one special type of multivariate model
15 Research Questions in Multivariate Analysis
 What are the relationships among those variables?
 What are the effects of one variable on the outcomes controlling for other variables?
 How well does the model match the observed data?
 How do we select the best fitting model?
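The four questions above can each be mapped to a line of R. The sketch below is only an illustration under assumptions: it uses the built-in mtcars data and two arbitrary regression models, and it uses R-squared and AIC as example fit measures rather than the specific indices covered later in the course.

```r
# Sketch: the four research questions, using built-in mtcars data (illustrative only).
vars <- mtcars[, c("mpg", "wt", "hp")]

# 1. Relationships among the variables: a correlation matrix
cor(vars)

# 2. Effect of one variable controlling for another: wt adjusted for hp
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
coef(summary(m2))

# 3. How well the model matches the observed data: proportion of variance explained
summary(m2)$r.squared

# 4. Selecting among candidate models: a lower AIC indicates the preferred model
AIC(m1, m2)
```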
16 What We Will Cover This Semester
 Part I: General linear models
 Introduction/Descriptive statistics
 Simple, Marginal, Interaction Effects
 Multivariate Linear Models
 Extension: Generalized linear regression
 Predicting non-normal conditional outcomes
 Part II: Statistical techniques
 Normal Distributions and Multivariate Normal Distribution
 Maximum Likelihood Estimation (MLE)
 Missing Data and Multiple Imputation
 Part III: Path Analysis
 Structural models
 Part IV: Latent variable models
 Principal component analysis (PCA)
 Exploratory factor analysis (EFA)
 Confirmatory factor analysis (CFA)
 Latent class models (LCA) or mixture models
17 Introduction to R
R is a comprehensive statistical and graphical programming language
We can use the R language through multiple user interfaces or IDEs, e.g., the terminal, VS Code, or RStudio.
We will mainly focus on RStudio because of its convenience
RStudio is a product of Posit and is free for personal use
18 More RStudio
19 Installation of R and RStudio
You can download and install base R via r-project.org (currently R 4.4.1)
Then, after the installation of R, you can download RStudio via posit.co (currently)
After installing R and RStudio, you can open RStudio to start your R programming.
However, your R installation only has the base packages
To enhance its utility, most users will install R packages for certain purposes
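To see what a fresh installation actually contains, the following sketch prints the R version and lists the packages that ship with base R. It only uses functions from base R and the utils package; the check for tidyverse at the end is just an example.

```r
# Sketch: inspect a fresh R installation.
R.version.string  # the installed R version as a single string

# Packages that come with R itself (priority "base")
base_pkgs <- rownames(installed.packages(priority = "base"))
base_pkgs

# Is an add-on package such as tidyverse installed yet? (TRUE or FALSE)
"tidyverse" %in% rownames(installed.packages())
```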
20 R packages
R packages are uploaded to platforms (e.g., CRAN or GitHub) by researchers or companies
Those R packages typically have version numbers. Some functions may be available in one version (like Ver. 1.1) but not in other versions.
Do not upgrade your packages if your code is running well
R users are free to download and use those R packages
To download a certain package, you need to know the package name
For example, if you want to download the latest version of the tidyverse package, you can type the following command in the console panel of RStudio:
install.packages("tidyverse")
 Or, if you want to install an older version of the package:
library(devtools)
install_version("tidyverse", version = "1.3.0", repos = "http://cran.us.r-project.org")
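Before installing or upgrading anything, it can help to check what is already on your machine. The sketch below defines a small helper, has_min_version(), which is a hypothetical name introduced here for illustration; it is built only on requireNamespace() and packageVersion() from base R.

```r
# Sketch: check whether a package is installed and at least a given version.
# has_min_version() is a hypothetical helper, not part of any package.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    packageVersion(pkg) >= min_version
}

# "stats" ships with R, so this works on any installation
packageVersion("stats")
has_min_version("stats", "3.0.0")

# For an add-on package, decide what to do based on the result
if (!has_min_version("tidyverse", "1.3.0")) {
  message("tidyverse >= 1.3.0 not found; consider installing it.")
}
```

Checking first fits the advice above: if the version you have already runs your code, there is no need to upgrade.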
21 More about R packages
 CRAN (Comprehensive R Archive Network) is a network of servers around the world that store identical, up-to-date versions of code and documentation for R.
 It contains the most stable versions of packages.
 Most of the time, we download packages from CRAN
 GitHub is for the fast development of R packages
It contains the most up-to-date versions of R packages, which may be unstable
You can download the package from GitHub using the pak package
pak::pak("tidyverse/ggplot2")
You can update the package and its dependencies
pak::pkg_install("ggplot2", upgrade = TRUE)
22 R functions
To perform certain tasks, you need to use functions contained in R packages
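A minimal sketch of calling functions from packages, using only packages that ship with R. The numbers in x are made up for illustration.

```r
# Sketch: three ways a function call can reach a package (made-up data).
x <- c(4, 8, 15, 16, 23, 42)

# 1. A function from the base package, always available
mean(x)           # 18

# 2. A function from an attached package, written with an explicit package prefix
stats::median(x)  # 15.5

# 3. A function from a package that is installed but not attached:
#    the package::function() form works without calling library() first
tools::file_ext("MakeFriendsWithR.qmd")  # "qmd"
```

After library(tools), the last call could be written simply as file_ext(...).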
23 R functions (Cont.)
How do you know whether a package is already loaded?
You can use the sessionInfo function:
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.0.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.2.1    fastmap_1.2.0     cli_3.6.3
 [5] tools_4.2.1       htmltools_0.5.8.1 parallel_4.2.1    rstudioapi_0.16.0
 [9] yaml_2.3.8        rmarkdown_2.26    knitr_1.46        jsonlite_1.8.8
[13] xfun_0.43         digest_0.6.35     rlang_1.1.4       evaluate_0.23
It outputs several pieces of information:
R version, operating system, matrix operation libraries, locale
Attached packages (you can call the functions of those packages directly)
Packages loaded via a namespace (and not attached), whose functions you cannot call directly; you need to attach them with library or require first
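The attached versus loaded distinction can also be checked programmatically. In the sketch below, is_attached() is a hypothetical helper written for illustration; search() and loadedNamespaces() are base R functions.

```r
# Sketch: check whether a package is attached or only loaded.
# is_attached() is a hypothetical helper, not a built-in function.
is_attached <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

search()               # the attached packages, in lookup order
is_attached("stats")   # TRUE in a default R session

loadedNamespaces()     # everything loaded, attached or not

# A package that is loaded but not attached can be attached with library()
library(tools)
is_attached("tools")   # TRUE after the library() call
```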
24 Run R code
After you finish your R script, you have multiple ways of running the code:
Method 1: click the Run button at the top right of the RStudio editor
Method 2: select certain code and press Ctrl + Enter (Win) or Command + Return (Mac)
Method 3: use Rscript [FILENAME].r to run the whole script
Method 4: use an R notebook to run R code interactively
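As a concrete case for trying these methods, here is a tiny script. The file name hello.R is a hypothetical example; save the lines below under that name, then either run them line by line in RStudio or run the whole file from a terminal with Rscript hello.R.

```r
# hello.R (hypothetical file name): a minimal script to practice running code.
x <- c(2, 4, 6)           # a small numeric vector
x_mean <- mean(x)         # compute its mean
cat("Mean of x:", x_mean, "\n")
```

Running the whole file prints a single line reporting the mean; running only the first line creates x without printing anything.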
Table (cell contents not recoverable): the columns are "Script file is .R" and "Script file is .rmd or .qmd"; the rows are "Run the whole script" and "Run part of the script".
25 Example 1: More Usage about R
 Open up MakeFriendsWithR.qmd
26 Example 2: Homework Portal
The online homework portal will be available on the course website
The homework system is based on Microsoft Forms. You can answer the questions on a laptop, PC, or mobile phone.
 Make sure your device is connected to the Internet.
Let me use Homework 0 as one example.
27 Summary
 Note that the syllabus, schedule, and all materials are uploaded online the week before class.
 We learned that R, RStudio, and Quarto (.qmd) can be used to execute R code/syntax.
 The in-class quiz will be administered at the beginning of class. It should be quick and easy, so don’t stress about it!
 Office hours are 2 PM-4 PM on Tuesdays. Feel free to stop by my office and ask questions.
 Next week, we will start by reviewing general linear regression.
28 Next Week
We will:
 Talk about descriptive statistics, data visualization and general linear model
 HW0 will be due before CLASS BEGINS on Monday