It is easy for others to contribute add-ons, which lets developers share software implementations of new data science methods. This gives R users early access to the latest methods and to tools developed for a wide variety of disciplines, including ecology, molecular biology, social sciences, and geography, to name a few.
Get started
R console
One way of using R is to simply start the R console on your computer.
On a Mac, after installing R, type "R" in the terminal to get started
After installing R and RStudio, you can open RStudio to start programming in R.
However, a fresh R installation includes only the packages that ship with R (such as base)
To extend it, most users install additional R packages for specific purposes
R packages
R packages are uploaded to platforms such as CRAN or GitHub by researchers and companies
R packages have version numbers. A function may be available in one version (such as 1.1) but not in others.
Do not upgrade your packages if your code is running well
R users are free to download and use these packages
To download a package, you need to know its name
For example, to download the latest version of the tidyverse package, type the following command in the Console pane of RStudio
install.packages("tidyverse")
Or, to install an older version of a package:
require(devtools)
install_version("tidyverse", version = "1.3.0", repos = "http://cran.us.r-project.org")
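To make the point about version numbers concrete, here is a minimal sketch for checking which version of a package is installed before deciding whether to upgrade. It uses the stats package only because it ships with every R installation; the threshold "3.5.0" is an arbitrary value chosen for illustration.

```r
# Look up the installed version of a package (stats ships with R)
installed_version <- packageVersion("stats")
print(installed_version)

# Compare against a minimum version; "3.5.0" is an arbitrary example threshold
meets_minimum <- installed_version >= "3.5.0"
print(meets_minimum)
```

The same two calls work for any installed package, so you can record the versions your code was written against before touching them.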
More about R packages
CRAN (Comprehensive R Archive Network) is a network of servers around the world that store identical, up-to-date versions of code and documentation for R.
It contains the most stable versions of packages.
Most of the time, we download packages from CRAN
GitHub is where many R packages are actively developed
It hosts the newest development versions of packages, which may be unstable
You can install a package from GitHub using the pak package
pak::pak("tidyverse/ggplot2")
You can also update a package and its dependencies
pak::pkg_install("ggplot2", upgrade = TRUE)
R functions
To carry out a task, you use functions contained in R packages
There are two ways of calling a function from a package
Direct way: prefix the function with its package name (package::function()), so you don't have to load the package first
Use-after-load way: load the package into your session first; you can then call the function by name without specifying the package
library("ggplot2")
ggplot() + geom_point(aes(x = 1:100, y = 100:1), color = "tomato")
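The direct way can be sketched with the :: operator. The example below assumes nothing beyond a standard R installation: stats, utils, and the mtcars dataset all ship with R, so no install step is needed.

```r
# Direct way: package::function(), no library() call needed
middle_value <- stats::median(c(1, 5, 9))
print(middle_value)

# The same pattern works for any installed package, e.g. utils::head()
first_rows <- utils::head(mtcars, n = 3)
print(first_rows)
```

The direct way is convenient when you need only one or two functions from a package, or when two attached packages export functions with the same name.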
R functions (Cont.)
How do you know whether a package is already loaded?
You can use the sessionInfo() function
sessionInfo()
It reports several pieces of information:
R version, operating system, matrix algebra libraries (BLAS/LAPACK), and locale
Attached packages (you can call their functions directly by name)
Packages loaded via a namespace (and not attached): you cannot call their functions by name alone, so attach them first with library() or require()
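As a lighter-weight alternative to reading the full sessionInfo() output, the sketch below checks a single package by name. search() and loadedNamespaces() are base R functions; the helper is_attached() is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: TRUE if a package is attached,
# i.e. its functions can be called by name alone
is_attached <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

# stats is attached by default in a standard R session
print(is_attached("stats"))

# An attached package is always loaded as a namespace as well
print("stats" %in% loadedNamespaces())
```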
Execute R code
After you finish an R script, there are multiple ways to execute the code and print the results to the Console:
Method 1: click the Run button at the top right of the script editor in RStudio (Run executes the current line or selection, so select all of the code first to run the whole script)
Method 2: select the code you want to run and press Ctrl + Enter (Windows) or Command + Return (Mac)
Method 3: run Rscript [FILENAME].r in a terminal to execute the whole script
Method 4: use an R notebook to run R code interactively
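+------------------------+-------------------+-----------------------------+
|                        | Script file is .R | Script file is .rmd or .qmd |
+========================+===================+=============================+
| Run the whole script   | Method 1          | Method 4                    |
|                        | Method 3          |                             |
+------------------------+-------------------+-----------------------------+
| Run part of the script | Method 2          | Method 4                    |
+------------------------+-------------------+-----------------------------+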
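Besides the four methods listed above, base R also provides source(), which runs a whole script file from inside an R session. The sketch below writes a two-line script to a temporary file so that it is self-contained; the file contents and variable names are made up for illustration.

```r
# Create a small throwaway script (contents are illustrative only)
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "total <- sum(x)"), script_path)

# Run the whole script in its own environment to keep the workspace clean
script_env <- new.env()
source(script_path, local = script_env)
print(script_env$total)
```

Running the script in a separate environment is a design choice for this example only; calling source(script_path) without the local argument creates the objects in your global workspace instead.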
Introducing RStudio
RStudio will be our launching pad for data science projects. It not only provides an editor for us to create and edit our scripts but also provides many other useful tools.
When you start RStudio for the first time, you will see three panes:
The left pane shows the Code editor (it appears when you create a new file) and the R console.
On the right, the top pane includes tabs such as Environment and History, while the bottom pane shows five tabs: Files, Plots, Packages, Help, and Viewer.
To start a new script in Code editor, you can click on File > New File > R Script.
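Installing R Packages: from CRAN
For example, to install the dslabs package, you would type the following in your console:
install.packages("dslabs")
We can then load the package into our R session using the library function in the script file:
library("dslabs")
head(admissions)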
  major gender admitted applicants
1     A    men       62        825
2     B    men       63        560
3     C    men       37        325
4     D    men       33        417
5     E    men       28        191
6     F    men        6        373
As you go through this class, you will see that we load packages without installing them. This is because once you install a package, it remains installed and only needs to be loaded with library.
We can install more than one package at once by feeding a character vector to this function:
install.packages(c("dplyr", "dslabs"))
You can see all the packages you have installed using the following function:
installed.packages()
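Because installed.packages() returns a large matrix with one row per package, it is often handier to ask about one package at a time. A minimal sketch, assuming only base R ("thisPackageDoesNotExist" is a made-up name used to show the negative case):

```r
# One row per installed package; row names are the package names
pkgs <- installed.packages()

has_stats <- "stats" %in% rownames(pkgs)                   # ships with R
has_fake  <- "thisPackageDoesNotExist" %in% rownames(pkgs) # made-up name
print(c(stats = has_stats, fake = has_fake))

# Look up the installed version of a single package
print(pkgs["stats", "Version"])
```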
Installing R Packages: from GitHub
You can also install user-built packages from GitHub
install.packages("remotes") # install the "remotes" package
library("remotes") # load the package into your R session
install_github(repo = "JihongZ/ESRM6990V") # install a package from my GitHub repository
library(ESRM6990V) # load the package into your R session
jihong(details = TRUE) # call the function "jihong" from the package
What: An R package is a structured collection of R functions, data, and compiled code that is bundled together according to a specific format.
They can be thought of as libraries or modules in other programming languages.
Why: R Packages are designed to add functionality to R, allowing users to perform specific tasks or analyses that are not covered by the basic installation of R.
How: You can install/uninstall, create, load, and use R packages.
If you want to build or publish your own package, the Comprehensive R Archive Network (CRAN), Bioconductor, and GitHub are popular repositories where R packages are commonly published and maintained.
What an R package includes
Functions: A set of R functions that perform specific tasks, which are not available in the default R environment.
Data: Some packages include datasets that are useful for demonstrating functions within the package or for use in specific types of analysis.
Documentation: Every package comes with documentation that explains how the functions work, the data included (if any), and examples of how to use the package. This is often accessible via R help pages.
Vignettes: Many packages include vignettes, which are long-form documentation that shows how to use the package functions in a more detailed and contextual way, often in the form of tutorials.
Namespace: A namespace file that manages how functions from the package are imported and exported, helping avoid naming conflicts between different packages.
Meta-information: A DESCRIPTION file containing metadata about the package, such as its name, version, dependencies (other packages it requires to function), author, and license information.
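Several of these components can be inspected from the console. The sketch below looks at the stats package, chosen only because it is always installed; packageDescription() and getNamespaceExports() are base R functions.

```r
# Meta-information: fields read from the package's DESCRIPTION file
desc <- packageDescription("stats")
print(desc$Package)
print(desc$Version)

# Namespace: the functions the package exports for users
exported <- getNamespaceExports("stats")
print("median" %in% exported)
print(length(exported))

# Documentation: uncomment to open the package's help index
# help(package = "stats")
```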
R Package states
When you create or modify a package, you work on its “source code” or “source files”. You interact with the in-development package in its source form.
To better understand packages, we need to know the five states an R package can be in:
source
bundled
binary
installed
in-memory
We already know two functions:
install.packages() can move a package from the source, bundled, or binary state into the installed state.
library() loads a package from the installed state into memory (the in-memory state)
What are the source, bundled, and binary states, then? Why do they differ?
Source package
A source package is just a directory of files with a specific structure, including:
DESCRIPTION file
R/ folder containing all .R files
You may also find a .tar.gz file on a package's CRAN landing page via the "Package source" field (this is the bundled state of the package). Decompressing the .tar.gz file gives you the source directory, including R/ and DESCRIPTION
Bundled package
A bundled package is a package that has been compressed into a single file (this process is called building the package).
By convention, package bundles in R use the extension .tar.gz and are sometimes referred to as "source tarballs". More generally, this is the gzipped tar file format.
A source tarball is not simply a compressed copy of the source directory. When a source directory is built into a bundle (.tar.gz), a few diagnostic checks and cleanups are performed. See the "Building package tarballs" section of Writing R Extensions for details: https://cran.r-project.org/doc/manuals/R-exts.html#Building-package-tarballs
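The relationship between the source and bundled states can be sketched with base R's tar() and untar(). This only imitates the file format: real bundles are normally produced with R CMD build, which also runs the checks and cleanups mentioned above. The package name "demopkg" and its contents are hypothetical.

```r
# Build a toy source package directory in a temporary location
work_dir <- tempfile("pkgdemo")
dir.create(file.path(work_dir, "demopkg", "R"), recursive = TRUE)
writeLines(c("Package: demopkg", "Version: 0.0.1", "Title: Toy Example"),
           file.path(work_dir, "demopkg", "DESCRIPTION"))
writeLines("hello <- function() \"hello\"",
           file.path(work_dir, "demopkg", "R", "hello.R"))

# Compress the directory into a single .tar.gz file
old_wd <- setwd(work_dir)
tar("demopkg_0.0.1.tar.gz", files = "demopkg",
    compression = "gzip", tar = "internal")

# List the archive contents without extracting them
bundle_contents <- untar("demopkg_0.0.1.tar.gz", list = TRUE, tar = "internal")
setwd(old_wd)
print(bundle_contents)
```

The listing should show the same DESCRIPTION file and R/ folder that make up the source directory, now stored inside one file.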
Binary package
If you want to distribute your package to an R user who doesn't have package development tools, you'll need to provide a binary package. The main distributor of binary packages is CRAN.
Like a package bundle, a binary package is a single file. Unlike a bundled package, a binary package is platform specific and there are two basic flavors: Windows and macOS.
CRAN packages are usually available in binary form:
forcats for macOS: forcats_0.4.0.tgz
readxl for Windows: readxl_1.3.1.zip
This is, indeed, part of what’s usually going on behind the scenes when you call install.packages().
Uncompressing a binary file gives you a totally different file structure from that of a source or bundled package.
There are no .R files in the R/ directory - instead there are three files that store the parsed functions in an efficient file format.
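Whether install.packages() looks for binary or source packages depends on your platform and options. A minimal sketch for checking this on your own machine, using only base R settings (the printed values differ between Windows, macOS, and Linux):

```r
# Package type that matches this build of R,
# e.g. "source" on Linux or "win.binary" on Windows
platform_type <- .Platform$pkgType
print(platform_type)

# Package type that install.packages() uses by default in this session
default_type <- getOption("pkgType")
print(default_type)
```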
Installed package
An installed package is a binary package that has been decompressed into a package library.
In practice, you don't need to care about these states when you install popular packages, unless you have trouble installing a package via install.packages() or you install in-development packages.
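To see where installed packages live on disk, the sketch below uses the base R functions .libPaths() and find.package(). The stats package is used only because it is always installed.

```r
# Directories ("libraries") that R searches for installed packages
print(.libPaths())

# Folder of one installed package inside a library
stats_dir <- find.package("stats")
print(stats_dir)

# An installed package still carries its DESCRIPTION file
print(file.exists(file.path(stats_dir, "DESCRIPTION")))
print(list.files(stats_dir))
```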
In-memory package
When we use the library() function, we load an installed package into R's memory.
This is the last step before we can use the package in our R task.
When you call library(somepackage), R looks through the current libraries for an installed package named “somepackage” and, if successful, it makes somepackage available for use.
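The "if successful" part can be sketched by asking for a package that is not installed. The name "thisPackageDoesNotExist" is made up for this example; requireNamespace(), library(), and tryCatch() are base R functions.

```r
fake_pkg <- "thisPackageDoesNotExist"  # made-up name, assumed not installed

# requireNamespace() returns FALSE instead of stopping when the package is missing
can_load <- requireNamespace(fake_pkg, quietly = TRUE)
print(can_load)

# library() signals an error for a missing package; capture the message
attach_result <- tryCatch(
  {
    library(fake_pkg, character.only = TRUE)
    "attached"
  },
  error = function(e) conditionMessage(e)
)
print(attach_result)
```

Checking with requireNamespace() first is a common way to give users a clear "please install this package" message rather than a raw error.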
---
title: "Lecture 01: Basics of R"
subtitle: "Getting Started"
author: "Jihong Zhang*, Ph.D"
institute: |
  Educational Statistics and Research Methods (ESRM) Program*

  University of Arkansas
date: "2024-10-09"
date-modified: "2024-10-11"
sidebar: id-lec6990v
execute:
  echo: true
  warning: false
  eval: false
output-location: default
code-annotations: below
format:
  html:
    code-tools: true
    code-line-numbers: false
    code-fold: false
    number-offset: 0
    anchor-sections: true
    number-sections: false
  uark-revealjs:
    scrollable: true
    chalkboard: true
    embed-resources: false
    code-fold: false
    number-sections: false
    footer: "ESRM 64503 - Lecture 01: Introduction to R"
    slide-number: c/t
    tbl-colwidths: auto
    output-file: slides-index.html
filters:
  - quarto
  - line-highlight
---

## Today's Class

1. Why use R?
    1. Brief history of R
    2. Main features of R
2. Installation of R
3. How to use RStudio

# Why R?

## Brief History

- **1975-1976**: S (Book: [*A Brief History of S*](https://sas.uwaterloo.ca/~rwoldfor/software/R-code/historyOfS.pdf)) grew up in the statistics research departments (John Chambers and others) at Bell Laboratories
    - To bring interactive computing to bear on statistics and data analysis problems
- **1993**: Prof. [Ross Ihaka](https://en.wikipedia.org/wiki/Ross_Ihaka "Ross Ihaka") and [Robert Gentleman](https://en.wikipedia.org/wiki/Robert_Gentleman_(statistician)) from the University of Auckland posted the first binary file of R to teach introductory statistics
- **1995**: Martin Mächler made an important contribution by convincing Ross and Robert to use the [GNU General Public License](http://www.gnu.org/licenses/gpl-2.0.html) to make R free software
- **1997**: The [Comprehensive R Archive Network](https://en.wikipedia.org/wiki/R_package#Comprehensive_R_Archive_Network_(CRAN) "R package") (**CRAN**) was founded by Kurt Hornik and [Friedrich Leisch](https://en.wikipedia.org/wiki/Friedrich_Leisch "Friedrich Leisch") to host R's [source code](https://en.wikipedia.org/wiki/Source_code "Source code"), executable files, documentation, and user-created packages
- **2000**: the first official 1.0 version of R was released
- **2024**: R ver. 4.4.1

## Example of S Language

```{r}
#| error: true
#| output-location: column
#| eval: true
X = 1:5 # A vector of numbers from 1 to 5
X[c(TRUE, TRUE, TRUE, FALSE, FALSE)]
X[1:3]
X[-1:3]
X[-(1:3)]
X[NULL]
X[NA]
X[]
```

## Main Feature of R

1. It was developed by statisticians as an interactive environment for data analysis, unlike C or Java, which were created for software development.
2. The interactivity of R is an indispensable feature in data science.
3. However, as in other programming languages, you can save your work in R as scripts that can be easily executed at any moment.
4. If you are an expert programmer, you should not expect R to follow the conventions you are used to, since you will be disappointed.

## Attractive Features of R

- R is free and open source.
- It runs on all major platforms: Windows, macOS, UNIX/Linux.
- Scripts and data objects can be shared seamlessly across platforms.
- There is a large, growing, and active community of R users and, as a result, there are numerous resources for learning and asking questions.
    - [Cross Validated (Stack Exchange)](https://stats.stackexchange.com/questions/138/free-resources-for-learning-r)
    - [r-project.org](https://www.r-project.org/help.html)
- It is easy for others to contribute add-ons, which lets developers share software implementations of new data science methods. This gives R users early access to the latest methods and to tools developed for a wide variety of disciplines, including ecology, molecular biology, social sciences, and geography, to name a few.

# Get started

## R console

- One way of using R is to simply start the R console on your computer.
    - On a Mac, after installing R, type "R" in the terminal to get started

::: panel-tabset
## Windows

![](/posts/2024-07-21-ESRM6990V/Lecture1/images/clipboard-2099977503.png){fig-align="center"}

## Mac/Linux

![](/posts/2024-07-21-ESRM6990V/Lecture1/images/clipboard-1121000973.png){fig-align="center"}
:::

As a quick example, try using the console to calculate a 15% tip on a meal that cost \$19.71:

```{r}
0.15 * 19.71
```

# RStudio (the company is now called Posit)

## Installation of R and RStudio

- You can download and install **R base** via r-project.org (currently R-4.4.1)
    - for [Linux](https://cloud.r-project.org/bin/linux/)
    - for [Windows](https://cloud.r-project.org/bin/windows/)
    - for [MacOS](https://cloud.r-project.org/bin/macosx/)
- Then, after the installation of R, you can download RStudio via posit.co (currently)
    - for [Windows](https://download1.rstudio.org/electron/windows/RStudio-2024.04.2-764.exe)
    - for [MacOS](https://download1.rstudio.org/electron/macos/RStudio-2024.04.2-764.dmg)
- After installing R and RStudio, you can open RStudio to start programming in R.
    - However, a fresh R installation includes only the packages that ship with R (such as **base**)
    - To extend it, most users install additional **R packages** for specific purposes

## R packages

- R packages are uploaded to platforms such as CRAN or GitHub by researchers and companies
    - R packages have version numbers. A function may be available in one version (such as 1.1) but not in others.
    - Do not upgrade your packages if your code is running well
- R users are free to download and use these packages
    - To download a package, you need to know its name
    - For example, to download the latest version of the `tidyverse` package, type the following command in **the Console pane** of RStudio

```{r}
#| eval: false
#| echo: true
#| code-fold: false
install.packages("tidyverse")
```

- Or, to install an older version of a package:

    ```{r}
    #| eval: false
    #| echo: true
    #| code-fold: false
    require(devtools)
    install_version("tidyverse", version = "1.3.0", repos = "http://cran.us.r-project.org")
    ```

## More about R packages

- **CRAN** (Comprehensive R Archive Network) is a network of servers around the world that store identical, up-to-date versions of code and documentation for R.
    - It contains the most stable versions of packages.
    - Most of the time, we download packages from CRAN
- **GitHub** is where many R packages are actively developed
    - It hosts the newest development versions of packages, which may be unstable
    - You can install a package from GitHub using the `pak` package

```{r}
#| eval: false
#| echo: true
#| code-fold: false
pak::pak("tidyverse/ggplot2")
```

- You can also update a package and its dependencies

```{r}
#| eval: false
#| echo: true
#| code-fold: false
pak::pkg_install("ggplot2", upgrade = TRUE)
```

## R functions

- To carry out a task, you use functions contained in R packages
    - There are two ways of calling a function from a package
        - **Direct way**: you don't have to load the package first
            - ![](/posts/2024-07-21-ESRM6990V/Lecture1/images/clipboard-1857189512.png)

------------------------------------------------------------------------

- **Use-after-load way**: load the package into your session first; you can then call the function by name without specifying the package

    ```{r}
    #| eval: true
    #| echo: true
    #| code-fold: false
    library("ggplot2")
    ggplot() +
      geom_point(aes(x = 1:100, y = 100:1), color = "tomato")
    ```

## R functions (Cont.)

- How do you know whether a package is already loaded?
- You can use the `sessionInfo` function

```{r}
#| echo: true
#| code-fold: false
sessionInfo()
```

- It reports several pieces of information:
    - R version, operating system, matrix algebra libraries (BLAS/LAPACK), and locale
    - **Attached packages** (you can call their functions directly by name)
    - **Packages loaded via a namespace (and not attached)**: you cannot call their functions by name alone, so attach them first with `library` or `require`

## Execute R code

- After you finish an R script, there are **multiple ways to execute the code and print the results to the Console**:
    - **Method 1**: click the `Run` button at the top right of the script editor in **RStudio** (`Run` executes the current line or selection, so select all of the code first to run the whole script)
    - **Method 2**: select the code you want to run and press `Ctrl + Enter` (Windows) or `Command + Return` (Mac)
    - **Method 3**: run `Rscript [FILENAME].r` in a terminal to execute the whole script
    - **Method 4**: use an R notebook to run R code interactively

+----------------------------+-------------------+-----------------------------+
|                            | Script file is .R | Script file is .rmd or .qmd |
+============================+===================+=============================+
| **Run the whole script**   | - Method 1        | - Method 4                  |
|                            | - Method 3        |                             |
+----------------------------+-------------------+-----------------------------+
| **Run part of the script** | - Method 2        | - Method 4                  |
+----------------------------+-------------------+-----------------------------+

: {.striped}

## Introducing RStudio

- RStudio will be our launching pad for data science projects. It not only provides an editor for us to create and edit our scripts but also provides many other useful tools.
- When you start RStudio for the first time, you will see three panes:
    - The left pane shows the Code editor (it appears when you create a new file) and the R console.
    - On the right, the top pane includes tabs such as *Environment* and *History*, while the bottom pane shows five tabs: *Files*, *Plots*, *Packages*, *Help*, and *Viewer*.
- To start a new script in the Code editor, you can click on `File`\>`New File`\>`R Script`.

::: panel-tabset
## Screenshot of Rstudio

![](/posts/2024-07-21-ESRM6990V/Lecture1/images/clipboard-73829689.png){fig-align="center"}

## Start a new script

![](/posts/2024-07-21-ESRM6990V/Lecture1/images/clipboard-4208875357.png){fig-align="center"}

## Start writing your script

![](/posts/2024-07-21-ESRM6990V/Lecture1/images/clipboard-4021048129.png){fig-align="center"}
:::

## Key Binding

- For efficient coding, **we highly recommend that you memorize key bindings for the operations you use most**.
- RStudio provides a useful cheat sheet with the most widely used commands
- To open the cheat sheet, `Help`\>`Cheat Sheets`\>`Rstudio IDE Cheat Sheets`

![](/posts/2024-07-21-ESRM6990V/Lecture1/images/clipboard-1741535334.png){fig-align="center"}

## Global Option

- You can change the look and functionality of RStudio quite a bit.
- To change the global options you click on *`Tools`* then *`Global Options`…*.
- As an example we show how to make a change that we **highly recommend**:
    - `General`\>`Basic`\>`Workspace`: Change [*Save workspace to .RData on exit*]{.underline} to [*Never*]{.underline}.
    - `General`\>`Basic`\>`Workspace`: Uncheck [*Restore .RData into workspace at startup*]{.underline}
    - `Code`\>`Editing`: check [*Use native pipe operator, \|\>*]{.underline}

::: callout-note
## .RData file

- By default, when you exit, R saves all the objects you have created into a file called .RData.
- This is done so that when you restart the session in the same folder, it will load these objects.
- We find that this causes confusion, especially when we share code with colleagues and assume they have this .RData file.
:::

## Installing R Packages: from CRAN

- For example, to install the **`dslabs`** package, you would type the following in your console:

```{r}
install.packages("dslabs") # DON'T FORGET THE DOUBLE QUOTES
```

- We can then load the package into our R session using the `library` function in the script file:

```{r}
#| eval: true
library("dslabs")
head(admissions)
```

- As you go through this class, you will see that we load packages without installing them. This is because once you install a package, it remains installed and only needs to be loaded with `library`.
- We can install more than one package at once by feeding a character vector to this function:

```{r}
install.packages(c("dplyr", "dslabs"))
```

- You can see all the packages you have installed using the following function:

```{r}
installed.packages()
```

## Installing R Packages: from GitHub

- You can also install user-built packages from GitHub
- I built a package for this course: [link](https://github.com/JihongZ/ESRM6990V)

```{r}
#| eval: false
install.packages("remotes") # install the "remotes" package
library("remotes") # load the package into your R session
install_github(repo = "JihongZ/ESRM6990V") # install a package from my GitHub repository
library(ESRM6990V) # load the package into your R session
jihong(details = TRUE) # call the function "jihong" from the package
```

### Let's Practice

1. [Finish Exercise 1](ESRM6990V_Example01.qmd)

# R Package Structure

## Basic Information

1. **What**: An R package is a structured collection of R functions, data, and compiled code that is bundled together according to a specific format.
    - They can be thought of as libraries or modules in other programming languages.
2. **Why**: R packages are designed to add functionality to R, allowing users to perform specific tasks or analyses that are not covered by the basic installation of R.
3. **How:** You can install/uninstall, create, load, and use R packages.
    - If you want to build or publish your own package, the Comprehensive R Archive Network (CRAN), Bioconductor, and GitHub are popular repositories where R packages are commonly published and maintained.

## What an R package includes

- **Functions**: A set of R functions that perform specific tasks, which are not available in the default R environment.
- **Data**: Some packages include datasets that are useful for demonstrating functions within the package or for use in specific types of analysis.
- **Documentation**: Every package comes with documentation that explains how the functions work, the data included (if any), and examples of how to use the package. This is often accessible via R help pages.
- **Vignettes**: Many packages include vignettes, which are long-form documentation that shows how to use the package functions in a more detailed and contextual way, often in the form of tutorials.
- **Namespace**: A namespace file that manages how functions from the package are imported and exported, helping avoid naming conflicts between different packages.
- **Meta-information**: A DESCRIPTION file containing metadata about the package, such as its name, version, dependencies (other packages it requires to function), author, and license information.

## R Package states

1. When you create or modify a package, you work on its "source code" or "source files". You interact with the in-development package in its **source** form.
2. To better understand packages, we need to know the five states an R package can be in:
    1. source
    2. bundled
    3. binary
    4. installed
    5. in-memory
3. We already know two functions:
    1. `install.packages()` can move a package from the **source/bundled/binary** state into the **installed** state.
    2. `library()` loads a package from the installed state into memory (the **in-memory** state)
4. What are the source, bundled, and binary states, then? Why do they differ?

## Source package

1. A source package is just a directory of files with a specific structure, including:
    1. **DESCRIPTION** file
    2. `R/` folder containing all `.r` files
2. Many R packages on GitHub are in the source state
    1. `networkscore`: <https://github.com/JihongZ/networkscore>
    2. `esrm64503`: <https://github.com/JihongZ/ESRM64503>
3. You may also find a `tar.gz` file on a package's CRAN landing page via the "Package source" field (this is the bundled state of the package). Decompressing the `tar.gz` file gives you the source directory, including `R/` and `DESCRIPTION`
    1. forcats: <https://cran.r-project.org/web/packages/forcats/index.html>
    2. You can decompress it using commands in a terminal like:

``` bash
tar xvf forcats_0.4.0.tar.gz
```

## Bundled package

1. A bundled package is a package that has been compressed into a single file (this process is called `build`ing the package).
2. By convention, package bundles in R use the extension `.tar.gz` and are sometimes referred to as "source tarballs". More generally, this is the gzipped tar file format.
3. A source tarball is not simply a compressed copy of the source directory. When a source directory is built into a bundle (.tar.gz), a few diagnostic checks and cleanups are performed. See more details [here](https://cran.r-project.org/doc/manuals/R-exts.html#Building-package-tarballs).

## Binary package

1. If you want to distribute your package to an R user who doesn't have package development tools, you'll need to provide a **binary** package. The main distributor of **binary** packages is CRAN.
2. Like a package bundle, a binary package is a single file. Unlike a bundled package, a binary package is platform specific and there are two basic flavors: Windows and macOS.
3. CRAN packages are usually available in binary form:
    - forcats for macOS: forcats_0.4.0.tgz
    - readxl for Windows: readxl_1.3.1.zip
4. This is, indeed, part of what's usually going on behind the scenes when you call `install.packages()`.
5. Uncompressing a binary file gives you a totally different file structure from that of a source or bundled package.
    - There are no .R files in the R/ directory - instead there are three files that store the parsed functions in an efficient file format.

## Installed package

1. An installed package is a binary package that has been decompressed into a package library.
2. In practice, you don't need to care about these states when you install popular packages, unless you have trouble installing a package via `install.packages()` or you install in-development packages.

## In-memory package

1. When we use the `library()` function, we load an installed package into R's memory.
2. This is the last step before we can use the package in our R task.
3. When you call `library(somepackage)`, R looks through the current libraries for an installed package named "somepackage" and, if successful, it makes somepackage available for use.

# Next Week

## Preparation: Contribute to a GitHub R Package

1. Make sure you have set up a [GitHub account](https://github.com/)
2. Make sure you download [GitHub Desktop](https://desktop.github.com/download/)