1 Data Import Exercises

1.1 Exercise 0: Importing Data from the online Website

You are working on the student exam analysis for UARK. You want to download the UARK Final exam table and then import it into R for data analysis.

Hint: use rvest package to construct a data table

❖ Answer ❖

final_example_tbl$`Exam Time` <- ifelse(final_example_tbl$`Exam Time` == "During Posted Math Testing Center Hours", NA, final_example_tbl$`Exam Time`)
final_example_tbl2 <- final_example_tbl |> 
  separate(`Exam Time`, into = c("Start_Time", "End_Time"), sep = " to ")

final_example_tbl2 |> 
  mutate(
    Start_Time = parse_time(trimws(Start_Time)),
    End_Time = parse_time(trimws(End_Time))
  )

1.2 Exercise 1: Importing Data from CSV

You are working with the following CSV file containing employee information:

Employee ID,Full Name,Department,Salary,Years of Experience
1,John Doe,HR,50000,5
2,Jane Smith,Engineering,N/A,8
3,Michael Johnson,Marketing,55000,6
4,Susan Lee,HR,52000,4
5,David Kim,Engineering,65000,7
6,Amy Brown,Marketing,57000,5

Use the read_csv() function to import this data into R.

For illustration, you can use the literal data (hint: use I()) to import the csv text.

Clean the data by replacing any missing salary values with NA.

Rename the columns to use snake case (employee_id, full_name, department, salary, years_of_experience).

1.3 Exercise 2: Importing Data Using Custom Delimiters

You have another dataset where the values are separated by semicolons (;) instead of commas:

Employee ID;Full Name;Department;Salary;Years of Experience
1;John Doe;HR;50000;5
2;Jane Smith;Engineering;60000;8
3;Michael Johnson;Marketing;55000;6
4;Susan Lee;HR;52000;4
5;David Kim;Engineering;65000;7
6;Amy Brown;Marketing;57000;5

Use read_delim() to import this dataset with the semicolon delimiter (;).

2 Data Tidying Exercises

2.1 Exercise 3: Pivoting Data Longer

You have a dataset that tracks the number of customers visiting different stores each week:

❖ Answer ❖

store_visits <- tibble(
  store = c("Store A", "Store B", "Store C"),
  week_1 = c(120, 150, 110),
  week_2 = c(130, 160, 120),
  week_3 = c(140, 170, 130),
  week_4 = c(110, 140, 100),
  week_5 = c(160, 180, 150)
)

Use pivot_longer() to convert this dataset into a long format, where each row represents a single observation for a specific week.

2.2 Exercise 4: Pivoting Data Wider

You have a long-format dataset that tracks the average test scores of students in different subjects over several terms:

❖ Answer ❖

scores_long <- tibble(
  student = c("Alice", "Alice", "Bob", "Bob", "Charlie", "Charlie"),
  term = c("Term 1", "Term 2", "Term 1", "Term 2", "Term 1", "Term 2"),
  subject = c("Math", "Math", "Math", "Math", "Math", "Math"),
  score = c(85, 90, 78, 80, 92, 95)
)

Use pivot_wider() to convert this dataset into a wide format, where each term (e.g., “Term 1”, “Term 2”) is a column, and the scores are the values.

2.3 Exercise 5: Combining pivot_longer() and pivot_wider()

You have a dataset tracking the number of calls received by a customer service center, split by department and month:

❖ Answer ❖

calls_wide <- tibble(
  department = c("Sales", "Support", "Billing"),
  january = c(200, 150, 180),
  february = c(210, 160, 190),
  march = c(220, 170, 200)
)

First, use pivot_longer() to convert this dataset into a long format, where each row corresponds to a specific department and month.
Then, use pivot_wider() to convert the long dataset back into a wide format, where the months are columns and the values represent the number of calls.

--- title: "Exercise 04: Data Manipulation and Function Scope Exercises" subtitle: "" date: "2025-08-18" execute: warning: false message: false eval: false echo: true format: html: page-layout: full code-summary: "❖ Answer ❖" code-fold: true toc: true toc-depth: 2 lightbox: true code-fold: show --- # Data Import Exercises ## Exercise 0: Importing Data from the online Website You are working on the student exam analysis for UARK. You want to download the [UARK Final exam table](https://registrar.uark.edu/registration/final-exam-schedule/spring-2025-final-exam-schedule.php) and then import it into R for data analysis. - Hint: use `rvest` package to construct a data table ```{r} #| eval: false #| echo: false #| message: false html_text_raw <- ' <table class="table table-bordered"> <tbody> <tr> <th>Final Exam Date</th> <th>Class Meeting Time/ Common Final</th> <th>Exam Time</th> </tr> <tr> <td rowspan="7">Monday, May 5, 2025</td> <td>MATH 11103, 12003, 21003, 22003</td> <td>During Posted Math Testing Center Hours</td> </tr> <tr> <td>CHEM 12103; CHEM 14103</td> <td>8:00 AM to 10:00 AM</td> </tr> <tr> <td>Mon/Wed/Fri 10:45 AM</td> <td>10:15 AM to 12:15 PM</td> </tr> <tr> <td>Mon/Wed/Fri 12:55 PM</td> <td>12:45 PM to 2:45 PM</td> </tr> <tr> <td>Mon/Wed/Fri 2:00 PM</td> <td>3:00 PM  to 5:00 PM</td> </tr> <tr> <td>SEVI 30103/SEVI 301H3 MATH 13004, 24005, 24004, 240H4, 25004, 250H4, 26004, 260H4</td> <td>5:30 PM to 7:30 PM</td> </tr> <tr> <td>Mon/Wed/Fri 5:15 PM Mon & Mon/Wed Classes Starting 5:00 PM or Later Alternate Time Slot for Makeup Exams</td> <td>7:45 PM to 9:45 PM</td> </tr> <tr> <td rowspan="8">Tuesday, May 6, 2025</td> </tr> <tr> <td>MATH 11003 (Sections 51 and Above) See Instructor for Details</td> <td>During Posted Math Testing Center Hours</td> </tr> <tr> <td>Tue/Thu 9:30 AM</td> <td>8:00 AM to 10:00 AM</td> </tr> <tr> <td>FOREIGN LANGUAGE 10103, 10203, 10106, 11003, 20103, 20203 CLASSES (FREN, JAPN, CHIN) EXCEPT ARAB 10103, 10106, 20106; CHRK 10103, 10203; GERM 10103, 10203, 20103, 20203; GREK 10203, 20103, 20203; ITAL 10103, 10203, 20103, 20203; LATN 10103, 10203, 20203; RUSS 10103, 10203, 20103, 20203; SPAN 10103, 10203, 20103, 20203, 202H3</td> <td>10:15 AM to 12:15 PM</td> </tr> <tr> <td>Tue/Thu 12:30 PM</td> <td>12:45 PM to 2:45 PM</td> </tr> <tr> <td>Tue/Thu 3:30 PM</td> <td>3:00 PM  to 5:00 PM</td> </tr> <tr> <td>FINN 20403/FINN 204H3</td> <td>5:30 PM to 7:30 PM</td> </tr> <tr> <td>Tue & Tue/Thu Classes Starting 5:00 PM or Later Alternate Time Slot for Makeup Exams</td> <td>7:45 PM to 9:45 PM</td> </tr> <tr> <td rowspan="7">Wednesday, May 7, 2025</td> <td>MATH 11003 (Sections 1 - 50) See Instructor for Details</td> <td>During Posted Math Testing Center Hours</td> </tr> <tr> <td>Mon/Wed/Fri 8:35 AM</td> <td>8:00 AM to 10:00 AM</td> </tr> <tr> <td>Mon/Wed/Fri 9:40 AM</td> <td>10:15 AM to 12:15 PM</td> </tr> <tr> <td>Mon/Wed/Fri 11:50 AM</td> <td>12:45 PM to 2:45 PM</td> </tr> <tr> <td>Mon/Wed/Fri 3:05 PM</td> <td>3:00 PM  to 5:00 PM</td> </tr> <tr> <td>SEVI 20503/SEVI 205H3 CHEM 14203/CHEM 142H3 CHEM 36203/CHEM 362H3/CHEM 37103</td> <td>5:30 PM to 7:30 PM</td> </tr> <tr> <td>Mon/Wed 4:35 Mon/Wed/Fri 4:10 PM Wed Classes Starting 5:00 PM or Later Alternate Time Slot for Makeup Exams</td> <td>7:45 PM to 9:45 PM</td> </tr> <tr> <td rowspan="7">Thursday, May 8, 2025</td> <td>MATH 20503</td> <td>During Posted Math Testing Center Hours</td> </tr> <tr> <td>Tue/Thu 8:00 AM</td> <td>8:00 AM to 10:00 AM</td> </tr> <tr> <td>Tue/Thu 11:00 AM</td> <td>10:15 AM to 12:15 PM</td> </tr> <tr> <td>Tue/Thu 2:00 PM</td> <td>12:45 PM to 2:45 PM</td> </tr> <tr> <td>Tue/Thu 4:00 PM Tue/Thu 4:30 PM</td> <td>3:00 PM to 5:00 PM</td> </tr> <tr> <td>PHYS 20203 MGMT 21003/MGMT 210H3</td> <td>5:30 PM to 7:30 PM</td> </tr> <tr> <td>Thu Classes Starting 5:00 PM or Later Alternate Time Slot for Makeup Exams</td> <td>7:45 PM to 9:45 PM</td> </tr> <tr> <td rowspan="4">Friday, May 9, 2025</td> <td>Mon/Wed/Fri 7:30 AM Alternate Time Slot for Makeup Exams</td> <td>8:00 AM to 10:00 AM</td> </tr> <tr> <td>ISYS 21003/ISYS 210H3; BUSI 10303/BUSI 103H3 Alternate Time Slot for Makeup Exams</td> <td>10:15 AM to 12:15 PM</td> </tr> </tbody> </table> ' final_example_tbl <- html_table(html_element(minimal_html(html_text_raw), "table")) final_example_tbl ``` ```{r} final_example_tbl$`Exam Time` <- ifelse(final_example_tbl$`Exam Time` == "During Posted Math Testing Center Hours", NA, final_example_tbl$`Exam Time`) final_example_tbl2 <- final_example_tbl |> separate(`Exam Time`, into = c("Start_Time", "End_Time"), sep = " to ") final_example_tbl2 |> mutate( Start_Time = parse_time(trimws(Start_Time)), End_Time = parse_time(trimws(End_Time)) ) ``` ## Exercise 1: Importing Data from CSV You are working with the following CSV file containing employee information: ``` Employee ID,Full Name,Department,Salary,Years of Experience 1,John Doe,HR,50000,5 2,Jane Smith,Engineering,N/A,8 3,Michael Johnson,Marketing,55000,6 4,Susan Lee,HR,52000,4 5,David Kim,Engineering,65000,7 6,Amy Brown,Marketing,57000,5 ``` 1. Use the `read_csv()` function to import this data into R. - For illustration, you can use the literal data (hint: use `I()`) to import the csv text. ```{r} #| eval: false #| echo: false #| message: false library(readr) employee_data <- read_csv(I( " Employee ID,Full Name,Department,Salary,Years of Experience 1,John Doe,HR,50000,5 2,Jane Smith,Engineering,N/A,8 3,Michael Johnson,Marketing,55000,6 4,Susan Lee,HR,52000,4 5,David Kim,Engineering,65000,7 6,Amy Brown,Marketing,57000,5 " )) employee_data ``` 2. Clean the data by replacing any missing salary values with NA. ```{r} #| eval: false #| echo: false #| message: false employee_data_NATrans <- read_csv(I( " Employee ID,Full Name,Department,Salary,Years of Experience 1,John Doe,HR,50000,5 2,Jane Smith,Engineering,N/A,8 3,Michael Johnson,Marketing,55000,6 4,Susan Lee,HR,52000,4 5,David Kim,Engineering,65000,7 6,Amy Brown,Marketing,57000,5 " ), na = "N/A") employee_data_NATrans ``` 3. Rename the columns to use snake case (employee_id, full_name, department, salary, years_of_experience). ```{r} #| eval: false #| echo: false #| message: false library(dplyr) employee_data_cleaned <- employee_data_NATrans %>% rename(employee_id = `Employee ID`, full_name = `Full Name`, department = `Department`, salary = `Salary`, years_of_experience = `Years of Experience`) employee_data_cleaned ``` ## Exercise 2: Importing Data Using Custom Delimiters You have another dataset where the values are separated by semicolons (;) instead of commas: ``` Employee ID;Full Name;Department;Salary;Years of Experience 1;John Doe;HR;50000;5 2;Jane Smith;Engineering;60000;8 3;Michael Johnson;Marketing;55000;6 4;Susan Lee;HR;52000;4 5;David Kim;Engineering;65000;7 6;Amy Brown;Marketing;57000;5 ``` - Use `read_delim()` to import this dataset with the semicolon delimiter (`;`). ```{r} #| eval: false #| echo: false #| message: false data_txt = " Employee ID;Full Name;Department;Salary;Years of Experience 1;John Doe;HR;50000;5 2;Jane Smith;Engineering;60000;8 3;Michael Johnson;Marketing;55000;6 4;Susan Lee;HR;52000;4 5;David Kim;Engineering;65000;7 6;Amy Brown;Marketing;57000;5" employee_data_semicolon <- read_delim(I(data_txt), delim = ";") employee_data_semicolon ``` # Data Tidying Exercises ## Exercise 3: Pivoting Data Longer You have a dataset that tracks the number of customers visiting different stores each week: ```{r} store_visits <- tibble( store = c("Store A", "Store B", "Store C"), week_1 = c(120, 150, 110), week_2 = c(130, 160, 120), week_3 = c(140, 170, 130), week_4 = c(110, 140, 100), week_5 = c(160, 180, 150) ) ``` - Use `pivot_longer()` to convert this dataset into a long format, where each row represents a single observation for a specific week. ```{r} #| eval: false #| echo: false #| message: false store_visits_long <- store_visits %>% pivot_longer( cols = starts_with("week_"), names_to = "week", values_to = "visits" ) store_visits_long ``` ## Exercise 4: Pivoting Data Wider You have a long-format dataset that tracks the average test scores of students in different subjects over several terms: ```{r} scores_long <- tibble( student = c("Alice", "Alice", "Bob", "Bob", "Charlie", "Charlie"), term = c("Term 1", "Term 2", "Term 1", "Term 2", "Term 1", "Term 2"), subject = c("Math", "Math", "Math", "Math", "Math", "Math"), score = c(85, 90, 78, 80, 92, 95) ) ``` - Use `pivot_wider()` to convert this dataset into a wide format, where each term (e.g., "Term 1", "Term 2") is a column, and the scores are the values. ```{r} #| eval: false #| echo: false #| message: false scores_wide <- scores_long %>% pivot_wider( names_from = term, values_from = score ) scores_wide ``` ## Exercise 5: Combining pivot_longer() and pivot_wider() You have a dataset tracking the number of calls received by a customer service center, split by department and month: ```{r} calls_wide <- tibble( department = c("Sales", "Support", "Billing"), january = c(200, 150, 180), february = c(210, 160, 190), march = c(220, 170, 200) ) ``` - First, use `pivot_longer()` to convert this dataset into a long format, where each row corresponds to a specific department and month. - Then, use `pivot_wider()` to convert the long dataset back into a wide format, where the months are columns and the values represent the number of calls. ```{r} #| eval: false #| echo: false #| message: false calls_long <- calls_wide %>% pivot_longer( cols = c("january", "february", "march"), names_to = "month", values_to = "calls" ) calls_wide_again <- calls_long %>% pivot_wider( names_from = month, values_from = calls ) ```