Lecture 03: R Functions

R Function

Author
Affiliation

Jihong Zhang*, Ph.D

Educational Statistics and Research Methods (ESRM) Program*

University of Arkansas

Published

January 28, 2025

Modified

October 11, 2024

2 R Function

2.1 Prebuilt functions

  • Functions: Once we have defined the objects, the data analysis process can usually be described as a series of functions applied to the data.

    • In other words, we can think of a “function” as a set of pre-specified operations (e.g., a macro in SAS).

    • R includes several predefined functions and most of the analysis pipelines we construct make extensive use of these.

    • We already used or discussed the install.packages, library, and ls functions. We also used the function sqrt to solve the quadratic equation above.

  • Evaluation: In general, we evaluate a function by typing its name followed by parentheses.

    • If you type ls, the function is not evaluated and instead R shows you the code that defines the function. If you type ls() the function is evaluated and, as seen above, we see objects in the workspace.
  • Function Arguments: Unlike ls, most functions require one or more arguments to specify the settings of the function.

    • For example, we can pass different objects to the argument of the function log, such as a number or an object defined earlier (recall that coef_a was defined to be 1):
log(8)
[1] 2.079442
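
As a small sketch of how arguments change what a function returns, the lines below call log with one and with two arguments. The object coef_a is redefined here as 1 only so the snippet runs on its own; base is the optional second argument documented on the help page of log.

```r
# Redefined here so the snippet is self-contained (it was defined as 1 earlier)
coef_a <- 1

args(log)                        # lists the arguments: x and base (default exp(1))
log_8      <- log(8)             # natural logarithm of 8, about 2.079442
log_coef_a <- log(coef_a)        # log(1) is 0
log2_8     <- log(8, base = 2)   # base-2 logarithm of 8 is 3
```

Because base has a default value, log(8) works without it; supplying base = 2 overrides the default.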

2.2 Functions in R

2.2.1 Overview of Functions

  • Definition: Functions are blocks of code designed to perform specific tasks.
  • Purpose:
    • Automate repetitive tasks.
    • Increase code reusability and readability.
      • A good rule of thumb is to consider writing a function whenever you’ve copied and pasted a block of code more than twice
  • Key Characteristics:
    • Inputs (arguments) → Process → Output (return value).
    • Functions can contain other functions.

2.2.2 Benefits of Using Functions

  1. Code Reusability: Write once, use multiple times.
  2. Readability: Simplify complex code.
  3. Debugging: Isolate errors within specific functions.
  4. Scalability: Build modular, extensible codebases.

3 Function Creation

3.1 Helpful function: Pipe

  • In R, a pipe is a powerful operator (“|>” or “%>%”) used to streamline the flow of data analysis.
    • An operator is a special function whose name is a symbol and whose left- and right-hand sides are its arguments.
1 + 2
`+`(1, 2)
  • It enables chaining multiple operations together in a readable, sequential manner.

  • Key idea: The pipe operator passes the output of one function as the first argument to the next function, eliminating the need for intermediate variables or nested function calls.

data |> function1() |> function2() |> function3()

This is equivalent to:

function3(function2(function1(data)))
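
The equivalence can be checked with a minimal base R sketch. The vector values is a made-up example, and the native pipe |> assumes R 4.1.0 or later.

```r
# Hypothetical input vector used only for illustration
values <- c(1, 4, 9)

# Piped form: read left to right (take values, then sqrt, then sum)
piped <- values |> sqrt() |> sum()

# Nested form: read inside out
nested <- sum(sqrt(values))

identical(piped, nested)  # TRUE
piped                     # 6, since 1 + 2 + 3 = 6
```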

3.1.1 Example: With vs. Without Pipe

Example: Without Pipe

# Without pipe
# filter(), select(), arrange(), and desc() come from dplyr
library(dplyr)
result <- filter(mtcars, mpg > 20)
result <- select(result, mpg, cyl)
result <- arrange(result, desc(mpg))

Example: With Pipe

# With pipe
mtcars |> 
  filter(mpg > 20) |> 
  select(mpg, cyl) |> 
  arrange(desc(mpg))
  • Which one do you prefer?

3.2 Creating Custom Function

3.2.1 Example: Celsius to Fahrenheit Converter

celsius_to_fahrenheit <- function(temp_c) {
  temp_f <- (temp_c * 9/5) + 32
  return(temp_f)
}

celsius_to_fahrenheit(25)  # Output: 77
  • Converts a temperature from Celsius to Fahrenheit.
  • Input: temp_c (temperature in Celsius).
  • Output: temp_f (temperature in Fahrenheit).
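
Because the arithmetic inside the function is vectorized, the same converter should also accept a whole vector of temperatures. The sketch below repeats the definition so it runs on its own; temps_c is a made-up input.

```r
celsius_to_fahrenheit <- function(temp_c) {
  temp_f <- (temp_c * 9/5) + 32
  return(temp_f)
}

# Hypothetical vector of Celsius temperatures
temps_c <- c(0, 25, 100)
temps_f <- celsius_to_fahrenheit(temps_c)
temps_f  # 32 77 212
```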

3.2.2 Example: Standardization

  • Can you spot the mistake in the code below?
library(dplyr)
df <- tibble(
  a = rnorm(5),
  b = rnorm(5),
  c = rnorm(5),
  d = rnorm(5),
)
df
# A tibble: 5 × 4
       a       b      c      d
   <dbl>   <dbl>  <dbl>  <dbl>
1 -0.250  0.378   1.16  -1.28 
2  0.293  0.0678  1.93   3.67 
3 -0.738 -0.555   0.256  1.57 
4  1.20   0.325  -0.203 -0.976
5  0.702 -1.58   -3.11  -0.826
df |> mutate(
  a = (a - mean(a, na.rm = TRUE)) / sd(a),
  b = (b - mean(a, na.rm = TRUE)) / sd(b),
  c = (c - mean(c, na.rm = TRUE)) / sd(c),
  d = (d - mean(d, na.rm = TRUE)) / sd(d),
)
# A tibble: 5 × 4
        a       b      c      d
    <dbl>   <dbl>  <dbl>  <dbl>
1 -0.643   0.462   0.597 -0.802
2  0.0687  0.0828  0.999  1.51 
3 -1.28   -0.678   0.130  0.533
4  1.25    0.398  -0.108 -0.658
5  0.605  -1.93   -1.62  -0.588
  • tibble creates a data frame with 4 columns
  • mutate adds or overwrites columns; here it is meant to replace every column with its standardized scores

  • Extract the “argument”:
    • The arguments are things that vary across calls and our analysis above tells us that we have just one. We’ll call it x because this is the conventional name for a numeric vector.
standardized <- function(x){
  (x - mean(x, na.rm = TRUE)) / sd(x)
}

df |> mutate(
  a = standardized(a),
  b = standardized(b),
  c = standardized(c),
  d = standardized(d)
)
# A tibble: 5 × 4
        a      b      c      d
    <dbl>  <dbl>  <dbl>  <dbl>
1 -0.643   0.795  0.597 -0.802
2  0.0687  0.415  0.999  1.51 
3 -1.28   -0.345  0.130  0.533
4  1.25    0.730 -0.108 -0.658
5  0.605  -1.59  -1.62  -0.588
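
A quick way to see why the function version is safer is to compare it with the copy-and-paste version on small, made-up vectors. In this sketch a and b are hypothetical data, and b_wrong repeats the slip from the first mutate call, where b was centered on the mean of a.

```r
standardized <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x)
}

# Hypothetical vectors with clearly different means
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)

# Copy-and-paste slip: b is centered on mean(a) instead of mean(b)
b_wrong <- (b - mean(a, na.rm = TRUE)) / sd(b)

# Function version: the argument x is used consistently
b_right <- standardized(b)

mean(b_right)  # 0 (up to floating-point rounding)
sd(b_right)    # 1
mean(b_wrong)  # about 1.71, so b_wrong is not centered at 0
```

A standardized vector should have mean 0 and standard deviation 1, which holds for b_right but not for b_wrong.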

3.2.3 Exercise: Rescale

  • Create a new function called rescale to simplify the following code:
df |> mutate(
  a = (a - mean(a, na.rm = TRUE)) / (max(a) - min(a)),
  b = (b - mean(b, na.rm = TRUE)) / (max(b) - min(b)),
  c = (c - mean(c, na.rm = TRUE)) / (max(c) - min(c)),
  d = (d - mean(d, na.rm = TRUE)) / (max(d) - min(d)),
)
# A tibble: 5 × 4
        a      b       c      d
    <dbl>  <dbl>   <dbl>  <dbl>
1 -0.253   0.333  0.228  -0.346
2  0.0271  0.174  0.382   0.654
3 -0.506  -0.145  0.0497  0.230
4  0.494   0.306 -0.0413 -0.284
5  0.239  -0.667 -0.618  -0.254
  • The basic skeleton of a function looks like this:
name <- function(arguments) {
  body
}
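  • A name. Here we’ll use ‘rescale’ because this function rescales a vector by centering it at its mean and dividing by its range (max minus min), so the results lie between -1 and 1.

  • The arguments. These are the things that vary across calls; here there is just one, the numeric vector x.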

  • The body. The body is the code that’s repeated across all the calls.
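
One possible solution, sketched under the assumption that the repeated expression is copied into the body unchanged with x in place of the column name. The test vector is made up.

```r
# Sketch of one possible answer to the exercise
rescale <- function(x) {
  (x - mean(x, na.rm = TRUE)) / (max(x) - min(x))
}

# Hypothetical input: mean is 3 and range is 4
rescaled <- rescale(c(1, 2, 3, 4, 5))
rescaled  # -0.50 -0.25  0.00  0.25  0.50
```

With this definition, each line of the mutate call would shrink to the form a = rescale(a).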

3.3 Anatomy of a Function

3.3.1 Example Code for one function

function_name <- function(argument1, argument2 = default_value) {
  # Body of the function
  result <- argument1 + argument2
  return(result)
}
  • Components:
    • function_name: Name of the function.
    • arguments: Inputs provided to the function.
    • body: Code block that performs the computation.
    • return(): Specifies the output of the function.
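
To make the template concrete, the sketch below fills it in with a hypothetical name (add_with_default) and an arbitrary default value of 10, then inspects the components with the base R helpers formals() and body().

```r
# Hypothetical concrete version of the template; the default 10 is arbitrary
add_with_default <- function(argument1, argument2 = 10) {
  result <- argument1 + argument2
  return(result)
}

arg_names <- names(formals(add_with_default))  # "argument1" "argument2"
body(add_with_default)                         # the code between the braces

out_default  <- add_with_default(5)                 # 15: argument2 falls back to 10
out_override <- add_with_default(5, argument2 = 2)  # 7: the default is overridden
```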

3.4 Arguments in Functions

3.4.1 Default Arguments

  • Assign default values to arguments to make them optional.
greet <- function(name = "World") {
  return(paste("Hello,", name))
}

greet()            # Output: "Hello, World"
greet("R User")    # Output: "Hello, R User"

3.4.2 Flexible Arguments

  • ...: Allow a function to accept a variable number of arguments.
sum_numbers <- function(...) {
  numbers <- c(...) # combine into a vector
  return(sum(numbers))
}

sum_numbers(1, 2, 3, 4)  # Output: 10
[1] 10
  • Look at the help page of mean and explain why we can pass a y argument to the mean function:
mean(x = c(1, 2, 3), y = 3)
[1] 2

  • Flexible arguments are useful when you do not know in advance which arguments users will supply:
mean(y = 3, x = 8, z = 9, one_vector = c(7, TRUE))
[1] 8
(3 + 8 + 9 + 7 + 1 ) / 5 # what we expect
[1] 5.6
flexible_mean <- function(...){
  return(mean(x = c(...)))
}
flexible_mean(y = 3, x = 8, z = 9, one_vector = c(7, TRUE))
[1] 5.6
  • Question: test the sum function and explain why sum can accept flexible arguments.
sum(y = 3, x = 8, z = 9, one_vector = c(7, TRUE))
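
One way to explore both questions is to inspect the formal arguments of each function. This is a sketch using base R only; dots_to_names is a hypothetical helper that shows what ... captures.

```r
# mean() has a named first argument x, followed by ...
mean_args <- names(formals(mean))       # "x" "..."

# sum() is a primitive, so args() is used to expose its arguments
sum_args <- names(formals(args(sum)))   # "..." "na.rm"

# Hypothetical helper: returns the names of everything captured by ...
dots_to_names <- function(...) {
  names(list(...))
}
captured <- dots_to_names(y = 3, x = 8)  # "y" "x"
```

In mean, only x is averaged and anything else falls into ..., whereas in sum the values to add are themselves collected through ....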

3.5 Returning Values

  • Functions return the last evaluated expression by default.
  • Use return() for clarity.

3.5.1 Example: Summing Two Numbers

add <- function(x, y) {
  return(x + y)
}

add(10, 5)  # Output: 15
[1] 15
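
The two bullets above can be sketched side by side. The names add_implicit and safe_divide are hypothetical; the second function shows a case where return() is needed to leave the function early.

```r
# Implicit return: the value of the last evaluated expression is returned
add_implicit <- function(x, y) {
  x + y
}

# Explicit early return: stop before the division when y is zero
safe_divide <- function(x, y) {
  if (y == 0) {
    return(NA_real_)
  }
  x / y
}

add_implicit(10, 5)  # 15
safe_divide(6, 3)    # 2
safe_divide(1, 0)    # NA
```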

3.6 Nested Functions

  • Functions can call other functions.

3.6.1 Example: Calculating BMI

bmi <- function(weight, height) {
  return(weight / (height^2))
}

bmi(70, 1.75)  # Output: 22.86
  • Combines mathematical operations into a single function.
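
To show one function calling another, the sketch below wraps bmi in a hypothetical helper, bmi_rounded, which calls bmi and then round. The helper name and its digits argument are illustrative only.

```r
bmi <- function(weight, height) {
  return(weight / (height^2))
}

# Hypothetical wrapper: calls bmi() and passes its result to round()
bmi_rounded <- function(weight, height, digits = 2) {
  round(bmi(weight, height), digits)
}

bmi_rounded(70, 1.75)  # 22.86
```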

3.7 Function Scope

  • Local Scope: Variables defined inside a function are local to that function.
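x <- 5
print_x <- function(x) {x <- 3; return(x)}
print_x(x)  # Output: 3 (the local x inside the function)
x           # Output: 5 (the x outside the function is unchanged)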
  • Global Scope: Variables defined outside a function are accessible throughout the script.

3.7.1 Example: Local Scope

add <- function(x, y) {
  result <- x + y
  return(result)
}

add(2, 3)   # Output: 5
result      # Error: object 'result' not found
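# With <<-, the assignment is made outside the function (here, in the global environment)
add <- function(x, y) {
  result <<- x + y
  return(result)
}
add(2, 3)   # Output: 5
result      # Output: 5, because <<- created result outside the function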
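
The global-scope bullet can be illustrated with a short sketch. The names bonus, add_bonus, and shadow_bonus are hypothetical.

```r
bonus <- 10  # defined outside any function

# bonus is not defined inside the function, so R looks it up outside
add_bonus <- function(x) {
  x + bonus
}

# A local variable with the same name hides the outer one inside the function
shadow_bonus <- function(x) {
  bonus <- 100
  x + bonus
}

add_bonus(1)     # 11
shadow_bonus(1)  # 101
bonus            # still 10
```

Reading an outer variable works without any special syntax, but changing it from inside a function requires <<-, which is why ordinary assignment is usually the safer choice.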

3.8 Exercise 03

  • Create a blank Quarto document
  • Finish Exercise 03: R Function in the Quarto document

3.9 Summary

  • Functions are the cornerstone of programming in R.
  • They encapsulate reusable logic, making code efficient and modular.
  • Key concepts include arguments, return values, and scope.
  • Practice writing functions to automate tasks and solve complex problems.