[1] 2.079442
R Function
Educational Statistics and Research Methods (ESRM) Program*
University of Arkansas
2025-01-28
Functions: Once we have defined the objects, the data analysis process can usually be described as a series of functions applied to the data.
In other words, a function is a set of pre-specified operations (comparable to a macro in SAS).
R includes several predefined functions, and most of the analysis pipelines we construct make extensive use of them.
We already used or discussed the install.packages, library, and ls functions. We also used the function sqrt to solve the quadratic equation above.
Evaluation: In general, we need to follow a function name with parentheses to evaluate the function.
If you type ls without parentheses, the function is not evaluated and instead R shows you the code that defines the function. If you type ls(), the function is evaluated and, as seen above, we see the objects in the workspace.
Function Arguments: Unlike ls, most functions require one or more arguments to specify the settings of the function.
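The difference between naming a function and evaluating it, and the role of arguments, can be sketched as follows. This is a minimal illustration; the object x_example is a hypothetical name, not one defined earlier in the lecture.

```r
# A function name without parentheses refers to the function object itself;
# nothing is evaluated.
is_fn <- is.function(ls)   # ls is a function object

# Adding parentheses evaluates the function. ls() needs no arguments and
# returns the names of the objects in the current environment.
x_example <- 8             # hypothetical object, for illustration only
workspace <- ls()

# Most functions need at least one argument; log() requires a value x.
natural_log <- log(x_example)            # natural logarithm
log_base_2  <- log(x_example, base = 2)  # optional argument changes the base

print(natural_log)
print(log_base_2)
```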
For example, consider the function log. Remember that we earlier defined coef_a to be 1, so log(coef_a) returns 0.
The pipe operator (|>) enables chaining multiple operations together in a readable, sequential manner.
Key idea: The pipe operator passes the output of one function as the first argument to the next function, eliminating the need for intermediate variables or nested function calls.
For example, x |> f(y) is equivalent to f(x, y).
Example: Without Pipe
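Without the pipe, a sequence of operations is written as nested calls that are read from the inside out. A minimal sketch, using a hypothetical vector named values:

```r
values <- c(1, 4, 9, 16)   # hypothetical data, for illustration only

# Read from the inside out: take square roots, average them, then round.
nested_result <- round(mean(sqrt(values)), 1)
print(nested_result)
```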
Example: With Pipe
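With the native pipe (available in R 4.1.0 and later), the steps read from left to right, each result becoming the first argument of the next call. A minimal sketch, using a hypothetical vector named values:

```r
values <- c(1, 4, 9, 16)   # hypothetical data, for illustration only

# Read from top to bottom: take square roots, average them, then round.
piped_result <- values |>
  sqrt() |>
  mean() |>
  round(1)   # same as round(<previous result>, 1)
print(piped_result)
```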
temp_c (temperature in Celsius).
temp_f (temperature in Fahrenheit).
# A tibble: 5 × 4
       a       b      c      d
   <dbl>   <dbl>  <dbl>  <dbl>
1 -0.250  0.378   1.16  -1.28
2  0.293  0.0678  1.93   3.67
3 -0.738 -0.555   0.256  1.57
4  1.20   0.325  -0.203 -0.976
5  0.702 -1.58   -3.11  -0.826
df |> mutate(
  a = (a - mean(a, na.rm = TRUE)) / sd(a),
  # Copy-paste error: mean(a) should be mean(b), so column b is not centered correctly
  b = (b - mean(a, na.rm = TRUE)) / sd(b),
  c = (c - mean(c, na.rm = TRUE)) / sd(c),
  d = (d - mean(d, na.rm = TRUE)) / sd(d),
)
# A tibble: 5 × 4
        a       b      c      d
    <dbl>   <dbl>  <dbl>  <dbl>
1 -0.643   0.462   0.597 -0.802
2  0.0687  0.0828  0.999  1.51
3 -1.28   -0.678   0.130  0.533
4  1.25    0.398  -0.108 -0.658
5  0.605  -1.93   -1.62  -0.588
tibble creates a data frame with 4 columns.
mutate creates or modifies columns; here it replaces every column with its standardized scores.
standardized <- function(x){
  # na.rm = TRUE in both mean() and sd() so missing values are handled consistently
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
df |> mutate(
  a = standardized(a),
  b = standardized(b),
  c = standardized(c),
  d = standardized(d)
)
# A tibble: 5 × 4
        a      b      c      d
    <dbl>  <dbl>  <dbl>  <dbl>
1 -0.643   0.795  0.597 -0.802
2  0.0687  0.415  0.999  1.51
3 -1.28   -0.345  0.130  0.533
4  1.25    0.730 -0.108 -0.658
5  0.605  -1.59  -1.62  -0.588
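One way to sanity-check a standardizing function is to apply it to a plain vector and confirm that the result has mean 0 and standard deviation 1. The sketch below uses base R only; the vector scores is hypothetical, and both mean() and sd() are called with na.rm = TRUE here so that missing values are ignored consistently.

```r
standardized <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

scores <- c(2, 4, 4, 6, 9, NA)   # hypothetical data with one missing value
z <- standardized(scores)

# Summaries of the standardized scores, ignoring the missing value
z_mean <- mean(z, na.rm = TRUE)
z_sd   <- sd(z, na.rm = TRUE)
print(round(c(mean = z_mean, sd = z_sd), 10))
```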
Exercise: write a function named rescale to simplify the following code:
df |> mutate(
  a = (a - mean(a, na.rm = TRUE)) / (max(a) - min(a)),
  b = (b - mean(b, na.rm = TRUE)) / (max(b) - min(b)),
  c = (c - mean(c, na.rm = TRUE)) / (max(c) - min(c)),
  d = (d - mean(d, na.rm = TRUE)) / (max(d) - min(d)),
)
# A tibble: 5 × 4
        a      b       c      d
    <dbl>  <dbl>   <dbl>  <dbl>
1 -0.253   0.333  0.228  -0.346
2  0.0271  0.174  0.382   0.654
3 -0.506  -0.145  0.0497  0.230
4  0.494   0.306 -0.0413 -0.284
5  0.239  -0.667 -0.618  -0.254
A name. Here we’ll use ‘rescale’ because this function rescales a vector by subtracting its mean and dividing by its range (maximum minus minimum).
The body. The body is the code that’s repeated across all the calls.
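Following the name and body described above, one possible rescale function is sketched below. It assumes a single argument, a numeric vector called x (the argument name is a choice made here), and it adds na.rm = TRUE to max() and min(), which the repeated code did not include.

```r
rescale <- function(x) {
  # Center at the mean, then divide by the range (max minus min)
  (x - mean(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Column a as printed in the first tibble (values are rounded there)
a <- c(-0.250, 0.293, -0.738, 1.20, 0.702)
rescaled_a <- rescale(a)
print(round(rescaled_a, 3))
```

With such a function in place, each line of the mutate() call reduces to a = rescale(a), b = rescale(b), and so on.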
function_name: Name of the function.
arguments: Inputs provided to the function.
body: Code block that performs the computation.
return(): Specifies the output of the function.
...: Allows a function to accept a variable number of arguments.
sum_numbers <- function(...) {
  numbers <- c(...) # combine into a vector
  return(sum(numbers))
}
sum_numbers(1, 2, 3, 4) # Output: 10
[1] 10
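The components listed above can be seen together in one small function. This is a sketch with hypothetical names (average_scores, scores, digits); it forwards any extra arguments collected by ... to mean().

```r
# function_name <- function(arguments) { body; return(value) }
average_scores <- function(scores, digits = 2, ...) {
  # body: compute the mean, forwarding extra arguments (e.g., na.rm) via ...
  result <- mean(scores, ...)
  # return(): make the output explicit
  return(round(result, digits))
}

plain        <- average_scores(c(80, 90, 100))
with_missing <- average_scores(c(80, 90, NA), na.rm = TRUE)  # na.rm travels through ...
one_digit    <- average_scores(c(1, 2, 2), digits = 1)

print(c(plain, with_missing, one_digit))
```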
Read the help page for mean, then explain why we can pass a y argument to the mean function.
[1] 8
[1] 5.6
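One way to explore this question is to inspect the arguments of mean and then try an extra named argument. The sketch below assumes the call mean(8, y = 3); whether this matches the call used in the lecture is not known.

```r
# The generic mean is defined as function (x, ...), so its formal
# arguments are x and the dots.
mean_args <- names(formals(mean))
print(mean_args)

# y matches no named argument, so it is absorbed by ... and ignored;
# only x contributes to the result.
result <- mean(8, y = 3)
print(result)
```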
flexible_mean <- function(...){
  return(mean(x = c(...)))
}
flexible_mean(y = 3, x = 8, z = 9, one_vector = c(7, TRUE))
[1] 5.6
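The result 5.6 can be traced by looking at what c(...) produces inside flexible_mean. A minimal sketch of that intermediate step; the object name combined is hypothetical.

```r
# Inside flexible_mean, c(...) flattens every argument into one numeric
# vector. The argument names become element names, and TRUE is coerced to 1.
combined <- c(y = 3, x = 8, z = 9, one_vector = c(7, TRUE))
print(combined)

combined_names <- names(combined)
combined_mean  <- mean(combined)   # (3 + 8 + 9 + 7 + 1) / 5
print(combined_mean)
```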
Read the help page for the sum function, then explain why sum can accept flexible arguments.
Use return() for clarity.
ESRM 64503 - Lecture 03: Object/Function/Package