lubridate package
Educational Statistics and Research Methods (ESRM) Program
University of Arkansas
2025-02-05
Class Outline
Date and POSIXct types to manage date-time information.
tidyverse provides more types
Tip
You should always use the simplest possible data type that works for your needs. That means if you can use a date instead of a date-time, you should. Date-times are substantially more complicated because of the need to handle time zones, which we’ll come back to at the end of the chapter.
lubridate for Date-Time Parsing
Loading the Package
A date.
A time within a day.
A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second).
Use today() or now() to get the current date or date-time.
If your CSV contains an ISO8601 date or date-time, you don’t need to do anything; readr::read_csv() will automatically recognize it:
# A tibble: 1 × 2
date datetime
<date> <dttm>
1 2022-01-02 2022-01-02 05:12:00
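A minimal sketch of the automatic recognition described above, assuming the readr package is installed. The inline CSV text and the object name `dat` are made up for illustration.

```r
library(readr)

# Inline CSV with one ISO8601 date and one ISO8601 date-time (illustrative data)
csv <- "
  date,datetime
  2022-01-02,2022-01-02 05:12
"

# No col_types needed: readr guesses <date> and <dttm> on its own
dat <- read_csv(csv, show_col_types = FALSE)
dat
```

Because both columns are already in ISO8601 form, no format string is required.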
Tip
The date format follows ISO8601. If you haven’t heard of ISO8601 before, it’s an international standard for writing dates where the components of a date are organized from biggest to smallest, separated by -. For example, in ISO8601 May 3 2022 is 2022-05-03.
For other date-time formats, you’ll need to use col_types= plus col_date() or col_datetime() along with a date-time format.
col_date() in the readr package accepts a format specification:
csv <- "
ID,birthdate, datatime
1,01/02/15, 2024.01.02
2,02/03/15, 2024.01.03
"
read_csv(csv, col_types = cols(birthdate = col_date(format = "%m/%d/%y")))
# A tibble: 2 × 3
ID birthdate datatime
<dbl> <date> <chr>
1 1 2015-01-02 2024.01.02
2 2 2015-02-03 2024.01.03
# A tibble: 2 × 3
ID birthdate datatime
<dbl> <date> <chr>
1 1 2015-02-01 2024.01.02
2 2 2015-03-02 2024.01.03
# A tibble: 2 × 3
ID birthdate datatime
<dbl> <date> <chr>
1 1 2001-02-15 2024.01.02
2 2 2002-03-15 2024.01.03
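The three outputs above differ only in how the birthdate column was read. As a hedged sketch, the same effect can be reproduced on a single string with readr::parse_date(); the three format strings below are an assumption about what produced each output, inferred from the printed dates.

```r
library(readr)

x <- "01/02/15"  # illustrative value taken from the birthdate column above

mdy_guess <- parse_date(x, format = "%m/%d/%y")  # month/day/year
dmy_guess <- parse_date(x, format = "%d/%m/%y")  # day/month/year
ymd_guess <- parse_date(x, format = "%y/%m/%d")  # year/month/day

c(mdy_guess, dmy_guess, ymd_guess)
```

The string never changes; only the format specification decides which date it becomes, which is why an explicit format matters for non-ISO8601 input.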
date object
Type | Code | Meaning | Example |
---|---|---|---|
Year | %Y | 4 digit year | 2021 |
 | %y | 2 digit year | 21 |
Month | %m | Number | 2 |
 | %b | Abbreviated name | Feb |
 | %B | Full name | February |
Day | %d | One or two digits | 2 |
 | %e | Two digits | 02 |
Time | %H | 24-hour hour | 13 |
 | %I | 12-hour hour | 1 |
 | %p | AM/PM | pm |
 | %M | Minutes | 35 |
 | %S | Seconds | 45 |
 | %OS | Seconds with decimal component | 45.35 |
 | %Z | Time zone name | America/Chicago |
 | %z | Offset from UTC | +0800 |
Other | %. | Skip one non-digit | : |
 | %* | Skip any number of non-digits | |
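To make a few of these codes concrete, here is a small sketch using readr::parse_date() and readr::parse_datetime(). The input strings and object names are invented for illustration.

```r
library(readr)

# %B = full month name, %d = day, %Y = 4-digit year
d <- parse_date("February 2, 2021", format = "%B %d, %Y")

# %H = 24-hour hour, %M = minutes, %S = seconds
dt <- parse_datetime("2021-02-02 13:35:45", format = "%Y-%m-%d %H:%M:%S")

d
dt
```

Literal characters in the format string (spaces, commas, dashes, colons) must match the input exactly.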
Date Type in base R
Date format:
lubridate provides functions to interpret and standardize date formats.
Parse dates with year, month, and day components
Formats can be ambiguous; lubridate helps with appropriate parsing:
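A short sketch of the idea, assuming lubridate is installed. The function name spells out the order of the components in the string, and base R's as.Date() is shown for comparison. The example strings are illustrative.

```r
library(lubridate)

# The same calendar day written three ways
d_ymd <- ymd("2025-02-05")
d_mdy <- mdy("02/05/2025")
d_dmy <- dmy("05.02.2025")

# Base R equivalent needs an explicit format for non-ISO8601 input
d_base <- as.Date("02/05/2025", format = "%m/%d/%Y")

# An ambiguous string gives different dates depending on the parser chosen
mdy("02/05/2025")  # 5 February 2025
dmy("02/05/2025")  # 2 May 2025
```

Choosing the parser is how you tell lubridate which reading of an ambiguous string is intended.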
Date and POSIXct
Date type variables contain year-month-day information.
POSIXct class stores timestamps as seconds since the epoch (1970-01-01 00:00:00 UTC).
Use ymd_hms() to parse full date-time values:
[1] "2025-02-12 14:30:00 UTC"
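The following sketch illustrates the two storage models. The call to ymd_hms() is an assumption about what produced the output above; the other values are illustrative.

```r
library(lubridate)

# ymd_hms() returns a POSIXct in UTC unless tz is given
dt <- ymd_hms("2025-02-12 14:30:00")
class(dt)

# POSIXct is a count of seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(ymd_hms("1970-01-01 00:01:00"))

# Date is a count of days since 1970-01-01
days_since_epoch <- as.numeric(as.Date("1970-01-02"))

# Dropping the time of day converts a date-time to a Date
d <- as_date(dt)
```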
In lubridate, there are various types of parsing functions that parse a character string based on the order of the components in your date string.
library(nycflights13)
flights_datetime <- flights |>
  select(year, month, day, hour, minute)
flights_datetime
# A tibble: 336,776 × 5
year month day hour minute
<int> <int> <int> <dbl> <dbl>
1 2013 1 1 5 15
2 2013 1 1 5 29
3 2013 1 1 5 40
4 2013 1 1 5 45
5 2013 1 1 6 0
6 2013 1 1 5 58
7 2013 1 1 6 0
8 2013 1 1 6 0
9 2013 1 1 6 0
10 2013 1 1 6 0
# ℹ 336,766 more rows
To build a date or date-time from separate components like these, use make_date() for dates, or make_datetime() for date-times:
flights_datetime |>
  mutate(departure_time = make_datetime(year, month, day, hour, minute),
         departure_date = make_date(year, month, day))
# A tibble: 336,776 × 7
year month day hour minute departure_time departure_date
<int> <int> <int> <dbl> <dbl> <dttm> <date>
1 2013 1 1 5 15 2013-01-01 05:15:00 2013-01-01
2 2013 1 1 5 29 2013-01-01 05:29:00 2013-01-01
3 2013 1 1 5 40 2013-01-01 05:40:00 2013-01-01
4 2013 1 1 5 45 2013-01-01 05:45:00 2013-01-01
5 2013 1 1 6 0 2013-01-01 06:00:00 2013-01-01
6 2013 1 1 5 58 2013-01-01 05:58:00 2013-01-01
7 2013 1 1 6 0 2013-01-01 06:00:00 2013-01-01
8 2013 1 1 6 0 2013-01-01 06:00:00 2013-01-01
9 2013 1 1 6 0 2013-01-01 06:00:00 2013-01-01
10 2013 1 1 6 0 2013-01-01 06:00:00 2013-01-01
# ℹ 336,766 more rows
In flights, dep_time and arr_time represent the time in the format HHMM or HMM.
dep_time %/% 100 will be hours
dep_time %% 100 will be minutes
## create a self-made function that can read in HMM time format
make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}
flights_dt <- flights |>
  filter(!is.na(dep_time), !is.na(arr_time)) |> # remove rows with missing times
  mutate(
    dep_time = make_datetime_100(year, month, day, dep_time),
    sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
  ) |>
  select(origin, dest, dep_time, sched_dep_time)
flights_dt
# A tibble: 328,063 × 4
origin dest dep_time sched_dep_time
<chr> <chr> <dttm> <dttm>
1 EWR IAH 2013-01-01 05:17:00 2013-01-01 05:15:00
2 LGA IAH 2013-01-01 05:33:00 2013-01-01 05:29:00
3 JFK MIA 2013-01-01 05:42:00 2013-01-01 05:40:00
4 JFK BQN 2013-01-01 05:44:00 2013-01-01 05:45:00
5 LGA ATL 2013-01-01 05:54:00 2013-01-01 06:00:00
6 EWR ORD 2013-01-01 05:54:00 2013-01-01 05:58:00
7 EWR FLL 2013-01-01 05:55:00 2013-01-01 06:00:00
8 LGA IAD 2013-01-01 05:57:00 2013-01-01 06:00:00
9 JFK MCO 2013-01-01 05:57:00 2013-01-01 06:00:00
10 LGA ORD 2013-01-01 05:58:00 2013-01-01 06:00:00
# ℹ 328,053 more rows
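To see the integer-division trick in isolation, here is a sketch that applies the same helper to two made-up HHMM values rather than the full flights table.

```r
library(lubridate)

# Same helper as in the lecture code above
make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}

hhmm <- c(517, 1340)      # illustrative values: 5:17 and 13:40
hrs  <- hhmm %/% 100      # integer division keeps the hours
mins <- hhmm %% 100       # remainder keeps the minutes

dt <- make_datetime_100(2013, 1, 1, hhmm)
dt
```

Both HMM (517) and HHMM (1340) inputs work because the last two digits are always the minutes.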
Use %within% interval(start, end) to select an interval between two timestamps.
For month() and wday(), you can set label = TRUE to return the abbreviated name of the month or day of the week.
Use year() <-, month() <-, and hour() <- to modify the year, month, and hour of the original date-time object.
floor_date(), round_date(), and ceiling_date() are useful for adjusting dates. Each function takes a vector of dates to adjust and then the name of the unit to round down (floor), round up (ceiling), or round to.
Three important classes that represent time spans: durations (an exact number of seconds), periods (human units like weeks and months), and intervals (a starting and ending point).
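Before looking at each time-span class, here is a minimal sketch of the helpers mentioned above: %within%, the component accessors and setters, and the rounding functions. The timestamp and object names are invented for illustration.

```r
library(lubridate)

x <- ymd_hms("2013-01-15 05:17:42")  # illustrative timestamp (UTC)

# %within%: does x fall inside January 2013?
jan <- interval(ymd("2013-01-01"), ymd("2013-02-01"))
in_jan <- x %within% jan

# Accessors; label = TRUE returns names (language depends on the session locale)
month(x)
wday(x, label = TRUE)

# Setters change one component of a copy
y <- x
year(y) <- 2020
hour(y) <- 8

# Rounding to a unit
x_floor   <- floor_date(x, "hour")
x_round   <- round_date(x, "minute")
x_ceiling <- ceiling_date(x, "day")
```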
In R, when you subtract two dates, you get a difftime object:
Time difference of 16592 days
[1] "difftime"
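A reproducible sketch of date subtraction, using two fixed dates so the result does not depend on when the code is run. The dates are illustrative.

```r
library(lubridate)

# Subtracting two Date objects yields a difftime
span <- ymd("2025-01-31") - ymd("2025-01-01")
class(span)

# The same span can be reported in different units
span_days  <- as.numeric(span, units = "days")
span_hours <- as.numeric(span, units = "hours")
```

The unit a difftime prints in can vary with the size of the span, which is one reason to ask for a specific unit explicitly.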
Tip
A difftime class object records a time span of seconds, minutes, hours, days, or weeks.
The lubridate package provides an alternative which always uses seconds: the duration.
[1] "1433548800s (~45.43 years)"
[1] "60s (~1 minutes)"
[1] "7200s (~2 hours)"
[1] "345600s (~4 days)"
[1] "1209600s (~2 weeks)"
[1] "47336400s (~1.5 years)"
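The outputs above look like the results of lubridate's duration constructors. The calls below are an assumption about what produced them, plus a conversion of a fixed, illustrative difftime with as.duration().

```r
library(lubridate)

# Duration constructors: each stores an exact number of seconds
d_min   <- dminutes(1)
d_hour  <- dhours(2)
d_day   <- ddays(4)
d_week  <- dweeks(2)
d_year  <- dyears(1.5)   # a duration year is 365.25 days

# Convert a difftime to a duration
d_span <- as.duration(ymd("2025-01-31") - ymd("2025-01-01"))
```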
However, because durations represent an exact number of seconds, sometimes you might get an unexpected result:
[1] "2026-03-08 01:00:00 EST"
one_am + ddays(1) # The clock time shifts by an hour because of the change from EST (Eastern Standard Time) to EDT (Eastern Daylight Time)
[1] "2026-03-09 02:00:00 EDT"
Periods are time spans that don’t have a fixed length in seconds; instead they work with “human” times, like days and months.
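The contrast between durations and periods can be sketched around the start of daylight saving time. The definition of one_am below is an assumption reconstructed from the printed output above; the time zone name America/New_York is likewise assumed.

```r
library(lubridate)

# 1 AM on the day US daylight saving time begins in 2026
one_am <- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")

# Duration: exactly 86400 seconds later, so the clock time shifts
plus_duration <- one_am + ddays(1)

# Period: "one calendar day" later, so the clock time is preserved
plus_period <- one_am + days(1)

plus_duration
plus_period
```

Use a period when you mean "same time tomorrow" and a duration when you mean "exactly 24 hours from now".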
In R, the time zone is an attribute of the date-time that only controls printing. For example, these three objects represent the same instant in time:
[1] "2024-06-01 12:00:00 EDT"
[1] "2024-06-01 18:00:00 CEST"
[1] "2024-06-02 04:00:00 NZST"
Use difftime to calculate the time difference across different time zones.
You can see a complete list of time zones with OlsonNames().
[1] "2009-08-07 12:00:01 HKT"
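As a hedged illustration of the points about time zones, the sketch below builds three date-times that print differently but mark the same instant, then compares them. The zone names are assumptions chosen to match the EDT, CEST, and NZST abbreviations shown earlier.

```r
library(lubridate)

x1 <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")
x2 <- ymd_hms("2024-06-01 18:00:00", tz = "Europe/Copenhagen")
x3 <- ymd_hms("2024-06-02 04:00:00", tz = "Pacific/Auckland")

# Same instant, so the difference is zero
gap_hours <- as.numeric(difftime(x1, x3, units = "hours"))

# with_tz() changes only how the instant is displayed
x1_utc <- with_tz(x1, "UTC")

# force_tz() keeps the clock time and changes the underlying instant
x1_forced <- force_tz(x1, "UTC")
```

with_tz() is the safe choice for display; force_tz() is for repairing data that was labeled with the wrong zone.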
What are Locales?
A locale is the set of settings related to the language and region in which a computer program executes.
Locales define how dates, times, numbers, and character encodings are interpreted.
Key aspects include:
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
LC_TIME: Controls date-time formatting.
LC_NUMERIC: Determines the decimal and grouping symbols.
readr::locale()
The readr package allows setting locales while reading data.
<locale>
Numbers: 123,456.78
Formats: %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days: Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed), Thursday
(Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May (May),
June (Jun), July (Jul), August (Aug), September (Sep), October
(Oct), November (Nov), December (Dec)
AM/PM: AM/PM
[1] "2015-01-01"
[1] "1979-10-14"
[1] "1994-10-01"
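A sketch of locale-aware parsing with readr. The two French examples below are an assumption about what produced the first two dates printed above; they pass a locale("fr") object so that month names are read in French regardless of the system locale.

```r
library(readr)

fr <- locale("fr")  # French month and day names

# %B matches the full month name, %b the abbreviated one
d1 <- parse_date("1 janvier 2015", "%d %B %Y", locale = fr)
d2 <- parse_date("14 oct. 1979", "%d %b %Y", locale = fr)

d1
d2
```

Passing the locale explicitly keeps the result reproducible across machines with different system settings.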
See vignette("locales") for more details.
lubridate simplifies parsing and manipulating date-time data.
Understanding Date and POSIXct formats enables powerful analysis.
ESRM 64503 - Lecture 06: Date/Time/TimeZone