We have already covered character and numeric types.
class(c("tree", "cloud", "stars_&_sky"))
## [1] "character"
class(c(1, 4, 7))
## [1] "numeric"
When a vector mixes classes, character takes precedence: every element is converted to character.
class(c(1, 2, "tree"))
## [1] "character"
class(c("1", "4", "7"))
## [1] "character"
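R converts mixed values in a fixed order: logical, then numeric, then character. The base R check below shows this. The vectors `v1`, `v2` and `v3` are illustrative names, not part of the course data.

```r
# Combining values of different classes: R converts everything to the
# most flexible class present (logical < numeric < character)
v1 <- c(TRUE, 2)       # logical + numeric -> numeric
v2 <- c(TRUE, "tree")  # logical + character -> character
v3 <- c(1, TRUE, "a")  # all three -> character

class(v1)  # "numeric"
v1         # TRUE became 1
class(v2)  # "character"
v2         # TRUE became the string "TRUE"
class(v3)  # "character"
```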
logical is a class with only two possible values: TRUE and FALSE.
x <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
class(x)
## [1] "logical"
Logical values are NOT in quotes. "TRUE" in quotes is a character string.
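Logical vectors can also be used in arithmetic, where TRUE counts as 1 and FALSE as 0. This minimal base R sketch counts and averages TRUE values. The `ages` vector is made-up example data.

```r
x <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# In arithmetic, TRUE counts as 1 and FALSE as 0
sum(x)   # number of TRUE values: 3
mean(x)  # proportion of TRUE values: 0.6

# Comparisons also produce logical vectors
ages <- c(12, 45, 67)  # hypothetical example values
ages > 18              # FALSE TRUE TRUE
sum(ages > 18)         # 2
```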
The class of the data tells R how to process the data.
For example, it determines whether you can make summary statistics (numbers) or if you can sort alphabetically (characters).
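The same digits behave differently depending on their class. In this small base R example, `nums` and `chars` are illustrative names. Numbers sort by value and can be averaged. Text sorts character by character, and `mean()` returns NA with a warning.

```r
nums  <- c(10, 9, 2)
chars <- c("10", "9", "2")  # same digits, stored as text

sort(nums)   # numeric order: 2 9 10
sort(chars)  # alphabetical order: "10" "2" "9"

mean(nums)   # 7
mean(chars)  # NA, with a warning: argument is not numeric or logical
```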
There is one useful function associated with practically all R classes:
as.CLASS_NAME(x) coerces between classes: it turns x into the named class.
Examples:

- as.numeric()
- as.character()
- as.logical()

Sometimes coercing works great!
as.character(4)
## [1] "4"
as.numeric(c("1", "4", "7"))
## [1] 1 4 7
as.logical(c("TRUE", "FALSE", "FALSE"))
## [1] TRUE FALSE FALSE
as.logical(0)
## [1] FALSE
When interpretation is ambiguous, R will return NA (an R constant representing "Not Available", i.e., a missing value).
as.numeric(c("1", "4", "7a"))
## Warning: NAs introduced by coercion
## [1] 1 4 NA
as.logical(c("TRUE", "FALSE", "UNKNOWN"))
## [1] TRUE FALSE NA
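After a coercion that produces NA, it helps to find out which original values failed. This minimal base R sketch uses `is.na()` for that. `txt` and `converted` are illustrative names.

```r
txt <- c("1", "4", "7a")
# suppressWarnings() hides the "NAs introduced by coercion" warning;
# use it only once you have checked which values fail
converted <- suppressWarnings(as.numeric(txt))

is.na(converted)       # FALSE FALSE TRUE
txt[is.na(converted)]  # the original text that could not be converted: "7a"
```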
What is one reason we might want to convert data to numeric?
A. So we can take the mean
B. So the data looks better
C. So our data is correct
There are two major number subclasses, or types:

- Double is equivalent to numeric. It is a number that can contain fractional values, with any number of places after the decimal point. Double stands for double-precision.
- Integer is a whole number with no fractional part.
For most purposes, the difference between integers and doubles doesn’t matter.
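To see the difference between the two types, `typeof()` reports the storage type while `class()` reports the class. The following base R lines show common ways to end up with each type.

```r
class(5)                 # "numeric"
typeof(5)                # "double": plain numbers are doubles by default
typeof(5L)               # "integer": the L suffix makes an integer
typeof(1:3)              # "integer": the colon operator returns integers
typeof(as.integer(5.9))  # "integer"
as.integer(5.9)          # 5: the decimal part is dropped, not rounded

5 == 5L                  # TRUE: the values compare equal
identical(5, 5L)         # FALSE: identical() also checks the type
```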
The num() function of the tibble package can be used to change how numbers are formatted when a tibble is printed. See here for more: https://tibble.tidyverse.org/articles/numbers.html
A factor looks like a character vector, but its elements belong to pre-defined groups or 'levels'. You can think of these as qualitative or categorical variables. Order is often important.
Examples:
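Here is a small base R illustration using made-up shirt sizes (`sizes`, `f1` and `f2` are hypothetical names). Passing `levels` fixes the order of the groups. Without it, R sorts the levels alphabetically.

```r
sizes <- c("Small", "Large", "Medium", "Small")  # hypothetical example data

f1 <- factor(sizes)  # levels sorted alphabetically
levels(f1)           # "Large" "Medium" "Small"

f2 <- factor(sizes, levels = c("Small", "Medium", "Large"))  # custom order
levels(f2)           # "Small" "Medium" "Large"
table(f2)            # counts per level, in level order

class(f2)            # "factor"
typeof(f2)           # "integer": stored as integer codes pointing to levels
as.integer(f2)       # 1 3 2 1
```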
**We will learn more about factors in a later module.**
| Example | Class (class()) | Type (typeof()) | Notes |
|---|---|---|---|
| 1.1 | numeric | double | Default for numbers |
| 1L | integer | integer | Write 1L, coerce with as.integer(), or use sample() or seq() with whole numbers |
| "FALSE", "Ball" | character | character | Need quotes |
| FALSE, TRUE | logical | logical | No quotes |
| "Small", "Large" | factor | integer | Need to coerce to factor with factor(); stored as integer codes with levels |
The two most popular R classes for working with dates and times are:

- Date: represents a calendar date
- POSIXct: represents a calendar date with hours, minutes, and seconds

We convert data from character to Date or POSIXct so that we can use functions that manipulate dates and date-times.
lubridate is a powerful, widely used R package from “tidyverse” family to work with Date / POSIXct class objects
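For comparison, base R can create both classes without any extra package. This sketch uses base `as.Date()` and `as.POSIXct()` rather than lubridate. `d` and `dt` are illustrative names. Both classes are stored as numbers under the hood.

```r
d  <- as.Date("2021-06-15")                           # calendar date
dt <- as.POSIXct("2021-06-15 19:39:07", tz = "UTC")   # date + time

class(d)   # "Date"
class(dt)  # "POSIXct" "POSIXt"

# Under the hood, both are numbers:
as.numeric(d)   # days since 1970-01-01
as.numeric(dt)  # seconds since 1970-01-01 00:00:00 UTC
```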
Date class object

class("2021-06-15")
## [1] "character"
library(lubridate)
x <- ymd("2021-06-15") # lubridate package Year Month Day
class(x)
## [1] "Date"
Note that the function name ymd stands for year, month, day: the order in which those parts appear in the input string.
a <- ymd("2021-06-15")
b <- ymd("2021-06-18")
a - b
## Time difference of -3 days
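A difference between two dates can be turned into a plain number with `as.numeric()`, and adding a number to a Date adds days. The following is a base R version using `as.Date()` instead of `ymd()`, with the same dates as above.

```r
a <- as.Date("2021-06-15")
b <- as.Date("2021-06-18")

b - a              # Time difference of 3 days
as.numeric(b - a)  # 3, a plain number for further calculations
a + 30             # adding a number adds days: "2021-07-15"
```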
mdy("06/15/2021")
## [1] "2021-06-15"
dmy("15-June-2021")
## [1] "2021-06-15"
ymd("2021-06-15")
## [1] "2021-06-15"
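In base R, the same parsing is done by giving `as.Date()` an explicit `format` string. Here `%m` means month, `%d` day and `%Y` four-digit year. This is a base R alternative to lubridate's `mdy()` and `dmy()`, not how those functions are implemented.

```r
as.Date("06/15/2021", format = "%m/%d/%Y")  # month/day/year, like mdy()
as.Date("15.06.2021", format = "%d.%m.%Y")  # day.month.year, like dmy()
as.Date("2021-06-15")                       # ISO year-month-day is the default

# A string that does not match the format gives NA
as.Date("2021-06-15", format = "%m/%d/%Y")
```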
Here's a dataset on the SARS-CoV-2 viral load measured in wastewater between 2022 and 2024, collected by the National Wastewater Surveillance System.
Let’s look at the date_start variable, the first date of the sampling window.
library(tidyverse) # loads readr (read_csv) and dplyr (select, mutate, %>%)
sars_ww <-
  read_csv("https://daseh.org/data/SARS-CoV-2_Wastewater_Data.csv")
# Selecting a few columns for easy viewing
sars_ww <- sars_ww %>% select(town_name, date_start)
Notice that date_start is of character (chr) class, not Date.
sars_ww
## # A tibble: 2,813 × 2
##    town_name date_start
##    <chr>     <chr>
##  1 Barry     6/21/2020
##  2 Barry     6/22/2020
##  3 Barry     6/23/2020
##  4 Barry     6/24/2020
##  5 Barry     6/25/2020
##  6 Barry     6/26/2020
##  7 Barry     6/27/2020
##  8 Barry     6/28/2020
##  9 Barry     6/29/2020
## 10 Barry     6/30/2020
## # ℹ 2,803 more rows
We can use mutate() together with mdy() to convert that column.
sars_ww %>% mutate(date_start_fixed = mdy(date_start))
## # A tibble: 2,813 × 3
##    town_name date_start date_start_fixed
##    <chr>     <chr>      <date>
##  1 Barry     6/21/2020  2020-06-21
##  2 Barry     6/22/2020  2020-06-22
##  3 Barry     6/23/2020  2020-06-23
##  4 Barry     6/24/2020  2020-06-24
##  5 Barry     6/25/2020  2020-06-25
##  6 Barry     6/26/2020  2020-06-26
##  7 Barry     6/27/2020  2020-06-27
##  8 Barry     6/28/2020  2020-06-28
##  9 Barry     6/29/2020  2020-06-29
## 10 Barry     6/30/2020  2020-06-30
## # ℹ 2,803 more rows
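The same conversion can be sketched in base R on a tiny data frame. The three rows below (`toy`) are hypothetical values shaped like sars_ww, not taken from the real file. Once the column is a Date, `range()` compares actual dates. As text, "12/1/2020" would sort before "6/21/2020".

```r
# Hypothetical toy data in the same shape as sars_ww (values made up here)
toy <- data.frame(
  town_name  = c("Barry", "Barry", "Barry"),
  date_start = c("6/21/2020", "6/22/2020", "12/1/2020")
)

# %m/%d/%Y also accepts months and days without a leading zero
toy$date_start_fixed <- as.Date(toy$date_start, format = "%m/%d/%Y")

class(toy$date_start_fixed)  # "Date"
range(toy$date_start_fixed)  # earliest and latest date, compared as dates
```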
Two-dimensional classes are those we would often use to store data read from a file:

- a data frame (data.frame or tibble class)
- a matrix (matrix class)
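Unlike in a data.frame or tibble, the entire matrix is composed of one R class: for example, all entries are numeric, or all entries are character.

Lists are the most generic data type. They can hold vectors, matrices, and even other lists, and are created with list().

mylist <- list(c("A", "b", "c"), c(1, 2, 3))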
mylist
## [[1]]
## [1] "A" "b" "c"
##
## [[2]]
## [1] 1 2 3
class(mylist)
## [1] "list"
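In summary:

- Coerce between classes using as.numeric() or as.character()
- Convert character data to Date class using the ymd() and mdy() functions from the lubridate package
- Use lubridate to manipulate Date or POSIXct class variables or to pull out aspects like year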
For more advanced learning: see the extra slides in this file!
Image by Gerd Altmann from Pixabay
as.matrix() creates a matrix from a data frame or tibble (where all values are the same class).
matrix() creates a matrix from scratch.
matrix(1:6, ncol = 2)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
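Because a matrix holds a single class, `as.matrix()` on a data frame with mixed columns converts everything to the most flexible class. In this base R example, `df` and `m` are illustrative names. The numbers become text.

```r
df <- data.frame(id = c(1, 2), name = c("a", "b"))  # mixed columns

m <- as.matrix(df)
typeof(m)   # "character": the numbers were converted to text
m[, "id"]

# With only numeric columns, the matrix stays numeric
typeof(as.matrix(data.frame(x = c(1, 2), y = c(3, 4))))  # "double"
```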
List elements can be named.
mylist_named <- list(
letters = c("A", "b", "c"),
numbers = c(1, 2, 3),
one_matrix = matrix(1:4, ncol = 2)
)
mylist_named
## $letters
## [1] "A" "b" "c"
##
## $numbers
## [1] 1 2 3
##
## $one_matrix
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
Using lubridate to manipulate Date objects:

x <- ymd(c("2021-06-15", "2021-07-15"))
x
## [1] "2021-06-15" "2021-07-15"
day(x) # see also: month(x) , year(x)
## [1] 15 15
x + days(10)
## [1] "2021-06-25" "2021-07-25"
x + months(1) + days(10)
## [1] "2021-07-25" "2021-08-25"
wday(x, label = TRUE)
## [1] Tue Thu
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
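Several of these operations have base R counterparts. This sketch uses `format()` to extract parts of a date and `seq()` to step by months. It is shown as an alternative to the lubridate functions above, not as their implementation.

```r
x <- as.Date(c("2021-06-15", "2021-07-15"))

as.numeric(format(x, "%d"))  # day of month: 15 15
as.numeric(format(x, "%Y"))  # year: 2021 2021
x + 10                       # "2021-06-25" "2021-07-25"

# A sequence of monthly dates starting from the first date
seq(x[1], by = "month", length.out = 3)  # "2021-06-15" "2021-07-15" "2021-08-15"
```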
Using lubridate to manipulate POSIXct objects:

x <- ymd_hms("2013-01-24 19:39:07")
x
## [1] "2013-01-24 19:39:07 UTC"
date(x)
## [1] "2013-01-24"
x + hours(3)
## [1] "2013-01-24 22:39:07 UTC"
floor_date(x, "1 hour") # see also: ceiling_date()
## [1] "2013-01-24 19:00:00 UTC"
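For comparison, here is a base R sketch of the same POSIXct operations. POSIXct arithmetic is in seconds, so three hours is `3 * 60 * 60`. `trunc()` rounds down, similar in spirit to `floor_date()`.

```r
x <- as.POSIXct("2013-01-24 19:39:07", tz = "UTC")

as.Date(x)                     # "2013-01-24"
x + 3 * 60 * 60                # POSIXct arithmetic is in seconds: 3 hours later
as.numeric(format(x, "%H"))    # hour of the day: 19
as.POSIXct(trunc(x, "hours"))  # round down to the hour: "2013-01-24 19:00:00 UTC"
```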
x1 <- ymd(c("2021-06-15"))
x2 <- ymd(c("2021-07-15"))
difftime(x2, x1, units = "weeks")
## Time difference of 4.285714 weeks
as.numeric(difftime(x2, x1, units = "weeks"))
## [1] 4.285714
The same can be done with date-times (e.g., the difference in hours).
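This example uses `difftime()` on two POSIXct values with `units = "hours"`. The timestamps `t1` and `t2` are made-up values in UTC, created with base `as.POSIXct()`. `ymd_hms()` would produce the same kind of object.

```r
t1 <- as.POSIXct("2021-06-15 08:00:00", tz = "UTC")
t2 <- as.POSIXct("2021-06-15 17:30:00", tz = "UTC")

difftime(t2, t1, units = "hours")             # Time difference of 9.5 hours
as.numeric(difftime(t2, t1, units = "mins"))  # 570
```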
n <- 1:9
n
## [1] 1 2 3 4 5 6 7 8 9
mat <- matrix(n, nrow = 3)
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
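By default, `matrix()` fills values column by column. Setting `byrow = TRUE` fills row by row instead. `mat_byrow` is an illustrative name.

```r
n <- 1:9
mat_byrow <- matrix(n, nrow = 3, byrow = TRUE)  # fill row by row instead
mat_byrow
mat_byrow[1, ]  # 1 2 3 (the default column filling would give 1 4 7)
dim(mat_byrow)  # 3 3: rows, columns
```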
To get element(s) of a vector (one-dimensional object), index it with square brackets [ ]:

x <- c("a", "b", "c", "d", "e", "f", "g", "h")
x
## [1] "a" "b" "c" "d" "e" "f" "g" "h"
x[2]
## [1] "b"
x[c(1, 2, 100)]
## [1] "a" "b" NA
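Square brackets also accept negative positions, which drop elements, and logical vectors, which keep elements where the condition is TRUE. The base R lines below use the same `x` as above.

```r
x <- c("a", "b", "c", "d", "e", "f", "g", "h")

x[-1]                       # everything except the first element
x[-c(1, 2)]                 # drop the first two elements
x[x %in% c("b", "e", "z")]  # logical indexing: keep elements matching a condition
x[length(x)]                # last element: "h"
```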
Note you cannot use dplyr functions (like select) on matrices. To subset matrix rows and/or columns, use matrix[row_index, column_index].
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
mat[1, 1] # individual entry: row 1, column 1
## [1] 1
mat[1, 2] # individual entry: row 1, column 2
## [1] 4
mat[1, ] # first row
## [1] 1 4 7
mat[, 1] # first column
## [1] 1 2 3
mat[c(1, 2), c(2, 3)] # subset of original matrix: two rows and two columns
##      [,1] [,2]
## [1,]    4    7
## [2,]    5    8
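Two details of matrix subsetting are worth knowing. Selecting a single row returns a plain vector unless you add `drop = FALSE`, and a logical vector can select rows by a condition. This base R sketch rebuilds the same `mat`.

```r
mat <- matrix(1:9, nrow = 3)

mat[1, ]                # a plain vector: dimensions are dropped
mat[1, , drop = FALSE]  # stays a 1 x 3 matrix
mat[mat[, 1] > 1, ]     # rows where column 1 is greater than 1 (rows 2 and 3)
```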
You can reference data from a list using $ (if elements are named) or using [[ ]].
mylist_named[[1]]
## [1] "A" "b" "c"
mylist_named[["letters"]] # works only for a list with elements' names
## [1] "A" "b" "c"
mylist_named$letters # works only for a list with elements' names
## [1] "A" "b" "c"
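Single brackets on a list behave differently from double brackets. `[ ]` returns a smaller list, while `[[ ]]` returns the element itself. This base R sketch uses a two-element version of `mylist_named`.

```r
mylist_named <- list(
  letters = c("A", "b", "c"),
  numbers = c(1, 2, 3)
)

class(mylist_named[1])        # "list": single brackets return a smaller list
class(mylist_named[[1]])      # "character": double brackets return the element itself
mylist_named[["numbers"]][2]  # chain indexing: 2nd value of the numbers element: 2
names(mylist_named)           # "letters" "numbers"
```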