We have already covered character and numeric types.
class(c("tree", "cloud", "stars_&_sky"))
## [1] "character"
class(c(1, 4, 7))
## [1] "numeric"
Character predominates if there are mixed classes.
class(c(1, 2, "tree"))
## [1] "character"
class(c("1", "4", "7"))
## [1] "character"
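The same rule applies when logical values are mixed in. A minimal sketch of how R settles on a single class for a mixed vector (logical gives way to numeric, and both give way to character):

```r
# Mixing classes inside c() coerces every element to the most flexible class.
class(c(TRUE, 2))        # logical + numeric
## [1] "numeric"
c(TRUE, 2)               # TRUE is stored as 1
## [1] 1 2
class(c(TRUE, "tree"))   # logical + character
## [1] "character"
```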
logical is a type that only has two possible elements: TRUE and FALSE
x <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
class(x)
## [1] "logical"
logical elements are NOT in quotes.
The class of the data tells R how to process the data.
For example, it determines whether you can make summary statistics (numbers) or if you can sort alphabetically (characters).
There is one useful function associated with practically all R classes:
as.CLASS_NAME(x) coerces between classes. It turns x into a certain class.
Examples:
as.numeric()
as.character()
as.logical()
Sometimes coercing works great!
as.character(4)
## [1] "4"
as.numeric(c("1", "4", "7"))
## [1] 1 4 7
as.logical(c("TRUE", "FALSE", "FALSE"))
## [1] TRUE FALSE FALSE
as.logical(0)
## [1] FALSE
When a value cannot be interpreted as the requested class, R will return NA (an R constant representing “Not Available”, i.e. a missing value).
as.numeric(c("1", "4", "7a"))
## Warning: NAs introduced by coercion
## [1] 1 4 NA
as.logical(c("TRUE", "FALSE", "UNKNOWN"))
## [1] TRUE FALSE NA
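After a coercion like the ones above, it can help to check which original values turned into NA. One possible way to do that, sketched with the same input vector (the names raw and converted are just illustrative):

```r
# Find which entries could not be converted to numeric.
raw <- c("1", "4", "7a")
converted <- suppressWarnings(as.numeric(raw))  # silence the expected warning
is.na(converted)
## [1] FALSE FALSE  TRUE
raw[is.na(converted)]   # the original values that became NA
## [1] "7a"
```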
What is one reason we might want to convert data to numeric?
A. So we can take the mean
B. So the data looks better
C. So our data is correct
There are two major number subclasses or types:
Integer is a whole number, with no fractional part.
Double is equivalent to numeric. It is a number that can contain fractional values, with any number of places after the decimal. Double stands for double-precision.
For most purposes, the difference between integers and doubles doesn’t matter.
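If you do want to see the difference, typeof() reports the underlying type. A short sketch:

```r
typeof(1)         # numbers typed without a suffix are doubles
## [1] "double"
typeof(1L)        # the L suffix creates an integer
## [1] "integer"
as.integer(3.9)   # as.integer() drops the fractional part
## [1] 3
is.numeric(1L)    # integers still count as numeric
## [1] TRUE
```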
The num function of the tibble package can be used to change format. See here for more: https://tibble.tidyverse.org/articles/numbers.html
A factor is a special character vector where the elements have pre-defined groups or ‘levels’. You can think of these as qualitative or categorical variables. Order is often important.
Examples:
**We will learn more about factors in a later module.**
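As a small illustration of levels, here is a sketch that builds a factor from a character vector. The values and the level order are made up for this example:

```r
# The levels argument fixes which groups exist and in what order.
sizes <- factor(c("Small", "Large", "Small"), levels = c("Small", "Large"))
sizes
## [1] Small Large Small
## Levels: Small Large
class(sizes)
## [1] "factor"
levels(sizes)
## [1] "Small" "Large"
```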
| Example | Class | Type | Notes |
|---|---|---|---|
| 1.1 | numeric | double | Default for numbers |
| 1 | integer | integer | Need to coerce to integer with as.integer() or use sample() or seq() with whole numbers |
| "FALSE", "Ball" | character | character | Need quotes |
| FALSE, TRUE | logical | logical | No quotes |
| "Small", "Large" | factor | integer | Need to coerce to factor with factor(); stored internally as integer codes |
The two most popular R classes for working with dates and times are:
Date: class representing a calendar date
POSIXct: class representing a calendar date with hours, minutes, seconds
We convert data from character to Date/POSIXct so that we can use functions that manipulate dates and date-times.
lubridate is a powerful, widely used R package from the “tidyverse” family for working with Date/POSIXct class objects.
Date class object
class("2021-06-15")
## [1] "character"
library(lubridate)
x <- ymd("2021-06-15") # lubridate package Year Month Day
class(x)
## [1] "Date"
Note for function ymd: year month day
a <- ymd("2021-06-15")
b <- ymd("2021-06-18")
a - b
## Time difference of -3 days
mdy("06/15/2021")
## [1] "2021-06-15"
dmy("15-June-2021")
## [1] "2021-06-15"
ymd("2021-06-15")
## [1] "2021-06-15"
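The function name has to match the order of the parts in your data. A brief sketch of why this matters: the same made-up string is read as two different dates depending on the parser used.

```r
library(lubridate)

mdy("01/02/2021")   # read as month/day/year: January 2
## [1] "2021-01-02"
dmy("01/02/2021")   # read as day/month/year: February 1
## [1] "2021-02-01"
```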
Here’s a dataset on the SARS-CoV-2 viral load measured in wastewater between 2022 and 2024, collected by the National Wastewater Surveillance System.
Let’s look at the date_start variable, the first date of the sampling window.
library(tidyverse) # provides read_csv(), select() and %>%
sars_ww <- read_csv("../../data/SARS-CoV-2_Wastewater_Data.csv")
# Selecting a few columns for easy viewing
sars_ww <- sars_ww %>% select(town_name, date_start)
Notice that date_start is chr class, not date.
sars_ww
## # A tibble: 2,813 × 2
##    town_name date_start
##    <chr>     <chr>
##  1 Barry     6/21/2020
##  2 Barry     6/22/2020
##  3 Barry     6/23/2020
##  4 Barry     6/24/2020
##  5 Barry     6/25/2020
##  6 Barry     6/26/2020
##  7 Barry     6/27/2020
##  8 Barry     6/28/2020
##  9 Barry     6/29/2020
## 10 Barry     6/30/2020
## # ℹ 2,803 more rows
We would need to use mutate() to help us modify that column.
sars_ww %>% mutate(date_start_fixed = mdy(date_start))
## # A tibble: 2,813 × 3
##    town_name date_start date_start_fixed
##    <chr>     <chr>      <date>
##  1 Barry     6/21/2020  2020-06-21
##  2 Barry     6/22/2020  2020-06-22
##  3 Barry     6/23/2020  2020-06-23
##  4 Barry     6/24/2020  2020-06-24
##  5 Barry     6/25/2020  2020-06-25
##  6 Barry     6/26/2020  2020-06-26
##  7 Barry     6/27/2020  2020-06-27
##  8 Barry     6/28/2020  2020-06-28
##  9 Barry     6/29/2020  2020-06-29
## 10 Barry     6/30/2020  2020-06-30
## # ℹ 2,803 more rows
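Keep in mind that mutate() returns a new tibble, so the converted column is only kept if the result is assigned. The sketch below uses toy_ww, a hypothetical two-row stand-in with the same column names as sars_ww, so it runs without the data file:

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for sars_ww (same column names, made-up size)
toy_ww <- tibble(
  town_name = c("Barry", "Barry"),
  date_start = c("6/21/2020", "6/22/2020")
)

# Assign the result back so the new column is saved
toy_ww <- toy_ww %>% mutate(date_start_fixed = mdy(date_start))

# Confirm the new column has the Date class
toy_ww %>% pull(date_start_fixed) %>% class()
## [1] "Date"
```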
Two-dimensional classes are those we would often use to store data read from a file:
a data frame (data.frame or tibble class)
a matrix (matrix class)
Unlike a data.frame or tibble, the entire matrix is composed of one R class: for example, all entries are numeric, or all entries are character.
Another class is the list. Lists can be created using list().
mylist <- list(c("A", "b", "c"), c(1, 2, 3))
mylist
## [[1]]
## [1] "A" "b" "c"
##
## [[2]]
## [1] 1 2 3
class(mylist)
## [1] "list"
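Unlike a vector, a list does not coerce its elements to a common class. A quick sketch that checks this on the same list:

```r
mylist <- list(c("A", "b", "c"), c(1, 2, 3))

length(mylist)          # number of list elements, not number of values
## [1] 2
sapply(mylist, class)   # each element keeps its own class
## [1] "character" "numeric"
```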
Coerce between classes using as.numeric() or as.character()
Create Date class objects using the ymd(), mdy() functions from the lubridate package
Work with Date or POSIXct class variables (for example, subtract them) or pull out aspects like year
💻 Lab
For more advanced learning: see the extra slides in this file!
as.matrix() creates a matrix from a data frame or tibble (where all values are the same class).
matrix() creates a matrix from scratch.
matrix(1:6, ncol = 2)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
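Two details are worth trying out. matrix() fills values column by column unless byrow = TRUE is set, and as.matrix() on a data frame with mixed classes converts everything to character. A sketch, where df is a made-up data frame:

```r
# Fill row by row instead of the default column by column
matrix(1:6, ncol = 2, byrow = TRUE)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6

# Hypothetical data frame with one numeric and one character column
df <- data.frame(id = c(1, 2), name = c("a", "b"))
m <- as.matrix(df)
typeof(m)   # the numeric column was converted so that all entries match
## [1] "character"
```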
List elements can be named
mylist_named <- list(
  letters = c("A", "b", "c"),
  numbers = c(1, 2, 3),
  one_matrix = matrix(1:4, ncol = 2)
)
mylist_named
## $letters
## [1] "A" "b" "c"
##
## $numbers
## [1] 1 2 3
##
## $one_matrix
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
Using lubridate to manipulate Date objects
x <- ymd(c("2021-06-15", "2021-07-15"))
x
## [1] "2021-06-15" "2021-07-15"
day(x) # see also: month(x) , year(x)
## [1] 15 15
x + days(10)
## [1] "2021-06-25" "2021-07-25"
x + months(1) + days(10)
## [1] "2021-07-25" "2021-08-25"
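Adding months can fail when the resulting day does not exist. A short sketch of that case, along with lubridate's %m+% operator, which rolls back to the last valid day of the month instead (the date is chosen just for illustration):

```r
library(lubridate)

jan31 <- ymd("2021-01-31")
jan31 + months(1)      # February 31 does not exist
## [1] NA
jan31 %m+% months(1)   # rolls back to the last day of February
## [1] "2021-02-28"
```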
wday(x, label = TRUE)
## [1] Tue Thu
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
Using lubridate to manipulate POSIXct objects
x <- ymd_hms("2013-01-24 19:39:07")
x
## [1] "2013-01-24 19:39:07 UTC"
date(x)
## [1] "2013-01-24"
x + hours(3)
## [1] "2013-01-24 22:39:07 UTC"
floor_date(x, "1 hour") # see also: ceiling_date()
## [1] "2013-01-24 19:00:00 UTC"
x1 <- ymd(c("2021-06-15"))
x2 <- ymd(c("2021-07-15"))
difftime(x2, x1, units = "weeks")
## Time difference of 4.285714 weeks
as.numeric(difftime(x2, x1, units = "weeks"))
## [1] 4.285714
Something similar can be done with times (e.g. a difference in hours).
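For instance, a minimal sketch with two made-up date-times that are six hours apart:

```r
library(lubridate)

t1 <- ymd_hms("2013-01-24 19:39:07")
t2 <- ymd_hms("2013-01-25 01:39:07")

difftime(t2, t1, units = "hours")
## Time difference of 6 hours
as.numeric(difftime(t2, t1, units = "hours"))
## [1] 6
```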
n <- 1:9
n
## [1] 1 2 3 4 5 6 7 8 9
mat <- matrix(n, nrow = 3)
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
To get element(s) of a vector (one-dimensional object), use square brackets [ ]:
x <- c("a", "b", "c", "d", "e", "f", "g", "h")
x
## [1] "a" "b" "c" "d" "e" "f" "g" "h"
x[2]
## [1] "b"
x[c(1, 2, 100)]
## [1] "a" "b" NA
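Square brackets also accept ranges, negative positions, and logical conditions. A few more sketches on the same vector:

```r
x <- c("a", "b", "c", "d", "e", "f", "g", "h")

x[2:4]                  # a range of positions
## [1] "b" "c" "d"
x[-1]                   # everything except the first element
## [1] "b" "c" "d" "e" "f" "g" "h"
x[x %in% c("a", "h")]   # elements that meet a logical condition
## [1] "a" "h"
```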
Note you cannot use dplyr functions (like select) on matrices. To subset matrix rows and/or columns, use matrix[row_index, column_index].
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
mat[1, 1] # individual entry: row 1, column 1
## [1] 1
mat[1, 2] # individual entry: row 1, column 2
## [1] 4
mat[1, ] # first row
## [1] 1 4 7
mat[, 1] # first column
## [1] 1 2 3
mat[c(1, 2), c(2, 3)] # subset of original matrix: two rows and two columns
##      [,1] [,2]
## [1,]    4    7
## [2,]    5    8
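When a single row or column is selected, R simplifies the result to a plain vector. If you need the result to stay a matrix, one option is the drop = FALSE argument, sketched here:

```r
mat <- matrix(1:9, nrow = 3)

mat[1, ]                      # simplified to a vector
## [1] 1 4 7
mat[1, , drop = FALSE]        # stays a matrix with 1 row and 3 columns
##      [,1] [,2] [,3]
## [1,]    1    4    7
dim(mat[1, , drop = FALSE])
## [1] 1 3
```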
You can reference data from a list using $ (if elements are named) or using [[ ]]
mylist_named[[1]]
## [1] "A" "b" "c"
mylist_named[["letters"]] # works only for a list with named elements
## [1] "A" "b" "c"
mylist_named$letters # works only for a list with named elements
## [1] "A" "b" "c"
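Single brackets behave differently on lists: they return a smaller list rather than the element itself. A sketch of the difference, using a shortened version of the named list:

```r
mylist_named <- list(letters = c("A", "b", "c"), numbers = c(1, 2, 3))

class(mylist_named["letters"])     # single brackets return a list
## [1] "list"
class(mylist_named[["letters"]])   # double brackets return the element itself
## [1] "character"
mylist_named$numbers[2]            # then index into the element as usual
## [1] 2
```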