We have already covered character
and numeric
types.
class(c("tree", "cloud", "stars_&_sky"))
## [1] "character"
class(c(1, 4, 7))
## [1] "numeric"
Character predominates if there are mixed classes.
class(c(1, 2, "tree"))
## [1] "character"
class(c("1", "4", "7"))
## [1] "character"
logical
is a type that only has two possible elements: TRUE
and FALSE
x <- c(TRUE, FALSE, TRUE, TRUE, FALSE) class(x)
## [1] "logical"
logical
elements are NOT in quotes.
There is one useful functions associated with practically all R classes:
as.CLASS_NAME(x)
coerces between classes. It turns x
into a certain class.
Examples:
as.numeric()
as.character()
as.logical()
Sometimes coercing works great!
as.character(4)
## [1] "4"
as.numeric(c("1", "4", "7"))
## [1] 1 4 7
as.logical(c("TRUE", "FALSE", "FALSE"))
## [1] TRUE FALSE FALSE
as.logical(0)
## [1] FALSE
When interpretation is ambiguous, R will return NA
(an R constant representing “Not Available” i.e. missing value)
as.numeric(c("1", "4", "7a"))
## Warning: NAs introduced by coercion
## [1] 1 4 NA
as.logical(c("TRUE", "FALSE", "UNKNOWN"))
## [1] TRUE FALSE NA
There are two major number subclasses or types
Double
is equivalent to numeric
. It is a number that contains
fractional values . Can be any amount of places after the decimal.
Double
stands for double-precision
For most purposes, the difference between integers and doubles doesn’t matter.
The num
function of the tibble
package can be used to change format. See here for more: https://tibble.tidyverse.org/articles/numbers.html
A factor
is a special character
vector where the elements have pre-defined groups or ‘levels’. You can think of these as qualitative or categorical variables. Order is often important.
Examples:
** We will learn more about factors in a later module. **
Example | Class | Type | Notes |
---|---|---|---|
1.1 | Numeric | double | default for numbers |
1 | integer | integer | Need to coerce to integer with as.integer() or use sample() or seq() with whole numbers |
“FALSE”, “Ball” | Character | Character | Need quotes |
FALSE, TRUE | logical | logical | No quotes |
“Small”, “Large” | Factor | Factor | Need to coerce to factor with factor() |
There are two most popular R classes used when working with dates and times:
Date
class representing a calendar datePOSIXct
class representing a calendar date with hours, minutes, secondsWe convert data from character to Date
/POSIXct
to use functions to manipulate date/date and time
lubridate
is a powerful, widely used R package from “tidyverse” family to work with Date
/ POSIXct
class objects
Date
class objectclass("2021-06-15")
## [1] "character"
library(lubridate) x <- ymd("2021-06-15") # lubridate package Year Month Day class(x)
## [1] "Date"
Note for function ymd
: year month day
a <- ymd("2021-06-15") b <- ymd("2021-06-18") a - b
## Time difference of -3 days
mdy("06/15/2021")
## [1] "2021-06-15"
dmy("15-June-2021")
## [1] "2021-06-15"
ymd("2021-06-15")
## [1] "2021-06-15"
library(dasehr) covidww <- covid_wastewater %>% select(contains("date"))
head(covidww)
## # A tibble: 6 × 3 ## date_start date_end first_sample_date ## <chr> <chr> <chr> ## 1 6/21/2020 7/5/2020 7/5/2020 ## 2 6/22/2020 7/6/2020 7/5/2020 ## 3 6/23/2020 7/7/2020 7/5/2020 ## 4 6/24/2020 7/8/2020 7/5/2020 ## 5 6/25/2020 7/9/2020 7/5/2020 ## 6 6/26/2020 7/10/2020 7/5/2020
We would need to use mutate to help us modify that column.
covidww %>% mutate(date_start = mdy(date_start))
## # A tibble: 776,059 × 3 ## date_start date_end first_sample_date ## <date> <chr> <chr> ## 1 2020-06-21 7/5/2020 7/5/2020 ## 2 2020-06-22 7/6/2020 7/5/2020 ## 3 2020-06-23 7/7/2020 7/5/2020 ## 4 2020-06-24 7/8/2020 7/5/2020 ## 5 2020-06-25 7/9/2020 7/5/2020 ## 6 2020-06-26 7/10/2020 7/5/2020 ## 7 2020-06-27 7/11/2020 7/5/2020 ## 8 2020-06-28 7/12/2020 7/5/2020 ## 9 2020-06-29 7/13/2020 7/5/2020 ## 10 2020-06-30 7/14/2020 7/5/2020 ## # ℹ 776,049 more rows
Two-dimensional classes are those we would often use to store data read from a file
a data frame (data.frame
or tibble
class)
a matrix (matrix
class)
data.frame
or tibble
, the entire matrix is composed of one R classnumeric
, or all entries are character
lists
.list()
mylist <- list(c("A", "b", "c"), c(1, 2, 3)) mylist
## [[1]] ## [1] "A" "b" "c" ## ## [[2]] ## [1] 1 2 3
class(mylist)
## [1] "list"
Date
class using ymd()
, mdy()
functions from lubridate
packagePOSIXct
class representing a calendar date with hours, minutes, seconds. Can use ymd_hms()
or ymd_hm()
or ymd_h()
functions from the lubridate
packageDate
or POSIXct
class variables or pull out aspects like year💻 Lab
Image by Gerd Altmann from Pixabay
as.matrix()
creates a matrix from a data frame or tibble (where all values are the same class).
covidww_mat <- select(covidww, contains("Missouri")) %>% head(n = 3) covidww_mat
## # A tibble: 3 × 0
as.matrix(covidww_mat)
## ## [1,] ## [2,] ## [3,]
matrix()
creates a matrix from scratch.
matrix(1:6, ncol = 2)
## [,1] [,2] ## [1,] 1 4 ## [2,] 2 5 ## [3,] 3 6
List elements can be named
mylist_named <- list( letters = c("A", "b", "c"), numbers = c(1, 2, 3), one_matrix = matrix(1:4, ncol = 2) ) mylist_named
## $letters ## [1] "A" "b" "c" ## ## $numbers ## [1] 1 2 3 ## ## $one_matrix ## [,1] [,2] ## [1,] 1 3 ## [2,] 2 4
lubridate
to manipulate Date
objectsx <- ymd(c("2021-06-15", "2021-07-15")) x
## [1] "2021-06-15" "2021-07-15"
day(x) # see also: month(x) , year(x)
## [1] 15 15
x + days(10)
## [1] "2021-06-25" "2021-07-25"
x + months(1) + days(10)
## [1] "2021-07-25" "2021-08-25"
wday(x, label = TRUE)
## [1] Tue Thu ## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
lubridate
to manipulate POSIXct
objectsx <- ymd_hms("2013-01-24 19:39:07") x
## [1] "2013-01-24 19:39:07 UTC"
date(x)
## [1] "2013-01-24"
x + hours(3)
## [1] "2013-01-24 22:39:07 UTC"
floor_date(x, "1 hour") # see also: ceiling_date()
## [1] "2013-01-24 19:00:00 UTC"
x1 <- ymd(c("2021-06-15")) x2 <- ymd(c("2021-07-15")) difftime(x2, x1, units = "weeks")
## Time difference of 4.285714 weeks
as.numeric(difftime(x2, x1, units = "weeks"))
## [1] 4.285714
Similar can be done with time (e.g. difference in hours).
n <- 1:9 n
## [1] 1 2 3 4 5 6 7 8 9
mat <- matrix(n, nrow = 3) mat
## [,1] [,2] [,3] ## [1,] 1 4 7 ## [2,] 2 5 8 ## [3,] 3 6 9
To get element(s) of a vector (one-dimensional object):
[ ]
x <- c("a", "b", "c", "d", "e", "f", "g", "h") x
## [1] "a" "b" "c" "d" "e" "f" "g" "h"
x[2]
## [1] "b"
x[c(1, 2, 100)]
## [1] "a" "b" NA
Note you cannot use dplyr
functions (like select
) on matrices. To subset matrix rows and/or columns, use matrix[row_index, column_index]
.
mat
## [,1] [,2] [,3] ## [1,] 1 4 7 ## [2,] 2 5 8 ## [3,] 3 6 9
mat[1, 1] # individual entry: row 1, column 1
## [1] 1
mat[1, 2] # individual entry: row 1, column 2
## [1] 4
mat[1, ] # first row
## [1] 1 4 7
mat[, 1] # first column
## [1] 1 2 3
mat[c(1, 2), c(2, 3)] # subset of original matrix: two rows and two columns
## [,1] [,2] ## [1,] 4 7 ## [2,] 5 8
You can reference data from list using $
(if elements are named) or using [[ ]]
mylist_named[[1]]
## [1] "A" "b" "c"
mylist_named[["letters"]] # works only for a list with elements' names
## [1] "A" "b" "c"
mylist_named$letters # works only for a list with elements' names
## [1] "A" "b" "c"