One dimensional vectors

Character and numeric

We have already covered character and numeric types.

class(c("tree", "cloud", "stars_&_sky"))
## [1] "character"
class(c(1, 4, 7))
## [1] "numeric"

Character and numeric

Character predominates if there are mixed classes.

class(c(1, 2, "tree"))
## [1] "character"
class(c("1", "4", "7"))
## [1] "character"

Logical

logical is a type that only has two possible elements: TRUE and FALSE

x <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
class(x)
## [1] "logical"

logical elements are NOT in quotes.

General Class Information

There is one useful functions associated with practically all R classes:

as.CLASS_NAME(x) coerces between classes. It turns x into a certain class.

Examples:

  • as.numeric()
  • as.character()
  • as.logical()

Coercing: seamless transition

Sometimes coercing works great!

as.character(4)
## [1] "4"
as.numeric(c("1", "4", "7"))
## [1] 1 4 7
as.logical(c("TRUE", "FALSE", "FALSE"))
## [1]  TRUE FALSE FALSE
as.logical(0)
## [1] FALSE

Coercing: not-so-seamless

When interpretation is ambiguous, R will return NA (an R constant representing “Not Available” i.e. missing value)

as.numeric(c("1", "4", "7a"))
## Warning: NAs introduced by coercion
## [1]  1  4 NA
as.logical(c("TRUE", "FALSE", "UNKNOWN"))
## [1]  TRUE FALSE    NA

Number Subclasses

There are two major number subclasses or types

  1. Double (1.003)
  2. Integer (1)

Number Subclasses

Double is equivalent to numeric. It is a number that contains fractional values . Can be any amount of places after the decimal.

Double stands for double-precision

For most purposes, the difference between integers and doubles doesn’t matter.

Significant figures and other formats

Factors

A factor is a special character vector where the elements have pre-defined groups or ‘levels’. You can think of these as qualitative or categorical variables. Order is often important.

Examples:

  • red, orange, yellow, green, blue, purple
  • breakfast, lunch, dinner
  • baby, toddler, child, teen, adult
  • Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree
  • beginner, novice, intermediate, expert

** We will learn more about factors in a later module. **

Classes Overview

Example Class Type Notes
1.1 Numeric double default for numbers
1 integer integer Need to coerce to integer with as.integer() or use sample() or seq() with whole numbers
“FALSE”, “Ball” Character Character Need quotes
FALSE, TRUE logical logical No quotes
“Small”, “Large” Factor Factor Need to coerce to factor with factor()

Special data classes

Dates

There are two most popular R classes used when working with dates and times:

  • Date class representing a calendar date
  • POSIXct class representing a calendar date with hours, minutes, seconds

We convert data from character to Date/POSIXct to use functions to manipulate date/date and time

lubridate is a powerful, widely used R package from “tidyverse” family to work with Date / POSIXct class objects

Creating Date class object

class("2021-06-15")
## [1] "character"
library(lubridate)

x <- ymd("2021-06-15") # lubridate package Year Month Day
class(x)
## [1] "Date"

Note for function ymd: year month day

Dates are useful!

a <- ymd("2021-06-15")
b <- ymd("2021-06-18")
a - b
## Time difference of -3 days

The function matches the format

mdy("06/15/2021")
## [1] "2021-06-15"
dmy("15-June-2021")
## [1] "2021-06-15"
ymd("2021-06-15")
## [1] "2021-06-15"

Class conversion in with a dataset

library(dasehr)
covidww <- covid_wastewater %>% select(contains("date"))
head(covidww)
## # A tibble: 6 × 3
##   date_start date_end  first_sample_date
##   <chr>      <chr>     <chr>            
## 1 6/21/2020  7/5/2020  7/5/2020         
## 2 6/22/2020  7/6/2020  7/5/2020         
## 3 6/23/2020  7/7/2020  7/5/2020         
## 4 6/24/2020  7/8/2020  7/5/2020         
## 5 6/25/2020  7/9/2020  7/5/2020         
## 6 6/26/2020  7/10/2020 7/5/2020

Class conversion in with a dataset

We would need to use mutate to help us modify that column.

covidww %>% 
  mutate(date_start = mdy(date_start))
## # A tibble: 776,059 × 3
##    date_start date_end  first_sample_date
##    <date>     <chr>     <chr>            
##  1 2020-06-21 7/5/2020  7/5/2020         
##  2 2020-06-22 7/6/2020  7/5/2020         
##  3 2020-06-23 7/7/2020  7/5/2020         
##  4 2020-06-24 7/8/2020  7/5/2020         
##  5 2020-06-25 7/9/2020  7/5/2020         
##  6 2020-06-26 7/10/2020 7/5/2020         
##  7 2020-06-27 7/11/2020 7/5/2020         
##  8 2020-06-28 7/12/2020 7/5/2020         
##  9 2020-06-29 7/13/2020 7/5/2020         
## 10 2020-06-30 7/14/2020 7/5/2020         
## # ℹ 776,049 more rows

Other two-dimensional data classes

Two-dimensional data classes

Two-dimensional classes are those we would often use to store data read from a file

  • a data frame (data.frame or tibble class)

  • a matrix (matrix class)

    • also composed of rows and columns
    • unlike data.frame or tibble, the entire matrix is composed of one R class
    • for example: all entries are numeric, or all entries are character

Lists

  • One other data type that is the most generic are lists.
  • Can hold vectors, strings, matrices, models, list of other list!
  • Lists are used when you need to do something repeatedly across lots of data - for example wrangling several similar files at once
  • Lists are a bit more advanced but you may encounter them when you work with others or look up solutions

Making Lists

  • Can be created using list()
mylist <- list(c("A", "b", "c"), c(1, 2, 3))
mylist
## [[1]]
## [1] "A" "b" "c"
## 
## [[2]]
## [1] 1 2 3
class(mylist)
## [1] "list"

Summary

  • two dimensional object classes include: data frames, tibbles, matrices, and lists
  • matrix has columns and rows but is all one data class
  • lists can contain multiples of any other class of data including lists!
  • calendar dates can be represented with the Date class using ymd(), mdy() functions from lubridate package
  • Make sure you choose the right function for the way the date is formatted!
  • POSIXct class representing a calendar date with hours, minutes, seconds. Can use ymd_hms() or ymd_hm() or ymd_h()functions from the lubridate package
  • can then easily subtract Date or POSIXct class variables or pull out aspects like year

Lab Part 1

Extra Slides

Matrices

as.matrix() creates a matrix from a data frame or tibble (where all values are the same class).

covidww_mat <- select(covidww, contains("Missouri")) %>%
  head(n = 3)
covidww_mat
## # A tibble: 3 × 0
as.matrix(covidww_mat)
##     
## [1,]
## [2,]
## [3,]

Matrices

matrix() creates a matrix from scratch.

matrix(1:6, ncol = 2)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

More about Lists

List elements can be named

mylist_named <- list(
  letters = c("A", "b", "c"),
  numbers = c(1, 2, 3),
  one_matrix = matrix(1:4, ncol = 2)
)
mylist_named
## $letters
## [1] "A" "b" "c"
## 
## $numbers
## [1] 1 2 3
## 
## $one_matrix
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

Some useful functions from lubridate to manipulate Date objects

x <- ymd(c("2021-06-15", "2021-07-15"))
x
## [1] "2021-06-15" "2021-07-15"
day(x) # see also: month(x) , year(x)
## [1] 15 15
x + days(10)
## [1] "2021-06-25" "2021-07-25"
x + months(1) + days(10)
## [1] "2021-07-25" "2021-08-25"
wday(x, label = TRUE)
## [1] Tue Thu
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

Some useful functions from lubridate to manipulate POSIXct objects

x <- ymd_hms("2013-01-24 19:39:07")
x
## [1] "2013-01-24 19:39:07 UTC"
date(x)
## [1] "2013-01-24"
x + hours(3)
## [1] "2013-01-24 22:39:07 UTC"
floor_date(x, "1 hour") # see also: ceiling_date()
## [1] "2013-01-24 19:00:00 UTC"

Differences in dates

x1 <- ymd(c("2021-06-15"))
x2 <- ymd(c("2021-07-15"))

difftime(x2, x1, units = "weeks")
## Time difference of 4.285714 weeks
as.numeric(difftime(x2, x1, units = "weeks"))
## [1] 4.285714

Similar can be done with time (e.g. difference in hours).

Data Selection

Matrices

n <- 1:9
n
## [1] 1 2 3 4 5 6 7 8 9
mat <- matrix(n, nrow = 3)
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Vectors: data selection

To get element(s) of a vector (one-dimensional object):

  • Type the name of the variable and open the rectangular brackets [ ]
  • In the rectangular brackets, type index (/vector of indexes) of element (/elements) you want to pull. In R, indexes start from 1 (not: 0)
x <- c("a", "b", "c", "d", "e", "f", "g", "h")
x
## [1] "a" "b" "c" "d" "e" "f" "g" "h"
x[2]
## [1] "b"
x[c(1, 2, 100)]
## [1] "a" "b" NA

Matrices: data selection

Note you cannot use dplyr functions (like select) on matrices. To subset matrix rows and/or columns, use matrix[row_index, column_index].

mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
mat[1, 1] # individual entry: row 1, column 1
## [1] 1
mat[1, 2] # individual entry: row 1, column 2
## [1] 4
mat[1, ] # first row
## [1] 1 4 7
mat[, 1] # first column
## [1] 1 2 3
mat[c(1, 2), c(2, 3)] # subset of original matrix: two rows and two columns
##      [,1] [,2]
## [1,]    4    7
## [2,]    5    8

Lists: data selection

You can reference data from list using $ (if elements are named) or using [[ ]]

mylist_named[[1]]
## [1] "A" "b" "c"
mylist_named[["letters"]] # works only for a list with elements' names
## [1] "A" "b" "c"
mylist_named$letters # works only for a list with elements' names
## [1] "A" "b" "c"