Part 1


Load the package we will use in this lab.
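The loading chunk itself is not shown here. A minimal sketch, assuming the lab uses the tidyverse, since every function called below (tibble(), read_csv(), mutate(), relocate(), pull(), glimpse(), the %>% pipe, and lubridate's mdy()) comes from it:

```r
# Assumption: the lab's package is the tidyverse
library(tidyverse)

# mdy() comes from lubridate. tidyverse 2.0.0 and later attach it automatically,
# and loading it explicitly also covers older tidyverse installs.
library(lubridate)
```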

Create some data to work with by running the following code chunk.


int_vect <- rep(seq(from = 1, to = 10), times = 3)
rand_vect <- sample(x = 1:30, size = 30, replace = TRUE)
TF_vect <- rep(c(TRUE, TRUE, FALSE), times = 10)
TF_vect2 <- rep(c("TRUE", "TRUE", "FALSE"), times = 10)
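rand_vect is drawn at random, so its values change every time the chunk runs. The numbers printed later in this lab come from one such run. If you want repeatable draws, one option is to call set.seed() first. The seed 42 below is an arbitrary choice for illustration and will not reproduce the specific values shown later.

```r
# Fix the random number generator state so sample() gives the same draws
set.seed(42)  # arbitrary seed, chosen only for illustration
draw_1 <- sample(x = 1:30, size = 30, replace = TRUE)

# Resetting the same seed replays exactly the same draws
set.seed(42)
draw_2 <- sample(x = 1:30, size = 30, replace = TRUE)

identical(draw_1, draw_2)  # TRUE
```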


Determine the class of each of these new objects.

class(int_vect)
## [1] "integer"
class(rand_vect)
## [1] "integer"
class(TF_vect)
## [1] "logical"
class(TF_vect2)
## [1] "character"


Are TF_vect and TF_vect2 different classes? Why or why not?

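# Yes. TF_vect is logical: TRUE and FALSE are written without quotes, so R stores them as logical values.
# TF_vect2 is character: the quotes make "TRUE" and "FALSE" text strings, even though they look like logical values.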
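One way to see the practical difference is to count the TRUE values. The sketch below recreates both vectors, counts with sum(), and uses as.logical() to turn the character strings back into logical values.

```r
TF_vect  <- rep(c(TRUE, TRUE, FALSE), times = 10)
TF_vect2 <- rep(c("TRUE", "TRUE", "FALSE"), times = 10)

# Logical values can be summed directly: each TRUE counts as 1
sum(TF_vect)  # 20
# sum(TF_vect2) would fail, because sum() does not accept character input

# as.logical() converts the strings "TRUE"/"FALSE" into logical TRUE/FALSE
TF_converted <- as.logical(TF_vect2)
class(TF_converted)               # "logical"
identical(TF_converted, TF_vect)  # TRUE
```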


Combine these vectors into a tibble called vect_data using the following code.

vect_data <- tibble(int_vect, rand_vect, TF_vect, TF_vect2)
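Each vector becomes one column of the tibble, and each column keeps the class of the vector it came from. A quick check, sketched with sapply() (glimpse(vect_data) reports the same column types):

```r
library(tibble)

int_vect  <- rep(seq(from = 1, to = 10), times = 3)
rand_vect <- sample(x = 1:30, size = 30, replace = TRUE)
TF_vect   <- rep(c(TRUE, TRUE, FALSE), times = 10)
TF_vect2  <- rep(c("TRUE", "TRUE", "FALSE"), times = 10)

vect_data <- tibble(int_vect, rand_vect, TF_vect, TF_vect2)

# Apply class() to every column; the result is a named character vector
col_classes <- sapply(vect_data, class)
col_classes

dim(vect_data)  # 30 rows, 4 columns
```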


Coerce rand_vect to character class using as.character(). Save this vector as rand_char_vect. How is the output for rand_vect and rand_char_vect different?

rand_char_vect <- as.character(rand_vect)
rand_char_vect # Each value is now printed in quotation marks because it is a character string
##  [1] "28" "16" "26" "22" "5"  "12" "15" "9"  "5"  "6"  "16" "4"  "2"  "7"  "22"
## [16] "26" "6"  "15" "14" "20" "14" "30" "24" "30" "4"  "4"  "21" "8"  "20" "24"
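The class matters because functions treat character and numeric data differently. A small illustration with made-up values: sorting character strings compares them character by character, so "10" lands before "2". Converting back with as.integer() restores the numbers.

```r
nums      <- c(10L, 2L, 9L)
nums_char <- as.character(nums)

sort(nums)       # 2 9 10: numeric order
sort(nums_char)  # "10" "2" "9": character-by-character order

# Coercing back to integer recovers the original values
identical(as.integer(nums_char), nums)  # TRUE
```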


Read in the National Wastewater Surveillance System (NWSS) SARS-CoV-2 Wastewater data using the URL and the code provided.

The NWSS tests wastewater samples from sewage treatment plants for the SARS-CoV-2 virus as a way to estimate how many COVID-19 infections a community is experiencing.

sars_ww <- 
  read_csv(file = "")

Use the mutate() function to create a new column named date_formatted based on the date_end column. Hint: use the mdy() function. Reassign the result to sars_ww.

date_end: This is the last date of the sampling window. A sampling window is used to measure change in viral concentration.

# Convert the month/day/year text in date_end into a Date column
sars_ww <- sars_ww %>% mutate(date_formatted = mdy(date_end))
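To see what mdy() does without downloading the data, here is a sketch on a tiny hypothetical tibble (toy_ww is an illustrative name, not part of the lab) whose date_end values use the same month/day/year text style. mdy() parses those strings and returns Date objects:

```r
library(tibble)
library(dplyr)
library(lubridate)

# Hypothetical stand-in for sars_ww: date_end stored as month/day/year text
toy_ww <- tibble(date_end = c("1/1/2021", "9/9/2023"))

toy_ww <- toy_ww %>% mutate(date_formatted = mdy(date_end))

class(toy_ww$date_end)        # "character"
class(toy_ww$date_formatted)  # "Date"
toy_ww$date_formatted         # "2021-01-01" "2023-09-09"
```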

Practice on Your Own!


Move the date_formatted variable so it sits directly before date_end, using the relocate() function. Then look at the data with glimpse() and note the difference between the date_end and date_formatted columns.

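# General format (template only; replace the capitalized names before running)
# NEW_DATA <- OLD_DATA %>% relocate(COLUMN1, .before = COLUMN2)
sars_ww <- sars_ww %>% relocate(date_formatted, .before = date_end)
glimpse(sars_ww)

# Alternative: select() with everything() moves both columns to the front
# sars_ww <- sars_ww %>% select(date_formatted, date_end, everything())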
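The two approaches do not produce the same column order. A sketch on a hypothetical three-column tibble (the site column is made up for illustration) shows that relocate() inserts date_formatted just before date_end, while select() with everything() pulls the named columns to the front:

```r
library(tibble)
library(dplyr)

# Hypothetical three-column tibble
toy <- tibble(site = "A", date_end = "9/9/2023", date_formatted = as.Date("2023-09-09"))

# relocate() moves date_formatted directly in front of date_end
relocated <- relocate(toy, date_formatted, .before = date_end)
names(relocated)  # "site" "date_formatted" "date_end"

# select() with everything() moves the named columns to the front instead
selected <- select(toy, date_formatted, date_end, everything())
names(selected)   # "date_formatted" "date_end" "site"
```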

Use the range() function on the date_formatted variable to display the range of dates in the data set. How does this compare to the range of date_end? Why? (Hint: use the pull() function first to extract the values.)

pull(sars_ww, date_formatted) %>% range()
## [1] "2020-07-05" "2024-05-11"
pull(sars_ww, date_end) %>% range()
## [1] "1/1/2021" "9/9/2023"
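# date_formatted is a Date, so range() returns the true earliest and latest sampling dates.
# date_end is character, so range() compares the strings character by character:
# "9/9/2023" comes last only because it starts with "9", not because it is the latest date.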
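A small sketch with hypothetical month/day/year strings makes the mismatch concrete: as text, the "minimum" is actually the latest date, while the parsed Dates give the real chronological range.

```r
library(lubridate)

# Hypothetical month/day/year strings
date_text <- c("12/31/2020", "9/9/2021", "1/15/2023")

# On character data, range() compares strings, not dates
range(date_text)        # "1/15/2023" "9/9/2021"

# After parsing with mdy(), range() returns the true earliest and latest dates
range(mdy(date_text))   # "2020-12-31" "2023-01-15"
```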