Load all the packages we will use in this lab.
library(tidyverse)
library(dasehr)
Create some data to work with by running the following code chunk.
set.seed(1234)
int_vect <- rep(seq(from = 1, to = 10), times = 3)
rand_vect <- sample(x = 1:30, size = 30, replace = TRUE)
TF_vect <- rep(c(TRUE, TRUE, FALSE), times = 10)
TF_vect2 <- rep(c("TRUE", "TRUE", "FALSE"), times = 10)
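The vectors above rely on rep(). Its times argument repeats the whole input, whereas each repeats every element before moving on. The short base R sketch below contrasts the two on a toy input (1:3 is chosen only for illustration). It also confirms that a three-value pattern repeated 10 times, as in TF_vect, gives 30 elements.

```r
# times = 3 repeats the whole sequence three times
rep(1:3, times = 3)  # 1 2 3 1 2 3 1 2 3

# each = 3 repeats each element three times in a row
rep(1:3, each = 3)   # 1 1 1 2 2 2 3 3 3

# Three values repeated 10 times gives a vector of length 30
length(rep(c(TRUE, TRUE, FALSE), times = 10))  # 30
```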
Determine the class of each of these new objects.
class(int_vect) # [1] "integer"
## [1] "integer"
class(rand_vect) # [1] "integer"
## [1] "integer"
class(TF_vect) # [1] "logical"
## [1] "logical"
class(TF_vect2) # [1] "character"
## [1] "character"
Are TF_vect and TF_vect2 different classes? Why or why not?
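# Yes! TF_vect is logical, while TF_vect2 is character.
# TF_vect2 stores "TRUE" and "FALSE" in quotes, so R treats them as text strings;
# a logical vector holds the unquoted values TRUE and FALSE.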
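A small sketch of why the quotes matter: the code below rebuilds a copy of TF_vect2 (named tf_chr here, a name chosen only for illustration) and converts it with base R's as.logical(). It then sums the result. sum() counts the TRUE values in a logical vector, whereas calling sum() on the character version raises an error.

```r
# Quoted values are character strings; unquoted TRUE/FALSE are logical
class(c(TRUE, FALSE))      # "logical"
class(c("TRUE", "FALSE"))  # "character"

# Rebuild a copy of TF_vect2 and convert the strings to logical values
tf_chr <- rep(c("TRUE", "TRUE", "FALSE"), times = 10)
tf_lgl <- as.logical(tf_chr)
class(tf_lgl)  # "logical"

# Logical vectors can be summed: each TRUE counts as 1
sum(tf_lgl)    # 20
```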
Create a tibble called vect_data that combines these vectors, using the following code.
vect_data <- tibble(int_vect, rand_vect, TF_vect, TF_vect2)
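One way to confirm that each column of vect_data keeps the class of its source vector is to apply class() to every column. The sketch below recreates the lab vectors so it runs on its own. It assumes the tibble package (installed as part of tidyverse) is available.

```r
library(tibble)

# Recreate the lab vectors so this sketch is self-contained
set.seed(1234)
int_vect <- rep(seq(from = 1, to = 10), times = 3)
rand_vect <- sample(x = 1:30, size = 30, replace = TRUE)
TF_vect <- rep(c(TRUE, TRUE, FALSE), times = 10)
TF_vect2 <- rep(c("TRUE", "TRUE", "FALSE"), times = 10)
vect_data <- tibble(int_vect, rand_vect, TF_vect, TF_vect2)

# Each column keeps the class of the vector it came from
sapply(vect_data, class)
# int_vect: "integer", rand_vect: "integer", TF_vect: "logical", TF_vect2: "character"
```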
Coerce rand_vect to character class using as.character(). Save this vector as rand_char_vect. How is the output for rand_vect different from the output for rand_char_vect?
rand_char_vect <- as.character(rand_vect)
rand_char_vect # Numbers now have quotation marks
## [1] "28" "16" "26" "22" "5" "12" "15" "9" "5" "6" "16" "4" "2" "7" "22"
## [16] "26" "6" "15" "14" "20" "14" "30" "24" "30" "4" "4" "21" "8" "20" "24"
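The quotation marks signal that the numbers are now text. The minimal sketch below recreates rand_vect with the same seed. It shows that as.integer() recovers the original values. It also shows that character vectors sort alphabetically rather than numerically, which matters again when dates are stored as text later in this lab.

```r
set.seed(1234)
rand_vect <- sample(x = 1:30, size = 30, replace = TRUE)
rand_char_vect <- as.character(rand_vect)

# Converting back with as.integer() recovers the original values
identical(as.integer(rand_char_vect), rand_vect)  # TRUE

# Text sorts alphabetically: "10" comes before "2"
sort(c("2", "10", "30"))  # "10" "2" "30"

# Numbers sort numerically
sort(c(2, 10, 30))        # 2 10 30
```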
Read in the National Wastewater Surveillance System (NWSS) SARS-CoV-2 Wastewater data from the dasehr package using the code supplied in the chunk. Alternatively, read it in from the URL (commented out in the chunk).
The NWSS tests wastewater from sewage treatment plants for SARS-CoV-2 as a way to estimate how many COVID-19 infections a community is experiencing.
covidww <- covid_wastewater
# covidww <- read_csv(file = "https://daseh.org/data/SARS-CoV-2_Wastewater_Data.csv")
Use the mutate() function to create a new column named date_formatted that is of Date class. The new variable is created from the first_sample_date column. Hint: use the mdy() function. Reassign to covidww.
# General format (template only, not meant to be run)
# NEW_DATA <- OLD_DATA %>% mutate(NEW_COLUMN = OLD_COLUMN)
covidww <- covidww %>% mutate(date_formatted = mdy(first_sample_date))
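mdy() comes from the lubridate package, which tidyverse 2.0.0 and later attaches automatically (with older tidyverse versions, load it with library(lubridate)). The sketch below parses a few hypothetical strings written in the same month/day/year format as first_sample_date, showing the class change without needing the full data set.

```r
library(lubridate)

# Hypothetical strings in the same month/day/year format as first_sample_date
date_strings <- c("7/5/2020", "1/1/2023", "9/9/2022")
class(date_strings)  # "character"

# mdy() parses month/day/year text into Date values
date_parsed <- mdy(date_strings)
date_parsed          # "2020-07-05" "2023-01-01" "2022-09-09"
class(date_parsed)   # "Date"
```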
Move the date_formatted variable to be before first_sample_date using the relocate() function.
Take a look at the data using glimpse(). Note the difference between the first_sample_date and date_formatted columns.
# General format (template only, not meant to be run)
# NEW_DATA <- OLD_DATA %>% relocate(COLUMN1, .before = COLUMN2)
covidww <- covidww %>% relocate(date_formatted, .before = first_sample_date)
# alternative: select() with everything() moves both columns to the front
# covidww <- covidww %>% select(date_formatted, first_sample_date, everything())
glimpse(covidww)
## Rows: 776,059
## Columns: 13
## $ reporting_jurisdiction <chr> "Missouri", "Missouri", "Missouri", "Missouri",…
## $ sample_location <chr> "Treatment plant", "Treatment plant", "Treatmen…
## $ key_plot_id <chr> "NWSS_mo_259_Treatment plant_raw wastewater", "…
## $ county_names <chr> "Barry,Lawrence", "Barry,Lawrence", "Barry,Lawr…
## $ population_served <dbl> 9100, 9100, 9100, 9100, 9100, 9100, 9100, 9100,…
## $ date_start <chr> "6/21/2020", "6/22/2020", "6/23/2020", "6/24/20…
## $ date_end <chr> "7/5/2020", "7/6/2020", "7/7/2020", "7/8/2020",…
## $ rna_pct_change_15d <dbl> NA, NA, NA, NA, NA, NA, NA, 3683, 3683, 3683, 3…
## $ pos_PCR_prop_15d <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 10…
## $ percentile <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ sampling_prior <chr> "yes", "yes", "yes", "yes", "yes", "yes", "yes"…
## $ date_formatted <date> 2020-07-05, 2020-07-05, 2020-07-05, 2020-07-05…
## $ first_sample_date <chr> "7/5/2020", "7/5/2020", "7/5/2020", "7/5/2020",…
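relocate() and the select()/everything() alternative do not produce the same column order. The sketch below compares them on a hypothetical two-row toy tibble (not the NWSS data). It assumes dplyr and tibble are installed.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data, not the NWSS data set
toy <- tibble(
  site = c("A", "B"),
  first_sample_date = c("7/5/2020", "1/1/2023"),
  date_formatted = as.Date(c("2020-07-05", "2023-01-01"))
)

# relocate() moves date_formatted directly before first_sample_date
toy %>% relocate(date_formatted, .before = first_sample_date) %>% names()
# "site" "date_formatted" "first_sample_date"

# select() with everything() puts the named columns first, in the order given
toy %>% select(date_formatted, first_sample_date, everything()) %>% names()
# "date_formatted" "first_sample_date" "site"
```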
Use the range() function on the date_formatted variable to display the range of dates in the data set. How does this compare to the range of first_sample_date? Why? (Hint: use the pull() function first to pull the values.)
pull(covidww, date_formatted) %>% range()
## [1] "2020-07-05" "2024-05-06"
pull(covidww, first_sample_date) %>% range()
## [1] "1/1/2023" "9/9/2022"
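# first_sample_date is character, so range() sorts the values as text (alphabetically),
# not chronologically. "9/9/2022" is the "max" because it starts with "9", and
# "1/1/2023" is the "min" because it starts with "1".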
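The difference can be reproduced on a few hypothetical date strings. range() on the character vector compares text, while range() on the parsed Date values compares calendar dates. This sketch assumes lubridate is installed.

```r
library(lubridate)

# Hypothetical dates stored as month/day/year text
dates_chr <- c("7/5/2020", "1/1/2023", "9/9/2022")

# Text comparison: the string starting with "1" is smallest, "9" is largest
range(dates_chr)       # "1/1/2023" "9/9/2022"

# Date comparison: earliest and latest calendar dates
range(mdy(dates_chr))  # "2020-07-05" "2023-01-01"
```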