Helpful tips before we start

TROUBLESHOOTING: Common new user mistakes we have seen

  • Check the file path – is the file there?
  • Typos (R is case sensitive, x and X are different)
  • Unclosed quotes, parentheses, and brackets
  • Deleting part of the code chunk
  • For any function, you can write ?FUNCTION_NAME or help("FUNCTION_NAME") to look at the help file
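
The checks in the list above can be sketched in a few lines of base R. The file name my_data.csv is hypothetical and is only there to show the check; substitute the path you are actually trying to read.

```r
# Hypothetical file name, used only for illustration.
my_file <- "my_data.csv"

# TRUE if R can find the file relative to the working directory, FALSE otherwise.
file_found <- file.exists(my_file)
file_found

# R is case sensitive: x and X are two separate objects.
x <- 1
X <- 2
identical(x, X)

# Open the help file for a function (read.csv from base R is used as the example).
help("read.csv")
```

If file.exists() returns FALSE, the path or the working directory is usually the culprit rather than the import function.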

R Projects

R Projects can help you keep files organized and avoid issues with working directories. Check out our resource here: https://daseh.org/resources/R_Projects.html

Lab

In this lab you can use the interactive console to explore or Knit the document. Remember that anything you type in an R code chunk can be “sent” to the console with Cmd-Enter (macOS) or Ctrl-Enter (Windows/Linux).

# Load the necessary package
library(readr)

1.1

Use the manual import method (File > Import Dataset > From Text (readr)) to read in Haloacetic Acids (HAA5) Exposure for WA Populations on Public Water Systems data from this URL:

https://daseh.org/data/HAA5_Exposure_for_WA_Public_Water_Systems_data.csv

These data were collected by the Washington Tracking Network. You can learn more about the data here: https://fortress.wa.gov/doh/wtn/WTNPortal/#!q0=674

1.2

What is the dataset object called? You can find this information in the Console or the Environment. Enter your answer as a comment using #.

# HAA5_Exposure_for_WA_Public_Water_Systems_data

1.3

Preview the data by clicking the table button in the Environment. How many observations and variables are there? Enter your answer as a comment using #.

# 22 obs of 11 variables

1.4

Read the Haloacetic Acids (HAA5) Exposure for WA Populations on Public Water Systems data from https://daseh.org/data/HAA5_Exposure_for_WA_Public_Water_Systems_data.csv and assign it to an object named exposure. Use the code structure below.

# General format
library(readr)
# OBJECT <- read_csv(FILE)
exposure <- read_csv(file = "https://daseh.org/data/HAA5_Exposure_for_WA_Public_Water_Systems_data.csv")
## Rows: 22 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (11): year, pop_on_sampled_PWS, pop_0-15µg/L, pop_>15-30µg/L, pop_>30-45...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
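
The messages above mention spec() and show_col_types = FALSE. Here is a minimal sketch of both, run on a tiny made-up CSV string rather than the HAA5 file so that it works without a download. The object name small and its two columns are hypothetical, and passing literal text with I() assumes readr 2.0 or later.

```r
library(readr)

# Made-up two-row CSV, wrapped in I() so readr treats it as data and not a file path.
small <- read_csv(I("year,value\n2000,1.5\n2001,2.5\n"), show_col_types = FALSE)

# Full column specification that readr guessed for this object.
spec(small)

# Number of rows and columns, the same information the Environment pane shows.
dim(small)
```

Setting show_col_types = FALSE only silences the message; the columns are parsed the same way either way.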

1.5

Take a look at the data. Do these data objects (HAA5_Exposure_for_WA_Public_Water_Systems_data and exposure) appear to be the same? Why or why not?

# Yes. In the RStudio Environment, the two objects have the same dimensions. With View() or str(), we can see in more detail that the data are the same.
# To get really in the weeds, we could run a check like all.equal(HAA5_Exposure_for_WA_Public_Water_Systems_data, exposure), which returns TRUE when the two objects match.
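
As a rough illustration of how such a comparison behaves, the sketch below uses two small made-up data frames, a and b, instead of the lab objects. Note that all.equal() returns a description of the differences, not FALSE, when the objects do not match, so wrapping it in isTRUE() is a common way to get a single TRUE or FALSE.

```r
# Two made-up data frames that start out as exact copies.
a <- data.frame(year = c(2000, 2001), value = c(1.5, 2.5))
b <- a

# Same dimensions, and same contents.
same_dims <- identical(dim(a), dim(b))
same_before <- isTRUE(all.equal(a, b))

# Change one value in b and compare again.
b$value[2] <- 3
all.equal(a, b)                        # prints a message describing the mismatch
same_after <- isTRUE(all.equal(a, b))
```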

1.6

Find your working directory by running getwd(). This is where R will look for files unless you tell it otherwise.

getwd()
## [1] "/__w/DaSEH/DaSEH/modules/Data_Input/lab"
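
To make "where R will look for files" concrete, here is a small sketch showing how a relative file name is resolved against the working directory. The file name nitrate.xlsx is the one used later in this lab; the file does not need to exist for the code to run.

```r
# Current working directory.
wd <- getwd()

# A relative name such as "nitrate.xlsx" is interpreted as this full path.
full_path <- file.path(wd, "nitrate.xlsx")
full_path

# Files that R can currently see in the working directory.
list.files(wd)
```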

Practice on Your Own!

P.1

Load the readxl package with the library() command.

If it is not installed, install it via: RStudio --> Tools --> Install Packages. You can also try install.packages("readxl").

library(readxl)
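
If you are not sure whether readxl is installed, one way to check before calling library() is sketched below. It only reports the result and does not install anything; the variable name has_readxl is made up for this example.

```r
# TRUE if the readxl package is installed, FALSE otherwise (nothing is loaded or installed).
has_readxl <- requireNamespace("readxl", quietly = TRUE)

if (!has_readxl) {
  message("readxl is not installed; try install.packages(\"readxl\")")
}
```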

P.2

Download the dataset of nitrate levels in public water systems from https://daseh.org/data/Nitrate_Exposure_for_WA_Public_Water_Systems_data.xlsx and save it as nitrate.xlsx by running the following code chunk. This only downloads the file; it does NOT bring the file into R.

# download.file() replaces an existing destfile by default; it has no overwrite argument.
download.file(
  url = "https://daseh.org/data/Nitrate_Exposure_for_WA_Public_Water_Systems_data.xlsx",
  destfile = "nitrate.xlsx",
  mode = "wb"
)

Note: the “wb” option writes the file in binary mode. This matters mainly on Windows, where the default text mode can corrupt binary files such as .xlsx; it is harmless on macOS and Linux.
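
A quick way to confirm that the download landed where you expect is to check for the file afterwards. This is a minimal sketch that assumes the destfile name nitrate.xlsx from the chunk above; it runs whether or not the file is present.

```r
# Name used as destfile in the download step.
destfile <- "nitrate.xlsx"

# TRUE if the file is in the working directory.
downloaded <- file.exists(destfile)

if (downloaded) {
  file.size(destfile)  # size in bytes; 0 would suggest a failed download
} else {
  message("No file named ", destfile, " in ", getwd())
}
```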

P.3

Use the read_excel() function in the readxl package to read the nitrate.xlsx file and call the output nitrate.

nitrate <- read_excel(path = "nitrate.xlsx")
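
Excel files can hold several sheets, and read_excel() reads the first one unless told otherwise. The sketch below shows how to list sheets and pick one by name. It uses datasets.xlsx, an example workbook that ships with readxl, because the sheet layout of nitrate.xlsx is not described here; the object names are made up for the example.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package.
example_path <- readxl_example("datasets.xlsx")

# Names of the sheets in the workbook.
sheets <- excel_sheets(example_path)
sheets

# Read one sheet by name instead of relying on the default (first) sheet.
iris_xl <- read_excel(path = example_path, sheet = "iris")
dim(iris_xl)
```

The same two calls, excel_sheets() and read_excel() with a sheet argument, can be pointed at nitrate.xlsx once it has been downloaded.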

P.4

Run the following code - is there a problem? How do you know?

exposure <- read_delim("https://daseh.org/data/HAA5_Exposure_for_WA_Public_Water_Systems_data.csv", delim = "\t")
## Rows: 22 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## num (1): year,pop_on_sampled_PWS,pop_0-15µg/L,pop_>15-30µg/L,pop_>30-45µg/L,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
exposure
## # A tibble: 22 × 1
##    year,pop_on_sampled_PWS,pop_0-15µg/L,pop_>15-30µg/L,pop_>30-45µg/L,pop_>45-…¹
##                                                                            <dbl>
##  1                                                                       2.00e40
##  2                                                                       2.00e43
##  3                                                                       2.00e42
##  4                                                                       2.00e43
##  5                                                                       2.00e39
##  6                                                                       2.00e54
##  7                                                                       2.01e53
##  8                                                                       2.01e54
##  9                                                                       2.01e52
## 10                                                                       2.01e54
## # ℹ 12 more rows
## # ℹ abbreviated name:
## #   ¹​`year,pop_on_sampled_PWS,pop_0-15µg/L,pop_>15-30µg/L,pop_>30-45µg/L,pop_>45-60µg/L,pop_>60-75µg/L,pop_>75µg/L,pop_on_PWS_with_non-detects,pop_exposed_to_exceedances,perc_pop_exposed_to_exceedances`
# It should be a red flag to see only one column, with a name that starts: `year,pop_on_sampled_PWS,pop_0-15µg/L,pop_>15-30µg/L,pop_>30-45µg/L,...`
# This file is comma delimited, not tab delimited! With delim = "\t", no line is ever split, so each row is read as a single value.
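
The same problem can be reproduced on a tiny made-up CSV string, which makes the effect of the delimiter easy to see side by side. This is a sketch under assumptions: the data and the object names wrong and right are invented for illustration, and passing literal text with I() assumes readr 2.0 or later.

```r
library(readr)

# Made-up comma-delimited text, wrapped in I() so readr treats it as data.
csv_text <- I("year,count\n2000,10\n2001,12\n")

# Wrong delimiter: no tabs are found, so everything ends up in one column.
wrong <- read_delim(csv_text, delim = "\t", show_col_types = FALSE)
ncol(wrong)

# Correct delimiter: the two columns are split as intended.
right <- read_delim(csv_text, delim = ",", show_col_types = FALSE)
ncol(right)
names(right)
```

A column count far below what you expect, together with commas inside a column name, is a good sign that the delimiter needs changing.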