Part 1

Load the packages.

library(tidyverse)

Load the CalEnviroScreen data from the link (www.daseh.org/data/CalEnviroScreen_data.csv) and subset it so that you only have data from Fresno, Merced, Placer, Sonoma, and Yolo counties.

ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")

## Rows: 8035 Columns: 67
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): CaliforniaCounty, ApproxLocation, CES4.0PercRange
## dbl (64): CensusTract, ZIP, Longitude, Latitude, CES4.0Score, CES4.0Percenti...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

ces_sub <- ces %>% filter(CaliforniaCounty %in% c("Fresno", "Merced", "Placer", "Sonoma", "Yolo"))
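To confirm that a filter like this kept only the intended counties, one option is to count the rows per county afterwards. The sketch below uses a small invented tibble (toy_ces, with made-up values) in place of the real download so that it runs without network access; the same count() call should work on ces_sub.

```r
library(dplyr)

# Toy stand-in for `ces`; rows and values are invented for illustration only.
toy_ces <- tibble(
  CaliforniaCounty = c("Fresno", "Yolo", "Alameda", "Merced", "Fresno", "Kern"),
  Traffic = c(900, 450, 1800, 300, 1200, 700)
)

keep <- c("Fresno", "Merced", "Placer", "Sonoma", "Yolo")

toy_sub <- toy_ces %>%
  filter(CaliforniaCounty %in% keep)

# One row per county that survived the filter, with its row count in `n`.
county_counts <- toy_sub %>%
  count(CaliforniaCounty)

county_counts
```

Counties that are in the keep vector but absent from the data (Placer and Sonoma in this toy example) simply do not appear in the result.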

1.1

Use the ggplot2 package to make a plot of how diesel particulate concentration (DieselPM; y-axis) is associated with traffic density values (Traffic; x-axis). You can use a lines layer (+ geom_line()), a points layer (+ geom_point()), or both!

Assign the plot to variable my_plot. Type my_plot in the console to have it displayed.

DieselPM: Diesel PM emissions from on-road and non-road sources

Traffic: Traffic density in vehicle-kilometers per hour per road length, within 150 meters of the census tract boundary

# General format
ggplot(???, aes(x = ???, y = ???)) +
  ??? +
  ???

my_plot <-
  ggplot(ces_sub, aes(x = Traffic, y = DieselPM)) +
  geom_line() +
  geom_point()

my_plot

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
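The warnings above say that one row could not be drawn. A likely cause is a missing value (NA) in one of the plotted columns, although the warning itself does not say which one. Below is a sketch of how one might look for such rows, using an invented toy tibble rather than the real ces_sub.

```r
library(dplyr)

# Toy data with one missing Traffic value; the numbers are invented.
toy_sub <- tibble(
  Traffic  = c(500, NA, 1500),
  DieselPM = c(0.10, 0.25, 0.40)
)

# Rows where either plotted column is missing.
missing_rows <- toy_sub %>%
  filter(is.na(Traffic) | is.na(DieselPM))

# Optionally keep only complete rows before plotting to avoid the warning.
toy_complete <- toy_sub %>%
  filter(!is.na(Traffic), !is.na(DieselPM))

missing_rows
```

Dropping the rows first is a choice rather than a requirement: ggplot2 skips them either way, and the warning is only a reminder that it did so.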

1.2

“Update” your plot by adding a title and changing the x and y axis titles. (Hint: use the labs function.)

my_plot <- my_plot +
  labs(
    x = "Traffic density index",
    y = "Diesel particulate matter",
    title = "Relationship between traffic density and diesel particulate matter"
  )

my_plot

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

1.3

Use the scale_x_continuous() function to plot the x axis with the following breaks: c(250, 750, 1250, 1750, 2250).

# General format
my_plot <- my_plot +
  scale_x_continuous(breaks = ???)

my_plot <- my_plot +
  scale_x_continuous(
    breaks = c(250, 750, 1250, 1750, 2250)
  )

my_plot

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
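The breaks in this exercise are evenly spaced, so they could also be generated with seq() rather than typed by hand. A minimal sketch (the variable name x_breaks is just for illustration):

```r
# Evenly spaced breaks from 250 to 2250 in steps of 500.
x_breaks <- seq(from = 250, to = 2250, by = 500)

x_breaks
```

The resulting vector could then be passed along as scale_x_continuous(breaks = x_breaks).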

1.4

Observe several different versions of the plot by displaying my_plot while adding a different “theme” to it.

# General format
my_plot + theme_bw()

my_plot + theme_bw()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

my_plot + theme_classic()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

my_plot + theme_dark()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

my_plot + theme_gray()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

my_plot + theme_void()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
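Because my_plot + theme_bw() is not assigned back to my_plot, each line above shows a themed copy and leaves my_plot itself unchanged. If it helps to compare themes side by side, one possible approach is to keep the themed versions in a named list. This is only a sketch on invented toy data; base_plot, themes, and themed_plots are illustrative names.

```r
library(ggplot2)

# Toy data; the values are invented for illustration.
toy_sub <- data.frame(
  Traffic  = c(300, 900, 1500),
  DieselPM = c(0.1, 0.3, 0.2)
)

base_plot <- ggplot(toy_sub, aes(x = Traffic, y = DieselPM)) +
  geom_point()

# A named list of complete themes that ship with ggplot2.
themes <- list(
  bw      = theme_bw(),
  classic = theme_classic(),
  dark    = theme_dark()
)

# Add each theme to the same base plot; base_plot is not modified.
themed_plots <- lapply(themes, function(th) base_plot + th)

# Display one version by name.
themed_plots$classic
```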

Practice on Your Own!

P.1

Create a boxplot (with the geom_boxplot() function) using the ces_sub data, where CaliforniaCounty is plotted on the x axis and DrinkingWater is plotted on the y axis.

DrinkingWater: Drinking water contaminant index for selected contaminants. A higher value means drinking water contains a greater volume of contaminants.

ces_sub %>%
  ggplot(aes(x = CaliforniaCounty, y = DrinkingWater)) +
  geom_boxplot()

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
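The middle line of each box is the group median, so a numeric summary can be a useful companion to the boxplot. Here is a sketch with invented toy values standing in for ces_sub; na.rm = TRUE is needed because median() otherwise returns NA when a group contains a missing value.

```r
library(dplyr)

# Toy data with one missing DrinkingWater value; the numbers are invented.
toy_sub <- tibble(
  CaliforniaCounty = c("Fresno", "Fresno", "Fresno", "Yolo", "Yolo", "Yolo"),
  DrinkingWater    = c(400, 600, NA, 200, 300, 250)
)

water_summary <- toy_sub %>%
  group_by(CaliforniaCounty) %>%
  summarize(
    median_water = median(DrinkingWater, na.rm = TRUE),  # ignore missing values
    n_missing    = sum(is.na(DrinkingWater))             # how many were ignored
  )

water_summary
```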

Part 2

2.1

Let’s look at the plot of traffic density and diesel particulate matter again.

Use the ggplot2 package to make a plot of how diesel particulate concentration (DieselPM; y-axis) is associated with traffic density values (Traffic; x-axis), where each county (CaliforniaCounty) has a different color (hint: use color = CaliforniaCounty in the mapping).

# General format
ggplot(???, aes(
  x = ???,
  y = ???,
  color = ???
)) +
  geom_line() +
  geom_point()

ggplot(ces_sub, aes(
  x = Traffic,
  y = DieselPM,
  color = CaliforniaCounty
)) +
  geom_line() +
  geom_point()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

2.2

Redo the above plot by adding faceting (+ facet_wrap( ~ CaliforniaCounty, nrow = 3)) so that the data for each county appear in a separate plot panel.

Assign the new plot as an object called facet_plot.

Try adjusting the number of rows in the facet_wrap to see how this changes the plot.

facet_plot <- ggplot(ces_sub, aes(
  x = Traffic,
  y = DieselPM,
  color = CaliforniaCounty
)) +
  geom_line() +
  geom_point() +
  facet_wrap(~CaliforniaCounty, nrow = 3)

facet_plot

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
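For the suggestion to try different row counts, the layout can be controlled with either nrow or ncol in facet_wrap(). The sketch below uses invented toy data with three counties; wide_plot and tall_plot are illustrative names.

```r
library(ggplot2)

# Toy data for three counties; the values are invented.
toy_sub <- data.frame(
  CaliforniaCounty = rep(c("Fresno", "Merced", "Placer"), each = 2),
  Traffic          = c(300, 900, 400, 1000, 500, 1100),
  DieselPM         = c(0.1, 0.3, 0.2, 0.4, 0.1, 0.2)
)

base_plot <- ggplot(toy_sub, aes(x = Traffic, y = DieselPM)) +
  geom_point()

# All panels in a single row (side by side).
wide_plot <- base_plot + facet_wrap(~CaliforniaCounty, nrow = 1)

# All panels in a single column (stacked).
tall_plot <- base_plot + facet_wrap(~CaliforniaCounty, ncol = 1)

wide_plot
```

A single row tends to make the y values easier to compare across panels, while a single column does the same for the x values.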

2.3

Observe what happens when you remove either geom_line() OR geom_point() from one of your plots above.

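# Each geom is a separate layer, so removing one removes only that layer:
# without geom_line() only the points are drawn, and without geom_point() only the lines are drawn.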
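To make the idea of layers concrete, the sketch below builds three versions of a plot from invented toy data and counts the layers in each. It assumes the layers can be read from the plot object with $layers, which is how ggplot2 plot objects have commonly been inspected; treat that accessor as an assumption if your ggplot2 version differs.

```r
library(ggplot2)

# Toy data; the values are invented for illustration.
toy_sub <- data.frame(
  Traffic  = c(300, 900, 1500),
  DieselPM = c(0.1, 0.3, 0.2)
)

base <- ggplot(toy_sub, aes(x = Traffic, y = DieselPM))

both        <- base + geom_line() + geom_point()  # lines and points
points_only <- base + geom_point()                # geom_line() removed
lines_only  <- base + geom_line()                 # geom_point() removed

# Number of layers in each version of the plot.
length(both$layers)
length(points_only$layers)
length(lines_only$layers)
```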

Practice on Your Own!

P.2

Modify facet_plot to remove the legend (hint: use theme() and the legend.position argument) and change the axis titles to “Diesel particulate matter” for the y axis and “Traffic density” for the x axis.

facet_plot <- facet_plot +
  theme(legend.position = "none") +
  labs(
    y = "Diesel particulate matter",
    x = "Traffic density"
  )

facet_plot

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

P.3

Modify facet_plot one more time with a fun theme! Look into the ThemePark package. It has lots of fun themes! Try one out! Remember that you will need to install it using remotes::install_github("MatthewBJane/ThemePark") and load the library.

# remotes::install_github("MatthewBJane/ThemePark")
library(ThemePark)

facet_plot + theme_grand_budapest()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Data Visualization Lab - Key
