Part 1

Helpful tips before we start

TROUBLESHOOTING: Common new user mistakes we have seen

Check the file path – is the file there?
Typos (R is case sensitive, x and X are different)
Unclosed quotes, parentheses, and brackets
Accidentally deleting part of a code chunk
For any function, you can write ?FUNCTION_NAME, or help("FUNCTION_NAME") to look at the help file
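A few of these checks can be run as code. The sketch below is only an illustration: the file path and the object name my_value are made up for the example, and a temporary file stands in for a real data file.

```r
# Hypothetical path, used only for illustration
path <- tempfile(fileext = ".csv")

# Check the file path: is the file there?
before <- file.exists(path)          # nothing has been written yet
writeLines(c("a,b", "1,2"), path)    # create a tiny stand-in file
after <- file.exists(path)           # now the file exists

# R is case sensitive: my_value and My_Value are different names
my_value <- 5
lower_found <- exists("my_value")
upper_found <- exists("My_Value")

before
after
lower_found
upper_found

# Open the help file for a function
help("file.exists")
```

If file.exists() returns FALSE for a path you are about to read, fix the path before looking for other problems.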

1.1

Set up your R Project.

File, New Project or click the new project button
New Directory
New Project
Type a name and choose a location
Check that the folder is there!
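The last step can also be done from the Console. This is a minimal sketch, assuming a project folder called my_first_project; that name is hypothetical, and dir.create() only stands in for the folder that RStudio would normally create for you.

```r
# Hypothetical project location, used only for illustration
project_dir <- file.path(tempdir(), "my_first_project")

# Stand-in for the folder RStudio creates when you make a new project
dir.create(project_dir, showWarnings = FALSE)

# Check that the folder is there
folder_found <- dir.exists(project_dir)
folder_found

# List what is inside the folder (empty here)
list.files(project_dir)
```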

Check out our resource here: https://daseh.org/resources/R_Projects.html

1.2

Load the tidyverse package by adding library(tidyverse) below and running the code.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
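The Conflicts message means that two packages define a function with the same name, and the most recently loaded one (dplyr) wins. One way to be explicit about which function you want is the package::function() form. The sketch below uses the built-in mtcars data rather than the lab data, purely as an illustration.

```r
library(dplyr)

# dplyr::filter() keeps the rows of a data frame that match a condition
six_cyl <- dplyr::filter(mtcars, cyl == 6)
nrow(six_cyl)

# stats::filter() is a different function: it applies a linear filter
# (here a 3-point moving average) to a numeric series
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
smoothed
```

Writing the package name in front is optional once the package is loaded, but it removes any doubt about which function runs.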

1.3

Use the manual import method (File > Import Dataset > From Text (readr)) to read in the CalEnviroScreen data from this URL:

https://daseh.org/data/CalEnviroScreen_data.csv

These data were collected by the California Office of Environmental Health Hazard Assessment (OEHHA) to track environmental measures (such as pollution and water contamination) that can impact human health. You can read more about the project here.

1.4

What is the dataset object called? You can find this information in the Console or the Environment. Enter your answer as a comment using #.

# CalEnviroScreen_data

1.5

Preview the data by examining the Environment. How many observations and variables are there? Enter your answer as a comment using #.

# 8035 obs. of 67 variables
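The same information shown in the Environment pane can be requested with code. This sketch uses a small made-up data frame (example_data, with hypothetical columns) so that it runs on its own; with the lab data you would pass your imported object instead.

```r
# Small made-up data frame, used only for illustration
example_data <- data.frame(
  tract  = c(1, 2, 3),
  county = c("A", "B", "C")
)

dim(example_data)    # rows, then columns
nrow(example_data)   # number of observations
ncol(example_data)   # number of variables
```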

Practice on Your Own!

P.1

Download the data from https://daseh.org/data/CalEnviroScreen_data.csv and move the file to your project folder. Import the data by browsing for the file on your computer.

Download the data
Put data in the project folder
File, Import Dataset, From Text (readr)
Browse for the file
Click “Update” and “Import”

Part 2

2.1

Read in the CalEnviroScreen data using read_csv and this URL: https://daseh.org/data/CalEnviroScreen_data.csv. Assign it to an object named ces. Use the code structure below.

# General format
OBJECT <- read_csv(FILE)

ces <- read_csv(file = "https://daseh.org/data/CalEnviroScreen_data.csv")

## Rows: 8035 Columns: 67
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): CaliforniaCounty, ApproxLocation, CES4.0PercRange
## dbl (64): CensusTract, ZIP, Longitude, Latitude, CES4.0Score, CES4.0Percenti...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
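The two ℹ messages point to options you can try. The sketch below reads a tiny made-up CSV from a text string (wrapped in I() so readr treats it as data, not a file name) instead of the lab URL, so the column names and values are assumptions for illustration only.

```r
library(readr)

# Made-up CSV text, used only for illustration
csv_text <- I("tract,county,score\n1,Alameda,10.5\n2,Alameda,20.5\n")

# show_col_types = FALSE quiets the column specification message
small <- read_csv(csv_text, show_col_types = FALSE)

# spec() retrieves the full column specification that readr guessed
spec(small)

# Alternatively, specify the column types yourself
typed <- read_csv(
  csv_text,
  col_types = cols(
    tract  = col_character(),
    county = col_character(),
    score  = col_double()
  )
)
typed
```

Setting col_types is useful when readr's guess is not what you want, for example an ID column that should stay as text.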

2.2

Take a look at the data. Do these data objects (CalEnviroScreen_data and ces) appear to be the same? Why or why not?

# Yes, when we look in the RStudio environment, the two objects have the same dimensions. If we use the View() or str() functions, we can also see in more detail that the data are the same.
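The comparison can also be made with code rather than by eye. This is a minimal sketch using two small made-up data frames (first_copy and second_copy are hypothetical names); with the lab data you would compare CalEnviroScreen_data and ces the same way.

```r
# Two made-up data frames with the same contents, for illustration only
first_copy  <- data.frame(tract = c(1, 2, 3), score = c(4.5, 5.5, 6.5))
second_copy <- data.frame(tract = c(1, 2, 3), score = c(4.5, 5.5, 6.5))

# Same number of rows and columns?
same_dims <- identical(dim(first_copy), dim(second_copy))

# Same column names, in the same order?
same_names <- identical(names(first_copy), names(second_copy))

# Same values? all.equal() returns TRUE or a description of the differences
same_values <- isTRUE(all.equal(first_copy, second_copy))

same_dims
same_names
same_values

# str() shows the structure in more detail
str(first_copy)
```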

2.3

Find your working directory by running getwd(). This is where R will look for files unless you tell it otherwise.

getwd()

## [1] "/__w/DaSEH/DaSEH/modules/Data_Input/lab"
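A file name given without a folder is looked up relative to the working directory. The sketch below shows how such a relative path is resolved; it assumes the file is called CalEnviroScreen_data.csv, as in the URL, and it does not require the file to actually exist.

```r
wd <- getwd()

# A relative path: just the file name, no folder
relative_path <- "CalEnviroScreen_data.csv"

# The full path R would look at for that file
full_path <- file.path(wd, relative_path)
full_path

# TRUE only if the file is in the working directory
file.exists(relative_path)
```

If file.exists() returns FALSE, either move the file into the working directory or give read_csv() a longer path.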

Practice on Your Own!

P.2

Run the following code - is there a problem? How do you know?

ces2 <- read_delim("https://daseh.org/data/CalEnviroScreen_data.csv", delim = "\t")

## Rows: 8035 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): CensusTract,CaliforniaCounty,ZIP,Longitude,Latitude,ApproxLocation,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

ces2

## # A tibble: 8,035 × 1
##    CensusTract,CaliforniaCounty,ZIP,Longitude,Latitude,ApproxLocation,CES4.0Sc…¹
##    <chr>                                                                        
##  1 "6001400100,\"Alameda \",94704,-122.2319033,37.8675947,\"Oakland\",4.85,2.8,…
##  2 "6001400200,\"Alameda \",94618,-122.2495763,37.848171,\"Oakland\",4.88,2.87,…
##  3 "6001400300,\"Alameda \",94618,-122.2544365,37.8405983,\"Oakland\",11.2,15.9…
##  4 "6001400400,\"Alameda \",94609,-122.2574628,37.8482107,\"Oakland\",12.39,18.…
##  5 "6001400500,\"Alameda \",94609,-122.2647445,37.8485167,\"Oakland\",16.73,29.…
##  6 "6001400600,\"Alameda \",94609,-122.2648882,37.8419909,\"Oakland\",20.02,37.…
##  7 "6001400700,\"Alameda \",94608,-122.2723135,37.8417578,\"Oakland\",36.71,70.…
##  8 "6001400800,\"Alameda \",94608,-122.2833803,37.8454493,\"Oakland\",37.1,70.7…
##  9 "6001400900,\"Alameda \",94608,-122.2802437,37.8394669,\"Oakland\",40.71,76.…
## 10 "6001401000,\"Alameda \",94608,-122.2719625,37.831217,\"Oakland\",43.74,80.4…
## # ℹ 8,025 more rows
## # ℹ abbreviated name:
## #   ¹`CensusTract,CaliforniaCounty,ZIP,Longitude,Latitude,ApproxLocation,CES4.0Score,CES4.0Percentile,CES4.0PercRange,Ozone,OzonePctl,PM2.5,PM2.5.Pctl,DieselPM,DieselPMPctl,DrinkingWater,DrinkingWaterPctl,Lead,LeadPctl,Pesticides,PesticidesPctl,ToxRelease,ToxReleasePctl,Traffic,TrafficPctl,CleanupSites,CleanupSitesPctl,GroundwaterThreats,GroundwaterThreatsPctl,HazWaste,HazWastePctl,ImpWaterBodies,ImpWaterBodiesPctl,SolidWaste,SolidWastePctl,PollutionBurden,PollutionBurdenScore,PollutionBurdenPctl,Asthma,AsthmaPctl,LowBirthWeight,LowBirthWeightPctl,CardiovascularDisease,CardiovascularDiseasePctl,TotalPop,ChildrenPercLess10,PopPerc10to64,ElderlyMore64,HispanicPerc,WhitePerc,AfAmericanPerc,NativeAmericanPerc,AsianAmericanPerc,OtherMultiplePerc,PopChar,PopCharScore,PopCharPctl,Education,EducationPctl,LinguisticIsol,LinguisticIsolPctl,Poverty,PovertyPctl,Unemployment,UnemploymentPctl,HousingBurden,HousingBurdenPctl`

# It should be a red flag to see that there is only one column, with a name that looks like: CensusTract,CaliforniaCounty,ZIP,Longitude,Latitude,ApproxLocation,...
# This file is comma delimited, not tab delimited! Using delim = "," (or read_csv) reads it correctly.
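The effect of the wrong delimiter can be reproduced on a small scale. This sketch uses made-up CSV text (hypothetical columns tract and county) in place of the lab URL, and compares the number of columns for each delimiter.

```r
library(readr)

# Made-up comma-delimited text, used only for illustration
csv_text <- I("tract,county\n1,Alameda\n2,Alameda\n")

# Wrong delimiter: no tabs are found, so each line becomes one column
wrong <- read_delim(csv_text, delim = "\t", show_col_types = FALSE)

# Correct delimiter: the commas split each line into two columns
right <- read_delim(csv_text, delim = ",", show_col_types = FALSE)

ncol(wrong)
names(wrong)
ncol(right)
names(right)
```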

P.3

Try reading in some data on your computer using any method we discussed!

Data Input Lab - Key
