x and X are different)
Use ?FUNCTION_NAME or help("FUNCTION_NAME") to look at the help file.
Set up your R Project.
File > New Project (or click the new project button) > New Directory > New Project > type a name and choose a location. Check that the folder is there!
Check out our resource here: https://daseh.org/resources/R_Projects.html
Load the package by adding “library(tidyverse)” below and running the code.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Use the manual import method (File > Import Dataset > From Text (readr)) to read in the CalEnviroScreen data from this URL:
https://daseh.org/data/CalEnviroScreen_data.csv
These data were collected by the California Office of Environmental Health Hazard Assessment to track environmental measures (like pollution, water contamination, etc.) that can impact human health.
What is the dataset object called? You can find this information in the Console or the Environment. Enter your answer as a comment using #.
# CalEnviroScreen_data
Preview the data by examining the Environment. How many observations and variables are there? Enter your answer as a comment using #.
# 8035 obs. of 67 variables
Download the data from https://daseh.org/data/CalEnviroScreen_data.csv and move the file to your project folder. Import the data by browsing for the file on your computer.
Download the data. Put the data in the project folder. File > Import Dataset > From Text (readr) > browse for the file > click “Update” and “Import”.
Read in the CalEnviroScreen data using read_csv and this URL: https://daseh.org/data/CalEnviroScreen_data.csv. Assign it to an object named ces. Use the code structure below.
# General format
OBJECT <- read_csv(FILE)
ces <- read_csv(file = "https://daseh.org/data/CalEnviroScreen_data.csv")
## Rows: 8035 Columns: 67
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): CaliforniaCounty, ApproxLocation, CES4.0PercRange
## dbl (64): CensusTract, ZIP, Longitude, Latitude, CES4.0Score, CES4.0Percenti...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
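The two hints at the end of that message can be tried out as follows. This is a minimal sketch, not part of the lab key: it uses a tiny stand-in CSV with made-up rows (wrapped in I() so readr treats the string as literal data) rather than the full download, and the object names csv_text, quiet, and typed are only illustrative.

```r
library(readr)

# Stand-in CSV text (hypothetical values) so the example runs offline;
# I() tells read_csv to treat the string as literal data, not a file path.
csv_text <- I("CensusTract,CaliforniaCounty,ZIP\n6001400100,Alameda,94704\n6001400200,Alameda,94618\n")

# Option 1: keep the guessed column types but silence the message
quiet <- read_csv(csv_text, show_col_types = FALSE)

# Option 2: state the column types explicitly
typed <- read_csv(
  csv_text,
  col_types = cols(
    CensusTract = col_double(),
    CaliforniaCounty = col_character(),
    ZIP = col_double()
  )
)

# Retrieve the full column specification, as the message suggests
spec(typed)
```

The same arguments should work with the URL in place of csv_text.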
Take a look at the data. Do these data objects (CalEnviroScreen_data and ces) appear to be the same? Why or why not?
# Yes, when we look in the RStudio environment, the two objects have the same dimensions. If we use the View() or str() functions, we can also see in more detail that the data are the same.
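The comparison can also be done in code. The sketch below is only an illustration: it uses two small stand-in data frames named a and b (hypothetical values) so it runs without the download. With the real data, CalEnviroScreen_data and ces would take their place, and all.equal() may then also report differences in attributes rather than in the values themselves.

```r
# Two small stand-in data frames (hypothetical values) in place of
# CalEnviroScreen_data and ces, which require the download.
a <- data.frame(CensusTract = c(6001400100, 6001400200), ZIP = c(94704, 94618))
b <- data.frame(CensusTract = c(6001400100, 6001400200), ZIP = c(94704, 94618))

# Number of rows and columns in each object
dim(a)
dim(b)

# TRUE when the two objects have the same dimensions
identical(dim(a), dim(b))

# TRUE when the contents match as well; otherwise a description of the differences
all.equal(a, b)
```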
Learn your working directory by running getwd(). This is where R will look for files unless you tell it otherwise.
getwd()
## [1] "/__w/DaSEH/DaSEH/modules/Data_Input/lab"
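To see what "where R will look for files" means in practice, the sketch below builds the full path R would use for a relative file name and checks whether the file is there. It assumes the CSV was saved as CalEnviroScreen_data.csv in the working directory; the variable names file_name and full_path are only illustrative.

```r
# Assumed file name: the CSV saved in the working directory
file_name <- "CalEnviroScreen_data.csv"

# Build the full path R would use for this relative file name
full_path <- file.path(getwd(), file_name)
full_path

# Check whether the file is actually there before trying to read it
file.exists(file_name)

# List the CSV files R can see in the working directory
list.files(pattern = "\\.csv$")
```

If file.exists() returns FALSE, the file is probably in a different folder than the one getwd() reports.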
Run the following code - is there a problem? How do you know?
ces2 <- read_delim("https://daseh.org/data/CalEnviroScreen_data.csv", delim = "\t")
## Rows: 8035 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): CensusTract,CaliforniaCounty,ZIP,Longitude,Latitude,ApproxLocation,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ces2
## # A tibble: 8,035 × 1
## CensusTract,CaliforniaCounty,ZIP,Longitude,Latitude,ApproxLocation,CES4.0Sc…¹
## <chr>
## 1 "6001400100,\"Alameda \",94704,-122.2319033,37.8675947,\"Oakland\",4.85,2.8,…
## 2 "6001400200,\"Alameda \",94618,-122.2495763,37.848171,\"Oakland\",4.88,2.87,…
## 3 "6001400300,\"Alameda \",94618,-122.2544365,37.8405983,\"Oakland\",11.2,15.9…
## 4 "6001400400,\"Alameda \",94609,-122.2574628,37.8482107,\"Oakland\",12.39,18.…
## 5 "6001400500,\"Alameda \",94609,-122.2647445,37.8485167,\"Oakland\",16.73,29.…
## 6 "6001400600,\"Alameda \",94609,-122.2648882,37.8419909,\"Oakland\",20.02,37.…
## 7 "6001400700,\"Alameda \",94608,-122.2723135,37.8417578,\"Oakland\",36.71,70.…
## 8 "6001400800,\"Alameda \",94608,-122.2833803,37.8454493,\"Oakland\",37.1,70.7…
## 9 "6001400900,\"Alameda \",94608,-122.2802437,37.8394669,\"Oakland\",40.71,76.…
## 10 "6001401000,\"Alameda \",94608,-122.2719625,37.831217,\"Oakland\",43.74,80.4…
## # ℹ 8,025 more rows
## # ℹ abbreviated name:
## # ¹`CensusTract,CaliforniaCounty,ZIP,Longitude,Latitude,ApproxLocation,CES4.0Score,CES4.0Percentile,CES4.0PercRange,Ozone,OzonePctl,PM2.5,PM2.5.Pctl,DieselPM,DieselPMPctl,DrinkingWater,DrinkingWaterPctl,Lead,LeadPctl,Pesticides,PesticidesPctl,ToxRelease,ToxReleasePctl,Traffic,TrafficPctl,CleanupSites,CleanupSitesPctl,GroundwaterThreats,GroundwaterThreatsPctl,HazWaste,HazWastePctl,ImpWaterBodies,ImpWaterBodiesPctl,SolidWaste,SolidWastePctl,PollutionBurden,PollutionBurdenScore,PollutionBurdenPctl,Asthma,AsthmaPctl,LowBirthWeight,LowBirthWeightPctl,CardiovascularDisease,CardiovascularDiseasePctl,TotalPop,ChildrenPercLess10,PopPerc10to64,ElderlyMore64,HispanicPerc,WhitePerc,AfAmericanPerc,NativeAmericanPerc,AsianAmericanPerc,OtherMultiplePerc,PopChar,PopCharScore,PopCharPctl,Education,EducationPctl,LinguisticIsol,LinguisticIsolPctl,Poverty,PovertyPctl,Unemployment,UnemploymentPctl,HousingBurden,HousingBurdenPctl`
# It should be a red flag to see that there is only one column, with a name that looks like: CensusTract,CaliforniaCounty,ZIP,Longitude,Latitude,ApproxLocation,...
# This file is comma delimited, not tab delimited!
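One way to confirm the diagnosis is to read the same text with both delimiters and compare the column counts. This is a small sketch under stated assumptions: csv_text is a stand-in with made-up rows (so it runs offline), and wrong and right are illustrative object names.

```r
library(readr)

# Stand-in comma-delimited text (hypothetical values) so the example runs offline
csv_text <- I("CensusTract,CaliforniaCounty,ZIP\n6001400100,Alameda,94704\n6001400200,Alameda,94618\n")

# Wrong delimiter: there are no tabs to split on, so each line stays in one column
wrong <- read_delim(csv_text, delim = "\t", show_col_types = FALSE)
ncol(wrong)

# Correct delimiter: the header splits into separate columns
right <- read_delim(csv_text, delim = ",", show_col_types = FALSE)
ncol(right)
names(right)
```

For the lab data, the equivalent fix would be to pass delim = "," to read_delim(), or simply to use read_csv().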
Try reading in some data on your computer using any method we discussed!