Homework is optional, but we recommend it so you can get the most out of this course.
## you can add more, or change...these are suggestions
library(tidyverse)
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
1. Create the following two objects.
bday <- "19-Feb"
name <- "Bruce Wayne"
2. Make an object “me” that is “bday” and “name” combined.
me <- c(bday, name)
3. Determine the data class for “me”.
class(me)
## [1] "character"
# The class for "me" is "character"
4. If I want to do me / 2, I get the following error:
Error in me/2 : non-numeric argument to binary operator
Why? Write your answer as a comment inside the R chunk below.
# R cannot perform math functions on character data classes (types).
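To illustrate the answer to question 4, here is a minimal sketch using made-up values that are not part of the homework. It shows that c() coerces mixed inputs to character, and that text which looks like a number can be converted with as.numeric() before doing math.

```r
# Hypothetical values for illustration only
mixed <- c(10, "20")   # combining a number with text coerces everything to character
class(mixed)           # "character"

# mixed / 2 would fail with "non-numeric argument to binary operator",
# so convert to numeric first and then divide
halved <- as.numeric(mixed) / 2
halved
```

This conversion only makes sense when the text really represents numbers. For values like "19-Feb" or "Bruce Wayne", as.numeric() returns NA with a warning.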
The following questions involve an outside dataset.
We will be working with a dataset from the “Kaggle” website, which hosts competitions for prediction and machine learning. This particular dataset contains information about temperature measures from the Rover Environmental Monitoring Station (REMS) on Mars. These data are collected by Spain and Finland. More details on this dataset are here: https://www.kaggle.com/datasets/deepcontractor/mars-rover-environmental-monitoring-station/data.
5. Bring the dataset into R. The dataset is located at: https://daseh.org/data/kaggleMars_Dataset.csv. You can use the link, download it, or use whatever method you like for getting the file. Once you get the file, read the dataset in using read_csv() and assign it the name mars.
mars <- read_csv(file = "https://daseh.org/data/kaggleMars_Dataset.csv")
## Rows: 3197 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): earth_date, mars_date, UV_Radiation, weather
## dbl (7): earth_year, solar_day, max_ground_temp, min_ground_temp, max_air_t...
## time (2): sunrise, sunset
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# OR
mars <- read_csv("https://daseh.org/data/kaggleMars_Dataset.csv")
## Rows: 3197 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): earth_date, mars_date, UV_Radiation, weather
## dbl (7): earth_year, solar_day, max_ground_temp, min_ground_temp, max_air_t...
## time (2): sunrise, sunset
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# OR
url <- "https://daseh.org/data/kaggleMars_Dataset.csv"
mars <- read_csv(file = url)
## Rows: 3197 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): earth_date, mars_date, UV_Radiation, weather
## dbl (7): earth_year, solar_day, max_ground_temp, min_ground_temp, max_air_t...
## time (2): sunrise, sunset
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# OR
# download.file() replaces an existing destfile by default; it has no "overwrite" argument
download.file(
url = "https://daseh.org/data/kaggleMars_Dataset.csv",
destfile = "mars_data.csv",
mode = "wb"
)
mars <- read_csv(file = "mars_data.csv")
## Rows: 3197 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): earth_date, mars_date, UV_Radiation, weather
## dbl (7): earth_year, solar_day, max_ground_temp, min_ground_temp, max_air_t...
## time (2): sunrise, sunset
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
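The message printed by read_csv() suggests specifying the column types or setting show_col_types = FALSE. As a rough sketch of the first option, the example below reads a made-up three-column excerpt (not the real file) and declares each column's type up front. The object names csv_text and mars_small are hypothetical.

```r
library(readr)

# Hypothetical two-row excerpt standing in for the real CSV file
csv_text <- "earth_year,max_air_temp,weather\n2022,10,Sunny\n2021,-3,Sunny\n"

# I() tells readr to treat the string as literal data rather than a file path
mars_small <- read_csv(
  I(csv_text),
  col_types = cols(
    earth_year = col_double(),
    max_air_temp = col_double(),
    weather = col_character()
  )
)
mars_small
```

Declaring the types also makes the import fail loudly if a column stops looking the way you expect. Passing show_col_types = FALSE instead keeps the automatic guessing and only hides the message.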
6. Import the data “dictionary” from https://daseh.org/data/kaggleMars_dictionary.txt. Use the read_tsv() function and assign it the name “key”.
key <- read_tsv(file = "https://daseh.org/data/kaggleMars_dictionary.txt")
## Rows: 12 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): earth_year, Year on Earth
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# OR
# download.file() replaces an existing destfile by default; it has no "overwrite" argument
download.file(
url = "https://daseh.org/data/kaggleMars_dictionary.txt",
destfile = "dict.txt",
mode = "wb"
)
key <- read_tsv("dict.txt")
## Rows: 12 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): earth_year, Year on Earth
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
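The column specification above lists earth_year and Year on Earth as the column names. This suggests (an assumption, since the file contents are not shown here) that the dictionary has no header row and its first entry was used as the header. If that is the case, supplying column names keeps the first entry as data. The sketch below uses a made-up two-line dictionary; the names variable and description and the second description are hypothetical.

```r
library(readr)

# Hypothetical tab-separated dictionary without a header row
dict_text <- "earth_year\tYear on Earth\nsolar_day\tSol number\n"

key_small <- read_tsv(
  I(dict_text),
  col_names = c("variable", "description"),  # hypothetical column names
  show_col_types = FALSE
)
key_small
```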
7. You should now be ready to work with the “mars” dataset. Examine its structure using str(). Write your answer as a comment inside the R chunk below.
str(mars)
## spc_tbl_ [3,197 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ earth_year : num [1:3197] 2022 2022 2022 2022 2022 ...
## $ earth_date : chr [1:3197] "01-26 UTC" "01-25 UTC" "01-24 UTC" "01-23 UTC" ...
## $ mars_date : chr [1:3197] "Mars, Month 6 - LS 163deg" "Mars, Month 6 - LS 163deg" "Mars, Month 6 - LS 162deg" "Mars, Month 6 - LS 162deg" ...
## $ solar_day : num [1:3197] 3368 3367 3366 3365 3364 ...
## $ max_ground_temp: num [1:3197] -3 -3 -4 -6 -7 -8 -4 -6 -6 -9 ...
## $ min_ground_temp: num [1:3197] -71 -72 -70 -70 -71 -71 -72 -70 -71 -71 ...
## $ max_air_temp : num [1:3197] 10 10 8 9 8 8 5 5 3 5 ...
## $ min_air_temp : num [1:3197] -84 -87 -81 -91 -92 -80 -84 -73 -89 -80 ...
## $ mean_pressure : num [1:3197] 707 707 708 707 708 707 706 705 707 708 ...
## $ sunrise : 'hms' num [1:3197] 05:25:00 05:25:00 05:25:00 05:26:00 ...
## ..- attr(*, "units")= chr "secs"
## $ sunset : 'hms' num [1:3197] 17:20:00 17:20:00 17:21:00 17:21:00 ...
## ..- attr(*, "units")= chr "secs"
## $ UV_Radiation : chr [1:3197] "moderate" "moderate" "moderate" "moderate" ...
## $ weather : chr [1:3197] "Sunny" "Sunny" "Sunny" "Sunny" ...
## - attr(*, "spec")=
## .. cols(
## .. earth_year = col_double(),
## .. earth_date = col_character(),
## .. mars_date = col_character(),
## .. solar_day = col_double(),
## .. max_ground_temp = col_double(),
## .. min_ground_temp = col_double(),
## .. max_air_temp = col_double(),
## .. min_air_temp = col_double(),
## .. mean_pressure = col_double(),
## .. sunrise = col_time(format = ""),
## .. sunset = col_time(format = ""),
## .. UV_Radiation = col_character(),
## .. weather = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
#spc_tbl_ [3,197 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
# $ earth_year : num [1:3197] 2022 2022 2022 2022 2022 ...
# $ earth_date : chr [1:3197] "01-26 UTC" "01-25 UTC" "01-24 UTC" "01-23 UTC" ...
# $ mars_date : chr [1:3197] "Mars, Month 6 - LS 163deg" "Mars, Month 6 - LS 163deg" "Mars, Month 6 - LS 162deg" "Mars, Month 6 - LS 162deg" ...
# $ solar_day : num [1:3197] 3368 3367 3366 3365 3364 ...
# $ max_ground_temp: num [1:3197] -3 -3 -4 -6 -7 -8 -4 -6 -6 -9 ...
# $ min_ground_temp: num [1:3197] -71 -72 -70 -70 -71 -71 -72 -70 -71 -71 ...
# $ max_air_temp : num [1:3197] 10 10 8 9 8 8 5 5 3 5 ...
# $ min_air_temp : num [1:3197] -84 -87 -81 -91 -92 -80 -84 -73 -89 -80 ...
# $ mean_pressure : num [1:3197] 707 707 708 707 708 707 706 705 707 708 ...
# $ sunrise : 'hms' num [1:3197] 05:25:00 05:25:00 05:25:00 05:26:00 ...
# ..- attr(*, "units")= chr "secs"
# $ sunset : 'hms' num [1:3197] 17:20:00 17:20:00 17:21:00 17:21:00 ...
# ..- attr(*, "units")= chr "secs"
# $ UV_Radiation : chr [1:3197] "moderate" "moderate" "moderate" "moderate" ...
# $ weather : chr [1:3197] "Sunny" "Sunny" "Sunny" "Sunny" ...
# - attr(*, "spec")=
# .. cols(
# .. earth_year = col_double(),
# .. earth_date = col_character(),
# .. mars_date = col_character(),
# .. solar_day = col_double(),
# .. max_ground_temp = col_double(),
# .. min_ground_temp = col_double(),
# .. max_air_temp = col_double(),
# .. min_air_temp = col_double(),
# .. mean_pressure = col_double(),
# .. sunrise = col_time(format = ""),
# .. sunset = col_time(format = ""),
# .. UV_Radiation = col_character(),
# .. weather = col_character()
# .. )
# - attr(*, "problems")=<externalptr>
8. How many data points (rows) are in the dataset? How many variables (columns) are recorded for each data point?
dim(mars)
## [1] 3197 13
nrow(mars)
## [1] 3197
# There are 3197 data points in the dataset and 13 variables.
9. Filter out (i.e., remove) measurements from earlier than 2015 (according to the Earth year), as well as any rows with missing data (NA). Replace the original “mars” object by reassigning the new filtered dataset to “mars”. How many data points are left after filtering?
Hint: use drop_na() to remove rows with missing values.
mars <- drop_na(mars)
mars <- filter(mars, earth_year > 2014)
nrow(mars)
## [1] 2393
# OR
mars <- mars %>% drop_na() %>% filter(earth_year > 2014)
nrow(mars)
## [1] 2393
# There are 2393 measurements left after filtering by year.
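To see what the two steps in question 9 do on their own, here is a small sketch with a made-up data frame (not the real Mars data). Note that drop_na() with no column arguments removes a row if any of its columns is missing.

```r
library(dplyr)
library(tidyr)

# Hypothetical mini dataset for illustration only
toy <- data.frame(
  earth_year = c(2013, 2015, 2016, 2017),
  max_air_temp = c(1, NA, -5, 3),
  weather = c("Sunny", "Sunny", NA, "Sunny")
)

toy_clean <- toy %>%
  drop_na() %>%               # removes the 2015 and 2016 rows (each has an NA)
  filter(earth_year > 2014)   # removes the 2013 row

nrow(toy_clean)
```

Only the 2017 row survives both steps. To drop rows based on missing values in specific columns only, column names can be passed to drop_na(), for example drop_na(max_air_temp).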
10. From this point on, work with the filtered “mars” dataset from the above question. A Martian year is equivalent to 668.6 sols (or solar days). Create a new variable (column) called “years_since_landing” that shows how many Martian years the Curiosity rover had been on Mars for each measurement (divide “solar_day” by 668.6). Check to make sure the new column is there.
Hint: use the mutate() function.
mars <- mars %>% mutate(years_since_landing = solar_day / 668.6)
# OR
mars <- mutate(mars, years_since_landing = solar_day / 668.6)
colnames(mars)
## [1] "earth_year" "earth_date" "mars_date"
## [4] "solar_day" "max_ground_temp" "min_ground_temp"
## [7] "max_air_temp" "min_air_temp" "mean_pressure"
## [10] "sunrise" "sunset" "UV_Radiation"
## [13] "weather" "years_since_landing"
11. What is the range of the maximum ground temperature (“max_ground_temp”) of the dataset?
range(mars %>% pull(max_ground_temp))
## [1] -67 11
# OR
gtemp_max_range <- pull(mars, max_ground_temp)
range(gtemp_max_range)
## [1] -67 11
# OR
range(mars$max_ground_temp)
## [1] -67 11
table(mars$max_ground_temp)
##
## -67 -54 -53 -37 -35 -34 -33 -32 -31 -30 -29 -28 -27 -26 -25 -24 -23 -22 -21 -20
## 1 1 1 1 2 2 11 25 33 41 71 78 69 88 71 79 85 84 71 72
## -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0
## 56 47 46 59 56 77 70 77 65 71 51 68 89 97 72 81 75 64 68 59
## 1 2 3 4 5 6 7 8 10 11
## 36 31 30 24 18 7 7 4 1 1
# The range is -67 degrees Celsius to 11 degrees Celsius.
12. Create a random sample of atmospheric pressure readings from mars. To determine the column that corresponds to atmospheric pressure, check the “key” corresponding to the data dictionary that you imported above in question 6. Use sample() and pull(). Remember that by default random samples differ each time you run the code.
sample(pull(mars, mean_pressure), size = 20)
## [1] 814 858 861 846 813 863 732 726 842 874 730 848 902 864 856 860 871 771 889
## [20] 858
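Because random samples differ on each run, your numbers will probably not match the output above. If you want a sample you can reproduce, one common approach is to call set.seed() right before sample(). The sketch below uses made-up pressure values and an arbitrary seed.

```r
# Hypothetical pressure values for illustration only
pressure <- c(707, 708, 706, 705, 814, 858, 861, 846)

set.seed(123)   # any fixed integer works as the seed
first_draw <- sample(pressure, size = 3)

set.seed(123)   # resetting the same seed reproduces the same draw
second_draw <- sample(pressure, size = 3)

identical(first_draw, second_draw)
```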
13. How many data points are from days where the maximum ground temperature was at or above 0 degrees Celsius? What percent/proportion do these represent? Use: filter() and nrow(); group_by() and summarize(); or sum().
# How many data points are from days where the maximum ground temperature got above or equal to 0 degrees Celsius?
nrow(mars %>% filter(max_ground_temp >= 0))
## [1] 218
# OR
mars %>%
group_by(max_ground_temp >= 0) %>%
summarize(total = n())
## # A tibble: 2 × 2
## `max_ground_temp >= 0` total
## <lgl> <int>
## 1 FALSE 2175
## 2 TRUE 218
# OR
sum(mars$max_ground_temp >= 0)
## [1] 218
# OR
table(mars$max_ground_temp >= 0)
##
## FALSE TRUE
## 2175 218
# what percent/proportion do these represent?
nrow(mars %>% filter(max_ground_temp >= 0)) / nrow(mars)
## [1] 0.09109904
# OR
mean(mars$max_ground_temp >= 0, na.rm = TRUE)
## [1] 0.09109904
# There are 218 data points from days where the maximum ground temperature was at or above freezing (0 degrees Celsius). The percent of data points is 9.1%.
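The sum() and mean() approaches in question 13 work because R treats TRUE as 1 and FALSE as 0. A tiny sketch with made-up temperatures:

```r
# Hypothetical temperatures for illustration only
temps <- c(-5, 0, 3, -2)
at_or_above <- temps >= 0   # FALSE TRUE TRUE FALSE

sum(at_or_above)            # number of TRUE values
mean(at_or_above)           # proportion of TRUE values
mean(at_or_above) * 100     # the same proportion as a percent
```

Two of the four values meet the condition, so the proportion is 0.5.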
14. How many different UV radiation levels (“UV_Radiation”) are there?
Hint: use length() with unique() or table(). Remember to pull() the right column.
mars %>%
pull(UV_Radiation) %>%
unique() %>%
length()
## [1] 4
# OR
length(unique(mars %>% pull(UV_Radiation)))
## [1] 4
# OR
length(unique(mars$UV_Radiation))
## [1] 4
# OR
table(unique(mars$UV_Radiation))
##
## high low moderate very_high
## 1 1 1 1
# 4 unique levels.
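As another option for question 14, dplyr has n_distinct() for counting unique values and count() for showing how many rows fall in each level. A minimal sketch on a made-up column (not the real data):

```r
library(dplyr)

# Hypothetical UV levels for illustration only
uv <- data.frame(UV_Radiation = c("low", "high", "low", "moderate"))

# Number of different levels
n_levels <- uv %>%
  summarize(n_levels = n_distinct(UV_Radiation)) %>%
  pull(n_levels)
n_levels

# Number of rows per level
level_counts <- uv %>% count(UV_Radiation)
level_counts
```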
15. How many different weather conditions (“weather”) are reported?
mars %>%
pull(weather) %>%
unique() %>%
length()
## [1] 1
# 1 weather condition.
16. Which UV radiation level had the highest average maximum air temperature, and what was it?
Hint: Use group_by() with summarize().
mars %>%
group_by(UV_Radiation) %>%
summarize(mean = mean(max_air_temp))
## # A tibble: 4 × 2
## UV_Radiation mean
## <chr> <dbl>
## 1 high 5.66
## 2 low -11.3
## 3 moderate -1.38
## 4 very_high 12.5
17. Extend the code you wrote for question 16. Use the arrange() function to sort the output by maximum air temperature.
mars %>%
group_by(UV_Radiation) %>%
summarize(mean = mean(max_air_temp)) %>%
arrange(desc(mean))
## # A tibble: 4 × 2
## UV_Radiation mean
## <chr> <dbl>
## 1 very_high 12.5
## 2 high 5.66
## 3 moderate -1.38
## 4 low -11.3
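Questions 16 and 17 could be read as asking for either the average or the single highest value per group. The solution above uses the mean. The sketch below, on a made-up data frame, computes both in one summarize() call so you can compare them. The column names mean_temp and highest_temp are hypothetical.

```r
library(dplyr)

# Hypothetical mini dataset for illustration only
toy <- data.frame(
  UV_Radiation = c("low", "low", "high", "high"),
  max_air_temp = c(-10, 4, 2, 6)
)

by_uv <- toy %>%
  group_by(UV_Radiation) %>%
  summarize(
    mean_temp = mean(max_air_temp),    # average of the daily maximums
    highest_temp = max(max_air_temp)   # single highest daily maximum
  ) %>%
  arrange(desc(highest_temp))          # largest value first
by_uv
```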
18. How many measurements were taken on days when the UV radiation was “low” and the maximum air temperature was above freezing? Use: filter() and count(); filter() and tally(); or sum().
mars %>%
filter(UV_Radiation == "low" & max_air_temp > 0) %>%
tally()
## # A tibble: 1 × 1
## n
## <int>
## 1 3
# OR
mars %>%
filter(UV_Radiation == "low" & max_air_temp > 0) %>%
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 3
# OR
sum(mars$UV_Radiation == "low" & mars$max_air_temp > 0)
## [1] 3
# A total of 3 days.
19. On how many days was the UV radiation “high” or “very high”? Use: filter() and count(); filter() and tally(); or sum().
mars %>%
filter(UV_Radiation == "high" | UV_Radiation == "very_high") %>%
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 1125
# OR
mars %>%
filter(UV_Radiation %in% c("high", "very_high")) %>%
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 1125
# OR
sum(mars$UV_Radiation == "high" | mars$UV_Radiation == "very_high")
## [1] 1125
# OR
sum(mars$UV_Radiation %in% c("high", "very_high"))
## [1] 1125
# A total of 1125 days.
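The == with | approach and the %in% approach give the same count here because rows with missing values were already dropped in question 9. They can differ when NA values are present, as this small sketch with made-up values shows.

```r
# Hypothetical values, including one missing value
uv <- c("high", "low", "very_high", NA)

uv == "high" | uv == "very_high"   # comparing NA with == gives NA
uv %in% c("high", "very_high")     # %in% gives FALSE for NA

n_in <- sum(uv %in% c("high", "very_high"))
n_or <- sum(uv == "high" | uv == "very_high", na.rm = TRUE)   # na.rm needed to skip the NA
```

Without na.rm = TRUE, the sum() of the == version would itself be NA.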
20. Select all columns in “mars” where the column name starts with “min” (using select() and starts_with()). Then, use colMeans() to summarize across these columns.
mars %>%
select(starts_with("min")) %>%
colMeans()
## min_ground_temp min_air_temp
## -74.94317 -80.56080
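An alternative to select() with colMeans() that stays within dplyr is summarize() combined with across(). This is only a sketch on a made-up data frame, not the expected homework answer.

```r
library(dplyr)

# Hypothetical mini dataset for illustration only
toy <- data.frame(
  min_ground_temp = c(-70, -72),
  min_air_temp = c(-80, -90),
  max_air_temp = c(5, 10)
)

# Apply mean() to every column whose name starts with "min"
min_means <- toy %>%
  summarize(across(starts_with("min"), mean))
min_means
```

The result is a one-row data frame rather than a named vector, which can be easier to pipe into later steps.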
21. Using “mars”, create a new binary (TRUEs and FALSEs) column to indicate if the day’s maximum air temperature was above freezing. Call the new column “above_freezing”.
mars <- mars %>% mutate(above_freezing = (max_air_temp > 0))
22. What is the average atmospheric pressure for days that have an air temperature above freezing and UV radiation level of “moderate”? How does this compare with days that do NOT fit these criteria?
mean_mod_warm <- mars %>%
filter(above_freezing == TRUE & UV_Radiation == "moderate") %>%
summarize(mean = mean(mean_pressure)) %>%
pull()
mean_not_mod_warm <- mars %>%
filter(above_freezing != TRUE | UV_Radiation != "moderate") %>%
summarize(mean = mean(mean_pressure)) %>%
pull()
# Days that are above freezing with UV level of "moderate" have an average atmospheric pressure of 820.7 Pa while days not fitting these criteria have an average atmospheric pressure of 826.6 Pa.
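Another way to make the comparison in question 22 is to build one logical column for "fits the criteria" and group by it, which gives both means in a single table. A sketch on a made-up data frame; the column name mod_warm is hypothetical.

```r
library(dplyr)

# Hypothetical mini dataset for illustration only
toy <- data.frame(
  above_freezing = c(TRUE, TRUE, FALSE, TRUE),
  UV_Radiation = c("moderate", "high", "moderate", "moderate"),
  mean_pressure = c(800, 850, 900, 820)
)

comparison <- toy %>%
  mutate(mod_warm = above_freezing & UV_Radiation == "moderate") %>%
  group_by(mod_warm) %>%
  summarize(mean = mean(mean_pressure))
comparison
```

Rows where mod_warm is FALSE are exactly the rows kept by the "not above freezing OR not moderate" filter, so the two approaches describe the same split.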
23. Among days with a “moderate” UV level that are above freezing, what is the distribution of the earth year in which these days occurred?
mod_warm <- mars %>% filter(UV_Radiation == "moderate" & above_freezing == TRUE)
mod_warm %>%
group_by(earth_year) %>%
select(earth_year) %>%
table()
## earth_year
## 2015 2016 2017 2018 2019 2020 2021 2022
## 41 31 6 74 72 152 126 17
# OR
mod_warm <- mars %>% filter(UV_Radiation == "moderate" & above_freezing == TRUE)
mod_warm %>%
group_by(earth_year) %>%
count()
## # A tibble: 8 × 2
## # Groups: earth_year [8]
## earth_year n
## <dbl> <int>
## 1 2015 41
## 2 2016 31
## 3 2017 6
## 4 2018 74
## 5 2019 72
## 6 2020 152
## 7 2021 126
## 8 2022 17
# OR
mod_warm <- mars %>% filter(UV_Radiation == "moderate" & above_freezing == TRUE)
mod_warm %>%
group_by(earth_year) %>%
tally()
## # A tibble: 8 × 2
## earth_year n
## <dbl> <int>
## 1 2015 41
## 2 2016 31
## 3 2017 6
## 4 2018 74
## 5 2019 72
## 6 2020 152
## 7 2021 126
## 8 2022 17
24. How many days (using filter() or sum()) have a maximum ground or air temperature above zero and have a UV level of “high” or “very_high”?
sum((mars$max_ground_temp > 0| mars$max_air_temp > 0) & (mars$UV_Radiation == "high" | mars$UV_Radiation == "very_high"))
## [1] 886
# OR
sum((mars$max_ground_temp > 0| mars$max_air_temp > 0) & mars$UV_Radiation %in% c("high", "very_high"))
## [1] 886
# OR
mars %>%
filter((max_ground_temp > 0 | max_air_temp > 0) & UV_Radiation %in% c("high", "very_high")) %>%
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 886
# A total of 886 days.
25. Make a boxplot (boxplot()) that looks at earth year (“earth_year”) on the x-axis and minimum air temperature (“min_air_temp”) on the y-axis.
boxplot(mars %>% pull(min_air_temp) ~ mars %>% pull(earth_year))
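The same kind of boxplot can also be written with column names in the formula and a data argument, which gives shorter code and cleaner default axis labels. A sketch on a made-up data frame; the axis label text is only a suggestion.

```r
# Hypothetical mini dataset for illustration only
toy <- data.frame(
  earth_year = c(2015, 2015, 2015, 2016, 2016, 2016),
  min_air_temp = c(-80, -85, -90, -70, -75, -78)
)

# y ~ x: one box of min_air_temp values per earth_year
bp <- boxplot(
  min_air_temp ~ earth_year,
  data = toy,
  xlab = "Earth year",
  ylab = "Minimum air temperature"
)
bp$names   # the group labels shown on the x-axis
```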
26. Knit your document into a report.
You use the knit button to do this. Make sure all your code is working first!