Instructions

Homework is optional, but we recommend it so you can get the most out of this course.

## you can add more, or change...these are suggestions
library(tidyverse)
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)

Problem Set

1. Create the following two objects.

  1. Make an object “bday”. Assign it your birthday in day-month format (1-Jan).
  2. Make another object “name”. Assign it your name. Make sure to use quotation marks for anything with text!

2. Make an object “me” that is “bday” and “name” combined.

3. Determine the data class for “me”.

4. If I want to do me / 2 I get the following error: Error in me/2 : non-numeric argument to binary operator. Why? Write your answer as a comment inside the R chunk below.

The following questions involve an outside dataset.

We will be working with a dataset from the “Kaggle” website, which hosts competitions for prediction and machine learning. This particular dataset contains information about temperature measures from the Rover Environmental Monitoring Station (REMS) on Mars. These data are collected by Spain and Finland. More details on this dataset are here: https://www.kaggle.com/datasets/deepcontractor/mars-rover-environmental-monitoring-station/data.

5. Bring the dataset into R. The dataset is located at: https://daseh.org/data/kaggleMars_Dataset.csv. You can use the link, download it, or use whatever method you like for getting the file. Once you get the file, read the dataset in using read_csv() and assign it the name mars.

6. Import the data “dictionary” from https://daseh.org/data/kaggleMars_dictionary.txt. Use the read_tsv() function and assign it the name “key”.

7. You should now be ready to work with the “mars” dataset.

  1. Preview the data so that you can see the names of the columns. There are several possible functions to do this.
  2. Determine the class of the columns using str(). Write your answer as a comment inside the R chunk below.

8. How many data points (rows) are in the dataset? How many variables (columns) are recorded for each data point?

9. Filter out (i.e., remove) measurements from earlier than 2015 (according to the Earth year), as well as any rows with missing data (NA). Replace the original “mars” object by reassigning the new filtered dataset to “mars”. How many data points are left after filtering?

Hint: use drop_na() to remove rows with missing values.

10. From this point on, work with the filtered “mars” dataset from the above question. A Martian year is equivalent to 668.6 sols (or solar days). Create a new variable (column) called “years_since_landing” that shows how many Martian years the Curiosity rover had been on Mars for each measurement (divide “solar_day” by 668.6). Check to make sure the new column is there.

Hint: use the mutate() function.

11. What is the range of the maximum ground temperature (“max_ground_temp”) of the dataset?

12. Create a random sample with of atmospheric pressure readings from mars. To determine the column that corresponds to atmospheric pressure, check the “key” corresponding to the data dictionary that you imported above in question 6. Use sample() and pull(). Remember that by default random samples differ each time you run the code.

13. How many data points are from days where the maximum ground temperature got above 0 degrees Celsius? What percent/proportion do these represent? Use:

14. How many different UV radiation levels (“UV_Radiation”) are there?

Hint: use length() with unique() or table(). Remember to pull() the right column.

15. How many different weather conditions (“weather”) are reported?

16. Which UV radiation level had the highest maximum air temperature, and what was it?

Hint: Use group_by() with summarize().

17. Extend on the code you wrote for question 16. Use the arrange() function to sort the output by maximum air temperature.

18. How many measurements were taken on days when the UV radiation was “low” and the maximum air temperature was above freezing? Use:

19. How many days was the UV radiation was “high” or “very high”? use:

20. Select all columns in “mars” where the column names starts with “min” (using select() and starts_with(). Then, use colMeans() to summarize across these columns.

21. Using “mars”, create a new binary (TRUEs and FALSEs) column to indicate if the day’s maximum air temperature was above freezing. Call the new column “above_freezing”.

22. What is the average atmospheric pressure for days that have an air temperature above freezing and UV radiation level of “moderate”? How does this compare with days that do NOT fit these criteria?

23. Among days with a “moderate” UV level that are above freezing, what is the distribution of the earth year in which these days occurred?

24. How many days (using filter() or sum() ) have a maximum ground or air temperature above zero and have a UV level of “high” or “very_high”?

25. Make a boxplot (boxplot()) that looks at earth year (“earth_year”) on the x-axis and minimum air temperature (“min_air_temp”) on the y-axis.

26. Knit your document into a report.

You use the knit button to do this. Make sure all your code is working first!