Some data in this lab comes from the OCS “Exploring CO2 emissions across time” activity (https://www.opencasestudies.org/ocs-bp-co2-emissions/).

Additional data about climate change disasters can be found at “https://daseh.org/data/Yearly_CC_Disasters_total_affected.csv”. This dataset shows the total number of people affected by climate disasters by country and year.

library(tidyverse)

Part 1

1.1

Open the yearly_co2_emissions dataset. Use read_csv() from the tidyverse / readr package. You can download the data or use this URL directly: https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv

Check out the data to understand the format.

co2 <- read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")

## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(co2)

## # A tibble: 6 × 265
##   country  `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
##   <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Afghani…     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 2 Albania      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 3 Algeria      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 4 Andorra      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 5 Angola       NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 6 Antigua…     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## # ℹ 254 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>, `1764` <dbl>,
## #   `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>, `1769` <dbl>,
## #   `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>,
## #   `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>,
## #   `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>,
## #   `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>,
## #   `1790` <dbl>, `1791` <dbl>, `1792` <dbl>, `1793` <dbl>, `1794` <dbl>, …

1.2

Create a new dataset co2_long that does pivot_longer() on all columns except “country”. Remember that !country means all columns except “country”.

Reassign to co2_long.

# General format
new_data <- old_data |> pivot_longer(cols = colname(s))

co2_long <- co2 |> pivot_longer(cols = !country)

1.3

Using co2_long, filter the “country” column so it only includes values from Indonesia and Canada. Hint: use filter() and %in%.

Name the new object co2_long_sub.

# General format
new_data <- old_data |> filter(colname %in% c(...))

co2_long_sub <- co2_long |>
  filter(country %in% c("Indonesia", "Canada"))

1.4

Use pivot_wider() to reshape co2_long_sub. Use “country” for the names_from argument. Use “value” for the values_from argument.

Call this new data co2_wide.

# General format
new_data <- old_data |> pivot_wider(names_from = column1, values_from = column2)

co2_wide <- co2_long_sub |>
  pivot_wider(
    names_from = country,
    values_from = value
  )

1.5

Using co2_wide, drop all NA values using drop_na().

Reassign to co2_wide. Compare the years - what conclusions can you draw, comparing the two countries?

co2_wide <- co2_wide |> drop_na()
tail(co2_wide)

## # A tibble: 6 × 3
##   name  Canada Indonesia
##   <chr>  <dbl>     <dbl>
## 1 2009  537000    446000
## 2 2010  535000    429000
## 3 2011  537000    604000
## 4 2012  517000    637000
## 5 2013  517000    490000
## 6 2014  537000    464000

# Indonesia emissions are lower to start, but become closer in value to Canada over time. However, Indonesia has many more people!

Practice on Your Own!

P.1

Take the code from Questions 1.1 - 1.5. Chain all of this code together using the pipe |>. Call your data co2_compare.

co2_compare <-
  read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") |>
  pivot_longer(cols = !country) |>
  filter(country %in% c("Indonesia", "Canada")) |>
  pivot_wider(names_from = country, values_from = value) |> 
  drop_na()

## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

co2_compare

## # A tibble: 126 × 3
##    name  Canada Indonesia
##    <chr>  <dbl>     <dbl>
##  1 1889   11300      3.67
##  2 1890   12400     14.7 
##  3 1891   14000     22   
##  4 1892   13900    180   
##  5 1893   14900    403   
##  6 1894   14200    480   
##  7 1895   13500    770   
##  8 1896   14700    862   
##  9 1897   14800   1110   
## 10 1898   15700   1290   
## # ℹ 116 more rows

P.2

Modify the code from Question P.1:

Select different countries to compare
Call your data co2_compare2

co2_compare2 <-
  read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") |>
  pivot_longer(cols = !country) |>
  filter(country %in% c("Brazil", "Mexico")) |>
  pivot_wider(names_from = country, values_from = value) |> 
  drop_na()

## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

co2_compare2

## # A tibble: 114 × 3
##    name  Brazil Mexico
##    <chr>  <dbl>  <dbl>
##  1 1901    2100   1310
##  2 1902    2510   1430
##  3 1903    2440   1560
##  4 1904    2620   1720
##  5 1905    2800   1710
##  6 1906    3210   1920
##  7 1907    3450   2270
##  8 1908    3600   3770
##  9 1909    3610   3700
## 10 1910    4200   4160
## # ℹ 104 more rows

P.3

Add on to Question P.1 to add a new column. The new column values for each line should be “TRUE” if Canada has greater emissions than Indonesia for that year.

co2_compare <-
  co2_compare |> 
  mutate(Canada_greater = Canada > Indonesia)

co2_compare

## # A tibble: 126 × 4
##    name  Canada Indonesia Canada_greater
##    <chr>  <dbl>     <dbl> <lgl>         
##  1 1889   11300      3.67 TRUE          
##  2 1890   12400     14.7  TRUE          
##  3 1891   14000     22    TRUE          
##  4 1892   13900    180    TRUE          
##  5 1893   14900    403    TRUE          
##  6 1894   14200    480    TRUE          
##  7 1895   13500    770    TRUE          
##  8 1896   14700    862    TRUE          
##  9 1897   14800   1110    TRUE          
## 10 1898   15700   1290    TRUE          
## # ℹ 116 more rows

Part 2

2.1

Using the co2_long object, rename the columns using rename():

change name to year
change value to emissions

Reassign to co2_long.

# General format
new_data <- old_data |> rename(newcol1 = oldcol1, newcol2 = oldcol2)

co2_long <- co2_long |> rename(year = name, emissions = value)

2.2

Read in the climate change disasters dataset from the URL below. Save the dataset as an object called cc.

“https://daseh.org/data/Yearly_CC_Disasters_total_affected.csv”

This dataset shows the total number of people affected by climate disasters by country and year.

cc <- read_csv("https://daseh.org/data/Yearly_CC_disasters_total_affected.csv")

## Rows: 9720 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country
## dbl (2): year, total_affected
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2.3

For the co2_long dataset, use mutate() to make the “year” column numeric.

Reassign to co2_long.

co2_long <- co2_long |> mutate(year = as.numeric(year))

2.4

Join the datasets co2_long and cc by “year” and “country”. Keep all rows, even if there is no match. How many rows are there?

Call the dataset fj.

# General format
full_join(data1, data2, by = "")

fj <- full_join(co2_long, cc, by = c("year", "country"))
nrow(fj)

## [1] 55823

# 55,823 rows

2.5

Join the datasets co2_long and cc by “year” and “country”. Keep only rows in both datasets. How many rows are there?

Call the dataset ij.

# General format
inner_join(data1, data2, by = "")

ij <- inner_join(co2_long, cc, by = c("year", "country"))
nrow(ij)

## [1] 4585

# 4,585 rows

Practice on Your Own!

P.4

Inspect fj and ij. Why is there such a big difference between the row counts?

# There are many more years in the "co2_long" dataset than are present in the "cc" dataset. Could check out range to do this:
co2_long |> pull(year) |> range()

## [1] 1751 2014

cc |> pull(year) |> range()

## [1] 1980 2024

# Additionally, there are several indexes that don't match for country
# e.g., "Afghanistan" does not match with "Afghanistan, Islamic Rep. of"
# Perhaps the dataset could be cleaned up to account for these!

P.5

What happens if you leave out step 2.3? Try running all chunks, but place a # on line 183.

# the year column in co2_long is a character type if we don't take the step in 2.3. This prevents joining as the columns joined must be the same class.

Manipulating Data in R Lab - Key

Part 1

1.1

1.2

1.3

1.4

1.5

Practice on Your Own!

P.1

P.2

P.3

Part 2

2.1

2.2

2.3

2.4

2.5

Practice on Your Own!

P.4

P.5