Some data in this lab comes from the OCS “Exploring CO2 emissions across time” activity (https://www.opencasestudies.org/ocs-bp-co2-emissions/).
Additional data about climate change disasters can be found at “https://daseh.org/data/Yearly_CC_Disasters_total_affected.csv”. This dataset shows the total number of people affected by climate disasters by country and year.
library(tidyverse)
Open the yearly_co2_emissions
dataset. Use read_csv()
from the tidyverse
/ readr
package. You can download the data or use this URL directly: https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv
Check out the data to understand the format.
co2 <- read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(co2)
## # A tibble: 6 × 265
## country `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghani… NA NA NA NA NA NA NA NA NA NA
## 2 Albania NA NA NA NA NA NA NA NA NA NA
## 3 Algeria NA NA NA NA NA NA NA NA NA NA
## 4 Andorra NA NA NA NA NA NA NA NA NA NA
## 5 Angola NA NA NA NA NA NA NA NA NA NA
## 6 Antigua… NA NA NA NA NA NA NA NA NA NA
## # ℹ 254 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>, `1764` <dbl>,
## # `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>, `1769` <dbl>,
## # `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>,
## # `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>,
## # `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>,
## # `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>,
## # `1790` <dbl>, `1791` <dbl>, `1792` <dbl>, `1793` <dbl>, `1794` <dbl>, …
Create a new dataset co2_long
that does pivot_longer()
on all columns except “country”. Remember that !country
means all columns except “country”.
Reassign to co2_long
.
# General format
new_data <- old_data |> pivot_longer(cols = colname(s))
co2_long <- co2 |> pivot_longer(cols = !country)
Using co2_long
, filter the “country” column so it only includes values from Indonesia and Canada. Hint: use filter()
and %in%
.
Name the new object co2_long_sub
.
# General format
new_data <- old_data |> filter(colname %in% c(...))
co2_long_sub <- co2_long |>
filter(country %in% c("Indonesia", "Canada"))
Use pivot_wider()
to reshape co2_long_sub
. Use “country” for the names_from
argument. Use “value” for the values_from
argument.
Call this new data co2_wide
.
# General format
new_data <- old_data |> pivot_wider(names_from = column1, values_from = column2)
co2_wide <- co2_long_sub |>
pivot_wider(
names_from = country,
values_from = value
)
Using co2_wide
, drop all NA values using drop_na()
.
Reassign to co2_wide
. Compare the years - what conclusions can you draw, comparing the two countries?
co2_wide <- co2_wide |> drop_na()
tail(co2_wide)
## # A tibble: 6 × 3
## name Canada Indonesia
## <chr> <dbl> <dbl>
## 1 2009 537000 446000
## 2 2010 535000 429000
## 3 2011 537000 604000
## 4 2012 517000 637000
## 5 2013 517000 490000
## 6 2014 537000 464000
# Indonesia emissions are lower to start, but become closer in value to Canada over time. However, Indonesia has many more people!
Take the code from Questions 1.1 - 1.5. Chain all of this code together using the pipe |>
. Call your data co2_compare
.
co2_compare <-
read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") |>
pivot_longer(cols = !country) |>
filter(country %in% c("Indonesia", "Canada")) |>
pivot_wider(names_from = country, values_from = value) |>
drop_na()
## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
co2_compare
## # A tibble: 126 × 3
## name Canada Indonesia
## <chr> <dbl> <dbl>
## 1 1889 11300 3.67
## 2 1890 12400 14.7
## 3 1891 14000 22
## 4 1892 13900 180
## 5 1893 14900 403
## 6 1894 14200 480
## 7 1895 13500 770
## 8 1896 14700 862
## 9 1897 14800 1110
## 10 1898 15700 1290
## # ℹ 116 more rows
Modify the code from Question P.1:
co2_compare2
co2_compare2 <-
read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") |>
pivot_longer(cols = !country) |>
filter(country %in% c("Brazil", "Mexico")) |>
pivot_wider(names_from = country, values_from = value) |>
drop_na()
## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
co2_compare2
## # A tibble: 114 × 3
## name Brazil Mexico
## <chr> <dbl> <dbl>
## 1 1901 2100 1310
## 2 1902 2510 1430
## 3 1903 2440 1560
## 4 1904 2620 1720
## 5 1905 2800 1710
## 6 1906 3210 1920
## 7 1907 3450 2270
## 8 1908 3600 3770
## 9 1909 3610 3700
## 10 1910 4200 4160
## # ℹ 104 more rows
Add on to Question P.1 to add a new column. The new column values for each line should be “TRUE” if Canada has greater emissions than Indonesia for that year.
co2_compare <-
co2_compare |>
mutate(Canada_greater = Canada > Indonesia)
co2_compare
## # A tibble: 126 × 4
## name Canada Indonesia Canada_greater
## <chr> <dbl> <dbl> <lgl>
## 1 1889 11300 3.67 TRUE
## 2 1890 12400 14.7 TRUE
## 3 1891 14000 22 TRUE
## 4 1892 13900 180 TRUE
## 5 1893 14900 403 TRUE
## 6 1894 14200 480 TRUE
## 7 1895 13500 770 TRUE
## 8 1896 14700 862 TRUE
## 9 1897 14800 1110 TRUE
## 10 1898 15700 1290 TRUE
## # ℹ 116 more rows
Using the co2_long
object, rename the columns using rename()
:
name
to year
value
to emissions
Reassign to co2_long
.
# General format
new_data <- old_data |> rename(newcol1 = oldcol1, newcol2 = oldcol2)
co2_long <- co2_long |> rename(year = name, emissions = value)
Read in the climate change disasters dataset from the URL below. Save the dataset as an object called cc
.
“https://daseh.org/data/Yearly_CC_Disasters_total_affected.csv”
This dataset shows the total number of people affected by climate disasters by country and year.
cc <- read_csv("https://daseh.org/data/Yearly_CC_disasters_total_affected.csv")
## Rows: 9720 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country
## dbl (2): year, total_affected
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
For the co2_long
dataset, use mutate()
to make the “year” column numeric.
Reassign to co2_long
.
co2_long <- co2_long |> mutate(year = as.numeric(year))
Join the datasets co2_long
and cc
by “year” and “country”. Keep all rows, even if there is no match. How many rows are there?
Call the dataset fj
.
# General format
full_join(data1, data2, by = "")
fj <- full_join(co2_long, cc, by = c("year", "country"))
nrow(fj)
## [1] 55823
# 55,823 rows
Join the datasets co2_long
and cc
by “year” and “country”. Keep only rows in both datasets. How many rows are there?
Call the dataset ij
.
# General format
inner_join(data1, data2, by = "")
ij <- inner_join(co2_long, cc, by = c("year", "country"))
nrow(ij)
## [1] 4585
# 4,585 rows
Inspect fj
and ij
. Why is there such a big difference between the row counts?
# There are many more years in the "co2_long" dataset than are present in the "cc" dataset. Could check out range to do this:
co2_long |> pull(year) |> range()
## [1] 1751 2014
cc |> pull(year) |> range()
## [1] 1980 2024
# Additionally, there are several indexes that don't match for country
# e.g., "Afghanistan" does not match with "Afghanistan, Islamic Rep. of"
# Perhaps the dataset could be cleaned up to account for these!
What happens if you leave out step 2.3? Try running all chunks, but place a #
on line 183.
# the year column in co2_long is a character type if we don't take the step in 2.3. This prevents joining as the columns joined must be the same class.