Some data in this lab comes from the OCS “Exploring CO2 emissions across time” activity (https://www.opencasestudies.org/ocs-bp-co2-emissions/.
Additional data about climate change disasters can be found at “https://daseh.org/data/Yearly_CC_Disasters.csv”.
library(tidyverse)
Open the yearly_co2_emissions
dataset. Use read_csv()
from the tidyverse
/ readr
package. You can download the data or use this URL directly: https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv
Check out the data to understand the format.
co2 <- read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(co2)
## # A tibble: 6 × 265
## country `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghani… NA NA NA NA NA NA NA NA NA NA
## 2 Albania NA NA NA NA NA NA NA NA NA NA
## 3 Algeria NA NA NA NA NA NA NA NA NA NA
## 4 Andorra NA NA NA NA NA NA NA NA NA NA
## 5 Angola NA NA NA NA NA NA NA NA NA NA
## 6 Antigua… NA NA NA NA NA NA NA NA NA NA
## # ℹ 254 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>, `1764` <dbl>,
## # `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>, `1769` <dbl>,
## # `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>,
## # `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>,
## # `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>,
## # `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>,
## # `1790` <dbl>, `1791` <dbl>, `1792` <dbl>, `1793` <dbl>, `1794` <dbl>, …
Create a new dataset “co2_long” that does pivot_longer()
on all columns except “country”. Remember that !country
means all columns except “country”.
Reassign to co2_long.
# General format
new_data <- old_data %>% pivot_longer(cols = colname(s))
co2_long <- co2 %>% pivot_longer(cols = !country)
Using co2_long
, filter the “country” column so it only includes values from Indonesia and Canada. Hint: use filter
and %in%
.
Reassign to co2_long.
# General format
new_data <- old_data %>% filter(colname %in% c(...))
co2_long <- co2_long %>%
filter(country %in% c("Indonesia", "Canada"))
Use pivot_wider
to reshape “co2_long”. Use “county” for the names_from
argument. Use “value” for the values_from
argument. Call this new data co2_wide
.
# General format
new_data <- old_data %>% pivot_wider(names_from = column1, values_from = column2)
co2_wide <- co2_long %>%
pivot_wider(
names_from = country,
values_from = value
)
Using co2_wide
, drop all NA values using drop_na()
.
Reassign to co2_wide. Compare the years - what conclusions can you draw?
co2_wide <- co2_wide %>% drop_na()
Tip: you can adjust scientific notation with options(scipen=<n>)
Take the code from Questions 1.1 - 1.5. Chain all of this code together using the pipe %>%
. Call your data co2_compare
.
co2_compare <-
read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") %>%
pivot_longer(cols = !country) %>%
filter(country %in% c("Indonesia", "Canada")) %>%
pivot_wider(names_from = country, values_from = value) %>%
drop_na()
## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
co2_compare
## # A tibble: 126 × 3
## name Canada Indonesia
## <chr> <dbl> <dbl>
## 1 1889 11300 3.67
## 2 1890 12400 14.7
## 3 1891 14000 22
## 4 1892 13900 180
## 5 1893 14900 403
## 6 1894 14200 480
## 7 1895 13500 770
## 8 1896 14700 862
## 9 1897 14800 1110
## 10 1898 15700 1290
## # ℹ 116 more rows
Modify the code from Question P.1:
co2_compare2
co2_compare2 <-
read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") %>%
pivot_longer(cols = !country) %>%
filter(country %in% c("Brazil", "Mexico")) %>%
pivot_wider(names_from = country, values_from = value) %>%
drop_na()
## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
co2_compare2
## # A tibble: 114 × 3
## name Brazil Mexico
## <chr> <dbl> <dbl>
## 1 1901 2100 1310
## 2 1902 2510 1430
## 3 1903 2440 1560
## 4 1904 2620 1720
## 5 1905 2800 1710
## 6 1906 3210 1920
## 7 1907 3450 2270
## 8 1908 3600 3770
## 9 1909 3610 3700
## 10 1910 4200 4160
## # ℹ 104 more rows
Add on to Question P.1 to add a column. The column values for each line should be “TRUE” if Canada has greater emissions than Indonesia for that year.
co2_compare <-
co2_compare %>%
mutate(Canada_greater = Canada > Indonesia)
co2_compare
## # A tibble: 126 × 4
## name Canada Indonesia Canada_greater
## <chr> <dbl> <dbl> <lgl>
## 1 1889 11300 3.67 TRUE
## 2 1890 12400 14.7 TRUE
## 3 1891 14000 22 TRUE
## 4 1892 13900 180 TRUE
## 5 1893 14900 403 TRUE
## 6 1894 14200 480 TRUE
## 7 1895 13500 770 TRUE
## 8 1896 14700 862 TRUE
## 9 1897 14800 1110 TRUE
## 10 1898 15700 1290 TRUE
## # ℹ 116 more rows
Open the Yearly_CC_Disasters
dataset using the url below. Save the dataset as an object called “cc”. rename the column “Country” to “country”.
“https://daseh.org/data/Yearly_CC_Disasters.csv”
cc <- read_csv("https://daseh.org/data/Yearly_CC_Disasters.csv") %>%
rename(country = Country)
## Rows: 970 Columns: 53
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): Country, ISO2, ISO3, Indicator, Unit, Source, CTS Code, CTS Name, ...
## dbl (44): ObjectId, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 19...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Join the datasets “co2” and “cc” by “country”. Keep all rows, even if there is no match. How many rows are there?
# General format
full_join(data1, data2, by = "")
full_join(co2, cc, by = "country")
## # A tibble: 1,032 × 317
## country `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan… NA NA NA NA NA NA NA NA NA NA
## 2 Albania NA NA NA NA NA NA NA NA NA NA
## 3 Albania NA NA NA NA NA NA NA NA NA NA
## 4 Albania NA NA NA NA NA NA NA NA NA NA
## 5 Albania NA NA NA NA NA NA NA NA NA NA
## 6 Albania NA NA NA NA NA NA NA NA NA NA
## 7 Albania NA NA NA NA NA NA NA NA NA NA
## 8 Albania NA NA NA NA NA NA NA NA NA NA
## 9 Algeria NA NA NA NA NA NA NA NA NA NA
## 10 Algeria NA NA NA NA NA NA NA NA NA NA
## # ℹ 1,022 more rows
## # ℹ 306 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>, `1764` <dbl>,
## # `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>, `1769` <dbl>,
## # `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>,
## # `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>,
## # `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>,
## # `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>, …
# 1,032 rows
Join the datasets “co2” and “cc” by “country”. Keep only rows that match both datasets. How many rows are there?
# General format
inner_join(data1, data2, by = "")
inner_join(co2, cc, by = "country")
## # A tibble: 628 × 317
## country `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Albania NA NA NA NA NA NA NA NA NA NA
## 2 Albania NA NA NA NA NA NA NA NA NA NA
## 3 Albania NA NA NA NA NA NA NA NA NA NA
## 4 Albania NA NA NA NA NA NA NA NA NA NA
## 5 Albania NA NA NA NA NA NA NA NA NA NA
## 6 Albania NA NA NA NA NA NA NA NA NA NA
## 7 Albania NA NA NA NA NA NA NA NA NA NA
## 8 Algeria NA NA NA NA NA NA NA NA NA NA
## 9 Algeria NA NA NA NA NA NA NA NA NA NA
## 10 Algeria NA NA NA NA NA NA NA NA NA NA
## # ℹ 618 more rows
## # ℹ 306 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>, `1764` <dbl>,
## # `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>, `1769` <dbl>,
## # `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>,
## # `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>,
## # `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>,
## # `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>, …
# 628 rows
What countries are present in “co2” that are not present in “cc”? Use anti_join
. What countries are present in “cc” that are not present in “co2”?
# General format
anti_join(data1, data2, by = "") %>% select(columnname)
anti_join(co2, cc, by = "country") %>% select(country)
## # A tibble: 62 × 1
## country
## <chr>
## 1 Afghanistan
## 2 Andorra
## 3 Armenia
## 4 Azerbaijan
## 5 Bahamas
## 6 Bahrain
## 7 Belarus
## 8 Brunei
## 9 Cape Verde
## 10 Central African Republic
## # ℹ 52 more rows
anti_join(cc, co2, by = "country") %>% select(country) %>% distinct()
## # A tibble: 85 × 1
## country
## <chr>
## 1 Afghanistan, Islamic Rep. of
## 2 American Samoa
## 3 Anguilla
## 4 Armenia, Rep. of
## 5 Azerbaijan, Rep. of
## 6 Azores Island
## 7 Bahamas, The
## 8 Belarus, Rep. of
## 9 Bermuda
## 10 British Virgin Islands
## # ℹ 75 more rows
Take the code from 2.2 and save the output as an object “co2_cc”. Filter the dataset. Filter so that you only keep Indonesia and Canada.
co2_cc <- full_join(co2, cc, by = "country") %>%
filter(country %in% c("Indonesia", "Canada"))
Select:
2014.y
)Rename:
2014.x
2014.y
Reassign to “co2_cc”.
co2_cc <- co2_cc %>% select(country, `2014.x`, Indicator, `2014.y`) %>%
rename(emissions = `2014.x`, disasters = `2014.y`)
Use stringr
to trim the piece of text, “Climate related disasters frequency, Number of Disasters:”, from the “Indicator” column. You will use the function str_remove()
to do this. It works similarly to other stringr
functions. Try to intuit how it works by using the documentation page (?str_remove
).
Reassign to “co2_cc”.
library(stringr)
co2_cc <- co2_cc %>% mutate(Indicator = str_remove(
Indicator,
"Climate related disasters frequency, Number of Disasters: "
))
Pivot the dataset so that there are columns for country, emissions, and a column for each “Indicator”.
co2_cc %>% pivot_wider(
names_from = Indicator,
values_from = disasters
)
## # A tibble: 2 × 9
## country emissions Drought `Extreme temperature` Flood Storm TOTAL Wildfire
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Canada 537000 NA 1 1 NA 2 NA
## 2 Indonesia 464000 NA NA 6 NA 8 NA
## # ℹ 1 more variable: Landslide <dbl>