Some data in this lab comes from the OCS “Exploring CO2 emissions across time” activity (https://www.opencasestudies.org/ocs-bp-co2-emissions/.

Additional data about climate change disasters can be found at “https://daseh.org/data/Yearly_CC_Disasters.csv”.

library(tidyverse)

Part 1

1.1

Open the yearly_co2_emissions dataset. Use read_csv() from the tidyverse / readr package. You can download the data or use this URL directly: https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv

Check out the data to understand the format.

co2 <- read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(co2)
## # A tibble: 6 × 265
##   country  `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
##   <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Afghani…     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 2 Albania      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 3 Algeria      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 4 Andorra      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 5 Angola       NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 6 Antigua…     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## # ℹ 254 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>, `1764` <dbl>,
## #   `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>, `1769` <dbl>,
## #   `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>,
## #   `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>,
## #   `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>,
## #   `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>,
## #   `1790` <dbl>, `1791` <dbl>, `1792` <dbl>, `1793` <dbl>, `1794` <dbl>, …

1.2

Create a new dataset “co2_long” that does pivot_longer() on all columns except “country”. Remember that !country means all columns except “country”.

Reassign to co2_long.

# General format
new_data <- old_data %>% pivot_longer(cols = colname(s))
co2_long <- co2 %>% pivot_longer(cols = !country)

1.3

Using co2_long, filter the “country” column so it only includes values from Indonesia and Canada. Hint: use filter and %in%.

Reassign to co2_long.

# General format
new_data <- old_data %>% filter(colname %in% c(...))
co2_long <- co2_long %>%
  filter(country %in% c("Indonesia", "Canada"))

1.4

Use pivot_wider to reshape “co2_long”. Use “county” for the names_from argument. Use “value” for the values_from argument. Call this new data co2_wide.

# General format
new_data <- old_data %>% pivot_wider(names_from = column1, values_from = column2)
co2_wide <- co2_long %>%
  pivot_wider(
    names_from = country,
    values_from = value
  )

1.5

Using co2_wide, drop all NA values using drop_na().

Reassign to co2_wide. Compare the years - what conclusions can you draw?

co2_wide <- co2_wide %>% drop_na()

Tip: you can adjust scientific notation with options(scipen=<n>)

Practice on Your Own!

P.1

Take the code from Questions 1.1 - 1.5. Chain all of this code together using the pipe %>%. Call your data co2_compare.

co2_compare <-
  read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") %>%
  pivot_longer(cols = !country) %>%
  filter(country %in% c("Indonesia", "Canada")) %>%
  pivot_wider(names_from = country, values_from = value) %>% 
  drop_na()
## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
co2_compare
## # A tibble: 126 × 3
##    name  Canada Indonesia
##    <chr>  <dbl>     <dbl>
##  1 1889   11300      3.67
##  2 1890   12400     14.7 
##  3 1891   14000     22   
##  4 1892   13900    180   
##  5 1893   14900    403   
##  6 1894   14200    480   
##  7 1895   13500    770   
##  8 1896   14700    862   
##  9 1897   14800   1110   
## 10 1898   15700   1290   
## # ℹ 116 more rows

P.2

Modify the code from Question P.1:

  • Select different countries to compare
  • Call your data co2_compare2
co2_compare2 <-
  read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") %>%
  pivot_longer(cols = !country) %>%
  filter(country %in% c("Brazil", "Mexico")) %>%
  pivot_wider(names_from = country, values_from = value) %>% 
  drop_na()
## Rows: 192 Columns: 265
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): country
## dbl (264): 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
co2_compare2
## # A tibble: 114 × 3
##    name  Brazil Mexico
##    <chr>  <dbl>  <dbl>
##  1 1901    2100   1310
##  2 1902    2510   1430
##  3 1903    2440   1560
##  4 1904    2620   1720
##  5 1905    2800   1710
##  6 1906    3210   1920
##  7 1907    3450   2270
##  8 1908    3600   3770
##  9 1909    3610   3700
## 10 1910    4200   4160
## # ℹ 104 more rows

P.3

Add on to Question P.1 to add a column. The column values for each line should be “TRUE” if Canada has greater emissions than Indonesia for that year.

co2_compare <-
  co2_compare %>% 
  mutate(Canada_greater = Canada > Indonesia)

co2_compare
## # A tibble: 126 × 4
##    name  Canada Indonesia Canada_greater
##    <chr>  <dbl>     <dbl> <lgl>         
##  1 1889   11300      3.67 TRUE          
##  2 1890   12400     14.7  TRUE          
##  3 1891   14000     22    TRUE          
##  4 1892   13900    180    TRUE          
##  5 1893   14900    403    TRUE          
##  6 1894   14200    480    TRUE          
##  7 1895   13500    770    TRUE          
##  8 1896   14700    862    TRUE          
##  9 1897   14800   1110    TRUE          
## 10 1898   15700   1290    TRUE          
## # ℹ 116 more rows

Part 2

2.1

Open the Yearly_CC_Disasters dataset using the url below. Save the dataset as an object called “cc”. rename the column “Country” to “country”.

https://daseh.org/data/Yearly_CC_Disasters.csv

cc <- read_csv("https://daseh.org/data/Yearly_CC_Disasters.csv") %>%
  rename(country = Country)
## Rows: 970 Columns: 53
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): Country, ISO2, ISO3, Indicator, Unit, Source, CTS Code, CTS Name, ...
## dbl (44): ObjectId, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 19...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2.2

Join the datasets “co2” and “cc” by “country”. Keep all rows, even if there is no match. How many rows are there?

# General format
full_join(data1, data2, by = "")
full_join(co2, cc, by = "country")
## # A tibble: 1,032 × 317
##    country `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
##    <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 Afghan…     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  2 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  3 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  4 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  5 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  6 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  7 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  8 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  9 Algeria     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 10 Algeria     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## # ℹ 1,022 more rows
## # ℹ 306 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>, `1764` <dbl>,
## #   `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>, `1769` <dbl>,
## #   `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>,
## #   `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>,
## #   `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>,
## #   `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>, …
# 1,032 rows

2.3

Join the datasets “co2” and “cc” by “country”. Keep only rows that match both datasets. How many rows are there?

# General format
inner_join(data1, data2, by = "")
inner_join(co2, cc, by = "country")
## # A tibble: 628 × 317
##    country `1751` `1752` `1753` `1754` `1755` `1756` `1757` `1758` `1759` `1760`
##    <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  2 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  3 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  4 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  5 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  6 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  7 Albania     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  8 Algeria     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
##  9 Algeria     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 10 Algeria     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## # ℹ 618 more rows
## # ℹ 306 more variables: `1761` <dbl>, `1762` <dbl>, `1763` <dbl>, `1764` <dbl>,
## #   `1765` <dbl>, `1766` <dbl>, `1767` <dbl>, `1768` <dbl>, `1769` <dbl>,
## #   `1770` <dbl>, `1771` <dbl>, `1772` <dbl>, `1773` <dbl>, `1774` <dbl>,
## #   `1775` <dbl>, `1776` <dbl>, `1777` <dbl>, `1778` <dbl>, `1779` <dbl>,
## #   `1780` <dbl>, `1781` <dbl>, `1782` <dbl>, `1783` <dbl>, `1784` <dbl>,
## #   `1785` <dbl>, `1786` <dbl>, `1787` <dbl>, `1788` <dbl>, `1789` <dbl>, …
# 628 rows

2.4

What countries are present in “co2” that are not present in “cc”? Use anti_join. What countries are present in “cc” that are not present in “co2”?

# General format
anti_join(data1, data2, by = "") %>% select(columnname)
anti_join(co2, cc, by = "country") %>% select(country)
## # A tibble: 62 × 1
##    country                 
##    <chr>                   
##  1 Afghanistan             
##  2 Andorra                 
##  3 Armenia                 
##  4 Azerbaijan              
##  5 Bahamas                 
##  6 Bahrain                 
##  7 Belarus                 
##  8 Brunei                  
##  9 Cape Verde              
## 10 Central African Republic
## # ℹ 52 more rows
anti_join(cc, co2, by = "country") %>% select(country) %>% distinct()
## # A tibble: 85 × 1
##    country                     
##    <chr>                       
##  1 Afghanistan, Islamic Rep. of
##  2 American Samoa              
##  3 Anguilla                    
##  4 Armenia, Rep. of            
##  5 Azerbaijan, Rep. of         
##  6 Azores Island               
##  7 Bahamas, The                
##  8 Belarus, Rep. of            
##  9 Bermuda                     
## 10 British Virgin Islands      
## # ℹ 75 more rows

Practice on Your Own!

P.4

Take the code from 2.2 and save the output as an object “co2_cc”. Filter the dataset. Filter so that you only keep Indonesia and Canada.

co2_cc <- full_join(co2, cc, by = "country") %>%
  filter(country %in% c("Indonesia", "Canada"))

P.5

Select:

  • the “country” column
  • data from the years 2014 in BOTH DATASETS (columns “2014.x” and 2014.y)
  • the “Indicator” column

Rename:

  • emissions = 2014.x
  • disasters = 2014.y

Reassign to “co2_cc”.

co2_cc <- co2_cc %>% select(country, `2014.x`, Indicator, `2014.y`) %>%
  rename(emissions = `2014.x`, disasters = `2014.y`)

P.6

Use stringr to trim the piece of text, “Climate related disasters frequency, Number of Disasters:”, from the “Indicator” column. You will use the function str_remove() to do this. It works similarly to other stringr functions. Try to intuit how it works by using the documentation page (?str_remove).

Reassign to “co2_cc”.

library(stringr)
co2_cc <- co2_cc %>% mutate(Indicator = str_remove(
  Indicator,
  "Climate related disasters frequency, Number of Disasters: "
))

P.7

Pivot the dataset so that there are columns for country, emissions, and a column for each “Indicator”.

co2_cc %>% pivot_wider(
  names_from = Indicator,
  values_from = disasters
)
## # A tibble: 2 × 9
##   country   emissions Drought `Extreme temperature` Flood Storm TOTAL Wildfire
##   <chr>         <dbl>   <dbl>                 <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 Canada       537000      NA                     1     1    NA     2       NA
## 2 Indonesia    464000      NA                    NA     6    NA     8       NA
## # ℹ 1 more variable: Landslide <dbl>