Show Absolute and Percent Change in a Time Series

Tidytuesday week 5 - USDA milk dataset

Here I’ll show you step by step the plot in the tweet below. In this plot you can show explicitly (and redundantly, but this is not always bad) absolute and relative changes among points in a time series.

To produce this plot, you just need to mix components from ggplot2, and the wonderful scico colour palette.

1 Load Packages

First you need to load the required packages.

library(tidyverse)
library(tibbletime)
library(scico)

The tidyverse is a collection of packages for data analysis, that contains the functions that we will use to load, manipulate and plot the data.

The package tibbletime stores the function rollify. This function allows you to apply another function, such as mean or sum over windows of time, or, as the name says, to “roll”" it.

The package scico stores wonderful color palettes, but more about that later.

2 Get the data

I use this snippet of code to download data automatically and only once.

# Get data Milk products ---------------------------------------------------

dat_path <- "_data/2-05-milk-product-facts.Rdata"
dat_url <- paste0("https://raw.githubusercontent.com/",
                  "rfordatascience/tidytuesday/master/data/",
                  "2019/2019-01-29/milk_products_facts.csv")

if(!file.exists(dat_path)) {
  dat_milkprods <- 
    read_csv(dat_url)
  
  save(dat_milkprods, file = dat_path)
  
} else {
  load(dat_path)
}

The original data come from USDA, but this version has been already tidied by Thomas Mock as part of the of the TidyTuesday weekly social data project. So we have little data manipulation left to do.

The milkprods dataset is already tidy, but it stores many columns that we don’t need.

dat_milkprods %>% print()
## # A tibble: 43 x 18
##     year fluid_milk fluid_yogurt butter cheese_american cheese_other
##    <int>      <int>        <dbl>  <dbl>           <dbl>        <dbl>
##  1  1975        247         1.97   4.73            8.15         6.13
##  2  1976        247         2.13   4.31            8.88         6.63
##  3  1977        244         2.34   4.29            9.21         6.78
##  4  1978        241         2.45   4.35            9.53         7.31
##  5  1979        238         2.44   4.49            9.60         7.57
##  6  1980        234         2.50   4.47            9.62         7.90
##  7  1981        230         2.44   4.24           10.2          8.03
##  8  1982        224         2.58   4.35           11.3          8.60
##  9  1983        223         3.16   4.91           11.6          8.96
## 10  1984        224         3.55   4.98           11.9          9.62
## # ... with 33 more rows, and 12 more variables: cheese_cottage <dbl>,
## #   evap_cnd_canned_whole_milk <dbl>, evap_cnd_bulk_whole_milk <dbl>,
## #   evap_cnd_bulk_and_can_skim_milk <dbl>, frozen_ice_cream_regular <dbl>,
## #   frozen_ice_cream_reduced_fat <dbl>, frozen_sherbet <dbl>,
## #   frozen_other <dbl>, dry_whole_milk <dbl>, dry_nonfat_milk <dbl>,
## #   dry_buttermilk <dbl>, dry_whey <dbl>

3 Wrangle and Estimate Percentage Changes

We need just 2 variables and we can easily extract them with dplyr::select.

A bit more complex: we need to estimate percentage changes from the day before. This is swiflty done with rollify. We can use this function factory to build the function roll_percent() calculates percentage change over a column of a dataframe.

# Percent ---------------------------------------------------------

# roll percent over a dataframe
roll_percent <- rollify(.f = function(n) (n[2] - n[1])*100/n[1], 2)

dat <- 
  dat_milkprods %>%
  select(year, butter) %>% 
  # apply on this dataframe, on the column butter
  mutate(percent = roll_percent(butter)) %>% 
  filter(complete.cases(.))

So, this is the clean dataframe that we use for plotting:

dat %>% print()
## # A tibble: 42 x 3
##     year butter percent
##    <int>  <dbl>   <dbl>
##  1  1976   4.31  -8.78 
##  2  1977   4.29  -0.441
##  3  1978   4.35   1.41 
##  4  1979   4.49   3.14 
##  5  1980   4.47  -0.528
##  6  1981   4.24  -5.13 
##  7  1982   4.35   2.67 
##  8  1983   4.91  12.8  
##  9  1984   4.98   1.42 
## 10  1985   4.87  -2.09 
## # ... with 32 more rows

4 A bit of style

You can set the plot styles at any time, let’s do it now.

Below I modify the theme_minimal from ggplot2 with some fonts and colours that I like. I devised this setting and modified them from what Kieran Healy does.

theme_set(
  theme_minimal() +
    theme(text = element_text(family = "Arial Narrow",
                              colour = "grey40",
                              size = 11),
          axis.title = element_text(size = 14),
          plot.title = element_text(colour = "grey20",
                                    face = "bold",
                                    size = 18),
          plot.subtitle = element_text(face = "bold",
                                       size = 12),
          aspect.ratio = .6,   
          plot.margin = margin(t = 10, r = 15, b = 0, l = 10,
                               unit = "mm"))
)

See setting the theme with theme_set() in ggplot2 as if you where applying a CSS file to your website. All plots below will be produced according to this theme.

5 Plotting (aka the fun part ;) )

First you can set the basic aesthetic mapping. All elements of the plot will have the variable year mapped to the x axis and butter mapped to the y axis.

5.1 Set the basic aesthetic mapping

Below, I also use a small dplyr trick, setting the yend variable, just before plotting. This variable doesn’t add anything to the dataset, but I just need it when I plot the percentage changes, to make the look good.

With the tidyverse and ggplot2 you have at least two choices for setting variables on the fly:

  • Right before plotting, in a pipe, as I’m doing here,
  • Directly within ggplot2 when you define the aesthetic mapping, as I will do later.
p <- 
  dat %>% 
  mutate(yend = butter + (percent/10)) %>% 
  ggplot(aes(x = year,
             y = butter))

At this point I’ve mapped basic aesthetic to the plot, but I did not specify any geometric object to appear, this the plot is empty.

p %>% print()

5.2 Add the first geometric objects

I’ll start with the grey annotation square, and the text, just because I want them below any other object of the plot.

Then I add the key geometric elements, the dots, that are mapped to the absolute value of the butter sold in the US (lbs per person) and the arrows that are mapped to the percentage changes.

Note that you can call aes() inside a the call for a geometric object, such as geom_point(). In this way you can map a variable to an aesthetic exclusively in that geometric object.

p <- 
  p +
  # First the annotations
  annotate(geom = "rect",
           xmin = 2008, xmax = 2010,
           ymin = -Inf, ymax = Inf,
           fill = "grey80", alpha = .5) +
  annotate(geom = "text",
           x = 2009, y = 4,
           label = "2008\nEconomic Crisis?",
           family = "Arial Narrow",
           colour = "grey40",
           size = 3, fontface = "bold") +
  # and then the basic geometric objects
  geom_segment(aes(yend = yend,
                   xend = ..x..,
                   colour = percent),
               size = 2,
               arrow = arrow(length = unit(1.2, "mm"),
                             type = "closed")) +
  geom_point(colour = "grey40", size = 2)

p %>% print()

5.3 Specify percent changes explicitly

We can specify percent changes explicitly with numbers, making it easier for readers to learn the specific size of the effect.

I’ll do it with geom_text().

The numbers must appear above the arrow, if the percentage change is positive, and below if it is negaitve. We can specify this using case_when() a vectorized ifelse statement.I specify it directly in the call to the aesthetic mapping.

p <- 
  p +
  geom_text(aes(y = case_when(percent > 0 ~ yend + .12,
                              TRUE ~ yend - .12),
                label = percent %>% 
                  round() %>% paste0("%"),
                colour = percent),
            size = 2.7)

p %>% print()

5.4 Colors colors colors

Colors: are they necessary? Are they to be avoided? Well, they make your plot look good, and for sure they are helpful. How can we get the most out of them.

Beside looking good, a colour palette today must have two properties:

  • Be colour blind friendly (no need to explain this),
  • Be perceptively uniform, or at least perceptively reasonable (i.e. it should not let you guess pattern that are not there in the data).

The concept of perceptively uniform is explained clearly in the vignette of the viridis package.

Beside the beautiful viridis palette, I also love those in the package scico. This package was developed by Thomas Lin Pedersen and it ports into R the color palettes developed by Fabio Crameri.

I use the roma scale, which is divergent and colorful, and map percent change to it.

I want the desaturated center of the divergent palette to be mapped to the 0% changes, but percentage data are not equally distributed around zero, so we can do it manually by setting artificial limits to the colot mapping.

# needed to center divergent palette
lim <- 
  dat$percent %>% 
  range() %>% 
  abs() %>% 
  max()


p  +
  scale_colour_scico(palette = "roma",
                   direction = 1,
                   limits = c(-lim, lim),
                   guide = FALSE) +
  guides(colour = element_blank()) +
  labs(title = "Fluctuations in Butter Consumptions",
       subtitle = str_wrap("In the US between 1975 - 2017,
                           with weight of sold butter in lbs 
                           and its percent change compared to
                           the previous year."),
       y = "Sold Butter in lbs",
       x = "Year",
       caption = "Data: USDA | Plot by @othomn") 
p
# Save --------------------------------------------------------------------

png(filename = "plots/2-05-milk.png",
    height = 1600, width = 2100,
    res = 300)
p %>% print()
dev.off() 
Avatar
Otho Mantegazza
Biologist, Data Scientist
comments powered by Disqus