Here I’ll show you step by step the plot in the tweet below. In this plot you can show explicitly (and redundantly, but this is not always bad) absolute and relative changes among points in a time series.
Late #TidyTuesday , #dataviz
— Otho (@othomn) February 1, 2019
Visualization practice, I'm showing consumption of butter and its percent change compared to the previous year.
I did this plot on “butter” just because its consumption changes.
Source: https://t.co/yNp2qkOVO5
Code: https://t.co/flNedKO1kh pic.twitter.com/yoTHspwBxZ
To produce this plot, you just need to mix components from ggplot2
, and the wonderful scico
colour palette.
1 Load Packages
First you need to load the required packages.
library(tidyverse)
library(tibbletime)
library(scico)
The tidyverse
is a collection of packages for data analysis, that contains the functions that we will use to load, manipulate and plot the data.
The package tibbletime
stores the function rollify
. This function allows you to apply another function, such as mean
or sum
over windows of time, or, as the name says, to “roll”" it.
The package scico stores wonderful color palettes, but more about that later.
2 Get the data
I use this snippet of code to download data automatically and only once.
# Get data Milk products ---------------------------------------------------
dat_path <- "_data/2-05-milk-product-facts.Rdata"
dat_url <- paste0("https://raw.githubusercontent.com/",
"rfordatascience/tidytuesday/master/data/",
"2019/2019-01-29/milk_products_facts.csv")
if(!file.exists(dat_path)) {
dat_milkprods <-
read_csv(dat_url)
save(dat_milkprods, file = dat_path)
} else {
load(dat_path)
}
The original data come from USDA, but this version has been already tidied by Thomas Mock as part of the of the TidyTuesday weekly social data project. So we have little data manipulation left to do.
The milkprods dataset is already tidy, but it stores many columns that we don’t need.
dat_milkprods %>% print()
## # A tibble: 43 x 18
## year fluid_milk fluid_yogurt butter cheese_american cheese_other
## <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1975 247 1.97 4.73 8.15 6.13
## 2 1976 247 2.13 4.31 8.88 6.63
## 3 1977 244 2.34 4.29 9.21 6.78
## 4 1978 241 2.45 4.35 9.53 7.31
## 5 1979 238 2.44 4.49 9.60 7.57
## 6 1980 234 2.50 4.47 9.62 7.90
## 7 1981 230 2.44 4.24 10.2 8.03
## 8 1982 224 2.58 4.35 11.3 8.60
## 9 1983 223 3.16 4.91 11.6 8.96
## 10 1984 224 3.55 4.98 11.9 9.62
## # ... with 33 more rows, and 12 more variables: cheese_cottage <dbl>,
## # evap_cnd_canned_whole_milk <dbl>, evap_cnd_bulk_whole_milk <dbl>,
## # evap_cnd_bulk_and_can_skim_milk <dbl>, frozen_ice_cream_regular <dbl>,
## # frozen_ice_cream_reduced_fat <dbl>, frozen_sherbet <dbl>,
## # frozen_other <dbl>, dry_whole_milk <dbl>, dry_nonfat_milk <dbl>,
## # dry_buttermilk <dbl>, dry_whey <dbl>
3 Wrangle and Estimate Percentage Changes
We need just 2 variables and we can easily extract them with dplyr::select
.
A bit more complex: we need to estimate percentage changes from the day before. This is swiflty done with rollify. We can use this function factory to build the function roll_percent()
calculates percentage change over a column of a dataframe.
# Percent ---------------------------------------------------------
# roll percent over a dataframe
roll_percent <- rollify(.f = function(n) (n[2] - n[1])*100/n[1], 2)
dat <-
dat_milkprods %>%
select(year, butter) %>%
# apply on this dataframe, on the column butter
mutate(percent = roll_percent(butter)) %>%
filter(complete.cases(.))
So, this is the clean dataframe that we use for plotting:
dat %>% print()
## # A tibble: 42 x 3
## year butter percent
## <int> <dbl> <dbl>
## 1 1976 4.31 -8.78
## 2 1977 4.29 -0.441
## 3 1978 4.35 1.41
## 4 1979 4.49 3.14
## 5 1980 4.47 -0.528
## 6 1981 4.24 -5.13
## 7 1982 4.35 2.67
## 8 1983 4.91 12.8
## 9 1984 4.98 1.42
## 10 1985 4.87 -2.09
## # ... with 32 more rows
4 A bit of style
You can set the plot styles at any time, let’s do it now.
Below I modify the theme_minimal
from ggplot2
with some fonts and colours that I like. I devised this setting and modified them from what Kieran Healy does.
theme_set(
theme_minimal() +
theme(text = element_text(family = "Arial Narrow",
colour = "grey40",
size = 11),
axis.title = element_text(size = 14),
plot.title = element_text(colour = "grey20",
face = "bold",
size = 18),
plot.subtitle = element_text(face = "bold",
size = 12),
aspect.ratio = .6,
plot.margin = margin(t = 10, r = 15, b = 0, l = 10,
unit = "mm"))
)
See setting the theme with theme_set()
in ggplot2 as if you where applying a CSS file to your website. All plots below will be produced according to this theme.
5 Plotting (aka the fun part ;) )
First you can set the basic aesthetic mapping. All elements of the plot will have the variable year
mapped to the x axis and butter
mapped to the y axis.
5.1 Set the basic aesthetic mapping
Below, I also use a small dplyr trick, setting the yend
variable, just before plotting. This variable doesn’t add anything to the dataset, but I just need it when I plot the percentage changes, to make the look good.
With the tidyverse
and ggplot2
you have at least two choices for setting variables on the fly:
- Right before plotting, in a pipe, as I’m doing here,
- Directly within
ggplot2
when you define the aesthetic mapping, as I will do later.
p <-
dat %>%
mutate(yend = butter + (percent/10)) %>%
ggplot(aes(x = year,
y = butter))
At this point I’ve mapped basic aesthetic to the plot, but I did not specify any geometric object to appear, this the plot is empty.
p %>% print()
5.2 Add the first geometric objects
I’ll start with the grey annotation square, and the text, just because I want them below any other object of the plot.
Then I add the key geometric elements, the dots, that are mapped to the absolute value of the butter sold in the US (lbs per person) and the arrows that are mapped to the percentage changes.
Note that you can call aes()
inside a the call for a geometric object, such as geom_point()
. In this way you can map a variable to an aesthetic exclusively in that geometric object.
p <-
p +
# First the annotations
annotate(geom = "rect",
xmin = 2008, xmax = 2010,
ymin = -Inf, ymax = Inf,
fill = "grey80", alpha = .5) +
annotate(geom = "text",
x = 2009, y = 4,
label = "2008\nEconomic Crisis?",
family = "Arial Narrow",
colour = "grey40",
size = 3, fontface = "bold") +
# and then the basic geometric objects
geom_segment(aes(yend = yend,
xend = ..x..,
colour = percent),
size = 2,
arrow = arrow(length = unit(1.2, "mm"),
type = "closed")) +
geom_point(colour = "grey40", size = 2)
p %>% print()
5.3 Specify percent changes explicitly
We can specify percent changes explicitly with numbers, making it easier for readers to learn the specific size of the effect.
I’ll do it with geom_text()
.
The numbers must appear above the arrow, if the percentage change is positive, and below if it is negaitve. We can specify this using case_when()
a vectorized ifelse statement.I specify it directly in the call to the aesthetic mapping.
p <-
p +
geom_text(aes(y = case_when(percent > 0 ~ yend + .12,
TRUE ~ yend - .12),
label = percent %>%
round() %>% paste0("%"),
colour = percent),
size = 2.7)
p %>% print()
5.4 Colors colors colors
Colors: are they necessary? Are they to be avoided? Well, they make your plot look good, and for sure they are helpful. How can we get the most out of them.
Beside looking good, a colour palette today must have two properties:
- Be colour blind friendly (no need to explain this),
- Be perceptively uniform, or at least perceptively reasonable (i.e. it should not let you guess pattern that are not there in the data).
The concept of perceptively uniform is explained clearly in the vignette of the viridis package.
Beside the beautiful viridis palette, I also love those in the package scico. This package was developed by Thomas Lin Pedersen and it ports into R the color palettes developed by Fabio Crameri.
I use the roma
scale, which is divergent and colorful, and map percent change to it.
I want the desaturated center of the divergent palette to be mapped to the 0% changes, but percentage data are not equally distributed around zero, so we can do it manually by setting artificial limits to the colot mapping.
# needed to center divergent palette
lim <-
dat$percent %>%
range() %>%
abs() %>%
max()
p +
scale_colour_scico(palette = "roma",
direction = 1,
limits = c(-lim, lim),
guide = FALSE) +
guides(colour = element_blank()) +
labs(title = "Fluctuations in Butter Consumptions",
subtitle = str_wrap("In the US between 1975 - 2017,
with weight of sold butter in lbs
and its percent change compared to
the previous year."),
y = "Sold Butter in lbs",
x = "Year",
caption = "Data: USDA | Plot by @othomn")
p
# Save --------------------------------------------------------------------
png(filename = "plots/2-05-milk.png",
height = 1600, width = 2100,
res = 300)
p %>% print()
dev.off()