December 10, 2019
This week’s installment of Tidy Tuesday is all about replicating professional plots in R. Inspired by Rafael Irizarry’s post “You can replicate almost any plot with R”, the goal is to take professional, publication-ready plots and recreate them in R (usually with ggplot2).
I was interested in this Tidy Tuesday because some of my past work has been dedicated to creating publication-ready plots. Because the first visualization I ever created was inspired by (a replication of?) this visualization from Bloomberg graphics, I decided to set out on a journey to make that plot as close as possible to the real thing.
The real goal of this week’s Tidy Tuesday is using the data that Rafael posted to create other cool visualizations; I took a slightly different approach to try to recreate another visualization entirely. What follows is an interactive recreation of the visualization above, using Shiny and plotly.
First, we read in the data. This process was a bit complicated as I kind of had to guess where Bloomberg pulled their data from.
I relied on three datasets:
1) Educational attainment broken down by occupation, provided by BLS here
2) Salaries (median hourly/annual wages) broken down by occupation, provided by BLS here
3) Risk of automation broken down by occupation, provided by Carl Benedikt Frey and Michael A. Osborne (but compiled here)
In another post, I detail the data cleaning process. I’ll spare you the details here.
Now we create the UI, as is the case for any Shiny app. This is pretty simple: first, we add the title panel and beautify it with some CSS.
Next, we add the main panel, which includes a) the plot object, b) the footnote, and c) some CSS.
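A minimal sketch of a UI along these lines, assuming a plotly output; the output ID, title text, footnote wording, and CSS values are all hypothetical stand-ins rather than the ones used in the app:

```r
library(shiny)
library(plotly)

# Hypothetical IDs, text, and CSS values throughout.
ui <- fluidPage(
  # Title panel, beautified with a bit of inline CSS
  titlePanel(
    tags$span(
      "Automation risk by occupation",
      style = "font-family: Helvetica, sans-serif; font-weight: bold;"
    ),
    windowTitle = "Automation risk by occupation"
  ),
  mainPanel(
    # a) the plot object
    plotlyOutput("plot", height = "600px"),
    # b) the footnote, styled with c) some CSS
    tags$p(
      "Sources: BLS; Frey and Osborne.",
      style = "font-size: 11px; color: #777777;"
    ),
    width = 12
  )
)
```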
That’s it!
Now we can define the server() function, where the real magic of this visualization happens.
All of the following takes place in the server <- function(input, output, session) {} function.
We know we’re going to need a ggplot object. In my case, we’ll need a plot object which relies on probability, median income, and risk of automation.
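Something like the following would do it. This is only a sketch: the toy rows are made up, the column names are hypothetical, and the mapping (probability on x, median income on y, education as fill) is my assumption about how the variables fit together.

```r
library(ggplot2)

# Made-up toy rows standing in for the cleaned data; column names are hypothetical.
jobs <- data.frame(
  occupation    = c("Occupation A", "Occupation B", "Occupation C", "Occupation D"),
  probability   = c(5, 30, 60, 95),
  median_income = c(90000, 70000, 45000, 25000),
  education     = c("Level 1", "Level 1", "Level 2", "Level 2")
)

# `text` is not a ggplot2 aesthetic; it is carried along so that plotly
# can use it as the tooltip later on.
p <- ggplot(
  jobs,
  aes(x = probability, y = median_income, fill = education, text = occupation)
) +
  geom_point(shape = 21, size = 4, colour = "white")
```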
This creates the base of the object.
We also know that, like the Bloomberg visualization we’re replicating, we’re going to want a tooltip.
That’s why we included text in the above code, which we define here:
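A sketch of how such a tooltip column could be built with glue, assuming made-up rows and hypothetical column names (occupation, probability, median_income); the HTML and CSS here are illustrative only:

```r
library(glue)

# Made-up toy rows; column names are hypothetical.
jobs <- data.frame(
  occupation    = c("Occupation A", "Occupation B"),
  probability   = c(5, 95),
  median_income = c(90000, 25000)
)

# glue_data() evaluates each {expression} against the columns of `jobs`
# and pastes the pieces together, one string per row.
jobs$tooltip <- glue_data(
  jobs,
  "<b style='font-size:14px;'>{occupation}</b><br>",
  "Risk of automation: {probability}%<br>",
  "Median income: ${formatC(median_income, format = 'd', big.mark = ',')}"
)
```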
This tooltip takes in some CSS, some HTML, and creates a pretty tooltip! The glue function is lovely.
The Bloomberg visualization is unique in that it has no axis lines. We can replicate that in ggplot2 via the following code:
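One way to do this is with theme(), as in the sketch below. The toy data and column names are made up, and starting from theme_minimal() and dropping the ticks are my assumptions:

```r
library(ggplot2)

# Made-up toy rows; column names are hypothetical.
jobs <- data.frame(
  probability   = c(5, 30, 60, 95),
  median_income = c(90000, 70000, 45000, 25000)
)

p <- ggplot(jobs, aes(x = probability, y = median_income)) +
  geom_point() +
  theme_minimal() +
  theme(
    axis.line  = element_blank(),  # no axis lines
    axis.ticks = element_blank()   # no tick marks either (an assumption)
  )
```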
But that’s not all!
The Bloomberg visualization is also unique in that it doesn’t have axis titles. Moreover, the axis labels are a bit unique; the x axis increases sequentially by 10 until 90 where it transitions into ‘90%’ (the % is not present in the earlier numbers).
We can mimic that kind of styling with this code:
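A sketch of that styling, assuming probabilities on a 0 to 100 scale; the toy data, the column names, and the exact limits are hypothetical:

```r
library(ggplot2)

# Made-up toy rows; column names are hypothetical.
jobs <- data.frame(
  probability   = c(5, 30, 60, 95),
  median_income = c(90000, 70000, 45000, 25000)
)

# Breaks every 10 units; only the last label carries the percent sign.
x_breaks <- seq(10, 90, by = 10)
x_labels <- c(seq(10, 80, by = 10), "90%")

p <- ggplot(jobs, aes(x = probability, y = median_income)) +
  geom_point() +
  labs(x = NULL, y = NULL) +  # drop both axis titles
  scale_x_continuous(
    breaks = x_breaks,
    labels = x_labels,
    limits = c(-5, 105)       # a little extra room on either side
  )
```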
We create a bit of a buffer on the limits argument so that we can add annotations. We’ll get to that later!
To get as close as possible to Bloomberg’s plot, I’d also like to mimic their color scheme. I pulled the colors from their dotplot with this awesome Chrome plugin; then, I added them to R with the following:
In the plot object, we reference this with the following:
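Roughly, the two steps look like the sketch below. The hex codes are placeholders, not the colors pulled from Bloomberg, and the education levels and column names are hypothetical:

```r
library(ggplot2)

# Made-up toy rows; column names and education levels are hypothetical.
jobs <- data.frame(
  probability   = c(5, 30, 60, 95),
  median_income = c(90000, 70000, 45000, 25000),
  education     = c("Level 1", "Level 1", "Level 2", "Level 2")
)

# Placeholder hex codes, one per education level (NOT Bloomberg's palette).
edu_colors <- c("Level 1" = "#1B9E77", "Level 2" = "#D95F02")

p <- ggplot(jobs, aes(x = probability, y = median_income, fill = education)) +
  geom_point(shape = 21, size = 4) +
  scale_fill_manual(values = edu_colors)  # manual fill scale from the named vector
```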
This essentially creates a fill scale (manually) with specified hex codes for colors. I also tried to manipulate the order of the legend but that didn’t translate to plotly (a documented problem, I believe).
Finally, we do something really hacky: add a regression line with geom_segment. (I’m so sorry, R gods.)
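One hacky way to get there is to fit a line, compute its two endpoints, and draw a single segment between them. The sketch below assumes a plain lm() fit on made-up data with hypothetical column names; the original endpoints may well have been chosen differently.

```r
library(ggplot2)

# Made-up toy rows; column names are hypothetical.
jobs <- data.frame(
  probability   = c(5, 30, 60, 95),
  median_income = c(90000, 70000, 45000, 25000)
)

# Fit a simple linear model and predict at the two ends of the x range.
fit  <- lm(median_income ~ probability, data = jobs)
ends <- range(jobs$probability)
pred <- unname(predict(fit, newdata = data.frame(probability = ends)))

# One row holding the start and end of the segment.
seg <- data.frame(x = ends[1], xend = ends[2], y = pred[1], yend = pred[2])

p <- ggplot(jobs, aes(x = probability, y = median_income)) +
  geom_point() +
  geom_segment(
    data = seg,
    aes(x = x, y = y, xend = xend, yend = yend),
    inherit.aes = FALSE,
    linetype = "dashed"
  )
```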
We now have the ggplot object created; let’s convert it to a plotly object.
This process relies on the ggplotly function, which reads in a previously defined ggplot object and converts it into an interactive plotly one.
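A minimal sketch of the conversion, assuming a made-up data frame and a hypothetical text column for the tooltip:

```r
library(ggplot2)
library(plotly)

# Made-up toy rows; column names are hypothetical.
jobs <- data.frame(
  occupation    = c("Occupation A", "Occupation B"),
  probability   = c(5, 95),
  median_income = c(90000, 25000)
)

p <- ggplot(jobs, aes(x = probability, y = median_income, text = occupation)) +
  geom_point()

# tooltip = "text" tells plotly to show only the custom text on hover.
pl <- ggplotly(p, tooltip = "text")
```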
After creating the base plotly object, we move to some more complicated steps:
We’d like the legend to orient horizontally, right above the plot. We do that with the following (inside the layout function):
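A sketch of a legend specification of this kind. The numeric values are hypothetical, and the plot_ly() call is just a stand-in for the object that comes out of ggplotly():

```r
library(plotly)

# Made-up toy rows; column names are hypothetical.
jobs <- data.frame(
  probability   = c(5, 30, 60, 95),
  median_income = c(90000, 70000, 45000, 25000),
  education     = c("Level 1", "Level 1", "Level 2", "Level 2")
)

# Stand-in for the ggplotly() output: one trace per education level.
pl <- plot_ly(
  jobs, x = ~probability, y = ~median_income, split = ~education,
  type = "scatter", mode = "markers"
)

legend_opts <- list(
  orientation   = "h",         # horizontal legend
  xanchor       = "left",      # anchor the legend on its left edge
  x             = 0,           # position in paper coordinates
  y             = 1.1,         # slightly above the plotting area
  traceorder    = "normal",    # keep the traces in their input order
  itemsizing    = "constant",  # same symbol size for every legend item
  tracegroupgap = 10,          # spacing between legend groups (hypothetical value)
  font          = list(size = 11)
)

pl <- layout(pl, legend = legend_opts)
```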
This does a few things. First, it orients the legend horizontally. Second, it anchors the legend to the left. Third, it defines the location (using x-y pairs) of the legend. traceorder is meant to maintain the previous order from ggplot, but that didn’t work in my version. itemsizing is meant to keep the legend items with a constant size, as opposed to dynamic relative to the plot objects themselves. This also didn’t work. The last two arguments define the spacing between points and the font size of the legend text!
We also see the Bloomberg viz has a right-aligned Y-axis. We can add that to plotly via the following code:
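In plotly this comes down to the side attribute of the y axis. A minimal sketch, with a plot_ly() stand-in for the converted plot and made-up values:

```r
library(plotly)

# Stand-in for the ggplotly() output, with made-up values.
pl <- plot_ly(
  x = c(5, 95), y = c(90000, 25000),
  type = "scatter", mode = "markers"
)

# Move the y axis (ticks and labels) to the right-hand side.
y_axis_opts <- list(side = "right")
pl <- layout(pl, yaxis = y_axis_opts)
```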
Finally, we add three commands to the layout function.
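Those three pieces could look something like the sketch below. The font family, margin sizes, and hover colors are hypothetical values, and the plot_ly() call stands in for the converted plot:

```r
library(plotly)

# Stand-in for the ggplotly() output, with made-up values.
pl <- plot_ly(
  x = c(5, 95), y = c(90000, 25000),
  type = "scatter", mode = "markers"
)

pl <- layout(
  pl,
  font   = list(family = "Helvetica, Arial, sans-serif"),  # plot-wide font
  margin = list(l = 20, r = 20, t = 40, b = 20),          # small margin, in pixels
  hoverlabel = list(                                      # tooltip styling on hover
    bgcolor     = "white",
    bordercolor = "#CCCCCC",
    font        = list(size = 12, color = "#333333")
  )
)
```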
This a) changes the font of the plot, b) adds a small margin, and c) stylizes the tooltip on hover.
The last step is to mimic Bloomberg’s annotations. This is a little tough, specifically because it requires pretty specific x- and y-values.
First, we’ll add their guiding annotations (that replace axis labels) that you can find in each corner:
Next, we add annotations for ‘most and least likely to be automated’, as well as the y axis label.
And finally, add a couple of plot annotations which label specific points. (We are not labelling a hundred occupations like Bloomberg did.)
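A sketch of both kinds of annotation using plotly's add_annotations(). The label text, coordinates, and data are all made up; corner labels are placed in "paper" coordinates (0 to 1 across the plot area), while point labels use data coordinates:

```r
library(plotly)

# Made-up toy rows; column names are hypothetical.
jobs <- data.frame(
  occupation    = c("Occupation A", "Occupation B"),
  probability   = c(5, 95),
  median_income = c(90000, 25000)
)

pl <- plot_ly(
  jobs, x = ~probability, y = ~median_income,
  type = "scatter", mode = "markers"
)

# Guiding annotation pinned to the top-left corner of the plot area.
pl <- add_annotations(
  pl,
  x = 0, y = 1, xref = "paper", yref = "paper",
  text = "Corner label (hypothetical wording)",
  xanchor = "left", showarrow = FALSE
)

# Label for one specific point, in data coordinates, with a small arrow.
pl <- add_annotations(
  pl,
  x = jobs$probability[1], y = jobs$median_income[1],
  text = jobs$occupation[1],
  showarrow = TRUE, ax = 30, ay = -30
)
```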
Finalize the plotly object with
Run the application with the following code:
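In a Shiny app this usually means handing the UI and server to shinyApp(). The sketch below is a stripped-down app with a hypothetical output ID and made-up data, not the full one described above:

```r
library(shiny)
library(plotly)

# Bare-bones UI: a single plotly output with a hypothetical ID.
ui <- fluidPage(plotlyOutput("plot"))

server <- function(input, output, session) {
  output$plot <- renderPlotly({
    # Made-up values standing in for the finished plotly object.
    plot_ly(
      x = c(5, 95), y = c(90000, 25000),
      type = "scatter", mode = "markers"
    )
  })
}

# Build the app object; printing `app` in an interactive session,
# or calling runApp(app), launches it.
app <- shinyApp(ui = ui, server = server)
```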
And we’re done! Find my interactive visualization here. Find the code, uninterrupted and (hopefully) reproducible, here.
Here’s the Bloomberg visualization:
And here’s mine: