This week, we're going to become wizards of ggplot2, the best way to create graphics in R.
Adapted from a great tutorial by Rebecca Barter.
The layered grammar of graphics
ggplot2 is based around three ideas. They make more sense in practice, but it's helpful to outline them up front:
- data: a data frame containing the variables that you want to visualize
- geoms: geometric objects (circles, lines, text) that you will actually see
- aesthetics: the mapping from the data to the geometric objects (e.g. by describing position, size, colour, etc)
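To make the three ideas concrete before we touch real data, here is a minimal sketch. The `toy` data frame and its columns are made up purely for illustration; they are not part of the gapminder data.

```r
library(ggplot2)

# data: a tiny, made-up data frame (hypothetical values, for illustration only)
toy <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 58, 69, 80),
  group  = c("a", "a", "b", "b")
)

# aesthetics: map columns to x position, y position and colour
# geoms: draw each row as a point
p <- ggplot(toy, aes(x = height, y = weight, colour = group)) +
  geom_point()
p
```

Every chart in this walkthrough follows the same pattern: a data frame, an aes() mapping, and one or more geoms.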
1. Let's download our data and start making a chart
We're using the gapminder dataset again this week.
First, fire up RStudio. Then, we'll load everything we need and remind ourselves what this data looks like.
library(tidyverse)
library(gapminder)
head(gapminder)
Now for ggplot. We call the ggplot() function and tell it what data (a data frame) we are interested in and how each of the variables in our dataset will be used (e.g. as an x or y coordinate, as a coloring variable or a size variable, etc).
The output of this function is a grid with gdpPercap as the x-axis and lifeExp as the y-axis. However, we have not yet told ggplot what type of geometric object the data will be mapped to, so no data has been displayed.
Essentially, we've created the grid for the chart, but not the chart itself.
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))
Next, we will add a “geom” layer to our ggplot object. This one will be points.
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
# add a points layer on top
geom_point()
Now we're talking.
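If you want to see the "layers" idea for yourself, one way (a sketch, not the only way) is to store the plot in a variable and count its layers. The names `base_plot` and `with_points` are just ones picked for this example.

```r
library(ggplot2)
library(gapminder)

# The base plot: data and aesthetics, but no geoms yet
base_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))
length(base_plot$layers)

# Adding a geom with + appends a layer to the plot
with_points <- base_plot + geom_point()
length(with_points$layers)
```

The first count is zero, which is why the grid was empty. The second is one, because geom_point() added a single layer.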
2. Transparency, color, size
What we've done is map each row in the data (one country in one year) to a point in the space defined by its GDP per capita and life expectancy. The end result is a fascinating blob of points. Fortunately, there are many things that we can do to make this blob of points look better.
One possibility? Make the points semi-transparent, so that overlapping points are easier to see.
We do that with the 'alpha' argument, which runs from 0 (invisible) to 1 (solid).
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point(alpha = 0.5)
What other tweaks could we make? How about changing the color of the points to blue instead of black, and making the points smaller?
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point(alpha = 0.5, col = "cornflowerblue", size = 0.5)
As you can see, we can set several properties of the points at the same time inside geom_point().
But what if we want different colors for the points, based on the continent of each country?
We can make use of the aes() function. Let's check out what those continents are first.
unique(gapminder$continent)
Got it. Ok, so we can plug that continent column into aes() and ask ggplot to color the points differently, depending on which continent each country belongs to.
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(alpha = 0.5, size = 0.5)
Nice! There are other things we can change, too, like the size of the points. Say we want to make those correspond to the population of the country.
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
geom_point(alpha = 0.5)
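A rule of thumb from this section: anything inside aes() is mapped from a column in the data, and anything outside aes() is a fixed value applied to every point. Here is a short sketch of the difference. The variable names `fixed_colour` and `mapped_colour` are just for this example.

```r
library(ggplot2)
library(gapminder)

# Fixed: every point gets the same colour, so no legend is needed
fixed_colour <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(colour = "cornflowerblue", alpha = 0.5)

# Mapped: colour varies with the continent column, so ggplot adds a legend
mapped_colour <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point(alpha = 0.5)

# Only the mapped plot lists colour among its aesthetics
names(fixed_colour$mapping)
names(mapped_colour$mapping)
```

A common slip is writing aes(color = "blue"). That treats "blue" as data rather than as a color name, so the points usually do not come out blue.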
3. Other chart types
So far, we have only seen scatterplots (point geoms). But there are many other geoms we could add, including:
- lines
- histograms
- boxplots and violin plots
- barplots
- smoothed curves
Let's try some out.
What's different about this one?
ggplot(gapminder, aes(x = year, y = lifeExp, group = country, color = continent)) +
geom_line(alpha = 0.5)
How might this one be different?
ggplot(gapminder, aes(x = continent, y = lifeExp, fill = continent)) +
geom_boxplot()
Let's bust out a histogram.
ggplot(gapminder, aes(x = lifeExp)) +
geom_histogram(binwidth = 3)
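The binwidth is a judgment call, and it changes how the histogram reads. As a quick sketch (the widths 10 and 1 are arbitrary choices for comparison), try a very wide and a very narrow setting:

```r
library(ggplot2)
library(gapminder)

# Wide bins: a coarse, smooth-looking shape
wide_bins <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(binwidth = 10)

# Narrow bins: more detail, but also more noise
narrow_bins <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(binwidth = 1)

wide_bins
narrow_bins
```

Since lifeExp is measured in years, binwidth = 3 in the chart above means each bar covers three years of life expectancy.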
And finally let's try to find how a mathematical model might interpret our data. Don't publish these, but they can be helpful for internal use.
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop)) +
geom_point(aes(color = continent), alpha = 0.5) +
geom_smooth(se = FALSE, method = "loess", color = "grey30")
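The "loess" method draws a flexible curve. If you would rather see a straight-line fit, one option is to swap in method = "lm". This is a sketch of the alternative, not a recommendation of one model over the other; `straight_fit` is just a name for this example.

```r
library(ggplot2)
library(gapminder)

straight_fit <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent), alpha = 0.5) +
  # method = "lm" fits a straight line instead of a loess curve
  geom_smooth(se = FALSE, method = "lm", color = "grey30")
straight_fit
```

Comparing the two versions is a quick way to see how much the choice of model shapes the story the line tells.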
4. Let's make something publishable
We want a focused chart to show readers. So let’s filter to a single year.
gapminder_2007 <- gapminder %>% filter(year == 2007)
ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
geom_point(alpha = 0.5)
Let's get fancy and use a logarithmic scale. Who knows what that is?
Here's how we do that.
ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
geom_point(alpha = 0.5) +
scale_x_log10()
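If the log scale feels mysterious, a little arithmetic helps. On a log10 scale, each step multiplies the value by 10 rather than adding a fixed amount. The `gdp` values below are made-up round numbers for illustration.

```r
# Made-up GDP values, each 10 times bigger than the last
gdp <- c(100, 1000, 10000, 100000)

# On the log10 scale these become evenly spaced
log10(gdp)        # 2 3 4 5
diff(log10(gdp))  # 1 1 1
```

That even spacing is why the poorer countries, which were squashed against the left edge before, now spread out across the chart.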
Now it's time to add a title and change the name of the y-axis and legends using the labs function.
ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
# add scatter points
geom_point(alpha = 0.5) +
# log-scale the x-axis
scale_x_log10() +
# change labels
labs(title = "GDP versus life expectancy in 2007",
x = "GDP per capita (log scale)",
y = "Life expectancy",
size = "Population",
color = "Continent")
5. Themes
Maybe you, like me, kinda hate this gray background.
Well, you can change it with the ggthemes package.
# you only need to install a package once
install.packages("ggthemes")
library(ggthemes)
Let's take a look at some of the available themes and try them out on that last chart.
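Here is one way to try a few themes, sketched on a simplified version of the 2007 chart. theme_minimal() ships with ggplot2, while theme_few() and theme_economist() come from ggthemes. The name `base_chart` is just for this example, and subset() is used instead of filter() only so the block runs on its own.

```r
library(ggplot2)
library(ggthemes)
library(gapminder)

# Same 2007 data as before
gapminder_2007 <- subset(gapminder, year == 2007)

# Store the chart once, then add different themes to it
base_chart <- ggplot(gapminder_2007,
                     aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()

base_chart + theme_minimal()     # built into ggplot2
base_chart + theme_few()         # from ggthemes
base_chart + theme_economist()   # from ggthemes
```

Storing the chart in a variable means you can swap themes without retyping the whole thing.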
Finally, let's go crazy and see if we can figure out everything this is doing.
ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
# add scatter points
geom_point(alpha = 0.5) +
# clean the axes names and breaks
scale_x_log10(breaks = c(1000, 10000),
limits = c(200, 120000)) +
# change labels
labs(title = "GDP versus life expectancy in 2007",
x = "GDP per capita (log scale)",
y = "Life expectancy",
size = "Population (millions)",
color = "Continent") +
# change the size scale
scale_size(range = c(0.1, 10),
breaks = 1000000 * c(250, 500, 750, 1000, 1250),
labels = c("250", "500", "750", "1000", "1250")) +
# add a nicer theme
theme_classic(base_family = "Helvetica")
And let's talk about how to save a chart you create in R.
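One common option is ggsave(), which comes with ggplot2. The sketch below is just one way to do it: the file name, the size and the dpi are arbitrary choices, and the chart is written to a temporary folder so the example does not clutter your project.

```r
library(ggplot2)
library(gapminder)

# A simplified version of the 2007 chart, stored in a variable
chart <- ggplot(subset(gapminder, year == 2007),
                aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()

# Hypothetical file name; swap tempdir() for a folder of your choice
out_file <- file.path(tempdir(), "gapminder_2007.png")

# width and height are in inches by default
ggsave(out_file, plot = chart, width = 8, height = 5, dpi = 300)
```

ggsave() picks the file type from the extension, so changing ".png" to ".pdf" gives you a PDF instead.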
- R Data Viz Assignment. Using data from your capstone or Final Project, create two charts using ggplot.
- If you need inspiration, use code from our walkthrough today. Or, take a look at some of these simple cool R chart examples.
- Due on Monday by 5 PM.
- Story memo: 50-100 words about Final Project progress over last week