Good data visualization helps you tell your data’s story with minimal misunderstanding.

Here I’m going to explain the datavis rules that I follow to create stable isotopes graphs, along with the associated R code. If you want to follow along and see all the code used for this page, download the R Markdown here and take a look around.

I’m going to use bone collagen data (carbon, nitrogen, sulphur) and strontium from tooth enamel from my PhD Thesis. These are bone samples from individuals excavated from two Polynesian sites, Bourewa (Fiji) and ’Atele (Tonga).

I’ve cleaned the data and put it in this repository as ThesisData.xlsx. Maybe later I’ll share some R tips for cleaning data, but today let’s focus on datavis. We’ll take one component at a time, thinking about why it might be better to go in one art direction rather than another, and looking at the underlying R code for making your graph follow each datavis rule.

Let’s compare a simple carbon and nitrogen graph, with the sites color-coded. The left is letting ggplot decide all aspects of the graph, while the right is how I would prepare a graph for publication.

The first isn’t too bad, especially for initial data exploration, but it’s not ready for presentation. The plot on the right is publication-ready in my eyes, other than not being a high enough DPI for many journals.

Let’s build the graph, one piece at a time.

1. The Background

I love dark themes. I’m writing in one of RStudio’s right now. But, unless you’re creating graphs for a dark-themed presentation or web output, graphs with black backgrounds aren’t going to work well in publication and no one will like you when they’re printing out your paper.

Some people like grid lines but I prefer a clean slate. The first image below is the carbon and nitrogen data plotted using ggplot, with no theme. The second image is the same data, but using the theme_classic() option.
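As a minimal sketch of that comparison: the data frame below holds made-up illustrative values (not the thesis data), and the column names site, d13C and d15N are assumptions chosen for the example.

```r
library(ggplot2)

# Made-up illustrative values, NOT the thesis data; column names are assumed.
isotopes <- data.frame(
  site = rep(c("Bourewa", "'Atele"), each = 3),
  d13C = c(-18.2, -17.6, -18.9, -16.1, -15.4, -16.8),
  d15N = c(9.1, 9.8, 8.7, 11.5, 12.3, 10.9)
)

# ggplot's default look: grey panel with white grid lines
p_default <- ggplot(isotopes, aes(x = d13C, y = d15N, color = site)) +
  geom_point()

# Same data on a white background with axis lines and no grid
p_classic <- p_default + theme_classic()
```

Printing p_default and p_classic one after the other shows the two versions.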

2. Axis Text and Ticks

2.1 Values

When displaying any interval data, don’t display any more decimal places than needed. Otherwise, the graph becomes increasingly difficult to read. Yes, I know that many stable isotopes data are presented to the first or even second decimal. But you don’t need decimal points on your axes for isotope data other than strontium; save that richness of data for tables and text. This might be my biggest pet peeve for the smallest reason. It’s hard to add extraneous decimal places in R, so I’m mostly looking at you, Excel users.

Awful Axes, Part I

Make sure the value text is large enough for most people to read easily. Err on the side of caution: some publishers like to shove your figure into one column of a two-column article layout.

I sometimes angle the x-axis values because I think that makes me look fancy.
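One possible way to set these up, sketched with made-up values and assumed column names. The font size of 14 and the 45 degree angle are placeholder choices, not recommendations, and whole_numbers is a helper name invented for this example.

```r
library(ggplot2)

# Made-up illustrative values, NOT the thesis data; column names are assumed.
isotopes <- data.frame(
  site = rep(c("Bourewa", "'Atele"), each = 3),
  d13C = c(-18.2, -17.6, -18.9, -16.1, -15.4, -16.8),
  d15N = c(9.1, 9.8, 8.7, 11.5, 12.3, 10.9)
)

# Labeller that rounds axis labels to whole numbers. Only use it when the
# breaks already sit on whole numbers, otherwise the labels will mislead.
whole_numbers <- scales::label_number(accuracy = 1)

p <- ggplot(isotopes, aes(x = d13C, y = d15N, color = site)) +
  geom_point() +
  scale_x_continuous(labels = whole_numbers) +
  scale_y_continuous(labels = whole_numbers) +
  theme_classic() +
  theme(
    axis.text = element_text(size = 14),               # larger value text
    axis.text.x = element_text(angle = 45, hjust = 1)  # angled x-axis values
  )
```

Setting hjust = 1 keeps the angled labels lined up under their tick marks.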

2.2 Ticks and value range

Be mindful of the value range you use. R and Excel try to be helpful, but there are two major ways they can be wrong.

Starting at zero

Again, looking at you, Excel.

Awful Axes, Part II

Axis range too narrow

This one’s trickier, and I often see researchers still learning about stable isotopes make this mistake. A population eating only from a C3 environment with no C4 or marine input might show a range in C-values of only 2 per mill. That doesn’t mean your x-axis should only be 2 per mill across! This encourages you and your readers to think about differences in values on too small a scale, and to see patterns that aren’t biologically meaningful. Even if values are tight, widen out to include the expected environmental range.
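A hedged sketch of widening the axes by hand. The limits used here are placeholders to show the mechanics, not a statement of the right environmental range for any region; the data values are made up and the column names are assumed.

```r
library(ggplot2)

# Made-up illustrative values, NOT the thesis data; column names are assumed.
isotopes <- data.frame(
  site = rep(c("Bourewa", "'Atele"), each = 3),
  d13C = c(-18.2, -17.6, -18.9, -16.1, -15.4, -16.8),
  d15N = c(9.1, 9.8, 8.7, 11.5, 12.3, 10.9)
)

p <- ggplot(isotopes, aes(x = d13C, y = d15N, color = site)) +
  geom_point() +
  # Placeholder limits: swap in the range expected for your own environment
  scale_x_continuous(limits = c(-22, -10)) +
  scale_y_continuous(limits = c(4, 16)) +
  theme_classic()
```

Limits set inside a scale drop any points that fall outside them (with a warning), so when narrowing rather than widening, coord_cartesian(xlim = ..., ylim = ...) is the gentler choice because it only zooms.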

2.3 Tick Units

Adjust the major units on your axes according to the range you have. Too few tick marks, and it’s hard to estimate the values using the graph, so the reader has to delve into your raw data. Too many tick marks, and it’s visually really busy. R likes to do its best, but don’t be afraid to be specific, especially for strontium data.
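Being specific about ticks might look like the sketch below. The strontium ratios are invented for illustration, the column name sr and the helper name sr_breaks are assumptions, and the 0.001 spacing is just one plausible choice.

```r
library(ggplot2)

# Made-up illustrative ratios, NOT the thesis data; column names are assumed.
enamel <- data.frame(
  site = rep(c("Bourewa", "'Atele"), each = 3),
  sr   = c(0.7068, 0.7071, 0.7065, 0.7089, 0.7092, 0.7086)
)

# One tick every 0.001 between 0.706 and 0.710
sr_breaks <- seq(0.706, 0.710, by = 0.001)

p <- ggplot(enamel, aes(x = site, y = sr, color = site)) +
  geom_point() +
  scale_y_continuous(
    breaks = sr_breaks,
    limits = range(sr_breaks),
    labels = scales::label_number(accuracy = 0.001)  # three decimals on every label
  ) +
  theme_classic()
```

The same breaks = seq(...) pattern works for carbon or nitrogen axes with whole-number steps.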

2.4 Axis Titles

For each axis displaying isotopic values, you’ll want:

What’s being measured (e.g. \(\delta ^{13}\)C, \(\delta ^{15}\)N)
The unit (i.e. per mill ‰)
The international reference used

For ggplot the labs section is a big help, but it’s not easy to add superscripts, subscripts, Greek characters, or lesser-used scientific units. Below is the code you can copy/paste for easy use in your own graphs. Change out x and y depending on what axis it’s on, and always remember to check that you’ve got matching brackets.

For carbon from collagen, standardized to VPDB: x = expression(paste(delta^13, "C"[collagen], " (\u2030, VPDB)"))

Oxygen from carbonate, standardized to VSMOW, would look like this: x = expression(paste(delta^18, "O"[carbonate], " (\u2030, VSMOW)"))

For strontium ratios: y = expression(paste(""^{87},"Sr/"^86,"Sr"))
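To show where these expressions slot in, here is a small sketch of a full plot using labs(). The data values are made up, the column names are assumed, and the nitrogen title follows the same pattern as the carbon one with AIR as the reference. The names x_title and y_title are just for this example.

```r
library(ggplot2)

# Made-up illustrative values, NOT the thesis data; column names are assumed.
isotopes <- data.frame(
  site = rep(c("Bourewa", "'Atele"), each = 3),
  d13C = c(-18.2, -17.6, -18.9, -16.1, -15.4, -16.8),
  d15N = c(9.1, 9.8, 8.7, 11.5, 12.3, 10.9)
)

# Axis titles built with plotmath; "\u2030" is the per mill sign
x_title <- expression(paste(delta^13, "C"[collagen], " (\u2030, VPDB)"))
y_title <- expression(paste(delta^15, "N"[collagen], " (\u2030, AIR)"))

p <- ggplot(isotopes, aes(x = d13C, y = d15N, color = site)) +
  geom_point() +
  labs(x = x_title, y = y_title) +
  theme_classic()
```

Storing the expressions in variables first makes it easier to spot a mismatched bracket before it ends up buried inside the plot code.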

3. Labels

Sometimes it’s nice to label the points. The package ggrepel lets you make sure your text labels don’t overlap. My example of the basic graphics in ggplot is a little harsh, as I could have used hjust = 0, nudge_x = 0.05 to move the text off of the points, but you can see that geom_text_repel is quick and easy.

If you want to label only a few points on the graph, you can either give geom_text_repel a subset of the data (data = subset(...)) or use annotate('text').
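The three labelling options can be sketched roughly as follows. This assumes the ggrepel package is installed; the sample IDs, values and the d15N > 11 cut-off are all invented for illustration.

```r
library(ggplot2)
library(ggrepel)

# Made-up illustrative values, NOT the thesis data; ids and column names are assumed.
isotopes <- data.frame(
  id   = c("B1", "B2", "B3", "A1", "A2", "A3"),
  site = rep(c("Bourewa", "'Atele"), each = 3),
  d13C = c(-18.2, -17.6, -18.9, -16.1, -15.4, -16.8),
  d15N = c(9.1, 9.8, 8.7, 11.5, 12.3, 10.9)
)

base <- ggplot(isotopes, aes(x = d13C, y = d15N, color = site)) +
  geom_point() +
  theme_classic()

# Label every point, letting ggrepel push labels apart
p_all <- base + geom_text_repel(aes(label = id))

# Label only a subset by handing the layer its own data
p_some <- base +
  geom_text_repel(data = subset(isotopes, d15N > 11), aes(label = id))

# Or place a single label by hand at chosen coordinates
p_one <- base +
  annotate("text", x = -15.3, y = 12.3, label = "A2", hjust = 0)
```

The subset approach keeps the labels tied to the data, while annotate() is handy for a one-off note that does not correspond to a row.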

4. The Data Points

This is my favorite part, where the graph really starts to shine.

4.1 Size

I like to make the points a little bigger than the base size, so long as they’re not overlapping.

4.2 Color

I love color. I often try to create my own color palettes for data presentation, keeping in mind that they should be color-blind and printer friendly. When I don’t want to create a bespoke palette, I use the package viridis. A lot of people like RColorBrewer as well.

The viridis package isn’t great for comparing two groups, so you’re best off coming up with your own color scheme at that point.
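A two-group palette can be set by hand with scale_color_manual(). The two hex codes below are from the Okabe-Ito color-blind-friendly set and are only a suggestion, not the palette used for the figures on this page; the data values are made up and site_colors is a name chosen for the example.

```r
library(ggplot2)

# Made-up illustrative values, NOT the thesis data; column names are assumed.
isotopes <- data.frame(
  site = rep(c("Bourewa", "'Atele"), each = 3),
  d13C = c(-18.2, -17.6, -18.9, -16.1, -15.4, -16.8),
  d15N = c(9.1, 9.8, 8.7, 11.5, 12.3, 10.9)
)

# Named vector so each site always gets the same color
site_colors <- c("Bourewa" = "#0072B2", "'Atele" = "#D55E00")

p <- ggplot(isotopes, aes(x = d13C, y = d15N, color = site)) +
  geom_point() +
  scale_color_manual(values = site_colors) +
  theme_classic()

# With more groups, the viridis package offers a drop-in alternative:
# p + viridis::scale_color_viridis(discrete = TRUE)
```

Naming the vector by site means the colors stay put even if a site is filtered out or the factor order changes.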

4.3 Shapes

I like to use both color and shape where possible, to help set groups apart. Generally, I just let ggplot decide on the shapes.
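Mapping color and shape to the same variable, with slightly larger points, could look like this sketch. The size of 3 is an arbitrary pick for "a little bigger", and the data values are made up.

```r
library(ggplot2)

# Made-up illustrative values, NOT the thesis data; column names are assumed.
isotopes <- data.frame(
  site = rep(c("Bourewa", "'Atele"), each = 3),
  d13C = c(-18.2, -17.6, -18.9, -16.1, -15.4, -16.8),
  d15N = c(9.1, 9.8, 8.7, 11.5, 12.3, 10.9)
)

p <- ggplot(isotopes, aes(x = d13C, y = d15N)) +
  # color and shape both follow site; size is fixed, so it sits outside aes()
  geom_point(aes(color = site, shape = site), size = 3) +
  theme_classic()
```

Because both aesthetics use the same variable, ggplot merges them into a single legend.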

5. Absolute Legend

We’re hitting the point of tiny changes to make your graph just that much nicer. For the legend, as with your x and y axis titles, your variable names might not be formatted in a way you want people to see. In the example we’ve been looking at, my color and shape are determined by the variable site, which isn’t too terrible to read, but I like capitalizing my legend title. I also like to outline my legend box. There are two different ways to do this, but I use the one that most consistently makes a line around the entirety of the box, not just a weird shadow.

When I have some whitespace in the top right corner, I try to make use of that by manually moving the legend over.
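A rough sketch of these legend tweaks follows. Which of the two outlining options is meant above is not stated, so the use of legend.background here is a guess (legend.box.background is the other candidate). The legend coordinates are placeholders and the data values are made up.

```r
library(ggplot2)

# Made-up illustrative values, NOT the thesis data; column names are assumed.
isotopes <- data.frame(
  site = rep(c("Bourewa", "'Atele"), each = 3),
  d13C = c(-18.2, -17.6, -18.9, -16.1, -15.4, -16.8),
  d15N = c(9.1, 9.8, 8.7, 11.5, 12.3, 10.9)
)

p <- ggplot(isotopes, aes(x = d13C, y = d15N)) +
  geom_point(aes(color = site, shape = site), size = 3) +
  # Same title for both aesthetics keeps them in one merged legend
  labs(color = "Site", shape = "Site") +
  theme_classic() +
  theme(
    # Assumed choice of outline method; swap for legend.box.background if preferred
    legend.background = element_rect(colour = "black"),
    # Position as a fraction of the plot area (x, y). ggplot2 3.5.0 and later
    # prefer legend.position = "inside" with legend.position.inside = c(x, y).
    legend.position = c(0.9, 0.85)
  )
```

Nudging the two numbers in legend.position moves the legend around inside the panel until it sits in the empty corner.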

6. Exporting to 5000 DPI

Journals will often ask for images to be 300 dpi or greater, presumably to put them on a billboard in Amsterdam. Here’s a handy bit of code to copy, alter, and paste for export to whatever image type and dpi you need.

Figure_Name <- your ggplot code here
if (!is.null(dev.list())) dev.off()  # close any open graphics device; skipped if none is open
ggsave(filename = "FigName.tiff", plot = Figure_Name, width = 270, height = 135, units = "mm", dpi = 300)

You can change the image type (.tiff, .jpg, etc.), dpi, whatever you need.

7. Bar Graphs For Strontium?

No. Never.

8. What are your thoughts?

This is an early version (v 1.2). Please let me know through GitHub or Twitter what I’ve missed, or if you want to see more isotopes-specific graphing in R.

Stable Isotopes for Archaeology: Data Visualization

C Stantis, PhD