Making Great Tables with gt
Formatting Tables
This week, we will practice making nice, report-worthy tables!
I would recommend you think of tables no differently from the visualizations you’ve been making. We want all aspects of our tables to be clear to the reader, so the comparisons we want them to make are straightforward. You should be thinking about:
- Column headers
- Grouping headers
- Order of columns
- Order of rows
- Number of decimals included for numeric entries
- etc.
Tables are also a great avenue to display creativity! In fact, there is a yearly RStudio table contest, and here is a gallery of the award-winning tables!
There are many packages for generating tables, but we recommend the gt ("great tables") R package.
For simple tables, the basic gt() function from the gt package works quite well. For more sophisticated tables, you will need to add styling functions (e.g., cols_label(), tab_header(), fmt_percent()).
Making Simple Tables
Let’s explore the gt package using the palmerpenguins dataset. We can start with a simple table with information about the dataset. Let’s display the data type of each variable in the palmerpenguins dataset:
| Variable | Data Type |
|---|---|
| species | factor |
| island | factor |
| bill_length_mm | numeric |
| bill_depth_mm | numeric |
| flipper_length_mm | integer |
| body_mass_g | integer |
| sex | factor |
| year | integer |
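The code for this first table is not shown, so here is a minimal sketch of how it could be produced. It assumes the tidyverse, gt, and palmerpenguins packages are installed; the object name penguin_types is only for illustration.

```r
library(tidyverse)       # purrr::map_chr(), tibble::enframe()
library(gt)
library(palmerpenguins)  # provides the penguins data frame

# One row per variable: its name and its class
penguin_types <- penguins |>
  map_chr(.f = class) |>
  enframe(name = "Variable", value = "Data Type")

# With no extra styling, gt() already gives a clean table
gt(penguin_types)
```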
You’ll notice that by default, gt produces a nicely formatted HTML table. If I wanted to make the table more professional, I would think about adding the following features:
- rows ordered to make the information easy to understand
- bolded column names
- a table caption or header
library(tidyverse)       # map_chr(), enframe(), arrange()
library(gt)
library(palmerpenguins)  # penguins data

penguins |>
  map_chr(.f = class) |>
  enframe(name = "Variable", value = "Data Type") |>
  arrange(`Data Type`) |>
  gt() |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) |>
  tab_header(
    title = md("Data Types of Variables in the `penguins` Dataset")
  ) |>
  tab_caption(
    caption = md("Example 2: `gt` table with simple extras.")
  )

Data Types of Variables in the penguins Dataset

| Variable | Data Type |
|---|---|
| species | factor |
| island | factor |
| sex | factor |
| flipper_length_mm | integer |
| body_mass_g | integer |
| year | integer |
| bill_length_mm | numeric |
| bill_depth_mm | numeric |
You’ll notice in the documentation for tab_style() that there are many additional styling options (e.g., color) that we’re not using here.
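As a rough sketch of those extra options, the example below colors and bolds the column labels in one call. The colors and the object name styled_tbl are arbitrary choices for illustration, not part of the original example.

```r
library(gt)
library(palmerpenguins)

styled_tbl <- penguins |>
  head(5) |>
  gt() |>
  tab_style(
    # several styles at once are passed as a list
    style = list(
      cell_fill(color = "lightblue"),
      cell_text(color = "navy", weight = "bold")
    ),
    locations = cells_column_labels()
  )

styled_tbl
```

The same pattern should work for other parts of the table by swapping cells_column_labels() for another cells_*() helper such as cells_body().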
Making More Complex Tables
To demonstrate making a more complex table, we’re going to adapt this article from the gt package documentation to the penguins data.
Row Groups
To display observations related to a specific group, we can use group_by() before passing the data into gt(). This step creates a row group, with its own label row, for each unique value of species:
penguins |>
  group_by(species) |>
  slice_head(n = 10) |>
  gt()

| island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|
| Adelie | ||||||
| Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
| Torgersen | NA | NA | NA | NA | NA | 2007 |
| Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
| Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
| Torgersen | 38.9 | 17.8 | 181 | 3625 | female | 2007 |
| Torgersen | 39.2 | 19.6 | 195 | 4675 | male | 2007 |
| Torgersen | 34.1 | 18.1 | 193 | 3475 | NA | 2007 |
| Torgersen | 42.0 | 20.2 | 190 | 4250 | NA | 2007 |
| Chinstrap | ||||||
| Dream | 46.5 | 17.9 | 192 | 3500 | female | 2007 |
| Dream | 50.0 | 19.5 | 196 | 3900 | male | 2007 |
| Dream | 51.3 | 19.2 | 193 | 3650 | male | 2007 |
| Dream | 45.4 | 18.7 | 188 | 3525 | female | 2007 |
| Dream | 52.7 | 19.8 | 197 | 3725 | male | 2007 |
| Dream | 45.2 | 17.8 | 198 | 3950 | female | 2007 |
| Dream | 46.1 | 18.2 | 178 | 3250 | female | 2007 |
| Dream | 51.3 | 18.2 | 197 | 3750 | male | 2007 |
| Dream | 46.0 | 18.9 | 195 | 4150 | female | 2007 |
| Dream | 51.3 | 19.9 | 198 | 3700 | male | 2007 |
| Gentoo | ||||||
| Biscoe | 46.1 | 13.2 | 211 | 4500 | female | 2007 |
| Biscoe | 50.0 | 16.3 | 230 | 5700 | male | 2007 |
| Biscoe | 48.7 | 14.1 | 210 | 4450 | female | 2007 |
| Biscoe | 50.0 | 15.2 | 218 | 5700 | male | 2007 |
| Biscoe | 47.6 | 14.5 | 215 | 5400 | male | 2007 |
| Biscoe | 46.5 | 13.5 | 210 | 4550 | female | 2007 |
| Biscoe | 45.4 | 14.6 | 211 | 4800 | female | 2007 |
| Biscoe | 46.7 | 15.3 | 219 | 5200 | male | 2007 |
| Biscoe | 43.3 | 13.4 | 209 | 4400 | female | 2007 |
| Biscoe | 46.8 | 15.4 | 215 | 5150 | male | 2007 |
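Grouping with group_by() is not the only route to row groups. The sketch below shows two alternatives that work on ungrouped data: naming the grouping column in gt(), or defining a group afterwards with tab_row_group(). The object names and the label "Adelie penguins" are made up for illustration.

```r
library(dplyr)
library(gt)
library(palmerpenguins)

# Three rows per species, with the grouping removed again
first_rows <- penguins |>
  group_by(species) |>
  slice_head(n = 3) |>
  ungroup()

# Option A: tell gt() which column holds the group names
tbl_a <- gt(first_rows, groupname_col = "species")

# Option B: define a row group after gt() with a row condition
tbl_b <- gt(first_rows) |>
  tab_row_group(label = "Adelie penguins", rows = species == "Adelie")

tbl_a
tbl_b
```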
Similar to what I did before, I could arrange this table based on a variable:
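penguins |>
  group_by(species) |>
  # arrange() ignores the grouping by default, so this sorts all rows by body mass
  arrange(body_mass_g) |>
  # slice_head() respects the grouping: keep the 10 lightest penguins per species
  slice_head(n = 10) |>
  gt()

| island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |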
|---|---|---|---|---|---|---|
| Adelie | ||||||
| Biscoe | 36.5 | 16.6 | 181 | 2850 | female | 2008 |
| Biscoe | 36.4 | 17.1 | 184 | 2850 | female | 2008 |
| Biscoe | 34.5 | 18.1 | 187 | 2900 | female | 2008 |
| Dream | 33.1 | 16.1 | 178 | 2900 | female | 2008 |
| Torgersen | 38.6 | 17.0 | 188 | 2900 | female | 2009 |
| Biscoe | 37.9 | 18.6 | 193 | 2925 | female | 2009 |
| Dream | 37.5 | 18.9 | 179 | 2975 | NA | 2007 |
| Dream | 37.0 | 16.9 | 185 | 3000 | female | 2007 |
| Dream | 37.3 | 16.8 | 192 | 3000 | female | 2009 |
| Torgersen | 35.9 | 16.6 | 190 | 3050 | female | 2008 |
| Chinstrap | ||||||
| Dream | 46.9 | 16.6 | 192 | 2700 | female | 2008 |
| Dream | 43.2 | 16.6 | 187 | 2900 | female | 2007 |
| Dream | 40.9 | 16.6 | 187 | 3200 | female | 2008 |
| Dream | 46.1 | 18.2 | 178 | 3250 | female | 2007 |
| Dream | 51.5 | 18.7 | 187 | 3250 | male | 2009 |
| Dream | 45.2 | 16.6 | 191 | 3250 | female | 2009 |
| Dream | 50.3 | 20.0 | 197 | 3300 | male | 2007 |
| Dream | 46.7 | 17.9 | 195 | 3300 | female | 2007 |
| Dream | 48.1 | 16.4 | 199 | 3325 | female | 2009 |
| Dream | 42.5 | 16.7 | 187 | 3350 | female | 2008 |
| Gentoo | ||||||
| Biscoe | 42.7 | 13.7 | 208 | 3950 | female | 2008 |
| Biscoe | 44.5 | 14.3 | 216 | 4100 | NA | 2007 |
| Biscoe | 42.0 | 13.5 | 210 | 4150 | female | 2007 |
| Biscoe | 45.8 | 14.6 | 210 | 4200 | female | 2007 |
| Biscoe | 45.5 | 13.9 | 210 | 4200 | female | 2008 |
| Biscoe | 45.3 | 13.8 | 208 | 4200 | female | 2008 |
| Biscoe | 45.3 | 13.7 | 210 | 4300 | female | 2008 |
| Biscoe | 43.8 | 13.9 | 208 | 4300 | female | 2008 |
| Biscoe | 44.0 | 13.6 | 208 | 4350 | female | 2008 |
| Biscoe | 46.2 | 14.1 | 217 | 4375 | female | 2009 |
Hiding Columns & Moving Columns
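We can use our handy friend the select() function to remove columns we aren’t interested in (-c(sex, year, island)) and to reorder the columns we want to keep. Moving body_mass_g to the front makes it much easier to see that the table is arranged by body_mass_g. Be aware that naming a column after a negative selection does not move it, which is why body_mass_g is still the last column in the table below; to reorder, list the columns you want to keep in the desired order.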
| bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|
| Adelie | |||
| 36.5 | 16.6 | 181 | 2850 |
| 36.4 | 17.1 | 184 | 2850 |
| 34.5 | 18.1 | 187 | 2900 |
| 33.1 | 16.1 | 178 | 2900 |
| 38.6 | 17.0 | 188 | 2900 |
| 37.9 | 18.6 | 193 | 2925 |
| 37.5 | 18.9 | 179 | 2975 |
| 37.0 | 16.9 | 185 | 3000 |
| 37.3 | 16.8 | 192 | 3000 |
| 35.9 | 16.6 | 190 | 3050 |
| Chinstrap | |||
| 46.9 | 16.6 | 192 | 2700 |
| 43.2 | 16.6 | 187 | 2900 |
| 40.9 | 16.6 | 187 | 3200 |
| 46.1 | 18.2 | 178 | 3250 |
| 51.5 | 18.7 | 187 | 3250 |
| 45.2 | 16.6 | 191 | 3250 |
| 50.3 | 20.0 | 197 | 3300 |
| 46.7 | 17.9 | 195 | 3300 |
| 48.1 | 16.4 | 199 | 3325 |
| 42.5 | 16.7 | 187 | 3350 |
| Gentoo | |||
| 42.7 | 13.7 | 208 | 3950 |
| 44.5 | 14.3 | 216 | 4100 |
| 42.0 | 13.5 | 210 | 4150 |
| 45.8 | 14.6 | 210 | 4200 |
| 45.5 | 13.9 | 210 | 4200 |
| 45.3 | 13.8 | 208 | 4200 |
| 45.3 | 13.7 | 210 | 4300 |
| 43.8 | 13.9 | 208 | 4300 |
| 44.0 | 13.6 | 208 | 4350 |
| 46.2 | 14.1 | 217 | 4375 |
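In the table above, body_mass_g is still the last column. A minimal sketch of one way to actually bring it forward is to list the kept columns explicitly, in the order you want them; the object name lightest is only for illustration.

```r
library(dplyr)
library(gt)
library(palmerpenguins)

lightest <- penguins |>
  # columns appear in the order they are listed here
  select(species, body_mass_g, bill_length_mm, bill_depth_mm, flipper_length_mm) |>
  group_by(species) |>
  arrange(body_mass_g) |>
  slice_head(n = 10)

# species becomes the row group label, body_mass_g the first data column
gt(lightest)
```

dplyr's relocate() is another option when you only want to move one or two columns and keep everything else where it is.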
Using Formatter Functions
The gt package also comes with powerful functions for reformatting the text displayed in the columns. For example, if you are working with data on currencies, the fmt_currency() function would help you format the column so every value has the necessary currency symbol (e.g., $100.35). Similarly, fmt_percent() formats every value in a column with a % symbol after the value (e.g., 75%); by default it treats the values as proportions and multiplies them by 100.
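Here is a small sketch of both formatters. The sales data frame is hypothetical toy data invented for this example; it is not part of the penguins dataset.

```r
library(gt)

# Hypothetical toy data, made up purely for illustration
sales <- data.frame(
  item  = c("Field guide", "Binoculars"),
  price = c(100.35, 89.5),
  share = c(0.75, 0.25)   # proportions, not percentages
)

sales_tbl <- sales |>
  gt() |>
  # add a currency symbol to the price column
  fmt_currency(columns = price, currency = "USD") |>
  # proportions are multiplied by 100 by default before the % is added
  fmt_percent(columns = share, decimals = 0)

sales_tbl
```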
To explore these functions, we will need to do a bit of data summarizing. Let’s calculate the mean for each of the penguin measurement columns.
Notice that I’m calculating the mean for four different columns. It would be difficult to read, cumbersome to type, and error-prone to use four different lines within summarize() to do these calculations. Instead, let’s use the across() function!
| species | bill_length_mm_mean | bill_depth_mm_mean | flipper_length_mm_mean | body_mass_g_mean |
|---|---|---|---|---|
| Adelie | 38.79139 | 18.34636 | 189.9536 | 3700.662 |
| Chinstrap | 48.83382 | 18.42059 | 195.8235 | 3733.088 |
| Gentoo | 47.50488 | 14.98211 | 217.1870 | 5076.016 |
The decimal values for these columns are a bit too long. Let’s use this as an opportunity to explore the fmt_number() function:
penguins |>
  select(-c(sex, year, island),
         body_mass_g,
         bill_depth_mm,
         bill_length_mm,
         flipper_length_mm) |>
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
           .fns = ~mean(.x, na.rm = TRUE),
           .names = "{.col}_mean"
    )
  ) |>
  gt() |>
  # display every column except species with 2 decimal places
  fmt_number(columns = -species,
             decimals = 2)

| species | bill_length_mm_mean | bill_depth_mm_mean | flipper_length_mm_mean | body_mass_g_mean |
|---|---|---|---|---|
| Adelie | 38.79 | 18.35 | 189.95 | 3,700.66 |
| Chinstrap | 48.83 | 18.42 | 195.82 | 3,733.09 |
| Gentoo | 47.50 | 14.98 | 217.19 | 5,076.02 |
Much better! Now let’s see if we can clean up the column headers!
Putting Columns Into Groups
All of the measurements in this table are related to various aspects of a penguin’s morphology: bill, flipper, body mass. It seems like this could be a great place to play around with adding groups to these columns.
penguins |>
  select(-c(sex, year, island),
         body_mass_g,
         bill_depth_mm,
         bill_length_mm,
         flipper_length_mm) |>
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
           .fns = ~mean(.x, na.rm = TRUE),
           .names = "{.col}_mean"
    )
  ) |>
  gt() |>
  fmt_number(columns = -species,
             decimals = 2) |>
  # each tab_spanner() adds a group header above the listed columns
  tab_spanner(
    label = "Bill Measurements",
    columns = c(bill_length_mm_mean, bill_depth_mm_mean)
  ) |>
  tab_spanner(
    label = "Flipper Measurements",
    columns = flipper_length_mm_mean
  ) |>
  tab_spanner(
    label = "Body Mass Measurements",
    columns = body_mass_g_mean
  )

| species | Bill Measurements | | Flipper Measurements | Body Mass Measurements |
|---|---|---|---|---|
| | bill_length_mm_mean | bill_depth_mm_mean | flipper_length_mm_mean | body_mass_g_mean |
| Adelie | 38.79 | 18.35 | 189.95 | 3,700.66 |
| Chinstrap | 48.83 | 18.42 | 195.82 | 3,733.09 |
| Gentoo | 47.50 | 14.98 | 217.19 | 5,076.02 |
Well, now that we have these groups, it seems like we could simplify the names of each group’s columns.
penguins |>
  select(-c(sex, year, island),
         body_mass_g,
         bill_depth_mm,
         bill_length_mm,
         flipper_length_mm) |>
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
           .fns = ~mean(.x, na.rm = TRUE),
           .names = "{.col}_mean"
    )
  ) |>
  gt() |>
  fmt_number(columns = -species,
             decimals = 2) |>
  tab_spanner(
    label = "Bill Measurements",
    columns = c(bill_length_mm_mean, bill_depth_mm_mean)
  ) |>
  tab_spanner(
    label = "Flipper Measurements",
    columns = flipper_length_mm_mean
  ) |>
  tab_spanner(
    label = "Body Mass Measurements",
    columns = body_mass_g_mean
  ) |>
  # cols_label() changes only the displayed labels, not the column names
  cols_label(
    bill_length_mm_mean = "Mean Length (mm)",
    bill_depth_mm_mean = "Mean Depth (mm)",
    flipper_length_mm_mean = "Mean Length (mm)",
    body_mass_g_mean = "Mean Mass (g)",
    species = "Penguin Species"
  )

| Penguin Species | Bill Measurements | | Flipper Measurements | Body Mass Measurements |
|---|---|---|---|---|
| | Mean Length (mm) | Mean Depth (mm) | Mean Length (mm) | Mean Mass (g) |
| Adelie | 38.79 | 18.35 | 189.95 | 3,700.66 |
| Chinstrap | 48.83 | 18.42 | 195.82 | 3,733.09 |
| Gentoo | 47.50 | 14.98 | 217.19 | 5,076.02 |
Adding Color
For our last exploration, let’s add some color to the table! The data_color() function allows us to add colors to the cells of our table. The colors can be added to every cell (as shown below) or only to select cells (based on a condition that is checked for the rows).
penguins |>
  select(-c(sex, year, island),
         body_mass_g,
         bill_depth_mm,
         bill_length_mm,
         flipper_length_mm) |>
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
           .fns = ~mean(.x, na.rm = TRUE),
           .names = "{.col}_mean"
    )
  ) |>
  gt() |>
  fmt_number(columns = -species,
             decimals = 2) |>
  tab_spanner(
    label = "Bill Measurements",
    columns = c(bill_length_mm_mean, bill_depth_mm_mean)
  ) |>
  tab_spanner(
    label = "Flipper Measurements",
    columns = flipper_length_mm_mean
  ) |>
  tab_spanner(
    label = "Body Mass Measurements",
    columns = body_mass_g_mean
  ) |>
  cols_label(
    bill_length_mm_mean = "Mean Length (mm)",
    bill_depth_mm_mean = "Mean Depth (mm)",
    flipper_length_mm_mean = "Mean Length (mm)",
    body_mass_g_mean = "Mean Mass (g)",
    species = "Penguin Species"
  ) |>
  # no columns argument, so the color scale is applied to every column
  data_color(
    method = "numeric",
    palette = "PuOr",
    reverse = TRUE
  )

| Penguin Species | Bill Measurements | | Flipper Measurements | Body Mass Measurements |
|---|---|---|---|---|
| | Mean Length (mm) | Mean Depth (mm) | Mean Length (mm) | Mean Mass (g) |
| Adelie | 38.79 | 18.35 | 189.95 | 3,700.66 |
| Chinstrap | 48.83 | 18.42 | 195.82 | 3,733.09 |
| Gentoo | 47.50 | 14.98 | 217.19 | 5,076.02 |
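To color only select cells, a sketch along the following lines may help. It assumes a recent version of gt in which data_color() accepts a rows argument; the 4000 g cutoff, the two-color palette, and the fixed domain are arbitrary choices for illustration.

```r
library(dplyr)
library(gt)
library(palmerpenguins)

mass_tbl <- penguins |>
  group_by(species) |>
  summarize(body_mass_g_mean = mean(body_mass_g, na.rm = TRUE)) |>
  gt() |>
  fmt_number(columns = body_mass_g_mean, decimals = 2) |>
  data_color(
    columns = body_mass_g_mean,        # only this column is colored
    rows = body_mass_g_mean < 4000,    # and only rows meeting this condition
    method = "numeric",
    palette = c("white", "orange"),
    domain = c(3000, 6000)             # fixed scale so the colors are comparable
  )

mass_tbl
```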
Now go forth and make great tables!
- The tab_style() function allows you to modify which aspects of a table?
  - title
  - subtitle
  - text of the cells
  - fill of the cells
  - borders of the cells
  - column labels
  - row group labels
  - footnotes
  - source notes
  - none of the above
  - all of the above
- To specify multiple styles in the style argument of tab_style(), the styles must be specified as a [vector / dataframe / list].
- Is it possible to specify multiple locations for the styles to be applied?
  - Yes
  - No
  - Only if they can be selected with the same cells_XXXX() function.
- There are two ways to create row groups in gt(). One option is to use group_by() [before / after] inputting the data into gt(). The second option is to use tab_row_group() [before / after] inputting the data into gt().
- The cols_label() function is the only way to change the names of columns in a gt() table.