Making Great Tables with gt

Formatting Tables

This week, we will practice making nice, report-worthy tables!

I would recommend you think of tables as no different from the visualizations you’ve been making. We want every aspect of our tables to be clear to the reader, so that the comparisons we want them to make are straightforward. You should be thinking about:

  • Column headers
  • Grouping headers
  • Order of columns
  • Order of rows
  • Number of decimals included for numeric entries
  • etc.

Tables are also a great avenue to display creativity! In fact, there is a yearly RStudio table contest, and here is a gallery of the award-winning tables!

There are many packages for generating tables, but we recommend the gt (“great tables”) R package.

For simple tables, the basic gt() function from the gt package works quite well. For more sophisticated tables, you will need to add additional styling functions (e.g., cols_label(), tab_header(), fmt_percent()).

Making Simple Tables

Let’s explore the gt package using the palmerpenguins dataset. We can start with a simple table with information about the dataset. Let’s display the data type of each variable in the palmerpenguins dataset:

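library(tidyverse)       # map_chr(), enframe(), and the dplyr verbs used below
library(palmerpenguins)  # the penguins dataset
library(gt)

penguins |> 
  map_chr(.f = class) |> 
  enframe(name = "Variable", value = "Data Type") |> 
  gt()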
Variable           Data Type
species            factor
island             factor
bill_length_mm     numeric
bill_depth_mm      numeric
flipper_length_mm  integer
body_mass_g        integer
sex                factor
year               integer

You’ll notice that by default, gt produces a nicely formatted HTML table. If I wanted to make the table more professional, I would think about adding the following features:

  • rows ordered to make the information easy to understand
  • bolded column names
  • a table caption or header

penguins |> 
  map_chr(.f = class) |> 
  enframe(name = "Variable", value = "Data Type") |> 
  arrange(`Data Type`) |> 
  gt() |> 
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
    ) |> 
  tab_header(
    title = md("Data Types of Variables in the `penguins` Dataset")
  ) |> 
  tab_caption(
    caption = md("Example 2: `gt` table with simple extras.")
    )
Example 2: gt table with simple extras.
Data Types of Variables in the penguins Dataset
Variable           Data Type
species            factor
island             factor
sex                factor
flipper_length_mm  integer
body_mass_g        integer
year               integer
bill_length_mm     numeric
bill_depth_mm      numeric
Tip: But wait, there’s more!

You’ll notice in the documentation for tab_style() that there are many additional styling options (e.g., color) that we’re not using here.
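As a rough illustration of those extra options, here is a minimal sketch that colors the column labels and highlights some body cells. The three-row table, the colors, and the 3500 g cutoff are arbitrary choices made for this example, not part of the original tables.

```r
library(gt)
library(dplyr)
library(palmerpenguins)

# Small illustrative table: three columns and the first three rows of penguins.
styled_tbl <- penguins |>
  select(species, island, body_mass_g) |>
  slice_head(n = 3) |>
  gt() |>
  # Color the text of the column labels.
  tab_style(
    style = cell_text(weight = "bold", color = "steelblue"),
    locations = cells_column_labels()
  ) |>
  # Fill only the body-mass cells above an arbitrary cutoff.
  tab_style(
    style = cell_fill(color = "lightyellow"),
    locations = cells_body(columns = body_mass_g, rows = body_mass_g > 3500)
  )

styled_tbl
```

Each tab_style() call pairs one style helper (such as cell_text() or cell_fill()) with one location helper (such as cells_column_labels() or cells_body()).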

Making More Complex Tables

To demonstrate making a more complex table, we’re going to adapt this article from the gt package documentation to the penguins data.

Row Groups

To display observations in groups, we can use group_by() before passing the data into gt(). This step creates a row group, labeled with its own header row, for each unique value of species:

penguins |> 
  group_by(species) |>
  slice_head(n = 10) |> 
  gt()
island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie
Torgersen 39.1 18.7 181 3750 male 2007
Torgersen 39.5 17.4 186 3800 female 2007
Torgersen 40.3 18.0 195 3250 female 2007
Torgersen NA NA NA NA NA 2007
Torgersen 36.7 19.3 193 3450 female 2007
Torgersen 39.3 20.6 190 3650 male 2007
Torgersen 38.9 17.8 181 3625 female 2007
Torgersen 39.2 19.6 195 4675 male 2007
Torgersen 34.1 18.1 193 3475 NA 2007
Torgersen 42.0 20.2 190 4250 NA 2007
Chinstrap
Dream 46.5 17.9 192 3500 female 2007
Dream 50.0 19.5 196 3900 male 2007
Dream 51.3 19.2 193 3650 male 2007
Dream 45.4 18.7 188 3525 female 2007
Dream 52.7 19.8 197 3725 male 2007
Dream 45.2 17.8 198 3950 female 2007
Dream 46.1 18.2 178 3250 female 2007
Dream 51.3 18.2 197 3750 male 2007
Dream 46.0 18.9 195 4150 female 2007
Dream 51.3 19.9 198 3700 male 2007
Gentoo
Biscoe 46.1 13.2 211 4500 female 2007
Biscoe 50.0 16.3 230 5700 male 2007
Biscoe 48.7 14.1 210 4450 female 2007
Biscoe 50.0 15.2 218 5700 male 2007
Biscoe 47.6 14.5 215 5400 male 2007
Biscoe 46.5 13.5 210 4550 female 2007
Biscoe 45.4 14.6 211 4800 female 2007
Biscoe 46.7 15.3 219 5200 male 2007
Biscoe 43.3 13.4 209 4400 female 2007
Biscoe 46.8 15.4 215 5150 male 2007
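If a different order of the row groups would read better, gt's row_group_order() function can rearrange them. A minimal sketch, using three rows per species to keep the output short; the particular order chosen here is arbitrary:

```r
library(gt)
library(dplyr)
library(palmerpenguins)

reordered_tbl <- penguins |>
  group_by(species) |>
  slice_head(n = 3) |>
  gt() |>
  # List the group labels in the order they should appear.
  row_group_order(groups = c("Gentoo", "Chinstrap", "Adelie"))

reordered_tbl
```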

Similar to what I did before, I could arrange this table based on a variable:

penguins |> 
  group_by(species) |>
  arrange(body_mass_g) |> 
  slice_head(n = 10) |> 
  gt()
island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie
Biscoe 36.5 16.6 181 2850 female 2008
Biscoe 36.4 17.1 184 2850 female 2008
Biscoe 34.5 18.1 187 2900 female 2008
Dream 33.1 16.1 178 2900 female 2008
Torgersen 38.6 17.0 188 2900 female 2009
Biscoe 37.9 18.6 193 2925 female 2009
Dream 37.5 18.9 179 2975 NA 2007
Dream 37.0 16.9 185 3000 female 2007
Dream 37.3 16.8 192 3000 female 2009
Torgersen 35.9 16.6 190 3050 female 2008
Chinstrap
Dream 46.9 16.6 192 2700 female 2008
Dream 43.2 16.6 187 2900 female 2007
Dream 40.9 16.6 187 3200 female 2008
Dream 46.1 18.2 178 3250 female 2007
Dream 51.5 18.7 187 3250 male 2009
Dream 45.2 16.6 191 3250 female 2009
Dream 50.3 20.0 197 3300 male 2007
Dream 46.7 17.9 195 3300 female 2007
Dream 48.1 16.4 199 3325 female 2009
Dream 42.5 16.7 187 3350 female 2008
Gentoo
Biscoe 42.7 13.7 208 3950 female 2008
Biscoe 44.5 14.3 216 4100 NA 2007
Biscoe 42.0 13.5 210 4150 female 2007
Biscoe 45.8 14.6 210 4200 female 2007
Biscoe 45.5 13.9 210 4200 female 2008
Biscoe 45.3 13.8 208 4200 female 2008
Biscoe 45.3 13.7 210 4300 female 2008
Biscoe 43.8 13.9 208 4300 female 2008
Biscoe 44.0 13.6 208 4350 female 2008
Biscoe 46.2 14.1 217 4375 female 2009
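One detail worth knowing: arrange() ignores grouping by default, so the pipeline above sorts the entire dataset by body_mass_g, and slice_head() then takes the first 10 rows within each species. If you would rather make the within-group sort explicit, arrange() has a .by_group argument. A minimal sketch, which should select the same rows as above:

```r
library(dplyr)
library(palmerpenguins)

# Sort by body mass within each species, then keep the 10 lightest per species.
lightest <- penguins |>
  group_by(species) |>
  arrange(body_mass_g, .by_group = TRUE) |>
  slice_head(n = 10)

lightest
```

The result can be piped into gt() exactly as before.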

Hiding Columns & Moving Columns

We can use our handy friend the select() function to keep only the columns we are interested in (dropping sex, year, and island) and to put them in the order we want. Now it is much easier to see that the table is arranged based on body_mass_g, since it is the first variable that appears in the table!

penguins |> 
  # Columns appear in the table in the order listed; species is kept for grouping.
  select(species,
         body_mass_g, 
         bill_depth_mm, 
         bill_length_mm, 
         flipper_length_mm) |> 
  group_by(species) |>
  arrange(body_mass_g) |> 
  slice_head(n = 10) |> 
  gt()
body_mass_g bill_depth_mm bill_length_mm flipper_length_mm
Adelie
2850 16.6 36.5 181
2850 17.1 36.4 184
2900 18.1 34.5 187
2900 16.1 33.1 178
2900 17.0 38.6 188
2925 18.6 37.9 193
2975 18.9 37.5 179
3000 16.9 37.0 185
3000 16.8 37.3 192
3050 16.6 35.9 190
Chinstrap
2700 16.6 46.9 192
2900 16.6 43.2 187
3200 16.6 40.9 187
3250 18.2 46.1 178
3250 18.7 51.5 187
3250 16.6 45.2 191
3300 20.0 50.3 197
3300 17.9 46.7 195
3325 16.4 48.1 199
3350 16.7 42.5 187
Gentoo
3950 13.7 42.7 208
4100 14.3 44.5 216
4150 13.5 42.0 210
4200 14.6 45.8 210
4200 13.9 45.5 210
4200 13.8 45.3 208
4300 13.7 45.3 210
4300 13.9 43.8 208
4350 13.6 44.0 208
4375 14.1 46.2 217

Using Formatter Functions

The gt package also comes with powerful functions for reformatting the values displayed in the columns. For example, if you are working with currency data, the fmt_currency() function formats a column so that every value carries the appropriate currency symbol (e.g., $100.35). Similarly, fmt_percent() formats every value in a column as a percentage, with a % symbol after the value (e.g., 75%).
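Since the penguins data has no currency or percentage columns, here is a minimal sketch of these two formatters on a small made-up dataset. The sales tibble and its values are hypothetical and exist only for this illustration.

```r
library(gt)
library(tibble)

# Hypothetical data, made up purely for illustration.
sales <- tibble(
  item = c("Notebook", "Pencil", "Backpack"),
  price = c(100.35, 0.5, 42),
  discount = c(0.75, 0.1, 0.25)
)

sales_tbl <- sales |>
  gt() |>
  # Add a currency symbol to each price (US dollars here).
  fmt_currency(columns = price, currency = "USD") |>
  # The discounts are stored as proportions; by default fmt_percent()
  # multiplies them by 100 before adding the % symbol.
  fmt_percent(columns = discount, decimals = 0)

sales_tbl
```

Note that fmt_percent() expects proportions by default, so a stored value of 0.75 is displayed as 75%.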

To explore these functions, we will need to do a bit of data summarizing. Let’s calculate the mean for each of the penguin measurement columns.

Important: Efficiency & Readability

Notice that I’m calculating the mean for four different columns. It would be difficult to read, cumbersome to type, and error-prone to use four different lines within summarize() to do these calculations. Instead, let’s use the across() function!

penguins |> 
  select(-c(sex, year, island),
         species,
         body_mass_g, 
         bill_depth_mm, 
         bill_length_mm, 
         flipper_length_mm) |> 
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g), 
           .fns = ~mean(.x, na.rm = TRUE), 
           .names = "{.col}_mean"
           )
    ) |> 
  gt()
Table 1
species bill_length_mm_mean bill_depth_mm_mean flipper_length_mm_mean body_mass_g_mean
Adelie 38.79139 18.34636 189.9536 3700.662
Chinstrap 48.83382 18.42059 195.8235 3733.088
Gentoo 47.50488 14.98211 217.1870 5076.016

The decimal values for these columns are a bit too long. Let’s use this as an opportunity to explore the fmt_number() function:

penguins |> 
  select(-c(sex, year, island), 
         body_mass_g, 
         bill_depth_mm, 
         bill_length_mm, 
         flipper_length_mm) |> 
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g), 
           .fns = ~mean(.x, na.rm = TRUE), 
           .names = "{.col}_mean"
           )
    ) |> 
  gt() |> 
  fmt_number(columns = -species, 
             decimals = 2)
Table 2
species bill_length_mm_mean bill_depth_mm_mean flipper_length_mm_mean body_mass_g_mean
Adelie 38.79 18.35 189.95 3,700.66
Chinstrap 48.83 18.42 195.82 3,733.09
Gentoo 47.50 14.98 217.19 5,076.02

Much better! Now let’s see if we can clean up the column headers!
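Before moving on, one optional refinement: different columns may deserve a different number of decimals. The sketch below applies fmt_number() separately to two of the columns; the choice of one decimal for bill length and whole grams for body mass is only a suggestion, and the table is cut down to two measurements to keep the example short.

```r
library(gt)
library(dplyr)
library(palmerpenguins)

rounded_tbl <- penguins |>
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, body_mass_g),
           .fns = ~ mean(.x, na.rm = TRUE),
           .names = "{.col}_mean")
  ) |>
  gt() |>
  # One decimal place for the bill measurement.
  fmt_number(columns = bill_length_mm_mean, decimals = 1) |>
  # Whole grams for body mass.
  fmt_number(columns = body_mass_g_mean, decimals = 0)

rounded_tbl
```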

Putting Columns Into Groups

All of the measurements in this table are related to various aspects of a penguin’s morphology: bill, flipper, body mass. It seems like this could be a great place to play around with adding groups to these columns.

penguins |> 
  select(-c(sex, year, island), 
         body_mass_g, 
         bill_depth_mm, 
         bill_length_mm, 
         flipper_length_mm) |> 
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g), 
           .fns = ~mean(.x, na.rm = TRUE), 
           .names = "{.col}_mean"
           )
    ) |> 
  gt() |> 
  fmt_number(columns = -species, 
             decimals = 2) |> 
  tab_spanner(
    label = "Bill Measurements",
    columns = c(bill_length_mm_mean, bill_depth_mm_mean)
  ) |> 
  tab_spanner(
    label = "Flipper Measurements",
    columns = flipper_length_mm_mean
  ) |> 
  tab_spanner(
    label = "Body Mass Measurements",
    columns = body_mass_g_mean
  ) 
           Bill Measurements                        Flipper Measurements    Body Mass Measurements
species    bill_length_mm_mean  bill_depth_mm_mean  flipper_length_mm_mean  body_mass_g_mean
Adelie     38.79                18.35               189.95                  3,700.66
Chinstrap  48.83                18.42               195.82                  3,733.09
Gentoo     47.50                14.98               217.19                  5,076.02

Well, now that we have these groups, it seems like we could simplify the names of each group’s columns.

penguins |> 
  select(-c(sex, year, island), 
         body_mass_g, 
         bill_depth_mm, 
         bill_length_mm, 
         flipper_length_mm) |> 
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g), 
           .fns = ~mean(.x, na.rm = TRUE), 
           .names = "{.col}_mean"
           )
    ) |> 
  gt() |> 
  fmt_number(columns = -species, 
             decimals = 2) |> 
  tab_spanner(
    label = "Bill Measurements",
    columns = c(bill_length_mm_mean, bill_depth_mm_mean)
  ) |> 
  tab_spanner(
    label = "Flipper Measurements",
    columns = flipper_length_mm_mean
  ) |> 
  tab_spanner(
    label = "Body Mass Measurements",
    columns = body_mass_g_mean
  ) |> 
  cols_label(
    bill_length_mm_mean = "Mean Length (mm)",
    bill_depth_mm_mean = "Mean Depth (mm)",
    flipper_length_mm_mean = "Mean Length (mm)", 
    body_mass_g_mean = "Mean Mass (g)", 
    species = "Penguin Species"
    )
                 Bill Measurements                  Flipper Measurements  Body Mass Measurements
Penguin Species  Mean Length (mm)  Mean Depth (mm)  Mean Length (mm)      Mean Mass (g)
Adelie           38.79             18.35            189.95                3,700.66
Chinstrap        48.83             18.42            195.82                3,733.09
Gentoo           47.50             14.98            217.19                5,076.02

Adding Color

For our last exploration, let’s add some color to the table! The data_color() function allows us to add colors to the cells of our table. The colors can be added to every cell (as shown below) or only to selected cells (based on a condition that is checked for each row).

penguins |> 
  select(-c(sex, year, island), 
         body_mass_g, 
         bill_depth_mm, 
         bill_length_mm, 
         flipper_length_mm) |> 
  group_by(species) |>
  summarize(
    across(.cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g), 
           .fns = ~mean(.x, na.rm = TRUE), 
           .names = "{.col}_mean"
           )
    ) |> 
  gt() |> 
  fmt_number(columns = -species, 
             decimals = 2) |> 
  tab_spanner(
    label = "Bill Measurements",
    columns = c(bill_length_mm_mean, bill_depth_mm_mean)
  ) |> 
  tab_spanner(
    label = "Flipper Measurements",
    columns = flipper_length_mm_mean
  ) |> 
  tab_spanner(
    label = "Body Mass Measurements",
    columns = body_mass_g_mean
  ) |> 
  cols_label(
    bill_length_mm_mean = "Mean Length (mm)",
    bill_depth_mm_mean = "Mean Depth (mm)",
    flipper_length_mm_mean = "Mean Length (mm)", 
    body_mass_g_mean = "Mean Mass (g)", 
    species = "Penguin Species"
    ) |> 
  data_color(
    method = "numeric",
    palette = "PuOr", 
    reverse = TRUE
  )
                 Bill Measurements                  Flipper Measurements  Body Mass Measurements
Penguin Species  Mean Length (mm)  Mean Depth (mm)  Mean Length (mm)      Mean Mass (g)
Adelie           38.79             18.35            189.95                3,700.66
Chinstrap        48.83             18.42            195.82                3,733.09
Gentoo           47.50             14.98            217.19                5,076.02
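To color only selected cells, data_color() also accepts columns and rows arguments. The sketch below assumes a reasonably recent version of gt (the rows argument was not available in early releases); the cut-down table, the fixed domain, and the condition on species are arbitrary choices for illustration.

```r
library(gt)
library(dplyr)
library(palmerpenguins)

mass_tbl <- penguins |>
  group_by(species) |>
  summarize(body_mass_g_mean = mean(body_mass_g, na.rm = TRUE)) |>
  gt() |>
  fmt_number(columns = body_mass_g_mean, decimals = 2) |>
  data_color(
    columns = body_mass_g_mean,   # color only this column
    rows = species != "Adelie",   # and only the rows meeting this condition
    method = "numeric",
    palette = "PuOr",
    domain = c(3000, 6000)        # fixed scale chosen for this example
  )

mass_tbl
```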

Now go forth and make great tables!

Check-in

  1. The tab_style() function allows you to modify which aspects of a table?

  • title
  • subtitle
  • text of the cells
  • fill of the cells
  • borders of the cells
  • column labels
  • row group labels
  • footnotes
  • source notes
  • none of the above
  • all of the above

  2. To specify multiple styles in the style argument of tab_style(), the styles must be specified as a [vector / data frame / list].

  3. Is it possible to specify multiple locations for the styles to be applied?

  • Yes
  • No
  • Only if they can be selected with the same cells_XXXX() function.

  4. There are two ways to create row groups in gt(). One option is to use group_by() [before / after] inputting the data into gt(). The second option is to use tab_row_group() [before / after] inputting the data into gt().

  5. True or false: The cols_label() function is the only way to change the names of columns in a gt() table.