Unit testing in R with testthat#

Course objectives#

  • Understand why automated testing is valuable.

  • Learn how to use the testthat package to write tests for functions in R.

  • Learn a standard practice for organising your functions and tests.

  • Gain some tips and advice for writing tests.

Introduction#

Most people test their code in a manual, ad-hoc fashion. In R, this might typically be in the form of sourcing particular lines in a script, or trying out a function in the console with one or two different inputs and checking by eye that it gives the right answer.

While this is fine up to a point, there is a lot of value in turning testing into an automated process, that is, writing code that runs tests for us. There are several reasons why automated tests are valuable:

  • It leads to fewer bugs. When we test we often take a more critical view of our code and start to consider all the ‘unexpected’ cases where it could go wrong. This helps us identify potential bugs before they become a problem, making our code more robust.

  • They provide a safety net for us when making changes to code. Have you ever encountered a situation where you changed some bit of code, but this led to an error elsewhere that you didn’t foresee? Having a suite of automated tests that we can run whenever we make changes to code gives us confidence that our change didn’t unexpectedly cause incorrect behaviour in another part of the code. This is especially valuable when we’re coming back to code after time away and have forgotten the details of how it works, or if we’re working on someone else’s code: having a suite of tests gives us much more confidence to make changes without fear of unexpected bugs creeping in.

  • It nearly always improves the design of our code. Writing tests for our code encourages us to structure it in such a way that it is easy to test, which often means breaking it down into simpler pieces (functions) that pass data between each other: this is often easier to understand and modify. Furthermore, writing tests forces us to really think about what exactly the code needs to do, in a range of scenarios. The act of writing tests is a great tool for bringing clarity to our thinking.

  • It gives others confidence in our code. Having a suite of tests demonstrates that we’ve taken the time to check it works correctly and provides a precise record of what has been tested (and what might not have been tested).

This tutorial will guide you through writing unit tests in R, using the testthat package. Unit testing, for the purposes of this tutorial, means testing functions that we’ve written ourselves, to check that they behave correctly on different inputs. This tutorial therefore assumes that you have some experience with writing functions in R and using these in your code.

You can install testthat from CRAN in the usual way, using install.packages or renv:

install.packages("testthat")

# Or if using renv for package management:
renv::install("testthat")

A note on testthat before we begin#

The testthat package is designed mainly for writing tests for functions that are in an R package. The R Packages book by Hadley Wickham and Jennifer Bryan (2nd ed at time of writing) provides an excellent handbook for creating your own R packages, including the use of testthat for writing tests for the package. This tutorial presents a simplified and somewhat unorthodox use of testthat that doesn’t require your code to be in a package. However, in the long term you may want to consider putting your functions into a package to use testthat ‘properly’ (besides this, it’s a great way of sharing your functions with others).

Setting up your test files#

Generally we keep our tests in files separate from our main R source files. The recommended practice is to have the following structure in our project:

MyAnalysis/
├── R/
│   ├── data_cleaning.R
│   ├── data_loading.R
│   └── utilities.R
├── ... <other files and folders>
└── tests/
    └── testthat/
        ├── test_data_cleaning.R
        ├── test_data_loading.R
        └── test_utilities.R

Observe the following:

  • All files of functions are kept in a folder called R. This is a standard pattern with R codebases, particularly with R packages. We suggest you keep to this pattern.

  • All files containing tests must begin with either test_ or test-, because testthat assumes this naming convention when looking for tests to run.

  • It is very common practice to have one test file per file of functions, with the test file name derived from the file it corresponds to. This keeps the tests organised and easy to navigate. Only deviate from this convention if you have a compelling reason to.

  • It’s not strictly necessary to keep the tests in a subfolder testthat of tests. But we recommend this because (1) it’s the layout that testthat is designed to work with; (2) it is the layout required by testthat if you later decide to turn your project into an R package; and (3) it is the layout that other developers familiar with testthat will expect, so it is easier for them to follow.

Writing and running your first test#

For the purposes of this tutorial, we’ll assume that our functions to be tested live in the file R/functions.R and that we write the corresponding tests in tests/testthat/test_functions.R.

Suppose we want to test the following function (the classic ‘fizzbuzz’ function). The function takes in a number and does one of the following:

  • returns the string “fizz” if the number is divisible by 3;

  • returns the string “buzz” if it’s divisible by 5;

  • returns the string “fizzbuzz” if it’s divisible by both 3 and 5; and

  • returns the number itself if it’s not divisible by 3 or 5.

fizzbuzz <- function(n) {
  if (n %% 15 == 0) {
    return("fizzbuzz")
  } else if (n %% 3 == 0) {
    return("fizz")
  } else if (n %% 5 == 0) {
    return("buzz")
  } else {
    return(n)
  }
}
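
Before writing any tests, we might try the function out by hand in the console, in the ad-hoc style described in the introduction (the expected results are shown as comments):

fizzbuzz(9)    # "fizz"
fizzbuzz(10)   # "buzz"
fizzbuzz(30)   # "fizzbuzz"
fizzbuzz(7)    # 7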

Our first test checks that fizzbuzz of 1 is equal to 1.

library(testthat)

test_that("fizzbuzz of 1 is 1", {
  expect_equal(fizzbuzz(1), 1)
})
Test passed 🥳

Let’s break this down:

  • The test is contained within the test_that function. The first argument to test_that is a string giving a description of what’s being tested (in this case, "fizzbuzz of 1 is 1"). The second argument is a block of code that will be executed for the test, which has to be contained in curly braces.

  • The block of test code is simple in this case. It contains the function expect_equal that is used to check whether the first argument (fizzbuzz(1) in this case) is equal to the second argument (1 in this case). This function is an example of what testthat refers to as expectations: these are just functions that are used by testthat to check whether some condition is TRUE/FALSE or that some other thing happened (such as an error being raised, for example). We’ll see a few more expectations later.

Note: We need to use expectations

You might wonder whether we could have skipped expect_equal and instead just written

test_that("fizzbuzz of 1 is 1", {
  fizzbuzz(1) == 1
})

The answer is no. Expectations like expect_equal are part of the general machinery that testthat uses when running several tests at once, so we need to use them, rather than plain comparisons, to perform our checks.
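
If you do want to write the comparison yourself, you can wrap it in the expect_true expectation so that testthat still records the result, although expect_equal generally gives more informative failure messages:

test_that("fizzbuzz of 1 is 1", {
  # Wrapping the comparison in an expectation means testthat records a pass/fail
  expect_true(fizzbuzz(1) == 1)
})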

Putting into a file#

Here is how we would put this test into a test file:

# Source the file containing our functions to be tested
source("../../R/functions.R")

test_that("fizzbuzz of 1 is 1", {
  expect_equal(fizzbuzz(1), 1)
})

Note: we need to source the file R/functions.R with a path relative to the directory where the test file test_functions.R lives!

We can then run the tests in this file by running the following in an R console (assuming our working directory is the root folder of our project):

# From the root folder of our project
testthat::test_dir("tests/testthat")

You should see output similar to the following:

> testthat::test_dir("tests/testthat")
| F W  S  OK | Context
|          1 | functions

══ Results ═══════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]

Way to go!

This table shows us that our single test passed!
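
Incidentally, if you only want to run the tests in a single file while you’re working on it, testthat also provides test_file:

# Run just the tests in one file (again from the root folder of our project)
testthat::test_file("tests/testthat/test_functions.R")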

What if our test had failed? Let’s temporarily change our function fizzbuzz to incorrectly return 2 instead of 1. In that case, we would see something like the following:

> testthat::test_dir("tests/testthat")
| F W  S  OK | Context
| 1        0 | functions
──────────────────────────────────────────────────────────
Failure (test_functions.R:4:3): fizzbuzz of 1 is 1
fizzbuzz(1) not equal to 1.
1/1 mismatches
[1] 2 - 1 == 1
──────────────────────────────────────────────────────────

══ Results ═══════════════════════════════════════════════
── Failed tests ──────────────────────────────────────────
Failure (test_functions.R:4:3): fizzbuzz of 1 is 1
fizzbuzz(1) not equal to 1.
1/1 mismatches
[1] 2 - 1 == 1

[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]
Error: Test failures

Notice how the summary table shows there was a single test failure (and no passes), along with a description of which test failed and how it failed.

Exercise: Running your first test#

  • If you haven’t done so already, put the fizzbuzz function into an R source file and put the tests into a corresponding test file, and run the tests from the R console using testthat::test_dir. We suggest following the template project structure above.

  • Once you have done this, try modifying the test and/or function to get some test failures.

Adding more tests#

Adding more tests is simply a matter of adding more test_that calls within the test file:

# In tests/testthat/test_functions.R

source("../../R/functions.R")

test_that("fizzbuzz of 1 is 1", {
  expect_equal(fizzbuzz(1), 1)
})

test_that("fizzbuzz of 2 is 2", {
  expect_equal(fizzbuzz(2), 2)
})

test_that("fizzbuzz of 3 is 'fizz'", {
  expect_equal(fizzbuzz(3), "fizz")
})

When we run testthat::test_dir("tests/testthat"), each of these tests will run and a summary of the results printed. As above, if there are any test failures or errors, these will be reported individually.

> testthat::test_dir("tests/testthat")
| F W  S  OK | Context
|          3 | functions

══ Results ═══════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 3 ]

Exercise: add your own tests#

  • Add more tests to the test file, checking that the correct output is given for the inputs n = 5 and n = 15.

  • What other values do you think should be checked? Add tests for these cases too. (Hint: notice the %% in the if conditions; there’s a quick reminder of how %% behaves below.)
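
If it helps, here’s that reminder of how the modulo operator %% behaves, which might suggest some interesting inputs to test (results shown as comments):

7 %% 3    # 1: the remainder when 7 is divided by 3
0 %% 3    # 0: zero counts as divisible by 3 (and by 5)
-9 %% 3   # 0: negative multiples of 3 also give a remainder of 0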

Examples of other expectations#

So far we have only used the expect_equal expectation in our tests, but there are several others we can use.

Testing for errors with expect_error#

Going back to our fizzbuzz example, notice that there’s been a bit of an implicit assumption that the input is an integer. But the code will actually work fine even if we supply a non-integer (try it!). Perhaps we really want to enforce that only integer arguments n are allowed, and that an error will be raised if n isn’t an integer. Raising errors is a good way to flag to us (or other people) that a function received bad or unwanted input data, stopping the code from doing any further computations with that data.

How could we test this? The testthat package provides the expect_error expectation for this. The basic form for using this is:

expect_error(expr, msg_regexp)

where:

  • expr is the expression to test (e.g. fizzbuzz(3.5))

  • msg_regexp is a string giving a regular expression that the error message should match, or NULL (the default) to not check the message. Often it’s simplest just to write out the whole message, in which case you can also specify the optional argument fixed = TRUE to ensure the error message matches msg_regexp exactly.

Here’s an example of using expect_error to check that fizzbuzz gives an error with a specific message, if a non-integer number is provided:

test_that("fizzbuzz of non-integers gives error", {
  expect_error(
    fizzbuzz(3.5),
    "expected `n` to be an integer, but received a non-integer number instead",
    fixed = TRUE  # to check the error message is equal to the above message
  )
})
── Failure: fizzbuzz of non-integers gives error ───────────────────────────────────────────────────────────────
`fizzbuzz(3.5)` did not throw an error.

Error:
! Test failed
Traceback:

1. (function () 
 . expr)()
2. reporter$stop_if_needed()
3. abort("Test failed", call = NULL)
4. signal_abort(cnd, .file)
5. signalCondition(cnd)

The test failure is expected here because we haven’t yet updated our fizzbuzz function to raise the error! Let’s rectify that:

fizzbuzz <- function(n) {
  # Check that input is an integer
  if (trunc(n) != n) {
    stop("expected `n` to be an integer, but received a non-integer number instead")
  }
  
  if (n %% 15 == 0) {
    return("fizzbuzz")
  } else if (n %% 3 == 0) {
    return("fizz")
  } else if (n %% 5 == 0) {
    return("buzz")
  } else {
    return(n)
  }
}

Now running our new test will pass, because the error is raised with the correct message:

test_that("fizzbuzz of non-integers gives error", {
  expect_error(
    fizzbuzz(3.5),
    "expected `n` to be an integer, but received a non-integer number instead",
    fixed = TRUE  # to check the error message is equal to the above message
  )
})
Test passed 🥇
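
You can also check the new behaviour interactively; calling the function with a non-integer in the console should now stop straight away with output along the lines of the comment below:

fizzbuzz(3.5)
# Error in fizzbuzz(3.5) : expected `n` to be an integer, but received a non-integer number instead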

Exercise: Testing negative n#

  • If you haven’t done so already, update your version of fizzbuzz to include the error check and your test file to include the test for the error.

  • The function fizzbuzz currently works if the argument n is any integer, positive or negative. Modify fizzbuzz to handle the case where n is negative separately and write a test to verify it works. You can do this however you like, but here are three possibilities:

    • Raise an error (like we did above).

    • Return the value as currently calculated but also issue the user with a warning. See the base R function warning and use the expect_warning expectation in your test.

    • Return the value as currently calculated but also print a message to the console using cat. Use the expect_output expectation to test this. (A minimal sketch of expect_warning and expect_output in action is shown after this list.)
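
If you haven’t used expect_warning or expect_output before, here’s a minimal sketch of how they’re called, using a made-up function (describe_speed is invented purely for illustration and isn’t part of the fizzbuzz example):

# A toy function that sometimes warns and always prints to the console
describe_speed <- function(speed) {
  if (speed > 70) {
    warning("that's too fast!")
  }
  cat("Travelling at", speed, "mph\n")
  invisible(speed)
}

test_that("describe_speed warns about high speeds", {
  expect_warning(describe_speed(80), "too fast")
})

test_that("describe_speed prints the speed", {
  expect_output(describe_speed(30), "Travelling at 30 mph")
})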

Other common expectations#

The testthat documentation lists out all the different expectations it offers for writing tests. Here are a selection we think are particularly useful:

  • expect_equal (obviously)

  • expect_null: does code return NULL?

  • expect_true, expect_false: does code return TRUE or FALSE?

  • expect_match, expect_no_match: does a string match a regular expression (or not)?

  • expect_length: does code return a vector of a given length?

  • expect_named: does code return a vector with given names? (Can be used to check that a dataframe has given column names.)

  • expect_setequal(x, y): do x and y define the same sets, i.e. have the same elements, ignoring duplicates and ordering?

  • expect_in(x, y): is every element in x also in y?

  • expect_mapequal(x, y): do x and y have the same names and x[names(y)] == y?

  • expect_error, expect_warning, expect_output: does code raise an error, issue a warning, or print output to the console?
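
As a flavour of how a few of these look in practice, here’s a short sketch (the function make_scores is invented just for this example):

make_scores <- function() {
  c(alice = 10, bob = 7, carol = 12)
}

test_that("make_scores returns a named vector of three positive scores", {
  scores <- make_scores()

  expect_length(scores, 3)
  expect_named(scores, c("alice", "bob", "carol"))
  expect_setequal(names(scores), c("carol", "alice", "bob"))  # order doesn't matter here
  expect_true(all(scores > 0))
})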

Exercise: Other expectations#

Write your own function and test that uses an expectation that you haven’t used yet. The function doesn’t have to do anything sensible; the point is to gain practice using different expectations and reading the testthat documentation.

A more complicated example#

The tests we’ve seen so far have been very simple: just calling one line of code and checking for a particular output, error, etc. Often, though, you will write functions for which a good test needs more code. In this section we’ll look at a more realistic example and leave our toy fizzbuzz example behind.

Use of dplyr

This part of the notes assumes you have the R package dplyr installed.

Our example function for testing is taken from the Writing functions in R short course. The function below is designed to take in medical data of subjects that contains a column BMI of body mass index measurements. After filtering the data to remove rows that have missing BMI scores, it adds a column of discrete BMI categories, based on the following ranges:

  • Normal: 18.5 <= BMI < 25

  • Overweight: 25 <= BMI < 30

  • Obese: 30 <= BMI

  • Underweight: BMI < 18.5

#' Add a column of BMI categories
#'
#' The following ranges are used for the categorization:
#'
#' Normal:      18.5 <= BMI < 25
#' Overweight:  25 <= BMI < 30
#' Obese:       30 <= BMI
#' Underweight: BMI < 18.5
#'
#' If `data` has rows that are missing a BMI score then these rows will be
#' dropped and a warning message displayed.
#'
#' @param data A dataframe containing a column 'BMI' of numerical BMI scores.
#'
#' @returns A dataframe with a new factor column 'BMI_cat', with the levels
#'   ordered "Normal", "Overweight", "Obese", "Underweight".
add_BMI_categories <- function(data) {
  BMI_data <- data |>
    dplyr::filter(!is.na(BMI))
  
  if (nrow(BMI_data) < nrow(data)) {
    n_dropped_rows <- nrow(data) - nrow(BMI_data)
    warning("Dropped ", n_dropped_rows, " rows with missing BMI readings")
  }
  
  BMI_levels <- c("Normal", "Overweight", "Obese", "Underweight")
  BMI_data <- BMI_data |>
    dplyr::mutate(
      BMI_cat = dplyr::case_when(
        BMI >= 18.5 & BMI < 25 ~ "Normal",
        BMI >= 25 & BMI < 30 ~ "Overweight",
        BMI >= 30 ~ "Obese",
        BMI < 18.5 ~ "Underweight"
      )
    ) |>
    dplyr::mutate(BMI_cat = factor(BMI_cat, levels = BMI_levels))
  
  return(BMI_data)
}
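
To get a feel for the function before writing any tests, you might try it on a small made-up dataframe in the console (patients here is just an invented example):

patients <- data.frame(
  id = c(1, 2, 3, 4),
  BMI = c(22.1, NA, 31.0, 17.9)
)

# Based on the documentation above, we'd expect a warning about 1 dropped row,
# with the remaining rows categorised as Normal, Obese and Underweight
add_BMI_categories(patients)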

Exercise: What to test?#

Before continuing, look at the description of the function above and write down all the tests that you think should be written for it.

Starting our test suite#

We think the following would be good things to test for add_BMI_categories (perhaps you thought of others!):

  • Categorises BMI within the above ranges correctly (the normal case, a.k.a. ‘happy path’).

  • Categorises BMI values at the boundary points of the above ranges correctly (edge cases).

  • Filters out any rows with missing BMI values.

  • Gives a warning with the correct number of dropped rows, if there were missing BMI values.

  • Leaves the original columns in the dataframe untouched (both in the case where there are missing BMI values and where there are not).

We could also test the following, though in that case we’d be tempted to modify our function to give better error messages than the ones automatically produced by R:

  • Gives an error if the BMI column is missing.

  • Gives an error if the BMI column is not numeric.

Let’s write a test for the first of the above cases: the ‘happy path’.

test_that("add_BMI_categories categorises BMI within the correct ranges", {
  data <- data.frame(BMI = c(18, 30.1, 27.5, 20))
  
  BMI_levels <- c("Normal", "Overweight", "Obese", "Underweight")
  expected_BMI_cats <- factor(
    c("Underweight", "Obese", "Overweight", "Normal"),
    levels = BMI_levels
  )
  expected_result <- data |>
    dplyr::mutate(BMI_cat = expected_BMI_cats)
  
  expect_equal(add_BMI_categories(data), expected_result)
})
Test passed 🌈

Observe the following:

  • We can put multiple lines of code within a test. In fact, it’s rare to be able to write tests in just one line.

  • We cooked up a simple dataframe data that allowed us to test the behaviour we wanted. In this case, we know what the correct answer should be, so can test the output of add_BMI_categories directly. In general, we want our test data to give a fair representation of what we’re trying to test, but beyond that it should be simple e.g. don’t use 10 rows when 4 will suffice.

  • We’ve tried to keep our test code as easy to read as possible and adopt good programming practices. This is important, because we need to check that the test is correct by eye!

  • The overall structure of the test can be broken down into three parts, sometimes referred to as ‘Arrange-Act-Assert’:

    1. Set up the arguments we want to supply to our function-under-test.

    2. Call the function.

    3. Check something about the result. (In the above code, 2 and 3 are merged into the last line; see the sketch below for a version with the three stages separated.)
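
Purely as an illustration, here is the same test again with the three stages written out separately:

test_that("add_BMI_categories categorises BMI within the correct ranges", {
  # Arrange: set up the input data and the result we expect
  data <- data.frame(BMI = c(18, 30.1, 27.5, 20))
  BMI_levels <- c("Normal", "Overweight", "Obese", "Underweight")
  expected_result <- data |>
    dplyr::mutate(
      BMI_cat = factor(
        c("Underweight", "Obese", "Overweight", "Normal"),
        levels = BMI_levels
      )
    )

  # Act: call the function-under-test
  result <- add_BMI_categories(data)

  # Assert: check the result is what we expected
  expect_equal(result, expected_result)
})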

Other than that, writing tests is really no different from writing any other code.

Exercise: Continuing the test suite#

Implement the other tests that were identified above (and/or implement the tests you came up with).

Exercise: Changing the function’s behaviour#

Suppose we no longer want to filter out the rows with missing BMI values, but instead should add NA as the BMI_cat for any rows where BMI is missing.

  1. Remove any tests that tested the old filtering behaviour.

  2. Modify the code of the function to do this and write test(s) to check this new behaviour. If you want to experiment with something new, try doing this by implementing the test(s) before changing the function, in an incremental fashion: add a test, then change the function just enough to make it pass the new test (and also the old tests), then repeat this process with any extra tests until you’ve made all the required changes to the function.

General advice / tips#

We conclude this tutorial on unit testing with some general advice and references to further information for more advanced testing cases. An excellent general guide to using testthat and how to think about testing in R can be found in the R Packages book (chapters 13 – 15 at the time of writing).

The testing mindset#

When we test a function, we are basically putting it under scrutiny: we try to think about ways it could go wrong and check that it behaves correctly. Often this includes considering cases we might not have thought about when we wrote the original function: for example, what if NULL or NA gets supplied as the value of one of the arguments? Or what if a function that takes in a vector receives a vector of length zero? Or what if there are no rows left after doing some filtering of a dataframe?

Keep the tests passing!#

Hopefully this one is obvious: Always ensure that all your tests pass. If a test starts to fail, then investigate why — don’t just ignore it! If it shows that there is a bug in your code, then fix it. If on the other hand the test needs updating to be correct, then change it. Or, if the test is no longer relevant, then delete it.

Re-run all the tests whenever you change your code#

The whole point of having a suite of unit tests is to catch bugs that might have unexpectedly crept into our code after we make a change. Therefore, run your tests whenever you make changes to your functions that are under test. We want to stress two points here:

  • You should run all of your tests, not just the ones for the function(s) you have changed. This is to ensure that your changes haven’t adversely affected any other functions that rely on the modified functions. In practice, using testthat::test_dir means it’s no harder to run all your tests than just a few of them.

  • Don’t make a big change to your functions all in one go without running your tests. Instead, break down big changes into smaller steps and run the tests after each of the smaller steps. If at any point a test fails after a change, having made smaller changes will help you identify the cause of the failure much more quickly than if you had made a whole lot of changes at once. For the same reason, where possible we recommend only changing one function at a time between test runs.

Keep tests organised#

We’ve already discussed how the usual convention is to have one test file for each R source file. In addition to this, keep all the tests within a test file grouped together by function, as this makes it easier to navigate and find your tests. Also make it clear in the description argument of test_that which function is being tested, for example:

test_that("my_func does such-and-such a thing", {...})

Isolated tests#

Strive to ensure that each test_that test can be run completely on its own and that all the tests in your test files can be run in any order: don’t assume that testthat will run the tests in a particular order. In particular, modifying global variables that are shared between tests is a very bad idea, because executing one test affects the setup conditions for the other tests in this case (which is a very tricky source of bugs). It’s OK to repeat code / data between tests, because this keeps them independent. If you really need to use the same object between tests, look into using test fixtures instead.
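
For example, rather than creating a shared dataframe at the top of the test file and reusing (or worse, modifying) it across tests, prefer giving each test its own copy, even if that means repeating a couple of lines:

test_that("first test", {
  df <- data.frame(x = 1:3)  # each test builds its own data...
  expect_length(df$x, 3)
})

test_that("second test", {
  df <- data.frame(x = 1:3)  # ...repeated deliberately, to keep the tests independent
  expect_equal(sum(df$x), 6)
})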

Test behaviour, not implementation#

In a nutshell, functions are like little machines that take input (through the arguments) and then return some value as the output and/or have some other side effect (like printing a message to the console or writing a file). Focus on testing the behaviour of functions and not how they’re implemented. Put another way, how could you test that this machine is working correctly without peering into its internal mechanics? This approach is often called black-box testing: we test the function based just on its ‘interface’ (the inputs, outputs and any side effects). The advantage of this approach is that it allows us to change the internal workings of the function without affecting the tests. Another way to think about this: write your tests as if they were little snippets of example code showing a user of the function what it does and how to use it.

How much testing?#

Once you get into testing, it can sometimes be difficult to know where to stop and when you’ve done ‘enough’. It’s difficult to give a hard-and-fast rule about this because it’s context-specific. But you might find it helpful to think in terms of taking a risk-based approach. In general, the risk of an event is defined as the product of (1) the impact of the event occurring, and (2) the likelihood of it occurring. So consider focussing on cases where the consequences of incorrect behaviour are significant (such as an incorrect calculation silently carrying through the analysis) or which are likely to occur (e.g. testing a function that cleans data that is very likely to contain missing values). Pay particular attention to the cases which are high risk, i.e. likely to occur and with a significantly negative impact if the code is wrong! On the other hand, if something is unlikely to happen and the consequences of incorrect code are minor, then maybe it’s not worth worrying about testing it.

Keep tests simple#

Strive to keep tests simple. Sometimes we find that to test a particular behaviour of a function, we need quite a lot of set-up code, or that it’s difficult to ‘detect’ the thing we’re trying to check. In that case, consider whether you could break up the function into smaller parts that can be tested independently, or whether maybe that function is not quite the ‘right’ function and rewriting it would make it easier to test. Often, allowing the tests to inform the design of your functions in this way leads to code which is easier to change and understand, because it is broken down into pieces that can be changed / replaced with minimal impact on other parts of the code. Aside from this, keeping tests simple means we are less likely to make mistakes when writing the tests themselves.

Bug -> test#

Whenever you discover a bug in your code, before fixing your code write a test that detects that bug (i.e. fails initially but will pass when the code is corrected). This is a great way of building up a suite of tests and ensuring that bugs don’t re-emerge in the future.
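
For example, suppose (hypothetically) we discovered that fizzbuzz(20) was wrongly returning "fizz" because of a typo in one of its conditions. Before touching the function, we would add a test that pins down the correct behaviour and fails with the current code:

test_that("fizzbuzz of 20 is 'buzz'", {
  expect_equal(fizzbuzz(20), "buzz")
})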

Tools for testing more challenging things#

Occasionally you might need to write a test that involves making a change outside the test itself, like writing a file or updating some global setting. In this case, we need to ensure that the state of the system is returned to how it was before the test was run — not doing so can lead to situations where tests start incorrectly failing for no apparent reason. The testthat package provides test fixtures for situations like these: see this vignette on test fixtures for more details.
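
As a small taste of what this looks like, here’s a sketch of a test for a hypothetical function save_results that writes a CSV file, using withr::local_tempfile so the file is cleaned up automatically when the test finishes (this assumes the withr package is available; testthat itself uses it, so it will usually already be installed):

# A hypothetical function-under-test that writes a dataframe to a CSV file
save_results <- function(data, path) {
  write.csv(data, path, row.names = FALSE)
}

test_that("save_results writes a readable CSV file", {
  # local_tempfile() returns a path that withr deletes when the test finishes
  path <- withr::local_tempfile(fileext = ".csv")

  save_results(data.frame(x = 1:3), path)

  expect_true(file.exists(path))
  expect_equal(read.csv(path)$x, 1:3)
})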

We’ll also mention here that the testthat package provides something called snapshots to help with writing tests where the expected output is difficult to create manually, and where we mostly just want to check that the output is ‘the same’ each time we run the test. This can be particularly helpful e.g. for checking that an image file of a graph hasn’t changed. See this vignette on snapshot tests for details.