22 Testing in R

Warning

This document is not rendered by the handbook so some code samples may be out of date/not working. (sorry!)

When developing packages in R, we usually lean on {testthat}. Creating unit tests with {testthat} (leaving aside integration tests for now) is pretty simple, first we write a function which performs some actions, then we write some tests which check that our function still performs those actions - or if following Test Driven Development (TDD) practices, we write the tests first and then write the functionality, in either case, {testthat} makes it pretty seamless.

As a quick refresher, let’s look at how to write some basic unit tests in R; for brevity we won’t follow a strict TDD workflow - we’ll test a couple of functions that we’ve inherited - is_even() and is_odd().

Tip

The is_odd() function calls the is_even() function, so it makes sense to test the is_even() function first. Once we’ve tested the implementation of is_even(), we only need to test the additional logic introduced by is_odd().

22.1 Testable Functions

is_even <- function(number) {
  
  stopifnot(is.numeric(number), number != 0)
  
  return(number %% 2 == 0)
}


is_odd <- function(number) {
  stopifnot(is.numeric(number), number != 0)
  
  return(!is_even(number))
}

Our two functions should take a number as an input, and check that the number isn’t zero. is_even should return TRUE if the number is divisible by 2 and FALSE otherwise. is_odd calls is_even and then flips the truth value, so that if a number isn’t even, and it’s not 0, it’s odd.

It clearly makes sense to test is_even first, because is_odd depends on it. So what should we test? Echoing Einstein’s famous words on simplicity, tests should test everything the function does, nothing more and nothing less. Obviously we should check that our input validation is working, if we input 0 do we get an error, the same if we input a non-numeric. Then we should check a few return values, some that the function should return FALSE to and some that the function should return TRUE to. And then there’s a slightly less obvious test - our function has one argument and that argument is mandatory, i.e. it has no default value; if we’ve made this decision we should have made it for a reason, so we should test the function does error if no value is set.

Reminder that unit tests should:

be simple: tests are not the place to show off what you can do, you should be able to understand at a glance what’s being tested and how the test works i.e. favour writing each test out rather than wrapping a bunch in a vapply() or a map().
be lightweight: you’re usually going to want to write hundreds of them for a package and check them often. Each test should run in fractions of a second, a heavy test suite won’t be used and so becomes self-defeating
be self-contained: don’t pass data around between tests, each test_that block should be able to start and terminate in isolation
be informative: they should provide helpful error messages when they fail, so that you know precisely which part of your code’s logic is broken and where
be comprehensive: if your code should do something, write a test to show it does
look both ways: if your code shouldn’t do something, write a test to show it doesn’t

Writing tests may at times feel cumbersome, but only at the beginning. Once you’ve got a good test suite up development becomes more enjoyable - less anxiety associated with each change you make or feature you implement - and faster (trust!). You should often feel like you’re insulting your own intelligence and that of your colleagues’ by writing such a simple test “Well duh, of course it does that…”

22.2 Our First Tests

There are a number of other, more specific tests for more advanced users, but let’s stick to expect_error, expect_true, and expect_false for now.

library(testthat)
test_that("is_even has an argument called number and it requires an input", {
  
  expect_true(names(formals(is_even)) == "number")
  expect_error(is_even(),
               regexp = 'argument "number" is missing')
})

We’re going to make a change to is_even, to show that these tests can fail if the underlying logic of is_even changes resulting in changes in the function’s behaviour (this isn’t necessary except for explanatory purposes).

is_even_inputs <- function() {
  test_that("is_even has an argument called number and it requires an input", {
    
    expect_true(
      names(formals(is_even)) == "number")
    
    expect_error(
      is_even(),
      regexp = 'argument "number" is missing'
    )
  })
}


is_even_inputs()

Ok, so the test passes. But what if I want to change the input is_even takes to ‘x’ which is a more common input?

is_even <- is_even <- function(x) {
  
  stopifnot(is.numeric(x), x != 0)
  
  return(x %% 2 == 0)
}

is_even_inputs()

We see that we get a test failure: – Failure: is_even has an argument called number and it requires an input —– names(formals(is_even)) == “number” is not TRUE

This is exactly what we wanted. We wrote a function, wrote some tests, changed the function’s behaviour and then running our tests told us that we’d altered the function’s behaviour. At this point we should either fix our function - if indeed we broke it - or update our tests. We’ll fix the function as the tests are still doing what we want them to. Then we’ll check our old tests still pass.

is_even <- function(number) {
  
  stopifnot(is.numeric(number), number != 0)
  
  return(number %% 2 == 0)
}

is_even_inputs()

Ok, so let’s carry on with testing the function. We’ll establish that our function doesn’t take 0 as an input, and that if we feed it a string, or a string that could be coerced into a numeric that the function errors. This last one might seem like a funny test, but we haven’t explicitly asked our function to coerce its inputs, so we should check that it does not.

test_that("is_even errors if given a non-numeric input, or 0 as an input", {
  expect_error(is_even(0),
               regexp = stringr::fixed('number != 0'))
  
  expect_error(
    is_even("string"),
    regexp = "is\\.numeric"
  )
  
  expect_error(
    is_even("10"),
    regexp = "is\\.numeric"
  )
})

And then finally we can test that the return values are what we expect:

test_that("is_even returns a logical, and that logical is TRUE if given an even input, and FALSE if given an odd.", {
  expect_true(
    inherits(is_even(10), "logical")
  )
  
  expect_true(
    inherits(is_even(9), "logical")
  )
  
  expect_true(
    is_even(10) == TRUE
  )
  expect_false(
    is_even(9) == TRUE
    )
  
  #and another value, just to be sure...
  expect_true(
    is_even(10002) == TRUE
  )
})

22.3 Refactoring is_odd

We’ve now tested that our is_even function does what it should, and doesn’t do what it shouldn’t. We could add more tests, like what happens if we input a data frame as number, or a factor? Or if 8938957 and 23665 are odds, but we feel quite confident that our current cases take care of those.

We haven’t tested is_odd yet, but let’s take another look at our function definitions and see if we can’t simplify the logic somehwat.

is_even <- function(number) {
  
  stopifnot(is.numeric(number), number != 0)
  
  return(number %% 2 == 0)
}

is_odd <- function(number) {
  stopifnot(is.numeric(number), number != 0)
  
  return(!is_even(number))
}

We’ve written a pretty lightweight and comprehensive test suite for is_even, so do we just go ahead and write the same tests for is_odd? We don’t really need to, because is_odd calls is_even anyway. So let’s simplify is_odd:

is_odd <- function(number) {
  return(!is_even(number))
}

Informally test a few values:

is_odd("string")
is_odd(0)

So we can see that is_odd is producing the errors we would expect it to, because the logic is cemented in is_even. Our tests for is_odd don’t really need to duplicate this logic, so we could test one each odd-signalling end digit, and each even-signalling end digit.

test_that("is_odd returns TRUE for odd numbers and FALSE for even numbers", {
    
  expect_true(is_odd(11))
  expect_true(is_odd(333))
  expect_true(is_odd(555))
  expect_true(is_odd(37))
  expect_true(is_odd(49))
  
  
  expect_false(is_odd(10))
  expect_false(is_odd(4))
  expect_false(is_odd(638))
  expect_false(is_odd(132))
  expect_false(is_odd(666))
})

It’s overkill to do this, but there’s an important point to be made. You might look at this and think ‘shouldn’t I just apply a list of numbers, rather than write each test out, to avoid duplication?’

test_that("is_odd returns TRUE for odd numbers and FALSE for even numbers", {
  odds <- list(11, 333, 555, 37, 49)
  lapply(odds, function(odd) {
    expect_true(is_odd(odd))
  })
  
  evens <- list(10, 4, 638, 132, 666)
  lapply(evens, function(even){
    expect_false(is_odd(even))
  })
})

22.4 Keep it Simple, Stupid

Whilst this is generally good practice, it’s not ideal in the case of testing because when a test fails, our error messages are less informative. For brevity we’ll add an odd value to our evens list, and apply that list over our tests:

test_that("is_odd returns FALSE when given even inputs",{
  evens <- list(10, 4, 638, 132, 666, 17)
  lapply(evens, function(even){
    expect_false(is_odd(even))
  })
})

We see that the error message we get back doesn’t tell us which of our inputs failed, just that we expected a FALSE and we got a TRUE, somewhere. In this case it’s pretty obvious, but there are times when testing things like shiny UI components where it’s tempting to put all the UI tags into a list of tags and l/vapply them into an expect function to keep the testing code concise and avoid duplication. However, we want our tests to be informative more than we want them to adhere to Do Not Repeat Yourself principles.

23 Testing Shiny apps

Testing feels pretty straightforward for R packages with {testthat} but it was not built with Shiny in mind. Shiny introduces reactive programming to R users, and it’s not self-evident how to test reactive components and applications via {testthat}‘s traditional testing approach. In fact, when I sat down to start testing Shiny apps, I realised that not only could I not see how to do it, I didn’t know how to articulate why I couldn’t just do it. I stared at the screen for a while with that unpleasant sense of ’I don’t know what I’m doing’, looked at a few help pages, and eventually went back to building out more features (don’t do this!).

Let’s steal a basic shiny app from the sidebarLayout documentation. From the code it’s pretty clear that we’ll have a one page app, with a sidebar layout. In the sidebar we’ll have a slider input which allows us to select a number of observations and then in the main panel we’ll output a histogram. The server then reacts to changes in the slider’s input, and generates a new histogram each time.

library(shiny)

# Define UI
ui <- fluidPage(

  # Application title
  titlePanel("Hello Shiny!"),
  sidebarLayout(
    # Sidebar with a slider input
    sidebarPanel(
      sliderInput("obs",
                  "Number of observations:",
                  min = 0,
                  max = 1000,
                  value = 500)
    ),
    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("distPlot")
    )
  )
)

# Server logic
server <- function(input, output, session) {
  output$distPlot <- renderPlot({
    hist(rnorm(input$obs))
  })
}
if (interactive()) {
# Complete app with UI and server components
shinyApp(ui, server)
}

23.1 What’s the problem?

Ideally we want to test the three main components - the UI, the server, and the call to combine the two. The first obstacle is the UI, it’s not a function like we’re used to testing. And we don’t need to write a test to check that it’s still not a function going forwards, we let the Shiny developers write their own tests.

inherits(ui, "function")

I think it’s quite common to build Shiny apps without really knowing what a shiny.tag.list is, and that’s what a UI is.

S3Class(ui)

And then there’s the slightly puzzling unnamed list which has 4 elements

length(ui); names(ui)

What are the elements, and what kinds of test can we run on them?

ui[[1]]
ui[[2]]
ui[2]
ui[3]
class(ui[4][[1]][[1]])

We’ll come back to this shortly.

The next problem is the server object - which is a function - but a slightly esoteric one.

S3Class(server)

It takes three mandatory arguments - input, output, and session. The input argument is quite transparent - we use it to access inputs all the time when building Shiny apps, and similar for outputs with the output$ object. As we can index them with $ we think they’re probably named lists of some description. But session is a little more opaque.

formals(server)

So we know that if we want to test the server function we’ll need to add input, output, and session but we don’t really know what we should add there. Like the UI, we’ll come back to this shortly.

So the shinyApp function, which has a more familiar look about it. It takes our ui and server as inputs, and builds the Shiny app for us. Source code:

function (ui, server, onStart = NULL, options = list(), uiPattern = "/", 
    enableBookmarking = NULL) 
{
    if (!is.function(server)) {
        stop("`server` must be a function", call. = FALSE)
    }
    uiPattern <- sprintf("^%s$", uiPattern)
    httpHandler <- uiHttpHandler(ui, uiPattern)
    serverFuncSource <- function() {
        server
    }
    if (!is.null(enableBookmarking)) {
        bookmarkStore <- match.arg(enableBookmarking, c("url", 
            "server", "disable"))
        enableBookmarking(bookmarkStore)
    }
    appOptions <- captureAppOptions()
    structure(list(httpHandler = httpHandler, serverFuncSource = serverFuncSource, 
        onStart = onStart, options = options, appOptions = appOptions), 
        class = "shiny.appobj")
}

There’s some input validation,

inputs is a list of reactiveValues, output is a list of some values too. Session is… a bit different. And how do we access it programmatically?

Later - refactor to have min > 0, as why would you want 0 breaks and to allow the erro?

24 Testing a Golem Module

There is another layer of complexity if we build our apps with frameworks like {golem}. For the rest of this post, we’ll assume some familiarity with {golem} and its modules.

In my case, the modules that I want to test take reactiveValues from other modules, or reactive objects, such as reactive data frames from other modules. This presents a barrier to testing, as in a general R or testthat session, we’re not in a reactive context.

Now - I’m pretty sure I first read this in Mastering Shiny by Hadley Wickham - but it’s important to remember that in R, virtually everything is a function, and reactives are no different. This means that we can mimic the behaviour of a reactive, by passing in a function to a module.

To make it more concrete, I have a module’s server function which takes an id and a data frame. The function then calls the moduleServer function, which takes the id from my_module_server, and a server function as an input.

my_module_server <- function(id, highlighted_dataframe) {
  moduleServer(id, 
               function(input, output, session) # This is what we'd usually have as our server, e.g. server <- function(input, output, session)
  ) {
    
  } 
}

At the moment the module doesn’t actually do anything, but we have a skeleton in place and we can see that when we call my_module_server we have to provide an input for id and highlighted_dataframe.

Let’s add some real logic, so that our module creates a reactive object, which updates whenever there’s a change in our highlighted_dataframe input, or the updatePlotButton is pressed, and needs to have an x column in the highlighted_dataframe, plus an input set for topN and width + height. This module is a bit more complex, but still less complex than many modules will be.

my_module_server <- function(id, highlighted_dataframe) {
  moduleServer(id, function(input, output, session)) {
    
    reactive_plot <- shiny::eventReactive( c(highlighted_dataframe(), input$updatePlotButton), {
      module_plot <- highlighted_dataframe() %>%
        make_module_plot(
          x_var = x,
          top_n = input$topN
        )
      return(module_plot)
    }
    
    output$modulePlot <- shiny::renderPlot({
      reactive_plot()
    }, res = 100, width = function() input$width, height = function() input$height
    )
    } 
  }
}

In testing this module we want to know that given the right inputs, a plot is rendered. So how do we go about testing it?

My first pass with testServer was to do something like:

testServer(
  app = my_module_server,
  args = list(),
  expr = {
    ns <- session$ns
    
    #Check input isn't set
    expect_true(is.null(session$topN))
    
    #Set input
    session$setInputs(topN = 5)
    expect_true(input$topN == 5)
    
    #... some other code
  }
)

This test passed, as did the other tests that I wrote for the inputs, but then I realised that I could set anything as an input here, and the test would pass

testServer(
  app = my_module_server,
  args = list(),
  expr = {
    ns <- session$ns
    
    #Check input isn't set
    expect_true(is.null(session$topN))
    
    #Set input
    session$setInputs(topN = 5)
    expect_true(input$topN == 5)
    
    session$setInputs(shalabadaba = "shalabadooo")
    expect_equal(input$shalabadaba, "shalabadooo")
  }
)

So I realised that I wasn’t testing what I thought I was, or what I needed to test. So I wanted to get a bit more information about what’s actually happening in the testServer, like is there actually a reactive_plot being generated?

testServer(
  app = my_module_server,
  args = list(),
  expr = {
    ns <- session$ns
    
    print(reactive_plot())
  }
)

So now I get the error that ‘highlighted_dataframe was missing’, which is a mandatory argument for the module server, now we’re getting somewhere. Whereas before the tests were passing because they weren’t really testing anything, the test is now failing in meaningful ways.

In more familiar R terms, the server was waiting until it had to do anything with reactive_plot before raising an error. So how do we solve it and check that a plot really is being generated?

25 Notes from other resources

25.1 Mastering Shiny - Testing (Chapter 21 presently)

You can use browser() inside testServer to see what’s going on with specific values/what your changes do and what will / won’t work…

stopifnot(is.reactive(var)) - nice little trick for input validation in modules, e.g. for highlighted_dataframe()

testServer - Unlike the real world, time does not advance automatically. So if you want to test code that relies on reactiveTimer() or invalidateLater(), you’ll need to manually advance time by calling session$elapse(millis = 300).

testServer() ignores UI. That means inputs don’t get default values, and no JavaScript works. Most importantly this means that you can’t test the update* functions, because they work by sending JavaScript to the browser to simulates user interactions. You’ll require the next technique to test such code.

Wrap testServer in test_that

25.2 Shiny App Packages - Testing (Section 3)

Testing the UI

mod_bigram_network_ui(id = "test")

25.3 Gotchas + Reminders

Browse[1]> class(bigram_reactive) [1] “reactive.event” “reactiveExpr” “reactive” “function”
Browse[1]> x <- bigram_reactive() Browse[1]> class(x) [1] “ggraph” “gg” “ggplot”

bigram_reactive is the reactive expression, untriggered. bigram_reactive() is the actual ggraph/gg/ggplot object now triggered. Always good to remind onself of this and what that means when interacting with the objects at various points of the SWD process.

Can use ui <- mod_bigram_network_ui(id = "test") and type ui to see all of the shiny tags, and then type ui[[1]] to render the UI in a viewer object, maybe easier than end app, run_app() -> click to app.

25.4 Testing interaction with nested modules

the mod_group_vol_time_server gets its filtered data out of the mod_daterange_input_server. This presents a challenge because each module is namespaced, so we can’t just setInputs(dateRange = list(as.Date("2023-01-03"), as.Date("2023-01-09")) because we’d be setting the value of dateRange inside the wrong namespace - the namespace of mod_group_vol_time_server.

This means when we try to access the group_date_range_vot$over_time_data() to generate our groupued volume over time char, we get an error that the dateRange isn’t set. So with the help of ui <- mod_group_vol_time_ui(id = "test") we can look for the correct input to set dateRange to, which is dateRangeGroupVol-dateRange. We’ve unearthed a general truth about nested modules. Our parent module is dateRangeGroupVol, our child is dateRange, so we join the two with dateRangeGroupVol-dateRange, if dateRange had a child module called dateRangeChild, we’d join the three with dateRangeGroupVol-dateRange-dateRangeChild!

session$setInputs(
  # ns("dateRange") = list(as.Date("2023-01-03"), as.Date("2023-01-09")),
  dateBreak = "day",
  height = 600,
  width = 400,
  nrow = 2,
  #So to pass stuff into the modules that need them we can pre-prepend the namespace with syms
  `dateRangeGroupVol-dateRange` = list(as.Date("2023-01-03"), as.Date("2023-01-09"))
)

25.5 Emulating a reactive with the return values specified:

because in our module we try to: label_ids <- as.numeric(selected_range()$key)

We can’t just send in selected_range as c(1, 2, 3). We need to send it in as an object which imitates a reactive, with a return value of list(key = c(…)) so that when we call it in the module later on, we get the current value, and we can access the key. Tricky.

selected_range = function(){ return(list(key = c(1, 2, 3))) }

If wanting to change the length of the generate_dummy_data for whatever reason: function(){return(generate_dummy_data(length = 20))}