A Tale of Two Testing Environments

Background
Lesson 1: check suggested packages
Lesson 2: MODULARIZE + use vagrant scripts with caution
Internet solutions
Conclusion

Today marks the second time I’ve debugged the problem of tests that pass with devtools::test() but fail with devtools::check(). Since I’m now riding my debug-success high (and hope never to repeat this again), here is a blogpost.

Background

There are some resources online (see below) to help with debugging this particular problem, but they are sparse and situation-specific. In the Automated Checking chapter of R packages, Hadley writes:

Occasionally you may have a problem where the tests pass when run interactively with devtools::test(), but fail when in R CMD check. This usually indicates that you’ve made a faulty assumption about the testing environment, and it’s often hard to figure it out.

This is so true, but also not really what I wanted to read when I was stuck in this situation.

Lesson 1: check suggested packages

My first experience with the two-testing-environment problem involved the httr package, which I wrote about in my previous blogpost. In this package, the content() function takes an API response and by default, tries to parse the response. I happily used this to automatically read in a csv–that is, until I realized that it was the source of my debugging frustration.

Turns out, the content() function calls readr::read_csv() by default. Since readr is listed in Suggests in the httr package, it was not accounted for in R CMD CHECK. Once I added it to the DESCRIPTION file of my own package, all tests were happy once again.

Lesson 2: MODULARIZE + use vagrant scripts with caution

This time around, I knew right away that the problem had something to do with my environment, but I still needed hours to figure it out (granted, I ran into this bug late on a Friday so perhaps the other lesson here is…try again Monday). This time, I was in the unfortunate position of a test breaking on a “workhorse” function which–in retrospect–was trying to do way too much:

it sourced functions and ran them (there is a complicated reason for this and yes, this is where it all broke down)
it copied files from one place to another (and renamed them in the process)
it created a file based off of a template function
it read a YAML file

The YAML part was actually the easiest, since it was already its own function. I wrote some tests and confirmed that this was not the breaking piece.

The other parts were a bit harder to figure out because they were not modularized. Starting out with functional programming, it’s easy to get stuck on the “three times” rule:

You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code). (R4DS)

or in Twitter form:

Time to write a function https://t.co/QF2Nmeytb6
— Hadley Wickham (@hadleywickham) September 17, 2017

However, even if they are only used once, functions help keep code modular and readable. Modularity is particularly useful for testing and debugging. Readability can be enhanced by using small, well-named helpers rather than comments. Check out Code Smells and Feels for an example:

Finally, I dug into my ill-fated sourcery. Because these extra functions were not an official part of the package (they live in inst), there was no automatic check on them to catch packages not listed in the DESCRIPTION file. Thus, they were only tested when called explicitly, and broke in weird ways.

Coincidentally, the culprit happened to be readr again. Once I put it into my DESCRIPTION, I was on the home stretch.

In retrospect, the error message about failing namespaces makes a bit more sense:

checking tests ... ERROR
  Running 'testthat.R'
Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:
  5: getExportedValue(pkg, name)
  6: asNamespace(ns)
  7: getNamespace(ns)
  8: tryCatch(loadNamespace(name), error = function(e) stop(e))
  9: tryCatchList(expr, classes, parentenv, handlers)
  10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
  11: value[[3L]](cond)

Internet solutions

In my meanderings through GitHub issues and StackOverflow, I came across some other solutions to this problem:

In this GitHub issue, lhsego writes: “Simply add Sys.setenv("R_TESTS" = "") as the first line in tests/testthat.R”
In the same issue, espinielli writes: “In my particular case, the discrepancy I got between devtools::test() and devtools::check() was due to having the definition of a new unit in a .R file rather than inside .onLoad() (and removing it in .onUnload()).”
On StackOverflow, wligtenberg ended up self-solving: “In the end, the issue was trivial. I used base::sort() to create the levels of a factor. (To ensure that they would always align, even if the data was in a different order.) The problem is, that the default sort method depends on the locale of your system. And R CMD check uses a different locale than my interactive session.”

In response to my tweet about this blogpost, Tony Elhabr pointed out this passage in the Tests chapter for the R Packages book that highlights a few other gotchas:

I suppose so... Or just read the R Packages book a bit more closely 😄 pic.twitter.com/yuY8Dji3qS
— Tony (@TonyElHabr) August 12, 2018

Conclusion

Debugging this kind of error is no joke! In the future, I would start with the following:

check the DESCRIPTION file–does it really list all the packages I’m using?
check the Suggests section in other packages–make sure you list any relevant packages in your DESCRIPTION file
clear your environment before running interactive tests and check for the environmental variables listed above

Finally, be strategic: isolate the problem, test it systematically, and learn from the pain of those who debugged before you.