Who you gonna call? R processes!

Aside from the workshops and talks at #rstudioconf back in January, I picked up many useful nuggets from the stream of conference tweets. I was particularly struck by this one:

Tired: Load R scripts with source()
Wired: Load R scripts with callr https://t.co/x2wxIOdmU0
— Travis Gerke (@travisgerke) January 18, 2019

Not feeling the amazement? Okay, let me explain…

Loyal readers may remember my previous blogpost, in which I described starting a new R process to run a plumber API for testing. To do so, I opened up a new RStudio terminal with rstudioapi::terminalCreate(), then sent it commands with terminalSend(). My commands sometimes “got lost” if I sent them too soon, so I forced my function to “nap” and try again until the commands were fully complete. A bit of a convoluted workaround, but it worked.

The callr package, on the other hand, deals with R processes in a much more sophisticated way. The authors describe it this way:

Call R from R: It is sometimes useful to perform a computation in a separate R process, without affecting the current R process. This package does exactly that.

The goods

I was able to catch Gábor Csárdi, the creator and maintainer of the callr package, at the #TidyverseDevDay for a live demo.

Gábor showed me r(), which evaluates an expression in another R session, and then returns the output. The first argument is a function, and the second argument is a list of the arguments/parameters to pass into the function.

library(callr)
r(function(i) 1 + i, list(i = 5))

## [1] 6

Try running the code again and observe how it freezes up your R session for a moment before returning the answer 6. It sends the call, then waits for the response and returns it to you.

The r_bg() (background) function takes the same arguments, but once it sends your expression to the new R session, it stops immediately. You must retrieve the output yourself by using $get_result() on the R process object.

my_r_process <- r_bg(function(i) 1 + i, list(i = 5))
my_r_process$wait()
my_r_process$get_result()

## [1] 6

If you try to get_result() right away, you get an error (though you may not notice if you run these lines interactively one-by-one).

my_r_process <- r_bg(function(i) 1 + i, list(i = 5))
my_r_process$get_result()

## Error in rp_get_result(self, private): Still alive

Because of this behavior, r_bg() is much faster than r(). However, if you try to retrieve the output to soon, you may run into errors.

library(tictoc)
tic(); r(function(i) 1 + i, list(i = 5)); toc()

## [1] 6

## 0.244 sec elapsed

tic(); r_bg(function(i) 1 + i, list(i = 5)); toc()

## PROCESS 'R', running, pid 82950.

## 0.021 sec elapsed

For code that takes a while to run, the “notorious” r_bg() is a clear winner–it keeps your console free to use while working hard on your behalf in the background.

Mixing it up with R6

You can browse through the available functions if you type callr:: (hit <TAB> if nothing shows up automatically). You may notice that some of the options are not functions at all, but rather R6 class generators.

class(callr::r_process)

## [1] "R6ClassGenerator"

What in the world is R6?¹ Unlike regular R objects, R6 objects are mutable. That is, you can change them without having to reassign the changed version to the old variable name. This sort of functionality is particularly well-suited to processes.

In pseudo-code, it goes something like this:

# "normal" R - variables don't change unless they are reassigned
my_num <- 7
my_num <- multiply_three(my_num)
my_num # becomes 21 because it has been re-assigned

versus

# the world of R6
my_num <- MyR6Class$new(7)
my_num$multiply_three() 
my_num # magically becomes 21

Those of you familiar with Python or other programming languages may already have some intuition for mutable objects.

Process versus session

Among the R6 classes in the package are r_process and r_session. An r_process starts in the background, evaluates an R function call, and then quits.

rp <- r_process$new(r_process_options(func = function(i) i + 1,
                                      args = list(i = 2)))
rp$wait() # if we call get_result too soon, we get an error
rp$get_result()

## [1] 3

rp

## PROCESS 'R', finished.

On the other hand, an r_session continues to run in the background until it is explicitly closed.

rs <- r_session$new()
rs$run(function(i) i + 1, args = list(i = 2))

## [1] 3

rs$run(function() paste(stringr::fruit[1:3], collapse = " and "))

## [1] "apple and apricot and avocado"

rs

## R SESSION, alive, idle, pid 82966.

Conclusion

R is a single-threaded language, meaning that it can only handle one thing at a time. We have ways of speeding things up with vectorization and parallelization, but callr provides us with even more flexibility. We can partition code into separate processes ourselves. Next time, instead of source()ing your script or–as in my previous case–opening up a new RStudio terminal, consider if it’s time to call up an R process.

For a full explanation of R6, take a look at this chapter in Advanced R.↩