Aside from the workshops and talks at #rstudioconf back in January, I picked up many useful nuggets from the stream of conference tweets. I was particularly struck by this one:
Tired: Load R scripts with source()— Travis Gerke (@travisgerke) January 18, 2019
Wired: Load R scripts with callr https://t.co/x2wxIOdmU0
Not feeling the amazement? Okay, let me explain…
Loyal readers may remember my previous blogpost, in which I described starting a new R process to run a plumber API for testing. To do so, I opened up a new RStudio terminal with
rstudioapi::terminalCreate(), then sent it commands with
terminalSend(). My commands sometimes “got lost” if I sent them too soon, so I forced my function to “nap” and try again until the commands were fully complete. A bit of a convoluted workaround, but it worked.
callr package, on the other hand, deals with R processes in a much more sophisticated way. The authors describe it this way:
Call R from R: It is sometimes useful to perform a computation in a separate R process, without affecting the current R process. This package does exactly that.
I was able to catch Gábor Csárdi, the creator and maintainer of the
callr package, at the #TidyverseDevDay for a live demo.
Gábor showed me
r(), which evaluates an expression in another R session, and then returns the output. The first argument is a function, and the second argument is a list of the arguments/parameters to pass into the function.
library(callr) r(function(i) 1 + i, list(i = 5))
##  6
Try running the code again and observe how it freezes up your R session for a moment before returning the answer
6. It sends the call, then waits for the response and returns it to you.
r_bg() (background) function takes the same arguments, but once it sends your expression to the new R session, it stops immediately. You must retrieve the output yourself by using
$get_result() on the R process object.
my_r_process <- r_bg(function(i) 1 + i, list(i = 5)) my_r_process$wait() my_r_process$get_result()
##  6
If you try to
get_result() right away, you get an error (though you may not notice if you run these lines interactively one-by-one).
my_r_process <- r_bg(function(i) 1 + i, list(i = 5)) my_r_process$get_result()
## Error in rp_get_result(self, private): Still alive
Because of this behavior,
r_bg() is much faster than
r(). However, if you try to retrieve the output to soon, you may run into errors.
library(tictoc) tic(); r(function(i) 1 + i, list(i = 5)); toc()
##  6
## 0.48 sec elapsed
tic(); r_bg(function(i) 1 + i, list(i = 5)); toc()
## PROCESS 'Rterm', running, pid 7276.
## 0.12 sec elapsed
For code that takes a while to run, the “notorious”
r_bg() is a clear winner–it keeps your console free to use while working hard on your behalf in the background.
Mixing it up with R6
You can browse through the available functions if you type
<TAB> if nothing shows up automatically). You may notice that some of the options are not functions at all, but rather R6 class generators.
##  "R6ClassGenerator"
What in the world is R6?1 Unlike regular R objects, R6 objects are mutable. That is, you can change them without having to reassign the changed version to the old variable name. This sort of functionality is particularly well-suited to processes.
In pseudo-code, it goes something like this:
# "normal" R - variables don't change unless they are reassigned my_num <- 7 my_num <- multiply_three(my_num) my_num # becomes 21 because it has been re-assigned
# the world of R6 my_num <- MyR6Class$new(7) my_num$multiply_three() my_num # magically becomes 21
Those of you familiar with Python or other programming languages may already have some intuition for mutable objects.
Process versus session
Among the R6 classes in the package are
r_process starts in the background, evaluates an R function call, and then quits.
rp <- r_process$new(r_process_options(func = function(i) i + 1, args = list(i = 2))) rp$wait() # if we call get_result too soon, we get an error rp$get_result()
##  3
## PROCESS 'Rterm', finished.
On the other hand, an
r_session continues to run in the background until it is explicitly closed.
rs <- r_session$new() rs$run(function(i) i + 1, args = list(i = 2))
##  3
rs$run(function() paste(stringr::fruit[1:3], collapse = " and "))
##  "apple and apricot and avocado"
## R SESSION, alive, idle, pid 11176.
R is a single-threaded language, meaning that it can only handle one thing at a time. We have ways of speeding things up with vectorization and parallelization, but
callr provides us with even more flexibility. We can partition code into separate processes ourselves. Next time, instead of
source()ing your script or–as in my previous case–opening up a new RStudio terminal, consider if it’s time to call up an R process.