Over the past few months, I worked on several projects that involved accessing web APIs in R, which meant I spent a lot of time puzzling over the functions and code in the httr package. I came to really enjoy referring back to the content() function in particular: it seemed that I learned something new every time I went back to it!

httr basics

You can use httr’s GET() function to read any URL into your R session. However, some URLs are more conducive to parsing and manipulation than others. For example, I can use GET() to retrieve a csv file.

library(httr)
internet_thing <- GET("https://cn.dataone.org/cn/v2/resolve/urn:uuid:9e123f84-ce0d-4094-b898-c9e73680eafa")

If you take a look at the internet_thing, it looks something like this:

The internet_thing includes a url, status_code, and a number of other attributes. The content looks particularly scary because it’s still in a raw format.

When we bring the content() function into the game, voilà: the internet_thing gets automatically parsed into a tibble.

data <- content(internet_thing)
data[, 1:5] # showing just the first 5 columns
## # A tibble: 84 x 5
##    District.Name District.ID School.ID School.Name            Location    
##    <chr>               <int>     <int> <chr>                  <chr>       
##  1 Denali                  2     20010 Anderson School        Anderson    
##  2 Denali                  2     20030 Cantwell School        Cantwell    
##  3 Denali                  2     20040 Tri-Valley School      Healy       
##  4 Denali                  2     28010 Denali Peak Program    Healy       
##  5 Bering Strait           7     70010 Brevig Mission School  Brevig Miss~
##  6 Bering Strait           7     70040 Aniguiin School        Elim        
##  7 Bering Strait           7     70050 Little Diomede School  Diomede     
##  8 Bering Strait           7     70060 Martin L. Olson School Golovin     
##  9 Bering Strait           7     70070 Koyuk-Malimiut School  Koyuk       
## 10 Bering Strait           7     70080 Anthony A. Andrews Sc~ St. Michael 
## # ... with 74 more rows

How did content() understand that the gobbledygook was a csv?

If you look back at the internet_thing, you can see that text/csv is specified as the content-type in the header section. The content() function “sees” this specification and tries to parse the content in an appropriate manner. In this case, that means reading the content of the internet_thing with read_csv(). (If you’re really curious about the specifics, more details are in the code!)
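If you want to see what content() is working with, or skip the automatic parsing, httr exposes a few accessors. Here is a minimal sketch: describe_response() is a made-up helper name, and the calls inside it (status_code(), http_type(), and content() with as = "text") are regular httr functions. Nothing is fetched until you call the helper on a real response.

```r
# Hypothetical helper (not part of httr): collect the pieces of a response
# that content() relies on, plus the body as unparsed text.
describe_response <- function(resp) {
  list(
    status = httr::status_code(resp),   # e.g. 200
    type   = httr::http_type(resp),     # media type taken from the headers
    # as = "text" skips the automatic parser and returns one string
    text   = httr::content(resp, as = "text", encoding = "UTF-8")
  )
}

# Usage (needs httr installed and network access):
# info <- describe_response(internet_thing)
# info$type
```

Passing as = "raw" instead returns the untouched bytes, which is handy when the automatic parser guesses wrong.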

Also: for more info on APIs, the httr quickstart guide and vignettes are solid places to start. The first half of the RStudio Plumber webinar is also a great intro!

This post is dedicated to the small but neat things I learned as I was trying to figure out the logic behind the function, so…

On with the tricks!

Embrace the backtick

When I came across the following piece of code, I wasn’t entirely sure what the %||% operator did, so I tried the normal approach: googling it.

type <- type %||% x$headers[["Content-Type"]] %||%
    mime::guess_type(x$url, empty = "application/octet-stream")

This time, Google totally failed me, so I tried searching for it on the GitHub repo. And failed. Again. I knew it must be there, but I simply could not figure out how to search it!

Thankfully, Twitter was there for me in my time of need:

In retrospect, the backticks are obvious, but I couldn’t quite put my finger on the problem at the time. In fact, I still haven’t figured out how to Google for it directly (so send me a message if you do!).

If you want some more backtick action, try this in your R console:

?`'`

The null-default operator %||%

Also known as the null coalescing operator, this operator allows you to run through a series of values and take the first one that is not NULL.
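To make that concrete, here is a small sketch that mirrors the shape of the httr line above. The operator is defined locally so the snippet runs on its own (recent versions of R also ship one in base), and fake_response and user_type are invented stand-ins, not httr objects.

```r
# Local definition for illustration only
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical stand-in for a response whose headers carry no Content-Type
fake_response <- list(
  url = "https://example.com/data.csv",
  headers = list()
)

user_type <- NULL  # the caller did not override the type

# Evaluated left to right: the first non-NULL value wins
type <- user_type %||%
  fake_response$headers[["Content-Type"]] %||%
  "application/octet-stream"
type
```

Because both the user-supplied type and the header are NULL here, the chain falls all the way through to the final default.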

I later discovered that this operator is exported not only by rlang but also purrr, and is used throughout the tidyverse set of packages.

That inspired me to write my own version of this function, my very own null-na-default operator, which returns the first argument that is neither NA nor NULL. I’ve only found one clunky use-case for it so far, but I’m keeping my eyes out for other opportunities.

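`%|||%` <- function(x, y) {
  # `||` short-circuits, so is.na() is never evaluated on NULL;
  # with `|`, a NULL x yields a zero-length condition and if() errors.
  # The length check keeps the condition a single TRUE/FALSE.
  if (is.null(x) || (length(x) == 1L && is.na(x))) {
    y
  } else {
    x
  }
}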

Check argument inputs with match.arg()

I am slow to develop this particular habit, but in general, it’s always a good idea to check a user’s arguments at the beginning of a function. That way, you can break out of the function early and avoid confusing error messages that get you chasing bugs in the wrong direction.

arg <- "howdy"
match.arg(arg, c("howdy", "aloha", "g'day"))
## [1] "howdy"
arg <- "howday"
match.arg(arg, c("howdy", "aloha", "g'day"))
## Error in match.arg(arg, c("howdy", "aloha", "g'day")): 'arg' should be one of "howdy", "aloha", "g'day"

It also performs partial matching by default, but throws an error if the input matches more than one option.

arg <- "how"
match.arg(arg, c("howdy", "aloha", "g'day"))
## [1] "howdy"
arg <- "how"
match.arg(arg, c("howdy", "aloha", "g'day", "how are ya?"))
## Error in match.arg(arg, c("howdy", "aloha", "g'day", "how are ya?")): 'arg' should be one of "howdy", "aloha", "g'day", "how are ya?"
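Inside a function, the more common pattern is to list the choices as the argument's default value and call match.arg() with just the argument. A minimal sketch, where say_hello() is a made-up function for illustration:

```r
say_hello <- function(greeting = c("howdy", "aloha", "g'day")) {
  # With no second argument, match.arg() takes the allowed choices from
  # the default value of `greeting` in the function signature.
  greeting <- match.arg(greeting)
  paste0(greeting, ", friend!")
}

say_hello()       # nothing supplied: the first choice is used
say_hello("al")   # partial match resolves to "aloha"

# An invalid value stops the function before any real work happens
result <- tryCatch(say_hello("hello"), error = function(e) "rejected")
result
```

This keeps the list of valid values in one place, where it also shows up in the function's usage and documentation.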

switch() out your if-elses

You can’t always escape if-else chains, but sometimes you can use switch() to evaluate different expressions depending on your input. Take this example:

emph <- "smile"

if (emph == "regular") {
    print("how's it going?")
} else if (emph == "smile") {
    print("how's it going? :)")
} else if (emph == "exclamation") {
    print("how's it going?!")
}
## [1] "how's it going? :)"

versus

switch(emph,
       regular = print("how's it going?"),
       smile = print("how's it going? :)"),
       exclamation = print("how's it going?!"))
## [1] "how's it going? :)"

The changes I’ve made here were pretty tame, but really, anything is possible (though you probably don’t want to surprise your users too much).

switch(emph,
       regular = paste("Which would you choose?", pi, "or pie?"),
       smile = 1:3,
       exclamation = factorial(10))
## [1] 1 2 3
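Two more switch() behaviors are worth knowing: an empty value falls through to the next non-empty one, and a trailing unnamed argument acts as the default. Without a default, an unmatched string quietly returns NULL. A short sketch, with ask() as a made-up wrapper:

```r
ask <- function(emph) {
  switch(emph,
    regular = "how's it going?",
    smile = ,                       # empty: falls through to the next value
    grin = "how's it going? :)",
    exclamation = "how's it going?!",
    stop("unknown emph: ", emph)    # unnamed last argument is the default
  )
}

ask("smile")
ask("grin")

# No default supplied: an unmatched string gives NULL (invisibly)
no_match <- switch("shout", regular = "how's it going?")
is.null(no_match)
```

Making the default branch call stop() turns a silent NULL into an immediate, readable error.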

As Jenny Bryan put it in her excellent Code Smells and Feels talk at the 2018 useR conference:

switch() is ideal if you need to dispatch different logic, based on a string.
You are allowed to write a helper function to generate that string.

The tidyverse friend of switch() is case_when(). In Jenny’s words:

dplyr::case_when() is ideal if you need to dispatch different data, based on data (+ logic).
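The practical difference is that switch() handles one string at a time, while case_when() works on whole vectors. A small sketch, assuming dplyr is installed, with example values invented for illustration:

```r
library(dplyr)

emph <- c("regular", "smile", "exclamation", "shrug")

# Each condition is checked element by element; the first TRUE wins,
# and the final TRUE acts as a catch-all.
greeting <- case_when(
  emph == "smile"       ~ "how's it going? :)",
  emph == "exclamation" ~ "how's it going?!",
  TRUE                  ~ "how's it going?"
)
greeting
```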

Note: the talk also covers our friend the null-coalescing operator %||%!

Strange bedfellows

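I thought I was relatively comfortable with lists until I saw functions stored in a list. Check it out for yourself in the code of the parse_auto() helper function, which is called by content(). It still blows my mind a little, but I realize now that it works because functions in R are objects like any other: a collection of them can serve as a lookup table that picks the right behavior by name, similar in spirit to method dispatch in object-oriented programming.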

Depending on what type of internet_thing we give it, parse_auto() will find the function best suited to it. For example, the following snippet (which I’ve shortened from the original) shows that different functions, the aptly named read_xml() and read_csv(), are called depending on whether the metadata of the internet_thing says it’s an XML file or a csv.

parsers$`text/xml` <- function(x, type = NULL, ...) {
  need_package("xml2")
  xml2::read_xml(x, encoding = encoding, ...)
}

parsers$`text/csv` <- function(x, type = NULL, ...) {
  need_package("readr")
  readr::read_csv(x, locale = readr::locale(encoding = encoding), ...)
}
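The same idea can be tried out in a few lines of base R. This is a minimal sketch that assumes a plain named list as the lookup table; httr's own parsers object and its function signatures may well differ, and toy_parsers and parse_by_type() are names made up for illustration.

```r
# A named list whose elements are functions, keyed by media type
toy_parsers <- list(
  `text/csv`   = function(x) utils::read.csv(text = x),
  `text/plain` = function(x) strsplit(x, "\n", fixed = TRUE)[[1]]
)

parse_by_type <- function(x, type) {
  parser <- toy_parsers[[type]]   # NULL when the type is not registered
  if (is.null(parser)) {
    stop("No parser registered for type: ", type)
  }
  parser(x)
}

csv_result  <- parse_by_type("a,b\n1,2\n", "text/csv")
text_result <- parse_by_type("first line\nsecond line", "text/plain")
csv_result
text_result
```

Supporting a new format then means adding one more entry to the list, with no changes to the dispatching function.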

tl;dr - do read the source code!

Admittedly, not all source code is created equal, but try to at least skim through! If you find that you can read through the code almost like it’s English, then you’ve probably found a human-readable gem.