Measuring Speed with Microbenchmark

%load_ext rpy2.ipython
%%R
suppressPackageStartupMessages({
  suppressMessages({
    library(microbenchmark)
  })
})

One way of measuring the time taken to run an R function is to invoke the Sys.time() function from the base R package. Consider the following toy function:

%%R
sleep <- function(sleep_time, print_string = FALSE) {
  # type checking: sleep_time must be a non-negative double
  stopifnot(typeof(sleep_time) == "double", sleep_time >= 0)

  # sleeping
  Sys.sleep(sleep_time)

  # print a string if print_string is TRUE
  if (print_string) {
    print("I am awake now")
  }
}

This function takes an input variable called sleep_time, checks that sleep_time is a non-negative double, and pauses execution for sleep_time seconds. If print_string is TRUE, it then prints “I am awake now” to the console. The functions you work with will naturally be more interesting than this toy function, but it remains useful for teaching. To time this function with Sys.time(), we record the clock time before and after the call and take the difference:

%%R
t_start <- Sys.time()
sleep(5)
t_end <- Sys.time()

t_end - t_start
Time difference of 5.005392 secs
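The difference t_end - t_start is a difftime object whose units are chosen automatically: seconds here, but minutes for a run lasting longer than a minute. Below is a minimal sketch of a reusable helper that always returns seconds. time_it() is a hypothetical name, not a function from base R or microbenchmark. It relies on R's lazy evaluation so that the expression runs only between the two Sys.time() calls.

```r
# time_it() is a hypothetical helper, not part of base R or microbenchmark.
# It evaluates `expr` once and returns the elapsed wall-clock time in seconds.
time_it <- function(expr) {
  t_start <- Sys.time()
  force(expr)  # lazy evaluation: the expression is run here, between the two clock reads
  t_end <- Sys.time()
  # difftime() with units = "secs" avoids the automatic switch to "mins"
  as.numeric(difftime(t_end, t_start, units = "secs"))
}

elapsed <- time_it(Sys.sleep(0.2))
print(elapsed)
```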

The code above times sleep() exactly once. Because of uncontrollable variation (for example, other processes competing for the CPU), a function will not take exactly the same time on every run, so it is better to run it many times. One simple option is to wrap the repeated calls in system.time(), also from base R, and divide the elapsed time by the number of runs:

%%R
# Run the function n_runs times and report the average time
n_runs <- 10
sleep_time <- 0.1
time_taken <- system.time(replicate(n_runs, sleep(sleep_time)))
avg_time <- time_taken["elapsed"] / n_runs
avg_time
elapsed 
 0.1045 
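An average hides how much individual runs vary. The minimal sketch below keeps every run time by timing each call separately with system.time() and then summarising the results. It uses Sys.sleep() as a stand-in for the sleep() toy function so that the block runs on its own. system.time() has fairly coarse resolution (typically milliseconds), which is one reason to reach for a dedicated benchmarking tool.

```r
n_runs <- 10
sleep_time <- 0.1

# Time each run separately so the individual run times are kept.
# Sys.sleep() stands in for the sleep() toy function defined earlier.
run_times <- vapply(seq_len(n_runs), function(i) {
  system.time(Sys.sleep(sleep_time))[["elapsed"]]
}, numeric(1))

# Minimum, quartiles, median, mean and maximum of the run times (seconds)
summary(run_times)
```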

The microbenchmark() function from the microbenchmark package provides a more accurate alternative. Its documentation (see ?microbenchmark) describes it as a more accurate replacement for the system.time(replicate(...)) pattern used above. According to that documentation, microbenchmark:

  • Tries hard to accurately measure only the time it takes to evaluate the expression.

  • Uses sub-millisecond timing functions built into your OS for increased accuracy.

Let’s try it out.

%%R

# benchmark sleep(0.1) 100 times; with unit = 'ns' the summary is in nanoseconds, so divide by 1e9 to print seconds
results <- microbenchmark(sleep(0.1), times = 100, unit = 'ns')
results_summary <- summary(results)
print(results_summary$min/1e9)
print(results_summary$mean/1e9) 
print(results_summary$max/1e9)
[1] 0.09879479
[1] 0.1021442
[1] 0.1036635
In addition: Warning message:
In microbenchmark(sleep(0.1), times = 100, unit = "ns") :
  less accurate nanosecond times to avoid potential integer overflows
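Instead of dividing by 1e9 by hand, you can pass a unit argument to summary() on a microbenchmark object. The sketch below benchmarks a short 10 ms sleep so that it runs quickly, asks for milliseconds, and selects the summary columns by name. bm and bm_summary are illustrative names only.

```r
library(microbenchmark)

# Benchmark a short sleep 20 times (raw timings are stored in nanoseconds)
bm <- microbenchmark(Sys.sleep(0.01), times = 20)

# Ask summary() for milliseconds instead of converting by hand
bm_summary <- summary(bm, unit = "ms")
print(bm_summary[, c("min", "lq", "mean", "median", "uq", "max", "neval")])
```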

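Unlike the approaches above, microbenchmark() keeps the time of every individual run. Its summary therefore reports the minimum, lower quartile (lq), mean, median, upper quartile (uq) and maximum evaluation time, together with the number of evaluations (neval). The raw timings are stored in results$time, in nanoseconds. Let’s plot a histogram of the results.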

%%R
hist(results$time/1e9, xlab = "Time (s)", main = "Histogram of Microbenchmark Results")
[Figure: "Histogram of Microbenchmark Results", x-axis "Time (s)"]
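A histogram shows the shape of the run-time distribution, and percentiles give a compact numeric summary of the same information. The sketch below again uses a short Sys.sleep() call purely for illustration. It reads the raw per-run timings from the time column (in nanoseconds), converts them to seconds, and computes the 5th, 50th and 95th percentiles with quantile().

```r
library(microbenchmark)

bm <- microbenchmark(Sys.sleep(0.01), times = 50)

# bm$time holds one raw timing per run, in nanoseconds; convert to seconds
times_s <- bm$time / 1e9

# 5th, 50th and 95th percentiles summarise the spread seen in a histogram
q <- quantile(times_s, probs = c(0.05, 0.5, 0.95))
print(q)
```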

Exercise

Try microbenchmarking one of your own functions.
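As a starting point, the sketch below benchmarks a hypothetical user-written loop_cumsum() against the built-in cumsum(). Naming the expressions (loop =, builtin =) labels the rows of the output. Setting check = "equal" asks microbenchmark to confirm with all.equal() that both versions return the same result before it reports the timings.

```r
library(microbenchmark)

# A hypothetical user-written function: cumulative sum with an explicit loop
loop_cumsum <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    out[i] <- total
  }
  out
}

x <- runif(1e4)

# check = "equal" verifies that both expressions give the same result
bm <- microbenchmark(
  loop    = loop_cumsum(x),
  builtin = cumsum(x),
  times   = 50,
  check   = "equal"
)
print(bm)
```

Comparing against an alternative implementation like this is usually more informative than a single absolute timing.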