## delayed: Framework for Parallelizing Dependent Tasks
## Version: 0.2.1

R supports a range of options to parallelize computation. For an overview, see the HPC Task View on CRAN. In general, these options work extremely well for problems that are embarrassingly parallel, in that they support procedures such as parallel lapply calls and parallel for loops, which are essentially map operations. However, there is no easy way to parallelize dependent tasks in R.
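As a point of reference, the map-style parallelism described above can be sketched with the parallel package that ships with base R. This is only an illustration: slow_square and the worker count are made up for the example, and the code falls back to a single core on Windows, where forking is unavailable.

```r
library(parallel)

# hypothetical stand-in for an expensive, independent computation
slow_square <- function(i) {
  Sys.sleep(0.01)
  i * i
}

# forking is not available on Windows, so use one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# each element is processed independently: a pure map operation
serial_result <- lapply(1:8, slow_square)
parallel_result <- mclapply(1:8, slow_square, mc.cores = n_cores)

# compare the serial and parallel results
identical(serial_result, parallel_result)
```

Nothing in this pattern lets one task consume the output of another, which is the limitation that dependent tasks run into.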

In contrast, the Python language has an excellent framework for exactly this purpose: dask. dask makes it easy to build up a graph of interdependent tasks and then execute them in parallel in an order that optimizes performance (Dask Development Team 2016). The present package seeks to reproduce a subset of that functionality in R, specifically the delayed module. To parallelize across the tasks, we leverage the excellent future package (Bengtsson 2017).

The power of the delayed framework is best appreciated when demonstrated by example.

## Example

The two primary ways to generate Delayed objects in R are via the delayed and delayed_fun functions.

delayed is used to delay expressions:

# delay a simple expression
delayed_object <- delayed(3 + 4)
print(delayed_object)
## [1] "delayed(3 + 4)"
# compute its result
delayed_object$compute()
## [1] 7

…while delayed_fun wraps a function so that it returns Delayed results

# delay a function
x2 <- function(x) {x * x}
delayed_x2 <- delayed_fun(x2)

# calling it returns a delayed call
delayed_object <- delayed_x2(4)
print(delayed_object)
## [1] "delayed(x2(x = 4))"
# again, we can compute its result
delayed_object$compute()
## [1] 16
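To make the idea of delaying concrete, here is a minimal base R sketch of capturing an expression now and evaluating it later. It is not how the delayed package is implemented; toy_delay and its compute element are hypothetical names used only for illustration.

```r
# capture an expression without evaluating it (hypothetical helper,
# not part of the delayed package)
toy_delay <- function(expr) {
  captured <- substitute(expr)   # the unevaluated expression
  env <- parent.frame()          # the environment its variables live in
  list(
    expr = captured,
    compute = function() eval(captured, env)
  )
}

toy_object <- toy_delay(3 + 4)

# only the expression is stored at this point
deparse(toy_object$expr)

# evaluation happens on demand
toy_object$compute()
```

Because the expression is stored rather than its value, a scheduler is free to decide when and where to evaluate it, which is what makes deferred objects a natural unit for parallel execution.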

These elements of the functionality of delayed are substantially similar to the facilities already offered by the future package. delayed diverges from future by offering the ability to chain Delayed objects together. For example:

# delay a simple expression
delayed_object_7 <- delayed(3 + 4)

# and another
delayed_object_3 <- delayed(1 + 2)

# delay a function for addition
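adder <- function(x, y){x + y}
delayed_adder <- delayed_fun(adder)

# but now, use one delayed as input to another
chained_delayed_10 <- delayed_adder(delayed_object_7, delayed_object_3)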

# We can still compute its result.
chained_delayed_10$compute()
## [1] 10

We can visualize the dependency structure of these delayed tasks by calling plot on the resultant Delayed object:

plot(chained_delayed_10)

## Parallelization

Now that we’ve had an elementary look at the functionality offered by delayed, we may take a look at how to parallelize dependent computations, the core problem addressed by the package.

We can easily parallelize across dependency structures by specifying a future plan. Let’s try it out:

library(future)
plan(multicore, workers = 2)

# re-define the delayed object from above
delayed_object_7 <- delayed(3 + 4)
delayed_object_3 <- delayed(1 + 2)
chained_delayed_10 <- delayed_adder(delayed_object_7, delayed_object_3)

# compute it using the future plan (two multicore workers), verbose mode lets us
# see the computation order
chained_delayed_10$compute(nworkers = 2, verbose = TRUE)
## updating 1 + 2 from ready to running
## updating 3 + 4 from ready to running
## updating 3 + 4 from running to resolved
## updating 1 + 2 from running to resolved
## updating adder(x = delayed_object_7, y = delayed_object_3) from waiting to ready
## updating adder(x = delayed_object_7, y = delayed_object_3) from ready to running
## updating adder(x = delayed_object_7, y = delayed_object_3) from running to resolved
## [1] 10
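
The verbose log above suggests that each task moves through the states waiting, ready, running, and resolved, and that a task becomes ready only once its inputs are resolved. The base R sketch below mimics that bookkeeping for the same three tasks. It runs sequentially and is only a guess at the general idea, not the scheduler used by delayed; the names tasks, deps, and run_graph are invented for the example.

```r
# each task is a function of the results of its dependencies
tasks <- list(
  seven = function() 3 + 4,
  three = function() 1 + 2,
  ten   = function(seven, three) seven + three
)

# names of the tasks each task depends on
deps <- list(
  seven = character(0),
  three = character(0),
  ten   = c("seven", "three")
)

run_graph <- function(tasks, deps) {
  # tasks without dependencies start out ready, the rest wait
  state <- ifelse(lengths(deps[names(tasks)]) == 0, "ready", "waiting")
  results <- list()
  log_lines <- character(0)

  # record a state transition and apply it
  set_state <- function(name, new) {
    log_lines <<- c(
      log_lines,
      sprintf("updating %s from %s to %s", name, state[[name]], new)
    )
    state[[name]] <<- new
  }

  while (any(state != "resolved")) {
    # promote waiting tasks whose dependencies are all resolved
    for (name in names(state)[state == "waiting"]) {
      if (all(state[deps[[name]]] == "resolved")) set_state(name, "ready")
    }
    ready <- names(state)[state == "ready"]
    if (length(ready) == 0) stop("dependency cycle or missing task")

    # run every ready task; this sketch does so one at a time, whereas a
    # parallel scheduler would hand them to separate workers
    for (name in ready) {
      set_state(name, "running")
      results[[name]] <- do.call(tasks[[name]], results[deps[[name]]])
      set_state(name, "resolved")
    }
  }
  list(results = results, log = log_lines)
}

out <- run_graph(tasks, deps)
writeLines(out$log)
out$results$ten
```

The two independent tasks are ready from the start, so they are the ones a parallel backend could run at the same time, while the task that depends on them has to wait for both.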