\(\DeclareMathOperator{\expit}{expit}\) \(\DeclareMathOperator{\logit}{logit}\) \(\DeclareMathOperator*{\argmin}{\arg\!\min}\) \(\newcommand{\indep}{\perp\!\!\!\perp}\) \(\newcommand{\coloneqq}{\mathrel{=}}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\M}{\mathcal{M}}\) \(\renewcommand{\P}{\mathbb{P}}\) \(\newcommand{\I}{\mathbb{I}}\) \(\newcommand{\1}{\mathbbm{1}}\)

1 Welcome to the tlverse

1.1 What is the tlverse?

The tlverse is a new framework for doing Targeted Learning in R, inspired by the tidyverse ecosystem of R packages.

By analogy to the tidyverse:

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

So, the tlverse is

  • an opinionated collection of R packages for Targeted Learning
  • sharing an underlying philosophy, grammar, and set of data structures

1.2 tlverse components

These are the main packages that represent the core of the tlverse:

  • sl3: Modern Super Learning with Pipelines
    • What? A modern object-oriented re-implementation of the Super Learner algorithm, employing recently developed paradigms for R programming.
    • Why? A design that leverages modern tools for fast computation, is forward-looking, and can form one of the cornerstones of the tlverse.
  • tmle3: An Engine for Targeted Learning
    • What? A generalized framework that simplifies Targeted Learning by identifying and implementing a series of common statistical estimation procedures.
    • Why? A common interface and engine that accommodates current algorithmic approaches to Targeted Learning and is still flexible enough to remain the engine even as new techniques are developed.

In addition to the engines that drive development in the tlverse, there are some supporting packages — in particular, we have two…

  • origami: A Generalized Framework for Cross-Validation
    • What? A generalized framework for flexible cross-validation
    • Why? Cross-validation is a key part of ensuring error estimates are honest and preventing overfitting. It is an essential part of the both the Super Learner algorithm and Targeted Learning.
  • delayed: Parallelization Framework for Dependent Tasks
    • What? A framework for delayed computations (futures) based on task dependencies.
    • Why? Efficient allocation of compute resources is essential when deploying large-scale, computationally intensive algorithms.

A key principle of the tlverse is extensibility. That is, we want to support new Targeted Learning estimators as they are developed. The model for this is new estimators are implemented in additional packages using the core packages above. There are currently two featured examples of this:

  • tmle3mopttx: Optimal Treatments in tlverse
    • What? Learn an optimal rule and estimate the mean outcome under the rule
    • Why? Optimal Treatment is a powerful tool in precision healthcare and other settings where a one-size-fits-all treatment approach is not appropriate.
  • tmle3shift: Shift Interventions in tlverse
    • What? Shift interventions for continuous treatments
    • Why? Not all treatment variables are discrete. Being able to estimate the effects of continuous treatment represents a powerful extension of the Targeted Learning approach.