# About this book

*Targeted Learning in R: Causal Data Science with the tlverse Software
Ecosystem* is an open source, reproducible electronic handbook for applying the
Targeted Learning methodology in practice using the `tlverse` software
ecosystem. This work is currently in an early draft
phase and is available to facilitate input from the community. To view or
contribute to the available content, consider visiting the GitHub
repository.

## 0.1 Outline

The contents of this handbook are meant to serve as a reference guide for applied research as well as materials that can be taught in a series of short courses focused on the applications of Targeted Learning. Each section introduces a set of distinct causal questions, motivated by a case study, alongside statistical methodology and software for assessing the causal claim of interest. The (evolving) set of materials includes

- Motivation: Why we need a statistical revolution
- The Roadmap and introductory case study: the WASH Benefits data
- Introduction to the `tlverse` software ecosystem
- Cross-validation with the `origami` package
- Ensemble machine learning with the `sl3` package
- Targeted learning for causal inference with the `tmle3` package
- Optimal treatment regimes and the `tmle3mopttx` package
- Stochastic treatment regimes and the `tmle3shift` package
- Causal mediation analysis with the `tmle3mediate` package
- *Coda*: Why we need a statistical revolution

## What this book is not

The focus of this work is **not** on providing in-depth technical descriptions
of current statistical methodology or recent advancements. Instead, the goal is
to convey key details of state-of-the-art techniques in a manner that is both
clear and complete, without burdening the reader with extraneous information.
We hope that the presentations herein will serve as references for researchers
– methodologists and domain specialists alike – that empower them to deploy
the central tools of Targeted Learning in an efficient manner. For technical
details and in-depth descriptions of both classical theory and recent advances
in the field of Targeted Learning, the interested reader is invited to consult
van der Laan and Rose (2011) and/or van der Laan and Rose (2018) as appropriate. The primary literature
in statistical causal inference, machine learning, and non/semiparametric theory
includes many of the most recent advances in Targeted Learning and related areas.

## 0.2 Reproducibility with the `tlverse`

The `tlverse` software ecosystem is a growing collection of packages, several of
which are quite early in the software lifecycle. The team does its best to
maintain backwards compatibility. Once this work reaches completion, the
specific versions of the `tlverse` packages used to produce it will be archived
and tagged.

This book was written using bookdown, and the complete source is available on GitHub. This version of the book was built with R version 4.0.2 (2020-06-22), pandoc version 2.2, and the following packages:

package | version | source |
---|---|---|
bookdown | 0.22.3 | Github (rstudio/bookdown@c8883c9) |
bslib | 0.2.5.9001 | Github (rstudio/bslib@ae5e994) |
dagitty | 0.3-1 | CRAN (R 4.0.2) |
data.table | 1.14.0 | CRAN (R 4.0.2) |
delayed | 0.3.0 | CRAN (R 4.0.2) |
downlit | 0.2.1 | CRAN (R 4.0.2) |
dplyr | 1.0.6 | CRAN (R 4.0.2) |
forecast | 8.15 | CRAN (R 4.0.2) |
ggdag | 0.2.3 | CRAN (R 4.0.2) |
ggfortify | 0.4.11 | CRAN (R 4.0.2) |
ggplot2 | 3.3.3 | CRAN (R 4.0.2) |
kableExtra | 1.3.4 | CRAN (R 4.0.2) |
knitr | 1.33 | CRAN (R 4.0.2) |
mvtnorm | 1.1-2 | CRAN (R 4.0.2) |
origami | 1.0.4 | Github (tlverse/origami@35e8b79) |
randomForest | 4.6-14 | CRAN (R 4.0.2) |
readr | 1.4.0 | CRAN (R 4.0.2) |
rmarkdown | 2.8 | CRAN (R 4.0.2) |
skimr | 2.1.3 | CRAN (R 4.0.2) |
sl3 | 1.4.3 | Github (tlverse/sl3@5496bfb) |
stringr | 1.4.0 | CRAN (R 4.0.2) |
tibble | 3.1.2 | CRAN (R 4.0.2) |
tidyr | 1.1.3 | CRAN (R 4.0.2) |
tmle3 | 0.2.0 | Github (tlverse/tmle3@425e21c) |
tmle3mediate | 0.0.3 | Github (tlverse/tmle3mediate@27f8ee7) |
tmle3mopttx | 0.1.0 | Github (tlverse/tmle3mopttx@9fb1a3b) |
tmle3shift | 0.2.0 | Github (tlverse/tmle3shift@43f6fc0) |
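
To work with the same `tlverse` versions as this build, one option is to install each package at the commit listed in the table above. The following is a minimal sketch, not the authors' build procedure. It assumes the `remotes` package is used for installation, and the helper `install_pinned()` and its `dry_run` argument are hypothetical names introduced here for illustration. Dependencies declared by each package may still resolve to other versions.

```r
# Commit-pinned GitHub references copied from the "source" column of the table above.
tlverse_refs <- c(
  origami      = "tlverse/origami@35e8b79",
  sl3          = "tlverse/sl3@5496bfb",
  tmle3        = "tlverse/tmle3@425e21c",
  tmle3mediate = "tlverse/tmle3mediate@27f8ee7",
  tmle3mopttx  = "tlverse/tmle3mopttx@9fb1a3b",
  tmle3shift   = "tlverse/tmle3shift@43f6fc0"
)

# Hypothetical helper: with dry_run = TRUE it only reports what would be installed.
install_pinned <- function(refs, dry_run = TRUE) {
  if (dry_run) {
    message("Would install: ", paste(refs, collapse = ", "))
    return(invisible(refs))
  }
  # Install the remotes package first if it is not already available.
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  # upgrade = "never" leaves already-installed dependencies at their current versions.
  remotes::install_github(unname(refs), upgrade = "never")
  invisible(refs)
}

# Dry run: prints the planned installs without touching the network.
planned <- install_pinned(tlverse_refs)
```

Calling `install_pinned(tlverse_refs, dry_run = FALSE)` would perform the installation, which requires network access.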

## 0.3 Learning resources

The reader need not be a fully trained statistician to begin understanding and
applying the methods in this handbook. However, it is
highly recommended that the reader have an understanding of basic statistical
concepts such as confounding, probability distributions, confidence intervals,
hypothesis tests, and regression. Advanced knowledge of mathematical statistics
may be useful but is not necessary. Familiarity with the `R` programming
language will be essential. We also recommend an understanding of introductory
causal inference.

For learning the `R` programming language we recommend the following (free)
introductory resources:

- Software Carpentry’s *Programming with* `R`
- Software Carpentry’s `R` for Reproducible Scientific Analysis
- Garrett Grolemund and Hadley Wickham’s `R` for Data Science

For a general introduction to causal inference, we recommend

## 0.4 Setup instructions

### 0.4.1 R and RStudio

**R** and **RStudio** are separate downloads and installations. R is the
underlying statistical computing environment. RStudio is a graphical integrated
development environment (IDE) that makes using R much easier and more
interactive. You need to install R before you install RStudio.

#### 0.4.1.1 Windows

##### 0.4.1.1.1 If you already have R and RStudio installed

- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
- To check which version of R you are using, start RStudio. The first thing
that appears in the console indicates the version of R you are
running. Alternatively, you can type `sessionInfo()`, which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, please download and install it. You can check here for more information on how to remove old versions from your system if you wish to do so.
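
The version check described above can also be run directly in the R console. A minimal sketch using only base R:

```r
# Version string of the running R session.
r_version <- R.version.string
print(r_version)

# sessionInfo() reports the same version string, along with the platform
# and the packages currently attached.
info <- sessionInfo()
print(info$R.version$version.string)
```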

##### 0.4.1.1.2 If you don’t have R and RStudio installed

- Download R from the CRAN website.
- Run the `.exe` file that was just downloaded
- Go to the RStudio download page
- Under *Installers* select **RStudio x.yy.zzz - Windows XP/Vista/7/8** (where x, y, and z represent version numbers)
- Double click the file to install it
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

#### 0.4.1.2 macOS / Mac OS X

##### 0.4.1.2.1 If you already have R and RStudio installed

- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
- To check the version of R you are using, start RStudio. The first thing
that appears in the console indicates the version of R you are running.
Alternatively, you can type `sessionInfo()`, which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, please download and install it.

##### 0.4.1.2.2 If you don’t have R and RStudio installed

- Download R from the CRAN website.
- Select the `.pkg` file for the latest R version
- Double click on the downloaded file to install R
- It is also a good idea to install XQuartz (needed by some packages)
- Go to the RStudio download page
- Under *Installers* select **RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit)** (where x, y, and z represent version numbers)
- Double click the file to install RStudio
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

#### 0.4.1.3 Linux

- Follow the instructions for your distribution from CRAN, which provides information on getting the most recent version of R for common distributions. For most distributions, you could use your package manager (e.g., for Debian/Ubuntu run `sudo apt-get install r-base`, and for Fedora `sudo yum install R`), but we don’t recommend this approach as the versions provided this way are usually out of date. In any case, make sure you have at least R 3.3.1.
- Go to the RStudio download page
- Under *Installers* select the version that matches your distribution, and install it with your preferred method (e.g., with Debian/Ubuntu `sudo dpkg -i rstudio-x.yy.zzz-amd64.deb` at the terminal).
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

These setup instructions are adapted from those written for Data Carpentry: R for Data Analysis and Visualization of Ecological Data.
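
The minimum version mentioned in the Linux instructions (R 3.3.1) can be checked from within R once installation is complete. A minimal sketch using base R only; the variable names are illustrative:

```r
# Minimum R version stated in the Linux instructions.
minimum_r <- "3.3.1"

# getRversion() returns the running version as an object that supports comparison.
current_r <- as.character(getRversion())
r_ok <- getRversion() >= minimum_r

if (r_ok) {
  message("R ", current_r, " meets the minimum of ", minimum_r)
} else {
  warning("R ", current_r, " is older than ", minimum_r, "; please update R.")
}
```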