# Welcome

*Targeted Learning in R: Causal Data Science with the tlverse Software
Ecosystem* is a fully reproducible, open source, electronic handbook for
applying Targeted Learning methodology in practice using the software stack
provided by the `tlverse` ecosystem. This work is in a draft phase and is
publicly available to solicit input from the community. To view or contribute,
visit the GitHub repository.

## Outline

The contents of this handbook are meant to serve as a reference guide both for applied research and for teaching short courses that illustrate successful applications of the Targeted Learning statistical paradigm. Each section introduces a set of distinct causal inference questions, often motivated by a case study, alongside statistical methodology and open source software for assessing the scientific (causal) claim of interest. The set of materials currently includes:

- Motivation: Why we need a statistical revolution
- The Roadmap and introductory case study: the WASH Benefits Bangladesh dataset
- Introduction to the `tlverse` software ecosystem
- Cross-validation with the `origami` package
- Ensemble machine learning with the `sl3` package
- Targeted learning for causal inference with the `tmle3` package
- Optimal treatment regimes and the `tmle3mopttx` package
- Stochastic treatment regimes and the `tmle3shift` package
- Causal mediation analysis with the `tmle3mediate` package
- *Coda*: Why we need a statistical revolution

## What this book is not

This book does **not** focus on providing in-depth, technically sophisticated
descriptions of modern statistical methodology or recent advancements in
Targeted Learning. Instead, the goal is to convey key details of these
state-of-the-art statistical techniques in a manner that is clear, complete, and
intuitive, while simultaneously avoiding the cognitive burden carried by
extraneous details (e.g., mathematically niche theoretical arguments). Our aim
is for the presentations herein to serve as a coherent reference that empowers
researchers, applied methodologists and domain specialists alike, to deploy the
central statistical tools of Targeted Learning efficiently in their scientific
pursuits. For a mathematically sophisticated treatment of some of these topics,
including in-depth technical details, the interested reader is invited to
consult van der Laan and Rose (2011) and van der Laan and Rose (2018), among
numerous other works. The primary literature in causal inference, machine
learning, and non/semi-parametric statistical theory includes many of the most
recent advances in Targeted Learning and related areas. For background in causal
inference, Hernán and Robins (2022) serves as an introductory modern reference.

## Reproducibility

The `tlverse` software ecosystem is a growing collection of packages, several of
which are quite early on in the software lifecycle. The team does its best to
maintain backwards compatibility. Once this work reaches completion, the
specific versions of the `tlverse` packages used to produce it will be archived
and tagged.

This book was written using bookdown, and the complete source is available on GitHub. This version of the book was built with R version 4.2.0 (2022-04-22), pandoc version 2.7.3, and the following packages:

package | version | source |
---|---|---|
bookdown | 0.26.3 | Github (rstudio/bookdown@169c43b6bb95213f2af63a95acd4e977a58a3e1f) |
bslib | 0.3.1 | CRAN (R 4.2.0) |
dagitty | 0.3-1 | CRAN (R 4.2.0) |
data.table | 1.14.2 | CRAN (R 4.2.0) |
delayed | 0.3.0 | CRAN (R 4.2.0) |
downlit | 0.4.0 | CRAN (R 4.2.0) |
dplyr | 1.0.9 | CRAN (R 4.2.0) |
forecast | 8.16 | CRAN (R 4.2.0) |
future | 1.26.1 | CRAN (R 4.2.0) |
ggdag | 0.2.4 | CRAN (R 4.2.0) |
ggfortify | 0.4.14 | CRAN (R 4.2.0) |
ggplot2 | 3.3.6 | CRAN (R 4.2.0) |
kableExtra | 1.3.4 | CRAN (R 4.2.0) |
knitr | 1.39 | CRAN (R 4.2.0) |
mvtnorm | 1.1-3 | CRAN (R 4.2.0) |
origami | 1.0.5 | Github (tlverse/origami@e1b8fe6f5e75fff1d48eed115bb81475c9bd506e) |
randomForest | 4.7-1.1 | CRAN (R 4.2.0) |
readr | 2.1.2 | CRAN (R 4.2.0) |
rmarkdown | 2.14 | CRAN (R 4.2.0) |
skimr | 2.1.4 | CRAN (R 4.2.0) |
sl3 | 1.4.5 | Github (tlverse/sl3@de445c210eefa5aa9dd4c0d1fab8126f0d7c5eeb) |
stringr | 1.4.0 | CRAN (R 4.2.0) |
tibble | 3.1.7 | CRAN (R 4.2.0) |
tidyr | 1.2.0 | CRAN (R 4.2.0) |
tmle3 | 0.2.0 | Github (tlverse/tmle3@ed72f8a20e64c914ab25ffe015d865f7a9963d27) |
tmle3mediate | 0.0.3 | Github (tlverse/tmle3mediate@70d1151c4adb54d044f355d06d07bcaeb7f8ae07) |
tmle3mopttx | 1.0.0 | Github (tlverse/tmle3mopttx@c8c675f051bc5ee6d51fa535fe6dc80791d4d1b7) |
tmle3shift | 0.2.0 | Github (tlverse/tmle3shift@4ed52b50af501a5fa2e6257b568d17fd485d3f42) |
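
As a rough illustration of how a reader might compare a local setup against the versions listed above, the following sketch reports which of the `tlverse` packages are installed and whether their versions match. The helper `check_versions()` and the `pinned` vector are hypothetical names introduced here for illustration and are not part of any `tlverse` package; only the `tlverse` rows of the table are included.

```r
# Versions copied from the table above (tlverse packages only).
pinned <- c(
  origami      = "1.0.5",
  sl3          = "1.4.5",
  tmle3        = "0.2.0",
  tmle3mediate = "0.0.3",
  tmle3mopttx  = "1.0.0",
  tmle3shift   = "0.2.0"
)

# Hypothetical helper: look up the installed version of each package
# (NA if the package is not installed) and compare it to the listed one.
check_versions <- function(pinned) {
  installed <- vapply(names(pinned), function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(
    package   = names(pinned),
    pinned    = unname(pinned),
    installed = unname(installed),
    matches   = unname(!is.na(installed) & installed == pinned),
    row.names = NULL
  )
}

print(check_versions(pinned))

# Assuming the `remotes` package is available, a package can be installed
# at the exact commit listed in the table, for example:
# remotes::install_github("tlverse/sl3@de445c210eefa5aa9dd4c0d1fab8126f0d7c5eeb")
```

A mismatch does not necessarily mean the examples will fail, since the team aims to maintain backwards compatibility, but it is a reasonable first thing to check when results differ from those shown in the book.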

## Learning resources

The reader need not be a fully trained statistician to begin understanding and
applying the methods in this handbook. However, it is highly recommended that
the reader understand basic statistical concepts such as confounding,
probability distributions, confidence intervals, hypothesis tests, and
regression. Advanced knowledge of mathematical statistics may be useful but is
not necessary. Familiarity with the `R` programming language is essential. We
also recommend an understanding of introductory causal inference.

For learning the `R` programming language we recommend the following (free)
introductory resources:

- Software Carpentry’s *Programming with `R`*
- Software Carpentry’s *`R` for Reproducible Scientific Analysis*
- Garrett Grolemund and Hadley Wickham’s *`R` for Data Science*

For a general, modern introduction to causal inference, we recommend

- Miguel A. Hernán and James M. Robins’ *Causal Inference: What If* (2022)
- Jason A. Roy’s *A Crash Course in Causality: Inferring Causal Effects from Observational Data* on Coursera

Feel free to suggest a resource!

## Want to help?

Any feedback on the book is very welcome. Feel free to open an issue, or to make a Pull Request if you spot a typo.