Welcome!
This open source, reproducible vignette is for two half-day workshops at the
Society for Epidemiologic Research Meeting on June 14, 2022. Beyond introducing
Targeted Learning (TL), these workshops focus on applying the methodology in
practice using the tlverse software ecosystem.
These materials are based on a working draft of the book Targeted Learning in
R: Causal Data Science with the tlverse Software Ecosystem, which includes
in-depth discussion of these topics and much more, and may serve as a useful
reference to accompany these workshop materials.
Important links

R version 4.2.0+: Install R or update to the most recent major version of R (which is 4.2.0+): https://cloud.r-project.org. For more operating system-specific instructions, please see the setup instructions below.
R packages: Please try to set up the R virtual environment before the day of the workshop by following these instructions. If you are experiencing issues or if this is your first time using renv, this introduction to the package might be helpful: https://rstudio.github.io/renv/articles/renv.html. As an alternative to the virtual environment, you can install the relevant software packages using this install R script.
Installation errors: You will probably exceed the GitHub API rate limit during the installation, which will throw an error. This issue and a solution are addressed here: https://tlverse.org/ser2022workshop/tlverse.html#installtlverse.
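The rate-limit error mentioned above typically arises when many packages are installed from GitHub without authentication. As a minimal sketch (not necessarily the remedy described at the linked page), the snippet below checks whether a personal access token is visible to the current R session. It assumes the token is stored in the GITHUB_PAT environment variable, which the remotes package consults; the helper name has_github_token() is hypothetical.

```r
# Hypothetical helper: TRUE if the named environment variable holds a
# non-empty value in the current R session.
has_github_token <- function(env_var = "GITHUB_PAT") {
  nzchar(Sys.getenv(env_var, unset = ""))
}

# Report the result before starting a long installation.
if (has_github_token()) {
  message("A GitHub token is set; installs from GitHub should be authenticated.")
} else {
  message("No GitHub token found; installs from GitHub may hit the API rate limit.")
}
```

Running this check before the installation makes it easier to tell a rate-limit failure apart from other installation problems.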

Code: R script files for each section of the workshop are available in the “R_code” folder in the GitHub repository for the workshop: https://github.com/tlverse/ser2022workshops/tree/master/R_code.
Slides: Any slide decks are available in the “slides” folder in the GitHub repository for the workshop: https://github.com/tlverse/ser2022workshops/tree/master/slides.
About
Targeted Learning I: Causal Inference Meets Machine Learning (8:30A–12:00P)
This workshop will provide an introduction to the field of targeted learning for
causal inference, and the corresponding tlverse software ecosystem. Emphasis
will be placed on targeted minimum loss-based estimation (TMLE) of causal
effects under single time point interventions, including extensions for missing
covariates and outcomes. These multiply robust, efficient plug-in estimators use
state-of-the-art machine learning tools to flexibly adjust for confounding while
yielding valid statistical inference. In addition to discussion, this workshop
will incorporate both interactive activities and hands-on, guided R programming
exercises, to allow participants the opportunity to familiarize themselves with
methodology and tools that translate to real-world data analysis. It is highly
recommended for participants to have an understanding of basic statistical
concepts such as confounding, probability distributions, confidence intervals,
hypothesis testing, and regression. Advanced knowledge of mathematical
statistics is useful but not necessary. Familiarity with the R programming
language will be essential.
Targeted Learning II: Advanced Applications of Causal Inference (1:00–4:00P)
Building on an introduction to targeted learning and its software ecosystem,
the tlverse, this workshop serves as a walkthrough of its use for estimation of
advanced parameters motivated by causal inference. In particular, we will
discuss targeted estimators of the causal effects of dynamic, optimal dynamic,
and stochastic interventions; time permitting, estimation of the effects of
interventions in settings with time-to-event (survival) outcomes may also be
discussed. Throughout, we will draw on advanced uses of machine learning,
including conditional density estimation and categorical outcome prediction,
highlighting the extensibility of the tlverse. In addition to discussion, this
workshop will incorporate both interactive activities and hands-on, guided R
programming exercises, to allow participants the opportunity to familiarize
themselves with methodology and tools that translate to real-world data
analysis. It is highly recommended for participants to have an understanding of
basic statistical concepts such as confounding, probability distributions,
confidence intervals, hypothesis testing, and regression. Advanced knowledge of
mathematical statistics is useful but not necessary. Familiarity with the R
programming language will be essential. Prior experience with the tlverse (as
covered in the SER workshop “Targeted Learning I”) is highly recommended.
Schedule
 08:30–09:15A: Introduction to Targeted Learning by Alan (slide deck here: https://github.com/tlverse/ser2022workshop/tree/master/slides/intro.pdf)
 09:15–09:45A: Introduction to the tlverse and WASH Benefits Bangladesh Study by Rachael
 09:45–10:00A: Break
 10:00–10:30A: Super learning in the tlverse with the sl3 R package by Rachael
 10:30–10:45A: Programming exercises with sl3
 10:45–11:15A: Targeted minimum loss-based estimation in the tlverse with the tmle3 R package by Alan and Ivana
 11:15–11:30A: Programming exercises with tmle3
 11:30A–12:00P: Q&A with Mark and Alan
 12:00–01:00P: Lunch Break
 01:00–02:00P: Optimal treatment regimes with the tmle3mopttx R package by Ivana
 02:00–02:15P: Programming exercises with tmle3mopttx
 02:30–03:15P: Stochastic treatment regimes with the tmle3shift R package by Nima
 03:15–03:30P: Programming exercises with tmle3shift
 03:30–04:00P: Q&A with Mark and Alan
NOTE: All listings are in Central Time.
About the instructors
Alan Hubbard
Alan Hubbard is Professor of Biostatistics, former head of the Division of Biostatistics at UC Berkeley, and head of the data analytics core at UC Berkeley’s SuperFund research program. His current research interests include causal inference, variable importance analysis, statistical machine learning, estimation of and inference for data-adaptive statistical target parameters, and targeted minimum loss-based estimation. Research in his group is generally motivated by applications to problems in computational biology, epidemiology, and precision medicine.
Nima Hejazi
Nima Hejazi, PhD, is an incoming Assistant Professor of Biostatistics at the
Harvard T.H. Chan School of Public Health. He received his PhD in biostatistics
at UC Berkeley, working under the supervision of Mark van der Laan and Alan
Hubbard, and afterwards held an NSF postdoctoral research fellowship. Nima’s
research interests blend causal inference, machine learning, semiparametric
estimation, and computational statistics; areas of recent emphasis include
causal mediation analysis, efficiency under biased sampling designs,
non/semi-parametric sieve estimation with machine learning, and targeted
loss-based estimation. His work is primarily driven by applications in clinical
trials (esp. vaccine efficacy trials), infectious disease epidemiology, and
computational biology. Nima is passionate about statistical computing and open
source software design standards for statistical data science, and he has
co-led or contributed significantly to many tlverse packages (hal9001, sl3,
tmle3, origami, tmle3shift, tmle3mediate).
Ivana Malenica
Ivana Malenica is a PhD student in biostatistics advised by Mark van der Laan. Ivana is currently a fellow at the Berkeley Institute for Data Science, after serving as an NIH Biomedical Big Data and Freeport-McMoRan Genomic Engine fellow. She earned her Master’s in Biostatistics and Bachelor’s in Mathematics, and spent some time at the Translational Genomics Research Institute. Very broadly, her research interests span non/semi-parametric theory, probability theory, machine learning, causal inference and high-dimensional statistics. Most of her current work involves complex dependent settings (dependence through time and network) and adaptive sequential designs.
Rachael Phillips
Rachael Phillips is a PhD student in biostatistics, advised by Alan Hubbard and Mark van der Laan. She has an MA in Biostatistics, BS in Biology, and BA in Mathematics. As a student of targeted learning, Rachael integrates causal inference, machine learning, and statistical theory to answer causal questions with statistical confidence. She is motivated by issues arising in healthcare, and is especially interested in clinical algorithm frameworks and guidelines.
Mark van der Laan
Mark van der Laan, PhD, is Professor of Biostatistics and Statistics at UC
Berkeley. His research interests include statistical methods in computational
biology, survival analysis, censored data, adaptive designs, targeted maximum
likelihood estimation, causal inference, data-adaptive loss-based learning, and
multiple testing. His research group developed loss-based super learning in
semiparametric models, based on cross-validation, as a generic optimal tool for
the estimation of infinite-dimensional parameters, such as nonparametric density
estimation and prediction with both censored and uncensored data. Building on
this work, his research group developed targeted maximum likelihood estimation
for a target parameter of the data-generating distribution in arbitrary
semiparametric and nonparametric models, as a generic optimal methodology for
statistical and causal inference. Most recently, Mark’s group has focused in
part on the development of a centralized, principled set of software tools for
targeted learning, the tlverse. Unfortunately, Mark is not able to attend
SER 2022 in person.
Jeremy Coyle
Jeremy Coyle, PhD, is a consulting data scientist and statistical programmer,
currently leading the software development effort that has produced the
tlverse ecosystem of R packages and related software tools. Jeremy earned his
PhD in Biostatistics from UC Berkeley in 2016, primarily under the supervision
of Alan Hubbard. Unfortunately, Jeremy is not able to attend SER 2022 in person.
Reproducibility with the tlverse
The tlverse software ecosystem is a growing collection of packages, several of
which are quite early on in the software lifecycle. The team does its best to
maintain backwards compatibility. Once this work reaches completion, the
specific versions of the tlverse packages used to produce it will be archived
and tagged.
This book was written using bookdown, and the complete source is available on GitHub. This version of the book was built with R version 4.2.0 (2022-04-22), pandoc version 2.7.3, and the following packages:
| package | version | source |
|:--|:--|:--|
| bookdown | 0.26.3 | Github (rstudio/bookdown@169c43b6bb95213f2af63a95acd4e977a58a3e1f) |
| bslib | 0.3.1.9000 | Github (rstudio/bslib@a4946a49499438e71dce29c810a41e2d05170376) |
| data.table | 1.14.2 | CRAN (R 4.2.0) |
| delayed | 0.3.0 | CRAN (R 4.2.0) |
| devtools | 2.4.3 | CRAN (R 4.2.0) |
| downlit | 0.4.0 | CRAN (R 4.2.0) |
| dplyr | 1.0.9 | CRAN (R 4.2.0) |
| ggplot2 | 3.3.6 | CRAN (R 4.2.0) |
| here | 1.0.1 | CRAN (R 4.2.0) |
| kableExtra | 1.3.4 | CRAN (R 4.2.0) |
| knitr | 1.39 | CRAN (R 4.2.0) |
| mvtnorm | 1.1-3 | CRAN (R 4.2.0) |
| origami | 1.0.5 | Github (tlverse/origami@e1b8fe6f5e75fff1d48eed115bb81475c9bd506e) |
| readr | 2.1.2 | CRAN (R 4.2.0) |
| rmarkdown | 2.14 | CRAN (R 4.2.0) |
| skimr | 2.1.4 | CRAN (R 4.2.0) |
| sl3 | 1.4.5 | Github (tlverse/sl3@de445c210eefa5aa9dd4c0d1fab8126f0d7c5eeb) |
| stringr | 1.4.0 | CRAN (R 4.2.0) |
| tibble | 3.1.7 | CRAN (R 4.2.0) |
| tidyr | 1.2.0 | CRAN (R 4.2.0) |
| tidyverse | 1.3.1 | CRAN (R 4.2.0) |
| tmle3 | 0.2.0 | Github (tlverse/tmle3@ed72f8a20e64c914ab25ffe015d865f7a9963d27) |
| tmle3mediate | 0.0.3 | Github (tlverse/tmle3mediate@70d1151c4adb54d044f355d06d07bcaeb7f8ae07) |
| tmle3mopttx | 1.0.0 | Github (tlverse/tmle3mopttx@c8c675f051bc5ee6d51fa535fe6dc80791d4d1b7) |
| tmle3shift | 0.2.0 | Github (tlverse/tmle3shift@4ed52b50af501a5fa2e6257b568d17fd485d3f42) |
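To see how a local installation compares with the versions listed above, something like the following sketch may help. It uses only base R; the helper name version_matches() is hypothetical, and the expected vector copies just a handful of the CRAN entries from the table as an illustration rather than the full list.

```r
# A few package versions copied from the table above (illustrative subset).
expected <- c(
  data.table = "1.14.2",
  dplyr      = "1.0.9",
  ggplot2    = "3.3.6",
  mvtnorm    = "1.1-3"
)

# Hypothetical helper: TRUE only if `pkg` is installed at exactly
# `expected_version`; FALSE if it is missing or at a different version.
version_matches <- function(pkg, expected_version) {
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  !is.null(installed) && installed == package_version(expected_version)
}

# Named logical vector: one entry per package in `expected`.
status <- mapply(version_matches, names(expected), expected)
print(status)
```

Versions are compared as version objects rather than strings, so that a version written as 1.1-3 is treated the same as 1.1.3. A FALSE entry only means the local version differs from the one used to build the book; it does not by itself indicate a problem.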
R and RStudio setup instructions
R and RStudio are separate downloads and installations. R is the underlying
statistical computing environment. RStudio is a graphical integrated
development environment (IDE) that makes using R much easier and more
interactive. You need to install R before you install RStudio.
Windows
If you already have R and RStudio installed
 Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
 To check which version of R you are using, start RStudio and the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, please download and install it. You can check here for more information on how to remove old versions from your system if you wish to do so.
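The version check described above can also be done programmatically. The following is a minimal sketch using base R only; the helper name meets_minimum() is hypothetical, and the 4.2.0 threshold is taken from the requirement stated in these materials.

```r
# Hypothetical helper: TRUE if `current` is at least `minimum`.
# Versions are compared as version objects, not as plain strings.
meets_minimum <- function(current = getRversion(), minimum = "4.2.0") {
  numeric_version(as.character(current)) >= numeric_version(minimum)
}

# Print the running R version and whether it satisfies the requirement.
message(R.version.string)
if (meets_minimum()) {
  message("This R installation meets the 4.2.0 minimum.")
} else {
  message("This R installation is older than 4.2.0; please update R.")
}
```

Comparing version objects matters because a plain string comparison would rank "4.10.0" below "4.2.0".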
If you don’t have R and RStudio installed
 Download R from the CRAN website.
 Run the .exe file that was just downloaded.
 Go to the RStudio download page.
 Under Installers select RStudio x.yy.zzz - Windows XP/Vista/7/8 (where x, y, and z represent version numbers).
 Double click the file to install it.
 Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
macOS / Mac OS X
If you already have R and RStudio installed
 Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
 To check the version of R you are using, start RStudio and the first thing that appears on the terminal indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, please download and install it.
If you don’t have R and RStudio installed
 Download R from the CRAN website.
 Select the .pkg file for the latest R version.
 Double click on the downloaded file to install R.
 It is also a good idea to install XQuartz (needed by some packages).
 Go to the RStudio download page.
 Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers).
 Double click the file to install RStudio.
 Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
Linux
 Follow the instructions for your distribution from CRAN; they provide information to get the most recent version of R for common distributions. For most distributions, you could use your package manager (e.g., for Debian/Ubuntu run sudo apt-get install r-base, and for Fedora sudo yum install R), but we don’t recommend this approach as the versions provided by this are usually out of date. In any case, make sure you have at least R 4.2.0.
 Go to the RStudio download page.
 Under Installers select the version that matches your distribution, and install it with your preferred method (e.g., with Debian/Ubuntu sudo dpkg -i rstudio-x.yy.zzz-amd64.deb at the terminal).
 Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
These setup instructions are adapted from those written for Data Carpentry: R
for Data Analysis and Visualization of Ecological Data.