Course Information

This open-source, reproducible vignette accompanies a full-day short course held on March 19, 2023 at the International Biometric Society Eastern North American Region (ENAR) Conference in Nashville, Tennessee. Entitled “Targeted Learning: Advanced Methods for Causal Machine Learning”, this workshop provides a comprehensive introduction to the field of Targeted Learning, which lies at the intersection of causal inference and machine learning, and to its accompanying tlverse software ecosystem. The focus is on targeted minimum loss-based estimation (TMLE) of the causal effects of sophisticated interventions, including dynamic, optimal dynamic, and stochastic regimes. The robust and efficient plug-in estimators introduced here leverage state-of-the-art machine learning via the super learner to flexibly adjust for confounding while yielding valid statistical inference.

This course will be of interest to both statistical and applied scientists engaged in biomedical/health studies, whether experimental or observational, who wish to apply cutting-edge statistical and causal inference methodology to rigorously formalize and answer research questions. This workshop incorporates interactive discussions and hands-on, guided R programming exercises, allowing participants to familiarize themselves with methodology and tools that translate to real-world data analysis.

It is highly recommended that participants have prior training in basic statistical concepts (e.g., confounding, probability distributions, hypothesis testing and confidence intervals, regression). Advanced knowledge of mathematical statistics is useful but not necessary. Familiarity with the R programming language is essential.

Schedule

Pre-workshop software installation: Please see “Part 1: Preliminaries” of this website to set up R, RStudio, and the tlverse.
Pre-workshop reading: The Roadmap of Targeted Learning and Why We Need A Statistical Revolution
08:00-10:00: Introduction to Targeted Learning
10:00-10:20: Coffee break
10:20-10:45: Introduction to the tlverse
10:45-12:00: Super learning with the sl3 R package
12:00-01:00: Lunch break
01:00-01:30: Brief intro to the tmle3 R package
01:30-03:00: Optimal treatment regimes with the tmle3mopttx R package
03:00-03:20: Coffee break
03:20-05:00: Stochastic treatment regimes with the tmle3shift R package

Materials

The course materials on this website are based on a working draft of the book, Targeted Learning in R: Causal Data Science with the tlverse Software Ecosystem, which includes in-depth discussion of these topics and much more, and may serve as a useful reference to accompany these workshop materials.
The GitHub repository for this short course is available at https://github.com/tlverse/enar2023-workshop/. It contains the files for generating this website and additional learning resources, including:
- R script files (for each section of the website that contains R code): https://github.com/tlverse/enar2023-workshop/tree/master/R_code, and
- Slide decks for various presentations that will be given: https://github.com/tlverse/enar2023-workshop/tree/master/slides.

Instructors

Mark van der Laan

Mark van der Laan is the Jiann-Ping Hsu/Karl E. Peace Professor of Biostatistics and Statistics at the University of California, Berkeley. He has made contributions to survival analysis, semiparametric statistics, multiple testing, and causal inference. He also developed the targeted maximum likelihood methodology and general theory for super-learning. He is a founding editor of the Journal of Causal Inference and the International Journal of Biostatistics. He has authored four books on Targeted Learning, censored data, and multiple testing, authored over 300 publications, and graduated over 50 PhD students. He received the COPSS Presidents’ Award in 2005, the Mortimer Spiegelman Award in 2004, and the van Dantzig Award in 2005.

Alan Hubbard

Alan Hubbard is a Professor and the Head of Biostatistics at the University of California at Berkeley (UCB), Co-director of the Center of Targeted Learning, Head of the Computational Biology Core of the SuperFund Center at UCB (NIH/EPA), and a consulting statistician on several federally funded and foundation projects. He has worked on projects ranging from the molecular biology of aging to epidemiology and infectious disease modeling, but most of his work has focused on semi-parametric estimation in high-dimensional data. His current methods research focuses on precision medicine, variable importance, statistical inference for data-adaptive parameters, and statistical software implementing targeted learning methods. Alan is currently working in several areas of applied research, including early childhood development in developing countries, environmental genomics, and comparative effectiveness research. He has most recently concentrated on using complex patient data to improve prediction for acute trauma patients.

Nima Hejazi

Nima Hejazi is an Assistant Professor of Biostatistics at the Harvard T.H. Chan School of Public Health. He recently completed an NSF Mathematical Sciences Postdoctoral Research Fellowship and, prior to this, obtained his PhD in Biostatistics from UC Berkeley. He has been on the founding core development team of the tlverse project, an extensible software ecosystem for targeted learning. Since 2020, he has collaborated very closely with the Vaccine and Infectious Disease Division of the Fred Hutchinson Cancer Center as a core member of the US Government Immune Correlates Biostatistical Analysis Team of the NIAID-funded COVID-19 Prevention Network. Nima’s research interests combine causal inference and machine learning, driven by the aim of developing assumption-lean statistical procedures tailored for efficient and robust inference about scientifically informative parameters. He is particularly motivated by methodological issues stemming from robust non/semi-parametric inference, high-dimensional inference, targeted loss-based estimation, and biased sampling designs. These issues are usually tied to applications from clinical trials or computational biology, especially scientific questions concerning vaccine efficacy evaluation, infectious disease epidemiology, and immunology.

Ivana Malenica

Ivana Malenica is a Postdoctoral Researcher in the Department of Statistics (https://statistics.fas.harvard.edu/) at Harvard and a Wojcicki and Troper Data Science Fellow at the Harvard Data Science Initiative. She obtained her PhD in Biostatistics at UC Berkeley working with Mark van der Laan, where she was a Berkeley Institute for Data Science (BIDS) Fellow and an NIH Biomedical Big Data Fellow. Her research interests span non/semi-parametric theory, causal inference, and machine learning, with emphasis on personalized health and dependent settings. Most of her current work involves causal inference with time and network dependence, online learning, optimal individualized treatment, reinforcement learning, and adaptive sequential designs.

Rachael Phillips

Rachael Phillips is a PhD candidate in biostatistics at the University of California at Berkeley, advised by Professors Alan Hubbard and Mark van der Laan. She has an MA in Biostatistics, a BS in Biology, and a BA in Mathematics. As a student of Targeted Learning, Rachael integrates causal inference, machine learning, and semi-parametric statistical theory to answer causal questions with statistical confidence. She is a researcher for the Center for Targeted Machine Learning and Causal Inference and actively collaborates with Professor and chief anesthesiologist Romain Pirracchio at the University of California at San Francisco (UCSF) on the development of clinical algorithm frameworks and guidelines. For multiple years during her PhD studies, Rachael worked with and was funded by the United States Food and Drug Administration (FDA contract 75F40119C10155). Led by Dr. Susan Gruber, PI, this project focused on the use of Targeted Learning for the evaluation and generation of real-world evidence (RWE). Throughout her PhD, she has also developed open-source software, biostatistics graduate courses, and other educational material for Targeted Learning and causal inference.

Introduction