Course Information
This open source, reproducible vignette is for a full-day short course on March
19, 2023 at the International Biometric Society Eastern North American Region
(ENAR) Conference in Nashville, Tennessee. Entitled “Targeted Learning:
Advanced Methods for Causal Machine Learning”, this workshop provides a
comprehensive introduction to the field of Targeted Learning, at the
intersection of causal inference and machine learning, and its accompanying
tlverse software ecosystem. The focus will be on targeted minimum loss-based
estimation (TMLE) of the causal effects of sophisticated interventions,
including dynamic, optimal dynamic, and stochastic regimes. The robust and
efficient plug-in estimators that will be
introduced leverage state-of-the-art machine learning via the super learner in
order to flexibly adjust for confounding while yielding valid statistical
inference.
This course will be of interest to both statistical and applied scientists
engaged in biomedical/health studies, whether experimental or observational,
who wish to apply cutting-edge statistical and causal inference methodology to
rigorously formalize and answer research questions. This workshop incorporates
interactive discussions and hands-on, guided R programming exercises, allowing
participants to familiarize themselves with methodology and tools that
translate to real-world data analysis.
Participants are strongly encouraged to have prior training in basic
statistical concepts (e.g., confounding, probability distributions, hypothesis
testing and confidence intervals, regression). Advanced knowledge of
mathematical statistics is useful but not necessary. Familiarity with the R
programming language is essential.
Schedule
- Pre-workshop software installation: Please see “Part 1: Preliminaries” of this website to set up R, RStudio, and the tlverse.
- Pre-workshop reading: The Roadmap of Targeted Learning and Why We Need A Statistical Revolution
- 08:00-10:00: Introduction to Targeted Learning
- 10:00-10:20: Coffee break
- 10:20-10:45: Introduction to the tlverse
- 10:45-12:00: Super learning with the sl3 R package
- 12:00-01:00: Lunch break
- 01:00-01:30: Brief intro to the tmle3 R package
- 01:30-03:00: Optimal treatment regimes with the tmle3mopttx R package
- 03:00-03:20: Coffee break
- 03:20-05:00: Stochastic treatment regimes with the tmle3shift R package
Materials
The course materials on this website are based on a working draft of the book, Targeted Learning in R: Causal Data Science with the tlverse Software Ecosystem, which includes in-depth discussion of these topics and much more, and may serve as a useful reference to accompany these workshop materials.
The GitHub repository for this short course is available at https://github.com/tlverse/enar2023-workshop/. The repository contains the files for generating this website and additional learning resources, including
- R script files (for each section of the website that contains R code): https://github.com/tlverse/enar2023-workshop/tree/master/R_code, and
- Slide decks for various presentations that will be given: https://github.com/tlverse/enar2023-workshop/tree/master/slides.
Instructors
Mark van der Laan
Mark van der Laan is the Jiann-Ping Hsu/Karl E. Peace Professor of Biostatistics and Statistics at the University of California, Berkeley. He has made contributions to survival analysis, semiparametric statistics, multiple testing, and causal inference. He also developed the targeted maximum likelihood methodology and general theory for super learning. He is a founding editor of the Journal of Causal Inference and the International Journal of Biostatistics. He has authored four books on Targeted Learning, censored data, and multiple testing, has published over 300 articles, and has graduated over 50 PhD students. He received the COPSS Presidents’ Award in 2005, the Mortimer Spiegelman Award in 2004, and the van Dantzig Award in 2005.
Alan Hubbard
Alan Hubbard is a Professor and the Head of Biostatistics at the University of California at Berkeley (UCB), Co-director of the Center of Targeted Learning, Head of the Computational Biology Core of the SuperFund Center at UCB (NIH/EPA), and a consulting statistician on several federally funded and foundation projects. He has worked on projects ranging from the molecular biology of aging to epidemiology and infectious disease modeling, but most of his work has focused on semi-parametric estimation in high-dimensional data. His current methods research focuses on precision medicine, variable importance, statistical inference for data-adaptive parameters, and statistical software implementing targeted learning methods. Alan is currently working in several areas of applied research, including early childhood development in developing countries, environmental genomics, and comparative effectiveness research. He has most recently concentrated on using complex patient data to improve prediction for acute trauma patients.
Nima Hejazi
Nima Hejazi is an Assistant Professor of
Biostatistics at the Harvard T.H. Chan School of Public Health. He recently
completed an NSF Mathematical Sciences Postdoctoral Research Fellowship, and,
prior to this, obtained his PhD in Biostatistics from UC Berkeley. He has been
on the founding core development team of the tlverse project, an extensible
software ecosystem for
targeted learning, and, since 2020, has collaborated very closely with the
Vaccine and Infectious Disease Division of the Fred Hutchinson Cancer Center as
a core member of the US Government Immune Correlates Biostatistical Analysis
Team of the NIAID-funded COVID-19 Prevention Network. Nima’s research interests
combine causal inference and machine learning, driven by the aim of developing
assumption-lean statistical procedures tailored for efficient and robust
inference about scientifically informative parameters. He is particularly
motivated by methodological issues stemming from robust non/semi-parametric
inference, high-dimensional inference, targeted loss-based estimation, and
biased sampling designs, usually tied to applications from clinical trials or
computational biology and especially as related to scientific issues concerning
vaccine efficacy evaluation, infectious disease epidemiology, and immunology.
Ivana Malenica
Ivana Malenica is a Postdoctoral Researcher in the Department of Statistics (https://statistics.fas.harvard.edu/) at Harvard and a Wojcicki and Troper Data Science Fellow at the Harvard Data Science Initiative. She obtained her PhD in Biostatistics at UC Berkeley working with Mark van der Laan, where she was a Berkeley Institute for Data Science (BIDS) Fellow and an NIH Biomedical Big Data Fellow. Her research interests span non/semi-parametric theory, causal inference, and machine learning, with emphasis on personalized health and dependent settings. Most of her current work involves causal inference with time and network dependence, online learning, optimal individualized treatment, reinforcement learning, and adaptive sequential designs.
Rachael Phillips
Rachael Phillips is a PhD candidate in biostatistics at the University of California at Berkeley, advised by Professors Alan Hubbard and Mark van der Laan. She has an MA in Biostatistics, a BS in Biology, and a BA in Mathematics. As a student of Targeted Learning, Rachael integrates causal inference, machine learning, and semi-parametric statistical theory to answer causal questions with statistical confidence. She is a researcher for the Center for Targeted Machine Learning and Causal Inference and actively collaborates with Professor and chief anesthesiologist Romain Pirracchio at the University of California at San Francisco (UCSF) on the development of clinical algorithm frameworks and guidelines. For multiple years during her PhD studies, Rachael worked with and was funded by the United States Food and Drug Administration (FDA contract 75F40119C10155). Led by Dr. Susan Gruber, PI, this project focused on the use of Targeted Learning for the evaluation and generation of real-world evidence (RWE). Throughout her PhD, she has also developed open-source software, biostatistics graduate courses, and other educational material for Targeted Learning and causal inference.