This is an open source and fully-reproducible electronic vignette for an
invited short-course on applying the targeted learning methodology in practice
using the tlverse
software ecosystem, given at
the Conference on Statistical Practice (CSP) on 20 February 2020. The
Hitchhiker’s Guide to the tlverse
, or a Targeted Learning Practitioner’s
Handbook is an in-draft book covering
the same topics in greater detail and may serve as a useful accompanying
resource to these workshop materials.
About this workshop
This 1-day workshop will provide a comprehensive introduction to the field of
Targeted Learning and the corresponding tlverse
software
ecosystem. In particular, we will focus on
targeted minimum loss estimators of causal effects, including those of static
and dynamic treatments (as well as those of optimal dynamic and stochastic
interventions, time permitting). These multiply robust, efficient plug-in
estimators use state-of-the-art, ensemble machine learning tools to flexibly
adjust for confounding while yielding valid statistical inference. We will
discuss the utility of this robust estimation strategy in comparison to
conventional techniques, which often rely on restrictive statistical models and
may therefore lead to severely biased inference. In addition to discussion, this
workshop will incorporate both interactive activities and hands-on, guided R
programming exercises, to allow participants the opportunity to familiarize
themselves with methodology and tools that will translate to real-world causal
inference analyses. It is highly recommended for participants to have an
understanding of basic statistical concepts such as confounding, probability
distributions, confidence intervals, hypothesis tests, and regression. Advanced
knowledge of mathematical statistics may be useful but is not necessary.
Familiarity with the R
programming language will be essential.
About the instructors and authors
Mark van der Laan
Mark van der Laan, Ph.D., is Professor of Biostatistics and Statistics at UC
Berkeley. His research interests include statistical methods in computational
biology, survival analysis, censored data, adaptive designs, targeted maximum
likelihood estimation, causal inference, data-adaptive loss-based learning, and
multiple testing. His research group developed loss-based super learning in
semiparametric models, based on cross-validation, as a generic optimal tool for
the estimation of infinite-dimensional parameters, such as nonparametric density
estimation and prediction with both censored and uncensored data. Building on
this work, his research group developed targeted maximum likelihood estimation
for a target parameter of the data-generating distribution in arbitrary
semiparametric and nonparametric models, as a generic optimal methodology for
statistical and causal inference. Most recently, Mark’s group has focused in
part on the development of a centralized, principled set of software tools for
targeted learning, the tlverse
. For more information, see
https://vanderlaan-lab.org.
Jeremy Coyle
Jeremy Coyle, Ph.D., is a consulting data scientist and statistical programmer,
currently leading the software development effort that has produced the
tlverse
ecosystem of R packages and related software tools. Jeremy earned his
Ph.D. in Biostatistics from UC Berkeley in 2016, primarily under the supervision
of Alan Hubbard.
Alan Hubbard
Alan Hubbard, Ph.D., is Professor of Biostatistics, former head of the Division
of Biostatistics at UC Berkeley, and head of data analytics core at UC
Berkeley’s SuperFund research program. His current research interests include
causal inference, variable importance analysis, statistical machine learning,
estimation of and inference for data-adaptive statistical target parameters, and
targeted minimum loss-based estimation. Research in his group is generally
motivated by applications to problems in computational biology, epidemiology,
and precision medicine.
Nima Hejazi
Nima is a Ph.D. candidate in biostatistics with a designated emphasis in
computational and genomic biology, working with Mark van der Laan and Alan
Hubbard. Nima is affiliated with UC Berkeley’s Center for Computational Biology
and is a former NIH Biomedical Big Data fellow. He earned is Master’s in
Biostatistics (2017) and a Bachelor’s with a triple major in Molecular and Cell
Biology (Neurobiology), Psychology, and Public Health (2015) at UC Berkeley.
Nima’s interests span nonparametric estimation, high-dimensional inference,
targeted learning, statistical computing, survival analysis, and computational
biology, with an emphasis on the development of robust and efficient statistical
methodologies that draw on the intersection of causal inference and statistical
machine learning. For more information, see https://nimahejazi.org.
Ivana Malenica
Ivana is a Ph.D. student in biostatistics advised by Mark van der Laan. Ivana is
currently a fellow at the Berkeley Institute for Data Science, after serving as
a NIH Biomedical Big Data and Freeport-McMoRan Genomic Engine fellow. She earned
her Master’s in Biostatistics and Bachelor’s in Mathematics, and spent some time
at the Translational Genomics Research Institute. Very broadly, her research
interests span non/semi-parametric theory, probability theory, machine learning,
causal inference and high-dimensional statistics. Most of her current work
involves complex dependent settings (dependence through time and network) and
adaptive sequential designs.
Rachael Phillips
Rachael is a Ph.D. student in biostatistics, advised by Alan Hubbard and Mark
van der Laan. She has an M.A. in Biostatistics, B.S. in Biology with a
Chemistry minor and a B.A. in Mathematics with a Spanish minor. Rachael’s
research focuses on narrowing the gap between the theory and application of
modern statistics for real-world data science. Specifically, Rachael is
motivated by issues arising in healthcare, and she leverages strategies rooted
in causal inference and nonparametric estimation to build clinician-tailored,
machine-driven solutions. Rachael is also passionate about free, online-mediated
education and its corresponding pedagogy.