Targeted (Machine) Learning for Real-World Data Science and Causal Inference with the tlverse Software Ecosystem
Software Workshops at Deming Conference on Applied Statistics (4-6 December 2019)
updated: December 06, 2019
Preface
This is an open-source, fully reproducible electronic vignette for the
software workshops that form part of the half-day tutorial (4 December 2019) and
the 2-day short course (5-6 December 2019) on applying Targeted Learning in
practice, given at the Deming Conference on Applied Statistics. The Hitchhiker’s
Guide to the tlverse, or a Targeted Learning Practitioner’s Handbook, is an
in-draft book that covers the tlverse software topics in greater detail and may
serve as a useful companion to these workshop materials.
Important links
Software installation
Please install the relevant software before the workshop.
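As a rough guide, installation might look like the sketch below. It assumes the tlverse packages named in the contents (sl3, tmle3, tmle3mopttx, tmle3shift) are installed from the tlverse GitHub organization with the remotes package, and that ltmle comes from CRAN. MOSS is left out because its installation source is not stated here. The workshop's own installation instructions take precedence over this sketch.

```r
# Sketch only: follow the workshop's official installation page if it differs.
# Assumption: the tlverse packages are installed from GitHub via `remotes`.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# tlverse packages named in the workshop contents
# (repository paths under the "tlverse" organization are assumed)
tlverse_pkgs <- c("sl3", "tmle3", "tmle3mopttx", "tmle3shift")
for (pkg in tlverse_pkgs) {
  # install only the packages that are not already available
  if (!requireNamespace(pkg, quietly = TRUE)) {
    remotes::install_github(paste0("tlverse/", pkg))
  }
}

# ltmle is distributed on CRAN
if (!requireNamespace("ltmle", quietly = TRUE)) {
  install.packages("ltmle")
}
```

Checking with requireNamespace() first skips packages that are already present, so re-running the script just before the workshop is inexpensive.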
Code
R script files for each section of the workshop are available via the GitHub
repository for the short course.
Abstract
Half-Day Tutorial – 9A-12P on December 4, 2019
Targeted Maximum Likelihood Estimation (TMLE) for Machine Learning: A Gentle Introduction
During this half-day tutorial, we will explore how the roadmap of targeted
learning translates real-world data applications into a mathematical and
statistical formulation of the research question of interest.
Participants will perform hands-on implementation of state-of-the-art targeted
maximum likelihood estimators using the tlverse software ecosystem in the R
programming language. Participants will actively learn and apply the core
principles of the Targeted Learning methodology, which (1) generalizes machine
learning to any estimand of interest; (2) obtains an optimal estimator of the
given estimand, grounded in theory; (3) integrates modern ensemble machine
learning techniques; and (4) provides formal statistical inference in terms of
confidence intervals and testing of specified null hypotheses of interest. We
highly recommend that participants have an understanding of basic
statistical concepts such as confounding, probability distributions, confidence
intervals, hypothesis tests, and regression. Advanced knowledge of mathematical
statistics may be useful but is not necessary. Familiarity with the R programming
language is essential.
2-Day Short Course – 8A-5P on December 5-6, 2019
Targeted Learning in Data Science: Causal Inference for Observational and Experimental Data
This 2-day short course will provide a comprehensive introduction to the field
of targeted learning for causal inference and the corresponding tlverse
software ecosystem. We will focus on targeted minimum loss-based estimators of
causal effects, including those of static, dynamic, optimal dynamic, and
stochastic interventions. These multiply robust, efficient plug-in estimators
use state-of-the-art ensemble machine learning tools to flexibly adjust for
confounding while yielding valid statistical inference. Estimators will be
explored under various real-world scenarios: when the outcome is subject to
missingness, when mediators are present on the causal pathway, in high
dimensions, under two-phase sampling designs, and in right-censored survival
settings possibly subject to competing risks. We will discuss the utility of
this robust estimation strategy in comparison to conventional techniques, which
often rely on restrictive statistical models and may therefore lead to severely
biased inference. In addition to discussion, this course will incorporate both
interactive activities and hands-on, guided R programming exercises, so that
participants can familiarize themselves with methodology and tools that will
translate to real-world analyses. We highly recommend that participants have an
understanding of basic statistical concepts such as confounding, probability
distributions, confidence intervals, hypothesis tests, and regression. Advanced
knowledge of mathematical statistics may be useful but is not necessary.
Familiarity with the R programming language is essential.
Contents
These materials feature modules centered around distinct causal questions, each motivated by a case study, alongside statistical methodology and software for assessing the causal claim of interest. Topics include:
- Why we need a statistical revolution
- Introduction to the tlverse software ecosystem
- Roadmap of statistical learning with causal inference
- International Stroke Trial (IST), WASH Benefits, and Veterans’ Administration Lung Cancer Trial data
- Super (ensemble machine) learning with the sl3 tlverse R package
- Targeted learning for causal inference with the tmle3 tlverse R package
- Optimal treatment regimes and the tmle3mopttx tlverse R package
- Stochastic treatment regimes and the tmle3shift tlverse R package
- One-step TMLE for time-to-event outcomes with the MOSS R package
- Treatment specific mean outcome or marginal structural model for longitudinal data with the ltmle R package