Chapter 5 The TMLE Framework

Based on the tmle3 R package.

Updated: 2019-12-06

5.1 Introduction

The first step in the estimation procedure is an initial estimate of the data-generating distribution, or the relevant part of this distribution that is needed to evaluate the target parameter. For this initial estimation, we use the super learner (van der Laan, Polley, and Hubbard 2007), as described in the previous section.

With the initial estimate of relevant parts of the data-generating distribution necessary to evaluate the target parameter, we are ready to construct the TMLE!

5.1.1 Substitution Estimators

Beyond a fit of the prediction function, one might also want to estimate more targeted parameters specific to certain scientific questions.
The approach is to plug estimates of the relevant parts of the distribution into the estimand of interest.
For some parts, we can use simple empirical distributions, averaging some function over the observations (e.g., giving weight \(1/n\) to every observation).
For other parts of the distribution, like conditional means or probabilities, the estimate will require some sort of smoothing due to the curse of dimensionality.

We illustrate with the average treatment effect (ATE; see above):

\(\Psi(P_0) = \Psi(Q_0) = \mathbb{E}_0 \big[\mathbb{E}_0[Y \mid A = 1, W] - \mathbb{E}_0[Y \mid A = 0, W]\big]\), where \(Q_0\) represents both the distribution of \(Y \mid A,W\) and distribution of \(W\).
Let \(\bar{Q}_0(A,W) \equiv \mathbb{E}_0(Y \mid A,W)\) and \(Q_{0,W}(w) = P_0 (W=w)\), then \[ \Psi(Q_0) = \sum_w \{ \bar{Q}_0(1,w)-\bar{Q}_0(0,w)\} Q_{0,W}(w) \]
The Substitution Estimator plugs in the empirical distribution (weight \(1/n\) for each observation) for \(Q_{0,W}(W_i)\), and some estimate of the regression of \(Y\) on \((A,W)\) (say SL fit): \[ \Psi(Q_n) = \frac{1}{n} \sum_{i=1}^n \{ \bar{Q}_n(1,W_i)-\bar{Q}_n(0,W_i)\} \]
Thus, it is the average, over the observed \(W_i\), of the differences in predictions from the fit when \(A\) is set to 1 for every observation and when \(A\) is set to 0 for every observation.
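To make the plug-in recipe concrete, here is a minimal base-R sketch on simulated data. The data-generating model, the variable names, and the use of a single logistic regression (glm) in place of a super learner fit are all illustrative assumptions; the point is only the two steps of predicting under \(A=1\) and \(A=0\) and then averaging with weight \(1/n\).

```r
# Simulated data (illustrative assumption): one covariate W,
# binary treatment A, binary outcome Y
set.seed(1)
n <- 1000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * W))
Y <- rbinom(n, 1, plogis(-1 + 0.5 * A + 0.8 * W))
dat <- data.frame(W = W, A = A, Y = Y)

# Estimate Qbar(A, W) = E(Y | A, W); a logistic regression stands in
# for the super learner fit
Q_fit <- glm(Y ~ A + W, family = binomial(), data = dat)

# Predict for every observation with A set to 1, then with A set to 0,
# keeping each observation's observed W
Q1 <- predict(Q_fit, newdata = transform(dat, A = 1), type = "response")
Q0 <- predict(Q_fit, newdata = transform(dat, A = 0), type = "response")

# Empirical distribution of W: weight 1/n on each observation
psi_subst <- mean(Q1 - Q0)
psi_subst
```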

5.1.2 TMLE

Though using SL over an arbitrary parametric regression is an improvement, it is not sufficient on its own to give an estimator the properties needed for rigorous inference.
Because the variance-bias trade-off in the SL is focused on the prediction model, it can, for instance, under-fit portions of the distributions that are critical for estimating the parameter of interest, \(\Psi(P_0)\).
TMLE keeps the benefits of substitution estimators (it is one), but augments the original estimates to correct for this issue, and it also yields an asymptotically linear (and thus asymptotically normally distributed) estimator with valid Wald-style confidence intervals.
Produces a well-defined, asymptotically unbiased, efficient substitution estimator of target parameters of a data-generating distribution.
Updates an initial (super learner) estimate of the relevant part of the data-generating distribution possibly using an estimate of a nuisance parameter (like the model of intervention given covariates).
Removes asymptotic residual bias of initial estimator for the target parameter, if it uses a consistent estimator of \(g_0\).
If the initial estimator was consistent for the target parameter, the additional fitting of the data in the targeting step may remove finite sample bias and preserves the consistency of the initial estimator.
If the initial estimator and the estimator of \(g_0\) are both consistent, then it is also asymptotically efficient according to semi-parametric statistical model efficiency theory.
Thus, every effort is made to achieve minimal bias and the asymptotic semi-parametric efficiency bound for the variance.

There are different types of TMLE, sometimes for the same set of parameters, but below is an example of the algorithm for estimating the ATE.
In this case, one can present the estimator as:

\[ \Psi(Q^{\star}_n) = \frac{1}{n} \sum_{i=1}^n \{ \bar{Q}^{\star}_n(1,W_i) - \bar{Q}^{\star}_n(0,W_i)\} \] where \(\bar{Q}^{\star}_n(A,W)\) is the targeted (updated) estimate of the outcome regression, defined by \(f(\bar{Q}^{\star}_n(A,W)) = f(\bar{Q}_n(A,W)) + \epsilon_n \cdot h_n(A,W)\). Here \(f(\cdot)\) is the appropriate link function (e.g., logit), \(\epsilon_n\) is an estimated coefficient (with the logit link, the coefficient from a logistic regression of \(Y\) on \(h_n(A,W)\) with offset \(\mathrm{logit}\,\bar{Q}_n(A,W)\) and no intercept), and \(h_n(A,W)\) is a “clever covariate”.

In this case, \(h_n(A,W) = \frac{A}{g_n(W)}-\frac{1-A}{1-g_n(W)}\), where \(g_n(W)\) is the estimated (also by SL) propensity score \(\mathbb{P}(A=1 \mid W)\), so the estimator depends both on the initial SL fit of the outcome regression (\(\bar{Q}_n\)) and on an SL fit of the propensity score (\(g_n\)).
There are further robust augmentations used in the tlverse, such as an added layer of cross-validation to avoid over-fitting bias (CV-TMLE), and methods that can more robustly estimate several parameters simultaneously (e.g., the points on a survival curve).
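The targeting step for the ATE can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the tmle3 implementation: the data are simulated, logistic regressions (glm) stand in for the super learner fits of \(\bar{Q}_n\) and \(g_n\), and \(\epsilon_n\) is estimated by a logistic regression of \(Y\) on \(h_n(A,W)\) with offset \(\mathrm{logit}\,\bar{Q}_n(A,W)\) and no intercept.

```r
# Simulated data (illustrative assumption)
set.seed(1)
n <- 1000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * W))
Y <- rbinom(n, 1, plogis(-1 + 0.5 * A + 0.8 * W))
dat <- data.frame(W = W, A = A, Y = Y)

# Initial fits; glm stands in for super learner
Q_fit <- glm(Y ~ A + W, family = binomial(), data = dat)
g_fit <- glm(A ~ W, family = binomial(), data = dat)

QA <- predict(Q_fit, type = "response")  # Qbar_n(A_i, W_i) at observed A
Q1 <- predict(Q_fit, newdata = transform(dat, A = 1), type = "response")
Q0 <- predict(Q_fit, newdata = transform(dat, A = 0), type = "response")
g1 <- predict(g_fit, type = "response")  # g_n(W_i), estimated P(A = 1 | W_i)

# Clever covariate h_n(A, W) at the observed A, and at A = 1 and A = 0
H_A <- A / g1 - (1 - A) / (1 - g1)
H_1 <- 1 / g1
H_0 <- -1 / (1 - g1)

# Targeting step: logistic regression of Y on H_A, offset logit(Qbar_n),
# no intercept; the fitted coefficient is epsilon_n
eps_fit <- glm(Y ~ -1 + H_A + offset(qlogis(QA)), family = binomial())
eps <- coef(eps_fit)[["H_A"]]

# Updated (targeted) predictions, computed on the logit scale
QA_star <- plogis(qlogis(QA) + eps * H_A)
Q1_star <- plogis(qlogis(Q1) + eps * H_1)
Q0_star <- plogis(qlogis(Q0) + eps * H_0)

# TMLE of the ATE: substitution estimator using the targeted fit
psi_tmle <- mean(Q1_star - Q0_star)
psi_tmle
```

After the update, the empirical mean of \(h_n(A,W)\{Y - \bar{Q}^{\star}_n(A,W)\}\), i.e. `mean(H_A * (Y - QA_star))`, is numerically zero; solving this score equation is what the targeting step is designed to do.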

5.1.3 Inference

The estimators we discuss are asymptotically linear, meaning that the difference between the estimate \(\Psi(P_n)\) and the true parameter (\(\Psi(P_0)\)) can be represented to first order by an i.i.d. sum: \[\begin{equation}\label{eqn:IC} \Psi(P_n) - \Psi(P_0) = \frac{1}{n} \sum_{i=1}^n IC(O_i; \nu) + o_p(1/\sqrt{n}) \end{equation}\]

where \(IC(O_i; \nu)\) (the influence curve or influence function) is a function of the data and possibly other nuisance parameters \(\nu\). Importantly, such estimators have mean-zero Gaussian limiting distributions; thus, in the univariate case, one has that \[\begin{equation}\label{eqn:limit_dist} \sqrt{n}(\Psi(P_n) - \Psi(P_0)) \xrightarrow[]{D}N(0,\mathbb{V}IC(O_i;\nu)), \end{equation}\] so that inference for the estimator of interest may be obtained in terms of the influence function. For this simple case, a \((1-\alpha)\) confidence interval (a 95% interval for \(\alpha = 0.05\)) may be derived as: \[\begin{equation}\label{eqn:CI} \Psi(P^{\star}_n) \pm z_{1 - \frac{\alpha}{2}} \sqrt{\frac{\hat{\sigma}^2}{n}}, \end{equation}\] where \(SE=\sqrt{\frac{\hat{\sigma}^2}{n}}\) and \(\hat{\sigma}^2\) is the sample variance of the estimated ICs, \(IC(O_i; \hat{\nu})\). If a parameter of interest can be written as a function of other asymptotically linear estimators, its influence curve can be derived with the functional delta method.

Thus, we can derive robust inference for parameters that are estimated by fitting complex machine learning algorithms, and these methods are computationally quick because they do not rely on resampling-based methods like the bootstrap.
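The Wald interval above only needs the estimated influence curve. The sketch below computes it for the ATE on simulated data, using the usual ATE influence curve \(h_n(A,W)\{Y-\bar{Q}_n(A,W)\} + \bar{Q}_n(1,W)-\bar{Q}_n(0,W) - \psi_n\). For brevity it uses the one-step (augmented IPW) estimator, which shares this influence curve with the TMLE, rather than a targeted fit; the simulated data and glm fits are assumptions for illustration only.

```r
# Simulated data and initial fits (illustrative assumptions)
set.seed(1)
n <- 1000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * W))
Y <- rbinom(n, 1, plogis(-1 + 0.5 * A + 0.8 * W))
dat <- data.frame(W = W, A = A, Y = Y)

Q_fit <- glm(Y ~ A + W, family = binomial(), data = dat)
g_fit <- glm(A ~ W, family = binomial(), data = dat)
QA <- predict(Q_fit, type = "response")
Q1 <- predict(Q_fit, newdata = transform(dat, A = 1), type = "response")
Q0 <- predict(Q_fit, newdata = transform(dat, A = 0), type = "response")
g1 <- predict(g_fit, type = "response")
H_A <- A / g1 - (1 - A) / (1 - g1)

# One-step (AIPW) estimate of the ATE
psi_hat <- mean(Q1 - Q0) + mean(H_A * (Y - QA))

# Estimated influence curve, one value per observation
IC <- H_A * (Y - QA) + (Q1 - Q0) - psi_hat

# Wald-style (1 - alpha) interval from the sample variance of the IC
alpha <- 0.05
se <- sqrt(var(IC) / n)
ci <- psi_hat + c(-1, 1) * qnorm(1 - alpha / 2) * se
c(estimate = psi_hat, se = se, lower = ci[1], upper = ci[2])
```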

5.2 Learning Objectives

Use tmle3 to estimate an Average Treatment Effect (ATE)
Understand tmle3 “Specs”
Fit tmle3 for a custom set of parameters
Use the delta method to estimate transformations of parameters

5.3 Easy-Bake Example: `tmle3` for ATE

We’ll illustrate the most basic use of TMLE using the IST example data introduced earlier and estimating an Average Treatment Effect (ATE).

As a reminder, the ATE is identified with the following statistical parameter (under assumptions): \(ATE = \mathbb{E}_0(Y(1)-Y(0)) = \mathbb{E}_0\left(\mathbb{E}_0[Y \mid A=1,W]-\mathbb{E}_0[Y \mid A=0,W] \right)\)

This Easy-Bake implementation consists of the following steps:

Load the necessary libraries and data
Define the variable roles
Create a “Spec” object
Define the super learners
Fit the TMLE
Evaluate the TMLE estimates

0. Load the Data

We’ll use the same IST data as in the earlier chapters:

library(data.table)
library(tmle3)
library(sl3)
ist_data <- data.table(read.csv("https://raw.githubusercontent.com/tlverse/deming2019-workshop/master/data/ist_sample.csv"))

1. Define the variable roles

We’ll use the common \(W\) (covariates), \(A\) (treatment/intervention), \(Y\) (outcome) data structure. tmle3 needs to know what variables in the dataset correspond to each of these roles. We use a list of character vectors to tell it. We call this a “Node List” as it corresponds to the nodes in a Directed Acyclic Graph (DAG), a way of displaying causal relationships between variables.

node_list <- list(
  W = c(
    "RXHEP", "REGION", "RDELAY", "RCONSC", "SEX", "AGE", "RSLEEP", "RVISINF",
    "RCT", "RATRIAL", "RASP3", "MISSING_RATRIAL_RASP3", "RHEP24", 
    "MISSING_RHEP24", "RSBP", "RDEF1", "RDEF2", "RDEF3", "RDEF4", "RDEF5",
    "RDEF6", "RDEF7", "RDEF8", "STYPE"
    ),
  A = "RXASP",
  Y = "DRSISC"
)

Handling Missingness

Currently, missingness in tlverse is handled in a fairly simple way:

Missing covariates are median (for continuous) or mode (for discrete) imputed, and additional covariates indicating imputation are generated
Observations missing treatment variables are excluded.
We implement an IPCW-TMLE to more efficiently handle missingness in the outcome variables.

These steps are implemented in the process_missing() function in tmle3 and are handled automatically in sl3. In these data, we have already imputed the missing covariate values in RATRIAL, RASP3, and RHEP24, and created the imputation-indicator covariates MISSING_RATRIAL_RASP3 and MISSING_RHEP24. Because RATRIAL and RASP3 were missing for exactly the same observations, a single indicator covers both variables.
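The median/mode-plus-indicator strategy can be sketched in base R as below. impute_with_indicators() is a hypothetical helper written for illustration; it is not the process_missing() function from tmle3, whose exact behavior may differ. It creates one MISSING_ indicator per incomplete column, so collapsing identical indicators (as was done here for RATRIAL and RASP3) would be a separate step.

```r
# Hypothetical helper (NOT tmle3::process_missing()): median imputation for
# numeric columns, mode imputation for factor/character columns, plus a
# MISSING_<name> indicator for every column that had missing values
impute_with_indicators <- function(df) {
  out <- df
  for (v in names(df)) {
    miss <- is.na(df[[v]])
    if (!any(miss)) next
    if (is.numeric(df[[v]])) {
      fill <- median(df[[v]], na.rm = TRUE)
    } else {
      counts <- table(df[[v]])              # counts of observed values
      fill <- names(counts)[which.max(counts)]  # the mode
    }
    out[[v]][miss] <- fill
    out[[paste0("MISSING_", v)]] <- as.integer(miss)
  }
  out
}

# Usage on a tiny made-up data frame
df <- data.frame(
  AGE = c(70, NA, 82, 65),
  RATRIAL = factor(c("N", "Y", NA, "N"))
)
imputed <- impute_with_indicators(df)
imputed
```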

head(ist_data)

   RDELAY RCONSC SEX AGE RSLEEP RATRIAL RCT RVISINF RHEP24 RASP3 RSBP RDEF1
1:     46      F   F  85      N       N   N       N      Y     N  150     N
2:     33      F   M  71      Y       Y   Y       Y      N     Y  180     Y
3:      6      D   M  88      N       Y   N       N      N     N  140     Y
4:      8      F   F  68      Y       N   Y       Y      N     N  118     Y
5:     13      F   M  60      N       N   Y       N      N     N  140     Y
6:     16      F   F  71      Y       N   Y       N      N     N  160     N
   RDEF2 RDEF3 RDEF4 RDEF5 RDEF6 RDEF7 RDEF8 STYPE RXHEP
1:     Y     N     N     N     N     N     N  PACS     N
2:     Y     Y     Y     Y     N     N     N  TACS     L
3:     Y     Y     C     C     C     C     C  PACS     N
4:     Y     N     N     N     N     N     N  LACS     M
5:     Y     Y     Y     N     N     Y     Y  POCS     N
6:     Y     N     N     N     N     N     N  PACS     N
                    REGION MISSING_RATRIAL_RASP3 MISSING_RHEP24 RXASP DRSISC
1: Europe and Central Asia                     0              0     0      0
2:   East Asia and Pacific                     0              0     0      0
3: Europe and Central Asia                     0              0     0      0
4: Europe and Central Asia                     0              0     0      0
5: Europe and Central Asia                     0              0     1      0
6: Europe and Central Asia                     0              0     1      0

2. Create a “Spec” Object

tmle3 is general, and allows most components of the TMLE procedure to be specified in a modular way. However, most end-users will not be interested in manually specifying all of these components. Therefore, tmle3 implements a tmle3_Spec object that bundles a set of components into a specification that, with minimal additional detail, can be run by an end-user.

We’ll start with using one of the specs, and then work our way down into the internals of tmle3.

ate_spec <- tmle_ATE(
  treatment_level = 1,
  control_level = 0
)

3. Define the Relevant Super Learners

Currently, the only other thing a user must define is the set of sl3 learners used to estimate the relevant factors of the likelihood: \(Q\), \(g\), and \(\Delta\).

This takes the form of a list of sl3 learners, one for each likelihood factor to be estimated with sl3:

# choose base learners
lrnr_mean <- make_learner(Lrnr_mean)
lrnr_glm <- make_learner(Lrnr_glm)
lrnr_lasso <- make_learner(Lrnr_glmnet)
lrnr_ridge <- make_learner(Lrnr_glmnet, alpha = 0)
lrnr_ranger <- make_learner(Lrnr_ranger)

# grid of xgboost learners over tree depth and learning rate
grid_params <- list(max_depth = c(2, 5, 8),
                    eta = c(0.01, 0.1, 0.3))
grid <- expand.grid(grid_params, KEEP.OUT.ATTRS = FALSE)
params_default <- list(nthread = getOption("sl.cores.learners", 1))
xgb_learners <- apply(grid, MARGIN = 1, function(params_tune) {
  do.call(Lrnr_xgboost$new, c(params_default, as.list(params_tune)))})

# stack of candidate learners for the outcome regression Q
learners_Y <- make_learner(Stack, unlist(list(xgb_learners, lrnr_ridge,
                                              lrnr_mean, lrnr_ranger,
                                              lrnr_lasso, lrnr_glm),
                                         recursive = TRUE))

# super learners; no metalearner is given, so the default metalearner
# appropriate to each outcome's data type is used
sl_Y <- Lrnr_sl$new(learners = learners_Y)
sl_Delta <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_glm, lrnr_lasso, lrnr_ridge)
  )
sl_A <- Lrnr_sl$new(
  learners = list(lrnr_mean, lrnr_glm)
  )
# names match the likelihood factors: A (g), delta_Y (missingness of Y), Y (Q)
learner_list <- list(A = sl_A, delta_Y = sl_Delta, Y = sl_Y)

Here, we use a super learner as defined in the previous sl3 section. In the future, we plan to include reasonable default learners.

4. Fit the TMLE

We now have everything we need to fit the TMLE using tmle3:

tmle_fit <- tmle3(ate_spec, ist_data, node_list, learner_list)

5. Evaluate the Estimates

We can see the summary results by printing the fit object. Alternatively, we can extract results from the summary by indexing into it:

print(tmle_fit)

A tmle3_Fit that took 1 step(s)
   type                                      param     init_est     tmle_est
1:  ATE ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}] -0.001894602 -0.004648681
            se       lower      upper psi_transformed lower_transformed
1: 0.004305187 -0.01308669 0.00378933    -0.004648681       -0.01308669
   upper_transformed
1:        0.00378933

# in most cases the transformation applied to tmle3 estimates and inference
# (psi_transformed) is the identity; it matters for estimands like odds
# ratios, which are estimated on the log scale
estimates <- tmle_fit$summary$psi_transformed
print(estimates)

[1] -0.004648681

5.4 `tmle3` Components

Now that we’ve successfully used a spec to obtain a TML estimate, let’s look under the hood at the components. The spec has a number of functions that generate the objects necessary to define and fit a TMLE.

5.4.1 `tmle3_task`

First is a tmle3_Task, analogous to an sl3_Task, containing the data we’re fitting the TMLE to, as well as an NPSEM generated from the node_list defined above, describing the variables and their relationships.

tmle_task <- ate_spec$make_tmle_task(ist_data, node_list)

tmle_task$npsem

$W
tmle3_Node: W
    Variables: RXHEP, REGION, RDELAY, RCONSC, SEX, AGE, RSLEEP, RVISINF, RCT, RATRIAL, RASP3, MISSING_RATRIAL_RASP3, RHEP24, MISSING_RHEP24, RSBP, RDEF1, RDEF2, RDEF3, RDEF4, RDEF5, RDEF6, RDEF7, RDEF8, STYPE
    Parents: 

$A
tmle3_Node: A
    Variables: RXASP
    Parents: W

$Y
tmle3_Node: Y
    Variables: DRSISC
    Parents: A, W

$delta_Y
tmle3_Node: delta_Y
    Variables: delta_Y
    Parents: A, W

5.4.2 Initial Likelihood

Next is an object representing the likelihood, factorized according to the NPSEM described above:

initial_likelihood <- ate_spec$make_initial_likelihood(
  tmle_task,
  learner_list
)
print(initial_likelihood)

W: Lf_emp
A: LF_fit
Y: LF_fit
delta_Y: LF_fit

These components of the likelihood indicate how the factors were estimated: the marginal distribution of \(W\) was estimated using the NPMLE (the empirical distribution), and the conditional distributions of \(A\), \(\Delta\), and \(Y\) were estimated using the sl3 fits defined in learner_list above.

We can use this in tandem with the tmle_task object to obtain likelihood estimates for each observation. Because the empirical distribution places mass \(1/n\) on each observation, every entry in the W column below is \(1/5000 = 2 \times 10^{-4}\):

initial_likelihood$get_likelihoods(tmle_task)

          W         A          Y   delta_Y
   1: 2e-04 0.4931439 0.02317890 0.9994764
   2: 2e-04 0.5323151 0.02825866 0.9994643
   3: 2e-04 0.4768836 0.01363309 0.9994558
   4: 2e-04 0.5162446 0.02756512 0.9994699
   5: 2e-04 0.5157559 0.01639482 0.9977771
  ---                                     
4996: 2e-04 0.4952072 0.01365717 0.9994684
4997: 2e-04 0.4930078 0.02359050 0.9984894
4998: 2e-04 0.4928027 0.01439682 0.9985203
4999: 2e-04 0.4790796 0.03792173 0.9991405
5000: 2e-04 0.4948674 0.01738388 0.9980632
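To illustrate what we expect the W and A columns to represent, the sketch below computes the two kinds of values on simulated data: the empirical weight \(1/n\) for \(W\) and, for a binary treatment, the likelihood of the observed \(A\) under an estimated propensity score. The simulated randomized treatment and the glm fit (standing in for sl3) are assumptions, and this is not how tmle3 computes the factors internally.

```r
set.seed(1)
n <- 5000
W <- rnorm(n)
A <- rbinom(n, 1, 0.5)  # randomized treatment (illustrative assumption)

# Empirical (NPMLE) marginal distribution of W: mass 1/n on each observation
lik_W <- rep(1 / n, n)

# Conditional distribution of A given W; glm stands in for an sl3 fit
g_fit <- glm(A ~ W, family = binomial())
g1 <- predict(g_fit, type = "response")  # estimated P(A = 1 | W)
lik_A <- ifelse(A == 1, g1, 1 - g1)      # likelihood of the observed A

head(data.frame(W = lik_W, A = lik_A))
```

With a randomized treatment, the values for \(A\) hover around 0.5, much like the A column printed above.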

5.4.3 Targeted Likelihood (updater)

We also need to define a “Targeted Likelihood” object. This is a special type of likelihood that can be updated using a tmle3_Update object, which defines the update strategy (e.g., submodel, loss function, whether to use CV-TMLE, etc.).

targeted_likelihood <- Targeted_Likelihood$new(initial_likelihood)

When constructing the targeted likelihood, you can specify different update options. See the documentation for tmle3_Update for details of the different options. For example, you can disable CV-TMLE (the default in tmle3) as follows:

targeted_likelihood_no_cv <-
  Targeted_Likelihood$new(initial_likelihood,
    updater = list(cvtmle = FALSE)
  )

5.4.4 Parameter Mapping

Finally, we need to define the parameters of interest. Here, the spec defines a single parameter, the ATE. In the next section, we’ll see how to add additional parameters.

tmle_params <- ate_spec$make_params(tmle_task, targeted_likelihood)
print(tmle_params)

[[1]]
Param_ATE: ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}]

5.4.5 Putting it all together

Having used the spec to manually generate all these components, we can now manually fit a tmle3:

tmle_fit_manual <- fit_tmle3(
  tmle_task, targeted_likelihood, tmle_params,
  targeted_likelihood$updater
)
print(tmle_fit_manual)

A tmle3_Fit that took 1 step(s)
   type                                      param     init_est     tmle_est
1:  ATE ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}] -0.002037851 -0.004717238
            se       lower       upper psi_transformed lower_transformed
1: 0.004293116 -0.01313159 0.003697115    -0.004717238       -0.01313159
   upper_transformed
1:       0.003697115

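The result is equivalent to fitting with the tmle3 function as above. The small numerical differences (e.g., in init_est) most likely reflect the randomness of the super learner's cross-validation, since the initial likelihood was refit here.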

5.5 Fitting `tmle3` with multiple parameters

Above, we fit a tmle3 with just one parameter. tmle3 also supports fitting multiple parameters simultaneously. To illustrate this, we’ll use the tmle_TSM_all spec:

tsm_spec <- tmle_TSM_all()
targeted_likelihood <- Targeted_Likelihood$new(initial_likelihood)
all_tsm_params <- tsm_spec$make_params(tmle_task, targeted_likelihood)
print(all_tsm_params)

[[1]]
Param_TSM: E[Y_{A=0, delta_Y=1}]

[[2]]
Param_TSM: E[Y_{A=1, delta_Y=1}]

This spec generates a Treatment Specific Mean (TSM) for each level of the exposure variable. Note that we must first generate a new targeted likelihood, as the old one was targeted to the ATE. However, we can recycle the initial likelihood we fit above, saving us a super learner step.

5.5.1 Delta Method

We can also define parameters based on delta method transformations of other parameters. For instance, we can estimate an ATE using the delta method and the two TSM parameters above:

ate_param <- define_param(
  Param_delta, targeted_likelihood,
  delta_param_ATE,
  list(all_tsm_params[[1]], all_tsm_params[[2]])
)
print(ate_param)

Param_delta: E[Y_{A=1, delta_Y=1}] - E[Y_{A=0, delta_Y=1}]

This approach can similarly be used to estimate other derived parameters, such as relative risks and population attributable risks.
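The delta method works at the level of influence curves: if \(\psi = f(\psi_1, \psi_0)\), the influence curve of \(\psi\) is the gradient of \(f\) applied to the influence curves of \(\psi_1\) and \(\psi_0\). The base-R sketch below applies this to an ATE (gradient \((1,-1)\)) and a log relative risk (gradient \((1/\psi_1, -1/\psi_0)\)), then exponentiates the log-scale interval for the relative risk. The simulated data, the glm fits, and the one-step TSM estimates are illustrative assumptions, not the internals of tmle3's delta_param_ATE.

```r
# Simulated data and initial fits (illustrative assumptions)
set.seed(1)
n <- 1000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * W))
Y <- rbinom(n, 1, plogis(-1 + 0.5 * A + 0.8 * W))
dat <- data.frame(W = W, A = A, Y = Y)
Q_fit <- glm(Y ~ A + W, family = binomial(), data = dat)
g1 <- predict(glm(A ~ W, family = binomial(), data = dat), type = "response")
QA <- predict(Q_fit, type = "response")
Q1 <- predict(Q_fit, newdata = transform(dat, A = 1), type = "response")
Q0 <- predict(Q_fit, newdata = transform(dat, A = 0), type = "response")

# One-step estimates of the treatment-specific means E[Y_1] and E[Y_0]
D1 <- A / g1 * (Y - QA) + Q1
D0 <- (1 - A) / (1 - g1) * (Y - QA) + Q0
psi1 <- mean(D1)
psi0 <- mean(D0)
IC1 <- D1 - psi1  # influence curve of psi1
IC0 <- D0 - psi0  # influence curve of psi0

# ATE = psi1 - psi0: gradient (1, -1)
ate <- psi1 - psi0
se_ate <- sqrt(var(IC1 - IC0) / n)

# log RR = log(psi1) - log(psi0): gradient (1 / psi1, -1 / psi0)
log_rr <- log(psi1) - log(psi0)
se_log_rr <- sqrt(var(IC1 / psi1 - IC0 / psi0) / n)

# Interval for the RR: build it on the log scale, then exponentiate
rr_ci <- exp(log_rr + c(-1, 1) * qnorm(0.975) * se_log_rr)
c(ate = ate, se_ate = se_ate, rr = exp(log_rr), rr_lower = rr_ci[1], rr_upper = rr_ci[2])
```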

5.5.2 Fit

We can now fit a TMLE simultaneously for all TSM parameters, as well as the ATE parameter defined above:

all_params <- c(all_tsm_params, ate_param)
tmle_fit_multiparam <- fit_tmle3(
  tmle_task, targeted_likelihood, all_params,
  targeted_likelihood$updater
)
print(tmle_fit_multiparam)

A tmle3_Fit that took 1 step(s)
   type                                         param     init_est     tmle_est
1:  TSM                         E[Y_{A=0, delta_Y=1}]  0.023110120  0.025906026
2:  TSM                         E[Y_{A=1, delta_Y=1}]  0.021072269  0.021183901
3:  ATE E[Y_{A=1, delta_Y=1}] - E[Y_{A=0, delta_Y=1}] -0.002037851 -0.004722125
            se       lower       upper psi_transformed lower_transformed
1: 0.003178264  0.01967674 0.032135309     0.025906026        0.01967674
2: 0.002891541  0.01551659 0.026851216     0.021183901        0.01551659
3: 0.004293326 -0.01313689 0.003692638    -0.004722125       -0.01313689
   upper_transformed
1:       0.032135309
2:       0.026851216
3:       0.003692638

5.6 Stratified Effect Estimates

TMLE can also be applied to estimate effects in strata of a baseline covariate. The tmle_stratified spec makes it easy to extend an existing spec with stratification.

For instance, we can estimate stratum-specific ATEs, where \(V\) is a baseline covariate included in \(W\): \(ATE_v = \mathbb{E}_0(Y(1)-Y(0) \mid V=v ) = \mathbb{E}_0\left(\mathbb{E}_0[Y \mid A=1,W]-\mathbb{E}_0[Y \mid A=0,W] \mid V=v \right)\)
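The substitution part of this parameter is simply a within-stratum average of \(\bar{Q}_n(1,W)-\bar{Q}_n(0,W)\). The base-R sketch below shows that plug-in step on simulated data with a hypothetical two-level covariate V; it omits the targeting step that tmle_stratified performs for each stratum, and a glm fit stands in for a super learner. It also checks that the marginal ATE is the \(V\)-weighted average of the stratum-specific estimates.

```r
# Simulated data with a hypothetical two-level stratifying covariate V
set.seed(1)
n <- 2000
V <- sample(c("north", "south"), n, replace = TRUE)
W <- rnorm(n)
A <- rbinom(n, 1, 0.5)
Y <- rbinom(n, 1, plogis(-1 + 0.5 * A * (V == "south") + 0.8 * W))
dat <- data.frame(V = V, W = W, A = A, Y = Y)

# Outcome regression that lets the treatment effect differ by V
Q_fit <- glm(Y ~ A * V + W, family = binomial(), data = dat)
Q1 <- predict(Q_fit, newdata = transform(dat, A = 1), type = "response")
Q0 <- predict(Q_fit, newdata = transform(dat, A = 0), type = "response")

# Plug-in: average Qbar(1, W) - Qbar(0, W) over observations with V = v
ate_by_stratum <- tapply(Q1 - Q0, dat$V, mean)
marginal_ate <- mean(Q1 - Q0)
ate_by_stratum
marginal_ate
```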

For example, we can stratify the above ATE spec to estimate the ATE in strata of geographic region (REGION):

stratified_ate_spec <- tmle_stratified(ate_spec, "REGION")
stratified_fit <- tmle3(stratified_ate_spec, ist_data, node_list, learner_list)

Error in xgboost::xgb.DMatrix(Xmat) : 
  REAL() can only be applied to a 'numeric', not a 'logical'

print(stratified_fit)

A tmle3_Fit that took 1 step(s)
             type
1:            ATE
2: stratified ATE
3: stratified ATE
4: stratified ATE
5: stratified ATE
6: stratified ATE
7: stratified ATE
8: stratified ATE
                                                                       param
1:                                ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}]
2:    ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}] | V=Europe and Central Asia
3:      ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}] | V=East Asia and Pacific
4: ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}] | V=Middle East & North Africa
5:  ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}] | V=Latin America & Caribbean
6:                 ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}] | V=South Asia
7:              ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}] | V=North America
8:         ATE[Y_{A=2, delta_Y=1}-Y_{A=1, delta_Y=1}] | V=Sub-Saharan Africa
        init_est      tmle_est          se        lower       upper
1: -0.0022015483 -0.0045954553 0.004317047 -0.013056711 0.003865801
2: -0.0022316436 -0.0042473161 0.004633570 -0.013328947 0.004834314
3: -0.0023371825  0.0050747161 0.020964533 -0.036015013 0.046164445
4: -0.0003294139  0.0001945933 0.001637558 -0.003014962 0.003404149
5: -0.0025104994 -0.0264986320 0.022817253 -0.071219625 0.018222361
6: -0.0002519670  0.0025249227 0.002173750 -0.001735549 0.006785395
7: -0.0030792781 -0.0083046085 0.048741790 -0.103836762 0.087227545
8: -0.0038918309 -0.0963311222 0.110582496 -0.313068832 0.120406588
   psi_transformed lower_transformed upper_transformed
1:   -0.0045954553      -0.013056711       0.003865801
2:   -0.0042473161      -0.013328947       0.004834314
3:    0.0050747161      -0.036015013       0.046164445
4:    0.0001945933      -0.003014962       0.003404149
5:   -0.0264986320      -0.071219625       0.018222361
6:    0.0025249227      -0.001735549       0.006785395
7:   -0.0083046085      -0.103836762       0.087227545
8:   -0.0963311222      -0.313068832       0.120406588

This TMLE is consistent for both the marginal ATE and the ATEs in strata of \(V\). For continuous \(V\), this could be extended using a working Marginal Structural Model (MSM), although that has not yet been implemented in tmle3.

5.7 Exercise

Follow the steps below to estimate an ATE using a simplified version of data from the Collaborative Perinatal Project (CPP), available in the sl3 package. We define a binary intervention variable, parity01 (an indicator of having one or more children before the current child), and a binary outcome, haz01 (an indicator of having an above-average height for age).

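# load the data set (cpp ships with the sl3 package)
data(cpp)
# keep only observations with an observed outcome (haz)
cpp <- cpp[!is.na(cpp[, "haz"]), ]
# binary intervention: one or more children before the current child
cpp$parity01 <- as.numeric(cpp$parity > 0)
# crude imputation: set all remaining missing values to 0
cpp[is.na(cpp)] <- 0
# binary outcome: above-average height for age (haz > 0)
cpp$haz01 <- as.numeric(cpp$haz > 0)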

1. Define the variable roles \((W,A,Y)\) by creating a list of these nodes. Include the following baseline covariates in \(W\): apgar1, apgar5, gagebrth, mage, meducyrs, sexn. Both \(A\) and \(Y\) are specified above.
2. Define a tmle3_Spec object for the ATE, tmle_ATE().
3. Using the same base learning libraries defined above, specify sl3 base learners for estimation of \(Q = E(Y \mid A,W)\) and \(g = P(A \mid W)\).
4. Define the metalearner as below:

metalearner <- make_learner(Lrnr_solnp,
  loss_function = loss_loglik_binomial,
  learner_function = metalearner_logistic_binomial
)

5. Define one super learner for estimating \(Q\) and another for estimating \(g\). Use the metalearner above for both the \(Q\) and \(g\) super learners.
6. Create a list of the two super learners defined in Step 5 and call this object learner_list. The list names should be A (defining the super learner for estimating \(g\)) and Y (defining the super learner for estimating \(Q\)).
7. Fit the TMLE with the tmle3 function by specifying (1) the tmle3_Spec, which we defined in Step 2; (2) the data; (3) the list of nodes, which we specified in Step 1; and (4) the list of super learners for estimating \(g\) and \(Q\), which we defined in Step 6. Note: as before, you will need to make a copy of the data (cpp2 <- data.table::copy(cpp)) to avoid problems caused by data.table's modify-by-reference behavior, and use cpp2 as the data.
8. Interpret the tmle3 fit both causally and statistically.

5.8 Summary

tmle3 is a general purpose framework for generating TML estimates.
The easiest way to use tmle3 is to use a predefined spec, allowing you to just fill in the blanks for the data, variable roles, and sl3 learners.
Digging under the hood allows users to specify a wide range of TMLEs.
In the next sections, we’ll see how this framework can be used to estimate more modern parameters, such as the effect under optimal treatments and shift interventions.

References

van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).