`shift_tmle.Rmd`

Stochastic treatment regimes present a relatively simple manner in which to assess the effects of continuous treatments by way of parameters that examine the effects induced by the counterfactual shifting of the observed values of a treatment of interest. Here, we present an implementation of a new algorithm for computing targeted minimum loss-based estimates of treatment shift parameters defined based on a shifting function \(d(A,W)\). For a technical presentation of the algorithm, the interested reader is invited to consult Díaz and van der Laan (2018). For additional background on Targeted Learning and previous work on stochastic treatment regimes, please consider consulting van der Laan and Rose (2011), van der Laan and Rose (2018), and Díaz and van der Laan (2012).

To start, let’s load the packages we’ll use and set a seed for simulation:

```
library(tidyverse)
library(data.table)
library(condensier)
library(sl3)
library(tmle3)
library(tmle3shift)
set.seed(429153)
```

Start with a simple additive shift – i.e., \(d(a,w) = a + \delta\) if \(a \leq u(w) - \delta\) or \(d(a, w) = a\) if \(a \geq u(w) - \delta\).

The additive shift will have support everywhere (i.e., \(a \leq u(w)\) is true everywhere).

The data structure that we now observe is \(O = (W, A, Y)\).

```
# simulate simple data for tmle-shift sketch
n_obs <- 1000 # number of observations
n_w <- 1 # number of baseline covariates
tx_mult <- 2 # multiplier for the effect of W = 1 on the treatment
## baseline covariates -- simple, binary
W <- as.numeric(replicate(n_w, rbinom(n_obs, 1, 0.5)))
## create treatment based on baseline W
A <- as.numeric(rnorm(n_obs, mean = tx_mult * W, sd = 1))
## create outcome as a linear function of A, W + white noise
Y <- A + W + rnorm(n_obs, mean = 0, sd = 1)
```

The above composes our observed data structure \(O = (W, A, Y)\). To formally express this fact using the `tlverse`

grammar introduced by the `tmle3`

package, we create a single data object and specify the functional relationships between the nodes in the *directed acyclic graph* (DAG) via *nonparametric structural equation models* (NPSEMs), reflected in the node list that we set up:

```
# organize data and nodes for tmle3
data <- data.table(W, A, Y)
node_list <- list(W = "W", A = "A", Y = "Y")
head(data)
```

```
## W A Y
## 1: 1 2.4031607 4.0283549
## 2: 1 4.4973744 6.4329477
## 3: 1 2.0330871 1.4733069
## 4: 0 -0.8089023 -0.9610039
## 5: 1 1.8432067 2.5954114
## 6: 1 1.3555863 2.7855801
```

We now have an observed data structure (`data`

) and a specification of the role that each variable in the data set plays as the nodes in a DAG.

To start, we will initialize a specification for the TMLE of our parameter of interest (called a `tmle3_Spec`

in the `tlverse`

nomenclature) simply by calling `tmle_shift`

. We specify the argument `shift_val = 0.5`

when initializing the `tmle3_Spec`

object to communicate that we’re interested in a shift of \(0.5\) on the scale of the treatment \(A\) – that is, we specify \(\delta = 0.5\) (note that this is an arbitrarily chosen value for this example).

```
# initialize a tmle specification
tmle_spec <- tmle_shift(shift_val = 0.5,
shift_fxn = shift_additive_bounded,
shift_fxn_inv = shift_additive_bounded_inv)
```

As seen above, the `tmle_shift`

specification object (like all `tmle3_Spec`

objects) does *not* store the data for our specific analysis of interest. Later, we’ll see that passing a data object directly to the `tmle3`

wrapper function, alongside the instantiated `tmle_spec`

, will serve to construct a `tmle3_Task`

object internally (see the `tmle3`

documentation for details).

`sl3`

To easily incorporate ensemble machine learning into the estimation procedure, we rely on the facilities provided in the `sl3`

R package. For a complete guide on using the `sl3`

R package, consider consulting https://sl3.tlverse.org, or https://tlverse.org for the `tlverse`

ecosystem, of which `sl3`

is a major part.

Using the framework provided by the `sl3`

package, the nuisance parameters of the TML estimator may be fit with ensemble learning, using the cross-validation framework of the Super Learner algorithm of van der Laan, Polley, and Hubbard (2007).

```
# learners used for conditional expectation regression (e.g., outcome)
lrn1 <- Lrnr_mean$new()
lrn2 <- Lrnr_glm$new()
lrn3 <- Lrnr_ranger$new()
sl_lrn <- Lrnr_sl$new(
learners = list(lrn1, lrn2, lrn3),
metalearner = Lrnr_nnls$new()
)
# learners used for conditional density regression (e.g., propensity score)
lrn1_dens <- Lrnr_condensier$new(
nbins = 25, bin_estimator = lrn1,
bin_method = "dhist"
)
lrn2_dens <- Lrnr_condensier$new(
nbins = 20, bin_estimator = lrn2,
bin_method = "dhist"
)
lrn3_dens <- Lrnr_condensier$new(
nbins = 15, bin_estimator = lrn3,
bin_method = "dhist"
)
sl_lrn_dens <- Lrnr_sl$new(
learners = list(lrn1_dens, lrn2_dens, lrn3_dens),
metalearner = Lrnr_solnp_density$new()
)
```

As seen above, we can generate two different ensemble learners for the two nuisance regressions that must be fit in the process of computing this TML estimator. In particular, we use a Super Learner composed of an intercept model, a GLM, and a random forest (as implemented in the `ranger`

R package) for fitting the outcome regressions (often denoted “Q” in the literature) while we use variations of these learners, through the `condensier`

R package, for the conditional density estimation needed in fitting the treatment mechanism (often denoted “g” in the literature).

We make the above explicit with respect to standard notation by bundling the ensemble learners into a list object below:

```
# specify outcome and treatment regressions and create learner list
Q_learner <- sl_lrn
g_learner <- sl_lrn_dens
learner_list <- list(Y = Q_learner, A = g_learner)
```

The `learner_list`

object above specifies the role that each of the ensemble learners we’ve generated is to play in computing initial estimators to be used in building a TMLE for the parameter of interest here. In particular, it makes explicit the fact that our `Q_learner`

is used in fitting the outcome regression while our `g_learner`

is used in fitting our treatment mechanism regression.

`tmle_fit <- tmle3(tmle_spec, data, node_list, learner_list)`

```
##
## Iter: 1 fn: 1829.0370 Pars: 0.89609235 0.10389606 0.00001158
## Iter: 2 fn: 1829.0370 Pars: 0.896093725 0.103899682 0.000006593
## solnp--> Completed in 2 iterations
```

```
## type param init_est tmle_est se lower upper
## 1: TSM E[Y_{A=NULL}] 2.029112 2.03477 0.06480381 1.907757 2.161783
## psi_transformed lower_transformed upper_transformed
## 1: 2.03477 1.907757 2.161783
```

The `print`

method of the resultant `tmle_fit`

object conveniently displays the results from computing our TML estimator.

Recall that the asymptotic distribution of TML estimators has been studied thoroughly: \[\psi_n - \psi_0 = (P_n - P_0) \cdot D(\bar{Q}_n^*, g_n) + R(\hat{P}^*, P_0),\] which, provided the following two conditions:

- If \(D(\bar{Q}_n^*, g_n)\) converges to \(D(P_0)\) in \(L_2(P_0)\) norm, and
- the size of the class of functions considered for estimation of \(\bar{Q}_n^*\) and \(g_n\) is bounded (technically, \(\exists \mathcal{F}\) st \(D(\bar{Q}_n^*, g_n) \in \mathcal{F}\)
, where \(\mathcal{F}\) is a Donsker class), readily admits the conclusion that \(\psi_n - \psi_0 = (P_n - P_0) \cdot D(P_0) + R(\hat{P}^*, P_0)\).**whp**

Under the additional condition that the remainder term \(R(\hat{P}^*, P_0)\) decays as \(o_P \left( \frac{1}{\sqrt{n}} \right),\) we have that \[\psi_n - \psi_0 = (P_n - P_0) \cdot D(P_0) + o_P \left( \frac{1}{\sqrt{n}} \right),\] which, by a central limit theorem, establishes a Gaussian limiting distribution for the estimator:

\[\sqrt{n}(\psi_n - \psi) \to N(0, V(D(P_0))),\] where \(V(D(P_0))\) is the variance of the efficient influence curve (canonical gradient) when \(\psi\) admits an asymptotically linear representation.

The above implies that \(\psi_n\) is a \(\sqrt{n}\)-consistent estimator of \(\psi\), that it is asymptotically normal (as given above), and that it is locally efficient. This allows us to build Wald-type confidence intervals in a straightforward manner:

\[\psi_n \pm z_{\alpha} \cdot \frac{\sigma_n}{\sqrt{n}},\] where \(\sigma_n^2\) is an estimator of \(V(D(P_0))\). The estimator \(\sigma_n^2\) may be obtained using the bootstrap or computed directly via the following

\[\sigma_n^2 = \frac{1}{n} \sum_{i = 1}^{n} D^2(\bar{Q}_n^*, g_n)(O_i)\]

Having now re-examined these facts, let’s simply examine the results of computing our TML estimator:

```
## type param init_est tmle_est se lower upper
## 1: TSM E[Y_{A=NULL}] 2.029112 2.03477 0.06480381 1.907757 2.161783
## psi_transformed lower_transformed upper_transformed
## 1: 2.03477 1.907757 2.161783
```

Díaz, Iván, and Mark J van der Laan. 2012. “Population Intervention Causal Effects Based on Stochastic Interventions.” *Biometrics* 68 (2). Wiley Online Library: 541–49.

———. 2018. “Stochastic Treatment Regimes.” In *Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies*, 167–80. Springer Science & Business Media.

van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” *Statistical Applications in Genetics and Molecular Biology* 6 (1).

van der Laan, Mark J, and Sherri Rose. 2011. *Targeted Learning: Causal Inference for Observational and Experimental Data*. Springer Science & Business Media.

———. 2018. *Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies*. Springer Science & Business Media.