Background

A treatment often affects an outcome indirectly, through a particular pathway, by way of its effect on intermediate variables (mediators). Causal mediation analysis concerns the construction and evaluation of these indirect effects and the direct effects complementary to them. Generally, the indirect effect (IE) of a treatment on an outcome is the portion of the total effect found to operate through mediators, while the direct effect comprises all other components of the total effect, including both the treatment’s direct effect on the outcome and its effect through paths that do not explicitly involve the mediators. Identifying and quantifying the mechanisms underlying causal effects is an increasingly popular endeavor in public health, medicine, and the social sciences, as such mechanistic knowledge improves understanding of both why and how treatments may be effective.

While the study of mediation analysis may be traced back quite far, the field only came into its modern form with the identification and careful study of the natural direct and indirect effects (Robins and Greenland 1992; Pearl 2001). The natural direct effect (NDE) and the natural indirect effect (NIE) arise from a decomposition of the average treatment effect (ATE) in the presence of mediators (VanderWeele 2015); the requisite theory for constructing efficient estimators of these quantities has received attention only relatively recently (Tchetgen Tchetgen and Shpitser 2012). Here, we examine the use of the tmle3mediate package for constructing targeted maximum likelihood (TML) estimators of the NDE and NIE (Zheng and van der Laan 2012).

To make our methodology and results comparable to those of existing tools, we’ll take as a running example a simple dataset from an observational study of the relationship between BMI and kids’ behavior, distributed as part of the mma R package on CRAN. First, let’s load the packages we’ll be using and set a seed; then, we load this dataset and take a quick look at it:

# preliminaries
library(dplyr)
library(tidyr)
library(sl3)
library(tmle3)
library(tmle3mediate)

# load and examine data
library(mma)
data(weight_behavior)

# set a seed
set.seed(429153)

The documentation for the dataset describes it as a “database obtained from the Louisiana State University Health Sciences Center, New Orleans, by Dr. Richard Scribner. He explored the relationship between BMI and kids’ behavior through a survey at children, teachers and parents in Grenada in 2014. This data set includes 691 observations and 15 variables.”

Unfortunately, the dataset contains a few observations with missing values. As these are unrelated to the demonstration of our analytic methodology, we’ll simply remove these for the time being. Note that in a real-world data analysis, we would instead consider strategies for working with the observed data and missing observations, including imputation and inverse probability of censoring weighting. For now, we simply remove the incomplete observations, resulting in a dataset with fewer observations but much the same structure as the original:
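Since subsequent code refers to the completed dataset as weight_behavior_complete, the filtering step can be sketched as follows; tidyr::drop_na performs the complete-case filtering, and the recoding of the sports factor to a numeric 0/1 indicator is an assumption made to match the column types printed below:

```r
# complete-case filter; also recode the sports factor as a numeric
# 0/1 indicator for use as a binary treatment (level coding assumed)
weight_behavior_complete <- weight_behavior %>%
  drop_na() %>%
  mutate(sports = as.numeric(sports) - 1) %>%
  as_tibble()
weight_behavior_complete
```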

## # A tibble: 567 × 15
##      bmi   age sex   race  numpeople   car gotosch snack tvhours cmpthours
##    <dbl> <dbl> <fct> <fct>     <int> <int> <fct>   <fct>   <dbl>     <dbl>
##  1  18.2  12.2 F     OTHER         5     3 2       1           4         0
##  2  22.8  12.8 M     OTHER         4     3 2       1           4         2
##  3  25.6  12.1 M     OTHER         2     3 2       1           0         2
##  4  15.1  12.3 M     OTHER         4     1 2       1           2         1
##  5  23.0  11.8 M     OTHER         4     1 1       1           4         3
##  6  19.2  12.1 F     OTHER         3     3 2       1           0         0
##  7  16.6  12.4 M     OTHER         5     3 1       1           0         0
##  8  22.1  11.9 F     OTHER         5     2 2       1           3         4
##  9  15.9  12.4 F     OTHER         4     3 2       2           0         0
## 10  18.6  12.7 M     OTHER         5     3 2       1           0         0
## # … with 557 more rows, and 5 more variables: cellhours <dbl>, sports <dbl>,
## #   exercises <int>, sweat <int>, overweigh <dbl>

For the analysis of this observational dataset, we focus on the effect of participating in a sports team (sports) on the BMI of children (bmi), taking several related covariates as mediators (snack, exercises, overweigh) and all other collected covariates as potential confounders. Considering an NPSEM, we separate the observed variables from the dataset into their corresponding nodes as follows:

exposure <- "sports"
outcome <- "bmi"
mediators <- c("snack", "exercises", "overweigh")
covars <- setdiff(colnames(weight_behavior_complete),
                  c(exposure, outcome, mediators))
node_list <- list(
  W = covars,
  A = exposure,
  Z = mediators,
  Y = outcome
)

Here the node_list encodes the parents of each node – for example, \(Z\) (the mediators) have parents \(A\) (the treatment) and \(W\) (the baseline confounders), and \(Y\) (the outcome) has parents \(Z\), \(A\), and \(W\).

To use the natural (in)direct effects for mediation analysis, we decompose the ATE into its corresponding direct and indirect effect components. Throughout, we’ll make use of the (semiparametric efficient) TML estimators of Zheng and van der Laan (2012), which allow for flexible ensemble machine learning to be incorporated into the estimation of nuisance parameters. For this, we rely on the sl3 R package (Coyle et al. 2020), which provides an implementation of machine learning pipelines and the Super Learner algorithm; for more on the sl3 R package, consider consulting this chapter of the tlverse handbook. Below, we construct an ensemble learner using a handful of popular machine learning algorithms:

# SL learners used for continuous data (the nuisance parameter M)
enet_contin_learner <- Lrnr_glmnet$new(
  alpha = 0.5, family = "gaussian", nfolds = 3
)
lasso_contin_learner <- Lrnr_glmnet$new(
  alpha = 1, family = "gaussian", nfolds = 3
)
fglm_contin_learner <- Lrnr_glm_fast$new(family = gaussian())
mean_learner <- Lrnr_mean$new()
contin_learner_lib <- Stack$new(
  enet_contin_learner, lasso_contin_learner, fglm_contin_learner, mean_learner
)
sl_contin_learner <- Lrnr_sl$new(learners = contin_learner_lib,
                                 metalearner = Lrnr_nnls$new())

# SL learners used for binary data (nuisance parameters G and E in this case)
enet_binary_learner <- Lrnr_glmnet$new(
  alpha = 0.5, family = "binomial", nfolds = 3
)
lasso_binary_learner <- Lrnr_glmnet$new(
  alpha = 1, family = "binomial", nfolds = 3
)
fglm_binary_learner <- Lrnr_glm_fast$new(family = binomial())
binary_learner_lib <- Stack$new(
  enet_binary_learner, lasso_binary_learner, fglm_binary_learner, mean_learner
)
logistic_metalearner <- make_learner(Lrnr_solnp,
                                     metalearner_logistic_binomial,
                                     loss_loglik_binomial)
sl_binary_learner <- Lrnr_sl$new(learners = binary_learner_lib,
                                 metalearner = logistic_metalearner)

# create list for treatment and outcome mechanism regressions
learner_list <- list(
  Y = sl_contin_learner,
  A = sl_binary_learner
)

The Natural Direct and Indirect Effects

Decomposing the Average Treatment Effect

The natural direct and indirect effects arise from a decomposition of the ATE: \[\begin{equation*} \mathbb{E}[Y(1) - Y(0)] = \underbrace{\mathbb{E}[Y(1, Z(0)) - Y(0, Z(0))]}_{NDE} + \underbrace{\mathbb{E}[Y(1, Z(1)) - Y(1, Z(0))]}_{NIE}. \end{equation*}\] In particular, the natural indirect effect (NIE) measures the effect of the treatment \(A \in \{0, 1\}\) on the outcome \(Y\) through the mediators \(Z\), while the natural direct effect (NDE) measures the effect of the treatment on the outcome through all other paths.

The Natural Direct Effect

The NDE is defined as \[\begin{equation*} \Psi_{NDE}=\mathbb{E}[Y(1, Z(0)) - Y(0, Z(0))] \overset{\text{rand.}}{=} \sum_w \sum_z [\underbrace{\mathbb{E}(Y \mid A = 1, z, w)}_{\bar{Q}_Y(A = 1, z, w)} - \underbrace{\mathbb{E}(Y \mid A = 0, z, w)}_{\bar{Q}_Y(A = 0, z, w)}] \times \underbrace{p(z \mid A = 0, w)}_{Q_Z(z \mid 0, w)} \underbrace{p(w)}_{Q_W}, \end{equation*}\] where the likelihood factors \(p(z \mid A = 0, w)\) and \(p(w)\) (among other conditional densities) arise from a factorization of the joint likelihood: \[\begin{equation*} p(w, a, z, y) = \underbrace{p(y \mid w, a, z)}_{Q_Y(A, W, Z)} \underbrace{p(z \mid w, a)}_{Q_Z(Z \mid A, W)} \underbrace{p(a \mid w)}_{g(A \mid W)} \underbrace{p(w)}_{Q_W}. \end{equation*}\]

The process of estimating the NDE begins by constructing \(\bar{Q}_{Y, n}\), an estimate of the outcome mechanism \(\bar{Q}_Y(Z, A, W) = \mathbb{E}[Y \mid Z, A, W]\) (i.e., the conditional mean of \(Y\), given \(Z\), \(A\), and \(W\)). With an estimate of this conditional expectation in hand, predictions of the counterfactual quantities \(\bar{Q}_Y(Z, 1, W)\) (setting \(A = 1\)) and, likewise, \(\bar{Q}_Y(Z, 0, W)\) (setting \(A = 0\)) can easily be obtained. We denote the difference of these counterfactual quantities \(\bar{Q}_{\text{diff}}\), i.e., \(\bar{Q}_{\text{diff}} = \bar{Q}_Y(Z, 1, W) - \bar{Q}_Y(Z, 0, W)\). \(\bar{Q}_{\text{diff}}\) represents the difference in \(Y\) attributable to changes in \(A\) while keeping \(Z\) and \(W\) at their natural (i.e. observed) values.

The estimation procedure treats \(\bar{Q}_{\text{diff}}\) itself as a nuisance parameter, regressing its estimate \(\bar{Q}_{\text{diff}, n}\) on \(W\), among control observations only (i.e., those for whom \(A = 0\) is observed); the goal of this step is to remove part of the marginal impact of \(Z\) on \(\bar{Q}_{\text{diff}}\), since \(W\) is a parent of \(Z\). Regressing this difference on \(W\) among the controls recovers the expected \(\bar{Q}_{\text{diff}}\), had all individuals been set to the control condition \(A = 0\). Any residual additive effect of \(Z\) on \(\bar{Q}_{\text{diff}}\) is removed during the TML estimation step using the auxiliary (or “clever”) covariate, which accounts for the mediators \(Z\). This auxiliary covariate takes the form \[\begin{equation*} C_Y(Q_Z, g)(O) = \Bigg\{\frac{\mathbb{I}(A = 1)}{g(1 \mid W)} \frac{Q_Z(Z \mid 0, W)}{Q_Z(Z \mid 1, W)} - \frac{\mathbb{I}(A = 0)}{g(0 \mid W)} \Bigg\}. \end{equation*}\] Breaking this down, \(\frac{\mathbb{I}(A = 1)}{g(1 \mid W)}\) is the inverse probability weight for \(A = 1\) and, likewise, \(\frac{\mathbb{I}(A = 0)}{g(0 \mid W)}\) is the inverse probability weight for \(A = 0\). The middle term is the ratio of the mediator density when \(A = 0\) to the mediator density when \(A = 1\).

Thus, it would appear that an estimate of this conditional mediator density is required; unfortunately, tools to estimate such quantities are sparse in the statistics literature, and the problem becomes still more complicated (and computationally taxing) when \(Z\) is high-dimensional. As only the ratio of these conditional densities is required, a convenient re-parameterization via Bayes’ rule may be used instead, that is, \[\begin{equation*} \frac{Q_Z(Z \mid 0, W)}{Q_Z(Z \mid 1, W)} = \frac{p(A = 0 \mid Z, W) g(1 \mid W)}{p(A = 1 \mid Z, W) g(0 \mid W)}. \end{equation*}\] Going forward, we will denote this re-parameterized conditional probability \(e(A \mid Z, W) := p(A \mid Z, W)\). This is particularly useful since the problem reduces to the estimation of conditional means, opening the door to the use of a wide range of machine learning algorithms (e.g., most of those in sl3).
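This re-parameterization can be checked numerically on a toy simulation in which the mediator density is known; here, plain glm fits stand in for the flexible learners used in practice (all variable names below are illustrative):

```r
# toy check: p(z | A = 0, w) / p(z | A = 1, w) agrees with
# [e(0 | z, w) / e(1 | z, w)] x [g(1 | w) / g(0 | w)]
set.seed(71245)
n <- 1e5
W <- rnorm(n)
A <- rbinom(n, 1, plogis(W))
Z <- rnorm(n, mean = A + W)  # known mediator density

# g(1 | w): treatment mechanism given baseline covariates only
g1 <- predict(glm(A ~ W, family = binomial), type = "response")
# e(1 | z, w): treatment mechanism given mediator and covariates
e1 <- predict(glm(A ~ Z + W, family = binomial), type = "response")

ratio_reparam <- ((1 - e1) * g1) / (e1 * (1 - g1))
ratio_true <- dnorm(Z, mean = W) / dnorm(Z, mean = 1 + W)
max(abs(log(ratio_reparam) - log(ratio_true)))  # small, up to estimation error
```

In this Gaussian example the log of the true ratio is \(W - Z + 1/2\), so both logistic regressions happen to be correctly specified and the two ratios agree up to sampling error.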

Under the hood, the counterfactual outcome difference \(\bar{Q}_{\text{diff}}\) and \(e(A \mid Z, W)\), the conditional probability of \(A\) given \(Z\) and \(W\), are used in constructing the auxiliary covariate for TML estimation. These nuisance parameters play an important role in the bias-correcting TMLE update step.

The Natural Indirect Effect

Derivation and estimation of the NIE is analogous to that of the NDE. The NIE is the effect of \(A\) on \(Y\) only through the mediator(s) \(Z\). This quantity – known as the (additive) natural indirect effect, \(\mathbb{E}[Y(1, Z(1))] - \mathbb{E}[Y(1, Z(0))]\) – corresponds to the difference between the conditional expectation of \(Y\) given \(A = 1\) and \(Z(1)\) (the values the mediator would take under \(A = 1\)) and the conditional expectation of \(Y\) given \(A = 1\) and \(Z(0)\) (the values the mediator would take under \(A = 0\)).

As with the NDE, the re-parameterization trick can be used to estimate \(e(A \mid Z, W)\), avoiding estimation of a possibly multivariate conditional density. In this case, however, the NIE is estimated via the mediated mean outcome difference, denoted \(\Psi_{NIE, Z}(Q)\), as follows: \[\begin{equation*} \Psi_{NIE}(Q) = \mathbb{E}_{Q_W}[\Psi_{NIE, Z}(Q)(1, W) - \Psi_{NIE, Z}(Q)(0, W)]. \end{equation*}\]

Phrased plainly, this procedure takes \(\bar{Q}_Y(Z, 1, W)\) (the predicted values of \(Y\) given \(Z\) and \(W\) when \(A = 1\)) and regresses this vector on \(W\) among the treated units (for whom \(A = 1\) is observed) to obtain the conditional mean \(\Psi_{NIE, Z}(Q)(1, W)\). Performing the same regression of \(\bar{Q}_Y(Z, 1, W)\) on \(W\) among the control units (for whom \(A = 0\) is observed) yields \(\Psi_{NIE, Z}(Q)(0, W)\). The mean difference of these two estimates is the NIE and can be thought of as the additive marginal effect of treatment on the conditional expectation of \(Y\) given \(W\), \(A = 1\), and \(Z\), through the effect of \(A\) on \(Z\). So, in the case of the NIE, our estimate \(\psi_n\) is slightly different, but the same quantity \(e(A \mid Z, W)\) appears in the auxiliary covariate.
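The regression steps just described can be illustrated on simulated data, with simple linear fits standing in for the Super Learner that tmle3mediate would use; note this sketches only the uncorrected substitution estimator, without the TMLE targeting step:

```r
# substitution sketch of the NIE: outcome regression, then arm-specific
# regressions of the A = 1 predictions on W
set.seed(34591)
n <- 5000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * W))
Z <- rnorm(n, mean = A + 0.5 * W)       # mediator affected by A and W
Y <- rnorm(n, mean = Z + 0.2 * A + W)   # A affects Y through Z and directly

# step 1: estimate Qbar_Y = E[Y | Z, A, W]; predict with A set to 1
q_fit <- lm(Y ~ Z + A + W)
qbar_y1 <- predict(q_fit, newdata = data.frame(Z = Z, A = 1, W = W))

# steps 2-3: regress the predictions on W within each treatment arm
psi_z_1 <- predict(lm(qbar_y1 ~ W, subset = A == 1),
                   newdata = data.frame(W = W))
psi_z_0 <- predict(lm(qbar_y1 ~ W, subset = A == 0),
                   newdata = data.frame(W = W))

# step 4: average the difference; the true NIE in this simulation is 1
# (the unit shift in Z induced by A, passed through Z's unit effect on Y)
mean(psi_z_1 - psi_z_0)
```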

Estimating the Natural Indirect Effect

We demonstrate calculation of the NIE using tmle3mediate below, starting by instantiating a “Spec” object that encodes exactly which learners to use for the nuisance parameters \(e(A \mid Z, W)\) and \(\Psi_Z\). We then pass our Spec object to the tmle3 function, alongside the data, the node list (created above), and a learner list indicating which machine learning algorithms to use for estimating the nuisance parameters based on \(A\) and \(Y\).

tmle_spec_NIE <- tmle_NIE(
  e_learners = Lrnr_cv$new(lasso_binary_learner, full_fit = TRUE),
  psi_Z_learners = Lrnr_cv$new(lasso_contin_learner, full_fit = TRUE),
  max_iter = 1
)
weight_behavior_NIE <- tmle3(
  tmle_spec_NIE, weight_behavior_complete, node_list, learner_list
)
weight_behavior_NIE
## A tmle3_Fit that took 1 step(s)
##    type                  param init_est  tmle_est        se       lower
## 1:  NIE NIE[Y_{A=1} - Y_{A=0}] 1.020857 0.9976898 0.5242565 -0.02983395
##       upper psi_transformed lower_transformed upper_transformed
## 1: 2.025214       0.9976898       -0.02983395          2.025214

Estimating the Natural Direct Effect

An analogous procedure applies for estimation of the NDE, only replacing the Spec object for the NIE with tmle_spec_NDE to define learners for the NDE nuisance parameters:

tmle_spec_NDE <- tmle_NDE(
  e_learners = Lrnr_cv$new(lasso_binary_learner, full_fit = TRUE),
  psi_Z_learners = Lrnr_cv$new(lasso_contin_learner, full_fit = TRUE),
  max_iter = 1
)
weight_behavior_NDE <- tmle3(
  tmle_spec_NDE, weight_behavior_complete, node_list, learner_list
)
weight_behavior_NDE
## A tmle3_Fit that took 1 step(s)
##    type                  param init_est tmle_est        se     lower    upper
## 1:  NDE NDE[Y_{A=1} - Y_{A=0}] 0.026284 0.026284 0.7494207 -1.442554 1.495122
##    psi_transformed lower_transformed upper_transformed
## 1:        0.026284         -1.442554          1.495122
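Because the NDE and NIE decompose the ATE, the two point estimates above can be summed to recover an estimate of the total effect (a proper standard error for the sum would require the joint influence function, so only the point estimate is sketched here):

```r
# the ATE decomposes as NDE + NIE; point estimates from the summaries above
nie_est <- 0.9976898
nde_est <- 0.0262840
ate_est <- nde_est + nie_est
round(ate_est, 3)  # 1.024
```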

The Population Intervention Direct and Indirect Effects

At times, the natural direct and indirect effects may prove too limiting, as these effect definitions are based on static interventions (i.e., setting \(A = 0\) or \(A = 1\)), which may be unrealistic for real-world interventions. In such cases, one may turn instead to the population intervention direct effect (PIDE) and the population intervention indirect effect (PIIE), which are based on a decomposition of the population intervention effect (PIE) of flexible stochastic interventions (Dı́az and Hejazi 2020).

A particular type of stochastic intervention well-suited to binary treatments is the incremental propensity score intervention (IPSI), first proposed by Kennedy (2019). Such interventions do not deterministically set the treatment level of an observed unit to a fixed quantity (e.g., setting \(A = 1\)), but instead alter the odds of receiving the treatment by a fixed factor \(\delta\) (with \(0 < \delta < \infty\)) for each individual. In the case of the mma dataset, we consider an IPSI that doubles the odds (i.e., \(\delta = 2\)) of participating in a sports team for each individual. Such an intervention may be interpreted (hypothetically) as the effect of a school program that motivates children to join sports teams (e.g., as in an encouragement design):

delta_ipsi <- 2
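Concretely, an IPSI with multiplier \(\delta\) maps each propensity score \(g(1 \mid w)\) to \(\delta g / (\delta g + 1 - g)\), so that the post-intervention odds of treatment are exactly \(\delta\) times the original odds; a quick sketch with illustrative propensity values:

```r
# the IPSI multiplies each unit's odds of treatment by delta
g <- c(0.1, 0.25, 0.5, 0.9)   # example propensity scores
g_shifted <- (delta_ipsi * g) / (delta_ipsi * g + (1 - g))
round(g_shifted, 3)  # 0.182 0.400 0.667 0.947
```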

Decomposing the Population Intervention Effect

We may decompose the population intervention effect (PIE) in terms of the population intervention direct effect (PIDE) and the population intervention indirect effect (PIIE): \[\begin{equation*} \mathbb{E}\{Y(A_\delta)\} - \mathbb{E}Y = \overbrace{\mathbb{E}\{Y(A_\delta, Z(A_\delta)) - Y(A_\delta, Z)\}}^{\text{PIIE}} + \overbrace{\mathbb{E}\{Y(A_\delta, Z) - Y(A, Z)\}}^{\text{PIDE}}. \end{equation*}\]

This decomposition of the PIE as the sum of the population intervention direct and indirect effects has an interpretation analogous to that of the standard decomposition of the average treatment effect. In the sequel, we will compute each component of the direct and indirect effects above using an appropriate estimator, as follows:

  • For \(\mathbb{E}\{Y(A, Z)\}\), the sample mean \(\frac{1}{n}\sum_{i=1}^n Y_i\) is consistent;
  • for \(\mathbb{E}\{Y(A_{\delta}, Z)\}\), a TML estimator for the effect of a joint intervention altering the treatment mechanism but not the mediation mechanism, based on the proposal in Dı́az and Hejazi (2020); and,
  • for \(\mathbb{E}\{Y(A_{\delta}, Z_{A_{\delta}})\}\), an efficient estimator for the effect of a joint intervention altering both the treatment and mediation mechanisms, as proposed in Kennedy (2019) and implemented in the npcausal R package.

Estimating the Effect Decomposition Term

As described by Dı́az and Hejazi (2020), the statistical functional identifying the decomposition term that appears in both the PIDE and the PIIE, \(\mathbb{E}\{Y(A_{\delta}, Z)\}\), which corresponds to altering the treatment mechanism while keeping the mediation mechanism fixed, is \[\begin{equation*} \theta_0(\delta) = \int m_0(a, z, w) g_{0,\delta}(a \mid w) p_0(z, w) d\nu(a, z, w), \end{equation*}\] for which a TML estimator is available. The corresponding efficient influence function (EIF) with respect to the nonparametric model \(\mathcal{M}\) is \(D_{\eta,\delta}(o) = D^Y_{\eta,\delta}(o) + D^A_{\eta,\delta}(o) + D^{Z,W}_{\eta,\delta}(o) - \theta(\delta)\).

The TML estimator may be computed based on the EIF estimating equation and may incorporate cross-validation (Zheng and van der Laan 2011; Chernozhukov et al. 2018) to circumvent possibly restrictive entropy conditions (e.g., Donsker class). The resultant estimator is \[\begin{equation*} \hat{\theta}(\delta) = \frac{1}{n} \sum_{i = 1}^n D_{\hat{\eta}_{j(i)}, \delta}(O_i) = \frac{1}{n} \sum_{i = 1}^n \left\{ D^Y_{\hat{\eta}_{j(i)}, \delta}(O_i) + D^A_{\hat{\eta}_{j(i)}, \delta}(O_i) + D^{Z,W}_{\hat{\eta}_{j(i)}, \delta}(O_i) \right\}, \end{equation*}\] which is implemented in tmle3mediate (a one-step estimator is also available, in the medshift R package). We demonstrate the use of tmle3mediate to obtain \(\mathbb{E}\{Y(A_{\delta}, Z)\}\) via its TML estimator:

# instantiate tmle3 spec for stochastic mediation
tmle_spec_pie_decomp <- tmle_medshift(
  delta = delta_ipsi,
  e_learners = Lrnr_cv$new(lasso_binary_learner, full_fit = TRUE),
  phi_learners = Lrnr_cv$new(lasso_contin_learner, full_fit = TRUE)
)

# compute the TML estimate
weight_behavior_pie_decomp <- tmle3(
  tmle_spec_pie_decomp, weight_behavior_complete, node_list, learner_list
)
weight_behavior_pie_decomp
## A tmle3_Fit that took 2144 step(s)
##    type         param init_est tmle_est        se    lower    upper
## 1: PIDE E[Y_{A=NULL}] 19.16216 19.14584 0.1881337 18.77711 19.51458
##    psi_transformed lower_transformed upper_transformed
## 1:        19.14584          18.77711          19.51458

Estimating the Population Intervention Direct Effect

Recall that, based on the decomposition outlined previously, the population intervention direct effect may be denoted \(\beta_{\text{PIDE}}(\delta) = \theta_0(\delta) - \mathbb{E}Y\). Thus, an estimator of the PIDE, \(\hat{\beta}_{\text{PIDE}}(\delta)\) may be expressed as a composition of estimators of its constituent parameters: \[\begin{equation*} \hat{\beta}_{\text{PIDE}}({\delta}) = \hat{\theta}(\delta) - \frac{1}{n} \sum_{i = 1}^n Y_i. \end{equation*}\]

Based on the above, we may construct an estimator of the PIDE using quantities already computed. To do this, we need only apply the delta method, available from the tmle3 package.
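As a sketch, the point estimate can be assembled from quantities already computed; this assumes the fitted object exposes its summary table (with column tmle_est, as in the printouts above), and a complete analysis would use the delta method in tmle3 to obtain a proper standard error:

```r
# PIDE point estimate: theta_hat(delta) minus the empirical mean of Y
theta_hat <- weight_behavior_pie_decomp$summary$tmle_est
ey_hat <- mean(weight_behavior_complete$bmi)
pide_est <- theta_hat - ey_hat
pide_est
```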

References

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1). https://doi.org/10.1111/ectj.12097.

Coyle, Jeremy R, Nima S Hejazi, Ivana Malenica, and Oleg Sofrygin. 2020. sl3: Modern Pipelines for Machine Learning and Super Learning. https://github.com/tlverse/sl3. https://doi.org/10.5281/zenodo.1342293.

Dı́az, Iván, and Nima S Hejazi. 2020. “Causal Mediation Analysis for Stochastic Interventions.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (3): 661–83. https://doi.org/10.1111/rssb.12362.

Kennedy, Edward H. 2019. “Nonparametric Causal Effects Based on Incremental Propensity Score Interventions.” Journal of the American Statistical Association 114 (526): 645–56.

Pearl, Judea. 2001. “Direct and Indirect Effects.” In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 411–20. Morgan Kaufmann.

Robins, James M, and Sander Greenland. 1992. “Identifiability and Exchangeability for Direct and Indirect Effects.” Epidemiology, 143–55.

Tchetgen Tchetgen, Eric J, and Ilya Shpitser. 2012. “Semiparametric Theory for Causal Mediation Analysis: Efficiency Bounds, Multiple Robustness, and Sensitivity Analysis.” Annals of Statistics 40 (3): 1816–45. https://doi.org/10.1214/12-AOS990.

VanderWeele, Tyler. 2015. Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press.

Zheng, Wenjing, and Mark J van der Laan. 2011. “Cross-Validated Targeted Minimum-Loss-Based Estimation.” In Targeted Learning, 459–74. Springer.

———. 2012. “Targeted Maximum Likelihood Estimation of Natural Direct Effects.” International Journal of Biostatistics 8 (1). https://doi.org/10.2202/1557-4679.1361.