Chapter 3 Ensemble Machine Learning
Rachael Phillips
Based on the `sl3` R package by Jeremy Coyle, Nima Hejazi, Ivana Malenica, and Oleg Sofrygin.
Updated: 2019-05-22
3.1 Learning Objectives
By the end of this lesson you will be able to:
- Assemble an ensemble of learners based on the properties that identify what features they support.
- Customize learner hyperparameters to incorporate a diversity of different settings.
- Select a subset of available covariates and pass only those variables to the modeling algorithm.
- Fit an ensemble with nested cross-validation to obtain an estimate of the performance of the ensemble itself.
- Calculate `sl3` variable importance metrics.
- Interpret the discrete and continuous super learner fits.
- Rationalize the need to remove bias from the super learner to make an optimal bias-variance tradeoff for the parameter of interest.
3.2 Introduction
Now that we have defined the statistical estimation problem in The Targeted Learning Roadmap, we are ready to construct the TMLE, an asymptotically efficient substitution estimator of this target quantity.
The first step in this estimation procedure is an initial estimate of the data-generating distribution, or the relevant part of this distribution that is needed to evaluate the target parameter. For this initial estimation, we use the super learner (Van der Laan, Polley, and Hubbard 2007), an important step for creating a robust estimator.
Super Learner
Loss-function-based tool that uses V-fold cross-validation to obtain the best prediction of the relevant part of the likelihood needed to evaluate the target parameter.
Requires expressing the estimand as the minimizer of an expected loss, and proposing a library of algorithms (“learners” in `sl3` nomenclature) that we think might be consistent with the true data-generating distribution.
Proven to be asymptotically as accurate as the best possible prediction algorithm that is tested (van der Laan and Dudoit 2003; Van der Vaart, Dudoit, and van der Laan 2006).
The discrete super learner, or cross-validated selector, is the algorithm in the library that minimizes the V-fold cross-validated empirical risk.
The continuous super learner is a weighted average of the library of algorithms, where the weights are chosen to minimize the V-fold cross-validated empirical risk of the library. Restricting the weights (chosen by the “metalearner” in `sl3` nomenclature) to be non-negative and sum to one (a convex combination) has been shown to improve upon the discrete super learner (Polley and van der Laan 2010; Van der Laan, Polley, and Hubbard 2007).
This background material is described in greater detail in the accompanying `tlverse` handbook `sl3` chapter.
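To make these definitions concrete, here is the idea in symbols (generic notation, sketching the procedure rather than `sl3`'s internals). With \(V\) folds, candidate learners \(\hat{\Psi}_1, \ldots, \hat{\Psi}_K\), and loss function \(L\), the cross-validated risk of learner \(k\) is
\[
\hat{R}_{CV}(\hat{\Psi}_k) = \frac{1}{V} \sum_{v=1}^{V} \frac{1}{n_v} \sum_{i \in \text{Val}(v)} L\big(\hat{\Psi}_k(P_{n, \text{Train}(v)})\big)(O_i),
\]
where \(\hat{\Psi}_k(P_{n, \text{Train}(v)})\) denotes learner \(k\) fit on the training folds and the inner sum runs over the \(n_v\) validation-fold observations. The discrete super learner returns \(\hat{\Psi}_{\hat{k}}\) with \(\hat{k} = \arg\min_k \hat{R}_{CV}(\hat{\Psi}_k)\), while the continuous super learner returns \(\sum_k \alpha_k \hat{\Psi}_k\), with weights \(\alpha_k \geq 0\) and \(\sum_k \alpha_k = 1\) chosen to minimize the cross-validated risk of the combination.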
3.3 Basic sl3 Implementation
We begin by illustrating the core functionality of the super learner algorithm as implemented in `sl3`.
The `sl3` implementation consists of the following steps:
- Load the necessary libraries and data
- Define the machine learning task
- Make a super learner by creating a library of base learners and a metalearner
- Train the super learner on the machine learning task
- Obtain predicted values
WASH Benefits Study Example
Using the WASH data, we are interested in predicting the weight-for-height z-score `whz` using the available covariate data.
0. Load the necessary libraries and data
library(kableExtra)
library(knitr)
library(skimr)
library(tidyverse)
library(data.table)
library(sl3)
library(SuperLearner)
library(origami)
set.seed(7194)
# load data set and take a peek
washb_data <- fread("https://raw.githubusercontent.com/tlverse/tlverse-data/master/wash-benefits/washb_data.csv",
stringsAsFactors = TRUE)
head(washb_data) %>%
kable(digits = 4) %>%
kable_styling(fixed_thead = T, font_size = 10) %>%
scroll_box(width = "100%", height = "250px")
whz | tr | fracode | month | aged | sex | momage | momedu | momheight | hfiacat | Nlt18 | Ncomp | watmin | elec | floor | walls | roof | asset_wardrobe | asset_table | asset_chair | asset_khat | asset_chouki | asset_tv | asset_refrig | asset_bike | asset_moto | asset_sewmach | asset_mobile |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.00 | Control | N05265 | 9 | 268 | male | 30 | Primary (1-5y) | 146.40 | Food Secure | 3 | 11 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
-1.16 | Control | N05265 | 9 | 286 | male | 25 | Primary (1-5y) | 148.75 | Moderately Food Insecure | 2 | 4 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
-1.05 | Control | N08002 | 9 | 264 | male | 25 | Primary (1-5y) | 152.15 | Food Secure | 1 | 10 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
-1.26 | Control | N08002 | 9 | 252 | female | 28 | Primary (1-5y) | 140.25 | Food Secure | 3 | 5 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
-0.59 | Control | N06531 | 9 | 336 | female | 19 | Secondary (>5y) | 150.95 | Food Secure | 2 | 7 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
-0.51 | Control | N06531 | 9 | 304 | male | 20 | Secondary (>5y) | 154.20 | Severely Food Insecure | 0 | 3 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
1. Define the machine learning task
To define the machine learning “task” (predict the weight-for-height z-score `whz` using the available covariate data), we need to create an `sl3_Task` object.
The `sl3_Task` keeps track of the roles the variables play in the machine learning problem, the data, and any metadata (e.g., observation-level weights, id, offset).
# specify the outcome and covariates
outcome <- "whz"
covars <- colnames(washb_data)[-which(names(washb_data) == outcome)]
# create the sl3 task
washb_task <- make_sl3_Task(
data = washb_data,
covariates = covars,
outcome = outcome
)
Warning in .subset2(public_bind_env, "initialize")(...): Missing Covariate Data
Found. Imputing covariates using sl3_process_missing
# examine the task
washb_task
A sl3 Task with 4695 obs and these nodes:
$covariates
[1] "tr" "fracode" "month" "aged"
[5] "sex" "momage" "momedu" "momheight"
[9] "hfiacat" "Nlt18" "Ncomp" "watmin"
[13] "elec" "floor" "walls" "roof"
[17] "asset_wardrobe" "asset_table" "asset_chair" "asset_khat"
[21] "asset_chouki" "asset_tv" "asset_refrig" "asset_bike"
[25] "asset_moto" "asset_sewmach" "asset_mobile" "delta_momage"
[29] "delta_momheight"
$outcome
[1] "whz"
$id
NULL
$weights
NULL
$offset
NULL
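The `id`, `weights`, and `offset` nodes above are `NULL` because we did not declare any metadata. As a minimal sketch of that part of the interface (not needed for this example; the weight column below is invented purely for illustration), observation-level weights could be declared when building the task, and a cluster `id` column could be declared analogously:
# purely illustrative: washb_data has no weight column, so we create a trivial one
washb_data_wtd <- copy(washb_data)
washb_data_wtd[, wt := 1]
washb_task_wtd <- make_sl3_Task(
  data = washb_data_wtd,
  covariates = covars,
  outcome = outcome,
  weights = "wt"
)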
2. Make a super learner
Now that we have defined our machine learning problem with the task, we are ready to “make” the super learner. This requires specification of
- Base learning algorithms, to establish a library of learners that we think might be consistent with the true data-generating distribution.
- Metalearner, to ensemble the base learners.
We might also incorporate
- Feature selection, to pass only a subset of the predictors to the algorithm.
- Hyperparameter specification, to tune base learners.
Learners have properties that indicate what features they support. We may use `sl3_list_properties()` to get a list of all properties supported by at least one learner.
sl3_list_properties()
[1] "binomial" "categorical" "continuous"
[4] "cv" "density" "ids"
[7] "multivariate_outcome" "offset" "preprocessing"
[10] "timeseries" "weights" "wrapper"
Since we have a continuous outcome, we may identify the learners that support this outcome type with `sl3_list_learners()`.
sl3_list_learners(c("continuous"))
[1] "Lrnr_arima" "Lrnr_bartMachine"
[3] "Lrnr_bilstm" "Lrnr_condensier"
[5] "Lrnr_dbarts" "Lrnr_expSmooth"
[7] "Lrnr_glm" "Lrnr_glm_fast"
[9] "Lrnr_glmnet" "Lrnr_grf"
[11] "Lrnr_h2o_glm" "Lrnr_h2o_grid"
[13] "Lrnr_hal9001" "Lrnr_HarmonicReg"
[15] "Lrnr_lstm" "Lrnr_mean"
[17] "Lrnr_nnls" "Lrnr_optim"
[19] "Lrnr_pkg_SuperLearner" "Lrnr_pkg_SuperLearner_method"
[21] "Lrnr_pkg_SuperLearner_screener" "Lrnr_randomForest"
[23] "Lrnr_ranger" "Lrnr_rpart"
[25] "Lrnr_rugarch" "Lrnr_solnp"
[27] "Lrnr_stratified" "Lrnr_svm"
[29] "Lrnr_tsDyn" "Lrnr_xgboost"
Now that we have an idea of some learners, we can construct them using the `make_learner` function.
# choose base learners
lrnr_glm <- make_learner(Lrnr_glm)
lrnr_mean <- make_learner(Lrnr_mean)
lrnr_glmnet <- make_learner(Lrnr_glmnet)
We can customize learner hyperparameters to incorporate a diversity of different settings. Documentation for the learners and their hyperparameters can be found in the `sl3` Learners Reference.
We can also include learners from the `SuperLearner` R package.
lrnr_ranger100 <- make_learner(Lrnr_ranger, num.trees = 100)
lrnr_hal_simple <- make_learner(Lrnr_hal9001, degrees = 1, n_folds = 2)
lrnr_gam <- Lrnr_pkg_SuperLearner$new("SL.gam")
lrnr_bayesglm <- Lrnr_pkg_SuperLearner$new("SL.bayesglm")
In order to assemble the library of learners, we need to “stack” them together.
A `Stack` is a special learner with the same interface as all other learners. What makes a stack special is that it combines multiple learners by training them simultaneously, so that their predictions can be either combined or compared.
stack <- make_learner(
Stack,
lrnr_glm, lrnr_mean, lrnr_ranger100, lrnr_glmnet,
lrnr_gam, lrnr_bayesglm
)
We will fit a non-negative least squares metalearner using `Lrnr_nnls`. Note that any learner can be used as a metalearner.
metalearner <- make_learner(Lrnr_nnls)
We can optionally select a subset of available covariates and pass only those variables to the modeling algorithm.
Let’s consider screening covariates based on their correlation with our outcome of interest (`cor.test` p-value \(\leq 0.1\)).
screen_cor <- Lrnr_pkg_SuperLearner_screener$new("screen.corP")
# which covariates are selected on the full data?
screen_cor$train(washb_task)
[1] "Lrnr_pkg_SuperLearner_screener_screen.corP"
$selected
[1] "tr" "fracode" "aged" "momage"
[5] "momedu" "momheight" "hfiacat" "Nlt18"
[9] "elec" "floor" "walls" "asset_wardrobe"
[13] "asset_table" "asset_chair" "asset_khat" "asset_chouki"
[17] "asset_tv" "asset_refrig" "asset_moto" "asset_sewmach"
[21] "asset_mobile"
To “pipe” only the selected covariates to the modeling algorithm, we need to make a `Pipeline`, which is just a set of learners to be fit sequentially, where the fit from one learner is used to define the task for the next learner.
cor_pipeline <- make_learner(Pipeline, screen_cor, stack)
Now our learners will be preceded by a screening step.
We also consider the original `stack`, just to compare how the methods with feature selection perform relative to the methods without it.
Analogous to what we have seen before, we have to stack the pipeline and the original `stack` together, so we may use them as base learners in our super learner.
fancy_stack <- make_learner(Stack, cor_pipeline, stack)
# we can visualize the stack
dt_stack <- delayed_learner_train(fancy_stack, washb_task)
plot(dt_stack, color = FALSE, height = "400px", width = "100%")
We have made a library/stack of base learners and a metalearner, so we are ready to make the super learner. The super learner algorithm fits a metalearner on the validation-set predictions.
sl <- make_learner(Lrnr_sl,
learners = fancy_stack,
metalearner = metalearner
)
# we can visualize the super learner
dt_sl <- delayed_learner_train(sl, washb_task)
plot(dt_sl, color = FALSE, height = "400px", width = "100%")
3. Train the super learner on the machine learning task
Now we are ready to “train” our super learner on our `sl3_Task` object, `washb_task`.
sl_fit <- sl$train(washb_task)
4. Obtain predicted values
Now that we have fit the super learner, we are ready to obtain our predicted values, and we can also obtain a summary of the results.
sl_preds <- sl_fit$predict()
head(sl_preds)
[1] -0.5139505 -0.8953067 -0.7283936 -0.7681312 -0.6325866 -0.7474441
sl_fit$print()
[1] "SuperLearner:"
List of 2
$ : chr "Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)"
$ : chr "Stack"
[1] "Lrnr_nnls"
lrnrs
1: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glm_TRUE
2: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_mean
3: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_ranger_100_TRUE_1
4: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glmnet_NULL_deviance_10_1_100_TRUE
5: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_pkg_SuperLearner_SL.gam
6: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_pkg_SuperLearner_SL.bayesglm
7: Stack_Lrnr_glm_TRUE
8: Stack_Lrnr_mean
9: Stack_Lrnr_ranger_100_TRUE_1
10: Stack_Lrnr_glmnet_NULL_deviance_10_1_100_TRUE
11: Stack_Lrnr_pkg_SuperLearner_SL.gam
12: Stack_Lrnr_pkg_SuperLearner_SL.bayesglm
weights
1: 0.00000000
2: 0.02094039
3: 0.27896845
4: 0.00000000
5: 0.09953083
6: 0.00000000
7: 0.00000000
8: 0.00000000
9: 0.15706577
10: 0.11337860
11: 0.33096117
12: 0.00000000
[1] "Cross-validated risk (MSE, squared error loss):"
learner
1: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glm_TRUE
2: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_mean
3: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_ranger_100_TRUE_1
4: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glmnet_NULL_deviance_10_1_100_TRUE
5: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_pkg_SuperLearner_SL.gam
6: Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_pkg_SuperLearner_SL.bayesglm
7: Stack_Lrnr_glm_TRUE
8: Stack_Lrnr_mean
9: Stack_Lrnr_ranger_100_TRUE_1
10: Stack_Lrnr_glmnet_NULL_deviance_10_1_100_TRUE
11: Stack_Lrnr_pkg_SuperLearner_SL.gam
12: Stack_Lrnr_pkg_SuperLearner_SL.bayesglm
13: SuperLearner
coefficients mean_risk SE_risk fold_SD fold_min_risk fold_max_risk
1: NA 1.015128 0.02363317 0.07629401 0.8927540 1.131594
2: NA 1.065282 0.02502664 0.09191791 0.9264292 1.196647
3: NA 1.017992 0.02355587 0.08078127 0.8628044 1.147006
4: NA 1.012581 0.02359430 0.07887722 0.8821712 1.130815
5: NA 1.011497 0.02357149 0.07449866 0.8919503 1.132290
6: NA 1.015119 0.02363328 0.07631510 0.8926608 1.131570
7: NA 1.018612 0.02380402 0.07799191 0.8956048 1.134940
8: NA 1.065282 0.02502664 0.09191791 0.9264292 1.196647
9: NA 1.018297 0.02340946 0.08833583 0.8740308 1.170782
10: NA 1.012294 0.02359055 0.07927258 0.8826979 1.130114
11: NA 1.012122 0.02358982 0.07486427 0.8981537 1.135950
12: NA 1.018596 0.02380414 0.07801948 0.8954820 1.134909
13: NA 1.004581 0.02337043 0.07938602 0.8715618 1.134431
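The cross-validated risk table that `print()` displays can also be extracted for programmatic use via the fit's `cv_risk` method (an assumption about the installed `sl3` version providing this method); for example, to identify the discrete super learner, i.e., the candidate with the smallest cross-validated mean risk:
# pull the CV risk table as a data.table (method availability depends on sl3 version)
cv_risk_table <- sl_fit$cv_risk(loss_squared_error)
# the discrete super learner is the candidate learner with the lowest mean CV risk
cv_risk_table[learner != "SuperLearner"][which.min(mean_risk), learner]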
3.4 Extensions
3.4.1 Cross-validated Super Learner
We can cross-validate the super learner to see how well the super learner performs on unseen data, and obtain an estimate of the cross-validated risk of the super learner.
This estimation procedure requires an “external” layer of cross-validation, also called nested cross-validation, which involves setting aside a separate holdout sample that we don’t use to fit the super learner. This external cross-validation procedure may also use 10 folds, which is the default in `sl3`; however, we will use 2 outer/external folds of cross-validation for computational efficiency.
We also need to specify a loss function to evaluate the super learner. Documentation for the available loss functions can be found in the `sl3` Loss Function Reference.
washb_task_new <- make_sl3_Task(
data = washb_data,
covariates = covars,
outcome = outcome,
folds = make_folds(washb_data, fold_fun = folds_vfold, V = 2)
)
Warning in .subset2(public_bind_env, "initialize")(...): Missing Covariate Data
Found. Imputing covariates using sl3_process_missing
CVsl <- CV_lrnr_sl(sl_fit, washb_task_new, loss_squared_error)
CVsl %>%
kable(digits = 4) %>%
kable_styling(fixed_thead = T, font_size = 10) %>%
scroll_box(width = "100%", height = "250px")
learner | coefficients | mean_risk | SE_risk | fold_SD | fold_min_risk | fold_max_risk |
---|---|---|---|---|---|---|
Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glm_TRUE | NA | 1.0153 | 0.0235 | 0.0133 | 1.0060 | 1.0247 |
Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_mean | NA | 1.0652 | 0.0250 | 0.0219 | 1.0497 | 1.0807 |
Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_ranger_100_TRUE_1 | NA | 1.0204 | 0.0235 | 0.0061 | 1.0161 | 1.0247 |
Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glmnet_NULL_deviance_10_1_100_TRUE | NA | 1.0126 | 0.0235 | 0.0127 | 1.0036 | 1.0217 |
Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_pkg_SuperLearner_SL.gam | NA | 1.0124 | 0.0235 | 0.0109 | 1.0047 | 1.0201 |
Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_pkg_SuperLearner_SL.bayesglm | NA | 1.0153 | 0.0235 | 0.0133 | 1.0059 | 1.0247 |
Stack_Lrnr_glm_TRUE | NA | 1.0244 | 0.0248 | 0.0250 | 1.0067 | 1.0421 |
Stack_Lrnr_mean | NA | 1.0652 | 0.0250 | 0.0219 | 1.0497 | 1.0807 |
Stack_Lrnr_ranger_100_TRUE_1 | NA | 1.0199 | 0.0236 | 0.0107 | 1.0124 | 1.0275 |
Stack_Lrnr_glmnet_NULL_deviance_10_1_100_TRUE | NA | 1.0140 | 0.0236 | 0.0123 | 1.0053 | 1.0227 |
Stack_Lrnr_pkg_SuperLearner_SL.gam | NA | 1.0245 | 0.0272 | 0.0349 | 0.9998 | 1.0491 |
Stack_Lrnr_pkg_SuperLearner_SL.bayesglm | NA | 1.0244 | 0.0249 | 0.0251 | 1.0067 | 1.0421 |
SuperLearner | NA | 1.0085 | 0.0234 | 0.0141 | 0.9985 | 1.0184 |
3.4.2 Variable Importance Measures with sl3
The `sl3` `varimp` function returns a table with variables listed in decreasing order of importance, where the measure of importance is a risk difference between the learner fit with a permuted covariate and the learner fit with the true covariate, computed across all covariates.
In this manner, the larger the risk difference, the more important the variable is in the prediction.
washb_varimp <- varimp(sl_fit, loss_squared_error)
washb_varimp %>%
kable(digits = 4) %>%
kable_styling(fixed_thead = T, font_size = 10) %>%
scroll_box(width = "100%", height = "250px")
X | risk_diff |
---|---|
aged | 0.0379 |
month | 0.0088 |
momedu | 0.0081 |
tr | 0.0058 |
asset_chair | 0.0038 |
fracode | 0.0034 |
asset_refrig | 0.0033 |
momheight | 0.0030 |
Nlt18 | 0.0027 |
elec | 0.0024 |
asset_chouki | 0.0022 |
floor | 0.0020 |
asset_moto | 0.0018 |
asset_wardrobe | 0.0017 |
hfiacat | 0.0015 |
asset_khat | 0.0007 |
asset_mobile | 0.0005 |
momage | 0.0004 |
sex | 0.0003 |
asset_bike | 0.0000 |
asset_sewmach | 0.0000 |
delta_momage | -0.0001 |
walls | -0.0002 |
roof | -0.0003 |
delta_momheight | -0.0003 |
asset_table | -0.0005 |
watmin | -0.0006 |
Ncomp | -0.0006 |
asset_tv | -0.0010 |
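To make the permutation mechanism concrete, here is a minimal sketch for a single covariate (`momheight`). It illustrates the idea only, not the exact computation `varimp()` performs, and it assumes the (prediction, observed) argument order of `sl3`'s loss functions:
# shuffle one covariate to break its association with the outcome
washb_data_perm <- copy(washb_data)
washb_data_perm[, momheight := sample(momheight)]
washb_task_perm <- make_sl3_Task(
  data = washb_data_perm,
  covariates = covars,
  outcome = outcome
)
# compare the super learner's risk with the true vs. permuted covariate
risk_true <- mean(loss_squared_error(sl_fit$predict(washb_task), washb_data$whz))
risk_perm <- mean(loss_squared_error(sl_fit$predict(washb_task_perm), washb_data$whz))
risk_perm - risk_true # a larger difference suggests a more important covariate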
3.5 Exercise
3.5.1 Predicting Myocardial Infarction with sl3
Follow the steps below to predict myocardial infarction (`mi`) using the available covariate data. Thanks to Professor David Benkeser at Emory University for making this Cardiovascular Health Study (CHS) data accessible.
Work with a buddy/team. You have 20 minutes.
In the etherpad, submit your group’s answers to the following questions.
- Which learner was the discrete super learner? What was the cross-validated mean risk of the discrete super learner?
- What was the cross-validated risk of the continuous super learner?
- Did your group face any challenges?
- Any additional comments/questions about this `sl3` section of the workshop?
# load the data set
db_data <-
url("https://raw.githubusercontent.com/benkeser/sllecture/master/chspred.csv")
chspred <- read_csv(file = db_data, col_names = TRUE)
# take a quick peek
head(chspred) %>%
kable(digits = 4) %>%
kable_styling(fixed_thead = T, font_size = 10) %>%
scroll_box(width = "100%", height = "200px")
waist | alcoh | hdl | beta | smoke | ace | ldl | bmi | aspirin | gend | age | estrgn | glu | ins | cysgfr | dm | fetuina | whr | hsed | race | logcystat | logtrig | logcrp | logcre | health | logkcal | sysbp | mi |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
110.1642 | 0.0000 | 66.4974 | 0 | 0 | 1 | 114.2162 | 27.9975 | 0 | 0 | 73.5179 | 0 | 159.9314 | 70.3343 | 75.0078 | 1 | 0.1752 | 1.1690 | 1 | 1 | -0.3420 | 5.4063 | 2.0126 | -0.6739 | 0 | 4.3926 | 177.1345 | 0 |
89.9763 | 0.0000 | 50.0652 | 0 | 0 | 0 | 103.7766 | 20.8931 | 0 | 0 | 61.7723 | 0 | 153.3888 | 33.9695 | 82.7433 | 1 | 0.5717 | 0.9011 | 0 | 0 | -0.0847 | 4.8592 | 3.2933 | -0.5551 | 1 | 6.2071 | 136.3742 | 0 |
106.1941 | 8.4174 | 40.5059 | 0 | 0 | 0 | 165.7158 | 28.4554 | 1 | 1 | 72.9312 | 0 | 121.7145 | -17.3017 | 74.6989 | 0 | 0.3517 | 1.1797 | 0 | 1 | -0.4451 | 4.5088 | 0.3013 | -0.0115 | 0 | 6.7320 | 135.1993 | 0 |
90.0566 | 0.0000 | 36.1750 | 0 | 0 | 0 | 45.2035 | 23.9608 | 0 | 0 | 79.1191 | 0 | 53.9691 | 11.7315 | 95.7823 | 0 | 0.5439 | 1.1360 | 0 | 0 | -0.4807 | 5.1832 | 3.0243 | -0.5751 | 1 | 7.3972 | 139.0182 | 0 |
78.6143 | 2.9790 | 71.0642 | 0 | 1 | 0 | 131.3121 | 10.9656 | 0 | 1 | 69.0179 | 0 | 94.3153 | 9.7112 | 72.7109 | 0 | 0.4916 | 1.1028 | 1 | 0 | 0.3121 | 4.2190 | -0.7057 | 0.0053 | 1 | 8.2779 | 88.0470 | 0 |
91.6593 | 0.0000 | 59.4963 | 0 | 0 | 0 | 171.1872 | 29.1317 | 0 | 1 | 81.8346 | 0 | 212.9066 | -28.2269 | 69.2184 | 1 | 0.4621 | 0.9529 | 1 | 0 | -0.2872 | 5.1773 | 0.9705 | 0.2127 | 1 | 5.9942 | 69.5943 | 0 |
- Create an `sl3` task, setting myocardial infarction `mi` as the outcome and using all available covariate data.
- Make a library of seven relatively fast base learning algorithms (i.e., do not consider BART or HAL). Customize hyperparameters for one of your learners. Feel free to use learners from `sl3` or `SuperLearner`. You may use the same base learning library that is presented above.
- Incorporate feature selection with the `SuperLearner` screener `screen.corP`.
- Fit the metalearning step with non-negative least squares, `Lrnr_nnls`.
- With the metalearner and base learners, make the super learner and train it on the task.
- Print your super learner fit by calling `print()` with `$`.
- Cross-validate your super learner fit to see how well it performs on unseen data. Specify `loss_squared_error` as the loss function to evaluate the super learner. Like above, create a new task with 2 folds of external cross-validation for computational efficiency. A minimal skeleton sketch follows this list if you need a starting point.
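A minimal skeleton for organizing these steps (a sketch, not the only solution; it reuses the `fancy_stack` screener-plus-stack library built earlier in this chapter, which you should extend or swap so that your library contains seven base learners of your own choosing):
# 1. the task: mi as the outcome, everything else as covariates
chspred_task <- make_sl3_Task(
  data = chspred,
  covariates = setdiff(colnames(chspred), "mi"),
  outcome = "mi"
)
# 2-5. super learner from a screener + stack library and an NNLS metalearner
chspred_sl <- make_learner(Lrnr_sl,
  learners = fancy_stack,
  metalearner = make_learner(Lrnr_nnls)
)
chspred_fit <- chspred_sl$train(chspred_task)
# 6. print the fit
chspred_fit$print()
# 7. externally cross-validate the fit with 2 outer folds
chspred_task_cv <- make_sl3_Task(
  data = chspred,
  covariates = setdiff(colnames(chspred), "mi"),
  outcome = "mi",
  folds = make_folds(chspred, fold_fun = folds_vfold, V = 2)
)
CV_lrnr_sl(chspred_fit, chspred_task_cv, loss_squared_error)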
3.6 Summary
The general ensemble learning approach of super learner can be applied to a diversity of estimation and prediction problems that can be defined by a loss function.
Plug-in estimators of the estimand are desirable because a plug-in estimator respects both the local and global constraints of the statistical model.
Asymptotically linear estimators are also advantageous, since they converge to the estimand at \(1/\sqrt{n}\) rate, and thereby permit formal statistical inference.
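To pin down the \(1/\sqrt{n}\) claim (standard notation, not specific to any particular estimator): an estimator \(\psi_n\) of \(\psi_0\) is asymptotically linear if
\[
\psi_n - \psi_0 = \frac{1}{n} \sum_{i=1}^{n} IC(O_i) + o_P\left(\frac{1}{\sqrt{n}}\right),
\]
where the influence curve \(IC\) has mean zero and finite variance. The central limit theorem then gives \(\sqrt{n}(\psi_n - \psi_0) \rightarrow N(0, \text{Var}\{IC(O)\})\), which is what licenses Wald-style confidence intervals and hypothesis tests.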
If we plug the estimator returned by the super learner into the target parameter mapping, then we end up with an estimator that has the same bias as what we plugged in; such an estimator is in general not asymptotically linear.
Targeted maximum likelihood estimation (TMLE) is a general strategy that succeeds in constructing asymptotically linear plug-in estimators.
In the chapters that follow, we focus on the targeted maximum likelihood estimator and the targeted minimum loss-based estimator, both referred to as TMLE.
References
Polley, Eric C, and Mark J van der Laan. 2010. “Super Learner in Prediction.” bepress.
van der Laan, Mark J, and Sandrine Dudoit. 2003. “Unified Cross-Validation Methodology for Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples.” bepress.
Van der Laan, Mark J, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1). De Gruyter.
Van der Vaart, Aad W, Sandrine Dudoit, and Mark J van der Laan. 2006. “Oracle Inequalities for Multi-Fold Cross Validation.” Statistics & Decisions 24 (3). Oldenbourg Wissenschaftsverlag: 351–71.