A flexible implementation of the Super Learner ensemble machine learning system
Authors: Jeremy Coyle, Nima Hejazi, Ivana Malenica, Rachael Phillips, and Oleg Sofrygin
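What's sl3?
sl3 is an implementation of the Super Learner ensemble machine learning algorithm of van der Laan, Polley, and Hubbard (2007). The Super Learner algorithm performs ensemble learning in one of two fashions:
1. The ensemble Super Learner combines a collection of prediction algorithms, weighting them so that the combination minimizes the cross-validated risk.
2. The discrete Super Learner selects the best prediction algorithm from that collection (the "candidates" in sl3 nomenclature); that is, the discrete Super Learner is the single learning algorithm that minimizes the cross-validated risk.
Looking for long-form documentation or a walkthrough of the sl3 package? Don’t worry! Just browse the chapter in our book.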
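To make the distinction between the two fashions concrete, here is a minimal base-R sketch that does not use sl3 at all. It assumes simulated data, squared-error loss, 5-fold CV, and only two hypothetical candidates (an intercept-only mean and a linear model); with just two candidates the convex ensemble weight has a closed form, which is a simplification of what a general metalearner would do.

```r
# Simulated data: the outcome depends linearly on x, plus noise.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Assign each observation to one of V validation folds.
V <- 5
fold_id <- sample(rep(seq_len(V), length.out = n))

# Cross-validated predictions: one column per candidate learner.
cv_preds <- matrix(NA_real_, nrow = n, ncol = 2,
                   dimnames = list(NULL, c("mean", "lm")))
for (v in seq_len(V)) {
  train <- dat[fold_id != v, ]
  valid <- dat[fold_id == v, ]
  cv_preds[fold_id == v, "mean"] <- mean(train$y)
  cv_preds[fold_id == v, "lm"] <- predict(lm(y ~ x, data = train),
                                          newdata = valid)
}

# Cross-validated risk (mean squared error) of each candidate.
cv_risk <- colMeans((y - cv_preds)^2)

# Discrete Super Learner: the single candidate with the smallest CV risk.
discrete_sl <- names(which.min(cv_risk))

# Ensemble Super Learner (two-candidate case): the weight alpha in [0, 1]
# on "lm" that minimizes the CV risk of alpha * lm + (1 - alpha) * mean.
d <- cv_preds[, "lm"] - cv_preds[, "mean"]
alpha <- sum((y - cv_preds[, "mean"]) * d) / sum(d^2)
alpha <- min(max(alpha, 0), 1)
ensemble_preds <- alpha * cv_preds[, "lm"] + (1 - alpha) * cv_preds[, "mean"]
ensemble_risk <- mean((y - ensemble_preds)^2)
```

Because weights of 0 and 1 reproduce the individual candidates, the ensemble's cross-validated risk in this sketch can never exceed that of the discrete selection.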
Install the most recent version from the master branch on GitHub via remotes:
remotes::install_github("tlverse/sl3")
Past stable releases may be located via the releases page on GitHub and may be installed by including the appropriate major version tag. For example,
remotes::install_github("tlverse/sl3@v1.3.7")
To contribute, check out the devel branch and consider submitting a pull request.
sl3 makes the process of applying screening algorithms, learning algorithms, combining both types of algorithms into a stacked regression model, and cross-validating this whole process essentially trivial. The best way to understand this is to see the sl3 package in action:
set.seed(49753)
library(tidyverse)
library(data.table)
library(SuperLearner)
library(origami)
library(sl3)
# load example data set
data(cpp)
cpp <- cpp %>%
  dplyr::filter(!is.na(haz)) %>%
  mutate_all(~ replace(., is.na(.), 0))
# use covariates of interest and the outcome to build a task object
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
            "sexn")
task <- sl3_Task$new(
  data = cpp,
  covariates = covars,
  outcome = "haz"
)
# set up screeners and learners via built-in functions and pipelines
slscreener <- Lrnr_pkg_SuperLearner_screener$new("screen.glmnet")
glm_learner <- Lrnr_glm$new()
screen_and_glm <- Pipeline$new(slscreener, glm_learner)
SL.glmnet_learner <- Lrnr_pkg_SuperLearner$new(SL_wrapper = "SL.glmnet")
# stack learners into a model (including screeners and pipelines)
learner_stack <- Stack$new(SL.glmnet_learner, glm_learner, screen_and_glm)
stack_fit <- learner_stack$train(task)
preds <- stack_fit$predict()
head(preds)
#>    Lrnr_pkg_SuperLearner_SL.glmnet Lrnr_glm_TRUE
#> 1:                       0.3525946    0.36298498
#> 2:                       0.3525946    0.36298498
#> 3:                       0.2442593    0.25993072
#> 4:                       0.2442593    0.25993072
#> 5:                       0.2442593    0.25993072
#> 6:                       0.0269504    0.05680264
#>    Pipeline(Lrnr_pkg_SuperLearner_screener_screen.glmnet->Lrnr_glm_TRUE)
#> 1:                                                            0.36228209
#> 2:                                                            0.36228209
#> 3:                                                            0.25870995
#> 4:                                                            0.25870995
#> 5:                                                            0.25870995
#> 6:                                                            0.05600958
Parallelization with futures
Fitting a stack of learners (as above) is straightforward, and it is just as easy to take advantage of sl3’s built-in parallelization support. To do so, simply choose a plan() from the future ecosystem.
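As a rough illustration of choosing a plan, the sketch below uses only the future package. The multisession backend and the choice of two workers are assumptions made for the example, not recommendations from sl3; other backends from the future ecosystem would be selected the same way.

```r
library(future)

# Assumption: two background R sessions; adjust `workers` to your machine.
# plan() returns the previously active plan, which we keep for later.
old_plan <- plan(multisession, workers = 2L)

# Number of workers available under the current plan.
n_workers <- nbrOfWorkers()

# With the plan set, training calls such as learner_stack$train(task)
# are written exactly as before; no extra arguments are shown here.

# Restore the previous plan when done.
plan(old_plan)
```

Restoring the previous plan at the end keeps the example from changing how later code in the same session is evaluated.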
In the above examples, we fit stacks of learners, but didn’t create a Super Learner ensemble, which uses cross-validation (CV) to build the ensemble model. For the sake of computational expedience, we may be interested in lowering the number of CV folds (from 10). Let’s take a look at how to do both below.
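# first, let's instantiate some more learners and create a Super Learner
mean_learner <- Lrnr_mean$new()
rf_learner <- Lrnr_ranger$new()
# pass the candidate learners as a single list, so that none of them is
# matched positionally to another argument of Lrnr_sl (such as the metalearner)
sl <- Lrnr_sl$new(learners = list(mean_learner, glm_learner, rf_learner))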
# CV folds are controlled in the sl3_Task object; we can lower the number of
# folds simply by specifying this in creating the Task
task <- sl3_Task$new(
  data = cpp,
  covariates = covars,
  outcome = "haz",
  folds = 5L
)
# now, let's fit the Super Learner with just 5-fold CV, then get predictions
sl_fit <- sl$train(task)
sl_preds <- sl_fit$predict()
The folds argument to sl3_Task supports both integers (for V-fold CV) and all of the CV schemes supported in the origami package. To see the full list, query ?fold_funs from within R or take a look at origami’s online documentation.
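For readers who want to see what such a CV scheme looks like, the following sketch builds folds directly with origami. The sample size of 100 and the choice of 5 folds are hypothetical values picked for illustration; folds_vfold is one of the fold functions listed under ?fold_funs.

```r
library(origami)

set.seed(49753)
# Build 5 V-fold CV splits for a hypothetical data set of 100 rows.
folds <- make_folds(n = 100, fold_fun = folds_vfold, V = 5)

# Each fold stores the row indices of its training and validation sets.
validation_sizes <- vapply(
  folds, function(f) length(f$validation_set), integer(1)
)

# Across folds, every row should appear in exactly one validation set.
all_validation <- sort(unlist(lapply(folds, function(f) f$validation_set)))
```

Given the statement above that folds accepts origami CV schemes, a fold list like this one would presumably be supplied as folds = folds when creating the task, in place of an integer.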
Properties supported by sl3 learners are presented in the following table:
Learner | binomial | categorical | continuous | cv | density | h2o | ids | importance | offset | preprocessing | sampling | screener | timeseries | weights | wrapper |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Lrnr_arima | x | x | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_bartMachine | √ | x | √ | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_bayesglm | √ | x | √ | x | x | x | x | x | √ | x | x | x | x | √ | x |
Lrnr_bilstm | x | x | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_bound | √ | √ | √ | x | x | x | x | x | x | x | x | x | x | √ | √ |
Lrnr_caret | √ | √ | √ | x | x | x | x | x | x | x | x | x | x | x | √ |
Lrnr_cv | x | x | x | √ | x | x | x | x | x | x | x | x | x | x | √ |
Lrnr_cv_selector | √ | √ | √ | x | x | x | x | x | x | x | x | x | x | √ | √ |
Lrnr_dbarts | √ | x | √ | x | x | x | x | x | x | x | x | x | x | √ | x |
Lrnr_define_interactions | x | x | x | x | x | x | x | x | x | √ | x | x | x | x | x |
Lrnr_density_discretize | x | x | x | x | √ | x | x | x | x | x | x | x | x | x | x |
Lrnr_density_hse | x | x | x | x | √ | x | x | x | x | x | x | x | x | x | x |
Lrnr_density_semiparametric | x | x | x | x | √ | x | x | x | x | x | √ | x | x | x | x |
Lrnr_earth | √ | x | √ | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_expSmooth | x | x | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_ga | √ | √ | √ | x | x | x | x | x | √ | x | x | x | x | √ | x |
Lrnr_gam | √ | x | √ | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_gbm | √ | x | √ | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_glm | √ | x | √ | x | x | x | x | x | √ | x | x | x | x | √ | x |
Lrnr_glm_fast | √ | x | √ | x | x | x | x | x | √ | x | x | x | x | √ | x |
Lrnr_glm_semiparametric | √ | x | √ | x | x | x | x | x | x | x | x | x | x | √ | x |
Lrnr_glmnet | √ | √ | √ | √ | x | x | √ | x | x | x | x | x | x | √ | x |
Lrnr_glmtree | √ | x | √ | x | x | x | x | x | √ | x | x | x | x | √ | x |
Lrnr_grf | √ | √ | √ | x | x | x | x | x | x | x | x | x | x | √ | x |
Lrnr_grfcate | √ | √ | √ | x | x | x | x | x | x | x | x | x | x | √ | x |
Lrnr_gru_keras | √ | √ | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_gts | x | x | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_h2o_glm | √ | √ | √ | x | x | √ | x | x | √ | x | x | x | x | √ | x |
Lrnr_h2o_grid | √ | √ | √ | x | x | √ | x | x | √ | x | x | x | x | √ | x |
Lrnr_hal9001 | √ | x | √ | √ | x | x | √ | x | x | x | x | x | x | √ | x |
Lrnr_haldensify | x | x | x | x | √ | x | x | x | x | x | x | x | x | x | x |
Lrnr_HarmonicReg | x | x | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_hts | x | x | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_independent_binomial | x | √ | x | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_lightgbm | √ | √ | √ | x | x | x | x | √ | √ | x | x | x | x | √ | x |
Lrnr_lstm_keras | √ | √ | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_mean | √ | √ | √ | x | x | x | x | x | √ | x | x | x | x | √ | x |
Lrnr_multiple_ts | x | x | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_multivariate | x | √ | x | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_nnet | √ | √ | √ | x | x | x | x | x | x | x | x | x | x | √ | x |
Lrnr_nnls | √ | x | √ | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_optim | √ | √ | √ | x | x | x | x | x | √ | x | x | x | x | √ | x |
Lrnr_pca | x | x | x | x | x | x | x | x | x | √ | x | x | x | x | x |
Lrnr_pkg_SuperLearner | √ | x | √ | x | x | x | √ | x | x | x | x | x | x | √ | √ |
Lrnr_pkg_SuperLearner_method | √ | x | √ | x | x | x | x | x | x | x | x | x | x | √ | √ |
Lrnr_pkg_SuperLearner_screener | √ | x | √ | x | x | x | √ | x | x | x | x | x | x | √ | √ |
Lrnr_polspline | √ | √ | √ | x | x | x | x | x | x | x | x | x | x | √ | x |
Lrnr_pooled_hazards | x | √ | x | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_randomForest | √ | √ | √ | x | x | x | x | √ | x | x | x | x | x | √ | x |
Lrnr_ranger | √ | √ | √ | x | x | x | x | √ | x | x | x | x | x | √ | x |
Lrnr_revere_task | x | x | x | √ | x | x | x | x | x | x | x | x | x | x | √ |
Lrnr_rpart | √ | √ | √ | x | x | x | x | x | x | x | x | x | x | √ | x |
Lrnr_rugarch | x | x | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_screener_augment | x | x | x | x | x | x | x | x | x | x | x | √ | x | x | x |
Lrnr_screener_coefs | x | x | x | x | x | x | x | x | x | x | x | √ | x | x | x |
Lrnr_screener_correlation | √ | √ | √ | x | x | x | x | x | x | x | x | √ | x | x | x |
Lrnr_screener_importance | x | x | x | x | x | x | x | x | x | x | x | √ | x | x | x |
Lrnr_sl | x | x | x | √ | x | x | x | x | x | x | x | x | x | x | √ |
Lrnr_solnp | √ | √ | √ | x | x | x | x | x | √ | x | x | x | x | √ | x |
Lrnr_solnp_density | x | x | x | x | √ | x | x | x | x | x | x | x | x | x | x |
Lrnr_stratified | √ | x | √ | x | x | x | x | x | x | x | x | x | x | x | √ |
Lrnr_subset_covariates | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_svm | √ | √ | √ | x | x | x | x | x | x | x | x | x | x | x | x |
Lrnr_ts_weights | x | x | x | √ | x | x | x | x | x | x | x | x | x | x | √ |
Lrnr_tsDyn | x | x | √ | x | x | x | x | x | x | x | x | x | √ | x | x |
Lrnr_xgboost | √ | √ | √ | x | x | x | x | √ | √ | x | x | x | x | √ | x |
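The same information can be queried from within R rather than read off the table. The sketch below relies on sl3's sl3_list_properties() and sl3_list_learners() helpers; the particular properties requested are only examples, and the results depend on the installed version of sl3, so they may differ from the table above.

```r
library(sl3)

# All property names known to the installed version of sl3.
all_properties <- sl3_list_properties()

# Learners that support a single property of interest.
ts_learners <- sl3_list_learners(properties = "timeseries")

# Learners that support every property in the vector.
weighted_binomial <- sl3_list_learners(properties = c("binomial", "weights"))
```

Filtering on several properties at once is a quick way to narrow down candidates, for instance when a binary outcome must be modeled with observation weights.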
Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.
After using the sl3 R package, please cite the following:
@software{coyle2021sl3-rpkg,
  author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
    Phillips, Rachael V and Sofrygin, Oleg},
  title = {{sl3}: Modern Pipelines for Machine Learning and {Super
    Learning}},
  year = {2021},
  howpublished = {\url{https://github.com/tlverse/sl3}},
  note = {{R} package version 1.4.2},
  url = {https://doi.org/10.5281/zenodo.1342293},
  doi = {10.5281/zenodo.1342293}
}
© 2017-2021 Jeremy R. Coyle, Nima S. Hejazi, Ivana Malenica, Rachael V. Phillips, Oleg Sofrygin
The contents of this repository are distributed under the GPL-3 license. See file LICENSE for details.