The Super Learner Algorithm

Learner that encapsulates the Super Learner algorithm. Fits metalearner on cross-validated predictions from learners. Then forms a pipeline with the learners.

Format

An R6Class object inheriting from Lrnr_base.

Value

A learner object inheriting from Lrnr_base with methods for training and prediction. For a full list of learner functionality, see the complete documentation of Lrnr_base.

Parameters

learners: The "library" of user-specified algorithms for the super learner to consider as candidates.
metalearner = "default": The metalearner to be fit on c cross-validated predictions from the candidates. If "default", the default_metalearner is used to construct a metalearner based on the outcome_type of the training task.
cv_control = NULL: Optional list of arguments that will be used to define a specific cross-validation fold structure for fitting the super learner. Intended for use in a nested cross-validation scheme, such as cross-validated super learner (cv_sl) or when Lrnr_sl is considered in the list of candidate learners in another Lrnr_sl. Includes the arguments listed below, and any others to be passed to fold_funs:
- strata = NULL: Discrete covariate or outcome name to define stratified cross-validation folds. If NULL and if task$outcome_type$type is binary or categorical, then the default behavior is to consider stratified cross-validation, where the strata are defined with respect to the outcome. To override the default behavior, i.e., to not consider stratified cross-validation when strata = NULL and task$outcome_type$type is binary or categorical is not NULL, set strata = "none".
- cluster_by_id = TRUE: Logical to specify clustered cross-validation scheme according to id in task. Specifically, if task$nodes$id is not NULL and if cluster_by_id = TRUE (default) then task$nodes$id is used to define a clustered cross-validation scheme, so dependent units are placed together in the same training sets and validation set. To override the default behavior, i.e., to not consider clustered cross-validation when task$nodes$id is not NULL, set cluster_by_id = FALSE.
- fold_fun = NULL: A function indicating the origami cross-validation scheme to use, such as folds_vfold for V-fold cross-validation. See fold_funs for a list of possibilities. If NULL (default) and if other cv_control arguments are specified, e.g., V, strata or cluster_by_id, then the default behavior is to set fold_fun = origami::folds_vfold.
- ...: Other arguments to be passed to fold_fun, such as V for fold_fun = folds_vfold. See fold_funs for a list fold-function-specific possible arguments.
keep_extra = TRUE: Stores all sub-parts of the super learner computation. When FALSE, the resulting object has a memory footprint that is significantly reduced through the discarding of intermediary data structures.
verbose = NULL: Whether to print cv_control-related messages. Warnings and errors are always printed. When verbose = NULL, verbosity specified by option sl3.verbose will be used, and the default sl3.verbose option is FALSE. (Note: to turn on sl3.verbose option, set options("sl3.verbose" = TRUE).)
...: Any additional parameters that can be considered by Lrnr_base.

Other Learners: Custom_chain, Lrnr_HarmonicReg, Lrnr_arima, Lrnr_bartMachine, Lrnr_base, Lrnr_bayesglm, Lrnr_bilstm, Lrnr_caret, Lrnr_cv_selector, Lrnr_cv, Lrnr_dbarts, Lrnr_define_interactions, Lrnr_density_discretize, Lrnr_density_hse, Lrnr_density_semiparametric, Lrnr_earth, Lrnr_expSmooth, Lrnr_gam, Lrnr_ga, Lrnr_gbm, Lrnr_glm_fast, Lrnr_glm_semiparametric, Lrnr_glmnet, Lrnr_glmtree, Lrnr_glm, Lrnr_grfcate, Lrnr_grf, Lrnr_gru_keras, Lrnr_gts, Lrnr_h2o_grid, Lrnr_hal9001, Lrnr_haldensify, Lrnr_hts, Lrnr_independent_binomial, Lrnr_lightgbm, Lrnr_lstm_keras, Lrnr_mean, Lrnr_multiple_ts, Lrnr_multivariate, Lrnr_nnet, Lrnr_nnls, Lrnr_optim, Lrnr_pca, Lrnr_pkg_SuperLearner, Lrnr_polspline, Lrnr_pooled_hazards, Lrnr_randomForest, Lrnr_ranger, Lrnr_revere_task, Lrnr_rpart, Lrnr_rugarch, Lrnr_screener_augment, Lrnr_screener_coefs, Lrnr_screener_correlation, Lrnr_screener_importance, Lrnr_solnp_density, Lrnr_solnp, Lrnr_stratified, Lrnr_subset_covariates, Lrnr_svm, Lrnr_tsDyn, Lrnr_ts_weights, Lrnr_xgboost, Pipeline, Stack, define_h2o_X(), undocumented_learner

Examples

if (FALSE) {
data(cpp_imputed)
covs <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs")
task <- sl3_Task$new(cpp_imputed, covariates = covs, outcome = "haz")
# this is just for illustrative purposes, not intended for real applications
# of the super learner!
glm_lrn <- Lrnr_glm$new()
ranger_lrn <- Lrnr_ranger$new()
lasso_lrn <- Lrnr_glmnet$new()
eSL <- Lrnr_sl$new(learners = list(glm_lrn, ranger_lrn, lasso_lrn))
eSL_fit <- eSL$train(task)
# example with cv_control, where Lrnr_sl included as a candidate
eSL_nested5folds <- Lrnr_sl$new(
  learners = list(glm_lrn, ranger_lrn, lasso_lrn),
  cv_control = list(V = 5),
  verbose = FALSE
)
dSL <- Lrnr_sl$new(
  learners = list(glm_lrn, ranger_lrn, lasso_lrn, eSL_nested5folds),
  metalearner = Lrnr_cv_selector$new(loss_squared_error)
)
dSL_fit <- dSL$train(task)
# example with cv_control, where we use cross-validated super learner
cvSL_fit <- cv_sl(
  lrnr_sl = eSL_nested5folds, task = task, eval_fun = loss_squared_error
)
}

Format

Value

Parameters

See also

Examples