`randomForest` and order in decreasing order of importance.

Source: `R/Lrnr_randomForest.R`, `R/importance.R`, `importance.Rd`

Function that takes a cross-validated fit (i.e., a cross-validated learner
that has already been trained on a task), which could be a cross-validated
single learner or super learner, and generates a risk-based variable
importance score for either each covariate or each group of covariates in
the task. This function outputs a `data.table`, where each row
corresponds to the risk difference or the risk ratio between the following
two risks: the risk when a covariate (or group of covariates) is permuted or
removed, and the original risk (i.e., when all covariates are included as
they were in the observed data). A higher risk ratio/difference corresponds
to a more important covariate/group. A plot can be generated from the
returned `data.table` by calling the companion function `importance_plot`.
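The two risks and the two metrics described above can be illustrated with a minimal base-R sketch. It is not the sl3 implementation: it uses a plain `lm` fit and in-sample mean squared error instead of a cross-validated learner, and the simulated data, the `mse` function, and the `perturbed_risk` helper are all hypothetical names introduced for illustration.

```r
# Illustrative sketch only: plain lm() and in-sample MSE stand in for a
# cross-validated sl3 fit and its risk. All names here are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n) # x2 is unrelated to the outcome
covs <- c("x1", "x2")

mse <- function(pred, obs) mean((pred - obs)^2)

fit <- lm(y ~ x1 + x2, data = dat)
original_risk <- mse(predict(fit, dat), dat$y)

# Risk after obscuring one covariate, either by permuting it and predicting
# from the existing fit, or by removing it and refitting.
perturbed_risk <- function(covariate, type = c("remove", "permute")) {
  type <- match.arg(type)
  if (type == "permute") {
    dat_mod <- dat
    dat_mod[[covariate]] <- sample(dat_mod[[covariate]])
    mse(predict(fit, dat_mod), dat$y)
  } else {
    keep <- setdiff(covs, covariate)
    refit <- lm(reformulate(keep, response = "y"), data = dat)
    mse(predict(refit, dat), dat$y)
  }
}

risks <- sapply(covs, perturbed_risk, type = "permute")
result <- data.frame(
  covariate = names(risks),
  MSE_difference = risks - original_risk,
  MSE_ratio = risks / original_risk
)
# Most important covariate first
result <- result[order(-result$MSE_difference), ]
result
```

In this sketch `x1` drives the outcome, so obscuring it raises the risk far more than obscuring `x2`, and it sorts to the top under either metric.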

```
importance(fit, eval_fun = NULL, fold_number = "validation",
  type = c("remove", "permute"),
  importance_metric = c("difference", "ratio"),
  covariate_groups = NULL)
```

- fit
A trained cross-validated (CV) learner (such as a CV stack or super learner), from which cross-validated predictions can be generated.

- eval_fun
The evaluation function (risk or loss function) for evaluating the risk. Defaults vary based on the outcome type, matching defaults in `default_metalearner`. See `loss_functions` and `risk_functions` for options. Default is `NULL`.

- fold_number
The fold number to use for obtaining the predictions from the fit. Either a positive integer for obtaining predictions from a specific fold's fit; `"full"` for obtaining predictions from a fit on all of the data; or `"validation"` (default) for obtaining cross-validated predictions, where the data used for training and prediction never overlaps across the folds. Note that if a positive integer or `"full"` is supplied here, then there will be overlap between the data used for training and validation, so `fold_number = "validation"` is recommended.

- type
Which method should be used to obscure the relationship between each covariate / covariate group and the outcome? When `type` is `"remove"` (default), each covariate / covariate group is removed one at a time from the task; the cross-validated learner is refit to this modified task; and finally, predictions are obtained from this refit. When `type` is `"permute"`, each covariate / covariate group is permuted (sampled without replacement) one at a time, and then predictions are obtained from this modified data.

- importance_metric
Either `"ratio"` or `"difference"` (default). For each covariate / covariate group, `"ratio"` returns the risk with the permuted/removed covariate / covariate group divided by the observed/original risk (i.e., the risk with all covariates as they existed in the sample), and `"difference"` returns the difference between the risk with the permuted/removed covariate / covariate group and the observed risk.

- covariate_groups
Optional named list of covariate groups, which will invoke variable importance evaluation at the group level by removing/permuting all covariates in the same group together. If covariates in the task are not specified in the list of groups, then those covariates will be added as additional single-covariate groups.
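The handling of ungrouped covariates described under `covariate_groups` can be sketched in base R as follows. The helper `complete_groups` is hypothetical and is not part of sl3; it only illustrates the documented behavior, under the assumption that each leftover covariate becomes a group named after itself.

```r
# Hypothetical helper (not an sl3 function): add every covariate that is
# missing from the supplied groups as its own single-covariate group.
complete_groups <- function(covariates, covariate_groups = NULL) {
  if (is.null(covariate_groups)) {
    # No groups supplied: every covariate is evaluated on its own
    return(setNames(as.list(covariates), covariates))
  }
  grouped <- unlist(covariate_groups, use.names = FALSE)
  ungrouped <- setdiff(covariates, grouped)
  c(covariate_groups, setNames(as.list(ungrouped), ungrouped))
}

covs <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs")
groups <- list(
  scores = c("apgar1", "apgar5"),
  maternal = c("parity", "mage", "meducyrs")
)
all_groups <- complete_groups(covs, groups)
names(all_groups)
```

Here `gagebrth` belongs to neither supplied group, so it ends up as a third group of its own, alongside `scores` and `maternal`.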

A `data.table` of variable importance for each covariate.

```
library(sl3)

# define ML task
data(cpp_imputed)
covs <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs")
task <- sl3_Task$new(cpp_imputed, covariates = covs, outcome = "haz")

# build relatively fast learner library (not recommended for real analysis)
lasso_lrnr <- Lrnr_glmnet$new()
glm_lrnr <- Lrnr_glm$new()
ranger_lrnr <- Lrnr_ranger$new()
lrnrs <- c(lasso_lrnr, glm_lrnr, ranger_lrnr)
names(lrnrs) <- c("lasso", "glm", "ranger")
lrnr_stack <- make_learner(Stack, lrnrs)

# instantiate SL with default metalearner
sl <- Lrnr_sl$new(lrnr_stack)
sl_fit <- sl$train(task)

# importance of each covariate (defaults: type = "remove", metric = "difference")
importance_result <- importance(sl_fit)
importance_result
#>    covariate MSE_difference
#> 1:  gagebrth    0.042704792
#> 2:      mage    0.031846864
#> 3:  meducyrs    0.030945845
#> 4:    apgar1    0.013604514
#> 5:    parity    0.004934802
#> 6:    apgar5   -0.008147488

# importance with groups of covariates
groups <- list(
  scores = c("apgar1", "apgar5"),
  maternal = c("parity", "mage", "meducyrs")
)
importance_result_groups <- importance(sl_fit, covariate_groups = groups)
importance_result_groups
#>    covariate_group MSE_difference
#> 1:          scores     6.53175726
#> 2:        maternal     0.14678546
#> 3:        gagebrth     0.03660152
```