This learner implements Generalized Random Forests using the grf package, a pluggable package for forest-based statistical estimation and inference. GRF currently provides non-parametric methods for least-squares regression, quantile regression, and treatment effect estimation (optionally using instrumental variables). This implementation trains a regression forest that can be used to estimate quantiles of the conditional distribution of (Y|X=x).
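As a sketch of what this learner wraps, the underlying grf call looks roughly like the following; the simulated data and variable names are purely illustrative:

```
library(grf)

# simulated data (illustrative only)
n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] * rnorm(n)

# grow a quantile forest calibrated to three quantiles
qf <- quantile_forest(X, Y, quantiles = c(0.1, 0.5, 0.9), num.trees = 2000)

# estimate conditional quantiles at new points
X_new <- matrix(rnorm(10 * p), 10, p)
q_hat <- predict(qf, X_new, quantiles = c(0.1, 0.5, 0.9))
```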

An `R6Class` object: a learner with methods for training and prediction. See `Lrnr_base` for documentation on learners.

`num.trees = 2000`

Number of trees grown in the forest. NOTE: Getting accurate confidence intervals generally requires more trees than getting accurate predictions.

`quantiles = c(0.1, 0.5, 0.9)`

Vector of quantiles used to calibrate the forest.

`regression.splitting = FALSE`

Whether to use regression splits when growing trees, instead of specialized splits based on the quantiles (the default). Setting this flag to `TRUE` corresponds to the approach to quantile forests of Meinshausen (2006).

`clusters = NULL`

Vector of integers or factors specifying which cluster each observation corresponds to.

`equalize.cluster.weights = FALSE`

If `FALSE` (the default), each unit is given the same weight, so that larger clusters receive more weight. If `TRUE`, each cluster is given equal weight in the forest. In that case, during training, each tree uses the same number of observations from each drawn cluster: if the smallest cluster has K units, then when a cluster is sampled during training, only a random K of its elements are given to the tree-growing procedure. When estimating average treatment effects, each observation is weighted by 1/(cluster size), so that the total weight of each cluster is the same.

`sample.fraction = 0.5`

Fraction of the data used to build each tree. NOTE: if `honesty = TRUE`, these subsamples will further be cut by a factor of `honesty.fraction`.

`mtry = NULL`

Number of variables tried for each split. By default, this is set based on the dimensionality of the predictors.

`min.node.size = 5`

A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than `min.node.size` can occur, as in the randomForest package.

`honesty = TRUE`

Whether or not honest splitting (i.e., sub-sample splitting) should be used.

`alpha = 0.05`

A tuning parameter that controls the maximum imbalance of a split.

`imbalance.penalty = 0`

A tuning parameter that controls how harshly imbalanced splits are penalized.

`num.threads = 1`

Number of threads used in training. If set to `NULL`, the software automatically selects an appropriate number.

`quantiles_pred`

Vector of quantiles used for prediction. This can differ from the vector of quantiles used for training.
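For example, a forest can be calibrated on several quantiles at training time while only the median is requested at prediction time (a sketch; the learner is trained on an sl3 task such as the one constructed in the example below):

```
lrnr_median <- Lrnr_grf$new(
  quantiles = c(0.1, 0.5, 0.9), # quantiles used to calibrate the forest
  quantiles_pred = 0.5          # quantile returned by predict()
)
```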

Individual learners have their own sets of parameters. Below is a list of shared parameters, implemented by `Lrnr_base` and shared by all learners.

`covariates`

A character vector of covariates. The learner will use this to subset the covariates for any specified task.

`outcome_type`

A `variable_type` object used to control the outcome type used by the learner. Overrides the task outcome type if specified.

`...`

All other parameters are handled by the individual learner classes. See the documentation for the learner class you are instantiating.

Other Learners: `Custom_chain`, `Lrnr_HarmonicReg`, `Lrnr_arima`, `Lrnr_bartMachine`, `Lrnr_base`, `Lrnr_bayesglm`, `Lrnr_bilstm`, `Lrnr_caret`, `Lrnr_cv_selector`, `Lrnr_cv`, `Lrnr_dbarts`, `Lrnr_define_interactions`, `Lrnr_density_discretize`, `Lrnr_density_hse`, `Lrnr_density_semiparametric`, `Lrnr_earth`, `Lrnr_expSmooth`, `Lrnr_gam`, `Lrnr_ga`, `Lrnr_gbm`, `Lrnr_glm_fast`, `Lrnr_glm_semiparametric`, `Lrnr_glmnet`, `Lrnr_glmtree`, `Lrnr_glm`, `Lrnr_grfcate`, `Lrnr_gru_keras`, `Lrnr_gts`, `Lrnr_h2o_grid`, `Lrnr_hal9001`, `Lrnr_haldensify`, `Lrnr_hts`, `Lrnr_independent_binomial`, `Lrnr_lightgbm`, `Lrnr_lstm_keras`, `Lrnr_mean`, `Lrnr_multiple_ts`, `Lrnr_multivariate`, `Lrnr_nnet`, `Lrnr_nnls`, `Lrnr_optim`, `Lrnr_pca`, `Lrnr_pkg_SuperLearner`, `Lrnr_polspline`, `Lrnr_pooled_hazards`, `Lrnr_randomForest`, `Lrnr_ranger`, `Lrnr_revere_task`, `Lrnr_rpart`, `Lrnr_rugarch`, `Lrnr_screener_augment`, `Lrnr_screener_coefs`, `Lrnr_screener_correlation`, `Lrnr_screener_importance`, `Lrnr_sl`, `Lrnr_solnp_density`, `Lrnr_solnp`, `Lrnr_stratified`, `Lrnr_subset_covariates`, `Lrnr_svm`, `Lrnr_tsDyn`, `Lrnr_ts_weights`, `Lrnr_xgboost`, `Pipeline`, `Stack`, `define_h2o_X()`, `undocumented_learner`
```
# load the sl3 package and example data
library(sl3)
data(cpp_imputed)

# create an sl3 task
task <- sl3_Task$new(
  cpp_imputed,
  covariates = c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs"),
  outcome = "haz"
)

# train a grf learner and make predictions
lrnr_grf <- Lrnr_grf$new(seed = 123)
lrnr_grf_fit <- lrnr_grf$train(task)
lrnr_grf_pred <- lrnr_grf_fit$predict()
```
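If observations are grouped, the clustering parameters described above can be passed through in the same way. A sketch, continuing from the task above; the cluster labels here are purely illustrative, since `cpp_imputed` has no canonical grouping variable:

```
# illustrative cluster labels; in practice these would identify real groups
cluster_id <- rep(1:50, length.out = nrow(cpp_imputed))

lrnr_grf_clustered <- Lrnr_grf$new(
  clusters = cluster_id,
  equalize.cluster.weights = TRUE,
  seed = 123
)
lrnr_grf_clustered_fit <- lrnr_grf_clustered$train(task)
```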