Principal Component Analysis and Regression

This learner provides facilities for performing principal components analysis (PCA) to reduce the dimensionality of a data set to a pre-specified value. For further details, consult the documentation of prcomp from the core package stats. This learner object is primarily intended for use with other learners as part of a pre-processing pipeline.

Format

R6Class object.

Value

Learner object with methods for training and prediction. See Lrnr_base for documentation on learners.

Parameters

n_comp: A numeric value indicating the number of components to be produced as a result of the PCA dimensionality reduction. For convenience, this defaults to two (2) components.
center: A logical value indicating whether the input data matrix should be centered before performing PCA. This defaults to TRUE since that is the recommended practice. Consider consulting the documentation of prcomp for details.
scale.: A logical value indicating whether the input data matrix should be scaled (to unit variance) before performing PCA. Consider consulting the documentation of prcomp for details.
...: Other optional parameters to be passed to prcomp. Consider consulting the documentation of prcomp for details.

Common Parameters

Individual learners have their own sets of parameters. Below is a list of shared parameters, implemented by Lrnr_base, and shared by all learners.

covariates: A character vector of covariates. The learner will use this to subset the covariates for any specified task
outcome_type: A variable_type object used to control the outcome_type used by the learner. Overrides the task outcome_type if specified
...: All other parameters should be handled by the invidual learner classes. See the documentation for the learner class you're instantiating

Other Learners: Custom_chain, Lrnr_HarmonicReg, Lrnr_arima, Lrnr_bartMachine, Lrnr_base, Lrnr_bayesglm, Lrnr_bilstm, Lrnr_caret, Lrnr_cv_selector, Lrnr_cv, Lrnr_dbarts, Lrnr_define_interactions, Lrnr_density_discretize, Lrnr_density_hse, Lrnr_density_semiparametric, Lrnr_earth, Lrnr_expSmooth, Lrnr_gam, Lrnr_ga, Lrnr_gbm, Lrnr_glm_fast, Lrnr_glm_semiparametric, Lrnr_glmnet, Lrnr_glmtree, Lrnr_glm, Lrnr_grfcate, Lrnr_grf, Lrnr_gru_keras, Lrnr_gts, Lrnr_h2o_grid, Lrnr_hal9001, Lrnr_haldensify, Lrnr_hts, Lrnr_independent_binomial, Lrnr_lightgbm, Lrnr_lstm_keras, Lrnr_mean, Lrnr_multiple_ts, Lrnr_multivariate, Lrnr_nnet, Lrnr_nnls, Lrnr_optim, Lrnr_pkg_SuperLearner, Lrnr_polspline, Lrnr_pooled_hazards, Lrnr_randomForest, Lrnr_ranger, Lrnr_revere_task, Lrnr_rpart, Lrnr_rugarch, Lrnr_screener_augment, Lrnr_screener_coefs, Lrnr_screener_correlation, Lrnr_screener_importance, Lrnr_sl, Lrnr_solnp_density, Lrnr_solnp, Lrnr_stratified, Lrnr_subset_covariates, Lrnr_svm, Lrnr_tsDyn, Lrnr_ts_weights, Lrnr_xgboost, Pipeline, Stack, define_h2o_X(), undocumented_learner

Examples

set.seed(37912)

# load example data
ncomp <- 3
data(cpp_imputed)
covars <- c(
  "apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
  "sexn"
)
outcome <- "haz"

# create sl3 task
task <- sl3_Task$new(cpp_imputed, covariates = covars, outcome = outcome)

# define learners
glm_fast <- Lrnr_glm_fast$new(intercept = FALSE)
pca_sl3 <- Lrnr_pca$new(n_comp = ncomp, center = TRUE, scale. = TRUE)
pcr_pipe_sl3 <- Pipeline$new(pca_sl3, glm_fast)

# create stacks + train and predict
pcr_pipe_sl3_fit <- pcr_pipe_sl3$train(task)
pcr_pred <- pcr_pipe_sl3_fit$predict()