h2o Model Definition

Definition of h2o type models. This function is for internal use only. It uploads the input data into an H2OFrame and, if a smaller set of covariates is specified in params, first subsets the task$X data.table to those covariates.
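The subset-then-upload step described above can be sketched roughly as follows. This is a minimal illustration, not the package's internal code: the toy table `X`, the `covariates` vector, and the other variable names are hypothetical, and the sketch assumes a local H2O cluster can be started with h2o.init().

```r
library(data.table)
library(h2o)

# Start (or connect to) a local H2O cluster; assumes Java is available.
suppressWarnings(h2o.init())

# Hypothetical stand-in for task$X: a small data.table of covariates.
X <- data.table(
  apgar1 = c(8, 9, 7),
  apgar5 = c(9, 9, 8),
  mage   = c(25, 31, 28)
)

# Hypothetical covariate subset, as a learner's params might specify it.
covariates <- c("apgar1", "mage")

# Keep only the requested columns, then upload them to the cluster.
X_sub <- X[, covariates, with = FALSE]
h2o_frame <- as.h2o(X_sub)

# Rows and columns of the uploaded frame.
dim(h2o_frame)
```

Subsetting before the upload keeps unused columns out of the cluster, which is the point of allowing a smaller covariate set.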

This learner provides faster fitting procedures for generalized linear models by using the h2o package and its h2o.glm method. The H2O platform fits GLMs in a computationally efficient manner. For details on the procedure, consult the documentation of the h2o package.

define_h2o_X(task, outcome_type = NULL)

Format

R6Class object.

Arguments

task: An object of type sl3_Task as defined in this package.
outcome_type: An object of type Variable_Type for use in formatting the outcome.

Value

Learner object with methods for training and prediction. See Lrnr_base for documentation on learners.

Parameters

intercept=TRUE: If TRUE, an intercept term is included.
standardize=TRUE: Standardize covariates to have mean = 0 and SD = 1.
lambda=0: Regularization (lasso penalty) strength; 0 means no penalty.
max_iterations=100: Maximum number of iterations.
ignore_const_columns=FALSE: If TRUE, drop constant covariate columns.
missing_values_handling="Skip": How to handle missing values.
...: Other arguments passed to h2o.glm.
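As a sketch of how the parameters above might be set, the learner can be instantiated with non-default values. The particular values and the object name `lrnr_h2o_reg` are illustrative assumptions, not recommendations.

```r
library(sl3)

# Illustrative, non-default settings; the values are assumptions chosen
# only to show the syntax, not tuned recommendations.
lrnr_h2o_reg <- Lrnr_h2o_glm$new(
  standardize = TRUE,
  lambda = 0.1,
  max_iterations = 50,
  ignore_const_columns = TRUE
)

# The stored parameter list can be inspected before training.
lrnr_h2o_reg$params$lambda
```

Constructing the learner only records the parameters; the GLM itself is fit when the learner is trained on a task.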

Common Parameters

Individual learners have their own sets of parameters. Below is a list of parameters that are implemented by Lrnr_base and shared by all learners.

covariates: A character vector of covariates. The learner will use this to subset the covariates for any specified task.
outcome_type: A variable_type object used to control the outcome_type used by the learner. Overrides the task outcome_type if specified.
...: All other parameters should be handled by the individual learner classes. See the documentation for the learner class you're instantiating.
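The shared covariates parameter could be used along these lines. This is a minimal sketch: the object name `lrnr_h2o_small` is hypothetical, and the covariate names are taken from the cpp_imputed example data used on this page.

```r
library(sl3)

# Restrict this learner to a subset of the task's covariates; the names
# are columns of the cpp_imputed example data used on this page.
lrnr_h2o_small <- Lrnr_h2o_glm$new(
  covariates = c("apgar1", "apgar5", "mage")
)

# The requested subset is stored with the learner's other parameters.
lrnr_h2o_small$params$covariates
```

When such a learner is trained on a task that defines more covariates, it is expected to use only the ones listed here.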

Other Learners: Custom_chain, Lrnr_HarmonicReg, Lrnr_arima, Lrnr_bartMachine, Lrnr_base, Lrnr_bayesglm, Lrnr_bilstm, Lrnr_caret, Lrnr_cv_selector, Lrnr_cv, Lrnr_dbarts, Lrnr_define_interactions, Lrnr_density_discretize, Lrnr_density_hse, Lrnr_density_semiparametric, Lrnr_earth, Lrnr_expSmooth, Lrnr_gam, Lrnr_ga, Lrnr_gbm, Lrnr_glm_fast, Lrnr_glm_semiparametric, Lrnr_glmnet, Lrnr_glmtree, Lrnr_glm, Lrnr_grfcate, Lrnr_grf, Lrnr_gru_keras, Lrnr_gts, Lrnr_h2o_grid, Lrnr_hal9001, Lrnr_haldensify, Lrnr_hts, Lrnr_independent_binomial, Lrnr_lightgbm, Lrnr_lstm_keras, Lrnr_mean, Lrnr_multiple_ts, Lrnr_multivariate, Lrnr_nnet, Lrnr_nnls, Lrnr_optim, Lrnr_pca, Lrnr_pkg_SuperLearner, Lrnr_polspline, Lrnr_pooled_hazards, Lrnr_randomForest, Lrnr_ranger, Lrnr_revere_task, Lrnr_rpart, Lrnr_rugarch, Lrnr_screener_augment, Lrnr_screener_coefs, Lrnr_screener_correlation, Lrnr_screener_importance, Lrnr_sl, Lrnr_solnp_density, Lrnr_solnp, Lrnr_stratified, Lrnr_subset_covariates, Lrnr_svm, Lrnr_tsDyn, Lrnr_ts_weights, Lrnr_xgboost, Pipeline, Stack, undocumented_learner

Examples

library(h2o)
#> 
#> ----------------------------------------------------------------------
#> 
#> Your next step is to start H2O:
#>     > h2o.init()
#> 
#> For H2O package documentation, ask for help:
#>     > ??h2o
#> 
#> After starting H2O, you can use the Web UI at http://localhost:54321
#> For more information visit https://docs.h2o.ai
#> 
#> ----------------------------------------------------------------------
#> 
#> Attaching package: ‘h2o’
#> The following objects are masked from ‘package:data.table’:
#> 
#>     hour, month, week, year
#> The following objects are masked from ‘package:stats’:
#> 
#>     cor, sd, var
#> The following objects are masked from ‘package:base’:
#> 
#>     %*%, %in%, &&, apply, as.factor, as.numeric, colnames, colnames<-,
#>     ifelse, is.character, is.factor, is.numeric, log, log10, log1p,
#>     log2, round, signif, trunc, ||
suppressWarnings(h2o.init())
#> 
#> H2O is not running yet, starting it now...
#> 
#> Note:  In case of errors look at the following log files:
#>     /var/folders/7n/j5jj0p3s3jb5d59l41rbyyt40000gn/T//RtmpTiZ4lE/file10634425fbaa/h2o_Rachael_started_from_r.out
#>     /var/folders/7n/j5jj0p3s3jb5d59l41rbyyt40000gn/T//RtmpTiZ4lE/file1063459ad5806/h2o_Rachael_started_from_r.err
#> 
#> 
#> Starting H2O JVM and connecting: .... Connection successful!
#> 
#> R is connected to the H2O cluster: 
#>     H2O cluster uptime:         4 seconds 22 milliseconds 
#>     H2O cluster timezone:       America/Los_Angeles 
#>     H2O data parsing timezone:  UTC 
#>     H2O cluster version:        3.36.1.2 
#>     H2O cluster version age:    1 year, 2 months and 21 days !!! 
#>     H2O cluster name:           H2O_started_from_R_Rachael_jep458 
#>     H2O cluster total nodes:    1 
#>     H2O cluster total memory:   2.00 GB 
#>     H2O cluster total cores:    4 
#>     H2O cluster allowed cores:  4 
#>     H2O cluster healthy:        TRUE 
#>     H2O Connection ip:          localhost 
#>     H2O Connection port:        54321 
#>     H2O Connection proxy:       NA 
#>     H2O Internal Security:      FALSE 
#>     R Version:                  R version 4.2.0 (2022-04-22) 
#> 

# load example data
data(cpp_imputed)

# create sl3 task
task <- sl3_Task$new(
  cpp_imputed,
  covariates = c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs"),
  outcome = "haz"
)

# train h2o glm learner and make predictions
lrnr_h2o <- Lrnr_h2o_glm$new()
lrnr_h2o_fit <- lrnr_h2o$train(task)
lrnr_h2o_pred <- lrnr_h2o_fit$predict()