Estimation procedure for HAL, the Highly Adaptive LASSO

Usage:

fit_hal(X, Y, X_unpenalized = NULL, degrees = NULL,
  fit_type = c("glmnet", "lassi"), n_folds = 10, use_min = TRUE,
  reduce_basis = NULL, family = c("gaussian", "binomial"),
  return_lasso = FALSE, return_x_basis = FALSE, basis_list = NULL,
  lambda = NULL, cv_select = TRUE, ..., yolo = TRUE)



Arguments:

X: An input matrix whose rows are observations and whose columns are covariates, following standard conventions in statistical learning.


Y: A numeric vector of observations of the outcome variable of interest, with one entry per row of X, following standard conventions in statistical learning.


X_unpenalized: An input matrix with the same format as X whose columns are appended directly to the design matrix, with no basis expansion and no L1 penalty placed on these covariates.
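A common way to exempt columns from the L1 penalty in glmnet is its penalty.factor argument, where a factor of 0 leaves a column unpenalized. The sketch below assumes X_unpenalized is handled in this way; the helper append_unpenalized is hypothetical and not part of the package.

```r
# Hypothetical helper: append unpenalized covariates after the HAL basis
# columns (no basis expansion) and build a matching penalty.factor vector
# of the kind accepted by glmnet::glmnet and glmnet::cv.glmnet.
append_unpenalized <- function(x_basis, X_unpenalized = NULL) {
  if (is.null(X_unpenalized)) {
    return(list(x = x_basis, penalty_factor = rep(1, ncol(x_basis))))
  }
  X_unpenalized <- as.matrix(X_unpenalized)
  stopifnot(nrow(X_unpenalized) == nrow(x_basis))
  list(
    x = cbind(x_basis, X_unpenalized),
    penalty_factor = c(rep(1, ncol(x_basis)),        # penalized basis columns
                       rep(0, ncol(X_unpenalized)))  # exempt from the L1 penalty
  )
}

x_basis <- matrix(c(1, 1, 0, 1, 0, 0), nrow = 3)
z <- matrix(c(0.5, 1.5, 2.5), ncol = 1)
out <- append_unpenalized(x_basis, z)
dim(out$x)          # 3 3
out$penalty_factor  # 1 1 0
```

The two pieces could then be passed on as glmnet(out$x, Y, penalty.factor = out$penalty_factor).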


degrees: The highest order of interaction terms for which basis functions are generated. The default (NULL) generates basis functions for interactions up to the full dimensionality of the input matrix.
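To see why limiting degrees matters, the following sketch counts candidate basis functions. It assumes one basis function per observation (knot) for every non-empty subset of at most degrees covariates, counted before duplicates are removed. The function n_basis_upper_bound is illustrative only.

```r
# Upper bound on the number of candidate basis functions: n knots for each
# non-empty subset of at most `max_degree` of the d covariates.
n_basis_upper_bound <- function(n, d, max_degree = NULL) {
  if (is.null(max_degree)) max_degree <- d  # NULL: full dimensionality
  max_degree <- min(max_degree, d)
  n * sum(choose(d, seq_len(max_degree)))
}

n_basis_upper_bound(n = 100, d = 5)                  # 100 * (2^5 - 1) = 3100
n_basis_upper_bound(n = 100, d = 5, max_degree = 2)  # 100 * (5 + 10)  = 1500
```

With no limit, the sum of choose(d, k) over k = 1..d equals 2^d - 1, so the count grows exponentially in d. Capping the degree keeps it polynomial in d.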


fit_type: The routine used to fit the LASSO regression with cross-validation. The "glmnet" option calls cv.glmnet, while "lassi" calls a faster, custom LASSO routine built on the origami package.


n_folds: Integer number of folds used when splitting the data for cross-validation. Defaults to 10, the conventional choice for V-fold cross-validation.
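V-fold cross-validation assigns each observation to one of V roughly equal folds. Each fold serves once as the validation set while the remaining V - 1 folds are used for training. This base-R sketch shows the split only. It is not the fold scheme used internally by fit_hal or origami, and make_folds_sketch is a hypothetical name.

```r
# Assign n observations to V folds of (nearly) equal size, in random order.
make_folds_sketch <- function(n, V = 10) {
  sample(rep_len(seq_len(V), n))  # shuffled fold labels 1..V
}

set.seed(1)
folds <- make_folds_sketch(n = 53, V = 10)
table(folds)  # every fold holds 5 or 6 observations
# For fold v: train on which(folds != v), validate on which(folds == v).
```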


use_min: Determines which lambda is selected from cv.glmnet: TRUE selects "lambda.min" and FALSE selects "lambda.1se".
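The two choices differ as follows. "lambda.min" minimizes the mean cross-validated error. "lambda.1se" is the largest lambda, and so the sparsest fit, whose mean error lies within one standard error of that minimum. The sketch below reproduces this rule from a cross-validation path. The inputs cvm and cvsd mirror the fields of a cv.glmnet object, but the values are made up for illustration.

```r
# Select lambda.min or lambda.1se from mean CV error (cvm) and its
# standard error (cvsd) along a lambda path.
select_lambda <- function(lambda, cvm, cvsd, use_min = TRUE) {
  i_min <- which.min(cvm)
  if (use_min) return(lambda[i_min])
  threshold <- cvm[i_min] + cvsd[i_min]  # one SE above the minimum
  max(lambda[cvm <= threshold])          # largest lambda within 1 SE
}

lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(2.0, 1.2, 1.05, 1.00, 1.02)
cvsd   <- c(0.10, 0.08, 0.07, 0.06, 0.06)
select_lambda(lambda, cvm, cvsd, use_min = TRUE)   # 0.125
select_lambda(lambda, cvm, cvsd, use_min = FALSE)  # 0.25
```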


reduce_basis: A numeric value in the open interval (0, 1) giving the minimum proportion of 1's a basis function column must contain to be included when fitting the LASSO. Basis functions with a lower proportion of 1's than this cutoff are removed. Defaults to NULL, in which case all basis functions are used in the LASSO-fitting stage of the HAL algorithm.
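The screening rule above can be sketched in a few lines of base R. The helper reduce_basis_columns is hypothetical and only mirrors the described cutoff. It is not the package's implementation.

```r
# Keep indicator columns whose proportion of 1's is at least `reduce_basis`.
reduce_basis_columns <- function(x_basis, reduce_basis = NULL) {
  if (is.null(reduce_basis)) return(x_basis)  # NULL: keep every column
  stopifnot(reduce_basis > 0, reduce_basis < 1)
  keep <- colMeans(x_basis) >= reduce_basis
  x_basis[, keep, drop = FALSE]
}

x_basis <- cbind(a = c(1, 1, 1, 0), b = c(1, 0, 0, 0), c = c(1, 1, 0, 0))
colnames(reduce_basis_columns(x_basis, reduce_basis = 0.3))  # "a" "c"
```

Column b, with one 1 in four rows (proportion 0.25), falls below the 0.3 cutoff and is dropped.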


family: A character string giving the error family of the generalized linear model. Options are limited to "gaussian", for standard linear regression, and "binomial", for logistic regression.


return_lasso: A logical indicating whether to return the glmnet fit of the LASSO model.


return_x_basis: A logical indicating whether to return the matrix of (possibly reduced) basis functions used in the HAL LASSO fit.


basis_list: The full set of basis functions generated from the input data X (via a call to enumerate_basis). When all interaction orders are used, it contains at most n * (2^d - 1) basis functions, one per observation for each of the 2^d - 1 non-empty subsets of columns, where n is the number of observations and d is the number of columns in X.
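To make that count concrete, the following sketch builds the full basis for a tiny X. It assumes zero-order indicator basis functions of the form prod over j in s of 1{x_j >= X[i, j]}, one for each non-empty column subset s and each observation i used as a knot, with no duplicates removed. The function full_hal_basis is hypothetical and much slower than the package's C++ code.

```r
# Build every indicator basis column for each non-empty subset of columns
# and each observation used as a knot; duplicates are NOT removed here.
full_hal_basis <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X); d <- ncol(X)
  subsets <- unlist(lapply(seq_len(d), function(k) combn(d, k, simplify = FALSE)),
                    recursive = FALSE)
  cols <- list()
  for (s in subsets) {
    for (i in seq_len(n)) {
      knot <- X[i, s]
      # row r gets 1 when X[r, j] >= knot for every column j in s
      ind <- apply(X[, s, drop = FALSE], 1, function(row) all(row >= knot))
      cols[[length(cols) + 1]] <- as.numeric(ind)
    }
  }
  do.call(cbind, cols)
}

X <- cbind(x1 = c(0.2, 0.5, 0.9), x2 = c(1, 3, 2))
B <- full_hal_basis(X)
dim(B)  # 3 rows, 3 * (2^2 - 1) = 9 columns
```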


lambda: A user-specified array of values of the lambda tuning parameter for the LASSO (L1-penalized) regression. If NULL, cv.glmnet is used to select a cross-validation-optimal value of this regularization parameter automatically. If specified and cv_select is FALSE, the LASSO model is fit via glmnet, returning regularized coefficient values for each value in the input array.


cv_select: A logical specifying whether the user-supplied lambda values should be passed to cv.glmnet so that the cross-validation-optimal value is picked (TRUE), or whether the model should simply be fit along that sequence of values (or single value) using glmnet (FALSE).


...: Other arguments passed to cv.glmnet. Please consult the glmnet documentation for a full list of options.


yolo: A logical indicating whether to print one of a curated selection of quotes from the HAL 9000 computer in the critically acclaimed epic science-fiction film "2001: A Space Odyssey" (1968).


Value:

An object of class hal9001 containing a list of basis functions, a copy map, the coefficients estimated for the basis functions, and timing results (for assessing computational efficiency).


Details:

The procedure uses a custom C++ implementation to generate a design matrix of basis functions corresponding to covariates and their interactions, and to remove duplicate indicator columns. The LASSO regression is then fit to this (usually) very wide matrix, either with a custom implementation (based on the origami package) or via a call to cv.glmnet from the glmnet package.
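Duplicate removal and the copy map it produces can be pictured with the following base-R sketch. Each retained column records which original columns it stands in for. The helper dedup_columns is hypothetical, and the package's copy map may be structured differently.

```r
# Remove duplicate columns and record, for each kept column, the indices of
# all original columns identical to it (a simple "copy map").
dedup_columns <- function(x_basis) {
  keys <- apply(x_basis, 2, paste, collapse = ",")  # one string per column
  first <- match(keys, keys)                        # first identical column
  keep <- unique(first)
  copy_map <- split(seq_along(keys), first)         # names = kept column index
  list(x = x_basis[, keep, drop = FALSE], copy_map = copy_map)
}

x_basis <- cbind(c(1, 1, 0), c(1, 0, 0), c(1, 1, 0), c(1, 0, 0))
res <- dedup_columns(x_basis)
ncol(res$x)   # 2
res$copy_map  # $`1`: 1 3   $`2`: 2 4
```

Because duplicate indicator columns would receive interchangeable coefficients, fitting the LASSO on the deduplicated matrix shrinks the problem without losing information. The copy map lets each kept column be traced back to the basis functions it represents.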