Conditional Density Estimation with the Highly Adaptive LASSO

Format

An R6Class object inheriting from Lrnr_base.

Value

A learner object inheriting from Lrnr_base with methods for training and prediction. For a full list of learner functionality, see the complete documentation of Lrnr_base.

Parameters

  • grid_type = "equal_range": A character indicating the strategy to be used in creating bins along the observed support of A. For bins of equal range, use "equal_range"; consult the documentation of cut_interval for further information. To ensure each bin has the same number of observations, use "equal_mass"; consult the documentation of cut_number for details. The default is "equal_range" since this has been found to provide better performance in simulation experiments; however, both types may be specified (i.e., c("equal_range", "equal_mass")) together, in which case cross-validation will be used to select the optimal binning strategy.

  • n_bins = c(3, 5): A numeric vector giving the number of bins into which the support of A is to be divided. As with grid_type, multiple values may be specified, in which case cross-validation will be used to select the optimal number of bins.

  • lambda_seq = exp(seq(-1, -13, length = 1000L)): A numeric sequence of regularization parameter values for Lasso regression. These are passed to fit_hal via its lambda argument, which in turn passes them to glmnet.

  • trim_dens = 1/sqrt(n): A numeric giving the minimum allowed value of the resultant density predictions. Any predicted density values below this tolerance threshold are set to the indicated minimum. The default is to use the inverse of the square root of the sample size of the prediction set, i.e., 1/sqrt(n); another notable choice is 1/sqrt(n)/log(n). If there are observations in the prediction set with values of new_A outside of the support of the training set, their predictions are similarly truncated.

  • ...: Other arguments to be passed directly to haldensify. See its documentation for details.
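
The two binning strategies named by grid_type can be illustrated in base R. This is a minimal sketch under stated assumptions, not the code haldensify runs internally: the helper make_breaks is hypothetical, and base seq and quantile stand in for cut_interval and cut_number.

```r
# Hypothetical helper (not part of sl3 or haldensify): compute bin
# breakpoints along the observed support of A.
make_breaks <- function(A, n_bins,
                        grid_type = c("equal_range", "equal_mass")) {
  grid_type <- match.arg(grid_type)
  if (grid_type == "equal_range") {
    # bins of equal width spanning the observed range of A
    seq(min(A), max(A), length.out = n_bins + 1)
  } else {
    # bins holding (approximately) the same number of observations
    unname(quantile(A, probs = seq(0, 1, length.out = n_bins + 1)))
  }
}

set.seed(1)
A <- rexp(200) # simulated, skewed data so the two strategies differ

breaks_range <- make_breaks(A, n_bins = 5, grid_type = "equal_range")
breaks_mass <- make_breaks(A, n_bins = 5, grid_type = "equal_mass")

# equal_range: identical bin widths, uneven counts
widths_range <- diff(breaks_range)
counts_range <- table(cut(A, breaks_range, include.lowest = TRUE))

# equal_mass: uneven bin widths, (near-)identical counts
widths_mass <- diff(breaks_mass)
counts_mass <- table(cut(A, breaks_mass, include.lowest = TRUE))
```

On skewed data such as this, equal_range leaves few observations in the upper bins, while equal_mass produces wide bins in the sparse tail. Supplying both values of grid_type lets cross-validation choose between them.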

Examples

library(sl3)
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:data.table’:
#> 
#>     between, first, last
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
data(cpp_imputed)
covars <- c("parity", "sexn")
outcome <- "haz"

# create task
task <- cpp_imputed %>%
  slice(seq(1, nrow(.), by = 3)) %>%
  filter(agedays == 1) %>%
  sl3_Task$new(
    covariates = covars,
    outcome = outcome
  )

# instantiate the learner
hal_dens <- Lrnr_haldensify$new(
  grid_type = "equal_range",
  n_bins = c(3, 5),
  lambda_seq = exp(seq(-1, -13, length = 100))
)

# fit and predict densities
hal_dens_fit <- hal_dens$train(task)
hal_dens_preds <- hal_dens_fit$predict()
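
The truncation controlled by trim_dens can be sketched in base R. This is an illustration under assumptions rather than the code inside Lrnr_haldensify: trim_density is a hypothetical helper, and preds is made-up data standing in for a vector of predictions such as hal_dens_preds.

```r
# Hypothetical helper (not part of sl3): raise any density prediction
# below `trim_min` up to `trim_min`, leaving larger values unchanged.
trim_density <- function(preds, trim_min = 1 / sqrt(length(preds))) {
  pmax(preds, trim_min)
}

# made-up density predictions for a prediction set of size n = 4
preds <- c(0.9, 0.2, 0.001, 0)
n <- length(preds)

# default floor: 1 / sqrt(n), here 1 / sqrt(4) = 0.5
trimmed_default <- trim_density(preds)

# the alternative floor mentioned under trim_dens: 1 / sqrt(n) / log(n)
trimmed_alt <- trim_density(preds, trim_min = 1 / sqrt(n) / log(n))
```

A floor of this kind keeps the estimated density bounded away from zero, which matters when the predictions are later used as denominators, for example in inverse weighting.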