HAL Formula term: Generate a single term of the HAL basis

h(
  ...,
  k = NULL,
  s = NULL,
  pf = 1,
  monotone = c("none", "i", "d"),
  . = NULL,
  dot_args_as_string = FALSE,
  X = NULL
)

Arguments

...

Variables for which to generate a multivariate interaction basis function, where the variables can be found in a matrix X in a parent environment/frame. Note that, just like in standard formula objects, the variables should not be character strings (e.g., write h(W1,W2), not h("W1", "W2")). h(W1,W2,W3) will generate three-way HAL basis functions between W1, W2, and W3. It will not generate the lower-dimensional basis functions.
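
To make the idea of a three-way basis function concrete, here is a minimal base R sketch of a single zero-order tensor-product term, assuming it takes the usual form of a product of indicators, one per variable. The helper tensor_indicator and the knot values are hypothetical and are not part of the package.

```r
# Hypothetical helper (not a package function): evaluate one zero-order
# tensor-product basis function on the rows of X.
tensor_indicator <- function(X, knot) {
  # knot is a named numeric vector: one cut point per interacting variable
  cols <- X[, names(knot), drop = FALSE]
  # compare every column with its own knot
  above <- sweep(cols, 2, knot, FUN = ">=")
  # the product of the indicators is 1 only when all comparisons hold
  as.numeric(apply(above, 1, all))
}

X <- cbind(
  W1 = c(0.1, 0.5, 0.9),
  W2 = c(0.2, 0.6, 0.4),
  W3 = c(0.3, 0.7, 0.8)
)
phi <- tensor_indicator(X, knot = c(W1 = 0.5, W2 = 0.4, W3 = 0.5))
phi  # 0 1 1
```

The term is nonzero only for rows in which all three variables lie at or above their knots, which is why it carries no information about the one-way or two-way terms.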

k

The number of knots for each univariate basis function used to generate the tensor product basis functions. If a single value is given, it is used for the univariate basis functions of every variable. Otherwise, this should be a list, named by variable, that specifies how many knot points should be used for each variable. h(W1,W2,W3, k = list(W1 = 3, W2 = 2, W3 = 1)) is equivalent to first binning the variables W1, W2, and W3 into 3, 2, and 1 unique values, respectively, and then calling h(W1,W2,W3). This coarsening of the data ensures that fewer basis functions are generated, which can lead to substantial computational speed-ups. If not provided and the variable num_knots is in the parent environment, then k will be set to num_knots.
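
The binning described above can be sketched in base R as follows. This is only an illustration under an assumed quantile-based rule; the helper coarsen is hypothetical, and the package's actual knot selection may differ.

```r
# Hypothetical helper (not a package function): snap each value down to
# the nearest of at most k quantile-based knots.
coarsen <- function(x, k) {
  # k left edges of equal-probability bins, starting at the minimum
  probs <- seq(0, 1, length.out = k + 1)[-(k + 1)]
  knots <- unique(quantile(x, probs = probs, type = 1, names = FALSE))
  # findInterval() returns the index of the largest knot <= each value
  knots[findInterval(x, knots)]
}

W1 <- c(0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95, 1.1, 1.25)
length(unique(coarsen(W1, 3)))  # 3 distinct values remain
length(unique(coarsen(W1, 1)))  # 1 distinct value remains
```

Fewer distinct values per variable means fewer candidate knots, and therefore fewer columns in the generated basis.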

s

The smoothness order of the basis functions. The possible values are 0 for piecewise-constant zero-order splines or 1 for piecewise-linear first-order splines. If not provided and the variable smoothness_orders is in the parent environment, then s will be set to smoothness_orders.
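
The difference between the two orders can be seen in a short base R sketch. It assumes the common forms of these splines, an indicator for order 0 and a truncated linear function for order 1; the function names are hypothetical and the package's internal scaling may differ.

```r
# Hypothetical helpers (not package functions).
# Order 0: piecewise constant, jumps from 0 to 1 at the knot.
basis_order0 <- function(x, knot) as.numeric(x >= knot)
# Order 1: piecewise linear, zero up to the knot and then slope 1.
basis_order1 <- function(x, knot) pmax(x - knot, 0)

x <- c(0, 1, 2, 3)
basis_order0(x, knot = 1)  # 0 1 1 1
basis_order1(x, knot = 1)  # 0 0 1 2
```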

pf

A penalty.factor value for the generated basis functions, used by glmnet in the LASSO penalization procedure. pf = 1 (the default) is the standard penalization factor used by glmnet, and pf = 0 means that the generated basis functions are unpenalized.

monotone

Whether the basis functions should enforce monotonicity of the interaction term. If s = 0, this is monotonicity of the function, and, if s = 1, this is monotonicity of its derivative (e.g., enforcing a convex fit). Set "none" for no constraints, "i" for a monotone increasing constraint, and "d" for a monotone decreasing constraint. Using "i" constrains the basis functions to have positive coefficients in the fit, and "d" constrains the basis functions to have negative coefficients.
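
Why a sign constraint on the coefficients yields a monotone fit can be checked with a small base R sketch for the s = 0 case. The knots and coefficients below are made up for illustration, and fit_fun is a hypothetical stand-in for a fitted univariate term.

```r
# Made-up knots and nonnegative coefficients, as "i" would require.
knots <- c(0.2, 0.5, 0.8)
beta  <- c(0.3, 0.0, 1.2)

# Hypothetical fitted term: sum_j beta_j * 1(x >= knot_j)
fit_fun <- function(x) {
  # one column per zero-order basis function, coded as 0/1
  basis <- outer(x, knots, FUN = ">=") * 1
  as.vector(basis %*% beta)
}

x_grid <- seq(0, 1, by = 0.05)
all(diff(fit_fun(x_grid)) >= 0)  # TRUE: the fit never decreases
```

Each basis function is itself nondecreasing in x, so any combination with nonnegative weights is nondecreasing as well; flipping the signs gives the "d" case.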

.

Just like with formula, . as in h(.) or h(.,.) is treated as a wildcard variable that generates terms using all variables in the data. The argument . should be a character vector of the variable names that the wildcard iterates over. Specifically, h(., k=1, . = c("W1", "W2", "W3")) is equivalent to h(W1, k=1) + h(W2, k=1) + h(W3, k=1), and h(., ., k=1, . = c("W1", "W2", "W3")) is equivalent to h(W1,W2, k=1) + h(W2,W3, k=1) + h(W1,W3, k=1).
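
The wildcard expansion amounts to enumerating all combinations of the supplied variable names. The base R sketch below reproduces the two equivalences above as strings; expand_dot is a hypothetical helper and does not reflect how the package parses formulas internally.

```r
# Hypothetical helper (not a package function): write out the terms that
# n_dots wildcards expand to, as a formula-style string.
expand_dot <- function(vars, n_dots, k = 1) {
  # every unordered combination of n_dots distinct variables
  combos <- combn(vars, n_dots, simplify = FALSE)
  terms <- vapply(
    combos,
    function(v) sprintf("h(%s, k=%d)", paste(v, collapse = ","), k),
    character(1)
  )
  paste(terms, collapse = " + ")
}

vars <- c("W1", "W2", "W3")
expand_dot(vars, 1)  # "h(W1, k=1) + h(W2, k=1) + h(W3, k=1)"
expand_dot(vars, 2)  # "h(W1,W2, k=1) + h(W1,W3, k=1) + h(W2,W3, k=1)"
```

The order of the terms in a sum does not matter, so the two-way result matches the expansion listed above up to reordering.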

dot_args_as_string

Whether the arguments in ... are character strings or character vectors and should thus be evaluated directly. When TRUE, the expression h("W1", "W2") can be used.

X

An optional design matrix where the variables given in ... can be found. Otherwise, X is taken from the parent environment.