HAL Formula term: Generate a single term of the HAL basis
h(
...,
k = NULL,
s = NULL,
pf = 1,
monotone = c("none", "i", "d"),
. = NULL,
dot_args_as_string = FALSE,
X = NULL
)
Variables for which to generate the multivariate interaction basis
functions, where the variables can be found in a matrix X in a parent
environment/frame. Note that, just like standard formula objects, the
variables should not be given as character strings (e.g., use h(W1,W2), not
h("W1", "W2")). h(W1,W2,W3) generates the three-way HAL basis functions
between W1, W2, and W3. It does not generate the lower-dimensional basis
functions.
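To make the distinction between a three-way term and its lower-dimensional counterparts concrete, here is a minimal base-R sketch. It assumes the usual zero-order HAL form, in which a tensor-product basis function is a product of indicators 1(x_j >= knot_j). The helper tensor_indicator and the toy matrix are hypothetical and are not part of the package.

```r
# Toy design matrix with three covariates (hypothetical data).
X <- cbind(
  W1 = c(0.1, 0.4, 0.5, 0.8, 0.9),
  W2 = c(1, 3, 2, 5, 4),
  W3 = c(0, 1, 1, 0, 1)
)

# Zero-order tensor-product basis function for the variables in `vars`:
# the product over j of the indicators 1(x_j >= knot_j).
tensor_indicator <- function(X, vars, knot) {
  indicators <- lapply(vars, function(v) as.numeric(X[, v] >= knot[[v]]))
  Reduce(`*`, indicators)
}

knot <- X[2, ]  # use the second observation as the knot point

three_way <- tensor_indicator(X, c("W1", "W2", "W3"), knot)  # 0 1 0 0 1
two_way   <- tensor_indicator(X, c("W1", "W2"), knot)        # 0 1 0 1 1
```

The two columns differ, which is why a term such as h(W1,W2) has to be requested separately if the two-way functions are wanted.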
The number of knots for each univariate basis function used to generate the
tensor product basis functions. If a single value, then this value is used
for the univariate basis functions of every variable. Otherwise, this should
be a named list that specifies, for each variable, how many knot points
should be used. h(W1,W2,W3, k = list(W1 = 3, W2 = 2, W3=1)) is equivalent to
first binning the variables W1, W2 and W3 into 3, 2 and 1 unique values and
then calling h(W1,W2,W3). This coarsening of the data ensures that fewer
basis functions are generated, which can lead to substantial computational
speed-ups. If not provided and the variable num_knots is in the parent
environment, then k will be set to num_knots.
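The binning described above can be sketched in base R. The helper coarsen below is hypothetical and uses evenly spaced empirical quantiles as grid values. The package's own binning rule may differ, so treat this only as an illustration of how k limits the number of unique values per variable.

```r
# Hypothetical helper: snap each value of x up to one of at most k grid values
# taken at evenly spaced empirical quantiles (type = 1 returns observed values).
coarsen <- function(x, k) {
  probs <- seq_len(k) / k
  grid <- unique(quantile(x, probs = probs, type = 1, names = FALSE))
  # Index of the smallest grid value that is >= x.
  grid[findInterval(x, grid, left.open = TRUE) + 1]
}

W1 <- c(0.2, 1.5, 0.7, 2.9, 1.1, 0.4, 2.2, 1.8, 0.9)
W2 <- c(5, 3, 8, 1, 9, 2, 7, 4, 6)
W3 <- c(10, 40, 20, 30, 50, 90, 70, 60, 80)

# Mirror k = list(W1 = 3, W2 = 2, W3 = 1).
binned <- list(W1 = coarsen(W1, 3), W2 = coarsen(W2, 2), W3 = coarsen(W3, 1))
n_unique <- sapply(binned, function(v) length(unique(v)))
```

Here n_unique holds the number of distinct values left in each variable after coarsening, which bounds the number of candidate knot points per variable.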
The smoothness_orders for the basis functions. The possible values are 0 for
piece-wise constant zero-order splines or 1 for piece-wise linear first-order
splines. If not provided and the variable smoothness_orders is in the parent
environment, then s will be set to smoothness_orders.
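As a small illustration of the two smoothness orders, the sketch below evaluates one univariate basis function at a single knot, assuming the usual truncated-power form: an indicator for s = 0 and a hinge (x - knot)_+ for s = 1. The values of x and knot are made up.

```r
x <- c(0, 0.25, 0.5, 0.75, 1)
knot <- 0.5

# s = 0: piece-wise constant, jumps from 0 to 1 at the knot.
basis_s0 <- as.numeric(x >= knot)  # 0 0 1 1 1

# s = 1: piece-wise linear, zero up to the knot and slope 1 afterwards.
basis_s1 <- pmax(x - knot, 0)      # 0 0 0 0.25 0.5
```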
A penalty.factor value for the generated basis functions that is used by
glmnet in the LASSO penalization procedure. pf = 1 (default) is the standard
penalization factor used by glmnet, and pf = 0 means the generated basis
functions are unpenalized.
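The sketch below shows how a per-term pf could translate into the per-column vector that glmnet accepts through its penalty.factor argument. The term names and column counts are hypothetical, and how the package assembles this vector internally is an assumption.

```r
# Hypothetical number of basis functions (columns) generated by each term.
n_basis <- c(h_W1 = 4, h_W2 = 3)

# pf = 0 leaves the h(W1) columns unpenalized; pf = 1 is the default penalty.
pf <- c(h_W1 = 0, h_W2 = 1)

# One penalty factor per column of the combined basis matrix.
penalty_factor <- rep(pf, times = n_basis)
```

A vector of this shape has one entry per basis column, so every column produced by a term shares that term's pf.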
Whether the basis functions should enforce monotonicity of the interaction
term. If s = 0, this is monotonicity of the function, and, if s = 1, this is
monotonicity of its derivative (e.g., enforcing a convex fit). Set "none" for
no constraints, "i" for a monotone increasing constraint, and "d" for a
monotone decreasing constraint. Using "i" constrains the basis functions to
have positive coefficients in the fit, and "d" constrains the basis functions
to have negative coefficients.
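Why a sign constraint on the coefficients yields monotonicity can be checked numerically. The base-R sketch below uses made-up knots and nonnegative coefficients, and assumes the indicator and hinge basis forms for s = 0 and s = 1. It does not call the package.

```r
x <- seq(0, 1, by = 0.1)
knots <- c(0.2, 0.5, 0.8)
beta <- c(0.5, 1.2, 0.3)  # all coefficients nonnegative, as under "i"

# s = 0: a nonnegative combination of step functions is nondecreasing.
B0 <- outer(x, knots, function(x, k) as.numeric(x >= k))
f0 <- drop(B0 %*% beta)
increasing <- all(diff(f0) >= 0)

# s = 1: a nonnegative combination of hinges has a nondecreasing slope,
# i.e. the function is convex (second differences are nonnegative).
B1 <- outer(x, knots, function(x, k) pmax(x - k, 0))
f1 <- drop(B1 %*% beta)
convex <- all(diff(f1, differences = 2) >= -1e-12)  # tolerance for rounding
```

Flipping the sign of beta gives the mirror case described for "d": a nonincreasing function for s = 0 and a concave one for s = 1.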
Just like with formula, . as in h(.) or h(.,.) is treated as a wildcard
variable that generates terms using all variables in the data. The argument .
should be a character vector of variable names that . iterates over.
Specifically, h(., k=1, . = c("W1", "W2", "W3")) is equivalent to
h(W1, k=1) + h(W2, k=1) + h(W3, k=1), and
h(., ., k=1, . = c("W1", "W2", "W3")) is equivalent to
h(W1,W2, k=1) + h(W2,W3, k=1) + h(W1, W3, k=1).
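The wildcard expansion amounts to enumerating all unordered combinations of the supplied variable names. The helper expand_wildcard below is hypothetical and only builds the equivalent formula text. It is a sketch of the equivalence stated above, not the package's parser.

```r
# Hypothetical helper: write out the terms that n_dots wildcards expand to.
expand_wildcard <- function(vars, n_dots, k = 1) {
  combos <- utils::combn(vars, n_dots, simplify = FALSE)
  terms <- vapply(
    combos,
    function(v) sprintf("h(%s, k = %s)", paste(v, collapse = ", "), k),
    character(1)
  )
  paste(terms, collapse = " + ")
}

vars <- c("W1", "W2", "W3")
one_dot  <- expand_wildcard(vars, 1)  # corresponds to h(., k = 1)
two_dots <- expand_wildcard(vars, 2)  # corresponds to h(., ., k = 1)
```

The order of the generated terms may differ from the order written above, which does not change the set of terms.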
Whether the arguments ... are characters or character vectors and should thus
be evaluated directly. When TRUE, the expression h("W1", "W2") can be used.
An optional design matrix where the variables given in ... can be found.
Otherwise, X is taken from the parent environment.
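The fallback to the parent environment can be pictured with a small base-R sketch. The helper resolve_X is hypothetical. The exact lookup the function performs is not stated here, so the use of get() on the caller's frame is an assumption.

```r
# Hypothetical helper mimicking the fallback: use X if supplied, otherwise
# look for an object named "X" starting from the caller's frame.
resolve_X <- function(X = NULL) {
  if (is.null(X)) {
    X <- get("X", envir = parent.frame())
  }
  X
}

caller <- function() {
  X <- cbind(W1 = 1:3, W2 = 4:6)  # design matrix living in the caller's frame
  resolve_X()
}

found    <- caller()                         # picked up from the caller
explicit <- resolve_X(X = cbind(W1 = 7:9))   # passed directly
```

Passing X explicitly avoids depending on whatever object named X happens to be visible from the calling frame.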