Process data to account for missingness in preparation for TMLE
process_missing(data, node_list, complete_nodes = c("A", "Y"), impute_nodes = NULL, max_p_missing = 0.5)
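| Argument | Description |
|---|---|
| data | The dataset to process. |
| node_list | The list of nodes for the dataset. |
| complete_nodes | Nodes that must be fully observed; rows with missingness in any of them are dropped. Defaults to c("A", "Y"). |
| impute_nodes | Nodes whose missing values are median-imputed, with indicators of missingness generated. Defaults to NULL. |
| max_p_missing | The maximum proportion of missingness allowed in a covariate; covariates exceeding it are dropped. Defaults to 0.5. |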
A list containing the following elements:
- data, the updated dataset
- node_list, the updated list of nodes
- n_dropped, the number of observations dropped
- dropped_cols, the variables dropped due to excessive missingness
Rows with missingness in any of the complete_nodes will be dropped. Then, missing values will be median-imputed for the variables in the impute_nodes, and indicator variables of missingness will be generated for these nodes.
Then covariates will be processed as follows:
- any covariate with more than max_p_missing missingness will be dropped
- indicators of missingness will be generated
- missing values will be median-imputed
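
The processing described above can be sketched in base R. This is only an illustration of the documented steps, not the package's implementation: the function name `sketch_process_missing`, the `missing_` prefix for indicator columns, the toy dataset, and the assumption that `node_list` is a named list mapping node names to numeric column names in a data.frame are all hypothetical.

```r
# Illustrative sketch of the documented steps; NOT the package's own code.
# Assumption: node_list is a named list mapping node names to column names,
# and all affected columns are numeric.
sketch_process_missing <- function(data, node_list,
                                   complete_nodes = c("A", "Y"),
                                   impute_nodes = NULL,
                                   max_p_missing = 0.5) {
  complete_cols <- unlist(node_list[complete_nodes], use.names = FALSE)
  impute_cols <- unlist(node_list[impute_nodes], use.names = FALSE)
  covariate_cols <- setdiff(unlist(node_list, use.names = FALSE),
                            c(complete_cols, impute_cols))

  # Step 1: drop rows with missingness in any of the complete nodes.
  keep <- stats::complete.cases(data[, complete_cols, drop = FALSE])
  n_dropped <- sum(!keep)
  data <- data[keep, , drop = FALSE]

  # Step 2: drop covariates whose proportion of missingness is too high.
  p_missing <- vapply(covariate_cols,
                      function(col) mean(is.na(data[[col]])),
                      numeric(1))
  dropped_cols <- covariate_cols[p_missing > max_p_missing]
  data <- data[, setdiff(names(data), dropped_cols), drop = FALSE]
  # This sketch only removes dropped columns from node_list; it does not
  # register the new indicator columns there.
  node_list <- lapply(node_list, setdiff, dropped_cols)

  # Step 3: for impute nodes and remaining covariates, add a missingness
  # indicator (hypothetical "missing_" prefix) and median-impute.
  for (col in c(impute_cols, setdiff(covariate_cols, dropped_cols))) {
    is_missing <- is.na(data[[col]])
    if (any(is_missing)) {
      data[[paste0("missing_", col)]] <- as.numeric(is_missing)
      data[[col]][is_missing] <- stats::median(data[[col]], na.rm = TRUE)
    }
  }

  list(data = data, node_list = node_list,
       n_dropped = n_dropped, dropped_cols = dropped_cols)
}

# Toy data (made up for illustration): W1 and W2 are covariates.
toy <- data.frame(
  W1 = c(1, NA, 3, 4, 5),
  W2 = c(NA, NA, NA, NA, 1),
  A  = c(0, 1, 1, 0, NA),
  Y  = c(2.5, 3.1, NA, 4.0, 1.2)
)
nodes <- list(W = c("W1", "W2"), A = "A", Y = "Y")
result <- sketch_process_missing(toy, nodes)
```

In this toy example, the two rows with a missing A or Y are removed first, so the proportion of missingness in each covariate is computed on the remaining rows. Whether the actual function measures missingness before or after dropping rows is not stated in the text above; the sketch follows the order in which the steps are described.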