Process data to account for missingness in preparation for TMLE

process_missing(data, node_list, complete_nodes = c("A", "Y"),
impute_nodes = NULL, max_p_missing = 0.5)

## Arguments

• data, data.table, containing the missing variables

• node_list, list, what variables comprise each node

• complete_nodes, character vector, nodes we must observe

• impute_nodes, character vector, nodes we will impute

• max_p_missing, numeric, the tolerable proportion of missing values. A variable whose missingness exceeds this proportion will be dropped from the analysis

## Value

list containing the following elements:

• data, the updated dataset

• node_list, the updated list of nodes

• n_dropped, the number of observations dropped

• dropped_cols, the variables dropped due to excessive missingness

## Details

Rows with missing values in any of the complete_nodes will be dropped. Then, missing values in the impute_nodes variables will be median-imputed, and indicator variables of missingness will be generated for these nodes.
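
As a rough illustration of the row-dropping step, the following base R sketch removes rows with missing values in the complete nodes. It is not the package's implementation: the toy data, the contents of toy_nodes, and the helper name drop_incomplete_rows are assumptions made for this example, and a plain data.frame is used instead of a data.table to keep it self-contained.

```r
# Hypothetical helper (not part of the package): drop rows that have NA
# in any variable belonging to the nodes we must observe.
drop_incomplete_rows <- function(data, node_list, complete_nodes = c("A", "Y")) {
  # Collect the column names that make up the complete nodes
  complete_vars <- unlist(node_list[complete_nodes], use.names = FALSE)
  # TRUE for rows with no NA in any of those columns
  keep <- stats::complete.cases(data[, complete_vars, drop = FALSE])
  list(data = data[keep, , drop = FALSE], n_dropped = sum(!keep))
}

# Toy data: row 3 is missing A, row 4 is missing Y
toy <- data.frame(
  W1 = c(1, NA, 3, 4),
  A  = c(0, 1, NA, 1),
  Y  = c(2.5, 3.1, 1.0, NA)
)
toy_nodes <- list(W = "W1", A = "A", Y = "Y")

res <- drop_incomplete_rows(toy, toy_nodes)
res$n_dropped
res$data
```

Note that the missing W1 value in the second row is left in place here, since W is not one of the complete nodes in this example.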

Then covariates will be processed as follows:

1. any covariate with more than max_p_missing missingness will be dropped

2. indicators of missingness will be generated

3. missing values will be median-imputed
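
The three covariate steps above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's code: the helper name process_covariates, the "delta_" prefix for indicator columns, the coding of the indicator (1 = observed), and the choice to create indicators only for covariates that actually have missing values are all assumptions for illustration.

```r
# Hypothetical helper (not part of the package) mirroring the three steps.
process_covariates <- function(data, covariates, max_p_missing = 0.5) {
  dropped_cols <- character(0)
  for (v in covariates) {
    is_missing <- is.na(data[[v]])
    p_missing <- mean(is_missing)
    if (p_missing > max_p_missing) {
      # Step 1: drop covariates with more than max_p_missing missingness
      data[[v]] <- NULL
      dropped_cols <- c(dropped_cols, v)
    } else if (p_missing > 0) {
      # Step 2: add a missingness indicator (assumed coding: 1 = observed)
      data[[paste0("delta_", v)]] <- as.numeric(!is_missing)
      # Step 3: replace missing values with the median of the observed ones
      data[[v]][is_missing] <- stats::median(data[[v]], na.rm = TRUE)
    }
  }
  list(data = data, dropped_cols = dropped_cols)
}

toy <- data.frame(
  W1 = c(1, NA, 3, 5),    # 25% missing: indicator added, then imputed
  W2 = c(NA, NA, NA, 2),  # 75% missing: dropped
  W3 = c(10, 20, 30, 40)  # fully observed: left untouched
)

out <- process_covariates(toy, c("W1", "W2", "W3"))
out$dropped_cols
out$data
```

Creating the indicator before imputing matters: once the missing values are filled in, the information about which entries were originally missing can no longer be recovered from the column itself.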