3 The WASH Benefits Example Dataset
The data come from WASH Benefits Bangladesh, a cluster-randomised controlled trial of the effect of water quality, sanitation, handwashing, and nutritional interventions on child development in rural Bangladesh (Luby et al. 2018). The study enrolled pregnant women in their first or second trimester from rural villages in the Gazipur, Kishoreganj, Mymensingh, and Tangail districts of central Bangladesh, with an average of eight women per cluster. Groups of eight geographically adjacent clusters were block-randomised, using a random number generator, into six intervention groups and a double-sized control group. All intervention groups received weekly visits from a community health promoter for the first 6 months and visits every 2 weeks for the next 18 months; the control group received no intervention and no health promoter visits. The six intervention groups were:
- chlorinated drinking water;
- improved sanitation;
- handwashing with soap;
- combined water, sanitation, and handwashing;
- improved nutrition through counseling and provision of lipid-based nutrient supplements; and
- combined water, sanitation, handwashing, and nutrition.
In the workshop, we concentrate on child growth, measured by the weight-for-height Z-score, as the outcome of interest. For reference, this trial was registered with ClinicalTrials.gov as NCT01590095.
```r
library(tidyverse)

# read in data
dat <- read_csv("https://raw.githubusercontent.com/tlverse/tlverse-data/master/wash-benefits/washb_data.csv")
dat
#> # A tibble: 4,695 × 28
#> whz tr fracode month aged sex momage momedu momheight hfiacat Nlt18
#> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 0 Control N05265 9 268 male 30 Prima… 146. Food S… 3
#> 2 -1.16 Control N05265 9 286 male 25 Prima… 149. Modera… 2
#> 3 -1.05 Control N08002 9 264 male 25 Prima… 152. Food S… 1
#> 4 -1.26 Control N08002 9 252 female 28 Prima… 140. Food S… 3
#> 5 -0.59 Control N06531 9 336 female 19 Secon… 151. Food S… 2
#> # … with 4,690 more rows, and 17 more variables: Ncomp <dbl>, watmin <dbl>,
#> # elec <dbl>, floor <dbl>, walls <dbl>, roof <dbl>, asset_wardrobe <dbl>,
#> # asset_table <dbl>, asset_chair <dbl>, asset_khat <dbl>, asset_chouki <dbl>,
#> # asset_tv <dbl>, asset_refrig <dbl>, asset_bike <dbl>, asset_moto <dbl>,
#> # asset_sewmach <dbl>, asset_mobile <dbl>
```
For the purposes of this workshop, we start by treating the data as independent and identically distributed (i.i.d.) random draws from a very large target population. The data are in fact clustered within sampled geographic units, and options for accounting for this clustering are available, but we set these details aside in the workshop presentations for simplicity. Modifications of our methodology for biased samples, repeated measures, and similar settings are also available.
We have 28 variables measured, of which one is set to be the outcome of interest. This outcome, \(Y\), is the weight-for-height Z-score (`whz` in `dat`); the treatment of interest, \(A\), is the randomized treatment group (`tr` in `dat`); and the adjustment set, \(W\), consists simply of everything else. This results in our observed data structure being \(n\) i.i.d. copies of \(O_i = (W_i, A_i, Y_i)\), for \(i = 1, \ldots, n\).
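As a minimal sketch of this bookkeeping, the roles of the columns can be recorded in a named list. The helper `make_node_list()` and the toy data frame below are hypothetical, introduced only for illustration; they are not part of the workshop code or of any package. The toy values are copied from the first rows printed above.

```r
# Hypothetical helper: split column names into the roles
# W (covariates), A (treatment), and Y (outcome).
make_node_list <- function(data, outcome = "whz", treatment = "tr") {
  stopifnot(outcome %in% names(data), treatment %in% names(data))
  list(
    W = setdiff(names(data), c(outcome, treatment)),  # everything else
    A = treatment,
    Y = outcome
  )
}

# Toy stand-in with a few of the columns shown above (illustration only).
toy <- data.frame(
  whz = c(0, -1.16, -1.05),
  tr = c("Control", "Control", "Control"),
  sex = c("male", "male", "male"),
  momage = c(30, 25, 25)
)

nodes <- make_node_list(toy)
nodes$W  # "sex" "momage"
```

With the real data, one would call `make_node_list(dat)`; since `dat` has 28 columns, `W` would then hold the remaining 26 column names.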
Using the `skimr` package, we can quickly summarize the variables measured in the WASH Benefits data set:
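A minimal sketch of the kind of call that produces such a summary, assuming the `skimr` package is installed. The toy data frame and the fallback to base R's `summary()` are additions so that the snippet runs on its own; with the real data one would pass `dat` to `skim()` instead.

```r
# Toy stand-in built from the first rows printed above (illustration only);
# replace `toy` with `dat` to summarize the full data set.
toy <- data.frame(
  whz = c(0, -1.16, -1.05, -1.26, -0.59),
  sex = c("male", "male", "male", "female", "female"),
  momage = c(30, 25, 25, 28, 19)
)

if (requireNamespace("skimr", quietly = TRUE)) {
  # skim() reports missingness and summary statistics for each column,
  # grouped by variable type.
  toy_summary <- skimr::skim(toy)
} else {
  # Fallback when skimr is unavailable (an assumption for portability).
  toy_summary <- summary(toy)
}
toy_summary
```

The summary for the full data set follows.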
Name | dat |
Number of rows | 4695 |
Number of columns | 28 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 23 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
tr | 0 | 1 | 3 | 15 | 0 | 7 | 0 |
fracode | 0 | 1 | 2 | 6 | 0 | 20 | 0 |
sex | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
momedu | 0 | 1 | 12 | 15 | 0 | 3 | 0 |
hfiacat | 0 | 1 | 11 | 24 | 0 | 4 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
whz | 0 | 1.00 | -0.59 | 1.03 | -4.67 | -1.28 | -0.6 | 0.08 | 4.97 | ▁▆▇▁▁ |
month | 0 | 1.00 | 6.45 | 3.33 | 1.00 | 4.00 | 6.0 | 9.00 | 12.00 | ▇▇▅▇▇ |
aged | 0 | 1.00 | 266.32 | 52.17 | 42.00 | 230.00 | 266.0 | 303.00 | 460.00 | ▁▂▇▅▁ |
momage | 18 | 1.00 | 23.91 | 5.24 | 14.00 | 20.00 | 23.0 | 27.00 | 60.00 | ▇▇▁▁▁ |
momheight | 31 | 0.99 | 150.50 | 5.23 | 120.65 | 147.05 | 150.6 | 154.06 | 168.00 | ▁▁▆▇▁ |
Nlt18 | 0 | 1.00 | 1.60 | 1.25 | 0.00 | 1.00 | 1.0 | 2.00 | 10.00 | ▇▂▁▁▁ |
Ncomp | 0 | 1.00 | 11.04 | 6.35 | 2.00 | 6.00 | 10.0 | 14.00 | 52.00 | ▇▃▁▁▁ |
watmin | 0 | 1.00 | 0.95 | 9.48 | 0.00 | 0.00 | 0.0 | 1.00 | 600.00 | ▇▁▁▁▁ |
elec | 0 | 1.00 | 0.60 | 0.49 | 0.00 | 0.00 | 1.0 | 1.00 | 1.00 | ▆▁▁▁▇ |
floor | 0 | 1.00 | 0.11 | 0.31 | 0.00 | 0.00 | 0.0 | 0.00 | 1.00 | ▇▁▁▁▁ |
walls | 0 | 1.00 | 0.72 | 0.45 | 0.00 | 0.00 | 1.0 | 1.00 | 1.00 | ▃▁▁▁▇ |
roof | 0 | 1.00 | 0.99 | 0.12 | 0.00 | 1.00 | 1.0 | 1.00 | 1.00 | ▁▁▁▁▇ |
asset_wardrobe | 0 | 1.00 | 0.17 | 0.37 | 0.00 | 0.00 | 0.0 | 0.00 | 1.00 | ▇▁▁▁▂ |
asset_table | 0 | 1.00 | 0.73 | 0.44 | 0.00 | 0.00 | 1.0 | 1.00 | 1.00 | ▃▁▁▁▇ |
asset_chair | 0 | 1.00 | 0.73 | 0.44 | 0.00 | 0.00 | 1.0 | 1.00 | 1.00 | ▃▁▁▁▇ |
asset_khat | 0 | 1.00 | 0.61 | 0.49 | 0.00 | 0.00 | 1.0 | 1.00 | 1.00 | ▅▁▁▁▇ |
asset_chouki | 0 | 1.00 | 0.78 | 0.41 | 0.00 | 1.00 | 1.0 | 1.00 | 1.00 | ▂▁▁▁▇ |
asset_tv | 0 | 1.00 | 0.30 | 0.46 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▃ |
asset_refrig | 0 | 1.00 | 0.08 | 0.27 | 0.00 | 0.00 | 0.0 | 0.00 | 1.00 | ▇▁▁▁▁ |
asset_bike | 0 | 1.00 | 0.32 | 0.47 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▃ |
asset_moto | 0 | 1.00 | 0.07 | 0.25 | 0.00 | 0.00 | 0.0 | 0.00 | 1.00 | ▇▁▁▁▁ |
asset_sewmach | 0 | 1.00 | 0.06 | 0.25 | 0.00 | 0.00 | 0.0 | 0.00 | 1.00 | ▇▁▁▁▁ |
asset_mobile | 0 | 1.00 | 0.86 | 0.35 | 0.00 | 1.00 | 1.0 | 1.00 | 1.00 | ▁▁▁▁▇ |
A convenient summary of the relevant variables is given just above; for each numeric variable it includes a small histogram describing the marginal distribution. Note that the asset variables reflect the socio-economic status of the study participants. Notice also that the treatment groups are balanced, with the six intervention groups of similar size and twice as many controls; this is, of course, by design.
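The balance of the treatment groups and the missingness reported in the summary can be checked directly. The sketch below uses two hypothetical helpers, `arm_shares()` and `missing_counts()`, and a toy data frame whose arm labels other than `Control` are made up for illustration. Under the design described earlier (six intervention groups plus a double-sized control group), one would expect a share of roughly 2/8 for the control group and 1/8 for each intervention group in the real data.

```r
# Hypothetical helper: share of observations in each treatment arm.
arm_shares <- function(data, treatment = "tr") {
  prop.table(table(data[[treatment]]))
}

# Hypothetical helper: number of missing values in each column.
missing_counts <- function(data) {
  colSums(is.na(data))
}

# Toy stand-in; "Arm 1" and "Arm 2" are made-up labels.
toy <- data.frame(
  tr = c("Control", "Control", "Arm 1", "Arm 2"),
  momage = c(30, NA, 25, 28)
)

arm_shares(toy)      # Control: 0.5; Arm 1 and Arm 2: 0.25 each
missing_counts(toy)  # tr: 0, momage: 1
```

With the real data, `arm_shares(dat)` and `missing_counts(dat)` give the corresponding quantities for all seven treatment groups and all 28 columns.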