Our causal contrast of interest:
Our assumptions:
G-methods: IPT weighting, standardization (parametric g-formula), g-estimation
Consider structural model:
Given all assumptions hold, we can estimate the desired contrast (parameters of the structural model) using outcome regression:
library(tidyverse)library(magrittr)# getting the datadata <- readr::read_csv(file = "https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/1268/20/nhefs.csv") %>% mutate( education = case_when( education == 1 ~ "8th grade or less", education == 2 ~ "HS dropout", education == 3 ~ "HS", education == 4 ~ "College dropout", education == 5 ~ "College or more", T ~ "missing" ) ) %>% mutate(across(.cols = c(sex, race, education, exercise, active), .fns = forcats::as_factor)) %>% drop_na(qsmk, sex, race, education, exercise, active, wt82)
# what's inside?data %>% select(qsmk, age, sex, race, education, wt71, smokeintensity, smokeyrs, exercise, active)
## # A tibble: 1,566 x 10## qsmk age sex race education wt71 smokeintensity smokeyrs exercise## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <fct> ## 1 0 42 0 1 8th grade or ~ 79.0 30 29 2 ## 2 0 36 0 0 HS dropout 58.6 20 24 0 ## 3 0 56 1 1 HS dropout 56.8 20 26 2 ## 4 0 68 0 1 8th grade or ~ 59.4 3 53 2 ## 5 0 40 0 0 HS dropout 87.1 20 19 1 ## 6 0 43 1 1 HS dropout 99 10 21 1 ## 7 0 56 1 0 HS 63.0 20 39 1 ## 8 0 29 1 0 HS 58.7 2 9 2 ## 9 0 51 0 0 HS dropout 64.9 25 37 2 ## 10 0 43 0 0 HS dropout 62.3 20 25 2 ## # ... with 1,556 more rows, and 1 more variable: active <fct>
# fit a regression of the outcome on the exposure and confoundersfit <- glm(data = data, formula = wt82_71 ~ qsmk + sex + race + age + I(age*age) + education + smokeintensity + I(smokeintensity*smokeintensity) + smokeyrs + I(smokeyrs*smokeyrs) + exercise + active + wt71 + I(wt71*wt71), family = gaussian())fit %>% broom::tidy(conf.int = T) %>% filter(term == "qsmk")
## # A tibble: 1 x 7## term estimate std.error statistic p.value conf.low conf.high## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>## 1 qsmk 3.46 0.438 7.90 5.36e-15 2.60 4.32
# Hernan: model without any product terms yielded the estimate 3.5 (95% confidence interval: 2.6, 4.3) kg.
When using IPT-weigting or g-estimation, we computed the exposure probability given the covariables Pr[A=1|L] at an individual level in the dataset
τ(L) is the propensity score
fit <- glm(qsmk ~ sex + race + age + I(age*age) + education + smokeintensity + I(smokeintensity*smokeintensity) + smokeyrs + I(smokeyrs*smokeyrs) + exercise + active + wt71 + I(wt71*wt71), data = data, family = binomial())data %<>% mutate( ps = predict(fit, type = "response", newdata = .))
data %>% ggplot(aes(x = ps, color = as.factor(qsmk), fill = as.factor(qsmk))) + geom_histogram(alpha = 0.2, bins = 30) + # facet_grid(rows = vars(qsmk)) + theme_minimal(base_size = 12)
data %>% ggplot(aes(x = ps, color = as.factor(qsmk), fill = as.factor(qsmk))) + geom_density(alpha = 0.2) + # facet_grid(rows = vars(qsmk)) + theme_minimal(base_size = 12)
data %>% group_by(qsmk) %>% summarise(min = min(ps), mean = mean(ps), median = median(ps), max = max(ps))
## # A tibble: 2 x 5## qsmk min mean median max## <dbl> <dbl> <dbl> <dbl> <dbl>## 1 0 0.0510 0.239 0.222 0.681## 2 1 0.0599 0.309 0.282 0.777
fit <- glm(data = data, formula = wt82_71 ~ qsmk + ps) %>% broom::tidy(conf.int = T)
fit %>% filter(term == "qsmk")
## # A tibble: 1 x 7## term estimate std.error statistic p.value conf.low conf.high## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>## 1 qsmk 3.45 0.460 7.52 9.41e-14 2.55 4.35
# Hernan: model without any product terms yielded the estimate 3.5 (95% confidence interval: 2.6, 4.3) kg.
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |