Topic 11 Inverse Probability Weighting
Pre-class work
Videos/slides
Learning Goals
- Verify properties of inverse probability weighting for estimating average causal effects using simulation
Exercises
You can download a template RMarkdown file to start from here.
Exercise 1: Simulate
Simulate data from the causal graph below where \(Z\), \(W\), and \(A\) are binary, and \(Y\) is quantitative. Use the model below for your structural equation for \(A\), and store your data in a dataset object called sim_data
.
\[ \log(\mathrm{odds}(A = 1)) = -1 + 0.4 Z + 0.4 W + 0.9 Z*W \]
Exercise 2: Intervene
Based on your SEM, what do you expect to be the average causal effect? Modify your simulation to simulate an intervention where all are treated and all are untreated to verify that this is the true average causal effect.
Exercise 3: Inverse probability weighting
Here, we’ll check the performance of inverse probability weighting for estimating the average causal effect.
In practice, we don’t know the true SEM underlying the data, so we need to estimate the appropriate weights from the data. This starts from estimating the probability of receiving treatment conditional on appropriate variables. (This probability is called the propensity score.) Fit an appropriate propensity score model called
ps_mod
.The code below uses your fitted
ps_mod
to compute IP weights.- Look up the
mutate()
andcase_when()
functions fromdplyr
by entering?function_name
in the Console. predict(logistic_mod, newdata = data_to_make_predictions_for, type = "response")
is used to predict probabilities from a logistic regression model. (Withouttype = "response"
, log-odds are computed.)- Add comments to this code to document these different pieces, and check in with the instructor if you have questions.
sim_data %>% sim_data <- mutate( ps = predict(ps_mod, newdata = sim_data, type = "response"), ip_weight = case_when( ==1 ~ 1/ps, A==0 ~ 1/(1-ps) A ) )
- Look up the
Fit an ordinary regression model
Y ~ A
that ignores the IP weights. Is the estimated ACE what you expected?Incorporate the IP weights into your analysis by modifying your model from part c as below. Is the estimated ACE what you expected?
lm(..., data = ..., weights = ip_weight)
Note: Using weights = ...
in glm(..., family = "binomial")
does not work exactly the way we want it to. (It works as we want for lm()
.) Going forward, we will use a specialized package (the survey
package) for dealing with weights.
Exercise 4: Balance checking
What is the key differentiator between regression and inverse probability weighting?
Let’s verify that unique property of IP weighting. Make plots to show the relationship between \(Z\) and \(A\) and between \(W\) and \(A\) in the original, unweighted data.
Modify your plot to incorporate the IP weights by adding the following to
aes()
. What property of IP weighting does this show?aes(..., weight = ip_weight)
Extra!
If you’ve taken Statistical Machine Learning, what connections can you make between ideas about predictive modeling and propensity score estimation? If you haven’t taken that course but are curious, feel free to ask the instructor!