Identifying causal effects with causal graphs
Goals
- Explain how d-separation and causal/noncausal paths relate to identification of causal effects.
- Apply d-separation to block noncausal paths in causal DAGs with and without unobserved variables.
- Apply strategies to deal with identification problems caused by unobserved variables.
- Differentiate confounding and selection bias in terms of graph structure and how they arise in applied studies.
Review
We’ll review the pre-class material using the slides.
Exercises
Exercise 1
For each of the causal graphs below, identify the set of variables needed to block noncausal paths (if possible) between treatment \(A\) and outcome \(Y\). Any \(U\) variables displayed in the graphs are unobserved/unmeasured.
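Answers to questions like this can be checked mechanically. The sketch below is a minimal, self-contained Python check (not DAGitty's implementation). It tests d-separation using the moralized ancestral graph, then applies a backdoor-style check: the adjustment set contains no descendants of treatment, and it d-separates treatment and outcome once the arrows out of treatment are removed. The function names and the two small example graphs are hypothetical and are not the graphs from this exercise.

```python
from itertools import combinations

def ancestors(edges, nodes):
    """Return the given nodes together with all of their ancestors."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

def d_separated(edges, x, y, z):
    """True if x and y are d-separated given the set z in the DAG `edges`."""
    z = set(z)
    keep = ancestors(edges, {x, y} | z)
    nbrs = {n: set() for n in keep}
    parents = {}
    # Keep only edges among the ancestors, dropping their direction.
    for u, v in edges:
        if u in keep and v in keep:
            nbrs[u].add(v)
            nbrs[v].add(u)
            parents.setdefault(v, set()).add(u)
    # Moralize: connect every pair of parents that share a child.
    for ps in parents.values():
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Look for an undirected path from x to y that avoids z.
    seen, stack = {x}, [x]
    while stack:
        for n in nbrs[stack.pop()]:
            if n == y:
                return False
            if n not in seen and n not in z:
                seen.add(n)
                stack.append(n)
    return True

def blocks_noncausal_paths(edges, a, y, z):
    """Backdoor-style check for adjustment set z (sufficient, not necessary)."""
    nodes = {n for edge in edges for n in edge}
    descendants = {n for n in nodes if n != a and a in ancestors(edges, {n})}
    if descendants & set(z):
        return False
    backdoor_graph = [(u, v) for u, v in edges if u != a]
    return d_separated(backdoor_graph, a, y, z)

# Hypothetical example 1: L is a common cause of A and Y.
confounding = [("L", "A"), ("L", "Y"), ("A", "Y")]
# Hypothetical example 2: L is a collider between unmeasured U1 and U2.
m_graph = [("U1", "A"), ("U1", "L"), ("U2", "L"), ("U2", "Y"), ("A", "Y")]

for name, graph in [("confounding", confounding), ("m_graph", m_graph)]:
    for z in (set(), {"L"}):
        print(name, sorted(z), blocks_noncausal_paths(graph, "A", "Y", z))
```

In the first assumed graph, adjusting for \(L\) is needed; in the second, the empty set works and adjusting for \(L\) opens a path. This check does not cover situations where the analysis is forced to condition on a descendant of treatment, such as restricting to people who remain in a study.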
Exercise 2
Historically, people have tried to define confounders by listing criteria that rely purely on associations. For example:
A confounder must:
1. Be associated with treatment and outcome
2. Not be caused by treatment
Using the causal graph below, explain why this is not a good definition for a confounder.
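One way to probe a definition like this is by simulation. The sketch below assumes a hypothetical M-shaped structure, which may differ from the graph in this exercise: unmeasured \(U_1\) causes treatment and a covariate, unmeasured \(U_2\) causes the same covariate and the outcome, and treatment has no effect on the outcome. All probabilities, variable names, and function names are illustrative assumptions.

```python
import random

def risk_difference(rows):
    """Mean of Y among A=1 minus mean of Y among A=0, for (a, y) pairs."""
    y1 = [y for a, y in rows if a == 1]
    y0 = [y for a, y in rows if a == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def simulate_m_structure(n=100_000, seed=1):
    """Simulate (a, cov, y) under an assumed M-structure with no A -> Y effect."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u1 = rng.random() < 0.5                      # unmeasured cause of A and cov
        u2 = rng.random() < 0.5                      # unmeasured cause of cov and Y
        a = int(rng.random() < 0.1 + 0.8 * u1)
        cov = int(rng.random() < 0.05 + 0.45 * u1 + 0.45 * u2)
        y = int(rng.random() < 0.1 + 0.8 * u2)       # Y does not depend on A
        data.append((a, cov, y))
    return data

data = simulate_m_structure()

# Unadjusted comparison of Y between treatment groups.
crude = risk_difference([(a, y) for a, cov, y in data])
# Comparisons within levels of the covariate (i.e., "adjusting" for it).
within_cov1 = risk_difference([(a, y) for a, cov, y in data if cov == 1])
within_cov0 = risk_difference([(a, y) for a, cov, y in data if cov == 0])

print(f"crude: {crude:.3f}")
print(f"within cov=1: {within_cov1:.3f}")
print(f"within cov=0: {within_cov0:.3f}")
```

Under these assumptions, the covariate is associated with both treatment and outcome and is not caused by treatment, so it meets the association-based criteria. Comparing the crude and stratum-specific differences shows what adjusting for it does when the true effect is null.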
Exercise 3
First, think through the relationships depicted in the causal graphs below and whether they make sense. These are intended to reflect a range of scenarios for why people drop out of studies.
Then, for each of the graphs, identify the set of variables that would block noncausal paths between the treatment \(A\) and outcome \(Y\). (\(U\) and \(W\) are unmeasured.) Check your answer for one of the graphs using DAGitty.
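To see why dropout can matter even when treatment is randomized, it can help to simulate one scenario. The sketch below assumes a hypothetical graph that is not necessarily one of the graphs in this exercise: treatment \(A\) and an unmeasured \(U\) both cause dropout, \(U\) also causes \(Y\), and \(A\) has no effect on \(Y\). The probabilities and names are illustrative assumptions.

```python
import random

def risk_difference(rows):
    """Mean of Y among A=1 minus mean of Y among A=0, for (a, y) pairs."""
    y1 = [y for a, y in rows if a == 1]
    y0 = [y for a, y in rows if a == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def simulate_dropout(n=100_000, seed=2):
    """Simulate (a, dropout, y) where A and unmeasured U both cause dropout."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.5)                  # randomized treatment
        u = int(rng.random() < 0.5)                  # unmeasured cause of dropout and Y
        dropout = int(rng.random() < 0.1 + 0.4 * a + 0.4 * u)
        y = int(rng.random() < 0.2 + 0.6 * u)        # Y does not depend on A
        rows.append((a, dropout, y))
    return rows

rows = simulate_dropout()

# Everyone (not observable in practice if Y is missing for dropouts).
full_sample = risk_difference([(a, y) for a, d, y in rows])
# Only those who stayed in the study: this conditions on dropout = 0.
completers = risk_difference([(a, y) for a, d, y in rows if d == 0])

print(f"full sample: {full_sample:.3f}")
print(f"completers only: {completers:.3f}")
```

Restricting to completers conditions on a common effect of \(A\) and \(U\). Under these assumptions, that induces an association between \(A\) and \(Y\) among completers even though the difference in the full sample is close to zero.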
Exercise 4
In this exercise, we’ll consider how causal graphs can inform study design. (Inspired by a 1970s study on the relationship between estrogen use and endometrial cancer.)
Researchers have noticed a consistent association between use of a certain drug and disease. Research groups debated two hypotheses:
- The drug does cause disease.
- The drug doesn’t actually cause disease, but it causes a side effect that leads to more frequent doctor visits, which in turn lead to increased diagnosis of existing disease.
The following study plan was proposed: restrict the study to those with side effects and compare disease rates between drug users and non-users. The rationale was that all participants would then have the same chance of being diagnosed.
The following causal graphs correspond to the two hypotheses:
(For compactness, the graphs don’t show confounders of the relationship between drug use and true disease. We can assume that these have already been adjusted for.)
Study design 1
Consider the study proposal above: restrict analysis to those with side effects.
Before looking at the causal graphs: does the rationale for this study design make sense? Why did researchers want to only look at patients with side effects?
Under this study design, the researchers were expecting that if Hypothesis 1 were correct (the drug does cause disease), they would find an association between drug use and diagnosed disease. They expected that if Hypothesis 2 were correct (the drug does NOT cause disease), they would find NO association between drug use and diagnosed disease.
- Are these expectations correct? Explain in light of the causal graphs.
Based on your answer above, is this an effective study design for the research questions of interest? That is, can this study proposal distinguish between the two hypotheses?
Study design 2
Consider another study proposal: ensure that everyone is screened for disease frequently, and do not restrict the analysis to those with side effects.
What arrow can be removed as a result of this study design? (It might help to draw an updated version of DAGs 1 and 2 with this arrow removed.)
Under this study design, the researchers had the same expectations: if Hypothesis 1 were correct, they would find an association between drug use and diagnosed disease. If Hypothesis 2 were correct, they would find NO association between drug use and diagnosed disease.
- Are these expectations correct? Explain in light of the causal graphs.
Based on your answer above, is this an effective study design for the research questions of interest? That is, can this study proposal distinguish between the two hypotheses?