Topic 10 Graphical Structure of Mediation

Learning Goals

Understand scientific motivations for mediation analysis and pose research questions that can be investigated by direct and indirect effects.
Describe how regression models can be used to estimate direct and indirect effects.
Apply d-separation ideas to understand the graphical reasoning behind the confounding assumptions needed to identify direct and indirect effects.

Warm-up

In your groups, discuss responses to the video questions corresponding to the Mediation Analysis video.

Exercises

Background

Consider the following causal diagram representing variables important in mediation analysis (treatment $A$ , mediator $M$ , outcome $Y$ , and confounders $C_1$ , $C_2$ , $C_3$ ):

Consider the following 4 assumptions:

No unmeasured confounding of the treatment-outcome relationship ( $A$ and $Y$ ).
No unmeasured confounding of the mediator-outcome relationship ( $M$ and $Y$ ).
No unmeasured confounding of the treatment-mediator relationship ( $A$ and $M$ ).
No confounder of the mediator-outcome relationship is affected by treatment (that is, there is no arrow from $A$ to $C_2$ ).

When these 4 assumptions hold, the controlled direct effect (CDE), natural direct effect (NDE), and natural indirect effect (NIE) are identifiable, and we can use models such as the regression models below to estimate them. Here $C = \{C_1, C_2, C_3\}$ , and $\theta_4$ and $\beta_2$ are vectors of coefficients.

$E[Y\mid A, M, C] = \theta_0 + \theta_1 A + \theta_2 M + \theta_3 AM + \theta_4'C$

$E[M\mid A, C] = \beta_0 + \beta_1 A + \beta_2'C$

Identifiability: A quantity (here, each of the direct and indirect effects) is identifiable if its true value could be learned from an infinite amount of data on the observed variables. In causal inference, threats to identifiability usually come from unmeasured variables that create unwanted non-causal paths.

In the later exercises, it will be helpful to refer to the following list of paths from $A$ to $Y$ :

Path 1: A <- C1 -> Y
Path 2: A -> Y
Path 3: A <- C3 -> M -> Y
Path 4: A <- C3 -> M <- C2 -> Y
Path 5: A -> M -> Y
Path 6: A -> M <- C2 -> Y
Path 7: A -> C2 -> Y
Path 8: A -> C2 -> M -> Y
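
The path list can be checked mechanically. The sketch below is one possible way to do so in plain Python; it assumes the diagram contains exactly the arrows that appear in the eight paths above (the `EDGES` list and the function names are illustrative, not part of the course materials). It enumerates every path from `A` to `Y` that visits no node twice, ignoring arrow direction while walking.

```python
# Arrows read off the path list (an assumption: the diagram has no other arrows).
EDGES = [
    ("C1", "A"), ("C1", "Y"),
    ("C3", "A"), ("C3", "M"),
    ("C2", "M"), ("C2", "Y"),
    ("A", "M"), ("A", "C2"), ("A", "Y"),
    ("M", "Y"),
]


def neighbors(node):
    """Yield (other_node, arrow) for every edge touching `node`, in either direction."""
    for parent, child in EDGES:
        if parent == node:
            yield child, "->"
        elif child == node:
            yield parent, "<-"


def all_paths(start, end):
    """Return every simple path from `start` to `end` as a formatted string."""
    found = []

    def walk(node, visited, text):
        if node == end:
            found.append(text)
            return
        for other, arrow in neighbors(node):
            if other not in visited:
                walk(other, visited | {other}, f"{text} {arrow} {other}")

    walk(start, {start}, start)
    return found


paths = all_paths("A", "Y")
for p in sorted(paths):
    print(p)
```

The printed order differs from the numbering above, but the set of paths is the same.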

Exercise 1

Assumptions 1 and 2 are needed in order to identify the controlled direct effect (CDE). We will use the above DAG to understand why.

First, argue why drawing a box around $M$ in the DAG is relevant here.
Using d-separation ideas, argue why Assumptions 1 and 2 must hold.
Using d-separation ideas, argue why it is not necessary for Assumptions 3 and 4 to hold if 1 and 2 hold.
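
The d-separation rule behind these arguments can be applied one path at a time. The following is a minimal Python sketch, assuming the diagram contains exactly the arrows implied by the path list in the Background; `EDGES`, `PATHS`, `is_open`, and `open_paths` are illustrative names, not part of the course materials. A path is open if every non-collider on it is unconditioned and every collider on it is conditioned on (directly or through a descendant).

```python
# Arrows implied by the path list (an assumption about the diagram).
EDGES = {
    ("C1", "A"), ("C1", "Y"),
    ("C3", "A"), ("C3", "M"),
    ("C2", "M"), ("C2", "Y"),
    ("A", "M"), ("A", "C2"), ("A", "Y"),
    ("M", "Y"),
}

# The eight paths from A to Y, numbered as in the Background.
PATHS = {
    1: ["A", "C1", "Y"],
    2: ["A", "Y"],
    3: ["A", "C3", "M", "Y"],
    4: ["A", "C3", "M", "C2", "Y"],
    5: ["A", "M", "Y"],
    6: ["A", "M", "C2", "Y"],
    7: ["A", "C2", "Y"],
    8: ["A", "C2", "M", "Y"],
}


def descendants(node):
    """All nodes reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in EDGES:
            if parent == current and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_open(path, conditioned):
    """Apply the d-separation rule to each interior node of one path."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        collider = (left, mid) in EDGES and (right, mid) in EDGES
        if collider:
            # A collider blocks unless it or one of its descendants is conditioned on.
            if not ({mid} | descendants(mid)) & conditioned:
                return False
        elif mid in conditioned:
            # A chain or fork node blocks once it is conditioned on.
            return False
    return True


def open_paths(conditioned):
    """Return the numbers of the paths left open by the conditioning set."""
    return [k for k, p in PATHS.items() if is_open(p, set(conditioned))]


print(open_paths([]))  # conditioning on nothing: [1, 2, 3, 5, 7, 8]
```

Changing the set passed to `open_paths` (for example, adding `"M"` to mimic the box around $M$ , or leaving out a confounder to mimic it being unmeasured) shows which paths each assumption is responsible for closing.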

Exercise 2

In addition to Assumptions 1 and 2, Assumptions 3 and 4 are also needed to identify the natural effects (the NDE and NIE).

Using d-separation ideas, why must Assumption 1 still hold to identify the NDE and NIE?
Now let’s look at Assumption 2.
- Argue that the same d-separation reasoning from Exercise 1 applies for understanding why Assumption 2 must hold to identify the NDE.
- The natural indirect effect can be viewed as a composition of the $A$ to $M$ effect and the $M$ to $Y$ effect. Given this, why must Assumption 2 hold?
Now let’s look at Assumption 3. How do the two natural effects differ from the controlled direct effect? Given this, why does unmeasured treatment-mediator confounding pose a concern?

Assumption 4 is harder to justify purely graphically, but if you are curious about it and the underlying proof, ask the instructor.

Exercise 3

We can use models such as the regression models below to estimate the CDE, NDE, and NIE. ( $C = \{C_1, C_2, C_3\}$ , and $\theta_4, \beta_2$ are vectors of coefficients.)

$E[Y\mid A, M, C] = \theta_0 + \theta_1 A + \theta_2 M + \theta_3 AM + \theta_4'C$

$E[M\mid A, C] = \beta_0 + \beta_1 A + \beta_2'C$

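Given these models, show that for a binary treatment (comparing $A = 1$ with $A = 0$ ), the 3 effects are given by the expressions below, where $m$ is the value at which the mediator is held fixed and the NDE is conditional on $C$ :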

CDE: $\theta_1 + \theta_3 m$
NDE: $\theta_1 + \theta_3 (\beta_0 + \beta_2'C)$
NIE: $\beta_1(\theta_2 + \theta_3)$
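
Before working through the algebra, the three expressions can be sanity-checked numerically. The sketch below (plain Python; all coefficient values are made up for illustration, and the function names are hypothetical) computes each effect in two ways: by evaluating the two regression models at $A = 1$ and $A = 0$ , and by the closed-form expressions above. It assumes the identification assumptions hold and relies on the outcome model being linear in $M$ , so that plugging in the mean of the mediator is enough.

```python
import math


def expected_m(a, c, beta):
    """E[M | A=a, C=c] under the mediator model."""
    b0, b1, b2 = beta
    return b0 + b1 * a + sum(bj * cj for bj, cj in zip(b2, c))


def expected_y(a, m, c, theta):
    """E[Y | A=a, M=m, C=c] under the outcome model."""
    t0, t1, t2, t3, t4 = theta
    return t0 + t1 * a + t2 * m + t3 * a * m + sum(tj * cj for tj, cj in zip(t4, c))


def effects_by_plugging_in(theta, beta, m, c):
    """Contrast A=1 with A=0 by evaluating the two models directly."""
    m0 = expected_m(0, c, beta)  # mean mediator under A=0
    m1 = expected_m(1, c, beta)  # mean mediator under A=1
    cde = expected_y(1, m, c, theta) - expected_y(0, m, c, theta)
    nde = expected_y(1, m0, c, theta) - expected_y(0, m0, c, theta)
    nie = expected_y(1, m1, c, theta) - expected_y(1, m0, c, theta)
    return cde, nde, nie


def effects_by_formula(theta, beta, m, c):
    """The closed-form expressions stated in the exercise."""
    _, t1, t2, t3, _ = theta
    b0, b1, b2 = beta
    b2c = sum(bj * cj for bj, cj in zip(b2, c))
    return t1 + t3 * m, t1 + t3 * (b0 + b2c), b1 * (t2 + t3)


# Hypothetical coefficients: (theta0, theta1, theta2, theta3, theta4) and (beta0, beta1, beta2).
theta = (1.0, 2.0, 0.5, 0.25, (0.1, -0.2, 0.3))
beta = (0.4, 1.5, (0.2, 0.1, -0.3))
c = (1.0, 2.0, 0.0)  # one covariate pattern (C1, C2, C3)
m = 2.0              # level at which the mediator is fixed for the CDE

plug = effects_by_plugging_in(theta, beta, m, c)
formula = effects_by_formula(theta, beta, m, c)
for name, p, f in zip(("CDE", "NDE", "NIE"), plug, formula):
    print(name, p, f, math.isclose(p, f))
```

Agreement for one set of made-up numbers is not a proof, but a mismatch would signal an algebra slip worth tracking down.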

Note: Next time, we’ll explore how this approach generalizes to more flexible ways to estimate mediation effects.