Topic 14 Sensitivity Analyses for Unmeasured Variables
Pre-class work
Videos/slides
Learning Goals
SENS1
: Evaluate the sensitivity of findings to data quality and propose appropriate sensitivity analyses for a research investigation.SENS2
: Conduct and communicate the results of a sensitivity analysis for unmeasured confounding.
Background
We will be implementing the approach for a sensitivity analysis proposed in the research article Assessing Sensitivity to Unmeasured Confounding Using a Simulated Potential Confounder by Carnegie, Harada, and Hill.
It looks like their package treatSens was taken off CRAN and is undergoing revisions. We will implement their approach for a sensitivity analysis for the particular case of quantitative treatment, outcome, and unmeasured confounder.
Summary of approach:
Identify set of variables \(Z\) that results in conditional exchangeability of the treatment groups.
Express the joint distribution of the data as below, with the assumption that \(U\) and \(Z\) are independent.
\[ \begin{align*} P(Y, A, U \mid Z) &= P_1(Y \mid A, U, Z) P_2(A \mid U, Z) P_3(U \mid Z) \\ &= P_1(Y \mid A, U, Z) P_2(A \mid U, Z) P_3(U) \end{align*} \]
- Express these components of the joint distribution as structural equations:
\[ Y \sim N(\beta_0 + \beta_1 A + \beta_2 Z + \beta_3 U, \sigma_Y^2) \] \[ A \sim N(\alpha_0 + \alpha_1 Z + \alpha_2 U, \sigma_A^2) \] \[ U \sim N(0, \sigma_U^2) \]
- Using probability theory, we can rewrite the joint distribution to express \(U\) as a function of \(A\) and \(Y\) (see paper).
- These formulas require the residuals and estimated standard deviation of residuals from the following models:
Y ~ A+Z
A ~ Z
- Key idea: We can use observed data simulate \(U\) such that it is consistent with the above structural equations.
- These formulas require the residuals and estimated standard deviation of residuals from the following models:
- Sensitivity parameters: \(\beta_3\) and \(\alpha_2\). In our code below, we will call these
assoc_Y
andassoc_A
respectively.
Exercises
You can download a template RMarkdown file to start from here.
Warm-up
Work to understand what the code below is doing. Clarify with the instructor as needed. You will be adapting this code to implement a sensitivity analysis in the context of the smoking-weight gain study that we’ve explored previously.
library(dplyr)
library(ggplot2)
function(.data, model_A, model_Y, assoc_A, assoc_Y) {
sensitivity_analysis <- nrow(.data)
n <-
# Obtain residuals with residuals()
# Obtain residual variances with sigma()
residuals(model_A)
res_A <- sigma(model_A)^2
res_var_A <- residuals(model_Y)
res_Y <- sigma(model_Y)^2
res_var_Y <-
# Compute the mean and variance of U given A and Y
(assoc_A/res_var_A)*res_A
mean_U_term1 <- (((res_var_A - assoc_A^2)*assoc_Y)/(res_var_A*res_var_Y))*res_Y
mean_U_term2 <- mean_U_term1 + mean_U_term2
mean_U <-
(res_var_A - assoc_A^2)/(res_var_A*res_var_Y)
var_U_term1 <- res_var_Y - assoc_Y^2 + ((assoc_A*assoc_Y)^2)/res_var_A
var_U_term2 <- var_U_term1*var_U_term2
var_U <-
# Simulate U and add it to the data
rnorm(n, mean = mean_U, sd = sqrt(var_U))
U <-$U <- U
.data
########################################################################
# The part below is the only part you need to change to implement
# the sensitivity analysis in a new context.
# Refit model to estimate the causal effect
lm(Y ~ A+C+U, data = .data)
updated_model <-# The names of the coefficients and confidence interval output rows
# are called "A" for the treatment variable A.
# This will change in a new dataset.
list(c(
estimate = unname(coefficients(updated_model)["A"]),
ci_95_lower = confint(updated_model)["A",1],
ci_95_upper = confint(updated_model)["A",2]
))
}
# Set up simulated example data
set.seed(451)
10000
n <- rnorm(n, 10, 1)
U <- rnorm(n, 10, 1)
C <- rnorm(n, 2+C+U, 2)
A <- rnorm(n, 10 + A + C + U, 10)
Y <-
data.frame(U, C, A, Y)
sim_data <-
# Begin the sensitivity analysis
# Fit required models for the sensitivity analysis
lm(A ~ C, data = sim_data)
mod_A <- lm(Y ~ A+C, data = sim_data)
mod_Y <-
# Set up degree of association between U and A and between U and Y
# The U->A associations have some constraints: we set up values
# for the U->A associations that are at most equal to the
# standard deviation of the residuals from the model for A.
seq(from = 0.01, to = sigma(mod_A), length.out = 10)
U_A_assocs <- seq(from = 0.5, to = 5, by = 0.5)
U_Y_assocs <-
# Form all combinations of the U->A and U->Y sensitivity parameters
expand.grid(U_A = U_A_assocs, U_Y = U_Y_assocs)
sens_data <-
# Run sensitivity analysis
sens_data %>%
sens_data <- group_by(U_A, U_Y) %>%
mutate(sens = sensitivity_analysis(sim_data, mod_A, mod_Y, U_A, U_Y))
# Collect sensitivity analysis results in a data.frame
bind_cols(sens_data[,1:2], bind_rows(sens_data$sens))
sens_data <-
# Plot results
function(string, prefix = "U -> Y strength:") paste(prefix, string)
prepender <-
ggplot(sens_data, aes(x = U_A, y = estimate)) +
geom_ribbon(aes(ymin = ci_95_lower, ymax = ci_95_upper), fill = "grey70") +
geom_line() +
geom_hline(aes(yintercept = coefficients(mod_Y)["A"]), color = "blue", lty = "dashed") +
geom_hline(aes(yintercept = 0), color = "black") +
facet_wrap(~U_Y, labeller = as_labeller(prepender)) +
labs(x = "Strength of U -> A association", y = "ACE and 95% CI")
Main exercise
Here, we will adapt the code above to implement a sensitivity analysis for unmeasured confounding for the smoking-weight gain study.
- The treatment variable is
smkintensity82_71
the change in number of cigarettes smoked per day from 1971 to 1982. - Assume that a graphical analysis has led to the conclusion that conditional exchangeability holds given
smokeyrs
,age
, andasthma
. - Display graphical output from the sensitivity analysis and write a brief discussion of conclusions you can make from the sensitivity analysis (no more than 400 words).
# Load data
library(cidata)
data(nhefs_complete)
# View the data codebook
View(nhefs_codebook)