Topic 14 Sensitivity Analyses for Unmeasured Variables
Learning Goals
SENS1
: Evaluate the sensitivity of findings to data quality and propose appropriate sensitivity analyses for a research investigation.SENS2
: Conduct and communicate the results of a sensitivity analysis for unmeasured confounding.
Background
We will be implementing the approach for a sensitivity analysis proposed in the research article Assessing Sensitivity to Unmeasured Confounding Using a Simulated Potential Confounder by Carnegie, Harada, and Hill.
It looks like their package treatSens was taken off CRAN and is undergoing revisions. We will implement their approach for a sensitivity analysis for the particular case of quantitative treatment, outcome, and unmeasured confounder.
Summary of approach:
Identify set of variables Z that results in conditional exchangeability of the treatment groups.
Express the joint distribution of the data as below, with the assumption that U and Z are independent.
P(Y,A,U∣Z)=P1(Y∣A,U,Z)P2(A∣U,Z)P3(U∣Z)=P1(Y∣A,U,Z)P2(A∣U,Z)P3(U)
- Express these components of the joint distribution as structural equations:
Y∼N(β0+β1A+β2Z+β3U,σ2Y) A∼N(α0+α1Z+α2U,σ2A) U∼N(0,σ2U)
- Using probability theory, we can rewrite the joint distribution to express U as a function of A and Y (see paper).
- These formulas require the residuals and estimated standard deviation of residuals from the following models:
Y ~ A+Z
A ~ Z
- Key idea: We can use observed data simulate U such that it is consistent with the above structural equations.
- These formulas require the residuals and estimated standard deviation of residuals from the following models:
- Sensitivity parameters: β3 and α2. In our code below, we will call these
assoc_Y
andassoc_A
respectively.
Exercises
You can download a template RMarkdown file to start from here.
Warm-up
Work to understand what the code below is doing. Clarify with the instructor as needed. You will be adapting this code to implement a sensitivity analysis in the context of the smoking-weight gain study that we’ve explored previously.
library(dplyr)
library(ggplot2)
function(.data, model_A, model_Y, assoc_A, assoc_Y) {
sensitivity_analysis <- nrow(.data)
n <-
# Obtain residuals with residuals()
# Obtain residual variances with sigma()
residuals(model_A)
res_A <- sigma(model_A)^2
res_var_A <- residuals(model_Y)
res_Y <- sigma(model_Y)^2
res_var_Y <-
# Compute the mean and variance of U given A and Y
(assoc_A/res_var_A)*res_A
mean_U_term1 <- (((res_var_A - assoc_A^2)*assoc_Y)/(res_var_A*res_var_Y))*res_Y
mean_U_term2 <- mean_U_term1 + mean_U_term2
mean_U <-
(res_var_A - assoc_A^2)/(res_var_A*res_var_Y)
var_U_term1 <- res_var_Y - assoc_Y^2 + ((assoc_A*assoc_Y)^2)/res_var_A
var_U_term2 <- var_U_term1*var_U_term2
var_U <-
# Simulate U and add it to the data
rnorm(n, mean = mean_U, sd = sqrt(var_U))
U <-$U <- U
.data
########################################################################
# The part below is the only part you need to change to implement
# the sensitivity analysis in a new context.
# Refit model to estimate the causal effect
lm(Y ~ A+C+U, data = .data)
updated_model <-# The names of the coefficients and confidence interval output rows
# are called "A" for the treatment variable A.
# This will change in a new dataset.
list(c(
estimate = unname(coefficients(updated_model)["A"]),
ci_95_lower = confint(updated_model)["A",1],
ci_95_upper = confint(updated_model)["A",2]
))
}
# Set up simulated example data
set.seed(451)
10000
n <- rnorm(n, 10, 1)
U <- rnorm(n, 10, 1)
C <- rnorm(n, 2+C+U, 2)
A <- rnorm(n, 10 + A + C + U, 10)
Y <-
data.frame(U, C, A, Y)
sim_data <-
# Begin the sensitivity analysis
# Fit required models for the sensitivity analysis
lm(A ~ C, data = sim_data)
mod_A <- lm(Y ~ A+C, data = sim_data)
mod_Y <-
# Set up degree of association between U and A and between U and Y
# The U->A associations have some constraints: we set up values
# for the U->A associations that are at most equal to the
# standard deviation of the residuals from the model for A.
seq(from = 0.01, to = sigma(mod_A), length.out = 10)
U_A_assocs <- seq(from = 0.5, to = 5, by = 0.5)
U_Y_assocs <-
# Form all combinations of the U->A and U->Y sensitivity parameters
expand.grid(U_A = U_A_assocs, U_Y = U_Y_assocs)
sens_data <-
# Run sensitivity analysis
sens_data %>%
sens_data <- group_by(U_A, U_Y) %>%
mutate(sens = sensitivity_analysis(sim_data, mod_A, mod_Y, U_A, U_Y))
# Collect sensitivity analysis results in a data.frame
bind_cols(sens_data[,1:2], bind_rows(sens_data$sens))
sens_data <-
# Plot results
function(string, prefix = "U -> Y strength:") paste(prefix, string)
prepender <-
ggplot(sens_data, aes(x = U_A, y = estimate)) +
geom_ribbon(aes(ymin = ci_95_lower, ymax = ci_95_upper), fill = "grey70") +
geom_line() +
geom_hline(aes(yintercept = coefficients(mod_Y)["A"]), color = "blue", lty = "dashed") +
geom_hline(aes(yintercept = 0), color = "black") +
facet_wrap(~U_Y, labeller = as_labeller(prepender)) +
labs(x = "Strength of U -> A association", y = "ACE and 95% CI")
Main exercise
Here, we will adapt the code above to implement a sensitivity analysis for unmeasured confounding for the smoking-weight gain study.
- The treatment variable is
smkintensity82_71
the change in number of cigarettes smoked per day from 1971 to 1982. - Assume that a graphical analysis has led to the conclusion that conditional exchangeability holds given
smokeyrs
,age
, andasthma
. - Display graphical output from the sensitivity analysis and write a brief discussion of conclusions you can make from the sensitivity analysis (no more than 400 words).
# Load data
library(cidata)
data(nhefs_complete)
# View the data codebook
View(nhefs_codebook)