1 - (15pts) For each of the exercises below, draw a DAG that contains four variables A, B, C, and D. Each
DAG should imply the (conditional) independencies listed in the corresponding exercise, and only
these independencies.
Hint: Not all nodes need to be connected to other nodes.
Hint: Marginal independence means unconditional independence.
[Drawing template: nodes A and C in the top row, B and D in the bottom row]
(a) DAG that implies A ⊥ C | B and no other (conditional or marginal) independencies.
[Drawing template: nodes A and C in the top row, B and D in the bottom row]
(b) DAG that implies A ⊥ C | B, D and no other (conditional or marginal) independencies.
[Drawing template: nodes A and C in the top row, B and D in the bottom row]
(c) DAG that implies A ⊥ C | B, D and B ⊥ D | A and no other (conditional or marginal) independencies.
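Note: One way to check a candidate answer is to encode it in R and list the independencies it implies. The sketch below assumes the dagitty package is installed. The graph and its node names P, Q, R, and S are hypothetical and are not a solution to any part of this exercise.

```r
# Assumes the dagitty package is available: install.packages("dagitty")
library(dagitty)

# Hypothetical candidate graph (illustration only):
# a chain P -> Q -> R plus an isolated node S.
g <- dagitty("dag {
  P -> Q -> R
  S
}")

# List every pairwise (conditional) independence implied by the graph.
impliedConditionalIndependencies(g, type = "all.pairs")

# Spot checks with d-separation.
dseparated(g, "P", "R", "Q")     # TRUE: conditioning on Q blocks the chain
dseparated(g, "P", "R", list())  # FALSE: the chain is open marginally
dseparated(g, "P", "S", list())  # TRUE: S is not connected to anything
```

Comparing the printed list with the independencies required by an exercise shows whether a candidate graph implies too many or too few of them.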
2 - (10pts) Find all pairwise marginal and conditional independencies in the DAG in Figure 1.
[Figure 1: DAG over the nodes A and C (top row) and B and D (bottom row); edges not shown]
3 - (10pts) (Instrumental variables). Many observational studies suffer from confounding. In this exercise
we investigate a method of “confounding adjustment” which, under certain assumptions, has the remarkable property of producing valid causal inference even in the presence of unmeasured confounding.
Let A be the exposure of interest, let Y be the outcome of interest, and let U be all unmeasured variables (confounders) that affect both A and Y. Let Z be a measured variable that has the following
properties: a) U does not affect Z, b) Z does not affect U, c) Z and U do not have common causes, d)
Z affects A, e) Z has no effect on Y, apart from an indirect effect mediated through A. A variable Z
that has properties a)-e) is called an instrumental variable.
(a) Draw a DAG that connects A, Y, U, and Z.
(b) Show that an observed association between Z and Y implies that A has a causal effect on Y (that
is, we can test whether A has a causal effect on Y by testing whether Z and Y are associated).
Hint: You only need to explain using a graph; no formula is needed.
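Note: The claim in (b) can also be illustrated by simulation, although the exercise itself only asks for a graphical argument. The sketch below uses a hypothetical linear data-generating process that satisfies properties a)-e). The coefficients, the sample size, and the helper name simulate_p are assumptions made only for illustration.

```r
set.seed(1)
n <- 5000  # hypothetical sample size

# Simulate one data set and return the p-value of a test of
# association between Z and Y. `effect` is the causal effect of A on Y.
simulate_p <- function(effect) {
  U <- rnorm(n)                      # unmeasured confounder of A and Y
  Z <- rnorm(n)                      # instrument, generated independently of U
  A <- 0.8 * Z + U + rnorm(n)        # Z and U both affect A
  Y <- effect * A + U + rnorm(n)     # Z reaches Y only through A
  cor.test(Z, Y)$p.value
}

p_effect <- simulate_p(1)  # A affects Y: Z and Y are associated
p_null   <- simulate_p(0)  # A has no effect: any Z-Y association is chance
c(with_effect = p_effect, no_effect = p_null)
```

With a nonzero effect the p-value is very small, because the only open path from Z to Y runs through A. With the effect set to zero, that path is cut and the test has nothing systematic to detect.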
4 - (20pts) Consider the following DAG.
[Figure: DAG over the nodes X, R, S, T, U, V, and Y; edges not shown]
(a) List all pairs of variables that are independent conditional on the set {R, V}. Use R code to get
the answer.
(b) For each pair of nonadjacent variables, give a set of variables that, when conditioned on, renders
that pair independent. Use R code to get the answer.
Hint: You can include the empty set if the pair of nonadjacent variables are marginally independent.
Hint: You only need to provide one possible set per pair. It does not need to be a minimal set.
(c) Suppose we generate data by the model described in the DAG, and we fit them with the linear
equation Y = a + bX + cZ. Which of the variables in the model may be chosen for Z so as to
guarantee that the slope b would be equal to zero?
Hint: Recall that a nonzero slope implies that Y and X are dependent given Z.
Hint: Include all possible variables. Note also that Z is a single variable (not a combination of
variables).
(d) Suppose we fit the data with the equation
Y = a + bX + cR + dS + eT.
Which of the coefficients would be zero?
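Note: The d-separation queries in parts (a) and (b) can be run with the dagitty package. The sketch below assumes dagitty is installed and uses a hypothetical graph on nodes A to E with a hypothetical conditioning set {B}. It is not the DAG of this exercise, so the graph definition and the conditioning set need to be replaced.

```r
# Assumes the dagitty package is available: install.packages("dagitty")
library(dagitty)

# Hypothetical graph (illustration only): A -> B -> C <- D -> E
g <- dagitty("dag {
  A -> B -> C
  D -> C
  D -> E
}")

# Hypothetical conditioning set.
given <- c("B")

# All pairs of variables that are not in the conditioning set.
others <- setdiff(names(g), given)
pairs  <- t(combn(others, 2))

# TRUE for each pair that is d-separated given the conditioning set.
is_indep <- apply(pairs, 1, function(p) dseparated(g, p[1], p[2], given))
pairs[is_indep, , drop = FALSE]

# One statement per pair of nonadjacent variables, each with a
# separating set (the default type is "missing.edge").
impliedConditionalIndependencies(g)
```

In this hypothetical graph, the pairs (A, C), (A, D), and (A, E) are d-separated given B, while C and E remain connected through their common cause D.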
5 - (45pts) Consider the following DAG.
[Figure: DAG over the nodes X, W, Y, Z1, Z2, and Z3; edges not shown]
(a) (5pts) For each pair of nonadjacent nodes in the graph, determine whether they are independent
conditional on all the other variables in the DAG. Use R code to answer this question.
(b) (5pts) For every variable V in the graph, find a minimal set of nodes that renders V independent
of all the other variables in the graph. Use R code to answer this question.
Hint: It is possible that the minimal set is an empty set.
(c) (5pts) Suppose we wish to estimate the value of Y from measurements taken on all other variables in the model. Find the smallest set of variables that would yield as good an estimate of Y
as measuring all of the variables would.
(d) (5pts) Suppose we wish to predict the value of Z2 from measurements of Z3. Would the quality
of our prediction improve if we added a measurement of W? Explain.
(e) (5pts) Suppose we want to estimate the causal effect of X on Y. Which variables should you
adjust for? It suffices to give one set of variables.
(f) (15pts) Use the following command to load the data on (X, W, Y, Z1, Z2, Z3):
load("Assignment3.RData")
Write R code to estimate the average causal effect of X on Y using (i) outcome regression,
assuming a linear model for the outcome (without a quadratic term); (ii) inverse probability
weighting, assuming a logistic model for the treatment; (iii) doubly robust estimation. You
should adjust for the same set of variables you give in the previous question.
(g) (5pts) Based on your estimates, do you think your models for the outcome regression and inverse
probability weighting are correct? Explain.
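Note: The three estimators named in part (f) can be sketched in base R as follows. The data here are simulated, not loaded from Assignment3.RData. The single confounder L, the coefficients, and the true effect of 2 are assumptions made only for illustration, and the treatment X is assumed to be binary. With the real data, the adjustment set from part (e) would take the place of L.

```r
set.seed(42)
n <- 5000

# Hypothetical data: L confounds the binary treatment X and the outcome Y.
L <- rnorm(n)
X <- rbinom(n, 1, plogis(0.5 * L))
Y <- 1 + 2 * X + 1.5 * L + rnorm(n)   # true average causal effect is 2
d <- data.frame(X, Y, L)

# (i) Outcome regression: fit a linear model, then predict every
# subject's outcome under X = 1 and under X = 0 and average the difference.
out_fit <- lm(Y ~ X + L, data = d)
m1 <- predict(out_fit, newdata = transform(d, X = 1))
m0 <- predict(out_fit, newdata = transform(d, X = 0))
ace_or <- mean(m1 - m0)

# (ii) Inverse probability weighting: logistic model for the treatment,
# then weight each subject by the inverse probability of the treatment received.
ps_fit <- glm(X ~ L, family = binomial, data = d)
ps <- fitted(ps_fit)
ace_ipw <- mean(d$X * d$Y / ps) - mean((1 - d$X) * d$Y / (1 - ps))

# (iii) Doubly robust (augmented IPW): outcome predictions plus
# inverse-probability-weighted residuals.
ace_dr <- mean(m1 + d$X * (d$Y - m1) / ps) -
  mean(m0 + (1 - d$X) * (d$Y - m0) / (1 - ps))

round(c(outcome_regression = ace_or, ipw = ace_ipw, doubly_robust = ace_dr), 3)
```

Because both working models are correctly specified in this simulation, all three estimates should land close to 2. The doubly robust estimator stays consistent if either the outcome model or the treatment model is correct, so comparing the three estimates on real data gives some indication of whether the two models are adequate.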
Sample Solution