1. (15pts) For each of the exercises below, draw a DAG that contains four variables A, B, C, and D. Each
    DAG should imply the (conditional) independencies listed in the corresponding exercise, and only
    these independencies.
    Hint: Not every node needs to be connected to another node.
    Hint: Marginal independence means unconditional independence.
    Node template for each part:
    A C
    B D
    (a) DAG that implies A ⊥ C | B and no other (conditional or marginal) independencies.
    (b) DAG that implies A ⊥ C | B, D and no other (conditional or marginal) independencies.
    (c) DAG that implies A ⊥ C | B, D and B ⊥ D | A and no other (conditional or marginal) independencies.
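A candidate drawing can be checked against a list of independencies with a d-separation test. The following is a minimal sketch in Python, given only to illustrate the idea (the later exercises ask for R). The function name `d_separated`, the parent-set dictionary representation, and the two three-node example graphs are all assumptions made for illustration and are not solutions to parts (a)-(c). The check uses the ancestral moral graph criterion.

```python
from itertools import combinations

def d_separated(parents, x, y, given=()):
    """Return True if x and y are d-separated by `given` in a DAG.

    `parents` maps every node to the set of its parents (hypothetical format).
    Assumes x and y are not themselves in `given`.
    """
    given = set(given)
    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])
    # 2. Moralize: link each node to its parents and join co-parents.
    adj = {n: set() for n in keep}
    for child in keep:
        pa = list(parents[child])
        for p in pa:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(pa, 2):
            adj[p].add(q)
            adj[q].add(p)
    # 3. Drop the conditioning nodes and look for an undirected path x -- y.
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False  # a connecting path survives, so not d-separated
        for nb in adj[node] - given - seen:
            seen.add(nb)
            stack.append(nb)
    return True

# Hypothetical three-node examples (not answers to the exercise).
chain = {"A": set(), "B": {"A"}, "C": {"B"}}         # A -> B -> C
collider = {"A": set(), "B": {"A", "C"}, "C": set()}  # A -> B <- C

print(d_separated(chain, "A", "C", {"B"}))     # True
print(d_separated(chain, "A", "C"))            # False
print(d_separated(collider, "A", "C"))         # True
print(d_separated(collider, "A", "C", {"B"}))  # False
```

To test a four-node drawing, one would loop over every pair of nodes and every subset of the remaining nodes and compare the result with the required list.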
  2. (10pts) Find all pairwise marginal and conditional independencies in the DAG in Figure 1.
    [Figure 1: a DAG over the nodes A, B, C, and D; the edges are not reproduced here.]
  3. (10pts) (Instrumental variables). Many observational studies suffer from confounding. In this exercise
    we investigate a method of “confounding adjustment” which, under certain assumptions, has the remarkable property of permitting causal inference even in the presence of unmeasured confounding.
    Let A be the exposure of interest, let Y be the outcome of interest, and let U be all unmeasured variables (confounders) that affect both A and Y. Let Z be a measured variable that has the following
    properties: a) U does not affect Z, b) Z does not affect U, c) Z and U do not have common causes, d)
    Z affects A, e) Z has no effect on Y, apart from an indirect effect mediated through A. A variable Z
    that has properties a)-e) is called an instrumental variable.
    (a) Draw a DAG that connects A, Y, U, and Z.
    (b) Show that an observed association between Z and Y implies that A has a causal effect on Y (that
    is, we can test whether A has a causal effect on Y by testing whether Z and Y are associated).
    Hint: You only need to explain using a graph; no formula is needed.
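The claim in part (b) can also be explored numerically. The sketch below is a small simulation in Python under assumed conditions: a linear Gaussian model with made-up coefficients, and hypothetical names `corr` and `simulate_zy_corr`. It generates data in which Z satisfies properties a)-e) and compares the sample correlation between Z and Y when A has no effect on Y with the case where it does. It illustrates the statement and does not replace the graphical argument the exercise asks for.

```python
import random

def corr(u, v):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

def simulate_zy_corr(effect_of_a, n=20000, seed=0):
    """Correlation of Z and Y in an assumed linear model with an instrument Z."""
    rng = random.Random(seed)
    zs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                   # unmeasured confounder
        z = rng.gauss(0, 1)                   # instrument, generated independently of U
        a = z + u + rng.gauss(0, 1)           # exposure depends on Z and U
        y = effect_of_a * a + u + rng.gauss(0, 1)  # Z reaches Y only through A
        zs.append(z)
        ys.append(y)
    return corr(zs, ys)

print("no effect of A on Y:", round(simulate_zy_corr(0.0), 3))
print("effect of A on Y:   ", round(simulate_zy_corr(1.0), 3))
```

Under these assumptions the first correlation should be close to zero and the second clearly positive, even though U is never used in the comparison.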
  4. (20pts) Consider the following DAG.
    [Figure: a DAG over the nodes X, R, S, T, U, V, and Y; the edges are not reproduced here.]
    (a) List all pairs of variables that are independent conditional on the set {R, V}. Use R code to get
    the answer.
    (b) For each pair of nonadjacent variables, give a set of variables that, when conditioned on, renders
    that pair independent. Use R code to get the answer.
    Hint: You can give the empty set if the pair of nonadjacent variables is marginally independent.
    Hint: You only need to provide one possible set. It does not need to be a minimal set.
    (c) Suppose we generate data by the model described in the DAG, and we fit them with the linear
    equation Y = a + bX + cZ. Which of the variables in the model may be chosen for Z so as to
    guarantee that the slope b would be equal to zero?
    Hint: Recall that a nonzero slope implies that Y and X are dependent given Z.
    Hint: Include all possible variables. Note also that Z is a single variable (not a combination of
    variables).
    (d) Suppose we fit the data with the equation
    Y = a + bX + cR + dS + eT.
    Which of the coefficients would be zero?
  5. (45pts) Consider the following DAG.
    [Figure: a DAG over the nodes X, W, Y, Z1, Z2, and Z3; the edges are not reproduced here.]
    (a) (5pts) For each pair of nonadjacent nodes in the graph, determine whether they are independent
    conditional on all the other variables in the DAG. Use R code to answer this question.
    (b) (5pts) For every variable V in the graph, find a minimal set of nodes that renders V independent
    of all the other variables in the graph. Use R code to answer this question.
    Hint: It is possible that the minimal set is an empty set.
    (c) (5pts) Suppose we wish to estimate the value of Y from measurements taken on all other variables in the model. Find the smallest set of variables that would yield as good an estimate of Y
    as when we measured all variables.
    (d) (5pts) Suppose we wish to predict the value of Z2 from measurements of Z3. Would the quality
    of our prediction improve if we added a measurement of W? Explain.
    (e) (5pts) Suppose we want to estimate the causal effect of X on Y. Which variables should you
    adjust for? It suffices to give one set of variables.
    (f) (15pts) Use the following command to load the data on (X, W, Y, Z1, Z2, Z3):
    load("Assignment3.RData")
    Write R code to estimate the average causal effect of X on Y using (i) outcome regression,
    assuming a linear model for the outcome (without a quadratic term); (ii) inverse probability
    weighting, assuming a logistic model for the treatment; (iii) doubly robust estimation. You
    should adjust for the same set of variables you gave in the previous question.
    (g) (5pts) Based on your estimates, do you think your models for the outcome regression and inverse
    probability weighting are correct? Explain.
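The three estimators named in part (f) can be sketched on simulated data. The example below is a hedged illustration in Python and not the requested R solution: it does not use Assignment3.RData, the single binary confounder L and all coefficients are invented, and the names `simulate` and `estimates` are hypothetical. To stay dependency-free, stratum means and stratum proportions stand in for the fitted outcome and treatment models; with real data these would be replaced by the linear and logistic regression fits the exercise specifies.

```python
import random

def simulate(n=20000, seed=1):
    """Hypothetical data: binary confounder L, binary treatment X, outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = int(rng.random() < 0.5)
        x = int(rng.random() < 0.2 + 0.6 * l)          # L raises the chance of treatment
        y = 1.0 + 2.0 * x + 3.0 * l + rng.gauss(0, 1)  # simulated effect of X on Y is 2
        rows.append((l, x, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def estimates(rows):
    # Treatment model stand-in: e[l] = proportion treated within stratum L = l.
    e = {l: mean([x for li, x, _ in rows if li == l]) for l in (0, 1)}
    # Outcome model stand-in: m[x, l] = mean of Y within stratum (X = x, L = l).
    m = {(x, l): mean([y for li, xi, y in rows if li == l and xi == x])
         for x in (0, 1) for l in (0, 1)}
    # Unadjusted contrast, shown only for comparison.
    naive = (mean([y for _, x, y in rows if x == 1])
             - mean([y for _, x, y in rows if x == 0]))
    # (i) Outcome regression: average the predicted contrasts over everyone.
    outcome_regression = mean([m[1, l] - m[0, l] for l, _, _ in rows])
    # (ii) Inverse probability weighting with the estimated propensities.
    ipw = mean([x * y / e[l] - (1 - x) * y / (1 - e[l]) for l, x, y in rows])
    # (iii) Doubly robust (augmented IPW): model contrast plus weighted residuals.
    doubly_robust = mean([
        m[1, l] - m[0, l]
        + x * (y - m[1, l]) / e[l]
        - (1 - x) * (y - m[0, l]) / (1 - e[l])
        for l, x, y in rows
    ])
    return {
        "naive": naive,
        "outcome_regression": outcome_regression,
        "ipw": ipw,
        "doubly_robust": doubly_robust,
    }

est = estimates(simulate())
for name, value in est.items():
    print(f"{name}: {value:.3f}")
```

In this simulated setting the three adjusted estimates should land close to the simulated effect of 2, while the unadjusted contrast is pulled away from it by L. Comparing the adjusted estimates with one another is the kind of check that part (g) has in mind.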
