Exercise: Introduction to Causal Inference

MVEN10 Risk Assessment in Environment and Public Health

Author

Zheng Zhou

Instructions

Welcome to the exercise for the course MVEN10. It is designed to help you self-assess your understanding and application of the key concepts discussed in the lectures. Please read the instructions and the description of each knowledge point carefully before you begin.

Note

Data analysis questions (1c, 2b, 3c, 4b & 4c) are optional in this class.

Exercise Details

  • Time Limit: 60 minutes
  • Total Questions: 4
  • Purpose: To evaluate your comprehension and practical application of causal inference methodologies through both theoretical questions and data analysis tasks.

Knowledge Points Covered

  1. Correlation is Not Causation: Understanding the difference between correlation and causation and why correlation alone does not imply a causal relationship.
  2. Observational Designs: Understanding different observational study designs, including simple comparison, one-group pre-post comparison, and multi-group pre-post comparison.
  3. Counterfactual and Threats to Internal Validity: The concept of counterfactuals and identifying threats to internal validity in study designs.
  4. Causal Diagrams (DAGs): Interpreting and constructing Directed Acyclic Graphs to represent causal relationships.
  5. Confounding: Identifying confounders and understanding how they affect causal inference.
  6. Randomized Controlled Trial (RCT) and Causal Effect Estimation: The importance of RCTs in estimating causal effects and understanding how randomization helps in causal inference.

Exercise Questions

Question 1: Correlation is Not Causation, and Causal Diagrams

You are provided with a dataset containing information on individuals’ coffee consumption, smoking status, and the incidence of heart disease. The data shows a positive correlation between coffee consumption and heart disease.

Tasks:

(a) Explain why correlation does not imply causation in this context. What other factors might be influencing this observed correlation?

(b) Draw a causal diagram (Directed Acyclic Graph) to represent the relationship between coffee consumption, heart disease, and smoking status. Identify and explain the role of the confounder in this diagram.

Optional

(c) Using the provided dataset, perform a logistic regression analysis to estimate the effect of coffee consumption on heart disease without controlling for smoking status. Repeat the analysis controlling for smoking status and compare the results. What do the differences in the estimates tell you about the importance of confounders?
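
As a starting point for (c), the comparison of an unadjusted and an adjusted logistic regression might be sketched as follows. This is only an illustration under stated assumptions: the cell counts are invented (they are not the course dataset), coffee consumption and smoking are both treated as binary, and the small Newton-Raphson fitter `fit_logistic` and its helper `solve` are hypothetical stand-ins for whatever statistical software you use. The counts are constructed so that coffee has no association with heart disease within either smoking stratum.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_logistic(rows, n_iter=25):
    """Newton-Raphson fit. Each row is (predictors incl. intercept, outcome, count)."""
    p = len(rows[0][0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, y, w in rows:
            eta = sum(b * xi for b, xi in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for i in range(p):
                grad[i] += w * (y - mu) * x[i]
                for j in range(p):
                    hess[i][j] += w * mu * (1.0 - mu) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

# Invented counts: (coffee, smoker, heart_disease, number of people).
# Smokers mostly drink coffee and have a 30% risk; non-smokers a 10% risk.
cells = [
    (1, 1, 1, 24), (1, 1, 0, 56), (0, 1, 1, 6), (0, 1, 0, 14),
    (1, 0, 1, 2), (1, 0, 0, 18), (0, 0, 1, 8), (0, 0, 0, 72),
]

# Model 1: heart disease ~ coffee (smoking ignored).
crude_rows = [([1.0, c], hd, n) for c, s, hd, n in cells]
# Model 2: heart disease ~ coffee + smoker.
adjusted_rows = [([1.0, c, s], hd, n) for c, s, hd, n in cells]

crude_or = math.exp(fit_logistic(crude_rows)[1])
adjusted_or = math.exp(fit_logistic(adjusted_rows)[1])

print(f"Odds ratio for coffee, unadjusted: {crude_or:.2f}")
print(f"Odds ratio for coffee, adjusted for smoking: {adjusted_or:.2f}")
```

In this constructed example the unadjusted odds ratio is well above 1 while the adjusted one is 1, because the whole crude association runs through smoking. With real data the change is rarely this clean, and a shift in the estimate after adjustment only suggests confounding if smoking is a common cause rather than a mediator or collider.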

Question 2: Observational Designs and Confounding

Consider a study that aims to evaluate the impact of a new training program on employee productivity. The dataset includes pre-training and post-training productivity scores for two groups: one that received the training and one that did not.

Tasks:

(a) Draw treatment-time plots and use them to explain the differences between a simple comparison, a one-group pre-post comparison, and a multi-group pre-post comparison design in observational studies. Discuss the strengths and limitations of each design.

Optional

(b) Analyze the provided dataset using a multi-group pre-post comparison design to estimate the effect of the training program on productivity. Specifically, use a difference-in-differences (DiD) approach to estimate the causal effect of the training program. Interpret your findings.

(c) Discuss potential confounders in this study. How could they bias the results of your analysis, and how might you address these confounders in your design or analysis?
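
The three designs in this question can be contrasted numerically with a minimal sketch. The productivity scores below are made up for illustration and are not the course dataset; the variable names are likewise assumptions. The difference-in-differences estimate is only a causal effect if the two groups would have followed parallel trends without the training.

```python
from statistics import mean

# Made-up productivity scores for five employees per group.
trained_pre = [50.0, 52.0, 48.0, 51.0, 49.0]
trained_post = [58.0, 60.0, 55.0, 59.0, 58.0]
control_pre = [45.0, 47.0, 44.0, 46.0, 43.0]
control_post = [48.0, 50.0, 46.0, 49.0, 47.0]

# Simple comparison: trained vs. untrained after the program only.
simple_comparison = mean(trained_post) - mean(control_post)

# One-group pre-post: change in the trained group only.
one_group_pre_post = mean(trained_post) - mean(trained_pre)

# Multi-group pre-post (difference-in-differences):
# change in the trained group minus change in the control group.
change_control = mean(control_post) - mean(control_pre)
did_estimate = one_group_pre_post - change_control

print(f"Simple comparison:         {simple_comparison:.1f}")
print(f"One-group pre-post:        {one_group_pre_post:.1f}")
print(f"Difference-in-differences: {did_estimate:.1f}")
```

Here the simple comparison mixes the training effect with the baseline gap between the groups, and the one-group pre-post change mixes it with whatever changed over time for everyone. The difference-in-differences removes both, provided the parallel-trends assumption holds.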

Question 3: Counterfactuals and Threats to Internal Validity

A study investigates the effect of a new medication on blood pressure. The dataset includes a group of patients who received the medication and a control group that did not.

Tasks:

(a) Define the concept of a counterfactual in the context of this study. What would the counterfactual scenario represent for a patient in the treatment group?

(b) Identify and discuss at least two potential threats to internal validity in this study. How could these threats affect the estimation of the causal effect of the medication on blood pressure?

Optional

(c) Using the provided dataset, estimate the average treatment effect (ATE) of the medication on blood pressure using a regression model. Discuss any assumptions you made in your analysis and how they might affect the validity of your findings.
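
One way to approach (c) is a linear regression of blood pressure on a treatment indicator plus measured covariates. The sketch below is an illustration under stated assumptions, not the intended solution: the patients are invented, age is the only covariate and is assumed to be the only confounder, the true effect is set to -10 by construction, and `solve` and `ols` are hypothetical helpers standing in for a regression routine.

```python
from statistics import mean

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Invented patients: treated patients are older on average, and blood
# pressure rises with age, so age confounds the naive comparison.
TRUE_EFFECT = -10.0
patients = []  # (treated, age, blood_pressure)
for treated, ages in [(1, [50, 60, 70]), (0, [40, 50, 60])]:
    for age in ages:
        for noise in (2.0, -2.0):
            bp = 120.0 + 0.5 * age + TRUE_EFFECT * treated + noise
            patients.append((treated, age, bp))

# Naive estimate: difference in mean blood pressure between the groups.
naive_diff = (mean(bp for t, a, bp in patients if t == 1)
              - mean(bp for t, a, bp in patients if t == 0))

# Regression estimate: coefficient on the treatment indicator, adjusting for age.
X = [[1.0, float(t), float(a)] for t, a, bp in patients]
y = [bp for t, a, bp in patients]
adjusted_ate = ols(X, y)[1]

print(f"Naive difference in means: {naive_diff:.1f}")
print(f"Age-adjusted estimate:     {adjusted_ate:.1f}")
```

The adjusted coefficient can be read as the ATE only if there is no unmeasured confounding, the linear model is correctly specified, and the effect is the same for every patient. These are the kinds of assumptions the task asks you to state and discuss.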

Question 4: Randomized Controlled Trials and Causal Effect Estimation

You are given data from a randomized controlled trial (RCT) designed to assess the effectiveness of a new diet program in reducing cholesterol levels. Participants were randomly assigned to either the diet program or a control group.

Tasks:

(a) Explain the importance of randomization in this study and how it helps in making causal inferences.

Optional

(b) Analyze the provided RCT dataset to estimate the causal effect of the diet program on cholesterol levels. Use a t-test or regression analysis to estimate the average treatment effect (ATE).

(c) Discuss how non-compliance (i.e., participants not adhering to their assigned group) could affect the results of the RCT. How might you address non-compliance in your analysis?
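
For (b), a difference in means with a Welch t-statistic might be sketched as below. The cholesterol values are made up and are not the course dataset. Participants are assumed to be grouped by the arm they were assigned to, regardless of adherence, which makes this an intention-to-treat estimate and ties in with the non-compliance discussion in (c).

```python
import math
from statistics import mean, variance

# Made-up cholesterol levels (mg/dL) at follow-up, grouped by ASSIGNED arm.
diet = [190, 185, 200, 178, 192, 188, 181, 195]
control = [205, 198, 210, 202, 195, 207, 199, 204]

# Under randomization the difference in means estimates the average
# effect of being assigned to the diet program.
ate = mean(diet) - mean(control)

# Welch standard error: does not assume equal variances in the two arms.
se = math.sqrt(variance(diet) / len(diet) + variance(control) / len(control))
t_stat = ate / se

print(f"Estimated ATE (assigned diet minus control): {ate:.3f} mg/dL")
print(f"Welch standard error: {se:.3f}")
print(f"t-statistic: {t_stat:.2f}")
```

Regressing the outcome on a 0/1 assignment indicator gives the same point estimate. Regrouping participants by the treatment they actually followed would break the randomization and can reintroduce confounding.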