Exercise: Hazard Identification

MVEN10 Risk Assessment in Environment and Public Health

Author

Zheng Zhou

Published

September 12, 2025

Schedule

Part I, hazard identification: 1:00-1:50 pm

Part II, exposure assessment: 2:00-3:00 pm (do not forget the report)

Part I: Understand Threats to Validity in R

This part helps you practice your R skills and deepen your understanding of threats to internal validity. You will use the R skills acquired in previous classes to manage and inspect data in R, and then use these data to analyze threats to internal validity. An example R script is provided at the end of this part. You are encouraged to complete the exercise on your own first, then compare your code with the example script.

Submission requirements

If you are present and actively participate in the exercise, no report is required.

Instructions

Download the spreadsheet containing the hazard data.

Open the file in Excel and inspect the worksheet base. Use the following variable definitions to understand the data:

Subject ID: unique identifier of each subject.
Exposed: exposure status (0 = unexposed, 1 = exposed).
BPb: blood lead level of the subject, in parts per billion (ppb).
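Before calculating anything, it can help to check in R that the data actually follow these definitions. The following is a minimal sketch under stated assumptions. It uses a small made-up data frame, `toy_base`, in place of the real worksheet. It also assumes the column headers are exactly `Subject ID`, `Exposed` and `BPb`. The function `check_hazard_data()` is a hypothetical helper, not part of any package.

```r
# Hypothetical toy data standing in for worksheet "base".
# The values are made up for illustration only.
toy_base <- data.frame(
  `Subject ID` = 1:6,
  Exposed = c(0, 0, 0, 1, 1, 1),
  BPb = c(10, 12, 14, 20, 22, 24),
  check.names = FALSE  # keep the space in "Subject ID"
)

# Hypothetical helper: basic checks that the data follow the variable definitions
check_hazard_data <- function(df) {
  list(
    n_subjects = nrow(df),
    ids_unique = anyDuplicated(df$`Subject ID`) == 0,   # each subject appears once
    exposed_coded_0_1 = all(df$Exposed %in% c(0, 1)),  # FALSE if any other code or NA
    bpb_numeric = is.numeric(df$BPb),
    bpb_missing = sum(is.na(df$BPb))
  )
}

str(toy_base)            # column names and types
table(toy_base$Exposed)  # number of subjects per exposure group
check_hazard_data(toy_base)
```

Once the worksheet has been read into R, the same checks can be run with `check_hazard_data(base)`.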

Calculate the average BPb level for exposed and unexposed subjects separately, using Excel or R.
Compare worksheet expand with worksheet base. Discuss whether you believe the two worksheets are the same, and why.
Go to worksheets scen1 to scen4 and calculate the average BPb level for the exposed and unexposed subjects in each worksheet.
Create a simple comparison plot and a two-group pre-post comparison plot for each scenario based on the corresponding worksheet.
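If you use R for the averages, a grouped mean is all you need. The following is a minimal base-R sketch, not the course's reference solution. `summarise_by_exposure()` is a hypothetical helper and `toy_base` holds made-up values. The default column `BPb` follows the variable list above. For the scenario worksheets, the plotting code below reads a column called `Observed BPb`, so this sketch assumes that name there.

```r
# Hypothetical toy data with the same layout as worksheet "base".
# The values are made up for illustration only.
toy_base <- data.frame(
  `Subject ID` = 1:6,
  Exposed = c(0, 0, 0, 1, 1, 1),
  BPb = c(10, 12, 14, 20, 22, 24),
  check.names = FALSE
)

# Hypothetical helper: number of non-missing values and mean BPb per exposure group
summarise_by_exposure <- function(df, bpb_col = "BPb") {
  values <- df[[bpb_col]]
  group <- factor(df$Exposed, levels = c(0, 1), labels = c("unexposed", "exposed"))
  data.frame(
    group = levels(group),
    n = as.vector(tapply(!is.na(values), group, sum)),
    mean_bpb = as.vector(tapply(values, group, mean, na.rm = TRUE))
  )
}

summarise_by_exposure(toy_base)
# unexposed: n = 3, mean_bpb = 12; exposed: n = 3, mean_bpb = 22
```

With the worksheets read into R, calls such as `summarise_by_exposure(base)` and `summarise_by_exposure(scen1, "Observed BPb")` give the group sizes and averages. Reporting `n` next to the mean makes it easier to compare worksheets of different sizes.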

Below is code to read in the data and create the plots in R. Open your project in Posit Cloud and upload the data file to the data folder. Open an empty .qmd file, add an R chunk, and paste in the code. To split the code into multiple chunks, place the cursor where you want the split and press the green +C button.

library(readxl)
library(tidyverse)

# Folder (relative to the working directory) that holds the uploaded spreadsheet
path <- "data"
data_file <- file.path(path, "exercise_Data_hazardID.xlsx")

# Read each worksheet into its own data frame and print it for inspection
base <- read_xlsx(data_file, sheet = "base")
print(base)

expand <- read_xlsx(data_file, sheet = "expand")
print(expand)

scen1 <- read_xlsx(data_file, sheet = "scen1")
print(scen1)

scen2 <- read_xlsx(data_file, sheet = "scen2")
print(scen2)

scen3 <- read_xlsx(data_file, sheet = "scen3")
print(scen3)

scen4 <- read_xlsx(data_file, sheet = "scen4")
print(scen4)


# Function to create a simple comparison plot of observed BPb by exposure group
create_simple_plot <- function(df, scenario_name) {
  ggplot(df, aes(x = as.factor(Exposed), y = `Observed BPb`, fill = as.factor(Exposed))) +
    # Hide boxplot outliers; every point is already drawn by geom_jitter()
    geom_boxplot(alpha = 0.7, width = 0.5, outlier.shape = NA) +
    # Jitter horizontally only (height = 0) so the BPb values are not shifted
    geom_jitter(width = 0.1, height = 0, size = 2, alpha = 0.8) +
    scale_fill_manual(values = c("0" = "skyblue", "1" = "salmon")) +
    labs(title = paste0("Scenario ", gsub("scen", "", scenario_name),
                        ": Observed BPb by Exposure Group"),
         x = "Exposed (0 = No, 1 = Yes)",
         y = "Blood Lead Level (BPb)") +
    theme_minimal() +
    theme(legend.position = "none")
}

# Function to create group mean pre-post comparison plot
create_group_mean_plot <- function(df, scenario_name) {
  # Calculate group means
  group_means <- df %>%
    group_by(Exposed) %>%
    summarize(
      mean_before = mean(BPb_before, na.rm = TRUE),
      mean_after = mean(BPb_after, na.rm = TRUE)
    ) %>%
    pivot_longer(
      cols = c(mean_before, mean_after),
      names_to = "Time",
      values_to = "Mean_BPb"
    ) %>%
    mutate(
      Time = factor(Time,
                   levels = c("mean_before", "mean_after"),
                   labels = c("Pre", "Post")),
      Exposed_Group = ifelse(Exposed == 0, "Unexposed", "Exposed")
    )
  
  ggplot(group_means, aes(x = Time, y = Mean_BPb, group = Exposed_Group)) +
    geom_line(aes(color = Exposed_Group), alpha = 0.7, linewidth = 1.5) +
    geom_point(aes(color = Exposed_Group, shape = Exposed_Group), size = 4) +
    scale_color_manual(values = c("Unexposed" = "skyblue", "Exposed" = "salmon")) +
    scale_shape_manual(values = c("Unexposed" = 16, "Exposed" = 17)) +
    labs(title = paste0("Scenario ", gsub("scen", "", scenario_name),
                        ": Pre-Post Group Mean Comparison"),
         subtitle = "Lines show average BPb levels before and after exposure for each group",
         x = "Time",
         y = "Mean Blood Lead Level (BPb)",
         color = "Exposure Group",
         shape = "Exposure Group") +
    theme_minimal() +
    theme(legend.position = "bottom")
}

# Create and display plots for each scenario
# (assigning a plot to a name does not show it, so each plot is printed)

# Scenario 1
scen1_simple_plot <- create_simple_plot(scen1, "scen1")
scen1_group_prepost_plot <- create_group_mean_plot(scen1, "scen1")
print(scen1_simple_plot)
print(scen1_group_prepost_plot)

# Scenario 2
scen2_simple_plot <- create_simple_plot(scen2, "scen2")
scen2_group_prepost_plot <- create_group_mean_plot(scen2, "scen2")
print(scen2_simple_plot)
print(scen2_group_prepost_plot)

# Scenario 3
scen3_simple_plot <- create_simple_plot(scen3, "scen3")
scen3_group_prepost_plot <- create_group_mean_plot(scen3, "scen3")
print(scen3_simple_plot)
print(scen3_group_prepost_plot)

# Scenario 4
scen4_simple_plot <- create_simple_plot(scen4, "scen4")
scen4_group_prepost_plot <- create_group_mean_plot(scen4, "scen4")
print(scen4_simple_plot)
print(scen4_group_prepost_plot)

Discuss whether the simple comparison plots differ between scenarios, and whether the pre-post comparison plots differ between scenarios.
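Numbers can support the visual comparison. Each scenario can be summarised by two contrasts:

- the simple difference in mean observed BPb between exposed and unexposed subjects, and
- the difference between the two groups' pre-post changes, often called a difference-in-differences.

The following sketch assumes the column names used in the plotting code above: `Exposed`, `Observed BPb`, `BPb_before` and `BPb_after`. `compare_designs()` and `toy_scen` are hypothetical, and the values are made up.

```r
# Hypothetical toy scenario data; values are made up for illustration only.
toy_scen <- data.frame(
  Exposed = c(0, 0, 1, 1),
  BPb_before = c(10, 12, 15, 17),
  BPb_after = c(11, 13, 21, 23),
  `Observed BPb` = c(11, 13, 21, 23),
  check.names = FALSE
)

# Hypothetical helper: the contrasts behind the two kinds of plots
compare_designs <- function(df) {
  exp_rows <- which(df$Exposed == 1)
  unexp_rows <- which(df$Exposed == 0)
  group_mean <- function(col, rows) mean(df[[col]][rows], na.rm = TRUE)

  # Simple comparison: exposed minus unexposed, observed BPb only
  simple_diff <- group_mean("Observed BPb", exp_rows) -
    group_mean("Observed BPb", unexp_rows)

  # Pre-post change within each group, then the difference between the groups
  change_exposed <- group_mean("BPb_after", exp_rows) - group_mean("BPb_before", exp_rows)
  change_unexposed <- group_mean("BPb_after", unexp_rows) - group_mean("BPb_before", unexp_rows)

  c(simple_diff = simple_diff,
    change_exposed = change_exposed,
    change_unexposed = change_unexposed,
    diff_in_diff = change_exposed - change_unexposed)
}

compare_designs(toy_scen)
# simple_diff = 10, change_exposed = 6, change_unexposed = 1, diff_in_diff = 5
```

To compare all four scenarios side by side, call `sapply(list(scen1 = scen1, scen2 = scen2, scen3 = scen3, scen4 = scen4), compare_designs)`. This returns one column per scenario.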
