Exercise: Hazard Identification
MVEN10 Risk Assessment in Environment and Public Health
Schedule
Part I: hazard identification: 1-1:50 pm
Part II: exposure assessment: 2-3 pm (do not forget the report)
Part I: Understand Threats to Validity in R
This part lets you practice your R skills and deepen your understanding of threats to internal validity. You will use the R skills acquired in previous classes to manage and inspect data in R. These data are then used in an analysis of threats to internal validity. An example R script is provided at the end of this part. You are encouraged to complete the exercise by yourself and then compare your code with the example script.
Submission requirements
If you are present and active at the exercise, no report is required.
Instructions
- Download the spreadsheet with different hazard data.
- Open the file in Excel. Inspect worksheet base. Understand the data using the following variable definitions:
Subject ID: unique identifier of the subject.
Exposed: exposure status. 0 = unexposed, 1 = exposed.
BPb: blood lead level of the subject (parts per billion).
- Calculate the average BPb level for exposed and unexposed subjects, respectively. Use Excel or R.
- Compare worksheet expand with worksheet base. Discuss whether you believe they are the same and why.
- Go to worksheets scen1 to scen4 and calculate the average BPb level for the exposed and unexposed subjects in each data set.
- Create a simple comparison plot and a two-group pre-post comparison plot for each scenario based on the corresponding worksheet.
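The group averages asked for in the steps above can be computed in R with a grouped summary. The following is a minimal sketch, not the course's example script: the data frame toy and its values are made up for illustration, and the column names Exposed and BPb are assumed to match the variable definitions given for worksheet base. Replace toy with the data you read from the spreadsheet.

```r
library(dplyr)

# Hypothetical toy data: the values are invented and do not come from the spreadsheet
toy <- tibble(
  Exposed = c(0, 0, 0, 1, 1, 1),
  BPb     = c(10, 12, 14, 20, 22, 24)
)

# Average BPb and number of subjects per exposure group
group_avg <- toy %>%
  group_by(Exposed) %>%
  summarize(
    mean_BPb = mean(BPb, na.rm = TRUE),
    n        = n()
  )

print(group_avg)
```

The same pattern should work for the scenario worksheets, provided you summarize the column that holds the BPb measurement in that worksheet.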
Below is code to read in the data and create the plots in R. Open your project in Posit Cloud. Upload the data file to the data folder. Open an empty qmd file, add an R chunk, and paste in the code. You can create multiple chunks by placing the cursor where you want a divide and pressing the green +C button.
library(readxl)
library(tidyverse)

# Folder that holds the uploaded spreadsheet
path <- "data"

# Read each worksheet into its own data frame and inspect it
base <- read_xlsx(file.path(path, "exercise_Data_hazardID.xlsx"), sheet = "base")
print(base)

expand <- read_xlsx(file.path(path, "exercise_Data_hazardID.xlsx"), sheet = "expand")
print(expand)

scen1 <- read_xlsx(file.path(path, "exercise_Data_hazardID.xlsx"), sheet = "scen1")
print(scen1)

scen2 <- read_xlsx(file.path(path, "exercise_Data_hazardID.xlsx"), sheet = "scen2")
print(scen2)

scen3 <- read_xlsx(file.path(path, "exercise_Data_hazardID.xlsx"), sheet = "scen3")
print(scen3)

scen4 <- read_xlsx(file.path(path, "exercise_Data_hazardID.xlsx"), sheet = "scen4")
print(scen4)

# Function to create simple comparison plot
create_simple_plot <- function(df, scenario_name) {
  ggplot(df, aes(x = as.factor(Exposed), y = `Observed BPb`, fill = as.factor(Exposed))) +
    geom_boxplot(alpha = 0.7, width = 0.5) +
    geom_jitter(width = 0.1, size = 2, alpha = 0.8) +
    scale_fill_manual(values = c("0" = "skyblue", "1" = "salmon")) +
    labs(title = paste("Scenario", gsub("scen", "", scenario_name),
                       ": Observed BPb by Exposure Group"),
         x = "Exposed (0 = No, 1 = Yes)",
         y = "Blood Lead Level (BPb)") +
    theme_minimal() +
    theme(legend.position = "none")
}

# Function to create group mean pre-post comparison plot
create_group_mean_plot <- function(df, scenario_name) {
  # Calculate group means
  group_means <- df %>%
    group_by(Exposed) %>%
    summarize(
      mean_before = mean(BPb_before, na.rm = TRUE),
      mean_after = mean(BPb_after, na.rm = TRUE)
    ) %>%
    pivot_longer(
      cols = c(mean_before, mean_after),
      names_to = "Time",
      values_to = "Mean_BPb"
    ) %>%
    mutate(
      Time = factor(Time,
                    levels = c("mean_before", "mean_after"),
                    labels = c("Pre", "Post")),
      Exposed_Group = ifelse(Exposed == 0, "Unexposed", "Exposed")
    )

  # Plot one line per exposure group from the pre to the post mean
  ggplot(group_means, aes(x = Time, y = Mean_BPb, group = Exposed_Group)) +
    geom_line(aes(color = Exposed_Group), alpha = 0.7, linewidth = 1.5) +
    geom_point(aes(color = Exposed_Group, shape = Exposed_Group), size = 4) +
    scale_color_manual(values = c("Unexposed" = "skyblue", "Exposed" = "salmon")) +
    scale_shape_manual(values = c("Unexposed" = 16, "Exposed" = 17)) +
    labs(title = paste("Scenario", gsub("scen", "", scenario_name),
                       ": Pre-Post Group Mean Comparison"),
         subtitle = "Lines show average BPb levels before and after exposure for each group",
         x = "Time",
         y = "Mean Blood Lead Level (BPb)",
         color = "Exposure Group",
         shape = "Exposure Group") +
    theme_minimal() +
    theme(legend.position = "bottom")
}

# The plots below are stored in objects; type an object's name
# (e.g., scen1_simple_plot) on its own line to display it

# Create plots for Scenario 1
scen1_simple_plot <- create_simple_plot(scen1, "scen1")
scen1_group_prepost_plot <- create_group_mean_plot(scen1, "scen1")

# Create plots for Scenario 2
scen2_simple_plot <- create_simple_plot(scen2, "scen2")
scen2_group_prepost_plot <- create_group_mean_plot(scen2, "scen2")

# Create plots for Scenario 3
scen3_simple_plot <- create_simple_plot(scen3, "scen3")
scen3_group_prepost_plot <- create_group_mean_plot(scen3, "scen3")

# Create plots for Scenario 4
scen4_simple_plot <- create_simple_plot(scen4, "scen4")
scen4_group_prepost_plot <- create_group_mean_plot(scen4, "scen4")

- Discuss whether the simple comparison plots differ between scenarios, and whether the pre-post comparison plots differ between scenarios.
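To support the discussion, it may help to put numbers on what the two plot types show. The sketch below is one possible way to do this, not part of the course's example script. It assumes a data frame with the columns Exposed, BPb_before, and BPb_after, as used by the pre-post plotting function; the data frame toy and its values are invented, and the names simple_diff and prepost_diff are chosen here for illustration only.

```r
library(dplyr)

# Hypothetical toy data: column names follow the example script, values are invented
toy <- tibble(
  Exposed    = c(0, 0, 1, 1),
  BPb_before = c(10, 12, 14, 16),
  BPb_after  = c(11, 13, 20, 22)
)

# Mean BPb before and after, per exposure group
means <- toy %>%
  group_by(Exposed) %>%
  summarize(
    mean_before = mean(BPb_before, na.rm = TRUE),
    mean_after  = mean(BPb_after, na.rm = TRUE)
  )

# Simple comparison: exposed minus unexposed, using only the "after" measurements
simple_diff <- means$mean_after[means$Exposed == 1] -
  means$mean_after[means$Exposed == 0]

# Pre-post comparison: change in the exposed group minus change in the unexposed group
change <- means$mean_after - means$mean_before
prepost_diff <- change[means$Exposed == 1] - change[means$Exposed == 0]

print(c(simple = simple_diff, prepost = prepost_diff))
```

With these invented values the simple difference is 9 and the pre-post difference is 5. The gap between the two arises because the toy groups already differ before exposure, which the simple comparison cannot distinguish from an effect of exposure.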