Conjugate models

A solution to exercise BADT

Author

Sahlin

library(ggplot2)

Task 1

Test an hypothesis using conjugate model. How sensitive is the conclusion from the choice of prior?

Let \(\lambda\) be the average number of goals scored in a Women’s World Cup game. We’ll analyse \(\lambda\) by a Gamma-Poisson model where data \(Y_i\) is the observed number of goals scored in a sample of World Cup games

Specify the model
Plot and summarize our prior understanding of \(\lambda\).
Why is the Poisson model a reasonable choice for our data
\(Y_i\)?

Use the wwc_2019_matches data from fivethirtyeight

The wwc_2019_matches data includes the number of goals scored by the two teams in each 2019 Women’s World Cup match. We summed the scores by the two teams per game, made a histogram and calculated some summary statistics:

Some larger datasets need to be installed separately, like senators and
house_district_forecast. To install these, we recommend you install the
fivethirtyeightdata package by running:
install.packages('fivethirtyeightdata', repos =
'https://fivethirtyeightdata.github.io/drat/', type = 'source')

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   2.000   3.000   2.808   3.000  13.000

[1] 52

Make a test if the average number of goals scored in a Women’s World Cup game is less than 1.8.

# Summary statistics of data
m <- mean(goals)
n <- length(goals)

# Prior 

## Flat
s_prior = 1 #shape
r_prior = 1/2 #rate
#expected value is shape/rate
s_prior/r_prior

[1] 2

#variance increase with shape and decrease with rate
s_prior/r_prior^2

[1] 4

## Strong
s_prior = 1*20
r_prior = 1/2*20
#expected value is shape/rate. 
s_prior/r_prior

[1] 2

#variance increase with shape and decrease with rate
s_prior/r_prior^2

[1] 0.2

# Posterior
s_post = s_prior + m*n #shape
r_post = r_prior + n #rate

pp <- ppoints(200)
#yy <- seq(0.01,10,by=0.01)
yy <- qgamma(pp,shape=s_prior,rate=r_prior)
df <- data.frame(goals=c(yy,yy),
           pdf=c(dgamma(yy,shape=s_prior,rate=r_prior),dgamma(yy,shape=s_post,rate=r_post)),
           dist=rep(c("prior","post"),each=200))

ggplot(df,aes(x=goals,y=pdf,col=dist)) +
  geom_line() +
  annotate(geom="point",x=m,y=0) +
  geom_vline(xintercept=1.8)

# test if lambda is less than 1.8

# I calculate the probability that lambda is less than 1.8 using the posterior. If this value is not super small, I will consider it to 
pgamma(1.8,shape=s_post,rate=r_post)

[1] 9.125193e-07

# 95% probability interval 
c(qgamma(0.025,shape=s_post,rate=r_post),
qgamma(0.975,shape=s_post,rate=r_post))

[1] 2.285608 3.099771

Task 2

Use conjugate models to construct probability intervals for the results from a clinical trial and compare to a published meta-analysis.

Lancet paper

A relative risk (RR) is a ratio of the probability of an event occurring in the exposed group versus the probability of the event occurring in the non-exposed group.

Reproduce the probability interval for a RR from the forest plots tables e.g. Scales et al 2003
Calculate a probability interval for one of the studies for which the RR is not calculable, e.g. Hall et al 2014

# data Scales
y_mask = 3
N_mask = 16
y_nomask = 4
N_nomask = 15

if(FALSE){
# not calculable Hall
y_mask = 0
N_mask = 42
y_nomask = 0
N_nomask = 6
}

# prior
a_prior = 1
b_prior = 1

# posterior
a_mask = a_prior + y_mask
b_mask = b_prior + N_mask - y_mask
a_nomask = a_prior + y_nomask
b_nomask = b_prior + N_nomask - y_nomask

pp = ppoints(200)
df <- data.frame(theta=pp,pdf=c(dbeta(pp,a_mask,b_mask),dbeta(pp,a_nomask,b_nomask)),treatment=rep(c("mask","no mask"),each=200))

ggplot(df,aes(x=theta,y=pdf,col=treatment))+
  geom_line()

# sample to calculate posterior for the derived quantity RR
niter = 10^4

RR = rbeta(niter,a_mask,b_mask)/rbeta(niter,a_nomask,b_nomask)

# summarise the posterior for RR - a value less than 1 is in favour of wearing face masks

mean(RR) # posterior mean

[1] 0.8918846

quantile(RR,probs=c(0.025,0.975)) # posterior 95% probability interval

     2.5%     97.5% 
0.2056288 2.4259585