# Smoking

This exercise is from [1].

The above table comes from one of the first studies of the link between lung cancer and smoking, by Richard Doll and A. Bradford Hill. In 20 hospitals in London, UK, patients admitted with lung cancer in the previous year were asked about their smoking behavior. For each of these patients, the researchers also recorded the smoking behavior of a non-cancer control patient at the same hospital, of the same sex and in the same 5-year age group. A smoker was defined as a person who had smoked at least one cigarette a day for at least a year.

1. Construct a table that represents this study in R.
2. Summarize the association, and explain how to interpret it.
3. Is the association significant?
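
As a starting point, here is a minimal sketch of how a 2x2 case-control table can be set up and analyzed in R. The counts below are hypothetical placeholders, not the values from the study; replace them with the numbers from the table. In a case-control design the numbers of cases and controls are fixed by the sampling, so the odds ratio is a natural summary of the association.

```r
# Hypothetical counts for illustration only, NOT the values from the study.
smoking <- matrix(c(90, 70,
                    10, 30),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(smoker = c("yes", "no"),
                                  group = c("cases", "controls")))

# Sample odds ratio: odds of smoking among cases divided by
# odds of smoking among controls.
odds_ratio <- (smoking["yes", "cases"] * smoking["no", "controls"]) /
  (smoking["yes", "controls"] * smoking["no", "cases"])

# Pearson chi-squared test of independence
# (Yates' continuity correction is the default for 2x2 tables).
test_result <- chisq.test(smoking)

odds_ratio
test_result$p.value
```

`fisher.test(smoking)` would be one alternative; it reports an exact p-value together with a confidence interval for the odds ratio.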

# Epitope data

Here’s the data again:

```r
epitope_data <- data.frame(
  position = 1:100,
  count = c(2, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 2, 2, 7, 1, 0, 2, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 2, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0)
)
```

As on day 3, we assume a known false positive rate of 1%. This time, we perform a binomial test at every position:

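```r
count <- epitope_data$count

# One-sided binomial test per position: is the observed count larger than
# expected under a 1% false positive rate with 50 trials?
p_values <- sapply(count, function(k) {
  binom.test(k, 50, p = 0.01, alternative = "greater")$p.value
})
```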
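
To make explicit what the one-sided test computes, the following sketch compares the `binom.test` p-value with the upper tail probability P(X ≥ k) of a binomial distribution, using the count observed at position 42. The variable names `n`, `p0` and `k` are introduced here for illustration only.

```r
n  <- 50    # number of trials per position, as in the call above
p0 <- 0.01  # assumed false positive rate
k  <- 7     # observed count at position 42

# p-value reported by the one-sided binomial test
p_binom_test <- binom.test(k, n, p = p0, alternative = "greater")$p.value

# Upper tail probability P(X >= k) = 1 - P(X <= k - 1)
p_tail <- pbinom(k - 1, n, p0, lower.tail = FALSE)

all.equal(p_binom_test, p_tail)
```

For positions with a count of 0, the same formula gives a p-value of 1, because P(X ≥ 0) = 1.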

Look up the p.adjust function. Can you use it to adjust the p-values and control the family-wise error rate? Is the peak at position 42 still significant after the correction?
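
As a hint on usage, here is a small sketch of `p.adjust` on a vector of hypothetical p-values (made up for illustration, not taken from the epitope data). The `"bonferroni"` and `"holm"` methods both control the family-wise error rate; the same call should work on `p_values` from above.

```r
# Hypothetical p-values for illustration only.
p_raw <- c(0.001, 0.02, 0.04, 0.3)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # 0.004 0.080 0.160 1.000

# Holm: step-down variant, never less powerful than Bonferroni.
p_holm <- p.adjust(p_raw, method = "holm")        # 0.004 0.060 0.080 0.300

# Which tests remain significant at the 5% level after correction?
which(p_holm < 0.05)
```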

# References

[1] Agresti, A. (2006). An Introduction to Categorical Data Analysis: Second Edition. https://doi.org/10.1002/0470114754