library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ dplyr 1.0.6
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggbeeswarm)
I simulate a data set where we compare a control and two treatments (A and B), and the outcome is measured in counts. We have several data points per group:
set.seed(8)
poisson_data <- data.frame(group = c(rep("ctrl",10),rep("A",10),rep("B",10)),
count = c(rpois(n=10,lambda=0.5),rpois(n=10,lambda=1.5),rpois(n=10,lambda=3)))
# set ctrl as first level, so comparisons will be made against ctrl
poisson_data$group = factor(poisson_data$group, levels=c("ctrl","A","B"))
Look at the data
poisson_data %>%
ggplot(aes(group,count))+
geom_beeswarm()
We can use a “generalized linear model” of the Poisson family, which works like a linear model (i.e. fits three means), but in each group the residuals are Poisson distributed around the mean.
poisson_model <- glm(count ~ group, data=poisson_data, family=poisson)
summary(poisson_model)
##
## Call:
## glm(formula = count ~ group, family = poisson, data = poisson_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4400 -1.0677 0.1035 0.3367 1.8327
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3567 0.3780 -0.944 0.34534
## groupA 0.2513 0.5040 0.499 0.61800
## groupB 1.5198 0.4173 3.642 0.00027 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 46.764 on 29 degrees of freedom
## Residual deviance: 24.333 on 27 degrees of freedom
## AIC: 87.309
##
## Number of Fisher Scoring iterations: 5
The model fits the (log-)mean for the control (Intercept), the log-fold-change between groupA and control (groupA) and the log-fold-change between groupB and control (groupB).
R also performs a Wald test for each parameter, and it tests whether this parameter significantly improves the fit: whether the control mean is different from zero, whether there is a difference between ctrl-groupA and whether there is a difference between ctrl-groupB. This is given as Pr(>|z|) in the output of the summary function.
It doesn’t do any correction. Somehow, it’s not very common to do p-value correction when testing for several parameters of the same model, and this is admittedly inconsistent.
If we want to test whether there is any difference between the groups, we have to specify the null hypothesis as a null model. This null model has only an intercept, that is we fit a “grand mean”, an average over all counts.
This is the null model:
nullmodel <- glm( count ~ 1, family=poisson, data= poisson_data) # 1 means just intercept
The intercept of this model is the grand mean:
exp(nullmodel$coefficients)
## (Intercept)
## 1.6
mean(poisson_data$count)
## [1] 1.6
Now we can compare the two models with the Anova:
anova(nullmodel, poisson_model, test="LRT")
## Analysis of Deviance Table
##
## Model 1: count ~ 1
## Model 2: count ~ group
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 29 46.764
## 2 27 24.333 2 22.431 1.346e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The likelihood ratio test (LRT) asks whether the two models significantly differ, and this is the same as asking whether not all groups are the same, or whether there are any differences between the groups.
Notice the difference to classical ANOVA:
- classical ANOVA assumes Gaussian residuals, calculates sum of squares and from this an F-statistic.
- Poisson Anova assumes Poisson residuals, calculates the deviance and from this a chi-square statistic.