Poisson regression

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggbeeswarm)

Example data

I simulate a data set where we compare a control and two treatments (A and B), and the outcome is measured in counts. We have several data points per group:

set.seed(8)
poisson_data <- data.frame(group = c(rep("ctrl",10),rep("A",10),rep("B",10)),
                      count = c(rpois(n=10,lambda=0.5),rpois(n=10,lambda=1.5),rpois(n=10,lambda=3)))

# set ctrl as first level, so comparisons will be made against ctrl
poisson_data$group = factor(poisson_data$group, levels=c("ctrl","A","B"))

Look at the data

poisson_data %>% 
  ggplot(aes(group,count))+
  geom_beeswarm()

Set up a model

We can use a “generalized linear model” of the Poisson family, which works like a linear model (i.e. fits three means), but in each group the residuals are Poisson distributed around the mean.

poisson_model <- glm(count ~ group, data=poisson_data, family=poisson)

Test for individual differences between control and treatment

summary(poisson_model)

## 
## Call:
## glm(formula = count ~ group, family = poisson, data = poisson_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4400  -1.0677   0.1035   0.3367   1.8327  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.3567     0.3780  -0.944  0.34534    
## groupA        0.2513     0.5040   0.499  0.61800    
## groupB        1.5198     0.4173   3.642  0.00027 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 46.764  on 29  degrees of freedom
## Residual deviance: 24.333  on 27  degrees of freedom
## AIC: 87.309
## 
## Number of Fisher Scoring iterations: 5

The model fits the (log-)mean for the control (Intercept), the log-fold-change between groupA and control (groupA) and the log-fold-change between groupB and control (groupB).

R also performs a Wald test for each parameter, and it tests whether this parameter significantly improves the fit: whether the control mean is different from zero, whether there is a difference between ctrl-groupA and whether there is a difference between ctrl-groupB. This is given as Pr(>|z|) in the output of the summary function.
It doesn’t do any correction. Somehow, it’s not very common to do p-value correction when testing for several parameters of the same model, and this is admittedly inconsistent.

Test whether there is any difference between the three groups

If we want to test whether there is any difference between the groups, we have to specify the null hypothesis as a null model. This null model has only an intercept, that is we fit a “grand mean”, an average over all counts.

This is the null model:

nullmodel <- glm( count ~ 1, family=poisson, data= poisson_data)  # 1 means just intercept

The intercept of this model is the grand mean:

exp(nullmodel$coefficients)

## (Intercept) 
##         1.6

mean(poisson_data$count)

## [1] 1.6

Now we can compare the two models with the Anova:

anova(nullmodel, poisson_model, test="LRT")

## Analysis of Deviance Table
## 
## Model 1: count ~ 1
## Model 2: count ~ group
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1        29     46.764                          
## 2        27     24.333  2   22.431 1.346e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The likelihood ratio test (LRT) asks whether the two models significantly differ, and this is the same as asking whether not all groups are the same, or whether there are any differences between the groups.

Notice the difference to classical ANOVA:
- classical ANOVA assumes Gaussian residuals, calculates sum of squares and from this an F-statistic.
- Poisson Anova assumes Poisson residuals, calculates the deviance and from this a chi-square statistic.