Teacher: Sarah Kaspar (Centre for Statistical Data Analysis, EMBL Heidelberg)
License: CC BY-NC-SA
Each day consists of lectures and practical demonstrations in R from 10:00 to 12:00 and tutored exercises from 13:00 to 15:00, with a one-hour lunch break in between.
To install all packages necessary for completing the exercises and running the demonstrations, run the following command:
source("https://www.huber.embl.de/users/kaspar/biostat_2021/install_packages_biostat.R")
Thanks to Mike Smith for providing the installation code.
This day covers data frames, basic data wrangling (select, filter, mutate, summarize) and ggplot2.
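As a rough illustration of the four wrangling verbs and a first plot, here is a minimal sketch. It assumes the dplyr and ggplot2 packages are installed, and it uses R's built-in iris data set, which is chosen only for convenience and is not necessarily the data used in the course.

```r
library(dplyr)
library(ggplot2)

# iris ships with R; it stands in for the course data (an assumption).
iris_summary <- iris %>%
  select(Species, Sepal.Length, Sepal.Width) %>%   # keep three columns
  filter(Sepal.Length > 5) %>%                      # keep rows with sepals longer than 5 cm
  mutate(ratio = Sepal.Length / Sepal.Width) %>%    # add a derived column
  group_by(Species) %>%                             # summarize separately per species
  summarize(mean_ratio = mean(ratio), n = n())

iris_summary

# A basic scatter plot: one point per flower, coloured by species.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point()
p
```

Each verb takes a data frame and returns a data frame, which is why the steps can be chained with the pipe.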
We learn what sampling and statistical distributions are, and look at some common distributions of biological data. The R demo and exercises are about understanding probability, cumulative distributions and quantiles in R, and we use tools that help you decide on a distribution for your data.
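The relation between probabilities, cumulative distributions and quantiles can be sketched with base R's d/p/q/r function families. The numbers below (10 trials, success probability 0.5, 100 simulated values) are illustrative assumptions, and the Q-Q plot is just one possible tool for judging a distribution, not necessarily the one used in the course.

```r
# d*: probability of exactly 3 successes in 10 trials
p_exact <- dbinom(3, size = 10, prob = 0.5)

# p*: cumulative probability of at most 3 successes
p_cum <- pbinom(3, size = 10, prob = 0.5)

# q*: smallest number of successes k with P(X <= k) >= 0.5
q_med <- qbinom(0.5, size = 10, prob = 0.5)

# r*: draw a random sample from a distribution
set.seed(42)                      # for reproducibility
x <- rnorm(100, mean = 0, sd = 1)

# Compare the sample quantiles against normal quantiles;
# points close to the line suggest a normal distribution fits.
qqnorm(x)
qqline(x)
```

The same prefixes work for other distributions, for example dpois, ppois, qpois and rpois for the Poisson distribution.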
The goal here is to understand how hypothesis tests work in general. Specifically, we look at the binomial test and tests for comparing two groups (t-test, Wilcoxon test).
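The three tests named above are all available in base R. The following sketch uses invented counts and simulated measurements (assumptions made purely for illustration) to show how each test is called and where the p-value is stored.

```r
# Binomial test: are 7 successes in 10 trials compatible with p = 0.5?
bt <- binom.test(7, 10, p = 0.5)
bt$p.value

# Two simulated groups with different means (hypothetical data).
set.seed(1)
group_a <- rnorm(20, mean = 10, sd = 1)
group_b <- rnorm(20, mean = 11, sd = 1)

# t-test (Welch's version by default): compares the group means.
tt <- t.test(group_a, group_b)
tt$p.value

# Wilcoxon rank-sum test: a rank-based alternative that does not
# assume normally distributed data.
wt <- wilcox.test(group_a, group_b)
wt$p.value
```

In each case the returned object also contains the test statistic and, where applicable, a confidence interval.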
Contingency tables are tables of counts from different categories. We use them to find out whether two categorical variables (e.g. disease and treatment) are associated.
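A small sketch of how such a table might be built and tested in base R. The counts are made up for illustration, and the chi-squared and Fisher tests shown here are common choices for this question rather than a statement of what the course uses.

```r
# Hypothetical 2 x 2 table: rows are treatment, columns are outcome.
tab <- matrix(c(30, 10,
                15, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(treatment = c("drug", "placebo"),
                              outcome   = c("recovered", "not recovered")))
tab

# Chi-squared test of independence (with continuity correction for 2 x 2 tables).
chi <- chisq.test(tab)
chi$p.value

# Fisher's exact test, often preferred when counts are small.
fis <- fisher.test(tab)
fis$p.value
```

A small p-value suggests that outcome and treatment are not independent in these invented data.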
In multiple testing scenarios, we adjust the p-values to reduce false positives.
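As a minimal sketch, base R's p.adjust applies such corrections. The five p-values are invented, and the two methods shown (Bonferroni and Benjamini-Hochberg) are assumptions about which corrections are relevant here.

```r
# Hypothetical raw p-values from five tests.
pvals <- c(0.001, 0.01, 0.02, 0.04, 0.2)

# Bonferroni: multiply each p-value by the number of tests (capped at 1).
p_bonf <- p.adjust(pvals, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate, less conservative.
p_bh <- p.adjust(pvals, method = "BH")

data.frame(raw = pvals, bonferroni = p_bonf, BH = p_bh)
```

Adjusted p-values are never smaller than the raw ones, so fewer tests pass a fixed significance threshold after correction.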