# About

**Teacher**: Sarah Kaspar (Centre for Statistical Data Analysis, EMBL Heidelberg)

**License**: CC BY-NC-SA

# Schedule

- Tuesday, 14 Sep 2021, 10:00 – 15:00 CEST

- Thursday, 16 Sep 2021, 10:00 – 15:00 CEST

- Tuesday, 21 Sep 2021, 10:00 – 15:00 CEST

- Thursday, 23 Sep 2021, 10:00 – 15:00 CEST

Each day consists of lectures and practical demonstrations in R from 10:00 to 12:00, a one-hour lunch break, and tutored exercises from 13:00 to 15:00.

# Materials

To install all packages necessary for completing the exercises and running the demonstrations, run the following command:

`source("https://www.huber.embl.de/users/kaspar/biostat_2021/install_packages_biostat.R")`

Thanks to Mike Smith for providing the installation code.

## Day 1 - Exploratory data analysis

This day covers data frames, basic data wrangling (`select`, `filter`, `mutate`, `summarize`) and `ggplot2`.
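As a rough illustration of how these verbs fit together, here is a minimal sketch using the `iris` data frame that ships with R. The choice of data set, the derived column `ratio`, and the use of `group_by` are assumptions made for this example and are not taken from the course materials.

```r
library(dplyr)
library(ggplot2)

# iris is a built-in data frame with 150 flowers from three species
iris_summary <- iris %>%
  select(Species, Sepal.Length, Sepal.Width) %>%   # keep only three columns
  filter(Sepal.Length > 5) %>%                     # keep rows that meet a condition
  mutate(ratio = Sepal.Length / Sepal.Width) %>%   # add a derived column (hypothetical name)
  group_by(Species) %>%                            # summarize per species
  summarize(mean_ratio = mean(ratio), n = n())

# Scatter plot of the two sepal measurements, colored by species
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()
```

Printing `iris_summary` shows one row per species, and printing `p` draws the plot.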

## Day 2 - Statistical distributions

We learn what sampling and statistical distributions are, and look at some common distributions of biological data. The R demo and exercises are about understanding probability, cumulative distributions and quantiles in R, and we use tools that help you decide on a distribution for your data.
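Base R follows a naming scheme for distributions: the prefixes `d`, `p`, `q` and `r` give the probability (or density), the cumulative probability, the quantile and random samples. The sketch below uses the binomial and normal distributions as examples; the specific numbers are chosen for illustration only and are not from the course.

```r
# Binomial distribution: 10 trials, success probability 0.5
p_exact <- dbinom(3, size = 10, prob = 0.5)    # P(X = 3)
p_cum   <- pbinom(3, size = 10, prob = 0.5)    # P(X <= 3)
q_med   <- qbinom(0.5, size = 10, prob = 0.5)  # smallest k with P(X <= k) >= 0.5

# Normal distribution: cumulative probability and quantile are inverses
p_norm <- pnorm(1.96)       # P(Z <= 1.96) for a standard normal
q_norm <- qnorm(p_norm)     # recovers 1.96

# Random sampling: draw 1000 values from a standard normal
set.seed(1)                 # fixed seed so the draw is reproducible
x <- rnorm(1000)
```

Here `p_exact` equals 120/1024, `p_cum` equals 176/1024, and `q_med` is 5.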

## Day 3 - Hypothesis testing

The goal here is to understand how hypothesis tests work in general. Specifically, we look at the binomial test and tests for comparing two groups (t-test, Wilcoxon test).
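The three tests named above are available in base R as `binom.test`, `t.test` and `wilcox.test`. The following sketch runs each of them on made-up data; the coin-toss counts and the simulated groups `a` and `b` are assumptions for illustration, not course data.

```r
# Binomial test: are 7 successes in 10 trials compatible with p = 0.5?
bt <- binom.test(x = 7, n = 10, p = 0.5)

# Two simulated groups whose true means differ by 1 (hypothetical data)
set.seed(42)
a <- rnorm(20, mean = 0, sd = 1)
b <- rnorm(20, mean = 1, sd = 1)

# t-test (R uses the Welch variant by default, which does not assume equal variances)
tt <- t.test(a, b)

# Wilcoxon rank-sum test: a rank-based alternative without a normality assumption
wt <- wilcox.test(a, b)

# Each result is a list; the p-value is stored in the p.value element
c(binomial = bt$p.value, t_test = tt$p.value, wilcoxon = wt$p.value)
```

All three functions return an object of class `htest`, so the p-value, test statistic and confidence interval can be extracted in the same way.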

## Day 4 - Contingency tables and multiple testing

Contingency tables are tables of counts from different categories. We use them to find out whether two categorical variables (e.g. disease and treatment) are associated.
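As a small sketch of such an analysis, the code below builds a 2×2 table and tests for association. The counts are invented for illustration, and the choice of Fisher's exact test and the chi-squared test is an assumption: the text above does not state which tests the course uses.

```r
# Hypothetical counts: rows are treatment groups, columns are outcomes
tab <- matrix(c(30, 10,
                15, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(treatment = c("drug", "placebo"),
                              outcome   = c("recovered", "not_recovered")))

# Fisher's exact test for association between the two variables
ft <- fisher.test(tab)

# Chi-squared test (with continuity correction by default for 2x2 tables)
ct <- chisq.test(tab)

# Counts expected if treatment and outcome were independent
ct$expected
```

With raw data in a data frame, `table(df$treatment, df$outcome)` would produce the same kind of table from two categorical columns (`df` being a hypothetical data frame).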

In multiple testing scenarios, where many hypotheses are tested at once, we adjust the p-values to keep the number of false positives under control.
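In base R, p-values can be adjusted with `p.adjust`. The sketch below compares two common methods, Bonferroni and Benjamini-Hochberg; which methods the course covers is an assumption here, and the raw p-values are invented for illustration.

```r
# Hypothetical raw p-values from five separate tests
p_raw <- c(0.001, 0.01, 0.02, 0.04, 0.30)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate, less conservative
p_bh <- p.adjust(p_raw, method = "BH")

# Side-by-side comparison
data.frame(raw = p_raw, bonferroni = p_bonf, BH = p_bh)
```

Adjusted p-values are never smaller than the raw ones, so fewer tests pass a given significance threshold after adjustment.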