This webpage contains teaching materials I have been designing for various courses. The labs consist of .Rnw or .Rmd files which require the installation of several packages.

The necessary packages are listed at the beginning of each lab, or the git repo contains an R script that installs them.
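As a rough illustration, an installation helper of this kind could look like the sketch below. It is not the script shipped in the repos; the function names `missing_packages` and `install_missing` and the package names in the example call are assumptions made for this example. Bioconductor packages such as BiocStyle are not on CRAN and would be installed with `BiocManager::install()` instead.

```r
# Return the packages from `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only what is missing (CRAN packages only) and
# invisibly return the names that had to be installed.
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Example call; the package names are placeholders, each lab lists its own:
# install_missing(c("knitr", "dplyr", "ggplot2"))
```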

Most of the labs have been built using knitr and BiocStyle in connection with .Rmd documents.

The material on this page is licensed under an MIT licence, the full text of which you can find here. It basically means you can do anything you like with the software and the associated materials.

This lab gives an introduction to R using many packages from the tidyverse.

Check out the git repository here: https://git.embl.de/klaus/tidyverse_R_intro

The links below lead to the html pages of the lab; the repo contains the full material.

Content | Link |
---|---|
Data handling lab | tidyverse_R_intro |

This lab introduces essential data handling techniques and graphics in R.

Check out the git repository here: https://git.embl.de/klaus/graphics_and_data_handling

The link below leads to the html page of the lab; the repo contains the full material.

Content | Link |
---|---|
Data handling lab | graphics and data handling lab |

These labs introduce various statistical methods used in bioinformatics. They are applied mostly to bulk and single-cell RNA-Seq data. Check out the git repository here: https://git.embl.de/klaus/stat_methods_bioinf

The links below lead to the html pages of the labs; the repo contains the full material.

Content | Link |
---|---|
Data handling lab | Data handling lab |
Graphics for bioinformatics | Graphics for bioinformatics lab |
Factor analysis, multiple testing, machine learning | Factor analysis, testing and Machine Learning lab |

This material discusses the analysis of a high-content screening data set. It has been prepared for the EMBO course High-Throughput Microscopy for Systems Biology in October 2016.

You can find its git repository here.

This material introduces machine learning (kNN-based classification) using a simple high-content screening data set.
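To give a flavour of kNN-based classification, here is a minimal sketch. It assumes the `knn()` function from the `class` package (distributed with R as a recommended package) and uses the built-in `iris` data purely as a stand-in, since the screening data set lives in the repo. The lab itself may use a different implementation and different settings.

```r
library(class)  # provides knn()

set.seed(42)  # make the random train/test split reproducible

# Stand-in data: four numeric features and a class label per observation.
train_idx <- sample(nrow(iris), 100)
train_x <- iris[train_idx, 1:4]
test_x  <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

# Each test observation gets the majority label of its k = 5 nearest
# training observations (Euclidean distance on the raw features).
predicted <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Fraction of test observations that were classified correctly.
accuracy <- mean(predicted == test_y)
table(predicted, test_y)
```

Because kNN relies on distances, features measured on very different scales are usually standardized first; this step is omitted here to keep the sketch short.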

You can find its git repository here.

It was created for a Machine Learning course held with Prof. Bernd Bischl & colleagues at EMBL in June 2018.

This was created together with Frank Stein: using a TMT data set, we explore (QC) the quantified proteomics data and then analyze differential abundance in the normalized data using linear models.

The nice overview figure was created by Isabelle Becher.

You can find its git repository here.

This was created for an EMBO quantitative proteomics course in 2018.

**Material that is not updated anymore, but might still be of interest.**

This material gives a concise introduction to R.

Large parts of this material are based on the contributed documentation on CRAN, notably “Applied Statistics for Bioinformatics Using R”, “IcebreakeR” and “A (very) short Introduction to R”, as well as on the “Best first R tutorial” and introductory material from Laurent Gatto.

An in-depth resource for the details of R-programming is Advanced R.

This material gives an introduction to data handling and data reshaping with R, including a lot of data handling techniques using the dplyr package and reshaping using the tidyr and reshape2 packages. We also introduce chaining with the magrittr package.
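The combination of reshaping and chaining described above can be sketched as follows. The data are invented for illustration, and the sketch uses `pivot_longer()`, which requires tidyr 1.0.0 or later; the original material may rely on older reshaping functions instead.

```r
library(dplyr)
library(tidyr)

# Toy data in wide format, invented for illustration only.
measurements <- tibble(
  sample    = c("s1", "s2", "s3", "s4"),
  condition = c("ctrl", "ctrl", "treated", "treated"),
  gene_a    = c(1, 3, 5, 7),
  gene_b    = c(2, 2, 4, 8)
)

# Chain the steps with the pipe: reshape from wide to long,
# then compute the mean expression per condition and gene.
summary_tbl <- measurements %>%
  pivot_longer(c(gene_a, gene_b),
               names_to = "gene", values_to = "expression") %>%
  group_by(condition, gene) %>%
  summarise(mean_expression = mean(expression), .groups = "drop")

summary_tbl
```

Each step receives the result of the previous one as its first argument, so the chain reads top to bottom in the order the operations are applied.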

The material on dplyr is to some extent based on tutorials by Kevin Markham and Dirk Schuhmacher. The illustration of the dplyr verbs is adapted from a presentation by H. Wickham.

Example data were provided by Michele Christovao, Elisabeth Zielonka and Ina Kalinina.

Typical summary statistics for location and scale as well as common diagnostic plots are presented. The plots are given both in base R and using ggplot2 commands.
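As a small base R illustration of location and scale statistics (the numbers are made up and include one deliberate outlier):

```r
# Made-up measurements; the value 30 acts as an outlier.
x <- c(2, 4, 4, 5, 7, 9, 30)

# Location: the mean is pulled towards the outlier, the median is not.
location <- c(mean = mean(x), median = median(x))

# Scale: the standard deviation is sensitive to the outlier, while the
# median absolute deviation (MAD) and interquartile range are more robust.
spread <- c(sd = sd(x), mad = mad(x), iqr = IQR(x))

location
spread
```

Comparing the robust and non-robust versions side by side is a quick way to spot data sets in which a few extreme values dominate the summary.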

Large parts of this material are based on the plots used in “Applied Statistics for Bioinformatics Using R”.

Lars Velten contributed the protein exercise using ggplot. (Note that the data used in the protein example have been simulated and do not correspond to any real experimental data.)

The ggplot explanations have been inspired by the ggplot2 book as well as the ggplot2 intro by Josef Fruehwald.

Wolfgang Huber provided nice thoughts & slides on color usage for graphics.

This is a very concise introduction to important statistical methods in bioinformatics: dimensionality reduction, clustering and regression.

These techniques are illustrated in the context of the analysis of (single-cell) RNA-Seq data.

The material on statistical distributions is based on “Applied Statistics for Bioinformatics Using R”.

I adopted the use of the bodyfat data as an example data set for multivariate models such as regression and PCA from Michael Lavine’s Introduction to Statistical Thought, which is an excellent introductory statistics textbook in itself.

The slides are partially based on material by Wolfgang Huber and John Marioni (EMBL-EBI).

This is an introduction to hypothesis testing, including multiple testing as well as advanced topics such as regularized t-statistics, independent filtering and empirical null estimation.

The material on the basic tests is based on “Applied Statistics for Bioinformatics Using R”.

Almost all of the slides are by Wolfgang Huber.

The material on tests for categorical data is mainly based on the book Introductory Statistics with R, which is also great for learning R and statistics at the same time.

This is a concise introduction to hypothesis testing, including only the most widely used tests but also explaining the idea of permutation tests, since these can be very useful in practice.

The material also covers multiple testing as well as advanced topics such as regularized t-statistics, independent filtering and empirical null estimation.

The permutation test explanations were inspired by Tim Hesterberg’s excellent review on resampling for undergraduates.
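The idea behind a permutation test can be sketched in a few lines of base R. This is a generic two-sample example with invented data, not code taken from the lab; the function name `perm_test` and its arguments are assumptions made for this illustration.

```r
set.seed(1)  # make the random permutations reproducible

# Invented measurements for two groups.
group_a <- c(5.1, 4.8, 6.0, 5.5, 5.9)
group_b <- c(6.2, 6.8, 7.1, 6.5, 7.4)

# Two-sided permutation test for a difference in means.
perm_test <- function(x, y, n_perm = 10000) {
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)

  # Under the null hypothesis the group labels are exchangeable, so we
  # reassign them at random and recompute the statistic each time.
  perm_stats <- replicate(n_perm, {
    idx <- sample(length(pooled), n_x)
    mean(pooled[idx]) - mean(pooled[-idx])
  })

  # Share of permutations at least as extreme as the observed value;
  # adding 1 to both counts keeps the estimate away from exactly zero.
  p_value <- (sum(abs(perm_stats) >= abs(observed)) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

result <- perm_test(group_a, group_b)
result
```

The test makes no distributional assumption beyond exchangeability of the labels under the null hypothesis, which is what makes it useful when the usual parametric assumptions are doubtful.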

This material introduces a complete workflow for DE analysis of RNA-Seq data starting from the raw FASTQ files. It performs a re-analysis of the RNA-Seq data analyzed in Uslu et al., “Long-range enhancers regulating Myc expression are required for normal facial morphogenesis”, 2014.

The material is largely based on the documentation of the DESeq2 package on Bioconductor by Mike Love, Simon Anders and Wolfgang Huber.

The first part of the lab, from FASTQ files to the count table, closely follows Anders et al., “Count-based differential expression analysis of RNA sequencing data using R and Bioconductor”, 2013.

Simon Anders also provided the slides.

Type | Link |
---|---|
Slides | Slides DESeq2 - Wolfgang Huber |
Slides | Slides DESeq2 |
Slides | Slides HT Sequencing |
Lab | Lab pdf |
R script | R-script |
Rnw | .Rnw file |

This material introduces a workflow for DE analysis of RNA-Seq data starting from the gene count table. It is similar to the workflow above and performs a re-analysis of the RNA-Seq data analyzed in Uslu et al., “Long-range enhancers regulating Myc expression are required for normal facial morphogenesis”, 2014.

It has been created for the DNA/RNA module of the 2014 EMBL predoc course and uses html instead of LaTeX.

The material is largely based on the documentation of the DESeq2 package and the rnaseqGene workflow on Bioconductor by Mike Love, Simon Anders and Wolfgang Huber.

Type | Link |
---|---|
Lab | Lab html |
R script | R-script |
Rmd | .Rmd file |

This is an introduction to Machine Learning; it is still a work in progress.

This is heavily based on material from S. Arora (Bioc Seattle, Oct 14) and VJ Carey (Brixen 2011).

Type | Link |
---|---|
Lab | Lab html |
R script | R-script |
Rmd | .Rmd file |