Approach & Outputs

We exploit new data types and new types of experiments and studies by developing the computational techniques needed to turn raw data into biology.

Modern Statistics for Modern Biology textbook, with Susan Holmes: online version. There is also a print version published by Cambridge University Press.

Cellular neighborhood analysis of healthy and malignant lymph nodes based on single-cell resolution spatial proteomics by multiplexed immunohistochemistry.

Cluster-free differential expression analysis of single-cell RNA-seq data using LEMUR. Paper link.

Comparison of transformations for single-cell RNA-seq data. Paper link.
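The transformations compared in that work address the fact that the variance of UMI counts grows with their mean. Below is a minimal Python sketch. It assumes a gamma-Poisson model with variance μ + αμ² and a known overdispersion α, and its function names and parameter values are hypothetical. It contrasts the delta-method variance-stabilizing transformation (1/√α)·acosh(2αy + 1) with the shifted logarithm log(y/s + y0):

```python
import math

def acosh_vst(y, alpha):
    """Delta-method variance-stabilizing transformation for a gamma-Poisson
    model with variance mu + alpha * mu**2 (assumes alpha > 0)."""
    return math.acosh(2 * alpha * y + 1) / math.sqrt(alpha)

def shifted_log(y, size_factor=1.0, pseudo_count=1.0):
    """Shifted logarithm log(y / s + y0); here y0 is a free, hypothetical choice."""
    return math.log(y / size_factor + pseudo_count)

alpha = 0.05  # hypothetical overdispersion
for y in [0, 1, 10, 100, 1000]:
    # Compare both transformations on the same raw counts
    print(y, round(acosh_vst(y, alpha), 3), round(shifted_log(y), 3))
```

The derivative of acosh_vst is 1/√(y + αy²), which is what makes it variance-stabilizing under the delta method. For large y it grows like log(4αy)/√α, so at high counts it behaves like a scaled logarithm.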

Ternary plots of relative sensitivities to targeted kinase inhibitors for a cohort of primary tumour samples of chronic lymphocytic leukaemia (CLL). Paper link.
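A ternary plot places each sample inside a triangle according to three non-negative proportions that sum to one. The minimal sketch below assumes that a sample's relative sensitivities are obtained by dividing its three drug responses by their sum; the normalization used in the paper may differ, and all names are hypothetical. It converts three values into 2D plotting coordinates via barycentric coordinates:

```python
import math

# Triangle corners for three hypothetical inhibitors A, B, C
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]

def to_ternary(a, b, c):
    """Map three non-negative sensitivities to a point inside the triangle."""
    total = a + b + c
    if total <= 0:
        raise ValueError("at least one sensitivity must be positive")
    # Relative sensitivities: proportions that sum to 1
    weights = (a / total, b / total, c / total)
    # Barycentric combination of the corner coordinates
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return x, y

print(to_ternary(1, 1, 1))  # equal sensitivities land at the centroid
```

A sample that responds only to inhibitor A sits at corner A. Samples near an edge are roughly insensitive to the inhibitor at the opposite corner.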

Functional precision medicine

Omics and imaging technologies are providing an increasingly detailed understanding of the biology of human health and disease. The next challenge is to use this knowledge to engineer treatments and cures. To this end, we integrate observational data, for instance from large-scale sequencing and molecular profiling, with interventional data, for instance from systematic genetic or chemical screens. This lets us reconstruct a fuller picture of the underlying causal relationships and of actionable intervention points. A fascinating example is our precision oncology project with Thorsten Zenz at University Hospital Zurich and Sascha Dietrich at University Hospital Düsseldorf, which studies the molecular mechanisms behind individual tumors' sensitivity and resistance to treatments.

Open science

As we engage with new data types, we aim to develop high-quality computational methods of wide applicability. We consider the release and maintenance of scientific software an integral part of doing science. We contribute to the Bioconductor project, an open-source software collaboration that provides tools for the analysis and understanding of genome-scale data. An example is our DESeq2 package for analyzing count data from high-throughput sequencing.

Mentoring and career development

Science is an intellectual adventure and a creative process done by people. For each of us, our work is at the same time a means to achieve a scientific goal, a job that enables us to pay our bills, and a stage of training and professional development. Such stages include student internships, BSc/MSc theses, PhD theses and postdoctoral projects. The group, and EMBL more generally, offers a well-established mentoring framework to support these three objectives. Former group members have moved on to rewarding careers as professors, independent group leaders, and senior managers or professional scientists in industry.

Teaching

We maintain the textbook Modern Statistics for Modern Biology by Susan Holmes and Wolfgang Huber. The book is available online, for free, as HTML. It was published as a printed book in 2019 by Cambridge University Press.

We run the annual summer school CSAMA—Biological Data Science. It usually takes place in June in Brixen/Bressanone. Here is the webpage of the 2024 edition. See here for some impressions.

In July 2023, we co-organized the first Biological Data Science Summer School in Ukraine, in Uzhhorod. See also Wolfgang’s post about it. We repeated it in July 2024 and plan to do it again in July 2025.

We develop publicly available interactive training materials on statistical methods.

Software

We are a frequent contributor to the Bioconductor project. Our R packages include:

LEMUR Cluster-free differential expression analysis of multi-condition single-cell data using Latent Embedding Multivariate Regression
MOFA Multi-Omics Factor Analysis
DESeq2 Differential gene expression analysis based on the negative binomial distribution
IHW Multiple testing and false discovery rate (FDR) control by Independent Hypothesis Weighting
EBImage Image processing and analysis toolbox for R
Rarr Read Zarr Files in R
rhdf5 R Interface to HDF5
vsn Normalization and variance stabilizing transformation of fluorescence intensity data
cellHTS2 Analysis of cell-based high-throughput screens
DEXSeq Inference of differential exon usage in RNA-Seq
HilbertVis Visualize long vectors of data using Hilbert curves
Python
HTSeq Processing and analyzing data from high-throughput sequencing assays
SOFA Semi-supervised (Multi) Omics Factor Analysis
spatialproteomics Lightweight wrapper around xarray to facilitate processing, exploration and analysis of multiplexed immunohistochemistry data
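The DESeq2 entry above refers to the negative binomial distribution. Its variance, μ + αμ², exceeds the Poisson variance μ by a term governed by the dispersion α. The following Python sketch is not DESeq2's own code, and its function name and parameter values are hypothetical. It evaluates the negative binomial probability mass function in the mean-dispersion parameterization and checks the mean and variance numerically:

```python
import math

def nb_pmf(k, mu, alpha):
    """Negative binomial P(K = k) with mean mu and dispersion alpha,
    i.e. size r = 1/alpha and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    # Work on the log scale for numerical stability
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

mu, alpha = 20.0, 0.1  # hypothetical mean count and dispersion
ks = range(2000)       # truncate the infinite sum; the remaining tail is negligible here
probs = [nb_pmf(k, mu, alpha) for k in ks]
mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
print(round(mean, 3), round(var, 3), mu + alpha * mu ** 2)
```

As α approaches 0, the distribution approaches the Poisson, and the variance approaches the mean.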

The manifold hypothesis