Approach & Outputs

We exploit new data types and new types of experiments and studies by developing the computational techniques needed to turn raw data into biology.

Multi-scale biology in space and time: bringing together different data types and resolutions to find low-dimensional explanations (factors, gradients, regions, graphs and networks) of high-dimensional data, using statistical models, first-principles-based theory and machine learning.
Spatial omics in immuno-oncology to find and improve treatment options for patients.
Multidimensional phenotypes from genetic and drug-based perturbation assays to map context-dependent gene-gene and gene-drug interaction networks.
Many powerful mathematical and computational ideas exist but are difficult to access. We aim to translate them into practical methods and software that make a real difference to biomedical researchers. We sometimes term this approach translational statistics.

Modern Statistics for Modern Biology textbook, with Susan Holmes: online version. There is also a print version published by Cambridge University Press (CUP).

Cellular neighborhood analysis of healthy and malignant lymph nodes based on single-cell resolution spatial proteomics by multiplexed immunohistochemistry.

Cluster-free differential expression analysis of scRNA-seq data using LEMUR. Paper link.

Ternary plots of relative sensitivities to targeted kinase inhibitors for a cohort of primary tumour samples of chronic lymphocytic leukaemia (CLL). Paper link.

Spatial omics and imaging

Modern spatial omics techniques measure the spatial distribution of tens of thousands of distinct molecules, with sampling efficiency and spatial resolution as the major limiting factors. Photon microscopy observes one or a few distinct molecules at resolutions of tens of nanometers. Electron microscopy measures electron densities at resolutions of a few angstroms. But how can we bring all this together? How can we navigate huge terabyte-scale maps of cells and organisms? How can we decipher biological functions and processes, associate them with phenotypes in health and disease, and exploit this to better understand fundamental biology and to advance biotechnology and biomedicine?

Functional precision medicine and immuno-oncology

We integrate observational ’omics data, interventional clinical data, and systematic genetic or chemical perturbation data on (ex vivo) model systems to decipher the molecular mechanisms of variable sensitivity and resistance of tumours to treatments (precision oncology collaboration with Thorsten Zenz at University Hospital Zurich), and to understand the role of the immune system and the tumour microenvironment in tumourigenesis, progression and treatment (systems immunology collaboration with Sascha Dietrich at University Hospital Düsseldorf).

Open science

As we engage with new data types, we aim to develop high-quality computational methods of wide applicability. We consider the release and maintenance of scientific software an integral part of doing science. We contribute to the Bioconductor project, an open source software collaboration to provide tools for the analysis and understanding of genome-scale data. An example is our DESeq2 package for analyzing count data from high-throughput sequencing.

Mentoring and career development

Science is an intellectual adventure and a creative process done by people. For each of us, our work is at the same time a means to achieve a scientific goal, a job that enables us to pay our bills, and a stage of training and professional development. This includes student internships, BSc/MSc theses, PhD theses and postdoctoral projects. The group, and EMBL more generally, offers a well-established mentoring framework to support this triple objective. Former group members have moved on to rewarding careers: professors, independent group leaders, senior management or professional scientist roles in industry.

Teaching

We maintain the textbook Modern Statistics for Modern Biology by Susan Holmes and Wolfgang Huber. The book is available online, for free, as HTML. It was published as a printed book in 2019 by Cambridge University Press.

We run the annual summer school CSAMA—Biological Data Science. It usually takes place in June in Brixen/Bressanone. Here is the webpage of the 2025 edition. See here for some impressions.

In July 2023, 2024 and 2025, we co-organized the Ukrainian Biological Data Science Summer School in Uzhhorod, Ukraine. See also Wolfgang’s post about it.

We develop publicly available interactive training materials on statistical methods.

Software

We are frequent contributors to the Bioconductor project.

LEMUR	Cluster-free differential expression analysis of multi-condition single-cell data using Latent Embedding Multivariate Regression
MOFA	Multi-Omics Factor Analysis
DESeq2	Differential gene expression analysis based on the negative binomial distribution
IHW	Multiple testing and false discovery rate (FDR) control by Independent Hypothesis Weighting
EBImage	Image processing and analysis toolbox for R
Rarr	Read Zarr Files in R
rhdf5	R Interface to HDF5
vsn	Normalization and variance stabilizing transformation of fluorescence intensity data
cellHTS2	Analysis of cell-based high-throughput screens
DEXSeq	Inference of differential exon usage in RNA-Seq
HilbertVis	Visualize long vectors of data using Hilbert curves
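
As a rough illustration of the idea behind Hilbert-curve visualization (this is not the HilbertVis implementation or its API), the following Python sketch folds a long vector into a square grid so that values that are neighbours in the vector remain neighbours in the image. The function names `hilbert_d2xy` and `hilbert_image` are hypothetical, chosen only for this example; the index-to-coordinate conversion is the standard iterative algorithm for the Hilbert curve.

```python
from typing import List, Optional, Sequence, Tuple


def hilbert_d2xy(order: int, d: int) -> Tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on a 2**order grid."""
    n = 1 << order  # side length of the grid
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so that sub-curves join up correctly.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_image(values: Sequence[float], order: int) -> List[List[Optional[float]]]:
    """Place a 1-D vector on a 2**order x 2**order grid along a Hilbert curve.

    Cells beyond the end of the vector are left as None.
    """
    n = 1 << order
    if len(values) > n * n:
        raise ValueError("vector is too long for a grid of this order")
    grid: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for d, value in enumerate(values):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value  # row index is y, column index is x
    return grid


if __name__ == "__main__":
    # A vector of 16 values laid out on a 4 x 4 grid.
    for row in hilbert_image(list(range(16)), order=2):
        print(row)
```

In practice, a long vector (for example, a per-base signal along a chromosome) would first be binned to at most 4**order values and the resulting grid rendered as a heat map; the binning and plotting steps are left out of this sketch.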

Python
HTSeq	Processing and analyzing data from high-throughput sequencing assays
SOFA	Semi-supervised (Multi) Omics Factor Analysis
spatialproteomics	Lightweight wrapper around xarray to facilitate processing, exploration and analysis of multiplexed immunohistochemistry data