Approach & Outputs
We exploit new data types and new types of experiments and studies by developing the computational techniques needed to turn raw data into biology.
- Multi-scale biology in space and time: bringing together different data types and resolutions to find low-dimensional explanations (factors, gradients, clusters, trees and networks) of high-dimensional data, using statistical models, first-principles based theory and machine learning.
- Use spatial omics in immuno-oncology to find and improve treatment options for patients.
- Multidimensional phenotyping of genetic and drug-based perturbation assays to map context-dependent gene-gene and gene-drug interaction networks.
- Many powerful mathematical and computational ideas exist but are difficult to access. We aim to translate them into practical methods and software that make a real difference to biomedical researchers. We sometimes term this approach translational statistics.
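As a minimal sketch of what a "low-dimensional explanation" of high-dimensional data can look like, the example below recovers a single hidden gradient from simulated five-feature measurements by computing the leading principal component with power iteration. It is a generic textbook illustration, not a method from the group's software: the function name `first_principal_component`, the loadings and the simulated data are all assumptions made for this example, and it uses only the Python standard library to stay self-contained.

```python
import math
import random


def first_principal_component(rows, n_iter=200, seed=0):
    """Return (direction, scores) of the leading principal component.

    rows: list of samples, each a list of p feature values.
    Uses power iteration on the centered data (hypothetical helper).
    """
    n, p = len(rows), len(rows[0])
    # Center every feature at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Random unit-length starting vector.
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]

    for _ in range(n_iter):
        # scores = X v, then w = X^T scores, which is proportional to
        # the sample covariance matrix applied to v.
        scores = [sum(x[j] * v[j] for j in range(p)) for x in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break  # data have no variance; keep the current vector
        v = [c / norm for c in w]

    scores = [sum(x[j] * v[j] for j in range(p)) for x in centered]
    return v, scores


# Simulated data: one hidden gradient drives all five features, plus noise.
rng = random.Random(1)
loadings = [1.0, -0.5, 2.0, 0.3, -1.0]  # assumed for illustration
hidden = [rng.gauss(0, 1) for _ in range(50)]
data = [[t * l + rng.gauss(0, 0.1) for l in loadings] for t in hidden]

direction, scores = first_principal_component(data)
```

Here `scores` gives each sample a single coordinate that tracks the hidden gradient up to sign and scale, and `direction` indicates how strongly each feature contributes to it. Factor models for multi-omics data are considerably richer than this, but the goal of summarising many measured variables by a few interpretable ones is the same.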
Functional precision medicine
Omics and imaging technologies are producing increasingly detailed biology-based understanding of human health and disease. The next challenge is using this knowledge to engineer treatments and cures. To this end, we integrate observational data, such as from large-scale sequencing and molecular profiling, with interventional data, such as from systematic genetic or chemical screens, to reconstruct a fuller picture of the underlying causal relationships and actionable intervention points. A fascinating example is our precision oncology project with Thorsten Zenz at University Hospital Zurich and Sascha Dietrich at University Hospital Düsseldorf, in which we study the molecular mechanisms behind the sensitivity and resistance of individual tumors to treatments.
Open science
As we engage with new data types, we aim to develop high-quality computational methods of wide applicability. We consider the release and maintenance of scientific software an integral part of doing science. We contribute to the Bioconductor project, an open source software collaboration to provide tools for the analysis and understanding of genome-scale data. An example is our DESeq2 package for analyzing count data from high-throughput sequencing.
Mentoring and career development
Science is an intellectual adventure and a creative process done by people. For each of us, our work is at the same time a means to achieve a scientific goal, a job that enables us to pay our bills, and a stage of training and professional development. This applies to student internships, BSc/MSc theses, PhD theses and postdoctoral projects alike. The group, and EMBL more generally, offers a well-established mentoring framework to support these three objectives. Former group members have moved on to rewarding careers as professors, independent group leaders, and senior managers or professional scientists in industry.
Teaching
We maintain the textbook Modern Statistics for Modern Biology by Susan Holmes and Wolfgang Huber. The book is available online, for free, as HTML. It was published as a printed book in 2019 by Cambridge University Press.
We run the annual summer school CSAMA—Biological Data Science. It usually takes place in June in Brixen/Bressanone. Here is the webpage of the 2024 edition. See here for some impressions.
In July 2023, we co-organized the first Biological Data Science Summer School in Ukraine, in Uzhhorod. See also Wolfgang’s post about it. We repeated it in July 2024 and plan to do it again in July 2025.
We develop publicly available interactive training materials on statistical methods.
Software
We are frequent contributors to the Bioconductor project:
LEMUR | Cluster-free differential expression analysis of multi-condition single-cell data using Latent Embedding Multivariate Regression |
MOFA | Multi-Omics Factor Analysis |
DESeq2 | Differential gene expression analysis based on the negative binomial distribution |
IHW | Multiple testing and false discovery rate (FDR) control by Independent Hypothesis Weighting |
EBImage | Image processing and analysis toolbox for R |
Rarr | Read Zarr Files in R |
rhdf5 | R Interface to HDF5 |
vsn | Normalization and variance stabilizing transformation of fluorescence intensity data |
cellHTS2 | Analysis of cell-based high-throughput screens |
DEXSeq | Inference of differential exon usage in RNA-Seq |
HilbertVis | Visualize long vectors of data using Hilbert curves |
Python | |
HTSeq | Processing and analyzing data from high-throughput sequencing assays |
SOFA | Semi-supervised (Multi) Omics Factor Analysis |
spatialproteomics | Lightweight wrapper around xarray to facilitate processing, exploration and analysis of multiplexed immunohistochemistry data |
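To give a flavour of the multiple testing idea behind the IHW entry in the list above, here is a minimal sketch of the weighted Benjamini-Hochberg procedure, in which each hypothesis receives a non-negative weight and the weights average to one. This is not the IHW package's implementation or API: IHW learns the weights from a covariate that is independent of the p-values under the null hypothesis, whereas this sketch simply takes the weights as given. The function name `weighted_bh` and the example numbers are assumptions made for illustration.

```python
def weighted_bh(pvalues, weights, alpha=0.1):
    """Weighted Benjamini-Hochberg procedure (hypothetical helper).

    Returns a list of booleans, True where the hypothesis is rejected.
    Weights are rescaled to average one before use.
    """
    m = len(pvalues)
    if len(weights) != m:
        raise ValueError("pvalues and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")

    mean_w = sum(weights) / m
    w = [x / mean_w for x in weights]

    # Weighted p-values: a larger weight makes rejection easier,
    # a weight of zero means the hypothesis is never rejected.
    q = [min(p / wi, 1.0) if wi > 0 else 1.0 for p, wi in zip(pvalues, w)]

    # Standard BH step-up rule applied to the weighted p-values:
    # find the largest rank k with q_(k) <= alpha * k / m.
    order = sorted(range(m), key=lambda i: q[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            k = rank

    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


pvalues = [0.03, 0.6, 0.7, 0.8]

# Equal weights reduce to the ordinary Benjamini-Hochberg procedure.
plain = weighted_bh(pvalues, [1, 1, 1, 1], alpha=0.1)

# Up-weighting the first hypothesis (assumed more promising a priori).
weighted = weighted_bh(pvalues, [3, 1, 1, 1], alpha=0.1)
```

In this toy example the ordinary procedure rejects nothing, while the weighted version rejects the first hypothesis. Prioritising hypotheses in this way can increase power, but the false discovery rate guarantee depends on the weights being chosen without looking at the p-values themselves.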