New paper: Biological Plasticity Rescues Target Activity in CRISPR Knockouts

Arne H. Smits, Frederik Ziebell, Gerard Joberty, …, Lars M. Steinmetz, Gerard Drewes and Wolfgang Huber.

Gene knockouts (KOs) are efficiently engineered through CRISPR-Cas9-induced frameshift mutations. While DNA editing efficiency is readily verified by DNA sequencing, a systematic understanding of the efficiency of protein elimination has been lacking. Here, we devised an experimental strategy combining RNA-seq and triple-stage mass spectrometry to characterize 193 genetically verified deletions targeting 136 distinct genes generated by CRISPR-induced frameshifts in HAP1 cells. We observed residual protein expression for about one third of the quantified targets, at variable levels from low to original, and identified two causal mechanisms: translation reinitiation leading to N-terminally truncated target proteins, and skipping of the edited exon leading to protein isoforms with internal sequence deletions. Detailed analysis of three truncated targets, BRD4, DNMT1 and NGLY1, revealed partial preservation of protein function. Our results imply that systematic characterization of residual protein expression or function in CRISPR-Cas9 generated KO lines is necessary for phenotype interpretation.

The paper will soon be available online.

Wolfgang Huber teaching at Stanford University

The Holmes and Huber labs collaborate on developing statistical tools for analysing large, multi-layered data sets, for integrating large, heterogeneous biological data, and for finding applications in molecular medicine. They aim to deliver tools that domain scientists can easily use to analyze their own data, for instance by providing them as R / Bioconductor packages.

Together they want to help the next generation of biologists understand the “black box” of statistics by training them in quantitative statistical methods. They have written a textbook (Modern Statistics for Modern Biology) and teach a summer course together (Stats 366 – Bios 221) at Stanford. They continue to develop these materials to incorporate new scientific developments (e.g. new data types), new methods, and new statistical or computational ideas.

Welcome Anna Sommani

Anna is a Master's student in Physics at the University of Heidelberg, with a curriculum specialised in computational physics and biophysics. She joined the Huber group to work on a 12-month interdisciplinary project in collaboration with the Merten group (EMBL Heidelberg). The project combines the latest microfluidic approaches for antibody discovery with single-cell RNA sequencing. Her research is to develop, test and apply the high-level statistical data analysis needed for this cutting-edge single-cell analysis platform.

German Conference on Bioinformatics 2019

The German Conference on Bioinformatics (GCB) is an annual, international conference devoted to all areas of bioinformatics and meant as a platform for the whole bioinformatics community. Recent meetings attracted a multinational audience of 250 – 300 participants each year.
In 2019, the conference focuses on bringing physicians, bioinformatics and medical informatics together, and aims to showcase applications and opportunities beyond. Leading scientists will present alongside young researchers and industry representatives. Workshops will provide opportunities for hands-on experience.

The upcoming GCB will be held at the German Cancer Research Center in Heidelberg. The first day, September 16, is reserved for workshops and satellite meetings. The main conference will take place on September 17-19. The schedule allows participants to fly in on Monday and fly out on Thursday or Friday.

CSAMA 2019 – Statistical Data Analysis for Genome-Scale Biology

CSAMA 2019 (17th edition)
Statistical Data Analysis for Genome-Scale Biology
Bressanone-Brixen, Italy (South Tyrol Alps)
July 21-26, 2019

Lecturers:

  • Vincent J. Carey, Harvard Medical School
  • Laurent Gatto, University of Cambridge
  • Robert Gentleman, 23andMe, Mountain View
  • Wolfgang Huber, European Molecular Biology Laboratory (EMBL), Heidelberg
  • Martin Morgan, Roswell Park Comprehensive Cancer Center, Buffalo
  • Johannes Rainer, European Academy of Bozen (EURAC)
  • Charlotte Soneson, University of Zurich
  • Levi Waldron, CUNY School of Public Health at Hunter College, New York

Teaching Assistants:

  • Simone Bell, EMBL, Heidelberg
  • Lori Shepherd, RPCCC, Buffalo
  • Mike L. Smith, EMBL, Heidelberg

The one-week intensive course Statistical Data Analysis for Genome-Scale Biology teaches statistical and computational analysis of multi-omics studies in biology and biomedicine. It covers the underlying theory and state of the art (the morning lectures) and practical hands-on exercises based on the R / Bioconductor environment (the afternoon labs). At the end of the course, you should be able to run analysis workflows on your own (multi-)omic data, adapt and combine different tools, and make informed and scientifically sound choices about analysis strategies.

Topics include:

  • Introduction to R and Bioconductor
  • The elements of statistics: hypothesis testing, multiple testing, regression, regularization, clustering and classification, parallelization and performance (machine learning), visualisation
  • RNA-Seq data analysis
  • Computing with sequences and genomic intervals
  • Working with annotation – genes, genomic features, variants, transcripts and proteins
  • Gene set enrichment analysis
  • Mass spec proteomics and metabolomics
  • Basics of microbiome analysis
  • Experimental design, batch effects and confounding
  • Reproducible research and workflow authoring with R markdown
  • Package development, version control and developer tools (incl. Git, GitHub, RStudio)
  • Working with large data: performance, parallelisation and cloud computing

The course consists of

  • morning lectures: 20 x 45 minutes: Monday to Friday 8:30h – 12:00h
  • 4 practical computer tutorials in the afternoons (13:30h – 16:30h) on Monday, Tuesday, Thursday and Friday

Visit the course’s website at: http://www.huber.embl.de/csama

New paper: Gain of CTCF-anchored chromatin loops marks the exit from naive pluripotency

The genome of pluripotent stem cells adopts a unique three-dimensional architecture featuring weakly condensed heterochromatin and large nucleosome-free regions. Yet, it is unknown whether structural loops and contact domains display characteristics that distinguish embryonic stem cells (ESCs) from differentiated cell types. We used genome-wide chromosome conformation capture and super-resolution imaging to determine nuclear organization in mouse ESC and neural stem cell (NSC) derivatives. We found that loss of pluripotency is accompanied by widespread gain of structural loops. This general architectural change correlates with enhanced binding of CTCF and cohesins and more pronounced insulation of contacts across chromatin boundaries in lineage-committed cells. Reprogramming NSCs to pluripotency restores the unique features of ESC domain topology. Domains defined by the anchors of loops established upon differentiation are enriched for developmental genes. Chromatin loop formation is a pervasive structural alteration to the genome that accompanies exit from pluripotency and delineates the spatial segregation of developmentally regulated genes.

Congratulations Laleh

Laleh Haghverdi won the Peter and Traudl Engelhorn Foundation research prize 2018 on the topic “Computational Biology: New Methods in Biology-Oriented Information Technologies with Impact on Basic Research and Application” for her contributions to the development of computational tools for understanding and analysing single-cell transcriptomics data.

Single-cell measurement technologies have been used increasingly in recent years to investigate heterogeneity among cells (e.g. in tumors) or transient cell states, as in development and cell differentiation. These new techniques pose new computational challenges for the analysis and interpretation of the collected data.

Laleh has contributed to pioneering research on cell lineage tree analysis and to the development of data integration methods for single-cell data. Her computational strategy for analysing continuous cell lineage trajectories based on diffusion maps is used by several researchers today. Her contribution to single-cell transcriptomics data integration by mutual nearest neighbors (MNN) matching in high dimensions has proven effective in helping large-scale single-cell projects such as the Human Cell Atlas (HCA) reach their goals faster; such projects must integrate data sets collected at several laboratories using different technologies.
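
For readers unfamiliar with the idea, the matching step behind mutual nearest neighbors can be illustrated with a small sketch. This is not the published implementation: it assumes plain Euclidean distance, uses made-up two-dimensional toy points in place of expression profiles, and the function names `nearest_neighbors` and `mutual_nearest_neighbors` are hypothetical. It shows only how pairs of cells that are in each other's nearest-neighbor lists are found across two batches. Using such pairs to align the batches is a further step that is not shown here.

```python
import math


def nearest_neighbors(queries, references, k):
    """For each query point, return the set of indices of its k nearest reference points."""
    result = []
    for q in queries:
        # Rank reference indices by Euclidean distance to the query point.
        order = sorted(range(len(references)), key=lambda j: math.dist(q, references[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where batch_a[i] and batch_b[j] are in each other's k-NN sets."""
    a_to_b = nearest_neighbors(batch_a, batch_b, k)
    b_to_a = nearest_neighbors(batch_b, batch_a, k)
    return sorted(
        (i, j)
        for i, neighbors in enumerate(a_to_b)
        for j in neighbors
        if i in b_to_a[j]
    )


# Toy data (assumed values): two "batches" of cells in a 2-D space.
batch_a = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
batch_b = [(0.5, 0.2), (5.3, 4.9), (20.0, 20.0)]

pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(pairs)  # → [(0, 0), (1, 1)]
```

In this toy example the third cell of each batch has no mutual partner, which reflects the intuition that cells present in only one data set should not be forced into a match.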

CSAMA 2018 – Statistical Data Analysis for Genome Biology

CSAMA 2018 (16th edition)
Statistical Data Analysis for Genome-Scale Biology
Bressanone-Brixen, Italy (South Tyrol Alps)
July 8-13, 2018

Lecturers:

  • Vincent J. Carey, Harvard Medical School
  • Laurent Gatto, University of Cambridge
  • Robert Gentleman, 23andMe, Mountain View
  • Laleh Haghverdi, European Molecular Biology Laboratory (EMBL), Heidelberg
  • Wolfgang Huber, European Molecular Biology Laboratory (EMBL), Heidelberg
  • Michael I. Love, University of North Carolina-Chapel Hill
  • Martin Morgan, Roswell Park Comprehensive Cancer Center, Buffalo
  • Johannes Rainer, European Academy of Bozen (EURAC)
  • Charlotte Soneson, University of Zurich
  • Levi Waldron, CUNY School of Public Health at Hunter College, New York

Teaching Assistants:

  • Simone Bell, EMBL, Heidelberg
  • Vladislav Kim, EMBL, Heidelberg
  • Lori Shepherd, RPCCC, Buffalo
  • Mike L. Smith, EMBL, Heidelberg

The one-week intensive course Statistical Data Analysis for Genome-Scale Biology teaches statistical and computational analysis of multi-omics studies in biology and biomedicine. It covers the underlying theory and state of the art (the morning lectures) and practical hands-on exercises based on the R / Bioconductor environment (the afternoon labs). At the end of the course, you should be able to run analysis workflows on your own (multi-)omic data, adapt and combine different tools, and make informed and scientifically sound choices about analysis strategies.

Topics include:

  • Introduction to R and Bioconductor
  • The elements of statistics: hypothesis testing, multiple testing, regression, regularization, clustering and classification, parallelization and performance (machine learning), visualisation
  • RNA-Seq data analysis
  • Computing with sequences and genomic intervals
  • Working with annotation – genes, genomic features, variants, transcripts and proteins
  • Gene set enrichment analysis
  • Mass spec proteomics and metabolomics
  • Basics of microbiome analysis
  • Experimental design, batch effects and confounding
  • Reproducible research and workflow authoring with R markdown
  • Package development, version control and developer tools (incl. Git, GitHub, RStudio)
  • Working with large data: performance, parallelisation and cloud computing

The course consists of

  • morning lectures: 20 x 45 minutes: Monday to Friday 8:30h – 12:00h
  • 4 practical computer tutorials in the afternoons (13:30h – 16:30h) on Monday, Tuesday, Thursday and Friday

Visit the course’s website at: http://www.huber.embl.de/csama