Last updated: 2020-05-29

Checks: 6 passed, 1 warning

Knit directory: Proteomics/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20200227)

The command set.seed(20200227) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 3fb50c5

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/complexAnalysis_IGHV_cache/
    Ignored:    analysis/complexAnalysis_trisomy12_alteredPQR_cache/
    Ignored:    analysis/complexAnalysis_trisomy12_cache/
    Ignored:    analysis/correlateCLLPD_cache/
    Ignored:    code/.Rhistory
    Ignored:    data/.DS_Store
    Ignored:    output/.DS_Store

Untracked files:
    Untracked:  analysis/CNVanalysis_11q.Rmd
    Untracked:  analysis/CNVanalysis_trisomy12.Rmd
    Untracked:  analysis/CNVanalysis_trisomy19.Rmd
    Untracked:  analysis/analysisSplicing.Rmd
    Untracked:  analysis/analysisTrisomy19.Rmd
    Untracked:  analysis/annotateCNV.Rmd
    Untracked:  analysis/complexAnalysis_IGHV.Rmd
    Untracked:  analysis/complexAnalysis_trisomy12.Rmd
    Untracked:  analysis/correlateGenomic_PC12adjusted.Rmd
    Untracked:  analysis/correlateGenomic_noBlock.Rmd
    Untracked:  analysis/correlateGenomic_noBlock_MCLL.Rmd
    Untracked:  analysis/correlateGenomic_noBlock_UCLL.Rmd
    Untracked:  analysis/default.css
    Untracked:  analysis/del11q.pdf
    Untracked:  analysis/del11q_norm.pdf
    Untracked:  analysis/peptideValidate.Rmd
    Untracked:  analysis/plotExpressionCNV.Rmd
    Untracked:  analysis/processPeptides_LUMOS.Rmd
    Untracked:  analysis/style.css
    Untracked:  analysis/trisomy12.pdf
    Untracked:  analysis/trisomy12_AFcor.Rmd
    Untracked:  analysis/trisomy12_norm.pdf
    Untracked:  code/AlteredPQR.R
    Untracked:  code/utils.R
    Untracked:  data/190909_CLL_prot_abund_med_norm.tsv
    Untracked:  data/190909_CLL_prot_abund_no_norm.tsv
    Untracked:  data/20190423_Proteom_submitted_samples_bereinigt.xlsx
    Untracked:  data/20191025_Proteom_submitted_samples_final.xlsx
    Untracked:  data/LUMOS/
    Untracked:  data/LUMOS_peptides/
    Untracked:  data/LUMOS_protAnnotation.csv
    Untracked:  data/LUMOS_protAnnotation_fix.csv
    Untracked:  data/SampleAnnotation_cleaned.xlsx
    Untracked:  data/example_proteomics_data
    Untracked:  data/facTab_IC50atLeast3New.RData
    Untracked:  data/gmts/
    Untracked:  data/mapEnsemble.txt
    Untracked:  data/mapSymbol.txt
    Untracked:  data/proteins_in_complexes
    Untracked:  data/pyprophet_export_aligned.csv
    Untracked:  data/timsTOF_protAnnotation.csv
    Untracked:  output/LUMOS_processed.RData
    Untracked:  output/cnv_plots.zip
    Untracked:  output/cnv_plots/
    Untracked:  output/cnv_plots_norm.zip
    Untracked:  output/dxdCLL.RData
    Untracked:  output/exprCNV.RData
    Untracked:  output/pepCLL_lumos.RData
    Untracked:  output/pepTab_lumos.RData
    Untracked:  output/plotCNV_allChr11_diff.pdf
    Untracked:  output/plotCNV_del11q_sum.pdf
    Untracked:  output/proteomic_LUMOS_20200227.RData
    Untracked:  output/proteomic_LUMOS_20200320.RData
    Untracked:  output/proteomic_LUMOS_20200430.RData
    Untracked:  output/proteomic_timsTOF_20200227.RData
    Untracked:  output/splicingResults.RData
    Untracked:  output/timsTOF_processed.RData
    Untracked:  plotCNV_del11q_diff.pdf

Unstaged changes:
    Modified:   analysis/_site.yml
    Modified:   analysis/analysisSF3B1.Rmd
    Modified:   analysis/compareProteomicsRNAseq.Rmd
    Modified:   analysis/correlateCLLPD.Rmd
    Modified:   analysis/correlateGenomic.Rmd
    Deleted:    analysis/correlateGenomic_removePC.Rmd
    Modified:   analysis/correlateMIR.Rmd
    Modified:   analysis/correlateMethylationCluster.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/predictOutcome.Rmd
    Modified:   analysis/processProteomics_LUMOS.Rmd
    Modified:   analysis/qualityControl_LUMOS.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.

load("../../var/ddsrna_180717.RData")
load("../../var/patmeta_200522.RData")
load("../../var/proteomic_LUMOS_20200430.RData")
load("../output/exprCNV.RData")

Subset samples and define 11q deleted region.

Only genes on chr11 will be considered in this analysis

protCLL$del11q <- patMeta[match(colnames(protCLL),patMeta$Patient.ID),]$del11q
patTab <- colData(protCLL) %>% data.frame() %>% rownames_to_column("patID") %>%
  filter(!is.na(del11q))
dds <- dds[,dds$PatID %in% patTab$patID]
allRnaTab <- filter(allRnaTab, patID %in% patTab$patID, ChromID %in% "chr11" )
allProtTab <- filter(allProtTab, patID %in% patTab$patID, ChromID %in% "chr11")

I will define the region from 11q22.3 to 11q23.2 as the commonly deleted region, as most del11q samples show a deletion in this region. Let me know if this is appropriate. This region will be referred to as 11q in the analysis.

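# boundaries of the commonly deleted region (11q22.3 to 11q23.2), in Mb
regionStart <- 102.9
regionEnd <- 114.5
# a gene is labelled "11q" only if it lies entirely within the region
allRnaTab <- mutate(allRnaTab, ChromID = ifelse(start_position > regionStart & end_position < regionEnd, "11q", "other"))
allProtTab <- mutate(allProtTab, ChromID = ifelse(start_position > regionStart & end_position < regionEnd, "11q", "other"))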
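The region assignment above can be sketched as a small base-R helper. This is a minimal illustration, assuming gene coordinates are in Mb (as the 102.9 and 114.5 boundaries suggest); the helper `assignRegion` and the toy genes are hypothetical. Because both comparisons are strict, only genes lying entirely inside the region are labelled "11q"; a gene overlapping a boundary is labelled "other".

```r
# Hypothetical helper: label a gene "11q" only if it lies entirely
# within the commonly deleted region (coordinates assumed to be in Mb)
assignRegion <- function(startPos, endPos, regionStart = 102.9, regionEnd = 114.5) {
  ifelse(startPos > regionStart & endPos < regionEnd, "11q", "other")
}

# Toy genes: fully inside, spanning the left boundary, outside
toyGenes <- data.frame(
  symbol         = c("geneA", "geneB", "geneC"),
  start_position = c(108.0, 102.5, 120.0),
  end_position   = c(108.1, 103.0, 120.2),
  stringsAsFactors = FALSE
)
toyGenes$region <- assignRegion(toyGenes$start_position, toyGenes$end_position)
print(toyGenes)
```

Genes straddling a boundary could alternatively be assigned by overlap; the strict version used here keeps the "11q" set conservative.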

Compare gene dosage effect on RNA and protein level

Gene dosage effect on RNA level

#remove genes never expressed
noExpTab <- group_by(allRnaTab, id) %>% summarise(sumExpr = sum(expr)) %>%
  filter(sumExpr == 0)

rnaExprTab <- allRnaTab %>% filter(!id%in% noExpTab$id) %>%
  mutate(del11q = patMeta[match(patID, patMeta$Patient.ID),]$del11q) %>%
  filter(!is.na(del11q)) %>% mutate(cnv = ifelse(del11q %in% 1, "del11q","wt"))

Compare expression levels of 11q genes in del11q and wt samples

Raw counts

meanExprChr11 <- rnaExprTab %>% filter(ChromID %in% "11q") %>%
  group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup()

ggplot(meanExprChr11, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)

There is no clear difference, likely because differences in baseline expression between genes dominate the distribution.

Expression values centered by mean

meanExprChr11 <- rnaExprTab %>% filter(ChromID %in% "11q") %>%
  # center each gene by its mean expression across all samples
  group_by(id) %>% mutate(expr = expr - mean(expr)) %>%
  group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup()
ggplot(meanExprChr11, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)

Now the difference is clearly visible: genes on 11q have lower expression in del11q samples.
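Why centering helps can be shown on hypothetical toy data: three genes with very different baselines, where del11q samples are expressed at half the WT level. The raw group means overlap because each gene's baseline sets its range, while subtracting each gene's mean across all samples leaves only the within-gene difference. This is a base-R sketch, not the chunk above.

```r
# Toy long-format table (hypothetical numbers): three genes with very
# different baselines; del11q samples are at half the WT level
toy <- data.frame(
  id   = rep(c("g1", "g2", "g3"), each = 4),
  cnv  = rep(c("del11q", "del11q", "wt", "wt"), times = 3),
  expr = c(5, 5, 10, 10, 500, 500, 1000, 1000, 50, 50, 100, 100),
  stringsAsFactors = FALSE
)

# Raw group means per gene: del11q and WT ranges overlap (5-500 vs 10-1000)
rawMeans <- aggregate(expr ~ id + cnv, data = toy, FUN = mean)

# Center each gene by its mean across all samples, then average per group
toy$centered <- toy$expr - ave(toy$expr, toy$id, FUN = mean)
centeredMeans <- aggregate(centered ~ id + cnv, data = toy, FUN = mean)
print(centeredMeans)
```

Centering alone does not put genes on a common scale (g2 still spans a much wider range than g1); it is the sign of the centered value that separates the two groups.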

Mean expression difference between del11q and wt samples for the genes on 11q and other regions

plotTab <- rnaExprTab %>%
  group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key = cnv, value = meanExpr) %>%
  mutate(diff = del11q-wt)

ggplot(plotTab, aes(x=diff, fill = ChromID, y = ..density..)) + geom_histogram(position = "identity", alpha=0.5) +
  xlab("Mean expression difference")

Gene dosage effect on protein level

protExprTab <- allProtTab %>% 
  mutate(del11q = patMeta[match(patID, patMeta$Patient.ID),]$del11q) %>%
  filter(!is.na(del11q)) %>% mutate(cnv = ifelse(del11q %in% 1, "del11q","wt"))

Compare expression levels of 11q genes in del11q and wt samples

Raw expression values

meanExprChr11 <- protExprTab %>% filter(ChromID %in% "11q") %>%
  group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup()

ggplot(meanExprChr11, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)

Only a few proteins in this region are quantified.

Expression values centered by mean

meanExprChr11 <- protExprTab %>% filter(ChromID %in% "11q") %>%
  # center each protein by its mean expression across all samples
  group_by(id) %>% mutate(expr = expr - mean(expr)) %>%
  group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup()
ggplot(meanExprChr11, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)

Now the difference is clearly visible.

Mean protein expression difference between del11q and wt samples for the genes on 11q and other regions

plotTab <- protExprTab %>%
  group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key = cnv, value = meanExpr) %>%
  mutate(diff = del11q-wt)

ggplot(plotTab, aes(x=diff, fill = ChromID, y = ..density..)) + geom_histogram(position = "identity", alpha=0.5)

Proteins on 11q mostly show lower expression in del11q samples, but there are also some outliers.

Compare the gene dosage effect between RNA and protein data

Variability (standard deviation) of 11q gene expression

varRna <- filter(rnaExprTab, ChromID == "11q") %>%
  group_by(id,del11q) %>% summarise(sd = sd(expr)) %>%
  mutate(set = "rna")
varProt <- filter(protExprTab, ChromID == "11q") %>%
  group_by(id,del11q) %>% summarise(sd = sd(expr)) %>%
  mutate(set = "protein")
plotTab <- bind_rows(varRna, varProt) %>% ungroup()

ggplot(plotTab, aes(x=sd, fill = set, y = ..density..)) + 
  geom_histogram(position = "identity", alpha=0.5, bins = 100)+
  facet_wrap(~del11q) + xlab("Standard deviation of expression")

The standard deviation of RNA expression is higher than that of protein expression, which is an indication of buffering. The trend is the same in samples with and without del11q.
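Why a lower standard deviation is read as buffering can be illustrated with a toy model in which protein abundance follows only a fraction of the RNA variation across samples. This is a sketch under stated assumptions: the attenuation factor of 0.3 and the RNA values are hypothetical, and the model ignores measurement noise, which in real data differs between the RNA-seq and proteomics platforms and also affects the standard deviation.

```r
# Hypothetical centered RNA values of one gene across six samples
rnaCentered <- c(-1.0, -0.6, -0.2, 0.2, 0.6, 1.0)

# Assumed buffering model: protein follows only 30% of the RNA variation
attenuation <- 0.3
protCentered <- attenuation * rnaCentered

sdRna  <- sd(rnaCentered)
sdProt <- sd(protCentered)

# Under this noise-free model the SD ratio equals the attenuation factor
print(c(sdRna = sdRna, sdProt = sdProt, ratio = sdProt / sdRna))
```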

Expression value (centered by mean)

expRna <- filter(rnaExprTab, ChromID == "11q") %>%
  group_by(id) %>% mutate(meanVal = mean(expr)) %>%
  mutate(expr = expr-meanVal) %>%
  group_by(id,del11q) %>%
  summarise(meanExpr = mean(expr)) %>%
  mutate(set = "rna")
expProt <- filter(protExprTab, ChromID == "11q") %>%
  group_by(id) %>% mutate(meanVal = mean(expr)) %>%
  mutate(expr = expr-meanVal) %>%
  group_by(id,del11q) %>%
  summarise(meanExpr = mean(expr)) %>%
  mutate(set = "protein")
plotTab <- bind_rows(expRna, expProt)

ggplot(plotTab, aes(x=meanExpr, fill = set, y = ..density..)) + 
  geom_histogram(position = "identity", alpha=0.5, bins = 100) +
  facet_wrap(~del11q) +
  xlab("Mean expression")

The RNA expression change is larger than the protein expression change.

Plot the log fold change of RNA and proteins on 11q (del11q VS WT)

protDiffTab <- protExprTab %>%
  group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key = cnv, value = meanExpr) %>%
  mutate(diffProt = log(del11q/wt)) %>%
  mutate(chr = ifelse(ChromID == "11q","11q","Other")) %>%
  select(id, diffProt, chr)

rnaDiffTab <- rnaExprTab %>%
  group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key = cnv, value = meanExpr) %>%
  mutate(diffRna = log(del11q/wt)) %>% select(id, diffRna)

compareTab <- left_join(protDiffTab, rnaDiffTab, by = "id") %>%
  filter(!is.na(diffProt),!is.na(diffRna)) %>%
  gather(key = dataset, value=diff,-id,-chr) %>%
  mutate(group =paste0(chr,"_",dataset)) %>%
  filter(chr == "11q")
  

ggplot(compareTab, aes(x=diff, fill = group, y = ..density..)) + 
  geom_histogram(position = "identity", alpha=0.5, bins = 100) +
  geom_vline(xintercept = 0, color = "blue", linetype ="dashed") +
  xlab("Log fold change (del11q vs WT)")

As in the plot above, the RNA expression change is larger than the protein expression change, which also indicates a buffering effect. The buffering is not complete, however, as the average protein log fold change is still clearly different from 0.
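A worked example of the log fold change computed above, log(mean del11q / mean WT), and of what incomplete buffering means. The numbers are hypothetical: if the deletion halves RNA expression, the RNA log fold change is log(0.5), about -0.69. Complete buffering would give a protein log fold change of 0, no buffering a protein log fold change equal to the RNA one, and partial buffering a value in between.

```r
# Hypothetical group means for one 11q gene
rnaMeans  <- c(del11q = 50, wt = 100)   # RNA halved by the deletion
protMeans <- c(del11q = 80, wt = 100)   # protein reduced less strongly

# Natural-log fold change, del11q vs WT
lfc <- function(m) log(m[["del11q"]] / m[["wt"]])
lfcRna  <- lfc(rnaMeans)    # log(0.5)
lfcProt <- lfc(protMeans)   # log(0.8)

# Partial buffering: the protein change lies between 0 and the RNA change
print(c(rna = lfcRna, protein = lfcProt))
```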

Analysis of the buffering effect

Quantifying buffering effect by comparing magnitude of differential expression

Differential expression in proteomics

gene11q <- intersect(filter(allProtTab, ChromID == "11q")$id, filter(allRnaTab, ChromID =="11q")$id)
#subset samples
overSample <- intersect(dds$PatID, colnames(protCLL))
protSub <- protCLL[rowData(protCLL)$ensembl_gene_id %in% gene11q ,overSample]
rownames(protSub) <- rowData(protSub)$ensembl_gene_id
designMat <- data.frame(row.names = colnames(protSub),
                        del11q = protSub$del11q)
exprMat <- assays(protSub)[["count"]]

#testing

fit <- proDA(exprMat, design = ~ ., col_data = designMat )
  
resTab.prot <- test_diff(fit, contrast = "del11q1") %>%
  select(name, pval, diff, t_statistic, adj_pval) %>%
  dplyr::rename(id = name, logFC.prot = diff, stat.prot = t_statistic, pval.prot = pval, padj.prot = adj_pval)

Differential expression in RNAseq

#subset samples
overSample <- intersect(dds$PatID, colnames(protCLL))
ddsSub <- dds[gene11q,overSample]
ddsSub$IGHV <- protSub[,ddsSub$PatID]$IGHV.status
ddsSub$del11q <- protSub[,ddsSub$PatID]$del11q

design(ddsSub) <- ~ del11q
#testing
deRes <- DESeq(ddsSub)
  
resTab.rna <- results(deRes, name = "del11q_1_vs_0", tidy = TRUE) %>%
  select(row, log2FoldChange, stat, pvalue, padj) %>%
  dplyr::rename(id = row, logFC.rna = log2FoldChange, stat.rna = stat, pval.rna = pvalue, padj.rna = padj) %>%
  mutate(logFC.rna = logFC.rna*log(2)) #change to log fold change, same as proteomic
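The conversion above relies on log(x) = log2(x) * log(2), so multiplying a DESeq2 log2 fold change by log(2) gives the natural-log fold change. It assumes, as the code comment states, that the proteomics fold changes from proDA are on the natural-log scale. A quick check with hypothetical values:

```r
# Hypothetical DESeq2 log2 fold changes (e.g. -1 means RNA is halved)
log2FC <- c(-1, -0.5, 0.25)

# Same conversion as in the chunk above
lnFC <- log2FC * log(2)

# Compare with the natural log of the underlying ratios
ratios <- 2^log2FC
print(cbind(log2FC, lnFC, direct = log(ratios)))
```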

Define a buffering score

comTab <- left_join(resTab.prot, resTab.rna, by = "id") %>%
  mutate(symbol = rowData(dds[id,])$symbol)

Only genes that are down-regulated at both the RNA and the protein level (negative test statistics) are considered; otherwise the dosage effect is hard to interpret.

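bufferTab <- comTab %>%
  # keep genes down-regulated at both the RNA and the protein level
  filter(stat.rna < 0, stat.prot<0) %>%
  ungroup() %>%
  # mean-centered square root of the absolute protein t-statistic
  mutate(stat.prot.sqrt = sqrt(abs(stat.prot)),
         stat.prot.center = stat.prot.sqrt - mean(stat.prot.sqrt)) %>%
  # buffering score: high for strong RNA change with weak protein change
  mutate(score = stat.prot.center*stat.rna) %>%
  # categorical call based on 10% FDR
  mutate(ifBuffer = case_when(
    padj.prot < 0.1 & padj.rna < 0.1 ~ "noBuffer",
    padj.prot > 0.1 & padj.rna < 0.1 ~ "Buffered",
    padj.prot < 0.1 & padj.rna > 0.1 ~ "Enhanced",
    TRUE ~ "undetermined"
  )) %>%
  arrange(desc(score))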

Here I use two ways to quantify the buffering effect:

A buffering score, calculated as the product of the RNA test statistic and the mean-centered square root of the absolute protein t-statistic. It is designed to give genes with a significant and strong RNA change but little protein change a high score, genes without a strong RNA change a score close to zero, and genes with strong changes at both the protein and the RNA level a negative score.
A categorical variable, “ifBuffer”, based on the significance of differential expression (10% FDR). Genes with significant down-regulation at both the protein and the RNA level are in the “noBuffer” group, while genes with significant RNA down-regulation but no significant protein change are in the “Buffered” group. The “Enhanced” group contains genes without a significant RNA change but with a significant protein change. The buffering score cannot distinguish this group from undetermined genes, but these genes, although rare, may also be of interest. All other genes are in the “undetermined” group.
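Both quantities can be sketched in base R on a hypothetical results table. This mirrors the dplyr chunk that defines bufferTab (filter to genes down-regulated in both layers, centered square root of the absolute protein statistic multiplied by the RNA statistic, categories at 10% FDR); the function name `scoreBuffering` and the toy statistics are made up for illustration.

```r
# Hypothetical per-gene results: test statistics and adjusted p-values
toyRes <- data.frame(
  symbol    = c("bufA", "dosB", "dosC", "weakD"),
  stat.rna  = c(-6.0, -5.0, -4.0, -0.5),
  stat.prot = c(-0.5, -6.0, -4.5, -1.0),
  padj.rna  = c(0.001, 0.01, 0.02, 0.8),
  padj.prot = c(0.6, 0.001, 0.01, 0.5),
  stringsAsFactors = FALSE
)

scoreBuffering <- function(res, fdr = 0.1) {
  # Keep only genes down-regulated at both the RNA and the protein level
  res <- res[which(res$stat.rna < 0 & res$stat.prot < 0), , drop = FALSE]

  # Below-average protein change times a negative RNA statistic gives a
  # positive score; an above-average protein change gives a negative score
  protSqrt <- sqrt(abs(res$stat.prot))
  res$score <- (protSqrt - mean(protSqrt)) * res$stat.rna

  # Categorical call; NA p-values never match, so those genes stay
  # "undetermined" (as with the TRUE fallback of case_when)
  sig <- function(p) !is.na(p) & p < fdr
  ns  <- function(p) !is.na(p) & p > fdr
  res$ifBuffer <- rep("undetermined", nrow(res))
  res$ifBuffer[sig(res$padj.prot) & sig(res$padj.rna)] <- "noBuffer"
  res$ifBuffer[ns(res$padj.prot) & sig(res$padj.rna)] <- "Buffered"
  res$ifBuffer[sig(res$padj.prot) & ns(res$padj.rna)] <- "Enhanced"

  # Most buffered genes first
  res[order(res$score, decreasing = TRUE), ]
}

scored <- scoreBuffering(toyRes)
print(scored[, c("symbol", "score", "ifBuffer")])
```

In this toy table the buffered gene ranks first with a clearly positive score, the gene with a weak RNA change scores close to zero, and the two dosage-sensitive genes score negative.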

ggplot(bufferTab, aes(x=ifBuffer,y=score, fill = ifBuffer)) + geom_boxplot() + geom_point()

The buffering score and the categorical variable are related. The buffering score may additionally capture more subtle effects, such as the degree of buffering.

table(bufferTab$ifBuffer)


    Buffered     noBuffer undetermined 
           3            8            1

Compare fold change in RNA expression and protein expression

ggplot(bufferTab, aes(x=logFC.rna, y=logFC.prot)) + geom_point(aes(col = ifBuffer)) +
  xlab("RNA log fold change") + ylab("Protein log fold change") +
  geom_smooth(method = "lm")

Table to show buffering effect

select(bufferTab, symbol, ifBuffer, score, padj.prot, padj.rna, logFC.prot,logFC.rna) %>%
  mutate_if(is.numeric, formatC, digits=2) %>%
  DT::datatable()

Buffered proteins have higher scores and non-buffered proteins have lower scores.

Plot most and least buffered genes

All buffered proteins

geneList <- filter(bufferTab, ifBuffer == "Buffered")$id
pList <- lapply(geneList, function(i) {
  tabProt <- allProtTab %>% filter(id == i) %>%
    select(id, patID, symbol,expr) %>% dplyr::rename(protExpr = expr)
  tabRna <- allRnaTab %>% filter(id == i) %>%
    select(id, patID, expr) %>% dplyr::rename(rnaExpr = expr)
  plotTab <- left_join(tabProt, tabRna, by = c("id","patID")) %>% 
    filter(!is.na(protExpr), !is.na(rnaExpr)) %>%
    mutate(del11q = patMeta[match(patID, patMeta$Patient.ID),]$del11q)
  p <- ggplot(plotTab, aes(x=rnaExpr, y = protExpr)) +
    geom_point(aes(col=del11q)) + geom_smooth(method="lm") + ggtitle(unique(plotTab$symbol)) +
    theme(legend.position = "bottom")
  ggMarginal(p, type = "histogram", groupFill = TRUE)
  })
cowplot::plot_grid(plotlist = pList, ncol=3)

All non-buffered proteins

geneList <- filter(bufferTab, ifBuffer == "noBuffer")$id
pList <- lapply(geneList, function(i) {
  tabProt <- allProtTab %>% filter(id == i) %>%
    select(id, patID, symbol,expr) %>% dplyr::rename(protExpr = expr)
  tabRna <- allRnaTab %>% filter(id == i) %>%
    select(id, patID, expr) %>% dplyr::rename(rnaExpr = expr)
  plotTab <- left_join(tabProt, tabRna, by = c("id","patID")) %>% 
    filter(!is.na(protExpr), !is.na(rnaExpr)) %>%
    mutate(del11q = patMeta[match(patID, patMeta$Patient.ID),]$del11q)
  p <- ggplot(plotTab, aes(x=rnaExpr, y = protExpr)) +
    geom_point(aes(col=del11q)) + geom_smooth(method="lm") +
    ggtitle(unique(plotTab$symbol)) +
    theme(legend.position = "bottom")
  ggMarginal(p, type = "histogram", groupFill = TRUE)
  })
cowplot::plot_grid(plotlist = pList, ncol=3)

Is buffering related to protein complex membership?

int_pairs <- read_delim("../data/proteins_in_complexes", delim = "\t") %>%
  mutate(symbolA = rowData(protCLL)[match(ProtA, rownames(protCLL)),]$hgnc_symbol,
         symbolB = rowData(protCLL)[match(ProtB, rownames(protCLL)),]$hgnc_symbol) %>%
  filter(!is.na(symbolA),!is.na(symbolB))

# TRUE if the gene symbol appears in either column of the complex pair table
bufferTab <- mutate(bufferTab, inComplex = symbol %in% c(int_pairs$symbolA, int_pairs$symbolB))

Plot the buffering scores of proteins in complex and not in complex

ggplot(bufferTab, aes(x=inComplex, y=score)) + geom_boxplot() + geom_point()

t.test(score~inComplex, bufferTab, var.equal= TRUE)


    Two Sample t-test

data:  score by inComplex
t = -0.19566, df = 10, p-value = 0.8488
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.761654  5.669999
sample estimates:
mean in group FALSE  mean in group TRUE 
          -1.899900           -1.354073

No significant difference in buffering score is observed between proteins in complexes and proteins not in complexes (two-sample t-test, p = 0.85, based on 12 genes), so there is no evidence of an association between complex membership and buffering.

Plot buffering effect on genomic coordinates

plotFoldGenome <- function(bufferTab, allBand, allProtTab, chr, region = c(-Inf,Inf),
                           ifTrend = FALSE, maxVal = 1, minVal = -1) {

  # table for cytogenetic bands within the requested region
  bandTab <- filter(allBand, ChromID == chr, chromStart >= region[1], chromEnd <= region[2])

  # table for fold changes: one row per gene and data set (RNA / protein)
  protCoordTab <- allProtTab %>% distinct(symbol, start_position, end_position, mid_position)
  foldTab <- bufferTab %>% select(symbol, logFC.prot, logFC.rna, score, ifBuffer) %>%
    gather(key = "set", value = "logFC", -symbol, -score, -ifBuffer) %>%
    left_join(protCoordTab, by = "symbol")

  # vertical segments connecting RNA and protein fold changes of classified genes
  bufferLine <- filter(bufferTab, ifBuffer %in% c("Buffered","noBuffer")) %>%
    left_join(protCoordTab, by = "symbol") %>%
    distinct(symbol, mid_position, logFC.prot, logFC.rna, ifBuffer) %>%
    mutate(minY = pmin(logFC.prot, logFC.rna),
           maxY = pmax(logFC.prot, logFC.rna))

  xMax <- max(bandTab$chromEnd, na.rm = TRUE)
  xMin <- min(bandTab$chromStart, na.rm = TRUE)

  # main plot: cytobands as background, fold changes as thin bars
  gPro <- ggplot() +
    geom_rect(data = bandTab, mapping = aes(xmin = chromStart, xmax = chromEnd,
                                            ymin = minVal, ymax = maxVal, fill = Colour),
              alpha = 0.1) +
    geom_text(data = bandTab, mapping = aes(label = band, x = chromMid), y = maxVal,
              hjust = 1, angle = 90, size = 2.5) +
    geom_rect(data = foldTab,
              mapping = aes(xmin = start_position, xmax = end_position,
                            ymin = logFC, ymax = logFC + 0.1, fill = set)) +
    geom_segment(data = bufferLine,
                 aes(x = mid_position, xend = mid_position,
                     y = minY + 0.1, yend = maxY, col = ifBuffer),
                 linetype = "dashed") +
    ggrepel::geom_text_repel(data = bufferLine,
                             aes(x = mid_position, y = logFC.prot, label = symbol, col = ifBuffer)) +
    scale_x_continuous(expand = c(0, 0), limits = c(xMin, xMax)) +
    scale_fill_manual(values = c(even = "white", odd = "grey50",
                                 logFC.rna = "orange", logFC.prot = "darkblue")) +
    scale_color_manual(values = c(logFC.rna = "orange", logFC.prot = "darkblue",
                                  Buffered = "red", noBuffer = "green")) +
    xlab("Genomic position [Mb]") +
    ylab("Log Fold Change") +
    ggtitle(paste0("Log fold change comparison (", chr, ")")) +
    theme(plot.title = element_text(face = "bold", size = 10, hjust = 0.3),
          legend.position = "none",
          panel.background = element_blank(),
          panel.grid.major = element_line(colour = "grey90", size = 0.1))

  # optional loess trend line for each data set
  if (ifTrend) {
    gPro <- gPro + stat_smooth(data = foldTab, geom = "line",
                               mapping = aes(y = logFC, x = mid_position, color = set),
                               formula = y ~ x, method = "loess", se = FALSE, span = 0.2,
                               size = 0.5, alpha = 0.5)
  }

  # separate legend, built from a dummy plot
  lgTab <- tibble(x = seq(6), y = seq(6),
                  Dataset = c(rep("logFC.rna", 3), rep("logFC.prot", 3)),
                  ifBuffer = c(rep("Buffered", 3), rep("noBuffer", 3)))
  lg <- ggplot(lgTab, aes(x = x, y = y)) +
    geom_point(aes(fill = Dataset, color = ifBuffer), shape = 22, size = 3) +
    scale_fill_manual(values = c(logFC.rna = "orange", logFC.prot = "darkblue")) +
    scale_color_manual(values = c(logFC.rna = "orange", logFC.prot = "darkblue",
                                  Buffered = "red", noBuffer = "green")) +
    theme(legend.position = "bottom")
  lg <- get_legend(lg)

  return(list(plot = gPro, legend = lg))
}

band11q <- filter(allBand, ChromID == "chr11", band %in% c("q22.3","q23.1","q23.2")) %>%
  mutate(ChromID = "11q")
g <- plotFoldGenome(bufferTab, band11q, allProtTab, "11q", region = c(-Inf,Inf),
                           ifTrend = FALSE, maxVal =0.5, minVal=-2)
pg <- plot_grid(g$plot, g$legend, ncol = 1, rel_heights = c(1,0.1))
pg

ggsave(filename = "../public/del11q_buffer_plot.pdf", plot = pg, device = "pdf", height = 10, width = 18)

PDF version: del11q_buffer_plot.pdf

In this plot, the y axis is the log fold change (del11q vs WT) of either protein (blue) or RNA (orange) expression. For genes classified as “Buffered”, the protein and RNA values are connected by a red dashed line; for “noBuffer” genes, they are connected by a green dashed line.

Summary:

As with trisomy12, the gene dosage effect of del11q is visible at both the RNA and the protein level when compared with WT samples. Based on the proportion of buffered proteins, there appears to be less buffering than for trisomy12 and trisomy19. This may be because a deletion is more difficult to buffer than a gain.
As with trisomy12 and trisomy19, there is no significant association between buffering and whether a protein is part of a complex.

sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.15.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] forcats_0.4.0               stringr_1.4.0              
 [3] dplyr_0.8.5                 purrr_0.3.3                
 [5] readr_1.3.1                 tidyr_1.0.0                
 [7] tibble_3.0.0                tidyverse_1.3.0            
 [9] cowplot_0.9.4               ggplot2_3.3.0              
[11] ggExtra_0.9                 proDA_1.1.2                
[13] jyluMisc_0.1.5              piano_2.0.2                
[15] DESeq2_1.24.0               SummarizedExperiment_1.14.0
[17] DelayedArray_0.10.0         BiocParallel_1.18.0        
[19] matrixStats_0.54.0          Biobase_2.44.0             
[21] GenomicRanges_1.36.0        GenomeInfoDb_1.20.0        
[23] IRanges_2.18.1              S4Vectors_0.22.0           
[25] BiocGenerics_0.30.0        

loaded via a namespace (and not attached):
  [1] readxl_1.3.1           backports_1.1.4        Hmisc_4.2-0           
  [4] fastmatch_1.1-0        drc_3.0-1              workflowr_1.6.0       
  [7] igraph_1.2.4.1         shinydashboard_0.7.1   splines_3.6.0         
 [10] crosstalk_1.0.0        TH.data_1.0-10         digest_0.6.19         
 [13] htmltools_0.4.0        gdata_2.18.0           magrittr_1.5          
 [16] checkmate_2.0.0        memoise_1.1.0          cluster_2.1.0         
 [19] openxlsx_4.1.0.1       limma_3.40.2           annotate_1.62.0       
 [22] modelr_0.1.5           sandwich_2.5-1         colorspace_1.4-1      
 [25] ggrepel_0.8.1          rvest_0.3.5            blob_1.1.1            
 [28] haven_2.2.0            xfun_0.8               crayon_1.3.4          
 [31] RCurl_1.95-4.12        jsonlite_1.6           genefilter_1.66.0     
 [34] survival_2.44-1.1      zoo_1.8-6              glue_1.3.2            
 [37] survminer_0.4.4        gtable_0.3.0           zlibbioc_1.30.0       
 [40] XVector_0.24.0         car_3.0-3              abind_1.4-5           
 [43] scales_1.1.0           mvtnorm_1.0-11         DBI_1.0.0             
 [46] relations_0.6-8        miniUI_0.1.1.1         Rcpp_1.0.1            
 [49] plotrix_3.7-6          cmprsk_2.2-8           xtable_1.8-4          
 [52] htmlTable_1.13.1       foreign_0.8-71         bit_1.1-14            
 [55] km.ci_0.5-2            Formula_1.2-3          DT_0.7                
 [58] httr_1.4.1             htmlwidgets_1.3        fgsea_1.10.0          
 [61] gplots_3.0.1.1         RColorBrewer_1.1-2     acepack_1.4.1         
 [64] ellipsis_0.2.0         farver_2.0.3           pkgconfig_2.0.2       
 [67] XML_3.98-1.20          dbplyr_1.4.2           nnet_7.3-12           
 [70] locfit_1.5-9.1         labeling_0.3           tidyselect_1.0.0      
 [73] rlang_0.4.5            later_0.8.0            AnnotationDbi_1.46.0  
 [76] munsell_0.5.0          cellranger_1.1.0       tools_3.6.0           
 [79] visNetwork_2.0.7       cli_1.1.0              generics_0.0.2        
 [82] RSQLite_2.1.1          broom_0.5.2            evaluate_0.14         
 [85] yaml_2.2.0             knitr_1.23             bit64_0.9-7           
 [88] fs_1.4.0               zip_2.0.2              survMisc_0.5.5        
 [91] caTools_1.17.1.2       nlme_3.1-140           mime_0.7              
 [94] slam_0.1-45            xml2_1.2.2             compiler_3.6.0        
 [97] rstudioapi_0.10        curl_3.3               ggsignif_0.5.0        
[100] marray_1.62.0          reprex_0.3.0           geneplotter_1.62.0    
[103] stringi_1.4.3          lattice_0.20-38        Matrix_1.2-17         
[106] KMsurv_0.1-5           shinyjs_1.0            vctrs_0.2.4           
[109] pillar_1.4.3           lifecycle_0.2.0        data.table_1.12.2     
[112] bitops_1.0-6           httpuv_1.5.1           extraDistr_1.8.11     
[115] R6_2.4.0               latticeExtra_0.6-28    promises_1.0.1        
[118] KernSmooth_2.23-15     gridExtra_2.3          rio_0.5.16            
[121] codetools_0.2-16       MASS_7.3-51.4          gtools_3.8.1          
[124] exactRankTests_0.8-30  assertthat_0.2.1       rprojroot_1.3-2       
[127] withr_2.1.2            multcomp_1.4-10        GenomeInfoDbData_1.2.1
[130] mgcv_1.8-28            hms_0.5.2              grid_3.6.0            
[133] rpart_4.1-15           rmarkdown_1.13         carData_3.0-2         
[136] ggpubr_0.2.1           git2r_0.26.1           maxstat_0.7-25        
[139] sets_1.0-18            lubridate_1.7.4        shiny_1.3.2           
[142] base64enc_0.1-3

Analysis of gene dosage effect related to deletion of 11q region

Junyan Lu

2020-02-27
