Last updated: 2020-05-29

Checks: 6 1

Knit directory: Proteomics/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200227)

The command set.seed(20200227) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 3fb50c5

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/complexAnalysis_IGHV_cache/
    Ignored:    analysis/complexAnalysis_trisomy12_alteredPQR_cache/
    Ignored:    analysis/complexAnalysis_trisomy12_cache/
    Ignored:    analysis/correlateCLLPD_cache/
    Ignored:    code/.Rhistory
    Ignored:    data/.DS_Store
    Ignored:    output/.DS_Store

Untracked files:
    Untracked:  analysis/CNVanalysis_trisomy12.Rmd
    Untracked:  analysis/CNVanalysis_trisomy19.Rmd
    Untracked:  analysis/analysisSplicing.Rmd
    Untracked:  analysis/analysisTrisomy19.Rmd
    Untracked:  analysis/annotateCNV.Rmd
    Untracked:  analysis/complexAnalysis_IGHV.Rmd
    Untracked:  analysis/complexAnalysis_trisomy12.Rmd
    Untracked:  analysis/correlateGenomic_PC12adjusted.Rmd
    Untracked:  analysis/correlateGenomic_noBlock.Rmd
    Untracked:  analysis/correlateGenomic_noBlock_MCLL.Rmd
    Untracked:  analysis/correlateGenomic_noBlock_UCLL.Rmd
    Untracked:  analysis/default.css
    Untracked:  analysis/del11q.pdf
    Untracked:  analysis/del11q_norm.pdf
    Untracked:  analysis/peptideValidate.Rmd
    Untracked:  analysis/plotCNV_del11q.pdf
    Untracked:  analysis/plotExpressionCNV.Rmd
    Untracked:  analysis/processPeptides_LUMOS.Rmd
    Untracked:  analysis/style.css
    Untracked:  analysis/trisomy12.pdf
    Untracked:  analysis/trisomy12_AFcor.Rmd
    Untracked:  analysis/trisomy12_norm.pdf
    Untracked:  code/AlteredPQR.R
    Untracked:  code/utils.R
    Untracked:  data/190909_CLL_prot_abund_med_norm.tsv
    Untracked:  data/190909_CLL_prot_abund_no_norm.tsv
    Untracked:  data/20190423_Proteom_submitted_samples_bereinigt.xlsx
    Untracked:  data/20191025_Proteom_submitted_samples_final.xlsx
    Untracked:  data/LUMOS/
    Untracked:  data/LUMOS_peptides/
    Untracked:  data/LUMOS_protAnnotation.csv
    Untracked:  data/LUMOS_protAnnotation_fix.csv
    Untracked:  data/SampleAnnotation_cleaned.xlsx
    Untracked:  data/example_proteomics_data
    Untracked:  data/facTab_IC50atLeast3New.RData
    Untracked:  data/gmts/
    Untracked:  data/mapEnsemble.txt
    Untracked:  data/mapSymbol.txt
    Untracked:  data/proteins_in_complexes
    Untracked:  data/pyprophet_export_aligned.csv
    Untracked:  data/timsTOF_protAnnotation.csv
    Untracked:  output/LUMOS_processed.RData
    Untracked:  output/cnv_plots.zip
    Untracked:  output/cnv_plots/
    Untracked:  output/cnv_plots_norm.zip
    Untracked:  output/dxdCLL.RData
    Untracked:  output/exprCNV.RData
    Untracked:  output/pepCLL_lumos.RData
    Untracked:  output/pepTab_lumos.RData
    Untracked:  output/plotCNV_allChr11_diff.pdf
    Untracked:  output/plotCNV_del11q_sum.pdf
    Untracked:  output/proteomic_LUMOS_20200227.RData
    Untracked:  output/proteomic_LUMOS_20200320.RData
    Untracked:  output/proteomic_LUMOS_20200430.RData
    Untracked:  output/proteomic_timsTOF_20200227.RData
    Untracked:  output/splicingResults.RData
    Untracked:  output/timsTOF_processed.RData
    Untracked:  plotCNV_del11q_diff.pdf

Unstaged changes:
    Modified:   analysis/_site.yml
    Modified:   analysis/analysisSF3B1.Rmd
    Modified:   analysis/compareProteomicsRNAseq.Rmd
    Modified:   analysis/correlateCLLPD.Rmd
    Modified:   analysis/correlateGenomic.Rmd
    Deleted:    analysis/correlateGenomic_removePC.Rmd
    Modified:   analysis/correlateMIR.Rmd
    Modified:   analysis/correlateMethylationCluster.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/predictOutcome.Rmd
    Modified:   analysis/processProteomics_LUMOS.Rmd
    Modified:   analysis/qualityControl_LUMOS.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.

load("../../var/ddsrna_180717.RData")
load("../../var/patmeta_200522.RData")
load("../../var/proteomic_LUMOS_20200430.RData")
load("../output/exprCNV.RData")

Compare gene dosage effect on RNA and protein level

Gene dosage effect on RNA level

#remove genes never expressed
noExpTab <- group_by(allRnaTab, id) %>% summarise(sumExpr = sum(expr)) %>%
  filter(sumExpr == 0)

#mean variance trend
meanVarTab <- group_by(allRnaTab, id) %>%
  summarise(meanVal = mean(expr),varVal = var(expr)) 
plot(meanVarTab$meanVal, meanVarTab$varVal)

#looks fine for exploratory analysis

rnaExprTab <- allRnaTab %>% filter(!id%in% noExpTab$id) %>%
  mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12) %>%
  filter(!is.na(trisomy12)) %>% mutate(cnv = ifelse(trisomy12 %in% 1, "trisomy12","wt"))

Compare expression levels of Chr12 genes in tri12 and wt samples

Raw counts

meanExprChr12 <- rnaExprTab %>% filter(ChromID %in% "chr12") %>%
  group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup()

ggplot(meanExprChr12, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)

There is no strong different, this is because the baseline expression values among genes are much larger than the relative expression difference between trisomy12 and wt samples.

Expression values centered by mean

meanExprChr12 <- rnaExprTab %>% filter(ChromID %in% "chr12") %>%
  group_by(id) %>% mutate(med=mean(expr),sd = sd(expr)) %>% mutate(expr = (expr-med)) %>%
  group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup()
ggplot(meanExprChr12, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)

Now the difference is clearly visible.

Mean expression difference between trisomy12 and wt samples for the genes on chr12 and other chromosomes

plotTab <- rnaExprTab %>%
  group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key = cnv, value = meanExpr) %>%
  mutate(diff = trisomy12-wt) %>%
  mutate(chr = ifelse(ChromID == "chr12","Chr12","Other"))

ggplot(plotTab, aes(x=diff, fill = chr, y = ..density..)) + geom_histogram(position = "identity", alpha=0.5) +
  xlab("Mean expression difference")

It’s also visible that most genes on chr12 tend to have higher expression in trisomy12 samples. While other genes follow normal distribution centered at 0.

Gene dosage effect on protein level

protExprTab <- allProtTab %>% 
  mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12) %>%
  filter(!is.na(trisomy12)) %>% mutate(cnv = ifelse(trisomy12 %in% 1, "trisomy12","wt"))

Compare expression levels of Chr12 genes in tri12 and wt samples

Raw counts

meanExprChr12 <- protExprTab %>% filter(ChromID %in% "chr12") %>%
  group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup()

ggplot(meanExprChr12, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)

Similar as in the RNA expression, the difference is not very visible when raw count is used.

Expression values centered by mean

meanExprChr12 <- protExprTab %>% filter(ChromID %in% "chr12") %>%
  group_by(id) %>% mutate(med=mean(expr),sd = sd(expr)) %>% mutate(expr = (expr-med)) %>%
  group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup()
ggplot(meanExprChr12, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)

Now difference is clearly visible.

Mean protein expression difference between trisomy12 and wt samples for the genes on chr12 and other chromosomes

plotTab <- protExprTab %>%
  group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key = cnv, value = meanExpr) %>%
  mutate(diff = trisomy12-wt) %>%
  mutate(chr = ifelse(ChromID == "chr12","Chr12","Other"))

ggplot(plotTab, aes(x=diff, fill = chr, y = ..density..)) + geom_histogram(position = "identity", alpha=0.5)

The gene dosage effect is also clearly visible in the proteomic dataset

Compare the gene dosage effect between RNA and protein data

Variance of chr12 gene expression

varRna <- filter(rnaExprTab, ChromID == "chr12") %>%
  group_by(id,trisomy12) %>% summarise(sd = sd(expr)) %>%
  mutate(set = "rna")
varProt <- filter(protExprTab, ChromID == "chr12") %>%
  group_by(id,trisomy12) %>% summarise(sd = sd(expr)) %>%
  mutate(set = "protein")
plotTab <- bind_rows(varRna, varProt) %>% ungroup()

ggplot(plotTab, aes(x=sd, fill = set, y = ..density..)) + 
  geom_histogram(position = "identity", alpha=0.5, bins = 100)+
  facet_wrap(~trisomy12) + xlab("Expression variance")

The variance of RNA expression is higher than protein expression, which is an indication of buffering. The trend is the same in samples with or without trisomy12

Expression value (centered by mean)

expRna <- filter(rnaExprTab, ChromID == "chr12") %>%
  group_by(id) %>% mutate(meanVal = mean(expr)) %>%
  mutate(expr = expr-meanVal) %>%
  group_by(id,trisomy12) %>%
  summarise(meanExpr = mean(expr)) %>%
  mutate(set = "rna")
expProt <- filter(protExprTab, ChromID == "chr12") %>%
  group_by(id) %>% mutate(meanVal = mean(expr)) %>%
  mutate(expr = expr-meanVal) %>%
  group_by(id,trisomy12) %>%
  summarise(meanExpr = mean(expr)) %>%
  mutate(set = "protein")
plotTab <- bind_rows(expRna, expProt)

ggplot(plotTab, aes(x=meanExpr, fill = set, y = ..density..)) + 
  geom_histogram(position = "identity", alpha=0.5, bins = 100) +
  facet_wrap(~trisomy12) +
  xlab("Mean expression")

The RNA expression change is larger than protein expression change.

Plot the log fold change of RNA and proteins on chr12 (trisomy12 VS WT)

protDiffTab <- protExprTab %>%
  group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key = cnv, value = meanExpr) %>%
  mutate(diffProt = log(trisomy12/wt)) %>%
  mutate(chr = ifelse(ChromID == "chr12","Chr12","Other")) %>%
  select(id, diffProt, chr)

rnaDiffTab <- rnaExprTab %>%
  group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key = cnv, value = meanExpr) %>%
  mutate(diffRna = log(trisomy12/wt)) %>% select(id, diffRna)

compareTab <- left_join(protDiffTab, rnaDiffTab, by = "id") %>%
  filter(!is.na(diffProt),!is.na(diffRna)) %>%
  gather(key = dataset, value=diff,-id,-chr) %>%
  mutate(group =paste0(chr,"_",dataset)) %>%
  filter(chr == "Chr12")
  

ggplot(compareTab, aes(x=diff, fill = group, y = ..density..)) + 
  geom_histogram(position = "identity", alpha=0.5, bins = 100) +
  geom_vline(xintercept = 0, color = "blue", linetype ="dashed") +
  xlab("Expression different (tri12 - wt)")

Similar as the plot above, it can be seen that RNA expression change is larger than the protein expression change. This also indicates a buffering effect. Although it’s not a complete buffering, as the average protein expression difference is still larger than 0.

Analysis of the buffering effect

Quantifying buffering effect by comparing magnitude of differential expression

Differential expression in proteomics

#subset samples
overSample <- intersect(dds$PatID, colnames(protCLL))
protSub <- protCLL[rowData(protCLL)$chromosome_name %in% "12",overSample]
overGene <- na.omit(intersect(rownames(dds),rowData(protSub)$ensembl_gene_id))
protSub <- protSub[rowData(protSub)$ensembl_gene_id %in% overGene, ]
rownames(protSub) <- rowData(protSub)$ensembl_gene_id
designMat <- data.frame(row.names = colnames(protSub),
                        IGHV = protSub$IGHV.status,
                        trisomy12 = protSub$trisomy12)
exprMat <- assays(protSub)[["count"]]

#testing

fit <- proDA(exprMat, design = ~ ., col_data = designMat )
  
resTab.prot <- test_diff(fit, contrast = "trisomy121") %>%
  select(name, pval, diff, t_statistic, adj_pval) %>%
  dplyr::rename(id = name, logFC.prot = diff, stat.prot = t_statistic, pval.prot = pval, padj.prot = adj_pval)

Differential expression in RNAseq

#subset samples
overSample <- intersect(dds$PatID, colnames(protCLL))
ddsSub <- dds[overGene,overSample]
ddsSub$IGHV <- protSub[,ddsSub$PatID]$IGHV.status
ddsSub$trisomy12 <- protSub[,ddsSub$PatID]$trisomy12

design(ddsSub) <- ~ IGHV + trisomy12
#testing
deRes <- DESeq(ddsSub)
  
resTab.rna <- results(deRes, name = "trisomy12_1_vs_0", tidy = TRUE) %>%
  select(row, log2FoldChange, stat, pvalue, padj) %>%
  dplyr::rename(id = row, logFC.rna = log2FoldChange, stat.rna = stat, pval.rna = pvalue, padj.rna = padj) %>%
  mutate(logFC.rna = logFC.rna*log(2)) #change to log fold change, same as proteomic

Define a buffering score

comTab <- left_join(resTab.prot, resTab.rna, by = "id") %>%
  mutate(symbol = rowData(dds[id,])$symbol)

Only chr12 genes that are up-regulated are considered. Otherwise it’s hard to intepret the dosage effect.

bufferTab <- comTab %>% filter(stat.rna > 0, stat.prot>0) %>%
  ungroup() %>%
  mutate(stat.prot.sqrt = sqrt(stat.prot),
         stat.prot.center = stat.prot.sqrt - mean(stat.prot.sqrt)) %>%
  mutate(diffStat = stat.rna-stat.prot,
         diffFold = logFC.rna -logFC.prot) %>% 
  mutate(score = -stat.prot.center*stat.rna) %>%
  mutate(ifBuffer = case_when(
    padj.prot < 0.1 & padj.rna < 0.1 ~ "noBuffer",
    padj.prot > 0.1 & padj.rna < 0.1 ~ "Buffered",
    padj.prot < 0.1 & padj.rna > 0.1 ~ "Enhanced",
    TRUE ~ "undetermined"
  )) %>%
  arrange(desc(score))

Here I use two ways to quantify the buffering effect:

A buffering score, which is based on the difference of log fold change between protein and rna dataset and the t-statistics of the differentially expressed RNAs. The purpose is to give the gene that show significant and strong RNA change, but little protein change a high buffering score. While the genes that do not show strong RNA expression change will have a score close to zero. And the genes that show both strong protein and RNA expression change a more negative score.
A categorical variable, “ifBuffer”, based on the the significance of differential expression. The genes that show both significant protein and RNA up-regulation are in the “noBuffer” group, while the genes that show significant RNA-up-regulation but no significant protein expression change are in the “Buffered” group. The “Enhanced” group contains the genes that do not show significant changes in RNA level but with significant changes in protein level. The buffering score can not differentiate this group and will categorize it as undetermined. But the genes in this group, although pretty rare, may also be potentially interesting. Other genes are in the “undetermined” group.

ggplot(bufferTab, aes(x=ifBuffer,y=score, fill = ifBuffer)) + geom_boxplot() + geom_point()

The buffering score and the categorical variable are related. Perhaps the buffering score can estimate more subtle effect, like the degree of buffering.

table(bufferTab$ifBuffer)


    Buffered     Enhanced     noBuffer undetermined 
          50           10          115           12

Compare fold change in RNA expression and protein expression

ggplot(bufferTab, aes(x=logFC.rna, y=logFC.prot)) + geom_point(aes(col = ifBuffer)) +
  xlab("RNA log fold change") + ylab("Protein log fold change") +
  geom_smooth(method = "lm")

Table to show buffering effect

select(bufferTab, symbol, ifBuffer, score, padj.prot, padj.rna, logFC.prot,logFC.rna) %>%
  mutate_if(is.numeric, formatC, digits=2) %>%
  DT::datatable()

Buffered proteins have higher scores and not buffered proteins have lower scores

Plot most and least buffered genes

Top 9 most buffered based on buffering score

geneList <- bufferTab$id[1:9]
pList <- lapply(geneList, function(i) {
  tabProt <- allProtTab %>% filter(id == i) %>%
    select(id, patID, symbol,expr) %>% dplyr::rename(protExpr = expr)
  tabRna <- allRnaTab %>% filter(id == i) %>%
    select(id, patID, expr) %>% dplyr::rename(rnaExpr = expr)
  plotTab <- left_join(tabProt, tabRna, by = c("id","patID")) %>% 
    filter(!is.na(protExpr), !is.na(rnaExpr)) %>%
    mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12)
  p <- ggplot(plotTab, aes(x=rnaExpr, y = protExpr)) +
    geom_point(aes(col=trisomy12)) + geom_smooth(method="lm") + ggtitle(unique(plotTab$symbol)) +
    theme(legend.position = "bottom")
  ggMarginal(p, type = "histogram", groupFill = TRUE)
  })
cowplot::plot_grid(plotlist = pList, ncol=3)

The 9 least buffered based on buffering score

geneList <- tail(bufferTab$id, n=9)
pList <- lapply(geneList, function(i) {
  tabProt <- allProtTab %>% filter(id == i) %>%
    select(id, patID, symbol,expr) %>% dplyr::rename(protExpr = expr)
  tabRna <- allRnaTab %>% filter(id == i) %>%
    select(id, patID, expr) %>% dplyr::rename(rnaExpr = expr)
  plotTab <- left_join(tabProt, tabRna, by = c("id","patID")) %>% 
    filter(!is.na(protExpr), !is.na(rnaExpr)) %>%
    mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12)
  p <- ggplot(plotTab, aes(x=rnaExpr, y = protExpr)) +
    geom_point(aes(col=trisomy12)) + geom_smooth(method="lm") +
    ggtitle(unique(plotTab$symbol)) +
    theme(legend.position = "bottom")
  ggMarginal(p, type = "histogram", groupFill = TRUE)
  })
cowplot::plot_grid(plotlist = pList, ncol=3)

Plot the 10 genes in the enhanced group, where protein change is more significant than RNA change

geneList <- filter(bufferTab, ifBuffer == "Enhanced")$id
pList <- lapply(geneList, function(i) {
  tabProt <- allProtTab %>% filter(id == i) %>%
    select(id, patID, symbol,expr) %>% dplyr::rename(protExpr = expr)
  tabRna <- allRnaTab %>% filter(id == i) %>%
    select(id, patID, expr) %>% dplyr::rename(rnaExpr = expr)
  plotTab <- left_join(tabProt, tabRna, by = c("id","patID")) %>% 
    filter(!is.na(protExpr), !is.na(rnaExpr)) %>%
    mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12)
  p <- ggplot(plotTab, aes(x=rnaExpr, y = protExpr)) +
    geom_point(aes(col=trisomy12)) + geom_smooth(method="lm") +
    ggtitle(unique(plotTab$symbol)) +
    theme(legend.position = "bottom")
  ggMarginal(p, type = "histogram", groupFill = TRUE)
  })
cowplot::plot_grid(plotlist = pList, ncol=3)

Enrichment analysis based on buffering score

inputTab <- bufferTab %>% select(symbol, score) %>%
  arrange(abs(score)) %>% distinct(symbol, .keep_all = T) %>%
  data.frame() %>% column_to_rownames("symbol")

gmts = list(H= "../data/gmts/h.all.v6.2.symbols.gmt",
            KEGG = "../data/gmts/c2.cp.kegg.v6.2.symbols.gmt")
enRes <- list()
enRes[["HALLMARK"]] <- jyluMisc::runGSEA(inputTab, gmts$H, "page")
enRes[["KEGG"]] <- jyluMisc::runGSEA(inputTab, gmts$KEGG, "page")
p <- jyluMisc::plotEnrichmentBar(enRes, pCut =0.05, ifFDR= FALSE)
#pdf("tri12Enrich.pdf", height = 15, width = 6)
plot(p)

#dev.off()

Here, down indicate the pathways that the non-buffered proteins are enrichment. As they are not buffered, those pathways may be more biologically relevant, for example give survival advantages are not buffered.

Whether buffering is related to protein complexes

int_pairs = read_delim("../data/proteins_in_complexes", delim = "\t") %>%
  mutate(symbolA = rowData(protCLL)[match(ProtA, rownames(protCLL)),]$hgnc_symbol,
         symbolB = rowData(protCLL)[match(ProtB, rownames(protCLL)),]$hgnc_symbol) %>%
  filter(!is.na(symbolA),!is.na(symbolB))

bufferTab <- mutate(bufferTab, inComplex = ifelse(symbol %in% c(int_pairs$symbolA, int_pairs$symbolB), TRUE, FALSE))

Plot the buffering scores of proteins in complex and not in complex

ggplot(bufferTab, aes(x=inComplex, y=score)) + geom_boxplot() + geom_point()

t.test(score~inComplex, bufferTab, var.equal= TRUE)


    Two Sample t-test

data:  score by inComplex
t = 0.65033, df = 185, p-value = 0.5163
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.7261018  1.4401983
sample estimates:
mean in group FALSE  mean in group TRUE 
         -0.1794555          -0.5365038

No significant differences can be observed.

Test using buffering group

testTab <- filter(bufferTab, ifBuffer!="undetermined")
table(testTab$ifBuffer, testTab$inComplex)

          
           FALSE TRUE
  Buffered    16   34
  Enhanced     6    4
  noBuffer    39   76

chisq.test(testTab$ifBuffer, testTab$inComplex)


    Pearson's Chi-squared test

data:  testTab$ifBuffer and testTab$inComplex
X-squared = 3.0089, df = 2, p-value = 0.2221

There’s not significant associations between whether the protein is in complex and whether the protein expression is buffered.

Plot buffering effect on genomic coordinates

plotFoldGenome <- function(bufferTab, allBand, allProtTab, chr, region = c(-Inf,Inf),
                           ifTrend = FALSE, maxVal =1, minVal=-1) {
  
  #table for cyto band
  bandTab <- filter(allBand, ChromID == chr, chromStart >= region[1], chromEnd <= region[2]) %>%
    mutate(chromMid = chromMid)
  
  #table for fold change
  protCoordTab <- allProtTab %>% distinct(symbol, start_position, end_position, mid_position)
  foldTab <- bufferTab %>% select(symbol, logFC.prot, logFC.rna, score, ifBuffer) %>%
    gather(key = "set", value = "logFC", -symbol, -score,-ifBuffer) %>%
    left_join(protCoordTab)
  bufferLine <- filter(bufferTab, ifBuffer %in% c("Buffered","Enhanced")) %>%
    left_join(protCoordTab) %>%
    distinct(symbol, mid_position, logFC.prot, logFC.rna, ifBuffer) %>%
    mutate(minY = ifelse(logFC.prot > logFC.rna, logFC.rna, logFC.prot),
           maxY= ifelse(logFC.prot > logFC.rna, logFC.prot, logFC.rna))
  
  xMax <- max(bandTab$chromEnd, na.rm = T)
  
  #main plot for Protein
  gPro <- ggplot() + 
    geom_rect(data=bandTab, mapping=aes(xmin=chromStart, xmax=chromEnd, ymin=minVal, ymax=maxVal, 
                                        fill=Colour, label = band), alpha=0.1) +
    geom_text(data=bandTab, mapping=aes(label=band, x=chromMid), y=maxVal, hjust =1, angle = 90, size=2.5) +
    geom_rect(data = foldTab, 
            mapping=aes(xmin=start_position,
                        xmax=end_position, ymin=logFC, ymax=logFC+0.1,
                        fill = set)) +
    geom_segment(data = bufferLine, aes(x=mid_position, xend = mid_position, 
                                        y=minY + 0.1, yend = maxY, col = ifBuffer),
                 linetype = "dashed") +
    scale_x_continuous(expand=c(0,0),limits = c(0,xMax)) +
    xlab("Genomic position [Mb]") + 
    ylab("Log Fold Change") + 
    scale_fill_manual(values = c(even = "white",odd = "grey50",
                                logFC.rna = "orange", logFC.prot = "darkblue")) +
    scale_color_manual(values = c(logFC.rna = "orange",logFC.prot = "darkblue",
                                  Buffered = "red",Enhanced = "green")) +
    ggtitle(paste0("Log fold change comparison","(",chr,")")) +
    ggrepel::geom_text_repel(data = bufferLine,
                                 aes(x=mid_position, y=logFC.prot, label = symbol, col = ifBuffer)) +
    theme(plot.title = element_text(face = "bold", size = 10, hjust = 0.3),
        legend.position = "none",
        panel.background = element_blank(),
        panel.grid.major = element_line(colour="grey90", size=0.1))
  
    if (ifTrend) {
      gPro <- gPro + stat_smooth(data =foldTab, geom="line",
                mapping = aes(y=logFC, x= mid_position,
                              color = set), 
                formula = y ~ x, method = "loess", se=FALSE, span=0.2,
                size =0.5, alpha=0.5)
    }
    
   
    #for legend
    ## if the patient has CNV data
    lgTab <- tibble(x= seq(6),y=seq(6),
                    Dataset = c(rep("logFC.rna",3), rep("logFC.prot",3)),
                    ifBuffer = c(rep("Buffered",3), rep("Enhanced",3)))
  
    lg <- ggplot(lgTab, aes(x=x,y=y)) +
      geom_point(aes(fill = Dataset, color = ifBuffer), shape =22,size=3) + 
      scale_fill_manual(values = c(logFC.rna = "orange", logFC.prot = "darkblue")) +
      scale_color_manual(values = c(logFC.rna = "orange",logFC.prot = "darkblue",
                                  Buffered = "red",Enhanced = "green")) + 
      theme(legend.position = "bottom")
    
    lg <- get_legend(lg)
    
    return(list(plot = gPro, legend = lg))
}

g <- plotFoldGenome(bufferTab, allBand, allProtTab, "chr12", region = c(-Inf,Inf),
                           ifTrend = TRUE, maxVal =2, minVal=0)
pg <- plot_grid(g$plot, g$legend, ncol = 1, rel_heights = c(1,0.1))
pg

ggsave(filename = "../public/trisomy12_buffer_plot.pdf", plot = pg, device = "pdf", height = 10, width = 18)

PDF version: trisomy12_buffer_plot.pdf

In this plot, the y axis in the log fold change of either protein (blue) or RNA (orange) expression (trisomy12 vs WT). If there’s a “Buffering” effect, the protein and rna is connected by a red dotted line. If there’s an “Enhanced” effect, they will be joined by a green dotted line.

Summary:

The gene dosage effect is visible in both RNA expression and protein expression, as compared to WT samples, the genes on Chr12 show elevated global expression of both RNA and protein in trisomy12 sample. But the scale of difference is less in proteins and the protein expression is less varied than RNA expression. This may be due to the buffering or moderation effect of translation or some other mechanisms that regulate protein abundance.
It seems the non-buffered proteins are enriched in PI3K, mTOR pathway and the buffered proteins are not enriched in any pathways. This may be an evidence that the non-buffered proteins are more functionally important for the survival of trisomy12 CLL cells.
There’s no significant association between the buffering effect and whether the protein is in complex or not.

sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.15.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] forcats_0.4.0               stringr_1.4.0              
 [3] dplyr_0.8.5                 purrr_0.3.3                
 [5] readr_1.3.1                 tidyr_1.0.0                
 [7] tibble_3.0.0                tidyverse_1.3.0            
 [9] cowplot_0.9.4               ggplot2_3.3.0              
[11] ggExtra_0.9                 proDA_1.1.2                
[13] jyluMisc_0.1.5              piano_2.0.2                
[15] DESeq2_1.24.0               SummarizedExperiment_1.14.0
[17] DelayedArray_0.10.0         BiocParallel_1.18.0        
[19] matrixStats_0.54.0          Biobase_2.44.0             
[21] GenomicRanges_1.36.0        GenomeInfoDb_1.20.0        
[23] IRanges_2.18.1              S4Vectors_0.22.0           
[25] BiocGenerics_0.30.0        

loaded via a namespace (and not attached):
  [1] readxl_1.3.1           backports_1.1.4        Hmisc_4.2-0           
  [4] fastmatch_1.1-0        drc_3.0-1              workflowr_1.6.0       
  [7] igraph_1.2.4.1         shinydashboard_0.7.1   splines_3.6.0         
 [10] crosstalk_1.0.0        TH.data_1.0-10         digest_0.6.19         
 [13] htmltools_0.4.0        gdata_2.18.0           magrittr_1.5          
 [16] checkmate_2.0.0        memoise_1.1.0          cluster_2.1.0         
 [19] openxlsx_4.1.0.1       limma_3.40.2           annotate_1.62.0       
 [22] modelr_0.1.5           sandwich_2.5-1         colorspace_1.4-1      
 [25] ggrepel_0.8.1          rvest_0.3.5            blob_1.1.1            
 [28] haven_2.2.0            xfun_0.8               crayon_1.3.4          
 [31] RCurl_1.95-4.12        jsonlite_1.6           genefilter_1.66.0     
 [34] survival_2.44-1.1      zoo_1.8-6              glue_1.3.2            
 [37] survminer_0.4.4        gtable_0.3.0           zlibbioc_1.30.0       
 [40] XVector_0.24.0         car_3.0-3              abind_1.4-5           
 [43] scales_1.1.0           mvtnorm_1.0-11         DBI_1.0.0             
 [46] relations_0.6-8        miniUI_0.1.1.1         Rcpp_1.0.1            
 [49] plotrix_3.7-6          cmprsk_2.2-8           xtable_1.8-4          
 [52] htmlTable_1.13.1       foreign_0.8-71         bit_1.1-14            
 [55] km.ci_0.5-2            Formula_1.2-3          DT_0.7                
 [58] httr_1.4.1             htmlwidgets_1.3        fgsea_1.10.0          
 [61] gplots_3.0.1.1         RColorBrewer_1.1-2     acepack_1.4.1         
 [64] ellipsis_0.2.0         farver_2.0.3           pkgconfig_2.0.2       
 [67] XML_3.98-1.20          dbplyr_1.4.2           nnet_7.3-12           
 [70] locfit_1.5-9.1         labeling_0.3           tidyselect_1.0.0      
 [73] rlang_0.4.5            later_0.8.0            AnnotationDbi_1.46.0  
 [76] munsell_0.5.0          cellranger_1.1.0       tools_3.6.0           
 [79] visNetwork_2.0.7       cli_1.1.0              generics_0.0.2        
 [82] RSQLite_2.1.1          broom_0.5.2            evaluate_0.14         
 [85] yaml_2.2.0             knitr_1.23             bit64_0.9-7           
 [88] fs_1.4.0               zip_2.0.2              survMisc_0.5.5        
 [91] caTools_1.17.1.2       nlme_3.1-140           mime_0.7              
 [94] slam_0.1-45            xml2_1.2.2             compiler_3.6.0        
 [97] rstudioapi_0.10        curl_3.3               ggsignif_0.5.0        
[100] marray_1.62.0          reprex_0.3.0           geneplotter_1.62.0    
[103] stringi_1.4.3          lattice_0.20-38        Matrix_1.2-17         
[106] KMsurv_0.1-5           shinyjs_1.0            vctrs_0.2.4           
[109] pillar_1.4.3           lifecycle_0.2.0        data.table_1.12.2     
[112] bitops_1.0-6           httpuv_1.5.1           extraDistr_1.8.11     
[115] R6_2.4.0               latticeExtra_0.6-28    promises_1.0.1        
[118] KernSmooth_2.23-15     gridExtra_2.3          rio_0.5.16            
[121] codetools_0.2-16       MASS_7.3-51.4          gtools_3.8.1          
[124] exactRankTests_0.8-30  assertthat_0.2.1       rprojroot_1.3-2       
[127] withr_2.1.2            multcomp_1.4-10        GenomeInfoDbData_1.2.1
[130] mgcv_1.8-28            hms_0.5.2              grid_3.6.0            
[133] rpart_4.1-15           rmarkdown_1.13         carData_3.0-2         
[136] ggpubr_0.2.1           git2r_0.26.1           maxstat_0.7-25        
[139] sets_1.0-18            lubridate_1.7.4        shiny_1.3.2           
[142] base64enc_0.1-3

Analysis of gene dosage effect related to trisomy12

Junyan Lu

2020-02-27

Compare gene dosage effect on RNA and protein level

Gene dosage effect on RNA level

Compare expression levels of Chr12 genes in tri12 and wt samples

Raw counts

Expression values centered by mean

Mean expression difference between trisomy12 and wt samples for the genes on chr12 and other chromosomes

Gene dosage effect on protein level

Compare expression levels of Chr12 genes in tri12 and wt samples

Raw counts

Expression values centered by mean

Mean protein expression difference between trisomy12 and wt samples for the genes on chr12 and other chromosomes

Compare the gene dosage effect between RNA and protein data

Variance of chr12 gene expression

Expression value (centered by mean)

Plot the log fold change of RNA and proteins on chr12 (trisomy12 VS WT)

Analysis of the buffering effect

Quantifying buffering effect by comparing magnitude of differential expression

Differential expression in proteomics

Differential expression in RNAseq

Define a buffering score

Here I use two ways to quantify the buffering effect:

Compare fold change in RNA expression and protein expression

Table to show buffering effect

Plot most and least buffered genes

Top 9 most buffered based on buffering score

The 9 least buffered based on buffering score

Plot the 10 genes in the enhanced group, where protein change is more significant than RNA change

Enrichment analysis based on buffering score

Plot buffering effect on genomic coordinates

Summary: