Last updated: 2020-05-29
Checks: 6 1
Knit directory: Proteomics/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish
to commit the R Markdown file and build the HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200227)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/.Rhistory
Ignored: analysis/complexAnalysis_IGHV_cache/
Ignored: analysis/complexAnalysis_trisomy12_alteredPQR_cache/
Ignored: analysis/complexAnalysis_trisomy12_cache/
Ignored: analysis/correlateCLLPD_cache/
Ignored: code/.Rhistory
Ignored: data/.DS_Store
Ignored: output/.DS_Store
Untracked files:
Untracked: analysis/CNVanalysis_trisomy12.Rmd
Untracked: analysis/CNVanalysis_trisomy19.Rmd
Untracked: analysis/analysisSplicing.Rmd
Untracked: analysis/analysisTrisomy19.Rmd
Untracked: analysis/annotateCNV.Rmd
Untracked: analysis/complexAnalysis_IGHV.Rmd
Untracked: analysis/complexAnalysis_trisomy12.Rmd
Untracked: analysis/correlateGenomic_PC12adjusted.Rmd
Untracked: analysis/correlateGenomic_noBlock.Rmd
Untracked: analysis/correlateGenomic_noBlock_MCLL.Rmd
Untracked: analysis/correlateGenomic_noBlock_UCLL.Rmd
Untracked: analysis/default.css
Untracked: analysis/del11q.pdf
Untracked: analysis/del11q_norm.pdf
Untracked: analysis/peptideValidate.Rmd
Untracked: analysis/plotCNV_del11q.pdf
Untracked: analysis/plotExpressionCNV.Rmd
Untracked: analysis/processPeptides_LUMOS.Rmd
Untracked: analysis/style.css
Untracked: analysis/trisomy12.pdf
Untracked: analysis/trisomy12_AFcor.Rmd
Untracked: analysis/trisomy12_norm.pdf
Untracked: code/AlteredPQR.R
Untracked: code/utils.R
Untracked: data/190909_CLL_prot_abund_med_norm.tsv
Untracked: data/190909_CLL_prot_abund_no_norm.tsv
Untracked: data/20190423_Proteom_submitted_samples_bereinigt.xlsx
Untracked: data/20191025_Proteom_submitted_samples_final.xlsx
Untracked: data/LUMOS/
Untracked: data/LUMOS_peptides/
Untracked: data/LUMOS_protAnnotation.csv
Untracked: data/LUMOS_protAnnotation_fix.csv
Untracked: data/SampleAnnotation_cleaned.xlsx
Untracked: data/example_proteomics_data
Untracked: data/facTab_IC50atLeast3New.RData
Untracked: data/gmts/
Untracked: data/mapEnsemble.txt
Untracked: data/mapSymbol.txt
Untracked: data/proteins_in_complexes
Untracked: data/pyprophet_export_aligned.csv
Untracked: data/timsTOF_protAnnotation.csv
Untracked: output/LUMOS_processed.RData
Untracked: output/cnv_plots.zip
Untracked: output/cnv_plots/
Untracked: output/cnv_plots_norm.zip
Untracked: output/dxdCLL.RData
Untracked: output/exprCNV.RData
Untracked: output/pepCLL_lumos.RData
Untracked: output/pepTab_lumos.RData
Untracked: output/plotCNV_allChr11_diff.pdf
Untracked: output/plotCNV_del11q_sum.pdf
Untracked: output/proteomic_LUMOS_20200227.RData
Untracked: output/proteomic_LUMOS_20200320.RData
Untracked: output/proteomic_LUMOS_20200430.RData
Untracked: output/proteomic_timsTOF_20200227.RData
Untracked: output/splicingResults.RData
Untracked: output/timsTOF_processed.RData
Untracked: plotCNV_del11q_diff.pdf
Unstaged changes:
Modified: analysis/_site.yml
Modified: analysis/analysisSF3B1.Rmd
Modified: analysis/compareProteomicsRNAseq.Rmd
Modified: analysis/correlateCLLPD.Rmd
Modified: analysis/correlateGenomic.Rmd
Deleted: analysis/correlateGenomic_removePC.Rmd
Modified: analysis/correlateMIR.Rmd
Modified: analysis/correlateMethylationCluster.Rmd
Modified: analysis/index.Rmd
Modified: analysis/predictOutcome.Rmd
Modified: analysis/processProteomics_LUMOS.Rmd
Modified: analysis/qualityControl_LUMOS.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
There are no past versions. Publish this analysis with wflow_publish()
to start tracking its development.
load("../../var/ddsrna_180717.RData")
load("../../var/patmeta_200522.RData")
load("../../var/proteomic_LUMOS_20200430.RData")
load("../output/exprCNV.RData")
#remove genes never expressed
noExpTab <- group_by(allRnaTab, id) %>% summarise(sumExpr = sum(expr)) %>%
filter(sumExpr == 0)
#mean variance trend
meanVarTab <- group_by(allRnaTab, id) %>%
summarise(meanVal = mean(expr),varVal = var(expr))
plot(meanVarTab$meanVal, meanVarTab$varVal)
#looks fine for exploratory analysis
rnaExprTab <- allRnaTab %>% filter(!id%in% noExpTab$id) %>%
mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12) %>%
filter(!is.na(trisomy12)) %>% mutate(cnv = ifelse(trisomy12 %in% 1, "trisomy12","wt"))
meanExprChr12 <- rnaExprTab %>% filter(ChromID %in% "chr12") %>%
group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
ungroup()
ggplot(meanExprChr12, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)
There is no strong different, this is because the baseline expression values among genes are much larger than the relative expression difference between trisomy12 and wt samples.
meanExprChr12 <- rnaExprTab %>% filter(ChromID %in% "chr12") %>%
group_by(id) %>% mutate(med=mean(expr),sd = sd(expr)) %>% mutate(expr = (expr-med)) %>%
group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
ungroup()
ggplot(meanExprChr12, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)
Now the difference is clearly visible.
plotTab <- rnaExprTab %>%
group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
ungroup() %>%
spread(key = cnv, value = meanExpr) %>%
mutate(diff = trisomy12-wt) %>%
mutate(chr = ifelse(ChromID == "chr12","Chr12","Other"))
ggplot(plotTab, aes(x=diff, fill = chr, y = ..density..)) + geom_histogram(position = "identity", alpha=0.5) +
xlab("Mean expression difference")
It’s also visible that most genes on chr12 tend to have higher expression in trisomy12 samples. While other genes follow normal distribution centered at 0.
protExprTab <- allProtTab %>%
mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12) %>%
filter(!is.na(trisomy12)) %>% mutate(cnv = ifelse(trisomy12 %in% 1, "trisomy12","wt"))
meanExprChr12 <- protExprTab %>% filter(ChromID %in% "chr12") %>%
group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
ungroup()
ggplot(meanExprChr12, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)
Similar as in the RNA expression, the difference is not very visible when raw count is used.
meanExprChr12 <- protExprTab %>% filter(ChromID %in% "chr12") %>%
group_by(id) %>% mutate(med=mean(expr),sd = sd(expr)) %>% mutate(expr = (expr-med)) %>%
group_by(id, symbol, cnv) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
ungroup()
ggplot(meanExprChr12, aes(x=meanExpr, fill = cnv)) + geom_histogram(position = "identity", alpha=0.5)
Now difference is clearly visible.
plotTab <- protExprTab %>%
group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
ungroup() %>%
spread(key = cnv, value = meanExpr) %>%
mutate(diff = trisomy12-wt) %>%
mutate(chr = ifelse(ChromID == "chr12","Chr12","Other"))
ggplot(plotTab, aes(x=diff, fill = chr, y = ..density..)) + geom_histogram(position = "identity", alpha=0.5)
The gene dosage effect is also clearly visible in the proteomic dataset
varRna <- filter(rnaExprTab, ChromID == "chr12") %>%
group_by(id,trisomy12) %>% summarise(sd = sd(expr)) %>%
mutate(set = "rna")
varProt <- filter(protExprTab, ChromID == "chr12") %>%
group_by(id,trisomy12) %>% summarise(sd = sd(expr)) %>%
mutate(set = "protein")
plotTab <- bind_rows(varRna, varProt) %>% ungroup()
ggplot(plotTab, aes(x=sd, fill = set, y = ..density..)) +
geom_histogram(position = "identity", alpha=0.5, bins = 100)+
facet_wrap(~trisomy12) + xlab("Expression variance")
The variance of RNA expression is higher than protein expression, which is an indication of buffering. The trend is the same in samples with or without trisomy12
expRna <- filter(rnaExprTab, ChromID == "chr12") %>%
group_by(id) %>% mutate(meanVal = mean(expr)) %>%
mutate(expr = expr-meanVal) %>%
group_by(id,trisomy12) %>%
summarise(meanExpr = mean(expr)) %>%
mutate(set = "rna")
expProt <- filter(protExprTab, ChromID == "chr12") %>%
group_by(id) %>% mutate(meanVal = mean(expr)) %>%
mutate(expr = expr-meanVal) %>%
group_by(id,trisomy12) %>%
summarise(meanExpr = mean(expr)) %>%
mutate(set = "protein")
plotTab <- bind_rows(expRna, expProt)
ggplot(plotTab, aes(x=meanExpr, fill = set, y = ..density..)) +
geom_histogram(position = "identity", alpha=0.5, bins = 100) +
facet_wrap(~trisomy12) +
xlab("Mean expression")
The RNA expression change is larger than protein expression change.
protDiffTab <- protExprTab %>%
group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
ungroup() %>%
spread(key = cnv, value = meanExpr) %>%
mutate(diffProt = log(trisomy12/wt)) %>%
mutate(chr = ifelse(ChromID == "chr12","Chr12","Other")) %>%
select(id, diffProt, chr)
rnaDiffTab <- rnaExprTab %>%
group_by(id, cnv, ChromID) %>% summarise(meanExpr = mean(expr, na.rm=TRUE)) %>%
ungroup() %>%
spread(key = cnv, value = meanExpr) %>%
mutate(diffRna = log(trisomy12/wt)) %>% select(id, diffRna)
compareTab <- left_join(protDiffTab, rnaDiffTab, by = "id") %>%
filter(!is.na(diffProt),!is.na(diffRna)) %>%
gather(key = dataset, value=diff,-id,-chr) %>%
mutate(group =paste0(chr,"_",dataset)) %>%
filter(chr == "Chr12")
ggplot(compareTab, aes(x=diff, fill = group, y = ..density..)) +
geom_histogram(position = "identity", alpha=0.5, bins = 100) +
geom_vline(xintercept = 0, color = "blue", linetype ="dashed") +
xlab("Expression different (tri12 - wt)")
Similar as the plot above, it can be seen that RNA expression change is larger than the protein expression change. This also indicates a buffering effect. Although it’s not a complete buffering, as the average protein expression difference is still larger than 0.
#subset samples
overSample <- intersect(dds$PatID, colnames(protCLL))
protSub <- protCLL[rowData(protCLL)$chromosome_name %in% "12",overSample]
overGene <- na.omit(intersect(rownames(dds),rowData(protSub)$ensembl_gene_id))
protSub <- protSub[rowData(protSub)$ensembl_gene_id %in% overGene, ]
rownames(protSub) <- rowData(protSub)$ensembl_gene_id
designMat <- data.frame(row.names = colnames(protSub),
IGHV = protSub$IGHV.status,
trisomy12 = protSub$trisomy12)
exprMat <- assays(protSub)[["count"]]
#testing
fit <- proDA(exprMat, design = ~ ., col_data = designMat )
resTab.prot <- test_diff(fit, contrast = "trisomy121") %>%
select(name, pval, diff, t_statistic, adj_pval) %>%
dplyr::rename(id = name, logFC.prot = diff, stat.prot = t_statistic, pval.prot = pval, padj.prot = adj_pval)
#subset samples
overSample <- intersect(dds$PatID, colnames(protCLL))
ddsSub <- dds[overGene,overSample]
ddsSub$IGHV <- protSub[,ddsSub$PatID]$IGHV.status
ddsSub$trisomy12 <- protSub[,ddsSub$PatID]$trisomy12
design(ddsSub) <- ~ IGHV + trisomy12
#testing
deRes <- DESeq(ddsSub)
resTab.rna <- results(deRes, name = "trisomy12_1_vs_0", tidy = TRUE) %>%
select(row, log2FoldChange, stat, pvalue, padj) %>%
dplyr::rename(id = row, logFC.rna = log2FoldChange, stat.rna = stat, pval.rna = pvalue, padj.rna = padj) %>%
mutate(logFC.rna = logFC.rna*log(2)) #change to log fold change, same as proteomic
comTab <- left_join(resTab.prot, resTab.rna, by = "id") %>%
mutate(symbol = rowData(dds[id,])$symbol)
Only chr12 genes that are up-regulated are considered. Otherwise it’s hard to intepret the dosage effect.
bufferTab <- comTab %>% filter(stat.rna > 0, stat.prot>0) %>%
ungroup() %>%
mutate(stat.prot.sqrt = sqrt(stat.prot),
stat.prot.center = stat.prot.sqrt - mean(stat.prot.sqrt)) %>%
mutate(diffStat = stat.rna-stat.prot,
diffFold = logFC.rna -logFC.prot) %>%
mutate(score = -stat.prot.center*stat.rna) %>%
mutate(ifBuffer = case_when(
padj.prot < 0.1 & padj.rna < 0.1 ~ "noBuffer",
padj.prot > 0.1 & padj.rna < 0.1 ~ "Buffered",
padj.prot < 0.1 & padj.rna > 0.1 ~ "Enhanced",
TRUE ~ "undetermined"
)) %>%
arrange(desc(score))
A buffering score, which is based on the difference of log fold change between protein and rna dataset and the t-statistics of the differentially expressed RNAs. The purpose is to give the gene that show significant and strong RNA change, but little protein change a high buffering score. While the genes that do not show strong RNA expression change will have a score close to zero. And the genes that show both strong protein and RNA expression change a more negative score.
A categorical variable, “ifBuffer”, based on the the significance of differential expression. The genes that show both significant protein and RNA up-regulation are in the “noBuffer” group, while the genes that show significant RNA-up-regulation but no significant protein expression change are in the “Buffered” group. The “Enhanced” group contains the genes that do not show significant changes in RNA level but with significant changes in protein level. The buffering score can not differentiate this group and will categorize it as undetermined. But the genes in this group, although pretty rare, may also be potentially interesting. Other genes are in the “undetermined” group.
ggplot(bufferTab, aes(x=ifBuffer,y=score, fill = ifBuffer)) + geom_boxplot() + geom_point()
The buffering score and the categorical variable are related. Perhaps the buffering score can estimate more subtle effect, like the degree of buffering.
table(bufferTab$ifBuffer)
Buffered Enhanced noBuffer undetermined
50 10 115 12
ggplot(bufferTab, aes(x=logFC.rna, y=logFC.prot)) + geom_point(aes(col = ifBuffer)) +
xlab("RNA log fold change") + ylab("Protein log fold change") +
geom_smooth(method = "lm")
select(bufferTab, symbol, ifBuffer, score, padj.prot, padj.rna, logFC.prot,logFC.rna) %>%
mutate_if(is.numeric, formatC, digits=2) %>%
DT::datatable()
Buffered proteins have higher scores and not buffered proteins have lower scores
geneList <- bufferTab$id[1:9]
pList <- lapply(geneList, function(i) {
tabProt <- allProtTab %>% filter(id == i) %>%
select(id, patID, symbol,expr) %>% dplyr::rename(protExpr = expr)
tabRna <- allRnaTab %>% filter(id == i) %>%
select(id, patID, expr) %>% dplyr::rename(rnaExpr = expr)
plotTab <- left_join(tabProt, tabRna, by = c("id","patID")) %>%
filter(!is.na(protExpr), !is.na(rnaExpr)) %>%
mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12)
p <- ggplot(plotTab, aes(x=rnaExpr, y = protExpr)) +
geom_point(aes(col=trisomy12)) + geom_smooth(method="lm") + ggtitle(unique(plotTab$symbol)) +
theme(legend.position = "bottom")
ggMarginal(p, type = "histogram", groupFill = TRUE)
})
cowplot::plot_grid(plotlist = pList, ncol=3)
geneList <- tail(bufferTab$id, n=9)
pList <- lapply(geneList, function(i) {
tabProt <- allProtTab %>% filter(id == i) %>%
select(id, patID, symbol,expr) %>% dplyr::rename(protExpr = expr)
tabRna <- allRnaTab %>% filter(id == i) %>%
select(id, patID, expr) %>% dplyr::rename(rnaExpr = expr)
plotTab <- left_join(tabProt, tabRna, by = c("id","patID")) %>%
filter(!is.na(protExpr), !is.na(rnaExpr)) %>%
mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12)
p <- ggplot(plotTab, aes(x=rnaExpr, y = protExpr)) +
geom_point(aes(col=trisomy12)) + geom_smooth(method="lm") +
ggtitle(unique(plotTab$symbol)) +
theme(legend.position = "bottom")
ggMarginal(p, type = "histogram", groupFill = TRUE)
})
cowplot::plot_grid(plotlist = pList, ncol=3)
geneList <- filter(bufferTab, ifBuffer == "Enhanced")$id
pList <- lapply(geneList, function(i) {
tabProt <- allProtTab %>% filter(id == i) %>%
select(id, patID, symbol,expr) %>% dplyr::rename(protExpr = expr)
tabRna <- allRnaTab %>% filter(id == i) %>%
select(id, patID, expr) %>% dplyr::rename(rnaExpr = expr)
plotTab <- left_join(tabProt, tabRna, by = c("id","patID")) %>%
filter(!is.na(protExpr), !is.na(rnaExpr)) %>%
mutate(trisomy12 = patMeta[match(patID, patMeta$Patient.ID),]$trisomy12)
p <- ggplot(plotTab, aes(x=rnaExpr, y = protExpr)) +
geom_point(aes(col=trisomy12)) + geom_smooth(method="lm") +
ggtitle(unique(plotTab$symbol)) +
theme(legend.position = "bottom")
ggMarginal(p, type = "histogram", groupFill = TRUE)
})
cowplot::plot_grid(plotlist = pList, ncol=3)
inputTab <- bufferTab %>% select(symbol, score) %>%
arrange(abs(score)) %>% distinct(symbol, .keep_all = T) %>%
data.frame() %>% column_to_rownames("symbol")
gmts = list(H= "../data/gmts/h.all.v6.2.symbols.gmt",
KEGG = "../data/gmts/c2.cp.kegg.v6.2.symbols.gmt")
enRes <- list()
enRes[["HALLMARK"]] <- jyluMisc::runGSEA(inputTab, gmts$H, "page")
enRes[["KEGG"]] <- jyluMisc::runGSEA(inputTab, gmts$KEGG, "page")
p <- jyluMisc::plotEnrichmentBar(enRes, pCut =0.05, ifFDR= FALSE)
#pdf("tri12Enrich.pdf", height = 15, width = 6)
plot(p)
#dev.off()
Here, down indicate the pathways that the non-buffered proteins are enrichment. As they are not buffered, those pathways may be more biologically relevant, for example give survival advantages are not buffered.
plotFoldGenome <- function(bufferTab, allBand, allProtTab, chr, region = c(-Inf,Inf),
ifTrend = FALSE, maxVal =1, minVal=-1) {
#table for cyto band
bandTab <- filter(allBand, ChromID == chr, chromStart >= region[1], chromEnd <= region[2]) %>%
mutate(chromMid = chromMid)
#table for fold change
protCoordTab <- allProtTab %>% distinct(symbol, start_position, end_position, mid_position)
foldTab <- bufferTab %>% select(symbol, logFC.prot, logFC.rna, score, ifBuffer) %>%
gather(key = "set", value = "logFC", -symbol, -score,-ifBuffer) %>%
left_join(protCoordTab)
bufferLine <- filter(bufferTab, ifBuffer %in% c("Buffered","Enhanced")) %>%
left_join(protCoordTab) %>%
distinct(symbol, mid_position, logFC.prot, logFC.rna, ifBuffer) %>%
mutate(minY = ifelse(logFC.prot > logFC.rna, logFC.rna, logFC.prot),
maxY= ifelse(logFC.prot > logFC.rna, logFC.prot, logFC.rna))
xMax <- max(bandTab$chromEnd, na.rm = T)
#main plot for Protein
gPro <- ggplot() +
geom_rect(data=bandTab, mapping=aes(xmin=chromStart, xmax=chromEnd, ymin=minVal, ymax=maxVal,
fill=Colour, label = band), alpha=0.1) +
geom_text(data=bandTab, mapping=aes(label=band, x=chromMid), y=maxVal, hjust =1, angle = 90, size=2.5) +
geom_rect(data = foldTab,
mapping=aes(xmin=start_position,
xmax=end_position, ymin=logFC, ymax=logFC+0.1,
fill = set)) +
geom_segment(data = bufferLine, aes(x=mid_position, xend = mid_position,
y=minY + 0.1, yend = maxY, col = ifBuffer),
linetype = "dashed") +
scale_x_continuous(expand=c(0,0),limits = c(0,xMax)) +
xlab("Genomic position [Mb]") +
ylab("Log Fold Change") +
scale_fill_manual(values = c(even = "white",odd = "grey50",
logFC.rna = "orange", logFC.prot = "darkblue")) +
scale_color_manual(values = c(logFC.rna = "orange",logFC.prot = "darkblue",
Buffered = "red",Enhanced = "green")) +
ggtitle(paste0("Log fold change comparison","(",chr,")")) +
ggrepel::geom_text_repel(data = bufferLine,
aes(x=mid_position, y=logFC.prot, label = symbol, col = ifBuffer)) +
theme(plot.title = element_text(face = "bold", size = 10, hjust = 0.3),
legend.position = "none",
panel.background = element_blank(),
panel.grid.major = element_line(colour="grey90", size=0.1))
if (ifTrend) {
gPro <- gPro + stat_smooth(data =foldTab, geom="line",
mapping = aes(y=logFC, x= mid_position,
color = set),
formula = y ~ x, method = "loess", se=FALSE, span=0.2,
size =0.5, alpha=0.5)
}
#for legend
## if the patient has CNV data
lgTab <- tibble(x= seq(6),y=seq(6),
Dataset = c(rep("logFC.rna",3), rep("logFC.prot",3)),
ifBuffer = c(rep("Buffered",3), rep("Enhanced",3)))
lg <- ggplot(lgTab, aes(x=x,y=y)) +
geom_point(aes(fill = Dataset, color = ifBuffer), shape =22,size=3) +
scale_fill_manual(values = c(logFC.rna = "orange", logFC.prot = "darkblue")) +
scale_color_manual(values = c(logFC.rna = "orange",logFC.prot = "darkblue",
Buffered = "red",Enhanced = "green")) +
theme(legend.position = "bottom")
lg <- get_legend(lg)
return(list(plot = gPro, legend = lg))
}
g <- plotFoldGenome(bufferTab, allBand, allProtTab, "chr12", region = c(-Inf,Inf),
ifTrend = TRUE, maxVal =2, minVal=0)
pg <- plot_grid(g$plot, g$legend, ncol = 1, rel_heights = c(1,0.1))
pg
ggsave(filename = "../public/trisomy12_buffer_plot.pdf", plot = pg, device = "pdf", height = 10, width = 18)
PDF version: trisomy12_buffer_plot.pdf
In this plot, the y axis in the log fold change of either protein (blue) or RNA (orange) expression (trisomy12 vs WT). If there’s a “Buffering” effect, the protein and rna is connected by a red dotted line. If there’s an “Enhanced” effect, they will be joined by a green dotted line.
The gene dosage effect is visible in both RNA expression and protein expression, as compared to WT samples, the genes on Chr12 show elevated global expression of both RNA and protein in trisomy12 sample. But the scale of difference is less in proteins and the protein expression is less varied than RNA expression. This may be due to the buffering or moderation effect of translation or some other mechanisms that regulate protein abundance.
It seems the non-buffered proteins are enriched in PI3K, mTOR pathway and the buffered proteins are not enriched in any pathways. This may be an evidence that the non-buffered proteins are more functionally important for the survival of trisomy12 CLL cells.
There’s no significant association between the buffering effect and whether the protein is in complex or not.
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.15.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] forcats_0.4.0 stringr_1.4.0
[3] dplyr_0.8.5 purrr_0.3.3
[5] readr_1.3.1 tidyr_1.0.0
[7] tibble_3.0.0 tidyverse_1.3.0
[9] cowplot_0.9.4 ggplot2_3.3.0
[11] ggExtra_0.9 proDA_1.1.2
[13] jyluMisc_0.1.5 piano_2.0.2
[15] DESeq2_1.24.0 SummarizedExperiment_1.14.0
[17] DelayedArray_0.10.0 BiocParallel_1.18.0
[19] matrixStats_0.54.0 Biobase_2.44.0
[21] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0
[23] IRanges_2.18.1 S4Vectors_0.22.0
[25] BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.1.4 Hmisc_4.2-0
[4] fastmatch_1.1-0 drc_3.0-1 workflowr_1.6.0
[7] igraph_1.2.4.1 shinydashboard_0.7.1 splines_3.6.0
[10] crosstalk_1.0.0 TH.data_1.0-10 digest_0.6.19
[13] htmltools_0.4.0 gdata_2.18.0 magrittr_1.5
[16] checkmate_2.0.0 memoise_1.1.0 cluster_2.1.0
[19] openxlsx_4.1.0.1 limma_3.40.2 annotate_1.62.0
[22] modelr_0.1.5 sandwich_2.5-1 colorspace_1.4-1
[25] ggrepel_0.8.1 rvest_0.3.5 blob_1.1.1
[28] haven_2.2.0 xfun_0.8 crayon_1.3.4
[31] RCurl_1.95-4.12 jsonlite_1.6 genefilter_1.66.0
[34] survival_2.44-1.1 zoo_1.8-6 glue_1.3.2
[37] survminer_0.4.4 gtable_0.3.0 zlibbioc_1.30.0
[40] XVector_0.24.0 car_3.0-3 abind_1.4-5
[43] scales_1.1.0 mvtnorm_1.0-11 DBI_1.0.0
[46] relations_0.6-8 miniUI_0.1.1.1 Rcpp_1.0.1
[49] plotrix_3.7-6 cmprsk_2.2-8 xtable_1.8-4
[52] htmlTable_1.13.1 foreign_0.8-71 bit_1.1-14
[55] km.ci_0.5-2 Formula_1.2-3 DT_0.7
[58] httr_1.4.1 htmlwidgets_1.3 fgsea_1.10.0
[61] gplots_3.0.1.1 RColorBrewer_1.1-2 acepack_1.4.1
[64] ellipsis_0.2.0 farver_2.0.3 pkgconfig_2.0.2
[67] XML_3.98-1.20 dbplyr_1.4.2 nnet_7.3-12
[70] locfit_1.5-9.1 labeling_0.3 tidyselect_1.0.0
[73] rlang_0.4.5 later_0.8.0 AnnotationDbi_1.46.0
[76] munsell_0.5.0 cellranger_1.1.0 tools_3.6.0
[79] visNetwork_2.0.7 cli_1.1.0 generics_0.0.2
[82] RSQLite_2.1.1 broom_0.5.2 evaluate_0.14
[85] yaml_2.2.0 knitr_1.23 bit64_0.9-7
[88] fs_1.4.0 zip_2.0.2 survMisc_0.5.5
[91] caTools_1.17.1.2 nlme_3.1-140 mime_0.7
[94] slam_0.1-45 xml2_1.2.2 compiler_3.6.0
[97] rstudioapi_0.10 curl_3.3 ggsignif_0.5.0
[100] marray_1.62.0 reprex_0.3.0 geneplotter_1.62.0
[103] stringi_1.4.3 lattice_0.20-38 Matrix_1.2-17
[106] KMsurv_0.1-5 shinyjs_1.0 vctrs_0.2.4
[109] pillar_1.4.3 lifecycle_0.2.0 data.table_1.12.2
[112] bitops_1.0-6 httpuv_1.5.1 extraDistr_1.8.11
[115] R6_2.4.0 latticeExtra_0.6-28 promises_1.0.1
[118] KernSmooth_2.23-15 gridExtra_2.3 rio_0.5.16
[121] codetools_0.2-16 MASS_7.3-51.4 gtools_3.8.1
[124] exactRankTests_0.8-30 assertthat_0.2.1 rprojroot_1.3-2
[127] withr_2.1.2 multcomp_1.4-10 GenomeInfoDbData_1.2.1
[130] mgcv_1.8-28 hms_0.5.2 grid_3.6.0
[133] rpart_4.1-15 rmarkdown_1.13 carData_3.0-2
[136] ggpubr_0.2.1 git2r_0.26.1 maxstat_0.7-25
[139] sets_1.0-18 lubridate_1.7.4 shiny_1.3.2
[142] base64enc_0.1-3