Last updated: 2023-02-17

Checks: 5 1

Knit directory: combiDLBCL/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20220425)

The command set.seed(20220425) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: no version control

Tracking code development and connecting the code version to the results is critical for reproducibility. To start using Git, open the Terminal and type git init in your project directory.

This project is not being versioned with Git. To obtain the full reproducibility benefits of using workflowr, please see ?wflow_start.

Load libraries and datasets

Data preprocessing

Remove duplicated samples

screenData <- filter(screenData, !ifRemove) %>%
  mutate(Name = cellLine)

Use edge effect corrected values

screenData <- mutate(screenData, normVal = normVal.cor)

Get combi-drug

singleViab <- filter(screenData, Drug_B != "DMSO", Drug_A=="DMSO") %>%
  mutate(Drug = Drug_B, Conc = Drug_B.Conc, ConcStep = Drug_B.ConcStep) %>%
  select(Name, Drug, Conc, ConcStep, normVal)

Get base-drug

baseViab <- filter(screenData, Drug_A != "DMSO", Drug_B == "DMSO") %>%
  mutate(Drug = Drug_A, Conc = Drug_A.Conc, ConcStep = Drug_A.ConcStep) %>%
  select(Name, Drug, Conc, ConcStep, normVal)

Combine and summarise

viabTab.conc <- bind_rows(singleViab, baseViab) #individual concentration
viabTab <- group_by(viabTab.conc, Name, Drug) %>%
  summarise(viab = calcAUC(normVal, Conc)) %>%
  ungroup()

`summarise()` has grouped output by 'Name'. You can override using the
`.groups` argument.

Genomic data

load("../data/SVs_filtered.RData")

Summarise mutations (cell lines used in drug screen): count as gene mutation if there is at least one mutation within gene

mutTab <- group_by(svTab, Name, Gene) %>% summarise(n = length(Name)) %>%
  arrange(desc(n)) %>%
  filter(Name %in% viabTab$Name)

`summarise()` has grouped output by 'Name'. You can override using the
`.groups` argument.

#Get mutations occured at least in three cell lines
geneCount <- group_by(mutTab, Gene) %>% summarise(n=length(Name)) %>%
  filter(n>=3) %>% arrange(desc(n)) %>%
  mutate(Gene = factor(Gene, levels = Gene))

There are too many mutations. Manual curation maybe needed.

Occurrence of mutations among cell lines, only mutations occurred at least 3 times are considered

ggplot(geneCount, aes(x=Gene, y=n)) +
  geom_bar(stat = "identity") +
  theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Summarise plot of genomics

Only use mutations that occurred at least five time in all the cell lines

mutTabSub <- filter(mutTab, Gene %in% geneCount$Gene) %>%
  mutate(status =1) %>% select(Name, Gene, status) %>%
  pivot_wider(names_from = "Gene", values_from = "status") %>%
  mutate_all(replace_na,0) %>%
  pivot_longer(-Name, names_to = "Gene", values_to = "status")

`mutate_all()` ignored the following grouping variables:
• Column `Name`
ℹ Use `mutate_at(df, vars(-group_cols()), myoperation)` to silence the message.

geneMat <- mutTabSub %>% pivot_wider(names_from = "Gene", values_from = "status") %>%
  column_to_rownames("Name") %>% as.matrix()

geneMat <- geneMat[,order(colSums(geneMat))]

Prepare plot

sortTab <- function(sumTab) {
  i <- ncol(sumTab)
  #print(i)
  if (i == 1) {
    return(rownames(sumTab)[order(sumTab[,i])])
  }
  allLevel <- sort(unique(sumTab[,i]))
  orderRow <- lapply(allLevel, function(n) {
    sortTab(sumTab[sumTab[,i] %in% n, seq(1,i-1), drop = FALSE])  
  }) %>% unlist() %>% c()
  return(orderRow)
}

sortedPat <- rev(sortTab(geneMat))

plotTab <- mutTabSub %>% mutate(Name = factor(Name, levels = sortedPat),
                                Gene = factor(Gene, levels = colnames(geneMat)))
ggplot(plotTab, aes(x=Name, y=Gene)) +
  geom_tile(aes(fill = factor(status)), col = "grey50") +
  scale_fill_manual(values =  c(`1`="black",`0`="white"), name = "mutation") +
  theme(axis.text.x = element_text(angle = 90, hjust=1, vjust=0.5))

Exploratory data analysis

Hierarchical clustering visualization

library(pheatmap)

viabMat <- viabTab %>% pivot_wider(names_from = Name, values_from = viab) %>%
  column_to_rownames("Drug") %>% as.matrix()

viabMat <- viabMat[,complete.cases(t(viabMat))]
atpCount <- filter(screenData, Drug_A  == "DMSO", Drug_B  == "DMSO", !ifEdge) %>%
  group_by(Name) %>% summarise(atp = median(value, na.rm=TRUE)) %>%
  mutate(logATP = log2(atp))

seleGenes <- c("TP53","EZH2","MYC","FOXO1","EP300","CACNA1E","BCL2")

colAnno <- tibble(Name = colnames(viabMat)) %>%
  left_join(mutTabSub,by="Name") %>% filter(Gene %in% seleGenes) %>%
  pivot_wider(names_from = "Gene", values_from = "status") %>%
  mutate(baseATP = atpCount[match(Name, atpCount$Name),]$logATP) %>%
  data.frame() %>% column_to_rownames("Name")

pheatmap(viabMat, scale = "row", clustering_method = "ward.D2", annotation_col = colAnno)

Drug-Drug correlation heatmap

pheatmap(cor(t(viabMat)))

Cell line correlation heatmap

pheatmap(cor(viabMat))

PCA

Calculate PCA

pcRes <- prcomp(t(viabMat), scale. = FALSE, center = TRUE)
pcTab <- pcRes$x[,1:10] %>% as_tibble(rownames = "Name")
varExp <- (pcRes$sdev^2)/sum(pcRes$sdev^2)

Plot PC1 and PC2

ggplot(pcTab, aes(x=PC1, y=PC2, label = Name, col=Name)) +
  xlab(sprintf("PC1 (%1.1f%%)", 100*varExp[1])) + ylab(sprintf("PC2 (%1.1f%%)", 100*varExp[2])) +
  geom_point() +
  ggrepel::geom_text_repel() +
  theme(legend.position = "none")

Correlate top10 PCs with baseline ATP count

testTab <- pcTab %>% pivot_longer(-Name, names_to = "PC", values_to = "value") %>%
  left_join(atpCount, by = "Name")

resTab <- group_by(testTab, PC) %>% nest() %>%
  mutate(m=map(data, ~cor.test(~value+logATP,.))) %>%
  mutate(res=map(m, broom::tidy)) %>% unnest(res) %>%
  arrange(p.value) %>% 
  select(PC, estimate, p.value)
head(resTab)

# A tibble: 6 × 3
# Groups:   PC [6]
  PC    estimate p.value
  <chr>    <dbl>   <dbl>
1 PC6     0.572  0.00436
2 PC4    -0.521  0.0108 
3 PC10   -0.220  0.314  
4 PC5    -0.195  0.373  
5 PC1     0.110  0.619  
6 PC8     0.0817 0.711

Correlate top10 PCs with genomic background

testTab <- pcTab %>% pivot_longer(-Name, names_to = "PC", values_to = "value") %>%
  full_join(mutTabSub, by = "Name") %>%
  filter(!is.na(status))

resTab <- group_by(testTab, PC, Gene) %>% nest() %>%
  mutate(m=map(data, ~t.test(value~status,.,var.equal=TRUE))) %>%
  mutate(res=map(m, broom::tidy)) %>% unnest(res) %>%
  arrange(p.value) %>% 
  select(PC, estimate, p.value) %>%
  ungroup() %>%
  mutate(p.adj = p.adjust(p.value, method = "BH"))

Adding missing grouping variables: `Gene`

head(resTab)

# A tibble: 6 × 5
  Gene  PC    estimate p.value p.adj
  <chr> <chr>    <dbl>   <dbl> <dbl>
1 EP300 PC1      0.699 0.00513 0.450
2 PRDM1 PC4      0.412 0.0158  0.450
3 KMT2D PC3      0.331 0.0177  0.450
4 GNA13 PC1      0.598 0.0198  0.450
5 IGLL5 PC5      0.280 0.0224  0.450
6 GNA13 PC7      0.222 0.0245  0.450

P-value histogram

hist(resTab$p.value)

plotTab <- mutate(pcTab, EP300 = factor(colAnno[Name,]$EP300))
ggplot(plotTab, aes(x=PC1, y=PC2, label = Name, col=EP300)) +
  xlab(sprintf("PC1 (%1.1f%%)", 100*varExp[1])) + ylab(sprintf("PC2 (%1.1f%%)", 100*varExp[2])) +
  geom_point() +
  #scale_color_gradient(low = "blue",high = "red") +
  ggrepel::geom_text_repel() +
  theme_bw()

Association between drug responses and genomics

Perform t-test

testTab <- full_join(viabTab, mutTabSub, by = "Name") %>%
  filter(!is.na(status))
resTab <- group_by(testTab, Drug, Gene) %>% nest() %>%
  mutate(m = map(data, ~t.test(viab ~ status, ., var.equal=TRUE))) %>%
  mutate(res = map(m, broom::tidy)) %>%
  unnest(res) %>%
  select(Drug, Gene, p.value) %>%
  arrange(p.value) %>%
  ungroup() %>%
  mutate(p.adj = p.adjust(p.value, method="BH"))

P-value histogram

hist(resTab$p.value)

All results with P<0.01

resTab.sig <- filter(resTab, p.value < 0.01)
resTab.sig %>% mutate_if(is.numeric, formatC, digits=1) %>% DT::datatable()

No one passed 10% FDR, probably too many test

Boxplot of significant pairs (0.01)

pList <- lapply(seq(nrow(resTab.sig)), function(i) {
  rec <- resTab.sig[i,]
  plotTab <- filter(testTab, Drug == rec$Drug, Gene == rec$Gene) %>%
    mutate(status = ifelse(status ==1, "Mut","WT"))
  ggplot(plotTab, aes(x=status, y=viab)) +
    geom_boxplot(outlier.shape = NA) +
    ggbeeswarm::geom_quasirandom(aes(col = status)) +
    theme_bw() +
    theme(legend.position = "none") +
    ggtitle(sprintf("%s ~ %s (p=%s)", rec$Drug, rec$Gene, formatC(rec$p.value, digits=2, format="e")))
})
cowplot::plot_grid(plotlist=pList,ncol=3)

Consensus clustering analysis

Run clustering with ConcsensusClustterPlus

library(ConsensusClusterPlus)
#consensus clustering
#Center each feature by median
d <- sweep(viabMat,1, apply(viabMat,1, median, na.rm=T))

resConsClust <- ConsensusClusterPlus(viabMat, maxK=10, reps=1000 , pItem=0.8, pFeature=0.8, title = "AAscreen_conc", 
                            clusterAlg="hc",distance="pearson",
                            seed=2022, plot="png")

end fraction

clustered
clustered
clustered
clustered
clustered
clustered
clustered
clustered
clustered

#plot clustering result
icl = calcICL(resConsClust,title="AAscreen_conc",plot="png")

#save results for later use
#save(resConsClust, file = "../output/resConsClust.RData")

Based on delta curve, three clusters would be most appropriate

Post-processing consensus clustering results

Select samples with clustering consensus over 80%

k=3
conMat <- resConsClust[[k]]$consensusMatrix
conClust <- resConsClust[[k]]$consensusClass
colnames(conMat) <- colnames(viabMat)

Visualization

geneAnno <- mutTabSub %>% filter(Gene %in% seleGenes) %>%
  pivot_wider(names_from = "Gene", values_from = "status")
colAnno <- tibble(Name = colnames(viabMat),
                  Cluster = factor(conClust)) %>%
  left_join(geneAnno, by ="Name") %>%
  mutate(baseATP = atpCount[match(Name, atpCount$Name),]$logATP) %>%
  data.frame() %>% column_to_rownames("Name")

pheatmap(conMat, annotation_col = colAnno, method = "complete", clustering_distance_rows = "correlation", clustering_distance_cols = "correlation")

Visualize clusters in PCA

plotTab <- pcTab %>% mutate(cluster = conClust[Name])
ggplot(plotTab, aes(x=PC1, y=PC2, label = Name, col=factor(cluster))) +
  xlab(sprintf("PC1 (%1.1f%%)", 100*varExp[1])) + ylab(sprintf("PC2 (%1.1f%%)", 100*varExp[2])) +
  geom_point() +
  ggrepel::geom_text_repel()+
  theme_bw()

Identify signature drugs for the cluster

Only focus on cluster 1, 2 and 4, which have more then 3 samples

clusterTab <- tibble(Name = names(conClust), Cluster = conClust) %>%
  filter(Cluster %in% c(1,2)) %>%
  mutate(Cluster = paste0("C",Cluster))

ANOVA test

testTab <- viabTab %>% left_join(clusterTab, by = "Name") %>%
  filter(!is.na(Cluster)) %>% mutate(Cluster = factor(Cluster))

aovRes <- testTab %>% group_by(Drug) %>% nest() %>%
  mutate(m = map(data, ~lm(viab~Cluster,.))) %>%
  mutate(aov = map(m, car::Anova)) %>%
  mutate(res = map(aov, broom::tidy)) %>%
  unnest(res) %>%
  filter(term == "Cluster") %>%
  select(Drug, p.value) %>% arrange(p.value)
aovRes.sig <- filter(aovRes, p.value < 0.05)

Boxplot of significant associations

pList <- lapply(seq(nrow(aovRes.sig)), function(i) {
  rec <- aovRes.sig[i,]
  plotTab <- filter(testTab, Drug == rec$Drug) %>%
    mutate(TP53 = factor(colAnno[Name,]$TP53)) %>%
    filter(!is.na(TP53))
  ggplot(plotTab, aes(x=Cluster, y=viab)) +
    geom_boxplot(outlier.shape = NA) + 
    ggbeeswarm::geom_quasirandom(aes(col = Cluster, shape = TP53),size=3) +
    theme_bw() + 
    ggtitle(sprintf("%s (p=%s)", rec$Drug, formatC(rec$p.value, digits = 2)))
})

cowplot::plot_grid(plotlist= pList, ncol=2)

Basically C1 is the CHP resistant cluster and C2 is the CHP sensitive cluster. The sensitively maybe related to TP53 mutations.

Association with proteomics/metabolomics

Proteomics

Data distribution

library(SummarizedExperiment)

Loading required package: MatrixGenerics

Loading required package: matrixStats


Attaching package: 'matrixStats'

The following object is masked from 'package:dplyr':

    count


Attaching package: 'MatrixGenerics'

The following objects are masked from 'package:matrixStats':

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges

Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: 'BiocGenerics'

The following objects are masked from 'package:dplyr':

    combine, intersect, setdiff, union

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors


Attaching package: 'S4Vectors'

The following objects are masked from 'package:dplyr':

    first, rename

The following object is masked from 'package:tidyr':

    expand

The following objects are masked from 'package:base':

    expand.grid, I, unname

Loading required package: IRanges


Attaching package: 'IRanges'

The following objects are masked from 'package:dplyr':

    collapse, desc, slice

The following object is masked from 'package:purrr':

    reduce

Loading required package: GenomeInfoDb

Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: 'Biobase'

The following object is masked from 'package:MatrixGenerics':

    rowMedians

The following objects are masked from 'package:matrixStats':

    anyMissing, rowMedians

protData <- readRDS("../data/SC005_SummarizedExperiment_proteomics.RDS")

#select baseline samples
protData <- protData[,protData$condition %in% "U"]
protMat <- assay(protData)

boxplot(protMat)

Median normalization (not performed)

#protMatNorm <- PhosR::medianScaling(protMat, scale = FALSE)
protMatNorm <- protMat
boxplot(protMatNorm)

protNorm <- protData
assay(protNorm) <- protMatNorm

Average technical replicates for each cell line

protTab <- assay(protNorm) %>% as_tibble(rownames = "uniprotID") %>%
  pivot_longer(-uniprotID) %>%
  mutate(cellLine = colData(protNorm)[name,]$cell.line) %>%
  group_by(uniprotID, cellLine) %>%
  summarise(count = mean(value, na.rm=TRUE)) %>%
  mutate(symbol = rowData(protNorm)[uniprotID,]$Gene_name,
         cluster = clusterTab[match(cellLine, clusterTab$Name),]$Cluster,
         doxSense = colData(protNorm)[match(cellLine, protNorm$cell.line),]$Doxo.response) %>%
  filter(cellLine %in% clusterTab$Name, 
         !symbol %in% c("",NA), !is.na(cluster)) %>%
  ungroup()

`summarise()` has grouped output by 'uniprotID'. You can override using the
`.groups` argument.

protSub <- jyluMisc::tidyToSum(protTab, rowID = "uniprotID",colID = "cellLine", 
                     values = "count", annoRow = "symbol", 
                     annoCol = c("cluster", "doxSense"))
protSub$TP53 <- factor(colAnno[colnames(protSub),]$TP53)

Identify proteins differentially expressed between C1 and C2

protSub <- protSub[,protSub$cluster %in% c("C1","C2")]
table(protSub$cluster)


C1 C2 
 8  3

Differential protein expression using proDA

library(proDA)
protMat <- assay(protSub)
fit <- proDA(protMat, design = ~ cluster,
             col_data = colData(protSub))

resTab <- test_diff(fit, contrast = "clusterC2") %>%
  arrange(pval) %>%
  mutate(symbol = rowData(protSub[name,])$symbol)

hist(resTab$pval)

Not strong difference

Proteins with p-value < 0.05

resTab.sig <- filter(resTab, pval < 0.05)
resTab.sig %>% select(symbol, pval, adj_pval, diff) %>%
  mutate_if(is.numeric, formatC, digits=1) %>%
  DT::datatable()

Plot top 9 examples

pList <- lapply(seq(9), function(i) {
  rec <- resTab.sig[i,]
  plotTab <- tibble(expr = protMat[rec$name,],
                    cluster = protSub$cluster,
                    TP53 = protSub$TP53) 
  ggplot(plotTab, aes(x=cluster, y=expr)) +
    geom_boxplot(outlier.shape = NA) +
    ggbeeswarm::geom_quasirandom(aes(color = cluster, shape = TP53), size=3) +
    ggtitle(rec$symbol) +
    theme_bw()
})

cowplot::plot_grid(plotlist = pList,ncol=3)

Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).

Warning: Removed 4 rows containing missing values (`position_quasirandom()`).

Warning: Removed 3 rows containing non-finite values (`stat_boxplot()`).

Warning: Removed 3 rows containing missing values (`position_quasirandom()`).

Warning: Removed 3 rows containing non-finite values (`stat_boxplot()`).

Warning: Removed 3 rows containing missing values (`position_quasirandom()`).

Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).

Warning: Removed 1 rows containing missing values (`position_quasirandom()`).

Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

Warning: Removed 2 rows containing missing values (`position_quasirandom()`).

Enrichment analysis

gmts = list(H= "../data/gmts/h.all.v6.2.symbols.gmt",
            KEGG = "../data/gmts/c2.cp.kegg.v6.2.symbols.gmt",
            C6 = "../data/gmts/c6.all.v6.2.symbols.gmt")
inputTab <- resTab %>% filter(pval < 0.1) %>%
  distinct(symbol, .keep_all = TRUE) %>%
  select(symbol, t_statistic) %>% data.frame() %>% column_to_rownames("symbol")
enRes <- list()
enRes[["Hallmark"]] <- runGSEA(inputTab, gmts$H, "page")

Loading required package: piano

enRes[["KEGG"]] <- runGSEA(inputTab, gmts$KEGG,"page")
enRes[["Perturbation"]] <- runGSEA(inputTab, gmts$C6,"page")
p <- plotEnrichmentBar(enRes, pCut =0.01, ifFDR= FALSE)
cowplot::plot_grid(p)

Focus on proteins from Fatty acid metabolism pathway

geneList <- piano::loadGSC(gmts$H)$gsc$HALLMARK_FATTY_ACID_METABOLISM
plotGene <- filter(filter(resTab, pval <= 0.1), symbol%in% geneList )
pList <- lapply(seq(nrow(plotGene)), function(i) {
  rec <- plotGene[i,]
  plotTab <- tibble(expr = protMat[rec$name,],
                    cluster = protSub$cluster,
                    TP53 = protSub$TP53) 
  ggplot(plotTab, aes(x=cluster, y=expr)) +
    geom_boxplot(outlier.shape = NA) +
    ggbeeswarm::geom_quasirandom(aes(color = cluster, shape = TP53), size=3) +
    ggtitle(rec$symbol) +
    theme_bw()
})

cowplot::plot_grid(plotlist = pList,ncol=3)

Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

Warning: Removed 2 rows containing missing values (`position_quasirandom()`).

Proteomics (dataset from Tobias)

Data distribution

library(SummarizedExperiment)
load("../data/ProtWide.RData")
protMat <- ProtWide

Median normalization (not performed)

#protMatNorm <- PhosR::medianScaling(protMat, scale = FALSE)
protMatNorm <- protMat
boxplot(protMatNorm)

#protNorm <- protData
#assay(protNorm) <- protMatNorm

Create assay experiment object

protTab <- protMatNorm %>% as_tibble(rownames = "uniprotID") %>%
  pivot_longer(-uniprotID, names_to = "cellLine", values_to = "count") %>%
  mutate(cluster = clusterTab[match(cellLine, clusterTab$Name),]$Cluster,
         symbol = uniprotID) %>%
  filter(cellLine %in% clusterTab$Name, 
         !symbol %in% c("",NA), !is.na(cluster))

protSub <- jyluMisc::tidyToSum(protTab, rowID = "uniprotID",colID = "cellLine", 
                     values = "count", annoRow = "symbol", annoCol = "cluster")
protSub$TP53 <- factor(colAnno[colnames(protSub),]$TP53)

Identify proteins differentially expressed between C1 and C2

protSub <- protSub[,protSub$cluster %in% c("C1","C2")]
table(protSub$cluster)


C1 C2 
17  5

Differential protein expression using proDA

library(proDA)
protMat <- assay(protSub)
fit <- proDA(protMat, design = ~ cluster,
             col_data = colData(protSub))

resTab <- test_diff(fit, contrast = "clusterC2") %>%
  arrange(pval) %>%
  mutate(symbol = rowData(protSub[name,])$symbol)

hist(resTab$pval)

Stronger associations can be observed

Proteins with p-value < 0.05

resTab.sig <- filter(resTab, pval < 0.05)
resTab.sig %>% select(symbol, pval, adj_pval, diff) %>%
  mutate_if(is.numeric, formatC, digits=1) %>%
  DT::datatable()

Plot top 9 examples

pList <- lapply(seq(9), function(i) {
  rec <- resTab.sig[i,]
  plotTab <- tibble(expr = protMat[rec$name,],
                    cluster = protSub$cluster,
                    TP53 = protSub$TP53) 
  ggplot(plotTab, aes(x=cluster, y=expr)) +
    geom_boxplot(outlier.shape = NA) +
    ggbeeswarm::geom_quasirandom(aes(color = cluster, shape = TP53), size=3) +
    ggtitle(rec$symbol) +
    theme_bw()
})

cowplot::plot_grid(plotlist = pList,ncol=3)

Enrichment analysis

gmts = list(H= "../data/gmts/h.all.v6.2.symbols.gmt",
            KEGG = "../data/gmts/c2.cp.kegg.v6.2.symbols.gmt",
            C6 = "../data/gmts/c6.all.v6.2.symbols.gmt")
inputTab <- resTab %>% filter(pval < 0.1) %>%
  distinct(symbol, .keep_all = TRUE) %>%
  select(symbol, t_statistic) %>% data.frame() %>% column_to_rownames("symbol")
enRes <- list()
enRes[["Hallmark"]] <- runGSEA(inputTab, gmts$H, "page")
enRes[["KEGG"]] <- runGSEA(inputTab, gmts$KEGG,"page")
enRes[["Perturbation"]] <- runGSEA(inputTab, gmts$C6,"page")
p <- plotEnrichmentBar(enRes, pCut =0.05, ifFDR= FALSE)
cowplot::plot_grid(p)

Focus on proteins from Fatty acid metabolism pathway

geneList <- piano::loadGSC(gmts$H)$gsc$HALLMARK_FATTY_ACID_METABOLISM
plotGene <- filter(filter(resTab, pval <= 0.05), symbol%in% geneList )
pList <- lapply(seq(nrow(plotGene)), function(i) {
  rec <- plotGene[i,]
  plotTab <- tibble(expr = protMat[rec$name,],
                    cluster = protSub$cluster,
                    TP53 = protSub$TP53) 
  ggplot(plotTab, aes(x=cluster, y=expr)) +
    geom_boxplot(outlier.shape = NA) +
    ggbeeswarm::geom_quasirandom(aes(color = cluster, shape = TP53), size=3) +
    ggtitle(rec$symbol) +
    theme_bw()
})

cowplot::plot_grid(plotlist = pList,ncol=3)

Metabolimics

Data distribution

metaData <- readRDS("../data/SC005_SummarizedExperiment_metabolomics.RDS")
metaMat <- assay(metaData)
boxplot(metaMat)

Median normalization (not performed)

#metaMatNorm <- PhosR::medianScaling(metaMat, scale = FALSE)
metaMatNorm <- metaMat
boxplot(metaMatNorm)

metaNorm <- metaData
assay(metaNorm) <- metaMatNorm

Average technical replicates for each cell line

metaTab <- assay(metaNorm) %>% as_tibble(rownames = "id") %>%
  pivot_longer(-id) %>%
  mutate(cellLine = colData(metaNorm)[name,]$cell.line) %>%
  group_by(id, cellLine) %>%
  summarise(count = mean(value, na.rm=TRUE)) %>%
  mutate(symbol = rowData(metaNorm)[id,]$metabolite,
         class = rowData(metaNorm)[id,]$class,
         cluster = clusterTab[match(cellLine, clusterTab$Name),]$Cluster) %>%
  filter(cellLine %in% clusterTab$Name, 
         !symbol %in% c("",NA), !is.na(cluster))

`summarise()` has grouped output by 'id'. You can override using the `.groups`
argument.

metaSub <- jyluMisc::tidyToSum(metaTab, rowID = "id",colID = "cellLine", 
                     values = "count", annoRow = c("symbol","class"), annoCol = "cluster")
metaSub$TP53 <- factor(colAnno[colnames(metaSub),]$TP53)

Identify proteins differentially expressed between C1 and C2

metaSub <- metaSub[,metaSub$cluster %in% c("C1","C2")]
table(metaSub$cluster)


C1 C2 
 8  3

Differential metabolites abundance

library(limma)
metaMat <- assay(metaSub)
designMat <- model.matrix(~metaSub$cluster)
fit <- lmFit(metaMat, designMat)
fit2 <- eBayes(fit)

resTab <- topTable(fit2, number= Inf) %>%
  dplyr::rename(pval = P.Value, adj_pval = adj.P.Val) %>%
  arrange(pval) %>%
  as_tibble(rownames = "name") %>%
  mutate(symbol = rowData(metaSub[name,])$symbol)

hist(resTab$pval)

Not strong difference

Proteins with p-value < 0.05

resTab.sig <- filter(resTab, pval < 0.05)
resTab.sig %>% select(symbol, pval, adj_pval, logFC, t) %>%
  mutate_if(is.numeric, formatC, digits=1) %>%
  DT::datatable()

Plot top 9 examples

pList <- lapply(seq(9), function(i) {
  rec <- resTab.sig[i,]
  plotTab <- tibble(expr = metaMat[rec$name,],
                    cluster = metaSub$cluster,
                    TP53 = metaSub$TP53) 
  ggplot(plotTab, aes(x=cluster, y=expr)) +
    geom_boxplot(outlier.shape = NA) +
    ggbeeswarm::geom_quasirandom(aes(color = cluster, shape = TP53), size=3) +
    ggtitle(rec$symbol) +
    theme_bw()
})

cowplot::plot_grid(plotlist = pList,ncol=3)

sessionInfo()

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur/Monterey 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] limma_3.52.2                piano_2.12.0               
 [3] proDA_1.10.0                SummarizedExperiment_1.26.1
 [5] Biobase_2.56.0              GenomicRanges_1.48.0       
 [7] GenomeInfoDb_1.32.2         IRanges_2.30.0             
 [9] S4Vectors_0.34.0            BiocGenerics_0.42.0        
[11] MatrixGenerics_1.8.1        matrixStats_0.62.0         
[13] ConsensusClusterPlus_1.60.0 pheatmap_1.0.12            
[15] forcats_0.5.1               stringr_1.4.1              
[17] dplyr_1.0.9                 purrr_0.3.4                
[19] readr_2.1.2                 tidyr_1.2.0                
[21] tibble_3.1.8                ggplot2_3.4.1              
[23] tidyverse_1.3.2             jyluMisc_0.1.5             

loaded via a namespace (and not attached):
  [1] readxl_1.4.0           backports_1.4.1        fastmatch_1.1-3       
  [4] drc_3.0-1              workflowr_1.7.0        igraph_1.3.4          
  [7] shinydashboard_0.7.2   splines_4.2.0          crosstalk_1.2.0       
 [10] BiocParallel_1.30.3    TH.data_1.1-1          digest_0.6.30         
 [13] htmltools_0.5.4        fansi_1.0.3            magrittr_2.0.3        
 [16] googlesheets4_1.0.0    cluster_2.1.3          tzdb_0.3.0            
 [19] modelr_0.1.8           sandwich_3.0-2         colorspace_2.0-3      
 [22] ggrepel_0.9.1          rvest_1.0.2            haven_2.5.0           
 [25] xfun_0.31              crayon_1.5.2           RCurl_1.98-1.7        
 [28] jsonlite_1.8.3         survival_3.4-0         zoo_1.8-10            
 [31] glue_1.6.2             survminer_0.4.9        gtable_0.3.0          
 [34] gargle_1.2.0           zlibbioc_1.42.0        XVector_0.36.0        
 [37] DelayedArray_0.22.0    car_3.1-0              abind_1.4-5           
 [40] scales_1.2.0           mvtnorm_1.1-3          DBI_1.1.3             
 [43] relations_0.6-12       rstatix_0.7.0          Rcpp_1.0.9            
 [46] plotrix_3.8-2          xtable_1.8-4           km.ci_0.5-6           
 [49] DT_0.23                htmlwidgets_1.5.4      httr_1.4.3            
 [52] fgsea_1.22.0           RColorBrewer_1.1-3     gplots_3.1.3          
 [55] ellipsis_0.3.2         farver_2.1.1           pkgconfig_2.0.3       
 [58] sass_0.4.2             dbplyr_2.2.1           utf8_1.2.2            
 [61] labeling_0.4.2         tidyselect_1.1.2       rlang_1.0.6           
 [64] later_1.3.0            munsell_0.5.0          cellranger_1.1.0      
 [67] tools_4.2.0            visNetwork_2.1.0       cachem_1.0.6          
 [70] cli_3.4.1              generics_0.1.3         broom_1.0.0           
 [73] evaluate_0.15          fastmap_1.1.0          yaml_2.3.5            
 [76] knitr_1.39             fs_1.5.2               survMisc_0.5.6        
 [79] caTools_1.18.2         mime_0.12              slam_0.1-50           
 [82] xml2_1.3.3             compiler_4.2.0         rstudioapi_0.13       
 [85] beeswarm_0.4.0         ggsignif_0.6.3         marray_1.74.0         
 [88] reprex_2.0.1           bslib_0.4.1            stringi_1.7.8         
 [91] highr_0.9              lattice_0.20-45        Matrix_1.4-1          
 [94] shinyjs_2.1.0          KMsurv_0.1-5           vctrs_0.5.2           
 [97] pillar_1.8.0           lifecycle_1.0.3        jquerylib_0.1.4       
[100] data.table_1.14.2      cowplot_1.1.1          bitops_1.0-7          
[103] httpuv_1.6.6           extraDistr_1.9.1       R6_2.5.1              
[106] promises_1.2.0.1       KernSmooth_2.23-20     gridExtra_2.3         
[109] vipor_0.4.5            codetools_0.2-18       MASS_7.3-58           
[112] gtools_3.9.3           exactRankTests_0.8-35  assertthat_0.2.1      
[115] rprojroot_2.0.3        withr_2.5.0            multcomp_1.4-19       
[118] GenomeInfoDbData_1.2.8 parallel_4.2.0         hms_1.1.1             
[121] grid_4.2.0             rmarkdown_2.14         carData_3.0-5         
[124] googledrive_2.0.0      git2r_0.30.1           maxstat_0.7-25        
[127] ggpubr_0.4.0           sets_1.0-21            shiny_1.7.4           
[130] lubridate_1.8.0        ggbeeswarm_0.6.0

Analysis focused on the single agent effect

Junyan Lu

2022-04-25

Load libraries and datasets

Data preprocessing

Genomic data

Summarise plot of genomics

Exploratory data analysis

Hierarchical clustering visualization

Drug-Drug correlation heatmap

Cell line correlation heatmap

PCA

Correlate top10 PCs with baseline ATP count

Correlate top10 PCs with genomic background

Association between drug responses and genomics

Consensus clustering analysis

Run clustering with ConcsensusClustterPlus

Post-processing consensus clustering results

Identify signature drugs for the cluster

Association with proteomics/metabolomics

Proteomics

Identify proteins differentially expressed between C1 and C2

Enrichment analysis

Proteomics (dataset from Tobias)

Identify proteins differentially expressed between C1 and C2

Enrichment analysis

Metabolimics

Identify proteins differentially expressed between C1 and C2