Last updated: 2021-03-17

Checks: 6 passed, 1 warning

Knit directory: CLLproteomics_batch13/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20200227)

The command set.seed(20200227) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 3fb50c5

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 3fb50c5. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/manuscript_S1_Overview_cache/
    Ignored:    analysis/manuscript_S3_trisomy12_cache/
    Ignored:    analysis/manuscript_S4_trisomy19_cache/
    Ignored:    analysis/manuscript_S5_IGHV_cache/
    Ignored:    analysis/manuscript_S9_STAT2_cache/
    Ignored:    code/.DS_Store
    Ignored:    code/.Rhistory
    Ignored:    data/.DS_Store
    Ignored:    output/.DS_Store

Untracked files:
    Untracked:  analysis/.trisomy12_norm.pdf
    Untracked:  analysis/STAT2splicing.Rmd
    Untracked:  analysis/analysisBatch2.Rmd
    Untracked:  analysis/annotateSampleUpload.Rmd
    Untracked:  analysis/bufferAnalysis.Rmd
    Untracked:  analysis/cohortComposition.pdf
    Untracked:  analysis/cohortComposition_batch2.pdf
    Untracked:  analysis/compareBatchClinics.Rmd
    Untracked:  analysis/compareBatchGenomics.Rmd
    Untracked:  analysis/compareTreatment.Rmd
    Untracked:  analysis/complexAnalysis_overall.Rmd
    Untracked:  analysis/corumPairs.csv
    Untracked:  analysis/manuscript_S1_Overview.Rmd
    Untracked:  analysis/manuscript_S2_genomicAssociation.Rmd
    Untracked:  analysis/manuscript_S3_trisomy12.Rmd
    Untracked:  analysis/manuscript_S4_trisomy19.Rmd
    Untracked:  analysis/manuscript_S5_IGHV.Rmd
    Untracked:  analysis/manuscript_S6_del11q.Rmd
    Untracked:  analysis/manuscript_S7_SF3B1.Rmd
    Untracked:  analysis/manuscript_S8_drugResponse_Outcomes.Rmd
    Untracked:  analysis/manuscript_S9_STAT2.Rmd
    Untracked:  analysis/patAnno_exploration.csv
    Untracked:  analysis/patAnno_independent.csv
    Untracked:  analysis/patInfoTab.csv
    Untracked:  analysis/patInfoTab.tex
    Untracked:  analysis/protRNACor_eachPat.pdf
    Untracked:  analysis/test.pdf
    Untracked:  code/utils.R
    Untracked:  data/Annotation file March 2021.xlsx
    Untracked:  data/CNV_onChrom.RData
    Untracked:  data/ComplexParticipantsPubMedIdentifiers_human.txt
    Untracked:  data/Fig1A.png
    Untracked:  data/Western_blot_results_20210309_short.csv
    Untracked:  data/allComplexes.txt
    Untracked:  data/ddsrna_enc.RData
    Untracked:  data/exprCNV_enc.RData
    Untracked:  data/geneAnno.RData
    Untracked:  data/gmts/
    Untracked:  data/ic50.RData
    Untracked:  data/patMeta_enc.RData
    Untracked:  data/pepCLL_lumos_enc.RData
    Untracked:  data/proteins_in_complexes
    Untracked:  data/proteomic_explore_enc.RData
    Untracked:  data/proteomic_independent_enc.RData
    Untracked:  data/proteomic_timsTOF_enc.RData
    Untracked:  data/screenData_enc.RData
    Untracked:  data/survival_enc.RData
    Untracked:  manuscript_revision/
    Untracked:  output/MSH6_splicing.svg
    Untracked:  output/SUGP1_splicing.svg
    Untracked:  output/deResList.RData
    Untracked:  output/deResListBatch2.RData
    Untracked:  output/deResListRNA.RData
    Untracked:  output/deResList_WBC.RData
    Untracked:  output/deResList_batch1.RData
    Untracked:  output/deResList_batch3.RData
    Untracked:  output/deResList_timsTOF.RData
    Untracked:  output/dxdCLL.RData
    Untracked:  output/dxdCLL2.RData
    Untracked:  output/exprCNV.RData
    Untracked:  output/geneAnno.RData
    Untracked:  output/int_pairs.csv
    Untracked:  output/resOutcome_batch1.RData
    Untracked:  output/resOutcome_batch13.RData
    Untracked:  output/resOutcome_batch2.RData
    Untracked:  output/resOutcome_batch3.RData

Unstaged changes:
    Modified:   analysis/_site.yml
    Deleted:    analysis/analysisSF3B1.Rmd
    Deleted:    analysis/comparePlatforms.Rmd
    Deleted:    analysis/compareProteomicsRNAseq.Rmd
    Deleted:    analysis/correlateCLLPD.Rmd
    Deleted:    analysis/correlateGenomic.Rmd
    Deleted:    analysis/correlateGenomic_removePC.Rmd
    Deleted:    analysis/correlateMIR.Rmd
    Deleted:    analysis/correlateMethylationCluster.Rmd
    Modified:   analysis/index.Rmd
    Deleted:    analysis/predictOutcome.Rmd
    Deleted:    analysis/processProteomics_LUMOS.Rmd
    Deleted:    analysis/processProteomics_timsTOF.Rmd
    Deleted:    analysis/qualityControl_LUMOS.Rmd
    Deleted:    analysis/qualityControl_timsTOF.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.

Load packages and datasets
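
The chunk that loads the packages is not shown in the rendered page. A minimal sketch, assuming it attached the two packages that the session information at the end of this page lists under "other attached packages" (tidyverse and UpSetR):

```r
# Assumption: these are the packages attached by the hidden chunk, inferred
# from the "other attached packages" list in the session information.
library(tidyverse)  # dplyr verbs, the %>% pipe, tibbles
library(UpSetR)     # upset() and fromList() for the overlap plots
```

The result tables themselves are loaded batch by batch in the chunks that follow.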

Load associations of different batches

Batch1

load("../output/resOutcome_batch1.RData")
cTab_b1 <- cTab %>% mutate(batch = "original")
uniRes_b1 <- uniRes %>% mutate(batch = "original")

Batch2

load("../output/resOutcome_batch2.RData")
cTab_b2 <- cTab %>% mutate(batch = "batch2") # multivariate results
uniRes_b2 <- uniRes %>% mutate(batch = "batch2")

Batch3

load("../output/resOutcome_batch3.RData")
cTab_b3 <- cTab %>% mutate(batch = "batch3")
uniRes_b3 <- uniRes %>% mutate(batch = "batch3")

Combined batch 1 and batch 3

load("../output/resOutcome_batch13.RData")
cTab_b13 <- cTab %>% mutate(batch = "extended")
uniRes_b13 <- uniRes %>% mutate(batch = "extended")

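Combine all batches (focus only on TTT)

# Pool the univariate results and keep only the TTT outcome
uniResTab <- bind_rows(uniRes_b1, uniRes_b2, uniRes_b3, uniRes_b13) %>%
  filter(outcome == "TTT") %>%
  # Label the direction of association from the hazard ratio (HR)
  mutate(dir = ifelse(HR > 1, "higer risk", "lower risk")) %>%
  mutate(group = sprintf("%s (%s)", batch, dir))
# Pool the multivariate results in the same way
cTab <- bind_rows(cTab_b1, cTab_b2, cTab_b3, cTab_b13) %>% filter(outcome == "TTT")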

Compare Univariate results

Overlap of all significant associations (only TTT)

5% FDR Cut-off

compareTab <- uniResTab %>% filter(outcome == "TTT", p.adj < 0.05)
 

overList <- lapply(unique(compareTab$group), function(bb) {
  filter(compareTab, group ==bb)$id
})
names(overList) <- unique(compareTab$group)

upset(fromList(overList))
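
The list-building step above can be illustrated on toy data. The sketch below uses invented protein IDs and group labels (hypothetical, not values from this analysis) and uses base split() as an equivalent way to build the named list that fromList() expects.

```r
library(dplyr)
library(UpSetR)

# Hypothetical toy table: IDs and groups are invented for illustration
toyTab <- tibble(
  id    = c("P1", "P2", "P3", "P1", "P3", "P4"),
  group = c("original", "original", "original",
            "batch3", "batch3", "batch3")
)

# One character vector of IDs per group, named by group
toyList <- split(toyTab$id, toyTab$group)

# fromList() converts the list to a 0/1 membership table, one column per set
toyMat <- fromList(toyList)

# IDs found in both sets; this is the intersection that the plot reports
shared <- intersect(toyList[["original"]], toyList[["batch3"]])

upset(toyMat)
```

Each bar of the resulting plot counts the IDs that belong to exactly the combination of sets marked below it.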

P < 0.05

compareTab <- uniResTab %>% filter(outcome == "TTT", p<=0.05) %>%
  mutate(dir = ifelse(HR >1, "higer", "lower")) %>%
  mutate(group =paste0(batch,"_",dir))

overList <- lapply(unique(compareTab$group), function(bb) {
  filter(compareTab, group ==bb)$id
})
names(overList) <- unique(compareTab$group)

upset(fromList(overList))

Proteins identified in batch 1 (10% FDR) that also show significant p-values (P < 0.05) in batch 3

resB1 <- filter(uniResTab, batch == "original", p.adj < 0.1) %>%
  select(name, dir) %>% dplyr::rename(dirB1  = dir)
resB3 <- filter(uniResTab, batch == "batch3", p < 0.05) %>%
  select(name, dir) %>% dplyr::rename(dirB3 = dir)
resCom <- left_join(resB1, resB3, by = "name") %>%
  filter(dirB1 == dirB3)
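
A note on the join above: left_join() keeps every batch 1 hit, and proteins without a match in batch 3 get NA for dirB3. Because filter() drops rows where the condition evaluates to NA, only proteins that appear in both tables with the same direction remain. A small sketch on invented protein names (hypothetical, not from the data) shows this behaviour:

```r
library(dplyr)

# Hypothetical toy inputs; names and directions are invented
toyB1 <- tibble(name  = c("A", "B", "C"),
                dirB1 = c("higher risk", "lower risk", "higher risk"))
toyB3 <- tibble(name  = c("A", "B"),
                dirB3 = c("higher risk", "higher risk"))

toyCom <- left_join(toyB1, toyB3, by = "name") %>%
  filter(dirB1 == dirB3)  # a comparison with NA gives NA, so unmatched C is dropped

toyCom$name
```

Only protein A is retained here: B has discordant directions and C has no batch 3 match.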

List of such proteins

resCom

# A tibble: 12 x 3
   name    dirB1      dirB3     
   <chr>   <chr>      <chr>     
 1 PHF1    lower risk lower risk
 2 CLTB    lower risk lower risk
 3 TPD52L2 lower risk lower risk
 4 MRPL46  higer risk higer risk
 5 RELA    lower risk lower risk
 6 MTCH2   higer risk higer risk
 7 RAB14   lower risk lower risk
 8 MTX2    higer risk higer risk
 9 PES1    higer risk higer risk
10 PURA    lower risk lower risk
11 MRPL44  higer risk higer risk
12 CDKN1B  lower risk lower risk

Proteins identified in combined batches 1 and 3 (10% FDR) that also show significant p-values (P < 0.05) in batch 2

resB13 <- filter(uniResTab, batch == "extended", p.adj < 0.1) %>%
  select(name, dir) %>% dplyr::rename(dirB13  = dir)
resB2 <- filter(uniResTab, batch == "batch2", p < 0.05) %>%
  select(name, dir) %>% dplyr::rename(dirB2 = dir)
resCom <- left_join(resB13, resB2, by = "name") %>%
  filter(dirB13 == dirB2)

List of such proteins

resCom

# A tibble: 7 x 3
  name   dirB13     dirB2     
  <chr>  <chr>      <chr>     
1 MRPL44 higer risk higer risk
2 HSPD1  higer risk higer risk
3 APPL1  lower risk lower risk
4 TRAP1  higer risk higer risk
5 FKBP5  higer risk higer risk
6 CAP1   lower risk lower risk
7 NOP10  higer risk higer risk

Reproducibility of the candidates mentioned in the manuscript

PRMT5

filter(uniResTab, name == "PRMT5")

# A tibble: 3 x 11
         p    HR lower higher id      p.adj name  outcome batch dir    group    
     <dbl> <dbl> <dbl>  <dbl> <chr>   <dbl> <chr> <chr>   <chr> <chr>  <chr>    
1  1.25e-5 2.85  1.78    4.57 O14744 0.0290 PRMT5 TTT     orig… higer… original…
2  9.66e-1 0.990 0.611   1.60 O14744 0.993  PRMT5 TTT     batc… lower… batch3 (…
3  3.00e-3 1.65  1.19    2.30 O14744 0.0778 PRMT5 TTT     exte… higer… extended…

PES1

filter(uniResTab, name == "PES1")

# A tibble: 3 x 11
         p    HR lower higher id       p.adj name  outcome batch dir    group   
     <dbl> <dbl> <dbl>  <dbl> <chr>    <dbl> <chr> <chr>   <chr> <chr>  <chr>   
1  3.52e-4  2.51  1.51   4.15 O00541 0.0887  PES1  TTT     orig… higer… origina…
2  1.97e-2  1.87  1.11   3.17 O00541 0.390   PES1  TTT     batc… higer… batch3 …
3  1.23e-5  2.20  1.55   3.14 O00541 0.00436 PES1  TTT     exte… higer… extende…

PYGB

filter(uniResTab, name == "PYGB")

# A tibble: 4 x 11
         p    HR lower higher id       p.adj name  outcome batch dir    group   
     <dbl> <dbl> <dbl>  <dbl> <chr>    <dbl> <chr> <chr>   <chr> <chr>  <chr>   
1  4.36e-5 0.352 0.213  0.580 P11216 0.0331  PYGB  TTT     orig… lower… origina…
2  6.03e-2 0.595 0.346  1.02  P11216 0.951   PYGB  TTT     batc… lower… batch2 …
3  9.80e-2 0.700 0.459  1.07  P11216 0.579   PYGB  TTT     batc… lower… batch3 …
4  2.08e-5 0.516 0.380  0.700 P11216 0.00534 PYGB  TTT     exte… lower… extende…

Compare multivariate results

cTab.com <- select(cTab, name, outcome, p, p.adj, batch)

Proteins identified in batch 1 (10% FDR) that also show significant p-values (P < 0.05) in batch 3

resB1 <- filter(cTab.com, batch == "original", p.adj < 0.1) %>%
  mutate(dir = "1") %>%
  select(name, dir) %>% dplyr::rename(dirB1  = dir)
resB3 <- filter(cTab.com, batch == "batch3", p < 0.05) %>%
  mutate(dir = "1") %>%
  select(name, dir) %>% dplyr::rename(dirB3 = dir)
resCom <- left_join(resB1, resB3, by = "name") %>%
  filter(dirB1 == dirB3)
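
The constant dir = "1" column above carries no direction information; it only lets the join-and-filter pattern from the univariate section be reused. The result is the set of protein names present in both filtered tables. An equivalent sketch using intersect(), on a hypothetical toy table that mimics the columns of cTab.com (all values invented):

```r
library(dplyr)

# Hypothetical toy table with the same columns as cTab.com
toyTab <- tibble(
  name  = c("A", "B", "C", "A", "C"),
  batch = c("original", "original", "original", "batch3", "batch3"),
  p     = c(0.001, 0.002, 0.04, 0.01, 0.20),
  p.adj = c(0.01, 0.02, 0.30, 0.05, 0.40)
)

# Discovery hits at 10% FDR and replication hits at nominal P < 0.05
hitsB1 <- filter(toyTab, batch == "original", p.adj < 0.1)$name
hitsB3 <- filter(toyTab, batch == "batch3", p < 0.05)$name

# Names that pass both thresholds
replicated <- intersect(hitsB1, hitsB3)
replicated
```

This variant returns a plain character vector instead of a tibble, which may be more convenient when only the names are needed.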

List of such proteins

resCom

# A tibble: 3 x 3
  name   dirB1 dirB3
  <chr>  <chr> <chr>
1 PURA   1     1    
2 CDKN1B 1     1    
3 MTCH2  1     1

Proteins identified in combined batches 1 and 3 (10% FDR) that also show significant p-values (P < 0.05) in batch 2

resB13 <- filter(cTab.com, batch == "extended", p.adj < 0.1) %>%
  mutate(dir = "1") %>%
  select(name, dir) %>% dplyr::rename(dirB13  = dir)
resB2 <- filter(cTab.com, batch == "batch2", p < 0.05) %>%
  mutate(dir = "1") %>%
  select(name, dir) %>% dplyr::rename(dirB2 = dir)
resCom <- left_join(resB13, resB2, by = "name") %>%
  filter(dirB13 == dirB2)

List of such proteins

resCom

# A tibble: 11 x 3
   name   dirB13 dirB2
   <chr>  <chr>  <chr>
 1 PURA   1      1    
 2 MTCH2  1      1    
 3 NOP2   1      1    
 4 AP3B1  1      1    
 5 PSMD5  1      1    
 6 MRPL50 1      1    
 7 PPA1   1      1    
 8 PNPT1  1      1    
 9 FIS1   1      1    
10 NFKB1  1      1    
11 LRRC59 1      1

Reproducibility of the candidates mentioned in the manuscript

PRMT5

filter(cTab.com, name == "PRMT5")

# A tibble: 2 x 5
  name  outcome        p    p.adj batch   
  <chr> <chr>      <dbl>    <dbl> <chr>   
1 PRMT5 TTT     0.000185 0.000598 original
2 PRMT5 TTT     0.0152   0.0210   extended

PES1

filter(cTab.com, name == "PES1")

# A tibble: 2 x 5
  name  outcome          p    p.adj batch   
  <chr> <chr>        <dbl>    <dbl> <chr>   
1 PES1  TTT     0.0000309  0.000204 original
2 PES1  TTT     0.00000940 0.000261 extended

PYGB

filter(cTab.com, name == "PYGB")

# A tibble: 3 x 5
  name  outcome         p    p.adj batch   
  <chr> <chr>       <dbl>    <dbl> <chr>   
1 PYGB  TTT     0.0000142 0.000204 original
2 PYGB  TTT     0.417     0.719    batch2  
3 PYGB  TTT     0.0000161 0.000306 extended

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS  10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.5     purrr_0.3.4    
 [5] readr_1.4.0     tidyr_1.1.3     tibble_3.1.0    ggplot2_3.3.3  
 [9] tidyverse_1.3.0 UpSetR_1.4.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        lubridate_1.7.10  assertthat_0.2.1  rprojroot_2.0.2  
 [5] digest_0.6.27     utf8_1.1.4        R6_2.5.0          cellranger_1.1.0 
 [9] plyr_1.8.6        backports_1.2.1   reprex_1.0.0      evaluate_0.14    
[13] highr_0.8         httr_1.4.2        pillar_1.5.1      rlang_0.4.10     
[17] readxl_1.3.1      rstudioapi_0.13   jquerylib_0.1.3   rmarkdown_2.7    
[21] labeling_0.4.2    munsell_0.5.0     broom_0.7.5       compiler_4.0.2   
[25] httpuv_1.5.5      modelr_0.1.8      xfun_0.21         pkgconfig_2.0.3  
[29] htmltools_0.5.1.1 tidyselect_1.1.0  gridExtra_2.3     workflowr_1.6.2  
[33] fansi_0.4.2       crayon_1.4.1      dbplyr_2.1.0      withr_2.4.1      
[37] later_1.1.0.1     grid_4.0.2        jsonlite_1.7.2    gtable_0.3.0     
[41] lifecycle_1.0.0   DBI_1.1.1         git2r_0.28.0      magrittr_2.0.1   
[45] scales_1.1.1      cli_2.3.1         stringi_1.5.3     farver_2.1.0     
[49] fs_1.5.0          promises_1.2.0.1  xml2_1.3.2        bslib_0.2.4      
[53] ellipsis_0.3.1    generics_0.1.0    vctrs_0.3.6       tools_4.0.2      
[57] glue_1.4.2        hms_1.0.0         yaml_2.2.1        colorspace_2.0-0 
[61] rvest_1.0.0       knitr_1.31        haven_2.3.1       sass_0.3.1

Compare the clinical associations between batches

Junyan Lu

2021-02-16
