1 Correlation between RNA expression and protein level

1.1 Correlation for selected genes

1.1.1 All genes combined

1.1.2 Separate

CD38 is not detected in the updated proteomic data

1.2 RNAseq versus FACS

1.2.1 CD38

Note that the CD38 FACS signal and CD38 RNA expression are not linearly correlated. When RNAseq counts are low (log2Counts < 5), the CD38 FACS signal is around zero, which may be due to the detection limit of FACS. A linear fit and its R2 may therefore be misleading here.
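To illustrate why R2 from a linear fit can understate a monotone but non-linear relationship, the sketch below compares Pearson and Spearman (rank) correlation on made-up data with a floor at zero. The rank correlation is an alternative offered here for illustration, not the method used in this report, and the numbers are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: the FACS signal stays at zero below log2 count 5
# and rises non-linearly above it.
log2_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
facs_signal = [0.0 if c < 5 else float((c - 4) ** 2) for c in log2_counts]

r_squared = pearson_r(log2_counts, facs_signal) ** 2
rho = spearman_rho(log2_counts, facs_signal)
print(f"linear R2 = {r_squared:.3f}, Spearman rho = {rho:.3f}")
```

On data like these the rank correlation is higher than the linear R2, because it only asks whether the FACS signal increases with expression, not whether it does so along a straight line.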

1.2.2 CD20

1.2.3 ZAP70

Only binary FACS data (+ or -) are available for ZAP70. A correlation plot and R2 like those for CD38 would require a FACS measurement of ZAP70 as a continuous variable.

1.2.4 CD23

As with ZAP70, only categorical values are available for CD23, so linear regression cannot be performed.

1.3 RNAseq versus Proteomics data

We have 50 CLL samples with proteomic data measured at ETH.

1.3.1 CD38

CD38 is not detected in the updated proteomic data

1.3.2 ZAP70

1.3.3 CD20

1.3.4 CD23

1.4 Proteomics versus FACS

1.4.1 CD20

1.4.2 ZAP70

Only three samples have both ZAP70 protein levels and ZAP70 FACS values.

1.4.3 CD23

2 Correlation between IGHV and CD38/ZAP70 expression

2.1 Bivariate plot of CD38 and ZAP70

2.1.1 Histogram of CD38 expression

2.1.2 Histogram of ZAP70 expression

2.2 IGHV and CD38

The y axis shows log2(RNAseq counts). The counts were normalized by size factors but not variance-stabilized, because the variance stabilizing transformation does not always reflect the actual values of low counts.
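As a rough sketch of what "normalized by size factors" can look like, the code below computes median-of-ratios size factors and then takes log2 of the normalized counts. The report does not state which estimator or pseudocount was used, so the function names, the median-of-ratios choice, the pseudocount of 1 and the toy count matrix are all assumptions made for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Keep only genes without zeros so the geometric mean is defined.
    usable = [gene for gene in counts if all(c > 0 for c in gene)]
    log_geo_means = [sum(math.log(c) for c in gene) / n_samples for gene in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(gene[j]) - m for gene, m in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def log2_normalized(counts, factors, pseudocount=1.0):
    """log2 of size-factor-normalized counts (pseudocount is an assumption)."""
    return [
        [math.log2(c / f + pseudocount) for c, f in zip(gene, factors)]
        for gene in counts
    ]

# Toy matrix: sample 2 was sequenced about twice as deeply as sample 1.
raw_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
sf = size_factors(raw_counts)
log_expr = log2_normalized(raw_counts, sf)
print(sf)
```

After normalization the first three genes have the same value in both samples, while the low-count gene in the last row keeps its difference. This is the behaviour the paragraph above relies on: low counts are only rescaled, not shrunk.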

T-test

## 
##  Welch Two Sample t-test
## 
## data:  CD38_expr by factor(IGHV)
## t = -6.8246, df = 205.78, p-value = 9.679e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.140057 -1.732444
## sample estimates:
## mean in group M mean in group U 
##        6.554326        8.990577
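For readers who want to see how the t statistic and the fractional degrees of freedom in the output above arise, here is a minimal sketch of the Welch calculation. It uses toy data, not the CD38 values, and it stops at t and df because the P value needs a t-distribution function that the Python standard library does not provide.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for mean(a) - mean(b) without assuming equal variances."""
    # Squared standard error contributed by each group.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation, which is why df is not an integer.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy groups standing in for M-CLL and U-CLL expression values.
group_m = [1.0, 2.0, 3.0, 4.0, 5.0]
group_u = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t_stat, df = welch_t(group_m, group_u)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

The sign convention matches the output above: the statistic is the first group's mean minus the second's, so a higher mean in the second group gives a negative t.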

2.3 IGHV and ZAP70

T-test

## 
##  Welch Two Sample t-test
## 
## data:  ZAP70_expr by factor(IGHV)
## t = -12.324, df = 193.31, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.467712 -1.786823
## sample estimates:
## mean in group M mean in group U 
##        11.36052        13.48778

3 Define gene sets that separate U-CLL from M-CLL

Here I define two gene sets, UP_in_U-CLL and DOWN_in_U-CLL, based on a differential expression test of U-CLL versus M-CLL. The gene sets will be used in the later enrichment analyses to compare the signatures of CD38 and ZAP70. Both gene sets are added to the HALLMARK gene set collection.
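The definition above can be sketched as a split of the differential expression table by direction of change. The record layout (`gene`, `log2fc`, `padj`), the FDR cutoff and the placeholder gene names are hypothetical; the report does not state the threshold that was actually used.

```python
def define_gene_sets(de_results, fdr_cutoff=0.01):
    """Split significant genes into up- and down-regulated sets.

    de_results: list of dicts with keys 'gene', 'log2fc' (U-CLL versus M-CLL)
    and 'padj'. The cutoff is an illustrative assumption.
    """
    significant = [r for r in de_results if r["padj"] <= fdr_cutoff]
    return {
        "UP_in_U-CLL": sorted(r["gene"] for r in significant if r["log2fc"] > 0),
        "DOWN_in_U-CLL": sorted(r["gene"] for r in significant if r["log2fc"] < 0),
    }

# Placeholder results, not taken from the data in this report.
toy_results = [
    {"gene": "GENE_A", "log2fc": 2.1, "padj": 0.0004},
    {"gene": "GENE_B", "log2fc": -1.7, "padj": 0.002},
    {"gene": "GENE_C", "log2fc": 0.9, "padj": 0.30},
    {"gene": "GENE_D", "log2fc": -3.0, "padj": 0.009},
]
gene_sets = define_gene_sets(toy_results)
print(gene_sets)
```

Keeping the two directions as separate sets lets an enrichment test report whether a CD38 or ZAP70 signature resembles the U-CLL or the M-CLL side of the comparison.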

4 Identify genes correlated with CD38 and ZAP70 (not blocking for IGHV)

4.0.1 CD38

4.0.1.1 P value histogram

4.0.1.2 Significantly correlated genes (1% FDR)

4.0.1.3 Plot top 10 correlations, colored by IGHV

4.0.1.4 Enrichment analysis

Hallmark gene sets

KEGG gene sets

4.0.2 ZAP70

4.0.2.1 P value histogram

4.0.2.2 Significantly correlated genes (1% FDR)

4.0.2.3 Plot top 10 correlations, stratified by IGHV

4.0.2.4 Enrichment analysis

Hallmark gene sets (10% FDR)

The gene sets for IGHV status are strongly enriched, which suggests confounding with IGHV.

KEGG gene sets (10% FDR)

GO BP gene sets (5% FDR)

Record results

5 Identify genes correlated with CD38 and ZAP70, independent of IGHV status

5.1 Strategy 1: blocking for IGHV in a multivariate model
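To show what blocking for a binary covariate does, the sketch below removes the IGHV group mean from both the marker and the gene before correlating them. For a single two-level covariate this partial correlation tests the same thing as the marker coefficient in a linear model that includes IGHV. The report does not show its model code, so this is an illustration under that assumption, and the data are invented.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def center_within_groups(values, groups):
    """Subtract each group's mean, i.e. regress out a categorical covariate."""
    totals, sizes = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        sizes[g] = sizes.get(g, 0) + 1
    return [v - totals[g] / sizes[g] for v, g in zip(values, groups)]

def blocked_correlation(marker, gene, groups):
    """Correlation of marker and gene after removing the group effect."""
    return pearson_r(
        center_within_groups(marker, groups),
        center_within_groups(gene, groups),
    )

# Invented example: both variables are higher in U-CLL,
# but they are unrelated within each IGHV group.
ighv = ["M", "M", "M", "M", "U", "U", "U", "U"]
marker = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
gene = [5.0, 6.0, 6.0, 5.0, 10.0, 11.0, 11.0, 10.0]

unblocked = pearson_r(marker, gene)
blocked = blocked_correlation(marker, gene, ighv)
print(f"without blocking: {unblocked:.3f}, with blocking: {blocked:.3f}")
```

The unblocked correlation is strong only because both variables track IGHV status. Once the group means are removed it vanishes, which is the kind of confounded association that blocking is meant to filter out.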

5.1.1 CD38

5.1.1.1 P value histogram

5.1.1.2 Significantly correlated genes (10% FDR)

5.1.1.3 Heatmap of significantly correlated genes

5.1.1.4 Plot top 10 correlations, colored by IGHV

5.1.1.5 Enrichment analysis

Hallmark gene sets (10% FDR)

KEGG gene sets (10% FDR)

GO BP gene sets (5% FDR)

5.1.1.6 Heatmap of gene sets of interest

5.1.2 ZAP70

5.1.2.1 P value histogram

5.1.2.2 Significantly correlated genes (10% FDR)

5.1.2.3 Heatmap of significantly correlated genes

5.1.2.4 Plot top 10 correlations, stratified by IGHV

5.1.2.5 Enrichment analysis

Hallmark gene sets (10% FDR)

KEGG gene sets (10% FDR)

GO BP gene sets (10% FDR)

Record results

5.1.2.6 Heatmap of gene sets of interest

5.1.3 Compare the results with and without blocking for IGHV (10% FDR)

5.1.3.1 CD38

5.1.3.2 ZAP70

5.1.3.3 CD38 and ZAP70

5.2 Strategy 2: Test for M-CLL and U-CLL separately.

5.2.1 CD38

5.2.1.1 M-CLL

5.2.1.1.1 P value histogram

5.2.1.1.2 Significantly correlated genes (10% FDR)

5.2.1.1.3 Plot top 9 correlations, stratified by IGHV

5.2.1.1.4 Enrichment analysis

Hallmark gene sets (10% FDR)

KEGG gene sets (10% FDR)

GO BP gene sets (5% FDR)

5.2.1.2 Heatmap of gene sets of interest

5.2.1.3 U-CLL

5.2.1.3.1 P value histogram

In U-CLL, considerably fewer genes are correlated with CD38 than in M-CLL.

5.2.1.3.2 Significantly correlated genes (10% FDR)

5.2.1.3.3 Plot top 9 correlations, stratified by IGHV

5.2.1.3.4 Enrichment analysis

Hallmark gene sets (10% FDR)

KEGG gene sets (10% FDR)

GO BP gene sets (10% FDR)

5.2.1.4 Compare DE genes in M-CLL and U-CLL

There is little overlap between the two sets.

5.2.1.5 Heatmap of gene sets of interest


5.2.2 ZAP70

5.2.2.1 M-CLL

5.2.2.1.1 P value histogram

5.2.2.1.2 Significantly correlated genes (10% FDR)

5.2.2.1.3 Plot top 9 correlations, stratified by IGHV

5.2.2.1.4 Enrichment analysis

Hallmark gene sets (10% FDR)

Hallmark gene sets (10% FDR)

GO BP gene sets (10% FDR)

5.2.2.2 Heatmap of gene sets of interest

5.2.2.3 U-CLL

5.2.2.3.1 P value histogram

In U-CLL, considerably fewer genes are correlated with ZAP70 than in M-CLL.

5.2.2.3.2 Significantly correlated genes (10% FDR)

No significantly correlated genes are detected at 10% FDR. Genes with a raw P value < 0.05 are shown instead.

5.2.2.3.3 Plot top 9 correlations, stratified by IGHV

5.2.2.3.4 Enrichment analysis

Hallmark gene sets (10% FDR)

Hallmark gene sets (10% FDR)

GO BP gene sets (10% FDR)

5.2.2.4 Compare DE genes in M-CLL and U-CLL

No overlap can be found.

5.2.2.5 Heatmap of gene sets of interest

5.3 Combined enrichment barplot

5.3.1 ZAP70

5.3.1.1 w/o blocking

5.3.1.2 stratified by IGHV

5.3.2 CD38

5.3.2.1 w/o blocking

5.3.2.2 stratified by IGHV

6 Associations between ZAP70/CD38 group and clinical outcomes

6.1 Define four groups to describe CD38 and ZAP70 expression (categorical model)

6.1.1 Using median expression to define four groups
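A minimal sketch of a median split on two markers is given below. The group labels follow the lowZAP70/lowCD38 naming used in this section. The patient IDs and values are made up, and assigning values equal to the median to the "low" group is an assumption, since the report does not say how ties are handled.

```python
from statistics import median

def four_groups(cd38_expr, zap70_expr):
    """Assign each patient to one of four groups by median split.

    cd38_expr, zap70_expr: dicts mapping patient ID to expression value.
    Values equal to the median are treated as 'low' (an assumption).
    """
    cd38_cut = median(cd38_expr.values())
    zap70_cut = median(zap70_expr.values())
    groups = {}
    for patient in cd38_expr:
        zap70_level = "high" if zap70_expr[patient] > zap70_cut else "low"
        cd38_level = "high" if cd38_expr[patient] > cd38_cut else "low"
        groups[patient] = f"{zap70_level}ZAP70/{cd38_level}CD38"
    return groups

# Hypothetical log2 expression values for four patients.
cd38 = {"P1": 5.0, "P2": 9.0, "P3": 6.0, "P4": 10.0}
zap70 = {"P1": 11.0, "P2": 11.5, "P3": 13.5, "P4": 14.0}
patient_groups = four_groups(cd38, zap70)
print(patient_groups)
```

A median split guarantees balanced marginal groups for each marker, but the four combined groups can still differ in size when the two markers are correlated.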

6.1.2 Select features that can separate those four groups

In this part, I use only gene expression features. IGHV status and methylation cluster would always be selected and would downplay the importance of the gene expression features.

6.2 All patients

TTT

Patients in the lowZAP70/lowCD38 group show the best prognosis, while patients in the highZAP70/highCD38 group show the worst prognosis.

OS

The trend is similar to that for TTT, but the separation between groups is less clear.

6.3 U-CLLs

TTT

OS

6.4 M-CLLs

TTT

OS

6.4.1 Multivariate-model

Time to treatment

The highCD38/highZAP70 group remains significant after adjusting for the other factors in the model.

Time to treatment

7 Test of correlations between CD38/ZAP70 expression and other genomic features

7.1 Mutations and Copy number variations

Prepare table for test

Function for performing test

7.1.1 CD38

7.1.1.1 All CLLs

Associations (10% FDR)

## # A tibble: 6 x 4
##   gene      meanDiff   p.value   p.adj
##   <chr>        <dbl>     <dbl>   <dbl>
## 1 del11q        1.70 0.0000834 0.00309
## 2 del13q       -1.19 0.000342  0.00543
## 3 trisomy12     1.65 0.000440  0.00543
## 4 BRAF          2.29 0.000938  0.00868
## 5 NOTCH1        1.36 0.00690   0.0511 
## 6 ATM           1.46 0.0118    0.0729
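The report does not say how p.adj was computed. As a hedged illustration, the sketch below applies the Benjamini-Hochberg procedure to the six P values printed above. The total of 37 tests is an assumption, as is padding the unreported tests with P = 1; with those choices the output agrees with the p.adj column up to rounding of the printed P values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# P values as printed in the table above.
features = ["del11q", "del13q", "trisomy12", "BRAF", "NOTCH1", "ATM"]
reported_p = [0.0000834, 0.000342, 0.000440, 0.000938, 0.00690, 0.0118]

# Assumption: 37 features were tested; the unreported ones are set to 1.0.
assumed_total_tests = 37
padded = reported_p + [1.0] * (assumed_total_tests - len(reported_p))

adjusted = bh_adjust(padded)[: len(reported_p)]
for name, value in zip(features, adjusted):
    print(f"{name}: {value:.5f}")
```

Note that del13q and trisomy12 share an adjusted value. The procedure takes a running minimum from the largest P value down, so a feature can inherit the smaller adjusted value of the one ranked just after it.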

Mutation numbers

##         gene IGHV mutation status number
## 1     del11q    M               0    108
## 2     del11q    U               0     67
## 3     del11q    M               1      5
## 4     del11q    U               1     28
## 5     del13q    M               0     36
## 6     del13q    U               0     38
## 7     del13q    M               1     77
## 8     del13q    U               1     57
## 9  trisomy12    M               0     99
## 10 trisomy12    U               0     82
## 11 trisomy12    M               1     14
## 12 trisomy12    U               1     13
## 13       ATM    M               0    106
## 14       ATM    U               0     80
## 15       ATM    M               1      5
## 16       ATM    U               1     13
## 17      BRAF    M               0    111
## 18      BRAF    U               0     85
## 19      BRAF    M               1      2
## 20      BRAF    U               1      9
## 21    NOTCH1    M               0     91
## 22    NOTCH1    U               0     58
## 23    NOTCH1    M               1      5
## 24    NOTCH1    U               1     18

The first column is the gene name, the second column is the IGHV status, the third column is the mutation status of that gene (1 is mutated, 0 is wild type), and the fourth column is the number of samples with that combination of IGHV status and mutation status.

Boxplots of significant associations

Boxplots of significant associations, stratified by IGHV

7.1.1.2 In M-CLLs only

Associations (10% FDR)

## # A tibble: 4 x 4
##   gene      meanDiff    p.value     p.adj
##   <chr>        <dbl>      <dbl>     <dbl>
## 1 trisomy12     2.82 0.00000123 0.0000247
## 2 del1q         4.05 0.0000479  0.000479 
## 3 trisomy19     3.00 0.00111    0.00741  
## 4 del13q       -1.27 0.00267    0.0133

Boxplots of significant associations

##        gene IGHV mutation status number
## 1    del13q    M               0     36
## 2    del13q    M               1     77
## 3     del1q    M               0    105
## 4     del1q    M               1      4
## 5 trisomy12    M               0     99
## 6 trisomy12    M               1     14
## 7 trisomy19    M               0    105
## 8 trisomy19    M               1      5

7.1.1.3 In U-CLLs only

Associations (10% FDR)

## # A tibble: 1 x 4
##   gene  meanDiff p.value  p.adj
##   <chr>    <dbl>   <dbl>  <dbl>
## 1 MED12    -2.38 0.00229 0.0756

##    gene IGHV mutation status number
## 1 MED12    U               0     85
## 2 MED12    U               1      8

Boxplots of significant associations

7.1.2 ZAP70

7.1.2.1 All CLLs

Associations (10% FDR)

## # A tibble: 11 x 4
##    gene      meanDiff   p.value   p.adj
##    <chr>        <dbl>     <dbl>   <dbl>
##  1 TP53         1.22  0.0000485 0.00179
##  2 del11q       1.01  0.000896  0.0133 
##  3 NOTCH1       1.18  0.00108   0.0133 
##  4 MED12        1.56  0.00328   0.0304 
##  5 trisomy19   -1.93  0.00948   0.0606 
##  6 ZMYM3        1.93  0.00983   0.0606 
##  7 BRAF         1.19  0.0142    0.0750 
##  8 del13q      -0.546 0.0201    0.0928 
##  9 CSMD3       -1.68  0.0243    0.0990 
## 10 NFKBIE       1.49  0.0283    0.0990 
## 11 DDX3X        1.62  0.0294    0.0990

##         gene IGHV mutation status number
## 1     del11q    M               0    108
## 2     del11q    U               0     67
## 3     del11q    M               1      5
## 4     del11q    U               1     28
## 5     del13q    M               0     36
## 6     del13q    U               0     38
## 7     del13q    M               1     77
## 8     del13q    U               1     57
## 9  trisomy19    M               0    105
## 10 trisomy19    U               0     93
## 11 trisomy19    M               1      5
## 12 trisomy19    U               1      0
## 13      BRAF    M               0    111
## 14      BRAF    U               0     85
## 15      BRAF    M               1      2
## 16      BRAF    U               1      9
## 17     CSMD3    M               0    106
## 18     CSMD3    U               0     93
## 19     CSMD3    M               1      5
## 20     CSMD3    U               1      0
## 21     DDX3X    M               0    111
## 22     DDX3X    U               0     88
## 23     DDX3X    M               1      0
## 24     DDX3X    U               1      5
## 25     MED12    M               0    110
## 26     MED12    U               0     85
## 27     MED12    M               1      1
## 28     MED12    U               1      8
## 29    NFKBIE    M               0    110
## 30    NFKBIE    U               0     88
## 31    NFKBIE    M               1      1
## 32    NFKBIE    U               1      5
## 33    NOTCH1    M               0     91
## 34    NOTCH1    U               0     58
## 35    NOTCH1    M               1      5
## 36    NOTCH1    U               1     18
## 37      TP53    M               0    104
## 38      TP53    U               0     68
## 39      TP53    M               1      9
## 40      TP53    U               1     26
## 41     ZMYM3    M               0    106
## 42     ZMYM3    U               0     89
## 43     ZMYM3    M               1      1
## 44     ZMYM3    U               1      4

Boxplots of significant associations

Boxplots of significant associations, stratified by IGHV

7.1.2.2 In M-CLLs only

Associations (10% FDR)

## # A tibble: 2 x 4
##   gene  meanDiff p.value p.adj
##   <chr>    <dbl>   <dbl> <dbl>
## 1 ATM       1.57  0.0191 0.227
## 2 FBXW7     1.53  0.0227 0.227

No association passes the 10% FDR threshold; both rows above have adjusted P values above 0.1.

##    gene IGHV mutation status number
## 1   ATM    M               0    106
## 2   ATM    M               1      5
## 3 FBXW7    M               0    106
## 4 FBXW7    M               1      5

7.1.2.3 In U-CLLs only

Associations (10% FDR)

## # A tibble: 1 x 4
##   gene  meanDiff p.value p.adj
##   <chr>    <dbl>   <dbl> <dbl>
## 1 U1      -0.945 0.00475 0.157

No association passes the 10% FDR threshold; the row above has an adjusted P value above 0.1.

##   gene IGHV mutation status number
## 1   U1    U               0     81
## 2   U1    U               1      9

7.2 Association with methylation clusters

ANOVA test

## # A tibble: 2 x 2
## # Groups:   gene [2]
##   gene    p.value
##   <chr>     <dbl>
## 1 CD38  5.59e-264
## 2 ZAP70 0.

Boxplot

8 Associations between CD38, ZAP70 expression and drug responses

8.1 Without blocking for IGHV

8.1.1 CD38

8.1.1.1 P value histogram

8.1.1.2 Significantly correlated drugs (10% FDR)

8.1.1.3 Heatmap of significantly correlated drugs

8.1.1.4 Plot top 10 correlations, colored by IGHV

8.1.2 ZAP70

8.1.2.1 P value histogram

8.1.2.2 Significantly correlated drugs (10% FDR)

8.1.2.3 Heatmap of significantly correlated drugs

8.1.2.4 Plot top 10 correlations, colored by IGHV

8.2 With blocking for IGHV

8.2.1 CD38

8.2.1.1 P value histogram

8.2.1.2 Significantly correlated drugs (10% FDR)

8.2.1.3 Heatmap of significantly correlated drugs

8.2.1.4 Plot top 10 correlations, stratified by IGHV

8.2.2 ZAP70

8.2.2.1 P value histogram

8.2.2.2 Significantly correlated drugs (10% FDR)

No significant associations when blocking for IGHV

9 ZAP70 analysis using proteomics data

CD38 is not detected in the updated proteomic dataset

9.1 Association between ZAP70 expression and IGHV status

As in the transcriptomic analysis, the variance of ZAP70 expression is higher in M-CLL than in U-CLL.

9.2 Identify proteins correlated with ZAP70 (not blocking for IGHV)

9.2.0.1 P value histogram

9.2.0.2 Significantly correlated genes (10% FDR)

9.2.0.3 Heatmap of significantly correlated genes

9.2.0.4 Enrichment analysis

Hallmark gene sets (10% FDR)

KEGG gene sets (10% FDR)

GO BP gene sets (5% FDR)

9.3 Identify genes correlated with ZAP70, independent of IGHV status

9.3.1 Strategy 1: blocking for IGHV in a multivariate model

9.3.1.1 P value histogram

9.3.1.2 Significantly correlated genes (10% FDR)

P value < 0.01

9.3.1.3 Enrichment analysis

Hallmark gene sets (10% FDR)

KEGG gene sets (10% FDR)

GO BP gene sets (5% FDR)

9.4 Strategy 2: Test for M-CLL and U-CLL separately.

9.4.1 ZAP70

9.4.1.1 M-CLL

9.4.1.1.1 P value histogram

9.4.1.1.2 Significantly correlated genes (10% FDR)

9.4.1.1.3 Enrichment analysis

Hallmark gene sets (10% FDR)

KEGG gene sets (10% FDR)

GO BP gene sets (10% FDR)

9.4.1.2 U-CLL

9.4.1.2.1 P value histogram

9.4.1.2.2 Significantly correlated genes (10% FDR)

In contrast to the transcriptomic data, more proteins are associated with ZAP70 in U-CLL than in M-CLL.

9.4.1.2.3 Enrichment analysis

Hallmark gene sets (10% FDR)

KEGG gene sets (5% FDR)

GO BP gene sets (5% FDR)

10 Analysis of U-CLL signature

10.1 PCA with up in U-CLL signature genes

10.2 PCA with down in U-CLL signature genes

11 Genes correlated with ZAP70 in IP-CLL

11.0.0.1 P value histogram

11.0.0.2 Significantly correlated genes (10% FDR)

11.0.0.3 Heatmap of significantly associated genes

11.0.0.4 Enrichment analysis

Hallmark gene sets

KEGG gene sets

## [1] "No sets passed the criteria"