CD38 is not detected in the updated proteomic data
Note that the CD38 FACS signal and CD38 expression are not linearly correlated. When RNAseq counts are low (log2 counts < 5), the CD38 FACS signal is close to zero, which may be due to the detection limit of FACS. A linear fit and its R2 may therefore be misleading here.
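To illustrate why a detection floor can make R2 misleading, here is a minimal sketch on simulated data (not the CLL data). The threshold of 5, the slope, and the noise level are assumptions chosen only for illustration.

```r
# Simulated example: FACS signal stays near zero below a detection
# threshold and rises linearly with log2 counts above it.
set.seed(1)
log2_counts <- seq(0, 12, length.out = 200)
facs <- pmax(0, 2 * (log2_counts - 5)) + rnorm(200, sd = 0.3)

# Linear fit on all samples, including those below the detection limit
fit_all <- lm(facs ~ log2_counts)
r2_all <- summary(fit_all)$r.squared

# Linear fit restricted to samples above the assumed threshold
above <- log2_counts >= 5
fit_above <- lm(facs[above] ~ log2_counts[above])
r2_above <- summary(fit_above)$r.squared

# R2 is lower when the floor region is included, although the
# relationship above the threshold is almost perfectly linear
c(all = r2_all, above_threshold = r2_above)
```

In this toy setting the floor region lowers R2 even though the signal tracks expression closely once it is detectable.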
I only have the ZAP70 FACS data as a binary variable (+ or -). To draw a correlation plot and calculate R2, as for CD38, a continuous FACS measurement of ZAP70 is needed.
As with ZAP70, only categorical values are available for CD23. Therefore, linear regression cannot be performed.
We have 50 CLL samples with proteomic data measured at ETH.
Only three samples have both ZAP70 protein levels and ZAP70 FACS values.
### Histogram of ZAP70 expression
I am using log2(RNAseq counts) as the y axis. The counts were normalized by size factors but not by a variance stabilizing transformation, because the variance stabilizing transformation sometimes does not reflect the actual values of low counts.
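The normalization described above can be sketched in base R as follows. This assumes median-of-ratios size factors (the approach used by DESeq2) and a pseudocount of 1 before taking log2. Both are assumptions, and the count matrix is a made-up toy example.

```r
# Toy count matrix (genes x samples); s2 is sequenced twice as deep as s1,
# s3 half as deep. Values are invented for illustration.
counts <- matrix(
  c( 10,  20,  5,
    100, 200, 50,
     40,  80, 20,
      8,  16,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)

# Median-of-ratios size factors: compare each sample to the per-gene
# geometric mean (genes with zero counts would need to be excluded first)
log_geo_mean <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(col) {
  exp(median(log(col) - log_geo_mean))
})

# Divide each column by its size factor, then take log2 (pseudocount assumed)
norm_counts <- sweep(counts, 2, size_factors, "/")
log2_counts <- log2(norm_counts + 1)
log2_counts
```

Unlike a variance stabilizing transformation, this keeps low counts on a scale that maps directly back to the normalized counts.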
T-test
##
## Welch Two Sample t-test
##
## data: CD38_expr by factor(IGHV)
## t = -6.8246, df = 205.78, p-value = 9.679e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.140057 -1.732444
## sample estimates:
## mean in group M mean in group U
## 6.554326 8.990577
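The output above has the form produced by base R's `t.test()` with a formula interface. A minimal sketch on simulated data follows; the variable names match the `data:` line of the output, while the values, group sizes, and standard deviations are assumptions.

```r
# Simulated expression values for two IGHV groups (not the real data)
set.seed(1)
IGHV <- rep(c("M", "U"), each = 100)
CD38_expr <- c(rnorm(100, mean = 6.5, sd = 2),
               rnorm(100, mean = 9.0, sd = 2))

# t.test() defaults to var.equal = FALSE, i.e. the Welch test
res <- t.test(CD38_expr ~ factor(IGHV))

res$method    # name of the test that was run
res$estimate  # group means (M first, then U)
res$conf.int  # 95% confidence interval for mean(M) - mean(U)
```

A negative t statistic here means the mean in group M is lower than in group U, as in the reported results.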
T-test
##
## Welch Two Sample t-test
##
## data: ZAP70_expr by factor(IGHV)
## t = -12.324, df = 193.31, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.467712 -1.786823
## sample estimates:
## mean in group M mean in group U
## 11.36052 13.48778
Here I will define two gene sets UP_in_U-CLL, DOWN_in_U-CLL based on differential expression test of U-CLL versus M-CLL. The gene sets will be used for later enrichment analysis to compare the signature of CD38 and ZAP70. The two gene sets will be added to HALLMARK gene sets collection.
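One way the two gene sets could be built and appended to an existing collection is sketched below. The differential expression table, the cutoffs (`fdr_cut`, `lfc_cut`), and the placeholder Hallmark set are all hypothetical; the actual thresholds used for this analysis are not stated here.

```r
# Toy differential expression result for U-CLL versus M-CLL (invented values)
de <- data.frame(
  gene   = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  log2FC = c(2.1, -1.8, 0.2, 1.5, -2.4),
  padj   = c(0.001, 0.004, 0.60, 0.20, 0.0002)
)

# Assumed cutoffs, for illustration only
fdr_cut <- 0.05
lfc_cut <- 1
sig <- de$padj < fdr_cut

# Genes significantly higher or lower in U-CLL
new_sets <- list(
  "UP_in_U-CLL"   = de$gene[sig & de$log2FC >  lfc_cut],
  "DOWN_in_U-CLL" = de$gene[sig & de$log2FC < -lfc_cut]
)

# Placeholder for the Hallmark collection (a named list of gene vectors)
hallmark <- list(HALLMARK_EXAMPLE_SET = c("geneX", "geneY"))

# Append the two custom sets to the collection
gene_sets <- c(hallmark, new_sets)
names(gene_sets)
```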
In this part, I am only using gene expression, because IGHV and methylation cluster would always be selected and would downplay the importance of the other gene expression features.
TTT: Patients with lowZAP70/lowCD38 expression show the best prognosis, while highZAP70/highCD38 patients show the worst prognosis.
OS: The trend is similar to that for TTT, but the separation is less clear.
TTT
OS
TTT
OS
Time to treatment: The highCD38/highZAP70 group remains significant after adjusting for the other factors in the model.
Time to treatment
Prepare table for test
Function for performing test
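A testing function of this kind might look like the sketch below. It assumes one Welch t-test per genomic feature (mutated versus wildtype) followed by Benjamini-Hochberg adjustment, with `meanDiff` taken as mutated minus wildtype. The function name, the `min_n` filter, and the toy data are hypothetical.

```r
# Hypothetical helper: test expression against each 0/1 genomic feature
test_associations <- function(expr, mut, min_n = 3) {
  rows <- lapply(names(mut), function(g) {
    status   <- mut[[g]]
    mutated  <- expr[which(status == 1)]
    wildtype <- expr[which(status == 0)]
    # Skip features with too few samples in either group (assumed filter)
    if (length(mutated) < min_n || length(wildtype) < min_n) return(NULL)
    data.frame(
      gene     = g,
      meanDiff = mean(mutated) - mean(wildtype),
      p.value  = t.test(mutated, wildtype)$p.value
    )
  })
  res <- do.call(rbind, rows)
  # Benjamini-Hochberg adjustment across all tested features
  res$p.adj <- p.adjust(res$p.value, method = "BH")
  res[order(res$p.value), ]
}

# Toy data: featA shifts expression, featB does not
set.seed(1)
mut <- data.frame(featA = rep(c(0, 1), each = 30),
                  featB = rep(c(0, 1), times = 30))
expr <- rnorm(60, mean = 8) + 3 * mut$featA

res <- test_associations(expr, mut)
res[res$p.adj < 0.1, ]  # associations passing 10% FDR
```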
Associations (10% FDR)
## # A tibble: 6 x 4
## gene meanDiff p.value p.adj
## <chr> <dbl> <dbl> <dbl>
## 1 del11q 1.70 0.0000834 0.00309
## 2 del13q -1.19 0.000342 0.00543
## 3 trisomy12 1.65 0.000440 0.00543
## 4 BRAF 2.29 0.000938 0.00868
## 5 NOTCH1 1.36 0.00690 0.0511
## 6 ATM 1.46 0.0118 0.0729
Mutation numbers
## gene IGHV mutation status number
## 1 del11q M 0 108
## 2 del11q U 0 67
## 3 del11q M 1 5
## 4 del11q U 1 28
## 5 del13q M 0 36
## 6 del13q U 0 38
## 7 del13q M 1 77
## 8 del13q U 1 57
## 9 trisomy12 M 0 99
## 10 trisomy12 U 0 82
## 11 trisomy12 M 1 14
## 12 trisomy12 U 1 13
## 13 ATM M 0 106
## 14 ATM U 0 80
## 15 ATM M 1 5
## 16 ATM U 1 13
## 17 BRAF M 0 111
## 18 BRAF U 0 85
## 19 BRAF M 1 2
## 20 BRAF U 1 9
## 21 NOTCH1 M 0 91
## 22 NOTCH1 U 0 58
## 23 NOTCH1 M 1 5
## 24 NOTCH1 U 1 18
The first column is the gene name, the second column is the IGHV status, the third column is the mutation status of that gene (1 is mutated, 0 is wildtype), and the fourth column is the number of samples with that combination of IGHV status and mutation status.
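A count table with these columns can be produced with base R's `table()`. The sketch below uses a made-up sample annotation, and the column names `status` and `number` are assumptions.

```r
# Toy sample annotation: IGHV status and del11q call per sample
samples <- data.frame(
  IGHV   = c("M", "M", "M", "U", "U", "U"),
  del11q = c(0, 0, 1, 1, 1, 0)
)

# Cross-tabulate IGHV status against mutation status, then flatten
# the contingency table into one row per combination
counts <- as.data.frame(
  table(IGHV = samples$IGHV, status = samples$del11q),
  responseName = "number"
)
counts$gene <- "del11q"
counts[, c("gene", "IGHV", "status", "number")]
```

Small group sizes in such a table (for example fewer than five mutated samples) are a reason to read the corresponding p-values with caution.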
Boxplots of significant associations
Boxplots of significant associations, stratified by IGHV
Associations (10% FDR)
## # A tibble: 4 x 4
## gene meanDiff p.value p.adj
## <chr> <dbl> <dbl> <dbl>
## 1 trisomy12 2.82 0.00000123 0.0000247
## 2 del1q 4.05 0.0000479 0.000479
## 3 trisomy19 3.00 0.00111 0.00741
## 4 del13q -1.27 0.00267 0.0133
Boxplots of significant associations
## gene IGHV mutation status number
## 1 del13q M 0 36
## 2 del13q M 1 77
## 3 del1q M 0 105
## 4 del1q M 1 4
## 5 trisomy12 M 0 99
## 6 trisomy12 M 1 14
## 7 trisomy19 M 0 105
## 8 trisomy19 M 1 5
Associations (10% FDR)
## # A tibble: 1 x 4
## gene meanDiff p.value p.adj
## <chr> <dbl> <dbl> <dbl>
## 1 MED12 -2.38 0.00229 0.0756
## gene IGHV mutation status number
## 1 MED12 U 0 85
## 2 MED12 U 1 8
Boxplots of significant associations
Associations (10% FDR)
## # A tibble: 11 x 4
## gene meanDiff p.value p.adj
## <chr> <dbl> <dbl> <dbl>
## 1 TP53 1.22 0.0000485 0.00179
## 2 del11q 1.01 0.000896 0.0133
## 3 NOTCH1 1.18 0.00108 0.0133
## 4 MED12 1.56 0.00328 0.0304
## 5 trisomy19 -1.93 0.00948 0.0606
## 6 ZMYM3 1.93 0.00983 0.0606
## 7 BRAF 1.19 0.0142 0.0750
## 8 del13q -0.546 0.0201 0.0928
## 9 CSMD3 -1.68 0.0243 0.0990
## 10 NFKBIE 1.49 0.0283 0.0990
## 11 DDX3X 1.62 0.0294 0.0990
## gene IGHV mutation status number
## 1 del11q M 0 108
## 2 del11q U 0 67
## 3 del11q M 1 5
## 4 del11q U 1 28
## 5 del13q M 0 36
## 6 del13q U 0 38
## 7 del13q M 1 77
## 8 del13q U 1 57
## 9 trisomy19 M 0 105
## 10 trisomy19 U 0 93
## 11 trisomy19 M 1 5
## 12 trisomy19 U 1 0
## 13 BRAF M 0 111
## 14 BRAF U 0 85
## 15 BRAF M 1 2
## 16 BRAF U 1 9
## 17 CSMD3 M 0 106
## 18 CSMD3 U 0 93
## 19 CSMD3 M 1 5
## 20 CSMD3 U 1 0
## 21 DDX3X M 0 111
## 22 DDX3X U 0 88
## 23 DDX3X M 1 0
## 24 DDX3X U 1 5
## 25 MED12 M 0 110
## 26 MED12 U 0 85
## 27 MED12 M 1 1
## 28 MED12 U 1 8
## 29 NFKBIE M 0 110
## 30 NFKBIE U 0 88
## 31 NFKBIE M 1 1
## 32 NFKBIE U 1 5
## 33 NOTCH1 M 0 91
## 34 NOTCH1 U 0 58
## 35 NOTCH1 M 1 5
## 36 NOTCH1 U 1 18
## 37 TP53 M 0 104
## 38 TP53 U 0 68
## 39 TP53 M 1 9
## 40 TP53 U 1 26
## 41 ZMYM3 M 0 106
## 42 ZMYM3 U 0 89
## 43 ZMYM3 M 1 1
## 44 ZMYM3 U 1 4
Boxplots of significant associations
Boxplots of significant associations, stratified by IGHV
Associations (10% FDR)
## # A tibble: 2 x 4
## gene meanDiff p.value p.adj
## <chr> <dbl> <dbl> <dbl>
## 1 ATM 1.57 0.0191 0.227
## 2 FBXW7 1.53 0.0227 0.227
No associations pass the 10% FDR cutoff.
## gene IGHV mutation status number
## 1 ATM M 0 106
## 2 ATM M 1 5
## 3 FBXW7 M 0 106
## 4 FBXW7 M 1 5
Associations (10% FDR)
## # A tibble: 1 x 4
## gene meanDiff p.value p.adj
## <chr> <dbl> <dbl> <dbl>
## 1 U1 -0.945 0.00475 0.157
No associations pass the 10% FDR cutoff.
## gene IGHV mutation status number
## 1 U1 U 0 81
## 2 U1 U 1 9
ANOVA test
## # A tibble: 2 x 2
## # Groups: gene [2]
## gene p.value
## <chr> <dbl>
## 1 CD38 5.59e-264
## 2 ZAP70 0.
Boxplot
CD38 is not detected in the updated proteomic dataset
As in the transcriptomic analysis, the variance of ZAP70 expression is higher in M-CLL than in U-CLL.
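If the difference in variance needs to be quantified, one option is an F test with base R's `var.test()`. This is only a sketch on simulated values and is not part of the analysis above; the group sizes and standard deviations are assumptions.

```r
# Simulated ZAP70 abundance: wider spread in M-CLL than in U-CLL
set.seed(1)
zap70_M <- rnorm(50, mean = 11, sd = 2.0)
zap70_U <- rnorm(50, mean = 13, sd = 0.5)

# F test for equality of variances (assumes approximately normal data)
vt <- var.test(zap70_M, zap70_U)

vt$estimate  # ratio of variances, M over U
vt$p.value
```

The F test is sensitive to departures from normality, so a robust alternative may be preferable for real data.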
Hallmark gene sets (10% FDR)
KEGG gene sets (10% FDR)
GO BP gene sets (10% FDR)
Hallmark gene sets (10% FDR)
KEGG gene sets (5% FDR)
GO BP gene sets (5% FDR)