Analysis of the proteomic data from the spinal cord injury cohort

Last updated: 2024-05-17


Knit directory: SpinalCord_proteomics/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

Repository version: no version control


This project is not being versioned with Git. To obtain the full reproducibility benefits of using workflowr, please see ?wflow_start.

Section 1: Preprocessing and Quality Control of the proteomics data
This analysis describes the preprocessing and quality control steps for the CSF proteomic data. Overall, the data are of good quality but contain some potential technical noise (unwanted variation), which can be reduced using statistical methods.
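The section does not say which statistical method is used to reduce the noise. As one illustration only (an assumption, not necessarily the method applied here), the following minimal Python sketch median-centers log2 intensities per sample. This removes differences in overall signal between samples, a common kind of technical variation. It assumes complete data with no missing values; the sample and protein IDs are hypothetical.

```python
import math
from statistics import median

def median_center_log2(intensities):
    """Log2-transform and median-center each sample.

    intensities: dict mapping sample ID -> dict of protein ID -> raw intensity (> 0).
    Returns log2 values shifted so that every sample's median equals the
    median of the per-sample medians. Assumes no missing values.
    Assumption: median centering is an illustrative choice, not necessarily
    the method used in this analysis.
    """
    logged = {
        sample: {protein: math.log2(value) for protein, value in values.items()}
        for sample, values in intensities.items()
    }
    sample_medians = {sample: median(values.values()) for sample, values in logged.items()}
    target = median(sample_medians.values())
    return {
        sample: {p: x - sample_medians[sample] + target for p, x in values.items()}
        for sample, values in logged.items()
    }

# Hypothetical toy data: sample "S2" has a uniformly 2x higher signal than "S1".
raw = {
    "S1": {"P1": 100.0, "P2": 200.0, "P3": 400.0},
    "S2": {"P1": 200.0, "P2": 400.0, "P3": 800.0},
}
normalized = median_center_log2(raw)
```

Because the only difference between the two toy samples is a global scaling factor, they end up with identical values for every protein after centering.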
Section 2: Investigate the difference between healthy samples and samples with injury (at baseline)
In this analysis, samples from patients with injury (at visit 3) are compared with healthy samples to identify proteins that are differentially expressed between healthy controls and patients with injury. This analysis can be considered a biological quality control (QC) of the proteomic data: known markers of spinal cord injury in CSF should be identified among the differentially expressed proteins.
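A minimal Python sketch of a per-protein comparison between injured and healthy samples, assuming log2-transformed intensities with no missing values. It uses Welch's t statistic as a simple stand-in: the test used in this analysis is not stated, and proteomics workflows often use moderated statistics (for example limma) instead. Two-sided p-values could then be obtained from a t distribution, e.g. with `scipy.stats.ttest_ind(a, b, equal_var=False)`. The protein IDs and values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (denominator n - 1).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

def differential_expression(injury, healthy):
    """Per-protein log2 fold change (injury minus healthy) and Welch t statistic.

    injury, healthy: dict mapping protein ID -> list of log2 intensities.
    On log2-scale data, a difference of group means is a log2 fold change.
    """
    results = {}
    for protein in injury:
        t, df = welch_t(injury[protein], healthy[protein])
        log2_fc = mean(injury[protein]) - mean(healthy[protein])
        results[protein] = {"log2_fc": log2_fc, "t": t, "df": df}
    return results

# Hypothetical toy data for a single protein.
injury = {"P1": [10.0, 11.0, 12.0]}
healthy = {"P1": [8.0, 9.0, 10.0]}
results = differential_expression(injury, healthy)
```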
Section 3: Identify proteins associated with the random node group in baseline samples (visit 3)
This analysis aims to identify proteins that are differentially expressed between node group B (random nodes 10, 16, 17, 18) and node group A (random nodes 4, 5, 8, 9, 13) in baseline samples. It may give insight into how the two groups differ at the CSF proteomic level.
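Because every protein is tested separately when comparing group B with group A, the resulting p-values are usually corrected for multiple testing. The correction used here is not stated; as an illustration, a minimal Python sketch of the Benjamini-Hochberg false discovery rate adjustment, which gives the same values as R's `p.adjust(p, method = "BH")`. The input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    # Starting at 1.0 caps every adjusted value at 1.
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, taking a running
    # minimum of p * m / rank so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four proteins.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```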
Section 4: Time-series analysis on the treated group
This analysis aims to identify proteins whose expression changes over time. It also tries to identify proteins whose expression over time follows different patterns in patients with better recovery (high delta_UEMS) compared with worse recovery (low delta_UEMS), or in random node group B compared with group A.
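One simple way to summarise each patient's trajectory, shown here as an assumed simplification (mixed-effects or limma-style longitudinal models are common alternatives, and the model used in this analysis is not stated), is the least-squares slope of a protein's log2 expression against visit number. Slopes can then be averaged within the high and low delta_UEMS groups. A minimal Python sketch; the patient IDs, group labels, and values are hypothetical.

```python
from statistics import mean

def ols_slope(times, values):
    """Least-squares slope of values regressed on times."""
    t_bar, y_bar = mean(times), mean(values)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx

def mean_slope_by_group(trajectories, groups):
    """Average per-patient slope within each group.

    trajectories: dict patient ID -> list of (visit, log2 expression) pairs,
        with at least two distinct visits per patient.
    groups: dict patient ID -> group label, e.g. "high" / "low" delta_UEMS.
    """
    slopes_by_group = {}
    for patient, points in trajectories.items():
        visits, values = zip(*points)
        slopes_by_group.setdefault(groups[patient], []).append(ols_slope(visits, values))
    return {label: mean(slopes) for label, slopes in slopes_by_group.items()}

# Hypothetical toy data for one protein at visits 3 and 8.
trajectories = {
    "pt1": [(3, 10.0), (8, 15.0)],
    "pt2": [(3, 10.0), (8, 12.5)],
    "pt3": [(3, 9.0), (8, 9.0)],
}
groups = {"pt1": "high", "pt2": "high", "pt3": "low"}
group_slopes = mean_slope_by_group(trajectories, groups)
```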
Section 5: Time-series analysis on the placebo group
The same analysis as section 4, but on the placebo group.
Section 6: Time-series analysis comparing the treated and placebo groups
This analysis aims to identify proteins whose change in expression over time follows a different pattern in the treated versus the placebo group, either in all samples or only in samples from random node group B. The results can help identify treatment-specific effects. However, only the drug NG101 appears to be clearly different. This may be expected, because the direct treatment effect is largely masked by the effect of recovery over time, which is also observed in the untreated group.
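The section does not state the statistical model used to compare the two arms. As a simple illustration under stated assumptions, a two-sided permutation test asks whether the average per-patient change in one protein differs between treated and placebo patients more than expected by chance. The per-patient change summary (e.g. visit 8 minus visit 3, or a fitted slope) and the toy values are hypothetical; passing only node group B patients gives the subgroup version.

```python
import random
from statistics import mean

def permutation_test(treated, placebo, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    treated, placebo: per-patient summaries of change over time for one protein.
    Returns (observed difference in means, permutation p-value).
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(placebo)
    pooled = list(treated) + list(placebo)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels while keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical per-patient changes (visit 8 minus visit 3) for one protein.
treated_change = [2.0, 2.5, 3.0, 2.2]
placebo_change = [0.1, -0.2, 0.3, 0.0]
observed_diff, p_value = permutation_test(treated_change, placebo_change, n_permutations=2000)
```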
Section 7: Identify proteins that are associated with outcome, delta_UEMS
This analysis aims to identify proteins whose expression at individual time points, or whose change in expression between time points (expression at visit 8 minus expression at visit 3), is associated with the outcome (delta_UEMS) in either the placebo or the treated group. This analysis can identify candidate features for predictive multivariate machine learning models. By comparing the treated and untreated groups (at the end of section 7), it can also give insight into potential drug-specific effects.
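A minimal Python sketch of the kind of univariate screen described above, assuming Pearson correlation as the association measure (the measure actually used is not stated). For each protein it builds three candidate features (visit 3 expression, visit 8 expression, and their difference) and ranks them by absolute correlation with delta_UEMS. The protein ID and values are hypothetical, and patients must appear in the same order in every list.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)

def screen_features(expr_v3, expr_v8, delta_uems):
    """Correlate visit 3, visit 8 and change (visit 8 minus visit 3) with delta_UEMS.

    expr_v3, expr_v8: dict protein ID -> list of log2 expression, one value
        per patient, in the same patient order as delta_uems.
    Returns (protein, feature, r) tuples sorted by decreasing |r|.
    """
    rows = []
    for protein in expr_v3:
        change = [v8 - v3 for v3, v8 in zip(expr_v3[protein], expr_v8[protein])]
        for feature, values in (("visit3", expr_v3[protein]),
                                ("visit8", expr_v8[protein]),
                                ("change", change)):
            rows.append((protein, feature, pearson(values, delta_uems)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)

# Hypothetical toy data: four patients, one protein.
expr_v3 = {"P1": [10.0, 11.0, 10.5, 11.5]}
expr_v8 = {"P1": [11.0, 13.0, 13.5, 15.5]}
delta_uems = [2.0, 5.0, 8.0, 11.0]
ranking = screen_features(expr_v3, expr_v8, delta_uems)
```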
Section 8: Identify proteins whose expression changes between visit 3 and visit 8 are directly associated with the UEMS change between visit 3 and visit 8
In this analysis, changes in protein expression between visit 3 and visit 8 are correlated with changes in UEMS between visit 3 and visit 8. This analysis may help identify proteins that are directly related to recovery in either the placebo or the treated group.
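As an illustration of correlating the two change scores, a minimal Python sketch of Spearman rank correlation (Pearson correlation of ranks, with tied values given their average rank). Spearman is an assumed choice here because it is robust to outliers and non-linear monotone relationships; the correlation method used in this analysis is not stated. `scipy.stats.spearmanr` computes the same coefficient together with a p-value. The values below are hypothetical.

```python
import math
from statistics import mean

def average_ranks(values):
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-patient changes (visit 8 minus visit 3) for one protein and for UEMS.
protein_change = [0.2, 1.1, 0.5, 1.8, 0.9]
uems_change = [3, 12, 5, 20, 9]
rho = spearman(protein_change, uems_change)
```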
Section 9: Build machine learning models to predict treatment outcome (delta_UEMS)
In this analysis, different machine learning models are built to predict the outcome, delta_UEMS, from CSF proteomics. The proteomic models are also compared with the clinical parameters, to show that proteomics can predict outcome better than current clinical parameters or add information to them.
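The algorithms used in this section are not specified, so the following is only a minimal Python sketch of the comparison idea under stated assumptions: each model is a single-predictor least-squares fit, and both are scored by k-fold cross-validated RMSE of delta_UEMS on held-out patients. The feature names (`protein`, `clinical`) and the toy data are hypothetical; real proteomic models would typically be multivariate and would shuffle patients before assigning folds.

```python
import math
from statistics import mean

def fit_univariate_ols(x, y):
    """Fit y = a + b * x by least squares and return a prediction function."""
    x_bar, y_bar = mean(x), mean(y)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    a = y_bar - b * x_bar
    return lambda new_x: a + b * new_x

def cross_validated_rmse(x, y, k=5):
    """k-fold cross-validated RMSE of a univariate least-squares model.

    Patients are assigned to folds round-robin (index % k); shuffle the
    input beforehand if its order carries structure.
    """
    n = len(y)
    squared_errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        predict = fit_univariate_ols([x[i] for i in train_idx], [y[i] for i in train_idx])
        squared_errors.extend((predict(x[i]) - y[i]) ** 2 for i in test_idx)
    return math.sqrt(mean(squared_errors))

# Hypothetical toy data: delta_UEMS is a linear function of the protein feature,
# while the clinical feature only alternates between two values.
protein = [float(i) for i in range(10)]
clinical = [float(i % 2) for i in range(10)]
delta_uems = [2.0 * p + 1.0 for p in protein]
protein_rmse = cross_validated_rmse(protein, delta_uems)
clinical_rmse = cross_validated_rmse(clinical, delta_uems)
```

Comparing the two cross-validated errors, rather than in-sample fit, is what allows a fair statement about which feature set predicts outcome better on new patients.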
Section 10: Build machine learning models to predict treatment outcome (delta_UEMS), only in random node group B
The same analysis as section 9, but only for treated patients from node group B.