# More mice

Load the mice data using the following commands:

mice_pheno <- read.csv(file = url("https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/mice_pheno.csv"))
mice_pheno$Bodyweight <- as.numeric(mice_pheno$Bodyweight)

1. Visualize the bodyweights of control and high-fat mice within the females.
2. Is there a significant difference in diets for the female mice?
3. Is the difference between the two diets more pronounced in male or female mice? How do you quantify that?
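
One possible way to approach these questions is sketched below. To keep the example runnable offline, it uses a small simulated data frame, `toy_mice`, rather than the real data. All group means and the standard deviation are invented for illustration, and the column names (`Sex`, `Diet`, `Bodyweight`) and levels (`F`/`M`, `chow`/`hf`) are assumed to mirror `mice_pheno`. The interaction term in the last step is one of several reasonable ways to quantify a sex-specific diet effect, not the only one.

```r
set.seed(1)

# Simulated stand-in for mice_pheno (all numbers are invented for illustration)
toy_mice <- data.frame(
  Sex  = rep(c("F", "M"), each = 40),
  Diet = rep(rep(c("chow", "hf"), each = 20), times = 2)
)
group_mean <- c(F.chow = 24, F.hf = 26, M.chow = 31, M.hf = 35)
toy_mice$Bodyweight <- rnorm(
  nrow(toy_mice),
  mean = group_mean[paste(toy_mice$Sex, toy_mice$Diet, sep = ".")],
  sd = 3
)

# 1. Visualize bodyweight by diet within the females
females <- subset(toy_mice, Sex == "F")
boxplot(Bodyweight ~ Diet, data = females, ylab = "Bodyweight")

# 2. Two-sample t-test (Welch by default) for the diet effect in females
tt_female <- t.test(Bodyweight ~ Diet, data = females)
tt_female$p.value

# 3. One way to compare the diet effect between sexes:
#    the Sex:Diet interaction term of a linear model
fit <- lm(Bodyweight ~ Sex * Diet, data = toy_mice)
coef(summary(fit))["SexM:Diethf", ]
```

With the real data, replacing `toy_mice` by `mice_pheno` should work as long as the column names match; rows with missing bodyweights are dropped by `t.test` and `lm` by default.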

# ELISA revisited

Consider the ELISA exercise from day 2 (example from MSMB).
Suppose now we have a known false positive rate of 1%. This is the probability of declaring a hit – we think we have an epitope – when there is none.

1. Can you think of a test that answers the question: What is the probability of seeing counts as large as 7 (7 out of the 50 patients had a hit), when there really is no epitope at this position?

2. What is problematic about reporting the obtained p-value for the epitope detected at position 42? Discuss in your breakout room.
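
For the first question, one natural candidate is a one-sided exact binomial test: if there is no epitope and each of the 50 patients yields a false positive independently with probability 0.01, the number of hits follows a Binomial(50, 0.01) distribution. The sketch below computes the tail probability directly and via `binom.test`; the variable names are illustrative, and independence between patients is an assumption.

```r
n_patients <- 50    # patients tested at this position
n_hits     <- 7     # observed hits
fp_rate    <- 0.01  # known false positive rate

# P(X >= 7) for X ~ Binomial(50, 0.01)
p_tail <- pbinom(n_hits - 1, size = n_patients, prob = fp_rate,
                 lower.tail = FALSE)

# The same probability as the p-value of a one-sided exact binomial test
bt <- binom.test(n_hits, n_patients, p = fp_rate, alternative = "greater")

p_tail
bt$p.value
```

Both values should agree. Note that this calculation treats the position as fixed before looking at the data.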

# ALL data

This exercise is adapted from Bernd Klaus’ teaching materials.

The ALL data consist of microarrays from 128 different individuals with acute lymphoblastic leukemia (ALL). There are 95 samples with B-cell ALL and 33 with T-cell ALL. Because these are different tissues and quite different diseases, we consider them separately and focus on the B-cell ALL tumors. An interesting subset, with approximately the same number of samples in each group, is the comparison of the B-cell tumors found to carry the BCR/ABL mutation to those B-cell tumors with no observed cytogenetic abnormalities. These samples are labeled BCR/ABL and NEG in the mol.biol variable.

The BCR/ABL mutation, also known as the Philadelphia chromosome, was the first cytogenetic aberration that could be associated with the development of cancer, leading the way to the current understanding of the disease. In tumors harboring the BCR/ABL translocation, a short piece of chromosome 22 is exchanged with a segment of chromosome 9. As a consequence, a constitutively active fusion protein is produced, which acts as a potent mitogen and leads to uncontrolled cell division. Not all leukemia tumors carry the Philadelphia chromosome; there are other mutations that can be responsible for neoplastic alterations of blood cells, for instance a translocation between chromosomes 4 and 11 (ALL1/AF4).

In this exercise we want to test whether the expression of the BCL2 gene differs between B-cell samples with and without the BCR/ABL fusion.

Install the ALL package:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ALL")

The following code loads the data and subsets them:

library(ALL)
data(ALL)

# Subset to B-cells
my_ALL <- ALL[, substr(ALL$BT, 1, 1) == "B"]

# only consider BCR/ABL and NEG cases
my_ALL <- my_ALL[, my_ALL$mol.biol %in% c("BCR/ABL", "NEG")]

# turn molecular biology into a factor (factor() also drops unused levels)
my_ALL$mol.biol <- factor(my_ALL$mol.biol)

# These are the expression values:
expr_data <- exprs(my_ALL)

# extract expression values of the BCL2 gene (row 1152) for the two groups:
neg <- expr_data[1152, my_ALL$mol.biol == "NEG"]
bcrabl <- expr_data[1152, my_ALL$mol.biol == "BCR/ABL"]

The two vectors neg and bcrabl contain the BCL2 expression values of the negative and BCR/ABL tumors, respectively.

1. Compare the expression of the two mol.biol types (i.e. neg and bcrabl) visually.
2. Compare them with a t-test. What is your conclusion?
3. By default, t.test performs a Welch test. What does that mean, and should you change this default option for this particular example?
4. What are the other outputs of t.test?
5. Is the assumption of normality justified?
6. Try a Wilcoxon test instead. Is the outcome similar or different? Does that surprise you?
7. Which of the two tests would you choose to report for a publication?
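
The mechanics of these comparisons can be sketched without the ALL data. The block below uses two simulated vectors, `toy_neg` and `toy_bcrabl`, as hypothetical stand-ins for `neg` and `bcrabl`; their sample sizes, means and spreads are arbitrary assumptions and say nothing about the real BCL2 values. It shows how to switch between the Welch and the equal-variance t-test, how to inspect the returned object, and how to run a rough normality check and a Wilcoxon test.

```r
set.seed(42)

# Hypothetical stand-ins for neg and bcrabl (values are simulated)
toy_neg    <- rnorm(40, mean = 5.0, sd = 0.4)
toy_bcrabl <- rnorm(35, mean = 5.3, sd = 0.6)

# Welch test (default): does not assume equal variances
welch <- t.test(toy_neg, toy_bcrabl)

# Student's t-test: assumes equal variances in both groups
student <- t.test(toy_neg, toy_bcrabl, var.equal = TRUE)

# A t.test result is a list of class "htest"; names() shows its components
names(welch)
welch$conf.int   # confidence interval for the difference in means
welch$estimate   # the two group means

# Shapiro-Wilk test as one rough check of normality per group
shapiro.test(toy_neg)$p.value
shapiro.test(toy_bcrabl)$p.value

# Rank-based alternative that does not assume normality
wilcox <- wilcox.test(toy_neg, toy_bcrabl)
wilcox$p.value
```

The same calls should apply to `neg` and `bcrabl` once the ALL data are loaded; a Q-Q plot via `qqnorm` and `qqline` is a common visual complement to the Shapiro-Wilk test.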

# Outliers

This example is from a genomics lecture by Rafael Irizarry.

Get an impression of how the t-test and the Wilcoxon test cope with outliers:

1. Create two random variables (vectors) x and y with 25 standard-normally distributed data points each.
2. Should x and y have different means? Confirm with a t-test and Wilcoxon test.
3. Include some outliers: Replace two elements of the vector x with the values 5 and 7.
4. Run the two tests again. How do they change with the outliers? Why? Which test is more suitable?
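
The steps above could be set up along the following lines. This is a minimal sketch: the seed and the choice of which two elements to replace are arbitrary assumptions, and the resulting p-values will differ with other seeds, so it is worth repeating the experiment a few times.

```r
set.seed(123)

# 1. Two samples from the same standard normal distribution
x <- rnorm(25)
y <- rnorm(25)

# 2. Both tests on the clean data
p_clean <- c(t = t.test(x, y)$p.value,
             wilcoxon = wilcox.test(x, y)$p.value)

# 3. Replace two (arbitrarily chosen) elements of x with outliers
x_out <- x
x_out[c(1, 2)] <- c(5, 7)

# 4. Both tests again on the contaminated data
p_outlier <- c(t = t.test(x_out, y)$p.value,
               wilcoxon = wilcox.test(x_out, y)$p.value)

# Side-by-side comparison of the p-values
rbind(clean = p_clean, with_outliers = p_outlier)
```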