The data visualization cheat sheet will be very useful for today’s exercises.

1 Old Faithful

Look at the data on eruptions of the Old Faithful geyser:

str(faithful)
## 'data.frame':    272 obs. of  2 variables:
##  $ eruptions: num  3.6 1.8 3.33 2.28 4.53 ...
##  $ waiting  : num  79 54 74 62 85 55 88 85 51 85 ...
  1. Make a scatter plot of the eruption duration (eruptions) against the waiting time until the next eruption (waiting).
  2. Use geom_smooth to fit a straight line through the scatter plot.
  3. Do you think your data is sufficiently described by the line? What could be missing?
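
One possible starting point for the steps above, as a minimal sketch. It assumes that ggplot2 is installed; putting waiting on the x-axis and the object name p_faithful are choices made here, not something the exercise prescribes.

```r
library(ggplot2)

# faithful ships with base R (datasets package), so no extra loading is needed
p_faithful <- ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_point() +
  # method = "lm" fits a straight line; se = FALSE hides the confidence band
  geom_smooth(method = "lm", se = FALSE)

p_faithful
```

Looking at where the points cluster relative to the fitted line is a good way to approach the third question.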

2 Penguins

Load the penguins data that we have seen in the demo:

library(palmerpenguins)
str(penguins)
## tibble[,8] [344 × 8] (S3: tbl_df/tbl/data.frame)
##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
  1. Plot the body mass against the year (ignoring other variables). Can you see a trend?
  2. Now make the same plot, but color by species. Does that change your interpretation?
  3. Do the penguins’ bill lengths differ between females and males? Make a plot that allows you to answer that question for each species separately.
  4. Make a histogram and a density plot of the bill length of all Gentoo penguins. Is it apparent from these plots that you plotted a mixture of male and female penguins?
  5. Can you tweak the plot to show the histograms/densities of both female and male bill lengths in the same plot?
  6. What is the average bill length of male penguins in each of the three species?
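
The later steps of this list could be approached along the following lines. This is a sketch under assumptions, not the intended solution: the helper objects peng, gentoo and male_means are names introduced here, rows with missing sex or bill length are simply dropped, and base R's aggregate is used for the averages although other tools would work equally well.

```r
library(ggplot2)
library(palmerpenguins)

# Drop rows with unknown sex or missing bill length so they do not form their own group
peng <- subset(penguins, !is.na(sex) & !is.na(bill_length_mm))

# Bill length by sex, one panel per species
ggplot(peng, aes(x = sex, y = bill_length_mm)) +
  geom_boxplot() +
  facet_wrap(~ species)

# Overlaid densities of female and male bill lengths for Gentoo penguins
gentoo <- subset(peng, species == "Gentoo")
ggplot(gentoo, aes(x = bill_length_mm, fill = sex)) +
  geom_density(alpha = 0.5)

# Mean bill length of male penguins, one row per species
male_means <- aggregate(bill_length_mm ~ species,
                        data = subset(peng, sex == "male"),
                        FUN = mean)
male_means
```

Swapping geom_density(alpha = 0.5) for geom_histogram(position = "identity", alpha = 0.5) gives the corresponding overlaid histograms.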

3 Quality control for expression data

We are going to look at the airway data set. This data set contains read counts per gene for airway smooth muscle cell lines in an RNA-Seq experiment. The aim of this experiment was to compare the treatment with dex (dexamethasone, a drug which is used to treat asthma) to a control (no treatment). Suppose we want to check the quality of our data. We want to test whether the genes show reproducible expression behavior in two replicates of the same condition.

The following code assigns the read counts of all tested genes in two replicates of dex treatment to two vectors, rep1 and rep2:

library(airway)
data("airway")
my_data <- assay(airway)
rep1 <- my_data[,2]
rep2 <- my_data[,4]
  1. Compare the two replicates visually.
  2. Make a histogram of the counts. Is that what you expected from your visual comparison? If not, try adjusting your plot to make it more informative.
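
One way to set up both plots is sketched below. It assumes that ggplot2 is installed alongside airway; the data frame reps, the log scale, and the offset of 1 (so that genes with zero counts are not dropped) are choices made for this illustration only.

```r
library(airway)
library(ggplot2)

data("airway")
my_data <- assay(airway)

# Collect both dex-treated replicates in one data frame, one row per gene
reps <- data.frame(rep1 = my_data[, 2], rep2 = my_data[, 4])

# Scatter plot of replicate 1 against replicate 2 on log-scaled axes
ggplot(reps, aes(x = rep1 + 1, y = rep2 + 1)) +
  geom_point(alpha = 0.2) +
  scale_x_log10() +
  scale_y_log10() +
  # Points on this diagonal have identical counts in both replicates
  geom_abline(slope = 1, intercept = 0, colour = "red")

# Histogram of the log-transformed counts of one replicate
ggplot(reps, aes(x = log10(rep1 + 1))) +
  geom_histogram(bins = 50)
```

Comparing these plots with their untransformed versions shows why the scale matters for count data.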

4 Bonus – Tissue-specific gene expression data

Load the following gene expression data:

library(dslabs)
data("tissue_gene_expression")
tissue <- tissue_gene_expression$y
expression <- tissue_gene_expression$x
  1. Look at the dimensions of the tissue and expression objects. What do the rows and columns (or elements) represent in each object? How are the two objects connected to each other?
  2. Where is the expression for the gene called “FLI1” stored?
  3. Set up a data.frame with two columns: the tissue and expression values for all measurements on the gene “FLI1”.
  4. Plot the expression of “FLI1” across tissues in a way that you consider informative.
  5. What does ggbeeswarm::geom_beeswarm do, and could it be useful for your plot?
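
A possible outline for the steps above is sketched here. It assumes that the genes are stored as named columns of the expression matrix and that the ggplot2 and ggbeeswarm packages are installed; the data frame name fli1 is introduced for this illustration.

```r
library(dslabs)
library(ggplot2)

data("tissue_gene_expression")
tissue <- tissue_gene_expression$y
expression <- tissue_gene_expression$x

# Assumption: genes are columns, so one gene is selected by its column name
fli1 <- data.frame(tissue = tissue, expression = expression[, "FLI1"])

# Box plot per tissue with the individual measurements drawn on top
ggplot(fli1, aes(x = tissue, y = expression)) +
  # Hide box plot outliers because every point is drawn by the beeswarm layer
  geom_boxplot(outlier.shape = NA) +
  ggbeeswarm::geom_beeswarm()
```

If ggbeeswarm is not available, geom_jitter(width = 0.2) from ggplot2 is a simpler alternative for showing the individual points.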