1 Required packages and other preparations


# Summary helper: the mean as a single point (ymin = y = ymax)
fD <- function(x){
    tmp <- mean(x, na.rm = TRUE)
    data.frame(ymax = tmp, y = tmp, ymin = tmp)
}

# Summary helper: the mean, used for bar heights
fBar <- function(x){
    tmp <- mean(x, na.rm = TRUE)
    data.frame(ymax = tmp, y = tmp, ymin = tmp)
}

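# Summary helper: the mean with a 95% confidence interval (t distribution)
fCI <- function(x){
    n <- length(na.exclude(x))               # number of non-missing values
    m <- mean(x, na.rm = TRUE)
    std <- sqrt(var(x, na.rm = TRUE))
    se <- std / sqrt(n)                      # standard error of the mean
    ci <- qt(0.975, max(n - 1, 1)) * se
    data.frame(ymax = m + ci, y = m, ymin = m - ci)
}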

# Summary helper: the mean with the central 95% range of a normal distribution
fSD <- function(x){
    m <- mean(x, na.rm = TRUE)
    std <- sqrt(var(x, na.rm = TRUE))
    data.frame(ymax = m + qnorm(0.975)*std, 
               y = m, 
               ymin = m + qnorm(0.025)*std)
}

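# Summary helper: the mean with the interquartile range
fIQR <- function(x){
    m <- mean(x, na.rm = TRUE)
    #m <- median(x, na.rm = TRUE)
    qU <- quantile(x, 3/4, na.rm = TRUE)     # upper quartile
    qL <- quantile(x, 1/4, na.rm = TRUE)     # lower quartile
    data.frame(ymax = qU, 
               y = m, 
               ymin = qL)
}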

fBarC <- function(x){
    m <- mean(x, na.rm = TRUE)
    count <- m
}

colPalette <- c("WT"     = "#1B9E77",
                "A_MT_1" = "#D95F02",
                "A_MT_2" = "#7570B3",
                "A_MT_3" = "#7570B3",
                "MT_1"   = "#66A61E",
                "MT_2"   = "#E6AB02",
                "MT_3"   = "#A6761D")

library(beeswarm)  # provides beeswarm()

beeCoordinates <- function(x){
    x <- as.numeric(x)
    res <- beeswarm(data.frame(x = x), do.plot = TRUE)$x
}

2 Graphics in R

There are (at least) two types of data visualization. The first enables a scientist to effectively explore data and make discoveries about the complex processes at work. The other type of visualization provides informative, clear and visually attractive illustrations of her results that she can show to others and eventually include in a publication.

Both of these types of visualizations can be made with R. In fact, R offers multiple graphics systems. This is because R is extensible, and because progress in R graphics has proceeded largely not by replacing the old functions, but by adding packages. Each of the different graphics systems has its advantages and limitations. In the following we’ll use two of them. First, we take a cursory look at the base R plotting functions. (They live in the graphics package, which ships with every basic R installation.) Subsequently we will switch to ggplot2.

Base R graphics came first historically: they are simple, procedural, and canvas-oriented. There are specialized functions for different types of plots. These are easy to call, but when you want to combine them to build up more complex plots, or exchange one for another, this quickly gets messy to program, or even impossible. The user plots directly onto a (conceptual) canvas. She explicitly needs to deal with decisions such as how many inches to allocate to margins, axis labels, titles, legends, and subpanels; once something is “plotted” it cannot easily be moved or erased.
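
To make the canvas idea concrete, here is a minimal sketch using the built-in DNase data. The margin sizes, colors, and the choice to highlight run 1 are arbitrary assumptions made for illustration only.

```r
# Canvas-style plotting in base R: set up the page, draw, then draw on top.
op <- par(mai = c(0.9, 0.9, 0.5, 0.2))     # margins in inches: bottom, left, top, right

# The first call creates the plot region and draws all points
plot(DNase$conc, DNase$density,
     xlab = "conc", ylab = "density", main = "DNase ELISA")

# Later calls add to the existing canvas; nothing already drawn can be moved
run1 <- DNase[as.character(DNase$Run) == "1", ]
points(run1$conc, run1$density, pch = 19, col = "red")
legend("bottomright", legend = c("all runs", "run 1"),
       pch = c(1, 19), col = c("black", "red"))

par(op)                                    # restore the previous settings
```

Each step is a separate command that modifies the same canvas, so changing the plot type or the layout typically means rewriting and rerunning the whole sequence.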

There is a more high-level approach: in the grammar of graphics, graphics are built up from modular logical pieces, so that we can easily try different visualization types for our data in a way that is intuitive and easy to decipher, much as we can swap parts of a sentence in and out in human language. There is no concept of a canvas or a plotter; rather, the user gives ggplot2 a high-level description of the plot she wants, in the form of an R object, and the rendering engine takes a holistic view of the scene to lay out the graphics and render them on the output device.
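
As a rough illustration of this modularity, the sketch below describes a plot of the built-in DNase data as an R object and then swaps one layer for another. It assumes the ggplot2 package is installed; the object names are chosen here for illustration.

```r
library(ggplot2)

# The data and the mapping of variables to axes, without any drawing yet
p <- ggplot(DNase, aes(x = conc, y = density))

# Two alternative "sentences" built from the same base object
p_points <- p + geom_point()                 # scatterplot
p_lines  <- p + geom_line(aes(group = Run))  # one line per assay run

# Rendering happens only when the object is printed
print(p_points)
```

Because the plot is an ordinary R object, it can be stored, modified, and rendered later, which is what makes exchanging one visualization type for another so cheap.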

We’ll explore faceting, for showing more than two variables at a time. Sometimes this is also called lattice graphics. The first major R package to implement it was lattice; nowadays much of this functionality is also provided by ggplot2, which allows us to visualize data in up to four or five dimensions.
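
A minimal sketch of faceting with ggplot2 is shown here, again on the built-in DNase data and assuming ggplot2 is installed. The third variable, Run, is shown by splitting the plot into one panel per assay run.

```r
library(ggplot2)

# conc and density go on the axes; Run defines the panels
p_facets <- ggplot(DNase, aes(x = conc, y = density)) +
  geom_point() +
  facet_wrap(~ Run)

print(p_facets)
```

Further variables could be mapped to aesthetics such as color or shape, which is how a single figure can carry four or five dimensions.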

3 Base R plotting

The most basic function is plot. In the code below, it is used to plot data from an enzyme-linked immunosorbent assay (ELISA). The assay was used to quantify the activity of the enzyme deoxyribonuclease (DNase), which degrades DNA. The data are assembled in the R object DNase, which conveniently comes with base R. DNase is a data frame whose columns are Run, the assay run; conc, the protein concentration that was used; and density, the measured optical density.

head(DNase)

   Grouped Data: density ~ conc | Run
     Run   conc density
   1   1 0.0488   0.017
   2   1 0.0488   0.018
   3   1 0.1953   0.121
   4   1 0.1953   0.124
   5   1 0.3906   0.206
   6   1 0.3906   0.215

plot(DNase$conc, DNase$density)