``````library(fitdistrplus)
library(vcd)
library(tidyverse)``````

# Distribution functions and random numbers in R

R includes a whole range of distributions; the help page `?Distributions` lists those that ship with the `stats` package.

Let's, for example, look at the help for the Gaussian functions:

``?dnorm``

We see that there are four different calls:
- `dnorm`: density
- `pnorm`: cumulative distribution function (probability of a value less than or equal to a given value)
- `qnorm`: quantile function (inverse of cumulative distribution)
- `rnorm`: generates random numbers

These four functions are available for most of the distributions. The first letter specifies whether we want the density (or probability mass function, `d`), the cumulative distribution function (`p`), the quantile function (`q`), or random numbers (`r`). The suffix specifies the distribution.

The arguments depend on the distribution we are looking at, but always include the parameters of that distribution.
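As a quick illustration of the four prefixes, here is a minimal sketch using the standard normal distribution. The specific values (0, 1.96) and the variable names `p_value` and `q_value` are chosen only for illustration.

```r
# d: density of the standard normal at its mean, equal to 1 / sqrt(2 * pi)
dnorm(0, mean = 0, sd = 1)

# p: cumulative probability P(X <= 1.96), roughly 0.975
p_value <- pnorm(1.96, mean = 0, sd = 1)

# q: the quantile function inverts the cumulative distribution function
q_value <- qnorm(p_value, mean = 0, sd = 1)
all.equal(q_value, 1.96)

# r: three random draws (values change from run to run without a seed)
rnorm(3, mean = 0, sd = 1)
```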

## Probability

The functions starting with `d` give us densities / probabilities for certain values to occur in a distribution that we specify ourselves.

### Binomial distribution

Let's use the example where we catch 10 frogs and count how many of them are light-colored. For known parameters, we can calculate the probability of counting exactly 5 light-colored frogs:

``````n = 10 # number of frogs we catch
p = 0.3 # true fraction of light frogs
dbinom(x=5, size=n, prob=p)``````
``## [1] 0.1029193``
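The value returned by `dbinom` can be checked against the binomial formula \(\binom{n}{x} p^x (1-p)^{n-x}\). A minimal sketch, where `x` and `by_hand` are illustrative variable names that do not appear elsewhere in this tutorial:

```r
n <- 10   # number of frogs we catch
p <- 0.3  # true fraction of light frogs
x <- 5    # number of light frogs we ask about

# Binomial probability mass function written out by hand
by_hand <- choose(n, x) * p^x * (1 - p)^(n - x)

# Should agree with the built-in function
all.equal(by_hand, dbinom(x = x, size = n, prob = p))
```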

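We can also ask for the probability of catching at most 5 light frogs, or more than 5. For this we need the cumulative distribution function, whose name starts with `p`. Note that `lower.tail=FALSE` returns \(P(X > q)\), so the probability of catching at least 5 frogs would use `q=4`: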

``pbinom(q=5, size=n, prob=p) # at most``
``## [1] 0.952651``
``pbinom(q=5, size=n,prob=p, lower.tail=FALSE) # larger than``
``## [1] 0.04734899``

Catching more than 5 light frogs is a rare event.
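Because `lower.tail=FALSE` returns \(P(X > q)\), the boundary value matters for discrete distributions. The sketch below contrasts "more than 5" with "at least 5" and cross-checks the latter by summing the probability mass function; the variable names are illustrative.

```r
n <- 10   # number of frogs we catch
p <- 0.3  # true fraction of light frogs

# P(X > 5): strictly more than 5 light frogs
more_than_5 <- pbinom(q = 5, size = n, prob = p, lower.tail = FALSE)

# P(X >= 5): at least 5 light frogs, so the upper tail starts above 4
at_least_5 <- pbinom(q = 4, size = n, prob = p, lower.tail = FALSE)

# The same quantity obtained by summing the mass function over 5, 6, ..., 10
at_least_5_sum <- sum(dbinom(x = 5:n, size = n, prob = p))

c(more_than_5 = more_than_5, at_least_5 = at_least_5)
all.equal(at_least_5, at_least_5_sum)
```

The two answers differ by exactly the probability of catching 5 light frogs, which is why it pays to be explicit about whether the boundary is included.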

Discrete random variables have probability mass functions, which are only defined for integer values:

``dpois(1.5, lambda=4)``
``## Warning in dpois(1.5, lambda = 4): non-integer x = 1.500000``
``## [1] 0``

The returned probability is \(0\), and R issues a warning: a sign that the calculation may not make sense.
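At integer values the Poisson mass function behaves as expected. A small sketch, assuming the same `lambda = 4` as above; the upper limit of 100 is an arbitrary cutoff beyond which the remaining probability is negligible.

```r
lambda <- 4

# Probabilities of observing 0, 1, ..., 5 events
dpois(0:5, lambda = lambda)

# Summed over (practically) all possible counts, the probabilities add up to 1
total <- sum(dpois(0:100, lambda = lambda))
all.equal(total, 1)
```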

## Random numbers

You can draw random numbers from a certain distribution. We use this a lot for simulating random processes. If we do so, it's advised to set a seed for reproducibility:

``set.seed(59)``
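To see what the seed does, the following sketch draws the same five Poisson counts twice. The names `first_draw` and `second_draw` are illustrative; the final `set.seed(59)` only puts the generator back into the state this tutorial starts from.

```r
set.seed(59)
first_draw <- rpois(n = 5, lambda = 4)

set.seed(59)  # resetting the seed restarts the random number stream
second_draw <- rpois(n = 5, lambda = 4)

identical(first_draw, second_draw)  # same seed, same numbers

set.seed(59)  # reset once more so later code starts from the seed set above
```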

Now we can simulate frog counts:

``````frog_counts <- rpois(n = 200, lambda = 4)
head(frog_counts) # show the first six simulated counts``````
``## [1] 1 4 6 5 4 6``

Let's also simulate an experiment where each frog count was done in a different lake.

For this we use the gamma-Poisson distribution, which R provides as the negative binomial distribution (`nbinom`):

``frog_counts_different_lakes <- rnbinom(n=200, size=2, mu=4)   # the smaller size, the more spread out``
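The effect of `size` on the spread can be made concrete without drawing any random numbers. In R's `mu` parametrization the negative binomial variance is `mu + mu^2 / size`, whereas a Poisson distribution has variance equal to its mean. The sketch below computes both variances directly from the mass functions; the cutoff of 1000 and the variable names are illustrative assumptions.

```r
mu <- 4
size <- 2
x <- 0:1000  # arbitrary cutoff; probabilities beyond it are negligible

# Variance as E[X^2] - E[X]^2, computed from the probability mass functions
pois_var <- sum(x^2 * dpois(x, lambda = mu)) -
  sum(x * dpois(x, lambda = mu))^2
nb_var <- sum(x^2 * dnbinom(x, size = size, mu = mu)) -
  sum(x * dnbinom(x, size = size, mu = mu))^2

pois_var  # equals mu for the Poisson distribution
nb_var    # equals mu + mu^2 / size for the gamma-Poisson distribution
```

With the same mean, the gamma-Poisson counts are more variable, and lowering `size` increases the `mu^2 / size` term further.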

Exercise: Simulate 200 frog sizes that follow a Gaussian distribution with an average of 7 cm, and a standard deviation of 2 cm.

Solution:

``````frog_sizes <- rnorm(n = 200, mean = 7, sd = 2)
head(frog_sizes) # show the first six simulated sizes``````
``## [1] 8.281580 8.384858 8.695852 7.667760 9.216826 7.191485``

# Histogram

If you draw random numbers from a distribution, their histogram takes on a shape that is characteristic of that distribution:

``````data.frame(frog_counts) %>%
ggplot(aes(x=frog_counts))+
geom_histogram(binwidth=1)``````

Exercise: Draw a histogram of the frog sizes:

``````data.frame(frog_sizes) %>%
ggplot(aes(x=frog_sizes))+
geom_histogram()``````
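To compare a histogram with the distribution it was drawn from, one option is to rescale the bars to densities and overlay the theoretical curve. This is a sketch rather than part of the original exercise: `sim_sizes` is a freshly simulated stand-in for `frog_sizes`, the bin width of 0.5 is an arbitrary choice, and `after_stat()` assumes ggplot2 version 3.3.0 or later.

```r
library(ggplot2)

set.seed(59)
sim_sizes <- rnorm(n = 200, mean = 7, sd = 2)  # illustrative stand-in for frog_sizes

hist_plot <- ggplot(data.frame(sim_sizes), aes(x = sim_sizes)) +
  # scale bar heights to densities so they are comparable to dnorm
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5) +
  # theoretical density of the distribution we simulated from
  stat_function(fun = dnorm, args = list(mean = 7, sd = 2))

hist_plot
```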

# Cumulative distribution function

The cumulative distribution function is the integral of the density (or the running sum of the normalized histogram): it gives you the fraction of values that are less than or equal to a certain value.
We can visualize the empirical version with `stat_ecdf` in `ggplot2`.

``````data.frame(frog_sizes) %>%
ggplot(aes(x=frog_sizes))+
stat_ecdf()``````
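The curve drawn by `stat_ecdf` can also be evaluated numerically with the base R function `ecdf`. The sketch below compares the empirical fraction of frogs of at most 7 cm with the theoretical value from `pnorm`; `sim_sizes` is again a freshly simulated stand-in for `frog_sizes`, and `empirical_cdf` is an illustrative name.

```r
set.seed(59)
sim_sizes <- rnorm(n = 200, mean = 7, sd = 2)  # illustrative stand-in for frog_sizes

# ecdf() returns a function giving the fraction of observations <= x
empirical_cdf <- ecdf(sim_sizes)

empirical_cdf(7)            # fraction of simulated frogs of at most 7 cm
pnorm(7, mean = 7, sd = 2)  # theoretical value, exactly 0.5 at the mean

# The same empirical fraction computed by hand
mean(sim_sizes <= 7)
```

With 200 observations the empirical value should land close to, but usually not exactly on, the theoretical 0.5.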