Fundamentals of Probability: Introduction to Probability

I. Introduction:

Main Idea:

After introducing distributions as mappings from the possible values of a random variable to probabilities, here we will take a tour of different distributions. This is meant to be a slightly deeper glance while focusing on the concepts behind the distributions. Distributions are often split between discrete and continuous. Discrete refers to random variables taking individually separable values, e.g. integer values. Continuous refers to random variables taking any value within a range.

Largely, the expectations or distributions we are trying to estimate are intractable, and so approximation is usually the best method toward inference.

Distributions are incredibly diverse and fall into different classifications, but they all share the same rule: the probabilities of a distribution must sum (or, for continuous distributions, integrate) to 1.
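
To make this rule concrete, here is a small numeric sketch (not from the original notes, assuming Python with NumPy is available): a fair die as a discrete example, and a simple triangular density as a continuous one.

```python
import numpy as np

# Discrete: a fair six-sided die, p(X = i) = 1/6 for i = 1..6.
die_pmf = np.full(6, 1 / 6)
print(die_pmf.sum())                   # -> 1.0

# Continuous: a triangular density on [0, 2] that peaks at x = 1.
xs = np.linspace(0.0, 2.0, 100_001)
density = np.where(xs <= 1.0, xs, 2.0 - xs)
print(np.trapz(density, xs))           # numeric integral -> ~1.0
```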


II. Theory Pt.1:

Distributions: Discrete Distributions

Discrete distributions can be described as mapping a probability to each value of a discrete random variable. A discrete random variable takes countable values, such as integers, so a discrete distribution maps a probability onto each of these values.

1. Bernoulli Distribution:

The Bernoulli Distribution is based on the idea of a Bernoulli trial, which is: "a random experiment with exactly two possible outcomes, 'success' and 'failure', in which the probability of success is the same every time the experiment is conducted". Moreover, the Bernoulli distribution maps the outcomes from a Bernoulli trial to a probability, where 'success' (or 1) is given the probability \( p \), and 'failure' (or 0) is given the probability \( 1-p \), so

\begin{align} X \sim \text{Bern}(p) \end{align}
\begin{align} p(X=x) = \begin{cases} p, & \text{if } x=1 \\ 1-p, & \text{if } x=0 \\ \end{cases} \end{align}

Notation: p is just the variable that represents the probability of getting a success, which is \( p(X=1) = p \). The support of X is \( \{0, 1\} \), and \( p \in [0, 1] \). We can calculate the expectation and variance of the Bernoulli distribution as

\begin{align} \mathbb{E}[X] &= p \\ \text{Var}[X] &= p (1-p) \end{align}
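
As a quick sanity check, here is a minimal sketch (assuming a Python/NumPy environment; the value of p is an arbitrary choice) that simulates Bernoulli draws and compares the sample mean and variance against the formulas above.

```python
import numpy as np

p = 0.3
rng = np.random.default_rng(0)
samples = rng.random(100_000) < p      # each draw is True with probability p

print(samples.mean())                  # ~ 0.3   (E[X] = p)
print(samples.var())                   # ~ 0.21  (Var[X] = p(1-p))
```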

2. Geometric Distribution:

The Geometric Distribution describes sequences of Bernoulli trials in which the k-th trial is the first success. Because the trials are independent, this is simply the probability of observing k-1 failures followed by a single success on trial k.

\begin{align} X \sim \text{Geo}(p) \end{align}
\begin{align} p(X=k ; p) &= (1-p)^{k-1} p \end{align}

Notation: k refers to the number of the trial on which the first success occurs, i.e. the length of the sequence ending in that success. The support of k is \( \{1, 2, \dots \} \).

\begin{align} \mathbb{E}[X] &= \frac{1}{p} \\ \text{Var}[X] &= \frac{1-p}{p^2} \end{align}
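
A minimal sketch (not part of the original notes; plain Python/NumPy, with p chosen arbitrarily) that samples the index of the first success directly from Bernoulli trials and checks it against the mean \( 1/p \) and the pmf \( (1-p)^{k-1} p \).

```python
import numpy as np

p = 0.25
rng = np.random.default_rng(0)

def first_success(rng, p):
    """Run Bernoulli(p) trials until the first success; return its index k."""
    k = 1
    while rng.random() >= p:           # failure with probability 1 - p
        k += 1
    return k

draws = np.array([first_success(rng, p) for _ in range(50_000)])
print(draws.mean())                            # ~ 1/p = 4
print((draws == 3).mean(), (1 - p)**2 * p)     # empirical vs. (1-p)^{k-1} p at k = 3
```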

3. Binomial Distribution:

The Binomial Distribution describes "the probability of getting exactly k successes in n independent Bernoulli trials". In other words, it counts how many of the n trials are successes, regardless of where in the sequence those successes occur. The Binomial is a generalization of the Bernoulli distribution, which is the special case \( n=1 \).

\begin{align} X &\sim \text{Bin}(n,p) \end{align}
\begin{align} p(X=k; n,p) &= {n \choose k}p^k(1-p)^{n-k} \\ {n \choose k} &= \frac{n!}{k! \; (n-k)!} \end{align}

Notation: k is the number of successes in a sequence, n is the length of each sequence (the number of trials), and p is the probability that a single Bernoulli trial results in success. The support of k is \( k \in \{0, 1, \dots, n\} \), the support of n is \( n \in \{1, 2, \dots\} \), and the support of p is \( p \in [0, 1] \).

\begin{align} \mathbb{E}[X] &= np \\ \text{Var}[X] &= np(1-p) \end{align}

The Binomial Coefficient, \( n \choose k \), represents the number of distinct k-element subsets we can form from the elements of the original sequence of length n. Order does not matter, so selections with the same values in different positions are considered equal: \( \{ a,b,c \} = \{ b,c,a \} = \dots \).

To understand this better, we can look at \( \binom{n}{n} \), i.e. \( k=n \). In that case we can form only a single subset, since every other combination contains the same values merely rearranged, and rearrangements of the same values count as one selection; hence \( \binom{n}{n} = 1 \).

Mathematically, \( n \choose k \) can be expressed as the number of ordered arrangements of the n elements, \( n! \), divided by both the number of orderings within the chosen k elements, \( k! \), and the number of orderings within the remaining \( n-k \) elements, \( (n-k)! \). \begin{align} {n \choose k} = \frac{n!}{k! \; (n-k)!} \end{align}
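
To tie these pieces together, here is a short sketch (an illustration assuming Python with NumPy; n, p, and k are arbitrary choices): the binomial coefficient from `math.comb` builds the Bin(n, p) pmf, which agrees with a Monte Carlo estimate from raw Bernoulli trials.

```python
import math
import numpy as np

n, p, k = 10, 0.4, 3
pmf = math.comb(n, k) * p**k * (1 - p)**(n - k)   # exact p(X = k)

rng = np.random.default_rng(0)
successes = (rng.random((200_000, n)) < p).sum(axis=1)   # n Bernoulli trials per row

print(pmf)                         # ~ 0.215
print((successes == k).mean())     # simulated estimate, should be close
```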


4. Multinomial Distribution:

The Multinomial distribution describes the counts of successes across k categories after n independent trials. In other words, "for n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories." For example, it models the probability of the counts for each side of a k-sided die rolled n times.
The Multinomial distribution is seen as a generalization of the Binomial distribution since the Binomial distribution is a special case of the Multinomial distribution. The Multinomial distribution reduces to the Binomial distribution when the number of categories is two, \( k=2 \), and the number of trials is greater than or equal to one, \( n \geq 1 \). It reduces to the Bernoulli distribution when the number of categories is two, \( k=2 \), and the number of trials is one, \( n=1 \).

\begin{align} X \sim \text{Multinomial}(n, p_1, \dots, p_k) \end{align}
\begin{align} f(x_1, \dots, x_k; n, p_1, \dots, p_k)= \begin{cases} \frac{n!}{x_1! \dots x_k!}p_1^{x_1} \dots p_k^{x_k},& \text{when } \sum_{i=1}^k x_i = n\\ 0, & \text{otherwise} \end{cases} \end{align}

Notation: k refers to the number of categories in X, n refers to the number of trials, and \( p_i \) is the probability of success for the i-th category, with \( \sum_{i=1}^k p_i = 1 \). The support of each count is \( x_i \in \{ 0, 1, \dots, n \} \) subject to \( \sum_{i=1}^k x_i = n \), and the support for i is \( i \in \{ 1, \dots, k \} \).

\begin{align} \mathbb{E}[X_i] &= np_i \\ \text{Var}[X_i] &= np_i(1-p_i) \end{align}
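
Continuing the die example, here is a minimal sketch (assuming a Python/NumPy environment; the number of rolls is an arbitrary choice): a fair six-sided die rolled n times is Multinomial with \( p_i = 1/6 \), and each per-category count matches the moments above.

```python
import numpy as np

n, k = 60, 6
p = np.full(k, 1 / k)                          # fair die: p_i = 1/6
rng = np.random.default_rng(0)

counts = rng.multinomial(n, p, size=100_000)   # one row of k counts per experiment
print(counts.mean(axis=0))     # ~ n * p_i = 10 for every face
print(counts.var(axis=0))      # ~ n * p_i * (1 - p_i) ~= 8.33
```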

5. Poisson Distribution:

The Poisson distribution describes the number of successes k that occur in a fixed interval, where \( \lambda \) is the average number of occurrences in that interval. It is related to the Binomial Distribution: if we take the limit as the number of trials n goes to infinity while each trial's probability of success becomes \( \lambda / n \), the Binomial distribution converges to the Poisson distribution.

\begin{align} X \sim \text{Poi}(\lambda) \end{align}
\begin{align} p(X=k; \lambda) &= \frac{\lambda^k e^{-\lambda}}{k!} \end{align}

Notation: k refers to the number of successes in an interval with average rate \( \lambda \). The support of \( \lambda \) is \( \lambda > 0 \), and the support for k is the natural numbers starting from zero, \( k \in \{ 0, 1, \dots \}\).

\begin{align} \mathbb{E}[X] &= \lambda \\ \text{Var}[X] &= \lambda \end{align}
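
As a small check of the limiting argument above (a sketch in plain Python, not from the original notes; \( \lambda \), n, and k are arbitrary choices), the Bin(n, \( \lambda/n \)) pmf approaches the Poisson(\( \lambda \)) pmf as n grows.

```python
import math

lam, k = 4.0, 2
poisson_pmf = lam**k * math.exp(-lam) / math.factorial(k)

for n in (10, 100, 10_000):
    p = lam / n
    binom_pmf = math.comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, binom_pmf)        # converges toward poisson_pmf as n grows

print(poisson_pmf)             # ~ 0.1465
```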

III. Theory Pt.2:

Distributions: Continuous Distributions

Continuous distributions map a probability density onto a range of values; probabilities are then obtained by integrating that density over an interval, so they create a probability distribution over a continuous range of values.

1. Exponential Distribution:

The Exponential distribution describes the probability of the time between events that occur continuously and independently at a constant average rate. Events that behave in this manner are described by a Poisson point process - a process in which events occur continuously and independently at a constant average rate.

The Exponential distribution is the continuous analogue of the Geometric distribution. A particular property of the Exponential distribution is that it is memoryless, which means the distribution of the remaining waiting time does not depend on how long we have already waited: \( p(X > s + t \mid X > s) = p(X > t) \).

\begin{align} X \sim \text{Exp}(\lambda) \end{align}
\begin{align} f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0\\ 0, & x < 0 \end{cases} \end{align}

Notation: \( \lambda \) is the rate parameter, the average number of events per unit of time. The support of x is \( x \in [0, \infty ) \), with \( \lambda > 0 \).

\begin{align} \mathbb{E}[X] &= \frac{1}{\lambda} \\ \text{Var}[X] &= \frac{1}{\lambda^2} \end{align}
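
Here is a minimal sketch of the memoryless property (assuming Python with NumPy; \( \lambda \), s, and t are arbitrary choices): the empirical estimate of \( p(X > s + t \mid X > s) \) should match \( p(X > t) = e^{-\lambda t} \).

```python
import numpy as np

lam, s, t = 0.5, 2.0, 3.0
rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / lam, size=1_000_000)   # NumPy parameterizes by scale = 1/lambda

cond = x[x > s]                                      # condition on surviving past s
print((cond > s + t).mean())                         # empirical p(X > s+t | X > s)
print(np.exp(-lam * t))                              # p(X > t), should agree (~0.223)
```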

2. Gamma Distribution:

The Gamma distribution is what's called a two-parameter family of probability distributions, which means it serves as a generalization of a family of distributions, including the Exponential and Chi-Squared distributions. The Gamma distribution has two parameters, a shape parameter \( \alpha= k \) and an inverse scale parameter \( \beta=\frac{1}{\theta} \), called a rate parameter.

\begin{align} X \sim \text{Gamma}(\alpha, \beta) \end{align}
\begin{align} f(x; \alpha, \beta) &= \frac{\beta^\alpha x^{\alpha -1} e^{-\beta x}}{\Gamma(\alpha)} \end{align}
\begin{align} \Gamma(n) = (n-1)! \quad \text{for positive integer } n \end{align}

Notation: The Gamma distribution is parameterized by \( \alpha= k \), the shape parameter, and the rate parameter \( \beta=\frac{1}{\theta} \), where k and \( \theta \) are the shape and scale parameters of the alternative parameterization. The support of x is \( x \in (0, \infty) \), with \( \alpha, \beta > 0 \).

\begin{align} \mathbb{E}[X] &= \frac{\alpha}{\beta} \\ \text{Var}[X] &= \frac{\alpha}{\beta^2} \end{align}

The Gamma Function generalizes the factorial to non-integer values. For positive integers it is defined as \begin{equation} \Gamma(n) = (n-1)! \end{equation} For non-integer values, the Gamma function is \begin{equation} \Gamma(z) = \int_{0}^{\infty} x^{z-1} e^{-x} dx \end{equation}
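
A quick sketch of this relationship in plain Python (not from the original notes): `math.gamma` implements the Gamma function, so it reproduces the factorial at integer arguments and also accepts non-integer arguments.

```python
import math

for n in (1, 2, 3, 4, 5):
    print(n, math.gamma(n), math.factorial(n - 1))   # Gamma(n) == (n-1)!

print(math.gamma(0.5), math.sqrt(math.pi))           # Gamma(1/2) = sqrt(pi) ~ 1.7725
```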


3. Uniform Distribution:

The Uniform distribution assigns equal probability density to every value in an interval \( [a, b] \): no value in the interval is more likely than any other, and values outside the interval have zero density.

\begin{align} X \sim \text{U}(a, b) \end{align}
\begin{align} f(x) = \begin{cases} \frac{1}{b-a}, & \text{for } a \leq x \leq b\\ 0, & \text{for } x < a, x > b \end{cases} \end{align}

Notation: The parameterization of the Uniform distribution is an interval between a and b. The support of a and b are: \( -\infty < a < b < \infty \).

\begin{align} \mathbb{E}[X] &= \frac{1}{2} (a+b) \\ \text{Var}[X] &= \frac{1}{12} (b-a)^2 \end{align}
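
A minimal sketch (assuming a Python/NumPy environment; a and b are arbitrary choices) comparing uniform samples against the mean and variance formulas above.

```python
import numpy as np

a, b = 2.0, 8.0
rng = np.random.default_rng(0)
x = rng.uniform(a, b, size=500_000)

print(x.mean(), (a + b) / 2)          # ~ 5.0
print(x.var(), (b - a) ** 2 / 12)     # ~ 3.0
```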

4. Beta Distribution:

The Beta distribution is another family of distributions, but this time it is defined on the interval \( [0,1] \), which makes it well suited to modelling quantities that are themselves probabilities.

\begin{align} X \sim \text{Beta}(\alpha, \beta) \end{align}
\begin{align} p(X=x; \alpha, \beta) &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} x^{\alpha - 1} (1-x)^{\beta - 1} \\ &= \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1-x)^{\beta - 1} \end{align}

Notation: The Beta distribution is parameterized by two shape parameters, \( \alpha > 0 \) and \( \beta > 0 \), and \( B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)} \) is the Beta function. The support of x is \( x \in [0, 1] \).

\begin{align} \mathbb{E}[X] &= \frac{\alpha}{\alpha + \beta} \\ \text{Var}[X] &= \frac{\alpha \beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} \end{align}
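
A small sketch (assuming Python with NumPy; \( \alpha \) and \( \beta \) are arbitrary choices): Beta samples stay inside \( [0, 1] \) and match the moments above.

```python
import numpy as np

alpha, beta = 2.0, 5.0
rng = np.random.default_rng(0)
x = rng.beta(alpha, beta, size=500_000)

print(x.min() >= 0.0 and x.max() <= 1.0)                            # support check
print(x.mean(), alpha / (alpha + beta))                             # ~ 0.2857
print(x.var(), alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1)))
```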

5. Gaussian Distribution:

The Gaussian (or Normal) distribution is one of the most important distributions because of its connection to the Central Limit Theorem: sums and averages of many independent random variables tend toward a Gaussian distribution.

\begin{align} X \sim \text{N}(\mu, \sigma^2) \end{align}
\begin{align} f(x; \mu, \sigma^2) &= \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\big(\frac{x-\mu}{\sigma}\big)^2} \end{align}

Notation: The Gaussian distribution is parameterized by its mean \( \mu \) and variance \( \sigma^2 \). The support of x is \( x \in \mathbb{R} \), with \( \mu \in \mathbb{R} \) and \( \sigma > 0 \).

\begin{align} \mathbb{E}[X] &= \mu \\ \text{Var}[X] &= \sigma^2 \end{align}
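
To illustrate the Central Limit Theorem connection, here is a minimal sketch (assuming a Python/NumPy environment; the numbers of terms and sums are arbitrary choices): standardized sums of i.i.d. uniform variables behave increasingly like a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_terms, n_sums = 50, 200_000

u = rng.uniform(0.0, 1.0, size=(n_sums, n_terms))
sums = u.sum(axis=1)
# Standardize: each uniform term has mean 1/2 and variance 1/12.
z = (sums - n_terms * 0.5) / np.sqrt(n_terms / 12)

print(z.mean(), z.var())             # ~ 0 and ~ 1
print((np.abs(z) < 1.96).mean())     # ~ 0.95, as for a standard normal
```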