Binomial Distribution

Bernoulli Distribution

PMF of Bernoulli: $f_W(w) = P(W = w) = \begin{cases} p & \text{if } w = 1 \\ 1 - p & \text{if } w = 0 \end{cases}$

Equivalently, in a single expression: $f_W(w) = p^w(1-p)^{1-w} = p^wq^{1-w}$ for $w = 0, 1$, where $q = 1 - p$
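
R Code (a minimal sketch of the formula above; the function name `bernoulli_pmf` and the value p = 0.3 are made up for illustration):

```r
# Bernoulli PMF written directly from the formula p^w * (1 - p)^(1 - w)
bernoulli_pmf <- function(w, p) {
  stopifnot(all(w %in% c(0, 1)), p >= 0, p <= 1)
  p^w * (1 - p)^(1 - w)
}

p <- 0.3                               # assumed success probability
pmf_formula <- bernoulli_pmf(c(0, 1), p)   # P(W = 0), P(W = 1)

# A Bernoulli trial is a binomial with size = 1, so dbinom should agree
pmf_dbinom <- dbinom(c(0, 1), size = 1, prob = p)

print(pmf_formula)
print(pmf_dbinom)
```

Both vectors should match, which also shows why the Bernoulli sits under the Binomial heading: it is the n = 1 special case.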

Random Sample

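A random sample and the population are two different things. Population quantities are fixed parameters (usually unknown); sample quantities are statistics computed from the observed data.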

For population:

  • $\mu = $ population mean of X
  • $M = $ population median of X
  • $\sigma^2 = $ population variance of X
  • $\sigma = $ population std of X

For random sample:

  • $\bar x = $ sample mean of data
  • $\tilde x = $ sample median of data
  • $s^2 = $ sample variance of data
  • $s = $ sample std of data
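
R Code (a small sketch of the sample statistics; the data vector `x` is made up):

```r
x <- c(4, 8, 6, 5, 3, 7)   # made-up sample of n = 6 observations

x_bar   <- mean(x)     # sample mean, 5.5 for this data
x_tilde <- median(x)   # sample median, 5.5 for this data
s2      <- var(x)      # sample variance (divides by n - 1), 3.5 for this data
s       <- sd(x)       # sample std, the square root of s2

print(c(mean = x_bar, median = x_tilde, var = s2, sd = s))
```

Note that `var` and `sd` compute the sample versions (dividing by n - 1), not the population versions.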

Poisson Distribution

Useful for modeling the number of events that occur in a fixed interval of time. Example: the number of text messages received per hour.

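Uses the probability mass function: $p(x;\mu) = P(X=x) = \frac{e^{-\mu}\mu^x}{x!}$ for $x = 0, 1, 2, \dots$, where $\mu$ is the mean number of events per interval (written $\lambda$ in the R code)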

R Code:

  • dpois($x, \lambda$) - probability mass function, $P(X = x)$
  • ppois($x, \lambda$) - cumulative distribution function, $P(X \le x)$: the sum of the PMF from 0 to $x$
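
Example (a sketch for the text message case; the rate of 4 messages per hour is an assumed value):

```r
lambda <- 4   # assumed mean number of text messages per hour
x <- 2

p_exact  <- dpois(x, lambda)       # P(X = 2)
p_atmost <- ppois(x, lambda)       # P(X <= 2)
p_more   <- 1 - ppois(x, lambda)   # P(X > 2), the complement

# Check dpois against the PMF formula e^(-lambda) * lambda^x / x!
p_formula <- exp(-lambda) * lambda^x / factorial(x)

print(c(p_exact, p_formula, p_atmost, p_more))
```

`ppois(x, lambda)` is the same as `sum(dpois(0:x, lambda))`, which is what "sum from 0 to x" means.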

Normal Distribution

R Code:

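  • pnorm($x, \mu, \sigma$) - cumulative distribution function, $P(X \le x)$: the area under the curve to the left of $x$, used to find a probability/percentage
  • qnorm($p, \mu, \sigma$) - inverse CDF (quantile function): given a left-tail probability $p$, returns the value $k$ (or $z$ when $\mu = 0$, $\sigma = 1$) with $P(X \le k) = p$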
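
Example (a sketch with assumed values $\mu = 100$ and $\sigma = 15$):

```r
mu <- 100     # assumed population mean
sigma <- 15   # assumed population std

p_below   <- pnorm(115, mu, sigma)                          # P(X <= 115)
p_between <- pnorm(115, mu, sigma) - pnorm(85, mu, sigma)   # P(85 <= X <= 115)
p_above   <- pnorm(115, mu, sigma, lower.tail = FALSE)      # P(X > 115)

# qnorm goes the other way: from a left-tail probability to a value k
k <- qnorm(0.90, mu, sigma)   # value with 90% of the area to its left

print(c(p_below, p_between, p_above, k))
```

Feeding `k` back into `pnorm(k, mu, sigma)` should return 0.90, since the two functions are inverses.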

Standardization converts a normal random variable $X \sim N(\mu, \sigma^2)$ into a standard normal random variable $Z \sim N(0, 1)$.

Standardization: $z^* = \frac{x^*-\mu}{\sigma}$ (z score)

De-standardization: $x^* = \mu + \sigma z^*$
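
R Code (a sketch of both directions, with assumed values $\mu = 100$, $\sigma = 15$, $x^* = 130$):

```r
mu <- 100      # assumed population mean
sigma <- 15    # assumed population std
x_star <- 130  # assumed observed value

z_star <- (x_star - mu) / sigma   # standardization, gives z = 2 here
x_back <- mu + sigma * z_star     # de-standardization, recovers 130

# The probability is the same on either scale
p_original <- pnorm(x_star, mu, sigma)   # P(X <= 130)
p_standard <- pnorm(z_star)              # P(Z <= 2)

print(c(z_star, x_back, p_original, p_standard))
```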

Central Limit Theorem

As a rule of thumb, when $n \geq 30$ the sampling distribution of $\bar X$ is approximately normal with $\mu_{\bar X} = \mu$ and $\sigma^2_{\bar X} = \frac{\sigma^2}{n}$, even if the population itself is not normal.

To find probabilities for $\bar X$ with the Central Limit Theorem, standardize the sample mean:

$Z = \frac{\bar X - \mu}{\sigma / \sqrt n}$
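
R Code (a sketch with assumed values $\mu = 50$, $\sigma = 12$, $n = 36$, finding $P(\bar X > 53)$):

```r
mu <- 50      # assumed population mean
sigma <- 12   # assumed population std
n <- 36       # sample size, at least 30
x_bar <- 53   # sample mean of interest

se <- sigma / sqrt(n)     # std of the sample mean, 2 here
z <- (x_bar - mu) / se    # standardized sample mean, 1.5 here

p_upper <- 1 - pnorm(z)   # P(Xbar > 53), about 0.0668

# Same answer without standardizing by hand
p_direct <- pnorm(x_bar, mean = mu, sd = se, lower.tail = FALSE)

print(c(se, z, p_upper, p_direct))
```

The key step is dividing by $\sigma / \sqrt n$ rather than $\sigma$, because the probability is about the sample mean, not a single observation.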

T-Distribution

Use when the population standard deviation is unknown (so $s$ is used in its place) and $n < 30$. The population is assumed to be approximately normal.

R Code:

  • qt($\alpha, df, lower.tail=F$) - critical value with area $\alpha$ to its right; $df = n - 1$ for a one-sample problem
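
Example (a sketch with assumed values $\alpha = 0.05$ and $n = 15$):

```r
alpha <- 0.05   # assumed significance level
n <- 15         # assumed sample size
df <- n - 1     # degrees of freedom for a one-sample problem

t_one <- qt(alpha, df, lower.tail = FALSE)       # one-sided critical value
t_two <- qt(alpha / 2, df, lower.tail = FALSE)   # two-sided critical value

# pt is the matching CDF; the upper-tail area at t_one should equal alpha
area_check <- pt(t_one, df, lower.tail = FALSE)

print(c(t_one, t_two, area_check))
```

For a two-sided test or confidence interval, split $\alpha$ across both tails and use $\alpha / 2$.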

Chi-Squared Distribution

Hypothesis Testing

  • Reject $H_0$: p-value < $\alpha$
  • Fail to reject $H_0$: p-value $\ge \alpha$
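
R Code (a sketch of the decision rule; the helper name `decide` and the default $\alpha = 0.05$ are made up for illustration):

```r
# Compare a p-value to the significance level alpha
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "Reject H0" else "Fail to reject H0"
}

print(decide(0.03))   # p-value below alpha
print(decide(0.05))   # p-value equal to alpha counts as fail to reject
print(decide(0.20))   # p-value above alpha
```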

Notation is as follows: $H_0: \mu = \mu_0$, where $\mu_0$ is the hypothesized value of the population mean

Confidence Intervals

Gosset’s Theorem

Useful when the population mean and population standard deviation are unknown.

$H_0: \mu = \mu_0$

Test Statistic: $t = \frac{\bar x - \mu_0}{s / \sqrt n}$, compared against a t-distribution with $n - 1$ degrees of freedom
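
R Code (a sketch of a two-sided one-sample t-test; the data vector `x` and $\mu_0 = 5$ are made up):

```r
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7)   # made-up sample
mu0 <- 5                                         # assumed hypothesized mean

n <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # test statistic from the formula
df <- n - 1

# Two-sided p-value: area in both tails beyond |t|
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Built-in one-sample t-test for comparison
res <- t.test(x, mu = mu0)

print(c(t_stat, p_value))
print(res)
```

The hand computation and `t.test` should report the same statistic and p-value; the output of `t.test` also includes a confidence interval for $\mu$.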

Difference Between S and Sigma

$\sigma$ is the population standard deviation; $s$ is the sample standard deviation, computed from the data and used to estimate $\sigma$.