Binomial Distribution
Bernoulli Distribution
PMF of Bernoulli: $f_W(w) = P(W = w) = \begin{cases} p & \text{if } w = 1 \\ 1 - p & \text{if } w = 0 \end{cases}$
Equivalently, $f_W(w) = p^w(1-p)^{1-w} = p^w q^{1-w}$ for $w = 0, 1$, where $q = 1 - p$.
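The one-line form $p^w q^{1-w}$ reproduces the two cases: plugging in $w = 1$ gives $p$, and plugging in $w = 0$ gives $q = 1 - p$. Here is a minimal Python sketch of that formula. `bernoulli_pmf` is a hypothetical helper name, and Python is only an illustrative choice, since the notes themselves use R.

```python
def bernoulli_pmf(w: int, p: float) -> float:
    """P(W = w) for W ~ Bernoulli(p), computed as p^w * q^(1 - w) with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if w not in (0, 1):
        return 0.0  # a Bernoulli variable only takes the values 0 and 1
    q = 1.0 - p
    return p**w * q**(1 - w)


if __name__ == "__main__":
    p = 0.3
    print(bernoulli_pmf(1, p))  # equals p
    print(bernoulli_pmf(0, p))  # equals 1 - p
```

The two probabilities always sum to 1, which is a quick sanity check on any PMF.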
Random Sample
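A random sample and the population it is drawn from are two different things. Population quantities are fixed parameters that are usually unknown. Sample quantities are statistics computed from the observed data.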
For the population:
- $\mu$ = population mean of X
- $M$ = population median of X
- $\sigma^2$ = population variance of X
- $\sigma$ = population standard deviation of X
For a random sample:
- $\bar x$ = sample mean of the data
- $\tilde x$ = sample median of the data
- $s^2$ = sample variance of the data
- $s$ = sample standard deviation of the data
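Here is a short Python sketch that computes the four sample statistics, using the standard-library `statistics` module. The data values are hypothetical. `statistics.variance` and `statistics.stdev` divide by $n - 1$, which matches $s^2$ and $s$. The population versions `statistics.pvariance` and `statistics.pstdev` divide by $n$ and only apply when the data is the whole population.

```python
import statistics

# Hypothetical observed data: one random sample of X
data = [4.0, 7.0, 5.0, 9.0, 5.0]

x_bar = statistics.mean(data)      # sample mean
x_tilde = statistics.median(data)  # sample median
s2 = statistics.variance(data)     # sample variance (denominator n - 1)
s = statistics.stdev(data)         # sample standard deviation, sqrt(s2)

print(x_bar, x_tilde, s2, s)
```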
Poisson Distribution
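Useful for modeling the number of events that occur in a fixed interval of time. Ex: the number of text messages received per hour.
Probability mass function: $p(x;\mu) = P(X=x) = \frac{e^{-\mu}\mu^x}{x!}$ for $x = 0, 1, 2, \dots$, where $\mu$ (called $\lambda$ in R's functions) is the mean number of events per interval.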
R Code:
- dpois($x, \lambda$) - probability mass function: $P(X = x)$
- ppois($x, \lambda$) - cumulative distribution function: $P(X \le x)$, the sum of the PMF from 0 to x
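Here is a minimal Python sketch of what dpois and ppois compute, written directly from the PMF above. `poisson_pmf` and `poisson_cdf` are hypothetical names, and the rate of 3 texts per hour is an assumed example value.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) = e^(-mu) * mu^x / x!  (the quantity R's dpois(x, mu) returns)."""
    if x < 0:
        return 0.0
    return math.exp(-mu) * mu**x / math.factorial(x)


def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x): the PMF summed from 0 to x  (the quantity R's ppois(x, mu) returns)."""
    return sum(poisson_pmf(k, mu) for k in range(0, x + 1))


if __name__ == "__main__":
    mu = 3.0  # assumed mean: 3 text messages per hour
    print(poisson_pmf(2, mu))  # P(exactly 2 texts in an hour)
    print(poisson_cdf(2, mu))  # P(at most 2 texts in an hour)
```

For example, $P(X \le 2) = e^{-3}(1 + 3 + 4.5) \approx 0.4232$ when $\mu = 3$.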
Normal Distribution
R Code:
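- pnorm($x, \mu, \sigma$) - cumulative distribution function: $P(X \le x)$, the area under the normal curve to the left of x (use 1 - pnorm(...) for the area to the right)
- qnorm($p, \mu, \sigma$) - inverse CDF (quantile function): returns the value k with $P(X \le k) = p$; with the defaults $\mu = 0, \sigma = 1$ it returns the z value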
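Python's standard library has rough equivalents of pnorm and qnorm in `statistics.NormalDist`, through its `cdf` and `inv_cdf` methods. The distribution $N(100, 15^2)$ in this sketch is a hypothetical example.

```python
from statistics import NormalDist

# Hypothetical example: X ~ N(mu = 100, sigma = 15)
X = NormalDist(mu=100, sigma=15)

area_left = X.cdf(130)      # like pnorm(130, 100, 15): P(X <= 130)
area_right = 1 - area_left  # like 1 - pnorm(130, 100, 15): P(X > 130)
k = X.inv_cdf(0.95)         # like qnorm(0.95, 100, 15): k with P(X <= k) = 0.95

z = NormalDist().inv_cdf(0.975)  # like qnorm(0.975): standard normal z value

print(area_left, area_right, k, z)
```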
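Standardization converts a normal variable $X \sim N(\mu, \sigma^2)$ into a standard normal variable $Z \sim N(0, 1)$. You can then find probabilities from the standard normal table, or from pnorm with its default arguments.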
Standardization: $z^* = \frac{x^*-\mu}{\sigma}$ (z score)
De-standardization: $x^* = \mu + \sigma z^*$
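Here is a small Python sketch of both formulas. It checks that a probability comes out the same whether you compute it on the original scale or on the z scale, and that de-standardizing recovers the original value. The helper names and the parameters $\mu = 70$, $\sigma = 8$ are hypothetical.

```python
from statistics import NormalDist


def standardize(x: float, mu: float, sigma: float) -> float:
    """z* = (x* - mu) / sigma."""
    return (x - mu) / sigma


def destandardize(z: float, mu: float, sigma: float) -> float:
    """x* = mu + sigma * z*."""
    return mu + sigma * z


mu, sigma = 70.0, 8.0  # hypothetical population parameters
z = standardize(82.0, mu, sigma)

# P(X <= 82) on the original scale equals P(Z <= z) on the standard scale
p_original = NormalDist(mu, sigma).cdf(82.0)
p_standard = NormalDist().cdf(z)

x_back = destandardize(z, mu, sigma)  # maps z back to 82.0
print(z, p_original, p_standard, x_back)
```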
Central Limit Theorem
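For a large sample (rule of thumb: $n \geq 30$), the sampling distribution of $\bar X$ is approximately normal, whatever the shape of the population. Its mean is $\mu_{\bar X} = \mu$ and its variance is $\sigma^2_{\bar X} = \frac{\sigma^2}{n}$.
To find probabilities about $\bar X$, standardize it using the standard error $\sigma / \sqrt n$: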
$Z = \frac{\bar X - \mu}{\sigma / \sqrt n}$
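Here is a Python sketch of the CLT calculation, plus a seeded simulation. The simulation draws samples from a skewed (exponential) population and checks that the sample means have mean close to $\mu$ and standard deviation close to $\sigma / \sqrt n$. All parameter values and function names are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def prob_mean_exceeds(c: float, mu: float, sigma: float, n: int) -> float:
    """Approximate P(X_bar > c) via the CLT, X_bar approx ~ N(mu, sigma^2 / n)."""
    se = sigma / math.sqrt(n)       # standard error of the mean
    z = (c - mu) / se               # Z = (X_bar - mu) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)  # upper-tail area


def simulate_sample_means(n: int, reps: int, seed: int = 0) -> list:
    """Means of `reps` samples of size n from an exponential population
    with mean 2 and standard deviation 2 (deliberately non-normal)."""
    rng = random.Random(seed)
    return [statistics.mean(rng.expovariate(0.5) for _ in range(n)) for _ in range(reps)]


if __name__ == "__main__":
    # Hypothetical: mu = 50, sigma = 10, n = 36, so se = 10 / 6 and z = 1.8 for c = 53
    print(prob_mean_exceeds(53, 50, 10, 36))

    means = simulate_sample_means(n=36, reps=5000)
    print(statistics.mean(means), statistics.stdev(means), 2 / math.sqrt(36))
```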
T-Distribution
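Use when the population standard deviation $\sigma$ is unknown and must be estimated by the sample standard deviation $s$, typically with a small sample ($n < 30$) from an approximately normal population. For a single mean, it has $n - 1$ degrees of freedom.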
R Code:
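- qt($\alpha$, df, lower.tail = FALSE) - upper-tail critical value $t_{\alpha, df}$, the value with area $\alpha$ to its right (for one mean, df $= n - 1$)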
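Python's standard library has no t-distribution. In practice you would use a library routine such as SciPy's `scipy.stats.t.isf(alpha, df)`. The sketch below shows what qt($\alpha$, df, lower.tail = FALSE) computes: it integrates the t density with Simpson's rule, then uses bisection to find the value with upper-tail area $\alpha$. The function names and the 2000-step grid are hypothetical choices for illustration, not an R or SciPy API.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: 0.5 minus the area on [0, t] (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # the density is symmetric, so P(T > 0) = 0.5


def t_critical(alpha: float, df: int) -> float:
    """Value t with P(T > t) = alpha, like R's qt(alpha, df, lower.tail = FALSE)."""
    if not 0 < alpha < 0.5:
        raise ValueError("this sketch only handles 0 < alpha < 0.5")
    lo, hi = 0.0, 1.0
    while upper_tail(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):                # bisection: upper_tail decreases as t grows
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # e.g. a 95% two-sided interval with n = 10 uses t_{0.025, 9}
    print(t_critical(0.025, 9))
```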
Chi-Squared Distribution
Hypothesis Testing
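- Reject $H_0$: p-value $< \alpha$
- Fail to reject $H_0$: p-value $\ge \alpha$
Notation: the null hypothesis is written $H_0: \mu = \mu_0$, and the alternative is $H_a: \mu \ne \mu_0$ (or $\mu < \mu_0$, or $\mu > \mu_0$).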
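Here is a minimal Python sketch of the decision rule. It is paired with one assumed way of getting a p-value: a two-sided z test of $H_0: \mu = \mu_0$ against $H_a: \mu \ne \mu_0$ with $\sigma$ known. The function names and the example numbers are hypothetical.

```python
from statistics import NormalDist


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 when p-value < alpha; otherwise fail to reject H0."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


def two_sided_z_p_value(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """p-value for H0: mu = mu0 vs Ha: mu != mu0, assuming sigma is known."""
    z = (x_bar - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails beyond |z|


if __name__ == "__main__":
    p = two_sided_z_p_value(x_bar=52, mu0=50, sigma=10, n=36)  # z = 1.2
    print(p, decide(p))
```

A p-value exactly equal to $\alpha$ leads to "fail to reject", which matches the $\ge$ in the rule.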
Confidence Intervals
Gosset’s Theorem
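Useful for testing a claim about the population mean when the population standard deviation is also unknown. If $X_1, \dots, X_n$ is a random sample from a normal population, then $T = \frac{\bar X - \mu}{S / \sqrt n}$ has a t-distribution with $n - 1$ degrees of freedom.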
$H_0: \mu = \mu_0$
Test Statistic: $t = \frac{\bar x - \mu_0}{s / \sqrt n}$, with $n - 1$ degrees of freedom
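Here is a Python sketch of the test statistic computed from raw data. `one_sample_t` is a hypothetical helper, and the data and $\mu_0 = 12$ are made-up values. The resulting t would then be compared with a t-distribution with $n - 1$ degrees of freedom, either through a critical value or a p-value.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t, df) with t = (x_bar - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample sd (n - 1 denominator) stands in for sigma
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    data = [12.0, 15.0, 11.0, 14.0, 13.0]  # hypothetical sample: x_bar = 13, s^2 = 2.5
    print(one_sample_t(data, mu0=12.0))
```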
Difference Between S and Sigma
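$\sigma$ is the population standard deviation, a fixed parameter that is usually unknown. $s$ is the sample standard deviation: it is computed from the data and used to estimate $\sigma$.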