$y_n = f(x_n)$ where $f$ is the function we assume exists and try to approximate.
$\hat{y}_n = \hat{f}(x_n)$ where $\hat{y}_n$ is the prediction and $\hat{f}$ is the model’s approximation.
When comparing candidate functions for a dataset, we look for the line that fits the data best.
- Ex: you would prefer one function over another if the data points lie closer to it.
What is overfitting?
Fits the training data REALLY well but predicts poorly on unknown data instances (it is tuned only to the training data it has been working with).
What is underfitting?
Fits the training data loosely–not complex enough (the line does not fit/overlap the points well).
What is the common technique for measuring error?
Squared loss: take the squared difference between the predicted and actual values. Common error metrics built from it (as shown in the sketch after this list):
- Squared Error (Quadratic Loss): $L(y_n, \hat{y}_n) = (y_n - \hat{y}_n)^2$
- Sum of Squared Errors: $\sum_{n=1}^{N} (y_n - \hat{y}_n)^2$
- Mean Squared Error: $\frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2$
- Mean Absolute Error (not a squared loss; it uses the absolute value instead): $\frac{1}{N} \sum_{n=1}^{N} |y_n - \hat{y}_n|$
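As a minimal sketch (assuming `y_true` and `y_pred` are NumPy arrays of actual and predicted values; the numbers here are made up), these metrics map directly to code:

```python
import numpy as np

# Hypothetical actual targets and model predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true - y_pred        # y_n - y_hat_n for each instance
squared_errors = residuals ** 2    # squared (quadratic) loss per instance
sse = squared_errors.sum()         # Sum of Squared Errors
mse = squared_errors.mean()        # Mean Squared Error
mae = np.abs(residuals).mean()     # Mean Absolute Error

print(sse, mse, mae)  # 1.5 0.375 0.5
```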
What is Linear Regression?
The algorithm optimizes the parameters of the line–the goal is to fit the line to the dataset.
- Ex: $\operatorname{argmin}_{x}\, x^2 = 0$, since $x = 0$ is the value of $x$ that minimizes $x^2$.
- $\hat{\theta} = \operatorname{argmin}_{\theta}\, \mathrm{MSE}(\theta)$, the value of $\theta$ that minimizes the mean squared error.
Linear regression returns the $\theta$ that minimizes the mean squared error, as in the sketch below.
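A minimal sketch of this idea, assuming 1-D inputs and synthetic data invented for illustration: `np.polyfit` with degree 1 solves the least-squares problem, returning the slope and intercept (the $\theta$) that minimize the MSE.

```python
import numpy as np

# Hypothetical data roughly following y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.shape)

# Least-squares fit: returns the (slope, intercept) minimizing the MSE.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

mse = np.mean((y - y_hat) ** 2)
print(slope, intercept, mse)  # slope near 2, intercept near 1, small MSE
```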
IMPORTANT TO REMEMBER: The more complex the model, the more parameters there are to optimize.
If the model fits the data perfectly (no margin of error exists, i.e. $y_n = \hat{y}_n$ for every $n$), then the Mean Squared Error (MSE) will ALWAYS equal 0.
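A quick check of this with made-up numbers:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])
y_hat = y.copy()                  # perfect predictions: y_hat == y everywhere
print(np.mean((y - y_hat) ** 2))  # 0.0
```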
Things to Note:
- A low error on the training data does NOT guarantee a low error on the test data.
- Underfitting - When the model is not complex enough to represent the training data well.
- Overfitting - When the model is so complex that it fits the training data very well but does not perform well on the test data (data it hasn’t seen before). See the sketch after this list.
- As the complexity of the model increases, the number of parameters increases as well.
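As a rough sketch of both failure modes (the sine-plus-noise data here is invented for illustration), fitting polynomials of increasing degree shows training error shrinking while test error eventually grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Hypothetical ground truth: a sine curve plus noise.
    x = rng.uniform(0, 2 * np.pi, n)
    y = np.sin(x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

for degree in (1, 4, 12):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
    # Degree 1 tends to underfit (both errors high); degree 12 tends to
    # overfit (low training error, higher test error); degree 4 generalizes.
    print(degree, train_mse, test_mse)
```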
What is Generalization?
It is how well the model performs on data it has not seen before (new/test data).