What is Unsupervised Learning?
It is a learning paradigm where the model is trained ONLY on the inputs (no output labels).
REMEMBER: Supervised learning uses input-output pairs to fit the model's parameters during training.
- Ex of Unsupervised Learning: Music recommendations by using clusters of users that have similar music tastes.
What is Clustering?
Clustering groups data points by similarity when no labels exist to define the groups. Because there are no predefined classes, separating the data into their respective classes/groups is a challenge.
Differences between Clustering and Classification:
Clustering | Classification
---|---
No labels (Unsupervised) | Labels/Classes (Supervised)
Determining the # of clusters is part of the task | The # of classes is determined by the training data
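A minimal clustering sketch using scikit-learn's `KMeans`. The synthetic 2-D data, the choice of k = 3, and the "listening habits" interpretation are made-up assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic blobs of "users" in 2D (hypothetical listening-habit features).
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(50, 2)),
])

# No labels are given; KMeans infers the groups from the inputs alone.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])      # cluster assignment for the first 10 points
print(kmeans.cluster_centers_)  # learned cluster centers
```

Note that nothing tells the algorithm what the groups mean; it discovers them from the inputs alone, which is exactly the unsupervised setting above (here, picking k = 3 stands in for the "determine # of clusters" part of the task).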
Dimensionality Reduction w/ Principal Components Analysis (PCA):
Linearly transforms the input features into new, uncorrelated components, with the objective of maximizing the variance each component captures.
Original dataset -> PCA -> dataset represented w/ components -> choose the PCs that explain a certain variability in the data -> apply the ML algo (see the sketch below).
- PCA is a “feature extraction” method, because the original input features are transformed into new components rather than kept.
- A “feature selection” method selects a subset of input features that explain a certain percentage of variability.
- The input features that are not selected are eliminated (feature elimination).
REMEMBER: A lot of input features does not necessarily mean they are useful!
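A minimal sketch of the pipeline above using scikit-learn's `PCA`. The synthetic correlated dataset and the 95% variability threshold are assumed example values:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 samples, 10 features: 3 independent features plus 7 noisy
# linear combinations of them, so the features are correlated.
base = rng.normal(size=(200, 3))
X = np.hstack([base, base @ rng.normal(size=(3, 7)) + 0.1 * rng.normal(size=(200, 7))])

# Keep just enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_)  # variance explained by each kept PC
```

This illustrates the REMEMBER note: 10 input features collapse to only a few components, because most of the original features carried redundant (correlated) information.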
Input/Output Types
- Binary (Categorical w/ two possible values)
- Categorical (W/ more than 2 possible values)
- Real-valued (continuous)
Examples of each type:
- Real-valued: 7, 9, -3.3, 10001, …
- Binary: {0,1}, {yes,no}, {pass,fail}, …
- Categorical: {‘Panda’, ‘Cat’, ‘Lion’}, {‘Sedan’, ‘Sport’, ‘Truck’}, …
We cannot use single integers to represent classes, because integers carry a natural order that the categories do not have.
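One common workaround (a standard technique, not specific to these notes) is one-hot encoding: each class gets its own binary column, so no artificial order is imposed. A minimal sketch in plain NumPy:

```python
import numpy as np

classes = ["Panda", "Cat", "Lion"]
labels = ["Cat", "Panda", "Lion", "Cat"]

# Map each class to its own binary column instead of a single integer.
index = {c: i for i, c in enumerate(classes)}
one_hot = np.zeros((len(labels), len(classes)), dtype=int)
for row, label in enumerate(labels):
    one_hot[row, index[label]] = 1

print(one_hot)
# [[0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [0 1 0]]
```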
Two Interpretations of Probability
- Frequentist Probabilities represent long-run frequencies of events.
- Ex: If we flip a fair coin many times, we expect about half of those flips to land on heads.
- Bayesian Probability is used to quantify our uncertainty about something.
- Ex: Saying the probability a coin lands on heads is 0.5 expresses our belief that heads and tails are equally likely on the next toss.
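A quick simulation of the frequentist view (the flip counts are arbitrary example values): the observed fraction of heads approaches 0.5 as the number of flips grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 10_000, 1_000_000):
    flips = rng.integers(0, 2, size=n)  # 1 = heads, 0 = tails
    print(f"{n:>9} flips -> fraction heads = {flips.mean():.4f}")
```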