CS 422 Notes 01

Notes for CS422 - Supervised Learning

September 5, 2023

Supervised Learning continued…

$x \in X$ (capital $X$ is the input space)
$y \in Y$ (capital $Y$ is the output space)

Basically, the input space is the set of all possible inputs, and the output space is the set of all possible outputs.

$\mathcal{D} = \{(x_n, y_n)\}_{n=1}^{N}$ is the set of all input-output pairs.

N is the dataset size and D is script D.
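
As a minimal sketch of the dataset definition above, the pairs can be stored as a list of tuples. The values here are made up for illustration and are not from the lecture.

```python
# Hypothetical toy dataset: each pair is (x_n, y_n).
dataset = [
    (0.5, 0),
    (1.5, 0),
    (2.5, 1),
    (3.5, 1),
]

N = len(dataset)                   # dataset size
inputs = [x for x, _ in dataset]   # x_1, ..., x_N
outputs = [y for _, y in dataset]  # y_1, ..., y_N
```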

REMEMBER: there are two types of supervised learning models:

Classification - Supervised ML model where each output ${y_n}$ is categorical. (ex: classes 0-9 for handwritten digits)

EX: Classifying hand-written digits

Regression - Supervised ML model where each output ${y_n}$ is continuous (a real number).

Example of model parameters: $f(x) = a{x^2} + bx + c$, where $a$, $b$, $c$ are the parameters and $x$ is the input. If $a = 3$, $b = 5$, $c = -4$ and $x = 5$: $f(5) = 3(5)^2 + 5(5) - 4 = 75 + 25 - 4 = 96$
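
The quadratic example above can be checked with a short sketch. The function name `f` follows the notes; passing the parameters as arguments is just one way to write it.

```python
def f(x, a, b, c):
    """Quadratic model: a, b, c are the parameters, x is the input."""
    return a * x**2 + b * x + c

# Same values as in the notes: a = 3, b = 5, c = -4, x = 5.
result = f(5, a=3, b=5, c=-4)
print(result)  # 96
```

Changing `a`, `b`, or `c` gives a different function of the same form, which is why they are called parameters while `x` is the input.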

Example w/ 3-dimensional input:

enter notes here

What is classification?

It is a supervised learning model where each output ${y_n}$ is categorical.

EX: Hand-written digits (MNIST dataset) where the dataset has 10 different classes (0-9).

NOTE: each MNIST input is a 28 * 28 matrix of grayscale (black and white) pixels, and the model uses the value at every pixel position to predict the class (0-9).
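
To make the input shape concrete, here is a rough sketch of a 28 * 28 image stored as a nested list. The pixel values are placeholders, not real MNIST data. The 0-255 intensity range and the row-by-row flattening into one vector are assumptions added for illustration, not something stated in the notes.

```python
ROWS, COLS = 28, 28

# Placeholder image: all pixels 0 (not real MNIST data).
image = [[0 for _ in range(COLS)] for _ in range(ROWS)]
image[14][14] = 255  # set one pixel to show indexing by (row, column)

# Flatten row by row into a single input vector x (assumed convention).
x = [pixel for row in image for pixel in row]

print(len(x))             # 784
print(x[14 * COLS + 14])  # 255
```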

generalization = how well the model performs on data that it has not seen before (data it was not trained on).

MOST IMPORTANT NOTE: A model’s performance on data that it hasn’t seen before determines whether it is good or not (this is the performance measure).
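
One common way to estimate performance on unseen data is to hold some pairs out while fitting and score the model only on those. The sketch below assumes a made-up toy dataset and a simple threshold classifier. The names `accuracy` and `fit_threshold` are hypothetical helpers, not from the lecture.

```python
def accuracy(model, pairs):
    """Fraction of (x, y) pairs for which model(x) == y."""
    correct = sum(1 for x, y in pairs if model(x) == y)
    return correct / len(pairs)

# Hypothetical toy data: the label is 1 when x >= 5, else 0.
data = [(x, int(x >= 5)) for x in range(10)]

# Hold out every third pair; the model never sees test_set while fitting.
train_set = [p for i, p in enumerate(data) if i % 3 != 0]
test_set = [p for i, p in enumerate(data) if i % 3 == 0]

def fit_threshold(pairs):
    """Pick the threshold (from the training inputs) with the best training accuracy."""
    candidates = [x for x, _ in pairs]
    best_t = max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), pairs))
    return lambda x: int(x >= best_t)

model = fit_threshold(train_set)
print("train accuracy:", accuracy(model, train_set))
print("test accuracy:", accuracy(model, test_set))
```

The test accuracy is the number that speaks to generalization, since it is computed on pairs the model did not use while fitting.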