Lecture
Image Compression
JPEG - a form of image compression.
Code - a list of symbols (letters, numbers, bits, etc.) Code word - a sequence of symbols used to represent some information (ex: gray levels) Code worth length - number of symbols in a code word.
In terms for coding redundancy, assume you have a discrete random variable $r_k$ where it is an interval [0, L-1]. It is used to represent the intensities of an $M * N$ image.
NOTE: each r_k occurs with some probability p_r(r_k)
List of formulas to remember:
- Redundancy (R) = $ 1 - \frac{1}{C}$ or $\frac{C-1}{C}$
- Compression Rate (C) = $\frac{b}{b'}$
- $P_r(r_k) = \frac{n_k}{M*N}$
- $L_{AVG} = \sum^{L-1}_{k=0} l(r_k) * P_r(r_k)$
$P_r(r_k)$ is code 2
General Compression System
- Encoder - Performs the compression
- Decoder - Performs decompression
- Mapper - Converts data into a format suitable for compression.
- Quantizer
Huffman Encoding
- Is used to remove coding redundancy.
- A lossless compression technique.
- Utilizes a frequency-sorted binary tree to encode data.
Huffman is used becuase pixels in an image do NOT have equal frequencies.
General idea of Huffman Encoding - assigns the shortest codes to the most frequently occurring pixel values (symbols).
In terms of entropy, the reason as to why the formula $H = - \sum^J_{i=1}P(a_j)logP(a_j)$ has a negative sign is because if you input any number for $a_j$ for $log$, it will be negative. A negative times a negative is a positive.