Lecture

Image Compression

JPEG - a form of image compression.

Code - a list of symbols (letters, numbers, bits, etc.) Code word - a sequence of symbols used to represent some information (ex: gray levels) Code worth length - number of symbols in a code word.

In terms for coding redundancy, assume you have a discrete random variable $r_k$ where it is an interval [0, L-1]. It is used to represent the intensities of an $M * N$ image.

NOTE: each r_k occurs with some probability p_r(r_k)

List of formulas to remember:

  1. Redundancy (R) = $ 1 - \frac{1}{C}$ or $\frac{C-1}{C}$
  2. Compression Rate (C) = $\frac{b}{b'}$
  3. $P_r(r_k) = \frac{n_k}{M*N}$
  4. $L_{AVG} = \sum^{L-1}_{k=0} l(r_k) * P_r(r_k)$

$P_r(r_k)$ is code 2

General Compression System

  1. Encoder - Performs the compression
  2. Decoder - Performs decompression
  3. Mapper - Converts data into a format suitable for compression.
  4. Quantizer

Huffman Encoding

  • Is used to remove coding redundancy.
  • A lossless compression technique.
  • Utilizes a frequency-sorted binary tree to encode data.

Huffman is used becuase pixels in an image do NOT have equal frequencies.

General idea of Huffman Encoding - assigns the shortest codes to the most frequently occurring pixel values (symbols).

In terms of entropy, the reason as to why the formula $H = - \sum^J_{i=1}P(a_j)logP(a_j)$ has a negative sign is because if you input any number for $a_j$ for $log$, it will be negative. A negative times a negative is a positive.

Reading

Summary