multimedia signal processing algorithms part ii – minimization of the amount of information to be...

MULTIMEDIA SIGNAL PROCESSING

ALGORITHMS

PART II – MINIMIZATION OF THE AMOUNT OF INFORMATION TO BE PROCESSED AND

BASIC ALGORITHMS

The second principle of biological processing SEEMS TO BE:

MINIMIZATION OF THE AMOUNT OF

INFORMATION TO BE PROCESSED

THAT IS, THE PROCESSING SYSTEM ELIMINATES AS MUCH INFORMATION AS POSSIBLE AND USES ONLY THE ABSOLUTELY NECESSARY MINIMUM TO ACHIEVE ITS TASKS.

Why is this principle reasonable? Minimizing the information to be processed saves energy, increases speed and reduces effort. This is not limited to biology; it also applies to technical systems.

IN PREVIOUS LECTURES THIS PRINCIPLE

WAS EVIDENT SEVERAL TIMES:

WE ARE ABLE TO RECOGNIZE OBJECTS BASED ON VERY MINIMAL INFORMATION. THIS MEANS THE PROCESSING SYSTEM IS ABLE TO REDUCE INFORMATION TO A MINIMUM OR, IN OTHER WORDS, TO EXTRACT THE NECESSARY MINIMUM.

SO WE CAN STATE THE MAIN PRINCIPLE FOR THIS COURSE: FOR EFFECTIVE MULTIMEDIA SIGNAL PROCESSING ONE HAS TO MINIMIZE THE AMOUNT OF INFORMATION PROCESSED AND EXTRACT THE ABSOLUTELY NECESSARY MINIMUM FOR THE PROCESSING TASK. HOW TO DO THIS IS NOT ALWAYS CLEAR OR EASY; WE NEED TO STUDY IT.

The second principle, as indicated before, can be statistical processing, producing results matched to the signals most likely to occur in the real world. But this principle also has to be applied correctly.

NOW LET US GO TO TECHNOLOGY. ASSUME WE HAVE A COMPUTER SYSTEM:

ASSUME WE HAVE A COMPUTER WITH A CAMERA AND A DIGITIZER CARD AND WE WOULD LIKE TO EXTRACT VISUAL INFORMATION ABOUT THE ENVIRONMENT LIKE OUR EYES DO (OR WE HAVE MICROPHONES AND WE WOULD LIKE TO EXTRACT ACOUSTICAL INFORMATION LIKE OUR EARS DO).

HOW SHOULD WE PROGRAM THE COMPUTER?

Let's think about a typical example which is already becoming popular in cameras:

We would like to implement algorithms which will mark faces in pictures and recognize familiar faces. This may of course be extended to other objects and complete scenes; for example, the camera would recognize if the picture is taken of a familiar building or landscape. The problem is not easy since objects can be seen from different viewpoints, in different lighting and at different times.

But the input to the algorithm which we have is a digitized picture.

• WHAT IS THE PICTURE AFTER DIGITIZATION?

IT IS A MATRIX OF NUMBERS. THE MATRIX SIZE CAN BE, E.G.:

256x256
720x576 - TELEVISION PICTURE
1024x768 - COMPUTER MONITOR
1920x1080 - HIGH DEFINITION TELEVISION

PICTURE MATRIX ELEMENTS ARE USUALLY 8-BIT NUMBERS. THIS CORRESPONDS TO 256 LEVELS OF LIGHT, WHICH IS ENOUGH.

COLOR PICTURES ARE DESCRIBED BY THREE SUCH MATRICES, ONE FOR EACH BASIC COLOR.

HERE IS A PICTURE FROM A MARS LANDER AND PART OF THE MATRIX NEAR THE OBJECT.

WHAT WILL HAPPEN WHEN THE PICTURE RESOLUTION IS TOO SMALL?

RESOLUTION WILL BE IMPAIRED: FEWER DETAILS ARE VISIBLE.

HERE WE SEE WHAT HAPPENS WHEN RESOLUTION IS REDUCED FROM 512X512 TO 32X32.

WHAT IS THE SIZE OF

ONE TV PICTURE IN BITS?

720x576x3x8-bit = about 10 Mbits
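The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name `frame_bits` is made up for illustration, and the rate of 25 pictures per second is an assumption taken from the video part of this lecture.

```python
def frame_bits(width, height, channels=3, bits_per_sample=8):
    """Number of bits in one uncompressed picture."""
    return width * height * channels * bits_per_sample

tv = frame_bits(720, 576)
print(tv, "bits, about", round(tv / 1e6, 2), "Mbit per picture")
# Assumed rate: 25 pictures per second.
print("about", round(tv * 25 / 1e6), "Mbit/s without any compression")
```

The second line of output shows why minimizing the amount of information matters as soon as we go from single pictures to video.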

• TOPIC: COLOR PROCESSING

IMAGES ARE REGISTERED IN THREE BASIC COLOR COMPONENTS: RGB = RED, GREEN, BLUE.

A MIXTURE OF THESE COLORS PROVIDES OTHER COLORS.

WE HAVE TO USE THREE IMAGE MATRICES TO REPRESENT ONE COLOR PICTURE.

THE RGB REPRESENTATION IS USED FOR DISPLAY, E.G. COMPUTER MONITORS OR TELEVISION PANELS ARE DRIVEN BY R, G, B SIGNALS.

• COLOR IMAGE AND RGB COMPONENTS

• WE OFTEN PERFORM CONVERSION TO MORE SUITABLE COLOR SPACE

TWO SUCH SPACES ARE VERY USEFUL:

YUV SPACE AND HSV SPACE

YUV SPACE :

Y – INTENSITY OF (WHITE) LIGHT

U, V – COLOR CHROMINANCES

TO OBTAIN YUV REPRESENTATION

WE TAKE THE R,G,B COLOR MATRICES

FOR A PICTURE AND CONVERT THEM BY ->

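• RGB->YUV TRANSFORMATION

$$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.148 & -0.289 & 0.437 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$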

NOTE: Y IS THE BLACK AND WHITE COMPONENT, THAT IS, THE MIXTURE OF R, G, B WHICH GIVES GRADATIONS OF WHITE, FROM BLACK TO GREY TO WHITE.

U AND V ARE COLOR COMPONENTS; THEY DO NOT HAVE PHYSICAL MEANING. THUS THE INTENSITY OF LIGHT IS HERE SEPARATED FROM THE COLOR INFORMATION.

• AFTER THIS TRANSFORMATION

INSTEAD OF THREE R,G,B MATRICES

WE GET THREE MATRICES Y, U, V

TRANSFORMATION IS INVERTIBLE SO ALL INFORMATION IS PRESERVED
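A minimal sketch of the transformation and its inverse for a single pixel. It assumes the usual signs of the YUV matrix (negative R and G weights in the U row, negative G and B weights in the V row); the helper names are made up for illustration.

```python
# RGB->YUV matrix (rows Y, U, V); the signs are the usual YUV convention.
M = [[0.299, 0.587, 0.114],
     [-0.148, -0.289, 0.437],
     [0.615, -0.515, -0.100]]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def inverse_3x3(m):
    """Invert a 3x3 matrix with the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

M_INV = inverse_3x3(M)

def rgb_to_yuv(rgb):
    return mat_vec(M, rgb)

def yuv_to_rgb(yuv):
    return mat_vec(M_INV, yuv)

grey = rgb_to_yuv([128, 128, 128])        # U and V come out (almost) zero
back = yuv_to_rgb(rgb_to_yuv([200, 30, 90]))
print(grey, back)
```

A grey pixel has practically no chrominance, and the round trip returns the original R, G, B values up to floating-point error, which is what "invertible" means here.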

BUT NOW WE CAN PLAY A TRICK:

HUMAN VISUAL PROCESSING IS MUCH LESS SENSITIVE TO COLOR INFORMATION THAN TO

BLACK AND WHITE LIGHT INTENSITY

INFORMATION

THUS, MATRICES U,V CAN BE REDUCED IN

SIZE

• SUBSAMPLING OF MATRICES U AND V

FOR EVERY 4 ELEMENTS OF Y ONLY ONE ELEMENT OF U AND ONE OF V IS TAKEN:

    Y1 Y2    U U    V V
    Y3 Y4    U U    V V

THE ELEMENTS U AND V CAN BE, E.G., THE AVERAGE VALUE OF THE ORIGINAL 4 ELEMENTS OF U AND V.

THUS MATRICES U, V CAN BE REDUCED BY A FACTOR OF 4 IN SIZE.

RETURNING BACK TO RGB FORM WILL NOT CHANGE THE PICTURE VISUALLY.
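The subsampling step can be sketched as follows, assuming the averaging variant mentioned above and a plane whose sides are even. The function names and the sample values are illustrative only.

```python
def subsample_2x2(plane):
    """Replace every 2x2 group by its average (plane sides must be even)."""
    return [[(plane[r][c] + plane[r][c + 1] +
              plane[r + 1][c] + plane[r + 1][c + 1]) / 4.0
             for c in range(0, len(plane[0]), 2)]
            for r in range(0, len(plane), 2)]

def upsample_2x2(small):
    """Repeat each value over a 2x2 group to restore the original size."""
    out = []
    for row in small:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

u = [[10, 12, 50, 52],
     [14, 16, 54, 56],
     [90, 90, 20, 20],
     [90, 90, 20, 20]]
u_small = subsample_2x2(u)          # 2x2 plane instead of 4x4
samples_before = 3 * 16             # Y, U, V at full size
samples_after = 16 + 4 + 4          # full Y, quarter-size U and V
print(u_small, samples_after / samples_before)
```

With both U and V reduced by a factor of 4, the whole color picture needs half as many samples as before.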

• THE RGB->YUV TRANSFORMATION

USES DIRECTLY PROPERTY OF HUMAN

VISION WHICH ALLOWS:

- TO REDUCE THE SIZE OF COLOR IMAGES

(IMPORTANT FOR COMPRESSION)

- TO USE ONLY LIGHT INTENSITY WITHOUT COLOR INFORMATION (FOR E.G. RECOGNITION OF OBJECTS)

• ANOTHER TRANSFORMATION IS HSI

HSI IS MORE CLOSELY RELATED TO HUMAN PERCEPTION, WHERE WE CAN SEE THE SATURATION OF COLORS, THAT IS, WE CAN TELL THE 'REDNESS', 'BLUENESS' OF COLORS AND SO ON. TO GET THE HSI REPRESENTATION WE MAP RGB INTO:

H - HUE (COLOR)
S - SATURATION (AMOUNT OF WHITE MIXED WITH COLOR)
I - INTENSITY (AMOUNT OF GREY LEVEL)

EQUATIONS FOR HSI FROM RGB AND VICE VERSA:
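As an illustration, here is one common textbook form of the RGB to HSI conversion. It is an assumption that the slide uses exactly this form; the function name is made up for this sketch and the inputs are taken to lie in [0, 1].

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert r, g, b in [0, 1] to (hue in degrees, saturation, intensity)."""
    total = r + g + b
    intensity = total / 3.0
    if total == 0:
        return 0.0, 0.0, 0.0               # black: hue and saturation undefined
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        return 0.0, saturation, intensity  # grey: hue undefined, report 0
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    hue = 360.0 - theta if b > g else theta
    return hue, saturation, intensity

for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)),
                  ("blue", (0, 0, 1)), ("grey", (0.5, 0.5, 0.5))]:
    print(name, rgb_to_hsi(*rgb))
```

Pure red, green and blue land 120 degrees apart on the hue circle with full saturation, while grey has zero saturation.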

BASIC ASPECTS OF THE HSI REPRESENTATION:

ON THE CUBE THERE ARE SOME OTHER 'BASIC' COLORS APART FROM RGB; THE MAIN DIAGONAL IS THE AMOUNT OF WHITE.

ON THE DIAMOND WE SEE THE COLORS AROUND A HEXAGON; HEIGHT IS THE AMOUNT OF WHITE AND SATURATION IS THE X-AXIS. NOTE WHERE THE I (V) AXIS, THE S AXIS AND THE HUE ANGLE ARE.

• THE HSI TRANSFORMATION IS USEFUL SINCE WE GET A REPRESENTATION IN A COLOR SPACE WHICH CORRESPONDS TO PROPERTIES OF HUMAN VISION: THE INTENSITY LEVEL, THE COLOR SATURATION AND THE COLOR ITSELF CAN EACH BE ESTIMATED.

DIGRESSION ON COLOR SENSORS

ASSUME YOU BUY A DIGITAL CAMERA WITH, E.G., 5 MEGAPIXELS.

WHAT DOES THIS MEAN?

IT TURNS OUT THAT THE PIXEL DEFINITION IS DIFFERENT FOR DIFFERENT APPLICATIONS.

TRADITIONALLY

1 PIXEL = R, G, B COLOR COMBINATIONS

SO WE NEED 3 COLOR SENSORS FOR

CAMERA OR

3 COLOR ELEMENTS FOR DISPLAY

FOR EXAMPLE:

AN LCD COMPUTER MONITOR WITH A RESOLUTION OF 1280X1024 PIXELS HAS 1280X1024 ELEMENTS FOR EACH OF THE R, G, B COLORS, THAT IS, IT HAS 1280X1024X3 DISPLAY ELEMENTS. THE DISPLAY ELEMENTS ARE CALLED SUBPIXELS; ONE PIXEL IS COMPOSED OF THREE SUBPIXELS R, G, B.

IN DIGITAL CAMERAS THIS IS DIFFERENT. THE SENSOR IN DIGITAL CAMERAS LOOKS LIKE THIS:

IN DIGITAL CAMERAS EVERY COLOR SUBPIXEL COUNTS AS A 'PIXEL'. THE PIXELS ARE ARRANGED IN A MATRIX CALLED A BAYER SENSOR. EACH 'CAMERA' PIXEL IS MADE OF 4 COLOR PIXELS: 1 RED, 2 GREEN, 1 BLUE (REMEMBER THAT THE EYE IS MOST SENSITIVE TO GREEN LIGHT).

WE CAN NOTICE THAT A 'FULL' COLOR PIXEL CAN BE MADE FROM OVERLAPPING SQUARES SHIFTED BY HALF A SQUARE (PIXEL 1 AND PIXEL 2 IN THE FIGURE).

SO THE, E.G., 5 MILLION PIXELS OF A DIGITAL CAMERA ARE NOT EXACTLY 5 MILLION IN THE DISPLAY SENSE. THE NUMBER SHOULD BE DIVIDED BY 4, OR BY 2 IF WE TAKE INTERPOLATION INTO ACCOUNT.

BUT THERE ARE TWO EXCEPTIONS:

THERE ARE VIDEO CAMERAS WHICH HAVE 3 CCD SENSORS, ONE FOR EACH OF THE R, G, B COLORS.

IN 3-CCD VIDEO CAMERAS THE OPTICAL SYSTEM SPLITS LIGHT ONTO 3 SENSORS WHICH PICK UP THE R, G, B COLORS. THE TOTAL NUMBER OF PIXELS CORRESPONDS TO THE NUMBER OF PIXELS IN THE DISPLAY.

ANOTHER EXCEPTION IS THE FOVEON SENSOR.

IN FOVEON THERE IS ONE SENSOR, BUT IT MEASURES ALL 3 RGB COLORS IN ONE AREA. THIS IS BASED ON THE FACT THAT PHOTONS PENETRATE TO DIFFERENT DEPTHS IN THE SEMICONDUCTOR DEPENDING ON THEIR WAVELENGTHS (www.foveon.com).

COMPARISON:

WE CAN SEE THAT SINGLE-SENSOR DEVICES HAVE LOWER RESOLUTION THAN 3-SENSOR DEVICES OR FOVEON.

BUT THEY ARE THE EASIEST TO PRODUCE,

SO THE NUMBER OF THEIR COLOR PIXELS IS INCREASING ALL THE TIME AND THE RESOLUTION PROBLEM IS SOLVED.

• The elimination of information based on color is an example of a much more general principle:

Input signal -> Elimination of information -> Output signal: a representation of the input signal which is 'just good enough' for a specific task

How to produce the 'good enough' representation is the essential problem to solve.

Next we will show an example of representation by edges.

• EDGE DETECTION

LINEAR FILTERING: THE AREA AROUND EVERY POINT IN THE IMAGE MATRIX IS MULTIPLIED, ELEMENT BY ELEMENT, BY VALUES FROM ANOTHER MATRIX AND THE RESULT IS SUMMED UP. FOR THE POINT x THE AREA IS:

    z l m
    u x v
    n p q
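Linear filtering as described above can be sketched in a few lines. The function name `filter2d` and the border handling (repeating the nearest edge pixel) are choices made for this example, not something prescribed by the lecture.

```python
def filter2d(image, kernel):
    """Apply a square odd-sized kernel at every pixel (borders are replicated)."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)   # clamp to the image
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out

box = [[1 / 9.0] * 3 for _ in range(3)]   # coefficients sum to one
flat = [[100] * 5 for _ in range(5)]
smoothed = filter2d(flat, box)
print(smoothed[2])
```

A filter whose coefficients sum to one leaves a constant area unchanged, which is the low-pass case.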

• DEPENDING ON THE MATRIX BY WHICH WE MULTIPLY WE HAVE SEVERAL TYPES OF FILTERS:

LOW PASS - SUM OF FILTER COEFFICIENTS IS ONE

BANDPASS - SUM OF FILTER COEFFICIENTS IS ZERO

HIGHPASS - SUM IS BETWEEN ZERO AND ONE

• WE SAID THAT IN THE HUMAN VISUAL SYSTEM THE PROCESSING ELEMENTS IN THE RETINA ARE SENSITIVE TO CHANGES IN LIGHT LEVEL.

THIS IS EQUIVALENT TO BANDPASS FILTERING.

SPECIAL CLASS OF BANDPASS FILTERS

IS CALLED EDGE DETECTORS SINCE THEY

ARE DESIGNED TO DETECT SHARP CHANGES IN

IMAGE LIGHT INTENSITY

• LET US CONSIDER THE FOLLOWING

SITUATION – WHITE BAR ON BLACK

BACKGROUND OR OPPOSITE

OUR VISUAL SYSTEM, AND WE HERE, ARE INTERESTED MOSTLY IN AREAS WHERE LIGHT IS CHANGING ITS VALUE. SHARP CHANGES IN LIGHT VALUE ARE CALLED EDGES.

HOWEVER, THERE IS A PROBLEM HERE: WHAT EXACTLY IS A SHARP CHANGE IN INTENSITY? THIS IS NOT WELL DEFINED. ON THE RIGHT WE SEE SOME EXAMPLES OF LIGHT CHANGE:

RAMP EDGE - LIGHT INCREASING GRADUALLY
STEP EDGE - SHARP TRANSITION
NARROW LINE
ROOF EDGE

THERE COULD BE MANY MORE SUCH EXAMPLES!

• EDGE DETECTION IS EQUIVALENT TO DIFFERENTIATION IN THE CONTINUOUS FUNCTION DOMAIN:

$$\frac{\partial F(x,y)}{\partial x} = 0 \quad \text{if } F(x,y) = \text{const}$$

BUT IN IMAGES WE HAVE A LIMITED NUMBER OF PIXELS, SO WE CAN PERFORM ONLY APPROXIMATE DIFFERENCING.

• EDGE DETECTORS

HERE WE HAVE TWO FILTER MATRICES FOR DIFFERENCING.

NOTE THAT THE FIRST ONE WILL PROVIDE ZERO OUTPUT WHEN THE VALUES ARE CONSTANT IN THE VERTICAL DIRECTION, AND THE SECOND ONE WHEN THEY ARE CONSTANT IN THE HORIZONTAL DIRECTION.

• NOW LET'S TAKE THE OUTPUTS OF BOTH FILTERS AND COMBINE THEM TOGETHER, FOR EXAMPLE BY

$$Z = \sqrt{H^2 + V^2}$$

THE OUTPUT WILL NOW BE QUITE INDEPENDENT OF THE DIRECTION OF EDGES.

NOTE THAT GC/GR IS EQUIVALENT TO THE DIRECTION OF AN EDGE.
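A small sketch of combining two directional detectors. It assumes the Sobel matrices (named later in this lecture as one good choice) and assumes that the combination is the square root of the sum of squares; the lecture's own matrices and formula may differ. The helper names are illustrative.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to left-right changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to top-bottom changes

def response(image, kernel, y, x):
    """Weighted sum of the 3x3 neighbourhood of interior pixel (y, x)."""
    return sum(kernel[j][i] * image[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def gradient(image, y, x):
    gx = response(image, SOBEL_X, y, x)
    gy = response(image, SOBEL_Y, y, x)
    magnitude = math.sqrt(gx * gx + gy * gy)      # combined output
    direction = math.degrees(math.atan2(gy, gx))  # angle of the intensity change
    return gx, gy, magnitude, direction

# Vertical step edge: dark left half, bright right half.
step = [[0, 0, 100, 100] for _ in range(4)]
print(gradient(step, 1, 1))
```

For this step edge only the left-right detector responds; the combined magnitude would be the same if the picture were rotated by 90 degrees.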

• HERE WE HAVE EXAMPLE OF RESULTS:

- ORIGINAL PICTURE

- HORIZONTAL DETECTOR

- VERTICAL DETECTOR

- BOTH COMBINED

AS WE CAN SEE, THE COMBINED OUTPUT GIVES THE BORDERS OF OBJECTS, SO WE CAN RECOGNIZE THEM EVEN IF THERE IS LITTLE INFORMATION. THIS MAY CORRESPOND IN SOME WAY TO HOW THE HUMAN SYSTEM WORKS.

• WHY DID WE USE JUST SUCH A MATRIX FOR EDGE DETECTION?

MANY SUCH MATRICES CAN BE USED; SOME OF THEM ARE SHOWN HERE, AND MANY OTHERS ARE KNOWN.

THEY DIFFER IN THEIR PROPERTIES AND IN THEIR OPERATION IN NOISE.

E.G. PREWITT AND SOBEL ARE GOOD.

• IF WE TALK ABOUT OPERATION IN NOISY IMAGES, THRESHOLDING IS IMPORTANT.

AFTER RUNNING A DETECTOR WE GET AN OUTPUT SIGNAL. UNFORTUNATELY THIS CAN BE CAUSED BY NOISE, NOT BY AN EDGE. EDGE DETECTORS CAN BE SENSITIVE TO NOISE.

WE THRESHOLD THE OUTPUT SIGNAL: IF IT IS GREATER THAN SOME VALUE T, IT IS CLASSIFIED AS AN EDGE.

HERE THE OPERATION OF AN EDGE DETECTOR WITH THRESHOLDING IN NOISY CONDITIONS IS SHOWN:

AT LOW NOISE LEVEL IT IS GOOD.

AT HIGHER NOISE LEVEL, WE GET SOME NOISE POINTS CLASSIFIED AS EDGES, AND SOME EDGE POINTS ARE MISSING (WE STILL SEE A GOOD EDGE).

AT VERY HIGH NOISE LEVEL, THE DETECTOR OPERATION BREAKS DOWN COMPLETELY AND NO EDGE IS DETECTED. NOTE THAT WE CAN STILL SEE SOME EDGE IN THIS PICTURE.

SO IN NOISY CONDITIONS THERE ARE PROBLEMS

WITH EDGE DETECTORS BUT SOMEHOW IN HUMAN

VISION THEY WORK VERY WELL – HOW???

RESEARCHERS MOTIVATED BY HUMAN VISION

NOTICED THAT FILTERING ELEMENTS IN HUMAN

RETINA AT THE BACK OF THE EYE ARE MORE

COMPLICATED THAN SIMPLE DETECTORS HERE.

• MOTIVATED BY OBSERVATION OF THE HUMAN SYSTEM AND SOME CONSIDERATION OF OPTIMAL NOISE ATTENUATION, A ZERO-CROSSING, OR LAPLACIAN-OF-GAUSSIAN, DETECTOR WAS DESIGNED.

THIS DETECTOR IS OBTAINED BY TAKING THE SECOND DERIVATIVE OF THE GAUSSIAN CURVE:

$$\nabla^2 G(x,y) = -\frac{1}{\pi s^4}\left[1 - \frac{x^2+y^2}{2s^2}\right] e^{-(x^2+y^2)/2s^2}$$

The resulting curve has a characteristic 'Mexican hat' shape.

NOW IF WE TAKE THE SECOND DERIVATIVE OF THE OUTPUT, WE NOTICE THAT AN EDGE IS WHERE THE SIGNAL CROSSES ZERO!
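The zero-crossing idea can be tried on a one-dimensional signal. The sketch below is a 1-D analogue only: it filters a step with a sampled second derivative of a Gaussian and reports where the response changes sign. The width `s`, the radius, the `min_jump` guard against numerical noise and all function names are assumptions of this example.

```python
import math

def second_derivative_of_gaussian(s, radius):
    """Sampled 1-D second derivative of a Gaussian, shifted to sum to zero."""
    k = [((x * x - s * s) / s ** 4) * math.exp(-x * x / (2 * s * s))
         for x in range(-radius, radius + 1)]
    mean = sum(k) / len(k)
    return [v - mean for v in k]

def filter1d(signal, kernel):
    """Weighted sum around every sample; the ends of the signal are repeated."""
    r = len(kernel) // 2
    n = len(signal)
    return [sum(kernel[j + r] * signal[min(max(i + j, 0), n - 1)]
                for j in range(-r, r + 1))
            for i in range(n)]

def zero_crossings(resp, min_jump=1.0):
    """Indices i where the response changes sign between i and i + 1."""
    return [i for i in range(len(resp) - 1)
            if resp[i] * resp[i + 1] < 0
            and abs(resp[i] - resp[i + 1]) > min_jump]

step = [0] * 20 + [100] * 20            # step edge between samples 19 and 20
resp = filter1d(step, second_derivative_of_gaussian(s=1.5, radius=6))
print(zero_crossings(resp))
```

The response is positive just before the step and negative just after it, so the only strong sign change sits exactly at the edge.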

• THE ZERO-CROSSING EDGE DETECTOR WILL BE BETTER IN NOISY CONDITIONS, BUT IT IS MORE COMPLICATED SINCE IT REQUIRES MANY MORE OPERATIONS FOR ITS CALCULATION.

Assuming that we have such a detector, the next problem is how to build a representation based on edges; this is shown next.

• LINKING EDGE POINTS TO FORM CONTOURS OF OBJECTS:

WE LINK OUTPUT POINTS FROM THE EDGE DETECTOR WHEN THEIR VALUES ARE SIMILAR. SIMILARITY MEANS:

- THE AMPLITUDE DIFFERENCE IS SMALLER THAN SOME THRESHOLD
- THE ANGULAR DIRECTION IS SIMILAR

LINKED EDGES ARE THOUGHT TO BELONG TO THE SAME OBJECT.
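The two similarity conditions can be written as a single test. In this sketch an edge point is a hypothetical (magnitude, direction in degrees) pair, and both thresholds are arbitrary illustrative values.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def can_link(edge_a, edge_b, max_magnitude_diff=25.0, max_angle_diff=15.0):
    """Each edge point is a (magnitude, direction_in_degrees) pair."""
    (mag_a, ang_a), (mag_b, ang_b) = edge_a, edge_b
    return (abs(mag_a - mag_b) < max_magnitude_diff and
            angle_difference(ang_a, ang_b) < max_angle_diff)

print(can_link((400, 0), (390, 355)))   # similar strength and direction
print(can_link((400, 0), (400, 90)))    # same strength, different direction
```

In a full linker this test would be applied to neighbouring edge points only, and linked points would be collected into contours.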

• EXAMPLE

ORIGINAL PICTURE

HORIZONTAL DETECTOR

VERTICAL DETECTOR

RESULT OF EDGE LINKING

• SEGMENTATION

HOW TO EXTRACT OBJECTS FROM PICTURES? THIS CAN BE DONE BASED ON

FEATURES SUCH AS INTENSITY OR COLOR

• WE CAN GROUP AREAS WITH SPECIFIC

FEATURES BY LINKING THEM TOGETHER

IF TWO AREAS HAVE THE SAME FEATURE

WE LINK THEM TOGETHER

SEGMENTATION ALGORITHM

START WITH SOME AREA AND DIVIDE IT

IN FOUR PARTS, CONTINUE DIVISION UNTIL ONLY PARTS

WITH SPECIFIC FEATURE ARE KEPT
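The division into four parts can be sketched recursively. The assumptions of this example: the feature is uniform intensity (largest minus smallest value within a tolerance), the picture is square with a side that is a power of two, and an "object" is a uniform part brighter than 128. None of these choices comes from the lecture.

```python
def is_uniform(image, top, left, size, tolerance):
    values = [image[r][c] for r in range(top, top + size)
              for c in range(left, left + size)]
    return max(values) - min(values) <= tolerance

def split(image, top, left, size, tolerance=10):
    """Return (top, left, size) squares that are uniform in intensity."""
    if size == 1 or is_uniform(image, top, left, size, tolerance):
        return [(top, left, size)]
    half = size // 2
    regions = []
    for dt in (0, half):
        for dl in (0, half):
            regions += split(image, top + dt, left + dl, half, tolerance)
    return regions

image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 200, 200],
         [0, 0, 200, 200]]
regions = split(image, 0, 0, 4)
objects = [r for r in regions if image[r[0]][r[1]] >= 128]   # keep bright parts
print(regions, objects)
```

Neighbouring parts with the same feature could then be linked together again, as described above.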

• THRESHOLDING

WE NEED TO DIFFERENTIATE BETWEEN THE 'USEFUL' DATA AND THE 'NON-USEFUL' DATA.

THRESHOLDING WORKS ON THE PRINCIPLE THAT THE USEFUL SIGNAL IS STRONGER.

IF SIGNAL < T WE SET IT TO ZERO.

HOW TO SELECT T?

IF THE THRESHOLD IS SELECTED HERE WE CAN SEPARATE BACKGROUND AND OBJECT.

FOR THRESHOLDING, A HISTOGRAM CAN BE USED SINCE IT OFTEN SHOWS HOW OBJECT AND BACKGROUND CAN BE SEPARATED.

HOWEVER, FULLY AUTOMATIC THRESHOLDING IS DIFFICULT SINCE THE LIGHT INTENSITIES OF NOISE AND OBJECT MAY NOT BE COMPLETELY SEPARATED.
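A sketch of thresholding with an automatically chosen T. The selection rule used here, repeatedly placing T halfway between the mean of the darker and the mean of the brighter pixels, is one simple heuristic chosen for this example; it is not the method of the lecture, and it fails in exactly the situation described above, when object and background intensities overlap.

```python
def histogram(pixels, levels=256):
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts

def intermeans_threshold(pixels):
    """Iterate T = average of the mean below T and the mean at or above T."""
    t = sum(pixels) / len(pixels)
    while True:
        low = [p for p in pixels if p < t]
        high = [p for p in pixels if p >= t]
        if not low or not high:
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2.0
        if abs(new_t - t) < 0.5:
            return new_t
        t = new_t

def apply_threshold(pixels, t):
    return [p if p >= t else 0 for p in pixels]

pixels = [10, 12, 11, 9, 200, 205, 198, 202]   # dark background, bright object
t = intermeans_threshold(pixels)
print(t, apply_threshold(pixels, t))
```

For this clearly two-peaked example T lands between the two groups and the background is set to zero.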

• FEATURE DETECTION

FEATURES ARE SMALL PARTS OF OBJECTS WHICH ARE CRITICAL FOR RECOGNITION AND REPRESENTATION.

MMSP Irek Defée

• HOW TO DETECT FEATURES? THIS IS QUITE A DIFFICULT PROBLEM.

FEATURES ARE OFTEN COMPOSED OF SHORT LINE SEGMENTS, E.G. CORNERS. THIS CORNER IS COMPOSED OF TWO LINES.

WE CAN THINK OF APPLYING AN EDGE DETECTOR AND THRESHOLDING FOR FINDING FEATURES.

CORNER, EDGE

• FOR A COMPACT REPRESENTATION WE HAVE TO ELIMINATE ALL NON-RELEVANT SIGNAL ELEMENTS. THIS TASK IS SIMILAR TO MEDIA COMPRESSION. MEDIA COMPRESSION HAS THE GOAL OF MINIMIZING THE DESCRIPTION OF MEDIA WHILE PRESERVING PERCEPTUAL QUALITY.

THIS IS ALSO IMPORTANT FOR GENERAL MULTIMEDIA SIGNAL PROCESSING SINCE IT MINIMIZES THE AMOUNT OF INFORMATION TO BE PROCESSED.

A MEDIA SIGNAL IS A STREAM OF BITS.

HOW TO REDUCE THE NUMBER OF BITS NEEDED FOR THE DESCRIPTION?

THIS CAN BE DONE IN 2 WAYS:
- MORE EFFICIENT DESCRIPTION OF THE BITSTREAM
- ELIMINATING PERCEPTUALLY INSIGNIFICANT INFORMATION

Technically this is called compression of information

COMPRESSION CAN BE DONE ON

BIT LEVEL -> BIT STREAM

BLOCK-LEVEL -> SMALL BLOCKS

OBJECT-LEVEL -> OBJECTS IN PICTURES

PICTURE-LEVEL -> SAME PICTURE IN

DIFFERENT SIZES IS VERY SIMILAR

COMPRESSION IS ALSO RELATED TO THE REPRESENTATION OF VISUAL INFORMATION. LET'S TAKE THE FOLLOWING EXAMPLE:

    a b c
    d e f
    g h i

This is a matrix of 3x3 points taken from a picture. Each point represents a number from 0 to 255, that is, an 8-bit number.

How many different signal matrices can be constructed out of these numbers? $(2^8)^9 = 2^{72}$, about $4.7 \times 10^{21}$. This is a huge number.

ONLY MEANINGFUL INFORMATION FROM THESE MATRICES MUST BE EXTRACTED. BUT WHAT IS THIS INFORMATION? IT IS ABOUT SPECIFIC SIGNAL CHANGES.

What are then those changes in small areas of pictures

which might be of interest?

1. We were talking until now about edges

We also mentioned that there can be different types of

edges in pictures

2. There can be also other types of information in these

small areas (e.g. lines)

3. The question is how to account for this information?

Let us see some examples: what is there?

- First area: Dark line? Plus grey dots? Plus black dots?
- Second area: Dark line? Roof edge?
- Third area: Edge? Edge with white dot?

We see here that the interpretation of small areas of pictures is ambiguous; several interpretations are possible. Sometimes a feature looks non-ideal or contaminated by another feature.

- Fourth area: Dots? Line? Diagonal edge?

So how to interpret such real signals? There has to be a very efficient extraction mechanism allowing for:
- extraction of multiple features
- dealing with imperfect features

What seems to be very important is that features are made by grouping pixels which are touching and have similar values.

Second, sometimes features might be imperfect. Thus, we have to try to assign each pixel where it might belong: to some feature(s) or not.

We take the center pixel and try to find a group of pixels to which it belongs. A pixel belongs if it has the same value, a similar value, or its value can be INTERPOLATED from neighbouring pixels.

Where does the center pixel belong?

It belongs to the vertical grey line because the pixel values are the same; it belongs to the diagonal edge if its value can be interpolated from neighbouring pixels, that is, the pixel values change in a linear way.

Pixel intensity values: the center pixel value is the average of the other two.
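The three ways of belonging can be sketched for the center pixel and two opposite neighbours. The tolerance of 4 grey levels and the function name are assumptions of this example.

```python
def belongs(left, centre, right, tolerance=4):
    """Classify how the centre pixel relates to two opposite neighbours."""
    if left == centre == right:
        return "same"
    if abs(centre - left) <= tolerance and abs(centre - right) <= tolerance:
        return "similar"
    if abs(centre - (left + right) / 2.0) <= tolerance:
        return "interpolated"      # values change in a linear way
    return "none"

print(belongs(128, 128, 128))   # e.g. along a grey line
print(belongs(50, 100, 150))    # e.g. across a ramp-like edge
print(belongs(50, 200, 60))     # isolated bright dot
```

In a 3x3 area the test would be repeated for the four pairs of opposite neighbours (horizontal, vertical and both diagonals).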

So we can try to assign a pixel to neighbouring pixels. But there will be a problem if we look into a larger area: pixels may belong to many different areas.

It would be good to detect regularity in the areas.

When areas are irregular they may be random and thus not interesting.

How to find regularity?

By transforming area of a picture using periodic

(orthogonal) basis, e.g. Fourier Transform.

But Fourier transform has complex values which is

not the most efficient (2 real numbers)

In practice there are two other transforms used:

Discrete Cosine Transform, DCT and hierarchical

4x4 transform related to it

DCT TRANSFORMATION

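• DCT: Discrete Cosine Transform
• Reduction of spatial redundancy
• Transform block size: 8 x 8 in our case

$$f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} c_u c_v F(u,v) \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right]$$

where $u, v, x, y = 0, 1, \ldots, 7$, $F(u,v)$ are the DCT coefficients, and $c_k = 1/\sqrt{2}$ for $k = 0$, $c_k = 1$ otherwise.

For color pictures we take blocks: an area of 16 pixels by 16 lines gives four Y (black and white) blocks, numbered 1 to 4, and two color blocks, Cb (5) and Cr (6).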

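DCT in the matrix form

One dimension:

$$H = [H_{kn}], \qquad H(k,n) = c_k \sqrt{\frac{2}{N}} \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{N}\right]$$

Two dimensions:

$$H_{kl}(n,m) = c_k c_l \frac{2}{N} \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{N}\right] \cos\left[\left(m + \frac{1}{2}\right)\frac{l\pi}{N}\right]$$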

• FOR N=4 WE HAVE

DCT basis vectors

For N=8

For N=4

Basis vectors are obtained by multiplying vertical and horizontal cosine functions
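The basis vectors can be generated directly from the cosine formula. This sketch assumes the orthonormal scaling with $c_0 = 1/\sqrt{2}$ and $c_k = 1$ otherwise; the function names are made up for illustration.

```python
import math

def dct_matrix(n):
    """H[k][m] = c_k * sqrt(2/n) * cos((m + 1/2) * k * pi / n)."""
    rows = []
    for k in range(n):
        c = math.sqrt(0.5) if k == 0 else 1.0
        rows.append([c * math.sqrt(2.0 / n) *
                     math.cos((m + 0.5) * k * math.pi / n)
                     for m in range(n)])
    return rows

def basis_image(h, k, l):
    """2-D basis function (k, l): product of a vertical and a horizontal cosine."""
    size = len(h)
    return [[h[k][n] * h[l][m] for m in range(size)] for n in range(size)]

H4 = dct_matrix(4)
for row in H4:
    print(["%+.3f" % v for v in row])
print(basis_image(H4, 0, 0))    # the constant (average) basis function
```

The rows are mutually orthogonal and have unit length, so the transform can be inverted by multiplying with the transposed matrix.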

• Example of DCT calculation

Input matrix

Calculation of the 1-D DCT for the columns of the input matrix

Calculation of the 1-D DCT for the rows of the previous result

Enlarged picture with selected block

The block values, DCT values

THE DCT TRANSFORM IS A MAPPING

FROM PICTURE BLOCK INTO

FREQUENCY DOMAIN

SINCE THERE WILL BE FEW HIGH

FREQUENCIES NORMALLY, THERE

WILL BE MANY ZEROS OR SMALL

NUMBERS IN THE DCT MATRIX

• EXAMPLE OF THE DCT CALCULATION

ORIGINAL PICTURE BLOCK

140 144 147 140 140 155 179 179
144 152 140 147 140 148 167 179
152 155 136 167 163 162 152 172
168 145 156 160 152 155 136 160
162 148 156 148 140 136 147 162
147 167 140 155 155 140 136 162
136 156 123 167 162 144 140 147
148 155 136 155 152 147 147 136

SHIFTED BLOCK

12 16 19 12 11 27 51 47
16 24 12 19 12 20 39 51
24 27 8 39 35 34 24 44
40 17 28 32 24 27 8 32
34 20 28 20 12 8 19 34
19 39 12 27 27 12 8 34
8 28 -5 39 34 16 12 19
20 27 8 27 24 19 19 8

IN PRACTICE, SINCE PICTURE VALUES ARE IN (0, 255), WE SHIFT THEM TO (-128, 127) BY SUBTRACTING 128.

• BLOCK AFTER DCT (MANY SMALL NUMBERS)

185 -17 14 -8 23 -9 -13 -8
20 -34 26 -9 -10 10 13 6
-10 -23 -1 6 -18 3 -20 0
-8 -5 14 -14 -8 -2 -3 8
-3 9 7 1 -11 17 18 15
8 0 -2 3 -1 -7 -1 -1
0 -7 -2 1 1 4 -6 0
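The two-step calculation (1-D DCT of the columns, then of the rows) can be sketched as two matrix products, here applied to the shifted block of this example. The orthonormal scaling is an assumption, and only the first (DC) coefficient is compared with the slide.

```python
import math

def dct_matrix(n):
    rows = []
    for k in range(n):
        c = math.sqrt(0.5) if k == 0 else 1.0
        rows.append([c * math.sqrt(2.0 / n) *
                     math.cos((m + 0.5) * k * math.pi / n) for m in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(block):
    """1-D DCT of the columns, then of the rows: H * X * H^T."""
    h = dct_matrix(len(block))
    return matmul(matmul(h, block), transpose(h))

def idct2(coeffs):
    """Inverse transform: H^T * C * H."""
    h = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(h), coeffs), h)

SHIFTED = [[12, 16, 19, 12, 11, 27, 51, 47],
           [16, 24, 12, 19, 12, 20, 39, 51],
           [24, 27, 8, 39, 35, 34, 24, 44],
           [40, 17, 28, 32, 24, 27, 8, 32],
           [34, 20, 28, 20, 12, 8, 19, 34],
           [19, 39, 12, 27, 27, 12, 8, 34],
           [8, 28, -5, 39, 34, 16, 12, 19],
           [20, 27, 8, 27, 24, 19, 19, 8]]

coeffs = dct2(SHIFTED)
restored = idct2(coeffs)
print(round(coeffs[0][0], 3))   # DC term = sum of the block / 8
```

The DC coefficient comes out at about 185.9, in line with the value 185 in the top-left corner of the block after DCT, and the inverse transform restores the shifted block.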

The DCT values make it possible to detect and evaluate periodic structures in small areas. Sometimes this may be very useful.

DCT has some drawbacks: it requires real numbers (cosine functions) and high-precision calculations.

Another transform was introduced recently to improve on the DCT. This transform is obtained by rounding the scaled coefficients of the DCT matrix:

$$H = \mathrm{round}\{\alpha H_{DCT}\}$$

When $\alpha = 2.5$ the following transform is obtained:

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

This transformation matrix has extremely simple coefficients; no multiplications are involved (multiplying by 2 is an addition or a shift).

We can see that the rows of the matrix correspond to calculations detecting:
- average value of four signal samples
- periodical function with period 1
- periodical function with period 2 (row 4)

Thus we get a signal decomposition into periodical functions.
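The rounding construction can be reproduced numerically. This sketch scales the 4-point DCT matrix (orthonormal scaling assumed) by 2.5 and rounds every coefficient, then applies the resulting integer matrix to two short signals; the function names are illustrative.

```python
import math

def dct_matrix(n):
    rows = []
    for k in range(n):
        c = math.sqrt(0.5) if k == 0 else 1.0
        rows.append([c * math.sqrt(2.0 / n) *
                     math.cos((m + 0.5) * k * math.pi / n) for m in range(n)])
    return rows

def integer_transform(alpha=2.5, n=4):
    """Round the scaled DCT coefficients to the nearest integer."""
    return [[int(round(alpha * v)) for v in row] for row in dct_matrix(n)]

def apply(h, samples):
    return [sum(c * s for c, s in zip(row, samples)) for row in h]

H = integer_transform()
for row in H:
    print(row)
print(apply(H, [5, 5, 5, 5]))   # constant signal: only the first output is nonzero
print(apply(H, [1, 2, 3, 4]))
```

All coefficients are 1 or 2 in magnitude, so the transform needs only additions, subtractions and doublings.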

ENERGY IN THE DCT DOMAIN

[Figure: picture block -> DCT -> compression -> inverse DCT, showing the bits per pel given to the DCT coefficients, ordered from the lowest frequency (DC, large entropy) to the highest frequency (small entropy).]

Equal bit allocation: average 8 bit/pel
Unequal bit allocation (e.g. 10, 8, 4, 2 bits): average 3.2 bit/pel

QUANTIZATION

Quantization means removing information which is

not relevant.

Example: rounding of numbers,

round(4.076756) = 4

It turns out that high frequency information is not

very relevant for human vision. It can be thus

removed.

QUANTIZATION

High frequencies in DCT can be removed by quantizing. Let K be a value and n the quantization step; we perform the operation:

n x round(K/n)

This rounds K to the nearest multiple of n, so the result lies in the interval delimited by the values K-n/2 and K+n/2.
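The operation n x round(K/n) in code. This is a sketch; rounding is written as floor(x + 0.5), so ties go upward, which is an arbitrary choice for this example.

```python
import math

def quantize(k, n):
    """Round k to the nearest multiple of n (ties go upward)."""
    return n * math.floor(k / n + 0.5)

for k in (23, 26, -17, 4.076756):
    step = 1 if isinstance(k, float) else 10
    print(k, "->", quantize(k, step))
```

The error of the result is never larger than n/2, and a larger step n removes more information.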

We can round numbers in such intervals:

QUANTIZATION INTERVALS

[Figure: input value f versus quantized value for two quantizers.]

Uniform symmetric midtread

Uniform symmetric midrise

QUANTIZATION MATRICES FOR DCT

For luminance Y:

16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99

For chrominance U, V:

17 18 24 47 99 99 99 99
18 21 26 66 99 99 99 99
24 26 56 99 99 99 99 99
47 66 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99

Each number in the DCT matrix is quantized (divided and rounded) by the number in the same position of the quantization matrix above. Notice that high frequencies have much higher quantization values.


• QUANTIZATION: THE DCT VALUES ARE DIVIDED BY SPECIAL CONSTANTS AND ROUNDED

QUANTIZATION TABLE

3 5 7 9 11 13 15 17
5 7 9 11 13 15 17 19
7 9 11 13 15 17 19 21
9 11 13 15 17 19 21 23
11 13 15 17 19 21 23 25
13 15 17 19 21 23 25 27
15 17 19 21 23 25 27 29
17 19 21 23 25 27 29 31

AFTER QUANTIZATION OF THE BLOCK AFTER DCT SHOWN ABOVE

61 -3 2 0 2 0 0 -1
4 -4 2 0 0 0 0 0
-1 -2 0 0 -1 0 -1 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 -1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
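A sketch of quantizing the first row of the DCT block with the first row of the table. The table entries follow the pattern 3 + 2(i + j). Two variants are shown because the rule is ambiguous: rounding to the nearest integer, as the text says, and truncation toward zero, which appears to be closer to the numbers on the slide (185/3 = 61.67 is shown as 61). Which one the lecture used is an assumption.

```python
import math

def quant_table(size=8):
    """Table from the slide: entry (i, j) equals 3 + 2 * (i + j)."""
    return [[3 + 2 * (i + j) for j in range(size)] for i in range(size)]

def quantize_row(coeffs, steps, truncate=False):
    if truncate:
        return [int(c / q) for c, q in zip(coeffs, steps)]       # toward zero
    return [math.floor(c / q + 0.5) for c, q in zip(coeffs, steps)]

def dequantize_row(levels, steps):
    return [l * q for l, q in zip(levels, steps)]

Q = quant_table()
dct_row = [185, -17, 14, -8, 23, -9, -13, -8]    # first row of the DCT block
rounded = quantize_row(dct_row, Q[0])
truncated = quantize_row(dct_row, Q[0], truncate=True)
print(rounded)
print(truncated, dequantize_row(truncated, Q[0]))
```

Either way most high-frequency values become zero, and multiplying back by the table gives only an approximation of the original coefficients.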

Another example: reconstruction of a block from quantized DCT coefficients

We see that the approximation is better when more coefficients are taken.

THE ROLE OF DCT AND QUANTIZATION

Quantized DCT coefficients preserve the content of small picture blocks very effectively. That is, relevant perceptual information is well preserved and non-relevant information is eliminated.

DCT is thus very good at representing image features with minimized information. This is confirmed in practice, since the DCT is used in the image and video compression standards JPEG and MPEG. These standards are used in digital cameras, digital television, DVD discs and internet media players.

• Minimization of information in video

Video is composed of picture sequences,

25-30 pictures per second

One can observe that video is composed of 'shots' or 'scenes'. These are short segments which have the same content. In a single shot the difference between two subsequent pictures (taken at a 40 ms interval) is very small.

Information representing a video scene can be minimized as follows:
- Pick and compress the first picture
- Calculate the motion-compensated difference between the second picture and the first one
- Calculate the motion-compensated difference between the restored second picture and the third one
- Continue for all pictures in the scene

So we only need information about the first (compressed) picture and the differences between the other pictures to preserve the initial information from all pictures. This will result in a huge saving of information.
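The idea of keeping the first picture and only differences can be sketched as follows. This is a simplification: pictures are flat lists of pixel values, nothing is compressed, and plain differences are used without motion compensation. The function names are made up for this example.

```python
def encode_shot(frames):
    """Keep the first frame and the change from each frame to the next."""
    diffs = [[b - a for a, b in zip(prev, cur)]
             for prev, cur in zip(frames, frames[1:])]
    return frames[0], diffs

def decode_shot(first, diffs):
    frames = [list(first)]
    for d in diffs:
        frames.append([p + x for p, x in zip(frames[-1], d)])
    return frames

frames = [[10, 10, 10, 10],
          [10, 10, 12, 10],
          [10, 10, 12, 11]]
first, diffs = encode_shot(frames)
print(diffs)                        # mostly zeros within one shot
print(decode_shot(first, diffs) == frames)
```

Because the differences within a shot are mostly zero or small, they need far fewer bits than the pictures themselves.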

• Example

The difference is mostly caused by motion of objects

• Movement of objects: there is a problem with object borders. To avoid it we consider movements of small picture blocks and try to detect whether they moved.

• The difference between two pictures can be reduced if the motion vector of objects is found and the motion is compensated, that is, an object which moved in the second picture is moved back by its motion vector.
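Finding the motion vector of a block can be sketched as an exhaustive search. The matching measure (sum of absolute differences), the search range of 2 pixels and the function names are assumptions of this example.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a current and a reference block."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def find_motion_vector(cur, ref, top, left, size, search=2):
    """Full search: best (dy, dx) within +/-search pixels, with its SAD."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            if 0 <= ry <= len(ref) - size and 0 <= rx <= len(ref[0]) - size:
                cost = sad(cur, ref, top, left, ry, rx, size)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2], best[0]

def picture_with_square(top, left, side=6):
    pic = [[0] * side for _ in range(side)]
    for j in range(2):
        for i in range(2):
            pic[top + j][left + i] = 200
    return pic

ref = picture_with_square(1, 1)     # first picture
cur = picture_with_square(2, 3)     # object moved 1 down and 2 right
print(find_motion_vector(cur, ref, top=2, left=3, size=2))
```

The vector (-1, -2) points from the block in the second picture back to its position in the first one; after this compensation the remaining difference for the block is zero.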

16x16 blocks, 8x8 blocks, 4x4 blocks: the error is lower when the blocks are smaller.

• It is also possible to detect movements of blocks with greater accuracy than 1 pixel, by interpolation between pixels.

Half-pixel interpolation

Quarter-pixel interpolation

Difference images will be smaller.

Video information reduction

• Instead of having information about all pictures it is enough to have:

1. The first picture
2. Motion-compensated differences between subsequent pictures
3. Motion vectors representing movements of picture blocks

This is a very significant reduction of information, and it also provides information about the movement of objects, which is very important.
