image processing and computer visionimagine.enpc.fr/~de-la-gm/cours/upem/cours_image_1.pdf · on...

Image Processing and Computer Vision: Pixel operations & Filtering

Martin de La Gorce


January 2015


Continuous image

A continuous grayscale image can be formalized as a function from $[0, W] \times [0, H]$ to $\mathbb{R}$. The value taken at a particular point is called the intensity.

A continuous color image can be formalized as a function from $[0, W] \times [0, H]$ to $\mathbb{R}^3$.


Discretization and quantization

A digital grayscale image is obtained from a continuous real-world image using two operations.

Sampling: the mean intensity of each pixel is obtained by integrating the continuous intensity over that pixel:

$$\forall (i,j) \in \mathbb{N}^2 : I_d(i,j) = \int_{x=i}^{i+1} \int_{y=j}^{j+1} I_c(x,y)\,dx\,dy \quad (1)$$

We obtain a discretized image with values in $\mathbb{R}$.


Discretization and quantization

Quantization: each real-valued pixel intensity is converted into an integer value. The values are multiplied by a constant $\alpha$ and rounded to the nearest integer:

$$I_q(i,j) = \lceil \alpha I_d(i,j) - 0.5 \rceil \quad (2)$$

with $\lceil x \rceil$ the ceiling operation: $\lceil x \rceil = \min\{i \in \mathbb{Z} \mid i \geq x\}$.
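A minimal sketch of equation (2) in Python, assuming the discretized image is stored as a list of rows of floats. The names `quantize` and `quantize_image`, and the choice $\alpha = 255$ for intensities in $[0, 1]$, are illustrative assumptions. Note that $\lceil v - 0.5 \rceil$ sends exact halves downwards (127.5 becomes 127).

```python
import math


def quantize(value: float, alpha: float) -> int:
    """Equation (2): scale by alpha, then round to the nearest integer with ceil(v - 0.5)."""
    return math.ceil(alpha * value - 0.5)


def quantize_image(image: list[list[float]], alpha: float) -> list[list[int]]:
    """Apply the quantization independently to every pixel (image stored as a list of rows)."""
    return [[quantize(v, alpha) for v in row] for row in image]


# Hypothetical example: real intensities in [0, 1] mapped to 0..255 with alpha = 255.
print(quantize_image([[0.0, 0.5], [0.2, 1.0]], 255))
```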


Discretization and quantization

A discrete image is made of a set of pixels.

We assume that the pixels are positioned on a regular square grid.

A grayscale image generally has its intensities coded on one byte, so each pixel intensity takes a value between 0 and 255.

A color image has its pixel intensities coded on 3 bytes, one for each color (red, green, blue), and each color channel takes values between 0 and 255.


Color images

Bayer filter:

Pixel sensors measure light intensity over the entire visible spectrum.

We add a color filter in front of each pixel sensor so that it lets through only the red, the green, or the blue part of the spectrum. The arrangement of these filters follows a pattern called the Bayer filter.

A color image with 3 values per pixel is obtained by interpolating the missing colors at each pixel from neighboring pixels (demosaicing).

[Figure: Bayer filter | interpolated image]


Spatial Transformations

Let $T$ be a planar transformation (from $\mathbb{R}^2$ to $\mathbb{R}^2$) defined by

$$T(x,y) = (T_x(x,y), T_y(x,y))$$

We could apply this transformation to an image as follows: for each discrete pixel in the source image, we copy its intensity to the rounded transformed location in the target image:

$$\forall (x,y) \in \mathbb{N}^2 : I'(T_x(x,y), T_y(x,y)) = I(x,y)$$

Problem: this may leave holes in the target image $I'$.

We instead proceed the other way round: for each pixel in the target image, we look up the intensity of the corresponding location in the source image:

$$\forall (x,y) \in \mathbb{N}^2 : I'(x,y) = I(T_x^{-1}(x,y), T_y^{-1}(x,y))$$
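The backward mapping above can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's implementation: images are lists of rows so that $I(x, y)$ is stored at `img[y][x]`, the inverse map is passed as a plain function, source locations are rounded to the nearest pixel, and target pixels whose source location falls outside the image receive a hypothetical `fill` value.

```python
import math


def warp_backward(src, inv_map, out_w: int, out_h: int, fill=0):
    """Backward mapping: every target pixel (x, y) reads the source at T^-1(x, y).

    Images are lists of rows, so I(x, y) is stored at img[y][x].
    Locations outside the source image receive `fill` (an assumption).
    """
    src_h, src_w = len(src), len(src[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            sx, sy = inv_map(x, y)
            ix, iy = math.ceil(sx - 0.5), math.ceil(sy - 0.5)  # nearest source pixel
            if 0 <= ix < src_w and 0 <= iy < src_h:
                out[y][x] = src[iy][ix]
    return out


# Translation by t = (1, 0): T^-1(x, y) = (x - 1, y).
print(warp_backward([[1, 2, 3], [4, 5, 6]], lambda x, y: (x - 1, y), 3, 2))
```

Because the loop runs over target pixels, every pixel of the output is written exactly once, so no holes can appear.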


Spatial transformations

Examples

zoom in/out around the center $c$:

$$T_x^{-1}(x,y) = \alpha(x - c_x) + c_x$$

$$T_y^{-1}(x,y) = \alpha(y - c_y) + c_y$$

translation by the vector $t = (t_x, t_y)$:

$$T_x^{-1}(x,y) = x - t_x$$

$$T_y^{-1}(x,y) = y - t_y$$

rotation by the angle $\theta$ (anticlockwise) around $c$:

$$T_x^{-1}(x,y) = \cos(\theta)(x - c_x) + \sin(\theta)(y - c_y) + c_x$$

$$T_y^{-1}(x,y) = -\sin(\theta)(x - c_x) + \cos(\theta)(y - c_y) + c_y$$
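The three inverse maps can be written as small Python function factories that return $T^{-1}$ as a function of $(x, y)$. The names `zoom_inv`, `translation_inv` and `rotation_inv` are hypothetical; the formulas are the ones listed above.

```python
import math


def zoom_inv(alpha, cx, cy):
    """Inverse map of a zoom around c = (cx, cy); alpha > 1 zooms out."""
    return lambda x, y: (alpha * (x - cx) + cx, alpha * (y - cy) + cy)


def translation_inv(tx, ty):
    """Inverse map of a translation by t = (tx, ty)."""
    return lambda x, y: (x - tx, y - ty)


def rotation_inv(theta, cx, cy):
    """Inverse map of a rotation by theta around c, as written on the slide."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda x, y: (c * (x - cx) + s * (y - cy) + cx,
                         -s * (x - cx) + c * (y - cy) + cy)


print(zoom_inv(2, 1, 1)(2, 1))          # source point read for target pixel (2, 1)
print(rotation_inv(0.3, 2, 3)(2, 3))    # the center of rotation maps to itself
```

Each returned function can be passed to a backward-mapping loop that evaluates $I(T_x^{-1}(x,y), T_y^{-1}(x,y))$ for every target pixel.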


Interpolation

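When we evaluate $I(T_x^{-1}(x,y), T_y^{-1}(x,y))$, the coordinates $T_x^{-1}(x,y)$ and $T_y^{-1}(x,y)$ are generally not integers. We need to interpolate values at non-integer positions:

nearest neighbor

$$I(x,y) = I(\lceil x - 0.5 \rceil, \lceil y - 0.5 \rceil)$$

bilinear

$$\begin{aligned} I(x,y) = {} & I(\lfloor x \rfloor, \lfloor y \rfloor)(1-\varepsilon_x)(1-\varepsilon_y) \\ & + I(\lfloor x \rfloor + 1, \lfloor y \rfloor)\,\varepsilon_x(1-\varepsilon_y) \\ & + I(\lfloor x \rfloor, \lfloor y \rfloor + 1)(1-\varepsilon_x)\,\varepsilon_y \\ & + I(\lfloor x \rfloor + 1, \lfloor y \rfloor + 1)\,\varepsilon_x \varepsilon_y \end{aligned}$$

with $\varepsilon_x = x - \lfloor x \rfloor$ and $\varepsilon_y = y - \lfloor y \rfloor$.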
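A direct transcription of the bilinear formula, as a sketch assuming images are lists of rows (`img[y][x]` holds $I(x, y)$). The slides do not say what happens on the last row or column; clamping the neighbor coordinates to the border is an assumption made here so that every position inside the image can be evaluated.

```python
import math


def bilinear(img, x: float, y: float) -> float:
    """Bilinear interpolation of I at a real position (x, y); I(x, y) is stored at img[y][x]."""
    x0, y0 = math.floor(x), math.floor(y)
    ex, ey = x - x0, y - y0  # epsilon_x, epsilon_y in [0, 1)
    h, w = len(img), len(img[0])

    def at(i: int, j: int):
        # Clamp to the border (an assumption) so that neighbours past the last row/column exist.
        return img[min(max(j, 0), h - 1)][min(max(i, 0), w - 1)]

    return (at(x0, y0) * (1 - ex) * (1 - ey)
            + at(x0 + 1, y0) * ex * (1 - ey)
            + at(x0, y0 + 1) * (1 - ex) * ey
            + at(x0 + 1, y0 + 1) * ex * ey)


img = [[0, 10], [20, 30]]
print(bilinear(img, 0.5, 0.5))  # average of the four pixels
```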


Aliasing

If we reduce the size of an image (zoom-out with $\alpha > 1$), we can get a moiré pattern due to aliasing.


Aliasing in 1D

Explanation with a one-dimensional signal:

red curve: continuous (unsampled or over-sampled) signal

blue points: sub-sampling points

blue curve: curve as reconstructed by our retina and brain from the blue points

Several curves (red and blue) pass through the sampled points. Our retina and brain favor the low-frequency interpretation.


Aliasing

Solution: we smooth the image with a local averaging operation over a neighborhood of size $\alpha$. This suppresses spatial frequencies above $1/\alpha$ before subsampling (smoothing is covered later in this lecture).
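The effect can be reproduced on a tiny example. This sketch assumes an integer reduction factor `a` (standing in for $\alpha$) and uses averaging over disjoint `a` × `a` blocks, i.e. a box smoothing evaluated only at the kept positions; the function names are hypothetical.

```python
def downsample_naive(img, a: int):
    """Keep one pixel out of a in each direction, without smoothing."""
    return [row[::a] for row in img[::a]]


def downsample_box(img, a: int):
    """Average each a x a block before keeping one value per block (integer factor a)."""
    h, w = len(img) // a, len(img[0]) // a
    return [[sum(img[y * a + j][x * a + i] for j in range(a) for i in range(a)) / (a * a)
             for x in range(w)]
            for y in range(h)]


# Vertical stripes alternating 0 and 255, one pixel wide.
stripes = [[(x % 2) * 255 for x in range(4)] for _ in range(4)]
print(downsample_naive(stripes, 2))  # the stripes alias into a uniform black image
print(downsample_box(stripes, 2))    # the averaged image is a uniform mid-gray
```

Without smoothing, the fine stripes are replaced by a completely different (here uniform black) pattern; with averaging, the result is the mid-gray one would expect from far away.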


Intensity transformations

It is often useful to modify the intensities of an image with a function applied independently to each pixel:

$$I_2(i,j) = f(I(i,j)) \quad (3)$$

The way we perceive the content of the image does not change much if the function is increasing (this happens naturally when wearing sunglasses, changing the contrast of a screen, etc.):

[Figure panels: $f(x) = x$ | $f(x) = x^3$ | $f(x) = 1 - x$]


Linear contrast change

We apply an affine transformation to the intensities:

$$I_2(i,j) = \alpha I(i,j) + \beta$$

Example with $\alpha = 0.5$, $\beta = 127$:

[Figure: transformation function (input and output intensities from 0 to 250) | transformed image]


Saturating

Saturating intensities

$$I_2(i,j) = \begin{cases} I(i,j) & \text{if } I(i,j) > \tau \\ 0 & \text{otherwise} \end{cases}$$

Example with $\tau = 127$:

[Figure: transformation function (input and output intensities from 0 to 250) | resulting image]


Thresholding

$$I_2(i,j) = \begin{cases} 255 & \text{if } I(i,j) > \tau \\ 0 & \text{otherwise} \end{cases}$$

Example with $\tau = 160$:

[Figure: transformation function (input and output intensities from 0 to 250) | resulting image]
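The three pixel-wise transformations (linear contrast change, saturation, thresholding) can be sketched as follows, assuming 8-bit integer images stored as lists of rows. Rounding the affine result with $\lceil v - 0.5 \rceil$ (as in equation (2)) and clipping it to 0..255 are assumptions; the slides only give the real-valued formula. All function names are hypothetical.

```python
import math


def affine(v: int, alpha: float, beta: float) -> int:
    """Linear contrast change, rounded as in eq. (2) and clipped to 0..255 (clipping is an assumption)."""
    return min(255, max(0, math.ceil(alpha * v + beta - 0.5)))


def saturate(v: int, tau: int) -> int:
    """Keep intensities above tau, set the others to 0."""
    return v if v > tau else 0


def threshold(v: int, tau: int) -> int:
    """Binary image: 255 above tau, 0 otherwise."""
    return 255 if v > tau else 0


def apply(img, f, *params):
    """Apply a pixel-wise transformation f to every pixel: I2(i, j) = f(I(i, j))."""
    return [[f(v, *params) for v in row] for row in img]


img = [[0, 100, 200, 255]]
print(apply(img, affine, 0.5, 127))
print(apply(img, saturate, 127))
print(apply(img, threshold, 160))
```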


Intensity transformations

For all these transformations, it can be useful to visualize the distribution of intensities in order to choose the right parameters (threshold, etc.). This is done by computing the histogram of the image.


Histogram

To visualize the distribution of pixel intensities, we compute the histogram of the image:

for $k \in \{0, \dots, \max_{i,j} I_q(i,j)\}$ we count the number of pixels whose intensity equals $k$:

$$h(k) = \operatorname{card}(\{(i,j) \mid I_q(i,j) = k\}) \quad (4)$$

with $\operatorname{card}(E)$ the cardinality of $E$, i.e. its number of elements.

[Figure: histogram, intensities 0 to 255 on the horizontal axis, pixel counts up to 9000 on the vertical axis]
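Equation (4) amounts to one counter per gray level. A minimal sketch, assuming a quantized image stored as a list of rows of non-negative integers (the function name is hypothetical):

```python
def histogram(img):
    """h[k] = number of pixels whose integer intensity equals k, for k = 0 .. max intensity."""
    top = max(max(row) for row in img)
    h = [0] * (top + 1)
    for row in img:
        for v in row:
            h[v] += 1
    return h


print(histogram([[0, 1, 2], [1, 2, 4]]))
```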


Cumulative histogram

For $k \in \mathbb{N}$ we count the number of pixels whose intensity is smaller than or equal to $k$:

$$H(k) = \operatorname{card}(\{(i,j) \mid I_q(i,j) \leq k\}) \quad (5)$$

$$= \sum_{l \leq k} h(l) \quad (6)$$

[Figure: cumulative histogram, intensities 0 to 255 on the horizontal axis, pixel counts up to 450000 on the vertical axis]
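Equation (6) says the cumulative histogram is the running sum of $h$. A self-contained sketch, assuming the same list-of-rows integer image as before; `itertools.accumulate` computes the running sum, and the function names are hypothetical.

```python
from itertools import accumulate


def histogram(img):
    """h[k] = number of pixels with intensity k."""
    top = max(max(row) for row in img)
    h = [0] * (top + 1)
    for row in img:
        for v in row:
            h[v] += 1
    return h


def cumulative_histogram(img):
    """H[k] = number of pixels with intensity <= k, i.e. the running sum of h (eq. (6))."""
    return list(accumulate(histogram(img)))


print(cumulative_histogram([[0, 1, 2], [1, 2, 4]]))
```

The last entry is always the total number of pixels.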


Cumulative histogram

The advantage of the cumulative histogram is that it generalizes easily to unquantized images (real-valued intensities):

for $x \in \mathbb{R}$ we count the number of pixels with a value smaller than or equal to $x$:

$$H(x) = \operatorname{card}(\{(i,j) \mid I_d(i,j) \leq x\}) \quad (7)$$

$H$ is a non-decreasing function that is piecewise constant, with discontinuities at a finite set of locations $D \subset \mathbb{R}$ defined by

$$D = \bigcup_{i,j} \{I_d(i,j)\}$$

For $x \in D$, $H(x)$ is the position (counted from 1) of the last occurrence of the intensity $x$ in the vector containing all pixel intensities sorted in increasing order.
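The sorted-vector interpretation gives a direct way to evaluate $H(x)$ for real-valued intensities: sort once, then count with a binary search. This sketch uses the standard-library `bisect.bisect_right`, which returns the number of sorted values $\leq x$; the function name `make_cumulative` is hypothetical.

```python
from bisect import bisect_right


def make_cumulative(img):
    """Return H(x) = card{(i, j) | I(i, j) <= x} for real-valued intensities."""
    values = sorted(v for row in img for v in row)

    def H(x: float) -> int:
        # bisect_right gives the number of sorted values <= x,
        # i.e. the 1-based position of the last occurrence of x when x is in D.
        return bisect_right(values, x)

    return H


H = make_cumulative([[0.5, 2.25], [0.5, 1.0]])
print([H(x) for x in (0.4, 0.5, 1.7, 3.0)])
```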


Contrast change

Suppose we apply a strictly increasing function $f$ to the pixel intensities:

$$I'(i,j) = f(I(i,j)) \quad (8)$$

Effect on the cumulative histogram:

$$H'(x) = \operatorname{card}(\{(i,j) \mid f(I_d(i,j)) \leq x\}) \quad (9)$$

$$= \operatorname{card}(\{(i,j) \mid I_d(i,j) \leq f^{-1}(x)\}) \quad (10)$$

$$= H(f^{-1}(x)) \quad (11)$$


Contrast change

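Effect on the cumulative histogram:

$$H'(f(x)) = \operatorname{card}(\{(i,j) \mid f(I_d(i,j)) \leq f(x)\}) \quad (12)$$

$$= \operatorname{card}(\{(i,j) \mid I_d(i,j) \leq x\}) \quad (13)$$

$$= H(x) \quad (14)$$

Example with $f(x) = x/2 + 127$:

[Figure: $H$ evaluated at $x = 126$ and $H'$ evaluated at $f(x) = 190$]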


Affine contrast change

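Effect of $f(x) = \alpha x + \beta$ on the cumulative histogram, for $\alpha > 0$:

$$H'(x) = \operatorname{card}(\{(i,j) \mid \alpha I_d(i,j) + \beta \leq x\}) \quad (15)$$

$$= \operatorname{card}(\{(i,j) \mid I_d(i,j) \leq (x - \beta)/\alpha\}) \quad (16)$$

$$= H((x - \beta)/\alpha) \quad (17)$$

The curve undergoes a horizontal dilation/compression by a factor $\alpha$ and a translation by an offset of $\beta$.

Example with $\alpha = 0.5$, $\beta = 127$:

[Figure: cumulative histograms, intensities 0 to 255, pixel counts up to 450000; marked points $x = 126$ and $f(x) = 190$]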


Linear contrast change

To compute the histogram of the transformed image, the image must be quantized, i.e.

$$I'(i,j) = \lceil \alpha I(i,j) + \beta - 0.5 \rceil$$

We observe:

a horizontal dilation/compression by a factor $\alpha$

a translation by $\beta$

a vertical dilation by a factor $1/\alpha$ if $1/\alpha$ is an integer, more complex otherwise (see next slide)

Example with $\alpha = 0.5$, $\beta = 127$:

[Figure: histogram before (pixel counts up to 9000) and after (pixel counts up to 18000) the transformation, intensities 0 to 255]


Linear contrast change

When $1/\alpha$ is not an integer, the histogram shows a "sawtooth" effect. Example with $\alpha = 2$, $\beta = 0$:

[Figure: histogram before (intensities 0 to 255) and after the transformation (horizontal axis shown from 0 to 100), pixel counts up to 9000]

This is because the histogram works with discrete intensity values: with $\alpha = 2$, every transformed intensity is even, so the odd intensities get zero counts.
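Both effects can be checked numerically on a hypothetical image that contains each intensity 0..99 exactly once. The transformation is the quantized affine change $\lceil \alpha v + \beta - 0.5 \rceil$ from the previous slide; clipping to 0..255 is an added assumption that does not change these two examples.

```python
import math


def contrast(v: int, alpha: float, beta: float) -> int:
    """Quantized affine contrast change ceil(alpha * v + beta - 0.5), clipped to 0..255."""
    return min(255, max(0, math.ceil(alpha * v + beta - 0.5)))


def histogram(values, n_levels: int = 256):
    h = [0] * n_levels
    for v in values:
        h[v] += 1
    return h


pixels = list(range(100))  # hypothetical image: one pixel per intensity 0..99

h_double = histogram(contrast(v, 2, 0) for v in pixels)      # alpha = 2
h_half = histogram(contrast(v, 0.5, 127) for v in pixels)    # alpha = 0.5, beta = 127

print(h_double[:8])       # odd intensities are never produced: sawtooth
print(h_half[127:131])    # two input levels merge into each output level
```

With $\alpha = 0.5$ every output bin from 127 to 176 receives exactly two pixels, which is the vertical dilation by $1/\alpha = 2$.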


Histogram equalization

We look for a nonlinear transformation $f$ that flattens the histogram so that all gray levels have the same number of pixels, i.e.

$$\forall k \in \{0, \dots, N-1\} : h'(k) = w \times h / N$$

with $N$ the number of gray levels (usually $N = 256$) and $w \times h$ the number of pixels in the image.

This increases the contrast of an image automatically, without choosing thresholds.

Because of the sawtooth effect mentioned above, and because in general $w \times h / N \notin \mathbb{N}$, it is difficult to obtain a perfectly uniform histogram.



Ideally we have:

$$\forall k \in \{0, \dots, N-1\} : h'(k) = w \times h / N$$

$$\Leftrightarrow \forall k \in \{0, \dots, N-1\} : H'(k) = (k+1)\, w \times h / N$$

We are looking for a linear cumulative histogram $H'$.

However, by construction the cumulative histogram is piecewise constant with discontinuities at $f(D)$, and in general $f(D) \neq \{0, \dots, N-1\}$. The $N$ equalities above therefore cannot all be satisfied.



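Instead, we require the restriction of $H'$ to $f(D)$ to be linear, i.e.

$$\forall x \in D : H'(f(x)) = (f(x) + 1)\, w \times h / N$$

Since $H'(f(x)) = H(x)$, this gives $f(x) = \dfrac{N \times H(x)}{w \times h} - 1$.

[Figure: $H(x)$ and $H'(x)$ with the target line $(x+1) \cdot w \cdot h / 255$; an intensity $x$ is mapped to $f(x)$ such that $H(x) = H'(f(x))$]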


Consequence: two consecutive intensities $x - 1$ and $x$ are mapped to values separated by a distance proportional to the number of pixels with intensity $x$ (we stretch where the density is high):

$$f(x) - f(x-1) = \frac{N \times h(x)}{w \times h}$$



Example with the image

$$\begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & 4 \end{pmatrix}$$

and $N = 6$. After equalization we obtain the image

$$\begin{pmatrix} 0 & 2 & 4 \\ 2 & 4 & 5 \end{pmatrix}$$

[Figure: histogram and cumulative histogram before and after equalization, intensities 0 to 5, counts 0 to 6]
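A sketch of the equalization map $f(x) = N \times H(x)/(w \times h) - 1$, reusing a sort plus `bisect_right` to evaluate $H$. The slides define $f$ on real values; rounding its output to the nearest level with $\lceil v - 0.5 \rceil$ and clipping at 0 are assumptions made here to obtain integer gray levels. On the $2 \times 3$ example with $N = 6$ it reproduces the equalized image above.

```python
import math
from bisect import bisect_right


def equalize(img, n_levels: int):
    """Histogram equalization with f(x) = N * H(x) / (w * h) - 1."""
    values = sorted(v for row in img for v in row)
    total = len(values)  # w * h

    def f(x):
        H = bisect_right(values, x)       # cumulative histogram H(x)
        return n_levels * H / total - 1

    # Rounding f to the nearest level (ceil(v - 0.5)) and clipping at 0 are assumptions:
    # the slides define f on real values.
    return [[max(0, math.ceil(f(v) - 0.5)) for v in row] for row in img]


print(equalize([[0, 1, 2], [1, 2, 4]], 6))
```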


Filters

So far we have seen transformations that operate on each pixel independently of the intensities of its neighbors.

Many image transformations amount to computing the intensity of a pixel in the new image as a linear combination of the neighboring pixels in the original image.


Convolution

If we store the coefficients of the linear combination as the intensities of an image $w$, this can be formalized as a discrete convolution, denoted by the symbol $*$:

$$(w * I)(x,y) = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} w(i,j)\, I(x-i, y-j)$$
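A direct implementation of the double sum, as a sketch under explicit assumptions: the kernel and the image are lists of rows, `w[j + Ky][i + Kx]` holds $w(i, j)$ with $w(0, 0)$ at the center of an odd-sized kernel, `img[y][x]` holds $I(x, y)$, and pixels outside the image count as 0 (the infinite sums of the definition then reduce to finite ones).

```python
def convolve(w, img):
    """(w * I)(x, y) = sum_{i, j} w(i, j) I(x - i, y - j).

    w and img are lists of rows: w[j + Ky][i + Kx] holds w(i, j) (odd sizes,
    w(0, 0) at the center) and img[y][x] holds I(x, y). Pixels outside the
    image are treated as 0, which is an assumption about the border.
    """
    Ky, Kx = len(w) // 2, len(w[0]) // 2
    n_rows, n_cols = len(img), len(img[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for y in range(n_rows):
        for x in range(n_cols):
            s = 0
            for j in range(-Ky, Ky + 1):
                for i in range(-Kx, Kx + 1):
                    xx, yy = x - i, y - j
                    if 0 <= xx < n_cols and 0 <= yy < n_rows:
                        s += w[j + Ky][i + Kx] * img[yy][xx]
            out[y][x] = s
    return out


delta = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # identity for convolution
shift = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]   # w(1, 0) = 1: shifts the image right by one pixel
img = [[1, 2, 3], [4, 5, 6]]
print(convolve(delta, img))
print(convolve(shift, img))
```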


Convolution

Properties of convolution:

it is a linear transformation:

$$w * (I_1 + I_2) = w * I_1 + w * I_2 \quad (18)$$

$$w * (\alpha I) = \alpha (w * I) \quad (19)$$

it is translation-invariant: if we define the translation by $t \in \mathbb{Z}^2$ as $I' = T(t, I) \Leftrightarrow I'(x,y) = I(x - t_x, y - t_y)$, then $w * T(t, I) = T(t, w * I)$

it is commutative: $f * g = g * f$

it is associative: $f * (g * h) = (f * g) * h$

the image $\delta(x,y) = \begin{cases} 1 & \text{if } x = y = 0 \\ 0 & \text{otherwise} \end{cases}$ is the identity for convolution, i.e. $f * \delta = f$


Filtering

A filter is defined as a linear, translation-invariant transformation of the image.

Theorem: every filter can be written as a convolution.


Filtering

Proof in 1D: let $F$ be the filter. Using $T$ and $\delta$, the 1D versions of the definitions on the previous slide, we decompose $I$ as a weighted sum of translated copies of $\delta$:

$$I = \sum_i I[i]\, T(i, \delta)$$

By linearity of $F$:

$$F(I) = \sum_i I[i]\, F(T(i, \delta))$$

Translation invariance gives:

$$F(I) = \sum_i I[i]\, T(i, F(\delta))$$

Finally:

$$F(I)[j] = \sum_i I[i]\, F(\delta)[j - i]$$

i.e. $F(I) = F(\delta) * I$: the filter is the convolution with its impulse response $F(\delta)$.


Correlation

Correlation, denoted $\circ$, is an operator similar to convolution:

$$(w \circ I)(x,y) = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} w(i,j)\, I(x+i, y+j)$$

Correlation is not commutative in general ($w \circ I \neq I \circ w$); $w$ is called the kernel.

The coefficients $w(i,j)$ are often represented as a matrix $W$ with an odd number of rows and columns, with $w(0,0)$ at the center. Example: $W = [-1, 0, 1] \Leftrightarrow w(0,-1) = -1$, $w(0,0) = 0$, $w(0,1) = 1$, and all other coefficients are 0.

Warning: many documents use the term convolution for what is actually a correlation¹...

¹ Example: http://fr.wikipedia.org/wiki/Filtre_de_Prewitt
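A sketch of correlation, and of the fact that convolving with $w$ is the same as correlating with the kernel flipped in both directions, $w'(i, j) = w(-i, -j)$. The storage convention is an assumption of this sketch (`w[j + Ky][i + Kx]` holds $w(i, j)$ with the first index horizontal, `img[y][x]` holds $I(x, y)$, zero outside the image); it indexes a one-row kernel by its horizontal offset, whereas the slide writes the indices of $W$ in matrix order.

```python
def correlate(w, img):
    """(w o I)(x, y) = sum_{i, j} w(i, j) I(x + i, y + j), zero outside the image.

    w[j + Ky][i + Kx] holds w(i, j) (odd sizes, w(0, 0) at the center);
    img[y][x] holds I(x, y).
    """
    Ky, Kx = len(w) // 2, len(w[0]) // 2
    n_rows, n_cols = len(img), len(img[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for y in range(n_rows):
        for x in range(n_cols):
            s = 0
            for j in range(-Ky, Ky + 1):
                for i in range(-Kx, Kx + 1):
                    xx, yy = x + i, y + j
                    if 0 <= xx < n_cols and 0 <= yy < n_rows:
                        s += w[j + Ky][i + Kx] * img[yy][xx]
            out[y][x] = s
    return out


def flip(w):
    """Mirror the kernel in both directions: w'(i, j) = w(-i, -j)."""
    return [row[::-1] for row in w[::-1]]


def convolve(w, img):
    """Convolution is correlation with the flipped kernel."""
    return correlate(flip(w), img)


W = [[-1, 0, 1]]       # one row: horizontal offsets -1, 0, 1 under this storage convention
ramp = [[0, 1, 2, 3]]
print(correlate(W, ramp))
print(convolve(W, ramp))
```

On the increasing ramp, correlation with $[-1, 0, 1]$ gives positive values in the interior while convolution with the same matrix gives the opposite sign, which is exactly the kind of confusion the warning above refers to.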

Moving average

We replace the intensity of each pixel by the mean of the intensities in a $(2N+1) \times (2N+1)$ neighborhood:

$$I'(x,y) = \frac{1}{(2N+1)^2} \sum_{i=-N}^{N} \sum_{j=-N}^{N} I(x-i, y-j)$$

[Figure: results for N = 0, N = 5, N = 20]


Moving average

For $N = 1$, the moving average can be written as

the correlation $I' = w \circ I$ with the kernel $W = \frac{1}{9}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$

or the convolution $I' = w * I$, since $w$ is symmetric


Moving average: separability

The computation can be rewritten in two steps: horizontal smoothing, then vertical smoothing (or the reverse):

$$I_h(x,y) = \frac{1}{2N+1} \sum_{i=-N}^{N} I(x-i, y)$$

$$I_s(x,y) = \frac{1}{2N+1} \sum_{j=-N}^{N} I_h(x, y-j)$$

This reduces the cost from $w \times h \times (2N+1)^2$ operations to $2 \times w \times h \times (2N+1)$.

[Figure: $I$, $I_h$, $I_s$]


Moving average: recursive computation

The two 1D filters can be written in recursive form:

$$I_h(x,y) = I_h(x-1,y) + \big(I(x+N,y) - I(x-N-1,y)\big)/(2N+1)$$

$$I_s(x,y) = I_s(x,y-1) + \big(I_h(x,y+N) - I_h(x,y-N-1)\big)/(2N+1)$$

This reduces the cost from $2 \times w \times h \times (2N+1)$ operations to $4 \times w \times h$.
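A sketch of the separable, recursive moving average, assuming images stored as lists of rows and samples outside the image counted as 0 (a border assumption). Keeping the un-normalized window sum and dividing at each step is algebraically the recursion above, and avoids accumulating rounding errors on integer input. The function names are hypothetical.

```python
def box_direct(row, N: int):
    """Mean over the window [x - N, x + N]; samples outside the row count as 0."""
    n = len(row)
    return [sum(row[k] for k in range(x - N, x + N + 1) if 0 <= k < n) / (2 * N + 1)
            for x in range(n)]


def box_recursive(row, N: int):
    """Same result with O(1) work per sample, using a running window sum.

    Keeping the un-normalized sum s and dividing at the end is equivalent to
    I_h(x) = I_h(x - 1) + (I(x + N) - I(x - N - 1)) / (2N + 1).
    """
    n = len(row)

    def get(k):
        return row[k] if 0 <= k < n else 0

    s = sum(get(k) for k in range(-N, N + 1))   # window sum at x = 0
    out = [s / (2 * N + 1)]
    for x in range(1, n):
        s += get(x + N) - get(x - N - 1)        # add the entering sample, drop the leaving one
        out.append(s / (2 * N + 1))
    return out


def box_filter_2d(img, N: int):
    """Separable moving average: horizontal pass (I_h), then vertical pass (I_s)."""
    rows = [box_recursive(r, N) for r in img]
    cols = [box_recursive(list(c), N) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


print(box_filter_2d([[9, 9, 9]] * 3, 1))
```

The work per pixel no longer depends on $N$: each pass adds one sample and removes one.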


Gaussian filter

We replace the intensity of each pixel by a weighted mean of the intensities in a $(2N+1) \times (2N+1)$ neighborhood, with weights that decrease with distance following a Gaussian law of standard deviation $\sigma$:

$$I'(x,y) = \frac{1}{Z} \sum_{i=-N}^{N} \sum_{j=-N}^{N} w(i,j)\, I(x-i, y-j)$$

with $w(i,j) = \exp\left(-\frac{i^2 + j^2}{2\sigma^2}\right)$ and $Z = \sum_{i=-N}^{N} \sum_{j=-N}^{N} w(i,j)$. We take $N/\sigma > 3$ to avoid cutting off the tail of the Gaussian.

[Figure: Gaussian weights over offsets from −20 to 20 (peak value 1.0); panels labeled σ = 5, σ = 5, σ = 20]


Gaussian filter: separability

Let $w_1(i) = \exp\left(-i^2/(2\sigma^2)\right)$ and $Z_1 = \sum_{i=-N}^{N} w_1(i)$. Then

$$w(i,j) = w_1(i) \times w_1(j) \quad \text{and} \quad Z = Z_1^2$$

We can therefore rewrite the computation in two steps: horizontal smoothing, then vertical smoothing (or the reverse):

$$I_h(x,y) = \frac{1}{Z_1} \sum_{i=-N}^{N} w_1(i)\, I(x-i, y)$$

$$I_s(x,y) = \frac{1}{Z_1} \sum_{j=-N}^{N} w_1(j)\, I_h(x, y-j)$$

This reduces the cost from $w \times h \times (2N+1)^2$ operations to $2 \times w \times h \times (2N+1)$.

[Figure: $I$, $I_h$, $I_s$]
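A sketch of the separable Gaussian filter, assuming images stored as lists of rows. The half-width is chosen as `N = int(3 * sigma) + 1`, which satisfies $N/\sigma > 3$; that choice, and clamping coordinates at the image border, are assumptions of this sketch rather than part of the slides. The normalized 1D weights $w_1(i)/Z_1$ are applied horizontally, then vertically.

```python
import math


def gaussian_kernel_1d(sigma: float):
    """Normalized 1D weights w1(i) / Z1 for i = -N..N, with N = int(3 * sigma) + 1 > 3 sigma."""
    N = int(3 * sigma) + 1
    w1 = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-N, N + 1)]
    Z1 = sum(w1)
    return [v / Z1 for v in w1]


def smooth_1d(signal, k):
    """1D weighted mean: out(x) = sum_i k(i) signal(x - i); borders clamped (an assumption)."""
    N = len(k) // 2
    n = len(signal)
    return [sum(k[i + N] * signal[min(max(x - i, 0), n - 1)] for i in range(-N, N + 1))
            for x in range(n)]


def gaussian_blur(img, sigma: float):
    """Separable Gaussian filter: horizontal pass (I_h) then vertical pass (I_s)."""
    k = gaussian_kernel_1d(sigma)
    rows = [smooth_1d(row, k) for row in img]             # I_h
    cols = [smooth_1d(list(c), k) for c in zip(*rows)]    # vertical pass, column by column
    return [list(r) for r in zip(*cols)]                  # transpose back to rows: I_s
```

Because the weights sum to 1, a constant image is left unchanged (up to floating-point rounding), which is a quick sanity check for any implementation.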

Derivative filter

Partial derivative of a 2D function along the x direction:

$$I_x = \frac{\partial I}{\partial x}(x,y) = \lim_{h \to 0} \frac{I(x+h, y) - I(x,y)}{h}$$

Approximation for a sampled image:

$$\frac{\partial I}{\partial x}(x,y) \approx I(x+1, y) - I(x,y)$$

Correlation formulation:

$$\frac{\partial I}{\partial x}(x,y) \approx \sum_{i=-1}^{1} w(i)\, I(x+i, y) = (w \circ I)(x,y)$$

with $w(-1) = 0$, $w(0) = -1$, $w(1) = 1$, i.e. $W = [0, -1, 1]$.

A problem with this filter is that it shifts the image by half a pixel, unlike the centered formulation (next slide), which does not shift the image.


Centered derivative filter

Centered formulation:

Partial derivative of a 2D function along the x direction:

$$\frac{\partial I}{\partial x}(x,y) = \lim_{h \to 0} \frac{I(x+h, y) - I(x-h, y)}{2h}$$

$$\frac{\partial I}{\partial x}(x,y) \approx \frac{1}{2}\big(I(x+1, y) - I(x-1, y)\big)$$

Correlation formulation:

$$\frac{\partial I}{\partial x}(x,y) \approx \sum_{i=-1}^{1} w(i)\, I(x+i, y) = (w \circ I)(x,y)$$

with $w(-1) = -1/2$, $w(0) = 0$, $w(1) = 1/2$, i.e. $W = [-1, 0, 1]/2$.


Centered derivative filter

$|I_x|$ large $\Leftrightarrow$ strong variation for a horizontal displacement (e.g. the left edge of the tongue)

$|I_y|$ large $\Leftrightarrow$ strong variation for a vertical displacement (e.g. the wrinkles on the forehead)

[Figure: $I_x$ | $I_y$]


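Second derivative filter

Second derivative along x:

$$\frac{\partial^2 I}{\partial x^2}(x,y) = \frac{\partial}{\partial x}\left(\frac{\partial I}{\partial x}\right)(x,y)$$

$$\frac{\partial^2 I}{\partial x^2}(x,y) \approx \frac{\partial I}{\partial x}(x,y) - \frac{\partial I}{\partial x}(x-1,y) \approx I(x+1,y) - 2I(x,y) + I(x-1,y)$$

Correlation formulation:

$$\frac{\partial^2 I}{\partial x^2}(x,y) \approx \sum_{i=-1}^{1} w(i)\, I(x+i, y) = (w \circ I)(x,y)$$

with $w(-1) = 1$, $w(0) = -2$, $w(1) = 1$, i.e. $W = [1, -2, 1]$.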
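The three derivative kernels can be compared on a 1D test signal $I(x) = x^2$, whose exact derivatives are $2x$ and $2$. This sketch assumes a 1D correlation with a centered kernel and zero outside the signal (so only interior samples are meaningful); the names are hypothetical.

```python
def correlate_1d(w, signal):
    """(w o I)(x) = sum_i w(i) I(x + i) with W centered on w(0); samples outside count as 0."""
    K = len(w) // 2
    n = len(signal)
    return [sum(w[i + K] * signal[x + i] for i in range(-K, K + 1) if 0 <= x + i < n)
            for x in range(n)]


FORWARD = [0, -1, 1]          # W = [0, -1, 1]
CENTERED = [-0.5, 0, 0.5]     # W = [-1, 0, 1] / 2
SECOND = [1, -2, 1]           # W = [1, -2, 1]

signal = [x * x for x in range(6)]   # I(x) = x^2, so dI/dx = 2x and d2I/dx2 = 2

forward = correlate_1d(FORWARD, signal)
centered = correlate_1d(CENTERED, signal)
second = correlate_1d(SECOND, signal)
print(forward[:5])    # 2x + 1: the slope at x + 1/2, hence the half-pixel shift
print(centered[1:5])  # 2x at the interior samples x = 1..4
print(second[1:5])    # constant 2
```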


Image gradient

Continuous definition:

$$\nabla I(x,y) = \left[\frac{\partial I}{\partial x}(x,y), \frac{\partial I}{\partial y}(x,y)\right]$$

Partial derivative of a 2D function along the direction $\vec{d} = (d_x, d_y)$:

$$\frac{\partial I}{\partial \vec{d}}(x,y) = \lim_{h \to 0} \frac{I(x + h d_x, y + h d_y) - I(x,y)}{h} = \nabla I(x,y) \cdot \vec{d}$$

If we view the image as a surface, the gradient is a 2D vector pointing in the direction of steepest slope, whose norm increases with the slope.

[Figure: $I$ and its gradient; $\|\nabla I(x,y)\|$; $\|\nabla I(x,y)\|$]

Image gradient


Canny edge detector

We smooth the image with a Gaussian filter to remove small details, including noise. This gives $I_s$. We then compute $I_x$ and $I_y$, the x and y derivatives of the smoothed image.

[Figure: $I_s$ | $I_x$ | $I_y$]


Canny

We compute the image $NG(x,y) = \|\nabla I_s(x,y)\| = \sqrt{I_x(x,y)^2 + I_y(x,y)^2}$, which is the norm of the gradient of the smoothed image.

[Figure: $\nabla I_s(x,y)$ | $NG$]


Canny

Thresholding NG is not enough, because the resulting edges are too thick (image below). The edges need to be thinned.

We look for the points that are local maxima of the gradient norm when compared with their neighbors along the segment oriented in the gradient direction.

We round the gradient direction to one of 8 possible directions $\{k\pi/4 \mid k \in \mathbb{Z}\}$.

We compare the gradient magnitude at the center with the magnitudes at the 2 neighbors in the discretized direction and in the opposite direction.

[Figure: $NG > 4$]

Canny

We compute the vector field $\vec{D}(x,y) = \nabla I_s(x,y) / \|\nabla I_s(x,y)\|$ of normalized gradients (orientations), and the angle $\theta(x,y)$ it forms with the vector $(1, 0)$.

[Figure: $\vec{D}(x,y)$ | $\theta(x,y)$]


We round the angle $\theta(x,y)$ to the nearest value in $\{k\pi/4 \mid k \in \mathbb{Z}\}$ to obtain $\tilde{\theta}(x,y)$. We then obtain the rounded directions $\tilde{D}(x,y) = [\mathrm{round}(\cos(\tilde{\theta}(x,y))), \mathrm{round}(\sin(\tilde{\theta}(x,y)))]$.

[Figure: $\tilde{D}(x,y)$ | $\tilde{\theta}(x,y)$]


Canny

We obtain the set $M \subset \{1, \dots, w\} \times \{1, \dots, h\}$ of local maxima as the set of image points satisfying both conditions:

$$NG(x,y) > NG(x + \tilde{D}_x(x,y), y + \tilde{D}_y(x,y))$$

$$NG(x,y) > NG(x - \tilde{D}_x(x,y), y - \tilde{D}_y(x,y))$$

[Figure: $M$]
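The non-maximum suppression steps (gradient norm, rounding of the direction to a multiple of $\pi/4$, comparison with the two neighbors) can be sketched as below. Assumptions of this sketch: $I_x$ and $I_y$ are given as lists of rows, coordinates are 0-based (the slide uses $1..w$), the angle is measured with the row index $y$ as second coordinate, and neighbors outside the image count as 0. The function names are hypothetical.

```python
import math


def gradient_norm(Ix, Iy):
    """NG(x, y) = sqrt(Ix^2 + Iy^2), images stored as lists of rows."""
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(Ix, Iy)]


def rounded_direction(ix: float, iy: float) -> tuple[int, int]:
    """Round the gradient angle to the nearest multiple of pi/4 and return D~ = (dx, dy)."""
    theta = math.atan2(iy, ix)                                   # angle with the vector (1, 0)
    theta_r = round(theta / (math.pi / 4)) * (math.pi / 4)       # nearest k * pi/4
    return round(math.cos(theta_r)), round(math.sin(theta_r))


def non_max_suppression(NG, Ix, Iy):
    """Set M of pixels (x, y) whose NG is strictly larger than both neighbours along D~."""
    h, w = len(NG), len(NG[0])

    def at(x, y):
        # Neighbours outside the image count as 0 (a border assumption).
        return NG[y][x] if 0 <= x < w and 0 <= y < h else 0.0

    M = set()
    for y in range(h):
        for x in range(w):
            dx, dy = rounded_direction(Ix[y][x], Iy[y][x])
            g = NG[y][x]
            if g > at(x + dx, y + dy) and g > at(x - dx, y - dy):
                M.add((x, y))
    return M


# A blurred vertical edge: the gradient is horizontal and peaks in column 2.
Ix = [[0, 1, 3, 1, 0]] * 3
Iy = [[0, 0, 0, 0, 0]] * 3
print(sorted(non_max_suppression(gradient_norm(Ix, Iy), Ix, Iy)))
```

The three-pixel-wide ridge of the gradient norm is thinned to the single column where it peaks.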

Canny: thresholding

We filter the edges using two thresholds $\tau_1$ and $\tau_2$ with $\tau_1 < \tau_2$.

We define two subsets $M_1$ and $M_2$ of $M$ as the maxima whose gradient magnitude is greater than $\tau_1$ and $\tau_2$ respectively:

$$M_1 = \{(x,y) \mid NG(x,y) > \tau_1\} \cap M$$

$$M_2 = \{(x,y) \mid NG(x,y) > \tau_2\} \cap M$$

[Figure: $M_1$ | $M_2$]


Canny: hysteresis

To extend the significant edges rather than truncate them:

We keep the points of $M_1$ that can be connected to a point of $M_2$ by a chain of points of $M_1$ that are pairwise neighbors (pixels sharing an edge or a corner).

In other words, we keep the set of points that belong to a connected region of $M_1$ containing at least one point of $M_2$.

[Figure: $M_1$ | $M_2$ | edges]
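The thresholding and hysteresis steps can be sketched as a breadth-first search that starts from the strong maxima $M_2$ and grows through 8-connected points of $M_1$. This assumes $NG$ is a list of rows and $M$ is a set of 0-based `(x, y)` pairs, as produced by a non-maximum suppression step; the function name is hypothetical.

```python
from collections import deque


def hysteresis(NG, M, tau1: float, tau2: float):
    """Keep the points of M1 that are 8-connected, through M1, to at least one point of M2."""
    M1 = {(x, y) for (x, y) in M if NG[y][x] > tau1}
    M2 = {(x, y) for (x, y) in M if NG[y][x] > tau2}   # subset of M1 since tau1 < tau2
    edges = set(M2)
    queue = deque(M2)
    while queue:                                        # breadth-first search from M2
        x, y = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                q = (x + dx, y + dy)                    # neighbours sharing an edge or a corner
                if q in M1 and q not in edges:
                    edges.add(q)
                    queue.append(q)
    return edges


NG = [[5, 2, 2, 0, 2, 2]]
M = {(0, 0), (1, 0), (2, 0), (4, 0), (5, 0)}
print(sorted(hysteresis(NG, M, 1, 4)))
```

The weak maxima at $x = 1, 2$ are kept because they touch the strong point at $x = 0$; those at $x = 4, 5$ are equally weak but isolated from any point of $M_2$, so they are discarded.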

