
Image Databases

In conventional relational databases, the user types in a query and obtains an answer in response. Image databases are different: a police officer may issue a query such as "Retrieve all pictures from the image database that are 'similar' to this person and give the identities of the people."

This query is fundamentally different from ordinary queries for two reasons:

1. The query includes a picture as part of the query.
2. The query asks about similar pictures and therefore uses a notion of "imprecise match".

Raw Images

The content of an image consists of all "interesting" objects in that image. Each object is characterized by:

• a shape descriptor, which describes the shape/location of the region within which the object is located inside the image;
• a property descriptor, which describes the properties of individual pixels (e.g., the RGB values of a pixel, RGB values aggregated over a group of pixels, grayscale levels).

A property consists of:

• a property name, e.g., red, green, blue, texture;
• a property domain: the range of values that the property can assume, e.g., {0, 1, ..., 7}.

Images

Every image is associated with a pair of positive integers (m, n), called the grid resolution, which divides the image into m × n cells of equal size (called the image grid).

Each cell consists of a collection of pixels.

A cell property is a triple (Name, Values, Method).

Examples:

• (bwcolor, {b, w}, bwalgo), where the possible values are b (black) and w (white), and bwalgo is an algorithm that takes a cell as input and returns either black or white by somehow combining the black/white levels of the pixels in the cell.
• (graylevel, [0, 1], grayalgo), where the possible values are real numbers within the interval [0, 1].
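The (Name, Values, Method) triple can be sketched as a small record whose method maps a cell to one of the allowed values. This is only an illustration: the cell encoding, the majority-vote rule used for bwalgo, and the mean used for grayalgo are assumptions, since the slides only say the pixel levels are combined "somehow".

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed encoding: a cell is a flat list of pixel gray levels in [0, 1].
Cell = List[float]


@dataclass
class CellProperty:
    name: str                         # Name
    values: object                    # Values: a description of the domain
    method: Callable[[Cell], object]  # Method: maps a cell to a value


def bwalgo(cell: Cell) -> str:
    """Return 'b' if most pixels are dark, else 'w' (assumed majority vote)."""
    dark = sum(1 for p in cell if p < 0.5)
    return "b" if dark > len(cell) / 2 else "w"


def grayalgo(cell: Cell) -> float:
    """Return the mean gray level of the cell (assumed combination rule)."""
    return sum(cell) / len(cell)


bwcolor = CellProperty("bwcolor", {"b", "w"}, bwalgo)
graylevel = CellProperty("graylevel", (0.0, 1.0), grayalgo)
```

Calling `bwcolor.method(cell)` or `graylevel.method(cell)` then yields the property value for a given cell.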

Image Database

An image database is a triple (GI, Prop, Rec), where:

• GI is a set of gridded images of the form (Image, m, n);
• Prop is a set of cell properties;
• Rec is a mapping that associates with each image a set of rectangles denoting objects (in fact, these regions do not necessarily have to be rectangles).
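One possible way to hold the (GI, Prop, Rec) triple in memory is sketched below. The class names, the rectangle layout, and the helper methods are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed layout: (x_min, y_min, x_max, y_max) in cell coordinates.
Rect = Tuple[int, int, int, int]


@dataclass
class GriddedImage:
    name: str  # identifies the image
    m: int     # grid resolution, first component
    n: int     # grid resolution, second component


@dataclass
class ImageDatabase:
    gi: List[GriddedImage] = field(default_factory=list)      # GI
    prop: List[str] = field(default_factory=list)             # Prop (by name)
    rec: Dict[str, List[Rect]] = field(default_factory=dict)  # Rec

    def add_image(self, image: GriddedImage, rectangles: List[Rect]) -> None:
        """Register an image together with the rectangles of its objects."""
        self.gi.append(image)
        self.rec[image.name] = list(rectangles)

    def objects_of(self, name: str) -> List[Rect]:
        """Return the rectangles Rec associates with the named image."""
        return self.rec.get(name, [])
```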

Problems with Image Databases

• Images are often very large, so it is infeasible to explicitly store the properties on a pixel-by-pixel basis. This led to a family of image "compression" techniques, which attempt to compress the image into one containing fewer pixels.
• There is a need to determine the "features" of the image (compressed or raw). This is done by "segmentation": breaking up the image into a set of homogeneous rectangular regions called segments.
• There is a need to support "match" operations that compare either a whole image or a segmented image against another.

Image Compression

Lossy compression

• An image may contain details that the human eye cannot recognize; get rid of those details.
• DCT (Discrete Cosine Transform), DFT (Discrete Fourier Transform) and DWT (Discrete Wavelet Transform):
  • convert images from the spatial domain (the counterpart of the time domain for signals) to the frequency domain;
  • get rid of the frequencies which do not contain information.

Transforms

• DCT and DFT are similar concepts: both go from the time domain to the frequency domain.
• Given a signal of length n, these transforms return a sequence of n frequencies:
  • Sample 1, Sample 2, ..., Sample n transforms to:
  • Freq 1, Freq 2, ..., Freq n.
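The idea of transforming n samples into n frequencies and then discarding some of them can be sketched on a 1-D signal as follows. This is a minimal pure-Python illustration using the orthonormal DCT-II. The rule "keep the largest-magnitude coefficients" is an assumption; the slides only say that uninformative frequencies are dropped.

```python
import math


def dct(signal):
    """Orthonormal DCT-II: n samples in, n frequency coefficients out."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(coeffs):
    """Inverse of dct(): rebuilds the n samples from n coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(coeffs[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def compress(signal, keep):
    """Zero all but the `keep` largest-magnitude coefficients (lossy step)."""
    coeffs = dct(signal)
    ranked = sorted(range(len(coeffs)),
                    key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(ranked[:keep])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]
```

Decompression is then `idct(compress(signal, keep))`; the fewer coefficients are kept, the larger the reconstruction error.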

Why Do We Use the Transform?

• Noise removal is easier in the frequency domain.
• Various filters are easier to implement in the frequency domain.
• Compression: the transform gathers similar values together.

Desirable Properties of Transforms

DFT

• Invertibility: it is possible to get back the original image I from its DFT representation (useful for decompression).
  Note: practical implementations often combine the DFT with other, non-invertible operations and thus sacrifice invertibility.
• Distance preservation: the DFT preserves Euclidean distance. This is important in image matching applications, where we often use distance measures to represent similarity levels.

DCT

• The DCT preserves all of the above.
• A given signal can be represented with fewer frequencies.

DWT

• DFT and DCT have no temporal locality: a change in one single part of the data changes all frequencies.
• Wavelets introduce locality.
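Both DFT properties can be checked numerically on small signals. The sketch below assumes the unitary normalization (a factor of 1/sqrt(n) in both directions), under which Euclidean distance is preserved exactly; with other normalizations it is preserved only up to a constant factor.

```python
import cmath
import math


def dft(signal):
    """Unitary DFT: scaled by 1/sqrt(n) so that distances are preserved."""
    n = len(signal)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                        for i, x in enumerate(signal))
            for k in range(n)]


def idft(freqs):
    """Inverse of dft(): recovers the original samples (invertibility)."""
    n = len(freqs)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(f * cmath.exp(2j * cmath.pi * k * i / n)
                        for k, f in enumerate(freqs))
            for i in range(n)]


def euclid(a, b):
    """Euclidean distance between two real or complex sequences."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))
```

For two signals a and b, `euclid(a, b)` and `euclid(dft(a), dft(b))` agree up to rounding error, and `idft(dft(a))` returns a (as complex numbers with negligible imaginary parts).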


Fractal Compression

Transform-based approaches benefit from the difference in visual perception at different frequencies. What else can we use for compression?

Self-similarity: we can find self-similarities in a given image and describe the image in terms of these similarities.


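Image Processing: Segmentation

Segmentation is the process of taking an image as input and cutting up the image into disjoint homogeneous regions.

Connected region: a region R is connected if any two cells in R can be linked by a sequence of cells $C_1, \ldots, C_n$ in R such that the Euclidean distance between $C_i$ and $C_{i+1}$ is 1 for all $i < n$.

Example:

• R1, R2 and R3 are each connected.
• $R_1 \cup R_2$ is connected.
• $R_2 \cup R_3$ is connected.
• $R_1 \cup R_2 \cup R_3$ is connected.
• $R_1 \cup R_3$ is not connected, because the Euclidean distance between (2,3) and (3,4) is $\sqrt{2} > 1$.

[Figure: a 4 × 4 grid, with both axes labelled 1 to 4, showing the regions R1, R2 and R3.]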
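Connectivity in this sense can be tested with a breadth-first search that only steps between cells at Euclidean distance 1 (horizontal or vertical neighbours). The sketch assumes cells are given as (x, y) integer pairs and treats an empty set as not being a region.

```python
from collections import deque


def is_connected(region):
    """True if all cells are mutually reachable through steps of distance 1."""
    cells = set(region)
    if not cells:
        return False  # assumption: the empty set is not a region
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        # Only the four axis-aligned neighbours lie at Euclidean distance 1.
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in cells and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == cells
```

The pair of cells (2,3) and (3,4) from the example fails this test because they touch only diagonally.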

Measuring Homogeneity

Homogeneity predicate: a function H that takes any connected region as input and returns either true or false.

Example 1:

Suppose $\delta$ is some real number between 0 and 1. $H^{\delta}_{bw}$ can be defined as follows: $H^{\delta}_{bw}(R)$ is true if at least $(100 \cdot \delta)\%$ of the cells in R have the same color.

Region   # of black cells   # of white cells
R1       800                200
R2       900                100
R3       100                900

Region   H^{0.8}_{bw}   H^{0.89}_{bw}   H^{0.92}_{bw}
R1       true           false           false
R2       true           true            false
R3       true           true            false
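The black/white predicate can be evaluated directly from the cell counts. The sketch below assumes "at least" semantics for the threshold, since that is what reproduces the table entry for R1 at 0.8 (800 of 1000 cells is exactly 80%); the function name and the name delta for the threshold are illustrative.

```python
def h_bw(black, white, delta):
    """True if at least a fraction `delta` of the cells share one color."""
    total = black + white
    return max(black, white) / total >= delta


# Cell counts (black, white) for the three regions in the table.
REGIONS = {"R1": (800, 200), "R2": (900, 100), "R3": (100, 900)}

for name, (black, white) in REGIONS.items():
    row = [h_bw(black, white, d) for d in (0.8, 0.89, 0.92)]
    print(name, row)
```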

Measuring Homogeneity

Example 2:

Suppose each cell has a real value between 0 and 1, called its bw-level, and suppose f assigns a value between 0 and 1 to each cell. Let $\eta$ be the noise factor and $\delta$ a threshold. $H^{f,\eta,\delta}(R)$ is true if

$$\frac{\left|\{(x,y) \in R : |\mathit{bwlevel}(x,y) - f(x,y)| < \eta\}\right|}{m \times n} > \delta$$
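A direct reading of this predicate counts the cells whose bw-level lies within the noise factor of f and compares that count, divided by m × n, with the threshold. In the sketch below the noise factor and threshold are named eta and delta, the maps bwlevel and f are plain dictionaries keyed by cell, and the count is restricted to the cells of R; all of these are assumptions made for illustration.

```python
def h_f(region, bwlevel, f, eta, delta, m, n):
    """True if the share of cells with |bwlevel - f| < eta exceeds delta.

    The count is divided by m * n, the number of cells in the image grid.
    """
    close = sum(1 for cell in region if abs(bwlevel[cell] - f[cell]) < eta)
    return close / (m * n) > delta


# Hypothetical 2 x 2 image: three of the four cells lie within 0.1 of f = 0.5.
cells = [(1, 1), (1, 2), (2, 1), (2, 2)]
bwlevel = dict(zip(cells, [0.45, 0.55, 0.9, 0.52]))
f = {cell: 0.5 for cell in cells}
print(h_f(cells, bwlevel, f, eta=0.1, delta=0.5, m=2, n=2))
```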

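Segmentation

Given an image I with m × n cells, a segmentation of I with respect to a homogeneity predicate H is a set of regions $R_1, \ldots, R_k$ such that:

• $R_i \cap R_j = \emptyset$ for all $1 \le i \ne j \le k$;
• $I = R_1 \cup \ldots \cup R_k$;
• $H(R_i) = \text{true}$ for all $1 \le i \le k$;
• for all distinct i, j with $1 \le i, j \le k$ such that $R_i \cup R_j$ is a connected region, it is the case that $H(R_i \cup R_j) = \text{false}$.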

An Example of Segmentation

Applying $H_{dyn,0.03}$ to the following 4 × 4 image:

Row/Col   1      2      3      4
1         0.10   0.25   0.50   0.50
2         0.05   0.30   0.60   0.60
3         0.35   0.30   0.55   0.80
4         0.60   0.63   0.85   0.90

yields the following segmentation (cells are written as (column, row)):

R1 = {(1,1), (1,2)}
R2 = {(1,3), (2,1), (2,2), (2,3)}
R3 = {(3,1), (3,2), (3,3), (4,1), (4,2)}
R4 = {(3,4), (4,3), (4,4)}
R5 = {(1,4), (2,4)}
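The structural conditions of a segmentation can be checked mechanically on this example: the five regions should be pairwise disjoint, cover all 16 cells, and each be connected. The sketch reads cells as (column, row), an assumption that makes every listed region connected and groups similar gray levels together. It does not evaluate the homogeneity predicate itself, only the spread of levels per region.

```python
from collections import deque
from itertools import combinations

ROWS = [
    [0.10, 0.25, 0.50, 0.50],
    [0.05, 0.30, 0.60, 0.60],
    [0.35, 0.30, 0.55, 0.80],
    [0.60, 0.63, 0.85, 0.90],
]
# Assumed indexing: LEVEL[(column, row)], both starting at 1.
LEVEL = {(c + 1, r + 1): ROWS[r][c] for r in range(4) for c in range(4)}

REGIONS = {
    "R1": {(1, 1), (1, 2)},
    "R2": {(1, 3), (2, 1), (2, 2), (2, 3)},
    "R3": {(3, 1), (3, 2), (3, 3), (4, 1), (4, 2)},
    "R4": {(3, 4), (4, 3), (4, 4)},
    "R5": {(1, 4), (2, 4)},
}


def is_connected(cells):
    """Breadth-first search over neighbours at Euclidean distance 1."""
    cells = set(cells)
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in cells and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == cells


def is_partition(regions, all_cells):
    """True if the regions are pairwise disjoint and cover every cell."""
    parts = list(regions.values())
    disjoint = all(not (a & b) for a, b in combinations(parts, 2))
    return disjoint and set().union(*parts) == set(all_cells)


def spread(region):
    """Difference between the largest and smallest level in a region."""
    values = [LEVEL[cell] for cell in region]
    return max(values) - min(values)
```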

Segmentation Algorithm

Split:

• If the whole image is homogeneous, we are done.
• Otherwise, split the image into two parts and recursively repeat this process until we find a set of regions R1, ..., Rn such that each region is homogeneous.

Merge:

• Check whether any of the Ri's can be merged together.
• At the end of this step, we obtain a valid segmentation R1, ..., Rk.
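The split and merge steps can be sketched as follows. The slides do not say how the image is split in two or in what order merges are tried, so halving the longer side of a rectangle and merging the first eligible pair are assumptions. The homogeneity predicate is passed in as a function over a set of (x, y) cells, and single cells are treated as homogeneous so that splitting always terminates.

```python
from collections import deque
from itertools import combinations


def is_connected(cells):
    """Breadth-first search over neighbours at Euclidean distance 1."""
    cells = set(cells)
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in cells and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == cells


def split(x0, y0, x1, y1, homogeneous):
    """Halve the rectangle [x0..x1] x [y0..y1] until each part is homogeneous."""
    cells = frozenset((x, y) for x in range(x0, x1 + 1)
                      for y in range(y0, y1 + 1))
    if len(cells) == 1 or homogeneous(cells):
        return [cells]
    if x1 - x0 >= y1 - y0:  # split the longer side (assumed rule)
        mid = (x0 + x1) // 2
        return (split(x0, y0, mid, y1, homogeneous)
                + split(mid + 1, y0, x1, y1, homogeneous))
    mid = (y0 + y1) // 2
    return (split(x0, y0, x1, mid, homogeneous)
            + split(x0, mid + 1, x1, y1, homogeneous))


def merge(regions, homogeneous):
    """Merge pairs whose union is connected and homogeneous until none remain."""
    regions = list(regions)
    merged = True
    while merged:
        merged = False
        for a, b in combinations(regions, 2):
            union = a | b
            if is_connected(union) and homogeneous(union):
                regions.remove(a)
                regions.remove(b)
                regions.append(union)
                merged = True
                break
    return regions


def segment(m, n, homogeneous):
    """Split the m x n grid, then merge; returns a list of frozensets of cells."""
    return merge(split(1, 1, m, n, homogeneous), homogeneous)
```

For instance, a predicate such as "the levels in the region differ by at most some tolerance" can be supplied as `homogeneous`; that predicate is an illustrative stand-in, not the one used in the slides.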


Similarity Based Retrieval

The Metric Approach:

• Uses a distance measure d that can compare two images.
• The smaller the distance, the more similar they are.
• I.e., given an input image I, find the "nearest neighbor" of I in the image archive.

The Transformation Approach:

• The metric approach assumes that the notion of similarity is fixed.
• The transformation approach instead computes the cost of transforming one image into another, based on user-specified cost functions that may vary from one query to another.

The Metric Approach

We define a distance function d on a k-dimensional space (k = n + 2). The distance function satisfies the following properties:

• $d(x,y) = d(y,x)$
• $d(x,y) \le d(x,z) + d(z,y)$
• $d(x,x) = 0$

Example: let the image objects consist of 256 × 256 cells with 3 attributes (red, green, blue), each of which assumes a value from {0, ..., 7}. Then

$$d(o_1, o_2) = \sqrt{\sum_{i=1}^{256} \sum_{j=1}^{256} \left( \mathit{diffr}[i,j] + \mathit{diffg}[i,j] + \mathit{diffb}[i,j] \right)}$$

where

$\mathit{diffr}[i,j] = (o_1[i,j].\mathit{red} - o_2[i,j].\mathit{red})^2$
$\mathit{diffg}[i,j] = (o_1[i,j].\mathit{green} - o_2[i,j].\mathit{green})^2$
$\mathit{diffb}[i,j] = (o_1[i,j].\mathit{blue} - o_2[i,j].\mathit{blue})^2$

Such computations can be cumbersome (65536 expressions being computed inside the sum).
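The distance in this example can be computed with a double loop over the cells. The sketch below represents an image as a grid (list of rows) of (red, green, blue) triples and takes the square root of the summed squared differences so that the triangle inequality holds; treat the square root as an assumption if the original formula omits it.

```python
import math


def distance(o1, o2):
    """Euclidean distance between two images of (red, green, blue) cells."""
    total = 0
    for row1, row2 in zip(o1, o2):
        for (r1, g1, b1), (r2, g2, b2) in zip(row1, row2):
            # diffr + diffg + diffb for one cell [i, j]
            total += (r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2
    return math.sqrt(total)


# Hypothetical 2 x 2 images with attribute values drawn from {0, ..., 7}.
a = [[(0, 0, 0), (7, 7, 7)], [(1, 2, 3), (4, 4, 4)]]
c = [[(3, 0, 0), (7, 7, 7)], [(1, 2, 3), (4, 0, 4)]]
print(distance(a, c))
```

For a 256 × 256 image the inner expression is evaluated 65536 times per comparison, which is the cost that feature extraction aims to avoid.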

The Metric Approach

How can this massive similarity computation be avoided? Through feature extraction!

• Use a good feature extraction function fe to map objects into single points in an s-dimensional space, where s is typically small compared to n + 2.

This leads to two reductions:

• An object is originally a set of points in an (n+2)-dimensional space; in contrast, fe(o) is a single point.
• fe(o) is a point in an s-dimensional space, where s << (n + 2).

The feature extraction mapping must preserve the distance relationships in the original space.

[Figure: objects in the (n+2)-dimensional space are mapped into the s-dimensional space; an indexing algorithm builds an index (e.g., a quadtree or R-tree for s-dimensional data) over the object repository.]
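As a toy illustration of a feature extraction function, the sketch below maps an image to its mean color, a single point in a 3-dimensional space (s = 3). The choice of mean color is an assumption and is not taken from the slides. It does not preserve distances exactly, but after scaling by the square root of the number of cells it never overestimates the full distance, so it can serve as a cheap filter before the exact comparison.

```python
import math


def fe(image):
    """Map an image (grid of (r, g, b) triples) to its mean color."""
    cells = [cell for row in image for cell in row]
    count = len(cells)
    return tuple(sum(cell[ch] for cell in cells) / count for ch in range(3))


def feature_distance(p, q):
    """Euclidean distance between two feature points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def full_distance(o1, o2):
    """Euclidean distance computed over every cell and every channel."""
    return math.sqrt(sum((a - b) ** 2
                         for row1, row2 in zip(o1, o2)
                         for c1, c2 in zip(row1, row2)
                         for a, b in zip(c1, c2)))
```

With N cells per image, `math.sqrt(N) * feature_distance(fe(o1), fe(o2))` is a lower bound on `full_distance(o1, o2)`.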

Searching

• Finding the best matches: find the nearest neighbors of fe(o) in the tree using a nearest-neighbor search technique.
• Finding sufficiently similar objects: execute a range query in the tree with center fe(o) and a given radius.
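The two kinds of search can be sketched over a table of feature points. For brevity this sketch scans all points linearly instead of descending a quadtree or R-tree, so it shows what the queries return rather than how an index speeds them up; the function and variable names are illustrative.

```python
import math


def dist(p, q):
    """Euclidean distance between two feature points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_neighbors(index, query, k=1):
    """Names of the k stored objects whose feature points are closest to query."""
    return sorted(index, key=lambda name: dist(index[name], query))[:k]


def range_query(index, query, radius):
    """Names of all stored objects within `radius` of query, sorted by name."""
    return sorted(name for name, point in index.items()
                  if dist(point, query) <= radius)


# Hypothetical repository: object name -> feature point fe(o).
index = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
print(nearest_neighbors(index, (0.0, 0.0), k=2))
print(range_query(index, (0.0, 0.0), radius=1.5))
```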

The Transformation Approach

The main principle: the level of dissimilarity between o1 and o2 is proportional to the cost of transforming o1 into o2, or vice versa.

Transformation operators:

• translation
• rotation
• scaling (uniform and nonuniform)
• excision

A transformation of o into o' is a sequence of transformation operations $TS = (to_1, to_2, \ldots, to_r)$ such that $to_1(o) = o_1$, $to_2(o_1) = o_2$, ..., $to_r(o_{r-1}) = o'$.

Cost of the transformation: $\mathit{cost}(TS) = \sum_{i=1}^{r} \mathit{cost}(to_i)$
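A transformation sequence and its cost can be sketched by pairing each operator with a user-supplied cost. Objects are modelled here as lists of (x, y) points, only translation and scaling are shown, and the cost values are hypothetical; rotation and excision would follow the same pattern.

```python
def translate(dx, dy):
    """Return an operator that shifts every point by (dx, dy)."""
    return lambda pts: [(x + dx, y + dy) for x, y in pts]


def scale(sx, sy):
    """Return an operator that scales x by sx and y by sy (nonuniform if sx != sy)."""
    return lambda pts: [(x * sx, y * sy) for x, y in pts]


def apply_sequence(obj, ts):
    """Apply to_1, ..., to_r in order; ts is a list of (operator, cost) pairs."""
    for op, _cost in ts:
        obj = op(obj)
    return obj


def cost(ts):
    """cost(TS): the sum of the costs of the individual operations."""
    return sum(c for _op, c in ts)


# Hypothetical object and user-specified operator costs.
o = [(0, 0), (1, 0), (1, 1)]
ts = [(translate(2, 3), 1.0), (scale(2, 2), 2.5)]
print(apply_sequence(o, ts), cost(ts))
```

Under this model the dissimilarity of two objects would be judged by the cost of a sequence that turns one into the other, with the costs chosen per query.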


Transformation vs. Metric

Advantages of the transformation model:

• The user can set up his or her own notion of similarity by specifying certain transformation operators.
• The user may associate a cost function with each transformation operator.

Advantages of the metric model:

• By forcing the user to use only one similarity metric, the system can facilitate the indexing of data so as to optimize nearest-neighbor search.
