1
Fast Class Rendering Using Multiresolution Classification in
Discrete Cosine Transform Domain
Presented by Li-Jen Kao
July, 2005
2
Outline
Introduction
Feature Extraction
Classification Scheme
Experimental Results
Conclusion
3
1 Introduction
Classification of objects (or patterns) into a number of predefined classes has been extensively studied in a wide variety of applications, such as optical character recognition (OCR), speech recognition, and face recognition.
The design of classification systems can be considered in terms of two subproblems: feature extraction and classification.
4
Feature extraction: features are functions of the measurements performed on a class of objects.
Feature extraction has not found a general solution in most applications.
Our purpose is to design a general classification scheme that is less dependent on domain-specific knowledge.
Reliable and general features are therefore required.
5
Discrete Cosine Transform (DCT)
It helps separate an image into parts of differing importance with respect to the image's visual quality.
Due to the energy-compaction property of the DCT, much of the signal energy tends to concentrate at low frequencies.
6
Four advantages in applying DCT
The features extracted by the DCT are general and reliable, so it can be applied to most vision-oriented applications.
The amount of data to be stored can be reduced tremendously.
Multiresolution classification and progressive matching are achieved naturally.
The DCT is scale-invariant and less sensitive to noise and distortion.
7
Two philosophies of classification
Statistical: the measurements that describe an object are treated only formally as statistical variables, neglecting their "meaning".
Structural: regards objects as compositions of structural units, usually called primitives.
8
2 Feature Extraction via DCT
The DCT coefficients C(u, v) of an N×N image represented by x(i, j) can be defined as
$$C(u,v) = \frac{2}{N}\,\alpha(u)\,\alpha(v)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x(i,j)\cos\!\left(\frac{(2i+1)u\pi}{2N}\right)\cos\!\left(\frac{(2j+1)v\pi}{2N}\right),$$

where

$$\alpha(w) = \begin{cases} 1/\sqrt{2} & \text{for } w = 0,\\ 1 & \text{otherwise.} \end{cases}$$
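The definition above can be computed directly with NumPy; a minimal sketch (the function name `dct2` is ours, not from the slides):

```python
import numpy as np

def dct2(x):
    """2-D DCT of an N x N image x, following
    C(u,v) = (2/N) a(u) a(v) * sum_i sum_j x(i,j)
             * cos((2i+1)u*pi/2N) * cos((2j+1)v*pi/2N)."""
    N = x.shape[0]
    n = np.arange(N)
    # cosine basis: B[u, i] = cos((2i+1) * u * pi / (2N))
    B = np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    a = np.full(N, 1.0)
    a[0] = 1 / np.sqrt(2)   # alpha(0) = 1/sqrt(2); alpha(w) = 1 otherwise
    return (2 / N) * np.outer(a, a) * (B @ x @ B.T)
```

For a constant image all of the energy lands in C(0, 0), which illustrates the energy-compaction property mentioned earlier.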
9
Figure 1. The DCT coefficients of the character image "為".
10
Figure 2. Illustration of the multiresolution ability of the DCT
(a) (b) (c) (d)
(a) The original image of size 48×48; (b) The reconstructed image of size 8×8; (c) The reconstructed image of size 16×16; (d) The reconstructed image of size 32×32.
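The multiresolution behaviour shown in Figure 2 can be imitated by keeping only the top-left k×k block of DCT coefficients and inverting the transform. A self-contained sketch (function names are ours; for simplicity it produces a low-resolution approximation at the original image size rather than a literally smaller bitmap):

```python
import numpy as np

def _basis(N):
    # cosine basis and normalization factors shared by the forward
    # and inverse transforms
    n = np.arange(N)
    B = np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    a = np.full(N, 1.0)
    a[0] = 1 / np.sqrt(2)
    return B, np.outer(a, a)

def dct2(x):
    N = x.shape[0]
    B, A = _basis(N)
    return (2 / N) * A * (B @ x @ B.T)

def idct2(C):
    N = C.shape[0]
    B, A = _basis(N)
    return (2 / N) * (B.T @ (A * C) @ B)

def reconstruct(x, k):
    """Zero out all but the k x k low-frequency DCT block, then invert."""
    C = dct2(x)
    C[k:, :] = 0
    C[:, k:] = 0
    return idct2(C)
```

With k equal to the image size the reconstruction is exact; smaller k gives progressively blurrier approximations, as in panels (b)-(d).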
11
3. The Proposed Classification Scheme
The ultimate goal of classification is to assign an unknown pattern x to one of M possible classes (c1, c2, …, cM).
Each pattern is represented by a set of D features and is viewed as a D-dimensional feature vector.
12
3.1. Our classification model
In the training mode: the feature extraction module finds the
appropriate features for representing the input patterns, and the classifier is trained.
In the classification mode: the trained classifier assigns the input
pattern to one of the pattern classes based on the measured features.
13
To alleviate the burden of the classification process, it is usually divided into two stages: coarse classification and fine classification.
14
Figure 3. Model for multiresolution classification
15
3.2. Coarse classification module
In the training mode:
The features of each training sample are first extracted by DCT and quantized.
Then the D most significant quantized DCT features of each training sample are transformed into a code, called the grid code (GC), which corresponds to a grid of the feature space partitioned by the quantization method.
The training samples with the same GC are similar and can be classified into a coarse class.
Therefore, the information about all possible GCs is gathered in the training mode.
16
In the classification mode:
The classes with the same GC as the test sample are chosen as candidates for the test sample.
17
3.2.1. Quantization
The 2-D DCT coefficient F(u, v) is quantized to F′(u, v) according to the following equation:

$$F'(u,v) = \left[\frac{F(u,v)}{Q}\right],$$

where [·] denotes rounding to the nearest integer and Q is the quantization step.
Most of the high-frequency coefficients will be quantized to zero and only the most significant coefficients will be retained.
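A sketch of this uniform quantization step (Q = 16 is an assumed step size; the slides do not give a value):

```python
import numpy as np

def quantize(F, Q=16):
    """Quantize DCT coefficients with a uniform step Q; coefficients
    smaller than about Q/2 in magnitude map to zero."""
    return np.round(F / Q).astype(int)
```

Applied to a typical coefficient block, only the large low-frequency entries survive.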
18
3.2.2. Grid Code Transformation
After the quantization process, the D most significant quantized DCT features of sample Oi are obtained, say [qi1, qi2, …, qiD].
The significance of each DCT coefficient is decided according to the following zigzag order: F(0,0), F(0,1), F(1,0), F(2,0), F(1,1), F(0,2), F(0,3), F(1,2), F(2,1), F(3,0), F(3,1),…, and so on.
Because the value of qij may be negative, for ease of operation we transform qij into a positive integer dij by adding a number, say kj, to qij.
In this way, object Oi can be transformed to a D-digit GC.
This process is called the grid code transformation (GCT).
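The GCT can be sketched in plain Python (the offsets k_j and the digit width are assumptions; the slides only require that each digit become non-negative):

```python
# Zigzag significance order quoted on this slide.
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2),
          (0, 3), (1, 2), (2, 1), (3, 0), (3, 1)]

def grid_code(Fq, D, k):
    """Fq: quantized DCT coefficients (list of lists); D: number of
    digits; k: per-digit offsets k_j making each digit non-negative.
    Returns the D-digit grid code as a tuple."""
    return tuple(Fq[u][v] + k[j] for j, (u, v) in enumerate(ZIGZAG[:D]))
```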
19
3.2.3. Grid Code Sorting and Elimination
After the GCT, we obtain a list of triplets (Ti, Ci, GCi), where Ti is the ID of a training sample, Ci is the ID of the class to which the training sample belongs, and GCi is the grid code of the training sample.
The list is then sorted in ascending order of GC.
Given the GC of a test sample, we can get a list of candidate classes of the same GC for the test sample.
20
Elimination of Redundancy
Redundancy occurs when training samples belonging to the same class have the same GC.
This redundancy can be eliminated by establishing an abstract lookup table that only contains the information about the GCs and their corresponding classes.
Then, given a GC, this table can tell the relevant classes very quickly by binary search.
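A sketch of such an abstract lookup table, using a sorted list of (GC, class) pairs and binary search as described (the function names are ours):

```python
from bisect import bisect_left, bisect_right

def build_table(samples):
    """samples: iterable of (grid_code, class_id) pairs; collapsing to a
    set removes the redundancy of same-class samples sharing a GC."""
    return sorted(set(samples))

def candidates(table, gc):
    """All classes whose grid code equals gc, found by binary search."""
    keys = [k for k, _ in table]   # recomputed here for brevity; precompute in practice
    lo, hi = bisect_left(keys, gc), bisect_right(keys, gc)
    return [c for _, c in table[lo:hi]]
```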
21
3.3. The fine classification module
Progressive matching method
Adding more DCT coefficients usually implies increasing the resolution level of an image.
If the current resolution is not high enough to distinguish one character from the others, we raise the resolution level so that the discrimination power is improved.
The establishment of the templates for each class
Templates are established in the DCT domain. The average DCT coefficients of size N×N are computed from the set of training samples of each class.
In this way, M sets of average DCT coefficients are obtained and serve as the templates, one per class.
22
The sum of squared differences (SSD) is used as the matching criterion.
The matching of x and Ti is decomposed into K iterations, each of which corresponds to matching under a block of size nk×nk.
After the kth iteration, the block size is enlarged from nk×nk to nk+1×nk+1 (nk+1 = nk + d).
The process is repeated until one of the stopping criteria is satisfied. The criteria are designed 1) to preserve enough signal energy in the block, and 2) to reject unqualified classes as soon as possible.
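A minimal sketch of this progressive matching loop (the block sizes and the rejection threshold are assumed parameters, not values from the slides):

```python
import numpy as np

def progressive_match(x, templates, sizes=(8, 16, 32), thresh=50.0):
    """x: DCT coefficients of the unknown pattern; templates: dict of
    class_id -> template DCT coefficients of the same shape.  At each
    resolution level the SSD over the top-left n x n block is computed,
    and classes exceeding thresh are rejected before the next level."""
    alive = set(templates)
    score = {}
    for n in sizes:
        for c in list(alive):
            score[c] = float(np.sum((x[:n, :n] - templates[c][:n, :n]) ** 2))
            if score[c] > thresh:
                alive.discard(c)   # early rejection at a low resolution
        if len(alive) <= 1:
            break                  # no further discrimination is needed
    return min(alive, key=score.get) if alive else None
```

Because most classes are rejected at the cheap low-resolution levels, the full-resolution SSD is computed for only a few candidates.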
23
4 Experimental Results
18600 samples (about 640 categories) were extracted from the Kin-Guan (金剛) bible.
Each character image was transformed into a 48×48 bitmap.
1000 of the 18600 samples were used for testing and the others were used for training.
The D most significant DCT coefficients were quantized and transformed into a GC for each sample.
24
Figure 4. Reduction and accuracy rate using our coarse classification scheme
25
Figure 5. Accuracy rate using both coarse and fine classification
26
5 Conclusions
This paper presents a multiresolution classification scheme based on the DCT for vision-based applications.
The DCT features of a pattern can be extracted progressively according to their significance.
On classifying an unknown object, most of the improbable candidate classes for the object can be eliminated at lower resolution levels.
Experiments were conducted for recognizing handwritten characters in Chinese palaeography and showed that our approach performs well in this application domain.
27
Future Work
Since only preliminary experiments have been made to test our approach, much work remains to improve this system.
For example, since features of different types complement one another in classification performance, classification accuracy could be improved by using different types of vision-oriented features simultaneously.