Need of Color Histogram
7/27/2019 Need of Color Histogram
http://slidepdf.com/reader/full/need-of-color-histogram 1/14
Key-frame extraction is a widely used method for video summarization. The key-frames are the characteristic frames of the video and represent meaningful information about its contents. The extracted key-frames can be arranged chronologically to generate a storyboard. In video archiving systems, the key-frames can be used for indexing in such a way that the content-based indexing and retrieval techniques developed for image retrieval can be applied to video retrieval.
The extracted key-frames must summarize the characteristics of the video, and the image characteristics of a video can be tracked by all the key-frames in time sequence. A common methodology for key-frame extraction is to compare consecutive frames based on some low-level Frame Difference Measures (FDMs). The frame difference is measured, and if this difference exceeds a certain threshold, the frame is selected as a key-frame; otherwise it is discarded. Low-level features commonly used for extraction include the color histogram, shape histogram, motion information and edge histogram.
We then apply the Discrete Wavelet Transform to the frame and decompose it into four sub-images. The frame difference between the current frame and the last extracted key-frame is computed using the color histogram, a shape feature descriptor and a texture feature. The obtained frame difference is then compared with a threshold; if the difference satisfies the threshold condition, the current frame is selected as a key-frame. By repeating this procedure for all frames, we extract the key-frames.
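The generic threshold-based extraction loop described above can be sketched as follows. This is a minimal illustration, not the method of this work: the `mad` measure (mean absolute pixel difference on flat gray-level lists) is a toy stand-in for the actual FDMs discussed later, and the threshold value is arbitrary.

```python
def extract_keyframes(frames, fdm, threshold):
    """Compare each frame against the last extracted key-frame using a
    Frame Difference Measure (FDM); keep the frame when the difference
    exceeds the threshold, otherwise discard it."""
    if not frames:
        return []
    keyframes = [0]  # the first frame is always taken as a key-frame
    for i in range(1, len(frames)):
        if fdm(frames[i], frames[keyframes[-1]]) > threshold:
            keyframes.append(i)
    return keyframes

def mad(a, b):
    # Toy FDM: mean absolute difference between two equal-length pixel lists.
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

video = [[10, 10, 10], [11, 10, 10], [90, 90, 90], [91, 90, 89]]
print(extract_keyframes(video, mad, threshold=20))  # [0, 2]
```

Frame 2 differs sharply from the last key-frame (frame 0), so it is kept; frames 1 and 3 are near-duplicates of their predecessors and are discarded.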
The main advantage of applying the wavelet transform to edge detection in a frame is the possibility of choosing the size of the details to be detected. When processing a 2-D frame, the wavelet analysis is performed separately for the horizontal, vertical and diagonal directions, so horizontal, vertical and diagonal coefficients are obtained separately. The 2-D discrete wavelet transform (DWT) decomposes the frame into sub-images: three details and one approximation. The approximation looks similar to the input image but is only 1/4 of the original size. The 2-D DWT is an extension of the 1-D DWT in both the horizontal and vertical directions, and successive decomposition is performed only on the low-pass output. The resulting sub-images from an octave (a single iteration of the DWT) are labeled, according to the filters used to generate them, as A (the approximation, or smoothed version of the original frame, which contains most of its information), H (preserves the horizontal edge details), V (preserves the vertical edge details), and D (preserves the diagonal details, which are greatly influenced by noise).
A frame is thus decomposed into Approximation (A), Horizontal (H), Vertical (V) and Diagonal (D) details, and two levels of decomposition are performed. After that, quantization is applied to the decomposed frame, where different quantization may be applied to different components, thus maximizing the amount of needed detail while ignoring less-wanted detail. This is done by thresholding, where some coefficient values for pixels in frames are 'thrown out' (set to zero) or some smoothing effect is applied to the image matrix.
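One octave of the 2-D Haar DWT described above can be sketched in pure Python on a small grid. This is an illustrative implementation, not the one used in the system (a real pipeline would use a wavelet library), and the A/H/V/D naming follows one common sub-band convention.

```python
def haar_1d(seq):
    # One level of the 1-D Haar transform: pairwise averages (low-pass)
    # followed by pairwise half-differences (high-pass).
    avg = [(seq[2 * i] + seq[2 * i + 1]) / 2 for i in range(len(seq) // 2)]
    dif = [(seq[2 * i] - seq[2 * i + 1]) / 2 for i in range(len(seq) // 2)]
    return avg + dif

def haar_2d(img):
    # One octave of the 2-D DWT: transform every row, then every column,
    # then split the result into the A, H, V and D sub-images.
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    out = [list(r) for r in zip(*cols)]          # transpose back
    n = len(img) // 2
    A = [r[:n] for r in out[:n]]   # approximation (low-low), 1/4 size
    V = [r[n:] for r in out[:n]]   # responds to vertical edges
    H = [r[:n] for r in out[n:]]   # responds to horizontal edges
    D = [r[n:] for r in out[n:]]   # diagonal details
    return A, H, V, D

A, H, V, D = haar_2d([[1, 2], [3, 4]])
print(A, H, V, D)  # [[2.5]] [[-1.0]] [[-0.5]] [[0.0]]
```

Successive decomposition would apply `haar_2d` again to A only, as the text describes.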
Need of color histogram
Color histograms are commonly used for key-frame extraction in frame-difference-based techniques. Each frame obtained from the video is added to the collection and analyzed to compute a color histogram, which shows the proportion of pixels of each color within the frame. The histogram difference is then calculated by comparing the histograms of successive frames using a distance measure. This value is used to identify key-frames by comparison with a threshold (T), which is automatically computed as the average frame difference over all extracted frames. Frames below the threshold (T) are discarded, and frames above it are taken as key-frames.
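The automatic-threshold scheme just described can be sketched as follows. For brevity this toy example treats a frame as a flat list of single-channel intensities and uses the L1 histogram difference; the bin count and frame data are illustrative.

```python
def color_histogram(frame, bins=4, max_val=256):
    # Proportion of pixels falling in each intensity range of the frame.
    hist = [0] * bins
    for p in frame:
        hist[p * bins // max_val] += 1
    return [h / len(frame) for h in hist]

def histogram_keyframes(frames):
    # Successive-frame histogram differences; the threshold T is set
    # automatically to the average difference, as described above.
    hists = [color_histogram(f) for f in frames]
    diffs = [sum(abs(a - b) for a, b in zip(hists[i], hists[i + 1]))
             for i in range(len(hists) - 1)]
    T = sum(diffs) / len(diffs)
    # frame i+1 is a key-frame when its difference from frame i exceeds T
    return [i + 1 for i, d in enumerate(diffs) if d > T]

frames = [[0, 0, 10, 10], [0, 5, 10, 15],
          [200, 200, 210, 210], [200, 205, 210, 215]]
print(histogram_keyframes(frames))  # [2]
```

Only frame 2, where the content jumps from dark to bright, exceeds the average difference and is kept.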
The color feature is one of the most widely used visual features for distinguishing frames. It is relatively robust to background complication and independent of image size and orientation. In frame extraction, the color histogram is the most commonly used color feature representation; statistically, it denotes the joint probability of the intensities of the three color channels.
The use of color in video processing is motivated by two principal factors:
(1) Color is a powerful descriptor that often simplifies object identification and extraction from a frame.
(2) Humans can discern thousands of color shades and intensities, compared to only about two dozen shades of gray. This second factor is particularly important in manual (i.e., performed by a human) image analysis.
The main purpose of the RGB color model is for the sensing, representation and display
of images in electronic systems such as televisions and computers, though it has also been used
in conventional photography. Before the electronic age, the RGB color model already had a solid
theory behind it based on human perception of colors.
We can represent the RGB model using a unit cube, as shown in Figure 3.2. Each point in the cube (or vector from the origin) represents a specific color. This model is well suited to setting the electron guns of a CRT. Note that for complementary colors the sum of the values equals white light (1,1,1). For example:
Red (1,0,0) + cyan (0,1,1) = white (1,1,1)
Green (0,1,0) + magenta (1,0,1) = white (1,1,1)
Blue (0,0,1) + yellow (1,1,0) = white (1,1,1)
SHAPE FEATURE EXTRACTION
In video summarization, depending on the application, some techniques require the shape representation to be invariant to translation, rotation and scaling, while others do not. There are many important feature components for describing the dissimilarity between frames, such as color, texture, shape and spatial relationship. Among these, shape contains the most attractive visual information for human perception. Compared to other features such as texture and color, shape representation is much more effective in semantically characterizing the contents of a frame.
In general, shape representation can be divided into two categories: boundary-based and region-based. The former uses only the outer boundary of the shape, while the latter uses the entire shape region. The most successful representatives of these two categories are transform coefficients and moment invariants.
An important step before shape extraction is edge point detection. Edges define the boundaries between regions in a frame, which helps with segmentation and object recognition. Edge detection is fundamental to low-level image processing, and good edges are necessary for higher-level processing. Traditional edge-detection algorithms such as gradient-based edge detectors, the Laplacian of Gaussian (LoG), zero crossing, and the Canny edge detector each suffer from particular limitations.
The construction of shape descriptors is even more complicated when invariance, with
respect to a number of possible transformations, such as scaling, shifting, and rotation, is
required.
For a given frame, an edge map is obtained using wavelet decomposition, and a shape feature vector is computed from moment invariants. Feature vectors are computed for all successive frames in the same way. The Canberra distance measure is used to calculate the distance between feature vectors, and a threshold (T) is set to discard similar frames for efficient key-frame extraction.
The similarity between two frames is obtained by evaluating the Canberra distance between the feature vectors of every nth frame and the successive (n+1)th frame in the video. Similar frames are discarded based on the distance between the obtained feature vectors. The Canberra distance is given by the equation:

CDk = Σ(i=1..n) |xi − yik| / (|xi| + |yik|)

where CD is the Canberra distance, x and y are the feature vectors of the nth and (n+1)th frames respectively, and n is the length of the feature vector (here, 28). Also k = 1 to m, where m is the number of frames in the given video. Frames are indexed based on the distance between every nth frame and (n+1)th frame, and similar frames are displayed in ranking order.
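The Canberra distance above can be implemented directly. One practical detail, assumed here rather than stated in the text, is skipping terms where both components are zero to avoid division by zero:

```python
def canberra(x, y):
    # CD = sum_i |x_i - y_i| / (|x_i| + |y_i|); a term where both
    # components are zero contributes nothing (avoids 0/0).
    total = 0.0
    for xi, yi in zip(x, y):
        denom = abs(xi) + abs(yi)
        if denom:
            total += abs(xi - yi) / denom
    return total

# Identical feature vectors give distance 0; each differing term
# contributes at most 1, so similar frames score low.
print(canberra([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(canberra([1.0, 2.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
```

In the described system the vectors would be the 28-element moment-invariant features of successive frames.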
TEXTURE FEATURE EXTRACTION
Texture is characterized by the spatial distribution of gray levels in a neighborhood. Though texture is widely used and intuitively obvious, it has no precise definition due to its wide variability. Frame textures can be artificially created or found in natural scenes captured in a frame, and they are one cue that can help in summarizing videos (video processing) or classifying images.
The two-dimensional discrete wavelet transform (DWT) is an effective tool for analyzing the frames of a video and capturing localized frame details in both the spatial and frequency domains. The DWT is efficiently implemented using the Haar wavelet, which applies iterative linear filtering and critical down-sampling to the original frame, yielding three high-frequency directional sub-bands at each scale level in addition to one low-frequency sub-band, usually known as the image approximation. The directional sub-bands are sparse sub-images exhibiting image details in the horizontal, vertical and diagonal orientations.
In the decomposition process, the input image undergoes a first-level decomposition to generate three detail sub-bands (H1, V1 and D1) and one image approximation (A1). At the second level, the approximation image (A1) undergoes the same process to produce a second scale level of image details (V2, H2 and D2) and a new image approximation (A2), yielding the 2-level DWT. At the third level, the approximation image (A2) undergoes the same process to produce a third scale level of image details (V3, H3 and D3) and a new image approximation (A3), yielding the 3-level DWT.
APPROACH
Key-frames are extracted based on the discrete wavelet transform. Using the wavelet transform, the texture frame is decomposed into four sub-images: the low-low, low-high, high-low and high-high sub-bands. To compute the wavelet features, in the first step the Haar wavelet is calculated for the whole frame, producing four sub-band images at each scale. The wavelet transform is a multi-resolution technique, which can be implemented as a pyramid or tree structure and is similar to sub-band decomposition. This process is continued until the third-level decomposition. The energy of all third-level decomposed frames is calculated using the energy level algorithm. Using the Euclidean distance, dissimilar frames are identified as key-frames: the greater the distance, the more dissimilar the frames.
The frames in the video can be discriminated by their textures. To invoke the system, the user must provide the first texture frame as the key-frame. The system then compares the frames with the similar visual attributes of the successive frame. This method covers feature extraction, the representation of textural descriptions, and the similarity-matching algorithm.
TEXTURE FEATURE CALCULATION
The texture frame is decomposed into four sub-images: the low-low, low-high, high-low and high-high sub-bands, i.e., Approximation (A), Horizontal (H), Vertical (V) and Diagonal (D) respectively. This process is continued until the third-level decomposition. The energy of all decomposed frames is calculated using the energy level algorithm, and the similarity between successive frames is obtained by comparing Euclidean distances.
ENERGY LEVEL ALGORITHM
1. Decompose the image into four sub-images.
2. Calculate the energy of all decomposed images at the same scale, using:

E = (1/(M·N)) Σi Σj |X(i,j)|

where M and N are the dimensions of the frame and X(i,j) is the coefficient at the ith row and jth column of the frame map.
3. Repeat from step 1 for the low-low sub-band image, until the third level is reached.
Using the above algorithm, the energy levels of the sub-bands are calculated, and the low-low sub-image is decomposed further. This is repeated twice to reach the third-level decomposition. These energy levels are stored for use in the Euclidean distance algorithm.
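The energy computation for a single sub-band can be sketched as follows. The original equation was lost in extraction, so the mean-absolute-value definition (a common choice consistent with the M, N and X(i,j) description above) is assumed here:

```python
def energy(band):
    # E = (1/(M*N)) * sum of |X(i, j)| over the sub-band coefficients
    # (mean absolute value; an assumed but common wavelet-energy form).
    M, N = len(band), len(band[0])
    return sum(abs(x) for row in band for x in row) / (M * N)

# Energies of the sub-bands at one decomposition level contribute to the
# texture feature vector; the low-low band would then be decomposed
# again for levels two and three.
A = [[4, 4], [4, 4]]
H = [[1, -1], [0, 0]]
print(energy(A), energy(H))  # 4.0 0.5
```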
5.7.2 EUCLIDEAN DISTANCE CALCULATION
7/27/2019 Need of Color Histogram
http://slidepdf.com/reader/full/need-of-color-histogram 7/14
The Euclidean distance between the vectors X and Y is given by

D = √(Σ(X − Y)²)

Using the above formula, the Euclidean distance is calculated between the nth frame and every (n+1)th frame in the database. This process is repeated until all frames have been compared with their adjacent frames. After completing the Euclidean distance algorithm, an array of Euclidean distances is obtained and then sorted.
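The distance computation and the sorted-distance array can be sketched directly; the energy vectors below are made-up placeholders for the stored sub-band energies:

```python
import math

def euclidean(x, y):
    # D = sqrt(sum((X - Y)^2)) between two feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Distance between each nth frame's energy vector and its successor's,
# collected into an array and sorted, as described above.
energies = [[1.0, 2.0], [1.0, 2.0], [4.0, 6.0]]
dists = [euclidean(energies[i], energies[i + 1])
         for i in range(len(energies) - 1)]
print(sorted(dists))  # [0.0, 5.0]
```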
5.8 Steps for key-frame extraction using texture feature extraction
1. Convert the input video into frames.
2. Decompose the frames using the Discrete Wavelet Transform.
3. Calculate the energy of the frame using the energy level algorithm.
4. Calculate the distance between frames using the Euclidean distance formula.
5. Compare the distance with the threshold and extract the key-frames.
FUZZY COMPREHENSIVE EVALUATION
METHODOLOGY
The evaluation of a key-frame extraction mechanism is inherently subjective. Moreover, the factors affecting a human user's decision to declare a frame a key-frame are not determined. This makes the problem of evaluating summaries well suited to fuzzy analysis. The purpose of fuzzy sets and fuzzy logic is to deal with problems involving knowledge expressed in vague linguistic terms. In a fuzzy set, each element of the universe of discourse is awarded a degree of membership using a membership function, which associates a grade with each linguistic variable. We use four linguistic terms to represent the quality of a video summary: Very Good (VG), Good (G), Average (A) and Poor (P). The Accuracy and Error Rates of the training videos are fuzzified using triangular membership functions to associate a degree of goodness or badness with the user summaries.
Next, Fuzzy Comprehensive Evaluation (FCE) is applied separately to compute the weight for each FDM. FCE is a well-known method that comprehensively judges the membership grade of the items to be evaluated based on several factors. We therefore evaluate each FDM on the factors of Accuracy and Error Rate by applying FCE.
In general, FCE has the following requirements:
1. A Factor Set U = {u1, u2, ..., um} composed of m different factors. These factors influence the evaluation of objects.
2. An Evaluation Set V = {v1, v2, ..., vn} composed of n types of remarks.
3. A Weight Set W = {w1, w2, ..., wm}, where Σ(i=1..m) wi = 1 and wi ≥ 0. Each member of this set represents the weight coefficient of the corresponding factor in the factor set U.
4. A fuzzy transformation Γf that transforms the factor set U into the evaluation set V.
5. A fuzzy relation R on U × V, defined as:

R = (nij)m×n

The membership degree of the subject to remark vj from the viewpoint of factor ui is given by:

nij = uR(ui, vj)

where nij ∈ [0, 1], i = 1, ..., m; j = 1, ..., n.
Based on the sets U, V, W and the fuzzy relation R, the evaluation is computed.
Algorithm for key-frame extraction based on color feature extraction
Step 1: All the frames are extracted from the input sports video.
Step 2: Consider the first frame as a key-frame.
Step 3: The current key-frame is converted into an RGB frame.
Step 4: The RGB frame is converted into an HSV frame.
Step 5: Select the subsequent HSV frame from the extracted frames and apply the discrete wavelet transform. The frame is divided into A, H, V and D components.
Step 5.1: The frame is decomposed until the 2nd level, finally obtaining 7 sub-component frames.
Step 5.2: Apply quantization to the subsequent frame.
Step 5.3: Apply normalization to the frame obtained above.
Step 6: Histogram creation
Step 6.1: Histogram creation: The normalization values of each section are averaged. The histogram values are measured for two channel values, i.e., hue and saturation.
Step 6.2: Concatenation: The histogram values of the HSV frame are concatenated.
Step 7: Distance calculation
Step 7.1: Distance calculation: The distance between consecutive frame histograms is calculated using the Sum of Absolute Differences (SAD) formula.
Step 7.2: The Sum of Absolute Differences (SAD) is calculated using the formula:

SAD(fq, ft) = Σ |fq[i] − ft[i]|

Step 8: SAD is compared with the threshold value to detect key-frames. Frames with SAD higher than the threshold are treated as key-frames.
Step 9: To detect key-frames based on the color histogram difference measure over the entire video, repeat steps 3 to 8.
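Steps 6 and 7 above (hue/saturation histogram creation, concatenation, and SAD) can be sketched as follows. The bin count, channel ranges and sample values are illustrative assumptions, and the wavelet, quantization and normalization stages are omitted for brevity:

```python
def hs_histogram(hue, sat, bins=8):
    # Concatenated hue/saturation feature for one HSV frame: quantize
    # each channel into `bins` bins, normalize to proportions, then
    # concatenate (steps 6.1 and 6.2). Hue assumed in [0, 360),
    # saturation in [0, 1].
    def hist(channel, max_val):
        h = [0] * bins
        for v in channel:
            h[min(int(v * bins / max_val), bins - 1)] += 1
        return [c / len(channel) for c in h]
    return hist(hue, 360.0) + hist(sat, 1.0)

def sad(fq, ft):
    # SAD(fq, ft) = sum_i |fq[i] - ft[i]|  (step 7.2)
    return sum(abs(a - b) for a, b in zip(fq, ft))

red_frame = hs_histogram([0.0, 0.0], [0.0, 0.0])
magenta_frame = hs_histogram([350.0, 350.0], [0.9, 0.9])
print(sad(red_frame, red_frame), sad(red_frame, magenta_frame))  # 0.0 4.0
```

A SAD of 0 means identical histograms; larger values flag a candidate key-frame when compared against the threshold of step 8.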
Algorithm for extracting key-frames based on shape feature extraction
Step 1: All the frames are extracted from the input sports video.
Step 2: Consider the first frame as a key-frame.
Step 3: Select the next subsequent frame from the extracted frames.
Step 4: Edge map creation: Four edge maps are obtained for each frame by multiplying the four masks with the approximation component.
Step 4.1: Get the first edge map by applying k1σ to the h, v and d components, combining them, and then multiplying the resulting mask with the approximation component.
Step 4.2: Obtain the second edge map similarly to the first, using k2σ.
Step 4.3: Get the third edge map by applying both k1σ and k2σ to the approximation component.
Step 4.4: Obtain the fourth edge map by finding the highest-intensity pixels (both positive and negative values) among the h, v and d components and multiplying with the approximation component.
Step 5: Shape representation: Seven moment invariants are computed for each edge map. As there are four edge maps for each image, the feature vector for an image consists of 28 features.
φ1 = η20 + η02
φ2 = (η20 − η02)² + 4η11²
φ3 = (η30 − 3η12)² + (3η21 − η03)²
φ4 = (η30 + η12)² + (η21 + η03)²
φ5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
φ6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03)
φ7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] + (3η12 − η30)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
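The first two of these invariants can be computed directly from normalized central moments ηpq. A minimal pure-Python sketch on a grayscale grid (nested lists), shown here only to illustrate the translation invariance that motivates their use:

```python
def hu12(img):
    # First two Hu moment invariants of a 2-D intensity grid:
    # phi1 = eta20 + eta02, phi2 = (eta20 - eta02)^2 + 4*eta11^2,
    # where eta_pq are normalized central moments.
    pts = [(i, j, v) for i, row in enumerate(img) for j, v in enumerate(row)]
    m00 = sum(v for _, _, v in pts)           # total mass
    xb = sum(i * v for i, _, v in pts) / m00  # centroid row
    yb = sum(j * v for _, j, v in pts) / m00  # centroid column
    def eta(p, q):
        mu = sum((i - xb) ** p * (j - yb) ** q * v for i, j, v in pts)
        return mu / m00 ** (1 + (p + q) / 2)  # scale normalization
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# The invariants are unchanged when the same shape is translated:
a = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
b = [[0, 0, 0], [0, 1, 1], [0, 1, 1]]
print(hu12(a) == hu12(b))  # True
```

In the described system, all seven invariants would be computed for each of the four edge maps, giving the 28-element feature vector.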
Step 6: Shape matching: The Canberra distance is used as the distance measure for similarity between two feature vectors.
Step 7: CD is compared with the threshold value to detect key-frames. Frames with CD higher than the threshold are treated as key-frames.
Step 8: To detect key-frames based on the shape difference measure over the entire video, repeat steps 3 to 7.
Algorithm for key-frame extraction based on texture feature extraction
Step 1: All the frames are extracted from the input sports video.
Step 2: Consider the first frame as a key-frame.
Step 3: Select the next subsequent frame from the extracted frames, convert it from RGB to a gray image, then divide the frame into Approximation (A), Horizontal (H), Vertical (V) and Diagonal (D) components.
Step 4: Energy calculation: Calculate the energy of all decomposed frames at the same scale, using:

E = (1/(M·N)) Σi Σj |X(i,j)|

where M and N are the dimensions of the image, and X(i,j) is the pixel coefficient at the ith row and jth column of the frame map.
Step 5: Repeat from step 4 for the low-low sub-band image, until the third level is reached.
Step 6: The Euclidean distance is calculated between the two frames:

D = √(Σ(X − Y)²)

Step 7: ED is compared with the threshold value to detect key-frames. Frames with ED higher than the threshold are treated as key-frames.
Step 8: To detect key-frames based on the texture feature measure over the entire video, repeat steps 3 to 7.
Calculation for fuzzy comprehensive evaluation
We modeled our problem in the FCE scenario as follows:
1. The accuracy rate (CUSA) and error rate (CUSE) are the criteria used to evaluate the efficacy of a specific measure; thus they make up the factor set U.
2. The evaluation set V includes the scale factors for CUSA and CUSE: Very Good (VG), Good (G), Average (A) and Poor (P). The values of these scale factors are determined by fuzzifying the accuracy and error rates of the training videos and then averaging the degrees of membership of each set.
3. For a better-quality video summary, the value of CUSA must be high and the value of CUSE must be low. Therefore, both factors are equally important and are assigned equal weights, giving the weight set W = (0.5, 0.5).
4. The fuzzy transformation Γf is defined as the column vector:

Γf = [1, 0.7, 0.4, 0]T
5. The single-factor evaluation matrix R is determined, b is computed using b = W ∘ R, and the weight/credibility of a technique is found by multiplying b with the fuzzy transformation Γf:

Weight = b · Γf

6. The process is repeated to determine the weight of each frame difference measure.
We include a numerical example for the determination of weights. For this example, training is shown on only 3 videos using a single FDM. The Accuracy and Error Rates, the degrees of membership after fuzzification, and the average values of each scale factor for a shape FDM are given in Table 1.
                  Degree of Membership (CUSA)        Degree of Membership (CUSE)
Video   CUSA      VG     G      A     P       CUSE   VG     G      A      P
1       0.73      0.45   0.8    0     0       0.41   0      0.34   0.3    0
2       0.5       0      0.75   0     0       0.21   0.16   0      0      0
3       0.66      0.05   0.45   0     0       0.56   0      0.11   0.95   0
AVG               0.17   0.66   0     0              0.05   0      0.41   0
Table 1: A numerical example for fuzzification of accuracy and error rates.
Using Table 1, the matrix R is given as:

R = [0.17  0.66  0     0
     0.05  0.11  0.41  0]

and W = [0.5 0.5]. Therefore b = W ∘ R is determined as b = [0.17 0.5 0.41 0].

Weight = b · Γf = [0.17 0.5 0.41 0] · [1, 0.7, 0.4, 0]T = 0.684
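The numerical example above can be reproduced in a few lines, assuming the max-min composition for the ∘ operator (which is consistent with b = [0.17, 0.5, 0.41, 0] as given):

```python
def fuzzy_compose(W, R):
    # Max-min composition b = W o R: b_j = max_i min(w_i, r_ij).
    return [max(min(w, row[j]) for w, row in zip(W, R))
            for j in range(len(R[0]))]

W = [0.5, 0.5]
R = [[0.17, 0.66, 0.00, 0.00],   # averaged CUSA memberships (VG, G, A, P)
     [0.05, 0.11, 0.41, 0.00]]   # averaged CUSE memberships (VG, G, A, P)
b = fuzzy_compose(W, R)
gamma_f = [1.0, 0.7, 0.4, 0.0]   # fuzzy transformation vector
weight = sum(bi * gi for bi, gi in zip(b, gamma_f))
print(b, round(weight, 3))  # [0.17, 0.5, 0.41, 0.0] 0.684
```

Repeating this computation for each FDM yields the per-measure weights described in step 6.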