International Journal of Video&Image Processing and Network Security IJVIPNS-IJENS Vol:12 No:05 1
120805-4343-IJVIPNS-IJENS © October 2012 IJENS I J E N S
Image to Excel Sheet Conversion and Measurement of Similarity Using VB.Net
Dr. Ebtesam Najim Abdullah Al-Shemmary
College of Education for Girls
Computer Science Department
University of Kufa, An Najaf, Iraq
Abstract: In this paper we present a new algorithm to design and implement a fully automatic system, with a high level of accuracy, that converts a digital image to an Excel sheet and compares two images to find the discrepancy between them. Matrix-based representations are a useful tool for exploring relationships between related records in a data set. A relationship can be any relation between two records, but it is generally a similarity or dissimilarity measure. Converting an image to an Excel file makes it possible to work directly with the image matrix and makes mathematical operations on the image much easier for the user. Experimental results on test images are given to demonstrate the performance of the proposed program and algorithms. The program was written in Visual Basic .NET.
Index Terms: Digital Image, Excel Workbook, Spreadsheet, Similarity Measurement.
I. INTRODUCTION
Some may imagine that image processing only means adorning pictures, adding decorations and drawings, or deleting parts of them so that the result differs from the original. However, image processing is hardly concerned with this aspect at all. Its focus is on the digital coding of images and on appropriate ways of handling this digital data, so that the information carried by an image can be used by a computer, robot, or other machine. Today, the medical industry, astronomy, physics, chemistry, forensics, remote sensing, manufacturing, and defense are just some of the many fields that rely upon images to store, display, and provide information about the world around us. The challenge to scientists, engineers, and business people is to quickly extract valuable information from raw image data. This is the primary purpose of image processing: converting images to information [1]. An image processed by a computer is called a digital image. A digital image is composed of a grid of pixels and stored as an array. A single pixel represents a value of either light intensity or color. Images are processed to obtain information beyond what is apparent from the image's initial pixel values [2]. Figure 1 shows square pixels (picture elements) arranged in columns and rows.
Fig. 1. An image — an array or a matrix of pixels arranged in
columns and rows.
In an 8-bit grayscale image, such as the one shown in Figure 2, each picture element has an assigned intensity that ranges from 0 to 255. A grayscale image is what people normally call a black-and-white image, but the name emphasizes that such an image also includes many shades of gray. Each pixel has a value from 0 (black) to 255 (white). The possible range of pixel values depends on the color depth of the image, here 8 bits = 256 tones or gray levels.
Fig. 2. 8-bit grayscale image
A normal grayscale image has an 8-bit color depth = 256 gray levels. A "true color" image has a 24-bit color depth = 8 + 8 + 8 bits = 256 × 256 × 256 colors ≈ 16 million colors. Figure 3 shows a true-color image assembled from three grayscale images colored red, green, and blue. Such an image may contain up to 16 million different colors.
Fig. 3. A true-color image
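The proposed system converts each opened image to gray before any further processing (Fig. 4), but the paper does not state the conversion formula. A minimal sketch, assuming the ITU-R BT.601 luma weights 0.299, 0.587, and 0.114 for red, green, and blue; `rgb_to_gray` is a hypothetical helper, not part of the paper's VB.NET program:

```python
def rgb_to_gray(rgb_image):
    """Convert a 24-bit RGB image (rows of (r, g, b) tuples, 0..255) to an
    8-bit grayscale matrix. The BT.601 luma weights are an assumption."""
    gray = []
    for row in rgb_image:
        gray_row = []
        for r, g, b in row:
            # Weighted sum of the three 8-bit channels, rounded and clamped to 0..255.
            y = 0.299 * r + 0.587 * g + 0.114 * b
            gray_row.append(min(255, max(0, round(y))))
        gray.append(gray_row)
    return gray


# A 2 x 2 true-color image: white, black / pure red, pure green.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 255, 0)]]
print(rgb_to_gray(image))  # [[255, 0], [76, 150]]
```

Because the three weights sum to 1, a neutral gray pixel (equal r, g, b) keeps its value.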
Some grayscale images have more gray levels, for instance 16 bits = 65,536 gray levels. In principle, three 16-bit grayscale images can be combined to form a 48-bit image with 2^48 = 281,474,976,710,656 possible values.
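The color-depth figures quoted above all follow from one rule: a pixel with a total of b bits can take 2^b distinct values. A short check of those numbers (`tone_count` is an illustrative helper):

```python
def tone_count(bits_per_channel, channels=1):
    """Number of distinct values a pixel can take for the given color depth."""
    return 2 ** (bits_per_channel * channels)


print(tone_count(8))      # 256 gray levels (8-bit grayscale)
print(tone_count(8, 3))   # 16777216 colors (24-bit true color)
print(tone_count(16))     # 65536 gray levels (16-bit grayscale)
print(tone_count(16, 3))  # 281474976710656 (three 16-bit images combined)
```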
There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Table I lists some of the most common file formats [3].
Excel is perhaps the most important computer software program used in the workplace today. From the viewpoint of employers, particularly those in the field of information systems, the use of Excel as an end-user computing tool is essential. In general, Excel dominates the spreadsheet product industry, with a market share estimated at 95 percent. Excel 2007 supports worksheets of up to 1,048,576 rows by 16,384 columns, enabling the user to import and work with massive amounts of data, and it achieves faster calculation performance than earlier versions.
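Because the program places image pixels in worksheet cells, the worksheet size bounds the largest image that can be converted. A small check, assuming one pixel per cell with image rows mapped to worksheet rows (the paper does not spell out its cell layout); the limits are the Excel 2007 worksheet maximums:

```python
# Worksheet limits of Excel 2007 and later (.xlsx format).
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLUMNS = 16_384


def fits_in_worksheet(height, width):
    """True if a height x width image fits in one worksheet,
    assuming one pixel per cell (an assumed layout)."""
    return 0 < height <= EXCEL_MAX_ROWS and 0 < width <= EXCEL_MAX_COLUMNS


print(fits_in_worksheet(480, 640))    # True
print(fits_in_worksheet(100, 20000))  # False: too many columns
```

Width is usually the binding limit: an image wider than 16,384 pixels would have to be transposed, split across sheets, or resized first.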
TABLE I
Some Common File Formats
Group | Format | Description
Bitmap format | GIF | An 8-bit (256-color), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.
Bitmap format | JPEG | A very efficient (i.e., much information per byte), destructively compressed 24-bit (16 million colors) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).
Bitmap format | TIFF | The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
Vector format | PS | PostScript, a standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.
Vector format | PSD | A dedicated Photoshop format that keeps all the information in an image, including all the layers.
Outside the workplace, Excel is in broad use for everyday problem solving. A surprisingly wide range of mathematics is involved in manipulating digital images, from simple arithmetic (e.g., increasing or decreasing the image brightness), to matrix algebra (applying a filter), to numerical PDEs (image sharpening), to more complex processes such as pattern recognition. Students and end users are familiar with digital images, most likely because of their mobile phone cameras, and many are also familiar with the post-processing possibilities, either through in-camera tools or through tools such as Photoshop. Many students have a good working knowledge of spreadsheets, particularly Microsoft Excel, as they are widely used as a support tool in their courses. This paper describes algorithms that allow students to use their spreadsheet skills to apply the mathematical techniques necessary to implement a variety of image modifications. The system can also find the degree of similarity between images from the data in the worksheets [4].
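To illustrate the "simple arithmetic" end of this range, the sketch below changes the brightness of an 8-bit grayscale matrix by adding a constant and clamping the result to 0..255. In a worksheet holding one pixel per cell, the same operation corresponds to a formula such as =MIN(255, MAX(0, A1+30)) filled across the pixel range. The clamping behavior and the name `adjust_brightness` are assumptions; the paper does not describe this operation in detail.

```python
def adjust_brightness(gray, delta):
    """Add delta to every pixel of an 8-bit grayscale matrix,
    clamping results to the valid range 0..255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in gray]


image = [[10, 120], [200, 250]]
print(adjust_brightness(image, 30))   # [[40, 150], [230, 255]]
print(adjust_brightness(image, -50))  # [[0, 70], [150, 200]]
```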
In 2007, AlShemmary [5] gave a detailed description of fingerprint analysis algorithms, in which a new method of data analysis, especially for very large datasets, was developed using Excel. Data preparation can be easily accomplished in Excel.
Many techniques exist to create an Excel file, and each offers some unique advantages. Knowing and understanding the different techniques is essential for programmers to quickly and effectively produce a report that meets the requirements provided by the customer [6]. This paper describes the algorithms to generate an Excel file from a digital image using VB.NET and provides the appropriate method to use when an Excel output must be created.
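The paper does not describe how its VB.NET program writes the workbook. As a language-neutral sketch of the same idea (one pixel value per cell, one image row per worksheet row), the code below writes a grayscale matrix as CSV, a plain-text format that Excel opens directly. This is a substitute for a native .xlsx workbook, not the paper's method, and `gray_to_csv` is a hypothetical name.

```python
import csv
import io


def gray_to_csv(gray, stream):
    """Write an 8-bit grayscale matrix as comma-separated values:
    one image row per CSV line, one pixel value per field (cell)."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerows(gray)


buffer = io.StringIO()
gray_to_csv([[0, 128, 255], [64, 32, 16]], buffer)
print(buffer.getvalue())
# 0,128,255
# 64,32,16
```

To produce a file, pass an open file instead of the buffer, e.g. `with open("image.csv", "w", newline="") as f: gray_to_csv(gray, f)`.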
The structure of this paper is as follows: Section II introduces computer drawings. Section III provides some background on algorithms that use patch matching and reviews approaches to measuring image patch similarity. Section IV proposes a method for automatic similarity matrix calculation. Section V shows the results of our experiments, and finally, Section VI interprets the results and indicates our future work.
II. COMPUTER DRAWINGS
Computer drawings are divided into two categories:
1- Vector Drawings
These are made up of lines and curves, which the computer represents as mathematical objects called "vectors". Vectors describe picture elements geometrically. In other words, this kind of processing is based on lines: the computer reads a drawing as a set of mathematical equations, which it uses to redraw the drawing when it is enlarged, reduced, or moved. Computers that handle drawings in this way appear more intelligent. Lines and their neighborhoods are the foundation of this approach: once a straight line is known, the line between two points can be found without defining all the points in between. The advantages of this type are [7]:
- It does not require much storage space on the computer.
- It is not affected by scaling: the drawing keeps its quality whether zoomed in or out, even up to 10 times the original size.
- The drawings are stored in formats such as eps, cdr, and ai. Programs that deal with these drawings include Adobe Illustrator, CorelDraw, Freehand, and Macromedia Flash.
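The idea that a vector drawing stores equations rather than points can be sketched with a single line segment kept as two endpoints: scaling transforms only the endpoints, and any intermediate point is computed on demand. This is a simplified illustration, not how any of the listed programs store drawings; the helper names are hypothetical.

```python
def scale_line(p0, p1, factor):
    """Scale a line segment that is stored only by its two endpoints."""
    return ((p0[0] * factor, p0[1] * factor), (p1[0] * factor, p1[1] * factor))


def point_on_line(p0, p1, t):
    """Point at parameter t (0 <= t <= 1) on the segment from p0 to p1;
    intermediate points are computed, never stored."""
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))


start, end = (1.0, 2.0), (3.0, 6.0)
big_start, big_end = scale_line(start, end, 10)
print(big_start, big_end)                      # (10.0, 20.0) (30.0, 60.0)
print(point_on_line(big_start, big_end, 0.5))  # (20.0, 40.0)
```

Enlarging the drawing tenfold costs the same two stored points, which is why vector drawings need little storage and do not degrade when scaled.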
2- Bitmap Drawings
A bitmap is a grid of colors representing the image, and each point in this grid is called a "pixel". Every pixel is determined by two pieces of information: its location, given by its coordinates, and its color. These drawings are used electronically in bitmap graphics, photographs, and digital graphics. The characteristics of this type are [7]:
- It can display a huge spectrum of colors, so it shows color gradients, shadows, and complex interactions.
- The number of pixels representing the image is fixed. This is in fact a disadvantage, because zooming in or out lowers the image quality.
The drawings are stored in formats such as gif, jpg, bmp, tiff, psd, and png. Programs that deal with these drawings include Adobe Photoshop, Macromedia Fireworks, Paint Shop Pro, CorelDraw, and Photo Paint [7].
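The loss of quality when a bitmap is enlarged follows from its fixed pixel count. A minimal sketch using nearest-neighbor enlargement by an integer factor (one of several possible resampling methods, chosen here for simplicity): each source pixel simply becomes a block of identical pixels, so edges turn blocky and no new detail appears. `zoom_nearest` is a hypothetical name.

```python
def zoom_nearest(gray, factor):
    """Enlarge a bitmap by an integer factor by repeating each pixel.
    Every source pixel becomes a factor x factor block of the same value."""
    zoomed = []
    for row in gray:
        wide_row = [p for p in row for _ in range(factor)]
        zoomed.extend([list(wide_row) for _ in range(factor)])
    return zoomed


print(zoom_nearest([[0, 255], [255, 0]], 2))
# [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
```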
III. IMAGE PATCH SIMILARITY
The ability to compare image regions (patches) has been the basis of many approaches to core computer vision problems, including object, texture, and scene categorization. Developing representations for image patches has also been the focus of much work. However, the question of an appropriate similarity measure between patches has largely been neglected.
The main context in which comparing image patches has emerged in computer vision is that of high-level vision tasks, which can be described as scene understanding. These include [8]:
Object recognition: finding a specific object, such as the face of a certain person, a shoe of a particular make, or a magazine.
Object categorization and object class detection: rather than looking for a specific instance of an object, the interest here is in all objects that belong to a certain class, for an appropriate definition of the latter: any face, any car, etc.
Entire image classification: sometimes the goal is not to localize, or determine the presence of, an object, but rather to assign the entire image to a certain class. For instance, location recognition and texture classification belong to this category of tasks [5].
The question of measuring similarity between
patches has not received very much attention in the
computer vision literature. Usually, a standard
distance measure is adopted for whatever
representation is used:
1 Pixel-based distance
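The simplest similarity measure consists of directly comparing the pixel values of the two regions, e.g., by means of the L1 distance [3]:
$$ d_{L_1}(x_1, x_2) = \sum_{p} \lvert x_1(p) - x_2(p) \rvert $$
where p runs over the pixel positions of the patches.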
This is rarely a useful measure, since it is extremely
sensitive to minor transformations, both in geometry
(shifts and rotations) and in imaging conditions
(lighting or noise).
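A minimal sketch of the pixel-based L1 distance, with patches given as flat lists of 8-bit intensities (an assumed representation). Shifting a pattern by a single pixel already produces a large distance, illustrating the sensitivity to geometric transformations noted above.

```python
def l1_distance(x1, x2):
    """Sum of absolute pixel differences between two equal-sized patches
    given as flat lists of intensities."""
    if len(x1) != len(x2):
        raise ValueError("patches must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(x1, x2))


patch = [0, 0, 255, 255, 0, 0]
shifted = [0, 0, 0, 255, 255, 0]  # same pattern moved one pixel to the right
print(l1_distance(patch, patch))    # 0
print(l1_distance(patch, shifted))  # 510
```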
2 Correlation
Normalized correlation between patches x1 and x2 is defined as:
$$ \rho(x_1, x_2) = \frac{1}{N} \sum_{p} \frac{\left(x_1(p) - \bar{x}_1\right)\left(x_2(p) - \bar{x}_2\right)}{\sigma_1 \sigma_2}, $$
where $\bar{x}_i$ and $\sigma_i$ are the mean and standard deviation of the pixels in $x_i$, and N is the number of pixels in a patch. Because the means are subtracted out, it is much more robust than the pixel-wise distance. Normalized correlation has been used extensively in fragment-based recognition, where it is assumed that viewing conditions are fixed, or alternatively that examples exist for all viewing conditions; in other words, failing to match a patch to a version of itself rotated by 90 degrees is acceptable. We would like to avoid such an assumption [9].
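A minimal sketch of the normalized correlation defined above, using population (divide-by-N) standard deviations. The example shows the robustness described in the text: a linear change of brightness and contrast leaves the score at 1, while reversing the pattern gives -1.

```python
import math


def normalized_correlation(x1, x2):
    """Normalized correlation of two equal-sized patches (flat lists),
    using population means and standard deviations."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / n)
    if s1 == 0 or s2 == 0:
        raise ValueError("correlation is undefined for a constant patch")
    return sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n * s1 * s2)


patch = [10, 20, 30, 40]
brighter = [2 * p + 50 for p in patch]  # contrast doubled, brightness raised
print(round(normalized_correlation(patch, brighter), 6))  # 1.0
```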
3 Descriptor distance
Another popular method is to compute a
descriptor of each patch, and then simply apply a
distance measure on the two descriptors. Most
commonly the descriptors are vectors in a metric
space of fixed dimensions and the distance of choice
is L1 [3].
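The text does not name a specific descriptor. As a hypothetical illustration only, the sketch below uses a coarse gray-level histogram (4 bins) as the descriptor and compares two descriptors with the L1 distance. Because a histogram ignores pixel positions, a one-pixel shift of the pattern leaves the descriptor distance at 0.

```python
def histogram(pixels, bins=4):
    """Normalized gray-level histogram of 8-bit pixel values."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // 256] += 1  # map 0..255 onto bins 0..bins-1
    return [c / len(pixels) for c in counts]


def descriptor_l1(x1, x2, bins=4):
    """L1 distance between the histogram descriptors of two patches."""
    h1, h2 = histogram(x1, bins), histogram(x2, bins)
    return sum(abs(a - b) for a, b in zip(h1, h2))


patch = [0, 0, 255, 255, 0, 0]
shifted = [0, 0, 0, 255, 255, 0]
print(descriptor_l1(patch, shifted))  # 0.0
```

The price of this invariance is that very different spatial patterns with the same gray-level distribution also get distance 0.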
4 Probabilistic matching
A different approach is taken by some methods that, instead of measuring the distance between patch representations, directly evaluate the probability that the two patches belong to the same class. This is usually limited to models in which a fixed number of patch classes, called parts, are combined in some framework. A well-known example of this kind is the family of constellation models [10].
IV. PROPOSED ALGORITHM
Matching covers the group of techniques based on similarity measures, in which the distance is calculated between the feature vector describing the extracted character and the description of each class. Different measures may be used, but the most common is the Euclidean distance. This minimum distance classifier works well when the classes are well separated, that is, when the distance between the means is large compared to the spread of each class [11]. Three common pattern arrangements used in practice are vectors, strings, and trees. In this paper, vectors are used. The complete diagram of the proposed design is presented in Fig. 4.
Fig. 4. Flowchart of the proposed system.
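Decision-theoretic approaches to recognition are based on the use of decision functions. Let $x = (x_1, x_2, \ldots, x_n)^T$ represent an n-dimensional pattern vector. For W pattern classes $\omega_1, \omega_2, \ldots, \omega_W$, we want to find W decision functions $d_1(x), d_2(x), \ldots, d_W(x)$ with the property that, if a pattern x belongs to class $\omega_i$, then [3]:
$$ d_i(x) > d_j(x), \qquad j = 1, 2, \ldots, W;\ j \neq i \qquad (1) $$
The decision boundary separating class $\omega_i$ from class $\omega_j$ is given by:
$$ d_i(x) = d_j(x) \quad \text{or} \quad d_i(x) - d_j(x) = 0 \qquad (2) $$
In this section, the similarity decision between two images is made with the single function $d_{ij}(x) = d_i(x) - d_j(x)$, which is zero on the boundary of Eq. (2). Thus, if $d_{ij} \le 0.002$ (the similarity decision), the images are similar; otherwise they are not.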
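Suppose each image is represented by a mean vector:
$$ m_j = \frac{1}{N_j} \sum_{x \in \omega_j} x, \qquad j = 1, 2, \ldots, W \qquad (3) $$
where:
N_j: the number of vectors from image ω_j
W: the number of images
One way to measure the closeness between images is the Euclidean distance. If the Euclidean distance is used for the closeness decision:
$$ D_j(x) = \lVert x - m_j \rVert, \qquad j = 1, 2, \ldots, W \qquad (4) $$
where $\lVert a \rVert = (a^T a)^{1/2}$ is the Euclidean norm.
Selecting the smallest distance is equivalent to evaluating the functions
$$ d_j(x) = x^T m_j - \tfrac{1}{2} m_j^T m_j, \qquad j = 1, 2, \ldots, W \qquad (5) $$
and assigning x to the class with the largest $d_j(x)$, since $D_j^2(x) = x^T x - 2\left(x^T m_j - \tfrac{1}{2} m_j^T m_j\right)$ and $x^T x$ does not depend on j. From Eq. (2) and Eq. (5), the decision boundary between images ω_i and ω_j for a minimum distance classifier is:
$$ d_{ij}(x) = d_i(x) - d_j(x) = x^T (m_i - m_j) - \tfrac{1}{2} (m_i - m_j)^T (m_i + m_j) = 0 \qquad (6) $$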
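Equations (3) to (6) can be sketched directly. The code below is a minimal illustration of the minimum distance classifier using plain Python lists, not the paper's VB.NET implementation; classes are indexed from 0, so ω1 corresponds to index 0. The usage example uses the iris mean vectors given in the next paragraph.

```python
def mean_vector(vectors):
    """Mean vector m_j of equal-length pattern vectors (Eq. 3)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a, b):
    """Inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))


def decision(x, m):
    """Decision function d_j(x) = x^T m_j - 0.5 * m_j^T m_j (Eq. 5)."""
    return dot(x, m) - 0.5 * dot(m, m)


def classify(x, means):
    """Index j with the largest d_j(x); by Eq. (5) this is also the class
    whose mean is nearest to x in Euclidean distance (Eq. 4)."""
    return max(range(len(means)), key=lambda j: decision(x, means[j]))


def boundary(x, m_i, m_j):
    """d_ij(x) = d_i(x) - d_j(x); zero on the decision boundary (Eq. 6)."""
    return decision(x, m_i) - decision(x, m_j)


m1, m2 = [4.3, 1.3], [1.5, 0.3]  # iris versicolor (w1) and iris setosa (w2)
means = [m1, m2]
print(classify([4.0, 1.2], means))               # 0 -> w1 (versicolor)
print(classify([1.4, 0.2], means))               # 1 -> w2 (setosa)
print(abs(boundary([2.9, 0.8], m1, m2)) < 1e-9)  # True: midpoint of the means
```

The midpoint of the two means always lies on the boundary, because Eq. (6) describes the perpendicular bisector of the segment joining them.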
In a classic paper, Fisher [12] reported the use of what was then a new technique, called discriminant analysis, to recognize three types of iris flowers (Iris setosa, virginica, and versicolor) by measuring the widths and lengths of their petals (see Figure 5). Figure 6 shows an example of two vectors extracted from the iris samples in Fig. 5. The two classes, Iris versicolor and Iris setosa, denoted ω1 and ω2 respectively, have mean vectors m1 = (4.3, 1.3)^T and m2 = (1.5, 0.3)^T.
The minimum distance classifier works well when the distance between the means is large compared to the spread, or randomness, of each class with respect to its mean.
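Substituting the stated means into Eq. (6) yields the explicit boundary line drawn in Fig. 6. The arithmetic below is a worked check added for illustration; it uses only the mean vectors m1 = (4.3, 1.3)^T and m2 = (1.5, 0.3)^T:

```latex
\begin{aligned}
m_1 - m_2 &= (4.3 - 1.5,\ 1.3 - 0.3)^T = (2.8,\ 1.0)^T \\
m_1 + m_2 &= (5.8,\ 1.6)^T \\
\tfrac{1}{2}(m_1 - m_2)^T (m_1 + m_2) &= \tfrac{1}{2}\left(2.8 \cdot 5.8 + 1.0 \cdot 1.6\right) = \tfrac{1}{2}(16.24 + 1.6) = 8.92 \\
d_{12}(x) &= 2.8\,x_1 + 1.0\,x_2 - 8.92 = 0
\end{aligned}
```

Patterns with $d_{12}(x) > 0$ fall on the versicolor (ω1) side of this line and patterns with $d_{12}(x) < 0$ on the setosa (ω2) side.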
Fig. 5. Three types of iris flowers described by two
measurements
[Fig. 4 flowchart boxes: Start; Determine size of image from settings; Open Image; Convert Image to Gray Image; Create Excel Sheet; Convert Image to Excel; Vector of Gray Image; Save Vector in Notepad; Find m1 & m2 for two images; Find d1 & d2 for two images; Find d12 = d1 - d2; IF d12 <= 0.002 (Yes: Is similar; No: Not similar); End]
Fig. 6. Decision boundary of minimum distance classifier for
images of iris versicolor and iris setosa. The dark dot and square
are the means.
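Fig. 4 ends with the test d12 <= 0.002, but the paper does not say how each image is turned into pattern vectors or at which point x the function d12 is evaluated. The sketch below fills these gaps with explicit, hypothetical choices: each row of the gray image is one pattern vector scaled to [0, 1], both images are assumed to have the same width (as choosing one image size in the settings would give), and d12 is evaluated at the second image's mean vector m2. At that point Eq. (6) reduces to -0.5 * ||m1 - m2||^2, which is never positive, so the sketch compares its absolute value with the threshold. The function names are illustrative only.

```python
def gray_to_vectors(gray):
    """Hypothetical choice: each row of an 8-bit gray image is one pattern
    vector, with intensities scaled from 0..255 to 0..1."""
    return [[p / 255 for p in row] for row in gray]


def mean_vector(vectors):
    """Mean vector of equal-length pattern vectors (Eq. 3)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def decision(x, m):
    """d_j(x) = x^T m_j - 0.5 * m_j^T m_j (Eq. 5)."""
    return sum(a * b for a, b in zip(x, m)) - 0.5 * sum(b * b for b in m)


def are_similar(gray1, gray2, threshold=0.002):
    """Flowchart test 'd12 <= 0.002'. Evaluating d12 at m2 is an assumption;
    there d12 = -0.5 * ||m1 - m2||^2, so its magnitude is compared."""
    if len(gray1[0]) != len(gray2[0]):
        raise ValueError("images must have the same width")
    m1 = mean_vector(gray_to_vectors(gray1))
    m2 = mean_vector(gray_to_vectors(gray2))
    d12 = decision(m2, m1) - decision(m2, m2)
    return abs(d12) <= threshold


img_a = [[10, 200, 30], [40, 50, 60]]
img_b = [[250, 0, 250], [240, 10, 230]]
print(are_similar(img_a, img_a))  # True
print(are_similar(img_a, img_b))  # False
```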
V. IMPLEMENTATION AND RESULTS
The image processing algorithms discussed above are implemented in Visual Basic .NET on the Windows 7 Ultimate operating system, running on an LG laptop with a dual-core 2 GHz CPU and 2 GB of RAM. Visual Basic .NET is a powerful and effective tool for developing applications compatible with the Windows environment. It provides an integrated development environment for creating easy-to-use solutions, running commands, viewing output, editing, and managing files and variables. Screens and windows are designed with clicks and mouse movements, much as one draws boxes and circles in a paint program. .NET is essentially a framework for software development.
In this section we describe operations that are fundamental to digital image processing. These operations are divided into two lists: a file list, which contains the standard commands (open, save, save vector, and exit), and a process list, which contains all operations of the program (convert to gray and to binary, vector of gray and binary image, create Excel sheet, convert image to Excel file, and measurement of the similarity degree). We have evaluated our methods on several different standard images, as shown in Table II.
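The process list includes converting the gray image to binary (the Binary Image column of Table II). The paper does not give its threshold; the sketch below assumes a fixed global threshold of 128 on the 8-bit gray levels, and `to_binary` is a hypothetical name.

```python
def to_binary(gray, threshold=128):
    """Binary image: 1 where the gray level is >= threshold, else 0.
    The default threshold of 128 is an assumption."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


print(to_binary([[0, 127, 128], [200, 64, 255]]))  # [[0, 0, 1], [1, 0, 1]]
```

A fixed threshold is the simplest choice; an image-dependent threshold (for example, the mean gray level) would adapt better to dark or bright images.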
The program starts by selecting the image size from the settings, then opening the images, converting them to gray, and finding the vectors of those images. These vectors can also be saved in .dat or .txt format for use in other techniques, e.g., as input data to a neural network. Using these vectors, we can determine whether the two images are similar. These operations and their execution are also illustrated by the GUI screenshots in Appendix A.
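The saved vector is described only as a .dat or .txt file. A minimal sketch, assuming one value per line in plain text (a layout that is easy to load as neural-network input); the file name and helper names are hypothetical.

```python
import os
import tempfile


def save_vector(vector, path):
    """Write one value per line to a plain-text file (.txt or .dat)."""
    with open(path, "w", encoding="utf-8") as f:
        for value in vector:
            f.write(f"{value}\n")


def load_vector(path):
    """Read a vector written by save_vector back as floats."""
    with open(path, encoding="utf-8") as f:
        return [float(line) for line in f if line.strip()]


path = os.path.join(tempfile.gettempdir(), "image1_vector.dat")
save_vector([0.1, 0.5, 0.25], path)
print(load_vector(path))  # [0.1, 0.5, 0.25]
```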
Our experiments showed that such a system is a powerful tool that provides programmers with a wide range of solutions and options for exporting data to Excel. Although several methods exist to accomplish the same result, they are not necessarily appropriate for the task at hand: some techniques offer better customization of reports, while others provide simpler, more efficient syntax. This project gave us the opportunity to apply the tools and techniques we are studying, and it gave us valuable insight into how digital image processing integrates with the Excel workbook. Programming an Excel report is a common task for most programmers. Customers often want to be able to view, sort, and filter their data, and Excel is usually the tool they know best. Excel spreadsheets are used to manage and analyze data efficiently. Excel provides many functions and formulas that not only help to manage data records but also make it possible to analyze all the data as the business environment constantly changes. Excel is spreadsheet software that helps us organize and chart large amounts of data [13].
The experimental results show how image processing can be done in Microsoft Excel. The paper also explains how to read any image and obtain its pixel values. Thus, it is shown that the whole set of image processing operations, such as reading, processing, and printing an image, can be done in Excel. Above all, this paper stresses the potential of Microsoft Excel as a scientific learning tool.
VI. CONCLUSIONS
In this paper, we have introduced a similarity measure based on the Euclidean distance and the use of decision functions. A system to acquire, analyze, and compare two images using the minimum distance classifier algorithm has been proposed in this work. The approach is founded on similarity analysis, in which the smallest distance (dij) between images gives a good indication of similarity. The efficient program design also enables us to resize images efficiently. In the context of the larger research project, this work has identified some possible avenues for future work. Future work will focus on applying this work to other techniques and applications. Adding a large database to store images that are similar and images that differ, so that they can be compared directly, is a promising suggestion for pattern recognition. Almost everyone has experience using Excel. Therefore, data processing in Excel can be implemented easily, not only by people with specialist knowledge but also by people without it.
REFERENCES
[1] http://heritage.stsci.edu/commonpages/infoindex/ourimages/color_comp.html.
[2] Thomas M. Lehmann, Claudia Gönner, Klaus Spitzer, "Survey: Interpolation Methods in Medical Image Processing", IEEE Transactions on Medical Imaging, Vol. 18, No. 11, November 1999.
[3] Rafael Gonzalez, Richard E. Woods, "Digital Image Processing", 2nd ed., Prentice Hall, Upper Saddle River, New Jersey 07458, 2002.
[4] Sheri Graves, "Where Is Microsoft Excel Used?", http://ezinearticles.com, September 14, 2007.
[5] Ebtesam N. AlShemmary, "Fingerprint Image Enhancement and Recognition Algorithms", Ph.D. Thesis, University of Technology, Baghdad, 2007.
[6] Musa J. Jafar, "A Tools-Based Approach to Teaching Data Mining Methods", Journal of Information Technology Education, Vol. 9, Canyon, TX, USA, 2010.
[7] http://ar.wikipedia.org/wiki
[8] Yu Shiu, Hong Jeong, C.-C. Jay Kuo, "Similarity Matrix Processing for Music Structure Analysis", AMCMM '06, Santa Barbara, California, USA, October 27, 2006.
[9] IDL, "Image Processing", 0509IDL71IP, ITT Visual Information Solutions, May 2009.
[10] Ian T. Young, Jan J. Gerbrands, Lucas J. van Vliet, "Fundamentals of Image Processing", Delft University of Technology, 1997.
[11] T. Y. Young, K.-S. Fu, "Handbook of Pattern Recognition and Image Processing", Academic Press, 1986.
[12] R. A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems", Annals of Eugenics, Vol. 7, Part 2, pp. 179-188, 1936.
[13] M. MacDonald, "Excel 2007: The Missing Manual", Pogue Press, O'Reilly, ISBN 978-0-596-52759-4, 2006.
TABLE II
Experimental Results of Proposed Method
Original Image | Binary Image | Excel File of Image (10% Zoom)
Appendix-A: Program implementation and examples of acquired results
Fig. A-1. Start project screenshot
Fig. A-2. Project GUI screenshot
Fig. A-3. File list screenshot
Fig. A-4. Process list screenshot
Fig. A-5. Choose size of the image screenshot
Fig. A-6. Opening first and second image screenshot
Fig. A-7. Convert first and second image to gray screenshot
Fig. A-8. Convert first and second image to gray with its vectors (ticking shows the vector boxes to the right) screenshot.
Appendix-A: Program implementation and examples of acquired results (continued)
Fig. A-9. Create Excel sheet of first and second image screenshot
Fig. A-10. Implementation of converting image to Excel file and its output
Fig. A-11. Two examples of similar images.
Fig. A-12. An example of images that are not similar
Fig. A-13. Implementation command for saving the image vector in Notepad
Fig. A-14. Dialog box to save image vector in another format.
Fig. A-15. Image vector stored in .dat file format