ASSISTIVE TEXT AND PRODUCT LABEL READING
FROM HAND-HELD OBJECTS FOR BLIND PERSONS
USING MATLAB
Submitted in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Submitted by
Student Name(s) Regd. No.
SAI PRATHAP REDDY.K 12095A0421
NEELIMA.G 11091A0485
RAMANJANEYULU.Y 11091A04A0
MALLESWARI.N 11091A0464
SAI PRASANTH.A 11091A04B5
Under the Esteemed Guidance of
Mr. N.RAMANJANEYULU, M.Tech., (Ph.D.)
Associate Professor in ECE
(ESTD-1995)
SCHOOL OF ELECTRONICS AND COMMUNICATION ENGINEERING
RAJEEV GANDHI MEMORIAL
COLLEGE OF ENGINEERING AND TECHNOLOGY (AUTONOMOUS)
Affiliated to J.N.T. University-Anantapur, Approved by A.I.C.T.E., New Delhi, Accredited by N.B.A-New Delhi,
Accredited by NAAC with A- grade, Participated in World Bank TEQIP-1
NANDYAL –518501, Kurnool Dist. A.P. YEAR: 2011-2015
RAJEEV GANDHI MEMORIAL COLLEGE OF
ENGINEERING & TECHNOLOGY AUTONOMOUS
(Approved by A.I.C.T.E-New Delhi, Affiliated to JNT University-Anantapur,
Accredited by NBA-New Delhi, Accredited by NAAC with A-Grade)
NANDYAL – 518 501, A.P, India
SCHOOL OF ELECTRONICS AND COMMUNICATION ENGINEERING
CERTIFICATE
This is to certify that the dissertation entitled “ASSISTIVE TEXT AND
PRODUCT LABEL READING FROM HAND-HELD OBJECTS FOR BLIND
PERSONS USING MATLAB” is being submitted by SAI PRATHAP REDDY.K
(12095A0421), NEELIMA.G (11091A0485), RAMANJANEYULU.Y (11091A04A0),
MALLESWARI.N (11091A0464), SAI PRASANTH.A (11091A04B5) under the
guidance of Mr. N.RAMANJANEYULU for the award of the B.Tech Degree in
ELECTRONICS AND COMMUNICATION ENGINEERING at the RAJEEV
GANDHI MEMORIAL COLLEGE OF ENGINEERING & TECHNOLOGY, Nandyal
(Affiliated to J.N.T. University Anantapur), and is a record of bona fide work
carried out by them under our guidance and supervision.
Head of the Department:                          Project Guide:
Dr. D.SATYANARAYANA                              Mr. N.RAMANJANEYULU
M.Tech., Ph.D., MISTE, FIETE, MIEEE              M.Tech., (Ph.D.)
Professor and H.O.D                              Associate Professor in ECE
Signature of the External Examiner
Date of Examination:
ACKNOWLEDGEMENT
The successful completion of this project report is made possible
with the help and guidance received from various quarters. We would like
to avail this opportunity to express our sincere thanks and gratitude to all
of them.
We are deeply indebted to our guide, Mr. N.RAMANJANEYULU,
M.Tech., (Ph.D.), Associate Professor, Department of Electronics and
Communication Engineering. We are truly fortunate to have a guide
who advised and helped us in every possible way at all stages of this
project work.
We extend our deep sense of gratitude to Dr. D.SATYANARAYANA,
B.E., M.Tech., Ph.D., MISTE, FIETE, Professor and HOD of ECE,
RGMCET, for his moral support and valuable advice during this project
work and the course.
We also express our deep gratitude to our principal, Dr.
T.JAYACHANDRA PRASAD GARU and to our chairman, Dr. M.SANTHI
RAMUDU GARU for providing the required facilities.
We express our thanks to all other teaching and non-teaching staff
for their cooperation in many aspects towards the successful completion
of this project.
We also like to thank all our family members and friends who gave
us constructive suggestions and encouragement throughout the project.
PROJECT MEMBERS
K. SAI PRATHAP REDDY 12095A0421
G. NEELIMA 11091A0485
Y. RAMANJANEYULU 11091A04A0
N. MALLESWARI 11091A0464
A. SAI PRASANTH 11091A04B5
ABSTRACT
We propose an assistive text reading framework to help blind
people read text labels and product labels on hand-held objects in
their daily lives. Printed text is everywhere in the form of reports,
receipts, bank statements, restaurant menus, classroom handouts,
product packages, instructions on medicine bottles, etc. We first
propose an efficient and effective motion-based method to define a
region of interest (ROI) and isolate the object from the surrounding
objects in the camera view. The camera acts as the main vision
sensor, detecting the label image of the product or board; the image
is then processed internally in MATLAB, which separates the label
from the image, identifies the product, and finally pronounces the
identified product name through voice. When the capture button is
clicked, the system captures the image of the product placed in
front of the web camera. The captured image, or an image selected
from the system through the graphical user interface, is converted
into text using edge-based text region extraction. Finally, the
extracted text is converted into speech using a text-to-speech
synthesizer. We also explore user interface issues in extracting and
reading text from different objects with complex backgrounds.
TABLE OF CONTENTS
CHAPTER NO TITLE PAGE NO.
ACKNOWLEDGEMENT i
ABSTRACT ii
LIST OF FIGURES v
CHAPTER 1 INTRODUCTION 1
1.1 Importance of Printed Text 1
CHAPTER 2 FUNDAMENTALS OF IMAGE PROCESSING 5
2.1 Introduction 5
2.1.1 Image 5
2.1.2 Image File Sizes 6
2.1.3 Image File Formats 6
2.1.4 Raster Formats 6
2.1.5 Vector Formats 9
2.2 Digital Image Processing 10
2.3 Applications of Digital Image Processing 10
2.4 Fundamental Steps in Digital Image
Processing 11
2.5 Components of Image Processing System 16
CHAPTER 3 EXISTING METHODS 20
3.1 Portable Bar Code Readers 20
3.2 KReader Mobile 21
3.3 Pen Scanners 22
CHAPTER 4 PROPOSED METHOD 23
4.1 Algorithm for Edge Based Text Region
Extraction 25
4.1.1 Detection 26
4.1.2 Localization 29
4.1.3 Character Extraction 29
4.2 IMPLEMENTATION 30
4.2.1 Software Requirement -MATLAB 30
4.2.2 Typical Uses of MATLAB 31
4.2.3 Features of MATLAB 31
4.2.4 Basic Building Blocks of MATLAB 32
4.3 MATLAB Window 32
4.3.1 Command Window 32
4.3.2 Workspace Window 33
4.3.3 Current Directory Window 33
4.3.4 Command History Window 33
4.3.5 Editor Window 33
4.3.6 Graphics or Figure Window 34
4.3.7 Online Help Window 34
4.4 MATLAB Files 34
4.4.1 M-Files 35
4.4.2 Script Files 35
4.4.3 Function Files 35
4.4.4 Mat-Files 35
4.5 MATLAB Function 35
4.5.1 Development Environment 35
4.5.2 MATLAB Mathematical Function 36
4.5.3 MATLAB Language 36
4.5.4 GUI Construction 36
4.5.5 MATLAB Application Interface 36
4.6 MATLAB Working Environment 36
4.6.1 MATLAB Desktop 36
4.6.2 Using MATLAB Editor To Create M-
Files 38
4.6.3 Getting Help 39
CHAPTER 5 RESULTS 40
5.1 ADVANTAGES 44
5.2 CONCLUSION AND FUTURE SCOPE 44
REFERENCES 46
LIST OF FIGURES
FIGURE NO. NAME OF THE FIGURE PAGE NO.
Figure 1.1 Printed text with multiple colours, complex backgrounds,
or non-flat surfaces 3
Figure 1.2 Examples of text localization and recognition from
camera captured images. (Top) milk box. (Bottom) men's
bathroom signage 3
Figure 2.1 Fundamental Steps in Image processing 12
Figure 2.2 Components of Image Processing System 16
Figure 3.1 Barcode on a product 20
Figure 3.2 Barcode machine scanning unique code on product 21
Figure 3.3 KReader Mobile on a mobile phone 21
Figure 3.4 Pen scanners scanning a document 22
Figure 4.1 Extraction of text from product 24
Figure 4.1.1 Image with multiple background and multiple fonts 24
Figure 4.1.2 Basic Block diagram for edge based text extraction 25
Figure 4.1.3 Default filter returned by the fspecial Gaussian
function 26
Figure 4.1.4 Sample Gaussian pyramid with 4 levels 27
Figure 4.1.5 Each resolution image resized to original image size 27
Figure 4.1.6 The directional kernels 27
Figure 4.1.7 Sample image from Figure 3 after convolution with
each directional kernel. Note how the edge
information in each direction is highlighted 28
Figure 4.1.8 Sample resized image of the pyramid after
convolution with 0º kernel 28
Figure 4.1.9 (a) Before dilation (b) After dilation 29
Figure 4.1.10 (a) Original image (b) Result 30
Figure 4.2 Representation of MATLAB Window 37
Figure 5.1 GUI Window 40
Figure 5.2 Popup window for selecting picture 40
Figure 5.3 Browsing for a picture 41
Figure 5.4 Loading of a Picture 41
Figure 5.5 Directory of the picture 42
Figure 5.6 Finished finding of picture from directory 42
Figure 5.7 Converted .txt file and result in notepad 43
Figure 5.8 Appeared picture and output text in notepad 43
Figure 5.9 Extracted text to speech 44
Assistive text and product label reading from hand-held objects for blind persons using
MATLAB
Department of ECE Page 1
CHAPTER 1
INTRODUCTION
Of the 314 million visually impaired people worldwide, 45 million
are blind. Even in a developed country like the U.S., the 2008
National Health Interview Survey reported that an estimated 25.2
million adult Americans (over 8%) are blind or visually impaired. This
number is increasing rapidly as the baby boomer generation ages.
Recent developments in computer vision, digital cameras, and
portable computers make it feasible to assist these individuals by
developing camera-based products that combine computer vision
technology with other existing commercial products, such as optical
character recognition (OCR) systems.
1.1 Importance Of Printed Text
Reading is obviously essential in today's society. Printed text is
everywhere in the form of reports, receipts, bank statements,
restaurant menus, classroom handouts, product packages,
instructions on medicine bottles, etc. And while optical aids, video
magnifiers, and screen readers can help blind users and those with
low vision to access documents, there are few devices that can provide
good access to common hand-held objects such as product packages,
and objects printed with text such as prescription medication bottles.
The ability of people who are blind or have significant visual
impairments to read printed labels and product packages will enhance
independent living and foster economic and social self-sufficiency.
Today, there are already a few systems that have some promise for
portable use, but they cannot handle product labelling. For example,
portable bar code readers designed to help blind people identify
different products in an extensive product database can enable users
who are blind to access information about these products through
speech and Braille. But a big limitation is that it is very hard for blind
users to find the position of the bar code and to correctly point the bar
code reader at the bar code. Some reading-assistive systems such as
pen scanners might be employed in these and similar situations. Such
systems integrate OCR software to offer the function of scanning and
recognition of text and some have integrated voice output.
However, these systems are generally designed for and perform
best with document images with simple backgrounds, standard fonts,
a small range of font sizes, and well-organized characters rather than
commercial product boxes with multiple decorative patterns. Most
state of the art OCR software cannot directly handle scene images
with complex backgrounds. A number of portable reading assistants
have been designed specifically for the visually impaired. KReader
Mobile runs on a cell phone and allows the user to read mail, receipts,
fliers, and many other documents. However, the document to be read
must be nearly flat, placed on a clear, dark surface (i.e., a non-
cluttered background), and contain mostly text. Furthermore,
KReader Mobile accurately reads black print on a white background,
but has problems recognizing coloured text or text on a coloured
background. It cannot read text with complex backgrounds, text
printed on cylinders with warped or incomplete images (such as soup
cans or medicine bottles). Furthermore, these systems require a blind
user to manually localize areas of interest and text regions on the
objects in most cases. Although a number of reading assistants have
been designed specifically for the visually impaired, to our knowledge,
no existing reading assistant can read text from the kinds of
challenging patterns and backgrounds found on many everyday
commercial products.
As shown in Figure 1.1, such text information can appear in
multiple scales, fonts, colors, and orientations. To assist blind persons
to read text from these kinds of hand-held objects, we have conceived
of a camera-based assistive text reading framework to track the object
of interest within the camera view and extract print text information
from the object. Our proposed algorithm can effectively handle
complex background and multiple patterns, and extract text
information from both hand-held objects and nearby signage.
Figure 1.1 Printed text with multiple colours, complex backgrounds, or
non-flat surfaces
As shown in Figure 1.2, in assistive reading systems for blind
persons it is very challenging for users to position the object of
interest within the center of the camera's view. As of now, there are
still no acceptable solutions. We approach the problem in stages. To
make sure the hand-held object appears in the camera view, we use a
camera with sufficiently wide angle to accommodate users with only
approximate aim. This may often result in other text objects appearing
in the camera's view (for example, while shopping at a supermarket).
To extract the hand-held object from the camera image, we develop a
motion-based method to obtain a region of interest (ROI) of the object.
Then, we perform text recognition only in this ROI.
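The motion-based ROI step described above can be sketched in MATLAB as follows. This is an illustrative outline only: the frames, the threshold value, and all variable names are stand-ins of ours, not the exact implementation.

```matlab
% Illustrative sketch of motion-based ROI extraction: difference two
% consecutive frames, threshold the change map, and crop the bounding
% box of the changed (moving) region. Sizes and threshold are arbitrary.
frame1 = zeros(240, 320);               % stand-in for a grayscale frame
frame2 = frame1;
frame2(80:160, 100:220) = 1;            % simulate a hand-held object entering

diffMap = abs(frame2 - frame1) > 0.1;   % pixels that changed between frames
[rows, cols] = find(diffMap);
roi = frame2(min(rows):max(rows), min(cols):max(cols));  % region of interest
```

Text recognition would then be run only on `roi` instead of the whole frame.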
Figure 1.2 Examples of text localization and recognition from camera
captured images. (Top) milk box. (Bottom) men's bathroom signage.
From the above figure: (a) camera-captured images; (b) localized
text regions (marked in blue); (c) text regions cropped from the image;
(d) text codes recognized by OCR. Text at the top right corner of the
bottom image is shown in a magnified callout.
CHAPTER 2
FUNDAMENTALS OF IMAGE PROCESSING
2.1 Introduction
Image processing usually refers to digital image processing, but
optical and analog image processing also are possible. The acquisition
of images (producing the input image in the first place) is referred to
as imaging. Digital image processing is the use of computer
algorithms to perform image processing on digital images. As a
subcategory or field of digital signal processing, digital image
processing has many advantages over analog image processing. It
allows a much wider range of algorithms to be applied to the input
data and can avoid problems such as the build-up of noise and signal
distortion during processing. Since images are defined over two
dimensions (perhaps more) digital image processing may be modelled
in the form of multidimensional systems.
2.1.1 Image
An image is a two-dimensional picture, which has a similar
appearance to some subject, usually a physical object or a person.
An image can be two-dimensional, such as a photograph or screen
display, or three-dimensional, such as a statue. Images may be
captured by optical devices such as cameras, mirrors, lenses,
telescopes, microscopes, etc., or by natural objects and phenomena,
such as the human eye or water surfaces.
The word image is also used in the broader sense of any two-
dimensional figure such as a map, a graph, a pie chart, or an abstract
painting. In this wider sense, images can also be rendered manually,
such as by drawing, painting, carving, rendered automatically by
printing or computer graphics technology, or developed by a
combination of methods, especially in a pseudo-photograph.
2.1.2 Image File Sizes
Image file size is expressed as the number of bytes, which
increases with the number of pixels composing an image and the
colour depth of the pixels. The greater the number of rows and
columns, the greater the image resolution, and the larger the file.
Also, each pixel of an image increases in size when its colour depth
increases: an 8-bit pixel (1 byte) stores 256 colours, while a 24-bit
pixel (3 bytes) stores 16 million colours; the latter is known as true colour.
Image compression uses algorithms to decrease the size of a file.
High resolution cameras produce large image files, ranging from
hundreds of kilobytes to megabytes, per the camera's resolution and
the image-storage format capacity. High resolution digital cameras
record 12 megapixel (1 MP = 1,000,000 pixels) images, or more, in
true colour. For example, consider an image recorded by a 12 MP
camera: since each pixel uses 3 bytes to record true colour, the
uncompressed image would occupy 36,000,000 bytes of memory, a
great amount of digital storage for one image, given that cameras
must record and store many images to be practical. Faced with large
file sizes, both within the camera and a storage disc, image file
formats were developed to store such large images.
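The 12 MP arithmetic above can be checked directly in MATLAB:

```matlab
% Worked example of the file-size figures quoted above: a 12-megapixel
% true-colour image at 3 bytes per pixel is 36,000,000 bytes uncompressed.
pixels        = 12 * 1e6;    % 12 MP (1 MP = 1,000,000 pixels)
bytesPerPixel = 3;           % 24-bit true colour
rawBytes = pixels * bytesPerPixel;
fprintf('Uncompressed size: %d bytes\n', rawBytes);   % 36000000 bytes
```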
2.1.3 Image File Formats
Image file formats are standardized means of organizing and
storing images. This section covers the digital image formats used to
store photographic and other images. Image files are composed of either
pixel or vector (geometric) data, which are rasterized to pixels when
displayed (with few exceptions, such as on a vector graphic display). Including
proprietary types, there are hundreds of image file types. The PNG,
JPEG, and GIF formats are most often used to display images on the
Internet.
2.1.4 Raster Formats
These formats store images as bitmaps (also known as pixmaps).
JPEG/JFIF
JPEG (Joint Photographic Experts Group) is a compression method.
JPEG compressed images are usually stored in the JFIF (JPEG File
Interchange Format) file format. JPEG compression is lossy
compression. Nearly every digital camera can save images in the
JPEG/JFIF format, which supports 8 bits per colour (red, green, blue)
for a 24-bit total, producing relatively small files. Photographic images
may be better stored in a lossless non-JPEG format if they will be re-
edited, or if small "artefacts" are unacceptable. The JPEG/JFIF format
also is used as the image compression algorithm in many Adobe PDF
files.
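The lossy/lossless trade-off can be sketched in MATLAB with `imwrite`; the file names and the quality setting below are arbitrary choices of ours, and the image is synthetic so the snippet is self-contained.

```matlab
% Save the same synthetic image as lossy JPEG and as lossless PNG.
% 'Quality' controls the JPEG compression level (0-100, default 75).
img = uint8(255 * rand(256, 256, 3));      % stand-in for a photograph
imwrite(img, 'photo.jpg', 'Quality', 75);  % lossy: smaller file, altered pixels
imwrite(img, 'photo.png');                 % lossless: exact pixels preserved
```

Re-editing workflows would keep the PNG and export a JPEG only for final distribution, as noted above.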
EXIF
The EXIF (Exchangeable image file format) format is a file standard
similar to the JFIF format with TIFF extensions. It is incorporated in
the JPEG writing software used in most cameras. Its purpose is to
record and to standardize the exchange of images with image
metadata between digital cameras and editing and viewing software.
The metadata are recorded for individual images and include such
things as camera settings, time and date, shutter speed, exposure,
image size, compression, name of camera, colour information, etc.
When images are viewed or edited by image editing software, all of this
image information can be displayed.
TIFF
The TIFF (Tagged Image File Format) format is a flexible format that
normally saves 8 bits or 16 bits per color (red, green, blue) for 24-bit
and 48-bit totals, respectively, usually using either the TIFF or TIF
filename extension. TIFF files can be lossy or lossless; some offer relatively
good lossless compression for bi-level (black & white) images. Some
digital cameras can save in TIFF format, using the LZW compression
algorithm for lossless storage. TIFF image format is not widely
supported by web browsers. TIFF remains widely accepted as a
photograph file standard in the printing business. TIFF can handle
device-specific colour spaces, such as the CMYK defined by a
particular set of printing press inks.
PNG
The PNG (Portable Network Graphics) file format was created as the
free, open-source successor to the GIF. The PNG file format supports
true colour (16 million colours) while the GIF supports only 256
colours. The PNG file excels when the image has large, uniformly
coloured areas. The lossless PNG format is best suited for editing
pictures, and the lossy formats, like JPG, are best for the final
distribution of photographic images, because JPG files are smaller
than PNG files. PNG is an extensible file format for the lossless,
portable, well-compressed storage of raster images. PNG provides a
patent-free replacement for GIF and can also replace many common
uses of TIFF. Indexed-colour, gray scale, and true colour images are
supported, plus an optional alpha channel. PNG is designed to work
well in online viewing applications, such as the World Wide Web. PNG
is robust, providing both full file integrity checking and simple
detection of common transmission errors.
GIF
GIF (Graphics Interchange Format) is limited to an 8-bit palette, or
256 colors. This makes the GIF format suitable for storing graphics
with relatively few colors such as simple diagrams, shapes, logos and
cartoon style images. The GIF format supports animation and is still
widely used to provide image animation effects. It also uses a lossless
compression that is more effective when large areas have a single
color, and ineffective for detailed images or dithered images.
BMP
The BMP file format (Windows bitmap) handles graphics files within
the Microsoft Windows OS. Typically, BMP files are uncompressed and
hence large. The advantage is their simplicity and wide acceptance in
Windows programs.
2.1.5 Vector Formats
As opposed to the raster image formats above (where the data
describes the characteristics of each individual pixel), vector image
formats contain a geometric description which can be rendered
smoothly at any desired display size.
At some point, all vector graphics must be rasterized in order to
be displayed on digital monitors. However, vector images can be
displayed with analog CRT technology such as that used in some
electronic test equipment, medical monitors, radar displays, laser
shows and early video games. Plotters are printers that use vector
data rather than pixel data to draw graphics.
CGM
CGM (Computer Graphics Metafile) is a file format for 2D vector
graphics, raster graphics, and text. All graphical elements can be
specified in a textual source file that can be compiled into a binary file
or one of two text representations. CGM provides a means of graphics
data interchange for computer representation of 2D graphical
information independent from any particular application, system,
platform, or device.
SVG
SVG (Scalable Vector Graphics) is an open standard created and
developed by the World Wide Web Consortium to address the need for
a versatile, scriptable and all purpose vector format for the web and
otherwise. The SVG format does not have a compression scheme of its
own, but due to the textual nature of XML, an SVG graphic can be
compressed using a program such as zip.
2.2 Digital Image Processing
Digital image processing allows the use of much more complex
algorithms for image processing and hence can offer both more
sophisticated performance at simple tasks and the implementation of
methods which would be impossible by analog means. Some
techniques which are used in digital image processing include:
Pixelisation
Linear filtering
Principal components analysis
Independent component analysis
Hidden Markov models
Anisotropic diffusion
Partial differential equations
Self-organizing maps
Neural networks
2.3 Applications of Digital Image Processing
Some of the applications of digital image processing include:
Intelligent transportation systems
Film
Digital camera images
Medical applications
Restorations and enhancements
Digital cinema
Image transmission and coding
Colour processing
Remote sensing
High-resolution display
High-quality Colour representation
Super-high-definition image processing
Impact of standardization on image processing
Digital image processing, the manipulation of images by
computer, is a relatively recent development in terms of man's ancient
fascination with visual stimuli. In its short history, it has been applied
to practically every type of image with varying degrees of success. The
inherent subjective appeal of pictorial displays attracts perhaps a
disproportionate amount of attention from scientists and laymen
alike. Digital image processing, like other glamour fields, suffers
from myths, misconceptions, misunderstandings and misinformation.
It is a vast umbrella under which fall diverse aspects of optics,
electronics, mathematics, photography, graphics and computer
technology.
Several factors combine to indicate a lively future for digital
image processing. A major factor is the declining cost of computer
equipment. Several new technological trends promise to further
promote digital image processing. These include parallel processing,
made practical by low cost microprocessors, and the use of charge
coupled devices (CCDs) for digitizing, for storage during processing
and display, and in large low cost image storage arrays.
2.4 Fundamental Steps in Digital Image Processing
Image Acquisition
Image acquisition is the process of acquiring a digital image. It
requires an image sensor and the capability to digitize the signal
produced by the sensor. The sensor could be a monochrome or color
TV camera that produces an entire image of the problem domain
every 1/30 second. The image sensor could also be a line scan camera
that produces a single image line at a time; in this case, the object's
motion past the line scanner produces a two-dimensional image. If the
output of the camera or other imaging sensor is not in digital form, an
analog-to-digital converter digitizes it. The nature of the sensor and
the image it produces are determined by the application.
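Once digitized, an image is read into MATLAB as an ordinary numeric array. In this sketch, 'peppers.png' is one of the sample images shipped with MATLAB; any image path may be substituted.

```matlab
% A digitized image in MATLAB is just an array: rows x columns x planes.
img = imread('peppers.png');      % sample image bundled with MATLAB
[h, w, channels] = size(img);     % spatial resolution and colour planes
disp(class(img));                 % typically uint8, i.e. 8 bits per sample
```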
Figure 2.1 Fundamental Steps in Image processing
Image Enhancement
Image enhancement is among the simplest and most appealing
areas of digital image processing. Basically, the idea behind
enhancement techniques is to bring out detail that is obscured, or
simply to highlight certain features of interest in an image. A familiar
example of enhancement is increasing the contrast of an image
because "it looks better." It is important to keep in mind that
enhancement is a very subjective area of image processing.
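With the Image Processing Toolbox, two common enhancement operations look like the sketch below; 'pout.tif' is a low-contrast sample image bundled with the toolbox.

```matlab
% Two subjective "make it look better" enhancements discussed above.
img       = imread('pout.tif');   % low-contrast sample image
stretched = imadjust(img);        % contrast stretching: widen intensity range
equalized = histeq(img);          % histogram equalization: flatten histogram
```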
Image Restoration
Image restoration is an area that also deals with improving the
appearance of an image. However, unlike enhancement, which is
subjective, image restoration is objective, in the sense that restoration
techniques tend to be based on mathematical or probabilistic models
of image degradation.
Enhancement on the other hand is based on human subjective
preferences regarding what constitutes a “good” enhancement result.
For example, contrast stretching is considered an enhancement
technique because it is based primarily on the pleasing aspects it
might present to the viewer, whereas removal of image blur by
applying a de-blurring function is considered a restoration technique.
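A minimal restoration sketch, using the Image Processing Toolbox: an image is blurred with a known point-spread function and the degradation model is then inverted. This is exactly the objective, model-based correction contrasted with subjective enhancement above. The blur parameters here are our own illustrative choices.

```matlab
% De-blurring as restoration: the degradation (a motion-blur PSF) is
% modelled mathematically, then inverted with Wiener deconvolution.
img      = im2double(imread('cameraman.tif'));
psf      = fspecial('motion', 15, 0);            % assumed blur model
blurred  = imfilter(img, psf, 'conv', 'circular');
restored = deconvwnr(blurred, psf);              % objective, model-based fix
```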
Colour Image Processing
The use of color in image processing is motivated by two principal
factors. First, color is a powerful descriptor that often simplifies object
identification and extraction from a scene. Second, humans can
discern thousands of color shades and intensities, compared to about
only two dozen shades of gray. This second factor is particularly
important in manual image analysis.
Wavelets and Multi – Resolution Processing
Wavelets are the foundation for representing images in various
degrees of resolution. Although the Fourier transform has been the
mainstay of transform-based image processing since the late 1950s, a
more recent transformation, called the wavelet transform, is now
making it even easier to compress, transmit, and analyse many
images. Unlike the Fourier transform, whose basis functions are
sinusoids, wavelet transforms are based on small waves, called
wavelets, of varying frequency and limited duration.
Wavelets were first shown to be the foundation of a powerful new
approach to signal processing and analysis called Multi-resolution
theory. Multi-resolution theory incorporates and unifies techniques
from a variety of disciplines, including sub band coding from signal
processing, quadrature mirror filtering from digital speech recognition,
and pyramidal image processing.
Compression
Compression, as the name implies, deals with techniques for
reducing the storage required to save an image, or the bandwidth
required to transmit it. Although storage technology has improved
significantly over the past decade, the same cannot be said for
transmission capacity. This is true particularly in uses of the Internet,
which are characterized by significant pictorial content. Image
compression is familiar to most users of computers in the form of
image file extensions, such as the jpg file extension used in the JPEG
(Joint Photographic Experts Group) image compression standard.
Morphological Processing
Morphological processing deals with tools for extracting image
components that are useful in the representation and description of
shape. The language of mathematical morphology is set theory. As
such, morphology offers a unified and powerful approach to numerous
image processing problems. Sets in mathematical morphology
represent objects in an image. For example, the set of all black pixels
in a binary image is a complete morphological description of the
image.
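The set-theoretic operations described above can be sketched with the Image Processing Toolbox: a structuring element grows (dilation) or shrinks (erosion) the set of white pixels.

```matlab
% Dilation and erosion on a binary image: the set of foreground pixels
% is grown or shrunk by a 3x3 structuring element.
bw = false(50, 50);
bw(20:30, 20:30) = true;          % an 11x11 square "object"
se = strel('square', 3);          % 3x3 structuring element
thicker = imdilate(bw, se);       % grows to 13x13 (169 white pixels)
thinner = imerode(bw, se);        % shrinks to 9x9 (81 white pixels)
```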
Segmentation
Segmentation procedures partition an image into its constituent
parts or objects. In general, autonomous segmentation is one of the
most difficult tasks in digital image processing. A rugged segmentation
procedure brings the process a long way toward successful solution of
imaging problems that require objects to be identified individually.
On the other hand, weak or erratic segmentation algorithms
almost always guarantee eventual failure. In general, the more
accurate the segmentation, the more likely recognition is to succeed.
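A basic segmentation sketch: Otsu's method picks a global threshold automatically, partitioning the image into foreground objects and background. 'coins.png' is a toolbox sample image of bright objects on a dark background; `im2bw` is the classic thresholding call of that era.

```matlab
% Automatic global thresholding (Otsu's method) for segmentation.
img   = imread('coins.png');      % sample image: coins on a dark background
level = graythresh(img);          % Otsu threshold, returned in [0, 1]
bw    = im2bw(img, level);        % logical mask: objects vs. background
```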
Representation and Description
Representation and description almost always follow the output of
a segmentation stage, which usually is raw pixel data, constituting
either the boundary of a region (i.e., the set of pixels separating one
image region from another) or all the points in the region itself. In
either case, converting the data to a form suitable for computer
processing is necessary. The first decision that must be made is
whether the data should be represented as a boundary or as a
complete region. Boundary representation is appropriate when the
focus is on external shape characteristics, such as corners and
inflections.
Regional representation is appropriate when the focus is on
internal properties, such as texture or skeletal shape. In some
applications, these representations complement each other. Choosing
a representation is only part of the solution for transforming raw data
into a form suitable for subsequent computer processing. A method
must also be specified for describing the data so that features of
interest are highlighted. Description, also called feature selection,
deals with extracting attributes that result in some quantitative
information of interest or are basic for differentiating one class of
objects from another.
Object Recognition
The last stage involves recognition and interpretation. Recognition
is the process that assigns a label to an object based on the
information provided by its descriptors. Interpretation involves
assigning meaning to an ensemble of recognized objects.
Knowledge Base
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database. This knowledge may be
as simple as detailing regions of an image where the information of
interest is known to be located, thus limiting the search that has to
be conducted in seeking that information. The knowledge base can
also be quite complex, such as an interrelated list of all major
possible defects in a materials inspection problem, or an image
database containing high-resolution satellite images of a region in
connection with change-detection applications. In addition to guiding the
operation of each processing module, the knowledge base also
controls the interaction between modules. The system must be
endowed with the knowledge to recognize the significance of the
location of the string with respect to other components of an address
field. This knowledge guides not only the operation of each module,
but also aids in feedback operations between modules through the
knowledge base. We implemented pre-processing techniques using
MATLAB.
2.5 Components of Image Processing System
As recently as the mid-1980s, numerous models of image
processing systems being sold throughout the world were rather
substantial peripheral devices that attached to equally substantial
host computers. Late in the 1980s and early in the 1990s, the market
shifted to image processing hardware in the form of single boards
designed to be compatible with industry standard buses and to fit into
engineering workstation cabinets and personal computers. In addition
to lowering costs, this market shift also served as a catalyst for a
significant number of new companies whose specialty is the
development of software written specifically for image processing.
Although large-scale image processing systems still are being
sold for massive imaging applications, such as processing of satellite
images, the trend continues toward miniaturizing and blending of
general-purpose small computers with specialized image processing
hardware. Figure 2.2 shows the basic components comprising a
typical general-purpose system used for digital image processing. The
function of each component is discussed in the following paragraphs,
starting with image sensing.
Figure 2.2 Components of Image Processing System
Image Sensors
With reference to sensing, two elements are required to acquire
digital images. The first is a physical device that is sensitive to the
energy radiated by the object we wish to image. The second, called a
digitizer, is a device for converting the output of the physical sensing
device into digital form. For instance, in a digital video camera, the
sensors produce an electrical output proportional to light intensity.
The digitizer converts these outputs to digital data.
Specialised Image Processing Hardware
Specialized image processing hardware usually consists of the
digitizer just mentioned, plus hardware that performs other primitive
operations, such as an arithmetic logic unit (ALU), which performs
arithmetic and logical operations in parallel on entire images. One
example of how an ALU is used is in averaging images as quickly as
they are digitized, for the purpose of noise reduction. This type of
hardware sometimes is called a front-end subsystem, and its most
distinguishing characteristic is speed.
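As an illustration of this averaging operation (a software sketch of what the hardware front end does on the fly, not the ALU itself; the 'frames' variable and its contents are assumptions), the pixel-wise mean of several noisy frames can be computed in MATLAB:

```matlab
% Illustrative sketch: average K noisy captures of the same scene to
% reduce noise. 'frames' is assumed to be an M-by-N-by-K uint8 array.
K = size(frames, 3);
avgImage = uint8(sum(double(frames), 3) / K);  % pixel-wise mean of all frames
imshow(avgImage);                              % display the denoised result
```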
Computer
The computer in an image processing system is a general-purpose
computer and can range from a PC to a supercomputer. In dedicated
applications, sometimes specially designed computers are used to
achieve a required level of performance, but our interest here is on
general-purpose image processing systems. In these systems, almost
any well-equipped PC-type machine is suitable for offline image
processing tasks.
Image Processing Software
Software for image processing consists of specialized modules that
perform specific tasks. A well-designed package also includes the
capability for the user to write code that, as a minimum, utilizes the
specialized modules. More sophisticated software packages allow the
integration of those modules and general-purpose software commands
from at least one computer language.
Mass Storage
Mass storage capability is a must in image processing
applications. An image of size 1024*1024 pixels, in which the
intensity of each pixel is an 8-bit quantity, requires one megabyte of
storage space if the image is not compressed. When dealing with
thousands, or even millions, of images, providing adequate storage in
an image processing system can be a challenge. Digital storage for
image processing applications falls into three principal categories: (1)
short-term storage for use during processing, (2) on-line storage
for relatively fast recall, and (3) archival storage, characterized by
infrequent access. Storage is measured in bytes (eight bits), Kbytes
(one thousand bytes), Mbytes (one million bytes), Gbytes (meaning
giga, or one billion, bytes), and Tbytes (meaning tera, or one trillion,
bytes).
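The storage figure quoted above can be checked with a short MATLAB calculation:

```matlab
% Storage required by an uncompressed 1024 x 1024 image with 8-bit pixels.
rows = 1024; cols = 1024; bitsPerPixel = 8;
bytes = rows * cols * bitsPerPixel / 8;   % 1,048,576 bytes
megabytes = bytes / 2^20                  % displays 1 (one megabyte)
```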
One method of providing short-term storage is computer
memory. Another is by specialized boards, called frame buffers that
store one or more images and can be accessed rapidly, usually at
video rates. The latter method allows virtually instantaneous image
zoom, as well as scroll (vertical shifts) and pan (horizontal shifts).
Online storage generally takes the form of magnetic disks or optical-
media storage. The key factor characterizing on-line storage is
frequent access to the stored data. Finally, archival storage is
characterized by massive storage requirements but infrequent need for
access. Magnetic tapes and optical disks housed in “jukeboxes” are
the usual media for archival applications.
Image Displays
Image displays in use today are mainly colour (preferably flat
screen) TV monitors. Monitors are driven by the outputs of image and
graphics display cards that are an integral part of the computer
system.
Hardcopy
Hardcopy devices for recording images include laser printers, film
cameras, heat-sensitive devices, inkjet units, and digital units, such
as optical and CD-ROM disks. Film provides the highest possible
resolution, but paper is the obvious medium of choice for written
material. For presentations, images are displayed on film
transparencies or in a digital medium if image projection equipment is
used. The latter approach is gaining acceptance as the standard for
image presentations.
Network
Networking is almost a default function in any computer system in
use today. Because of the large amount of data inherent in image
processing applications, the key consideration in image transmission
is bandwidth. In dedicated networks, this typically is not a problem,
but communications with remote sites via the Internet are not always
as efficient. Fortunately, this situation is improving quickly as a result
of optical fiber and other broadband technologies.
CHAPTER 3
EXISTING METHODS
There are three existing methods:
1. Portable bar code readers
2. KReader Mobile
3. Pen scanners
3.1 Portable Bar Code Readers
A barcode is an optical machine-readable representation of
data, which shows certain data on certain products. Originally,
barcodes represented data in the widths and spacings of parallel
lines, and may be referred to as linear or 1D (one-dimensional)
barcodes or symbologies. They also come in patterns of squares, dots,
hexagons and other geometric patterns within images, termed 2D
(two-dimensional) matrix codes or symbologies. Although 2D systems
use symbols other than bars, they are generally referred to as
barcodes as well. Barcodes can be read by optical scanners called
barcode readers, or scanned from an image by special software.
These readers are designed to help blind people identify different
products via an extensive product database. But, as shown in figure
3.1, a big limitation is that it is very hard for blind users to find the
position of the barcode and to correctly point the barcode reader
at it.
Figure 3.1 Barcode on a product
Figure 3.2 Barcode machine scanning the unique code on a product
3.2 KReader Mobile
KReader Mobile runs on a cell phone and allows the user to read
mail, receipts, files and many other documents. The disadvantage of
KReader Mobile is that the document must be nearly flat, placed on a
clear, dark surface, and should contain mostly text. It accurately
reads black print on a white background but has problems
recognizing colored text or text on a colored background.
As shown in figure 3.3, it represents a major advancement in the
portability and functionality of print access for struggling readers and
those learning a second language. Developed under the direction of
assistive-technology pioneer Ray Kurzweil, the KReader Mobile
software package runs on a multifunction cell phone and allows users
to snap a picture of virtually any document, including mail, receipts,
handouts and memos. Its proprietary document-analysis technology
determines the words and reads them aloud to the user. Reading in
other languages is available, along with translation between
languages. It is a truly portable solution to reading on the go, allowing
users to read what they want wherever they happen to be.
Figure 3.3 KReader Mobile running on a cell phone
3.3 Pen Scanners
A scanner is a device that captures an image from a physical
object or document to create a digital copy of it. They come in a wide
range of designs and styles, but overall their purpose is to create a
digital backup of a physical image or document. Some of the most
common models on the market are flatbed designs that have a glass
bed that you lay a document onto and then a light is used to scan the
item and create a digital copy of it.
These have become increasingly powerful, with high-resolution
models often used for photographs and illustrations. There are also
specialized scanners used to capture images from older photographic
film and slides, as well as smaller versions that can quickly scan a
business card. Pen scanners provide small options for scanning
individual lines of text, while portable models can be used to scan any
document at any location. Consider what types of items you have to
scan, and where you'll be doing most of your scanning to find the best
model for your needs.
As shown in figure 3.4, pen scanners are reading-assistive
systems. These systems are generally designed for document reading
and perform best on document images with simple backgrounds,
standard fonts, a small range of font sizes and well-organized
characters, rather than on commercial product boxes with multiple
decorative patterns.
Figure 3.4 Pen scanner scanning a document
CHAPTER 4
PROPOSED METHOD
To overcome the problems defined in the problem definition, and
to assist blind persons in reading text from the kinds of challenging
patterns and backgrounds found on many everyday hand-held
commercial products, we conceived a camera-based assistive text
reading framework to track the object of interest within the camera
view and extract printed text information from the object. The
proposed algorithm used in this system can effectively handle complex
backgrounds and multiple patterns, and extract text information from
both hand-held objects and nearby signage.
A further problem in assistive reading systems for blind persons
is that, in existing systems, it is very challenging for users to position
the object of interest within the center of the camera's view. As of
now, there are still no acceptable solutions, so this problem is
approached in stages. To ensure the hand-held object appears in the
camera view, as shown in figure 4.1, this thesis uses a camera with a
sufficiently wide angle to accommodate users with only approximate
aim. This may often result in other text objects appearing in the
camera's view (for example, while shopping at a supermarket). To
extract the hand-held object from the camera image, this system
develops a motion-based method to obtain a region of interest (ROI) of
the object, and then performs text recognition only on that ROI. As
shown in figure 4.2, it is a challenging problem to automatically
localize objects and text ROIs from captured images with complex
backgrounds, because text in captured images is most likely
surrounded by various background outlier "noise," and text
characters usually appear in multiple scales, fonts, and colors. As for
text orientation, this thesis assumes that text strings in scene images
keep approximately horizontal alignment.
Many algorithms have been developed for the localization of text
regions in scene images. Here we use the following method:
Edge Based Text Region Extraction
In solving the task at hand, namely extracting text information from
complex backgrounds with multiple and variable text patterns, we
propose a text localization algorithm that combines rule-based layout
analysis with learning-based text classifier training, and that defines
novel feature maps based on stroke orientations and edge
distributions. These, in turn, generate representative and
discriminative text features to distinguish text characters from
background outliers.
Figure 4.1 Extraction of text from a product
By using this Edge Based Text Region Extraction algorithm we can
extract the text from the desired image. The extracted text can then
be converted into speech by using a text-to-speech synthesizer.
Figure 4.1.1 Image with multiple backgrounds and multiple fonts
DESCRIPTION
4.1 Algorithm For Edge Based Text Region Extraction
The basic steps of the edge-based text extraction algorithm are
given below and diagrammed in Figure 4.1.2. The details are
explained in the following sections.
1. Create a Gaussian pyramid by convolving the input image with
a Gaussian kernel and successively down-sampling in each
direction by half.
2. Create directional kernels to detect edges at 0°, 45°, 90° and
135° orientations.
3. Convolve each image in the Gaussian pyramid with each
orientation filter.
4. Combine the results of step 3 to create the Feature Map.
5. Dilate the resultant image using a sufficiently large structuring
element to cluster candidate text regions together.
6. Create final output image with text in white pixels against a
plain black background.
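The six steps above can be sketched in MATLAB as follows. This is only an illustrative outline, not the project's exact implementation: the input file name, the kernel values and the thresholds are assumptions.

```matlab
% Sketch of the edge-based text extraction pipeline (Steps 1-6).
img = im2double(rgb2gray(imread('product.jpg')));  % hypothetical input file

% Step 1: 4-level Gaussian pyramid (Gaussian blur + downsample by half)
pyr = {img};
for k = 2:4
    pyr{k} = impyramid(pyr{k-1}, 'reduce');
end

% Step 2: directional edge kernels (0, 45, 90 and 135 degrees; values illustrative)
k0   = [-1 -1 -1;  2  2  2; -1 -1 -1];
k90  = k0';
k45  = [-1 -1  2; -1  2 -1;  2 -1 -1];
k135 = fliplr(k45);
kernels = {k0, k45, k90, k135};

% Steps 3-4: convolve every level with every kernel, combine into a feature map
fmap = zeros(size(img));
for k = 1:numel(pyr)
    for j = 1:numel(kernels)
        e = abs(conv2(pyr{k}, kernels{j}, 'same'));
        fmap = fmap + imresize(e, size(img));  % accumulate at full resolution
    end
end

% Step 5: dilate to cluster candidate text pixels (threshold illustrative)
bw = imdilate(fmap > 0.5 * max(fmap(:)), strel('square', 7));

% Step 6: white text pixels against a plain black background
out = false(size(img));
out(bw) = img(bw) > graythresh(img(bw));
imshow(out);
```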
Figure 4.1.2 Basic Block diagram for edge based text extraction
The procedure for extracting a text region from an image can be
broadly classified into three basic steps:
1. Detection of the text region in the image.
2. Localization of the region.
3. Creating the extracted output character image.
4.1.1 Detection
This section corresponds to Steps 1 to 4 of the algorithm in
Section 4.1. Given an input
image, the region with a possibility of text in the image is detected. A
Gaussian pyramid is created by successively filtering the input image
with a Gaussian kernel of size 3x3 and downsampling the image in
each direction by half. Down sampling refers to the process whereby
an image is resized to a lower resolution from its original resolution. A
Gaussian filter of size 3x3 is used, as shown in Figure 4.1.3. Each
level in the pyramid corresponds to the input image at a different
resolution. A sample Gaussian pyramid with 4 levels of resolution is
shown in Figure 4.1.4. These images are next convolved with directional
filters at different orientation kernels for edge detection in the
horizontal (0°), vertical (90°) and diagonal (45°, 135°) directions. The
kernels used are shown in Figure 4.1.6.
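The pyramid construction described above can be sketched in MATLAB as follows (a minimal sketch, assuming a grayscale input image 'img'; the sigma of 0.5 is the fspecial default shown in the figure below):

```matlab
% Sketch: build a 4-level Gaussian pyramid with a 3x3 Gaussian kernel.
g = fspecial('gaussian', [3 3], 0.5);     % 3x3 Gaussian filter (default sigma)
pyr = {im2double(img)};                   % level 1: the original image
for k = 2:4
    blurred = imfilter(pyr{k-1}, g, 'replicate');  % Gaussian smoothing
    pyr{k}  = blurred(1:2:end, 1:2:end);  % downsample by half in each direction
end
```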
Figure 4.1.3 Default filter returned by the fspecial Gaussian function
Figure 4.1.4 Sample Gaussian pyramid with 4 levels
Figure 4.1.5 Each resolution image resized to original image size
Figure 4.1.6 The directional kernels
Figure 4.1.7 Sample image from Figure 4.1.4 after convolution with
each directional kernel. Note how the edge information in each
direction is highlighted
Figure 4.1.8 Sample resized image of the pyramid after convolution
with 0° kernel
After convolving the image with the orientation kernels, a
feature map is created. A weighting factor is associated with each pixel
to classify it as a candidate or non-candidate for a text region. A pixel is
a candidate for text if it is highlighted in all of the edge maps created
by the directional filters. Thus, the feature map is a combination of all
edge maps at different scales and orientations with the highest
weighted pixels present in the resultant map.
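The candidate test described above, where a pixel must be highlighted in all of the edge maps, can be sketched as a pixel-wise minimum (the 'edgeMaps' stack and the threshold value are assumptions):

```matlab
% Sketch: a pixel is a text candidate only if every directional edge map
% highlights it, so the maps are combined with a pixel-wise minimum.
% 'edgeMaps' is assumed to be an M-by-N-by-4 stack of directional edge maps.
combined   = min(edgeMaps, [], 3);               % weak in any direction -> weak overall
candidates = combined > 0.2 * max(combined(:));  % keep the highest-weighted pixels
```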
4.1.2 Localization
This section corresponds to Step 5 of the algorithm in Section
4.1. The process of
localization involves further enhancing the text regions by eliminating
non-text regions. One of the properties of text is that usually all
characters appear close to each other in the image, thus forming a
cluster. By using a morphological dilation operation, these possible
text pixels can be clustered together, eliminating pixels that are far
from the candidate text regions. Dilation is an operation which
expands or enhances the region of interest, using a structuring
element of the required shape and/or size. The dilation is carried
out using a very large structuring element in order to enhance the
regions which lie close to each other. In this algorithm, a structuring
element of size 7x7 has been used. Figure 4.1.9 below shows the
result before and after dilation.
Figure 4.1.9 (a) Before dilation (b) After dilation
The resultant image after dilation may consist of some non-text
regions or noise which need to be eliminated. Area-based filtering
is carried out to eliminate the noise blobs present in the image: only
those regions in the final image are retained whose area is greater
than or equal to 1/20 of the area of the largest region.
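The dilation and area-based filtering steps can be sketched in MATLAB as follows (a sketch assuming a logical candidate-pixel mask 'candidates'):

```matlab
% Sketch: cluster candidate text pixels with a 7x7 dilation, then keep only
% blobs whose area is at least 1/20 of the largest blob's area.
bw = imdilate(candidates, strel('square', 7));  % 7x7 structuring element
cc = bwconncomp(bw);                            % label connected regions
areas = cellfun(@numel, cc.PixelIdxList);       % area (pixel count) of each blob
keep  = areas >= max(areas) / 20;               % area-based filtering rule
bw = ismember(labelmatrix(cc), find(keep));     % noise blobs removed
```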
4.1.3 Character Extraction
This section corresponds to Step 6 of the algorithm in Section
4.1. The common OCR
systems available require the input image to be such that the
characters can be easily parsed and recognized. The text and
background should be monochrome and background-to-text contrast
should be high. Thus, this process generates an output image with
white text against a black background. A sample test image and its
resultant output image from the edge-based text detection algorithm
are shown in Figure 4.1.10(a) and (b) below.
Figure 4.1.10 (a) Original image (b) Result
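The final step can be sketched as follows, assuming a grayscale image 'gray' and the localized text-region mask 'bw' from the previous stage; depending on whether the strokes are brighter or darker than their surroundings, the comparison may need to be inverted:

```matlab
% Sketch: produce white characters on a plain black background inside the
% localized text regions, the form common OCR engines expect.
out = false(size(gray));       % everything outside the text regions stays black
level = graythresh(gray(bw));  % Otsu threshold computed inside the text regions
out(bw) = gray(bw) > level;    % bright strokes become white pixels
imshow(out);
```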
4.2 IMPLEMENTATION
4.2.1 Software Requirement-MATLAB
Matlab is a high-performance language for technical computing.
It integrates computation, visualization and programming in an easy-
to-use environment. Matlab stands for matrix laboratory. It was
originally written to provide easy access to matrix software developed
by the LINPACK (linear system package) and EISPACK (eigensystem
package) projects. Matlab is therefore built on a foundation of
sophisticated matrix software in which the basic element is a matrix
that does not require pre-dimensioning. This allows many technical
computing problems, especially those with matrix and vector
formulations, to be solved in a fraction of the time.
Matlab features a family of application-specific solutions called
toolboxes. Very important to most users of Matlab, toolboxes allow
learning and applying specialized technology. These are
comprehensive collections of Matlab functions that extend the Matlab
environment to solve particular classes of problems. Areas in which
toolboxes are available include signal processing, control systems,
neural networks, fuzzy logic, wavelets, simulation and many others.
4.2.2 Typical Uses of MATLAB
Typical areas in which Matlab is used are:
1. Math and computation
2. Algorithm development
3. Data acquisition
4. Data analysis, exploration and visualization
5. Scientific and engineering graphics
6. Modelling
7. Simulation
8. Prototyping
9. Application development, including graphical user interface
building
Matlab is an interactive system whose basic data element is an
array that does not require dimensioning. This allows you to solve
many technical computing problems, especially those with matrix and
vector formulations, in a fraction of the time it would take to write a
program in a scalar non-interactive language such as C or FORTRAN.
4.2.3 Features of MATLAB
1. Advanced algorithms for high-performance numerical
computation, especially in the field of matrix algebra.
2. A large collection of predefined mathematical functions and the
ability to define one's own functions.
3. Two- and three-dimensional graphics for plotting and displaying
data.
4. A powerful, matrix- or vector-oriented high-level programming
language for individual applications.
5. Toolboxes available for solving advanced problems in several
application areas.
4.2.4 Basic Building Blocks of MATLAB
The basic building block of Matlab is the matrix. The
fundamental data type is the array. Vectors, scalars, real matrices
and complex matrices are handled as specific classes of this basic
data type. The built-in functions are optimized for vector operations.
No dimension statements are required for vectors or arrays.
4.3 MATLAB Window
Matlab works with seven main windows:
1. Command window
2. Work space window
3. Current directory window
4. Command history window
5. Editor window
6. Graphics window
7. Online-help window
4.3.1 Command Window
The command window is where the user types Matlab
commands and expressions at the prompt (>>) and where the output
of those commands is displayed. It is opened when the application
program is launched. All commands, including user-written
programs, are typed in this window at the Matlab prompt for
execution.
4.3.2 Work Space Window
Matlab defines the workspace as the set of variables that the
user creates in a work session. The workspace browser shows these
variables and some information about them. Double clicking on a
variable in the work space browser launches the array editor, which
can be used to obtain information.
4.3.3 Current Directory Window
The current directory tab shows the contents of the current
directory, whose path is shown in the current directory window. For
example, in the Windows operating system the path might be as
follows: c:\matlab\work, indicating that directory "work" is a sub-
directory of the main directory "Matlab", which is installed in drive c.
Clicking on the arrow in the current directory window shows a list of
recently used paths. Matlab uses a search path to find M-files and
other Matlab-related files. Any file run in Matlab must reside in the
current directory or in a directory that is on the search path.
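For example (the folder name used here is purely illustrative):

```matlab
% Show the current folder and add a folder to the search path.
pwd                          % print the current directory
addpath('c:\matlab\work');   % put the folder on the search path
path                         % list every folder on the search path
```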
4.3.4 Command History Window
The command history window contains a record of the
commands a user has entered in the command window, including
both current and previous Matlab sessions. Previously entered Matlab
commands can be selected and re-executed from the command history
window by right clicking on a command. This is useful for selecting
various options in addition to executing the commands, and is a
useful feature when experimenting with various commands in work
sessions.
4.3.5 Editor Window
The Matlab editor is both a text editor specialized for creating
M-files and a graphical Matlab debugger. The editor can appear in a
window by itself, or it can be a sub window in the desktop. In this
window one can write, edit, create and save programs in files called M-
files.
The Matlab editor window has numerous pull-down menus for
tasks such as saving, viewing and debugging files. Because it
performs some simple checks and also uses color to differentiate
between various elements of code, this text editor is recommended as
the tool of choice for writing and editing M-files.
4.3.6 Graphics or Figure Window
The output of all graphic commands typed in the command window is
seen in this window.
4.3.7 Online Help Window
Matlab provides online help for its built in functions and
programming language constructs. The principal way to get help
online is to use the Matlab help browser, opened as a separate window
either by clicking on the question mark symbol on the desktop
toolbar, or by typing help browser at the prompt in the command
window. The help browser is a web browser integrated into the Matlab
desktop that displays hypertext markup language (HTML) documents.
The help browser consists of two panes: the help navigator pane, used
to find information, and the display pane, used to view that
information. Self-explanatory tabs in the navigator pane are used to
perform a search.
4.4 MATLAB Files
Matlab has three types of files for storing information: script M-files,
function M-files, and MAT-files.
4.4.1 M-Files
These are standard ASCII text files with a .m extension to the
file name; they contain Matlab code, and one can create one's own
matrices and programs using them. The Matlab editor (or another
text editor) is used to create a file containing the same statements
that would be typed at the Matlab command line, and the file is saved
under a name that ends in .m. There are two types of M-files.
4.4.2 Script Files
A script file is an M-file with a set of Matlab commands in it,
and it is executed by typing the name of the file on the command line.
These files work on the variables currently in the workspace.
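A minimal, hypothetical script file might look like this; saving it as areastats.m and typing areastats at the prompt executes it on the workspace variables:

```matlab
% areastats.m -- a hypothetical script M-file.
r = 5;                          % radius; stays in the workspace after the run
area = pi * r^2;                % computed result, also left in the workspace
fprintf('Area = %.2f\n', area); % print the result in the command window
```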
4.4.3 Function Files
A function file is also an M-file, except that the variables in a
function file are all local. This type of file begins with a function
definition line.
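A minimal, hypothetical function file (saved as circlearea.m; the name and variables are illustrative):

```matlab
function a = circlearea(r)
% CIRCLEAREA  Area of a circle of radius r.
% 'r' and 'a' are local to this function, unlike the variables in a script.
a = pi * r.^2;
end
```

Typing circlearea(5) at the prompt then returns the area without leaving r or a in the workspace.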
4.4.4 Mat-Files
These are binary files with a .mat extension, created by Matlab
when data is saved. The data are written in a special format that only
Matlab can read, and are loaded back into Matlab with the load
command.
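For example:

```matlab
% Save a workspace variable to a MAT-file and load it back.
A = magic(4);              % example data
save('results.mat', 'A');  % writes A to results.mat in Matlab's binary format
clear A                    % remove A from the workspace
load('results.mat');       % restores A into the workspace
```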
4.5 MATLAB System
The Matlab system consists of five main parts.
4.5.1 Development Environment
This is the set of tools and facilities that help you use Matlab
functions and files. Many of these tools are graphical user interfaces.
It includes the Matlab desktop and command window, a command
history, an editor and debugger, and browsers for viewing help, the
workspace, files and the search path.
4.5.2 MATLAB Mathematical Function
This is a vast collection of computational algorithms ranging from
elementary functions like sum, sine, cosine and complex arithmetic to
more sophisticated functions like matrix inverse, matrix eigenvalues,
Bessel functions and fast Fourier transforms.
4.5.3 MATLAB Language
This is a high-level matrix/array language with control flow
statements, functions, data structures, input/output and object-
oriented programming features. It allows both programming in the
small, to rapidly create quick throw-away programs, and
programming in the large, to create complete, large and complex
application programs.
4.5.4 GUI Construction
Matlab has extensive facilities for displaying vectors and
matrices as graphs, as well as annotating and printing these graphs.
It includes high-level functions for two-dimensional and three-
dimensional data visualization, image processing, and animation and
presentation graphics. It also includes low-level functions that allow
you to fully customize the appearance of graphics, as well as to build
complete graphical user interfaces for your Matlab applications.
4.5.5 MATLAB Application Interface
It is a library that allows you to write C and FORTRAN programs
that interact with Matlab. It includes facilities for calling routines from
Matlab, calling Matlab as a computational engine and for reading and
writing MAT-files.
4.6 MATLAB Working Environment
4.6.1 MATLAB Desktop
Matlab desktop is the main Matlab application window. The
desktop contains five sub windows, the command window, workspace
browser, current directory window, command history window, and one
or more figure windows which are shown only when the user displays
a graphic.
Figure 4.2 Representation of MATLAB Window
The command window is where the user types Matlab
commands and expressions at the prompt (>>) and where the output
of those commands is displayed. Matlab defines the workspace as the
set of variables that the user creates in a work session. The workspace
browser shows these variables and some information about them.
Double clicking on a variable in the workspace browser launches the
array editor, which can be used to obtain information and, in some
instances, edit certain properties of the variable.
The current directory tab above the workspace tab shows the
contents of the current directory, whose path is shown in the current
directory window. Clicking on the arrow in the current directory
window shows a list of recently used paths. Clicking on the button to
the right of the window allows the user to change the current
directory.
Assistive text and product label reading from hand-held objects for blind persons using
MATLAB
Department of ECE Page 38
MATLAB uses a search path to find M-files and other related
files, which are organized in directories in the computer file system.
Any file run in Matlab must reside in the current directory or in a
directory that is on the search path. By default, the files supplied with
MATLAB and MathWorks toolboxes are included in the search path.
The easiest way to see which directories are on the search path, or to
add to or modify the search path, is to select Set Path from the File
menu on the desktop and then use the Set Path dialog box. It is good
practice to add commonly used directories to the search path to avoid
repeatedly having to change the current directory.
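The search path can also be inspected and extended directly from the prompt; a short sketch (the directory name is a hypothetical example):

```matlab
% Sketch: inspecting and extending the search path from the prompt
path                        % display the current search path
addpath('C:\myproject');    % add a directory (illustrative path)
which edge                  % show which file on the path defines 'edge'
```

Changes made with addpath last only for the current session unless the path is saved from the Set Path dialog box.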
The command history window contains a record of the
commands a user has entered in the command window including both
current and previous MATLAB sessions. Previously entered MATLAB
commands can be selected and re-executed from the command history
window by right clicking on a command or sequence of commands.
4.6.2 Using MATLAB Editor To Create M-Files
The Matlab editor is both a text editor specialized for creating
M-files and a graphical Matlab debugger. The editor can appear in a
window by itself, or it can be a sub-window in the desktop. M-files are
denoted by the extension .m.
The Matlab editor window has numerous pull-down menus for
tasks such as saving, viewing, and debugging files. Because it
performs some simple checks and uses color to differentiate various
elements of code, this text editor is recommended as the tool of choice
for writing and editing M-functions.
To open the editor, type edit at the prompt. Typing edit filename
opens the M-file filename.m in an editor window, ready for editing. As
noted earlier, the file must be in the current directory or in a directory
on the search path.
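For instance, a simple M-function created in the editor and saved as addtwo.m (the function name is an illustrative example) would contain:

```matlab
function s = addtwo(a, b)
% ADDTWO  Return the sum of two inputs.
%   S = ADDTWO(A, B) adds A and B element-wise.
s = a + b;
```

Typing edit addtwo at the prompt reopens this file, and calling addtwo(2, 3) from the Command Window returns 5, provided the file is on the search path.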
4.6.3 Getting Help
The principal way to get help online is to use the Matlab Help
Browser, opened as a separate window either by clicking on the
question-mark symbol on the desktop toolbar or by typing helpbrowser
at the prompt in the Command Window. The Help Browser is a web
browser integrated into the Matlab desktop that displays Hypertext
Markup Language (HTML) documents. The Help Browser consists of
two panes: the Help Navigator pane, used to find information, and the
display pane, used to view the information. Self-explanatory tabs in
the Navigator pane are used to perform a search.
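Help is also available directly from the prompt; a few common commands (the function names queried are illustrative):

```matlab
% Ways of getting help from the command window (illustrative queries)
helpbrowser        % open the Help Browser as a separate window
help edge          % print short help text for the 'edge' function
doc edge           % open the full documentation page in the Help Browser
lookfor threshold  % search the first help line of all functions for a keyword
```

The help command is quickest for a known function name, while lookfor is useful when only a keyword is known.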
CHAPTER 5
RESULTS
When the program is executed, the running GUI appears on the
command window as shown in Fig. 5.1.
Figure 5.1 Running GUI
Once the GUI appears, a window is opened through which we select a
desired picture, as shown in Fig. 5.2.
Figure 5.2 Popup window for selecting picture
The Open option allows selection of a desired picture in formats such
as .jpg, .png, etc.
Figure 5.3 Browsing for a picture
After the desired picture is selected, it is displayed on the GUI axes as
shown in Fig. 5.4.
Figure 5.4 Loading of a Picture
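The selection-and-display step shown in Figs. 5.2–5.4 could be implemented along the following lines. This is a hedged sketch, not the project's actual code; the file-filter strings are illustrative:

```matlab
% Sketch of the picture-selection step (not the report's exact code)
[fname, fpath] = uigetfile({'*.jpg;*.png', 'Images (*.jpg, *.png)'}, ...
                           'Select a picture');   % pop-up file dialog
if ~isequal(fname, 0)                             % user did not press Cancel
    img = imread(fullfile(fpath, fname));         % load the chosen image
    imshow(img);                                  % display it on the current axes
    disp(fullfile(fpath, fname));                 % echo the full directory path
end
```

uigetfile returns 0 when the dialog is cancelled, which the guard above checks before attempting to read the file.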
The directory of the selected image is displayed in the edit box as shown in fig5.5
Figure 5.5 Directory of the picture
After the Close option is selected, "Finished finding picture" appears
on the command window.
Figure 5.6 Finished finding of picture from directory
The text extracted from the image is converted into .txt file format and
the result is displayed in a Notepad window as shown in fig 5.7
Figure 5.7 Converted .txt file and result in notepad
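Writing the extracted text to a .txt file and opening it in Notepad might be done as follows (a sketch with illustrative names; the string assigned to extracted stands in for the OCR output):

```matlab
% Sketch: saving extracted text to a .txt file and showing it in Notepad
extracted = 'MILK 500 ml';          % placeholder for the OCR result
fid = fopen('result.txt', 'w');     % create the output file ('result.txt' is illustrative)
fprintf(fid, '%s\n', extracted);    % write the text
fclose(fid);                        % always close the file handle
winopen('result.txt');              % open the file in Notepad (Windows only)
```

winopen launches the file with its associated Windows application, which for .txt files is normally Notepad.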
The previously selected image again appears in a figure window so it
can be compared with the text result, as shown in fig 5.8
Figure 5.8 Appeared picture and output text in notepad
The text extracted from the desired image is converted into speech as
shown in fig 5.9
Figure 5.9 Extracted text to speech
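One common way to produce speech from MATLAB on Windows is through the .NET System.Speech assembly; a hedged sketch (requires MATLAB's .NET support on Windows, and the extracted variable is an illustrative placeholder, not the project's actual code):

```matlab
% Sketch: text-to-speech on Windows via the .NET System.Speech assembly
NET.addAssembly('System.Speech');                     % load the .NET assembly
speaker = System.Speech.Synthesis.SpeechSynthesizer;  % create a synthesizer object
extracted = 'MILK 500 ml';                            % placeholder for the OCR result
Speak(speaker, extracted);                            % speak the recognized text aloud
```

The call blocks until the utterance finishes; SpeakAsync can be used instead when the GUI must remain responsive.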
5.1 ADVANTAGES
Advantages
It is a portable device.
Text with complex backgrounds can be recognized and extracted.
It will enhance independent living and social self-sufficiency for
blind people.
Automatic detection.
Accuracy and flexibility.
Applications
It is applicable for blind people to know about the products.
It is also applicable for them to read signage.
5.2 CONCLUSION AND FUTURESCOPE
Conclusion
In this project, we have described a simulation method to read
printed text on hand-held objects for assisting blind people. In order
to solve the common aiming problem for blind users, we have
proposed a region of interest method to detect the object, while the
blind user simply shakes the object for a couple of seconds. This
method can effectively distinguish the object of interest from
background or other objects in the camera view. To extract text
regions from complex backgrounds, we have proposed an edge based
text region extraction. The corresponding feature maps estimate the
global structural feature of text at every pixel. Off-the-shelf OCR is
used to perform word recognition on the localized text regions and
transform into audio output for blind users by using text to speech
synthesizer.
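The edge-based localization and OCR steps summarized above might be sketched as follows. This assumes the ocr function of the Computer Vision System Toolbox (available from R2014a) and an illustrative input file; it is not the project's exact pipeline:

```matlab
% Sketch of edge-based text localization followed by off-the-shelf OCR
img   = imread('label.jpg');                   % illustrative file name
gray  = rgb2gray(img);                         % work on intensity values
edges = edge(gray, 'canny');                   % edge map as a structural feature of text
region = imdilate(edges, strel('square', 5));  % merge nearby edges into candidate regions
stats  = regionprops(region, 'BoundingBox');   % bounding boxes of candidate text regions
results = ocr(img);                            % word recognition on the image
disp(results.Text);                            % recognized words, ready for speech output
```

The dilation radius and the choice of Canny edges are tuning decisions; the report's own feature maps are more elaborate than this single edge map.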
Future scope
We will also extend our algorithm to handle non-horizontal text
strings. Furthermore, we will address the significant human interface
issues associated with reading text by blind users. This will be done
by addressing the following limitations:
It is difficult to recognize text from images that are not flat
using this process.
It cannot handle non-horizontal text strings.
REFERENCES
[1]. C. Yi, Y. Tian, and A. Arditi, "Portable camera-based assistive
text and product label reading from hand-held objects for blind
persons," IEEE/ASME Trans. Mechatronics, vol. 19, no. 3, June
2014.
[2]. L. Ma, C. Wang, and B. Xiao, "Text detection in natural images
based on multi-scale edge detection and classification," in Proc.
Int. Congr. Image Signal Process., 2010, vol. 4, pp. 1961–1965.
[3]. C. Yi and Y. Tian, "Text string detection from natural scenes by
structure-based partition and grouping," IEEE Trans. Image
Process., vol. 20, no. 9, pp. 2594–2605, Sep. 2011.
[4]. X. Chen and A. L. Yuille, "Detecting and reading text in natural
scenes," in Proc. CVPR, vol. 2, pp. II-366–II-373, 2004.
[5]. X. Chen, J. Yang, J. Zhang, and A. Waibel, "Automatic detection
and recognition of signs from natural scenes," IEEE Trans.
Image Process., vol. 13, no. 1, pp. 87–99, 2004.
[6]. "KReader Mobile User Guide," knfb Reading Technology Inc.,
http://www.knfbReading.com
[7]. J. Liang, D. Doermann, and H. Li, "Camera-based analysis of
text and documents: A survey," Int. J. Document Anal.
Recognit. (IJDAR), no. 2–3, pp. 84–104, 2005.
[8]. N. Nikolaou and N. Papamarkos, "Color reduction for complex
document images," Int. J. Imaging Syst. Technol., vol. 19,
pp. 14–26, 2009.
[9]. ScanTalker, Bar-code scanning application to help the blind
identify over one million products,
http://www.freedomscientific.com/fs_news/PressRoom
/en/2006/ScanTalker2-Announcement_3-30-2006.asp