Optical Character Recognition (OCR) based Business Card
Reader on Android Operating System
Mohamed Salman Ismail Gadit
April 5, 2013
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
BACHELOR OF ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
Abstract
This project applies Optical Character Recognition (OCR) technology to the effective translation of business cards photographed under ordinary environmental noise. Its scope covers improving the current OCR engine, removing environmental noise through effective pre-processing, streamlining the usable information after conversion through effective post-processing, and combining these features into a user-friendly mobile application for the end user.
The report opens with a detailed project timeline listing the short-term and long-term goals on the road to completion, followed by a literature review that highlights the need for such a technology, the quantitative and qualitative improvements over competitor apps, and the justification for the choice of mobile platform. The principles and guidelines adhered to are highlighted along with the relevant work accomplished. The review concludes with a summary of the improvements currently possible over competing technologies.
The next part highlights contributions made during the course of project development. Short-term and long-term goals are discussed in detail, and the final vision for the product, consisting of both technical and directional goals, is presented and justified. The next few chapters cover individual contributions to the main application, along with the challenges faced and the solutions adopted to overcome them. The application's performance against competitor benchmarks is documented and explained, various development and design bottlenecks are listed, and improvements are suggested for future iterations. The report concludes with a section on future prospects for the app.
Acknowledgements
This Final Year Project could not have been taken on and completed in its current state without the help and
contribution of the following people, who have from time to time provided direction, critical advice, viable
alternatives and praise. I’m truly grateful for their assistance and contribution towards the culmination of this
final year project.
Our supervisor, Professor Ko Chi Chung - Prof Ko has played a pivotal role in the way this project has run its course. Right from the beginning, he gave us the imaginative freedom to take the resources we were given and apply them to any imaginable extent. Through each stage of assessment, he gave us his critical opinions and advice on the way we were crafting this mobile application and implementing the innovations mentioned in this paper. While giving us that freedom, Prof Ko also helped keep our ideas anchored in practicality by reminding us of constraints and suggesting ways to tackle them. His supervision of the project is truly appreciated.
My team mates - Arnab Ghosh, Aravindh Ravishankar, Varun Ganesh - Without these gentlemen, this project would not exist in anything like its present-day form. Right from day one, my team mates and I have worked together as a cohesive unit to come up with a wide array of viable, and sometimes ambitious, visions for the way this project would turn out. Bouncing ideas off each other, we were able to distill the best possible option given our practical constraints of time and resources. My team mates also helped motivate me during periods when my results weren't promising and my progress was slow. It has been my pleasure to work with these gentlemen, realizing a vision we shared from the start into the app that has resulted from the countless hours we have put into it.
National University of Singapore - My gratitude to the university for giving me the opportunity to work on
this project.
My examiner, Professor Lawrence Wong - While working towards a larger goal, it is often possible to lose focus on the smaller issues at hand. I value Prof Wong's critical advice during CA2, where he pointed out the aspects of the project that needed closer attention. His recommendations helped me take a closer look at the project and strengthen some aspects of the presentation as well.
To my family and friends - Who stood by me, encouraged me and brought me to the end of a wonderful
journey both in university, and in this project.
Thank you, one and all.
Contents
Abstract ii
Acknowledgements iii
List of Figures ix
List of Tables xii
List of Abbreviations xiii
Part I - Literature Review 1
1 Introduction 2
1.1 Need for OCR in business cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Need for mobile OCR business card readers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Need for offline mobile OCR business card readers . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Project goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Literature Review 5
2.1 Application Oriented OCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Tesseract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Tesseract Shortcomings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Cube Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Android . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.6 Image Processing Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.6.1 AForge.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.6.2 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6.3 MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6.4 Comparison of Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.7 Current Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.7.1 Google Goggles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.7.2 OCR Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.7.3 ABBYY Business Card Reader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Part II - Technical Details 18
3 Project Decision 19
3.1 Understanding the Tesseract OCR Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Windows WPF and Windows Phone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Moving to Android . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Application Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4 Image Processing 23
4.1 Brightness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Smoothing Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2.1 Homogeneous Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.2 Gaussian Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.3 Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2.4 Bilateral Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Combining Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 ScopeMate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5 Image Segmentation 31
5.1 Background Colour Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.1.1 Objectives of Background Colour Segmentation . . . . . . . . . . . . . . . . . . . . . . 32
5.1.2 Alternatives Explored . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.1.3 Background colour Segmentation Algorithm . . . . . . . . . . . . . . . . . . . . . . . 34
5.2 Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.2.1 Objectives of Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.2.2 Alternatives Explored . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2.3 Text Segmentation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.3 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3.1 Objectives of Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3.2 Future Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3.3 Clustering Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4 Region of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6 Optical Character Recognition in Android 48
6.1 Managing Images in Android . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.2 Integrating Tesseract with Android . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.3 Multi-threading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.4 Using the CUBE libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Part III - Results 56
7 Segmentation Results 57
7.1 Background Colour Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.1.1 Results of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.1.2 Future Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7.2 Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2.1 Results of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2.2 Future Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
8 Performance Results 62
8.1 Resolution vs. Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
8.2 Resolution vs. Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
9 App Results 67
9.1 NUS Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.1.1 Clear Card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
9.1.2 Unclear Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
9.2 External Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.2.1 Clear Light Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
9.2.2 Unclear Light Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
9.2.3 Clear Dark Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
9.2.4 Unclear Dark Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.2.5 Colored Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
9.3 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
10 Conclusion 77
List of Figures
1 OCR technology patent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Tesseract Working Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Importance of Adaptive Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4 Original Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5 Using only Tesseract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
6 Using Tesseract and Cube Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7 Android OS running "Jelly Bean" (version 4.2) . . . . . . . . . . . . . . . . . . . . . . . . . . 11
8 Google Goggles OCR Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
9 OCR Test app being used in continuous preview mode . . . . . . . . . . . . . . . . . . . . . . 16
10 ABBYY Business Card Reader Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
11 Android Native Development Kit [35] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
12 Scope application workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
13 Application of the Brightness Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
14 Application of the Homogeneous Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
15 1D Gaussian Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
16 Application of the Gaussian Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
17 Application of a Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
18 Application of a Bilateral Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
19 ScopeMate main screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
20 Image Segmentation Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
21 Background Segmentation Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
22 NUS card used for this example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
23 Histogram of NUS card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
24 Image masking applied to histogram bin from 112 to 127 . . . . . . . . . . . . . . . . . . . . . 36
25 Image masking applied to histogram bin from 240 to 255 . . . . . . . . . . . . . . . . . . . . . 37
26 Contour analysis applied to histogram bin from 112 to 127 . . . . . . . . . . . . . . . . . . . . 37
27 Contour analysis applied to histogram bin from 240 to 255 . . . . . . . . . . . . . . . . . . . . 38
28 Workflow for Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
29 Business card used to demonstrate Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . 40
30 After applying Median Blur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
31 After applying adaptive threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
32 After applying strong dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
33 After applying weak erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
34 Full contour analysis of card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
35 Text segmentation without clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
36 Clustering feedback loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
37 Clustering algorithm issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
38 Clustering algorithm flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
39 Text segmentation with clustering applied . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
40 Region of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
41 Bitmap handler - high resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
42 Bitmap handler - medium resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
43 Bitmap handler - low resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
44 Scope integrated with Tesseract and OpenCV libraries . . . . . . . . . . . . . . . . . . . . . . . 52
45 Single-threaded model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
46 Multi-threaded model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
47 Tesseract with OSD segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
48 Asynchronous loading of CUBE libraries in Scope . . . . . . . . . . . . . . . . . . . . . . . . 55
49 Effect of CUBE libraries: (a) Original Text (b) Only Tesseract (c) CUBE and Tesseract together 55
50 Working test case: Background Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
51 Boundary test case: Background Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 57
52 Failing test case: Background Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
53 Working test case: Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
54 Boundary test case: Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
55 Failing test case: Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
56 Card used for performing resolution testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
57 Graph of resolution testing for speed results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
58 Graph of resolution testing for accuracy results . . . . . . . . . . . . . . . . . . . . . . . . . . 65
59 Image used for NUS Clear Card test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
60 Image used for NUS Unclear Card test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
61 Image used for External Card with Light Background . . . . . . . . . . . . . . . . . . . . . . . 70
62 Image used for External Unclear Card with Light Background . . . . . . . . . . . . . . . . . . 71
63 Image used for External Clear Card with Dark Background . . . . . . . . . . . . . . . . . . . . 72
64 Image used for External Unclear Card with Dark Background . . . . . . . . . . . . . . . . . . . 73
65 Image used for External Card with Mixed Background . . . . . . . . . . . . . . . . . . . . . . 74
66 Graph of accuracy for NUS cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
67 Graph of accuracy for external cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
List of Tables
1 Relationship between pixel size and character accuracy . . . . . . . . . . . . . . . . . . . . . . 9
2 Effect of text segmentation on result accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3 Resolution vs. Time Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4 NUS Clear Card Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5 NUS Unclear Card Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6 External Card with Light Background Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7 External Unclear Card with Light Background Results . . . . . . . . . . . . . . . . . . . . . . 71
8 External Clear Card with Dark Background Results . . . . . . . . . . . . . . . . . . . . . . . . 72
9 External Unclear Card with Dark Background Results . . . . . . . . . . . . . . . . . . . . . . . 73
10 External Card with Mixed Background Results . . . . . . . . . . . . . . . . . . . . . . . . . . 74
List of Abbreviations
ABBYY - A Russian software company, headquartered in Moscow, that provides optical character recognition, document capture and language software for both PC and mobile devices
BMP - Bitmap Image File Format
GUI - Graphical User Interface
HP - Hewlett-Packard
ICR - Intelligent Character Recognition
JNI - Java Native Interface
NCut - Normalized Cut
NDK - Native Development Kit
OCR - Optical Character Recognition
OS - Operating System
OSD - Orientation and Script Detection
ROI - Region of Interest
SDK - Software Development Kit
TIFF - Tagged Image File Format
UTF-8 - Universal Character Set Transformation Format - 8-bit
WPF - Windows Presentation Foundation
Part I.
Literature Review
1 Introduction
It is the 21st century, and the human species is becoming increasingly dependent on the devices it has built. Transistor counts still follow the upward trend of Moore's law, and the processing power of devices has never been better [1].
Human-machine interaction has conventionally been thought of as a one-way input system in which the human understands the machine and gives it explicit commands. However, as human-machine interaction grows and new-age devices become ever more powerful, the need for devices to understand the human world in turn is becoming increasingly important. Optical Character Recognition (OCR) technology tries to bridge this gap by giving devices the power to understand characters and languages from the human world.
OCR technology was first used by libraries for historic newspaper digitization projects in the early 1990s.
An initial experiment at the British Library with the Burney collection and a cooperative project in Australia
(the ACDP) with the Ferguson collection were both considered unsuccessful largely due to the difficulties
with OCR technology and historic newspapers [2]. Fast-forward two decades, and OCR software accuracy has improved drastically; OCR is now used in a wide array of applications such as data entry, number-plate recognition, assisting the visually impaired, and importing information from business cards [3].
1.1 Need for OCR in business cards
As already mentioned, OCR software had initially been developed to digitize newspapers and library books.
The images of documents that were digitized by this process were obtained using commercial scanners, where
the light intensity distribution was uniform and the image consisted of standard black font styles printed on a
white background. However, once the number of people who exchanged business cards increased, there was an
increasing need to introduce OCR for business cards too. Chip Cutter, an editor at LinkedIn, writes that although the use of paper has drastically decreased in the current day and age, the convention of swapping business cards at the end of a conversation still prevails even among the tech-savvy attendees of TED 2013 (the Technology, Entertainment and Design global conference) [4].
He goes on to explain why people might still prefer swapping business cards over sharing contacts on their mobile phones, putting forward four points to support his argument: business cards are easy to use, quick to share, have a small learning curve and showcase the owner's creativity. However, one of the main problems with swapping business cards is maintainability and searchability. Business cards tend to wear out with time and are very difficult to search through at the time of need [5]. Thus, the problem is not that people want to move away from conventional business cards, but rather the need for a solution to maintain and manage all of their business cards over time. This signals a need to digitize business cards using OCR applications, making them more maintainable and easier to search through. The challenges of applying OCR to business cards, compared to its earlier use in reading documents, include: the light distribution is non-uniform because of the environment in which the photo of the card is captured, and the fonts, colours and arrangement of text do not follow a standard pattern across business cards.
1.2 Need for mobile OCR business card readers
Many OCR business card readers have been developed as standalone software solutions on the PC (Personal
Computer) [6]. Recently, students at the National University of Singapore, B.K. Ng and Jackson Yeow, have also developed an OCR system named SCORE and a business card reader, B.SCORE, on the PC platform [7]. B.SCORE was a follow-up to the initial SCORE platform and was coded in C#, using the Windows Visual Basic architecture.
However, in OCR applications on the PC, images have to be uploaded manually or captured directly using the PC's webcam, and the quality of OCR depends on the quality of the image captured [8]. A standard desktop webcam offers only around 1.3 megapixels, whereas an average smartphone camera offers 5 megapixels and upward. Therefore, the accuracy of OCR on images captured with smartphones should be better than on those captured with webcams.
Worldwide smartphone users have already topped 1 billion, and this trend is only set to increase in the near future [9]. Moreover, since most contacts are maintained on the phone, capturing a photo using a camera,
then applying OCR on a PC, and then transferring the recognized data back to the phone might be cumbersome
when compared to carrying out this entire process chain on the smartphone directly. This would also give the
user the ability to digitize business cards anywhere, as opposed to only in the presence of a personal computer.
1.3 Need for offline mobile OCR business card readers
Most current mobile OCR applications need internet access in order to perform character recognition. A detailed comparison of the feature sets available in current mobile OCR business card readers is presented in the literature review.
However, this constrains users to using their mobile OCR applications only in the presence of an active internet connection, thereby limiting the mobile aspect of the OCR application itself. Expecting users to be connected to the internet at all times when they might exchange business cards is unrealistic. With the increasing processing power of today's smartphones [10], however, the ability to realize the entire OCR process using just the mobile phone's processor can be a reality. Therefore, a robust, efficient, offline and mobile OCR business card application would be the solution to the business card digitizing needs of the current-day smartphone user.
1.4 Project goal
Therefore, the goal of this project is to develop a robust, efficient and offline mobile application that uses Optical
Character Recognition to automatically digitize business cards into maintainable and searchable mobile phone
contacts.
The team has planned to call this application Scope. In this project, the team's goal is to achieve a minimum of 90% accuracy for NUS cards, and 75% accuracy for any external cards.
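The report does not spell out at this point how accuracy is measured; a common character-level measure for OCR (an assumption here for illustration, not the team's stated metric) is based on the edit distance between the OCR output and the ground-truth text:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def char_accuracy(ocr_text: str, truth: str) -> float:
    """Character accuracy = 1 - (edit distance / ground-truth length)."""
    if not truth:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1.0 - edit_distance(ocr_text, truth) / len(truth))
```

Under this measure, an output with one wrong character in a five-character field scores 80%, so the 90%/75% targets tolerate only a handful of character errors per card.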
2 Literature Review
Optical Character Recognition, abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used as a form of data entry from an original paper data source, whether documents, sales receipts, mail, or any number of other printed records [11].
It is a common method of digitizing printed texts so that they can be electronically searched, stored more
compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech and
text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
Figure 1: OCR technology patent
The OCR technology was developed in the 1920s and remains an area of interest and concentrated research
to date. Systems for recognizing machine-printed text originated in the late 1950s and there has been widespread
use of OCR on desktop computers since the early 1990s [12].
OCR technology enables users to liberate large amounts of information held captive in hard-copy form. Once converted to electronic form, information can be edited and extracted easily according to users' needs.
2.1 Application Oriented OCR
As OCR technology is applied more and more widely in paper-intensive industries, it faces increasingly complex image environments in the real world: complicated backgrounds, degraded images, heavy noise, paper skew, picture distortion, low resolution, interference from grids and lines, and text images containing special fonts, symbols and glossary words. All of these factors affect the stability of OCR products' recognition accuracy.
In recent years, the major OCR technology providers have begun to develop dedicated OCR systems, each for a special type of image. They combine various optimization methods related to that image type, such as business rules, standard expressions, glossaries or dictionaries, and the rich information contained in colour images, to improve recognition accuracy.
Such a strategy of customizing OCR technology is called Application-Oriented OCR or "Customized OCR" [13], and is widely used in the fields of Business-card OCR, Invoice OCR, Screenshot OCR, ID card OCR, Driver-license OCR, Auto plant OCR, and so on. For the purposes of this project, the application, Scope, will make use of an open-source OCR engine, called Tesseract, that can be ported onto a mobile phone.
2.2 Tesseract
Tesseract is a free, open-source optical character recognition engine which can be used across various operating
systems. The engine was originally developed as proprietary software at Hewlett-Packard between 1985 and
1995, but was then released as open source in 2005 by Hewlett-Packard and the University of Nevada, Las Vegas [14]. Tesseract development has been sponsored by Google since 2006 [15].
The Tesseract algorithm is illustrated in Figure 2 [16]. A grayscale or colour image is loaded into the engine and processed. The program takes .tiff (TIFF) and .bmp (BMP) files, but plug-ins can be installed to allow processing of other image formats. As there is no rectification capability, the input image should ideally be a flat image from a scanner.
In the adaptive thresholding process, the engine reduces a grayscale image to a binary image. The algorithm assumes that there are foreground (black) pixels and background (white) pixels, and calculates the optimal threshold separating the two pixel classes such that the variance within each class is minimal.
Figure 2: Tesseract Working Flowchart
Figure 3: Importance of Adaptive Thresholding
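The thresholding just described behaves like Otsu's method. The following pure-Python sketch is a reconstruction of the idea for illustration, not Tesseract's actual code: it scans every candidate threshold and keeps the one with the smallest combined within-class variance.

```python
def otsu_threshold(pixels):
    """Pick the grayscale threshold (0-255) minimising within-class variance.

    `pixels` is any iterable of 0-255 intensities; values <= the returned
    threshold are treated as foreground (black), the rest as background.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    best_t, best_var = 0, float("inf")
    for t in range(255):
        w0 = sum(hist[: t + 1])          # weight of the foreground class
        w1 = total - w0                  # weight of the background class
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum(i * hist[i] for i in range(t + 1)) / w0
        mu1 = sum(i * hist[i] for i in range(t + 1, 256)) / w1
        var0 = sum(hist[i] * (i - mu0) ** 2 for i in range(t + 1)) / w0
        var1 = sum(hist[i] * (i - mu1) ** 2 for i in range(t + 1, 256)) / w1
        within = (w0 * var0 + w1 * var1) / total
        if within < best_var:
            best_var, best_t = within, t
    return best_t
```

On a card image with dark text on a light background, the intensity histogram is strongly bimodal, and the chosen threshold falls in the valley between the two modes.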
Following that, Tesseract searches through the image, identifies foreground pixels and marks them as potential characters. Lines of text are found by analysing the image spaces adjacent to the potential characters. For each line, the baselines are found, and Tesseract examines them to find the appropriate character height across the line. Characters that lie outside this height, or are not of uniform width, are reclassified to be processed in an alternate manner [17].
After finding all of the possible characters in the document, Tesseract performs word recognition word by word, on a line-by-line basis. Words are then passed through a contextual and syntactical analyser, which produces an editable .txt file in the tesseract folder - the folder where all the source code is located and where the main engine is run. In addition, Tesseract can be trained to recognize special characters beyond the standard alphabet. The Scope application needs Tesseract to recognize numbers and certain symbols such as @ and +.
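Besides full training, a lightweight way to steer Tesseract 3.x towards digits and symbols such as @ and + is the `tessedit_char_whitelist` variable in a config file passed at invocation. Whether Scope used this rather than retraining is not stated in this report; the fragment below is a sketch, and the exact character set is illustrative.

```
tessedit_char_whitelist 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@+.-
```

A whitelist restricts the recognizer's output alphabet, which reduces confusions like "O" vs "0" only when one of the pair is excluded, so it complements rather than replaces training.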
2.3 Tesseract Shortcomings
Despite the numerous contributions from many developers over the years, Tesseract's performance suffers from many shortcomings and constraints. OCR accuracy falls drastically if the processed image has a coloured background, and the design and layout of name cards and websites adversely affect its precision.
Another challenge faced by Tesseract is the text size in images. The Tesseract Frequently Asked Questions (FAQ) page [18] states that the noise-reduction mechanisms can and will hinder the processing of small text sizes. To achieve notable results, text should typically be around 20 pixels tall, and any text under 8 pixels will be treated as noise and filtered out.
Table 1 illustrates the relationship between pixels size and character accuracy.
Essentially, Tesseract is a raw, skeleton OCR engine with the core feature of text recognition. It does not come with any GUI, performs no page layout analysis, provides no output formatting and lacks additional features.
Image Dimension (pixels) Character Accuracy (%)
255x285 4.61
384x429 98.12
1024x1087 99.49
2048x2289 99.15
Table 1: Relationship between pixel size and character accuracy
2.4 Cube Libraries
The current OCR detection using Tesseract 3.02 simply translates the image to text, but does not take into account the relationships between the identified letters and word formations in order to provide an intelligent result. The Cube libraries, when used along with Tesseract, help improve the contrast between the words and the background in images and boost the performance of OCR recognition.
The key features of the Cube libraries include:
• Performing adaptive thresholding prior to OCR, to improve text contrast.
• Windowed segmentation to improve word recognition, by recognizing smaller pieces of the image first and stitching them together later.
• Comparing translated junk data against a dictionary database and the most frequently used words in a particular language, so as to retrieve data lost due to noise.
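The general idea behind the third feature, recovering noisy tokens against a dictionary, can be sketched as follows. This is a toy illustration only, not Cube's actual algorithm: it simply snaps a garbled OCR token to the nearest dictionary word by edit distance.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def correct(word, dictionary):
    """Replace a noisy OCR token with the closest dictionary word."""
    return min(dictionary, key=lambda w: edit_distance(word, w))

# Hypothetical dictionary of frequent business-card words
dictionary = ["manager", "engineer", "director", "consultant"]
fixed = correct("eng1neer", dictionary)
```

Here the garbled token "eng1neer" (a common OCR confusion of "i" with "1") is corrected to "engineer", since it is only one substitution away.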
A comparison of results using Tesseract alone and using Tesseract with the Cube libraries, without any pre-processing, is illustrated below.
Figure 4: Original Image
Figure 5: Using only Tesseract
Figure 6: Using Tesseract and Cube Libraries
2.5 Android
Android is a Linux-based operating system designed primarily for touchscreen mobile devices such as smart-
phones and tablet computers. It was initially developed by Android, Inc., which Google financially backed and later purchased in 2005 [19]. Android was unveiled in 2007 along with the founding of the Open Handset Alliance: a
consortium of hardware, software, and telecommunication companies devoted to advancing open standards for
mobile devices. The first Android-powered phone was sold in October 2008 [20].
Figure 7: Android OS running "Jelly Bean" (version 4.2)
Android is open source and Google releases the code under the Apache License. This open source code
and permissive licensing allows the software to be freely modified and distributed by device manufacturers,
wireless carriers and enthusiast developers. Additionally, Android has a large community of developers writing
applications ("apps") that extend the functionality of devices, written primarily in a customized version of the
Java programming language. In October 2012, there were approximately 700,000 apps available for Android,
and the estimated number of applications downloaded from Google Play, Android’s primary app store, was 25
billion[21].
These factors have allowed Android to become the world’s most widely used smartphone platform and
the software of choice for technology companies who require a low-cost, customizable, lightweight operating
system for high tech devices without developing one from scratch[22]. As a result, despite being primarily
designed for phones and tablets, it has seen additional applications on televisions, games consoles and other
electronics. Android’s open nature has further encouraged a large community of developers and enthusiasts to
use the open source code as a foundation for community-driven projects, which add new features for advanced
users or bring Android to devices which were officially released running other operating systems[23].
The open source nature of the Android platform, and thereby the culture of its community, makes Android a great choice for use with the Tesseract OCR engine. Using Android's Java Native Interface (JNI) and Native Development Kit (NDK), converting Tesseract's C++ code into managed Java code recognised by Android becomes a manageable task, and this allows the full functionality of Tesseract to be explored with the considerable added advantage of the Android mobile operating system. In addition, development and deployment of applications on Android, and consequently on the Google Play store, is a smooth process. Thus, taking these various factors into consideration, the author decided to specialise his project, Scope, for the Android platform.
2.6 Image Processing Libraries
2.6.1 AForge.NET
AForge.NET is a C# framework designed for developers and researchers in the fields of Image Processing, Computer Vision and Artificial Intelligence. AForge.Imaging, the biggest library of the framework, contains different image processing routines aimed at image enhancement and processing as required by various computer vision tasks.
The library consists of a wide array of filters to perform various colour correction, convolution, binarization
and thresholding operations. In addition to these, AForge also offers methods to perform edge detection and
feature extraction with Hough Transform analysis[24].
The functions are extensively documented and online help in the form of user forums and sample code
snippets is readily available. The libraries are constantly updated with new versions being frequently released.
2.6.2 OpenCV
OpenCV (Open Source Computer Vision Library) is an open source C/C++ library for image processing and
computer vision developed by Intel. It is a library of programming functions mainly aimed at real time image
processing. It is free for both commercial and non-commercial use[25].
OpenCV was originally written in C but now has a full C++ interface and all new development is in C++.
There is also a full Python interface to the library. Recently the OpenCV4Android SDK was developed and
released to enable using OpenCV functionality in Android applications.
OpenCV offers a comprehensive collection of image processing capabilities that surpasses that offered by
AForge. OpenCV in addition to supporting various image filters, transformations and thresholding mechanisms,
presents us with the ability to identify, compare and manipulate histograms in order to perform intelligent and
automated processing of images.
The most important feature of OpenCV is that it allows complex matrix operations to be performed on
images. This enables developers dealing with image processing to perform actions with greater understanding
and control over what is being done.
2.6.3 MATLAB
The Image Processing Toolbox that is included with MathWorks MATLAB provides a comprehensive set of
reference-standard algorithms, functions, and applications for image processing, analysis, visualization, and
algorithm development.
Operations that can be performed include image enhancement, image de-blurring, feature detection, noise
reduction, image segmentation, geometric transformations, and image registration. Many toolbox functions are
multithreaded to take advantage of multicore and multiprocessor computers[26].
MATLAB is a high-level scripting language meaning that it will take care of lower-level programming issues
such as declaring variables and performing memory management without the user having to worry about it. This
essentially makes MATLAB an easier language to get familiar with faster and allows the user to quickly piece
together a small amount of programming code to prototype an image processing operation [27].
2.6.4 Comparison of Libraries
AForge and MATLAB are more generic processing libraries that cater to a variety of requirements whereas
OpenCV was built with the main focus on image manipulation. Hence its code is highly optimised for this
purpose. It provides basic data structures for matrix operations and image processing and offers more extensive
functions when compared to the other two libraries.
Since AForge and MATLAB are built on C# and Java respectively, which are in turn built on C, they are higher-level languages. Though this means that memory management and other lower-level programming issues are taken care of, it also means the processor is kept busier interpreting the higher-level language, turning it into lower-level C code and finally executing that code [28].
OpenCV, however, is essentially a library of functions written in C, which means it is closer to providing machine-level code for the computer to execute. Ultimately, more of the computer's processing cycles are spent on image processing and fewer on interpretation. As a result, programs written in OpenCV run much faster than similar programs written in MATLAB or AForge.
Moreover, OpenCV is available for use on multiple mobile platforms such as iOS[29], Android and Win-
dows. This is a crucial factor in choosing the library as the development of the Scope application is intended
to be based on the Android platform. MATLAB does not at present provide an SDK for Android development, whereas AForge would require an external framework such as Mono for it to be ported onto Android [30].
Thus, given all these issues, it was decided that the Scope application would incorporate the OpenCV library for its image processing operations, primarily due to its speed and efficiency of processing and its cross-platform compatibility, which allows for easier expansion of the application in the future.
2.7 Current Alternatives
Such an implementation of OCR technology isn't the first to show up on the Android market; listed below are the closest competitors to the Scope application.
2.7.1 Google Goggles
Figure 8: Google Goggles OCR Scenario
Google dominates the Android market in general, and it's no different when it comes to OCR applications.
Google Goggles [31] is a visual search app that allows users to take pictures of items that they want to obtain more information about. Business cards are among the various types of input that Goggles accepts.
The application captures the text areas in the image of the business card and sends them to Google's OCR engine in the cloud. The parsed information is then pushed back to the phone, which recognises the information as a contact and shows the user relevant contextual menus.
One of the main drawbacks of the app is that it requires an active internet connection to carry out any OCR processing. This means that the results churned out by the application are subject to the delays and lags involved in sending the photo, processing it, compiling the results and pushing the results back over the user's cellular network service.
2.7.2 OCR Test
Figure 9: OCR Test app being used in continuous preview mode
OCR Test is an experimental app that attempts to harness the power of Tesseract and use it for OCR on Android [32]. This app runs the Tesseract engine on the user's device, without uploading images to a server, and is suitable for recognizing individual words or short phrases of text. The app also offers translation of the recognised text, powered by Google/Bing Translate.
The default single-shot capture runs OCR on a snapshot image that is captured when the user clicks the
shutter button, like a regular photo. The application also offers a continuous preview option while taking a
picture in which it shows a dynamic, real-time display of what the device is recognizing right beside the camera
viewfinder. An on-screen resizable viewfinder box allows the user to focus on one word or phrase at a time
and the recognised text is displayed in the top right corner of the window. The continuous preview mode works
best on a fast device.
While the application is a decent attempt at using the Tesseract engine to carry out the processing, its accuracy is thwarted by Tesseract's limitations (see Section 2.3, Tesseract Shortcomings), making the operation of this app a hit-or-miss affair, where the best results are produced only when ideal image conditions are met.
2.7.3 ABBYY Business Card Reader
Figure 10: ABBYY Business Card Reader Interface
By far the most competitive solution in the Android marketplace currently is the ABBYY Business Card Reader [33]. Developed by the Russian company ABBYY, the application's built-in optical character recognition allows the user to quickly receive precise results. The application also supports a database of 20 different languages, which is used to translate business cards in various languages to English. Although the application works using its built-in OCR engine, connecting to the network is required to authorise licenses and actually use the app. This adds a slight inconvenience for users who aren't connected through their phones [34].
Added functionalities, like searching for more information on social networks, also depend on network connectivity. The application, however, fails to recognise words accurately in a number of scenarios, for instance if the background is black and the text is white. Special symbols (@, #, $, etc.) are another issue. The application circumvents this problem by highlighting the characters it is unsure of and providing alternatives for the user to choose from.
Part II.
Technical Details
3 Project Decision
3.1 Understanding the Tesseract OCR Engine
The most important task to start with was to understand the heart of this entire project, the OCR engine developed by HP and now by Google Labs: Tesseract. It was voted one of the 13 best OCR engines in existence and is considered one of the most accurate free-software OCR engines currently available.
The most recent change is that Tesseract can now recognize 60 languages, is fully UTF8 capable, and is
fully trainable. This makes Tesseract an extremely powerful free tool to use in this project.
Tesseract 3.0.2 is the latest version of the engine, with a whole new set of supported languages and features that allow it to recognize text from any angle. With this, the team felt it was a good starting point for making this project as accurate as possible.
In Java, the Tesseract engine is encapsulated by the TessBaseAPI class. Some of its public methods and their functionalities are outlined below, so as to understand the public scope of the Tesseract engine in Java.
TessBaseAPI.init - Initializes the Tesseract engine with a specified language model
TessBaseAPI.getInitLanguagesAsString - Returns the languages string for last valid initialization
TessBaseAPI.clear - Frees up recognition results and any stored image data
TessBaseAPI.end - Closes down Tesseract and frees up all memory
TessBaseAPI.setPageSegMode - Sets the page segmentation mode
TessBaseAPI.setImage - Provides an image for Tesseract to recognize
TessBaseAPI.getUTF8Text - The recognized text is returned as a String as UTF8
3.2 Windows WPF and Windows Phone
The first step was to try building the Tesseract engine on Windows and replicating simple code that could allow images to be converted into text. The team planned on porting it to a complete Windows ecosystem, including Windows Phone and Windows 8. When working on the WPF (Windows Presentation Foundation) application, the results were good: the author used the .NET wrapper for Tesseract, created a simple project and managed to convert clear text into letters. However, there were memory issues at times, which indicated errors in the wrapper that was being used.
The next step was trying to port this onto Windows Phone. After building several libraries, the author discovered a fundamental flaw: Windows Phone does not accept code in C++ or C, only C#. Since Tesseract is written entirely in C++ and C, it could never be run on the then-current version of Windows Phone (7.5). However, Microsoft promised C and C++ support in Windows Phone 8, which was released to the public on 29th October 2012.
Therefore, the team would have to move to a platform that accepted native code and allowed the use of the Tesseract engine.
3.3 Moving to Android
Moving to iOS was out of the question: there was no way the team could get it to work with all the restrictions Apple Inc. imposes. In addition, the prerequisite of having a Mac to code for iOS made this decision all the easier. Thus the resultant move was to port to the Android platform.
The biggest challenge here was getting used to the Java programming language, which Android uses. In addition, since Tesseract is written in C++, the team would have to use one additional layer to expose the C++ code as Java managed code. This layer is the JNI, and the conversion is performed using the Android NDK.
As seen in figure 11, the NDK can convert the C++ code into a Java native library accessed through the JNI. This was essential for Tesseract. To make this work, the author also had to use Apache Ant, a Java build tool that can combine the JNI layer and the Java sources and build a final application compatible with Android and other Java projects.
Once this was ready, there was a working Tesseract engine library on Android. The remaining task was simply to make it work in the final application.
Figure 11: Android Native Development Kit [35]
3.4 Application Workflow
The workflow for the app is given in figure 12.
Upon receiving the photographed or uploaded image, the application performs an edge detection operation to identify the portion of the image occupied by the card. Following this, an automatic rotation is carried out to align the identified card with the horizontal plane. This is done in order to facilitate
effective text recognition. The cropped image of the card is then passed through functions which will segregate
the card in terms of differences in background colours for the case of cards containing coloured patterns. These
segments are then subjected to image processing filters which will clean the image by removing noise and
undesirable pixels or dust. The cleaned segments are further divided into smaller segments based on text on the
card which are grouped together in a process known as text segmentation. The various text segments are then
fed to the Tesseract OCR engine which will recognise and convert the segments to machine readable text. The
final step involves parsing through the text which has been retrieved from these segments and categorising it
into the various pieces of contact information such as name, address, E-mail, website, fax and phone number.
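The final categorisation step can be sketched with simple pattern matching. The snippet below is an illustrative Python toy (the real parser in the app is more involved, and the patterns and field names here are the editor's own assumptions): recognised lines are routed to contact fields by regular expressions.

```python
import re

def categorise(lines):
    """Assign each recognised text line to a contact-information field
    using simple patterns; unmatched lines fall through to 'other'."""
    result = {"email": [], "website": [], "phone": [], "other": []}
    for line in lines:
        if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", line):
            result["email"].append(line)
        elif re.search(r"(www\.|https?://)\S+", line):
            result["website"].append(line)
        elif re.search(r"\+?\d[\d\s()-]{6,}", line):
            result["phone"].append(line)
        else:
            result["other"].append(line)
    return result

# Hypothetical OCR output for one card
card_text = ["John Tan", "john.tan@example.com",
             "www.example.com", "+65 9123 4567"]
fields = categorise(card_text)
```

Note the order of the tests matters: an email address also contains a dot-separated domain, so it must be matched before the website pattern gets a chance to claim it.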
The author of this document is a member of the team of four that was involved in the development of this application. In this workflow, the work was divided equally amongst the four team-mates, to do advance
Figure 12: Scope application workflow
research and create the relevant algorithms to accomplish their respective sections. The author of this thesis took on the following tasks:
• Creation of layouts and managing integration within the Android App
• Background segmentation
• Text segmentation
• Segment clustering
• Image and memory management in app
• Integrating Tesseract with Android and optimizing performance
The rest of this report logically analyses and covers these various tasks in due detail. The final section of
the report also covers the results of this app after the entire workflow has been implemented and run together
for images.
4 Image Processing
The first task for the project was the introduction of image processing filters to the system. In a system where the accuracy of the result depends highly on the quality of the image and how clean it is, it is important to perform some pre-processing on the image before passing it into the Tesseract engine. To do this in a comprehensive manner, a set of image processing functions had to be compiled into a library simple enough to be accessed by the system and, later, by the automation algorithm. This task was undertaken by the author and a fellow team-mate, with the work divided equally between them. This section describes the filters implemented by the author and the rationale behind implementing each one of them.
4.1 Brightness
A general image processing operator is a function that takes one or more input images and produces an output image. In this kind of image processing transform, each output pixel's value depends only on the corresponding input pixel value (plus, potentially, some globally collected information or parameters) [36].
One commonly used point process is addition with a constant:
g(i, j) = f(i, j) + β (1)
In equation 1, β is the bias parameter, which controls the brightness of the image; g is the output image matrix, f is the input image matrix, and i and j refer to the row and column indices respectively.
Brightness adjustment is a basic function of any image processing class, and a well-brightened image helps enhance results; the author therefore chose this as one of his implementations.
Figure 13: Application of the Brightness Algorithm
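Equation 1 translates into very little code. The sketch below is an illustrative stand-alone Python version of the brightness operation (the app itself implements this in Java with OpenCV); the clamping to the valid 8-bit range is an implementation detail not shown in the equation.

```python
def adjust_brightness(image, beta):
    """Apply g(i, j) = f(i, j) + beta to every pixel, clamping the
    result to the valid 8-bit range 0..255.
    `image` is a list of rows, each row a list of grey values."""
    return [[max(0, min(255, p + beta)) for p in row] for row in image]

# A small 2x2 example; note the last pixel saturates at 255
card = [[100, 150], [200, 250]]
brighter = adjust_brightness(card, 40)
```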
4.2 Smoothing Filters
Smoothing, also called blurring, is a simple and frequently used image processing operation. There are many
reasons for smoothing. For Scope, the main purpose of smoothing is to reduce noise in a picture and thereby ensure its smoothness. This naturally leads to better results in OCR reading, due to the balance between varying pixels in the image.
To perform a smoothing operation we apply a filter to our image. The most common type of filter is linear, in which an output pixel's value is determined as a weighted sum of input pixel values:

g(i, j) = ∑_{k,l} f(i + k, j + l) h(k, l) (2)

In equation 2, h(k, l) is the filter kernel, which is nothing more than the coefficients of the filter, and i and j refer to the row and column indices of the output pixel.
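Equation 2 maps directly onto nested loops. The sketch below is an illustrative pure-Python version (the app uses OpenCV's optimised routines); the replicate-border policy for edge pixels is one common choice among several.

```python
def apply_linear_filter(image, kernel):
    """Compute g(i, j) = sum over (k, l) of f(i+k, j+l) * h(k, l).
    `image` is a list of rows of grey values; `kernel` is a square
    list of lists with odd side length. Out-of-range coordinates are
    clamped to the nearest edge pixel (replicate-border policy)."""
    rows, cols = len(image), len(image[0])
    r = len(kernel) // 2  # kernel radius
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = 0.0
            for k in range(-r, r + 1):
                for l in range(-r, r + 1):
                    ii = min(max(i + k, 0), rows - 1)
                    jj = min(max(j + l, 0), cols - 1)
                    acc += image[ii][jj] * kernel[k + r][l + r]
            row.append(acc)
        out.append(row)
    return out

# A 3x3 homogeneous (box) kernel: every coefficient is 1/9
box = [[1 / 9] * 3 for _ in range(3)]
flat = apply_linear_filter([[90] * 4] * 4, box)
```

A homogeneous blur is then just this routine with a uniform kernel, and a constant image passes through unchanged, a quick sanity check that the weights sum to one.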
4.2.1 Homogeneous Filters
This filter is the simplest of all. Each output pixel is the mean of its kernel neighbours (all of them contribute with equal weights) [37]. This results in a simple matrix kernel, which looks like:

κ = (1 / (κ_width · κ_height)) ·
| 1 … 1 |
| ⋮ ⋱ ⋮ |
| 1 … 1 | (3)
The result of a Homogeneous Filter looks as in figure 14.
Figure 14: Application of the Homogeneous Filter
4.2.2 Gaussian Filters
Gaussian filtering is done by convolving each point in the input array with a Gaussian kernel and then summing
them all to produce the output array. To understand what a Gaussian kernel is like, we can imagine a 1-D image,
and the kernel looks in a Gaussian Curve. This is referenced in figure 15
Figure 15: 1D Gaussian Kernel
The weight of a pixel's neighbours decreases as the spatial distance between them and the centre pixel increases [38]. This concept is the same for a Gaussian kernel in 2-D, representing a 2-D surface that peaks at the centre of the 2-D space. The equation for calculating a Gaussian kernel is therefore as per the normal distribution, equation 4.
G0(x, y) = A · exp( −(x − µx)² / (2σx²) − (y − µy)² / (2σy²) ) (4)

In equation 4, µ is the mean and σ² is the variance for each of the variables x and y. An applied Gaussian filter is shown in figure 16.
Figure 16: Application of the Gaussian Filter
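For a concrete picture of equation 4, the sketch below builds a small 2-D Gaussian kernel in Python (an illustration only; in practice OpenCV's built-in Gaussian blur would be used). The kernel is centred, so µx = µy = 0, and the constant A is chosen so the weights sum to 1.

```python
import math

def gaussian_kernel(size, sigma):
    """Build a size x size Gaussian kernel per equation 4, with the
    mean at the kernel centre and equal variance in x and y. The
    weights are normalised so they sum to 1."""
    r = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
               for x in range(-r, r + 1)]
              for y in range(-r, r + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

# The centre weight is the largest; weights fall off with distance
k = gaussian_kernel(5, 1.0)
```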
4.2.3 Median Filter
The median filter runs through each element of the signal (in this case the image) and replaces each pixel with
the median of its neighbouring pixels (located in a square neighbourhood around the evaluated pixel). Median
filtering is very widely used in digital image processing because, under certain conditions, it preserves edges
while removing noise [39]. The result of a Median filter is shown in figure 17.
Figure 17: Application of a Median Filter
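A minimal sketch of the median filter, again in illustrative pure Python rather than the OpenCV call the app would actually use: each pixel is replaced by the median of its square neighbourhood, with edge coordinates clamped.

```python
import statistics

def median_filter(image, radius=1):
    """Replace each pixel with the median of its square neighbourhood
    (side 2*radius + 1); out-of-range coordinates are clamped to the
    nearest edge pixel."""
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            neighbourhood = [
                image[min(max(i + k, 0), rows - 1)][min(max(j + l, 0), cols - 1)]
                for k in range(-radius, radius + 1)
                for l in range(-radius, radius + 1)
            ]
            row.append(statistics.median(neighbourhood))
        out.append(row)
    return out

# A single salt-noise pixel (255) in a flat region is removed entirely,
# while a mean filter would only have spread it around
noisy = [[50, 50, 50], [50, 255, 50], [50, 50, 50]]
clean = median_filter(noisy)
```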
4.2.4 Bilateral Filter
Sometimes filters not only dissolve the noise but also smooth away the edges. To avoid this (to a certain extent at least), we can use a bilateral filter. In an analogous way to the Gaussian filter, the bilateral filter also
considers the neighbouring pixels with weights assigned to each of them. These weights have two components,
the first of which is the same weighting used by the Gaussian filter. The second component takes into account
the difference in intensity between the neighbouring pixels and the evaluated one[40].
The basic idea underlying bilateral filtering is to do in the range of an image what traditional filters do in
its domain. Two pixels can be close to one another, that is, occupy nearby spatial location, or they can be
similar to one another, that is, have nearby values, possibly in a perceptually meaningful fashion. It replaces
the pixel value at x with an average of similar and nearby pixel values. In smooth regions, pixel values in a
small neighbourhood are similar to each other, and the bilateral filter acts essentially as a standard domain filter,
averaging away the small, weakly correlated differences between pixel values caused by noise. The result of a
Bilateral filter is shown in figure 18.
Figure 18: Application of a Bilateral Filter
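The two-component weighting described above can be made concrete. The following is an illustrative pure-Python bilateral filter (the app would rely on OpenCV's optimised version): each neighbour's weight is a spatial Gaussian multiplied by a range Gaussian over the intensity difference, so neighbours across a sharp edge contribute almost nothing.

```python
import math

def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_range=25.0):
    """Each output pixel is a weighted average of its neighbourhood.
    The weight of a neighbour combines a spatial (domain) Gaussian on
    distance with a range Gaussian on intensity difference, so sharp
    edges are preserved while flat regions are smoothed."""
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc, norm = 0.0, 0.0
            for k in range(-radius, radius + 1):
                for l in range(-radius, radius + 1):
                    ii = min(max(i + k, 0), rows - 1)
                    jj = min(max(j + l, 0), cols - 1)
                    w_space = math.exp(-(k * k + l * l) / (2 * sigma_space ** 2))
                    diff = image[ii][jj] - image[i][j]
                    w_range = math.exp(-(diff * diff) / (2 * sigma_range ** 2))
                    w = w_space * w_range
                    acc += w * image[ii][jj]
                    norm += w
            row.append(acc / norm)
        out.append(row)
    return out

# A hard edge between 0 and 200 survives filtering almost untouched,
# because the cross-edge range weights are vanishingly small
edge = [[0, 0, 200, 200]] * 3
smoothed = bilateral_filter(edge, sigma_range=10.0)
```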
4.3 Combining Filters
While the author was in the process of developing the filters mentioned in the previous section, a fellow team-mate was concurrently implementing the second half of the image processing filters for the team's self-created image processing library. The other functions implemented are listed as follows:
• Contrast
• Greyscale
• Histogram Equaliser
• Morphing filters
• Pyramids
• Thresholding Filter
The team agreed that this large collection of filters should serve their initial tests well enough. There was
now a need to find a way to test all functions independently and together.
4.4 ScopeMate
ScopeMate was a separate app developed for the purpose of testing the functionality of Scope. It was developed so that image processing filters could be dynamically and immediately tested and the OCR results retrieved. The author was solely responsible for creating this test app and combining the image processing library with the Tesseract engine. In addition, since the order in which the filters are applied also affects the final result, ScopeMate includes an internal mechanism that records the order in which the tester applies the filters. This order and the associated parameter values are recorded and applied to an image, showing the visual result as well as the results from the testing process.
Figure 19: ScopeMate main screen
ScopeMate was primarily used by the team's tester (a fellow team-mate), and the results calibrated with it are explained in that team-mate's report.
5 Image Segmentation
Image segmentation is the process of subdividing a digital image into multiple segments or sub-images. The
goal of segmentation is to streamline the representation of an image into something that is more meaningful and
easier to analyse. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in
images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such
that pixels with the same label share certain visual characteristics [41].
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of
contours extracted from the image. Each of the pixels in a region is similar with respect to some characteristic
or calculated property, such as colour, intensity, or the presence of some common trait[42].
In this project, the author performed two types of image segmentation:
• Background Colour Segmentation
• Text Segmentation
The relevance, objectives, implementation and results of the above methods of segmentation are given in the
sections that follow. It must be noted that, due to device resource constraints, no existing algorithms were used to perform these operations; they were a complete research effort from the ground up. The workflow for the image segmentation used in the app is given in figure 20.
Figure 20: Image Segmentation Workflow
5.1 Background Colour Segmentation
Background colour segmentation is the process of segmenting an image based on the trait of background colour. In business cards there can be many cases of varying background colours, and it is important to segment by background colour before performing text segmentation, since each background region usually contains a certain number of text segments. In this section, the author will attempt to justify the necessity for background segmentation, describe the algorithms involved, provide a thorough analysis of the results and explore future uses of and improvements on the idea.
5.1.1 Objectives of Background Colour Segmentation
The Tesseract OCR engine "learns" an entire image before it performs recognition. For this purpose, the more concentrated the image is, in terms of its features, the better the recognition process. One objective of background segmentation is to provide this concentration to the application: by dividing the card by background colour, each segment offers a more focused scope of learning for the OCR engine.
The second objective of background colour segmentation is to separate the business card into its differently
coloured backgrounds, because the adaptive thresholding mechanism used for cards with dark backgrounds is different from that used for light-coloured backgrounds. The preliminary identification of the different parts
of the card is performed by the background colour segmentation algorithm. By splitting the card into its dif-
ferent background colours, the strategy is to create a stronger, more customised adaptive thresholding for each
background colour, and thereby create more accurate results.
5.1.2 Alternatives Explored
Histogram Based: Histogram based background segmentation involves the utilisation of an image histogram
to detect the primary intensities of the image. Through contour analysis combined with the results of the histogram, the program is able to make a reasonable guess as to the nature of the backgrounds in the card. Text
is usually a very small representation on the histogram, so by eliminating all histogram peaks that fall below
a set threshold value, the program is able to identify the types of backgrounds, and by using contour analysis
and image masking, accurately point out the backgrounds of the image. The problem with using a mere histogram-based approach is that it fails for backgrounds with gradients [43]. Gradients occupy a large number of peaks in the histogram even though they may technically belong to a single background. To overcome this, the method has to be combined with Canny edge detection.
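The peak-elimination idea can be sketched as follows. This is an illustrative Python toy (the real implementation works on OpenCV histograms of the full card image, and the threshold fraction here is an assumed value): intensities whose pixel counts fall below a threshold share of the image are discarded as text or noise, leaving the dominant background intensities.

```python
def dominant_backgrounds(pixels, min_fraction=0.1):
    """Return the grey levels whose share of the image exceeds
    `min_fraction`. Small populations (text, noise) fall below the
    threshold and are eliminated, leaving candidate background
    colours for masking."""
    hist = {}
    for p in pixels:
        hist[p] = hist.get(p, 0) + 1
    total = len(pixels)
    return sorted(v for v, count in hist.items()
                  if count / total >= min_fraction)

# 60% white background, 30% grey panel, 10% dark text pixels:
# the text population is filtered out, the two backgrounds survive
card = [255] * 60 + [128] * 30 + [10] * 10
backgrounds = dominant_backgrounds(card, min_fraction=0.2)
```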
Histogram and Canny Detection Based: The Canny Detector takes as input a grayscale image, and produces
as output an image showing the positions of tracked intensity discontinuities. The Canny algorithm performs
various operations. Firstly, since the image is susceptible to noise present in image data, a filter is used where the
raw image is convolved with a Gaussian filter. The result is a slight blurred version of the original which is not
affected by a single noisy pixel to any significant degree. The edge detection operator then returns a value for the
first derivative in the horizontal (Gx) and vertical (Gy) directions. From this the edge gradient magnitude and
direction can be found. The direction is rounded to one of four angles (0, 45, 90 and 135 degrees) representing
vertical, horizontal and diagonal directions. Given estimates of the image gradients, a search is then carried out
to determine if the gradient magnitude assumes a local maximum in the gradient direction, a process known as
non-maximal suppression[44]. The tracking process exhibits hysteresis controlled by two thresholds: T1 and
T2, with T1 greater than T2. Tracking can only begin at a point on a ridge higher than T1. Tracking then
continues in both directions out from that point until the height of the ridge falls below T2. This hysteresis helps
to ensure that noisy edges are not broken up into multiple edge fragments. Thus the Canny Detector is great
for identifying squares with gradient shading in the image. By using this Canny detector, we are able to find
edges of the card with similar gradients. This helps split the image into various edges. By applying histogram
analysis to each bounded contour, we are able to identify that contour's primary background colour and apply
our masking technique as before to accurately identify the different segments in the card. By this method, the
combination of Canny detection and histograms yields a more accurate result. Unfortunately, this is a more
expensive and time consuming process primarily because of the addition of Canny detection. As the mobile
phone is a limited resource environment, usage of this technique must be carefully justified. The concepts for
both the above algorithms were developed by the author of this thesis.
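The two-threshold tracking step can be illustrated with a minimal 1-D sketch (an assumed simplification for illustration only; the actual detector works on 2-D gradient ridges, and the class and method names here are hypothetical):

```java
// Sketch of the two-threshold hysteresis step used by the Canny tracker.
// Illustrative 1-D version: tracking starts at any sample whose gradient
// magnitude exceeds T1 and extends in both directions until the magnitude
// falls below T2, so weak-but-connected edge samples are kept.
public class HysteresisSketch {
    public static boolean[] track(double[] magnitude, double t1, double t2) {
        boolean[] edge = new boolean[magnitude.length];
        for (int i = 0; i < magnitude.length; i++) {
            if (magnitude[i] > t1 && !edge[i]) {
                // Seed found: grow left and right while above the low threshold.
                int left = i;
                while (left >= 0 && magnitude[left] > t2) { edge[left] = true; left--; }
                int right = i + 1;
                while (right < magnitude.length && magnitude[right] > t2) { edge[right] = true; right++; }
            }
        }
        return edge;
    }
}
```

Note how the sample with magnitude 5 below survives only because it is connected to a strong seed, which is exactly what prevents noisy edges from fragmenting.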
Background Removal: The third option for this section would be to remove the background altogether. This is done by detecting the backgrounds using the methods above and applying a reverse mask, so as to preserve the foreground and hide the background. Though this would work quite well, it turns out to be a problem for cards with darker backgrounds, as the text left behind is usually white or close to white, and it would take yet another process to fill in the text itself. Again, even though this is possible, the expensive nature of the entire process must be considered in a limited resource environment like the Android OS for smartphones, which suggests that this method may not be suitable.
5.1.3 Background Colour Segmentation Algorithm
The author decided to choose histogram based background segmentation. The rationale was that the phone has limited resources, and this was still the pre-processing stage of the application. Adaptive thresholding and text segmentation were still to come before the OCR engine was activated, and all the following processes would need more memory and processing power, especially the Tesseract OCR engine. Accordingly, the background segmentation method that consumed the fewest resources would be the most suitable. In addition, the author hypothesised that there are not very many business cards with gradient backgrounds. Professional organisations usually do not use linear gradients in business cards, as this gives off an air of unprofessionalism, and under this assumption the histogram based background segmentation method makes the most sense. A workflow for this algorithm is given in figure 21.
Figure 21: Background Segmentation Workflow
To highlight the algorithm, the author is using the example of the NUS card given in figure 22.
The first step in the process is grayscaling. With the help of our custom built library on top of OpenCV, this
is made very easy and results in a grayed out version of figure 22. The next step is harder, requiring a histogram
analysis. To calculate the histogram, we use the OpenCV library again.
Histograms are collected counts of data organised into a set of predefined bins. Since it is known that the range of intensity values in this case spans 256 values, the range can be segmented into subparts (called bins). This
Figure 22: NUS card used for this example
is illustrated in equation 5.
[0, 255] = [0, 15] ∪ [16, 31] ∪ ... ∪ [240, 255]
range = bin_1 ∪ bin_2 ∪ ... ∪ bin_16
(5)
The number of pixels that fall in the range of each particular bin is kept count of. Applying this to the NUS card, the histogram analysis yields the histogram shown in figure 23.
The histogram analysis tells the user that there are two primary peaks in the NUS card. To be safe, it is run through an auto analysis algorithm, where the value of the tallest peak is taken and any peak above 30% of that value is considered valid. Everything else is removed. From figure 23, the peaks at values 240-255 and 112-127 are successfully selected by this analysis. The author has labelled this process as background detection.
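The binning and the 30% peak-selection rule described above can be sketched in plain Java (class and method names are illustrative, not from the project's source; the project itself computes the histogram with OpenCV):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "background detection" step: build a 16-bin intensity
// histogram and keep every bin whose count is at least 30% of the tallest
// bin's count.
public class BackgroundDetectionSketch {
    public static int[] histogram(int[] grayPixels) {
        int[] bins = new int[16];               // bin k covers [16k, 16k + 15]
        for (int p : grayPixels) bins[p / 16]++;
        return bins;
    }

    public static List<Integer> validPeaks(int[] bins) {
        int tallest = 0;
        for (int b : bins) tallest = Math.max(tallest, b);
        List<Integer> peaks = new ArrayList<>();
        for (int k = 0; k < bins.length; k++) {
            if (bins[k] >= 0.30 * tallest) peaks.add(k);   // the 30% rule
        }
        return peaks;
    }
}
```

For the NUS card example, pixels around 120 fall into bin 7 (values 112-127) and pixels around 250 into bin 15 (values 240-255), matching the two peaks selected in the text.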
The next step in the process is to apply image masking. The idea of masking is that each pixel's value in an image is recalculated according to a mask matrix (also known as a kernel). This mask holds values that adjust how much influence neighbouring pixels (and the current pixel) have on the new pixel value. In this algorithm, the author has applied a mask for each bin range, that is, a separate mask from 112 to 127 and another from 240 to 255. After masking, the results are given in figures 24 and 25 respectively.
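The per-bin masking step can be sketched as a simple intensity range test (a pure-Java stand-in for the OpenCV masking actually used in the project; names are illustrative):

```java
// Sketch of masking an image against one histogram bin range: pixels whose
// gray value falls inside [lo, hi] are kept white (255), everything else is
// set to black (0), leaving only the candidate background region visible.
public class RangeMaskSketch {
    public static int[] mask(int[] grayPixels, int lo, int hi) {
        int[] out = new int[grayPixels.length];
        for (int i = 0; i < grayPixels.length; i++) {
            out[i] = (grayPixels[i] >= lo && grayPixels[i] <= hi) ? 255 : 0;
        }
        return out;
    }
}
```

Applying this once per selected bin (112-127, then 240-255) produces one binary image per candidate background, ready for the contour analysis described next.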
Following masking, each masked image undergoes a contour analysis. The purpose of this is simple: after
Figure 23: Histogram of NUS card
Figure 24: Image masking applied to histogram bin from 112 to 127
Figure 25: Image masking applied to histogram bin from 240 to 255
masking, we have removed all the unnecessary parts of the image and converted them to black. The largest remaining white area will now be the desired background. To identify it, a contour analysis provides a set of contours, and the largest contour will be the large white background area that is visually recognisable after the masking. The contour analysis for both images is given in figures 26 and 27. With this, it is clearly evident that the masking has definitely helped identify the correct backgrounds from the card.
Figure 26: Contour analysis applied to histogram bin from 112 to 127
Figure 27: Contour analysis applied to histogram bin from 240 to 255
With this, the correct background colours and segments have been identified. They can now be separated into sub images using OpenCV's Region of Interest functionality. More about this is given in the section Region of Interest below. The results and testing of the algorithm are covered in Part III of this thesis.
5.2 Text Segmentation
Text segmentation is the second kind of image segmentation that has been performed in this project. Text
segmentation is the method of locating, identifying, and separating the different closely located text areas in an
image. Each segment is then separately analysed by the OCR engine for maximal results.
5.2.1 Objectives of Text Segmentation
The main objective of text segmentation is to provide a simple, concentrated view of just text for the OCR
engine to analyse. It identifies the different parts of the cards containing text and, using the clustering algorithm
explained in the next section, compiles together the different segments in a visually understandable format. As
the OCR engine learns the image before processing results, a smaller area with just text has a higher chance
of success. Text segmentation also acts as an excellent follow up to adaptive thresholding. For some cards, adaptive thresholding can leave unwarranted pixels around the thresholded image. Text segmentation boxes in a very concentrated area, leaving these dirty pixels out of the final OCR input and therefore rendering a more accurate result.
5.2.2 Alternatives Explored
Erosion-Clustering: The erosion-clustering method of text segmentation applies erosion based pre-processing
techniques to the image to blur together similar neighbouring regions. The idea is to blur the text areas together closely enough that they can be identified clearly in a condition based contour analysis. This approach is inexpensive
and works quite well. However, too many dirty pixels can cause a mistaken assessment of the image and the
text segments identified could be wrong. Erosion-clustering was developed by the author of this thesis.
Spectral Clustering: Spectral clustering treats the image like a spectrum, representing regions of the image as a matrix and using vector theory. By constructing a weighted graph and an affinity matrix, the spectral clustering method computes the diagonal of the matrix and combines it with the eigenvectors of the same to create a new spectral vector, which is then used to partition the image [45]. Though this method is powerful, it is still fairly fresh and has not yet seen wide adoption. Spectral clustering was developed by students at the Harbin Institute of Technology in China.
Normalized Cut Segmentation: There are also methods of segmentation using pure thresholding, by analysing what threshold level is required for text segmentation. NCut, or Normalized Cut segmentation, derives a normalised threshold value for text by treating image segmentation as a graph partitioning problem; the resulting threshold can then be used, much like Otsu's threshold, over an image to separate the text[46]. The problem is that it does not work very well for images with coloured text and coloured backgrounds, and seeing that most business cards are in colour, this alternative could certainly be a problem for this project.
5.2.3 Text Segmentation Algorithm
The chosen method for performing text segmentation was erosion-clustering, which was developed by the author
of the thesis. Besides the innovative appeal, erosion-clustering is also inexpensive in this project as it uses many
readily available OpenCV methods to enhance the image well enough to produce a reasonable result for text
segmentation. The workflow used in this text segmentation method by the author is given in figure 28.
Figure 28: Workflow for Text Segmentation
To explain the text segmentation algorithm, the image in figure 29 can be used. Notice that, contrary to the previous section on background segmentation, the author is now using an external card to show that the method works in all scenarios, not just on NUS cards. At a glance, the visual impression is that this card has 6 text segments. If working correctly, this algorithm, together with the clustering algorithm, will yield exactly that.
Figure 29: Business card used to demonstrate Text Segmentation
The first step is to grayscale the image. This is done using OpenCV's grayscaling function, which provides a greyed out version of the image in figure 29. Following that, a median blur is applied over the entire card to blur out any dirt or dust particles that appear in the image. The median blur has been explained in the image processing section above. The image after the median blur is given in figure 30.
Next, a general adaptive threshold is applied over the whole image, so that the whites of the image are inverted against the outlines of the text. It also helps further clear any dirt pixels from the image. The image now looks as in figure 31.
Figure 30: After applying Median Blur
Figure 31: After applying adaptive threshold
Following this, the card is run through a strong dilate function. The dilation smears the image so heavily that the text is no longer visually readable. As the goal of the whole process is to obtain blurred lumps of what was text, this step is the most useful in the process so far.
Figure 32: After applying strong dilation
The next step is to apply a relatively weak erosion with a large kernel. The reason for doing this is to create spaces in images that have been over dilated by the previous process. It does not affect images that have dilated correctly, as this one has; thus the result in figure 33 is not very different from the previous image. In some other cards it plays a bigger role and can be the differentiating factor for perfect text segmentation.
Figure 33: After applying weak erosion
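The dilate and erode steps can be sketched on a tiny binary image (a conceptual, pure-Java stand-in for OpenCV's dilate and erode functions used in the project; a 3x3 kernel is assumed):

```java
// Sketch of morphological dilation and erosion on a binary image
// (1 = text pixel). Dilation with a 3x3 kernel smears nearby text strokes
// into one lump; a following erosion re-opens gaps where dilation overshot.
public class MorphologySketch {
    public static int[][] dilate(int[][] img) { return apply(img, true); }
    public static int[][] erode(int[][] img)  { return apply(img, false); }

    private static int[][] apply(int[][] img, boolean dilate) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Take the max (dilate) or min (erode) over the 3x3 neighbourhood.
                int hit = dilate ? 0 : 1;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy, nx = x + dx;
                        int v = (ny >= 0 && ny < h && nx >= 0 && nx < w) ? img[ny][nx] : 0;
                        hit = dilate ? Math.max(hit, v) : Math.min(hit, v);
                    }
                }
                out[y][x] = hit;
            }
        }
        return out;
    }
}
```

A single text pixel dilates into a 3x3 lump, and a following erosion shrinks it back, mirroring the strong-dilate-then-weak-erode sequence above.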
Following this step, the image is run through a contour analysis. All contours found are set to be bounded
by a rectangle. This roughly generates many contours that overlap with one another, as seen in figure 34.
Figure 34: Full contour analysis of card
The next step is to clean the image. A cleansing process is applied throughout the image to remove contours that may span the entire card and contours that may be dust particles. This is done by dynamically checking the largest contours available and creating thresholds for them against the total area of the image. By applying the cleaner and also removing any contours nested within other contours, the image generated is as per figure 35.
Figure 35: Text segmentation without clustering
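The cleaning pass can be sketched as an area filter over the bounding rectangles (the 90% and 0.1% cut-offs below are illustrative guesses, not the project's actual dynamically derived thresholds):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cleaning pass: given bounding rectangles from the contour
// analysis, drop those that are nearly as large as the whole card (the
// outer border) and those that are tiny relative to it (dust particles).
public class ContourCleanerSketch {
    public static class Box {
        public final int x, y, w, h;
        public Box(int x, int y, int w, int h) { this.x = x; this.y = y; this.w = w; this.h = h; }
        int area() { return w * h; }
    }

    public static List<Box> clean(List<Box> boxes, int imageW, int imageH) {
        double total = (double) imageW * imageH;
        List<Box> kept = new ArrayList<>();
        for (Box b : boxes) {
            double frac = b.area() / total;
            if (frac < 0.90 && frac > 0.001) kept.add(b);   // not the card, not dust
        }
        return kept;
    }
}
```

On a 1000x600 card, a full-card box and a 2x2 speck are both discarded, while a plausible text block survives.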
The text segments have now been identified and the text segmentation process is done. However, there are 6 visual segments, and by looking at figure 35 there are certainly more than 6 identified segments. The next section of this report deals with the algorithm used to cluster neighbouring text segments together.
5.3 Clustering
Clustering is the process of radially collecting information about the closest objects and merging them together. In this project, clustering is directly relevant to text segmentation. The clustering algorithm developed by the author of this thesis completes the text segmentation process and brings it to the intended results.
5.3.1 Objectives of Clustering
The clustering algorithm comes into play in this context in the scenario shown in figure 36. Clustering behaves as
Figure 36: Clustering feedback loop
a feedback loop. This means that if any segments overlap, they are sent through another pass of the clustering algorithm to be merged together. This is discussed further in the improvements section below.
5.3.2 Future Improvements
Future improvements to the clustering algorithm could be to use a recursive check. This will mean that instead
of a feedback loop, a single recursive loop checks for predicted overlaps and clusters the relevant portions
together.
Figure 37: Clustering algorithm issue
Another issue in the current clustering algorithm is illustrated in figure 37. In that figure, the blue border represents the edges of the image. Assume A, B and C are actual text regions, while D is a logo. Due to the clustering algorithm, even though A, B and C segment successfully, D overlaps the new segmentation, and by virtue of the feedback loop, D too will now be clustered with A, B and C. This means almost the entire card would be sent back in without any segmentation. This issue still prevails in the existing algorithm and should be fixed in the near future. A viable solution would be to identify that D is a logo and prevent clustering when that identification succeeds, but implementing this takes more time and effort than the time span of this project allows.
5.3.3 Clustering Algorithm
The clustering algorithm works on a distance check mechanism and populates a list of queues according to which cluster each rectangle belongs to. For all the corners of every rectangle in the image, the distances to the other rectangles are taken and compared. If a distance is shorter than a specified minimum distance, the clustering algorithm adds the rectangle to the relevant clustering queue. The flowchart given in figure 38 describes the workings of this algorithm.
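The corner-distance check and queue population described above can be sketched as follows (a simplified, illustrative version; the project's actual data structures and threshold handling may differ):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of the corner-distance clustering pass: rectangles whose corners
// lie closer than a minimum distance end up in the same cluster queue.
public class ClusteringSketch {
    // Rectangle as {x, y, w, h}; returns its four corners as {x, y} pairs.
    static int[][] corners(int[] r) {
        return new int[][]{{r[0], r[1]}, {r[0] + r[2], r[1]},
                           {r[0], r[1] + r[3]}, {r[0] + r[2], r[1] + r[3]}};
    }

    static boolean close(int[] a, int[] b, double minDist) {
        for (int[] ca : corners(a))
            for (int[] cb : corners(b))
                if (Math.hypot(ca[0] - cb[0], ca[1] - cb[1]) < minDist) return true;
        return false;
    }

    public static List<Queue<int[]>> cluster(List<int[]> rects, double minDist) {
        List<Queue<int[]>> clusters = new ArrayList<>();
        int[] owner = new int[rects.size()];
        java.util.Arrays.fill(owner, -1);
        for (int i = 0; i < rects.size(); i++) {
            if (owner[i] == -1) {                  // start a new cluster queue
                Queue<int[]> q = new ArrayDeque<>();
                q.add(rects.get(i));
                owner[i] = clusters.size();
                clusters.add(q);
            }
            for (int j = i + 1; j < rects.size(); j++) {
                if (owner[j] == -1 && close(rects.get(i), rects.get(j), minDist)) {
                    owner[j] = owner[i];           // pull j into i's queue
                    clusters.get(owner[i]).add(rects.get(j));
                }
            }
        }
        return clusters;
    }
}
```

Two rectangles whose facing corners are 2 pixels apart merge under a minimum distance of 5, while a distant third rectangle starts its own cluster queue.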
When clustering is applied to the card given in figure 35, the text segmentation can be successfully completed. After applying the clustering algorithm, the result is as in figure 39. This yields the 6 segments that were predicted visually, which can now be separated into sub images using the Region of Interest.
Figure 38: Clustering algorithm flowchart
Figure 39: Text segmentation with clustering applied
5.4 Region of Interest
A region of interest (ROI) is a sub-matrix that is extracted from within a matrix. In segmentation this is extremely important, as the card is being split into segments. Almost all OpenCV functions support working with an ROI and operate only on the selected image area, which is useful for speeding up the algorithms. Thus, if only a specific area is needed, it can be extracted and worked with without affecting the whole image.
Figure 40: Region of Interest
To use ROI in Android with OpenCV, the following simple code snippet can be used.
// Create region of interest and save as a separate bitmap
Mat cropped = performCrop(x, y, width, height, sourceImageMat);
destImage = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
Utils.matToBitmap(cropped.clone(), destImage);
6 Optical Character Recognition in Android
6.1 Managing Images in Android
There are a number of reasons why loading bitmaps in Android applications is tricky. Some of them are:
• Mobile devices typically have constrained system resources. Android devices can have as little as 16MB
of memory available to a single application.
• Bitmaps take up a lot of memory, especially for rich images like photographs. If the bitmap configuration
used is ARGB 8888 (the default from Android 2.3 onward), then loading a 2592×1936 photograph into
memory takes about 19MB (2592*1936*4 bytes), immediately exhausting the per-app limit on some
devices.
• Android app UIs frequently require several bitmaps to be loaded at once. Components such as ListView,
GridView and ViewPager commonly include multiple bitmaps on-screen at once with many more poten-
tially off-screen ready to show at the flick of a finger.
Since there is very limited memory, ideally a lower resolution version has to be loaded in memory[47]. The
lower resolution version should match the size of the UI component that displays it. An image with a higher
resolution does not provide any visible benefit, but still takes up precious memory and incurs additional performance overhead due to additional on-the-fly scaling. If memory is not managed correctly, the virtual machine heap overflows and the following dreaded message crashes the entire app:
java.lang.OutOfMemoryError: bitmap size exceeds VM budget
To handle this on the fly across the varying types of Android phones, the author has implemented something called a Bitmap Handler. In creating the handler, the following factors were considered:
• Estimated memory usage of loading the full image in memory.
• The amount of memory the app is willing to commit to loading this image, given its other memory
requirements.
• Dimensions of the target UI component that the image is to be loaded into.
• Screen size and density of the current device.
To achieve this, the bitmap handler first detects the screen size of the phone and sets appropriate dimensions. Following this, it runs the image through a feedback loop: if the image is too big, it is scaled down proportionally. Once it has reached the correct scaled size, it is passed into the app. The bitmap handler comes with 3 settings: high resolution, medium resolution and low resolution, though the author uses only the high resolution option in most of the app. Lower resolution yields less accuracy but faster processing time, and this has been explored in Chapter 8 in detail.
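The scaling decision can be sketched using the standard Android sample-size pattern (the names here are illustrative and not the project's actual bitmap handler API; memory for an ARGB_8888 bitmap is width × height × 4 bytes):

```java
// Sketch of the scaling decision a bitmap handler of this kind has to make:
// choose a power-of-two sample size so the decoded bitmap fits the target
// dimensions, and estimate memory cost before decoding.
public class BitmapScalingSketch {
    public static int calculateInSampleSize(int width, int height, int reqWidth, int reqHeight) {
        int inSampleSize = 1;
        // Halve repeatedly until the scaled image fits in the target bounds.
        while ((height / inSampleSize) > reqHeight || (width / inSampleSize) > reqWidth) {
            inSampleSize *= 2;
        }
        return inSampleSize;
    }

    public static long argb8888Bytes(int width, int height) {
        return (long) width * height * 4;   // 4 bytes per ARGB_8888 pixel
    }
}
```

For the 2592×1936 photograph mentioned earlier, argb8888Bytes gives 20,072,448 bytes (about 19MB), and targeting a 1024×768 view yields a sample size of 4.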
The bitmap handler has been developed by the author of this thesis. Figures 41, 42 and 43 show examples
of how the different resolutions are changed and handled by the bitmap handler for the same image.
Figure 41: Bitmap handler - high resolution
6.2 Integrating Tesseract with Android
To integrate Tesseract into Android, a Java-based fork of the Tesseract OCR Engine created by Robert Theis on GitHub was used. It is based upon the tesseract-android-tools project, which is referenced from the original Tesseract website. The project is written against Android's Native Development Kit (NDK), with a Java based API. By compiling it and running the local NDK build, the project is converted into an Android-compatible library. Being a Windows
Figure 42: Bitmap handler - medium resolution
Figure 43: Bitmap handler - low resolution
user, there was a need for a Linux based environment to run the build functions. To do this, the author installed Cygwin and used it as a Linux layer to carry out the necessary functionality. Cygwin is a collection of tools which provide a Linux look and feel environment for Windows and acts as a Linux API layer providing substantial Linux API functionality[48]. By installing the gcc-core, gcc-g++, make and swig libraries into Cygwin, the author acquired an environment where the first stage of building the NDK was ready. After the build, the C++ side of the NDK was complete. The process took a little over 50 minutes.
The next step is to compile the C++ side and integrate it into the Java API and make a Java library. To do this,
the author used a tool called Apache Ant. Apache Ant is a Java library and command-line tool whose mission
is to drive processes described in build files as targets and extension points dependent upon each other. The
main known usage of Ant is building Java applications. Ant supplies a number of built-in tasks allowing one to compile, assemble, test and run Java applications[49]. Ant can also be used effectively to build non-Java applications, for instance C or C++ applications. By running Ant over the compiled NDK output, the author was able to convert the project into an Android library. From this point, it was as simple as importing it as a reference into
the Scope app and marking it as a library. Figure 44 shows all the libraries for Scope correctly referenced after
compilation. As visible, Scope is a project compatible only with Android OS version 4.0 (Ice Cream Sandwich) and upwards, thereby meeting the minimum requirement for Tesseract on Android, which is 2.2.
6.3 Multi-threading
Scope is architected as a workflow. Each step has to be completed before it reaches the next step and acts as
the input for the next process. In this way, it follows what is referred to as a chain model. When the final
data reaches the OCR Engine, it is in the form of several separate adaptively thresholded images. As the OCR
Engine can be instantiated only once, careful architecture is needed to use resources to the maximum whilst getting the fastest possible results. Assume there are 6 resulting segments after all the pre-processing techniques have been completed. These 6 segments can be passed into the OCR one at a time in two different ways. The author's first method of implementation was to use a linear chain of Asynchronous tasks, as represented in figure 45.
Figure 44: Scope integrated with Tesseract and OpenCV libraries
Figure 45: Single-threaded model
In this method, which is called the Single Threading model, the segments pass through the OCR Engine one at a time. For 6 segments, the time went up to 152 seconds before the results were sent to the contacts parser! This was completely inefficient for the app, and therefore the author decided to introduce multithreading. The
multi-threaded model is represented in figure 46.
Figure 46: Multi-threaded model
In the multi-threaded model, all the segments are treated in parallel. The instantiation reference object identifies the phone's resources at that moment and how many threads can be spawned to run an instance of Tesseract. By instantiating Tesseract outside the multithreaded model[50], the engine has to be instantiated only once, and clones of the reference are passed to each thread, saving valuable time constructing the engine. In addition, the maximum possible number of segments is processed in parallel, shortening the entire run. With the multi-threaded model, the time for the same card came down to 87 seconds. Performance had improved by 42.7%. This is a tremendous improvement, and even though it is still slow, it is limited by the fact that the app is offline. In this environment, 87 seconds is a very reasonable time for the Engine to produce results. Therefore, the current model used in the app is the multithreaded model, based on the optimisation that it renders.
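The multi-threaded model can be sketched with a thread pool that submits one task per segment (the Engine interface and its recognise() stub stand in for the Tesseract calls; all names are illustrative, not the project's actual classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the multi-threaded model: one shared engine reference, a pool
// sized to the device, and every segment submitted in parallel. Results are
// collected in segment order so the downstream parser sees a stable layout.
public class ParallelOcrSketch {
    interface Engine { String recognise(String segment); }

    public static List<String> run(Engine engine, List<String> segments, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<>();
        for (String seg : segments) {
            // One task per segment, all sharing the single engine reference.
            futures.add(pool.submit((Callable<String>) () -> engine.recognise(seg)));
        }
        List<String> results = new ArrayList<>();
        try {
            for (Future<String> f : futures) results.add(f.get());  // preserve order
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

Collecting via Futures in submission order means the parallel speedup never reorders the segments handed to the contacts parser.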
6.4 Using the CUBE libraries
As discussed in the literature review, the current OCR detection using Tesseract 3.02 simply translates the image to text, but does not take into account the relationships between identified letters and word formations in order to provide an intelligent result. The English CUBE libraries consist of seven different files [51].
• eng.cube.bigrams: Helps automatically correct the identification of the most commonly found bigrams
in detected text.
• eng.cube.fold: Auto-formats the detected document into sections, lists, and paragraphs.
• eng.cube.lm: Trains Tesseract to identify special characters along with letters and numbers.
• eng.cube.params: Defines a collated list of global OCR parameters, such as Max Word Aspect Ratio
and Max Segments per Character, for faster OCR.
• eng.cube.size: Automatically grids large images into smaller portions for faster OCR.
• eng.cube.word-freq: Uses word-frequency data from the identified text to correct similar words that
continue to appear in the image.
• eng.osd.traineddata: Automatically corrects the orientation of the image if it is not top-down. This
increases usability for this application's users, especially when taking pictures with their phones.
Figure 47: Tesseract with OSD segmentation
These libraries are, however, too big to be loaded every time an OCR request comes up. As a solution to this problem, the CUBE libraries are loaded asynchronously along with the splash screen, right before the application begins. This takes place in a background thread so that the user does not notice the lag when using the application. The libraries are permanently stored in a small space on the memory card, so that there is no delay in loading them after first use.
Figure 48: Asynchronous loading of CUBE libraries in Scope
Figure 49: Effect of CUBE libraries: (a) Original Text (b) Only Tesseract (c) CUBE and Tesseract together
The seven CUBE files implemented in this project provide effective post-processing and segmentation capabilities that give a substantial boost to Tesseract's performance. One example of CUBE's performance improvement over stand-alone Tesseract is illustrated in figure 49.
Part III.
Results
7 Segmentation Results
7.1 Background Colour Segmentation
7.1.1 Results of Algorithm
Figure 50: Working test case: Background Segmentation
A perfectly working example of background segmentation is given in figure 50. This image is perfectly segmented into its two predominant colours by the background segmentation algorithm. As seen in the result on the right, the split into the correct background segments is indicated by the coloured boxes, representing contours around the segments.
Figure 51: Boundary test case: Background Segmentation
In figure 51, a boundary case is presented. This is defined as a boundary case because it is a very special type of card that requires a non-rectangular segmentation cut. Since this is not possible using OpenCV, a part of the card is common to both segments: the part with the name. This results in that segment being sent into the OCR engine twice for results, and a check should be done on the smart parser side.
Figure 52: Failing test case: Background Segmentation
The third test case of background segmentation is a failing case. This is a very special case that fails because the colour of the logo in the second segment falls in the same bin as the background colour of the first segment. Thus, when finding contours, the logo too gets included in the final contour analysis and skews the image. This should be marked as an issue to fix, and it has been discussed in the improvements section.
7.1.2 Future Improvements
Background segmentation can be improved in 3 ways.
Firstly, the 30% rule implemented to find which parts of the histogram are backgrounds should be changed into a dynamic analysis of the histogram. The algorithm should also check whether immediately neighbouring bins meet the 30% requirement and cluster them together into customised, varying bin sizes. This will prevent overlaps between similarly coloured backgrounds.
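The suggested neighbouring-bin clustering could be sketched as follows (an illustrative sketch of the proposed improvement only; this is not implemented in the project, and all names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the suggested dynamic histogram analysis: bins that meet the
// 30% rule and are immediate neighbours are merged into one variable-width
// background range instead of being treated as separate peaks.
public class BinClusteringSketch {
    // Returns merged [startBin, endBin] ranges over the qualifying bins.
    public static List<int[]> mergeAdjacentPeaks(int[] bins, double fraction) {
        int tallest = 0;
        for (int b : bins) tallest = Math.max(tallest, b);
        List<int[]> ranges = new ArrayList<>();
        int start = -1;
        for (int k = 0; k <= bins.length; k++) {
            boolean peak = k < bins.length && bins[k] >= fraction * tallest;
            if (peak && start == -1) start = k;        // open a new range
            if (!peak && start != -1) {                // close the current range
                ranges.add(new int[]{start, k - 1});
                start = -1;
            }
        }
        return ranges;
    }
}
```

Two adjacent qualifying bins would then produce a single wider background range rather than two overlapping masks.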
The second improvement addresses the problem in figure 51. This would involve changing rectangular segments into more compound objects, thereby preventing overlaps and helping the results as a whole. As OpenCV provides only a rectangular extractable region of interest, some sort of overlap splitter algorithm must be developed for this purpose.
The third suggested improvement concerns the failing case seen in figure 52. In this very special case, it is important to identify that the background segmentation has failed, take the common overlap area as one segment and the balance as the other. This check would work here, as the overlap occurs due to colour mixing, and the future iteration of this algorithm should focus on this case.
7.2 Text Segmentation
7.2.1 Results of Algorithm
The text segmentation algorithm, usage and purpose have been discussed in Part II of this document. This
section explores some of the results obtained by the text segmentation algorithm and failing cases have been
displayed and discussed as well. The first example discussed is that in figure 53. This card is an example of a
Figure 53: Working test case: Text Segmentation
successful text segmentation and clustering. Visually, there are 3 segments in the card. This translates perfectly, as the image on the right shows the three identified segments correctly encapsulated within the specified areas. This is an example of a successful test case of the text segmentation algorithm, and it should yield better results than an image sent in without text segmentation. To check the effect of text segmentation, the author sent the same card in with and without text segmentation and analysed the results. The results are shown in the following table.
             Without Text Segmentation   With Text Segmentation
Accuracy (%)          65.2                        82.8
Table 2: Effect of text segmentation on result accuracy
This shows that text segmentation plays a big part in obtaining accurate results in the system, and a successful segmentation can improve results in a big way.
Figure 54: Boundary test case: Text Segmentation
The boundary case for text segmentation is explained by figure 54. In this instance, the text segmenter wrongly takes a larger segment than it should (as the two segments are close together), and this overlaps with an existing segment. Due to this, the feedback loop kicks in and performs clustering, which leads to the image at the bottom of figure 54. This essentially returns the original card in a smaller area. The text segmentation has not been effective enough to analyse what has happened, and therefore has not worked perfectly, but it has not failed either.
The failing case example is the card shown in figure 55. In this instance, no segments have been recognised correctly, and the feedback loop does not clean up overlaps as it usually would. This typically happens on cards with a big logo right in the middle. Here there are many overlaps, but none of them meet the minimum clustering distance requirement, which is 5% of the diagonal of the card. In the improvements section, the author discusses what could be done to fix this behaviour.
Figure 55: Failing test case: Text Segmentation
7.2.2 Future Improvements
Even though text segmentation works very well now, a few issues still need to be addressed to make it even better. The first is the issue discussed in figure 37, where it can be seen that the clustering portion of the algorithm does not always do a sound job, which leads to results like the one in figure 54. To avoid this, the text segmenter needs to be able to judge whether it is holding too much in a segment and perform a recursive segment validation test. Performing this recursively ensures it never over-compensates its segments, keeping the clustering at the right level. The algorithm also needs to move from purely rectangular segments to more compound shapes, so as to segment just the right area.
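A sketch of how the proposed recursive segment validation might look: a segment covering too large a fraction of the card is split along its longer axis and each half is validated again, so an over-grown segment can never swallow the whole card. The area-ratio bound, the splitting strategy and the recursion cap are all hypothetical choices for illustration, not part of the current implementation.

```python
def validate_segment(seg, max_area_ratio, card_area, depth=0, max_depth=4):
    """Recursively validate an (x, y, w, h) segment: if it covers more than
    max_area_ratio of the card, split it along its longer axis and validate
    each half, returning the list of acceptable sub-segments."""
    x, y, w, h = seg
    if w * h <= max_area_ratio * card_area or depth >= max_depth:
        return [seg]
    if w >= h:  # split along the longer (horizontal) axis
        halves = [(x, y, w // 2, h), (x + w // 2, y, w - w // 2, h)]
    else:       # split along the vertical axis
        halves = [(x, y, w, h // 2), (x, y + h // 2, w, h - h // 2)]
    return [s for half in halves
            for s in validate_segment(half, max_area_ratio, card_area,
                                      depth + 1, max_depth)]
```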
The second improvement addresses the failing case in figure 55. This failure happens because the minimum clustering distance is too small for the image to eventually cluster and yield a result. To fix this, the text segmentation algorithm should analyse the segments and derive the minimum distance dynamically, based on the average of the smallest distances between corners of the segments. This keeps the clustering distance adaptive, and the result accuracy should improve considerably.
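The dynamic threshold proposed here could be computed roughly as below: each segment's distance to its nearest neighbour (corner to corner) is found, and those minima are averaged. Again a Python sketch with an assumed (x, y, w, h) rectangle representation, not the app's actual code.

```python
import math

def dynamic_cluster_threshold(segments):
    """Average, over all segments, of the smallest corner-to-corner distance
    to any other segment; intended to replace the fixed 5%-of-diagonal rule."""
    def corners(seg):
        x, y, w, h = seg
        return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]
    nearest = []
    for i, a in enumerate(segments):
        nearest.append(min(
            math.dist(p, q)
            for j, b in enumerate(segments) if j != i
            for p in corners(a) for q in corners(b)))
    return sum(nearest) / len(nearest)
```

With this rule, a card whose segments happen to be widely spaced automatically gets a larger clustering distance, instead of failing to cluster at all.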
8 Performance Results
The first set of test results focuses on the accuracy-speed trade-off that the application makes. As the entire app is offline, loading higher-resolution images is more expensive and time-consuming, but the results are significantly better. The card used for this test is shown in figure 56.
Figure 56: Card used for performing resolution testing
8.1 Resolution vs. Time
App Process                     Low Resolution   Medium Resolution   High Resolution
Edge detection (ms)                       8483               26684             69033
Background segmentation (ms)               433                2191              3881
Adaptive threshold (ms)                   1935               13380             52258
Text segmentation (ms)                      90                 834              1864
OCR engine (ms)                          64337               80862            108009
Total time (ms)                          75278              123951            235045
Table 3: Resolution vs. Time Test Results
As expected, the time taken to complete all the processes in the app increases with resolution. These results are demonstrated graphically in figure 57. However, there is a very significant advantage to increasing the resolution, which is discussed in the next section.
Figure 57: Graph of resolution testing for speed results
8.2 Resolution vs. Accuracy
Accuracy results vary significantly too. As can be seen below, the actual text of each result is shown together with an accuracy percentage. The accuracy percentage is computed by matching every letter against the expected letter and taking the success ratio.
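The letter-by-letter matching described above can be sketched as a simple positional comparison. The thesis does not state how insertions or deletions are aligned, so this is an assumption; here missing trailing characters count as misses because the ratio is taken over the expected length.

```python
def character_accuracy(expected, actual):
    """Percentage of positions where the OCR output matches the expected text.
    zip() stops at the shorter string, so missing characters score as misses."""
    matches = sum(1 for e, a in zip(expected, actual) if e == a)
    return 100.0 * matches / max(len(expected), 1)
```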
Expected Result
Cammie TAN
Senior Manager (NUS Career Centre)
NUS Career Centre
Office of Student Affairs
Yusof Ishak House, Level 1
31 Lower Kent Ridge Road, Singapore 119078
Tel: (65) 6516 1278 Fax: (65) 6774 4463
E-mail: [email protected]
Website: www.nus.edu.sg/osa/career
Low Resolution Result
NUS Career Cetttt
Mtxe or ’tuit Mtaita
-hhrmermusN tetelt
" CpNerkent bdge _ bogapore
Tel (651551s " CacFSib?rqMs
S-ttTad tancan-uivdurs7
vtgbsneevvvv- cdu ’Wosaxareer
Accuracy: 35.27%
Medium Resolution Result
Cammie TAN
Senior Manager tNUS Ci"rser Centre)
NUS Career Centre
Office ot Student Affairs
Yusof [shak Hcruse. Level l
31 Lower Kent Ridge Mad, Singapore IT9078
Tel: (65) 6ST6 1278 Fax: (65) 6774 M63
E-mail: [email protected]
Website: wwwsnss.edusgftrsa/career
Accuracy: 77.80%
High Resolution Result
Cammy TAN
Senior Manager (NUS Career Centre)
NUS Career Centre
Office of Student Affairs
Yusof Ishak House, Level 1
31 Lower Kent Ridge Road, Singapore 11S078
Tel: (65) 6516 1278 Fax: (65) 6774 4463
E-mail: [email protected]
Website: vvwvv.nus.edusg/osa/career
Accuracy: 94.7%
From this, we can see that resolution strongly affects accuracy. The magnitude of the difference is shown in the graph in figure 58.
Figure 58: Graph of resolution testing for accuracy results
Even though high resolution is slow, the author's primary focus in this project is accuracy, and therefore all card results will be shown in high resolution, which means processing times will be quite slow. This is due to limited processing power and the fact that the application runs completely offline with built-in libraries.
For the rest of the results section, all results are assumed to be in high resolution.
9 App Results
To demonstrate the accuracy of the app as a whole, various types of NUS and external cards with different features have been tested and accuracy results reported. The percentage accuracy of each result has been calculated, and justification is provided where a result is not good enough. All cards in this section should be assumed to be at the highest resolution written by the app's customised bitmap handler.
Given below are descriptions of a few of the terms used in the testing process:
• Clear indicates a fairly new card, with clear text and little dirt or crushing
• Unclear indicates an old card, with a lot of dust, dirt and crushed edges
• Light indicates a card whose background colour is close to white
• Dark indicates a card with a darker background colour
9.1 NUS Cards
9.1.1 Clear Card
Figure 59: Image used for NUS Clear Card test
Expected Result Actual Result
Cammie TAN Cammy TAN
Senior Manager (NUS Career Centre) Senior Manager (NUS Career Centre)
NUS Career Centre NUS Career Centre
Office of Student Affairs Office of Student Affairs
Yusof Ishak House, Level 1 Yusof Ishak House, Level l
31 Lower Kent Ridge Road, Singapore 119078 31 Lower Kent Ridge Road, Singapore 11S078
E-mail: [email protected] E-mail: [email protected]
Tel: (65) 6516 1278 Fax: (65) 6774 4463 Tel: (65) 6516 1278 Fax: (65) 6774 4463
Website: www.nus.edu.sg/osa/career Website: vvwvv.nus.edusg/osa/career
Table 4: NUS Clear Card Results
Accuracy: 94.7%
9.1.2 Unclear Cards
Figure 60: Image used for NUS Unclear Card test
Expected Result Actual Result
Shona Gillies Shone 1Gli(Lll,,lE5
Assistant Manager Assistant Manager
International Relations Office International Relations Office
3rd Storey, Unit 03-03 3rd Storey, Unit 03-03
Shaw Foundation Alumni House Shaw Foundation Alumni House
11 Kent Ridge Drive, Singapore 119244 ll Kent Ridge Drive, Singapore l 19244
Tel: (65) 6526 4084 Fax: (65) 6778 0177 Tel: (65) 6516 4084 Fax: (65) 6778 0177
E-mail: [email protected] E-mail: [email protected]
Website: www.nus.edu.sg/iro Website: www.nus.edu.sg/iro
Table 5: NUS Unclear Card Results
Accuracy: 93.7%
The name section of this card has not yielded a good result because the top part of the card is visibly crushed. In such instances, retrieving the data is not possible even with good pre-processing when the card has a physical anomaly like the one shown in this example.
9.2 External Cards
9.2.1 Clear Light Background
Figure 61: Image used for External Card with Light Background
Expected Result Actual Result
Ching Kuan Thye Keith (tChing Kuan Thye Keith
Engineer, Sofware Dev. Engineer, SoMare Dev.
BG Lifestyle Entertainment I&D BG Lifestyle Entertainment l&D
Tel: +65 6882 4904 Fax: +65 6258 0892 Tel: +65 6882 4904 Fax: +65 6258 0892
[email protected] [email protected]
Philips Consumer Lifestyle Philips Consumer Lifestyle
620A Lorong I, Toa Payoh 620A Lorong I,Toa Payoh
Building TPI, Level 3 Singapore 319762 BuildingTPl , Level 3 Singapore 319762
www.philips.com www.philipsxom
Table 6: External Card with Light Background Results
Accuracy: 96.1%
9.2.2 Unclear Light Background
Figure 62: Image used for External Unclear Card with Light Background
Expected Result Actual Result
Hassan Gaffar, Project Manager Hassan Ciaffar, Project Manager
SolarWorld Asia Pacific Pte Ltd. SolatWorld Asia Pacific Pte Lid.
Co Reg No. 198102529K Co ioto, No. 19,%02S2u’.1K
70 Bendemeer Road Luzerne, #06-01 Singapore 339940 70 frendemeey ftoad Lugerne. #06 ’01 Singapore 339940
www.solarworld.sg www.solarworkisg
Tel: +65-6842-3886 Fax: +65-6842-3887 Tel: +65-6842-3886 Fax: +6S-6842-3887
Mobile: +65-8646-5317 DID: +65-6422-2695 Mobile: +6S-8646-S317 DID: ’65-6422-2695
[email protected] hassan gafrar@solarworldsg
Table 7: External Unclear Card with Light Background Results
Accuracy: 81.6%
This card is an example of a dirty card: it looks like it has been stored in a dusty place for a long time. It also has very small fonts, making it harder for Tesseract to read. However, with the preprocessing methods and segmentation algorithms combined, the result meets the team's minimum accuracy of 75% and renders reasonably good results.
9.2.3 Clear Dark Background
Figure 63: Image used for External Clear Card with Dark Background
Expected Result Actual Result
TITANSOFT PTE LTD TITANS0FT PTE LTD
Software Consultancy & Development Software Consultancy & Development
Design & Build Design & Build
E-Gaming Solution E-Gaming Solution
Video Streaming Video Streaming
Network Security & Consultancy Network Security d Consultancy
TEL +65 6396 6458 FAX +65 6396 6496 TEL -65 6396 6.15t? FAX - 65 6396 6496
www.titansoft.com.sg www.titansoft.com.sg
150 Cantonment Road #02-06/08, 150 Cantonment Road tc02-06 ”
Cantonment Centre Block B, Singapore 089762 Cantonment Centre Block B Singapore 089762
Table 8: External Clear Card with Dark Background Results
Accuracy: 80.8%
Tesseract does not work well with dark-coloured backgrounds; this card would usually score under 20%. However, the workflow provided by the team ensures that the preprocessing is customised for darker backgrounds, so the result meets the team's minimum accuracy for external cards of 75%.
9.2.4 Unclear Dark Background
Figure 64: Image used for External Unclear Card with Dark Background
Expected Result Actual Result
CHRISTOPHER CAI (CHlRyE)”ir((i))F)Tifiytii c)) -
TECHNICAL DIRECTOR (iOS) Tf CHNKAL DIRECTOR (sos)
+65 9756 2375 +65 9/56 *g75
+65 6342 0810 -65 6–. 00
[email protected] [email protected] thrts@ ’,’, srtiloC9 ’ q
www.replaid.com www.replaid.com
facebook.com/chriscai facebook.corn/r: l”,
REPLAID REPLAID
81 JOO CHIAT ROAD #02-02, SINGAPORE 427725 81 JOO CHIST RWAD #02-?), C/MPR .;,;
Table 9: External Unclear Card with Dark Background Results
Accuracy: 46.5%
This is an example of a failing case card for the application. This card has dark backgrounds, folded edges
and reflecting lights on the card. This makes the analysis process very difficult and therefore renders a negative
result, with only 46.5% accuracy. It is a requirement that the user take a good picture if good results are to be
expected from the app.
9.2.5 Colored Background
Figure 65: Image used for External Card with Mixed Background
Expected Result Actual Result
B. T. LOKESH B.Tech. PhD B, T, LOKESH BTeCh ’ti-t
+65 6248 9777 +65 6248 9777
+65 6248 5296 +65 6248 S296
+65 6463 5056 +65 6463 5056
[email protected] bheemal6Punisim.edu.sg
www.unisim.edu.sg wwwoumiisiimt,edluusg
Table 10: External Card with Mixed Background Results
Accuracy: 84.7%
This is an example of a card with both light and dark backgrounds. Here all the components in the process come into play together to separate the sections and render reasonable results. The card meets the threshold set by the team of a minimum result accuracy of 75% and therefore satisfies the team's goals.
9.3 Summary of Results
The team tested 10 NUS cards and 25 external cards of different types. From this testing, the author and the team found that the results of the app depend on three primary factors:
1. Adequate lighting - the cards need good lighting to yield reasonable results, so the user must ensure there is enough light before taking a picture. Shadows are not much of an issue, but avoiding them leads to better results.
2. Physical condition of the card - if the card is damaged at the corners, this can be a problem, as seen in figure 60. Even though the rest of that card was perfect, the name did not yield good results due to physical damage to the card itself.
3. Dust - a lot of dust and dirt on the card can interfere with the letters. Even though the team implements measures to remove random dust particles, dust that merges with the lettering is unidentifiable, so users should avoid it as much as possible.
Figure 66: Graph of accuracy for NUS cards
With these results, the team has achieved a test accuracy of 92.26% for NUS cards and 77.34% for external cards.
Figure 67: Graph of accuracy for external cards
10 Conclusion
At the heart of the Scope application lies innovation: finding creative answers to challenging problems. Ideas generated from scratch are fortified by extensive research in the respective fields, transforming them into novel solutions. These solutions, when put in place, help to utilise the available resources optimally, given their limitations, to generate the best possible results.
This is seen across the entirety of the application. The edge detection and auto-rotation extract and supply the most important part of the input image. The background segmentation analyses colour variations and separates the card into different sections so that the cleaning filters and pixel fillers can cater to them specifically and perform precise image correction. The OCR engine is then made to work with these processed segments, generating text that is processed by the parser, allowing leeway for the error that is expected.
In essence, Scope is a system of components that follow a logical flow of functionality while effectively complementing each other to produce a product that is robust and adaptive; the in-depth analysis of results and performance presented here is testament to this. The application produces accurate results when subjected to a host of day-to-day scenarios, showing that it is more than functional in its current state. The suggestions given for improving specific aspects indicate that Scope has clear potential for further growth in future iterations. These improvements will help the application overcome its current limitations, extend its reach into an ever-growing market, and help realise its eventual goal of being among the best, and definitively the most unique, in its category.
References
[1] M. Kanellos. (2003). ”Moore’s Law to roll on for another decade”. CNET [online].
http://news.cnet.com/2100-1001-984051.html.
[2] R. Holley, ”How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs”, D-Lib Magazine, vol. 15, no. 3/4, National Library of Australia, 2009.
[3] CVision Tech. (1998). ”Applications of OCR” [online]. http://www.cvisiontech.com/reference/general-information/ocr-applications.html.
[4] C. Cutter. (2013). ”TED 2013: Here, the Business Card Is Not Dead” [online]. http://www.linkedin.com/today/post/article/20130308001134-13780238-ted-2013-the-business-card-is-not-dead.
[5] R. Bennet. (2012). ”How Business Cards Survive in the Age of LinkedIn” [online]. http://www.businessweek.com/articles/2012-02-16/how-business-cards-survive-in-the-age-of-linkedin.
[6] Tech Tip Org. ”Convert Images to Text with Free OCR Applications” [online].
http://www.techtip.org/convert-image-to-text-with-ocr/.
[7] B. K. Ng, ”Screen Capture Optical Recognition Engine SCORE”, National University of Singapore, 2010.
[8] W.S. Lian, ”Heuristic Based OCR Post”, University of North Carolina at Chapel Hill, 2009.
[9] S. Dover. (2012). ”Study: Number of smartphone users tops 1 billion” [online]. http://www.cbsnews.com/8301-205_162-57534583/study-number-of-smartphone-users-tops-1-billion/.
[10] PassMark Software. (2013). ”Android Benchmarks” [online]. http://www.androidbenchmark.net/cpumark_chart.html.
[11] H.F. Schantz, ”The History of OCR: Optical Character Recognition”, Recognition Technologies Users Association, 1982.
[12] M. Mann, ”Reading Machine Spells Out Loud”, Popular Science, 1949.
[13] R. Ahmad. (2012). ”Optical Character Recognition (OCR)” [online].
http://rosalindaahmad.blogspot.sg/2012/04/optical-character-recognition-ocr.html.
[14] A. Kay. (2007). ”Tesseract: an Open-Source Optical Character Recognition Engine” [online].
http://www.linuxjournal.com/article/9676.
[15] L. Vincent. (2006). ”Announcing Tesseract OCR” [online]. http://google-code-updates.blogspot.sg/2006/08/announcing-tesseract-ocr.html.
[16] D. Wolski. (2012). ”Toolbox: OCR with Tesseract OCR” [online].
http://www.heise.de/open/artikel/Toolbox-Texterkennung-mit-Tesseract-OCR-1674881.html.
[17] S. Bhaskar et al. (n.d.). ”Implementing Optical Character Recognition on the Android Operating System for Business Cards” [online]. http://www.stanford.edu/class/ee368/Project_10/Reports/Bhaskar_Lavassar_Green_BusinessCardRecognition.pdf.
[18] Tesseract. ”Tesseract FAQ” [online]. https://code.google.com/p/tesseract-ocr/wiki/FAQ.
[19] B. Elgin. (2005) ”Google Buys Android for Its Mobile Arsenal” [online].
http://www.webcitation.org/5wk7sIvVb.
[20] Open Handset Alliance. (2007). ”Industry Leaders Announce Open Platform for Mobile Devices” [online]. http://www.openhandsetalliance.com/press_110507.html.
[21] J. Rosenberg. (2012) ”Google Play hits 25 billion downloads” [online].
http://officialandroid.blogspot.ca/2012/09/google-play-hits-25-billion-downloads.html.
[22] Canalys. (2011). ”Google’s Android becomes the world’s leading smart phone platform” [online]. http://www.canalys.com/newsroom/google%E2%80%99s-android-becomes-world%E2%80%99s-leading-smart-phone-platform.
[23] A. Russakovskii. (2012). ”Custom ROMs For Android Explained - Here Is Why You Want Them” [online]. http://www.androidpolice.com/2010/05/01/custom-roms-for-android-explained-and-why-you-want-them/.
[24] AForge.NET. (n.d.). ”Framework Features” [online]. http://www.aforgenet.com/framework/features/.
[25] OpenCV. ”Image Processing” [online]. http://docs.opencv.org/modules/imgproc/doc/imgproc.html.
[26] MathWorks. ”Image Processing Toolbox” [online]. http://www.mathworks.com/products/image/.
[27] Fixational. (2012). ”OpenCV vs MATLAB” [online]. http://blog.fixational.com/post/19177752599/opencv-vs-matlab.
[28] U. Sinha. (2012). ”Why OpenCV?” [online]. http://www.aishack.in/2010/02/why-opencv/.
[29] A. Curylo. (2012). ”OpenCV for iOS OFFICIAL” [online]. http://www.alexcurylo.com/blog/2012/07/11/opencv-for-ios-official/.
[30] R. Paul. (2011). ”Mono for Android framework lets C# developers tame the Droid” [online].
http://arstechnica.com/gadgets/2011/04/mono-for-android-framework-lets-c-developers-tame-the-droid/.
[31] Google Inc.. (2013). ”Google Goggles” [online]. http://www.google.com/mobile/goggles/#text.
[32] R. Theis. (2013). ”OCR Test” [online]. Google Play Store. https://play.google.com/store/apps/details?id=edu.sfsu.cs.orange.ocr&hl=en.
[33] ABBYY. (n.d.). ”ABBYY Business Card Reader” [online]. http://www.abbyy.com/products/bcr/.
[34] J. Richardson. (2012). ”ABBYY Business Card Reader and ABBYY CardHolder – scan business cards with
your iPhone” [online]. http://www.iphonejd.com/iphone jd/2012/02/review-abbyy-cardholder.html.
[35] M. Gargenta. (2009). ”Using NDK to Call C code from Android Apps” [online].
http://www.aishack.in/2010/02/why-opencv/l.
[36] OpenCV. ”Changing the contrast and brightness of an image!” [online]. http://docs.opencv.org/doc/tutorials/core/basic_linear_transform/basic_linear_transform.html.
[37] W. Jarosz. ”Fast Image Convolutions” [online]. http://www.acm.uiuc.edu/siggraph/workshops/wjarosz_convolution_2001.pdf.
[38] E. Reinhard, ”High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting”, Morgan Kaufmann, pp. 233-234, 2006.
[39] E. Arias-Castro and D. L. Donoho, ”Does median filtering truly preserve edges better than linear filter-
ing?”, Annals of Statistics, vol. 37, no. 3, 2009.
[40] C. Tomasi and R. Manduchi, ”Bilateral Filtering for Gray and Color Images”, Proceedings of the 1998
IEEE International Conference on Computer Vision, Bombay, India, 1998.
[41] L. G. Shapiro and G. C. Stockman, ”Computer Vision”, Prentice-Hall, New Jersey, pp. 279-325, 2001.
[42] D.L. Pham, ”Current Methods in Medical Image Segmentation”, Annual Review of Biomedical Engineer-
ing, 2000.
[43] R. Ohlander et al., ”Picture Segmentation Using a Recursive Region Splitting Method”, Computer Graphics and Image Processing, 1978.
[44] J. Canny, ”A Computational Approach to Edge Detection”, Pattern Analysis and Machine Intelligence,
IEEE Transactions on, Vols. PAMI-8, no. 6, pp. 679-698, 1986.
[45] R. Wu et al., ”A Text Image Segmentation Method Based on Spectral Clustering”, Harbin Institute of
Technology, 2008.
[46] J. Shi and J. Malik, ”Normalized Cuts and Image Segmentation”, UC Berkeley, 2000.
[47] ”Loading Large Bitmaps Efficiently” [online]. http://developer.android.com/training/displaying-bitmaps/load-bitmap.html.
[48] ”Cygwin” [online]. http://www.cygwin.com/.
[49] S. Loughran et al, ”Ant in Action”, 2nd Ed, July 12, 2007.
[50] P. Hyde, ”Java thread programming”, Sams Pub., Indianapolis, Ind, 1999.
[51] Tesseract 3.02. ”Tesseract wiki” [online]. http://code.google.com/p/tesseract-ocr/w/list.