
Optical Character Recognition (OCR) based Business Card

Reader on Android Operating System

Mohamed Salman Ismail Gadit

April 5, 2013

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

IN PARTIAL FULFILLMENT OF THE

REQUIREMENTS FOR THE DEGREE OF

BACHELOR OF ENGINEERING

NATIONAL UNIVERSITY OF SINGAPORE


Abstract

This project deals with the use of Optical Character Recognition (OCR) technology for the effective translation of business cards that are photo-captured under normal environmental noise. The scope of this project branches out into improving the current OCR engine, removing environmental noise with effective pre-processing, streamlining usable information after conversion with effective post-processing, and putting all these features together in a user-friendly mobile application for the end user.

The report starts with a detailed timeline for the project, listing various short-term and long-term goals on the road towards completion, followed by a literature review that highlights the need for such a technology, the quantitative and qualitative improvements over competitor apps, and the justification behind the choice of mobile platform. The principles and guidelines to be adhered to have been highlighted along with the relevant work that has been accomplished. The review concludes with a summary of the improvements currently possible over competing technologies.

The next part highlights the contributions accomplished during the course of project development. Short-term and long-term goals have been discussed in detail. The final vision for the product, consisting of both technical and directional goals, has been discussed and justified. The next few chapters highlight individual contributions to the main application, along with the challenges faced and the solutions used to overcome them. The results of the application's performance as compared to competitor benchmarks have been documented and explained. Various development and design bottlenecks observed have been listed, and improvements have been suggested accordingly for future iterations. The report concludes with a section on future prospects for the app.


Acknowledgements

This Final Year Project could not have been taken on and completed in its current state without the help and

contribution of the following people, who have from time to time provided direction, critical advice, viable

alternatives and praise. I’m truly grateful for their assistance and contribution towards the culmination of this

final year project.

Our supervisor, Professor Ko Chi Chung - Prof Ko has played a pivotal role in the way this project has

worked out its course. Right from the beginning, he gave us the imaginative freedom to take the resources that

were given to us and apply them to any imaginable extent. Through each stage of assessment, he gave us his critical

opinions and advice on the way we were crafting this mobile application and implementing the innovations

mentioned in this paper. While giving us the freedom, Prof Ko also helped us tie our ideas down in the core

of practicality by reminding us about constraints and suggesting ways to tackle them. His supervision of the

project is truly appreciated.

My team mates - Arnab Ghosh, Aravindh Ravishankar, Varun Ganesh - Without these gentlemen, this

project would not be anything like its present-day iteration. Right from day one, my team mates and I have worked together as a cohesive unit to come up with a wide array of viable, and sometimes extensive,

visions for the way this project was going to turn out. Bouncing ideas off each other, we were able to distill out

the best possible option for us given our practical constraints of time and resources. My team mates have also

helped to motivate me during periods where my results weren’t as promising and my progress was slow. It has

been my pleasure to work with these gentlemen realizing a vision that we shared together, right from the start,

into the app that has resulted from countless hours that we have worked on it.

National University of Singapore - My gratitude to the university for giving me the opportunity to work on

this project.


My examiner, Professor Lawrence Wong - While working towards a larger goal, it is often possible to lose

focus on the smaller issues at hand. I appreciate Prof Wong's valuable and critical advice during CA2, where he pointed out the aspects of the project that needed closer attention. His recommendations helped me take a

closer look at the project and strengthen some aspects of the presentation as well.

To my family and friends - Who stood by me, encouraged me and brought me to the end of a wonderful

journey both in university, and in this project.

Thank you, one and all.


Contents

Abstract ii

Acknowledgements iii

List of Figures ix

List of Tables xii

List of Abbreviations xiii

Part I - Literature Review 1

1 Introduction 2

1.1 Need for OCR in business cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Need for mobile OCR business card readers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Need for offline mobile OCR business card readers . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Project goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Literature Review 5

2.1 Application Oriented OCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Tesseract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.3 Tesseract Shortcomings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.4 Cube Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.5 Android . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.6 Image Processing Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.6.1 AForge.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.6.2 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.6.3 MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.6.4 Comparison of Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14


2.7 Current Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.7.1 Google Goggles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.7.2 OCR Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.7.3 ABBYY Business Card Reader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Part II - Technical Details 18

3 Project Decision 19

3.1 Understanding the Tesseract OCR Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2 Windows WPF and Windows Phone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.3 Moving to Android . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.4 Application Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4 Image Processing 23

4.1 Brightness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.2 Smoothing Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.2.1 Homogeneous Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.2.2 Gaussian Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.2.3 Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.2.4 Bilateral Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.3 Combining Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.4 ScopeMate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5 Image Segmentation 31

5.1 Background Colour Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

5.1.1 Objectives of Background Colour Segmentation . . . . . . . . . . . . . . . . . . . . . . 32

5.1.2 Alternatives Explored . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5.1.3 Background colour Segmentation Algorithm . . . . . . . . . . . . . . . . . . . . . . . 34


5.2 Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.2.1 Objectives of Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.2.2 Alternatives Explored . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5.2.3 Text Segmentation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5.3 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

5.3.1 Objectives of Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

5.3.2 Future Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

5.3.3 Clustering Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

5.4 Region of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

6 Optical Character Recognition in Android 48

6.1 Managing Images in Android . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

6.2 Integrating Tesseract with Android . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

6.3 Multi-threading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

6.4 Using the CUBE libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Part III - Results 56

7 Segmentation Results 57

7.1 Background Colour Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

7.1.1 Results of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

7.1.2 Future Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

7.2 Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

7.2.1 Results of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

7.2.2 Future Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

8 Performance Results 62

8.1 Resolution vs. Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62


8.2 Resolution vs. Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

9 App Results 67

9.1 NUS Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

9.1.1 Clear Card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

9.1.2 Unclear Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

9.2 External Cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

9.2.1 Clear Light Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

9.2.2 Unclear Light Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

9.2.3 Clear Dark Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

9.2.4 Unclear Dark Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

9.2.5 Colored Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

9.3 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

10 Conclusion 77


List of Figures

1 OCR technology patent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Tesseract Working Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3 Importance of Adaptive Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4 Original Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

5 Using only Tesseract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

6 Using Tesseract and Cube Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

7 Android OS running "Jelly Bean" (version 4.2) . . . . . . . . . . . . . . . . . . . . . . . . . . 11

8 Google Goggles OCR Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

9 OCR Test app being used in continuous preview mode . . . . . . . . . . . . . . . . . . . . . . 16

10 ABBYY Business Card Reader Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

11 Android Native Development Kit [35] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

12 Scope application workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

13 Application of the Brightness Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

14 Application of the Homogeneous Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

15 1D Gaussian Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

16 Application of the Gaussian Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

17 Application of a Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

18 Application of a Bilateral Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

19 ScopeMate main screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

20 Image Segmentation Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

21 Background Segmentation Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

22 NUS card used for this example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

23 Histogram of NUS card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

24 Image masking applied to histogram bin from 112 to 127 . . . . . . . . . . . . . . . . . . . . . 36

25 Image masking applied to histogram bin from 240 to 255 . . . . . . . . . . . . . . . . . . . . . 37


26 Contour analysis applied to histogram bin from 112 to 127 . . . . . . . . . . . . . . . . . . . . 37

27 Contour analysis applied to histogram bin from 240 to 255 . . . . . . . . . . . . . . . . . . . . 38

28 Workflow for Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

29 Business card used to demonstrate Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . 40

30 After applying Median Blur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

31 After applying adaptive threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

32 After applying strong dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

33 After applying weak erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

34 Full contour analysis of card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

35 Text segmentation without clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

36 Clustering feedback loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

37 Clustering algorithm issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

38 Clustering algorithm flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

39 Text segmentation with clustering applied . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

40 Region of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

41 Bitmap handler - high resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

42 Bitmap handler - medium resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

43 Bitmap handler - low resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

44 Scope integrated with Tesseract and OpenCV libraries . . . . . . . . . . . . . . . . . . . . . . . 52

45 Single-threaded model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

46 Multi-threaded model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

47 Tesseract with OSD segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

48 Asynchronous loading of CUBE libraries in Scope . . . . . . . . . . . . . . . . . . . . . . . . 55

49 Effect of CUBE libraries: (a) Original Text (b) Only Tesseract (c) CUBE and Tesseract together 55

50 Working test case: Background Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

51 Boundary test case: Background Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 57


52 Failing test case: Background Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

53 Working test case: Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

54 Boundary test case: Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

55 Failing test case: Text Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

56 Card used for performing resolution testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

57 Graph of resolution testing for speed results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

58 Graph of resolution testing for accuracy results . . . . . . . . . . . . . . . . . . . . . . . . . . 65

59 Image used for NUS Clear Card test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

60 Image used for NUS Unclear Card test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

61 Image used for External Card with Light Background . . . . . . . . . . . . . . . . . . . . . . . 70

62 Image used for External Unclear Card with Light Background . . . . . . . . . . . . . . . . . . 71

63 Image used for External Clear Card with Dark Background . . . . . . . . . . . . . . . . . . . . 72

64 Image used for External Unclear Card with Dark Background . . . . . . . . . . . . . . . . . . . 73

65 Image used for External Card with Mixed Background . . . . . . . . . . . . . . . . . . . . . . 74

66 Graph of accuracy for NUS cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

67 Graph of accuracy for external cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76


List of Tables

1 Relationship between pixel size and character accuracy . . . . . . . . . . . . . . . . . . . . . . 9

2 Effect of text segmentation on result accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3 Resolution vs. Time Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4 NUS Clear Card Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5 NUS Unclear Card Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

6 External Card with Light Background Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

7 External Unclear Card with Light Background Results . . . . . . . . . . . . . . . . . . . . . . 71

8 External Clear Card with Dark Background Results . . . . . . . . . . . . . . . . . . . . . . . . 72

9 External Unclear Card with Dark Background Results . . . . . . . . . . . . . . . . . . . . . . . 73

10 External Card with Mixed Background Results . . . . . . . . . . . . . . . . . . . . . . . . . . 74


List of Abbreviations

ABBYY - A Russian software company, headquartered in Moscow, that provides optical character recognition, document capture and language software for both PC and mobile devices

BMP - Bitmap Image File Format

GUI - Graphical User Interface

HP - Hewlett-Packard

ICR - Intelligent Character Recognition

JNI - Java Native Interface

NCut - Normalized Cut

NDK - Native Development Kit

OCR - Optical Character Recognition

OS - Operating System

OSD - Orientation and Script Detection

ROI - Region of Interest

SDK - Software Development Kit

TIFF - Tagged Image File Format

UTF-8 - Universal Character Set Transformation Format - 8-bit

WPF - Windows Presentation Foundation


Part I.

Literature Review


1 Introduction

It is the 21st century and the human species is becoming increasingly dependent on the devices it has built. Transistor counts are still following the upward trend of Moore's law, and the processing power of devices has never been better [1].

Human-machine interaction has conventionally been thought of as a one-way input system where the human

understands the machine and gives it explicit commands to function. However, with the current increasing trend

in human-machine interaction and the increasing processing power of new age devices, the need for devices

to understand the human world too is becoming increasingly important. Optical Character Recognition (OCR)

technology tries to bridge this gap by giving devices the power to understand characters and languages from the

human world.

OCR technology was first used by libraries for historic newspaper digitization projects in the early 1990s.

An initial experiment at the British Library with the Burney collection and a cooperative project in Australia

(the ACDP) with the Ferguson collection were both considered unsuccessful largely due to the difficulties

with OCR technology and historic newspapers [2]. Fast-forwarding two decades, OCR software accuracy has improved drastically, and such software is currently used in a wide array of applications such as data entry, number plate recognition, assisting the visually impaired and importing information from business cards [3].

1.1 Need for OCR in business cards

As already mentioned, OCR software had initially been developed to digitize newspapers and library books.

The images of documents that were digitized by this process were obtained using commercial scanners, where

the light intensity distribution was uniform and the image consisted of standard black font styles printed on a

white background. However, once the number of people who exchanged business cards increased, there was an

increasing need to introduce OCR for business cards too. Chip Cutter, editor at LinkedIn, writes that although

the use of paper has drastically decreased in the current day and age, the convention of swapping business cards

at the end of a conversation still prevails even among the tech-savvy attendees of TED 2013 (the Technology, Entertainment and Design global conference) [4].


He continues to explain in his article why people might still prefer swapping business cards instead of sharing contacts on their mobile phones, and puts forward four points to support his argument: business cards are easy to use, quick to share, have a small learning curve and showcase the owner's creativity. However,

one of the main problems with swapping business cards is maintainability and searching. The business cards

tend to wear out with time and are very difficult to search through during the time of need [5]. Thus, the

problem here is not people wanting to move away from the use of conventional business cards, but rather,

the need for a solution to maintain and manage all of their business cards over time. This signals a need to

digitize business cards using OCR applications in order to make them more maintainable and to make searching

through them easier. Some of the challenges in using OCR applications on business cards as compared to their

earlier implementation to read documents include: light distribution is non-uniform because of the environment

in which the photo of the card is captured, and the font, colours and arrangement of letters do not follow a

standard pattern across business cards.

1.2 Need for mobile OCR business card readers

Many OCR business card readers have been developed as standalone software solutions on the PC (Personal

Computer) [6]. Recently, students at the National University of Singapore, B.K. Ng & Jackson yeow, have

also developed an OCR system named SCORE and a business card reader B.SCORE on the PC platform [7].

B.SCORE was a follow up of the initial SCORE platform that was coded in C#, using the Windows Visual Basic

architecture.

However in OCR applications on the PC, the images have to be uploaded manually or should be captured

directly using the PCs webcam. The quality of OCR depends on the quality of the image captured [8]. The image

quality of a standard desktop webcam is only around 1.3 Megapixel whereas an average smartphone camera can

offer a higher range of image quality ranging from 5 megapixels and upward. Therefore the accuracy of OCR

on images captured using smartphones would be better than those captured using webcams.

Worldwide, smartphone users have already topped 1 billion, and this number is only set to increase in the near future [9]. Moreover, since most contacts are maintained on the phone, capturing a photo using a camera,


then applying OCR on a PC, and then transferring back the recognized data to the phone might be cumbersome

when compared to carrying out this entire process chain on the smartphone directly. This would also give the

user the ability to digitize business cards anywhere, as opposed to only in the presence of a personal computer.

1.3 Need for offline mobile OCR business card readers

Most of the current mobile OCR applications need internet access in order to perform character recognition. A

detailed comparison of feature sets available in current mobile OCR business card readers has been performed

under the literature review.

However, this constrains users to only use their mobile OCR applications in the presence of an active internet

connection, thereby limiting the mobile aspect of the OCR application itself. Expecting users to be connected to the internet at all times when they might exchange business cards might be unrealistic. However, with the increasing processing power of today's smartphones [10], realizing the entire OCR process using just the mobile phone's processor can become a reality. Therefore, a robust, efficient, offline and mobile OCR business card application would be the solution to the business card digitizing needs of the current-day smartphone user.

1.4 Project goal

Therefore, the goal of this project is to develop a robust, efficient and offline mobile application that uses Optical

Character Recognition to automatically digitize business cards into maintainable and searchable mobile phone

contacts.

The team has planned to call this application Scope. In this project, the team's goal is to achieve a minimum of 90% accuracy for NUS cards and 75% accuracy for any external cards.


2 Literature Review

Optical Character Recognition, abbreviated to OCR, is the mechanical or electronic conversion of scanned

images of handwritten, typewritten or printed text into machine-encoded text. It is widely used as a form of data

entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of

printed records [11].

It is a common method of digitizing printed texts so that they can be electronically searched, stored more

compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech and

text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.

Figure 1: OCR technology patent

The OCR technology was developed in the 1920s and remains an area of interest and concentrated research

to date. Systems for recognizing machine-printed text originated in the late 1950s and there has been widespread

use of OCR on desktop computers since the early 1990s [12].

The OCR technology enables users to liberate large amounts of information held captive in hard copy form. Once converted to electronic form, information can be edited and extracted easily according to users' needs.


2.1 Application Oriented OCR

Since OCR technology has been more and more widely applied in paper-intensive industries, it faces more complex image environments in the real world: for example, complicated backgrounds, degraded images, heavy noise, paper skew, picture distortion, low resolution, interference from grids and lines, and text images consisting of special fonts, symbols and glossary words. All these factors affect the stability of OCR products' recognition accuracy.

In recent years, the major OCR technology providers began to develop dedicated OCR systems, each for

special types of images. They combine various optimization methods related to the special image, such as

business rules, standard expressions, glossaries or dictionaries and the rich information contained in colour images, to

improve the recognition accuracy.

Such a strategy of customizing OCR technology is called Application-Oriented OCR or "Customized OCR" [13], and is widely used in the fields of business-card OCR, invoice OCR, screenshot OCR, ID card OCR, driver-license OCR, auto plate OCR, and so on. For the purpose of this project, the application, Scope, will make use of an open-source OCR engine, called Tesseract, that can be ported onto a mobile phone.

2.2 Tesseract

Tesseract is a free, open-source optical character recognition engine which can be used across various operating

systems. The engine was originally developed as proprietary software at Hewlett-Packard between 1985 and

1995, but it was then released as open source in 2005 by Hewlett-Packard and University of Nevada, Las Vegas

[14]. Tesseract development has been sponsored by Google since 2006[15].

The Tesseract algorithm is illustrated in Figure 2 [16]. A grayscale or colour image will be loaded into

the engine and processed. The program takes .tiff (TIFF) and .bmp (BMP) files but plug-ins can be installed

to allow processing of other image extensions. As there is no rectification capability, the input image should

ideally be a flat image from a scanner.

In the adaptive thresholding process, the engine performs the reduction of a grayscale image to a binary

image. The algorithm assumes that there are foreground (black) pixels and background (white) pixels and


Figure 2: Tesseract Working Flowchart

Figure 3: Importance of Adaptive Thresholding


calculates the optimal threshold that separates the two pixel classes such that the combined variance within the classes is minimal.

Following that, Tesseract searches through the image, identifies foreground pixels and marks them as potential characters. Lines of text are found by analysing the image spaces adjacent to the potential characters. For each line, the baselines are found and Tesseract examines them to find the appropriate height across the line. Characters that lie outside this appropriate height or are not of uniform width are reclassified to be processed in an alternate manner [17].

After finding all of the possible characters in the document, Tesseract performs word recognition word by word, on a line-by-line basis. Words are then passed through a contextual and syntactical analyser, which produces an editable .txt file in the tesseract folder; this folder is where all the source code is located and where the main engine is run. In addition, Tesseract is also able to undergo training in order to recognize special characters other than letters. The Scope application would need Tesseract to recognize numbers and certain symbols such as @ and +.

2.3 Tesseract Shortcomings

Despite the numerous contributions from many developers over the years, Tesseract's performance suffers from many shortcomings and restraints. OCR accuracy falls drastically if the processed image has a coloured background. The design and layout of name cards and websites adversely affect precision.

Another challenge faced by Tesseract is the text size in the images. The Tesseract Frequently Asked Questions (FAQ) [18] page states that the noise-reduction mechanisms can and will hinder the processing of small text sizes. In order to achieve notable results, text sizes should typically be around 20 pixels, and any text under 8 pixels will be recognized as noise and filtered off.

Table 1 illustrates the relationship between pixel size and character accuracy.

Essentially, Tesseract is a raw skeleton OCR engine with the core feature of text recognition. It does not come with a GUI, performs no page layout analysis, provides no output formatting and lacks additional features.


Image Dimension (pixels) Character Accuracy (%)

255x285 4.61

384x429 98.12

1024x1087 99.49

2048x2289 99.15

Table 1: Relationship between pixel size and character accuracy

2.4 Cube Libraries

The current OCR detection using Tesseract 3.02 simply translates the image to text, but does not take into

account the relationship behind the identified letters and word formations in order to provide an intelligent

result. The Cube libraries, when used along with Tesseract, help in improving the contrast between the words

and the background in images and help boost the performance of OCR recognition.

The key features of the CUBE libraries include the following (a brief usage sketch is given after the list):

• Performing adaptive thresholding prior to OCR, to improve text contrast.

• Windowed segmentation to improve word recognition, by recognizing smaller pieces of the image first and stitching them together later.

• Comparing recognized junk data against a dictionary database and the most frequently used words in a particular language, so as to retrieve data lost due to noise.
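As an illustration only, a minimal Java sketch of enabling the Cube recogniser alongside the standard Tesseract one is shown below. It assumes a tess-two style binding in which init() accepts an engine-mode constant (OEM_TESSERACT_CUBE_COMBINED) and that the Cube data files are installed next to the usual traineddata; it is not code from the Scope application itself.

    import android.graphics.Bitmap;
    import com.googlecode.tesseract.android.TessBaseAPI;

    // Sketch: run Tesseract and Cube together on one image (assumed API, see above).
    public final class CubeModeSketch {
        public static String recognise(String dataPath, Bitmap cardImage) {
            TessBaseAPI tess = new TessBaseAPI();
            // Combined mode runs both recognisers and merges their results.
            tess.init(dataPath, "eng", TessBaseAPI.OEM_TESSERACT_CUBE_COMBINED);
            tess.setImage(cardImage);
            String text = tess.getUTF8Text();
            tess.end();   // release the native engine and its memory
            return text;
        }
    }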

A comparison of the results of using only Tesseract and of using Tesseract with the Cube libraries, without any pre-processing, is illustrated below.


Figure 4: Original Image

Figure 5: Using only Tesseract

Figure 6: Using Tesseract and Cube Libraries


2.5 Android

Android is a Linux-based operating system designed primarily for touchscreen mobile devices such as smartphones and tablet computers. It was initially developed by Android, Inc., which Google financially backed and later purchased in 2005 [19]. Android was unveiled in 2007 along with the founding of the Open Handset Alliance: a consortium of hardware, software, and telecommunication companies devoted to advancing open standards for mobile devices. The first Android-powered phone was sold in October 2008 [20].

Figure 7: Android OS running "Jelly Bean" (version 4.2)

Android is open source and Google releases the code under the Apache License. This open source code

and permissive licensing allows the software to be freely modified and distributed by device manufacturers,

wireless carriers and enthusiast developers. Additionally, Android has a large community of developers writing


applications ("apps") that extend the functionality of devices, written primarily in a customized version of the

Java programming language. In October 2012, there were approximately 700,000 apps available for Android,

and the estimated number of applications downloaded from Google Play, Android’s primary app store, was 25

billion[21].

These factors have allowed Android to become the world’s most widely used smartphone platform and

the software of choice for technology companies that require a low-cost, customizable, lightweight operating system for high-tech devices without developing one from scratch [22]. As a result, despite being primarily

designed for phones and tablets, it has seen additional applications on televisions, games consoles and other

electronics. Android’s open nature has further encouraged a large community of developers and enthusiasts to

use the open source code as a foundation for community-driven projects, which add new features for advanced

users or bring Android to devices which were officially released running other operating systems[23].

The open source nature of the Android platform, and thereby the culture of the community, makes Android a great choice for use with the Tesseract OCR engine. Using Android's Java Native Interface (JNI) and Native Development Kit (NDK), wrapping the C++ code in Tesseract so that it can be called from managed Java code recognised by Android becomes a manageable task, and this allows the full functionality of Tesseract to be explored with the added advantage of the Android mobile operating system. In addition, development and deployment of applications on Android, and consequently on the Google Play store, is a smooth process. Thus, taking into consideration these various factors, the author has decided to specialise his project, Scope, for the Android platform.

2.6 Image Processing Libraries

2.6.1 AForge.NET

AForge.NET is a C# framework designed for developers and researchers in the fields of Image Processing,

Computer Vision, and Artificial Intelligence. AForge.Imaging, which is the biggest library of the framework, contains different image processing routines aimed at image enhancement and at the processing required by various computer vision tasks.


The library consists of a wide array of filters to perform various colour correction, convolution, binarization

and thresholding operations. In addition to these, AForge also offers methods to perform edge detection and

feature extraction with Hough Transform analysis[24].

The functions are extensively documented and online help in the form of user forums and sample code

snippets is readily available. The libraries are constantly updated with new versions being frequently released.

2.6.2 OpenCV

OpenCV (Open Source Computer Vision Library) is an open source C/C++ library for image processing and

computer vision developed by Intel. It is a library of programming functions mainly aimed at real time image

processing. It is free for both commercial and non-commercial use[25].

OpenCV was originally written in C but now has a full C++ interface and all new development is in C++.

There is also a full Python interface to the library. Recently the OpenCV4Android SDK was developed and

released to enable using OpenCV functionality in Android applications.

OpenCV offers a comprehensive collection of image processing capabilities that surpasses that offered by

AForge. OpenCV in addition to supporting various image filters, transformations and thresholding mechanisms,

presents us with the ability to identify, compare and manipulate histograms in order to perform intelligent and

automated processing of images.

The most important feature of OpenCV is that it allows complex matrix operations to be performed on

images. This enables developers dealing with image processing to perform actions with greater understanding

and control over what is being done.

2.6.3 MATLAB

The Image Processing Toolbox that is included with MathWorks MATLAB provides a comprehensive set of

reference-standard algorithms, functions, and applications for image processing, analysis, visualization, and

algorithm development.

Operations that can be performed include image enhancement, image de-blurring, feature detection, noise


reduction, image segmentation, geometric transformations, and image registration. Many toolbox functions are

multithreaded to take advantage of multicore and multiprocessor computers[26].

MATLAB is a high-level scripting language meaning that it will take care of lower-level programming issues

such as declaring variables and performing memory management without the user having to worry about it. This

essentially makes MATLAB an easier language to become familiar with quickly, and allows the user to piece

together a small amount of programming code to prototype an image processing operation [27].

2.6.4 Comparison of Libraries

AForge and MATLAB are more generic processing libraries that cater to a variety of requirements whereas

OpenCV was built with the main focus on image manipulation. Hence its code is highly optimised for this

purpose. It provides basic data structures for matrix operations and image processing and offers more extensive

functions when compared to the other two libraries.

Since AForge and MATLAB are built on C# and Java respectively, which are in turn built on C, they are higher-level environments. Though this means that memory management and other lower-level programming issues will be taken care of, it also means that the processor is kept busier interpreting the higher-level language, turning it into lower-level code and finally executing that code [28].

OpenCV, however, is essentially a library of functions written in C, which means it is closer to providing the computer with machine-level code to execute. Ultimately, more of the computer's processing cycles are spent on image processing and fewer on interpreting. As a result, programs written with OpenCV run much faster than similar programs written in MATLAB or AForge.

Moreover, OpenCV is available for use on multiple mobile platforms such as iOS [29], Android and Windows. This is a crucial factor in choosing the library, as the development of the Scope application is intended to be based on the Android platform. MATLAB does not at present provide an SDK for Android development, whereas AForge would require an external framework such as Mono for it to be ported onto Android [30].

Thus, given all these issues, it was decided that the Scope application would incorporate the OpenCV library for its image processing operations, primarily due to its speed and efficiency of processing and its cross-platform compatibility, which allows for easier expansion of the application in the future.

2.7 Current Alternatives

Such an implementation of OCR technology isn't the first to show up on the Android market, and listed

below are the closest competitors to the Scope application.

2.7.1 Google Goggles

Figure 8: Google Goggles OCR Scenario

Google dominates the Android market in general, and it's no different when it comes to OCR applications. Google Goggles [31] is a visual search app that allows users to take pictures of items that they want to obtain more information about. Business cards are among the various types of input that Goggles accepts. The application captures the text areas in the image of the business card and sends them to Google's OCR engine in the cloud. The parsed information is then pushed back to the phone, which recognises the information as a contact and shows the user relevant contextual menus.

One of the main drawbacks of the app is that it requires an active internet connection to be able to carry

out any OCR processing. This means that the results churned out by the application are subject to delays and


lags faced in sending the photo, processing the photo, compiling the results and pushing out the results over the

cellular network service of the user.

2.7.2 OCR Test

Figure 9: OCR Test app being used in continuous preview mode

OCR Test is an experimental app that attempts to harness the power of Tesseract and use it for OCR on

Android [32]. This app runs the Tesseract engine on the user's device, without uploading images to a server, and is suitable for recognizing individual words or short phrases of text. The app also offers translation

services of the recognised text, which is powered by Google/Bing Translate.

The default single-shot capture runs OCR on a snapshot image that is captured when the user clicks the

shutter button, like a regular photo. The application also offers a continuous preview option while taking a

picture in which it shows a dynamic, real-time display of what the device is recognizing right beside the camera

viewfinder. An on-screen resizable viewfinder box allows the user to focus on one word or phrase at a time

and the recognised text is displayed in the top right corner of the window. The continuous preview mode works

best on a fast device.

While the application is a decent attempt at using the Tesseract engine to carry out the processing, its accuracy is thwarted by Tesseract's limitations (see: Tesseract Shortcomings), making the operation of this app a hit-or-miss scenario, where the best results are produced only when ideal image conditions are met.


2.7.3 ABBYY Business Card Reader

Figure 10: ABBYY Business Card Reader Interface

By far, the most competitive solution in the Android marketplace currently is the ABBYY Business Card Reader [33]. Developed by the Russian company ABBYY, the application's built-in optical character recognition allows the user to quickly receive precise results. The application also supports a database of 20 different languages, which is used to translate business cards in various languages to English. Although the application works using its built-in OCR engine, connecting to the network is required to authorise licenses and actually use the app. This is a slight inconvenience for users who aren't connected through their phones [34].

Added functionalities like searching for more information on social networks also depend on network connectivity. The application, however, fails to recognise words accurately in a number of scenarios, for instance, if the background is black and the text is white. Special symbols (@, #, $, etc.) are another issue. The application circumvents this problem by highlighting the characters which it is unsure of and providing alternatives for the user to choose from.


Part II.

Technical Details


3 Project Decision

3.1 Understanding the Tesseract OCR Engine

The most important task to start with was to understand the heart of this entire project: the OCR engine developed by HP and now by Google Labs, Tesseract. It was voted one of the 13 best OCR engines in existence and is considered one of the most accurate free-software OCR engines currently available.

The most recent change is that Tesseract can now recognize 60 languages, is fully UTF-8 capable, and is

fully trainable. This makes Tesseract an extremely powerful free tool to use in this project.

Tesseract 3.0.2 is the latest version of the engine, with a whole new set of supported languages and revolutionary features that allow it to recognize text from any angle. With this, we feel it is a good starting point to make this project as accurate as possible.

In Java, the Tesseract engine is encapsulated by the TessBaseAPI class. Below I have outlined some of the

public methods and their functionalities, so as to understand the public scope of the Tesseract engine in Java.

TessBaseAPI.init - Initializes the Tesseract engine with a specified language model

TessBaseAPI.getInitLanguagesAsString - Returns the languages string for last valid initialization

TessBaseAPI.clear - Frees up recognition results and any stored image data

TessBaseAPI.end - Closes down Tesseract and frees up all memory

TessBaseAPI.setPageSegMode - Sets the page segmentation mode

TessBaseAPI.setImage - Provides an image for Tesseract to recognize

TessBaseAPI.getUTF8Text - Returns the recognized text as a UTF-8 encoded String
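A minimal usage sketch of the call sequence for these methods is given below, assuming the tess-two Java wrapper and English language data already installed under dataPath/tessdata; the class and variable names are illustrative only and are not taken from the Scope source.

    import android.graphics.Bitmap;
    import com.googlecode.tesseract.android.TessBaseAPI;

    // Sketch of a typical recognition pass using the methods listed above.
    public final class TesseractUsageSketch {
        public static String recognise(String dataPath, Bitmap cardBitmap) {
            TessBaseAPI tess = new TessBaseAPI();
            tess.init(dataPath, "eng");                            // load the English language model
            tess.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO); // automatic page segmentation
            tess.setImage(cardBitmap);                             // provide the image to recognise
            String text = tess.getUTF8Text();                      // run OCR and return UTF-8 text
            tess.clear();                                          // free recognition results and image data
            tess.end();                                            // shut the engine down and free memory
            return text;
        }
    }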

3.2 Windows WPF and Windows Phone

The first step was to try building the Tesseract Engine on Windows and replicating simple code that could

allow images to be converted into text. The team planned to port it to a complete Windows system, including Windows Phone and Windows 8. When working on the WPF (Windows Presentation Foundation) application, the results were good: the author used the .NET wrapper for Tesseract, created a simple project and managed to convert clear text into letters. However, there were memory issues at times, which indicated errors with the wrapper that was being used.

The next step was trying to port this to Windows Phone. After building several libraries, the author discovered a fundamental flaw: Windows Phone does not accept code in C++ or C, only C#. Since Tesseract had been written completely in C++ and C, it could never be run on the then-current version of Windows Phone (7.5). However, Microsoft promised C and C++ support in Windows Phone 8, which was released to the public on 29th October 2012.

Therefore, the team would have to move to a platform which accepted native code and allowed the use of the Tesseract engine.

3.3 Moving to Android

Moving to iOS was out of the question: there was no way the team could get it to work with all the restrictions Apple Inc. imposes. In addition, the prerequisite of having a Mac to code for iOS made this an even easier decision. Thus, the resultant move was to port to the Android platform.

The biggest challenge here was getting used to the Java programming language, which Android uses. In addition, since Tesseract is written in C++, the team would have to use one additional layer to make the C++ code callable from Java managed code. This layer is the JNI, and it is built using the Android NDK.

As seen in figure 11, the NDK compiles the C++ code into a native library that is then exposed to Java through the JNI. This was essential for Tesseract. To make this work, the author also had to use Apache Ant, a Java build tool that can combine the JNI and Java layers and build a final application compatible with Android and other Java projects.

Once this was ready, there was a Tesseract engine library working on Android. The remaining task was simply to make it work in the final application.
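As a generic illustration of this JNI layer (the library and method names below are placeholders, not the actual Scope or Tesseract bindings), the Java side of such a binding typically declares native methods and loads the shared library produced by the NDK:

    // Generic JNI sketch; "scopeocr" and nativeRecognise() are hypothetical names.
    public class NativeOcrBridge {
        static {
            System.loadLibrary("scopeocr");   // loads libscopeocr.so built by the NDK
        }

        // Implemented in C++ on the other side of the JNI boundary.
        public static native String nativeRecognise(byte[] imageBytes, int width, int height);
    }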


Figure 11: Android Native Development Kit [35]

3.4 Application Workflow

The workflow for the app is given in figure 12.

Upon receiving the photographed or uploaded image, the application proceeds to perform an edge detection operation to identify the portion of the card in the image. Following this, an automatic rotation is carried out which aims to align the identified card parallel with the horizontal plane. This is done in order to facilitate effective text recognition. The cropped image of the card is then passed through functions which segregate the card in terms of differences in background colours, for the case of cards containing coloured patterns. These segments are then subjected to image processing filters which clean the image by removing noise and undesirable pixels or dust. The cleaned segments are further divided into smaller segments based on the text on the card, which are grouped together in a process known as text segmentation. The various text segments are then fed to the Tesseract OCR engine, which recognises and converts the segments to machine-readable text. The final step involves parsing through the text which has been retrieved from these segments and categorising it into the various pieces of contact information such as name, address, e-mail, website, fax and phone number.
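Expressed as a rough Java outline (every helper class and method name below is a hypothetical placeholder for the stages just described, not an actual Scope class), the workflow of figure 12 reads approximately as follows:

    // Hypothetical outline of the workflow in figure 12; names are illustrative only.
    Contact digitise(Bitmap inputPhoto) {
        Bitmap card    = EdgeDetector.cropToCard(inputPhoto);            // locate and crop the card
        Bitmap aligned = Rotator.alignToHorizontal(card);                // automatic rotation
        List<String> recognised = new ArrayList<>();
        for (Bitmap region : BackgroundSegmenter.splitByColour(aligned)) {
            Bitmap cleaned = NoiseFilters.smooth(region);                // pre-processing filters
            for (Bitmap block : TextSegmenter.extract(cleaned)) {        // text segmentation and clustering
                recognised.add(tesseract.recognise(block));              // OCR on each text segment
            }
        }
        return ContactParser.parse(recognised);                          // name, phone, e-mail, address, ...
    }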

Figure 12: Scope application workflow

The author of this document is a member of the team of four that was involved in the development of this application. In this workflow, the work was divided equally amongst the four team mates, who did advance research and created the relevant algorithms to accomplish their respective sections. The author of this thesis took on the following tasks:

• Creation of layouts and managing integration within the Android App

• Background segmentation

• Text segmentation

• Segment clustering

• Image and memory management in app

• Integrating Tesseract with Android and optimizing performance

The rest of this report logically analyses and covers these various tasks in due detail. The final section of

the report also covers the results of this app after the entire workflow has been implemented and run together

for images.

4 Image Processing

The first task for the project was the introduction of image processing filters to the system. In a system where the accuracy of the result depends highly on the quality of the image and how clean it is, it is important to perform some pre-processing on the image before passing it into the Tesseract engine. To do this in a diverse and wholesome manner, a set of image processing libraries had to be compiled together in a fashion simple enough to be accessed by the system and, later, by the automation algorithm. This task was undertaken by the author and a fellow team-mate, and the tasks were divided equally between them. This section describes the filters implemented by the author and the rationale behind implementing each one of the filters mentioned.

4.1 Brightness

A general image processing operator is a function that takes one or more input images and produces an output

image. In this kind of image processing transform, each output pixel's value depends only on the corresponding input pixel value (plus, potentially, some globally collected information or parameters) [36].

One commonly used point process is addition with a constant:

g(i, j) = f(i, j) + β (1)

In equation 1, β is the bias parameter, which controls the brightness of the image, g is the output image

matrix, f is the input image matrix and i and j refer to row and column number respectively.

Brightness adjustment is a basic functionality of any image processing class, and a well-brightened image helps enhance the results; therefore the author chose this as one of the implementations.

Figure 13: Application of the Brightness Algorithm
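A minimal OpenCV (Java) sketch of equation 1 is given below; it assumes the image has already been loaded into a Mat, and the class and method names are illustrative only.

    import org.opencv.core.Mat;

    // Sketch of equation (1): every output pixel is the input pixel plus a constant
    // bias beta, with values saturated to the valid range automatically by OpenCV.
    public final class BrightnessSketch {
        public static Mat brighten(Mat input, double beta) {
            Mat output = new Mat();
            input.convertTo(output, -1, 1.0, beta);   // dst = 1.0 * src + beta; -1 keeps the input type
            return output;
        }
    }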

4.2 Smoothing Filters

Smoothing, also called blurring, is a simple and frequently used image processing operation. There are many

reasons for smoothing. For Scope, the main purpose of smoothing is to reduce noise in a picture and thereby ensure its smoothness. This naturally leads to better results in OCR reading due to the balance between

varying pixels in the image.


To perform a smoothing operation we will apply a filter to our image. The most common type of filters is

linear, in which an output pixel's value is determined as a weighted sum of input pixel values:

g(i, j) = Σ_{k,l} f(i + k, j + l) · h(k, l)   (2)

In equation 2, h(k, l) is the filter kernel, which is nothing more than the coefficients of the filter, while i and j are the row and column indices of the output pixel.

4.2.1 Homogeneous Filters

This filter is the simplest of all. Each output pixel is the mean of its kernel neighbours (all of them contribute

with equal weights)[37]. This results in a simple matrix kernel, which looks like:

κ = 1 / (κ_width · κ_height) ×
⎡ 1 ⋯ 1 ⎤
⎢ ⋮ ⋱ ⋮ ⎥
⎣ 1 ⋯ 1 ⎦   (3)

The result of a Homogeneous Filter is shown in figure 14.


Figure 14: Application of the Homogeneous Filter

4.2.2 Gaussian Filters

Gaussian filtering is done by convolving each point in the input array with a Gaussian kernel and then summing

them all to produce the output array. To understand what a Gaussian kernel is like, we can imagine a 1-D image,

and the kernel looks like a Gaussian curve. This is shown in figure 15.

Figure 15: 1D Gaussian Kernel

The weight of its neighbours decreases as the spatial distance between them and the centre pixel increases[38].

This concept is the same for a Gaussian kernel in 2-D, representing a 2-D surface that peaks at the centre of the


2-D space. The equation for calculating a Gaussian kernel is therefore given by the normal distribution in equation 4.

G₀(x, y) = A · exp( −(x − µ_x)² / (2σ_x²) − (y − µ_y)² / (2σ_y²) )   (4)

In equation 4, µ is the mean and σ² is the variance for each of the variables x and y. An applied Gaussian filter is shown in figure 16.

Figure 16: Application of the Gaussian Filter

4.2.3 Median Filter

The median filter runs through each element of the signal (in this case the image) and replaces each pixel with

the median of its neighbouring pixels (located in a square neighbourhood around the evaluated pixel). Median

filtering is very widely used in digital image processing because, under certain conditions, it preserves edges

while removing noise [39]. The result of a Median filter is shown in figure 17.


Figure 17: Application of a Median Filter


4.2.4 Bilateral Filter

Sometimes filters not only dissolve the noise, but also smooth away the edges. To avoid this (to a certain extent at least), we can use a bilateral filter. In a manner analogous to the Gaussian filter, the bilateral filter also

considers the neighbouring pixels with weights assigned to each of them. These weights have two components,

the first of which is the same weighting used by the Gaussian filter. The second component takes into account

the difference in intensity between the neighbouring pixels and the evaluated one[40].

The basic idea underlying bilateral filtering is to do in the range of an image what traditional filters do in

its domain. Two pixels can be close to one another, that is, occupy nearby spatial location, or they can be

similar to one another, that is, have nearby values, possibly in a perceptually meaningful fashion. It replaces

the pixel value at x with an average of similar and nearby pixel values. In smooth regions, pixel values in a


small neighbourhood are similar to each other, and the bilateral filter acts essentially as a standard domain filter,

averaging away the small, weakly correlated differences between pixel values caused by noise. The result of a

Bilateral filter is shown in figure 18.

Figure 18: Application of a Bilateral Filter
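Purely as a hedged sketch (variable names and kernel sizes are illustrative assumptions, not the values used in the app), the four smoothing filters described above map onto standard OpenCV-for-Android calls:

import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

// 'src' is assumed to be an 8-bit, 3-channel Mat of the card image.
Mat dst = new Mat();
Imgproc.blur(src, dst, new Size(5, 5));             // homogeneous (mean) filter, equation 3
Imgproc.GaussianBlur(src, dst, new Size(5, 5), 0);  // Gaussian filter, equation 4
Imgproc.medianBlur(src, dst, 5);                    // median filter over a 5x5 neighbourhood
Imgproc.bilateralFilter(src, dst, 9, 75, 75);       // bilateral filter: diameter 9, colour and space sigmas 75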

4.3 Combining Filters

While the author was developing the filters mentioned in the previous section, a fellow team-mate was concurrently implementing the second half of the image processing filters for the team's self-created image

processing library. The other functions implemented are listed as follows:

• Contrast

• Greyscale

• Histogram Equaliser

• Morphing filters


• Pyramids

• Thresholding Filter.

The team agreed that this large collection of filters should serve their initial tests well enough. There was

now a need to find a way to test all functions independently and together.

4.4 ScopeMate

ScopeMate was a separate app developed for the purposes of testing the functionality of Scope. It was developed

so that image processing filters could be dynamically and immediately tested and the OCR results retrieved. The author was solely responsible for creating this test app and combining the image processing library with the Tesseract Engine. In addition, the order in which the filters are applied also affects the final result, so ScopeMate was equipped with an internal mechanism to record the order in which the tester applies the filters. This order and

values are recorded and applied to an image, showing the visual result as well as the results from the testing

process.

Figure 19: ScopeMate main screen

ScopeMate was primarily used by the team's tester (a fellow team-mate), and the calibrated results are explained in that team-mate's report.


5 Image Segmentation

Image segmentation is the process of subdividing a digital image into multiple segments or sub-images. The

goal of segmentation is to streamline the representation of an image into something that is more meaningful and

easier to analyse. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in

images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such

that pixels with the same label share certain visual characteristics [41].

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of

contours extracted from the image. Each of the pixels in a region is similar with respect to some characteristic

or calculated property, such as colour, intensity, or the presence of some common trait[42].

In this project, the author performed two types of image segmentation:

• Background Colour Segmentation

• Text Segmentation

The relevance, objectives, implementation and results of the above methods of segmentation are given in the

sections that follow. It must be noted that, due to the device resource constraints, no existing algorithms were used to perform these operations; it was a complete research effort from the ground up. The workflow for the

image segmentation used in the app is given in figure 20.

Figure 20: Image Segmentation Workflow

5.1 Background Colour Segmentation

Background colour segmentation is the process of segmenting an image based on the trait of background colour.

In business cards, there could be many cases of varying background colours, and it is important to segment by


background colour before performing text segmentation. Usually each background region contains a certain number of text segments, each lying within that background colour. In this section, the author will attempt to

justify the necessity for background segmentation, the algorithms involved, provide a thorough analysis of the

results and explore the future uses and improvements on the idea.

5.1.1 Objectives of Background Colour Segmentation

The Tesseract OCR Engine "learns" an entire image before it performs recognition. For this reason, the more concentrated the image is, in terms of several features, the better the recognition process by the OCR. One

purpose of background segmentation is to provide this concentrated context to the application. Using background

segmentation, the card is divided by background colour and this provides a more concentrated scope of learning

for the OCR engine.

The second objective of background colour segmentation is to separate the business card into its differently

coloured backgrounds because the adaptive thresholding mechanism used for cards with dark backgrounds is different from that used for light-coloured backgrounds. The preliminary identification of the different parts

of the card is performed by the background colour segmentation algorithm. By splitting the card into its different background colours, the strategy is to create a stronger, more customised adaptive thresholding for each

background colour, and thereby create more accurate results.

5.1.2 Alternatives Explored

Histogram Based: Histogram based background segmentation involves the utilisation of an image histogram

to detect the primary intensities of the image. By method of contour analysis, combined with the results of the

histogram, the user is able to make a reasonable guess as to the nature of the backgrounds in the card. Text

is usually a very small representation on the histogram, so by eliminating all histogram peaks that fall below

a set threshold value, the program is able to identify the types of backgrounds, and by using contour analysis

and image masking, accurately point out backgrounds of the image. The problem with using a mere histogram-based approach is that it fails to work for any backgrounds with gradients[43]. Gradients occupy a large number


of peaks in the histogram even though they may be from one background technically. To overcome this, this

method has to be combined with Canny edge detection.

Histogram and Canny Detection Based: The Canny Detector takes as input a grayscale image, and produces

as output an image showing the positions of tracked intensity discontinuities. The Canny algorithm performs

various operations. Firstly, since the image is susceptible to noise present in image data, a filter is used where the

raw image is convolved with a Gaussian filter. The result is a slightly blurred version of the original which is not

affected by a single noisy pixel to any significant degree. The edge detection operator then returns a value for the

first derivative in the horizontal (Gx) and vertical (Gy) directions. From this the edge gradient magnitude and

direction can be found. The direction is rounded to one of four angles (0, 45, 90 and 135 degrees) representing

vertical, horizontal and diagonal directions. Given estimates of the image gradients, a search is then carried out

to determine if the gradient magnitude assumes a local maximum in the gradient direction, a process known as

non-maximal suppression[44]. The tracking process exhibits hysteresis controlled by two thresholds: T1 and

T2, with T1 greater than T2. Tracking can only begin at a point on a ridge higher than T1. Tracking then

continues in both directions out from that point until the height of the ridge falls below T2. This hysteresis helps

to ensure that noisy edges are not broken up into multiple edge fragments. Thus the Canny Detector is great

for identifying squares with gradient shading in the image. By using this Canny detector, we are able to find

edges of the card with similar gradients. This helps split the image into various edges. By applying histogram

analysis to each bounded contour, we are able to identify that contour's primary background colour and apply

our masking technique as before to accurately identify the different segments in the card. By this method, the

combination of Canny detection and histograms yields a more accurate result. Unfortunately, this is a more

expensive and time consuming process primarily because of the addition of Canny detection. As the mobile

phone is a limited resource environment, usage of this technique must be carefully justified. The concepts for

both the above algorithms were developed by the author of this thesis.
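For reference, a minimal sketch of how the Canny step could be invoked through OpenCV's Java bindings is given below; the two threshold values are illustrative assumptions only.

import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

// 'gray' is assumed to be an 8-bit grayscale Mat of the card.
Mat edges = new Mat();
double t2 = 50;                       // lower hysteresis threshold
double t1 = 150;                      // upper hysteresis threshold (T1 > T2, as described above)
Imgproc.Canny(gray, edges, t2, t1);   // 'edges' holds the tracked intensity discontinuities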

Background Removal: The third option for this section would be the removal of background altogether. This

is done by detecting the backgrounds using the methods above and applying a reverse mask, so as to preserve


the foreground and hide the background. Though this would work quite well, it turns out to be a problem for

cards with darker backgrounds as the text left behind is usually white or closer to white and it would involve yet

another process to fill in the text itself. Again, even though this is possible, the expensive nature of the entire

process must be considered in a limited resource environment like the Android OS for smartphones, and this

leads one to believe that this method may not be suitable.

5.1.3 Background colour Segmentation Algorithm

The author decided to choose the histogram based background segmentation. The rationale behind this was that

the phone has limited resources, and this was still the pre-processing stage of the application. There was still

adaptive thresholding and text segmentation to go before the OCR engine was activated and all the following

processes would need more memory and processing power, especially the Tesseract OCR engine. In light of this, the background segmentation approach that consumes the least resources would be the most ideal. In addition, the author hypothesized that there aren't very many business cards with gradient backgrounds. Usually professional organisations do not use linear gradients in business cards as it gives off an aura of unprofessionalism, and with

this assumption, using the histogram based background segmentation method makes most sense. A workflow

for this algorithm is given in figure 21.

Figure 21: Background Segmentation Workflow

To highlight the algorithm, the author is using the example of the NUS card given in figure 22.

The first step in the process is grayscaling. With the help of our custom built library on top of OpenCV, this

is made very easy and results in a grayed out version of figure 22. The next step is harder, requiring a histogram

analysis. To calculate the histogram, we use the OpenCV library again.

Figure 22: NUS card used for this example

Histograms are collected counts of data organized into a set of predefined bins. Since it is known that the range of information values for this case is 256, we can segment the range into subparts (called bins). This is illustrated in equation 5.

[0, 255] = [0, 15] ∪ [16, 31] ∪ ... ∪ [240, 255]
range = bin_1 ∪ bin_2 ∪ ... ∪ bin_{n=15}   (5)

The number of pixels that fall in the range of each particular bin is kept count of. Applying this to the NUS

card, the histogram analysis yields the histogram shown in figure 23.

The histogram analysis tells the user that there are two primary peaks in the NUS card. To be safe, we run it through an auto-analysis algorithm, where the value of the tallest peak is taken, and anything above 30% of that value is considered a valid peak. Everything else is removed. From figure 23, the peaks at values 240-255

and 112-127 are successfully selected from this analysis. The author has labelled this process as background

detection.

The next step in the process is to apply image masking. The idea of masking is that each pixel's value in an image is recalculated according to a mask matrix (also known as a kernel). This mask holds values that adjust how much influence neighbouring pixels (and the current pixel) have on the new pixel value. In this algorithm, the author has applied a mask for each bin range, that is, a separate mask from 112 to 127 and another mask from

240 to 255. After masking, the figures are given in figures 24 and 25 respectively.

Figure 23: Histogram of NUS card

Figure 24: Image masking applied to histogram bin from 112 to 127

Figure 25: Image masking applied to histogram bin from 240 to 255

Following masking, each masked image undergoes a contour analysis. The purpose of this is simple: after masking, we have removed all the unnecessary parts of the image and converted them to black. The largest

remaining white area will now be the desired background. To identify this, a contour analysis will provide a

set of contours, and the largest contour will be the large white background area visually recognisable after the

masking. The contour analyses for both images are given in figures 26 and 27. With this, it is clearly evident that

the masking has definitely helped identify the correct backgrounds from the card.

Figure 26: Contour analysis applied to histogram bin from 112 to 127

With this the correct background colours and segments have been identified. This can now be separated into

sub-images using OpenCV's Region of Interest functionality. More about this is given in the Region of Interest section below.

Figure 27: Contour analysis applied to histogram bin from 240 to 255

The results and testing of this algorithm are covered in Part III of this thesis.
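The following is a minimal, hedged sketch of how this histogram-and-masking pipeline could be expressed with OpenCV's Java bindings. The bin width of 16, the 30% rule and all variable names are illustrative assumptions and not the app's exact code.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;

// 'card' is assumed to be the captured business card as an 8-bit BGR Mat.
Mat gray = new Mat();
Imgproc.cvtColor(card, gray, Imgproc.COLOR_BGR2GRAY);

// 16 bins of width 16 over [0, 255], as in equation 5.
Mat hist = new Mat();
Imgproc.calcHist(Arrays.asList(gray), new MatOfInt(0), new Mat(),
        hist, new MatOfInt(16), new MatOfFloat(0f, 256f));

double tallest = Core.minMaxLoc(hist).maxVal;
for (int bin = 0; bin < 16; bin++) {
    if (hist.get(bin, 0)[0] < 0.3 * tallest) continue;   // the 30% peak rule (background detection)

    // Mask the pixels falling inside this bin, then find its largest contour.
    Mat mask = new Mat();
    Core.inRange(gray, new Scalar(bin * 16), new Scalar(bin * 16 + 15), mask);
    List<MatOfPoint> contours = new ArrayList<>();
    Imgproc.findContours(mask, contours, new Mat(),
            Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
    // The largest contour approximates this background region; its bounding
    // rectangle can then be extracted as a Region of Interest (section 5.4).
}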

5.2 Text Segmentation

Text segmentation is the second kind of image segmentation that has been performed in this project. Text

segmentation is the method of locating, identifying, and separating the different closely located text areas in an

image. Each segment is then separately analysed by the OCR engine for maximal results.

5.2.1 Objectives of Text Segmentation

The main objective of text segmentation is to provide a simple, concentrated view of just text for the OCR

engine to analyse. It identifies the different parts of the cards containing text and, using the clustering algorithm

explained in the next section, compiles together the different segments in a visually understandable format. As

the OCR engine learns the image before processing results, a smaller area with just text has a higher chance

of success. Text segmentation also acts as an excellent follow up from adaptive thresholding. For some cards,

adaptive thresholding could leave unwarranted pixels around the thresholded image. The text segmentation

ensures that only a tightly boxed-in area is kept, leaving all these dirty pixels out of the final OCR result and therefore rendering a more accurate result.


5.2.2 Alternatives Explored

Erosion-Clustering: The erosion-clustering method of text segmentation applies erosion based pre-processing

techniques to the image to blur together similar neighbouring regions. The idea is to blur all the text areas together close enough to be identified clearly in a condition-based contour analysis. This approach is inexpensive

and works quite well. However, too many dirty pixels can cause a mistaken assessment of the image and the

text segments identified could be wrong. Erosion-clustering was developed by the author of this thesis.

Spectral Clustering: Spectral clustering makes use of the image like a spectrum to identify different regions

of the image as a matrix and using vector theory. By constructing a weighted graph and an affinity matrix,

the spectral clustering method computes the diagonal of the matrix and combines it with the Eigen vectors of

the same to create a new spectral vector, which is then used to partition the image [45]. Though this method is powerful, it is still fairly new and hasn't yet gained popularity in recent years. Spectral clustering was developed by students at the Harbin Institute of Technology in China.

Normalized Cut Segmentation: There are also methods of segmentation using pure thresholding, by analysing

what the threshold level required for text segmentation is. NCut or Normalized Cut segmentation figures out

a normalised threshold value of the image for text by treating the image segmentation as a graph partitioning

problem for segmenting the graph, and the threshold value can be used, much like Otsu's threshold over an image

to separate the text[46]. The problem with this is that it does not work very well for images with coloured text

and coloured backgrounds, and seeing that most of the business cards will be in colour, this alternative could

certainly be a problem for this project.

5.2.3 Text Segmentation Algorithm

The chosen method for performing text segmentation was erosion-clustering, which was developed by the author

of the thesis. Besides the innovative appeal, erosion-clustering is also inexpensive in this project as it uses many

readily available OpenCV methods to enhance the image well enough to produce a reasonable result for text

segmentation. The workflow used in this text segmentation method by the author is given in figure 28.


Figure 28: Workflow for Text Segmentation

To explain the text segmentation algorithm, the image in figure 29 can be used. As noticeable, contrary to

the previous section on background segmentation, the author is now using an external card to show that it works

in all scenarios, not just NUS cards. At a mere glance of this image, the visual impression one receives is that this card has 6 text segments. If working correctly, this algorithm, along with the clustering algorithm, will produce exactly that.

Figure 29: Business card used to demonstrate Text Segmentation

The first step is to grayscale the image. This is done using OpenCV's grayscaling function, which provides a

greyed out version of the image in figure 29. Following that, a median blur is applied over the entire card to blur

out any dirt or dust particles that have arrived in the image. The median blur has been explained in the image

processing section above. The image after the median blur is given in figure 30.

Next, a general adaptive threshold is applied all over the image, so that the whites of the image can be

inverted with the outlines of the text. It also further helps clear any remaining dirt pixels from the image. The image now looks as shown in figure 31.


Figure 30: After applying Median Blur

Figure 31: After applying adaptive threshold


Following this, the card is now run through a strong dilate function. Dilation effectively blurs the text regions together so that the text is no longer visually readable. As the goal of the whole process is to get blurred lumps of what was text, this step is the most useful in the process so far.

Figure 32: After applying strong dilation

The next step is to apply a relatively weak erosion with a large kernel. The reason for doing this is to re-open spaces in images that have been over-dilated by the previous step. It doesn't affect images that have dilated correctly, such as this one; thus the result in figure 33 isn't that different from the previous image. In some other cards, it plays a bigger role and can be a differentiating factor for perfect text segmentation.

Figure 33: After applying weak erosion


Following this step, the image is run through a contour analysis. All contours found are set to be bounded

by a rectangle. This roughly generates many contours that overlap with one another, as seen in figure 34.

Figure 34: Full contour analysis of card

The next thing to do is to clean the image. A cleansing process is applied throughout the image to remove contours that may be the entire card and contours that may be dust particles. This is done by dynamically checking the largest contours available and thresholding their sizes against the total area of the image.

By applying the cleaner and also removing any contours within contours, the image generated is as per figure

35.

Figure 35: Text segmentation without clustering


The text segments have now been identified and the text segmentation process is done. However, there are 6 visual segments, yet by looking at figure 35 there are certainly more than 6 identified segments. The next section of this report deals with the algorithm used to cluster all the neighbouring text

segments together.
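As a hedged sketch only (all kernel sizes and area thresholds are illustrative assumptions, not the tuned values used in the app), the erosion-clustering pre-processing chain could be written with OpenCV's Java bindings as follows:

import java.util.ArrayList;
import java.util.List;
import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;

// 'card' is assumed to be an 8-bit BGR Mat of the photographed card.
Mat work = new Mat();
Imgproc.cvtColor(card, work, Imgproc.COLOR_BGR2GRAY);             // grayscale
Imgproc.medianBlur(work, work, 5);                                // blur out dust and dirt specks
Imgproc.adaptiveThreshold(work, work, 255,
        Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY_INV, 15, 4);
Imgproc.dilate(work, work,
        Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(17, 9)));  // strong dilation: smear text into lumps
Imgproc.erode(work, work,
        Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(21, 3)));  // weaker erosion to re-open over-dilated gaps

List<MatOfPoint> contours = new ArrayList<>();
Imgproc.findContours(work, contours, new Mat(),
        Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);
double cardArea = card.rows() * card.cols();
for (MatOfPoint c : contours) {
    Rect box = Imgproc.boundingRect(c);
    if (box.area() > 0.9 * cardArea || box.area() < 0.0005 * cardArea) continue;  // drop whole-card and dust contours
    // 'box' is a candidate text segment; the surviving boxes are clustered in section 5.3.
}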

5.3 Clustering

Clustering is the process of radially collecting information about the closest objects and merging them together. In

this project, clustering is directly relevant to text segmentation. By using the clustering algorithm developed by

the author of this thesis, the text segmentation process is completed and reaches the recommended results.

5.3.1 Objectives of Clustering

The clustering algorithm comes into play in this context in the scenario shown in figure 36.

Figure 36: Clustering feedback loop

Clustering behaves as a feedback loop. This means that, if there are any overlapping segments, they will be sent through another pass of the clustering algorithm to be merged together. This is discussed in the further improvements

section below.

5.3.2 Future Improvements

Future improvements to the clustering algorithm could be to use a recursive check. This will mean that instead

of a feedback loop, a single recursive loop checks for predicted overlaps and clusters the relevant portions

together.


Figure 37: Clustering algorithm issue

Another issue in the current clustering algorithm is illustrated in figure 37. In the mentioned figure, the blue

border represents the edges of the image. In this instance assume A, B and C are actual text regions, while D

is a logo. Due to the clustering algorithm, even though A, B and C segment successfully, D overlaps the new segmentation, and by virtue of the feedback loop, D too will now be clustered with A, B and C. This would mean that almost the entire card would be sent back in without any segmentation. This issue still prevails in the

existing algorithm and should be fixed in the near future. A viable solution would be to identify that D is a logo

and prevent clustering when that identification is successful, but it takes too much time and effort to be able to

implement this in the time span of this project.

5.3.3 Clustering Algorithm

The clustering algorithm works on a distance-check mechanism and then populates a list of queues according to which cluster each rectangle belongs to. For all the corners of every rectangle in the image, the distances to other rectangles are taken and compared. If the distance is shorter than a specified minimum distance, then the

clustering algorithm adds it to the relevant clustering queue. The flowchart given in figure 38 describes the

workings of this algorithm.
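The listing below is a simplified, single-pass sketch of this idea; the class and method names and the grouping strategy are illustrative assumptions, not the app's exact code. The actual algorithm follows the flowchart in figure 38 and, as noted in section 7.2.1, uses a minimum distance of 5% of the card diagonal.

import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Point;
import org.opencv.core.Rect;

class SegmentClusterer {

    // Group segment rectangles whose closest corners lie within 'minDist' of each other.
    static List<List<Rect>> cluster(List<Rect> boxes, double minDist) {
        List<List<Rect>> clusters = new ArrayList<>();
        boolean[] assigned = new boolean[boxes.size()];
        for (int i = 0; i < boxes.size(); i++) {
            if (assigned[i]) continue;
            List<Rect> queue = new ArrayList<>();       // the clustering queue for this seed rectangle
            queue.add(boxes.get(i));
            assigned[i] = true;
            for (int j = i + 1; j < boxes.size(); j++) {
                if (!assigned[j] && cornerDistance(boxes.get(i), boxes.get(j)) < minDist) {
                    queue.add(boxes.get(j));
                    assigned[j] = true;
                }
            }
            clusters.add(queue);                        // each queue is later merged into one bounding box
        }
        return clusters;
    }

    // Smallest distance between any pair of corners of the two rectangles.
    static double cornerDistance(Rect a, Rect b) {
        Point[] ca = {a.tl(), a.br(), new Point(a.x + a.width, a.y), new Point(a.x, a.y + a.height)};
        Point[] cb = {b.tl(), b.br(), new Point(b.x + b.width, b.y), new Point(b.x, b.y + b.height)};
        double best = Double.MAX_VALUE;
        for (Point p : ca)
            for (Point q : cb)
                best = Math.min(best, Math.hypot(p.x - q.x, p.y - q.y));
        return best;
    }
}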

When clustering is applied to the card given in figure 35, the text segmentation can now be successfully completed.

After applying the clustering algorithm, the result is as in figure 39. This results in the 6 segments that were

predicted visually and can now be separated into sub images using the Region of Interest.


Figure 38: Clustering algorithm flowchart


Figure 39: Text segmentation with clustering applied

5.4 Region of Interest

A region of interest (ROI) is a sub-matrix that has to be extracted from within a matrix. In segmentation, this is

extremely important as the card is being split into segments. Almost all OpenCV functions support working with an ROI, operating only on the selected image area, which helps speed up the algorithms. Thus, if only a specific area is needed, it can be extracted and worked with without affecting the whole image.

Figure 40: Region of Interest

To use ROI in Android with OpenCV, the following simple code snippet can be used.

// Create a region of interest and save it as a separate bitmap.
// performCrop() is the app's helper that extracts the (x, y, width, height)
// sub-matrix from sourceImageMat (e.g. via Mat.submat()).
Mat cropped = performCrop(x, y, width, height, sourceImageMat);
destImage = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888);
Utils.matToBitmap(cropped.clone(), destImage);


6 Optical Character Recognition in Android

6.1 Managing Images in Android

There are a number of reasons why loading bitmaps in Android applications is tricky. Some of them are:

• Mobile devices typically have constrained system resources. Android devices can have as little as 16MB

of memory available to a single application.

• Bitmaps take up a lot of memory, especially for rich images like photographs. If the bitmap configuration

used is ARGB_8888 (the default from Android 2.3 onward), then loading a full-resolution photograph into memory

takes about 19MB of memory (2592*1936*4 bytes), immediately exhausting the per-app limit on some

devices

• Android app UIs frequently require several bitmaps to be loaded at once. Components such as ListView,

GridView and ViewPager commonly include multiple bitmaps on-screen at once, with many more potentially off-screen ready to show at the flick of a finger.

Since there is very limited memory, ideally a lower resolution version has to be loaded in memory[47]. The

lower resolution version should match the size of the UI component that displays it. An image with a higher

resolution does not provide any visible benefit, but still takes up precious memory and incurs additional performance overhead due to on-the-fly scaling. In the event that memory is not managed correctly, the virtual machine heap overflows and the following dreaded message crashes the entire app:

java.lang.OutOfMemoryError: bitmap size exceeds VM budget

To implement this on the fly across the varying types of Android phones, the author has implemented something called a Bitmap Handler. In creating the handler, the factors to be considered were:

• Estimated memory usage of loading the full image in memory.

• Amount of memory willing to commit to loading this image given any other memory requirements of the

app.


• Dimensions of the target UI component that the image is to be loaded into.

• Screen size and density of the current device.

To ensure this, the bitmap handler first detects the screen size of the phone and sets appropriate dimensions.

Following this, it runs through a feedback loop with the image, where if the image is too big, it will be scaled

down proportionally. After it has reached the correct scaled size, it is allowed to be passed into the app. The

bitmap handler comes with 3 settings: high resolution, medium resolution and low resolution, though the author

is only using the high resolution option in most of the app. Lower resolution yields less accuracy, but faster

processing time and this has been explored in Chapter 8 in detail.

The bitmap handler has been developed by the author of this thesis. Figures 41, 42 and 43 show examples

of how the different resolutions are changed and handled by the bitmap handler for the same image.
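A hedged sketch of the standard Android technique such a handler could rely on is shown below; the class and method names and the target dimensions are assumptions, while BitmapFactory.Options and inSampleSize come from the Android SDK.

import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

class BitmapHandlerSketch {

    // Decode a photo at a reduced resolution so that it fits the target dimensions.
    // In the app, the target dimensions would come from the detected screen size
    // and the chosen resolution setting (high, medium or low).
    static Bitmap decodeScaled(String path, int reqWidth, int reqHeight) {
        BitmapFactory.Options opts = new BitmapFactory.Options();
        opts.inJustDecodeBounds = true;          // first pass: read only the image dimensions
        BitmapFactory.decodeFile(path, opts);

        int inSampleSize = 1;                    // power-of-two down-sampling factor
        while (opts.outWidth / (inSampleSize * 2) >= reqWidth
                && opts.outHeight / (inSampleSize * 2) >= reqHeight) {
            inSampleSize *= 2;
        }

        opts.inJustDecodeBounds = false;         // second pass: decode the scaled-down bitmap
        opts.inSampleSize = inSampleSize;
        return BitmapFactory.decodeFile(path, opts);
    }
}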

Figure 41: Bitmap handler - high resolution

6.2 Integrating Tesseract with Android

To integrate Tesseract into Android, a Java-based fork of the Tesseract OCR Engine created by Robert Theis

on GitHub was used. It is based upon the tesseract-android-tools project which is a reference from the original

Tesseract website. It is a project built with the Android Native Development Kit (NDK), with a Java-based API. By compiling it and running the local NDK build, the project is converted into an Android-compatible library.

Figure 42: Bitmap handler - medium resolution

Figure 43: Bitmap handler - low resolution

Being a Windows user, the author needed a Linux-like environment in order to run the build scripts. To do this, the

author installed Cygwin and used that as a Linux layer to carry out the necessary functionality. Cygwin is a

collection of tools which provide a Linux look and feel environment for Windows and acts as a Linux API layer

providing substantial Linux API functionality[48]. By installing the gcc-core, gcc-g++, make, swig libraries

into Cygwin, the author was able to acquire an environment in which the first, native stage of the NDK build could be run. Once this completed, the C++ side of the project was fully built. The process took a little over 50 minutes.

The next step is to compile the C++ side and integrate it into the Java API and make a Java library. To do this,

the author used a tool called Apache Ant. Apache Ant is a Java library and command-line tool whose mission

is to drive processes described in build files as targets and extension points dependent upon each other. The

main known usage of Ant is the building of Java applications. Ant supplies a number of built-in tasks allowing one to compile, assemble, test and run Java applications[49]. Ant can also be used effectively to build non-Java

applications, for instance C or C++ applications. By running it with the two tools, the author was able to

convert the project into an Android library. From this point, it was as simple as importing it as a reference into

the Scope app and marking it as a library. Figure 44 shows all the libraries for Scope correctly referenced after

compilation. As visible, Scope is compatible only with Android OS version 4.0 (Ice Cream Sandwich) and upwards, thereby comfortably meeting the minimum requirement for Tesseract on Android, which is 2.2.
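Once referenced as a library, the fork exposes its functionality through a Java API. As an illustrative sketch (the data path and bitmap variable are assumptions, not the app's actual code), a single recognition call could look like this:

import com.googlecode.tesseract.android.TessBaseAPI;

// 'dataPath' must point to a directory containing tessdata/eng.traineddata.
TessBaseAPI ocr = new TessBaseAPI();
ocr.init(dataPath, "eng");       // load the English language data
ocr.setImage(segmentBitmap);     // a pre-processed segment produced by the earlier stages
String text = ocr.getUTF8Text(); // machine-readable text for the parser
ocr.end();                       // release native resources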

6.3 Multi-threading

Scope is architected as a workflow. Each step has to be completed before it reaches the next step and acts as

the input for the next process. In this way, it follows what is referred to as a chain model. When the final

data reaches the OCR Engine, it is in the form of several separate adaptively thresholded images. As the OCR Engine can be instantiated only once, careful architecture is needed so that resources are used to the maximum whilst getting the fastest possible results. Assume there are 6 resulting segments after all the pre-

processing techniques have been completed. These 6 segments can be passed into the OCR one at a time in two

different ways. The author's first method of implementation was to use a linear chain of asynchronous tasks, as

represented in figure 45.


Figure 44: Scope integrated with Tesseract and OpenCV libraries

Figure 45: Single-threaded model


In this method, which is called the single-threaded model, the segments pass through the OCR Engine one at a time. For 6 segments, the time went up to 152 seconds before being sent to the contacts parser! This was completely inefficient for the app and therefore the author decided to introduce multithreading into the app. The

multi-threaded model is represented in figure 46.

Figure 46: Multi-threaded model

In the multi-threaded model, all the segments are treated in parallel. The instantiation reference object identifies the phone's resources at that moment and how many threads can be spawned to run an instance of Tesseract.

By instantiating Tesseract outside the multithreaded model[50], the engine has to only be instantiated once, and

various clones of the references are passed to each thread, thereby saving valuable time constructing the engine.

In addition, the maximum possible number of segments is processed in parallel, therefore shortening the entire

time. When trying the multi-threaded model, the time for the same card was now an astounding 87 seconds.

The performance had improved by 42.7%. This is a tremendous improvement and, even though it is still slow, it is limited by the fact that the app is offline. In this environment, 87 seconds is a very reasonable time for the Engine to produce results in. Therefore, the current model used in the app is the multithreaded model, based

on the optimisation that it renders.
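As an illustrative sketch only (the thread count, class names and the way the engine is shared are assumptions, not the app's exact code), the parallel pass over the segments could be organised with an ExecutorService. For simplicity, each worker below creates its own engine instance rather than receiving a clone of a single shared reference as described above.

import android.graphics.Bitmap;
import com.googlecode.tesseract.android.TessBaseAPI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelOcrSketch {

    // Run OCR over all pre-processed segments in parallel and collect the text.
    static List<String> recogniseAll(List<Bitmap> segments, final String dataPath) throws Exception {
        int threads = Math.min(segments.size(), Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<>();
        for (final Bitmap segment : segments) {
            futures.add(pool.submit(new Callable<String>() {
                @Override public String call() {
                    TessBaseAPI ocr = new TessBaseAPI();
                    ocr.init(dataPath, "eng");
                    ocr.setImage(segment);
                    String text = ocr.getUTF8Text();
                    ocr.end();
                    return text;
                }
            }));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) results.add(f.get());   // wait for every segment
        pool.shutdown();
        return results;
    }
}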

6.4 Using the CUBE libraries

As discussed in the literature review, the current OCR detection using Tesseract 3.02 simply translates the image to text, but does not take into account the relationships between identified letters and word formations in order to provide an intelligent result. The English CUBE libraries consist of seven different files [51].


• eng.cube.bigrams – helps automatically correct the identification of the most commonly found bigrams in detected text.

• eng.cube.fold – auto-formats the detected document into sections, lists, and paragraphs.

• eng.cube.lm – trains Tesseract to identify special characters along with letters and numbers.

• eng.cube.params – defines a collated list of global OCR parameters, such as max word aspect ratio and max segments per character, for faster OCR.

• eng.cube.size – automatically grids large images into smaller portions for faster OCR.

• eng.cube.word-freq – uses word-frequency data from the identified text to correct similar words that continue to appear in the image.

• eng.osd.traineddata – automatically corrects the orientation of the image if it is not top-down. This increases usability for this application's users, especially when taking pictures with their phones.

Figure 47: Tesseract with OSD segmentation

These libraries are however too big to be loaded every time an OCR request comes up. As a solution

to this problem, the CUBE libraries are asynchronously loaded along with the splash screen, right before the

application begins. This method takes place in a background thread so that the user doesn't notice the lag when he/she uses the application. The libraries are permanently stored in a small space on the memory card, so that there is no delay in loading them after first-time use.
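As a minimal sketch of this idea (file names, paths and the helper name are assumptions, not the app's actual code), each CUBE file could be copied from the app's packaged assets to the memory card once, on a background thread behind the splash screen:

import android.content.Context;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class CubeLoaderSketch {

    // Copy one CUBE/traineddata asset into <dataPath>/tessdata/ if it is not already there,
    // so later OCR requests find it without any loading delay.
    static void copyAssetIfMissing(Context ctx, String dataPath, String fileName) throws IOException {
        File out = new File(dataPath + "/tessdata/", fileName);
        if (out.exists()) return;                    // already copied on a previous run
        out.getParentFile().mkdirs();
        InputStream in = ctx.getAssets().open("tessdata/" + fileName);
        OutputStream os = new FileOutputStream(out);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) > 0) os.write(buf, 0, n);
        os.close();
        in.close();
    }
}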

Figure 48: Asynchronous loading of CUBE libraries in Scope

Figure 49: Effect of CUBE libraries: (a) Original Text (b) Only Tesseract (c) CUBE and Tesseract together

The seven CUBE files that have been implemented in this project provide effective post-processing and segmentation capabilities that give a substantial boost to Tesseract's performance. One such example of CUBE's performance improvement over stand-alone Tesseract is illustrated in figure 49.


Part III.

Results


7 Segmentation Results

7.1 Background Colour Segmentation

7.1.1 Results of Algorithm

Figure 50: Working test case: Background Segmentation

A perfectly working example of background segmentation is given in figure 50. This image is correctly segmented into its two predominant colours by the background segmentation algorithm. As seen in the result on the right, the split into the correct background segments is indicated by the coloured boxes, representing

contours around the segments.

Figure 51: Boundary test case: Background Segmentation

In figure 51, a boundary case is presented. This is defined as a boundary case because it is a very special

type of card that requires a non-rectangular segmentation cut. Since this is not possible using OpenCV, a part

of the card is common to both segments: the part with the name. This would result in this segment being sent into the OCR engine twice, and a check should be done on the smart parser side.

Figure 52: Failing test case: Background Segmentation

The third test case of background segmentation is a failing case. This is a very special case that fails because

the colour of the logo on the second segment is in the same bin as the background colour of the first segment.

Thus, when finding contours, this too gets included into the final contour analysis and skews the image. This

should be marked as an issue to fix, and has been discussed in the improvements section.

7.1.2 Future Improvements

Background segmentation can be improved in 3 ways.

Firstly, the 30% rule implemented to find which parts of the histogram are backgrounds should be changed

into a dynamic analysis of the histogram. The algorithm should also check if immediately neighbouring bins

are also meeting the 30% requirement and should cluster them together in customised and varying bin sizes.

This will prevent overlaps between similarly coloured backgrounds.

The second improvement will help the problem in figure 51. This would involve changing rectangular

segments into more compound objects therefore preventing overlaps, and helping the results as a whole. As

OpenCV provides only a rectangular extractable region of interest, some sort of overlap splitter algorithm must

be developed for this cause.

The third suggested improvement addresses the failing case seen in figure 52. In this very special case scenario,

it is important to identify that the background segmentation has failed, take the common overlap area as one

segment and the balance as the other. This check would work in this case, as the overlap occurs due to colour

mixing, and the future iteration of this algorithm should focus on this case.

7.2 Text Segmentation

7.2.1 Results of Algorithm

The text segmentation algorithm, usage and purpose have been discussed in Part II of this document. This

section explores some of the results obtained by the text segmentation algorithm and failing cases have been

displayed and discussed as well. The first example discussed is that in figure 53.

Figure 53: Working test case: Text Segmentation

This card is an example of a successful text segmentation and clustering. Visually, there are 3 segments in the card. This translates perfectly, as the image on the right shows the three identified segments correctly encapsulated within the specified areas. This is another example of a successful test case of the text segmentation algorithm, and this should

yield better results than the image sent without text segmentation. To check the effect of text segmentation, the

author sent the same card in with and without text segmentation and analysed the results. The results are shown

in the following table.

Without Text Segmentation With Text Segmentation

Accuracy (%) 65.2 82.8

Table 2: Effect of text segmentation on result accuracy

This shows that text segmentation does play a big part in obtaining accurate results in the system, and a


successful segmentation can improve results in a big way.

Figure 54: Boundary test case: Text Segmentation

The boundary case for text segmentation is explained by figure 54. In this instance, the text segmenter

wrongly takes a larger segment than it actually is (as the two segments are close by) and this overlaps with

an existing segment. Due to this, the feedback loop kicks into place and performs clustering, which leads to the

image below the two images above. This basically returns the original card, in a smaller area. In this case, the text segmentation has not been effective enough to analyse what has happened, and has therefore not worked perfectly, but it hasn't failed either.

The failing case example is the card in figure 55. In this instance, no segments have been recognised correctly, and the feedback loop doesn't clean up overlaps as it usually would. This usually happens with cards that have a big logo

right in the middle of the card. In this instance, there are many overlaps but none of them meet the minimum

clustering distance requirement, which is 5% of the diagonal of the card. In the improvements section, the

author has discussed what could be done to improve this and fix this behaviour.


Figure 55: Failing test case: Text Segmentation

7.2.2 Future Improvements

Even though text segmentation works very well now, there are still a few fixes that need to be addressed to make it even better. The first is the issue discussed in figure 37. In that figure, it can be seen that the clustering portion of the algorithm does not always do a sound job and at times leads to a result like the one in figure 54. To avoid this, the text segmentation needs to be able to judge whether it is holding too much in its segment and perform a recursive segment validation test. By performing it recursively, it will ensure never to over-compensate its segments, keeping the clustering at the right amounts. The algorithm also needs to look

into moving from just rectangular segments into more compound shaped objects, so as to segment just the right

area.

The second improvement addresses the failing case in figure 55. This case happens because the minimum clustering distance is too small for the image to eventually cluster and yield a result. To fix this, the text

segmentation algorithm must analyse the segments and dynamically create a minimum distance size based on

the average of the smallest distances between corners of the segments. This way, the clustering distance is kept

dynamic and the result accuracy will improve drastically.


8 Performance Results

The first set of test results focuses on the accuracy-speed trade-off that the application faces. As the entire app is offline, loading higher resolution images is more expensive and time-consuming, but the results are

significantly better. The card used for testing this is shown in figure 56.

Figure 56: Card used for performing resolution testing

8.1 Resolution vs. Time

App Process Low Resolution Medium Resolution High Resolution

Edge detection (ms) 8483 26684 69033

Background segmentation (ms) 433 2191 3881

Adaptive threshold (ms) 1935 13380 52258

Text Segmentation (ms) 90 834 1864

OCR Engine (ms) 64337 80862 108009

Total time (ms) 75278 123951 235045

Table 3: Resolution vs. Time Test Results

As expected, the time taken to complete all the processes in the app increases with increasing resolution. These results are graphically demonstrated in figure 57. However, there is a very significant advantage to increasing

the resolution, and this is discussed in the next section.

Figure 57: Graph of resolution testing for speed results

8.2 Resolution vs. Accuracy

Accuracy results vary significantly too. As can be seen below, the actual text results are shown together with a percentage accuracy. The percentage accuracy is obtained by matching every letter against the expected letter and taking the success ratio.
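The report does not spell out the exact matching procedure; the sketch below assumes a simple position-wise comparison after stripping whitespace, which is only one possible way of obtaining such a ratio.

class AccuracySketch {

    // Hedged sketch: percentage of positions where the recognised letter matches the expected one.
    static double accuracy(String expected, String actual) {
        String e = expected.replaceAll("\\s+", "");
        String a = actual.replaceAll("\\s+", "");
        int matches = 0;
        int n = Math.min(e.length(), a.length());
        for (int i = 0; i < n; i++) {
            if (e.charAt(i) == a.charAt(i)) matches++;
        }
        return e.isEmpty() ? 0.0 : 100.0 * matches / e.length();
    }
}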

Expected Result

Cammie TAN

Senior Manager (NUS Career Centre)

NUS Career Centre

Office of Student Affairs

Yusof Ishak House, Level 1

31 Lower Kent Ridge Road, Singapore 119078

Tel: (65) 6516 1278 Fax: (65) 6774 4463


E-mail: [email protected]

Website: www.nus.edu.sg/osa/career

Low Resolution Result

NUS Career Cetttt

Mtxe or ’tuit Mtaita

-hhrmermusN tetelt

" CpNerkent bdge _ bogapore

Tel (651551s " CacFSib?rqMs

S-ttTad tancan-uivdurs7

vtgbsneevvvv- cdu ’Wosaxareer

Accuracy: 35.27%

Medium Resolution Result

Cammie TAN

Senior Manager tNUS Ci"rser Centre)

NUS Career Centre

Office ot Student Affairs

Yusof [shak Hcruse. Level l

31 Lower Kent Ridge Mad, Singapore IT9078

Tel: (65) 6ST6 1278 Fax: (65) 6774 M63

E-mail: [email protected]

Website: wwwsnss.edusgftrsa/career

Accuracy: 77.80%

High Resolution Result


Cammy TAN

Senior Manager (NUS Career Centre)

NUS Career Centre

Office of Student Affairs

Yusof Ishak House, Level 1

31 Lower Kent Ridge Road, Singapore 11S078

Tel: (65) 6516 1278 Fax: (65) 6774 4463

E-mail: [email protected]

Website: vvwvv.nus.edusg/osa/career

Accuracy: 94.7%

From this, we are able to see that resolution highly affects accuracy. The significance of the difference is shown

in the graph in figure 58.

Figure 58: Graph of resolution testing for accuracy results

Even though high resolution is slow, the author's primary focus in this project is accuracy, and therefore all card results shown from here on are in high resolution, which means processing time is quite slow.

This happens due to limited processing power and the fact that the application is running completely offline

with inbuilt libraries.

For the rest of the results section, everything will be assumed to be in high resolution henceforth.


9 App Results

To demonstrate the accuracy and result of the app so far as a whole, various different types of NUS and external

cards with different features have been tested and accuracy results given. The percentage accuracy of the results has been calculated, and justification provided if the result is not good enough. All the cards in this section should be assumed to be at the highest resolution written by the app's customised bitmap handler.

Given below are descriptions of a few of the terms used in the testing process:

• Clear indicates a fairly new card, with clear text and less dirt and crushing

• Unclear indicates an old card, lots of dust, dirt and crushed edges

• Light indicates the type of background colour of the card, being close to white

• Dark indicates cards with darker background colours


9.1 NUS Cards

9.1.1 Clear Card

Figure 59: Image used for NUS Clear Card test

Expected Result Actual Result

Cammie TAN Cammy TAN

Senior Manager (NUS Career Centre) Senior Manager (NUS Career Centre)

NUS Career Centre NUS Career Centre

Office of Student Affairs Office of Student Affairs

Yusof Ishak House, Level 1 Yusof Ishak House, Level l

31 Lower Kent Ridge Road, Singapore 119078 31 Lower Kent Ridge Road, Singapore 11S078

E-mail: [email protected] E-mail: [email protected]

Tel: (65) 6516 1278 Fax: (65) 6774 4463 Tel: (65) 6516 1278 Fax: (65) 6774 4463

Website: www.nus.edu.sg/osa/career Website: vvwvv.nus.edusg/osa/career

Table 4: NUS Clear Card Results

Accuracy: 94.7%


9.1.2 Unclear Cards

Figure 60: Image used for NUS Unclear Card test

Expected Result Actual Result

Shona Gillies Shone 1Gli(Lll,,lE5

Assistant Manager Assistant Manager

International Relations Office International Relations Office

3rd Storey, Unit 03-03 3rd Storey, Unit 03-03

Shaw Foundation Alumni House Shaw Foundation Alumni House

11 Kent Ridge Drive, Singapore 119244 ll Kent Ridge Drive, Singapore l 19244

Tel: (65) 6526 4084 Fax: (65) 6778 0177 Tel: (65) 6516 4084 Fax: (65) 6778 0177

E-mail: [email protected] E-mail: [email protected]

Website: www.nus.edu.sg/iro Website: www.nus.edu.sg/iro

Table 5: NUS Unclear Card Results

Accuracy: 93.7%

The name section of this card has not yielded a good result. This is because it is evident that the top part of the

card is quite crushed. In this instance, retrieving the data even with good pre-processing is not possible if the card

has a physical anomaly like the one shown in this example.


9.2 External Cards

9.2.1 Clear Light Background

Figure 61: Image used for External Card with Light Background

Expected Result Actual Result

Ching Kuan Thye Keith (tChing Kuan Thye Keith

Engineer, Sofware Dev. Engineer, SoMare Dev.

BG Lifestyle Entertainment I&D BG Lifestyle Entertainment l&D

Tel: +65 6882 4904 Fax: +65 6258 0892 Tel: +65 6882 4904 Fax: +65 6258 0892

[email protected] [email protected]

Philips Consumer Lifestyle Philips Consumer Lifestyle

620A Lorong I, Toa Payoh 620A Lorong I,Toa Payoh

Building TPI, Level 3 Singapore 319762 BuildingTPl , Level 3 Singapore 319762

www.philips.com www.philipsxom

Table 6: External Card with Light Background Results

Accuracy: 96.1%


9.2.2 Unclear Light Background

Figure 62: Image used for External Unclear Card with Light Background

Expected Result Actual Result

Hassan Gaffar, Project Manager Hassan Ciaffar, Project Manager

SolarWorld Asia Pacific Pte Ltd. SolatWorld Asia Pacific Pte Lid.

Co Reg No. 198102529K Co ioto, No. 19,%02S2u’.1K

70 Bendemeer Road Luzerne, #06-01 Singapore 339940 70 frendemeey ftoad Lugerne. #06 ’01 Singapore 339940

www.solarworld.sg www.solarworkisg

Tel: +65-6842-3886 Fax: +65-6842-3887 Tel: +65-6842-3886 Fax: +6S-6842-3887

Mobile: +65-8646-5317 DID: +65-6422-2695 Mobile: +6S-8646-S317 DID: ’65-6422-2695

[email protected] hassan gafrar@solarworldsg

Table 7: External Unclear Card with Light Background Results

Accuracy: 81.6%

This card is an example of a dirty card, as it looks like it has been stored away in a possibly dusty place for a

long time. It also has very small fonts, making it harder for Tesseract to see. However, using the preprocessing

methods and segmentation algorithms combined, the result meets the team’s minimum accuracy of 75% and

renders reasonably good results.


9.2.3 Clear Dark Background

Figure 63: Image used for External Clear Card with Dark Background

Expected Result Actual Result

TITANSOFT PTE LTD TITANS0FT PTE LTD

Software Consultancy & Development Software Consultancy & Development

Design & Build Design & Build

E-Gaming Solution E-Gaming Solution

Video Streaming Video Streaming

Network Security & Consultancy Network Security d Consultancy

TEL +65 6396 6458 FAX +65 6396 6496 TEL -65 6396 6.15t? FAX - 65 6396 6496

www.titansoft.com.sg www.titansoft.com.sg

150 Cantonment Road #02-06/08, 150 Cantonment Road tc02-06 ”

Cantonment Centre Block B, Singapore 089762 Cantonment Centre Block B Singapore 089762

Table 8: External Clear Card with Dark Background Results

Accuracy: 80.8%

Tesseract doesn’t work well with dark coloured backgrounds. Usually, there would be a result of under 20% for

this card. However, the workflow provided by the team ensures that the preprocessing is customised for darker

backgrounds and therefore meets the minimum accuracy percentage of 75% set by the team for external cards.


9.2.4 Unclear Dark Background

Figure 64: Image used for External Unclear Card with Dark Background

Expected Result Actual Result

CHRISTOPHER CAI (CHlRyE)”ir((i))F)Tifiytii c)) -

TECHNICAL DIRECTOR (iOS) Tf CHNKAL DIRECTOR (sos)

+65 9756 2375 +65 9/56 *g75

+65 6342 0810 -65 6–. 00

[email protected] [email protected] thrts@ ’,’, srtiloC9 ’ q

www.replaid.com www.replaid.com

facebook.com/chriscai facebook.corn/r: l”,

REPLAID REPLAID

81 JOO CHIAT ROAD #02-02, SINGAPORE 427725 81 JOO CHIST RWAD #02-?), C/MPR .;,;

Table 9: External Unclear Card with Dark Background Results

Accuracy: 46.5%

This is an example of a failing case for the application. This card has a dark background, folded edges and light reflecting off the card. This makes the analysis process very difficult and therefore renders a poor result, with only 46.5% accuracy. It is a requirement that the user take a good picture if good results are to be

expected from the app.


9.2.5 Colored Background

Figure 65: Image used for External Card with Mixed Background

Expected Result Actual Result

B. T. LOKESH B.Tech. PhD B, T, LOKESH BTeCh ’ti-t

+65 6248 9777 +65 6248 9777

+65 6248 5296 +65 6248 S296

+65 6463 5056 +65 6463 5056

[email protected] bheemal6Punisim.edu.sg

www.unisim.edu.sg wwwoumiisiimt,edluusg

Table 10: External Card with Mixed Background Results

Accuracy: 84.7%

This is an example of a card with both light and dark backgrounds. Here all the components in the process come into play together to separate the sections and render reasonable results. The card meets the minimum result accuracy threshold of 75% set by the team and therefore satisfies the goals set by the team.


9.3 Summary of Results

The team tested 10 NUS cards and 25 external cards of different types. Through this testing, the author and the team realised that the results of the app depend on three primary factors:

1. Adequate lighting - the cards need good lighting to yield reasonable results, so the user should ensure that the lighting is adequate before taking a picture. Shadows are not much of an issue, but avoiding them leads to better results.

2. Physical condition of the card - if the card is damaged in the corners, it can prove to be a problem, as seen in figure 60. Even though the rest of the card was perfect, the name on the card did not yield good results because of the physical damage to the card itself.

3. Dust - a lot of dust and dirt on the card can interfere with the letters. Even though the team implements measures to remove random dust particles, dust that merges with the lettering becomes unidentifiable, so users should keep the card as clean as possible.

Figure 66: Graph of accuracy for NUS cards

With these results, the team has achieved a test accuracy of 92.26% for NUS cards and 77.34% for external cards.
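The per-card accuracy figures quoted in this chapter compare the expected text against the actual OCR output. As a point of reference only, a character-level similarity based on edit distance is one common way to score such comparisons; the sketch below is illustrative and is not necessarily the exact metric behind the figures above.

public class OcrAccuracy {
    // Character-level similarity: (length(expected) - editDistance) / length(expected), in percent.
    public static double percent(String expected, String actual) {
        if (expected.isEmpty()) {
            return actual.isEmpty() ? 100.0 : 0.0;
        }
        int distance = editDistance(expected, actual);
        return 100.0 * Math.max(0, expected.length() - distance) / expected.length();
    }

    // Standard dynamic-programming Levenshtein distance.
    private static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }
}

For example, under this measure the first row of Table 8 ("TITANSOFT PTE LTD" against "TITANS0FT PTE LTD", one substituted character out of 17) scores roughly 94%.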


Figure 67: Graph of accuracy for external cards


10 Conclusion

At the heart of the Scope application lies innovation: finding creative answers to challenging problems. Fuelled by sound imagination, ideas generated from scratch are fortified by extensive research in the respective fields, transforming them into novel solutions. These solutions, once in place, help utilise the available resources optimally, given their limitations, to generate the best possible results. This is seen throughout the application, across its different processes. The edge detection and auto-rotation extract and

supply the most important part of the input image. The background segmentation analyses colour variations

and separates the card into different sections so that the cleaning filters and pixel fillers can cater to them

specifically and perform precise image correction. The OCR Engine is then made to work with these processed

segments, generating text that is processed by the Parser, allowing leeway for error that is expected. Thus in

essence, Scope is a system of components that follow a logical flow of functionality while all along effectively

complementing each other to ultimately produce a product that is robust and adaptive; the in-depth analysis of

results and performance that has been done is testament to this. The application produces accurate results when

subjected to a host of day-to-day scenarios and this goes to show that it is a product that is indeed more than

functional in its current state. Suggestions that have been given to improve specific aspects indicate that Scope

undeniably has potential for further growth in future iterations. These improvements will help the application

overcome certain limitations and extend its reach into an ever growing market and help realise its eventual goal

of being one amongst the best, and certainly the most distinctive, in its category.
