ocr with n n

OCR with Neural Network. Made By: Marwa Fadhel Jassim, Karam Samir Khalid

Upload: marwa-alkubaissy

Post on 19-Jan-2015

Category:

Education


DESCRIPTION

OCR converts an image of text into text, so we can edit it, use it, or add to it ....

TRANSCRIPT

Page 1: ocr with N N

OCR with Neural Network

Made By:
• Marwa Fadhel Jassim
• Karam Samir Khalid

Page 2: ocr with N N

Introduction

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website.

Page 3: ocr with N N

OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. OCR systems require calibration to read a specific font; early versions needed to be

Page 4: ocr with N N

programmed with images of each character, and worked on one font at a time. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some systems are capable of reproducing formatted output that closely approximates the original scanned page, including images, columns and other non-textual components.

Page 5: ocr with N N

OCR: picture of text → text

And Bruno, conqueror of Carthage, strode up to me and said:

"Devil take you, Edith!"

"Finally, you scoundrel - are you going to confess your love for me?" I retorted.

The German warrior stood stoically. He surveyed the landscape before him; grinned; spoke:

"You are much better with an axe than Jane - I grant you that."

(Killing, I admit, was my favourite pastime. Long before I enlisted in the Order of the Knights of Malta, I liked playing with knives. No-one objected.)

"Overall, how would you rank/rate my performance in axing?"

"Performance evaluations are meaningless!"

(Quite true.)

Radically changing the topic, I asked:

"So, what are your thoughts on Empress Teresa?"

"Unshareable; irrelevant; bitter."

"Secret? Very unsurprising. We warrior/troubadours are quite reserved - nay - silent."

(Xenophobia played a role too. You knew that. So did my friend, Zoe.)

Page 6: ocr with N N

OCR Step 1: Find letters

Page 7: ocr with N N

OCR Step 2: Identify each letter

“P”

Page 8: ocr with N N

Identifying letters is hard

● Letters can be:
    ● Blurry
    ● Rotated / squashed / skewed
    ● In different fonts
    ● Bold or in italics

● Background can have:
    ● Speckles, dirt
    ● Texture from paper

Page 9: ocr with N N

Approaches

● Compare with reference images

● Find major lines, use heuristics, e.g. “vertical line on left, vertical line on right, horizontal line in the middle → H”

● Etc...

● How do humans do it? → neural networks
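
The first approach, comparing with reference images, can be sketched in a few lines of Python. This is only an illustration, assuming the letters are already cut out and scaled to the same small binary grid. The 3x3 templates and the function names are made up for the example.

```python
# Hypothetical 3x3 reference images (1 = ink, 0 = background).
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
}


def matching_pixels(image, template):
    """Count the positions where the image and the template agree."""
    return sum(
        1
        for img_row, tpl_row in zip(image, template)
        for a, b in zip(img_row, tpl_row)
        if a == b
    )


def classify_by_template(image, templates=TEMPLATES):
    """Return the label of the reference image that agrees on the most pixels."""
    return max(templates, key=lambda label: matching_pixels(image, templates[label]))


# A "T" with one speckle of dirt in the lower-left corner.
noisy_t = [[1, 1, 1],
           [0, 1, 0],
           [1, 1, 0]]
print(classify_by_template(noisy_t))
```

A single speckle still leaves "T" as the best match here, but blur, rotation or a different font quickly breaks this kind of pixel-by-pixel comparison.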

Page 10: ocr with N N

What Is a Neural Network?

A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain.

Page 11: ocr with N N

Real Neurons

Page 12: ocr with N N

Neuronal Connections

Page 13: ocr with N N

Firing neurons excite others

Firing threshold

Page 14: ocr with N N

Firing neurons excite others

Firing threshold

Page 15: ocr with N N

Firing neurons excite others

Firing threshold

Page 16: ocr with N N

...which in turn excite others

Firing threshold

Page 17: ocr with N N

Inputs can be weighted

Firing threshold

0.7

0.4

Page 18: ocr with N N

Neurons can suppress others

Firing threshold

0.7

0.4

-0.5

Page 19: ocr with N N

And they can have a starting bias

Firing threshold

Bias 0.3

Page 20: ocr with N N

(So they're basically logic gates)

AND: weights 0.5, 0.5

OR: weights 1, 1

NOT: weight -1, bias 1
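
The weights on this slide can be checked with a few lines of Python. The sketch rests on one assumption the slide does not state: the firing threshold is taken to be 1, and a neuron fires when its weighted input plus bias reaches that threshold. The function names are made up for the example.

```python
THRESHOLD = 1.0  # assumed firing threshold; the slide does not give a value


def fires(inputs, weights, bias=0.0, threshold=THRESHOLD):
    """Return 1 if the weighted sum of the inputs plus the bias reaches the threshold."""
    total = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def and_gate(a, b):
    return fires([a, b], [0.5, 0.5])


def or_gate(a, b):
    return fires([a, b], [1, 1])


def not_gate(a):
    return fires([a], [-1], bias=1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
print("NOT 0:", not_gate(0), "NOT 1:", not_gate(1))
```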

Page 21: ocr with N N

Neurons arranged in layers

● Neurons in one layer excite/suppress neurons in the next one

● Excitation of neurons in first layer set according to the input

● “Hidden” layer(s) in between

● Final layer is output

In Hid Out
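
A layered network of this kind can be sketched as a loop over layers, where each neuron sums its weighted inputs and fires if the sum reaches its threshold. The example is only an illustration: the threshold of 1, the step activation and the XOR wiring (an OR and a NAND neuron feeding an AND neuron) are assumptions chosen for the example, not taken from the slides.

```python
def layer_output(inputs, layer, threshold=1.0):
    """Compute the firing pattern of one layer.

    `layer` is a list of (weights, bias) pairs, one pair per neuron.
    """
    return [
        1 if bias + sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0
        for weights, bias in layer
    ]


def forward(inputs, layers):
    """Feed the inputs through every layer in turn and return the final output."""
    activations = inputs
    for layer in layers:
        activations = layer_output(activations, layer)
    return activations


# Hidden layer: an OR neuron and a NAND neuron. Output layer: an AND neuron.
# Together they compute XOR, which no single threshold neuron can do.
hidden = [([1, 1], 0), ([-1, -1], 2)]
output = [([0.5, 0.5], 0)]
xor_net = [hidden, output]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", forward([a, b], xor_net)[0])
```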

Page 22: ocr with N N

Simple letter identification network

● One input neuron per pixel in scaled picture of letter

● One output neuron per possible letter

● Train network to excite the output neuron that corresponds to the letter input

ABCDEFGHIJKLM...

Page 23: ocr with N N

Training Method

The most popular and simplest approach to the OCR problem is based on a feed-forward neural network with back-propagation learning. The main idea is that we first prepare a training set and then train a neural network to recognize the patterns in it. In the training step we teach the network to respond with the desired output for a specified input. For this purpose each training sample is represented by two components: a possible input and the network's desired output for that input. After the training step is done, we can give an arbitrary input to the network, and the network will form an output from which we can determine the type of pattern presented to it.

Page 24: ocr with N N

Let's assume that we want to train a network to recognize 26 capital letters represented as images of 5x6 pixels, something like this one:

One of the most obvious ways to convert an image into the input part of a training sample is to create a vector of size 30 (in our case), containing "1" in all positions corresponding to letter pixels and "0" in all positions corresponding to background pixels. In many neural network training tasks, however, it is preferable to represent training patterns in the so-called "bipolar" way, placing "0.5" into the input vector instead of "1" and "-0.5" instead of "0". This kind of pattern coding usually improves learning performance.
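
As a small illustration of this encoding, the sketch below turns a 5x6 letter image into a bipolar vector of 30 values. The pixel pattern for "K" and the function name `to_bipolar` are made up for the example.

```python
# Hypothetical 5x6 image of the letter "K" ('#' = letter pixel, '.' = background).
LETTER_K = [
    "#...#",
    "#..#.",
    "###..",
    "#..#.",
    "#...#",
    "#...#",
]


def to_bipolar(rows):
    """Flatten the image row by row, mapping letter pixels to 0.5 and background to -0.5."""
    return [0.5 if pixel == "#" else -0.5 for row in rows for pixel in row]


input_vector = to_bipolar(LETTER_K)
print(len(input_vector))  # 30 values, one per pixel
print(input_vector[:5])   # the first row of the image
```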

Page 25: ocr with N N

Our training sample should look something like this:

For each possible input we need to create the network's desired output to complete the training sample. For the OCR task it is very common to code each pattern as a vector of size 26 (because we have 26 different letters), placing "0.5" in the position corresponding to the pattern's type number and "-0.5" in all other positions.

Page 26: ocr with N N

So, a desired output vector for the letter "K" will look something like this:
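
Following the description above, such a vector has 26 values, with 0.5 at the position for "K" and -0.5 everywhere else. A minimal Python sketch, assuming the letters are numbered alphabetically starting from "A" (the helper name `target_vector` is made up for the example):

```python
import string


def target_vector(letter):
    """Return the desired output: 0.5 at the letter's position, -0.5 everywhere else."""
    return [0.5 if candidate == letter else -0.5 for candidate in string.ascii_uppercase]


k_target = target_vector("K")
print(k_target.index(0.5))  # position of "K", counting from 0
print(k_target)
```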

Once we have such training samples for all letters, we can start to train our network. The last question concerns the network's structure. For the above task we can use a single-layer neural network with 30 inputs, corresponding to the size of the input vector, and 26 neurons in the layer, corresponding to the size of the output vector.
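
A single-layer network of this shape (30 inputs, 26 neurons) can be sketched in plain Python. The sketch makes several simplifying assumptions that are not in the slides: the neurons are linear, the weights are updated with the plain delta rule (with no hidden layer there is nothing to propagate back through), and the training set holds only three made-up 5x6 glyphs instead of all 26 letters.

```python
import random
import string

LETTERS = string.ascii_uppercase  # 26 output neurons, one per letter

# Hypothetical 5x6 glyphs; a real training set would cover all 26 letters.
GLYPHS = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#..", "..#.."],
}


def encode_input(rows):
    """Bipolar input vector: 0.5 for letter pixels, -0.5 for background."""
    return [0.5 if p == "#" else -0.5 for row in rows for p in row]


def encode_target(letter):
    """Desired output: 0.5 for the letter's neuron, -0.5 for all others."""
    return [0.5 if c == letter else -0.5 for c in LETTERS]


def predict(weights, biases, x):
    """Linear output of each of the 26 neurons for the input vector x."""
    return [b + sum(w * xi for w, xi in zip(row, x)) for row, b in zip(weights, biases)]


def train(samples, epochs=200, rate=0.05, seed=0):
    """Delta-rule training: nudge each weight in proportion to the output error."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(30)] for _ in LETTERS]
    biases = [0.0] * len(LETTERS)
    for _ in range(epochs):
        for x, target in samples:
            outputs = predict(weights, biases, x)
            for j, (out, want) in enumerate(zip(outputs, target)):
                error = want - out
                biases[j] += rate * error
                for i, xi in enumerate(x):
                    weights[j][i] += rate * error * xi
    return weights, biases


def recognize(weights, biases, rows):
    """Return the letter whose output neuron is excited the most."""
    outputs = predict(weights, biases, encode_input(rows))
    return LETTERS[outputs.index(max(outputs))]


samples = [(encode_input(rows), encode_target(letter)) for letter, rows in GLYPHS.items()]
weights, biases = train(samples)
for letter, rows in GLYPHS.items():
    print(letter, "->", recognize(weights, biases, rows))
```

The epoch count and learning rate are arbitrary values that happen to work for this tiny training set; a full alphabet with noisy samples would need tuning and, typically, a nonlinear activation.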

Page 27: ocr with N N

Another Method

The OCR software breaks the image into sub-images, each containing a single character. The sub-images are then translated from an image format into a binary format, where each 0 and 1 represents an individual pixel of the sub-image. The binary data is then fed into a neural network that has been trained to make the association between the character image data and a numeric value that corresponds to the character. The output from the neural network is then translated into ASCII text and saved as a file.
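
The steps described on this slide can be sketched end to end in Python. This is a simplified illustration, not the OCR software's actual method: characters are assumed to be separated by blank columns, and a lookup table (`KNOWN`, a made-up stand-in) plays the role of the trained neural network.

```python
def split_into_characters(rows):
    """Cut a text-line image into sub-images at columns that contain no ink."""
    width = len(rows[0])
    inked = [any(row[x] == "#" for row in rows) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(inked + [False]):  # trailing False closes the last span
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x))
            start = None
    return [[row[a:b] for row in rows] for a, b in spans]


def to_bits(sub_image):
    """Translate a sub-image into a flat tuple of 0/1 values, one per pixel."""
    return tuple(1 if p == "#" else 0 for row in sub_image for p in row)


# Stand-in for the trained network: maps pixel data to a numeric character code.
KNOWN = {
    to_bits(["###", ".#.", ".#.", ".#."]): ord("T"),
    to_bits(["#..", "#..", "#..", "###"]): ord("L"),
}


def recognize(bits):
    """Return the character code for the pixel data, or the code for '?' if unknown."""
    return KNOWN.get(bits, ord("?"))


# A tiny scanned line containing a "T" and an "L" separated by one blank column.
line = [
    "###.#..",
    ".#..#..",
    ".#..#..",
    ".#..###",
]

text = "".join(chr(recognize(to_bits(sub))) for sub in split_into_characters(line))
print(text)
```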

Page 28: ocr with N N
Page 29: ocr with N N

Thanks for listening