automatic spelling correction probability models and algorithms

Post on 05-Jan-2016


Automatic Spelling Correction: Probability Models and Algorithms

• Motivation and Formulation

• Demonstration of a Prototype Program

• The Underlying Probability Models

• Algorithms for Automatic Correction

• Conclusion

Motivation and Formulation

• A set of words: the vocabulary

• Single-word correction: Given any character string S that may or may not belong to the vocabulary, match S with the most likely word W in the vocabulary.

• Example: vocabulary = {is, are, am}

iis → is, ae → are, an → am

Motivation and Formulation

• Multiple-word correction: Given a series of character strings S1S2…Sm, each of which may or may not belong to the vocabulary, match them with the most likely word series W1W2…Wm formed by words from the vocabulary.

• Example: vocabulary = {I, is, are, am}

ii bn → I am

Motivation and Formulation

• Given a typed word w, what do we mean by the most likely word for w in the vocabulary?

→ We need some probability models.

• How do we find the most likely word for w?

→ We need to develop algorithms.
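One common way to make "most likely" precise, given here as a sketch and not necessarily the presenters' exact formulation, is Bayes' rule. The symbols V (the vocabulary), P(W) (how frequent the word W is) and P(S | W) (the probability of typing the string S when intending W) are notation assumed for illustration.

```latex
\hat{W} \;=\; \arg\max_{W \in V} P(W \mid S)
        \;=\; \arg\max_{W \in V} \frac{P(S \mid W)\, P(W)}{P(S)}
        \;=\; \arg\max_{W \in V} P(S \mid W)\, P(W)
```

P(S) does not depend on W, so it drops out of the maximization. What remains is the generation probability weighted by word frequency.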

Probability Models: Typical Typos

Errors in the transition of mental states:
– Repeating characters: iis → is
– Skipping characters: ae → are

Mentally right, but the finger wrongly lands on a nearby key:
– an → am

Probability Models

• The Word Model: for each word w, how do we probabilistically transition from one mental state (trying to type some character in the word) to the next? e.g. for the word "are":

Ideally: a → r → e

but things like a → a → r → e (a repeated character) or a → e (a skipped character) could happen.
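A minimal sketch of such a word model, assuming one mental state per character plus a final "done" state, and assuming that each step either repeats the current character, moves to the next one, or skips one. The function name `word_transitions` and the values of `p_repeat` and `p_skip` are hypothetical, chosen only for illustration.

```python
def word_transitions(word, p_repeat=0.05, p_skip=0.05):
    """Return {state: {next_state: probability}} for one word.

    States 0..n-1 stand for "trying to type word[i]"; state n means "done".
    The probabilities are illustrative assumptions, not fitted values.
    """
    n = len(word)
    table = {}
    for i in range(n):
        row = {i: p_repeat}  # repeat: type the same character again
        if i + 2 <= n:
            row[i + 2] = p_skip  # skip: jump over the next character
            row[i + 1] = 1.0 - p_repeat - p_skip  # ideal: move on normally
        else:
            # the last character has nothing to skip, so "move on" takes the rest
            row[i + 1] = 1.0 - p_repeat
        table[i] = row
    return table


if __name__ == "__main__":
    for state, row in word_transitions("are").items():
        print(state, row)
```

In this sketch "a a r e" corresponds to the state path 0, 0, 1, 2 and "a e" to the path 0, 2.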

Probability Models

• The Keyboard Model (analogous to the acoustic model in speech recognition): for a mental state of trying to type a character c in a word, what is the probability distribution over the keys actually touched? e.g.

Ideally: you want to type a, and you touch a

but you might touch b, q, z, s, w, x, …
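As a rough illustration of a keyboard model, the sketch below assumes a simplified, unstaggered QWERTY grid, a hit probability `p_hit`, and an even split of the remaining probability over adjacent keys. The layout approximation, the names `neighbors` and `emission`, and the value 0.9 are all assumptions made for the example.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]  # simplified grid, no row stagger


def neighbors(c):
    """Keys adjacent to c (same or neighboring row, column offset at most 1)."""
    for r, row in enumerate(ROWS):
        if c in row:
            col = row.index(c)
            near = set()
            for rr in (r - 1, r, r + 1):
                if 0 <= rr < len(ROWS):
                    for cc in (col - 1, col, col + 1):
                        if 0 <= cc < len(ROWS[rr]):
                            near.add(ROWS[rr][cc])
            near.discard(c)
            return near
    return set()  # character is not on the grid


def emission(c, p_hit=0.9):
    """P(key touched | trying to type c) as a dict; p_hit is an assumed value."""
    near = neighbors(c)
    if not near:
        return {c: 1.0}
    dist = {c: p_hit}
    share = (1.0 - p_hit) / len(near)  # spread the miss probability evenly
    for key in near:
        dist[key] = share
    return dist


if __name__ == "__main__":
    print(sorted(neighbors("a")))
    print(emission("m"))
```

A more faithful model would estimate these probabilities from observed typing errors instead of assuming an even split.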

Probability Models

• The Language Model (i.e. the sentence model): How do we put words together to form sentences?

• The language model is not strictly necessary for single-word correction, but it can further improve accuracy, and it supports multiple-word correction by taking context into account.

Probability Models

• The Language Model (i.e. the sentence model): For example, a bigram language model captures how likely each individual word is to appear in a sentence and how likely one word is to follow another. Such knowledge can help correction:

e.g. you see two strings: "I an". They are much more likely to have been generated from "I am" than from "I a".
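The bigram idea can be sketched with counts from a tiny corpus. The sentences below are made up for illustration, and add-one smoothing is an assumption of this sketch, not something the slides specify.

```python
from collections import Counter

START, END = "<s>", "</s>"


def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with start/end marks."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, n_types):
    """P(word | prev) with add-one smoothing over n_types possible next tokens."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + n_types)


def sequence_prob(words, unigrams, bigrams, n_types):
    """Probability of a word series as a product of bigram probabilities."""
    prob, prev = 1.0, START
    for word in words:
        prob *= bigram_prob(prev, word, unigrams, bigrams, n_types)
        prev = word
    return prob


# Hypothetical toy corpus
SENTENCES = ["I am", "I am", "I am", "it is a", "you are"]
UNIGRAMS, BIGRAMS = train_bigram(SENTENCES)
N_TYPES = len(set(UNIGRAMS) - {START})  # every token except the start mark

if __name__ == "__main__":
    print(sequence_prob(["I", "am"], UNIGRAMS, BIGRAMS, N_TYPES))
    print(sequence_prob(["I", "a"], UNIGRAMS, BIGRAMS, N_TYPES))
```

With this toy corpus "I am" scores higher than "I a", which is the kind of contextual evidence the language model contributes.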

Algorithms

Calculate the probability of generating a character string S of s characters when trying to type a word W of w characters.

• O(sw^2) using dynamic programming

• O(w^s) using a naïve approach
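The two bounds can be compared in a small sketch that treats the word as a hidden Markov model: the dynamic program is the standard forward algorithm, and the naïve version enumerates every state path. The parameters, the uniform typo model in `emit`, and the rule that typing always starts at the first character are simplifying assumptions of this sketch.

```python
from itertools import product

P_REPEAT, P_SKIP, P_HIT = 0.05, 0.05, 0.9  # hypothetical parameters


def transition(i, j, n):
    """P(next state j | state i) for a word of n characters; state n is 'done'."""
    if j == i:
        return P_REPEAT
    if j == i + 1:
        # the last character cannot skip anything, so it keeps the skip mass
        return 1.0 - P_REPEAT - (P_SKIP if i + 2 <= n else 0.0)
    if j == i + 2 and j <= n:
        return P_SKIP
    return 0.0


def emit(key, intended):
    """P(key | intended): a miss is spread evenly over 25 other letters."""
    return P_HIT if key == intended else (1.0 - P_HIT) / 25


def generation_prob(s, word):
    """Forward algorithm: O(s * w^2) for s typed characters and w states."""
    n = len(word)
    if not s or not word:
        return 0.0
    alpha = [0.0] * n
    alpha[0] = emit(s[0], word[0])  # assumed: typing starts at the first character
    for key in s[1:]:
        alpha = [
            emit(key, word[j]) * sum(alpha[i] * transition(i, j, n) for i in range(n))
            for j in range(n)
        ]
    return sum(alpha[i] * transition(i, n, n) for i in range(n))


def generation_prob_naive(s, word):
    """Sum over all w^s state paths; exponential in the length of s."""
    n = len(word)
    if not s or not word:
        return 0.0
    total = 0.0
    for path in product(range(n), repeat=len(s)):
        if path[0] != 0:
            continue
        p = emit(s[0], word[0])
        for t in range(1, len(s)):
            p *= transition(path[t - 1], path[t], n) * emit(s[t], word[path[t]])
        total += p * transition(path[-1], n, n)
    return total


if __name__ == "__main__":
    print(generation_prob("ae", "are"), generation_prob_naive("ae", "are"))
```

Both functions compute the same quantity. The forward version reuses partial sums per state instead of recomputing them for every path.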

Algorithms

Single-word correction:

Determine the most likely word from a vocabulary of v words (with at most w characters per word) for a string S of s characters.

• O(vsw^2) using dynamic programming

• For each word W in the vocabulary, calculate the probability of generating S from W, weight it by the frequency of W, and pick the most likely word.
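The loop over the vocabulary can be sketched as follows. The word frequencies in `WORD_FREQ`, the model parameters, and the uniform typo model are hypothetical, and the generation probability is a compact forward-algorithm sketch under the assumption that typing starts at the first character.

```python
P_REPEAT, P_SKIP, P_HIT = 0.05, 0.05, 0.9  # hypothetical parameters

# Hypothetical relative word frequencies for the example vocabulary
WORD_FREQ = {"is": 0.5, "are": 0.3, "am": 0.2}


def transition(i, j, n):
    """P(next state j | state i) for a word of n characters; state n is 'done'."""
    if j == i:
        return P_REPEAT
    if j == i + 1:
        return 1.0 - P_REPEAT - (P_SKIP if i + 2 <= n else 0.0)
    if j == i + 2 and j <= n:
        return P_SKIP
    return 0.0


def emit(key, intended):
    """P(key | intended): a miss is spread evenly over 25 other letters."""
    return P_HIT if key == intended else (1.0 - P_HIT) / 25


def generation_prob(s, word):
    """P(s | word) by the forward algorithm, O(s * w^2)."""
    n = len(word)
    if not s or not word:
        return 0.0
    alpha = [0.0] * n
    alpha[0] = emit(s[0], word[0])
    for key in s[1:]:
        alpha = [
            emit(key, word[j]) * sum(alpha[i] * transition(i, j, n) for i in range(n))
            for j in range(n)
        ]
    return sum(alpha[i] * transition(i, n, n) for i in range(n))


def correct(s, word_freq=WORD_FREQ):
    """Pick the word maximizing P(s | word) * frequency(word): O(v * s * w^2)."""
    return max(word_freq, key=lambda w: generation_prob(s, w) * word_freq[w])


if __name__ == "__main__":
    for typed in ("iis", "ae", "an"):
        print(typed, "->", correct(typed))
```

Under these assumed parameters the three example strings from the formulation map to is, are and am respectively.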

Algorithms

Multiple-word correction:

Determine the most likely word series W1W2…Wm of m words from a vocabulary of v words (with at most w characters in each word) for m strings S1S2…Sm (with at most s characters in each string).
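One way to approach the multiple-word problem, shown here as a sketch and not as the presenters' algorithm, is a Viterbi search over word series that combines a bigram model with the per-word generation probability. The bigram values in `BIGRAM`, the lower-cased vocabulary, and the model parameters are all hypothetical.

```python
P_REPEAT, P_SKIP, P_HIT = 0.05, 0.05, 0.9  # hypothetical parameters
VOCAB = ["i", "is", "are", "am"]  # lower-cased version of {I, is, are, am}
UNIFORM = {w: 1.0 / len(VOCAB) for w in VOCAB}
# Hypothetical bigram probabilities; contexts not listed fall back to UNIFORM
BIGRAM = {
    "<s>": {"i": 0.7, "is": 0.1, "are": 0.1, "am": 0.1},
    "i": {"i": 0.05, "is": 0.05, "are": 0.05, "am": 0.85},
}


def transition(i, j, n):
    """P(next state j | state i) for a word of n characters; state n is 'done'."""
    if j == i:
        return P_REPEAT
    if j == i + 1:
        return 1.0 - P_REPEAT - (P_SKIP if i + 2 <= n else 0.0)
    if j == i + 2 and j <= n:
        return P_SKIP
    return 0.0


def emit(key, intended):
    """P(key | intended): a miss is spread evenly over 25 other letters."""
    return P_HIT if key == intended else (1.0 - P_HIT) / 25


def generation_prob(s, word):
    """P(s | word) by the forward algorithm."""
    n = len(word)
    if not s or not word:
        return 0.0
    alpha = [0.0] * n
    alpha[0] = emit(s[0], word[0])
    for key in s[1:]:
        alpha = [
            emit(key, word[j]) * sum(alpha[i] * transition(i, j, n) for i in range(n))
            for j in range(n)
        ]
    return sum(alpha[i] * transition(i, n, n) for i in range(n))


def correct_sequence(strings):
    """Viterbi over words: keep the best series ending in each vocabulary word."""
    best = {"<s>": (1.0, [])}  # last word -> (probability, word series)
    for s in strings:
        step = {}
        for w in VOCAB:
            gen = generation_prob(s.lower(), w)
            step[w] = max(
                (
                    (p * BIGRAM.get(prev, UNIFORM)[w] * gen, series + [w])
                    for prev, (p, series) in best.items()
                ),
                key=lambda cand: cand[0],
            )
        best = step
    return max(best.values(), key=lambda cand: cand[0])[1]


if __name__ == "__main__":
    print(correct_sequence(["ii", "bn"]))
```

For "bn" the typing model alone cannot separate "am" from "is", since both need two wrong keys. In this sketch the bigram context after "i" is what favors "am".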

Conclusion

• Similar modeling and analysis applicable to speech recognition

• Mathematical structures provide powerful tools for modeling and analysis

• Design and analysis of algorithms important to real-world problem solving

• Mathematical structures and algorithms: two key components of modern AI research.
