SpeechTek 2013 - Alphanumeric Recognition Discussion

Post on 08-Jul-2015

DESCRIPTION

This morning's discussion on Alphanumeric Reco was great. Here are the slides for anyone who is interested. Thanks to all for sharing their experiences!

TRANSCRIPT

© 2002 – 2012 Versay Solutions, LLC. All rights reserved.

Alphanumeric Speech Recognition

SpeechTek

August 19, 2013

Crispin Reedy

“The fault, dear Brutus, is not in our stars, but in ourselves”

-- Julius Caesar, Act I, scene ii

The Problem With Alphanumerics

The Need

• Account Numbers

• Policy Numbers

• Spelling out names and addresses

• Special cases

– VIN, Canadian Postal Code

• And more…

Methods for Addressing

• Project Tactics

• Limit the grammar

– Constraint List

– N-Best + Back-End Data Validation

• Confirmation

• Prefiller

Project Tactics

• Can you avoid it?

– Phone number / SSN / Zip / DOB?

• Set expectations

– Not always easy!

• Describe the problem

• What tools do you have available?

– Constraints / patterns?

– Back-end data source available?

• Can you run a proof of concept / experiment?

Constraints and Patterns

• Does the number have any known pattern that can be used to limit possible values (and thereby improve recognition)?

– For example:

• First character is always A

• First three characters are always numbers

• Last characters are always C, G or T.

• If the answer is “no,” consider doing your own analysis.

– Even if you don’t think there is a pattern, there may be one.
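
One way to run that kind of analysis is to tabulate which characters actually occur at each position in a sample of real identifiers. The following is a minimal sketch in Python, assuming fixed-length identifiers; the sample values and the function name `position_profile` are made up for illustration.

```python
from collections import defaultdict


def position_profile(identifiers):
    """Return, for each character position, the sorted characters seen there."""
    seen = defaultdict(set)
    for ident in identifiers:
        for pos, ch in enumerate(ident.strip().upper()):
            seen[pos].add(ch)
    return [sorted(seen[pos]) for pos in sorted(seen)]


# Hypothetical sample; in practice this would be an export from the back end.
sample = ["A123XC", "A907QG", "A455ZT", "A018MC"]

for pos, chars in enumerate(position_profile(sample), start=1):
    kind = "digits only" if all(c.isdigit() for c in chars) else "".join(chars)
    print(f"position {pos}: {kind}")
```

A position that only ever shows one character, only digits, or a handful of letters is a candidate constraint. A small sample can suggest patterns that do not hold in general, so check any finding against the full data set or with whoever issues the numbers.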

Applying Constraints

• Writing a grammar specifically for the pattern

– How complicated is it?

• Applying a constraint list.

– How big is it?
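
Both questions can be explored with a short script. The sketch below, in Python, counts how many strings a positional pattern allows and emits a grammar with one set of alternatives per position. It assumes a recognizer that loads W3C SRGS XML grammars; the pattern, the rule name `account`, and the helper names are hypothetical, and the way letters are written as tokens, along with the semantic tags that return the final string, varies by platform and is left out.

```python
from math import prod

DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGIT_WORDS = dict(zip(DIGITS, "zero one two three four five six seven eight nine".split()))


def pattern_size(positions):
    """Number of distinct strings the pattern allows."""
    return prod(len(allowed) for allowed in positions)


def build_grammar(positions, rule_id="account"):
    """Emit an SRGS-style XML grammar with one <one-of> per character position."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"',
        f'         xml:lang="en-US" mode="voice" root="{rule_id}">',
        f'  <rule id="{rule_id}" scope="public">',
    ]
    for allowed in positions:
        lines.append("    <one-of>")
        for ch in allowed:
            # Digits are written as words; letters are left as-is (an assumption).
            lines.append(f"      <item>{DIGIT_WORDS.get(ch, ch)}</item>")
        lines.append("    </one-of>")
    lines += ["  </rule>", "</grammar>"]
    return "\n".join(lines)


# Hypothetical pattern: "A", three digits, any letter or digit, then C, G or T.
pattern = ["A", DIGITS, DIGITS, DIGITS, LETTERS + DIGITS, "CGT"]
unconstrained = [LETTERS + DIGITS] * len(pattern)

print(pattern_size(pattern), "strings with the pattern")
print(pattern_size(unconstrained), "strings without it")
grammar_xml = build_grammar(pattern)
```

If the number of valid values is small enough, listing them outright as a constraint list may be simpler than describing the pattern; the count gives a first indication of which approach is practical.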

Using nBest + Back-End Data

• Collect using an unconstrained grammar

• Set your recognizer to return an nBest list.

• Use a web service / back-end data dip to determine which ones are “real.”

• Confirm the first “real” one on the list

– Throw out the ones that are not real.

• If the caller says no, confirm the second “real” one on the list.

– Potentially collect again after that.
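
The flow on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a platform API: `is_real` stands in for the web service or data dip, `confirm` stands in for the yes/no confirmation dialog, and the account values are invented.

```python
def pick_candidates(nbest, is_real, max_confirmations=2):
    """Keep hypotheses the back end recognizes, best first, without duplicates."""
    candidates = []
    for hyp in nbest:
        if hyp not in candidates and is_real(hyp):
            candidates.append(hyp)
        if len(candidates) == max_confirmations:
            break
    return candidates


def collect_account(nbest, is_real, confirm):
    """Return the confirmed value, or None to signal that we should collect again."""
    for hyp in pick_candidates(nbest, is_real):
        if confirm(hyp):
            return hyp
    return None


# Hypothetical data: the recognizer's top hypothesis is not a real account.
known_accounts = {"BZ390", "DZ390"}
nbest = ["PZ390", "BZ390", "DZ390", "EZ390"]

result = collect_account(
    nbest,
    is_real=lambda hyp: hyp in known_accounts,
    confirm=lambda hyp: hyp == "DZ390",  # caller rejects BZ390, accepts DZ390
)
print(result)
```

Limiting the loop to two confirmations mirrors the slide: after two rejected candidates, asking the caller to say the number again is usually less tedious than walking further down the list.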

Confirmation Strategy

• PROTIP: Phonemes that are difficult for the recognizer to hear … are also difficult for humans to hear when they are spoken back.

• Confirm using letter names for easily confusable alphanumerics.

– “You said 8, 2, 7, G as in George, B as in Boy, 9. Is that right?”
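
A confirmation prompt of this kind can be assembled from a small lookup table. The Python sketch below is illustrative only: the table covers a few easily confused letters, and the names in it, like the function name `confirmation_prompt`, are placeholders to be replaced with whatever suits the application's callers.

```python
# Hypothetical letter-name table for letters that are easy to mishear.
LETTER_NAMES = {
    "B": "Boy", "C": "Charlie", "D": "David", "E": "Edward", "G": "George",
    "P": "Peter", "T": "Tom", "V": "Victor", "Z": "Zebra",
}


def confirmation_prompt(value):
    """Read the value back, adding a letter name wherever one is defined."""
    parts = []
    for ch in value.upper():
        name = LETTER_NAMES.get(ch)
        parts.append(f"{ch} as in {name}" if name else ch)
    return "You said " + ", ".join(parts) + ". Is that right?"


print(confirmation_prompt("827GB9"))
```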

What About Letter Names?

• Yes, with caveats:

– Do you have a special domain that would allow you to teach the caller letter names?

– Letter names invented by the caller will be quite variable.

• Some of the “oddballs” will never be recognized

– If letter names are used during confirmation, and the utterance is re-collected, the caller may tend to use those letter names during the second collection.

• So add them to the grammar.

What About Letter Names?

• Yes, because:

– Longer utterances such as “B as in Boy” are less likely to cause the false acceptances that occur between shorter utterances such as “G” and “T.”

• Make them separate rules so they can be weighted
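
As a rough illustration of keeping the two forms separate, the Python sketch below lists the bare letter and each “letter as in name” phrase as distinct alternatives, each with its own weight. The names, the weight values, and the function `letter_alternatives` are placeholders, not values from the talk; real weights would come from tuning against recorded calls.

```python
# Hypothetical letter names; include the ones used in the confirmation prompts,
# since callers tend to repeat them when asked to say the number again.
LETTER_NAMES = {
    "B": ["boy", "bravo"],
    "G": ["george", "golf"],
    "T": ["tom", "tango"],
}


def letter_alternatives(letter, bare_weight=1.0, named_weight=0.5):
    """Return (phrase, weight) pairs: the bare letter first, then named forms."""
    letter = letter.upper()
    alternatives = [(letter, bare_weight)]
    for name in LETTER_NAMES.get(letter, []):
        alternatives.append((f"{letter} as in {name}", named_weight))
    return alternatives


for phrase, weight in letter_alternatives("B"):
    print(f"{weight:.1f}  {phrase}")
```

Because the bare and named forms carry separate weights, either one can be raised or lowered without touching the other, for example to favor named forms on a second collection attempt.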

Using Prefiller

• “The account number is… B Z 3 9 0”

– Noticeable improvement in recognition of first letter

– Callers may offer it spontaneously

– Consider teaching the caller to say the prefiller

• Especially if you have repeat callers
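
In a grammar-based system the prefiller is normally an optional phrase in the grammar itself. Where the recognizer instead returns a raw word string, the same idea can be handled in post-processing. The Python sketch below assumes that second case; the prefiller phrases, the digit-word table, and both function names are hypothetical.

```python
# Hypothetical prefiller phrases and digit words; extend both from real call data.
PREFILLERS = ("the account number is", "my account number is", "it is")
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def strip_prefiller(utterance):
    """Remove one leading prefiller phrase, if present."""
    text = " ".join(utterance.lower().split())
    for phrase in PREFILLERS:
        if text.startswith(phrase + " "):
            return text[len(phrase) + 1:]
    return text


def to_identifier(utterance):
    """Turn 'the account number is b z three nine zero' into 'BZ390'."""
    chars = []
    for token in strip_prefiller(utterance).split():
        if token in DIGIT_WORDS:
            chars.append(DIGIT_WORDS[token])
        elif len(token) == 1 and token.isalnum():
            chars.append(token.upper())
        else:
            raise ValueError(f"unexpected token: {token!r}")
    return "".join(chars)


print(to_identifier("The account number is B Z three nine zero"))
```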

Other Suggestions

• Look at speech recognition parameters that are not directly related to alphanumerics

– Are callers calling from a very noisy environment?

• Adjust overall speech threshold

– Timing of utterance collection?

• Listen to recordings of utterances to make sure everything is getting collected

Specific Cases

• VIN

– Has specific pattern, but different for each manufacturer

– 17 characters: nobody will want to re-enter if you get it wrong.
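
One constraint that applies across manufacturers is the VIN check digit. A modern VIN has 17 characters, never uses the letters I, O or Q, and, for vehicles sold in North America, carries a check digit in position 9. The Python sketch below applies that check and could be used to discard hypotheses from an N-best list before confirmation. The function name is an assumption for illustration, and VINs from other markets may not carry a valid check digit, so treat a failure as a hint rather than proof.

```python
# Letter values used by the check-digit calculation (I, O and Q are not allowed).
TRANSLIT = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# One weight per position; position 9 (the check digit itself) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit_ok(vin):
    """Return True if the VIN is 17 valid characters with a matching check digit."""
    vin = vin.strip().upper()
    if len(vin) != 17 or any(ch not in TRANSLIT for ch in vin):
        return False
    remainder = sum(TRANSLIT[ch] * w for ch, w in zip(vin, WEIGHTS)) % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


# Hypothetical N-best list in which only one hypothesis passes the check.
nbest = ["1M8GDM9AXKP042789", "1M8GDM9AXKP042788"]
print([vin for vin in nbest if vin_check_digit_ok(vin)])
```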

IT DEPENDS!

But which way is “the best”?
