Phonetic Search for Multiple Choice Question By Chung-Hsien (Jacky) Yu 01/06/2016 1

Phonetic Search for Multiple Choice Question

By Chung-Hsien (Jacky) Yu

01/06/2016

Problem Definition

selection list = [
    (1, "Montague Expressway, Milpitas, CA"),
    (2, "5120 North 1st Street, San Jose, CA"),
    (3, "2870 Zanker Road, San Jose, CA")]
query = "Montag"

• Select the string from a given list whose sound is most similar to the sound of the query string.

• Select by the given index number. (1, 2, 3)
• Select by the ordinal sequence. (First, second, last)

Beider-Morse Phonetic Matching

• Encodes words by their sound.

• Words with the same sound receive the same encoding.

• Recognizes that words spelled differently can be phonetically equivalent or sound alike.

• Other encoding methods, such as Soundex, do not encode the vowels (a, e, i, …), but BMPM does.

Source: http://stevemorse.org/phonetics/bmpm.htm
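
To illustrate the point about vowels, here is a minimal sketch of classic American Soundex (not BMPM, and not code from the slides). The function name `soundex` and the sample words are illustrative assumptions. Because vowels after the first letter are skipped, strings that differ only in their vowels collapse to the same code.

```python
# Consonant groups used by classic American Soundex.
GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(word):
    """Return a 4-character Soundex code (illustrative sketch)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate equal neighbouring codes
        code = CODES.get(ch, "")  # vowels map to "" and are not emitted
        if code and code != previous:
            result += code
        previous = code
    return (result + "000")[:4]


# Different vowels, same code: Soundex cannot tell these two apart.
print(soundex("Montag"), soundex("Mintog"))
```

An encoding that keeps vowel information, as BMPM does, can separate such pairs.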

BMPM Encoding List

Symbol  Example
a       Like in part
b       Like in boy
d       Like in dog
e       Like in set
f       Like in flag
g       Like in dog
h       Like in hand
i       Like in Nice (the city), or ee as in fleet
j       Like y in yes, equivalent to German j
k       Like in king
l       Like in lamp
m       Like in man
n       Like in neck
o       Like in port
p       Like in pot
r       Like in ring
s       Like in star
t       Like in tent
u       Like in flu, or oo in good
v       Like in vase
w       Like in wax
x       Like ch in loch; equivalent to German ch
z       Like in zoo
S       Like s in sure, or sh in shop
Z       Like z in azure; equivalent to French j

Source: http://stevemorse.org/phonetics/bmpm.htm

BMPM Implementation

• http://stevemorse.org/phoneticinfo.htm

• pip install abydos

• from abydos.phonetic import bmpm

BMPM Function

bmpm(word, language_arg=0, name_mode='gen', match_mode='approx', concat=False, filter_langs=False):

    str word: the word to transform
    str language_arg: the language of the term; supported values
    str name_mode: the name mode of the algorithm:
    str match_mode: matching mode: 'approx' or 'exact'
    bool concat: concatenation mode
    bool filter_langs: filter out incompatible languages
    returns: the BMPM value(s)
    rtype: tuple

BMPM Encoding

“Starbucks” = ['sterbuks', 'sterbaks', 'storbuks', 'storbaks', 'starbuks', 'starbaks']

bmpm('Starbucks', 'english', 'gen', 'exact', False, True).split(" ")

The codes are the combinations of possible pronunciations.

BMPM Combination Codes

“Starbucks” = ['sterbuks', 'sterbaks', 'storbuks', 'storbaks', 'starbuks', 'starbaks']

“startbuck” = ['sterdbuk', 'sterdbak', 'stordbuk', 'stordbak', 'stardbuk', 'stardbak']

Compare the two lists of codes to find the similarity between the two words.

Comparing Two Codes

• Levenshtein distance: the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.

https://en.wikipedia.org/wiki/Levenshtein_distance

Levenshtein Distance

"kitten" and "sitting" = 3
1. kitten → sitten (substitution of "s" for "k")
2. sitten → sittin (substitution of "i" for "e")
3. sittin → sitting (insertion of "g" at the end)

https://en.wikipedia.org/wiki/Levenshtein_distance
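
The definition above can be sketched with the standard dynamic-programming recurrence. This is a plain-Python illustration rather than the library code used in the talk; `levenshtein_distance` is a hypothetical name.

```python
def levenshtein_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3, matching the example above
```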

Python Levenshtein Distance

pip install python-levenshtein
import Levenshtein

similarity = Levenshtein.ratio(string1, string2)
• Compute similarity of two strings.
• The similarity is a number between 0 and 1.
• 1 means that they are the same string.
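
The sketch below shows one way a 0-to-1 similarity can be derived from an edit distance. It assumes the ratio is (len1 + len2 - d) / (len1 + len2), where d counts a substitution as a deletion plus an insertion. This is believed to mirror how `Levenshtein.ratio` is defined, but that is an assumption worth checking against the library documentation. `similarity_ratio` is a hypothetical name.

```python
def similarity_ratio(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical (sketch)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    # Longest common subsequence length via dynamic programming.
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    lcs = previous[-1]
    # With substitutions costing 2, the edit distance equals total - 2 * lcs.
    distance = total - 2 * lcs
    return (total - distance) / total


print(round(similarity_ratio("kitten", "sitting"), 3))
```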

Comparing Two Code Lists

Query = [str1, str2, str3]
S = [str1, str2, str3]

          Q.str1   Q.str2   Q.str3
S.str1    1        0.4      0.6
S.str2    0.5      0.9      0.8
S.str3    0.6      0.3      0.7
Max.      1        0.9      0.8      Avg = 0.9

0.9 is the matching score between Q and S.
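
Read from the table, the score appears to be computed by taking, for each query code, the best similarity against any selection code, and then averaging those maxima. A minimal sketch under that reading follows; `matching_score` and the `similarity` callable are hypothetical names, and any pairwise similarity in [0, 1] can be plugged in.

```python
def matching_score(query_codes, selection_codes, similarity):
    """Average, over query codes, of the best similarity to any selection code."""
    if not query_codes or not selection_codes:
        return 0.0  # assumption: nothing to compare means no match
    best_per_query = [
        max(similarity(q, s) for s in selection_codes) for q in query_codes
    ]
    return sum(best_per_query) / len(best_per_query)


# Reproduce the table above with a lookup standing in for a real similarity.
TABLE = {
    ("Q.str1", "S.str1"): 1.0, ("Q.str2", "S.str1"): 0.4, ("Q.str3", "S.str1"): 0.6,
    ("Q.str1", "S.str2"): 0.5, ("Q.str2", "S.str2"): 0.9, ("Q.str3", "S.str2"): 0.8,
    ("Q.str1", "S.str3"): 0.6, ("Q.str2", "S.str3"): 0.3, ("Q.str3", "S.str3"): 0.7,
}
table_score = matching_score(
    ["Q.str1", "Q.str2", "Q.str3"],
    ["S.str1", "S.str2", "S.str3"],
    lambda q, s: TABLE[(q, s)],
)
print(round(table_score, 2))  # 0.9
```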

Matching the Query

• The selection with the highest matching score is chosen as the best match for the query.

• The index of that selection is returned.
• If the highest matching score is lower than a threshold, the choice is indecisive and None is returned.
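
The selection rule above can be sketched as follows. The function name `best_match`, the input shape (a list of `(index, score)` pairs) and the default threshold of 0.5 are assumptions for illustration; the slides do not state the threshold value.

```python
def best_match(scored_selections, threshold=0.5):
    """Return the index with the highest score, or None if it is below threshold.

    scored_selections: list of (index, matching_score) pairs.
    threshold: hypothetical cut-off; tune it for the application.
    """
    if not scored_selections:
        return None
    index, score = max(scored_selections, key=lambda pair: pair[1])
    return index if score >= threshold else None


print(best_match([(1, 0.9), (2, 0.4), (3, 0.6)]))  # 1
print(best_match([(1, 0.3), (2, 0.2)]))            # None (indecisive)
```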

The Numbers

“2870 Zanker Road” = ['zenker', 'zonker', 'zanker', 'rout']

All the numbers are ignored!

Convert the Numbers to Strings

“2870 Zanker Road” = “two thousand, eight hundred and seventy Zanker Road”

pip install num2words
from num2words import num2words

text = num2words(number)
text = num2words(number, ordinal=True)
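
As a sketch of how the conversion might be applied to a whole string, the helper below replaces every run of digits using a converter passed in as an argument. `spell_out_numbers` and `demo_to_words` are hypothetical names; `demo_to_words` is a tiny stand-in so the example runs without extra packages, and `num2words` itself could be passed instead since it accepts the same `ordinal` keyword.

```python
import re

# Matches "2870" as well as ordinal forms such as "1st" or "3rd".
NUMBER_PATTERN = re.compile(r"\b(\d+)(st|nd|rd|th)?\b")


def spell_out_numbers(text, to_words):
    """Replace digit runs in text with words produced by to_words(n, ordinal=...)."""
    def replace(match):
        number = int(match.group(1))
        is_ordinal = match.group(2) is not None
        return to_words(number, ordinal=is_ordinal)
    return NUMBER_PATTERN.sub(replace, text)


def demo_to_words(number, ordinal=False):
    """Stand-in converter covering only 1 to 3 (illustration only)."""
    words = {1: ("one", "first"), 2: ("two", "second"), 3: ("three", "third")}
    return words[number][1 if ordinal else 0]


print(spell_out_numbers("number 2 on 1st Street", demo_to_words))
```

Applying the same helper to both the query and the selection strings keeps the two sides consistent before phonetic encoding.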

Why Number to String?

• Convert the numbers in both the query and the selection strings for consistency.

• Allows selecting by the numbers included in the string.
• The query string can use digits ('1', '2', …) or words ('one', 'two', …) for selection.
• Could be extended to select by the ordinal sequence (first, second, last).

Select by Index

selection list = [
    (6, "Montague Expressway, Milpitas, CA"),
    (7, "5120 North 1st Street, San Jose, CA"),
    (8, "2870 Zanker Road, San Jose, CA")]
query = "number 6" or "number six"

Add the index to the string:
[
    ("six Montague Expressway, Milpitas, CA"),
    ("seven 5120 North 1st Street, San Jose, CA"),
    ("eight 2870 Zanker Road, San Jose, CA")]
query = "number six"
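
A minimal sketch of this step is shown below. `add_index_words` is a hypothetical name, and the small `INDEX_WORDS` lookup stands in for a real number-to-words converter such as `num2words`.

```python
def add_index_words(selections, to_words):
    """Prefix each selection string with its index spelled out in words."""
    return [(index, f"{to_words(index)} {text}") for index, text in selections]


# Stand-in converter for the three indices used on this slide.
INDEX_WORDS = {6: "six", 7: "seven", 8: "eight"}

selections = [
    (6, "Montague Expressway, Milpitas, CA"),
    (7, "5120 North 1st Street, San Jose, CA"),
    (8, "2870 Zanker Road, San Jose, CA"),
]
indexed = add_index_words(selections, INDEX_WORDS.__getitem__)
for index, text in indexed:
    print(index, text)
```

Keeping the original index next to the augmented string means the matcher can still return the index once a best match is found.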

Select by Order

selection list = [
    (6, "Montague Expressway, Milpitas, CA"),
    (7, "5120 North 1st Street, San Jose, CA"),
    (8, "2870 Zanker Road, San Jose, CA")]
query = "the first one"

Add the ordinal index to the string:
[
    ("first Montague Expressway, Milpitas, CA"),
    ("second 5120 North 1st Street, San Jose, CA"),
    ("third last 2870 Zanker Road, San Jose, CA")]
query = "the first one"
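
The ordinal variant can be sketched the same way. `add_ordinal_words` and the fixed `ORDINAL_WORDS` list are illustrative assumptions; a converter such as `num2words(position, ordinal=True)` could replace the list for longer selections.

```python
# Stand-in ordinal words; lists longer than five items would need a real converter.
ORDINAL_WORDS = ["first", "second", "third", "fourth", "fifth"]


def add_ordinal_words(selections):
    """Prefix each selection with its position; the final one also gets 'last'."""
    augmented = []
    last_position = len(selections) - 1
    for position, (index, text) in enumerate(selections):
        prefix = ORDINAL_WORDS[position]
        if position == last_position:
            prefix += " last"  # lets a query such as "the last one" match
        augmented.append((index, f"{prefix} {text}"))
    return augmented


ordered = add_ordinal_words([
    (6, "Montague Expressway, Milpitas, CA"),
    (7, "5120 North 1st Street, San Jose, CA"),
    (8, "2870 Zanker Road, San Jose, CA"),
])
for index, text in ordered:
    print(index, text)
```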