Searching with Autocompletion: cool algorithms for a cool feature
Holger Bast [email protected]
Ordinary Inverted Index
~ |W| time per query

A: 1, 3, 7, 8
B: 1, 3, 5, 8, 10
C: 5, 10
D: 2, 5, 7, 8

Doc 1: A B
Doc 2: D
Doc 3: A B
Doc 4: -
Doc 5: B C D
Doc 6: -
Doc 7: A D
Doc 8: A B D
Doc 9: -
Doc 10: B C

Our new data structure
~ |W'| time per query

[Figure: the new data structure for the same ten documents, showing the word ranges A-D, A-B and C-D with associated bit vectors]
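As a rough illustration of the ordinary inverted index on this slide, the following Python sketch builds the word-to-document lists from the ten example documents. The names DOCS, build_inverted_index and INDEX are made up for this sketch; it shows only the baseline index, not the new data structure.

```python
from collections import defaultdict

# The ten example documents from the slide; documents 4, 6 and 9 are empty.
DOCS = {
    1: ["A", "B"], 2: ["D"], 3: ["A", "B"], 4: [], 5: ["B", "C", "D"],
    6: [], 7: ["A", "D"], 8: ["A", "B", "D"], 9: [], 10: ["B", "C"],
}

def build_inverted_index(docs):
    """Map each word to the sorted list of ids of the documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # set() guards against a word occurring twice in one document.
        for word in sorted(set(docs[doc_id])):
            index[word].append(doc_id)
    return dict(index)

INDEX = build_inverted_index(DOCS)
for word in sorted(INDEX):
    print(word, INDEX[word])
```

The printed lists should agree with the lists for A, B, C and D given above.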
The core algorithmic problem
Given a range of words W (all completions of the last word the user has started typing) and a set of documents D (the hits of the preceding part of the query), compute the subset W' ⊆ W of words that occur in at least one document of D and the subset D' ⊆ D of documents containing a word from W'.
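The problem statement above can be made concrete with a minimal Python sketch of the straightforward solution on an ordinary inverted index: intersect the list of every word in W with D. This is the baseline whose work grows with |W|, not the new data structure from the talk. The function name complete and the sample inputs are assumptions chosen for illustration; the index is the A-D example.

```python
def complete(index, word_range, hits):
    """Baseline: one list intersection per word in W, so work grows with |W|."""
    hits = set(hits)
    w_prime, d_prime = [], set()
    for word in word_range:
        matching = hits.intersection(index.get(word, ()))
        if matching:
            w_prime.append(word)   # the word occurs in at least one document of D
            d_prime |= matching    # those documents belong to D'
    return w_prime, sorted(d_prime)

# Inverted index of the ten example documents.
INDEX = {"A": [1, 3, 7, 8], "B": [1, 3, 5, 8, 10], "C": [5, 10], "D": [2, 5, 7, 8]}

# Hypothetical input: W = {A, B, C}, D = {2, 5, 10}.
w_prime, d_prime = complete(INDEX, ["A", "B", "C"], [2, 5, 10])
print(w_prime, d_prime)
```

Here A is dropped because none of its documents lie in D, and document 2 is dropped because it contains no word from W.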
For the query "adfocs rag", all words are displayed that start with "rag" and that occur in a document that also contains a word starting with "adfocs".
This list of completions is updated with every new letter you type!
The list of top hits is also updated with every new letter you type.
It is often surprising how little one has to type to get to what one was looking for!
Please try it yourself on the MPII homepage.
Type a ? for help before you start searching!
Typically, |W'| << |W|.
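To connect the "adfocs rag" query with the sets W, D, W' and D', here is a hedged end-to-end sketch in Python. It assumes a sorted vocabulary, so that all completions of a prefix form a contiguous range found by binary search, and it uses a small made-up index; the words, document ids and function names are hypothetical and not taken from the MPII collection. It again follows the baseline approach over an inverted index and assumes a non-empty query.

```python
from bisect import bisect_left

# Hypothetical toy index: word -> sorted ids of the documents containing it.
INDEX = {
    "adfocs": [1, 2], "algorithm": [1, 3], "rabbit": [4], "race": [5],
    "ragged": [2, 7], "ragtime": [9], "rain": [3], "random": [1], "range": [2],
}
VOCAB = sorted(INDEX)

def prefix_range(vocabulary, prefix):
    """Return (lo, hi) so that vocabulary[lo:hi] are the words starting with prefix.

    Assumption: no word contains the largest code point U+10FFFF.
    """
    lo = bisect_left(vocabulary, prefix)
    hi = bisect_left(vocabulary, prefix + chr(0x10FFFF))
    return lo, hi

def docs_with_prefix(prefix):
    lo, hi = prefix_range(VOCAB, prefix)
    return {doc for word in VOCAB[lo:hi] for doc in INDEX[word]}

def autocomplete(query):
    """Return (W', D') for a non-empty query; the last word is the prefix being typed."""
    *context, last = query.split()
    hits = None  # None means no restriction yet (single-word query)
    for prefix in context:
        docs = docs_with_prefix(prefix)
        hits = docs if hits is None else hits & docs
    lo, hi = prefix_range(VOCAB, last)
    completions, top_hits = [], set()
    for word in VOCAB[lo:hi]:  # one pass over W, the baseline cost
        matching = set(INDEX[word]) if hits is None else hits & set(INDEX[word])
        if matching:
            completions.append(word)
            top_hits |= matching
    return completions, sorted(top_hits)

print(autocomplete("adfocs rag"))
```

In this toy data W for "rag" is the range ragged..ragtime, and only "ragged" shares a document with a word starting with "adfocs", so W' is smaller than W.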