Searching with Autocompletion
cool algorithms for a cool feature

Holger Bast
[email protected]

The core algorithmic problem

Given a range of words W (all completions of the last word the user has started typing) and a set of documents D (the hits of the preceding part of the query), compute the subset W' ⊆ W of words that occur in at least one document of D and the subset D' ⊆ D of documents containing a word from W'.

Typically |W'| << |W|.

Ordinary Inverted Index: ~ |W| time per query

A: 1,3,7,8
B: 1,3,5,8,10
C: 5,10
D: 2,5,7,8

Doc 1: A B
Doc 2: D
Doc 3: A B
Doc 4: -
Doc 5: B C D
Doc 6: -
Doc 7: A D
Doc 8: A B D
Doc 9: -
Doc 10: B C

Our new data structure: ~ |W'| time per query

[Figure: tree with nodes A-D, A-B, and C-D, annotated with bit vectors]

For the query "adfocs rag", all words are displayed that start with rag and that occur in a document that also contains a word starting with adfocs.

This list of completions is updated with every new letter you type! The list of top hits is also updated with every new letter you type. It is often surprising how little one has to type to get to what one was looking for!

Please try it yourself on the MPII homepage. Type a ? for help before you start searching!
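To make the core problem concrete, here is a minimal Python sketch of the baseline approach with an ordinary inverted index, which scans the list of every word in W and therefore takes time on the order of |W| per query. It is not the new data structure from the poster. The inverted lists are the poster's example; the function names `word_range` and `complete` and the small vocabulary in the usage lines are assumptions made for illustration.

```python
from bisect import bisect_left

# Inverted lists from the poster's example: word -> sorted list of document ids.
INVERTED_INDEX = {
    "A": [1, 3, 7, 8],
    "B": [1, 3, 5, 8, 10],
    "C": [5, 10],
    "D": [2, 5, 7, 8],
}


def word_range(vocabulary, prefix):
    """Return the range W: all words in a sorted vocabulary starting with prefix."""
    lo = bisect_left(vocabulary, prefix)  # first word >= prefix
    hi = lo
    while hi < len(vocabulary) and vocabulary[hi].startswith(prefix):
        hi += 1
    return vocabulary[lo:hi]


def complete(index, words, docs):
    """Compute (W', D') by intersecting D with the inverted list of each word in W."""
    docs = set(docs)
    matching_words = []    # W': words of W occurring in at least one document of D
    matching_docs = set()  # D': documents of D containing a word from W'
    for word in words:
        hits = docs.intersection(index.get(word, []))
        if hits:
            matching_words.append(word)
            matching_docs |= hits
    return matching_words, sorted(matching_docs)


# Hypothetical vocabulary, used only to show how a typed prefix becomes a word range.
print(word_range(["rabbit", "rag", "rage", "ragged", "rain"], "rag"))

# W = A..D, D = {2, 7}: only A and D occur in those documents.
print(complete(INVERTED_INDEX, ["A", "B", "C", "D"], [2, 7]))
```

The loop touches every word in W even when most of them do not occur in D, which is the cost the poster's ~ |W'| bound is meant to avoid.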

Post on 20-Jan-2016

