Why tune relevance

Because we want to find the single best item among a large group of possible candidates.

Multiple levels of control

Relevancy, Ranking, Precision, Recall

Levels of control:
• Business Rules
• InPerspective™
• Core Algorithmic Model
• Application Model

FAST Relevancy Framework: multiple levels of control

• Application Model. Accessible to: End Users. Control mechanisms: sorting order, navigation, relevance feedback
• Business Rules. Accessible to: Business Managers. Control mechanisms: query and document “boosting” (BMCP)
• InPerspective™. Accessible to: Administrator. Control mechanisms: “Rank Profile”
• Core Algorithmic Model. Accessible to: Developer. Control mechanisms: algorithm “weights”

FAST Relevancy Framework: InPerspective™

Freshness
• How fresh is the document compared to the time of the query?

Completeness
• How well does the query match superior contexts like the title or the URL?
• Example: query=”Mexico”. Is ”Mexico” or ”University of New Mexico” best?

Authority
• Is the document considered an authority for this query?
• Examples: Web link cardinality, article references, product revenue, page impressions, ...

Statistics
• How well does the content of this document match the query overall?
• Examples: proximity, context weights, tf-idf, degree of linguistic normalization, and more

Quality
• What is the quality of the document?
• Examples: Homepage?, Press release?, ...

Distance
• What is the distance from where I am?
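To make two of these dimensions concrete, the sketch below scores completeness as the fraction of a context's tokens that the query covers, and freshness as an exponential decay of document age. Both formulas, the function names, and the 30-day half-life are illustrative assumptions, not the FAST ESP implementation.

```python
from datetime import datetime, timedelta


def completeness(query: str, context: str) -> float:
    """Fraction of the context's tokens that are matched by query tokens.

    Hypothetical formula: a title that is exactly the query scores 1.0,
    while a longer title that merely contains the query scores lower.
    """
    query_tokens = set(query.lower().split())
    context_tokens = context.lower().split()
    if not context_tokens:
        return 0.0
    matched = sum(1 for token in context_tokens if token in query_tokens)
    return matched / len(context_tokens)


def freshness(doc_time: datetime, query_time: datetime,
              half_life_days: float = 30.0) -> float:
    """Exponential decay of document age; 1.0 means published at query time."""
    age_days = max((query_time - doc_time).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    print(completeness("Mexico", "Mexico"))                    # 1.0
    print(completeness("Mexico", "University of New Mexico"))  # 0.25
    now = datetime(2008, 1, 31)
    print(freshness(now - timedelta(days=30), now))            # 0.5
```

Under this assumed formula, the title ”Mexico” is the more complete match for the query ”Mexico”, which matches the intuition behind the slide's example.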

FAST Relevancy Framework: Rank Profile

Rank-Profile: A Relevancy Mixing Board

Controls on the mixing board:
• Authority
• Freshness
• Proximity
• Context: Title, Keywords, URL, Body, Description

Example rank profiles, each with its own setting for every control:
• Rank-Profile: Financial News
• Rank-Profile: Default (Intranet)
• Rank-Profile: Wealth Management
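The mixing-board idea can be sketched as a weighted average of per-document signals, with one weight per slider. The class below is a minimal illustration under that assumption: the `RankProfile` type, the single `context` weight (the slide splits context into title, keywords, URL, body, and description), and all numeric values are invented, since the slide's actual slider positions are not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankProfile:
    """Slider positions (weights) for one profile; names mirror the slide."""
    name: str
    authority: float
    freshness: float
    proximity: float
    context: float

    def score(self, signals: dict[str, float]) -> float:
        """Weighted average of per-document signals, each assumed in [0, 1]."""
        weights = {
            "authority": self.authority,
            "freshness": self.freshness,
            "proximity": self.proximity,
            "context": self.context,
        }
        total = sum(weights.values())
        if total == 0:
            return 0.0
        return sum(w * signals.get(k, 0.0) for k, w in weights.items()) / total


# Invented weights, for illustration only.
FINANCIAL_NEWS = RankProfile("Financial News", authority=1, freshness=4,
                             proximity=1, context=2)
DEFAULT_INTRANET = RankProfile("Default (Intranet)", authority=2, freshness=1,
                               proximity=2, context=3)

# Two hypothetical documents that differ mainly in authority and freshness.
old_report = {"authority": 0.9, "freshness": 0.1, "proximity": 0.5, "context": 0.5}
breaking_story = {"authority": 0.3, "freshness": 1.0, "proximity": 0.5, "context": 0.5}

for profile in (FINANCIAL_NEWS, DEFAULT_INTRANET):
    ranked = sorted(
        [("old_report", old_report), ("breaking_story", breaking_story)],
        key=lambda item: profile.score(item[1]),
        reverse=True,
    )
    print(profile.name, [name for name, _ in ranked])
```

With these invented weights the same two documents swap places depending on the profile: the freshness-heavy profile puts the breaking story first, while the intranet profile favours the authoritative report.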

Search Business Center (SBC)

FAST Unity™

What It Does, How It Works, and What Value It Provides

FAST Unity at a Glance

[Diagram: Front-end Search Application; FAST ESP Federation; Internal Sources (Search Index); External Sources (e.g. another ESP instance, Web Search Engine, Web Site, …)]

FAST Sources
• FAST ESP 5.x
• FAST Data Search 4.x
• FAST ImPulse
• FAST AdMomentum
• FAST RetrievalWare

External Sources
• Microsoft SharePoint 2003 & 2007
• Web search engines: Google, Yahoo, OpenSearch, Gigablast
• Web services: Match.com, PriceGrabber, Google Image
• Advertising services: Google AdSense

Look and feel - Unity

• Featured Content
• Calls-to-Action
• Ads
• User-generated Content
• Third-party Content
• Multimedia
• Subscription Feeds

Example Web 2.0 Model

[Diagram: Query; UNITY; Source Queries; AdMo]

– One query, multiple result sets
– Results are returned asynchronously
– Delivered directly to the browser
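The one-query, many-result-sets pattern can be sketched with Python's `asyncio`: the query fans out to every source at once, and each result set is handed on as soon as it arrives rather than after the slowest source finishes. This is not the Unity API; `query_source`, `federated_search`, the source names, and the simulated latencies are all hypothetical stand-ins.

```python
import asyncio
from typing import AsyncIterator


async def query_source(name: str, query: str, delay: float) -> tuple[str, list[str]]:
    """Stand-in for a real source call; `delay` simulates network latency."""
    await asyncio.sleep(delay)
    return name, [f"{name} hit for '{query}'"]


async def federated_search(
    query: str, sources: dict[str, float]
) -> AsyncIterator[tuple[str, list[str]]]:
    """Send one query to every source and yield each result set as it arrives."""
    tasks = [asyncio.create_task(query_source(name, query, delay))
             for name, delay in sources.items()]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(query: str, sources: dict[str, float]) -> list[str]:
    """Return source names in the order their result sets arrived."""
    return [name async for name, _hits in federated_search(query, sources)]


if __name__ == "__main__":
    # Hypothetical sources and latencies (seconds).
    demo_sources = {"index": 0.03, "web": 0.09, "ads": 0.01}
    print(asyncio.run(collect("mexico", demo_sources)))
```

In a browser-facing deployment, each yielded result set would be pushed to the page independently, so a slow external source does not hold up the internal index results.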

FAST ESP - Scalability

3D Scalability: #Documents - #Users - Index Latency (Document Freshness)

Single Search Node Performance
– 20-50 million documents, up to 1 TB of information
– 100-500 queries per second
– 20-50 ms query response time
– Down to 50 ms indexing latency
– Indexing 50+ documents per second while maintaining search performance

Dual Pentium 4, 3 GHz; 4 GB RAM; 3 x SCSI 15K rpm; HW RAID-0 derivative

FAST Scalability Facts:
• Deployments with >40 TB
• Deployments with >3B documents
• Deployments with 1 to 1000+ servers
• Deployments with 1000s of queries per second
• Deployments with >500 updates per second
• 20-50 ms query response time
• Sub-second indexing latency
• Crawling >200 documents per second per server
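The single-node figures above suggest a back-of-the-envelope sizing calculation. The sketch below assumes linear scaling and a grid layout in which "columns" partition the index and "rows" replicate it for query capacity; the function name and the choice of the conservative end of each range (20 million documents and 100 QPS per node) as defaults are assumptions for illustration, not a FAST sizing guideline.

```python
import math


def size_cluster(total_docs: int, peak_qps: float,
                 docs_per_node: int = 20_000_000,
                 qps_per_node: float = 100.0) -> tuple[int, int]:
    """Return (columns, rows): index partitions and replicas of each partition.

    Assumes linear scaling: every column holds one slice of the index, and
    every row is a full copy of all columns that adds query capacity.
    """
    columns = math.ceil(total_docs / docs_per_node)
    rows = math.ceil(peak_qps / qps_per_node)
    return columns, rows


# Example: 3 billion documents at a peak of 1000 queries per second.
columns, rows = size_cluster(total_docs=3_000_000_000, peak_qps=1_000)
print(columns, rows, columns * rows)  # 150 10 1500
```

Under these assumptions, a 3-billion-document deployment at 1000 QPS needs on the order of 1500 search nodes, which is in line with the "1 to 1000+ servers" range quoted above.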

Query Performance of FAST Search vs. RDBMS
Proven High QPS, Low Latency Access – Database Offloading

• Structured data: 5 million records; 13 fields per record
• Structured queries: 22 SQL queries (representative in ERP)
• #1: FAST ESP4 w/ disk: mean = 99 ms, st.dev. = 36 ms
• #2: Oracle w/ memory mapping: mean = 4 057 ms, st.dev. = 9 368 ms

[Chart: # queries by response time [sec.], from 1/16 to 32; series: ESP5, RDBMS]

• Identical HW: single node, 2 CPU, 4 GB RAM, 3 SCSI disks
• Identical data: auction data from eBay, 3.6 million docs
• Identical queries: 200 queries defined by Oracle

[Chart: QPS and latency for FAST and ORA at 20, 50, and 100 users; series: ESP5, RDBMS]
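Taking the reported means and standard deviations for the 22-query test at face value, the latency ratio and the relative spread follow directly. The snippet below only restates that arithmetic; the variable names are illustrative.

```python
# Reported figures from the slide (milliseconds).
fast_mean, fast_std = 99.0, 36.0
oracle_mean, oracle_std = 4057.0, 9368.0

speedup = oracle_mean / fast_mean
fast_cv = fast_std / fast_mean        # coefficient of variation
oracle_cv = oracle_std / oracle_mean

print(f"mean latency ratio: {speedup:.1f}x")
print(f"relative spread: FAST {fast_cv:.2f}, Oracle {oracle_cv:.2f}")
```

The mean latency differs by roughly a factor of 41, and the RDBMS standard deviation exceeds its own mean, so its response times are also far less predictable than the search engine's.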

ESP5 Scalability
Efficiency Per Server & Linear Scaling

[Architecture diagram. Components: Documents, Content Refinement, Pluggable Content Dispatcher, Search, Search Index, Query, Query Processing, Query & Result Distribution, Result Processing]

ESP5 – Raising the Bar
Enabling the Adaptive Information Warehouse

SCALABLE
» Linear scaling of feeding capacity
» Archival solutions @ 40 PB
» 14G Search solution (14X Google)
» Feed @ >6000 updates/s
» Querying @ >2000 QPS

HIGH PERFORMING
» 100M documents per server
» >2X indexing throughput
» Consistent low latency
» Reduced disk footprint

RELIABLE
» Feeding architecture improved
» Simplified state management
» Improved fault-tolerance
» Out-of-the-box monitoring

FLEXIBLE
» End2End SOA philosophy
» Studio & programmatic extensibility
» Semantic index
» SAN/NAS optimizations

FAST ESP Competence Analysis

• Performance and scalability with commodity servers
• Multi-language support (70+ languages)
• Easy-to-use management tool and security control
• Relevancy/Precision: find what users want
• Navigation to quickly find what users want within a few clicks
• Add-on applications including Recommendation, Advertising promotion, Mobile access, DB cleansing/offloading, …
• 200+ connectors to popular content silos
• Extensibility and integration with open architecture
• Market leading #1
• Large R&D investment and commitment
