Why tune relevance?
Because we want to find the single best item among a large group of possible candidates.
Multiple levels of control
Relevancy: Ranking, Precision, Recall
Levels of control: Business Rules, InPerspective™, Core Algorithmic Model, Application Model
FAST Relevancy Framework: multiple levels of control
Level: accessible to; control mechanisms
• Application Model: end users; sorting order, navigation, relevance feedback
• Business Rules: business managers; query and document “boosting” (BMCP)
• InPerspective™: administrators; “Rank Profile”
• Core Algorithmic Model: developers; algorithm “weights”
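The four levels can be pictured as successive stages applied to one result list. The sketch below is a minimal illustration under stated assumptions, not FAST's implementation: the core model is reduced to precomputed per-factor scores, the rank profile to one weight per factor, a business rule to an additive per-document boost triggered by a query term, and the application layer to the end user's choice of sort order. All names (`Doc`, `BOOST_RULES`, `rank`) and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    factors: dict[str, float]  # hypothetical per-factor scores from the core model, 0..1
    date: int                  # publication date as YYYYMMDD, used for date sorting


# Hypothetical business rules: query term -> {doc_id: additive boost}
BOOST_RULES = {"mortgage": {"promo-rates": 0.5}}


def rank(query: str, docs: list[Doc], profile: dict[str, float],
         sort_by: str = "relevance") -> list[str]:
    scored = []
    for d in docs:
        # Core model + rank profile: weighted sum of the factor scores
        score = sum(profile.get(f, 0.0) * v for f, v in d.factors.items())
        # Business rules: boost documents promoted for any term in the query
        for term in query.lower().split():
            score += BOOST_RULES.get(term, {}).get(d.doc_id, 0.0)
        scored.append((score, d))
    # Application layer: the end user chooses the sort order
    if sort_by == "date":
        scored.sort(key=lambda pair: pair[1].date, reverse=True)
    else:
        scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d.doc_id for _, d in scored]
```

For example, with a profile of `{"statistics": 1.0, "freshness": 0.5}`, a document scoring higher on statistics wins for the query "rates", while the boosted `promo-rates` document moves to the top once the query contains "mortgage"; switching `sort_by` to `"date"` overrides both.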
FAST Relevancy Framework: InPerspective™
Freshness
• How fresh is the document relative to the time of the query?
Completeness
• How well does the query match superior contexts such as the title or the URL?
• Example: for query = “Mexico”, is “Mexico” or “University of New Mexico” the better match?
Authority
• Is the document considered an authority for this query?
• Examples: web link cardinality, article references, product revenue, page impressions, ...
Statistics
• How well does the content of this document as a whole match the query?
• Examples: proximity, context weights, tf-idf, degree of linguistic normalization, ...
Quality
• What is the quality of the document?
• Examples: Is it a homepage? A press release? ...
Distance
• What is the distance from where I am?
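Three of the InPerspective factors can be made concrete with small formulas. The sketch below is illustrative only: it assumes an exponential freshness decay with a hypothetical 7-day half-life, models completeness as the fraction of a field's terms covered by the query (so the title “Mexico” beats “University of New Mexico” for the query “Mexico”), and uses textbook tf-idf for the statistics factor. None of these formulas is documented here as FAST's own.

```python
import math


def freshness(age_days: float, half_life_days: float = 7.0) -> float:
    """Score halves every half_life_days (a hypothetical default); 1.0 at query time."""
    return 0.5 ** (age_days / half_life_days)


def completeness(query: str, field_text: str) -> float:
    """Fraction of the field's terms matched by the query (1.0 = exact field match)."""
    q_terms = set(query.lower().split())
    f_terms = field_text.lower().split()
    if not f_terms:
        return 0.0
    return sum(t in q_terms for t in f_terms) / len(f_terms)


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Raw term frequency times log inverse document frequency."""
    tf = doc.count(term)
    df = sum(term in d for d in corpus)
    return tf * math.log(len(corpus) / df) if df else 0.0
```

With these definitions, `completeness("Mexico", "Mexico")` is 1.0 while `completeness("Mexico", "University of New Mexico")` is 0.25, which captures the intuition of the completeness example.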
FAST Relevancy Framework: Rank Profile
Rank-Profile: A Relevancy Mixing Board
Sliders: Authority, Freshness, Proximity, Context, Body, Description, URL, Keywords, Title
Example rank profiles, each with its own setting for Description, Body, URL, Keywords, Title, Context, Proximity, Freshness, and Authority:
• Rank-Profile: Financial News
• Rank-Profile: Default (Intranet)
• Rank-Profile: Wealth Management
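A rank profile can be read as a set of slider positions, one weight per factor, applied to the same per-factor scores. The slider values from the example profiles are not given in the text, so the weights below are invented purely to show the effect: a news-oriented profile that favors freshness and a wealth-management profile that favors authority order the same two documents differently. The profile names follow the slide; the weights, documents, factor scores, and functions are hypothetical.

```python
# Hypothetical slider settings (0..1) for two of the example rank profiles.
PROFILES: dict[str, dict[str, float]] = {
    "Financial News": {"freshness": 0.9, "authority": 0.3, "title": 0.5, "body": 0.4},
    "Wealth Management": {"freshness": 0.2, "authority": 0.9, "title": 0.5, "body": 0.4},
}

# Hypothetical per-factor scores (0..1) for two documents matching the same query.
DOCS: dict[str, dict[str, float]] = {
    "breaking-market-update": {"freshness": 1.0, "authority": 0.3, "title": 0.6, "body": 0.5},
    "annual-portfolio-guide": {"freshness": 0.1, "authority": 0.9, "title": 0.6, "body": 0.5},
}


def score(factors: dict[str, float], weights: dict[str, float]) -> float:
    # The "mixing board": each slider scales one factor's contribution.
    return sum(weights.get(name, 0.0) * value for name, value in factors.items())


def ranked(profile_name: str) -> list[str]:
    weights = PROFILES[profile_name]
    return sorted(DOCS, key=lambda doc_id: score(DOCS[doc_id], weights), reverse=True)
```

Under these made-up weights, "Financial News" ranks the breaking update first (1.49 vs. 0.86), while "Wealth Management" ranks the portfolio guide first (1.33 vs. 0.97), without any change to the documents or the core scores.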
Search Business Center (SBC)
FAST Unity™
What It Does, How It Works, and What Value It Provides
FAST Unity at a Glance
Diagram components: front-end search application, FAST ESP, federation, search index, internal sources, external sources (e.g., another ESP instance), … web search engine, web site …
FAST sources
• FAST ESP 5.x
• FAST Data Search 4.x
• FAST ImPulse
• FAST AdMomentum
• FAST RetrievalWare
External sources
• Microsoft SharePoint 2003 & 2007
• Web search engines: Google, Yahoo, OpenSearch, Gigablast
• Web services: Match.com, PriceGrabber, Google Image
• Advertising services: Google AdSense
Look and feel: Unity
• Featured Content
• Calls-to-Action
• Ads
• User-Generated Content
• Third-Party Content
• Multimedia
• Subscription Feeds
Example: Web 2.0 Model
Diagram: Query → UNITY → source queries (including AdMo)
• One query, multiple result sets
• Results are returned asynchronously
• Delivered directly to the browser
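The "one query, multiple result sets, returned asynchronously" pattern is a concurrent fan-out in which each source's results are handed on as soon as they arrive rather than after the slowest source finishes. The sketch below shows that pattern with Python's `asyncio`; it is not Unity's API. The source names, simulated delays, and result strings are hypothetical, and `asyncio.sleep` stands in for a real network call.

```python
import asyncio
from typing import AsyncIterator

# Hypothetical sources and their simulated response times in seconds.
SOURCES = {"index": 0.05, "sharepoint": 0.1, "ads": 0.01}


async def query_source(name: str, delay: float, query: str) -> tuple[str, list[str]]:
    await asyncio.sleep(delay)  # stands in for a network call to the source
    return name, [f"{name}:{query}:{i}" for i in range(2)]


async def federate(query: str) -> AsyncIterator[tuple[str, list[str]]]:
    # One query fans out to every source concurrently...
    tasks = [asyncio.create_task(query_source(n, d, query)) for n, d in SOURCES.items()]
    # ...and each result set is yielded as soon as its source answers.
    for next_done in asyncio.as_completed(tasks):
        yield await next_done


async def main() -> list[str]:
    arrival_order = []
    async for name, hits in federate("mexico"):
        print(name, hits)  # a browser client would render this block immediately
        arrival_order.append(name)
    return arrival_order


if __name__ == "__main__":
    asyncio.run(main())
```

Result sets arrive in order of source latency (ads, index, sharepoint here), so a fast ad or index block can be displayed while a slower source is still working.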
Single Search Node Performance
• 20-50 million documents
• Up to 1 TB of information
• 100-500 queries per second
• 20-50 ms query response time
• Down to 50 ms indexing latency
• Indexing 50+ documents per second while maintaining search performance
FAST Scalability Facts:
• Deployments with >40 TB
• Deployments with >3B documents
• Deployments with 1 to 1000+ servers
• Deployments with 1000s of queries per second
• Deployments with >500 updates per second
• 20-50 ms query response time
• Sub-second indexing latency
• Crawling >200 documents per second per server
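The single-node figures support a back-of-the-envelope sizing estimate. The sketch below assumes a simple grid: documents are partitioned across nodes (each holding at most a fixed number of documents), every query visits one copy of each partition, and partitions are replicated until the replicas together absorb the target query rate. It ignores redundancy, indexing load, and headroom, so treat the result as a lower bound; the function name and defaults (the low end of the ranges above) are illustrative choices.

```python
import math


def nodes_needed(total_docs: int, total_qps: float,
                 docs_per_node: int = 20_000_000,
                 qps_per_node: float = 100.0) -> tuple[int, int, int]:
    """Return (partitions, replicas, total nodes) for a partition x replica grid."""
    # Enough partitions that no node holds more than docs_per_node documents.
    partitions = math.ceil(total_docs / docs_per_node)
    # Each query visits one replica of every partition, so each full row of
    # partitions carries qps_per_node queries per second; add rows until the
    # rows together reach the target rate.
    replicas = math.ceil(total_qps / qps_per_node)
    return partitions, replicas, partitions * replicas
```

For 100 million documents at 1,000 QPS, the low-end figures give 5 partitions × 10 replicas = 50 nodes, while the high-end figures (50 million documents, 500 QPS per node) give 2 × 2 = 4 nodes, which shows how sensitive sizing is to per-node capacity.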
FAST ESP - Scalability
3D Scalability: #Documents, #Users, Index Latency (document freshness)
Test hardware: dual Pentium 4, 3 GHz, 4 GB RAM; 3 × SCSI 15K rpm disks, hardware RAID-0 derivative
Query Performance of FAST Search vs. RDBMS: Proven High-QPS, Low-Latency Access (Database Offloading)
[Chart: number of queries (0-20) by response time in seconds (1/16 to 32), ESP5 vs. RDBMS]
• Structured data: 5 million records, 13 fields per record
• Structured queries: 22 SQL queries (representative of ERP workloads)
• #1: FAST ESP4 with disk: mean = 99 ms, std. dev. = 36 ms
• #2: Oracle with memory mapping: mean = 4,057 ms, std. dev. = 9,368 ms
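Results like these summarize per-query response times as a mean and a standard deviation. A minimal harness for collecting those two numbers might look like the following sketch; `run_query` is a hypothetical callable standing in for a call to the search engine or the database, and the sample standard deviation is used. For non-negative timings, a standard deviation more than twice the mean, as in the Oracle run, indicates that a few queries are far slower than the rest.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark(run_query: Callable[[str], object],
              queries: Sequence[str]) -> tuple[float, float]:
    """Run each query once; return (mean, sample std. dev.) of response times in ms.

    Needs at least two queries, because statistics.stdev requires two data points.
    """
    timings_ms = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)  # hypothetical call to the system under test
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings_ms), statistics.stdev(timings_ms)
```

In a fair comparison, both systems would be driven by the same harness with the same query list, on identical hardware and data.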
[Chart: QPS and latency (y-axis 0-900) for FAST vs. Oracle (ORA) at 20, 50, and 100 users; ESP5 vs. RDBMS]
• Identical hardware: single node, 2 CPUs, 4 GB RAM, 3 SCSI disks
• Identical data: auction data from eBay, 3.6 million documents
• Identical queries: 200 queries defined by Oracle
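In a closed-loop load test, where a fixed number of simulated users each wait for a response before sending the next query, throughput and latency are tied together by Little's law: QPS ≈ users / (latency + think time). The sketch below applies it. The default of zero think time is an assumption, because the slide does not say how the users were simulated.

```python
def expected_qps(users: int, latency_s: float, think_time_s: float = 0.0) -> float:
    """Little's law for a closed system: throughput = concurrency / cycle time."""
    return users / (latency_s + think_time_s)


def expected_latency(users: int, qps: float, think_time_s: float = 0.0) -> float:
    """Rearranged form: the mean latency implied by an observed throughput."""
    return users / qps - think_time_s
```

For example, 100 users each seeing 200 ms latency with no think time can drive at most about 500 QPS. If QPS stops rising as users are added (20, 50, 100), mean latency must be rising in proportion instead.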
ESP5 Scalability: Efficiency per Server & Linear Scaling
[Diagram: Documents → Content Refinement → Pluggable Content Dispatcher → Search Index nodes (...); Query → Query Processing → Query & Result Distribution → Search nodes (...) → Result Processing]
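The two halves of this architecture, a content dispatcher that spreads documents over index nodes and a query/result distributor that fans each query out and merges the answers, can be sketched as hash partitioning plus scatter-gather. This shows the general technique under assumptions, not ESP5 internals: `Partition`, `dispatch`, and `search` are hypothetical, and each node's "index" is a plain inverted index scored by raw term frequency. Because each node holds only its share of the documents and answers only for those, adding nodes adds capacity roughly in proportion, which is the intuition behind linear scaling.

```python
import hashlib
import heapq
from collections import defaultdict


class Partition:
    """One search node: a tiny inverted index over its share of the documents."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)  # term -> {doc_id: tf}

    def add(self, doc_id: str, text: str) -> None:
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query: str, k: int) -> list[tuple[int, str]]:
        scores: dict[str, int] = defaultdict(int)
        for term in query.lower().split():
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        # Local top k as (score, doc_id); ties fall back to comparing doc IDs.
        return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


def dispatch(doc_id: str, n_partitions: int) -> int:
    """Content dispatcher: a stable hash of the ID picks the document's partition."""
    return int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % n_partitions


def search(partitions: list[Partition], query: str, k: int = 3) -> list[str]:
    """Query distribution: scatter to every partition, gather, merge the global top k."""
    partial = [hit for p in partitions for hit in p.search(query, k)]
    return [doc_id for _, doc_id in heapq.nlargest(k, partial)]
```

Because every partition returns its own top k, the merged top k is the same as a single large index would produce, whatever the number of partitions.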
ESP5 – Raising the Bar
Enabling the Adaptive Information Warehouse
» Linear scaling of feeding capacity
» Archival solutions @ 40 PB
» 14G search solution (14X Google)
» Feeding @ >6,000 updates/s
» Querying @ >2,000 QPS
» 100M documents per server
» >2X indexing throughput
» Consistent low latency
» Reduced disk footprint
» Improved feeding architecture
» Simplified state management
» Improved fault tolerance
» Out-of-the-box monitoring
» End-to-end SOA philosophy
» Studio & programmatic extensibility
» Semantic index
» SAN/NAS optimizations
Themes: Scalable, High Performing, Reliable, Flexible
FAST ESP Competence Analysis
• Performance & scalability with commodity servers
• Support for 70+ languages
• Easy-to-use management tool and security control
• Relevancy/precision: find what users want
• Navigation: quickly find what users want within a few clicks
• Add-on applications, including recommendation, advertising promotion, mobile access, DB cleansing/offloading, …
• 200+ connectors to popular silos on the market
• Extensibility and integration through an open architecture
• Market leader (#1)
• Large R&D investment and commitment