enterprise search @epam

Post on 15-Jan-2015


Excellence in Software Engineering 1Confidential

Enterprise Search
• Best Practices
• Connector Framework
• Relevancy overview

SharePoint User Group, March 26, 2013, Minsk


Information.epam.com

XXX.epam.com

knowledgebase.epam.com

HR file shares

Jira.epam.com

YYY.epam.com

trainings.epam.com

Bla.bla.bla.epam.com

???????

EPAM has more than 100 systems


Little homework


We started POC in September 2012


Available as search.epam.com in November 2012

• SharePoint 2010
• FAST Search for SharePoint
• Branded Search Center
• Custom connectors
• Fine-tuned relevance to reflect EPAM landscape


We become stronger every day…

• 550 000 searchable items

• 30+ content sources

• 400+ daily searches

• Exposed to internet


… to help you search


What we’ve learned

1. Deploy a “painkiller” project as soon as possible
2. Connect as many systems as possible (Cap O. speaking)
3. Analyze
• Watch search logs
• Connect external analytics
• Speak with users
• Feedback forms suck
4. Tune relevancy
• Hot-fix bugs using best bets
5. Work with departments to adapt their content
• Basic SEO


Search Connectors in SP2010/2013

Search Connectors

Protocol Handlers

File Share

SharePoint

WebSite

People

BCS

Lotus Notes Exchange Custom BCS

Database

WebService

.NET


BCS Connectors in SP 2010/2013

Stereotyped Operations
• Get IDs
• Get By ID
• Describe Security
• Read Stream
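The four operations above can be pictured as a small interface that a crawler calls in a fixed order. This is a minimal sketch in Python, not the BCS API itself (real BCS connectors are built in .NET); the class names, method names, and the in-memory data are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class Connector(ABC):
    """Hypothetical interface mirroring the four operations on the slide."""

    @abstractmethod
    def get_ids(self) -> Iterable[str]:
        """Get IDs: enumerate every item the crawler should visit."""

    @abstractmethod
    def get_by_id(self, item_id: str) -> Dict[str, str]:
        """Get By ID: return the metadata of one item."""

    @abstractmethod
    def describe_security(self, item_id: str) -> List[str]:
        """Describe Security: return who may read the item."""

    @abstractmethod
    def read_stream(self, item_id: str) -> bytes:
        """Read Stream: return the item's binary content."""


class InMemoryConnector(Connector):
    """Toy implementation backed by a dict, for illustration only."""

    def __init__(self, items):
        self._items = items

    def get_ids(self):
        return sorted(self._items)

    def get_by_id(self, item_id):
        return {"id": item_id, "title": self._items[item_id]["title"]}

    def describe_security(self, item_id):
        return list(self._items[item_id]["readers"])

    def read_stream(self, item_id):
        return self._items[item_id]["body"].encode("utf-8")


def crawl(connector: Connector) -> List[dict]:
    """Full crawl: enumerate IDs, then fetch metadata, ACL and content per item."""
    docs = []
    for item_id in connector.get_ids():
        doc = dict(connector.get_by_id(item_id))
        doc["acl"] = connector.describe_security(item_id)
        doc["content"] = connector.read_stream(item_id).decode("utf-8")
        docs.append(doc)
    return docs
```

Keeping enumeration separate from per-item reads is what lets a crawler fetch only the items it needs on later passes.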


EPAM Data Import Framework

ISource
• Tree DescribeTree()
• Node DownloadData(Node)
• Implementations: Atlassian Confluence, SVN, PMC

IDestination
• Tree DescribeTree()
• void Import(Tree)
• Implementations: SharePoint Library, File System

IImporter

Workflow
1. Source builds its tree
2. Destination builds its tree
3. Diff the trees
4. Destination imports the diff (add, remove)

Timer Job
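The workflow can be sketched in a few lines. This is an illustration under stated assumptions, not the EPAM framework: the tree is flattened to a path-to-version dict, the class names `DictSource` and `DictDestination` are hypothetical, and the destination receives an explicit diff plus the source instead of the slide's `Import(Tree)` signature.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Tree = Dict[str, str]  # path -> version marker; a flattened stand-in for the slide's Tree


@dataclass
class Diff:
    add: List[str] = field(default_factory=list)     # new or changed paths
    remove: List[str] = field(default_factory=list)  # paths gone from the source


def diff_trees(source: Tree, destination: Tree) -> Diff:
    """Step 3: compare the two trees."""
    add = sorted(p for p, v in source.items() if destination.get(p) != v)
    remove = sorted(p for p in destination if p not in source)
    return Diff(add, remove)


class DictSource:
    """Hypothetical source: path -> (version, data)."""

    def __init__(self, docs):
        self.docs = docs

    def describe_tree(self) -> Tree:
        return {path: version for path, (version, _) in self.docs.items()}

    def download_data(self, path: str) -> bytes:
        return self.docs[path][1]


class DictDestination:
    """Hypothetical destination that stores imported items in memory."""

    def __init__(self):
        self.stored = {}

    def describe_tree(self) -> Tree:
        return {path: version for path, (version, _) in self.stored.items()}

    def import_diff(self, diff: Diff, source: DictSource) -> None:
        """Step 4: apply removals, then download and store additions."""
        versions = source.describe_tree()
        for path in diff.remove:
            del self.stored[path]
        for path in diff.add:
            self.stored[path] = (versions[path], source.download_data(path))


def run_import(source: DictSource, destination: DictDestination) -> Diff:
    """Steps 1-4 in order; a scheduled job could call this periodically."""
    diff = diff_trees(source.describe_tree(), destination.describe_tree())
    destination.import_diff(diff, source)
    return diff
```

Because only the diff is applied, a second run against an unchanged source does no work.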


BCS vs DataImport Comparison

                            Data Import   BCS
Effort to build the same         +         +
Document Previews                +         -
Indexing Speed                   +         +/-
Customizable                     +         -
Storage Space                    -         +
Unit Testing                     +         +/-
Incremental crawl                +         +/-


RELEVANCY


Search is a two-step process

0. User submits query

1. Get candidates: all docs that match query

2. Predict relevancy
• Query terms importance
• Proximity of query terms
• Hit location (managed property) importance
• Freshness
• Clicks
• User rating
• …
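The two steps can be shown on a toy corpus. This is a minimal sketch, not how FAST or SharePoint implement it: the documents, the function names, and the scoring rule (term hits plus a fixed bonus when the query appears as an exact phrase, a crude stand-in for proximity) are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical corpus: document id -> text.
DOCS = {
    "d1": "vacation policy for employees",
    "d2": "policy on vacation and sick leave vacation",
    "d3": "travel policy",
}


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def get_candidates(query: str, docs: Dict[str, str]) -> List[str]:
    """Step 1: keep every document that contains all query terms."""
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items() if terms <= set(tokenize(text))]


def predict_relevancy(query: str, text: str) -> float:
    """Step 2: score one candidate from term hits and a phrase bonus."""
    terms = tokenize(query)
    tokens = tokenize(text)
    term_hits = sum(tokens.count(term) for term in terms)
    n = len(terms)
    has_phrase = any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))
    return term_hits + (2.0 if has_phrase else 0.0)


def search(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Run both steps and return (doc_id, score) pairs, best first."""
    scored = [(doc_id, predict_relevancy(query, docs[doc_id]))
              for doc_id in get_candidates(query, docs)]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

For the query "vacation policy", `d3` never reaches step 2 because it lacks a query term, and `d1` outranks `d2` because the terms are adjacent even though `d2` has more hits.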


Relevancy in FAST Search

• Linear combination of features
• RankProfile
• Weights are configured via PowerShell
• Easy to understand via RankLog
• Easy tuning
– Content Source
– Managed Property
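A linear combination of features means the score is a weighted sum, which is why a per-feature breakdown is easy to read. The sketch below is an illustration only: the feature names and weights are invented and do not reflect the actual FAST rank profile schema or RankLog format.

```python
from typing import Dict, Tuple

# Hypothetical weights; a real rank profile defines its own components.
RANK_PROFILE = {
    "title_hit": 3.0,
    "body_hit": 1.0,
    "freshness": 0.5,
    "source_boost": 2.0,
}


def linear_score(features: Dict[str, float],
                 weights: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return the total score and each feature's contribution to it."""
    contributions = {name: weights.get(name, 0.0) * value
                     for name, value in features.items()}
    return sum(contributions.values()), contributions
```

Calling `linear_score({"title_hit": 1.0, "body_hit": 2.0, "freshness": 0.5}, RANK_PROFILE)` yields a total of 5.25 with contributions 3.0, 2.0 and 0.25. Raising one weight changes only that feature's contribution, which is what makes tuning per content source or per managed property straightforward.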


RankLog example (QueryLogger @codeplex)



Relevancy in SharePoint

• Nonlinear combination of features. Two Neural Networks.
• Ranking Model Schema described
– http://www.google.com/patents/US8296292
– http://www.google.com/patents/US7840569
• Cmdlets to import/export
• Default Ranking Model Features:

Type             Instance
BM25             BM25
Static           UrlDepth
BucketedStatic   InternalFileType
BucketedStatic   Language
Static           ClickDistance
Static           QueryLogClicks
Static           QueryLogSkips
Static           LastClicks
Static           EventRate
MinSpan - soft   Title
MinSpan - soft   Title
MinSpan - soft   Title
MinSpan - soft   Content
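To see what "nonlinear combination" changes compared with a weighted sum, here is a minimal sketch of a network with one tanh hidden layer. It is an assumption-laden illustration, not the SharePoint ranking model: the layer size, the weights, and the function name are invented.

```python
import math
from typing import List


def neural_score(features: List[float],
                 hidden_weights: List[List[float]],
                 hidden_bias: List[float],
                 output_weights: List[float]) -> float:
    """Feed features through one tanh hidden layer and a linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + bias)
        for row, bias in zip(hidden_weights, hidden_bias)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))


# Made-up parameters: two input features, two hidden nodes.
HIDDEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
HIDDEN_BIAS = [0.0, 0.0]
OUTPUT_WEIGHTS = [1.0, 1.0]
```

With these parameters, doubling every feature value less than doubles the score, because tanh saturates. A single very strong feature therefore cannot dominate the ranking the way it can in a purely linear model.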


ExplainRank page

• Google for “explain rank sharepoint”
• Parses the RankDetail managed property



Ranking Model Tuning

Approach described by Microsoft
– http://msdn.microsoft.com/en-us/library/bb499682(v=office.12).aspx

1. Collect Query Judgments
2. Use Machine Learning to train Neural Network
• namespace Microsoft.Office.Server.Search.RankerTuning
• Wait for tuning tool


Query Judgment framework
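Once query judgments are collected, they can be used to measure whether a ranking change helped. The slides do not say which metric was used; the sketch below assumes NDCG, a common choice, and the judgment scale and function names are hypothetical.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: later positions count less."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))


def ndcg(ranked_ids: List[str], judgments: Dict[str, float], k: int = 10) -> float:
    """Compare a returned ranking with the ideal ordering of judged documents.

    judgments maps a document id to a human relevance grade (0 = irrelevant).
    Unjudged documents are treated as irrelevant.
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that returns judged documents in grade order scores 1.0, and any other order scores lower, so the same judgments can be replayed after every tuning step.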


Manual relevancy tuning in SharePoint

• Authoritative Pages
• Query Rules
– Best Bets
– Understanding User Intent
• Synonyms (cmdlets)
• Entity Extractors
• Spelling Corrections
• Query Suggestions
• Managed Metadata
• (!) Query Builder
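Two of the levers above, synonyms and best bets, are simple enough to sketch. This is a conceptual illustration only and does not use SharePoint's query rule engine; the synonym table, the promoted URL, and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical tuning data maintained by the search team.
SYNONYMS = {"pto": ["vacation"]}
BEST_BETS = {"vacation": ["https://example.invalid/hr/vacation-policy"]}


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Add configured synonyms after each query term, without duplicates."""
    expanded: List[str] = []
    for term in query.lower().split():
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def apply_best_bets(terms: List[str],
                    organic_results: List[str],
                    best_bets: Dict[str, List[str]]) -> List[str]:
    """Put promoted URLs first, then the organic results minus duplicates."""
    promoted: List[str] = []
    for term in terms:
        for url in best_bets.get(term, []):
            if url not in promoted:
                promoted.append(url)
    return promoted + [url for url in organic_results if url not in promoted]
```

With this data, a search for "PTO request" expands to the terms `pto`, `vacation`, `request`, and the vacation policy page is promoted to the top regardless of its organic rank.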


SP 2013 REST Query tool

• http://sp2013searchtool.codeplex.com/
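The tool is a front end for the SharePoint 2013 search REST endpoint, `/_api/search/query`. A minimal sketch of building such a request URL follows; the host name is a placeholder, the helper name is hypothetical, and doubling single quotes inside the query text is an assumption based on the usual OData string-literal convention.

```python
from typing import Sequence
from urllib.parse import quote


def build_search_url(site_url: str,
                     query_text: str,
                     row_limit: int = 10,
                     select_properties: Sequence[str] = ("Title", "Path")) -> str:
    """Build a GET URL for the search REST endpoint (no request is sent)."""
    escaped = query_text.replace("'", "''")  # assumed OData-style quote escaping
    params = [
        ("querytext", f"'{escaped}'"),
        ("rowlimit", str(row_limit)),
        ("selectproperties", "'" + ",".join(select_properties) + "'"),
    ]
    query = "&".join(f"{name}={quote(value, safe='')}" for name, value in params)
    return f"{site_url.rstrip('/')}/_api/search/query?{query}"
```

The resulting URL can be pasted into a browser or sent by any HTTP client that is authenticated against the site.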

Alexey Kozhemiakin

Solution Architect, Enterprise Search
