Enterprise Search @EPAM


Upload: alex-kozhemiakin

Post on 15-Jan-2015


TRANSCRIPT

Page 1: Enterprise Search @EPAM

Excellence in Software Engineering (Confidential)

Enterprise Search
• Best Practices
• Connector Framework
• Relevancy overview

SharePoint User Group, March 26, 2013, Minsk

Page 2: Enterprise Search @EPAM

Information.epam.com

XXX.epam.com

knowledgebase.epam.com

HR file shares

Jira.epam.com

YYY.epam.com

trainings.epam.com

Bla.bla.bla.epam.com

???????

EPAM has more than 100 systems

Page 3: Enterprise Search @EPAM

Page 4: Enterprise Search @EPAM

Little homework

Page 5: Enterprise Search @EPAM

We started the POC in September 2012

Page 6: Enterprise Search @EPAM

Available as search.epam.com in November 2012

• SharePoint 2010
• FAST Search for SharePoint
• Branded Search Center
• Custom connectors
• Fine-tuned relevance to reflect the EPAM landscape

Page 7: Enterprise Search @EPAM

Page 8: Enterprise Search @EPAM

Page 9: Enterprise Search @EPAM

We become stronger every day…

• 550 000 searchable items

• 30+ content sources

• 400+ daily searches

• Exposed to internet

Page 10: Enterprise Search @EPAM

… to help you search

Page 11: Enterprise Search @EPAM

What we’ve learned

1. Deploy a “painkiller” project as soon as possible

2. Connect as many systems as possible (Captain Obvious speaking)

3. Analyze
• Watch search logs
• Connect external analytics
• Speak with users
• Feedback forms suck

4. Tune relevancy
• Hot-fix bugs using best bets

5. Work with departments to adapt their content
• Basic SEO

Page 12: Enterprise Search @EPAM

Search Connectors in SP2010/2013

Search Connectors

Protocol Handlers

File Share

SharePoint

WebSite

People

BCS

Lotus Notes

Exchange

Custom BCS

Database

WebService

.NET

Page 13: Enterprise Search @EPAM

BCS Connectors in SP 2010/2013

Stereotyped Operations
• Get IDs
• Get By ID
• Describe Security
• Read Stream
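
The four stereotyped operations can be pictured as a small contract that the crawler drives. The sketch below is illustrative Python, not the BCS .NET API: the class `InMemoryConnector`, its method names, and the sample items are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Item:
    """One crawlable record in a hypothetical external system."""
    item_id: str
    title: str
    allowed_groups: List[str]
    body: bytes


class InMemoryConnector:
    """Hypothetical connector exposing the four stereotyped operations."""

    def __init__(self, items: Iterable[Item]) -> None:
        self._items: Dict[str, Item] = {i.item_id: i for i in items}

    def get_ids(self) -> List[str]:
        # Get IDs: enumerate every item the crawler should visit.
        return sorted(self._items)

    def get_by_id(self, item_id: str) -> Dict[str, str]:
        # Get By ID: return the metadata of a single item.
        item = self._items[item_id]
        return {"id": item.item_id, "title": item.title}

    def describe_security(self, item_id: str) -> List[str]:
        # Describe Security: who may see this item in search results.
        return list(self._items[item_id].allowed_groups)

    def read_stream(self, item_id: str) -> bytes:
        # Read Stream: the raw content to be indexed.
        return self._items[item_id].body


def crawl(connector: InMemoryConnector) -> List[dict]:
    """Full crawl: enumerate IDs, then fetch metadata, ACL and content."""
    documents = []
    for item_id in connector.get_ids():
        doc = dict(connector.get_by_id(item_id))
        doc["acl"] = connector.describe_security(item_id)
        doc["text"] = connector.read_stream(item_id).decode("utf-8")
        documents.append(doc)
    return documents


connector = InMemoryConnector([
    Item("2", "Vacation policy", ["HR"], b"How to request vacation"),
    Item("1", "Build guide", ["Engineering"], b"How to build the project"),
])
index = crawl(connector)
```

Keeping the security lookup separate from the content read mirrors the list above: the index can trim results per user without re-reading the document body.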

Page 14: Enterprise Search @EPAM

EPAM Data Import Framework

ISource
  Tree DescribeTree()
  Node DownloadData(Node)
  • Atlassian Confluence
  • SVN
  • PMC

IDestination
  Tree DescribeTree()
  void Import(Tree)
  • SharePoint Library
  • File System

IImporter
  Workflow
  1. Source to build tree
  2. Destination to build tree
  3. Diff trees
  4. Destination to import diff (add, remove)

Timer Job
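
The four-step workflow can be sketched in a few lines. This is a minimal Python illustration, not the framework's .NET code: the tree is flattened to a set of paths, and `DictSource`, `DictDestination`, `diff_trees` and `run_import` are hypothetical names. Following the slide, the diff covers only additions and removals; changed content is not detected here.

```python
from typing import Dict, Set, Tuple

Files = Dict[str, str]  # path -> content; a flat stand-in for Tree/Node


class DictSource:
    """Hypothetical source (the ISource role) backed by a dict."""

    def __init__(self, files: Files) -> None:
        self._files = files

    def describe_tree(self) -> Set[str]:
        return set(self._files)

    def download_data(self, path: str) -> str:
        return self._files[path]


class DictDestination:
    """Hypothetical destination (the IDestination role) backed by a dict."""

    def __init__(self, files: Files) -> None:
        self.files = dict(files)

    def describe_tree(self) -> Set[str]:
        return set(self.files)

    def import_diff(self, to_add: Files, to_remove: Set[str]) -> None:
        for path in to_remove:
            del self.files[path]
        self.files.update(to_add)


def diff_trees(source: Set[str], destination: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (paths to add, paths to remove)."""
    return source - destination, destination - source


def run_import(source: DictSource, destination: DictDestination) -> Tuple[Set[str], Set[str]]:
    src_tree = source.describe_tree()            # 1. source builds its tree
    dst_tree = destination.describe_tree()       # 2. destination builds its tree
    added, removed = diff_trees(src_tree, dst_tree)  # 3. diff the trees
    payload = {path: source.download_data(path) for path in added}
    destination.import_diff(payload, removed)    # 4. destination imports the diff
    return added, removed


source = DictSource({"wiki/a": "A", "wiki/b": "B"})
destination = DictDestination({"wiki/b": "B", "wiki/old": "stale"})
added, removed = run_import(source, destination)
```

Only the paths in the diff are downloaded, which is what makes a repeated run cheap when little has changed.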

Page 15: Enterprise Search @EPAM

BCS vs DataImport Comparison

                            Data Import   BCS
Effort to build the same    +             +
Document Previews           +             -
Indexing Speed              +             +/-
Customizable                +             -
Storage Space               -             +
Unit Testing                +             +/-
Incremental crawl           +             +/-

Page 16: Enterprise Search @EPAM

RELEVANCY

Page 17: Enterprise Search @EPAM

Search is a two-step process

0. User submits query

1. Get candidates: all docs that match query

2. Predict relevancy
• Query terms importance
• Proximity of query terms
• Hit location (managed property) importance
• Freshness
• Clicks
• User rating
• …
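
The two steps can be shown on a toy corpus. This is a minimal Python sketch under stated assumptions: the documents, the `age_days` field, and the scoring formula (term frequency times an IDF-style importance, plus a freshness bonus) are invented for illustration and are not the formula of any SharePoint or FAST ranker.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical corpus: text plus document age in days.
DOCS = {
    "d1": {"text": "vacation policy for employees", "age_days": 400},
    "d2": {"text": "vacation request form", "age_days": 10},
    "d3": {"text": "build server policy", "age_days": 5},
}


def build_index(docs: dict) -> Dict[str, Set[str]]:
    """Inverted index: term -> ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc["text"].split():
            index[term].add(doc_id)
    return index


def get_candidates(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Step 1: every document that contains all query terms."""
    postings = [index.get(term, set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()


def predict_relevancy(docs: dict, index: Dict[str, Set[str]], query: str, doc_id: str) -> float:
    """Step 2: toy score from term importance and freshness."""
    tokens = docs[doc_id]["text"].split()
    score = 0.0
    for term in query.split():
        importance = math.log(1 + len(docs) / len(index[term]))  # rarer term, higher weight
        score += tokens.count(term) * importance
    freshness = 1.0 / (1.0 + docs[doc_id]["age_days"] / 30.0)
    return score + freshness


def search(docs: dict, query: str) -> List[str]:
    index = build_index(docs)
    candidates = get_candidates(index, query)
    return sorted(
        candidates,
        key=lambda doc_id: predict_relevancy(docs, index, query, doc_id),
        reverse=True,
    )


results = search(DOCS, "vacation")
```

Both vacation documents match equally well on terms, so the freshness feature decides the order. Step 1 is cheap set arithmetic; step 2 is where all tuning happens.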

Page 18: Enterprise Search @EPAM

Relevancy in FAST Search

• Linear combination of features
• RankProfile
• Weights are configured via PowerShell
• Easy to understand via RankLog
• Easy tuning
  – Content Source
  – Managed Property
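
A linear rank profile is easy to reason about because every feature contributes weight times value, and the per-feature breakdown explains the final score. The Python sketch below only illustrates that idea: the feature names, the weights, and the `score` function are hypothetical and do not reproduce the real RankProfile settings or the RankLog format.

```python
from typing import Dict, List, Tuple

# Hypothetical weights per managed property / content source.
RANK_PROFILE: Dict[str, float] = {
    "title": 3.0,
    "body": 1.0,
    "freshness": 0.5,
    "source:knowledgebase": 2.0,
}


def score(features: Dict[str, float], profile: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the total score and a per-feature contribution log."""
    log = [
        (name, profile.get(name, 0.0) * value)
        for name, value in sorted(features.items())
    ]
    return sum(contribution for _, contribution in log), log


# Hypothetical feature values for one document and one query.
doc_features = {"title": 1.0, "body": 0.4, "freshness": 0.8, "source:knowledgebase": 1.0}
total, rank_log = score(doc_features, RANK_PROFILE)

# Tuning: boost one content source and re-score the same document.
boosted = dict(RANK_PROFILE, **{"source:knowledgebase": 4.0})
boosted_total, _ = score(doc_features, boosted)
```

Because the model is linear, doubling a weight changes the score by exactly that feature's contribution, which is what makes tuning by content source or managed property predictable.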

Page 19: Enterprise Search @EPAM

RankLog example (QueryLogger @codeplex)

Page 20: Enterprise Search @EPAM

Relevancy in SharePoint

Page 21: Enterprise Search @EPAM

Relevancy in SharePoint

• Nonlinear combination of features. Two neural networks.
• Ranking model schema described in:
  • http://www.google.com/patents/US8296292
  • http://www.google.com/patents/US7840569
• Cmdlets to import/export
• Default ranking model features:

Type              Instance
BM25              BM25
Static            UrlDepth
BucketedStatic    InternalFileType
BucketedStatic    Language
Static            ClickDistance
Static            QueryLogClicks
Static            QueryLogSkips
Static            LastClicks
Static            EventRate
MinSpan - soft    Title
MinSpan - soft    Title
MinSpan - soft    Title
MinSpan - soft    Content
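
To see what "nonlinear combination" means in practice, here is a minimal Python sketch of a single tiny neural network with one tanh hidden layer. Everything in it is an assumption for illustration: the three input features, the weights, and the layer sizes are made up and are not taken from the SharePoint ranking model, and only one network is shown.

```python
import math
from typing import List, Sequence

# Hypothetical weights: 2 hidden nodes, 3 input features each.
HIDDEN_WEIGHTS: List[List[float]] = [[0.8, 0.3, -0.5], [-0.2, 0.9, 0.4]]
HIDDEN_BIAS: List[float] = [0.1, -0.1]
OUTPUT_WEIGHTS: List[float] = [1.5, 0.7]


def nn_score(features: Sequence[float]) -> float:
    """Score = weighted sum of tanh-activated hidden nodes."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + bias)
        for row, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIAS)
    ]
    return sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden))


base = [1.0, 0.5, 0.2]       # e.g. text match, click signal, URL depth (hypothetical)
doubled = [2.0, 1.0, 0.4]

base_score = nn_score(base)
doubled_score = nn_score(doubled)
```

Unlike a linear profile, doubling every feature does not double the score here, because tanh saturates. That is why such a model is harder to tune by hand and is usually trained instead.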

Page 22: Enterprise Search @EPAM

ExplainRank page

• Google for “explain rank sharepoint”
• Parses the RankDetail managed property

Page 23: Enterprise Search @EPAM

Ranking Model Tuning

Page 24: Enterprise Search @EPAM

Ranking Model Tuning

Approach described by Microsoft
– http://msdn.microsoft.com/en-us/library/bb499682(v=office.12).aspx

1. Collect query judgments

2. Use machine learning to train the neural network
• namespace Microsoft.Office.Server.Search.RankerTuning
• Wait for tuning tool
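
Query judgments are useful even before any training tool exists: they let you measure whether a ranking change helped. The Python sketch below assumes a judgment is a (query, URL) pair with a 0 to 3 relevance label and scores a result list with NDCG. The metric, the label scale, and the sample URLs are choices made for this example, not something the slides or the Microsoft article prescribe.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical judgments: (query, url) -> relevance label, 0 (bad) to 3 (perfect).
JUDGMENTS: Dict[Tuple[str, str], int] = {
    ("vacation", "hr/policy"): 3,
    ("vacation", "hr/form"): 2,
    ("vacation", "wiki/build"): 0,
}


def dcg(labels: List[int]) -> float:
    """Discounted cumulative gain: high labels near the top count most."""
    return sum((2 ** label - 1) / math.log2(rank + 2) for rank, label in enumerate(labels))


def ndcg(query: str, ranked_urls: List[str], judgments: Dict[Tuple[str, str], int]) -> float:
    """DCG of the given order divided by the DCG of the best possible order."""
    labels = [judgments.get((query, url), 0) for url in ranked_urls]
    best = dcg(sorted(labels, reverse=True))
    return dcg(labels) / best if best > 0 else 0.0


good = ndcg("vacation", ["hr/policy", "hr/form", "wiki/build"], JUDGMENTS)
bad = ndcg("vacation", ["wiki/build", "hr/form", "hr/policy"], JUDGMENTS)
```

Running the same judged queries before and after a relevancy change gives a single number to compare, instead of relying on impressions from a few manual searches.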

Page 25: Enterprise Search @EPAM

Query Judgment framework

Page 26: Enterprise Search @EPAM

Manual relevancy tuning in SharePoint

• Authoritative Pages
• QueryRules
  – Best Bets
  – Understanding User Intent
• Synonyms (cmdlets)
• Entity Extractors
• Spelling Corrections
• Query Suggestions
• ManagedMetadata
• (!) Query Builder

Page 27: Enterprise Search @EPAM

Manual relevancy tuning in SharePoint

Page 28: Enterprise Search @EPAM

Manual relevancy tuning in SharePoint

Page 29: Enterprise Search @EPAM

SP 2013 REST Query tool

• http://sp2013searchtool.codeplex.com/
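
The tool is a front end for the SharePoint 2013 search REST endpoint, `/_api/search/query`, which takes `querytext`, `rowlimit` and `selectproperties` as URL parameters. The Python sketch below only builds such a URL. The host `search.example.com` and the helper `build_search_url` are hypothetical, authentication and the actual HTTP call are left out, and query text containing single quotes is rejected rather than escaped.

```python
from typing import Sequence
from urllib.parse import quote


def build_search_url(
    site_url: str,
    query_text: str,
    row_limit: int = 10,
    select_properties: Sequence[str] = ("Title", "Path"),
) -> str:
    """Build a search REST URL; string parameters are wrapped in single quotes."""
    if "'" in query_text:
        # Quote escaping rules are out of scope for this sketch.
        raise ValueError("query text with single quotes is not handled here")
    params = [
        "querytext='" + quote(query_text, safe="") + "'",
        "rowlimit=" + str(int(row_limit)),
        "selectproperties='" + quote(",".join(select_properties), safe=",") + "'",
    ]
    return site_url.rstrip("/") + "/_api/search/query?" + "&".join(params)


url = build_search_url("https://search.example.com/", "enterprise search", row_limit=5)
```

Adding a property such as RankDetail to `select_properties` is how a page like ExplainRank would get the ranking details back with each result, assuming that property is exposed in the search schema.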

Page 30: Enterprise Search @EPAM

Alexey Kozhemiakin
Solution Architect, Enterprise Search