enterprise search @epam
Excellence in Software Engineering 1Confidential
Enterprise Search
• Best Practices
• Connector Framework
• Relevancy overview
SharePoint User Group
2013, March 26, Minsk
Information.epam.com
XXX.epam.com
knowledgebase.epam.com
HR file shares
Jira.epam.com
YYY.epam.com
trainings.epam.com
Bla.bla.bla.epam.com
???????
EPAM has more than 100 systems
Little homework
We started POC in September 2012
Available as search.epam.com in November 2012
• SharePoint 2010
• FAST Search for SharePoint
• Branded Search Center
• Custom connectors
• Fine-tuned relevance to reflect EPAM landscape
We become stronger every day…
• 550 000 searchable items
• 30+ content sources
• 400+ daily searches
• Exposed to internet
… to help you search
What we’ve learned
1. Deploy “painkiller” project as soon as possible
2. Connect as many systems as possible (Cap O. speaking)
3. Analyze
• Watch search logs
• Connect external analytics
• Speak with users
• Feedback forms suck
4. Tune relevancy
• Hot-fix bugs using best bets
5. Work with departments to adopt their content
• Basic SEO
Search Connectors in SP2010/2013
Search Connectors
Protocol Handlers
File Share
SharePoint
WebSite
People
BCS
Lotus Notes
Exchange
Custom BCS
Database
WebService
.NET
BCS Connectors in SP 2010/2013
Stereotyped Operations
• Get IDs
• Get By ID
• Describe Security
• Read Stream
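The four stereotyped operations can be pictured as one small interface that the crawler calls in turn. The sketch below is only an illustration in Python: BCS connectors are actually defined in .NET and a BDC model, and the names `Connector`, `InMemoryConnector` and `crawl` are hypothetical, not part of any SharePoint API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Connector(ABC):
    """Hypothetical interface mirroring the four operations on the slide."""

    @abstractmethod
    def get_ids(self) -> List[str]:
        """Get IDs: enumerate the identifiers of all crawlable items."""

    @abstractmethod
    def get_by_id(self, item_id: str) -> Dict[str, str]:
        """Get By ID: return the metadata of a single item."""

    @abstractmethod
    def describe_security(self, item_id: str) -> List[str]:
        """Describe Security: return who may see the item."""

    @abstractmethod
    def read_stream(self, item_id: str) -> bytes:
        """Read Stream: return the binary content of the item."""


class InMemoryConnector(Connector):
    """Toy connector backed by a dict, used only to exercise the interface."""

    def __init__(self, items: Dict[str, dict]) -> None:
        self._items = items

    def get_ids(self) -> List[str]:
        return sorted(self._items)

    def get_by_id(self, item_id: str) -> Dict[str, str]:
        return dict(self._items[item_id]["metadata"])

    def describe_security(self, item_id: str) -> List[str]:
        return list(self._items[item_id]["allowed"])

    def read_stream(self, item_id: str) -> bytes:
        return self._items[item_id]["body"]


def crawl(connector: Connector) -> List[dict]:
    """Full crawl: enumerate IDs, then fetch metadata, ACL and content."""
    documents = []
    for item_id in connector.get_ids():
        documents.append({
            "id": item_id,
            "metadata": connector.get_by_id(item_id),
            "acl": connector.describe_security(item_id),
            "body": connector.read_stream(item_id),
        })
    return documents
```

Keeping enumeration separate from the per-item reads is what lets a crawler fetch items one by one and retry a single failed item without restarting the whole crawl.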
EPAM Data Import Framework
ISource
• Atlassian Confluence
• SVN
• PMC
Tree DescribeTree()
Node DownloadData(Node)
IDestination
• SharePoint Library
• File System
Tree DescribeTree()
void Import(Tree)
IImporter
Workflow
1. Source to build tree
2. Destination to build tree
3. Diff trees
4. Destination to import diff (add, remove)
Timer Job
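The importer workflow (build both trees, diff them, apply the diff) can be sketched as follows. This is a minimal illustration, not the framework's code: the tree is assumed to be a flat mapping from node path to a version string, and a changed node is assumed to be re-imported as an "add". The names `Tree`, `diff_trees` and `run_import` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: node path -> version (or content hash) of that node.
Tree = Dict[str, str]


def diff_trees(source: Tree, destination: Tree) -> Tuple[List[str], List[str]]:
    """Step 3: compare the two trees.

    Returns (to_add, to_remove). A node that exists on both sides with a
    different version is listed in to_add so that it gets overwritten.
    """
    to_add = sorted(p for p, v in source.items() if destination.get(p) != v)
    to_remove = sorted(p for p in destination if p not in source)
    return to_add, to_remove


def run_import(source: Tree, destination: Tree) -> Tree:
    """Steps 1-4 on in-memory trees; returns the destination after import."""
    to_add, to_remove = diff_trees(source, destination)
    result = dict(destination)
    for path in to_remove:
        del result[path]
    for path in to_add:
        # A real source would download the node data at this point.
        result[path] = source[path]
    return result
```

Because only the diff is applied, a repeated run against an unchanged source touches nothing, which is what makes running it from a timer job cheap.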
BCS vs DataImport Comparison
Criterion                 Data Import  BCS
Effort to build the same  +            +
Document Previews         +            -
Indexing Speed            +            +/-
Customizable              +            -
Storage Space             -            +
Unit Testing              +            +/-
Incremental crawl         +            +/-
RELEVANCY
Search is a two-step process
0. User submits query
1. Get candidates: all docs that match query
2. Predict relevancy
• Query terms importance
• Proximity of query terms
• Hit location (managed property) importance
• Freshness
• Clicks
• User rating
• …
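The two steps can be shown on a toy corpus. The sketch below is only an illustration: the documents are made up, matching is plain whole-word AND matching, and the relevancy "prediction" uses a single feature (query term frequency normalized by document length) instead of the full feature list above.

```python
from typing import Dict, List

# Made-up corpus: document id -> text.
DOCS: Dict[str, str] = {
    "wiki/vacation": "vacation policy and vacation request form",
    "hr/policy": "hr policy overview",
    "jira/help": "how to request jira access",
}


def get_candidates(query: str, docs: Dict[str, str]) -> List[str]:
    """Step 1: keep every document that contains all query terms."""
    terms = query.lower().split()
    return [
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.lower().split() for term in terms)
    ]


def predict_relevancy(query: str, text: str) -> float:
    """Step 2: score one candidate (here: normalized term frequency only)."""
    words = text.lower().split()
    hits = sum(words.count(term) for term in query.lower().split())
    return hits / len(words)


def search(query: str, docs: Dict[str, str]) -> List[str]:
    """Run both steps and return candidate ids, best match first."""
    candidates = get_candidates(query, docs)
    return sorted(
        candidates,
        key=lambda doc_id: predict_relevancy(query, docs[doc_id]),
        reverse=True,
    )
```

Splitting the work this way means the expensive scoring runs only on documents that already matched the query.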
Relevancy in FAST Search
• Linear combination of features
• RankProfile
• Weights are configured via PowerShell
• Easy to understand via RankLog
• Easy tuning
– Content Source
– Managed Property
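A linear combination of features is easy to reason about because each feature contributes its weight times its value, and the contributions can be listed one by one. The sketch below assumes made-up feature names and weights; it is not a real RankProfile and does not reproduce the actual RankLog format.

```python
from typing import Dict

# Hypothetical weights, standing in for what a rank profile would configure.
WEIGHTS: Dict[str, float] = {
    "title_hit": 3.0,
    "body_hit": 1.0,
    "freshness": 0.5,
}


def rank_log(features: Dict[str, float],
             weights: Dict[str, float] = WEIGHTS) -> Dict[str, float]:
    """Per-feature contribution: weight * feature value (missing = 0)."""
    return {name: weights[name] * features.get(name, 0.0) for name in weights}


def score(features: Dict[str, float],
          weights: Dict[str, float] = WEIGHTS) -> float:
    """Final rank: the sum of all per-feature contributions."""
    return sum(rank_log(features, weights).values())
```

For example, `rank_log({"title_hit": 1.0, "body_hit": 2.0, "freshness": 0.5})` shows at a glance that the title hit dominates the score, which is the kind of explanation a per-feature log gives when tuning weights.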
RankLog example (QueryLogger @codeplex)
Relevancy in SharePoint
• Nonlinear combination of features. Two Neural Networks.
• Ranking Model Schema described
– http://www.google.com/patents/US8296292
– http://www.google.com/patents/US7840569
• Cmdlets to import/export
• Default Ranking Model Features:

Type            Instance
BM25            BM25
Static          UrlDepth
BucketedStatic  InternalFileType
BucketedStatic  Language
Static          ClickDistance
Static          QueryLogClicks
Static          QueryLogSkips
Static          LastClicks
Static          EventRate
MinSpan - soft  Title
MinSpan - soft  Title
MinSpan - soft  Title
MinSpan - soft  Content
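To show what "nonlinear combination of features" means in contrast to a weighted sum, here is a generic two-layer network with tanh hidden nodes. It is a sketch under assumptions: the layer shape, the weights and the function name `two_layer_score` are illustrative and are not taken from the SharePoint ranking model schema.

```python
import math
from typing import List


def two_layer_score(features: List[float],
                    hidden_weights: List[List[float]],
                    hidden_bias: List[float],
                    output_weights: List[float]) -> float:
    """Score = sum_j out_j * tanh(sum_i w_ji * x_i + b_j)."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + bias)
        for row, bias in zip(hidden_weights, hidden_bias)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))
```

Because of the tanh, doubling a feature value does not double the score: the contribution saturates, which a linear model cannot express.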
ExplainRank page
• Google for “explain rank sharepoint”
• Parses RankDetail managed property
Ranking Model Tuning
Approach described by Microsoft
– http://msdn.microsoft.com/en-us/library/bb499682(v=office.12).aspx
1. Collect Query Judgments
2. Use Machine Learning to train Neural Network
• namespace Microsoft.Office.Server.Search.RankerTuning
• Wait for tuning tool
Query Judgment framework
Manual relevancy tuning in SharePoint
• Authoritative Pages
• Query Rules
– Best Bets
– Understanding User Intent
• Synonyms (cmdlets)
• Entity Extractors
• Spelling Corrections
• Query Suggestions
• Managed Metadata
• (!) Query Builder
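Two of the manual levers, synonyms and best bets, amount to rewriting the query and pinning a hand-picked result on top. In SharePoint they are configured through the admin UI and cmdlets rather than written as code; the sketch below only illustrates the effect, and every term, URL and function name in it is hypothetical.

```python
from typing import Dict, List

# Hypothetical configuration: synonym -> canonical term, term -> promoted URL.
SYNONYMS: Dict[str, str] = {"holiday": "vacation", "pto": "vacation"}
BEST_BETS: Dict[str, str] = {"vacation": "https://hr.example.com/vacation"}


def normalize(query: str) -> str:
    """Lower-case the query and replace known synonyms by their canonical term."""
    return " ".join(SYNONYMS.get(term, term) for term in query.lower().split())


def apply_best_bets(query: str, organic_results: List[str]) -> List[str]:
    """Put promoted results first, followed by the organic results."""
    terms = normalize(query).split()
    promoted = [url for term, url in BEST_BETS.items() if term in terms]
    return promoted + [r for r in organic_results if r not in promoted]
```

This is why best bets work as a hot-fix: a wrong top result for a known query can be corrected immediately, without touching the ranking model.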
SP 2013 REST Query tool
• http://sp2013searchtool.codeplex.com/
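A tool like this ultimately issues GET requests against the SharePoint 2013 search REST endpoint, `/_api/search/query`. The sketch below only builds such a URL with the `querytext`, `rowlimit` and `selectproperties` parameters; the site URL and the helper name `build_search_url` are hypothetical, authentication and the actual HTTP call are out of scope, and queries containing a single quote are not handled.

```python
from typing import Sequence
from urllib.parse import quote


def build_search_url(site_url: str,
                     query_text: str,
                     row_limit: int = 10,
                     select_properties: Sequence[str] = ("Title", "Path")) -> str:
    """Build a search REST URL; string values are wrapped in single quotes."""
    params = [
        "querytext='" + quote(query_text) + "'",
        "rowlimit=" + str(row_limit),
        "selectproperties='" + quote(",".join(select_properties)) + "'",
    ]
    return site_url.rstrip("/") + "/_api/search/query?" + "&".join(params)


if __name__ == "__main__":
    # Hypothetical site URL, used only for illustration.
    print(build_search_url("https://search.example.com/", "enterprise search", 5))
```

Pasting the resulting URL into a browser that is already signed in to the site is a quick way to inspect raw results while tuning relevancy.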
Alexey Kozhemiakin
Solution Architect, Enterprise Search