SharePoint Search - SPSNYC 2014
DESCRIPTION
Avtex's Brian Caauwe was a presenter at SharePoint Saturday in NYC in July of 2014. Here is his presentation.
TRANSCRIPT
SHAREPOINT SEARCH
Introducing the new search service
Brian Caauwe – Sr. Consultant
July 26th, 2014
KEY TOPICS
• Editions
• Components
• Administration
• Customizations
WHO AM I?
• Brian Caauwe
• SharePoint Consultant & Speaker
• Avtex Solutions (Minneapolis, MN)
• Email: [email protected]
• Twitter: @bcaauwe
• Blog: http://blog.avtex.com/author/bcaauwe
• Unfortunate Sports Fan
• Minnesota Twins
• Minnesota Vikings
• Technical Editor
• Professional SharePoint 2013 Administration
• Certifications
• MCM: SharePoint Server 2010
THANK YOU EVENT SPONSORS
• Please visit them and inquire about their products & services
• To win prizes make sure to get your bingo card stamped by ALL sponsors
POLL
• SharePoint Version
• 2007 – WSS, MOSS
• 2010 – SPF, Server, FAST
• 2013 – SPF, Server
• Work Roles
• SharePoint Administrator
• SharePoint Developer
• Business User
• Other
SEARCH EDITIONS
• SharePoint Foundation 2013
• SharePoint Server 2013
• Standard
• Enterprise
• ALL editions now use the SAME search service
• osearch15
• TechNet Reference: http://technet.microsoft.com/en-us/library/cb36484c-0e8f-480e-be88-5daa8bf2d47d#bkmk_SearchfeaturesOnPrem
SEARCH EDITIONS
SHAREPOINT FOUNDATION 2013
• Now uses enterprise search engine
• Can now administer service
• Content Sources
• Crawl Schedule
• etc.
• Limited scalability
SEARCH EDITIONS
SHAREPOINT SERVER 2013 - STANDARD
• Scalable components
• People Search
• Promoted Results
• Customized Sorting
• Graphical Refiners
• Search Server web parts
SEARCH EDITIONS
SHAREPOINT SERVER 2013 - ENTERPRISE
• Content by Search web part
• Entity Extraction
• Content Processing Enrichment
• Video Search
• Item Recommendations
SEARCH COMPONENTS
SEARCH COMPONENTS
LOGICAL ARCHITECTURE
[Diagram: the WFE and the Crawl, Content Processing, Index, Query Processing, Analytics Processing, and Administration components, with the Search Admin, Crawl, Links, and Analytics Reporting databases and the Event Store]
SEARCH COMPONENTS
ADMINISTRATION COMPONENT
Component
• Monitors states of all other components
• Manages topology changes
• Can finally be scaled out
• Only one is active at a time
Database
• Search Admin Database
• Configuration data
• Topology
• Crawl, Query rules
• Property Mappings
• Content Sources, Crawl Schedules
• Analytics Settings
SEARCH COMPONENTS
CRAWL COMPONENT
Component
• Performs the crawling
• Invokes connectors / protocol handlers
• SharePoint content
• Business Applications
• File Shares
• More…
• Delivers crawled items AND metadata to Content Processing Component
• Communicates with ALL crawl databases
Database(s)
• Crawl Database
• Crawl history
• Information on crawled items
• Scale out for each 20 million items crawled
• Host distribution
• 2010 Handled by Host URL
• 2013 Handled by Content DB
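To make the 20-million-item guideline concrete, here is a rough planning sketch in Python. It assumes the guideline is applied as a simple ceiling per crawl database, with a minimum of one. `crawl_databases_needed` is a hypothetical helper, not a SharePoint API, and real sizing should still follow Microsoft's capacity-planning guidance.

```python
import math

# Guideline from the slide: scale out one crawl database per 20 million crawled items.
ITEMS_PER_CRAWL_DB = 20_000_000


def crawl_databases_needed(total_items: int) -> int:
    """Hypothetical planning helper: at least one crawl DB, plus one per 20M items."""
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    return max(1, math.ceil(total_items / ITEMS_PER_CRAWL_DB))


if __name__ == "__main__":
    for items in (5_000_000, 20_000_000, 45_000_000):
        print(f"{items:,} items -> {crawl_databases_needed(items)} crawl DB(s)")
```

With these inputs, the estimate comes out to 1, 1, and 3 crawl databases.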
SEARCH COMPONENTS
CONTENT PROCESSING COMPONENT (CPC)
Component
• Handles document parsing and iFilters
• Extracts data for Document Parsing and Property Mappings
• Performs linguistic processing
• Entity Extraction
• Generates phonetic name variations (people search)
• Sends items to the Index Component
Database(s)
• Link Database
• Receives information about links and URLs from CPC
• Stores unprocessed information for use in analytics
• Information on search clicks
• Number of times people click on results
• Scale out for each 20 million items crawled
• Scale out for each 100 million queries / year
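The Link database has two scale-out triggers, so a plan has to satisfy whichever one is stricter. The sketch below assumes that the two guidelines are independent and that the larger count wins. The helper name is hypothetical.

```python
import math

ITEMS_PER_LINK_DB = 20_000_000               # scale out per 20 million items crawled
QUERIES_PER_YEAR_PER_LINK_DB = 100_000_000   # scale out per 100 million queries / year


def link_databases_needed(total_items: int, queries_per_year: int) -> int:
    """Hypothetical helper: take the stricter of the item-based and query-based guidelines."""
    by_items = math.ceil(total_items / ITEMS_PER_LINK_DB)
    by_queries = math.ceil(queries_per_year / QUERIES_PER_YEAR_PER_LINK_DB)
    return max(1, by_items, by_queries)


# Query volume dominates here: 250M queries/year -> 3, while 10M items -> 1
print(link_databases_needed(10_000_000, 250_000_000))
```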
SEARCH COMPONENTS
ANALYTICS PROCESSING COMPONENT (APC)
Component
• Performs Search Analytics
• Pulls information from Links DB
• Stores information for search reports
• Performs Usage Analytics
• Pulls information from event store
• Generates recommendations, usage and statistics reports
• Sends results to the content processing component to be pushed to the index
Database(s)
• Analytics Reporting Database
• Results of usage analytics
• Statistics information from the analyses
• Scale out when size > 200 GB
SEARCH COMPONENTS
INDEX COMPONENT
Component
• Logical representation of an index replica
• Mapped one-to-one to an index replica
• Each partition holds one or more index replicas
• Receives processed items from the content processing component and writes them to the index
• Receives queries from the query processing component
• Returns result sets to the query processing component
On-file index
• Located ON SharePoint servers housing index component
• Index update groups
• Default (majority of managed properties)
• Security (ACL managed property)
• Link (managed properties related to link structure)
• Usage (managed properties related to usage data)
• People (managed properties related to people search)
• Full-text index
• Contains text from searchable managed properties
• Multiple replicas per server supported after the October 2013 CU
SEARCH COMPONENTS
QUERY PROCESSING COMPONENT (QPC)
Component
• Analyzes and processes queries
• Decides which query rules are applicable
• Submits query to index component
• Determines which index partition to send query to
• Performs pre-processing
• Receives result sets from index component
• Performs post-processing
• Sends result set back to requestor
• Performs linguistic processing at query time
• Word breaking, stemming, spell checking, thesaurus
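The query-time linguistic steps can be pictured as a small pipeline. The Python sketch below is a toy model, not SharePoint's actual word breaker or stemmer. It splits on non-alphanumeric characters, strips a few English suffixes, and expands terms from a hypothetical in-memory thesaurus. The result is one group of alternative terms per query word.

```python
import re

# Hypothetical thesaurus entries, for illustration only.
THESAURUS = {"sp": ["sharepoint"], "doc": ["document"]}


def break_words(query: str) -> list[str]:
    """Toy word breaking: lowercase, then split on anything that is not a letter or digit."""
    return [w for w in re.split(r"[^0-9a-z]+", query.lower()) if w]


def naive_stem(word: str) -> str:
    """Toy stemming: strip one common suffix if a stem of 3+ characters remains."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query: str) -> list[set[str]]:
    """Return one set of alternative terms (word, stem, synonyms) per query word."""
    groups = []
    for word in break_words(query):
        stem = naive_stem(word)
        terms = {word, stem}
        for key in (word, stem):
            terms.update(THESAURUS.get(key, []))
        groups.append(terms)
    return groups


print(expand_query("SP crawling docs"))
```

A real engine would then match any term within a group and require every group to match (AND across groups). This sketch only builds the groups.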
SEARCH COMPONENTS
COMPONENT PARTNERS
Name CPU Network Disk Memory
Administration ● ● ● ●
Crawl ●● ●●● ●● ●●
Content Processing (CPC) ●●● ●● ●●●
Analytics Processing (APC) ●● ●●● ●● ●●
Index ●●● ●● ●●● ●●●
Query Processing (QPC) ● ●● ●●
The content of this slide is borrowed from Neil Hodgkinson (@nellymo)
SEARCH ADMINISTRATION
SEARCH ADMINISTRATION
MAPPING TERMINOLOGY FROM 2010 TO 2013
2010 Term | 2013 Term
Scopes | Result Source
Federated Location | Result Source
Keyword | Query Rule
Best Bets | Promoted Result
Managed Property | Schema > Managed Property
Crawled Property | Schema > Crawled Property
Search Result Removal | Crawl Log > URL View > Remove the item from the Index
XSLT | Display Templates
N/A | Result Types
N/A | Result Block
N/A | Continuous Crawl
Host Distribution Rule | N/A
SEARCH ADMINISTRATION
SEARCH TOPOLOGY
Central Administration
• View topology
• No longer offers options to change it
PowerShell
• Manage the search service instances
• Manage topology and components
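SEARCH ADMINISTRATION
SEARCH TOPOLOGY - POWERSHELL
## Load the SharePoint snap-in (not needed inside the SharePoint Management Shell) ##
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
## Get Service ##
$svc = Get-SPEnterpriseSearchServiceInstance -Identity "servername"
## Start Service ##
Start-SPEnterpriseSearchServiceInstance -Identity $svc
## Wait until the service instance reports Online before using it in a topology ##
do { Start-Sleep -Seconds 5; $svc = Get-SPEnterpriseSearchServiceInstance -Identity "servername" } while ($svc.Status -ne "Online")
## Get Search Service Application ##
$ssa = Get-SPEnterpriseSearchServiceApplication
## Get Active Topology ##
$activeTop = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
## Clone Topology ##
$clone = New-SPEnterpriseSearchTopology -SearchApplication $ssa -SearchTopology $activeTop -Clone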
SEARCH ADMINISTRATION
SEARCH TOPOLOGY - POWERSHELL
## New Administration Component ##
$adminComp = New-SPEnterpriseSearchAdminComponent -SearchTopology $clone -SearchServiceInstance $svc
## New Analytics Processing Component ##
$apc = New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $clone -SearchServiceInstance $svc
## New Crawl Component ##
$crawlComp = New-SPEnterpriseSearchCrawlComponent -SearchTopology $clone -SearchServiceInstance $svc
## New Content Processing Component ##
$cpc = New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $clone -SearchServiceInstance $svc
SEARCH ADMINISTRATION
SEARCH TOPOLOGY - POWERSHELL
## New Query Processing Component ##
$qpc = New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $clone -SearchServiceInstance $svc
## New Index Partition / Replica (root directory must exist and be empty) ##
$idx = New-SPEnterpriseSearchIndexComponent -SearchTopology $clone -SearchServiceInstance $svc -IndexPartition 0 -RootDirectory "D:\SP\SearchIndex"
## Activate New Topology ##
$clone.Activate()
## OR, instead of Activate() ##
# Set-SPEnterpriseSearchTopology -Identity $clone
SEARCH ADMINISTRATION
SEARCH TOPOLOGY
Topology Recap
• Ensure service is “online” before using in search topology
• To clone topology, use New-SPEnterpriseSearchTopology -Clone
• Otherwise you won’t have component IDs
• Index Component
• When specifying a root directory, it MUST exist and be empty
• If the component targets a remote server, the cmdlet still checks for the directory on the local server
• Always specify a partition; otherwise it defaults to 0
• When adding a new partition, it must have the same number of replicas as existing partitions
• After adding a new partition, the index WILL be repartitioned; the time this takes depends on index size
• You can ADD a partition, but not DELETE
• Clean up old topologies / components
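Several of the recap rules are easy to check before running the cmdlets. This Python sketch models a proposed index layout as a hypothetical mapping of partition number to replica servers. It flags three rules from the slide: partitions cannot be deleted, all partitions need the same replica count, and the root directory must exist and be empty. It does not talk to SharePoint, and the server names are placeholders.

```python
import os
import tempfile


def check_index_root(path: str) -> None:
    """Slide rule: the index root directory MUST exist and be empty."""
    if not os.path.isdir(path):
        raise ValueError(f"{path} does not exist")
    if os.listdir(path):
        raise ValueError(f"{path} is not empty")


def validate_index_change(current: dict[int, list[str]],
                          proposed: dict[int, list[str]]) -> list[str]:
    """Return rule violations for a partition -> replica-servers layout change."""
    problems = []
    removed = sorted(set(current) - set(proposed))
    if removed:
        problems.append(f"partitions cannot be deleted: {removed}")
    replica_counts = sorted({len(servers) for servers in proposed.values()})
    if len(replica_counts) > 1:
        problems.append(f"partitions have different replica counts: {replica_counts}")
    return problems


current = {0: ["SRCH1", "SRCH2"]}  # placeholder server names
print(validate_index_change(current, {0: ["SRCH1", "SRCH2"], 1: ["SRCH3", "SRCH4"]}))
print(validate_index_change(current, {0: ["SRCH1", "SRCH2"], 1: ["SRCH3"]}))

with tempfile.TemporaryDirectory() as empty_dir:
    check_index_root(empty_dir)  # passes: the directory exists and is empty
```

The first call reports no problems. The second reports the mismatched replica counts, which is the situation the recap warns about when adding a partition.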
SEARCH ADMINISTRATION
FARM ADMINISTRATION
Diagnostics
• Crawl Logs
• Only way to directly remove an item from the index
• Search Reports
• Crawl Health
• Query Health
• Usage Reports
SEARCH ADMINISTRATION
FARM ADMINISTRATION
Crawling
• Content Sources
• Crawl Schedules
• Continuous OR Incremental crawl
• Full crawl
• Crawl Rules
• Server Name Mappings
• File Types
• Index Reset
• Pause / Resume
• Crawler Impact Rules
SEARCH ADMINISTRATION
FARM ADMINISTRATION
Queries and Results
• Authoritative Pages
• Result Sources
• Query Rules
• Query Client Types
• Search Schema
• Query Suggestions
• Enabled / Disabled
• Always / Never Suggest
• Import AND Export
• Search Dictionaries (Term Store Management)
• Company Exclusion / Inclusion
• Query Spelling Exclusion / Inclusion
• Search Result Removal
SEARCH ADMINISTRATION
FARM ADMINISTRATION
Search Schema (Managed / Crawled Properties)
• Searchable
• Advanced Searchable Settings
• Full-text index
• Weight group
• Queryable
• Retrievable
• Allow Multiple Values
• Refinable
• Sortable
• Safe for Anonymous
• Alias
• Token Normalization
• Complete Matching
• Company Name Extraction
• Custom Entity Extraction
SEARCH ADMINISTRATION
FARM ADMINISTRATION - POWERSHELL ONLY
## Result Types ##
$ssa = Get-SPEnterpriseSearchServiceApplication
$owner = Get-SPEnterpriseSearchOwner -Level Ssa
$word = Get-SPEnterpriseSearchResultItemType -SearchApplication $ssa -Owner $owner | ?{$_.Name -eq "Microsoft Word"}
$pdf = Get-SPEnterpriseSearchResultItemType -SearchApplication $ssa -Owner $owner | ?{$_.Name -eq "PDF"}
# Copy the PDF result type into a new "WordPDF" type, then render it with the Word display template
$wordPDF = New-SPEnterpriseSearchResultItemType -SearchApplication $ssa -Name "WordPDF" -Owner $owner -ExistingResultItemType $pdf -ExistingResultItemTypeOwner $owner
Set-SPEnterpriseSearchResultItemType -Identity $wordPDF -SearchApplication $ssa -Owner $owner -RulePriority 1 -DisplayTemplateUrl $word.DisplayTemplateUrl
## Thesaurus ##
Import-SPEnterpriseSearchThesaurus -SearchApplication $ssa -FileName "\\server\share\thesaurus.csv"
SEARCH ADMINISTRATION
SITE ADMINISTRATION
Result Types
• Map results to display templates
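Result types map each result to a display template through prioritized rules. The Python sketch below models that idea. It assumes that lower rule priority numbers are evaluated first, as the `-RulePriority 1` setting in the farm-level PowerShell suggests. The template file names are illustrative, and the class and function are hypothetical, not a SharePoint API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ResultType:
    name: str
    rule_priority: int                # assumption: lower number is evaluated first
    matches: Callable[[dict], bool]   # rule condition on a result's properties
    display_template: str


def pick_display_template(item: dict, result_types: list[ResultType],
                          default: str = "Item_Default.html") -> str:
    """Return the display template of the first matching result type, in priority order."""
    for result_type in sorted(result_types, key=lambda rt: rt.rule_priority):
        if result_type.matches(item):
            return result_type.display_template
    return default


# Mirrors the WordPDF example: same rule as PDF, Word template, higher priority.
result_types_demo = [
    ResultType("PDF", 5, lambda r: r.get("FileExtension") == "pdf", "Item_PDF.html"),
    ResultType("WordPDF", 1, lambda r: r.get("FileExtension") == "pdf", "Item_Word.html"),
]
print(pick_display_template({"FileExtension": "pdf"}, result_types_demo))   # Item_Word.html
print(pick_display_template({"FileExtension": "xlsx"}, result_types_demo))  # Item_Default.html
```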
Consumes farm settings, but allows site-independent settings
• Result Sources
• Query Rules
• Search Schema
• Map Existing Managed Properties to Crawled Properties
• New Managed Properties - Types: Text or Yes/No
• Cannot make Sortable, Refinable, Multiple Values
SEARCH ADMINISTRATION
SITE ADMINISTRATION
Search Settings
• Search Center URL
• Search Navigation
Searchable Columns
• Exclude site columns from indexing
List Settings
• Can flag a list to force re-index
SEARCH CUSTOMIZATIONS
SEARCH CUSTOMIZATIONS
CRAWL COMPONENT
Custom Connectors
• Really means BCS
• The LobSystemInstance needs the ShowInSearchUI property to appear in Central Administration as a content source
• Set DisplayUriField on the method; otherwise URLs in search results start with bdc3://
• For incremental crawls, set LastModifiedTimeStampField and implement ChangedIdEnumerator and DeletedIdEnumerator
MSDN Reference: http://msdn.microsoft.com/en-us/library/gg294165.aspx
SEARCH CUSTOMIZATIONS
CONTENT PROCESSING COMPONENT (CPC)
Content Enrichment Web Service
• Web service call outside of SharePoint to:
• Clean data
• Remove from index
• Augment properties
• Configurations
• Trigger Expression
• Input Managed Properties
• Output Managed Properties
• Failure Mode
• Debug Mode
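The configuration settings above can be read as a per-item decision:

1. Evaluate the trigger.
2. Send only the input properties.
3. Merge back only the output properties.
4. Apply the failure mode if the call fails.

The Python sketch below is a conceptual model of that flow under those assumptions. It uses a hypothetical stand-in function instead of a real web service, so it is not the actual CEWS contract (see the MSDN reference for that). The failure handling, where "Error" drops the item and "Warning" keeps it unenriched, is how this sketch models the setting.

```python
from typing import Callable, Optional

Item = dict


def enrich_item(item: Item,
                trigger: Callable[[Item], bool],
                input_properties: list[str],
                output_properties: list[str],
                call_service: Callable[[Item], Item],
                failure_mode: str = "Error") -> Optional[Item]:
    """Conceptual model of one content enrichment call for a single crawled item."""
    if not trigger(item):
        return item  # trigger not met: the service is never called
    request = {name: item.get(name) for name in input_properties}
    try:
        response = call_service(request)
    except Exception:
        # Modeled behaviour: "Error" fails the item, "Warning" keeps it unenriched.
        return None if failure_mode == "Error" else item
    enriched = dict(item)
    for name in output_properties:
        if name in response:  # only configured output properties are written back
            enriched[name] = response[name]
    return enriched


def fake_service(request: Item) -> Item:
    """Hypothetical stand-in for the external endpoint: cleans Title, adds Prop3."""
    return {"Title": request["Title"].strip(), "Prop3": "enriched"}


def contains_coname(item: Item) -> bool:
    """Mirrors the slide's trigger idea: Company contains "CoName"."""
    return "CoName" in (item.get("Company") or "")


doc = {"Title": "  Q3 report ", "Company": "CoName Inc", "Author": "jdoe"}
print(enrich_item(doc, contains_coname, ["Title", "Company"],
                  ["Title", "Company", "Prop3"], fake_service))
```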
MSDN Reference: http://msdn.microsoft.com/en-us/library/jj163968.aspx
SEARCH CUSTOMIZATIONS
CONTENT PROCESSING COMPONENT (CPC)
Content Enrichment Web Service
• Registering the service in PowerShell
$ssa = Get-SPEnterpriseSearchServiceApplication
$cewsConfig = New-SPEnterpriseSearchContentEnrichmentConfiguration
$cewsConfig.Endpoint = "http://externalserver/cews.svc"
$cewsConfig.InputProperties = "Title", "Company"
$cewsConfig.OutputProperties = "Title", "Company", "Prop3"
$cewsConfig.Trigger = 'Contains(Company, "CoName")'
$cewsConfig.FailureMode = "Error"
$cewsConfig.DebugMode = $false
Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $cewsConfig
SEARCH CUSTOMIZATIONS
CONTENT PROCESSING COMPONENT (CPC)
Custom Entity Extraction
• Different Extraction types
• Word Extraction
• 5 Dictionaries
• Microsoft.UserDictionaries.EntityExtraction.Custom.Word.n
• Word Part Extraction
• 5 Dictionaries
• Microsoft.UserDictionaries.EntityExtraction.Custom.WordPart.n
• Word Exact Extraction
• One Dictionary
• Microsoft.UserDictionaries.EntityExtraction.Custom.ExactWord.1
• Word Part Exact Extraction
• One Dictionary
• Microsoft.UserDictionaries.EntityExtraction.Custom.ExactWordPart.1
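As we read the TechNet reference, the four extraction types differ on two axes. The first is whole-word versus part-of-word matching. The second is case-insensitive versus case-sensitive ("Exact") matching. The toy Python sketch below models only those two axes. It uses a regex word boundary as a stand-in for SharePoint's real word breaking, so edge cases will differ.

```python
import re


def extract_entities(text: str, dictionary: list[str],
                     whole_words: bool, case_sensitive: bool) -> list[str]:
    """Return the dictionary keys found in text under one matching mode."""
    flags = 0 if case_sensitive else re.IGNORECASE
    found = []
    for key in dictionary:
        pattern = re.escape(key)
        if whole_words:
            pattern = rf"\b{pattern}\b"  # regex word boundary stands in for the word breaker
        if re.search(pattern, text, flags):
            found.append(key)
    return found


MODES = {
    "Word": dict(whole_words=True, case_sensitive=False),
    "Word Part": dict(whole_words=False, case_sensitive=False),
    "Word Exact": dict(whole_words=True, case_sensitive=True),
    "Word Part Exact": dict(whole_words=False, case_sensitive=True),
}

text = "Contoso-Sales uses SharePoint2013"
dictionary = ["contoso", "SharePoint", "sales"]
for mode, options in MODES.items():
    print(f"{mode}: {extract_entities(text, dictionary, **options)}")
```

Here "SharePoint" is found only by the part-of-word modes, because it is glued to "2013" in the text. The lowercase keys are found only by the case-insensitive modes.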
TechNet Reference: http://technet.microsoft.com/en-us/library/jj219480.aspx
SEARCH CUSTOMIZATIONS
CONTENT PROCESSING COMPONENT (CPC)
## Entity Extraction ##
$ssa = Get-SPEnterpriseSearchServiceApplication
Import-SPEnterpriseSearchCustomExtractionDictionary -SearchApplication $ssa -DictionaryName Microsoft.UserDictionaries.EntityExtraction.Custom.Word.1 -FileName "\\server\share\dictionary.csv"
Custom Entity Extraction
• Sample File
• Import through PowerShell
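A sample dictionary file can be generated instead of edited by hand. This Python sketch assumes the two-column CSV layout described in the TechNet reference above, with a `Key,Display form` header row. Check the exact header and file encoding against that article before importing. The entries and the helper name are hypothetical.

```python
import csv
import io


def build_extraction_dictionary(entries: list[tuple[str, str]]) -> str:
    """Return CSV text for a custom entity extraction dictionary (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Key", "Display form"])  # assumed header row
    for key, display_form in entries:
        writer.writerow([key, display_form])  # csv quotes keys containing commas
    return buffer.getvalue()


csv_text = build_extraction_dictionary([("contoso", "Contoso"), ("contoso ltd", "Contoso")])
print(csv_text)
# Save it to the share used by Import-SPEnterpriseSearchCustomExtractionDictionary, e.g.:
# with open("dictionary.csv", "w", encoding="utf-8", newline="") as f:
#     f.write(csv_text)
```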
SEARCH CUSTOMIZATIONS
CONTENT PROCESSING COMPONENT (CPC)
Custom Entity Extraction
• Map in Central Administration
SEARCH CUSTOMIZATIONS
QUERY PROCESSING COMPONENT (QPC)
Ranking Models
• Customize ranking based on YOUR logic
• VERY complex… a LOT of math
Registered in PowerShell
MSDN Reference: http://msdn.microsoft.com/en-us/library/sharepoint/dn169052.aspx
$ssa = Get-SPEnterpriseSearchServiceApplication
$owner = Get-SPEnterpriseSearchOwner -Level Ssa
# Read the ranking model XML file, joined into a single string
$customModel = [string](Get-Content .\CustomModel.xml)
$newModel = New-SPEnterpriseSearchRankingModel -SearchApplication $ssa -Owner $owner -RankingModelXML $customModel
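To get a feel for what a ranking model encodes, here is a deliberately tiny linear scorer. Each result has feature values, the model supplies weights, and results are sorted by the weighted sum. This is a toy Python illustration. It does not follow the SharePoint ranking model XML schema or reproduce its actual math, and the feature names are hypothetical.

```python
def linear_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank(results: list[dict], weights: dict[str, float]) -> list[str]:
    """Return result URLs ordered from highest to lowest score."""
    ordered = sorted(results, key=lambda r: linear_score(r["features"], weights), reverse=True)
    return [r["url"] for r in ordered]


results = [
    {"url": "http://intranet/a.docx", "features": {"title_match": 1.0, "clicks": 2.0}},
    {"url": "http://intranet/b.docx", "features": {"title_match": 0.0, "clicks": 10.0}},
]
print(rank(results, {"title_match": 5.0, "clicks": 0.2}))  # a first: 5.4 vs 2.0
print(rank(results, {"title_match": 5.0, "clicks": 1.0}))  # b first: 7.0 vs 10.0
```

Changing only the weights flips the order, which is the essence of "customize ranking based on YOUR logic".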
SEARCH CUSTOMIZATIONS
QUERY PROCESSING COMPONENT (QPC)
Security Trimming
• Pre
• Augments claims
• Processed BEFORE index lookup
• Accurate refiner counts
• Post
• Secondary security checkpoint
• Processed AFTER index lookup
• Negatively affects refiner counts
Must be deployed to the GAC
Registered in PowerShell
MSDN Reference: http://msdn.microsoft.com/en-us/library/sharepoint/ee819930.aspx
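$ssa = Get-SPEnterpriseSearchServiceApplication
# -TypeName takes the assembly-qualified type name of the trimmer deployed to the GAC
New-SPEnterpriseSearchSecurityTrimmer -Id 1 -SearchApplication $ssa -TypeName "<strong-named, assembly-qualified type name>"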
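The refiner-count difference between pre- and post-trimming comes down to when results are removed relative to when refiners are counted. The Python sketch below is a conceptual model with a hypothetical three-document index, not the ISecurityTrimmer interfaces. The pre-trimmer adds claims before the lookup, so refiners are counted on exactly the visible results. The post-trimmer removes results after the refiners were already counted.

```python
from collections import Counter

# Hypothetical index: each document carries the claims allowed to see it.
DOCS = [
    {"id": 1, "acl": {"staff"}, "FileType": "docx", "embargoed": False},
    {"id": 2, "acl": {"staff"}, "FileType": "pdf", "embargoed": True},
    {"id": 3, "acl": {"legal"}, "FileType": "pdf", "embargoed": False},
]


def index_lookup(claims: set[str]) -> list[dict]:
    """Index-side trimming: keep documents whose ACL shares a claim with the user."""
    return [d for d in DOCS if d["acl"] & claims]


def query_with_pre_trimmer(claims, augment_claims):
    # Pre-trimmer: claims are augmented BEFORE the lookup, so refiners match the results.
    hits = index_lookup(claims | augment_claims(claims))
    return [d["id"] for d in hits], Counter(d["FileType"] for d in hits)


def query_with_post_trimmer(claims, is_allowed):
    hits = index_lookup(claims)
    refiners = Counter(d["FileType"] for d in hits)  # counted before the post check
    # Post-trimmer: secondary checkpoint AFTER the lookup removes more results.
    kept = [d for d in hits if is_allowed(d)]
    return [d["id"] for d in kept], refiners


print(query_with_pre_trimmer({"staff"}, lambda c: {"legal"} if "staff" in c else set()))
print(query_with_post_trimmer({"staff"}, lambda d: not d["embargoed"]))
```

In the post-trimmer case, only document 1 (a .docx) survives. However, the refiner still advertises one PDF, so the user sees a count that does not match what they can open.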
SEARCH CUSTOMIZATIONS
USER EXPERIENCE
Display Templates
• New way to change search results
• Goodbye, XSLT
• Get used to JavaScript
• Available through Design Manager
• Live in the Master Page Gallery
• Separate folders for Content by Search and Core Search
• .HTML file
• .JS file (DO NOT TOUCH)
MSDN Reference: http://msdn.microsoft.com/en-us/library/jj945138.aspx
SEARCH CUSTOMIZATIONS
USER EXPERIENCE
Display Templates
• Samples
• Announcements
• Pages
• Documents
SEARCH CUSTOMIZATIONS
USER EXPERIENCE
Search Web Parts
• Search Results
• Query Builder
• Auto Refine
• Sorting
• Query Rules
• Inline testing
• Content by Search
• Search Results Web Part settings plus
• Term Navigation
• Tuned for use outside the search center
SESSION SUMMARY
• Editions
• Components
• Administration
• Customizations
HOW TO CONTACT ME
• Brian Caauwe
• SharePoint Consultant & Speaker
• Email: [email protected]
• Twitter: @bcaauwe
• Blog: http://blog.avtex.com/author/bcaauwe
REFERENCES
SharePoint 2013 training for IT pros
• http://technet.microsoft.com/en-US/sharepoint/fp123606
Search Edition Features
• http://technet.microsoft.com/en-us/library/cb36484c-0e8f-480e-be88-5daa8bf2d47d#bkmk_SearchfeaturesOnPrem
BCS Connector
• http://msdn.microsoft.com/en-us/library/gg294165.aspx
Content Enrichment Web Service
• http://msdn.microsoft.com/en-us/library/jj163968.aspx
Custom Entity Extraction
• http://technet.microsoft.com/en-us/library/jj219480.aspx
Ranking Models
• http://msdn.microsoft.com/en-us/library/sharepoint/dn169052.aspx
Security Trimming
• http://msdn.microsoft.com/en-us/library/sharepoint/ee819930.aspx
Display Templates
• http://msdn.microsoft.com/en-us/library/jj945138.aspx