Microsoft Enterprise Search Using SharePoint
Microsoft Office SharePoint Server 2007
Search Workshop
游家德 (Jade Yu), 敦群數位科技股份有限公司
Microsoft Office SharePoint Server 2007 Enterprise Search
Enterprise Search Advanced Training – Building and Implementing Enterprise Search Solutions
Workshop Agenda
Day 1 – Search Overview
• Microsoft Search Landscape
• MOSS 2007 Walkthrough
• Architecture and Deployment Scenarios
• Crawl and Query Processes
• Search Object Model
Day 2 – Customization and Management
• Search Object Model
• Business Data Catalog (BDC)
• Search Extensibility and Integration
• Administration
• Capacity Planning
Assumptions
• Some knowledge of and experience with Search functionality
• Knowledge of the Business Data Catalog in general (new in the Office 2007 System)
• Content creation/contribution experience with the Office 2007 System
• Knowledge of Web site creation and management in general
• Knowledge of the Microsoft platform (Windows Server 2003, ADS, IIS, SQL Server 2005 and Office clients)
• Knowledge of ASP.NET 2.0 and XSLT
Workshop Objectives
• Explain how to use the Office 2007 Search functionality
• Interpret the Office 2007 System Search terminology
• Describe the rich feature set of Office 2007 System Search, on both servers and clients
• Describe how to use the platform well enough to extend the products through its APIs
• Explain how Office 2007 System Search solves enterprise business requirements
Module 1
Enterprise Search Overview
Module Agenda
• Microsoft Enterprise Search
• Client-side Search Platform
• Client-side Comparison
• Server-side Search Platform
• Key Differences between WSS and MOSS
• MOSS 2007 for Search Key Features
• MOSS 2007 for Search and MOSS 2007 Comparison
Microsoft Enterprise Search
Server-Side Search Platform
Line-of-business systems and structured data sources
Unstructured information
People, expertise
External Web sites
E-mail messages, appointments, and instant messaging
Client-Side Search Platform
Documents, programs, and media
Client-Side Search Platform
• Windows Desktop Search (WDS) for Windows XP and Windows Server
  - You must install an additional program for Search
• Windows Vista – integrated desktop search
  - Integrated into the operating system
  - Ability to search nearly anywhere
  - Virtual folders
Client-Side Comparison
Feature | Microsoft® Windows® Desktop Search | Microsoft® Windows® Vista
Rich, actionable interface | X | X
Integration with Microsoft Outlook | X | X
Polite indexing (pauses when the computer is in use) | X | X
Live icons and document previews | X | X
Advanced Search integrated into the operating system | | X
Save searches to search folders | | X
Instant Search | X (on taskbar) | X (from Start menu)
Server-Side Search Platforms
• Windows SharePoint Services (WSS) v3
  - "Basic" index/search capabilities to support WSS collaboration and document management
• Microsoft Office SharePoint Server (MOSS) 2007
  - Enterprise search and indexing features "unlocked"
  - Several SKUs to support different scenarios and customer needs
Key Differences Between WSS and MOSS
Feature | WSS v3 | Microsoft Office SharePoint Server (MOSS)
Can index | Local SharePoint content | SharePoint sites/collections, Exchange public folders, file shares, Web content, Lotus Notes, LOB apps, and others . . .
Rich, relevant results | X | X
Alerts, RSS, "Did you mean", duplicate collapsing | | X
Scopes, managed properties | | X
Best Bets, result removal, query reports | | X
Search Center tabs | | X
BDC search | | X
APIs provided | Query | Query + Admin
MOSS 2007 for Search
• A search-only solution for intranets and public-facing Web (Internet) sites
• Two versions:
  - Standard Edition, limited to 500,000 documents
  - Enterprise Edition, with unlimited documents
• Includes out-of-the-box search for file shares, Web sites, SharePoint sites, Exchange public folders and Lotus Notes databases
• Extensibility to third-party document repositories and file types
MOSS 2007 and MOSS FS Usage Scenarios
Product | Description | Scenario
MOSS 2007 | An information management solution that includes enterprise search integrated with portal, collaboration, Web content management, ECM, forms and BI functionality | Customers who want search as an integrated part of a broader information management solution
MOSS FS (MOSS 2007 for Search) | A core search-only solution for intranet and public-facing Web sites | • Customers who require a core search-only product that can be integrated into their existing infrastructure • Customers who require search functionality for their public-facing Web (Internet) sites
MOSS 2007 for Search and MOSS 2007 Features Comparison
Columns: (A) MOSS 2007 for Search, Standard Edition; (B) MOSS 2007 for Search, Enterprise Edition; (C) MOSS 2007, Standard CAL; (D) MOSS 2007, Standard plus Enterprise CAL
Feature | A | B | C | D
File shares | X | X | X | X
Web sites | X | X | X | X
SharePoint sites | X | X | X | X
Microsoft Exchange Server public folders | X | X | X | X
Lotus Notes databases | X | X | X | X
Third-party document repositories (1) | X | X | X | X
Secure content access control | X | X | X | X
Enhanced Search Center user interface | | | X | X
Search for people and expertise | | | X | X
Business Data Catalog (BDC) | | | | X
Search structured data sources | | | | X
Document limit | 500,000 | No limit (2) | No limit (2) | No limit (2)
Questions?
Module 2
Microsoft Office SharePoint Search 2007 – Walkthrough
Module Agenda
• End-User Improvements
  - Relevance
  - People and Expertise
  - Business Data Search
• Administration Improvements
  - Design Goals
  - Indexing Management
  - Security
  - Customization
  - Query Reporting
• Performance Improvements
• Demo: MOSS 2007
End-User Improvements: Relevance
• Dramatically improved relevance is the top goal of this release
• New ingredients added, including:
  - Anchor text
  - Click distance
  - URL depth
  - Missing metadata creation
• The result is noticeably more relevant search:
  - 100% better on all queries
  - 500% better on common queries
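To make the ranking "ingredients" concrete, here is a minimal Python sketch of how signals such as anchor text, click distance and URL depth could be folded into one score. The `Candidate` type, the weights and the formula are hypothetical illustrations chosen for readability; they are not the MOSS 2007 ranking model, whose internals this deck does not describe.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical weights, for illustration only (not MOSS's actual values).
W_ANCHOR = 0.5   # reward per query-term match in anchor text
W_CLICK = 0.3    # bonus for being few clicks from an authoritative page
W_DEPTH = 0.2    # bonus for shallow URLs


def url_depth(url: str) -> int:
    """Number of non-empty path segments, e.g. http://itech/a/b -> 2."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


@dataclass
class Candidate:
    url: str
    content_score: float  # text-match score, assumed to be computed elsewhere
    anchor_hits: int      # query terms found in anchor text of links to this item
    click_distance: int   # link hops from an authoritative start page


def score(c: Candidate) -> float:
    # Anchor-text matches add to the score; each extra click hop and each
    # extra URL segment shrinks the corresponding bonus.
    return (c.content_score
            + W_ANCHOR * c.anchor_hits
            + W_CLICK / (1 + c.click_distance)
            + W_DEPTH / (1 + url_depth(c.url)))


def rank(candidates):
    """Return candidates ordered from most to least relevant."""
    return sorted(candidates, key=score, reverse=True)
```

With equal text-match scores, a shallow page that other pages link to with matching anchor text outranks a deeply nested one; each signal nudges the text score rather than replacing it.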
End-User Improvements: People and Expertise
• Bring people into the Search experience
  - Getting your job done means working with the right people
  - Find subject-matter experts based on their knowledge and contacts
• Numerous improvements over SPS 2003
  - Index any LDAP v3 directory
  - Dedicated tab for finding people
  - Results grouped by "social distance" to you
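"Social distance" grouping can be pictured as a breadth-first search over a colleague graph. The sketch below assumes a hypothetical `colleagues` dictionary (person to list of colleagues) and hypothetical group names; how MOSS actually builds colleague relationships is not covered here.

```python
from collections import deque


def social_distance(colleagues, me):
    """Breadth-first search: hops from `me` to everyone reachable."""
    dist = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        for other in colleagues.get(person, ()):
            if other not in dist:
                dist[other] = dist[person] + 1
                queue.append(other)
    return dist


def group_by_social_distance(results, colleagues, me):
    """Split relevance-ordered people results into distance groups,
    keeping the original order inside each group."""
    dist = social_distance(colleagues, me)
    groups = {"my_colleagues": [], "colleagues_of_colleagues": [], "everyone_else": []}
    for person in results:
        d = dist.get(person)
        if d == 1:
            groups["my_colleagues"].append(person)
        elif d == 2:
            groups["colleagues_of_colleagues"].append(person)
        else:
            groups["everyone_else"].append(person)
    return groups
```

Grouping happens after relevance ranking, so a highly relevant stranger still appears, just below the people you already work with.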
End-User Improvements: Business Data Search
• Information in line-of-business (LOB) systems is often hard to access
• MOSS 2007 can bring that data to your users
  - Data is accessed through the Business Data Catalog
  - Exposed to many features in SharePoint
• Search can easily index the data
  - No need to write code
  - Highly customizable results
  - Integrated with scopes and the Search Center
Administration Improvements: Design Goals
• Address SPS 2003 administration user interface pain points
• Unify WSS and MOSS search
• Enable full programmability via the object model
• Even better scalability and performance
Administration Improvements: Indexing Management
• Streamlined experience and more control
• One index per shared service; no need to worry about managing discrete indexes
• Multiple start addresses per content source
• MOSS indexes can drive the WSS search experience
• Allow upgrade from WSS to MOSS
Administration Improvements: Security
• Query-time security trimming in SPS 2003
  - File shares, WSS/SPS 2003, Exchange, Lotus Notes (via mapping)
• Now supports pluggable authentication for content in WSS/MOSS sites
  - Based on the ASP.NET 2.0 model
• Minimum required crawler permission is now just Full Read, not Administrator
  - Still provides the same security trimming functionality
• Ability to remove single items
Administration Improvements: Customization
• Search in every company is different
• Different metadata might matter:
  - Documents: Title, Author, File location, Size
  - Records: Patient, Doctor, Healthcare provider, SSN…
• How users meaningfully scope searches differs:
  - “All finance documents”
  - “All patient records”
  - “All published documents”
• Customize results to “pop” the metadata that matters
• Customization offered at many levels:
  - Web Parts, XSLT/CSS, full object model…
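A scope such as “All finance documents” is essentially a filter over item metadata. This Python sketch models a scope as a hypothetical list of (property, value) rules that must all match; the property names `Department`, `ContentType` and `Status` are invented for the example, and real MOSS scope rules are configured by administrators and are richer than this "require all" form.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    url: str
    props: dict = field(default_factory=dict)  # managed properties, simplified


# Hypothetical scopes: each is a list of (property, value) rules that must ALL match.
SCOPES = {
    "All finance documents": [("Department", "Finance")],
    "All patient records": [("ContentType", "PatientRecord")],
    "All published documents": [("Status", "Published")],
}


def in_scope(item: Item, rules) -> bool:
    return all(item.props.get(name) == value for name, value in rules)


def search_in_scope(items, scope_name: str):
    """Return URLs of items that satisfy every rule of the named scope."""
    rules = SCOPES[scope_name]
    return [item.url for item in items if in_scope(item, rules)]
```

Because scopes key off metadata, the same items can be sliced differently per company simply by choosing which properties the rules test.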
Administration Improvements: Query Reporting
• The best way to improve Search is to understand current usage
• New out-of-box usage reporting:
  - Query volume trends, top queries, click-through rates, queries with zero results, etc.
  - At both site and service provider levels
  - Export data to Excel for extended reporting
• Respond to feedback with configuration changes or editorial results
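Three of the report metrics listed above (top queries, zero-result queries, click-through rate) can be derived from a simple query log. The `QueryLogEntry` format below is a hypothetical stand-in, not the schema MOSS uses for its own usage data; the sketch only shows what each metric measures.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueryLogEntry:
    query: str
    result_count: int
    clicked: bool  # did the user click any result?


def usage_report(log, top_n=5):
    # Normalize case and surrounding whitespace so "HR" and "hr " count as one query.
    normalized = [e.query.strip().lower() for e in log]
    top_queries = Counter(normalized).most_common(top_n)
    zero_result = sorted({q for q, e in zip(normalized, log) if e.result_count == 0})
    answered = [e for e in log if e.result_count > 0]
    # Click-through rate over queries that returned at least one result.
    ctr = sum(e.clicked for e in answered) / len(answered) if answered else 0.0
    return {"top_queries": top_queries,
            "zero_result_queries": zero_result,
            "click_through_rate": ctr}
```

Zero-result queries are usually the most actionable: each one is a candidate for a Best Bet, a keyword synonym or a new content source.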
Performance Improvements
• Key new features make crawls faster, so the content is fresher:
  - More efficient SharePoint crawling (Change Log Crawl)
  - Continuous propagation
  - Unified WSS and MOSS search
  - Security Change Only Crawl
• Maximum scale is tens of millions of documents per indexer
Demo – MOSS 2007
The goal of the demo is a high-level overview, with focus on:
• Search boxes and advanced search
• Search results experience
• Search Center
• Admin experience
Questions?
Module 3
Architecture and Deployment Scenarios
Agenda
• Key concepts
  - MS Search Architecture
  - Deployment Building Blocks
  - WSS v3 Search Topologies
  - MOSS 2007 Search Topologies
• Search topology scenarios
  - Small
  - Medium
  - Large
  - Geographically distributed
• Solution scenarios
  - Collaboration sites
  - Enterprise portal
  - Internet-facing portal
Microsoft Search Architecture
[Diagram: Microsoft Search Architecture. Index Engine with protocol handlers, IFilters, word breakers and stemmers, building the content index from information content sources (SharePoint sites, Exchange folders, network shares, external Web sites, business data, Notes, …). Query Engine receiving queries from the OOB search UI/custom search apps through the Query OM and Web Service, and returning results. Search configuration data: content sources, crawl log, scopes, schema, Best Bets, keywords, ranking.]
SharePoint Search Topologies: Deployment Building Blocks
• Physical building blocks:
  - Web front-end servers
  - Application servers (Query, Index, Excel Services, etc.)
  - SQL databases
• Search functionality segmented into two roles:
  - Indexer
  - Query
• MOSS 2007 specific:
  - Shared Service Provider (SSP)
[Diagram: Indexer; Web application(s); site collection(s); content database(s); virtual server(s) (IIS).]
WSS v3 Search Topology Basics
• WSS uses both server roles on the same machine ("Search Server"):
  - Indexing
  - Query
• Ability to index local content only
  - Site collection (content database(s))
• Content is automatically indexed
  - Minimal search administration
• Ability to query at a site and below it
• The stsadm command exposes some admin operations
• Can crawl multiple content databases
Sample WSS v3 Topology
[Diagram: user requests pass through a load balancer to the Web front ends; a single search server (indexing and query) crawls the content databases.]
WSS v3 – Topology Considerations
• Scale out just like WSS:
  - Add content databases for content
  - Add search servers for search
• Each search server can serve up to 100 content databases
  - Could be lower, depending on the data in the content databases
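The 100-databases-per-server figure turns into a back-of-the-envelope sizing rule. This Python helper is a hypothetical planning aid, not a Microsoft tool; the default limit is the slide's upper bound, and a lower `max_dbs_per_server` can be passed when the data in the content databases calls for it.

```python
import math


def search_servers_needed(content_databases: int, max_dbs_per_server: int = 100) -> int:
    """Rough count of WSS v3 search servers for a number of content databases.

    100 is the upper bound quoted on the slide; pass a lower value when the
    data in your content databases calls for it.
    """
    if content_databases < 0 or max_dbs_per_server <= 0:
        raise ValueError("counts must be non-negative and the limit positive")
    return max(1, math.ceil(content_databases / max_dbs_per_server))
```

For example, `search_servers_needed(250)` gives 3, while assuming a tighter limit of 60 databases per server gives 5.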
MOSS 2007 Search Topology Basics
• Adds new functionality over base WSS search
• Application server roles can be separated:
  - Indexer
  - Query server
• Propagation from indexer to query servers
• Crawl local + external content
• Enhanced administration experience
• Ability to search across site collections
MOSS 2007 Search Topology Basics (cont)
• The query role can be assigned to one or more servers
• The indexing role can only be assigned to a single server
• Multiple query servers are not allowed IF a server is providing both indexing and query services
• Only one index per SSP . . . although you can have multiple SSPs
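The role rules above can be expressed as a small consistency check for one SSP. This Python sketch uses a hypothetical `Server` record with index/query flags; it is a planning aid under those assumptions, not a SharePoint API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    index: bool = False
    query: bool = False


def validate_ssp_topology(servers):
    """Check one SSP's servers against the role rules; [] means no violations."""
    problems = []
    indexers = [s for s in servers if s.index]
    query_servers = [s for s in servers if s.query]
    if len(indexers) != 1:
        problems.append(f"expected exactly one index server, found {len(indexers)}")
    if not query_servers:
        problems.append("at least one server must hold the query role")
    for s in servers:
        # A dual-role (index + query) server rules out any other query server.
        if s.index and s.query and len(query_servers) > 1:
            problems.append(f"{s.name} is index+query, so it must be the only query server")
    return problems
```

A dedicated indexer with several query servers passes; a dual-role server plus an extra query server is flagged.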
Sample MOSS 2007 Topology
[Diagram: user requests pass through a load balancer to the Web front ends and query servers; the indexer crawls the content databases and external content and propagates indexes to the query servers. Callouts: "Query servers separated from indexer"; "Indexer crawling local + external content".]
MOSS 2007 – Search Topology Considerations
• Indexing operations are CPU intensive
• Dedicated query servers *might* be better in a query-heavy environment
• MOSS/WSS crawls do involve making HTTP requests against the WFE(s)
• Dual-role WFE/query servers are more efficient with security trimming
• All servers should be on the same network segment
MOSS 2007 – Search Topology Considerations (cont)
• Each farm can index up to 50 million items
  - Beyond this, add more farms
• Hardware is important
Shared Search Service
• Shared Service Provider (SSP) – grouped high-value, resource-intensive services
• Shared services are consumed by Web applications (and the sites within them)
• "Always on" shared services – all sites in a Web application use the same index
• Resource-intensive operations are controlled centrally
• Some of the admin experience is manageable at site level
[Diagram: Search Shared Service. A Shared Service Provider (SSP) with search service, people service, … serves the Web applications http://sales, http://finance and http://hr, each with site collections (spsite) and sites (spweb) on virtual servers, plus content databases and external content.]
Search Shared Service
[Diagram: the load-balanced farm topology (Web front ends, query servers, and an indexer that crawls the content databases and propagates indexes) mapped onto the Shared Service Provider (search service, people service, …), whose indexed content covers http://sales, http://finance and http://hr, their content databases and external content.]
Common Search Topologies
• Deployment scenarios:
  - Small
  - Medium
  - Large
  - Geographically distributed (MOSS only)
Small Search Deployment
• WSS
  - Single search server with both roles:
    - Index: single site collection only! Single set of content databases
    - Query
• MOSS
  - Single server, dual role:
    - Index: SSP based; multiple site collections; multiple sets of content databases
    - Query
• MOSS for Search
  - Single server / dual role (index and query)
Medium Search Deployment
• WSS
  - Multiple search servers, with the following limitations:
    - Single index server (single site collection, single set of content databases)
    - Multiple query servers
• MOSS
  - Three servers:
    - One index server
    - Two query servers running on two Web front-end servers
• MOSS for Search
  - Three servers:
    - One index server
    - Two query servers
Large Search Deployment
• WSS
  - Multiple search servers, with the following limitations:
    - Multiple index servers (64-bit), each indexing a single site collection with its own set of content databases; index servers are not redundant with one another
    - Multiple query servers, each associated with its own single index server running on the same machine (64-bit); query servers are not redundant with one another
• MOSS
  - One index server (64-bit)
  - Many separate query servers (64-bit)
• MOSS for Search
  - One index server (64-bit)
  - Many separate query servers (64-bit)
Geographically Distributed Sites – MOSS Search Deployment
[Diagram: three SSPs, each with search and people services, virtual servers hosting site collections (spsite) and sites (spweb), and external content. The corporate SSP indexes Corp, EMEA, APAC and other locations (http://sales, http://finance, http://hr); a second SSP indexes APAC only (http://apacsales, http://apacfinance, http://apachr); a third indexes EMEA only (http://emeasales, http://emeafinance, http://emeahr). Labels: Other Locations; Corp. Sites.]
Deployment Scenarios
• Collaboration Environment (WSS v3)
• Enterprise Portal (MOSS 2007)
• Internet Facing Portal (MOSS 2007)
Collaboration Environment Scenario – WSS v3
• iTech – startup software consulting firm
• Large number of disjoint teams working on projects of varying durations
• Team sites used for collaboration and communication
• No organizational needs across sites
Collaboration Environment Scenario – WSS v3 (cont)
• WSS farm with a single IIS virtual server: http://team
• Scales to a large number of team sites
• Content indexed automatically
• WSS v3 standalone topology: 1 search box (both roles)
[Diagram: user requests pass through a load balancer to the Web front ends; a single search server (indexing and query) crawls the content databases.]
Collaboration Environment Scenario – WSS v3 (cont)
[Diagram: virtual server http://team hosting site collections (SPSites) team1, team2 and team3, each with sites (spweb), stored in content databases.]
• Search – core feature of WSS
• Contextual scopes – site and list
• No search across sites
Enterprise Portal Scenario – MOSS 2007
• iTech – growing company with growing needs
• iTech needs a single point of information access for employees
• They now need to search over other repositories:
  - Personnel records – People search
  - Siebel sources – BDC search
  - File shares/Web sites – other external data
Enterprise Portal Scenario – MOSS 2007 (cont)
• Upgrade from WSS to MOSS
• Search is a shared service through the SSP
• Central enterprise portal – http://itech
• Existing virtual server http://team associated with the SSP – its search box switches to use MOSS
• Base WSS search is not running, but search is available to sites through the shared search service
• Indexes: local and external content
Enterprise Portal Scenario – MOSS 2007 (cont)
[Diagram: one farm with a Shared Service Provider (search service, people service, …) and external content, serving two virtual servers: http://team (site collections team1, team2, team3) and http://itech (HR, Sales, Finance), each with sites (spweb) stored in content databases.]
Enterprise Portal Scenario – MOSS 2007 (cont)
• Topology with indexer and query servers
• Load-balanced query servers
• Scale out and scale up – new SSP dimension
[Diagram: user requests pass through a load balancer to the Web front ends and query servers; the indexer crawls the content databases and propagates indexes to the query servers. Callouts: "Query servers added for throughput"; "Single indexer crawls logical SSP = local + external content".]
Internet Facing Portal Scenario – MOSS 2007
• Internet-facing site for customers – www.itech.com
• High traffic, focused on content presentation
• Public access
• More publishing and less collaboration
• Controlled and tightly managed content
Internet Facing Portal Scenario – MOSS 2007 (cont)
• Two separate farms: production and test
• MOSS installation
• Controlled publishing of content from the test farm to the production farm
• Single Shared Service Provider per farm
• The shared search service in each farm crawls that farm's content independently
Internet Facing Portal Scenario – MOSS 2007 (cont)
[Diagram: production farm with virtual server www.itech.com and test farm with virtual server http://itechtest; each hosts the sites (spweb) Services, Customers and About itech in its own site collections and content databases, and has its own SSP with search and people services.]
Questions?
Module 4
Crawl and Query Processes
Agenda
• The Crawl Process
  - Crawl Walkthrough
  - Index Propagation
• The Query Process
Crawl Walkthrough
When a crawl is requested . . .
1. The indexer grabs the start address of the content source
2. The start address is prefixed with the protocol associated with accessing the content
3. The appropriate protocol handler is invoked to traverse the content source
4. During traversal, the handler identifies the content nodes it needs to index
Crawl Walkthrough (cont)
5. The protocol handler invokes the IFilter associated with the content node type
6. The IFilter identifies and extracts properties from the content node
7. The protocol handler supplements the IFilter data with additional property information
8. The data associated with the content node is added to the index
9. The index "delta" propagates to the search servers
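The nine crawl steps can be traced end to end in a toy pipeline. In this Python sketch the repository, the `file` handler, the two "IFilters" and the `Index` class are all hypothetical stand-ins: real protocol handlers and IFilters are native components registered with the search service, modeled here as plain functions keyed by protocol prefix and file extension.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical in-memory file share standing in for a real content source.
FAKE_REPOSITORY = {
    "file://fs01/docs": [("budget.txt", "quarterly budget"),
                         ("travel.htm", "<p>travel policy</p>"),
                         ("logo.png", "binary data")],
}


def file_protocol_handler(start_address):
    """Steps 3-4: traverse the content source and yield nodes to index."""
    for name, raw in FAKE_REPOSITORY.get(start_address, []):
        yield f"{start_address}/{name}", raw


def text_ifilter(raw):
    return {"body": raw}


def html_ifilter(raw):
    return {"body": re.sub(r"<[^>]+>", "", raw)}  # strip tags, keep text


PROTOCOL_HANDLERS = {"file": file_protocol_handler}      # keyed by protocol prefix
IFILTERS = {".txt": text_ifilter, ".htm": html_ifilter}  # keyed by node type


@dataclass
class Index:
    docs: dict = field(default_factory=dict)
    delta: list = field(default_factory=list)  # changes not yet propagated

    def add(self, url, props):
        self.docs[url] = props
        self.delta.append(url)


def crawl(start_address, index):
    handler = PROTOCOL_HANDLERS[urlparse(start_address).scheme]  # steps 1-3
    for url, raw in handler(start_address):                       # step 4
        ifilter = IFILTERS.get(url[url.rfind("."):])              # step 5
        if ifilter is None:
            continue  # no filter for this type: skipped in this sketch
        props = ifilter(raw)                                      # step 6
        props["url"] = url                                        # step 7
        index.add(url, props)                                     # step 8
    return index.delta                                            # input to step 9
```

Separating traversal (handler) from content extraction (filter) is what lets new repositories and new file types be added independently.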
Crawl Overview Diagram
[Diagram: Crawl overview. Search process: gatherer, metadata extraction, indexer, filtering thread pool, word breakers, catalog. Filter daemon: protocol handler (IProtocolHandler) and filter (IFilter). URLs, documents and chunks pass between them through shared memory. Output goes to the SSP catalog and property store; SQL Server holds the URL history, crawl queue and property store.]
Index Propagation – Farm Sample
[Diagram: user requests pass through a load balancer to the Web front ends; the indexer crawls content and propagates the index to the query servers.]
• Propagation occurs only when the index and search components are on separate servers
• Continuous propagation:
  - Changes are sent incrementally to all query servers associated with the index server
  - Merging of the index occurs on the query servers after propagation
  - Query servers continue serving queries while propagation is in progress
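Continuous propagation can be modeled as incremental change sets that each query server receives and merges on its own. This Python sketch is a simplified illustration with hypothetical `IndexServer`/`QueryServer` classes (deletions and failure handling are not modeled); its point is that a query server keeps answering from its current index until a merge replaces it.

```python
class QueryServer:
    def __init__(self, name):
        self.name = name
        self.index = {}     # url -> text; the copy used to answer queries
        self.pending = []   # propagated deltas waiting to be merged

    def receive(self, delta):
        self.pending.append(dict(delta))  # propagation: copy the changes over

    def merge(self):
        # Build the merged index separately, then replace the old one, so
        # queries keep using the previous copy until the merge is finished.
        merged = dict(self.index)
        for delta in self.pending:
            merged.update(delta)
        self.index = merged
        self.pending.clear()

    def query(self, term):
        return sorted(url for url, text in self.index.items() if term in text)


class IndexServer:
    def __init__(self, query_servers):
        self.query_servers = query_servers

    def propagate(self, delta):
        """Send one incremental change set to every associated query server."""
        for qs in self.query_servers:
            qs.receive(delta)
```

Between `propagate` and `merge`, queries still return the old results, which mirrors the slide's point that serving continues during propagation.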
Index File Location
• Set in the Office SharePoint Server Search service settings
• Default location: C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications
• Can be set by script using the stsadm command:
  - Index server: "stsadm.exe -o editssp -indexlocation <index file path>"
  - Query server: "stsadm.exe -o osearch -propagationlocation <index file path>"
The Query Process
• Query Initiation and Results Presentation
• Query Execution
• Query Walkthrough
Query Initiation and Results Presentation
• Typically provided by the WSS/MOSS WFE role, through OOB Web Parts
• Could be an Office client or another custom application
• Responsible for constructing the "full" query and communicating with the query execution services
Query Execution
Always provided by a server tagged with the Query role
Consumes a query request
Executes the request using the query index on the file system as well as the SSP search database (if MOSS)
Handles OOB security trimming
Returns requested properties of the result set to the caller
Query Walkthrough (cont)
When a query is requested . . .
1. Query terms collected
2. Terms supplemented with contextual information
3. Query formulated and issued through the Query OM or the Web Service
4. Query is executed against the index and property store
5. Query results returned
Results are ordered according to their relevance to the query words
Results are trimmed based on the user's permissions
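The five walkthrough steps can be modelled with a toy pipeline to show where relevance ordering and security trimming happen. This is a hypothetical Python sketch, not the MOSS implementation: the in-memory index, the `Doc` type, the URL-prefix "scope", the term-frequency score (a stand-in for the real ranking), and the reader sets used for trimming are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical indexed item: its URL, body text, and the users allowed to read it
    url: str
    text: str
    readers: set = field(default_factory=set)


def run_query(terms, user, scope_prefix, index):
    """Toy model of the walkthrough: returns URLs the user may see, best match first."""
    # Steps 1-2: collected terms, plus context (here a URL prefix acting as a scope)
    words = [t.lower() for t in terms]
    hits = []
    # Steps 3-4: execute against the toy index; every term must occur (implicit AND)
    for doc in index:
        if not doc.url.startswith(scope_prefix):
            continue
        tokens = doc.text.lower().split()
        if all(w in tokens for w in words):
            score = sum(tokens.count(w) for w in words)  # toy relevance: term frequency
            hits.append((score, doc))
    # Step 5: security trimming, then order by relevance (highest score first)
    visible = [(score, doc) for score, doc in hits if user in doc.readers]
    visible.sort(key=lambda pair: pair[0], reverse=True)
    return [doc.url for _, doc in visible]
```

Trimming is applied to the executed result set before it is returned, so a user never sees items outside their reader set even when those items rank highest.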
Questions?
Module 5
The Search End-User Experience
Module Agenda
Introducing the Search End-User Experience
Customizing Search
People Search
Introducing the Search End-User Experience
Complete Search experience
Search is everywhere
Tab-based user interface for easy navigation
Easy to extend and customize
Introducing the End-User Search Experience
Search Boxes
Search Center
Search Web Parts
[Diagram: search UI architecture. Layers: OOB Search UI/Custom Search Apps; Query OM and Web Service. Components: Search Box, Advanced Search, hidden object, Web Parts, Query OM. Flows: HTTP GET, HTTP POST, query results as XML, XSL transformation.]
Search Web Parts
Nine Standard Search Web Parts
Search Box
Core Results
High Confidence
Statistics
Pagination
Action Links
Matching Keywords and Best Bets
Search Summary (Did you mean?)
Advanced Search
Result page infrastructure
Data shared through a hidden object
All Search Web Parts within the same page share the same hidden object
Connections between Search Web Parts are made automatically
You only need to drag and drop (or select) a Search Web Part on the page
Allows for rapid page design
The hidden object is internal and cannot be used by custom Web Parts
All Search Web Parts derive from the Data Form Web Part
Advanced Search
Allows power searchers to exercise greater control over how they query
A link from the search box
Control what is displayed on the page by modifying the XML stored in the Web Part property "Properties", e.g., to display a new language check box
Not provided by the WSS Search UI
Implemented using the SQL syntax
Customizing the End User Experience
Search in every company is different
Different metadata might matter
Documents: Title, Author, File location, Size
Records: Patient, Doctor, Healthcare provider, SSN…
Multi- or single-language
How users meaningfully scope searches differs
"All finance documents"
"All patient records"
"All published documents"
Customize results to "pop" the metadata that matters
Customization offered at many levels
Web Parts, XSLT/CSS, full Object Model…
Customization Choices
Search Center
Simple site with few pages
Default Page
Result Page
Advanced Search Page
People Search Page
Results Pages
All Sites Results Page
People Results Page
Advanced Search Page and Web Part
Show Scope Picker
Scopes
Property Picker
Languages
Search Web Parts
Customizing Search
Adding Search Center Tabs
Customizing Search Web Parts
Customizing Search Results
People Search
Bring people into the search experience
Getting your job done means working with the right people
Find subject matter experts based on their knowledge and contacts
People list can come from AD, SQL, others
Discovering Experts
People are as important as data!
People Search
People Results
Customizing Results
Refine Your People Search
Refine by Job Title: searches for the selected job title
Refine by Department: searches for the selected department
"Show more options" link (6+)
Listed in order of frequency
People Search Web Parts
Two OOB People Search Web Parts
People Search Box
People Search Core Results
Inherit from the Search Core Results Web Part
Can be mixed on the same page with other Search Web Parts
People Results Search Web Parts
Web Part properties (similar to the Core Search Web Part) such as:
Formatting (e.g., width of the search box)
Number of results per page
Display "Alert Me" and "RSS" links
Turn stemming on/off (default "off")
Remove duplicate results on/off (default "on")
Fixed keyword query
Select columns
Results formatting with XSL
Social Distance (view)
Social Distance
Colleagues
Suggested colleague list members are mined from:
Microsoft Windows Messenger (IM)
Microsoft Office Outlook e-mail (Outlook Add-In)
Questions?
Module 6
Search Object Model
Workshop Agenda
Scenarios for Extending Search
Query Syntax
Query Object Model
Query Web Service
Topic: Scenarios for Extending Search
In this first section we will examine two scenarios for extending Search:
Integrate with the Search Center
Integrate Search into 3rd party sites and applications
Integrate with MOSS Search Center
Use cases:
Use Search URL request parameters to add predefined saved searches
Build custom search box Web Parts for a custom look and feel
Build custom search core results Web Parts for a custom look and feel and customized querying
Extending Search
Integrate MOSS Search into 3rd Party Sites and Applications
Build a 3rd party user interface that leverages MOSS Search through Web Services
Use cases:
Add MOSS Search features into existing Web sites
Add MOSS Search into existing line-of-business or custom applications
Topic: Query Syntax
In this section we will examine the three types of search syntax that MOSS supports for building search queries:
Keyword
URL
SQL
Keyword Syntax
Overview
Used in the standard Search Box
New keyword syntax
Simple and easy to use
Consistent property:value syntax across Office, Windows and Live search
gallery hinges -brass site:http://supportdesk scope:Products
Keyword Syntax: Include/Exclude
Built-in support for using include and exclude terms
Include is implied when there is no (+/-) prefix
bike -fitness
Look for the term bike, but not related to fitness
+"SharePoint Services" -v2
Look for the phrase "SharePoint Services" but not the term v2
Keyword Syntax: Boolean Search
Narrows results by default: searches use "AND" between query terms
Does not recognize logical operators such as "OR" and "NEAR"; it treats them all as search terms
Does not support complex queries such as (A AND B) OR (C AND D)
Complex Boolean searches are supported by the engine and the SQL syntax
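The include/exclude and implicit-AND rules above can be modelled with a small matcher. This is a hypothetical Python sketch, not MOSS's own query parser: whitespace tokenization, lower-casing, and the function names are assumptions, and property restrictions such as site: or scope: are not interpreted (they are kept as plain terms).

```python
import re

# Optional +/- prefix followed by either a "quoted phrase" or a bare word
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_keyword_query(query):
    """Split a keyword query into (include, exclude) lists of lower-cased terms."""
    include, exclude = [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        # No prefix (or "+") means include; "-" means exclude
        (exclude if sign == "-" else include).append(term)
    return include, exclude


def matches(text, include, exclude):
    """Implicit AND: all include terms present and no exclude term present."""
    body = " " + " ".join(text.lower().split()) + " "

    def present(term):
        # Pad with spaces so "v2" does not match inside "v20"
        return f" {term} " in body

    return all(present(t) for t in include) and not any(present(t) for t in exclude)
```

Note that `parse_keyword_query("cats OR dogs")` yields three include terms, mirroring the slide's point that OR is treated as an ordinary search term rather than an operator.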
Keyword Syntax: Property restrictions
Supports property:value as part of the keyword string
Can use any managed property
Supports the use of phrases
Can be used for exact matches when the property value includes spaces
Without quotes, prefix matching is done
Supports word stemming
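The quoted-versus-unquoted rule for property restrictions can be illustrated with a toy extractor and matcher. This is a hypothetical Python sketch that only restates the slide's rule (quoted value: exact match; unquoted value: prefix match); the property names used in the example (author, title) are illustrative, and the sketch does not model stemming or how MOSS actually resolves managed properties.

```python
import re

# property:"value with spaces" or property:value
RESTRICTION = re.compile(r'(\w+):(?:"([^"]*)"|(\S+))')


def parse_restrictions(query):
    """Return (property, value, exact) tuples; quoted values are exact matches."""
    found = []
    for m in RESTRICTION.finditer(query):
        prop, quoted, bare = m.groups()
        exact = quoted is not None
        found.append((prop.lower(), quoted if exact else bare, exact))
    return found


def property_matches(item_value, value, exact):
    """Exact (case-insensitive) comparison for quoted values, prefix match otherwise."""
    item_value, value = item_value.lower(), value.lower()
    return item_value == value if exact else item_value.startswith(value)
```

For example, `parse_restrictions('budget author:"John Smith" title:plan')` returns `[('author', 'John Smith', True), ('title', 'plan', False)]`; the bare word budget is left for keyword matching.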
Keyword Syntax: No wildcard support
No wildcard support in Keyword Syntax
The search box does not do wildcard searching; the following is not recognized as a wildcard search:
ShareP*
Use Advanced Search property restrictions to look for parts of a word
Requires new search results Web Parts
Wildcards are supported by the engine and the SQL query syntax
URL Syntax
Use cases
Launching a URL in a custom application
Saved searches
Custom search boxes
Request parameters
Content: results.aspx?k=fish
Scopes: results.aspx?k=fish&s=BBC
Sort: results.aspx?v=date or results.aspx?v=relevance
Page: results.aspx?start=21
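A saved search or custom search box only needs to assemble these request parameters into a URL. A minimal Python sketch, assuming a hypothetical Search Center base address (http://moss/SearchCenter/Pages in the usage example); only the k, s, v and start parameter names come from the slide.

```python
from urllib.parse import urlencode


def results_url(base, k=None, s=None, v=None, start=None):
    """Build a results.aspx URL from k (keywords), s (scope), v (sort) and start (paging)."""
    # Keep only the parameters that were supplied, in a fixed order
    params = [(name, value) for name, value in
              (("k", k), ("s", s), ("v", v), ("start", start)) if value is not None]
    # urlencode escapes spaces and special characters in the values
    return f"{base}/results.aspx?{urlencode(params)}"
```

For example, `results_url("http://moss/SearchCenter/Pages", k="fish", s="BBC")` returns `http://moss/SearchCenter/Pages/results.aspx?k=fish&s=BBC`.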
SQL Syntax Overview
SQL Syntax offers:
Consistent SQL across enterprise and desktop
Complex queries and Boolean searches
Comparison operators
Arbitrary groupings for AND, OR, NOT
FREETEXT(), CONTAINS(), LIKE, ORDER BY ASC | DESC
Custom SQL query statements
Wildcard support
SQL Syntax: Complex Boolean Searches
Write complex Boolean searches using AND, OR, NOT
SQL Syntax: FREETEXT predicate
Returns documents for which the following is true:
The document contains all the search terms in at least one of the columns specified
One of the search terms must also be found in the Contents column
Use only one FREETEXT predicate for the best ranking
The FREETEXT predicate also supports (+/-)
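A custom client typically assembles the SQL statement as a string before handing it to the Query OM or Web Service. The Python sketch below builds a statement with a single FREETEXT predicate, following the ranking advice above. The select list, the `FROM SCOPE()` clause, the `DefaultProperties` column and `ORDER BY Rank DESC` are assumptions based on the general shape of Enterprise Search SQL and should be checked against the SDK reference before use.

```python
def freetext_query(terms, columns=("Title", "Path", "Rank")):
    """Build an Enterprise Search SQL statement with a single FREETEXT predicate."""
    literal = terms.replace("'", "''")  # double single quotes inside the SQL literal
    select_list = ", ".join(columns)
    # Assumed statement shape: SELECT ... FROM SCOPE() WHERE FREETEXT(...) ORDER BY Rank DESC
    return (f"SELECT {select_list} FROM SCOPE() "
            f"WHERE FREETEXT(DefaultProperties, '{literal}') "
            f"ORDER BY Rank DESC")
```

Include/exclude prefixes are passed through unchanged, so `freetext_query("+budget -draft")` keeps the (+/-) semantics inside the single FREETEXT predicate.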
SQL Syntax: Wildcard Support
Get wildcard support using the CONTAINS predicate:
Wildcard: words or phrases with an asterisk (*) added to the end
WHERE CONTAINS('"compu*" NEAR "soft*"')
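The CONTAINS clause above can be generated from a list of prefixes. A hypothetical Python helper: the helper itself and its AND/OR options are illustrative assumptions (AND and OR inside CONTAINS follow the general full-text CONTAINS grammar; verify them against the Enterprise Search SQL reference). It reproduces the slide's NEAR example exactly when called with wildcard=True.

```python
def contains_clause(terms, operator="AND", wildcard=False):
    """Build WHERE CONTAINS('...') joining quoted terms with AND, OR or NEAR."""
    if operator not in ("AND", "OR", "NEAR"):
        raise ValueError("operator must be AND, OR or NEAR")
    quoted = []
    for term in terms:
        term = term.replace('"', "")  # drop embedded double quotes
        # A trailing asterisk turns the quoted term into a prefix (wildcard) match
        quoted.append(f'"{term}*"' if wildcard else f'"{term}"')
    expr = f" {operator} ".join(quoted).replace("'", "''")  # escape for the SQL literal
    return f"WHERE CONTAINS('{expr}')"
```

`contains_clause(["compu", "soft"], "NEAR", wildcard=True)` returns the clause shown on the slide.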
SQL Syntax: Removed from SQL syntax
Removed in MOSS 2007:
Query property weights
UNION ALL
MATCHES
SELECT *
COALESCE
TABLE
Topic: Query Object Model
In this section we will examine:
The Query Object Model
The Query Object Path
The Query Web Service
Query Object Model
New object model
Use the query object model to:
Build custom search user interfaces, such as Web Parts or ASPX applications
Gain direct access to query and results properties
Invoke custom queries
Two types of query syntax: Keyword and SQL
Query Object Model: Features
Managed code API
Single request, multiple results
Result types
Relevant results
High confidence results
Special terms
Definitions
Optional parameters
Number of sentences in the summary
Implicit AND/OR
Number of results
Ignore noise words
Enable stemming
Language
Query Object Path
[Diagram: input (Keyword Query, SQL Query, optional parameters) from the Site UI or a custom client (local or remote) goes to the Query OM, which calls Execute() on the Query Engine. Output: a ResultTableCollection of ResultTable (IDataReader) objects holding relevant results, high confidence results, special terms and definitions.]
Query Web Service: Use and Methods
Use cases
Leverage Search in remote sites or applications
Office Research Pane
Methods
Query
QueryEx
GetSearchMetaData
Registration
Status
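The Query and QueryEx methods take the query as an XML string (a "query packet"). The Python sketch below builds such a packet with ElementTree. The element names, the urn:Microsoft.Search.Query namespace, `Revision="1000"`, `domain="QDomain"`, and the `type` values ("STRING" for keyword syntax, "MSSQLFT" for SQL syntax) are recalled from the SDK's QueryPacket examples and should be treated as assumptions to verify; the SOAP call itself (typically against /_vti_bin/search.asmx on MOSS) is not shown.

```python
import xml.etree.ElementTree as ET

NS = "urn:Microsoft.Search.Query"
ET.register_namespace("", NS)  # serialize as xmlns="urn:Microsoft.Search.Query"


def _q(tag):
    # Qualify a tag name with the QueryPacket namespace
    return f"{{{NS}}}{tag}"


def build_query_packet(query_text, query_type="STRING", start_at=1, count=10, language="en-US"):
    """Return a QueryPacket XML string; "STRING" = keyword syntax, "MSSQLFT" = SQL syntax."""
    packet = ET.Element(_q("QueryPacket"), Revision="1000")
    query = ET.SubElement(packet, _q("Query"), domain="QDomain")
    formats = ET.SubElement(query, _q("SupportedFormats"))
    ET.SubElement(formats, _q("Format")).text = "urn:Microsoft.Search.Response.Document.Document"
    context = ET.SubElement(query, _q("Context"))
    text = ET.SubElement(context, _q("QueryText"), language=language, type=query_type)
    text.text = query_text  # ElementTree escapes &, < and > in the query text
    rng = ET.SubElement(query, _q("Range"))
    ET.SubElement(rng, _q("StartAt")).text = str(start_at)
    ET.SubElement(rng, _q("Count")).text = str(count)
    return ET.tostring(packet, encoding="unicode")
```

Building the packet with an XML library rather than string concatenation keeps SQL queries that contain quotes or ampersands well-formed.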
Query Web Service: Search Center Features
Standard Search Center features not built into the Web service:
Hit highlighting
Search usage reporting
Search logging
Search statistics
Result type icons
Using Query vs. QueryEx
Implementing hit highlighting
Questions?
Module 7
Administration
Module Agenda
Administrative Architecture
Farm Administration
SSP Administration
Site Collection Administration
Site Administration
Search Usage Reporting
Administrative Tools
Lab: Adding Content Sources
Lab: Search Schema
Administrative Architecture
Three Tier Administration: Web-based, role- and task-delineated, controlled delegation, secure isolation
Central Administration: IT administrators; farm-level status and resource management; one per farm; e.g., create new site
Shared Services: business unit IT; service-level configuration; e.g., create search content sources, search scopes
Site Settings: business site owner; site-specific configuration and tasks; e.g., create new list
Farm Management (IT Administrators)
SharePoint 3.0 Central Administration
Common Tasks
Manage Topology and Services
Servers in Farm
Services on Server
Security Configuration
Update Farm Administrator's Group
Backup and Restore
Index
Search Database
Global Configuration
Timer Job Definitions
Timer Job Status
Manage Search Service
Using Central Admin
Operations: Topology and Services
Servers in Farm / Services on Server
Query Server(s)
Office SharePoint Server Search service: Stop / Start
Windows SharePoint Services Help Search service: Stop / Start
Index Server(s)
Office SharePoint Server Search service: Stop / Start
Operations: Backup and Restore
Perform a backup
Restore from backup
Operations: Global Configuration
Timer Job Definitions
SharePoint Services Search Refresh: Disable / Enable (change and update WSS search configuration)
Indexing Schedule Manager on MOSS: Disable / Enable
Timer Job Status
Succeeded / Failed
Search Application Management
Manage Search Service
Farm-level Search settings
Proxy server settings
Query and Index Servers
Server listing and their Search service
Shared Service Providers with Search enabled
SSP name listing
Crawler Impact Rules
Crawler Impact Rules
Configured through Central Administration
Allow "throttling" of the indexer to reduce the impact of a crawl on a particular server
Support wildcards
Used in conjunction with crawl schedules
Crawler Impact Rules (cont)
Use * as the site name to apply the rule to all sites
Use *.* as the site name to apply the rule to sites with a dot in their name
Use *.site_name.com as the site name to apply the rule to all sites in the site_name.com domain
Use *.top-level_domain_name (such as *.com or *.net) as the site name to apply the rule to all sites that end with a specific top-level domain name
Use ? to replace any single character in a rule
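The wildcard forms above behave like shell-style patterns, so they can be checked locally before a rule is created. A minimal Python sketch using the standard fnmatch module; treating site names case-insensitively and listing every matching rule are assumptions for illustration, not documented crawler behaviour. fnmatch also treats [...] as a character class, which site names normally do not contain.

```python
from fnmatch import fnmatchcase


def rule_applies(pattern, site_name):
    """True if a crawler impact rule's site-name pattern covers the given host name.

    '*' matches any run of characters and '?' matches exactly one character.
    """
    return fnmatchcase(site_name.lower(), pattern.lower())


def matching_rules(patterns, site_name):
    """Return every configured pattern that applies to site_name, in the given order."""
    return [p for p in patterns if rule_applies(p, site_name)]
```

For example, `*.*` matches www.contoso.com but not a single-label host such as intranet, while `*` matches both.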
Shared Services Provider (SSP) Management
(SSP Administrators)
(Content Oriented Administration)
Common Tasks
Configure Search Settings
Content Sources
Crawl Settings
Authoritative Pages Settings
Scopes
Content Sources
Represent an arbitrary container of information
Require at least one start address, although multiple start addresses can be provided
A start address cannot be reused
Require a registered protocol handler
Five out-of-box content source types are available, mapping to the five out-of-box protocol handlers
SharePoint Content Source
Includes SPS 2003, MOSS 2007, WSS v2, and WSS v3 sites
Can limit the crawl to only the sites specified in the start address, or to all sites found below one or more provided host names
The crawler will use the target site's APIs to include security information around content in the index
For SPS 2003 content sources, the crawler account requires "change" rights, which necessitates the crawler having administrator rights
Examples: sps3://moss-01/ or http://moss-01/sitecollection/
Content sources are decoupled from scopes
Web Site Content Source
Any content source available over HTTP or HTTPS
If a SharePoint URL is provided, the crawler will detect this and index it as though it were a SharePoint content source (this can be overridden with crawl rules)
Page depth and server hops can be controlled
Web Site Content Source (cont)
Security information around content is not included in the index
Dynamic personalization will result in the index being populated with whatever the crawler is presented with
Examples: http://website or http://www.somesite.com
File Shares Content Source
Any content visible over a Windows server shared folder
Some non-Windows shares *may* be crawled, if the share can be presented as a Windows share (for instance, Samba with Linux, Services for Unix)
The start address can be the share root or subfolders beneath it
Security information is picked up by the gatherer
Exchange Public Folders Content Source
Allows the indexer to crawl a public folder that exists on Exchange
Requires Outlook Web Access, as the crawl is done over HTTP
Includes messages, conversations, and other collaborative content
The URL presented in the search results will point to a deep link within OWA
Example: http://owa/public/folder
Business Data Content Source
Allows the indexer to crawl metadata exposed through the Business Data Catalog
Can elect to include all Business Data applications or a selected number of them
Lotus Notes Content Source
Crawling Schedules
Allow the administrator to indicate the frequency at which a content source will be re-crawled (daily, weekly, monthly)
Can indicate what time the content source should be crawled
The schedule should be driven by:
Anticipated change at the content source (is this static content or content that is constantly changing?)
Business expectations around when content changes should be reflected in the index
The schedule can always be modified
Maximum File Size
The default file size limit is 16MB
To change the limit, add a new DWORD registry entry MaxDownloadSize at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager
Make sure to increase the timeout value to avoid timeout exceptions
Change the timeout value using the Manage Search Service page of Central Admin
Crawl Rules
Define exceptions to the "typical" crawl process
Addresses can be pattern matched for special treatment
Support exclusion
Support altering the authentication mechanism
Examples of Crawl Rules
Testing of Crawl Rules
Search Result Removal (From Live Index)
Typically used when someone discovers something in the index that shouldn't be there
Permits the administrator to immediately remove that content from the index
A crawl rule is automatically created to prevent that content from being indexed in the future
Restoring that content requires dropping the crawl rule and re-indexing
Default Content Access Account
Account used for crawling, by default
Can be overridden in the Crawl Rules
Set the default account to use when crawling content
Minimum crawler permission is "Full Read" (still provides the same security trimming functionality)
Automatically configured for new sites
Do not use an administrator account, to avoid crawling unpublished versions of a document
Metadata Property Mappings
Server Name Mapping
Override how MOSS displays search results
Hide the file path
Sample: "file://moss/HOL" to "http://moss.litwareinc.com"
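The effect of a server name mapping is a prefix rewrite applied to result URLs at display time. The Python sketch below models that effect only: the function name, the (address in index, display address) pair format, and the "first matching mapping wins" rule are assumptions for illustration, not MOSS internals. The example pair is the one from the slide.

```python
def map_display_url(url, mappings):
    """Rewrite a crawled URL for display using (address in index, display address) pairs.

    In this sketch the first mapping whose prefix matches (case-insensitively) wins.
    """
    for indexed_prefix, display_prefix in mappings:
        if url.lower().startswith(indexed_prefix.lower()):
            return display_prefix + url[len(indexed_prefix):]
    return url  # no mapping: show the crawled address unchanged
```

With the mapping `[("file://moss/HOL", "http://moss.litwareinc.com")]`, the crawled address file://moss/HOL/Labs/lab1.docx is displayed as http://moss.litwareinc.com/Labs/lab1.docx, hiding the file path.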
Search-based Alerts
Can be activated / deactivated
Deactivated after a reset of crawled content
Users can subscribe to an alert on a search query
An alert is triggered if there are new or changed items that satisfy the search query
An item is considered changed if its content or metadata has changed
The timer service is used to issue all alert notifications (see User Alerts in Site Settings)
Frequency can be set to Daily / Weekly
"Alert Me" and RSS links can be added/removed using their Web Part properties
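The "new or changed" rule can be illustrated by comparing two snapshots of the items that satisfy a saved query. A hypothetical Python sketch: the snapshot dictionaries, the SHA-256 fingerprint over content plus metadata, and the function names are assumptions chosen to mirror the slide's definition (an item is changed if its content or metadata changed); it is not how the MOSS timer job stores alert state.

```python
import hashlib


def fingerprint(content, metadata):
    """Hash content plus sorted metadata so any change to either alters the result."""
    h = hashlib.sha256(content.encode("utf-8"))
    for key in sorted(metadata):
        h.update(f"\0{key}={metadata[key]}".encode("utf-8"))
    return h.hexdigest()


def alert_items(previous, current):
    """Return URLs of items that are new or changed since the last alert run.

    previous, current: dicts mapping URL -> fingerprint for items matching the saved query.
    """
    return sorted(url for url, fp in current.items() if previous.get(url) != fp)
```

An empty result means no notification needs to be sent for that query in the current daily or weekly run.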
Reset Crawled Content
Powerful action!
Will delete the content index!
Search results will no longer be available on the farm until the index has been rebuilt!
Search alerts are deactivated unless the administrator unchecks the check box.
Alerts should be reactivated after a full crawl has been performed.
Specify Authoritative Pages
Helps prioritize search results: a way to influence relevance, since results linked to the authoritative pages benefit from a boost in rank
Most authoritative
Second-level authoritative
Third-level authoritative
Sites to demote
Scopes
Scopes are filters applied to search results to narrow the results of a search query
Types of Scopes
Scope Rules and Behaviors
Single-rule Scopes
Multi-rule Scopes
Site Collection Management
(Site Collection Administrators) (Application Administrators)
Site Collection Administration Options
Common Tasks
Search Settings
Search Scopes
Search Keywords
Search Settings
Two options:
Use the Search Center and custom scopes in the dropdown (this is also the way to change the standard Search Center URL for search boxes)
Do not use the Search Center: no custom scopes
Site Level Scopes
Site Level Scopes display all scopes associated with a site collection
Display Scopes are a site-level feature that is purely UI
Administrator: combine multiple scopes into one selectable item
Visitors: the UI search dropdown box (or the check boxes on the Advanced Search page) is populated with the scopes included in the display group
Keywords and Best Bets
Prominently present editorially selected search results
Keywords: glossary of important terms within your organization
Best Bets are associated with particular search keywords
Not available across site collections
Search Settings for Fields - NoCrawl
Set a NoCrawl attribute on one or more columns within the site collection
Column content will not be indexed!
Associated with site columns (content types)
Search Visibility
Site level
Allow or deny the site to appear in search results. If denied, the site will not be indexed.
Control whether the ASPX pages within the site are indexed; this setting takes item-specific permissions into consideration.
List level
Allow or deny the list to appear in search results. If denied, the list will not be indexed.
Document library and folder level
Allow or deny the document library or folder to appear in search results. If denied, the document library (or folder) will not be indexed.
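The three visibility levels can be read as a hierarchy. The sketch below assumes, beyond what the slide states, that a deny at any container level also excludes everything beneath it; the Node class and the is_indexed function are hypothetical names, not SharePoint APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical site, list, library, or folder with its search visibility setting."""
    name: str
    allow_in_search: bool = True
    parent: Optional[Node] = None


def is_indexed(node: Node) -> bool:
    """True only if no container on the path up to the site denies search visibility.

    Assumption: a deny at any level also excludes everything beneath it.
    """
    current: Optional[Node] = node
    while current is not None:
        if not current.allow_in_search:
            return False
        current = current.parent
    return True


site = Node("HR site")
library = Node("Policies", parent=site)
drafts = Node("Drafts", allow_in_search=False, parent=library)
archive = Node("2007", parent=drafts)

print(is_indexed(library))  # True
print(is_indexed(archive))  # False: its parent folder is denied
```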
Search Usage Reports
Benefits of Search Queries and Results Reporting
Allows site and SSP administrators to:
Take a visual look at end-user queries through charts and graphs
Quickly quantify the success or failure of the optimizations they make to crawlers and indexes
Export data to Microsoft Excel for further analysis and mining
To Improve the Overall Search Experience, One Must...
The best way to improve search is to understand visitors' current search usage!
Understand what visitors are searching for: products, features, services, general information about the company, etc.
Understand whether their search was successful:
Did they click on one of the results?
Were there any results, i.e. does the content exist?
Were they offered suggestions specifically associated with their query?
Did they misspell words in their query?
Reporting Tools
Two sets of reports:
Search Query Reports
Search Results Reports
Two different levels of reports:
Shared Service Provider (SSP): enabled by default
Site Collection: enabled within the SSP
Queries issued through the Search Web Service or from custom Web Parts are not logged
Note: data is stored in the SSP database
Reporting Tools
At the SSP level: for enterprise content-oriented administrators
Reporting Tools
At the Site Collection level: for site collection administrators
Search Query Reporting – SSP
Tracks queries that users issued for all sites managed by this SSP
Five different reports:
Queries Over Previous 30 Days
Queries Over Previous 12 Months
Top Query Origin Site Collection Over Previous 30 Days*
Query for Scopes Over Previous 30 Days
Top Queries Over Previous 30 Days
Most reports also have a tabular view
* Specific to SSP
Search Query Reporting – Site Collection
Tracks queries issued within this site collection
Four different reports:
Queries Over Previous 30 Days
Queries Over Previous 12 Months
Top Queries Over Previous 30 Days
Query for Scopes Over Previous 30 Days
Most reports also have a tabular view
Search Results Reporting – SSP
Tracks result click selections by users within the sites managed by this SSP
Five different reports:
Search Results Top Destination Pages
Queries with Zero Results
Most Clicked Best Bets
Queries With Zero Best Bets
Queries With Low Click-through
Search Results Reporting – Site Collection
Tracks result click selections by users for this site collection
Five different reports:
Search Results Top Destination Pages
Queries with Zero Results
Most Clicked Best Bets (Editorial Results)
Queries With Zero Best Bets
Queries With Low Click-through
Same list of reports as the SSP, but for the site collection
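Two of the results reports, Queries with Zero Results and Queries With Low Click-through, boil down to simple aggregations over a query log. The following Python sketch shows the idea on a hypothetical in-memory log; the QueryEvent shape, the 25% click-through threshold, and the minimum query count are assumptions, and the real reports are built from data in the SSP database instead.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class QueryEvent:
    """One logged search; a hypothetical shape, not the SSP database schema."""
    query: str
    result_count: int
    clicked: bool  # did the user click any result?


def zero_result_queries(events: list[QueryEvent]) -> Counter:
    # Queries that returned nothing: the content may not exist, or the query is misspelled.
    return Counter(e.query for e in events if e.result_count == 0)


def low_click_through(events: list[QueryEvent], threshold: float = 0.25,
                      min_count: int = 2) -> dict[str, float]:
    # Click-through rate = searches with a click / searches that returned results.
    with_results = [e for e in events if e.result_count > 0]
    issued = Counter(e.query for e in with_results)
    clicked = Counter(e.query for e in with_results if e.clicked)
    rates = {q: clicked[q] / n for q, n in issued.items() if n >= min_count}
    return {q: r for q, r in rates.items() if r < threshold}


log = [
    QueryEvent("vacation policy", 12, True),
    QueryEvent("vacation policy", 12, False),
    QueryEvent("expnse form", 0, False),
    QueryEvent("expnse form", 0, False),
    QueryEvent("org chart", 5, False),
    QueryEvent("org chart", 5, False),
    QueryEvent("org chart", 5, False),
]
print(zero_result_queries(log))  # Counter({'expnse form': 2})
print(low_click_through(log))    # {'org chart': 0.0}
```

A misspelled zero-result query like "expnse form" is a candidate for a keyword or Best Bet, while a low click-through query suggests the results exist but are not the ones visitors want.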
Exporting Results
Export data for extended reporting in Excel and/or Excel Services
Questions?
Module 8
Performance, Scalability, and Capacity Planning
Module Agenda
Introduction
Search Capacity Planning in SPS 2003
MOSS 2007 Search Capacity Planning
Topology
Querying
Indexing
Test Environment
Real World Experiences
Microsoft Intranet
Microsoft Technology Center Proof of Concept (PoC)
MOSS 2007 Search Capacity Planning
Improvement highlights:
Topology restrictions removed
Indexing limitations improved
Continuous propagation
Topology
Deployment options:
Collapse the index and query services onto the same server
Enable the index service on one server and the query service on one or more different servers
With either option, you can have only one index server (per SSP)
Scaling up versus scaling out
Topology (cont)
Topology restrictions from v2 removed:
Indexer and search (query) roles can be mixed
Services can be configured at initial setup or later on
Mixed x86 and x64 hardware architectures can be used; note the IFilter and protocol handler limitations
The index server is very CPU-intensive
Plan for availability requirements
Topology (cont)
Topology scaling recommendations (for Search):
Query servers: 8 per farm
Front-end servers: 8 per farm
Index servers: 4 per farm
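A planned farm can be checked against these per-farm recommendations with a few lines of Python. This is a minimal sketch; the role names and the check_topology function are hypothetical, and only the numbers come from the slide.

```python
from __future__ import annotations

# Per-farm maximums from the scaling recommendations above (for Search).
RECOMMENDED_MAX = {"query": 8, "front_end": 8, "index": 4}


def check_topology(planned: dict[str, int]) -> list[str]:
    """Return a warning for each role whose planned server count exceeds the recommendation."""
    warnings = []
    for role, limit in RECOMMENDED_MAX.items():
        count = planned.get(role, 0)  # roles not mentioned count as zero servers
        if count > limit:
            warnings.append(f"{role}: {count} servers planned, recommended maximum is {limit} per farm")
    return warnings


print(check_topology({"query": 10, "front_end": 4, "index": 1}))
# ['query: 10 servers planned, recommended maximum is 8 per farm']
```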
MOSS 2007 Search Topology
[Diagram: indexer, propagation of indexes, query servers separated from the indexer, load balancer, Web front ends, user requests, content databases, and external content]
Querying
Performance parameters
Scaling factors
Querying – Performance Parameters
The network always affects query performance as experienced by end users:
When querying the index catalog, a front-end always hits the SQL database to get information on search results and for security trimming.
When querying the property store, the query server is not involved, since the property store now resides in the SQL search database.
Query server memory: the more memory is available, the less the search service has to access the hard disk to satisfy a given query.
Ideally, enough memory should be installed on the query servers to accommodate the entire index.
Query server disk speed: RAID 10 is recommended.
Querying – Scaling Factors
Processor architecture: use 64-bit servers
Planning for performance: separate the query role from the front-end
Dedicated processor time
Plenty of available RAM for caching
Planning for availability: add more than one query server to your farm
This requires a dedicated machine for the index role, as described before
Tested maximum of eight query servers
Indexing
Planning
Performance optimization
Storage
Limitations
Scaling
Indexing Planning
Customer environment:
Number of users
Network and connectivity
Dispersed locations
Expected workloads
Pilot
Rollout plan
Estimate the indexing window
Indexing Planning (cont)
Corpus definition:
A corpus is defined as the sum of all content that is being indexed.
This includes all valid content sources, such as Web pages, items, documents, and BDC data, plus any metadata and security information associated with this content.
Indexing Planning (cont)
For each content source, estimate:
Number of items
Storage used
Types of items
Security
Latency requirements
Connectivity
Indexing window
Expected yearly growth
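Once a pilot has measured crawl throughput, the per-source estimates above translate into an indexing window with simple arithmetic. A minimal sketch, assuming throughput stays constant for the whole crawl and growth compounds yearly; ContentSource, crawl_hours, and fits_window are hypothetical names, and the numbers in the usage lines are illustrative, not measurements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContentSource:
    """Hypothetical per-source planning record."""
    name: str
    items: int
    yearly_growth: float  # 0.2 means 20% more items per year


def projected_items(source: ContentSource, years: int) -> int:
    # Compound the expected yearly growth over the planning horizon.
    return round(source.items * (1 + source.yearly_growth) ** years)


def crawl_hours(items: int, items_per_minute: float) -> float:
    # Assumes the throughput measured in the pilot holds for the whole crawl.
    return items / items_per_minute / 60


def fits_window(source: ContentSource, items_per_minute: float,
                window_hours: float, years: int = 0) -> bool:
    """Does a full crawl of the (grown) source finish inside the indexing window?"""
    return crawl_hours(projected_items(source, years), items_per_minute) <= window_hours


shares = ContentSource("file shares", 300_000, 0.2)
print(crawl_hours(shares.items, 1_000))        # 5.0
print(projected_items(shares, 2))              # 432000
print(fits_window(shares, 1_000, 8, years=2))  # True: 7.2 hours
```

Running the check for several years ahead shows when expected growth will push a full crawl past the window, which is the point at which scaling up or splitting content sources needs to be planned.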
Indexing - Performance Optimization
Use a dedicated front-end for best indexing performance; no other services are allowed on that server
Adjust the indexing performance level; use Maximum for best performance
Use crawler impact rules; carefully test their impact
Continuous propagation: average time is 3 to 27 seconds
WSS change log for incremental crawls
Indexing - Performance Optimization (cont)
Index server CPU: the more processors are available, the higher the crawl speed
Index server memory: the greater the memory capacity, the more documents the crawler can process in parallel, so more available memory improves crawl speed
Index server disk speed: RAID 10 with 2 ms access time and more than 150 MB/sec write throughput
Index Storage
Plan index storage as a ratio of the corpus
Sizing depends on the content in the corpus:
Type of content source
Document formats
Level of metadata and security information
Plan for expected growth rates
Index Storage (cont)
Index / query server disk space requirements:
Index catalog size is normally in the range of 5% to 12% of corpus size
Recommended initial disk space is a minimum of 2.5 times the index catalog size
That means: recommended initial disk space is at least 30% of the indexed corpus size (2.5 x 12%, the upper end of the catalog range)
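The storage rule of thumb above can be written out as arithmetic: the index catalog is 5% to 12% of the corpus, and initial disk space is 2.5 times the catalog, so sizing against the 12% upper end gives 2.5 x 12% = 30% of the corpus. A minimal Python sketch; the function names are hypothetical, and the usage line treats 1 TB as 1000 GB for simplicity.

```python
from __future__ import annotations

# Ratios from the slide: index catalog = 5% to 12% of corpus; disk = 2.5 x catalog.
CATALOG_LOW, CATALOG_HIGH = 0.05, 0.12
DISK_MULTIPLIER = 2.5


def index_catalog_range_gb(corpus_gb: float) -> tuple[float, float]:
    """Expected index catalog size range for a corpus of the given size."""
    return corpus_gb * CATALOG_LOW, corpus_gb * CATALOG_HIGH


def initial_index_disk_gb(corpus_gb: float) -> float:
    # Size against the upper end of the range: 2.5 x 12% = 30% of the corpus.
    _, high = index_catalog_range_gb(corpus_gb)
    return DISK_MULTIPLIER * high


low, high = index_catalog_range_gb(1_000)  # a 1 TB (1000 GB) corpus
print(f"catalog {low:.0f}-{high:.0f} GB, disk {initial_index_disk_gb(1_000):.0f} GB")
# catalog 50-120 GB, disk 300 GB
```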
Index Storage (cont)
Search database:
Contains metadata, ACLs, hit highlighting, crawl history, and usage reports
Estimated 2 KB per crawled document
Sizing depends on corpus content
Requires more space than the index catalog
Recommended initial disk space is a minimum of 4 times the index catalog size
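The search database figures can be estimated the same way. This sketch assumes binary units (1 GB = 1024 x 1024 KB) when converting the 2 KB per document estimate; the function names are hypothetical.

```python
from __future__ import annotations

KB_PER_DOCUMENT = 2.0    # slide estimate for the search database
DISK_VS_CATALOG = 4      # slide: initial disk >= 4 x index catalog size
KB_PER_GB = 1024 * 1024  # assumption: binary units


def search_db_content_gb(crawled_documents: int) -> float:
    """Rough search database size from the 2 KB per crawled document estimate."""
    return crawled_documents * KB_PER_DOCUMENT / KB_PER_GB


def search_db_initial_disk_gb(index_catalog_gb: float) -> float:
    # The search database needs more room than the index catalog itself.
    return DISK_VS_CATALOG * index_catalog_gb


print(round(search_db_content_gb(10_000_000), 1))  # 19.1
print(search_db_initial_disk_gb(120))               # 480
```

The two figures answer different questions: the per-document estimate approximates what the database will hold, while the 4x multiplier is the recommended disk allocation to start with.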
Index Capacity Limitations
The supported limit for a single index server is 50 million documents; in this scenario we recommend only one index server per farm
One index server per SSP; multiple SSPs can use the same indexer
All MOSS 2007 for Search editions are limited to one SSP per farm
MOSS 2007 is limited to 20 SSPs per farm
MOSS 2007 for Search Standard Edition is limited to 500,000 documents per farm
Index Scaling
First scale up (recommended):
Optimal ranking and user experience
Best manageability
Scale up system resources:
Use x64 architecture
Add more CPUs to increase performance
Plan for a minimum of 4 GB of memory
RAID 10 is recommended for optimal disk speed
Index Scaling
Scale out:
Add multiple SSPs, each crawling a unique part of the corpus
Complete isolation between SSPs: querying across multiple SSPs to get a single relevant result set is not possible
Tested maximum of four index servers per farm
Recommended limit per farm across all indexes is 50 million items
For scenarios with more than 50 million items, add more farms
Test Environment
Establish a starting-point topology
Use monitoring to establish actual performance and capacity data
Use Performance Monitor to collect processor, memory, and disk information for each server
Look for resource bottlenecks:
Scale up available resources
Scale out server roles
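Performance Monitor data can be screened for bottlenecks once it has been exported. The sketch below works on already-collected samples held in a dictionary; the counter names follow common Performance Monitor counters, but the thresholds are assumed rules of thumb to tune against your own baseline, not product guidance, and find_bottlenecks is a hypothetical helper.

```python
from __future__ import annotations

from statistics import mean

# Assumed rule-of-thumb thresholds; tune them against your own baseline.
THRESHOLDS = {
    "% Processor Time": ("above", 80.0),
    "Available MBytes": ("below", 512.0),
    "Avg. Disk Queue Length": ("above", 2.0),
}


def find_bottlenecks(samples: dict[str, list[float]]) -> dict[str, float]:
    """Return each known counter whose average sample breaches its threshold."""
    flagged = {}
    for counter, values in samples.items():
        if counter not in THRESHOLDS or not values:
            continue  # unknown counter or no data collected
        direction, limit = THRESHOLDS[counter]
        average = mean(values)
        if (direction == "above" and average > limit) or (direction == "below" and average < limit):
            flagged[counter] = average
    return flagged


# Hypothetical samples collected on an index server during a full crawl.
index_server = {
    "% Processor Time": [92.0, 88.0, 95.0],
    "Available MBytes": [2048.0, 1900.0],
    "Avg. Disk Queue Length": [1.0, 1.5, 0.5],
}
print(sorted(find_bottlenecks(index_server)))  # ['% Processor Time']
```

A CPU-bound result like this one fits the earlier note that the index server is very CPU-intensive; following the slide, the first response is to scale up available resources before scaling out server roles.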
Real World Experiences
Microsoft Intranet
Microsoft Technology Center PoC
Microsoft Intranet
Environment
Estimate of indexed content: around 12 TB in SharePoint content databases (mix of 2003 / 2007); size outside of this environment unknown
Total size of the index:
SSP search database ~282 GB
SSP profiles database ~51 GB
Index size on disk ~156 GB
Total number of objects: 23 million
30 content sources, 6 with daily crawls
Typical 'real world' query response time from this implementation: ~2 seconds, although the product group is looking into ways to optimize this for our environment
Microsoft Technology Center PoC
Objectives:
Indexing large numbers of secure files on file shares
Verify MOSS 2007 search architecture
Test and recommend capacity planning and scale
Topology
[Diagram: PoC topology; labels: indexed corpus, search db, index catalog, propagated catalog; values shown: 1 TB, 23 GB, 25 GB]
Results
For the biggest test run, which included indexing 2.4 million secure files, the key metrics were:
Full first-time indexing of the entire corpus took 23.1 hours.
Incremental crawls, where 4.7% of the corpus was updated, took 3.7 hours.
The total size of the index was 2.4% of the corpus, and the search database was 2.1%.
Full corpus crawl: an average of 1642 files indexed per minute.
Results (cont)
Summary of Known Limits and Restrictions
Tested recommendation of 50 million items per farm
Hard limits:
1 indexer per SSP
20 indexes per MOSS 2007 farm
1 index per MOSS 2007 for Search farm
500 content sources per SSP
500 start addresses per content source
500,000 document limit for MOSS 2007 for Search Standard Edition
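The hard limits above lend themselves to a simple pre-deployment check. A minimal Python sketch; the SspPlan model, the product labels, and the check_farm function are hypothetical, and only the numeric limits come from this slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Numbers from the summary slide; everything else here is a hypothetical model.
MAX_SSPS = {"moss": 20, "search_standard": 1, "search_enterprise": 1}
MAX_CONTENT_SOURCES_PER_SSP = 500
MAX_START_ADDRESSES_PER_SOURCE = 500
SEARCH_STANDARD_DOC_LIMIT = 500_000
TESTED_ITEMS_PER_FARM = 50_000_000


@dataclass
class SspPlan:
    items: int
    # One entry per content source: its number of start addresses.
    start_addresses: list[int] = field(default_factory=list)


def check_farm(product: str, ssps: list[SspPlan]) -> list[str]:
    """Return every limit the planned farm would break (one indexer per SSP is implied)."""
    problems = []
    if len(ssps) > MAX_SSPS[product]:
        problems.append(f"{len(ssps)} SSPs planned, limit is {MAX_SSPS[product]}")
    for i, ssp in enumerate(ssps):
        if len(ssp.start_addresses) > MAX_CONTENT_SOURCES_PER_SSP:
            problems.append(f"SSP {i}: {len(ssp.start_addresses)} content sources")
        for j, count in enumerate(ssp.start_addresses):
            if count > MAX_START_ADDRESSES_PER_SOURCE:
                problems.append(f"SSP {i}, content source {j}: {count} start addresses")
    total = sum(ssp.items for ssp in ssps)
    if product == "search_standard" and total > SEARCH_STANDARD_DOC_LIMIT:
        problems.append(f"{total:,} documents exceeds the Standard Edition limit")
    if total > TESTED_ITEMS_PER_FARM:
        farms = -(-total // TESTED_ITEMS_PER_FARM)  # ceiling division
        problems.append(f"{total:,} items exceeds the tested 50 million; plan for {farms} farms")
    return problems


plan = [SspPlan(400_000, [10, 600]), SspPlan(200_000, [5])]
for problem in check_farm("search_standard", plan):
    print(problem)
```

For this plan the check reports the second SSP, the 600 start addresses, and the 600,000 documents against the Standard Edition limit; the same plan on a full MOSS 2007 farm would only trip the start-address limit.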
Capacity Planning References
Planning for performance and capacity: http://technet2.microsoft.com/Office/en-us/library/eb2493e8-e498-462a-ab5d-1b779529dc471033.mspx
Plan for software boundaries: http://technet2.microsoft.com/Office/en-us/library/6a13cd9f-4b44-40d6-85aa-c70a8e5c34fe1033.mspx
Estimate performance and capacity requirements for search environments: http://technet2.microsoft.com/Office/en-us/library/5465aa2b-aec3-4b87-bce0-8601ff20615e1033.mspx
Questions?