Avi Rappoport, Search Tools Consulting
www.searchtools.com
Search and Discovery Tools
A View into the Future
Intranets 2004 / © Avi Rappoport, Search Tools Consulting www.searchtools.com 2
Defining Intranet Search
• Searching internal network
– Intranet and file servers
– Email archives, Lotus Notes
– External sites or feeds
• Using Internet-developed search tools
– Protocols such as TCP/IP and HTTP
– Thin client = Web browser
– Search engine functionality and interface
• Like Google, Yahoo, AskJeeves
Present vs. Future
• 80/20 rule
– Solve the easy problems now
– Simple search
• “Information needs” -- a non-trivial question
• Technology is not a panacea
• Complex Research
Three Parts of Usable Search
content
search functionality
user interface
Like an iceberg, search is mostly invisible
Discovery: Finding What You Have
• Core Intranet
– Varies with intranet history
– HR and Communications
– Facilities
• Support
• International
• Public sites
• Partner and Extranet Sites
Discovery: Good, Bad and Ugly
• Some items should be there but aren't
– Problem links: bad syntax, JavaScript, etc.
– Wrongly configured robots.txt
– Graphical text, funky PDFs
• Some items shouldn't be there at all
– Confidential information
– Early versions of documents
– Very local content (4,000 tech support cases)
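A wrongly configured robots.txt can be checked before the indexer runs. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the crawler name `intranet-indexer`, and the `intranet.example.com` URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that accidentally hides the whole HR area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hr/
Disallow: /drafts/
"""

def blocked_urls(robots_txt, agent, urls):
    """Return the URLs that the given crawler is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]

URLS = [
    "http://intranet.example.com/hr/benefits.html",
    "http://intranet.example.com/facilities/map.html",
    "http://intranet.example.com/drafts/plan-v1.html",
]
BLOCKED = blocked_urls(ROBOTS_TXT, "intranet-indexer", URLS)
```

Blocking `/drafts/` is probably intended (early versions of documents); blocking `/hr/` is the kind of surprise worth catching.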
Discovery: What to Look For
• Documents with and without metadata
– Title tag is the most important
• Frequency of updates
– Dynamic servers don't show mod date
• Incoming and outgoing links
• Languages and character sets
• Errors
– Bad links
– Access control
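Since the title tag matters most, one quick discovery check is to list pages that have no usable title. This is a minimal sketch using the standard `html.parser` module; the `pages` mapping of URL to HTML is an assumed input, not something a particular crawler provides.

```python
from html.parser import HTMLParser

class TitleFinder(HTMLParser):
    """Collects the text found inside <title>...</title>."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def page_title(html):
    finder = TitleFinder()
    finder.feed(html)
    return finder.title.strip()

def pages_missing_titles(pages):
    """pages maps URL -> HTML; returns URLs with a missing or blank title."""
    return [url for url, html in pages.items() if not page_title(html)]
```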
Search: Intranet Information Needs
• Don’t assume you know - invest in asking
– Wide target for surveys
– Outlying offices
– Key audiences
• Data mining
– Intranet user feedback
– Search log analysis
– Phone and email trends
Common Intranet Searches
• Employee and departmental contacts
• HR issues
– Holidays, benefits, evaluations, surveys
• Office functions
– Heating & cooling, training, menus
• Technical information
– Product data, support, services
• Topical research (less frequent)
Real Intranet Usage Example
3 business cards
7 fedex
8 webex
9 expense report
11 training
12 401k
13 pto
14 accounts payable
15 holiday party
17 bereavement
18 payroll
20 holiday
Most Frequent Search Problems
• Useful content not indexed
• Confusing interfaces
• Complicated query languages
• Mysterious relevance ranking
• Not enough human judgment
• Excess complexity
• Lack of user testing and log analysis
Defining Search Priorities
• Identify pain points
– Common information needs
– Frequently-changing content
– Confusing interfaces
• Define audiences
– Self-selected search users
– People who have significant problems
• Work with content creators
• Do the easy stuff first
Discovery and Indexing
• Index almost everything
– Invest in understanding content
– Find new valuable data
– Avoid duplication
• Work with content creators
– Encourage focused pages with titles
• Keep the index current
– Update quickly in times of change
• Hide old stuff in archives
Improve Basic Searching
• Offer a search field in all navigation bars
– Long search fields are best
– Minimize complexity
• Default to keyword matching
• Simplify search results pages
– Show intranet navigation
– Provide a filled-in search box
– Show match pages with context
– Avoid clutter
Keep Search Metrics
• Number of searches per day / week / month
– Correlate with corporate trends
• Percentage of frequent queries
– Should go down if navigation improves
• Problems
– No-matches
– Server errors
• Audience information
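The metrics above can be computed from even a very simple search log. The sketch below assumes a hypothetical record layout of (date, query, number of matches); real engines log in their own formats, so the parsing step is left out.

```python
from collections import Counter

# Hypothetical log records: (ISO date, query as typed, number of matches).
LOG = [
    ("2004-03-01", "401k", 12),
    ("2004-03-01", "Expense Report", 5),
    ("2004-03-01", "pto", 0),
    ("2004-03-02", "401k", 12),
    ("2004-03-02", "holiday party", 3),
]

def search_metrics(log, top_n=1):
    """Searches per day, no-match rate, and share taken by the top queries."""
    per_day = Counter(day for day, _, _ in log)
    queries = Counter(q.strip().lower() for _, q, _ in log)
    no_match = sum(1 for _, _, hits in log if hits == 0)
    top_total = sum(count for _, count in queries.most_common(top_n))
    return {
        "searches_per_day": dict(per_day),
        "no_match_rate": no_match / len(log),
        "top_query_share": top_total / len(log),
    }

METRICS = search_metrics(LOG)
```

Tracking `top_query_share` over time is one way to see whether navigation changes are reducing the most frequent queries.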
Search Log Analysis
• What people are looking for
– What words do they use?
– Are they getting good results?
• What they click on
– Candidates for search suggestions (best bets)
• Improve taxonomy & controlled vocabulary
• Analyze search and information architecture
– Search default to "match all words"?
– Add high-level navigation link?
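Click data can nominate best bets: for each query, the page people choose most often. This is a rough sketch assuming click records of (query, clicked URL); the `min_clicks` threshold and the sample paths are arbitrary, and a person should still review every candidate.

```python
from collections import Counter, defaultdict

# Hypothetical click records: (query as typed, URL the user clicked).
CLICKS = [
    ("pto", "/hr/time-off.html"),
    ("PTO", "/hr/time-off.html"),
    ("pto", "/hr/holidays.html"),
    ("fedex", "/office/shipping.html"),
]

def best_bet_candidates(clicks, min_clicks=2):
    """Map each normalized query to its most-clicked URL, if clicked enough."""
    by_query = defaultdict(Counter)
    for query, url in clicks:
        by_query[query.strip().lower()][url] += 1
    candidates = {}
    for query, urls in by_query.items():
        url, count = urls.most_common(1)[0]
        if count >= min_clicks:
            candidates[query] = url
    return candidates
```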
Continue Intranet Discovery
• Track new content
• Use APIs, including Web Services
– Index CMSs and other data stores
• Deal with date problems
• Linguistics
– Character set recognition and correct tokenization
– Language recognition
• Document attributes
• Stemming
Security and Access Control
• Be careful what you index
– Reverse-engineering via search
• HTTPS for showing SSL results
• Access control & authentication
• Search security design
– Entire engine / index
– Collection security
– Hit-level (document) access control
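Hit-level access control means filtering each result against the searcher's permissions before it is displayed. The sketch below assumes each hit carries a set of allowed groups copied from the source system at index time; the `Hit` type and the group names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    url: str
    allowed_groups: frozenset  # groups permitted to see this document

def visible_hits(hits, user_groups):
    """Keep only hits that share at least one group with the user."""
    groups = set(user_groups)
    return [h for h in hits if h.allowed_groups & groups]

HITS = [
    Hit("/hr/holidays.html", frozenset({"all-staff"})),
    Hit("/hr/salary-bands.html", frozenset({"hr-managers"})),
]
```

Filtering must also cover titles and snippets, or the results list itself leaks what the document says.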
Simplify Searching
• Minimal query expansion
– Stemming (light pluralization)
– Explain anything
• Offer options, don't force them
– Search suggestions (Best Bets)
– Synonyms (can get 20% usage)
– Spell-checking (can get 15% usage)
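Light pluralization can be as small as a few suffix rules applied to both the index terms and the query. The rules below are an illustrative assumption for English only, not the stemmer of any particular engine.

```python
def light_stem(word):
    """Fold simple English plurals to the singular; leave other words alone."""
    w = word.lower()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"            # policies -> policy
    if len(w) > 3 and w.endswith(("ches", "shes", "sses", "xes")):
        return w[:-2]                  # boxes -> box, classes -> class
    if len(w) > 3 and w.endswith("s") and not w.endswith(("ss", "us", "is")):
        return w[:-1]                  # holidays -> holiday
    return w                           # business, status, analysis unchanged
```

Keeping the rules this conservative avoids surprising matches, which fits the advice to explain anything the engine does to the query.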
Sometimes, Advanced Search Works
Relevance Ranking: KISS
• Keep It Simple
– No complex algorithms
– Start with basic query word matches
• Use Heuristics
– Exact phrase match in title is usually best
– Phrase matches are good
– Metadata matches are good
– Take advantage of intranet IA, taxonomies
– Leverage human judgment
• Transparency: mark match terms in context
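The heuristics above can be expressed as a small additive score. In this sketch the weights, the document fields (`title`, `keywords`, `body`), and the sample documents are all assumptions chosen for illustration; they would need tuning against real queries.

```python
# Illustrative weights only; not values from any real engine.
TITLE_PHRASE, BODY_PHRASE, METADATA_PHRASE, WORD = 10, 5, 3, 1

DOCS = [
    {"title": "Expense Report Form", "keywords": "finance",
     "body": "Submit your expense report within 30 days."},
    {"title": "Finance FAQ", "keywords": "expense report",
     "body": "Questions about budgets and report deadlines."},
    {"title": "Holiday Party", "keywords": "", "body": "Party details."},
]

def score(query, doc):
    """Phrase in title beats phrase in body beats metadata beats single words."""
    phrase = " ".join(query.lower().split())
    if not phrase:
        return 0
    title, body = doc["title"].lower(), doc["body"].lower()
    keywords = doc.get("keywords", "").lower()
    points = 0
    if phrase in title:
        points += TITLE_PHRASE
    if phrase in body:
        points += BODY_PHRASE
    if phrase in keywords:
        points += METADATA_PHRASE
    tokens = set(title.split()) | set(body.split())
    points += WORD * sum(1 for word in phrase.split() if word in tokens)
    return points

def rank(query, docs):
    """Return matching documents, best score first."""
    scored = [(score(query, d), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored]
```

Because every rule is visible and additive, it is easy to explain why one page outranks another.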
Improve Results Page Layout
• Should fit with look and feel of intranet
• Navigation
• Search Results Header
– Search field
– Number and type of matches
– Results navigation
• Search Results Items
– Use whatever content you have
– Provide context for result
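Providing context usually means showing the text around the first match with the query words marked. A minimal sketch, assuming plain-text content and using `**` as a stand-in for whatever highlighting the results page uses:

```python
import re

def snippet(text, query, width=30, mark="**"):
    """Return text around the first match, with every query word marked."""
    words = [re.escape(w) for w in query.split()]
    if not words:
        return text[: 2 * width]
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return text[: 2 * width]       # no match: fall back to the opening text
    start = max(0, first.start() - width)
    end = min(len(text), first.end() + width)
    return pattern.sub(lambda m: mark + m.group(0) + mark, text[start:end])
```

The window is cut by character count, so a production version would also trim to word boundaries.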
Problem Results Page
Better Results Page
Why Searches Fail
• Vocabulary mismatch
• Spelling errors
• Wrong scope
• Empty search
• Query requirements not met
• Software problems
Dealing with Search Failure
• Improve the no-matches page
– Standard design and navigation links
– Display a search field
– Describe contents covered on site and search
– Link to specialized search engines
• Log analysis
– Track frequent failures
– Add synonyms, suggestions or intranet content
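For spelling errors, a no-matches page can offer close terms from the index vocabulary. This sketch uses the standard `difflib.get_close_matches`; the vocabulary list and the 0.75 cutoff are assumptions, and a real engine would draw terms from its own index.

```python
import difflib

# Hypothetical vocabulary; in practice this would come from the index.
KNOWN_TERMS = ["bereavement", "payroll", "benefits", "training"]

def suggest(query, known_terms, limit=3):
    """Return up to `limit` known terms that closely resemble the query."""
    return difflib.get_close_matches(query.lower(), known_terms,
                                     n=limit, cutoff=0.75)
```

Showing the suggestion as a clickable "Did you mean" link offers the option without forcing it.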
Unhelpful No-Matches Pages
Better No-Matches Page
Search Engine Software Requirements
• Flexible and configurable indexer
– Integration and import modules for data sources
– Current file formats (e.g. Acrobat 6)
• Good defaults for interface, retrieval and relevance
• Override default settings
• Security & access control
• Admin interface
• Logging and analysis tools
• Scalable
Search and Information Architecture
• IA: the art and science of organizing and labeling information
• Search provides ad-hoc access, reduces the need to organize everything perfectly
• Search can take advantage of IA
– Less duplication and overlap
– Fewer gaping holes in coverage
– Controlled vocabulary
– Labels can explain search results
Search and Taxonomies
AKA ontologies, cataloging, categorization, classification, directories, hierarchies
• Taxonomy: organizing information into levels of named categories, like Yahoo!
• Vital to navigate within large data sets
• No such thing as a finished taxonomy
– A resource-intensive challenge
– Language and requirements change
• Multiple topic areas, multiple taxonomies
Search & Taxonomy Work Together
• Search
– Crosses categories
– Supplements drill-down
– Handles non-standard vocabulary
• Taxonomy Categories
– Create subset for precise search
– Provide valuable context in search results
• Refer to the same controlled vocabulary
Future Discovery & Indexing Tools
• Integration with CMS and DMS
• Metadata
– Entity Extraction
– Date extraction and tracking
– Other facets
• Automatic Chunking
– Topical sections of long documents
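Date extraction can start with a couple of patterns over the document text. The sketch below recognizes only two formats (ISO `YYYY-MM-DD` and English "Month D, YYYY"), which is an assumption; real content will need more patterns and some rule for choosing among several dates.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
LONG_DATE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")

def extract_dates(text):
    """Return valid dates found in the text, in order of appearance."""
    found = []
    for m in ISO_DATE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        found.append((m.start(), year, month, day))
    for m in LONG_DATE.finditer(text):
        month = MONTHS.index(m.group(1)) + 1
        found.append((m.start(), int(m.group(3)), month, int(m.group(2))))
    dates = []
    for _, year, month, day in sorted(found):
        try:
            dates.append(date(year, month, day))
        except ValueError:
            pass  # looks like a date but is not a real calendar day
    return dates
```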
New Tools for Better Search
• Grouping results by location
• Faceted Metadata Search / Browse
– Expose available structure
– Allow users to drill down intelligently
• Federated search
– Search across multiple engines
– “Best Source” problem
• Personalization - user control
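Faceted browsing comes down to two operations: counting the values of a metadata field across the current results, and narrowing the results when the user picks a value. A minimal sketch, assuming documents are dictionaries with hypothetical `department` and `type` fields:

```python
from collections import Counter

DOCS = [
    {"title": "PTO policy", "department": "HR", "type": "policy"},
    {"title": "401k enrollment", "department": "HR", "type": "form"},
    {"title": "Expense report", "department": "Finance", "type": "form"},
]

def facet_counts(docs, facet):
    """Count how many documents carry each value of the given facet."""
    return Counter(d[facet] for d in docs if facet in d)

def drill_down(docs, **selected):
    """Keep documents whose metadata matches every selected facet value."""
    return [d for d in docs
            if all(d.get(name) == value for name, value in selected.items())]
```

Showing the counts next to each value exposes the available structure before the user commits to a click.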
In-Depth Research
• Medical diagnosis
• Scientific articles & experiments
• Investment
• Business intelligence
• Market research
• Patent searches
• Journalism, sociology, history
• Politics and current events
Research Requirements
• Full recall - everything on a topic
• Organize results
• Save searches
• Understand topic within context
• Find the experts
• Revise and extend queries
• Share knowledge
• Get alerts for new information
Tools for Better Research
• Federated searching
– Research and purchased reports
– Databases and email archives
– News, RSS and other information streams
• Complex query-building
• Visualization
• Networking
• Collaboration
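Scores from different engines are rarely comparable, so a simple way to merge federated results is to interleave the ranked lists and drop duplicates. This is one possible merging strategy chosen for illustration, not a recommendation from any specific product; each input list is assumed to hold URLs in that engine's own rank order.

```python
from itertools import zip_longest

def interleave(result_lists):
    """Round-robin merge of ranked URL lists, keeping the first copy of each."""
    seen = set()
    merged = []
    for tier in zip_longest(*result_lists):   # rank 1 from each, then rank 2...
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```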
Checklist for Intranet Search
• Keep researching user needs
• Provide wide coverage in the index
• Make the search field ubiquitous
• Keep it simple and fast
• Tune relevance ranking
• Take advantage of IA and taxonomies
• Offer suggestions
• Usable results and no-matches pages
• Search log analysis for continuous improvement
Apply the Right Tools
• Simple search for the wide intranet
– Rich indexing
– Leverage metadata
– Solve common problems
– Tune for employee needs
• Research tools when appropriate
– Concepts and topics
– Visualization
– Networking