federated search in a disparate environment
TRANSCRIPT
June 4, 2009
Federated Search in a Disparate Environment
PREPARED FOR: Gilbane San Francisco
8403 Colesville Road, Silver Spring Metro Plaza 2, Suite 400, Silver Spring, MD 20910
301.588.5900 / 301.588.0390
[email protected] www.macf.com
Helen L. Mitchell Curtis
Senior Program Director, Enterprise Solutions
2
Biography
Helen L. Mitchell Curtis – Senior Program Director of Enterprise Solutions, Macfadden
• 32+ years at FDA, and led one of the largest enterprise search implementations among Civilian Federal Agencies
• Develop enterprise-wide search strategies & solutions
• Integrate search technologies across IT applications and disparate document repositories
• Build governance, management and end user buy-in
• Promote collaboration, standards, findability and improved organization of data and document assets
• Passion – to help clients reduce costs, improve quality and efficiency, reduce 'pain points' and achieve a positive search experience
3
About Macfadden
• Founded in 1986 as a small disadvantaged entrepreneurial company; graduated from the SBA 8(a) program in 1998
• Became 100% employee-owned in 2007, S-Corporation
• Acquired Systems Integration Group, Inc. and Total Security Services International, Inc. (TSSI) in 2008
• 225 employees; projected 2009 annual gross revenues $40 million; $135M in contract backlog; 90% prime contracts; (TSSI sole wholly-owned subsidiary)
CAPABILITIES:
• Enterprise Search Solutions
• Integrated IT Solutions & Security
• Counter Terrorism Planning
• Disaster Response Management
• Threat & Vulnerability Assessment
• Program/Project Management
• Intelligence Gathering & Analysis
FAST X10 Partner
Microsoft Certified Partner - Information Worker Solutions with Search Specialization Competency
4
Clarify Terms
1. Definition by AIIM Market IQ
2. Definition by CMS Watch
3. A Federated Search Primer – Part II
4. Deep Web Technologies
5
Findability Issues
• AIIM Market IQ Research on Findability (of 528 end users):
  • 50% believe Findability in their organization is “Worse to Much Worse” than their consumer-facing web sites
  • 49% have no formal goal for Enterprise Findability within their organizations
  • 49% “Agreed or Strongly Agreed” that finding the information to do their job is difficult and time consuming
  • 69% believe less than 50% of their organization's information is searchable online
  • 36% reference five or more systems in any given week
Source: AIIM Market Intelligence, 2008
6
Why Use Federated Search
1. To increase findability so users can accomplish their business objectives
2. To access multiple content sources through a common search interface
3. To increase user awareness of all content sources
4. To eliminate using multiple database search protocols and passwords
5. To access public or subscription search sites
6. To search the deep web for scientific, technical and business content
7. To reduce search time and display results in a common format
7
Federated ‘Master Index’ Search
• Index content from multiple data sources into a single master search index
• Queries & results come from that one master index
• Many Enterprise Search products integrate FS via ‘connectors’ to accomplish this (e.g., FAST, Autonomy, Endeca)
Source: New Idea Engineering, Inc.
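The master index approach above can be illustrated with a minimal sketch. It assumes two hypothetical content sources (`sharepoint` and `file_share`, names invented for illustration) and a toy inverted index; real enterprise search products use connectors and far richer indexing, so this only shows the shape of the idea: all content is indexed once, and every query is answered from that single index.

```python
from collections import defaultdict


def build_master_index(sources):
    """Index every document from every source into one inverted index."""
    index = defaultdict(set)
    for source_name, docs in sources.items():
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add((source_name, doc_id))
    return index


def search(index, query):
    """Return the documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


# Hypothetical sources and documents, for illustration only.
sources = {
    "sharepoint": {"d1": "clinical trial protocol", "d2": "travel policy"},
    "file_share": {"f1": "trial results summary"},
}
master = build_master_index(sources)
results = search(master, "trial")
```

Because the query never touches the original sources, response time depends only on the index, which matches the "quicker response times" benefit listed later in the deck.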
8
Federated ‘Data Silos’ Search
• A ‘search federator’ process queries each data source silo
• Transforms the user's search terms to match each content source's requirements
• Submits the query to each of the sources simultaneously
• Merges each source’s results together into a single look and feel
• Maintains no indices of its own; relies upon the capabilities of all the linked systems
Source: New Idea Engineering, Inc.
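The federator steps above (translate, submit simultaneously, merge) can be sketched as follows. The two silos, their query syntaxes, and their canned results are hypothetical stand-ins; a real federator would call each system's own search interface.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical silos: each expects its own query syntax.
def search_catalog(query):
    # This silo expects upper-case keywords joined by AND.
    data = {"DEEP AND WEB": ["Deep Web Primer"]}
    return data.get(query, [])


def search_archive(query):
    # This silo expects lower-case keywords joined by '+'.
    data = {"deep+web": ["Archive: Deep Web Report"]}
    return data.get(query, [])


# Each entry pairs a query translator with the silo's search function.
SILOS = {
    "catalog": (lambda terms: " AND ".join(t.upper() for t in terms), search_catalog),
    "archive": (lambda terms: "+".join(t.lower() for t in terms), search_archive),
}


def federated_search(user_query):
    """Translate the query per silo, query all silos in parallel, merge results."""
    terms = user_query.split()

    def run(item):
        name, (translate, search) = item
        # Tag each hit with its source so the merged list has one common format.
        return [{"source": name, "title": t} for t in search(translate(terms))]

    with ThreadPoolExecutor(max_workers=len(SILOS)) as pool:
        per_source = list(pool.map(run, SILOS.items()))
    return [hit for hits in per_source for hit in hits]


merged = federated_search("Deep Web")
```

Note that the federator keeps no index of its own: nothing is stored between calls, and each query goes out to every silo again.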
9
Surface vs. Deep Web Search
Deep Web FS Examples:
• www.completeplanet.com - 70,000+ searchable DBs & specialty search engines
• www.science.gov - federates U.S. federal agency science information
• http://imlsdcc.grainger.uiuc.edu/ - Institute of Museum & Library Services (IMLS) - Digital Collections & Content w/descriptions of digital resources developed by IMLS grantees
Source: Juanico-Environmental Consultants, Ltd.
10
Vertical Search Engine
• Closely related to Deep Web – searches a particular niche, e.g., a specific industry, topic, or type of content (scientific research, travel, movies, images, blogs)
• Example: www.vetseek.info is a search engine focusing on veterinary science and related topics
11
Challenges
• Authentication
  • Showing each record’s branding and copyright information
  • Licensed or subscription databases
• True De-duplication
  • Virtually impossible because DBs return 10-20 results at a time
  • Vendors usually just de-dupe the first results set returned
• Security
  • Mapping user credentials and access rights to each repository security model
• Speed
  • Limited by slowest search engine’s performance
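The de-duplication limit described above can be made concrete with a small sketch. It assumes a hypothetical record shape (a dict with a `title` field) and uses a normalized title as the duplicate key, which is only one possible heuristic. Only the first page fetched from each source is compared, so duplicates sitting on later, unfetched pages go undetected.

```python
def dedupe_first_pages(pages):
    """Drop duplicates among the first result page returned by each source."""
    seen = set()
    unique = []
    for source, records in pages.items():
        for rec in records:
            # Assumed duplicate key: title with case and spacing normalized.
            key = " ".join(rec["title"].lower().split())
            if key not in seen:
                seen.add(key)
                unique.append({**rec, "source": source})
    return unique


# Hypothetical first pages from two sources.
first_pages = {
    "db_a": [{"title": "Federated Search Primer"}, {"title": "Deep Web Basics"}],
    "db_b": [{"title": "federated  search primer"}, {"title": "Vertical Search"}],
}
deduped = dedupe_first_pages(first_pages)
```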
12
Challenges (continued)
• Lack of data standardization
  • Each source has a unique access method & needs translation
  • Metadata mapping between FSS and underlying systems
• Access methods to sources may change
  • Requires an interface rewrite or modification
• Rules for error handling
  • Ex. Query term not available: exclude the query, the repository, or proceed without the term?
  • Ex. Timeouts or connection problems
• Complex searches usually not available
  • Fielded searches
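One possible error-handling rule for the timeout and connection cases above is to skip the failing source and report why. The sketch below assumes three hypothetical sources and a 0.1 second per-source wait; the right policy (skip, retry, or fail the whole search) is a design decision the slides leave open.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


# Hypothetical sources: one healthy, one slow, one unreachable.
def fast_source(query):
    return [f"fast hit for {query}"]


def slow_source(query):
    time.sleep(0.5)
    return [f"slow hit for {query}"]


def broken_source(query):
    raise ConnectionError("source unreachable")


def search_with_rules(query, sources, timeout_s=0.1):
    """Query all sources in parallel; skip any that time out or fail to connect."""
    results, skipped = [], {}
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
    for name, fut in futures.items():
        try:
            results.extend(fut.result(timeout=timeout_s))
        except FutureTimeout:
            skipped[name] = "timeout"
        except ConnectionError as exc:
            skipped[name] = f"connection problem: {exc}"
    # Do not block the caller on sources that are still running.
    pool.shutdown(wait=False)
    return results, skipped


results, skipped = search_with_rules(
    "fda",
    {"fast": fast_source, "slow": slow_source, "broken": broken_source},
)
```

Returning the `skipped` map lets the results page tell the user which sources were left out, rather than silently showing partial results.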
13
Challenges (continued)
• Relevancy scores
  • Can’t identify a single relevancy ranking model
• Each repository’s relevancy ranking refers only to its own results
  • May not be useful when comparing the results with those from another system
• Access to content stored in a variety of places
• Results page may not let user obtain identified documents
  • This may involve a built-in viewer or invoking the owning product’s interface
• Combining navigators from each result set
  • i.e., faceted search, taxonomies and auto-generated clusters
• Selecting the right FS engine
  • Depends on business goals, type of content sources – structured vs. unstructured, licensed/subscriptions
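The relevancy problem above arises because each engine scores on its own scale. One common heuristic, not one the presentation prescribes, is to rescale each source's scores to the range 0 to 1 before merging. The sketch below assumes hypothetical engines and scores; it makes the scales comparable in form only and does not make the underlying ranking models equivalent.

```python
def normalize(hits):
    """Min-max rescale one source's (doc, score) pairs to the range 0..1."""
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in hits]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in hits]


def merge_ranked(per_source):
    """Merge normalized hits from all sources, best score first."""
    merged = []
    for source, hits in per_source.items():
        if hits:
            merged.extend((doc, score, source) for doc, score in normalize(hits))
    return sorted(merged, key=lambda hit: hit[1], reverse=True)


# Hypothetical engines scoring on different scales (0-1000 and 0-10).
per_source = {
    "engine_a": [("a1", 900), ("a2", 300)],
    "engine_b": [("b1", 8), ("b2", 5), ("b3", 2)],
}
ranked = merge_ranked(per_source)
```

The weakness is visible in the output: the top hit from every source gets the same score, whether or not the sources are equally relevant to the query.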
14
Benefits
• Single master index
  • Quicker response times
  • No need to access original data sources
  • Relevancy algorithms applied uniformly
  • Dynamic navigators are available for all documents
• Time savings
  • Searches many sources at one time
  • Combines results into a single results page
• Quality of results
  • Client selects the sources to search
• Minimum impact on the data silos
  • Only accessed when a user performs a query
  • Eliminates increased load from crawling/indexing the data source
15
Benefits (continued)
• Improve productivity
  • Reduces number of searches executed to find relevant results
  • Save, reuse, schedule, and even share effective search queries
• Leverage security controls at queried source
  • Access repositories that are secured against crawls but can be accessed by search queries
• Reduce costs
  • No additional capacity requirements for content index since it's not crawled by search server
• Most current content
  • As soon as the source is updated, the info is available to the searcher on the very next query
• Increase awareness
  • Identify most relevant sources to search based on # of results each source produced
16
FDA Case Study Success
(Federated ‘Master Index’ Search System)
ACTIONS | RESULT
Started small with high ‘pain points’ | Increased productivity & popularity
Modified business processes* | Standardized nomenclature increased efficiencies
Users across organization could find content in silos | Produced more timely and QUALITY work products
Indexed structured & unstructured content repositories with document level security | Grew from 1 repository of 500 documents to 50 repositories with 30+ million documents & data. Users access based on ‘need to know’.
Introduced standardized search web services into applications | Decreased development time and costs, increased management & user acceptance, integrated in more applications
Increased user awareness through training, newsletters and meetings | Used more & content added. Search requirements gathered at BEGINNING of project development.
17
FSS Example (uses FAST ESP – Vertical Search)
18
FSS Example (uses MS & Vivisimo)
19
FSS Example (uses Webfeat)
20
Best Practices
21
Future Vision
22
Future Vision (continued)
23
Resources
• Great source of info on many Federated Search topics: www.federatedsearchblog.com – Author: Sol Lederman
• List of Open Source & commercial search components & tools: http://www.searchcomponentsonline.com/federated-search-vendors.html
• List of many Deep Web Databases: http://www.noodletools.com/debbie/literacies/information/5locate/advicedepth.html
• Info on the Deep Web: http://www.internettutorials.net/deepweb.asp
• Some Digital Image Resources on the Deep Web: http://www.readwriteweb.com/archives/digital_image_resources_on_the_deep_web.php
• Info on Vertical Search Engines: http://www.altsearchengines.com/category/verticals/
• 50 Niche Search Engines: http://www.accrediteddldegrees.com/2008/50-niche-search-engines-that-will-make-your-everyday-life-easier/
• Library of Congress list of FS Portal Products & Vendors: http://www.loc.gov/catdir/lcpaig/portalproducts.html
• 99 Resources to Research & Mine the Invisible Web: http://www.collegedegree.com/library/college-life/99-resources-to/
24
References
• “What’s in a Name: Federated Search” – By Miles Kehoe, New Idea Engineering, Inc. - Volume 4 Number 4 - August 2007
• “Federated Search Engine Article” - Online (Weston, Conn.) 28 no2 16-19 Mr/Ap2004 (Reprint of article by Donna Fryer www.SearchitRight.com )
• “Growing Up With Federated Search” - by Walt Warnick, OSTI
• “Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3” – Walt Warnick, OSTI
• “Vertical Search Engines & the Deep Web” - Laura B. Cohen http://www.internettutorials.net/
• www.federatedsearchblog.com – by Sol Lederman
• “Exploring a ‘Deep Web’ that Google can’t Grasp” - NYT 2-23-09 http://www.nytimes.com/2009/02/23/technology/internet/23search.html?_r=1&ref=business
• “Federated Search Primer, Part I-III” – by Sol Lederman
• www.searchdoneright.com – by Vivisimo – Raoul – CEO & Cofounder
• “Enterprise Search Grows Up” - Podcast from BizTalk
• “Federation: Big Need, Still a Challenge” – Stephen Arnold, 4/25/08
• “The Future of Federated Search or What Will the World Look Like in 10 Years” – Rich Turner
25
THANK YOU!
Helen L. Mitchell Curtis
Senior Program Director, Enterprise Solutions
240-247-1946 (w)
240-743-7975 (m)
26
MACFADDEN
Delivering Results. Exceeding Expectations.