Integration Issues
IMT 589
February 4, 2006
2/4/2006 IMT 589-Applied and Structural Metadata 2
Solving the Integration Problem
Need to find a way to bring multiple metadata schemas together
Examples from readings this week show several approaches
Tannenbaum has examples of standalone and distributed repositories, metadata interexchange, and enterprise portals
Hunter discusses mapping elements through a shared term thesaurus
Bedford talks about metadata integration at the World Bank
After looking at some lessons from Bedford, we’ll discuss in more depth the MSWeb example in Rosenfeld and Morville’s article
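As a rough illustration of mapping elements through a shared term thesaurus, the sketch below translates a record from one schema to another by way of common preferred terms. The schema names, element names, and thesaurus entries are assumptions made up for this example; they are not taken from Hunter's paper or from any real crosswalk.

```python
# Hypothetical shared thesaurus: each schema-specific element name points to
# one preferred term. All entries are illustrative assumptions.
THESAURUS = {
    ("dublin_core", "creator"): "agent",
    ("dublin_core", "title"): "title",
    ("bank_records", "Agent"): "agent",
    ("bank_records", "Title"): "title",
    ("bank_records", "Disposal Status"): "disposal status",
}


def map_record(record, source, target):
    """Translate a record's element names from the source schema to the
    target schema by way of the shared thesaurus terms."""
    # Invert the thesaurus for the target schema: preferred term -> element.
    target_elements = {
        term: element
        for (schema, element), term in THESAURUS.items()
        if schema == target
    }
    mapped, unmapped = {}, {}
    for element, value in record.items():
        term = THESAURUS.get((source, element))
        if term in target_elements:
            mapped[target_elements[term]] = value
        else:
            unmapped[element] = value  # no equivalent in the target schema
    return mapped, unmapped


mapped, unmapped = map_record(
    {"Agent": "World Bank", "Title": "Annual Report", "Disposal Status": "Retain"},
    source="bank_records",
    target="dublin_core",
)
```

Elements with no shared term are returned separately rather than dropped, so the gaps between the two schemas stay visible.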
Enterprise Architecture Basics
Design your Enterprise Architecture to support your goals
Enterprise implies integration and context
High level reference model must take into account the following
- Functional Architecture
- Technical Architecture
- Content Architecture
- Presentation Architecture
From Bedford, Denise. Presentation to American Society of Indexers Annual Conference – Arlington Virginia – May 15, 2004
What are the Goals of the World Bank Enterprise Architecture?
Facilitate integration and repurposing of content
- Provide broad search and retrieval capabilities
- Increase reuse and decrease redundancy across content providers
Increase the value and quality of content
- Build intelligent relationships among disparate content sources using concepts and metadata
- Define, enforce, monitor processes/procedures on content collections to ensure quality
Consistent information security and disclosure enforcement
- Bank records must be consistent in order to facilitate disclosure policy compliance and information sharing for partners
Simplify and complete the content life-cycle
- Reduce the number of user-facing content entry points by using already existent business processes
- Manage content end-to-end from initial inception to final disposition
From Bedford, Denise. Presentation to American Society of Indexers Annual Conference – Arlington Virginia – May 15, 2004
The ECA Taxonomy View
[Figure: diagram with the labels Thesaurus, Topics, and Language]
From Bedford, Denise. Presentation to American Society of Indexers Annual Conference – Arlington Virginia – May 15, 2004
Bank Metadata – Purpose & Taxonomies
[Figure: chart of Bank metadata elements grouped by purpose and by taxonomy type]
Purposes: Identification/Distinction; Search & Browse; Use Management; Compliant Document Management
Taxonomy types: Flat Taxonomy; Hierarchical Taxonomy; Network Taxonomy; Faceted Taxonomy
Metadata elements (in slide order): Agent; Country; Authorized By; Record Identifier; Title; Region; Rights Management; Disposal Status; Date; Abstract/Summary; Access Rights; Disposal Review Date; Format; Keywords; Location; Management History; Publisher; Subject-Sector-Theme-Topic; Use History; Retention Schedule/Mandate; Language; Business Function; Disclosure Status; Preservation History; Version; Disclosure Review Date; Aggregation Level; Series & Series #; Relation; Content Type
From Bedford, Denise. Presentation to American Society of Indexers Annual Conference – Arlington Virginia – May 15, 2004
Metadata Capture Methods
[Figure: chart of Bank metadata elements grouped by purpose and marked by capture method]
Capture methods: Human Capture; Inherit from Structured Content; Programmatic Capture; Inherit from System Context; Extrapolate from Business Rules
Purposes: Identification/Distinction; Search & Browse; Use Management; Compliant Document Management
Metadata elements (in slide order): Agent; Country; Authorized By; Record Identifier; Title; Region; Rights Management; Disposal Status; Date; Abstract/Summary; Access Rights; Disposal Review Date; Format; Keywords; Location; Management History; Publisher; Subject-Sector-Theme-Topic; Use History; Retention Schedule/Mandate; Language; Business Function; Preservation History; Version; Aggregation Level; Series & Series #; Relation; Content Type
From Bedford, Denise. Presentation to American Society of Indexers Annual Conference – Arlington Virginia – May 15, 2004
Search As a Service
A project to formalize and productize MSWeb offerings in the enterprise search arena
Included search, metadata support, search metrics, and optional UI
Also included search and vocabulary management tools, formal documentation and processes for customer support and change control
The first step toward an object-based portal
Metadata in Search
[Figure: search architecture diagram showing where metadata is used]
Components: User; Other Users; User Interface; Query Preprocessing; Searching; Index(es); Result Set Manipulation; Result Refining; Indexing; Indexer; Data Analysis; Data Stores
Metadata types: Independent Metadata; Index Metadata; User Metadata
Callout lists in the diagram:
- database schemas, thesauri
- file system, http, messaging stores, document store, databases, directory stores
- string manipulation, synonym sets & thesauri, stemming, word breaking
- adaptive crawling, word breaking, word stemming, NLP
- deduping, concatenation, ranking
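The slide names word breaking, stemming, and synonym sets among the techniques applied around a search. A minimal sketch of how those three could combine in a query preprocessing step follows. The synonym sets and the suffix-stripping rule are simplified assumptions for illustration, not the MSWeb implementation.

```python
import re

# Hypothetical synonym sets; a real deployment would draw these from a thesaurus.
SYNONYMS = {
    "vacation": {"holiday", "leave"},
    "benefit": {"perk"},
}


def break_words(query):
    """Word breaking: lowercase the query and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", query.lower())


def stem(word):
    """Very naive stemming: strip one common English suffix if enough remains."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(query):
    """Return the set of terms to look up in the index:
    each stemmed query word plus any synonyms registered for it."""
    terms = set()
    for word in break_words(query):
        root = stem(word)
        terms.add(root)
        terms.update(SYNONYMS.get(root, ()))
    return terms


expanded = preprocess("Vacation benefits")
```

Here `expanded` holds the stems `vacation` and `benefit` together with their synonyms, so a document tagged only with `holiday` would still match the query.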
Preliminary Architecture
[Figure 1: diagram of three layers linked by numbered arrows 1 through 12, which correspond to the numbered interactions listed below]
Vocabulary layer: Core vocabulary repository; Non-core repositories. Example vocabularies shown include Vocabulary Set 1, Vocabulary Set 2, a Term List (terms a through k, each marked AT or ET), Subject, Org., Geography, Organizations (org group, org 1.1, org 1.1.1, org 1.1.2, org 1.5), Location (north america, united states, washington, seattle), and Audience (Developers, Editors)
Registry layer: Metadata Tag Set Registry holding Tag Set 1, Tag Set 2, Core Tag Set, Tag Set 3, Tag Set 4, and Tag Set 5. Example tags shown include Title, Author, Category, Audience, Location, Group, Keyword, Org., Product, Geography, Headline, Analyst, Subject, Service, and Region
Interface layer: Embedded Search; Explicit Search; Browsing; Tagging
Callouts in the diagram:
- Registry can be used to segment results through matches to predetermined tags and vocabularies
- Results can be segmented through pulldown menus exposing tag sets
- Intermediate disambiguation of search terms
- Vocabularies can be exposed based on user-defined metadata, allowing customized views of content
- Registry can be used to expose tags (and vocabularies through them) for authoring tools
There are three layers of possible interaction between vocabularies, metadata tag sets and the search/tagging interface. Examples of these interactions are shown in Figure 1, and described briefly below. Using a common data model for vocabulary construction allows the most powerful interactions between the layers; however, any vocabulary registering its metadata tag set can be exposed to a user at the time of search or browse.
Vocabularies using this repository can:
1) Reuse terms in other vocabularies.
2) Reuse terms with structure in other vocabularies.
3) Map entire vocabularies to metadata tags.
4) Map parts of vocabularies to metadata tags.
5) Expose their terms and synsets to non-core users.
Vocabularies not using this repository can:
6) Register metadata tags in the metadata registry.
7) Use terms and synsets from core vocabularies.
All vocabularies associated with metadata tags can:
8) Be linked to tags from other tag sets through registry mapping.
All registered metadata tag sets can:
9) Expose metadata tags and associated vocabularies for publishing tools.
10) Segment results in searches through matches to predetermined tags and vocabularies at time of query.
11) Expose metadata tags before search as an advanced user interface, or during search as an intermediate query refinement assist.
12) Expose vocabularies and tags in whole or in part as browsing structures for content (as in Yahoo!).
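To make a few of these interactions concrete, here is a minimal sketch of a tag set registry that supports registering tags with their vocabularies (item 6), linking tags across tag sets (item 8), and segmenting search results by tag value (item 10). The class, its method names, and the sample data are all hypothetical; the slides do not describe the actual registry's data model.

```python
from collections import defaultdict


class TagSetRegistry:
    """Toy metadata tag set registry (all names here are hypothetical)."""

    def __init__(self):
        self.vocabularies = {}         # (tag_set, tag) -> set of allowed terms
        self.links = defaultdict(set)  # (tag_set, tag) -> equivalent tags

    def register(self, tag_set, tag, vocabulary):
        """Register a tag and the vocabulary behind it (compare item 6)."""
        self.vocabularies[(tag_set, tag)] = set(vocabulary)

    def link(self, a, b):
        """Map a tag in one tag set to a tag in another (compare item 8)."""
        self.links[a].add(b)
        self.links[b].add(a)

    def segment(self, results, tag_set, tag):
        """Group result titles by the value of a tag, also accepting values
        stored under tags linked to it (compare item 10)."""
        keys = {(tag_set, tag)} | self.links[(tag_set, tag)]
        allowed = set()
        for key in keys:
            allowed |= self.vocabularies.get(key, set())
        segments = defaultdict(list)
        for result in results:
            for ts, t in sorted(keys):
                if result["tag_set"] == ts and result["tags"].get(t) in allowed:
                    segments[result["tags"][t]].append(result["title"])
                    break
        return dict(segments)


registry = TagSetRegistry()
registry.register("core", "Geography", ["seattle", "washington"])
registry.register("news", "Region", ["seattle"])
registry.link(("core", "Geography"), ("news", "Region"))

results = [
    {"title": "Campus map", "tag_set": "core", "tags": {"Geography": "seattle"}},
    {"title": "Local headlines", "tag_set": "news", "tags": {"Region": "seattle"}},
    {"title": "State offices", "tag_set": "core", "tags": {"Geography": "washington"}},
]
segments = registry.segment(results, "core", "Geography")
```

Because the two tags are linked, the result tagged with the `news` tag set's `Region` lands in the same `seattle` segment as the result tagged with the core `Geography` tag.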
Products Included
Search
- Query and results
- Best Bets
- Catalogs
Metadata Management Tools
- Metadata Registry
- Unified Catalog Service
Vocabularies
- Core vocabularies
- Other vocabularies
- Categories
- I Need Tos
Showcase
- Support
- Documentation
- Demos, screenshots
- Code
Quick Return
In order to show customers and users what could be done, the MSWeb team focused on creating a service that used editorial selection, tagging and the core vocabularies to create a small set of highly relevant content for MSWeb and sub-portals
This was leveraged in:
- Navigation through categories
- Search results (Best Bets)
- Customizable versions of both for sub-portals
What Did This Involve?
Fewer than 100 categories
Fewer than 1000 tagged surrogate records for high demand search content
Those 1000 records are used in categories, Best Bets, and other areas on site
Takes only a few hours a week to maintain and update database
Results enabled users to directly navigate and search for high-use content
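A minimal sketch of how one small set of tagged surrogate records could drive both Best Bets and category navigation is shown below. The record fields, sample records, URLs, and matching rule are assumptions for illustration only; the slides do not describe the actual database design.

```python
from dataclasses import dataclass, field


@dataclass
class SurrogateRecord:
    """A hand-built record standing in for one high-demand resource."""
    title: str
    url: str
    category: str
    terms: set = field(default_factory=set)  # controlled-vocabulary tags


# Hypothetical records; titles, URLs, and terms are invented for illustration.
RECORDS = [
    SurrogateRecord("Benefits home", "http://example.com/benefits",
                    "Human Resources", {"benefits", "401k"}),
    SurrogateRecord("Expense reports", "http://example.com/expenses",
                    "Finance", {"expenses", "travel"}),
    SurrogateRecord("Travel booking", "http://example.com/travel",
                    "Services", {"travel"}),
]


def best_bets(query, records=RECORDS):
    """Return records whose tags share at least one word with the query."""
    words = set(query.lower().split())
    return [r for r in records if r.terms & words]


def by_category(records=RECORDS):
    """Build the category navigation from the same records."""
    categories = {}
    for r in records:
        categories.setdefault(r.category, []).append(r.title)
    return categories


travel_bets = [r.title for r in best_bets("Travel policy")]
navigation = by_category()
```

The point of the design is reuse: the same editorially tagged records answer a search (`travel_bets`) and populate the browse structure (`navigation`), which is consistent with the low maintenance cost the slide reports.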
Common Elements
How It Works
From Peter Morville: http://semanticstudios.com/events/iacm1102.ppt
Moving to the Enterprise
Once MSWeb had shown what was possible, the next step was to deploy to the sub-portals. To make this happen, they:
Turned the user’s query (whether from an end-user or a sub-portal search page) into an XML request, which returns an XML response of search results.
Leveraged taxonomy work to provide a deep resource for search and browse
Enabled rapid customization by embedding parameters into XML schema
Provided consultation and assistance with category building and tagging skills
Result: everyone was on the same platform and using core vocabularies
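The XML request and response exchange described above might look roughly like the following sketch, written with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`searchRequest`, `query`, `maxResults`, `searchResponse`, `result`) and the in-memory index are assumptions; the slides do not show the actual MSWeb XML schema.

```python
import xml.etree.ElementTree as ET


def build_request(query, schema="all-intranet", max_results=10):
    """Wrap a user's query in an XML request (element names are hypothetical)."""
    root = ET.Element("searchRequest", {"schema": schema})
    ET.SubElement(root, "query").text = query
    ET.SubElement(root, "maxResults").text = str(max_results)
    return ET.tostring(root, encoding="unicode")


def handle_request(request_xml, index):
    """Server side: parse the request, search a toy in-memory index,
    and return the hits as an XML response."""
    request = ET.fromstring(request_xml)
    query = request.findtext("query", default="").lower()
    limit = int(request.findtext("maxResults", default="10"))
    response = ET.Element("searchResponse", {"schema": request.get("schema", "")})
    hits = [doc for doc in index if query in doc["title"].lower()]
    for doc in hits[:limit]:
        result = ET.SubElement(response, "result", {"url": doc["url"]})
        result.text = doc["title"]
    return ET.tostring(response, encoding="unicode")


def parse_response(response_xml):
    """Client side: read (title, url) pairs out of the XML response."""
    root = ET.fromstring(response_xml)
    return [(r.text, r.get("url")) for r in root.findall("result")]


# Hypothetical index contents, for illustration only.
INDEX = [
    {"title": "Benefits overview", "url": "http://example.com/benefits"},
    {"title": "Travel policy", "url": "http://example.com/travel"},
]
hits = parse_response(handle_request(build_request("travel"), INDEX))
```

The client only builds a request and parses a response, so changes to the search logic inside `handle_request` never touch sub-portal code.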
How It Works
[Figure: data flow diagram. Labels: Input query; XML; Search DLL; Modified string; Site Server indexes; Search results; XML; Vocabulary and Schema Database]
Query String Parsing
Query: XML in SQL2000
MSWeb Search
MSW All-Intranet Schema
Query type & schema (from query parser)
Results properties definition
Collection definition
Sort parameters
Category Schema
Results properties definition
Collection definition (Category set)
Hierarchy display parameters
Sort parameters
What This Does
Puts all processing on the server side: the client (the sub-portal server) just needs a few lines of code to pass and receive XML streams
Client (sub-portal server) site is insulated from code changes
Simple parameter changes allow customization of collections, query type, indexed properties, etc.
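One way to picture customization through simple parameter changes is a shared default schema that each sub-portal overrides in a few fields. The sketch below uses a Python dataclass whose fields loosely follow the schema sections named on the earlier slides (query type, results properties, collection, sort parameters). The field names and values are assumptions, not the real MSWeb schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SearchSchema:
    """Parameters a client passes along with its query (hypothetical fields)."""
    query_type: str
    result_properties: tuple
    collection: str
    sort_by: str


# Shared default, loosely modeled on the all-intranet schema.
ALL_INTRANET = SearchSchema(
    query_type="freetext",
    result_properties=("title", "url", "description"),
    collection="all-intranet",
    sort_by="rank",
)

# A sub-portal customizes its search by overriding only two parameters;
# everything else is inherited from the shared default.
SUBPORTAL = replace(ALL_INTRANET, collection="subportal-docs", sort_by="date")
```

Since the schema object is immutable and the override creates a copy, a sub-portal's changes cannot alter the shared default used by everyone else.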
The Old Way
The New Way
NTServer
ITGWeb
WordTest
PGPortal
Measures for Success
For MSWeb, the goal was to increase users’ ability to navigate easily and find information more quickly
For sub-portals, the goal was to have the ability to leverage MSWeb’s resources locally in a sustainable way
Here are some results
Results for End-Users
Key measure | Q4 99 | Q1 00 | Q2 00
Total number of registered sites | 834 | 858 | 808
Average # Best Bets returned with 20 top search strings | 3.6 | 2.75 | 4.35
Modal # BB with top 20 | 1 | 5 | 1
Median # BB with top 20 | 2.5 | 3 | 3
Percentage of all top search strings that return Best Bets | 69% | 85% | 98%
Percentage of 50 top search strings that return BBs | 82% | 84% | 98%
Percentage of 20 top search strings that return BBs | 90% | 80% | 100%
Number of all top search strings returning 10 or more Best Bets | 18 | 12 | 5
Number of top 50 search strings returning 10 or more BB | 6 | 10 | 5
Number of top 20 search strings returning 10 or more BB | 3 | 6 | 4
User Satisfaction
Usability testing provided the following before and after numbers:
- A 62% reduction in the number of clicks
- An average of 16 seconds saved per task
- An 11% increase in task success rate
High employee satisfaction with the site
- 42% VSAT in year 2000 field survey
- Only 4% DSAT on same survey
Portals Using MSWeb Services
In the first three months of the offering, 9 sub-portals implemented search on their sites
2 of those created site-specific categories for their navigation
All leveraged the MSWeb Best Bets results in their custom search
No increase in staff at MSWeb
Equivalent to a cost savings of 45 person years in avoided work
What Worked?
Providing a clear example of taxonomy value in MSWeb search and navigation
Building a taxonomy management tool that used an extensible data model
Empowering portal owners to do their own management of site navigation, and separating from the core shared taxonomy
Divorcing presentation from delivery through use of XML and XSL
Leveraging the taxonomy through tools to support all of the above