Usability Issues in Metasearch Interface Design: Perspectives of an Information Provider
LITA Human Machine Interface Interest Group
June 25, 2004
Oliver Pesch, Chief Strategist
EBSCO Information Services
[email protected]
Overview
Customer service or denial of service
Protecting copyright
Being included in results/removal of duplicates
The usage statistics challenge
Investing in products
The hope for standards
Metasearch casts a wide net
Search widely, bring back results, look for the most relevant ones and present them to the user
For the information provider
  Searches are less targeted
  System load increases by multiples
  Serendipitous discoveries of relevant content
For the end user
  Relevant sources searched automatically
  Serendipitous discoveries of relevant content
  Can work like Google
Denial of service attacks?
A denial of service attack occurs when a web site or series of web sites, intentionally or unintentionally, sends an extremely high volume of requests to a web-based service, such that the service spends all its resources responding to these requests and cannot accommodate requests from normal users.
Searching without Metasearch
[Diagram: user -> Resources (Product 1, Product 2, Product 3…) -> EBSCO, OCLC]
With Metasearch engine
[Diagram: user -> Metasearch (Search All, Business, Medicine…) -> EBSCO, OCLC]
Being included in results
Metasearch engine (as configured by library) decides what gets searched and how results are displayed
Are the most relevant items always presented first?
  One approach to determine position is speed of response. A good measure of relevance?
  Another is to attempt to score the results
Consistent ranking of relevance is not easy
The risk of not being included is real
Relevance calculations…
Typical calculation considers
  Number of words from search appearing in document (higher number, higher relevance)
  The number of times these words appear (higher number, higher relevance)
  The size of the document (longer document, less relevant)
  The number of times the words appear in the database (higher number, less relevance)
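The four factors above can be combined in many ways. The following is only a minimal sketch, assuming a TF-IDF-style formula with length normalization; it is not any vendor's actual calculation. The function names, the whitespace tokenizer, and the use of document counts to stand in for "times the words appear in the database" are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def relevance(query, doc, corpus):
    """Toy relevance score built from the four factors on the slide."""
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    score = 0.0
    # Each distinct query word found in the document adds a positive term,
    # so matching more query words raises the score.
    for term in set(tokenize(query)):
        tf = counts[term]  # times the word appears in this document
        if tf == 0:
            continue
        # Documents in the database containing the word (assumed proxy for
        # "number of times the words appear in the database").
        df = sum(1 for d in corpus if term in tokenize(d))
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0  # common word, lower weight
        # Dividing by document length makes longer documents score lower.
        score += (tf / len(doc_tokens)) * idf
    return score


corpus = [
    "metasearch engines search many databases",
    "library databases hold many journal articles about many databases and many other topics",
    "cooking recipes",
]
ranked = sorted(corpus, key=lambda d: relevance("metasearch databases", d, corpus), reverse=True)
```

In this sketch the short document that matches both query words ranks above the longer one that matches only a common word, and the unrelated document scores zero.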
Relevance calculations…
Some vendors may also consider…
  Proximity of search words in document (words in closer proximity, higher relevance)
  The fields in which the search words appear (words found in important fields, higher relevance)
  The number of forward references pointing to the document (more references to document, higher relevance)
  Usage (the more times others read document, higher relevance)
  Other attributes of the item, such as peer-review
Relevance calculation…
Not one standard for calculating scores
Scores may be represented as
  Percentages, or
  Raw scores
Not all information providers return the scores
When they do, the scores from different information providers cannot be considered normalized
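To illustrate why scores from different providers are hard to compare, here is a small sketch that rescales each provider's scores to a common 0-1 range. Min-max rescaling is an assumption chosen for simplicity, and the score values are made up for the example. Note that rescaling only makes the numbers look alike; it does not make them truly comparable.

```python
def min_max_normalize(scores):
    """Rescale one provider's scores to [0, 1] within that result set."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All results scored the same; treat them as equally relevant.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


provider_a = [98.0, 75.0, 52.0]  # hypothetical percentages
provider_b = [13.2, 4.1, 3.9]    # hypothetical raw scores

norm_a = min_max_normalize(provider_a)
norm_b = min_max_normalize(provider_b)
```

After rescaling, each provider's top hit scores 1.0 and its weakest hit scores 0.0, regardless of how relevant those items really are. The weakest hit from one provider may still be a better match than the top hit from another, which is one reason a metasearch engine cannot simply merge returned scores.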
Metasearch engine must rank results
Metasearch cannot rely on scores returned
Data not available for optimized calculation
Option is to retrieve metadata and possibly documents from information provider to calculate
Impact on information provider:
  More system overhead from increased retrievals
  Usage statistics can be skewed
Deduping results
When the same article is found in more than one source
  Which one is picked?
  Are all sources attributed?
  Who/what influences the decision?
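The questions above can be made concrete with a small sketch. It assumes records are matched on a normalized title plus year, that the first copy seen is the one kept, and that every source is recorded on the kept copy. The record fields and the matching rule are hypothetical; real metasearch engines may match on other metadata and apply other policies.

```python
import re


def dedup_key(record):
    """Match key (assumption): title with case and punctuation removed, plus year."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record.get("year"))


def dedupe(records):
    """Keep the first copy of each article and attribute all sources to it."""
    merged = {}
    for rec in records:
        key = dedup_key(rec)
        if key not in merged:
            # "Which one is picked?" Here: whichever result arrives first.
            merged[key] = {**rec, "sources": [rec["source"]]}
        elif rec["source"] not in merged[key]["sources"]:
            # "Are all sources attributed?" Here: yes, each one is listed.
            merged[key]["sources"].append(rec["source"])
    return list(merged.values())


results = dedupe([
    {"title": "Metasearch Usability", "year": 2004, "source": "EBSCO"},
    {"title": "Metasearch usability.", "year": 2004, "source": "OCLC"},
    {"title": "Relevance Ranking", "year": 2003, "source": "OCLC"},
])
```

Picking the first copy seen means the fastest-responding provider wins the display position, which is exactly the kind of policy choice the slide asks about.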
Abiding by content agreements
Information provider is responsible for upholding agreements with copyright holders
Requirements for display may include
  Copyright statements
  Specific language
  Enforcing specific restrictions
  Integrity of presentation
Copyright holders may question display of their content through another interface
Usage statistics challenge
What can be affected by metasearch?
  Session counts
  Search counts
  Article retrieval counts
Why?
  “Search all” option or automatic selection/search of many resources
  Perform simultaneous activities
  Pre-fetching data for purpose of ranking
  Optimization techniques
Searching without Metasearch
[Diagram: user -> Resources (Product 1, Product 2, Product 3…) -> EBSCO, OCLC, ProQuest, OVID. Two statistics boxes, each reading: Visits = 1, Sessions = 2, Searches = 2]
With Metasearch engine
[Diagram: user -> Metasearch (Search All, Business, Medicine…) -> EBSCO, OCLC, ProQuest, OVID. Four statistics boxes, each reading: Visits = 1, Sessions = 20, Searches = 20]
With Metasearch engine
[Diagram: user -> Metasearch (Search All, Business, Medicine…) -> EBSCO, OCLC, ProQuest, OVID. Two statistics boxes, each reading: Visits = 1, Sessions = 28, Searches = 28]
Usage statistics challenge
Metasearch engines can cause inflation of session, search and possibly article retrieval statistics
Not all activity for a given customer will be through a metasearch engine
Unless metasearch activity can be isolated, overall statistics lose meaning
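As a rough sketch of what isolating metasearch activity could look like on the provider side, the example below keeps separate counts per traffic origin. It assumes each logged event carries an "origin" label, which in practice would require the metasearch engine to identify itself in some agreed way. The event fields and labels are hypothetical, and the counts reuse the 2 and 20 figures from the diagrams purely as sample data.

```python
from collections import Counter


def summarize(events):
    """Count sessions and searches separately for each traffic origin."""
    totals = {}
    for event in events:
        bucket = totals.setdefault(event["origin"], Counter())
        bucket[event["type"]] += 1
    return totals


# Hypothetical log for one customer at one provider.
events = (
    [{"origin": "native", "type": "session"}] * 2
    + [{"origin": "native", "type": "search"}] * 2
    + [{"origin": "metasearch", "type": "session"}] * 20
    + [{"origin": "metasearch", "type": "search"}] * 20
)

stats = summarize(events)
combined_searches = sum(counts["search"] for counts in stats.values())
```

Reported as one combined figure, the 22 searches hide the fact that only 2 came from users working in the provider's own interface. Reported per origin, both numbers stay meaningful.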
Investing for differentiation
Information providers invest to differentiate their product by
  Enhancing the data
  Enhancing the interface to improve precision
Metasearch access often results in a lowest common denominator approach
Enhancements to data and interface have little effect
Do these investments help metasearch users?
Impact on information provider
Institutions pay money for information
One measure of effectiveness is usage
Results must be shown to be accessed
Investments to enhance data or access tools may not be cost effective
Agreements with copyright holders may be challenged
Loss of control of the user experience is a real concern
Adapting to change
Products are developed based on market needs
If the market needs metasearch, we will develop our products to meet that need
Standards initiatives become vitally important to allow information providers to regain some control
Importance of standards
Optimization of computing resources
  Isolate metasearch traffic
  Minimize work to present data
Tailor responses to metasearch needs
  Provide metadata for ranking and deduping
  Provide copyright information
  Include URLs to facilitate seamless access to content
  Provide appropriate “viewer” for content
More accurately represent usage statistics
Thank you