Finding Stuff: -LSI and Database Searching- A Business Use Case

Joe Tragert, EBSCO Publishing, Bentley, June 26, 2006


Page 1: Finding Stuff:   -LSI and Database Searching-  A Business Use Case

Finding Stuff: -LSI and Database Searching-

A Business Use Case

Joe Tragert
EBSCO Publishing
Bentley
June 26, 2006

Page 2

Overview

• EBSCO Publishing overview
• Latent Semantic Indexing pros and cons
• Integrating diverse content types – the Executive Daily Brief use case
• Discovering obfuscated records – the US PTO example

Page 3

EBSCO Industries

• Ranked #162 in Forbes “America’s Largest Private Companies” in 2005

Page 4

EBSCO Publishing

• Research & reference solutions: Corporate, Medical, Academic, Public Library, K-12
• 73 terabytes of content, configured into over 100 different proprietary full-text databases
• Redistributes 100+ third-party reference products
• Founded in 1987; 550 employees worldwide; HQ in Ipswich, MA

Page 5

Latent Semantic Indexing

• Searching is focused on the words, not indices or metadata.
• The engine can be “trained” to optimize results by domain (engineering, medical, general business, etc.).
• The engine creates a vector space based upon the data it sees.
• All articles are placed within that vector space.
• Updates are quickly assigned values within the vector space, enabling real-time integration of RSS feeds.
• Multiple data sources are integrated rapidly, requiring a few hours to a few days.
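
As a rough illustration of the vector-space idea on this slide, the following Python sketch builds a tiny term-document matrix, approximates its top singular directions by power iteration (a simple stand-in for the SVD step of LSI, not the engine used in the product), and then places documents, an update, and a query in the same space. The sample texts, the function names, and the choice of k=2 are assumptions made for illustration only.

```python
import math
import random
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def build_term_doc_matrix(docs):
    """Return (term -> row index, matrix of raw counts with one column per doc)."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for t in tokenize(d):
            matrix[index[t]][j] += 1.0
    return index, matrix


def top_left_singular_vectors(matrix, k, iterations=200, seed=0):
    """Approximate the top-k left singular vectors by power iteration on A A^T.

    Assumes the matrix has rank of at least k.
    """
    rng = random.Random(seed)
    m, n = len(matrix), len(matrix[0])
    basis = []
    for _ in range(k):
        v = [rng.random() for _ in range(m)]
        for _ in range(iterations):
            # w = A (A^T v), computed without forming A A^T explicitly.
            atv = [sum(matrix[i][j] * v[i] for i in range(m)) for j in range(n)]
            w = [sum(matrix[i][j] * atv[j] for j in range(n)) for i in range(m)]
            # Gram-Schmidt step: remove the directions already found.
            for u in basis:
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        basis.append(v)
    return basis


def project(text, index, basis):
    """Place any text (document, update, or query) in the existing space."""
    counts = [0.0] * len(index)
    for t in tokenize(text):
        if t in index:  # words unseen when the space was built are ignored
            counts[index[t]] += 1.0
    return [sum(c * x for c, x in zip(counts, u)) for u in basis]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


# Hypothetical mini-corpus: three vehicle texts and two knee texts.
docs = [
    "motorcycle engine frame wheel",
    "saddle riding vehicle engine frame wheel",
    "saddle riding vehicle handlebar",
    "knee implant surgery",
    "prosthetic knee joint implant",
]
index, matrix = build_term_doc_matrix(docs)
basis = top_left_singular_vectors(matrix, k=2)
doc_vectors = [project(d, index, basis) for d in docs]

# A query is projected exactly like a document.
query = project("motorcycle", index, basis)
scores = [cosine(query, v) for v in doc_vectors]

# A new item is assigned coordinates without rebuilding the space.
update = project("new saddle design for a two wheel vehicle", index, basis)
```

In this toy corpus the query "motorcycle" scores close to 1.0 against the third document, which never uses that word, and close to 0 against the knee documents. The update is placed using the existing basis only, which is the kind of cheap assignment the slide describes for incoming RSS items.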

Page 6

LSI Advantages

• Conceptual Search: concepts are matched, not key words
  - Easier to create searches by using chunks of text as search “terms”
  - No need to understand thesauri or Boolean operators
• Integrated Content: databases, blogs, RSS, etc.
  - Multiple databases can be searched at once (similar to federated search, but different…)
  - Since the words are searched, no need to normalize indices or record structures of source data sets
• Real-time content
  - The engine can rapidly assign new content to the existing vector space, enabling integration of current content with archival material
• Language agnostic
  - Since all content is converted to values in the vector space, multiple languages can be searched and returned in a single result list

Page 7

LSI Disadvantages

• Precision: matching concepts does not lead to the “one perfect article”
• Multiple content types in one result set require robust filtering and refining functionality to minimize confusion
• Default date-order sorting can “overwhelm” a result list
• Multiple languages are seductive, but a quality translation feature is required to get the best utility from the results
• It can be difficult for the “Google generation” to grasp the concept of “concepts”

Page 8

Why Use LSI?

• Structured data: users tend not to care about metadata
• Currency is king: users tend to focus on “real time” content (news sites, blogs), but periodicals can provide real value
• Skills: not everyone is a librarian… actually, most aren’t
• Tools: slow to learn, slower to change
• Perspective: impatient with complexity

Page 9

LSI Use Case: Customizable monitoring and alert service

• Supports non-librarian corporate uses: brand management, corporate intelligence, general counsel, IP management, etc.
• Two types of search
  - Content Analyst LLC’s patented Concept Search™
  - EBSCO’s keyword search
• Multiple content types
  - Premium business content (EBSCO structured content)
  - Newspapers
  - RSS feeds (blogs, news sites)
  - Licensed databases (USPTO, INSPEC, etc.)
  - Intranet repositories

Page 10

Multiple Content Types and Search Methods

1. Users can set up folders, and monitor for content related conceptually (same meaning, but different words) to key words or article “examples” already in the folders

2. Users can search for immediate results that are related to words, articles, emails or external documents, using Concept Search or Key Word Search

3. Users can link to “advanced” key word search options, thesauri, and visual searching

Page 11

Folders Are Determined by End Users

• Users can add, delete or edit “alerts” (folders) as needed

• Users put words, phrases, paragraphs, full articles, emails, MS Word docs, etc. into the folders

• EDB (the Executive Daily Brief) adds matches to the folders

• Results for a folder appear when the folder is selected

• Users can easily make a result into a “concept” (example) and put it into a folder

Page 12

Structured Content in Familiar Layout

• The full text is viewed in a pop-up window

• The user can link to the source (the article on EBSCOhost, news site, the RSS feed provider, licensed database or intranet file)

• Users can email, save, or print the document, or add it to their folder as a new example to be monitored

Page 13

Linking to RSS Providers Simplifies Access

• Selected RSS articles are viewed in a pop-up window

• The user links to the source

Page 14

Results Are Refined, Interactively

• Users can sort results by Date, Title, Publication and Relevance

• Users can narrow results by Publication or Content Type

• Users can delete previously read content, content of a specific relevance, or content published before a specific date
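
The sorting, narrowing and deleting options on this slide can be sketched in Python as plain list operations. The `Result` fields, the function names, and the sample records below are hypothetical, and "content of a specific relevance" is read here as "content below a chosen relevance score", which is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    title: str
    publication: str
    content_type: str  # e.g. "periodical", "rss", "patent"
    published: date
    relevance: float   # assumed scale: 0.0 to 1.0, higher is a closer match
    read: bool = False


def sort_results(results, by="published"):
    """Sort by 'published', 'title', 'publication' or 'relevance'."""
    # Dates and relevance are shown newest/best first; text fields run A to Z.
    descending = by in ("published", "relevance")
    return sorted(results, key=lambda r: getattr(r, by), reverse=descending)


def narrow(results, publication=None, content_type=None):
    """Keep only results from one publication and/or of one content type."""
    return [
        r for r in results
        if (publication is None or r.publication == publication)
        and (content_type is None or r.content_type == content_type)
    ]


def prune(results, drop_read=False, min_relevance=0.0, published_since=None):
    """Drop already-read, low-relevance, or old results."""
    kept = []
    for r in results:
        if drop_read and r.read:
            continue
        if r.relevance < min_relevance:
            continue
        if published_since is not None and r.published < published_since:
            continue
        kept.append(r)
    return kept


# Hypothetical sample data.
results = [
    Result("Sample article A", "Journal A", "periodical", date(2006, 5, 2), 0.91),
    Result("Sample post B", "Blog B", "rss", date(2006, 6, 20), 0.42, read=True),
    Result("Sample patent C", "Patent source C", "patent", date(2001, 3, 9), 0.88),
]

by_relevance = sort_results(results, by="relevance")
unread_recent = prune(results, drop_read=True, published_since=date(2006, 1, 1))
patents_only = narrow(results, content_type="patent")
```

Keeping each operation as a separate function lets a user interface chain them in any order, for example narrowing to one content type before sorting by relevance.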

Page 15

Alerts Controlled by End User

• Users can set up email lists (groups and individuals) to automatically forward documents

• Users can set a higher relevancy threshold for shared documents than for their own inbox (only send the “best” articles to colleagues)
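
The two-threshold idea on this slide can be sketched in a few lines of Python. The function name, the score scale, and the sample values are hypothetical; the point is only that the user's own inbox and the forwarding list apply different cut-offs to the same scored results.

```python
def route_alerts(scored_docs, inbox_threshold, share_threshold, recipients):
    """Split scored documents between the user's inbox and a forwarding list.

    scored_docs is a list of (title, relevance) pairs; relevance is assumed
    to be a number where higher means a closer match.
    """
    inbox = [title for title, score in scored_docs if score >= inbox_threshold]
    shared = [title for title, score in scored_docs if score >= share_threshold]
    # Every recipient on the list receives the same "best" documents.
    outbox = {recipient: list(shared) for recipient in recipients}
    return inbox, outbox


# Hypothetical scores and recipients.
scored = [("Doc A", 0.95), ("Doc B", 0.70), ("Doc C", 0.30)]
inbox, outbox = route_alerts(
    scored,
    inbox_threshold=0.5,
    share_threshold=0.9,
    recipients=["colleague-1", "team-list"],
)
```

With these sample values the user sees Doc A and Doc B, while colleagues are sent only Doc A.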

Page 16

LSI Use Case:

• Find deliberately obscured patents
• Compare prior art to current research
• Monitor pending patents
• Search patents in native languages
  - USPTO
  - European Patent Organization
  - Japan Patent Office
• Expose patent search to more staff
  - Bench scientists
  - Competitive intelligence
  - Risk managers

Page 17

Sneak Peek: EBSCO Patent Monitor

• In development – Fall 2006 release
• Use Concept Searching to identify “conceptually related patents”
• Enable cross-database searching
  - Patents (various sources)
  - Published STM literature
  - Proprietary research & intranets

Page 18

Searching on “motorcycle” finds patents that do not include the term “motorcycle”

Page 19

Patent #6,085,857 does not contain the word “motorcycle”, but it sure looks like one…

aka: “motorcycle”

Page 20

Running a concept search on the patent abstract creates an “instant context list”

These terms are found in the USPTO database and relate to “saddle-type riding vehicles.” Users can search the USPTO database to find those patents, or they can research the individuals to see who else is an expert…

Page 21

The terms and names on the Instant Context list can indicate the true nature of the patent…

Shinobu Tsutsumikoshi is a developer at Suzuki...

Page 22

Search using a press release on the new Maxim Knee System and get hundreds of related patents…

Page 23

US Patent #6,090,144 is about prosthetic knees even though the Maxim press release never used the term “prosthesis”

Page 24

Finding Stuff: The Dead Mouse Test

• LSI, key words, proximity, etc…
• The real question is not which mouse trap works better…
• …just did we kill the mouse?

Page 25

Thank You

Joe Tragert
Director, Market Development
EBSCO Publishing
O: +800-653-2726 ext. 661
E: [email protected]