© 2012 Deep Web Technologies, Inc.

SwetsWise Searcher
Powered by Explorit Research Accelerator

By Abe Lederman
President and CTO

Copenhagen, Denmark
11 June 2012

About Deep Web Technologies...

• Founded by Abe Lederman in 2002
– A co-founder of Verity, acquired by Autonomy
– BS & MS Degrees in Computer Science from MIT
– 25 years experience in Information Retrieval

• 20 person company based in Santa Fe, New Mexico

• Over $5M in DOE SBIR Grants (2002-2011)

• Pioneer/trailblazer in federated search

Customers Include...

Academic:
• Stanford University
• George Mason University
• Texas Medical Center
• University College of Cork
• Tennessee Community College Consortia

Public Portals:
• WorldWideScience.org
• Science.gov
• Biznar
• Mednar
• ScienceResearch.com

Government:
• Defense Technical Info Center (DTIC)
• Office of Sci. & Tech. Info (DOE-OSTI)
• UNECA
• European Space Agency

Corporate:
• Boeing
• BASF
• Intel
• HP
• P&G

What is the Deep Web?

The Deep Web is a collection of internet information sources that are generally not accessible to web spiders or crawlers and therefore cannot be indexed by popular search engines such as Google, Yahoo! or Bing, which cover only the Surface Web.

The Deep Web is estimated to hold more than 500 times as much content as the Surface Web.

What is “Federated Search”?

“Federated Search is an application or service that allows users to submit a real-time search in parallel to multiple, distributed information sources and retrieve aggregated, ranked and de-duplicated results.”
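The definition above can be sketched in a few lines of Python. This is a minimal illustration, not Explorit's implementation: the in-memory `SOURCES` table, the `score` field, and the title-based de-duplication key are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory sources; real connectors would call remote search interfaces.
SOURCES = {
    "journals": [
        {"title": "Federated Search Basics", "score": 0.9},
        {"title": "Crawling the Surface Web", "score": 0.7},
    ],
    "ebooks": [
        {"title": "federated search basics", "score": 0.6},
        {"title": "Search Interfaces", "score": 0.8},
    ],
}


def search_source(name, query):
    """Return the records from one source whose title contains the query."""
    needle = query.lower()
    return [dict(rec, source=name) for rec in SOURCES[name] if needle in rec["title"].lower()]


def federated_search(query):
    # Submit the query to every source in parallel.
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        batches = list(pool.map(lambda name: search_source(name, query), SOURCES))
    # Aggregate and de-duplicate on a normalized title, keeping the best-scored copy.
    best = {}
    for batch in batches:
        for rec in batch:
            key = rec["title"].strip().lower()
            if key not in best or rec["score"] > best[key]["score"]:
                best[key] = rec
    # Rank by descending score.
    return sorted(best.values(), key=lambda rec: rec["score"], reverse=True)


results = federated_search("search")
print([(rec["title"], rec["source"]) for rec in results])
```

The duplicate title returned by both sources appears once in the output, and the copy with the higher score wins.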

One Search, Many Sources

[Diagram: one search box (“Enter Your Search…”, “Begin Search”) queries Public Web Sources and Subscription Sources at once: Blogs, eBooks, OPACs, Internal Databases, Journals, Wikis]

Why Federated Search? 4 Big Reasons…

1. Provides greater efficiency than searching sources one by one

2. Returns the most current information because sources are searched in real-time

3. Eliminates learning disparate publisher interfaces

4. Simplifies discovery of the most relevant results

Best Science-Focused Engines

5 of 9 created by DWT

• Science.gov
• WorldWideScience.org
• ScienceResearch.com
• Science Accelerator
• Scitopia.org

Science.gov (2002)

WorldWideScience.org (2007)

Science Accelerator (2006)

ScienceResearch.com (2005)

Scitopia.org (2007-2011)

Presentation available at: www.deepwebtech.com/ala2011.ppt

Federated Search Has Gotten a Bad Reputation

• It is too slow
• Connectors break
• Brings back too few results from each source
• Brings back too many results
• Unable to rank results well (metadata differences, lack of info)

SW Searcher vs. Discovery Services

| SwetsWise Searcher | Discovery Service |
| --- | --- |
| Real-time search of multiple collections | Multiple collections are indexed to one database |
| Initial results returned in 3-4 seconds; remaining results returned incrementally over up to 30 seconds | Results returned within 1-3 seconds |
| New results are available as soon as they appear on the publisher’s site | New results are available only after re-indexing |
| Searches full text where possible | Mostly indexes just metadata |
| Search any collection regardless of publisher | Search only collections the service subscribes to |

Drawbacks of Discovery Services

• Lack of transparency about what’s in the Service

• Incomplete coverage of publisher content

• Lag between when content appears on the publisher’s site and when it becomes available in the Discovery Service

• Normalizing metadata discards source-specific metadata

• Content in the Service is limited by relationships and by a focus on content of general interest

Landscape is Not So Clear

• Summon (ProQuest) – Discovery Service

• EDS (EBSCO) – Discovery Service + Federated Search

• WorldCat Local (OCLC) – Discovery Service + Federated Search

• Primo (Ex Libris) – Discovery Service + Federated Search

• Encore Synergy (Innovative Interfaces) – Limited Discovery Service + Federated Search

• Explorit (Deep Web Technologies) – Federated Search

When Should You Choose Federated Search?

• Access to up-to-date information is important.
• You want control of your sources.
• You want to search internal/non-mainstream sources.
• Your research is specialized (e.g., medical and legal).
• You have a wide range of subscribed content (e.g., EBSCO and ProQuest).

Partners since January 2010

Major Advantages of SwetsWise Searcher

• Rich, easy-to-use interface
• Incremental display of results
• Sophisticated connector technology
• Retrieve 50-100 results or more per source
• Relevance ranking
• Smart clustering
• Alerts and Search Builder
• Metrics

Easy-to-use Interface

Simple Search Box
– One-Search, “Google-like” box
– Can be embedded in your home page, blog or intranet.

Advanced Search Page
– Unlimited categories (sources can be in multiple categories)
– Select sources to search
– One or Two columns
– Fielded Searching
– Boolean Searching: AND, OR, NOT

Incremental Results
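Incremental display (initial results in 3-4 seconds, the rest arriving over up to 30 seconds) can be approximated by re-ranking the merged list every time another source responds. The sketch below is an assumption-laden illustration: the source names, latencies, and scores are invented, and `time.sleep` stands in for network delay.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sources: name -> (simulated latency in seconds, [(title, score), ...]).
SOURCES = {
    "fast": (0.01, [("A", 0.5)]),
    "medium": (0.15, [("B", 0.9)]),
    "slow": (0.30, [("C", 0.7)]),
}


def fetch(name):
    """Pretend to query one source; the sleep stands in for network latency."""
    delay, results = SOURCES[name]
    time.sleep(delay)
    return results


def incremental_search(timeout=30.0):
    """Yield the merged, re-ranked result list each time another source responds."""
    merged = []
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(fetch, name) for name in SOURCES]
        # as_completed hands back futures in the order they finish, not submit order.
        for future in as_completed(futures, timeout=timeout):
            merged.extend(future.result())
            merged.sort(key=lambda item: item[1], reverse=True)
            yield list(merged)


for snapshot in incremental_search():
    print(snapshot)
```

Each snapshot is what a results page could show at that moment; a late but highly scored result can still move to the top when its source finally answers.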

Connectors: Think “Connections”

Connectors make it possible to talk to other data sources

–Each source is unique so connectors “normalize” a query

–Submit proper authentication to sources

–Extract the right results

–Parse results to display the data
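The four connector duties listed above can be sketched as one small class. Everything source-specific here is hypothetical: the `ti:(...)` query syntax, the bearer-token header, and the JSON response shape belong to an imaginary source, not to any real publisher interface.

```python
import json
from dataclasses import dataclass


@dataclass
class Query:
    field: str     # generic field name, e.g. "title"
    terms: list    # search terms
    operator: str  # Boolean operator joining the terms: "AND" or "OR"


class Connector:
    """Hypothetical connector for an imaginary source."""

    FIELD_MAP = {"title": "ti", "author": "au"}  # generic field -> source field

    def __init__(self, api_key):
        self.api_key = api_key

    def normalize(self, query):
        """Rewrite the generic query in this source's own syntax."""
        prefix = self.FIELD_MAP[query.field]
        joined = f" {query.operator} ".join(query.terms)
        return f"{prefix}:({joined})"

    def build_request(self, query):
        """Attach the credentials this source expects."""
        return {
            "params": {"q": self.normalize(query)},
            "headers": {"Authorization": f"Bearer {self.api_key}"},
        }

    def parse(self, raw):
        """Extract the hits and map them to a common record shape for display."""
        payload = json.loads(raw)
        return [{"title": hit["ti"], "url": hit["link"]} for hit in payload["hits"]]


connector = Connector(api_key="secret")
query = Query(field="title", terms=["deep", "web"], operator="AND")
print(connector.build_request(query))
print(connector.parse('{"hits": [{"ti": "Deep Web Primer", "link": "http://example.org/1"}]}'))
```

Keeping one such class per source isolates breakage: when a source changes its interface, only its connector needs repair.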

Connector Monitoring

• Proactively monitor connectors

• Monitor: source health, speed, responsiveness and errors

• Evaluated by dedicated software maintenance engineers

• Generally errors are discovered by our team before users ever notice a problem
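A proactive check of the kind listed above might run a known-good probe search against each connector and classify the outcome. This is only a sketch: the status labels, the `slow_after` threshold, and the probe callables are assumptions, not the monitoring system the slide describes.

```python
import time


def check_connector(name, probe, slow_after=5.0):
    """Run one probe search and classify the connector's health.

    `probe` is any callable that performs a test search and returns the
    number of results; it is a stand-in for a real connector call.
    """
    start = time.monotonic()
    try:
        count = probe()
    except Exception as exc:  # a broken connector surfaces as an exception
        return {"source": name, "status": "error", "detail": str(exc)}
    elapsed = time.monotonic() - start
    if count == 0:
        status = "empty"  # responded, but the parser may no longer match the page
    elif elapsed > slow_after:
        status = "slow"
    else:
        status = "ok"
    return {"source": name, "status": status, "seconds": round(elapsed, 3)}


def broken_probe():
    raise RuntimeError("HTTP 500")


print(check_connector("healthy", lambda: 25))
print(check_connector("changed-layout", lambda: 0))
print(check_connector("down", broken_probe))
```

Running such checks on a schedule is what lets a maintenance team see a failing connector before a user's search does.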

Relevance Ranking

• Occurrence of search terms within titles & snippets

• Assigning weight to sources

• More current results are assigned greater weight

Read: “Ranking: The Secret Sauce for Searching the Deep Web”
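The three signals above can be combined into a single score. The formula below is an invented illustration, not the ranking Explorit uses: the 2:1 title-to-snippet weighting, the per-source weights, and the 10%-per-year recency decay are all assumptions.

```python
def score(result, query, source_weights, current_year):
    """Combine term occurrence, source weight, and recency into one number."""
    terms = query.lower().split()
    title = result["title"].lower().split()
    snippet = result["snippet"].lower().split()
    # Term occurrence: a hit in the title counts twice as much as one in the snippet.
    hits = sum(2 * title.count(term) + snippet.count(term) for term in terms)
    # Source weight: unknown sources default to a neutral 1.0.
    weight = source_weights.get(result["source"], 1.0)
    # Recency: newer results keep more of their score.
    age = max(current_year - result["year"], 0)
    recency = 1.0 / (1.0 + 0.1 * age)
    return hits * weight * recency


def rank(results, query, source_weights, current_year):
    return sorted(
        results,
        key=lambda r: score(r, query, source_weights, current_year),
        reverse=True,
    )


results = [
    {"title": "Web crawlers", "snippet": "surface web only", "source": "b", "year": 2002},
    {"title": "Deep web search", "snippet": "search the deep web", "source": "a", "year": 2012},
]
for r in rank(results, "deep web", {"a": 1.0, "b": 2.0}, current_year=2012):
    print(r["title"])
```

Because a federated engine sees only titles and snippets rather than full documents, the scoring has to work from sparse text, which is why source weight and date carry so much of the load.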

Clustering

• Real-time semantic analysis of results creates clusters on-the-fly.

• Discover relationships behind the results, not just “keywords.”

Read: “Clusters That Think”
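As a rough intuition for on-the-fly clustering, the sketch below groups result titles under the terms they share. It is a deliberately simple keyword-based stand-in, not the semantic analysis the slide refers to; the stopword list and the "shared by at least two titles" rule are assumptions.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def cluster_by_terms(titles, max_clusters=3):
    """Group titles under the terms that occur in the most titles."""
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(set(title.lower().split()) - STOPWORDS)
    # Keep terms shared by at least two titles, most frequent first (ties alphabetical).
    ordered = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [term for term, count in ordered if count >= 2][:max_clusters]
    clusters = defaultdict(list)
    for title in titles:
        words = set(title.lower().split())
        for label in labels:
            if label in words:
                clusters[label].append(title)  # a title may sit in several clusters
    return dict(clusters)


titles = [
    "Solar energy storage",
    "Wind energy forecasting",
    "Solar cell efficiency",
    "Protein folding",
]
print(cluster_by_terms(titles))
```

The clusters are computed from whatever results came back for this query, so they change with every search rather than following a fixed taxonomy.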

Alerts
– Delivery online or via email
– Daily, Weekly, Monthly
– Pick and choose your sources
– Export to RSS reader
– Maintain database of past results

Search Builder
– Create search pages easily
– Choose collections and search fields
– Integrates with Course Management Software
– Embed search box using built-in widget

SwetsWise Searcher Metrics

• Graphics-based or tabular
• Single day (hourly breakdown) or entire month
• Downloadable to spreadsheet
• Reports include:
– Number of queries run
– Number of results retrieved per source
– Average time to retrieve results from a source
– Average rank of results retrieved per source
– Timeouts/errors by source
– Searches run (query strings)
– Clickthrough stats
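Several of the per-source reports above are simple aggregations over a search log. The sketch below assumes a hypothetical log format (one entry per source per query, with `results`, `seconds`, and `timed_out` fields); the actual log schema is not described in the slides.

```python
from collections import defaultdict


def summarize(log):
    """Aggregate per-source metrics from a list of log entries."""
    totals = defaultdict(lambda: {"queries": 0, "results": 0, "seconds": 0.0, "timeouts": 0})
    for entry in log:
        row = totals[entry["source"]]
        row["queries"] += 1
        row["results"] += entry["results"]
        row["seconds"] += entry["seconds"]
        row["timeouts"] += int(entry["timed_out"])
    return {
        source: {
            "queries": row["queries"],
            "results": row["results"],
            "avg_seconds": row["seconds"] / row["queries"],
            "timeouts": row["timeouts"],
        }
        for source, row in totals.items()
    }


log = [
    {"source": "journals", "results": 10, "seconds": 1.0, "timed_out": False},
    {"source": "journals", "results": 20, "seconds": 3.0, "timed_out": False},
    {"source": "ebooks", "results": 0, "seconds": 30.0, "timed_out": True},
]
for source, row in summarize(log).items():
    print(source, row)
```

The resulting dictionary maps directly onto a tabular report or a spreadsheet export, one row per source.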

Hosted vs. Installed Solutions

| Hosted | Installed |
| --- | --- |
| Deep Web Technologies hosts the application | Client hosts the application |
| Technical support through Deep Web Technologies | Client IT staff must support application |
| Deep Web Technologies can access application at any time | Deep Web Technologies has limited or no access to the application |
| Deep Web Technologies monitors and maintains connectors | Deep Web Technologies monitors and maintains accessible connectors |
| Limited or no ability to access internal sources | Can access internal sources |

Multilingual WorldWideScience.org

WorldWideScience.org is an Excellent Candidate for Multilingual Search

• A global gateway to international science databases and portals

• All content is from national governments or vetted by national governments

• Developed in partnership with the DOE Office of Scientific and Technical Information (OSTI), WWS Alliance and Microsoft Research

• One-stop searching

• Includes databases from China, Japan, Korea, Germany, and other non-English-speaking countries

How Multilingual Federated Search Works

[Diagram, read as a flow:]

1. The user submits a query in their own language to EXPLORIT.
2. Microsoft Translator translates the query into each source’s language (e.g., German, Chinese, Russian).
3. Foreign-language search engines return results in the source’s language.
4. EXPLORIT ranks the results.
5. Microsoft Translator translates the ranked results into the user’s language, and they are returned to the user.
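The multilingual flow (translate the query per source, search, rank, translate the results back) can be sketched with a toy phrase table. Note the assumptions: `translate` and `PHRASES` are stand-ins for a machine-translation service and do not reflect the Microsoft Translator API, the sources are in-memory lists, and the alphabetical sort is only a placeholder for real ranking.

```python
# Toy phrase table standing in for a machine-translation service (hypothetical).
PHRASES = {
    ("solar energy", "de"): "Sonnenenergie",
    ("Sonnenenergie heute", "en"): "Solar energy today",
}


def translate(text, target_lang):
    """Look the text up in the toy table; unknown text is returned unchanged."""
    return PHRASES.get((text, target_lang), text)


# Hypothetical sources: name -> (language code, titles held by that source).
SOURCES = {
    "german_db": ("de", ["Sonnenenergie heute", "Windkraft"]),
    "english_db": ("en", ["Solar energy policy"]),
}


def multilingual_search(query, user_lang):
    hits = []
    for name, (lang, titles) in SOURCES.items():
        # Translate the query into the source's language, then search there.
        local_query = translate(query, lang).lower()
        hits.extend((name, title) for title in titles if local_query in title.lower())
    # Placeholder ranking step: alphabetical by source-language title.
    hits.sort(key=lambda hit: hit[1])
    # Translate the ranked results into the user's language.
    return [(name, translate(title, user_lang)) for name, title in hits]


print(multilingual_search("solar energy", "en"))
```

Ranking before translating back means the ranker works on the original source text, and only the results that will actually be shown need to be translated.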

Coming in the Fall

• Visualization
• Full-Faceted Navigation
• Mendeley Integration
• Document Type and Document Format Clusters
• Full Text Filter

Visualization

Using our clustering technology, results visualization allows users to see relationships between topics easily.

Mendeley

Document Type and Document Format Clusters

Full Text Filter

Access Full Text!

Future - Mobile Searching

Thank you!

Abe Lederman [email protected]