More Powerful Solr Search with Semaphore - Jeremy Bentley

Smartlogic TM Apache Lucene Eurocon Jeremy Bentley, CEO

Upload: lucenerevolution

Post on 06-Jul-2015


DESCRIPTION

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Metadata is widely understood to be a critical element of search, discovery and classification. But with the preponderance of unstructured data addressed by search technology, consistent native metadata is often in short supply. Organizations often find that the quality and depth of contextual metadata (what documents are about) can make or break search relevancy, precision and recall. Semaphore is an enterprise semantic platform that uniquely captures an organization's subjects and topics in a taxonomy or ontology (model), in a manner that adds context for enhanced navigation and findability. Semaphore augments traditional information management systems like Solr search by adding advanced content classification, metadata and navigation capabilities to deliver a more complete, higher quality enterprise information management experience.

This talk will focus on the following:
• Deep dive into the technical integration of Semaphore with Apache Solr (including the connection points between Semaphore and Solr)
• Discuss the Semaphore modules (Ontology Manager, Classification Server, Semantic Enhancement Server and Search Application Server) and how they provide better findability
• Share a demonstration of Solr in action
• Present a client case study (Nordyske)

TRANSCRIPT

Page 1: More Powerful Solr Search with Semaphore - Jeremy Bentley

Smartlogic TM

Apache Lucene Eurocon

Jeremy Bentley, CEO

Page 2: More Powerful Solr Search with Semaphore - Jeremy Bentley

1st degree of order

Filing management
• 80% of enterprise information is unstructured
• Doubling every 19 months and accelerating [Gartner]
• Increasing burden of compliance
• Enterprise 2.0 additions

Page 3: More Powerful Solr Search with Semaphore - Jeremy Bentley

2nd degree of order

Index management
• File plans and metadata schema
• Mono-hierarchical standardised taxonomies
• Manually applied classification
• Low level of consistency and quality

Page 4: More Powerful Solr Search with Semaphore - Jeremy Bentley

3rd degree of order

Computerised 1st and 2nd degrees

Page 5: More Powerful Solr Search with Semaphore - Jeremy Bentley

A 10 year Flatline Expectation Gap

• 2001, IDC, “Quantifying Enterprise Search”: Searchers are successful in finding what they seek 50% of the time or less

• 2011, MindMetre/SmartLogic: More than half (52%) cannot find the information they need using their Enterprise search system

Page 6: More Powerful Solr Search with Semaphore - Jeremy Bentley

The explosion of information

[Chart: terabytes of data by period. 1993-2001: 4Tb; 2001-2009: 80Tb; a further bar is marked "?"]

20 times increase in information volume

Source: the National Archives

Page 7: More Powerful Solr Search with Semaphore - Jeremy Bentley

Search Gets Harder as Data sets Grow

   

 


Circa  1996  

Page 8: More Powerful Solr Search with Semaphore - Jeremy Bentley

Different vocabulary and ambiguity

Missing results (You Say / I Say)
• Moon Buggy: Lunar Roving Vehicle, Manned Lunar Surface Vehicle
• Swine Flu: Swine Influenza Virus, H1N1
• Touchscreen: Touch screen, Multi-touch

Too many results (You Say / What do you mean?)
• Apple: A fruit? Fiona - A singer / songwriter? An electronics company?
• Rights: Employment rights? Equal rights? Right of way?
• Ford: Ford Motor? Forward Industrials (ticker=FORD)? A shallow river crossing?
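The vocabulary mismatch on this slide is commonly handled by expanding the user's term with its known equivalents before the query reaches the search engine. The following is a minimal sketch of that idea, assuming a hand-built synonym table copied from the slide's examples. The names `SYNONYMS` and `expand_query` are illustrative only and are not part of Semaphore or Solr.

```python
# Hypothetical synonym table built from the slide's "You Say / I Say" examples.
SYNONYMS = {
    "moon buggy": ["lunar roving vehicle", "manned lunar surface vehicle"],
    "swine flu": ["swine influenza virus", "h1n1"],
    "touchscreen": ["touch screen", "multi-touch"],
}


def expand_query(query):
    """Return an OR query covering the user's term and its known synonyms."""
    key = query.strip().lower()
    terms = [key] + SYNONYMS.get(key, [])
    # Quote each term so multi-word phrases are matched as phrases.
    return " OR ".join('"{}"'.format(term) for term in terms)


print(expand_query("Swine Flu"))
# "swine flu" OR "swine influenza virus" OR "h1n1"
```

A term that is not in the table passes through unchanged (as a single quoted phrase), so the expansion only ever widens recall for vocabulary the model knows about.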

Page 9: More Powerful Solr Search with Semaphore - Jeremy Bentley

Conventional Search - Ineffective, Frustrating, and Inadequate

Drawbacks

Apparent
1 Needle in the Haystack
2 Multiple search terms
3 Irrelevant results
4 Out of date results
5 Multiple media forms
6 Unrestricted geography
7 Inappropriate ads

Not So Apparent
8 Can’t filter, select subset
9 No related topics
10 Missing results
11 No context or guidance
12 Best resource not clear

✓ Time consuming
✓ Inefficient
✓ Ineffective

Page 10: More Powerful Solr Search with Semaphore - Jeremy Bentley

Knowing what you have

Page 11: More Powerful Solr Search with Semaphore - Jeremy Bentley

Paradox of Effort

                             Web    Enterprise
Metadata effort              High   Low
Result Quality requirement   Low    High

Metadata is to search what pistons are to a petrol engine.

Page 12: More Powerful Solr Search with Semaphore - Jeremy Bentley

How do I structure it?

Creation Date
Modified Date
Author
Format (PDF, DOC, XLS)
Subject
Location
Project
Function (IT, HR, Finance)
Expert
Protective Marker
Retention
Expiry
Publisher
Site

Structural Process

Information

Page 13: More Powerful Solr Search with Semaphore - Jeremy Bentley

3rd degree content universe

Digital Asset Management
Publishing Systems
Social collaboration
eDiscovery
Document Management
Content Management
Enterprise Search
Records Management
Portal Infrastructure
Process Management & Workflow

Page 14: More Powerful Solr Search with Semaphore - Jeremy Bentley

4th degree of order

Digital Asset Management
Publishing Systems
Social collaboration
eDiscovery
Document Management
Content Management
Enterprise Search
Records Management
Portal Infrastructure
Process Management & Workflow

Content Intelligence

Page 15: More Powerful Solr Search with Semaphore - Jeremy Bentley

4th degree of order Content Intelligence

Content Intelligence Platform

Solr

Page 16: More Powerful Solr Search with Semaphore - Jeremy Bentley

Semaphore

Copyright  @  2011  Smartlogic  Semaphore  Limited   16  

Business Vocabulary
Classification Decision
User Action

Apply
Inform
Expose

Page 17: More Powerful Solr Search with Semaphore - Jeremy Bentley

Semaphore

Business Vocabulary
Classification Decision
User Action

Apply
Inform
Expose

Metadata
Contextual User Experience
Semantic Models
Semantic Software

Page 18: More Powerful Solr Search with Semaphore - Jeremy Bentley

Components
• Metadata
• Semantic Models
• Contextual User Experience
• Semantic Software

Page 19: More Powerful Solr Search with Semaphore - Jeremy Bentley

Metadata

Today
• Low quality tags, high cost to apply
• Manual process
• Single unified ‘one size fits all’ approach
• Long time to craft & build, manually applied

With Content Intelligence
• High quality tags, low cost to apply
• Automatic process
• Multiple approaches for various domains/audiences
• Short time to build & deploy, automatically

Page 20: More Powerful Solr Search with Semaphore - Jeremy Bentley

Semantic Models

Organising, Contextualising, Harnessing

Preferred term (Agreed Label): Ford Motor Company

Also known as
– Ford
– Ford Motor
– F (Bloomberg)
– FoMoCo
– blue oval

Parent topics
– Automotive sector
– Bond issuers

Subsidiaries
– Ford Motor Credit Company
– Mazda

Covered by
– Bob Smith

Location of fundamental data
– Earnings estimates
– Historic sales and profits

Key competitors
– BMW
– Daimler Chrysler
– General Motors
– Toyota
– Volkswagen

Products
– Focus
– Ka
– MX5

Influenced by
– Credit ratings on Ford Motor Credit Company
– European and US economies
– Changes in consumer demand

Analytics available
– Current bond price
– Relative bond spreads

Content-types available
– Flashnotes
– Research reports
– Trade ideas

Automate compliance and distribution tasks
– ‘Watch list’ lookup
– Distribution according to preset rules
– Automated mapping to create aggregator metadata

User Experience
– Conceptual relevance
– Related topics
– Links to analytics

Search engine enhancement
– Search results
– Email alerts

Unstructured content integration
– Published reports
– Related topics
– Links to analytics
– Search results
– Email alerts
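The Ford Motor Company example on this slide can be read as a small data structure: one agreed label, a set of alternative labels, and typed links to other topics. The sketch below is one possible representation, offered only as an illustration. The `Concept` class, its field names, and the helper functions are assumptions and do not reflect Semaphore's actual model format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """One topic in a semantic model (field names are illustrative)."""
    preferred_label: str
    also_known_as: List[str] = field(default_factory=list)
    parent_topics: List[str] = field(default_factory=list)
    subsidiaries: List[str] = field(default_factory=list)
    key_competitors: List[str] = field(default_factory=list)


# Values copied from the slide.
FORD = Concept(
    preferred_label="Ford Motor Company",
    also_known_as=["Ford", "Ford Motor", "F", "FoMoCo", "blue oval"],
    parent_topics=["Automotive sector", "Bond issuers"],
    subsidiaries=["Ford Motor Credit Company", "Mazda"],
    key_competitors=["BMW", "Daimler Chrysler", "General Motors",
                     "Toyota", "Volkswagen"],
)


def build_label_index(concepts: List[Concept]) -> Dict[str, Concept]:
    """Map every known label (lower-cased) to its concept."""
    index = {}
    for concept in concepts:
        for label in [concept.preferred_label] + concept.also_known_as:
            index[label.lower()] = concept
    return index


def preferred_term(label: str, index: Dict[str, Concept]) -> Optional[str]:
    """Return the agreed label for any known variant, or None if unknown."""
    concept = index.get(label.strip().lower())
    return concept.preferred_label if concept else None


index = build_label_index([FORD])
print(preferred_term("FoMoCo", index))  # Ford Motor Company
```

Resolving every variant to a single agreed label is what lets tags applied to documents stay consistent even when authors and searchers use different names for the same thing.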

Page 21: More Powerful Solr Search with Semaphore - Jeremy Bentley

Contextual User Experience

Key Features
1 Taxonomy enables discovery, related searches
2 Related topics and content
3 Facets enable filtering results by:
4 - Source
5 - Numerous topics
6 - Date
7 Best Bets
8 Automated doc. tagging
9 A-Z

✓ More relevant results
✓ Fewer “bad hits”
✓ Powerful navigation
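Facet filtering of the kind listed on this slide maps onto Solr's standard faceting parameters (`facet`, `facet.field`) and filter queries (`fq`). The sketch below only builds the request URL and sends nothing. The endpoint and the field names `source` and `topic` are assumptions for illustration, and the real names depend on the core and schema in use.

```python
from urllib.parse import urlencode

# Assumed Solr endpoint; adjust to the actual host and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_faceted_query(text, facet_fields, filters=None):
    """Build a Solr select URL that requests facet counts and applies filters."""
    params = [("q", text), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field whose value counts we want back.
    params += [("facet.field", name) for name in facet_fields]
    # Each facet value the user has clicked becomes a filter query (fq).
    params += [("fq", '{}:"{}"'.format(name, value))
               for name, value in (filters or {}).items()]
    return "{}?{}".format(SOLR_SELECT_URL, urlencode(params))


url = build_faceted_query(
    "ford",
    facet_fields=["source", "topic"],
    filters={"topic": "Automotive sector"},
)
print(url)
```

Keeping the selected facet values in `fq` rather than folding them into `q` leaves the relevance ranking driven by the user's text while the filters only narrow the result set.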

Page 22: More Powerful Solr Search with Semaphore - Jeremy Bentley

Content Exploration

Highlighting relationships in a result set greatly improves the user experience.

Page 23: More Powerful Solr Search with Semaphore - Jeremy Bentley

Semantic Software

Semaphore
• Ontology & Metadata Management
• Text Analysis & Extraction
• Automatic and assisted Content classification
• Contextual Navigation Services
• Semantic Reasoning & Processing

Page 24: More Powerful Solr Search with Semaphore - Jeremy Bentley

Semaphore Search Integration

Search Engine
Query
Index
Corpus
Web Services API
Search Enhancement Server
XML API
Classification Server
Collector/Normalizer
Extracted Text
Document “Tags”
Ontology Information
Text Miner
Ontology Manager
User Requests
Portal
Search Application Framework
Sample Interface Code
Local Term Index
Classification Rules

Semaphore core module
Semaphore optional module
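One reading of this diagram is that classification rules derived from the ontology are applied to the text extracted from each document, and the resulting tags are stored with the document in the search index. The sketch below illustrates that flow under strong simplifying assumptions: the rules are plain keyword lists, and `RULES`, `classify`, and `to_index_document` are hypothetical names that do not correspond to Semaphore's Classification Server API.

```python
import re

# Hypothetical classification rules: topic -> evidence terms (lower-case).
RULES = {
    "Ford Motor Company": ["ford motor", "fomoco"],
    "Automotive sector": ["car maker", "automotive"],
}


def classify(text, rules=RULES):
    """Return the sorted topics whose evidence terms occur in the text."""
    lowered = text.lower()
    matched = []
    for topic, terms in rules.items():
        # Whole-word match so "automotive" does not fire inside longer words.
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered)
               for term in terms):
            matched.append(topic)
    return sorted(matched)


def to_index_document(doc_id, text):
    """Attach the classifier's tags to the fields sent to the search index."""
    return {"id": doc_id, "text": text, "topic": classify(text)}


doc = to_index_document("d1", "FoMoCo is a car maker.")
print(doc["topic"])  # ['Automotive sector', 'Ford Motor Company']
```

Because the tags land in an ordinary indexed field, the search engine can facet and filter on them without knowing how they were produced.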

Page 25: More Powerful Solr Search with Semaphore - Jeremy Bentley

4th degree of order

Digital Asset Management
Publishing Systems
Social collaboration
eDiscovery
Document Management
Content Management
Enterprise Search
Records Management
Portal Infrastructure
Process Management & Workflow

Content Intelligence

Page 26: More Powerful Solr Search with Semaphore - Jeremy Bentley

Content Intelligence

Information Manufacturing
Knowledge Recovery
Content Analytics
Data Loss Prevention, Risk & Compliance
Monetisation
Metadata

Page 27: More Powerful Solr Search with Semaphore - Jeremy Bentley

Content Intelligent Solutions

Web Self Service
Knowledge Acquisition & Recovery
Governance Risk Compliance
Cross Platform Content Integration
Micro-Targeting & Distribution

Page 28: More Powerful Solr Search with Semaphore - Jeremy Bentley

www.smartlogic.com

Page 29: More Powerful Solr Search with Semaphore - Jeremy Bentley

Smartlogic TM

[email protected]

www.smartlogic.com