mapr lucidworks joint webinar

21
1 ©MapR Technologies - Confidential Crowd Sourcing Reflected Intelligence Using Search and Big Data Ted Dunning Grant Ingersoll

Upload: ted-dunning

Post on 10-May-2015

1.186 views

Category:

Technology


3 download

DESCRIPTION

Slides for my recent webinar with Grant Ingersoll. We talked about how search technology can be abused to implement apparently intelligent systems.

TRANSCRIPT

Page 1: MapR lucidworks joint webinar

1©MapR Technologies - Confidential

Crowd Sourcing Reflected Intelligence Using Search and Big Data

Ted Dunning

Grant Ingersoll

Page 2: MapR lucidworks joint webinar

2©MapR Technologies - Confidential

Grant’s Background

Co-founder:– LucidWorks – Chief Scientist– Apache Mahout

Long time Lucene/Solr committer Author: Taming Text Background in IR and NLP– Built CLIR, QA and a variety of other search-based apps

Page 3: MapR lucidworks joint webinar

3©MapR Technologies - Confidential

Ted’s Background

Academia, Startups– Aptex, MusicMatch, ID Analytics, Veoh– Big data since before big

Open source– since the dark ages before the internet– Mahout, Zookeeper, Drill– bought the beer at first HUG

MapR– Chief Application Architect

Founding member of Apache Drill

Page 4: MapR lucidworks joint webinar

4©MapR Technologies - Confidential

Agenda

Intro Search Evolution and Search Revolution Reflected Intelligence Use Cases Building a Next Generation Search and Discovery Platform– MapR– LucidWorks

1+1=3

Page 5: MapR lucidworks joint webinar

5©MapR Technologies - Confidential

Search is Dead, Long Live Search

Search is a system building block– text is only a part of the story

If the algorithms fit,

use them!

Embrace fuzziness!

Scoring features are everywhere

Content

User Interaction

Access

Content Relationships

Page 6: MapR lucidworks joint webinar

6©MapR Technologies - Confidential

Search (R)evolution

Search use leads to search abuse– denormalization frees your mind– scoring is just a sparse matrix multiply

Lucene/Solr evolution– non free text usages abound– many DB-like features– noSQL before NoSQL was cool– flexible indexing– finite State Transducers FTW!

Scale

“This ain’t your father’s relevance anymore”

Page 7: MapR lucidworks joint webinar

7©MapR Technologies - Confidential

Add (Lots of) Water

Large-scale analysis is key to reflected intelligence– correlation analysis• based on queries, clicks, mouse tracks,

even explicit feedback• produce clusters, trends, topics, SIP’s

– start with engineered knowledge, refine with user feedback

Large-scale discovery features

encourage experimentation

Always test, always enrich!

Search

DiscoveryAnalytics

Page 8: MapR lucidworks joint webinar

8©MapR Technologies - Confidential

Social Media Analysis in Telecom

Correlate mobile traffic analysis with social media analysis– events cause traffic micro-bursts– participants tweet the events ahead of time

Deploy operations faster to predict outages and better handle emergency situations– high cost bandwidth augmentation can be marshaled as the traffic appears– anticipation beats reaction

Page 9: MapR lucidworks joint webinar

9©MapR Technologies - Confidential

Provenance is 80% of value

Analysis of social media to determine advertising reach and response

In one case the same untargeted advertising was worth 5x if sold with supporting data.

Page 10: MapR lucidworks joint webinar

10©MapR Technologies - Confidential

Claims Analysis

Goal– Insurance claims processing and analysis– fraud analysis

Method– Combine free text search with metadata analysis to identify high risk

activities across the country– Integrate with corporate workflows to detect and fix outliers in customer

relations

Results– Questions that took 24-48 hours now take seconds to answer

Page 11: MapR lucidworks joint webinar

11©MapR Technologies - Confidential

Virginia Tech - Help the World

Grab data around crisis Search immediately Large-scale analysis enriches data to find

ways to improve responses and understanding

http://www.ctrnet.net

Page 12: MapR lucidworks joint webinar

12©MapR Technologies - Confidential

Bright Planet - Catch the Bad Guys

Online Drug Counterfeit detection Identify commonly used language indicating counterfeits– you know it when you see it– and you know you have seen it

Feed to analyst via search-driven application– enrich based on analysts feedback

Page 13: MapR lucidworks joint webinar

13©MapR Technologies - Confidential

Veoh - Cross Recommendations

Cross recommendation as search– with search used to build cross recommendation!

Recommend content to people who exhibit certain behaviors (clicks, query terms, other)

(Ab)use of a search engine– but not as a search engine for content– more like a search engine for behavior

Page 14: MapR lucidworks joint webinar

14©MapR Technologies - Confidential

What Platform Do You Need?

Fast, efficient, scalable search– bulk and near real-time indexing– handle billions of records with sub-second search and faceting

Large scale, cost effective storage and processing capabilities

NLP and machine learning tools that scale to enhance discovery and analysis

Integrated log analysis workflows that close the loop between the raw data and user interactions

Page 15: MapR lucidworks joint webinar

15©MapR Technologies - Confidential

Shards

1 23 N

Search View

•Documents •Users •Logs

DocumentStore

Analytic Services

•View into numeric/historic data

•Classification•Recommendation

Personalization & Machine Learning

Services

Classification ModelsIn memoryReplicatedMulti-tenant

Discovery & EnrichmentClustering, classification, NLP, topic identification, search log analysis, user behavior

Content AcquisitionETL, batch or near real-time

Access APIs

Data• LucidWorks Search

connectors• Push

Reference Architecture

Page 16: MapR lucidworks joint webinar

16©MapR Technologies - Confidential

MapR

MapR provides the technology leading Hadoop distribution– full eco-system distribution– integrated data platform– complete solution for data integrity

MapR clusters also provide tight integration with search technologies like LucidWorks– integration is key for effective ops

Page 17: MapR lucidworks joint webinar

17©MapR Technologies - Confidential

LucidWorks

LucidWorks provides the leading packaging of Apache Lucene and Solr– build your own, we support– founded by the most prominent Lucene/Solr experts

LucidWorks Search– “Solr++”• UI, REST API, MapR connectors, relevance tools, much more

LucidWorks Big Data– Big Data as a Service– Integrated LucidWorks Search, Hadoop, machine learning with prebuilt

workflows for many of these tasks

Page 18: MapR lucidworks joint webinar

18©MapR Technologies - Confidential

LucidWorks Big Data Architecture

Big Data Operating System

• Administration

• Provisioning

• Monitoring

• Configuration

• Service Management • Data Management

• Security

SystemManagement

Uniform ReST API

Search – Discovery – Analytics• LucidWorks Search• Machine Learning (classification, clustering,

recommendations)• Natural Language Processing• SQL (Hive) Interface• Data Workflows (ETL, log analysis, common metrics)• Extensible

• Enterprise Repository

• Social Media

• Databases

• HDFS

• Cloud (S3)

• Push

ContentAcquisition

Hadoop/HBase SearchIndexes

SearchLogs

Page 19: MapR lucidworks joint webinar

19©MapR Technologies - Confidential

Easy Wins

Analyze logs from application stored in MapR Seamlessly store search indexes in MapR– and feed to Pig, Mahout and others– use mirrors + NFS to directly deploy indexes

Snapshots make backups a snap LucidWorks 2.5 (2013 Q1) easily connects with MapR

Page 20: MapR lucidworks joint webinar

20©MapR Technologies - Confidential

1 + 1 = 3

Page 21: MapR lucidworks joint webinar

21©MapR Technologies - Confidential

Learn More

More informationhttp://www.mapr.com/company/events/lucidworks-12-13-2012

Vote for this topic for Hadoop Summit EU:http://bit.ly/128tLQe

Talk to Ted@[email protected]

Talk to Grant@gsingers

MapR and Lucid Workshttp://www.mapr.comhttp://www.lucidworks.com