the analytics landscape: a personal viewcseweb.ucsd.edu/~elkan/analyticsdec2011.pdf1.structured data...

30
The analytics landscape: A personal view Charles Elkan el December 20, 2011

Upload: others

Post on 24-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

The analytics landscape:A personal view

Charles Elkan

el December 20, 2011

Page 2: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

What is analytics?

• Big data, business intelligence (BI), decision support (DSS), data warehousing, unstructured data, knowledge discovery in databases (KDD), information visualization, map-reduce.

• analytics = convert data into intelligence + capture value = statistics + optimization

• statistics = machine learning = data mining

• optimization = microeconomics + operations research

JARGON

Page 3: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Outline

1.Structured data (predictive, visual)2.Unstructured data3.The business of analytics4.A research and business opportunity

Page 4: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

A basic distinction

I. Structured dataTables in databasesNodes and links in networks

II. Unstructured dataTextVideosTables in web pagesXML

Page 5: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

I. Structured data

• A data warehouse is a cost center, not a profit center.• How can structured data be a profit center?

1.Predictive analytics2.Visual analytics

Page 6: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

1. Predictive analytics

• So, what can we do with structured data?• Answer: Make predictions, then take actions. • Example:

• But, what are the costs and benefits of alternative actions?• And, who pays which costs?

Page 7: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Cost-sensitive learning

• Cross-domain theory of making optimal decisions given predictions:

Page 8: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

2. Visual analytics

• So, what can we do with structured data?• Answer: Find and display patterns; prompt human insight.

Page 9: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Patterns of human metabolism

Page 10: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Information visualization

• “state of the art analytic tools to identify biomarkers”

Page 11: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

II. Unstructured data

Page 12: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

A case study

Page 13: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

A general need: Task-oriented semantic search

LaVerne Council, CIO of Johnson & Johnson:

“... allow anyone to ask a question ... folks that have given us access to their email ... data mining for answers to that question

... help us solve a very hairy issue for one of our products ... one of the associates had completed his thesis in college on that very topic ... they weren’t in the same company

... we were able to really come back with answers.”

Page 14: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

A grand vision

• “Open source intelligence (OSI)”

Page 15: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

A less grand vision

Page 16: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

III. The business of analytics

• Analytics applications are valuable.

Page 17: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Analytics companies are valuable

Page 18: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Are valuations bubble-icious?

• HP compared to Autonomy:

Sales: $128B versus $963MIncome: $12B versus $343M

Value: $50B versus $11B

• Forrester: “The Autonomy IP is stagnant. There hasn’t been a major release in five years.”

• Zero recent patents for the core analytics.

Page 19: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

IV. A research and market opportunity

Page 20: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

• New platform for diverse dataCloud-basedMultiply the user base 10x:

• Easy to use• Fun to use

• Opportunity: Add “secret sauce” to open-source softwareNewer artificial intelligencePatented artificial intelligence

Disruption from below

Page 21: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

• A role model for cloud-based ease of use: Box.net• $650M valuation, but no intelligence.

Page 22: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

• Cloud-based software as a service (SaaS)• Easy to use, fun to use• Newer AI, patented AI

• Open-source foundation:Lucene and Solr as backendTika for importing unstructured data

Disruption from below

Page 23: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Newer artificial intelligence

• Sentiment analysis• Topic models for

organizing content• Recursive neural nets

for deep understanding www.socher.org/index.php/Main/ParsingNaturalScenes AndNaturalLanguageWith RecursiveNeuralNetworks

Page 24: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Newer AI: Fewer topics, better fit

Page 25: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Patented AI: Sentiment analysis

• ... labels designate level of quality, such as interestingness, appropriateness, timeliness, humor, style of language, obscenity, sentiment

• ... a classifier means effective to automatically associate a quality value to items of data, wherein said quality value is indicative of the qualitative nature of said items of data

Page 26: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Today in the New York Times

Page 27: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

SQUID

• Sentiment analysis• Question answering• Unstructured data organization• Interactive insight• Diverse entity extraction

• But what will be most beneficial and profitable?

• Historical answer: Specific vertical applications.

Page 28: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Profit lies in verticals, I

Page 29: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Profit lies in verticals, II

Page 30: The analytics landscape: A personal viewcseweb.ucsd.edu/~elkan/AnalyticsDec2011.pdf1.Structured data (predictive, visual) 2.Unstructured data 3.The business of analytics 4.A research

Discussion

• Acknowledgement: Most images are due to other authors.