Matt McIlwain opening keynote

FROM “BIG DATA” TO DATAWARE: SIM Technology Leadership Summit, May 20, 2015

Upload: seattlesim

Post on 17-Aug-2015


TRANSCRIPT

Page 1: Matt McIlwain opening keynote

FROM “BIG DATA” TO DATAWARE

SIM Technology Leadership Summit, May 20, 2015

Page 2: Matt McIlwain opening keynote

MADRONA OVERVIEW

• Madrona is a leading venture capital firm focused on sourcing and growing early-stage technology companies in the Pacific Northwest

• About $1 billion under management across five funds

– Investors include the University of Washington, University of Virginia, Irvine Foundation, University of North Carolina, and strategic individuals

• Investments made in over 100 companies over the past 20 years, with over 50 active portfolio companies and over 40 positive exits

• Madrona team

– 7 Managing Directors

– Strategic Directors and Venture Partners include: Sujal Patel, Steve Singh, John McAdam, Prof. Oren Etzioni, and Prof. Dan Weld

Page 3: Matt McIlwain opening keynote

THE PNW TECH ECOSYSTEM IS STRONG AND GROWING

Anchor Tenants

Large Tech Satellite Offices

Mid-Cap Tech with Seattle HQ

World-Class Research

Page 4: Matt McIlwain opening keynote

OUR FUTURE

COMMUNICATION
1995: Snail mail, fax, early email
Today (2015): SMS, Facebook, Skype, Snapchat & Twitter
2035: Virtual Reality Rooms

DEVICES
1995: Desktop PCs
Today (2015): Smart Mobile Devices
2035: Embedded on you & everything else (IoT)

SOFTWARE/DATAWARE
1995: Packaged/Licensed
Today (2015): SaaS subscription/Apps
2035: Intelligent apps

INTERNET/CONNECTIVITY
1995: Dial up modem 56k
Today (2015): “Ubiquitous” broadband, 100 Mbps to mobile
2035: “Always On” and IoT

COMPUTE/STORAGE
1995: Pentium processor, 100 MIPS, single-core, ~$1 million/TB
Today (2015): Intel Xeon E7 processor, 4000 MIPS, multi-core, $59/TB
2035: $5/Petabyte

INFRASTRUCTURE
1995: Internet & Dedicated servers
Today (2015): Cloud
2035: Real-time hybrid marketplace

COMMERCE
1995: 1 book/10 days/$5 delivery
Today (2015): Anything 2 days free; 50,000 items in 2 hours free delivery
2035: Drones or autonomous car delivery & 3D printed

Page 5: Matt McIlwain opening keynote

WHAT IS “DATAWARE”?

A framework for describing the combination of data, software, math formulas and “predictive” analytics that help data savvy teams turn information and insights into profitable actions.


Why Now?

• Cloud Enablement: “Cloud” abstracts hardware into software and enables unprecedented elasticity, scale and speed

• Big Data: The volume, velocity and variety of data types and stores have expanded rapidly while the value of retaining/leveraging data often exceeds the cost

• Legacy “Datastores”: Highly structured and constrained systems (databases, data warehouses, BI tools) that are too rigid to unlock data’s full value yet too ubiquitous and important to NOT leverage

• Emerging Solutions: A combination of point solutions, systematic approaches and “vertical” services emerging to leverage these trends in an agile manner. These solutions require a structured framework to prioritize market opportunities

Page 6: Matt McIlwain opening keynote

INSERT BIG DATA LANDSCAPE SLIDE


Page 7: Matt McIlwain opening keynote

MADRONA DATAWARE FRAMEWORK

INTELLIGENT APPS & SERVICES

DATA INTELLIGENCE

ENABLING INFRASTRUCTURE

Agile Data Stack

Marc Benioff, Founder and CEO of Salesforce.com, when asked what he thinks is the major tech trend of the next five years, responded that we are in an “AI Spring.” Fortune Term Sheet 1/6/15

Page 8: Matt McIlwain opening keynote

WHAT MAKES THE DATA “BIG”?

Value: More valuable to store than throw away

Variety: Different sources & structures create opportunities… & challenges

Volume: Easy, plentiful & cheap data to collect & store

Velocity: Speed of turning data into actionable insights – batch vs. real-time!

Page 9: Matt McIlwain opening keynote

DATA INPUTS

• Legacy Databases: Highly structured, transactional focused, generally rigid

– Databases with SQL queries (OLTP)

– Historic “Extract, Transform, Load” tools (ETL)

– Data warehouses and data cubes

– Business Intelligence (BI) and “Online Analytical Processing” (OLAP)

• “Big Data” Sources: Structure variety, high volume/velocity, agile

– “Not Only SQL” (NoSQL) data repositories

– Allow for “Extract, Load, Transform” (ELT) flexibility

– Continuous, online (streamed) data flows

– Relationship focus vs. relational focus
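The ETL vs. ELT distinction above can be illustrated with a minimal sketch. It is not taken from the talk: the sample rows, the table and view names, and the helper functions are all hypothetical, and SQLite stands in for both the legacy warehouse and the "big data" store. In the ETL path the data is cleaned before loading, so rejected rows are lost. In the ELT path the raw rows are loaded untouched and the transformation is a view that can be redefined later.

```python
import sqlite3

# Hypothetical raw purchase events: (customer, amount as text, day).
RAW_EVENTS = [
    ("alice", " 12.50 ", "2015-05-20"),
    ("bob", "n/a", "2015-05-20"),
    ("alice", "7.25", "2015-05-21"),
]


def parse_amount(text):
    """Return the amount as a float, or None if it cannot be parsed."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def etl(conn, rows):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute(
        "CREATE TABLE purchases_etl (customer TEXT, amount REAL, day TEXT)"
    )
    cleaned = [(c, parse_amount(a), d) for c, a, d in rows]
    cleaned = [row for row in cleaned if row[1] is not None]
    conn.executemany("INSERT INTO purchases_etl VALUES (?, ?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load raw rows as-is, then transform inside the store with a view."""
    conn.execute(
        "CREATE TABLE raw_purchases (customer TEXT, amount TEXT, day TEXT)"
    )
    conn.executemany("INSERT INTO raw_purchases VALUES (?, ?, ?)", rows)
    # The view keeps only rows whose trimmed amount starts with a digit.
    conn.execute(
        """
        CREATE VIEW purchases_elt AS
        SELECT customer, CAST(TRIM(amount) AS REAL) AS amount, day
        FROM raw_purchases
        WHERE TRIM(amount) GLOB '[0-9]*'
        """
    )


def totals(conn, table):
    """Sum purchase amounts per customer for the given table or view."""
    query = (
        f"SELECT customer, SUM(amount) FROM {table} "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
etl(conn, RAW_EVENTS)
elt(conn, RAW_EVENTS)
etl_totals = totals(conn, "purchases_etl")
elt_totals = totals(conn, "purchases_elt")
raw_row_count = conn.execute("SELECT COUNT(*) FROM raw_purchases").fetchone()[0]
```

Both paths produce the same totals here, but only the ELT path still holds the unparseable row, which is the flexibility the slide attributes to "big data" sources.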

Page 10: Matt McIlwain opening keynote

WHERE DOES DATA & METADATA COME FROM?

People

• Consumers
• Office Workers
• Field Workers
• Citizens
• Partners
• Customers

Places

• Home
• Work
• Stores
• Destinations
• Routes

Profiles

• Individuals
• Demographics
• Devices
• Locations
• Objects
• “Campaigns”
• Biology
• “Networks”

Things

• Devices
• Vehicles
• Machines
• Medical
• Homes
• Content

Page 11: Matt McIlwain opening keynote

WHY DOES IT MATTER?

Structure
From: Mostly structured (relational)
To: Flexibly structured (relationship)

Flexibility
From: Rigid & slow (R + cubes + BI)
To: Agile & rapid (Python + graphs/ML + UI)

Availability
From: Offline & batch
To: Online & continuous

Key Drivers
From: Code & “Rules” (“hard coded”, structured learning)
To: Data, Statistics, Discovery (“machine learned”, “inferred”, Bayesian)

Conceptually
From: Certainty & consistency
To: Iteration & “surprise”

Page 12: Matt McIlwain opening keynote

TECHNOLOGY SECTOR IMPACT OF “DATAWARE”

Impact by time horizon (years):

Relational Databases (Oracle, MSFT)
0–2: +; 2–5: ??; 5+: -

Traditional Infrastructure (HP, IBM, Dell)
0–2: +; 2–5: -; 5+: --

Traditional Apps (Oracle, SAP)
0–2: +; 2–5: +/-; 5+: -

Cloud Infrastructure
0–2: ++; 2–5: ++; 5+: +

SAAS
0–2: ++; 2–5: ++; 5+: +/-

Page 13: Matt McIlwain opening keynote

BIG COMPANY “LEADING INDICATORS”

• Microsoft: Azure ML, Revolution Analytics, and much more

• HP reorganizes software business around “Big Data”

• Salesforce.com buys RelateIQ for $390M for “data cloud”

• Oracle builds “data cloud” team including BlueKai and Datalogix

• SAP promotes HANA, buys Concur

• IBM advertises Watson, Bluemix

• AWS: Amazon ML, Lambda, Kinesis

Page 14: Matt McIlwain opening keynote

KEY QUESTIONS

• How do big, especially software-driven, companies unlock their “data silos”?

• How will traditional databases/warehouses, newer “big data” stores and integrated big data “lakes” complement or compete?

• What models will emerge to capture value in “data intelligence”?

• To what extent can intelligent apps and services disrupt legacy apps/services?


Page 15: Matt McIlwain opening keynote

MADRONA DATAWARE FRAMEWORK

INTELLIGENT APPS & SERVICES

DATA INTELLIGENCE

ENABLING INFRASTRUCTURE

Agile Data Stack

Page 16: Matt McIlwain opening keynote

KEYS TO EMBRACING DATAWARE

1. Enabling infrastructure is complex (Hadoop/Cloudera, NoSQL/MongoDB, Spark, legacy) and hard/expensive, but it is getting simpler and cheaper

2. Data Intelligence holds big promise, but the scarcity of “data scientists” requires professional services (Dato, Context Relevant, Atigeo, Palantir) and systematic, standardized approaches from emerging companies

3. Early “App Intelligence” that is real-time and agile already exists (ad serving, content recommendations, personalization, vertical markets). Tremendous opportunity here to reinvent categories

4. Opportunities also exist in the data pipeline (Trifacta) and data management, but these tend to be deeper technical systems

Page 17: Matt McIlwain opening keynote

APPLICATION INTELLIGENCE

1. What will an “application” look like in 5+ years?

2. What will make that application “intelligent”?

Apps + Algos + Data = App Intelligence

Page 18: Matt McIlwain opening keynote

MADRONA DATAWARE INVESTMENTS

[Diagram: Madrona portfolio companies placed on the Dataware framework]

Layers: Intelligent Apps & Services; Data Intelligence; Enabling Infrastructure; Agile Data Stack

Companies shown: Yieldex, Dato, Boomerang, Jobaline, Highspot, Bizible, Placed, MaxPoint, Apptio, Seeq, Qumulo, Context Relevant, Algorithmia, Igneous, Icebrg, ExtraHop

Legend: Fund III, Fund IV, Fund V

Page 19: Matt McIlwain opening keynote

Appendix


Page 20: Matt McIlwain opening keynote

Dataware Case Study: Apptio


Category: “Full Stack”

Focus: Data-driven enterprise SAAS for CIO & team to run the business of IT (TBM)

Revenue: $100M+

Lineage: Startups, HP, IBM/Rational

Keys: • Combine legacy General Ledger & modern usage data to “cost” services and share with users

• Define industry data & metadata standard – ATUM

• Deliver real-time enterprise SAAS solution

Investors: Madrona Venture Group, Greylock Partners, Shasta Ventures, Andreessen Horowitz, T. Rowe Price

Page 21: Matt McIlwain opening keynote

Dataware Case Study: Cloudera


Category: Enabling Infrastructure

Focus: Became the industry standard for extracting, storing and managing a variety of data types so that they can enable data intelligence and data-driven services to succeed

Revenue: $100M+

Lineage: Hadoop, Open Source, Google, UW

Keys: • Early player in being a diverse, indexed data store

• Helped define the “file system”, called HDFS, for managing large-scale data stores

• Attempting to be the underlying platform for dataware

Investors: Accel Partners, Greylock Partners, Intel, T. Rowe Price

Page 22: Matt McIlwain opening keynote

Dataware Case Study: Dato


Category: Data Intelligence

Focus: Leverage machine learning and various data types to go from inspiration to insight and to build scalable predictive and recommendation systems

Revenue: < $10M

Lineage: UW, Carnegie Mellon

Keys: • Use SFrames to combine graph, table, text & image data types

• Build an “end to end” data intelligence system from prototype to production

• Deliver predictive and recommender systems as services or standalone applications for business customers

Investors: Madrona Venture Group, NEA, Vulcan

Page 23: Matt McIlwain opening keynote

Dataware Case Study: Placed.com


Category: App Intelligence

Focus: Combine location database & active panel data to analyze and optimize advertising and marketing programs

Revenue: < $10M

Lineage: Farecast, Quantcast, aQuantive

Keys: • Leverage data science to build highly accurate place database

• Create statistically significant panels to measure physical world impact of digital advertising

• Embed service into mobile ad ecosystem to deliver actionable insights

Investors: Madrona Venture Group, Two Sigma

Page 24: Matt McIlwain opening keynote

Dataware Case Study: Trifacta


Category: Continuous Data Pipeline

Focus: Automate the process of cleaning, normalizing and preparing data for “Data Intelligence” use cases

Revenue: Unknown

Lineage: Stanford (Jeff Heer), Cal (Joe Hellerstein)

Keys: • Focus on core “Data Wrangling” problem

• Use machine learning to recognize patterns & suggest automated fixes

• Simple visualization/UI

Investors: Greylock Partners, Accel Partners, Ignition Partners