Hadoop Meets Cloud with Multi-Tenancy

Post on 10-May-2015

DESCRIPTION

CTO Kaz's talk at Hadoop Conference Japan 2013 Winter.

TRANSCRIPT

Treasure Data

Hadoop meets Cloud with Multi-Tenancy

Kazuki Ohta, Founder and CTO at Treasure Data, Inc.

Hadoop User Group Japan, k@treasure-data.com

@kzk_mover

Friday, April 5, 13

Who are you?

Kazuki Ohta (太田一樹)
• @kzk_mover, k@treasure-data.com

Treasure Data, Inc.
• Chief Technology Officer, founded July 2011

Hadoop User Group Japan
• One of the founders
• “Hadoop徹底入門” (“Thorough Introduction to Hadoop”)

Open-Source Enthusiast
• Hadoop, memcached, jemalloc, MongoDB, uim, etc.

[Figure: database market map by Data Volume (threshold: 1 billion entries or 10 TB) and deployment (On-Premise vs. Cloud). Labels: Enterprise RDBMS; Lightweight RDBMS; DB2; Traditional Data Warehouse; Database-as-a-service; Big Data-as-a-Service; $10B market; $34B market. © 2012 Forrester Research, Inc. Reproduction Prohibited]

Treasure Data = Cloud + Big Data

What is the Problem?

Big Data? NoSQL?

Too Many Solutions

(from http://marblejenka.blogspot.jp/2013/01/hadoop.html)

Hadoop Versions

Too Many Variations (+Eco System)

Current Big Data Solutions: ‘Feature Creep’

http://en.wikipedia.org/wiki/Feature_creep

We need a Machete :)

Machete Design by James Lindenbaum, Heroku Co-Founder
http://www.youtube.com/watch?v=3BhDLm9jo5Y

EVERYTHING with ONE interface

Simple & Discoverable

‘Simplicity’ itself is a feature :)

by Anand Babu Periasamy, GlusterFS Co-Founder

Next Topic: Cloud?

http://www.saasblogs.com/saas/demystifying-the-cloud-where-do-saas-paas-and-other-acronyms-fit-in/

Battlefield of IaaS Vendors: SCM

[Chart: HW Performance / Price over Time. Labels: On-Premise; Decrease with Moore’s Law; IaaS Vendors; Battlefield: Supply Chain Management]

In the near future, most hardware buyers will be cloud vendors, not individual companies.

PaaS, SaaS: IT is all about Operation

With PaaS, you offload your development operations function and have the PaaS provider handle the tools and components required to deploy and manage applications reliably. - EngineYard

More Sleep, More Value

PaaS/SaaS Battlefield: ‘Time’ is Money

[Chart: Customer Value over Time. Labels: Ideal Expectation; Sign-up or PO; Obsolete over time; Reality (On-Premise); HW/SW Selection, PoC, Deploy... Upgrade]

Introduction to Treasure Data

Company Overview

[Photo: US team as of July 2012]

Silicon Valley-based Company
• All founders are Japanese
• Hironobu Yoshikawa
• Kazuki Ohta
• Sadayuki Furuhashi

OSS Enthusiasts
• MessagePack, Fluentd, etc.
• Cloud native

Our 50+ Customers – Fortune Global 500 leaders and start-ups including:

250 billion records / month in Feb 2013

2 million jobs executed

Vision: Single Analytics Platform for the World

Investors
• Bill Tai
• Naren Gupta - Nexus Ventures, Director of Red Hat, TIBCO
• Othman Laraki - Former VP Growth at Twitter
• James Lindenbaum, Adam Wiggins, Orion Henry - Heroku Founders
• Anand Babu Periasamy, Hitesh Chellani - Gluster Founders
• Yukihiro “Matz” Matsumoto - Creator of Ruby
• Dan Scheinman - Director of Arista Networks
• + 10 more people
• and.... Jerry Yang, Founder of Yahoo!, where Hadoop was invented :)

Check out today’s (2013/01/21) morning edition of the Nikkei newspaper (日経新聞)!

Treasure Data’s Philosophy and Architecture

Big Data Adoption Stages (in order of increasing Intelligence Sophistication)

Reporting: Treasure Data’s FOCUS (80% of needs)
• Standard Reports: What happened?
• Ad-hoc Reports: Where?
• Drill Down Query: Where exactly?
• Alerts: Error?

Analytics
• Statistical Analysis: Why?
• Predictive Analysis: What’s a trend?
• Optimization: What’s the best?

Full Stack Support for Big Data Reporting

Our best-in-class architecture and operations team ensure the integrity and availability of your data.

Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode.

Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches.

You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.

Treasure Data = Collect + Store + Query

Example in AdTech: MobFox

1. Europe’s largest independent mobile ad exchange.

2. 20 billion imps/month (circa Jan. 2013)

3. Serving ads for 15,000+ mobile apps (circa Jan. 2013)

4. Needed Big Data Analytics infrastructure ASAP.

Two Weeks From Start to Finish!

Our Value was Proven :)

[Chart: Customer Value over Time. Labels: Our Value: Save Time!; Simple Interface; Sign-up or PO; Obsolete over time; Reality (On-Premise); HW/SW Selection, PoC, Deploy... Upgrade]

Architecture Breakdown

Data Collection
• Increasing variety of data sources
• No single data schema
• Lack of streaming data collection method
• 60% of Big Data project resource consumed

Data Store/Analytics
• Remaining complexity in both traditional DWH and Hadoop (very slow time to market)
• Challenges in scaling data volume and expanding cost.

Connectivity
• Required to ensure connectivity with existing BI/visualization/apps by JDBC, REST and ODBC.

1) Data Collection
• 60% of BI project resource is consumed here
• Most ‘underestimated’ and ‘unsexy’ but MOST important
• Fluentd: OSS lightweight but robust Log Collector
• http://fluentd.org/
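
To make the "lightweight but robust" collector idea concrete, here is a minimal sketch of buffered log collection. It is not Fluentd's code or API: the class `BufferedCollector`, its `upload` callback and `chunk_size` parameter are hypothetical names chosen for illustration. The only thing borrowed from Fluentd is the event shape, a tag plus a timestamp plus a record.

```python
import json
import time
from typing import Callable, Dict, List, Optional, Tuple

# An event is modelled as (tag, time, record); all names here are hypothetical.
Event = Tuple[str, float, Dict[str, object]]


class BufferedCollector:
    """Toy log collector: buffers events in memory and uploads them in chunks."""

    def __init__(self, upload: Callable[[List[str]], None], chunk_size: int = 100):
        self.upload = upload          # destination, e.g. a function that sends a chunk
        self.chunk_size = chunk_size  # flush once this many events are buffered
        self.buffer: List[Event] = []

    def emit(self, tag: str, record: Dict[str, object], ts: Optional[float] = None) -> None:
        """Accept one event; flush automatically when the buffer is full."""
        self.buffer.append((tag, time.time() if ts is None else ts, record))
        if len(self.buffer) >= self.chunk_size:
            self.flush()

    def flush(self) -> None:
        """Serialize buffered events as JSON lines and hand them to `upload`.

        The buffer is cleared only after `upload` returns, so a failed upload
        (an exception) leaves the events in place for a later retry.
        """
        if not self.buffer:
            return
        lines = [json.dumps({"tag": t, "time": ts, "record": r}) for t, ts, r in self.buffer]
        self.upload(lines)
        self.buffer.clear()
```

Calling `emit("web.access", {"path": "/"})` from an application only appends to the buffer; the upload happens once per chunk, which is one simple way to combine streaming input with batch delivery.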

15:40~ Log analysis system with Hadoop in livedoor 2013, by Satoshi Tagomori @ NHN Japan

16:30~ How to Collect Data into Hadoop (いかにしてHadoopにデータを集めるか), by Sadayuki Furuhashi @ Treasure Data, Inc.

These talks will cover Fluentd :)

2) Data Store / Analytics - Columnar Storage
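
As a rough illustration of why a columnar layout suits reporting queries, the sketch below pivots row-oriented records into per-column lists so that an aggregate touches only the column it needs. It is a toy in-memory model, not Treasure Data's storage format; the function `to_columns` and the sample field names are assumptions made up for this example, and every row is assumed to carry the same keys.

```python
from typing import Dict, List

# Row-oriented records, as a log collector might deliver them (sample data).
rows = [
    {"time": 1, "user": "a", "bytes": 120},
    {"time": 2, "user": "b", "bytes": 300},
    {"time": 3, "user": "a", "bytes": 80},
]


def to_columns(records: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot rows into one list per column (assumes identical keys in every row)."""
    columns: Dict[str, List[object]] = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# A reporting-style aggregate reads a single column and skips "time" and "user".
total_bytes = sum(columns["bytes"])
```

In a real columnar store each column would also be stored and compressed separately on disk, which is where most of the I/O savings come from.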

3) Connectivity

[Diagram labels: Web App; MySQL; Postgres; BI apps; td-command; JDBC, ODBC Driver; REST API; Query API; Query Processing Cluster; Treasure Data Columnar Storage; Query; Result]

Most Difficult Challenge: Multi-Tenancy
• All customers share the Hadoop clusters (4 Data Centers)
• Resource Sharing (Burst Cores), Rapid Improvement, Ease of Upgrade

[Diagram labels: Job Submission + Plan Change; Global Scheduler; On-Demand Resource Allocation; datacenters A, B, C and D, each with a Local FairScheduler]
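
The two-level arrangement on this slide can be sketched as follows. This is an illustrative toy under stated assumptions, not Treasure Data's scheduler and not Hadoop's FairScheduler: the routing rule (send each job to the datacenter with the fewest pending jobs per core) and the equal-split sharing rule are simplifications invented for the example, as are the class names `LocalFairScheduler` and `GlobalScheduler`.

```python
from collections import defaultdict
from typing import Dict, List


class LocalFairScheduler:
    """One per datacenter: splits cores evenly among tenants with queued jobs."""

    def __init__(self, name: str, cores: int):
        self.name = name
        self.cores = cores  # assumed to be greater than zero
        self.queues: Dict[str, List[str]] = defaultdict(list)  # tenant -> job ids

    def pending(self) -> int:
        """Total number of queued jobs across all tenants."""
        return sum(len(queue) for queue in self.queues.values())

    def submit(self, tenant: str, job_id: str) -> None:
        self.queues[tenant].append(job_id)

    def shares(self) -> Dict[str, int]:
        """Equal core share per active tenant; leftover cores go to the first tenants."""
        active = sorted(t for t, queue in self.queues.items() if queue)
        if not active:
            return {}
        base, extra = divmod(self.cores, len(active))
        return {t: base + (1 if i < extra else 0) for i, t in enumerate(active)}


class GlobalScheduler:
    """Routes each submitted job to the datacenter with the lowest load per core."""

    def __init__(self, datacenters: List[LocalFairScheduler]):
        self.datacenters = datacenters

    def submit(self, tenant: str, job_id: str) -> str:
        target = min(self.datacenters, key=lambda dc: dc.pending() / dc.cores)
        target.submit(tenant, job_id)
        return target.name
```

With this rule, a tenant that is alone in a datacenter receives all of its cores, which is one possible reading of "burst cores"; as soon as other tenants have jobs queued there, the shares shrink toward an equal split.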

Conclusion

Big Data is too complex
• Needs Simplicity
• Machete vs. Swiss Army Knife (Feature Creep)

IT is changing
• The value of Software itself is decreasing
• Operation is the key

Treasure Data = Cloud + Big Data
• Currently Focusing on Big Data Reporting
• Instant Value with Simple Interface

We’re hiring top talent; please contact me :)

Appendix

Big Data Market Growth

Big Data Revenue Breakdown (average of IDC, Gartner and Wikibon stats)

CAGR 38%

“More than half a billion dollars in venture capital has been invested in new big data technology.” - Dan Vesset, IDC

“In 2012…BI and Analytics are rated #1 priorities.” - Ravi Kalakota, Gartner

“Big Data is the new definitive source of competitive advantage across all industries.” - Jeff Kelly, Wikibon

Big Data Situation

[Chart: Customer Value over Time. Labels: Treasure Data; AWS; EMR; Redshift; On-premise solutions; Software A; Software B; Sign-up or PO; Obsolescence over time]

Treasure Data Service Architecture

[Diagram labels: User; Apache; App; Other data sources; RDBMS; Treasure Data columnar data warehouse; Query Processing Cluster; Query API; HIVE, PIG (to be supported); JDBC, REST; MAPREDUCE JOBS; td-command; BI apps]

Our Own Open Source technologies

We are open source natives and proud of our heritage. We’ve contributed to Hibernate, Hadoop, Cassandra, Memcached, KDE, MongoDB among others. Our product reflects our deep commitment to the open-source community and is built on top of open source software we’ve authored and open sourced.

• Fluentd - a popular data collector daemon written in Ruby. www.fluentd.org (leading users: SlideShare/LinkedIn, One Kings Lane)
• MessagePack - a fast, compact serializer. www.msgpack.org (leading users: Pinterest, Redis)

[Diagram labels: Substantial commitment (Code, Packaging, Documentation, Sponsorship); Tech marketing, Possible lead gen]

Example in Web Industry

Example Use Case – MySQL to TD

Big Data for the Rest of Us

www.treasure-data.com | @TreasureData
