Couchbase Server and Spark Machine Learning Meetup

Machine Learning Demo: Spark & Couchbase. Will Gardella, Product Manager, @WillGardella

Upload: will-gardella

Posted on 15-Apr-2017

TRANSCRIPT

Page 1: Couchbase Server and Spark Machine Learning Meetup

Machine Learning Demo

Spark & Couchbase

Will Gardella, Product Manager

@WillGardella

Page 2: Couchbase Server and Spark Machine Learning Meetup

©2016 Couchbase Inc. 2

Agenda
• Technologies – Spark & Couchbase
• ML Use Cases
• Demo

Get Couchbase 4.5 Enterprise Edition: http://www.couchbase.com/nosql-databases/downloads

Sample code on GitHub (Word2VecExample.scala): https://github.com/couchbaselabs/couchbase-spark-samples

Page 3: Couchbase Server and Spark Machine Learning Meetup

Spark

Page 4: Couchbase Server and Spark Machine Learning Meetup

Spark – not slowing yet

Source: http://stackoverflow.com/research/developer-survey-2016

Page 5: Couchbase Server and Spark Machine Learning Meetup

Yep, Spark is popular

Page 6: Couchbase Server and Spark Machine Learning Meetup

Hello, Spark!

Fast, general engine for big data processing, with libraries for advanced analytics

Page 7: Couchbase Server and Spark Machine Learning Meetup

Spark

Fast
– Up to 100x faster than MapReduce (MR) in memory, 10x faster on disk

Sophisticated
– Powerful primitives – not just MR
– Advanced algorithms: graph, machine learning

Developer Convenience
– Well-designed APIs in Java, Scala, Python, R
– Supports SQL, DataFrames, Datasets and many other formats
– Interactive shell (REPL), standalone mode

Batch & Streaming

Page 8: Couchbase Server and Spark Machine Learning Meetup

Couchbase

Page 9: Couchbase Server and Spark Machine Learning Meetup

Couchbase addresses the needs of Digital Economy businesses

Page 10: Couchbase Server and Spark Machine Learning Meetup

Combines the flexibility of JSON, power of SQL, and scalability of NoSQL

Develop with Agility
• Flexible JSON data model
• Dynamic schema support
• Powerful query language (N1QL) extends SQL to JSON

Operate at Any Scale
• Sub-millisecond latency at scale
• Elastic scaling on commodity servers
• High availability

Couchbase Server – the operational DBMS for web, mobile & IoT

Page 11: Couchbase Server and Spark Machine Learning Meetup

Achieving scale & availability with Couchbase

Scale cluster online with growing application needs, on demand

Build always available apps with replication & failover

Remove programming complexity by pushing sharding to the database

Page 12: Couchbase Server and Spark Machine Learning Meetup

Achieve Global Data Distribution and HA/DR

Built-in Cross Data Center Replication (XDCR)

Page 13: Couchbase Server and Spark Machine Learning Meetup

N1QL access to JSON

N1QL: next-generation NoSQL query language

• SELECT … FROM … JOIN … WHERE … LIKE … GROUP BY, etc.
• Powerful extensions for JSON: (un)folding of nested structures with NEST, UNNEST
• Array handling: EVERY/ANY … IN array SATISFIES
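To make the UNNEST and ANY … SATISFIES items concrete, here is a small plain-Python analogue of what those N1QL constructs compute. The sample documents, the field names (`name`, `orders`, `sku`, `total`) and the `customers` keyspace named in the comments are invented for illustration. This sketches the semantics only and does not talk to Couchbase.

```python
# Hypothetical documents, standing in for JSON documents in a bucket.
customers = [
    {"name": "Ada", "orders": [{"sku": "A1", "total": 40}, {"sku": "B2", "total": 120}]},
    {"name": "Bob", "orders": [{"sku": "C3", "total": 15}]},
    {"name": "Cy", "orders": []},
]


def unnest_orders(docs):
    """Rough analogue of:
        SELECT c.name, o.sku FROM customers c UNNEST c.orders o
    Each element of the nested array becomes its own output row. Like the
    default (inner) UNNEST, a document with an empty array yields no rows."""
    return [{"name": d["name"], "sku": o["sku"]} for d in docs for o in d["orders"]]


def any_order_over(docs, limit):
    """Rough analogue of:
        SELECT c.name FROM customers c
        WHERE ANY o IN c.orders SATISFIES o.total > <limit> END
    A document qualifies if at least one array element passes the test."""
    return [d["name"] for d in docs if any(o["total"] > limit for o in d["orders"])]


if __name__ == "__main__":
    print(unnest_orders(customers))
    print(any_order_over(customers, 100))
```

The point of the comparison is that N1QL expresses these nested-array operations declaratively in the query, so the application does not have to flatten documents itself.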

Page 14: Couchbase Server and Spark Machine Learning Meetup

Couchbase & Spark

Page 15: Couchbase Server and Spark Machine Learning Meetup

Damn it Jim, I’m a big data processing engine, not a database!

Page 16: Couchbase Server and Spark Machine Learning Meetup

NoSQL + Spark use cases

Operations
• Catalog
• Customer 360 + IoT
• Personalization
• Mobile applications

Analysis
• Recommendations
• Next-gen data warehousing
• Predictive analytics
• Fraud detection

Page 17: Couchbase Server and Spark Machine Learning Meetup

Big Data at a Glance

                       Couchbase (OPERATIONAL)     Spark (ANALYTICAL)
Use cases              Operational; Web / Mobile   Analytics; Machine Learning
Processing mode        Online; Ad Hoc              Ad Hoc; Batch
Low latency =          < 1 ms ops                  Seconds
Performance            Highly predictable          Variable
Users are typically…   Millions of customers       100s of analysts or data scientists
                       Memory-centric              Memory-centric
Big data =             10s of terabytes            Petabytes (?)

Page 18: Couchbase Server and Spark Machine Learning Meetup

Couchbase Spark Connector

Features
• Automatic cluster & resource management
• Create RDDs from KV, N1QL, Views
• Create DStreams from DCP feeds
• Persist RDDs and DStreams
• Support for Datasets, DataFrames and Spark SQL

Page 19: Couchbase Server and Spark Machine Learning Meetup

Couchbase & Spark for Machine Learning

• Data scientists train machine learning models
• Load results into Couchbase so end users can interact with them online
• Examples include recommendations for content and products, and flagging fraud or spam

Diagram labels: Hadoop, Data Warehouse, Historical Data, Machine Learning Models
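The workflow on this slide (train offline on historical data, then load the results into an operational store for online lookup) can be sketched in plain Python. This is a toy stand-in, not the demo's code: a simple co-occurrence count plays the role of the trained model, a dict plays the role of the Couchbase bucket, and every name here (`train_recommendations`, `load_results`, `recommend`, the `rec::` key prefix, the sample baskets) is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def train_recommendations(baskets, top_n=2):
    """Offline step: count how often items appear together and keep, for each
    item, its top_n most frequent companions (ties broken alphabetically)."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co_counts[a][b] += 1
    return {
        item: [other for other, _ in
               sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for item, counts in co_counts.items()
    }


def load_results(kv_store, recommendations):
    """Load step: write one small document per item, keyed for direct lookup."""
    for item, recs in recommendations.items():
        kv_store["rec::" + item] = {
            "type": "recommendation",
            "item": item,
            "recommended": recs,
        }


def recommend(kv_store, item):
    """Online step: a single key lookup; no model is evaluated per request."""
    doc = kv_store.get("rec::" + item)
    return doc["recommended"] if doc else []


# Hypothetical historical data and an in-memory stand-in for the bucket.
history = [
    ["beer", "chips"],
    ["beer", "chips", "salsa"],
    ["chips", "salsa"],
    ["beer", "nuts"],
]
kv_store = {}
load_results(kv_store, train_recommendations(history))

if __name__ == "__main__":
    print(recommend(kv_store, "beer"))
```

The design choice the slide points at is the split itself: the expensive computation runs in batch, while the user-facing path is a cheap read by key.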

Page 20: Couchbase Server and Spark Machine Learning Meetup

DEMO TIME!

Page 21: Couchbase Server and Spark Machine Learning Meetup

Learn More - Couchbase Spark Connector

• Couchbase Spark Connector source: https://github.com/couchbase/couchbase-spark-connector
• Code samples: https://github.com/couchbaselabs/couchbase-spark-samples
• Talk, "Spark with Couchbase to Electrify your Data Processing": https://youtu.be/sBnAf7gAfLc
• Market Basket Analysis sample app (Avalon): https://github.com/Avalon-Consulting-LLC/couchbase-spark-mba

Page 22: Couchbase Server and Spark Machine Learning Meetup

Questions?

Will Gardella, Product Manager

@WillGardella