Couchbase Server and Spark Machine Learning Meetup

Machine Learning Demo: Spark & Couchbase
Will Gardella, Product Manager

©2016 Couchbase Inc.

Agenda
• Technologies – Spark & Couchbase
• ML Use Cases
• Demo

Get Couchbase 4.5 Enterprise Edition: http://www.couchbase.com/nosql-databases/downloads
Sample code on GitHub (Word2VecExample.scala): https://github.com/couchbaselabs/couchbase-spark-samples



Spark – not slowing yet

Source: http://stackoverflow.com/research/developer-survey-2016



Yep, Spark is popular


Hello, Spark!

A fast, general engine for big data processing, with libraries for advanced analytics


Spark

Fast
– Up to 100x faster than MapReduce (MR) in memory, 10x faster on disk

Sophisticated
– Powerful primitives, not just MapReduce
– Advanced algorithms: graph processing, machine learning

Developer convenience
– Well-designed APIs in Java, Scala, Python, R
– Supports SQL, DataFrames, Datasets and many other data formats
– Interactive shell (REPL), standalone mode

Batch & streaming

Couchbase


Couchbase addresses the needs of Digital Economy businesses


Combines the flexibility of JSON, power of SQL, and scalability of NoSQL

Develop with Agility
• Flexible JSON data model
• Dynamic schema support
• Powerful query language (N1QL) extends SQL to JSON

Operate at Any Scale
• Sub-millisecond latency at scale
• Elastic scaling on commodity servers
• High availability

Couchbase Server – the operational DBMS for web, mobile & IoT


Achieving scale & availability with Couchbase


Scale cluster online with growing application needs, on demand

Build always-available apps with replication & failover

Remove programming complexity by pushing sharding to the database
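Pushing sharding into the database works, in Couchbase, by hashing each document key to one of a fixed number of vBuckets (1024 by default on Linux), and a cluster map assigns vBuckets to nodes. The sketch below illustrates that idea in plain Scala. Two parts are assumptions: the exact CRC32 bit manipulation is modeled on what Couchbase client libraries are commonly described as doing and should be checked against a real SDK, and the round-robin `clusterMap` is hypothetical (the real cluster manager also places replicas).

```scala
import java.util.zip.CRC32

// Sketch of database-side hash sharding in the spirit of Couchbase vBuckets.
// ASSUMPTION: the key -> vBucket formula approximates the CRC32-based mapping
// used by Couchbase clients; verify the exact bit operations against an SDK.
object VBucketSketch {
  // Default number of vBuckets per bucket on Linux.
  val NumVBuckets: Int = 1024

  /** Map a document key to a vBucket id in [0, numVBuckets); count must be a power of two. */
  def vBucketFor(key: String, numVBuckets: Int = NumVBuckets): Int = {
    require(numVBuckets > 0 && (numVBuckets & (numVBuckets - 1)) == 0,
      "vBucket count must be a power of two")
    val crc = new CRC32()
    crc.update(key.getBytes("UTF-8"))
    val hashed = (crc.getValue >> 16) & 0x7fff // keep 15 bits of the checksum
    (hashed & (numVBuckets - 1)).toInt         // mask down to the vBucket range
  }

  /** HYPOTHETICAL cluster map: vBucket id -> node, assigned round-robin. */
  def clusterMap(nodes: Seq[String], numVBuckets: Int = NumVBuckets): Vector[String] =
    Vector.tabulate(numVBuckets)(vb => nodes(vb % nodes.size))

  def main(args: Array[String]): Unit = {
    val threeNodes = clusterMap(Seq("node-a", "node-b", "node-c"))
    for (key <- Seq("user::1001", "user::1002", "order::42")) {
      val vb = vBucketFor(key)
      println(s"$key -> vBucket $vb -> ${threeNodes(vb)}")
    }
  }
}
```

The design choice that makes online scaling possible is visible here: `vBucketFor` never depends on the number of nodes, so adding a node changes only the cluster map (which vBuckets live where), not which vBucket any key belongs to. The application never computes shard placement itself.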


Achieve Global Data Distribution and HA/DR


Built-in Cross Data Center Replication (XDCR)


N1QL access to JSON

N1QL: next-generation NoSQL query language
• SELECT … FROM … JOIN … WHERE … LIKE … GROUP BY, etc.
• Powerful extensions for JSON
  – (Un)folding of nested structures with NEST, UNNEST
  – Array handling: EVERY/ANY … IN array SATISFIES
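To make the JSON extensions concrete, here is a minimal sketch of what UNNEST and ANY/EVERY … SATISFIES compute, written as plain Scala over in-memory data rather than as a call to Couchbase. The order/items document shape, the field names and the sample values are hypothetical; each function's comment shows the N1QL query it imitates.

```scala
// Pure-Scala illustration of N1QL UNNEST and ANY/EVERY ... SATISFIES semantics.
// HYPOTHETICAL document shape: orders, each with an embedded array of items.
object N1qlSemanticsSketch {
  final case class Item(sku: String, qty: Int)
  final case class Order(id: String, items: List[Item])

  // Hypothetical sample documents standing in for JSON stored in a bucket.
  val sampleOrders: List[Order] = List(
    Order("o1", List(Item("A", 1), Item("B", 3))),
    Order("o2", List(Item("C", 2))),
    Order("o3", List(Item("D", 1)))
  )

  // SELECT o.id, i.sku FROM orders o UNNEST o.items AS i
  // UNNEST emits one row per (parent document, array element) pair.
  def unnestSkus(orders: List[Order]): List[(String, String)] =
    for {
      o <- orders
      i <- o.items
    } yield (o.id, i.sku)

  // SELECT o.id FROM orders o WHERE ANY i IN o.items SATISFIES i.qty > 1 END
  // ANY keeps a document if at least one array element matches.
  def anyMatch(orders: List[Order])(p: Item => Boolean): List[String] =
    orders.filter(_.items.exists(p)).map(_.id)

  // SELECT o.id FROM orders o WHERE EVERY i IN o.items SATISFIES i.qty > 1 END
  // EVERY keeps a document only if all array elements match.
  def everyMatch(orders: List[Order])(p: Item => Boolean): List[String] =
    orders.filter(_.items.forall(p)).map(_.id)

  def main(args: Array[String]): Unit = {
    println(unnestSkus(sampleOrders))
    println(anyMatch(sampleOrders)(_.qty > 1))
    println(everyMatch(sampleOrders)(_.qty > 1))
  }
}
```

The sample data deliberately has no empty `items` arrays, so the sketch does not take a position on how N1QL treats EVERY over an empty array.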

Couchbase & Spark


Damn it Jim, I’m a big data processing engine, not a database!


NoSQL + Spark use cases

Operations / Analysis
• Recommendations
• Next-gen data warehousing
• Predictive analytics
• Fraud detection
• Catalog
• Customer 360 + IoT
• Personalization
• Mobile applications


Big Data at a Glance

                        Couchbase (operational)       Spark (analytical)
Use cases               Operational, web / mobile     Analytics, machine learning
Processing mode         Online, ad hoc                Ad hoc, batch
Latency                 Low: < 1 ms ops               Seconds
Performance             Highly predictable            Variable
Users are typically     Millions of customers         100s of analysts or data scientists
Memory model            Memory-centric                Memory-centric
Big data                10s of terabytes              Petabytes (?)


Couchbase Spark Connector

Features
• Automatic cluster & resource management
• Create RDDs from key-value (KV) lookups, N1QL queries and Views
• Create DStreams from DCP (Database Change Protocol) feeds
• Persist RDDs and DStreams
• Support for Datasets, DataFrames and Spark SQL


Couchbase & Spark for Machine Learning

[Diagram: Hadoop, Data Warehouse, Historical Data, Machine Learning Models]

• Data scientists train machine learning models
• Load results into Couchbase so end users can interact with them online
• Examples include recommendations for content and products, and flagging fraud or spam
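The "train offline, serve online" pattern can be sketched without a cluster: a batch step turns model output into small per-key result documents, and the online path is a single key lookup. In this plain-Scala sketch the 3-dimensional vectors are made-up toy values (not the output of any trained Word2Vec model), a `Map` stands in for a Couchbase bucket, and the `synonyms::<word>` key scheme is hypothetical. Cosine similarity is used as the neighbour measure, a common choice for word vectors.

```scala
// Sketch of precomputing model results in batch and serving them by key.
// ASSUMPTIONS: toy vectors, a Map standing in for a bucket, hypothetical key scheme.
object PrecomputeAndServeSketch {
  // Made-up vectors; a real pipeline would take these from a trained model.
  val vectors: Map[String, Vector[Double]] = Map(
    "couchbase" -> Vector(0.9, 0.1, 0.0),
    "database"  -> Vector(0.8, 0.2, 0.1),
    "spark"     -> Vector(0.1, 0.9, 0.2),
    "hadoop"    -> Vector(0.2, 0.8, 0.3),
    "banana"    -> Vector(0.0, 0.1, 0.9)
  )

  /** Cosine similarity; returns 0.0 if either vector has zero length. */
  def cosine(a: Vector[Double], b: Vector[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  /** Batch step: each word's k most similar other words, keyed by a hypothetical document id. */
  def precompute(k: Int): Map[String, List[String]] =
    vectors.map { case (word, v) =>
      val neighbours = vectors.toList
        .filter { case (other, _) => other != word }
        .sortBy { case (_, u) => -cosine(v, u) } // most similar first
        .take(k)
        .map { case (other, _) => other }
      s"synonyms::$word" -> neighbours
    }

  /** Online step: one key lookup against the loaded results. */
  def lookup(store: Map[String, List[String]], word: String): List[String] =
    store.getOrElse(s"synonyms::$word", Nil)

  def main(args: Array[String]): Unit = {
    val store = precompute(k = 2)
    println(lookup(store, "couchbase"))
    println(lookup(store, "spark"))
  }
}
```

The expensive similarity search runs once per word in the batch step; end users only ever trigger a key-based read, which is the access pattern the deck associates with Couchbase's low-latency operational side.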


DEMO TIME!


Learn More: Couchbase Spark Connector
• Couchbase Spark Connector source: https://github.com/couchbase/couchbase-spark-connector
• Code samples: https://github.com/couchbaselabs/couchbase-spark-samples
• Talk, "Spark with Couchbase to Electrify your Data Processing": https://youtu.be/sBnAf7gAfLc
• Market Basket Analysis sample app (Avalon): https://github.com/Avalon-Consulting-LLC/couchbase-spark-mba

Questions?

Will Gardella, Product Manager
