hugfr spark & riak -20160114_hug_france


Upload: hug-france

Post on 22-Jan-2018


TRANSCRIPT

Page 1: Hugfr  SPARK & RIAK -20160114_hug_france

SPARK & RIAK

INTRODUCTION TO THE SPARK-RIAK-CONNECTOR

LATERALTHOUGHTS

Page 2: Hugfr  SPARK & RIAK -20160114_hug_france

Me, Myself & I

Associate at LateralThoughts.com

Scala, Java, Python Developer

Data Engineer @ Axa & Carrefour

Apache Spark Trainer with Databricks


Page 3: Hugfr  SPARK & RIAK -20160114_hug_france

And the Other One …

Director Sales @ Basho Technologies

(Basho make Riak)

Formerly at MySQL France

Co-Founder MariaDB

Funny Accent

Page 4: Hugfr  SPARK & RIAK -20160114_hug_france

Quick Introduction …

2011: Creators of Riak

Riak KV: NoSQL key value database

Riak S2: Large Object Storage

2015: New Products

Basho Data Platform: Integrated NoSQL databases, caching, in-memory analytics, and search

Riak TS: NoSQL Time Series database

120+ employees

Global Offices: Seattle (HQ), Washington DC, London, Paris, Tokyo

300+ Enterprise customers, 1/3 of the Fortune 50

Page 5: Hugfr  SPARK & RIAK -20160114_hug_france
Page 6: Hugfr  SPARK & RIAK -20160114_hug_france

PRIORITIZED NEEDS

High Availability - Critical Data

High Scale – Heavy Reads & Writes

Geo Locality – Multiple Data Centers

Operational Simplicity – Resources Don’t Scale as Clusters

Data Accuracy – Write Conflict Options

RIAK S2 USE CASES

Large Object Store, Content Distribution

Web & Cloud Services, Active Archives

RIAK KV USE CASES

User Data, Session Data, Profile Data

Real-time Data, Log Data

RIAK TS USE CASES

IoT/Devices, Financial/Economic

Scientific Observations, Log Data

Page 7: Hugfr  SPARK & RIAK -20160114_hug_france

The Evolution of NoSQL

Unstructured Data Platforms

Multi-Model Solutions

Point Solutions

Page 8: Hugfr  SPARK & RIAK -20160114_hug_france

Basho Data Platform …

Page 9: Hugfr  SPARK & RIAK -20160114_hug_france

ABOUT SPARK & RIAK

Page 10: Hugfr  SPARK & RIAK -20160114_hug_france

Spark & Riak

Disclaimer: the following presentation uses:

Spark v1.5.2

Spark-Riak-Connector v1.1.0

Page 11: Hugfr  SPARK & RIAK -20160114_hug_france

Pre-Requisites

To use the Spark Riak Connector, as of now, you need to build it yourself:

Clone https://github.com/basho/spark-riak-connector

`git checkout v1.1.0`

`mvn clean install`

Page 12: Hugfr  SPARK & RIAK -20160114_hug_france

Bootstrapped project
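
The project skeleton on this slide was an image and is not in the transcript. As a rough sketch only, an sbt build for such a project might look like the following. The connector coordinates (`com.basho.riak` / `spark-riak-connector`) and the Scala version are assumptions to verify against the connector's own `pom.xml`; the project name is borrowed from the example repository linked on the last slide.

```scala
// build.sbt: a sketch under stated assumptions, not the speaker's actual build file
name := "spark-riak-example"

// Assumed: Scala 2.10, the default for pre-built Spark 1.5.2 distributions
scalaVersion := "2.10.5"

// `mvn clean install` publishes the connector to the local Maven repository,
// so sbt has to be told to resolve from there
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.5.2" % "provided",
  // Assumed coordinates: check groupId/artifactId in the connector's pom.xml
  "com.basho.riak" % "spark-riak-connector" % "1.1.0"
)
```

The `provided` scope suits a job submitted to an existing Spark cluster; for running directly from sbt it would need to be dropped.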

Page 13: Hugfr  SPARK & RIAK -20160114_hug_france

Reading from Riak

Connect to a Riak KV Cluster from Spark

Query it:

Full Scan

Using Keys

Using secondary indexes (2i)

Page 14: Hugfr  SPARK & RIAK -20160114_hug_france

Connecting to Riak
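
The code for this slide did not survive in the transcript. A minimal sketch of what connecting could look like: the connector is configured through ordinary Spark configuration entries. The property name `spark.riak.connection.host` is recalled from the connector's README and should be treated as an assumption for v1.1.0; the host and port (`127.0.0.1:8087`, Riak's default protocol buffers port) and the object name `ConnectToRiak` are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ConnectToRiak {
  // Builds a SparkConf that points the connector at a single Riak node.
  // "spark.riak.connection.host" is an assumed property name (see the connector README).
  def riakConf(riakHost: String): SparkConf =
    new SparkConf()
      .setAppName("spark-riak-example")
      .setMaster("local[*]") // local mode, for experimentation only
      .set("spark.riak.connection.host", riakHost)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(riakConf("127.0.0.1:8087"))
    // Print the setting back to confirm what the connector will see
    println(sc.getConf.get("spark.riak.connection.host"))
    sc.stop()
  }
}
```

No connection to Riak is opened here; the connector only reads this setting once a Riak-backed RDD is actually evaluated.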

Page 15: Hugfr  SPARK & RIAK -20160114_hug_france

Loading data from Riak

On your Spark Context, you can use:

`riakBucket[V](bucketName: String): RiakRDD[V]`

`riakBucket[V](bucketName: String, bucketType: String): RiakRDD[V]`

`riakBucket[K, V](bucketName: String, convert: (Location, RiakObject) => (K, V)): RiakRDD[(K, V)]`

Page 16: Hugfr  SPARK & RIAK -20160114_hug_france

add a query, otherwise…

Page 17: Hugfr  SPARK & RIAK -20160114_hug_france

Find all:

Find by key(s):
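
The two queries on this slide were shown as images. A hedged sketch of how they might be written: `riakBucket` comes from the signatures listed earlier, while `queryAll` and `queryBucketKeys` are method names recalled from the connector's README and should be checked against v1.1.0. The bucket name `users`, the key format and the helper `userKeys` are hypothetical.

```scala
import com.basho.riak.spark._ // implicits that add the riak* methods to SparkContext
import org.apache.spark.{SparkConf, SparkContext}

object ReadFromRiak {
  // Hypothetical key scheme, used only to build the key list below
  def userKeys(ids: Seq[Int]): Seq[String] = ids.map(id => s"user-$id")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("read-from-riak")
      .setMaster("local[*]")
      .set("spark.riak.connection.host", "127.0.0.1:8087") // assumed property name
    val sc = new SparkContext(conf)

    // Find all: full scan of the bucket, values read as raw strings
    val all = sc.riakBucket[String]("users").queryAll()

    // Find by key(s): only the listed keys are fetched
    val some = sc.riakBucket[String]("users").queryBucketKeys(userKeys(Seq(1, 2)): _*)

    println(all.count())
    some.collect().foreach(println)
    sc.stop()
  }
}
```

Running this requires a reachable Riak KV node; nothing is fetched until `count()` or `collect()` triggers the job.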

Page 18: Hugfr  SPARK & RIAK -20160114_hug_france

Implicits that will give you the riak* methods

Page 19: Hugfr  SPARK & RIAK -20160114_hug_france

Reading from Riak

Using case classes

Using Secondary Indexes
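
Neither example from this slide is in the transcript. As an illustration only, the sketch below reads values into a case class and restricts the read with a secondary-index range. It assumes the connector can deserialize JSON values into a case class and that `query2iRange(index, from, to)` exists with this shape in v1.1.0 (both recalled from the connector's documentation, not confirmed by the slides). The `User` class, the `users` bucket and the `age` index are hypothetical.

```scala
import com.basho.riak.spark._
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record shape; values in the bucket are assumed to be JSON objects
case class User(name: String, age: Int)

object QueryBy2i {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("query-by-2i")
      .setMaster("local[*]")
      .set("spark.riak.connection.host", "127.0.0.1:8087") // assumed property name
    val sc = new SparkContext(conf)

    // Using case classes + secondary indexes (2i):
    // only objects whose "age" index falls in the range are read
    val adults = sc.riakBucket[User]("users").query2iRange("age", 18L, 65L)

    adults.map(_.name).collect().foreach(println)
    sc.stop()
  }
}
```

A 2i query only returns objects that were written with the corresponding index entry, so the index has to be set when the data is stored.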

Page 20: Hugfr  SPARK & RIAK -20160114_hug_france

Basic I/O
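
The I/O example on this slide was an image. A minimal write-then-read sketch, assuming the connector's `saveToRiak` method is available on RDDs through the same implicits and that, for an RDD of pairs, the first element is used as the Riak key (behaviour recalled from the connector's documentation, to be verified for v1.1.0). The bucket name `scores` and the sample data are hypothetical.

```scala
import com.basho.riak.spark._
import org.apache.spark.{SparkConf, SparkContext}

object WriteToRiak {
  // Hypothetical sample data: (key, value) pairs
  val sampleScores: Seq[(String, Int)] = Seq("user-1" -> 42, "user-2" -> 17)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("write-to-riak")
      .setMaster("local[*]")
      .set("spark.riak.connection.host", "127.0.0.1:8087") // assumed property name
    val sc = new SparkContext(conf)

    // Write: each pair is assumed to be stored under its first element as the key
    sc.parallelize(sampleScores).saveToRiak("scores")

    // Read back: full scan of the same bucket, values as raw strings
    val stored = sc.riakBucket[String]("scores").queryAll()
    println(stored.count())

    sc.stop()
  }
}
```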

Page 21: Hugfr  SPARK & RIAK -20160114_hug_france

Mapping Objects - Buckets

Page 22: Hugfr  SPARK & RIAK -20160114_hug_france
Page 23: Hugfr  SPARK & RIAK -20160114_hug_france

Adding fields during save

Page 24: Hugfr  SPARK & RIAK -20160114_hug_france

Spark Riak Connector - Roadmap

Better Integration with Riak TS

Enhanced DataFrames - based on Riak TS Schema APIs

Server-side aggregations and grouping - using TS SQL commands

Speed

Data Locality (partition RDDs according to replication in the cluster) - launch Spark executors on the same nodes where the data resides.

Better mapping from vnodes to Spark workers using coverage plan

Better support for Riak data types (CRDT) and Search queries

Today requires using Java Riak client APIs

Spark Streaming

Provide example and sample integration with Apache Kafka

Improve reliability using Riak for checkpoints and WAL

Add examples and documentation for Python support

DRAFT

Page 25: Hugfr  SPARK & RIAK -20160114_hug_france

Thank you

@ogirardot


https://github.com/ogirardot/spark-riak-example

https://speakerdeck.com/ogirardot/spark-and-riak-introduction-to-the-spark-riak-connector

@mcarney23


fr.basho.com