Scala, Apache Spark, The PlayFramework and Docker in IBM Platform as a Service

Soft-Shake 15 - Geneva @romeokienzler [email protected] Scala, Apache Spark, The PlayFramework, Docker and Platform as a Service

Upload: romeo-kienzler

Post on 12-Apr-2017

TRANSCRIPT

Page 1: Scala, Apache Spark, The PlayFramework and Docker in IBM Platform As A Service

Soft-Shake 15 - Geneva

@romeokienzler

[email protected]

Scala, Apache Spark, The PlayFramework, Docker and Platform as a Service

The Ingredients

NodeJS
NodeRED
Scala
The Play Framework
Apache Spark
Docker, DockerCompose, DockerSwarm
Platform as a Service powered by IBM Bluemix

NodeJS

Server-side JavaScript runtime framework
Open source
Very frequently used by startups
REACTIVE (see explanation on PlayFramework slide)

NodeRED

Open-source data integration framework
Supports visual programming
Very large set of connectors and extensions (> 400)
Created by IBM
Runs on top of NodeJS
Extensible through JavaScript

Scala

Invented at EPFL
Runs on top of the JVM
Open source, commercialized through Typesafe
Strong on the functional programming paradigm (nice for data analytics tasks)
Supports OOP as well

The PlayFramework

Written in Scala
Compatible with Scala and Java
Meant to build REACTIVE HTTP services by decoupling requests from threads through callback handlers
Used at LinkedIn, for example, and at a major company in Valais
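
The idea of decoupling requests from threads can be sketched with plain Scala Futures, without any Play dependency. Everything named here (ReactiveSketch, slowLookup, handle) is hypothetical and only illustrates the callback style; it is not code from the talk, and the sleep merely simulates I/O latency.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ReactiveSketch {
  // Hypothetical slow backend call (e.g. a database query); not part of the talk.
  def slowLookup(id: Int): Future[String] = Future {
    Thread.sleep(50) // simulated I/O latency, runs on a pool thread
    s"row-$id"
  }

  // The handler returns a Future immediately; the map callback builds
  // the response later, once the lookup has completed.
  def handle(id: Int): Future[String] =
    slowLookup(id).map(row => s"200 OK: $row")
}
```

The caller of handle is never blocked while the lookup runs. Await.result is useful for trying this out in a console, but a real service would hand the Future back to the framework instead of waiting on it.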

Apache Spark

Successor of MapReduce
Supports various data stores, e.g. HDFS, Swift, S3, ...
Forces you to use functional programming, and therefore yields highly parallelizable code
Programmable in Java, Scala and Python
The central data structure is the RDD (Resilient Distributed Dataset), which abstracts away the underlying storage architecture
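
Why the functional style parallelizes well can be shown with a small word count on plain Scala collections. This is a local sketch, not Spark code: FunctionalSketch and wordCounts are made-up names, and on an RDD the grouping step would typically be written as reduceByKey(_ + _).

```scala
object FunctionalSketch {
  // Pure function: no shared mutable state, so every line (and every
  // partition of lines) can be processed independently of the others.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // one element per word
      .filter(_.nonEmpty)
      .map(word => (word, 1))               // (key, 1) pairs
      .groupBy(_._1)                        // local stand-in for a shuffle by key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
}
```

Because each step only transforms its input and the final sum is associative, a cluster runtime is free to split the input and combine partial results in any order.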

Docker

Behavior similar to virtual machines
Based on the Linux kernel features cgroups and namespaces
Originally used LXC internally (since version 0.9 the default is Docker's own libcontainer)
In contrast to virtual machines, the runtime instances are called containers
Operating system processes run on the host system, but within a container they appear to be alone
A Docker container starts in < 100 ms and you can run hundreds of them on a single host system

DockerCompose

A way to define and run a multi-container topology
Topology defined in a single docker-compose.yml file
Individual containers serving different tiers can be scaled up/down

DockerSwarm

What if a single machine is too weak to run your topology?
Groups multiple nodes together to act as a single Docker node
Uses the same API as Docker on a standalone machine
In combination with DockerCompose you get a lightweight and ultra-fast scaling runtime

Platform as a Service through IBM Bluemix

Powered by CloudFoundry (OpenSource/OpenStandard)
Supports Docker, runs on DockerSwarm (with a container placement optimizer)
DockerCompose support by end of year
Supports virtual machines via OpenStack
More than 100 services (e.g. Hadoop, Spark, SWIFT, MongoDB, MySQL, Watson, ...)
Core runtime for this talk

Usecase

Get tweets from the public Twitter API (not the firehose)

Using NodeRED add sentiment analysis through an IBM Watson Service

Store tweets plus sentiment score in OpenStack Swift Service on Bluemix

Additionally store them in the HDFS Service on Bluemix

Using Apache Spark and Scala apply retrospective analysis

Using BigSQL, JQuery and the PlayFramework draw a realtime chart

Architecture – Get the tweets

Diagram: NodeRED writes the tweets to OpenStack Swift and Hadoop HDFS

Architecture – down stream analysis

Diagram components: OpenStack Swift, Hadoop HDFS, Spark service, BigSQL, IPython Notebook supporting Scala, CloudFoundry container running the PlayFramework REST service on the JVM, web browser running an AJAX application using jQuery

NodeRED Tweet ingestion & sentiment scoring

PlayFramework REST Service

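import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// connection is assumed to be an open java.sql.Connection created elsewhere in the controller
def data = Action.async {
  Future { // Action.async expects a Future[Result]; run the blocking JDBC call inside it
    val statement = connection.createStatement
    try {
      // Column aliases sit on the outer select so that getInt can look them up by name
      val resultSet = statement.executeQuery(
        """select count(*) as total,
          |  (select count(*) from tweetsift where UCASE(tweet) like '%IBM%') as IBM,
          |  (select count(*) from tweetsift where UCASE(tweet) like '%SOFTLAYER%') as softlayer
          |from tweetsift""".stripMargin)
      resultSet.next() // we expect exactly one row
      val total = resultSet.getInt("TOTAL")
      val ibm = resultSet.getInt("IBM")
      val softlayer = resultSet.getInt("SOFTLAYER")
      Ok(s"[$total,$ibm,$softlayer]") // JSON array read by the AJAX client
    } finally statement.close()
  }
}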

Preprocessed data using R service in Bluemix

JQuery AJAX WebApplication calling REST Service

View on the SWIFT explorer

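Apache Spark: access to the data in IBM Bluemix

// sc is the SparkContext (predefined in the Spark shell or notebook)
val tweets = sc.textFile("swift://softshake.spark/tmp_25573-tweets1126007960.csv")
val companies = sc.textFile("swift://softshake.spark/tmp_25573-companies-384438100.csv")

// Split each CSV line into trimmed cells (naive split, no quoted-field handling)
val tweetsHeaderAndRows = tweets.map(line => line.split(",").map(_.trim))
val tweetsHeader = tweetsHeaderAndRows.first
// Drop the header row by comparing the first cell
val tweetsData = tweetsHeaderAndRows.filter(_(0) != tweetsHeader(0))
// One Map per tweet: column name -> value
val tweetMaps = tweetsData.map(splits => tweetsHeader.zip(splits).toMap)
// The companies file has a single column; drop its header line
val companiesData = companies.filter(s => !s.equals("COMPANY_NAME_ID"))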
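
The header handling above can be tried without a cluster. The following local sketch applies the same steps to a plain Scala Seq; the object name CsvSketch and the three sample lines are made up for illustration, and only the column names TEXT and SCORE are taken from the talk's code.

```scala
object CsvSketch {
  // Made-up sample in the shape the talk's code expects: header row, then data rows.
  val lines: Seq[String] = Seq(
    "TEXT, SCORE",
    "IBM rocks, 0.9",
    "Softlayer was down, -0.4"
  )

  // Split every line into trimmed cells.
  val headerAndRows: Seq[Array[String]] = lines.map(_.split(",").map(_.trim))
  val header: Array[String] = headerAndRows.head
  // Drop the header by comparing first cells, as in the RDD version.
  val data: Seq[Array[String]] = headerAndRows.filter(_(0) != header(0))
  // One Map per row: column name -> cell value.
  val rowMaps: Seq[Map[String, String]] = data.map(cells => header.zip(cells).toMap)
}
```

Note that a plain split on "," breaks as soon as a tweet text itself contains a comma; real data would probably need a proper CSV parser.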

Calculating tweet frequency per company

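// Pair every tweet with every company and keep pairs where the tweet text mentions the company
val tweetsWithCompany = tweetMaps.cartesian(companiesData).filter(t =>
  t._1("TEXT").toLowerCase.contains(t._2.toLowerCase))
// (company, sentiment score) pairs
val companyAndScore = tweetsWithCompany.map(t => (t._2, t._1("SCORE").toDouble))
// Count tweets per company: key by the company name (t._1), not by the score
val companyFrequency = companyAndScore.map(t => (t._1, 1)).reduceByKey(_ + _)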
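
To see what a per-company tweet count looks like on concrete data, here is a local sketch on plain Scala collections. FrequencySketch and all sample values are made up; the for comprehension stands in for the cartesian product plus filter, and groupBy stands in for reduceByKey.

```scala
object FrequencySketch {
  // Made-up sample data; the column names TEXT and SCORE follow the talk's code.
  val tweetMaps: Seq[Map[String, String]] = Seq(
    Map("TEXT" -> "IBM rocks", "SCORE" -> "0.9"),
    Map("TEXT" -> "ibm and Softlayer", "SCORE" -> "0.2"),
    Map("TEXT" -> "nothing relevant", "SCORE" -> "0.0")
  )
  val companies: Seq[String] = Seq("IBM", "Softlayer")

  // Every (tweet, company) pair where the tweet text mentions the company,
  // compared case-insensitively.
  val tweetsWithCompany: Seq[(Map[String, String], String)] =
    for {
      tweet <- tweetMaps
      company <- companies
      if tweet("TEXT").toLowerCase.contains(company.toLowerCase)
    } yield (tweet, company)

  // Number of matching tweets per company.
  val companyFrequency: Map[String, Int] =
    tweetsWithCompany.groupBy(_._2).map { case (company, hits) => (company, hits.size) }
}
```

With this sample, IBM is mentioned in two tweets and Softlayer in one. The pairing step grows with the number of tweets times the number of companies, which is acceptable for a short company list but worth keeping in mind for larger ones.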

Wanna do it yourself?

IBM Cloud Free Tier (incl. Bluemix): http://ibm.biz/joinIBMCloud

24-120K CHF Cloud credits for startups [email protected]

*A*N*Y question [email protected]

Free usage for Students and Faculties [email protected]

Wanna hear more?

Nov 2nd. in Zurich: Apache Spark Advanced Meetup http://www.meetup.com/HackSessionsSwitzerland/events/225445919/?oc=evam

Nov 3rd. in Berne: - cloud computing - Apache spark - challenges in NG sequencing http://www.meetup.com/SwissLifeScience/events/225836187/?oc=evam

Nov 11th. in Lausanne: Introduction to Docker, Streamcomputing on ApacheSpark and InfoSphere Streams http://www.meetup.com/HackSessionsSwitzerland/events/225441845/?oc=evam

Some sessions will be streamed at: http://www.meetup.com/Cloud-Scale-Data-Science-virtual-UserGroup-worldwide/