apache zeppelin the missing component for the big data ecosystem

@doanduyhai #VoxxedVienna Apache Zeppelin the missing GUI for your BigData eco-system DuyHai DOAN Apache Cassandra Evangelist

Upload: duyhai-doan

Post on 07-Jan-2017


TRANSCRIPT

Page 1: Apache zeppelin the missing component for the big data ecosystem

@doanduyhai #VoxxedVienna

Apache Zeppelin, the missing GUI for your BigData eco-system

DuyHai DOAN, Apache Cassandra Evangelist

Page 2: Apache zeppelin the missing component for the big data ecosystem

Who am I?

Duy Hai DOAN, Cassandra technical advocate

•  talks, meetups, confs

•  open-source devs (Achilles, …)

•  OSS Cassandra point of contact

☞ @doanduyhai

Page 3: Apache zeppelin the missing component for the big data ecosystem

Datastax

•  Founded in April 2010

•  We contribute a lot to Apache Cassandra™

•  400+ customers (25 of the Fortune 100), 400+ employees

•  Headquartered in the San Francisco Bay Area

•  EU headquarters in London, offices in France and Germany

•  Datastax Enterprise = OSS Cassandra + extra features

Page 4: Apache zeppelin the missing component for the big data ecosystem

What is Apache Zeppelin?

Presentation

Architecture

Page 5: Apache zeppelin the missing component for the big data ecosystem

Zeppelin Presentation

Page 6: Apache zeppelin the missing component for the big data ecosystem


Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 7: Apache zeppelin the missing component for the big data ecosystem

Zeppelin Architecture

[Diagram: Zeppelin Server (Zeppelin Engine, Zeppelin Interpreter Factory) exposed through REST and WebSocket; Spark Interpreter Group (Spark, SparkSQL), Tajo Interpreter, Flink Interpreter, Cassandra Interpreter; each box labeled JVM]

Page 8: Apache zeppelin the missing component for the big data ecosystem

What does Zeppelin provide?

Front-end & display system for free

Generic back-end with REST APIs & WebSocket

Pluggable interpreter system

Task scheduler (à la CRON)

Page 9: Apache zeppelin the missing component for the big data ecosystem

Zeppelin UI Layout

Notebook

Paragraph

UI elements

Page 10: Apache zeppelin the missing component for the big data ecosystem


Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 11: Apache zeppelin the missing component for the big data ecosystem

Zeppelin Display System

Raw, Table, HTML, Angular with Scala

Available graphs

View modes

Dynamic form

Iframe export
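
As a rough illustration of the Table display: in Zeppelin, paragraph output that starts with %table is rendered as a table, with cells separated by tabs and rows separated by newlines. The sketch below only builds such a string; the class and method names (TableOutput, render) are made up for this example and are not part of Zeppelin.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds "%table" output text (not a Zeppelin class). */
final class TableOutput {

    /** Joins cells with tabs and rows with newlines, prefixed by "%table ". */
    static String render(List<String> header, List<List<String>> rows) {
        StringBuilder out = new StringBuilder("%table ");
        out.append(String.join("\t", header)).append('\n');
        for (List<String> row : rows) {
            out.append(String.join("\t", row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String output = render(
                Arrays.asList("name", "size"),
                Arrays.asList(
                        Arrays.asList("zeppelin", "100"),
                        Arrays.asList("spark", "200")));
        System.out.print(output);
    }
}
```

Cells that themselves contain tabs or newlines would break this layout, so a real interpreter would need to escape or replace them first.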

Page 12: Apache zeppelin the missing component for the big data ecosystem


Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 13: Apache zeppelin the missing component for the big data ecosystem

Interpreter to Front-End Streaming

Page 14: Apache zeppelin the missing component for the big data ecosystem

Interpreter to front-end streaming

[Diagram: Zeppelin Server (JVM), WebSocket, Interpreter (JVM)]

Page 15: Apache zeppelin the missing component for the big data ecosystem


Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 16: Apache zeppelin the missing component for the big data ecosystem

Interpreter system

Core interpreters

Third-party interpreters

Interpreters conf & usage

Page 17: Apache zeppelin the missing component for the big data ecosystem

Interpreter processing lifecycle

①  Receive input commands/data

•  as raw text

•  from form data

②  Process the input commands/data by the external back-end

③  Format the response using Zeppelin display system

④  Send response back to the Zeppelin engine
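
The four steps above can be sketched in a few lines of Java. This is a simplified model, not Zeppelin's Interpreter API: the class LifecycleSketch, its interpret method, and the back-end passed in as a function are all assumptions made for the example.

```java
import java.util.function.Function;

/** Hypothetical model of the four lifecycle steps; not Zeppelin's API. */
final class LifecycleSketch {

    /** Runs steps 1 to 4 for one paragraph of raw text. */
    static String interpret(String rawText, Function<String, String> backEnd) {
        // 1. Receive the input as raw text (form data would be merged in here).
        String command = rawText.trim();
        // 2. Let the external back-end process the command.
        String result = backEnd.apply(command);
        // 3. Format the response for the display system (HTML in this sketch).
        String formatted = "%html <pre>" + escape(result) + "</pre>";
        // 4. Hand the response back to the caller (the Zeppelin engine in the slides).
        return formatted;
    }

    /** Minimal HTML escaping so the result cannot inject markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // The back-end here is a stand-in that simply upper-cases its input.
        System.out.println(interpret("  select 1 < 2  ", String::toUpperCase));
    }
}
```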


Page 18: Apache zeppelin the missing component for the big data ecosystem

Core interpreters

•  Spark (Spark core, SparkSQL/DataFrame, PySpark)

•  Spark core = default (or %spark)

•  SparkSQL = %sql

•  Shell (%sh)

•  Markdown (%md)

•  AngularJS (%angular)

Page 19: Apache zeppelin the missing component for the big data ecosystem

Third-party interpreters

•  Hive

•  Phoenix

•  Tajo

•  Flink

•  Ignite

•  Lens

•  Cassandra

•  Geode

•  PostgreSQL

•  Kylin

•  ElasticSearch

Page 20: Apache zeppelin the missing component for the big data ecosystem


Interpreter conf & usage https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 21: Apache zeppelin the missing component for the big data ecosystem

Writing An Interpreter

How To

Simple interpreter example (AsciiDoc)

Complex interpreter example (Cassandra)

Page 22: Apache zeppelin the missing component for the big data ecosystem

Steps to write your own interpreter

•  Create a class that extends the Interpreter base class

•  Register it in a static block

•  Optionally define default config params

static {
    // Register the interpreter under its name
    Interpreter.register("MyInterpreterName", MyClassName.class.getName());
}

static {
    // Same registration, with default config params
    Interpreter.register("MyInterpreterName", MyClassName.class.getName(),
        new InterpreterPropertyBuilder()
            .add("property1", "default value", "Description of property1")
            .build());
}
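
Why a static block works for registration can be shown without Zeppelin at all: a Java static initializer runs once, when the class is first initialized. The sketch below uses a stand-in Registry class in place of Zeppelin's Interpreter.register; Registry, EchoInterpreter and RegistrationDemo are hypothetical names for illustration only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in registry for this example; NOT Zeppelin's Interpreter class. */
final class Registry {
    private static final Map<String, String> CLASS_BY_NAME = new ConcurrentHashMap<>();

    static void register(String name, String className) {
        CLASS_BY_NAME.put(name, className);
    }

    /** Returns the registered class name, or null if nothing is registered. */
    static String lookup(String name) {
        return CLASS_BY_NAME.get(name);
    }
}

/** Hypothetical interpreter that registers itself when the class is initialized. */
final class EchoInterpreter {
    static {
        Registry.register("echo", EchoInterpreter.class.getName());
    }

    /** Calling any static method forces class initialization, which runs the static block. */
    static void load() { }
}

final class RegistrationDemo {
    public static void main(String[] args) {
        // Nothing has touched EchoInterpreter yet, so it is not registered.
        System.out.println(Registry.lookup("echo"));
        EchoInterpreter.load();
        // After initialization the static block has run.
        System.out.println(Registry.lookup("echo"));
    }
}
```

The practical consequence is that something has to load the interpreter class before it shows up in the registry, which is presumably why the following slides list interpreter class names in the configuration.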

Page 23: Apache zeppelin the missing component for the big data ecosystem

To register your interpreter as default

•  Edit the enum ZeppelinConfiguration.ConfVars

•  Add your interpreter's FQCN to the ZEPPELIN_INTERPRETERS property

Page 24: Apache zeppelin the missing component for the big data ecosystem

To register your interpreter in config files

•  Create conf/zeppelin-site.xml from conf/zeppelin-site.xml.template

•  Add your interpreter's FQCN to the zeppelin.interpreters property

<property>
  <name>zeppelin.interpreters</name>
  <value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,com.me.MyNewInterpreter</value>
</property>
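
The property value is a comma-separated list of class names. How Zeppelin itself reads it is not shown in the slides; the sketch below only illustrates one tolerant way to split such a value, ignoring line breaks and stray whitespace. InterpreterListParser is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a comma-separated list of class names. */
final class InterpreterListParser {

    /** Splits on commas, trims each entry and drops empty ones. */
    static List<String> parse(String value) {
        List<String> classNames = new ArrayList<>();
        for (String entry : value.split(",")) {
            String name = entry.trim();
            if (!name.isEmpty()) {
                classNames.add(name);
            }
        }
        return classNames;
    }

    public static void main(String[] args) {
        String value = "org.apache.zeppelin.spark.SparkInterpreter,\n"
                + "  org.apache.zeppelin.markdown.Markdown,\n"
                + "  com.me.MyNewInterpreter\n";
        for (String className : parse(value)) {
            System.out.println(className);
        }
    }
}
```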

Page 25: Apache zeppelin the missing component for the big data ecosystem

Update interpreter pom.xml

Page 26: Apache zeppelin the missing component for the big data ecosystem

Update main pom.xml

Page 27: Apache zeppelin the missing component for the big data ecosystem

Simple AsciiDoc Interpreter

[Diagram: Zeppelin Server (Zeppelin Engine, JVM) and AsciiDoc Interpreter (JVM); steps ① to ④ labeled: raw text block, raw text block, converted to HTML, HTML output]

Page 28: Apache zeppelin the missing component for the big data ecosystem


Simple interpreter (AsciiDoc) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 29: Apache zeppelin the missing component for the big data ecosystem

Cassandra Interpreter Architecture

[Diagram: Zeppelin Server (JVM) and Cassandra Interpreter (JVM) exchange a raw text block (① ②); the interpreter sends async CQL statements to Cassandra through the Cassandra Java Driver and displays the results as HTML; ④ render HTML]
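
The "async CQL statements" idea can be sketched with plain CompletableFuture: start every statement without waiting for the previous one, then collect the results in order and render them as HTML. This does not use the Cassandra Java Driver; the execute function is a stand-in for the driver call, and AsyncStatementsSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch of asynchronous statement execution; no real driver involved. */
final class AsyncStatementsSketch {

    /** Runs all statements concurrently and renders the results as an HTML list. */
    static String runAll(List<String> statements, Function<String, String> execute) {
        // Submit every statement immediately; none waits for the previous one.
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String statement : statements) {
            pending.add(CompletableFuture.supplyAsync(() -> execute.apply(statement)));
        }
        // Collect the results in submission order.
        StringBuilder html = new StringBuilder("%html <ul>");
        for (CompletableFuture<String> result : pending) {
            html.append("<li>").append(result.join()).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    public static void main(String[] args) {
        // Stand-in for the driver: pretends each statement succeeded.
        Function<String, String> fakeExecute = statement -> statement + " OK";
        System.out.println(runAll(Arrays.asList("stmt1", "stmt2"), fakeExecute));
    }
}
```

The results are inserted into the HTML without escaping, which is acceptable only because this is a toy; real output would need to be escaped.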

Page 30: Apache zeppelin the missing component for the big data ecosystem

Cassandra Interpreter Commands

Native CQL statements

SELECT * FROM …;
INSERT INTO …;
…

Schema commands

DESCRIBE TABLE …;
DESCRIBE KEYSPACE …;
…

Prepared statements commands

@prepare …;
@bind …;
@remove_prepared …;

Help command

HELP;

Options commands

@consistency …;
@retryPolicy …;
@fetchSize …;
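
To make the command families above concrete, here is a small sketch that splits a paragraph on semicolons and classifies each statement by its prefix. It is not the parser used by the actual Cassandra interpreter; CassandraCommandSketch and its Kind enum are hypothetical, and the split is naive (it does not handle semicolons inside string literals).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical classifier for the command families listed on the slide. */
final class CassandraCommandSketch {

    enum Kind { OPTION, PREPARED_STATEMENT, SCHEMA, HELP, NATIVE_CQL }

    /** Naive split on ';' that drops empty fragments. */
    static List<String> split(String paragraph) {
        List<String> statements = new ArrayList<>();
        for (String part : paragraph.split(";")) {
            String statement = part.trim();
            if (!statement.isEmpty()) {
                statements.add(statement);
            }
        }
        return statements;
    }

    /** Classifies one statement by its leading keyword. */
    static Kind classify(String statement) {
        String s = statement.trim().toLowerCase(Locale.ROOT);
        if (s.startsWith("@prepare") || s.startsWith("@bind") || s.startsWith("@remove_prepared")) {
            return Kind.PREPARED_STATEMENT;
        }
        if (s.startsWith("@")) {
            return Kind.OPTION; // e.g. @consistency, @retryPolicy, @fetchSize
        }
        if (s.startsWith("describe ")) {
            return Kind.SCHEMA;
        }
        if (s.equals("help")) {
            return Kind.HELP;
        }
        return Kind.NATIVE_CQL;
    }

    public static void main(String[] args) {
        String paragraph = "@consistency QUORUM; DESCRIBE TABLE users; SELECT * FROM users;";
        for (String statement : split(paragraph)) {
            System.out.println(classify(statement) + " <- " + statement);
        }
    }
}
```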

Page 31: Apache zeppelin the missing component for the big data ecosystem


Complex interpreter (Cassandra) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 32: Apache zeppelin the missing component for the big data ecosystem

Cassandra Online Interpreter Docs

•  http://zeppelin.incubator.apache.org/docs/interpreter/cassandra.html

Page 33: Apache zeppelin the missing component for the big data ecosystem

Zeppelin Future

Roadmap

Page 34: Apache zeppelin the missing component for the big data ecosystem

Enterprise Ready

•  Apache Shiro authentication (ZEPPELIN-548)

•  Note authorization (PR #681)

•  Multi-tenancy

Page 35: Apache zeppelin the missing component for the big data ecosystem

Usability

•  UX improvement

•  Better table data support

•  Export data as CSV, etc. (PR #725, PR #714, PR #6, PR #89)

•  Table pagination …

Page 36: Apache zeppelin the missing component for the big data ecosystem

Pluggability

•  Pluggable visualization

•  Pluggable interpreter

•  Repository and registry for pluggable components

Page 37: Apache zeppelin the missing component for the big data ecosystem

More interpreters

Page 38: Apache zeppelin the missing component for the big data ecosystem

Q & A

Page 39: Apache zeppelin the missing component for the big data ecosystem

Thank You @doanduyhai

http://zeppelin.incubator.apache.org/