Apache Zeppelin: the missing component for the big data ecosystem

@doanduyhai #VoxxedVienna

Apache Zeppelin, the missing GUI for your Big Data ecosystem

DuyHai DOAN

Apache Cassandra Evangelist


Who Am I?

Duy Hai DOAN, Cassandra technical advocate
•  talks, meetups, confs
•  open-source devs (Achilles, …)
•  OSS Cassandra point of contact

☞ [email protected] ☞ @doanduyhai


DataStax

•  Founded in April 2010
•  We contribute a lot to Apache Cassandra™
•  400+ customers (25 of the Fortune 100), 400+ employees
•  Headquarters in the San Francisco Bay Area
•  EU headquarters in London, offices in France and Germany
•  DataStax Enterprise = OSS Cassandra + extra features


What is Apache Zeppelin?

•  Presentation
•  Architecture


Zeppelin Presentation


Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation


Zeppelin Architecture

Figure: Zeppelin architecture. Zeppelin Server (Zeppelin Engine, REST, WebSocket, Zeppelin Interpreter Factory) plus separate interpreter JVMs: Spark Interpreter Group (Spark, SparkSQL), Tajo Interpreter, Flink Interpreter, Cassandra Interpreter.


What does Zeppelin provide?

•  Front-end & display system for free
•  Generic back-end with REST APIs & WebSocket
•  Pluggable interpreter system
•  Task scheduler (à la CRON)


Zeppelin UI Layout

•  Notebook
•  Paragraph
•  UI elements



Zeppelin Display System

•  Raw, Table, HTML, Angular with Scala
•  Available graphs
•  View modes
•  Dynamic form
•  Iframe export
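Of the display types listed, Table is driven by plain text: interpreter output that starts with `%table` is rendered as a table, with columns separated by tabs and rows separated by newlines, the first row being the header. A minimal sketch of a helper that produces this format; `TableOutput` and `toTable` are hypothetical names, not part of Zeppelin.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: formats rows as Zeppelin "%table" output
// (tab-separated columns, newline-separated rows, header row first).
class TableOutput {

    static String toTable(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("%table ");
        sb.append(joinRow(header));
        for (List<String> row : rows) {
            sb.append('\n').append(joinRow(row));
        }
        return sb.toString();
    }

    // A tab or newline inside a cell would break the layout, so replace them with spaces.
    private static String joinRow(List<String> cells) {
        StringJoiner joiner = new StringJoiner("\t");
        for (String cell : cells) {
            joiner.add(cell.replace('\t', ' ').replace('\n', ' '));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String out = toTable(List.of("name", "age"),
                List.of(List.of("Alice", "30"), List.of("Bob", "25")));
        System.out.println(out);
    }
}
```

Because the format is plain text, any interpreter can feed the built-in table and graph views without touching front-end code.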



Interpreter to Front-End Streaming


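Figure: interpreter to front-end streaming. The Interpreter (in its own JVM) streams its output to the Zeppelin Server (separate JVM), which forwards it to the front-end over the WebSocket.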
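The point of this streaming path is that a long-running paragraph can show partial output while it is still executing, instead of only once it finishes. A minimal in-process sketch of the idea, assuming a simple listener callback stands in for the interpreter-to-server-to-WebSocket hop; `StreamingOutput` and `OutputListener` are hypothetical names, not Zeppelin's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical output buffer: keeps the whole paragraph output and hands every
// chunk to a listener as soon as the interpreter writes it.
class StreamingOutput {

    // Hypothetical stand-in for "forward this chunk to the browser over the WebSocket".
    interface OutputListener {
        void onAppend(String chunk);
    }

    private final StringBuilder buffer = new StringBuilder();
    private final OutputListener listener;

    StreamingOutput(OutputListener listener) {
        this.listener = listener;
    }

    void write(String chunk) {
        buffer.append(chunk);
        listener.onAppend(chunk); // pushed immediately, not when the paragraph finishes
    }

    String fullOutput() {
        return buffer.toString();
    }

    public static void main(String[] args) {
        List<String> pushed = new ArrayList<>();
        StreamingOutput out = new StreamingOutput(pushed::add);
        for (int i = 1; i <= 3; i++) {
            out.write("step " + i + " done\n"); // e.g. progress of a long-running job
        }
        System.out.println(pushed.size() + " chunks pushed");
        System.out.print(out.fullOutput());
    }
}
```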



Interpreter system

•  Core interpreters
•  Third-party interpreters
•  Interpreter configuration & usage


Interpreter processing lifecycle

①  Receive input commands/data
    •  as raw text
    •  from form data
②  Process the input commands/data with the external back-end
③  Format the response using the Zeppelin display system
④  Send the response back to the Zeppelin engine
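The four lifecycle steps can be sketched in one method. This is a hypothetical illustration, not Zeppelin's `Interpreter` API: it assumes form data arrives as a name/value map substituted into `${name}` placeholders (the placeholder syntax of Zeppelin's dynamic forms), models the external back-end as a plain function, and formats the response as `%html` output for the display system.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the interpreter lifecycle; not Zeppelin's actual API.
class LifecycleSketch {

    // Hypothetical external back-end: turns a command into a raw response.
    private final UnaryOperator<String> backEnd;

    LifecycleSketch(UnaryOperator<String> backEnd) {
        this.backEnd = backEnd;
    }

    String interpret(String rawText, Map<String, String> formData) {
        // (1) Receive input: raw text, with ${name} placeholders filled from form data
        String command = rawText;
        for (Map.Entry<String, String> field : formData.entrySet()) {
            command = command.replace("${" + field.getKey() + "}", field.getValue());
        }
        // (2) Let the external back-end process the command
        String response = backEnd.apply(command);
        // (3) Format for the display system: escaped HTML inside a <pre> block
        String formatted = "%html <pre>" + escapeHtml(response) + "</pre>";
        // (4) Send back to the engine (here: simply returned to the caller)
        return formatted;
    }

    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        LifecycleSketch interpreter = new LifecycleSketch(cmd -> "echo: " + cmd);
        System.out.println(interpreter.interpret("SELECT * FROM ${table}", Map.of("table", "users")));
    }
}
```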


Core interpreters

•  Spark (Spark core, SparkSQL/DataFrame, PySpark)
    •  Spark core = default (or %spark)
    •  SparkSQL = %sql
•  Shell (%sh)
•  Markdown (%md)
•  AngularJS (%angular)
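The `%name` prefixes select which interpreter runs a paragraph, and a paragraph without a prefix goes to the default (Spark core). A minimal sketch of that routing rule; `ParagraphRouter`, its set of known names and its error handling are hypothetical, chosen to mirror the list above (plus `%pyspark` for PySpark).

```java
import java.util.Set;

// Hypothetical paragraph router: picks the interpreter from a leading "%name" token.
class ParagraphRouter {

    static final Set<String> KNOWN = Set.of("spark", "sql", "pyspark", "sh", "md", "angular");
    static final String DEFAULT = "spark"; // no prefix means Spark core

    // Returns {interpreterName, scriptBody}.
    static String[] route(String paragraph) {
        String text = paragraph.stripLeading();
        if (text.startsWith("%")) {
            int end = 1;
            while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
                end++;
            }
            String name = text.substring(1, end);
            if (!KNOWN.contains(name)) {
                throw new IllegalArgumentException("Unknown interpreter: %" + name);
            }
            return new String[] { name, text.substring(end).stripLeading() };
        }
        return new String[] { DEFAULT, text };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", route("%sql SELECT count(*) FROM logs")));
        System.out.println(String.join(" | ", route("val rdd = sc.parallelize(1 to 10)")));
    }
}
```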


Third-party interpreters

•  Hive
•  Phoenix
•  Tajo
•  Flink
•  Ignite
•  Lens
•  Cassandra
•  Geode
•  PostgreSQL
•  Kylin
•  ElasticSearch


Interpreter conf & usage https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Writing an Interpreter

•  How to
•  Simple interpreter example (AsciiDoc)
•  Complex interpreter example (Cassandra)


Steps to write your own interpreter

•  Create a class that extends the Interpreter base class
•  Register it in a static block
•  Optionally define default config params


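// Register the interpreter under a name, without default properties
static {
    Interpreter.register("MyInterpreterName", MyClassName.class.getName());
}

// Register the interpreter with default config params (name, default value, description)
static {
    Interpreter.register("MyInterpreterName", MyClassName.class.getName(),
        new InterpreterPropertyBuilder()
            .add("property1", "default value", "Description of property1")
            .build());
}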
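A static block only runs when the JVM initializes the class, so the registration exists only after the interpreter class has been loaded and initialized. A self-contained sketch of this self-registration pattern; `Registry` is a hypothetical stand-in for `Interpreter.register`, and a plain `Map` of defaults stands in for `InterpreterPropertyBuilder`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry standing in for Zeppelin's Interpreter.register(...).
class Registry {
    static final Map<String, String> CLASSES = new HashMap<>();
    static final Map<String, Map<String, String>> DEFAULTS = new HashMap<>();

    static void register(String name, String className, Map<String, String> defaults) {
        CLASSES.put(name, className);
        DEFAULTS.put(name, defaults);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // prints false: MyInterpreter has not been initialized yet
        System.out.println(CLASSES.containsKey("myinterpreter"));
        // A class literal alone does not initialize the class; Class.forName(String) does,
        // which runs the static block below
        Class.forName(MyInterpreter.class.getName());
        System.out.println(CLASSES.get("myinterpreter"));
        System.out.println(DEFAULTS.get("myinterpreter"));
    }
}

// Hypothetical interpreter class that registers itself on initialization.
class MyInterpreter {
    static {
        Registry.register("myinterpreter", MyInterpreter.class.getName(),
                Map.of("property1", "default value"));
    }
}
```

This is why the interpreter's class name must be listed somewhere the server reads at startup: nothing references the class directly, so something has to load it explicitly.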


To register your interpreter as default

•  Edit the enum ZeppelinConfiguration.ConfVars

•  Add your interpreter's FQCN (fully qualified class name) to the ZEPPELIN_INTERPRETERS entry


To register your interpreter in config files

•  Create conf/zeppelin-site.xml from conf/zeppelin-site.xml.template

•  Add your interpreter's FQCN to the zeppelin.interpreters property


<property>
  <name>zeppelin.interpreters</name>
  <value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,com.me.MyNewInterpreter</value>
</property>
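The `zeppelin.interpreters` value is a comma-separated list of fully qualified class names. A sketch of how a server could turn that string into loaded classes, assuming that loading each class with `Class.forName` is what runs its static registration block; `InterpreterListLoader` and its methods are hypothetical, and `java.lang.String` in `main` merely stands in for a class that is actually on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader for a comma-separated list of interpreter class names.
class InterpreterListLoader {

    // Split the property value, trimming spaces/newlines and skipping empty entries.
    static List<String> parseClassNames(String value) {
        List<String> names = new ArrayList<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Load and initialize each class; a missing class is reported instead of aborting startup.
    static List<Class<?>> load(List<String> classNames) {
        List<Class<?>> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                loaded.add(Class.forName(name)); // initialization runs the class's static block
            } catch (ClassNotFoundException e) {
                System.err.println("Interpreter class not found on classpath: " + name);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        String value = "org.apache.zeppelin.spark.SparkInterpreter,\n  com.me.MyNewInterpreter,java.lang.String";
        List<String> names = parseClassNames(value);
        System.out.println(names);
        System.out.println(load(names).size() + " class(es) loaded");
    }
}
```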


Update interpreter pom.xml


Update main pom.xml


Simple AsciiDoc Interpreter

Figure: the Zeppelin Engine (Zeppelin Server JVM) sends the raw text block to the AsciiDoc Interpreter (separate JVM), which converts it to HTML and returns the HTML output (steps ① to ④).
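The AsciiDoc flow is: raw text block in, converted to HTML, HTML output back. A real interpreter would delegate the conversion to an AsciiDoc library; the toy converter below is hypothetical and handles only `= ` and `== ` titles and blank-line-separated paragraphs, then prefixes the result with `%html` so Zeppelin's display system renders it as HTML.

```java
// Hypothetical toy converter for a tiny AsciiDoc subset; not a real AsciiDoc implementation.
class AsciiDocToy {

    static String toZeppelinHtml(String rawText) {
        StringBuilder html = new StringBuilder("%html ");
        StringBuilder paragraph = new StringBuilder();
        for (String line : rawText.split("\n", -1)) {
            String trimmed = line.strip();
            if (trimmed.startsWith("== ")) {          // section title
                flush(paragraph, html);
                html.append("<h2>").append(escape(trimmed.substring(3))).append("</h2>");
            } else if (trimmed.startsWith("= ")) {    // document title
                flush(paragraph, html);
                html.append("<h1>").append(escape(trimmed.substring(2))).append("</h1>");
            } else if (trimmed.isEmpty()) {           // blank line ends a paragraph
                flush(paragraph, html);
            } else {                                  // consecutive lines join into one paragraph
                if (paragraph.length() > 0) {
                    paragraph.append(' ');
                }
                paragraph.append(escape(trimmed));
            }
        }
        flush(paragraph, html);
        return html.toString();
    }

    private static void flush(StringBuilder paragraph, StringBuilder html) {
        if (paragraph.length() > 0) {
            html.append("<p>").append(paragraph).append("</p>");
            paragraph.setLength(0);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(toZeppelinHtml("= Zeppelin\n\nA notebook\nfor big data.\n\n== Interpreters\nPluggable."));
    }
}
```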


Simple interpreter (AsciiDoc) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation


Cassandra Interpreter Architecture

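Figure: Cassandra interpreter architecture. The Zeppelin Server (JVM) sends the raw text block to the Cassandra Interpreter (separate JVM), which executes async CQL statements against Cassandra through the Java Driver, renders the results as HTML and sends them back for display (steps ① to ⑥).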
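The "Render HTML" step turns query results into markup for the display system. The driver calls are out of scope here: the sketch assumes the rows have already been fetched from the Java Driver into plain Java lists, and shows only the rendering. `HtmlResultRenderer` is a hypothetical name.

```java
import java.util.List;

// Hypothetical renderer for the "Render HTML" step: fetched rows become an HTML table
// prefixed with %html for Zeppelin's display system.
class HtmlResultRenderer {

    static String render(List<String> columns, List<List<String>> rows) {
        StringBuilder html = new StringBuilder("%html <table><thead><tr>");
        for (String column : columns) {
            html.append("<th>").append(escape(column)).append("</th>");
        }
        html.append("</tr></thead><tbody>");
        for (List<String> row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                // Unset column values arrive as null and are shown as "null"
                html.append("<td>").append(escape(cell == null ? "null" : cell)).append("</td>");
            }
            html.append("</tr>");
        }
        return html.append("</tbody></table>").toString();
    }

    // Escape cell values so stored data cannot inject markup into the notebook page.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(render(List.of("user_id", "name"),
                List.of(List.of("1", "Alice"), List.of("2", "Bob"))));
    }
}
```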


Cassandra Interpreter Commands

Native CQL statements
    SELECT * FROM …; INSERT INTO …; …

Schema commands
    DESCRIBE TABLE …; DESCRIBE KEYSPACE …; …

Prepared statement commands
    @prepare …; @bind …; @remove_prepared …;

Help command
    HELP;

Option commands
    @consistency …; @retryPolicy …; @fetchSize …;
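Before executing anything, the interpreter has to tell these command families apart. A minimal sketch that splits a paragraph on `;` and classifies each statement by its first token; `CassandraCommandClassifier` is hypothetical, looks only at command names (the argument syntax is elided on the slide), and its naive split would break on a `;` inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical classifier for the command families listed above.
class CassandraCommandClassifier {

    enum Kind { CQL, SCHEMA, PREPARED, OPTION, HELP }

    // Naive split on ';'. A real parser must also handle ';' inside string literals and comments.
    static List<String> split(String block) {
        List<String> statements = new ArrayList<>();
        for (String part : block.split(";")) {
            String s = part.strip();
            if (!s.isEmpty()) {
                statements.add(s);
            }
        }
        return statements;
    }

    static Kind classify(String statement) {
        String lower = statement.strip().toLowerCase(Locale.ROOT);
        String firstToken = lower.split("\\s+", 2)[0];
        if (lower.startsWith("@prepare") || lower.startsWith("@bind") || lower.startsWith("@remove_prepared")) {
            return Kind.PREPARED;
        }
        if (lower.startsWith("@")) {
            return Kind.OPTION; // @consistency, @retryPolicy, @fetchSize, ...
        }
        if (firstToken.equals("help")) {
            return Kind.HELP;
        }
        if (firstToken.equals("describe")) {
            return Kind.SCHEMA;
        }
        return Kind.CQL; // SELECT, INSERT, ...: forwarded to Cassandra as-is
    }

    public static void main(String[] args) {
        String block = "SELECT * FROM users;\nDESCRIBE KEYSPACE demo;\nHELP;";
        for (String statement : split(block)) {
            System.out.println(classify(statement) + " <- " + statement);
        }
    }
}
```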


Complex interpreter (Cassandra) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation


Cassandra Online Interpreter Docs


•  http://zeppelin.incubator.apache.org/docs/interpreter/cassandra.html

Zeppelin Future

Roadmap


Enterprise Ready

•  Apache Shiro authentication (ZEPPELIN-548)
•  Note authorization (PR #681)
•  Multi-tenancy


Usability

•  UX improvement
•  Better table data support
•  Export data as CSV, etc. (PR #725, PR #714, PR #6, PR #89)
•  Table pagination …


Pluggability

•  Pluggable visualization
•  Pluggable interpreter
•  Repository and registry for pluggable components


More interpreters


Q & A


Thank You @doanduyhai

[email protected]

http://zeppelin.incubator.apache.org/
