Apache Zeppelin: the missing component for the Big Data ecosystem
@doanduyhai #VoxxedVienna
Apache Zeppelin: the missing GUI for your Big Data ecosystem
DuyHai DOAN
Apache Cassandra Evangelist
Who Am I?
Duy Hai DOAN, Cassandra technical advocate
• talks, meetups, confs
• open-source devs (Achilles, …)
• OSS Cassandra point of contact
☞ @doanduyhai
DataStax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 400+ employees
• Headquarters in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• DataStax Enterprise = OSS Cassandra + extra features
What is Apache Zeppelin?
Presentation
Architecture
Zeppelin Presentation
Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Zeppelin Architecture
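[Diagram: the Zeppelin Server hosts the Zeppelin Engine, exposed through REST and WebSocket; a Zeppelin Interpreter Factory connects it to interpreters that each run in a separate JVM: the Spark Interpreter Group (Spark, SparkSQL), the Tajo Interpreter, the Flink Interpreter and the Cassandra Interpreter]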
What does Zeppelin provide?
• Front-end & display system for free
• Generic back-end with REST APIs & WebSocket
• Pluggable interpreter system
• Task scheduler (à la CRON)
Zeppelin UI Layout
Notebook
Paragraph
UI elements
Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Zeppelin Display System
Raw, Table, HTML, Angular with Scala
Available graphs
View modes
Dynamic form
Iframe export
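As an illustration of the display system, an interpreter selects a renderer through a prefix on its output: text starting with `%table` is shown as a table (tab-separated columns, newline-separated rows, first row used as header), text starting with `%html` is rendered as HTML, and anything else is displayed raw. The helper class below is a hypothetical sketch that produces such strings; only the `%table`/`%html` conventions come from Zeppelin.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: helper names are hypothetical. The "%table" and "%html"
// output prefixes are the display-system conventions being illustrated.
final class DisplayFormat {

    private DisplayFormat() {}

    // Joins cells with '\t'; tabs/newlines inside a cell would break the layout,
    // so they are replaced by spaces.
    private static String line(List<String> cells) {
        return cells.stream()
                .map(c -> c.replace('\t', ' ').replace('\n', ' '))
                .collect(Collectors.joining("\t"));
    }

    // %table: first row is the header, '\t' separates columns, '\n' separates rows
    static String table(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("%table ").append(line(header)).append('\n');
        for (List<String> row : rows) {
            sb.append(line(row)).append('\n');
        }
        return sb.toString();
    }

    // %html: everything after the prefix is rendered as HTML
    static String html(String markup) {
        return "%html " + markup;
    }

    public static void main(String[] args) {
        System.out.print(table(List.of("name", "age"),
                List.of(List.of("alice", "30"), List.of("bob", "25"))));
        System.out.println(html("<b>done</b>"));
    }
}
```

Because the table format is plain text, the same output also feeds the "Available graphs" views, which work on table-shaped results.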
Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Interpreter to Front-End Streaming
Interpreter to front-end streaming
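[Diagram: the Interpreter, running in its own JVM, streams its output to the Zeppelin Server (a separate JVM), which forwards it to the front-end over WebSocket]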
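The streaming idea (forward each piece of interpreter output as soon as it exists instead of waiting for the final result) can be sketched with a queue between a producer and a consumer. This is a hypothetical model, not Zeppelin's implementation: the `BlockingQueue` stands in for the interpreter-to-server link, and the returned list stands in for the messages a real server would push over the WebSocket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of interpreter-to-front-end streaming.
final class StreamingSketch {

    private static final String END = "__END_OF_STREAM__"; // hypothetical end marker

    // "Interpreter" side: emits one chunk per processed step, then the end marker
    static Thread startInterpreter(BlockingQueue<String> channel, int steps) {
        Thread t = new Thread(() -> {
            try {
                for (int i = 1; i <= steps; i++) {
                    channel.put("step " + i + " done\n");
                }
                channel.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // "Server" side: forwards every chunk as soon as it arrives
    static List<String> forwardToClient(BlockingQueue<String> channel) throws InterruptedException {
        List<String> pushedToBrowser = new ArrayList<>();
        for (String chunk = channel.take(); !chunk.equals(END); chunk = channel.take()) {
            pushedToBrowser.add(chunk); // a real server would send this over the WebSocket
        }
        return pushedToBrowser;
    }

    static List<String> run(int steps) {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        Thread interpreter = startInterpreter(channel, steps);
        try {
            List<String> chunks = forwardToClient(channel);
            interpreter.join();
            return chunks;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return List.of();
        }
    }

    public static void main(String[] args) {
        run(3).forEach(System.out::print);
    }
}
```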
Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Interpreter system
Core interpreters
Third-party interpreters
Interpreter conf & usage
Interpreter processing lifecycle
① Receive input commands/data
  • as raw text
  • from form data
② Process the input commands/data with the external back-end
③ Format the response using the Zeppelin display system
④ Send the response back to the Zeppelin engine
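The four lifecycle steps can be sketched end to end in a single class. This is a hypothetical illustration, not Zeppelin's `Interpreter` API: the `${name=default}` placeholders are only loosely modeled on Zeppelin dynamic forms (where exactly Zeppelin performs the substitution is not modeled here), and a word counter stands in for the external back-end.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the four lifecycle steps; NOT Zeppelin's Interpreter API.
final class LifecycleSketch {

    // ${name} or ${name=default}, loosely modeled on Zeppelin dynamic-form placeholders
    private static final Pattern FORM_FIELD = Pattern.compile("\\$\\{([^}=]+)(?:=([^}]*))?\\}");

    // ① Receive: raw paragraph text, with placeholders filled from form data
    static String receive(String rawText, Map<String, String> formData) {
        Matcher m = FORM_FIELD.matcher(rawText);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String fallback = m.group(2) == null ? "" : m.group(2);
            String value = formData.getOrDefault(m.group(1), fallback);
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // ② Process: the stand-in "back-end" counts words (TreeMap keeps them sorted)
    static Map<String, Integer> process(String command) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : command.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // ③ Format: use the display system's %table convention
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder("%table word\tcount\n");
        counts.forEach((word, n) -> sb.append(word).append('\t').append(n).append('\n'));
        return sb.toString();
    }

    // ④ Send back: returned to the caller, which plays the role of the Zeppelin engine
    static String interpret(String rawText, Map<String, String> formData) {
        return format(process(receive(rawText, formData)));
    }

    public static void main(String[] args) {
        System.out.print(interpret("to be or ${verb=not} to be", Map.of()));
        System.out.print(interpret("to be or ${verb=not} to be", Map.of("verb", "maybe")));
    }
}
```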
Core interpreters
• Spark (Spark core, SparkSQL/DataFrame, PySpark)
  • Spark core = default (or %spark)
  • SparkSQL = %sql
• Shell (%sh)
• Markdown (%md)
• AngularJS (%angular)
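The `%name` prefixes decide which interpreter receives a paragraph, and text without a prefix goes to the default interpreter (Spark core on this slide). The routing code below is a hypothetical sketch of that dispatch. The interpreter names are taken from the slide, while `Route`, `route()` and the lambda stand-ins are illustrative and not Zeppelin code.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of "%name" dispatch; not taken from Zeppelin.
final class RoutingSketch {

    record Route(String interpreter, String body) {}

    // "%md # Title" -> ("md", "# Title"); text without a prefix goes to the default
    static Route route(String paragraph, String defaultInterpreter) {
        String text = paragraph.stripLeading();
        if (!text.startsWith("%")) {
            return new Route(defaultInterpreter, paragraph);
        }
        int end = 1;
        while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
            end++;
        }
        return new Route(text.substring(1, end), text.substring(end).stripLeading());
    }

    public static void main(String[] args) {
        // Stand-ins for real interpreters, keyed by the names used on the slide
        Map<String, Function<String, String>> interpreters = Map.of(
                "spark", body -> "[spark] " + body,
                "sql", body -> "[sql] " + body,
                "md", body -> "[md] " + body);
        for (String p : new String[] {"%md # Title", "%sql SELECT 1", "sc.parallelize(1 to 10).count()"}) {
            Route r = route(p, "spark");
            System.out.println(interpreters.get(r.interpreter()).apply(r.body()));
        }
    }
}
```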
Third-party interpreters
• Hive
• Phoenix
• Tajo
• Flink
• Ignite
• Lens
• Cassandra
• Geode
• PostgreSQL
• Kylin
• ElasticSearch
Interpreter conf & usage https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Writing An Interpreter
How To
Simple interpreter example (AsciiDoc)
Complex interpreter example (Cassandra)
Steps to write your own interpreter
• Create a class that extends the Interpreter base class
• Register it in a static block
• Optionally define default config params
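// Register the interpreter under a name, with its fully qualified class name
static {
    Interpreter.register("MyInterpreterName", MyClassName.class.getName());
}

// Alternative: also declare default config params (name, default value, description)
static {
    Interpreter.register("MyInterpreterName", MyClassName.class.getName(),
        new InterpreterPropertyBuilder()
            .add("property1", "default value", "Description of property1")
            .build());
}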
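Registration in a static block works because a static initializer runs when the JVM initializes the class, and loading a class by name with `Class.forName(name, true, loader)` triggers that initialization. Presumably this is how the class names listed in the configuration end up registered, though the exact loading code is not shown on these slides. The sketch below reproduces the pattern with a hypothetical `SketchRegistry` standing in for `Interpreter.register(...)`; none of these classes are Zeppelin code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a registry such as Interpreter.register(...); not Zeppelin code.
final class SketchRegistry {
    static final Map<String, String> REGISTERED = new ConcurrentHashMap<>();

    static void register(String name, String className) {
        REGISTERED.put(name, className);
    }
}

// Same shape as the slide: registration lives in a static initializer
final class MySketchInterpreter {
    static {
        SketchRegistry.register("mysketch", MySketchInterpreter.class.getName());
    }
}

final class RegistrationDemo {
    // Loading a class by name with initialize=true runs its static blocks
    static void loadAll(String... classNames) {
        for (String fqcn : classNames) {
            try {
                Class.forName(fqcn, true, RegistrationDemo.class.getClassLoader());
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException("Interpreter class not found: " + fqcn, e);
            }
        }
    }

    public static void main(String[] args) {
        // A class literal and getName() do not initialize the class
        String fqcn = MySketchInterpreter.class.getName();
        System.out.println("before: " + SketchRegistry.REGISTERED);
        loadAll(fqcn);
        System.out.println("after:  " + SketchRegistry.REGISTERED);
    }
}
```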
To register your interpreter as default
• Edit the enum ZeppelinConfiguration.ConfVars
• Add your interpreter FQCN in the property ZEPPELIN_INTERPRETERS
To register your interpreter in config files
• Create conf/zeppelin-site.xml from conf/zeppelin-site.xml.template
• Add your interpreter FQCN in the property zeppelin.interpreters
<property>
  <name>zeppelin.interpreters</name>
  <value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,com.me.MyNewInterpreter</value>
</property>
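The `zeppelin.interpreters` value is a single comma-separated list of fully qualified class names. Zeppelin reads it through its own configuration classes. The standalone sketch below only shows how such a value breaks down into class names when read with the JDK's DOM parser. `SiteConfigSketch` is hypothetical, and trimming the surrounding whitespace is a defensive choice made here, not documented Zeppelin behavior.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: extract the FQCN list of one <property> from a site XML string.
// Assumes each <property> contains a <name> and a <value> element.
final class SiteConfigSketch {

    static List<String> interpreterClasses(String siteXml, String propertyName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                if (name.equals(propertyName)) {
                    List<String> result = new ArrayList<>();
                    String value = p.getElementsByTagName("value").item(0).getTextContent();
                    for (String fqcn : value.split(",")) {
                        if (!fqcn.isBlank()) {
                            result.add(fqcn.trim()); // drop line breaks / indentation
                        }
                    }
                    return result;
                }
            }
            return List.of(); // property not present
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse site XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration><property><name>zeppelin.interpreters</name>"
                + "<value>org.apache.zeppelin.markdown.Markdown,com.me.MyNewInterpreter</value>"
                + "</property></configuration>";
        interpreterClasses(xml, "zeppelin.interpreters").forEach(System.out::println);
    }
}
```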
Update interpreter pom.xml
Update main pom.xml
Simple AsciiDoc Interpreter
[Diagram: the Zeppelin Engine (Zeppelin Server JVM) sends the paragraph's raw text block to the AsciiDoc Interpreter (separate JVM), which converts it to HTML and returns the HTML output to the engine, steps ① to ④]
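The AsciiDoc flow (raw text block in, HTML out, returned with the `%html` display prefix) can be shown with a deliberately tiny converter. This is a toy, not the demo's interpreter. It handles only `=` headings and plain paragraphs, and a real AsciiDoc interpreter would delegate to an actual AsciiDoc converter. `AsciiDocSketch` is a hypothetical name.

```java
// Toy sketch of the AsciiDoc interpreter flow; NOT a real AsciiDoc converter.
// Supports "= Title" / "== Section" headings and blank-line-separated paragraphs.
final class AsciiDocSketch {

    static String interpret(String rawText) {
        StringBuilder html = new StringBuilder("%html "); // display-system prefix
        StringBuilder para = new StringBuilder();
        for (String line : rawText.split("\n", -1)) {
            String t = line.strip();
            if (t.isEmpty()) {
                flush(para, html); // blank line ends a paragraph
                continue;
            }
            int level = 0;
            while (level < t.length() && t.charAt(level) == '=') {
                level++;
            }
            if (level >= 1 && level <= 6 && level < t.length() && t.charAt(level) == ' ') {
                flush(para, html);
                html.append("<h").append(level).append('>')
                    .append(escape(t.substring(level + 1).strip()))
                    .append("</h").append(level).append('>');
            } else {
                if (para.length() > 0) {
                    para.append(' ');
                }
                para.append(escape(t));
            }
        }
        flush(para, html);
        return html.toString();
    }

    private static void flush(StringBuilder para, StringBuilder html) {
        if (para.length() > 0) {
            html.append("<p>").append(para).append("</p>");
            para.setLength(0);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(interpret("= Zeppelin\n\nA *notebook* for\nBig Data\n== Interpreters"));
    }
}
```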
Simple interpreter (AsciiDoc) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Cassandra Interpreter Architecture
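[Diagram: the Zeppelin Server (JVM) sends the raw text block to the Cassandra Interpreter (separate JVM), which sends the statements to Cassandra as asynchronous CQL statements through the Cassandra Java Driver, renders the results as HTML and sends them back for display, steps ① to ⑥]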
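The Cassandra interpreter's flow (send CQL statements asynchronously, then render the results as HTML) can be sketched with `CompletableFuture`. `AsyncBackend` is a hypothetical interface standing in for the Cassandra Java Driver's asynchronous execution; it is not the driver API. The statements are chained so that each one starts only after the previous one completes, which keeps order-dependent statements (for example a CREATE followed by an INSERT) in order. That choice belongs to this sketch and is not a claim about the real interpreter.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch; AsyncBackend stands in for the driver, it is NOT the driver API.
final class AsyncCqlSketch {

    interface AsyncBackend {
        CompletableFuture<List<String>> executeAsync(String cql);
    }

    // Each statement is sent asynchronously once the previous one has completed
    static String runAll(List<String> statements, AsyncBackend backend) {
        CompletableFuture<StringBuilder> chain =
                CompletableFuture.completedFuture(new StringBuilder("%html "));
        for (String stmt : statements) {
            chain = chain.thenCompose(out -> backend.executeAsync(stmt)
                    .thenApply(rows -> out.append(renderHtml(stmt, rows))));
        }
        return chain.join().toString(); // final HTML handed back to the server
    }

    // Renders one statement and its rows as an HTML fragment
    static String renderHtml(String stmt, List<String> rows) {
        StringBuilder sb = new StringBuilder("<h4>").append(escape(stmt)).append("</h4><table>");
        for (String row : rows) {
            sb.append("<tr><td>").append(escape(row)).append("</td></tr>");
        }
        return sb.append("</table>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        AsyncBackend fake = cql -> CompletableFuture.supplyAsync(() -> List.of("row for: " + cql));
        System.out.println(runAll(List.of("SELECT * FROM ks.users", "SELECT * FROM ks.events"), fake));
    }
}
```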
Cassandra Interpreter Commands
Native CQL statements
• SELECT * FROM …; INSERT INTO …; …
Schema commands
• DESCRIBE TABLE …; DESCRIBE KEYSPACE …; …
Prepared statement commands
• @prepare …; @bind …; @remove_prepared …;
Help command
• HELP;
Option commands
• @consistency …; @retryPolicy …; @fetchSize …;
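The command families above can be told apart by their leading keyword. The following hypothetical sketch, which is not the Cassandra interpreter's actual parser, splits a paragraph on `;` and classifies each command. The naive split would break on semicolons inside CQL string literals, and the option syntax in the sample input (`@consistency=ONE`) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of command classification; not the Cassandra interpreter's parser.
// Naive: splitting on ';' would break on semicolons inside string literals.
final class CommandSketch {

    enum Kind { CQL, SCHEMA, PREPARED, HELP, OPTION }

    record Command(Kind kind, String text) {}

    static List<Command> parse(String paragraph) {
        List<Command> commands = new ArrayList<>();
        for (String raw : paragraph.split(";")) {
            String text = raw.strip();
            if (!text.isEmpty()) {
                commands.add(new Command(classify(text), text));
            }
        }
        return commands;
    }

    static Kind classify(String text) {
        String upper = text.toUpperCase(Locale.ROOT);
        if (upper.startsWith("@PREPARE") || upper.startsWith("@BIND") || upper.startsWith("@REMOVE_PREPARED")) {
            return Kind.PREPARED;
        }
        if (upper.startsWith("@")) {
            return Kind.OPTION; // @consistency, @retryPolicy, @fetchSize, ...
        }
        if (upper.startsWith("DESCRIBE")) {
            return Kind.SCHEMA;
        }
        if (upper.equals("HELP")) {
            return Kind.HELP;
        }
        return Kind.CQL; // anything else is sent as native CQL
    }

    public static void main(String[] args) {
        String paragraph = "@consistency=ONE;\nSELECT * FROM ks.users;\nDESCRIBE TABLE ks.users;\nHELP;";
        parse(paragraph).forEach(c -> System.out.println(c.kind() + " -> " + c.text()));
    }
}
```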
Complex interpreter (Cassandra) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Cassandra Online Interpreter Docs
• http://zeppelin.incubator.apache.org/docs/interpreter/cassandra.html
Zeppelin Future
Roadmap
Enterprise Ready
• Apache Shiro authentication (ZEPPELIN-548)
• Note authorization (PR #681)
• Multi-tenancy
Usability
• UX improvement
• Better table data support
• Export data as CSV etc. (PR #725, PR #714, PR #6, PR #89)
• Table pagination …
Pluggability
• Pluggable visualization
• Pluggable interpreter
• Repository and registry for pluggable components
More interpreters
Q & A