The Secret Relationship between Asakusa Framework and Scala
-
2016/10/8
Asakusa Framework and Scala
Scala Kansai Summit 2016
-
2
1. Self-introduction: my history with Scala and AsakusaFW
2. What Scala, Spark and AsakusaFW are
3. Coding in Scala, Spark and AsakusaFW
4. AsakusaFW today
5. Summary
-
3
Twitter ID: @hishidama
http://hiroba.dqx.jp/sc/character/1091135261820/
Working with AsakusaFW; playing DQ10 (Dragon Quest X)
-
4
2004: Scala appears
2006: Apache Hadoop appears (the Java 6 era)
2010/2: started using Hadoop and HBase at work
2010: Spark released as OSS
2010: started studying Scala
2011/3: Asakusa Framework released
2011/7: Spark
2012/2: SIer
2012/8: started DQ10
2014/2: Apache Spark becomes a top-level Apache project
2014/3: Java 8 released
-
5
Hadoop
1. …
2. Hadoop at Yahoo! JAPAN: http://techblog.yahoo.co.jp/architecture/hadoop/ (6635 / 53470; 8 / 33)
3. Followed Hadoop information on Twitter
-
6
HBase
1. HBase: the NoSQL database on Hadoop
- Differences between an RDB and HBase
- HBase runs on top of Hadoop (HDFS)
-
7
NoSQL
NoSQL means "Not Only SQL": databases other than SQL-based RDBs.
CAP theorem: Consistency, Availability, Partition tolerance; a distributed system can provide only two of the three.
RDBs are CA; distributed NoSQL stores are typically CP, and a NoSQL store cannot be CA.
-
8
Scala
1. Used Scala to access HBase
- Scala as a "Better Java"
- import is convenient
2. Scala's collections (Seq) … orz
-
9
Asakusa Framework
1. Frameworks on Hadoop:
Hadoop MapReduce / Hive / Pig / Cascading / Huahin Framework / AsakusaFW / AZAREA Cluster → AsakusaFW
- Now working with AsakusaFW
-
10
Spark
1. A successor to Hadoop MapReduce
- Spark is written in Scala
- Originated alongside Mesos (UC Berkeley)
- Spark can run on a Hadoop cluster
-
11
What Scala, Spark and AsakusaFW are
-
12
Scala
-
13
Apache Hadoop (1/3)
- HDFS: distributed file system
- MapReduce: distributed processing framework
- YARN: resource management
1. Put the data on HDFS  2. Build the MapReduce application as a jar and submit it
-
14
Apache Hadoop (2/3)
(Diagram: applications transfer data between a DB and the Hadoop cluster.)
-
15
Apache Hadoop (3/3)
Characteristics of Hadoop:
- Hadoop …
- MapReduce …
-
16
Apache Spark
- Processes data as RDDs; written in Scala
- Can read from and write to HDFS
- Developed at UC Berkeley's AMPLab, now led by Databricks (https://databricks.com/spark/about) as Apache Spark
-
17
Asakusa Framework
- Batch applications are written in a Java DSL
- Runs on Hadoop, Spark and M3BP
- http://www.asakusafw.com/
-
18
Coding in Scala, Spark and AsakusaFW
-
19
Scala
val operator = new MyOperator
val s0: Stream[Data] = …
val s1 = s0.filter(operator.f)
val s2 = s1.map(operator.m)
val out1 = s2.toSeq
-
20
Scala: MyOperator
class MyOperator {
  def f(data: Data): Boolean =
    data.getValue() % 2 == 0
  def m(data: Data): Data =
    Data(data.getValue() + 1)
}
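Putting the two slides together, the pipeline runs as-is on an ordinary Scala Stream. This is a minimal runnable sketch: the sample records are hypothetical, and `Data` is assumed to be a simple case class so that both the `data.getValue()` accessor and the `Data(...)` constructor used on the slides work.

```scala
// Value holder matching the Java-bean-style accessor used on the slides.
case class Data(value: Int) {
  def getValue(): Int = value
}

class MyOperator {
  def f(data: Data): Boolean = data.getValue() % 2 == 0 // keep even values
  def m(data: Data): Data = Data(data.getValue() + 1)   // increment each value
}

val operator = new MyOperator
val s0: Stream[Data] = Stream(Data(1), Data(2), Data(3), Data(4))
val s1 = s0.filter(operator.f)
val s2 = s1.map(operator.m)
val out1 = s2.toSeq
// out1 == Seq(Data(3), Data(5))
```

On a Stream, filter and map are evaluated lazily; traversing or comparing out1 forces the result.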
-
21
Scala
val operator = new MyOperator
val s0: Stream[Data] = …
val s1 = s0.filter(operator.f)
val s2 = s1.map(operator.m)
val out1 = s2.toSeq
→ This code forms a DAG.
-
22
DAG (1/2): … (cf. ER diagrams)
-
23
DAG (2/2): Directed Acyclic Graph
- edges have a direction
- there are no cycles
-
24
Scala
val operator = new MyOperator
val s0: Stream[Data] = …
val s1 = s0.filter(operator.f)
val s2 = s1.map(operator.m)
val out1 = s2.toSeq
(DAG: s0 → filter(f) → map(m) → out1)
-
25
Apache Spark
val sc = new SparkContext()
val operator = new MyOperator
val s0: RDD[Data] = sc.…
val s1 = s0.filter(operator.f)
val s2 = s1.map(operator.m)
s2.saveAsTextFile(out1)
MyOperator is the same Scala class as before.
(DAG: s0 → filter(f) → map(m) → out1)
-
26
Asakusa Framework
In<Data> s0 = …;    // input
Out<Data> out1 = …; // output
MyOperatorFactory operator = new MyOperatorFactory();
Source<Data> s1 = operator.f(s0).out;
Source<Data> s2 = operator.m(s1).out;
out1.add(s2);
(DAG: s0 → @Branch f → @Update m → out1)
-
27
Asakusa Framework: MyOperatorFactory
public abstract class MyOperator {
    @Branch
    public Filter f(Data data) {
        return (data.getValue() % 2 == 0) ? Filter.OUT : Filter.MISSED;
    }
    @Update
    public void m(Data data) {
        data.setValue(data.getValue() + 1);
    }
}
MyOperatorFactory is generated from this operator class.
-
28
The two DAGs are identical.
Scala: s0 → filter(f) → map(m) → out1
AsakusaFW: s0 → @Branch f → @Update m → out1
-
29
Merge: combining multiple inputs into one (union / join / zip)
(Diagram: s0 and s1 merge into out1.)
-
30
Merge (1): union
Java 8 Stream API:
  Stream<Data> out = Stream.concat(Stream.concat(s0, s1), s2);
Scala / Spark:
  val out = s0 ++ s1 ++ s2
AsakusaFW:
  Source<Data> out = core.confluent(s0, s1, s2);
Input A: 1,abc / 2,def    Input B: 1,foo / 3,bar
Output: 1,abc / 2,def / 1,foo / 3,bar
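The union row can be checked with plain Scala collections; the records below are the slide's sample data modeled as (key, value) tuples, a sketch rather than Spark code.

```scala
// Union ("confluent"): simply concatenate the inputs; no matching by key.
val s0 = Seq((1, "abc"), (2, "def"))
val s1 = Seq((1, "foo"), (3, "bar"))
val out = s0 ++ s1
// out == Seq((1,"abc"), (2,"def"), (1,"foo"), (3,"bar"))
```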
-
31
Merge (2): join
Java 8 Stream API: (none)
Scala: (none)
Spark:
  val out = s0.join(s1)
AsakusaFW:
  Source<Data> out = operator.join(s0, s1).joined; // @MasterJoin
Input A: 1,abc / 2,def    Input B: 1,foo / 3,bar
Output: 1,abc,foo
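Plain Scala collections have no built-in join, but an inner join on the key can be sketched with a for-comprehension over the slide's sample records:

```scala
// Inner join on the first tuple element (the key).
val s0 = Seq((1, "abc"), (2, "def"))
val s1 = Seq((1, "foo"), (3, "bar"))
val joined = for {
  (k0, v0) <- s0
  (k1, v1) <- s1
  if k0 == k1
} yield (k0, v0, v1)
// joined == Seq((1, "abc", "foo")) -- only key 1 appears in both inputs
```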
-
32
Merge (2): cogroup
Java 8 Stream API: (none)
Scala: (none)
Spark:
  val out = s0.cogroup(s1)
AsakusaFW:
  Source<Data> out = operator.group(s0, s1).out; // @CoGroup
Input A: 1,abc / 2,def    Input B: 1,foo / 3,bar
Output: 1,abc,foo / 2,def,null / 3,null,bar
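A cogroup can likewise be sketched in plain Scala: group each input by key, then pair up the value lists per key. Empty lists play the role of the nulls in the slide's output; the records are the slide's sample data.

```scala
val s0 = Seq((1, "abc"), (2, "def"))
val s1 = Seq((1, "foo"), (3, "bar"))

// All keys that occur on either side.
val keys = (s0.map(_._1) ++ s1.map(_._1)).distinct.sorted
val g0 = s0.groupBy(_._1)
val g1 = s1.groupBy(_._1)

// For each key: (key, values from s0, values from s1).
val cogrouped = keys.map { k =>
  (k, g0.getOrElse(k, Seq.empty).map(_._2), g1.getOrElse(k, Seq.empty).map(_._2))
}
// cogrouped == Seq((1, Seq("abc"), Seq("foo")),
//                  (2, Seq("def"), Seq()),
//                  (3, Seq(), Seq("bar")))
```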
-
33
Merge (3): zip
Java 8 Stream API: (none)
Scala / Spark:
  val out = s0.zip(s1)
AsakusaFW: (none)
Input A: 1,abc / 2,def    Input B: 1,foo / 3,bar
Output of zip: 1,abc,1,foo / 2,def,3,bar
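zip pairs elements by position, not by key, which is why the slide's result combines keys 1 and 3. A sketch with Scala collections and the slide's sample records:

```scala
// zip pairs by position: element i of s0 with element i of s1.
val s0 = Seq((1, "abc"), (2, "def"))
val s1 = Seq((1, "foo"), (3, "bar"))
val zipped = s0.zip(s1)
// zipped == Seq(((1,"abc"), (1,"foo")), ((2,"def"), (3,"bar")))
```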
-
34
Duplicate: one input feeding multiple outputs
(Diagram: s0 feeds both out1 and out2.)
-
35
Duplicate
Java 8 Stream API: (none; a Stream can be consumed only once)
Scala: a TraversableOnce can also be consumed only once
Spark:
  val out1 = s0.map(operator.m1)
  val out2 = s0.map(operator.m2)
AsakusaFW:
  Source<Data> out1 = operator.m1(s0).out;
  Source<Data> out2 = operator.m2(s0).out;
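With a strict Scala collection (unlike a one-shot Stream or TraversableOnce) the duplicate pattern is just reusing the source. A sketch; the data and the two transformations stand in for the slide's m1/m2 operator methods and are hypothetical:

```scala
val s0 = Seq(1, 2, 3)
// The strict Seq can be traversed any number of times.
val out1 = s0.map(_ * 10) // first consumer ("m1")
val out2 = s0.map(_ + 1)  // second consumer ("m2")
// out1 == Seq(10, 20, 30); out2 == Seq(2, 3, 4)
```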
-
36
Branch: splitting one input into multiple outputs
(Diagram: s0 branches into out1 and out2.)
-
37
Branch
Java 8 Stream API / Scala / Spark: (no dedicated operation; apply filter once per branch)
AsakusaFW:
  // @Branch
  Branch result = operator.branch(s0);
  Source<Data> out1 = result.out1;
  Source<Data> out2 = result.out2;
  Source<Data> out3 = result.out3;
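The filter-per-branch approach mentioned for Scala/Spark can be sketched as follows; the data and predicates are hypothetical, and partition covers the common two-way case in a single pass:

```scala
val s0 = Seq(1, 2, 3, 4, 5, 6)
// Two-way branch in one pass.
val (evens, odds) = s0.partition(_ % 2 == 0)
// Extra branches: one filter per output, re-reading the source.
val big = s0.filter(_ > 4)
// evens == Seq(2, 4, 6); odds == Seq(1, 3, 5); big == Seq(5, 6)
```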
-
38
The DAG of a real Asakusa application
(Diagram of an actual batch DAG, built from operators such as @Convert, @CoGroup, @Summarize and @MasterJoinUpdate.)
-
39
Parallel / distributed execution
- Java 8 Stream API: convert a List to a (parallel) Stream; runs within one JVM
- Scala: parallel collections via .par
- Spark: distributed processing written in Scala; also offers Streaming
- AsakusaFW: delegates to Hadoop, Spark or M3BP
  (Hadoop and Spark distribute across machines)
-
40
AsakusaFW
-
41
Asakusa Framework
1. A framework for distributed batch processing on Hadoop
- runs applications on Hadoop
2. …
3. Now also runs on Spark and M3BP; the application language remains Java
-
42
M3 for Batch Processing (M3BP)
- https://github.com/fixstars/m3bp (OSS by Fixstars)
- Faster than Spark when the data fits on a single machine
- Uses every CPU core of one node; targets data in the GB range
-
43
Can one machine handle it?
Typical Hadoop node in 2010 (http://shiumachi.hatenablog.com/entry/20100703/1278133318): CPU 8-16 cores, 16-32GB memory, 4-24TB disk
Server in 2016 (http://www.atmarkit.co.jp/ait/articles/1608/22/news027.html): CPU 20 cores, 256GB memory, 36TB disk
Data on the order of 100GB can be handled by a single machine.
-
44
Asakusa Framework: an AsakusaFW application is compiled into jars and executed on Hadoop MapReduce, Spark or M3BP.
-
45
Asakusa on MapReduce / Asakusa on Spark / Asakusa on M3BP
Compilation:
- Asakusa on MapReduce: javac (generates Java source)
- Asakusa on Spark: javac; generates class files directly via ASM (scalac is not needed)
- Asakusa on M3BP: javac plus CMake and gcc/g++ (generates C++; errors can surface as SEGV)
-
46
Benchmarks (Asakusa on MapReduce / on Spark / on M3BP):
- in 1.2GB (561) → out 60MB (69): 110 / 85 / 8
- in 29kB (900) → out 280B (1): 15 / 60 / 3
- in 11GB (21700) → out 940MB (783): 380 / 700 / 260
- in 74MB (53) → out 81GB (1084): 3400 / 2030 / 400 (256 / 270GB)
- in 76GB (2420) → out 153MB (89): 670 / 360 / 92
-
47
Hardware:
- Hadoop (MapR) / Spark: 13 nodes, 128 …, 750GB memory
- M3BP: 1 node, 88 …, 512GB memory
- M3BP competes with the Hadoop/Spark cluster from a single machine
- M3BP … 122 … 88 … 1.1 / 1.2
-
48
Asakusa Framework
M3BP
AsakusaFW
-
49
The future (1/2): Hadoop assumes a single HDD is slow, so it spreads data over many HDDs on many machines.
-
50
The future (2/2)
- SSD: drives around 100TB are projected for 2020
- CPUs with around 100 cores → single-machine processing (Asakusa on M3BP)
- RSACPU110TB
- MRAM (non-volatile memory)
-
51
Asakusa DSL
1. AsakusaFW's Asakusa DSL is Java
- verbose as a DSL
2. What if the DSL could be written in Scala?
3. An Asakusa Scala DSL
- AsakusaFW is aimed at SIers
- SIer engineers use Java, hence a Java Asakusa DSL
- Asakusa Scala DSL: https://atnd.org/events/13174
-
52
-
53
Summary
- Apache Spark is written in Scala (and offers Streaming)
- Asakusa Framework started on Hadoop MapReduce (and now also targets Spark and M3BP)
- If Scala spreads among SIers, an Asakusa Scala DSL may follow
- DQ10: ver 3.4 released 2016/10/6