Similarities and differences between Java8 Stream API, Apache Spark, and Asakusa Framework
TRANSCRIPT
-
2015/11/28
Java8 Stream API, Apache Spark, and Asakusa Framework
JJUG CCC 2015 Fall
-
2
* JJUG, Java8 Stream API
Java8 Stream API, Apache Spark, Asakusa Framework, DAG, Asakusa Framework
-
3
1.
2. Stream API, Spark, AsakusaFW
3. Stream API, Spark, AsakusaFW
4. Stream API, Spark, AsakusaFW
5.
-
4
Twitter ID: @hishidama
http://hiroba.dqx.jp/sc/character/1091135261820/
AsakusaFW, DQ10
-
5
2006    Apache Hadoop, Nutch, Java6
2010/2  Hadoop
2010    Spark, OSS
2011/3  Asakusa Framework
2011/7  Spark
2012/2  SIer
2012/8  DQ10
2014/2  Apache Spark
2014/3  Java8
-
6
Stream API, Spark, AsakusaFW
-
7
Java8 Stream API
API
Java SE 7/8
-
8
Apache Hadoop (1/3)
* HDFS
* MapReduce
* YARN
1.
2. MapReduce, jar
-
9
Apache Hadoop (2/3)
[Diagram: DB, app, and Hadoop nodes]
-
10
Apache Hadoop (3/3)
Hadoop
* Hadoop
Hadoop
* MapReduce
-
11
Apache Spark
* RDD, Scala
* HDFS
AMPLab, Databricks
* https://databricks.com/spark/about
* Apache Spark
-
12
Asakusa Framework
* Hadoop, Spark
* http://www.asakusafw.com/
-
13
Stream API, Spark, AsakusaFW
-
14
Java8 Stream API
    MyOperator operator = new MyOperator();
    Stream<Data> s0 = ...; // input source not shown on the slide
    Stream<Data> s1 = s0.filter(operator::f);
    Stream<Data> s2 = s1.map(operator::m);
    List<Data> out1 = s2.collect(Collectors.toList());
-
15
Java8 Stream API: MyOperator
    public class MyOperator {
        public boolean f(Data data) {
            return data.getValue() % 2 == 0;
        }
        public Data m(Data data) {
            return new Data(data.getValue() + 1);
        }
    }
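The two slides above can be combined into one runnable program. This is only a sketch: the `Data` class and the integer input are assumptions, since the slides show nothing of `Data` beyond `getValue()` and a one-argument constructor, and they leave the input source blank.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamPipelineExample {
    /** Hypothetical data type; only getValue() and the constructor appear on the slides. */
    static class Data {
        private final int value;
        Data(int value) { this.value = value; }
        int getValue() { return value; }
    }

    /** Same logic as MyOperator on the slide. */
    static class MyOperator {
        boolean f(Data data) { return data.getValue() % 2 == 0; }
        Data m(Data data) { return new Data(data.getValue() + 1); }
    }

    /** Runs s0 -> filter(f) -> map(m) -> out1 and returns the resulting values. */
    static List<Integer> run(List<Integer> input) {
        MyOperator operator = new MyOperator();
        Stream<Data> s0 = input.stream().map(Data::new); // assumed input source
        Stream<Data> s1 = s0.filter(operator::f);
        Stream<Data> s2 = s1.map(operator::m);
        return s2.map(Data::getValue).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4))); // prints [3, 5]
    }
}
```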
-
16
Java8 Stream API
    MyOperator operator = new MyOperator();
    Stream<Data> s0 = ...; // input source not shown on the slide
    Stream<Data> s1 = s0.filter(operator::f);
    Stream<Data> s2 = s1.map(operator::m);
    List<Data> out1 = s2.collect(Collectors.toList());
DAG
-
17
DAG (1/2)
* ER
-
18
DAG (2/2)
Directed Acyclic Graph
-
19
Java8 Stream API
    MyOperator operator = new MyOperator();
    Stream<Data> s0 = ...; // input source not shown on the slide
    Stream<Data> s1 = s0.filter(operator::f);
    Stream<Data> s2 = s1.map(operator::m);
    List<Data> out1 = s2.collect(Collectors.toList());
DAG: s0 -> filter (f) -> map (m) -> out1
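One way to see that the Stream code only describes a graph is to log when each operator actually runs. The sketch below is an illustration, not something from the slides: the lambdas and the log list are made up. It relies on the documented laziness of intermediate operations, which do nothing until the terminal `collect` is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyDagExample {
    /** Builds s0 -> filter -> map, then runs it, recording the order of events. */
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream<Integer> s2 = Stream.of(1, 2, 3, 4)
                .filter(v -> { log.add("filter " + v); return v % 2 == 0; })
                .map(v -> { log.add("map " + v); return v + 1; });
        log.add("graph built");        // no operator has run yet
        List<Integer> out1 = s2.collect(Collectors.toList()); // terminal operation
        log.add("result " + out1);
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println); // first line printed is "graph built"
    }
}
```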
-
20
Scala
    val operator = new MyOperator
    val s0 : Stream[Data] = ... // input source not shown on the slide
    val s1 = s0.filter(operator.f)
    val s2 = s1.map(operator.m)
    val out1 = s2.toSeq
DAG: s0 -> filter (f) -> map (m) -> out1
-
21
Scala: MyOperator
    class MyOperator {
      def f(data: Data) : Boolean =
        data.getValue() % 2 == 0
      def m(data: Data) : Data =
        Data(data.getValue() + 1)
    }
-
22
Apache Spark
    val sc = new SparkContext()
    val operator = new MyOperator
    val s0 : RDD[Data] = sc. ... // input method not shown on the slide
    val s1 = s0.filter(operator.f)
    val s2 = s1.map(operator.m)
    s2.saveAsTextFile() // output path not shown on the slide
(MyOperator: Scala)
DAG: s0 -> filter (f) -> map (m) -> out1
-
23
Asakusa Framework
    In<Data> s0 = ...;   // input
    Out<Data> out1 = ...; // output
    MyOperatorFactory operator = new MyOperatorFactory();
    Source<Data> s1 = operator.f(s0).out;
    Source<Data> s2 = operator.m(s1).out;
    out1.add(s2);
DAG: s0 -> @Branch (f) -> @Update (m) -> out1
-
24
Asakusa Framework: MyOperatorFactory
    public abstract class MyOperator {
        @Branch
        public Filter f(Data data) {
            return (data.getValue() % 2 == 0) ? Filter.OUT : Filter.MISSED;
        }
        @Update
        public void m(Data data) {
            data.setValue(data.getValue() + 1);
        }
    }
MyOperatorFactory
-
25
DAG
Stream API / Scala / Spark: s0 -> filter (f) -> map (m) -> out1
Asakusa Framework:          s0 -> @Branch (f) -> @Update (m) -> out1
-
26
Stream API, Spark, AsakusaFW
-
27
1: union, join, zip
[Diagram: s0 and s1 merge into out1]
-
28
1: union
Stream API
    Stream out = Stream.concat(Stream.concat(s0, s1), s2);
Spark
    val out = s0 ++ s1 ++ s2
AsakusaFW
    Source out = core.confluent(s0, s1, s2);
Example
    s0:  1,abc / 2,def
    s1:  1,foo / 3,bar
    out: 1,abc / 2,def / 1,foo / 3,bar
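The union can be tried with the sample rows from this slide. A minimal sketch, assuming each row is kept as a plain string; the `UnionExample` class and its method are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnionExample {
    /** Concatenates two inputs in order, like Stream.concat on the slide. */
    static List<String> union(List<String> s0, List<String> s1) {
        return Stream.concat(s0.stream(), s1.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> s0 = Arrays.asList("1,abc", "2,def");
        List<String> s1 = Arrays.asList("1,foo", "3,bar");
        union(s0, s1).forEach(System.out::println); // four rows, s0 first, then s1
    }
}
```

`Stream.concat` takes exactly two streams, which is why the slide nests it to merge three.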
-
29
1: join
Stream API
Spark
    val out = s0.join(s1)
AsakusaFW
    Source out = operator.join(s0, s1).joined; // @MasterJoin
Example
    s0:  1,abc / 2,def
    s1:  1,foo / 3,bar
    out: 1,abc,foo
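The Stream API has no join operation of its own, so a join has to be written by hand. The following is one possible sketch, not the speaker's code: `Row` is a hypothetical key/value type, and the right-hand side is first grouped into a map, which only works if it fits in memory.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class JoinExample {
    /** Hypothetical key/value row used for illustration. */
    static class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    /** Inner join on key: emits "key,left,right" for every matching pair. */
    static List<String> join(List<Row> s0, List<Row> s1) {
        Map<Integer, List<Row>> right =
                s1.stream().collect(Collectors.groupingBy(r -> r.key));
        return s0.stream()
                .flatMap(l -> right.getOrDefault(l.key, Collections.<Row>emptyList()).stream()
                        .map(r -> l.key + "," + l.value + "," + r.value))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> s0 = Arrays.asList(new Row(1, "abc"), new Row(2, "def"));
        List<Row> s1 = Arrays.asList(new Row(1, "foo"), new Row(3, "bar"));
        System.out.println(join(s0, s1)); // prints [1,abc,foo]
    }
}
```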
-
30
1: cogroup
Stream API
Spark
    val out = s0.cogroup(s1)
AsakusaFW
    Source out = operator.group(s0, s1).out; // @CoGroup
Example
    s0:  1,abc / 2,def
    s1:  1,foo / 3,bar
    out: 1,abc,foo / 2,def,null / 3,null,bar
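A cogroup can also be approximated by hand in the Stream API. This is a rough sketch under assumptions: `Row` is a hypothetical type, both inputs are grouped in memory, and each side is printed as a list of values per key (an empty list where the slide shows `null`).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.stream.Collectors;

class CoGroupExample {
    /** Hypothetical key/value row used for illustration. */
    static class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    /** Groups the values of one input by key. */
    static Map<Integer, List<String>> valuesByKey(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> r.key, Collectors.mapping(r -> r.value, Collectors.toList())));
    }

    /** For every key on either side, emits "key,[left values],[right values]". */
    static List<String> cogroup(List<Row> s0, List<Row> s1) {
        Map<Integer, List<String>> left = valuesByKey(s0);
        Map<Integer, List<String>> right = valuesByKey(s1);
        TreeSet<Integer> keys = new TreeSet<>(left.keySet()); // sorted union of keys
        keys.addAll(right.keySet());
        return keys.stream()
                .map(k -> k + "," + left.getOrDefault(k, Collections.<String>emptyList())
                        + "," + right.getOrDefault(k, Collections.<String>emptyList()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> s0 = Arrays.asList(new Row(1, "abc"), new Row(2, "def"));
        List<Row> s1 = Arrays.asList(new Row(1, "foo"), new Row(3, "bar"));
        cogroup(s0, s1).forEach(System.out::println);
        // 1,[abc],[foo]
        // 2,[def],[]
        // 3,[],[bar]
    }
}
```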
-
31
1: zip
Stream API
Spark
    val out = s0.zip(s1)
AsakusaFW
Example
    s0:  1,abc / 2,def
    s1:  1,foo / 3,bar
    out: 1,abc,1,foo / 2,def,3,bar
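`java.util.stream.Stream` has no zip method either. One common workaround, sketched here as an assumption rather than anything shown on the slide, is to pair elements by index over two lists. Truncating to the shorter input is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ZipExample {
    /** Pairs the i-th element of s0 with the i-th element of s1. */
    static List<String> zip(List<String> s0, List<String> s1) {
        int n = Math.min(s0.size(), s1.size()); // assumption: stop at the shorter input
        return IntStream.range(0, n)
                .mapToObj(i -> s0.get(i) + "," + s1.get(i))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> s0 = Arrays.asList("1,abc", "2,def");
        List<String> s1 = Arrays.asList("1,foo", "3,bar");
        zip(s0, s1).forEach(System.out::println);
        // 1,abc,1,foo
        // 2,def,3,bar
    }
}
```

Unlike join, zip pairs rows by position, so the keys on the two sides need not match (the second output row combines keys 2 and 3).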
-
32
2: duplicate
[Diagram: s0 feeds both out1 (1) and out2 (2)]
-
33
2: duplicate
Stream API
Spark
    val out1 = s0.map(operator.m1)
    val out2 = s0.map(operator.m2)
AsakusaFW
    Source out1 = operator.m1(s0).out;
    Source out2 = operator.m2(s0).out;
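In the Stream API a stream can be consumed only once, so the Spark-style pattern of reading `s0` twice does not carry over directly. The sketch below illustrates this and one workaround, creating a fresh stream from a list for each output. The two lambdas standing in for `m1` and `m2` are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DuplicateExample {
    /** Returns true if using the same Stream object twice is rejected. */
    static boolean reuseFails() {
        Stream<Integer> s0 = Stream.of(1, 2, 3);
        s0.map(v -> v + 1).collect(Collectors.toList());
        try {
            s0.map(v -> v * 2).collect(Collectors.toList()); // second use of s0
            return false;
        } catch (IllegalStateException e) {
            return true; // stream has already been operated upon
        }
    }

    /** Workaround: open a new stream over the source for each output. */
    static List<List<Integer>> duplicate(List<Integer> source) {
        List<Integer> out1 = source.stream().map(v -> v + 1).collect(Collectors.toList()); // stand-in for m1
        List<Integer> out2 = source.stream().map(v -> v * 2).collect(Collectors.toList()); // stand-in for m2
        return Arrays.asList(out1, out2);
    }

    public static void main(String[] args) {
        System.out.println(reuseFails());                     // prints true
        System.out.println(duplicate(Arrays.asList(1, 2, 3))); // prints [[2, 3, 4], [2, 4, 6]]
    }
}
```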
-
34
3: branch
[Diagram: s0 splits into out1 and out2]
-
35
3: branch
Stream API
Spark
AsakusaFW
    // @Branch
    Branch result = operator.branch(s0);
    Source out1 = result.out1;
    Source out2 = result.out2;
    Source out3 = result.out3;
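The closest Stream API counterpart to a three-way branch is probably `Collectors.groupingBy`, which routes each element to one of several buckets. This is only a rough analogue: the results are materialized lists rather than downstream flows, and the `Target` enum and the modulo rule are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BranchExample {
    /** Hypothetical branch targets, mirroring out1/out2/out3 on the slide. */
    enum Target { OUT1, OUT2, OUT3 }

    /** Hypothetical routing rule: decide the target from the value. */
    static Target target(int v) {
        switch (v % 3) {
            case 0: return Target.OUT1;
            case 1: return Target.OUT2;
            default: return Target.OUT3;
        }
    }

    /** Splits the input into one list per target. */
    static Map<Target, List<Integer>> branch(List<Integer> s0) {
        return s0.stream().collect(Collectors.groupingBy(BranchExample::target));
    }

    public static void main(String[] args) {
        Map<Target, List<Integer>> result = branch(Arrays.asList(1, 2, 3, 4, 5, 6));
        System.out.println(result.get(Target.OUT1)); // prints [3, 6]
        System.out.println(result.get(Target.OUT2)); // prints [1, 4]
        System.out.println(result.get(Target.OUT3)); // prints [2, 5]
    }
}
```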
-
36
Stream API   List, Stream   Stream
Scala
Stream
Spark Scala
AsakusaFW   Hadoop, Spark
Hadoop, Spark
-
37
Asakusa Framework (1/4)
Hadoop
Hadoop
Spark
-
38
Asakusa Framework (2/4)
Asakusa Framework, jar, Hadoop, MapReduce
Spark
-
39
Asakusa Framework (3/4)
1. Hadoop
   * Hadoop 180
   * 4550
2. Hadoop
   * 1015
3. Spark
   * 34
-
40
Asakusa Framework (3/4): DAG
[Diagram: operators @Convert, @CoGroup, @Summarize, @CoGroup, @MJoinUpdate, @MJoinUpdate; label "1 255"]
-
41
Asakusa Framework (4/4)
Spark
AsakusaFW
-
42
Hadoop, HDD
HDD
-
43
HDD, SSD, HDD
CPU 100
MRAM
-
44
Java8 Stream API
parallel()
Apache Spark: executor, Spark
Asakusa Framework
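For the Stream API entry, `parallel()` switches a pipeline to multi-threaded execution inside a single JVM; it does not distribute work across machines. A small sketch (the numeric pipeline is an assumption chosen for illustration) showing that the same filter and map give the same result either way:

```java
import java.util.stream.IntStream;

class ParallelExample {
    /** Sums (v + 1) over the even numbers in 1..1000, sequentially or in parallel. */
    static long run(boolean parallel) {
        IntStream s0 = IntStream.rangeClosed(1, 1000);
        if (parallel) {
            s0 = s0.parallel(); // same pipeline, executed on multiple threads
        }
        return s0.filter(v -> v % 2 == 0)
                 .map(v -> v + 1)
                 .asLongStream()
                 .sum();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints 251000
        System.out.println(run(true));  // prints 251000
    }
}
```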
-
45
-
46
DAG
AsakusaFW, @Update, @MasterJoin, map, filter, Stream API, Stream API, AsakusaFW
AsakusaFW, Spark, AsakusaFW
-
47
http://www.adventar.org/calendars/1166 AsakusaFW
DQ10