Flink Batch Processing and Iterations


Page 1: Flink Batch Processing and Iterations

Batch Processing using Apache Flink
By Sameer Wadkar

Page 2: Flink Batch Processing and Iterations

Flink API

DataSet

• Input is in the form of files or collections (unit testing)

• Results of transformations are returned as sinks, which may be files, the command-line terminal, or collections (unit testing)

DataStream

• Similar to DataSet but applies to streaming data

Table

• SQL-like expression language embedded in Java/Scala

• Instead of working with DataSet or DataStream, use the Table abstraction
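The DataSet bullets above can be exercised end to end in a few lines. A minimal sketch, assuming a local Flink setup (the class name and sample strings are illustrative):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CollectionSourceSketch {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // Input from an in-memory collection (handy for unit testing)
    DataSet<String> lines = env.fromElements("to be", "or not to be");
    // print() acts as a sink and triggers execution of the batch job
    lines.print();
  }
}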

Page 3: Flink Batch Processing and Iterations

Source Code

Source code for the examples presented can be downloaded from:

https://github.com/sameeraxiomine/FlinkMeetup

Page 4: Flink Batch Processing and Iterations

Flink DataSet API – Word Count

public class WordCount {
  public static void main(String[] args) throws Exception {
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<String> text = getLines(env); // Create DataSet from lines in file
    DataSet<Tuple2<String, Integer>> wordCounts = text
        .flatMap(new LineSplitter())
        .groupBy(0) // Group by first element of the Tuple
        .aggregate(Aggregations.SUM, 1);
    wordCounts.print(); // Execute the WordCount job
  }

  /* FlatMap implementation which converts each line to many <Word,1> pairs */
  public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
    @Override
    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
      for (String word : line.split(" ")) {
        out.collect(new Tuple2<String, Integer>(word, 1));
      }
    }
  }
}

Source Code - https://github.com/sameeraxiomine/FlinkMeetup/blob/master/src/main/java/org/apache/flink/examples/WordCount.java
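The getLines(env) helper is not shown on the slide. A plausible sketch (the file path is illustrative; fromElements is the collection-based alternative for unit testing):

public static DataSet<String> getLines(ExecutionEnvironment env) {
  // One String per line of the input file (path is illustrative)
  return env.readTextFile("/tmp/wordcount-input.txt");
  // For unit testing: return env.fromElements("to be or not to be");
}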

Page 5: Flink Batch Processing and Iterations

Flink Batch API (Table API)

public class WordCountUsingTableAPI {
  public static void main(String[] args) throws Exception {
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    TableEnvironment tableEnv = new TableEnvironment();
    DataSet<Word> words = getWords(env);
    Table table = tableEnv.fromDataSet(words);
    Table filtered = table
        .groupBy("word")
        .select("word.count as wrdCnt, word")
        .filter("wrdCnt = 2");
    DataSet<Word> result = tableEnv.toDataSet(filtered, Word.class);
    result.print();
  }

  public static DataSet<Word> getWords(ExecutionEnvironment env) {
    // Return DataSet of Word
  }

  public static class Word {
    public String word;
    public int wrdCnt;

    public Word(String word, int wrdCnt) {
      this.word = word;
      this.wrdCnt = wrdCnt;
    }

    public Word() {} // empty constructor to satisfy POJO requirements

    @Override
    public String toString() {
      return "Word [word=" + word + ", count=" + wrdCnt + "]";
    }
  }
}

Source Code - https://github.com/sameeraxiomine/FlinkMeetup/blob/master/src/main/java/org/apache/flink/examples/WordCountUsingTableAPI.java
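The slide elides the body of getWords. One plausible sketch, with illustrative data chosen so that "to" and "be" each occur twice and therefore survive the wrdCnt = 2 filter:

public static DataSet<Word> getWords(ExecutionEnvironment env) {
  // Hypothetical in-memory input; any source of Word POJOs would do.
  // The initial wrdCnt values are placeholders; the query recomputes the counts.
  return env.fromElements(
      new Word("to", 1), new Word("be", 1), new Word("or", 1),
      new Word("not", 1), new Word("to", 1), new Word("be", 1));
}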

Page 6: Flink Batch Processing and Iterations

Table API – How it works

Table filtered = table
    .groupBy("word")
    .select("word, word.count as wrdCnt") // count(word)
    .filter("wrdCnt = 2");
DataSet<Word> result = tableEnv.toDataSet(filtered, Word.class);
...
public static DataSet<Word> getWords(ExecutionEnvironment env) {
  // Return DataSet of Word
}
public static class Word {
  public String word;
  public int wrdCnt;
  ...
}

1. Group by Word.word, count the words (word.count as wrdCnt), and emit word, wrdCnt
2. Filter words with wrdCnt == 2
3. Transform to DataSet<Word> using reflection

Page 7: Flink Batch Processing and Iterations

Iterative Algorithm

[Figure: generic iterative algorithm]
1. Read the input data
2. Run one iteration over it
3. Check "Continue?"
4. If yes, update the input and feed it back into the next iteration
5. If no, write the output: the result of the last iteration
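In driver-program form the five steps above reduce to a plain loop. A minimal, self-contained Java sketch (the increment step and the sum threshold are illustrative stand-ins for a real step function and convergence check):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class GenericIterationLoop {
  public static void main(String[] args) {
    List<Long> data = Arrays.asList(1L, 2L, 3L);  // 1. Read the input
    while (true) {
      // 2. Iteration: apply the step function to every element
      data = data.stream().map(x -> x + 1).collect(Collectors.toList());
      // 3. Continue? Here: stop once the sum exceeds a threshold
      long sum = data.stream().mapToLong(Long::longValue).sum();
      if (sum > 100) break;
      // 4. The updated data feeds back into the next iteration
    }
    System.out.println(data);  // 5. Result of the last iteration
  }
}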

Page 8: Flink Batch Processing and Iterations

Iterative Algorithm - MapReduce

[Figure: iterative algorithm on MapReduce]
1. Read the input data from HDFS
2. Run one iteration as a MapReduce job
3. Check "Continue?" by inspecting job counters or by running another MapReduce job
4. If yes, update the input (written back to HDFS) and start the next iteration
5. If no, the result of the last iteration is written to HDFS

Page 9: Flink Batch Processing and Iterations

Iterative Algorithm - Spark

[Figure: iterative algorithm on Spark]
1. Read the input data from HDFS into an RDD
2. Run one iteration; update the RDD and cache it
3. Check "Continue?" via a Spark action or by checking counters (each check starts a job)
4. If yes, the updated, cached RDD feeds the next iteration
5. If no, write the result of the last iteration to disk (HDFS)

Page 10: Flink Batch Processing and Iterations

Iterative Algorithm - Flink

[Figure: iterative algorithm on Flink]
1. Read the input data as a DataSet
2. Run the iteration inside a single job, as an IterativeDataSet or a DeltaIteration
3. Check "Continue?" inside the job, using an Aggregator with a ConvergenceCriterion
4. If yes, the new input data is fed back to the next iteration (pipelined)
5. If no, write the resulting DataSet to disk

Page 11: Flink Batch Processing and Iterations

Batch Processing - Iterator Operators

• Iterative algorithms are commonly used in:
  • Graph processing
  • Machine learning – Bayesian methods, numerical solutions, optimization algorithms

• Accumulators can be used as job-level counters (see the sketch below)

• Aggregators are used as iteration-level counters:
  • They are reset at the end of each iteration
  • A convergence criterion can be specified to exit the loop (iterative process)
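A minimal sketch of the job-level counter idea (the accumulator name and the surrounding job are illustrative): a RichMapFunction registers a LongCounter, and its final, job-wide value is read from the JobExecutionResult after execution:

import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.DiscardingOutputFormat;
import org.apache.flink.configuration.Configuration;

public class AccumulatorSketch {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.fromElements("a", "b", "c")
       .map(new RichMapFunction<String, String>() {
         private final LongCounter numLines = new LongCounter();
         @Override
         public void open(Configuration parameters) {
           // Register the job-level counter
           getRuntimeContext().addAccumulator("num-lines", numLines);
         }
         @Override
         public String map(String value) {
           numLines.add(1L); // guaranteed accurate only after the job ends
           return value;
         }
       })
       .output(new DiscardingOutputFormat<String>());
    JobExecutionResult result = env.execute("Accumulator sketch");
    System.out.println("Lines: " + result.getAccumulatorResult("num-lines"));
  }
}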

Page 12: Flink Batch Processing and Iterations

Bulk Iterations vs Delta Iterations

• Bulk Iterations are appropriate when the entire dataset is consumed in each iteration.
  • Example – K-Means clustering algorithm

• Delta Iterations exploit the following features:
  • Each iteration processes only a subset of the full DataSet
  • The working dataset becomes smaller with each iteration, allowing subsequent iterations to run faster
  • Example – Graph processing (propagate the minimum in a graph)

Page 13: Flink Batch Processing and Iterations

Bulk Iteration – Toy Example

• Consider a DataSet<Long> of random numbers from 0-99. This DataSet can be arbitrarily large.
• Each number needs to be incremented simultaneously.
• Stop when, at the end of an iteration, the sum of all numbers exceeds an arbitrary but user-defined value (e.g., noOfElements * 20000).

[Figure: the input dataset of numbers is incremented element-wise each iteration (i1+1, i2+1, i3+1, ..., in+1); if the sum of all numbers is not yet greater than N, the dataset feeds back into the next iteration, otherwise the loop ends]

Page 14: Flink Batch Processing and Iterations

Bulk Iteration – Sample Dataset of 5 elements

Initial Dataset      Final Dataset
<46,46>              <46,19999>
<32,32>              <32,19985>
<48,48>              <48,20001>
<39,39>              <39,19992>
<73,73>              <73,20026>
Initial Total = 238  Final Total = 100,003

• A DataSet<Tuple2<Long,Long>> is used as input, where the first element is the key and the second element is incremented in each iteration

• The iteration stops once the sum of all the second elements of the Tuple2s exceeds 100,000

Page 15: Flink Batch Processing and Iterations

Bulk Iteration – Solution

• Solution highlights:
  • Cannot use counters (Accumulators) to determine when to stop. Accumulators are guaranteed to be accurate only at the end of the program.
  • Aggregators are used at the end of each iteration to verify the terminating condition.

• Source Code - https://github.com/sameeraxiomine/FlinkMeetup/blob/master/src/main/java/org/apache/flink/examples/AdderBulkIterations.java

Page 16: Flink Batch Processing and Iterations

Bulk Iteration – Implementation

[Figure: bulk iteration implementation]
1. Input: <46,46> <32,32> <48,48> <39,39> <73,73>
2. Step function (add 1), executed as parallel map tasks; iterate at most 100,000 times
3. Check for the terminating condition (synchronize); feedback to the next iteration
4. Output: <46,19999> <32,19985> <48,20001> <39,19992> <73,20026>

Terminates after 19,953 iterations.

Page 17: Flink Batch Processing and Iterations

Bulk Iteration – Source Code

public static void main(String[] args) throws Exception {
  final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

  // First create an initial dataset
  IterativeDataSet<Tuple2<Long, Long>> initial = getData(env).iterate(MAX_ITERATIONS);

  // Register the Aggregator and the Convergence Criterion class
  initial.registerAggregationConvergenceCriterion("total",
      new LongSumAggregator(), new VerifyIfMaxConvergence());

  // Iterate
  DataSet<Tuple2<Long, Long>> iteration = initial.map(
      new RichMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>>() {
        private LongSumAggregator agg = null;

        @Override
        public void open(Configuration parameters) {
          this.agg = this.getIterationRuntimeContext().getIterationAggregator("total");
        }

        @Override
        public Tuple2<Long, Long> map(Tuple2<Long, Long> input) throws Exception {
          long incrementF1 = input.f1 + 1;
          Tuple2<Long, Long> out = new Tuple2<>(input.f0, incrementF1);
          this.agg.aggregate(out.f1);
          return out;
        }
      });

  DataSet<Tuple2<Long, Long>> finalDs = initial.closeWith(iteration); // Close the iteration
  finalDs.print(); // Consume the output
}

public static class VerifyIfMaxConvergence implements ConvergenceCriterion<LongValue> {
  @Override
  public boolean isConverged(int iteration, LongValue value) {
    return (value.getValue() > AdderBulkIterations.ABSOLUTE_MAX);
  }
}
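The getData(env) helper is not shown. A plausible sketch, to sit alongside main in the AdderBulkIterations class, mirroring the five-element sample dataset from the earlier slide (a real job would generate an arbitrarily large dataset of random numbers from 0-99):

public static DataSet<Tuple2<Long, Long>> getData(ExecutionEnvironment env) {
  // <key, value> pairs; the value (f1) is incremented in each iteration
  return env.fromElements(
      new Tuple2<>(46L, 46L), new Tuple2<>(32L, 32L), new Tuple2<>(48L, 48L),
      new Tuple2<>(39L, 39L), new Tuple2<>(73L, 73L));
}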

Page 18: Flink Batch Processing and Iterations

Bulk Iteration – Steps

Step 1 – Create the initial dataset (IterativeDataSet) and define the max iterations:

  IterativeDataSet<Tuple2<Long, Long>> initial = getData(env).iterate(MAX_ITERATIONS);

Step 2 – Register the convergence criterion:

  initial.registerAggregationConvergenceCriterion("total", new LongSumAggregator(), new VerifyIfMaxConvergence());

Step 3 – Execute the iterations, updating the aggregator and checking for convergence at the end of each iteration:

  DataSet<Tuple2<Long, Long>> iteration = initial.map(new RichMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>>() {
    ... return new Tuple2<>(input.f0, input.f1 + 1); ...
  });

  class VerifyIfMaxConvergence implements ConvergenceCriterion<LongValue> {
    public boolean isConverged(int iteration, LongValue value) {
      return (value.getValue() > AdderBulkIterations.ABSOLUTE_MAX);
    }
  }

Step 4 – End the iteration by calling closeWith(DataSet) on the IterativeDataSet:

  DataSet<Tuple2<Long, Long>> finalDs = initial.closeWith(iteration);
  finalDs.print(); // Consume the results

Page 19: Flink Batch Processing and Iterations

Bulk Iteration – The Wrong Way

DataSet<Tuple2<Long, Long>> input = getData(env);
DataSet<Tuple2<Long, Long>> output = input;
for (int i = 0; i < MAX_ITERATIONS; i++) {
  output = input.map(new MapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>>() {
    public Tuple2<Long, Long> map(Tuple2<Long, Long> input) {
      return new Tuple2<>(input.f0, input.f1 + 1);
    }
  });
  // This is what slows down the iteration. A job starts immediately here
  long sum = output.map(new FixTuple2()).reduce(new ReduceFunc())
      .collect().get(0);
  input = output; // Prepare for the next iteration
  System.out.println("Current Sum=" + sum);
  if (sum > 100) {
    System.out.println("Breaking now:" + i);
    break;
  }
}
output.print();

• Flink cannot optimize across iterations because a full job executes immediately at the line: long sum = output.map(new FixTuple2()).reduce(new ReduceFunc()).collect().get(0);

• https://github.com/sameeraxiomine/FlinkMeetup/blob/master/src/main/java/org/apache/flink/examples/AdderBulkIterationsWrongWay.java
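FixTuple2 and ReduceFunc are referenced but never shown. Plausible sketches (using org.apache.flink.api.common.functions.MapFunction and ReduceFunction): a map that keeps only the incremented second field, and a reduce that sums those values:

public static class FixTuple2 implements MapFunction<Tuple2<Long, Long>, Long> {
  @Override
  public Long map(Tuple2<Long, Long> value) {
    return value.f1; // keep only the incremented second field
  }
}

public static class ReduceFunc implements ReduceFunction<Long> {
  @Override
  public Long reduce(Long a, Long b) {
    return a + b; // running sum of all second fields
  }
}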

Page 21: Flink Batch Processing and Iterations

Delta Iteration – Initial and Final Dataset

• Each event is represented as a Tuple2<Long,Long> instance:
  • Tuple2.f0 is the EventId
  • Tuple2.f1 is the ParentId

Vertex     Edge
<1,1>      <1,2>
<2,2>      <2,3>
<3,3>      <2,4>
<4,4>      <3,5>
<11,11>    <6,7>
<12,12>    <8,9>
<15,15>    <8,10>
<6,6>      <5,6>
<7,7>      <7,6>
<8,8>      <8,8>
<9,9>      <9,8>
<10,10>    <10,8>
<13,13>    <13,8>

Page 22: Flink Batch Processing and Iterations

Delta Iteration – Implementation

[Figure: delta iteration data flow]
1. The initial Workset and the initial SolutionSet feed the step function
2. The step function produces the next Workset and a delta for the SolutionSet
3. The next Workset is checked for convergence (or emptiness) and fed back
4. The SolutionSet is updated with the delta; at termination the final SolutionSet is the output

• The initial Workset and SolutionSet are identical
• Each iteration updates the SolutionSet and reduces the size of the Workset
• The iteration terminates when:
  • the maximum number of iterations is reached, or
  • the Workset fed back (3 in the figure above) is empty
• The SolutionSet at termination is the result of the iteration job

Page 23: Flink Batch Processing and Iterations

Delta Iteration – Working Set at end of Iteration 1

[Figure: graph of <vertex, parent> pairs at the end of iteration 1 – <1,1> <2,1> <3,2> <4,2> <5,3> <6,6> <7,7> <11,5> <12,11> <8,8> <9,8> <10,8> <13,10> <15,1>; grayed vertices have dropped off the working set]

Page 24: Flink Batch Processing and Iterations

Delta Iteration – Working Set at end of Iteration 2

[Figure: graph of <vertex, parent> pairs at the end of iteration 2 – <1,1> <2,1> <3,1> <4,1> <5,2> <6,6> <7,6> <11,3> <12,5> <8,8> <9,8> <10,8> <13,8> <15,1>; grayed vertices have dropped off the working set]

Page 25: Flink Batch Processing and Iterations

Delta Iteration – Working Set at end of Iteration 3

[Figure: graph of <vertex, parent> pairs at the end of iteration 3 – <1,1> <2,1> <3,1> <4,1> <5,1> <6,6> <7,6> <11,2> <12,3> <8,8> <9,8> <10,8> <13,8> <15,1>; grayed vertices have dropped off the working set]

Page 26: Flink Batch Processing and Iterations

Delta Iteration – Working Set at end of Iteration 4

[Figure: graph of <vertex, parent> pairs at the end of iteration 4 – <1,1> <2,1> <3,1> <4,1> <5,1> <6,6> <7,6> <11,1> <12,2> <8,8> <9,8> <10,8> <13,8> <15,1>; grayed vertices have dropped off the working set]

Page 27: Flink Batch Processing and Iterations

Delta Iteration – Working Set at end of Iteration 5

[Figure: graph of <vertex, parent> pairs at the end of iteration 5 – <1,1> <2,1> <3,1> <4,1> <5,1> <6,6> <7,6> <11,1> <12,1> <8,8> <9,8> <10,8> <13,8> <15,1>; grayed vertices have dropped off the working set]

Page 28: Flink Batch Processing and Iterations

Delta Iteration – Working Set at end of Iteration 6

[Figure: graph of <vertex, parent> pairs at the end of iteration 6 – <1,1> <2,1> <3,1> <4,1> <5,1> <6,6> <7,6> <11,1> <12,1> <8,8> <9,8> <10,8> <13,8> <15,1>; the working set is now empty]

Page 29: Flink Batch Processing and Iterations

Delta Iteration – At Scale

• Imagine a dataset of over 10 billion transactions with sub-graphs of average size 10
• That is a total of 1 billion sub-graphs
• Each iteration drops about 1 billion vertices
• The job is over in about 10 iterations
• It can be optimized further with a Tuple3, maintaining whether the root vertex id has been propagated to a vertex; this saves another iteration at the expense of increased storage requirements (see the sketch below)
• The first iteration drops 10% of the working set; by the end of the 5th iteration the working set drops by 20% relative to the working set at the beginning of that iteration. Iterations get progressively faster.
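A sketch of the Tuple3 variant suggested above; the field layout is an assumption, not the repo's implementation:

import org.apache.flink.api.java.tuple.Tuple3;

public class Tuple3VertexSketch {
  public static void main(String[] args) {
    // Hypothetical layout: f0 = vertex id, f1 = current root id,
    // f2 = whether the root id is already known to be final
    Tuple3<Long, Long, Boolean> vertex = new Tuple3<>(11L, 1L, true);
    // A step function could skip vertices with f2 == true, saving an
    // iteration at the cost of one extra field per vertex
    System.out.println(vertex);
  }
}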

Page 30: Flink Batch Processing and Iterations

Delta Iteration – Source Code

private static final int MAX_ITERATIONS = 10;

public static void main(String... args) throws Exception {
  // Set up the execution environment
  ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

  // Read vertex and edge data
  // Initially assign parent vertex id == my vertex id
  DataSet<Tuple2<Long, Long>> vertices = GraphData.getDefaultVertexDataSet(env);
  DataSet<Tuple2<Long, Long>> edges = GraphData.getDefaultEdgeDataSet(env);

  int vertexIdIndex = 0;
  // Open a delta iteration
  DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> iteration =
      vertices.iterateDelta(vertices, MAX_ITERATIONS, vertexIdIndex);

  // Apply the step logic: join with the edges,
  // update if the component of the candidate is smaller
  DataSet<Tuple2<Long, Long>> changes = iteration.getWorkset()
      .join(edges).where(0).equalTo(0)
      /* Update the parentVertex = parent.id */
      .with(new NeighborWithComponentIDJoin())
      /* Merge with the solution set */
      .join(iteration.getSolutionSet()).where(0).equalTo(0)
      /* Only pass on the changes to the next iteration */
      .with(new ComponentIdFilter());

  // Close the delta iteration (delta and new workset are identical)
  DataSet<Tuple2<Long, Long>> result = iteration.closeWith(changes, changes);
  result.print();
}
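GraphData is a helper class from the example repo whose source is not shown here. A plausible sketch that returns the vertex and edge tuples listed on the next slide:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class GraphData {
  public static DataSet<Tuple2<Long, Long>> getDefaultVertexDataSet(ExecutionEnvironment env) {
    // Initially every vertex is its own root: <vertexId, vertexId>
    return env.fromElements(
        new Tuple2<>(1L, 1L), new Tuple2<>(2L, 2L), new Tuple2<>(3L, 3L),
        new Tuple2<>(4L, 4L), new Tuple2<>(6L, 6L), new Tuple2<>(7L, 7L),
        new Tuple2<>(8L, 8L), new Tuple2<>(9L, 9L), new Tuple2<>(10L, 10L),
        new Tuple2<>(11L, 11L), new Tuple2<>(12L, 12L), new Tuple2<>(13L, 13L),
        new Tuple2<>(15L, 15L));
  }

  public static DataSet<Tuple2<Long, Long>> getDefaultEdgeDataSet(ExecutionEnvironment env) {
    // <parentId, receivingId> pairs
    return env.fromElements(
        new Tuple2<>(1L, 2L), new Tuple2<>(2L, 3L), new Tuple2<>(2L, 4L),
        new Tuple2<>(3L, 5L), new Tuple2<>(6L, 7L), new Tuple2<>(8L, 9L),
        new Tuple2<>(8L, 10L), new Tuple2<>(5L, 11L), new Tuple2<>(11L, 12L),
        new Tuple2<>(10L, 13L), new Tuple2<>(9L, 14L), new Tuple2<>(1L, 15L));
  }
}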

Page 31: Flink Batch Processing and Iterations

Delta Iteration – Read Vertices and Edges

// Read vertex and edge data
// Initially assign parent vertex id == my vertex id
DataSet<Tuple2<Long, Long>> vertices = GraphData.getDefaultVertexDataSet(env);
DataSet<Tuple2<Long, Long>> edges = GraphData.getDefaultEdgeDataSet(env);

Vertex     Edge
<1,1>      <1,2>
<2,2>      <2,3>
<3,3>      <2,4>
<4,4>      <3,5>
<11,11>    <6,7>
<12,12>    <8,9>
<15,15>    <8,10>
<6,6>      <5,11>
<7,7>      <11,12>
<8,8>      <10,13>
<9,9>      <9,14>
<10,10>    <1,15>
<13,13>

Vertex – Tuple2<Long,Long>: f0 – Vertex Id, f1 – Root Id
Edge – Tuple2<Long,Long>: f0 – Parent Id, f1 – Receiving Id

Page 32: Flink Batch Processing and Iterations

Delta Iteration – Initiate the Delta Iteration

int vertexIdIndex = 0; // Why does this need to be passed to the iterateDelta function?
// Open a delta iteration
DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> iteration =
    vertices.iterateDelta(vertices, MAX_ITERATIONS, vertexIdIndex);

• After each iteration, during the merge step, only the delta solution set is shuffled

• The elements of the delta solution set end up on the same nodes as the initial solution set and are merged there (always join on the keys)

• This is considerably cheaper than shuffling both the delta solution set and the full solution set

• As the delta solution set shrinks, this optimization reaps increasingly higher performance benefits in subsequent iteration steps

[Figure: at the start, the initial solution set is partitioned by the key indices and each partition is cached; the solution set is never shuffled again]

Page 33: Flink Batch Processing and Iterations

Delta Iteration – Step Clause (Step 1)

DataSet<Tuple2<Long, Long>> changes = iteration.getWorkset()
    .join(edges).where(0).equalTo(0)
    /* Update the parentVertex */
    .with(new NeighborWithComponentIDJoin());

public static final class NeighborWithComponentIDJoin implements
    JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>> {
  @Override
  public Tuple2<Long, Long> join(Tuple2<Long, Long> vertexWithComponent,
                                 Tuple2<Long, Long> edge) {
    return new Tuple2<Long, Long>(edge.f1, vertexWithComponent.f1);
  }
}

[Example: vertex <1,1> (v.f0 = 1, v.f1 = 1) joins edge <1,2> (e.f0 = 1, e.f1 = 2) on v.f0 = e.f0; applying with(NeighborWithComponentIDJoin) emits the new vertex <2,1>. This shows how event 2 gets a new parent id.]

Page 34: Flink Batch Processing and Iterations

Delta Iteration – Merge With the Solution Set

changes = ...
    .with(new NeighborWithComponentIDJoin())
    .join(iteration.getSolutionSet()).where(0).equalTo(0)
    .with(new ComponentIdFilter());

// Close with the delta solution set and the new working set.
// Both are equal to the changes variable
DataSet<Tuple2<Long, Long>> result = iteration.closeWith(changes, changes);

public static final class ComponentIdFilter implements FlatJoinFunction {
  public void join(..) {
    if (candidate.f1 < old.f1) {
      out.collect(candidate);
    }
  }
}
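The slide elides ComponentIdFilter's generics and join signature. The completed class would plausibly be a FlatJoinFunction over the two vertex tuples (with org.apache.flink.util.Collector imported), emitting the candidate only when it improves on the current parent:

public static final class ComponentIdFilter implements
    FlatJoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>> {
  @Override
  public void join(Tuple2<Long, Long> candidate, Tuple2<Long, Long> old,
                   Collector<Tuple2<Long, Long>> out) {
    // Emit only if the candidate carries a smaller parent id
    if (candidate.f1 < old.f1) {
      out.collect(candidate);
    }
  }
}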

[Example: the step result <2,1> joins the initial solution set entry <2,2> on v.f0 = s.f0; applying with(ComponentIdFilter) passes <2,1> on (candidate parent 1 < old parent 2), so <2,1> enters the delta solution set and the new work set. The Flink runtime then merges the delta solution set with the solution set on the key indices, giving an updated solution set containing <1,1> and <2,1>. This shows how the parent ids of events 1 and 2 transition by the end of iteration 1; event id 1 does not make it past the step function. Remember – only the delta solution set is shuffled.]

Page 35: Flink Batch Processing and Iterations

Delta Iteration – End of Iteration 1

Initial Dataset /        Delta Solution Set /   Merged
Initial Solution Set     New Workset            Solution Set
<1,1>                    -                      <1,1>
<2,2>                    <2,1>                  <2,1>
<3,3>                    <3,2>                  <3,2>
<4,4>                    <4,2>                  <4,2>
<11,11>                  <11,5>                 <11,5>
<12,12>                  <12,11>                <12,11>
<15,15>                  <15,1>                 <15,1>
<6,6>                    -                      <6,6>
<7,7>                    <7,6>                  <7,6>
<8,8>                    -                      <8,8>
<9,9>                    <9,8>                  <9,8>
<10,10>                  <10,8>                 <10,8>
<13,13>                  <13,10>                <13,10>

Page 36: Flink Batch Processing and Iterations

Delta Iteration – End of Iteration 2

Working Set    Delta Solution Set /   Merged
               New Workset            Solution Set
-              -                      <1,1>
<2,1>          -                      <2,1>
<3,2>          <3,1>                  <3,1>
<4,2>          <4,1>                  <4,1>
<11,5>         <11,3>                 <11,3>
<12,11>        <12,5>                 <12,5>
<15,1>         -                      <15,1>
-              -                      <6,6>
<7,6>          -                      <7,6>
-              -                      <8,8>
<9,8>          -                      <9,8>
<10,8>         -                      <10,8>
<13,10>        <13,8>                 <13,8>

Page 37: Flink Batch Processing and Iterations

Delta Iteration – End of Iteration 3

Working Set    Delta Solution Set /   Merged
               New Workset            Solution Set
-              -                      <1,1>
-              -                      <2,1>
<3,1>          -                      <3,1>
<4,1>          -                      <4,1>
<11,3>         <11,2>                 <11,2>
<12,5>         <12,3>                 <12,3>
-              -                      <15,1>
-              -                      <6,6>
-              -                      <7,6>
-              -                      <8,8>
-              -                      <9,8>
-              -                      <10,8>
<13,8>         -                      <13,8>

Page 38: Flink Batch Processing and Iterations

Delta Iteration – End of Iteration 4

Working Set    Delta Solution Set /   Merged
               New Workset            Solution Set
-              -                      <1,1>
-              -                      <2,1>
-              -                      <3,1>
-              -                      <4,1>
<11,2>         <11,1>                 <11,1>
<12,3>         <12,2>                 <12,2>
-              -                      <15,1>
-              -                      <6,6>
-              -                      <7,6>
-              -                      <8,8>
-              -                      <9,8>
-              -                      <10,8>
-              -                      <13,8>

Page 39: Flink Batch Processing and Iterations

Delta Iteration – End of Iteration 5

Working Set    Delta Solution Set /   Merged
               New Workset            Solution Set
-              -                      <1,1>
-              -                      <2,1>
-              -                      <3,1>
-              -                      <4,1>
<11,1>         -                      <11,1>
<12,2>         <12,1>                 <12,1>
-              -                      <15,1>
-              -                      <6,6>
-              -                      <7,6>
-              -                      <8,8>
-              -                      <9,8>
-              -                      <10,8>
-              -                      <13,8>

Page 40: Flink Batch Processing and Iterations

Delta Iteration – End of Iteration 6

Working Set    Delta Solution Set /   Merged
               New Workset            Solution Set
-              -                      <1,1>
-              -                      <2,1>
-              -                      <3,1>
-              -                      <4,1>
-              -                      <11,1>
<12,1>         -                      <12,1>
-              -                      <15,1>
-              -                      <6,6>
-              -                      <7,6>
-              -                      <8,8>
-              -                      <9,8>
-              -                      <10,8>
-              -                      <13,8>

Page 41: Flink Batch Processing and Iterations

Delta Iteration – Final Solution Set (the Workset is empty, so the iteration terminates)

Merged Solution Set:
<1,1>  <2,1>  <3,1>  <4,1>  <11,1>  <12,1>  <15,1>
<6,6>  <7,6>  <8,8>  <9,8>  <10,8>  <13,8>