Concurrent Stream Processing
Alex Miller - @puredanger
Revelytix - http://revelytix.com
Contents
• Query execution - the problem
• Plan representation - plans in our program
• Processing components - building blocks
• Processing execution - executing plans
Query Execution
Relational Data & Queries
SELECT NAME
FROM PERSON
WHERE AGE > 20
NAME AGE
Joe 30
RDF
"Resource Description Framework" - a fine-grained graph representation of data
[Graph: http://data/Joe --http://demo/age--> 30; http://data/Joe --http://demo/name--> "Joe"]
Subject Predicate Object
http://data/Joe http://demo/age 30
http://data/Joe http://demo/name "Joe"
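These triples can be held in Clojure as plain data. A minimal sketch (representation and helper names hypothetical, URIs matching the table above) that answers the same question the upcoming SPARQL query asks:

```clojure
;; Hypothetical in-memory form of the triples above: [subject predicate object]
(def triples
  [["http://data/Joe" "http://demo/age"  30]
   ["http://data/Joe" "http://demo/name" "Joe"]])

;; Names of all subjects whose age is over 20
(defn names-of-adults [triples]
  (let [ages  (into {} (for [[s p o] triples :when (= p "http://demo/age")]  [s o]))
        names (into {} (for [[s p o] triples :when (= p "http://demo/name")] [s o]))]
    (for [[s age] ages :when (> age 20)]
      (names s))))

;; (names-of-adults triples) => ("Joe")
```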
SPARQL queries
SPARQL is a query language for RDF
PREFIX demo: <http://demo/>
SELECT ?name
WHERE { ?person demo:age ?age .
        ?person demo:name ?name .
        FILTER (?age > 20) }
A "triple pattern"
Natural join on ?person
Relational-to-RDF
• W3C R2RML mappings define how to virtually map a relational db into RDF
[Figure: the PERSON row (NAME "Joe", AGE 30) mapped to the RDF graph above, and queried by: SELECT NAME FROM PERSON WHERE AGE > 20]
Enterprise federation
• Model domain at enterprise level
• Map into data sources
• Federate across the enterprise (and beyond)

[Figure: an enterprise-level SPARQL endpoint federating multiple SPARQL and SQL data sources]
Query pipeline
• How does a query engine work?

[Pipeline: SQL → Parse → AST → Plan → Resolve (consults Metadata) → Plan → Optimize → Plan → Process → Results. Trees all the way through!]
Plan Representation
SQL query plans
[Plan tree: table Person (Name, Age, DeptID) and table Dept (DeptID, DeptName) → join on DeptID → filter (Age > 20) → project (Name, DeptName)]

SELECT Name, DeptName
FROM Person, Dept
WHERE Person.DeptID = Dept.DeptID AND Age > 20
SPARQL query plans
[Plan tree: TP1 { ?Person :Age ?Age } and TP2 { ?Person :Name ?Name } → join on ?Person → filter (?Age > 20) → project ?Name]

SELECT ?Name
WHERE { ?Person :Name ?Name .
        ?Person :Age ?Age .
        FILTER (?Age > 20) }
Common model
Streams of tuples flowing through a network of processing nodes
What kind of nodes?
• Tuple generators (leaves)
  – In SQL: a table or view
  – In SPARQL: a triple pattern
• Combinations (multiple children)
  – Join
  – Union
• Transformations
  – Filter
  – Dup removal
  – Sort
  – Grouping
  – Project
  – Slice (limit / offset)
  – etc
Representation
Tree data structure with nodes and attributes

Java class hierarchy:
• PlanNode - childNodes
• TableNode - table
• JoinNode - joinType, joinCriteria
• FilterNode - criteria
• ProjectNode - projectExpressions
• SliceNode - limit, offset
s-expressions
Tree data structure with nodes and attributes

(* (+ 2 3)
   (- 6 5))
List representation
Tree data structure with nodes and attributes

(project+ [Name DeptName]
  (filter+ (> Age 20)
    (join+ (table+ Empl [Name Age DeptID])
           (table+ Dept [DeptID DeptName]))))
Query optimization
Example - pushing criteria down

(project+ [Name DeptName]
  (filter+ (> Age 20)
    (join+ (project+ [Name Age DeptID]
             (bind+ [Age (- (now) Birth)]
               (table+ Empl [Name Birth DeptID])))
           (table+ Dept [DeptID DeptName]))))
Query optimization
Example - rewritten

(project+ [Name DeptName]
  (join+ (project+ [Name DeptID]
           (filter+ (> (- (now) Birth) 20)
             (table+ Empl [Name Birth DeptID])))
         (table+ Dept [DeptID DeptName])))
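Because plans are ordinary s-expressions, an optimizer pass is just a function from tree to tree. A toy sketch (the rule and names are hypothetical, not the real optimizer): remove any filter+ whose criterion is literally true.

```clojure
;; A plan rewrite is just a function from s-expression to s-expression.
;; Toy rule (hypothetical): drop any filter+ whose criterion is literally true.
(defn remove-trivial-filters [plan]
  (if (seq? plan)
    (let [[op & args] (map remove-trivial-filters plan)]
      (if (and (= op 'filter+) (true? (first args)))
        (second args)                 ; (filter+ true child) => child
        (cons op args)))
    plan))

(remove-trivial-filters
 '(project+ [Name] (filter+ true (table+ Empl [Name]))))
;; => (project+ [Name] (table+ Empl [Name]))
```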
Hash join conversion

[Figure: (join+ left-tree right-tree) rewritten so that preduce+ hash-tuples and first+ over one tree produce hashes bound by let+, while the other tree streams through mapcat tuple-matches]
Hash join conversion
(join+ _left _right)

rewrites to:

(let+ [hashes (first+
                (preduce+ (hash-tuple join-vars {}
                            #(merge-with concat %1 %2))
                          _left))]
  (mapcat (fn [tuple] (tuple-matches hashes join-vars tuple))
          _right))
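The effect of this rewrite can be sketched on plain in-memory tuples: build a hash map from one input keyed on the join variables, then stream the other input through it. The function below is a hypothetical stand-in for the real hash-tuple / tuple-matches machinery:

```clojure
;; Hypothetical stand-in for the hash-join rewrite, on in-memory tuples:
;; hash the left input on the join variables, stream the right through it.
(defn hash-join [join-vars left right]
  (let [hashes (group-by #(select-keys % join-vars) left)]
    (mapcat (fn [tuple]
              (map #(merge % tuple)
                   (get hashes (select-keys tuple join-vars))))
            right)))

(hash-join [:person]
           [{:person "Joe" :age 30}]
           [{:person "Joe" :name "Joe"}])
;; => ({:person "Joe" :age 30 :name "Joe"})
```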
Processing trees
• Compile abstract nodes into more concrete stream operations:
  – map+, mapcat+, filter+
  – first+, mux+
  – let+, let-stream+
  – pmap+, pmapcat+, pfilter+, preduce+
  – number+, reorder+, rechunk+
  – pmap-chunk+, preduce-chunk+
Summary
• SPARQL and SQL query plans have essentially the same underlying algebra
• Model is a tree of nodes where tuples flow from leaves to the root
• A natural representation of this tree in Clojure is as a tree of s-expressions, just like our code
• We can manipulate this tree to provide
  – Optimizations
  – Differing levels of abstraction
Processing Components
Pipes
Pipes are streams of data

[Figure: Producer → Pipe → Consumer]

Producer API:
(enqueue pipe item)
(enqueue-all pipe items)
(close pipe)
(error pipe exception)

Consumer API:
(dequeue pipe item)
(dequeue-all pipe items)
(closed? pipe)
(error? pipe)
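A minimal sketch of this API, assuming a pipe is just an atom holding a persistent queue plus a closed flag (the real pipes also carry callbacks and error state):

```clojure
;; Minimal pipe sketch: an atom over a persistent queue + closed flag.
;; (The real pipes are richer: callbacks, error state, thread-safety details.)
(defn make-pipe []
  (atom {:items clojure.lang.PersistentQueue/EMPTY :closed? false}))

(defn enqueue [pipe item] (swap! pipe update :items conj item))
(defn close   [pipe]      (swap! pipe assoc :closed? true))
(defn closed? [pipe]      (:closed? @pipe))

(defn dequeue-all [pipe]
  ;; not atomic with respect to concurrent enqueues; fine for a sketch
  (let [items (:items @pipe)]
    (swap! pipe assoc :items clojure.lang.PersistentQueue/EMPTY)
    (seq items)))
```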
Pipe callbacks

Events on the pipe trigger callbacks which are executed on the caller's thread

1. (add-callback pipe callback-fn)
2. (enqueue pipe "foo")
3. (callback-fn "foo") ;; during enqueue
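The callback semantics can be sketched standalone (names hypothetical): the callback runs synchronously, on the enqueueing thread, during the call to enqueue.

```clojure
;; Standalone sketch of callback delivery (names hypothetical):
;; callbacks fire on the caller's thread, during enqueue itself.
(defn make-pipe [] (atom {:callbacks []}))

(defn add-callback [pipe f]
  (swap! pipe update :callbacks conj f))

(defn enqueue [pipe item]
  (doseq [f (:callbacks @pipe)]
    (f item)))                        ; runs synchronously, during enqueue

(def p (make-pipe))
(def seen (atom []))
(add-callback p #(swap! seen conj %))
(enqueue p "foo")
;; @seen => ["foo"]
```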
Pipes
Pipes are thread-safe functional data structures
Batched tuples
• To a pipe, data is just data. We actually pass data in batches through the pipe for efficiency.

[{:Name "Alex" :Eyes "Blue"}
 {:Name "Jeff" :Eyes "Brown"}
 {:Name "Eric" :Eyes "Hazel"}
 {:Name "Joe"  :Eyes "Blue"}
 {:Name "Lisa" :Eyes "Blue"}
 {:Name "Glen" :Eyes "Brown"}]
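Batching like this is ordinary Clojure: partition-all splits a tuple stream into fixed-size chunks.

```clojure
;; partition-all splits a tuple stream into fixed-size chunks,
;; like the batch shown above (sample data, not from the talk).
(def tuples [{:Name "Alex"} {:Name "Jeff"} {:Name "Eric"} {:Name "Joe"}])

(partition-all 2 tuples)
;; => (({:Name "Alex"} {:Name "Jeff"}) ({:Name "Eric"} {:Name "Joe"}))
```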
Pipe multiplexer
Compose multiple pipes into one

Pipe tee
Send output to multiple destinations
Nodes
• Nodes transform tuples from the input pipe and put results on the output pipe.

[Figure: Input Pipe → Node (fn) → Output Pipe]

Node attributes:
• input-pipe
• output-pipe
• task-fn
• state
• concurrency
Processing Trees
• Tree of nodes and pipes

[Figure: a tree of fn nodes connected by pipes, tuples flowing toward the root]
Data flow
SPARQL query example

[Plan tree: TP1 { ?Person :Age ?Age } and TP2 { ?Person :Name ?Name } → join on ?Person → filter (?Age > 20) → project ?Name]

SELECT ?Name
WHERE { ?Person :Name ?Name .
        ?Person :Age ?Age .
        FILTER (?Age > 20) }
(project+ [?Name]
  (filter+ (> ?Age 20)
    (join+ [?Person]
      (triple+ [?Person :Name ?Name])
      (triple+ [?Person :Age ?Age]))))
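A hypothetical sketch of evaluating this plan "directly on streams of Clojure data" (the eval strategy discussed later), over made-up solution bindings for the two triple patterns:

```clojure
;; Made-up solution bindings for the two triple patterns above
(def age-solutions  [{:person :joe :age 30} {:person :ann :age 10}])
(def name-solutions [{:person :joe :name "Joe"} {:person :ann :name "Ann"}])

(->> (for [a age-solutions
           n name-solutions
           :when (= (:person a) (:person n))]   ; join+ on ?Person
       (merge a n))
     (filter #(> (:age %) 20))                  ; filter+ (> ?Age 20)
     (map #(select-keys % [:name])))            ; project+ [?Name]
;; => ({:name "Joe"})
```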
Processing tree

[Figure: the plan above with join+ expanded into its hash-join form - preduce+ hash-tuples and first+ feed hashes into let+, the other triple pattern streams through mapcat tuple-matches - followed by filter (?Age > 20) and project ?Name]
Mapping to nodes
• An obvious mapping to nodes and pipes

[Figure: one node per plan operator - the triple patterns, preduce+, first+, let+, filter+, and project+ each become a fn connected by pipes]
Mapping to nodes
• Choosing between compilation and evaluation

[Figure: the same tree with the filter and project collapsed into a single eval node; the triple patterns, preduce+, first+, and let+ remain real nodes]
Compile vs eval
• We can evaluate our expressions
  – Directly on streams of Clojure data using Clojure
  – Indirectly via pipes and nodes (more on that next)
• Final step before processing makes decision
  – Plan nodes that combine data are real nodes
  – Plan nodes that allow parallelism (p*) are real nodes
  – Most other plan nodes can be merged into a single eval
  – Many leaf nodes actually rolled up, sent to a database
  – Lots more work to do on where these splits occur
Processing Execution
Execution requirements
• Parallelism
  – Across plans
  – Across nodes in a plan
  – Within a parallelizable node in a plan
• Memory management
  – Allow arbitrary intermediate result sets w/o OOME
• Ops
  – Cancellation
  – Timeouts
  – Monitoring
Event-driven processing
• Dedicated I/O thread pools stream data into the plan

[Figure: I/O threads feed the leaf nodes of the tree; compute threads run the interior nodes]
Task creation
1. Callback fires when data is added to the input pipe
2. Callback takes the fn associated with the node and bundles it into a task
3. Task is scheduled with the compute thread pool
Fork/join vs Executors
• Fork/join thread pool vs classic Executors
  – Optimized for finer-grained tasks
  – Optimized for larger numbers of tasks
  – Optimized for more cores
  – Works well on tasks with dependencies
  – No contention on a single queue
  – Work stealing for load balancing
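Submitting node tasks to a fork/join pool from Clojure is plain interop with the standard java.util.concurrent API. The pool and helper below are a sketch, not the real scheduler:

```clojure
(import '[java.util.concurrent Callable ForkJoinPool])

;; A pool sized by default to the number of cores
(def pool (ForkJoinPool.))

(defn submit-task
  "Submit a no-arg fn to the pool; returns a ForkJoinTask future."
  [^ForkJoinPool pool f]
  (.submit pool ^Callable f))

(def result (.get (submit-task pool (fn [] (+ 2 3)))))
;; result => 5
```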
Task execution
1. Pull next chunk from input pipe
2. Execute task function with access to node's state
3. Optionally, output one or more chunks to the output pipe - this triggers the upstream callback
4. If data is still available, schedule a new task, simulating a new callback on the current node
Concurrency
• Delicate balance between Clojure refs and STM and Java concurrency primitives
• Clojure refs - managed by STM
  – Input pipe
  – Output pipe
  – Node state
• Java concurrency
  – Semaphore - "permits" to limit tasks per node
  – Per-node scheduling lock
• Key integration constraint
  – Clojure transactions can fail and retry!
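A sketch of that mix: node state lives in a Clojure ref (retried by the STM), while a Java Semaphore permit limits concurrent tasks per node. The side-effecting acquire/release stays outside the dosync, precisely because transactions can retry. All names here are hypothetical:

```clojure
(import 'java.util.concurrent.Semaphore)

;; Node state in a Clojure ref, guarded by a Java Semaphore that
;; limits concurrent tasks per node (all names hypothetical).
(def node-state (ref {:tuples-seen 0}))
(def permits (Semaphore. 4))             ; at most 4 tasks for this node

(defn run-task [chunk]
  (.acquire permits)
  (try
    ;; side effects stay outside dosync: transactions can fail and retry!
    (dosync (alter node-state update :tuples-seen + (count chunk)))
    (finally (.release permits))))

(run-task [{:a 1} {:a 2}])
;; (:tuples-seen @node-state) => 2
```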
Concurrency mechanisms

[Flowchart of the three routines process-input, run-task, and close-output: process-input acquires a semaphore permit, dequeues input, and either creates a task for a Data message or sets closed = true on Close; run-task invokes the task, enqueues result data on the output pipe, and releases its permit; once the input is closed and drained, close-output acquires all permits, runs the task with a nil message, sets closed_done = true, closes the output pipe, and releases all permits. Legend: blue outline = Java lock, green outline = Clojure txn, blue shading = Clojure atom; everything runs under the Java semaphore.]
Memory management
• Pipes are all on the heap
• How do we avoid OutOfMemory?
Buffered pipes
• When heap space is low, store pipe data on disk
• Data is serialized / deserialized to/from disk
• Memory-mapped files are used to improve I/O

[Figure: a pipe between nodes spilling its buffered chunks to a memory-mapped file on disk]
Memory monitoring
• JMX memory beans
  – To detect when memory is tight -> writing to disk
    • Use memory pool threshold notifications
  – To detect when memory is ok -> write to memory
    • Use polling (no notification on decrease)
• Composite pipes
  – Build a logical pipe out of many segments
  – As memory conditions go up and down, each segment is written to the fastest place. We never move data.
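The JMX side can be sketched with the standard java.lang.management beans (helper names hypothetical): threshold notifications cover the "memory is tight" direction, and a polling check like memory-tight? covers the way back down.

```clojure
(import '[java.lang.management ManagementFactory MemoryPoolMXBean MemoryType])

;; Find a heap pool that supports usage thresholds (on HotSpot this
;; is typically the tenured / old-gen pool).
(defn threshold-pool []
  (->> (ManagementFactory/getMemoryPoolMXBeans)
       (filter (fn [^MemoryPoolMXBean p]
                 (and (= MemoryType/HEAP (.getType p))
                      (.isUsageThresholdSupported p))))
       first))

;; Polling check for the way back down - there is no JMX notification
;; when usage decreases. Note: .getMax may be -1 if undefined.
(defn memory-tight? [^MemoryPoolMXBean pool fraction]
  (let [u (.getUsage pool)]
    (> (.getUsed u) (* fraction (.getMax u)))))
```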
Cancellation
• Pool keeps track of which nodes belong to which plan
• All nodes check for cancellation during execution
• Cancellation can be caused by:
  – Error during execution
  – User intervention from admin UI
  – Timeout from query settings
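A sketch of the cooperative check (names hypothetical): a per-plan cancelled flag that every task consults between chunks.

```clojure
;; Per-plan cancellation flag; every task checks it between chunks.
(def cancelled? (atom false))

(defn process-chunk [chunk]
  (when @cancelled?
    (throw (InterruptedException. "plan cancelled")))
  (mapv :Name chunk))

(process-chunk [{:Name "Alex"}])   ; => ["Alex"]
(reset! cancelled? true)
;; any later process-chunk call now throws InterruptedException
```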
Summary
• Data flow architecture
  – Event-driven by arrival of data
  – Compute threads never block
  – Fork/join to handle scheduling of work
• Clojure as abstraction tool
  – Expression tree lets us express plans concisely
  – Also lets us manipulate them with tools in Clojure
  – Lines of code
    • Fork/join pool, nodes, pipes - 1200
    • Buffer, serialization, memory monitor - 970
    • Processor, compiler, eval - 1900
• Open source? Hmmmmmmmmmmm…….
Thanks...
Alex Miller
@puredanger
Revelytix, Inc.