Using R for Iterative and Incremental Processing
Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Robert Schreiber
UC Berkeley and HP Labs
6/29/2012
Big Data, Complex Algorithms
• PageRank (dominant eigenvector)
• Recommendations (matrix factorization)
• Anomaly detection (top-K eigenvalues)
• User importance (vertex centrality)
Machine learning and graph algorithms both reduce to iterative linear algebra operations.
PageRank Using Matrices
Dominant eigenvector
M = modified web graph matrix
p = PageRank vector
Z = constant vector (the random-jump term)
[Figure: M (N×N) split into N/s row blocks of s rows each; p and Z split into matching blocks P1, P2, …, PN/s]
Simplified algorithm:
repeat { p = M*p + Z }
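A minimal sketch of this loop in plain Python with NumPy (not Presto). It assumes the common damped formulation: M is the column-normalized link matrix scaled by a damping factor d = 0.85, Z is the constant vector (1 - d)/N, and every page has at least one out-link. The names `pagerank`, `links`, and `d` are illustrative, not from the talk.

```python
import numpy as np

def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate p = M*p + Z until the L1 change drops below tol.

    links[i, j] = 1 if page j links to page i. Assumes every page has at
    least one out-link, so no dangling-node correction is applied.
    """
    n = links.shape[0]
    out_degree = links.sum(axis=0)           # out-links of each page (column sums)
    M = d * links / out_degree               # damped, column-stochastic matrix
    Z = np.full(n, (1.0 - d) / n)            # constant random-jump term
    p = np.full(n, 1.0 / n)                  # start from the uniform vector
    for _ in range(max_iter):
        p_next = M @ p + Z
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Page 0 links to 1; page 1 links to 0 and 2; page 2 links to 0.
links = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [0, 1, 0]])
print(pagerank(links).round(3))
```

Because M has column sums d and Z sums to 1 - d, each step preserves a total of 1, so p stays a probability vector.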
Breadth-first Search Using Matrices
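G = adjacency matrix
X = BFS vector
    A B C D E
A   1 1 1 0 0
B   0 1 0 1 0
C   0 1 1 0 0
D   0 0 0 1 1
E   0 0 0 0 1
X (start at A):  1 0 0 0 0
After step 1:    * * * 0 0   (A, B, C reached)
After step 2:    * * * * 0   (D reached)
After step 3:    * * * * *   (E reached)
(* marks a nonzero entry)
Simplified algorithm:
repeat { X = G*X }
With G laid out as above (row i lists the vertices that i links to), the product that produces these steps is the transpose of G times X, or equivalently X times G with X as a row vector.
Matrix operations:
– Easy to express
– Efficient to implement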
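The frontier expansion above can be sketched with NumPy matrix-vector products. This follows the table's layout (G[i, j] = 1 for an edge i → j), so each step multiplies by the transpose of G. `bfs_levels` is an illustrative name; it records the step at which each vertex is first reached (-1 if never).

```python
import numpy as np

# Adjacency matrix from the slide: G[i, j] = 1 for an edge i -> j.
# The 1s on the diagonal keep a vertex reached once it has been reached.
G = np.array([
    [1, 1, 1, 0, 0],   # A -> B, C
    [0, 1, 0, 1, 0],   # B -> D
    [0, 1, 1, 0, 0],   # C -> B
    [0, 0, 0, 1, 1],   # D -> E
    [0, 0, 0, 0, 1],   # E (only its self-loop)
], dtype=int)

def bfs_levels(G, source):
    """Return the BFS step at which each vertex is first reached (-1 if never)."""
    n = G.shape[0]
    X = np.zeros(n, dtype=int)
    X[source] = 1
    level = np.full(n, -1)
    level[source] = 0
    step = 0
    while True:
        step += 1
        # j is reached if some reached i has an edge i -> j: X_next = G^T X.
        X_next = (G.T @ X > 0).astype(int)
        newly = (X_next == 1) & (X == 0)
        if not newly.any():
            return level
        level[newly] = step
        X = X_next

print(bfs_levels(G, 0))   # A..E -> [0 1 1 2 3]
```

Thresholding with `> 0` turns the integer product back into a reached/not-reached indicator, which plays the role of the `*` entries on the slide.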
Linear Algebra on Existing Frameworks
Data-parallel frameworks – MapReduce/Dryad
– Process each record in parallel
– Use case: computing sufficient statistics
Graph-centric frameworks – Pregel/GraphLab
– Process each vertex in parallel
– Use case: graph models
Matrix operations, however, are structured and coarse-grained, and need global state.
Challenge 1 – Sparse Matrices
[Figure: normalized block density (log scale, 1 to 10,000) vs. block ID for LiveJournal, Netflix, and ClueWeb-1B]
Some blocks hold 1000x more elements than others, causing computation imbalance.
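To see why equal-sized blocks imbalance the work, the sketch below splits rows into equal contiguous blocks and counts the nonzeros in each. The row nonzero counts are synthetic (row i holds about 1000/(i+1) entries, a skewed power-law-like profile), not the LiveJournal, Netflix, or ClueWeb data; `block_nnz` is an illustrative name.

```python
def block_nnz(row_nnz, num_blocks):
    """Split rows into equal-sized contiguous blocks; return nonzeros per block."""
    n = len(row_nnz)
    size = -(-n // num_blocks)                 # ceiling division: rows per block
    return [sum(row_nnz[b * size:(b + 1) * size]) for b in range(num_blocks)]

# Synthetic skewed matrix: row i has 1000 // (i + 1) nonzeros.
row_nnz = [1000 // (i + 1) for i in range(1000)]
blocks = block_nnz(row_nnz, 10)
print(blocks[0], blocks[-1], round(blocks[0] / blocks[-1]))
```

Every block has the same number of rows, yet the first holds many times the nonzeros of the last, so the worker that owns it finishes long after the others.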
Challenge 2 – Incremental Updates
Incremental computation on a consistent view of the data
[Figure: cycle of new movie ratings → refined recommendations → better suggestions]
Presto
A framework for large-scale iterative linear algebra
Extends R for scalability and incremental updates
Outline
• Motivation
• Programming model
• Design
• Applications and Results
Programming Model
One data structure: the distributed array
A <- darray(…)
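Presto's darray is an R object; as a rough single-process analogue, the hypothetical Python class below stores a matrix as row blocks of at most `block_rows` rows and exposes `splits(i)` for one partition and `splits()` for the whole array. `DArray`, `block_rows`, and `fill` are illustrative names; indices are 0-based here, unlike R, and nothing is actually distributed.

```python
import numpy as np

class DArray:
    """Hypothetical in-memory stand-in for a row-partitioned distributed array."""

    def __init__(self, dim, block_rows, fill=0.0):
        n_rows, n_cols = dim
        self.dim = dim
        # One NumPy block per partition; the last block may be shorter.
        self.blocks = [
            np.full((min(block_rows, n_rows - start), n_cols), fill)
            for start in range(0, n_rows, block_rows)
        ]

    def __len__(self):
        return len(self.blocks)                 # number of partitions

    def splits(self, i=None):
        if i is None:
            return np.vstack(self.blocks)       # whole array, gathered
        return self.blocks[i]                   # a single partition

P = DArray(dim=(10, 1), block_rows=4)           # like darray(dim=c(10,1), blocks=c(4,1))
print(len(P), [b.shape for b in P.blocks])
```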
Iteration: foreach
[Figure: the loop body runs as parallel "Compute" tasks]
Incremental updates: onchange, update
[Figure: parallel "Compute" tasks; when the data is updated, the computation runs again]
PageRank Using Presto
M <- darray(dim=c(N,N), blocks=c(s,N))   # create distributed arrays
P <- darray(dim=c(N,1), blocks=c(s,1))
while(..){
  # foreach: execute the function in a cluster, one task per partition i
  foreach(i, 1:len,
    # splits(): pass array partitions to the function
    calculate(p=splits(P,i), m=splits(M,i),
              x=splits(P_old), z=splits(Z,i)) {
      p <- (m %*% x) + z   # %*% is R's matrix product; * would be elementwise
    }
  )
  P_old <- P
}
[Figure: M (N×N) split into N/s row blocks of s rows each; P_old, Z, and P split into matching blocks P1, P2, …, PN/s]
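The same loop can be mimicked in Python: row blocks of M and Z play the role of `splits(M, i)` and `splits(Z, i)`, a thread pool stands in for the cluster that runs `foreach`, and every task reads the whole previous vector, as `splits(P_old)` does. This is an illustrative sketch, not Presto's runtime; `blockwise_pagerank` is a hypothetical name, and it runs a fixed number of iterations in place of the elided `while(..)` condition.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def blockwise_pagerank(M, Z, s, iterations=50):
    """Run p = M p + Z with one task per block of s rows."""
    N = M.shape[0]
    starts = range(0, N, s)
    M_blocks = [M[a:a + s] for a in starts]      # like splits(M, i)
    Z_blocks = [Z[a:a + s] for a in starts]      # like splits(Z, i)
    p_old = np.full(N, 1.0 / N)

    with ThreadPoolExecutor() as pool:           # stand-in for cluster workers
        for _ in range(iterations):              # fixed count replaces while(..)
            def calculate(i):
                # Block i of the new vector reads all of p_old (splits(P_old)).
                return M_blocks[i] @ p_old + Z_blocks[i]
            p_blocks = list(pool.map(calculate, range(len(M_blocks))))
            p_old = np.concatenate(p_blocks)     # P_old <- P
    return p_old

# Usage: damped, column-stochastic 3-page example with s = 2 rows per block.
links = np.array([[0, 1, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
M = 0.85 * links / links.sum(axis=0)
Z = np.full(3, 0.05)
print(blockwise_pagerank(M, Z, s=2))
```

All tasks of one iteration finish (`list(pool.map(...))`) before `p_old` is replaced, which mirrors the barrier between `foreach` and `P_old <- P`.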
Incremental PageRank
M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
onchange(M) {          # execute when data changes
  while(..){
    foreach(i, 1:len,
      calculate(p=splits(P,i), m=splits(M,i),
                x=splits(P_old), z=splits(Z,i)) {
        p <- (m %*% x) + z
        update(p)      # update the PageRank vector
      })
    P_old <- P
  }
}
[Figure: M, P_old, Z, and P partitioned into row blocks P1, P2, …, PN/s]
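A single-process sketch of the onchange/update pattern, assuming simple callback semantics: `update` publishes a new value and then runs every handler registered with `onchange`. `WatchedArray`, `make_matrix`, and `recompute_pagerank` are hypothetical names, not Presto's API. Restarting from the last published vector, rather than from scratch, is an assumption of this sketch.

```python
import numpy as np

class WatchedArray:
    """Hypothetical array with update/onchange hooks (not Presto's API)."""

    def __init__(self, value):
        self.value = value
        self._handlers = []

    def onchange(self, handler):
        self._handlers.append(handler)          # run handler after every update

    def update(self, value):
        self.value = value                      # publish the new value ...
        for handler in self._handlers:          # ... then run registered handlers
            handler(value)

def make_matrix(links, d=0.85):
    # Damped, column-stochastic link matrix; assumes no dangling pages.
    return d * links / links.sum(axis=0)

n = 4
Z = np.full(n, 0.15 / n)
P = WatchedArray(np.full(n, 1.0 / n))

def recompute_pagerank(matrix, tol=1e-12, max_iter=10_000):
    p_old = P.value                             # start from the last published P
    for _ in range(max_iter):
        p = matrix @ p_old + Z
        if np.abs(p - p_old).sum() < tol:
            break
        p_old = p
    P.update(p)                                 # like update(p) in the handler

links = np.array([[0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
M = WatchedArray(make_matrix(links))
M.onchange(recompute_pagerank)                  # like onchange(M) { ... }
M.update(M.value)                               # first run

links[1, 3] = 1.0                               # page 3 now also links to page 1
M.update(make_matrix(links))                    # triggers the handler again
print(P.value.round(4))
```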
Outline
• Motivation
• Programming model
• Design
• Applications and Results
Dynamic Partitioning of Matrices
Profile execution, then partition.
Programmers specify size invariants.
Up to 2x performance improvement.
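The slides do not spell out the splitting rule. One plausible heuristic, assumed here for illustration only, is to use each block's nonzero count as a proxy for its profiled run time and split any block costing more than `ratio` times the median, at the row where its cumulative cost reaches half, until no block qualifies. The functions and the `ratio` parameter are hypothetical, and the row counts are the same kind of synthetic skewed profile, not real data.

```python
from statistics import median

def block_cost(row_nnz, block):
    start, end = block                          # half-open row range
    return sum(row_nnz[start:end])

def split_in_half(row_nnz, block):
    """Split a row range where its cumulative nonzero count reaches half."""
    start, end = block
    half = block_cost(row_nnz, block) / 2
    acc = 0
    for r in range(start, end - 1):
        acc += row_nnz[r]
        if acc >= half:
            return [(start, r + 1), (r + 1, end)]
    return [(start, end - 1), (end - 1, end)]   # last row alone outweighs the rest

def repartition(row_nnz, blocks, ratio=2.0, max_rounds=10):
    """Split blocks costing more than ratio x median, for at most max_rounds."""
    for _ in range(max_rounds):
        med = median(block_cost(row_nnz, b) for b in blocks)
        new_blocks, changed = [], False
        for b in blocks:
            if block_cost(row_nnz, b) > ratio * med and b[1] - b[0] > 1:
                new_blocks.extend(split_in_half(row_nnz, b))
                changed = True
            else:
                new_blocks.append(b)
        blocks = new_blocks
        if not changed:
            break
    return blocks

row_nnz = [1000 // (i + 1) for i in range(1000)]     # synthetic skewed rows
equal = [(i * 100, (i + 1) * 100) for i in range(10)]
balanced = repartition(row_nnz, equal)
print(max(block_cost(row_nnz, b) for b in equal),
      max(block_cost(row_nnz, b) for b in balanced))
```

A single very dense row cannot be split by row ranges, so the largest block never drops below the heaviest row's cost.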
Incremental Updates Using Consistent Snapshots
[Figure: web graph with pages P, Q, R, S, … and its adjacency matrix]
Adjacency matrix, version M1 (excerpt):
0 0 0 0
1 0 0 0
1 0 0 0
0 0 1 1
…
onchange(M1): the handler runs on version M1 and computes the PageRank vector (excerpt: 0.035, 0.006, 0.008, 0.032, …); update publishes it as P1.
A new link changes the last row shown to 0 1 1 1, creating version M2.
onchange(M2): the handler runs on M2; update publishes P2 (excerpt: 0.035, 0.028, 0.008, 0.032, …).
Versioned Distributed Arrays
Mechanics of versioning
– update: increment the version number
– onchange: bind a version number for the array before executing the handler
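A single-process sketch of these mechanics, with hypothetical names (`VersionedArray`, `run_onchange`): `update` bumps the version and keeps the new snapshot, and the onchange runner binds the current version before calling the handler, so the handler keeps reading that snapshot even if a writer publishes a newer version mid-run. Keeping every snapshot forever is a simplification of this sketch.

```python
class VersionedArray:
    """Hypothetical versioned array: every update creates a new snapshot."""

    def __init__(self, value):
        self.version = 0
        self._snapshots = {0: value}

    def update(self, value):
        # update: increment the version number and keep the new snapshot.
        self.version += 1
        self._snapshots[self.version] = value
        return self.version

    def get(self, version):
        return self._snapshots[version]

def run_onchange(array, handler):
    # onchange: bind the current version before executing the handler,
    # so every read inside the handler sees the same snapshot.
    bound = array.version
    return handler(lambda: array.get(bound))

M = VersionedArray([[0, 0], [1, 0]])
M.update([[0, 1], [1, 0]])                  # version 1

def handler(read):
    first = read()
    M.update([[1, 1], [1, 1]])              # a writer publishes version 2 mid-run
    second = read()
    return first, second

first, second = run_onchange(M, handler)
print(first == second, M.version)           # True 2
```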
Outline
• Motivation
• Programming model
• Design
• Applications and Results
Applications Implemented in Presto
Application              Algorithm                 Lines of Presto code
PageRank                 Eigenvector calculation   41
Triangle counting        Top-K eigenvalues         121
Netflix recommendation   Matrix factorization      130
Centrality measure       Graph algorithm           132
k-path connectivity      Graph algorithm           30
k-means                  Clustering                71
Sequence alignment       Smith-Waterman            64
Each application takes fewer than 140 lines of code.
Presto is fast!
PageRank per-iteration execution time
[Figure: time (sec, log scale) vs. number of workers (8, 16, 32, 64) for Presto and Hadoop-InMem]
Data: 100M nodes, 1.2B edges. Setup: 10G network; 12 cores, 96GB RAM.
More than 20x faster than Hadoop with in-memory storage (Hadoop-InMem)
More in the Paper
• Memory management, caching of partitions
• Scheduling operations
• Storage driver interface to HBase
• Fault tolerance
Conclusion
Linear algebra is a powerful abstraction: it easily expresses machine learning and graph algorithms.
Challenges: sparse matrices and incremental data.
Presto: a prototype that extends R.
Open-source version coming soon!