TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel, The Winter Usenix Conference, 1994
2008-22952 Jun Lee


Page 1: TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems

P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel
The Winter Usenix Conference, 1994

2008-22952 Jun Lee

Page 2: DSM (distributed shared memory)

A software system for parallel computation
• Shares distributed memories
• Easier programming
  − Provides a single global address space

Page 3: DSM (distributed shared memory)

No widely available DSM implementations
• In-house research platforms
• Kernel modifications
• Poor performance
  − Imitating consistency protocols of hardware
  − False sharing

Page 4: TreadMarks

Objectives
• Commercially available workstations and OS
  − Standard Unix system on DECstation
• Efficient user-level DSM implementation
  − Reduce communication overhead

Design
• LRC (lazy release consistency)
• Multiple writer protocol
• Lazy diff creation

Page 5: Consistency protocol (SC)

Sequential Consistency
• Every write visible “immediately”
• Single writer

Diagram (P0, P1): R(a):0, W(a):1, R(a):1

Page 6: Consistency protocol (SC)

Sequential Consistency
• Every write visible “immediately”
• Single writer

Diagram (P0, P1): R(a):0, W(a):1, R(a):?, R(a):1

Big problem with page size granularity

Page 7: Consistency protocol (SC)

Sequential Consistency
• Every write visible “immediately”
• Single writer

Diagram (P0, P1, page X): W(x0):a, W(x1):b, W(x2):c, W(x3):d; copies of page X shown holding a, b, c, d

False sharing
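A rough way to see why false sharing hurts under a single-writer protocol is to count how often page X has to change hands. The sketch below is only an illustration: the function name `count_page_transfers` and the order in which P0 and P1 issue their writes are assumptions, not details from the slides or from TreadMarks.

```python
def count_page_transfers(writes):
    """Count ownership changes of one page under a single-writer protocol.

    writes: list of (processor, word_index) pairs in the order they occur.
    A processor must own the whole page before it may write any word of it.
    """
    owner = None
    transfers = 0
    for proc, _word in writes:
        if owner is not None and owner != proc:
            transfers += 1  # the page moves even though the words differ
        owner = proc
    return transfers


# Assumed interleaving: P0 and P1 alternate, each touching different words.
interleaved = [("P0", 0), ("P1", 1), ("P0", 2), ("P1", 3)]
# Same writes, but each processor finishes before the other starts.
grouped = [("P0", 0), ("P0", 2), ("P1", 1), ("P1", 3)]

print(count_page_transfers(interleaved))  # 3
print(count_page_transfers(grouped))      # 1
```

No word is written by both processors, yet the interleaved run moves the page three times; that extra traffic is the false-sharing cost.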

Page 8: Consistency protocol (RC)

Release Consistency
• Relaxed memory consistency model
  − A processor delays making its changes visible to other processors until certain synchronization accesses occur
• Synchronization points
  − Acquire(), Release() (similar to locks, barriers)
• Two types
  − ERC (eager), LRC (lazy)

Page 9: Consistency protocol (RC)

Release Consistency
• Acquire() and release() are sequentially consistent
  − Release() is performed after all previous operations have completed
  − Operations are performed after all previous acquire() operations have been performed
• If an acquire() and release() pair separates every pair of conflicting accesses
  − SC and RC produce the same results.

Page 10: Consistency protocol (RC)

ERC
• Write information is delivered at the release point

Diagram (P0, P1): Acquire(L), R(a):0, W(a):1, Release(L), Acquire(L), R(a):?, Release(L), R(a):1; label: Write Notice

Page 11: Consistency protocol (RC)

ERC
• Write information is delivered at the release point

Diagram (P0, P1): Acquire(L), R(a):0, W(a):1, Release(L), Acquire(L), R(a):1, Release(L), Acquire(K), Release(K)

Page 12: Consistency protocol (RC)

LRC
• The delivery is postponed until the acquire
• Fewer messages than ERC

Diagram (P0, P1): Acquire(L), R(a):0, W(a):1, Release(L), Acquire(L), R(a):?, Release(L), R(a):1
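The lazy delivery can be illustrated with a toy model in which a release sends nothing to other processors and the next acquirer picks up earlier writes when it takes the lock. This is a minimal sketch under loud assumptions: the `Lock` and `Processor` classes are hypothetical, and the lock carries the written values directly, which is a simplification of what the slides call write notices.

```python
class Lock:
    """Hypothetical lock that remembers what earlier releasers wrote."""

    def __init__(self):
        self.published = {}


class Processor:
    def __init__(self, initial):
        self.memory = dict(initial)  # this processor's local copy
        self.unpublished = {}        # writes made since the last release

    def acquire(self, lock):
        # Lazy delivery: earlier writes arrive only now, at the acquire.
        self.memory.update(lock.published)

    def release(self, lock):
        # Nothing is pushed to other processors; the writes wait at the lock.
        lock.published.update(self.unpublished)
        self.unpublished.clear()

    def write(self, name, value):
        self.memory[name] = value
        self.unpublished[name] = value

    def read(self, name):
        return self.memory[name]


L = Lock()
p0, p1 = Processor({"a": 0}), Processor({"a": 0})

p0.acquire(L)
p0.write("a", 1)
p0.release(L)

before = p1.read("a")  # P1 has not acquired L yet
p1.acquire(L)
after = p1.read("a")   # P1 now sees P0's write
p1.release(L)

print(before, after)  # 0 1
```

A processor that never acquires L receives nothing at all in this model, which is where the message savings over ERC come from.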

Page 13: Consistency protocol (RC)

ERC vs. LRC

Diagram (ERC; P0, P1, P2, P3): Acquire(L), R(a):0, W(a):1, Release(L), R(a):0, R(a):0, Acquire(L), Release(L), R(a):1

Page 14: Consistency protocol (RC)

ERC vs. LRC

Diagram (ERC; P0, P1, P2, P3): Acquire(L), R(a):0, W(a):1, Release(L), R(a):0, R(a):0, Acquire(L), Release(L), R(a):1

Diagram (LRC; P0, P1, P2, P3): Acquire(L), R(a):0, W(a):1, Release(L), R(a):0, R(a):0, Acquire(L), Release(L), R(a):1
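The difference in message counts for the four-processor case can be sketched with two small counting functions. The names `erc_notices` and `lrc_notices` are made up for this illustration, and the model is deliberately crude: it assumes every processor caches the data and it counts only the delivery of write information, ignoring lock requests and acknowledgements.

```python
def erc_notices(releasers, sharers):
    """Eager RC: each release pushes write information to every other sharer."""
    return sum(len(sharers - {releaser}) for releaser in releasers)


def lrc_notices(lock_chain):
    """Lazy RC: write information goes only to the next acquirer of the lock.

    lock_chain lists the processors in the order they hold the lock.
    """
    return sum(1 for prev, nxt in zip(lock_chain, lock_chain[1:]) if prev != nxt)


sharers = {"P0", "P1", "P2", "P3"}

# P0 writes a and releases L; P1 is the next processor to acquire L.
print(erc_notices(["P0"], sharers))  # 3: P1, P2 and P3 are all notified
print(lrc_notices(["P0", "P1"]))     # 1: only P1 is notified
```

P2 and P3 never synchronize on L in this scenario, so under the lazy scheme they are never contacted.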

Page 15: Multiple writer protocol

Diagram (P0, P1, page X): W(x0):a, W(x1):b, W(x2):c, W(x3):?; Acquire(L), Release(L), Acquire(L); page copies shown: a bc, a c

Diagram (False sharing; P0, P1, page X): W(x0):a, W(x1):b, W(x2):c, W(x3):d; page copies shown: a, bc, abcd

Page 16: Multiple writer protocol

Diagram (P0, P1, page X): W(x0):a, W(x1):b, W(x2):c, W(x3):?; Acquire(L), Release(L), Acquire(L), Release(L), W(x3):d; page copies shown: a bc d, a c

Diagram (False sharing; P0, P1, page X): W(x0):a, W(x1):b, W(x2):c, W(x3):d; page copies shown: a, bc, abcd

Page 17: Twin and Diff

Diagram (P0, P1, page X): W(x0):a, W(x1):b, W(x2):c, W(x3):?; Acquire(L), Release(L), Acquire(L), Release(L), W(x3):d; page copies shown: a bc d, a c; labels: Twin X, Diff, "a c" diff

Page 18: Twin and Diff

Diagram (P0, P1, page X): W(x0):a, W(x1):b, W(x2):c, W(x3):?; Acquire(L), Release(L), Acquire(L), Release(L), W(x3):d; page copies shown: a bc d, a c, b a c; labels: Twin X, Diff, "a c" diff, "b" diff, interval
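The twin-and-diff idea can be sketched in a few lines: copy the page before modifying it, compare the copy with the modified page to find the changed words, and apply that diff to another processor's copy. This is a simplified illustration, not the TreadMarks implementation: the helper names, the four-word page, the dictionary encoding of a diff, and the choice of which processor writes which word are all assumptions.

```python
def make_twin(page):
    """Keep an unmodified copy of the page, taken before it is written."""
    return list(page)


def make_diff(twin, page):
    """Record only the words that differ from the twin: {word_index: new_value}."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(page, diff):
    """Patch another copy of the page with the recorded modifications."""
    for index, value in diff.items():
        page[index] = value


# Two copies of page X, one per processor, initially identical.
page_p0 = ["", "", "", ""]
page_p1 = ["", "", "", ""]

twin_p0 = make_twin(page_p0)
page_p0[0], page_p0[2] = "a", "c"  # assumed: P0 writes x0 and x2

twin_p1 = make_twin(page_p1)
page_p1[1], page_p1[3] = "b", "d"  # assumed: P1 writes x1 and x3

diff_p0 = make_diff(twin_p0, page_p0)
diff_p1 = make_diff(twin_p1, page_p1)

# Exchanging diffs merges the concurrent writes without moving whole pages.
apply_diff(page_p1, diff_p0)
apply_diff(page_p0, diff_p1)

print(diff_p0, diff_p1)
print(page_p0, page_p1)
```

Because each diff names only the words its writer touched, two processors can modify the same page concurrently and still end up with identical copies, provided they do not write the same word.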

Page 19: Implementation

Page 20: Etc.

Lock & barrier
• Statically assigned manager

Garbage collection
• Reclaims the space used by write notice records, interval records, and diffs
• Triggered when the free space drops below a threshold

Page 21: Evaluation

Experimental Environment
• 8 DECstation-5000/240s
• Connected to a 100-Mbps ATM LAN and a 10-Mbps Ethernet

Applications
• Water – molecular dynamics simulation
• Jacobi – Successive Over-Relaxation
• TSP – branch & bound algorithm to solve the traveling salesman problem
• Quicksort – uses bubblesort to sort subarrays of fewer than 1K elements
• ILINK – genetic linkage analysis
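The Quicksort entry describes a hybrid sort, which a short sequential sketch can make concrete. The code below is an assumption-laden illustration rather than the benchmark itself: the function names and the partitioning scheme are made up, only the 1K cutoff comes from the slide, and the way the parallel version distributes subarrays among processors is not modeled.

```python
def bubble_sort(a, lo, hi):
    """Sort a[lo:hi] in place with bubblesort."""
    for end in range(hi - 1, lo, -1):
        swapped = False
        for i in range(lo, end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            break  # already sorted


def quicksort(a, lo=0, hi=None, threshold=1024):
    """Quicksort a[lo:hi]; subarrays shorter than threshold go to bubblesort."""
    if hi is None:
        hi = len(a)
    while hi - lo >= threshold:
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi - 1
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        quicksort(a, lo, j + 1, threshold)  # left part
        lo = i                              # continue with the right part
    bubble_sort(a, lo, hi)


data = [9, 4, 7, 1, 8, 2, 6, 3, 5, 0, 4, 9]
quicksort(data, threshold=4)  # tiny cutoff so both code paths run
print(data)
```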

Page 22: Evaluation

Figures: Speedup; Execution statistics

Page 23: Evaluation

Figure: Execution time breakdown

Page 24: Evaluation

Figures: Unix overhead breakdown; TreadMarks overhead breakdown

Page 25: Evaluation

Figure: Execution time breakdown for Water

Page 26: Evaluation

ERC vs. LRC

Figures: Speedup; Message rate

Page 27: Evaluation

ERC vs. LRC

Figures: Data rate; Diff creation rate

Page 28: Implementation

Diagram (P0 side, P1 side): each side shows its pages and a time stamp; time stamps shown: 0 0 0 and 0 0 0; operations: Acq(L)

Page 29: Implementation

Diagram (P0 side, P1 side): each side shows its pages and a time stamp; time stamps shown: 1 0 0 and 0 0 0; operations: Acq(L), W(a), Rel(L), W(b); a twin on each side; write notices (W.N) labeled P0 and P1

Page 30: Implementation

Diagram (P0 side, P1 side): each side shows its pages and a time stamp; time stamps shown: 2 0 0 and 0 0 0; operations: Acq(L), W(a), Rel(L), Acq(L), W(b); a twin on each side; write notices (W.N) labeled P0 and P1

Page 31: Implementation

Diagram (P0 side, P1 side): each side shows its pages and a time stamp; time stamps shown: 2 0 0 and 0 0 0; operations: Acq(L), W(a), Rel(L), Acq(L), W(b); a twin on each side; write notices (W.N) labeled P0 and P1; vector 1 0 0 labeled P0

Page 32: Implementation

Diagram (P0 side, P1 side): each side shows its pages and a time stamp; time stamps shown: 2 0 0 and 1 1 0; operations: Acq(L), W(a), Rel(L), Acq(L), W(c), W(b); a twin on each side; write notices (W.N) labeled P0 and P1; diffs labeled a and b; vector 1 0 0

Page 33: Implementation

Diagram (P0 side, P1 side): each side shows its pages and a time stamp; time stamps shown: 2 0 0 and 1 1 0; operations: Acq(L), W(a), Rel(L), Acq(L), W(c), Rel(L), W(b); twin; write notices (W.N) labeled P0 and P1; diffs labeled a and b; vector 1 0 0

Page 34: Implementation

Diagram (P0 side, P1 side): each side shows its pages and a time stamp; time stamps shown: 2 0 0 and 1 2 0; operations: Acq(L), W(a), Rel(L), Acq(L), W(c), Rel(L), W(b); twin; write notices (W.N) labeled P0 and P1; diffs labeled a and b; vector 1 0 0

Page 35: Implementation

Diagram (P1 side, P2 side): each side shows its pages and a time stamp; time stamps shown: 0 0 0 and 1 2 0; operations: Acq(L); twin; write notices (W.N) labeled P0 and P1; diffs labeled a and b; vectors 1 0 0, 0 0 0 and 1 1 0

Page 36: Implementation

Diagram (P1 side, P2 side): each side shows its pages and a time stamp; time stamps shown: 1 1 1 and 1 2 0; operations: Acq(L); twin; diffs labeled a and b; vectors 1 0 0, 0 0 0 and 1 1 0; write notices (W.N) delivered: P0, P1, P1
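One way to read the time stamps in these diagrams is as vector timestamps, with one entry per processor counting the intervals of that processor that are already known. Under that reading, the intervals whose write notices an acquirer still needs can be found by comparing its vector with the releaser's. The sketch below is an assumption based on standard vector-timestamp comparison, not a transcription of the TreadMarks code; the function names are hypothetical, and the example vectors 0 0 0 and 1 2 0 are the ones that appear on the slides.

```python
def missing_intervals(acquirer_vt, releaser_vt):
    """List (processor, interval) pairs the releaser knows but the acquirer lacks."""
    missing = []
    for proc, (seen, known) in enumerate(zip(acquirer_vt, releaser_vt)):
        for interval in range(seen + 1, known + 1):
            missing.append((proc, interval))
    return missing


def merge_timestamps(a, b):
    """Entry-wise maximum: what the acquirer knows after receiving the notices."""
    return [max(x, y) for x, y in zip(a, b)]


acquirer = [0, 0, 0]  # a processor that has seen no intervals yet
releaser = [1, 2, 0]  # knows one interval of P0 and two intervals of P1

print(missing_intervals(acquirer, releaser))  # [(0, 1), (1, 1), (1, 2)]
print(merge_timestamps(acquirer, releaser))   # [1, 2, 0]
```

With these vectors the comparison selects one interval of P0 and two of P1.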

Page 37: Thank you!

Any questions?