Shenandoah GC
Part I: The Garbage Collector That Could

Aleksey Shipilёv
@shipilev


Safe Harbor

Anything on this or any subsequent slides may be a lie. Do not base your decisions on this talk. If you do, ask for professional help.

Slide 2/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’


Basics


Basics: OpenJDK GCs Landscape


Basics: Concurrent GC Only For Large Heaps?

$$Latency_{stw} = \alpha \cdot Size_{heap} \cdot MemRefs_{stw} \cdot MemLatency_{avg}$$

$Size_{heap}$: heap size collected per GC cycle, MB
$MemRefs_{stw}$: memory references during STW, accesses/MB
$MemLatency_{avg}$: end-to-end memory latency, ns/access

$Latency_{stw}$ components:

Observation      $\alpha \cdot Size_{heap}$    $MemRefs_{stw}$    $MemLatency_{avg}$
Large heap       ↑↑                            ↓↓                 ≈
Slow hardware    ≈                             ↓↓                 ↑↑

Large heap: large live data sets ⇒ need concurrent GC
Slow hardware: memory is slow ⇒ need concurrent GC
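To make the latency formula concrete, here is a minimal sketch that simply multiplies the factors. The class name `StwLatencyEstimate` and all input numbers (α = 1, 1000 accesses/MB, 100 ns/access) are assumptions picked for illustration, not figures from the talk.

```java
// Back-of-the-envelope estimate of a stop-the-world pause using
// Latency_stw = alpha * Size_heap * MemRefs_stw * MemLatency_avg.
// All inputs used in main() are illustrative assumptions, not measurements.
final class StwLatencyEstimate {

    /**
     * @param alpha          dimensionless scaling factor (assumed)
     * @param heapMb         heap size collected per GC cycle, MB
     * @param accessesPerMb  memory references during STW, accesses/MB
     * @param nsPerAccess    end-to-end memory latency, ns/access
     * @return estimated pause, in milliseconds
     */
    static double estimateMs(double alpha, double heapMb,
                             double accessesPerMb, double nsPerAccess) {
        double nanos = alpha * heapMb * accessesPerMb * nsPerAccess;
        return nanos / 1_000_000.0; // ns -> ms
    }

    public static void main(String[] args) {
        // Same per-MB work and the same memory latency, two heap sizes.
        System.out.println("1 GB heap, ms:   " + estimateMs(1.0, 1024, 1000, 100));
        System.out.println("100 GB heap, ms: " + estimateMs(1.0, 102400, 1000, 100));
        // Same 1 GB heap, but memory that is 10x slower.
        System.out.println("1 GB, slow memory, ms: " + estimateMs(1.0, 1024, 1000, 1000));
    }
}
```

With these assumed inputs, growing the heap 100x or slowing memory 10x scales the estimated pause by the same factor, which is the point of the two "need concurrent GC" conclusions.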

Basics: Slow Hardware

Raspberry Pi 3, running springboot-petclinic:

# -XX:+UseShenandoahGC

Pause Init Mark 8.991ms

Concurrent marking 409M->411M(512M) 246.580ms

Pause Final Mark 3.063ms

Concurrent cleanup 411M->89M(512M) 1.877ms

# -XX:+UseParallelGC

Pause Young (Allocation Failure) 323M->47M(464M) 220.702ms

# -XX:+UseG1GC

Pause Young (G1 Evacuation Pause) 410M->38M(512M) 164.573ms


Basics: Releases

Easy to access (development) releases: try it now!

https://wiki.openjdk.java.net/display/shenandoah/

Dev follows latest JDK, backports to 11, 10, and 8
JDK 8 backport ships in RHEL 7.4+, Fedora 24+
JDK 11 backport ships in Fedora 27+
Nightly development builds (tarballs, Docker images)

docker run -it --rm shipilev/openjdk-shenandoah \
    java -XX:+UseShenandoahGC -Xlog:gc -version

Basics: This Message Is Brought To You By

IMHO, discussing gory GC details without the «GC Handbook» is a waste of time
Many GCs appear super-innovative, but in fact they reuse (or reinvent) ideas from the GC Handbook
Combinations of those ideas give rise to many concrete GCs

Overview


Overview: Heap Structure

Shenandoah is a regionalized GC
Heap division, humongous regions, etc. are similar to G1
Collects garbage regions first by default
Not generational by default: no young/old separation, even temporally
Tracking inter-region references is not needed by default

Overview: Usual Cycle

Three major phases:

1. Concurrent marking
2. Concurrent evacuation
3. Concurrent update references (optional)

Overview: Usual Log

LRUFragger, 100 GB heap, ≈ 80 GB live data:

Pause Init Mark 0.227ms

Concurrent marking 84864M->85952M(102400M) 1386.157ms

Pause Final Mark 0.806ms

Concurrent cleanup 85952M->85985M(102400M) 0.176ms

Concurrent evacuation 85985M->98560M(102400M) 473.575ms

Pause Init Update Refs 0.046ms

Concurrent update references 98560M->98944M(102400M) 422.959ms

Pause Final Update Refs 0.088ms

Concurrent cleanup 98944M->84568M(102400M) 18.608ms
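A small sketch that adds up the durations in the log above, separating pauses from concurrent phases. The class `ShenandoahLogSummary` and its parsing approach are illustrative assumptions; it only understands the simplified one-line format shown on the slide, not real `-Xlog:gc` output with its decorations.

```java
import java.util.List;

// Sums the durations of the log lines shown on the slide, split by line prefix.
final class ShenandoahLogSummary {

    // Log lines copied from the slide.
    static final List<String> LOG = List.of(
        "Pause Init Mark 0.227ms",
        "Concurrent marking 84864M->85952M(102400M) 1386.157ms",
        "Pause Final Mark 0.806ms",
        "Concurrent cleanup 85952M->85985M(102400M) 0.176ms",
        "Concurrent evacuation 85985M->98560M(102400M) 473.575ms",
        "Pause Init Update Refs 0.046ms",
        "Concurrent update references 98560M->98944M(102400M) 422.959ms",
        "Pause Final Update Refs 0.088ms",
        "Concurrent cleanup 98944M->84568M(102400M) 18.608ms");

    /** Sums the trailing "<number>ms" token of every line that starts with the prefix. */
    static double totalMs(List<String> lines, String prefix) {
        double total = 0.0;
        for (String line : lines) {
            if (!line.startsWith(prefix)) {
                continue;
            }
            String last = line.substring(line.lastIndexOf(' ') + 1); // e.g. "0.227ms"
            total += Double.parseDouble(last.substring(0, last.length() - 2));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Pauses, ms:     " + totalMs(LOG, "Pause"));
        System.out.println("Concurrent, ms: " + totalMs(LOG, "Concurrent"));
    }
}
```

For this cycle the four pauses add up to roughly 1.2 ms, while the concurrent phases take roughly 2.3 s.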

Phases


Mark: Reachability

To catch garbage, you have to think like garbage: know if there are references to the object

Three basic approaches:

1. No-op: ignore the problem (Epsilon GC)
2. Reference counting: track the number of references, and when refcount drops to 0, treat the object as garbage
3. Tracing: walk the object graph, find reachable objects, treat everything else as garbage

Mark: Three-Color Abstraction

Assign colors to the objects:

1. White: not yet visited
2. Gray: visited, but references are not scanned yet
3. Black: visited, and fully scanned

Daily Blues: «All the marking algorithms do is coloring white gray, and then coloring gray black»

Mark: Stop-The-World Mark

When the application is stopped, everything is trivial! Nothing messes up the scan...
Found all roots, color them Black, because they are implicitly reachable
References from Black are now Gray, scanning Gray references
Finished scanning Gray, color them Black; new references are Gray
Gray → Black; reachable from Gray → Gray
Finished: everything reachable is Black; all garbage is White
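The stop-the-world walk above can be sketched in a few lines. This is a minimal illustration of three-color marking, not Shenandoah code: `TriColorMark`, `Node`, and the explicit gray stack are names invented for the sketch, and roots are colored Gray first rather than Black directly, which reaches the same final state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Three-color marking over a frozen object graph (nothing mutates it during mark()).
final class TriColorMark {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Color color = Color.WHITE;              // not yet visited
        Node(String name) { this.name = name; }
    }

    /** Colors everything reachable from the roots Black; garbage stays White. */
    static void mark(Collection<Node> roots) {
        Deque<Node> gray = new ArrayDeque<>();
        for (Node root : roots) {
            shade(root, gray);
        }
        while (!gray.isEmpty()) {
            Node node = gray.pop();
            for (Node ref : node.refs) {
                shade(ref, gray);               // reachable from Gray -> Gray
            }
            node.color = Color.BLACK;           // fully scanned: Gray -> Black
        }
    }

    /** White -> Gray; objects that are already Gray or Black are left alone. */
    private static void shade(Node node, Deque<Node> gray) {
        if (node.color == Color.WHITE) {
            node.color = Color.GRAY;
            gray.push(node);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        b.refs.add(c);
        c.refs.add(a);   // a cycle: colors prevent endless rescanning
        d.refs.add(a);   // d points into the live graph, but nothing points to d
        mark(List.of(a));
        for (Node n : List.of(a, b, c, d)) {
            System.out.println(n.name + " is " + n.color);
        }
    }
}
```

Running `main` leaves `a`, `b`, and `c` Black and `d` White. Note that the cycle is handled without extra bookkeeping, which is where tracing differs from plain reference counting.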

Concurrent Mark: Mutator Problems

With concurrent mark everything gets complicated: the application runs and actively mutates the object graph during the mark. We contemptuously call it the mutator because of that.

Wavefront is here, and starts scanning the references in Gray object...
Mutator removes the reference from Gray... and inserts it to Black!
...or mutator inserted the reference to transitively reachable White object into Black
Mark had finished, and boom: we have reachable White objects, which we will now reclaim, corrupting the heap
Another quirk: created a new object, and inserted it into Black

Concurrent Mark: Textbook Says

There are at least three approaches to solve this problem. All of them require intercepting heap accesses. Short on time, we shall discuss what G1 and Shenandoah are doing.

Concurrent Mark: SATB

Color all removed referents Gray
Color all new objects Black
Finishing...
Done!
«Snapshot At The Beginning»: marked all reachable at mark start
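The two SATB rules can be tried out on the mutator scenarios from the previous slides. The sketch below is a single-threaded simulation under loud assumptions: `SatbSketch`, `Obj`, `store`, and `allocate` are hypothetical names, marking is advanced by hand one object at a time to imitate interleaving, and the barrier colors the removed referent immediately instead of logging it into a buffer.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simulates a marker interleaved with a mutator, with and without SATB rules.
final class SatbSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        final Map<String, Obj> fields = new HashMap<>();
        Color color = Color.WHITE;
    }

    private final Deque<Obj> grayQueue = new ArrayDeque<>();
    private final boolean satbEnabled;
    private boolean marking;

    SatbSketch(boolean satbEnabled) { this.satbEnabled = satbEnabled; }

    private void shade(Obj o) {
        if (o != null && o.color == Color.WHITE) {
            o.color = Color.GRAY;
            grayQueue.add(o);
        }
    }

    void startMark(Obj root) { marking = true; shade(root); }

    /** Scans one Gray object; returns false when no Gray objects are left. */
    boolean markStep() {
        Obj o = grayQueue.poll();
        if (o == null) return false;
        for (Obj ref : o.fields.values()) shade(ref);
        o.color = Color.BLACK;
        return true;
    }

    /** Mutator store. With SATB, the referent being overwritten is colored Gray first. */
    void store(Obj holder, String field, Obj value) {
        if (marking && satbEnabled) shade(holder.fields.get(field));
        if (value == null) holder.fields.remove(field);
        else holder.fields.put(field, value);
    }

    /** Mutator allocation. With SATB, objects created during mark are colored Black. */
    Obj allocate() {
        Obj o = new Obj();
        if (marking && satbEnabled) o.color = Color.BLACK;
        return o;
    }

    /** Returns the final colors of the moved object and of the freshly allocated object. */
    static Color[] scenario(boolean satbEnabled) {
        SatbSketch gc = new SatbSketch(satbEnabled);
        Obj root = new Obj(), g = new Obj(), w = new Obj();
        root.fields.put("a", g);
        g.fields.put("x", w);

        gc.startMark(root);
        gc.markStep();                 // root is Black, g is Gray, w is still White

        gc.store(root, "y", w);        // insert the reference to w into Black root...
        gc.store(g, "x", null);        // ...and remove it from Gray g
        Obj fresh = gc.allocate();     // another quirk: new object inserted into Black
        gc.store(root, "z", fresh);

        while (gc.markStep()) { }      // finish marking
        return new Color[] { w.color, fresh.color };
    }

    public static void main(String[] args) {
        System.out.println("no barrier: " + Arrays.toString(scenario(false)));
        System.out.println("SATB:       " + Arrays.toString(scenario(true)));
    }
}
```

Without the barrier both objects end up White even though the root still references them; with the two SATB rules both end up Black.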

Concurrent Mark: SATB Barrier

    # check if we are marking
    testb 0x2, 0x20(%r15)
    jne OMG-MARKING
BACK:
    # ... actual store follows ...

    # somewhere much later
OMG-MARKING:
    # tens of instructions that add old value
    # to thread-local buffer, check for overflow,
    # call into VM slowpath to process the buffer
    ...
    jmp BACK
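The shape of that barrier, a cheap flag test on the fast path and buffering on the slow path, can be mirrored in Java. This is only an analogy under stated assumptions: `SatbBarrierSketch`, the capacity of 4, and the shared `GLOBAL_QUEUE` standing in for the VM slow path are all invented for the sketch, and the real barrier is generated machine code, not a method call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Java analogy of the SATB pre-write barrier: flag test, thread-local buffer, overflow flush.
final class SatbBarrierSketch {
    // Stands in for the "are we marking?" bit tested by the fast path.
    static volatile boolean marking = false;

    // Hypothetical buffer size; chosen small so the overflow path is easy to trigger.
    static final int BUFFER_CAPACITY = 4;

    // Stand-in for the collector-side queue that the slow path hands buffers to.
    static final List<Object> GLOBAL_QUEUE = Collections.synchronizedList(new ArrayList<>());

    // Each mutator thread records overwritten values in its own buffer.
    static final ThreadLocal<List<Object>> LOCAL_BUFFER = ThreadLocal.withInitial(ArrayList::new);

    /** Called before a reference store, with the value that is about to be overwritten. */
    static void preWriteBarrier(Object oldValue) {
        if (!marking) {
            return;                       // fast path: one test and a branch
        }
        if (oldValue == null) {
            return;                       // nothing was removed
        }
        List<Object> buffer = LOCAL_BUFFER.get();
        buffer.add(oldValue);             // remember the removed referent
        if (buffer.size() >= BUFFER_CAPACITY) {
            flush();                      // overflow: hand the buffer over
        }
    }

    /** Moves this thread's buffered values to the shared queue. */
    static void flush() {
        List<Object> buffer = LOCAL_BUFFER.get();
        GLOBAL_QUEUE.addAll(buffer);
        buffer.clear();
    }
}
```

Values left in a thread's buffer when marking ends still have to be flushed and processed before the mark is complete.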

Concurrent Mark: Two Pauses¹

Init Mark: stop the mutator to avoid races
1. Walk and mark all roots ← most heavy-weight
2. Arm SATB barriers

Final Mark: stop the mutator to avoid races
1. Drain the thread buffers
2. Finish work from buffer updates ← most heavy-weight

¹ These can actually be concurrent, but that is not very practical

Concurrent Mark: Barriers Cost²

Throughput hit, %

       SATB    WB, RB, CMP*, TOTAL (non-empty cells, left to right)
Cmp    -1.6     -3.5   -7.7  -14.3
Cps    -3.5    -11.4  -13.7
Cry             -1.1   -4.3
Der    -1.6     -7.4   -9.3
Mpg             -2.1  -12.4  -14.8
Smk             -0.5   -4.9   -2.6
Ser             -4.0   -7.1  -11.1
Sfl             -2.7   -6.7  -11.3
Xml    -3.1     -3.5   -9.5  -15.6

² Performance compared to STW Shenandoah with all barriers disabled

Concurrent Mark: Observations

1. Extended concurrency needs to pay with more barriers
   Ideal STW GC beats ideal concurrent GC on pure throughput
   If you do not care about GC pauses, just use good STW GC
   Empty GC log does not mean no GC overhead

2. Hiding references from mark prolongs final mark pause
   Weak references with unreachable referents, finalizers
   «Old» objects hidden in SATB buffers

Copy: Stop-The-World

Problem: there is the object, the object is referenced from somewhere, need to move it to new location

Step 1: Stop The World, evasive maneuver to distract mutator from looking into our mess
Step 2: Copy the object with all its contents
Step 3.1: Update all references: save the pointer that forwards to the copy
Step 3.2: Update all references: walk the heap, replace all refs with fwdptr destination

Everything is fine in the world, set the mutators free! Done!
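The stop-the-world copy steps can be sketched as follows. Everything here is a simplification assumed for illustration: `StwCopySketch` models the heap as a plain list, each object has a single outgoing reference, and "stop the world" is implied by the code being single-threaded.

```java
import java.util.ArrayList;
import java.util.List;

// Stop-the-world copy: copy the object, save a forwarding pointer, then fix references.
final class StwCopySketch {
    static final class Obj {
        int payload;
        Obj ref;   // a single outgoing reference keeps the sketch small
        Obj fwd;   // forwarding pointer; null until the object is moved
        Obj(int payload) { this.payload = payload; }
    }

    /** Steps 2 and 3.1: copy the object with its contents, save the pointer to the copy. */
    static Obj copy(Obj from) {
        Obj to = new Obj(from.payload);
        to.ref = from.ref;
        from.fwd = to;
        return to;
    }

    /** Step 3.2: walk the heap, replace references to moved objects with the fwdptr destination. */
    static void updateReferences(List<Obj> heap) {
        for (Obj o : heap) {
            if (o.ref != null && o.ref.fwd != null) {
                o.ref = o.ref.fwd;
            }
        }
    }

    public static void main(String[] args) {
        Obj holder = new Obj(1);
        Obj target = new Obj(42);
        holder.ref = target;
        List<Obj> heap = new ArrayList<>(List.of(holder, target));

        // Step 1 is implied: no mutator runs while we do this.
        Obj moved = copy(target);
        heap.add(moved);
        updateReferences(heap);
        heap.remove(target);   // nothing references the old location any more

        System.out.println(holder.ref == moved);   // true
        System.out.println(holder.ref.payload);    // 42
    }
}
```

The sketch only works because nothing reads or writes `target` between `copy` and `updateReferences`.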

Concurrent Copy: Mutator Problems

http://vernova-dasha.livejournal.com/77066.html

With concurrent copying everything gets significantly harder: the application writes into the objects while we are moving the same objects!

While object is being moved, there are two copies of the object, and both are reachable!

Thread A writes y = 4 to one copy, and Thread B writes x = 5 to another. Which copy is correct now, huh?

Concurrent Copy: Brooks Pointers

Idea:Brooks pointer: objectversion change withadditional atomicallychanged indirection

Slide 29/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 71: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Concurrent Copy: Brooks Pointers

Step 1:Copy the object,initialize its forwardingpointer to self

Slide 29/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 72: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Concurrent Copy: Brooks Pointers

We now have the copyof the object, but noone knows about it

Slide 29/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 73: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Concurrent Copy: Brooks PointersStep 2:CAS! Atomically installforwarding pointer topoint to new copy. IfCAS had failed,discover the copy viaforwarding pointer

Step 3: Rewrite the references at our own pace in the rest of the heap

If somebody reaches the old copy via the old reference, it has to dereference via fwdptr and discover the actual object copy!

Step 4: All references are updated, recycle the from-space copy

Done!
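The copy protocol can be modeled in plain Java. This is only a rough sketch, not Shenandoah's actual code: the `BrooksObj` class, its `fwd` field and the `resolve`/`evacuate` helpers are made-up names, and an `AtomicReference` stands in for the forwarding pointer word.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical heap object carrying a Brooks-style forwarding pointer.
final class BrooksObj {
    final AtomicReference<BrooksObj> fwd = new AtomicReference<>();
    volatile int x;
    volatile int y;

    BrooksObj(int x, int y) {
        this.x = x;
        this.y = y;
        fwd.set(this); // a fresh object forwards to itself
    }
}

final class BrooksSketch {
    // Dereference via the forwarding pointer to find the current copy.
    static BrooksObj resolve(BrooksObj obj) {
        return obj.fwd.get();
    }

    // Steps 1-2: copy the object, then CAS the forwarding pointer
    // from "self" to the new copy.
    static BrooksObj evacuate(BrooksObj from) {
        BrooksObj copy = new BrooksObj(from.x, from.y); // copy forwards to itself
        if (from.fwd.compareAndSet(from, copy)) {
            return copy;          // our copy is now the object
        }
        return from.fwd.get();    // CAS failed: discover the winner's copy
    }

    public static void main(String[] args) {
        BrooksObj old = new BrooksObj(1, 2);
        BrooksObj moved = evacuate(old);
        System.out.println(resolve(old) == moved);  // true
        System.out.println(evacuate(old) == moved); // true: the second attempt loses the CAS
    }
}
```

Steps 3 and 4 (rewriting references and recycling the old copy) are left out; in this model the old copy simply becomes unreachable once nobody holds it.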


Write Barriers: Motivation

To-space invariant: Writes should happen in to-space only, otherwise they are lost when cycle is finished


Write Barriers: Fastpath

    testb 0x1, 0x20(%r15)        # Heap is stable?
    jne OMG-FORWARDED-OBJECTS
  BACK:
    # ... actual store follows ...

    # somewhere much later
  OMG-FORWARDED-OBJECTS:
    mov -0x8(%rbp),%r10          # Resolve via fwdptr
    testb 0x4, 0x20(%r15)        # Evacuation in progress?
    jne OMG-EVACUATION
    jmp BACK
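The same control flow, written as a Java sketch. The bit masks 0x1 and 0x4 are taken from the listing above; the constant names, the `Path` enum and the `gcState` field (a stand-in for the flag byte at `0x20(%r15)`) are assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class GcStateSketch {
    // Masks mirror the listing; the names are hypothetical.
    static final int HAS_FORWARDED = 0x1;
    static final int EVACUATION = 0x4;

    // Stand-in for the thread-local GC state flags.
    static final AtomicInteger gcState = new AtomicInteger(0);

    enum Path { FAST, RESOLVE_ONLY, EVACUATE }

    // Decide which path a store has to take before writing.
    static Path writeBarrierPath() {
        int state = gcState.get();
        if ((state & HAS_FORWARDED) == 0) {
            return Path.FAST;          // heap is stable: just do the store
        }
        if ((state & EVACUATION) == 0) {
            return Path.RESOLVE_ONLY;  // resolve via fwdptr, then store
        }
        return Path.EVACUATE;          // evacuation in progress: slow path
    }

    public static void main(String[] args) {
        System.out.println(writeBarrierPath());          // FAST
        gcState.set(HAS_FORWARDED | EVACUATION);
        System.out.println(writeBarrierPath());          // EVACUATE
    }
}
```

When the heap is stable, the whole barrier is one flag load, one test and one untaken branch.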


Write Barriers: Slowpath

stub WriteBarrier(obj) {
  if (in-collection-set(obj) &&              // target is in from-space
      fwd-ptrs-to-self(obj)) {               // no copy yet
    val copy = copy(obj);
    if (CAS(fwd-ptr-addr(obj), obj, copy)) {
      return copy;                           // success!
    } else {
      return fwd-ptr(obj);                   // someone beat us to it
    }
  }
  return fwd-ptr(obj);                       // not in from-space, or already copied
}
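To see why the CAS keeps writes from being lost, here is a small Java model where several mutator threads race through the barrier. Everything in it is an assumption for illustration: the nested `Obj` class, its `field` counter and the `race` helper do not come from Shenandoah.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

final class WbRaceSketch {
    // Hypothetical object model: fwd plays the role of the forwarding pointer.
    static final class Obj {
        final AtomicReference<Obj> fwd = new AtomicReference<>();
        final AtomicInteger field = new AtomicInteger();
        final boolean inCollectionSet;

        Obj(boolean inCollectionSet) {
            this.inCollectionSet = inCollectionSet;
            fwd.set(this); // not forwarded yet: points to self
        }
    }

    // Returns the copy that is safe to write to (the to-space copy).
    static Obj writeBarrier(Obj obj) {
        if (obj.inCollectionSet && obj.fwd.get() == obj) {
            Obj copy = new Obj(false);
            copy.field.set(obj.field.get());
            if (obj.fwd.compareAndSet(obj, copy)) {
                return copy;          // success!
            }
        }
        return obj.fwd.get();         // someone beat us to it, or nothing to do
    }

    // Several mutators write through the barrier; returns the final field value.
    static int race(int threads) {
        Obj from = new Obj(true);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> writeBarrier(from).field.incrementAndGet());
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return from.fwd.get().field.get();
    }

    public static void main(String[] args) {
        System.out.println(race(8)); // 8: no increment is lost
    }
}
```

Losers of the CAS throw their private copy away and write to the winner's copy, so every increment lands in the same to-space object.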


Write Barriers: GC Evacuation Code

stub evacuate(obj) {
  if (in-collection-set(obj) &&              // target is in from-space
      fwd-ptrs-to-self(obj)) {               // no copy yet
    copy = copy(obj);
    CAS(fwd-ptr-addr(obj), obj, copy);
  }
}

Termination guarantees:
Always copy out of collection set.
Double forwarding is the GC error.


Write Barriers: Barriers Cost [2]

Throughput hit, %

      SATB WB RB CMP* TOTAL
Cmp   -1.6 -3.5 -7.7 -14.3
Cps   -3.5 -11.4 -13.7
Cry   -1.1 -4.3
Der   -1.6 -7.4 -9.3
Mpg   -2.1 -12.4 -14.8
Smk   -0.5 -4.9 -2.6
Ser   -4.0 -7.1 -11.1
Sfl   -2.7 -6.7 -11.3
Xml   -3.1 -3.5 -9.5 -15.6

[2] Performance compared to STW Shenandoah with all barriers disabled

Write Barriers: Observations
1. Shenandoah needs WB on all stores
   Field stores – obviously
   Locking the object – changes header ⇒ needs WB
   Computing identity hash code – changes header ⇒ needs WB
2. Passive WB cost is low
   Writes, even the primitive ones, are rare
   The cost of L1-load-test-branch is low
3. Active WB cost is moderate
   GC does the bulk of the work
   In optimized barrier paths, fwdptr CAS is the major cost


Read Barriers: Motivation

Heap reads have to (?) dereference via the forwarding pointer, to discover the actual object copy


Read Barriers: Implementation

    # read barrier: dereference via fwdptr
    mov -0x8(%r10),%r10    # obj = *(obj - 8)
    # ...actual read from %r10 follows...

Benchmark          base          +3 RBs        Units
time               4.6 ± 0.1     5.3 ± 0.1     ns/op
L1-dcache-loads    12.3 ± 0.2    15.1 ± 0.3    #/op
cycles             18.7 ± 0.3    21.6 ± 0.3    #/op
instructions       26.6 ± 0.2    30.3 ± 0.3    #/op
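As a back-of-the-envelope reading of the table, the difference between the two columns can be divided by the three added barriers. The helper below is a hypothetical sketch that ignores the error bars, so the results are only rough estimates.

```java
import java.util.Locale;

final class RbCostSketch {
    // Average per-barrier delta between the baseline and the run with extra barriers.
    static double perBarrier(double base, double withBarriers, int barriers) {
        return (withBarriers - base) / barriers;
    }

    public static void main(String[] args) {
        // Central values from the benchmark table, "+3 RBs" versus "base".
        System.out.println(String.format(Locale.ROOT, "time:   %.2f ns per RB", perBarrier(4.6, 5.3, 3)));
        System.out.println(String.format(Locale.ROOT, "loads:  %.2f per RB", perBarrier(12.3, 15.1, 3)));
        System.out.println(String.format(Locale.ROOT, "cycles: %.2f per RB", perBarrier(18.7, 21.6, 3)));
        System.out.println(String.format(Locale.ROOT, "insns:  %.2f per RB", perBarrier(26.6, 30.3, 3)));
    }
}
```

With these numbers each read barrier comes out at roughly one extra L1 load, about one cycle and about a quarter of a nanosecond.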


Read Barriers: Barriers Cost [2]

Throughput hit, %

      SATB WB RB CMP* TOTAL
Cmp   -1.6 -3.5 -7.7 -14.3
Cps   -3.5 -11.4 -13.7
Cry   -1.1 -4.3
Der   -1.6 -7.4 -9.3
Mpg   -2.1 -12.4 -14.8
Smk   -0.5 -4.9 -2.6
Ser   -4.0 -7.1 -11.1
Sfl   -2.7 -6.7 -11.3
Xml   -3.1 -3.5 -9.5 -15.6

[2] Performance compared to STW Shenandoah with all barriers disabled

Read Barriers: Observations
1. Shenandoah needs RBs before most loads
   Cannot make RBs much heavier
   Optimizing compilers move and coalesce RB – massive gains
2. Passive RB cost is moderate
   Dependent load that hits the same cache line as object
3. Active RB cost is moderate
   Does not differ much from passive RB

CMP: Trouble

What if we compare from-copy and to-copy themselves?

(a1 == a2) → ???

But machine ptrs are not equal... Oops.


CMP: Exotic Barriers

Having two physical copies of the same logical object, «==» has to compare logical objects

    # compare the ptrs; if equal, good!
    cmp %rcx,%rdx          # if (a1 == a2) ...
    je EQUALS
    # false negative? have to compare to-copy:
    mov -0x8(%rcx),%rcx    # a1 = *(a1 - 8)
    mov -0x8(%rdx),%rdx    # a2 = *(a2 - 8)
    # compare again:
    cmp %rcx,%rdx          # if (a1 == a2) ...
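The comparison barrier above can be sketched in Java as follows. The nested `Obj` class and the `sameObject` helper are hypothetical; the null check is an addition that the Java model needs, since the sketch cannot dereference a null reference the way the assembly skips over it.

```java
import java.util.concurrent.atomic.AtomicReference;

final class AcmpSketch {
    // Hypothetical object with a forwarding pointer, initially pointing to self.
    static final class Obj {
        final AtomicReference<Obj> fwd = new AtomicReference<>();

        Obj() {
            fwd.set(this);
        }
    }

    // "==" on logical objects: two physical copies of one object must compare equal.
    static boolean sameObject(Obj a1, Obj a2) {
        if (a1 == a2) {
            return true;                     // machine pointers equal: done
        }
        if (a1 == null || a2 == null) {
            return false;                    // null never has a second copy
        }
        return a1.fwd.get() == a2.fwd.get(); // false negative? compare the to-copies
    }

    public static void main(String[] args) {
        Obj from = new Obj();
        Obj to = new Obj();
        from.fwd.set(to);                               // from was evacuated to "to"
        System.out.println(from == to);                 // false: different machine pointers
        System.out.println(sameObject(from, to));       // true: same logical object
    }
}
```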


CMP: Barriers Cost [2]

Throughput hit, %

      SATB WB RB CMP* TOTAL
Cmp   -1.6 -3.5 -7.7 -14.3
Cps   -3.5 -11.4 -13.7
Cry   -1.1 -4.3
Der   -1.6 -7.4 -9.3
Mpg   -2.1 -12.4 -14.8
Smk   -0.5 -4.9 -2.6
Ser   -4.0 -7.1 -11.1
Sfl   -2.7 -6.7 -11.3
Xml   -3.1 -3.5 -9.5 -15.6

[2] Performance compared to STW Shenandoah with all barriers disabled

CMP: Observations
1. Shenandoah needs to handle ref comparisons specially
   Cannot make RBs much heavier
   Optimizing compilers move and coalesce RB – massive gains
2. Passive CMP cost is low
   Barely detectable in most cases
   Comparisons with null are frequent and optimized
3. Active CMP cost is low
   Does not differ much from passive RB


Overall: Barriers Cost [2]

Throughput hit, %

      SATB WB RB CMP* TOTAL
Cmp   -1.6 -3.5 -7.7 -14.3
Cps   -3.5 -11.4 -13.7
Cry   -1.1 -4.3
Der   -1.6 -7.4 -9.3
Mpg   -2.1 -12.4 -14.8
Smk   -0.5 -4.9 -2.6
Ser   -4.0 -7.1 -11.1
Sfl   -2.7 -6.7 -11.3
Xml   -3.1 -3.5 -9.5 -15.6

[2] Performance compared to STW Shenandoah with all barriers disabled

Overall: Observations
1. Easily portable across HW architectures
   Special needs: CAS (performance is important, but not critical)
   x86_64 and AArch64 are major implemented targets
   Theoretically works with 32-bit arches (but not ported yet)
2. Trivially portable across OSes
   Special needs: none
   Linux is a major target, Windows is minor target
   Adopters build on Mac OS without problems
3. VM interactions are simple enough
   Play well with compressed oops: separate fwdptr
   OS/CPU-specific things only for barriers codegen


Intermezzo


Intermezzo: Generational Hypotheses

Weak hypothesis: most objects die young


Strong hypothesis: the older the object, the less chance it has to die

In-memory LRU-like caches are the prime counterexamples


Intermezzo: LRU, Pesky Workload

Very inconvenient workload for simple generational GCs

Early on, many young objects die, and oldies survive: weak GH is valid, strong GH is valid
Suddenly, old objects start to die: weak GH is valid, strong GH is not valid anymore!
Naive GCs trip over and burn


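Intermezzo: The Simplest LRU

The simplest LRU implementation in Java?

// accessOrder = true: iteration order is least-recently-used first
cache = new LinkedHashMap<K, V>(size*4/3, 0.75f, true) {
  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > size;   // evict once the map outgrows the target size
  }
};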
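A self-contained version of the same idea, wrapped in a hypothetical `lru` factory method so it compiles on its own. The class and method names are made up for this sketch; the `LinkedHashMap` constructor and `removeEldestEntry` hook are the standard JDK ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruSketch {
    // Builds an access-ordered map that evicts its eldest entry above the given capacity.
    static <K, V> Map<K, V> lru(final int capacity) {
        return new LinkedHashMap<K, V>(capacity * 4 / 3, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = lru(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");                      // touch "a": "b" becomes the eldest entry
        cache.put("c", 3);                   // over capacity: "b" is evicted
        System.out.println(cache.keySet());  // [a, c]
    }
}
```

From the GC's point of view, every eviction kills an object that has typically lived for a long time, which is exactly what the strong generational hypothesis says should be rare.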


Intermezzo: Testing

Boring config:
1. Latest improvements in all GCs: shenandoah/jdk forest
2. Decent multithreading: 8 threads on 16-thread i7-7820X
3. Larger heap: -Xmx100g -Xms100g
4. 90% hit rate, 90% reads, 10% writes
5. Size (LDS) = 0..100% of -Xmx

Varying cache size ⇒ varying LDS ⇒ make GC uncomfortable


Intermezzo: Pauses vs. LDS

[Plot: three panels (Parallel, CMS, Shenandoah). X axis: Live Data Size, % of heap (0 to 100). Y axis: Pause time, sec (all safepoints), log scale from 10^-4 to 10^1. Annotations added in successive builds of the slide: "No STW Old GC", "No STW Young GC", "Heap Overload".]


Intermezzo: Perf vs. LDS

[Plots: two charts comparing Parallel, CMS and Shenandoah over Live Data Size, % of heap (0 to 100). First chart: Operation Time, sec (log scale, 50 to 1000). Second chart: GC Pause Time, % (0 to 100).]

GC work happens in background...
...and application appears faster!


Command and Control


Command and Control: Central Dogma

Concurrent GCs are in-background heavy-lifters
Rely on collecting faster than applications allocate
Frequently works by itself: threads do useful work, GC threads are high-priority, there is enough heap to absorb allocations
Practical concurrent GCs have to care about unfortunate cases as well


Command and Control: Off To The Races

    [1003.2s][gc] Trigger: Average GC time (4018.8 ms) is
    above the time for allocation rate (3254.90 MB/s) to
    deplete free headroom (13071M)

Want better conc GC performance, less frequent GC cycles?
GC Time. Get more GC threads, have coarser objects, etc
Allocation Rate. Go easy on excessive allocations
Heap Size. Give concurrent GC more heap to play with
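The arithmetic behind that trigger line can be replayed in a few lines of Java. This is a hypothetical reconstruction of the comparison the log message describes, not the collector's actual heuristic code; the method names are made up, and "M" is assumed to mean the same unit as the "MB" in the allocation rate.

```java
final class TriggerSketch {
    // Time in milliseconds for the application to eat the free headroom.
    static double depletionMs(double headroomMb, double allocRateMbPerSec) {
        return headroomMb / allocRateMbPerSec * 1000.0;
    }

    // Start a cycle when an average GC would not finish before the headroom is gone.
    static boolean shouldStartGc(double avgGcMs, double headroomMb, double allocRateMbPerSec) {
        return avgGcMs > depletionMs(headroomMb, allocRateMbPerSec);
    }

    public static void main(String[] args) {
        // Numbers from the log line: 13071M of headroom at 3254.90 MB/s lasts about 4016 ms.
        System.out.println(depletionMs(13071, 3254.90));
        // An average cycle takes 4018.8 ms, so the GC has to start now.
        System.out.println(shouldStartGc(4018.8, 13071, 3254.90)); // true
    }
}
```

The three knobs listed on the slide map directly onto the three inputs: shorter GC time, lower allocation rate, or more headroom all push the trigger further out.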


Command and Control: Living Space

Problem: Concurrent GC needs breathing room to succeed, while applications allocate like madmen

Things that help:
Immediate garbage shortcuts: free memory early
Aggressive heap expansion: prefer taking more memory
Mutator pacing: stall allocators before they hit the wall
Handling failures: gracefully degrade


Immediates: Obvious Shortcut

    GC(7) Pause Init Mark 0.614ms
    GC(7) Concurrent marking 76812M->76864M(102400M) 1.650ms
    GC(7) Total Garbage: 76798M
    GC(7) Immediate Garbage: 75072M, 2346 regions (97% of total)
    GC(7) Pause Final Mark 0.758ms
    GC(7) Concurrent cleanup 76864M->1844M(102400M) 3.346ms

1. Mark is fast, because most things are dead
2. Lots of fully dead regions, because most objects are dead
3. Cycle shortcuts, because why bother...
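The numbers in this log are easy to cross-check. The helpers below are hypothetical and only redo the arithmetic; the region size is derived under the assumption that all fully dead regions have the same size.

```java
final class ImmediateGarbageSketch {
    // Size of one region, assuming the immediate garbage consists of whole, equal regions.
    static long regionSizeMb(long immediateGarbageMb, long regions) {
        return immediateGarbageMb / regions;
    }

    // Share of the total garbage that sits in fully dead regions, in percent.
    static double immediateSharePercent(long immediateGarbageMb, long totalGarbageMb) {
        return 100.0 * immediateGarbageMb / totalGarbageMb;
    }

    // Heap occupancy drop reported by a "before->after" log entry.
    static long reclaimedMb(long beforeMb, long afterMb) {
        return beforeMb - afterMb;
    }

    public static void main(String[] args) {
        System.out.println(regionSizeMb(75072, 2346));           // 32
        System.out.println(immediateSharePercent(75072, 76798)); // a bit under 98, logged as 97%
        System.out.println(reclaimedMb(76864, 1844));            // 75020
    }
}
```

Nearly all of the garbage is reclaimed by the cleanup phase alone, without evacuating or updating a single reference.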

Slide 59/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 128: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Immediates: Obvious ShortcutGC(7) Pause Init Mark 0.614ms

GC(7) Concurrent marking 76812M->76864M(102400M) 1.650ms

GC(7) Total Garbage: 76798M

GC(7) Immediate Garbage: 75072M, 2346 regions (97% of total)

GC(7) Pause Final Mark 0.758ms

GC(7) Concurrent cleanup 76864M->1844M(102400M) 3.346ms

1. Mark is fast, because most things are dead

2. Lots of fully dead regions, because most objects are dead3. Cycle shortcuts, because why bother...

Slide 59/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 131: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Footprint: Living Space

Problem: Concurrent GC needs breathing room to succeed, while applications allocate like madmen

Things that help:
  Immediate garbage shortcuts: free memory early
  Aggressive heap expansion: prefer taking more memory
  Mutator pacing: stall allocators before they hit the wall
  Handling failures: gracefully degrade

Slide 60/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 132: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Footprint: Shenandoah Overheads

Shenandoah requires an additional word per object for the forwarding pointer at all times, plus some native structures

Java heap: 1.5x worst-case and 1.05-1.10x average overhead
  «−»: the overhead is non-static
  «+»: counted in Java heap, so no surprise RSS inflation
Native structures: 2x marking bitmaps, each 1/64 of heap
  «−»: -Xmx is still not close to RSS
  «+»: overhead is static: -Xmx100g means 103 GB RSS

Surprise: a significant part of the footprint story is heap sizing, not per-object or per-heap overheads
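A rough back-of-the-envelope check of these numbers, as a sketch. The 16-byte minimal object, the 8-byte word, and the 128-byte "typical" object are assumptions chosen for illustration; the class and method names are hypothetical.

```java
// Back-of-the-envelope sketch; object sizes below are assumptions, not measurements.
final class ShenandoahFootprintEstimate {
    static final long GB = 1L << 30;

    /** Two marking bitmaps, each 1/64 of the heap, as stated on the slide. */
    static long markingBitmapBytes(long heapBytes) {
        return 2 * (heapBytes / 64);
    }

    /** Heap inflation factor from one extra forwarding-pointer word per object. */
    static double fwdptrInflation(long objectBytes, long wordBytes) {
        return (double) (objectBytes + wordBytes) / objectBytes;
    }

    public static void main(String[] args) {
        long heap = 100 * GB; // -Xmx100g
        double bitmapsGb = (double) markingBitmapBytes(heap) / GB;
        System.out.printf("bitmaps: %.3f GB, heap + bitmaps: %.3f GB%n", bitmapsGb, 100 + bitmapsGb);

        // Assumption: 16-byte minimal object and 8-byte word on a 64-bit JVM.
        System.out.printf("minimal object: %.2fx%n", fwdptrInflation(16, 8));
        // Assumption: a 128-byte object, picked only to show an "average-looking" case.
        System.out.printf("128-byte object: %.4fx%n", fwdptrInflation(128, 8));
    }
}
```

Under these assumptions the bitmaps add 3.125 GB on top of a 100 GB heap, which lines up with the "103 GB RSS" figure, and the smallest object grows by exactly 1.5x.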

Slide 61/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 134: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Footprint: Heap Sizing

[Figure: RSS, MB (0 to 800) over time, sec (0 to 120), for wildfly-swarm-rest-http, 30K rps, JDK head x86-64, -Xmx512m. Phases along the time axis: Start, Idle, Load, Idle, Full GC, Idle. Series: G1, Sh, Sh (compact). Callouts added across the slide builds: Aggressive expansion; First uncommit; Periodic GC; Second uncommit; Very frequent GCs.]

Slide 62/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’
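To watch heap sizing from inside an application, a minimal probe like the one below can sample the committed Java heap over time. It is a sketch: the class name is hypothetical, and committed heap is only one component of RSS, so the numbers will not match an RSS chart exactly.

```java
// Minimal sketch: samples the committed Java heap, which is only a part of RSS.
final class CommittedHeapProbe {

    /** Bytes of Java heap currently committed by the JVM. */
    static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Upper bound the heap may grow to (roughly -Xmx). */
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Approximate bytes in use: committed minus free within the committed part. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples, one per second.
        for (int i = 0; i < 5; i++) {
            System.out.printf("used=%dM committed=%dM max=%dM%n",
                    usedBytes() >> 20, committedBytes() >> 20, maxBytes() >> 20);
            Thread.sleep(1000);
        }
    }
}
```

If the collector uncommits memory while the application idles, the committed value should drop between samples; whether and when that happens depends on the collector and its settings.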

Page 144: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Footprint: CPU Time Tradeoffs

[Figure: Java user CPU, % over time, sec (0 to 120), for wildfly-swarm-rest-http, 30K rps, JDK head x86-64, -Xmx512m. Phases along the time axis: Start, Idle, Load, Idle, Full GC, Idle. Series: G1, Sh, Sh (compact). Callouts added across the slide builds: Warmups; High footprint, low CPU; Low footprint, high CPU; Low footprint, low CPU.]

Slide 63/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 149: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Footprint: Observations

1. Footprint story is nuanced
   Blindly counting bytes taken by Java heap and GC does not cut it
   First-order effect: heap sizing policies
   Second-order effects: per-object and per-reference overheads
2. Forwarding ptr overhead is substantial, but manageable
   ...especially when the alternative is giving up compressed oops
   In-object fwdptr injection cuts the overhead down (see backup)
3. Idle footprint seems to be of most interest
   Few adopters (none?) care about peak footprint, but we still do
   Anecdote: I am running Shenandoah with my IDEA and CLion, because memory is scarce on my puny ultrabook

Slide 64/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 152: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Pacing: Living Space

Problem: Concurrent GC needs breathing room to succeed, while applications allocate like madmen

Things that help:
  Immediate garbage shortcuts: free memory early
  Aggressive heap expansion: prefer taking more memory
  Mutator pacing: stall allocators before they hit the wall
  Handling failures: gracefully degrade

Slide 65/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 153: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Pacing: STW GC Control Loop

Once memory is exhausted, perform GC
Natural feedback loop: STW is the nominal mode
Not really accessible for concurrent GC?

Slide 66/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 154: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Pacing: Naive Conc GC Control Loop

Memory is exhausted ⇒ stall allocation and wait for GC
Technically not a GC pause, but still local latency
Allocation failures (AFs) usually happen in all threads at once: global latency

Slide 67/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 155: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Pacing: Shenandoah Control Loop

Incremental pacing stalls allocations a bit at a time
If an allocation failure (AF) happens, the cycle «degenerates»: it completes under STW
Pacing introduces latency, but that latency is capped

Slide 68/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 156: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Pacing: Max Pacing, Pauses

[Figure: Pause time, sec (all safepoints), log scale from 10^-4 to 10^1, versus Live Data Size, % of heap (0 to 100). Panels: Shenandoah; Shenandoah (max pacing). Callout added in the slide build: Nuclear option: max pacing.]

Slide 69/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 158: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Pacing: Max Pacing, Times

[Figure, two panels, both versus Live Data Size, % of heap (0 to 100), comparing Shenandoah and Shenandoah (max pacing). Panel 1: Operation Time, sec (log scale, 60 to 1000). Panel 2: GC Pause Time, % (0 to 100). Callouts added across the slide builds: Pauses are invisible; Yet the progress is wrecked anyway.]

Slide 70/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 161: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Pacing: Observations

1. Pacing provides an essential negative feedback loop
   Thread allocates? Thread pays for it!
   Thread does not allocate as much? It can run freely!
2. Pacing introduces local latency
   Hidden from the tools, hidden from usual GC log
   Latency is not global, making perf analysis harder
3. Nuclear option: max pacing delay = +∞
   Resolves the need for handling allocation failures: thread always stalls when memory is not available
   Shenandoah caps delay at 10 ms to avoid cheating
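The feedback loop can be sketched as a shared allocation budget that GC progress replenishes and allocating threads draw from. This is a simplified illustration, not Shenandoah's pacer: the class, its methods, and the polling interval are hypothetical; only the idea of a capped delay (10 ms on the slide) comes from the talk.

```java
// Simplified pacing sketch; all names here are hypothetical.
final class AllocationPacer {
    private final java.util.concurrent.atomic.AtomicLong budget;
    private final long maxDelayNanos;

    AllocationPacer(long initialBudgetBytes, long maxDelayMillis) {
        this.budget = new java.util.concurrent.atomic.AtomicLong(initialBudgetBytes);
        this.maxDelayNanos = maxDelayMillis * 1_000_000L;
    }

    /** GC threads call this as they make progress, handing out more allocation budget. */
    void reportGcProgress(long bytes) {
        budget.addAndGet(bytes);
    }

    /** Takes the bytes from the budget if it is large enough; never drives it negative. */
    boolean tryClaim(long bytes) {
        while (true) {
            long current = budget.get();
            if (current < bytes) {
                return false;
            }
            if (budget.compareAndSet(current, current - bytes)) {
                return true;
            }
        }
    }

    /**
     * Called by an allocating thread before it allocates. Returns true if the
     * budget covered the allocation, false if the thread waited out the capped
     * delay and now proceeds without budget.
     */
    boolean paceForAlloc(long bytes) {
        long deadline = System.nanoTime() + maxDelayNanos;
        while (!tryClaim(bytes)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // delay cap reached: stop stalling this thread
            }
            // Short nap before re-checking the budget; the interval is arbitrary.
            java.util.concurrent.locks.LockSupport.parkNanos(100_000L);
        }
        return true;
    }
}
```

Only the thread that allocates waits, which is why the latency is local. Removing the deadline check corresponds to the "nuclear option": the thread would stall until budget appears.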

Slide 71/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 164: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Handling Failures: Living Space

Problem: Concurrent GC needs breathing room to succeed, while applications allocate like madmen

Things that help:
  Immediate garbage shortcuts: free memory early
  Aggressive heap expansion: prefer taking more memory
  Mutator pacing: stall allocators before they hit the wall
  Handling failures: gracefully degrade

Slide 72/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 165: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Handling Failures: Shenandoah Control Loop

If an allocation failure (AF) happens, the cycle «degenerates»: it completes under STW

Slide 73/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 166: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Handling Failures: Degenerated GC

Pause Init Update Refs 0.034ms
Cancelling GC: Allocation Failure
Concurrent update references 7265M->8126M(8192M) 248.467ms
Pause Degenerated GC (Update Refs) 8126M->2716M(8192M) 29.787ms

First allocation failure dives into stop-the-world mode
Degenerated GC continues the cycle
Second allocation failure may upgrade to Full GC
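The escalation ladder can be written down as a tiny state machine. This is a sketch with hypothetical names. It also simplifies the slide's "may upgrade" into "always upgrades", since the talk does not spell out the actual condition.

```java
// Sketch of the failure escalation ladder; names are hypothetical and the
// second step is simplified (the slide says a second failure "may" upgrade).
final class FailureLadder {
    enum Mode { CONCURRENT, DEGENERATED, FULL }

    /** Next mode after an allocation failure observed in the given mode. */
    static Mode onAllocationFailure(Mode current) {
        return current == Mode.CONCURRENT ? Mode.DEGENERATED : Mode.FULL;
    }

    /** After a cycle completes in any mode, the next cycle starts concurrent again. */
    static Mode onCycleComplete() {
        return Mode.CONCURRENT;
    }

    public static void main(String[] args) {
        Mode mode = Mode.CONCURRENT;
        mode = onAllocationFailure(mode);
        System.out.println("after first allocation failure: " + mode);
        mode = onAllocationFailure(mode);
        System.out.println("after second allocation failure: " + mode);
        mode = onCycleComplete();
        System.out.println("after the cycle completes: " + mode);
    }
}
```

The log above shows the first step: the concurrent update-references phase is cancelled, and the degenerated pause finishes that same phase under STW instead of starting over.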

Slide 74/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 168: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Handling Failures: Full GC

Full GC is the Maximum Credible Accident: Parallel, STW, Sliding «Lisp 2»-style GC.

Designed to recover from anything: 99% full regions, heavy (humongous) fragmentation, abort from any point in concurrent GC, etc.
Parallel: Multi-threaded, runs on par with Parallel GC
Sliding: No additional memory needed + reuses fwdptr slots to store forwarding data
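For readers unfamiliar with the «Lisp 2» scheme, here is a single-threaded toy version on an array-of-slots heap. It is a sketch of the general textbook algorithm, not Shenandoah's parallel implementation; the `Obj` type and the slot-index references are assumptions made to keep the example small. The per-object `forward` field plays the role the fwdptr slot plays on the slide.

```java
// Toy single-threaded Lisp 2 sliding compaction; the heap model is hypothetical.
final class Lisp2Compactor {

    static final class Obj {
        final String name;
        final boolean marked; // result of a prior marking pass
        final int[] refs;     // slot indexes of referenced objects
        int forward = -1;     // new slot index, filled in by pass 1

        Obj(String name, boolean marked, int... refs) {
            this.name = name;
            this.marked = marked;
            this.refs = refs;
        }
    }

    /** Slides live objects to the start of the heap, keeping their order. Returns the live count. */
    static int compact(Obj[] heap) {
        // Pass 1: compute forwarding slots for live objects, in heap order.
        int free = 0;
        for (Obj o : heap) {
            if (o != null && o.marked) {
                o.forward = free++;
            }
        }
        // Pass 2: rewrite references using the forwarding data of the targets.
        for (Obj o : heap) {
            if (o != null && o.marked) {
                for (int i = 0; i < o.refs.length; i++) {
                    o.refs[i] = heap[o.refs[i]].forward;
                }
            }
        }
        // Pass 3: move objects. Since forward <= current index, a move never
        // overwrites a live object that has not been moved yet.
        for (int i = 0; i < heap.length; i++) {
            Obj o = heap[i];
            heap[i] = null;
            if (o != null && o.marked) {
                heap[o.forward] = o;
            }
        }
        return free;
    }

    public static void main(String[] args) {
        Obj[] heap = {
            new Obj("A", true, 2), new Obj("B", false),
            new Obj("C", true, 4), new Obj("D", false),
            new Obj("E", true)
        };
        int live = compact(heap);
        for (int i = 0; i < live; i++) {
            System.out.println(i + ": " + heap[i].name);
        }
    }
}
```

No side table is needed: the forwarding data lives in the objects themselves, which is the "no additional memory" property named above.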

Slide 75/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 169: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Handling Failures: Observations

1. Being fully concurrent is nice, but own the failures
   The failures will happen, accept it
   «Our perfect GC melted down, because you forgot this magic VM option(, stupid)» only flies so far
2. Graceful and observable degradation is key
   Getting worse incrementally is better than falling off the cliff
   Have enough logging to diagnose the degradations
3. Failure-path performance is important
   Degenerated GC is not throwing away progress
   Full GC is optimized too

Slide 76/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 172: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Conclusion

Page 173: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Conclusion: In a Single Picture

Universal GC does not exist: either low latency, or high throughput (, or low memory footprint)

Choose this for your workload!

Slide 78/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 174: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Conclusion: In a Single Paragraph

1. No GC can detect what tradeoffs you are after: you have to tell it yourself
2. Stop-the-world GCs beat concurrent GCs in throughput and efficiency. Parallel GC is your choice!
3. Concurrent Mark trims down the pauses significantly. G1 is ready for this, use it!
4. Concurrent Copy/Compact needs to be addressed for even shallower pauses. This is where Shenandoah and ZGC come in!

Slide 79/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’

Page 178: Shenandoah GC - Part I: The Garbage Collector That Could · Overview:UsualLog LRUFragger,100GBheap,≈80GBlivedata: Pause Init Mark 0.227ms Concurrent marking 84864M->85952M(102400M)

Conclusion: Releases

Easy to access (development) releases: try it now!

https://wiki.openjdk.java.net/display/shenandoah/

Dev follows latest JDK, backports to 11, 10, and 8
JDK 8 backport ships in RHEL 7.4+, Fedora 24+
JDK 11 backport ships in Fedora 27+
Nightly development builds (tarballs, Docker images)

docker run -it --rm shipilev/openjdk-shenandoah \
  java -XX:+UseShenandoahGC -Xlog:gc -version
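Once a build is running, one way to double-check from inside the application which collector is active is to list the GC management beans. A minimal sketch using the standard `java.lang.management` API; the class name is hypothetical, and the exact bean names printed depend on the JVM and collector in use.

```java
// Lists the names of the garbage collector MXBeans exposed by the running JVM.
final class ActiveCollectors {

    static java.util.List<String> names() {
        java.util.List<String> result = new java.util.ArrayList<>();
        for (java.lang.management.GarbageCollectorMXBean bean :
                java.lang.management.ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(bean.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String name : names()) {
            System.out.println(name);
        }
    }
}
```

Running it once with and once without `-XX:+UseShenandoahGC` should show different bean names if the flag took effect.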

Slide 80/80. «Shenandoah GC», Aleksey Shipilёv, 2018, D:20180914113310+02’00’