Page 1: Compaction, Compaction Everywhere

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

SOUTH BAY CASSANDRA USERS NOVEMBER 2014

COMPACTION, COMPACTION, EVERYWHERE.

Aaron Morton @aaronmorton

Co-Founder & Principal Consultant

Page 2: Compaction, Compaction Everywhere

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Apache Cassandra Committer, DataStax MVP, Apache Usergrid Committer. Based in New Zealand & USA.

Page 3: Compaction, Compaction Everywhere

Compaction? STCS, LCS, DTCS

Page 4: Compaction, Compaction Everywhere

Compaction?

Because reasons.

Page 5: Compaction, Compaction Everywhere

No compaction?

Row fragmentation would result in dramatically increased read latency.

Page 6: Compaction, Compaction Everywhere

No compaction?

Increased file count would increase memory usage.

Page 7: Compaction, Compaction Everywhere

No compaction?

Overwrites and deletions would result in wasted disk space.

Page 8: Compaction, Compaction Everywhere

Compaction?

Yes.

Page 9: Compaction, Compaction Everywhere

Compaction?

Log Structured Merge Tree.

Page 10: Compaction, Compaction Everywhere

Compaction?

Creating new files when flushing to disk improves performance and reduces complexity.
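A toy illustration of the idea above: writes land in an in-memory memtable, and each flush produces a brand new immutable, sorted file instead of updating an existing one. The `TinyLSM` class and its `flush_threshold` are hypothetical, and for simplicity reads resolve conflicts by file order (newest file wins) rather than by per-cell timestamps as Cassandra does.

```python
class TinyLSM:
    """Hypothetical toy LSM store; not how Cassandra is implemented."""

    def __init__(self, flush_threshold=2):
        self.memtable = {}             # mutable, in-memory writes
        self.sstables = []             # immutable sorted tuples, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # A flush always creates a new sorted file; existing files are never rewritten.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


db = TinyLSM()
db.put("foo", 1)
db.put("bar", 2)   # memtable full: flushed to SSTable 1
db.put("foo", 3)
db.put("baz", 4)   # flushed to SSTable 2; "foo" now lives in two files
print(len(db.sstables), db.get("foo"))
```

Flushing stays cheap because nothing is rewritten, but the key `foo` is now spread across two files. That fragmentation is the cost compaction pays down later.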

Page 11: Compaction, Compaction Everywhere

Compaction?

SSTable 1 - foo: dishwasher (ts 10): tomato, purple (ts 10): cromulent

SSTable 2 - foo: frink (ts 20): flayven, monkey (ts 10): embiggins

SSTable 3

SSTable 4 - foo: dishwasher (ts 15): tomacco

SSTable 5
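Reading row `foo` means visiting SSTables 1, 2 and 4 and keeping the newest cell for each column. The sketch below does that merge on the slide's data with a simple last-write-wins rule. The `merge_row` helper is hypothetical, and it ignores ties on equal timestamps, which Cassandra breaks by comparing values.

```python
# Row "foo" as it appears in each SSTable: column -> (value, write timestamp).
sstables = {
    1: {"dishwasher": ("tomato", 10), "purple": ("cromulent", 10)},
    2: {"frink": ("flayven", 20), "monkey": ("embiggins", 10)},
    4: {"dishwasher": ("tomacco", 15)},
}


def merge_row(fragments):
    """Merge row fragments, keeping the cell with the highest timestamp per column."""
    merged = {}
    for fragment in fragments:
        for column, (value, ts) in fragment.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


row_foo = merge_row(sstables.values())
for column, (value, ts) in sorted(row_foo.items()):
    print(f"{column} (ts {ts}): {value}")
```

Compaction performs this same merge once and writes the result to a single new SSTable, so later reads of `foo` only need to touch one file.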

Page 12: Compaction, Compaction Everywhere

Demo.

Page 13: Compaction, Compaction Everywhere

nodetool cfhistograms foo bar

SSTables per Read

1 sstables: 149
2 sstables: 62
3 sstables: 65
4 sstables: 50
5 sstables: 45
6 sstables: 44
7 sstables: 76
8 sstables: 72
10 sstables: 305
12 sstables: 390
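A quick way to summarise a histogram like this is to compute the mean number of SSTables per read and the share of reads that hit 10 or more. The sketch below does that on the numbers shown. It assumes each row counts reads that touched exactly that many SSTables. In practice the rows are histogram bucket offsets, so treat the results as approximate.

```python
# (SSTables touched, number of reads) pairs copied from the histogram above.
sstables_per_read = [
    (1, 149), (2, 62), (3, 65), (4, 50), (5, 45),
    (6, 44), (7, 76), (8, 72), (10, 305), (12, 390),
]

total_reads = sum(count for _, count in sstables_per_read)
weighted = sum(n * count for n, count in sstables_per_read)
heavy = sum(count for n, count in sstables_per_read if n >= 10)

print(f"reads: {total_reads}")
print(f"mean SSTables per read: {weighted / total_reads:.2f}")
print(f"reads touching 10+ SSTables: {heavy / total_reads:.0%}")
```

On this data more than half of all reads touch 10 or more SSTables, which is the kind of fragmentation compaction is meant to prevent.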

Page 14: Compaction, Compaction Everywhere

Compaction? STCS, LCS, DTCS

Page 15: Compaction, Compaction Everywhere

SizeTieredCompactionStrategy

The first compaction strategy.

Group files of a similar size for compaction.

Page 16: Compaction, Compaction Everywhere

SizeTieredCompactionStrategy

Works well when data is written initially and then only read.

Page 17: Compaction, Compaction Everywhere

STCS - After flush.

Tier 0 (< 50 MB), Tier 1 (~125 MB)

Page 18: Compaction, Compaction Everywhere

STCS - Compaction Starts

Tier 0 (< 50 MB), Tier 1 (~125 MB)

Page 19: Compaction, Compaction Everywhere

STCS - New SSTable

Tier 0 (< 50 MB), Tier 1 (~125 MB)

Page 20: Compaction, Compaction Everywhere

STCS - Purge old SSTables

Tier 0 (< 50 MB), Tier 1 (~125 MB)

Page 21: Compaction, Compaction Everywhere

STCS - Compaction Starts (again)

Tier 0 (< 50 MB), Tier 1 (~125 MB)

Page 22: Compaction, Compaction Everywhere

STCS - Final State

Tier 0 (< 50 MB), Tier 1 (~200 MB), Tier 2 (~800 MB)

Page 23: Compaction, Compaction Everywhere

STCS - min_sstable_size

Maximum size of SSTables in the “small” bucket. Default 50 MB.

Page 24: Compaction, Compaction Everywhere

STCS - bucket_low

Lower bound on an SSTable's size, as a multiple of the bucket's average SSTable size, for it to join the bucket. Default 0.5.

Page 25: Compaction, Compaction Everywhere

STCS - bucket_high

Upper bound on an SSTable's size, as a multiple of the bucket's average SSTable size, for it to join the bucket. Default 1.5.
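The three options above can be combined into a bucketing pass over SSTable sizes. Below is a simplified sketch of that grouping. The function name `stcs_buckets` and the example sizes are hypothetical. It sorts sizes ascending and adds each SSTable to the first bucket that either has a similar average size or is also “small”.

```python
def stcs_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50):
    """Group SSTable sizes (MB) into buckets of similar size (simplified sketch).

    An SSTable joins a bucket when its size lies strictly between bucket_low and
    bucket_high times the bucket's average size, or when both the SSTable and the
    bucket average are below min_sstable_size (the "small" bucket).
    """
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket.append(size)
                break
        else:
            # No existing bucket fits: start a new one.
            buckets.append([size])
    return buckets


print(stcs_buckets([5, 10, 40, 120, 130, 125, 800]))
```

With the defaults, the three SSTables under 50 MB share a bucket even though their sizes differ by a factor of 8. The ~125 MB files group together, and the 800 MB file sits alone.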

Page 26: Compaction, Compaction Everywhere

STCS - cold_reads_to_omit

Maximum percentage of reads that the SSTables ignored by STCS may be responsible for. Default 0.

Page 27: Compaction, Compaction Everywhere

min_compaction_threshold

Compact buckets with at least this many SSTables. Default 4.

Page 28: Compaction, Compaction Everywhere

max_compaction_threshold

Compact no more than this many SSTables in a bucket. Default 32.
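Applied to a list of buckets, the two thresholds filter out buckets that are too small and cap how many SSTables one compaction takes. The sketch below shows that filter. The function name is hypothetical, and taking the smallest SSTables first when trimming is an assumption made for this example. Cassandra then picks one of the eligible buckets to compact, and that choice is not modelled here.

```python
def compaction_candidates(buckets, min_threshold=4, max_threshold=32):
    """Return buckets eligible for compaction under the two thresholds.

    Buckets with fewer than min_threshold SSTables are skipped; larger buckets
    are trimmed to max_threshold SSTables (smallest first, an assumption here).
    """
    candidates = []
    for bucket in buckets:
        if len(bucket) >= min_threshold:
            candidates.append(sorted(bucket)[:max_threshold])
    return candidates


# Three hypothetical buckets of SSTable sizes in MB.
buckets = [[50] * 3, [200] * 5, [800] * 40]
for candidate in compaction_candidates(buckets):
    print(len(candidate), "SSTables of", candidate[0], "MB")
```

The 3-SSTable bucket waits for a fourth file. The 40-SSTable bucket is compacted 32 files at a time.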

Page 29: Compaction, Compaction Everywhere

Compaction? STCS, LCS, DTCS

Page 30: Compaction, Compaction Everywhere

LeveledCompactionStrategy

Based on LevelDB from the Chromium team.

http://leveldb.org/

Page 31: Compaction, Compaction Everywhere

LeveledCompactionStrategy

Works well with overwrites and tombstones.

Provides low read latency.

Page 32: Compaction, Compaction Everywhere

LeveledCompactionStrategy

“Uses twice the disk IO”

Page 33: Compaction, Compaction Everywhere

DataStax Blogs

“Leveled Compaction in Apache Cassandra”

“When to Use Leveled Compaction”

Page 34: Compaction, Compaction Everywhere

LCS - “It’s going to be all levels Jerry”

Level  Number of Files
0      Unlimited*
1      10
2      100
3      1000

Page 35: Compaction, Compaction Everywhere

LCS in nodetool cfstats

Column Family: HappyPandaCF
SSTable count: 21
SSTables in each level: [1, 7, 13, 0, 0, 0, 0, 0, 0]

Column Family: SadPandaCF
SSTable count: 710
SSTables in each level: [1, 10, 117/100, 582, 0, 0, 0, 0, 0]
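In the SadPandaCF output, the entry `117/100` shows level 2 holding 117 SSTables against a target of 100. In other words, LCS is behind on that level. The sketch below parses the level list and reports such levels. The `parse_levels` helper is hypothetical and only handles the format shown on this slide.

```python
def parse_levels(line):
    """Parse the 'SSTables in each level' list from nodetool cfstats output.

    Returns (count, target) tuples; target is None unless the entry was
    printed as 'count/target'.
    """
    inner = line[line.index("[") + 1 : line.index("]")]
    levels = []
    for entry in inner.split(","):
        entry = entry.strip()
        if "/" in entry:
            count, target = entry.split("/")
            levels.append((int(count), int(target)))
        else:
            levels.append((int(entry), None))
    return levels


happy = parse_levels("SSTables in each level: [1, 7, 13, 0, 0, 0, 0, 0, 0]")
sad = parse_levels("SSTables in each level: [1, 10, 117/100, 582, 0, 0, 0, 0, 0]")

behind = [level for level, (_, target) in enumerate(sad) if target is not None]
print("levels over target:", behind)
print("total SSTables:", sum(count for count, _ in sad))
```

The per-level counts add up to the reported SSTable counts (21 and 710), which is a useful sanity check when reading cfstats.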

Page 36: Compaction, Compaction Everywhere

LCS - Starting out

level 0

Page 37: Compaction, Compaction Everywhere

LCS - New File in Level 1

level 0 level 1

Page 38: Compaction, Compaction Everywhere

LCS - Later, Compact L0 With Overlapping L1

level 0 level 1

Page 39: Compaction, Compaction Everywhere

LCS - Another File in L1

level 0 level 1

Page 40: Compaction, Compaction Everywhere

LCS - Level 1 Full, compact overlapping

level 0 level 1

Page 41: Compaction, Compaction Everywhere

LCS - New Files in Level 2

level 0 level 1 level 2
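Each of these steps merges one SSTable with every SSTable in the next level whose key range overlaps it. The sketch below shows the overlap test. It uses small integer tokens as hypothetical stand-ins for real partitioner tokens and does not model how LCS picks which SSTable to promote.

```python
from typing import List, Tuple

TokenRange = Tuple[int, int]  # (first token, last token), inclusive


def overlapping(candidate: TokenRange, next_level: List[TokenRange]) -> List[TokenRange]:
    """Return SSTables in the next level whose token range overlaps the candidate's."""
    lo, hi = candidate
    return [r for r in next_level if r[0] <= hi and lo <= r[1]]


level_1 = [(0, 99), (100, 199), (200, 299)]
new_l0_file = (150, 250)
print(overlapping(new_l0_file, level_1))
```

Because SSTables within a level (above level 0) do not overlap each other, a read needs at most one SSTable per level. This is where LCS gets its low read latency.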

Page 42: Compaction, Compaction Everywhere

LCS - sstable_size_in_mb

Maximum* size of each SSTable at all levels. Default 160 MB.
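Combining sstable_size_in_mb with the per-level file counts gives a rough size target for each level. The sketch below assumes level N (N >= 1) targets 10^N SSTables, which is consistent with the `117/100` target for level 2 in the cfstats output. The function name is hypothetical.

```python
def level_capacity_mb(level: int, sstable_size_in_mb: int = 160) -> int:
    """Approximate size target of an LCS level, assuming 10**level SSTables per level."""
    if level < 1:
        raise ValueError("level 0 has no fixed size target in this sketch")
    return (10 ** level) * sstable_size_in_mb


for level in range(1, 4):
    print(f"level {level}: {10 ** level} SSTables, ~{level_capacity_mb(level)} MB")
```

With the 160 MB default, level 1 holds about 1.6 GB and level 3 about 160 GB. Each extra level multiplies capacity by 10.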

Page 43: Compaction, Compaction Everywhere

Compaction? STCS, LCS, DTCS

Page 44: Compaction, Compaction Everywhere

DateTieredCompactionStrategy

CASSANDRA-6602

In 2.0.11 and 2.1.1. “Experimental”

Page 45: Compaction, Compaction Everywhere

DTCS - Compact Newest Time Bucket

Time buckets: 4 hours, 20 hours, 80 hours, > 365 days

Page 46: Compaction, Compaction Everywhere

DTCS - New File in First Bucket

Time buckets: 4 hours, 20 hours, 80 hours, > 365 days

Page 47: Compaction, Compaction Everywhere

DTCS - Promoted to Later Bucket

Time buckets: 4 hours, 20 hours, 80 hours, > 365 days
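The core idea in these slides is time windows that get wider the further back they go, so recent data is compacted in small groups and old data in large ones. The sketch below is a deliberately simplified model of that idea, not Cassandra's bucketing code. The function names, the growth factor of 4 and the use of each SSTable's oldest (minimum) timestamp are all assumptions for illustration.

```python
def time_windows(now, base, factor=4, count=4):
    """Return (start, end) windows going back from `now`, newest first.

    The newest window is `base` seconds wide; each older window is `factor`
    times wider than the one after it.
    """
    windows = []
    end, size = now, base
    for _ in range(count):
        windows.append((end - size, end))
        end -= size
        size *= factor
    return windows


def window_for(min_timestamp, windows):
    """Index of the window containing an SSTable's minimum timestamp, or None."""
    for i, (start, end) in enumerate(windows):
        if start <= min_timestamp < end:
            return i
    return None


windows = time_windows(now=100_000, base=3_600)
print(windows)
print(window_for(99_000, windows), window_for(90_000, windows))
```

SSTables landing in the same window would be compacted together. As time passes, an SSTable's data falls into older, wider windows, which matches the “promoted to later bucket” step.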

Page 48: Compaction, Compaction Everywhere

DTCS - base_time_seconds

Target size of the first (newest) time window. Each older window's target is the previous one multiplied by min_compaction_threshold.

Page 49: Compaction, Compaction Everywhere

DTCS - timestamp_resolution

The TimeUnit used by your write timestamps (WRITETIME).

Page 50: Compaction, Compaction Everywhere

DTCS - max_sstable_age_days

Do not compact SSTables whose youngest WRITETIME is older than this.
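The age check depends on timestamp_resolution, because the cutoff must be expressed in the same unit as the write timestamps. The sketch below assumes microsecond timestamps and uses 365 days to mirror the “> 365 days” bucket on the earlier slides. The function name is hypothetical. An SSTable stays eligible only while its youngest (maximum) write timestamp is newer than the cutoff.

```python
MICROS_PER_DAY = 86_400 * 1_000_000  # assumes microsecond write timestamps


def still_compactable(max_timestamps_us, now_us, max_sstable_age_days=365):
    """Keep SSTables whose youngest write is newer than the age cutoff."""
    cutoff = now_us - max_sstable_age_days * MICROS_PER_DAY
    return [ts for ts in max_timestamps_us if ts >= cutoff]


now = 1_000 * MICROS_PER_DAY
recent = now - 10 * MICROS_PER_DAY   # youngest write 10 days ago
stale = now - 400 * MICROS_PER_DAY   # youngest write 400 days ago
print(still_compactable([recent, stale], now) == [recent])
```

If the resolution were milliseconds instead, the same `max_sstable_age_days` would need a cutoff 1000 times smaller. A mismatch would make SSTables look far older or newer than they are.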

Page 51: Compaction, Compaction Everywhere

Thanks.

Page 52: Compaction, Compaction Everywhere

Aaron Morton @aaronmorton

Co-Founder & Principal Consultant

www.thelastpickle.com
