compaction, compaction everywhere

SOUTH BAY CASSANDRA USERS, NOVEMBER 2014. COMPACTION, COMPACTION, EVERYWHERE. Aaron Morton (@aaronmorton), Co-Founder & Principal Consultant. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License.

Upload: planet-cassandra

Post on 27-Jun-2015

DESCRIPTION

Compaction is a consequence of the Log-Structured Merge-Tree engine used by Cassandra. Starting with the SizeTieredCompactionStrategy, we added the LeveledCompactionStrategy and, recently, the DateTieredCompactionStrategy. Compaction has always required some care and feeding. In this talk Aaron Morton, Co-Founder and Principal Consultant at The Last Pickle, will discuss the different strategies, their options, and when to use them.

TRANSCRIPT

Page 1: Compaction, Compaction Everywhere

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

SOUTH BAY CASSANDRA USERS NOVEMBER 2014

COMPACTION, COMPACTION, EVERYWHERE.

Aaron Morton @aaronmorton

Co-Founder & Principal Consultant

Page 2: Compaction, Compaction Everywhere

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Apache Cassandra Committer, DataStax MVP, Apache Usergrid Committer.

Based in New Zealand & USA.

Page 3: Compaction, Compaction Everywhere

Compaction?

STCS

LCS

DTCS

Page 4: Compaction, Compaction Everywhere

Compaction?

Because reasons.

Page 5: Compaction, Compaction Everywhere

No compaction?

Row fragmentation would result in dramatically increased read latency.

Page 6: Compaction, Compaction Everywhere

No compaction?

Increased file count would increase memory usage.

Page 7: Compaction, Compaction Everywhere

No compaction?

Overwrites and deletions would result in wasted disk space.

Page 8: Compaction, Compaction Everywhere

Compaction?

Yes.

Page 9: Compaction, Compaction Everywhere

Compaction?

Log Structured Merge Tree.

Page 10: Compaction, Compaction Everywhere

Compaction?

Creating new files when flushing to disk improves performance and reduces complexity.

Page 11: Compaction, Compaction Everywhere

Compaction?

SSTable 1

foo: dishwasher (ts 10): tomato, purple (ts 10): cromulent

SSTable 2

foo: frink (ts 20): flayven, monkey (ts 10): embiggins

SSTable 3

SSTable 4

foo: dishwasher (ts 15): tomacco

SSTable 5
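The row "foo" above is spread across several SSTables, so a read (or a compaction) has to merge the fragments. A minimal sketch of that merge, assuming a simple "highest timestamp wins per column" rule and ignoring tombstones and timestamp ties; the function name `merge_fragments` and the dict layout are illustrative only, not Cassandra's internals:

```python
def merge_fragments(fragments):
    """Merge row fragments, keeping the newest (timestamp, value) per column."""
    merged = {}
    for fragment in fragments:
        for column, (timestamp, value) in fragment.items():
            # Keep the cell with the highest write timestamp.
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    return merged


# Fragments of row "foo" from the slide: column -> (timestamp, value)
sstable_1 = {"dishwasher": (10, "tomato"), "purple": (10, "cromulent")}
sstable_2 = {"frink": (20, "flayven"), "monkey": (10, "embiggins")}
sstable_4 = {"dishwasher": (15, "tomacco")}

row_foo = merge_fragments([sstable_1, sstable_2, sstable_4])
for column in sorted(row_foo):
    print(column, row_foo[column])
```

With these inputs the "dishwasher" column resolves to the ts 15 value from SSTable 4, and the ts 10 value in SSTable 1 becomes the wasted disk space that compaction reclaims.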

Page 12: Compaction, Compaction Everywhere

Demo.

Page 13: Compaction, Compaction Everywhere

nodetool cfhistograms foo bar

SSTables per Read

 1 sstables: 149
 2 sstables: 62
 3 sstables: 65
 4 sstables: 50
 5 sstables: 45
 6 sstables: 44
 7 sstables: 76
 8 sstables: 72
10 sstables: 305
12 sstables: 390
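One way to read this histogram is to ask what share of reads touched many SSTables. A small sketch using the counts from the slide; the helper name `share_of_reads_at_or_above` and the cut-off of 10 are arbitrary choices for illustration:

```python
# SSTables touched -> number of reads, copied from the cfhistograms output above.
sstables_per_read = {
    1: 149, 2: 62, 3: 65, 4: 50, 5: 45,
    6: 44, 7: 76, 8: 72, 10: 305, 12: 390,
}


def share_of_reads_at_or_above(histogram, sstable_count):
    """Fraction of reads that touched at least `sstable_count` SSTables."""
    total = sum(histogram.values())
    heavy = sum(reads for touched, reads in histogram.items()
                if touched >= sstable_count)
    return heavy / total


total_reads = sum(sstables_per_read.values())
heavy_share = share_of_reads_at_or_above(sstables_per_read, 10)
print(f"{total_reads} reads, {heavy_share:.1%} touched 10 or more SSTables")
```

For this sample more than half of the reads touched 10 or more SSTables, which is the row fragmentation problem described earlier.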

Page 14: Compaction, Compaction Everywhere

Compaction?

STCS

LCS

DTCS

Page 15: Compaction, Compaction Everywhere

SizeTieredCompactionStrategy

The first compaction strategy.

Group files of a similar size for compaction.

Page 16: Compaction, Compaction Everywhere

SizeTieredCompactionStrategy

Works well when data is written to initially and then only read from.

Page 17: Compaction, Compaction Everywhere

STCS - After flush.

Tier 0 ( < 50 MB) Tier 1 ~125MB

Page 18: Compaction, Compaction Everywhere

STCS - Compaction Starts

Tier 0 ( < 50 MB) Tier 1 ~125MB

Page 19: Compaction, Compaction Everywhere

STCS - New SSTable

Tier 0 ( < 50 MB) Tier 1 ~125MB

Page 20: Compaction, Compaction Everywhere

STCS - Purge old SSTables

Tier 0 ( < 50 MB) Tier 1 ~125MB

Page 21: Compaction, Compaction Everywhere

STCS - Compaction Starts (again)

Tier 0 ( < 50 MB) Tier 1 ~125MB

Page 22: Compaction, Compaction Everywhere

STCS - Final State

Tier 0 ( < 50 MB) Tier 1 ~200MB Tier 2 ~800MB

Page 23: Compaction, Compaction Everywhere

STCS - min_sstable_size

Maximum size of SSTables in the “small” bucket.

Default 50 MB

Page 24: Compaction, Compaction Everywhere

STCS - bucket_low

Lower bound of the bucket size compared to the average size in the bucket.

Default 0.5

Page 25: Compaction, Compaction Everywhere

STCS - bucket_high

Upper bound of the bucket size compared to the average size in the bucket.

Default 1.5

Page 26: Compaction, Compaction Everywhere

STCS - cold_reads_to_omit

Maximum percentage of reads that the SSTables ignored by STCS may be responsible for.

Default 0

Page 27: Compaction, Compaction Everywhere

min_compaction_threshold

Compact buckets with at least this many SSTables.

Default 4

Page 28: Compaction, Compaction Everywhere

max_compaction_threshold

Compact no more than this many SSTables in a bucket.

Default 32
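The options above can be tied together in a simplified sketch of size-tiered bucketing. This is an illustration under stated assumptions, not Cassandra's implementation: sizes are plain numbers in MB, the function names are hypothetical, an SSTable joins a bucket when its size lies between `bucket_low` and `bucket_high` times the bucket average (or when both are below `min_sstable_size`), and when a bucket holds more than `max_threshold` files the sketch arbitrarily takes the smallest ones.

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5,
                        min_sstable_size=50):
    """Group SSTable sizes (MB) into buckets of similar size."""
    buckets = []  # each bucket is a list of SSTable sizes in MB
    for size in sorted(sizes_mb):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            similar = average * bucket_low < size < average * bucket_high
            both_small = size < min_sstable_size and average < min_sstable_size
            if similar or both_small:
                bucket.append(size)
                break
        else:
            # No existing bucket fits, so this SSTable starts a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(buckets, min_threshold=4, max_threshold=32):
    """Buckets with enough SSTables to compact, capped at max_threshold files."""
    return [sorted(bucket)[:max_threshold]
            for bucket in buckets if len(bucket) >= min_threshold]


sizes = [5, 12, 30, 45, 120, 125, 130, 140, 800]
buckets = size_tiered_buckets(sizes)
candidates = compaction_candidates(buckets)
print(buckets)
print(candidates)
```

With the defaults, the four files under 50 MB share the "small" bucket, the four files around 125 MB form a second bucket, and the 800 MB file sits alone. Only the first two buckets reach min_compaction_threshold, so only they are candidates for compaction.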

Page 29: Compaction, Compaction Everywhere

Compaction?

STCS

LCS

DTCS

Page 30: Compaction, Compaction Everywhere

LeveledCompactionStrategy

Based on LevelDB from the Chromium team at Google.

http://leveldb.org/

Page 31: Compaction, Compaction Everywhere

LeveledCompactionStrategy

Works well with overwrites and tombstones.

Provides low read latency.

Page 32: Compaction, Compaction Everywhere

LeveledCompactionStrategy

“Uses twice the disk IO”

Page 33: Compaction, Compaction Everywhere

DataStax Blogs

“Leveled Compaction in Apache Cassandra”

“When to Use Leveled Compaction”

Page 34: Compaction, Compaction Everywhere

LCS - “It’s going to be all levels Jerry”

Level    Number of Files

0        Unlimited*
1        10
2        100
3        1000

Page 35: Compaction, Compaction Everywhere

LCS in nodetool cfstats

Column Family: HappyPandaCF
SSTable count: 21
SSTables in each level: [1, 7, 13, 0, 0, 0, 0, 0, 0]

Column Family: SadPandaCF
SSTable count: 710
SSTables in each level: [1, 10, 117/100, 582, 0, 0, 0, 0, 0]
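The "SSTables in each level" line can be checked mechanically. A minimal sketch, assuming an entry written as `count/limit` (such as `117/100`) marks a level holding more SSTables than its target; the function names are hypothetical and the parser only handles the line format shown above:

```python
import re


def parse_levels(line):
    """Return a (count, limit) pair per level; limit is None if not printed."""
    inside = re.search(r"\[(.*)\]", line).group(1)
    levels = []
    for entry in inside.split(","):
        count, _, limit = entry.strip().partition("/")
        levels.append((int(count), int(limit) if limit else None))
    return levels


def overfull_levels(levels):
    """Level numbers whose SSTable count exceeds the printed limit."""
    return [level for level, (count, limit) in enumerate(levels)
            if limit is not None and count > limit]


happy = parse_levels("SSTables in each level: [1, 7, 13, 0, 0, 0, 0, 0, 0]")
sad = parse_levels("SSTables in each level: [1, 10, 117/100, 582, 0, 0, 0, 0, 0]")
print(overfull_levels(happy))
print(overfull_levels(sad))
```

For the two samples, HappyPandaCF reports no overfull level, while SadPandaCF reports level 2, a sign that compaction is falling behind.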

Page 36: Compaction, Compaction Everywhere

LCS - Starting out

level 0

Page 37: Compaction, Compaction Everywhere

LCS - New File in Level 1

level 0 level 1

Page 38: Compaction, Compaction Everywhere

LCS - Later, Compact L0 With Overlapping L1

level 0 level 1

Page 39: Compaction, Compaction Everywhere

LCS - Another File in L1

level 0 level 1

Page 40: Compaction, Compaction Everywhere

LCS - Level 1 Full, compact overlapping

level 0 level 1

Page 41: Compaction, Compaction Everywhere

LCS - New Files in Level 2

level 0 level 1 level 2

Page 42: Compaction, Compaction Everywhere

LCS - sstable_size_in_mb

Maximum* size of each SSTable at all levels.

Default 160
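To get a feel for how sstable_size_in_mb scales the levels, here is a rough sketch of the target data size per level. It assumes each level holds ten times as many SSTables as the one before it (the `fanout` parameter is an assumption, chosen to match the `117/100` shown for level 2 in the cfstats sample); `level_capacity_mb` is a hypothetical helper, not a Cassandra API.

```python
def level_capacity_mb(level, sstable_size_in_mb=160, fanout=10):
    """Approximate target size of a level in MB (assumed fanout of 10)."""
    if level < 1:
        raise ValueError("level 0 has no fixed target in this sketch")
    return (fanout ** level) * sstable_size_in_mb


for level in range(1, 4):
    print(f"L{level}: about {level_capacity_mb(level)} MB")
```

Under these assumptions level 1 targets about 1,600 MB with the default of 160, and each further level is ten times larger.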

Page 43: Compaction, Compaction Everywhere

Compaction?

STCS

LCS

DTCS

Page 44: Compaction, Compaction Everywhere

DateTieredCompactionStrategy

CASSANDRA-6602

In 2.0.11 and 2.1.1.

“Experimental”

Page 45: Compaction, Compaction Everywhere

DTCS - Compact Newest Time Bucket

4 hours 20 hours 80 hours > 365 days

Page 46: Compaction, Compaction Everywhere

DTCS - New File in First Bucket

4 hours 20 hours 80 hours > 365 days

Page 47: Compaction, Compaction Everywhere

DTCS - Promoted to Later Bucket

4 hours 20 hours 80 hours > 365 days

Page 48: Compaction, Compaction Everywhere

DTCS - base_time_seconds

Target size of the first time bucket, in seconds.

Each later bucket is larger by a multiple of min_compaction_threshold.

Page 49: Compaction, Compaction Everywhere

DTCS - timestamp_resolution

What TimeUnit you are using for your WriteTime.

Page 50: Compaction, Compaction Everywhere

DTCS - max_sstable_age_days

Do not compact SSTables where the youngest WRITETIME is older than this.
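The last two options interact: the age check only makes sense if timestamp_resolution matches the unit of the write timestamps. A minimal sketch of that check, assuming write times are integers counted from the epoch; `too_old_to_compact`, the unit table, and the 365-day cut-off used in the example are illustrative choices, not Cassandra code.

```python
SECONDS_PER_DAY = 86_400
UNITS_PER_SECOND = {
    "SECONDS": 1,
    "MILLISECONDS": 1_000,
    "MICROSECONDS": 1_000_000,
}


def too_old_to_compact(newest_write_time, now, max_sstable_age_days,
                       timestamp_resolution="MICROSECONDS"):
    """True if the newest write in an SSTable is older than the cut-off."""
    units_per_day = UNITS_PER_SECOND[timestamp_resolution] * SECONDS_PER_DAY
    age_days = (now - newest_write_time) / units_per_day
    return age_days > max_sstable_age_days


now_us = 1_416_000_000 * 1_000_000          # an arbitrary "now" in microseconds
recent_us = now_us - 2 * SECONDS_PER_DAY * 1_000_000
ancient_us = now_us - 400 * SECONDS_PER_DAY * 1_000_000

print(too_old_to_compact(recent_us, now_us, 365))    # 2 days old
print(too_old_to_compact(ancient_us, now_us, 365))   # 400 days old

# The same 400-day-old data written with millisecond timestamps looks
# young if the resolution is left at MICROSECONDS.
now_ms = now_us // 1_000
ancient_ms = ancient_us // 1_000
print(too_old_to_compact(ancient_ms, now_ms, 365, "MICROSECONDS"))
print(too_old_to_compact(ancient_ms, now_ms, 365, "MILLISECONDS"))
```

The third call shows the pitfall: with the wrong resolution, 400-day-old data appears to be under half a day old and would never be excluded from compaction.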

Page 51: Compaction, Compaction Everywhere

Thanks.

Page 52: Compaction, Compaction Everywhere

Aaron Morton @aaronmorton

Co-Founder & Principal Consultant

www.thelastpickle.com