cassandra anti-patterns (in 5m)

Upload: phil-kim

Post on 07-Apr-2018


  • 8/6/2019 Cassandra Anti-Patterns (in 5m)

    1/13

    Cassandra Anti-Patterns (in 5m)

    Matthew F. Dennis // @mdennis


    Non-Sun (err, Non-Oracle) JVM

    No OpenJDK

    No Blackdown (anyone still use this?)

    Etc, etc, etc; just use the Sun (Oracle) JVM

    At least u22, but in general the latest release (unless you have specific reasons otherwise)


    CommitLog+Data On The Same Disk

    Don't put the commit log and data directories on the same set of spindles

    Commit log gets a single spindle entirely to itself (standard consumer SATA disks easily sustain > 80 MB/s in sequential writes)

    DOES NOT APPLY TO SSDs or EC2

    SSDs have no seek time

    EC2 ephemeral drives are still virtualized (but not the same as EBS)

    On EC2 or SSDs: use one RAID set for both the commit log and data directories


    EBS volumes on EC2

    Sounds great, nice feature set, but

    Not predictable

    freezes are common

    Throughput limited in many cases

    Use ephemeral drives instead

    Stripe them

    Both commit log and data directory on the same RAID set


    Oversized JVM heaps

    6-8 GB is good (assuming sufficient RAM on your boxen)

    10-12 GB is possible and in some circumstances correct

    16 GB == max JVM heap size

    > 16GB => badness

    JVM heap ~= boxen RAM => badness (always)
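
The thresholds above can be written down as a quick sanity check. This is only a sketch: the function name `classify_heap` and the `max_ram_fraction` cutoff for "heap ~= RAM" are assumptions made for illustration (the slide gives no number for how close is too close), and heaps below 6 GB are simply reported as "good" because the slide does not discuss them.

```python
def classify_heap(heap_gb, ram_gb, max_ram_fraction=0.5):
    """Classify a JVM heap size using the thresholds from the slide.

    max_ram_fraction is an assumed cutoff for "heap ~= RAM"; the slide
    does not give a number for it.
    """
    if heap_gb > 16:
        return "bad: over 16 GB"
    if heap_gb > ram_gb * max_ram_fraction:
        return "bad: heap too close to total RAM"
    if heap_gb <= 8:
        return "good"
    # Above 8 GB and up to 16 GB: workable, but only in some circumstances.
    return "possible: justify it for your workload"


for heap, ram in [(8, 32), (12, 64), (24, 128), (8, 8)]:
    print("%2d GB heap on %3d GB RAM -> %s" % (heap, ram, classify_heap(heap, ram)))
```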


    JVM heap size -v- GC suckage

    [Chart: GC suckage (y-axis) vs. JVM heap size (x-axis); x-axis marks at ~6GB, ~10GB, ~16GB]


    Large batch mutations (large in number of distinct rows)

    Timeout / failure => entire mutation must be retried => wasted work

    Larger mutations => higher likelihood of timeouts

    1000 mutations to perform? Do 100 batches of 10 in parallel instead of one batch of 1000

    Exact number of rows/batch is variable depending on HW, network, load, etc; experiment! (10-100 is a good starting point)
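
The "100 batches of 10 in parallel" advice might look roughly like the sketch below. `send_batch` is a hypothetical stand-in for whatever batch-mutate call your client library provides, and the `parallelism` value is an arbitrary assumption; neither comes from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def send_batch(batch):
    """Hypothetical stand-in for a client call that applies one batch.

    Replace with a real client call; here it only reports how many
    mutations it was handed.
    """
    return len(batch)


def apply_in_small_batches(mutations, batch_size=10, parallelism=8):
    """Submit the mutations as many small batches running in parallel."""
    batches = chunk(mutations, batch_size)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() preserves batch order in the returned results.
        return list(pool.map(send_batch, batches))


mutations = [("row-%d" % i, "value") for i in range(1000)]
results = apply_in_small_batches(mutations)
print("%d batches, %d mutations" % (len(results), sum(results)))
```

With this shape, a timeout costs a retry of one small batch rather than all 1000 mutations; retry logic itself is left out of the sketch.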


    OPP / BOP partitioner

    You probably shouldn't use it

    No really, you almost certainly shouldn't use it

    Creates hot spots

    Requires babysitting from ops

    Not as well tested, nor as widely deployed


    C* auto selection of tokens

    Always specify your initial token.

    Auto select doesn't do what you think it does, nor does it do what you want

    loadbalance is even worse, it doesn't currently do what you think, what you want or what it claims; "F#@* my cluster" would be a much more apt name than loadbalance

    Future (next?) release of OPSC will remove your balancing woes
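
One common way to pick initial tokens by hand is to space them evenly around the ring. The sketch below assumes RandomPartitioner, whose token range runs from 0 to 2**127, and a single-datacenter cluster; the function name `initial_tokens` is made up for illustration.

```python
def initial_tokens(node_count, ring_size=2 ** 127):
    """Return evenly spaced tokens, one per node, over [0, ring_size).

    ring_size defaults to the RandomPartitioner range (an assumption
    about which partitioner is in use).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


# Print one token per node, to be set as that node's initial token.
for node, token in enumerate(initial_tokens(4)):
    print("node %d: initial_token = %d" % (node, token))
```

Each value would go into the corresponding node's `initial_token` setting before the node first joins the ring.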


    Super Columns

    10-15 percent performance penalty on reads and writes

    Easier / better to use composite columns

    0.8.x makes this a lot easier

    Done manually in 0.7.x and is still better

    Devs working in C* code despise (loathe?) them

    API probably won't be deprecated, but implementation will be replaced behind the scenes with composites (may be ok at that point to use them, but should probably just use composite API directly)

    Cassandra and DataStax are committed to maintaining the API going forward, even if the implementation changes
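
To make "done manually" concrete, here is a sketch of flattening a super-column layout into composite column names by hand. The `:` separator and the helper names are assumptions for illustration only; this is a hand-rolled naming scheme, not Cassandra's own composite comparator format.

```python
SEPARATOR = ":"  # assumed delimiter; must not occur inside any component


def composite_name(*components):
    """Join components into a single column name."""
    for part in components:
        if SEPARATOR in part:
            raise ValueError("component contains the separator: %r" % part)
    return SEPARATOR.join(components)


def split_name(name):
    """Recover the components from a composite column name."""
    return name.split(SEPARATOR)


# Super column layout: super column -> sub column -> value
super_style = {"address": {"city": "Austin", "zip": "78701"}}

# Flattened layout: composite column name -> value
flat = {
    composite_name(sup, sub): value
    for sup, subs in super_style.items()
    for sub, value in subs.items()
}
print(flat)
```

Because the names share a prefix, all columns that used to live under one super column sort next to each other and can be read with a single column slice.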


    Read Before Write

    Race conditions

    Abuses/Thrashes cache (row, key and page)

    Increases latency

    Increases IO requirements (by a lot)

    Increases size in the client
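
The race condition is easy to see even without a cluster. The sketch below uses a plain dict as a stand-in for stored data and interleaves two clients by hand; `read` and `write` are hypothetical helpers, not a client API.

```python
store = {"visits": 10}  # stand-in for a column value held by the cluster


def read(key):
    return store[key]


def write(key, value):
    store[key] = value


# Two clients both read before either one writes.
seen_by_a = read("visits")
seen_by_b = read("visits")

# Each writes back its own increment of the same stale value.
write("visits", seen_by_a + 1)
write("visits", seen_by_b + 1)

# Two increments were intended, so the expected value was 12.
lost_updates = 12 - store["visits"]
print("visits = %d, lost updates = %d" % (store["visits"], lost_updates))
```

Both clients did a read and a write, yet one increment vanished; the extra reads also cost latency and IO on every update.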


    Winblows

    Try to avoid it, you'll be happier

    Not always possible? Then, I'm sorry for your pain

    Run 'nix (in particular, probably Linux)

    Easier to get help (IRC, email, meetups, etc)

    C* performs better

    Better tested

    Cheaper

    More widely deployed (by a lot)


    Cassandra Anti-Patterns

    Matthew F. Dennis // @mdennis

    Q?