libon cassandra summiteu2014

Billion Records from SQL to Cassandra, lessons learned DuyHai Doan Brice Dutheil

Upload: duyhai-doan

Post on 07-Jul-2015


DESCRIPTION

Lessons learned: migrating billions of contact records from SQL to Cassandra

TRANSCRIPT

Page 1: Libon cassandra summiteu2014

Billion Records from SQL to Cassandra, lessons learned

DuyHai Doan, Brice Dutheil


#CassandraSummit @doanduyhai @BriceDutheil

Who are we?

Brice Dutheil
•  Mockito
•  Java Track Lead @ Devoxx France
•  Independent contractor @ Libon (Orange-Vallée)

DuyHai Doan
•  Achilles
•  Cassandra Technical Advocate
•  Former Java Developer @ Libon

Agenda

•  Libon context
•  Migration strategy
•  Business code migration
•  Data Modeling
•  Take Away

Libon Context

What is Libon?

•  Messaging app
•  VOIP (out)
•  Custom voicemail & greetings
•  SMS/chat/file transfer
•  Contacts matching

Contact Matching

[Diagram, built up over four slides: a Libon User and a Friend; steps shown: "Contact matching", then "Accept link"]

Project Context

•  Application grew over the years
•  Already using Cassandra to handle events
•  messaging / file sharing / SMS / notifications
•  Cassandra R/W latencies ≈ 0.4 ms
•  server response time under 10 ms

Project Context cont.

•  About contacts …
•  stored as relational model in RDBMS (Oracle)
•  1 user ≈ 300 contacts
•  with millions of users ☞ billions of contacts to handle
•  query latency unpredictable

Fixing the problem

•  Tune the RDBMS
•  indices
•  partitioning
•  fewer joins, simplified relational model
•  hardware capacity increased

That worked but …

[Diagram: Back-end application on top of two data stores, RDBMS and Cassandra]

Next Challenges

•  High Availability (DB failure, site failure …)
•  Predictable performance at scale
•  Going to multi data-centers

Going for Cassandra

•  Denormalize (if possible …)
•  Know your business ☞ know your queries
•  Linear scaling out
•  Consistent performance

Data Migration Strategy

Objectives

•  No downtime
•  No concurrency corner-cases
•  Safe rollback possible
•  Replay-ability & resume-ability

Strategy

•  3 phases
•  Write contacts to both data stores
•  Old contacts migration
•  Switch to Cassandra …
•  … and deprecate SQL

Migration Phase 1

[Diagram: the back-end server in front of the SQL servers and the Cassandra (C*) cluster. Labels on the write path: "Write", "contactUUID", "contactId (long) + contactUUID"]

contactId    …    contactUUID
129363       …    123e4567-e89b-12d3…
834849       …

[Diagram: same layout for the read path. Label: "Read"]
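The double write of phase 1 can be sketched roughly as follows. This is a minimal in-memory illustration, not Libon's code: the class `DoubleWriteSketch`, the two maps standing in for the SQL table and the Cassandra table, and the method name `createContact` are all assumptions made for the example. A random UUID replaces the timeuuid mentioned later in the deck, only to keep the sketch free of external libraries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of phase 1: every new contact is written to both stores.
// The two maps merely stand in for the SQL table and the Cassandra table.
final class DoubleWriteSketch {

    // SQL side: contactId (long) -> contactUUID column (absent for old contacts)
    static final Map<Long, UUID> sqlContactUuidColumn = new HashMap<>();
    // Cassandra side: contactUUID -> legacy contactId
    static final Map<UUID, Long> cassandraContacts = new HashMap<>();

    // Writes a contact to both stores and returns the generated UUID.
    static UUID createContact(long contactId) {
        UUID contactUuid = UUID.randomUUID(); // assumption: stands in for a timeuuid
        sqlContactUuidColumn.put(contactId, contactUuid);
        cassandraContacts.put(contactUuid, contactId);
        return contactUuid;
    }

    public static void main(String[] args) {
        UUID uuid = createContact(129363L);
        System.out.println("SQL row 129363 -> " + sqlContactUuidColumn.get(129363L));
        System.out.println("C* row " + uuid + " -> " + cassandraContacts.get(uuid));
    }
}
```

Contacts that never went through this path have no contactUUID on the SQL side, which is what lets a later migration step tell old contacts from new ones.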

Migration Phase 2

•  On live production, migrate old contacts (old contacts = created before phase 1)

For each batch of users:
SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL

Logged batches of:
INSERT INTO contacts(..) VALUES(…) USING TIMESTAMP now() - 1 week

[Diagram: SQL servers and Cassandra (C*) cluster]

Migration Phase 2 cont.

USING TIMESTAMP now() - 1 week 😳

•  During data migration …
•  … concurrent writes from the migration batch …
•  … and updates from production for the same contact

contact_uuid    name (now - 1 week)    name (now)
…               Johny                  Johnny

•  Johny: insert from batch (to the past)
•  Johnny: update from production
•  Future reads pick the most up-to-date value
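Why the back-dated timestamp is safe can be shown with a small model of last-write-wins reconciliation: for a given column, Cassandra keeps the value with the highest write timestamp, regardless of arrival order. The sketch below is a simplified stand-in, assuming a single column and microsecond timestamps; the class `TimestampTrickSketch`, its `Cell` type and the `simulate` method are hypothetical names, not part of any driver.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of the last-write-wins rule for a single column.
final class TimestampTrickSketch {

    // A cell keeps the value carrying the highest write timestamp.
    static final class Cell {
        private String value;
        private long writeTimeMicros = Long.MIN_VALUE;

        void write(String newValue, long timestampMicros) {
            // An older timestamp never overwrites a newer one, whatever the arrival order.
            if (timestampMicros > writeTimeMicros) {
                value = newValue;
                writeTimeMicros = timestampMicros;
            }
        }

        String read() {
            return value;
        }
    }

    static long micros(Instant instant) {
        return instant.toEpochMilli() * 1000L;
    }

    // Replays the scenario from the slides and returns what a later read sees.
    static String simulate(boolean productionUpdateArrivesFirst) {
        Instant now = Instant.now();
        long migrationTs = micros(now.minus(Duration.ofDays(7))); // "now() - 1 week"
        long productionTs = micros(now);

        Cell name = new Cell();
        if (productionUpdateArrivesFirst) {
            name.write("Johnny", productionTs); // update from production
            name.write("Johny", migrationTs);   // late insert from the migration batch
        } else {
            name.write("Johny", migrationTs);
            name.write("Johnny", productionTs);
        }
        return name.read();
    }

    public static void main(String[] args) {
        System.out.println(simulate(true));  // Johnny
        System.out.println(simulate(false)); // Johnny
    }
}
```

In both orderings the production value survives, which is the property the migration relies on: the batch can be replayed or resumed without clobbering fresher data.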

"Write to the Past… to save the Future"
Libon – 2014/10/08

Migration Phase 3

[Diagram: the back-end server in front of the SQL servers and the Cassandra (C*) cluster. Labels: "Write", ❌]

Business Code Refactoring

Code Inventory

•  Written for RDBMS
•  Lots of joins (no surprise)
•  Designed around transactions
•  Spring @Transactional everywhere

Code Inventory cont.

•  Entities go through Services & Repositories
•  Hibernate is auto-magic
•  lazy loading
•  1st level cache
•  N+1 select

[Diagram: ContactEntity crossing the Services and Repositories layers]

Which options?

•  Throw away existing code …
•  … and re-design from scratch for Cassandra

No way!

Code Quality

•  Existing business code has …
•  … ≈ 3500 unit tests
•  and ≈ 600+ integration tests
•  We are TDD aficionados …
•  … and we love our code coverage

"Code coverage is one of your most valuable technical assets"
Libon – since the beginning

Refactoring Strategy

[Diagram, built up over three slides. Two layers: Services and Repositories. The services ContactMatchingService, ContactService and ContactSync all work with ContactEntity (n-to-1 links). A Proxy connects ContactEntity to ContactNoSQLEntity, which maps to the denormalized tables Denorm1, Denorm2 … DenormN]

Refactoring Strategy cont.

•  Use CQRS
•  ContactReadRepository
•  ContactWriteRepository
•  ContactUpdateRepository
•  ContactDeleteRepository

ContactReadRepository
•  direct sequential read
•  no joins
•  1 read ≈ 1 SELECT

ContactWriteRepository
•  write to all denormalized tables
•  using CQL logged batches
•  use TTLs

ContactUpdateRepository
•  read-before-write most of the time 😟
•  rare updates ☞ acceptable perf penalty

ContactDeleteRepository
•  delete
•  update contact modification date

Outcome

•  5 months of work for 2 people
•  Many iterations to fix bugs (thanks to the integration tests)
•  Lots of performance benchmarks using Gatling ☞ data model & code validation
•  … we are almost there for production

[Figure: Gatling output]

Data Model

Denormalization, the good

•  Support fast reads
•  1 read ≈ 1 SELECT
•  Worth it because the workload is mostly reads, with few updates

Denormalization, the bad

•  Updating mutable data can be a nightmare
•  Data model bound by existing client-facing API
•  Update paths very error-prone without tests

Data model in detail

•  Contacts_by_id
•  Contacts_by_identifiers
•  Contacts_in_profiles
•  Contacts_by_modification_date
•  Contacts_by_firstname_lastname
•  Contacts_linked_user

user_id is always a component of the partition key

Scalable design

[Diagram: Cassandra ring of 8 nodes (n1 … n8) owning token ranges A … H, with partitions user_id1 … user_id5 placed around the ring]

Bloom filters in action

•  For some tables, partition key = (user_id, contact_id)
☞ fast look-up, leverages Bloom filters
☞ touches 1 SSTable most of the time

Data model in detail cont.

[Same six tables as listed earlier, now annotated with "Wide partition" and "Bucketed"]

A "queue" story

•  contacts_by_modification_date
•  queue-like pattern 😭
☞ buckets to the rescue

user_id:2014-12    date35 date12 … … date47
user_id:2014-11    date11 date12 … … date34
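The month-bucket idea can be sketched as below: each modification date is mapped to a "yyyy-MM" bucket that becomes part of the partition key, and a reader walks the buckets from the oldest date of interest to the current month. This is an illustrative sketch only; the class `MonthBucketSketch`, its method names and the choice of UTC are assumptions, not details taken from the talk.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for month-based bucketing of a per-user partition.
final class MonthBucketSketch {

    // Bucket for a given modification date, e.g. "2014-12" (UTC is an assumption).
    static String monthBucket(Instant modificationDate) {
        return YearMonth.from(modificationDate.atZone(ZoneOffset.UTC)).toString();
    }

    // Buckets to query, oldest first, to read every change between two instants.
    static List<String> bucketsBetween(Instant since, Instant until) {
        YearMonth current = YearMonth.from(since.atZone(ZoneOffset.UTC));
        YearMonth last = YearMonth.from(until.atZone(ZoneOffset.UTC));
        List<String> buckets = new ArrayList<>();
        while (!current.isAfter(last)) {
            buckets.add(current.toString());
            current = current.plusMonths(1);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Instant since = Instant.parse("2014-10-31T09:00:00Z");
        Instant until = Instant.parse("2014-12-01T14:25:20Z");
        System.out.println(monthBucket(until));           // 2014-12
        System.out.println(bucketsBetween(since, until)); // [2014-10, 2014-11, 2014-12]
    }
}
```

Bucketing bounds the size of each partition to one month of changes per user, at the cost of one query per bucket when a client catches up after a long absence.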

Data model summary

•  7 tables for denormalization
•  Normalize some tables because rare access
•  Read-before-write in most update scenarios 😟

Notes on contact_id

•  In SQL, auto-generated long using sequence
•  In Cassandra, auto-generated timeuuid
•  How to store both types?
•  As text? ☞ easy solution …
•  … but waste of space!
•  because encoded as UTF-8 or ASCII in Cassandra

•  Long ☞ 8 bytes
•  Long as text (UTF-8: 1 byte per char) ☞ "digits count" bytes

•  UUID ☞ 16 bytes
•  32 hex chars + 4 hyphens = 36 chars
•  UUID as text (UTF-8: 1 byte per char) ☞ 36 bytes
•  Bytes overhead = 36 – 16 = 20 bytes

•  20 bytes wasted per contact uuid
•  × 7 denormalizations = 140 bytes per contact uuid
•  × 10⁹ contacts = 140 GB wasted 😠
•  not even counting replication factor …
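The arithmetic above can be checked with a few lines of code. The sketch below recomputes the per-UUID overhead from an actual UUID string and multiplies it out; the class and method names are hypothetical, and gigabytes are taken as decimal (10⁹ bytes), which is an assumption about the slide's unit.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical check of the storage overhead of text-encoded UUIDs.
final class ContactIdOverheadSketch {

    static final int UUID_BINARY_BYTES = 16;

    // Size of a UUID written as text: 32 hex chars + 4 hyphens, 1 byte each in UTF-8.
    static int uuidTextBytes() {
        return UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Total bytes wasted by storing UUIDs as text instead of 16 raw bytes.
    static long wastedBytes(long contacts, int denormalizedTables) {
        long overheadPerUuid = uuidTextBytes() - UUID_BINARY_BYTES; // 36 - 16 = 20
        return overheadPerUuid * denormalizedTables * contacts;
    }

    public static void main(String[] args) {
        long wasted = wastedBytes(1_000_000_000L, 7);
        System.out.println(wasted / 1_000_000_000L + " GB"); // 140 GB
    }
}
```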

Notes on contact_id cont.

•  ☞ just save contact id as byte[ ]
•  Achilles @TypeTransformer for automatic conversion (see later)
•  Use blobAsBigInt( ) or blobAsUUID( ) to view data
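One possible shape for the id ⇄ byte[ ] conversion is sketched below with `java.nio.ByteBuffer`. It is not the talk's actual codec: the class `ContactIdBytesSketch` and its methods are made up for illustration, and telling the two id kinds apart by blob length (8 vs 16 bytes) is an assumption. `ByteBuffer` writes big-endian by default, which should match what the CQL blob conversion functions expect for bigint and uuid values.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical conversion of both contact id flavours to a compact blob.
final class ContactIdBytesSketch {

    // Legacy SQL id: 8 bytes.
    static byte[] fromLong(long legacyId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(legacyId).array();
    }

    // Cassandra id: 16 bytes, most significant bits first.
    static byte[] fromUuid(UUID uuid) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Decodes a blob; its length tells which kind of id it holds (assumption).
    static String describe(byte[] blob) {
        ByteBuffer buffer = ByteBuffer.wrap(blob);
        if (blob.length == Long.BYTES) {
            return "long:" + buffer.getLong();
        }
        if (blob.length == 2 * Long.BYTES) {
            long mostSignificant = buffer.getLong();
            long leastSignificant = buffer.getLong();
            return "uuid:" + new UUID(mostSignificant, leastSignificant);
        }
        throw new IllegalArgumentException("unexpected contact id length: " + blob.length);
    }

    public static void main(String[] args) {
        System.out.println(describe(fromLong(129363L)));
        System.out.println(describe(fromUuid(UUID.fromString("e13d0d50-7965-11e4-af38-90b11c2549e0"))));
    }
}
```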

Achilles

•  Advanced "object mapper"
•  Fluent API
•  Tons of features
•  TDD friendly

Achilles

•  Dirty checking, why is it important?
•  1 contact ≈ 8 mutable fields
•  × 7 denormalizations = 56 update combinations …
•  and not even counting multiple fields updates …
•  Are you going to manually generate 56+ prepared statements for all possible updates?
•  Or just use dynamic plain string statements and get some perf penalty?

Achilles

•  Dirty check in action

//No read-before-write
ContactEntity proxy = manager.forUpdate(ContactEntity.class, contactId);
proxy.setFirstName(…);
proxy.setLastName(…); //type-safe updates
proxy.setAddress(…);

manager.update(proxy);

[Diagram: Proxy (setters interception) around an Empty Entity, with its PrimaryKey and a DirtyMap]

•  Dynamic statements generation

UPDATE contacts SET firstname=?, lastname=?, address=? WHERE contact_id=?

prepared statements are cached, of course
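The principle behind dirty checking can be illustrated without any mapper: record which columns were set, then generate an UPDATE that touches only those columns. The sketch below is not Achilles code; `DirtyCheckSketch`, `ContactUpdate` and `updateStatement` are hypothetical names, and a real mapper would intercept typed setters and cache the resulting prepared statement rather than build strings on each call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of dirty checking; this is not Achilles code.
final class DirtyCheckSketch {

    // Stand-in for the proxy: setters only record which columns changed.
    static final class ContactUpdate {
        final Map<String, Object> dirty = new LinkedHashMap<>();

        ContactUpdate set(String column, Object value) {
            dirty.put(column, value);
            return this;
        }
    }

    // Builds an UPDATE that touches only the dirty columns.
    static String updateStatement(String table, String keyColumn, ContactUpdate update) {
        if (update.dirty.isEmpty()) {
            throw new IllegalStateException("nothing to update");
        }
        List<String> assignments = new ArrayList<>();
        for (String column : update.dirty.keySet()) {
            assignments.add(column + "=?");
        }
        return "UPDATE " + table + " SET " + String.join(", ", assignments)
                + " WHERE " + keyColumn + "=?";
    }

    public static void main(String[] args) {
        ContactUpdate update = new ContactUpdate()
                .set("firstname", "John")
                .set("lastname", "DOE")
                .set("address", "unknown");
        System.out.println(updateStatement("contacts", "contact_id", update));
    }
}
```

Because the statement text depends only on the set of dirty columns, it can serve as a cache key for prepared statements, so each distinct combination is prepared once.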

Achilles

•  Insert strategy, what is it?
•  Simple INSERT prepared statement

INSERT INTO contacts(contact_id,name,age,address,gender,avatar,…) VALUES(?, ?, ?, ? … ?);

•  Runtime values binding
•  some columns are optional

preparedStatement.bind(49374, "John DOE", 33, null, null, …, null);

Wait … are you saying inserting null in CQL??? 😳

Inserting null ≡ creating tombstones
× 7 denormalizations
× billions of contacts created 😱
not even counting replication factor …

Achilles

•  Simple annotation

@Entity(table = "contacts_by_id")
@Strategy(insert = InsertStrategy.NOT_NULL_FIELDS)
public class ContactById {

}

•  Runtime dynamic INSERT statement

INSERT INTO contacts(contact_id, name, age, address) VALUES(:contact_id, :name, :age, :address);

prepared statements are cached, of course
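The effect of a not-null insert strategy can be sketched as follows: columns whose value is null are left out of the generated INSERT, so no tombstone is written for them. This is an illustration of the idea, not the mapper's implementation; `NotNullInsertSketch` and `insertStatement` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generation of an INSERT that skips null columns.
final class NotNullInsertSketch {

    static String insertStatement(String table, Map<String, Object> columns) {
        List<String> names = new ArrayList<>();
        List<String> markers = new ArrayList<>();
        for (Map.Entry<String, Object> column : columns.entrySet()) {
            if (column.getValue() != null) { // skipping nulls avoids writing tombstones
                names.add(column.getKey());
                markers.add(":" + column.getKey());
            }
        }
        return "INSERT INTO " + table + "(" + String.join(", ", names) + ") VALUES("
                + String.join(", ", markers) + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> contact = new LinkedHashMap<>();
        contact.put("contact_id", 49374);
        contact.put("name", "John DOE");
        contact.put("age", 33);
        contact.put("address", null); // optional column, not provided
        contact.put("gender", null);  // optional column, not provided
        System.out.println(insertStatement("contacts", contact));
        // INSERT INTO contacts(contact_id, name, age) VALUES(:contact_id, :name, :age)
    }
}
```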

Achilles

•  Remember the contactId ⇄ byte[ ] conversion?

@PartitionKey
@Column(name = "contact_id")
@TypeTransformer(valueCodecClass = ContactIdToBytes.class)
private ContactId contactId;

BYOC ☞ Bring Your Own Codec

public interface Codec<FROM, TO> {
    Class<FROM> sourceType();
    Class<TO> targetType();
    TO encode(FROM fromJava);
    FROM decode(TO fromCassandra);
}

Achilles

•  Dynamic logging in action

2014-12-01 14:25:20,554 Bound statement : [INSERT INTO contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES (:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...]

2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND (modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead]

Achilles

•  Dynamic logging
•  runtime activation
•  no need to recompile/re-deploy
•  saves us hours of debugging
•  TRACE log level ☞ query tracing

Take Away

Conditions for success

•  Data modeling is crucial
•  Double-run strategy & timestamp trick FTW
•  Data type conversion can be tricky
•  Benchmark!
•  Mindset shifts for the team


Thank You