
DESCRIPTION

Issues and Tips for Big Data on Cassandra, by Shotaro Kamio, Rakuten. 2011/10/05, Cassandra Conference.

TRANSCRIPT

Page 1: Cassandra conference

Issues and Tips for Big Data on Cassandra

Shotaro Kamio, Architecture and Core Technology dept., DU, Rakuten, Inc.

Page 2: Cassandra conference


Contents

1 Big Data Problem in Rakuten

2 Contributions to Cassandra Project

3 System Architecture

4 Details and Tips

5 Conclusion

Page 3: Cassandra conference


Contents

1 Big Data Problem in Rakuten

2 Contributions to Cassandra Project

3 System Architecture

4 Details and Tips

5 Conclusion

Page 4: Cassandra conference

Big Data Problem in Rakuten

• User data increases exponentially, doubling in size every two years.
• More than 1 billion records. We need a scalable solution to handle this big data.

[Chart: total data size by month-year, Jun-97 through Dec-10; the size doubles every 2 years (x 2)]

Page 5: Cassandra conference

Importance of Data Store in Rakuten

• Rakuten has a lot of data: user data, item data, reviews, etc.
• We expect connectivity to Hadoop.
• High-performance, fault-tolerant, scalable storage is necessary → Cassandra

[Diagram: services A, B, C, … on top of shared data stores Data A and Data B]

Page 6: Cassandra conference

Performance of New System (Cassandra)

• Store all data in 1 day
  – Achieved 15,000 updates/sec with quorum.
  – 50 times faster than the previous DB.
• Good read throughput
  – Handles more than 100 read threads at a time.

[Chart: previous DB vs. new system, x 50 speedup, 15,000 updates/sec]

Page 7: Cassandra conference


Contents

1 Big Data Problem in Rakuten

2 Contributions to Cassandra Project

3 System Architecture

4 Details and Tips

5 Conclusion

Page 8: Cassandra conference

Contributions to Cassandra Project

• Tested 0.7.x - 0.8.x
• Bug reports / feedback to JIRA
  – CASSANDRA-2212, 2297, 2406, 2557, 2626 and more
  – Bugs related to specific conditions, secondary indexes, and large datasets.
• Contributed patches
  – Discussed in later slides.

Page 9: Cassandra conference

JIRA: Overflow in bytesPastMark(..)

• https://issues.apache.org/jira/browse/CASSANDRA-2297
• Hit the error on a row larger than 60 GB.
  – The row is in a column family of super column type.
• The bytesPastMark method was fixed to return a long value.

Page 10: Cassandra conference

JIRA: Stack overflow while compacting

• https://issues.apache.org/jira/browse/CASSANDRA-2626
• A long series of compactions causes a stack overflow.
  ← This occurs with a large dataset.
• Helped with debugging.

Page 11: Cassandra conference

Challenges in OSS

• Not well tested with real big data.
  → Rakuten can give a lot of feedback to the community:
  – Bug reports, patches, and communication.
• The OSS becomes much more stable.

Page 12: Cassandra conference

Contribution of Patches

• Column name aliasing
  – Encodes column names in a compact way.
  – Useful to reduce data size for structured (relational) data.
  – Reduces SSTable size by 15%.
• Variable-length quantity (VLQ) compression
  – Reduces encoding overhead in columns.
  – Reduces SSTable size by 17%.

Page 13: Cassandra conference

VLQ Compression Patch

• The serializer is changed to use VLQ encoding.
• A typical column has fixed-length overhead of:
  – 2 bytes for column name length
  – 1 byte for flags
  – 8 bytes for TTL / deletion time
  – 8 bytes for timestamp
  – 4 bytes for length of value
• Those encoding overheads are reduced (sketched below).
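For illustration, a minimal sketch of VLQ encoding, assuming the common 7-bits-per-byte scheme with a continuation bit; the actual patched serializer may differ in detail:

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;

  final class Vlq {
      // Encode a non-negative long in as few bytes as needed: 7 payload bits
      // per byte, high bit set on every byte except the last.
      static void writeVlq(DataOutput out, long value) throws IOException {
          while ((value & ~0x7FL) != 0) {
              out.writeByte((int) ((value & 0x7F) | 0x80)); // continuation bit set
              value >>>= 7;
          }
          out.writeByte((int) value); // final byte: continuation bit clear
      }

      static long readVlq(DataInput in) throws IOException {
          long value = 0;
          int shift = 0;
          int b;
          do {
              b = in.readByte() & 0xFF;
              value |= ((long) (b & 0x7F)) << shift; // least-significant group first
              shift += 7;
          } while ((b & 0x80) != 0);
          return value;
      }
  }

Under this scheme a 4-byte value length of, say, 100 shrinks to a single byte, which is the kind of saving behind the reported 17% SSTable size reduction.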

Page 14: Cassandra conference


Contents

1 Big Data Problem in Rakuten

2 Contributions to Cassandra Project

3 System Architecture

4 Details and Tips

5 Conclusion

Page 15: Cassandra conference

System Architecture

[Diagram: a data feeder batch-loads data from source DBs into two Cassandra clusters (Cassandra 1, with backup, and Cassandra 2); services access the data]

Page 16: Cassandra conference

System Architecture

[Diagram: system architecture, as above]

Page 17: Cassandra conference

Planning: Schema Design

• Data modeling is a key to scalability.
• Design the schema:
  – Query patterns for super columns and normal columns.
• Think about queries based on use cases.
  – Batch operations reduce the number of requests, because Thrift has per-call communication overhead (see the sketch below).
• Secondary index
  – We used it to find updated data.
• Choose the partitioner appropriately.
  – One partitioner per cluster.
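As an illustration of batching, a minimal Hector sketch, assuming the Hector 0.8-era mutator API; the "Users" column family and the data are hypothetical:

  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.Keyspace;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.mutation.Mutator;

  final class BatchExample {
      // Queue up many insertions locally, then send them in one batch.
      static void batchInsert(Keyspace keyspace) {
          Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
          for (int i = 0; i < 100; i++) {
              mutator.addInsertion("user" + i, "Users", // hypothetical column family
                      HFactory.createStringColumn("name", "value" + i));
          }
          mutator.execute(); // one Thrift round-trip instead of 100
      }
  }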

Page 18: Cassandra conference

Secondary Index

• Pros
  – Useful for queries based on a column value.
  – It can reduce consistency problems.
  – For example, to query updated data based on update time.
• Cons
  – Performance of a complex query depends on the data.
    E.g., Year == 2011 and Price < 100
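For reference, a query like the one above might be issued through Hector's IndexedSlicesQuery; this is a hedged sketch (the "Items" column family and string-encoded values are hypothetical):

  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.Keyspace;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.query.IndexedSlicesQuery;

  final class IndexQueryExample {
      static void queryByIndex(Keyspace keyspace) {
          StringSerializer se = StringSerializer.get();
          IndexedSlicesQuery<String, String, String> query =
                  HFactory.createIndexedSlicesQuery(keyspace, se, se, se);
          query.setColumnFamily("Items");            // hypothetical column family
          query.addEqualsExpression("Year", "2011"); // equality must hit an indexed column
          query.addLtExpression("Price", "100");     // filter; note: compares as strings here
          query.setRange("", "", false, 100);        // which columns to return per row
          query.execute();
      }
  }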

Page 19: Cassandra conference

A Bit of Detail on Secondary Index

Works like a hash + filters:
1. Pick up a row which has a key for the index (hash).
2. Apply filters.
   – Collect the result if all filters match.
3. Repeat until the requested number of rows is obtained.

E.g., Year == 2011 and Price < 100

  Key1  Year = 2011
  Key2  Year = 2011  Price = 1,000
  Key3  Year = 2011  Price = 10
  Key4  Year = 2011  Price = 10,000
  Key5  Year = 2011  Price = 200

Many keys have Year = 2011, but there are only a few results.

Page 20: Cassandra conference

A Bit of Detail on Secondary Index (2)

• Consider the frequency of results for the query:
  – Very few results in a large dataset → the query might time out.
• Careful data/query design is necessary at this moment.
• An improvement is being discussed: CASSANDRA-2915

Page 21: Cassandra conference

Planning: Data Size Estimation

• Estimate the future data volume.
• Serialization overhead: x 3 - 4
  – Big overhead for small data.
  – We improved this with custom patches (compression code).
  – Cassandra 1.0 can use Snappy/Deflate compression.
• Replication: x 3 (depends on your decision)
• Compaction: x 2 or above
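To make these multipliers concrete, a worked example (the 1 TB starting point is an assumed figure; the factors are the ones above):

  1 TB raw data  x 3.5 (serialization)        ≈ 3.5 TB per replica
  3.5 TB         x 3   (replication)          ≈ 10.5 TB across the cluster
  10.5 TB        x 2   (compaction headroom)  ≈ 21 TB of disk to provision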

Page 22: Cassandra conference

Other Factors for Data Size

• Obsolete SSTables
  – Disk usage may stay high after compaction.
  – Cassandra 0.8.x relies on GC to remove obsolete SSTables.
  – Improved in 1.0.
• How to balance data distribution
  – Disk usage can be unbalanced (ByteOrderedPartitioner).
  – Partitioning, key design, initial token assignment.
  – It is very helpful if you know your data in advance.
• Backup scheme affects disk space
  – Need backup space.
  – Discussed later.

Page 23: Cassandra conference

Configuration

• We adopted Cassandra 0.8.x + custom patches.
• Without mmap
  – No noticeable difference in performance.
  – Easier to monitor and debug memory usage and GC-related issues.
• ulimit
  – Avoid file descriptor shortage: the limit needs to exceed the number of DB files. Bug??
  – “memlock unlimited” for JNA.
  – Create /etc/security/limits.d/cassandra.conf (RedHat); an example follows.
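An example of such a limits file (the nofile value is an assumption; size it above your SSTable file count):

  # /etc/security/limits.d/cassandra.conf
  cassandra  -  memlock  unlimited
  cassandra  -  nofile   100000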

Page 24: Cassandra conference

JVM / GC

• Full GC has to be avoided at all times.
• The JVM cannot utilize a large heap over 15 GB.
  – Slow GC; can become unstable.
  – Don’t put too much data/cache into the heap.
  – An off-heap cache is available since 0.8.1.
• Cassandra may use more memory than the heap size.
  – ulimit –d 25000000 (max data segment size)
  – ulimit –v 75000000 (max virtual memory size)
• Benchmarking is needed to find appropriate parameters.
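For instance, heap settings live in conf/cassandra-env.sh; the values below are purely illustrative and should come out of your own benchmarks:

  # conf/cassandra-env.sh
  MAX_HEAP_SIZE="14G"   # stays under the ~15 GB ceiling noted above
  HEAP_NEWSIZE="800M"   # young-generation size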

Page 25: Cassandra conference

Parameter Tuning for Failure Detector

• Cassandra uses the Phi Accrual Failure Detector.
  – The Φ Accrual Failure Detector [SRDS'04]
• Failure detection errors occur when a node is under heavy load and/or GC is running.
• Depends on the number of nodes:
  – The larger the cluster, the larger the value.

  double phi(long tnow) {
      int size = arrivalIntervals_.size();
      double log = 0d;
      if (size > 0) {
          double t = tnow - tLast_;             // time since the last heartbeat
          double probability = p(t);            // chance a heartbeat is this late
          log = (-1) * Math.log10(probability); // phi = -log10(probability)
      }
      return log;
  }

  double p(double t) {
      double mean = mean();                     // mean heartbeat inter-arrival time
      double exponent = (-1) * (t) / mean;
      return Math.pow(Math.E, exponent);        // exponential-distribution tail
  }
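Gossip marks a node down when phi exceeds a threshold configured in cassandra.yaml; raising it makes the detector more tolerant of GC pauses at the cost of slower failure detection:

  # cassandra.yaml
  phi_convict_threshold: 8   # default; raise it if busy or GCing nodes get falsely convicted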

Page 26: Cassandra conference

Hardware

• Benchmarking is important to decide on hardware.
  – Requirements for performance, data size, etc.
  – Cassandra is good at utilizing CPU cores.
• Network ports will be a bottleneck when scaling out…
  – A large number of low-spec servers, or
  – a small number of high-spec servers.

Our case:
• High-spec CPUs and SSD drives
• 2 clusters (active and test cluster)

Page 27: Cassandra conference

System Architecture

[Diagram: system architecture, as above]

Page 28: Cassandra conference

Customize Hector Library

• A query can time out on Cassandra:
  – When Cassandra is temporarily under high load.
  – Requests for large result sets.
  – Timeouts of secondary index queries.
• Hector retries forever when a query times out.
• The client cannot detect the infinite loop.
• Customization:
  – After 3 timeouts, return an exception to the client (see the sketch below).
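A minimal sketch of the retry-limiting idea (a hypothetical wrapper rather than the actual Hector patch; assumes Hector's HTimedOutException):

  import java.util.concurrent.Callable;
  import me.prettyprint.hector.api.exceptions.HTimedOutException;

  final class RetryLimit {
      // Give up after 3 timeouts instead of looping forever.
      static <T> T executeWithLimit(Callable<T> query) throws Exception {
          int timeouts = 0;
          while (true) {
              try {
                  return query.call();
              } catch (HTimedOutException e) {
                  if (++timeouts >= 3) {
                      throw e; // surface the timeout to the caller
                  }
              }
          }
      }
  }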

Page 29: Cassandra conference

System Architecture

[Diagram: system architecture, as above]

Page 30: Cassandra conference

Testing: Data Consistency Check Tool

• We wanted to make sure data is not corrupted within Cassandra.
• Made a tool to check the data consistency.

[Diagram: input data periodically comes in; Process A inserts, updates, and deletes the data in both Cassandra and another database; Process B compares the data (inserts, updates, deletes) in the other database with that in Cassandra]

Page 31: Cassandra conference

Testing: Data Consistency Check Tool (2)

• Compares only the keys of the data, not the contents.
• Useful to diagnose which part is wrong in the test phase.
• We found another team's bug as well.
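A minimal sketch of the key-comparison step (hypothetical; the actual tool is not published):

  import java.util.Set;
  import java.util.TreeSet;

  final class KeyDiff {
      // Report keys present in one store but absent from the other.
      static void compareKeys(Set<String> cassandraKeys, Set<String> otherDbKeys) {
          Set<String> missing = new TreeSet<String>(otherDbKeys);
          missing.removeAll(cassandraKeys); // expected but absent from Cassandra
          Set<String> extra = new TreeSet<String>(cassandraKeys);
          extra.removeAll(otherDbKeys);     // present in Cassandra only
          System.out.println("Missing from Cassandra: " + missing);
          System.out.println("Unexpected in Cassandra: " + extra);
      }
  }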

Page 32: Cassandra conference

Repair

• Some types of queries don't trigger read repair.
• Nodetool repair is tricky on big data:
  – Disk usage.
  – Time consuming.
  → Instead, read all data afterward to trigger read repair.
• Discussion for improvement is ongoing:
  – CASSANDRA-2699
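For reference, anti-entropy repair is run per node with nodetool (<host> and <keyspace> are placeholders; in 0.8 the keyspace argument is optional):

  nodetool -h <host> repair <keyspace>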

Page 33: Cassandra conference

System Architecture

[Diagram: system architecture, as above]

Page 34: Cassandra conference

Backup Scheme

Backups might be required to shorten recovery time.
1. Snapshot to local disk
   – Plan the disk size at the server estimation phase.
2. Full backup of the input data
   – We had to re-feed the full data several times for various reasons:
     e.g., logic changes, schema changes, data corruption, etc.

[Diagram: incoming data is kept as a backup; the Cassandra cluster takes repeated snapshots to local disk]
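For reference, snapshots are taken per node with nodetool (<host> is a placeholder); a snapshot is a set of hard links under each data directory, and clearsnapshot reclaims the space afterward:

  nodetool -h <host> snapshot
  nodetool -h <host> clearsnapshot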

Page 35: Cassandra conference


Contents

1 Big Data Problem in Rakuten

2 Contributions to Cassandra Project

3 System Architecture

4 Details and Tips

5 Conclusion

Page 36: Cassandra conference

Conclusion

• Rakuten uses Cassandra with big data.
• We'll continue contributing to OSS.

Page 37: Cassandra conference

Finally...

Please let us do a little advertising...

Page 38: Cassandra conference

We are hiring! We are actively recruiting mid-career candidates!

Rakuten's Mission: to empower people and society (through the Internet) and, through our own success, to transform and enrich society.

Rakuten's Goal: to become the No. 1 Internet Service Company in the World.

If you share Rakuten's Mission & Goal, please contact us!

t[email protected]