bulk exporting from cassandra - carlo cabanilla
Bulk exporting data from Cassandra
Carlo Cabanilla @clofresh
Why export?
snapshot
sstable2json
Killing IO on live cluster
sstable2json sstable2csv, with filters
ionice -c 3
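A minimal sketch of the `ionice -c 3` idea, assuming the export tool is launched from a Python script. Class 3 is the idle I/O scheduling class, so the wrapped process only gets disk time when nothing else is asking for it. The helper names and the SSTable path in the usage note are hypothetical, not taken from the talk.

```python
import subprocess


def idle_io_command(command):
    """Prefix a command with `ionice -c 3` (the idle I/O scheduling class)."""
    return ["ionice", "-c", "3"] + list(command)


def run_with_idle_io(command, output_path):
    """Run `command` at idle I/O priority and write its stdout to output_path.

    Hypothetical helper: raises CalledProcessError if the command fails.
    """
    with open(output_path, "wb") as out:
        return subprocess.run(idle_io_command(command), stdout=out, check=True)
```

For example, `run_with_idle_io(["sstable2json", "/path/to/Keyspace-Data.db"], "export.json")` would run the dump at idle priority; the path here is a placeholder.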
Need a place to put it
EBS to the rescue
gzipped
S3cmd
Need to dedupe
Hadoop
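The dedupe step can be pictured as a sort followed by a reduce that keeps one copy of each record, which is the shape a Hadoop job would give it. The sketch below is a single-process illustration only, assuming exported rows are plain text lines and that duplicates are byte-identical; it is not the job used in the talk, and the function names are hypothetical.

```python
import itertools


def dedupe_sorted(lines):
    """Yield each distinct line once; the input must already be sorted."""
    # groupby collapses runs of equal adjacent lines, like a reducer
    # that receives all values for one key together.
    for line, _group in itertools.groupby(lines):
        yield line


def dedupe(lines):
    """Sort the lines (the 'shuffle' step), then drop adjacent duplicates."""
    return list(dedupe_sorted(sorted(lines)))
```

On a cluster the sort is done by the framework's shuffle phase, so only the `dedupe_sorted` part would correspond to user code.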
numpy pickles
Haderp Mortar Data
numpy pickles msgpack lz4
gzipped lzo'd
Haderp file naming! 2010-07-27~org-1018~m-48778.csv-1,316.gz
S3 copy