Helsinki Cassandra Meetup #2: From Postgres to Cassandra

From Postgres to Cassandra: In four easy steps. Axel Eirola, Jarrod Creado

Upload: bruno-almeida

Post on 17-Dec-2014


Category:

Technology



DESCRIPTION

From PostgreSQL to Cassandra, In Four Easy Steps (Axel Eirola and Jarrod Creado, LabDev, F-Secure). In this presentation Axel and Jarrod will tell you the tale of our Network Reputation System live migration (PostgreSQL to Cassandra). F-Secure Network Reputation System is a core element of the protection we provide to our customers. It consists of URLs and other network-related metadata, used to make fast assessments regarding their reputation. Currently the Network Reputation System database contains hundreds of millions of URLs. More info about Cassandra @ F-Secure? http://www.planetcassandra.com/blog/post/apache-cassandra-at-f-secure

TRANSCRIPT

Page 1: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

From Postgres to Cassandra

In four easy steps

Axel Eirola

Jarrod Creado

Page 2: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Agenda

1. Postgres

2. Cassandra

3. ???

4. Profit

Page 3: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

0. Context

Page 4: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Categorizing the internet

Hundreds of millions of URLs

Data size in the terabytes

Reputation metadata:

Categories: adult, gambling, …

Safety: malicious, safe, …

Page 5: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Automatic processing

(Re)processing hundreds of thousands of URLs per day

Computation divided among multiple services, each with multiple instances

Downtime not an option

Page 6: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Manual research

Data mining capabilities

Researching (aimlessly poking around)

Reporting

Page 7: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

1. Postgres

Page 8: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

BCNF up in this

Planned for storage, not queries

Highly normalized

Stiff schema, hard to add more fields

Page 9: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Sharding like a boss

Segmenting the URL keyspace

One (or more) box for each segment

Difficult to add more capacity

We got eight single points of failure

Upgrading means downtime
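To illustrate why segmenting the keyspace makes adding capacity difficult, here is a minimal sketch. The hashing scheme, the `segment_for` and `moved_fraction` names, and the assumption of eight segments (read off the "eight single points of failure" line) are all hypothetical; the slides do not say how the URL keyspace was actually segmented.

```python
import hashlib

# Assumption: eight segments, one per box. The slides only mention
# "eight single points of failure", not the actual segmenting scheme.
NUM_SEGMENTS = 8


def segment_for(url: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a URL to a segment by hashing it (hypothetical scheme)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def moved_fraction(urls, old_n: int, new_n: int) -> float:
    """Fraction of URLs that land on a different segment when the
    number of segments changes from old_n to new_n."""
    moved = sum(1 for u in urls if segment_for(u, old_n) != segment_for(u, new_n))
    return moved / len(urls)
```

With plain modulo segmenting like this, going from eight to nine segments relocates most keys, which is one way "difficult to add more capacity" can show up in practice.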

Page 10: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Index all the things

Building queries is hard due to the structure of the schema

Managing indices for those queries is hard

The mess needs to be abstracted away from the user, this is also hard

Page 11: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

2. Cassandra

Page 12: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Easy management

Easy scaling up as more data is stored

Out of the box:

Replication

Pagination

Load balancing

Less downtime during upgrades

TTL

Page 13: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Mapping data

Structure of our data is suitable for NoSQL

Mostly based around single URLs

Given a URL, fetch metadata

Page 14: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Got queries?

Cassandra schema designed for fixed pattern access performed by automation

Human free-form searches offloaded to Elasticsearch

Load on one doesn't affect the other

Page 15: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Denormalize

Provide fixed pattern access for automation

Relations become ranges in the column namespace

This is pre-CQL, so we are doing the old-school way

Minimize the amount of read-then-write scenarios

collections

Page 16: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Postgres

Url_Category: url_key, category_key, timestamp

Category: key, name

Url: key, url

Cassandra

Url

row_key | url | (c)_<category_name>

<url_key> | <url> | <timestamp>

Category

row_key | <url_key>

<category_name> | <empty>
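The mapping on this slide can be sketched with plain dictionaries standing in for the column families. This is only an illustration of the layout, not the authors' code: the `denormalize` and `categories_of` helpers and the input shapes are assumptions, and no real Cassandra client is involved.

```python
from collections import defaultdict

CATEGORY_PREFIX = "(c)_"  # column-name prefix taken from the slide


def denormalize(urls, categories, url_categories):
    """Turn the three normalized tables into two wide-row structures.

    urls:           {url_key: url}
    categories:     {category_key: name}
    url_categories: iterable of (url_key, category_key, timestamp)
    """
    url_cf = defaultdict(dict)       # row_key = url_key
    category_cf = defaultdict(dict)  # row_key = category name
    for url_key, url in urls.items():
        url_cf[url_key]["url"] = url
    for url_key, category_key, timestamp in url_categories:
        name = categories[category_key]
        # The relation becomes a column whose name carries the category.
        url_cf[url_key][CATEGORY_PREFIX + name] = timestamp
        # Reverse lookup: one column per URL, with an empty value.
        category_cf[name][url_key] = ""
    return dict(url_cf), dict(category_cf)


def categories_of(url_row):
    """Read the categories back as a range over the column names."""
    return sorted(
        column[len(CATEGORY_PREFIX):]
        for column in url_row
        if column.startswith(CATEGORY_PREFIX)
    )
```

Adding a category to a URL is then a single column write into each row, with no need to read the row first.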

Page 17: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

3. ???

Page 18: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Going into production before going into production

DAL (data access layer) abstracts away the split databases

Implement new features in Cassandra only

Get a feel of Cassandra before taking it into full use

Page 19: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

A tale of two databases

Run both databases in parallel

Writes:

New data, and updates, into both databases

Blind writes make it easy to do partial updates

Reads:

Reads from both databases, cross-validate responses

Easy to move responsibilities from one database to another
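A rough sketch of what such a data access layer might look like, with two dictionaries standing in for Postgres and Cassandra. The `DualStoreDAL` class, its method names and the mismatch list are hypothetical; the slides do not describe the real DAL's interface.

```python
class DualStoreDAL:
    """Writes go to both stores; reads compare both and record mismatches."""

    def __init__(self, primary, secondary):
        self.primary = primary      # dict stand-in for Postgres
        self.secondary = secondary  # dict stand-in for Cassandra
        self.mismatches = []        # keys whose responses disagreed

    def write(self, url_key, fields):
        # Blind write: merge the given fields without reading first,
        # so a partial update touches only the columns it carries.
        for store in (self.primary, self.secondary):
            store.setdefault(url_key, {}).update(fields)

    def read(self, url_key):
        from_primary = self.primary.get(url_key)
        from_secondary = self.secondary.get(url_key)
        if from_primary != from_secondary:
            self.mismatches.append(url_key)
        # Serve from the primary; swapping this line is how read
        # responsibility would move to the other database.
        return from_primary
```

Callers only see `write` and `read`, so which database answers a given request stays a decision inside the layer.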

Page 20: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Migration boiled down to this

1. Dump URL keys from Postgres into batches

2. Custom migration script to chew a batch; for each URL in batch:

2.1. Read data from Postgres

2.2. Delete Cassandra row key for each URL

2.3. Write fresh data from Postgres into Cassandra

3. Log failing URLs

4. Cross-validate on reads for a while to ensure successful migration
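The steps above can be sketched as follows. This is a minimal illustration, not the actual migration script: the function names are made up, and the three callables passed in (`read_source`, `delete_target`, `write_target`) are placeholders for whatever the real Postgres and Cassandra access code looked like.

```python
import logging

log = logging.getLogger("migration")


def batches(keys, size):
    """Step 1: split a list of dumped URL keys into fixed-size batches."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def migrate_batch(batch, read_source, delete_target, write_target):
    """Steps 2 and 3: migrate one batch and return the keys that failed."""
    failed = []
    for url_key in batch:
        try:
            data = read_source(url_key)    # 2.1 read from Postgres
            delete_target(url_key)         # 2.2 drop the Cassandra row
            write_target(url_key, data)    # 2.3 write fresh data
        except Exception:
            # 3. log and remember the failure so the key can be retried
            log.warning("migration failed for %r", url_key, exc_info=True)
            failed.append(url_key)
    return failed
```

Because each key is deleted and rewritten from scratch, running a batch a second time gives the same end state, which keeps retries of the failed keys simple.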

Page 21: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

4. Profit

Page 22: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Bro-tips

Decide what you don't want to migrate

Dry run while testing, keep an eye on the performance

Start in small batches, and verify the results before proceeding

Parallelize the batches, if you need to speed it up

Keep an eye on performance, throttle if necessary

Everything doesn't always go as planned, make it easy to repeat migration

Make sure the cluster is prepared for the migration, reserve time to tweak if not
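One possible way to combine the "parallelize" and "throttle" tips, using only the Python standard library. The `run_batches` helper, its parameters and the fixed pause per batch are assumptions for illustration; the slides do not say how the authors parallelized or throttled their batches.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_batches(batches, process_batch, max_workers=4, pause_seconds=0.0):
    """Run process_batch over every batch on a small thread pool.

    process_batch(batch) is expected to return the list of keys that
    failed. max_workers caps the parallelism; pause_seconds makes each
    worker rest after a batch, as a crude throttle.
    """
    def throttled(batch):
        failed = process_batch(batch)
        if pause_seconds:
            time.sleep(pause_seconds)
        return failed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_batch = list(pool.map(throttled, batches))
    # Flatten the failures so they can be fed into a repeat run.
    return [key for failed in per_batch for key in failed]
```

Starting with `max_workers=1` and a generous pause, then loosening both while watching the cluster, matches the "start small, verify, then speed up" order of the tips.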

Page 23: Helsinki Cassandra Meetup #2: From Postgres to Cassandra

Kiitos (thank you)