Scalable PHP Applications With Cassandra

@akira28 Scalable PHP web applications with Apache Cassandra Andrea De Pirro

Upload: andrea-de-pirro

Post on 28-Aug-2014

Category:

Software


DESCRIPTION

Developing a fast and scalable application for your fancy new startup is hard. Many factors can slow a website down, such as network latency, web server configuration or large assets, but as any developer who has dealt with high volumes knows, the real bottleneck is usually the database. In recent years a number of NoSQL solutions have come to the rescue, each with its own pros and cons. Apache Cassandra is one of the most widely used and mature "Big Data" NoSQL databases, and is deployed in several projects by tech giants such as Twitter, eBay and Netflix, thanks to its extremely high throughput, automatic replication and decentralization. In this session I'll talk about how to leverage Apache Cassandra's best features and data modeling best practices so that your web applications can respond to huge peaks of traffic, using open source tools such as Zend Framework and phpcassa, and I'll describe a large e-commerce project currently using Cassandra.

TRANSCRIPT

Page 1: Scalable PHP Applications With Cassandra

@akira28

Scalable PHP web applications with Apache Cassandra

Andrea De Pirro

Page 2: Scalable PHP Applications With Cassandra

About me

• Co-founder at Yameveo

• 9+ years developing in PHP

• 2+ years experience with Apache Cassandra

• Zend Framework Certified Engineer

Page 3: Scalable PHP Applications With Cassandra

Yameveo

Founded in 2012 in Barcelona, Yameveo is a young, dynamic and international company specialised in e-commerce and web application development.

www.yameveo.com @Yameveo

Page 4: Scalable PHP Applications With Cassandra

Yameveo Store

Dozens of e-commerce modules

store.yameveo.com

Page 5: Scalable PHP Applications With Cassandra

What we will talk about

• Apache Cassandra

• Data Modeling

• Cassandra & PHP

• Case study

Page 6: Scalable PHP Applications With Cassandra

Apache Cassandra

Apache Cassandra is a massively scalable open source NoSQL database. Cassandra is perfect for managing large amounts of structured, semi-structured, and unstructured data across multiple data centers and the cloud. Cassandra delivers continuous availability, linear scalability, and operational simplicity across many commodity servers with no single point of failure, along with a powerful dynamic data model designed for maximum flexibility and fast response times.

Apache Cassandra documentation

Page 7: Scalable PHP Applications With Cassandra

Why Cassandra

• Open Source (enterprise distribution also available)

• Linearly scalable

• Fault-tolerant

• Fully distributed

• Highly performant

• Flexible data model

Page 8: Scalable PHP Applications With Cassandra

Cassandra Uses

• Web analytics

• Web Applications

• Transaction logging

• Data collection

• …

Page 9: Scalable PHP Applications With Cassandra

Page 10: Scalable PHP Applications With Cassandra

Architecture

Page 11: Scalable PHP Applications With Cassandra

CAP Theorem

A distributed system can guarantee only two of:

1. Consistency: all nodes see the same data at the same time

2. Availability: every request is guaranteed a response indicating whether it succeeded or failed

3. Partition Tolerance: the system continues to operate despite message loss or failure of part of the system

Page 12: Scalable PHP Applications With Cassandra

CAP Theorem

Page 13: Scalable PHP Applications With Cassandra

Architecture

• Ring

• Each node has a unique token; all nodes are otherwise identical

• Intra-ring communication via “Gossip” protocol

• Tokens range from 0 to 2^127 - 1 (RandomPartitioner)
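The ring lookup described above can be sketched in a few lines of PHP. This is a simplified illustration, not Cassandra's actual code: the four-node `$ring`, the node names and the helper functions `nodeForToken()` and `nodeForKey()` are all hypothetical, and a plain MD5 hex digest stands in for the partitioner's token (RandomPartitioner also hashes keys with MD5, but its exact token arithmetic is not reproduced here). Each node owns the range that ends at its own token, and the range past the highest token wraps around to the first node.

```php
<?php
// Hypothetical four-node ring, sorted by token. Tokens are 32-digit
// lowercase hex strings, so string order equals numeric order.
$ring = [
    ['token' => str_repeat('0', 32),      'node' => 'node-a'],
    ['token' => '4' . str_repeat('0', 31), 'node' => 'node-b'],
    ['token' => '8' . str_repeat('0', 31), 'node' => 'node-c'],
    ['token' => 'c' . str_repeat('0', 31), 'node' => 'node-d'],
];

// Return the node owning a token: the first node whose token is >= the
// given token, wrapping around to the first node past the end of the ring.
function nodeForToken(string $token, array $ring): string
{
    foreach ($ring as $entry) {
        if (strcmp($token, $entry['token']) <= 0) {
            return $entry['node'];
        }
    }
    return $ring[0]['node'];
}

// Hash a row key into the token space (assumption: plain MD5 hex digest).
function nodeForKey(string $rowKey, array $ring): string
{
    return nodeForToken(md5($rowKey), $ring);
}

$owner = nodeForKey('211', $ring); // always the same node for the same key
```

Because the owner depends only on the key's hash, any node can work out where a row lives without asking a master.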

Page 14: Scalable PHP Applications With Cassandra

Partitioning

Page 15: Scalable PHP Applications With Cassandra

Data Modeling

Page 16: Scalable PHP Applications With Cassandra

Data Model

• Cluster

• Keyspace

• Column Family

• Super Column

• Composite Columns

Page 17: Scalable PHP Applications With Cassandra

Data Model

Page 18: Scalable PHP Applications With Cassandra

Data Model

Page 19: Scalable PHP Applications With Cassandra

Data Modeling Problems

• Neither join nor subquery support

• Limited support for aggregation

• Ordering is done per-partition

• Ordering is specified at table creation time

Page 20: Scalable PHP Applications With Cassandra

Data Modeling Best Practices

• Don’t think of a relational table

• Model column families around query patterns

• De-normalize and duplicate for read performance

• Storing values in column names is perfectly OK

• Leverage wide rows for ordering, grouping, and filtering
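To make the last two points concrete, here is a small in-memory sketch of a wide row whose column names carry the data. It does not talk to Cassandra: a PHP array sorted by key stands in for a row whose columns are kept ordered by name, and the layout (a zero-padded price followed by the deal id) as well as the helpers `columnName()`, `insertColumn()` and `sliceByPrice()` are assumptions made for illustration. The deal names and prices are the sample values used later in the case study.

```php
<?php
// Build a column name that sorts by price first, then by deal id.
function columnName(int $price, int $dealId): string
{
    // Zero-padding makes string order match numeric order.
    return sprintf('%06d:%d', $price, $dealId);
}

// Add a column and keep the row ordered by column name,
// emulating the ordering Cassandra applies inside a row.
function insertColumn(array &$row, string $name, string $value): void
{
    $row[$name] = $value;
    ksort($row, SORT_STRING);
}

// Emulate a column slice: keep columns whose name falls in the price range.
function sliceByPrice(array $row, int $min, int $max): array
{
    $start  = sprintf('%06d:', $min);
    $finish = sprintf('%06d;', $max); // ';' sorts right after ':'
    return array_filter($row, function ($name) use ($start, $finish) {
        return strcmp($name, $start) >= 0 && strcmp($name, $finish) <= 0;
    }, ARRAY_FILTER_USE_KEY);
}

$row = [];
insertColumn($row, columnName(29, 211), 'Miyagi Sushi');
insertColumn($row, columnName(19, 432), 'Mos Eisley Cantina');
insertColumn($row, columnName(549, 12), 'iPhone 5 32GB');

$byPrice  = array_values($row);          // deal names, cheapest first
$midRange = sliceByPrice($row, 20, 100); // only the 29 deal remains
```

The deal name is duplicated into the index row on purpose: reading the row returns data that is already ordered and ready to display, at the cost of writing it twice.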

Page 21: Scalable PHP Applications With Cassandra

Some Numbers

Page 22: Scalable PHP Applications With Cassandra

Some Numbers

Page 23: Scalable PHP Applications With Cassandra

Page 24: Scalable PHP Applications With Cassandra

Cassandra & PHP

Page 25: Scalable PHP Applications With Cassandra

Apache Thrift

Thrift is an interface definition language and binary communication protocol that is used to define and create services for numerous languages. It is used as a remote procedure call (RPC) framework and was developed at Facebook for "scalable cross-language services development".

Wikipedia

Page 26: Scalable PHP Applications With Cassandra

Apache Thrift

Page 27: Scalable PHP Applications With Cassandra

PhpCassa

• Open Source

• Uses the Thrift protocol

• Compatible with Cassandra 0.7 through 1.2

• Optional C extension for improved performance

https://github.com/thobbs/phpcassa

"require": { "thobbs/phpcassa": "v1.1.0" }

Page 28: Scalable PHP Applications With Cassandra

Examples

Opening Connections

    $pool = new ConnectionPool('Keyspace1');

Create a column family object

    $users = new ColumnFamily($pool, 'Standard1');
    $super = new SuperColumnFamily($pool, 'Super1');

Inserting

    $users->insert('key', array('column1' => 'value1', 'column2' => 'value2'));

Querying

    $users->get('key'); // returns an array
    $users->multiget(array('key1', 'key2')); // returns an array of arrays

Removing

    $users->remove('key1'); // removes whole row
    $users->remove('key1', 'column1'); // removes 'column1'

Page 29: Scalable PHP Applications With Cassandra

Case Study

Page 30: Scalable PHP Applications With Cassandra

Flash Deals website

• 5 Apache servers

• 32 GB of RAM

• 8 CPUs

• 6 Cassandra nodes

• 4+ million visits/month

• 17+ million pages/month

• 600 GB of data

Page 31: Scalable PHP Applications With Cassandra

Page 32: Scalable PHP Applications With Cassandra

Requirement

• The client wanted a new way to navigate the website: deal attributes

• Millions of deals (hundreds added and expiring every day)

• Dozens of stores and categories

• Performance is key!

Page 33: Scalable PHP Applications With Cassandra

How We Solved It

• New deals arrive every day, so queries are based on date and attributes

• Leverage Cassandra wide rows to create indexes

• Use Cassandra multiget whenever possible

Page 34: Scalable PHP Applications With Cassandra

Deals CF

RowKey  name                price  attributes   …
211     Miyagi Sushi        29     [21,20,114]
432     Mos Eisley Cantina  19     [21,20]
12      iPhone 5 32GB       549    [7]
…       …                   …

Page 35: Scalable PHP Applications With Cassandra

Attributes CF

RowKey  name         keyword
21      Restaurants  restaurants
114     Japanese     japanese
20      Barcelona    barcelona
7       Technology   tech

Page 36: Scalable PHP Applications With Cassandra

Cities CF

RowKey  name       attributeid  …
1       Madrid     12
8       Barcelona  20
32      Amsterdam  81

Page 37: Scalable PHP Applications With Cassandra

Urls CF

RowKey                           attributes  city  …
/restaurants/barcelona           [21]        8
/restaurants/barcelona/japanese  [21,114]    8
/tech                            [7]         -
/restaurants                     [21]        -
…                                …           …

Page 38: Scalable PHP Applications With Cassandra

AttributesDeals CF

RowKey        211   432   12    …  …
21|20140621   true  true  -
114|20140621  true  -     -
20|20140621   true  true  -
7|20140621    -     -     true
…             …     …     …
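As a worked example of how these column families fit together, the lookup for /restaurants/barcelona/japanese can be traced in memory. The sketch below uses plain PHP arrays filled with the sample rows from the slides instead of a live cluster, and `findDealIds()` is a hypothetical helper that mirrors the idea of intersecting index rows. It is not the project's code.

```php
<?php
// Sample rows copied from the Cities and AttributesDeals slides.
$cities = [8 => ['name' => 'Barcelona', 'attributeid' => 20]];
$attributesDeals = [
    '21|20140621'  => [211 => true, 432 => true],
    '114|20140621' => [211 => true],
    '20|20140621'  => [211 => true, 432 => true],
    '7|20140621'   => [12 => true],
];

// Intersect the column names (deal ids) of one index row per attribute.
function findDealIds(array $index, array $attributeIds, string $date): array
{
    $groups = [];
    foreach ($attributeIds as $attributeId) {
        $rowKey = $attributeId . '|' . $date;
        // A missing row means no deal carries this attribute on that day.
        $groups[] = array_keys($index[$rowKey] ?? []);
    }
    if (count($groups) === 0) {
        return [];
    }
    $ids = count($groups) > 1 ? array_intersect(...$groups) : $groups[0];
    return array_values($ids); // reindex, array_intersect preserves keys
}

// The Urls row for /restaurants/barcelona/japanese holds
// attributes [21,114] and city 8; the city maps to attribute 20.
$attributeIds   = [21, 114];
$attributeIds[] = $cities[8]['attributeid'];

$dealIds = findDealIds($attributesDeals, $attributeIds, '20140621');
// $dealIds is [211]: only Miyagi Sushi has all three attributes
```

Each URL therefore costs one read on Urls, one row read per attribute on AttributesDeals, and a single multiget on Deals for the surviving ids.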

Page 39: Scalable PHP Applications With Cassandra

Code: Controller

    /**
     * List deals action
     * eg. /restaurants/barcelona/japanese
     */
    public function dealsAction()
    {
        $path = $this->getUrlPath(); // cleaned query string

        $url = $this->manager->getUrl($path);
        $attributes = Zend_Json::decode($url['attributes']);
        $cityId = $url['city'];
        $deals = $this->manager->getDeals($attributes, $cityId);
        $this->view->assign('deals', $deals);
        …
    }

Page 40: Scalable PHP Applications With Cassandra

Code: Manager

    /**
     * Retrieves the url containing attributes and city infos
     *
     * @param string $path
     * @return array $url
     */
    public function getUrl($path)
    {
        $pool = new ConnectionPool('Keyspace');
        $urls = new ColumnFamily($pool, 'Urls');
        $url = array(); // returned empty if the lookup fails
        try {
            $url = $urls->get($path);
        } catch (Exception $e) {
            …
        }
        return $url;
    }

Page 41: Scalable PHP Applications With Cassandra

Code: Manager

    /**
     * Retrieves the deals matching the given attributes and city
     *
     * @param array $attributes
     * @param int $cityId
     * @return array $deals
     */
    public function getDeals($attributes, $cityId)
    {
        $pool = new ConnectionPool('Keyspace');
        $dealsCF = new ColumnFamily($pool, 'Deals');
        $deals = array(); // returned empty if the lookup fails
        if (!empty($cityId)) {
            // a city is indexed as just another attribute
            $attributes[] = $this->getAttributeIdByCity($cityId);
        }
        try {
            $dealsIds = $this->getDealsIdsByAttributes($attributes);
            $deals = $dealsCF->multiget($dealsIds);
        } catch (Exception $e) {
            …
        }
        return $deals;
    }

Page 42: Scalable PHP Applications With Cassandra

Code: Manager

    /**
     * Retrieves an array of deals ids given an array of attribute ids
     *
     * @param array $attributes
     * @return array $dealsIds
     */
    protected function getDealsIdsByAttributes($attributes)
    {
        $dealsIds = array();
        $dealsGroups = array();
        $date = date('Ymd');
        $pool = new ConnectionPool('Keyspace'); // same keyspace as the other manager methods
        $attributesDeals = new ColumnFamily($pool, 'AttributesDeals');
        foreach ($attributes as $attributeId) {
            $attributeKey = "$attributeId|$date";
            // the column names of each index row are the deal ids
            $dealsGroups[] = array_keys($attributesDeals->get($attributeKey));
        }
        $countGroups = count($dealsGroups);
        if ($countGroups > 1) {
            // keep only the deals that have every requested attribute
            $dealsIds = call_user_func_array('array_intersect', $dealsGroups);
        } elseif ($countGroups == 1) {
            $dealsIds = reset($dealsGroups);
        }
        return $dealsIds;
    }

Page 43: Scalable PHP Applications With Cassandra

Cassandra future (and present)

• New PHP driver wrapping the C++ driver

• Cassandra 2.0

• CQL 3.0

Page 44: Scalable PHP Applications With Cassandra

Resources

• www.yameveo.com

• http://planetcassandra.org

• https://github.com/thobbs/phpcassa

• http://www.hakkalabs.co/articles/cassandra-data-modeling-guide

Page 45: Scalable PHP Applications With Cassandra

Resources

• http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

• http://www.slideshare.net/DataStax/cassandra-community-webinar-introduction-to-apache-cassandra-12

• http://www.geroba.com/cassandra/apache-cassandra-byteorderedpartitioner/

Page 46: Scalable PHP Applications With Cassandra

Questions?

Page 47: Scalable PHP Applications With Cassandra

WE ARE HIRING!

Page 48: Scalable PHP Applications With Cassandra

Thanks!

joind.in/10865 lanyrd.com/scxyhk

www.yameveo.com

@akira28 @Yameveo

http://bit.ly/andreadepirro