Patterns / Antipatterns with NoSQL
DESCRIPTION
NoSQL vs. SQL: practical patterns (and antipatterns to avoid) in real projects. Talk at NoSQL Day 2011.
TRANSCRIPT
Luca Bonmassar
[Anti]Patterns with NoSQL
NOSQL DAY ’11
NoSQL vs. SQL
NoSQL / NoAgenda
No specific NoSQL technologies (document, key/value, ...)
No specific NoSQL db features (sharding, replication, ...)
No specific ...
...
Does it solve my problem?
But what is your problem?
NoSQL fragmentation (like NoCloud, NoJava, NoMicrosoft, ...)
[Anti]Patterns
The “dynamic” website
Data mapping
The Alien
The binary Alien
The SQLQueue
Overnormalized
Unpredictable tomorrow
Who am I?
Passionate about technology
Favorite topics: Cloud, Virtualization, NoSQL
Gamification fan! (or fun?)
The “dynamic” website
The “dynamic” website
I have a website
Dynamic contents everywhere
High DB load
The “dynamic” website
SELECT COUNT(###) WHERE ...
SELECT COUNT(###) WHERE ...
SELECT COUNT(###) WHERE ...
SELECT username FROM users WHERE sessionid = ###
SELECT avatar.* FROM users WHERE ...
SELECT users.*, count(distinct puzzle_id) FROM submissions JOIN users ON submissions.user_id = users.id WHERE submissions.current_state = "succeeded" ...
SELECT COUNT("submissions"."puzzle_id") FROM "submissions" WHERE (puzzle_id IS NOT NULL) AND (user_id IS NOT NULL) GROUP BY puzzle_id ...
SELECT username FROM users WHERE sessionid = ###
SELECT * FROM submissions WHERE ...
The “dynamic” website
Is it that dynamic?
Are there any data structures that fit better than relational?
Cache, Cache, Cache
Use Ad-Hoc Caching (e.g. Rails Caching)
SQL Table as cache
SQL Table as Cache
When you use SQL Table as Cache, a kitten somewhere dies
Use Memcache, MemcacheDB, Redis
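The caching advice above can be sketched as a cache-aside lookup. This is a minimal illustration under stated assumptions: `TTLCache` is a hypothetical in-memory stand-in for Memcache or Redis, and `count_submissions` is a hypothetical placeholder for one of the expensive SQL queries. Neither name comes from the talk.

```python
import time


class TTLCache:
    """In-memory stand-in for Memcache/Redis (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (self._clock() + ttl_seconds, value)


def cached(cache, key, ttl_seconds, compute):
    """Cache-aside: return the cached value, or compute it and store it."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl_seconds)
    return value


db_calls = []  # records how often the "database" is actually hit


def count_submissions():
    """Hypothetical placeholder for a SELECT COUNT(...) query."""
    db_calls.append(1)
    return 1234


cache = TTLCache()
first = cached(cache, "stats:submissions", 60, count_submissions)
second = cached(cache, "stats:submissions", 60, count_submissions)
```

The second call is served from the cache, so the database sees one query instead of one per page view. The 60-second lifetime is an arbitrary choice for the sketch.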
Better tools?
Rethink data structures
SELECT COUNT(*) ...
SELECT COUNT(*) ...
SELECT COUNT(*) ...
submitted++
passed++
failed++
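One way to read this slide: instead of counting rows on every page view, keep three counters and bump them when a submission is recorded. The sketch below assumes a `Counter` as a stand-in for a key/value store with atomic increments (Redis offers `INCR` for this). The function name `record_submission` is hypothetical.

```python
from collections import Counter

# Stand-in for a key/value store with atomic increments
# (Redis, for example, provides INCR for this purpose).
counters = Counter()


def record_submission(passed):
    """Update the counters at write time instead of counting rows at read time."""
    counters["submitted"] += 1
    counters["passed" if passed else "failed"] += 1


for outcome in (True, False, True):
    record_submission(outcome)

# Reading the page stats is now three key lookups, with no COUNT(*) scans.
stats = {name: counters[name] for name in ("submitted", "passed", "failed")}
```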
Rethink data structures
SELECT *, count(distinct ...) FROM submissions JOIN users on ... where submissions.current_state = "succeeded" ...
List(5)
List.add / List.replace
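The slide does not spell out what the five-element list holds. A plausible reading, and it is only an assumption, is a small leaderboard of the users with the most solved puzzles, updated on write rather than recomputed with a JOIN on every read. The names `update_top` and `TOP_N` and the sample users are hypothetical, and the sketch assumes a user's solved count only grows.

```python
TOP_N = 5  # the slide's List(5); the leaderboard reading is an assumption


def update_top(top, user, solved):
    """Return a new top-N list after `user` reaches `solved` puzzles.

    `top` is a list of (user, solved) pairs sorted by solved count, descending.
    """
    # List.replace: drop any stale entry for this user.
    entries = [(u, s) for (u, s) in top if u != user]
    # List.add: insert the fresh score, then keep only the best TOP_N.
    entries.append((user, solved))
    entries.sort(key=lambda pair: pair[1], reverse=True)
    return entries[:TOP_N]


top = []
for user, solved in [("ann", 3), ("bob", 7), ("cid", 1), ("dee", 4),
                     ("eve", 2), ("fay", 5), ("cid", 9)]:
    top = update_top(top, user, solved)
```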
Data mapping
Data mapping
An original idea: we need a newsfeed!
Users can comment on feed items
Users can reply to comments
What the customer is looking for
What the developer is thinking about
But how to ...
Eureka!
The item has a feed_id and a parent_id
You now have a navigable SQL data structure!
Data mapping - alternatives
Do not try this at home! (or any office)
A document db can help
{ feed : 42, user : 'luca.bonmassar', data : 'living the dream!', comments : [{ user : 'jonny', data : 'make sense!' }] }
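A minimal sketch of the document approach, following the shape of the document on the slide. Storing replies as a nested `comments` list inside each comment is an assumption, as are the helper names `add_comment` and `count_comments` and the text of the reply.

```python
import json

# One feed item stored as a single document, with comments nested inside it.
item = {
    "feed": 42,
    "user": "luca.bonmassar",
    "data": "living the dream!",
    "comments": [],
}


def add_comment(parent, user, text):
    """Append a comment under `parent` (a feed item or another comment)."""
    comment = {"user": user, "data": text, "comments": []}
    parent["comments"].append(comment)
    return comment


def count_comments(node):
    """Count comments and replies below `node`, at any depth."""
    return sum(1 + count_comments(child) for child in node["comments"])


first = add_comment(item, "jonny", "make sense!")
add_comment(first, "luca.bonmassar", "thanks!")  # a reply to the comment

# The whole thread is read or written in one operation, with no parent_id walk.
serialized = json.dumps(item)
```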
The Alien
The Alien
A type of object stored in a relational database
No (or weak) relations with any other table
Stored in the db because the db == persistence
User Settings
User preferences
user_id, data
Data as BLOB or TEXT
The Alien
The Alien is a key/value entity
Use a key/value storage to store it
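As a rough sketch of the idea, user settings can live under one key per user. A plain `dict` stands in for the key/value store here, and the key format `user:<id>:settings`, the function names, and the sample preferences are all assumptions made for illustration.

```python
import json

# A plain dict stands in for the key/value store (Redis, Memcache, ...).
store = {}


def settings_key(user_id):
    """Build the key for one user's settings (the format is an assumption)."""
    return f"user:{user_id}:settings"


def save_settings(user_id, settings):
    """Serialize the whole settings object under a single key."""
    store[settings_key(user_id)] = json.dumps(settings)


def load_settings(user_id):
    """Return the stored settings, or an empty dict for an unknown user."""
    raw = store.get(settings_key(user_id))
    return json.loads(raw) if raw is not None else {}


save_settings(42, {"language": "it", "newsletter": False})
prefs = load_settings(42)
```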
The binary Alien
The binary Alien
Like the Alien, but the payload is pure binary data
Common solution to store “small” images
Even worse: Base64 encoded binary
The binary Alien
Move the binary Alien to a binary content provider
S3 or Filesystem are good ones
Let the webserver access/serve them directly
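A minimal filesystem sketch of this advice: write the bytes to a directory the webserver can serve, and keep only the resulting path in the database. Naming files by content hash, the function name `store_blob`, and the temporary directory used as the media root are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(root, payload, suffix):
    """Write `payload` under `root` and return the relative path to keep in the db."""
    digest = hashlib.sha256(payload).hexdigest()
    # Two-character subdirectories keep any single directory from growing huge.
    relative = Path(digest[:2]) / f"{digest}{suffix}"
    target = Path(root) / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return relative.as_posix()


media_root = tempfile.mkdtemp()  # stand-in for the directory the webserver serves
avatar_bytes = b"\x89PNG fake image bytes"
avatar_path = store_blob(media_root, avatar_bytes, ".png")
```

Only `avatar_path`, a short string, would go into the users table; the image itself never passes through the database.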
The SQLQueue
The SQLQueue
Distributed components need to exchange data
Producers / Consumers
Backlog of work to be completed
SQL database (== persistence) as persistent queue
Mail delivery
The SQLQueue
No “relations”, like the Alien
Simulating a Queue using AUTO_INCREMENT ids and transactions
SQLQueue
In some countries, SQLQueue is considered a crime against software
Use message queues (Amazon SQS, MemcacheQ, StormMQ, RabbitMQ, ...)
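The producer/consumer shape of the mail delivery example can be sketched with a real queue instead of a table. Python's in-process `queue.Queue` stands in for a message broker here. The `send_mail` function, the message fields, and the addresses are hypothetical.

```python
import queue
import threading

# queue.Queue stands in for a real broker (SQS, RabbitMQ, ...).
outbox = queue.Queue()
delivered = []
STOP = object()  # sentinel telling the consumer to exit


def send_mail(message):
    """Hypothetical delivery step; a real worker would talk to a mail server."""
    delivered.append(f"sent to {message['to']}")


def consumer():
    """Block until a message arrives, handle it, repeat until STOP."""
    while True:
        message = outbox.get()
        if message is STOP:
            break
        send_mail(message)


worker = threading.Thread(target=consumer)
worker.start()

# Producer side: enqueue work and move on, with no polling of a SQL table.
for address in ("a@example.com", "b@example.com"):
    outbox.put({"to": address, "subject": "Welcome"})
outbox.put(STOP)
worker.join()
```

The consumer blocks on `get()` rather than polling, and ordering comes from the queue itself rather than from AUTO_INCREMENT ids and transactions.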
Overnormalized
Overnormalized
The process of organizing data to minimize redundancy
A larger schema is broken into smaller ones
user_id, email
user_id, phone_num
user_id, badge
Overnormalized
Pros:
reduced redundancy
less overhead
each table scales separately
Overnormalized
Cons:
... JOIN ... JOIN ... JOIN ... JOIN ... JOIN ... JOIN ... JOIN ...
Overnormalized
Cache the normalized data
Denormalize / keep a replica of the denormalized view
Use document db or key/value storage for the replica
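A small sketch of the denormalized replica: the three narrow tables from the earlier slide (`user_id, email`, `user_id, phone_num`, `user_id, badge`) are folded into one document per user, which could then be stored in a document db or key/value store. The sample rows and the function name `build_profiles` are made up for illustration.

```python
# Normalized tables from the slides, as lists of rows (sample values are made up).
emails = [(1, "ann@example.com"), (2, "bob@example.com")]
phones = [(1, "555-0100")]
badges = [(1, "early-bird"), (1, "puzzle-master"), (2, "early-bird")]


def build_profiles(emails, phones, badges):
    """Build one denormalized document per user_id from the three tables."""
    profiles = {}

    def profile(user_id):
        # Create the document on first sight of a user_id, then reuse it.
        return profiles.setdefault(
            user_id,
            {"user_id": user_id, "email": None, "phone_num": None, "badges": []},
        )

    for user_id, email in emails:
        profile(user_id)["email"] = email
    for user_id, phone_num in phones:
        profile(user_id)["phone_num"] = phone_num
    for user_id, badge in badges:
        profile(user_id)["badges"].append(badge)
    return profiles


profiles = build_profiles(emails, phones, badges)
```

Reading a profile is then a single lookup by `user_id`. The normalized tables stay the source of truth, and the replica has to be rebuilt or updated whenever they change.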
Unpredictable tomorrow
Unpredictable tomorrow
You are now part of a new Agile(TM) project
You are Agile(TM), so:
No complete specs
No complete use cases
How many wonderful things around me
A mobile App w/ internet backend
“Simple” use cases
User can log in
User can update their location
User can get all the many wonderful things they have around themselves
How many wonderful things around me
Data Model:
User
Places
Aliens!!! Aliens!!! Aliens!!!
...but sooner or later...
How many wonderful things around me
Users can send/receive friendship invitations
Users can import FB friends, Twitter followers
Users can interact with messages, pokes
Users can check-in into places
Users can share their checkins with friends
...
Unpredictable tomorrow
No silver bullets
Mix technologies
E.g. relational databases to handle relations
Migrations are painful, but always an option
One more thing
Recap
Does it make sense with the relational paradigm?
Do I need persistent storage or a relational database?
Thanks! Any Questions?
Contacts
linkedin.com/in/lucabonmassar
twitter.com/lucabonmassar
joind.in/talk/view/2939