Redis - The Hacker's Database

The Hacker’s Database Amir Salihefendic (amix)

Post on 28-Apr-2015

DESCRIPTION

PyCon Russia talk by Amir Salihefendic

TRANSCRIPT

Page 1: Redis - The Hacker's Database

The Hacker’s Database

Amir Salihefendic (amix)

Page 2: Redis - The Hacker's Database

About Me

• Co-founder and former CTO of Plurk.com
• Helped Plurk scale to millions of users, billions of page views and 8+ billion unique data items. With minimal hardware!
• Founder of Doist.io, creators of Todoist and Wedoist

Page 3: Redis - The Hacker's Database

Outline of the talk

• Plurk Timelines optimization: How we saved hundreds of thousands of dollars
• What’s great about Redis?
• Different sample implementations:
  – redis_wrap
  – redis_graph
  – redis_queue
• Advanced analytics using Redis
  – bitmapist and bitmapist.cohort

Page 4: Redis - The Hacker's Database
Page 5: Redis - The Hacker's Database

Problem

Exponential data growth in Social Networks

[Chart: data size vs. number of users]

Page 6: Redis - The Hacker's Database

The Easy Solution

Throw money at the problem

Page 7: Redis - The Hacker's Database

The Smarter Solution

Reduce to linear data growth

[Chart: data size vs. number of users]

Page 8: Redis - The Hacker's Database

Example: Timelines

Page 9: Redis - The Hacker's Database

Example: Timelines

[Chart: timeline data size vs. number of users]

Page 10: Redis - The Hacker's Database

Example: Timelines

[Chart: timeline data size vs. number of users]

Solution: Cheating! Make timelines a fixed size - 500 messages

• O(1) insertion
• O(1) update
• Cacheable
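As a rough illustration of the fixed-size idea (not Plurk's actual code), a capped timeline can be sketched in plain Python with `collections.deque`. The `Timeline` class and its method names are hypothetical. With Redis itself, a similar effect is commonly obtained by pairing LPUSH with LTRIM on a list key.

```python
from collections import deque

TIMELINE_SIZE = 500  # fixed cap taken from the slide


class Timeline(object):
    """Hypothetical fixed-size timeline, newest message first."""

    def __init__(self, max_size=TIMELINE_SIZE):
        # A deque with maxlen silently drops items from the opposite end
        # once the cap is reached, so memory per timeline stays bounded.
        self.messages = deque(maxlen=max_size)

    def push(self, message_id):
        # O(1) insertion at the front; the oldest entry falls off the back.
        self.messages.appendleft(message_id)

    def latest(self, count=20):
        # Return up to `count` of the newest message ids.
        return [m for _, m in zip(range(count), self.messages)]


# Usage: a tiny cap of 3 makes the trimming visible.
timeline = Timeline(max_size=3)
for message_id in range(5):
    timeline.push(message_id)
```

Because every timeline holds at most a fixed number of entries, total timeline storage grows linearly with the number of users rather than with the number of messages ever posted.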

Page 11: Redis - The Hacker's Database

Plurk’s timelines migration path

[Diagram of the migration path; the only label that survives extraction is "Tokyo Tyrant"]

• Problem with MySQL and Tokyo Tyrant? Death by IO

Page 12: Redis - The Hacker's Database

What’s great about Redis?

• Everything is in memory, but the data is persistent.
• Amazing performance: 100,000+ SETs per second, 80,000+ GETs per second

Page 13: Redis - The Hacker's Database

Redis Rich Datatypes

• Relational databases: schemas, tables, columns, rows, indexes etc.
• Column databases (BigTable, HBase etc.): schemas, columns, column families, rows etc.
• Redis: key-value, sets, lists, hashes, bitmaps, etc.

Page 14: Redis - The Hacker's Database

Redis datatypes resemble datatypes in programming languages. They are natural to us!

Page 15: Redis - The Hacker's Database

redis_wrap

• Implements a wrapper for Redis datatypes so they mimic the datatypes found in Python
• 100 lines of code
• https://github.com/Doist/redis_wrap

Page 16: Redis - The Hacker's Database

redis_wrap

from redis_wrap import get_list, get_hash, get_set

# Mimic of Python lists
bears = get_list('bears')
bears.append('grizzly')
assert len(bears) == 1
assert 'grizzly' in bears

# Mimic of hashes
villains = get_hash('villains')
assert 'riddler' not in villains
villains['riddler'] = 'Edward Nigma'
assert 'riddler' in villains
assert len(villains.keys()) == 1
del villains['riddler']
assert len(villains) == 0

# Mimic of Python sets
fishes = get_set('fishes')
assert 'nemo' not in fishes
fishes.add('nemo')
assert 'nemo' in fishes
for item in fishes:
    assert item == 'nemo'

Page 17: Redis - The Hacker's Database

redis_graph

• Implements a simple graph database in Python
• Can scale to a few million nodes easily
• You could use something similar to implement LinkedIn’s “who is connected to who” feature
• Under 40 lines of code
• https://github.com/Doist/redis_graph
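The “who is connected to who” idea comes down to set operations on per-node adjacency sets. The sketch below uses plain Python sets in a dict as a stand-in for one Redis set per node. The function names are hypothetical and are not part of redis_graph, and unlike redis_graph's directed edges it assumes undirected connections. In Redis, the intersection step corresponds to the SINTER command.

```python
from collections import defaultdict

# Adjacency sets: node -> set of neighbors (stand-in for one Redis set per node)
edges = defaultdict(set)


def connect(a, b):
    # Assumption: connections are mutual, so store both directions.
    edges[a].add(b)
    edges[b].add(a)


def mutual_connections(a, b):
    # Nodes connected to both a and b: a plain set intersection.
    return edges[a] & edges[b]


def second_degree(a):
    # Connections of connections, excluding a and a's direct connections.
    result = set()
    for friend in edges[a]:
        result |= edges[friend]
    return result - edges[a] - {a}


connect('frodo', 'gandalf')
connect('frodo', 'sam')
connect('sam', 'gandalf')
connect('gandalf', 'aragorn')
```

Here `mutual_connections('frodo', 'sam')` yields the set containing only `'gandalf'`, and `second_degree('frodo')` yields the set containing only `'aragorn'`.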

Page 18: Redis - The Hacker's Database

redis_graph

from redis_graph import *

# Adding an edge between nodes
add_edge(from_node='frodo', to_node='gandalf')
assert has_edge(from_node='frodo', to_node='gandalf') == True

# Getting neighbors of a node
assert list(neighbors('frodo')) == ['gandalf']

# Deleting edges
delete_edge(from_node='frodo', to_node='gandalf')

# Setting node values
set_node_value('frodo', '1')
assert get_node_value('frodo') == '1'

# Setting edge values
set_edge_value('frodo_baggins', '2')
assert get_edge_value('frodo_baggins') == '2'

Page 19: Redis - The Hacker's Database

redis_graph: The implementation

from redis_wrap import *

#--- Edges ----------------------------------------------
def add_edge(from_node, to_node, system='default'):
    edges = get_set(from_node, system=system)
    edges.add(to_node)

def delete_edge(from_node, to_node, system='default'):
    edges = get_set(from_node, system=system)
    key_node_y = to_node
    if key_node_y in edges:
        edges.remove(key_node_y)

def has_edge(from_node, to_node, system='default'):
    edges = get_set(from_node, system=system)
    return to_node in edges

def neighbors(node_x, system='default'):
    return get_set(node_x, system=system)

#--- Node values ----------------------------
def get_node_value(node_x, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).get(node_key)

def set_node_value(node_x, value, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).set(node_key, value)

#--- Edge values -----------------------------
def get_edge_value(edge_x, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).get(edge_key)

def set_edge_value(edge_x, value, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).set(edge_key, value)

Page 20: Redis - The Hacker's Database

redis_queue

• Implements a queue in Python using Redis
• Used to process millions of background tasks on Plurk / Todoist / Wedoist daily (billions in total)
• Implementation: 18 lines (the “real” implementation is a bit bigger)
• https://github.com/Doist/redis_simple_queue

Page 21: Redis - The Hacker's Database

redis_queue

from redis_simple_queue import *

delete_jobs('tasks')
put_job('tasks', '42')
assert 'tasks' in get_all_queues()
assert queue_stats('tasks')['queue_size'] == 1
assert reserve_job('tasks') == '42'
assert queue_stats('tasks')['queue_size'] == 0

Page 22: Redis - The Hacker's Database

redis_queue: Implementation

from redis_wrap import *

# Function names match the usage example (put_job / reserve_job)
def put_job(queue, job_data, system='default'):
    get_list(queue, system=system).append(job_data)

def reserve_job(queue, system='default'):
    return get_list(queue, system=system).pop()

def delete_jobs(queue, system='default'):
    get_redis(system).delete(queue)

def get_all_queues(system='default'):
    # redis-py returns the matching keys as a list
    return get_redis(system).keys('*')

def queue_stats(queue, system='default'):
    return {
        'queue_size': len(get_list(queue, system=system))
    }

Page 23: Redis - The Hacker's Database

bitmapist and bitmapist.cohort

• Implements an advanced analytics library on top of Redis bitmaps. Saved us $2000 USD/month (Mixpanel)!
• bitmapist: https://github.com/Doist/bitmapist
• bitmapist.cohort: Cohort analytics (retention)

Page 24: Redis - The Hacker's Database

bitmapist: What does it help with?

• Has user 123 been online today? This week?
• Has user 123 performed action "X"?
• How many users have been active this month?
• How many unique users have performed action "X" this week?
• What % of users that were active last week are still active?
• What % of users that were active last month are still active this month?
• Bitmapist can answer this for millions of users, and most operations are O(1), using very small amounts of memory!
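To make these questions concrete: if each user id is one bit position, "has user 123 been active?" becomes a single-bit lookup, and retention becomes an AND followed by a count of set bits. The sketch below uses Python integers as bitmaps purely for illustration. The helper names are hypothetical and are not bitmapist's API.

```python
def mark(bitmap, user_id):
    # Set the bit at position user_id and return the new bitmap.
    return bitmap | (1 << user_id)


def is_marked(bitmap, user_id):
    # Single-bit lookup: was this user seen in the period?
    return (bitmap >> user_id) & 1 == 1


def count(bitmap):
    # Number of set bits, i.e. number of unique users.
    return bin(bitmap).count('1')


def retention_percent(previous, current):
    # Share of users in `previous` that also appear in `current`.
    base = count(previous)
    if base == 0:
        return 0.0
    return 100.0 * count(previous & current) / base


last_week = 0
for user_id in (1, 2, 3, 123):
    last_week = mark(last_week, user_id)

this_week = 0
for user_id in (2, 123, 500):
    this_week = mark(this_week, user_id)
```

With this data, two of the four users active last week (2 and 123) are active again this week, so `retention_percent(last_week, this_week)` is 50.0.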

Page 25: Redis - The Hacker's Database

What are bitmaps?

• Operations: SETBIT, GETBIT, BITCOUNT, BITOP
• SETBIT somekey 8 1
• GETBIT somekey 8
• BITOP AND destkey somekey1 somekey2
• http://en.wikipedia.org/wiki/Bit_array
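A minimal in-memory model of these four commands, assuming Redis's documented convention that bit offset 0 is the most significant bit of the first byte. The functions below are hypothetical stand-ins that operate on a `bytearray`. They are not calls to a Redis client, and they simplify details such as return values.

```python
def setbit(buf, offset, value):
    # Grow the buffer with zero bytes if needed, then set or clear one bit.
    byte_index, bit_index = divmod(offset, 8)
    if len(buf) <= byte_index:
        buf.extend(bytearray(byte_index + 1 - len(buf)))
    mask = 0x80 >> bit_index  # offset 0 is the most significant bit
    if value:
        buf[byte_index] |= mask
    else:
        buf[byte_index] &= ~mask & 0xFF


def getbit(buf, offset):
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(buf):
        return 0  # bits past the end of the value read as 0
    return (buf[byte_index] >> (7 - bit_index)) & 1


def bitcount(buf):
    # Total number of set bits in the value.
    return sum(bin(byte).count('1') for byte in buf)


def bitop_and(a, b):
    # The shorter input is treated as if it were zero-padded.
    size = max(len(a), len(b))
    a = a + bytearray(size - len(a))
    b = b + bytearray(size - len(b))
    return bytearray(x & y for x, y in zip(a, b))


# Mirrors the slide: SETBIT somekey 8 1, then GETBIT somekey 8
somekey1 = bytearray()
setbit(somekey1, 8, 1)

somekey2 = bytearray()
setbit(somekey2, 8, 1)
setbit(somekey2, 3, 1)

destkey = bitop_and(somekey1, somekey2)
```

Setting bit 8 touches the first bit of the second byte, so `somekey1` ends up as the two bytes `\x00\x80`. The AND keeps only bit 8, the one bit set in both inputs.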

Page 26: Redis - The Hacker's Database

bitmapist: Using it

from datetime import datetime, timedelta
from bitmapist import mark_event, MonthEvents, WeekEvents, BitOpAnd

now = datetime.utcnow()
last_month = now - timedelta(days=30)

# Mark user 123 as active and has played a song
mark_event('active', 123)
mark_event('song:played', 123)

# Answer if user 123 has been active this month
assert 123 in MonthEvents('active', now.year, now.month)
assert 123 in MonthEvents('song:played', now.year, now.month)

# How many users have been active this week?
print(len(WeekEvents('active', now.year, now.isocalendar()[1])))

# Perform bit operations. How many users that
# have been active last month are still active this month?
active_2_months = BitOpAnd(
    MonthEvents('active', last_month.year, last_month.month),
    MonthEvents('active', now.year, now.month)
)
print(len(active_2_months))

Page 27: Redis - The Hacker's Database

bitmapist.cohort: Manage retention!

http://amix.dk/blog/post/19718

Page 28: Redis - The Hacker's Database

• Goal: Inventing a modern way to work together
• Join an amazing team of 13 people from all around the world. A profitable business. 500,000+ users.
• Work from anywhere. Hacker friendly culture. Python. Competitive salaries.
• We are hiring: [email protected] www.doist.io

Page 29: Redis - The Hacker's Database

Questions and Answers

• Slides will be posted to http://amix.dk/
• For “offline” questions contact: [email protected]