Persistence of Memory: In-Memory Is Not Often the Answer


On the Persistence of Memory… In Database Systems

Picture credit: Creative Commons

By Neil Raden, Hired Brains, Inc.
December, 2012

© 2012 Hired Brains Inc. All Rights Reserved

Table of Contents

Executive Summary
The Basics
Database Memory and Processing Models
In-Memory Database
Why is in-memory, a fairly old concept, interesting again?
Limitations of iMDB
  Cost
  Persistence
  Volume
  Dual-Purpose OLTP and Analytics
  Not so “green”
The Hybrid DBMS
Compare and Contrast
Conclusion
About the Author

Executive Summary

The recent drop in computer memory prices and the introduction of early implementations of in-memory database solutions have raised the level of interest in in-memory databases, but the topic is not new. In fact, there are literally dozens of in-memory database products, some in production for decades, but due to the prohibitive cost differential between memory-based systems and disk-based systems, none have found a place beyond certain niche markets. Now the drastic and remarkable drop in the cost of memory, combined with an equally remarkable growth in density and capacity, is driving the discussion into the mainstream of computing architectures.

For the purposes of discussion, we refer to in-memory database systems as iMDB, and to current relational database systems incorporating large memory models with attached storage (including traditional magnetic disk and solid-state devices) as hybrid-DBMS. Though the discussion is occasionally technical, our conclusions are that:

• iMDB leverage lower-cost RAM for storage but still lack persistence and data scalability, which limits the types of solutions the iMDB architecture can support.

• Hybrid-DBMS is a proven technology that provides high performance and a flexible architecture to support a variety of analytics applications.

The Basics

 


All database management systems (DBMS), in fact virtually all programs in conventional computing environments, behave exactly the same way. A central processing unit (CPU) performs a single, very low-level instruction on a single piece of data. While complex application programs like DBMS have many layers of functionality and can be described logically as a set of higher-level interworking pieces, the CPU has utterly no insight into this; it just chugs along one instruction at a time. If you were to sit inside a CPU and watch its stream of sequential processes, you would be unable to determine what the controlling program was doing. So database software, or really any software, is just a logical structure that encapsulates all of the smaller steps. When things get calculated, they bear no resemblance to the whole. A CPU doesn’t know what a join or an index is.

How those bits of work are presented to the CPU is the heart of the application design. In other words, though there is no difference in how CPUs execute from one application to another, the order of those instructions is the key to performance.

Each step in execution is composed of a single instruction and a single piece of data (though today’s CPUs are composed of multiple “cores,” essentially multiple CPUs on a single chip). The instruction and the data have to be presented to the CPU through memory, either system RAM or a memory cache on the CPU itself. It makes no difference whether the application is “in-memory” or disk-based; the CPU has to be presented with the instruction (actually, the “instruction set” is burned into the CPU; what is presented to it is an indication of which instruction to execute). For this reason, an in-memory architecture, where all instructions and data are in RAM, should, in theory, provide superior performance compared to a DBMS that must fetch data from remote mechanical disk drives.

Solid-state drives (SSD), mentioned above, use solid-state memory chips, typically flash memory (NAND), instead of spinning magnetic disk drives. Flash memory is less expensive and slower than RAM/SRAM, but it is non-volatile, meaning it retains data without power; it does not lose data in the case of a system shutdown. RAM is volatile: it must be powered continuously, and it requires backup, typically to conventional disk drives, for reliability.

 

One could say that a DBMS with SSD instead of traditional disks could be an in-memory device, but there are fundamental differences. First, the “memory” chips of an SSD are part of a disk drive “card” or assembly that uses the same block addressing as the disks it replaces. In other words, even though the seek time for finding data on an SSD is at least an order of magnitude better than on a spinning magnetic disk (this is a generalization), there is still a call for external data, handled by the disk controller and passed to RAM. An interesting arrangement, typically used for add-on accelerators rather than primary database operations, is SSDs constructed from SRAM, further improving the seek time of the drives. This is a special-purpose and very expensive architecture and is not considered further here.

 

Database Memory and Processing Models

 

To clear up confusion between the various models for memory in databases, it’s useful to describe the predominant versions. The two predominant memory models for the most common database systems are shared memory and shared nothing. In both, memory is used only for processing, not for persistent storage. This is the essential difference between today’s iMDBs and more conventional on-disk or hybrid systems.

 

In the shared memory model, all database operations use the same single aggregation of memory, and the system allocates its memory and processing tasks; all memory is available to every processor. In a shared nothing system, each separate node of processors and memory does its own work in parallel, typically controlled by a master node (which can be physical or virtual). In reality, nodes in a shared nothing environment may themselves operate as independent shared memory nodes. But in neither case is data stored in memory until it is called for. The exception is when data is cached (frequently used data is “pinned” in memory), but it is still volatile and can be flushed at any time.

 

iMDB operate more or less like shared memory systems, but everything, including the operating system, software programs (executables), workspace, indexes and data, is stored in RAM. When these systems are scaled out with multiple nodes connected by a network, they operate more like a grid or distributed network than like a true MPP-engineered system. However, the concepts of shared memory versus shared nothing are a little obsolete now, as CPUs themselves are multi-core, meaning the processors are capable of parallel processing, provided the software program (the DBMS) has been designed to take advantage of it.

 

This description is a simplification and there are many exceptions, but in general, no database management system stores data persistently in memory, except, of course, iMDB. The difference between the various memory models described above is how memory is used for processing data.
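The shared nothing model described above can be illustrated with a minimal, hypothetical sketch: each node owns a private partition of the rows, and a master node scatters a query to every node and gathers the partial results. The class names and hash partitioning are illustrative assumptions, not any vendor’s implementation.

```python
class Node:
    """One shared-nothing node: private memory, private data partition."""
    def __init__(self, rows):
        self.rows = rows  # this node's partition; no other node can see it

    def local_count(self, predicate):
        # Each node scans only its own partition (in parallel in a real system).
        return sum(1 for r in self.rows if predicate(r))

class MasterNode:
    """Coordinates the nodes; holds no data itself."""
    def __init__(self, nodes):
        self.nodes = nodes

    def count(self, predicate):
        # Scatter the query to every node, then gather and aggregate the results.
        return sum(n.local_count(predicate) for n in self.nodes)

# Rows hash-partitioned across 4 nodes, as an MPP loader might do.
rows = [{"id": i, "amount": i * 10} for i in range(100)]
nodes = [Node([r for r in rows if r["id"] % 4 == k]) for k in range(4)]
master = MasterNode(nodes)

print(master.count(lambda r: r["amount"] >= 500))  # -> 50
```

In a real shared nothing system the per-node scans run concurrently on separate hardware; the key property sketched here is that no node ever reads another node’s partition.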

 

In-Memory Database

 

It is an unassailable truth that processing data from memory is orders of magnitude faster than retrieving it from a disk drive, but that is only a small part of the story. Historically, CPUs have been “I/O bound,” meaning they spent a significant amount of time waiting for requested data to arrive, requiring extreme countermeasures in software design to minimize the latency. With data streaming to processors at the speed of random-access memory (RAM), just the opposite situation can occur: the CPUs may become flooded with data and unable to process it as quickly as it is presented. The point cannot be stressed enough: merely boosting the available RAM does not guarantee smooth, faster execution of existing programs. This turn of events calls for careful engineering and balance. In other words, the performance of complex applications is rarely resolved by changing one thing; it usually requires rethinking the whole approach. The result is that software migration to in-memory usually requires a great deal of re-work; it is not just move and drop.

 

Even the notion of iMDB is a bit of a misnomer, as there is still the requirement for separate conventional storage devices for mirroring everything for persistence and keeping the iMDB refreshed and reliable. Systems can fail, which means in-memory systems still have to maintain multiple copies of the data, and perform a complete reload if the system fails. Adding all of these factors together can make the effort quite expensive despite the seemingly reasonable price of memory today (though at multiple terabytes, you will feel the pinch). In addition, to make maximum use of RAM, all database systems use compression of data, to one degree or another. iMDBs typically employ aggressive compression algorithms to maximize the amount of data that can be put in working memory. The back-up of an iMDB is usually lightly compressed or uncompressed so it can be read by other processes, among other reasons. Assuming a realistic 3.5x compression for an iMDB (not all RAM is available for the data), the back-up drives will need to be 5x the size of RAM, and there may be multiple archives, and the backups themselves will likely be mirrored. With even average-sized analytical data warehouses today running about 50 terabytes (there are, of course, much larger ones), an iMDB to accommodate those would need 75-100 TB of separate disk drives to handle back-ups, snapshots, logs and staging areas.
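The sizing argument above can be put into a back-of-envelope calculation. The function below uses the paper’s stated assumptions (3.5x in-memory compression, near-uncompressed backups, mirrored archives); the parameter names and the copy count are illustrative, not vendor specifications.

```python
def backup_footprint_tb(raw_data_tb, imdb_compression=3.5,
                        backup_compression=1.0, copies=2):
    """Estimate RAM needed vs. backup disk needed for an iMDB, in TB."""
    ram_needed = raw_data_tb / imdb_compression   # RAM holding compressed data
    one_backup = raw_data_tb / backup_compression # backups stay near raw size
    return ram_needed, one_backup * copies        # mirrored/multiple archives

ram_tb, disk_tb = backup_footprint_tb(50)  # the 50 TB warehouse from the text
print(f"RAM for data: ~{ram_tb:.1f} TB; backup disk: ~{disk_tb:.0f} TB")
# -> RAM for data: ~14.3 TB; backup disk: ~100 TB
```

With two mirrored copies the result lands at the top of the 75-100 TB range cited above; fewer archives or lightly compressed backups pull it toward the bottom.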

Another thing to consider is that a database still has to perform all of the database functions, from loading data to presenting it as the result of a query. Conventional relational database technology, including platforms that are designed specifically for data warehousing and analytical work, as opposed to transactional processing, must employ a host of services to be useful to an enterprise, including:

• Workload management for efficient management of resources
• Security
• Reliability
• High availability
• Use of performance statistics for query optimization

 

They must also support, in addition to traditional row-based schema, columnar organization of the data, which is particularly effective for wide tables with many attributes but less effective with more normalized schema, and which has some serious drawbacks in the ability to update the database in real time. But columnar orientation is not a feature limited to iMDBs; most analytical database systems incorporate, or even operate solely in, columnar mode.

 

Why is in-memory, a fairly old concept, interesting again?

 

iMDBs have been used for quite some time, but they have always been limited primarily by three factors: the cost of memory, the size of the database, and the persistence of data. Today, a dollar will buy 500 to 1,000 times as much memory as it did in 1995, and the capacity per square inch of the chips has increased in roughly inverse proportion to the price. Memory speeds have increased as well, though not as dramatically. Where the amount of data that could be stored in early in-memory systems was too small for most applications, 1,000 times more memory might be enough for in-memory to be feasible.

 

 

[Diagram: simplified comparison of an iMDB node and a hybrid-DBMS node]

This extremely simplified diagram depicts the essential (but certainly not all) differences between an iMDB and a hybrid-DBMS. The iMDB maximizes the use of RAM but uses essentially the same hardware architecture of two CPUs with levels of on-board cache, with RAM holding the entire database, the database software, working space, caches and embedded functionality. The only difference in the hybrid-DBMS is less reliance on RAM and the ability to address vastly greater amounts of data from the storage subsystem. The hybrid-DBMS has documented databases of greater than a petabyte. iMDBs typically scale out to 16 servers with up to 1 terabyte of RAM each, but with a significant amount of that RAM taken up by the operating system, working memory, etc. Therefore, even with 5x compression, the maximum amount of uncompressed data per system is no more than about 40 TB. Given the expense of these large iMDB systems, scaling out to the sizes that are needed today is difficult.
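The capacity ceiling above follows from simple arithmetic. The sketch below reproduces it; the usable-RAM fraction is an assumption (the operating system, working memory and caches consume the rest), chosen so the numbers match the ~40 TB figure in the text.

```python
def imdb_capacity_tb(servers=16, ram_per_server_tb=1.0,
                     usable_fraction=0.5, compression=5.0):
    """Uncompressed data (TB) a scaled-out iMDB cluster can hold."""
    usable_ram = servers * ram_per_server_tb * usable_fraction
    return usable_ram * compression  # compression stretches the usable RAM

print(imdb_capacity_tb())  # -> 40.0
```

Even doubling RAM per server or compression only moves the ceiling into the low hundreds of terabytes, well short of petabyte-class warehouses.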

 

Limitations of iMDB

In-memory databases are constrained by several overwhelming limitations:

• No matter how inexpensive RAM is today compared to its historical cost, it is still considerably more expensive than its alternatives, limiting its usefulness for enterprise-level systems.
• Data cannot persist in memory indefinitely. It is inevitable that something will fail, which requires mechanisms to protect the data that can erode the value proposition.
• With today’s data volumes, it is still not practical to use an in-memory approach for a data warehouse.
• iMDB rely on the system being up 24/7.

 

Cost

Though RAM is 10,000 times faster to read than a mechanical disk drive, data volumes today are enormous and growing. A petabyte-sized in-memory database would cost more than $5 million, perhaps twice that. SSD for that capacity would cost 1/5 to 1/10 the price, and a hybrid-DBMS with a hot/warm/cold hierarchical storage architecture would cost far less than that.

Persistence

In-memory architecture still requires conventional storage. RAM is volatile, and if something fails, or even just hiccups, there can be data loss. Therefore, everything in memory has to have a copy on less volatile storage devices. Updating the memory requires log files, snapshots and checkpoints, which can slow down processing.

Volume

In-memory cannot economically, or even practically, scale to the volumes of today’s data warehouses. Ten years ago, a terabyte-sized data warehouse was remarkable, but today there are dozens, perhaps even more than a hundred, greater than a petabyte, one thousand times larger. Projections are that this growth rate is not diminishing.

Dual-Purpose OLTP and Analytics

Some iMDB products promise the ability to perform OLTP and analytical processing on the same platform, with the same data. This would be a real advantage, as it would alleviate the need to extract and transform data from operational systems and provide analytical support without additional infrastructure. Unfortunately, this is currently impossible.

 

iMDB platforms generally cannot support OLTP because they have to wait for a transaction to complete on disk to be ACID compliant. When data is updated in memory, it is held in log files, usually stored on SSD drives. iMDB platforms use this disk-based “persistent” layer to weather a node failure, which, in a narrow sense, suggests they have ACID properties. When the iMDB node comes back up (after the failed part is replaced or the cold standby node takes over), the data that is resident on the disk “persistent” layer is reloaded back into memory. This can be done in one of two ways: “lazy,” where the data is reloaded as queries enter the system and request a specific table (which doesn’t really make sense, since the iMDB appears in memory as one dimensional table), or “full,” where queries must wait until all the data is reloaded. In both cases, the log files stored on disk or flash have to be read and applied.
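The lazy-versus-full recovery choice above can be sketched in a few lines. This is a toy model under stated assumptions: the class and mode names are hypothetical, and real systems also replay log files on top of the reloaded tables.

```python
class RecoveringNode:
    """Toy iMDB node restarting from its disk 'persistent' layer."""
    def __init__(self, disk_tables, mode="full"):
        self.disk = dict(disk_tables)  # the disk-based persistent layer
        self.ram = {}
        self.mode = mode
        if mode == "full":
            # Full reload: queries wait until every table is back in RAM.
            self.ram.update(self.disk)

    def query(self, table):
        if table not in self.ram:
            if self.mode == "lazy":
                # Lazy reload: pull the table in from disk on first reference.
                self.ram[table] = self.disk[table]
            else:
                raise KeyError(table)
        return self.ram[table]

node = RecoveringNode({"orders": [1, 2], "items": [3]}, mode="lazy")
print(len(node.ram))      # -> 0 (nothing loaded yet)
node.query("orders")
print(sorted(node.ram))   # -> ['orders'] (loaded on demand)
```

Lazy reload makes the node available sooner at the cost of slow first queries; full reload gives predictable query latency at the cost of a longer outage.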

 

There are features to handle different kinds of failure, though. Both the SSD area and the disk persistent layer have RAID capability to cover for a disk failure. So if a node has a problem but keeps power, then all may be OK; it is an error-dependent issue. If there is a problem with a memory chip, it is unlikely the data will survive, requiring a total reload. If a node loses power, then a total reload of all the data that was on that node is required.

Not so “green”

At a time when most vendors are formulating a “green” message, it turns out that iMDBs require a lot of power, considerably more than spinning drives and significantly more than solid-state drives. RAM is volatile and needs to be powered 24/7 if the data is to persist.

 

The Hybrid DBMS

iMDB vendors often portray disk-based systems as dinosaurs that have outlived their usefulness, but in fact they are the result of 30 years of research and development by some of the most brilliant minds in the technology industry, and they have hardly been standing still. In the same way relational database technology gradually gained new hardware capabilities and evolved to become the hybrid-DBMS, it seems likely that the major database vendors will continue to evolve to leverage the advantages of more memory over disk drives. The dramatic cost reductions of memory have benefits that accrue to hybrid-DBMSs too: solid-state disk drives replacing traditional magnetic drives, with improvements in I/O speed. Teradata Virtual Storage, for example, automatically manages the movement of hot and cold data. Large memory models are common, too, even if the persistent data remains on attached storage instead of completely in memory.

Another consideration is that for most database applications, there is a clear difference between hot and cold data; in other words, data that is in use at the moment as opposed to data that is used less frequently. This tilts the decision between disk-only and in-memory toward an in-between alternative: a hybrid scheme with large memory, SSD drives, and less expensive, slower HDD for warm or cold data. Hybrid-DBMS leverage the speed of SSD to reduce query response time by cutting the painful delays introduced by lengthy I/O queues in HDD storage. A query requires many I/O operations to complete, so the time spent with I/O requests in storage queues has a direct impact. Not only do the speed and parallel channel capability of SSD result in 40x faster I/O completions, but the queues in the HDD are shortened by aiming 80% of I/O at the SSD; this can result in up to a 60x improvement in average response times.

 

A hybrid scheme requires not only a physical assemblage of devices, but also an intelligent data manager that continually and transparently optimizes the architecture by moving data to its best location. The figure below represents Teradata’s version of such a system.²

[Figure: Teradata’s hybrid node architecture]

Notice that in this scheme, each node is balanced with a combination of CPUs and their characteristics, the amount of RAM, and the storage devices. This provides optimum balance between processing, memory and addressable storage, which leads to optimal performance. It does, however, somewhat limit configuration flexibility, as the drives and CPUs are fixed.

² Teradata are working on extending the data management to the memory layer.
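The “intelligent data manager” idea above, tracking access frequency and migrating data to the tier its temperature warrants, can be sketched as follows. The tier names, thresholds and counter-reset policy are illustrative assumptions, not Teradata’s algorithm.

```python
class TieringManager:
    """Toy hot/warm/cold placement manager for storage blocks."""
    def __init__(self):
        self.tier = {}   # block id -> current tier (HDD, SSD or RAM)
        self.hits = {}   # block id -> accesses since last rebalance

    def touch(self, block):
        self.hits[block] = self.hits.get(block, 0) + 1
        self.tier.setdefault(block, "HDD")  # new blocks land on cold storage

    def rebalance(self, ssd_threshold=3, ram_threshold=10):
        # Periodically promote hot blocks and demote ones that cooled off.
        for block, hits in self.hits.items():
            if hits >= ram_threshold:
                self.tier[block] = "RAM"
            elif hits >= ssd_threshold:
                self.tier[block] = "SSD"
            else:
                self.tier[block] = "HDD"
        self.hits = {b: 0 for b in self.hits}  # decay: restart the counters

mgr = TieringManager()
for _ in range(12):
    mgr.touch("hot_table")
for _ in range(4):
    mgr.touch("warm_table")
mgr.touch("cold_table")
mgr.rebalance()
print(mgr.tier)  # hot_table on RAM, warm_table on SSD, cold_table on HDD
```

Resetting the counters after each rebalance is what lets once-hot data drift back down the hierarchy, which is the “continually and transparently” part of the scheme.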

 

Compare and Contrast

Today there are two ways to store data electronically: on arrays of solid-state memory chips (on either a memory bus or on SSD) or on magnetic disk drives. Solid-state chips are obviously faster than magnetic drives (although in some cases the differential can be overcome with good platform design and workload management). Solid-state chips are considerably more expensive than magnetic drives, and volatile RAM chips are considerably more expensive (and faster) than non-volatile flash. We can’t see the future with perfect clarity, but it is likely that for the foreseeable future this stratification of memory and storage will not change, even as the price/performance of each continues to improve. The faster RAM chips will remain volatile, making full in-memory databases impractical for most uses.

iMDB lack the balance of CPU and storage, which can lead to flooding of the CPUs. iMDB trades the potential for I/O latency for the very real possibility of RAM out-performing the processors: without an I/O bottleneck, processors can become saturated. This is something that software developers should be aware of, and design for, but given the relative recency of certain iMDBs, these features may not be well developed. It may be the case that client applications need to be rewritten, not only to take advantage of the memory resources but to keep them from bogging down.

iMDB rely on large banks of very fast, expensive RAM, but also on the other types of memory and storage for high availability and backup. Hybrid-DBMS rely on the same collection of memory and storage types, but in different proportions. A hybrid system uses solid-state memory judiciously and attempts to keep as much data pinned in memory as possible for active work, but relies on only one mechanism for persistent storage.

 

Conclusion

iMDB vendors claim that in-memory will replace the traditional hybrid-DBMS, but unless there are new laws of physics, holding persistent data for months or years simply isn’t feasible without resorting to a hybrid in-memory and disk-based system. In a way, one can think of an iMDB as merely an accelerator for a conventional database, because it cannot meet the requirements of durability on its own.

On the other hand, hybrid-DBMS are based on proven data warehousing technologies, offer flexible architectures, and deliver high performance with automatic storage management.

It would be easy to predict that iMDBs, and that includes DBMS with all-SSD drives, will eventually overtake disk-based systems. However, the cost of memory, no matter what it is, will still be greater than that of disk drives, and though it is impossible to predict, the amount of data captured and analyzed will continue to grow at a rate faster than the price/GB of memory improves.

 

 

ABOUT THE AUTHOR

Neil Raden, based in Santa Fe, NM, is an industry analyst and active consultant, widely published author and speaker, and the founder of Hired Brains, Inc., http://www.hiredbrains.com. Hired Brains provides consulting, systems integration and implementation services in Data Warehousing, Business Intelligence, “Big Data,” Decision Automation and Advanced Analytics for clients worldwide. Hired Brains Research provides consulting, market research, product marketing and advisory services to the software industry.

Neil was a contributing author to one of the first (1995) books on designing data warehouses, and he is more recently the co-author of Smart (Enough) Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions (Prentice-Hall, 2007). He welcomes your comments at [email protected] or at his blog, Competing on Decisions.