FOSS4G in the Cloud: Using Open Source to Build Cloud-Based Spatial Infrastructure

Post on 12-Jul-2015


Page 1: FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastructure

FOSS4G in the Cloud

Mohamed Sayed [email protected]

Version 092013 License: CC-BY-SA

Agenda

• Disclaimers
• Goals/Motives
• The historical path to ‘Cloud Computing’
• ‘Definition’ of cloud computing
• FOSS4G in Cloud use cases
• AWS: Components and Services
• Building for the cloud
  – Architectural patterns for Cloud Services
  – Cultural changes
  – Process changes
  – Things to remember
• Common FOSS4G tasks in AWS
  – Importing OSM data into PostGIS
  – mod_tile/Mapnik
  – GWC/GeoServer
• Questions?

Disclaimers

• The work presented was funded personally and done during my vacation. All opinions are my own and not those of my employer.
• I am not affiliated with AWS in any way other than being a customer. I choose them when that choice makes sense and would use others where applicable.
• This is still a work in progress. YMMV.

Goals/Motives

• Goals
  – Learn or validate some ideas.
  – Get some feedback on what to do next.
  – Help save someone time/money/frustration.
  – Raise awareness about some risks.
• Motives
  – The new disruption is in data and the services around it. We (open source people) should not miss out on that, and I believe I can help.

Path to Cloud Computing

[Diagram: three developments leading to Cloud Computing]
• Hardware Changes: I/O Offloading, NPT/EPT, Multicore Support
• Virtualization: KVM/Xen, Solaris Zones, VMware/Parallels, Storage/Network Virtualization
• Mobile Computing: MultiScreen, Tablets, Smart Phones

Cloud Computing definition (IMHO)

• Cloud computing is a computing paradigm composed of abstractions, a set of primitives, and a set of interfaces and tools to drive those abstractions and primitives. The abstractions and primitives need not be new in themselves, but their combination and impact are what create ‘The Cloud’ culture.

[Diagram labels: Foundation; Primitives; Abstractions; Compute, Storage, Network; Image, Volumes, Snapshots, Autoscale; Tools, APIs, Config Management]

Example “High level” Architecture: OpenStack

In reality, it sorta looks like this

AWS as a Public Cloud

FOSS4G Use Cases

• Disaster Recovery/Backup
• Static, logic-free web publishing
• Online FOSS4G as a Service
• Data transformation jobs
• Content curation and batch processes

Example FOSS4G AWS Use Case: Static publishing blueprint

How to Build your Cloud Infrastructure

Architectural Patterns

• The Cookie Cutter/Soloist.
• The Centrist.
• The Replicator.
• The Masters of Colonies.

CAP: Cookie Cutter

The Cookie Cutter/Soloist

• Pros:
  – Simple.
  – Scales horizontally with load.
  – Localized failure impact.
• Cons:
  – Poor support for write-oriented services.
  – Coarse-grained scalability.
  – Node capacity has vertical scalability issues.

CAP – The Centrist

The Centrist

• Pros:
  – Scales at the component level.
  – Moderate complexity up to mid-range load.
  – Faster/easier fault isolation and detection.
  – Master/slave data stores are a well-studied concept.
• Cons:
  – The central data store becomes more critical and a bottleneck.
  – Multi-region deployments suffer from latency.
  – Vertical scaling characteristics are pronounced on the data store.

CAP – The Replicator

The Replicator

• Pros:
  – Scales at the component level.
  – Improved read performance.
  – Better disaster recovery.
  – Well suited for multi-region deployments.
• Cons:
  – Writes are still central.
  – Added complexity.
  – Increased bandwidth requirements.
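A minimal sketch of the read/write split behind the Replicator, to make the "improved reads, central writes" trade-off concrete. The class and the node names are hypothetical and not taken from any particular data store; a real deployment would normally rely on the data store's own replication and a connection pooler.

```python
import itertools


class ReplicatedStore:
    """Route writes to one primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._readers = itertools.cycle(replicas or [primary])

    def node_for(self, operation):
        """Return the node name that should handle 'read' or 'write'."""
        if operation == "write":
            return self.primary          # writes are still central
        if operation == "read":
            return next(self._readers)   # round-robin over the replicas
        raise ValueError(f"unknown operation: {operation!r}")


# Hypothetical node names: one primary, two read replicas in other regions.
store = ReplicatedStore("db-eu-west", ["replica-us-east", "replica-ap-south"])
reads = [store.node_for("read") for _ in range(3)]
```

Every write still lands on `db-eu-west`, which is why this pattern helps read-heavy map services far more than write-heavy ones.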

Masters of Colonies

CAP – Masters of Colonies

• Pros:
  – Improved write performance.
  – Decomposes large data sets into smaller ones.
  – Faster data iterations.
  – Good disaster recovery strategy.
• Cons:
  – Complex!
  – Weak/varying support by various data stores.
  – High maintenance overhead.
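One way to read "decompose large data sets into smaller ones" is key-based sharding, where each colony owns a slice of the data and accepts its own writes. The sketch below assumes that reading; the shard names and the key format are made up for illustration.

```python
import zlib

# Hypothetical colony names; each would be an independent writable master.
SHARDS = ["colony-a", "colony-b", "colony-c"]


def shard_for(key, shards=SHARDS):
    """Pick a shard deterministically from a record key."""
    # crc32 is stable across processes, unlike the built-in hash() for str.
    digest = zlib.crc32(key.encode("utf-8"))
    return shards[digest % len(shards)]


# The same key always maps to the same colony.
first = shard_for("way/1")
second = shard_for("way/1")
```

Spatial data often shards more naturally by region than by hash; the routing function is the only part that would change.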

Cultural Changes

• Get stakeholders’ buy-in early.
• Build a full ownership culture.
• Adopt an agile approach.
• Encourage prototyping and experimentation.
• Automation as a way of life.

Process Changes

• Software Architecture:
  – Know the floor, and the ceiling.
  – Be as stateless as possible.
  – Graceful failure response.
  – Good logging as a way of life.
• Release Engineering:
  – The VM as an artifact
  – Automation
  – Versioning
  – Snapshot
• Automation:
  – Configuration management
  – Orchestration
  – Auto-scaling

Things to remember

• Review any legal implications.
• Use the cloud primitives.
• Pay attention to security: security groups, encrypted data at rest, etc.
• Clean up old stuff.
• Things fail: don’t fight it, just handle it.
• You will not get it right the first time, but things should look good by the 3rd iteration. (Read The Mythical Man-Month.)

FOSS4G in AWS: Performance/Architecture Evaluation

• Tools used:
  – Siege
  – Sar
  – OProfile
  – R/AWK/Python/Ruby
• PostgreSQL query log.
• Test client -> target server as separate nodes.

OSM Data into AWS

• Setup 1
  – m1.large (2 cores)
  – Standard EBS
  – EU-West region
• Setup 2
  – m1.large
  – Provisioned EBS: 8000 IOPS
  – EU-West region
• Setup 3
  – hi1.4xlarge
  – SSD drive
  – EU-West region
• Setup 4
  – m2.2xlarge
  – EU-West region
  – Ephemeral drives

Importing OSM data into AWS: Testing the water

Importing OSM data into AWS: Testing the water some more

Enough Water Testing: Importing Planet to SSD

• Guess how long it took to finish.

Importing Planet into AWS Using SSD

• It only took 35 hours!
• Disk utilization: ~250 GB
• Guess what was the first thing I did when it finished?

Importing Planet into AWS

• I made a copy, of course :)
• Create a RAID 0 set.
• Create LVM on top of RAID 0.
• Kick off the data copy.
• Guess how long it took.

Importing Planet into AWS

• It only took 2.5 hours.

Data Import in AWS: OSM full planet

Profiling osm2pgsql

• Data sets used
• Links/ways/nodes of each set
• Time

Data import notes

• Create the DB on SSD and clone to EBS:
  – Use case: quickly import the data but make it persistent.
  – Full planet volume takes 2–2.5 hours.
• Create the DB on Provisioned EBS and clone to SSD:
  – Use case: need very fast runtime access.
  – Full planet volume takes 5.4 hours.
• Can we get an OSM primitives summary per dump and for the full planet as part of the PBF?
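The two clone timings can be turned into rough average copy rates. This is back-of-the-envelope arithmetic only: it assumes the copied volume is the ~250 GB reported after the planet import, and it uses decimal units.

```python
def throughput_mb_per_s(size_gb, hours):
    """Average copy rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / (hours * 3600)


PLANET_GB = 250  # assumption: the ~250 GB disk utilization after the import

ssd_to_ebs = throughput_mb_per_s(PLANET_GB, 2.5)   # about 27.8 MB/s
ebs_to_ssd = throughput_mb_per_s(PLANET_GB, 5.4)   # about 12.9 MB/s
```

On these assumptions the SSD-to-EBS clone runs at roughly twice the average rate of the EBS-to-SSD clone.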

Data Import in AWS: Lessons learned

• It is not only the disk.
• Risk on multiple levels:
  – Dev teams can’t possibly be testing to their full potential (in the data context).
  – Evident in outdated/incorrect documentation for bootstrapping.

Rendering – mod_tile/Mapnik

• An Apache module plus a Unix daemon (renderd).
• The Apache module follows a process model; renderd is multithreaded.
• The Apache module sends a command to renderd over a Unix socket.
• The renderer fetches the data and writes it out.
• Non-cached data will:
  – Fail on the first attempt (returns 404)
  – Pass on the second attempt (~600 msec)
• Cached data is served in < 10 msec.
• Very SQL-chatty.
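A client can absorb the "404 first, tile second" behaviour with a small retry loop. The sketch below is an assumption about how a client might cope, not part of mod_tile itself; the `fetch` callable, the URL and the delay are placeholders, and the fake server only mimics the behaviour observed above.

```python
import time


def fetch_tile(fetch, url, attempts=3, delay_s=1.0):
    """Call fetch(url) until it stops returning 404 or attempts run out.

    fetch is any callable returning (status_code, body); it is injected
    so the retry logic can be exercised without a running tile server.
    """
    status, body = None, b""
    for attempt in range(attempts):
        status, body = fetch(url)
        if status != 404:
            break
        if attempt < attempts - 1:
            time.sleep(delay_s)  # give renderd time to write the tile
    return status, body


# Stand-in for a server that renders on first request: 404, then 200.
responses = iter([(404, b""), (200, b"png-bytes")])
status, body = fetch_tile(lambda url: next(responses),
                          "http://tiles.example/15/0/0.png", delay_s=0)
```

In practice `fetch` would wrap a real HTTP call, and the delay would be tuned to the render time seen in testing.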

Renderd Threads Profiling

Renderd Profiling [four chart slides]

Rendering – GeoServer/GWC

• Single layer, ZL 15, RAM disk: 100 tiles/sec.
• Truncation is very slow. Please version your published layers.
• Standalone GWC offers a much better scalability model.
• Possible race conditions in threads writing tiles.
• Didn’t hit the getAlphaTile() issue.
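To put 100 tiles/sec in perspective, here is a rough estimate of what seeding a whole zoom level would take. It assumes a Web Mercator style pyramid in which zoom z has 2^z by 2^z tiles and a steady rate. Both are simplifications, since real seeding usually covers a bounding box rather than the whole world.

```python
def tiles_at_zoom(zoom):
    """Number of tiles covering the world at one zoom level (2^z by 2^z)."""
    return 4 ** zoom


def seed_days(zoom, tiles_per_second):
    """Days needed to render every tile of one zoom level at a steady rate."""
    return tiles_at_zoom(zoom) / tiles_per_second / 86400


total = tiles_at_zoom(15)    # 1,073,741,824 tiles
days = seed_days(15, 100)    # a little over 124 days
```

The same arithmetic suggests why slow truncation hurts: re-rendering a large published layer is expensive, so publishing a new layer version is cheaper than clearing an old cache.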

GWC/GeoServer in AWS: Example deployment

Cost?

• Screenshot of my account activity

Released artifacts: snapshots of OSM data in flat PGSQL

• 2 drives:
  – snap-f9affde6
  – snap-ffaffde0
• To use:
  – Create a volume based on each snapshot.
  – Activate with mdadm (RAID 0, 2 drives).
  – Run pvscan, vgscan, vgchange, lvscan.
  – Installing mdadm and rebooting should do this for you automagically on most machines.
  – Mount the volume on your PGDATA path.
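The manual steps above can be written down as an ordered command list. This sketch only builds the command strings and does not run them. The device names, the LVM volume group and logical volume names, and the PGDATA path are all assumptions; the real LVM names are whatever `lvscan` reports on the restored volumes.

```python
def restore_commands(devices, volume_group, logical_volume, pgdata):
    """Return the shell commands, in order, for mounting the restored set."""
    return [
        # Reassemble the RAID 0 array from the attached volumes.
        "mdadm --assemble /dev/md0 " + " ".join(devices),
        # Rediscover and activate LVM on top of the array.
        "pvscan",
        "vgscan",
        f"vgchange -ay {volume_group}",
        "lvscan",
        # Mount the logical volume where PostgreSQL expects its data.
        f"mount /dev/{volume_group}/{logical_volume} {pgdata}",
    ]


commands = restore_commands(
    ["/dev/xvdf", "/dev/xvdg"],      # assumed attachment points
    "vg_osm", "lv_pgdata",           # hypothetical LVM names
    "/var/lib/postgresql/data",      # assumed PGDATA path
)
```

Printing `commands` gives a checklist to run as root once both volumes are attached to the instance.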

Backlog

• Geocoding testing with Twofishes and Gisgraphy
• OSRM profiling
• Suggestions?

Many thanks to

• Geofabrik for compiling all those sets/formats.
• FOSS4G 2013 for this opportunity.
• And THANK YOU