sqrrl - the apache software...



sqrrl Secure. Scale. Adapt.

Sqrrl Data, Inc. All Rights Reserved

Adam Fuchs, CTO
11 April 2013


Who We Are

20+ years of combined Apache Accumulo engineering expertise

Management
•  Mark Terenzoni: sqrrl CEO, F5
•  Ely Kahn: sqrrl VP BizDev, White House
•  Adam Fuchs: sqrrl CTO, NSA

Investors

•  Founded July 2012
•  Funded August 2012
•  Team includes former Tech Director of Accumulo at NSA and 6 committers/contributors


Our  Mission  

Security  

Adaptivity

Scalability


Apache Accumulo

•  Sorted, Distributed Key/Value Store
•  Based on Google’s Bigtable Design
•  Built on Top of Apache Hadoop and Apache ZooKeeper
•  Augments and Integrates With the Hadoop Ecosystem
•  Originally developed at the National Security Agency, now an Apache Software Foundation project


Sqrrl Enterprise Architecture

•  Applications
•  Analytics APIs: Search, Statistics, Graph, Lucene, SQL, Custom Extensions
•  Security & Access Controls: IAM, Encryption, DAM, Secure Code
•  Data Integration: ETL, Hadoop
•  Accumulo


Big Data Lessons Learned

•  Start small, but design for scalability
   –  One application first, then grow to hundreds
   –  One gigabyte first, then grow to petabytes

•  Iterative schema refinement
   –  Initially, let the data define the schema
   –  Refine the schema in bulk as you better understand the data
   –  Middle ground between flat files and complete ontologies

•  Discovery analytics as application building blocks
   –  Universal search: structured and unstructured data, across data sets, low latency
   –  Basic statistics: aggregations of query results, parallelized, low latency, to support big-picture analysis
   –  Graphs: scalable graph analytics for analyzing how everything is connected

•  Data-centric security
   –  Separate modeling of security and analysis
   –  Simplifies multi-tenancy and application accreditation
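One way to picture "let the data define the schema" is a pass over incoming records that tallies which fields appear and which value types they carry. The following is a minimal sketch under that assumption; the function name `discover_schema` and the sample records are hypothetical and are not taken from the slides or from any Sqrrl or Accumulo API.

```python
from collections import Counter, defaultdict

def discover_schema(records):
    """Count, for every field seen in the data, how often each value type occurs."""
    schema = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            schema[field][type(value).__name__] += 1
    return {field: dict(types) for field, types in schema.items()}

# Hypothetical sample records; no schema is declared up front.
records = [
    {"name": "John Doe", "cholesterol": 183},
    {"name": "Jane Roe", "cholesterol": "n/a", "visit": "2012-09-12"},
]

schema = discover_schema(records)
for field, types in schema.items():
    print(field, types)
```

A field that shows up with mixed types (here `cholesterol`) is a natural candidate for the later "refine the schema in bulk" step.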


Schema  Discovery  


Lightweight Apps

The future of Big Data innovation is Apps, built on:
•  Universal Search
•  Schema-less Statistics
•  Graphs
•  Intuitive Languages
•  Secure, Scalable, and Adaptable platforms


Targeted  Analysis  


Big-Picture  Analytics  


Data-Centric Security

Definition: A form of security in which data carries with it the elements of provenance that are required to make policy decisions on its releasability.
•  Separate data modeling for Security and Analysis
•  Reusability of applications across security domains
•  Distributed development of ingest and query applications
•  Supported by Accumulo’s cell-level security
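In Accumulo, cell-level security is expressed through visibility labels: boolean expressions over authorization tokens, combined with `&` (and), `|` (or), and parentheses. The sketch below shows how a label attached to a cell could be checked against a reader's authorizations. It is a toy evaluator, not Accumulo's implementation: it evaluates operators left to right and does no validation, and the cell names and labels are hypothetical.

```python
import re

def visible(label, authorizations):
    """Return True if the set of authorizations satisfies the visibility label."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", label)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in "&|":
            op = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if op == "&" else (result or right)
        return result

    # An empty label is treated as visible to everyone.
    return parse_expr() if tokens else True

# Hypothetical cells: (name, visibility label).
cells = [
    ("diagnosis", "doctor|patient"),
    ("billing code", "billing&(doctor|admin)"),
    ("public notice", ""),
]

def readable(authorizations):
    return [name for name, label in cells if visible(label, authorizations)]

print(readable({"patient"}))
print(readable({"billing", "admin"}))
```

Because the label travels with each cell, the same query code can serve readers with different authorizations; only the set passed to `readable` changes.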


Cell-Level  Security  


Scalable Data-Centric Security

[Diagram components: Data, Labeler, Accumulo, Apps, End Users, Auth. Service, Policy Engine, User Attributes, Audits, Policies, HDFS, Zookeeper]


Accumulo’s Strengths

•  Security
   –  Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use
   –  IAM and encryption tie into enterprise security standards

•  Scalability
   –  Proven reliability and performance at the multi-petabyte scale
   –  High-performance parallel I/O library

•  Adaptivity
   –  Flexible schema support to quickly ingest new data sources
   –  Sorted key/value paradigm supports a multitude of search and analysis applications
   –  Server-side programming framework (“iterator trees”) supports best-in-class aggregation, filtering, and complex query semantics


Accumulo Key Structure

An Accumulo key is a 5-tuple, consisting of:
•  Row: Controls Atomicity
•  Column Family: Controls Locality
•  Column Qualifier: Controls Uniqueness
•  Visibility Label: Controls Access
•  Timestamp: Controls Versioning

Accumulo Key/Value Example

Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…
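The 5-tuple can be modeled as a small record whose sort order decides how entries are laid out. The sketch below uses the rows from the example table plus one hypothetical older Cholesterol entry (marked in the code) to show versioning. It assumes Accumulo's usual ordering, ascending on row, column family, column qualifier, and visibility, then descending on timestamp. The integer timestamps simply mirror the slide's date-like values, and `Key`, `sort_order`, and `latest` are illustrative names, not Accumulo API.

```python
from collections import namedtuple

Key = namedtuple("Key", "row family qualifier visibility timestamp")

def sort_order(key):
    """Ascending on every field except timestamp, so newer versions sort first."""
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)

entries = {
    Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513): "1010110110100…",
    Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912): "183",
    Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912): "Patient suffers from an acute …",
    Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801): "Pass",
    # Hypothetical older version, added only to show timestamp ordering.
    Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20110301): "190",
}

ordered = sorted(entries, key=sort_order)

def latest(row, family, qualifier):
    """Return the newest value stored under (row, family, qualifier), if any."""
    for key in ordered:
        if (key.row, key.family, key.qualifier) == (row, family, qualifier):
            return entries[key]
    return None

for key in ordered:
    print(key.family, key.qualifier, key.timestamp)
```

Entries that share a row and column family end up adjacent, which is one way to read "Column Family: Controls Locality", and the newest version of a cell is always the first one a reader meets.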


Accumulo Architecture

[Diagram components: Application (×3), Tablet Server (×3), Tablet (×3), Zookeeper (×3), Master, HDFS. Edge labels: Read/Write, Store/Replicate, Assign/Balance, Delegate Authority (×2)]


Tablet Data Flow

[Diagram labels: Writes, In-Memory Map, Write Ahead Log (For Recovery), Minor Compaction, Sorted, Indexed File (×3), Merging / Major Compaction, Iterator Tree (×3), Tablet Reads, Scan]
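The labels in this diagram suggest a log-structured flow: writes go to a write-ahead log and an in-memory map, a minor compaction flushes the map to a sorted file, a major compaction merges files, and a scan merges everything into one sorted stream. The following toy model sketches that flow under those assumptions. The `Tablet` class, its `memory_limit` parameter, and the newest-write-wins rule are simplifications for illustration, not Accumulo's actual code.

```python
import heapq

class Tablet:
    def __init__(self, memory_limit=2):
        self.write_ahead_log = []   # append-only record of writes, for recovery
        self.memory_map = {}        # in-memory map of recent writes
        self.sorted_files = []      # immutable sorted runs, oldest first
        self.memory_limit = memory_limit

    def write(self, key, value):
        # A real system could discard log entries once they are flushed.
        self.write_ahead_log.append((key, value))
        self.memory_map[key] = value
        if len(self.memory_map) >= self.memory_limit:
            self.minor_compaction()

    def minor_compaction(self):
        """Flush the in-memory map to a new sorted file."""
        self.sorted_files.append(sorted(self.memory_map.items()))
        self.memory_map = {}

    def major_compaction(self):
        """Merge all sorted files into a single sorted file."""
        self.sorted_files = [list(self._merge(self.sorted_files))]

    def scan(self):
        """Merge the sorted files and the in-memory map into one sorted list."""
        runs = self.sorted_files + [sorted(self.memory_map.items())]
        return list(self._merge(runs))

    @staticmethod
    def _merge(runs):
        # Tag entries with the negated run index so that, for equal keys,
        # the entry from the newest run sorts first and wins.
        tagged = [[(key, -age, value) for key, value in run]
                  for age, run in enumerate(runs)]
        last_key = object()
        for key, _, value in heapq.merge(*tagged):
            if key != last_key:
                yield key, value
                last_key = key

tablet = Tablet(memory_limit=2)
for key, value in [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]:
    tablet.write(key, value)

before = tablet.scan()
files_before = len(tablet.sorted_files)
tablet.major_compaction()
after = tablet.scan()
print(before, files_before, len(tablet.sorted_files))
```

The scan result is the same before and after the major compaction; only the number of files that have to be merged at read time changes.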


Iterator Framework

Iterator Operations:
•  File Reads
•  Block Caching
•  Merging
•  Deletion
•  Isolation
•  Locality Groups
•  Range Selection
•  Column Selection
•  Cell-level Security
•  Versioning
•  Filtering
•  Aggregation
•  Partitioned Joins
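Several of these operations can be pictured as stages stacked over a sorted stream of entries, each consuming the output of the one below it. The sketch below chains three of them (versioning, filtering, aggregation) as Python generators. It only illustrates the composition idea; the function names, the tuple layout, and the sample data are hypothetical and do not reproduce Accumulo's iterator interface.

```python
from itertools import groupby

def versioning(entries, max_versions=1):
    """Keep only the newest `max_versions` entries for each (row, column)."""
    for _, group in groupby(entries, key=lambda e: (e[0], e[1])):
        for i, entry in enumerate(group):
            if i < max_versions:
                yield entry

def filtering(entries, predicate):
    """Pass through only the entries accepted by the predicate."""
    return (e for e in entries if predicate(e))

def aggregation(entries):
    """Sum the values of each row."""
    for row, group in groupby(entries, key=lambda e: e[0]):
        yield row, sum(e[3] for e in group)

# Entries are (row, column, timestamp, value), already sorted by
# row, column, and descending timestamp, as a scan would deliver them.
entries = [
    ("page1", "clicks", 3, 7),
    ("page1", "clicks", 2, 5),
    ("page1", "views", 1, 40),
    ("page2", "clicks", 1, 2),
    ("page2", "views", 4, 10),
    ("page2", "views", 2, 9),
]

stack = aggregation(filtering(versioning(entries), lambda e: e[3] >= 5))
result = list(stack)
print(result)
```

Each stage sees only a sorted stream, so stages can be reordered or swapped without changing the ones around them.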

[email protected]  |  @sqrrl_inc  |  617.520.4375                          sqrrl  data,  INC.,    All  Rights  Reserved  


Table Design

•  No built-in secondary indices
•  Sort Order ⇔ Index
•  Balance between ingest and query
•  Avoid introducing bottlenecks
•  Preserve cell-level security and scalability

Table:             Forward Index   Inverted Index
Row:               <UUID>          <Term>
Column Family:     <Type>          <Type> + <Field>
Column Qualifier:  <Field>         <UUID>
Value:             <Term>          <Digest of Event>
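The two layouts can be sketched by generating both sets of entries from the same event and then answering a term query with a seek into the sorted inverted index. This is a minimal illustration under stated assumptions: the `+` separator for `<Type> + <Field>`, the truncated SHA-1 used as the event digest, and the sample events are all hypothetical choices, not details given on the slide.

```python
import bisect
import hashlib

def index_entries(uuid, event_type, fields):
    """Return (forward, inverted) lists of (row, family, qualifier, value)."""
    # Assumed digest: a short hash of the event's fields.
    digest = hashlib.sha1(repr(sorted(fields.items())).encode()).hexdigest()[:8]
    forward, inverted = [], []
    for field, term in fields.items():
        forward.append((uuid, event_type, field, term))
        inverted.append((term, f"{event_type}+{field}", uuid, digest))
    return forward, inverted

# Hypothetical events keyed by UUID.
events = {
    "uuid-1": ("login", {"user": "alice", "host": "web01"}),
    "uuid-2": ("login", {"user": "bob", "host": "web01"}),
}

forward_table, inverted_table = [], []
for uuid, (event_type, fields) in events.items():
    fwd, inv = index_entries(uuid, event_type, fields)
    forward_table += fwd
    inverted_table += inv
forward_table.sort()    # sort order stands in for the index
inverted_table.sort()

def lookup(term):
    """Seek to the first inverted-index row for `term`, then read until it changes."""
    start = bisect.bisect_left(inverted_table, (term,))
    uuids = []
    for row, _, uuid, _ in inverted_table[start:]:
        if row != term:
            break
        uuids.append(uuid)
    return uuids

print(lookup("web01"))
print(lookup("alice"))
```

Writing both tables at ingest time costs extra writes per event, which is the "balance between ingest and query" trade-off the bullets mention.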


Ecosystem Architecture

Components: Custom Ingester, Web Server, Custom Analytic, Map/Reduce Task; Sqrrl Enterprise; Apache Accumulo; Apache HDFS

Interfaces:
•  Sqrrl API over Apache Thrift RPC: Hierarchical Documents + Graphs, Lucene + SQL + more
•  Accumulo RPC: Sorted Key/Value I/O
•  Hadoop RPC: File I/O


Contact

sqrrl data, inc.
275 Third St.
Cambridge, MA 02142

617-902-0784
www.sqrrl.com
@sqrrl_inc
[email protected]