Analyze Data in MongoDB with the Hunk App

Asya Kamsky, Principal Developer Advocate, MongoDB

Copyright © 2014 Splunk Inc.


Post on 22-Apr-2018



Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Speaker Bio

• Principal Community Advocate
  – Helping users get the most out of their MongoDB deployments
• Over 20 years of industry experience ranging from big companies to start-ups
• Her career has spanned work in database technologies, security, software testing, networking, and the web

MongoDB  

Business Agility: Dynamic Data Model

Scale Fast: Operational Flexibility

Low TCO: Commodity Hardware

Fastest Growing Database of 2013*

*DB-Engines

Horizontally Scalable - Sharding

Agile, Flexible

High Performance & Strong Consistency

Application

Highly Available - Replica Sets

{
  author: "roger",
  date: new Date(),
  text: "Spirited Away",
  tags: ["Tezuka", "Manga"]
}

MongoDB  

Relational Data Models: Hard to Change

[Diagram: a table with columns Name, Age, Phone, Email, annotated with "New Table" and "New Column" labels]

Documents are Easier

[Diagram: Relational vs. MongoDB]

{
  givenName: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
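As a rough illustration of why the nested form is convenient, the document on this slide can be held in a single Python dictionary. This is only a sketch: the fields elided with "…" on the slide are left out, and the variable names are made up for the example.

```python
# The slide's document as a plain Python dict.
# Fields elided with "…" on the slide are intentionally omitted.
person = {
    "givenName": "Paul",
    "surname": "Miller",
    "city": "London",
    "location": [45.123, 47.232],
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# The owner and the cars live in one record, so no join is needed
# to answer questions that span both.
car_models = [car["model"] for car in person["cars"]]
total_value = sum(car["value"] for car in person["cars"])

print(car_models)
print(total_value)
```

In a relational layout the cars would typically sit in a second table keyed by the owner, which is the contrast the slide appears to draw.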

No SQL But Still Flexible Querying

Rich Queries
• Find all Mark's policies
• Find everybody who purchased a policy last month

Geospatial
• Find all customers that live within 10 miles of NYC

Text Search
• Find all tweets that mention the company within the last 2 days

Aggregation
• What's the total value of Mark's policies

Map Reduce
• How many customers that bought an auto policy got a home policy within three months

{
  customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  policies: [
    {
      policy_number : 13,
      type : "auto",
      deductible: 500
    },
    {
      policy_number : 14,
      type : "life",
      beneficiaries: […]
    }
  ]
}
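To make the "rich queries" idea concrete, here is a small pure-Python sketch that evaluates MongoDB-style equality filters, including dotted paths into the policies array, against the customer document above. The helpers `get_values` and `matches` are hypothetical names written for this example; a real deployment would send the filter to the server instead of matching in the client. The elided beneficiaries list is omitted.

```python
# Customer document from the slide (the elided beneficiaries list is omitted).
customer = {
    "customer_id": 1,
    "first_name": "Mark",
    "last_name": "Smith",
    "city": "San Francisco",
    "policies": [
        {"policy_number": 13, "type": "auto", "deductible": 500},
        {"policy_number": 14, "type": "life"},
    ],
}


def get_values(doc, path):
    """Collect every value reachable via a dotted path, descending into lists."""
    current = [doc]
    for key in path.split("."):
        found = []
        for item in current:
            # A list is searched element by element, similar to array fields.
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    found.append(candidate[key])
        current = found
    # Flatten a trailing list so that individual array members can match.
    flat = []
    for value in current:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def matches(doc, flt):
    """Return True if every path in the filter has a matching value."""
    return all(expected in get_values(doc, path) for path, expected in flt.items())


# "Find all Mark's policies": match the customer, then read the nested array.
marks_policies = (
    [p["policy_number"] for p in customer["policies"]]
    if matches(customer, {"first_name": "Mark"})
    else []
)
has_auto = matches(customer, {"policies.type": "auto"})
has_home = matches(customer, {"policies.type": "home"})
print(marks_policies, has_auto, has_home)
```

Only equality is emulated here; geospatial, text search, aggregation, and map-reduce queries from the list are out of scope for this sketch.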


Clint Sharp

Director of Product Marketing, Splunk

The Accelerating Pace of Data

Volume | Velocity | Variety | Variability

GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops

Machine data is the fastest growing, most complex, most valuable area of big data

Industry Leading Big Data Product Portfolio

• Real-time indexing, real-time search
• Splunk Apps: vibrant and passionate developer community
• IT Ops. | Security & Compliance | Digital Intelligence | App Dev & App Mgmt. | Business Analytics
• Splunk Hadoop Connect, ODBC, DB Connect
• Ad hoc analytics of historical data in Hadoop
• Developers building big data apps on top of Hadoop
• 360° Customer View
• Complete Security Analytics
• Product and Service Analytics

Powerful Developer Platform with Familiar Tools

JavaScript | Java | Python | PHP | C# | Ruby

REST API

• Add New UI components
• Integrate into Existing Systems
• With Known Languages and Frameworks

Components of Hunk Server

[Diagram labels:]
• 64-bit Linux OS
• REST API | Command Line | ODBC
• Explore | Analyze | Visualize | Dashboards | Share
• splunkd
• Hadoop Interface: Hadoop Client Libraries, Java
• Streaming Resource Libraries: NoSQL & Other Stores
• splunkweb: Web and Application server (Python, AJAX, CSS, XSLT, XML)
• Virtual Indexes
• Search Head (C++, Web Services)

Where is Hunk Used?

FINANCIAL SERVICES
• Detect / prevent fraud
• Model and manage risk
• Personalize banking & insurance products

WEB / SOCIAL / MOBILE
• Product and customer analytics
• Sentiment analysis, personalization
• Web log, image and video analysis

RETAIL
• Behavior analysis
• Cross-selling, recommendation engine
• Optimize pricing, placement, design
• Optimize inventory and distribution

GOVERNMENT
• Detect and prevent fraud
• Security and intelligence
• Support open data initiatives

MANUFACTURING
• Simulation, analysis and design
• Improve service via product sensor data
• "Digital factory" for lean manufacturing

HEALTHCARE
• Drug pedigree and supply chain
• Patient monitoring
• Compliance, archival, text search
• Data-driven research

Virtual Indexes

• Enables seamless use of almost the entire Splunk stack on data
• Automatically handles query execution to Mongo, Hadoop, etc.
• Technology is patent pending

[Diagram: Hunk Search Head]

Examples of Virtual Indexes

[Diagram: External System 1, External System 2, External System 3]

index = syslog (/home/syslog/…)
index = apache_logs
index = sensor_data
index = twitter

Hunk applies schema for all fields – including transactions – at search time

Hunk Applies Schema on the Fly

• Structure applied at search time
• No brittle schema to work around
• Automatically find patterns and trends
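The general idea of applying structure at search time can be sketched in a few lines of Python. This is not Hunk's implementation: the sample events, the key=value convention, and the helper names `extract_fields` and `search` are all assumptions made for the illustration.

```python
import re

# Raw events are stored as-is; no schema is declared up front.
RAW_EVENTS = [
    "2014-10-07T12:00:01 status=200 bytes=512 uri=/index.html",
    "2014-10-07T12:00:02 status=404 uri=/missing",
    "2014-10-07T12:00:03 status=200 bytes=2048 uri=/download user=alice",
]

KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw):
    """Apply structure at read time: pull key=value pairs out of a raw event."""
    fields = dict(KV_PATTERN.findall(raw))
    fields["_time"] = raw.split(" ", 1)[0]
    return fields


def search(events, **criteria):
    """Yield extracted fields for events whose fields equal all criteria."""
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


ok_events = list(search(RAW_EVENTS, status="200"))
print(len(ok_events), [event["uri"] for event in ok_events])
```

Because fields are derived only when a search runs, an event that gains a new field (such as `user` in the third line) needs no schema migration, and events that lack a field are simply missing that key.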

Integration

Hunk Search Architecture

[Diagram labels: Hunk Search Head; Search Processor; Query per Index/Virtual Index; Splunk Distributed Search; Hadoop External Results Provider; MongoDB Streaming Resource Library (MongoDBProvider); JSON Config; MongoDB (three instances); Results Reduction; steps numbered 1 to 4]

Install via GUI

[Screenshots: installation steps 1 to 3]

Install via Command Line

• Go to <apps.splunk.com URL>
• Download MongoDBProvider.spl
• Either:
  – Copy MongoDBProvider.spl to $SPLUNK_HOME/etc/apps
  – tar -zxvf MongoDBProvider.spl
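The copy-and-untar step can also be scripted. The sketch below assumes, as the `tar -zxvf` command on the slide implies, that the .spl file is a gzipped tar archive; the function name `install_spl` is made up for this example, and it extracts straight into the apps directory instead of copying first, which should leave the same files in place.

```python
import os
import tarfile


def install_spl(spl_path, splunk_home):
    """Unpack a .spl package (a gzipped tar archive) into $SPLUNK_HOME/etc/apps.

    Only use this on a package downloaded from a trusted source, since
    extracting an archive writes whatever paths the archive contains.
    """
    apps_dir = os.path.join(splunk_home, "etc", "apps")
    os.makedirs(apps_dir, exist_ok=True)
    with tarfile.open(spl_path, "r:gz") as archive:
        archive.extractall(apps_dir)
    return apps_dir


# Example call (assumes SPLUNK_HOME is set and the package was downloaded):
# install_spl("MongoDBProvider.spl", os.environ["SPLUNK_HOME"])
```

A restart of the Hunk server is probably needed before the app is picked up, although the slides do not say so.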

Configure Indexes.conf – Overview

• Indexes.conf defines indexes, physical and virtual
• You need two configuration items: a provider and a virtual index
  – Provider should be 1:1 to your MongoDB Server
  – There can be multiple virtual indexes per Provider
• Indexes.conf can be in any Splunk App; it is probably easiest to put it in the MongoDBProvider folder

Configure Indexes.conf

Provider

# Provider Name (referenced in Virtual Indexes)
[provider:local-mongodb]
# Family
vix.family = mongodb_erp_family
# Disable Debugging
vix.splunk.search.debug = 0
# Hostname:Port
vix.mongodb.host = localhost:27017

Virtual Index 1

# Name of the Virtual Index (used by users)
[mongodb_vix]
# Provider Name (matches earlier stanza)
vix.provider = local-mongodb
# MongoDB DB Name
vix.mongodb.db = hunk
# MongoDB Collection Name
vix.mongodb.collection = test
# Field to extract time from
vix.mongodb.field.time = _id
# Format of the Field to Extract Time From (Valid Options are ObjectID, Date, or Epoch)
vix.mongodb.field.time.format = ObjectId

Configure Indexes.conf

Virtual Index 2

# Name of the Virtual Index (used by users)
[wocorders]
# Provider Name (matches earlier stanza)
vix.provider = local-mongodb
# MongoDB DB Name
vix.mongodb.db = demo
# MongoDB Collection Name
vix.mongodb.collection = wocorders
# Field to extract time from
vix.mongodb.field.time = timestamp
# Format of the Field to Extract Time From (Valid Options are ObjectID, Date, or Epoch)
vix.mongodb.field.time.format = date
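Since every virtual index must name a provider that exists, a quick consistency check can catch typos before the file is deployed. The sketch below uses Python's standard `configparser` as an approximation of the stanza format; it is not Splunk's own parser, and the function name `check_virtual_indexes` is invented for this example. The stanzas are the ones shown on the slides.

```python
import configparser

INDEXES_CONF = """
[provider:local-mongodb]
vix.family = mongodb_erp_family
vix.splunk.search.debug = 0
vix.mongodb.host = localhost:27017

[mongodb_vix]
vix.provider = local-mongodb
vix.mongodb.db = hunk
vix.mongodb.collection = test
vix.mongodb.field.time = _id
vix.mongodb.field.time.format = ObjectId

[wocorders]
vix.provider = local-mongodb
vix.mongodb.db = demo
vix.mongodb.collection = wocorders
vix.mongodb.field.time = timestamp
vix.mongodb.field.time.format = date
"""


def check_virtual_indexes(text):
    """Map each virtual index to its provider and list dangling references."""
    # Only "=" is a delimiter so that "localhost:27017" stays one value.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    providers = {
        section.split(":", 1)[1]
        for section in parser.sections()
        if section.startswith("provider:")
    }
    indexes, problems = {}, []
    for section in parser.sections():
        if section.startswith("provider:"):
            continue
        provider = parser.get(section, "vix.provider", fallback=None)
        indexes[section] = provider
        if provider not in providers:
            problems.append(f"[{section}] references unknown provider {provider!r}")
    return indexes, problems


indexes, problems = check_virtual_indexes(INDEXES_CONF)
print(indexes)
print(problems)
```

Both virtual indexes point at the single `local-mongodb` provider, which matches the "multiple virtual indexes per Provider" rule from the overview slide.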

How to Query Mongo

index=mongodb (foo=xyz OR other=val) | fields foo, bar, baz

• Query your MongoDB Virtual Index
• Match any fields by specifying the field name and matching parameters
• Minimize results returned by projecting down only the fields you want returned
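For readers who think in MongoDB terms, the search above can be pictured as a filter document plus a projection document. The slides do not show what MongoDBProvider actually sends to the server, so the mapping below is an assumption; it uses only the standard `$or` operator and inclusion-style projection, and the helper names are hypothetical.

```python
def build_filter(any_of):
    """Build a MongoDB-style filter matching any of the (field, value) pairs."""
    clauses = [{field: value} for field, value in any_of]
    if not clauses:
        return {}
    # A single condition needs no $or wrapper.
    return clauses[0] if len(clauses) == 1 else {"$or": clauses}


def build_projection(fields):
    """Build an inclusion projection that returns only the named fields."""
    return {field: 1 for field in fields}


# index=mongodb (foo=xyz OR other=val) | fields foo, bar, baz
mongo_filter = build_filter([("foo", "xyz"), ("other", "val")])
mongo_projection = build_projection(["foo", "bar", "baz"])

print(mongo_filter)
print(mongo_projection)
# With a driver such as pymongo, these two documents could be passed as
# collection.find(mongo_filter, mongo_projection); that call is not run here.
```

Values are kept as strings in this sketch; how the provider handles numeric or typed comparisons is not covered by the slides.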

Mongo Specific Integration Highlights

index=mongodb foo=xyz | timechart avg(bar) by baz

Predicate Pushdown
Filtering terms are processed on the MongoDB side, so only results where the field foo matches xyz are returned

Projections
We only return fields which are mentioned in the particular search, in this case _time, bar and baz
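The effect of pushdown and projection can be emulated with an in-memory list standing in for the collection. Everything here is illustrative: the sample documents, the `extra` field, and the helper names are invented, and the time bucketing that `timechart` performs is left out so that only the per-`baz` average is computed.

```python
from collections import defaultdict

# Stand-in for a MongoDB collection; "extra" is a field the search never uses.
COLLECTION = [
    {"_time": 1, "foo": "xyz", "bar": 10, "baz": "a", "extra": "x"},
    {"_time": 2, "foo": "abc", "bar": 99, "baz": "a", "extra": "y"},
    {"_time": 3, "foo": "xyz", "bar": 30, "baz": "a", "extra": "z"},
    {"_time": 4, "foo": "xyz", "bar": 5, "baz": "b", "extra": "w"},
]


def source_side_scan(collection, predicate, fields):
    """Emulate the data store: filter first, then keep only requested fields."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in predicate.items()):
            yield {field: doc[field] for field in fields if field in doc}


def avg_by(rows, value_field, group_field):
    """Average value_field per distinct group_field (no time bucketing)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for row in rows:
        acc = totals[row[group_field]]
        acc[0] += row[value_field]
        acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Only matching documents, trimmed to _time, bar and baz, leave the "store".
rows = list(source_side_scan(COLLECTION, {"foo": "xyz"}, ["_time", "bar", "baz"]))
averages = avg_by(rows, "bar", "baz")
print(len(rows), averages)
```

One of the four documents is filtered out and the unused field is dropped before any rows reach the search side, which is where the savings described on the slide would come from.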

Roadmap for the Future

• Full text search engine
• BSON support

Get The Bits!

• Hunk – http://splunk.com/download
• MongoDB App – http://apps.splunk.com/app/1810/
  – Or search for "MongoDB" on apps.splunk.com

THANK YOU