big data for everyman

Post on 20-Dec-2014

DESCRIPTION

A presentation given by Erik Swan, CTO/Co-Founder of Splunk, and Michael Wilde, Splunk Ninja, at the SXSW Interactive 2012 Conference on March 11, 2012

TRANSCRIPT

Big Data for Everyman

Erik Swan, Michael Wilde

Hi... We work at Splunk.

We stare at data all day.

WTF is Big Data?!

larger than small data?

smaller than giant data?

some cool sauce for DBAs?

Aaaahhh, no.

a simple way to describe a massive problem

*or opportunity depending on your p.o.v.

Volume | Velocity | Variety | Variability

GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops

Big data comes out of machines

Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data

no, not us. we’re just nice guys who want to show you cool stuff

you are a producer and consumer of data

building a service?

using an app?

Location-Based Messaging and Intelligence For Your App and Your Customers

Seth Rabinowitz, CEO

James Rodmell, CTO

2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60

2011-11-06 12:17:32,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.755001,-73.963886,70

2011-11-06 12:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754982,-73.963849,75

2011-11-06 12:57:34,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754984,-73.963883,85

2011-11-06 13:17:35,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754941,-73.9639,90

2011-11-06 13:37:36,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754948,-73.963874,90

2011-11-06 13:57:37,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754931,-73.963892,95

2011-11-06 14:17:38,50,00027d27-ae02-627d-a79a-fa0004d3a347,40.755232,-73.963522,100

2011-11-06 14:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754979,-73.9639,100

Data! Good!

DATE/TIME

DEVICE ID

LAT/LONG

BATTERY STRENGTH
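A minimal sketch of turning rows like the ones above into typed records. The field names are assumptions based on the slide's labels: the slide does not label the second column, and treating the last column as battery strength is a guess from the label order.

```python
import csv
from datetime import datetime
from io import StringIO

# Two sample rows copied from the slide.
SAMPLE = """\
2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60
2011-11-06 14:17:38,50,00027d27-ae02-627d-a79a-fa0004d3a347,40.755232,-73.963522,100
"""


def parse_events(text):
    """Yield one dict per CSV row, with typed fields."""
    for row in csv.reader(StringIO(text)):
        timestamp, unlabeled, device_id, lat, lon, battery = row
        yield {
            "time": datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
            "unlabeled": int(unlabeled),  # second column: meaning not given on the slide
            "device_id": device_id,
            "lat": float(lat),
            "lon": float(lon),
            "battery": int(battery),  # assumption: last column is battery strength
        }


events = list(parse_events(SAMPLE))
```

Once the rows are dicts, questions such as "when did this device last report?" become one-line lookups instead of string slicing.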

Oh, real quick. Did you check in or tweet #splunk #sxsw?

...please

All this data can be pretty cool and empowering

except one little

PROBLEM

a lot of it looks like this

13/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.16,192.168.1.6,(empty),(empty),1099,135,epmap,(empty),0,1

13/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.16,192.168.1.6,(empty),(empty),1100,43025,43025_tcp,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1048,135,epmap,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1049,43025,43025_tcp,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1051,135,epmap,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1052,43025,43025_tcp,(empty),0,1

13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.64,192.168.1.6,(empty),(empty),1694,135,epmap,(empty),0,1

and we’re expected to talk to it like this

select (select max(answer.answer) from answer where answer.member_id in (select member_id from team_members where project_id in ( select project_id from project where Business_stream='Upstream' and stage='Appraise' and project_id in (select project_id from projectextra where subteam<>1 ) ) ) and answer.page_id=page.page_id) as thinl, (select max(avgscore) from task_project where task_project.project_id not in (select project_id from projectextra where subteam=1 ) and task_project.project_id in (select project_id from project where stage='Appraise' and Business_stream = 'Upstream') and task_project.page_id=page.page_id) as bmax, (select max(answer) from answer where answer.page_id=page.page_id) as datamax, (select avg(avgscore) from task_project where project_id=1 and task_project.page_id=page.page_id) as projavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and task_project.page_id=page.page_id) as companyavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and project_id in (select project_id from project where Business_stream = 'Upstream') and task_project.page_id=page.page_id) as businessavg, page.* from page,riverorder where page.category_name='BusinessBoundaries' and stage_name='Appraise' and riverorder.category_name=page.category_name order by riverorder.riverorder,page.order_id

It could be better. yes? better is good!

{
    checkin: {
        badges: [],
        created: 1331454784,
        geolat: "30.2640941786",
        geolong: "-97.7414819408",
        mayor: {
            type: "nochange"
        },
        primarycategory: {
            fullpathname: "Food:American Restaurants",
            iconurl: "https://foursquare.com/img/categories/food/default.png",
            id: "4bf58dd8d48988d14e941735",
            nodename: "American Restaurants"
        },
        timezone: "America/Chicago",
        user: {
            gender: "male"
        },
        venue: {
            id: "4d752b1bba682d43e7563876",
            name: "CNN Grill @ SXSW (Max's Wine Dive)"
        }
    }
}

readable, ya think?

source=foursquare | timechart count by checkin.venue.name

The languages to talk to data are getting better for us humans
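For comparison, here is roughly what that one-line search asks for, written out by hand in plain Python. This is an illustrative approximation, not how Splunk implements `timechart`: the one-hour bucket is an arbitrary choice, and only the first sample check-in uses values from the slide (the other two are made up).

```python
from collections import Counter
from datetime import datetime, timezone

# Check-ins in the shape of the event shown earlier.
# Only the first entry uses values from the slide; the rest are hypothetical.
checkins = [
    {"checkin": {"created": 1331454784,
                 "venue": {"name": "CNN Grill @ SXSW (Max's Wine Dive)"}}},
    {"checkin": {"created": 1331455000,
                 "venue": {"name": "CNN Grill @ SXSW (Max's Wine Dive)"}}},
    {"checkin": {"created": 1331459000,
                 "venue": {"name": "Hypothetical Venue"}}},
]


def timechart_count_by_venue(events, span_seconds=3600):
    """Count check-ins per (time bucket, venue name) pair."""
    counts = Counter()
    for event in events:
        created = event["checkin"]["created"]
        # Round the epoch timestamp down to the start of its bucket.
        bucket_start = created - created % span_seconds
        bucket = datetime.fromtimestamp(bucket_start, tz=timezone.utc)
        counts[(bucket, event["checkin"]["venue"]["name"])] += 1
    return counts


for (bucket, venue), n in sorted(timechart_count_by_venue(checkins).items()):
    print(bucket.isoformat(), venue, n)
```

The search-language version fits on one line; the hand-rolled version needs bucketing, nested field access and a counter, which is the point the slide is making.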

Guys.. come on! Go back to the data please.

a simple way to describe a massive problem

A friend in Boulder can help

Need data?

Just when you think you’re all done, wait. There is another consumer you may have forgotten

Someone with a different perspective sees your service as input to theirs

DEMAND REALTIME DATA IN A STREAM OVER THE WEB IN JSON FORMAT
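A small sketch of what consuming such a stream can look like on the receiving end. The slide only says "JSON in a stream over the web"; the one-object-per-line framing, the field names and the `fake_stream` list standing in for an HTTP response body are all assumptions made for illustration.

```python
import json


def read_json_stream(lines):
    """Yield one decoded object per non-empty line of a newline-delimited JSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which some services send as keep-alives
        yield json.loads(line)


# Stand-in for a streaming HTTP response body (hypothetical data).
# A real client would iterate over the response lines as they arrive.
fake_stream = [
    '{"device": "a", "battery": 60}',
    '',
    '{"device": "a", "battery": 70}',
]

readings = [event["battery"] for event in read_json_stream(fake_stream)]
```

Because the function is a generator, each event can be handled as soon as its line arrives, without waiting for the stream to end.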

Hey audience! We still have a few minutes.

What questions might you have been saving until this exact moment?

Thanks.

Erik Swan, CTO/Co-Founder, Splunk

Michael Wilde, Splunk Ninja

Who else sends you on your way with a cute dog photo?