Big Data for Everyman


Post on 20-Dec-2014






A presentation given by Erik Swan, CTO/Co-Founder of Splunk, and Michael Wilde, Splunk Ninja, at the SXSW Interactive 2012 Conference on March 11, 2012


1. Erik Swan, Michael Wilde: Big Data for Everyman
2. Hi... We work at Splunk.
3. We stare at data all day.
4. WTF is Big Data?!
5. larger than small data?
6. smaller than giant data?
7. some cool sauce for DBAs?
8. Aaaahhh, no.
9. a simple way to describe a massive problem* (*or opportunity, depending on your p.o.v.)
10. Big data comes out of machines: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops. Volume | Velocity | Variety | Variability
11. Big data comes out of machines. Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops. Volume | Velocity | Variety | Variability
12. no, not us. we're just nice guys who want to show you cool stuff
13. building a service? you are a producer and consumer of data using an app?
14. Seth Rabinowitz, CEO. James Rodmell, CTO. Location-Based Messaging and Intelligence For Your App and Your Customers
15. Data! Good! Fields labeled on the slide: DATE/TIME, DEVICE ID, LAT/LONG, BATTERY STRENGTH.
    2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60
    2011-11-06 12:17:32,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.755001,-73.963886,70
    2011-11-06 12:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754982,-73.963849,75
    2011-11-06 12:57:34,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754984,-73.963883,85
    2011-11-06 13:17:35,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754941,-73.9639,90
    2011-11-06 13:37:36,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754948,-73.963874,90
    2011-11-06 13:57:37,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754931,-73.963892,95
    2011-11-06 14:17:38,50,00027d27-ae02-627d-a79a-fa0004d3a347,40.755232,-73.963522,100
    2011-11-06 14:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754979,-73.9639,100
16. show them something cool already!
17. Oh, real quick. Did you check in or tweet #splunk #sxsw...please
18. All this data can be pretty cool and empowering
19. except one little PROBLEM
20. a lot of it looks like this
21. 0,113/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,,,(empty),(empty),1100,43025,43025_tcp,(empty),0,113/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,,,(empty),(empty),1048,135,epmap,(empty),0,113/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,,,(empty),(empty),1049,43025,43025_tcp,(empty),0,113/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,,,(empty),(empty),1051,135,epmap,(empty),0,113/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,,,(empty),(empty),1052,43025,43025_tcp,(empty),0,113/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,,,(empty),(empty),1694,135,epmap,(empty),
22. and we're expected to talk to it like this
23. select (select max(answer.answer) from answer where answer.member_id in (select member_id from team_members where project_id in ( select project_id from project where Business_stream=Upstream and stage=Appraise and project_id in (select project_id from projectextra where subteam1 ) ) ) and answer.page_id=page.page_id) as thinl, (select max(avgscore) from task_project where task_project.project_id not in (select project_id from projectextra where subteam=1 ) and task_project.project_id in (select project_id from project where stage=Appraise and Business_stream = Upstream) and task_project.page_id=page.page_id) as bmax, (select max(answer) from answer where answer.page_id=page.page_id) as datamax, (select avg(avgscore) from task_project where project_id=1 and task_project.page_id=page.page_id) as projavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and task_project.page_id=page.page_id) as companyavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and project_id in (select project_id from project where Business_stream = Upstream) and task_project.page_id=page.page_id) as businessavg, page.* from page,riverorder where page.category_name=BusinessBoundaries and stage_name=Appraise and riverorder.category_name=page.category_name order by riverorder.riverorder,page.order_id select (select max(answer.answer) from answer where answer.member_id in ( select member_id from team_members where project_id in ( select project_id from project where
24. It could be better. yes? better is good!
25. {[-] checkin : {[-] badges : [], created : 1331454784, geolat : "30.2640941786", geolong : "-97.7414819408", mayor : {[-] type : "nochange" }, primarycategory : {[-] fullpathname : "Food:American Restaurants", iconurl : "", id : "4bf58dd8d48988d14e941735", nodename : "American Restaurants" }, timezone : "America/Chicago", user : {[-] gender : "male" }, venue : {[-] id : "4d752b1bba682d43e7563876", name : "CNN Grill @ SXSW (Max's Wine Dive)" } }} readable, ya think?
26. source=foursquare | timechart count by
    The languages to talk to data are getting better for us humans
27. Guys.. come on! Go back to the data please.
28. Need data? a simple way to describe a massive problem. A friend in Boulder can help
29. The Social Media API. Jud Valeski, Co-Founder, CEO
30. Just when you think you're all done, wait. There is another consumer you may have forgotten
31. Someone with a different perspective sees your service as input to theirs
32. DEMAND REALTIME DATA IN A STREAM OVER THE WEB IN JSON FORMAT
33. Hey audience! We still have a few minutes. What questions might you have been saving until this exact moment?
34. Thanks. Erik Swan, CTO Co-Founder, Splunk. Michael Wilde, Splunk Ninja. Who else sends you on your way with a cute dog photo?
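To make the slide 15 sample a bit more concrete, here is a minimal Python sketch (not something shown in the talk) that reads a few of those comma-separated rows into named fields. The column names are assumptions inferred from the labels on the slide (DATE/TIME, DEVICE ID, LAT/LONG, BATTERY STRENGTH). The second column carries no label on the slide, so it is kept under the placeholder name `unlabeled`, and `parse_rows` is a hypothetical helper name.

```python
import csv
from datetime import datetime
from io import StringIO

# First three rows copied from slide 15.
SAMPLE = """\
2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60
2011-11-06 12:17:32,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.755001,-73.963886,70
2011-11-06 12:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754982,-73.963849,75
"""


def parse_rows(text):
    """Turn raw comma-separated rows into dicts with typed values.

    Column meanings are assumed from the slide's labels; the second
    column is unlabeled on the slide and is kept as-is.
    """
    rows = []
    for ts, unlabeled, device_id, lat, lon, battery in csv.reader(StringIO(text)):
        rows.append({
            "time": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
            "unlabeled": int(unlabeled),
            "device_id": device_id,
            "lat": float(lat),
            "lon": float(lon),
            "battery": int(battery),
        })
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row["time"].isoformat(), row["lat"], row["lon"], row["battery"])
```

Once the rows have names and types, questions such as "how did battery strength change over time for this device?" become a one-line loop rather than an exercise in counting commas.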
