Big data Groningen

Post on 11-Apr-2017

  • Small cars are dangerous!

    Willem Hendriks, Data Scientist, IBM

    willem.hendriks@nl.ibm.com

    https://github.com/willemhendriks


  • Nice to be in Groningen again!

  • "More data usually beats better algorithms" (Anand Rajaraman, when teaching at Stanford)

    http://anand.typepad.com/datawocky/2008/03/more-data-usual.html

    What I learned in Groningen...

    What I am doing now...

  • Parallel Computing is not easy....

  • Google Trends of Apache Spark

    Apache Spark is a fast and general engine for large-scale data processing.

  • Why Spark? (4) Nice library!

  • Is it really that easy & quick?

    Best deal if you want a Mercedes ML

    Best place to have dinner in Brussels (and have a walk afterwards)

  • Let's combine police report datasets & Marktplaats advertisements... (not big data, just a toy example of Spark)

    Do thieves like certain neighborhoods with certain items?

  • Download advertisement data with script

    Find postal code of each neighborhood

    Combine in Apache Spark
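    The three steps above can be sketched in plain Python as a join on postal code. This is only an illustration under assumptions: the field names, the toy records, the neighborhood names and the burglary counts are all hypothetical, and the talk's actual code (which ran in Apache Spark) is not shown in the slides.

```python
from collections import Counter

# Hypothetical toy records; the field names and values are assumptions,
# not the schema or data used in the talk.
ads = [
    {"ad_id": 1, "postal_code": "9711", "category": "scale models"},
    {"ad_id": 2, "postal_code": "9711", "category": "bicycles"},
    {"ad_id": 3, "postal_code": "9712", "category": "scale models"},
    {"ad_id": 4, "postal_code": "0000", "category": "bicycles"},  # unknown postal code
]
postal_to_neighborhood = {"9711": "Binnenstad", "9712": "Binnenstad-Noord"}
burglaries_per_neighborhood = {"Binnenstad": 12, "Binnenstad-Noord": 3}


def join_ads_with_reports(ads, postal_to_neighborhood, burglaries):
    """Inner-join ads to neighborhoods by postal code, then attach burglary counts."""
    ad_counts = Counter()
    for ad in ads:
        neighborhood = postal_to_neighborhood.get(ad["postal_code"])
        if neighborhood is None:
            continue  # inner join: drop ads whose postal code has no neighborhood
        ad_counts[(neighborhood, ad["category"])] += 1
    return [
        {
            "neighborhood": neighborhood,
            "category": category,
            "ads": count,
            "burglaries": burglaries.get(neighborhood, 0),
        }
        for (neighborhood, category), count in sorted(ad_counts.items())
    ]


rows = join_ads_with_reports(ads, postal_to_neighborhood, burglaries_per_neighborhood)
for row in rows:
    print(row)
```

    In PySpark the same shape would presumably be a `DataFrame.join` on the postal code column followed by `groupBy(...).count()`; the plain-Python version is used here only so the sketch runs without a cluster.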

  • Scale models are an indication of burglary! Check marktplaats.nl to see whether more than 10 advertisements fall within a radius of 600 meters!

    Maybe marktplaats.nl advertisements can predict... House-pricing trends? Crime? Education level? They have something!

    If you were asked to build a model on the Netherlands, what tool would you use?

    *dataset too small to make this statement
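    The "more than 10 advertisements in a radius of 600 meter" check could look roughly like the sketch below. The threshold and radius come from the slide; representing each advertisement by a latitude/longitude pair, the haversine distance, and the function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def exceeds_ad_threshold(center, ad_locations, radius_m=600, threshold=10):
    """Return True if more than `threshold` ads lie within `radius_m` of `center`.

    `center` and each entry of `ad_locations` are hypothetical (lat, lon) tuples.
    """
    nearby = sum(
        1
        for lat, lon in ad_locations
        if haversine_m(center[0], center[1], lat, lon) <= radius_m
    )
    return nearby > threshold
```

    A call such as `exceeds_ad_threshold((53.2194, 6.5665), ad_locations)` would then flag a location; as the footnote says, the toy dataset is too small to treat such a flag as evidence.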

  • Try it yourself! (GBs limited)

    Mix with various Services, e.g. Hadoop/NoSQL

    Free Trial & Paid (with Serious Power)

    Made for the App Developer

  • Run Spark Online

    (Various) Notebook, to use for Python, Scala, & R

    Free, perfect to start & learn! (examples)

    Made for the Data Scientist

    Try it yourself! (GBs limited)

  • IBM Will: Educate one million data scientists and data engineers on Apache Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize and Big Data University MOOC

    Join us, & start today at the BIG DATA University! https://bigdatauniversity.com/

    Spark Hackathon Coming soon in NL!

    IBM Wants YOU to learn Spark!

  • Questions about...

    Start with Spark?

    IBM & Spark?

    Marktplaats.nl?

    Code will be on GitHub (after cleaning)

    Thank you!

    Willem Hendriks, 06 2240 8900

    Data Scientist IBM

    willem.hendriks@nl.ibm.com

    https://github.com/willemhendriks
