Big data groningen
Page 1: Big data groningen

Small cars are dangerous!

Willem Hendriks
Data Scientist, IBM

[email protected]
github.com/willemhendriks

Page 2: Big data groningen

Nice to be in Groningen again!

Page 3: Big data groningen

“More data usually beats better algorithms”

Anand Rajaraman (when teaching at Stanford)

http://anand.typepad.com/datawocky/2008/03/more-data-usual.html

What I learned in Groningen...

What I am doing now...

Page 4: Big data groningen

Parallel Computing is not easy....

Page 5: Big data groningen

Google Trends of “Apache Spark”

Apache Spark™ is a fast and general engine for large-scale data processing.

Page 6: Big data groningen

Why Spark? (4) Nice library!

Page 7: Big data groningen

Is it really that easy & quick?

Best deal if you want a Mercedes ML

Best place to have dinner in Brussels (and have a walk afterwards)

Page 8: Big data groningen

Let's combine police report datasets & Marktplaats advertisements... (not big data, just a toy example of Spark)

Do thieves like certain neighborhoods with certain items?

Page 9: Big data groningen

Download advertisement data with script

Find postal code of each neighborhood

Combine in Apache Spark
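The combine step above can be sketched as a key-based join on postal code. This is a plain-Python stand-in, not the talk's code: all records, field names, and the `combine` helper are made up for illustration. In Spark the same idea would be expressed as a join of two datasets keyed on postal code.

```python
from collections import Counter

# All records below are made up for illustration; they are not the talk's data.
ADS = [
    {"postal_code": "1111", "category": "scale models"},
    {"postal_code": "1111", "category": "scale models"},
    {"postal_code": "1111", "category": "bicycles"},
    {"postal_code": "2222", "category": "bicycles"},
]
POLICE_REPORTS = [
    {"neighborhood": "North", "burglaries": 7},
    {"neighborhood": "South", "burglaries": 2},
]
# Hypothetical lookup from neighborhood name to postal code.
NEIGHBORHOOD_POSTAL_CODE = {"North": "1111", "South": "2222"}


def combine(ads, reports, postal_code_of):
    """Join advertisement counts to police reports on postal code."""
    # Step 1: count advertisements per (postal code, category).
    ad_counts = Counter((ad["postal_code"], ad["category"]) for ad in ads)
    # Step 2: key each police report by the postal code of its neighborhood.
    burglaries_by_code = {
        postal_code_of[r["neighborhood"]]: r["burglaries"] for r in reports
    }
    # Step 3: inner join on postal code, keeping only codes present on both sides.
    rows = []
    for (code, category), n_ads in sorted(ad_counts.items()):
        if code in burglaries_by_code:
            rows.append((code, category, n_ads, burglaries_by_code[code]))
    return rows


if __name__ == "__main__":
    for row in combine(ADS, POLICE_REPORTS, NEIGHBORHOOD_POSTAL_CODE):
        print(row)
```

Each output row is (postal code, category, number of ads, number of burglaries), which is the shape needed to ask whether certain items go together with certain neighborhoods.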

Page 10: Big data groningen

Scale models are an indication of burglary! Check marktplaats.nl to see whether more than 10 advertisements are within a radius of 600 meters!!!!

Maybe marktplaats.nl advertisements can predict... House-pricing trends? Crime? Education level? They have something!!!

If you were asked to build a model for the whole of the Netherlands, what tool would you use?

*dataset too small to make this statement
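The radius check mentioned on this slide (more than 10 advertisements within 600 meters) can be sketched as follows. This is a minimal illustration, assuming each advertisement has a latitude and longitude; the function names, the use of the haversine formula, and the coordinates are assumptions, not the talk's method.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def ads_within_radius(home, ad_locations, radius_m=600):
    """Count advertisements at most radius_m meters away from home."""
    return sum(
        1
        for lat, lon in ad_locations
        if haversine_m(home[0], home[1], lat, lon) <= radius_m
    )


def exceeds_threshold(home, ad_locations, radius_m=600, threshold=10):
    """True if strictly more than `threshold` ads lie within the radius."""
    return ads_within_radius(home, ad_locations, radius_m) > threshold


if __name__ == "__main__":
    home = (53.2194, 6.5665)  # illustrative coordinates only
    ads = [(53.2194, 6.5665), (53.2204, 6.5665), (53.2294, 6.5665)]
    print(ads_within_radius(home, ads))
```

The defaults of 600 meters and 10 advertisements come from the slide; everything else is a placeholder to be replaced with real advertisement locations.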

Page 11: Big data groningen

Try it yourself! (GBs limited)

● Mix with various Services, e.g. Hadoop/NoSQL

● Free Trial & Paid (with Serious Power)

● Made for the App Developer

Page 12: Big data groningen

● Run Spark Online

● (Various) Notebook, to use for Python, Scala, & R

● Free, perfect to start & learn! (examples)

● Made for the Data Scientist

Try it yourself! (GBs limited)

Page 13: Big data groningen

IBM will: “Educate one million data scientists and data engineers on Apache Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize and Big Data University MOOC”

Join us, & start today at the BIG DATA University! https://bigdatauniversity.com/

Spark Hackathon Coming soon in NL!

IBM Wants YOU to learn Spark!

Page 14: Big data groningen

Questions about... Start with Spark? IBM & Spark? Marktplaats.nl?

Code will be on GitHub (after cleaning)

Thank you!

Willem Hendriks
06 2240 8900

Data Scientist IBM

[email protected]
github.com/willemhendriks