TOOLS PRESENTATION
Eddie Aronovich [email protected]
ONCE UPON A TIME
“EVOLUTION OF THE INPUT”
“command line” input
Files
Web crawling (pull)
Web sensors (using API - push)
EVOLUTION OF THE OUTPUT (MULTIPLE DIMENSIONS)
LinkedIn MAP Gapminder
- http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
- http://www.ted.com/talks/nicholas_christakis_the_hidden_influence_of_social_networks.html
API EXAMPLES
Twitter: http://api.twitter.com/1/users/show.json?screen_name=TheMarker
Format the output (json)
https://dev.twitter.com/docs/api/1/get/search
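A minimal sketch of how the user-lookup URL above could be assembled in Python. The helper name build_show_url is hypothetical, and the v1 endpoint from the slide may no longer respond, so the sketch only builds the URL and makes no request.

```python
from urllib.parse import urlencode

# Base endpoint copied from the slide; it may no longer be served.
BASE_URL = "http://api.twitter.com/1/users/show.json"


def build_show_url(screen_name):
    """Return the users/show URL for one screen name (hypothetical helper)."""
    # urlencode escapes characters that are not safe in a query string.
    return BASE_URL + "?" + urlencode({"screen_name": screen_name})


url = build_show_url("TheMarker")
print(url)
```

Building the query string with urlencode rather than string concatenation keeps the URL valid when a parameter contains spaces or other reserved characters.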
FB: /usr/bin/python fbconsole.py
fql("SELECT uid FROM user WHERE username='ariel.bardavid.5'")
https://developers.facebook.com/docs/reference/apis/
PYTHON CODE FOR JSON FORMAT
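import json
from pprint import pprint

# Read the JSON file and pretty-print the parsed structure.
# The with-block closes the file even if parsing fails.
with open('json_data') as json_data:
    data = json.load(json_data)
pprint(data)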
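The snippet above reads JSON from a file. When an API response is already in memory as text, the same module can parse and re-format it directly. This is a sketch only: the sample payload and its field names are made up for illustration and are not taken from a real response.

```python
import json

# Hypothetical sample response; field names and values are assumptions.
raw = '{"screen_name": "TheMarker", "followers_count": 100, "lang": "he"}'

data = json.loads(raw)  # parse the text into a Python dict

# indent gives one key per line; sort_keys makes the order stable.
formatted = json.dumps(data, indent=2, sort_keys=True)

print(formatted)
print(data["screen_name"])
```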
WEB CRAWLING
wget + parser (html2txt)
ETL (Extract, Transform, Load)
Structured vs. Unstructured data
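The crawl-then-parse and ETL steps above can be sketched in one small pipeline: extract text from a fetched page, transform the unstructured text into rows, and load the rows into a table. This is an illustration under stated assumptions, not the presenter's tooling: Python's standard html.parser stands in for html2txt, the page is a made-up inline string rather than a wget download, and the "name: value" line format is hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def extract(html):
    """Extract: HTML in, list of text fragments out."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def transform(chunks):
    """Transform: keep 'name: value' fragments as (name, int) rows."""
    rows = []
    for chunk in chunks:
        name, sep, value = chunk.partition(":")
        if sep and value.strip().isdigit():
            rows.append((name.strip(), int(value.strip())))
    return rows


def load(rows):
    """Load: insert the rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counts (name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO counts VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Made-up page standing in for a file saved by wget.
PAGE = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Report</h1><p>visits: 42</p><p>signups: 7</p>"
    "</body></html>"
)

conn = load(transform(extract(PAGE)))
total = conn.execute("SELECT SUM(value) FROM counts").fetchone()[0]
print(total)
```

Once the data sits in a table it is structured: it can be queried with SQL, whereas the original page could only be searched as text.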
SOME GENERAL TOOLS
Scripting: bash, sed, awk
cron (and scratch space)
Hadoop, Condor
OVERVIEW
Collect Data (and extract it)
Analyze Data
Build a model
Run the model
Collect more data
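The cycle above can be sketched as a loop. Everything here is a stand-in chosen for brevity: the readings are made-up numbers, the "model" is simply the mean of the data seen so far, and the function names are hypothetical.

```python
from statistics import mean


def collect(source, start, size):
    """Collect one batch of observations (here: a slice of a list)."""
    return source[start:start + size]


def build_model(observations):
    """Analyze the data and build a model; the model is just the mean."""
    return mean(observations)


def run_model(model):
    """Run the model: predict the next observation as the current mean."""
    return model


readings = [10, 12, 11, 13, 14, 12]  # made-up data
seen = []
predictions = []

for start in range(0, len(readings), 2):
    seen.extend(collect(readings, start, 2))  # collect (more) data
    model = build_model(seen)                 # analyze and build
    predictions.append(run_model(model))      # run

print(predictions)
```

Each pass through the loop rebuilds the model on all data collected so far, which is the point of the final step: more data feeds back into the next model.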