ODSC UK 2016: How to Analyse Weather Data and Twitter Sentiment with Spark and Watson

Post on 08-Jan-2017

Category:

Data & Analytics

@MargrietGr

Margriet Groenendijk, PhD
Developer Advocate for IBM Cloud Data Services

Open Data Science Conference UK
9 October 2016, London

How To Analyse Weather Data and Twitter Sentiment with Spark and Watson

Analyse Weather Data and Twitter Sentiment using Spark and Watson

People love to talk about the weather on Twitter

What insights can you find when combining the data?

What is the weather sentiment related to?

Bluemix

Where to find the data?

Insights for Twitter

Weather Company Data

API services from IBM Bluemix

https://console.ng.bluemix.net/

Bluemix

Where to store the data?

Available from IBM Bluemix

Cloudant NoSQL DB

Where to analyse the data?

http://datascience.ibm.com

Weather Company Data

Watson Tone Analyser

Tweets, Weather, Sentiment

Exploring the options

Bluemix

IBM Bluemix

▪ Free trial etc
▪ lots of services etc

Free 30-day trial
“Big Blue Box” containing all of IBM's services

https://console.ng.bluemix.net/

Add a service in Bluemix

Add a service

Search for weather

Account + spaces

Weather Company Data for IBM Bluemix

Weather Company Data for IBM Bluemix

1 Add service
2 Credentials
3 Ready to use the REST APIs

Your own weather forecast in a Python notebook

Weather Company Data API

Show json weather file
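
A forecast request from a notebook could look roughly like the sketch below. It is not the code from the talk: the host name, the path layout, and the `dow` and `max_temp` field names are assumptions that should be checked against the service credentials and the REST API docs, and the sample payload is made up.

```python
import json

# Assumed base URL; the real host comes from the service credentials in Bluemix.
BASE_URL = "https://twcservice.mybluemix.net/api/weather/v1"


def forecast_url(lat, lon, product="forecast/daily/10day", units="m", language="en-US"):
    """Build a forecast request URL for a latitude/longitude pair."""
    return "{}/geocode/{:.2f}/{:.2f}/{}.json?units={}&language={}".format(
        BASE_URL, lat, lon, product, units, language
    )


def fetch_forecast(lat, lon, username, password):
    """Request the forecast as a dict (needs the third-party requests package)."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(forecast_url(lat, lon), auth=(username, password))
    response.raise_for_status()
    return response.json()


def daily_max_temps(payload):
    """Pull (day name, max temperature) pairs out of a forecast payload."""
    return [(day.get("dow"), day.get("max_temp")) for day in payload.get("forecasts", [])]


# Hypothetical, heavily trimmed payload in the assumed response shape.
sample = json.loads(
    '{"forecasts": [{"dow": "Friday", "max_temp": 15}, {"dow": "Saturday", "max_temp": 14}]}'
)
london_url = forecast_url(51.51, -0.13)
london_temps = daily_max_temps(sample)
```

With real credentials, `fetch_forecast(51.51, -0.13, username, password)` would replace the hand-written `sample`.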

Your own weather forecast in a Python notebook

London

https://developer.ibm.com/clouddataservices/2016/10/06/your-own-weather-forecast-in-a-python-notebook/

Weather in UK on Friday evening 7 October

Store weather data in Cloudant

Python script run daily on a Bluemix VM service

https://python-cloudant.readthedocs.io
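
A daily storage script built on python-cloudant might be shaped like this. It is a minimal sketch under assumptions, not the script from the talk: the database name, the document layout and the `city:date` `_id` scheme are hypothetical.

```python
import datetime


def make_weather_doc(city, observation, when=None):
    """Wrap one weather observation in a document with a date-based _id."""
    when = when or datetime.datetime.now(datetime.timezone.utc)
    doc = {"_id": "{}:{}".format(city, when.strftime("%Y-%m-%d")), "city": city}
    doc["observation"] = observation
    return doc


def store_docs(docs, username, password, account, db_name="weather"):
    """Save documents to Cloudant (needs the python-cloudant package)."""
    from cloudant.client import Cloudant  # third-party; imported lazily

    client = Cloudant(username, password, account=account, connect=True)
    try:
        # Reuse the database if it already exists.
        db = client.create_database(db_name, throw_on_exists=False)
        for doc in docs:
            db.create_document(doc)
    finally:
        client.disconnect()


# Hypothetical observation for one city on one day.
doc = make_weather_doc("London", {"temp": 14}, datetime.datetime(2016, 10, 7, 23, 0))
```

Keying the `_id` on city and date means a second run on the same day would be rejected as a conflict rather than silently storing a duplicate.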

Python script run daily on a Bluemix VM service

Add crontab job to run daily

Cloudant

▪ cloudant

▪ etc

▪ geospatial index
▪ show map with 100 cities :-)

Weather Company Data

Watson Tone Analyser

Tweets, Weather, Sentiment

Insights for Twitter

Insights for Twitter

Only a 100…

dashDB

Add the dashDB service in Bluemix

Add a service

Search for dashDB

3 Use an existing service

1

2

posted:2016-08-01,2016-10-01 followers_count:3000 friends_count:3000 (weather OR sun OR sunny OR rain OR hail OR storm OR rainy OR drought OR flood OR hurricane OR tornado OR cold OR snow OR drizzle OR cloudy OR thunder OR lightning OR wind OR windy OR heatwave)

REST API docs: https://new-console.ng.bluemix.net/docs/services/Twitter/twitter_rest_apis.html#rest_apis

Search for tweets

4 Select table
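
The search string on this slide can also be assembled in code, which helps when the term list grows. The sketch below only builds the query and a request URL. The endpoint and the `q` and `size` parameter names are assumptions to verify against the REST API docs linked above.

```python
from urllib.parse import urlencode

# Assumed endpoint; the real host and path come from the service credentials.
SEARCH_URL = "https://cdeservice.mybluemix.net/api/v1/messages/search"


def build_query(terms, start, end, followers=None, friends=None):
    """Combine a date range, optional count filters and OR-ed search terms."""
    parts = ["posted:{},{}".format(start, end)]
    if followers is not None:
        parts.append("followers_count:{}".format(followers))
    if friends is not None:
        parts.append("friends_count:{}".format(friends))
    parts.append("(" + " OR ".join(terms) + ")")
    return " ".join(parts)


def search_url(query, size=100):
    """URL-encode the query into a search request URL."""
    return SEARCH_URL + "?" + urlencode({"q": query, "size": size})


query = build_query(["weather", "sun", "rain"], "2016-08-01", "2016-10-01", 3000, 3000)
url = search_url(query)
```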

Weather Company Data

Watson Tone Analyser

Tweets, Weather, Sentiment

Explore the data

IBM Data Science Experience

Nested data…

Load tweets from dashDB with Spark SQL
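
Loading a dashDB table into Spark usually goes through JDBC. The sketch below assumes the standard DB2 JDBC driver class and the default `BLUDB` database on port 50000. The host, user and table name are hypothetical, and the notebook code from the talk may differ.

```python
def jdbc_options(host, user, password, port=50000, database="BLUDB"):
    """Build the JDBC URL and connection properties for a dashDB (DB2) instance."""
    url = "jdbc:db2://{}:{}/{}".format(host, port, database)
    properties = {
        "user": user,
        "password": password,
        "driver": "com.ibm.db2.jcc.DB2Driver",
    }
    return url, properties


def load_tweets(spark, table, host, user, password):
    """Read a dashDB table into a Spark DataFrame over JDBC.

    `spark` is a SparkSession (or a SQLContext on Spark 1.x); both expose read.jdbc.
    """
    url, properties = jdbc_options(host, user, password)
    return spark.read.jdbc(url=url, table=table, properties=properties)


# Hypothetical connection values, for illustration only.
url, props = jdbc_options("example-host.bluemix.net", "dash1234", "secret")
```

In a notebook, `load_tweets(spark, "DASH1234.TWEETS", host, user, password)` would return a DataFrame that can be registered as a temporary table and queried with Spark SQL.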

Clean data, summarise and load into pandas DataFrame

Add weather to tweets

Weather data is nested, pyspark.sql struggles with that
There is no location data of tweets

Only 10% of all tweets available in the free plan through the Decahose stream
Weather API only has 24 hours of data available
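
One possible workaround for the nesting problem is to flatten each weather record in plain Python before handing it to Spark or pandas. This is a sketch of that idea, not the approach taken in the talk, and the field names in the example record are hypothetical.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining key paths with `sep`."""
    flat = {}
    for key, value in record.items():
        name = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key path along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested weather record.
nested = {"city": "London", "observation": {"temp": 14, "wind": {"speed": 9, "dir": "SW"}}}
flat = flatten(nested)
```

Lists are left untouched here; a record with arrays of forecasts would need one flattened row per array element.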

Weather Company Data

Watson Tone Analyser

Tweets, Weather, Sentiment
X

Weather Company Data

crontab -e

0 23 * * * /path/to/file/do_something.sh

python do_something.py

Tweets, Weather, Sentiment

Watson Tone Analyser

Add sentiment - example
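
Scoring a tweet with the Tone Analyser could be sketched as below. The endpoint, the version date and the response layout (`document_tone`, `tone_categories`, `tones`) are assumptions based on the 2016-era REST API and should be checked against the service docs. The sample response is hand-written, not real output.

```python
# Assumed endpoint and version date for the 2016-era Tone Analyzer REST API.
TONE_URL = "https://gateway.watsonplatform.net/tone-analyzer/api/v3/tone"
API_VERSION = "2016-05-19"


def analyse_tone(text, username, password):
    """Send text to the service (needs the third-party requests package)."""
    import requests  # imported lazily so the parsing helpers work without it

    response = requests.post(
        TONE_URL,
        params={"version": API_VERSION},
        json={"text": text},
        auth=(username, password),
    )
    response.raise_for_status()
    return response.json()


def emotion_scores(result):
    """Map emotion tone ids to scores, ignoring the other tone categories."""
    scores = {}
    for category in result.get("document_tone", {}).get("tone_categories", []):
        if category.get("category_id") == "emotion_tone":
            for tone in category.get("tones", []):
                scores[tone["tone_id"]] = tone["score"]
    return scores


def strongest_emotion(result):
    """Return the emotion with the highest score, or None if there is none."""
    scores = emotion_scores(result)
    return max(scores, key=scores.get) if scores else None


# Hand-written response in the assumed shape, for illustration only.
sample_result = {
    "document_tone": {
        "tone_categories": [
            {"category_id": "emotion_tone",
             "tones": [{"tone_id": "joy", "score": 0.71},
                       {"tone_id": "sadness", "score": 0.12}]},
            {"category_id": "language_tone",
             "tones": [{"tone_id": "analytical", "score": 0.3}]},
        ]
    }
}
top = strongest_emotion(sample_result)
```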

#Matthew

3 Use an existing service

1

2

posted:2016-08-26,2016-10-06 followers_count:1000 friends_count:1000 (matthew OR hurricane matthew OR hurricane)

REST API docs: https://new-console.ng.bluemix.net/docs/services/Twitter/twitter_rest_apis.html#rest_apis

#matthew tweets

4 Select table

Some lessons learned

APIs are great!
Can extend and build on this, as all data is in the Cloud

Weather data is only available for 24 hours: great for weather apps, but harder to combine with historical tweets, so a daily script is needed

Now ready to build a more efficient workflow that can easily handle millions of tweets
Start a more in-depth analysis in the Data Science Experience

▪ analyse data!
▪ pretty plots

https://github.com/ibm-cds-labs/pixiedust

Margriet Groenendijk, PhD
Developer Advocate for IBM Cloud Data Services

https://developer.ibm.com/clouddataservices/author/mgroenen/

Thank you!

Slides will be available on http://www.slideshare.net/MargrietGroenendijk
