Collecting and analyzing sensor data with Hadoop or other NoSQL databases
Scouting how-to: collecting and analyzing sensor data with Hadoop or other NoSQL databases
Collecting and Analyzing sensor data
Big data with Hadoop or other NoSQL databases
Who am I
I am an Open Source enthusiast!
matteo DOT redaelli AT gmail DOT com
http://www.redaelli.org/matteo/
Hadoop ecosystem (1 of 2)
● HDFS is the distributed file system of Hadoop: data is usually stored as text/CSV files (rows are distributed across the cluster)
● Hive is the data warehouse of Hadoop
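As a minimal sketch of the text/CSV layout mentioned above, the following Python snippet serializes a few sensor readings as CSV rows, the kind of file that could then be copied into HDFS. The field names (sensor_id, timestamp, temperature) and the sample values are assumptions made for illustration; they do not come from the slides.

```python
import csv
import io

# Hypothetical sensor readings; field names and values are illustrative only.
READINGS = [
    {"sensor_id": "s1", "timestamp": "2014-06-01T10:00:00", "temperature": 21.5},
    {"sensor_id": "s2", "timestamp": "2014-06-01T10:00:00", "temperature": 19.0},
    {"sensor_id": "s1", "timestamp": "2014-06-01T10:01:00", "temperature": 21.7},
]

FIELDS = ["sensor_id", "timestamp", "temperature"]


def to_csv(readings):
    """Serialize readings as CSV text, one row per reading, without a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    for reading in readings:
        writer.writerow(reading)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(READINGS), end="")
```

Because every row is self-contained, a file like this can be split by line, which fits the way rows are distributed across the cluster.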
Hadoop ecosystem (2 of 2)
http://www.adaltas.com/blog/2013/02/08/hadoop-2013-ecosystem/
http://hadoopecosystemtable.github.io/
http://wikibon.org/wiki/v/HBase,_Sqoop,_Flume_and_More:_Apache_Hadoop_Defined
Storing: Hadoop HDFS
http://hortonworks.com/hadoop-tutorial/how-to-analyze-machine-and-sensor-data/
Analysis over Hadoop
(Hadoop) Hive (SQL-like)
Map & Reduce:
● Cascading and Cascalog
● Apache Pig
● R
Apache Spark (in-memory) data processor
Apache Storm (from Twitter) for realtime computing
Apache Drill from MapR
Apache Samza (from LinkedIn)
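To illustrate the Map & Reduce idea behind the tools listed above, here is a minimal single-process sketch in plain Python that computes the average temperature per sensor. It does not use Hadoop or any of the listed frameworks; the row layout (sensor_id,timestamp,temperature) and all function names are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical CSV rows: sensor_id,timestamp,temperature
ROWS = [
    "s1,2014-06-01T10:00:00,21.0",
    "s2,2014-06-01T10:00:00,19.0",
    "s1,2014-06-01T10:01:00,23.0",
]


def map_row(row):
    """Map step: turn one CSV row into a (key, value) pair."""
    sensor_id, _timestamp, temperature = row.split(",")
    return sensor_id, float(temperature)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_values(sensor_id, values):
    """Reduce step: collapse all values of one key into their average."""
    return sensor_id, sum(values) / len(values)


def average_per_sensor(rows):
    groups = shuffle(map_row(row) for row in rows)
    return dict(reduce_values(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(average_per_sensor(ROWS))
```

In a real cluster the map and reduce steps would run in parallel on different nodes; here they run sequentially only to show the data flow.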
Hadoop & Cloud
Amazon:
● Amazon Redshift (data warehouse)
● Amazon S3 + EMR (Hadoop MapR)
● Amazon Kinesis for realtime processing
Google Cloud & Hadoop
Hadoop evolution (1 of 2)
Hadoop evolution (2 of 2)
Hadoop top distributions: Cloudera
Hadoop top distributions: Hortonworks
Hadoop top distributions: MapR
Hadoop alternatives: Cassandra
Cassandra from Facebook
● http://www.slideshare.net/patrickmcfadin/time-series-with-apache-cassandra-strata
Hadoop alternatives: MongoDB
MongoDB
http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
Hadoop alternatives: Riak
Riak
http://docs.basho.com/riak/1.2.1/cookbooks/use-cases/sensor-data/
Hadoop alternatives: Kafka + Storm
Apache Kafka (from LinkedIn) for aggregating data
Apache Storm (from Twitter) for realtime computing
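As a rough illustration of what realtime computing over a stream of sensor readings can look like, the sketch below computes the maximum value per fixed (tumbling) time window. It is plain Python and does not use the Kafka or Storm APIs; the event layout (epoch_seconds, value), the function name, and the sample data are all assumptions for the example.

```python
# Hypothetical stream of (epoch_seconds, value) events, ordered by time.
EVENTS = [(0, 20.0), (30, 22.5), (61, 21.0), (119, 19.5), (120, 25.0)]


def tumbling_window_max(events, window_seconds):
    """Yield (window_start, max_value) for consecutive fixed-size windows.

    Events are assumed to arrive in time order; a window is emitted as soon
    as the first event of a later window is seen.
    """
    current_start = None
    current_max = None
    for timestamp, value in events:
        start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start, current_max = start, value
        elif start != current_start:
            yield current_start, current_max
            current_start, current_max = start, value
        else:
            current_max = max(current_max, value)
    if current_start is not None:
        # Flush the last, still-open window when the stream ends.
        yield current_start, current_max


if __name__ == "__main__":
    for window_start, maximum in tumbling_window_max(EVENTS, 60):
        print(window_start, maximum)
```

The generator processes one event at a time and keeps only the state of the current window, which is the property that lets this kind of computation run continuously on an unbounded stream.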
Alternatives: timeseries databases
OpenTSDB (on Hadoop HBase)
InfluxDB
KairosDB (on Cassandra or Hadoop HBase)
References
http://hadoop.apache.org/
http://www.slideshare.net/Datadopter/lambdoop-a-framework-for-easy-development-of-big-data-applications
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
http://2014.nosql-matters.org/cgn/wp-content/uploads/2014/05/Aerospike_NoSQL_Matters-Brian-B.pdf
http://www.slideshare.net/KaiWaehner/2014-05-jaxdwhvsbigdatavsrealtime
http://www.infoq.com/articles/stream-processing-hadoop