Collecting and Analyzing Sensor Data
Big Data with Hadoop or other NoSQL databases
Who am I?
I am an Open Source enthusiast!
matteo DOT redaelli AT gmail DOT com
http://www.redaelli.org/matteo/
Hadoop ecosystem (1 of 2)
● HDFS is the distributed file system of Hadoop: data are usually stored as text/CSV files, which are split into blocks spread across the nodes of the cluster (so the rows of a file are distributed in the cluster)
● Hive is the data warehouse of Hadoop, queried with a SQL-like language (HiveQL)
Hadoop ecosystem (2 of 2)
http://www.adaltas.com/blog/2013/02/08/hadoop-2013-ecosystem/
http://hadoopecosystemtable.github.io/
http://wikibon.org/wiki/v/HBase,_Sqoop,_Flume_and_More:_Apache_Hadoop_Defined
Storing: Hadoop HDFS
http://hortonworks.com/hadoop-tutorial/how-to-analyze-machine-and-sensor-data/
Analysis over Hadoop
● Hive (SQL-like)
● Map & Reduce: Cascading and Cascalog, Apache Pig, R
● Apache Spark: in-memory data processor
● Apache Storm (from Twitter) for real-time computing
● Apache Drill (from MapR)
● Apache Samza (from LinkedIn)
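The Map & Reduce model can be illustrated without a cluster. The following is a minimal local sketch in plain Python (not the Hadoop, Pig or Cascading API) that computes the average reading per sensor from CSV rows like those stored in HDFS. The column layout `sensor_id,timestamp,value` and the sample rows are hypothetical, chosen only for illustration.

```python
import csv
import io
from collections import defaultdict
from collections.abc import Iterable, Iterator

# Hypothetical CSV layout assumed for illustration: sensor_id,timestamp,value
SAMPLE = """s1,2014-05-01T10:00:00,20.5
s2,2014-05-01T10:00:00,18.0
s1,2014-05-01T10:01:00,21.5
s2,2014-05-01T10:01:00,19.0
"""


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Emit one (key, value) pair per CSV row: the sensor id and its reading."""
    for sensor_id, _timestamp, value in csv.reader(lines):
        yield sensor_id, float(value)


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[float]]) -> dict[str, float]:
    """Collapse each key's values into a single result: the mean reading."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def average_per_sensor(text: str) -> dict[str, float]:
    return reduce_phase(shuffle(map_phase(io.StringIO(text))))


if __name__ == "__main__":
    print(average_per_sensor(SAMPLE))  # {'s1': 21.0, 's2': 18.5}
```

On a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle; the structure of the computation stays the same.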
Hadoop & Cloud
Amazon:
● Amazon Redshift (data warehouse)
● Amazon S3 + EMR (Hadoop, MapR)
● Amazon Kinesis for real-time processing
Google Cloud & Hadoop
Hadoop evolution (1 of 2)
Hadoop evolution (2 of 2)
Hadoop top distributions: Cloudera
Hadoop top distributions: Hortonworks
Hadoop top distributions: MapR
Hadoop alternatives: Cassandra
Cassandra from Facebook
● http://www.slideshare.net/patrickmcfadin/time-series-with-apache-cassandra-strata
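A pattern commonly used for time series in Cassandra is to partition rows by sensor and day and keep readings sorted by timestamp inside each partition, so a time-range query reads one contiguous slice. The sketch below is an assumption for illustration, not taken from the linked talk: it simulates that layout with an in-memory Python class (`TimeSeriesStore` is a hypothetical name) and does not use the Cassandra driver or CQL.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class TimeSeriesStore:
    """Hypothetical in-memory stand-in for a table partitioned by
    (sensor_id, day) and clustered by event timestamp."""

    def __init__(self) -> None:
        # partition key -> list of (timestamp, value), kept sorted by timestamp
        self._partitions: dict[tuple[str, str], list[tuple[datetime, float]]] = defaultdict(list)

    @staticmethod
    def partition_key(sensor_id: str, ts: datetime) -> tuple[str, str]:
        """Bucket rows per sensor per day so no partition grows without bound."""
        return sensor_id, ts.strftime("%Y-%m-%d")

    def insert(self, sensor_id: str, ts: datetime, value: float) -> None:
        rows = self._partitions[self.partition_key(sensor_id, ts)]
        bisect.insort(rows, (ts, value))  # keep rows ordered by time

    def range_query(self, sensor_id: str, start: datetime, end: datetime) -> list[tuple[datetime, float]]:
        """Return readings with start <= ts < end.

        Assumes start and end fall on the same day, i.e. in one partition;
        a multi-day query would repeat this for each day's partition.
        """
        rows = self._partitions.get(self.partition_key(sensor_id, start), [])
        lo = bisect.bisect_left(rows, (start,))  # first row with ts >= start
        hi = bisect.bisect_left(rows, (end,))    # first row with ts >= end
        return rows[lo:hi]


if __name__ == "__main__":
    store = TimeSeriesStore()
    store.insert("s1", datetime(2014, 5, 1, 10, 0), 20.5)
    store.insert("s1", datetime(2014, 5, 1, 9, 0), 19.0)
    store.insert("s1", datetime(2014, 5, 2, 9, 0), 22.0)
    print(store.range_query("s1", datetime(2014, 5, 1), datetime(2014, 5, 1, 12, 0)))
```

Out-of-order inserts end up sorted within their partition, and the reading from the next day lands in a separate partition.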
Hadoop alternatives: MongoDB
MongoDB
http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
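One way to model time series in a document store is to bucket readings: one document per sensor per hour, with readings embedded under a per-minute field and running counters updated on each write, instead of one document per reading. The sketch below assumes that pattern and uses plain Python dicts to stand in for documents and for the collection; it does not use the pymongo API, and the `_id` format and field names (`values`, `num_samples`, `total`) are hypothetical.

```python
from datetime import datetime


def bucket_id(sensor_id: str, ts: datetime) -> str:
    """Hypothetical _id: one document per sensor per hour."""
    return f"{sensor_id}:{ts.strftime('%Y%m%d%H')}"


def upsert_reading(collection: dict[str, dict], sensor_id: str, ts: datetime, value: float) -> dict:
    """Add a reading to its hourly bucket, creating the bucket on first write.

    Mimics the effect of an upsert that appends a value and increments
    counters, using a plain dict as the "collection".
    """
    doc = collection.setdefault(bucket_id(sensor_id, ts), {
        "sensor_id": sensor_id,
        "hour": ts.replace(minute=0, second=0, microsecond=0),
        "values": {},       # minute (as a string field name) -> list of readings
        "num_samples": 0,
        "total": 0.0,
    })
    doc["values"].setdefault(str(ts.minute), []).append(value)
    doc["num_samples"] += 1
    doc["total"] += value
    return doc


if __name__ == "__main__":
    coll: dict[str, dict] = {}
    upsert_reading(coll, "s1", datetime(2014, 5, 1, 10, 0), 20.0)
    upsert_reading(coll, "s1", datetime(2014, 5, 1, 10, 30), 22.0)
    doc = upsert_reading(coll, "s1", datetime(2014, 5, 1, 11, 5), 25.0)
    hour_10 = coll["s1:2014050110"]
    print(hour_10["total"] / hour_10["num_samples"])  # 21.0
```

Keeping running totals in the bucket lets an hourly average be read from a single document without scanning the individual readings.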
Hadoop alternatives: Riak
Riak
http://docs.basho.com/riak/1.2.1/cookbooks/use-cases/sensor-data/
Hadoop alternatives: Kafka + Storm
Apache Kafka (from LinkedIn) for collecting and aggregating data streams
Apache Storm (from Twitter) for real-time computing
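The kind of real-time computation this pairing targets can be sketched as a function applied to each event as it arrives. In the minimal sketch below, a plain Python iterable stands in for a Kafka topic and the generator plays the role of a Storm processing step, emitting a per-sensor moving average over the last few readings. No Kafka or Storm API is used, and the window size of 3 is an arbitrary assumption.

```python
from collections import defaultdict, deque
from collections.abc import Iterable, Iterator


def rolling_average(events: Iterable[tuple[str, float]], window: int = 3) -> Iterator[tuple[str, float]]:
    """For each (sensor_id, value) event, emit the mean of that sensor's
    last `window` readings, updating state incrementally."""
    recent: dict[str, deque] = defaultdict(lambda: deque(maxlen=window))
    for sensor_id, value in events:
        buf = recent[sensor_id]
        buf.append(value)  # deque drops the oldest reading once full
        yield sensor_id, sum(buf) / len(buf)


if __name__ == "__main__":
    stream = [("s1", 10.0), ("s1", 20.0), ("s2", 5.0), ("s1", 30.0), ("s1", 40.0)]
    for sensor_id, avg in rolling_average(stream):
        print(sensor_id, avg)
```

Because state is kept per sensor and updated one event at a time, results are available immediately instead of after a batch job over the stored data.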
Alternatives: time-series databases
OpenTSDB (on Hadoop HBase)
InfluxDB
KairosDB (on Cassandra or Hadoop HBase)
References
http://hadoop.apache.org/
http://www.slideshare.net/Datadopter/lambdoop-a-framework-for-easy-development-of-big-data-applications
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
http://2014.nosql-matters.org/cgn/wp-content/uploads/2014/05/Aerospike_NoSQL_Matters-Brian-B.pdf
http://www.slideshare.net/KaiWaehner/2014-05-jaxdwhvsbigdatavsrealtime
http://www.infoq.com/articles/stream-processing-hadoop