Collecting and analyzing sensor data with Hadoop or other NoSQL databases

Post on 01-Dec-2014

Category:

Technology

DESCRIPTION

Scouting how to collect and analyze sensor data with Hadoop or other NoSQL databases

TRANSCRIPT

  • 1. Collecting and analyzing sensor data: big data with Hadoop or other NoSQL databases
  • 2. Who am I? I am an open source enthusiast! matteo DOT redaelli AT gmail DOT com, http://www.redaelli.org/matteo/
  • 3. Hadoop ecosystem (1 of 2): HDFS is the distributed file system of Hadoop; data are usually stored as text/CSV files, which are split into blocks so that their rows end up distributed across the cluster. Hive is the data warehouse of Hadoop.
  • 4. Hadoop ecosystem (2 of 2): http://www.adaltas.com/blog/2013/02/08/hadoop-2013-ecosystem/ http://hadoopecosystemtable.github.io/ http://wikibon.org/wiki/v/HBase,_Sqoop,_Flume_and_More:_Apache_Hadoop_Defined
  • 5. Collecting: Apache Flume, from Cloudera
  • 6. Storing: Hadoop HDFS http://hortonworks.com/hadoop-tutorial/how-to-analyze-machine-and-sensor-data/
  • 7. Analysis over Hadoop: (Hadoop) Hive (SQL-like); Map & Reduce: Cascading and Cascalog; Apache Pig; R; Apache Spark (in-memory data processor); Apache Storm (from Twitter) for real-time computing; Apache Drill (from MapR); Apache Samza (from LinkedIn)
  • 8. Hadoop & Cloud: Amazon: Amazon Redshift (data warehouse), Amazon S3 + EMR (Hadoop MapR), Amazon Kinesis for real-time processing; Google Cloud & Hadoop
  • 9. Hadoop evolution (1 of 2)
  • 10. Hadoop evolution (2 of 2)
  • 11. Hadoop top distributions: Cloudera
  • 12. Hadoop top distributions: Hortonworks
  • 13. Hadoop top distributions: MapR
  • 14. Hadoop alternatives: Cassandra. Cassandra (from Facebook) http://www.slideshare.net/patrickmcfadin/time-series-with-apache-cassandra-strata
  • 15. Hadoop alternatives: MongoDB. MongoDB http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
  • 16. Hadoop alternatives: Riak. Riak http://docs.basho.com/riak/1.2.1/cookbooks/use-cases/sensor-data/
  • 17. Hadoop alternatives: Kafka + Storm. Apache Kafka (from LinkedIn) for aggregating; Apache Storm (from Twitter) for real-time computing
  • 18. Alternatives: time-series databases: OpenTSDB (Hadoop HBase); InfluxDB; KairosDB (Cassandra, Hadoop HBase)
  • 19. References http://hadoop.apache.org/ http://www.slideshare.net/Datadopter/lambdoop-a-framework-for-easy-development-of-big-data-applications http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis http://2014.nosql-matters.org/cgn/wp-content/uploads/2014/05/Aerospike_NoSQL_Matters-Brian-B.pdf http://www.slideshare.net/KaiWaehner/2014-05-jaxdwhvsbigdatavsrealtime http://www.infoq.com/articles/stream-processing-hadoop
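
Slides 3 and 7 describe sensor data stored as text/CSV rows and analyzed with Map & Reduce. The idea can be sketched in plain Python, without any Hadoop dependency, as a map step that keys each reading by sensor and hour and a reduce step that averages each group. This is only an illustrative sketch: the row layout (sensor_id, timestamp, value), the sample readings, and the function names are assumptions, not taken from the slides.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: sensor_id,timestamp,value (layout is an assumption).
SAMPLE_CSV = """\
s1,2014-12-01T10:05:00,20.0
s1,2014-12-01T10:35:00,22.0
s2,2014-12-01T10:15:00,18.5
s1,2014-12-01T11:00:00,21.0
"""


def map_reading(row):
    """Map step: turn one CSV row into ((sensor_id, hour), value)."""
    sensor_id, timestamp, value = row
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    hour = parsed.strftime("%Y-%m-%dT%H")  # truncate the timestamp to the hour
    return (sensor_id, hour), float(value)


def reduce_mean(values):
    """Reduce step: average all values that share one key."""
    return sum(values) / len(values)


def hourly_means(csv_text):
    """Group mapped readings by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        key, value = map_reading(row)
        groups[key].append(value)
    return {key: reduce_mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    for key, mean in sorted(hourly_means(SAMPLE_CSV).items()):
        print(key, mean)
```

On a real cluster the grouping step would be done by the framework across nodes; here a dictionary stands in for it so the example stays self-contained.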
