big data simplified
BIG DATA simplified!
Pravin Hanchinal
pravinhanchinal.com
Big Data
Big Data is a cluster of many technologies and tools that are used in various scenarios.
(Hadoop + HDFS + HCatalog + Flume + PowerView)
(Hortonworks + PowerView)
How Big is Big Data?
Byte of data: one grain of rice
Kilobyte: cup of rice
Megabyte: 8 bags of rice
Gigabyte: 3 container lorries
Terabyte: 2 container ships
Petabyte: covers Mumbai
Exabyte: covers India
Zettabyte: fills Indian Ocean
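To put numbers on the analogy, here is a minimal Python sketch. It assumes the decimal (SI) convention, where each unit is 1,000 times the previous one, rather than the binary (1,024-based) convention. The names UNITS and bytes_in are illustrative and do not come from the slides.

```python
# Decimal (SI) data units; each step is 1,000 times the previous one.
# Assumption: SI prefixes, not binary (1,024-based) units.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte", "zettabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`, using powers of 1,000."""
    return 1000 ** UNITS.index(unit)


if __name__ == "__main__":
    for unit in UNITS:
        print(f"1 {unit:<9} = {bytes_in(unit):,} bytes")
```

Under this convention a zettabyte is 10^21 bytes, which is why the analogy jumps from a grain of rice to an ocean in only seven steps.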
MapReduce
• MapReduce is a processing technique and a programming model for distributed computing, based on Java.
• The MapReduce algorithm contains two important tasks, namely Map and Reduce.
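The two tasks can be illustrated with a word count, the usual first MapReduce example. What follows is a rough single-machine simulation in plain Python, not Hadoop's Java API. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the shuffle step stands in for the grouping a real framework performs between Map and Reduce.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data simplified", "big data tools"]))
```

In a real cluster the map calls run in parallel on different nodes and the grouped keys are split across several reducers. The sketch keeps everything in one process so the data flow is easy to follow.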
Why Hadoop?
-> Hadoop modeling and development: MapReduce, Pig, Mahout
-> Hadoop storage and data management: HDFS, HBase, Cassandra
-> Hadoop data warehousing, summarization and query: Hive, Sqoop
-> Hadoop data collection, aggregation and analysis: Chukwa, Flume
-> Hadoop metadata, table and schema management: HCatalog
-> Hadoop cluster management, job scheduling and workflow: ZooKeeper, Oozie and Ambari
-> Hadoop data serialization: Avro
Stay connected
pravinhanchinal.com
Resources
http://pravinhanchinal.com/what-is-for-what-hadoop-tools
https://blog.cloudera.com/blog/2014/01/how-to-create-a-simple-hadoop-cluster-with-virtualbox/
http://pingax.com/install-apache-hadoop-ubuntu-cluster-setup/
https://de.slideshare.net/EdurekaIN/ha-webinar-48976388
https://ayende.com/blog/4435/map-reduce-a-visual-explanation
MultiNode on Amazon: https://dzone.com/articles/how-set-multi-node-hadoop
Run Sample MapReduce Examples:
MapReduce examples: http://www.informit.com/articles/article.aspx?p=2190194&seqNum=3
https://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/