Introduction to Hadoop at Data-360 Conference

Introduction to Hadoop, Avkash Chauhan (@avkashchauha)

Upload: avkash-chauhan

Post on 22-Jan-2015

Category: Technology


DESCRIPTION

A short introduction to Hadoop, mostly through live industry examples and scenarios.

TRANSCRIPT

2. http://www.packtpub.com/using-cloudera-impala/book
   http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802
   https://www.linkedin.com/in/avkashchauhan

3. Hadoop is an open source (Java-based), scalable, fault-tolerant platform for storing and processing large amounts of unstructured data, distributed across machines.

4. Flexibility: a single repository for storing and analyzing any kind of data, not bound by a schema.
   Scalability: a scale-out architecture divides the workload across multiple nodes using a flexible distributed file system.
   Low cost: deployed on commodity hardware and an open source platform.
   Fault tolerant: continues working even if node(s) go down.

5. A system that moves computation to where the data is.

6. Hadoop Common, HDFS, Map/Reduce

7. Hadoop Common, HDFS, MapReduce

8. Cloudera Impala, Hortonworks Tez. Impala uses C++-based in-memory processing of HDFS data through SQL-like statements to expedite data processing. Use cases include user collaborative filtering, user recommendations, clustering and classification.
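
The slides name MapReduce as a core Hadoop component without showing what a job looks like. As a rough illustration only, the sketch below imitates the map, shuffle, and reduce steps of a word count in plain Python. It does not use the Hadoop API and runs on a single machine; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example, not part of the talk.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; in Hadoop the lines would come from HDFS blocks.
    sample = ["hadoop stores data", "hadoop processes data"]
    print(word_count(sample))
```

In a real cluster the map calls would run on the nodes that already hold each block of input, which is the "move computation to where the data is" idea from slide 5; here everything runs in one process purely to show the data flow.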