big data and hadoop for developers - syllabus.docx


Upload: vkbm42

Post on 12-Sep-2015


Big Data and Hadoop for Developers - Level 1

Description

Gartner predicts that 4.4 million jobs will be created globally to support Big Data.

Big Data is a popular term used to describe the exponential growth, availability and use of information, both structured and unstructured. It is imperative that organizations and IT leaders focus on the ever-increasing volume, variety and velocity of information that forms Big Data.

Hadoop is the core platform for structuring Big Data, and solves the problem of making it useful for analytics. Our course will teach you all you need to learn about using Hadoop for Big Data analysis and give you a clear understanding of processing Big Data with Hadoop.

Why learn about processing Big Data with Hadoop?

- Businesses are now aware of the large volumes of data that they generate in their day-to-day transactions. They have also realized that this Big Data can provide very valuable insights once analyzed.
- The massive volume of Big Data and its unstructured format make it difficult to analyze. Hadoop brings the ability to cheaply process large amounts of data, regardless of structure.
- If you are an IT professional who wants to stay up to date with the current buzzword, then this is the course for you.
- Knowledge about processing Big Data with Hadoop will also prove to be a huge resume builder for students who will be trying for placements soon.
- If you are a developer who is uncertain about how Hadoop works, this course will clear things up and save you a lot of time and effort.
- If you are a business that is planning to shift to Hadoop, then this is the right course for your employees to get trained.
- Processing Big Data with Hadoop will prove to be an answer to many questions at once.
- The session will be handled by very experienced trainers who combine deep knowledge with valuable hands-on experience.

Objectives

- What Hadoop is and how it can help process large data sets.
- How to write MapReduce programs using the Hadoop API.
- How to use HDFS (the Hadoop Distributed File System), from the command line and the API, for effectively loading and processing data in Hadoop.
- How to ingest data from an RDBMS or a data warehouse into Hadoop.
- Best practices for building, debugging and optimizing Hadoop solutions.
- Get introduced to tools like Pig, Hive, HBase, Elastic MapReduce etc. and understand how they can help in Big Data projects.

Who should attend

- A developer who wants to learn Hadoop but doesn't know where to start.
- A team that is struggling to extract insights from large-scale and fast-growing data in traditional systems.
- A team that has decided to migrate from an RDBMS or a traditional data warehouse to Hadoop, but needs help getting started.

Course Outline

Day 1 and 2

Introduction
  Big Data
    - What is Big Data?
    - Trends across industries.
    - Opportunities to disrupt business models across industries.
    - Industry-specific use cases.
    - Some brief case studies.
  Data Science
    - An emerging new discipline.
    - Skills required to be a Data Scientist.
  Hadoop
    - What is Hadoop?
    - Why do we need a new tool? / Motivations for Hadoop
    - A comparison with traditional databases (RDBMS) and data warehouses.
    - Data Hub/Lake/Reservoir: The role of Hadoop in a modern data architecture.
    - Apache Hadoop
    - Distributions including Hadoop: Cloudera, Hortonworks, MapR, IBM, Pivotal and Intel.
    - An overview of a typical Hadoop cluster.
  Hadoop Deployment
    - Commodity Hardware
    - Hadoop Appliances
    - Hadoop on the Cloud
    - Hadoop as a Service
  Lab: Install and configure a multi-node Hadoop cluster with Ambari

Data Storage
  - File System Abstraction
  - Big Data and Distributed File Systems
  Hadoop Distributed File System (HDFS)
    - HDFS Architecture
    - Architectural assumptions and goals
    - How data is stored in HDFS
    - How data is read from HDFS
    - Namenodes and Datanodes
    - Blocks
    - Data Replication
    - Fault Tolerance
    - Data Integrity
    - Namespaces
    - Federation in Hadoop 2.0
    - High Availability in Hadoop 2.0
    - Security and Encryption
    - HDFS Interfaces: FileSystem API, FS Shell, WebHDFS, FUSE etc.
  Lab: Manipulating files in HDFS using hadoop fs commands.
  Lab: Manipulating files in HDFS programmatically using the FileSystem API.
  - Alternative Hadoop File Systems: IBM GPFS, MapR-FS, Lustre, Amazon S3 etc.

Data Processing
  MapReduce
    - The fundamentals: map() and reduce()
    - Data Locality
    - Architecture of the MapReduce framework.
    - Phases of a MapReduce Job
  Lab: Write a simple log analysis MapReduce application
    - Job Execution
    - Partitioners
    - Combiners
    - The flow of key/value pairs in a MapReduce Job
  Lab: Write an Inverted Index MapReduce Application with custom Partitioner and Combiner
    - Custom types and Composite Keys
    - Custom Comparators
    - InputFormats and OutputFormats
    - Distributed Cache
    - MapReduce Design Patterns
    - Sorting
    - Joins
    - Streaming Job: Writing MapReduce programs in languages other than Java
  Lab: Writing a streaming MapReduce job in Python
  YARN and Hadoop 2.0
    - Separating resource management and processing
    - YARN Applications: MapReduce, Tez, HBase, Storm, Spark, Giraph etc.
    - YARN Architecture
    - ResourceManager
    - NodeManagers
    - ApplicationMasters
    - Containers
    - Fault Tolerance
    - Tez: Accelerating processing of data stored in HDFS

Data Integration
  - Integrating Hadoop into your existing enterprise.
  - Introduction to Sqoop
  Lab: Importing data from an RDBMS to HDFS using Sqoop
  Lab: Exporting data from HDFS to an RDBMS
  - Other data integration tools: Flume, Kafka, Informatica, Talend etc.

Higher Level Tools
  - Defining workflows with Oozie
  An introduction to Hive
    - Architecture
    - Interfaces: Hive Shell, Thrift, JDBC, ODBC etc.
    - HiveQL: A dialect of SQL
    - Data Types and File Formats
    - Creating Tables and Loading Data
    - Schema on Read
    - Querying Data
    - User Defined Functions
  An introduction to Pig
    - Grunt Shell
    - Pig's Data Model
    - Pig Latin
    - User Defined Functions
  An introduction to HBase
    - Architecture
    - Client API
    - MapReduce Integration
    - Schema Design

Day 3 (optional)

MapReduce
  Lab: Writing custom InputFormat and OutputFormat
  Lab: Implementing Total Sort
  Lab: Implementing Secondary Sort with Composite Keys and Custom Comparators

Hive
  Lab: Writing Hive Queries: Managed/External Tables, Formats, Partitions etc.
  Lab: Writing a User Defined Hive Function
  Lab: Accessing data in Hive from Excel over ODBC

Pig
  Lab: Writing and executing a Pig Latin script
  Lab: Writing a Pig User Defined Function

HBase
  Lab: Importing data into HBase
  Lab: Writing an HBase MapReduce Job

Other Details

Questions?

For the latest batch dates, fees, location and general inquiries, contact our sales team at +91 8880002200 or email [email protected]

For purely technical queries about the course, please contact Bhavesh [email protected]