Big Data & Hadoop by Mr. Nataraj


Upload: elinor-tyler

Post on 19-Dec-2015


TRANSCRIPT

Page 1: Big Data & Hadoop By Mr.Nataraj smallest unit is bit 1 byte=8 bits 1 KB (Kilo Byte)= 1024 bytes =1024*8 bits 1MB (Mega Byte)=1024 KB=(1024)^2 * 8 bits

Big Data & Hadoop
By Mr. Nataraj


UNITS OF DATA

The smallest unit is the bit.
1 byte = 8 bits
1 KB (Kilo Byte) = 1024 bytes = 1024 * 8 bits
1 MB (Mega Byte) = 1024 KB = (1024)^2 * 8 bits
1 GB (Giga Byte) = 1024 MB = (1024)^3 * 8 bits
1 TB (Tera Byte) = 1024 GB = (1024)^4 * 8 bits
1 PB (Peta Byte) = 1024 TB = (1024)^5 * 8 bits
1 EB (Exa Byte) = 1024 PB = (1024)^6 * 8 bits
1 ZB (Zetta Byte) = 1024 EB = (1024)^7 * 8 bits
1 YB (Yotta Byte) = 1024 ZB = (1024)^8 * 8 bits
1 XB (Xenotta Byte, an informal name that is not a standard prefix) = 1024 YB = (1024)^9 * 8 bits
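The unit ladder can be checked with a few lines of Python. This is only a sketch: the names `UNITS`, `unit_in_bytes` and `unit_in_bits` are made up for illustration, and it follows the slide's convention that every step up is a factor of 1024.

```python
# Binary storage units, smallest to largest; each step is a factor of 1024
# (the convention used on the slide). Names here are illustrative only.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_in_bytes(unit: str) -> int:
    """Return how many bytes are in one `unit`, e.g. 1 KB = 1024 bytes."""
    power = UNITS.index(unit) + 1  # KB -> 1024^1, MB -> 1024^2, ...
    return 1024 ** power


def unit_in_bits(unit: str) -> int:
    """Return how many bits are in one `unit` (1 byte = 8 bits)."""
    return unit_in_bytes(unit) * 8


if __name__ == "__main__":
    for name in UNITS:
        print(f"1 {name} = {unit_in_bytes(name):,} bytes = {unit_in_bits(name):,} bits")
```

Python integers have arbitrary precision, so even the larger units are computed exactly.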


HOW BIG ARE THOSE NUMBERS

1 byte: a single character
1 KB: a very short story
1 MB: a small novel (6 seconds of TV-quality video)
1 GB: a pickup truck filled with paper
1 TB: 50,000 trees made into paper
2 PB: all US academic research libraries
5 EB: all words ever spoken by human beings


WHAT IS BIG DATA


SOME INTERESTING FACTS
• Google: 2,000,000 queries per second
• Facebook: 34,000 likes per minute
• Online shopping: USD 300,000 per minute
• Twitter: 100,000 tweets per minute
• YouTube: 600 new videos uploaded per minute
• Barack Obama used Big Data to win an election
• Driverless cars use Big Data processing to drive vehicles


• AT&T transfers about 30 petabytes of data through its networks each day

• Google processed about 24 petabytes of data per day in 2009

• The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects


As of January 2013, Facebook users had uploaded over 240 billion photos, with 350 million new photos every day. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to a total of 960 billion images and an estimated 357 petabytes of storage.

Processing capability
Google processes 20 PB a day
Facebook: 2.5 PB of user data + 15 TB/day
eBay: 6.5 PB of data + 50 TB/day


Evolution of Hadoop
• Doug Cutting, while working on the Lucene project (a search engine for searching documents), ran into storage and computation problems and was looking for distributed processing.

• Google published a paper on GFS (Google File System).

• Doug Cutting & Michael Cafarella implemented the ideas from the GFS paper, which led to Hadoop.


WHAT IS HADOOP
• A framework written in Java for running applications on large clusters of commodity hardware.

• Mainly contains 2 parts
– HDFS for storing data
– Map-Reduce for processing data

• Maintains fault tolerance through a replication factor.


• employee.txt (eno, ename, empAge, empSal, empDes)
• 101,prasad,t20,1000,lead


Assume you have around 100,00,000,00000,0000000000 records and you would like to find all the employees above 60 years of age. How would you program this traditionally?
10 GB = 10 minutes
1 TB = about 1000 minutes, or roughly 16.7 hours
Google processes 20 PB of data per day.
At the same rate, processing 20 PB on a single machine would take about 20,000,000 minutes, which is roughly 333,000 hours or close to 38 years.
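The estimate can be reproduced with a short calculation. This is a rough sketch that assumes the slide's rate of 10 GB in 10 minutes (1 GB per minute) holds at any size on a single machine. The constant and function names are made up for illustration, and the code uses 1 TB = 1024 GB rather than rounding to 1000.

```python
# Assumed single-machine rate from the slide: 10 GB in 10 minutes.
GB_PER_MINUTE = 1.0

GB_PER_TB = 1024
GB_PER_PB = 1024 * GB_PER_TB


def minutes_to_process(size_gb: float, rate_gb_per_min: float = GB_PER_MINUTE) -> float:
    """Minutes needed to scan `size_gb` gigabytes at a constant rate."""
    return size_gb / rate_gb_per_min


def summarize(size_gb: float) -> dict:
    """Express the processing time in minutes, hours, days and years."""
    minutes = minutes_to_process(size_gb)
    hours = minutes / 60
    days = hours / 24
    return {"minutes": minutes, "hours": hours, "days": days, "years": days / 365}


if __name__ == "__main__":
    for label, size in [("10 GB", 10), ("1 TB", GB_PER_TB), ("20 PB", 20 * GB_PER_PB)]:
        s = summarize(size)
        print(f"{label}: {s['hours']:,.1f} hours ({s['days']:,.1f} days)")
```

At this rate a petabyte-scale scan on one machine takes decades, not days, which is the motivation for spreading both storage and processing across many nodes.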

INSPIRATION FOR HADOOP
To store huge data (unlimited)
To process huge data


Basic Terminology

• Node: a single computer with its own processor and memory.
• Cluster: a combination of nodes working as a single unit.
• Commodity Hardware: cheap, less reliable hardware.
• Replication Factor: data is duplicated and saved in more than one place; the factor is the number of copies.
• Data Local Optimization: data is processed locally, on the node where it is stored.


• Block: a part of the data
• node1 node2 node3
• data1 data2 data3
• 1 file of 200 MB (50 MB, 50 MB, 50 MB, 50 MB)

• Block size: the size of data that can be stored as a single unit

• Apache Hadoop: 64 MB (configurable)
• 1 GB in Apache Hadoop = 16 blocks
• 65 MB (Apache) = 64 MB + 1 MB, i.e. two blocks
• Replication: duplicating the data; the replication factor is 3
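The block arithmetic above can be sketched in Python. This is an illustration, not Hadoop code: `split_into_blocks` and `raw_storage_mb` are hypothetical helper names, sizes are whole megabytes, and the defaults (64 MB blocks, replication factor 3) are the values quoted on the slide.

```python
from typing import List


def split_into_blocks(file_size_mb: int, block_size_mb: int = 64) -> List[int]:
    """Return the sizes (in MB) of the blocks a file would be cut into.

    All blocks are full-sized except possibly the last, which only holds
    the remaining data.
    """
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        blocks.append(remainder)
    return blocks


def raw_storage_mb(file_size_mb: int, replication_factor: int = 3) -> int:
    """Total cluster storage used when every block is stored `replication_factor` times."""
    return file_size_mb * replication_factor


if __name__ == "__main__":
    print(split_into_blocks(65))             # a 65 MB file: one 64 MB block plus one 1 MB block
    print(len(split_into_blocks(1024)))      # number of 64 MB blocks in a 1 GB file
    print(split_into_blocks(200, 50))        # the 200 MB example with 50 MB blocks
    print(raw_storage_mb(65))                # storage consumed with 3 replicas
```

The sketch treats the last block as occupying only the data it holds, so a 65 MB file with three replicas uses 195 MB of raw storage rather than three times 128 MB.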


NODE CLUSTER


SCALING
• Vertical Scaling
Adding more powerful hardware to an existing system. Will scale only up to a certain limit.

• Horizontal Scaling
Adding a completely new node to an existing cluster. Will scale up to many nodes.


3 V's of Big Data
• Volume: the amount of data generated
• Variety: structured data (such as database table data) and unstructured data
• Velocity: the frequency at which data is generated


HADOOP VS RDBMS

1. Hadoop believes in scaling out instead of scaling up: when needed, buy more oxen; don't try to grow a more powerful ox.

2. Hadoop works on structured as well as unstructured data; an RDBMS only works with structured data. (However, nowadays many NoSQL databases have come out in the market, such as MongoDB and Couchbase.)

3. Hadoop works with key-value pairs rather than data in columns.
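To make the key-value idea concrete, here is a minimal sketch in plain Python (not the Hadoop API). It turns each line of an employee.txt-style file into a key-value pair and keeps the employees older than 60. The field order follows the employee.txt layout shown earlier; the sample rows and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Field order taken from the employee.txt layout: eno, ename, empAge, empSal, empDes.
FIELDS = ("eno", "ename", "empAge", "empSal", "empDes")


def map_record(line: str) -> Tuple[str, Dict[str, str]]:
    """Turn one comma-separated line into a (key, value) pair keyed by employee number."""
    values = line.strip().split(",")
    record = dict(zip(FIELDS, values))
    return record["eno"], record


def employees_older_than(lines: Iterable[str], min_age: int) -> Iterator[Tuple[str, str]]:
    """Yield (eno, ename) pairs for employees whose age exceeds `min_age`."""
    for line in lines:
        key, record = map_record(line)
        if int(record["empAge"]) > min_age:
            yield key, record["ename"]


if __name__ == "__main__":
    # Hypothetical sample rows, invented for this illustration.
    sample = [
        "101,prasad,62,1000,lead",
        "102,kumar,35,800,developer",
    ]
    print(list(employees_older_than(sample, 60)))
```

Because each line is handled independently, the same per-record function could in principle run on different nodes against different blocks of the file, which is what makes the key-value model suit scale-out processing.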


HADOOP ALTERNATIVES

No doubt Hadoop is a framework for processing big data, but it is not the only one. Below are a few alternatives.

Apache Spark
GraphLab
HPCC Systems (High Performance Computing Cluster)
Dryad
Stratosphere
Storm
R3
Disco
Phoenix
Plasma


DOWNLOADING HADOOP

You can download Hadoop from:
http://hadoop.apache.org/releases.html
http://apache.bytenet.in/hadoop/common/
· 18 November, 2014: Release 2.6.0 available
· 27 June, 2014: Release 0.23.11 available
· 1 Aug, 2013: Release 1.2.1 (stable) available


Hadoop Daemons

HADOOP
1. Storing Huge Data
2. Processing Huge Data

1. Name Node
2. Secondary Name Node
3. Job Tracker
4. Task Tracker
5. Data Node


HADOOP CORE COMPONENTS


Modes in Hadoop
Standalone Mode
Pseudo Distributed Mode
Fully Distributed Mode

Standalone mode
It is the default mode
1 node
No separate processes (daemons) will be running
Everything runs in a single JVM
Used for small development, testing, debugging


Pseudo Distributed Mode
1. A single node, but a cluster is simulated
2. Each daemon runs as a separate process in its own JVM
3. Used for development and debugging


Fully Distributed Mode
1. Multiple nodes
2. Hadoop runs on a cluster of machines/nodes; used in production environments


Hadoop Architecture


ECOsystem Components
Hive
Pig
Sqoop
Avro
Flume
Oozie
HBase
Cassandra


Job Tracker


Job Tracker contd..


Job Tracker contd..


Job Tracker Contd..


HDFS write


HDFS write