Hadoop framework

Page 1: Hadoop framework

Seminar on Hadoop framework

Page 2: Hadoop framework

Abstract

The total amount of digital data in the world has exploded in recent years.

In 2006, the universal volume of digital data was estimated at 0.18 zettabytes, and was forecast to grow tenfold to 1.8 zettabytes by 2011.

1 zettabyte = 10^21 bytes.

The problem is that while the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from a drive) have not kept up.

One typical drive from 1990 could store 1370 MB of data and had a transfer speed of 4.4 MB/s, so we could read all the data from a full drive in around 300 seconds.

In 2010, 1 TB drives are the standard hard disk size, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
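A quick back-of-the-envelope check of those figures (a minimal sketch; the capacities and transfer speeds are the approximate values quoted above):

    # Time to read a full drive end to end: capacity divided by transfer speed.
    def read_time_seconds(capacity_mb, speed_mb_per_s):
        return capacity_mb / speed_mb_per_s

    print(read_time_seconds(1370, 4.4))       # 1990 drive: ~311 s, around 5 minutes
    print(read_time_seconds(1_000_000, 100))  # 2010 drive: 10,000 s, about 2.8 hours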

Page 3: Hadoop framework

Parallelisation

An obvious solution to this problem is parallelisation. The input data is usually large, and the computations have to be distributed across hundreds or thousands of machines in order to finish in a reasonable amount of time.

Reading 1 TB from a single hard drive takes hours, but if the read is parallelised across, say, 100 machines, each holding one hundredth of the data, the whole terabyte can be read in under two minutes.
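The two-minute figure follows directly once the read is spread over many drives. A minimal sketch, assuming 100 machines that each hold one hundredth of the terabyte and read their shards concurrently at 100 MB/s (the machine count is an illustrative assumption):

    # Wall-clock time when N machines each read their own shard in parallel:
    # it is simply the time for one machine to read its shard.
    def parallel_read_time_seconds(total_mb, speed_mb_per_s, machines):
        shard_mb = total_mb / machines
        return shard_mb / speed_mb_per_s

    print(parallel_read_time_seconds(1_000_000, 100, 100))  # 100 s, under 2 minutes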

Page 4: Hadoop framework

Key issues

The key issues involved in this solution:

Hardware failure

Combining the data after analysis (i.e. after reading)

Page 5: Hadoop framework

Solutions

Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion.

It solves the problem of hardware failure through replication: data is stored redundantly on multiple machines, so the loss of one copy does not lose the data.

The second problem is solved by a simple programming model: MapReduce.
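The slides name MapReduce without showing it, so here is a minimal word-count sketch using Hadoop Streaming, which lets the map and reduce steps be ordinary scripts that read lines from stdin and write tab-separated key/value pairs to stdout. The file names mapper.py and reducer.py are illustrative:

    # mapper.py: emit (word, 1) for every word in the input.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(word + "\t1")

    # reducer.py: Hadoop sorts and groups the map output by key, so equal
    # words arrive consecutively and can be summed in a single pass.
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, n = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            count += int(n)
        else:
            if current_word is not None:
                print(current_word + "\t" + str(count))
            current_word, count = word, int(n)
    if current_word is not None:
        print(current_word + "\t" + str(count))

A job like this is submitted with the hadoop-streaming jar that ships with Hadoop, passing the two scripts as -mapper and -reducer together with -input and -output paths on HDFS.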

Page 6: Hadoop framework

Introduction

Hadoop

Hadoop is an open source framework for writing and running distributed applications that process large amounts of data.

Hadoop is designed to efficiently process large volumes of information by connecting many commodity computers together to work in parallel.

Page 7: Hadoop framework

Features of Hadoop

The features of Hadoop that stand out are its simplified programming model and its efficient, automatic distribution of data and work across machines.