Big Data & Hadoop & How We Use It at Alchetron

Brief BIG DATA & HADOOP Alchetron.com Free Social Encyclopedia

Upload: paul-jr

Post on 14-Apr-2017


Category:

Technology



TRANSCRIPT

Page 2: Big data & Hadoop & How we use it at Alchetron

• BIG DATA
• HADOOP
• HDFS
• MAP-REDUCE
• ALCHETRON
• FEEDBACKS
• Q/A

Page 3: Big data & Hadoop & How we use it at Alchetron

BIG DATA & HADOOP

+

To understand BIG DATA, we first have to understand data!

Page 4: Big data & Hadoop & How we use it at Alchetron

THIS DRAWING WAS CREATED 40,000 YEARS AGO. THIS WAS WHEN HUMANS FIRST STARTED RECORDING DATA.

Page 5: Big data & Hadoop & How we use it at Alchetron

AS TIME PASSED, WE STARTED CREATING MORE DATA, AS YOU CAN SEE IN THIS PICTURE, WHICH IS 3,000-10,000 YEARS OLD

STONE TABLETS

Page 6: Big data & Hadoop & How we use it at Alchetron

This man invented the printing press around 1439, which meant far more data was recorded than ever before

Johannes Gutenberg

Page 7: Big data & Hadoop & How we use it at Alchetron

100 crore (1 billion) books had been printed by the 18th century, and, my dear friends, you were not even born yet…

Page 8: Big data & Hadoop & How we use it at Alchetron

THIS GUY INVENTED THE WORLD WIDE WEB

Sir Tim Berners-Lee invented the World Wide Web, which went public in 1991. With the web running on top of the internet, the amount of data generated by mankind exploded!

Page 9: Big data & Hadoop & How we use it at Alchetron
Page 10: Big data & Hadoop & How we use it at Alchetron

30 years of mobile Technology

Page 11: Big data & Hadoop & How we use it at Alchetron

30 years of mobile Technology

Page 12: Big data & Hadoop & How we use it at Alchetron
Page 13: Big data & Hadoop & How we use it at Alchetron

In the next 20 years, computing will move to the microscopic level.

Computers won't be in our pockets but inside our bodies and minds.

This is where technology and biology will merge, which will multiply and enhance our capabilities a thousand times.

30 years of mobile Technology

Page 14: Big data & Hadoop & How we use it at Alchetron
Page 15: Big data & Hadoop & How we use it at Alchetron

Technological change will be rapid and exponential

Page 16: Big data & Hadoop & How we use it at Alchetron

With the invention of the internet, plus smaller and less expensive storage devices, data creation explodes!

Page 17: Big data & Hadoop & How we use it at Alchetron

Data generation statistics

2.7 zettabytes of data exist in the digital universe today.

Facebook stores, accesses, and analyzes 50+ Petabytes of user generated data.

Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data

More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.

YouTube users upload 48 hours of new video every minute of the day.

In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day

Page 18: Big data & Hadoop & How we use it at Alchetron

SO WHAT IS BIG DATA?

Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few.

This data is big data.

Page 19: Big data & Hadoop & How we use it at Alchetron

With the invention of the internet, data creation explodes

Page 20: Big data & Hadoop & How we use it at Alchetron


Page 21: Big data & Hadoop & How we use it at Alchetron


Page 22: Big data & Hadoop & How we use it at Alchetron


Page 23: Big data & Hadoop & How we use it at Alchetron

Who will manage BIG DATA?

Page 24: Big data & Hadoop & How we use it at Alchetron

HADOOP

Open-source Apache project

Written in Java

Runs on Linux, Mac OS X, Windows, and Solaris

Commodity hardware

Page 25: Big data & Hadoop & How we use it at Alchetron

Contents

• History of Hadoop
• The current applications of Hadoop
• Hadoop HDFS + MAP-REDUCE
• Other Hadoop projects

Page 26: Big data & Hadoop & How we use it at Alchetron

Fun Fact about Hadoop

"The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria."

---- Doug Cutting, Hadoop project creator

Page 27: Big data & Hadoop & How we use it at Alchetron

History of Hadoop

• Doug Cutting works on Apache Nutch
• 2004: the “Map-reduce” paper appears
• He reads the paper: “It is an important technique!”
• Nutch is extended
• 2006: he joins Yahoo!
• The great journey begins…

Page 28: Big data & Hadoop & How we use it at Alchetron

History of Hadoop

• Yahoo! became the primary contributor in 2006.

Page 29: Big data & Hadoop & How we use it at Alchetron

History of Hadoop

• Yahoo! deployed large-scale science clusters in 2007.
• Tons of Yahoo! Research papers emerge:
  – WWW
  – CIKM
  – SIGIR
• Yahoo! began running major production jobs in Q1 2008.

Page 30: Big data & Hadoop & How we use it at Alchetron

Hadoop consists of two parts: HDFS (the Hadoop Distributed File System) and MapReduce.

Page 31: Big data & Hadoop & How we use it at Alchetron

HDFS

NameNodes and DataNodes are simply machines that help the client store data. Metadata is stored on the NameNode, and the actual data is stored on the DataNodes.
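A minimal sketch of this split, in plain Java and not the real Hadoop API: a toy "NameNode" map holds only metadata (which block lives on which machine), while toy "DataNodes" hold the block contents. The class ToyHdfs, its block size, and the round-robin placement are all assumptions made up for illustration; real HDFS also replicates each block, which is left out here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, in-memory model of the HDFS metadata/data split.
// NOT the Hadoop API: every name here is hypothetical.
class ToyHdfs {
    // Stands in for a DataNode: stores actual block contents by block id.
    static class ToyDataNode {
        final Map<Integer, String> blocks = new HashMap<>();
    }

    // Stands in for the NameNode: stores only metadata, i.e. for each file
    // path the ordered list of {blockId, dataNodeIndex} pairs.
    final Map<String, List<int[]>> metadata = new HashMap<>();
    final List<ToyDataNode> dataNodes = new ArrayList<>();
    private final int blockSize;
    private int nextBlockId = 0;

    ToyHdfs(int dataNodeCount, int blockSize) {
        if (dataNodeCount <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("counts and sizes must be positive");
        }
        this.blockSize = blockSize;
        for (int i = 0; i < dataNodeCount; i++) {
            dataNodes.add(new ToyDataNode());
        }
    }

    // Split the content into fixed-size blocks and spread them round-robin
    // over the DataNodes; record the locations in the NameNode metadata.
    void write(String path, String content) {
        List<int[]> locations = new ArrayList<>();
        for (int start = 0; start < content.length(); start += blockSize) {
            int end = Math.min(start + blockSize, content.length());
            int blockId = nextBlockId++;
            int node = blockId % dataNodes.size();
            dataNodes.get(node).blocks.put(blockId, content.substring(start, end));
            locations.add(new int[] {blockId, node});
        }
        metadata.put(path, locations);
    }

    // Look up block locations in the metadata, then fetch each block
    // from the DataNode that holds it and stitch the file back together.
    String read(String path) {
        List<int[]> locations = metadata.get(path);
        if (locations == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        StringBuilder sb = new StringBuilder();
        for (int[] loc : locations) {
            sb.append(dataNodes.get(loc[1]).blocks.get(loc[0]));
        }
        return sb.toString();
    }
}
```

For example, `new ToyHdfs(3, 4)` followed by `write("/demo.txt", "hello hadoop")` splits the text into three 4-character blocks, one per toy DataNode, and `read("/demo.txt")` reassembles them.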

Page 32: Big data & Hadoop & How we use it at Alchetron

A TaskTracker is a daemon that runs on a DataNode. It is the node in the cluster that accepts tasks (Map, Reduce, and Shuffle operations) from a JobTracker.

A JobTracker is a daemon that runs on the NameNode. It farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that have the data, or at least nodes in the same rack.
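The "ideally the nodes that have the data, or at least the same rack" preference can be sketched in a few lines of plain Java. This is only an illustration of the idea, not how the real JobTracker is coded; the ToyJobTracker class, its Node type, and the assign method are hypothetical names.

```java
import java.util.List;

// Toy illustration of locality-aware task placement.
// NOT Hadoop code: all names are hypothetical.
class ToyJobTracker {
    // A worker machine, identified by its name and the rack it sits in.
    static class Node {
        final String name;
        final String rack;

        Node(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }
    }

    // Choose a free node for a task whose input block lives on dataHolder:
    // 1) the node that holds the data, 2) else a node in the same rack,
    // 3) else any free node. Returns null when no node is free.
    static Node assign(Node dataHolder, List<Node> freeNodes) {
        for (Node n : freeNodes) {
            if (n.name.equals(dataHolder.name)) {
                return n; // data-local
            }
        }
        for (Node n : freeNodes) {
            if (n.rack.equals(dataHolder.rack)) {
                return n; // rack-local
            }
        }
        return freeNodes.isEmpty() ? null : freeNodes.get(0); // anywhere
    }
}
```

The point of the ordering is to move the computation to the data instead of moving the data across the network.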

Page 33: Big data & Hadoop & How we use it at Alchetron
Page 34: Big data & Hadoop & How we use it at Alchetron
Page 35: Big data & Hadoop & How we use it at Alchetron

Map-Reduce Architecture

Map-reduce is basically a data processing engine

To understand it deeply, you need hands-on Java coding experience

Let's try to learn the architecture of map-reduce
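To get a feel for the three phases before touching a cluster, here is a word count written as a single-machine simulation in plain Java. It is a sketch under assumptions: it does not use the Hadoop Mapper/Reducer classes, and WordCountSketch and its method names are hypothetical. On a real cluster the map and reduce calls would run in parallel on many nodes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine simulation of map -> shuffle -> reduce.
// NOT the Hadoop API: all names are hypothetical.
class WordCountSketch {
    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the list of values for one key into one result.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: run the three phases in order over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

Calling `WordCountSketch.run` on the two lines "big data" and "Big Hadoop" gives big=2, data=1, hadoop=1.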

Page 36: Big data & Hadoop & How we use it at Alchetron
Page 37: Big data & Hadoop & How we use it at Alchetron

An example

Page 38: Big data & Hadoop & How we use it at Alchetron

BORED? ALMOST THERE

Page 39: Big data & Hadoop & How we use it at Alchetron

BORED? ALMOST THERE

JUST ONE MORE CODE EXAMPLE

Page 40: Big data & Hadoop & How we use it at Alchetron

Another Example code

Page 41: Big data & Hadoop & How we use it at Alchetron

Nowadays (as per the latest job market)…

• Software Developer Intern - IBM - Somers, NY +3 locations - Agile development - Big data / Hadoop / data analytics a plus
• Software Developer - IBM - San Jose, CA +4 locations - include Hadoop-powered distributed parallel data processing system, big data analytics ... multiple technologies, including Hadoop

Page 42: Big data & Hadoop & How we use it at Alchetron

Other Hadoop Projects Ecosystem

• Hadoop Core
  – Distributed File System
  – MapReduce Framework
• Pig (initiated by Yahoo!)
  – Parallel Programming Language and Runtime
• HBase (initiated by Powerset)
  – Table storage for semi-structured data
• ZooKeeper (initiated by Yahoo!)
  – Coordinating distributed systems
• Hive (initiated by Facebook)
  – SQL-like query language and metastore

Page 43: Big data & Hadoop & How we use it at Alchetron
Page 44: Big data & Hadoop & How we use it at Alchetron

TYPICAL HADOOP CLUSTER HANDLING & PROCESSING PETABYTES OF DATA

1,000 TB = 1 PETABYTE (APPROX.)
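One likely reason for the "approx." is the choice of prefix: with decimal (SI) prefixes 1 PB is exactly 1,000 TB, while with binary prefixes the factor is 1,024. A small sketch of the arithmetic, assuming decimal prefixes; the DataUnits class and its method names are hypothetical.

```java
// Unit conversions with decimal (SI) prefixes. Hypothetical helper class.
class DataUnits {
    static final double TB_PER_PB = 1000.0;   // 1 PB = 1,000 TB
    static final double BYTES_PER_EB = 1e18;  // 1 EB = 10^18 bytes

    // Convert terabytes to petabytes.
    static double terabytesToPetabytes(double terabytes) {
        return terabytes / TB_PER_PB;
    }

    // Convert a raw byte count to exabytes.
    static double bytesToExabytes(double bytes) {
        return bytes / BYTES_PER_EB;
    }
}
```

With these, 20,000 TB a day works out to 20 PB, and 2.5 quintillion (2.5 × 10^18) bytes a day is 2.5 exabytes.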

Page 45: Big data & Hadoop & How we use it at Alchetron

Nowadays… who uses Hadoop?

• Amazon/A9
• Alchetron
• Fox Interactive Media
• Google
• IBM
• Facebook
• Quantcast
• Rackspace/Mailtrust
• Veoh
• Yahoo!
• More at http://wiki.apache.org/hadoop/PoweredBy

Page 46: Big data & Hadoop & How we use it at Alchetron

Let's see how we implemented this at Alchetron

Page 47: Big data & Hadoop & How we use it at Alchetron

When you visit Alchetron.com, you are interacting with data processed with Hadoop

Page 48: Big data & Hadoop & How we use it at Alchetron

When you visit Alchetron.com, you are interacting with data processed with Hadoop!

Search Index

Page 49: Big data & Hadoop & How we use it at Alchetron

Organizing data

Page 50: Big data & Hadoop & How we use it at Alchetron

Content Filtering

Page 51: Big data & Hadoop & How we use it at Alchetron

References

• For more information:
  – http://hadoop.apache.org/
  – http://developer.yahoo.com/hadoop/
  – http://alchetron.com/What-is-Big-data-1530-W
  – http://alchetron.com/Big-Data-Hadoop-260-W