puja(801),sanghamitra(819),surabhi(844)

BIG DATA BY: PUJA SINGH (801), SANGHAMITRA BAL (819), SURABHI SINHA (844), ST. XAVIER’S COLLEGE, RANCHI

Upload: puja-singh

Post on 16-Feb-2017

TRANSCRIPT

Page 1: Puja(801),sanghamitra(819),surabhi(844)

BIG DATA
BY: PUJA SINGH (801), SANGHAMITRA BAL (819), SURABHI SINHA (844)

ST. XAVIER’S COLLEGE,RANCHI

Page 2: Puja(801),sanghamitra(819),surabhi(844)

What is Big Data?

• The buzz word.

• The misnomer.

• No Single Standard Definition

Page 3: Puja(801),sanghamitra(819),surabhi(844)

“Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it...

Page 4: Puja(801),sanghamitra(819),surabhi(844)

Characteristics of Big Data : 3 V’s

Page 5: Puja(801),sanghamitra(819),surabhi(844)

Characteristics of Big Data : Volume

Facts:

• IBM estimates that 2.5 quintillion bytes of data are generated each day.

• 90% of the data in the world is less than two years old.
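
To put the IBM figure in more familiar units, the arithmetic can be sketched as follows. This is only a rough illustration: it assumes decimal (SI) prefixes, and the helper name `to_unit` is made up for this example.

```python
# Unit arithmetic for the "2.5 quintillion bytes per day" estimate.
# Assumption: decimal (SI) prefixes, so 1 TB = 10**12, 1 PB = 10**15, 1 EB = 10**18 bytes.

BYTES_PER_DAY = 2.5e18          # 2.5 quintillion bytes (figure quoted on the slide)
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def to_unit(num_bytes: float, power_of_ten: int) -> float:
    """Convert a byte count to a decimal unit, e.g. power_of_ten=15 for petabytes."""
    return num_bytes / 10 ** power_of_ten


exabytes_per_day = to_unit(BYTES_PER_DAY, 18)
petabytes_per_day = to_unit(BYTES_PER_DAY, 15)
terabytes_per_second = to_unit(BYTES_PER_DAY / SECONDS_PER_DAY, 12)

if __name__ == "__main__":
    print(f"{exabytes_per_day:.1f} EB per day")
    print(f"{petabytes_per_day:.0f} PB per day")
    print(f"{terabytes_per_second:.1f} TB per second")
```

Under these assumptions the estimate works out to 2.5 exabytes per day, or a little under 29 terabytes every second.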

Page 6: Puja(801),sanghamitra(819),surabhi(844)

Some more facts :

• 230 million tweets a day.

• 24 petabytes processed by Google every day.

• 29 petabytes used by National Security Agency.

• CERN’s Large Hadron Collider (LHC) generates 15 PB a year.

• NASA’s EOS program used 4.2 petabytes (2010).

Page 7: Puja(801),sanghamitra(819),surabhi(844)

Maximilien Brice, © CERN

Characteristics of Big Data : Variety

Various Formats of Data:

• Black Box Data
• Social Media Data
• Stock Exchange Data
• Power Grid Data
• Transport Data
• Search Engine Data

Page 8: Puja(801),sanghamitra(819),surabhi(844)

The three types of Data :

• Structured data: Relational data.
• Semi-structured data: XML data.
• Unstructured data: Word, PDF, Text, Media Logs.
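
The difference between the three types can be illustrated with a small sketch. The table name, XML snippet, and log sentence below are invented for this example and are not taken from the slides; only Python standard-library modules are used.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows in a relational table with a fixed schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?)", [("ABC", 10.5), ("XYZ", 20.0)])
structured_total = conn.execute("SELECT SUM(price) FROM trades").fetchone()[0]
conn.close()

# Semi-structured: XML carries its own tags, but fields may be missing
# (the second trade has no price attribute), so the reader must cope.
xml_doc = "<trades><trade symbol='ABC' price='10.5'/><trade symbol='XYZ'/></trades>"
root = ET.fromstring(xml_doc)
semi_prices = [float(t.get("price")) for t in root.findall("trade") if t.get("price")]

# Unstructured: free text; any structure has to be inferred, here with a regex.
log_text = "ABC traded at 10.5 today. XYZ closed near 20.0 after a quiet session."
unstructured_prices = [float(m) for m in re.findall(r"\d+\.\d+", log_text)]

if __name__ == "__main__":
    print(structured_total, semi_prices, unstructured_prices)
```

The same price information appears in all three forms, but the amount of work needed to recover it grows as the structure weakens.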

Page 9: Puja(801),sanghamitra(819),surabhi(844)
Page 10: Puja(801),sanghamitra(819),surabhi(844)

Measure of value of Data

1. Timely
2. Accessible
3. Holistic
4. Trustworthy
5. Relevant
6. Secure
7. Authoritative
8. Actionable

Page 11: Puja(801),sanghamitra(819),surabhi(844)

Characteristics of Big Data : Velocity

• Data is being generated fast and needs to be processed fast: Online Data Analytics.

• Performing analytics against volume and variety of data while it is still in motion.

• Late decisions lead to missed opportunities.
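
Analytics on data "in motion" can be sketched as a computation that updates with every incoming reading instead of waiting for a stored batch. The example below is a minimal illustration, not a real streaming system: `sensor_feed` is a hypothetical stand-in for a live source, and the window size is arbitrary.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` readings as each reading arrives."""
    recent = deque(maxlen=window)   # the oldest reading is dropped automatically
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]      # subtract the reading about to be evicted
        recent.append(value)
        total += value
        yield total / len(recent)


def sensor_feed() -> Iterator[float]:
    """Hypothetical stand-in for a live feed (for example, power grid readings)."""
    yield from [10.0, 20.0, 30.0, 40.0]


if __name__ == "__main__":
    for mean in rolling_mean(sensor_feed(), window=2):
        print(mean)
```

Because each result is produced as soon as its reading arrives, a decision based on it does not have to wait for the whole dataset to be collected.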

Page 12: Puja(801),sanghamitra(819),surabhi(844)

Big Data: 3V’s

Page 13: Puja(801),sanghamitra(819),surabhi(844)

Some Make it 4V’s

Page 14: Puja(801),sanghamitra(819),surabhi(844)
Page 15: Puja(801),sanghamitra(819),surabhi(844)

Benefits of Big Data

• It creates transparency to increase efficiency.
• It allows for better analysis of employee and systems performances.
• It can replace and support human decision making with automated algorithms.
• It creates innovative business models, products and services.

Page 16: Puja(801),sanghamitra(819),surabhi(844)
Page 17: Puja(801),sanghamitra(819),surabhi(844)
Page 18: Puja(801),sanghamitra(819),surabhi(844)
Page 19: Puja(801),sanghamitra(819),surabhi(844)

Key enablers for the growth of "Big Data" are:

◦ Increase of storage capacities
◦ Increase of processing power
◦ Availability of data

Page 20: Puja(801),sanghamitra(819),surabhi(844)

Big Data Challenges

• Capturing data
• Curation
• Storage
• Searching
• Sharing
• Transfer
• Analysis
• Presentation

Page 21: Puja(801),sanghamitra(819),surabhi(844)
Page 22: Puja(801),sanghamitra(819),surabhi(844)

Concerns and Limitations

Privacy and Security Issues:

• Significant opportunity for malicious data input and inadequate data validation.
• User authentication and access to data from multiple locations may not be sufficiently controlled.

Page 23: Puja(801),sanghamitra(819),surabhi(844)
Page 24: Puja(801),sanghamitra(819),surabhi(844)

Big Data has the potential for significant negative impacts that may be impossible to avoid.

Concerns about the motives of government and corporations.

The rich will profit from big data and the poor will not.

Humans will still be most capable of making decisions using Big Data. Statistics can still lie.

Page 25: Puja(801),sanghamitra(819),surabhi(844)

Taming Big Data

The following practices must be adopted to work with Big Data:

i. Availability
ii. Management
iii. Disaster Recovery
iv. Provisioning
v. Optimization
vi. Backup & Restore
vii. Security
viii. Governance
ix. Auditing
x. Replication
xi. Virtualization
xii. Archiving

Page 26: Puja(801),sanghamitra(819),surabhi(844)
Page 27: Puja(801),sanghamitra(819),surabhi(844)

Big Data Technology

Page 28: Puja(801),sanghamitra(819),surabhi(844)

Big Data Solutions

Google’s Solution

Processing big data through a single database creates a bottleneck. Google solved this problem using an algorithm called MapReduce. MapReduce divides the task into small parts, assigns them to many computers, and collects their results, which are then integrated to form the result dataset.
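
The divide, assign, and collect steps can be sketched with a word count, the usual teaching example for MapReduce. This is a single-machine illustration only: a thread pool stands in for the "many computers", and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen for this example. It is not Google's implementation and does not use the Hadoop API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group the emitted values by key across all workers' outputs."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine each key's values into one result, here by summing."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks: List[str], workers: int = 2) -> Dict[str, int]:
    """Divide the input into chunks, map them in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))  # each chunk goes to a worker
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

Each chunk is processed independently in the map step, which is what lets the work spread across machines; only the shuffle and reduce steps need to bring results together.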

Page 29: Puja(801),sanghamitra(819),surabhi(844)

Hadoop

Hadoop is an open-source project created by Doug Cutting and Mike Cafarella. It is used to develop applications that can perform complete statistical analysis on huge amounts of data.

Page 30: Puja(801),sanghamitra(819),surabhi(844)

THANK YOU