Hadoop & NoSQL: New Generation Database Systems

Post on 27-Jan-2015


This document is intended only for AVEA İletişim Hizmetleri A.Ş. ("AVEA"), its dealers, employees and/or others specifically authorised. The contents of this document are confidential and any disclosure, copying, distribution and/or taking any action in reliance on the content of this document is prohibited. AVEA is not liable for the transmission of this document in any manner to any third parties that are not authorised to receive it.

Hadoop & NoSQL

New Generation Database Systems

Ramazan FIRIN

22.04.2014


AGENDA

• Big Data

• Hadoop

• NoSQL

• Graph DB and Neo4j

• Possible Usage in Telco

• Demo


Executive Summary

AVEA

• Big Data is a new IT trend

• Hadoop and NoSQL can be used to process Big Data

• Possible usage areas in Telco:

- Preventing churn

- Offering customer-specific campaigns

- Acquiring more customers

Big Bang = Big Data


What is Big Data?

Datasets that are too awkward to work with using traditional, hands-on database management tools.

Big Data - 3V Concept (Volume, Velocity, Variety)

Big Data To Smart Data

Cover of The Economist


Big Data Sources

1. Social network profiles - Facebook, LinkedIn, Yahoo, Google

2. Social influencers - blog comments, user forums, review sites

3. Activity-generated data - application logs, sensor data

4. Public data - Wikipedia, IMDb, etc.

5. Data warehouse appliances - transactional data

6. Network and in-stream monitoring

7. Legacy documents

Big Data Approach

Sample Usage - 360° View of the Customer

Big Data Solutions – Oracle Big Data Appliance

Big Data Solutions – IBM PureData

Storage for Big Data

If we can't use a relational database, how can we store it?

1) Hadoop

2) NoSQL

What is Hadoop?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
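The "simple programming models" in the definition above can be illustrated with a word count in the map/shuffle/reduce style. The slide does not name a specific model, so treating MapReduce as the example is an assumption. This is a single-process Python sketch of the idea, not Hadoop API code, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data hadoop", "hadoop nosql", "big data"])
print(counts)
```

On a real cluster the mappers and reducers would run on different machines and the framework would handle the shuffle. Here everything runs in one process to keep the data flow visible.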

History

Hadoop Components

Hadoop Architecture

Hadoop Ecosystem

Pig - a data processing language that simplifies Hadoop programming

Hive - SQL-like queries

HBase - random read/write access to billions of rows and millions of columns (NoSQL)

NoSQL

RDBMS Performance

Joins are the killer...

What is NoSQL?

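• Stands for Not Only SQL

• Non-relational

• Cheap, easy to implement

• Scalability

– Vertically - add more resources (CPU, RAM, disk) to a single server

– Horizontally - add more servers and spread the data across them

• No pre-defined schema

• No join operations

• Typically relaxes ACID guarantees and is designed around the CAP theorem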
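Horizontal scaling, as listed above, usually means partitioning keys across servers. The following is a minimal sketch of hash-based partitioning, assuming a fixed number of shards. The function names and the `subscriber-N` keys are hypothetical and do not reflect any particular product's sharding scheme.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Assign every key to exactly one shard."""
    shards = {index: [] for index in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


subscribers = [f"subscriber-{n}" for n in range(1000)]
three_nodes = distribute(subscribers, 3)
four_nodes = distribute(subscribers, 4)

# Count how many keys would have to move when a fourth node is added.
moved = sum(1 for key in subscribers if shard_for(key, 3) != shard_for(key, 4))
print({index: len(keys) for index, keys in three_nodes.items()}, moved)
```

Plain modulo hashing moves a large share of the keys whenever the number of shards changes, which is why production systems tend to use consistent hashing or range-based partitioning instead.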

Key-Value Stores

- Redis, Voldemort


Redis Features

• Data Types

• Publish / Subscribe

• Transactions

• Replication

• Persistence

• Partitioning

Redis Datatypes

• Strings

• Lists

• Sets

• Sorted Sets

• Hashes

Redis Persistence

• RDB - takes a snapshot at a configured interval

– Fast

– May lose several minutes of data if the process is killed (kill -9)

• AOF - logs every write operation

– Still fast enough

– May lose about 1 second of data if the process is killed (kill -9)

Redis Commands

$ redis-cli set counter 100
OK

$ redis-cli incr counter
(integer) 101

$ redis-cli incr counter
(integer) 102

$ redis-cli incrby counter 10
(integer) 112

Commands for the Set data type:

Add: SADD

Read: SPOP, SRANDMEMBER, SMEMBERS (SPOP also removes the returned member)

Remove: SREM

Other: SINTER, SUNION, SCARD, SDIFF, SMOVE, SISMEMBER

Redis Commands – Lists

$ redis-cli rpush messages "Hello how are you?"
(integer) 1

$ redis-cli rpush messages "Fine thanks. I'm having fun with Redis"
(integer) 2

$ redis-cli rpush messages "I should look into this NOSQL thing ASAP"
(integer) 3

$ redis-cli lrange messages 0 2
1) "Hello how are you?"
2) "Fine thanks. I'm having fun with Redis"
3) "I should look into this NOSQL thing ASAP"

Typical uses:

• Chat systems

• Pagination
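The pagination use case relies on how LRANGE treats its indexes: both ends are inclusive, negative indexes count from the end of the list, and out-of-range indexes return fewer items instead of an error. The sketch below models those rules on a plain Python list. It is not Redis client code, and `lrange` and `page` are hypothetical helper names.

```python
def lrange(items, start, stop):
    """Mimic LRANGE indexing: inclusive bounds, negative indexes from the end."""
    size = len(items)
    if start < 0:
        start = max(size + start, 0)
    if stop < 0:
        stop = size + stop
    if stop < 0 or start > stop:
        return []
    return items[start:stop + 1]


def page(items, page_number, page_size):
    """Return one zero-based page by translating it into an inclusive range."""
    start = page_number * page_size
    return lrange(items, start, start + page_size - 1)


messages = [
    "Hello how are you?",
    "Fine thanks. I'm having fun with Redis",
    "I should look into this NOSQL thing ASAP",
]
print(lrange(messages, 0, 2))
print(page(messages, 1, 2))
```

With a real server the same page would be fetched with `LRANGE messages <start> <stop>`, with the start and stop values computed as in `page`.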

Redis – Publish/Subscribe

redis 127.0.0.1:6379> PUBLISH myradioshow "Good morning everyone!"
(integer) 0

redis 127.0.0.1:6379> PUBLISH myradioshow "How ya'll doin tonight?"
(integer) 0

redis 127.0.0.1:6379> PUBLISH myradioshow "Hello? Is anyone listening? I'm not wearing pants."
(integer) 0

redis 127.0.0.1:6379> SUBSCRIBE myradioshow
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "myradioshow"
3) (integer) 1
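In the transcript above every PUBLISH returns (integer) 0 because the reply is the number of clients that received the message, and nobody had subscribed yet. Messages published before a subscription are not stored. A minimal in-process sketch of that behaviour follows. The `Broker` class is a hypothetical stand-in, not a Redis client.

```python
from collections import defaultdict


class Broker:
    """In-process stand-in for a pub/sub server (hypothetical, not Redis itself)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        """Register a callback that receives every later message on the channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to current subscribers only and return how many received it."""
        receivers = list(self._subscribers.get(channel, []))
        for callback in receivers:
            callback(message)
        return len(receivers)


broker = Broker()
inbox = []

missed = broker.publish("myradioshow", "Good morning everyone!")  # no listeners yet
broker.subscribe("myradioshow", inbox.append)
delivered = broker.publish("myradioshow", "Hello? Is anyone listening?")
print(missed, delivered, inbox)
```

If subscribers must also see messages published while they were offline, a list or another persistent structure is a better fit than pub/sub.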

Document Database

- CouchDB, MongoDB


MongoDB Features

• JSON / BSON support

• RESTful support

• CRUD operations

• SQL-like queries

• Indexing

• Auto-sharding

• Built-in replication and high availability

• Aggregation framework

Terminology


Sharding


MongoDB vs SQL

SQL: SELECT * FROM users
MongoDB: db.users.find()

SQL: SELECT id, user_id, status FROM users
MongoDB: db.users.find( { }, { user_id: 1, status: 1 } )

SQL: SELECT * FROM users WHERE status = "A"
MongoDB: db.users.find( { status: "A" } )

SQL: SELECT user_id, status FROM users WHERE status = "A"
MongoDB: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )

SQL: SELECT * FROM users WHERE user_id like "%bc%"
MongoDB: db.users.find( { user_id: /bc/ } )

SQL: SELECT * FROM users WHERE status = "A" ORDER BY user_id ASC
MongoDB: db.users.find( { status: "A" } ).sort( { user_id: 1 } )

SQL: SELECT * FROM users LIMIT 5 SKIP 10
MongoDB: db.users.find().limit(5).skip(10)
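To make the filter and projection arguments in the pairs above concrete, here is a small Python sketch that applies the same two-argument `find` idea to a list of dictionaries. It only covers equality filters, regular-expression filters and inclusion projections. It is an illustration of the semantics, not the MongoDB driver API, and the sample documents are invented.

```python
import re


def matches(doc, query):
    """Return True when the document satisfies every condition in the query."""
    for field_name, condition in query.items():
        value = doc.get(field_name)
        if isinstance(condition, re.Pattern):
            # Regex condition, like { user_id: /bc/ } in the shell.
            if not isinstance(value, str) or not condition.search(value):
                return False
        elif value != condition:
            return False
    return True


def find(collection, query=None, projection=None):
    """Filter documents, then keep only the projected fields (if any)."""
    results = [doc for doc in collection if matches(doc, query or {})]
    if not projection:
        return results
    keep = [name for name, flag in projection.items() if flag and name != "_id"]
    if projection.get("_id", 1):
        keep.insert(0, "_id")  # _id is returned unless explicitly excluded
    return [{name: doc[name] for name in keep if name in doc} for doc in results]


users = [
    {"_id": 1, "user_id": "abc123", "status": "A", "age": 30},
    {"_id": 2, "user_id": "xyz789", "status": "B", "age": 41},
    {"_id": 3, "user_id": "bcd456", "status": "A", "age": 25},
]

active = find(users, {"status": "A"}, {"user_id": 1, "status": 1, "_id": 0})
like_bc = find(users, {"user_id": re.compile("bc")})
print(active, [doc["_id"] for doc in like_bc])
```

The second call mirrors the `like "%bc%"` row: an unanchored regular expression matches the substring anywhere in the field.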

Column Family Stores

- Cassandra, HBase

Cassandra Features

• Proven

• Rich Data Model

• Scalable

• Distributed & Decentralized

• High Performance read/write

• Fault Tolerance

• No SPOF

• Schema free

Cassandra Cluster

Benchmark

Architecture

Consistency Level

• ANY

• ONE

• TWO

• THREE

• QUORUM

• LOCAL_QUORUM

• EACH_QUORUM

• ALL
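The levels listed above state how many replicas must acknowledge a read or a write. A common rule of thumb is that reads see the latest write when the read and write replica counts together exceed the replication factor. The sketch below works through that arithmetic for a single data centre. ANY, LOCAL_QUORUM and EACH_QUORUM are left out on purpose because they depend on hinted handoff or on multi-data-centre topology, and the function names are hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    table = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return table[level]


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when the read set and write set must overlap in at least one replica."""
    writes = replicas_required(write_level, replication_factor)
    reads = replicas_required(read_level, replication_factor)
    return writes + reads > replication_factor


for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, reads_see_latest_write(write_level, read_level, 3))
```

With a replication factor of 3, QUORUM means 2 replicas, so QUORUM writes combined with QUORUM reads overlap, while ONE combined with ONE does not.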


RDBMS Supports ACID

• Atomicity - a transaction is all or nothing

• Consistency - only valid data is written to the database

• Isolation - concurrent transactions behave as if they were executed serially

• Durability - once a transaction is committed, its changes survive failures

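NoSQL Supports the CAP Theorem

Consistency: all nodes give the same answer

Availability: nodes always give an answer and accept updates

Partition tolerance: the system continues working even if some nodes become unreachable

A distributed system can guarantee at most two of these three properties at the same time.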

Visual Guide to NoSQL Systems


Graph Database

- Neo4j, InfoGrid, InfiniteGraph

Graph DB

A graph database uses graph structures with nodes, edges, and properties to represent and store data.

NoSQL Performance


Graph DB Usage Area

• Recommendations

• Business intelligence

• Social networking

• MDM

• System management

• Time series data

• Product catalogue

• Web analytics

• Scientific computing

• Indexing your slow RDBMS

Neo4j


• Leading Graph Database

• Transaction support (ACID)

• Indexing

• Querying

• REST support

• Disk-based

• Open source

• Traversal framework

• High performance (traverses 1,000,000+ relationships per second)

• Robust (in 24/7 operation since 2003)

• Massive scalability

Neo4j Data Model

Neo4j has nodes and relationships.

Nodes and relationships have properties.

Node1 - properties: name, surname

Node2 - properties: name, surname

Relationship between Node1 and Node2 - type: knows, property: date of meeting
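The data model above (nodes and typed relationships, both carrying properties) can be sketched in a few lines of Python. This is an in-memory illustration of the property-graph idea, not the Neo4j API. The class names, the sample people and the meeting dates are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str
    end: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}          # node id -> Node
        self.relationships = []  # list of Relationship

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(properties)

    def relate(self, start, end, rel_type, **properties):
        self.relationships.append(Relationship(start, end, rel_type, properties))

    def neighbours(self, node_id, rel_type):
        """Nodes reachable over one relationship of the given type, either direction."""
        found = set()
        for rel in self.relationships:
            if rel.rel_type != rel_type:
                continue
            if rel.start == node_id:
                found.add(rel.end)
            elif rel.end == node_id:
                found.add(rel.start)
        return found

    def friends_of_friends(self, node_id, rel_type="knows"):
        """Two-hop traversal: people known by my contacts but not yet by me."""
        direct = self.neighbours(node_id, rel_type)
        second = set()
        for friend in direct:
            second |= self.neighbours(friend, rel_type)
        return second - direct - {node_id}


graph = Graph()
graph.add_node("ali", name="Ali", surname="Kaya")
graph.add_node("ayse", name="Ayse", surname="Demir")
graph.add_node("mehmet", name="Mehmet", surname="Yilmaz")
graph.relate("ali", "ayse", "knows", date_of_meeting="2014-01-10")
graph.relate("ayse", "mehmet", "knows", date_of_meeting="2014-02-03")
print(graph.friends_of_friends("ali"))
```

A two-hop traversal like `friends_of_friends` is the kind of query that needs self-joins in a relational schema, which is where a graph model tends to pay off.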

Relational Databases are Graphs!


Cypher For Query

Neo4j Performance

http://www.neotechnology.com/2012/10/20-billion-relationships-imported-into-neo4j-on-ec2/

Who uses Neo4j?

• Cisco - master data management

• Telenor Group - customer organization structure (203 million subscribers)

• Deutsche Telekom - social football site (150 million subscribers)

OrientDB

• The Document-Graph database

• ACID support

• SQL and native queries

• Schema-less, schema-full and schema-mixed modes

• Roles + security

• Functions

• HTTP / RESTful / JSON / binary support

• Hooks

• Fetch plans

• Inheritance

• 200,000 inserts per second (6 M node traversals with cache)

FluxGraph

• Temporal Graph Database

• Has checkpoint

• Compatible with Neo4j


Graphs of Telecommunications


CDR Analysis by Graph


Spring Data


Spring Data Neo4j


NoSQL Usage

• "Cisco is building a master data management system based on Neo4j, and this is actually our first Fortune 500 customer. They found us about two years ago when they tried to build this big, complex hierarchy inside of Oracle RAC. In Oracle RAC, they had response time in minutes, and then when they replaced it [with] Neo4j, they had response times in milliseconds." (Emil Eifrem, Neo4j CEO)

• NHS tears out its Oracle Spine in favour of open source

http://www.theregister.co.uk/2013/10/10/nhs_drops_oracle_for_riak/

• AMD: Why we had to evacuate 276TB from Oracle DB to Hadoop

http://www.theregister.co.uk/2014/03/24/amd_hadoop_migration/

Statistics


Magic Quadrant for Operational Database Management Systems


NoSQL Market Size


NoSQL Engine Ranking


NoSQL in Enterprise App


Use of NoSQL products


Database market share

Web Application Architecture

Polyglot Persistence

Thanks
