couchbase_nosql matters_introduction_to_map_reduce_2013

Friday, April 26, 13

Upload: couchbase

Post on 20-Aug-2015



Introduction to Map Reduce with Couchbase

Tugdual Grall / @tgrall

NoSQL Matters ‘13 - Cologne - April 25th 2013

About Me

• Tugdual “Tug” Grall
  - Couchbase
    • Technical Evangelist
  - eXo
    • CTO
  - Oracle
    • Developer/Product Manager
    • Mainly Java/SOA
  - [email protected]
• Web
  • @tgrall
  • http://blog.grallandco.com
  • tgrall
  • NantesJUG co-founder
  • Pet Project: http://www.resultri.com

What’s the Problem?

• Lots of Data
• Big Data
• SaaS/Cloud Computing
• Big Users

Solution

Distribute:
• the data
• the processing of the data

Map Reduce

MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.

http://research.google.com/archive/mapreduce.html

In details

• Developer specifies 2 methods:
  - map (in_key, in_value) -> list(out_key, intermediate_value)
    • Processes input data
    • Produces key, value pairs
  - reduce (out_key, list(intermediate_value)) -> list(out_value)
    • [email protected]
    • Produces a set of merged output values
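The two signatures above can be sketched as a tiny single-process word count. This is only an illustration of the programming model: the `mapReduce` driver, the input documents and the word-count task are assumptions made for the example, and a real runtime would spread the work across a cluster.

```javascript
// map(in_key, in_value) -> list(out_key, intermediate_value)
function map(inKey, inValue) {
  return inValue
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => [word.toLowerCase(), 1]);
}

// reduce(out_key, list(intermediate_value)) -> list(out_value)
function reduce(outKey, intermediateValues) {
  return [intermediateValues.reduce((sum, n) => sum + n, 0)];
}

// Hypothetical driver standing in for the distributed runtime:
// it runs map on every input, groups the pairs by key, then runs reduce.
function mapReduce(inputs) {
  const groups = new Map();
  for (const [inKey, inValue] of Object.entries(inputs)) {
    for (const [outKey, value] of map(inKey, inValue)) {
      if (!groups.has(outKey)) groups.set(outKey, []);
      groups.get(outKey).push(value);
    }
  }
  const result = {};
  for (const [outKey, values] of groups) {
    result[outKey] = reduce(outKey, values);
  }
  return result;
}

const counts = mapReduce({ doc1: "map reduce map", doc2: "reduce data" });
console.log(counts);
```

The grouping step in the middle is what Hadoop calls the shuffle; here it is just a `Map` in memory.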


Execution

Most common use case

© Yahoo inc.

What about Couchbase?

Couchbase Open Source Project

• Leading NoSQL database project focused on distributed database technology and surrounding ecosystem

• Supports both key-value and document-oriented use cases

• All components are available under the Apache 2.0 Public License

• Obtained as packaged software in both enterprise and community editions.

Couchbase Server Core Principles

• Easy Scalability
  - [email protected],[email protected]. single click
• Consistent High Performance
  - Consistent sub-millisecond read and write response times with consistent high throughput
• Always On 24x365
  - [email protected], hardware maintenance, etc.
• Flexible Data Model
  - JSON document model with no fixed schema.

Additional Couchbase Server Features

Built-in clustering – All nodes equal

[email protected]

[email protected].

Built-in managed cache

Append-only storage layer

Online compaction

Monitoring and admin API & UI

SDK for a variety of languages

Couchbase Server 2.0 Architecture

[Architecture diagram]
• Data Manager
  - Query Engine: 8092 Query API
  - Moxi: 11211 Memcapable 1.0
  - Memcached: 11210 Memcapable 2.0
  - Couchbase EP Engine
  - storage interface
  - New Persistence Layer
• Cluster Manager (Erlang/OTP)
  - HTTP REST management API / Web UI: 8091
  - Erlang port mapper: 4369
  - Distributed Erlang: 21100 - 21199
  - on each node: Heartbeat, Process monitor, Global singleton supervisor, Configuration manager
  - one per cluster: Rebalance orchestrator, Node health monitor, vBucket state and replication manager

Couchbase Server 2.0 Architecture

[Architecture diagram repeated, annotated with implementation languages]
• Server/Cluster Management & Communication (Erlang)
• RAM Cache, Indexing & Persistence Management (C & V8)
  - Object-level Cache
  - Disk Persistence
• The Unreasonable Effectiveness of C by Damien Katz

Basic Operation

• Docs distributed evenly across servers
• Each server stores both active and replica [email protected] time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
  - App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time

[Diagram: COUCHBASE SERVER CLUSTER, User Configured Replica Count = 1. APP SERVER 1 and APP SERVER 2 each use the COUCHBASE Client Library with its CLUSTER MAP to READ/WRITE/UPDATE docs; SERVER 1, SERVER 2 and SERVER 3 each hold ACTIVE and REPLICA docs.]
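A minimal sketch of the cluster map idea, under loud assumptions: the hash function, the six-entry partition table and the server names below are invented for illustration and are not Couchbase's actual algorithm or layout. The point is only that the client library, not the app, turns a key into a server.

```javascript
// Hypothetical hash: deterministic, but NOT the one Couchbase uses.
function hashKey(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return h;
}

// Hypothetical cluster map: partition number -> index of the server
// holding the active copy of that partition.
const clusterMap = {
  servers: ["server1", "server2", "server3"],
  partitions: [0, 1, 2, 0, 1, 2],
};

// What a client library could do on every get/set.
function serverFor(key, map) {
  const partition = hashKey(key) % map.partitions.length;
  return map.servers[map.partitions[partition]];
}

console.log(serverFor("my-key", clusterMap));
```

When the cluster changes, only the map is replaced; the application keeps calling the same interface with the same keys.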


How to access the data?

Couchbase.get("my-key");

Look at a document

Key -> JSON OBJECT (“DOCUMENT”)

{
    "string" : "string",
    "string" : value,
    "string" : { "string" : "string", "string" : value },
    "string" : [ array ]
}

• How to find document based on its attributes?
  - get employee by email
  - get products by type
  - ...
• You need to look “into” the document/value

How to?

Create an index!

Create the index

Document:

{
  "name": "Aventinus",
  "abv": 8.2,
  "ibu": 0,
  "srm": 0,
  "upc": 0,
  "type": "beer",
  "brewery_id": "110f1f2012",
  "updated": "2010-07-22 20:00:20",
  "description": "Dark-ruby, ... Weizenbock",
  "category": "German Ale"
}

Metadata:

{
  "id": "110f37fa30",
  "rev": "1-000000000",
  "expiration": 0,
  "flags": 0,
  "type": "json"
}

Resulting index:

Key          Value
Aventinus    8.2
Avenue Ale   4.1
...          ...

Concrete Example

• This map function:
  - receives the document and metadata
  - as developer you just have to emit the K,V

Map Function
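The slide's own map function did not survive the transcript, so here is a hedged sketch of what a view map function of this kind could look like: it receives the document and its metadata and emits the beer name as key and the ABV as value. In a Couchbase design document the function is anonymous and `emit` is provided by the server; the name `mapBeerByName`, the `emit` stand-in and the second and third documents are assumptions for the example.

```javascript
// Map function: called once per document with (doc, meta).
function mapBeerByName(doc, meta) {
  if (doc.type === "beer" && doc.name) {
    emit(doc.name, doc.abv); // index key, index value
  }
}

// Stand-in for the server side: collects emitted rows in memory.
const index = [];
function emit(key, value) {
  index.push({ key: key, value: value });
}

// One document from the slides plus two hypothetical ones.
const docs = [
  { meta: { id: "110f37fa30" }, doc: { name: "Aventinus", abv: 8.2, type: "beer" } },
  { meta: { id: "hypothetical-id-1" }, doc: { name: "Avenue Ale", abv: 4.1, type: "beer" } },
  { meta: { id: "hypothetical-id-2" }, doc: { name: "Some Brewery", type: "brewery" } },
];

docs.forEach((d) => mapBeerByName(d.doc, d.meta));

// The index is kept sorted by key.
index.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(index);
```

The brewery document emits nothing, so it simply does not appear in the index.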


doc.email             meta.id
[email protected]     u::1
[email protected]     u::7
[email protected]     u::2
[email protected]     u::5
[email protected]     u::6
yeti@couchbase.com    u::4
[email protected]     u::3

?startkey="b1"&endkey="zz"
Pulls the Index-Keys between the UTF-8 range specified by the startkey and endkey.

?startkey="bz"&endkey="zn"
Pulls the Index-Keys between the UTF-8 range specified by the startkey and endkey.

?key="[email protected]"
Match a Single Index-Key

?keys=["[email protected]","[email protected]"]
[email protected] (Array Notation)
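The three query forms above can be mimicked on an in-memory index. This is a sketch, not the server's implementation: the e-mail addresses are hypothetical, `queryView` is an invented helper, and plain JavaScript string comparison only approximates the collation the server applies to keys.

```javascript
// Hypothetical index rows (key = doc.email, id = meta.id), sorted by key.
const rows = [
  { key: "abba@example.com", id: "u::1" },
  { key: "beta@example.com", id: "u::7" },
  { key: "matt@example.com", id: "u::5" },
  { key: "yeti@example.com", id: "u::4" },
  { key: "zorro@example.com", id: "u::3" },
];

// Supports ?key=, ?keys=[...] and ?startkey=&endkey= (both ends inclusive).
function queryView(indexRows, params) {
  if (params.key !== undefined) {
    return indexRows.filter((r) => r.key === params.key);
  }
  if (params.keys !== undefined) {
    return indexRows.filter((r) => params.keys.includes(r.key));
  }
  return indexRows.filter(
    (r) =>
      (params.startkey === undefined || r.key >= params.startkey) &&
      (params.endkey === undefined || r.key <= params.endkey)
  );
}

const wideRange = queryView(rows, { startkey: "b1", endkey: "zz" });
const narrowRange = queryView(rows, { startkey: "bz", endkey: "zn" });
const single = queryView(rows, { key: "matt@example.com" });
const several = queryView(rows, { keys: ["abba@example.com", "yeti@example.com"] });
console.log(wideRange.length, narrowRange.length, single.length, several.length);
```

With these rows the narrower range drops "beta" (it sorts before "bz") and "zorro" (it sorts after "zn").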


How it works?

Indexing and Querying

• Indexing work is distributed amongst nodes
  - Large data set possible
  - Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes

[Diagram: COUCHBASE SERVER CLUSTER, User Configured Replica Count = 1. A Query from APP SERVER 1 and APP SERVER 2 (COUCHBASE Client Library, CLUSTER MAP) reaches SERVER 1, SERVER 2 and SERVER 3, each holding ACTIVE and REPLICA docs.]
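A rough sketch of "each node has an index for its own data, queries combine the results". The per-node index fragments, most of the beer names and the helper names are made up for the example, and the nodes are queried one after another here rather than in parallel.

```javascript
// Hypothetical per-node index fragments, each sorted by key.
const nodeIndexes = {
  server1: [
    { key: "Aventinus", value: 8.2 },
    { key: "Hypothetical Stout", value: 6.0 },
  ],
  server2: [{ key: "Avenue Ale", value: 4.1 }],
  server3: [
    { key: "Another Hypothetical IPA", value: 7.0 },
    { key: "Zed Lager", value: 5.0 },
  ],
};

function compareKeys(a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
}

// Each node answers only for the data stored on it.
function queryNode(index, startkey, endkey) {
  return index.filter((row) => row.key >= startkey && row.key <= endkey);
}

// The query gathers the partial results and merges them in key order.
function queryCluster(indexes, startkey, endkey) {
  const partials = Object.keys(indexes).map((node) =>
    queryNode(indexes[node], startkey, endkey)
  );
  return [].concat(...partials).sort(compareKeys);
}

const merged = queryCluster(nodeIndexes, "A", "B");
console.log(merged.map((row) => row.key));
```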


Couchbase Server 2.0: Views

• Views can cover a few different use cases
  - Primary Index
  - Simple secondary indexes (the most common)
  - Complex secondary,[email protected]
  - [email protected]tions (reduction)
    • Example: count the number of “North American Ales”
  - Organizing related data
• Built using Map/Reduce
  - [email protected]
  - [email protected] (reduces) information

Distributed Index Build Phase

• Optimized for lookups, in-order access and aggregations
• All view reads from disk (different performance profile)
• View builds against every document on every node
  - This is why you should group them in a design document
• Automatically kept up to date
  - “Incremental” Map Reduce

Dynamic Range Queries with Optional Aggregation

• Efficiently fetch a row or group of related rows
• Queries use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries

[Diagram: SERVER 1, SERVER 2 and SERVER 3 each hold active docs and replica docs; the query is answered from the servers' indexes.]

?startkey="J"&endkey="K"

{"rows":[{"key":"Juneau","value":null}]}
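To make the group_level bullet more concrete, here is a sketch of grouping over array keys. The `[year, month]` keys, the rows and the `groupLevelSum` helper are assumptions for the example; the server works on its B-tree rather than on an array. Because rows arrive in key order, one in-order pass is enough to build the groups.

```javascript
// Hypothetical map output: key = [year, month], value = 1 per document.
// Rows are already sorted by key, as they would be in the index.
const mapRows = [
  { key: [2012, 11], value: 1 },
  { key: [2012, 12], value: 1 },
  { key: [2013, 1], value: 1 },
  { key: [2013, 1], value: 1 },
  { key: [2013, 4], value: 1 },
];

// Truncate each key to its first `groupLevel` elements and sum the values
// of consecutive rows that share the truncated key.
function groupLevelSum(rows, groupLevel) {
  const out = [];
  for (const row of rows) {
    const groupKey = row.key.slice(0, groupLevel);
    const last = out[out.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(groupKey)) {
      last.value += row.value;
    } else {
      out.push({ key: groupKey, value: row.value });
    }
  }
  return out;
}

const byYear = groupLevelSum(mapRows, 1);
const byMonth = groupLevelSum(mapRows, 2);
console.log(JSON.stringify(byYear));
console.log(JSON.stringify(byMonth));
```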


Append Only Index

• Disk activity is slow
• Updating disk blocks is very slow
• Appending new data to the end of the current file is fast
• Overhead of reverse reading is small
• Because existing blocks are not re-used, can lead to fragmentation
  - Couchbase will compact the index automatically

[Diagram: Doc -> View Processor -> Disk; changed documents are appended after the original data.]

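Adding a new Document

[B-tree diagram: each node is labelled with its key range and its reduction (a count)]

Keys:         A B C D F G H I K L N O Q R
Leaf nodes:   A-C 3, D-F 2, G-H 2, I-L 3, N-R 4
Inner nodes:  A-H 7, I-R 7
Root:         A-R 14

After adding the new key M:
new key:         M (leaf N-R 4 becomes M-R 5)
new reductions:  I-R 8
new root:        A-R 15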
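The diagram can be replayed with a small path-copying tree. This is a simplified sketch, assuming a fixed tree shape (nodes never split) and a count as the reduction; the helper names are invented. It shows that adding a key creates a new leaf, new reductions on the path and a new root, while every other node is reused untouched.

```javascript
// A leaf holds sorted keys; its reduction is the number of keys.
function leaf(keys) {
  return { keys: keys, count: keys.length };
}

// An inner node caches the re-reduced value (sum of counts) of its children.
function inner(children) {
  const count = children.reduce((sum, child) => sum + child.count, 0);
  return { children: children, count: count };
}

function lastKey(node) {
  return node.keys
    ? node.keys[node.keys.length - 1]
    : lastKey(node.children[node.children.length - 1]);
}

// Append-only style insert: never mutates a node, returns a new root.
function insert(node, key) {
  if (node.keys) {
    return leaf(node.keys.concat(key).sort());
  }
  // Descend into the first child whose range can hold the key.
  let idx = node.children.findIndex((child) => lastKey(child) >= key);
  if (idx === -1) idx = node.children.length - 1;
  const children = node.children.slice();
  children[idx] = insert(children[idx], key);
  return inner(children); // only nodes on the path get new reductions
}

const oldRoot = inner([
  inner([leaf(["A", "B", "C"]), leaf(["D", "F"]), leaf(["G", "H"])]),
  inner([leaf(["I", "K", "L"]), leaf(["N", "O", "Q", "R"])]),
]);
const newRoot = insert(oldRoot, "M");
console.log(oldRoot.count, newRoot.count); // 14 15
```

The old root still describes the tree as it was before the insert, which is what lets an append-only file keep earlier versions readable until compaction.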


What about Reduce?

• Out of the box functions:
  - _count()
  - _sum()
  - _stats()
• Create your own if needed

function(key, values, rereduce) {
  if (rereduce) {
    // values holds partial counts from earlier reduce calls: add them up
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  } else {
    // values holds the values emitted by map: count them
    return values.length;
  }
}
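A short sketch of how the `rereduce` flag in the custom function above comes into play. The function is given the name `countReduce` so it can be called directly, and the split of the map output into three batches is an assumption for illustration: each batch is reduced first, then the partial results are reduced again.

```javascript
// Same logic as the slide's custom reduce, named so it can be invoked here.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    // values are partial counts: sum them
    return values.reduce((total, partial) => total + partial, 0);
  }
  // values are raw map outputs: count them
  return values.length;
}

// Hypothetical map output, split into three batches.
const batches = [[1, 1, 1], [1, 1], [1]];

// First pass: reduce each batch on its own.
const partials = batches.map((values) => countReduce(null, values, false));

// Second pass: re-reduce the partial counts into the final value.
const total = countReduce(null, partials, true);
console.log(partials, total);
```

Returning `values.length` in the rereduce branch would give 3 (the number of batches) instead of 6, which is why the two branches differ.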


Reduce Function

• Key and Arrays of values as parameters
• Written in JavaScript
• Called after the map function
• Used to reduce the result of a map of single values
• Used with grouping
• Could be ignored when querying
  - reuse the index

Reduce in Action

• Map() Result

Key                      Value
Belgian-Style Dubbel     1
Belgian-Style Dubbel     1
Belgian-Style Dubbel     1
Belgian-Style Pale Ale   1
Belgian-Style White      1
Belgian-Style White      1
...                      ...

• Reduce()

_count()

• Result

Key                      Value
Belgian-Style Dubbel     3
Belgian-Style Pale Ale   1
Belgian-Style White      2

How to use it?

• Use client SDK to call the view:

View view = client.getView("beer", "by_name");
Query query = new Query();
query.setIncludeDocs(true)
     .setLimit(20)
     .setRangeStart(ComplexKey.of(startKey))
     .setRangeEnd(ComplexKey.of(startKey + "\uefff"));

ViewResponse result = client.query(view, query);
for (ViewRow row : result) {
    ....
}
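The Java snippet above turns a prefix into a key range by appending "\uefff" to the start key. The sketch below builds what should be the equivalent query string for the view's HTTP Query API (port 8092 on the architecture slide). The `/bucket/_design/ddoc/_view/view` path layout reflects my understanding of Couchbase Server 2.0, and the host and the bucket name `beer-sample` are assumptions; `beer` and `by_name` come from the Java code.

```javascript
// Build a view query URL that matches every key starting with `prefix`.
function prefixQueryUrl(baseUrl, bucket, designDoc, view, prefix, limit) {
  const params = [
    // Keys are JSON values, so the strings are JSON-encoded, then URL-encoded.
    "startkey=" + encodeURIComponent(JSON.stringify(prefix)),
    // "\uefff" sorts after ordinary characters, closing the prefix range.
    "endkey=" + encodeURIComponent(JSON.stringify(prefix + "\uefff")),
    "limit=" + limit,
  ];
  return (
    baseUrl + "/" + bucket + "/_design/" + designDoc + "/_view/" + view +
    "?" + params.join("&")
  );
}

const url = prefixQueryUrl(
  "http://localhost:8092", // assumed host; 8092 is the Query API port
  "beer-sample",           // assumed bucket name
  "beer",
  "by_name",
  "A",
  20
);
console.log(url);
```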


Demonstration

Hadoop & Couchbase

Hadoop:
• Deal with “Big Data”
• “More” is better than “Faster”
• Batch Oriented
• Usually used to “extract/transform” data
• Fully distributed
  - Map, Shuffle, Reduce

≠

Couchbase:
• Distributed
• Executed where the document is
• Deal with “indexing” data
• As fast as possible
• Use to query the data in the Database

Map Reduce in Couchbase

• Like many other NoSQL databases: used for queries!

• Indexes are distributed on each node of the cluster
• Indexes are updated incrementally

• Write your Map Reduce in JavaScript

[email protected]

@tgrall

Get Couchbase Server at http://www.couchbase.com/download