Introduction to Map Reduce with Couchbase
Tugdual Grall / @tgrall
NoSQL Matters '13 - Cologne - April 25th 2013
Friday, April 26, 13
About Me
• Tugdual "Tug" Grall
• Couchbase: Technical Evangelist
• eXo: CTO
• Oracle: Developer/Product Manager
• Mainly Java/SOA
• Web: @tgrall, http://blog.grallandco.com, tgrall
• NantesJUG co-founder
• Pet project: http://www.resultri.com
Map Reduce
MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.
http://research.google.com/archive/mapreduce.html
In Detail
• Developer specifies 2 methods:
map (in_key, in_value) -> list(out_key, intermediate_value)
• Processes input data
• Produces key/value pairs
reduce (out_key, list(intermediate_value)) -> list(out_value)
• Produces a set of merged output values
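The two signatures above can be sketched as a word count in plain JavaScript. This is only a single-process illustration of the model, not how Google's implementation or Couchbase runs it; the input data and the `runMapReduce` driver are hypothetical.

```javascript
// Hypothetical input: document id -> text.
const inputs = { d1: "map reduce map", d2: "reduce" };

// map(in_key, in_value) -> list of [out_key, intermediate_value]
function map(inKey, inValue) {
  return inValue.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// reduce(out_key, list(intermediate_value)) -> merged output value
function reduce(outKey, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Toy driver: group the intermediate pairs by key, then reduce each group.
function runMapReduce(data) {
  const groups = new Map();
  for (const [inKey, inValue] of Object.entries(data)) {
    for (const [outKey, value] of map(inKey, inValue)) {
      if (!groups.has(outKey)) groups.set(outKey, []);
      groups.get(outKey).push(value);
    }
  }
  const result = {};
  for (const [outKey, values] of groups) {
    result[outKey] = reduce(outKey, values);
  }
  return result;
}

const wordCounts = runMapReduce(inputs);
console.log(wordCounts);
```

In a real cluster the grouping step happens across machines; here it is a local Map so the flow of keys and values stays visible.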
Couchbase Open Source Project
• Leading NoSQL database project focused on distributed database technology and surrounding ecosystem
• Supports both key-value and document-oriented use cases
• All components are available under the Apache 2.0 Public License
• Obtained as packaged software in both enterprise and community editions
Couchbase Server Core Principles
• Easy Scalability: […] single click
• Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
• Always On 24x365: […], hardware maintenance, etc.
• Flexible Data Model: JSON document model with no fixed schema
Additional Couchbase Server Features
• Built-in clustering: all nodes equal
• Built-in managed cache
• Append-only storage layer
• Online compaction
• Monitoring and admin API & UI
• SDKs for a variety of languages
Couchbase Server 2.0 Architecture
[Diagram: a node is split into a Data Manager and a Cluster Manager.]
• Data Manager: Query API (port 8092) and Query Engine; Moxi (port 11211, Memcapable 1.0); Memcached with the Couchbase EP Engine (port 11210, Memcapable 2.0); storage interface; New Persistence Layer
• Cluster Manager (Erlang/OTP): HTTP REST management API / Web UI (port 8091); Erlang port mapper (port 4369); Distributed Erlang (ports 21100 - 21199)
• Cluster Manager processes, some labeled "on each node" and some "one per cluster": heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager
Couchbase Server 2.0 Architecture
[Diagram: the same architecture, annotated by implementation language.]
• Server/Cluster Management & Communication (Erlang): the Cluster Manager components
• RAM Cache, Indexing & Persistence Management (C & V8): the Data Manager components, including the object-level cache, disk persistence, and the Query Engine
• See "The Unreasonable Effectiveness of C" by Damien Katz
Basic Operation
• Docs distributed evenly across servers
• Each server stores both active and replica docs [text lost in extraction]
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
  App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
[Diagram: a Couchbase Server cluster of three servers, each holding a set of ACTIVE docs and a set of REPLICA docs (user-configured replica count = 1). Two app servers, each with the Couchbase client library and a cluster map, send READ/WRITE/UPDATE operations to the servers.]
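How a client library can use a cluster map to pick a server may be sketched as follows. This is a simplified illustration, assuming the usual key to vBucket to server lookup; the hash function, the vBucket count, and the server names are stand-ins, not the values a real client library uses.

```javascript
// Assumption: a small vBucket count for readability.
const NUM_VBUCKETS = 8;

// Assumption: a toy string hash standing in for the client library's own hash.
function toyHash(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return h;
}

// Hypothetical cluster map: vBucket index -> server holding the active copy.
const clusterMap = [
  "server1", "server2", "server3", "server1",
  "server2", "server3", "server1", "server2",
];

// The app only passes a key; the library resolves the server.
function serverFor(key) {
  const vbucket = toyHash(key) % NUM_VBUCKETS;
  return { vbucket, server: clusterMap[vbucket] };
}

const first = serverFor("u::1");
const again = serverFor("u::1");
console.log(first, again);
```

Because the lookup depends only on the key and the map, every app server holding the same map reaches the same server for a given document.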
Look at a Document
Key -> JSON object ("document"):
{
  "string" : "string",
  "string" : value,
  "string" : { "string" : "string", "string" : value },
  "string" : [ array ]
}
• How to find a document based on its attributes?
  get employee by email
  get products by type
  ...
• You need to look "into" the document/value
Create the Index
Document:
{
"name": "Aventinus",
"abv": 8.2,
"ibu": 0,
"srm": 0,
"upc": 0,
"type": "beer",
"brewery_id": "110f1f2012",
"updated": "2010-07-22 20:00:20",
"description": "Dark-ruby,
... Weizenbock",
"category": "German Ale"
}
Metadata:
{
"id": "110f37fa30",
"rev": "1-000000000",
"expiration": 0,
"flags": 0,
"type": "json"
}
Index:
Key          Value
Aventinus    8.2
Avenue Ale   4.1
...          ...
Concrete Example
• This map function:
  receives the document and its metadata
  as a developer you just have to emit the key and value (K, V)
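A map function of this kind, indexing users by email, might look like the sketch below. The `type` check, the sample documents, and the local `emit` stand-in are assumptions for illustration; on the server, `emit` is provided by the view engine. Sorting with plain string comparison only approximates the server's collation.

```javascript
// Stand-in for the emit() that the view engine provides.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The map function: receives the document and its metadata, emits K, V.
// Assumption: user documents carry type "user" and an email field.
const mapByEmail = function (doc, meta) {
  if (doc.type === "user" && doc.email) {
    emit(doc.email, meta.id);
  }
};

// Hypothetical documents, including one that should not be indexed.
const docs = [
  { meta: { id: "u::1" }, doc: { type: "user", email: "abba@example.com" } },
  { meta: { id: "u::2" }, doc: { type: "user", email: "zorro@example.com" } },
  { meta: { id: "u::3" }, doc: { type: "user", email: "jasdeep@example.com" } },
  { meta: { id: "b::1" }, doc: { type: "beer", name: "Aventinus" } },
];

for (const { doc, meta } of docs) {
  mapByEmail(doc, meta);
}

// The index keeps rows ordered by key.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```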
Index (rows sorted by key; most addresses were lost in extraction):
doc.email                     meta.id
[email lost in extraction]    u::1
[email lost in extraction]    u::7
[email lost in extraction]    u::2
[email lost in extraction]    u::5
[email lost in extraction]    u::6
yeti@couchbase.com            u::4
[email lost in extraction]    u::3
?startkey="b1"&endkey="zz"
Pulls the index keys within the UTF-8 range specified by the startkey and endkey.
?startkey="bz"&endkey="zn"
Pulls the index keys within the UTF-8 range specified by the startkey and endkey.
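The effect of the two range queries can be sketched over a small sorted index. The email addresses below are hypothetical, and plain string comparison is used as an approximation of the server's key ordering; both ends of the range are treated as inclusive.

```javascript
// Hypothetical index rows, already sorted by key as a view index would be.
const index = [
  { key: "abba@example.com", id: "u::1" },
  { key: "beth@example.com", id: "u::7" },
  { key: "jasdeep@example.com", id: "u::2" },
  { key: "yeti@example.com", id: "u::4" },
  { key: "zorro@example.com", id: "u::3" },
];

// Keep the rows whose key lies between startkey and endkey (inclusive).
function rangeQuery(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const wide = rangeQuery(index, "b1", "zz").map((row) => row.id);
const narrow = rangeQuery(index, "bz", "zn").map((row) => row.id);
console.log(wide, narrow);
```

The narrower range drops both the key starting with "be" (it sorts before "bz") and the key starting with "zo" (it sorts after "zn").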
Query against the same doc.email -> meta.id index:
?key="[email lost in extraction]"
Matches a single index key.
Query against the same doc.email -> meta.id index:
?keys=["[email lost in extraction]","[email lost in extraction]"]
Matches several index keys at once (array notation).
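The quotes and brackets in these parameters are there because the values are JSON. A small helper for building such a query string could look like this sketch; the host, bucket, design document, and view names are hypothetical, and the URL layout is an assumption based on the Query API port (8092) shown on the architecture slide.

```javascript
// Assumption: hypothetical host, bucket ("users"), design document, and view.
const BASE = "http://localhost:8092/users/_design/users/_view/by_email";

// JSON-encode each parameter value, then URL-encode it.
function viewUrl(params) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  return `${BASE}?${query}`;
}

const single = viewUrl({ key: "abba@example.com" });
const several = viewUrl({ keys: ["abba@example.com", "zorro@example.com"] });
console.log(single);
console.log(several);
```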
Indexing and Querying
• Indexing work is distributed amongst nodes
  Large data set possible
  Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
[Diagram: a Couchbase Server cluster of three servers, each with ACTIVE and REPLICA docs (user-configured replica count = 1). An app server, through the Couchbase client library and cluster map, sends a query that is answered from the servers' indexes.]
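Combining results from several nodes can be pictured as a merge of sorted lists. The sketch below is an illustration of that idea only, with hypothetical per-node rows; it does not claim to be the server's actual merge code.

```javascript
// Hypothetical per-node results, each already sorted by key.
const nodeResults = [
  [{ key: "Avenue Ale" }, { key: "Juneau" }],
  [{ key: "Aventinus" }],
  [{ key: "Amber" }, { key: "Kolsch" }],
];

// Merge sorted lists by repeatedly taking the smallest remaining head row.
function mergeSorted(lists) {
  const positions = lists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // this list is exhausted
      if (best === -1 || lists[i][positions[i]].key < lists[best][positions[best]].key) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][positions[best]]);
    positions[best] += 1;
  }
}

const combined = mergeSorted(nodeResults).map((row) => row.key);
console.log(combined);
```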
Couchbase Server 2.0: Views
• Views can cover a few different use cases
  Primary index
  Simple secondary indexes (the most common)
  Complex secondary, […]
  […] (reduction)
  • Example: count the number of "North American Ales"
  Organizing related data
• Built using Map/Reduce
  […]
  […] (reduces) information
Distributed Index Build Phase
• Optimized for lookups, in-order access and aggregations
• All view reads from disk (different performance profile)
• View builds against every document on every node
  This is why you should group them in a design document
• Automatically kept up to date
  "Incremental" Map Reduce
Dynamic Range Queries with Optional Aggregation
• Efficiently fetch a row or group of related rows
• Queries use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries
[Diagram: three servers, each holding active docs and replica docs; the query below is answered from their indexes.]
?startkey="J"&endkey="K"
{"rows":[{"key":"Juneau","value":null}]}
Append Only Index
• Disk activity is slow
• Updating disk blocks is very slow
• Appending new data to the end of the current file is fast
• Overhead of reverse reading is small
• Because existing blocks are not re-used, can lead to fragmentation
  Couchbase will compact the index automatically
[Diagram: Doc -> View Processor -> Disk; changed documents are appended after the original data.]
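Adding a New Document
[Diagram: a B-tree index whose inner nodes store reduction values (here, counts).]
• Leaf keys: A B C D F G H I K L N O Q R
• Leaf-level nodes: A-C 3, D-F 2, G-H 2, I-L 3, N-R 4
• Inner nodes: A-H 7, I-R 7
• Root: A-R 14
• Adding the new key M appends new reductions along its path: M-R 5, I-R 8, and a new root A-R 15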
What about Reduce?
• Out of the box functions:
  _count()
  _sum()
  _stats()
• Create your own if needed
function(key, values, rereduce) {
  if (rereduce) {
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  } else {
    return values.length;
  }
}
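Why the custom reduce needs the `rereduce` branch can be seen by running it by hand. In this sketch the function is the one from the slide, given a name (`countReduce`) so it can be called locally; the two partitions of mapped values are hypothetical.

```javascript
// The custom reduce from the slide: counts rows on the first pass,
// and sums partial counts when called again on earlier results.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  } else {
    return values.length;
  }
}

// Hypothetical mapped values, reduced in two separate groups.
const partitionA = [1, 1, 1];
const partitionB = [1, 1];

const partialA = countReduce(null, partitionA, false);
const partialB = countReduce(null, partitionB, false);

// Correct: combine the partial counts with rereduce = true.
const total = countReduce(null, [partialA, partialB], true);

// Incorrect: treating partial counts as raw rows counts the partials instead.
const wrong = countReduce(null, [partialA, partialB], false);
console.log(total, wrong);
```

The second call shows the failure mode: without the `rereduce` branch, two partial results would be reported as a count of 2 rather than 5.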
Reduce Function
• Key and arrays of values as parameters
• Written in JavaScript
• Called after the map function
• Used to reduce the result of a map of single values
• Used with grouping
• Can be ignored when querying
  reuse the index
Reduce in Action
• Map() result
Key                       Value
Belgian-Style Dubbel      1
Belgian-Style Dubbel      1
Belgian-Style Dubbel      1
Belgian-Style Pale Ale    1
Belgian-Style White       1
Belgian-Style White       1
...                       ...
• Reduce(): _count()
• Result
Key                       Value
Belgian-Style Dubbel      3
Belgian-Style Pale Ale    1
Belgian-Style White       2
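The table above can be reproduced with a few lines that count rows per key, which is roughly what _count does when a query groups by key. The `groupCount` helper is hypothetical and runs locally; it is a sketch of the result, not of the server's B-tree based implementation.

```javascript
// Map() output from the slide: one row per beer, keyed by category.
const mapped = [
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Pale Ale", 1],
  ["Belgian-Style White", 1],
  ["Belgian-Style White", 1],
];

// Count the rows for each distinct key, keeping first-seen key order.
function groupCount(rows) {
  const counts = new Map();
  for (const [key] of rows) {
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, value]) => ({ key, value }));
}

const reduced = groupCount(mapped);
console.log(reduced);
```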
How to Use It?
• Use client SDK to call the view:
// Look up the "by_name" view in the "beer" design document
View view = client.getView("beer", "by_name");
// Fetch up to 20 rows, with their documents, whose key starts with startKey
Query query = new Query();
query.setIncludeDocs(true)
     .setLimit(20)
     .setRangeStart(ComplexKey.of(startKey))
     .setRangeEnd(ComplexKey.of(startKey + "\uefff"));
// Run the query and iterate over the resulting rows
ViewResponse result = client.query(view, query);
for (ViewRow row : result) {
    ....
}
Hadoop ≠ Couchbase
Hadoop
• Deal with "Big Data"
• "More" is better than "Faster"
• Batch oriented
• Usually used to "extract/transform" data
• Fully distributed
  Map, Shuffle, Reduce
Couchbase
• Distributed
• Executed where the document is
• Deal with "indexing" data
• As fast as possible
• Use to query the data in the database
Map Reduce in Couchbase
• Like many other NoSQL databases: used for queries!
• Indexes are distributed on each node of the cluster
• Indexes are updated incrementally
• Write your Map Reduce in JavaScript
@tgrall
Get Couchbase Server at http://www.couchbase.com/download