
Posted on 17-Jun-2015


Navigating the NoSQL Landscape using Lego Mindstorms and Java

Michael Nitschinger Developer Advocate, Couchbase Inc.


{"about": "me"}

•  Developer Advocate at Couchbase, Inc.
•  Maintainer of the Couchbase Java SDK
•  Speaking at Conferences and Meetups
•  Living and Working here in Vienna, Austria

What we'll talk about

•  What are the limits of RDBMS solutions?
•  What are the different NoSQL taxonomies?
•  Which NoSQL solution is right for me?

Growth is the New Reality

•  Instagram gained nearly 1 million users overnight when they expanded to Android

Showcase: Draw Something


Does it work with an RDBMS backend?

Application scales out: just add more commodity web servers.

Database scales up: get a bigger, more complex server.

Note: relational database technology is great for what it is great for, but it is not great for this.

Some alternatives to scale out your RDBMS

Scale out your RDBMS
•  Run many SQL Servers
•  Data is sharded (on the app level!)
•  Memcached/cache for faster response time
•  Writes are still slow
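Sharding "on the app level" means the application itself decides which SQL server owns each key. A minimal sketch, assuming a plain hash-modulo scheme (the class `ModuloSharding`, the server names and the `user::` keys are hypothetical): it also shows why scaling this way by hand hurts, because adding one server changes the owner of most keys, each of which then has to be migrated.

```java
import java.util.List;

// Hypothetical application-level sharding: the application, not the database,
// decides which SQL server owns a key.
final class ModuloSharding {
    private final List<String> servers;

    ModuloSharding(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // Math.floorMod keeps the index non-negative even for negative hash codes.
    String serverFor(String key) {
        return servers.get(Math.floorMod(key.hashCode(), servers.size()));
    }

    // Counts how many of keyCount sample keys change server between two layouts.
    static int movedKeys(ModuloSharding before, ModuloSharding after, int keyCount) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "user::" + i;
            if (!before.serverFor(key).equals(after.serverFor(key))) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        ModuloSharding three = new ModuloSharding(List.of("sql1", "sql2", "sql3"));
        ModuloSharding four = new ModuloSharding(List.of("sql1", "sql2", "sql3", "sql4"));
        System.out.println("user::42 lives on " + three.serverFor("user::42"));
        // Adding one server reshuffles most keys, which then have to be migrated by hand.
        System.out.println(movedKeys(three, four, 10_000) + " of 10000 sample keys change server");
    }
}
```

The modulo step is the weak point: the owner of a key depends on the number of servers, so any topology change becomes a data migration project.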

Scale out with RDBMS

Is this a good approach to scale?
•  Lots of components to deploy
•  Scale by hand
   -  Caching
   -  Sharding/Replication

Learn from others: this scenario costs time and money. Scaling SQL is potentially disastrous when going viral: it is a very risky time for major code changes and migrations, and you have no time when skyrocketing up!

The Relational Model

•  Formulated and proposed by Edgar Codd in 1969
   -  http://en.wikipedia.org/wiki/Relational_model
•  Based on relational algebra
   -  which is based on set theory
•  Not all problems fit into set theory
   -  e.g. graph theory
   -  relationships
   -  recommendations

http://en.wikipedia.org/wiki/Honeywell_316

Lacking market solutions, users forced to invent

•  Bigtable: November 2006
•  Dynamo: October 2007
•  Cassandra: August 2008
•  Voldemort: February 2009

Very few organizations want to (fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology.

•  No schema required before inserting data
•  No schema change required to change data format
•  Auto-sharding without application participation
•  Distributed queries
•  Integrated main memory caching
•  Data synchronization (mobile, multi-datacenter)

Survey: Schema inflexibility #1 adoption driver

What is the biggest data management problem driving your use of NoSQL in the coming year?

•  Lack of flexibility/rigid schemas: 49%
•  Inability to scale out data: 35%
•  High latency/low performance: 29%
•  Costs: 16%
•  All of these: 12%
•  Other: 11%

Source: Couchbase NoSQL Survey, December 2011, n=1351

NoSQL database matches application logic tier architecture

Data layer now scales with linear cost and constant performance.

Application scales out: just add more commodity web servers.

Database scales out: just add more commodity data servers.

Scaling out flattens the cost and performance curves.

[Figure: NoSQL database servers]

NoSQL Taxonomy

The CAP Theorem

•  In a distributed system:
   -  Consistency
   -  Availability
   -  Partition Tolerance
•  When a partition happens, choose either
   -  Consistency (only respond to a subset)
   -  or Availability (accept stale data and conflicting writes): conflict resolution!

[Figure: CAP diagram with C, A and P]
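When a system chooses availability during a partition, both sides can accept a write to the same key, and conflict resolution first needs to tell "newer" from "concurrent". One common technique for that is a version vector; the sketch below is a minimal illustration of the idea (the class `VersionVector` and the replica names `A`/`B` are hypothetical), not the mechanism of any particular database in this talk.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal version vector: one write counter per replica.
final class VersionVector {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Integer> counters = new HashMap<>();

    // Records a write accepted by the given replica.
    VersionVector increment(String replica) {
        counters.merge(replica, 1, Integer::sum);
        return this;
    }

    VersionVector copy() {
        VersionVector v = new VersionVector();
        v.counters.putAll(counters);
        return v;
    }

    // CONCURRENT means neither history contains the other: a real conflict.
    Order compare(VersionVector other) {
        Set<String> replicas = new HashSet<>(counters.keySet());
        replicas.addAll(other.counters.keySet());
        boolean less = false;
        boolean greater = false;
        for (String r : replicas) {
            int a = counters.getOrDefault(r, 0);
            int b = other.counters.getOrDefault(r, 0);
            if (a < b) less = true;
            if (a > b) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        VersionVector base = new VersionVector().increment("A");
        // During a partition, each side accepts its own write to the same key.
        VersionVector sideA = base.copy().increment("A");
        VersionVector sideB = base.copy().increment("B");
        System.out.println(base.compare(sideA));  // BEFORE
        System.out.println(sideA.compare(sideB)); // CONCURRENT
    }
}
```

Only the `CONCURRENT` case needs application-level or last-write-wins resolution; `BEFORE`/`AFTER` can simply keep the newer value.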

Clarification

•  Big Data
   -  Large scale datastore (">= 100TB or Petabytes")
   -  Optimized for batch processing
   -  Data warehouse
•  Big Users
   -  Very high get/set rate (thousands of ops/s)
   -  Working set in RAM
   -  Latency and throughput matter most
   -  (Near) real-time use cases

The Key-Value Store / “Cache” – the foundation of NoSQL

[Figure: key → opaque binary value]
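The key-value model reduces the database API to a few operations on a key and an opaque value. A minimal in-memory sketch, assuming `byte[]` values the store never inspects (the `KeyValueStore` interface and `InMemoryKeyValueStore` class are hypothetical, not Couchbase SDK types):

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Minimal in-memory implementation: values are opaque bytes the store never parses.
final class InMemoryKeyValueStore implements KeyValueStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public void set(String key, byte[] value) {
        data.put(key, value.clone()); // defensive copy so callers cannot mutate stored bytes
    }

    @Override
    public Optional<byte[]> get(String key) {
        return Optional.ofNullable(data.get(key)).map(byte[]::clone);
    }

    @Override
    public boolean delete(String key) {
        return data.remove(key) != null;
    }

    public static void main(String[] args) {
        KeyValueStore store = new InMemoryKeyValueStore();
        // The value might be JSON, a serialized object or an image: the store does not care.
        store.set("session::ann", "{\"cart\":3}".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get("session::ann").orElseThrow(), StandardCharsets.UTF_8));
        System.out.println(store.delete("session::ann")); // true
    }
}

// Hypothetical key-value contract: just keys and opaque byte[] values.
interface KeyValueStore {
    void set(String key, byte[] value);
    Optional<byte[]> get(String key);
    boolean delete(String key);
}
```

Because the store cannot look inside values, any querying beyond "get by key" has to happen in the application; the later slides add structure to the value step by step.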

Memcached – the NoSQL precursor

[Figure: Memcached: key → opaque binary value]

•  In-memory only
•  Limited set of operations
   -  Blob storage: Set, Add, Replace, CAS
   -  Retrieval: Get
   -  Structured data: Append, Increment
•  “Simple and fast.”
•  Challenges:
   -  cold cache
   -  disruptive elasticity
   -  missing persistence
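The memcached operations above have precise semantics: `add` only succeeds if the key is absent, `replace` only if it is present, and CAS (check-and-set) only if the item has not changed since it was read with `gets`. The sketch models those semantics in-process with a hypothetical `MiniCache` class (Java 16+ for the record); it is not a memcached client and does not speak the memcached protocol.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-process model of memcached-style semantics (hypothetical, not a client).
final class MiniCache {
    // Each item carries a CAS value that changes on every mutation.
    record Item(String value, long cas) {}

    private final Map<String, Item> items = new HashMap<>();
    private long nextCas = 1;

    synchronized void set(String key, String value) {
        items.put(key, new Item(value, nextCas++));
    }

    synchronized boolean add(String key, String value) {
        if (items.containsKey(key)) return false; // add fails if the key exists
        set(key, value);
        return true;
    }

    synchronized boolean replace(String key, String value) {
        if (!items.containsKey(key)) return false; // replace fails if the key is missing
        set(key, value);
        return true;
    }

    // Like "gets": returns the value together with its CAS token.
    synchronized Optional<Item> gets(String key) {
        return Optional.ofNullable(items.get(key));
    }

    // Check-and-set: only write if nobody modified the item since it was read.
    synchronized boolean cas(String key, String value, long expectedCas) {
        Item current = items.get(key);
        if (current == null || current.cas() != expectedCas) return false;
        set(key, value);
        return true;
    }

    synchronized boolean append(String key, String suffix) {
        Item current = items.get(key);
        if (current == null) return false;
        set(key, current.value() + suffix);
        return true;
    }

    // incr interprets the stored value as a decimal number.
    synchronized Optional<Long> incr(String key, long delta) {
        Item current = items.get(key);
        if (current == null) return Optional.empty();
        long next = Long.parseLong(current.value()) + delta;
        set(key, Long.toString(next));
        return Optional.of(next);
    }

    public static void main(String[] args) {
        MiniCache cache = new MiniCache();
        System.out.println(cache.add("counter", "10"));        // true
        System.out.println(cache.add("counter", "99"));        // false: already present
        System.out.println(cache.incr("counter", 5).get());    // 15
        long token = cache.gets("counter").get().cas();
        cache.set("counter", "0");                             // a concurrent writer
        System.out.println(cache.cas("counter", "20", token)); // false: stale token
    }
}
```

CAS gives optimistic concurrency without locks on the server: a client that loses the race re-reads and retries.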

NoSQL catalog

•  Key-Value, cache (memory only): Memcached

Redis – More “Structured Data” commands

[Figure: Redis: key → “data structures”: blob, list, set, hash, …]

•  Disk persistence (eventual consistency on the disk)!
•  Vast set of operations
   -  Blob storage: Set, Add, Replace, CAS
   -  Retrieval: Get, Pub-Sub
   -  Structured data: Strings, Hashes, Lists, Sets, Sorted sets
•  Challenges:
   -  clustering (to come)
   -  RAM limit (no eviction)
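What sets Redis apart is that values are data structures with their own server-side operations. A sorted set, for example, keeps unique members ordered by score, which makes a leaderboard a single command. The sketch below models only that sorted-set behaviour in plain Java and does not talk to Redis; the class `SortedSetSketch` is hypothetical, and its `add`/`top` methods are only loosely modeled on Redis's `ZADD` and `ZREVRANGE` commands.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Plain-Java model of a Redis-style sorted set: unique members ordered by score.
final class SortedSetSketch {
    private final Map<String, Double> scores = new HashMap<>();
    // Ordered by score; equal scores fall back to member name, as Redis does.
    private final TreeSet<String> ordered = new TreeSet<>(
            Comparator.comparingDouble((String member) -> scores.get(member))
                      .thenComparing(Comparator.naturalOrder()));

    // Loosely like ZADD: inserts the member or updates its score.
    void add(String member, double score) {
        if (scores.containsKey(member)) {
            ordered.remove(member); // remove while the old score is still in place
        }
        scores.put(member, score);
        ordered.add(member);
    }

    // Loosely like ZREVRANGE key 0 n-1: the n highest-scoring members.
    List<String> top(int n) {
        List<String> result = new ArrayList<>();
        for (String member : ordered.descendingSet()) {
            if (result.size() == n) {
                break;
            }
            result.add(member);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSetSketch leaderboard = new SortedSetSketch();
        leaderboard.add("ann", 120);
        leaderboard.add("bob", 95);
        leaderboard.add("cid", 140);
        leaderboard.add("bob", 150); // a score update re-sorts the member
        System.out.println(leaderboard.top(2)); // [bob, cid]
    }
}
```

In Redis the ordering is maintained on the server, so clients never download the whole set to sort it; that is the practical payoff of "structured data" commands.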

NoSQL catalog

•  Key-Value, cache (memory only): Memcached
•  Data Structure, database (memory/disk): Redis

Membase – From key-value cache to database

•  Disk-based with built-in memcached cache
•  Cache refill on restart
•  Memcached compatible (drop-in replacement)
•  Highly available (data replication)
•  Add or remove capacity to live cluster
•  “Simple, fast, elastic.”

[Figure: Membase: key → opaque binary value]

NoSQL catalog

•  Key-Value, cache (memory only): Memcached
•  Key-Value, database (memory/disk): Membase
•  Data Structure, database (memory/disk): Redis

Couchbase – Document-oriented database

[Figure: Couchbase: key → JSON & opaque object (“document”), e.g. { “string”: “string”, “string”: value, “string”: { “string”: “string”, “string”: value }, “string”: [ array ] }]

•  Auto-sharding
•  Disk-based with built-in memcached cache
•  Cache refill on restart
•  Memcached compatible (drop-in replacement)
•  Highly available (data replication)
•  Add or remove capacity to live cluster
•  When values are JSON objects (“documents”): create indices, views and query against the views
•  Chooses consistency over availability
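A Couchbase view is built from a map function that runs over every JSON document and emits index rows, which can then be queried by key or key range. In Couchbase Server the map function is JavaScript and the server maintains the index; the sketch below only imitates the idea in plain Java, with documents as nested `Map`s and a hypothetical `ViewSketch` class, to show what "query against the views" means.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Plain-Java imitation of a view: a map function emits keys into a sorted index.
final class ViewSketch {
    // Emitted key -> ids of the documents that emitted it, sorted for range queries.
    private final TreeMap<String, List<String>> index = new TreeMap<>();

    ViewSketch(Map<String, Map<String, Object>> docs,
               BiConsumer<Map<String, Object>, Consumer<String>> mapFunction) {
        docs.forEach((id, doc) ->
                mapFunction.accept(doc, key ->
                        index.computeIfAbsent(key, k -> new ArrayList<>()).add(id)));
    }

    // Range query over emitted keys: start inclusive, end exclusive.
    List<String> query(String startKey, String endKey) {
        List<String> ids = new ArrayList<>();
        index.subMap(startKey, true, endKey, false).values().forEach(ids::addAll);
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> docs = new LinkedHashMap<>();
        docs.put("user::1", Map.of("type", "user", "name", "Ann", "city", "Vienna"));
        docs.put("user::2", Map.of("type", "user", "name", "Bob", "city", "Berlin"));
        docs.put("order::1", Map.of("type", "order", "total", 234.0));

        // Similar in spirit to the JavaScript map function:
        // function (doc, meta) { if (doc.type == "user") { emit(doc.city, null); } }
        ViewSketch byCity = new ViewSketch(docs, (doc, emit) -> {
            if ("user".equals(doc.get("type"))) {
                emit.accept((String) doc.get("city"));
            }
        });
        System.out.println(byCity.query("A", "C")); // [user::2]
    }
}
```

The design point is that the index is derived from the documents, so adding a new query means defining a new map function rather than migrating a schema.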

NoSQL catalog

•  Key-Value, cache (memory only): Memcached
•  Key-Value, database (memory/disk): Membase
•  Data Structure, database (memory/disk): Redis
•  Document, database (memory/disk): Couchbase

MongoDB – Document-oriented database

[Figure: MongoDB: key → BSON object (“document”) of the same nested shape]

•  Disk-based with in-memory “caching”
•  BSON (“binary JSON”) format and wire protocol
•  Master-slave replication
•  Auto-sharding
•  Values are BSON objects
•  Supports ad hoc queries, best when indexed (more similar to RDBMS modeling than caches)
•  Scaling over sharding requires special nodes

NoSQL catalog

•  Key-Value, cache (memory only): Memcached
•  Key-Value, database (memory/disk): Membase
•  Data Structure, database (memory/disk): Redis
•  Document, database (memory/disk): Couchbase, MongoDB

Cassandra – Column overlays

•  Disk-based system
•  Clustered
•  External caching required for low-latency reads
•  “Columns” are overlaid on the data
•  Not all rows must have all columns
•  Supports efficient queries on columns
•  Restart required when adding columns
•  Multi-data-center replication supported
•  Column model may be complex to start with
•  Chooses availability over consistency

[Figure: Cassandra: key → opaque binary value, overlaid with Column 1, Column 2 and Column 3 (not present)]
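"Not all rows must have all columns" is easiest to see if a row is modeled as a sorted map from column name to value: a missing column simply has no entry, and sorted column names make slices cheap. This is a conceptual sketch of the classic column-family layout (row key → sorted columns) with a hypothetical `ColumnFamilySketch` class; it is not Cassandra's storage engine or its CQL API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

// Conceptual column family: row key -> (column name -> value), columns sorted per row.
final class ColumnFamilySketch {
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // A column that was never written is just absent; no NULL placeholder is stored.
    Optional<String> get(String rowKey, String column) {
        return Optional.ofNullable(rows.getOrDefault(rowKey, new TreeMap<>()).get(column));
    }

    // Columns in [from, to) for one row: cheap because columns are kept sorted.
    SortedMap<String, String> slice(String rowKey, String from, String to) {
        return rows.getOrDefault(rowKey, new TreeMap<>()).subMap(from, to);
    }

    public static void main(String[] args) {
        ColumnFamilySketch users = new ColumnFamilySketch();
        users.put("ann", "column1", "ann@example.com");
        users.put("ann", "column2", "Vienna");
        users.put("bob", "column1", "bob@example.com"); // bob has no column2 or column3
        System.out.println(users.get("bob", "column2").isPresent());  // false
        System.out.println(users.slice("ann", "column1", "column3")); // {column1=ann@example.com, column2=Vienna}
    }
}
```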

NoSQL catalog

•  Key-Value, cache (memory only): Memcached
•  Key-Value, database (memory/disk): Membase
•  Data Structure, database (memory/disk): Redis
•  Document, database (memory/disk): Couchbase, MongoDB
•  Column, database (memory/disk): Cassandra

Neo4j – Graph database

•  Disk-based system
•  External caching required for low-latency reads
•  Nodes, relationships and paths
•  Properties on nodes
•  Delete, Insert, Traverse, etc.

[Figure: Neo4j: five nodes, each shown as key → opaque binary value]
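Recommendations based on a social graph are the kind of problem the relational-model slide says does not fit set theory well: they are traversals. A minimal friends-of-friends sketch over an in-memory adjacency list (hypothetical `SocialGraph` class, plain Java, not the Neo4j API): it walks two hops from a person and ranks candidates by the number of mutual friends.

```java
import java.util.*;
import java.util.stream.Collectors;

final class SocialGraph {
    private final Map<String, Set<String>> friends = new HashMap<>();

    // Friendship is an undirected relationship between two nodes.
    void connect(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Two-hop traversal: people my friends know whom I don't know yet,
    // ranked by number of mutual friends (ties broken by name).
    List<String> recommend(String person) {
        Set<String> direct = friends.getOrDefault(person, Set.of());
        Map<String, Integer> mutual = new HashMap<>();
        for (String friend : direct) {
            for (String candidate : friends.getOrDefault(friend, Set.of())) {
                if (!candidate.equals(person) && !direct.contains(candidate)) {
                    mutual.merge(candidate, 1, Integer::sum);
                }
            }
        }
        return mutual.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SocialGraph g = new SocialGraph();
        g.connect("ann", "bob");
        g.connect("ann", "cid");
        g.connect("bob", "dan");
        g.connect("cid", "dan");
        g.connect("cid", "eve");
        System.out.println(g.recommend("ann")); // [dan, eve]
    }
}
```

In SQL the same query is a chain of self-joins on a friendship table; a graph database stores the adjacency directly, so each hop is a pointer walk rather than a join.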

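NoSQL catalog

•  Key-Value, cache (memory only): Memcached
•  Key-Value, database (memory/disk): Membase
•  Data Structure, database (memory/disk): Redis
•  Document, database (memory/disk): Couchbase, MongoDB
•  Column, database (memory/disk): Cassandra
•  Graph, database (memory/disk): Neo4j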

NoSQL catalog

•  Key-Value, cache (memory only): Memcached, Coherence
•  Key-Value, database (memory/disk): Membase, Riak
•  Data Structure, database (memory/disk): Redis
•  Document, database (memory/disk): Couchbase, MongoDB
•  Column, database (memory/disk): Cassandra, HBase
•  Graph, database (memory/disk): Neo4j, InfiniteGraph

What about Hadoop?

Hadoop: Big Data Swiss Army Knife

•  Oozie: workflow, coordination
•  Sqoop: data connector to import/export data
•  Hive: SQL-like interface
•  Pig: high level programming language
•  Mahout: machine learning library
•  Whirr: Hadoop management tools for cloud services
•  Flume: aggregator
•  MapReduce: framework to process large volumes of data
•  HBase: key-value data store
•  ZooKeeper: centralized configuration management
•  HDFS: distributed file system
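MapReduce splits a job into a map phase, which turns each input record into key/value pairs, and a reduce phase, which aggregates all values per key. The sketch below simulates both phases on one machine with Java streams, counting page views from click-stream lines; the `user,page` record format and the `MapReduceSketch` class are assumptions for illustration. Hadoop's contribution is running the same two functions in parallel over data in HDFS, which this sketch does not attempt.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class MapReduceSketch {
    // Map phase: one click-stream line "user,page" -> (page, 1).
    static Stream<Map.Entry<String, Integer>> map(String line) {
        String[] fields = line.split(",");
        return Stream.of(Map.entry(fields[1].trim(), 1));
    }

    // Shuffle + reduce phase: group the pairs by key and sum their values.
    static Map<String, Integer> run(List<String> lines) {
        return lines.stream()
                .flatMap(MapReduceSketch::map)
                .collect(Collectors.groupingBy(Map.Entry::getKey, TreeMap::new,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> clicks = List.of("ann,/home", "bob,/cart", "ann,/cart", "cid,/home", "bob,/home");
        System.out.println(run(clicks)); // {/cart=2, /home=3}
    }
}
```

This is batch work over the whole data set ("Big Data"); it complements rather than replaces the low-latency store that serves individual requests ("Big Users").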

So what? Connecting Hadoop

[Figure: numbered flow (steps 1 to 3) labelled “click stream events”, “profiles, campaigns” and “profiles, real time campaign statistics”]

40 milliseconds to respond with the decision.

Which one is right for me?


Lack of Flexibility / Rigid Schema

•  Aggregate Data Models (Martin Fowler)
   -  Flexible data structure
   -  Optimized access
   -  Easy to distribute data

o::1001
{
  uid: ji22jd,
  customer: Ann,
  line_items: [
    { sku: 0321293533, quan: 3, unit_price: 48.0 },
    { sku: 0321601912, quan: 1, unit_price: 39.0 },
    { sku: 0131495054, quan: 1, unit_price: 51.0 }
  ],
  payment: { type: Amex, expiry: 04/2001, last5: 12345 }
}

http://martinfowler.com/bliki/AggregateOrientedDatabase.html
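The order above is an aggregate in Fowler's sense: line items and payment live inside the order, so one key lookup returns everything needed to, for example, compute the order total, and the whole unit can live on one server. A minimal sketch of the same aggregate as Java records (Java 16+; `OrderAggregate` and its nested types are hypothetical names), with field values copied from the example. SKUs are kept as strings so leading zeros survive.

```java
import java.util.List;

final class OrderAggregate {
    record LineItem(String sku, int quan, double unitPrice) {}
    record Payment(String type, String expiry, String last5) {}
    record Order(String id, String uid, String customer, List<LineItem> lineItems, Payment payment) {
        // Everything needed lives inside the aggregate: no joins required.
        double total() {
            return lineItems.stream().mapToDouble(i -> i.quan() * i.unitPrice()).sum();
        }
    }

    static Order example() {
        return new Order("o::1001", "ji22jd", "Ann",
                List.of(new LineItem("0321293533", 3, 48.0),
                        new LineItem("0321601912", 1, 39.0),
                        new LineItem("0131495054", 1, 51.0)),
                new Payment("Amex", "04/2001", "12345"));
    }

    public static void main(String[] args) {
        System.out.println(example().total()); // 234.0
    }
}
```

The trade-off is that queries cutting across aggregates (say, "all orders containing SKU 0321601912") need an index or a view, because the data is no longer normalized into its own table.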

Use Cases

•  Key-Value: session management, user profile/preferences, shopping cart
•  Document: event logging, content management, web analytics, e-commerce application
•  Columns: event logging, content management, counters
•  Graph: connected data / social networks, routing, dispatch, recommendations based on social graph

Production Environment

[Figure: US data center, EMEA DC, APAC DC]

How do I want to scale out?

•  Modifying cluster topology should be simple
   -  Add, remove, configure nodes on a running system
•  What is the impact of topology changes?
   -  Sharding, caching of the data
   -  Availability of the service during cluster changes
•  More hardware = more failures
   -  Availability, reliability of the system: failover support

Add Nodes to Cluster

•  Two servers added: one-click operation
•  Docs automatically rebalanced across cluster
   -  Even distribution of docs
   -  Minimum doc movement
•  Cluster map updated
•  App database calls now distributed over larger number of servers

[Figure: Couchbase Server cluster with servers 1 to 5, each holding active and replica docs; app servers 1 and 2 each run the Couchbase client library with a cluster map and send read/write/update requests. User configured replica count = 1.]
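The client library in the diagram can send each request straight to the right server because it holds a cluster map. Couchbase does this in two steps: each key hashes to one of a fixed number of partitions (vBuckets), and the cluster map assigns partitions to servers, so adding servers changes only the map. The sketch below illustrates that idea under stated assumptions: 1024 partitions, CRC32 of the key, round-robin initial placement, and a simple "move only the excess" rebalance. Couchbase's exact hash bits and rebalance algorithm are not reproduced here, and `ClusterMapSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

final class ClusterMapSketch {
    static final int PARTITIONS = 1024; // assumption: fixed partition count

    // partition -> server name: this array is the "cluster map" clients cache.
    final String[] owner = new String[PARTITIONS];

    ClusterMapSketch(List<String> servers) {
        for (int p = 0; p < PARTITIONS; p++) {
            owner[p] = servers.get(p % servers.size());
        }
    }

    // Step 1: key -> partition. Independent of how many servers exist.
    static int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % PARTITIONS);
    }

    // Step 2: partition -> server, via the current map.
    String serverFor(String key) {
        return owner[partitionFor(key)];
    }

    // Moves just enough partitions to even out the load; returns how many moved.
    int rebalance(List<String> servers) {
        int base = PARTITIONS / servers.size();
        int extra = PARTITIONS % servers.size();
        Map<String, Integer> target = new HashMap<>();
        for (int i = 0; i < servers.size(); i++) {
            target.put(servers.get(i), base + (i < extra ? 1 : 0));
        }
        Map<String, Integer> count = new HashMap<>();
        for (String s : owner) {
            count.merge(s, 1, Integer::sum);
        }
        // Partitions on servers above their target (or on removed servers) become movable.
        List<Integer> movable = new ArrayList<>();
        for (int p = 0; p < PARTITIONS; p++) {
            String s = owner[p];
            if (count.get(s) > target.getOrDefault(s, 0)) {
                count.merge(s, -1, Integer::sum);
                movable.add(p);
            }
        }
        // Hand movable partitions to servers below their target.
        int moved = 0;
        for (String s : servers) {
            while (count.getOrDefault(s, 0) < target.get(s) && moved < movable.size()) {
                owner[movable.get(moved++)] = s;
                count.merge(s, 1, Integer::sum);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        ClusterMapSketch map = new ClusterMapSketch(List.of("server1", "server2", "server3"));
        System.out.println("doc::5 -> " + map.serverFor("doc::5"));
        int moved = map.rebalance(List.of("server1", "server2", "server3", "server4", "server5"));
        System.out.println(moved + " of 1024 partitions moved"); // 409 of 1024 partitions moved
    }
}
```

Compared with hash-modulo sharding, only the partitions handed to the two new servers move (about two fifths of the data), which is the "minimum doc movement" from the slide.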

Fail Over Node

•  App servers accessing docs
•  Requests to Server 3 fail
•  Cluster detects server failed
   -  Promotes replicas of docs to active
   -  Updates cluster map
•  Requests for docs now go to appropriate server
•  Typically rebalance would follow

[Figure: the same five-server Couchbase Server cluster and app servers 1 and 2 with client library and cluster map; user configured replica count = 1]
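Failover as described above changes only the cluster map: every partition whose active copy lived on the failed server switches to the server holding its replica, and clients that pick up the new map route there. A minimal sketch with replica count 1, matching the diagram; `FailoverSketch` and its placement scheme (replica on the "next" server) are hypothetical, and there is no failure detection or data copying here.

```java
import java.util.ArrayList;
import java.util.List;

final class FailoverSketch {
    // One entry per partition: where the active copy and its single replica live.
    static final class Placement {
        String active;
        String replica;

        Placement(String active, String replica) {
            this.active = active;
            this.replica = replica;
        }
    }

    final List<Placement> map = new ArrayList<>();

    FailoverSketch(List<String> servers, int partitions) {
        int n = servers.size();
        for (int p = 0; p < partitions; p++) {
            // Replica goes to the next server, so it never shares a node with its active copy.
            map.add(new Placement(servers.get(p % n), servers.get((p + 1) % n)));
        }
    }

    // Promotes replicas of partitions that were active on the failed server.
    // Returns the number of promoted partitions.
    int failOver(String failed) {
        int promoted = 0;
        for (Placement pl : map) {
            if (pl.active.equals(failed) && pl.replica != null) {
                pl.active = pl.replica; // replica becomes active
                pl.replica = null;      // no replica until a rebalance recreates one
                promoted++;
            } else if (failed.equals(pl.replica)) {
                pl.replica = null;      // lost a replica; the active copy is unaffected
            }
        }
        return promoted;
    }

    long partitionsActiveOn(String server) {
        return map.stream().filter(pl -> pl.active.equals(server)).count();
    }

    public static void main(String[] args) {
        FailoverSketch cluster = new FailoverSketch(
                List.of("server1", "server2", "server3", "server4", "server5"), 1024);
        System.out.println(cluster.failOver("server3"));              // 205
        System.out.println(cluster.partitionsActiveOn("server3"));    // 0
        System.out.println(cluster.partitionsActiveOn("server4"));    // 410
    }
}
```

The sketch also shows why a rebalance typically follows: after failover one server carries twice its share and the promoted partitions have no replica left.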

Performance

•  What is my working set?
   -  Different patterns based on the application
   -  Social games vs. analytics
•  What do I need to cache / how often?
   -  Put your data in RAM
   -  Read/write rates
•  How to design my data model?
   -  Trim towards your “hot code path”
   -  Aggregate model
   -  Easy to change
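"Put your data in RAM" in practice means keeping the working set in memory and letting colder data fall back to disk. A least-recently-used (LRU) cache is a common way to model that; the sketch below uses the standard `LinkedHashMap` access-order constructor and `removeEldestEntry` hook, with a hypothetical class name and a tiny capacity, to show which keys stay resident. Real NoSQL systems use their own eviction policies; this only illustrates the concept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache: once capacity is exceeded, the least recently accessed entry is evicted.
final class WorkingSetCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    WorkingSetCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        WorkingSetCache<String, String> ram = new WorkingSetCache<>(2);
        ram.put("player::1", "hot");
        ram.put("player::2", "warm");
        ram.get("player::1");             // touch player::1 so it stays hot
        ram.put("player::3", "new");      // evicts player::2, the least recently used
        System.out.println(ram.keySet()); // [player::1, player::3]
    }
}
```

If the working set is larger than the capacity, every request evicts something another request needs soon, which is why sizing RAM to the working set matters more than sizing it to the total data.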

Management and Monitoring

•  Do not forget about operations!
   -  Service Reliability Engineering Team will thank you!
•  Manage your cluster easily:
   -  Command line, administration console to change cluster topology
•  Monitor “your NoSQL”
   -  Analyze the overall status of your cluster
   -  View and fix bottlenecks

Conclusion

•  One size does not fit all
•  Overview of the NoSQL types
•  Choose the right solution for your application
•  Don't mix Big Data with Big Users!

Q&A

Thank you! michael.nitschinger@couchbase.com

@daschl

Get Couchbase Server at http://www.couchbase.com/download
