webinar: dramatically reducing development time with mongodb

Dramatically Reducing Development Time With MongoDB
Buzz Moschetti, Solutions Architect, MongoDB
#MongoDB


Post on 11-Nov-2014


DESCRIPTION

Modern day application development demands persistence of complex and dynamic shapes of data to match the highly flexible and powerful languages used in today's software landscape. Traditional approaches to solutions development with RDBMS increasingly expose the gap between the ease of use of modern development languages and the relational data model. Development time is wasted as the bulk of the work shifts from adding business features to struggling with the RDBMS. MongoDB, the leading NoSQL database, offers a flexible and scalable solution.

In this webinar, we will provide a medium-to-deep exploration of the MongoDB programming model and APIs and how they transform the way developers interact with a database, leading to:

• Faster time to market for both initial deployment and subsequent change
• Lower development costs
• More choices in coupling features of a language to the database

We will also review the advantages of MongoDB technology in the rapid applications development (RAD) space for popular scripting languages such as JavaScript, Python, Perl, and Ruby.

TRANSCRIPT

Page 1: Webinar: Dramatically Reducing Development Time With MongoDB

Dramatically Reducing Development Time With MongoDB

Solutions Architect, MongoDB

Buzz Moschetti

#MongoDB

Page 2: Webinar: Dramatically Reducing Development Time With MongoDB

Who is your Presenter?

• Yes, I use “Buzz” on my business cards
• Former Investment Bank Chief Architect at JPMorganChase and Bear Stearns before that
• Over 25 years of designing and building systems
    • Big and small
    • Super-specialized to broadly useful in any vertical
    • “Traditional” to completely disruptive
• Advocate of language leverage and strong factoring
• Still programming – using emacs, of course

Page 3: Webinar: Dramatically Reducing Development Time With MongoDB

What Are Your Developers Doing All Day?

Adding and testing business features

OR

“Integrating with other components, tools, and systems”

• Database(s)
• ETL and other data transfer operations
• Messaging
• Services (web & other)
• Other open source frameworks

Page 4: Webinar: Dramatically Reducing Development Time With MongoDB

Why Can’t We Just Save and Fetch Data?

Because the way we think about data at the business use case level…

…is different than the way it is implemented at the application/code level…

…which traditionally is VERY different than the way it is implemented at the database level

Page 5: Webinar: Dramatically Reducing Development Time With MongoDB

This Problem Isn’t New…

…but for the past 40 years, innovation at the business & application layers has outpaced innovation at the database layer

Business Data Goals
    1974: Capture my company’s transactions daily at 5:30PM EST, add them up on a nightly basis, and print a big stack of paper
    2014: Capture my company’s global transactions in realtime plus everything that is happening in the world (customers, competitors, business/regulatory, weather), producing any number of computed results, and passing this all in realtime to predictive analytics with model feedback; results in realtime to 10000s of mobile devices, multiple GUIs, and b2b and b2c channels

Release Schedule
    1974: Quarterly
    2014: Yesterday

Application/Code
    1974: COBOL, Fortran, Algol, PL/1, assembler, proprietary tools
    2014: COBOL, Fortran, C, C++, VB, C#, Java, javascript, groovy, ruby, perl, python, Obj-C, SmallTalk, Clojure, ActionScript, Flex, DSLs, spring, AOP, CORBA, ORM, third party software ecosystem, open source movement

Database
    1974: I/VSAM, early RDBMS
    2014: Mature RDBMS, legacy I/VSAM, column & key/value stores, and…mongoDB

Page 6: Webinar: Dramatically Reducing Development Time With MongoDB

Exactly How Does mongoDB Change Things?

• mongoDB is designed from the ground up to address rich structure (maps of maps of lists of…), not rectangles
• Standard RDBMS interfaces (e.g. JDBC) do not exploit features of contemporary languages
• Rapid Application Development (RAD) and scripting in Javascript, Python, Perl, Ruby, and Scala are impedance-matched to mongoDB
• In mongoDB, the data is the schema
• Shapes of data go in the same way they come out

Page 7: Webinar: Dramatically Reducing Development Time With MongoDB

Rectangles are 1974. Maps and Lists are 2014

{
    customer_id : 1,
    first_name : "Mark",
    last_name : "Smith",
    city : "San Francisco",
    phones: [
        { type : "work", number: "1-800-555-1212" },
        { type : "home", number: "1-800-555-1313", DNC: true },
        { type : "home", number: "1-800-555-1414", DNC: true }
    ]
}
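A minimal sketch of what "maps and lists" means in code, using a plain Python dict shaped like the customer record above. No database is involved, and the helper name `dnc_numbers` is made up for illustration; the point is only that a nested list of phones is walked directly, with optional fields such as DNC simply absent on some entries.

```python
# The customer record above as a plain Python structure (no database involved).
customer = {
    "customer_id": 1,
    "first_name": "Mark",
    "last_name": "Smith",
    "city": "San Francisco",
    "phones": [
        {"type": "work", "number": "1-800-555-1212"},
        {"type": "home", "number": "1-800-555-1313", "DNC": True},
        {"type": "home", "number": "1-800-555-1414", "DNC": True},
    ],
}


def dnc_numbers(record):
    """Return the numbers flagged DNC; a phone without the DNC key counts as not flagged."""
    return [p["number"] for p in record.get("phones", []) if p.get("DNC", False)]


print(dnc_numbers(customer))
```

The work phone has no DNC field at all, and nothing in the code or the data has to declare that in advance.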

Page 8: Webinar: Dramatically Reducing Development Time With MongoDB

An Actual Code Example (Finally!)

Let’s compare and contrast RDBMS/SQL to mongoDB development using Java over the course of a few weeks.

Some ground rules:

1. Observe rules of Software Engineering 101: Assume separation of application, Data Access Layer, and persistor implementation
2. Data Access Layer must be able to
    a. Expose simple, functional, data-only interfaces to the application
        • No ORM, frameworks, compile-time bindings, special tools
    b. Exploit high performance features of persistor
3. Focus on core data handling code and avoid distractions that require the same amount of work in both technologies
    a. No exception or error handling
    b. Leave out DB connection and other setup resources
4. Day counts are a proxy for progress, not actual time to complete indicated task

Page 9: Webinar: Dramatically Reducing Development Time With MongoDB

The Task: Saving and Fetching Contact data

Start with this simple, flat shape in the Data Access Layer:

    Map m = new HashMap();
    m.put("name", "buzz");
    m.put("id", "K1");

And assume we save it in this way:

    save(Map m)

And assume we fetch one by primary key in this way:

    Map m = fetch(String id)

Brace yourself…..

Page 10: Webinar: Dramatically Reducing Development Time With MongoDB

Day 1: Initial efforts for both technologies

SQL

DDL: create table contact ( … )

    void init()
    {
        contactInsertStmt = connection.prepareStatement(
            "insert into contact ( id, name ) values ( ?,? )");
        fetchStmt = connection.prepareStatement(
            "select id, name from contact where id = ?");
    }

    void save(Map m)
    {
        contactInsertStmt.setString(1, (String) m.get("id"));
        contactInsertStmt.setString(2, (String) m.get("name"));
        contactInsertStmt.execute();
    }

    Map fetch(String id)
    {
        Map m = null;
        fetchStmt.setString(1, id);
        ResultSet rs = fetchStmt.executeQuery();  // executeQuery() returns the ResultSet
        if (rs.next()) {
            m = new HashMap();
            m.put("id", rs.getString(1));
            m.put("name", rs.getString(2));
        }
        return m;
    }

mongoDB

DDL: none

    void save(Map m)
    {
        collection.insert(new BasicDBObject(m));  // wrap the Map as a DBObject
    }

    Map fetch(String id)
    {
        Map m = null;
        DBObject dbo = new BasicDBObject();
        dbo.put("id", id);
        DBCursor c = collection.find(dbo);
        if (c.hasNext()) {
            m = (Map) c.next();
        }
        return m;
    }

Let’s assume for argument’s sake that both approaches take the same amount of time

Page 11: Webinar: Dramatically Reducing Development Time With MongoDB

Day 2: Add simple fields

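    m.put("name", "buzz");
    m.put("id", "K1");
    m.put("title", "Mr.");
    // November 1, 2011; the deprecated Date(int, int, int) constructor offsets the year by 1900
    m.put("hireDate", new GregorianCalendar(2011, Calendar.NOVEMBER, 1).getTime());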

• Capturing title and hireDate is part of adding a new business feature

• It was pretty easy to add two fields to the structure

• …but now we have to change our persistence code

Brace yourself (again) …..

Page 12: Webinar: Dramatically Reducing Development Time With MongoDB

SQL Day 2 (changed lines marked)

DDL: alter table contact add title varchar(8);
     alter table contact add hireDate date;

    void init()
    {
        contactInsertStmt = connection.prepareStatement(
            "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");  // changed
        fetchStmt = connection.prepareStatement(
            "select id, name, title, hiredate from contact where id = ?");  // changed
    }

    void save(Map m)
    {
        contactInsertStmt.setString(1, (String) m.get("id"));
        contactInsertStmt.setString(2, (String) m.get("name"));
        contactInsertStmt.setString(3, (String) m.get("title"));  // changed
        // changed: JDBC wants java.sql.Date, not java.util.Date
        contactInsertStmt.setDate(4, new java.sql.Date(((java.util.Date) m.get("hireDate")).getTime()));
        contactInsertStmt.execute();
    }

    Map fetch(String id)
    {
        Map m = null;
        fetchStmt.setString(1, id);
        ResultSet rs = fetchStmt.executeQuery();
        if (rs.next()) {
            m = new HashMap();
            m.put("id", rs.getString(1));
            m.put("name", rs.getString(2));
            m.put("title", rs.getString(3));    // changed
            m.put("hireDate", rs.getDate(4));   // changed
        }
        return m;
    }

Consequences:

1. Code release schedule linked to database upgrade (new code cannot run on old schema)
2. Issues with case sensitivity starting to creep in (many RDBMS are case insensitive for column names, but code is case sensitive)
3. Changes require careful mods in 4 places
4. Beginning of technical debt

Page 13: Webinar: Dramatically Reducing Development Time With MongoDB

mongoDB Day 2

    void save(Map m)
    {
        collection.insert(new BasicDBObject(m));  // wrap the Map as a DBObject
    }

    Map fetch(String id)
    {
        Map m = null;
        DBObject dbo = new BasicDBObject();
        dbo.put("id", id);
        DBCursor c = collection.find(dbo);
        if (c.hasNext()) {
            m = (Map) c.next();
        }
        return m;
    }

✔ NO CHANGE

Advantages:

1. Zero time and money spent on overhead code
2. Code and database not physically linked
3. New material with more fields can be added into existing collections; backfill is optional
4. Names of fields in database precisely match key names in code layer and directly match on name, not indirectly via positional offset
5. No technical debt is created

Page 14: Webinar: Dramatically Reducing Development Time With MongoDB

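Day 3: Add list of phone numbers

    m.put("name", "buzz");
    m.put("id", "K1");
    m.put("title", "Mr.");
    // November 1, 2011; the deprecated Date(int, int, int) constructor offsets the year by 1900
    m.put("hireDate", new GregorianCalendar(2011, Calendar.NOVEMBER, 1).getTime());

    List list = new ArrayList();
    Map n1 = new HashMap();
    n1.put("type", "work");
    n1.put("number", "1-800-555-1212");
    list.add(n1);
    Map n2 = new HashMap();
    n2.put("type", "home");
    n2.put("number", "1-866-444-3131");
    list.add(n2);
    m.put("phones", list);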

• It was still pretty easy to add this data to the structure

• .. but meanwhile, in the persistence code …

REALLY brace yourself…

Page 15: Webinar: Dramatically Reducing Development Time With MongoDB

SQL Day 3 changes: Option 1: Assume just 1 work and 1 home phone number

DDL: alter table contact add work_phone varchar(16);
     alter table contact add home_phone varchar(16);

    void init()
    {
        contactInsertStmt = connection.prepareStatement(
            "insert into contact ( id, name, title, hiredate, work_phone, home_phone ) values ( ?,?,?,?,?,? )");
        fetchStmt = connection.prepareStatement(
            "select id, name, title, hiredate, work_phone, home_phone from contact where id = ?");
    }

    void save(Map m)
    {
        contactInsertStmt.setString(1, (String) m.get("id"));
        contactInsertStmt.setString(2, (String) m.get("name"));
        contactInsertStmt.setString(3, (String) m.get("title"));
        contactInsertStmt.setDate(4, new java.sql.Date(((java.util.Date) m.get("hireDate")).getTime()));
        for (Map onePhone : (List<Map>) m.get("phones")) {
            String t = (String) onePhone.get("type");
            String n = (String) onePhone.get("number");
            if (t.equals("work")) {
                contactInsertStmt.setString(5, n);
            } else if (t.equals("home")) {
                contactInsertStmt.setString(6, n);
            }
        }
        contactInsertStmt.execute();
    }

    Map fetch(String id)
    {
        Map m = null;
        fetchStmt.setString(1, id);
        ResultSet rs = fetchStmt.executeQuery();
        if (rs.next()) {
            m = new HashMap();
            m.put("id", rs.getString(1));
            m.put("name", rs.getString(2));
            m.put("title", rs.getString(3));
            m.put("hireDate", rs.getDate(4));

            // rebuild the phone list from the two fixed columns
            List list = new ArrayList();
            Map onePhone;
            onePhone = new HashMap();
            onePhone.put("type", "work");
            onePhone.put("number", rs.getString(5));
            list.add(onePhone);
            onePhone = new HashMap();
            onePhone.put("type", "home");
            onePhone.put("number", rs.getString(6));
            list.add(onePhone);

            m.put("phones", list);
        }
        return m;
    }

This is just plain bad….

Page 16: Webinar: Dramatically Reducing Development Time With MongoDB

SQL Day 3 changes: Option 2: Proper approach with multiple phone numbers

DDL: create table phones ( … )

    void init()
    {
        contactInsertStmt = connection.prepareStatement(
            "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
        c2stmt = connection.prepareStatement(
            "insert into phones (id, type, number) values (?, ?, ?)");
        fetchStmt = connection.prepareStatement(
            "select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?");
    }

    void save(Map m)
    {
        startTrans();
        contactInsertStmt.setString(1, (String) m.get("id"));
        contactInsertStmt.setString(2, (String) m.get("name"));
        contactInsertStmt.setString(3, (String) m.get("title"));
        contactInsertStmt.setDate(4, new java.sql.Date(((java.util.Date) m.get("hireDate")).getTime()));

        for (Map onePhone : (List<Map>) m.get("phones")) {
            c2stmt.setString(1, (String) m.get("id"));
            c2stmt.setString(2, (String) onePhone.get("type"));
            c2stmt.setString(3, (String) onePhone.get("number"));
            c2stmt.execute();
        }
        contactInsertStmt.execute();
        endTrans();
    }

    Map fetch(String id)
    {
        Map m = null;
        fetchStmt.setString(1, id);
        ResultSet rs = fetchStmt.executeQuery();
        int i = 0;
        List list = new ArrayList();
        while (rs.next()) {
            if (i == 0) {
                // contact columns repeat on every joined row; read them once
                m = new HashMap();
                m.put("id", rs.getString(1));
                m.put("name", rs.getString(2));
                m.put("title", rs.getString(3));
                m.put("hireDate", rs.getDate(4));
                m.put("phones", list);
            }
            Map onePhone = new HashMap();
            onePhone.put("type", rs.getString(5));
            onePhone.put("number", rs.getString(6));
            list.add(onePhone);
            i++;
        }
        return m;
    }

This took time and money

Page 17: Webinar: Dramatically Reducing Development Time With MongoDB

SQL Day 5: Zombies! (zero or more between entities)

Whoops! And it’s also wrong! We did not design the query accounting for contacts that have no phone number. Thus, we have to change the join to an outer join.

    void init()
    {
        contactInsertStmt = connection.prepareStatement(
            "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
        c2stmt = connection.prepareStatement(
            "insert into phones (id, type, number) values (?, ?, ?)");
        fetchStmt = connection.prepareStatement(
            "select A.id, A.name, A.title, A.hiredate, B.type, B.number from contact A left outer join phones B on (A.id = B.id) where A.id = ?");
    }

But this ALSO means we have to change the unwind logic

    while (rs.next()) {
        if (i == 0) {
            // …
        }
        String s = rs.getString(5);
        if (s != null) {  // phone columns are NULL when the contact has no phones
            Map onePhone = new HashMap();
            onePhone.put("type", s);
            onePhone.put("number", rs.getString(6));
            list.add(onePhone);
        }
        i++;
    }

This took more time and money!

…but at least we have a DAL…right?
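The join problem above can be reproduced end to end with an in-memory database. The following is a sketch only: it uses Python's built-in sqlite3 module rather than the JDBC setup on the slides, trims the tables to a few columns, and the sample contact "K2" with no phones is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table contact (id text primary key, name text);
    create table phones  (id text, type text, number text);
    insert into contact values ('K1', 'buzz');
    insert into contact values ('K2', 'jane');   -- jane has no phones
    insert into phones  values ('K1', 'work', '1-800-555-1212');
    insert into phones  values ('K1', 'home', '1-866-444-3131');
""")

INNER = """select A.id, A.name, B.type, B.number
           from contact A, phones B
           where B.id = A.id and A.id = ?"""
OUTER = """select A.id, A.name, B.type, B.number
           from contact A left outer join phones B on (A.id = B.id)
           where A.id = ?"""


def fetch(sql, contact_id):
    """Unwind the joined rows back into one map holding a list of phones."""
    m = None
    for cid, name, ptype, number in conn.execute(sql, (contact_id,)):
        if m is None:
            m = {"id": cid, "name": name, "phones": []}
        if ptype is not None:  # the outer join yields NULL phone columns when there are none
            m["phones"].append({"type": ptype, "number": number})
    return m


print(fetch(INNER, "K2"))  # inner join: the contact is not found at all
print(fetch(OUTER, "K2"))  # outer join: the contact comes back with an empty phone list
print(fetch(OUTER, "K1"))
```

Both the query and the unwind loop had to change together, which is the coupling the slide is complaining about.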

Page 18: Webinar: Dramatically Reducing Development Time With MongoDB

mongoDB Day 3

    void save(Map m)
    {
        collection.insert(new BasicDBObject(m));  // wrap the Map as a DBObject
    }

    Map fetch(String id)
    {
        Map m = null;
        DBObject dbo = new BasicDBObject();
        dbo.put("id", id);
        DBCursor c = collection.find(dbo);
        if (c.hasNext()) {
            m = (Map) c.next();
        }
        return m;
    }

✔ NO CHANGE

Advantages:

1. Zero time and money spent on overhead code
2. No need to fear fields that are “naturally occurring” lists containing data specific to the parent structure and thus do not benefit from normalization and referential integrity

Page 19: Webinar: Dramatically Reducing Development Time With MongoDB

By Day 14, our structure looks like this:

    m.put("name", "buzz");
    m.put("id", "K1");

    //…

    n4.put("startupApps", new String[] { "app1", "app2", "app3" } );
    n4.put("geo", "US-EAST");
    list2.add(n4);
    n4 = new HashMap();  // fresh map so the first preference entry is not overwritten
    n4.put("startupApps", new String[] { "app6" } );
    n4.put("geo", "EMEA");
    n4.put("useLocalNumberFormats", false);
    list2.add(n4);
    m.put("preferences", list2);

    n6.put("optOut", true);
    n6.put("assertDate", someDate);
    seclist.add(n6);
    m.put("attestations", seclist);
    m.put("security", anotherMapOfData);

• It was still pretty easy to add this data to the structure

• Want to guess what the SQL persistence code looks like?

• How about the mongoDB persistence code?

Page 20: Webinar: Dramatically Reducing Development Time With MongoDB

SQL Day 14

Error: Could not fit all the code into this space.

…actually, I didn’t want to spend 2 hours putting the code together..

But very likely, among other things:

• n4.put("startupApps", new String[]{"app1","app2","app3"}); was implemented as a single semi-colon delimited string
• m.put("security", anotherMapOfData); was implemented by flattening it out and storing a subset of fields

Page 21: Webinar: Dramatically Reducing Development Time With MongoDB

mongoDB Day 14 – and every other day

    void save(Map m)
    {
        collection.insert(new BasicDBObject(m));  // wrap the Map as a DBObject
    }

    Map fetch(String id)
    {
        Map m = null;
        DBObject dbo = new BasicDBObject();
        dbo.put("id", id);
        DBCursor c = collection.find(dbo);
        if (c.hasNext()) {
            m = (Map) c.next();
        }
        return m;
    }

✔ NO CHANGE

Advantages:

1. Zero time and money spent on overhead code
2. Persistence is so easy and flexible and backward compatible that the persistor does not upward-influence the shapes we want to persist, i.e. the tail does not wag the dog

Page 22: Webinar: Dramatically Reducing Development Time With MongoDB

But what about “real” queries?

• mongoDB query language is a physical map-of-map based structure, not a String
• Operators (e.g. AND, OR, GT, EQ, etc.) and arguments are keys and values in a cascade of Maps
• No grammar to parse, no templates to fill in, no whitespace, no escaping quotes, no parentheses, no punctuation
• Same paradigm to manipulate data is used to manipulate query expressions
• …which is also, by the way, the same paradigm for working with mongoDB metadata and explain()
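Because a query expression is just nested maps and lists, ordinary data-handling code can inspect it. The sketch below is illustrative only: `operators_used` is a made-up helper, not part of any driver, and the expression reuses field names that appear elsewhere in this deck.

```python
def operators_used(expr):
    """Collect every $-prefixed key in a query expression by walking dicts and lists."""
    found = set()
    if isinstance(expr, dict):
        for key, value in expr.items():
            if key.startswith("$"):
                found.add(key)
            found |= operators_used(value)
    elif isinstance(expr, list):
        for item in expr:
            found |= operators_used(item)
    return found


# "no phones at all" OR "SeqNum greater than 10000", written as plain data
expr = {
    "$or": [
        {"phones": {"$exists": False}},
        {"SeqNum": {"$gt": 10000}},
    ]
}

print(sorted(operators_used(expr)))
```

No parser is involved; the same recursion that would walk a contact record walks the query.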

Page 23: Webinar: Dramatically Reducing Development Time With MongoDB

mongoDB Query Examples

    List fetchGeneral(Map expr)
    {
        List l = new ArrayList();
        DBObject dbo = new BasicDBObject(expr);
        DBCursor c = collection.find(dbo);
        while (c.hasNext()) {
            l.add((Map) c.next());
        }
        return l;
    }

Objective: Find all contacts with at least one mobile phone
Code:
    Map expr = new HashMap();
    expr.put("phones.type", "mobile");
CLI:
    db.contact.find({"phones.type":"mobile"});

Objective: Find contacts with NO phones
Code:
    Map expr = new HashMap();
    Map q1 = new HashMap();
    q1.put("$exists", false);
    expr.put("phones", q1);
CLI:
    db.contact.find({"phones":{"$exists":false}});

Advantages:

1. Far less time required to set up complex parameterized filters
2. No need for SQL rewrite logic or creating new PreparedStatements
3. Map-of-Maps query structure is easily walked and processed without parsing
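As an illustration of the first two advantages, a parameterized filter can be assembled by adding keys to a map only for the criteria that were actually supplied. This is a sketch under stated assumptions: `build_filter` and its parameter names are hypothetical, and the field names are borrowed from the examples in this deck. Multiple top-level keys in a MongoDB query are combined with an implicit AND.

```python
def build_filter(phone_type=None, min_seq=None, name=None):
    """Assemble a query expression from whichever criteria were supplied."""
    expr = {}
    if phone_type is not None:
        expr["phones.type"] = phone_type    # match inside the list of phones
    if min_seq is not None:
        expr["SeqNum"] = {"$gt": min_seq}   # operator expressed as a nested map
    if name is not None:
        expr["name"] = name
    return expr


print(build_filter(phone_type="mobile"))
print(build_filter(phone_type="mobile", min_seq=10000))
print(build_filter())  # empty map: no restriction
```

The resulting map is the kind of value fetchGeneral(Map expr) above accepts; there is no SQL string to rewrite as criteria come and go.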

Page 24: Webinar: Dramatically Reducing Development Time With MongoDB

…and before you ask…

Yes, mongoDB query expressions support

1. Sorting
2. Cursor size limit
3. Aggregation functions
4. Projection (asking for only parts of the rich shape to be returned)

Page 25: Webinar: Dramatically Reducing Development Time With MongoDB

Day 30: RAD on mongoDB with Python

    import datetime
    import pymongo

    # coll is a pymongo collection; connection setup is left out, as in the Java slides

    def save(data):
        coll.insert_one(data)

    def fetch(id):
        return coll.find_one({"id": id})

    myData = {
        "name": "jane",
        "id": "K2",
        # no title? No problem
        "hireDate": datetime.datetime(2011, 11, 1),  # pymongo stores datetime.datetime, not datetime.date
        "phones": [
            { "type": "work", "number": "1-800-555-1212" },
            { "type": "home", "number": "1-866-444-3131" }
        ]
    }
    save(myData)
    print(fetch("K2"))

    expr = { "$or": [ {"phones": { "$exists": False }}, {"name": "jane"} ] }
    for c in coll.find(expr):
        print([ k.upper() for k in sorted(c.keys()) ])

Advantages:

1. Far easier and faster to create scripts due to “fidelity-parity” of mongoDB map data and python (and perl, ruby, and javascript) structures
2. Data types and structure in scripts are exactly the same as that read and written in Java and C++

Page 26: Webinar: Dramatically Reducing Development Time With MongoDB

Day 30: Polymorphic RAD on mongoDB with Python

    import pymongo

    item = fetch("K8")
    # item is:
    {
        "name": "bob",
        "id": "K8",
        "personalData": {
            "preferedAirports": [ "LGA", "JFK" ],
            "travelTimeThreshold": { "value": 3, "units": "HRS" }
        }
    }

    item = fetch("K9")
    # item is:
    {
        "name": "steve",
        "id": "K9",
        "personalData": {
            "lastAccountVisited": {
                "name": "mongoDB",
                "when": datetime.datetime(2013, 11, 4)
            },
            "favoriteNumber": 3.14159
        }
    }

Advantages:

1. Scripting languages easily digest shapes with common fields and dissimilar fields
2. Easy to create an information architecture where placeholder fields like personalData are “known” in the software logic to be dynamic
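One way a script might treat personalData as "known to be dynamic" is to walk whatever is there instead of assuming a fixed set of fields. The sketch below uses plain dicts shaped like the two items above in place of fetched documents; `describe` is a made-up helper name.

```python
import datetime


def describe(value, path=""):
    """Yield (path, type name) for every leaf in an arbitrarily nested shape."""
    if isinstance(value, dict):
        for key in sorted(value):
            yield from describe(value[key], f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for i, item in enumerate(value):
            yield from describe(item, f"{path}[{i}]")
    else:
        yield path, type(value).__name__


bob = {
    "name": "bob", "id": "K8",
    "personalData": {
        "preferedAirports": ["LGA", "JFK"],
        "travelTimeThreshold": {"value": 3, "units": "HRS"},
    },
}
steve = {
    "name": "steve", "id": "K9",
    "personalData": {
        "lastAccountVisited": {"name": "mongoDB", "when": datetime.datetime(2013, 11, 4)},
        "favoriteNumber": 3.14159,
    },
}

# The common fields (name, id) are read directly; personalData is walked generically.
for item in (bob, steve):
    for path, kind in describe(item["personalData"]):
        print(item["id"], path, kind)
```

The same loop handles both records even though their personalData sections share no field names.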

Page 27: Webinar: Dramatically Reducing Development Time With MongoDB

Day 30: (Not) RAD on top of SQL with Python

    void init()
    {
        contactInsertStmt = connection.prepareStatement(
            "insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
        c2stmt = connection.prepareStatement(
            "insert into phones (id, type, number) values (?, ?, ?)");
        fetchStmt = connection.prepareStatement(
            "select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?");
    }

    void save(Map m)
    {
        startTrans();
        contactInsertStmt.setString(1, (String) m.get("id"));
        contactInsertStmt.setString(2, (String) m.get("name"));
        contactInsertStmt.setString(3, (String) m.get("title"));
        contactInsertStmt.setDate(4, new java.sql.Date(((java.util.Date) m.get("hireDate")).getTime()));

        for (Map onePhone : (List<Map>) m.get("phones")) {
            c2stmt.setString(1, (String) m.get("id"));
            c2stmt.setString(2, (String) onePhone.get("type"));
            c2stmt.setString(3, (String) onePhone.get("number"));
            c2stmt.execute();
        }
        contactInsertStmt.execute();
        endTrans();
    }

Consequences:

1. All logic coded in Java interface layer (splitting up contact, phones, preferences, etc.) needs to be rewritten in python (unless Jython is used) … AND/or perl, C++, Scala, etc.
2. No robust way to handle polymorphic data other than BLOBing it
3. …and that will take real time and money!

Page 28: Webinar: Dramatically Reducing Development Time With MongoDB

The Fundamental Change with mongoDB

RDBMS designed in era when:

• CPU and disk were slow & expensive
• Memory was VERY expensive
• Network? What network?
• Languages had limited means to dynamically reflect on their types
• Languages had poor support for richly structured types

Thus, the database had to

• Act as combiner-coordinator of simpler types
• Define a rigid schema
• (Together with the code) optimize at compile-time, not run-time

In mongoDB, the data is the schema!

Page 29: Webinar: Dramatically Reducing Development Time With MongoDB

mongoDB and the Rich Map Ecosystem

Generic comparison of two records

    Map expr = new HashMap();
    expr.put("myKey", "K1");
    DBObject a = collection.findOne(new BasicDBObject(expr));
    expr.put("myKey", "K2");
    DBObject b = collection.findOne(new BasicDBObject(expr));
    List<MapDiff.Difference> d = MapDiff.diff((Map)a, (Map)b);

Getting default values for a thing on a certain date and then overlaying user preferences (like for a calculation run)

    Map expr = new HashMap();
    expr.put("myKey", "DEFAULT");
    // November 1, 2013; the deprecated Date(int, int, int) constructor offsets the year by 1900
    expr.put("createDate", new GregorianCalendar(2013, Calendar.NOVEMBER, 1).getTime());
    DBObject a = collection.findOne(new BasicDBObject(expr));
    expr.clear();
    expr.put("myKey", "user1");
    DBObject b = otherCollectionPerhaps.findOne(new BasicDBObject(expr));
    MapStack s = new MapStack();
    s.push((Map)a);
    s.push((Map)b);
    Map merged = s.project();

Runtime reflection of Maps and Lists enables generic powerful utilities (MapDiff, MapStack) to be created once and used for all kinds of shapes, saving time and money
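The slides do not show how MapDiff or MapStack are implemented, so the following is only a sketch of the general idea in Python: one function that diffs any two nested maps and one that overlays layers of maps, later layers winning. The function names, the recursive merge of nested maps, and the sample field names (precision, display, currency, rounding) are all assumptions made for illustration.

```python
MISSING = object()  # sentinel for "key not present on this side"


def map_diff(a, b, path=""):
    """Return (path, value_in_a, value_in_b) for every leaf that differs."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = []
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}" if path else key
            diffs += map_diff(a.get(key, MISSING), b.get(key, MISSING), sub)
        return diffs
    return [] if a == b else [(path, a, b)]


def overlay(*layers):
    """Merge maps left to right; later layers win, nested maps merge recursively."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = overlay(merged[key], value)
            else:
                merged[key] = value
    return merged


defaults = {"myKey": "DEFAULT", "precision": 4,
            "display": {"currency": "USD", "rounding": "half-even"}}
user = {"myKey": "user1", "display": {"currency": "EUR"}}

merged = overlay(defaults, user)
print(merged)
print(map_diff(defaults, merged))
```

Neither function knows anything about contacts, preferences, or any other particular shape, which is what lets such utilities be written once and reused.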

Page 30: Webinar: Dramatically Reducing Development Time With MongoDB

Lastly: A CLI with teeth

Try a query and show the diagnostics

    > db.contact.find({"SeqNum": {"$gt":10000}}).explain();
    {
        "cursor" : "BasicCursor",
        "n" : 200000,
        //...
        "millis" : 223
    }

Run it 3 times with smaller and smaller chunks and create a vector of timing result pairs (size,time)

    > for(v=[],i=0;i<3;i++) {
    … n = i*50000;
    … expr = {"SeqNum": {"$gt": n}};
    … v.push( [n, db.contact.find(expr).explain().millis] ); }

Let’s see that vector

    > v
    [ [ 0, 225 ], [ 50000, 222 ], [ 100000, 220 ] ]

Use any other javascript you want inside the shell

    > load("jStat.js")
    > jStat.stdev(v.map(function(p){return p[1];}))
    2.0548046676563256

Party trick: save the explain() output back into a collection!

    > for(i=0;i<3;i++) {
    … expr = {"SeqNum": {"$gt":i*1000}};
    … db.foo.insert(db.contact.find(expr).explain()); }
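As a cross-check of the arithmetic in the shell session, the reported value of about 2.0548 is consistent with the population standard deviation (dividing by n rather than n - 1) of the three timings 225, 222, and 220. A short sketch using only the Python standard library:

```python
import math
import statistics

v = [[0, 225], [50000, 222], [100000, 220]]  # (size, millis) pairs from the shell session
millis = [pair[1] for pair in v]

mean = sum(millis) / len(millis)
# Population standard deviation: divide by n, not n - 1.
pop_stdev = math.sqrt(sum((m - mean) ** 2 for m in millis) / len(millis))

print(pop_stdev)                  # close to the 2.0548... shown above
print(statistics.stdev(millis))   # sample version (n - 1), for comparison; it is larger
```

With only three runs the figure says little about the query itself; it mainly shows that ordinary numeric code can be applied to explain() results.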

Page 31: Webinar: Dramatically Reducing Development Time With MongoDB

Webex Q&A

Page 32: Webinar: Dramatically Reducing Development Time With MongoDB

Thank You

Solutions Architect, MongoDB

Buzz Moschetti

#MongoDB