Creating a Single View: Data Design and Loading Strategies
DESCRIPTION
Learn how to design a single view application and load your data into the application.
TRANSCRIPT
![Page 1: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/1.jpg)
Enterprise Architect, MongoDB
Buzz
#ConferenceHashTag
Creating a Single View Part 2: Data Design & Loading Strategies
![Page 2: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/2.jpg)
Who Is Talking To You?
• Yes, I use “Buzz” on my business cards
• Former Investment Bank Chief Architect at JPMorgan Chase and, before that, Bear Stearns
• Over 27 years of designing and building systems
  • Big and small
  • Super-specialized to broadly useful in any vertical
  • “Traditional” to completely disruptive
• Advocate of language leverage and strong factoring
• Inventor of Perl DBI/DBD
• Still programming – using emacs, of course
![Page 3: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/3.jpg)
What Is He Going To Talk About?
Historic Challenges
New Strategy for Success
Technical examples and tips
Creating A Single View
• Part 1: Overview & Data Analysis
• Part 2: Data Design & Loading Strategies
• Part 3: Securing Your Deployment
![Page 4: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/4.jpg)
Historic Challenges
![Page 5: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/5.jpg)
It’s 2014: Why is this still hard to do?
• Business / Technical / Information Challenges
• Missteps in evolution of data transfer technology
![Page 6: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/6.jpg)
We wish this “just worked”
• A: Query objects from A with great performance
• B: Query objects from B with great performance
• X: Query objects from merged A and B with great performance
![Page 7: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/7.jpg)
…but Beware The Blue Arrow!
• Extracting many tables into many files
• Some tables require more than one file to capture their representation
• Encoding/formatting clever tricks
• Reconciliation
• Different extracts for different consumers
• Different extracts for different versions of data to the same consumer
![Page 8: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/8.jpg)
Loss of fidelity exposed
```java
class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}
```
```
widget1,,3,,good texture,retains value,,,20142304,102.3,201401
widget2,XS,6,,,,not fragile,,,20132304,73,87653
widget3,XT,,,4,,dense,shiny,mysterious,,,19990304,73,87653,,
widget4,,,3,4,,,,,,20040101,,999999,,
```
[Diagram labels: A, ORM]
![Page 9: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/9.jpg)
What happened to XML?
```java
class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}
```
```xml
<product>
  <name>widget1</name>
  <features>
    <feature>
      <text>good texture</text>
      <type>A</type>
    </feature>
  </features>
  <introDate>20140204</introDate>
  <versDates>
    <versDate>20100103</versDate>
    <versDate>20100601</versDate>
  </versDates>
  <unitBundles>1,3,9</unitBun…
```
![Page 10: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/10.jpg)
XML: Created More Issues Than Solved
• No native handling of arrays
• Attribute vs. nested tag rules/conventions widely variable
• Generic parsing (DOM) yields a tree of Nodes of Strings – not very friendly
• SAX is fast but too low level
![Page 11: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/11.jpg)
… and it eventually became this
```xml
<p name="widget1" ftxt1="good texture" ftyp1="A" idt="20140203" …
<p name="widget2" ftxt1="not fragile" ftyp1="A" idt="20110117" …
<p name="widget3" ftxt1="dense" idt="20140203" …
<p name="widget4" idt="20140203" versD="20130403,20130104,20100605" …
```
• Short, cryptic, conflated tag names
• Everything is a string attribute
• Mix of flattened arrays and delimited strings
• Irony: org.xml.sax.Attributes easier to deal with than rest of DOM
![Page 12: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/12.jpg)
Schema Change Challenges: Multiplied & Concentrated!
• A: alter table(s), extract more data; LOE (level of effort) = x1
• B: alter table(s), extract more data; LOE = x2
• C: alter table(s), extract more data; LOE = x3
• X: alter table(s) and split() more data, once for each of the three feeds
where f() is nonlinear with respect to n
![Page 13: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/13.jpg)
SLAs & Security: Tough to Combine
• A: User 1 entitled to see X; User 2 entitled to see Y
• B: User 1 entitled to see Z; User 2 entitled to see V
Entitlements managed per-system/per-application here (in A and B)…
…are lost in the low-fidelity transfer of data…
…and have to be reconstituted here (in X)…somehow…
![Page 14: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/14.jpg)
Solving The Problem with mongoDB
![Page 15: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/15.jpg)
What We Are Building Today
![Page 16: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/16.jpg)
Overall Strategy For Success
• Let the source systems' entities drive the data design, not the physical database
• Capture data in full fidelity
• Perform cross-ref and additional logic at the single point of view, not in transit
![Page 17: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/17.jpg)
Don’t forget the power of the API
```java
class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}
```
If you can, avoid files altogether!
Haskell
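To make "avoid files altogether" concrete, here is a minimal Python sketch of handing a fully structured object to a driver instead of writing an extract file. The `Feature` shape, `to_document()`, `save_product()` and the in-memory `ListCollection` stand-in are hypothetical; the only driver behavior assumed is a PyMongo-style `insert_one(dict)`.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import List, Optional


@dataclass
class Feature:
    text: str
    type: str


@dataclass
class Product:
    # Mirrors the slide's Java class; the Feature shape is an assumption.
    productName: str
    ff: List[Feature] = field(default_factory=list)
    introDate: Optional[datetime] = None
    versDates: List[datetime] = field(default_factory=list)
    unitBundles: List[int] = field(default_factory=list)


def to_document(p: Product) -> dict:
    # asdict() recurses into nested dataclasses and lists, so arrays and
    # sub-objects keep their shape instead of being flattened into columns.
    return asdict(p)


def save_product(coll, p: Product):
    # `coll` is assumed to be PyMongo-like: anything with insert_one(dict).
    return coll.insert_one(to_document(p))


class ListCollection:
    """Stand-in for a real collection so the sketch runs without a server."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)


coll = ListCollection()
save_product(coll, Product(
    "widget1",
    ff=[Feature("good texture", "A")],
    introDate=datetime(2014, 2, 4),
    versDates=[datetime(2010, 1, 3), datetime(2010, 6, 1)],
    unitBundles=[1, 3, 7, 9],
))
```

Nothing here is encoded, delimited or split: the dates stay dates and the lists stay lists all the way to the database.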
![Page 18: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/18.jpg)
But if you are creating files: emit JSON
```java
class Product {
    String productName;
    List<Features> ff;
    Date introDate;
    List<Date> versDates;
    int[] unitBundles;
    //…
}
```
```
{ "name": "widget1",
  "features": [
    { "text": "good texture", "type": "A" }
  ],
  "introDate": "20140204",
  "versDates": ["20100103", "20100601"],
  "unitBundles": [1,3,7,9]
  // …
}
```
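A minimal sketch, assuming Python's standard `json` module as the emitter and using the slide's sample values: one product becomes one compact JSON line, and parsing it back returns the same nested structure. `emit_json()` is a hypothetical helper name.

```python
import json


def emit_json(obj: dict) -> str:
    # One compact line per object; lists and nested objects keep their shape.
    return json.dumps(obj, separators=(",", ":"))


product = {
    "name": "widget1",
    "features": [{"text": "good texture", "type": "A"}],
    "introDate": "20140204",
    "versDates": ["20100103", "20100601"],
    "unitBundles": [1, 3, 7, 9],
}

line = emit_json(product)
# Parsing the line back gives an identical structure: no delimiter
# conventions or column positions are needed to rebuild the arrays.
assert json.loads(line) == product
```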
![Page 19: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/19.jpg)
Let The Feeding System Express itself
A:
```json
{ "name": "widget1", "features": [ { "text": "good texture", "type": "A" } ] }
```
B:
```json
{ "myColors": ["red", "blue"], "myFloats": [ 3.14159, 2.71828 ], "nest": { "as": { "deep": true } } }
```
C:
```json
{ "myBlob": { "$binary": "aGVsbG8K" }, "myDate": { "$date": "20130405" } }
```
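One way to honor "let the feeding system express itself" at load time, sketched in Python under the assumption that each feed delivers one JSON document: parse every document as-is, impose no shared schema, and record only where it came from. The `_src` field and `land()` are hypothetical; the three documents are taken from the slide.

```python
import json

# The slide's three feed documents, one per source system.
feeds = {
    "A": '{ "name": "widget1", "features": [ { "text": "good texture", "type": "A" } ] }',
    "B": '{ "myColors": ["red", "blue"], "myFloats": [ 3.14159, 2.71828 ], "nest": { "as": { "deep": true } } }',
    "C": '{ "myBlob": { "$binary": "aGVsbG8K" }, "myDate": { "$date": "20130405" } }',
}


def land(feeds):
    """Parse each feed's document without forcing a common schema.
    The only addition is a hypothetical "_src" field naming the source."""
    landed = []
    for src, text in feeds.items():
        doc = json.loads(text)
        doc["_src"] = src
        landed.append(doc)
    return landed


docs = land(feeds)
```

Reconciling these shapes is deferred to the single point of view, in line with the strategy of doing cross-ref work there rather than in transit.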
![Page 20: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/20.jpg)
What if you forgot something?
```
{ "name": "widget1",
  "features": [ { "text": "good texture", "type": "A" } ],
  "introDate": "20140204",
  "versDates": ["20100103", "20100601"],
  "versMinorNum": [1,3,7,9]
  // …
}
```
```
{ "name": "widget1",
  "features": [ { "text": "good texture", "type": "A" } ],
  "coverage": [ "NY", "NJ" ],
  "introDate": "20140204",
  "versDates": ["20100103", "20100601"],
  "versMinorNum": [1,3,7,9]
  // …
}
```
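Because each document carries its own field names, a consumer written before `coverage` existed keeps working, and a newer consumer can tolerate older records. A minimal Python sketch; `covers()` is a hypothetical helper that treats a missing field as empty rather than as an error.

```python
import json

old = json.loads('{"name": "widget1", "versMinorNum": [1, 3, 7, 9]}')
new = json.loads('{"name": "widget1", "coverage": ["NY", "NJ"], '
                 '"versMinorNum": [1, 3, 7, 9]}')


def covers(doc: dict, state: str) -> bool:
    # Older documents simply lack "coverage"; treat that as "nothing known"
    # so both versions can sit side by side in the same collection.
    return state in doc.get("coverage", [])
```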
![Page 21: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/21.jpg)
The Joy (and value) of mongoDB
• A: alter table(s), extract more data; LOE = 0.25 × x1
• B: alter table(s), extract more data; LOE = 0.25 × x2
• C: alter table(s), extract more data; LOE = 0.25 × x3
![Page 22: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/22.jpg)
Helpful Hints
![Page 23: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/23.jpg)
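Helpful Hint: Use the APIs
```python
# Assumes `curs` is a DB-API cursor on the source RDBMS and `coll` is a
# PyMongo collection (insert() was removed in PyMongo 4; use insert_one()).
curs.execute("select A.did, A.fullname, B.number from contact A "
             "left outer join phones B on A.did = B.did order by A.did")

lastDID = None
contact = None
for q in curs.fetchall():
    if q[0] != lastDID:
        # New contact: flush the previous one before starting the next.
        if lastDID is not None:
            coll.insert_one(contact)
        contact = {"did": q[0], "name": q[1]}
        lastDID = q[0]

    # The outer join yields a NULL number for contacts with no phones.
    if q[2] is not None:
        if 'phones' not in contact:
            contact['phones'] = []
        contact['phones'].append({"number": q[2]})

# Flush the last contact.
if lastDID is not None:
    coll.insert_one(contact)
```
```json
{ "did": "D159308",
  "phones": [ {"number": "1-666-444-3333"}, {"number": "1-999-444-3333"}, {"number": "1-999-444-9999"} ],
  "name": "Buzz" }
```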
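The same join-and-nest loop can be run end to end by substituting Python's built-in `sqlite3` for the source database (an assumption; the original driver is not named) and `itertools.groupby` for the manual `lastDID` bookkeeping. Table and column names come from the slide; the sample rows and `demo_connection()` are made up for illustration.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

QUERY = ("select A.did, A.fullname, B.number "
         "from contact A left outer join phones B on A.did = B.did "
         "order by A.did")


def contacts_with_phones(conn):
    """Yield one nested document per contact from the flat outer-join rows."""
    # Rows are sorted by did, so groupby() sees each contact's rows together.
    for did, rows in groupby(conn.execute(QUERY), key=itemgetter(0)):
        rows = list(rows)
        doc = {"did": did, "name": rows[0][1]}
        # A contact with no phones produces one row whose number is None.
        numbers = [r[2] for r in rows if r[2] is not None]
        if numbers:
            doc["phones"] = [{"number": n} for n in numbers]
        yield doc


def demo_connection():
    """In-memory database with made-up sample rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table contact (did text, fullname text);
        create table phones (did text, number text);
        insert into contact values ('D159308', 'Buzz'), ('D200000', 'NoPhone');
        insert into phones values ('D159308', '1-666-444-3333'),
                                  ('D159308', '1-999-444-3333'),
                                  ('D159308', '1-999-444-9999');
    """)
    return conn


for doc in contacts_with_phones(demo_connection()):
    print(doc)  # with PyMongo, this is where coll.insert_one(doc) would go
```

Grouping on the sorted key removes the two easy-to-miss bugs in the hand-written version: forgetting to initialize the "previous key" and forgetting to flush the final document.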
![Page 24: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/24.jpg)
Helpful Hint: Declare Types
Use mongoDB conventions for dates and binary data:
```json
{"dateA": {"$date": "2014-05-16T09:42:57.112-0000"}}
{"dateB": {"$date": 1400617865438}}
{"someBlob": {"$binary": "YmxhIGJsYSBibGE=", "$type": "00"}}
```
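What a loader does with these type wrappers, sketched as a hand-rolled `json.loads` object hook that handles only the three forms shown. This is not MongoDB's parser (`mongoimport` and PyMongo's `bson.json_util` do this for real), and `decode_types()` is a hypothetical name.

```python
import base64
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_types(obj):
    """json.loads object_hook: turn {"$date": ...} and {"$binary": ...}
    wrappers into native Python values. Covers only the forms on this slide."""
    if "$date" in obj and len(obj) == 1:
        value = obj["$date"]
        if isinstance(value, int):
            # Milliseconds since the Unix epoch; timedelta keeps it exact.
            return EPOCH + timedelta(milliseconds=value)
        # ISO-8601 with milliseconds and a numeric UTC offset.
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
    if "$binary" in obj:
        # "$type" (the BSON binary subtype) is ignored in this sketch.
        return base64.b64decode(obj["$binary"])
    return obj


text = '''{"dateA": {"$date": "2014-05-16T09:42:57.112-0000"},
           "dateB": {"$date": 1400617865438},
           "someBlob": {"$binary": "YmxhIGJsYSBibGE=", "$type": "00"}}'''
doc = json.loads(text, object_hook=decode_types)
```

The point of declaring types is visible here: a bare string like `"20140204"` would stay a string, while the wrapped forms arrive as real dates and bytes.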
![Page 25: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/25.jpg)
Helpful Hint: Keep the file flexible
Use newline-delimited JSON, one document per line:
```json
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
```
…instead of a giant array:
```
records = [
  { "name": "buzz", "locale": "NY"},
  { "name": "steve", "locale": "UK"},
  { "name": "john", "locale": "NY"},
]
```
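A minimal Python sketch of why the one-document-per-line layout is easier to work with: records stream one at a time with flat memory use, and a malformed record can be reported by line number. `read_ndjson()` is a hypothetical helper; this layout is also what `mongoimport` reads by default, while a single array needs its `--jsonArray` option.

```python
import io
import json


def read_ndjson(stream):
    """Yield one document per non-blank line; memory use stays flat no matter
    how many records the file holds, and a bad line is reported by number."""
    for lineno, line in enumerate(stream, start=1):
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: {exc}") from exc


data = io.StringIO(
    '{"name": "buzz", "locale": "NY"}\n'
    '{"name": "steve", "locale": "UK"}\n'
    '{"name": "john", "locale": "NY"}\n'
)
ny = [d["name"] for d in read_ndjson(data) if d["locale"] == "NY"]
```

The same property lets such files be split, concatenated, gzipped and piped through line-oriented tools without any JSON-aware step.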
![Page 26: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/26.jpg)
Helpful Hint: A quick sidebar on jq
```
$ cat myData
{ "name": "dave", "type": "mobile", "phones": [ { "type": "mobile", "number": "2123455634", "dnc": false }, { "type": "mobile", "number": "6173455634" }, { "type": "land", "number": "2023455634" } ] }
{ "name": "bob", "type": "WFH", "phones": [ { "type": "land", "number": "70812342342", "dnc": false }, { "type": "land", "number": "7083455634" } ] }
(another 99,998 rows)
```
![Page 27: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/27.jpg)
Helpful Hint: jq is JSON awk/sed/grep
```
$ jq -c '.phones[] | select(.dnc == false and .type == "mobile")' myData
{"dnc":false,"number":"2123455634","type":"mobile"}
{"dnc":false,"number":"70812342342","type":"mobile"}
…
$ jq [expression above] | wc -l
32433
$ gzip -c -d myData.gz | jq [expression above] | wc -l
32433
```
http://stedolan.github.io/jq/
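For readers more at home in Python, the jq filter above corresponds roughly to the generator below (a sketch; `dnc_false_mobiles()` is a hypothetical name). One behavioral difference: jq's `.phones[]` reports an error on a record with no `phones`, while this version skips such records.

```python
import json


def dnc_false_mobiles(lines):
    """Rough Python rendering of:
        jq -c '.phones[] | select(.dnc == false and .type == "mobile")'
    A missing "dnc" is null in jq, and null == false is false, so such phones
    are excluded here too: dict.get() returns None, and None is not False."""
    for line in lines:
        for phone in json.loads(line).get("phones", []):
            if phone.get("dnc") is False and phone.get("type") == "mobile":
                yield phone
```

To mirror the `wc -l` count on the gzipped file, pass `gzip.open("myData.gz", "rt")` as `lines` and count with `sum(1 for _ in dnc_false_mobiles(...))`.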
![Page 28: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/28.jpg)
Helpful Hint: Don’t be afraid of metadata
Use a version number in each document:
```json
{ "v": 1, "name": "buzz", "locale": "NY"}
{ "v": 1, "name": "steve", "locale": "UK"}
{ "v": 2, "name": "john", "region": "NY"}
```
…or get fancier and use a header record:
```
{ "vers": 1, "creator": "ID", "createDate": …}
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
```
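A minimal sketch of how a loader might use the per-document `v` field, assuming (from the sample records only) that version 2 renamed `locale` to `region`. `upgrade()` is hypothetical; the point is that each record states which rules apply, so mixed-version files load without guessing.

```python
import json


def upgrade(doc: dict) -> dict:
    """Return a copy of `doc` in the newest (v2) shape.
    Assumption: v2 differs from v1 only by renaming "locale" to "region",
    as the sample records suggest."""
    doc = dict(doc)  # leave the caller's record untouched
    version = doc.get("v", 1)
    if version == 1:
        if "locale" in doc:
            doc["region"] = doc.pop("locale")
        doc["v"] = 2
    elif version != 2:
        raise ValueError(f"unknown record version {version!r}")
    return doc


lines = [
    '{"v": 1, "name": "buzz", "locale": "NY"}',
    '{"v": 1, "name": "steve", "locale": "UK"}',
    '{"v": 2, "name": "john", "region": "NY"}',
]
records = [upgrade(json.loads(line)) for line in lines]
```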
![Page 29: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/29.jpg)
Helpful Hint: Use a batch ID
```
{ "vers": 1, "batchID": "B213W", "createDate": …}
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
```
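A minimal sketch of consuming the header-record layout, assuming the first line is always the header and that copying its `batchID` onto every data record is desirable. `read_batch()` is hypothetical, and the sample header omits `createDate` because its value is elided above.

```python
import json


def read_batch(lines):
    """Split a file into (header, records), stamping every record with the
    header's batchID so each loaded document can be traced to its batch."""
    it = iter(lines)
    header = json.loads(next(it))
    if header.get("vers") != 1:
        raise ValueError(f"unsupported file version: {header.get('vers')!r}")
    docs = []
    for line in it:
        if line.strip():
            doc = json.loads(line)
            doc["batchID"] = header["batchID"]
            docs.append(doc)
    return header, docs


lines = [
    '{"vers": 1, "batchID": "B213W"}',
    '{"name": "buzz", "locale": "NY"}',
    '{"name": "steve", "locale": "UK"}',
    '{"name": "john", "locale": "NY"}',
]
header, docs = read_batch(lines)
```

With the batch ID on every loaded document, a bad load can be found or backed out in one operation, for example with PyMongo's `coll.delete_many({"batchID": "B213W"})`.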
![Page 30: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/30.jpg)
Now that we have the data…
You’re well on your way to a single view consolidation…but first:
– Data Work
  • Cross-reference important keys
  • Potential scrubbing/cleansing
– Software Stack Work
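Cross-referencing, done at the single point of view rather than in transit, can start as simply as grouping each feed's documents under a shared key. A minimal sketch; the feed names, the `contactDid` key field and `single_view()` are hypothetical, and real key matching usually needs the scrubbing step as well.

```python
from collections import defaultdict


def single_view(feeds, key_fields):
    """Group documents from several feeds under one shared key.
    feeds:      {feed name: [documents]}
    key_fields: {feed name: field holding the cross-referenced key}
    Each source's documents are kept intact under the feed's name, so nothing
    is lost and every value's origin stays visible."""
    merged = defaultdict(dict)
    for src, docs in feeds.items():
        key_field = key_fields[src]
        for doc in docs:
            merged[doc[key_field]].setdefault(src, []).append(doc)
    return [{"_id": key, **by_source} for key, by_source in merged.items()]


feeds = {
    "contacts": [{"did": "D159308", "name": "Buzz"}],
    "accounts": [{"contactDid": "D159308", "acct": 42},
                 {"contactDid": "D999999", "acct": 7}],
}
views = single_view(feeds, {"contacts": "did", "accounts": "contactDid"})
```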
![Page 31: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/31.jpg)
You’ve Built a Great Data Asset; leverage it!
![Page 32: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/32.jpg)
DON’T Build This!
Giant Glom of GUI-biased code
http://yourcompany/yourapp
![Page 33: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/33.jpg)
Build THIS!
http://yourcompany/yourapp
• Data Access Layer
• Object Construction Layer
• Basic Functional Layer
• Portal Functional Layer
• GUI Adapter Layer
• Web Service Layer
Applications:
• Other Regular Performance Applications
• Higher Performance Applications
• Special / Generic Applications
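A minimal Python sketch of the lower layers of such a stack, showing each layer depending only on the one beneath it and the web service layer staying a thin adapter. All class and function names are hypothetical; the data access layer assumes a PyMongo-style `find_one(filter)`, and the portal and GUI adapter layers are omitted for brevity.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


# Data Access Layer: the only code that knows how documents are stored.
class ContactDAL:
    def __init__(self, coll):
        self.coll = coll  # assumed PyMongo-like: find_one(filter) -> dict or None

    def fetch(self, did: str) -> Optional[dict]:
        return self.coll.find_one({"did": did})


# Object Construction Layer: documents become typed objects.
@dataclass
class Contact:
    did: str
    name: str
    phones: List[str]

    @classmethod
    def from_doc(cls, doc: dict) -> "Contact":
        return cls(doc["did"], doc["name"],
                   [p["number"] for p in doc.get("phones", [])])


# Basic Functional Layer: business logic with no GUI or transport concerns.
class ContactService:
    def __init__(self, dal: ContactDAL):
        self.dal = dal

    def get(self, did: str) -> Optional[Contact]:
        doc = self.dal.fetch(did)
        return Contact.from_doc(doc) if doc else None


# Web Service Layer: only serializes. Higher-performance applications could
# call ContactService directly instead of going through this adapter.
def contact_endpoint(service: ContactService, did: str) -> str:
    contact = service.get(did)
    return json.dumps(asdict(contact) if contact else {"error": "not found"})
```

Because the GUI sits on top rather than in the middle, the same functional layer can serve the web service, the portal and any batch or high-performance consumer.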
![Page 34: Creating a Single View: Data Design and Loading Strategies](https://reader034.vdocuments.mx/reader034/viewer/2022052505/555159cab4c905a8768b4b54/html5/thumbnails/34.jpg)
What Is Happening Next?
Creating A Single View
• Part 1: Overview & Data Analysis
• Part 2: Data Design & Loading Strategies
• Part 3: Securing Your Deployment
  • Access Control
  • Data Protection
  • Auditing