SQL, noSQL, BigData, Tables, Blobs and more… What’s a developer to do?
David Campbell, Technical Fellow
Overview
A. Describe the Landscape & How to Decide
B. Explain “Big Data”
C. Map/Reduce Drill-Down
Answer Questions
Audience Participation…
Life Was Simple
“Forms Over Data”
Not anymore…
• Device / Cloud
• Multi-dimensional Experiences
• Social Integration
• Rapid Evolution
• Volatile Scale
A Storage Zoo…
The Result
What do Developers Want?
Rapid Development and Evolution
• Persistence Ignorance
• Schema Evolution / Dynamic Schema
Friction Free Scaling
• O(1) Management Scale
• Partition Ignorance
• HA & Resilience
Maximize Return on Available Data
• Audience Analytics
• Recommendations
How do we make sense of this?
• Data Model
• Consistency Model
• Cluster Model
• Query Model
• View Model
A Conceptual Model
It’s Simple – Really!
Smart Choice = Separation & Composition
Entity Framework Code First Migrations
The Cost of Consistency
Cost ~ {friction, performance, availability, …}
System Implementation Level: Machine, Rack, Data Center, Internet
Data Model Level: Attribute, Entity, Shard, Database
ACID consistency within members (shards)
Eventual consistency across members
SQL Azure DB Federations
M1 M2 M3 M4 M5
Root
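The federation picture above (a root plus members M1 to M5) can be read as key routing: each key belongs to exactly one member, work inside that member can be ACID, and anything spanning members is only eventually consistent. The sketch below illustrates that idea with range-based routing. The split points, the `member_for` and `group_by_member` helpers, and the sample keys are all hypothetical; this is not the SQL Azure Federations API.

```python
from bisect import bisect_right

# Hypothetical split points: member i owns keys in [BOUNDS[i-1], BOUNDS[i]).
BOUNDS = [100, 200, 300, 400]
MEMBERS = ["M1", "M2", "M3", "M4", "M5"]

def member_for(key: int) -> str:
    """Return the federation member (shard) that owns this key."""
    return MEMBERS[bisect_right(BOUNDS, key)]

def group_by_member(keys):
    """Group keys by owning member.

    Each group touches a single member, so it could run as one ACID
    transaction; coordination across groups is where eventual
    consistency comes in.
    """
    groups = {}
    for k in keys:
        groups.setdefault(member_for(k), []).append(k)
    return groups

if __name__ == "__main__":
    print(group_by_member([5, 150, 199, 420]))
```

Keeping the routing table in one place (the "root" role in the diagram) is what lets application code stay mostly unaware of how the data is partitioned.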
Takeaway: How to Choose
• Conceptual Model Drives Smart Choices
• You can mix and match – baby & bathwater, etc.
• TNSTAAFL
You are now smarter than most bloggers on this topic!
Azure Offerings
• Azure Blob Storage: Elastic inexpensive storage
• Azure Tables: Elastic key/attribute storage
• Azure Caching: Elastic key/object cache
• Azure SQL Database: Elastic RDBMS with sharding capabilities
Explaining “Big Data”
What is “Big Data” really about?
• Awash in “Ambient Data”
  • Free to acquire
  • Cheap to store
• “Information Production”
  • Turns Ambient Data into Information
• Insight Generation
  • Turns Information into Insights & Actions
Top Level Value Flow
Ambient Data
Information Production
Insights & Actions
Data Acquisition Cost → $0
[Chart: cost axis with ticks at $0.00, $1.10, $1,000 and $1,000,000,000]
From: $1B/TB To: ~$0/TB
Data Storage Cost → $0
• December 1981: $660M/TB
• August 2010: $100/TB
From: $660,000,000/TB To: $100/TB in 30 years
Source: http://www.littletechshoppe.com/ns1625/winchest.html
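To put the two storage prices above in perspective, the arithmetic can be worked through as follows. This is a rough sketch that assumes a constant exponential decline between the two quoted data points, which the slide does not claim; the function name `halving_time_years` is made up for illustration.

```python
import math

start_cost = 660_000_000  # $/TB, December 1981 (from the slide)
end_cost = 100            # $/TB, August 2010 (from the slide)

# Elapsed time between the two data points: 28 years and 8 months.
years = (2010 + 8 / 12) - (1981 + 12 / 12)

def halving_time_years(start, end, span_years):
    """Years for the cost to halve, ASSUMING a constant exponential decline."""
    halvings = math.log2(start / end)
    return span_years / halvings

factor = start_cost / end_cost
halving = halving_time_years(start_cost, end_cost, years)

if __name__ == "__main__":
    print(f"total drop: {factor:,.0f}x over {years:.1f} years")
    print(f"implied halving time: {halving * 12:.1f} months")
```

Under that assumption the price per terabyte fell by a factor of 6.6 million, which works out to a halving roughly every 15 months.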
The Big Dataflow…
Digital Shoebox
[Diagram: many Source boxes and an Information Production stage, laid out along a Time axis]
Traditional Systems
• Data Warehouses / Marts
• Cubes
• …
Emergent Systems
• Deep data mining
• Machine Learning
• Near real-time prediction
• …
Standard Data Analytics Lifecycle
1. Question
2. Collect the data
3. Build a logical model
4. Build a physical model
5. Load the data
6. Tune
7. Answer the question
Often weeks to months
Lifecycle of a Question
[Flowchart labels:]
• Question
• Worth asking again?
• Make it repeatable
• Bring it to production
• Validation
• Different Question
• Not interesting
Personal Example - GPS
[Diagram: Source with transforms T1, T2, T3, T4, T5]
• Tree of transforms and filters
• Cleansing often happens in transformed domain
  • E.g. Where I slept each night…
• Can produce higher level information
  • [DwellAtHome],[RouteToWork],[DwellAtWork] = ‘Commute to work’
• Using higher level information:
  • Commute duration f(leavingTime)
Commute Time as f(leaveTime)
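The step from labeled segments to a "Commute to work" event, and from there to commute duration as a function of leaving time, might look something like the sketch below. It assumes earlier transforms have already turned raw GPS points into labeled `(label, start, end)` segments; that tuple layout, the `find_commutes` helper, and the sample day are invented for illustration and are not from the talk.

```python
from datetime import datetime, timedelta

PATTERN = ["DwellAtHome", "RouteToWork", "DwellAtWork"]

def find_commutes(segments):
    """Scan labeled (label, start, end) segments for home -> route -> work.

    Returns one (leave_time, duration) pair per detected commute, where
    the leave time and duration come from the RouteToWork segment.
    """
    commutes = []
    for i in range(len(segments) - 2):
        window = segments[i:i + 3]
        if [label for label, _, _ in window] == PATTERN:
            _, leave, arrive = window[1]
            commutes.append((leave, arrive - leave))
    return commutes

# Made-up sample data standing in for the output of earlier transforms.
day = datetime(2011, 6, 9)
segments = [
    ("DwellAtHome", day, day + timedelta(hours=7)),
    ("RouteToWork", day + timedelta(hours=7),
     day + timedelta(hours=7, minutes=35)),
    ("DwellAtWork", day + timedelta(hours=7, minutes=35),
     day + timedelta(hours=17)),
]

if __name__ == "__main__":
    for leave, duration in find_commutes(segments):
        print(leave.time(), duration)
```

Collecting these pairs over many days gives the points for a commute-time-versus-leave-time plot.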
Event & State Correlation
2011-06-10 06:18:26, 2011-06-10 06:16:18, 0.04
2011-06-10 06:21:18, 2011-06-09 08:27:50, 21.89
2011-06-10 06:24:37, 2011-06-09 07:43:58, 22.68
2011-06-10 06:26:48, None, 0.00
2011-06-10 06:29:37, 2011-06-09 06:53:34, 23.60
2011-06-10 06:34:41, 2011-06-09 12:00:25, 18.57
2011-06-10 06:39:52, 2011-06-09 17:44:54, 12.92
2011-06-10 06:43:18, 2011-06-09 14:28:49, 16.24
Dwell geolocation + Outlook statistics = How much email do I send from home vs. at work?
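Correlating events (emails sent) with state (where I was dwelling at that moment) can be sketched as an interval lookup. The example below is a minimal illustration under assumed data shapes: dwell periods as `(place, start, end)` tuples and send events as timestamps. The helper names and the sample values are hypothetical and are not taken from the slide's data.

```python
from collections import Counter
from datetime import datetime

def place_at(when, dwells):
    """Return the place whose dwell interval [start, end) contains `when`."""
    for place, start, end in dwells:
        if start <= when < end:
            return place
    return None

def emails_by_place(sent_times, dwells):
    """Count send events per dwell place; events outside any dwell are 'other'."""
    return Counter(place_at(t, dwells) or "other" for t in sent_times)

# Made-up sample data (not the values shown on the slide).
dwells = [
    ("home", datetime(2011, 6, 9, 0, 0), datetime(2011, 6, 9, 7, 0)),
    ("work", datetime(2011, 6, 9, 7, 35), datetime(2011, 6, 9, 17, 0)),
]
sent = [
    datetime(2011, 6, 9, 6, 53),
    datetime(2011, 6, 9, 7, 10),
    datetime(2011, 6, 9, 8, 27),
    datetime(2011, 6, 9, 12, 0),
]

if __name__ == "__main__":
    print(emails_by_place(sent, dwells))
```

A linear scan is fine for a handful of dwell periods; with many intervals, sorting them and using binary search would be the natural next step.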
What’s the deal with Hadoop and other Map/Reduce systems?
• Developer Friendly Information Production Machine
• Simple to Understand
• Simple to Develop For
• Inherently Scalable
Map / Reduce Systems
EYNTK about MapReduce on One Slide
[Diagram: four Map boxes and two Reduce boxes, with stages numbered 1 to 5]
1. MapReduce framework splits input up into groups of data
2. MapReduce framework calls your Map function – Map(input)
   a) Your Map function processes input and returns 0 or more (key,value) pairs
3. MapReduce framework collates keys (“Shuffle”)
4. MapReduce framework calls your Reduce function – Reduce(key, []values)
   a) Your Reduce function processes values and returns a result
5. MapReduce framework writes your result to the filesystem
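The five steps above can be mimicked in a few lines of single-process Python, using word count as the example job. This is only a toy sketch to show the shape of the contract between the framework and your two functions: `run_mapreduce`, `map_fn`, and `reduce_fn` are illustrative names, nothing here is the Hadoop or HDInsight API, and a real framework would distribute the work across machines.

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn, n_splits=2):
    # 1. Split the input into groups of data.
    splits = [records[i::n_splits] for i in range(n_splits)]
    # 2. Call Map on every record; each call returns 0 or more pairs.
    pairs = [kv for split in splits for rec in split for kv in map_fn(rec)]
    # 3. Shuffle: collate values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # 4. Call Reduce once per key with that key's list of values.
    results = [reduce_fn(k, vs) for k, vs in sorted(grouped.items())]
    # 5. A real framework writes results to the filesystem; we return them.
    return results

if __name__ == "__main__":
    print(run_mapreduce(["a b a", "b c"], map_fn, reduce_fn))
```

The user supplies only `map_fn` and `reduce_fn`; splitting, shuffling, and output are the framework's job, which is what makes the model simple to develop for.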
HDInsight
• Hadoop on Windows {Azure, Server, Laptop}
• Hortonworks HDP distribution
• .NET Map/Reduce API
• Linq to Hive
Let’s Look at Some Code…
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.