![Page 1: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/1.jpg)
Wake Up and Smell the Data
February, 2013
Mark Madsen
www.ThirdNature.net
@markmadsen
![Page 2: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/2.jpg)
Caveat
The focus of this talk is on information processing and delivery, leaving out many aspects of big data in the automation / execution sense.
![Page 3: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/3.jpg)
Big Data, Big Hype
$876 Gajillion (analyst estimates of the big data market)
![Page 4: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/4.jpg)
We’ve been here before
Bill Schmarzo, EMC
![Page 5: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/5.jpg)
Big Data, Big Nonsense
Big data is subjective, based on bigness at a point in time?
McKinsey focused on the least interesting aspect of big data.
Source: McKinsey
![Page 6: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/6.jpg)
Data volume is the oldest, easiest problem
Image courtesy of Teradata
![Page 7: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/7.jpg)
Technology Capability and Data Volume
Source: Noumenal, Inc.
![Page 8: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/8.jpg)
Origin of BI and data warehouse concepts
The general concept of a separate architecture for BI has been around longer, but this paper by Devlin and Murphy is the first formal data warehouse architecture and definition published.
“An architecture for a business and information system”, B. A. Devlin, P. T. Murphy, IBM Systems Journal, Vol. 27, No. 1 (1988)
Copyright Third Nature, Inc.
![Page 9: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/9.jpg)
Our ideas about information and how it’s used are outdated.
![Page 10: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/10.jpg)
Metadata catalog
![Page 11: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/11.jpg)
Report
![Page 12: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/12.jpg)
Report library
![Page 13: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/13.jpg)
BI is using broken metaphors
We think of BI as publishing, which it isn’t.
![Page 14: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/14.jpg)
When you first give people access to information that was unavailable…
OH GOD
I can see into forever
![Page 15: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/15.jpg)
After a while the response is more measured
![Page 16: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/16.jpg)
User autonomy is a tradeoff
Autonomy is a tradeoff in most data warehouses: control at the expense of complexity.
Complexity for casual users can lead to messes.
So we err on the side of simplifying user access in three ways…
![Page 17: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/17.jpg)
Centralize: that solves all problems!
Creates bottlenecks
Causes scale problems
Enforces a single model
In some organizations and areas of business “data warehouse” is a bad word.
![Page 18: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/18.jpg)
Standardize: it’s simpler for everyone
![Page 19: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/19.jpg)
The “E” in EDW was a lie…
![Page 20: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/20.jpg)
Measurement started with the convenient data
The convenient data is transactional data.
▪ Goes in the DW and is used, even if it isn’t the right measurement.
The difficult and misleading data is declarative data.
▪ What people say and what they do require ground truth.
The inconvenient data is observational data.
▪ It’s not neat, clean, or designed into most systems of operation.
We need to build data systems that integrate all three.
![Page 21: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/21.jpg)
Value: There’s a pony in there somewhere
![Page 22: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/22.jpg)
Many current views miss the point
Using Big Data
![Page 23: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/23.jpg)
It’s not about “big”
Using Big Data
And “big” is often not as big as you think it is.
![Page 24: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/24.jpg)
It’s not really about data, either
Using Big Data
If there’s no process for applying information in a specific context then you are producing expensive trivia.
![Page 25: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/25.jpg)
Two keys to making big data worthwhile
Value: Goal → solution, not solution → goal
Actionability: Simple “value” isn’t enough.
Information has to be actionable, somehow.
![Page 26: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/26.jpg)
Planning data strategy means understanding the context of data use so we can provide infrastructure
Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
Exit points: No problem, No idea, Do nothing
We need to focus on what people do with data as the primary task, not on the data or the technology.
![Page 27: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/27.jpg)
General model for organizational use of data
Collect new data
Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
Exit points: No problem, No idea, Do nothing
Act on the process: usually days/longer timeframe
Act within the process: usually real-time to daily
![Page 28: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/28.jpg)
You need to be able to support both paths
Collect new data
Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
Act on the process
Act within the process
Conventional BI
Causal analysis, i.e. “data science”
![Page 29: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/29.jpg)
How do you manage the business in today’s environment?
Our simplistic notions of BI with stable models, ordered data and predictability are being replaced by concepts from decision support and complex adaptive systems (CAS).
| Simple | Complicated | Complex |
|---|---|---|
| Assumption: Order | Assumption: Unorder | Assumption: Disorder |
| Cause and effect is repeatable & predictable | Cause and effect is separated in time & space, repeatable, learnable | Cause and effect is coherent in retrospect only, modelable but changing |
| Known | Knowable | Unpredictable |
| Standard processes, clear metrics, best practice | Analytical techniques to determine options, effects | Experiment to create possible options |
| Sense, categorize, respond | Sense, analyze, respond | Test, sense, respond |
| Reporting, dashboards | Ad‐hoc, OLAP, exploration | Data science, causal analysis |
Situational context governs data use
![Page 30: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/30.jpg)
BI/DW environment support varies for these contexts
| Basic BI | Analysis | Data science, analytics |
|---|---|---|
| Assumption: Order | Assumption: Unorder | Assumption: Disorder |
| Cause and effect is repeatable & predictable | Cause and effect is separated in time & space, repeatable, learnable | Cause and effect is coherent in retrospect only, modelable but changing |
| Known | Knowable | Unpredictable |
| Standard processes, clear metrics, best practice | Analytical techniques to determine options, effects | Experiment to create possible options, test hypotheses |
| Sense, categorize, respond | Sense, analyze, respond | Test, sense, respond |
| Reporting, dashboards | Ad‐hoc, OLAP, data discovery | Causal analysis, simulation |
| Handles this really well (most of the time). | Handles this sort of ok, sometimes. | This, not so much. |
![Page 31: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/31.jpg)
TANSTAAFL
Technologies are not perfect replacements for one another.
When replacing the old with the new (or ignoring the new over the old) you always make tradeoffs, and usually you won’t see them for a long time.
![Page 32: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/32.jpg)
The usage models for conventional BI
Collect new data
Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
Exit points: No problem, No idea, Do nothing
Act on the process: usually days/longer timeframe
Act within the process: usually real-time to daily
This is what we’ve been doing with BI so far: static reporting, dashboards, ad-hoc query, OLAP
![Page 33: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/33.jpg)
The usage models for analytics and “big data”
Collect new data
Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
Exit points: No problem, No idea, Do nothing
Act on the process: usually days/longer timeframe
Act within the process: usually real-time to daily
Analytics and big data are focused on new use cases: deeper analysis, causes, prediction, optimizing decisions
This isn’t ad-hoc, reporting, or OLAP.
![Page 34: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/34.jpg)
Analytics embiggens the data volume problem
Many of the processing problems are O(n²) or worse, so even moderate data volumes can be a problem for DB‐based platforms.
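To make the quadratic growth concrete, here is a minimal Python sketch. It assumes an all-pairs workload, such as a naive similarity or deduplication pass; that workload and the function name `pairwise_comparisons` are illustrative assumptions, not something the slide specifies.

```python
def pairwise_comparisons(n: int) -> int:
    """Count the distinct pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Each 10x increase in record count gives roughly 100x more pairs to process.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} records -> {pairwise_comparisons(n):>15,} pairs")
```

A million records is a small table by warehouse standards, yet an all-pairs pass over it means roughly half a trillion comparisons, which is why the work, not the storage, becomes the bottleneck.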
![Page 35: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/35.jpg)
New and growing use cases drive the need to expand
The use cases are now interactive applications, lower latency data, complex analytics and discovery rather than reporting.
![Page 36: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/36.jpg)
Big Data Shift in a Nutshell
The old model for data
▪ Centralized publishing
▪ Read only
▪ Integrate before use
▪ Record only important data
▪ Retrieval‐focused
▪ Single method of access
▪ Human‐level latency
The new model for data
▪ Community creation
▪ Read‐write
▪ Integrate at time of use
▪ Record all the data
▪ Processing‐focused
▪ Multiple methods of access
▪ Machine‐level latency
It’s an architectural reconfiguration, just like web 2.0
![Page 37: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/37.jpg)
“The future, according to some scientists, will be exactly like the past, only far more expensive.” ~ John Sladek
![Page 38: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/38.jpg)
About the Presenter
Mark Madsen is president of Third Nature, a research and advisory firm focused on analytics, business intelligence and data management. Mark is an award‐winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institution. He is an international speaker and a contributor to Forbes Online and Information Management. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
![Page 39: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/39.jpg)
About Third Nature
Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure then you’re at the right place.
Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.
We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
![Page 40: Wake up and smell the data](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c6828e4a79598d528b46b0/html5/thumbnails/40.jpg)
CC Image Attributions
Thanks to the people who supplied the Creative Commons licensed images used in this presentation:
Outdated gumshoe.jpg – http://flickr.com/photos/olivander/372385317/
Card catalog – http://flickr.com/photos/deborahfitchett/2372385317/
book of hours manuscript2.jpg ‐ http://flickr.com/photos/jeffrey/89461374/
royal library san lorenzo.jpg ‐ http://flickr.com/photos/cuellar/370663920/
uniform_umbrellas.jpg ‐ http://www.flickr.com/photos/mortimer/221051561/
ponies in field.jpg ‐ http://www.flickr.com/photos/bulle_de/352732514/
caged_tower_melbourne.jpg ‐ http://www.flickr.com/photos/vermininc/2227512763