wake up and smell the data
Post on 27-Jan-2015
Wake Up and Smell the Data
February, 2013
Mark Madsen, www.ThirdNature.net, @markmadsen
Caveat
The focus of this talk is on information processing and delivery, leaving out many aspects of big data in the automation / execution sense.
Big Data, Big Hype
$876 Gajillion (analyst estimates of the big data market)
We’ve been here before
Bill Schmarzo, EMC
Big Data, Big Nonsense
Big data is subjective, based on bigness at a point in time?
McKinsey focused on the least interesting aspect of big data.
Source: McKinsey
Data volume is the oldest, easiest problem
Image courtesy of Teradata
Technology Capability and Data Volume
Source: Noumenal, Inc.
Origin of BI and data warehouse concepts
The general concept of a separate architecture for BI has been around longer, but this paper by Devlin and Murphy is the first formal data warehouse architecture and definition published.
“An architecture for a business and information system”, B. A. Devlin, P. T. Murphy, IBM Systems Journal, Vol.27, No. 1, (1988)
Copyright Third Nature, Inc.
Our ideas about information and how it’s used are outdated.
Metadata catalog
Report
Report library
BI is using broken metaphors
We think of BI as publishing, which it isn’t.
When you first give people access to information that was unavailable…
OH GOD I can see into forever
After a while the response is more measured
User autonomy is a tradeoff
Autonomy is a tradeoff in most data warehouses: control at the expense of complexity.
Complexity for casual users can lead to messes.
So we err on the side of simplifying user access in three ways…
Centralize: that solves all problems!
Creates bottlenecks
Causes scale problems
Enforces a single model
In some organizations and areas of business “data warehouse” is a bad word.
Standardize: it’s simpler for everyone
The “E” in EDW was a lie…
Measurement started with the convenient data
The convenient data is transactional data.
▪ Goes in the DW and is used, even if it isn’t the right measurement.
The difficult and misleading data is declarative data.
▪ What people say and what they do require ground truth.
The inconvenient data is observational data.
▪ It’s not neat, clean, or designed into most systems of operation.
We need to build data systems that integrate all three.
Value: There’s a pony in there somewhere
Many current views miss the point
Using Big Data
It’s not about “big”
And “big” is often not as big as you think it is.
It’s not really about data, either
If there’s no process for applying information in a specific context then you are producing expensive trivia.
Two keys to making big data worthwhile
Value: Goal → solution, not solution → goal
Actionability: Simple “value” isn’t enough.
Information has to be actionable, somehow.
Planning data strategy means understanding the context of data use so we can provide infrastructure
[Diagram: Monitor → Analyze Exceptions → Analyze Causes → Decide → Act; exits along the way: No problem, No idea, Do nothing]
We need to focus on what people do with data as the primary task, not on the data or the technology.
General model for organizational use of data
[Diagram: Monitor → Analyze Exceptions → Analyze Causes → Decide → Act; exits along the way: No problem, No idea, Do nothing; also labeled: Collect new data]
▪ Act on the process: usually days/longer timeframe
▪ Act within the process: usually real-time to daily
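The model above can be read as a small decision loop. The following is a minimal sketch of one pass through it, not the presenter's implementation: the function name `run_cycle`, the threshold rule, and the lookup tables `CAUSES` and `ACTIONS` are all hypothetical, and routing "No idea" to "Collect new data" is an assumption about how the diagram connects those two labels.

```python
# Hypothetical lookup tables: exception type -> cause, cause -> action.
CAUSES = {"below expected": "stockout"}
ACTIONS = {"stockout": "reorder inventory"}


def run_cycle(value: float, expected: float, tolerance: float,
              causes: dict[str, str], actions: dict[str, str]) -> str:
    """One pass through monitor -> analyze exceptions -> analyze causes -> decide -> act."""
    # Monitor: compare the observed value with what was expected.
    deviation = value - expected
    if abs(deviation) <= tolerance:
        return "no problem"

    # Analyze exceptions: classify the deviation.
    exception = "above expected" if deviation > 0 else "below expected"

    # Analyze causes: without a known cause, more data is needed (assumed link).
    cause = causes.get(exception)
    if cause is None:
        return "no idea: collect new data"

    # Decide: a known cause may still not warrant any action.
    action = actions.get(cause)
    if action is None:
        return "do nothing"

    # Act.
    return f"act: {action}"


print(run_cycle(95, 100, 10, CAUSES, ACTIONS))
print(run_cycle(80, 100, 10, CAUSES, ACTIONS))
print(run_cycle(130, 100, 10, CAUSES, ACTIONS))
```

In this sketch the early exits are cheap and frequent, which matches the real-time-to-daily path; filling in the `causes` table is the slower work that happens on the longer timeframe.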
You need to be able to support both paths
[Diagram: Monitor → Analyze Exceptions → Analyze Causes → Decide → Act; also labeled: Collect new data]
Act on the process
Act within the process
Conventional BI
Causal analysis, i.e. “data science”
How do you manage the business in today’s environment?
Our simplistic notions of BI with stable models, ordered data and predictability are being replaced by concepts from decision support and complex adaptive systems (CAS).
Simple (assumption: order)
▪ Cause and effect is repeatable & predictable
▪ Known
▪ Standard processes, clear metrics, best practice
▪ Sense, categorize, respond
▪ Reporting, dashboards
Complicated (assumption: unorder)
▪ Cause and effect is separated in time & space, repeatable, learnable
▪ Knowable
▪ Analytical techniques to determine options, effects
▪ Sense, analyze, respond
▪ Ad‐hoc, OLAP, exploration
Complex (assumption: disorder)
▪ Cause and effect is coherent in retrospect only, modelable but changing
▪ Unpredictable
▪ Experiment to create possible options
▪ Test, sense, respond
▪ Data science, causal analysis
Situational context governs data use
BI/DW environment support varies for these contexts
Basic BI (assumption: order): handles this really well (most of the time).
▪ Cause and effect is repeatable & predictable
▪ Known
▪ Standard processes, clear metrics, best practice
▪ Sense, categorize, respond
▪ Reporting, dashboards
Analysis (assumption: unorder): handles this sort of ok, sometimes.
▪ Cause and effect is separated in time & space, repeatable, learnable
▪ Knowable
▪ Analytical techniques to determine options, effects
▪ Sense, analyze, respond
▪ Ad‐hoc, OLAP, data discovery
Data science, analytics (assumption: disorder): this, not so much.
▪ Cause and effect is coherent in retrospect only, modelable but changing
▪ Unpredictable
▪ Experiment to create possible options, test hypotheses
▪ Test, sense, respond
▪ Causal analysis, simulation
TANSTAAFL
Technologies are not perfect replacements for one another.
When replacing the old with the new (or ignoring the new over the old) you always make tradeoffs, and usually you won’t see them for a long time.
The usage models for conventional BI
[Diagram: Monitor → Analyze Exceptions → Analyze Causes → Decide → Act; exits along the way: No problem, No idea, Do nothing; also labeled: Collect new data]
▪ Act on the process: usually days/longer timeframe
▪ Act within the process: usually real-time to daily
This is what we’ve been doing with BI so far: static reporting, dashboards, ad-hoc query, OLAP
The usage models for analytics and “big data”
[Diagram: Monitor → Analyze Exceptions → Analyze Causes → Decide → Act; exits along the way: No problem, No idea, Do nothing; also labeled: Collect new data]
▪ Act on the process: usually days/longer timeframe
▪ Act within the process: usually real-time to daily
Analytics and big data are focused on new use cases: deeper analysis, causes, prediction, optimizing decisions
This isn’t ad-hoc, reporting, or OLAP.
Analytics embiggens the data volume problem
Many of the processing problems are O(n²) or worse, so even moderate data volumes can be a problem for DB‐based platforms.
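To make the quadratic growth concrete, here is a small worked example. It assumes an all-pairs comparison (for instance, matching every record against every other record) as a stand-in for a quadratic workload; the talk does not name a specific algorithm, and `pairwise_comparisons` is a hypothetical helper.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Each 10x increase in records yields roughly a 100x increase in work.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} records -> {pairwise_comparisons(n):>15,} pairs")
```

A million records is modest by storage standards, yet under this assumption it already implies about half a trillion comparisons.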
New and growing use cases drive the need to expand
The use cases are now interactive applications, lower latency data, complex analytics and discovery rather than reporting.
Big Data Shift in a Nutshell
The old model for data
▪ Centralized publishing
▪ Read only
▪ Integrate before use
▪ Record only important data
▪ Retrieval‐focused
▪ Single method of access
▪ Human‐level latency
The new model for data
▪ Community creation
▪ Read‐write
▪ Integrate at time of use
▪ Record all the data
▪ Processing‐focused
▪ Multiple methods of access
▪ Machine‐level latency
It’s an architectural reconfiguration, just like web 2.0
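One way to picture the difference between "integrate before use" and "integrate at time of use" is the sketch below. It is an illustration under stated assumptions, not a description of any particular product: the sample events, the field names, and the helpers `load_integrated` and `query_at_use` are all hypothetical.

```python
import json

# Hypothetical raw events as they arrive; the second one lacks a "device" field.
RAW_EVENTS = [
    '{"user": "a", "amount": 10, "device": "phone"}',
    '{"user": "b", "amount": 5}',
]


def load_integrated(raw, columns=("user", "amount")):
    """Old model: conform every record to a fixed schema at load time.

    Fields outside the schema are dropped and cannot be queried later.
    """
    return [{c: json.loads(r).get(c) for c in columns} for r in raw]


def query_at_use(raw, field):
    """New model: keep the raw records and interpret them when they are read."""
    return [json.loads(r).get(field) for r in raw]


warehouse = load_integrated(RAW_EVENTS)
print(warehouse)
print(query_at_use(RAW_EVENTS, "device"))
```

The tradeoff in this sketch is that the integrated table is cheaper to read but has already discarded `device`, while the raw store keeps everything and pushes the parsing cost to every read.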
“The future, according to some scientists, will be exactly like the past, only far more expensive.” ~ John Sladek
About the Presenter
Mark Madsen is president of Third Nature, a research and advisory firm focused on analytics, business intelligence and data management. Mark is an award‐winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor at Forbes Online and Information Management. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
About Third Nature
Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure, then you’re at the right place.
Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.
We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
CC Image Attributions
Thanks to the people who supplied the creative commons licensed images used in this presentation:
▪ Outdated gumshoe.jpg - http://flickr.com/photos/olivander/372385317/
▪ Card catalog - http://flickr.com/photos/deborahfitchett/2372385317/
▪ book of hours manuscript2.jpg - http://flickr.com/photos/jeffrey/89461374/
▪ royal library san lorenzo.jpg - http://flickr.com/photos/cuellar/370663920/
▪ uniform_umbrellas.jpg - http://www.flickr.com/photos/mortimer/221051561/
▪ ponies in field.jpg - http://www.flickr.com/photos/bulle_de/352732514/
▪ caged_tower_melbourne.jpg - http://www.flickr.com/photos/vermininc/2227512763