minn twdi 9 9
DESCRIPTION
TDWI Presentation on Using Big Data Effectively
TRANSCRIPT
![Page 1: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/1.jpg)
Big Data Analytics
Using the information effectively
![Page 2: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/2.jpg)
9/10/2013 | 2 | ©2013 Ciber, Inc.
Agenda
• Big Data
• Making Sense of it all
• A Framework of Understanding
• Topical information
• Non-topical information
• Analytics
• Examples
• Getting there
• Q&A
![Page 3: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/3.jpg)
Social Network Diagram
• Contextual analytics is one of the hottest areas of interest pertaining to big data today
• Smart companies know there is tremendous value in contextual analytics. But aggregating, categorizing, summarizing, exploring and contextualizing unstructured data is a big undertaking.
![Page 4: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/4.jpg)
Big Data
![Page 5: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/5.jpg)
What is the Big Data market?
Source: “Big Data Market Size and Vendor Revenues”, Wikibon, Jeff Kelly, David Vellante, David Floyer, Feb 2013 – actual data through 2011
Acronyms: TBD = to be determined; SI = systems integrator; BPO = business process outsourcing
![Page 6: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/6.jpg)
Sample Industry Applications of Big Data
Telco
Call Detail Record (CDR) analytics for:
• Customer service
• Network planning
• Regulatory compliance

Financial Services
Transaction analytics for:
• Fraud detection
• Customer retention
• Distribution network planning (Branch, ATM, Call Center)
• Regulatory compliance
• Consumer card / Merchant activity

Utilities
Network / Process analytics for:
• Grid monitoring / reliability studies
• Preventive Maintenance
• Power production monitoring / planning

Retail
Product analytics for:
• Market Basket analytics
• SKU trending
• Competitive analyses
• Context-aware buying
• Social indicators of brand

Healthcare
Patient analytics for:
• Cost of care reduction
• Quality of care improvement
• Claims optimization
• Service provider consistency
• Outcome diagnostics
• Regulatory compliance
![Page 7: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/7.jpg)
Why Big Data? Insights from Analysis
• Time college football products to win customers
– WalmartLabs: social media buzz indicates when customers are getting excited about the upcoming season and their team(s). Combined with the ShopyCat app, this provides targeted promos on team items.
• Detecting nosocomial infections before they kill infants
– Toronto hospital: nosocomial infections can be life-threatening to premature infants if not treated quickly. Neonatal monitoring with real-time analytics can detect heartbeat patterns that identify an infection before symptoms appear.
![Page 8: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/8.jpg)
Wal-Mart handles more than 1 million customer transactions every hour, which are imported into databases containing more than 2.5 petabytes of data
Volume Velocity Variety
1M/hour
In addition to the structured data from all procedure, claims and payment systems, add the unstructured data in EMRs, patient monitoring devices, publications, drug structures, social network comments, carrier health sites, post-treatment care records…
80%
2.7 zettabytes exist in the digital universe as of early 2013
1 zettabyte = 1,000 exabytes = 1,000,000 petabytes = 10^9 terabytes = 10^12 gigabytes
What Drives Big Data Analytics
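The unit ladder on this slide is plain arithmetic, so it can be checked in a few lines. The sketch below assumes decimal (SI) units, as the slide does; the function name `convert` and the `BYTES_PER_UNIT` table are illustrative and do not come from the presentation.

```python
from fractions import Fraction

# Decimal (SI) byte units, matching the slide: 1 ZB = 1,000 EB, and so on.
BYTES_PER_UNIT = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value, from_unit, to_unit):
    """Convert a data volume between units using exact rational arithmetic."""
    # Fraction(str(...)) keeps values such as 2.7 exact instead of binary floats.
    return Fraction(str(value)) * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


if __name__ == "__main__":
    for unit in ("EB", "PB", "TB", "GB"):
        print(f"1 ZB = {int(convert(1, 'ZB', unit)):,} {unit}")
    print(f"2.7 ZB = {int(convert(2.7, 'ZB', 'PB')):,} PB")
```

Exact fractions are used here only so that the round numbers on the slide compare equal without floating-point noise.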
![Page 9: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/9.jpg)
The Big Data Ecosystem
[Diagram labels]
Sources: Engineering; Social/Mobile; Enterprise Systems; Customer Loyalty & Service Systems; Customer Case Files; E-Mails; Audio; Images; Provisioning Systems; Blogs, Communities
Characteristics: Variety; Veracity; Velocity; Volume
Analysis: Predictive Analytics; CEP; Operational Control; Simulation; Social Analytics; Digital Marketing; Web Analytics
Business Outcomes
![Page 10: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/10.jpg)
Hadoop and other options
• A strategy for bringing together hardware and software
• What choices are available and how do you choose the best option?
• How do I govern it?
![Page 11: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/11.jpg)
Big Data Toolscape
![Page 12: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/12.jpg)
There are Many Use Cases for a Big Data Platform
Innovate New Products at Speed and Scale
• Social Media - Product/brand Sentiment analysis
• Brand strategy
• Market analysis
• RFID tracking & analysis
• Transaction analysis to create insight-based product/service offerings

Instant Awareness of Risk and Fraud
• Multimodal surveillance
• Cyber security
• Fraud modeling & detection
• Risk modeling & management
• Regulatory reporting

Know Everything about your Customer
• Social media customer sentiment analysis
• Promotion optimization
• Segmentation
• Customer profitability
• Click-stream analysis
• CDR processing
• Multi-channel interaction analysis
• Loyalty program analytics
• Churn prediction

Run Zero Latency Operations
• Smart Grid/meter management
• Distribution load forecasting
• Sales reporting
• Inventory & merchandising optimization
• Options trading
• ICU patient monitoring
• Disease surveillance
• Transportation network optimization
• Store performance
• Environmental analysis
• Experimental research

Exploit Instrumented Assets
• Network analytics
• Asset management and predictive issue resolution
• Website analytics
• IT log analysis
![Page 13: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/13.jpg)
Processing and Archiving Strategies
• Store forever
• Selective storage
• Throw away after processing
![Page 14: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/14.jpg)
Making Sense of it all
![Page 15: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/15.jpg)
Making sense of it all
• Clarity of purpose
• Definition of scope
• Allocation of resources
• Concrete result expectations
• Comparative Analytical Measures (e.g. KPIs)
– Rationalization of measures into actionable items and hierarchical groups
– Defining predictive analytics workspaces
![Page 16: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/16.jpg)
Role of the Data Scientist
• Creating Intelligent Tagging
• Selecting tools for analysis
• Defining algorithms and data mining techniques
![Page 17: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/17.jpg)
A Framework of Understanding
![Page 18: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/18.jpg)
What is Contextualization?
• Context is the set of interrelated conditions in which something exists or occurs. Environment, setting, timeline and genre help define context.
• Why is context important?
– Consistency is needed in returned result sets
– The context describes the internal or external “framework”
– Internal contextual information is crucial
– External contextual information is knowledge that cannot be obtained from the text of the item itself
– Time and resources are wasted in searching irrelevant and non-material information
![Page 19: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/19.jpg)
Problems in searching data
• Voluminous
• Ambiguous meanings
• Inconsistent tagging
• Multiple item types – text, formatted, PDF, TIFF, graphical, blogs, mashups
• Knowledge of what is wanted is required to understand and return the proper result sets
• Differentiation is necessary between
– Real-time needs (e.g. fraud detection, medical emergency room procedures)
– Near-time needs (sometime in the near timeline)
– Relaxed-time needs (some clearly defined future period)
![Page 20: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/20.jpg)
Topical information
• Topical information is generally visible in the data stream
– Keywords, data ranges, etc.
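Because topical criteria are visible in the item itself, they can be tested directly against the data. The following is a minimal sketch of that idea, assuming a hypothetical `Item` record with a free-text field and one numeric field; the field names, sample records and thresholds are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    amount: float  # hypothetical numeric field carried in the data stream


def matches_topic(item, keywords, low, high):
    """True if the item mentions any keyword and its amount lies in [low, high]."""
    words = item.text.lower().split()
    has_keyword = any(k.lower() in words for k in keywords)
    return has_keyword and low <= item.amount <= high


# Illustrative records only; they are not taken from the presentation.
items = [
    Item("wire transfer flagged for review", 12500.0),
    Item("routine card payment", 42.0),
    Item("wire transfer to savings", 300.0),
]

hits = [i for i in items if matches_topic(i, ["wire"], 1000, 50000)]
print([i.text for i in hits])
```

Nothing outside the item is consulted here, which is what separates topical criteria from the non-topical ones discussed next.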
![Page 21: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/21.jpg)
Non-topical information
• Has to be retrieved outside the item
– Although topic is crucial to the relevance of an item, non-topical criteria play an important role in the determination of relevance and significance
– The identification and use of non-content (or “context”) descriptors is necessary
– How widely agreed upon are the values of a given criterion among users (or user groups)?
![Page 22: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/22.jpg)
Non-topical information cont’d
– What is the degree to which an attribute-value is “public” or “private”?
• How useful is each criterion for the search tasks to be addressed by the specific query system?
• How easily can a criterion be identified and assigned to an item?
• What methods can be applied for refining and speeding retrievals?
![Page 23: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/23.jpg)
Descriptors - Defining disambiguation
• Do the content descriptors correspond or relate to non-topical relevance criteria of the system’s users?
• Will users see a relationship between their relevance criteria and these descriptors, and use these descriptors in their search queries?
![Page 24: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/24.jpg)
Content descriptors
• Content descriptors (topical relevance criteria) – “Public” knowledge:
• People of similar cultural backgrounds would (more or less) agree on the meanings. However, context descriptors (which can function as non-topical relevance criteria) can vary widely in the degree to which their attribute-values are considered public or private.
![Page 25: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/25.jpg)
Public Knowledge Examples
• “Has pictures” is a criterion that could be considered “public” as most people could agree on whether or not a document “has pictures”, if given a specific document to evaluate.
• On the other hand, the criterion of “Regency Era” is highly situation-dependent, i.e. only a limited subset of the public has knowledge of it (specifically, the period between 1811 and 1820, when King George III was deemed unfit to rule and his son, the Prince of Wales, ruled as his proxy as Prince Regent).
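One way to make the “public” versus “private” distinction measurable is to ask several users to judge the same item and see how often they agree. The sketch below is only an illustration of that idea: the judgments are made up, and the 0.8 cut-off and the function names `agreement` and `classify` are assumptions rather than anything stated in the presentation.

```python
from collections import Counter


def agreement(judgments):
    """Share of users who gave the most common value for a criterion."""
    if not judgments:
        raise ValueError("need at least one judgment")
    most_common_count = Counter(judgments).most_common(1)[0][1]
    return most_common_count / len(judgments)


def classify(judgments, threshold=0.8):
    """Label a criterion 'public' when agreement meets the assumed threshold."""
    return "public" if agreement(judgments) >= threshold else "private"


# Hypothetical answers from five users evaluating the same document.
has_pictures = ["yes", "yes", "yes", "yes", "yes"]
regency_era = ["yes", "no", "unsure", "no", "unsure"]

print(classify(has_pictures), agreement(has_pictures))
print(classify(regency_era), agreement(regency_era))
```

A criterion that scores low on such a measure is a weaker candidate for a search descriptor, since users are less likely to apply it consistently in their queries.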
![Page 26: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/26.jpg)
Genres refine taxonomy
• Genre is a “folk typology”
• Item categories must enjoy widespread recognition by their intended user groups to qualify as genres.
– Examples: Resumes, Ballet, Music, Chemical formulae, statistical results
• Groups of people agree on and define Genres by mutual consent (Explicitly and Implicitly)
– E.g. Taxonomies (plants, accounting, medical), laws, voting, polls
• Genres give rise to sub-genres with increasing granularity
– E.g. Music, classical, romantic, new age, atonal
– Genres and sub-genres may contain common elements
• E.g. classical music and romantic music may have an intersection of data points
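The overlap between sub-genres can be sketched with ordinary set operations. In the example below, the descriptor sets are hypothetical and chosen only to show the mechanics; `shared_descriptors` and `overlap_ratio` are illustrative names, and the ratio used is the Jaccard similarity (intersection size over union size).

```python
# Hypothetical descriptor sets for sub-genres of the genre "music".
genre_descriptors = {
    "classical": {"orchestra", "sonata form", "notated score", "symphony"},
    "romantic": {"orchestra", "notated score", "symphony", "programmatic themes"},
    "new age": {"synthesizer", "ambient texture"},
}


def shared_descriptors(a, b):
    """Descriptors that two sub-genres have in common."""
    return genre_descriptors[a] & genre_descriptors[b]


def overlap_ratio(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = genre_descriptors[a] | genre_descriptors[b]
    return len(shared_descriptors(a, b)) / len(union) if union else 0.0


print(sorted(shared_descriptors("classical", "romantic")))
print(overlap_ratio("classical", "romantic"))
print(overlap_ratio("classical", "new age"))
```

A high overlap suggests that a query phrased in terms of one sub-genre may reasonably return items tagged with the other.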
![Page 27: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/27.jpg)
Genre knowledge
• Genre is a type based on purpose, form and content.
– E.g. The “resume” genre is for soliciting employment, divided into sections with contextual descriptors
• Knowing a particular item’s genre also implies significant things about the item, sometimes enough to make a judgment regarding the item’s relevance to an information need
– E.g. The phrase “Classically Trained Musician” implies the ability to read music and understand musical terminology, along with additional shades of musical knowledge
![Page 28: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/28.jpg)
Analytics
![Page 29: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/29.jpg)
Historical Analytics
• Presentation of historical data
– Dashboards, Drill-downs, interactive reports, static reports
– New methods and devices
– Identifying the metrics that affect key objectives
– Synchronizing those metrics through an organization
– Creating user tools to show effects of good (and bad) choices
– Tying the financial, operational, and sales worlds together
– Analyzing to predict the future
– Refining models for accuracy
![Page 30: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/30.jpg)
Predictive Analytics
• Manipulation of data
– Dashboards, Drill-downs, interactive reports
– New methods and devices
– Varying the metrics that affect key objectives
– Synchronizing the impact of metrics through an organization
– Creating user tools to show effects of good (and bad) choices
– Tying the financial, operational, and sales worlds together
– Creating models that show potential future scenarios
– Refining models for accuracy using advanced tools and statistics
![Page 31: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/31.jpg)
Examples
![Page 32: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/32.jpg)
Examples of Harnessing Data Resources
• Retailer reduces time to run queries by 80% to optimize inventory
• Stock Exchange cuts queries from 26 hours to 2 minutes on 2 PB
• Government cuts acoustic analysis from hours to 70 milliseconds
• Utility avoids power failures by analyzing 10 PB of data in minutes
• Telco analyses streaming network data to reduce hardware costs by 90%
• Hospital analyses streaming vitals to detect illness 24 hours earlier
Big data challenges exist in every organization today
![Page 33: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/33.jpg)
In Order to Realize New Opportunities, You Need to Think Beyond Traditional Sources of Data
• Transactional and Application Data: Volume, Structured, Throughput
• Machine Data: Velocity, Semi-structured, Ingestion
• Social Data: Variety, Highly unstructured, Veracity
• Enterprise Content: Variety, Highly unstructured, Volume
![Page 34: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/34.jpg)
Two Sample Types of Big Data
• Data at rest – oceans
– Collection of what has streamed
– Web logs, emails, social media
– Unstructured documents: forms, claims
– Structured data from disparate systems
• Data in movement – streams
– Twitter / Facebook comments
– Stock market data
– Sensors: vital signs of a newborn
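The practical difference between the two types shows up in how a computation is written. The sketch below contrasts a one-pass batch average over stored readings with a running average that is updated as each reading arrives; the heart-rate values and function names are hypothetical and serve only as an illustration.

```python
from statistics import mean


def batch_average(readings):
    """Data at rest: the full collection is available, so average it in one pass."""
    return mean(readings)


def streaming_average(readings):
    """Data in movement: yield a running average after each reading arrives,
    keeping only a count and a running total rather than the full history."""
    count, total = 0, 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings (beats per minute).
heart_rates = [140, 150, 160, 150]

print(batch_average(heart_rates))
print(list(streaming_average(heart_rates)))
```

The streaming version can report a value after every reading, which is the property that real-time uses such as patient monitoring depend on.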
![Page 35: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/35.jpg)
Getting there
![Page 36: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/36.jpg)
Leveraging Big Data Requires Multiple Platform Capabilities
• Manage & store huge volume of any data: Hadoop File System, MapReduce
• Manage streaming data: Stream Computing
• Analyze unstructured data: Text Analytics Engine
• Structure and control data: Data Warehousing
• Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
• Understand and navigate federated big data sources: Federated Discovery and Navigation
![Page 37: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/37.jpg)
Outcomes Utilizing Big Data Capabilities
To Analyze Any Big Data Type
• Content
• Transactional / Application Data
• Machine Data
• Social Media Data

With Unique Capabilities
• Visualization and Discovery
• Hadoop
• Data Warehousing
• Stream Computing
• Integration and Governance
• Text Analytics

Achieve Breakthrough Outcomes
• Know Everything About Your Customers
• Run Zero-latency Operations
• Innovate new products at Speed and Scale
• Instant Awareness of Fraud and Risk
• Exploit Instrumented Assets
![Page 38: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/38.jpg)
Big Data Platform and Entry Points
1 – Unlock Big Data
2 – Analyze Raw Data
3 – Simplify your warehouse
4 – Reduce costs with Hadoop
5 – Analyze Streaming Data
![Page 39: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/39.jpg)
Q & A
Contact: Richard Gristak, Senior Director of Business Intelligence – [email protected]
![Page 40: Minn twdi 9 9](https://reader033.vdocuments.mx/reader033/viewer/2022061201/5479fe9f5906b507358b460c/html5/thumbnails/40.jpg)
Thank you