Improving Data Management through Utilizing Big Data: Mapping a Technology to a Data Concept
March 10, 2015
Mike Jennings, Walgreens Boots Alliance
©2015 Walgreens Boots Alliance. All rights reserved. Confidential and proprietary information
Defining Big Data
Big data describes any voluminous amount of structured, semi-structured, and unstructured data that has the potential to be analyzed for information.
Source: www.bizcubed.com.au
Enterprise Data Management Framework: Starting EDM Definition
Enterprise Data Management Framework: Context with the DMBOK Framework
Enterprise Data Management Framework: Alternative EDM Framework
Figure: Alternative EDM framework. Components shown: Metadata Management; Data Context; Data Model/Classification; Data Structure and Framework; Structured Data Management; Unstructured Data Management; Master Data & Reference Data Management; Business Intelligence & Data Warehousing; Data Quality Management; Data Security Management; Data Integration Management; Data Delivery Management; Data Governance (Policies, Processes, Standards, Organization, and Stewardship).
DMBOK Functions & Big Data Projects: Data Storage & Operations
The technologies and processes organizations use to maximize or improve the performance of their data storage resources.
File system that provides the ability to store large volumes of structured and unstructured data.
Provides operations, resource (node), and scheduling management for reads from and writes to the cluster.
Workflow scheduling component for data transformations.
Manages services, configurations, and their synchronization across the cluster.
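The storage function above rests on a distributed file system that splits large files into blocks and keeps several replicas of each block on different nodes. The toy sketch below shows that idea only; the block size, replication factor, node names, and round-robin placement are illustrative assumptions, not HDFS's actual placement policy or API.

```python
# Toy model of block-based storage with replication, loosely inspired by HDFS.
# Block size, replication factor, and round-robin placement are assumptions
# chosen for illustration; this is not how HDFS actually places replicas.

def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split a payload into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, rotating the starting node."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: dict[int, list[str]] = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement

blocks = split_into_blocks(b"x" * 300, 128)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], 3)
```

Here the 300-byte payload becomes blocks of 128, 128, and 44 bytes, and each block lands on three distinct nodes, so losing any single node still leaves at least two copies of every block.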
DMBOK Functions & Big Data Projects: Data Integration & Interoperability
The combination of technical and business processes used to combine data from disparate sources into a meaningful and unified view, according to business requirements and accepted practices.
Provides real‐time processing of data streams for monitoring and alerts.
Provides the ability to import data from an RDBMS into HDFS.
Provides the ability to collect, aggregate, and move huge log files into HDFS (e.g., apps, GPS, social, sensors, other).
Provides high-volume, fault-tolerant publish/subscribe messaging for real-time analysis.
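Publish/subscribe messaging decouples producers of data from its consumers. A minimal in-memory sketch follows, assuming an append-only topic where each consumer group tracks its own read offset; the `Topic` class and its methods are hypothetical and are not the API of any messaging product, and fault tolerance (replicating the log across servers) is out of scope.

```python
from collections import defaultdict

class Topic:
    """A toy append-only topic: producers append messages, and each
    consumer group reads at its own pace by tracking its own offset."""

    def __init__(self, name: str):
        self.name = name
        self.log: list[str] = []                          # retained messages, in publish order
        self.offsets: dict[str, int] = defaultdict(int)   # consumer group -> next offset to read

    def publish(self, message: str) -> int:
        """Append a message and return its offset in the log."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, group: str, max_messages: int = 10) -> list[str]:
        """Return unread messages for a consumer group and advance its offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch
```

Because messages are retained and each group keeps its own offset, a real-time alerting consumer and a slower analytics consumer can read the same stream independently; for example, after two `publish` calls, `poll("alerts")` returns both messages while `poll("dashboard", 1)` returns only the first.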
DMBOK Functions & Big Data Projects: Data Quality
A measure of the degree to which data satisfies the information needs of its consumers, reflects the nature and state of the real-world concepts to which it relates, is coherent within itself, and provides value in the decision-making processes for which it is to be utilized.
Provides relational structure to HDFS data. File formats can be applied to data from HDFS or the local file system.
Provides the ability to import data from an RDBMS into HDFS. Imported data can be constrained through import control arguments and basic SQL execution.
Provides the ability to collect, aggregate, and move huge log files into HDFS (e.g., apps, GPS, social, sensors, other). A Flume agent can be used with predefined data patterns (sinks) to ensure data format.
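Constraining an import (selecting only certain columns and rows) and enforcing a predefined data pattern before data lands are both simple quality gates. The sketch below shows the idea with Python's built-in `sqlite3` standing in for the source RDBMS and an in-memory CSV buffer standing in for a file in the cluster; the table, columns, and ZIP-code pattern are hypothetical, and this does not use any import tool's actual arguments.

```python
import csv
import io
import re
import sqlite3

# Hypothetical source table standing in for an operational RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store (store_id INTEGER, zip TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO store VALUES (?, ?, ?)",
    [(1, "60015", "OPEN"), (2, "6001", "OPEN"), (3, "53202", "CLOSED"), (4, "53703", "OPEN")],
)

# Constrain the import: only selected columns, and only rows matching a SQL predicate.
rows = conn.execute(
    "SELECT store_id, zip FROM store WHERE status = ? ORDER BY store_id", ("OPEN",)
).fetchall()

# Enforce a predefined data pattern before landing the data (five-digit US ZIP code).
ZIP_PATTERN = re.compile(r"\d{5}")
accepted = [r for r in rows if ZIP_PATTERN.fullmatch(r[1])]
rejected = [r for r in rows if not ZIP_PATTERN.fullmatch(r[1])]

# Write accepted rows as CSV; the in-memory buffer stands in for a file in the cluster.
out = io.StringIO()
csv.writer(out).writerows(accepted)
```

Store 3 is excluded by the SQL predicate, and store 2 is routed to `rejected` because its ZIP code has only four digits, so it can be reported back to the data steward rather than silently loaded.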
DMBOK Functions & Big Data Projects: Meta-data
All the physical data and knowledge about the business and technical processes used by an organization. Meta-data is knowledge about the organization's data.
Provides data lineage between data sources and the cluster including integration with the metastore/catalog (e.g., Hive HCatalog).
Provides relational structure to HDFS data. File formats can be applied to data from HDFS or the local file system.
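Data lineage between sources and the cluster can be modeled as a directed graph of "derived from" edges, which answers questions such as "which sources feed this table?". A minimal sketch, assuming plain string dataset names; the `LineageGraph` class and dataset names are hypothetical, and the sketch does not reproduce metastore/catalog integration or any lineage tool's API.

```python
from collections import defaultdict

class LineageGraph:
    """Records, for each dataset, the datasets it was derived from."""

    def __init__(self):
        self.parents: dict[str, set[str]] = defaultdict(set)

    def record(self, source: str, target: str) -> None:
        """Note that `target` was produced (at least partly) from `source`."""
        self.parents[target].add(source)

    def upstream(self, dataset: str) -> set[str]:
        """Return every dataset that `dataset` depends on, directly or transitively."""
        seen: set[str] = set()
        stack = list(self.parents.get(dataset, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return seen
```

For example, if `rdbms.sales` is imported to `raw/sales`, and `raw/sales` and `store_master` are joined into `sales_daily`, then `upstream("sales_daily")` returns all three upstream datasets, which is the impact-analysis view a steward needs before changing a source.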
DMBOK Functions & Big Data Projects: Documents & Content
The management of documents and non-structured content found in audio, video, email, images, etc., and the meta-data associated with this material.
Provides the ability to collect, aggregate, and move huge log files into HDFS (e.g., apps, GPS, social, sensors, email, other).
Provides the ability to search data in the cluster by indexing it to enable full-text search.
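Full-text search over content in the cluster typically relies on an inverted index, which maps each term to the documents that contain it. A minimal in-memory sketch of that data structure follows; the tokenizer and the AND-only query semantics are simplifying assumptions, and production search engines add ranking, stemming, and distributed index shards.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {
    "email-1": "Store 12 refrigeration alarm",
    "email-2": "Refrigeration unit repaired at store 7",
    "log-3": "POS terminal offline",
}
index = build_index(docs)
```

With this hypothetical sample, `search(index, "refrigeration store")` matches both emails, while `search(index, "alarm")` matches only `email-1`; the index is built once, so each query is a few set lookups instead of a scan of every document.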
DMBOK Functions & Big Data Projects: Data Warehousing & Business Intelligence
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process. Business Intelligence is the collection of activities that allow an organization to analyze data and make decisions based on facts from historical and predictive data sets.
Provides fast big-table access to large quantities of data, typically on top of the cluster.
Provides a compute algorithm typically used to produce output data from a large volume of data in the cluster for consumption.
Provides a semantic layer for accessing data in the cluster.
Provides an enhanced compute approach typically used to produce output data from a large volume of data in the cluster for consumption.
Provides an in-memory compute method typically used to produce output data from a large volume of data in the cluster for consumption (e.g., machine learning algorithms).
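The batch compute approach described above is commonly built on the map, shuffle, and reduce pattern: map each input record to key/value pairs, group the pairs by key, then aggregate each group. The single-process sketch below illustrates only that pattern; the `store,units` record format and sales example are hypothetical, this is not the Hadoop MapReduce API, and on a real cluster each phase runs in parallel across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(records: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (store, units) pairs from comma-separated 'store,units' lines."""
    for line in records:
        store, units = line.split(",")
        yield store, int(units)

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Aggregate each key's values into a single output value."""
    return {key: sum(values) for key, values in groups.items()}

sales = ["12,3", "7,1", "12,2"]
totals = reduce_phase(shuffle(map_phase(sales)))
```

Running the three sample records yields total units per store (`{"12": 5, "7": 1}`), the kind of summarized output a warehouse or BI layer would then consume.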
DMBOK Functions & Big Data Projects: Data Security
Data security concerns the protection of data from accidental or intentional but unauthorized modification, destruction, or disclosure through the use of physical security, administrative controls, logical controls, and other safeguards to limit accessibility.
Provides security authorization (grant/revoke), policy administration, and audit for the cluster.
Provides service-level authorization for users/groups.
Provides a semantic layer (tables) for accessing data in the cluster that can be secured.
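The grant/revoke, policy administration, and audit capabilities above can be illustrated with a toy policy store. The class, method, and resource names are hypothetical and this is not the API of any Hadoop security project; it only shows the decision logic: a user may perform an action on a resource if the user, or a group the user belongs to, holds a matching grant, and every decision is written to an audit log.

```python
from dataclasses import dataclass, field

@dataclass
class PolicyStore:
    """Toy resource-level authorization with group membership and an audit trail."""
    # (principal, resource) -> set of allowed actions; a principal is a user or a group.
    grants: dict[tuple[str, str], set[str]] = field(default_factory=dict)
    # user -> groups the user belongs to
    memberships: dict[str, set[str]] = field(default_factory=dict)
    # (user, resource, action, allowed) for every access decision
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def grant(self, principal: str, resource: str, action: str) -> None:
        self.grants.setdefault((principal, resource), set()).add(action)

    def revoke(self, principal: str, resource: str, action: str) -> None:
        self.grants.get((principal, resource), set()).discard(action)

    def check(self, user: str, resource: str, action: str) -> bool:
        """Decide an access request and record the decision in the audit log."""
        principals = {user} | self.memberships.get(user, set())
        allowed = any(action in self.grants.get((p, resource), set()) for p in principals)
        self.audit_log.append((user, resource, action, allowed))
        return allowed

store = PolicyStore(memberships={"alice": {"analysts"}})
store.grant("analysts", "sales_daily", "select")
```

Granting to a group rather than to individual users keeps policy administration manageable: `alice` is allowed to `select` from `sales_daily` through her `analysts` membership, `bob` is denied, and revoking the single group grant removes access for every member at once.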
DMBOK Functions & Big Data Projects: Data Governance (Potential Opportunity Areas)
Provides the organizational oversight, processes and methods to effectively manage data as an asset across the organization
Provides data lineage between data sources and the cluster including integration with the metastore/catalog (e.g., Hive HCatalog).
Provides relational structure to HDFS data. File formats can be applied to data from HDFS or the local file system.
Provides the ability to search data in the cluster by indexing it to enable full-text search.
Provides security authorization (grant/revoke), policy administration, and audit for the cluster.
DMBOK Functions & Big Data Projects: Reference & Master Data (Potential Opportunity Areas)
Data about core business entities and concepts, independent of transactions, and data that defines the set of permissible values to be used by other data fields.
Provides the ability to import data from an RDBMS into HDFS.
Provides a semantic layer for accessing data in the cluster.
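The reference data definition above (data that defines the set of permissible values used by other data fields) translates directly into a validation check that can run against data landed in the cluster. A minimal sketch; the field names and value sets are hypothetical.

```python
# Hypothetical reference data: the permissible values for two fields.
REFERENCE_DATA: dict[str, set[str]] = {
    "country_code": {"US", "GB", "IE"},
    "store_status": {"OPEN", "CLOSED", "REMODEL"},
}

def invalid_fields(record: dict[str, str], reference: dict[str, set[str]]) -> list[str]:
    """Return the fields in `record` whose values fall outside the permissible set.
    Fields without reference data (e.g. store_id) are not checked."""
    return [
        name for name, allowed in reference.items()
        if name in record and record[name] not in allowed
    ]

record = {"store_id": "12", "country_code": "US", "store_status": "OPN"}
problems = invalid_fields(record, REFERENCE_DATA)
```

The misspelled status `OPN` is flagged while the valid country code passes. Keeping the permissible values in one managed reference set, rather than hard-coding them in each load job, is what lets every consumer validate against the same definition.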
Bio
Michael Jennings
Senior Director, Enterprise Data Architecture
Walgreens Boots Alliance
1419 Lake Cook Road, MS: L497
Deerfield, IL 60015 USA
847 964
/in/micahelfjennings
Michael Jennings is a recognized industry expert in enterprise architecture and information management with more than twenty-five years of experience in various industries. Mike speaks frequently on enterprise architecture and information management concepts and practices at major industry conferences.
He is a co-author of the book "Universal Meta Data Models" (2004) and a contributing author to the books "Building and Managing the Meta Data Repository" (2000) and "The DAMA Guide to the Data Management Body of Knowledge - DMBOK" (2009).
Mike was recognized with the 2013 DAMA International Professional Achievement Award and as one of Information Management Magazine's 25 Top Information Managers for 2012.
He currently serves as VP of Programs for the Wisconsin DAMA Chapter and as VP of Operations for DAMA International.