cloud data management · tdwi research offers in-depth research reports, commentary, inquiry...

BEST PRACTICES REPORT Q2 2019 Cloud Data Management Integrating and Processing Data in Modern Cloud and Hybrid Environments By Philip Russom Co-sponsored by


Post on 25-Mar-2020


Page 1: Cloud Data Management · TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences as well as strategic planning services to user and vendor

BEST PRACTICES REPORT

Q2 2019

Cloud Data Management

Integrating and Processing Data in Modern Cloud and Hybrid Environments

By Philip Russom

Co-sponsored by


© 2019 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. Email requests or feedback to [email protected].

Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies. Inclusion of a vendor, product, or service in TDWI research does not constitute an endorsement by TDWI or its management. Sponsorship of a publication should not be construed as an endorsement of the sponsor organization or validation of its claims.

This report is based on independent research and represents TDWI’s findings; reader experience may differ. The information contained in this report was obtained from sources believed to be reliable at the time of publication. Features and specifications can and do change frequently; readers are encouraged to visit vendor websites for updated information. TDWI shall not be liable for any omissions or errors in the information in this report.


Table of Contents

Research Methodology and Demographics

Executive Summary

Introduction to Cloud Data Management
  Defining Cloud Data Management (CDM)
  Related Terms and Concepts for Cloud Data Management
  Real-World Use Cases for CDM
  The Point of CDM and Similar Hybrid Practices

Benefits and Barriers for CDM
  CDM: Problem or Opportunity?
  Benefits of CDM
  Barriers to CDM

The State of CDM
  Is CDM important?
  Why is CDM important?
  Cloud Adoption: Decision Disciplines Are Catching up to Operational Ones
  CDM Successes
  CDM Failures

Organizational Matters
  CDM Owners
  CDM Workers
  Hiring and Training for CDM Skills
  Holistic Data Governance for Hybrid CDM
  Multiphase Plans for Migrating Data to Cloud

CDM Best Practices
  Data Platforms for Hybrid CDM
  Data Management Tools for Hybrid CDM
  Modern Data Semantics as CDM Enabler and Unifier of HDAs
  Data Virtualization as an Agile and Non-Intrusive CDM Method
  Distributing Data across a Hybrid Data Architecture

Top Ten Priorities for Cloud Data Management

Research Co-sponsor: Datameer


About the Author

PHILIP RUSSOM, Ph.D., is senior director of TDWI Research for data management and is a well-known figure in data warehousing, integration, and quality. He has published more than 600 research reports, magazine articles, opinion columns, and speeches over a 20-year period. Before joining TDWI in 2005, Russom was an industry analyst covering data management at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and consultant, was a contributing editor with leading IT magazines, and a product manager at database vendors. His Ph.D. is from Yale. You can reach him at [email protected], @prussom on Twitter, and on LinkedIn at linkedin.com/in/philiprussom.

About TDWI Research

TDWI Research provides research and advice for data professionals worldwide. TDWI Research focuses exclusively on data management and analytics issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of data management and analytics solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences as well as strategic planning services to user and vendor organizations.

About the TDWI Best Practices Reports Series

This series is designed to educate technical and business professionals about new business intelligence, analytics, AI, and data management technologies, concepts, or approaches that address a significant problem or issue. Research is conducted via interviews with industry experts and leading-edge user companies and is supplemented by surveys of business and IT professionals. To support the program, TDWI seeks vendors that collectively wish to evangelize a new approach to solving problems or an emerging business and technology discipline. By banding together, sponsors can validate a new market niche and educate organizations about alternative solutions to critical problems or issues. To suggest a topic that meets these requirements, please contact TDWI senior research directors Fern Halper ([email protected]), Philip Russom ([email protected]), or David Stodder ([email protected]).

Acknowledgments

TDWI would like to thank many people who contributed to this report. First, we appreciate the many professionals who responded to our survey, especially those who agreed to our requests for phone interviews. Second, we thank our report sponsors, who diligently reviewed outlines, survey questions, and report drafts. Finally, we would like to recognize TDWI’s production team: James Powell, Peter Considine, Lindsay Stares, and Michael Boyda.

Sponsors

Actian, Couchbase, Datameer, Denodo, Hitachi, Snowflake, SAP, and Trifacta sponsored this report.


Research Methodology and Demographics

Report Scope. Cloud data management (CDM) is about managing old and new data in hybrid data architectures that span on-premises and cloud systems. CDM involves database management systems, data integration tools, related platforms, and the numerous best practices users perform with them. The primary goal of CDM is to unify heterogeneous and hybrid data so businesses have complete data they can trust, govern, and leverage for business value. This is a challenge because clouds are making data more heavily distributed than it has ever been.

Audience. This report targets business and technical managers who are responsible for modernizing data environments that consolidate traditional enterprise data and modern big data using on-premises and cloud-based tools and platforms, typically to support use cases in reporting, analytics, and business operations.

Survey Methodology. In January 2019, TDWI sent an invitation via e-mail to the data management professionals in our database, asking them to complete an Internet-based survey. The invitation was also distributed via websites, newsletters, and publications from TDWI and other firms. The survey drew 116 respondents. From these, we excluded respondents who identified themselves as vendor employees or academics. The remaining 108 respondents form the core data sample for this report.

Research Methods. In addition to the survey, TDWI Research conducted telephone interviews with technical users, business sponsors, and recognized data management experts. TDWI also received product briefings from vendors that offer products and services related to the best practices under discussion.

Survey Demographics. The majority of survey respondents are IT or BI professionals (53%). Others are business sponsors or users (33%) and consultants (14%). We asked consultants to fill out the survey with a recent client in mind.

The respondent population is dominated by industries in consulting (12%), software/Internet (11%), and financial services (10%), followed by retail/wholesale/distribution (8%), insurance (7%), and other industries. Most survey respondents reside in the U.S. (66%), followed by Asia (11%), Canada (8%), and Europe (7%). For the most part, respondents are evenly distributed across all sizes of organizations.

Position
Corporate IT or BI professionals 53%
Business sponsors/users 33%
Consultants 14%

Industry
Consulting/professional services 12%
Software/Internet 11%
Financial services 10%
Retail/wholesale/distribution 8%
Insurance 7%
Education 5%
Manufacturing (non-computers) 5%
Transportation/logistics 5%
Agriculture 4%
Food/beverage 4%
Healthcare 4%
Media/entertainment/publishing 4%
Government: Federal 3%
Government: State/Local 3%
Telecommunications 3%
Other 12%
(“Other” consists of multiple industries, each represented by less than 3% of respondents.)

Geography
United States 66%
Asia 11%
Canada 8%
Europe 7%
Mexico, Central, or South America 4%
Australia/New Zealand 2%
Africa 1%
Middle East 1%

Company Size by Revenue
Less than $100 million 21%
$100–$499 million 15%
$500 million–$999 million 10%
$1–$4.9 billion 19%
$5–$9.9 billion 8%
$10 billion or greater 19%
Don’t know 8%

Based on 108 survey respondents.


Executive Summary

The world of IT continues to become more hybrid in that some information systems and data remain on premises while others are increasingly deployed to the cloud. Technical users are lured to the cloud because of its speed, scale, elasticity, and low level of maintenance, while business people are drawn to its agility, low cost, and ability to support new data-driven business practices.

The rise of the cloud has ramifications, especially in the realm of data management. As if managing data of increasing size weren’t hard enough, organizations are now challenged to monitor business processes, assemble complete views of customers, and weave a cohesive analysis of corporate performance based on hybrid data that is strewn across the traditional enterprise and multiple clouds. The long list of data platform and tool types that manage hybrid data coalesce into hybrid data architectures that are difficult to understand and optimize.

Cloud data management (CDM) has risen to address these new challenges. CDM is the latest evolution of data management, and it has been greatly updated and extended to support new cloud data platforms, applications, and use cases. It also integrates data from those with traditional on-premises sources and targets. CDM promises to enable the next level of business analytics and data-driven operational innovations based on data from platforms old and new.

Among users surveyed, CDM is already in use in cloud data warehousing, advanced analytics, multichannel marketing, real-time operational dashboards, and for data sync among on-premises and software-as-a-service (SaaS) applications. According to our survey results, the leading benefits of CDM are scalability, elasticity, analytics, real-time operations, and agility. Barriers to successful CDM include issues in governance, data migration, data quality, and tool maturity. A whopping 96% of users surveyed say CDM is an opportunity, which explains why so many are adopting cloud systems or migrating to them. Although most data is managed on premises today, survey data suggests that the amount on cloud platforms in the typical organization will at least triple over the next three years.

Our survey says that the infrastructure and system architectures for cloud data management are usually owned and maintained by central IT or a group responsible for enterprise data architecture. However, groups for data warehousing, DataOps, and general data management contribute to CDM solution designs and data requirement definitions. Among these workers, almost half of job titles include the word “architect” because creating unified solutions in a multiplatform and hybrid context requires deep architecture expertise for data, integration, systems, and applications.

The hybrid IT ecosystem under discussion includes an amazing variety of systems, each fulfilling a specific purpose, while also interoperating with many other systems. On the data side of things, the relational database continues its hegemony, though it has evolved to run natively on clouds and to focus on analytics (e.g., columnar, graph, and NoSQL databases). Also, data management tools (for integration, quality, metadata, and virtualization) have evolved to execute processing on servers and inside data platforms, both on premises and in the cloud, via interfaces for all these. On the application side of things, traditional on-premises applications are today regularly mixed with SaaS applications in clouds. Despite the eclectic nature of these massive software portfolios and the massive amounts of distributed data involved, cloud data management integrates hybrid data across hybrid data architectures with scale, speed, quality, and compliance.

This report explains in detail what CDM is and does so data professionals and their business counterparts can understand what CDM can do for them and how they might organize a successful program.

Cloud computing has joined traditional paradigms, making IT hybrid

Data isn’t just big. It’s also hybrid

The new hybrid cloud context demands an updated approach to data management

CDM is real and in use in decision-making solutions, marketing, and for data sync across applications

Cloud data management enables interoperability and data sharing among a range of systems, old and new, on premises and in the cloud


Introduction to Cloud Data Management

Data continues to evolve into greater diversity in the sense of more data structures, schema, sources, containers, and latencies. Likewise, user organizations continue to diversify the ways they use data for business value, especially via use cases in advanced analytics, reporting, data warehousing, marketing, and time-sensitive business operations. At the same time, new platforms have emerged for data management, the most disruptive ones being open source and cloud-based.

The tools and platforms we use to manage data are also evolving to keep up with data, its changing management requirements, and its new use cases. Many are leveraging the power and affordability of the cloud. This includes the new database management systems (DBMSs) and data warehouse platforms built from the bottom up specifically for the cloud. Likewise, data integration platforms and analytics tools have evolved to run natively in the cloud (while still running on premises) as well as to interoperate with SaaS applications and other new cloud systems.

Users are adopting new cloud-based platforms and tools for data because they are optimized for the cloud’s high performance and scale. They also offer relatively low cost, highly agile, and flexible deployments. These cloud characteristics help organizations innovate by enabling modern data-driven practices in analytics, real-time operations, and self-service data discovery, prep, visualization, and analytics.

However, despite the steady migration of data, tools, and applications to the cloud, user organizations are also maintaining their on-premises legacy systems. After all, these legacies represent considerable investments in money, time, and physical assets. Furthermore, they still deliver valuable automation for traditional use cases in business process optimization and decision making.

The result is an increasingly hybrid IT ecosystem in the sense that many tools, applications, data platforms, and data sets (and their servers, etc.) are still physically located on premises while an increasing number are deployed in the cloud—sometimes multiple clouds. In terms of information assets, this means that bespoke hybrid data is broadly distributed and persisted in numerous data platforms, whether on premises, in the cloud, or both.

Data’s recent distribution to the cloud results in hybrid data architectures that are inherently complex. More complexity comes from trends that usually accompany cloud usage, namely a greater diversity of data structures, multiple database brands (old and new, from software vendors and open source projects), and the capture of big data and other new data assets from web applications, social media, customer channels, and the Internet of Things (IoT).

Managing distributed data is difficult in the best of times. Managing it in today’s multiplatform hybrid data architectures is a whole new next-level challenge. That’s where cloud data management saves the day.

As data continues to evolve, how we manage it and use it for business value evolves, too

Across industries, users are adopting cloud systems and integrating them with on-premises ones

Integrating new cloud and on-premises systems results in hybrid architectures for data and applications


Defining Cloud Data Management (CDM)

Cloud data management is simply data management that involves clouds. As with most forms of data management, CDM involves tasks and technologies for data persistence and data integration. For example:

• When focused on data persistence, CDM provides cloud-native data storage and optimized processing for the burgeoning volumes of enterprise data, big data, and data from new sources that users are choosing to manage and use in clouds.

• When focused on data integration, CDM provides data integration infrastructure that unifies multicloud and hybrid environments. Note that data integration infrastructure includes development tools and deployment servers for multiple forms of data integration (ETL/ELT, replication, virtualization, event processing, etc.) as well as data disciplines related to integration (data quality, metadata management, master data management, etc.). All these interoperate with on-premises and cloud systems and may execute on either.
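To make the ETL/ELT distinction in the bullet above concrete, here is a minimal Python sketch. It is an illustration only, not a method described in this report: the sample rows, the table names, and the use of an in-memory SQLite database as the "target platform" are all assumptions. In the ETL path, data is cleaned before it is loaded; in the ELT path, raw data is loaded first and then transformed inside the target database with SQL.

```python
import sqlite3

# Hypothetical source rows standing in for data extracted from an operational system.
SOURCE_ROWS = [("alice", "120.50"), ("bob", "80"), ("carol", "n/a")]


def is_number(text):
    """Return True if the text can be parsed as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def etl(conn):
    """ETL: transform in the integration layer, then load only the cleaned rows."""
    clean = [(name, float(amount)) for name, amount in SOURCE_ROWS if is_number(amount)]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean)


def elt(conn):
    """ELT: load raw rows as-is, then transform inside the target database with SQL."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", SOURCE_ROWS)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT customer, CAST(amount AS REAL) AS amount "
        "FROM sales_raw WHERE amount GLOB '[0-9]*'"
    )


conn = sqlite3.connect(":memory:")  # in-memory database used as a stand-in target
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT customer, amount FROM sales_etl ORDER BY customer").fetchall()
elt_rows = conn.execute("SELECT customer, amount FROM sales_elt ORDER BY customer").fetchall()
print(etl_rows)
print(elt_rows)
```

Both paths should end with the same cleaned rows. The difference is where the processing runs, which is why ELT tends to suit platforms that can scale processing inside the database.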

CDM’s purpose is to extend data management technologies and practices to deeply support cloud environments, especially regarding data persistence and integration.

The cloud offers many benefits that apply to data management. These include the cloud’s elastic scalability, zero system integration and administration, massive storage capacity, and low cost. Cloud users have reaped those benefits with common use cases such as SaaS applications and advanced analytics. Now they need the power of the cloud to give data management solutions greater speed and scale with less administration and cost.

CDM is already giving users business value in production solutions for cloud data warehousing, cloud data integration, cloud-based analytics, modern data semantics, and modern data hubs. CDM also contributes to operational environments such as those for digital marketing, the online supply chain, the Internet of Things (IoT), and many kinds of SaaS tools and applications.

Related Terms and Concepts for Cloud Data Management

The vast majority of people responding to this report’s survey feel that they know what CDM is and does. (See Figure 1.) As we’ll see later, the majority of survey respondents already have hands-on experience using or deploying data-driven cloud applications (for analytics, reporting, marketing, and sales automation) or cloud platforms (for data warehousing and data lakes).

Do you feel you know what CDM is and does?

Yes 81%

No 19%

Figure 1. Based on 108 respondents.

CDM has two halves: data persistence and data integration

The cloud’s many desirable traits all apply to CDM

Most users know what CDM is, although they may call it by another name


Although users know what cloud data management is, they may refer to it by another name. To get a sense of which terms users apply to the general practice of CDM, the survey asked, “What term(s) do you or your team use for data management that involves clouds?” (See Figure 2.) Each of the terms adds nuance to our understanding of CDM.

Cloud data warehousing (59%): The survey population is dominated by data warehouse professionals, so it is no surprise that they use the industry standard term cloud data warehousing. This also indicates that data warehousing is now firmly established as a cloud use case.

Cloud data management (50%): For many years, TDWI has been using the phrase data management as a broad umbrella term that includes numerous technologies and user practices. These tend to fall into two categories:

• Database management systems (DBMSs) and data persistence platforms that resemble them (e.g., Hadoop and cloud storage)

• Data integration and related disciplines (e.g., data quality, metadata management, master data management, replication, data services, etc.)

TDWI members and other people working in data management have also adopted the umbrella term data management. Many place an adjective at the beginning of the phrase to describe specific applications, as in cloud data management (50%) and its synonym hybrid data management (31%). The latter reminds us that CDM is not solely about clouds; it is also about combinations of computing platforms, commonly the mixture of on-premises and cloud systems.

Data-as-a-service (DaaS, 44%): Cloud-native systems (and progressively others, too) often interface and integrate via services of various types. These include web services (based on SOAP or REST), emerging microservices, homegrown services, and proprietary APIs. The term DaaS underscores how important service architectures are for modern computing, including data management. It also reminds us that a modern hybrid ecosystem will not unify into a cohesive data architecture without an appropriate integration infrastructure, as discussed later in this report.

Distributed data management (24%): This term goes back to the early days of client-server computing, which produced some of the first architectures to integrate data that originated in and was managed by multiple IT systems. Though long in the tooth, the term is still relevant, given how clouds provide even more options for distributing data as well as integrating distributed data.

Enterprise data architecture (40%): This equally aged term is a mature and indispensable discipline in larger enterprises that want data consistency, quality, and accessibility across diverse enterprise systems. The unification of platform diversity that EDA was designed to address is still with us, but greatly expanded by the addition of cloud platforms in hybrid architectures.

Multiplatform data architecture (19%): TDWI Research originated this term in a recent research report.1 The term is more descriptive than most because it stresses how hybrid data environments consist of multiple platforms of diverse types, often spread between on premises and the cloud. It also stresses that a collection of platforms is a mere “bucket of silos” that will not achieve full business value unless there is a data architecture that unites and integrates them.

Its synonyms reveal that CDM is distributed, service-oriented, hybrid, and multiplatform

1 For a discussion of the many data architectures possible in today’s multiplatform and hybrid data environments, see the 2018 TDWI Best Practices Report: Multiplatform Data Architectures, online at tdwi.org/bpreports.


What term(s) do you or your team use for data management that involves clouds? Select all that apply

Cloud data warehousing 59%

Cloud data management 50%

Data-as-a-service 44%

Enterprise data architecture 40%

Hybrid data management 31%

Distributed data management 24%

Multiplatform data architecture 19%

Multicloud data sync 12%

Other 4%

Figure 2. Based on 306 responses from 108 respondents (2.8 responses on average).
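Because Figure 2 reports a select-all-that-apply question, its percentages add up to more than 100%. As a rough check, the short Python sketch below shows how the caption's totals relate to the bars. The percentages and the respondent count come from Figure 2; the variable names are illustrative, and because the published percentages are rounded, the result is an approximation rather than the exact survey tally.

```python
# Percentages as published in Figure 2 (share of respondents selecting each term).
figure2_pct = {
    "Cloud data warehousing": 59,
    "Cloud data management": 50,
    "Data-as-a-service": 44,
    "Enterprise data architecture": 40,
    "Hybrid data management": 31,
    "Distributed data management": 24,
    "Multiplatform data architecture": 19,
    "Multicloud data sync": 12,
    "Other": 4,
}
respondents = 108

# Each percentage point is one selection per 100 respondents, so the sum of the
# percentages divided by 100 approximates the average selections per respondent.
avg_per_respondent = sum(figure2_pct.values()) / 100
estimated_responses = avg_per_respondent * respondents

print(f"{avg_per_respondent:.1f} responses per respondent on average")
print(f"about {round(estimated_responses)} responses in total")
```

The estimate lands on roughly 2.8 selections per respondent and about 306 responses, in line with the figure caption.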

Real-World Use Cases for CDM

Cloud data management practices are in place today, supporting numerous production use cases in both analytics and operations. (See Figure 3.) For example, the top four items in Figure 3 are all for use cases in analytics, whereas the other items are for operational use cases. This illustrates that CDM is a real-world best practice, providing valuable support for a wide range of business functions. It also shows that, in the average enterprise today, these business functions and departments are all involved with cloud-based data and applications to some degree. In other words, the cloud has penetrated the enterprise broadly, affecting just about every business process.

For what enterprise functions or use cases is your organization applying cloud data management? Select all that apply

Analytics 64%

Data warehousing 52%

Reporting 47%

Data science 43%

Marketing and sales 35%

Finance 29%

Customer service 27%

Enterprise resource planning 27%

Human resources 24%

Research and development (R&D) 21%

Supply chain 14%

Other 6%

Figure 3. Based on 306 responses from 108 respondents (2.8 responses on average).

Users surveyed use CDM mostly for decision disciplines, namely reporting, analytics, and data warehousing


The Point of CDM and Similar Hybrid Practices

To better understand users’ priorities for CDM, our survey asked respondents to rank common CDM goals in priority order. (See Figure 4.)

Get more business value from data, whether in operations or analytics. Business value is rightfully at the top of the list, as it should be for all data management. CDM supports business value by repurposing and provisioning data for multiple business use cases from operations to analytics, even in hybrid data environments that require “many-to-many” data integration across multiple platforms.

Capture and leverage emerging data types and sources. This includes data from IoT, social media, customer channels, big data, and web apps. You cannot leverage data for business value if you cannot capture, store, process, and present that data according to its unique requirements—and that’s where CDM plays a valuable role. New and unique data sources (many of them cloud-based) come online almost daily in some organizations, which drives up the need for CDM that supports diverse platforms and tools.

Embrace new systems architectures and data architectures that involve cloud platforms. Many data warehouses were originally designed exclusively for the repeatable and auditable structures of reporting, which makes them ill-suited to the open-ended and unpredictable explorations of advanced analytics. Instead of trying to satisfy radically different data requirements in one warehouse, many users prefer to complement the warehouse with a cloud-based data lake or sandbox for analytics. Similarly, marketers keep sensitive customer data and customer masters on premises while managing massive stores of data for analytics or campaign management in the cloud. Hybrid and distributed architectures such as these need CDM to integrate and synchronize data across the multiple platforms involved in each use case.

Expand analytics into more advanced forms, such as machine learning and AI. Some forms of advanced analytics work best with massive amounts of data. This includes machine learning, data mining, graph, clustering, and statistics. Furthermore, analytics training data for these should come from many sources and timeframes and have rich details, as is typical of raw source data from operational systems. That’s because analytics needs broad data to make complex correlations among far-flung data points, which is what leads to business insights. Managing and processing data of this volume and diversity in a traditional on-premises system—such as an MPP configuration of a relational database—would be cost prohibitive and difficult to scale. However, the same analytics data and processing in the cloud is far more affordable and scalable.

Put the right data on the right platform for the right storage, processing, or provisioning. For example, the data requirements for set-based OLAP or SQL are radically different than those for algorithmic analytics for statistics, mining, and graph. Real-time analytics requires real-time data ingestion. Sentiment analysis requires a platform that can handle unstructured data. In addition, analytics processing increasingly runs inside a database or other persistence layer where data is stored. Accommodating all these requirements may demand a multiplatform storage portfolio, which nowadays generally leads to a hybrid mix of on-premises and cloud-based platforms.

Real-time access to all data, whether on premises or in the cloud. Interfaces to and from cloud systems vary considerably. However, the trend is toward more reliance on web services and microservices, whether based on SOAP or REST or developed as a proprietary API. Many services of this sort can perform in real time or close to it. This makes cloud data platforms and analytics tools good fits for fast-paced business processes such as logistics, trading, e-commerce, and utilities.

Business value is the top priority of CDM. Others include analytics, collecting web data, and tapping cloud power

Users surveyed know that CDM improves architecture, analytics, real-time processes, and data sharing


Take advantage of cloud characteristics. These include elastic scalability, low cost, and managed services. Many users are drawn to the cloud in order to scale with big data analytics, to reduce costs as compared to traditional on-premises data centers, and to hand off system integration and administrative tasks to the cloud provider.

Simplify the process of sharing data inside and outside of an organization. Cloud data platforms and integration tools can be used many ways, as we’ve already seen. Another way is to use the cloud as a central, shared point of collocation or consolidation for disparate data. For example, cloud data warehouses, data lakes, and data hubs do this (as well as many other useful things).

What is the point of CDM? Rank the following in priority order

Get more business value from data, whether in operations or analytics 5.7

Capture and leverage emerging data types and sources 5.3

Embrace new systems architectures and data architectures that involve cloud platforms 5.0

Expand analytics into more advanced forms, such as machine learning and AI 5.0

Put the right data on the right platform for the right storage, processing, or provisioning 4.4

Real-time access to all data, whether on premises or cloud 3.9

Take advantage of cloud characteristics 3.7

Simplify the process of sharing data inside and outside of an organization 3.4

Figure 4. Based on 97 respondents. Possible scores range from 1 to 8.

USER STORY: CONSULTANTS CAN HELP WITH CLOUD ADOPTION AND MIGRATION

“I work for a professional services firm, where we help our clients adopt cloud for data management, analytics, and other use cases,” said Ash Naseer, vice president for analytics at Cloudnile. “As a first step, we conduct advisory consulting engagements for companies that have legacy data environments to determine what kinds of commitments they should make to cloud and what their strategy for adoption and migration should be. As the next step, I organize implementations that migrate data and build cloud data management solutions.

“Our clients want to leverage the abilities of cloud, especially elasticity and scale. Cost plays into it, too, and cloud can reduce capital expenditures for new implementations. However, our clients are more interested in finding the right tool and technology for specific use cases, and cloud-based systems increasingly fit their requirements, whether a pure cloud or hybrid solution.

“Our consulting practices highlight that the core principles of data warehousing and analytics will remain valid, although adapted to cloud and other new platforms such as Hadoop. Despite big data, increasingly advanced analytics, and new data-driven business practices, we still need the relational paradigm and SQL fully supported for most use cases, even on cloud-based data platforms.”

Benefits and Barriers for CDM

CDM: Problem or Opportunity?

Embracing new best practices and technologies can uncover previously unknown challenges as well as new insights and improvements to technical infrastructure. Users are right to ponder the balance of risk and reward before making a commitment. Such is the case with cloud data management today. To test whether CDM is worth the effort, this report’s survey asked: Is cloud data management (CDM) a problem or an opportunity? (See Figure 5.)

The vast majority of respondents (96%) consider CDM an opportunity. Responses to other survey questions reveal that users are hopeful that CDM will help them scale to big data, expand analytics programs, draw business value from new data assets, and extend data warehouses.

A tiny minority (4%) consider CDM a problem. As we’ll see later in this report, many users are rightfully concerned about the difficulties of data governance, privacy, and security when data is located on numerous platforms, both on premises and in the cloud. A growing number of organizations have resolved these concerns to manage and use hybrid data in compliant ways.

Is cloud data management (CDM) a problem or an opportunity?

Opportunity, because it provides many useful options for data management, analytics, operational applications, etc. 96%

Problem, because its complexity is difficult to integrate, optimize, and govern 4%

Figure 5. Based on 107 respondents.

Benefits of CDM

In the perceptions of survey respondents, CDM offers several potential benefits. (See Figure 6.) A few areas stand out in their responses.

The general benefits of the cloud also apply to cloud data management. In fact, cloud performance characteristics ranked at the top of survey responses, namely scalability (51%), elasticity (44%), and support for new data sources and structures (30%). In phone interviews with users that TDWI conducted for this report, users repeatedly listed scalability, elasticity, multistructured data, advanced analytics, and real-time data practices as their reasons for migrating data and applications to the cloud.

CDM enables modern practices in analytics and real time. For example, CDM enables advanced analytics inexpensively at scale (35%), thereby generating new insights that make the business more profitable, efficient, and competitive. Also, CDM enhances real-time access to all data, whether on premises or in the cloud (35%), which enables innovative business practices such as e-commerce recommendations, fast-paced performance management, just-in-time inventory or maintenance, and real-time analytics with data from IoT and streaming sources. Depending on the hybrid combination of platforms involved, CDM helps new solution development be agile, yielding a short time-to-business use compared to implementing with purely on-premises platforms (20%).

Almost all survey respondents consider CDM an opportunity, not a problem

The leading benefits of CDM include scalability, elasticity, analytics, real-time access, and agility

Data assets are more fully leveraged in the cloud. TDWI regularly encounters users who believe that most clouds are well suited to consolidating diverse data so it can support more use cases, which in turn yields greater business value. Similarly, some users consider the cloud a “neutral Switzerland” that fosters collaboration among diverse users, both internal and external. This way, the cloud makes it easier to share with external suppliers, partners, and customers (30%). CDM makes hybrid data architectures possible, which puts data platforms near web and IoT data sources (15%), and puts data platforms near third-party data providers (9%).

CDM fosters improvements across the board. From a technical perspective, CDM can modernize a mature data management infrastructure (30%). From a business perspective, managing hybrid data more effectively can improve existing business processes (29%), customer experience and service (28%), and employee efficiency (22%).

CDM and clouds can save money. Adopting data platforms that are cloud-based can save time and money because the platform is already set up and optimized for the cloud (alleviating time-consuming system integration on premises). Furthermore, “renting” cloud hardware avoids initial capital expenses for on-premises hardware, thereby making CDM an operational expense (15%). Depending on your cloud provider and whether you signed up for a managed service, administration for DBMSs and Hadoop can be easier (and therefore less expensive) in the cloud than on premises (14%). In many cases, cloud personnel do work that would require new hires on premises. For these reasons, startup expenses are low for new cloud-based data-driven programs and solutions (16%).

If your organization were to implement cloud data management, what would its leading benefits be? Select seven or fewer

Scalability for data storage and integration workloads 51%

Automatic and elastic resource management 44%

Enables advanced analytics, at scale but inexpensively 35%

Enhances real-time access to all data, whether on premises or in the cloud 35%

Data and other assets or resources are more fully leveraged 32%

Cloud data platforms support new data sources and structures at scale 30%

Makes data easier to share with external suppliers, partners, and customers 30%

Modernizes our mature data management infrastructure 30%

Improves existing business processes 29%

Customer experience, service, analytics, etc., improve 28%

Improves employee efficiency 22%

Short time-to-use compared to implementing on-premises platforms 20%

Cost reduction, especially for start-up expenses 16%

Finances data management as operational expense, not capital expenditure 15%

Puts data platforms near web and IoT data sources 15%

Admin for DBMSs and Hadoop is easier on cloud than on premises 14%

Security for enterprise data is easier to implement and maintain 14%

Complies with our cloud-first corporate mandate 13%

Fits our IT outsourcing strategy 10%

Puts data platforms near third-party data providers 9%

Figure 6. Based on 605 responses from 98 respondents. 6.2 responses on average.

Barriers to CDM

In the opinion of survey respondents, CDM and the cloud present some potential barriers. (See Figure 7.) A few areas stand out in their responses.

Data governance and related issues keep users awake at night. Near the top of the barrier list, users’ greatest fears center on the challenges of modernizing data governance to cover all data on all platforms (38%). Users have other governance concerns about CDM and the cloud, including data privacy issues (40%, the most-cited barrier), data security threats (36%), the risk of exposing personally identifiable information and other sensitive data (32%), and a variety of data usage compliance issues (27%).

These concerns are valid given that the hybrid data architectures that CDM unites distribute data across many platform types and enable more users and applications to access a broader range of data. Even so, as users have migrated applications, data, and users to the cloud, TDWI has seen many organizations across industries succeed with the cloud and CDM. They succeed by revising and expanding their programs in data governance, stewardship, curation, and security to assure that the cloud and CDM are compliant, secure, and governed.

Migrations are big projects that must be completed before CDM starts. Many users are replatforming to cloud-based systems (38%), meaning that they have pre-existing applications on premises, with data, users, and business processes they will migrate to the cloud. These users need to allocate generous time and money for such projects because migrating data and applications to the cloud is time-consuming (22%) and expensive in terms of acquiring and supporting cloud platforms and apps (14%).

In a related problem, too many organizations underestimate the scope of migration by thinking that they can perform a simple “lift-and-shift” from on-premises systems to cloud-based systems. In reality, many migration projects are more similar to new development than “lift-and-shift,” as explained later in this report. For example, the more different the source and target platforms are, the more development it will take to have a well-performing system with optimal data at the end of the project. For all migrations, users should create a plan that organizes each migration as a series of easily dispatched phases, instead of a risky “big bang” project.

A variety of problems with data can hinder cloud adoption and CDM. Many of the use cases that cloud and CDM enable involve sharing data, as in analytics, reporting, data-driven operations, and self-service data access. These use cases are imperiled when the owners of data and platforms are not willing to embrace the cloud fully (24%), which increases the difficulty of sharing data across all parties, inside and outside your organization (27%). As with any data-oriented technology, CDM success may be threatened by the poor quality of data from traditional sources (24%) or new sources (8%). Finally, maintaining a single version of the truth that is current and accurate (31%) in a hybrid cloud environment requires expert skills in data integration and synchronization.

Some users still consider best practices and tool support for clouds immature. This includes immaturity in the CDM concept (21%) and CDM relative to requirements for advanced analytics (11%). Likewise, CDM may be hamstrung by immature tool support for clouds and SaaS applications (15%) and a data integration infrastructure that is weak on cloud support (19%).

Barriers to successful CDM may arise in governance, migrations, data quality, and tool maturity

Promises of “lift-and-shift” are exaggerated. Replatforming usually involves development work.

If your organization were to attempt cloud data management, what would its leading BARRIERS be? Select seven or fewer

Data privacy issues 40%

Data governance, modernized to cover all data on all platforms 38%

Replatforming to cloud-based systems (e.g., for data warehouse, analytics) 38%

Data security threats 36%

Risk of exposing sensitive data (e.g., personally identifiable information) 32%

Maintaining a single version of the truth that is current and accurate 31%

Data usage compliance issues 27%

Difficulty of sharing data across all parties, inside and outside our organization 27%

Owners of data and platforms not willing to go to the cloud 24%

Poor quality of data from traditional sources 24%

Migrating data and apps to cloud is time-consuming and expensive 22%

Architecting solutions for environments that are excessively heterogeneous & hybrid 21%

Immaturity of the CDM concept 21%

Integration infrastructure is weak on cloud support 19%

Immaturity of tool support for clouds and SaaS apps 15%

IT won’t support the cloud brand or cloud tools we want 15%

Cost of acquiring and supporting cloud platforms and apps 14%

No compelling business case 13%

Immaturity of CDM relative to requirements for advanced analytics 11%

Poor quality of data from new sources 8%

Other 4%

Figure 7. Based on 512 responses from 107 respondents (4.8 responses on average).

EXPERT COMMENT: CLOUD, DATA LAKE, AND DATA PREP IS AN EMERGING COMBINATION

David Stodder is a senior research director at TDWI and a recognized expert in the new discipline of data preparation, which he sees increasingly used with clouds and data lakes. “With more data originating in the cloud through online and mobile marketing, e-commerce, social media, and other apps and data services, the resulting ‘data gravity’ is attracting even greater investment in cloud-native analytics, data preparation, and data management.

“Data warehouses, although still valuable, are being augmented or replaced by data lakes that can store all kinds of raw, detailed data. First built with Apache Hadoop clusters on premises, many organizations today are choosing to base their data lakes in the cloud so they can take advantage of service-based flexibility, scale, and the promise of better economics while avoiding having to develop, configure, and maintain data lakes on their own.

“Data preparation is essential to gaining value from all this data. Data preparation processes begin with data ingestion and collection and include steps for profiling the data and improving its quality, consistency, and completeness. Data preparation also includes transformation, wrangling, conversion, cleansing, and enrichment to make the data ready for analysis, modeling, and consumption.

“Cloud computing is evolving quickly, which is having a big impact on data strategies. Agile, loosely coupled microservices and containerization are shifting cloud-native environments away from monolithic structures. Organizations need data preparation processes and solutions that can help them accelerate cloud data lake adoption, take advantage of cloud’s evolution, and reduce time to value.”2
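The data preparation steps named in the comment above (ingestion, profiling, cleansing, transformation) can be illustrated with a minimal sketch. It assumes a tiny in-memory CSV extract; the column names, sample values, and function names are hypothetical and are not drawn from the report or from any particular product.

```python
import csv
import io

# Hypothetical raw extract; column names and values are invented for illustration.
RAW_CSV = """customer_id,country,amount
 101 ,us,25.50
102,US,
101,us,25.50
103,de,40.00
"""

def ingest(text):
    """Ingestion: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def profile(rows):
    """Profiling: count empty values per column to expose quality gaps."""
    missing = {}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
    return {"row_count": len(rows), "missing": missing}

def cleanse(rows):
    """Cleansing: trim whitespace, standardize country codes, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = {key: (value or "").strip() for key, value in row.items()}
        fixed["country"] = fixed["country"].upper()
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(fixed)
    return cleaned

def transform(rows):
    """Transformation: convert amounts to numbers; keep missing amounts as None."""
    for row in rows:
        row["amount"] = float(row["amount"]) if row["amount"] else None
    return rows

def prepare(text):
    """Run the steps in order; return the raw-data profile and analysis-ready rows."""
    raw = ingest(text)
    return profile(raw), transform(cleanse(raw))

if __name__ == "__main__":
    report, ready = prepare(RAW_CSV)
    print(report)
    print(ready)
```

Profiling runs on the raw rows on purpose, so the report describes the data as it arrived rather than as it looks after cleansing. A real cloud data lake would replace the in-memory string with object storage and a distributed engine, but the order of the steps would likely stay the same.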

The State of CDM

Is CDM Important?

To gauge the urgency of cloud data management, this report’s survey asked respondents to rank the importance of CDM relative to their organization’s data strategy. (See Figure 8.)

Few respondents (14%) say that CDM is not a pressing issue. Therefore, we can conclude that CDM contributes significantly to enterprise data strategies.

Most respondents (86%) recognize the importance of CDM. Many feel that CDM is extremely important (39%), while others see it as moderately important (47%).

How important is implementing CDM to the success of your organization’s data strategy?

Extremely important 39%

Moderately important 47%

Not currently a pressing issue 14%

Figure 8. Based on 91 respondents.

Most organizations consider CDM an important piece of their enterprise data strategy

2 Read more of what David Stodder has to say in the 2019 TDWI Checklist Report: How to Use Data Preparation to Accelerate Cloud Data Lake Adoption, online at tdwi.org/checklists.

Why Is CDM Important?

Why is CDM important? The survey asked the open-ended question, "In your own words, why is implementing CDM important (or not important)?" The respondents’ comments reveal a number of use cases, needs, and trends, as seen in the representative excerpts in Figure 9. Note that the users quoted work in many different industries and geographic regions. Cloud and CDM are top of mind for most data professionals and their business sponsors in many contexts worldwide.

In your own words, why is implementing CDM important (or not important)?

• “CDM allows us to expand our use of data with a lower up-front investment.” – Head of data center of excellence, consulting, Asia

• “[CDM’s] scale and speed are commodities we cannot afford to compromise on.” – BI manager, media and entertainment, U.S.

• “[CDM] aligns with the company strategy to move to the cloud, so it is important that the move is done correctly and properly.” – Group manager of IT, media and entertainment, Australia

• “[Our] IT mandate to move all systems to the cloud makes CDM an absolute necessity.” – Data analyst, consulting, Africa

• “Cost-effective use of technology and data warehousing. Leverage dollars, repurpose, provide broader access to tools and infrastructure.” – Senior analyst, transportation, Canada

• “Cloud computing facilitates the access of applications and data from any location worldwide and from any device with an internet connection.” – Database marketing, insurance, U.S.

• “Need to modernize the data management technologies to lay a foundation for cost-effective machine learning and deep learning [analytics].” – Consultant, financial services, U.S.

• “As business processes move to the cloud, it is vital that data processes are also supported.” – Product manager, hospitality, Europe

• “[CDM is important] to modernize the data warehouse and add business value.” – BI manager, food and beverage, U.S.

• “It’s important to be able to stay up-to-date on the latest software updates. The cloud puts this on the vendors.” – Director of data governance, retail, U.S.

• “New opportunities while leveraging existing data assets.” – System administrator, education, Canada

• “It is important to be prepared for the future and not just do what we’ve always done.” – Manager of DW/BI, healthcare, U.S.

Figure 9. Drawn from the text responses of 108 respondents.

Cloud Adoption: Decision Disciplines Are Catching up to Operational Ones

To sort survey respondents according to their exposure to CDM, our survey asked, "Do you personally have experience implementing and/or using tools, apps, or platforms that participate in some form of cloud data management?" (See Figure 10.)

Users have many good reasons for considering CDM important

The majority of respondents (81%) have direct experience with some form of CDM. This percentage reveals how pervasive clouds, and data management on or around them, have become. Why is the share so large?

As numerous technologies and practices have emerged over the history of IT, a recurring trend is for operational applications to adopt the technology first, followed after a few years by decision-making disciplines such as reporting, analytics, data warehousing, and data integration. In recent years, TDWI has observed organizations of all sizes and from all industries making deep commitments to many forms of cloud-based operational applications, licensed in the software-as-a-service (SaaS) model. These include SaaS applications for sales force automation (SFA), customer relationship management (CRM), marketing campaign management, financials, call center, human resources, and so on. We are now at the phase of the cloud adoption and maturation cycles where decision-making disciplines are catching up to operational ones.

Do you personally have experience implementing and/or using tools, apps, or platforms that participate in some form of cloud data management?

Yes 81%

No 19%

Figure 10. Based on 91 respondents.

Note that this report’s survey branched according to how respondents answered this question. Respondents answering “yes” (81% in Figure 10) were presented with detailed questions about CDM technologies, teams, and best practices, as seen in many of the following figures.

CDM Successes

CDM is similar to any IT discipline. There are successful programs and programs that fail outright. However, success and failure are more often a matter of degree, in that some aspects of a program succeed while other aspects fail. The program continues into the future and users improve the lackluster parts as they go.

To get a sense of which aspects of CDM are currently succeeding or failing, our survey presented two open-ended questions that allowed respondents to describe their successes and failures in their own words. (See Figures 11 and 12.) Note that these questions were posed to respondents who reported having direct exposure to CDM; they speak from real-world experience.

Figure 11 assembles several representative comments about aspects of CDM that are succeeding today. The survey population is dominated by people who work in decision-making disciplines, so it’s no surprise that their successes primarily concern reporting, analytics, data warehousing, and data integration, with additional successes for many types of operational applications and their data needs. Clearly, respondents’ successes prove that CDM is real, it works, and it provides business value.

Most organizations surveyed are already doing CDM in some form

CDM successes commonly occur in reporting, analytics, data warehousing, and data integration

Briefly list some areas where cloud data management has succeeded in your organization

Reporting

• “We have had success converting over reports and dashboards”

• “Self-service reporting”

• “Enterprise reporting, single source of truth”

Analytics

• “Customer analytics”

• “Machine learning, data science, near-real-time data”

• “Managing real-time data and analytics”

• “Using a public cloud for heavy calculations on data and wiping it after use”

Data warehousing and other databases

• “Data warehouse modernization and move to the cloud has worked well”

• “Migrating on-premises data warehouses to the cloud”

• “Data platforms, data semantics, and data integration”

• “Master data management project enabled to combine data from 27+ organizations”

• “Reducing costs and processing data faster with more resources”

Operational Applications

• “Salesforce implementation and integration with on-premises DWs”

• “Subsidiary companies that don’t own existing infrastructure”

• “Back-end office finance”

• Many mentions of marketing, sales, CRM, HR, supply chain, IoT data, mobile data

Figure 11. Drawn from the text responses of 50 respondents who have CDM experience.

CDM Failures

Figure 12 assembles several representative comments about aspects of CDM that sometimes fail. The figure also includes commentary about possible causes and remedies for the failures. Note that none of these are total failures. Instead, they are partial failures, each isolated to one aspect of CDM, so each can be corrected and the solution expanded for long-term success.

Briefly list some areas where cloud data management has FAILED in your organization

• “No failures yet.” Forty percent of respondents reported having no failures. For some, they simply have not done CDM long enough to suffer partial or complete failures. With others, it’s because their CDM solution works fine and delivers value, which is a very good sign.

• “Allowing consultants to be the only ones who know the implementation.” When attempting something that is new to you and your organization (as is often the case with cloud solutions), a tried-and-true IT practice is to hire consultants who have the experience you lack. However, for long-term success, this process must include a thorough knowledge transfer from consultants to internal personnel.

• “Consolidating all data into one place takes enterprise governance. The tools and process to do this are not mature.” Data governance (DG) is clearly a success factor for CDM. As we noted earlier, DG is especially hard in cloud and hybrid data architectures where data is distributed geographically. However, DG is mostly about people, process, and policy making—rarely about tools—so that’s where the correction should be applied.

• “Data quality. During the retrieval process the data was not accurate.” If this was a problem with the quality of data from the source systems involved in a migration to the cloud, then basic data profiling should have revealed required corrections. Besides, you should never just move data; always improve data as you move it. However, this problem might be due to faulty ETL logic. Regardless of cause, a good migration will be designed as a multiphase project that includes testing of data and logic at all phases, with rollback as a contingency.

• “Simply moving data from relational to Hadoop and using a presentation layer.” TDWI is seeing a lot of disappointment in Hadoop of late. For technical and business users who are used to mature relational databases, the light relational functionality you can retrofit onto Hadoop is not very satisfying. This explains why most replatforming today is no longer focused on Hadoop but instead on the new cloud-based databases built specifically to deliver deep relational functionality, albeit with cloud’s speed, scale, and low cost.

• “Initial phases are taking too long. No 'quick wins' for the process.” A good cloud implementation or migration plan should start with small, low-risk goals that fairly quickly demonstrate technical prowess and business value. Otherwise, everyone involved can become dispirited and management may even cancel the project.

• “Tightly integrated legacy applications are hard to get off our on-premises data management [infrastructure].” This is another reason why “lift-and-shift” migrations to the cloud don’t always work as advertised. The more mature an incumbent solution is, the more similar to new development the migration to the cloud will be. That’s because the data management components of the legacy solution will probably include platform-specific functionality such as database stored procedures, user-defined functions, and proprietary APIs, as well as hand-coded integration routines. None of that will port unaltered. Legacy-to-cloud migrations are the hardest and longest. Plan for a multiyear transition.

Figure 12. Drawn from the text responses of 46 respondents who have CDM experience.
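The data quality remedy discussed in Figure 12 (profile the source data, then test data and logic at each migration phase, with rollback as a contingency) can be sketched as follows. This is an illustrative sketch only: the function names, the `account_id` key, and the sample records are assumptions made for the example, not part of the survey or of any specific migration tool.

```python
def profile_keys(rows, key):
    """Profile a data set: row count, missing keys, and duplicated key values."""
    seen, duplicates, missing = set(), set(), 0
    for row in rows:
        value = row.get(key)
        if value in (None, ""):
            missing += 1
        elif value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    return {
        "rows": len(rows),
        "missing_keys": missing,
        "duplicate_keys": sorted(duplicates),
    }

def reconcile(source, target, key):
    """Compare source and target rows after a migration phase, matched on `key`."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    shared = set(src) & set(tgt)
    return {
        "missing_in_target": sorted(set(src) - set(tgt)),
        "unexpected_in_target": sorted(set(tgt) - set(src)),
        "mismatched": sorted(k for k in shared if src[k] != tgt[k]),
    }

def phase_passes(report):
    """A phase passes only when every discrepancy list is empty; otherwise roll back."""
    return not any(report.values())

# Hypothetical sample data for one migration phase.
on_premises = [
    {"account_id": 1, "balance": 100.0},
    {"account_id": 2, "balance": 250.0},
    {"account_id": 3, "balance": 75.5},
]
cloud = [
    {"account_id": 1, "balance": 100.0},
    {"account_id": 2, "balance": 205.0},  # value altered by faulty ETL logic
]

report = reconcile(on_premises, cloud, "account_id")
if not phase_passes(report):
    print("Roll back this phase:", report)
```

Running the profile on the source before the move, and the reconciliation after each phase, separates the two causes named in the figure: a flaw already present in the source data shows up in the profile, while a flaw introduced by ETL logic shows up only in the reconciliation.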

CDM failures involve consultants, inadequate governance, poor quality data, and poor planning

USER STORY: HYBRID DATA WAREHOUSES CAN BE CHALLENGING TO GOVERN

“About three years ago, we hired a large consulting firm to evaluate the legacy data warehouse we had at the time and to determine our future requirements for reporting and analytics. Based on their recommendations, we built a hybrid data warehouse,” said a data warehouse professional currently assigned to data governance at a financial institution.

“Our new hybrid data warehouse architecture centers around a traditional warehouse on premises, deployed on a large MPP configuration of a leading relational database brand. However, the architecture also includes large data sets for analytics and self-service on two different cloud providers, plus cloud-based tools for data integration and data governance. Over time, we’ll progressively move more data and its processing to the cloud, while maintaining the hybrid architecture.

“Data governance for the new warehouse is different from governance for the old one. Replatforming and migrating data revealed some data quality issues, and distributing warehouse components across multiple platform types—some in the cloud, some physically located on premises—makes data access tracking more challenging. That’s why I’ve been assigned to revamp data governance for data warehousing.

“My first priority in governance is to improve data quality across the distributed warehouse environment, largely for the sake of regulatory compliance, risk reduction, and financial reporting. We’ve updated our library of quality metrics and we hope to soon achieve 80% to 90% quality per data set.

“My second priority is to restructure the governance organization. In addition to the data warehouse team, we now have a warehouse governance team. It consists mostly of data stewards who are subject matter experts from the business, so they know the data and its impact on specific business processes.”

Organizational Matters

CDM Owners

The scope of CDM can be as narrow as a few simple data integration jobs that move data from SaaS apps to a data warehouse. Alternately, it can be a very broad infrastructure that provides deep interoperability and integration among multiple applications, tools, and data platforms, distributed across multiple clouds and on-premises sites. The scope of your CDM solution affects its ownership, design, funding, and maintenance. The broader the scope, the more likely it will be owned by a large centralized organization within your enterprise.

For example, according to survey results, the two most common CDM owners are the data architecture group (47%) and central IT (42%). (See Figure 13.) This makes sense because many midsize to large organizations have an enterprise data architecture (EDA) team responsible for data architectures and data standards for enterprise environments that span multiple platforms and business units, as is common with the hybrid data architectures integrated by CDM. The role of central IT varies widely, but in many modern organizations its first priority is to provide enterprise-scope infrastructure (networks, storage subsystems, racks of servers, etc.), and CDM demands hefty infrastructure.

CDM is usually owned by enterprise groups for data architecture or central IT

Ownership aside, multiple teams must contribute to the design of data structures and data integration solutions involved in successful CDM solutions. These teams include various data management groups or DataOps (34%) and the data warehouse group (27%), with possible involvement of application groups or DevOps (14%). Note that CDM is more often broad than narrow in scope, which is why it is rarely owned by a business unit or department (14%), although those organizations may be heavy users of CDM and cloud systems and therefore have data requirements that need attention.

Who primarily designs and maintains the CDM solution you work with? Select three or fewer answers

Data architecture group 47%

Central IT 42%

Data management group or DataOps 34%

Data warehouse group 27%

Application group or DevOps 14%

Business unit or department 14%

Research or analysis group 8%

Third party (e.g., managed service or cloud provider) 8%

Other 4%

Figure 13. Based on 146 responses from 74 respondents who have CDM experience (2 responses per respondent on average).

CDM Workers

This report’s survey asked respondents with CDM experience to enter the job titles of people who design and implement CDM solutions. (See Figure 14.)

CDM does not happen without architects—lots and lots of architects. On the data side of IT, this includes data architects (16%), data warehouse architects (4%), enterprise data architects (2%), and a new title: cloud data architects (2%). On the applications and systems side of IT, architects take the form of solution architects (7%), enterprise architects (6%), and systems architects (2%).

It makes sense that CDM requires so many architects. After all, data architecture is usually about how diverse data sets relate through shared data structures or the data flows and pipelines of data integration. It is also about enterprise standards that should apply to most enterprise data sets. Similarly, architectures for solutions and servers concern how diverse application modules communicate within a single application or across multiple ones. Today’s hybrid cloud environments are bursting with multiple data platforms, data sets, and applications, which require architects to make them integrate and interoperate in an organized fashion that lends itself to optimization and maintenance.

Note that data architecture is rarely green field; it’s like archeology in that architects dig into a data ecosystem to gain an understanding of its existing systems and data. From that knowledge, they envision a global design that will unify disparate data platforms and data sets at an appropriate and realistic level. Architects also suggest local changes to data models, databases, interfaces, and data standards to make data more easily shared across the multiple platforms typical of today’s modern digital enterprises, especially those that have embraced cloud computing.

Data teams contribute to CDM’s design, integration, and data requirements

Architects, engineers, analysts, and upper management make CDM happen

The cloud and its data management tend to be top heavy, organizationally speaking. This is clear from the numerous management titles revealed by the survey, including managers (15%), chief officers (8%), directors (5%), and vice presidents (4%). Among these, CDOs, CTOs, CIOs, and VPs get projects moving and keep them focused on the business goals of upper management, whereas directors and managers direct the quotidian work.

Enter the job titles of people who contribute significantly to the design and implementation of cloud data management

DATA SPECIALISTS

Data architects 16%

Data integration engineers 9%

Data warehouse architects 4%

Data analysts and scientists 3%

BI developers 2%

Cloud data architects 2%

Enterprise data architects 2%

APPLICATION or SYSTEM SPECIALISTS

Solution architects 7%

Enterprise architects 6%

System architects 2%

Database administrators 2%

MANAGEMENT

Managers 15%

Chief officers (CDO, CTO, CIO) 8%

Directors 5%

Vice presidents 4%

MISCELLANEOUS

Business owners, sponsors, users, SMEs 7%

Other 6%

Figure 14. Based on 123 responses from 54 respondents (2.3 responses per respondent on average).

Hiring and Training for CDM Skills

There are multiple ways to get the skills you need for working with clouds and CDM. (See Figure 15.)

Hiring new employees with architectural and integration experience. Finding data professionals who are truly qualified for advanced work in architecture and integration continues to be a challenge for IT and data management teams. Yet, 73% of the organizations surveyed report that they are able to do so successfully.

Fill the skills gap by hiring new people, training employees, and engaging consultants

Training existing employees for new skills in architecture and integration. Cross-training data management professionals works because these professionals enjoy learning new skills, know that it’s good for resumes and job security, and are fully capable of working with increasingly complex combinations of data platforms and user best practices, which is the very hallmark of the multiplatform hybrid data architectures under discussion here.

Depending on consultants for new skills. Many system integrators and consulting firms now have mature practices devoted to the cloud in general, including the variations of cloud data management discussed in this report. Hence, consultants are an available and reliable source of cloud skills, if you have the budget to afford them. If you tap consulting resources, be sure they work side by side with employees to ensure a knowledge transfer that will position employees to eventually take over the project successfully.

How is your organization staffing CDM design and implementation? Select all that apply

Hiring new employees with architectural and integration experience 73%

Training existing employees for new skills in architecture and integration 58%

Depending on consultants for new skills 45%

Figure 15. Based on 130 responses from 74 respondents who have CDM experience (1.8 responses on average).
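The caption's figures can be cross-checked with simple arithmetic. Because the question allowed multiple answers, the percentages sum to more than 100%, and that sum is the average number of responses per respondent. The sketch below uses only the numbers printed in Figure 15. The variable names are illustrative, and the rounding is an assumption about how the caption was derived.

```python
# Values as printed in Figure 15 of this report.
respondents = 74
shares = {
    "Hiring new employees": 0.73,
    "Training existing employees": 0.58,
    "Depending on consultants": 0.45,
}

# In a "select all that apply" question, each share is a fraction of
# respondents, so the shares may sum to more than 1.0.
responses_per_respondent = sum(shares.values())
estimated_responses = responses_per_respondent * respondents

print(f"{responses_per_respondent:.2f} responses per respondent")
print(f"{estimated_responses:.0f} responses in total")
```

Rounded to one decimal place and to a whole number respectively, these values agree with the 1.8 responses per respondent and 130 responses stated in the caption.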

Holistic Data Governance for Hybrid CDM

Data should be governed holistically across all platforms, including clouds. This is true whether data exists on premises, in the cloud, or both (as is common in today’s hybrid data architectures). It is also true whether data migrates to a cloud, originates there, migrates off a cloud, or some combination of these. Data governance should be holistic despite the extreme complexity of modern hybrid data architectures.

Holistic data governance should apply evenly across enterprise platforms. Governance policies are too often made one platform, data set, or user constituency at a time. Instead, data governors should design policies broadly so the policies apply directly to data access and use in many scenarios. This way, business and technical users can apply such policies unaltered (or updated slightly) to cover new applications, data platforms, and data sets, whether on premises, in the cloud, or in a hybrid mix. After all, “compliance is compliance” in most enterprise scenarios, especially when guided by legislated regulations (e.g., U.S. HIPAA and EU GDPR), certain data domains (consumers and patients), internal privacy policies, and enterprise requirements for stewardship and curation. As you migrate data to the cloud or generate data from new cloud systems, be sure that established data governance policies and processes are applied with minimal revisions and without creating new and potentially conflicting policies.

Consider where the data will live. Replatforming and migrations to the cloud move data physically, and that is a potential problem when data travels great distances. For example, long-standing data protection regulations set up by the European Union limit the movement of certain data domains across certain national borders. When replatforming moves data to a cloud, verify that the location of the cloud provider’s data center complies with legislated regulations that apply to your data.
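A location check of this kind can be automated as part of migration planning. The following is a minimal sketch, assuming a hand-maintained policy table. The domain names, region codes, and the `MigrationTarget` type are hypothetical. They do not reflect any provider's actual regions or any specific regulation.

```python
from dataclasses import dataclass

# Hypothetical policy table: the cloud regions in which each data domain may
# reside. Domain names and region codes are illustrative assumptions only.
ALLOWED_REGIONS = {
    "eu_customer_pii": {"eu-west", "eu-central"},
    "us_patient_records": {"us-east", "us-west"},
    "public_reference_data": {"eu-west", "eu-central", "us-east", "us-west"},
}

@dataclass(frozen=True)
class MigrationTarget:
    dataset: str
    data_domain: str
    target_region: str

def residency_violations(targets):
    """Return the targets whose region is not permitted for their data domain."""
    violations = []
    for target in targets:
        # An unknown domain has no approved regions, so it is always flagged.
        allowed = ALLOWED_REGIONS.get(target.data_domain, set())
        if target.target_region not in allowed:
            violations.append(target)
    return violations

plan = [
    MigrationTarget("crm_contacts", "eu_customer_pii", "us-east"),
    MigrationTarget("claims_history", "us_patient_records", "us-east"),
    MigrationTarget("country_codes", "public_reference_data", "eu-west"),
]

for v in residency_violations(plan):
    print(f"Review before migrating: {v.dataset} ({v.data_domain}) -> {v.target_region}")
```

In this sketch, data domains missing from the table are flagged by default, so that new data sets get a governance review rather than slipping through.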

The goal of modern DG is to govern all data and platforms via fewer, but more comprehensive, policies

Rethink governance for new platforms that enable new business and technology practices. At the top of that list is self-service data access, which is a foundation for self-service data exploration, data prep, visualization, and analytics. The broad access and use of data that self-service practices assume will inevitably lead to governance infractions unless governance policies are put in place as soon as the new practices arrive. In addition to policies, new practices should also be guided more granularly by data stewards and data curators.

A governance committee should enforce enterprise-scope data standards. Data governance is not only about compliance. A mature governance program will also establish data standards for data models, integration methods, quality, and semantics. Standards give data the consistency it needs to be shared across multiple IT systems and business units. That’s important because sharing data helps the modern business succeed with single views of customers, complete data for reports and analyses, and up-to-date status information for operational processes. Governed data standards are also useful as more data travels across the hybrid data architectures that are typical of cloud scenarios.³

Multiphase Plans for Migrating Data to the Cloud⁴

Organizations of any size or maturity will already have a data warehouse deployed and in operation. Modernizing a warehouse regularly involves migrating data from platform to platform, increasingly from on-premises to the cloud. This is because replatforming is a common strategy and the cloud is the most modern platform available for warehouses today.

Some data warehouse modernization programs seek to simplify bloated portfolios of databases (or to take control of rogue data marts) by consolidating them onto fewer platforms. The cloud is an easily centralized and globally available platform, which makes it an ideal target for data consolidation. Hence, users who modernize a data warehouse need to plan carefully for the complexity, time, business disruption, risks, and costs of migrating and/or consolidating data, with special considerations for cloud platforms.

The key is to plan carefully before migrating data, its management, and its users to the cloud.

Don’t bite off more than you can chew. We all know the risks of a "big bang" project, where size and complexity raise the probability of failure. Such risks are easily mitigated by a multiphase project plan, which segments work into multiple manageable pieces, each with a technical goal that adds business value.

Start with a low-risk, high-value segment of work. For example, successful data migration or replatforming projects focus the first phase on a data subset or use case that is both easy to construct and in high demand by the business. Prioritize early phases so they give everyone confidence by demonstrating technical prowess and business value. Save problematic phases for later.
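One way to make this prioritization explicit is to score candidate phases and sort them. The sketch below is illustrative only. The 1 to 5 scoring scale, the `Phase` type, and the example phases are assumptions, and a real plan would weigh many more factors.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    business_value: int  # 1 (low) to 5 (high); hypothetical scoring scale
    risk: int            # 1 (low) to 5 (high); hypothetical scoring scale

def order_phases(phases):
    """Sort so low-risk work comes first; among equal risk, higher value first."""
    return sorted(phases, key=lambda p: (p.risk, -p.business_value))

candidates = [
    Phase("Legacy stored procedures", business_value=3, risk=5),
    Phase("Sales dashboard data mart", business_value=5, risk=1),
    Phase("Finance reporting tables", business_value=4, risk=3),
    Phase("Archived clickstream logs", business_value=2, risk=1),
]

for number, phase in enumerate(order_phases(candidates), start=1):
    print(f"Phase {number}: {phase.name} (value={phase.business_value}, risk={phase.risk})")
```

With these example scores, the sales dashboard data mart is scheduled first and the legacy stored procedures last, matching the advice to save problematic phases for later.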

Note that you’re not just migrating data. You’re also migrating business processes, groups of end users, reports, applications, analysts, developers, and data management solutions. Plan to migrate all these elements with minimal disruption to business operations.

New data-driven practices need new DG guidance

More than compliance, DG also sets data standards

Create a multiphase plan before migrating data warehouses and other databases to cloud

3 Learn about how governance and other data management practices can be adapted to cloud use cases in the 2017 TDWI Checklist Report: Data Management Best Practices for Cloud and Hybrid Architectures, online at tdwi.org/checklists.

4 This section of the current report is borrowed from the 2018 TDWI Checklist Report: Modernizing Data Warehouses via the Cloud, online at tdwi.org/checklists. Read that report for more information about migrations to the cloud and related topics.

Plan contingencies for risky milestones. Expect to fail, but be ready to recover via rollback. Don’t be too eager to unplug the old platforms because you may need them for rollback. It’s inevitable that old and new data warehouse platforms will operate simultaneously for months or years, depending on the size and complexity of the data, user groups, and business processes you are migrating.

Expect development work, not just migration and consolidation work. Replatforming can easily feel like new development when data being migrated or consolidated requires much work. For example, some data and solution components will "lift-and-shift" quickly, and work pretty well on the new platform. Others will not. Even when the so-called "lift-and-shift" works, developers may need to tweak data models and interfaces for maximum performance on the new platform.

When the new platform offers little or no backward compatibility with the old one, development may be needed for platform-specific components such as stored procedures, user-defined functions, and other hand-coded routines. Similarly, poor data quality and modeling should be remediated during migration; otherwise you’re just bringing your old problems into the new platform. In all data management work, when you move data you should also endeavor to improve data.

Assemble a diverse team for modernizing and replatforming a data warehouse.

• Obviously, data management professionals are required. Data warehouse modernization and replatforming usually need specialists in warehousing, integration, analytics, and reporting. When tweaks and new development are required, experts in data modeling, architecture, and data languages may be required. Don’t overlook the maintenance work required of database administrators (DBAs), systems analysts, and various IT staff.

• Affected parties must be part of the process. A mature data warehouse will serve a long list of end users who consume reports, dashboards, metrics, analyses, and other products of data warehousing and business intelligence. These people report to a line-of-business manager and other middle managers. Affected parties (i.e., managers and sometimes end users, too) should be involved in planning a data warehouse modernization. First, their input should affect the whole project from the beginning so they get what they need to be successful. Second, the new platform rollout should take into consideration the productivity and process needs of all affected parties.

• External parties may need coordination. In some scenarios, such as those for supply chain, e-commerce, and business-to-business relationships, the migration plan should stipulate dates and actions for partners, suppliers, clients, customers, and other external entities. Light technical work may be required of external parties, as when customers or suppliers have online access to reports or analytics supported by a cloud data warehouse platform.

Some migrations are simple. Others are like new development

Data migrations to the cloud affect many types of people. Your plan should protect them all

EXPERT COMMENT MODERNIZATION IS MORE THAN MIGRATING AND REPLATFORMING

David Loshin is the president of Knowledge Integrity, Inc. and a well-known expert in data management. He has much to say about system modernization and data migration: “Disruptive technologies such as Hadoop and cloud computing have motivated rallies for system modernization. What does ‘modernization’ really mean?

“A common misconception suggests that modernization consists of merely moving existing applications to a new platform. However, modernization is actually more about refactoring or reengineering an existing legacy system to align it with current business demands. Segregating business processes from their original implementations helps eliminate dependencies that were hard-coded into legacy systems to accommodate those business processes.

“Because modernization involves a more sophisticated approach to reengineering, organizations that choose to do so face many challenges that may impede their abilities to effectively modernize their data warehouse environments, whether they are on premises or in the cloud. Becoming aware of the challenges allows you to prepare a foundation for brainstorming approaches to addressing and overcoming those challenges.”⁵

CDM Best Practices

Data Platforms for Hybrid CDM

The list of tools and platforms involved in cloud data management continues to evolve, as users adjust their software portfolios, usually to include more cloud-based systems. To quantify the mix that users are working with today, our survey asked, "For the CDM solution that you use most, what types of use cases, data, and compute platforms are being supported today?" (See Figure 16.) The overall message from survey responses is that a cloud of some kind is being used by almost all user organizations. Furthermore, data is being created, managed, integrated, and used for business advantage across hybrid data architectures, as seen in the following examples.

Operational applications. Given that on premises is still the norm (compared to the cloud), it is no surprise that most of the users surveyed have a variety of on-premises systems (86%), both operational and analytic. However, the surprise is that a minority of organizations (perhaps as high as 14%) would prefer to have no significant on-premises footprint. In phone interviews, TDWI found a couple of these—new firms with digital products (software applications or third-party data). Because their products are cloud-based, it makes sense that they also made internal IT as cloud-based as possible.

Obviously, traditional operational applications are still with us (ERP, CRM, SFA, etc.) (72%), although most brands of packaged applications may now be flexibly deployed on premises or in the cloud. Among end users, the platform trend for operational applications is clearly toward software-as-a-service (SaaS) applications (67%). In fact, two-thirds of organizations surveyed already have these.

CDM involves a rich selection of tools and platforms

5 Read more from David Loshin in the 2019 TDWI Checklist Report: Overcoming Challenges to Data Warehouse Modernization, online at tdwi.org/checklists.

Cloud types. Plans for cloud migrations typically start by selecting a cloud provider. Most users choose a third-party cloud (56%) instead of a private cloud (42%), usually because of the time and cost of building and maintaining a private one. Most organizations prefer a third-party cloud because it meets their goals of outsourcing IT infrastructure, avoiding system integration, and reducing administrative costs. In a related trend, TDWI sees users increasingly looking for third-party cloud providers that have a managed service (40%) for platforms they need in the cloud, typically a favored brand of database, distribution of Hadoop, analytics platform, or tool for data integration.

Decision-making technologies. The prominent layers of the decision-making technology stack are represented amply in survey results, both on premises and in the cloud. These decision-making technologies include data warehouses (77% on premises, 53% cloud), data integration platforms (77% on premises, 49% cloud), analytics tools (74% on premises, 58% cloud), and data lakes (49% on premises, 42% cloud). Note that each category of decision-making technology is currently more prominent on premises than in the cloud, but not by much. TDWI expects the “cloud gap” to shrink as more users gain confidence in the cloud and as cloud providers and software vendors improve their offerings.

Database management systems (DBMSs). The relational DBMS is still the most common paradigm for data management—even in the cloud (72%). Even so, users continue to diversify the range of DBMS types they use, because data itself and business uses of it are diversifying (as discussed at the start of this report). In fact, nonrelational DBMSs (40%), NoSQL DBMSs (37%), and analytic DBMSs (40%) are all firmly established in users’ software portfolios, in both enterprise and cloud deployments.

Do not forget that the relational DBMS is evolving, too. In one direction, most so-called columnar, graph, and NoSQL DBMSs support the relational paradigm. In another direction, multiple DBMSs are now available as native applications on multiple public clouds, whether these are mature brands ported to the cloud or the new relational DBMSs built from the ground up for the cloud. In a related trend, object storage in the cloud (53%) is gaining popularity because it resembles the relational and DBMS paradigms but with the cloud’s speed, scale, and low cost, as well as mechanisms for embedding object storage in a number of application technology stacks.

Open source software. The operating system LINUX years ago proved the value, performance, reliability, and low cost of open source software. In recent years, many open source tools and platforms have proved themselves useful in data management, including cloud solutions (70%). In particular, Hadoop (42%) is now common in data warehousing and analytics, both on premises and in the cloud, followed by Spark (35%) and open source containers (e.g., Docker) (35%).

Every layer of the decision-making technology stack is now established in the cloud

DBMSs have adapted well to the cloud

For the CDM solution that you use most, what types of use cases, data, and compute platforms are being supported today?

OPERATIONAL APPLICATIONS

On-premises systems 86%

Traditional applications (ERP, CRM, SFA, etc.) 72%

Software-as-a-service (SaaS) applications 67%

CLOUD TYPES

Public or third-party cloud 56%

Private cloud 42%

Managed service provider 40%

DECISION TECHNOLOGIES

Data integration platforms on premises 77%

Data warehouse on premises 77%

Analytics tools on premises 74%

Analytics tools in the cloud 58%

Data warehouse in the cloud 53%

Data integration platforms in the cloud 49%

Data lake on premises 49%

Data lake in the cloud 42%

DATABASE MANAGEMENT SYSTEMS (DBMSs)

Relational DBMSs 72%

Object storage in the cloud 53%

Nonrelational DBMSs 40%

Analytic DBMSs 40%

NoSQL DBMSs 37%

OPEN SOURCE

Open source software (except ubiquitous LINUX) 70%

Hadoop 42%

Spark 35%

Containers (e.g., Docker) 35%

Figure 16. Based on 57 respondents who have CDM experience.

Data Management Tools for Hybrid CDM

Hybrid cloud environments and CDM need a comprehensive data integration infrastructure and related data management tools to manage, move, document, and provision fit-for-purpose data that’s clean, compliant, and governed. To quantify the mix of data management platforms and tools users are working with today, our survey asked, "What data management capabilities do you need for successful CDM today?" (See Figure 17.)

DM tools extended for the cloud make CDM happen and unify hybrid ecosystems

Data stores. Both operational and analytics use cases involving hybrid data architectures benefit from mid-tier data stores where extremely diverse data is integrated, aggregated, and repurposed for these use cases. For that function, the data warehouse (80%) is still alive and well, having modernized recently to leverage the strengths of new platforms, such as the cloud, Hadoop, and NoSQL.

However, most warehouses continue to be optimized for carefully cleansed, documented, and structured data sets, typically for standard reports and dashboards. Therefore, users increasingly complement a data warehouse with a data lake (48%) that is optimized for massive volumes of detailed source data, typically for operations, operational reporting, data exploration or discovery, and analytics. This way, the warehouse and lake complement each other; together they create a more comprehensive solution.

Core data management. Absolute “must haves” for CDM include core data management functions for data integration (70%), data prep (57%), and data quality (55%). Data virtualization (DV, 43%) is a near-real-time and virtual alternative to batch-driven ETL/ELT-style data integration. DV is an excellent fit for CDM because it specializes in interoperability with many far-flung sources (typical of hybrid cloud data ecosystems) while instantiating integrated data sets with close to real-time performance.

Special user features. To get full business value from CDM’s unification of hybrid data, business people and other user constituencies need self-service data access and exploration tools (50%) so they can perform modern self-service practices such as data prep and data visualization. Note that these self-service practices depend on business metadata or its equivalent (defined below) that are best provided by the data integration tools that are fundamental to CDM. Ideally, such tools will also have features tuned for multiple user types (from both technology and business) (45%) via data-sharing functions (48%) and stewardship and curation features (31%).

Real-time performance. CDM for hybrid data clouds regularly supports time-sensitive use cases, such as business monitoring, operational reporting, and data capture for IoT. For these, CDM’s data management infrastructure must support real-time data interfaces (45%) and event processing (38%).

Cross-platform interfaces. For CDM to unify a hybrid data architecture, it needs heavy doses of data pipelining (39%) and other modern approaches to cross-platform interfaces. To help optimize complex integration solutions and avoid resource conflicts, CDM’s data management tooling needs orchestration and workflow management (36%). When the number of interfaces in a hybrid environment is large, the environment may benefit from interface and API management (43%). In-memory functions (41%) are instrumental for the high speed and low I/O that data pipelining and orchestration assume. Progressively, tools are exposing their data management functions via interfaces and methods for data-as-a-service (DaaS, 29%) and microservices for data (25%).
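At its core, orchestration of a cross-platform pipeline means running steps in an order that respects their dependencies. The following is a minimal sketch using Python's standard `graphlib` module. The step names are hypothetical, and a production orchestrator would also handle scheduling, retries, and resource limits, none of which are shown here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical cross-platform pipeline: each step maps to the set of steps
# that must finish before it can start.
pipeline = {
    "extract_saas_crm": set(),
    "extract_onprem_erp": set(),
    "cleanse_customers": {"extract_saas_crm", "extract_onprem_erp"},
    "load_cloud_warehouse": {"cleanse_customers"},
    "refresh_dashboards": {"load_cloud_warehouse"},
}

def execution_order(steps):
    """Return one valid run order; raises graphlib.CycleError if steps are circular."""
    return list(TopologicalSorter(steps).static_order())

order = execution_order(pipeline)
print(" -> ".join(order))
```

The two extract steps have no dependency on each other, so either may come first. An orchestrator could run them in parallel on their respective platforms.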

Data semantics. As we’ll see in the next section of this report, CDM that satisfies the broad requirements of many users and use cases will rely on multiple approaches to data semantics, including metadata management (43%), data catalogs (39%), and business glossaries (36%). A number of functions can be built atop semantics, including those for impact analysis (32%) and data lineage (25%).
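To illustrate how impact analysis can be built atop lineage metadata, the sketch below walks a small lineage graph to find every data set downstream of a given source. The data set names and the `DOWNSTREAM` mapping are hypothetical. Real tools typically harvest lineage from integration jobs rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage metadata: each data set maps to the data sets that are
# derived directly from it.
DOWNSTREAM = {
    "crm.accounts": ["staging.customers"],
    "erp.orders": ["staging.orders"],
    "staging.customers": ["warehouse.dim_customer"],
    "staging.orders": ["warehouse.fact_sales"],
    "warehouse.dim_customer": ["report.customer_360", "warehouse.fact_sales"],
    "warehouse.fact_sales": ["report.revenue_dashboard"],
}

def impacted_by(source):
    """Breadth-first walk of the lineage graph: everything downstream of `source`."""
    seen = set()
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in DOWNSTREAM.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

print(impacted_by("crm.accounts"))
```

In this example, a change to `crm.accounts` would touch the staging table, both warehouse tables, and both reports, which is the kind of answer an impact analysis function is meant to give before a change is made.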

Data quality, integration, and virtualization are the meat and potatoes of cloud data management

The diverse platforms of hybrid environments demand many kinds of interfaces and semantics

What data management capabilities do you need for successful CDM today? Select all that apply

DATA STORES

Data warehouse 80%

Data lake 48%

DATA MANAGEMENT

Data integration 70%

Data prep for simplified data integration and analytics 57%

Data quality 55%

Data virtualization 43%

SPECIAL USER FEATURES

Self-service for data access and exploration tools 50%

Data-sharing functions 48%

Tool features tuned for multiple user types (tech, dev, biz, steward) 45%

Stewardship and curation features 31%

REAL-TIME PERFORMANCE

Real-time data interfaces 45%

Event processing 38%

CROSS-PLATFORM INTERFACES

Interface and API management 43%

In-memory functions 41%

Data pipelining 39%

Orchestration and workflow management for cross-platform data pipelines 36%

Data-as-a-service 29%

Microservices for data 25%

DATA SEMANTICS

Metadata management 43%

Data catalog 39%

Business glossary 36%

Impact analysis 32%

Data lineage 25%

Figure 17. Based on 56 respondents who have CDM experience.

Beef up your data management program as preparation for cloud success. As shown in Figure 17, organizations involved in cloud data management are using every known tool type and function—and using them deeply. This reveals how important data management is to a healthy hybrid ecosystem that requires the timely movement of current data for both operations and analytics. For users planning upgrades to data management infrastructure, to keep pace with the demanding requirements of cloud, we offer these recommendations.

Data integration upgraded to deeply support clouds is a success factor for data-driven cloud use cases

6 For more tips about optimizing data integration and other data management disciplines for the cloud, see the 2017 TDWI Checklist Report: Data Management Best Practices for Cloud and Hybrid Architectures, online at tdwi.org/checklists.

Assume that data integration involving cloud requires many interface types. Cloud-based applications and data platforms—as well as clouds themselves—typically support standard interfaces (ODBC/JDBC), proprietary APIs, call interfaces, and both standard and proprietary file or document formats. Ask your application, data platform, and cloud providers which methods they work best with; depending on the use case, the best method could be a standard interface, an API, or a file. Likewise, be sure your data integration tool set supports the interfaces, protocols, and data formats of popular cloud-based applications and data platforms in addition to the usual on-premises enterprise systems.

Seek optimal elasticity. In varying degrees, clouds allocate and re-allocate resources autonomically. This is called elasticity, and it is one of the leading benefits of cloud because it assures speed, scalability, and automatic capacity without much planning. As you evaluate cloud providers, get a sense of how elastic their system is, plus what you must do to achieve maximum elasticity via data modeling, file formats, interface selection, and bulk load options.

Integrate with third-party clouds. Firms in industries with active supply chains (e.g., retail and manufacturing) often turn to cloud-based data brokers to facilitate business-to-business communication and data exchange. Similarly, customer-facing firms turn to cloud-based data aggregators to purchase additional data about consumers. In such cases, your data management infrastructure must support whatever the third-party provider requires. For example, today’s cloud-based B2B gateways can significantly reduce the time and expense of onboarding partners and enable the exchange of data through standard protocols (such as EDI, SWIFT, and HL7) as well as via on-demand and orchestrated APIs (such as REST).

Know the interface points of new platform types. For example, TDWI sees users increasingly adopting cloud-based Hadoop, which involves multiple interface points, including MapReduce, Pig, Hive, Hbase, Spark, Drill, and Presto.

Select from multiple right-time interface types. Data coming from or going to clouds increasingly travels in real time or close to it. Therefore, your data integration tools and data management infrastructure should address multiple right-time interfaces, ranging from offline batch and microbatch to real-time streams and IoT.
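As a rough illustration of the microbatch end of that range, the sketch below groups an incoming stream of records into small fixed-size batches. The function name `microbatches` and the `batch_size` parameter are hypothetical and do not come from any particular integration tool.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def microbatches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an incoming stream of records into small fixed-size batches.

    Hypothetical helper for illustration only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Take up to batch_size records; the final batch may be shorter.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Under these assumptions, a `batch_size` of 1 approximates record-at-a-time streaming, while a very large value approximates an offline batch load, so one setting spans much of the right-time range.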

Modernize your metadata management. For years, TDWI has seen organizations depend on data integration tools for multiplatform metadata management. This trend continues with clouds, though clouds demand modern approaches to metadata. Be sure your DM infrastructure supports multiple metadata types (technical, business, and operational).

Modern Data Semantics as CDM Enabler and Unifier of HDAs

There are now several established forms of data semantics, namely metadata management and multiple forms of metadata (e.g., technical, business, and operational metadata), as well as emerging semantics for business glossaries, data profiling, and data cataloging. Because modern users want to query, browse, and search semantic descriptions of data (which leads to accessing the data), a modern semantic facility must support multiple forms of indexing. Sophisticated users (such as those working with hybrid data architectures) are using all these approaches to data semantics, often in a single project. The long list of approaches comes together in what TDWI calls the semantic array.

The modern semantic array does a lot for hybrid data architectures (HDAs). When the array is centralized and shared, it presents a comprehensive inventory of data for all the platforms of the HDA, whereas traditional semantics rarely reaches beyond a single platform. The semantic array enables the creation of custom views of distributed data, such as business metadata or glossaries for business users. The same array can also enable sophisticated data virtualization and certain applications of it, such as the logical data warehouse or logical data lake. Finally, note that semantics-driven views or virtual applications can impose architectural unity upon the siloed chaos of HDA without the risk, cost, and distraction of time-consuming data migration and consolidation projects. Hence, the semantic array is becoming one of the leading tools for the unification of HDAs and other complex, hybrid, and distributed data architectures.8

Evaluate cloud providers for integration prowess, not just data platforms

Cloud requirements for interfaces and metadata differ from on premises

Metadata still rules, but today’s requirements demand multiple semantic approaches

7 As examples of such brokers and related standards, visit web sites for the Global Data Synchronization Network (GDSN) and 1WorldSync.

Metadata management across new platforms. For example, modern metadata management tools are now appearing as software-as-a-service (SaaS) platforms. The benefits of SaaS and cloud-based tools apply to metadata management, namely minimal tool setup, tool maintenance, and capital investment, with short time-to-use and elastic scale in production. As a completely different example, metadata tools must interface with data management functions on new platforms such as SaaS operational apps, cloud-based systems and storage, Hadoop, and other open source products. Finally, given the multiplatform (hybrid) data environments becoming popular today, it sometimes makes sense to deploy a hybrid metadata repository that stores metadata on diverse platforms, although its interface makes distributed metadata look like a single source.

Drawing a holistic “big picture” is critical for HDA success. To cope with the complexity of HDAs, data management professionals need CDM tool functionality that can draw the “big picture” of a hybrid environment’s data inventory, server platforms, local data structures, and overall data architecture. To unify complex environments, data management professionals turn to technologies designed for cross-platform data operations in cloud and hybrid architectures, such as data virtualization, query federation, integration hubs, data flows, and data replication. However, they also need semantics that draw the big picture, as is done by enterprise data catalogs, business glossaries, and modern approaches to metadata management (as discussed earlier in this report). These big-picture functions contribute to multiple contexts, including CDM development, runtime deployment, data governance, and self-service data access.

Data Virtualization as an Agile and Non-Intrusive CDM Method

Data virtualization is a platform type for modern data integration. It performs many of the same transformation and quality functions as traditional data integration (namely, ETL, replication, federation, and messaging) but without the latency, redundancy, and rigidity that are typical of traditional systems.

Data federation is a subset of data virtualization. Data federation simply aggregates heterogeneous data from disparate sources and presents it as a single result set or point of access. Data virtualization goes beyond federation to also provide an abstraction layer and data services. Furthermore, federation via a DV platform supports advanced query planning, caching, in-memory, and hybrid strategies for optimizing cross-platform performance.
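The basic idea of federation can be sketched in a few lines, assuming two stand-in sources: an in-memory SQLite table playing the role of a relational warehouse and a CSV string playing the role of a SaaS export. All table, column, and function names here are hypothetical, and a real DV platform would add the query planning and caching described above.

```python
import csv
import io
import sqlite3
from typing import Dict, List


def federated_customers(db: sqlite3.Connection, csv_text: str) -> List[Dict]:
    """Return one result set drawn from a relational table and a CSV source."""
    # Source 1: rows from the relational table.
    rows = [
        {"id": row[0], "name": row[1], "source": "warehouse"}
        for row in db.execute("SELECT id, name FROM customers ORDER BY id")
    ]
    # Source 2: rows parsed from the CSV text, normalized to the same shape.
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append(
            {"id": int(record["id"]), "name": record["name"], "source": "saas_export"}
        )
    return rows


def demo() -> List[Dict]:
    """Build both illustrative sources and federate them."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    db.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    csv_text = "id,name\n3,Initech\n"
    return federated_customers(db, csv_text)
```

The caller sees a single list of uniform records and does not need to know which platform each record came from, which is the "single point of access" aspect of federation.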

Data virtualization provides abstraction and service layers for heterogeneous and distributed data. By integrating data from disparate sources, locations, and formats without replicating it, data virtualization (DV) creates a single “virtual” data layer that delivers unified data services to support multiple applications and users. For a hybrid data ecosystem, DV can unify the diversity and simplify the chaos of distributed data.

Data virtualization is a form of integration that provides abstraction and services layers as a virtual complement to physical integration

8 The semantic array is also a critical success factor for data hubs, as explained in the 2018 TDWI Checklist Report: The Modern Data Hub, online at tdwi.org/checklists.


DV offers compelling use cases for hybrid data architectures:

• Many virtualized data services can operate in real time (or close to it) to instantiate fresh data that is time-sensitive for business processes or updated repeatedly during the business day.

• Virtual views are often designed to be business-friendly and can simplify access to HDA systems and data, as required of self-service data prep, exploration, and visualization.

• DV reduces data replication and relocation, reducing network and storage loads.

• Data may be migrated through data virtualization’s abstraction layer for fast prototyping and testing. DV infrastructure also facilitates migration across an HDA’s heterogeneous platforms whether on premises or in the cloud.

• DV techniques create data interfaces and communication channels among the many components of an HDA, which in turn unifies the large-scale architecture of the HDA.

Virtualize hybrid data instead of migrating or consolidating it. A data management professional’s knee-jerk reaction to HDA complexity is to migrate data from several systems to fewer ones, and then do the time-consuming and unpredictable hard work of consolidating data sets of heterogeneous schema. However, consolidating hybrid data is not a compelling solution because it heavily consumes time and other resources.

In many cases, data virtualization is an effective alternative to data migration and consolidation. Data virtualization can create logical views in which the data looks consolidated even though it has not been migrated or physically altered. Data virtualization also involves a fraction of the time, cost, risk, and disruption of data migration and consolidation projects. It avoids arguments about data ownership and budgets. Because logical representations are agile, once the logical models are built, updating them to keep pace with evolving data platforms and use cases is faster than with physical methods. Furthermore, advancements in DV techniques and hardware speed make the instantiation of logical data structures fast enough to satisfy most service-level agreements.9
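A logical view of this kind can be approximated with SQLite, assuming two attached in-memory databases stand in for an on-premises system and a cloud system. The schema names `onprem` and `cloud`, the `orders` tables, and the view `all_orders` are all hypothetical; this is a minimal sketch of the concept, not how any DV product is implemented.

```python
import sqlite3


def build_virtual_layer() -> sqlite3.Connection:
    """Create two separate stores and one logical view that spans both."""
    conn = sqlite3.connect(":memory:")
    # Two independent in-memory databases stand in for separate platforms.
    conn.execute("ATTACH DATABASE ':memory:' AS onprem")
    conn.execute("ATTACH DATABASE ':memory:' AS cloud")
    conn.execute("CREATE TABLE onprem.orders (order_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE cloud.orders (order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO onprem.orders VALUES (?, ?)", [(1, 100.0), (2, 250.0)]
    )
    conn.executemany("INSERT INTO cloud.orders VALUES (?, ?)", [(3, 75.0)])
    # A TEMP view is used because SQLite only lets temporary views
    # reference tables in other attached databases. No rows are copied.
    conn.execute(
        """
        CREATE TEMP VIEW all_orders AS
        SELECT order_id, amount, 'onprem' AS platform FROM onprem.orders
        UNION ALL
        SELECT order_id, amount, 'cloud' AS platform FROM cloud.orders
        """
    )
    conn.commit()
    return conn
```

Queries against `all_orders` see the data as if it were consolidated, and a row added later to either underlying table appears in the view immediately because the view holds no copy of the data.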

Distributing Data Across a Hybrid Data Architecture

Hybrid data architectures are both a blessing and a curse.

As we saw in the discussions of Figures 16 and 17, relational data and database management systems are common in cloud and hybrid environments. However, big data and many new data sources generate nonrelational data of diverse structures or no structure. For example, consider the proprietary and constantly evolving record structures generated by web applications and sensors on the Internet of Things. Furthermore, unstructured data is on the increase, from traditional enterprise applications (e.g., customer conversations captured by call center apps, the claims process in insurance) and modern web apps (e.g., social media, e-commerce).

The result of all this is extremely hybrid data, which is a blessing in terms of the new insights and innovative business management methods it can inspire and enable. However, it is also a curse because its diversity leads to complex architectures and bulging software portfolios that are expensive to assemble and difficult to maintain over time.

Users find it increasingly difficult to manage data with on-premises platforms (75%, see Figure 18). This is no wonder given the rise of nonrelational data. Traditional data platforms are not going away because they still manage large volumes of highly valuable data and they fit into business processes quite ably. Users are maintaining older platforms while adding new ones to address new data requirements as well as to scale at a reasonable cost. That is why cloud and hybrid data architectures have become so diverse and complex in terms of data structures and the platform types that manage them.

As you embrace the cloud and new data, rethink how you load balance storage and processing in hybrid architectures

9 For a detailed discussion of data virtualization in the context of hybrid data architectures, see the 2017 TDWI Checklist Report: Architecting a Hybrid Data Ecosystem, online at tdwi.org/checklists.

As your organization adopts new data types and taps new data sources, do you find it increasingly difficult to capture and use data with on-premises platforms?

Yes 75%

No 25%

Figure 18. Based on 91 respondents.

Upgrade your data management architecture. Adopting the cloud affects architectures, which is an opportunity to fix some of the mistakes of the past. For example, too many DM solutions evolve into a plague of point-to-point interfaces and integrations, which results in a convoluted hairball that is hard to optimize, control, and maintain. Many organizations fix this common data management design problem by restructuring the hairball as a data integration hub with controllable spokes. With a hub-and-spoke architecture and the right hub tools, users can orchestrate data flowing through the hub to control access (for security and governance), improve data (for quality and modeling), and make data accessible to a wider range of users (via self-service and publish/subscribe). Orchestration via a hub can apply to all data, including data flowing to and from clouds.10
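A short calculation suggests why the hairball is hard to maintain. The sketch below assumes the worst case, in which every pair of systems has its own direct interface, and compares it with a hub where each system connects once. The function names are hypothetical.

```python
def point_to_point_interfaces(systems: int) -> int:
    """Worst case assumed here: every pair of systems has a direct interface."""
    return systems * (systems - 1) // 2


def hub_and_spoke_interfaces(systems: int) -> int:
    """Each system connects once, to the hub."""
    return systems
```

Under that assumption, ten systems could require up to 45 point-to-point interfaces but only 10 spokes, and the gap widens as more systems are added.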

Enterprise data in the cloud will double or triple—at least—over three years.

The proliferation of new data types and new data platforms has increased the presence of cloud-based systems in modern enterprises such that an increasing amount of data “lives” in the cloud, regardless of where it may have originated or may be going. Data management professionals need to track this development (from a capacity-planning viewpoint) as well as to assure that the data is where it needs to be for specific use cases, compliance reasons, or processing requirements. To get a sense of how the physical distribution of data will shift, our survey asked, "Which of the following best describes the location of data across your organization relative to on premises versus cloud systems?" (See Figure 19.)

Today, most enterprises surveyed have most of their data on premises. Respondents report having their data almost exclusively on premises (30%) or mostly on premises (48%). Respondents with data on cloud systems report single-digit percentages of data there today.

In three years, most will have doubled or tripled the percentage of data on cloud. Some respondents expect their data to be mostly on premises (29%) in three years. However, others think their data will be in near equal doses on premises and cloud (16%), excessively hybrid and mostly cloud (27%), or almost exclusively cloud or multicloud (21%). Conversely, a mere 2% think their data will be almost exclusively on premises.

Plan capacity for heavier loads on cloud platforms

10 For detailed discussions of hybrid data architectures and architectures for data integration, read the 2018 TDWI Best Practices Report: Multiplatform Data Architectures and the 2014 TDWI Best Practices Report: Evolving Data Warehouse Architectures, both online at tdwi.org/bpreports.


Which of the following best describes the location of data, across your organization, relative to on premises versus cloud systems? Answer for both Today and In Three Years

Location of data                                    Today   In Three Years
Almost exclusively on premises                      30%     2%
Increasingly hybrid, but still mostly on premises   48%     29%
Near equal doses on premises and cloud              9%      16%
Excessively hybrid and mostly cloud                 7%      27%
Almost exclusively cloud or multicloud              6%      21%
Don't know                                          0%      5%

Figure 19. Based on 56 respondents who have CDM experience.
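The responses in Figure 19 can be tallied to see how far the balance is expected to shift. The sketch below copies the percentages from the figure. Grouping the answers into "mostly on premises" and "mostly cloud" is an assumption made here for illustration, not a grouping used in the survey.

```python
# Percent of respondents per answer: (today, in three years), from Figure 19.
FIGURE_19 = {
    "Almost exclusively on premises": (30, 2),
    "Increasingly hybrid, but still mostly on premises": (48, 29),
    "Near equal doses on premises and cloud": (9, 16),
    "Excessively hybrid and mostly cloud": (7, 27),
    "Almost exclusively cloud or multicloud": (6, 21),
    "Don't know": (0, 5),
}

# Illustrative groupings (an assumption, not part of the survey).
MOSTLY_ON_PREMISES = [
    "Almost exclusively on premises",
    "Increasingly hybrid, but still mostly on premises",
]
MOSTLY_CLOUD = [
    "Excessively hybrid and mostly cloud",
    "Almost exclusively cloud or multicloud",
]


def share(answers, column):
    """Sum the percentages for the given answers; column 0 = today, 1 = three years."""
    return sum(FIGURE_19[answer][column] for answer in answers)
```

With these groupings, `share(MOSTLY_CLOUD, 0)` is 13 and `share(MOSTLY_CLOUD, 1)` is 48, while the mostly-on-premises share falls from 78 to 31.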

When and where data is processed moves as data migrates to new storage. As user organizations go deeper into hybrid data architectures, they will regularly redistribute data across multiple platforms of differing types. However, there is more to this process than simply re-examining storage. As data moves, where it is processed will, too. Given the increasing size of data, moving it from platform to platform for processing gets less tenable, which is why platforms and user best practices are evolving to accommodate more data processing inside a platform. This has many names: in situ processing, in-database processing, in-database analytics, push-down processing, and ELT.
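The difference between moving data out for processing and pushing processing down into the platform can be sketched with SQLite standing in for any data platform. The `sales` table and the function names are hypothetical.

```python
import sqlite3
from typing import Dict


def make_source() -> sqlite3.Connection:
    """Create a small illustrative table inside the 'platform'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("east", 15), ("west", 7)],
    )
    return conn


def totals_pulled_out(conn: sqlite3.Connection) -> Dict[str, int]:
    """Extract every row, then aggregate outside the platform (here, in Python)."""
    totals: Dict[str, int] = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0) + amount
    return totals


def totals_pushed_down(conn: sqlite3.Connection) -> Dict[str, int]:
    """Push the aggregation into the platform; only summary rows leave it."""
    return dict(
        conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    )
```

Both functions return the same totals, but the pushed-down version transfers one row per region rather than every detail row, which matters more as data volumes grow.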

Technical users will need to revise their integration and processing solutions for data aggregation, transformation, and quality. This sounds like a problem, but it’s actually an opportunity, because new platforms—both on premises and cloud—offer more options than ever for users to grow into. Users should rely on tools that support native processing on all platforms available as well as the tool’s own server.

To determine where processing is taking place today in hybrid data architectures, our survey asked, "For DM tools and platforms involved with CDM, where does the server software execute today?" (See Figure 20.)

In hybrid data architectures surveyed, most data processing takes place exclusively on premises. This is especially true of large systems, namely relational DBMSs (46%) and data warehouse platforms (37%). It is also true of the core disciplines of data management, namely data quality (35%), metadata (35%), and data integration (30%).

Data processing in the cloud is firmly established. The leader here is analytics tools and sandboxes (30%). This is no surprise because, in the recent uptick of new analytics programs, TDWI has seen many designed from the start to capture data and process it for analytics exclusively in the cloud.

The healthy amount of processing on both proves that data has truly gone hybrid. Prominent systems that process data both on premises and in the cloud include analytics tools and sandboxes (40%), data integration (37%), relational DBMSs (33%), and data warehouse platforms (30%).


Data processing is just as distributed as data storage

Data processing occurs on premises, in the cloud, or both


For DM tools and platforms involved with CDM, where does the server software execute today? Select one answer per row

Tool or platform                 On premises   Cloud   Both   Not using   Don't know
Relational DBMSs                 46%           17%     33%    2%          2%
Data warehouse platform          37%           25%     30%    3%          5%
Data quality                     35%           14%     27%    19%         5%
Metadata management              35%           12%     19%    25%         9%
Data integration                 30%           28%     37%    2%          3%
Hadoop                           25%           14%     23%    33%         5%
Data virtualization              23%           24%     21%    16%         16%
Nonrelational and NoSQL DBMSs    21%           25%     21%    26%         7%
Analytics tools and sandboxes    21%           30%     40%    4%          5%

Figure 20. Based on 57 respondents who have CDM experience. Sorted by the On premises column.

USER STORY THE MODERN DATA WAREHOUSE IS INCREASINGLY IN THE CLOUD, BUT STILL RELATIONAL

“We modernized our enterprise data warehouse two years ago, migrating it to the cloud in the process,” said Jean-Paul Saliou, senior director of business intelligence at Genesys, which develops software and services for running small to large call centers. “Our legacy warehouse was on a relational database, and most of our reporting and analytics tools require SQL support. We were also planning to expand our solutions for self-service access to data, reports, and visualization, which involves relational requirements. In a related move, our central IT group decided years ago to give preference to the cloud when planning new implementations, so we knew that the database platform for our next data warehouse should be relational and in the cloud.

“To find the right cloud-based data warehouse platform, we conducted proof-of-concept exercises with two large cloud providers and a small new one. We chose the latter because of its high performance, ease of migration, and consumption-based licensing.

“Today, all BI data is in the cloud, with nothing left on premises. Moving into the future, we’ll expand self-service, increase the adoption of our visualization tool as a spreadsheet replacement, and start a new program for machine learning and other advanced analytics on cloud platforms.”

Figure 20 column legend, in order: On premises, Cloud, Both, Not using, Don't know.


Top Ten Priorities for Cloud Data Management

In closing, let’s summarize this report by distilling from it the top ten priorities for cloud data management (CDM). Let’s also reflect on why these priorities are important. Think of the priorities as recommendations, requirements, or rules that can guide user organizations through a successful CDM program.

1. Don’t just adopt the cloud. Integrate the cloud, too. Many useful applications, tools, and data platforms are now available on a variety of clouds, and user organizations should avail themselves of these. However, modern software can become a bucket of silos just as quickly as older enterprise applications did. In particular, TDWI has seen organizations go on a shopping spree, acquiring multiple SaaS applications without a plan for integrating and sharing data across these and older applications. Cloud data management can prevent modern silos and get more business value from SaaS applications as well as the many other sources of new data discussed here.

2. Perform CDM for the benefits. The general benefits of the cloud apply to data management, especially scale, speed, elasticity, minimal setup, and maintenance performed by the cloud provider. According to this report’s survey, these favorable cloud characteristics enhance business solutions for analytics, reporting, business activity monitoring, and agility.

3. Beware of CDM’s barriers. Survey respondents redlined issues in governance, migrations, data quality, and tool maturity as common barriers to successful CDM and hybrid data architectures. Even so, these are not catastrophic failures; each is relatively easy to avoid or fix.

4. Know the successful use cases and start with these. Numerous users surveyed reported success while applying CDM to use cases in decision-making practices, especially reporting and dashboards, data warehousing, and advanced analytics. They have also successfully applied CDM to migrations of applications and data to the cloud as well as implementations of SaaS apps.

5. Consider new cloud-based data platforms. Some of the most exciting new products of recent years are the relational databases purpose-built for data warehousing and analytics in the cloud. These are built from the bottom up to tap the power of the cloud, though at a reasonable cost. One or more of these should always be on any list of data platforms to evaluate when replatforming, modernizing, migrating, or designing databases for warehousing, lakes, operations, analytics, integration, and so on.

6. Deploy significant data integration infrastructure for the cloud. Software for CDM is not just the portfolio of data platforms. You also need appropriate tooling for data integration and other data management disciplines. This is because hybrid data travels relentlessly into, across, and out of the platforms of hybrid data architectures.

7. When possible, virtualize hybrid data instead of consolidating it. In other words, use data virtualization tools to create logical views of existing, siloed data environments without migrating or consolidating their data physically or restructuring their platforms. Migration and consolidation projects are always more expensive, time-consuming, and distracting for both business and technical people than anyone expects—and data virtualization is a viable alternative to these problematic projects. Furthermore, for time-sensitive practices (business monitoring, e-commerce, operational reporting) virtualization provisions fresher data than more latent integration practices can.

Don’t let a cloud be a silo

Be mindful of CDM’s benefits, barriers, and key use cases

Cloud database solutions should be on every evaluation list

Data integration is a key success factor for CDM, including virtual approaches


8. Govern HDAs (and whole enterprises) holistically instead of per platform. Too often, data governance (DG) policies are made on a per application, data set, or use case basis. As the number of these increases (as is typical with successful hybrid data architectures), DG is unable to scale to the massive volume of policies required. Furthermore, the mass of policies confuses users and inevitably leads to contradictory policies. Holistic DG seeks to create as few policies as possible but also make individual policies that apply broadly to many apps, data sets, and use cases. With fewer policies, DG can scale to the complexity of hybrid data environments with fewer opportunities for confusion.
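One way to picture the holistic approach is to attach policies to a small set of data classifications rather than to each data set. The sketch below is purely illustrative: the classifications, policy wording, and names such as `DataSet` and `policy_for` are assumptions, not recommendations from the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """A data set on some platform, tagged with a classification (hypothetical)."""
    name: str
    platform: str
    classification: str


# A few broad policies keyed by classification, not by data set or platform.
POLICIES = {
    "pii": "mask before sharing",
    "financial": "restrict to finance roles",
    "public": "no restriction",
}


def policy_for(data_set: DataSet) -> str:
    """Look up the single policy that applies to a data set's classification."""
    return POLICIES[data_set.classification]
```

In this arrangement, adding a new data set on any platform means tagging it with an existing classification instead of writing another policy, so the policy count stays flat as the environment grows.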

9. Organize migrations to the cloud as multiphase projects. Sometimes you can "lift-and-shift" data from one system to another with minimal work to optimize that data on the new platform and sometimes you cannot. Organizations facing migrations of older applications and data to the cloud should assume that "lift-and-shift" will be inadequate because of the exaggerated differences of old and new platforms discussed earlier. When "lift-and-shift" cannot produce the desired results, create a multiphase project plan (instead of a "big bang" project) that sets proper expectations for time and other resources, then work through the project in a controlled, low-risk manner.

10. Don’t forget architecture. Without data architecture, a hybrid data environment is merely a bucket of siloed data sets. Likewise, the large number of hardware and software servers involved in hybrid IT deserves a systems architecture. Finally, CDM also merits its own architecture in the same way that a data integration solution can have an architecture that differs from the data platform architectures it reads from and writes to. Architecture, when applied to a hybrid data environment, improves its design, maintenance, optimization, usability, data quality, and data standards.

Rethink governance, migrations, and architectures


Research Co-sponsor: Datameer

Analytics in the cloud is growing rapidly, with many organizations migrating and adding new analytic workloads there. The elasticity, agility, flexibility, and growing list of powerful services in the cloud are making the choice easy for companies to land new analytics initiatives there in an effort to speed time to insight and become much more data-driven.

Yet, as this TDWI report indicates, organizations still face barriers to managing their data for analytics in the cloud. These challenges center on data security, privacy and governance, data movement, and maturity of the software solutions. The recommended approach is a hybrid data architecture (HDA) which supports the desired agility, yet eliminates the security and governance risks of wholesale migration of data to the cloud.

Datameer is a unified self-service analytics data platform for modern agile analytics that dramatically speeds time to insight. The platform combines data preparation, exploration, and analytics enabling agile creation of data pipelines to feed analytics initiatives and easy last-mile personalization of analytics data sets. The Datameer platform is cloud-native with enterprise-grade scalability, security, and governance to provide the proper support for modern hybrid data architectures.

Datameer offers deep integration with native cloud services to deliver the benefits of the cloud, while transcending these complex services for a higher level, no-coding approach to deliver data to your analytics. Integration with native compute, storage, and security services along with cloud-based data sources, destinations, and tools delivers the scalability and elasticity desired from the cloud combined with the highest security and a seamless, integrated experience.

Cloud analytics initiatives typically involve multiple personas—data engineers, data analysts, and data scientists. Datameer brings these personas together on a unified platform and tools to meet their different skills, functions, and desired approaches. It allows the team members to collaborate and share analytics data sets and pipelines in a secure and well-governed manner further accelerating analytic cycles and time to insight.

• For the data engineer, it offers key no-coding integration, transformation, security, and governance features to enrich raw data and publish 15-25x more curated data sets.

• Analysts can visually explore the data at scale to better understand it and do their last-mile preparation in an easy point-and-click, spreadsheet-style interface.

• Data scientists can also visually explore data to find relevance to their models and easily engineer features from data sets with advanced algorithmic and statistical functions.

Datameer eliminates the need for wholesale data movement with its connected, hybrid architecture, and can bridge your cloud analytics with on-premises data. It offers the highest degree of scalability on a fully elastic infrastructure. It includes deep security and strong governance to eliminate data privacy issues and is trusted by some of the largest financial services firms and insurers.

To learn more and explore how you can take advantage of Datameer for dramatically faster time to insight and accelerate your cloud analytics initiatives, please visit our website at http://www.datameer.com.


555 S. Renton Village Place, Ste. 700

Renton, WA 98057-3295

T 425.277.9126

F 425.687.2842

E [email protected] tdwi.org

TDWI Research provides research and advice for data professionals worldwide. TDWI Research focuses exclusively on data management and analytics issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of data management and analytics solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences as well as strategic planning services to user and vendor organizations.