starting small (teradata appliance family)

Post on 20-Oct-2014

STARTING SMALL, BUT THINKING LARGE AND SCALING FAST

A Guide to the Teradata Appliance Line

WILLIAM MCKNIGHT

www.mcknightcg.com

INTRODUCTION

As companies take steps to manage their information asset, choosing a platform and database management system (DBMS) is absolutely fundamental. In fact, the platform is the foundation of architecture and business intelligence and the starting point for tool selection, consultancy hires, and more. In short, a company’s platform is key in defining its information culture.

These platform decisions are taking place in a challenging context. Data volumes continue to soar as history accumulates, syndicated data is collected, and new sources with more detailed data are added. Furthermore, the communities consuming the data continue to grow, expanding well beyond usual company boundaries to customers, supply-chain partners, and even the internet. Companies need to make sure they choose a proven platform not just for initial, known requirements but also one with the scalability to meet future, to-be-determined requirements as data, users, and applications grow.

These challenges no longer affect only the big players. Mid-size companies[1] have data management needs similar to those of Fortune companies, albeit with reduced data volume and, sometimes, fewer users. They, too, need:

Rapid development that can be built upon over time.

Quality data that is available.

Architectures that provide low, long-term total cost of ownership (TCO).

Good query performance that results in increased interactive usage.

Ability to get to real-time feeds.

A platform to support advanced workload management.

A scalable path forward as data, users, and application needs grow.

Table of Contents

Introduction

Information is of Major Importance

The Enterprise Data Warehouse Approach

Mid-market Data Warehousing and BI

Criteria for an EDW Platform Selection

Teradata Innovations for Performance and Availability

The Teradata Data Warehouse Appliance

The Teradata Data Mart Appliance

The Teradata Extreme Data Appliance

Scaling to the Teradata Active EDW

Conclusion

About the Author

provided by: William McKnight

1 For purposes of this paper, mid-size companies will be defined as companies with $1B to $50B in annual revenue.

Complicating matters, in selecting the platform to support their data, companies now face far more variations on, and distinct departures from, the traditional online transactional processing (OLTP) DBMS than ever before.

In 2008, in concert with this increase in information management needs, Teradata Corporation – a successful data warehouse provider for the one-terabyte+ market for nearly 30 years – began making its technology affordable to the mid-market customer. This move is ushering in a new era of scalability and performance in that segment, as the #1 platform provider is poised to provide its leadership and influence for companies off, as well as on, the Fortune charts.

INFORMATION IS OF MAJOR IMPORTANCE

The battleground on which many industries engage today extends well beyond customary core competencies to the collection, management, and use of data. As proof, even in a subdued economy, business intelligence remains at the forefront of IT-related spending. This is in large part due to the direct and indirect applicability of information to the organization’s bottom line.

Information must be flexible, manageable, and actionable. And it must be all these things within the framework of a multitude of IT-related realities, such as:

Multiple, complex applications serving a variety of users

Exploding data size

Data latency becoming intolerable as real-time information becomes necessary to compete

As data is put to profitable use and platforms evolve to handle the workload, it is only a matter of time until new demands to leverage data arise, adding requirements on a seemingly ongoing basis. But there is a natural flow to information management maturity that Teradata is not only well aware of, but has helped define over the years. Today, this maturity includes using data to take advantage of relationships that extend beyond the company walls.

But acknowledging these requirements and realities is one thing; being able to support them is another.

THE ENTERPRISE DATA WAREHOUSE APPROACH

The efficacy of having a centralized data store with quality, integrated, accessible, high-performance, and scalable data cannot be denied, regardless of company size. Yet some organizations with a decentralized orientation believe that initiating an enterprise data warehouse (EDW) is too difficult an endeavor without a quick and clear ROI. The assumption here is that EDW architecture implementation has an unbearable, year-plus timeline when it comes to delivering business value.

Fortunately, this is no longer the reality. Today, an EDW represents a commitment to organize the information of the corporation, regardless of its size, in the most efficient manner possible. It is not put in place using a big bang approach, but is instead primarily accomplished by meeting the objectives of a key subject area, data source, business objective, or user department, and then progressively building the environment with scalability from there. Another manageable route to EDW implementation is the consolidation of smaller, independent data marts into a centralized, money-saving architecture.

The most efficient way to accomplish EDW objectives is to build a data warehouse that solves specific needs, but does so in a manner that leverages previous investment in the architecture, tools, processes, and people, and does not prohibit future growth. This enables an efficient, programmatic approach to data warehousing created to serve information to the enterprise.

Committing to an EDW approach is also particularly important for mid-market organizations that are just starting to develop their architectural foundations. Too often these decisions are made within departmental boundaries without consideration of an overarching data warehousing strategy. This has led many organizations down the path of data mart proliferation – the creation of non-integrated data sets developed to address specific application needs, usually with an inflexible design. In the vast majority of cases, data mart proliferation is not the result of a chosen architectural strategy, but a consequence of the lack of one. In either case, bringing the EDW approach to bear at the outset of such development is critical to taking advantage of its vast promise economically down the road.

MID-MARKET DATA WAREHOUSING AND BUSINESS INTELLIGENCE

Business intelligence vendors have been slow to respond to the needs of the midmarket. This factor, combined with midmarket companies’ more limited budgets, has meant that many in the midmarket have had to take different paths to business intelligence than the Fortune 50. In fact, the multi-layered architectures and multi-quarter “timeframes-to-value” were barriers to business intelligence in the midmarket long before the current recession began.

Teradata is among the vendors that have mobilized solutions with the realities of the mid-market in mind. Enterprise-class business intelligence with simplicity and scalability is available now in a midmarket-oriented suite of affordable platforms delivered in the increasingly popular preconfigured “data warehouse appliance” model. The data warehouse appliance is a preconfigured combination of hardware, software, OS, DBMS, and storage for data management requirements. A low TCO for a mixed-workload data warehouse environment follows from the appliance model.

Naturally, vendors can mix and match their components to best suit certain workloads. Without compromising on the criteria that experienced practitioners know to be required for success at any level, Teradata has done this with the Teradata® Data Warehouse Appliance, Teradata Data Mart Appliance, and the Teradata Extreme Data Appliance. All are designed and priced to meet midmarket needs, or the departmental needs of the larger enterprise.

Teradata appliances use the proven and powerful Teradata DBMS. They also benefit from Teradata’s industry-leading integration with multiple data integration and BI tools and vendors.

CRITERIA FOR AN ENTERPRISE DATA WAREHOUSE PLATFORM SELECTION

The decision process for choosing a data warehouse platform should go well beyond the usual consideration of the operational DBMS vendor. It should weigh the nuances of several potential requirements, including:

The immediate availability of information

Cross-functional complexity

The level of query concurrency

The scalability needs of the platform

The functionality of the DBMS

Given the state of the marketplace, the technical architecture for a data platform in a mid-size-or-larger company should be:

Scalable – The solution should be scalable in both performance capacity and incremental data volume growth. The solution should scale in a near-linear fashion and allow for growth in database size, the number of concurrent users, and the complexity of queries. Understanding hardware and software requirements for such growth is paramount.

Powerful – The platform should be designed for complex decision support in an advanced workload management environment. The optimizer should be mature enough to support every type of query with good performance and should determine the best execution plan based on changing data demographics. Check on conditional parallelism and the causes of variations in the parallelism deployed, and on dynamic and controllable prioritization of resources for queries.

Manageable – The solution should be manageable with minimal support tasks requiring DBA/System Administrator intervention. There should be no need for the proverbial army of DBAs to support an environment, and the system should provide a single point of control to simplify administration. You should be able to create and implement new tables and indexes at will.

Extensible – Look for flexible database design and system architecture that keeps pace with evolving business requirements and leverages existing investment in hardware and software applications. Know the answers to questions such as: What is required to add and delete columns? What is the impact of repartitioning tables?

Interoperable – The system should have integrated access to the web, internal networks, and corporate mainframes.

Recoverable – In the event of component failure, the system must continue providing value to the business. It also should allow the business to selectively recover the data to points in time – and provide an easy-to-use mechanism for doing this quickly.

Affordable – The proposed solution (hardware, software, services) should provide a relatively low TCO over a multi-year period.

Flexible – The system should provide optimal performance across the full range of normalized, star, and hybrid data schemas with large numbers of tables. Look for proven ability to support multiple applications from different business units, leveraging data that is integrated across business functions and subject areas.

Robust in Database Management Systems Features and Functions – Make sure there are DBA productivity tools, monitoring features, parallel utilities, a robust query optimizer, locking schemes, a security methodology, intra-query parallel implementation for all possible access paths, chargeback and accounting features, and remote maintenance capabilities.

There are few vendors who understand what it means to build mission-critical, well-performing data platforms that meet all of the above criteria. Of course, the vendor itself should be a major consideration, especially in these days of consolidation. When making this all-important decision, consider a vendor’s financial stability, the importance of data management to their overall business strategy, and their continued research and development in these areas towards a well-developed and relevant vision.

TERADATA INNOVATIONS FOR MAXIMUM PERFORMANCE AND AVAILABILITY

One of the hallmarks of Teradata’s unique approach is that all database functions (table scan, index scan, joins, sorts, insert, delete, update, load and all utilities) are done in parallel all of the time. There is no conditional parallelism. All units of parallelism participate in each database action.

Also of special note is the table scan. One of Teradata Database’s main features is a technique called synchronous scan, which allows scan requests to “piggyback” onto scans already in process, so maximum concurrency is achieved through maximum leverage of every scan. Teradata Database also keeps a detailed profile of the data under management so that it can efficiently scan only the limited storage where query results might be found.[2]
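The piggybacking idea behind a synchronous scan can be illustrated with a toy model. The sketch below is not Teradata's implementation; the `Query` class, the `shared_scan` function, and the step-based arrival schedule are all hypothetical names chosen for illustration. It assumes a table stored as a list of blocks and a single circular scan: a query that arrives mid-scan starts consuming at the current block and wraps around to pick up the blocks it missed, so each physical read serves every query active at that moment.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """A scan request (hypothetical model, not a Teradata structure)."""
    name: str
    seen: list = field(default_factory=list)  # blocks delivered so far


def shared_scan(blocks, arrivals):
    """Serve all queries from one circular scan over `blocks`.

    `arrivals` maps a step number to the names of queries that arrive
    just before that step's block read. Returns the finished queries
    and the number of physical block reads performed.
    """
    pending = dict(arrivals)
    active, finished = [], []
    reads, pos, step = 0, 0, 0
    while active or pending:
        # New queries join the scan at its current position.
        for name in pending.pop(step, []):
            active.append(Query(name))
        if active:
            block = blocks[pos]
            reads += 1  # one physical read, shared by every active query
            for q in active:
                q.seen.append(block)
            # A query is done once it has seen every block exactly once.
            finished.extend(q for q in active if len(q.seen) == len(blocks))
            active = [q for q in active if len(q.seen) < len(blocks)]
            pos = (pos + 1) % len(blocks)  # wrap around for late joiners
        step += 1
    return finished, reads


if __name__ == "__main__":
    done, reads = shared_scan(["b0", "b1", "b2", "b3"], {0: ["q1"], 2: ["q2"]})
    for q in done:
        print(q.name, q.seen)
    print("physical reads:", reads)
```

In this example the second query joins at the third block, and the two queries together cost six block reads rather than the eight that two independent full scans would need.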

The Teradata optimizer intelligently runs steps in a query in parallel wherever possible. For example, for a three-table join requiring three table scans, Teradata Database would start all three scans in parallel. When the scans of tables B and C finished, it would begin the join step while the scan of table A completed.
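The three-table example can be sketched in a few lines of Python. This is only an illustration of overlapping query steps, not a representation of the Teradata optimizer: the `TABLES` data, the `scan` and `join` helpers, and the use of a thread pool are all assumptions made for the sketch. All three scans are submitted at once, and the join of B and C starts as soon as those two scans return, without waiting for A.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory tables: lists of (key, value) rows.
TABLES = {
    "A": [(1, "a1"), (2, "a2"), (3, "a3")],
    "B": [(1, "b1"), (2, "b2")],
    "C": [(2, "c2"), (3, "c3"), (1, "c1")],
}


def scan(table_name):
    """Full scan of one table, returned as a key -> value mapping."""
    return {key: value for key, value in TABLES[table_name]}


def join(left, right):
    """Inner join of two key -> value mappings on the key."""
    return {k: (left[k], right[k]) for k in left if k in right}


def three_table_join():
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Start all three scans in parallel.
        fut_a = pool.submit(scan, "A")
        fut_b = pool.submit(scan, "B")
        fut_c = pool.submit(scan, "C")
        # Join B and C as soon as both are available; A may still be scanning.
        bc = join(fut_b.result(), fut_c.result())
        # Only now wait for A and finish the join.
        abc = join(fut_a.result(), bc)
    # Flatten ("a", ("b", "c")) into ("a", "b", "c").
    return {k: (a, b, c) for k, (a, (b, c)) in abc.items()}


if __name__ == "__main__":
    print(three_table_join())
```

With tables this small the overlap is invisible, but the control flow is the point: the first join step depends only on the scans of B and C, so it does not have to wait for the scan of A.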

Teradata’s optimizer is grounded in the knowledge that every query will be executing on a massively parallel processing (MPP) system. Such systems are generally acknowledged as the preferred architecture for analytic query, business intelligence, and data warehousing. Teradata systems do not share memory or disk across the nodes (the collections of CPU, memory, and bus connected in an MPP environment). Sharing disk and/or memory creates overhead. Sharing nothing minimizes disk access bottlenecks.
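A minimal sketch of the shared-nothing idea follows. It assumes rows are spread across nodes by hashing a key, which the paper does not spell out; the hash function used here (SHA-256), the `node_for`, `distribute`, and `parallel_sum` names, and the four-node layout are all illustrative choices, not Teradata's actual scheme. Each node owns its partition outright, computes a partial result from its own rows only, and the partial results are combined at the end.

```python
import hashlib
from collections import defaultdict


def node_for(key, node_count):
    """Map a row key to a node number (illustrative hash, not Teradata's)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(rows, node_count):
    """Place each (key, value) row on exactly one node; nothing is shared."""
    nodes = defaultdict(list)
    for key, value in rows:
        nodes[node_for(key, node_count)].append((key, value))
    return nodes


def parallel_sum(nodes):
    """Each node sums its own rows; the partial sums are then combined."""
    partials = {n: sum(value for _, value in rows) for n, rows in nodes.items()}
    return sum(partials.values()), partials


if __name__ == "__main__":
    rows = [(i, i * 10) for i in range(100)]
    nodes = distribute(rows, node_count=4)
    total, partials = parallel_sum(nodes)
    print("rows per node:", {n: len(r) for n, r in sorted(nodes.items())})
    print("total:", total)
```

Because no node reads another node's rows, adding nodes adds scan capacity without adding contention for a shared disk or memory pool, which is the property the paragraph above describes.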

The Teradata BYNET®, the node-to-node interconnect, which scales linearly to more than a thousand nodes, has fault tolerant characteristics designed specifically for a parallel processing environment.

Hot-pluggable components allow you to replace components without affecting your applications. If a component fails, built-in redundancy allows the application to continue running in Teradata systems. Furthermore, the growth path in the Teradata environment is a function of easily adding nodes and disk storage.

With Teradata utilities, continual feeding without table-level locks can be done with multiple feeders at any point in time. The impact of the data load on system resources is customizable, and the process ensures no input data is missed regardless of the allocation.

Teradata has extended the concepts that are interesting to the midmarket and to a single-application focus from their Active Enterprise Data Warehouse into their new appliance family. In so doing, Teradata has ushered in true business intelligence affordability for the midmarket.

THE TERADATA DATA WAREHOUSE APPLIANCE

The Teradata Data Warehouse Appliance supports the EDW approach to building the data warehouse and is the Teradata appliance family flagship product. It is suitable for an upper midmarket true EDW or as the platform for a focused application. With four MPP nodes per cabinet and scaling up to 11 cabinets with 12.6 terabytes each, the Teradata Data Warehouse Appliance can manage up to 140 terabytes[3], with the workload characteristics of a typical data warehouse – multiple, complex applications serving a wide variety of users. The experience can begin at two terabytes of fully redundant user data on two nodes and grow, node-by-node if necessary, up to 46 nodes. The nodes can be provided with Capacity on Demand as well, which means the capacity can be configured into the system unlicensed until it is needed. This makes adding the capacity simple.

2 Teradata Intelligent Scanning
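The capacity figures quoted for the Data Warehouse Appliance can be checked with simple arithmetic. The sketch below uses only numbers stated in this paper (four nodes per cabinet, up to 11 cabinets, 12.6 terabytes per cabinet, and the footnoted average of 30% more user storage with compression); the constant and function names are hypothetical. It makes no attempt to reconcile the 46-node maximum quoted above with the 44 nodes that four nodes times 11 cabinets would give.

```python
NODES_PER_CABINET = 4    # from the paper
MAX_CABINETS = 11        # from the paper
TB_PER_CABINET = 12.6    # from the paper
COMPRESSION_GAIN = 0.30  # footnoted average; actual gains vary with the data


def capacity_tb(cabinets=MAX_CABINETS, compressed=False):
    """User data capacity in terabytes for a given number of cabinets."""
    raw = cabinets * TB_PER_CABINET
    return raw * (1 + COMPRESSION_GAIN) if compressed else raw


def nodes(cabinets=MAX_CABINETS):
    """Node count implied by full cabinets only."""
    return cabinets * NODES_PER_CABINET


if __name__ == "__main__":
    print(f"uncompressed: {capacity_tb():.1f} TB")
    print(f"with average compression: {capacity_tb(compressed=True):.1f} TB")
    print(f"nodes in {MAX_CABINETS} full cabinets: {nodes()}")
```

Eleven cabinets at 12.6 terabytes each come to 138.6 terabytes, which the paper rounds to 140; applying the footnoted 30% average compression gain raises that to roughly 180 terabytes.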

THE TERADATA DATA MART[4] APPLIANCE

The Teradata Data Mart Appliance is a more limited capacity equivalent of the Teradata Data Warehouse Appliance and is ideal for the data warehouse or another of the larger data stores in the midmarket. It’s a single node, single cabinet design with a total user data capacity of six terabytes[5]. It should be noted, though, that a single node environment comes with the potential for downtime in the unlikely event that the node fails – there is no other node to cover for the failure.

THE TERADATA EXTREME DATA APPLIANCE

Though not strictly a mid-market need, the Teradata Extreme Data Appliance is also part of the Teradata appliance family, and represents affordability for the management of large data. It out-scales even the Teradata Active EDW platform. While the Active EDW tops out at 10 petabytes, the Extreme Data Appliance will scale to 50 petabytes. A system of this size would have fewer concurrent access requirements because access is spread out across the large data set.

The Teradata Extreme Data Appliance is designed for high-volume data capture such as that found in click stream capture, call detail records, high-end POS, scientific analysis, sensor data, and any other specialist system useful when the performance of straightforward, non-concurrent analytical queries is the overriding selection factor. It also can serve as a surrogate for near-line archival strategies that move interesting data to slow retrieval systems, and it will keep this data online.

SCALING TO THE TERADATA ACTIVE ENTERPRISE DATA WAREHOUSE

Any code built for a Teradata appliance is completely portable to the Teradata Active Enterprise Data Warehouse, in case you need to go beyond the chosen Teradata appliance. This platform for data warehousing, with nine nodes per cabinet and scaling up to 1,024 nodes, has a total disk capacity of 10 petabytes. A superset of features is part of the Teradata Active EDW, including automatic node failover and recovery, active system management with full performance continuity with hot standby nodes, fallback, backup and recovery, and dual active systems. The system is designed to manage the most mission-critical systems. The need for such management could be one reason to upsize to this platform. Another reason, except for those using the Extreme Data Appliance, might be data sizing.

CONCLUSION

From straightforward mid-market data warehouse requirements to the global enterprise and beyond, Teradata’s platforms are built on a foundation that has served the largest and most complex environments in the world for nearly 30 years. By meeting the needs of the midmarket with the proven appliance model, as well as with a flexible combination in nodes, maximum data size, storage and cabinet configurations, and high availability features, Teradata is showing its leadership in the midmarket, as well as in the larger-company arena.

Teradata solutions allow you to start small, think big, and scale fast in terms of an EDW approach to data management and, if required, migrate to an Active EDW platform. The Teradata Data Mart Appliance is the robust selection for the mid-market data warehouse or data store. The Teradata Data Warehouse Appliance takes the data mart appliance benefits to another level, and the Teradata Extreme Data Appliance has the upper end of data size covered for any enterprise.

Whatever your information needs, Teradata’s principles of scalability, power, manageability, extensibility, interoperability, manageable long-term TCO, flexibility, and robust features and functions support the possibilities.

3 Numbers do not assume compression, which should allow for 30% more user storage on average.

4 Data “Mart” (vs. Warehouse) is a product label only and is meant to address scale of the project and not the polar opposite of a Data Warehouse.

5 However, as noted, once the limits are approached, porting to the Teradata Active Enterprise Data Warehouse is an attractive option.

ABOUT THE AUTHOR

William functions as Strategist, Lead Enterprise Information Architect, and Program Manager for complex, high-volume full life-cycle implementations worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, data quality and operational business intelligence. Many of his clients have gone public with their success stories. William is a Southwest Entrepreneur of the Year Finalist and a frequent best practices judge; he has authored more than 150 articles and white papers and given over 150 international keynotes and public seminars. His team’s implementations from both IT and consultant positions have won Best Practices awards. William is a former IT VP of a Fortune company, a former engineer of DB2 at IBM and holds an MBA.

Teradata and the Teradata logo are registered trademarks of Teradata Corporation and/or its affiliates in the U.S. and worldwide.

EB-5933 > 0609

5960 W. Parker Rd., Suite 278-133

Plano, TX 75093

Tel (214) 514-1444

William can be reached at 214-514-1444 or [email protected].