Teradata Past, Present and Future Todd Walter CTO – Teradata Labs

Post on 21-Dec-2015


Page 1: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs

Teradata Past, Present and Future

Todd Walter, CTO – Teradata Labs

Page 2: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs

2 > 09/2009 Copyright Teradata © 2007-2009 – All rights Reserved

Page 3: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Teradata Company Highlights

• Founded 1979 – West LA
• First product to market – 1984
• First Terabyte system – 1987
• Acquired by AT&T and merged with acquired NCR – 1992
• Tri-vested as part of NCR – 1997
• Teradata Corporation – (re)Launched October 1, 2007
  > Global Leader in Enterprise Data Warehousing
    – EDW/ADW Database Technology
    – Analytic Solutions
    – Consulting Services
  > Positioned in Gartner’s Leaders Quadrant in data warehousing since 1999
• Top 10 U.S. publicly-traded software company
  > S&P 500 Member
  > Listed NYSE: “TDC”
  > NYSE Arca Tech 100
  > 2007 – $1.7B revenue
• Global presence and world-class customer list
  > More than 850 customers
  > More than 2,000 installations
• 5,500+ associates

Page 4: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Page 5: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Continuous (R)evolution

Hardware

+ Database

+ Consulting

+ Data models and reports

+ Analytic applications

Page 6: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Continuous (R)evolution

Sell the HW, give everything else away

Sell the SW with some HW to run on

Sell solving business problems – and technology to solve them

Sell applications with consulting, SW and HW inside

Page 7: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Continuous (R)evolution

80286: 90% R&D, 10% integration
i486: 70% R&D, 30% integration
Pentium: 20% R&D, 80% integration
Xeon Quad Core: 10% R&D, 90% integration

Page 8: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


[Figure: timeline of years from 1901 to 1997 (1901, 1903, 1905, 1906, 1907, 1909, 1920, 1939, 1941, 1950, 1963, 1971, 1985, 1991, 1994, 1997); labels include “An AT&T Company”, “TRADEMARK”, and “Global Information Solutions”]

Page 9: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Scale

• Every dimension of the technology must scale to meet today’s requirements
  > Data, Data model complexity, Users, Performance, queries, Data loading, …
• What is a big Data Warehouse?
• Total spinning disk?
  > 2.5 Petabytes
• Big table?
  > 150 billion rows
• Number of tables?
  > 300,000
• Insert/Update per day?
  > 5 billion records
• Identified users?
  > 100,000
• Queries per day?
  > 5 million
• Data Turnover rate?
  > 1TB per 5 seconds
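
To put the daily figures on this slide on a per-second footing, here is a rough back-of-the-envelope conversion. It assumes activity is spread evenly over a 24-hour day (real workloads peak) and decimal units for the turnover rate; both are assumptions for illustration, not statements from the talk.

```python
# Back-of-the-envelope conversion of the slide's figures to per-second rates.
# Assumption: activity is spread evenly over a 24-hour day (real workloads peak).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_day = 5_000_000
records_per_day = 5_000_000_000

queries_per_sec = queries_per_day / SECONDS_PER_DAY
records_per_sec = records_per_day / SECONDS_PER_DAY

# "1TB per 5 seconds" in GB/sec, assuming decimal units (1 TB = 1000 GB).
turnover_gb_per_sec = 1000 / 5

print(f"{queries_per_sec:.0f} queries/sec")
print(f"{records_per_sec:.0f} records/sec")
print(f"{turnover_gb_per_sec:.0f} GB/sec")
```

Even as flat averages, these work out to dozens of queries and tens of thousands of inserted or updated records every second.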

Page 10: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


The Problem

Operational Systems: Accts. Payable, Accts. Receivable, Invoicing, Sales/Orders, Finance G/L, Customer Support, HR, Payroll, Purchasing, Order Fulfillment, Manufacturing, Inventory …

Decision Makers: Marketing, Supply Chain, Finance, Risk Management, Maintenance, Sales, Operations, Inventory, Call Center …

Proliferation of Data Marts has resulted in fragmented data, higher costs, poor decisions

Page 11: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


The EDW Solution

Operational Systems: Accts. Payable, Accts. Receivable, Invoicing, Sales/Orders, Finance G/L, Customer Support, HR, Payroll, Purchasing, Order Fulfillment, Manufacturing, Inventory …

Enterprise Data Warehouse (EDW)

Decision Makers: Marketing, Supply Chain, Finance, Risk Management, Maintenance, Sales, Operations, Inventory, Call Center …

Integrated data provides consistency of data, lower costs, better decisions

Page 12: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Active Enterprise Intelligence™
An Obvious Trend: More Speed, More Users

Time scale: Days to Seconds

Strategic Intelligence
• Enterprise Data Warehouse
• BI Tools & reports
• Analysis & visualization
• Predictive Analytics

Operational Intelligence
• EDW Enterprise Integration
• Mixed workload management
• SOA, BPMS, IDEs
• Portals/composite applications

Page 13: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Active Enterprise Intelligence™ enabled by an Active Data Warehouse™

[Figure: the Teradata Warehouse serving both STRATEGIC INTELLIGENCE and OPERATIONAL INTELLIGENCE. Labels: Business Intelligence Tools and Applications; Workflow & Applications; Active Events; Active Access; Active Enterprise Integration; Active Availability; Active Workload Management; Active Load. Audiences: Suppliers, Customers, Call Center, Logistics, Marketing, Finance, Product/Services, Executive]

Page 14: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Active Enterprise Intelligence™ in Retail: Detecting Retail Fraud

Situation

Thieves make copies of cash register receipts, walk into the store, pick up merchandise, and return items for cash.

Problem

Associates in returns department did not have historical POS receipt retrieval access to verify against previously “returned” receipts or to do returns without receipts.

Solution

Associates query Teradata to quickly check if a return has already occurred on that receipt number. Also used by analysts to understand and prevent excessive returns.
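
The receipt check described in this solution can be sketched as a single keyed lookup. This is a minimal illustration, not the retailer's actual system: it uses Python's built-in `sqlite3` as a stand-in for a Teradata connection, and the `returns` table, its columns, and the receipt numbers are hypothetical.

```python
import sqlite3

def already_returned(conn, receipt_no):
    """Return True if a return was already recorded for this receipt number."""
    row = conn.execute(
        "SELECT 1 FROM returns WHERE receipt_no = ? LIMIT 1", (receipt_no,)
    ).fetchone()
    return row is not None

# Stand-in data: sqlite3 in memory, not a Teradata connection.
# Table name, column names, and receipt numbers are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE returns (receipt_no TEXT, returned_on TEXT)")
conn.execute("INSERT INTO returns VALUES ('R-1001', '2009-09-01')")

print(already_returned(conn, "R-1001"))  # a return exists for this receipt
print(already_returned(conn, "R-2002"))  # no prior return on record
```

The point of the sketch is the access pattern: a short, single-key query issued at the returns desk, which is the kind of tactical lookup the slide describes.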

Impact

(for 500-store chain)
• 100% ROI in 5 months
• Stopped a crime ring on the first day of rollout
• “Cost savings have been huge”

Page 15: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Active Enterprise Intelligence™ in Retail: Single View of the Customer Across All Channels

Situation

Needed to add Web channel for selling shoes.

Problem

Too much time and cost to keep multiple customer systems synchronized. Realized they needed just one customer database, not one more for the Web, in addition to Call Center, and POS/Store databases.

Solution

Adopted an ADW strategy, moved all customer data to one Teradata system, revised data models to cover all channels, added a web channel for commerce, used web services, and added TASM to handle multiple workload types.

Impact

• 1M tactical hits to the EDW per day from the POS, Call Center, and Web with 0.11 sec response time

• Runs simultaneously with back-office BI, reports, and ETL workloads

• Eliminated all other customer data systems

Page 16: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs

Change is Fast and Getting Faster

New Challenges for Database Technology

Page 17: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


What is the Measure of a Great Architecture?

Handle huge changes of underlying technologies and dependent components while continuing to deliver the key value proposition.

Page 18: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Page 19: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Processor Roadmap: CPU power radically increasing

[Figure: processor roadmap over 2003, 2005, 2007, 2009, 2011 with process nodes 90nm, 65nm, 45nm, 32nm, 22nm and features Hyper-Threading, Dual Core, Multi Core. Chart: SPECint2000 performance for 2000, 2004, and 2008+, comparing SINGLE-CORE PERFORMANCE with DUAL/MULTI-CORE PERFORMANCE, with a 5X marker. Source – Intel Corporation]

Page 20: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


What Does Shared Nothing Mean?

• 1985 – Every hardware part, every line of software – “pure” shared nothing
• 1995 – Multiple units of parallelism sharing CPU, memory
• 2004 – Multiple units of parallelism sharing multiple cores, memory
• 2009 – Multiple units of parallelism sharing same physical spindles – but still not sharing data
• Future – Multiple units of parallelism in Virtual machines/cloud not even knowing what physical machine it is on or sharing

Page 21: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Teradata MPP Server Architecture

• Nodes
  > Incrementally scalable to 1024 nodes
• Operating System
  > Linux, Windows, Unix
• Storage
  > Independent I/O
  > Scales per node
• BYNET Interconnect
  > Fully scalable bandwidth
• Connectivity
  > Fully scalable
  > Channel – ESCON/FICON
  > LAN, WAN
• Server Management
  > One console to view the entire system

[Figure: four SMP nodes (SMP Node1 to SMP Node4), each with CPU1, CPU2, Memory, and Operating Sys, joined by Dual BYNET Interconnects, with Server Management spanning all nodes]

Page 22: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Shared Nothing - Dividing the Work

• “Virtual processors” (vprocs) do the work
• Two types
  > AMP: owns and operates on the data
  > PE: handles SQL and external interaction
• Configure multiple vprocs per hardware node
  > Take full advantage of SMP CPU and memory
• Each vproc has many threads of execution
  > Many operations executing concurrently
  > Each thread can do work for any user, transaction
• Software is equivalent regardless of configuration
  > No user changes as system grows from small SMP to huge MPP

Page 23: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Shared Nothing - Dividing the Work

• Basis of Teradata scalability
  > Each AMP owns an equal slice of the disk
  > Only that AMP reads that slice
• No single point of control for any operation
  > I/O, Buffers, Locking, Logging, Dictionary
  > Nothing centralized
  > Exponential communication costs avoided

[Figure: each AMP manages its own Logs, Locks, Buffers, and I/O. Chart: coordination cost versus number of nodes, with a curve labeled Teradata]
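
The ownership rule on this slide (each AMP reads only its own slice) can be illustrated with a toy aggregate. This is a sketch under simplifying assumptions, not Teradata's implementation: each "AMP" is modeled as a plain Python list, the function names are hypothetical, and the loop stands in for work that would run in parallel.

```python
# Each "AMP" is modeled as a plain list holding only its own rows (toy model).
amp_slices = [
    [10, 20, 30],  # rows owned by AMP 0
    [5, 15],       # rows owned by AMP 1
    [40],          # rows owned by AMP 2
]

def local_sum_and_count(rows):
    """Work done by one AMP: it touches only its own slice."""
    return sum(rows), len(rows)

def parallel_average(slices):
    """Combine small per-AMP partial results; no AMP reads another AMP's rows."""
    # The comprehension runs sequentially here; in a real system each AMP
    # would compute its partial result concurrently.
    partials = [local_sum_and_count(rows) for rows in slices]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(parallel_average(amp_slices))
```

Only the small (sum, count) pairs cross between units, so the amount of coordination depends on the number of AMPs, not on the number of rows each one holds.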

Page 24: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Teradata Data Distribution

• Rows automatically distributed evenly by hash partitioning
  > Even distribution results in scalable performance
  > Done in real-time as data are loaded, appended, or changed.
  > Hash map defined and maintained by the system
    – 2**32 hash codes, 64K buckets distributed to AMPs
  > Primary Index (PI) column(s) are hashed
  > Hash is always the same - for the same values
  > No reorgs, repartitioning, space management

[Figure: Primary Index values pass through the Teradata Parallel Hash Function, and rows of Table A, Table B, and Table C are spread across AMP1, AMP2, AMP3, AMP4, … AMPn. Row layout: RowHash (Hash Bucket), Data Fields]
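
The distribution scheme on this slide (a 32-bit hash of the primary index, 64K buckets, buckets mapped to AMPs) can be sketched as follows. Everything beyond those three facts is an assumption made for illustration: CRC-32 stands in for Teradata's hash function, the bucket is taken from the upper 16 bits of the hash, and buckets are assigned to AMPs round-robin.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 64 * 1024  # 64K hash buckets, as on the slide

def row_hash(primary_index_value):
    """32-bit hash of the primary index value (CRC-32 is a stand-in hash)."""
    return zlib.crc32(str(primary_index_value).encode("utf-8")) & 0xFFFFFFFF

def bucket_of(hash_code):
    """Take the upper 16 bits of the 32-bit hash as the bucket (assumption)."""
    return hash_code >> 16

def build_hash_map(num_amps):
    """Assign buckets to AMPs round-robin (a simple illustrative bucket map)."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]

def amp_for(primary_index_value, hash_map):
    """Same primary index value -> same hash -> same bucket -> same AMP."""
    return hash_map[bucket_of(row_hash(primary_index_value))]

hash_map = build_hash_map(num_amps=4)
rows_per_amp = Counter(
    amp_for(customer_id, hash_map) for customer_id in range(100_000)
)
print(sorted(rows_per_amp.items()))
```

Because the mapping is a pure function of the primary index value, a row's location can be computed rather than looked up. In this sketch, changing the number of AMPs only means rebuilding the bucket map, not rehashing the values.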

Page 25: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Disk Capacity Exploding with Little Increase in Performance

[Figure: chart of disk drive bandwidth and performance per capacity against disk drive capacity]

Disk Drive Capacity: 36 GB, 73 GB, 146 GB
Disk Drive Bandwidth (MB/Sec): 5.5, 6.0, 6.4
Performance per Capacity (MB/Sec/GB): .155, .080, .044

Random I/O; 48K block; 80% read
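
The performance-per-capacity values on this slide can be approximately reproduced by dividing each drive's bandwidth by its capacity. The sketch below assumes the pairing 36 GB with 5.5 MB/sec, 73 GB with 6.0, and 146 GB with 6.4, which is how the figures appear in sequence; the function name is hypothetical.

```python
# Capacity (GB) and random-I/O bandwidth (MB/sec) pairs as listed on the slide.
# The pairing is assumed from the order in which the figures appear.
drives = [(36, 5.5), (73, 6.0), (146, 6.4)]

def bandwidth_per_gb(capacity_gb, bandwidth_mb_per_sec):
    """MB/sec of bandwidth available per GB stored on the drive."""
    return bandwidth_mb_per_sec / capacity_gb

for capacity_gb, bandwidth in drives:
    ratio = bandwidth_per_gb(capacity_gb, bandwidth)
    print(f"{capacity_gb} GB: {bandwidth} MB/sec, {ratio:.3f} MB/sec/GB")
```

The computed ratios land close to the chart's .155, .080, and .044. Capacity roughly quadruples across the three drives while bandwidth rises by about 16%, so the bandwidth available per stored gigabyte falls to less than a third.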

Page 26: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Platform Change

• Focus used to be
  > Optimization of expensive CPU cycles
  > Micro-management of precious disk space
• Now
  > Manage I/O
  > Balance CPU power to the I/O capacity
  > Find new ways to optimize I/O, trading for CPU use as necessary
  > Pulling 2.5GB/sec per node continuous
• Discontinuity coming
  > SSDs become price competitive and reliable

Page 27: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


File System

• Teradata wrote a new rule book
  > Old one written by IBM 35 years ago, used by all mainstream DBMSs today - except Teradata
• File system built of raw slices
• Rows stored in blocks
  > Variable length
  > Grow and shrink on demand
  > Rows located dynamically
    – May be moved to reclaim space, defrag
  > Maximum block size is configurable
    – System default or per table
    – 8K to 128K
    – Change dynamically
• Indexes are just rows in tables
• Has evolved from direct management of single spindles to completely virtualized storage, not even knowing spindle location

Page 28: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Workload Management Evolution

• 1984 – pure timeshare
• 1987 – 4 priorities, defined by user
• 1995 – multiple priorities in multiple partitions
• 2000 – weighted workload groups
• 2004 – queuing, reserved resources, focus on tactical work
• 2009 – Visualization and detailed workgroup management
• Future – Set service level goals, our job to deliver
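
As a rough illustration of what "weighted workload groups" can mean, the sketch below splits CPU in proportion to each group's weight and recomputes the split when a weight changes. It is a generic proportional-share calculation, not Teradata's algorithm; the group names and weights are hypothetical.

```python
def cpu_shares(weights):
    """Split CPU in proportion to each workload group's weight."""
    total = sum(weights.values())
    return {group: weight / total for group, weight in weights.items()}

# Hypothetical workload groups and weights, for illustration only.
weights = {"tactical": 60, "reporting": 30, "load": 10}
print(cpu_shares(weights))

# Changing a weight while work is in flight simply changes the next split.
weights["load"] = 40
print(cpu_shares(weights))
```

With relative weights, raising one group's weight automatically shrinks every other group's share, so an administrator adjusts one number rather than rebalancing all of them.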

Page 29: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Active Workload Management

• Manage workloads
  > Reduce server congestion
• Dynamically adjust in-flight task priority
  > Turn the dial – change priorities
• Fast active access queries
  > Performance, performance, performance
• Get maximum throughput

[Figure: Active Data Warehouse with speed dials (Speed 10, Speed 25, Speed 60, Speed 75) for the workloads Active Events, Active Access, Query and Reporting, and Active Load]

Page 30: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


TASM Reporting/Monitoring - 13.10

Page 31: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Availability Requirements

[Figure: number of Users (log scale, 10 to 1,000,000) against availability requirements, spanning Strategic Intelligence to Operational Intelligence. User groups: IT, Finance, Planners, Power Users, Data Miners; Executives, Middle Managers, Marketing; Category Mgr, Line Managers, Service Managers; Operational Employees; Consumers, Suppliers, B2B. Availability levels: Business Critical, Mission Critical, Dual Active]

Page 32: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


“Always ON” – An Elusive Challenge

• Unplanned downtime
  > Hardware faults
  > Software faults
  > Hangs
• Planned downtime
  > Software upgrade
  > Hardware upgrade
  > Data center maintenance
• “Disasters”
  > Multi-component failures
  > Building disasters
  > Area disasters
• And optimize resource value to the business
• And avoid hidden costs and surprises
  > E.g. major performance variations
• Major opportunity for research – but must be holistic
  > Reaches far beyond core database

Page 33: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Real time Operational Actions

1. Customer makes multi-segment travel reservation
2. Flight rerouted, causing missed connections.
3. What is the customers’ flying history?
4. How profitable is each customer?
5. Which customers experienced delays or other problems in last 6 months?
6. Customer re-booked and notified.
7. Airport operations adjusted

[Figure labels: Strategic Intelligence; Operational Intelligence; “Active” Enterprise Data Warehouse; WebSphere MQ, Oracle AQ, Microsoft MSMQ]

Page 34: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


Real Time Customer Management

1. Customer inserts Total Rewards Card at Slot Machine
2. What is the customer’s past spending history in all our casinos?
3. What is a significant loss for this person based on market segment, past and predicted behavior?
4. Is this customer approaching the predicted loss rate for their segment?
5. What offers are available for this customer?
6. Message sent to floor Luck Ambassador with customer offer to prevent additional losses.

[Figure labels: Strategic Intelligence; Operational Intelligence; “Active” Enterprise Data Warehouse; TIBCO]

Page 35: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs


That’s a Wrap!

• Business requires a new level of decision making
  > Many more decisions by many more people much faster
  > Current representation of the state of the enterprise
• Data Warehouse must evolve to support the requirements of Active Enterprise Intelligence
• Technology must evolve to deal with the new requirements
  > Rich area for research and innovation
  > Change view of what data warehouse/BI means
• Teradata driving an aggressive roadmap to meet real business requirements

Page 36: Teradata Past, Present and Future Todd Walter CTO – Teradata Labs
