#TDPARTNERS16 GEORGIA WORLD CONGRESS CENTER Born To Be Parallel, and Beyond Part II Carrie Ballinger Rich Charucki Performance Engineer, Teradata Labs Teradata Fellow, Teradata Labs

Post on 13-Aug-2020

Page 1: Born To Be Parallel, and Beyond Part II · Teradata FastExport Utility Built on the Original Parallel Design • A high-throughput utility designed to return large volumes of data

#TDPARTNERS16 GEORGIA WORLD CONGRESS CENTER

Born To Be Parallel, and Beyond

Part II

Carrie Ballinger, Performance Engineer, Teradata Labs

Rich Charucki, Teradata Fellow, Teradata Labs

Page 2

At Teradata, we believe…

Analytics and data unleash the potential of great companies

Page 3

Agenda

• Part I
  • The Considerable Contribution of the BYNET®
  • A Very Fluid File System

• Part II
  • Multiple Dimensions of Parallelism
  • Gracefully Managing the Flow of Work
  • The Roots of Prioritization
  • Conclusion – Emerging Opportunities

Page 4

Section: Multiple Dimensions of Parallelism

Page 5

1) Query Execution Parallelism

Parallel Execution Across All AMPs: Each AMP is One Unit of Parallelism

• Teradata was designed to maximize throughput of each individual request and remove single points of control

• Architected so queries can benefit from multiple dimensions of parallelism

[Figure: AMP 1, AMP 2, and AMP 3, each shown with its own data. Activities shown for the AMPs: reading, writing, sorting, aggregating, loading, building indexes, row locking, transaction journalling, statistics, and backup & recovery.]
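The idea that each AMP is one unit of parallelism can be illustrated with a small sketch. Everything here is an assumption made for illustration: the three-AMP system size, the CRC32 hash used to pick an owning AMP, and the function names are hypothetical, and Python threads only stand in for work that real AMPs would do independently.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 3  # hypothetical system size, matching the three AMPs in the figure


def amp_for(key: str, num_amps: int = NUM_AMPS) -> int:
    """Pick the owning AMP by hashing the key (illustrative hash, not Teradata's)."""
    return zlib.crc32(key.encode("utf-8")) % num_amps


def distribute(rows, num_amps: int = NUM_AMPS):
    """Place each (key, amount) row on exactly one AMP."""
    amps = [[] for _ in range(num_amps)]
    for key, amount in rows:
        amps[amp_for(key, num_amps)].append((key, amount))
    return amps


def local_sum(amp_rows):
    """Work done by one AMP: aggregate only the rows it owns."""
    return sum(amount for _, amount in amp_rows)


def parallel_sum(rows, num_amps: int = NUM_AMPS) -> int:
    """Each AMP aggregates its own data; the partial results are then combined."""
    amps = distribute(rows, num_amps)
    with ThreadPoolExecutor(max_workers=num_amps) as pool:
        partials = list(pool.map(local_sum, amps))
    return sum(partials)


rows = [(f"order-{i}", i) for i in range(1, 101)]
total = parallel_sum(rows)
```

No AMP ever touches another AMP's rows, which is why the same request can be spread over more AMPs without adding a single point of control.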

Page 6

2) Multi-Step Parallelism

Parallelism Across Multiple Query Steps

• The optimizer can choose to execute multiple steps within the same query at the same time

• A technique to speed up query completion

[Figure: a query plan with parallel steps 1.1 and 1.2 followed by parallel steps 2.1 and 2.2. Operations shown: Join Product & Inventory, Scan Stores, Join Spool, and Join Items & Orders, each followed by a redistribute.]
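A rough sketch of multi-step parallelism, assuming two steps that do not depend on each other. The step functions, their sample results, and the pairing of step numbers with operations are hypothetical; the slide does not show how the optimizer actually schedules steps.

```python
from concurrent.futures import ThreadPoolExecutor


def join_product_inventory():
    # Hypothetical stand-in for one parallel step: a small joined result.
    return [("p1", 10), ("p2", 0)]


def scan_stores():
    # Hypothetical stand-in for the other parallel step.
    return ["s1", "s2", "s3"]


def run_parallel_steps(*steps):
    """Run independent steps at the same time; return results in step order."""
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(step) for step in steps]
        return [f.result() for f in futures]


def run_query():
    # The two steps share no inputs, so they are allowed to overlap.
    product_inventory, stores = run_parallel_steps(join_product_inventory, scan_stores)
    # A later step needs both results, so it starts only after both finish.
    return [(store, product, qty)
            for store in stores
            for product, qty in product_inventory]
```

Calling `run_query()` returns every store paired with every product row; the point is only that the first two steps never wait on each other.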

Page 7

3) Within-a-Step Parallelism

Parallelism of Activities Within a Query Step

• Pipelining of different operations within a single step

• Overlapping of activities inside a step provides an additional dimension of parallelism

[Figure: timeline for AMP 1, AMP 2, AMP 3, AMP 4, . . . running from Time 1 (Start Step) through Time 2, Time 3, and Time 4 (Step Done). Overlapping activities: Select & Project Product table; Select & Project Inventory table; Join Product and Inventory tables; Send joined rows to other AMPs (redistribute).]
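Pipelining within a step can be sketched with Python generators, where each stage hands rows downstream as soon as they are produced rather than finishing first. This is a simplified illustration under stated assumptions: the table contents, the in-memory index, and the modulo rule for choosing a destination AMP are hypothetical, and generators interleave work on one thread rather than truly overlapping it.

```python
def select_and_project(table, columns):
    """Yield one projected row at a time instead of building the whole result."""
    for row in table:
        yield {c: row[c] for c in columns}


def join_on(left_rows, right_index, key):
    """Join each arriving left row against an in-memory index of the right table."""
    for left in left_rows:
        for right in right_index.get(left[key], []):
            yield {**left, **right}


def redistribute(rows, key, num_amps):
    """Tag each joined row with a destination AMP as soon as it is produced."""
    for row in rows:
        yield row[key] % num_amps, row  # assumes an integer key


def run_step(product, inventory, num_amps=4):
    inv_index = {}
    for row in select_and_project(inventory, ["product_id", "qty"]):
        inv_index.setdefault(row["product_id"], []).append(row)
    projected = select_and_project(product, ["product_id", "name"])
    joined = join_on(projected, inv_index, "product_id")
    return list(redistribute(joined, "product_id", num_amps))


product = [{"product_id": 1, "name": "bolt", "weight": 5},
           {"product_id": 2, "name": "nut", "weight": 2}]
inventory = [{"product_id": 1, "qty": 40},
             {"product_id": 2, "qty": 7},
             {"product_id": 1, "qty": 3}]
sent = run_step(product, inventory)
```

The first joined row is redistributed before the second product row has even been projected, which is the overlap the timeline in the figure describes.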

Page 8

Optimizer was Designed to Maximize Parallel Opportunities

• Optimizer builds query plans to maximize the throughput of a single request
  – “Bushy plans” maximize parallel step opportunities
  – Query will usually complete sooner

[Figure: two plans joining Table1 through Table6. "Plan with serial joins" chains the joins one after another; "Plan with parallel joins" places some joins side by side before their results are combined.]
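Why a bushy plan tends to finish sooner can be seen by counting how many joins must run strictly one after another. The sketch below builds both plan shapes as nested tuples; the pairing rule in `bushy_plan` is an assumption for illustration and is not the optimizer's actual algorithm or the exact tree drawn on the slide.

```python
def serial_plan(tables):
    """Left-deep plan: each join must wait for the previous join's result."""
    plan = tables[0]
    for table in tables[1:]:
        plan = (plan, table)
    return plan


def bushy_plan(tables):
    """Bushy plan: pair up inputs so joins on the same level are independent."""
    level = list(tables)
    while len(level) > 1:
        paired = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # the odd one out waits for the next level
        level = paired
    return level[0]


def join_count(plan):
    """Total number of joins in the plan."""
    if isinstance(plan, str):
        return 0
    left, right = plan
    return 1 + join_count(left) + join_count(right)


def sequential_levels(plan):
    """Length of the longest chain of joins that must run one after another."""
    if isinstance(plan, str):
        return 0
    left, right = plan
    return 1 + max(sequential_levels(left), sequential_levels(right))


tables = [f"Table{i}" for i in range(1, 7)]
serial, bushy = serial_plan(tables), bushy_plan(tables)
```

For six tables both plans contain five joins, but the serial plan has five sequential levels while this bushy plan has three, so more of its joins can run as parallel steps.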

Page 9

Teradata FastExport Utility Built on the Original Parallel Design

• A high-throughput utility designed to return large volumes of data from the database
• Final spool file is evenly distributed across all AMPs
• Returned to the client in parallel using multiple sessions

• Each AMP is returning the answer set in parallel across multiple sessions

• Each AMP is working on the query in parallel

[Figure: AMP 1, AMP 2, and AMP 3 each hold a spool and return rows to the CLIENT.]
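The two properties named on this slide, an evenly spread final spool and a parallel return over multiple sessions, can be mimicked in a few lines. This is not the FastExport protocol: the round-robin spreading, the session function, and the default counts are hypothetical choices made to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor


def spread_evenly(answer_rows, num_amps):
    """Deal answer-set rows round-robin so every AMP's spool is about the same size."""
    spools = [[] for _ in range(num_amps)]
    for i, row in enumerate(answer_rows):
        spools[i % num_amps].append(row)
    return spools


def fetch_spool(spool):
    """One client session pulling the rows held by one AMP."""
    return list(spool)


def export(answer_rows, num_amps=3, num_sessions=3):
    """Return the whole answer set, with each spool fetched by a separate session."""
    spools = spread_evenly(answer_rows, num_amps)
    with ThreadPoolExecutor(max_workers=num_sessions) as sessions:
        chunks = list(sessions.map(fetch_spool, spools))
    return [row for chunk in chunks for row in chunk]
```

Because the spools are close to equal in size, no single session becomes the long pole when the rows come back.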

Page 10

Teradata QueryGrid Performance Enhanced by Parallel Return of Remote Data

• Teradata parallelism provides more connection points when transferring remote data into the database.

• Multiple AMPs are involved in receiving data from or sending data to remote platforms.

• Dynamic statistics are collected on Hadoop data across each AMP in parallel.

[Figure: a local Teradata system (a PE and four AMPs) exchanging data with a remote Hadoop system (Hive, HCatalog).]

Page 11

The Parallel Database Extensions (PDE) Layer Enables Virtualization of AMPs and Parsing Engines

The PDE layer virtualizes the hardware and the operating system for the database engine
• Enables the definition of multiple virtual AMPs and Parsing Engines per node
• Shields the database from having to know physical locations or hardware detail when messages are sent
• Makes it possible to maximize processing power as hardware evolves

The PDE layer enables high availability
• Can migrate AMPs to different nodes without database knowledge or involvement

[Figure: layered stack labeled Teradata, PDE – Parallel Database Extensions, and Operating System (MP RAS / UNIX, Linux).]
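The shielding described above can be pictured as a lookup table that only the virtualization layer owns. The class below is a toy model, not PDE itself: the class name, the node names, and the round-robin migration rule are all hypothetical.

```python
class VirtualLayer:
    """Toy stand-in for the PDE idea: callers name a virtual AMP, never a node."""

    def __init__(self, nodes, amps_per_node):
        self.location = {}  # virtual AMP id -> node name
        amp_id = 0
        for node in nodes:
            for _ in range(amps_per_node):
                self.location[amp_id] = node
                amp_id += 1

    def send(self, amp_id, message):
        """Route by virtual AMP id; the node is looked up here, not by the caller."""
        return (self.location[amp_id], amp_id, message)

    def migrate(self, failed_node, surviving_nodes):
        """Move every AMP off a failed node, spreading them over the survivors."""
        moved = [a for a, n in sorted(self.location.items()) if n == failed_node]
        for i, amp_id in enumerate(moved):
            self.location[amp_id] = surviving_nodes[i % len(surviving_nodes)]
        return moved


layer = VirtualLayer(["node0", "node1", "node2"], amps_per_node=2)
before = layer.send(3, "step")
layer.migrate("node1", ["node0", "node2"])
after = layer.send(3, "step")
```

The caller addresses AMP 3 the same way before and after the migration; only the layer's table changed.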

Page 12

The AMP Before Virtualization

In the early Teradata days, the AMP was a hardware component

The addition of the PDE layer allowed multiple “virtual” AMPs to be defined on a single node

[Figure: The AMP Board]

Page 13

Section: Gracefully Managing the Flow of Work

Page 14

Work Flow Challenge for Parallel Databases

How much work is too much for an AMP?

Optimizer applies multiple levels of parallelism on each query
• Database engine is good at exploiting parallelism
• Just a few queries can saturate the system

Teradata was designed as a throughput engine
• Able to be productive with high demand, many users, maximum resource levels

How are system health and throughput protected during times of extremely high demand?

Page 15

Complete Decentralization of Control Over the Flow of Work

Each AMP monitors and manages its own flow of work independently

Only pushes back when the AMP needs a slight pause to complete work already underway

Two forms of pushing back:
1. Queueing up arriving messages
2. Turning away new messages

[Figure: "Non-Scalable Polling Approach": a Central Controller asks AMP1, AMP2, and AMP3 "Can you do more work?". "Scalable Decentralized Approach": AMP1, AMP2, and AMP3 each ask themselves "Can I do more work?".]

Page 16

Work Messages: Vehicle for Bringing New Work Steps From the Parsing Engine to the AMPs

Work messages use AMP worker tasks to accomplish their work

AMP Worker Tasks (AWTs) are stateless and can support a variety of work, including
• User-submitted work (load jobs, queries)
• Internal software processes (such as space accounting)

AWTs are allocated at start-up

Work messages are categorized into “Work Types” based on importance of the work (Work00, Work01, Work02)

[Figure: the Parsing Engine sends an optimized query step to the AMPs. On each AMP the message gets an AWT from the pool of available AWTs to do the work within the message. When the step is done, the AWT is released and a completion message goes to the PE.]
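The life cycle in the figure (a message takes an AWT from a fixed pool, does its work, and releases the AWT) can be sketched as a simple counter. The class name, method names, and pool size are hypothetical; real AWTs are tasks inside the AMP, not a Python integer.

```python
class AmpWorkerPool:
    """Toy AMP with a fixed pool of worker tasks allocated at start-up."""

    def __init__(self, num_awts):
        self.free_awts = num_awts  # hypothetical pool size
        self.completed = []

    def start(self, message):
        """Give the message an AWT if one is free; report whether work began."""
        if self.free_awts == 0:
            return False
        self.free_awts -= 1
        return True

    def finish(self, message):
        """Step is done: release the AWT and build the completion notice for the PE."""
        self.free_awts += 1
        self.completed.append(message)
        return f"done: {message}"


pool = AmpWorkerPool(num_awts=2)
started = [pool.start("scan"), pool.start("join"), pool.start("load")]
notice = pool.finish("scan")
retry = pool.start("load")
```

With two AWTs the third message cannot start until one of the first two finishes, which is the situation the following slides on queueing address.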

Page 17

When All AMP Worker Tasks are Busy, Arriving Messages are Queued

Each AMP has its own local work message queue in memory

Queued messages are sequenced by:
• Work type in descending sequence
• Priority within the work type

Some AMPs may de-queue a work message and begin processing a query step sooner than other AMPs

[Figure: a message queue holding Work00, Work01, and Work02 messages.]
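The sequencing rule above (work type in descending sequence, then priority within the work type) maps naturally onto a heap with a composite key. In this sketch work types are integers 0, 1, 2 standing for Work00, Work01, Work02, and a larger priority number means more urgent; both encodings, and the arrival-order tie-break, are assumptions.

```python
import heapq
import itertools


class WorkQueue:
    """Per-AMP message queue ordered by work type (descending), then priority."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: earlier arrivals first

    def enqueue(self, work_type, priority, message):
        # Negate both keys so the smallest heap entry is the highest
        # work type and, within it, the highest priority.
        heapq.heappush(
            self._heap, (-work_type, -priority, next(self._arrival), message)
        )

    def dequeue(self):
        return heapq.heappop(self._heap)[-1]

    def __len__(self):
        return len(self._heap)


q = WorkQueue()
q.enqueue(0, 1, "Work00 low")
q.enqueue(1, 1, "Work01")
q.enqueue(0, 3, "Work00 rush")
q.enqueue(2, 1, "Work02")
order = [q.dequeue() for _ in range(4)]
```

Here `order` comes out as Work02, Work01, then the two Work00 messages with the higher-priority one first.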

Page 18

When Too Many Messages are Queued Up, Arriving Messages are Returned to the Sender

Queued messages for Work Type Work00 have reached their limit of 20

Newly-arriving Work00 messages will be returned to sender

[Figure: a queue holding 3 Work01 messages and 20 Work00 messages; a newly arriving Work00 message is turned away with "Try later..."]
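A minimal sketch of the second form of pushing back, using the limit of 20 queued Work00 messages from the slide. The class, the retry helper, and the `drain` callback that stands in for the AMP catching up are hypothetical; the slide does not describe how or when the sender retries.

```python
from collections import defaultdict


class FlowControlledQueue:
    """Per-AMP queue that turns messages away once a work type hits its limit."""

    def __init__(self, limits):
        self.limits = limits  # e.g. {"Work00": 20}, the limit shown on the slide
        self.queued = defaultdict(list)

    def offer(self, work_type, message):
        """Queue the message, or return False to hand it back to the sender."""
        limit = self.limits.get(work_type)
        if limit is not None and len(self.queued[work_type]) >= limit:
            return False
        self.queued[work_type].append(message)
        return True


def send_with_retry(queue, work_type, message, attempts, drain):
    """Sender side: try again later; `drain` stands in for the AMP catching up."""
    for attempt in range(1, attempts + 1):
        if queue.offer(work_type, message):
            return attempt
        drain(queue)
    return None


fq = FlowControlledQueue({"Work00": 20})
accepted = [fq.offer("Work00", i) for i in range(21)]
tries = send_with_retry(fq, "Work00", "late", 3,
                        lambda qu: qu.queued["Work00"].pop(0))
```

The twenty-first Work00 message is returned, and the retried message is accepted on its second attempt once one queued message has been worked off.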

Page 19

Each AMP Makes Its Own Decisions Independently

Each AMP monitors its own work flow, and pushes back temporarily when it has more work than it can easily process

A work message arrives on all AMPs
• Two AMPs are queueing messages
• One AMP is sending messages back

[Figure: AMPs 0 through 11 spread across Node 0, Node 1, Node 2, and Node 3. AMP 2 and AMP 9 have exhausted their AWTs and messages are queued up; AMP 7 is in flow control and messages are retried; AWTs are available on all other AMPs.]
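The scenario in the figure can be replayed with a toy model in which every AMP answers an arriving message using only its own state. The `FlowAmp` dataclass, its fields, and the three outcome strings are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FlowAmp:
    amp_id: int
    free_awts: int
    queue_limit: int
    queue: list = field(default_factory=list)

    def receive(self, message):
        """Decide locally, using only this AMP's own state."""
        if self.free_awts > 0:
            self.free_awts -= 1
            return "started"
        if len(self.queue) < self.queue_limit:
            self.queue.append(message)
            return "queued"
        return "returned"  # in flow control: the sender retries later


def broadcast(amps, message):
    """An all-AMP step: every AMP gets the message and answers for itself."""
    return {amp.amp_id: amp.receive(message) for amp in amps}


amps = [
    FlowAmp(0, free_awts=2, queue_limit=20),
    FlowAmp(2, free_awts=0, queue_limit=20),
    FlowAmp(7, free_awts=0, queue_limit=1, queue=["earlier"]),
]
outcome = broadcast(amps, "step 1")
```

There is no central controller in `broadcast`: it only delivers the message, and the three AMPs reach three different decisions on their own.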

Page 20

Riding the Wave of Full Usage

Flow control mechanisms are embedded deep in the base of the database
• Able to support parallelism and minimize query execution time when just a few queries are active
• And protects overall system health under extreme usage conditions

Getting back to normal processing is simple and immediate, with minimal overhead
• No communication layers need traversing
• No messaging or alerting to other components

Provides a highly-efficient, non-intrusive mechanism that performs well with 2 AMPs or 2000 AMPs

It worked great at the beginning, and it still works great

Page 21

Section: Prioritization

Page 22

Simple Priority Scheme Embedded in the Original Teradata Database

Internal database routines were architected to use different priorities
• In order to support maximum levels of user activity and still get critical internal work and background tasks completed
• Provided a way to boost query performance at critical processing points
• Background tasks may start at a low priority but self-promote their priority if they cannot get their work done

Priorities externalized for customer use

[Figure: four priority levels: Rush Priority, High Priority, Medium Priority, Low Priority.]
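Self-promotion of a background task can be sketched as a counter of rounds without progress. The four level names come from the slide; the `patience` threshold, the `tick` method, and the one-level-at-a-time promotion rule are assumptions, since the slide does not say how or when a task promotes itself.

```python
PRIORITIES = ["Low", "Medium", "High", "Rush"]  # the four levels named on the slide


class BackgroundTask:
    """Toy background task that raises its own priority when it is starved."""

    def __init__(self, name, patience=3):
        self.name = name
        self.level = 0            # start at Low
        self.patience = patience  # hypothetical: starved rounds tolerated per level
        self.starved_rounds = 0

    @property
    def priority(self):
        return PRIORITIES[self.level]

    def tick(self, got_cpu):
        """Record one scheduling round; promote after too many rounds without progress."""
        if got_cpu:
            self.starved_rounds = 0
            return self.priority
        self.starved_rounds += 1
        if self.starved_rounds >= self.patience and self.level < len(PRIORITIES) - 1:
            self.level += 1
            self.starved_rounds = 0
        return self.priority


task = BackgroundTask("space accounting", patience=2)
history = [task.tick(False), task.tick(False), task.tick(True),
           task.tick(False), task.tick(False)]
```

The task stays at Low for one starved round, moves to Medium on the second, keeps Medium while it is getting CPU, and reaches High after two more starved rounds.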

Page 23

Customers Embrace Workload Management

Users drove changes and enhancements such as:
• A broader set of priority definitions
• Concurrency control mechanisms at multiple levels
• Rules that identify and reject poorly-written queries
• Ability to automatically demote or abort resource-heavy queries

Teradata Active System Management evolved for the Enterprise platforms

Teradata Integrated Workload Management for the non-enterprise platforms

Internal tasks and the database code continue to rely on the original four priority buckets

Page 24

Section: Conclusion – Emerging Opportunities

Page 25

Building on a Solid Foundation

Key characteristics architected into the original Teradata Database are still delivering performance advantage:

Internal checks and balances that optimize the flow of work

The restorative action of numerous non-intrusive background tasks

Adaptable and flexible file system

Parallelism and parallel-aware Optimizer

Performance boosts of the BYNET

Page 26

Building on a Solid Foundation

Virtualization has been emerging slowly over time:

• AMP as physical entity → As a collection of software processes (relying on PDE)

• Parsing Engine as a physical entity → As a collection of software processes (relying on PDE)

• The YNET/BYNET as proprietary hardware → BYNET as software that can run on any general interconnect

• File system structures tied to physical locations on disk → Underlying storage managed by TVS with complete disassociation of data and its location

Page 27

More Information

Content of this slideware is based on the white paper: Born to be Parallel, and Beyond

http://assets.teradata.com/resourceCenter/downloads/WhitePapers/EB3053_new.pdf

Additional sessions on Teradata Database futures:
Teradata Database 16.0 Overview Part I & Part II
Tom Fastner / Phil Benton
Wednesday 11:00 / 12:00, Room C101

Page 28

At Teradata…

We empower companies to achieve high-impact business outcomes

through analytics at scale on an agile data foundation

Page 29

Thank You

Questions/Comments

Email:

Rate This Session # with the PARTNERS Mobile App

Remember To Share Your Virtual Passes

[email protected]@teradata.com

297
