pg-strom query acceleration engine of postgresql powered...

22
PG-Strom Query Acceleration Engine of PostgreSQL Powered by GPGPU NEC OSS Promotion Center The PG-Strom Project KaiGai Kohei <[email protected]>

Upload: vanlien

Post on 31-Mar-2018

251 views

Category:

Documents


4 download

TRANSCRIPT

PG-Strom Query Acceleration Engine of PostgreSQL

Powered by GPGPU

NEC OSS Promotion Center

The PG-Strom Project

KaiGai Kohei <[email protected]>

Self Introduction

▌Name: KaiGai Kohei

▌Company: NEC

▌Mission: Software architect & Intrepreneur

▌Background:

Linux kernel development (2003~?)

PostgreSQL development (2006~)

SAP alliance (2011~2013)

PG-Strom development & productization (2012~)

▌PG-Strom Project:

In-company startup of NEC

Also, an open source software project

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.2

What is PG-Strom

▌An Extension of PostgreSQL

▌Off-loads CPU intensive SQL workloads to GPU processors

▌Major Features

① Automatic and just-in-time GPU code generation from SQL

② Asynchronous and concurrent query executor

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.3

database

Query Executor

Query Planner

Custom Executor

Custom Planner

GPU code on the fly SQL

command

Async- Execution

PG-Strom Q

uery

Fro

nte

nd

Concept

▌No Pain Looks like a traditional PostgreSQL database from standpoint of

applications, thus, we can utilize existing tools, drivers, applications.

▌No Tuning Massive computing capability by GPGPU kills necessity of database

tuning by human. It allows engineering folks to focus on the task only human can do.

▌No Complexity No need to export large data to external tools from RDBMS, because its

computing performance is sufficient to run the workloads nearby data.

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.4

Technology Trend

▌Movement to CPU/GPU integrated architecture rather than multicore CPU

▌Free lunch for SW by HW evolution will finish soon

Unless software is not designed to utilize GPU capability, unable to pull-out the full hardware capability.

DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL Page. 5

SOURCE: THE HEART OF AMD INNOVATION, Lisa Su, at AMD Developer Summit 2013

Background: How SQL is executed

▌Planner constructs query execution plan based on cost estimation

▌SQL never defines how to execute the query, but what shall be returned

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.6

postgres# EXPLAIN SELECT cat, avg(x) FROM t0 NATURAL JOIN t1

WHERE t0.z like '%abc%' GROUP BY cat;

QUERY PLAN

------------------------------------------------------------------

HashAggregate (cost=6629.88..6629.89 rows=1 width=12)

Group Key: t0.cat

-> Hash Join (cost=1234.00..6619.77 rows=2020 width=12)

Hash Cond: (t0.aid = t1.aid)

-> Seq Scan on t0 (cost=0.00..5358.00 rows=2020 width=16)

Filter: (z ~~ '%abc%'::text)

-> Hash (cost=734.00..734.00 rows=40000 width=4)

-> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4)

(8 rows)

Background: Custom-Plan Interface

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.7

Aggregate

SELECT cat, avg(x) FROM t1, t2

WHERE t1.id = t2.id AND y > 100

GROUP BY cat;

Scan on t1 Scan on t2

Join

t1 t2

key: cat • Hash Join • Merge Join • Nested Loop • Custom Join

• Seq Scan • Index Scan • Index-Only Scan • Tid Scan • Custom Scan

IndexScan on t1

y > 100

“BulkLoad” on t1

“GpuHashJoin”

t1.id = t2.id

PG-Strom Features

▌Logics

GpuScan ... Parallel evaluation of scan qualifiers

GpuHashJoin ... Parallel multi-relational join

GpuPreAgg ... Two phase aggregation

GpuSort ... GPU + CPU Hybrid Sorting

GpuNestedLoop (in develop)

▌Data Types

Integer, Float, Date/Time, Numeric, Text

▌Function and Operators

Equality and comparison operators

Arithmetic operators and mathematical functions

Aggregates: count, min/max, sum, avg, std, var, corr, regr

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.8

Automatic GPU code generation

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.9

postgres=# SET pg_strom.show_device_kernel = on;

postgres=# EXPLAIN VERBOSE SELECT * FROM t0 WHERE sqrt(x+y) < 10;

QUERY PLAN

--------------------------------------------------------------------------------

Custom Scan (GpuScan) on public.t0 (cost=500.00..357569.35 rows=6666683 width=77)

Output: id, cat, aid, bid, cid, did, eid, x, y, z

Device Filter: (sqrt((t0.x + t0.y)) < 10::double precision)

Features: likely-tuple-slot

Kernel Source: #include "opencl_common.h“

:

static pg_bool_t

gpuscan_qual_eval(__private cl_int *errcode,

__global kern_parambuf *kparams,

__global kern_data_store *kds,

__global kern_data_store *ktoast,

size_t kds_index)

{

pg_float8_t KPARAM_0 = pg_float8_param(kparams,errcode,0);

pg_float8_t KVAR_8 = pg_float8_vref(kds,ktoast,errcode,7,kds_index);

pg_float8_t KVAR_9 = pg_float8_vref(kds,ktoast,errcode,8,kds_index);

return pgfn_float8lt(errcode,

pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0);

}

Implementation (1/3) – GpuScan

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.10

Table

DMA Send

DMA Recv

DMA Send

DMA Recv

DMA Send

DMA Recv

Execution of auto-generated GPU code

Result Output Stream

Input Stream

Chunk (16~64MB)

PostgreSQL PG-Strom

Software Architecture

DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL Page. 11

GPU Code Generator

Storage

Storage Manager

Shared Buffer

Query Parser

Query Optimizer

Query Executor

SQL Query

Breaks down the query to parse tree

Makes query execution plan

Run the query Cust

om

-Pla

n A

PIs

GpuScan

GpuHashJoin

GpuPreAgg

GPU Program Manager

PG-Strom OpenCL Server

Message Queue

GpuSort

※ Current version based on OpenCL

Implementation (2/3) – GpuHashJoin

PG-Strom Preview Feb-2015 Page. 12

Inner relation

Outer relation

Inner relation

Outer relation

Hash Table Hash Table

Next stage Next stage

CPU just references materialized results

Hash-Table Search by CPU

Sequential Materialization by CPU

Parallel Materialization

Parallel Hash-Table

Search

vanilla Hash-Join GpuHashJoin

Benchmark result (1/2) – simple tables join

▌Benchmark Query: SELECT * FROM t0 NATURAL JOIN t1 [NATURAL JOIN ....];

▌Environment: t0 has 100million rows (13GB), t1-t9 has 40,000 rows for each, all-data pre-loaded

CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.13

18.19 19.45 21.04 23.66 26.69 37.64 43.22 49.57 56.38

64.27

87.73

109.73

132.21

155.10

179.62

207.85

233.31

263.51

0.00

50.00

100.00

150.00

200.00

250.00

300.00

2 3 4 5 6 7 8 9 10

Qu

ery

Re

spo

nse

Tim

e [

sec]

number of tables joined

Simple Tables Join Benchmark

PG-Strom PostgreSQL

Implementation (3/3) – GpuPreAgg

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.14

Table

1st Stage Reduction

2nd Stage Reduction

Chunk (16~64MB)

Benchmark result (2/2) – Star Schema Model

▌40 typical reporting queries

▌100GB of retail / start-schema data, all pre-loaded

▌Environment CPU: Xeon E5-2670v3(12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.15

0.00

200.00

400.00

600.00

800.00

1000.00

1200.00

1400.00

1600.00

1800.00

2000.00

Q.0

1

Q.0

2

Q.0

3

Q.0

4

Q.0

5

Q.0

6

Q.0

7

Q.0

8

Q.0

9

Q.1

0

Q.1

1

Q.1

2

Q.1

3

Q.1

4

Q.1

5

Q.1

6

Q.1

7

Q.1

8

Q.1

9

Q.2

0

Q.2

1

Q.2

2

Q.2

3

Q.2

4

Q.2

5

Q.2

6

Q.2

7

Q.2

8

Q.2

9

Q.3

0

Q.3

1

Q.3

2

Q.3

3

Q.3

4

Q.3

5

Q.3

6

Q.3

7

Q.3

8

Q.3

9

Q.4

0

Qu

ery

Res

po

nse

Tim

e [s

ec]

Typical Reporting Queries on Retail / Star-Schema Data

PG-Strom PostgreSQL

Expected Scenario – Reduction of ETL

▌ETL – Its design is human centric task

▌Replication – much automatous task

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.16

ERP CRM SCM BI

OLTP database

OLAP database

ETL

OLAP Cubes Master / Fact Tables

BI

Replication

Replica of Master / Fact Tables

Optimized to transaction

workloads

Optimized to analytic workloads

Sufficient to analytic

workloads also

PG-Strom

Direction of PG-Strom

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.17

Development Plan

Migration of OpenCL to CUDA

Add support of GpuNestedLoop

Add support multi-functional kernel

Standardization of custom-join interface

...and more...?

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.18

Short term target: PostgreSQL v9.5 timeline (2015)

Middle term target: PostgreSQL v9.6 timeline (2016)

Current version: PG-Strom β + PostgreSQL v9.5devel

Integration with funnel executor

Investigation to SSD/NvRAM utilization

Custom-sort/aggregate interface

Add support for spatial data types (?)

Let’s try – Deployment on AWS

The PostgreSQL Conference 2014, Tokyo - GPGPU Accelerates PostgreSQL Page. 19

Search by “strom” !

AWS GPU Instance (g2.2xlarge)

CPU Xeon E5-2670 (8 xCPU)

RAM 15GB

GPU NVIDIA GRID K2 (1536core)

Storage 60GB of SSD

Price $0.898/hour, $646.56/mon (*) Price for on-demand instance on Tokyo region at Nov-2014

Welcome your involvement!

▌How to be involved?

as a user

as a developer

as a business partner

▌Source code

https://github.com/pg-strom/devel

▌Contact US

e-mail: [email protected]

twitter: @kkaigai

PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.20

check it out!

...or, catch me in the Convention Center