Kinetica: The Fastest Distributed, In-Memory, GPU-Accelerated Database

Eric Mizell – VP, Solution Engineering

Evolution of Data Processing


1990 - 2000s: DATA WAREHOUSE

RDBMS and data warehouse technologies enable organizations to store and analyze growing volumes of data on high-performance machines, but at high cost.

2005…: DISTRIBUTED STORAGE

Hadoop and MapReduce enable distributed storage and processing across multiple machines.

Storing massive volumes of data becomes more affordable, but performance is slow.

2010…: AFFORDABLE IN-MEMORY

Affordable memory allows for faster data reads and writes. HBase, HANA, and MemSQL provide faster analytics.

At scale, compute processing now becomes the bottleneck.

2016…: GPU-ACCELERATED COMPUTE

GPU cores bulk-process tasks in parallel, which is far more efficient for compute-intensive tasks than CPUs.

A Cohesive, High-Performance Data Analytics Solution

Massive Parallel Processing (MPP): Process and manage massive amounts of data cost-effectively

In-Memory Computing: Deliver human-acceptable response times to complex analytical queries

High Performance Computing (HPC hardware): Leverage GPU-accelerated hardware to massively improve performance with better cost and energy efficiency

GPU Acceleration Overcomes Processing Bottlenecks

4,000+ cores per device in many cases, versus 8 to 32 cores per typical CPU-based device.

The high-performance computing trend is toward using GPUs to solve massive processing challenges.

GPU acceleration brings high-performance compute to commodity hardware.

Parallel processing is ideal for scanning an entire dataset and for brute-force computation.

GPUs are designed around thousands of small, efficient cores that excel at performing the same instruction repeatedly in parallel. This makes them well suited to the compute-intensive workloads that large data sets require.
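The "scan the entire dataset" idea can be sketched on a CPU. This is only an analogy, assuming Java parallel streams as a stand-in for GPU cores; the class name, method name, and sample data are hypothetical and nothing here is Kinetica code. The same predicate is applied to every element, with no index consulted.

```java
import java.util.stream.IntStream;

// CPU-side analogy only: parallel stream workers stand in for GPU cores.
class ParallelScanSketch {

    // Brute-force scan: test every element against the same range predicate,
    // in parallel, and count the matches.
    static long countInRange(int[] values, int low, int high) {
        return IntStream.range(0, values.length)
                .parallel()
                .filter(i -> values[i] >= low && values[i] <= high)
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical column of one million readings cycling through 0..99.
        int[] values = new int[1_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i % 100;
        }
        // Each value 10..19 occurs 10,000 times, so this prints 100000.
        System.out.println(countInRange(values, 10, 19));
    }
}
```

The work per element is tiny and identical, which is the shape of workload the slide describes as a good fit for many small cores.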


Who is Kinetica?

2009: ‘HPC Research Project’ incubated by US military

2010

2011: Patent # US8373710 B1 issued to GPUdb

2012: US Army deploys GPUdb

2013: GPUdb commercially available

2014: IDC HPC Innovation Excellence Award (Army); GPUdb goes into production at USPS

2015: IronNet selects GPUdb for cyber defense; PG&E selects GPUdb for electric grid analysis; IDC HPC Innovation Excellence Award (USPS)

2016: Rebrand to Kinetica


Reference Architecture

• High speed in-memory database for your most critical enterprise data

• Real-time analytics for streaming data

• Accelerate performance of your existing analytics tools and applications

• Offload expensive relational databases

• Get more value from your Hadoop investment


[Diagram labels: APPLICATION LAYER (OLAP | NoSQL | GEOSPATIAL | APIs | TEXT SEARCH); DATA INGEST (IoT/Stream Processing, Message Queue, Batch Feeds); DATA LAKE; TRANSACTIONAL]

Streaming Analytics Simplified

Parallel Ingestion

• Parallel ingestion of events

• Kinetica is the speed layer, with real-time analytic capabilities

• HDFS/Object Store/SAN serves as the data lake

• Much looser coupling than traditional streaming architectures

• Batch-mode Spark or MapReduce jobs can push data to Kinetica as needed for fast queries

[Diagram labels: EVENTS; MESSAGE BROKERS; Amazon Kinesis; STREAM PROCESSING; Kinetica Connectors; PUT, GET, SCAN; Execute complex analytics on the fly; ANALYSTS; MOBILE USERS; DASHBOARDS & APPLICATIONS; ALERTING SYSTEMS; HDFS/Object Store/SAN]
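One way to picture the parallel ingestion on this slide is to hash each event's key to a shard, so that independent writers can load shards concurrently. The sketch below is a generic illustration under that assumption; the Event class, the deviceId key, and the shard count are hypothetical, and the slides do not say how Kinetica itself distributes records.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Generic sharding sketch; not Kinetica's ingestion mechanism.
class ParallelIngestSketch {

    // Hypothetical event type: the slides do not define a record layout here.
    static final class Event {
        final String deviceId;
        final long timestamp;

        Event(String deviceId, long timestamp) {
            this.deviceId = deviceId;
            this.timestamp = timestamp;
        }
    }

    // Hash the key to a shard so events from one device always land together.
    static int shardFor(Event e, int numShards) {
        return Math.floorMod(e.deviceId.hashCode(), numShards);
    }

    // Split a batch across shards with a parallel stream; each shard's list
    // could then be handed to its own writer.
    static Map<Integer, List<Event>> partition(List<Event> batch, int numShards) {
        return batch.parallelStream()
                .collect(Collectors.groupingByConcurrent(e -> shardFor(e, numShards)));
    }
}
```

Math.floorMod keeps the shard index non-negative even when hashCode() is negative, which a plain % would not.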

Kinetica Architecture


VISUALIZATION via ODBC/JDBC

APIs

Java API

JavaScript API

REST API

C++ API

Node.js API

Python API

OPEN SOURCE INTEGRATION

Apache NiFi

Apache Kafka

Apache Spark

Apache Storm

GEOSPATIAL CAPABILITIES

Geometric Objects

Tracks

Geospatial Endpoints

WMS

WKT

KINETICA CLUSTER

On-demand Scale

Commodity Hardware w/ GPUs

[Diagram: four identical cluster nodes, each labeled Disk, Columnar In-memory (cells A1 B1 C1 through A4 B4 C4), and HTTP Head Node]

OTHER INTEGRATION

Message Queues

ETL Tools

Streaming Tools

Reliable, Available and Scalable

• Disk-based persistence

• Data replication for high availability

• Scale up and/or out

Performance

• GPU-accelerated (1000s cores per GPU)

• Ingest billions of records per minute

• Ultra low-latency performance from ingestion through to analytics

Connectors

• ODBC/JDBC

• RESTful endpoints

• Open source APIs

• Native geospatial capabilities
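The "Columnar In-memory" label in the cluster diagram refers to storing each column contiguously rather than each row. A minimal sketch of that layout follows, assuming three hypothetical columns A, B, and C with made-up types; it illustrates the general technique, not Kinetica's internal storage format.

```java
// Generic column-oriented layout; not Kinetica's storage engine.
class ColumnarSketch {
    // One contiguous array per column (hypothetical columns A, B, C).
    private final long[] a;
    private final double[] b;
    private final int[] c;

    ColumnarSketch(long[] a, double[] b, int[] c) {
        if (a.length != b.length || b.length != c.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        this.a = a;
        this.b = b;
        this.c = c;
    }

    // Aggregating one column touches only that column's array.
    double sumB() {
        double total = 0.0;
        for (double v : b) {
            total += v;
        }
        return total;
    }

    // Reassembling row i reads one cell from each column.
    String row(int i) {
        return a[i] + "," + b[i] + "," + c[i];
    }
}
```

An analytical query that aggregates a single column reads one dense array, which is also the access pattern that parallel hardware handles well.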


Code Examples/APIs

http://www.gpudb.com/docs/

https://github.com/GPUdb

Configuring Connection

Java Connection - Example:

import com.gpudb.GPUdb;

GPUdb gpudb = new GPUdb("http://localhost:9191");

Python Connection - Example:

import gpudb

h_db = gpudb.GPUdb(encoding='BINARY', host='127.0.0.1', port='9191')

JDBC Connection – Example:

String driverClass = "com.simba.client.core.jdbc4.SCJDBC4Driver";
String url = "jdbc:simba://127.0.0.1:9292;URL=http://127.0.0.1:9191;ParentSet=MASTER";
// getConnection(...) is a helper not shown on the slide; with plain JDBC this would be
// Class.forName(driverClass) followed by DriverManager.getConnection(url)
Connection conn = getConnection(driverClass, url);


Object Relational Mapping

With the APIs (Java, JavaScript, Python, C++), everything is an object.

RecordObject Extension - Example:

import com.gpudb.ColumnProperty;
import com.gpudb.RecordObject;

public class Order extends RecordObject {
    @Column(order = 0, properties = { ColumnProperty.PRIMARY_KEY })
    public int id;

    @Column(order = 1)
    public String name;

    @Column(order = 2, properties = { ColumnProperty.TIMESTAMP })
    public long orderDate;
}
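To make the annotation-driven mapping above more concrete, here is a rough sketch of how a library could read ordered, annotated fields through reflection. It relies only on the Java standard library: the Col annotation and the nested Order class are hypothetical stand-ins defined in the sketch, not the Kinetica annotation or its actual processing.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Generic reflection sketch; not how the Kinetica API is implemented.
class AnnotatedRecordSketch {

    // Hypothetical ordering annotation, retained at runtime so reflection sees it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Col {
        int order();
    }

    // Hypothetical record class mirroring the fields of the Order example.
    static class Order {
        @Col(order = 2) public long orderDate;
        @Col(order = 0) public int id;
        @Col(order = 1) public String name;
    }

    // Collect annotated fields and return their names sorted by declared order.
    static List<String> columnNames(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(Col.class)) {
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparingInt(f -> f.getAnnotation(Col.class).order()));
        List<String> names = new ArrayList<>();
        for (Field f : fields) {
            names.add(f.getName());
        }
        return names;
    }
}
```

The fields are deliberately declared out of order to show that the explicit order value, not source position, decides the column sequence.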


Inserting Records

Single Insert - Example:

// insertRecords takes a list, so it can carry multiple rows; don't use this for bulk loading
List<PersonType> list = new ArrayList<PersonType>();
list.add(person);
InsertRecordsRequest<PersonType> request = new InsertRecordsRequest<PersonType>("person", list, null);
gpudb.insertRecords(request);

Bulk Insert - Example:

BulkInserter<PersonType> bulkInserter = new BulkInserter<PersonType>(gpudb, tableName, type, batchSize, null);

// flushes to Kinetica automatically based on batchSize
for (int i = 1; i <= numberRows; i++) {
    PersonType record = createPerson();
    bulkInserter.insert(record);
}
// send any remaining queued records (fewer than batchSize)
bulkInserter.flush();
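The "flushes automatically based on batchSize" behaviour can be illustrated with a small stand-alone buffer. This is a hypothetical helper written for illustration, assuming a simple size-triggered flush; it is not the BulkInserter source, and the sink callback is a placeholder for whatever actually sends a batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical size-triggered batching buffer; not part of the Kinetica API.
class BatchBufferSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> queue = new ArrayList<>();

    BatchBufferSketch(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Queue one record and flush as soon as a full batch has accumulated.
    void insert(T record) {
        queue.add(record);
        if (queue.size() >= batchSize) {
            flush();
        }
    }

    // Hand a copy of the queued records to the sink, then clear the queue.
    void flush() {
        if (queue.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(queue));
        queue.clear();
    }
}
```

With a batch size of 10 and 25 inserts, the sink receives two batches of 10 during the loop and the last 5 only after an explicit flush(), which is why a final flush matters after a bulk-load loop.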


Querying Records

Basic Query – Example (select * from table limit 1000):

// provide table name, records offset, number of records to retrieve, optional options
GetRecordsResponse<PersonType> response = gpudb.getRecords(tableName, 0, 1000, null);
List<PersonType> responseList = response.getData();
for (PersonType t : responseList) {
    // do some processing on the records
}

Query by Expression – Example (select * from table where range filter limit 5):

String expression = "timestamp >= " + startDate + " and timestamp <= " + endDate;
// provide table name, view name (for query chaining), expression, optional options (sort column/order)
FilterResponse resp = gpudb.filter(tableName, viewName, expression, null);
long numResults = resp.getCount();
// provide view name, records offset, number of records to retrieve, optional options (sort column/order)
GetRecordsResponse<PersonType> getRecordsRsp = gpudb.getRecords(viewName, 0, 5, null);
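The range expression above is built by string concatenation, so a small helper can keep its shape consistent and reject obviously bad input before it reaches the database. The following is a sketch with hypothetical names, using only the standard library; it produces the same expression text as the example and makes no claim about which expression syntax the filter call accepts beyond what the slide shows.

```java
// Hypothetical helper for building the range filter string shown above.
class RangeExpressionSketch {

    // Build "<column> >= <start> and <column> <= <end>" for an inclusive range.
    static String between(String column, long start, long end) {
        // Allow only simple identifiers so arbitrary text cannot be spliced in.
        if (!column.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("invalid column name: " + column);
        }
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        return column + " >= " + start + " and " + column + " <= " + end;
    }

    public static void main(String[] args) {
        // prints: timestamp >= 100 and timestamp <= 200
        System.out.println(between("timestamp", 100L, 200L));
    }
}
```

The returned string could be passed wherever the example uses the expression variable.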


Demo

http://demo.kinetica.com/gaiademo/

Thank You

Geoff Lunsford – [email protected]

Eric Mizell – [email protected]
