gridgain basic turbocharge sql queries

36
© 2015 The Apache Software Foundation. Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are trademarks of The Apache Software Foundation. Turbocharge Your SQL Queries In-Memory with Apache Ignite

Upload: trinhanh

Post on 13-Feb-2017

234 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GridGain Basic Turbocharge SQL Queries

© 2015 The Apache Software Foundation. Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are trademarks of The Apache Software Foundation.

Turbocharge Your SQL Queries In-Memory with Apache Ignite

Page 2: GridGain Basic Turbocharge SQL Queries

Agenda

• GridGain & Apache Ignite Project

• Ignite Data Fabric:

• In-Memory Data Grid & SQL

• Compute Grid & Streaming • Hadoop & Spark Integration

• Q & A

Page 3: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Apache Ignite Project• 2007: First version of GridGain

(compute grid)• Oct. 2014: GridGain contributes Ignite

to ASF • Aug. 2015: Ignite is the second fastest

project to graduate after Spark • Today: • 60+ contributors and growing rapidly • Huge development momentum -

Estimated 192 years of effort since the first commit in February, 2014 [Openhub]

• Mature codebase: 700k+ SLOC & more than 16k commits January 2016

Page 4: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• GridGain Enterprise Edition• Is a binary build of Apache Ignite™ created by GridGain • Added enterprise features for enterprise deployments • Earlier features and bug fixes by a few weeks • Heavily tested

Page 5: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Customer Use Cases

Automated Trading SystemsReal time analysis of trading positions & market risk. High volume transactions, ultra low latencies.

Financial ServicesFraud Detection, Risk Analysis, Insurance rating and modelling.

Online & Mobile AdvertisingReal time decisions, geo-targeting & retail traffic information.

Big Data AnalyticsCustomer 360 view, real-time analysis of KPIs, up-to-the-second operational BI.

Online Gaming

Real-time back-ends for mobile and massively parallel games.

SaaS Platforms & AppsHigh performance next-generation architectures for Software as a Service Application vendors.

Travel & E-CommerceHigh performance next-generation architectures for online hotel booking.

Page 6: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

What is an IMDF?

High-performance distributed in-memory platform for computing and transacting on large-scale data sets in near real-time.

Page 7: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

What is an IMDF?‣ HPC‣ Machine learning‣ Risk analysis‣ Grid computing

‣ HA API Services‣ Scalable

Middleware

‣ Web-session clustering

‣ Distributed caching‣ In-Memory SQL

‣ Real-time Analytics

‣ Big Data‣ Monitoring tools

‣ Big Data‣ Realtime

Analytics‣ Batch processing

‣ Distributed In-Memory File System

‣ Node2Node & Topic-based Messaging

‣ Fault Tolerance‣ Multiple backups‣ Cluster groups‣ Auto Rebalancing

‣ Complex event processing

‣ Event driven design

‣ Distributed queues

‣ Atomic variables‣ Dist. Semaphore

Page 8: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

IMDF: High-Speed Data & Compute Layer

Data Grid

Batch Data

Compute Grid

Transactional & Analytical Workloads

Transactional & Analytical workloads

Streaming Data

External Persistency

External APIs

Back-end users, third-party clients and downstream systems

Downstream Systems

Clients accessing a high-speed distributed multi-facet service

Page 9: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

In-Memory Data Grid

Page 10: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Scalability & Resilience with Ignite

Data Grid

Batch Data

Compute Grid

Transactional & Analytical Workloads

Transactional & Analytical workloads

Streaming Data

External Persistency

External APIs

Back-end users, third-party clients and downstream systems

Downstream Systems

Clients accessing a high-speed distributed multi-facet service

Page 11: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Fault Tolerance & Scalability

Replicated Cache Partitioned Cache

Page 12: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Tiered Memory & Local Store

• Tiered Memory

• On-Heap -> Off-Heap -> Disk

• Persistent On-Disk Store

• Fast Recovery

• Local Data Reload

• Eliminate Network and Db impacts when reloading in-memory store

Page 13: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• Unlimited Vertical Scale • Avoid Java Garbage Collection

Pauses • Small On-Heap Footprint • Configurable eviction policies • Off-Heap Indexes • Full RAM Utilisation • Simple Configuration

Off-Heap Memory

Page 14: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• Multiple (up to 32) Data Centres • Complex Replication

Technologies • Active-Active & Active-Passive • Smart Conflict Resolution • Durable Persistent Queues • Automatic Throttling • GridGain Enterprise

Data Center Replication

Page 15: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Storage and Caching using Ignite

Data Grid

Batch Data

Compute Grid

Transactional & Analytical Workloads

Transactional & Analytical workloads

Streaming Data

External Persistency

External APIs

Back-end users, third-party clients and downstream systems

Downstream Systems

Clients accessing a high-speed distributed multi-facet service

Page 16: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• 100% JCache Compliant (JSR 107) – Basic Cache Operations

– Concurrent Map APIs

– Collocated Processing (EntryProcessor) – Events and Metrics

– Pluggable Persistence

• Ignite Data Grid – Fault Tolerance and Scalability – Distributed Key-Value Store – SQL Queries (ANSI 99) – ACID Transactions

– In-Memory Indexes

– RDBMS / NoSQL Integration

In-Memory Data Grid

Page 17: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

External Persistence

• Read-through & Write-through

• Support for Write-behind • Configurable eviction policies • DB schema mapping wizard: • Generates all the XML configuration and

Java POJOs

Page 18: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Cache APIs & Queries

• Predicate-based Scan Queries • Text Queries based on Lucene indexing • Query configuration using annotations,

Spring XML or simple Java code • SQL Queries • Memcached (PHP, Java, Python, Ruby) • HTTP REST API • JDBC

Page 19: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• ANSI-99 SQL • In-Memory Indexes (On and Off-

Heap) • Automatic Group By, Aggregations,

Sorting • Cross-Cache Joins, Unions • Use local H2 engine

SQL Support (ANSI 99)

Page 20: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Data Grid: Transactions

• Fully ACID • Support for Transactional & Atomic • Cross-cache transactions • Optimistic and Pessimistic

concurrency modes with multiple isolation levels

• Deadlock protection • JTA Integration

Page 21: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Distributed Java Structures

• Distributed Map (cache) • Distributed Set • Distributed Queue • CountDownLatch • AtomicLong • AtomicSequence • AtomicReference • Distributed ExecutorService

Page 22: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Continuous Queries

• Execute a query and get notified on data changes captured in the filter

• Remote filter to evaluate event and local listener to receive notification

• Guarantees exactly once delivery of an event

Page 23: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• Create chains of event processors & transform an object through various states • Synchronous or asynchronous execution of remote filters & listeners with thread control

Payment Validator Payment Verifier Payment Processor

Ignite Cache

Event Processing using Ignite

Page 24: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

In-Memory Compute Grid & the rest

Page 25: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• Branching Pipelines

• Sliding Windows for CEP/Continuous Query

• JMS, Kafka, MQTT, Flume, Camel data streamer integrations

In-Memory Streaming and CEP

Page 26: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• Direct API for MapReduce

• Cron-like Task Scheduling

• State Checkpoints

• Load Balancing • Round-robin • Random & weighted

• Automatic Failover • Per-node Shared State • Zero Deployment • Distributed class loading

In-Memory Compute Grid

Page 27: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Client-Server vs. Affinity Colocation

12

4

3 Data 1

Job 1

2

3Data 2

Job 2

Processing Node 1

Processing Node 2

Client Node

Data Node 1

Data Node 2

Processing Node 1

1

3

4

Data 1

Data 2

2

2

1. Initial Request 2. Fetch data from remote nodes 3. Process entire data-set 4. Return to client

1. Initial Request 2. Co-locating processing with data 3. Return partial result 4. Reduce & return to client

Page 28: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Messaging & Events

• Topic-based messaging • Ordered & Unordered messages • Local & Remote message

listeners • Local & Remote event listeners • Trigger actions from any cluster

events or operations • Query events via IgniteEvents API

Page 29: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• Singletons on the Cluster

– Cluster Singleton

– Node Singleton

– Key Singleton

• Guaranteed Availability

– Auto Redeployment in Case of Failures

In-Memory Service Grid

Page 30: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• Resilience - Build an in-memory resilient service layer between your client application and the grid

• Only expose application APIs and not direct grid APIs

• Service Chaining - Call services internally via compute tasks to create service chains

In-Memory Service Grid

Page 31: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Hadoop & Spark Integration

Page 32: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• Ignite In-Memory File System (IGFS) – Hadoop-compliant

– Easy to Install – On-Heap and Off-Heap

– Caching Layer for HDFS

– Write-through and Read-through HDFS

– Performance Boost

IGFS: In-Memory File System

MR HIVE PIG

In-Memory MapReduce

IGFS

HDFS

IGFS

YARN }Any Hadoop Distro

Page 33: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Hadoop Accelerator: Map Reduce

• In-Memory Performance

• Zero Code Change

• Use existing MR code

• Use existing Hive queries

• No Name Node

• No Network Noise

• In-Process Data Colocation

• Eager Push Scheduling

User Application

Hadoop Client

Ignite Client

Hadoop Jobtracker

Hadoop Name Node

Hadoop Tasktracker

Hadoop Tasktracker

Ignite Data Node (IGFS)

Ignite Data Node (IGFS)

Hadoop Data Node

(HDFS)

Hadoop Data Node

(HDFS)Ignite PathHadoop Path

Page 34: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• IgniteRDD

– Share RDD across jobs on the host

– Share RDD across jobs in the application

– Share RDD globally

• Faster SQL

– In-Memory Indexes

– SQL on top of Shared RDD

Spark & Ignite Integration

Spark Application

Spark Worker

Spark Job

Spark Job

Ignite Node

Yarn Mesos Docker Cloud

Server

Spark Worker

Spark Job

Spark Job

Ignite Node

Server

Spark Worker

Spark Job

Spark Job

Ignite Node

Server

In-Memory Shared RDDs

Page 35: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

• Docker • Amazon AWS • Azure Marketplace • Google Cloud • Apache JClouds • Mesos • YARN • Apache Karaf (OSGi)

Deployment

Page 36: GridGain Basic Turbocharge SQL Queries

© 2016 GridGain Systems, Inc.

Thank You!

www.gridgain.com

@gridgain

#gridgain

Thank you for joining us. Follow the conversation.

Author: Christos Erotocritou