Page 1: Turnkey Riak KV Cluster

Turnkey Riak Cluster
October 2015

Page 2: Turnkey Riak KV Cluster

2015 (C) physIQ All rights reserved 2

Agenda

Part One: (Today)
- Create a 10-node Riak KV cluster from scratch
- Populate it with time series data (big data!)

Part Two: (Next Time)
- Index the data
- Perform analytics on the time series data

Part Three: (Next Next Time)
- Use machine learning / predictive analytics to model the data and measure model effectiveness

Page 3: Turnkey Riak KV Cluster

Mission

Build a turnkey solution for working with a Riak cluster of any size

Page 4: Turnkey Riak KV Cluster

Implementation

Turnkey at Any Scale Means
• Public Cloud Infrastructure (GCE)
• Automated Cloud Orchestration
• Automated Configuration Management

Usage of Large Scale Cluster Requires
• Load Balancing
• Centralized Logging
• Centralized Monitoring

Page 5: Turnkey Riak KV Cluster

Cloud Orchestration

TERRAFORM (http://terraform.io)

“Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently”

• Runs Locally on Your Machine
• Uses Cloud Provider APIs To:
  • Create Network and Firewall Rules
  • Launch Server Instances
  • Provision and Attach Storage Disks
  • Bootstrap Servers with Init Scripts

*This is the only tool the user configures and runs

Page 6: Turnkey Riak KV Cluster

Configuration Management

SaltStack (http://saltstack.com)

“Salt delivers a dynamic communication bus for infrastructures that can be used for orchestration, remote execution, configuration management and much more.”

• Salt installs software and configures each server
• Each server type has its own Salt profile
  • i.e. Riak, HAProxy, Zabbix, ELK Stack
• Bootstrapped by the init script that Terraform sets
  • Installs Salt and sets the server profile
  • Init script runs automatically once the server is initialized
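
As a rough illustration of the bootstrap step, the sketch below renders a per-role init script in Python. The role names come from the slide; the function name, the static grains file, and the masterless `salt-call` invocation are assumptions made for illustration and are not taken from the project's actual scripts. The Salt installation command is deliberately left as a placeholder.

```python
# Hypothetical sketch: render the init script a server runs on first boot.
# The roles mirror the server types listed on the slide; the rest is assumed.
ROLES = ("riak", "haproxy", "zabbix", "elk")


def render_init_script(role: str) -> str:
    """Return a shell script that records the server's role and applies Salt states."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    lines = [
        "#!/bin/sh",
        "set -e",
        "# (placeholder) install Salt here; the command depends on the OS image",
        "mkdir -p /etc/salt",
        "# Record the server profile as a static grain (assumed convention)",
        f"printf 'role: {role}\\n' > /etc/salt/grains",
        "# Apply this role's states without a Salt master (assumed masterless setup)",
        "salt-call --local state.highstate",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_init_script("riak"))
```

In this sketch, Terraform would only need to pass the rendered text as the instance's startup script; how the real project wires that up is not shown in the slides.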

Page 7: Turnkey Riak KV Cluster

Key-Value Data Store

Riak KV (http://www.basho.com)

Riak is a highly available, scalable, fault-tolerant key-value big data store

• Masterless architecture: every node is capable of serving read / write requests

• Automatic sharding of data to ensure even distribution

• Tunable consistency, which can be relaxed for better performance
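
The sharding and tunable-consistency points can be made concrete with a small Python sketch. It assumes Riak's documented defaults (a 160-bit SHA-1 keyspace, a ring of 64 partitions, and three replicas per key), but it is a simplification: Riak hashes an Erlang-encoded bucket/key term and builds its replica list from ring ownership, so the indices computed here will not match a real cluster. The function names are hypothetical.

```python
import hashlib

RING_SIZE = 64        # number of partitions; 64 is Riak's default ring size
KEYSPACE = 2 ** 160   # SHA-1 yields a 160-bit hash


def partition_for(bucket: str, key: str, ring_size: int = RING_SIZE) -> int:
    """Map a bucket/key pair to a partition index on the ring (simplified)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (KEYSPACE // ring_size)


def replica_partitions(partition: int, n_val: int = 3, ring_size: int = RING_SIZE) -> list:
    """Return n_val consecutive partitions, wrapping around the ring (simplified)."""
    return [(partition + i) % ring_size for i in range(n_val)]


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares a replica with every write quorum (R + W > N)."""
    return r + w > n


if __name__ == "__main__":
    p = partition_for("cta", "1950:20150218T2359")
    print("partition:", p, "replicas:", replica_partitions(p))
    print("N=3, R=2, W=2 overlap:", quorums_overlap(3, 2, 2))
    print("N=3, R=1, W=1 overlap:", quorums_overlap(3, 1, 1))
```

Lowering R and W below the overlap threshold means fewer replicas must answer each request, which is the performance side of the trade-off, at the cost of reads that may return stale values.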

Page 8: Turnkey Riak KV Cluster

Load Balancing

HAProxy (http://www.haproxy.org)

“HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications”

• Equally distributes load between the nodes in the Riak cluster

*Project leverages Cloud Provider LB for single IP external access only
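
Equal distribution can be pictured as round-robin selection, which is one of the balancing algorithms HAProxy offers; the slides do not say which algorithm the project configures. The Python sketch below is only a model of the idea, with hypothetical node names.

```python
from collections import Counter
from itertools import cycle

# Hypothetical node names; a real deployment would list the Riak nodes' addresses.
NODES = ["riak-1", "riak-2", "riak-3", "riak-4", "riak-5"]


def round_robin(nodes):
    """Return an iterator that hands out nodes in a fixed, repeating order."""
    return cycle(nodes)


def distribute(num_requests: int, nodes=NODES) -> Counter:
    """Count how many of num_requests each node receives under round-robin."""
    picker = round_robin(nodes)
    return Counter(next(picker) for _ in range(num_requests))


if __name__ == "__main__":
    for node, count in sorted(distribute(1000).items()):
        print(node, count)
```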

Page 9: Turnkey Riak KV Cluster

Centralized Monitoring

ZABBIX (http://www.zabbix.com)

“Zabbix is an enterprise open source monitoring solution for networks and applications”

• Collects Basic Server Metrics
  • e.g. CPU, Memory, Disk Usage
• Added Basho Zabbix Template
  • https://github.com/basho/riak-zabbix
  • Adds Riak throughput, latency, health, & Erlang Metrics
• Easily set alerts and configure dashboards

Page 10: Turnkey Riak KV Cluster

Centralized Logging

ELK Stack (http://elastic.co)

“ELK Stack combines the Elasticsearch, Logstash & Kibana to provide realtime insights of any type of structured, unstructured data.” (i.e. Logs)

• A process on each server sends logs to ELK Stack
  • https://github.com/josegonzalez/python-beaver
• Kibana provides UI for
  • Ad hoc queries
  • Building Dashboards

Page 11: Turnkey Riak KV Cluster

Stack Diagram

Page 12: Turnkey Riak KV Cluster

Cost

What Does Google Cloud Platform Charge For?
• Hourly Rate of each server instance you create
• Hourly Rate for each GB of SSD or Magnetic Storage Used
• Hourly Charge for Each GCE LB forwarding rule
• $0.008 / GB of Data through GCE Load Balancer

Default Configuration Cost

Item                                    Quantity   Unit Cost / Hour   Total / Hour
Riak Node (n1-standard-2)               5          $0.10              $0.50
Riak SSD Storage (GB)                   100        $0.000236          $0.02
Riak Magnetic Storage (GB)              1000       $0.000056          $0.06
HAProxy Server (n1-standard-1)          2          $0.05              $0.10
Zabbix Server (n1-standard-2)           1          $0.10              $0.10
ELK Stack Server (n1-standard-1)        1          $0.05              $0.05
Server Magnetic Boot Disk Space (GB)    90         $0.000056          $0.01
Load Balancer Forwarding Rules          2          $0.03              $0.05
Data Through Load Balancer              ?          $0.008 / GB        $0.00

Total Cost (Per Hour)                                                 $0.88

* See GCE Pricing (https://cloud.google.com/compute/pricing#lb)
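
The arithmetic behind the default configuration cost can be reproduced with a short Python sketch. It uses the quantities and unit prices exactly as printed above and leaves out the load-balancer data charge, whose quantity is unknown. With the printed unit prices the sum comes to roughly $0.89 per hour rather than $0.88; the gap sits in the forwarding-rule row (2 × $0.03 = $0.06 against a listed $0.05), presumably because the slide rounds its unit prices for display. That explanation is an assumption.

```python
# Quantities and hourly unit prices as printed in the cost table.
# Storage quantities are in GB. The per-GB load-balancer data charge is
# omitted because its quantity is listed as "?".
LINE_ITEMS = [
    ("Riak Node (n1-standard-2)", 5, 0.10),
    ("Riak SSD Storage (GB)", 100, 0.000236),
    ("Riak Magnetic Storage (GB)", 1000, 0.000056),
    ("HAProxy Server (n1-standard-1)", 2, 0.05),
    ("Zabbix Server (n1-standard-2)", 1, 0.10),
    ("ELK Stack Server (n1-standard-1)", 1, 0.05),
    ("Server Magnetic Boot Disk Space (GB)", 90, 0.000056),
    ("Load Balancer Forwarding Rules", 2, 0.03),
]


def hourly_cost(items=LINE_ITEMS) -> float:
    """Sum quantity * unit price over all line items."""
    return sum(quantity * unit_price for _, quantity, unit_price in items)


def cost_for(hours: float, items=LINE_ITEMS) -> float:
    """Cost of keeping the whole stack running for the given number of hours."""
    return hours * hourly_cost(items)


if __name__ == "__main__":
    for name, quantity, unit_price in LINE_ITEMS:
        print(f"{name:<40} {quantity * unit_price:8.4f}")
    print(f"{'Total per hour':<40} {hourly_cost():8.4f}")
    print(f"{'Total for an 8-hour session':<40} {cost_for(8):8.2f}")
```

Because every charge is hourly, a throwaway cluster that runs only for the length of an analysis session costs a few dollars at these rates.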

Page 13: Turnkey Riak KV Cluster

Loading Data - IoT

Chicago Transit Authority (CTA)
- Public API
- Allows you to query location of buses and trains
- Data refreshed every minute
- 1500+ vehicles
- Just under a million fixes a day

fix = (id, time, lat, long, ……)

Page 14: Turnkey Riak KV Cluster

Loading Data - IoT

Example data:

{"vid": 1950, "tmstmp": "20150218 23:59", "lat": 41.880667662009216, "lon": -87.741054045848358, "hdg": 269, "pid": 949, "rt": "20", "des": "Austin", "pdist": 4042, "spd": 15, "tablockid": "N20 -893", "tatripid": 1040830, "zone": null}

One per vehicle, per minute
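
Since there is one fix per vehicle per minute, a natural way to store a fix in a key-value store is under a key built from the vehicle id and the minute. The Python sketch below parses the example record and builds such a key. The key layout and the function names are hypothetical; the slides do not show the scheme the project uses.

```python
import json
from datetime import datetime

# The example record from the slide, verbatim.
SAMPLE = (
    '{"vid": 1950, "tmstmp": "20150218 23:59", "lat": 41.880667662009216, '
    '"lon": -87.741054045848358, "hdg": 269, "pid": 949, "rt": "20", '
    '"des": "Austin", "pdist": 4042, "spd": 15, "tablockid": "N20 -893", '
    '"tatripid": 1040830, "zone": null}'
)


def parse_fix(raw: str) -> dict:
    """Decode one fix and convert its timestamp string to a datetime."""
    fix = json.loads(raw)
    fix["tmstmp"] = datetime.strptime(fix["tmstmp"], "%Y%m%d %H:%M")
    return fix


def fix_key(fix: dict) -> str:
    """Build a key from vehicle id and minute (hypothetical scheme)."""
    return f"{fix['vid']}:{fix['tmstmp']:%Y%m%dT%H%M}"


if __name__ == "__main__":
    fix = parse_fix(SAMPLE)
    print(fix_key(fix), fix["lat"], fix["lon"])
```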

Page 15: Turnkey Riak KV Cluster

Loading Data - IoT

- CTA data archive located at https://s3.amazonaws.com/cta-tracker

250 days ≈ 250,000,000 fixes (at just under a million fixes a day)

How do we configure a cluster to allow this much data to be input for analysis in a reasonable* amount of time?

*reasonable = 30 minutes
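
The 30-minute target translates directly into a required write rate, which the Python sketch below works out. The total and the time window come from the slide; the worker counts in the example are illustrative assumptions, not measured or configured values.

```python
TOTAL_FIXES = 250_000_000   # from the slide: about 250 days of archived fixes
TARGET_SECONDS = 30 * 60    # from the slide: "reasonable" = 30 minutes


def required_rate(total: int = TOTAL_FIXES, seconds: int = TARGET_SECONDS) -> float:
    """Writes per second the whole cluster must sustain to meet the target."""
    return total / seconds


def per_worker_rate(workers: int, total: int = TOTAL_FIXES,
                    seconds: int = TARGET_SECONDS) -> float:
    """Writes per second each worker (node or loader) must handle, split evenly."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return required_rate(total, seconds) / workers


if __name__ == "__main__":
    print(f"cluster-wide: {required_rate():,.0f} writes/s")
    for workers in (5, 6, 10):  # illustrative counts only
        print(f"{workers:>2} workers: {per_worker_rate(workers):,.0f} writes/s each")
```

The cluster-wide figure works out to about 138,900 writes per second, which is why both the number of Riak nodes and the number of parallel loaders matter for hitting the target.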

Page 16: Turnkey Riak KV Cluster

Stack Diagram

[Diagram: S3, with six Loader instances]

Page 17: Turnkey Riak KV Cluster

Conclusions

• With the three basic building blocks (public cloud API, Terraform, Salt), you can build & configure complex, low cost, high performance network infrastructures in minutes.

• Riak KV provides highly scalable, reliable data throughput (read / write) in a cloud environment.

• Supporting tools (HAProxy, Zabbix, ELK) allow you to measure the cluster’s effectiveness.

Together, these tools can be used to build throwaway big data analysis environments!

Page 18: Turnkey Riak KV Cluster

Access To Project Components

• Cluster Configuration Scripts
  • https://github.com/physIQ/turnkey-riak

• Archived CTA fix data
  • https://s3.amazonaws.com/cta-tracker

• CTA Tracker project
  • https://github.com/jolson7168/ctaTracker