Scaling and managing Cassandra with Docker, CoreOS and Presto

VALI MALINOIU @0X4139

JavaScript addict and experienced technical leader, focused on building high-quality, distributed, scalable applications.

Upload: vali-marius-malinoiu

Post on 18-Jul-2015


HOW TO - WITH CASSANDRA

★ Scaling with DOCKER

★ Clustering with COREOS

★ Querying with PRESTO

SCALING - DOCKER

★ What is Docker?

★ Why should I use it?

★ How does it help me?

WHAT IS DOCKER???

Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications.

DOCKER

★ Why developers like it?

★ Why sysadmins like it?

★ Docker vs. VM (VirtualBox, VMware)

FOR DEVELOPERS

• Any language using any toolchain

• “Dockerized” apps are completely portable and can run anywhere

• No more apt-get install, yum install

• Get started quickly using one of the 13000+ apps on Docker Hub

FOR SYSADMINS

• Provides standardized environments for QA, DEV, PROD

• Reduces “works on my machine” finger pointing

• Deploy and run any app on any infrastructure, quickly and reliably

VIRTUAL MACHINES

Each virtualized application includes not only the application, which may be tens of MB, and the necessary binaries and libraries, but also an entire guest operating system, which may weigh tens of GB.

DOCKER

The Docker Engine container comprises just the application and its dependencies. It runs as an isolated process in userspace on the host operating system, sharing the kernel with other containers.

DOCKER DEMO
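
The demo itself is not captured in the transcript. As a rough sketch of what starting a single Cassandra node in a container might look like, the snippet below only builds and prints a `docker run` command. The image tag `cassandra:2.1`, the container name and the published port are assumptions chosen for illustration, not details taken from the talk.

```python
def cassandra_run_command(name, image="cassandra:2.1", cql_port=9042):
    """Build a `docker run` command that starts one Cassandra container.

    The image tag and container name are illustrative assumptions.
    """
    return [
        "docker", "run",
        "-d",                      # run detached, in the background
        "--name", name,            # human-readable container name
        "-p", f"{cql_port}:9042",  # publish the CQL native port on the host
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(cassandra_run_command("cassandra-1")))
```

Keeping the command as a list makes it easy to hand to `subprocess.run` later without shell quoting issues.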

CLUSTER - COREOS

★ What is CoreOS?

★ Why should I use it?

★ How does it help me?

WHAT IS COREOS???

CoreOS is Linux for Massive Server Deployments

COREOS

★ ETCD

★ FLEET

★ FLANNEL

ETCD

ETCD is a distributed key-value store that provides a reliable way to store data across a cluster of machines. It gracefully handles master elections during network partitions and tolerates machine failure, including failure of the master. Your applications can read and write data into etcd.
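
To illustrate "applications can read and write data", here is a minimal sketch that prepares HTTP requests against etcd's v2 keys API. It assumes an etcd member listening on `127.0.0.1:2379`; the key `/cassandra/seeds` and its value are hypothetical. The requests are only constructed, not sent, so the snippet runs without a cluster.

```python
from urllib.parse import urlencode
from urllib.request import Request

ETCD_URL = "http://127.0.0.1:2379"  # assumed local etcd endpoint


def etcd_set_request(key, value, base_url=ETCD_URL):
    """Prepare a PUT that stores `value` under `key` (etcd v2 keys API)."""
    body = urlencode({"value": value}).encode("ascii")  # form-encoded body
    return Request(f"{base_url}/v2/keys/{key.lstrip('/')}", data=body, method="PUT")


def etcd_get_request(key, base_url=ETCD_URL):
    """Prepare a GET that reads `key` back."""
    return Request(f"{base_url}/v2/keys/{key.lstrip('/')}", method="GET")


if __name__ == "__main__":
    # Hypothetical key holding the Cassandra seed nodes.
    req = etcd_set_request("/cassandra/seeds", "10.0.0.11,10.0.0.12")
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call against a running etcd.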

FLEET

Easy Warehouse-scale Computing

Treat your CoreOS cluster as if it shared a single init system.

Automatic migration of units when a machine fails.

FLEET - ADVANTAGES

• Deploy Docker containers on arbitrary hosts in a cluster

• Distribute services across a cluster using machine-level anti-affinity

• Maintain N instances of a service, re-scheduling on machine failure

• Discover machines running in the cluster

• Automatically SSH into the machine running a job

Listing machines in a cluster

Listing units in a cluster

Listing unit files in a cluster

Creating a unit
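
The screenshots for these slides did not survive in the transcript. The listings presumably correspond to `fleetctl list-machines`, `fleetctl list-units` and `fleetctl list-unit-files`. As a sketch of what creating a unit could involve, the snippet below renders a template unit for Cassandra. The unit name `cassandra@.service`, the image tag and the container naming are assumptions for illustration; the `Conflicts` line is fleet's way of expressing the machine-level anti-affinity mentioned earlier.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=Cassandra node %i
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f cassandra-%i
ExecStart=/usr/bin/docker run --name cassandra-%i {image}
ExecStop=/usr/bin/docker stop cassandra-%i

[X-Fleet]
Conflicts={template_name}@*.service
"""


def render_unit(image="cassandra:2.1", template_name="cassandra"):
    """Render a fleet template unit; %i is systemd's instance specifier."""
    return UNIT_TEMPLATE.format(image=image, template_name=template_name)


if __name__ == "__main__":
    # Save the output as cassandra@.service (hypothetical name), then e.g.:
    #   fleetctl submit cassandra@.service
    #   fleetctl start cassandra@1.service cassandra@2.service
    print(render_unit())
```

With `Conflicts=cassandra@*.service`, fleet avoids scheduling two instances of the template on the same machine.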

FLANNEL

Flannel is an overlay network that gives each machine a subnet for use with Docker/Kubernetes.

It allows Docker containers to communicate even though they are located on different machines (magic).
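
The "one subnet per machine" idea can be sketched with Python's `ipaddress` module. This is a simplification: real flannel leases subnets through etcd rather than handing them out in order, and the overlay range `10.1.0.0/16`, the `/24` size and the machine names are assumptions for illustration.

```python
import ipaddress


def assign_subnets(machines, network="10.1.0.0/16", subnet_prefix=24):
    """Give every machine its own non-overlapping slice of the overlay network."""
    overlay = ipaddress.ip_network(network)
    subnets = overlay.subnets(new_prefix=subnet_prefix)  # generator of /24 blocks
    return {machine: next(subnets) for machine in machines}


if __name__ == "__main__":
    # Hypothetical machine names.
    for machine, subnet in assign_subnets(["core-01", "core-02", "core-03"]).items():
        print(machine, subnet)
```

Because each host owns a distinct range, a container's IP address is enough to tell which machine it runs on, which is what lets traffic be routed between hosts.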

PRESTO by Facebook

DISTRIBUTED SQL ENGINE FOR BIG DATA

Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that in total scan over a petabyte each per day.

HOW DOES IT WORK?

★ Coordinator-worker architecture

★ Works with various connectors

★ Hadoop/Hive

★ Cassandra

★ TPC-H (mostly for testing)
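
Each connector is enabled through a small catalog properties file on the Presto nodes. The sketch below renders two such files; the contact-point addresses are hypothetical, and the file names are assumptions that follow the usual one-file-per-catalog layout under `etc/catalog/`.

```python
def catalog_properties(connector, options=None):
    """Render the contents of a Presto catalog properties file."""
    lines = [f"connector.name={connector}"]
    for key, value in (options or {}).items():
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# File name -> file contents; the Cassandra addresses are made up.
catalogs = {
    "cassandra.properties": catalog_properties(
        "cassandra", {"cassandra.contact-points": "10.1.0.2,10.1.1.2"}
    ),
    "tpch.properties": catalog_properties("tpch"),
}

if __name__ == "__main__":
    for filename, content in catalogs.items():
        print(f"# etc/catalog/{filename}")
        print(content)
```

The file name without its extension becomes the catalog name used in queries, for example `cassandra`.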

WHY??

★ Combine data from multiple data sources

★ JOINS!!!!

★ Scalable

★ Maintained!!!
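
A sketch of what "combine data from multiple data sources" plus "JOINS" might look like in practice: Presto addresses tables as `catalog.schema.table`, so a single query can join a Cassandra table with a Hive table. All table, schema and column names below are hypothetical.

```python
def cross_catalog_join(left, right, key, columns):
    """Build a SQL join between two fully qualified Presto tables."""
    cols = ", ".join(columns)
    return (
        f"SELECT {cols}\n"
        f"FROM {left} AS l\n"
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


if __name__ == "__main__":
    # Hypothetical tables: orders live in Cassandra, customers in Hive.
    print(cross_catalog_join(
        "cassandra.shop.orders",
        "hive.default.customers",
        "customer_id",
        ["l.order_id", "r.name"],
    ))
```

Cassandra's own query language has no joins, which is a large part of the appeal of putting Presto in front of it.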

QUESTIONS?