data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

Hardware Agnostic: Cassandra on Raspberry Pi Andy Cobley | Lecturer, University of Dundee, Scotland

Upload: andy-cobley

Post on 17-Dec-2014

TRANSCRIPT

Page 1: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

Hardware Agnostic: Cassandra on Raspberry Pi

Andy Cobley | Lecturer, University of Dundee, Scotland

Page 2: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Cassandra is hardware agnostic
* So why not run it on a Raspberry Pi ?
* How hard can it be ?
* What can we do with it once it works ?

Cassandra on Raspberry Pi

Page 3: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Andy Cobley
* School of Computing
* University of Dundee
* Twitter: @andycobley

Who Am I ?

Page 4: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Single chip Linux computer
* 512 MB RAM
* Boots off an SD card
* Ethernet port
* (graphics and all you need for a general purpose computer)

What's a Raspberry Pi ?

Page 5: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

Pi with pound coin

Page 6: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* And here’s the Cassandra cluster

And, here’s one for real

* Power Permitting !

Page 7: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Cassandra is designed to be fast, fast at writing, fast at reading.

* This laptop with one instance of Cassandra will do 12,000 write operations

* Raspberry Pi will do 200 !

The Bad News

Page 8: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Running an external USB drive is actually worse !
* Probably a hardware feature

More bad news !

Page 9: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

Raspberry Pi Schematic

Page 10: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Oracle Java vs OpenJDK

And then there’s Java!

Page 11: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Raspbian is Debian for the Pi
* Uses the hard floating point accelerator
* Much faster than Debian
* Current Oracle JDK won’t run on it !

And Raspbian

Page 12: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* http://www.oracle.com/technetwork/java/embedded/downloads/javase/index.html

* Java SE Embedded version 6
* Cassandra might prefer 6
* But
* https://blogs.oracle.com/henrik/entry/oracle_releases_jdk_for_linux
* Preview at:
* https://jdk8.java.net/fxarmpreview/

Oracle java

Page 13: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Actually not much difference in performance

Hard vs Soft Float

Page 14: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Cassandra uses compression for performance
* Started in version 1.0

2x-4x reduction in data size
25-35% performance improvement on reads
5-10% performance improvement on writes

The Problem with compression

Page 15: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Two types:

Google Snappy Compressor (faster reads/writes)
DeflateCompressor (Java zip, slower, better compression)

* Snappy Compression not available on Pi

(requires native methods, so someone might get it to work!)

Compression types
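The size-versus-speed trade-off of the Deflate option can be tried out away from Cassandra. The sketch below is only an illustration: it uses Python's `zlib` module, which implements the same DEFLATE algorithm as Java's zip classes, and the sample rows are invented. It does not measure anything about Cassandra itself.

```python
import zlib


def deflate_ratio(data, level=6):
    """Return uncompressed size divided by DEFLATE-compressed size."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


# Hypothetical sample: repetitive, row-like text, which compresses well.
sample = b"".join(
    b"user%05d,Dundee,Scotland,raspberrypi\n" % i for i in range(2000)
)

print("compression ratio: %.1fx" % deflate_ratio(sample))
```

Real column data will compress more or less well than this sample depending on how repetitive it is.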

Page 16: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Startup script allocates memory
* Calculates based on number of processors
* Pi reports zero processors !
* Boom !
* Now fixed

And the startup script
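The slide does not show the script, so the following is only a sketch of the kind of calculation that breaks when the processor count comes back as zero. The sizing rule and its constants (heap capped at half of RAM or 1024 MB, young generation at 100 MB per core capped at a quarter of the heap) are assumptions made for illustration, not a copy of the shipped script. The guard on the core count is the point of the example.

```python
def heap_sizes_mb(system_memory_mb, cpu_cores):
    """Return (max_heap_mb, young_gen_mb) under an assumed sizing rule."""
    # Guard: treat a reported core count of zero (or less) as one core,
    # otherwise the young generation below would be sized at 0 MB.
    cores = max(1, cpu_cores)

    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)

    young_gen = min(100 * cores, max_heap // 4)
    return max_heap, young_gen


# A 512 MB board that reports zero processors still gets a usable result.
print(heap_sizes_mb(512, 0))
```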

Page 17: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* In cassandra-env.sh
* JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.1.15"
* Or else nodetool will not work between nodes

JMX Config

Page 18: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* C* 1.2.2 added UseCondCardMark as a JVM option
* "for better lock handling especially on hotspot with multicore processor"
* In cassandra-env.sh

#if [ "$JVM_VERSION" \> "1.7" ] ; then
#    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
#fi

JVM OPT UseCondCardMark

Page 19: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1
Page 20: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* We’ve forgotten one thing
* The Pi costs £25
* You can power 4 from a USB hub (no need for a power supply on each one)
* So:

The Good News !

Page 21: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

So, have a 64 node computer for £2000

University of Southampton

Page 22: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* 32 node Beowulf cluster:
* Joshua Kiepert, Boise State University

Or this

Page 23: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Adding nodes adds performance
* Adding nodes adds replicas of data
* BUT
* Make sure your ring is balanced,
* Pis don’t like to be unbalanced.

Adding nodes is good
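One way to check whether a ring is balanced is to work out what share of the token space each node owns. The sketch below assumes a token space of 2**127, the same figure used by the key calculation later in the deck, and that each node owns the range from the previous token up to its own. The function name is made up for the example.

```python
RING = 2 ** 127  # assumed size of the token space


def ownership(tokens):
    """Map each token to the fraction of the ring its node owns."""
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps round to the last token
        span = (token - previous) % RING or RING  # a lone node owns it all
        shares[token] = span / RING
    return shares


balanced = [i * RING // 4 for i in range(4)]
print(sorted(ownership(balanced).values()))

# Three nodes left on tokens calculated for four: one node carries half.
unbalanced = balanced[:3]
print(sorted(ownership(unbalanced).values()))
```

In the second case the node at token 0 owns half the ring, which is the sort of imbalance the slide warns about.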

Page 24: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Vnodes (in 1.2) would be very nice
* However at this point I haven’t got 1.2 on Pi running on a cluster

Vnodes

Page 25: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

Performance with 3/4 nodes

Page 26: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

Performance with 5/6 nodes

Page 27: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* ./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor

* Note: nodes to use
* You will get different performance if you insert to fewer nodes than you have in your ring

Stress test commands
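To avoid leaving a node out of the list by accident, the command line can be built from the full list of ring members. This is a small sketch that only assembles the string. It uses the flags shown on the slide (`-d`, `-o insert`, `-I DeflateCompressor`), and the function name is hypothetical.

```python
def stress_insert_command(nodes, compressor="DeflateCompressor"):
    """Build the stress insert command for every node in the ring."""
    return " ".join(
        ["./stress", "-d", ",".join(nodes), "-o", "insert", "-I", compressor]
    )


ring = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
print(stress_insert_command(ring))
```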

Page 28: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Adding a node (in the absence of Vnodes)

Must seed from a known node
Use a program to calculate new keys
Bring up new node with the correct key in cassandra.yaml
Use nodetool to move other nodes

Adding Nodes Procedure

Page 29: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

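* Python code

import sys

# Number of nodes comes from the command line, or from a prompt if absent.
if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(input("How many nodes? :"))

# Evenly spaced keys across the 2**127 ring (integer division).
for i in range(num):
    print('node %d: %d' % (i, i * (2**127) // num))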

Calculating keys

Page 30: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Use nodetool

sudo ./nodetool -h 192.168.1.10 move 42535295865117307932921825928971026432

* And cleanup

./nodetool -h 192.168.1.10 cleanup

Moving existing nodes
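The move and cleanup steps can be planned in one go when a ring grows. The sketch below rests on two assumptions that are not in the slides: the existing nodes are currently balanced and listed in ring order, and the new nodes take the last slots of the new ring. It only prints the `nodetool` commands in the form shown above and does not run them. The function names are made up for the example.

```python
RING = 2 ** 127  # token space used by the key calculation


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count nodes."""
    return [i * RING // node_count for i in range(node_count)]


def plan_growth(existing_hosts, new_total):
    """Return (commands for existing nodes, tokens for the new nodes)."""
    current = balanced_tokens(len(existing_hosts))
    target = balanced_tokens(new_total)

    moves = [
        "./nodetool -h %s move %d" % (host, new)
        for host, old, new in zip(existing_hosts, current, target)
        if old != new  # a node already on its target token stays put
    ]
    cleanups = ["./nodetool -h %s cleanup" % host for host in existing_hosts]
    return moves + cleanups, target[len(existing_hosts):]


hosts = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
commands, new_tokens = plan_growth(hosts, 4)
for command in commands:
    print(command)
print("token for the new node's cassandra.yaml:", new_tokens[0])
```

Going from three nodes to four, the second node's target is 2**127 / 4, which is the value used in the move command on the slide.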

Page 31: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* On Debian, you can free memory from the graphics chip

cd /boot
sudo cp start.elf start.elf.old
sudo cp arm224_start.elf start.elf
reboot

Getting more memory

Page 32: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Under Raspbian
* Run with a monitor plugged in for the first time
* Set options for screen memory
* Perhaps disable boot to GUI

Getting more Memory

Page 33: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* I prefer static network addresses
* Edit /etc/network/interfaces

iface eth0 inet static
    address 192.168.1.41
    netmask 255.255.255.0
    network 192.168.1.0
    broadcast 192.168.1.255
    gateway 192.168.1.254

Network address
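With several Pis to set up, the stanza can be generated per node instead of typed by hand. The sketch below uses Python's standard `ipaddress` module to derive the netmask, network and broadcast lines from the address. It assumes a /24 network with the gateway on host number 254, as in the example above, and the function name is hypothetical.

```python
import ipaddress


def static_stanza(address, prefix_len=24, gateway_host=254):
    """Build an /etc/network/interfaces stanza for one node."""
    iface = ipaddress.ip_interface("%s/%d" % (address, prefix_len))
    net = iface.network
    gateway = net.network_address + gateway_host  # assumed gateway position
    return "\n".join([
        "iface eth0 inet static",
        "    address %s" % iface.ip,
        "    netmask %s" % net.netmask,
        "    network %s" % net.network_address,
        "    broadcast %s" % net.broadcast_address,
        "    gateway %s" % gateway,
    ])


print(static_stanza("192.168.1.41"))
```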

Page 34: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Make a master SD card
* Copy it !
* Make sure the master version has no data on it.
* Consider "Puppet" (though I don’t use it)

Multiple nodes

Page 35: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* See https://github.com/acobley/CassandraStartup
* Put the file in /etc/init.d
* update-rc.d cassandra defaults

Starting as a service

Page 36: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* So for £200 we get an 8 node C* cluster
* It can be reconfigured, blown away, stress tested and generally abused
* We can simulate data racks, data centers and I hope even long network delays.
* Hopefully our upcoming MSc in Data Science will use these clusters

Pi is for teaching

Page 37: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* We know C* can be configured to be aware of:

Network racks
Data Centers

* We know replicas can be stored across these racks

* How can we play with this cheaply ?

C* is network aware
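One cheap way to play with this is to describe a small Pi cluster as racks in a topology file. The sketch below assumes Cassandra's PropertyFileSnitch, which reads `cassandra-topology.properties` entries of the form `address=DC:RACK`. The addresses, the rack names and the two-racks-of-three layout are hypothetical.

```python
def topology_properties(racks, datacenter="DC1"):
    """Build cassandra-topology.properties lines from a rack layout."""
    lines = []
    for rack, addresses in sorted(racks.items()):
        for address in addresses:
            lines.append("%s=%s:%s" % (address, datacenter, rack))
    return "\n".join(lines)


# Hypothetical layout: two racks of three Pis each.
racks = {
    "RAC1": ["192.168.1.10", "192.168.1.11", "192.168.1.12"],
    "RAC2": ["192.168.1.20", "192.168.1.21", "192.168.1.22"],
}
print(topology_properties(racks))
```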

Page 38: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

Proposed teaching tool

[Diagram: 10 Mb/s hub with noise injection; Switch 1 and Switch 2, each with Pi 1, Pi 2 and Pi 3]

Page 39: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Cassandra wouldn’t run on a Pi
* It does now.
* Running it on a Pi shook out some Cassandra bugs
* You can run it in a secure lab

Pi is discovery

Page 40: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Most important, this was pure Geeky Fun

Pi is for fun

Page 41: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Data Science:
* http://www.computing.dundee.ac.uk/study/postgrad/degreedetails.asp?17

Obligatory Plug

Page 42: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

* Raspberry Pi is cheap
* C* needs some work to run on it
* You can make clusters cheaply for experimentation
* It’s fun !

C* is Hardware Agnostic

Page 43: Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1

THANK YOU