breadth or depth: what's in a column-store?

Post on 27-Jan-2015

DESCRIPTION

My talk from Barcamp2013 in HK at PolyU

TRANSCRIPT

Breadth or Depth

What's in a column-store?

Jeff Smith

February 23, 2013

This presentation

Is not: marketing, technical, arbitrary, polite, training

Is: persuasive, for the technical, precise, opinionated, educational

Srsouly

Bio

{ past: [startups, biotech, data_management],
  school: [research, HKU, uncertain_data],
  work: [AI, finance, prediction] }

This guy

Daniel Abadi

Back to the future

● 1 database to rule them all
● A scrappy band of rebels
● A brave new idea

The big question

Why grab this?

When all you want is this?

id thing attr1 attr2 attr3 attr4 attr5 attr6 attr7 attr8

123 doodad abc def ghi jkl mno pqr stu vwx

id thing

123 doodad

You're chopping it wrong.

Relations in pieces

id pet weight poops_per_day

1 dog 40 3

2 cat 15 2

3 bird 5 4

4 snake 78 0.25

Horizontal Partitions

id pet weight poops_per_day

1 dog 40 3

2 cat 15 2

3 bird 5 4

4 snake 78 0.25

You gotta get yourself some marble columns.

Vertical Partitions

id
1
2
3
4

pet
dog
cat
bird
snake

weight
40
15
5
78

poops_per_day
3
2
4
0.25
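The two ways of chopping the pet table can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular column-store lays out data on disk; the helper name `to_columns` is made up for the example.

```python
# The pet table from the slides, stored row by row (horizontal partitions).
rows = [
    {"id": 1, "pet": "dog", "weight": 40, "poops_per_day": 3},
    {"id": 2, "pet": "cat", "weight": 15, "poops_per_day": 2},
    {"id": 3, "pet": "bird", "weight": 5, "poops_per_day": 4},
    {"id": 4, "pet": "snake", "weight": 78, "poops_per_day": 0.25},
]


def to_columns(rows):
    """Vertical partitioning: one list per attribute, aligned by position."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: fetching two attributes still walks every full row.
from_rows = [(r["id"], r["pet"]) for r in rows]

# Column layout: only the two columns you asked for are touched.
from_columns = list(zip(columns["id"], columns["pet"]))
```

Both reads return the same pairs; the difference is how much data each layout has to pass over to get them.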

We're gonna need a bigger table.

NoSQL starts
Empire crumbles
Nomenclature obfuscates

BigTable

I know that song!

Column...families?!

Pets

row_id best_pet worst_pet illegal_pet

123 bulldog turtle rhino

Cars

row_id make model

123 Smart Fortwo
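One rough way to picture a column family is as a nested map: row key, then family, then column. The sketch below uses the values from the slide; the nested-dict layout and the `get_family` helper are illustrative assumptions, not the storage format of BigTable or any of its clones.

```python
# row key -> column family -> column -> value
table = {
    "123": {
        "Pets": {
            "best_pet": "bulldog",
            "worst_pet": "turtle",
            "illegal_pet": "rhino",
        },
        "Cars": {"make": "Smart", "model": "Fortwo"},
    }
}


def get_family(table, row_id, family):
    """Return every column of one family for one row (empty if absent)."""
    return table.get(row_id, {}).get(family, {})


cars = get_family(table, "123", "Cars")
nothing = get_family(table, "999", "Pets")
```

The row key is shared across families, which is why the same `123` shows up in both the Pets and the Cars tables.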

Modest Map

Year of the snake => Year of Python
4G => LTE
NoSQL => Non-relational
Beard => Face-mane
Column-stores => {column-store | column-family-store}

Does it smell as sweet?

...at column-oriented tasks.

C-Store rocks*

* Contrary to popular belief, after years of effort, Cleveland still does not rock.

Move, b*tch.

Get out the vote.

age
23
32
45
67
56
49
43
50
63
34
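A column-oriented task, in the sense used here, touches one attribute across many rows. A minimal sketch with the age values from the slide; the cutoff of 40 is an arbitrary assumption, chosen only to have something to filter on.

```python
# The age column from the slide, stored on its own.
ages = [23, 32, 45, 67, 56, 49, 43, 50, 63, 34]

# An aggregate over a single column never needs the rest of the record.
mean_age = sum(ages) / len(ages)

# Same for a filter-and-count; 40 is a made-up cutoff.
over_40 = sum(1 for age in ages if age > 40)
```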

The catch

Attack of the clones

The contenders

HBase*
Cassandra*
Hypertable
Accumulo

* The ones that matter

HBase

Hadoop stack
Java everywhere
Components, extensions, variables, headaches...

Tastes like SQL

    SELECT sensorid, (20-down)/(up-down) AS probability
    FROM hive_sensors WHERE down>=10 AND up>=20 AND down<=20
    UNION ALL
    SELECT sensorid, (up-10)/(up-down) AS probability
    FROM hive_sensors WHERE up>=10 AND up<=20 AND down<=10
    UNION ALL
    SELECT sensorid, 1 AS probability
    FROM hive_sensors WHERE up<=20 AND down>=10
    UNION ALL
    SELECT sensorid, (20-10)/(up-down) AS probability
    FROM hive_sensors WHERE down<=10 AND up>=20;
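Read branch by branch, the query appears to compute how much of each sensor's interval [down, up] falls inside the range [10, 20]. The Python sketch below mirrors the four branches under that reading; the interpretation, the function name `probability`, and the `LOW`/`HIGH` constants are assumptions, not part of the original query. Unlike `UNION ALL`, which emits a row for every branch that matches, this version returns the first match only.

```python
LOW, HIGH = 10, 20  # the query range hard-coded in the SQL


def probability(down, up):
    """Fraction of [down, up] lying inside [LOW, HIGH], or None if no branch matches."""
    if down >= LOW and up <= HIGH:
        # Interval entirely inside the range.
        return 1.0
    if LOW <= down <= HIGH and up >= HIGH:
        # Interval starts inside the range and runs past the top.
        return (HIGH - down) / (up - down)
    if down <= LOW and LOW <= up <= HIGH:
        # Interval starts below the range and ends inside it.
        return (up - LOW) / (up - down)
    if down <= LOW and up >= HIGH:
        # Interval covers the whole range.
        return (HIGH - LOW) / (up - down)
    # No overlap: the SQL produces no row for this sensor.
    return None
```

For example, `probability(15, 25)` gives 0.5, since half of the interval 15 to 25 sits below 20.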

Cassandra

CQL interface
Peer to peer
Better, but...

Anything you can do, I can do better.

Sparseness

id attr1 attr2 attr3 attr4

1 1

2 1

3 1

4 1

5

6 1

7

8 1

9 1

10

11

Dynamic Schemas

Pets

row_id best_pet worst_pet illegal_pet robot_pet

123 bulldog turtle rhino aibo

456 shih tzu gecko koala

Cars

row_id make model

123 Smart Fortwo

456 VW Golf
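Sparseness and dynamic schemas come from the same trick: only cells that exist are stored, so a new column costs nothing for rows that lack it. A minimal sketch of that idea using the Pets rows from the slide; the `put` and `columns_seen` helpers are hypothetical, and the sketch assumes row 456's three values line up with the first three columns.

```python
from collections import defaultdict

# row_id -> {column: value}; absent columns are simply not stored.
pets = defaultdict(dict)


def put(table, row_id, column, value):
    """Write one cell; no schema change is needed for a new column."""
    table[row_id][column] = value


def columns_seen(table):
    """The 'schema' is just the union of columns that any row happens to use."""
    return sorted({column for row in table.values() for column in row})


put(pets, "123", "best_pet", "bulldog")
put(pets, "123", "worst_pet", "turtle")
put(pets, "123", "illegal_pet", "rhino")
put(pets, "123", "robot_pet", "aibo")  # new column, only on this row

put(pets, "456", "best_pet", "shih tzu")
put(pets, "456", "worst_pet", "gecko")
put(pets, "456", "illegal_pet", "koala")

stored_cells = sum(len(row) for row in pets.values())
full_grid = len(pets) * len(columns_seen(pets))
```

Here 7 cells are stored where a fixed-schema table would reserve 8; on a wide, mostly empty table the gap grows accordingly.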

Stronger in the broken places

Innovation

Truly distributed systems
Columns as metadata
Arbitrarily deep column hierarchies*
Community database development

* Someday soon, I hope

Pig & friends

    data = load 'hbase://table_name'
        using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*', '-loadKey true')
        AS (id:chararray, stats:map[int]);

    @outputSchema("values:bag{t:tuple(key, value)}")
    def bag_of_tuples(map_dict):
        return map_dict.items()

    register 'udfs.py' using jython as py;
    data = load 'hbase://table_name'
        using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:*', '-loadKey true')
        AS (id:chararray, stats:map[int]);
    databag = foreach data generate id, FLATTEN(py.bag_of_tuples(stats));
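What the script does to each row can be mimicked in plain Python, without Pig or HBase: a row key plus a map of columns becomes one output row per map entry. The sample records and the `flatten` helper below are made up for illustration; only `bag_of_tuples` comes from the slide, here without its Pig-specific decorator.

```python
def bag_of_tuples(map_dict):
    # Same body as the UDF on the slide, wrapped in list() for Python 3.
    return list(map_dict.items())


def flatten(records):
    """Mimic `generate id, FLATTEN(...)`: one (id, key, value) row per map entry."""
    for row_id, stats in records:
        for key, value in bag_of_tuples(stats):
            yield (row_id, key, value)


# Made-up rows standing in for what the load statement would return:
# a row key plus a map of the columns in family cf1.
records = [
    ("row1", {"clicks": 3, "views": 10}),
    ("row2", {"clicks": 1}),
]

rows = list(flatten(records))
```

The wide, ragged rows come out as a tall, regular relation, which is the shape the rest of a Pig script usually wants.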

from Chase Seibert

No dog in this fight

jeffreyksmithjr@gmail.com

Hey I just met you
And this is crazy
But here's my email
Mail me maybe

jeff@aidyia.com

Work Play

Disclaimer

All images used in this presentation were stolen from the internet in a daring midnight raid that left 3 dead and 8 wounded. No license was obtained for their use and no license is implied by their misappropriation.

Yarrr. BarrrCamp.

Please don't sue me. I have nothing. Just a dog. Don't take my dog.
