Introduction to Apache Accumulo


Description of Apache Accumulo including data model, scaling and recovery features, API, security, and applications


Apache Accumulo

Introduction

• Aaron Cordova

• Founded Accumulo project with several others

• Led development through release 1.0

• aaron@tetraconcepts.com

Agenda

• Introduction

• Data Model

• API

• Architecture - scaling, recovery

• Security

• Data Lifecycle

• Applications

Introduction

History

• Development began in the summer of 2008, after comparing design goals with the BigTable paper and existing implementations (HBase, Hypertable)

• Internal version 1.0 released in the summer of 2009

• September 2011: accepted as an Apache Incubator project; Doug Cutting, founder of Hadoop, was the Champion Sponsor

• February 2012: version 1.4 released

• March 2012: graduated to a top-level Apache project

• Version 1.5 due out soon

Introduction

• Accumulo is a sparse, distributed, sorted, multi-dimensional map

• Modeled after Google’s BigTable design

• Scales to trillions of records and hundreds of terabytes

• Features automatic load balancing, high availability, and dynamic control over data layout

Data Model

Data Model

Key → Value

Key: row ID | column (family, qualifier, visibility) | timestamp

Data Model (Logical 2D table structure)

        attribute:age   attribute:phone   purchases:sneakers   returns:hat
bill    49              555-1212          $100                 -
george  38              -                 $80                  $30

Physical layout (sorted keys)

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill attribute phone private Jun 2010 555-1212

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

High-level API

Accumulo API

• To use Accumulo you must write an application using the Accumulo Java client library. There is no SQL (hence NoSQL)

• Data is packaged into Mutation objects, which are added to a BatchWriter that sends them to TabletServers

• Clients scan a set of key-value pairs by obtaining a Scanner, optionally specifying start and end keys (a Range). Iterating over the scanner returns sorted key-value pairs for that range. Each scan takes milliseconds to start

• Can scan over a subset of the columns

• Can send a set of Ranges to a BatchScanner and get matching key-value pairs back, unsorted
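
A minimal write sketch using the Java client library. The instance name, ZooKeeper hosts, user, password, and table name are placeholders, and the BatchWriterConfig/PasswordToken calls shown are the 1.5-era form of the API.

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.security.ColumnVisibility;

public class WriteExample {
  public static void main(String[] args) throws Exception {
    // Connect to the Accumulo instance via ZooKeeper (placeholder names).
    Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
        .getConnector("user", new PasswordToken("secret"));

    // Create the example table if it does not already exist.
    if (!conn.tableOperations().exists("people")) {
      conn.tableOperations().create("people");
    }

    // A Mutation holds all changes for a single row.
    Mutation m = new Mutation("bill");
    m.put("attribute", "age", new ColumnVisibility("public"), "49");
    m.put("attribute", "phone", new ColumnVisibility("private"), "555-1212");

    // The BatchWriter batches mutations and sends them to TabletServers.
    BatchWriter writer = conn.createBatchWriter("people", new BatchWriterConfig());
    writer.addMutation(m);
    writer.close(); // flushes any remaining mutations
  }
}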

Insert

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

Insert

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

bill attribute phone private Jun 2010 555-1212

Insert

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill attribute phone private Jun 2010 555-1212

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

Scan - Full key lookup

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill attribute phone private Jun 2010 555-1212

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

bill attribute phone private Jun 2010

Scan - Single row

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill attribute phone private Jun 2010 555-1212

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

bill

Scan - Multiple Rows

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill attribute phone private Jun 2010 555-1212

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

bill - will

Scan - Multiple Rows, Selected Columns

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill attribute phone private Jun 2010 555-1212

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

bill - will, fetch purchases
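
A minimal scan sketch corresponding to the lookups above, assuming the Connector and "people" table from the earlier write sketch; the Authorizations shown are illustrative.

import java.util.Collections;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class ScanExample {
  public static void scan(Connector conn) throws TableNotFoundException {
    // Scan rows "bill" through "will", fetching only the purchases column family.
    Scanner scanner = conn.createScanner("people", new Authorizations("public"));
    scanner.setRange(new Range("bill", "will"));
    scanner.fetchColumnFamily(new Text("purchases"));
    for (Entry<Key, Value> entry : scanner) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }

    // A BatchScanner takes a set of Ranges and returns matching pairs unsorted.
    BatchScanner batch = conn.createBatchScanner("people", new Authorizations("public"), 4);
    batch.setRanges(Collections.singleton(new Range("bill")));
    for (Entry<Key, Value> entry : batch) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
    batch.close();
  }
}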

Architecture - Scaling and Recovery

Performance

• Accumulo scales because aggregate read and write performance increases as more machines are added, and because individual read/write performance remains very good even with trillions of key-value pairs already in the system

• Sources: http://www.slideshare.net/acordova00/accumulo-on-ec2

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf

[Chart: thousands of writes per second (10 to 10,000) versus number of machines (1 to 1024), comparing Accumulo, BigTable circa 2006, and Cassandra]

Accumulo Prerequisites

• One to hundreds of computers with local hard drives, connected via ethernet

• Password-less SSH access

• Local directory for write-ahead logs

• Hadoop and ZooKeeper installed, configured, and running

Architecture

[Diagram: Accumulo sits alongside MapReduce on top of HDFS and uses ZooKeeper for coordination]

Architecture: HDFS

[Diagram sequence: a file is split into blocks (Block 1, Block 2) stored on DataNodes; the NameNode tracks block locations]

Architecture: Tables

[Diagram sequence: a table is split into partitions P1, P2, P3 (tablets), which the Master assigns to the Tablet Servers]

Architecture: Writes

[Diagram sequence: a client write to tablet P1 is appended to the write-ahead log and added to the in-memory MemTable; the MemTable is later flushed to a new file (File 2) in HDFS alongside File 1, after which the corresponding write-ahead log entries are no longer needed]

Architecture: Splits

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill attribute phone private Jun 2010 555-1212

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

Architecture: Splits

[Diagram sequence: as a tablet grows it is split into two tablets, and the Master reassigns tablets across the Tablet Servers to keep them balanced]

Sorted keys - dynamic partitioning

• Because keys are sorted, tables can be partitioned based on the data

• Partitions (tablets) are uniform in size, regardless of data distribution (as long as single rows are smaller than the partition size)

• Partitioning is not based on the number of servers

• Servers can be added, removed, or fail at any time; the system automatically rebalances
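
Tablets normally split automatically as they grow, but the client API also lets you pre-split a table to spread load immediately. A minimal sketch, assuming a Connector as in the earlier write sketch; the table name and split points are illustrative.

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class SplitExample {
  public static void preSplit(Connector conn) throws Exception {
    // Add split points so the "people" table starts out as several tablets.
    SortedSet<Text> splits = new TreeSet<Text>();
    splits.add(new Text("f"));
    splits.add(new Text("m"));
    splits.add(new Text("t"));
    conn.tableOperations().addSplits("people", splits);
  }
}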

Partitioning Contrast

• Some relational databases support partitioning, but may require users to choose a field or two on which to partition, and hope that field is uniformly distributed

• Hash-based systems (default Cassandra, CouchDB, Riak, Voldemort) avoid this problem, but at the cost of range scans. Some support range scans via other means.

• Many systems couple partition storage with partition service, requiring data movement to rebalance partition service (MongoDB, Cassandra, etc)

Architecture: Reads

[Diagram: a client read of tablet P1 merges key-value pairs from the MemTable and the tablet's files (File 1, File 2) into a single sorted stream]

Architecture: Recovery

[Diagram sequence: Tablet Servers, Master, and HDFS (NameNode, DataNodes); when a Tablet Server fails, the Master reassigns its tablets to the remaining Tablet Servers, which replay the write-ahead log to recover recent writes]

Metadata Hierarchy

[Diagram: the root tablet points to the metadata table's tablets (md1, md2, md3), whose entries point to the tablets of user tables (user1, user2, index1, index2)]

Architecture: Lookup

[Diagram sequence:]

• The client knows ZooKeeper and finds the root tablet

• Scan the root tablet to find the metadata tablet that describes the user table we want

• Read the location info of the user table's tablets and cache it

• Read directly from the server holding the tablets we want

• Find other tablets via cache lookups

Security

Security

• Design and Guarantees

• Data Labeling

• Authentication

• User Configuration

Data Security

• Accumulo will only return cells whose visibility labels are satisfied by user credentials presented at Scan time

• Two necessary conditions

• Correctly labeling data on ingest

• Presenting the right user credentials at query time

Security Labels

Key: row ID | column (family, qualifier, visibility) | timestamp → Value

The visibility element is an extension of the BigTable data model

Column Visibility

row col fam col qual col vis time value

bill attribute age public Jun 2010 49

bill attribute phone private Jun 2010 555-1212

bill purchases sneakers public Apr 2010 $100

george attribute age private Oct 2009 38

george purchases sneakers public Nov 2009 $80

george returns hat public Dec 2009 $30

Security Label Syntax

• A & B - both A and B required

• A | B - must have either A or B

• (A | B) & C - must have C, and either A or B

• A | (B & C) - must have A or both B and C

• A & (B | (C & D))

Security Label Example

• To drive:

• license&over15

• To join the military:

• (over17|(over16&parentConsent)) & (greencard|USCitizen)

• To access classified data:

• TS&SI&(USA|GBR|NZL|CAN|AUS)
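
A minimal sketch of labeling data on write and presenting authorizations on read, assuming a Connector as in the earlier sketches; the table name and label strings are illustrative. Only cells whose visibility expressions are satisfied by the scan's authorizations come back.

import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.core.security.ColumnVisibility;

public class VisibilityExample {
  public static void writeAndRead(Connector conn) throws Exception {
    // Label the cell with a visibility expression at write time.
    Mutation m = new Mutation("row0");
    m.put("attribute", "diagnosis",
        new ColumnVisibility("personal|(research&cancer)"), "melanoma");
    BatchWriter writer = conn.createBatchWriter("records", new BatchWriterConfig());
    writer.addMutation(m);
    writer.close();

    // At scan time, only cells whose labels are satisfied by these
    // authorizations (a subset of the user's granted auths) are returned.
    Scanner scanner = conn.createScanner("records", new Authorizations("research", "cancer"));
    for (Entry<Key, Value> entry : scanner) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}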

Security Model

[Diagram: within the security perimeter, the user presents an ID, password, or certificate to a Trusted Client; the Trusted Client verifies the user with an Auth Service and obtains the user's auths, then passes those auths to Accumulo with the scan, and Accumulo returns only the data those auths allow]

Trusted Client Responsibility

• Ensure that credentials belong to the user

• Ensure that the user is authenticated

Application Authorization

• Trusted Client applications have a maximum set of authorizations configured; only those can be passed on behalf of users

• The Trusted Client limits the set of authorizations by application

Application Authorization Example

• Data may be labeled with any combination of the following:

{ personal, research, finance, diet, cancer }

• We wish to limit certain applications to a subset

Example Table

row   colF       colQ  col vis                       value
row0  name       -     personal|finance              John
row0  age        -     personal|research             49
row0  phone      -     personal|finance              555-1212
row0  owed       -     personal|finance              $5440
row0  diagnosis  -     personal|(research & cancer)  melanoma
row0  diagnosis  -     personal|(research & diet)    diabetes

Application Authorizations

Cancer Research: cancer diagnoses, age

Diabetes Research: diet info, age

Accounting System: balance, name, phone

Personal Records Management: all

Security Model Example

[Diagram sequence: the Researcher presents an ID, password, or certificate to the Cancer Research App; the app verifies the user with the Auth Service, which returns the user's auths (research, cancer, diabetes); the app limits these to its own maximum set (research, cancer) and scans Accumulo with them; Accumulo returns the matching data, which the app passes back to the Researcher]

Data Lifecycle

Data Model

Key: row ID | column (family, qualifier, visibility) | timestamp → Value

Versions

rowID  family  qualifier  timestamp  value
row1   fam1    qual1      1005       2
row1   fam1    qual1      1004       5
row1   fam1    qual1      1003       3
row1   fam1    qual1      1002       2
row1   fam1    qual1      1001       7

What can we do with multiple versions of the same data?

Iterators

• Mechanism for adding online functionality to tables

• Aggregation (called Combiners)

• Age-Off

• Filtering (including by security label)
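
A minimal sketch of attaching an iterator to a table, here the built-in age-off filter; the table name, priority, and TTL are illustrative, and the "ttl" option name (milliseconds) is assumed to match the AgeOffFilter in your version.

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;

public class IteratorExample {
  public static void configureAgeOff(Connector conn) throws Exception {
    // Priority 30, iterator name "ageoff", using the built-in AgeOffFilter class.
    IteratorSetting setting = new IteratorSetting(30, "ageoff",
        "org.apache.accumulo.core.iterators.user.AgeOffFilter");
    // Filter out entries whose timestamps are older than ~6 months (in milliseconds).
    setting.addOption("ttl", Long.toString(180L * 24 * 60 * 60 * 1000));
    // Applies at scan, minor-compaction, and major-compaction time by default.
    conn.tableOperations().attachIterator("people", setting);
  }
}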

Versioning Iterators

rowID  family  qualifier  timestamp  value
row1   fam1    qual1      1005       2
row1   fam1    qual1      1004       5
row1   fam1    qual1      1003       3
row1   fam1    qual1      1002       2
row1   fam1    qual1      1001       7

Filtering Iterators

• Age Off

• RegEx

• Arbitrary filtering

Age Off

• Can specify a particular date - e.g. delete everything older than July 1, 2007

• Can specify a time period - e.g. delete everything older than 6 months

Age-Off

rowID  family  qualifier  timestamp  value
row1   fam1    qual1      1005       2
row1   fam1    qual1      1004       5
row1   fam1    qual1      1003       3
row1   fam1    qual1      1002       2
row1   fam1    qual1      1001       7

[Animation: with a 100-second age-off threshold, as the current time advances from 1103 to 1105, progressively more of the older key-value pairs become more than 100 seconds old and are filtered out]

Manual Deletes

• Can insert ‘deletes’. They are inserted like other key-value pairs; any key with an older timestamp is suppressed from reads

• Compactions write non-deleted data to new files

• Old files are then removed from HDFS

• To ensure data is deleted from disk,

• write deletes (they are now absent from query results)

• compact (can compact a particular range of a table if it’s large)
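
A minimal sketch of deleting a cell and then compacting so the data is removed from the underlying files, assuming the Connector and table from the earlier sketches.

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.security.ColumnVisibility;

public class DeleteExample {
  public static void deleteAndCompact(Connector conn) throws Exception {
    // A delete entry suppresses older versions of the key from reads.
    Mutation m = new Mutation("bill");
    m.putDelete("attribute", "phone", new ColumnVisibility("private"));
    BatchWriter writer = conn.createBatchWriter("people", new BatchWriterConfig());
    writer.addMutation(m);
    writer.close();

    // Compacting the table (null start/end covers the whole table) rewrites
    // files without the deleted data; old files are later removed by the GC.
    conn.tableOperations().compact("people", null, null, true, true);
  }
}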

Garbage Collection

• Garbage collector compares the files in HDFS with the set of files currently active

• When files are no longer on the active list, the garbage collector waits for a period and then deletes them from HDFS

Applications

• Fast lookups / scan on extremely large tables with flexible schemas, varying security

• Large index across heterogeneous data sets

• Continuous Summary Analytics via Iterators

• Secure Storage of key value pairs for MapReduce jobs

Where does your data come from?

• BigTable was designed to store data for web applications serving millions of users. Web application creates all the data. Many NoSQL databases are designed solely for this purpose. Accumulo can certainly support that.

• However, many organizations have lots of data from various sources. Different schema, different security levels. Bringing them together for analysis is very valuable. Accumulo can support this too.

Indexing and queries

• BigTable data model supports building a wide variety of indexes

• Simple strings, numbers, geo points, IP addresses, etc.

• Each has to be coupled with query code

• New applications should examine their data access use cases; indexes and query code to support those use cases can then be written

• Best applications are constructed so each user request is a single scan, or a small number of scans

Compared to MapReduce

• Hadoop’s HDFS stores simple files. Usually unsorted.

• MapReduce is designed to process all or most of the files at once.

• Accumulo maintains a set of sorted files in HDFS

• Accumulo scans are designed to access a small portion of the data quickly.

• Fairly complementary

Tough use case

• Ran MapReduce on some input data set to create a large result set.

• Now have a few new records, want to update the result set

• MapReduce has to process all the data again, so you have to wait

• Accumulo allows users to perform a limited set of operations to update a result set incrementally, using Iterators

• Result sets are always up to date, immediately after insert

Combiners

row col fam col qual col vis time value

bill perf June_calls P June 1 9

bill perf June_calls P June 4 3

bill perf July_calls P July 3 4

bill perf July_calls P July 11 7

bill perf August_calls P Aug 12 5

bill perf August_calls P Aug 29 2

Combiners

row col fam col qual col vis time value

bill perf June_calls P - 12

bill perf July_calls P - 11

bill perf August_calls P - 7

Combiners

• Almost equivalent to Reduce of MapReduce except:

• Cannot assume we have seen all the values for a particular key

• Exactly equivalent to a Combiner function

Combiners

• Useful Combiners:

• Event count (StringSummation or LongSummation aggregator)

• Event hour occurrence histogram (NumArraySummation aggregator)

• Event duration histogram (NumArraySummation aggregator)
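
A minimal sketch of configuring a summing combiner on a table so that values for the same key are added together as they are scanned and compacted, assuming the built-in SummingCombiner and its string encoding; the table and column names are illustrative and match the call-count example above.

import java.util.Collections;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.iterators.LongCombiner;
import org.apache.accumulo.core.iterators.user.SummingCombiner;

public class CombinerExample {
  public static void configureSum(Connector conn) throws Exception {
    IteratorSetting setting = new IteratorSetting(20, "sum", SummingCombiner.class);
    // Apply the combiner to the "perf" column family only.
    SummingCombiner.setColumns(setting,
        Collections.singletonList(new IteratorSetting.Column("perf")));
    // Values are stored as strings (e.g. "9", "3") and summed as longs.
    SummingCombiner.setEncodingType(setting, LongCombiner.Type.STRING);
    conn.tableOperations().attachIterator("calls", setting);
  }
}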

Conceptual Graph Representation

[Diagram: a graph over nodes a through g, whose edges correspond to the rows of the edge table below]

Edge table

row  col fam  col qual  col vis  time  value
a    edge     f         -        -     1.0
c    edge     b         -        -     1.0
c    edge     d         -        -     1.0
d    edge     b         -        -     1.0
d    edge     e         -        -     1.0
e    edge     d         -        -     1.0
f    edge     g         -        -     1.0
g    edge     e         -        -     1.0
g    edge     f         -        -     1.0

Edge Weights

• Summing Combiners are typically used to efficiently and incrementally update edge weights

• See SummingCombiner

Edge table

[Animation: incoming entries are summed into the existing edge weights by the combiner. An incoming (a, edge, f, 1.0) raises the weight of row a / edge / f from 1.0 to 2.0; an incoming (c, edge, b, 6.0) raises c / edge / b from 1.0 to 7.0; and an incoming (a, edge, f, 2.3) raises a / edge / f from 2.0 to 4.3]

Edge Table Applications

• Graph Analytics - traversal, neighbors, connected components

• Neighborhood = feature vector. Vector-based machine learning techniques. Nearest neighbor search, clustering, classification

• Automated dossiers, fact accumulation - ‘tell me everything we know about X’ in a single scan

• Find entities based on features - ‘show me everyone who has feature value > x’ or ‘with < 5 neighbors of type k’

RDF Triples

row col fam col qual col vis time value

DC is_capital_of USA 1.0

Don vacations_in Arctic 7.0

Don is_employed_by MI6 1.0

Sean has_status “007” 1.0

Sean starred_with Ursula 1.0

Sean starred_with Anya 0.7

Sean starred_with Teresa 0.3

Additional Training

• Talked about the basics today

• 3 days of developer training with hands on examples covering

• installation, configuration, read / write API, MapReduce, security, table configuration, indexing specific types, querying index tables, combiners, custom iterators, table constraints, storing relational data, joins, high performance considerations, document-partitioned indexing (text search), machine learning, object persistence

• 2 days of administrator training covering

• hardware selection, process assignment, troubleshooting, maintenance, replication and high availability, cluster modification, failure handling

Next Scheduled Training Sessions

• March 5-7 Columbia MD

• April 9-11 Columbia MD

• http://www.tetraconcepts.com/training

• aaron@tetraconcepts.com

• brian@tetraconcepts.com
