Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Secondary Indexing in Phoenix. Jesse Yates, HBase Committer, Software Engineer. LA HBase User Group – September 4, 2013

Upload: jesse-yates

Post on 11-May-2015


Category: Technology



DESCRIPTION

In-depth look at secondary indexing for Phoenix.

TRANSCRIPT

Page 1: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Secondary Indexing in Phoenix

Jesse Yates
HBase Committer
Software Engineer

LA HBase User Group – September 4, 2013

Page 2: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

LA HUG – Sept 2013

https://www.madison.k12.wi.us/calendars

Page 3: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

About me

• Developer at Salesforce
  – System of Record, Phoenix

• Open Source
  – Phoenix
  – HBase
  – Accumulo

Page 4: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Phoenix

• Open Source
  – https://github.com/forcedotcom/phoenix

• “SQL-skin” on HBase
  – Everyone knows SQL!

• JDBC Driver
  – Plug-and-play

• Faster than HBase
  – in some cases

Page 5: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Why Index?

• HBase is only sorted on 1 “axis”

• Great for search via a single pattern

Example!

Page 6: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Example

name:type:

subtype:date:

major:minor:

quantity:

Page 7: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Secondary Indexes

• Sort on ‘orthogonal’ axis

• Save full-table scan

• Expected database feature

• Hard in HBase b/c of ACID considerations
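The bullets above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not HBase or Phoenix code: a `TreeMap` stands in for a table sorted by row key, a second `TreeMap` keyed by `value|rowKey` stands in for an index table, and values are assumed not to contain `|`. The class and method names (`ToyIndexedTable`, `findByValueIndexed`, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sorted map stands in for an HBase table sorted by row key.
final class ToyIndexedTable {
    // Primary table: row key -> value of the single indexed column.
    private final TreeMap<String, String> primary = new TreeMap<>();
    // Index table: "value|rowKey" -> rowKey, so entries sort by value first.
    private final TreeMap<String, String> index = new TreeMap<>();

    void put(String rowKey, String value) {
        String old = primary.put(rowKey, value);
        if (old != null) {
            index.remove(old + "|" + rowKey); // drop the stale index entry
        }
        index.put(value + "|" + rowKey, rowKey);
    }

    // Without an index: every row has to be examined (a full-table scan).
    List<String> findByValueFullScan(String value) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : primary.entrySet()) {
            if (e.getValue().equals(value)) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    // With the index: a range scan over the keys that start with "value|".
    List<String> findByValueIndexed(String value) {
        String start = value + "|";
        String stop = value + "}"; // '}' is the character right after '|'
        return new ArrayList<>(index.subMap(start, stop).values());
    }
}
```

Both lookups return the same rows; the indexed one touches only the matching key range, which is the "save a full-table scan" point. Keeping the two maps consistent inside `put` is trivial here only because both live in one process, which is exactly what is hard across HBase tables.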

Page 8: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

Page 9: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

http://www.wired.com/wiredenterprise/2011/10/microsoft-and-hadoop/

Page 10: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Other (Major) Indexing Frameworks

• HBase SEP
  – Side-Effects Processor
  – Replication-based
  – https://github.com/NGDATA/hbase-sep

• Huawei
  – Server-local indexes
  – Buddy regions
  – https://github.com/Huawei-Hadoop/hindex

Page 11: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

Page 12: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Immutable Indexes

• Immutable Rows

• Much easier to implement

• Client-managed

• Bulk-loadable

Page 13: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Bulk Loading

phoenix-hbase.blogspot.com

Page 14: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Index Bulk Loading

[Diagram: Identity Mapper, Custom Phoenix Reducer, HFile Output Format]

Page 15: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Index Bulk Loading

PreparedStatement statement = conn.prepareStatement(dmlStatement);
statement.execute();

String upsertStmt = "upsert into core.entity_history(organization_id,key_prefix,entity_history_id, created_by, created_date)\n"
    + "values(?,?,?,?,?)";

statement = conn.prepareStatement(upsertStmt);
… // set values

Iterator<Pair<byte[],List<KeyValue>>> dataIterator = PhoenixRuntime.getUncommittedDataIterator(conn);

Page 16: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

Page 17: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

The “fun” stuff…

Page 18: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

1.5 years

Page 19: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Mutable Indexes

• Global Index

• Change row state
  – Common use-case
  – “expected” implementation

• Covered Columns

Page 20: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Usage

• Just SQL!

• Baby name popularity

• Mock demo

Page 21: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Usage

• Selects the most popular name for a given year

SELECT name,occurrences FROM baby_names WHERE year=2012 LIMIT 1;

• Selects the total occurrences of a given name across all years

SELECT /*+ NO_INDEX */ name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY name;

• Selects the total occurrences of a given name across all years, allowing an index to be used

SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;

Page 22: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Usage

• Update rows due to census inaccuracy
  – Will only work if the mutable indexing is working

UPSERT INTO baby_names SELECT year,occurrences+3000,sex,name FROM baby_names WHERE name='Jesse';

• Selects the now updated data (from the index table)

SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;

• Index table still used in scans

EXPLAIN SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;

Page 23: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

Page 24: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Internals

• Index Management
  – Build index updates
  – Ensures index is ‘cleaned up’

• Recovery Mechanism
  – Ensures index updates are “ACID”

Page 25: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

“There is no magic”

- Every programming hipster (chipster)

Page 26: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Mutable Indexing: Standard Write Path

[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

Page 27: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Mutable Indexing: Standard Write Path

[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

Page 28: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Mutable Indexing

[Diagram labels: RegionCoprocessorHost, WAL, RegionCoprocessorHost, Indexer, Builder, Codec, WAL Updater (“Durable!”), Indexer, Index Table (×3)]

Page 29: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Index Management

• Lives within a RegionCoprocessorObserver

• Access to the local HRegion

• Specifies the mutations to apply to the index tables

public interface IndexBuilder {
  public void setup(RegionCoprocessorEnvironment env);
  public Map<Mutation, String> getIndexUpdate(Put put);
  public Map<Mutation, String> getIndexUpdate(Delete delete);
}

Page 30: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Why not write my own?

• Managing Cleanup
  – Efficient point-in-time correctness
  – Performance tricks

• Abstract access to HRegion
  – Minimal network hops

• Sorting correctness
  – Phoenix typing ensures correct index sorting

Page 31: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Example: Managing Cleanup

• Updates can arrive out of order
  – Client-managed timestamps

ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3

Page 32: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Example: Managing Cleanup

Index Table

ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13

Page 33: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Example: Managing Cleanup

ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3
Row1  Fam     Qual       11  val4

Page 34: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Example: Managing Cleanup

ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam     Qual       11  val4
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3

Page 35: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Example: Managing Cleanup

ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val4|Row1       Index   Fam:Qual              11
Val4|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13

Page 36: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Example: Managing Cleanup

ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val4|Row1       Index   Fam:Qual              11
Val4|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13
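The out-of-order example on these slides can be modeled with a short sketch. It assumes a simplified rule that is not taken from the Phoenix source: at every timestamp where the row changed, the index row key is built from the newest value of each indexed column at or before that timestamp, followed by the primary row key. `ToyCell`, `PointInTimeIndex`, `indexEntries`, and `staleEntries` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// One versioned cell of a single primary-table row (hypothetical helper type).
final class ToyCell {
    final String column; // "family:qualifier"
    final long ts;
    final String value;

    ToyCell(String column, long ts, String value) {
        this.column = column;
        this.ts = ts;
        this.value = value;
    }
}

final class PointInTimeIndex {
    // For each timestamp at which the row changed, build the index row key from
    // the newest value of every indexed column at or before that timestamp.
    static SortedMap<Long, String> indexEntries(String row, List<String> indexedColumns, List<ToyCell> cells) {
        TreeSet<Long> stamps = new TreeSet<>();
        for (ToyCell c : cells) {
            stamps.add(c.ts);
        }
        SortedMap<Long, String> entries = new TreeMap<>();
        for (long ts : stamps) {
            StringBuilder key = new StringBuilder();
            for (String column : indexedColumns) {
                ToyCell newest = null;
                for (ToyCell c : cells) {
                    if (c.column.equals(column) && c.ts <= ts && (newest == null || c.ts > newest.ts)) {
                        newest = c;
                    }
                }
                if (newest != null) {
                    key.append(newest.value).append('|');
                }
            }
            entries.put(ts, key.append(row).toString());
        }
        return entries;
    }

    // Entries that were correct before a late update arrived but no longer are.
    static List<String> staleEntries(SortedMap<Long, String> before, SortedMap<Long, String> after) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<Long, String> e : before.entrySet()) {
            if (!e.getValue().equals(after.get(e.getKey()))) {
                stale.add(e.getValue() + "@" + e.getKey());
            }
        }
        return stale;
    }
}
```

Calling `indexEntries` on the three original cells and again after adding the late `val4` cell at timestamp 11, then passing both results to `staleEntries`, flags the timestamp-12 entry that was built from `val1`. That entry is the one the index has to clean up.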

Page 37: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Managing Cleanup

• History “roll up”

• Out-of-order Updates

• Point-in-time correctness

• Multiple Timestamps per Mutation

• Delete vs. DeleteColumn vs. DeleteFamily

Surprisingly hard!

Page 38: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Phoenix Index Builder

• Much simpler than full index management

• Hides cleanup considerations

• Abstracted access to local state

public interface IndexCodec {
  public void initialize(RegionCoprocessorEnvironment env);
  public Iterable<IndexUpdate> getIndexDeletes(TableState state);
  public Iterable<IndexUpdate> getIndexUpserts(TableState state);
}

Page 39: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Phoenix Index Codec

Page 40: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Dude, where’s my data?

Ensuring Correctness

Page 41: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

HBase ACID

• Does NOT give you:
  – Cross-row consistency
  – Cross-table consistency

• Does give you:
  – Durable data on success
  – Visibility on success without partial rows

Page 42: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Key Observation

“Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.”

- Lars Hofhansl

Page 43: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Idempotent Index Updates

• Doesn’t need full transactions

• Replay as many times as needed

• Can tolerate a little lag
  – As long as we get the order right
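To make the idempotence point concrete, here is a minimal sketch under one assumption: an index update is a write of a cell identified by its index row key and an explicit timestamp, so writing the same cell again simply overwrites it with the same value. `ReplayableIndex` is a hypothetical toy class, not part of Phoenix or HBase.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for an index table: one cell per (index row key, timestamp).
final class ReplayableIndex {
    private final Map<String, String> cells = new TreeMap<>();

    // Re-applying an update with the same key and timestamp overwrites the cell
    // with the same value, so replaying it any number of times changes nothing.
    void apply(String indexRowKey, long ts, String primaryRowKey) {
        cells.put(indexRowKey + "@" + ts, primaryRowKey);
    }

    // Copy of the current contents, for comparing two tables.
    Map<String, String> snapshot() {
        return new TreeMap<>(cells);
    }
}
```

Applying an update once and replaying it three times produce equal snapshots, which is why a recovery path can replay index updates freely instead of coordinating a transaction.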

Page 44: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Failure Recovery

• Custom WALEditCodec
  – Encodes index updates
  – Supports compressed WAL

• Custom WAL Reader
  – Replay index updates from WAL

<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
</property>

Page 45: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Failure Situations

• Any time before WAL, client replay

• Any time after WAL, HBase replay

• All-or-nothing

Page 46: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Failure #1: Before WAL

[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

Page 47: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Failure #1: Before WAL

[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

No problem! No data is stored in the WAL, so the client just retries the entire update.

Page 48: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Failure #2: After WAL

[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

Page 49: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Failure #2: After WAL

[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

WAL replayed via usual replay mechanisms
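The two failure cases can be walked through with a toy model. This is a sketch under assumptions, not HBase code: it assumes the primary edit and its index update are logged together in the WAL (as the custom WAL codec on the earlier slide suggests) and that applying an edit is idempotent. `ToyRegion`, `FailPoint`, and `replayWal` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the two failure points on the slides; not HBase code.
final class ToyRegion {
    enum FailPoint { NONE, BEFORE_WAL, AFTER_WAL }

    final List<String[]> wal = new ArrayList<>();           // durable log of {rowKey, value}
    final Map<String, String> memStore = new TreeMap<>();   // primary data
    final Map<String, String> indexTable = new TreeMap<>(); // stands in for a remote index table

    void write(String rowKey, String value, FailPoint fail) {
        if (fail == FailPoint.BEFORE_WAL) {
            // Nothing was logged, so the client can simply retry the whole update.
            throw new IllegalStateException("failed before WAL; client should retry");
        }
        // Assumption: the primary edit and its index update are logged together.
        wal.add(new String[] {rowKey, value});
        if (fail == FailPoint.AFTER_WAL) {
            // The edit is durable but not yet applied; replay will finish the job.
            throw new IllegalStateException("crashed after WAL; replay will recover");
        }
        applyEdit(rowKey, value);
    }

    // Replays every logged edit; safe to repeat because applyEdit is idempotent.
    void replayWal() {
        for (String[] edit : wal) {
            applyEdit(edit[0], edit[1]);
        }
    }

    private void applyEdit(String rowKey, String value) {
        memStore.put(rowKey, value);
        indexTable.put(value + "|" + rowKey, rowKey);
    }
}
```

A write that fails with `BEFORE_WAL` leaves the log and the index empty, and a retry with `NONE` succeeds. A write that fails with `AFTER_WAL` leaves one logged edit, and calling `replayWal()` once or several times brings the primary data and the index to the same final state.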

Page 50: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes

• Roadmap

Page 51: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Roadmap

• Next release of Phoenix

• Performance testing

• Increased adoption

• Adding to HBase (?)

Page 52: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Open Source!

• Main: https://github.com/forcedotcom/phoenix

• Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si

Page 53: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

(obligatory hiring slide)

We’re Hiring!

Page 54: Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Questions? Comments?

[email protected]

@jesse_yates