Phoenix Secondary Indexing - LA HUG Sept 9th, 2013

Post on 11-May-2015

DESCRIPTION

In-depth look at secondary indexing for Phoenix

TRANSCRIPT

  • 1. Secondary Indexing in Phoenix. Jesse Yates, HBase Committer, Software Engineer. LA HBase User Group, September 4, 2013

  • 2. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap
      https://www.madison.k12.wi.us/calendars
      LA HUG Sept 2013

  • 3. About me
      • Developer at Salesforce
      • System of Record, Phoenix
      • Open Source
      • Phoenix
      • HBase
      • Accumulo

  • 4. Phoenix
      • Open Source
      • https://github.com/forcedotcom/phoenix
      • SQL-skin on HBase
      • Everyone knows SQL!
      • JDBC Driver
      • Plug-and-play
      • Faster than HBase in some cases

  • 5. Why Index?
      • HBase is only sorted on 1 axis
      • Great for search via a single pattern
      • Example!

  • 6. Example
      • name:
      • type:
      • subtype:
      • date:
      • major:
      • minor:
      • quantity:

  • 7. Secondary Indexes
      • Sort on orthogonal axis
      • Save full-table scan
      • Expected database feature
      • Hard in HBase b/c of ACID considerations

  • 8. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap

  • 9. [Image] http://www.wired.com/wiredenterprise/2011/10/microsoft-and-hadoop/

  • 10. Other (Major) Indexing Frameworks
      • HBase SEP
      • Side-Effects Processor
      • Replication-based
      • https://github.com/NGDATA/hbase-sep
      • Huawei
      • Server-local indexes
      • Buddy regions
      • https://github.com/Huawei-Hadoop/hindex

  • 11. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap

  • 12. Immutable Indexes
      • Immutable Rows
      • Much easier to implement
      • Client-managed
      • Bulk-loadable

  • 13. Bulk Loading
      • phoenix-hbase.blogspot.com

  • 14. Index Bulk Loading
      • Identity Mapper
      • Custom Phoenix Reducer
      • HFile Output Format

  • 15. Index Bulk Loading

        PreparedStatement statement = conn.prepareStatement(dmlStatement);
        statement.execute();
        String upsertStmt = "upsert into core.entity_history(organization_id,key_prefix,entity_history_id, created_by, created_date)\n"
            + "values(?,?,?,?,?)";
        statement = conn.prepareStatement(upsertStmt);
        // set values
        Iterator<Pair<byte[],List<KeyValue>>> dataIterator = PhoenixRuntime.getUncommittedDataIterator(conn);

  • 16. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap

  • 17. The fun stuff

  • 18. 1.5 years

  • 19. Mutable Indexes
      • Global Index
      • Change row state
      • Common use-case
      • expected implementation
      • Covered Columns

  • 20. Usage
      • Just SQL!
      • Baby name popularity
      • Mock demo

  • 21. Usage
      Selects the most popular name for a given year:

        SELECT name,occurrences FROM baby_names WHERE year=2012 LIMIT 1;

      Selects the total occurrences of a given name across all years:

        SELECT /*+ NO_INDEX */ name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY name;

      Selects the total occurrences of a given name across all years, allowing an index to be used:

        SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;

  • 22. Usage
      Update rows due to census inaccuracy. Will only work if the mutable indexing is working:

        UPSERT INTO baby_names SELECT year,occurrences+3000,sex,name FROM baby_names WHERE name='Jesse';

      Selects the now updated data (from the index table):

        SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;

      Index table still used in scans:

        EXPLAIN SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;

  • 23. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap

  • 24. Internals
      • Index Management
      • Build index updates
      • Ensures index is cleaned up
      • Recovery Mechanism
      • Ensures index updates are ACID

  • 25. There is no magic - Every programming hipster (chipster)

  • 26. Mutable Indexing: Standard Write Path
      [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

  • 27. Mutable Indexing: Standard Write Path
      [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

  • 28. Mutable Indexing
      [Diagram: Region Coprocessor Host, WAL, Region Coprocessor Host, Indexer, Builder, WAL Updater, Durable!, Indexer, Index Table, Index Table, Index Table, Codec]

  • 29. Index Management
      • Lives within a RegionCoprocessorObserver
      • Access to the local HRegion
      • Specifies the mutations to apply to the index tables

        public interface IndexBuilder {
          public void setup(RegionCoprocessorEnvironment env);
          public Map getIndexUpdate(Put put);
          public Map getIndexUpdate(Delete delete);
        }

  • 30. Why not write my own?
      • Managing Cleanup
      • Efficient point-in-time correctness
      • Performance tricks
      • Abstract access to HRegion
      • Minimal network hops
      • Sorting correctness
      • Phoenix typing ensures correct index sorting

  • 31. Example: Managing Cleanup
      • Updates can arrive out of order
      • Client-managed timestamps

        ROW    FAMILY   QUALIFIER   TS   VALUE
        Row1   Fam      Qual        10   val1
        Row1   Fam2     Qual2       12   val2
        Row1   Fam      Qual        13   val3

  • 32. Example: Managing Cleanup
      Index Table

        ROW              FAMILY   QUALIFIER             TS
        Val1|Row1        Index    Fam:Qual              10
        Val1|Val2|Row1   Index    Fam:Qual Fam2:Qual2   12
        Val3|Val2|Row1   Index    Fam:Qual Fam2:Qual2   13

  • 33. Example: Managing Cleanup

        ROW    FAMILY   QUALIFIER   TS   VALUE
        Row1   Fam      Qual        10   val1
        Row1   Fam2     Qual2       12   val2
        Row1   Fam      Qual        13   val3
        Row1   Fam      Qual        11   val4

  • 34. Example: Managing Cleanup

        ROW    FAMILY   QUALIFIER   TS   VALUE
        Row1   Fam      Qual        10   val1
        Row1   Fam      Qual        11   val4
        Row1   Fam2     Qual2       12   val2
        Row1   Fam      Qual        13   val3

  • 35. Example: Managing Cleanup

        ROW              FAMILY   QUALIFIER             TS
        Val1|Row1        Index    Fam:Qual              10
        Val4|Row1        Index    Fam:Qual              11
        Val4|Val2|Row1   Index    Fam:Qual Fam2:Qual2   12
        Val1|Val2|Row1   Index    Fam:Qual Fam2:Qual2   12
        Val3|Val2|Row1   Index    Fam:Qual Fam2:Qual2   13

  • 36. Example: Managing Cleanup

        ROW              FAMILY   QUALIFIER             TS
        Val1|Row1        Index    Fam:Qual              10
        Val4|Row1        Index    Fam:Qual              11
        Val4|Val2|Row1   Index    Fam:Qual Fam2:Qual2   12
        Val1|Val2|Row1   Index    Fam:Qual Fam2:Qual2   12
        Val3|Val2|Row1   Index    Fam:Qual Fam2:Qual2   13

  • 37. Managing Cleanup
      • History roll up
      • Out-of-order Updates
      • Point-in-time correctness
      • Multiple Timestamps per Mutation
      • Delete vs. DeleteColumn vs. DeleteFamily
      • Surprisingly hard!

  • 38. Phoenix Index Builder
      • Much simpler than full index management
      • Hides cleanup considerations
      • Abstracted access to local state

        public interface IndexCodec {
          public void initialize(RegionCoprocessorEnvironment env);
          public Iterable getIndexDeletes(TableState state);
          public Iterable getIndexUpserts(TableState state);
        }

  • 39. Phoenix Index Codec

  • 40. Dude, where's my data?
      Ensuring Correctness

  • 41. HBase ACID
      Does NOT give you:
      • Cross-row consistency
      • Cross-table consistency
      Does give you:
      • Durable data on success
      • Visibility on success without partial rows

  • 42. Key Observation
      "Secondary indexing is inherently an easier problem than full transactions ... secondary index updates are idempotent." - Lars Hofhansl

  • 43. Idempotent Index Updates
      • Doesn't need full transactions
      • Replay as many times as needed
      • Can tolerate a little lag
      • As long as we get the order right

  • 44. Failure Recovery
      • Custom WALEditCodec
      • Encodes index updates
      • Supports compressed WAL
      • Custom WAL Reader
      • Replay index updates from WAL

        hbase.regionserver.wal.codec = o.a.h.hbase.regionserver.wal.IndexedWALEditCodec
        hbase.regionserver.hlog.reader.impl = o.a.h.hbase.regionserver.wal.IndexedHLogReader

  • 45. Failure Situations
      • Any time before WAL, client replay
      • Any time after WAL, HBase replay
      • All-or-nothing

  • 46. Failure #1: Before WAL
      [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

  • 47. Failure #1: Before WAL
      [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
      No problem! No data is stored in the WAL, client just retries entire update.

  • 48. Failure #2: After WAL
      [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

  • 49. Failure #2: After WAL
      [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
      WAL replayed via usual replay mechanisms

  • 50. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes; Roadmap

  • 51. Roadmap
      • Next release of Phoenix
      • Performance testing
      • Increased adoption
      • Adding to HBase (?)

  • 52. Open Source!
      • Main: https://github.com/forcedotcom/phoenix
      • Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si

  • 53. (obligatory hiring slide) We're Hiring!

  • 54. Questions? Comments? jyates@salesforce.com @jesse_yates
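
Editor's sketch for the "Example: Managing Cleanup" slides. The out-of-order update in that example (a write at timestamp 11 arriving after writes at 10, 12 and 13) can be replayed in plain Java to see which index entry goes stale. This is a minimal illustration under stated assumptions, not Phoenix's implementation: the class `IndexCleanupSketch`, its `Cell` type, and the methods `indexEntries` and `staleEntries` are hypothetical names, no HBase or Phoenix API is used, and the index row key is simply the indexed values joined with `|` followed by the primary row key, as drawn on the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only: none of these names come from Phoenix or HBase.
class IndexCleanupSketch {

    /** One primary-table cell: "family:qualifier" column, timestamp, value. */
    static final class Cell {
        final String column;
        final long ts;
        final String value;

        Cell(String column, long ts, String value) {
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    /**
     * Replays a row's cells in timestamp order and records, for each timestamp,
     * the index row key implied by the row state at that point in time.
     */
    static SortedMap<Long, String> indexEntries(
            String rowKey, List<String> indexedColumns, List<Cell> cells) {
        List<Cell> byTime = new ArrayList<>(cells);
        byTime.sort(Comparator.comparingLong((Cell c) -> c.ts));

        Map<String, String> state = new HashMap<>();      // latest value per column so far
        SortedMap<Long, String> entries = new TreeMap<>(); // timestamp -> index row key
        for (Cell cell : byTime) {
            state.put(cell.column, cell.value);
            StringBuilder key = new StringBuilder();
            for (String column : indexedColumns) {
                String value = state.get(column);
                if (value != null) {
                    key.append(value).append('|');
                }
            }
            entries.put(cell.ts, key.append(rowKey).toString());
        }
        return entries;
    }

    /** Entries in the old index that the new history no longer implies ("key@ts"). */
    static List<String> staleEntries(
            SortedMap<Long, String> before, SortedMap<Long, String> after) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<Long, String> e : before.entrySet()) {
            if (!e.getValue().equals(after.get(e.getKey()))) {
                stale.add(e.getValue() + "@" + e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> indexed = Arrays.asList("Fam:Qual", "Fam2:Qual2");
        List<Cell> cells = new ArrayList<>(Arrays.asList(
                new Cell("Fam:Qual", 10, "val1"),
                new Cell("Fam2:Qual2", 12, "val2"),
                new Cell("Fam:Qual", 13, "val3")));
        SortedMap<Long, String> before = indexEntries("Row1", indexed, cells);

        // The late write: timestamp 11 arrives after 12 and 13 were indexed.
        cells.add(new Cell("Fam:Qual", 11, "val4"));
        SortedMap<Long, String> after = indexEntries("Row1", indexed, cells);

        System.out.println(before); // {10=val1|Row1, 12=val1|val2|Row1, 13=val3|val2|Row1}
        System.out.println(after);  // {10=val1|Row1, 11=val4|Row1, 12=val4|val2|Row1, 13=val3|val2|Row1}
        System.out.println(staleEntries(before, after)); // [val1|val2|Row1@12]
    }
}
```

The sketch rebuilds the whole history for clarity. The stale entry it reports, `val1|val2|Row1` at timestamp 12, is the extra row that appears in the index table on the later cleanup slides, which is why a late write needs an index delete as well as new index puts.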