HBase from the Trenches - Phoenix Data Conference 2015

HBase from the Trenches Avinash Ramineni Email: [email protected] LinkedIn: https://www.linkedin.com/in/avinashramineni


Post on 16-Jan-2017



Agenda

• Intro to HBase
– Overview
– Data Model
– Architecture
• Common Problems
• Best Practices
• Tools and Utilities

Intro to HBase

• Non-relational distributed column-oriented database
– Modeled after Google’s BigTable
• Billions of Rows and Millions of Columns
• Sparse, consistent, distributed sorted map
• Built on top of HDFS
• Tight integration with MapReduce
• Supports Random CRUD Operations

Intro to HBase

• Fault Tolerant
• Horizontally Scalable
• Real-time Random read-write access to data stored in HDFS
• Millions of queries/second
• Support for transactions at a single row level
• Bloom filters
• Automatic Sharding
• Implemented in Java

Data Model

• Data is stored in Tables
• Tables contain rows
– Rows are referenced by a unique key - Rowkey
• Rows are made of columns which are grouped in column families
• Rows are sorted
• Everything is stored as a sequence of bytes
• All entries are versioned and timestamped
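The data model above can be pictured as a sorted map from rowkey to column (family, qualifier) to timestamped versions. The following is a minimal in-memory sketch of that idea only; `ToyTable` and its methods are hypothetical names for illustration and are not part of the HBase client API.

```python
import bisect


class ToyTable:
    """In-memory stand-in for one table (illustration only, not HBase code)."""

    def __init__(self):
        self._sorted_rows = []  # row keys kept in sorted byte order
        self._rows = {}         # row -> {(family, qualifier): {timestamp: value}}

    def put(self, row, family, qualifier, value, timestamp):
        # Every write adds a new timestamped version; nothing is overwritten.
        if row not in self._rows:
            bisect.insort(self._sorted_rows, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault((family, qualifier), {})
        versions[timestamp] = value

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if the cell is absent."""
        versions = self._rows.get(row, {}).get((family, qualifier))
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Return row keys in [start_row, stop_row), in sorted order."""
        lo = bisect.bisect_left(self._sorted_rows, start_row)
        hi = bisect.bisect_left(self._sorted_rows, stop_row)
        return self._sorted_rows[lo:hi]
```

Because rows are kept sorted, a range scan is a contiguous slice, and the table is sparse because absent cells simply have no entry.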

Data Representation

HBase Cluster

• HBase Master
• Zookeeper
• RegionServers
• HDFS - DataNodes

Component View

Logical View

HBase API

• API is simple
• Operations
– Get, Put, Delete, Scan, MapReduce
• Connection
• Create this instance only once per application and share it during its runtime
• HTable
– Zookeeper
• hbase:meta

Column Families

• All columns that are accessed together need to be grouped into a Column Family
• No need to access or load data that is not used
• At the column family level we can define settings like
– compression, version retention policy, cache priority
– Understand the data and access patterns, and group column families accordingly
• Column Family and Column Qualifiers are stored as bytes
– Avoid being verbose
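To see why verbose family and qualifier names matter, note that the rowkey, family and qualifier are repeated in every stored cell. The sketch below estimates the per-cell size, assuming the classic uncompressed KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type) and ignoring tags, block encoding and compression. `approx_cell_size` is a hypothetical helper, not an HBase function.

```python
# Fixed per-cell bytes under the assumed layout:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1)
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def approx_cell_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Rough size in bytes of one stored cell before encoding or compression."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


terse = approx_cell_size(b"user1", b"d", b"n", b"Ann")
verbose = approx_cell_size(b"user1", b"user_details", b"first_name", b"Ann")
print(terse, verbose)
```

In this toy case the verbose names add 20 bytes to a cell whose value is only 3 bytes long, and that cost is paid again for every cell in the table.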

HBase Write Path

HBase Compactions

• HDFS does not support updates
– HFiles are immutable
– New HFiles are created
• Minor Compactions
– Small HFiles are merged into larger HFiles
– Deletes are not applied
• Major Compactions
– HFiles within a column family are merged into a single HFile
– Deletes are applied
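The difference between the two compaction types can be sketched with a deliberately simplified model: each "HFile" is a dict holding one (timestamp, value) entry per key, and a delete is a tombstone marker. This is an illustration under those assumptions, not how HBase implements compaction; `TOMBSTONE`, `minor_compact` and `major_compact` are hypothetical names.

```python
TOMBSTONE = object()  # stand-in for a delete marker


def _merge(hfiles):
    """Merge immutable files into a new one, keeping the newest entry per key."""
    merged = {}
    for hfile in hfiles:
        for key, (ts, value) in hfile.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # The output is a new, sorted file; the inputs are never modified.
    return dict(sorted(merged.items(), key=lambda kv: kv[0]))


def minor_compact(hfiles):
    """Merge files but keep delete markers (deletes are not applied)."""
    return _merge(hfiles)


def major_compact(hfiles):
    """Merge files and drop deleted entries (deletes are applied)."""
    return {k: v for k, v in _merge(hfiles).items() if v[1] is not TOMBSTONE}
```

In this model a deleted row still occupies space after a minor compaction and only disappears after a major one.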

Rowkey

• Immutable
• Get it right the first time, before a lot of data is loaded
• What if we got it wrong?
– Create a new table and reload the data
– If a TTL is set, let the old data expire

Secondary Indexes

• Querying/Accessing records other than by Rowkey
• MapReduce jobs to populate index table
– Periodic update
• Build a secondary index with dual writes
• Co-processors
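The dual-write approach can be sketched as follows: every write to the data table is paired with a write to an index table whose rowkey is the indexed value followed by the data rowkey, so a lookup becomes a prefix scan. This is an in-memory illustration only; `DualWriteStore`, the `email` column and the separator byte are assumptions made for the example. Note that the two writes are not atomic, since transactions are only supported at a single row level.

```python
import bisect

SEP = b"\x00"  # assumed separator between indexed value and data rowkey


class DualWriteStore:
    """Toy data table plus index table, kept in sync by the writer."""

    def __init__(self):
        self.data = {}         # data table: rowkey -> columns
        self.index_keys = []   # index table: sorted keys "email SEP rowkey"

    def put(self, rowkey, email, name):
        old = self.data.get(rowkey)
        if old is not None:
            # Remove the stale index entry before writing the new one.
            self.index_keys.remove(old["email"] + SEP + rowkey)
        self.data[rowkey] = {"email": email, "name": name}   # write 1: data
        bisect.insort(self.index_keys, email + SEP + rowkey)  # write 2: index

    def find_by_email(self, email):
        """Prefix scan over the index; returns matching data rowkeys."""
        prefix = email + SEP
        start = bisect.bisect_left(self.index_keys, prefix)
        matches = []
        for key in self.index_keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key[len(prefix):])
        return matches
```

If the process fails between the two writes, the index and the data table disagree, which is one reason periodic MapReduce rebuilds or co-processors are listed as alternatives.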

Region Hotspotting

• Client traffic not equally distributed across the region servers
– Performance degradation
– Region unavailability
• Poor rowkey design
– Monotonically increasing RowKey
• Time series or Sequence
– Salting
• Read Vs Writes
• GET?
– Hashing
• Salt with one-way hash of rowkey
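Salting with a one-way hash of the rowkey can be sketched as below. The bucket count, the choice of SHA-256 and the one-byte prefix are assumptions made for illustration; `salted_key` and `scan_prefixes` are hypothetical helpers. Because the salt is derived from the rowkey itself, a Get can recompute it, while a range scan has to fan out over every bucket.

```python
import hashlib


def salted_key(rowkey: bytes, buckets: int) -> bytes:
    """Prefix the rowkey with a bucket byte derived from a hash of the rowkey."""
    if not 1 <= buckets <= 256:
        raise ValueError("buckets must fit in one prefix byte")
    digest = hashlib.sha256(rowkey).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return bytes([bucket]) + rowkey


def scan_prefixes(buckets: int) -> list:
    """Prefixes a range scan must cover, one per bucket."""
    return [bytes([b]) for b in range(buckets)]


# Monotonically increasing keys now land in different buckets.
sequential = [str(i).zfill(10).encode() for i in range(1000)]
buckets_hit = {salted_key(k, 8)[0] for k in sequential}
```

The trade-off hinted at by "Read Vs Writes" is visible here: writes spread out across buckets, but reading a contiguous range of original keys now costs one scan per bucket.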

Short Circuit Reads

• RegionServers are co-located with datanodes
• HMaster assigns Regions keeping data locality into consideration (mostly)
• dfs.client.read.shortcircuit
– RegionServers read local block files directly from disk rather than going through the DataNode
• Locality Loss

Pre-Splitting

• Region splitting
– A region grows until it needs to be split
– A region is served by only 1 RegionServer at a time
• Pre-split a table into regions at table creation time
– Uniformly distribute write load across region servers
– Understand the keyspace
• Risk of uneven load distribution
• Auto splitting
– ConstantSizeRegionSplitPolicy
– IncreasingToUpperBoundRegionSplitPolicy
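Choosing split points requires knowing the keyspace. As a minimal sketch, assume rowkeys whose first byte is uniformly distributed (for example, hashed or salted keys); the split points then divide that byte range evenly. `split_points` and `region_for` are hypothetical helpers, and a skewed keyspace would need different boundaries, which is the risk of uneven load mentioned above.

```python
import bisect


def split_points(num_regions: int, prefix_bytes: int = 1) -> list:
    """Evenly spaced split keys over a keyspace of `prefix_bytes` leading bytes."""
    space = 256 ** prefix_bytes
    points = []
    for i in range(1, num_regions):
        boundary = i * space // num_regions
        points.append(boundary.to_bytes(prefix_bytes, "big"))
    return points


def region_for(rowkey: bytes, points: list) -> int:
    """Index of the region holding rowkey; region i covers [points[i-1], points[i])."""
    return bisect.bisect_right(points, rowkey)
```

For example, `split_points(4)` yields three boundaries that create four regions, so writes with uniformly distributed first bytes are spread over four regions from the first insert.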

Bulk Loading

• Native API
– Disable WAL
• MapReduce Job to generate HFile
– Load using completebulkload / importtsv tool
• Loads into relevant region
– Faster than going through normal write path
• No writes to WAL and Memstore
• No flushing and compacting

Troubleshooting

• ulimit -n
– Limits on number of files and processes
• HBase is a database and needs to open a number of files
• dfs.datanode.max.transfer.threads
• Network
• OS Parameters

YouAreDeadException

• RegionServers going down
– Zookeeper
• Distributed coordination service
– HBase Master asks the region server to shut down
– Garbage Collection
– Zookeeper session timeout

Performance Tuning

• Compression
– Reduces data stored on disk and transferred
– Prefer compression speed over compression ratio
• Load Balancing - Balancer
• Merging Regions
• Batch Writes
– Client Write Buffer
– AutoFlush
• MemStore-local allocation buffers
– Garbage Collection Issues
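The idea behind a client write buffer is to collect writes locally and send them in batches instead of one round trip per write. The sketch below shows only that buffering logic; `BufferedWriter`, `flush_fn` and `max_bytes` are hypothetical names and this is not the HBase client implementation.

```python
class BufferedWriter:
    """Collects puts and hands them to flush_fn in batches (illustration only)."""

    def __init__(self, flush_fn, max_bytes):
        self._flush_fn = flush_fn    # called with a list of (rowkey, value) pairs
        self._max_bytes = max_bytes  # buffer size that triggers an automatic flush
        self._pending = []
        self._pending_bytes = 0

    def put(self, rowkey: bytes, value: bytes) -> None:
        self._pending.append((rowkey, value))
        self._pending_bytes += len(rowkey) + len(value)
        if self._pending_bytes >= self._max_bytes:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far; a no-op when the buffer is empty."""
        if self._pending:
            self._flush_fn(self._pending)
            self._pending = []
            self._pending_bytes = 0
```

The cost of buffering is that unflushed writes are lost if the client dies, so the caller should flush explicitly before shutting down.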

Tuning

• Heavy Writes
– Flushes, compacting, splitting increase IO and degrade cluster performance
• Keep Region sizes larger
• Keep HFile size large
• Heavy Sequential Reads
• Higher block size
• Avoid Caching on table
• Heavy Random Reads
• Higher Block level cache
• Lower Memstore limit
• Smaller block size

Apache Phoenix

• SQL over HBase
– Compiles into HBase Scans
– Orchestrates parallel execution
– Aggregate queries
• JDBC APIs over Native HBase API
• Salting Buckets, Pre-Splitting
• Trafodion
– Transactional SQL on HBase

Hannibal

• Monitor and maintain HBase Clusters
• How well regions are balanced over the cluster?
• How well regions are split for each table
• How regions evolve over time
• How long compactions take
• Integration with HUE


Operational Aspects

• Metrics
– Master
• Cluster requests, split time, split size
– RegionServer
• Block cache, memstore, compaction, store, IO
• HTrace
– Trace tool for parallel distributed systems
• Monitoring
– Nagios
– Hannibal
– Ganglia
– Graphite
– OpenTSDB
• Backup
– Export, CopyTable, Snapshot

Questions?

[email protected]