internals of md-sal clustering

66
www.opendaylight.org MD-SAL Clustering Internals Moiz Raja Open Daylight Summit 2015

Upload: nguyenanh

Post on 03-Jan-2017

225 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Internals of MD-SAL Clustering

www.opendaylight.org

MD-SAL Clustering Internals

Moiz Raja Open Daylight Summit 2015

Page 2: Internals of MD-SAL Clustering

www.opendaylight.org

▪ Abhishek  Kumar  ▪ Basheeruddin  Ahmed  ▪ Colin  Dixon  ▪ Harman  Singh  ▪ Kamal  Rameshan  ▪ Robert  Varga  ▪ Tony  Tkacik  

My  Collaborators  

2

Tom Pantelis

▪  Luis  Gomez  ▪ Phillip  Shea  ▪ Radhika  Hirannaiah  ▪ and  many  more…  

Page 3: Internals of MD-SAL Clustering

www.opendaylight.org

▪ Architecture  ▪ Modules  ▪ Flows  ▪ DiagnosHcs  ▪ QuesHons    

Agenda  

Page 4: Internals of MD-SAL Clustering

www.opendaylight.org

Architecture  

4

Page 5: Internals of MD-SAL Clustering

www.opendaylight.org

Subsystems

5

member-­‐1  

member-­‐2  

member-­‐3  

Distributed Data Store

member-­‐1   member-­‐2  

Remote RPC Connector

Page 6: Internals of MD-SAL Clustering

www.opendaylight.org

High  Level  Architecture

Distributed Data Store Remote RPC Connector

Persistence

Remoting

Clustering

Page 7: Internals of MD-SAL Clustering

www.opendaylight.org

Actor  Systems

7

Distributed Data Store Remote RPC Connector

Actor Hierarchy

Configuration

Dispatchers

Page 8: Internals of MD-SAL Clustering

www.opendaylight.org

Data  Synchroniza:on

8

Data store Synchronized Data Tree Raft for Distributed Consensus

Remote RPC Synchronized RPC Registry Gossip for data distribution

Page 9: Internals of MD-SAL Clustering

www.opendaylight.org

Distributed  Data  Store  Architecture  

9

Page 10: Internals of MD-SAL Clustering

www.opendaylight.org

Accessing  Remote  Data

10

Client

member-1 member-2

Page 11: Internals of MD-SAL Clustering

www.opendaylight.org

Loca:on  Transparency

11

Client

member-1 member-2

Dis

tribu

tedD

ataS

tore

Page 12: Internals of MD-SAL Clustering

www.opendaylight.org

DistributedDataStore

12

DOMStore

DistributedDataStore

Page 13: Internals of MD-SAL Clustering

www.opendaylight.org

Communica:on

13

Client

member-1 member-2

Dis

tribu

tedD

ataS

tore

Shard

Page 14: Internals of MD-SAL Clustering

www.opendaylight.org

Data  Distribu:on

14

Client

member-1

Dis

tribu

tedD

ataS

tore

member-2

member-3

topology  

inventory  

Page 15: Internals of MD-SAL Clustering

www.opendaylight.org

Module  Based  Shards

15

/

/inventory /topology /toaster

default

Page 16: Internals of MD-SAL Clustering

www.opendaylight.org

HA

16

member-2

member-3

inventory  –  follower  -­‐1  

inventory  –  follower  -­‐  2  

Client

member-1

DistributedDataStore

inventory  –  leader  

Page 17: Internals of MD-SAL Clustering

www.opendaylight.org

RaB  Distributed  Consensus

17

discovers  n

ode  with

 highe

r  term     Follower

Candidate

Leader

starts  up/    recovers  

Hmes  out,    starts  elecHons  

receives  votes  from  majority    of  nodes  

Hmes  out,    restarts  elecHons  

follower-­‐2  follower-­‐1  

leader  

Election Replication/Consensus

Page 18: Internals of MD-SAL Clustering

www.opendaylight.org

Journal  replica:on

18

leader

follower-1

follower-2

transaction-1

transaction-2

transaction-3

transaction-4

transaction-1

transaction-2

transaction-3

transaction-4

transaction-1

transaction-2

transaction-3

transaction-4

Page 19: Internals of MD-SAL Clustering

www.opendaylight.org

Snapshot  Replica:on

19

leader

follower-1

follower-2

Page 20: Internals of MD-SAL Clustering

www.opendaylight.org

Durability/Recovery

20

Journal

Snapshots

Page 21: Internals of MD-SAL Clustering

www.opendaylight.org

Remote  RPC  Architecture  

21

Page 22: Internals of MD-SAL Clustering

www.opendaylight.org

Invoking  a  Remote  RPC

22

Consumer

member-1 member-2

Provider

Page 23: Internals of MD-SAL Clustering

www.opendaylight.org

Loca:on  Transparency

23

Consumer

member-1 member-2

Provider R

pcP

rovi

derP

roxy

Rem

oteR

pcB

roke

r

Page 24: Internals of MD-SAL Clustering

www.opendaylight.org

RPC  Registry

24

Provider RPC

Registration Listener

RPC Registry

Page 25: Internals of MD-SAL Clustering

www.opendaylight.org

RPC  Registry  Replica:on  -­‐  Gossip

25

version=1  

version=2  

modify  

change  version  

Local  bucket  updates    change  version  

m1,v1  

m2,v5  

m3,v7  

All  buckets  and  their  versions  known  to  all  members  

Every  1  second  members  send  all  known  bucket    versions  to  any  one  peer  

m1  

m2   m3  status  

status  

m2   m3  

m1  

update  

local  versions  higher  –  send  update  local  versions  lower  –  send  status  to  sender  

Page 26: Internals of MD-SAL Clustering

www.opendaylight.org

Modules  

26

Page 27: Internals of MD-SAL Clustering

www.opendaylight.org

Modules

27

sal-clustering-commons

sal-akka-raft sal-remoterpc-connector

sal-distributed-datastore

sal-clustering-config

sal-akka-raft-example

sal-dummy-distributed-datastore

clustering-test-app

Page 28: Internals of MD-SAL Clustering

www.opendaylight.org

▪ Some  common  messages  ▪ Actor  base  classes  ▪ The  Protobuf  messages  used  in  Helium  ▪ The  Protobuf  NormalizedNode  serializaHon  code  ▪ The  NormalizedNode  streaming  code  ▪ Other  miscellaneous  uHlity  classes  

sal-­‐clustering-­‐commons

28

Page 29: Internals of MD-SAL Clustering

www.opendaylight.org

▪  ImplementaHon  of  the  Ra[  Algorithm  on  top  of  akka  ▪ Uses  akka-­‐persistence  for  durability  ▪ Provides  a  base  class  called  Ra:Actor  which  when  can  be  extended  by  anyone  who  wants  to  replicate  state  ▪ See  sal-­‐akka-­‐ra[-­‐example  which  provides  a  simple  implementaHon  of  a  replicated  HashMap  

sal-­‐akka-­‐raB

29

Page 30: Internals of MD-SAL Clustering

www.opendaylight.org

▪ ConcurrentDOMDataBroker  ▪ DistributedDataStore  ▪  ImplementaHon  of  the  DOMStore  SPI  ▪ Shard  built  on  top  of  Ra[Actor  ▪ Creates  Shards  based  on  Sharding  strategy  ▪ Code  for  a  client  to  interact  with  the  Shard  Leader  

sal-­‐distributed-­‐datastore

30

Page 31: Internals of MD-SAL Clustering

www.opendaylight.org

▪ RemoteRpcProvider  ▪ Default  RPC  Provider.  Invoked  when  an  RPC  is  not  found  in  the  local  MD-­‐SAL  registry.  ▪ Code  for  BucketStore  which  provides  a  mechanism  to  replicate  state  based  on  Gossip  ▪ Code  for  RpcBroker  which  allows  invoking  a  remote  rpc  

sal-­‐remoterpc-­‐connector

31

Page 32: Internals of MD-SAL Clustering

www.opendaylight.org

Data  store  flows  

32

Page 33: Internals of MD-SAL Clustering

www.opendaylight.org

Startup

33

DistributedConfigDataStoreProviderModule

DistributedDataStore

ShardManager

Shard1 Shard Shard3 Shard4

createInstance

ActorContext waitTillReadyLatch

create & waitTillReady

Page 34: Internals of MD-SAL Clustering

www.opendaylight.org

Recovery

34

Shard1 Shard Shard3 Shard4

ShardManager

read last known state from disk

ready

waitTillReadyLatch

countDown

Page 35: Internals of MD-SAL Clustering

www.opendaylight.org

▪ Recovery  must  be  complete  ▪ All  Shard  Leaders  must  be  known  ▪ Three  messages  are  monitored  by  ShardManager  

▪  Cluster.MemberStatusUp ▪ Used to figure out the address of a cluster member

▪  LeaderStateChanged ▪ Used to figure out if a Follower has a different Leader

▪  ShardRoleChanged ▪ Use to figured out any changes in a Shard’s Role

▪ WaiHng  is  not  infinite,  by  default  it  lasts  only  90  seconds  but  is  configurable  ▪ Will  block  config  sub-­‐system  

Wai:ng  for  Ready

35

Page 36: Internals of MD-SAL Clustering

www.opendaylight.org

Crea:ng  a  Transac:on

36

DistributedDataStore newReadWriteTransaction

TransactionProxy

create

Page 37: Internals of MD-SAL Clustering

www.opendaylight.org

First  Opera:on

37

ActorContext.findPrimary

PrimaryCache.lookup/ShardManager.findPrimary

Found?

LocalTransactionContext RemoteTransactionContext

NoOpTransactionContext

TransactionProxy write(“inventory”, node)

Local?

N

Y N

Page 38: Internals of MD-SAL Clustering

www.opendaylight.org

Transac:ons

38

Client

DistributedDataStore

inventory  –  leader  

Client

DistributedDataStore

inventory  –  leader  

Local Transaction Remote Transaction

mem

ber-

1 mem

ber-

1 m

embe

r-2

Page 39: Internals of MD-SAL Clustering

www.opendaylight.org

Local  Transac:on  Op:miza:on

39

LocalTransactionContext Shard - Leader

write

merge

delete

ready

member-1

Page 40: Internals of MD-SAL Clustering

www.opendaylight.org

Remote  Transac:on  Op:miza:on

40

RemoteTransactionContext Shard Leader

write

merge

delete

ready

write  mod  

merge  mod  

delete  mod  

member-1 member-2

Page 41: Internals of MD-SAL Clustering

www.opendaylight.org

Transac:on  Rate  Limi:ng

41

rate-limit = 100 Tx/Sec

Tx Cohort

Shard Leader

member-2

20ms

Tx Cohort

50ms

Tx Cohort

15ms

after rate-limit/2 transactions done…. new-rate-limit = 25 Tx/Sec

Page 42: Internals of MD-SAL Clustering

www.opendaylight.org

Opera:on  Limi:ng

42

RemoteTransactionContext Shard Leader

write

merge

delete

write  mod  

merge  mod  

delete  mod  

member-1 member-2

…  …

block

Page 43: Internals of MD-SAL Clustering

www.opendaylight.org

Commit  Coordina:on

43

Shard Leader

member-2

Shard CommitCoordinator

Tx1  -­‐  ready  

Tx2  -­‐  ready  

Tx3  -­‐  ready  

Tx1  -­‐  commit  

Tx3  -­‐  commit  

Tx3  -­‐  abort  

Tx2  -­‐  commit  

Tx1  

Tx2  

Tx3  

Page 44: Internals of MD-SAL Clustering

www.opendaylight.org

Managing  the  in-­‐memory  journal  Replicated  To  All

44

Client leader

follower-1 follower-2

commit transaction

txn

txn txn

Page 45: Internals of MD-SAL Clustering

www.opendaylight.org

Managing  the  in-­‐memory  journal  Cluster  member  unavailable

45

Client leader

follower-1 follower-2

commit transaction

txn

txn

txn txn txn

txn txn txn

Page 46: Internals of MD-SAL Clustering

www.opendaylight.org

Data  Change  No:fica:ons

46

Client leader

follower-1 follower-2

commit transaction

txn

txn txn

notify

Page 47: Internals of MD-SAL Clustering

www.opendaylight.org

RPC  Connector  flows  

47

Page 48: Internals of MD-SAL Clustering

www.opendaylight.org

Startup

48

RemoteRpcBrokerModule createInstance

RpcManager

RemoteRpcProvider

RpcBroker RpcRegistry RemoteRpcImpl RpcListener

Page 49: Internals of MD-SAL Clustering

www.opendaylight.org

Default  RPC  Delegate

49

RpcManager SchemaContext

DOMRpcProviderService

read all rpc definitions

registerImplementation(remoteRpcImpl)

Page 50: Internals of MD-SAL Clustering

www.opendaylight.org

RPC  Registered

50

RpcProviderRegistry addRoutedRpcImpl

RoutedRpcRegistration registerPath

RpcListener

RpcRegistry

Page 51: Internals of MD-SAL Clustering

www.opendaylight.org

Invoking  a  Remote  RPC

51

RemoteRpcImpl invokeRpc

RpcRegistry

Route found?

RpcBroker

ExecuteRpc

FooService

throw Exception

Page 52: Internals of MD-SAL Clustering

www.opendaylight.org

Invoking  a  Remote  RPC

52

RemoteRpcImpl

Consumer

Provider

member-1 member-2

RpcBroker

RpcRegistry

invokeRpc

invokeRpc findRoute

ExecuteRpc

Page 53: Internals of MD-SAL Clustering

www.opendaylight.org

Data  store  DiagnosCcs  

53

Page 54: Internals of MD-SAL Clustering

www.opendaylight.org

Transac:on  Tracing

54

Created  txn  member-­‐2-­‐txn-­‐9400  of  type  READ_WRITE  on  chain  member-­‐2-­‐txn-­‐chain-­‐13  

Client

Server

Tx  member-­‐2-­‐txn-­‐9400  read  /(urn:opendaylight:inventory?...  

member-­‐3-­‐shard-­‐inventory-­‐operaHonal:  CreaHng  transacHon  :  shard-­‐member-­‐2-­‐txn-­‐9400  

Tx  member-­‐2-­‐txn-­‐9400  Readying  1  transacHons  for  commit  

Tx  member-­‐2-­‐txn-­‐9400  commit  

member-­‐3-­‐shard-­‐inventory-­‐operaHonal:  Readying  transacHon  member-­‐2-­‐txn-­‐9400  

member-­‐3-­‐shard-­‐inventory-­‐operaHonal:  Commigng  transacHon  member-­‐2-­‐txn-­‐9400  

Tx  member-­‐2-­‐txn-­‐9400:  commit  succeeded  

Cluster  Member  IniHator  

Counter  

TransacHon  Type  

Module  

Data  store  type  

Page 55: Internals of MD-SAL Clustering

www.opendaylight.org

Replica:on  Tracing

55

Leader

Sending  AppendEntries  to  follower  member-­‐2-­‐shard-­‐topology-­‐operaHonal:  AppendEntries  [term=2,  leaderId=member-­‐1-­‐shard-­‐topology-­‐operaHonal,  prevLogIndex=520,  prevLogTerm=2,  entries=[Entry{index=521,  term=2}],  leaderCommit=520,  replicatedToAllIndex=-­‐1]  

Follower handleAppendEntries:  AppendEntries  [term=2,  leaderId=member-­‐2-­‐shard-­‐topology-­‐operaHonal,    prevLogIndex=520,  prevLogTerm=2,  entries=[Entry{index=521,  term=2}],  leaderCommit=520,    replicatedToAllIndex=-­‐1]  

handleAppendEntries  returning  :  AppendEntriesReply  [term=2,  success=true,  logLastIndex=521,    logLastTerm=2,  followerId=member-­‐1-­‐shard-­‐topology-­‐operaHonal]  

handleAppendEntriesReply  from  member-­‐2-­‐shard-­‐topology-­‐operaHonal:  applying  to  log  –    commitIndex:  521,  lastAppliedIndex:  520  

handleAppendEntriesReply  -­‐  FollowerLogInformaHon  for  member-­‐2-­‐shard-­‐topology-­‐operaHonal  updated:    matchIndex:  521,  nextIndex:  522  

Page 56: Internals of MD-SAL Clustering

www.opendaylight.org

Shard  MBean

56

org.opendaylight.controller:type=DistributedOperaHonalDataStore,Category=Shards,name=member-­‐1-­‐shard-­‐inventory-­‐operaHonal  

OperaHonal  

Config  

member-­‐1  

member-­‐2  

member-­‐3  

default  

inventory  

topology  

operaHonal  

config  

Attributes AbortTransacHonsCount   CommitIndex   CommiledTransacHon

sCount  CurrentTerm   FailedTransacHonsCount  

FollowerInfo   FollowerIniHalSync  Status  

InMemoryJournalData  Size  

InMemoryJournalLogSize  

LastApplied  

LastCommiledTransacHonTime  

LastIndex   LastTerm   Leader   Ra[State  

ReadOnlyTransacHon  Count  

ReadWriteTransacHonCount  

WriteOnlyTransacHon  Count  

VotedFor   and  more….  

Page 57: Internals of MD-SAL Clustering

www.opendaylight.org

ShardManager  MBean

57

org.opendaylight.controller:type=DistributedOperaHonalDataStore,Category=ShardManager,name=shard-­‐manager-­‐operaHonal  

OperaHonal  

Config  

operaHonal  

config  

Attributes

•  LocalShards  •  SyncStatus  

Page 58: Internals of MD-SAL Clustering

www.opendaylight.org

Data  store  GeneralRun:meInfo  MBean

58

org.opendaylight.controller:type=DistributedConfigDatastore,name=GeneralRunHmeInfo  

OperaHonal  

Config  

Attributes

•  TransacHonCreaHonRateLimit  

Page 59: Internals of MD-SAL Clustering

www.opendaylight.org

Transac:on  Commit  Rate  MBean

59

org.opendaylight.controller.cluster.datastore:name=distributed-­‐data-­‐store.config.commit.rate  

Attributes

•  50thPercentile •  75thPercenHle  •  90thPercenHle  •  and  so  on…  

operaHonal  

config  

•  Count •  Min  •  Max  •  StdDev  

Page 60: Internals of MD-SAL Clustering

www.opendaylight.org

Data  store  GeneralRun:meInfo  MBean

60

org.opendaylight.controller:type=DistributedConfigDatastore,name=GeneralRunHmeInfo  

OperaHonal  

Config  

Attributes

•  TransacHonCreaHonRateLimit  

Page 61: Internals of MD-SAL Clustering

www.opendaylight.org

Message  Sta:s:cs  MBean

61

org.opendaylight.controller.actor.metric:name=/user/shardmanager-­‐config.msg-­‐rate.ActorIniHalized  

Attributes

•  50thPercentile •  75thPercenHle  •  90thPercenHle  •  and  so  on…  

operaHonal  

config  

•  Count •  Min  •  Max  •  StdDev  

Message  Name  

Page 62: Internals of MD-SAL Clustering

www.opendaylight.org

Remote  RPC  Broker  DiagnosCcs  

62

Page 63: Internals of MD-SAL Clustering

www.opendaylight.org

RemoteRpcBroker  MBean

63

org.opendaylight.controller:type=RemoteRpcBroker,name=RemoteRpcRegistry  

Attributes

•  BucketVersions •  GlobalRpc  •  LocalRegisteredRoutedRpc  

Operations

•  findRpcByName  •  findRpcByRoute  

Page 64: Internals of MD-SAL Clustering

www.opendaylight.org

Message  Sta:s:cs  MBean

64

org.opendaylight.controller.actor.metric:name=/user/rpc/registry.msg-­‐rate.AddOrUpdateRoutes  

Attributes

•  50thPercentile •  75thPercenHle  •  90thPercenHle  •  and  so  on…  

•  Count •  Min  •  Max  •  StdDev  

Message  Name  

Page 65: Internals of MD-SAL Clustering

www.opendaylight.org 65

Page 66: Internals of MD-SAL Clustering

www.opendaylight.org

▪ Deploy  a  cluster  ▪ Run  clustering  integraHon  tests  ▪ Write  an  applicaHon  that  works  in  the  cluster  ▪ Write  bugs  to  report  features  which  you  find  missing  ▪ Try  running  dsBenchMark  on  a  cluster  ▪ Test  out  replicaHon  using  the  dummy  data  store  ▪ Check  out  the  code  ▪ Send  email  to  controller-­‐[email protected]  with  quesHons  

Suggested  Next  Steps…  

66