ZooKeeper: a highly available and reliable distributed coordination system

Upload: amazingjxq

Post on 29-May-2015


TRANSCRIPT

Page 1: Zoo keeper

ZooKeeper

A highly available and reliable distributed coordination system

Page 2: Zoo keeper

The role of zk

Human body    Distributed system
Nerves        Network
Organs        Servers
Brain         zk

Page 3: Zoo keeper

The role of zk

Page 4: Zoo keeper

Standalone mode

$ cd zookeeper-3.4.2/
$ cat conf/zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

Page 5: Zoo keeper

Standalone mode

$ bin/zkServer.sh start
JMX enabled by default
Using config: /home/jxq/code/zookeeper-3.4.2/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

$ bin/zkCli.sh -server 127.0.0.1:2181
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181(CONNECTED) 0] help

Page 6: Zoo keeper

Standalone mode

[zk: localhost:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
    connect host:port
    get path [watch]
    ls path [watch]
    set path data [version]
    rmr path
    delquota [-n|-b] path
    quit
    printwatches on|off
    create [-s] [-e] path data acl
    stat path [watch]
    close
    ls2 path [watch]
    listquota path
    setAcl path acl
    getAcl path
    sync path
    delete path [version]

Page 7: Zoo keeper

Standalone mode

[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 2] create /zk_test test_data
Created /zk_test
[zk: localhost:2181(CONNECTED) 3] ls /
[zookeeper, zk_test]
[zk: localhost:2181(CONNECTED) 4] get /zk_test
test_data
cZxid = 0x11
ctime = Thu Mar 08 15:33:55 CST 2012
mZxid = 0x11
mtime = Thu Mar 08 15:33:55 CST 2012
pZxid = 0x11
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 9
numChildren = 0

Page 8: Zoo keeper

Standalone mode

[zk: localhost:2181(CONNECTED) 5] set /zk_test test_2
cZxid = 0x11
ctime = Thu Mar 08 15:33:55 CST 2012
mZxid = 0x13
mtime = Thu Mar 08 15:46:27 CST 2012
pZxid = 0x11
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 0

Page 9: Zoo keeper

Standalone mode

[zk: localhost:2181(CONNECTED) 6] get /zk_test watch
test_2
cZxid = 0x11
ctime = Thu Mar 08 15:33:55 CST 2012
mZxid = 0x13
mtime = Thu Mar 08 15:46:27 CST 2012
pZxid = 0x11
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 0

Page 10: Zoo keeper

Standalone mode

[zk: 127.0.0.1:2181(CONNECTED) 6] set /zk_test test_3

WATCHER::

WatchedEvent state:SyncConnected type:NodeDataChanged path:/zk_test
cZxid = 0x11
ctime = Thu Mar 08 15:33:55 CST 2012
mZxid = 0x51
mtime = Mon Mar 19 10:43:59 CST 2012
pZxid = 0x11
cversion = 0
dataVersion = 11
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 0

Page 11: Zoo keeper

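zk consistency guarantees

• Sequential consistency: a client's requests take effect in the order they were sent
• Atomicity
• Single system image
• Reliability: once an update has taken effect, it persists until the next update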

Page 12: Zoo keeper

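znode

• 3 kinds of znode
  • persistent znode: stays until it is explicitly deleted
  • ephemeral znode: temporary node, removed when the session that created it ends
  • sequential znode: node whose name gets an increasing sequence number appended
• Data is less than 1 MB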

Page 13: Zoo keeper

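watches

• getData(), getChildren(), exists()
• One-time trigger
• data watches and child watches
• Ordered:
  • The client receives watch events in the same order in which the nodes changed
  • The client sees the new data only after it has received the watch event
• Mind the latency: between receiving the watch event and fetching the new data, the data may change several times

void watcher(zhandle_t *zzh, int type, int state, const char *path, void* context)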
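The one-time trigger and the latency caveat can be illustrated with a small in-memory model. This is only a sketch, not ZooKeeper client code: `ToyNode`, `get_data`, and `set_data` are hypothetical names invented for the illustration.

```python
class ToyNode:
    """In-memory stand-in for a znode with one-time data watches (hypothetical)."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get_data(self, watcher=None):
        # Reading with a watcher registers it for the NEXT change only.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        # Fire every pending watcher once, then forget it (one-time trigger).
        pending, self._watchers = self._watchers, []
        for watcher in pending:
            watcher("NodeDataChanged")


events = []
node = ToyNode("test_2")
node.get_data(watcher=events.append)

node.set_data("test_3")   # fires the watch
node.set_data("test_4")   # no watch is registered any more, nothing fires

# The client received one event although the data changed twice, so it has
# to re-read (and re-register the watch) to see the latest value.
latest = node.get_data(watcher=events.append)
```

The client ends up with a single event but reads the value written by the second change, which is the situation the last bullet warns about.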

Page 14: Zoo keeper

ACL

• Similar to Unix file permissions
• Applies to a single node only (not recursive)
• Permissions:
  • CREATE, READ, WRITE, DELETE, ADMIN
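A minimal sketch of the non-recursive check, assuming permissions are combined as bit flags. The `Perm` enum, the `acls` table, and `allowed()` are hypothetical illustration names, and the sketch leaves out the identity (scheme and id) part of a real ACL entry.

```python
from enum import IntFlag


class Perm(IntFlag):
    # Bit values follow ZooKeeper's permission constants.
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16


# Hypothetical ACL table: each path carries its own permission set.
acls = {
    "/app": Perm.READ | Perm.CREATE,
    "/app/config": Perm.READ,
}


def allowed(path, needed):
    """Check only the ACL of the node itself; parent nodes are not consulted."""
    return (acls.get(path, Perm(0)) & needed) == needed


can_create_under_app = allowed("/app", Perm.CREATE)
can_write_config = allowed("/app/config", Perm.WRITE)
```

Here `/app/config` does not inherit anything from `/app`: the write check fails because the node's own ACL grants READ only.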

Page 15: Zoo keeper

API

Asynchronous           Synchronous
zoo_acreate()          zoo_create()
zoo_aexists()          zoo_exists()
zoo_aget()             zoo_get()
zoo_aget_children()    zoo_get_children()
zoo_aset()             zoo_set()
zoo_adelete()          zoo_delete()

Page 16: Zoo keeper

Typical applications

• Naming service
• Configuration management
• Cluster monitoring
• Barriers
• Distributed queues
• Distributed locks
• Leader election

Page 17: Zoo keeper

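Configuration management

• Configuration files, machine lists, and so on
• Centralized management
• Services update their configuration automatically
  • The client sets a watch
  • When the content of the zk node (the configuration) changes, zk notifies the client, which then fetches the new content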
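The watch-and-reload loop might look roughly like this. It is a sketch against an in-memory stand-in: `ToyConfigStore`, `Service`, and the `max_conn` setting are hypothetical and not part of any ZooKeeper client.

```python
class ToyConfigStore:
    """Hypothetical in-memory stand-in for a zk node holding configuration."""

    def __init__(self, content):
        self._content = content
        self._watchers = []

    def get(self, watcher):
        self._watchers.append(watcher)   # one-time watch
        return self._content

    def set(self, content):
        self._content = content
        pending, self._watchers = self._watchers, []
        for notify in pending:
            notify()


class Service:
    def __init__(self, store):
        self._store = store
        self.config = None
        self.reloads = 0
        self._load()

    def _load(self):
        # Fetch the content and re-register the watch in the same call,
        # so no later change goes unnoticed.
        self.config = self._store.get(watcher=self._load)
        self.reloads += 1


store = ToyConfigStore("max_conn=10")
service = Service(store)
store.set("max_conn=20")
store.set("max_conn=50")
```

Because the watch is one-time, the service sets it again on every reload; otherwise only the first change would reach it.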

Page 18: Zoo keeper

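Cluster monitoring

• Each service creates an ephemeral node "/clusterServers/{hostname}"
• The monitoring service watches the number of children of "/clusterServers"
• When a monitored service stops, its node disappears and the monitoring service receives a watch event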
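The monitoring pattern above can be sketched with a toy model in which ephemeral nodes are tied to a session id. `ToyCluster`, `Monitor`, and the host names are hypothetical; a real deployment would use a ZooKeeper client instead.

```python
class ToyCluster:
    """Hypothetical model: ephemeral children of /clusterServers, keyed by session."""

    def __init__(self):
        self._children = {}        # hostname -> owning session id
        self._child_watch = None

    def register(self, session_id, hostname):
        self._children[hostname] = session_id
        self._fire()

    def close_session(self, session_id):
        # Ephemeral nodes disappear together with the session that created them.
        gone = [h for h, s in self._children.items() if s == session_id]
        for hostname in gone:
            del self._children[hostname]
        if gone:
            self._fire()

    def get_children(self, watcher=None):
        self._child_watch = watcher
        return sorted(self._children)

    def _fire(self):
        watcher, self._child_watch = self._child_watch, None
        if watcher is not None:
            watcher()


class Monitor:
    def __init__(self, cluster):
        self._cluster = cluster
        self.history = []
        self._refresh()

    def _refresh(self):
        # Re-read the children and set the watch again on every notification.
        self.history.append(self._cluster.get_children(watcher=self._refresh))


cluster = ToyCluster()
monitor = Monitor(cluster)
cluster.register(session_id=1, hostname="web01")
cluster.register(session_id=2, hostname="web02")
cluster.close_session(1)   # web01 stops, its node vanishes
```

`monitor.history` records each membership view the monitor saw, ending with the list that no longer contains the stopped host.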

Page 19: Zoo keeper

Barriers

Page 20: Zoo keeper

Barriers

• "/barrier/[n]"

1. create("/barrier/n", EPHEMERAL_SEQUENTIAL)
2. getChildren("/barrier/", true), which sets a watch that fires when the number of nodes changes
3. if the number of nodes is less than x, wait for the watch notification
4. else return
5. goto 2

watcher function: pthread_cond_signal()
waiting: pthread_cond_wait()
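The wait/notify structure of the steps above can be sketched with Python's `threading.Condition`, which plays the part of the pthread condition variable. `ToyBarrier` is a hypothetical in-process model: a counter stands in for the children of "/barrier", and `notify_all()` is used instead of a single signal because several waiters may be blocked at once.

```python
import threading


class ToyBarrier:
    """Hypothetical in-memory barrier; the counter stands in for /barrier/[n] nodes."""

    def __init__(self, threshold):
        self._threshold = threshold
        self._children = 0
        self._cond = threading.Condition()

    def enter(self):
        with self._cond:
            # Step 1: "create" our node. The change wakes every waiter,
            # which is the job of the watcher function.
            self._children += 1
            self._cond.notify_all()
            # Steps 2-5: re-check the child count after each notification.
            while self._children < self._threshold:
                self._cond.wait()


barrier = ToyBarrier(threshold=3)
passed = []


def worker(name):
    barrier.enter()
    passed.append(name)


threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No worker gets past `enter()` until the third one has arrived, which mirrors the "goto 2" loop: every wake-up is followed by a fresh check of the node count.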

Page 21: Zoo keeper

Distributed queue

• "/q/element_[n]"
• Producer:
    create("/q" + "/element_", message, ZOO_SEQUENCE);
• Consumer:
    get_children();
    delete();
• Relies on the data consistency of the zk cluster
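A rough in-memory sketch of the producer and consumer sides, assuming a single consumer. `ToyQueue` and its methods are hypothetical names; the zero-padded counter imitates the suffix that a sequential create appends to the node name.

```python
class ToyQueue:
    """Hypothetical in-memory model of the /q/element_[n] layout."""

    def __init__(self):
        self._nodes = {}       # node name -> message
        self._next_seq = 0

    def produce(self, message):
        # A sequential create appends a zero-padded, increasing counter.
        name = f"element_{self._next_seq:010d}"
        self._next_seq += 1
        self._nodes[name] = message
        return name

    def consume(self):
        children = sorted(self._nodes)      # get_children()
        if not children:
            return None
        head = children[0]                  # lowest sequence number first
        return self._nodes.pop(head)        # read the data, then delete()


q = ToyQueue()
first = q.produce("job-a")
q.produce("job-b")
order = [q.consume(), q.consume(), q.consume()]
```

Sorting the child names gives FIFO order because the sequence suffix is fixed-width. With several consumers, each would have to treat a failed delete as "someone else took this element" and move on to the next one.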

Page 22: Zoo keeper

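Distributed lock

• "_locknode_/lock-[n]"
• Acquire the lock:
  1. create("_locknode_/lock-[n]")
  2. getChildren()
  3. Check whether the node created in step 1 has the lowest sequence number; if so, the lock is acquired
  4. Call exists() on the node with the next-lowest sequence number and set a watch on it
  5. If exists() returns false, goto 2. Otherwise wait for the notification, then goto 2
• Release the lock: delete the node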
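The decision made in steps 3 and 4 can be written as a pure function over the child list. This is a sketch under the assumption that node names end in "-" followed by the sequence number; `lock_step` and `sequence_of` are hypothetical helper names.

```python
def sequence_of(node):
    # "lock-0000000007" -> 7
    return int(node.rsplit("-", 1)[1])


def lock_step(my_node, children):
    """Decide what a client does after getChildren() (steps 3 and 4).

    Returns ("acquired", None) or ("wait", node_to_watch).
    """
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(my_node)
    if index == 0:
        return ("acquired", None)
    # Watch only the node immediately before ours, not the whole directory.
    return ("wait", ordered[index - 1])


children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
holder = lock_step("lock-0000000001", children)
waiter = lock_step("lock-0000000003", children)

# Releasing the lock deletes the node; the client that watched it re-checks.
children.remove("lock-0000000001")
after_release = lock_step("lock-0000000002", children)
```

Watching only the immediate predecessor means a release wakes one client rather than every waiter.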

Page 23: Zoo keeper

Leader election

Page 24: Zoo keeper

ZK internal design

Page 25: Zoo keeper

• ZooKeeper Atomic Broadcast
• Guarantees:
  • Reliable delivery of messages
  • Total order
  • Causal order
• Two phases of message delivery
  • Election
  • Synchronization

Page 26: Zoo keeper

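Roles of zk nodes

Role                  Description
leader                Initiates votes and decides their outcome
learner / follower    Accepts client requests and returns the results; takes part in voting during an election
learner / observer    Accepts client requests and forwards writes to the leader. Does not take part in voting; only synchronizes state from the leader. Improves read performance.
client                Issues requests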

Page 27: Zoo keeper

zk server working states

• LOOKING: this server does not know who the leader is
• LEADING: this server is the leader
• FOLLOWING: a leader has been elected and this server follows it

Page 28: Zoo keeper

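Election process

• basic paxos
• After starting, each Server asks the other Servers whom they intend to vote for. Once it has received all replies, it works out which Server has the largest zxid and sets that Server as the one it will vote for in the next round. If the winning Server has received n/2 + 1 votes, the Server sets it as the currently recommended leader and updates its own state.
• election.jpg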
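The two rules in that description, "prefer the largest zxid" and "win with n/2 + 1 votes", can be sketched as follows. This is a deliberately simplified toy, not ZooKeeper's election code: `elect()` and the server table are hypothetical, all servers that answered are assumed to converge on the same candidate, and breaking zxid ties by the larger server id is an assumption made for the example.

```python
def quorum(n):
    """Votes needed among n servers: n/2 + 1, using integer division."""
    return n // 2 + 1


def elect(zxids, reachable):
    """zxids: server id -> last zxid; reachable: ids that answered the poll."""
    # Every server that answered ends up voting for the largest zxid seen
    # (ties broken by the larger server id, an assumption of this sketch).
    candidate = max(reachable, key=lambda sid: (zxids[sid], sid))
    votes = len(reachable)
    if votes >= quorum(len(zxids)):
        return candidate
    return None   # not enough votes, the servers stay in LOOKING


# Hypothetical 5-server ensemble.
zxids = {1: 0x51, 2: 0x53, 3: 0x50, 4: 0x53, 5: 0x4F}

leader_all = elect(zxids, reachable=[1, 2, 3, 4, 5])
leader_three = elect(zxids, reachable=[1, 2, 3])
leader_minority = elect(zxids, reachable=[1, 3])
```

With five servers the quorum is three, so the two-server partition cannot elect a leader even though it can still compare zxids.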

Page 29: Zoo keeper

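Synchronization process

• The leader waits for servers to connect
• A follower connects to the leader and sends it its largest zxid
• The leader uses the follower's zxid to determine the synchronization point
• When synchronization is complete, the leader notifies the follower that it is now in the uptodate state
• After receiving the uptodate message, the follower can again accept client requests and serve them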
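How the leader might pick the synchronization point can be sketched as below. It covers only the simplest case, where the follower is merely behind the leader; `sync_follower`, the message tuples, and the sample log are hypothetical and do not reflect the real wire protocol.

```python
def sync_follower(leader_log, follower_last_zxid):
    """Return the messages the leader sends to bring one follower up to date.

    leader_log is a list of (zxid, operation) pairs in increasing zxid order.
    """
    # The sync point is the first proposal newer than what the follower has.
    messages = [
        ("PROPOSAL", zxid, op)
        for zxid, op in leader_log
        if zxid > follower_last_zxid
    ]
    # The last message tells the follower it may serve clients again.
    messages.append(("UPTODATE",))
    return messages


# Hypothetical leader history, reusing zxids from the CLI session above.
leader_log = [
    (0x11, "create /zk_test"),
    (0x13, "set /zk_test"),
    (0x51, "set /zk_test"),
]

messages = sync_follower(leader_log, follower_last_zxid=0x13)
already_current = sync_follower(leader_log, follower_last_zxid=0x51)
```

A follower that already has the latest zxid receives only the uptodate marker.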

Page 30: Zoo keeper