ZooKeeper Distributed Coordination Service
1. Introduction to ZooKeeper
ZooKeeper is an open-source distributed coordination service used to manage configuration information, naming, distributed locks, and cluster membership for distributed applications. It provides a simple yet powerful API for building reliable distributed systems.
1.1 ZooKeeper Design Goals
- Simple and easy to use: provides a simple API so that developers can get started quickly
- High reliability: data is replicated across servers, so it is not lost when a node fails
- High availability: supports automatic failover to keep the service running
- High performance: supports highly concurrent read and write operations
- Strictly ordered access: requests from a given client are executed strictly in the order they were sent
1.2 ZooKeeper Use Cases
- Configuration management: centrally manage the configuration of distributed applications
- Naming service: provide a naming service for distributed applications
- Distributed locks: implement locking in distributed systems
- Cluster management: track node status and membership in a distributed cluster
- Leader election: elect a leader among the nodes of a distributed system
- Queue management: implement distributed queues and priority queues
2. ZooKeeper Core Concepts
2.1 Data Model
ZooKeeper's data model resembles a file system. It is a tree of nodes, each of which is called a znode. A znode can store data and can also have child nodes.
/                       # root node
├── app1                # root node of application 1
│   ├── config          # configuration node of application 1
│   └── servers         # server nodes of application 1
│       ├── server1     # node for server 1
│       └── server2     # node for server 2
└── app2                # root node of application 2
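2.2 Znode Types
ZooKeeper has four basic znode types:
- Persistent node (PERSISTENT): once created, the node exists until it is explicitly deleted
- Persistent sequential node (PERSISTENT_SEQUENTIAL): a persistent node whose name ZooKeeper automatically suffixes with a monotonically increasing sequence number
- Ephemeral node (EPHEMERAL): the node is deleted automatically when the client session that created it ends
- Ephemeral sequential node (EPHEMERAL_SEQUENTIAL): an ephemeral node whose name ZooKeeper automatically suffixes with a monotonically increasing sequence number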
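To make the sequence suffix concrete, here is a minimal sketch of how a sequential node name is formed: the requested path followed by a zero-padded, 10-digit counter. The class and method names are made up for this illustration, and the counter values are arbitrary; in a real cluster the server chooses the number.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the ZooKeeper API.
class SequentialNameSketch {
    // Appends a zero-padded, 10-digit counter to the requested prefix.
    static String sequentialName(String prefix, int counter) {
        return String.format(Locale.ENGLISH, "%s%010d", prefix, counter);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add(sequentialName("lock_", 10));
        names.add(sequentialName("lock_", 2));
        // Thanks to the fixed-width padding, lexicographic order equals numeric order.
        Collections.sort(names);
        System.out.println(names); // [lock_0000000002, lock_0000000010]
    }
}
```

The fixed width is what allows clients to sort child names as plain strings, which recipes such as locks and queues rely on.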
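2.3 Version Numbers
Each znode has three version numbers:
- dataVersion: the data version, incremented each time the node's data is modified
- cversion: the child version, incremented each time a child node is created or deleted
- aclVersion: the ACL version, incremented each time the node's ACL is modified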
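Version numbers are what make conditional updates possible: a write can name the dataVersion it expects and be rejected if someone else changed the node in the meantime. The sketch below models that contract in memory only. VersionedNodeSketch is a hypothetical class, and it throws IllegalStateException where the real client would throw KeeperException.BadVersionException.

```java
import java.nio.charset.StandardCharsets;

// In-memory model of a single znode's data and dataVersion (illustration only).
class VersionedNodeSketch {
    private byte[] data;
    private int dataVersion = 0;

    VersionedNodeSketch(byte[] initialData) {
        this.data = initialData;
    }

    // Same contract as setData(path, data, version): -1 skips the check,
    // any other value must equal the current dataVersion.
    synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != dataVersion) {
            throw new IllegalStateException(
                    "bad version: expected " + expectedVersion + ", actual " + dataVersion);
        }
        data = newData;
        dataVersion++;
        return dataVersion;
    }

    public static void main(String[] args) {
        VersionedNodeSketch node = new VersionedNodeSketch("v0".getBytes(StandardCharsets.UTF_8));
        System.out.println("first write -> dataVersion "
                + node.setData("v1".getBytes(StandardCharsets.UTF_8), 0));
        try {
            // A second writer that still believes the version is 0 is rejected.
            node.setData("stale".getBytes(StandardCharsets.UTF_8), 0);
        } catch (IllegalStateException e) {
            System.out.println("stale write rejected: " + e.getMessage());
        }
        System.out.println("unconditional write -> dataVersion "
                + node.setData("forced".getBytes(StandardCharsets.UTF_8), -1));
    }
}
```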
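2.4 Session
After a client connects to a ZooKeeper server, a session is created. Sessions have the following characteristics:
- Each session has a timeout, which the client requests when it connects (zkCli.sh, for example, uses 30 seconds by default)
- The client must send heartbeat messages periodically to keep the session alive
- After a session times out, the client must connect again and establish a new session
- When a session ends, the ephemeral nodes created by that session are deleted automatically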
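The timeout a client asks for is not necessarily the one it gets. Assuming the server's default bounds, where the negotiated value is kept between 2 and 20 times tickTime, the negotiation can be sketched as below. SessionTimeoutSketch and negotiate are hypothetical names, and the bounds can be changed in the server configuration.

```java
// Illustration of session timeout negotiation under default server bounds.
class SessionTimeoutSketch {
    // Clamps the requested timeout to [2 * tickTime, 20 * tickTime].
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tickTime = 2000;
        System.out.println(negotiate(30000, tickTime)); // 30000 (within bounds)
        System.out.println(negotiate(1000, tickTime));  // 4000  (raised to the minimum)
        System.out.println(negotiate(60000, tickTime)); // 40000 (lowered to the maximum)
    }
}
```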
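2.5 Watch Mechanism
ZooKeeper provides a watch mechanism: a client can register a watch on a znode, and ZooKeeper notifies the client when that znode changes. Watch events include:
- NodeCreated: a node was created
- NodeDeleted: a node was deleted
- NodeDataChanged: a node's data changed
- NodeChildrenChanged: a node's list of children changed
Note
Watches set through getData, exists, and getChildren are one-time triggers: once a watch has fired it is removed, and the client must register it again to receive further notifications.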
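The one-shot behaviour is easy to get wrong, so here is a small in-memory simulation of it. It does not use the ZooKeeper client; OneShotWatchSketch and its methods are invented for this illustration and only mimic the "fire once, then forget" rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy data store with one-shot watches (illustration only).
class OneShotWatchSketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    // Reads the value and optionally registers a watch on the path.
    String getData(String path, Consumer<String> watcher) {
        if (watcher != null) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        }
        return data.get(path);
    }

    // Writes the value; registered watches are removed first, then fired once.
    void setData(String path, String value) {
        data.put(path, value);
        List<Consumer<String>> fired = watches.remove(path);
        if (fired != null) {
            for (Consumer<String> w : fired) {
                w.accept(path);
            }
        }
    }

    public static void main(String[] args) {
        final OneShotWatchSketch store = new OneShotWatchSketch();
        final int[] counts = {0, 0};

        // Never re-registers, so it only sees the first change.
        store.getData("/app/config", path -> counts[0]++);

        // Re-registers itself on every notification.
        store.getData("/app/config", new Consumer<String>() {
            @Override
            public void accept(String path) {
                counts[1]++;
                store.getData(path, this);
            }
        });

        store.setData("/app/config", "v1");
        store.setData("/app/config", "v2");
        System.out.println("one-shot watcher fired " + counts[0] + " time(s)");       // 1
        System.out.println("re-registering watcher fired " + counts[1] + " time(s)"); // 2
    }
}
```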
2.6 ACL (Access Control List)
ZooKeeper uses ACLs to control access to znodes. The following permissions are supported:
- CREATE: permission to create child nodes
- READ: permission to read the node's data and list its children
- WRITE: permission to modify the node's data
- DELETE: permission to delete child nodes
- ADMIN: permission to modify the node's ACL
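Internally the five permissions are bit flags that are combined into a single integer. The sketch below copies the bit values used by ZooDefs.Perms (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16) into a standalone class so it runs without the client library. The helper methods allows and describe are hypothetical; describe prints the same letter order that zkCli.sh shows (cdrwa).

```java
// Standalone illustration of ZooKeeper permission bit flags.
class AclPermsSketch {
    static final int READ = 1 << 0;
    static final int WRITE = 1 << 1;
    static final int CREATE = 1 << 2;
    static final int DELETE = 1 << 3;
    static final int ADMIN = 1 << 4;
    static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN;

    // True if every bit in `required` is present in `granted`.
    static boolean allows(int granted, int required) {
        return (granted & required) == required;
    }

    // Renders a permission set using the letters c, d, r, w, a.
    static String describe(int perms) {
        StringBuilder sb = new StringBuilder();
        if ((perms & CREATE) != 0) sb.append('c');
        if ((perms & DELETE) != 0) sb.append('d');
        if ((perms & READ) != 0) sb.append('r');
        if ((perms & WRITE) != 0) sb.append('w');
        if ((perms & ADMIN) != 0) sb.append('a');
        return sb.toString();
    }

    public static void main(String[] args) {
        int readWrite = READ | WRITE;
        System.out.println(describe(readWrite));       // rw
        System.out.println(describe(ALL));             // cdrwa
        System.out.println(allows(readWrite, WRITE));  // true
        System.out.println(allows(readWrite, DELETE)); // false
    }
}
```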
3. ZooKeeper Architecture
ZooKeeper uses a leader-follower architecture, which consists of the following components:
3.1 Server Roles
Servers in a ZooKeeper cluster take one of three roles:
- Leader: handles client write requests and coordinates the cluster
- Follower: handles client read requests, forwards write requests to the Leader, and takes part in voting
- Observer: handles client read requests and forwards write requests to the Leader, but does not take part in voting
3.2 How the Cluster Works
A ZooKeeper cluster is built on the ZAB (ZooKeeper Atomic Broadcast) protocol, which keeps data consistent across the cluster. ZAB has two phases:
- Leader election: when the cluster starts or the Leader fails, a new Leader is elected
- Atomic broadcast: the Leader broadcasts each client write request to the Followers and commits it once a majority has acknowledged it, so that all nodes apply the same updates in the same order
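3.3 Leader Election
ZooKeeper elects its Leader by voting. The main steps are:
- When the cluster starts, each node votes for itself as Leader and sends its vote to the other nodes
- When a node receives votes from other nodes, it updates its own vote according to the voting rules
- A node that receives votes from more than half of the nodes becomes the Leader
- The Leader notifies all nodes that it has become the Leader
- The other nodes become Followers and start receiving commands from the Leader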
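The "voting rules" in the second step can be sketched as an ordering on votes. As far as I understand ZooKeeper's fast leader election, a vote is preferred if it has a higher election epoch, then a higher zxid (the most recent transaction seen), then a higher server id. The sketch below only models that comparison; VoteSketch and its methods are hypothetical, and the real election also involves message exchange and quorum checks that are left out here.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a leader election vote (illustration only).
class VoteSketch {
    final long epoch;
    final long zxid;
    final long serverId;

    VoteSketch(long epoch, long zxid, long serverId) {
        this.epoch = epoch;
        this.zxid = zxid;
        this.serverId = serverId;
    }

    // True if `candidate` should replace `current`:
    // higher epoch wins, then higher zxid, then higher server id.
    static boolean supersedes(VoteSketch candidate, VoteSketch current) {
        if (candidate.epoch != current.epoch) {
            return candidate.epoch > current.epoch;
        }
        if (candidate.zxid != current.zxid) {
            return candidate.zxid > current.zxid;
        }
        return candidate.serverId > current.serverId;
    }

    // Picks the vote that every node would converge on after exchanging votes.
    static VoteSketch preferred(List<VoteSketch> votes) {
        VoteSketch best = votes.get(0);
        for (VoteSketch v : votes) {
            if (supersedes(v, best)) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<VoteSketch> votes = Arrays.asList(
                new VoteSketch(1, 0x100000005L, 1),  // server 1 is behind
                new VoteSketch(1, 0x100000007L, 2),  // servers 2 and 3 are equally up to date
                new VoteSketch(1, 0x100000007L, 3));
        System.out.println("preferred server id: " + preferred(votes).serverId); // 3
    }
}
```

Preferring the highest zxid means the new Leader is a server that has seen the most recent committed writes.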
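Note
A ZooKeeper cluster usually has an odd number of nodes. Elections and writes both need a majority, so an even-sized cluster tolerates no more failures than the odd-sized cluster that is one node smaller. Clusters of 3, 5, or 7 nodes are recommended.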
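The arithmetic behind the odd-number recommendation can be checked with a few lines of code. This is a minimal sketch with made-up names (QuorumSketch, quorumSize, toleratedFailures); it assumes the simple majority quorum described above.

```java
// Majority quorum arithmetic for an ensemble of voting servers.
class QuorumSketch {
    // Smallest number of servers that forms a strict majority.
    static int quorumSize(int servers) {
        return servers / 2 + 1;
    }

    // Number of servers that can fail while a majority is still available.
    static int toleratedFailures(int servers) {
        return servers - quorumSize(servers);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 7; n++) {
            System.out.println(n + " servers: quorum " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

Running it shows that 3 and 4 servers both tolerate one failure and that 5 and 6 both tolerate two, so the extra even-numbered server adds cost without adding fault tolerance.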
4. ZooKeeper Installation and Configuration
4.1 Prerequisites
Before installing ZooKeeper, install a Java JDK, version 1.8 or later.
4.2 Installation Steps
# 1. Download ZooKeeper
$ wget https://dlcdn.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz

# 2. Extract ZooKeeper
$ tar -xzvf apache-zookeeper-3.7.1-bin.tar.gz
$ mv apache-zookeeper-3.7.1-bin /opt/zookeeper

# 3. Configure environment variables
$ export ZOOKEEPER_HOME=/opt/zookeeper
$ export PATH=$PATH:$ZOOKEEPER_HOME/bin

# 4. Create the data directory and the log directory
$ mkdir -p /opt/zookeeper/data /opt/zookeeper/logs

# 5. Create the configuration file
$ cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg
$ vim $ZOOKEEPER_HOME/conf/zoo.cfg
# Edit the configuration file so that it contains:
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/logs
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
# Cluster mode only; omit the server.N lines in single-node mode.
# When several instances share one host, each needs its own dataDir and clientPort.
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

# 6. Create the myid file (can be skipped in single-node mode)
$ echo "1" > /opt/zookeeper/data/myid

# 7. Start ZooKeeper
$ zkServer.sh start

# 8. Verify that ZooKeeper is running
$ zkServer.sh status
# Expect "Mode: standalone" (single-node mode) or "Mode: leader" / "Mode: follower" (cluster mode)

# 9. Connect to ZooKeeper
$ zkCli.sh -server localhost:2181
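The initLimit and syncLimit settings are expressed in ticks, not milliseconds, which is easy to overlook. The following sketch parses the timing values from the configuration above with java.util.Properties and converts them to milliseconds. ZooCfgTimingSketch and limitMillis are hypothetical names, and the configuration is inlined as a string so the example runs without a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Converts tick-based limits from zoo.cfg into milliseconds (illustration only).
class ZooCfgTimingSketch {
    // Returns tickTime multiplied by the given tick-based setting.
    static long limitMillis(Properties cfg, String limitKey) {
        long tickTime = Long.parseLong(cfg.getProperty("tickTime").trim());
        long ticks = Long.parseLong(cfg.getProperty(limitKey).trim());
        return tickTime * ticks;
    }

    static Properties parse(String text) throws IOException {
        Properties cfg = new Properties();
        cfg.load(new StringReader(text));
        return cfg;
    }

    public static void main(String[] args) throws IOException {
        Properties cfg = parse("tickTime=2000\ninitLimit=10\nsyncLimit=5\n");
        // Time a follower has to connect to and sync with the leader at startup.
        System.out.println("initLimit = " + limitMillis(cfg, "initLimit") + " ms"); // 20000 ms
        // Time a follower may lag behind the leader before it is dropped.
        System.out.println("syncLimit = " + limitMillis(cfg, "syncLimit") + " ms"); // 10000 ms
    }
}
```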
5. Basic ZooKeeper Operations
5.1 ZooKeeper Shell
The ZooKeeper Shell is the command-line tool that ships with ZooKeeper for interacting with a ZooKeeper server.
# Connect to ZooKeeper
$ zkCli.sh -server localhost:2181

# Create a node
[zk: localhost:2181(CONNECTED) 0] create /app1 "app1 data"
Created /app1

# Create a child node
[zk: localhost:2181(CONNECTED) 1] create /app1/config "config data"
Created /app1/config

# Create an ephemeral node
[zk: localhost:2181(CONNECTED) 2] create -e /app1/temp "temp data"
Created /app1/temp

# Create a sequential node (the suffix comes from the parent's child version, which is 2 after the two creates above)
[zk: localhost:2181(CONNECTED) 3] create -s /app1/seq "seq data"
Created /app1/seq0000000002

# View the node's data and metadata (-s prints the stat fields)
[zk: localhost:2181(CONNECTED) 4] get -s /app1
app1 data
cZxid = 0x100000002
ctime = Wed Aug 09 10:00:00 CST 2023
mZxid = 0x100000002
mtime = Wed Aug 09 10:00:00 CST 2023
pZxid = 0x100000005
cversion = 3
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 9
numChildren = 3

# Modify the node's data
[zk: localhost:2181(CONNECTED) 5] set /app1 "updated app1 data"

# List child nodes
[zk: localhost:2181(CONNECTED) 6] ls /app1
[config, seq0000000002, temp]

# Delete a node
[zk: localhost:2181(CONNECTED) 7] delete /app1/temp

# Delete a node together with its children
[zk: localhost:2181(CONNECTED) 8] deleteall /app1

# Exit the ZooKeeper Shell
[zk: localhost:2181(CONNECTED) 9] quit
5.2 Java API
ZooKeeper provides a Java API for writing Java programs that interact with ZooKeeper.
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ZookeeperExample {
    private static final String CONNECT_STRING = "localhost:2181";
    private static final int SESSION_TIMEOUT = 30000;
    private static final String PATH = "/app1";

    private static ZooKeeper zk;
    private static final CountDownLatch connectedSignal = new CountDownLatch(1);

    // Connection watcher: releases the latch once the session is connected
    private static final Watcher watcher = event -> {
        if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
            connectedSignal.countDown();
        }
    };

    public static void main(String[] args) throws IOException, InterruptedException, KeeperException {
        // 1. Connect to ZooKeeper
        connect();
        try {
            // 2. Create nodes
            createNode();
            // 3. Read node data
            getData();
            // 4. Modify node data
            setData();
            // 5. List child nodes
            getChildren();
            // 6. Delete nodes
            deleteNode();
        } finally {
            // 7. Close the connection
            close();
        }
    }

    // Connect to ZooKeeper and block until the session is established
    private static void connect() throws IOException, InterruptedException {
        zk = new ZooKeeper(CONNECT_STRING, SESSION_TIMEOUT, watcher);
        connectedSignal.await();
        System.out.println("Connected to Zookeeper");
    }

    // Create a persistent node and a child node (fails if /app1 already exists)
    private static void createNode() throws KeeperException, InterruptedException {
        String path = zk.create(PATH, "app1 data".getBytes(StandardCharsets.UTF_8),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println("Created node: " + path);
        // Create a child node
        String childPath = zk.create(PATH + "/config", "config data".getBytes(StandardCharsets.UTF_8),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println("Created child node: " + childPath);
    }

    // Read node data (no watch, no Stat requested)
    private static void getData() throws KeeperException, InterruptedException {
        byte[] data = zk.getData(PATH, false, null);
        System.out.println("Data at " + PATH + ": " + new String(data, StandardCharsets.UTF_8));
    }

    // Modify node data; version -1 skips the version check
    private static void setData() throws KeeperException, InterruptedException {
        Stat stat = zk.setData(PATH, "updated app1 data".getBytes(StandardCharsets.UTF_8), -1);
        System.out.println("Updated data at " + PATH + ", dataVersion: " + stat.getVersion());
    }

    // List child nodes
    private static void getChildren() throws KeeperException, InterruptedException {
        List<String> children = zk.getChildren(PATH, false);
        System.out.println("Children of " + PATH + ": " + children);
    }

    // Delete nodes; a node with children cannot be deleted
    private static void deleteNode() throws KeeperException, InterruptedException {
        // Delete the child node first
        zk.delete(PATH + "/config", -1);
        System.out.println("Deleted child node: " + PATH + "/config");
        // Then delete the parent node
        zk.delete(PATH, -1);
        System.out.println("Deleted node: " + PATH);
    }

    // Close the connection
    private static void close() throws InterruptedException {
        zk.close();
        System.out.println("Disconnected from Zookeeper");
    }
}
6. ZooKeeper Application Examples
6.1 Configuration Management
Use ZooKeeper to manage the configuration of a distributed application:
// 1. Create the configuration node in ZooKeeper (the parent node /app must exist first)
$ zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] create /app "app root"
[zk: localhost:2181(CONNECTED) 1] create /app/config "key1=value1;key2=value2"

// 2. The client reads the configuration node and sets a watch on it
byte[] data = zk.getData("/app/config", watcher, null);
String config = new String(data);
// Parse the configuration (parseConfig is an application-defined helper)
Map<String, String> configMap = parseConfig(config);

// 3. When the configuration changes, ZooKeeper notifies the client
@Override
public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeDataChanged) {
        try {
            // Re-read the data; passing this registers the one-time watch again
            byte[] newData = zk.getData("/app/config", this, null);
            String newConfig = new String(newData);
            // Update the configuration
            configMap = parseConfig(newConfig);
            System.out.println("Config updated: " + newConfig);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
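The example above calls parseConfig without defining it. One possible implementation, assuming the "key=value;key=value" format used in the create command, is sketched below. The class name ConfigParserSketch is made up, and a real application might prefer a structured format such as JSON or properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// One possible parseConfig for "key1=value1;key2=value2" strings (illustration only).
class ConfigParserSketch {
    static Map<String, String> parseConfig(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return result;
        }
        for (String pair : raw.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty entries and entries without a key
            }
            String key = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseConfig("key1=value1;key2=value2")); // {key1=value1, key2=value2}
    }
}
```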
6.2 Distributed Lock
Use ZooKeeper to implement a distributed lock:
import org.apache.zookeeper.*;

import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DistributedLock {
    // The persistent parent node /lock must already exist
    private static final String LOCK_PATH = "/lock";

    private final ZooKeeper zk;
    private final String threadName;
    private String lockNode;

    public DistributedLock(ZooKeeper zk, String threadName) {
        this.zk = zk;
        this.threadName = threadName;
    }

    // Acquire the lock
    public boolean lock() throws KeeperException, InterruptedException {
        // Create an ephemeral sequential node
        lockNode = zk.create(LOCK_PATH + "/lock_", threadName.getBytes(StandardCharsets.UTF_8),
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println(threadName + " created node: " + lockNode);
        String myName = lockNode.substring(LOCK_PATH.length() + 1);

        while (true) {
            // Get all lock nodes and sort them by sequence number
            List<String> lockNodes = zk.getChildren(LOCK_PATH, false);
            Collections.sort(lockNodes);

            // Check whether the current node is the first one
            int index = lockNodes.indexOf(myName);
            if (index < 0) {
                // Our own node is gone (for example, the session expired)
                return false;
            }
            if (index == 0) {
                System.out.println(threadName + " got the lock");
                return true;
            }

            // Watch the previous lock node
            String prevLockNode = LOCK_PATH + "/" + lockNodes.get(index - 1);
            final CountDownLatch latch = new CountDownLatch(1);
            Watcher watcher = event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    latch.countDown();
                }
            };
            // exists() returns null if the previous node is already gone; in that case re-check at once
            if (zk.exists(prevLockNode, watcher) != null) {
                // Wait for the previous lock node to be released, then re-check the order
                latch.await();
            }
        }
    }

    // Release the lock
    public void unlock() throws KeeperException, InterruptedException {
        zk.delete(lockNode, -1);
        System.out.println(threadName + " released the lock");
    }
}
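The core decision in the lock recipe is which node to watch. The sketch below isolates that step so it can be run without a ZooKeeper server: given the children of the lock node and the caller's own node name, it returns the immediate predecessor, or null if the caller already holds the lock. LockOrderSketch and nodeToWatch are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Picks the node a lock waiter should watch (illustration only).
class LockOrderSketch {
    // Returns null if myName has the lowest sequence number (lock held),
    // otherwise the name of the node immediately before it.
    static String nodeToWatch(List<String> children, String myName) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        int index = sorted.indexOf(myName);
        if (index < 0) {
            throw new IllegalArgumentException("own node not found: " + myName);
        }
        return index == 0 ? null : sorted.get(index - 1);
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
                "lock_0000000007", "lock_0000000005", "lock_0000000006");
        System.out.println(nodeToWatch(children, "lock_0000000005")); // null
        System.out.println(nodeToWatch(children, "lock_0000000007")); // lock_0000000006
    }
}
```

Watching only the immediate predecessor, rather than the lowest node, means that a release wakes up a single waiter instead of all of them.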
Practice Exercises
Exercise 1: ZooKeeper Installation and Configuration
Install and configure ZooKeeper, and set up a 3-node ZooKeeper cluster.
Exercise 2: ZooKeeper Shell Operations
Use the ZooKeeper Shell to complete the following operations:
- Create a root node named "myapp"
- Under that node, create three child nodes: "config", "servers", and "clients"
- Under the "servers" node, create 3 sequential nodes
- View the data of the "myapp" node
- Modify the data of the "config" node
- List the children of the "servers" node
- Delete all of the nodes
Exercise 3: Java API Operations
Write a Java program that uses the ZooKeeper Java API to complete the following operations:
- Connect to ZooKeeper
- Create a node named "test"
- Read the node's data
- Modify the node's data
- Watch the node for data changes
- Delete the node
- Close the connection
Exercise 4: Configuration Management Example
Use ZooKeeper to implement a simple configuration management system that:
- Stores configuration information in ZooKeeper
- Has clients watch for configuration changes
- Updates each client's configuration automatically when the configuration changes
7. ZooKeeper Best Practices
7.1 Cluster Design
- Node count: use an odd number of nodes; 3, 5, or 7 nodes are recommended
- Hardware: use high-performance servers with enough memory and disk space
- Network: make sure the network connections between nodes are stable
- Data directories: put the data directory and the log directory on different disks to improve performance
7.2 Node Design
- Node naming: use meaningful node names to make management easier
- Node data size: keep the data stored in each node small; it should not exceed 1 MB
- Node type: choose the node type that fits the actual requirements
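7.3 Performance Optimization
- Add servers: adding nodes to the ZooKeeper cluster increases read capacity; because every write must be acknowledged by a majority of voting nodes, adding voting nodes does not speed up writes
- Use Observers: for read-heavy, write-light workloads, Observer nodes improve read performance without adding voters
- Reduce watch events: use the watch mechanism judiciously and avoid overusing it
- Batch operations: batch operations together to reduce network overhead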
7.4 Reliability Design
- Data backup: back up the ZooKeeper data directory regularly
- Monitoring and alerting: set up monitoring and alerting so that problems are found and resolved promptly
- Fault tolerance: design a fault-tolerance mechanism so that applications keep running normally when ZooKeeper fails
8. Summary
This tutorial introduced the basic concepts, architecture, and usage of the ZooKeeper distributed coordination service. As an open-source distributed coordination service, ZooKeeper is simple to use, highly reliable, highly available, and fast, and it is widely used in distributed systems.
ZooKeeper's core concepts include the data model, znode types, sessions, the watch mechanism, and ACLs. ZooKeeper uses a leader-follower architecture and relies on the ZAB protocol to keep data consistent. A ZooKeeper cluster usually has an odd number of nodes, because an even-sized cluster tolerates no more failures than the odd-sized cluster that is one node smaller.
ZooKeeper's use cases include configuration management, naming services, distributed locks, cluster management, leader election, and queue management. Using ZooKeeper simplifies the development of distributed systems and improves their reliability and availability.
When using ZooKeeper, follow the best practices for cluster design, node design, performance optimization, and reliability design to get the best performance and reliability from it.