ZooKeeper Distributed Coordination Service

1. Introduction to ZooKeeper

ZooKeeper is an open-source distributed coordination service used to manage configuration information, naming, distributed locks, and cluster membership for distributed applications. It provides a simple yet powerful API for building reliable distributed systems.

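1.1 ZooKeeper design goals

Ease of use: provides a simple API so that developers can get started quickly
High reliability: data is replicated across servers, so it is not lost when a node fails
High availability: supports automatic failover to keep the service running without interruption
High performance: supports highly concurrent read and write operations
Strictly ordered access: requests from a client are executed strictly in the order in which that client sent them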

1.2 ZooKeeper use cases

Configuration management: centrally manage the configuration information of distributed applications
Naming service: provide a naming service for distributed applications
Distributed locks: implement locking in a distributed system
Cluster management: track node status and membership in a distributed cluster
Leader election: elect a leader in a distributed system
Queue management: implement distributed queues and priority queues

2. ZooKeeper core concepts

2.1 Data model

ZooKeeper's data model resembles a file system. It is a tree of nodes, each of which is called a znode. A znode can store data and can also contain child nodes.

/                      # root node
├── app1                # root node of application 1
│   ├── config          # configuration node of application 1
│   └── servers         # servers node of application 1
│       ├── server1     # node for server 1
│       └── server2     # node for server 2
└── app2                # root node of application 2
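
To make the tree structure concrete, here is a minimal in-memory sketch of a znode namespace. It is a hypothetical model written for illustration, not the ZooKeeper API: the class name `ZnodeTreeSketch` and its methods are assumptions. It shows two properties of the data model: every node is addressed by an absolute path, and a node can be created only under an existing parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of a znode namespace; not the ZooKeeper API.
final class ZnodeTreeSketch {
    private final TreeMap<String, String> nodes = new TreeMap<>();

    ZnodeTreeSketch() {
        nodes.put("/", ""); // the root node always exists
    }

    // Returns the parent of an absolute path, e.g. "/app1/config" -> "/app1".
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    // Creates a node; fails if the parent is missing or the node already exists.
    void create(String path, String data) {
        if (!nodes.containsKey(parentOf(path))) {
            throw new IllegalStateException("parent does not exist: " + parentOf(path));
        }
        if (nodes.putIfAbsent(path, data) != null) {
            throw new IllegalStateException("node already exists: " + path);
        }
    }

    // Lists the names of the direct children of a node, in sorted order.
    List<String> children(String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> result = new ArrayList<>();
        for (String candidate : nodes.keySet()) {
            if (candidate.startsWith(prefix)
                    && candidate.length() > prefix.length()
                    && candidate.indexOf('/', prefix.length()) < 0) {
                result.add(candidate.substring(prefix.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ZnodeTreeSketch tree = new ZnodeTreeSketch();
        tree.create("/app1", "");
        tree.create("/app1/config", "key1=value1");
        tree.create("/app1/servers", "");
        tree.create("/app1/servers/server1", "");
        tree.create("/app2", "");
        System.out.println(tree.children("/"));     // [app1, app2]
        System.out.println(tree.children("/app1")); // [config, servers]
    }
}
```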

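2.2 Znode types

A znode in ZooKeeper has one of four basic types:

Persistent node (PERSISTENT): once created, it exists until it is explicitly deleted
Persistent sequential node (PERSISTENT_SEQUENTIAL): a persistent node whose name ZooKeeper automatically suffixes with an increasing sequence number
Ephemeral node (EPHEMERAL): deleted automatically when the client session that created it ends
Ephemeral sequential node (EPHEMERAL_SEQUENTIAL): an ephemeral node whose name ZooKeeper automatically suffixes with an increasing sequence number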
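
The sequence suffix can be illustrated with a short sketch. It assumes the suffix is a zero-padded 10-digit decimal counter, as in names like `/app1/seq0000000000`. The class and method names are hypothetical, and in a real cluster the server, not the client, chooses the counter.

```java
import java.util.Locale;

// Hypothetical helper that mimics how a sequential znode name is formed.
final class SequentialNameSketch {
    // Appends a zero-padded 10-digit counter to the requested path.
    static String sequentialPath(String requestedPath, int counter) {
        return requestedPath + String.format(Locale.ROOT, "%010d", counter);
    }

    public static void main(String[] args) {
        System.out.println(sequentialPath("/app1/seq", 0));   // /app1/seq0000000000
        System.out.println(sequentialPath("/lock/lock_", 12)); // /lock/lock_0000000012
    }
}
```

Because the counter is zero-padded to a fixed width, sorting the names as plain strings gives the same order as sorting by sequence number. Recipes such as distributed locks rely on this property.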

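2.3 Version numbers

Each znode has three version numbers:

dataVersion: data version, incremented each time the node's data is modified
cversion: child version, incremented each time a child node is created or deleted
aclVersion: ACL version, incremented each time the node's ACL is modified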
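
Version numbers make conditional updates possible: a write can state the version it expects and is rejected if the node has changed in the meantime. The sketch below simulates this check for dataVersion in plain Java. `VersionedNodeSketch` is a hypothetical class. The real client API signals a mismatch with an exception instead of a boolean, and it accepts -1 as "any version", which the sketch mirrors.

```java
// Hypothetical simulation of a version-checked update; not the ZooKeeper API.
final class VersionedNodeSketch {
    private String data = "";
    private int dataVersion = 0;

    // Applies the update only if expectedVersion matches; -1 matches any version.
    synchronized boolean setData(String newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != dataVersion) {
            return false; // someone else modified the node first
        }
        data = newData;
        dataVersion++;
        return true;
    }

    synchronized int getDataVersion() {
        return dataVersion;
    }

    synchronized String getData() {
        return data;
    }

    public static void main(String[] args) {
        VersionedNodeSketch node = new VersionedNodeSketch();
        System.out.println(node.setData("v1", 0));  // true, version becomes 1
        System.out.println(node.setData("v2", 0));  // false, version 0 is stale
        System.out.println(node.setData("v3", -1)); // true, unconditional update
        System.out.println(node.getDataVersion());  // 2
    }
}
```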

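2.4 Sessions

After a client connects to a ZooKeeper server, a session is created. Sessions have the following characteristics:

A session has a timeout, which the client requests when it connects (for example, 30 seconds); the server restricts it to a range that defaults to between 2 and 20 times tickTime
The client must send heartbeat messages periodically to keep the session alive
After a session expires, the client must establish a new session
When a session ends, the ephemeral nodes created by that session are deleted automatically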
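
The timeout a session actually gets is negotiated with the server. As a rough sketch, assuming the server keeps its default bounds of 2 x tickTime and 20 x tickTime, the negotiation clamps the requested value into that range. The class and method names here are hypothetical.

```java
// Hypothetical sketch of session timeout negotiation under default server bounds.
final class SessionTimeoutSketch {
    // Clamps the requested timeout into [2 * tickTime, 20 * tickTime].
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tickTime = 2000;
        System.out.println(negotiate(30000, tickTime)); // 30000, within bounds
        System.out.println(negotiate(1000, tickTime));  // 4000, raised to the minimum
        System.out.println(negotiate(60000, tickTime)); // 40000, lowered to the maximum
    }
}
```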

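2.5 Watches

ZooKeeper provides a watch mechanism: a client can register a watch on a znode, and ZooKeeper notifies the client when that znode changes. Watch events include:

NodeCreated: a node was created
NodeDeleted: a node was deleted
NodeDataChanged: a node's data changed
NodeChildrenChanged: a node's children changed

Note

A standard ZooKeeper watch is one-time: once it has been triggered it is removed, and the client must register it again to receive further notifications. (ZooKeeper 3.6 and later also offer persistent watches through addWatch.)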
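
The one-time behaviour is easy to get wrong, so here is a small simulation of it in plain Java. `OneShotWatchSketch` is a hypothetical class, not part of ZooKeeper. A triggered watch is removed before its callback runs, so a second change goes unnoticed unless the callback registers again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical simulation of one-time watches; not the ZooKeeper API.
final class OneShotWatchSketch {
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    // Registers a callback that fires at most once for the given path.
    void watch(String path, Consumer<String> callback) {
        watches.computeIfAbsent(path, k -> new ArrayList<>()).add(callback);
    }

    // Removes all watches on the path, fires them, and returns how many fired.
    int trigger(String path, String eventType) {
        List<Consumer<String>> fired = watches.remove(path);
        if (fired == null) {
            return 0;
        }
        for (Consumer<String> callback : fired) {
            callback.accept(eventType);
        }
        return fired.size();
    }

    public static void main(String[] args) {
        OneShotWatchSketch registry = new OneShotWatchSketch();
        registry.watch("/app/config", type -> System.out.println("event: " + type));
        System.out.println(registry.trigger("/app/config", "NodeDataChanged")); // 1
        System.out.println(registry.trigger("/app/config", "NodeDataChanged")); // 0
    }
}
```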

2.6 ACL (Access Control List)

ZooKeeper uses ACLs to control access to znodes. The following permissions are supported:

CREATE: permission to create child nodes
READ: permission to read a node's data and list its children
WRITE: permission to modify a node's data
DELETE: permission to delete child nodes
ADMIN: permission to modify the node's ACL
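
Permissions are combined as bit flags. The sketch below defines its own constants with the same values as the ones in the Java client's `ZooDefs.Perms` (READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16). The class `AclPermsSketch` and its `allows` helper are hypothetical and only illustrate how a permission mask is checked.

```java
// Hypothetical sketch of ACL permission bit masks; constants mirror ZooDefs.Perms.
final class AclPermsSketch {
    static final int READ = 1 << 0;
    static final int WRITE = 1 << 1;
    static final int CREATE = 1 << 2;
    static final int DELETE = 1 << 3;
    static final int ADMIN = 1 << 4;
    static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN;

    // Returns true if every bit in 'required' is present in 'granted'.
    static boolean allows(int granted, int required) {
        return (granted & required) == required;
    }

    public static void main(String[] args) {
        int readWrite = READ | WRITE;
        System.out.println(allows(readWrite, READ));   // true
        System.out.println(allows(readWrite, DELETE)); // false
        System.out.println(ALL);                       // 31
    }
}
```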

3. ZooKeeper architecture

ZooKeeper adopts a leader-follower architecture, which consists mainly of the following components:

3.1 Server roles

Servers in a ZooKeeper cluster take one of three roles:

Leader: handles client write requests and coordinates the cluster
Follower: handles client read requests, forwards write requests to the Leader, and takes part in voting
Observer: handles client read requests and forwards write requests to the Leader, but does not take part in voting or elections

3.2 How the cluster works

A ZooKeeper cluster operates on the ZAB (ZooKeeper Atomic Broadcast) protocol, which keeps the data in the cluster consistent. The ZAB protocol has two phases:

Leader election phase: when the cluster starts or the Leader fails, a new Leader is elected
Atomic broadcast phase: the Leader broadcasts each client write request to all Followers and commits it once a majority has acknowledged it, which keeps the data on all nodes consistent
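
Each write that the Leader broadcasts is identified by a 64-bit transaction ID (zxid), which appears in node metadata as values such as `0x100000002`. The sketch below assumes the usual layout: the high 32 bits hold the leader epoch and the low 32 bits hold a counter within that epoch. The class and method names are hypothetical.

```java
// Hypothetical helper that splits a zxid into its epoch and counter parts.
final class ZxidSketch {
    // High 32 bits: the epoch, which increases with each new Leader.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits: the counter of transactions within the epoch.
    static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long zxid = 0x100000002L;
        System.out.println(epoch(zxid));   // 1
        System.out.println(counter(zxid)); // 2
    }
}
```

Comparing two zxids as plain numbers therefore orders transactions first by epoch and then by their position within the epoch.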

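3.3 Leader election

ZooKeeper uses a voting-based leader election algorithm. The main steps are as follows:

When the cluster starts, each node votes for itself as Leader and sends its vote to the other nodes
When a node receives votes from other nodes, it updates its own vote according to the voting rules
If a node receives votes from more than half of the nodes, it becomes the Leader
The Leader notifies all nodes that it has become the Leader
The other nodes become Followers and start receiving commands from the Leader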
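
The voting rules in the second step can be sketched as a comparison between two votes. This sketch assumes the ordering used by ZooKeeper's default election: the vote with the higher epoch wins, then the one with the higher zxid (the more up-to-date data), then the one with the higher server ID. The class and method names are hypothetical.

```java
// Hypothetical sketch of how a node decides whether to adopt a received vote.
final class VoteOrderSketch {
    // Returns true if the received vote should replace the node's current vote.
    static boolean supersedes(long newEpoch, long newZxid, long newServerId,
                              long curEpoch, long curZxid, long curServerId) {
        if (newEpoch != curEpoch) {
            return newEpoch > curEpoch;
        }
        if (newZxid != curZxid) {
            return newZxid > curZxid;
        }
        return newServerId > curServerId;
    }

    public static void main(String[] args) {
        // Same epoch, but the candidate has seen more transactions: adopt it.
        System.out.println(supersedes(1, 5, 1, 1, 4, 3)); // true
        // Identical epoch and zxid: the higher server ID wins, so keep server 3.
        System.out.println(supersedes(1, 5, 2, 1, 5, 3)); // false
    }
}
```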

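Note

A ZooKeeper cluster usually has an odd number of nodes. The cluster stays available only while a majority of its nodes is running, so adding one node to an odd-sized cluster does not increase the number of failures it can tolerate: clusters of 3 and of 4 nodes both tolerate a single failure. A cluster size of 3, 5, or 7 nodes is recommended.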
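
The arithmetic behind this recommendation is short enough to check directly. The sketch below uses hypothetical names and computes the majority size and the number of tolerated failures for several cluster sizes.

```java
// Hypothetical sketch of majority-quorum arithmetic.
final class QuorumSketch {
    // Smallest number of nodes that forms a majority.
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // Number of nodes that can fail while a majority remains.
    static int toleratedFailures(int clusterSize) {
        return clusterSize - majority(clusterSize);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 7; n++) {
            System.out.println(n + " nodes: majority " + majority(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

For 3 and 4 nodes the tolerated failures are both 1, and for 5 and 6 nodes both 2, so the extra node in an even-sized cluster adds cost without adding fault tolerance.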

4. Installing and configuring ZooKeeper

4.1 Prerequisites

Before installing ZooKeeper, install a Java JDK, version 1.8 or later.

4.2 Installation steps

# 1. Download ZooKeeper
$ wget https://dlcdn.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz

# 2. Extract ZooKeeper
$ tar -xzvf apache-zookeeper-3.7.1-bin.tar.gz
$ mv apache-zookeeper-3.7.1-bin /opt/zookeeper

# 3. Configure environment variables
$ export ZOOKEEPER_HOME=/opt/zookeeper
$ export PATH=$PATH:$ZOOKEEPER_HOME/bin

# 4. Create the data directory and the log directory
$ mkdir -p /opt/zookeeper/data /opt/zookeeper/logs

# 5. Create the configuration file
$ cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg
$ vim $ZOOKEEPER_HOME/conf/zoo.cfg

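# Edit the configuration file so that it contains:
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/logs
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
# The server.N lines are needed only in cluster mode; omit them for a single node.
# Format: server.<myid>=<host>:<peer port>:<election port>
# Running all three servers on localhost also requires a separate dataDir and clientPort for each instance.
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

# 6. Create the myid file (can be skipped in single-node mode)
$ echo "1" > /opt/zookeeper/data/myid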

# 7. Start ZooKeeper
$ zkServer.sh start

# 8. Verify that ZooKeeper is running
$ zkServer.sh status
# Expect "Mode: standalone" (single-node mode) or "Mode: leader" / "Mode: follower" (cluster mode)

# 9. Connect to ZooKeeper
$ zkCli.sh -server localhost:2181
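
The initLimit and syncLimit settings in the configuration file are counted in ticks, not milliseconds, which is easy to misread. As a small sketch, the code below parses a configuration in the format shown above with `java.util.Properties` and converts both limits to milliseconds. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper that converts tick-based limits from zoo.cfg into milliseconds.
final class ZooCfgTimingSketch {
    // Parses "key=value" lines; lines starting with '#' are treated as comments.
    static Properties parse(String cfg) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Multiplies a tick-based limit (initLimit or syncLimit) by tickTime.
    static int limitMillis(Properties props, String limitKey) {
        int tickTime = Integer.parseInt(props.getProperty("tickTime").trim());
        int ticks = Integer.parseInt(props.getProperty(limitKey).trim());
        return tickTime * ticks;
    }

    public static void main(String[] args) {
        Properties props = parse("tickTime=2000\ninitLimit=10\nsyncLimit=5\n");
        System.out.println(limitMillis(props, "initLimit")); // 20000
        System.out.println(limitMillis(props, "syncLimit")); // 10000
    }
}
```

With the values used in this tutorial, a Follower has 20 seconds to connect to and sync with the Leader at startup, and may lag behind the Leader by at most 10 seconds afterwards.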

5. Basic ZooKeeper operations

5.1 ZooKeeper shell

The ZooKeeper shell (zkCli.sh) is the command-line tool that ships with ZooKeeper for interacting with a ZooKeeper server.

# Connect to ZooKeeper
$ zkCli.sh -server localhost:2181

# Create a node
[zk: localhost:2181(CONNECTED) 0] create /app1 "app1 data"
Created /app1

# Create a child node
[zk: localhost:2181(CONNECTED) 1] create /app1/config "config data"
Created /app1/config

# Create an ephemeral node
[zk: localhost:2181(CONNECTED) 2] create -e /app1/temp "temp data"
Created /app1/temp

# Create a sequential node (the suffix continues the parent's child counter)
[zk: localhost:2181(CONNECTED) 3] create -s /app1/seq "seq data"
Created /app1/seq0000000002

# View a node's data and metadata (-s prints the stat fields in ZooKeeper 3.5 and later)
[zk: localhost:2181(CONNECTED) 4] get -s /app1
app1 data
cZxid = 0x100000002
ctime = Wed Aug 09 10:00:00 CST 2023
mZxid = 0x100000002
mtime = Wed Aug 09 10:00:00 CST 2023
pZxid = 0x100000005
cversion = 3
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 9
numChildren = 3

# Modify a node's data
[zk: localhost:2181(CONNECTED) 5] set /app1 "updated app1 data"

# List child nodes
[zk: localhost:2181(CONNECTED) 6] ls /app1
[config, seq0000000002, temp]

# Delete a node
[zk: localhost:2181(CONNECTED) 7] delete /app1/temp

# Delete a node together with its children
[zk: localhost:2181(CONNECTED) 8] deleteall /app1

# Exit the ZooKeeper shell
[zk: localhost:2181(CONNECTED) 9] quit

5.2 Java API

ZooKeeper provides a Java API for writing Java programs that interact with ZooKeeper.

import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ZookeeperExample {
    private static final String CONNECT_STRING = "localhost:2181";
    private static final int SESSION_TIMEOUT = 30000; // requested session timeout in milliseconds
    private static final String PATH = "/app1";
    private static ZooKeeper zk;
    private static final CountDownLatch connectedSignal = new CountDownLatch(1);

    // Connection watcher: releases the latch once the session is connected
    private static final Watcher watcher = event -> {
        if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
            connectedSignal.countDown();
        }
    };

    public static void main(String[] args) throws IOException, InterruptedException, KeeperException {
        // 1. Connect to ZooKeeper
        connect();

        try {
            // 2. Create nodes
            createNode();

            // 3. Read node data
            getData();

            // 4. Modify node data
            setData();

            // 5. List child nodes
            getChildren();

            // 6. Delete nodes
            deleteNode();
        } finally {
            // 7. Close the connection
            close();
        }
    }

    // Connect to ZooKeeper and wait until the session is established
    private static void connect() throws IOException, InterruptedException {
        zk = new ZooKeeper(CONNECT_STRING, SESSION_TIMEOUT, watcher);
        connectedSignal.await();
        System.out.println("Connected to Zookeeper");
    }

    // Create a persistent node and a persistent child node
    private static void createNode() throws KeeperException, InterruptedException {
        String path = zk.create(PATH, "app1 data".getBytes(StandardCharsets.UTF_8), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println("Created node: " + path);

        // Create the child node
        String childPath = zk.create(PATH + "/config", "config data".getBytes(StandardCharsets.UTF_8), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println("Created child node: " + childPath);
    }

    // Read node data without setting a watch
    private static void getData() throws KeeperException, InterruptedException {
        byte[] data = zk.getData(PATH, false, null);
        System.out.println("Data at " + PATH + ": " + new String(data, StandardCharsets.UTF_8));
    }

    // Modify node data; version -1 matches any dataVersion
    private static void setData() throws KeeperException, InterruptedException {
        Stat stat = zk.setData(PATH, "updated app1 data".getBytes(StandardCharsets.UTF_8), -1);
        System.out.println("Updated data at " + PATH + ", dataVersion: " + stat.getDataVersion());
    }

    // List child nodes
    private static void getChildren() throws KeeperException, InterruptedException {
        List<String> children = zk.getChildren(PATH, false);
        System.out.println("Children of " + PATH + ": " + children);
    }

    // Delete nodes
    private static void deleteNode() throws KeeperException, InterruptedException {
        // Delete the child node first
        zk.delete(PATH + "/config", -1);
        System.out.println("Deleted child node: " + PATH + "/config");

        // Then delete the parent node
        zk.delete(PATH, -1);
        System.out.println("Deleted node: " + PATH);
    }

    // Close the connection
    private static void close() throws InterruptedException {
        zk.close();
        System.out.println("Disconnected from Zookeeper");
    }
}

6. ZooKeeper application examples

6.1 Configuration management

Using ZooKeeper to manage the configuration information of a distributed application:

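// 1. Create the configuration node in ZooKeeper (the parent node /app must exist first)
$ zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] create /app
[zk: localhost:2181(CONNECTED) 1] create /app/config "key1=value1;key2=value2"

// 2. The client reads the configuration node and sets a watch on it
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;
import java.util.Map;

public class ConfigWatcher implements Watcher {
    private final ZooKeeper zk;
    private volatile Map<String, String> configMap;

    public ConfigWatcher(ZooKeeper zk) {
        this.zk = zk;
    }

    public void start() throws KeeperException, InterruptedException {
        byte[] data = zk.getData("/app/config", this, null);
        String config = new String(data, StandardCharsets.UTF_8);
        // Parse the configuration; parseConfig is an application-defined helper not shown here
        configMap = parseConfig(config);
    }

    // 3. When the configuration changes, ZooKeeper notifies the client
    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try {
                // Re-read the data and register the one-time watch again
                byte[] newData = zk.getData("/app/config", this, null);
                String newConfig = new String(newData, StandardCharsets.UTF_8);
                // Update the configuration
                configMap = parseConfig(newConfig);
                System.out.println("Config updated: " + newConfig);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}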
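
The example above calls a `parseConfig` helper that the tutorial does not define. One possible version is sketched below, assuming the `key1=value1;key2=value2` format used for the configuration node. The class name `ConfigParserSketch` is hypothetical, and a real application might prefer a standard format such as JSON or Java properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for "key1=value1;key2=value2" configuration strings.
final class ConfigParserSketch {
    // Splits on ';' and then on the first '=' of each pair; malformed pairs are skipped.
    static Map<String, String> parseConfig(String config) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : config.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = parseConfig("key1=value1;key2=value2");
        System.out.println(config); // {key1=value1, key2=value2}
    }
}
```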

6.2 Distributed lock

Using ZooKeeper to implement a distributed lock:

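import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DistributedLock {
    // The persistent parent node /lock must already exist
    private static final String LOCK_PATH = "/lock";
    private final ZooKeeper zk;
    private final String threadName;
    private String lockNode;

    public DistributedLock(ZooKeeper zk, String threadName) {
        this.zk = zk;
        this.threadName = threadName;
    }

    // Acquire the lock; blocks until the lock is obtained
    public boolean lock() throws KeeperException, InterruptedException {
        // Create an ephemeral sequential node
        lockNode = zk.create(LOCK_PATH + "/lock_", threadName.getBytes(StandardCharsets.UTF_8),
                            ZooDefs.Ids.OPEN_ACL_UNSAFE,
                            CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println(threadName + " created node: " + lockNode);
        String myName = lockNode.substring(LOCK_PATH.length() + 1);

        while (true) {
            // Get all lock nodes and sort them by sequence number
            List<String> lockNodes = zk.getChildren(LOCK_PATH, false);
            Collections.sort(lockNodes);

            // The node with the smallest sequence number holds the lock
            int index = lockNodes.indexOf(myName);
            if (index < 0) {
                throw new IllegalStateException("Lock node no longer exists: " + lockNode);
            }
            if (index == 0) {
                System.out.println(threadName + " got the lock");
                return true;
            }

            // Watch the lock node immediately ahead of this one
            String prevLockNode = LOCK_PATH + "/" + lockNodes.get(index - 1);
            final CountDownLatch latch = new CountDownLatch(1);
            Watcher watcher = event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    latch.countDown();
                }
            };
            // exists() returns null if the previous node is already gone; then re-check at once
            if (zk.exists(prevLockNode, watcher) != null) {
                // Wait for the previous lock node to be released, then re-check the order
                latch.await();
            }
        }
    }

    // Release the lock
    public void unlock() throws KeeperException, InterruptedException {
        zk.delete(lockNode, -1);
        System.out.println(threadName + " released the lock");
    }
}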
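
The core decision in the lock is which node to wait for. The sketch below isolates that step as a pure function so it can be tested without a ZooKeeper server. The class and method names are hypothetical, and it assumes child names of the form `lock_` followed by the 10-digit sequence number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that picks the lock node a waiter should watch.
final class LockOrderSketch {
    // Returns the child immediately ahead of myName, or null if myName holds the lock.
    static String predecessor(List<String> children, String myName) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        int index = sorted.indexOf(myName);
        if (index < 0) {
            throw new IllegalArgumentException("not a child: " + myName);
        }
        return index == 0 ? null : sorted.get(index - 1);
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
                "lock_0000000002", "lock_0000000000", "lock_0000000001");
        System.out.println(predecessor(children, "lock_0000000001")); // lock_0000000000
        System.out.println(predecessor(children, "lock_0000000000")); // null
    }
}
```

Watching only the immediate predecessor, rather than the whole parent node, means that releasing the lock wakes a single waiter instead of all of them.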

Hands-on exercises

Exercise 1: ZooKeeper installation and configuration

Install and configure ZooKeeper, and set up a 3-node ZooKeeper cluster.

Exercise 2: ZooKeeper shell operations

Use the ZooKeeper shell to complete the following operations:

Create a root node named "myapp"
Create three child nodes under it: "config", "servers", and "clients"
Create 3 sequential nodes under the "servers" node
View the data of the "myapp" node
Modify the data of the "config" node
List the children of the "servers" node
Delete all nodes

Exercise 3: Java API operations

Write a Java program that uses the ZooKeeper Java API to complete the following operations:

Connect to ZooKeeper
Create a node named "test"
Read the node's data
Modify the node's data
Watch the node for data changes
Delete the node
Close the connection

Exercise 4: Configuration management example

Use ZooKeeper to implement a simple configuration management system that includes the following:

Configuration information is stored in ZooKeeper
Clients watch for configuration changes
When the configuration changes, clients update their configuration automatically

7. ZooKeeper best practices

7.1 Cluster design

Node count: use an odd number of nodes; 3, 5, or 7 nodes are recommended
Hardware: use high-performance servers with sufficient memory and disk space
Network: ensure that the network connections between nodes are stable
Data directories: place the data directory (dataDir) and the log directory (dataLogDir) on separate disks to improve performance

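7.2 Node design

Node naming: use meaningful node names to simplify management
Node data size: keep the data in each node small; it should not exceed 1 MB, which is roughly the default server limit
Node type selection: choose the node type that fits the actual requirements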

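7.3 Performance optimization

Add servers: adding nodes to the cluster increases read capacity, although every additional voting node adds overhead to each write
Use Observers: for read-heavy, write-light workloads, add Observer nodes to improve read performance
Reduce watch events: use the watch mechanism judiciously and avoid overusing it
Batch operations: group related updates into a single multi request to reduce network overhead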

7.4 Reliability design

Data backup: back up the ZooKeeper data directory regularly
Monitoring and alerting: set up monitoring and alerts so that problems are detected and resolved promptly
Fault tolerance: design the application so that it keeps running normally when ZooKeeper is unavailable

8. Summary

This tutorial covered the basic concepts, architecture, and usage of the ZooKeeper distributed coordination service. As an open-source distributed coordination service, ZooKeeper is easy to use, highly reliable, highly available, and fast, and it is widely used in distributed systems.

ZooKeeper's core concepts include the data model, znode types, sessions, watches, and ACLs. ZooKeeper adopts a leader-follower architecture and relies on the ZAB protocol to keep data consistent. A ZooKeeper cluster usually has an odd number of nodes, because an additional even-numbered node does not increase the number of failures the cluster can tolerate.

ZooKeeper's use cases include configuration management, naming services, distributed locks, cluster management, leader election, and queue management. Using ZooKeeper simplifies the development of distributed systems and improves their reliability and availability.

When using ZooKeeper, follow the best practices for cluster design, node design, performance optimization, and reliability design to get the best performance and reliability.
