1. Sharding Principles
MongoDB sharding is a horizontal scaling solution: it distributes data across multiple servers so that the cluster can handle large data sets and high-throughput workloads. A sharded cluster consists of the following components:
1.1 Sharded Cluster Architecture
The basic architecture of a sharded cluster includes:
- Shard: a node that stores a subset of the data, usually deployed as a replica set
- Config server: stores the cluster's metadata and configuration information
- Router (mongos): sits between the application and the shards as a middle layer and routes queries to the right shards
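1.2 How Sharding Works
MongoDB sharding works as follows:
- Data is split into chunks based on the shard key
- Chunks are distributed across the shards
- As data grows, MongoDB's balancer automatically migrates chunks between shards to keep them balanced
- Applications access data through the mongos router and do not need to know how the data is distributed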
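The routing idea can be illustrated with a small model of range-based sharding. The sketch below is a simplification, not mongos internals: it assumes numeric shard-key values, models MinKey and MaxKey as -Infinity and Infinity, and uses a hypothetical chunk table. A query on a single key value goes to exactly one shard, while a range query goes to every shard whose chunks overlap the range.

```javascript
// Hypothetical chunk table for a collection sharded on a numeric key.
// Each chunk owns the half-open range [min, max); -Infinity and Infinity
// stand in for MongoDB's MinKey and MaxKey.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard1ReplSet" },
  { min: 1000, max: 5000, shard: "shard2ReplSet" },
  { min: 5000, max: Infinity, shard: "shard1ReplSet" },
];

// Equality on the shard key: exactly one chunk, hence one shard.
function routeKey(chunks, key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// Range query [lo, hi): every shard owning an overlapping chunk.
function routeRange(chunks, lo, hi) {
  const shards = chunks
    .filter((c) => c.min < hi && lo < c.max)
    .map((c) => c.shard);
  return [...new Set(shards)];
}

console.log(routeKey(chunks, 42));          // shard1ReplSet
console.log(routeKey(chunks, 1000));        // shard2ReplSet
console.log(routeRange(chunks, 900, 1200)); // [ 'shard1ReplSet', 'shard2ReplSet' ]
```

Because shard1ReplSet owns two non-adjacent chunks, the model also shows that a shard's data need not form one contiguous range: the balancer moves individual chunks.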
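Tip: Sharding is appropriate when the data volume exceeds the storage capacity of a single server, or when a workload needs more read/write throughput than a single server can provide.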
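2. Choosing a Shard Key
A shard key is one or more fields in a document that determine how documents are distributed across shards. Choosing an appropriate shard key is critical, because it directly affects the performance and scalability of the sharded cluster.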
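2.1 Characteristics of a Good Shard Key
- High cardinality: enough distinct values that data can be spread evenly
- Even write distribution: avoids write hotspots (all writes concentrated on a single shard)
- Matches query patterns: aligns with common queries so mongos can route them to specific shards instead of broadcasting them to all shards
- Stability: values rarely change, which avoids the overhead of moving documents between chunks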
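2.2 Shard Key Types
| Shard key type | Advantages | Disadvantages | Typical use case |
|---|---|---|---|
| Monotonically increasing (e.g. timestamp) | Efficient range queries | Write hotspot | Time-series data |
| Random (e.g. UUID) | Evenly distributed writes | Inefficient range queries | Random-write workloads |
| Compound | Balances writes and queries | More complex to design | Multiple query patterns |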
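The write-hotspot problem with an increasing key can be reproduced in a few lines. The sketch below compares range sharding on a timestamp-like value with hashing the same value. It is a simplified model: FNV-1a stands in for MongoDB's own hash function, and the hash is taken modulo the shard count, whereas MongoDB actually assigns contiguous ranges of hashed values to chunks. In MongoDB itself a hashed shard key is declared with the "hashed" index type, for example sh.shardCollection("mydb.events", { ts: "hashed" }), where mydb.events and ts are hypothetical names.

```javascript
const SHARDS = ["shardA", "shardB", "shardC", "shardD"];

// Range sharding on the raw value. The chunks were split while keys were
// still small, so the last chunk owns [750, +Infinity).
function rangeShard(key) {
  const bounds = [250, 500, 750]; // upper bounds of the first three chunks
  const idx = bounds.findIndex((b) => key < b);
  return SHARDS[idx === -1 ? SHARDS.length - 1 : idx];
}

// 32-bit FNV-1a, standing in for MongoDB's own hash function.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hashed placement, simplified to "hash modulo shard count".
function hashedShard(key) {
  return SHARDS[fnv1a(String(key)) % SHARDS.length];
}

// Count how many keys each shard receives.
function distribute(keys, pickShard) {
  const counts = Object.fromEntries(SHARDS.map((s) => [s, 0]));
  for (const k of keys) counts[pickShard(k)] += 1;
  return counts;
}

// New writes with an increasing key such as a timestamp: 1000..1999.
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);

const rangeCounts = distribute(newKeys, rangeShard);
const hashedCounts = distribute(newKeys, hashedShard);
console.log("range :", rangeCounts);  // every insert lands on shardD
console.log("hashed:", hashedCounts); // inserts spread over all four shards
```

The hashed variant spreads the same 1,000 inserts over all four shards, but adjacent timestamps now land on unrelated shards, which is why the table lists inefficient range queries as the cost.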
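2.3 Shard Key Best Practices
- Analyze the application's query patterns and write patterns
- Choose high-cardinality fields for the shard key
- Avoid using a monotonically increasing or decreasing field as the only shard key field
- Consider a compound shard key to balance competing requirements
- Once chosen, the shard key cannot easily be changed (MongoDB 4.4 can refine a key by adding suffix fields, and MongoDB 5.0 can reshard a collection, but resharding is an expensive operation)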
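Analyzing candidate fields can start from a sample of real documents in insertion order, for example db.orders.find().sort({ _id: 1 }).limit(1000).toArray(), assuming default ObjectId _id values (which roughly follow insertion time). The sketch below is a hypothetical profiling helper, not a MongoDB tool: for each field it reports the cardinality, the share of the most frequent value, and how often the value increases from one document to the next. A low cardinality or a high top-value share points to chunks that cannot be split; an increasing ratio near 1 points to a monotonic key and a write hotspot.

```javascript
// Profile one candidate shard-key field over a non-empty sample of
// documents given in insertion order.
function profileField(docs, field) {
  const freq = new Map();
  let increasing = 0;
  for (let i = 0; i < docs.length; i++) {
    const v = docs[i][field];
    freq.set(v, (freq.get(v) || 0) + 1);
    if (i > 0 && v > docs[i - 1][field]) increasing += 1;
  }
  return {
    field,
    cardinality: freq.size,
    topValueShare: Math.max(...freq.values()) / docs.length,
    increasingRatio: docs.length > 1 ? increasing / (docs.length - 1) : 0,
  };
}

// Hypothetical sample of orders, oldest first.
const sample = [
  { userId: "u3", status: "paid", createdAt: 1 },
  { userId: "u1", status: "paid", createdAt: 2 },
  { userId: "u4", status: "new", createdAt: 3 },
  { userId: "u3", status: "paid", createdAt: 4 },
  { userId: "u2", status: "paid", createdAt: 5 },
];

for (const field of ["userId", "status", "createdAt"]) {
  const p = profileField(sample, field);
  console.log(
    `${p.field}: cardinality=${p.cardinality}, ` +
      `topValueShare=${p.topValueShare}, increasingRatio=${p.increasingRatio}`
  );
}
// userId: cardinality=4, topValueShare=0.4, increasingRatio=0.25
// status: cardinality=2, topValueShare=0.8, increasingRatio=0.25
// createdAt: cardinality=5, topValueShare=0.2, increasingRatio=1
```

On this toy sample, status fails on cardinality, createdAt is monotonic, and userId is the most promising of the three. A real decision would use a much larger sample and take query patterns into account as well.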
3. Configuration
The steps to configure a MongoDB sharded cluster are as follows:
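3.1 Config Servers
First start the config servers. They must run as a replica set; three members is the recommended production setup. Run each mongod on a different host (server1, server2, server3) so that the host names match the replica set configuration. Since MongoDB 3.6, mongod and mongos listen only on localhost by default, so add --bind_ip (for example --bind_ip localhost,server1) wherever other hosts need to connect. The examples use mongosh; the legacy mongo shell was removed in MongoDB 6.0.
# Start config server 1 (on server1)
mongod --configsvr --replSet configReplSet --port 27019 --dbpath /data/configdb1
# Start config server 2 (on server2)
mongod --configsvr --replSet configReplSet --port 27019 --dbpath /data/configdb2
# Start config server 3 (on server3)
mongod --configsvr --replSet configReplSet --port 27019 --dbpath /data/configdb3
# Initialize the config server replica set (connect to one member)
mongosh --port 27019
rs.initiate({
  _id: "configReplSet",
  configsvr: true,
  members: [
    { _id: 0, host: "server1:27019" },
    { _id: 1, host: "server2:27019" },
    { _id: 2, host: "server3:27019" }
  ]
});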
3.2 Shard Nodes
Start the shard nodes (each shard is a replica set):
# Start the shard1 replica set members (one per host: server1, server2, server3)
mongod --shardsvr --replSet shard1ReplSet --port 27018 --dbpath /data/shard1db1
mongod --shardsvr --replSet shard1ReplSet --port 27018 --dbpath /data/shard1db2
mongod --shardsvr --replSet shard1ReplSet --port 27018 --dbpath /data/shard1db3
# Initialize the shard1 replica set
mongosh --port 27018
rs.initiate({
  _id: "shard1ReplSet",
  members: [
    { _id: 0, host: "server1:27018" },
    { _id: 1, host: "server2:27018" },
    { _id: 2, host: "server3:27018" }
  ]
});
# Configure the other shards the same way (for example shard2ReplSet on server4, server5, server6)
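Because each additional shard repeats the same pattern, the member list and the seed list can be generated instead of typed by hand. The sketch below uses hypothetical helpers (replSetConfig and seedList are illustrative names, not MongoDB APIs) and assumes every member of a replica set listens on the same port on a different host, as in the commands above. Since mongosh is a JavaScript shell, the functions could be pasted into a session and passed straight to rs.initiate().

```javascript
// Build the document passed to rs.initiate() for a replica set whose
// members all listen on the same port on different hosts.
function replSetConfig(name, hosts, port, { configsvr = false } = {}) {
  const config = {
    _id: name,
    members: hosts.map((host, i) => ({ _id: i, host: `${host}:${port}` })),
  };
  if (configsvr) config.configsvr = true; // only for config server replica sets
  return config;
}

// Build the "<replSetName>/<host:port>,..." string used by sh.addShard()
// and by mongos --configdb.
function seedList(name, hosts, port) {
  return `${name}/${hosts.map((h) => `${h}:${port}`).join(",")}`;
}

const shard2Hosts = ["server4", "server5", "server6"];
const shard2Config = replSetConfig("shard2ReplSet", shard2Hosts, 27018);
console.log(JSON.stringify(shard2Config, null, 2));
console.log(seedList("shard2ReplSet", shard2Hosts, 27018));
// shard2ReplSet/server4:27018,server5:27018,server6:27018
```

For example, rs.initiate(replSetConfig("configReplSet", ["server1", "server2", "server3"], 27019, { configsvr: true })) produces the same document as the config server initialization in 3.1.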
3.3 Router (mongos)
Start the router (mongos), pointing it at the config server replica set:
# Start mongos
mongos --configdb configReplSet/server1:27019,server2:27019,server3:27019 --port 27017
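3.4 Enabling Sharding
Connect to mongos, add the shards, and enable sharding for the collection:
# Connect to mongos
mongosh --port 27017
// Add the shards
sh.addShard("shard1ReplSet/server1:27018,server2:27018,server3:27018")
sh.addShard("shard2ReplSet/server4:27018,server5:27018,server6:27018")
// Enable sharding for the database
sh.enableSharding("mydatabase")
// Shard the collection with userId as the shard key.
// If the collection already contains data, first create an index on the key:
// db.getSiblingDB("mydatabase").mycollection.createIndex({ userId: 1 })
sh.shardCollection("mydatabase.mycollection", { "userId": 1 })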
4. Management
Managing a MongoDB sharded cluster involves monitoring cluster status, balancing data distribution, and adding or removing shards.
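4.1 Monitoring the Sharded Cluster
// View the sharding status
sh.status()
// View shard and chunk information in the config database
use config
db.shards.find()
db.chunks.countDocuments({})
// View the balancer state
sh.getBalancerState()                      // true if the balancer is enabled
db.settings.findOne({ _id: "balancer" })   // balancer settings, including activeWindow if one is set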
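sh.status() reports chunk counts per shard, but for many collections it can help to summarize them yourself. The sketch below is a hypothetical helper that counts chunks per shard from documents shaped like config.chunks entries (only their shard field is read), for example the array returned by db.chunks.find().toArray() in the config database. Shards that own no chunks do not appear in the result. In recent MongoDB versions (6.0 and later) the balancer compares data size per shard rather than chunk counts, so treat the spread as a rough signal.

```javascript
// Count chunks per shard from documents shaped like config.chunks entries.
function chunksPerShard(chunkDocs) {
  const counts = {};
  for (const c of chunkDocs) counts[c.shard] = (counts[c.shard] || 0) + 1;
  return counts;
}

// Difference in chunk count between the most and least loaded shard.
function chunkSpread(counts) {
  const values = Object.values(counts);
  if (values.length === 0) return 0;
  return Math.max(...values) - Math.min(...values);
}

// Hypothetical chunk documents, reduced to the field the helper reads.
const sampleChunks = [
  { shard: "shard1ReplSet" },
  { shard: "shard1ReplSet" },
  { shard: "shard1ReplSet" },
  { shard: "shard2ReplSet" },
];
const counts = chunksPerShard(sampleChunks);
console.log(counts);              // { shard1ReplSet: 3, shard2ReplSet: 1 }
console.log(chunkSpread(counts)); // 2
```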
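4.2 Data Balancing
MongoDB automatically balances chunks across shards, but you can also control the balancer manually:
// Enable the balancer
sh.startBalancer()
// Disable the balancer
sh.stopBalancer()
// Run the balancer only within a time window; the window is stored in config.settings
use config
db.settings.updateOne(
  { _id: "balancer" },
  { $set: { activeWindow: { start: "22:00", stop: "06:00" } } },
  { upsert: true }
)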
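A window such as 22:00 to 06:00 crosses midnight, so "inside the window" cannot be tested with a single start <= t < stop comparison. The sketch below shows one way to evaluate such a window. inWindow is a hypothetical helper, not part of MongoDB, and treating the stop time as exclusive is an assumption of this sketch rather than documented balancer behavior.

```javascript
// Convert "HH:MM" to minutes after midnight.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// True if `time` ("HH:MM") falls inside [start, stop). A window whose stop
// is earlier than its start (e.g. 22:00-06:00) wraps past midnight.
function inWindow(time, { start, stop }) {
  const t = toMinutes(time);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  return s <= e ? t >= s && t < e : t >= s || t < e;
}

const nightWindow = { start: "22:00", stop: "06:00" };
console.log(inWindow("23:30", nightWindow)); // true
console.log(inWindow("03:00", nightWindow)); // true
console.log(inWindow("12:00", nightWindow)); // false
```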
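4.3 Adding and Removing Shards
// Add a new shard
sh.addShard("newShardReplSet/server7:27018,server8:27018,server9:27018")
// Remove a shard: the first call starts draining its chunks to the other shards
db.adminCommand({ removeShard: "shardToRemove" })
// Check progress by running the same command again: it reports state "ongoing" with the
// remaining chunks, and state "completed" once draining has finished. If the output lists
// dbsToMove, move those databases with the movePrimary command before removal can complete.
db.adminCommand({ removeShard: "shardToRemove" })
sh.status()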
4.4 Cluster Maintenance
- Back up the config servers regularly
- Monitor disk space usage on each shard
- Make sure every shard's replica set is healthy
- Review the balancer logs regularly
- Monitor query performance and optimize query patterns
5. Case Study: Building a Sharded Cluster for an E-commerce System
Suppose we need to build a sharded cluster for a large e-commerce system that handles a very large volume of order data. The steps are as follows:
5.1 Cluster Planning
- 2 shards, each a 3-node replica set
- 3 config servers (as a replica set)
- 2 routers (mongos)
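5.2 Shard Key Selection
For the orders collection, choose a compound shard key:
// Shard key for the orders collection
sh.shardCollection("ecommerce.orders", { "userId": 1, "orderDate": 1 })
Rationale:
- userId: keeps each user's orders together, so a query for one user's orders is routed to a single shard (or a few, if that user's orders span several chunks) instead of being broadcast to all shards
- orderDate: adds cardinality within each user, so a very active user's orders can still be split into several chunks, and it supports date-range queries on one user's orders. Writes are spread mainly by userId (assuming user IDs are not assigned in increasing order); orderDate on its own would increase monotonically and create a write hotspot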
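The effect of adding orderDate can be seen by ordering keys the way an index on { userId: 1, orderDate: 1 } does: first by userId, then by orderDate within the same user. The sketch below is a simplified model, assuming orderDate is stored as an ISO-8601 string so that string comparison matches date order (in a real collection it would usually be a Date). It places a hypothetical chunk boundary inside one user's orders, which would be impossible with userId alone because all of that user's orders would share a single key value.

```javascript
// Compare two compound shard-key values field by field, in the order an
// index on { userId: 1, orderDate: 1 } sorts them.
function compareKeys(a, b) {
  if (a.userId !== b.userId) return a.userId < b.userId ? -1 : 1;
  if (a.orderDate !== b.orderDate) return a.orderDate < b.orderDate ? -1 : 1;
  return 0;
}

const orders = [
  { userId: "u2", orderDate: "2024-05-02" },
  { userId: "u1", orderDate: "2024-01-10" },
  { userId: "u2", orderDate: "2024-01-15" },
  { userId: "u1", orderDate: "2024-02-20" },
  { userId: "u2", orderDate: "2024-03-08" },
];
const sorted = [...orders].sort(compareKeys);

// Hypothetical split point inside u2's orders.
const splitPoint = { userId: "u2", orderDate: "2024-03-01" };
const chunkA = sorted.filter((o) => compareKeys(o, splitPoint) < 0);
const chunkB = sorted.filter((o) => compareKeys(o, splitPoint) >= 0);

const label = (o) => `${o.userId} ${o.orderDate}`;
console.log(chunkA.map(label)); // [ 'u1 2024-01-10', 'u1 2024-02-20', 'u2 2024-01-15' ]
console.log(chunkB.map(label)); // [ 'u2 2024-03-08', 'u2 2024-05-02' ]
```

Each user's orders stay contiguous and date-ordered, so a query on one userId touches only the chunks covering that user, while a heavy user such as u2 can still be spread over more than one chunk.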
5.3 Configuration Steps
# 1. Start the config servers (one mongod per host: server1, server2, server3)
mongod --configsvr --replSet configRS --port 27019 --dbpath /data/configdb1
mongod --configsvr --replSet configRS --port 27019 --dbpath /data/configdb2
mongod --configsvr --replSet configRS --port 27019 --dbpath /data/configdb3
# 2. Initialize the config server replica set (connect to one member)
mongosh --port 27019
rs.initiate({
  _id: "configRS",
  configsvr: true,
  members: [
    { _id: 0, host: "server1:27019" },
    { _id: 1, host: "server2:27019" },
    { _id: 2, host: "server3:27019" }
  ]
});
# 3. Start the shard1 replica set (one mongod per host: server1, server2, server3)
mongod --shardsvr --replSet shard1RS --port 27018 --dbpath /data/shard1db1
mongod --shardsvr --replSet shard1RS --port 27018 --dbpath /data/shard1db2
mongod --shardsvr --replSet shard1RS --port 27018 --dbpath /data/shard1db3
# 4. Initialize the shard1 replica set
mongosh --port 27018
rs.initiate({
  _id: "shard1RS",
  members: [
    { _id: 0, host: "server1:27018" },
    { _id: 1, host: "server2:27018" },
    { _id: 2, host: "server3:27018" }
  ]
});
# 5. Configure shard2 (shard2RS on server4, server5, server6) the same way
# 6. Start mongos (repeat on a second host for the two planned routers)
mongos --configdb configRS/server1:27019,server2:27019,server3:27019 --port 27017
# 7. Add the shards and shard the orders collection
mongosh --port 27017
sh.addShard("shard1RS/server1:27018,server2:27018,server3:27018")
sh.addShard("shard2RS/server4:27018,server5:27018,server6:27018")
sh.enableSharding("ecommerce")
sh.shardCollection("ecommerce.orders", { "userId": 1, "orderDate": 1 })
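6. Interactive Exercises
Question 1: Which components make up a MongoDB sharded cluster?
A. Shards, config servers, routers
B. Primary, secondary, and arbiter nodes
C. Primary server, backup server, monitoring server
D. Application server, database server, cache server
Question 2: When choosing a shard key, which of the following is NOT a desirable characteristic?
A. High cardinality
B. Evenly distributed writes
C. Values that change frequently
D. Matches query patterns
Question 3: What is the main disadvantage of a monotonically increasing shard key?
A. Inefficient range queries
B. Write hotspots
C. Uneven data distribution
D. Poor query performance
Question 4: How do you view the status of a sharded cluster?
A. rs.status()
B. db.serverStatus()
C. sh.status()
D. show dbs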