MongoDB Sharding Tutorial

Learn how MongoDB sharding works, how to choose a shard key, and how to configure and manage a sharded cluster, so that you can build a scalable MongoDB cluster.

1. Sharding Principles

MongoDB sharding is a horizontal scaling solution: it distributes data across multiple servers in order to handle large data sets and high-throughput operations. A sharded cluster is made up of the components described below.

1.1 Sharded Cluster Architecture

The basic architecture of a sharded cluster includes:

  • Shard: a node that stores a subset of the data, usually deployed as a replica set
  • Config server: stores the cluster's metadata and configuration information
  • Router (mongos): sits between the application and the shards and routes queries to the right shards

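1.2 Working Principles

MongoDB sharding works as follows:

  • Data is divided into multiple chunks according to the shard key
  • The chunks are distributed across different shards
  • As data grows, MongoDB automatically rebalances chunks among the shards
  • The application accesses data through the mongos router and does not need to know how the data is distributed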
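
The routing step described above can be sketched in a few lines. This is a simplified model, not MongoDB's implementation: the chunk table, its boundaries, and the function names `routeToShard` and `shardsForRange` are made up for illustration, and real chunk metadata lives on the config servers.

```javascript
// Hypothetical chunk table: each chunk covers the half-open range [min, max)
// of the shard key and is owned by exactly one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard1" },
  { min: 1000, max: 5000, shard: "shard2" },
  { min: 5000, max: Infinity, shard: "shard1" },
];

// Return the shard that owns the chunk containing the given shard key value.
function routeToShard(chunkTable, keyValue) {
  const chunk = chunkTable.find((c) => keyValue >= c.min && keyValue < c.max);
  if (!chunk) {
    throw new Error(`no chunk covers shard key value ${keyValue}`);
  }
  return chunk.shard;
}

// Return the distinct shards that a range query [low, high) has to contact.
function shardsForRange(chunkTable, low, high) {
  const hit = chunkTable.filter((c) => c.min < high && c.max > low);
  return [...new Set(hit.map((c) => c.shard))];
}

console.log(routeToShard(chunks, 42));
console.log(shardsForRange(chunks, 900, 1100));
```

A query that includes the shard key can be sent to a single shard, while a range that crosses a chunk boundary may have to contact several shards.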

Tip: sharding is suited to scenarios where the data volume exceeds the storage capacity of a single server, or where higher read/write throughput is required.

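2. Choosing a Shard Key

A shard key is one or more fields in a document that determine how data is distributed across the shards. Choosing a suitable shard key is critical, because it directly affects the performance and scalability of the sharded cluster.

2.1 Characteristics of a Good Shard Key

  • High cardinality: enough distinct values to ensure that data is distributed evenly
  • Even write distribution: avoids write hotspots (all writes landing on a single shard)
  • Matches query patterns: aligns with common queries, so they can be routed to few shards and perform better
  • Stable: values rarely change, which avoids the cost of migrating data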

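2.2 Types of Shard Key

Shard key type                     Advantages                    Disadvantages               Suitable scenarios
Increasing key (e.g. timestamp)    Efficient range queries       Write hotspots              Time-series data
Random key (e.g. UUID)             Even write distribution       Inefficient range queries   Random-write workloads
Compound key                       Balances writes and queries   More complex to design      Multiple query patterns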
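
The write-hotspot trade-off in the table can be illustrated with a small simulation. This is a toy model under stated assumptions: the key space is split into equal ranges, one per shard, and the FNV-1a hash below is a stand-in for illustration only, not the function MongoDB uses for hashed shard keys. All names are hypothetical.

```javascript
// Toy 32-bit FNV-1a hash (illustration only, not MongoDB's hashed-index function).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Range placement: the key space [0, maxKey) is cut into equal ranges, one per shard.
function rangeShard(key, maxKey, shardCount) {
  return Math.min(shardCount - 1, Math.floor((key / maxKey) * shardCount));
}

// Hashed placement: the hash of the key decides the shard.
function hashedShard(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// Count how many of the given inserts land on each shard.
function countWrites(keys, place, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const key of keys) counts[place(key)] += 1;
  return counts;
}

const shardCount = 4;
const maxKey = 1000000;
// The newest 1000 inserts of an ever-increasing key (for example a counter or timestamp).
const recentKeys = Array.from({ length: 1000 }, (_, i) => maxKey - 1000 + i);

const rangeCounts = countWrites(recentKeys, (k) => rangeShard(k, maxKey, shardCount), shardCount);
const hashedCounts = countWrites(recentKeys, (k) => hashedShard(k, shardCount), shardCount);

console.log("range placement :", rangeCounts);
console.log("hashed placement:", hashedCounts);
```

With range placement, all recent inserts fall into the last range and therefore hit one shard; with hashed placement they are spread across the shards, at the cost of scattering neighbouring key values.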

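2.3 Best Practices for Choosing a Shard Key

  • Analyze the application's query and write patterns
  • Choose a high-cardinality field as the shard key
  • Avoid using a monotonically increasing or decreasing field as the sole shard key
  • Consider a compound shard key to balance competing requirements
  • Choose carefully: once a shard key has been selected, it cannot easily be changed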
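
One way to act on the first two points is to measure candidate keys against a sample of real documents before sharding. The sketch below is a hypothetical helper (`shardKeyStats` and the sample documents are invented for illustration); it reports how many distinct values a candidate key has and how dominant its most frequent value is.

```javascript
// Summarize how well a candidate shard key (one or more fields) spreads sample documents.
function shardKeyStats(docs, fields) {
  if (docs.length === 0) {
    return { distinctValues: 0, largestShare: 0 };
  }
  const counts = new Map();
  for (const doc of docs) {
    // Serialize the candidate key's values so compound keys can be counted too.
    const value = JSON.stringify(fields.map((f) => doc[f]));
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  const largest = Math.max(...counts.values());
  return {
    distinctValues: counts.size,
    // Share of documents held by the most frequent value; close to 1 means one value dominates.
    largestShare: largest / docs.length,
  };
}

// Hypothetical sample documents.
const sampleOrders = [
  { userId: "u1", status: "paid", orderDate: "2024-01-01" },
  { userId: "u2", status: "paid", orderDate: "2024-01-01" },
  { userId: "u3", status: "paid", orderDate: "2024-01-02" },
  { userId: "u1", status: "shipped", orderDate: "2024-01-03" },
];

console.log("status          :", shardKeyStats(sampleOrders, ["status"]));
console.log("userId          :", shardKeyStats(sampleOrders, ["userId"]));
console.log("userId+orderDate:", shardKeyStats(sampleOrders, ["userId", "orderDate"]));
```

A low-cardinality field such as `status` yields few distinct values and a large dominant share, which makes it a poor candidate; a compound key raises cardinality.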

3. Configuration

The steps for configuring a MongoDB sharded cluster are as follows:

3.1 Config Servers

First, start the config servers (at least three are recommended):

# Start config server 1 (on server1)
# On MongoDB 3.6 and later, mongod listens only on localhost by default;
# add --bind_ip so that the other members can connect.
mongod --configsvr --replSet configReplSet --port 27019 --dbpath /data/configdb1

# Start config server 2 (on server2)
mongod --configsvr --replSet configReplSet --port 27019 --dbpath /data/configdb2

# Start config server 3 (on server3)
mongod --configsvr --replSet configReplSet --port 27019 --dbpath /data/configdb3

# Connect to one config server and initialize the config server replica set
# (newer MongoDB releases ship mongosh in place of the mongo shell)
mongo --port 27019
rs.initiate({
   _id: "configReplSet",
   configsvr: true,
   members: [
      { _id: 0, host: "server1:27019" },
      { _id: 1, host: "server2:27019" },
      { _id: 2, host: "server3:27019" }
   ]
});

3.2 Shard Nodes

Start the shard nodes (each shard is a replica set):

# Start the shard1 replica set members (one per host: server1, server2, server3)
mongod --shardsvr --replSet shard1ReplSet --port 27018 --dbpath /data/shard1db1
mongod --shardsvr --replSet shard1ReplSet --port 27018 --dbpath /data/shard1db2
mongod --shardsvr --replSet shard1ReplSet --port 27018 --dbpath /data/shard1db3

# Connect to one member and initialize the shard1 replica set
mongo --port 27018
rs.initiate({
   _id: "shard1ReplSet",
   members: [
      { _id: 0, host: "server1:27018" },
      { _id: 1, host: "server2:27018" },
      { _id: 2, host: "server3:27018" }
   ]
});

// Configure the other shards in the same way...
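
Because every replica set in the cluster is initialized with a document of the same shape, the configuration can be generated rather than typed by hand. The helper below is a hypothetical convenience (`buildReplSetConfig` is not a MongoDB function); its return value has the structure passed to `rs.initiate()` in the steps above.

```javascript
// Build the document passed to rs.initiate() for one replica set.
// `configsvr: true` is set only for the config server replica set.
function buildReplSetConfig(name, hosts, { configsvr = false } = {}) {
  const config = {
    _id: name,
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
  if (configsvr) {
    config.configsvr = true;
  }
  return config;
}

const configCfg = buildReplSetConfig(
  "configReplSet",
  ["server1:27019", "server2:27019", "server3:27019"],
  { configsvr: true }
);
const shardCfg = buildReplSetConfig(
  "shard1ReplSet",
  ["server1:27018", "server2:27018", "server3:27018"]
);

console.log(JSON.stringify(configCfg, null, 2));
console.log(JSON.stringify(shardCfg, null, 2));
```

In the shell, the generated object could then be passed directly, for example `rs.initiate(shardCfg)`.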

3.3 Router

Start the router (mongos):

# Start mongos, pointing it at the config server replica set
mongos --configdb configReplSet/server1:27019,server2:27019,server3:27019 --port 27017
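
The `--configdb` value follows the pattern `<replica set name>/<host1>,<host2>,...`, and `sh.addShard()` in section 3.4 takes a string in the same form. A small sketch, assuming hypothetical helper names `replSetSeedString` and `mongosArgs`, shows how the string and the command line are assembled:

```javascript
// Format a replica set as "<name>/<host1>,<host2>,...".
function replSetSeedString(name, hosts) {
  if (hosts.length === 0) {
    throw new Error("at least one host is required");
  }
  return `${name}/${hosts.join(",")}`;
}

// Assemble the mongos command line as an argument array.
function mongosArgs(configSeed, port) {
  return ["mongos", "--configdb", configSeed, "--port", String(port)];
}

const configSeed = replSetSeedString("configReplSet", [
  "server1:27019",
  "server2:27019",
  "server3:27019",
]);

console.log(mongosArgs(configSeed, 27017).join(" "));
```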

3.4 Enabling Sharding

Connect to mongos, add the shards, and enable sharding for a collection:

# Connect to mongos
mongo --port 27017

// Add the shards
sh.addShard("shard1ReplSet/server1:27018,server2:27018,server3:27018")
sh.addShard("shard2ReplSet/server4:27018,server5:27018,server6:27018")

// Enable sharding for the database
sh.enableSharding("mydatabase")

// Shard the collection and specify its shard key
sh.shardCollection("mydatabase.mycollection", { "userId": 1 })

4. Management

Managing a MongoDB sharded cluster involves monitoring cluster status, balancing data distribution, and handling operations such as adding and removing shards.

4.1 Monitoring the Sharded Cluster

// View the sharding status
sh.status()

// View shard statistics
use config
db.shards.find()
db.chunks.find().count()

// View the balancer state
sh.getBalancerState()
sh.getBalancerWindow()
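
A total chunk count says little about balance; a per-shard breakdown is more useful. The sketch below works on plain objects shaped like documents in `config.chunks` and uses only their `shard` field. The function names and sample data are hypothetical, and chunk counts are only a rough indicator of how evenly data is spread.

```javascript
// Count chunks per shard from documents shaped like those in config.chunks.
function chunksPerShard(chunkDocs) {
  const counts = {};
  for (const { shard } of chunkDocs) {
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}

// Difference between the most and least loaded shard; a rough imbalance indicator.
function chunkSpread(counts) {
  const values = Object.values(counts);
  return values.length === 0 ? 0 : Math.max(...values) - Math.min(...values);
}

// Hypothetical sample; in the shell, real documents could come from
// db.getSiblingDB("config").chunks.find().toArray().
const sampleChunks = [
  { shard: "shard1ReplSet" },
  { shard: "shard1ReplSet" },
  { shard: "shard1ReplSet" },
  { shard: "shard2ReplSet" },
];

const perShard = chunksPerShard(sampleChunks);
console.log(perShard, "spread:", chunkSpread(perShard));
```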

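4.2 Data Balancing

MongoDB automatically balances chunks across the shards, but the balancing process can also be controlled manually:

// Enable the balancer
sh.startBalancer()

// Disable the balancer
sh.stopBalancer()

// Run the balancer only within a specific time window
use config
db.settings.updateOne(
   { _id: "balancer" },
   { $set: { activeWindow: { start: "22:00", stop: "06:00" } } },
   { upsert: true }
)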
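
A window such as 22:00 to 06:00 wraps past midnight, which is easy to get wrong when reasoning about whether the balancer may run at a given time. The sketch below is a standalone illustration, not MongoDB code; treating the window as half-open (start included, stop excluded) is an assumption made here, and both function names are hypothetical.

```javascript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// True if `now` falls inside [start, stop), including windows that wrap past midnight.
function inBalancerWindow(now, start, stop) {
  const t = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  return s <= e ? t >= s && t < e : t >= s || t < e;
}

console.log(inBalancerWindow("23:30", "22:00", "06:00"));
console.log(inBalancerWindow("12:00", "22:00", "06:00"));
```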

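4.3 Adding and Removing Shards

// Add a new shard
sh.addShard("newShardReplSet/server7:27018,server8:27018,server9:27018")

// Remove a shard (its data is first migrated to the remaining shards)
db.adminCommand({ removeShard: "shardToRemove" })

// Check removal progress: rerun the command until it reports completion;
// sh.status() shows the shard as draining in the meantime
db.adminCommand({ removeShard: "shardToRemove" })
sh.status()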

4.4 Sharded Cluster Maintenance

  • Back up the config servers regularly
  • Monitor disk space usage on each shard
  • Make sure the replica set of every shard is healthy
  • Check the balancer logs regularly
  • Monitor query performance and optimize query patterns

5. Case Study: Building a Sharded Cluster for an E-commerce System

Suppose we need to build a sharded cluster for a large e-commerce system that handles a massive volume of order data. The steps are as follows:

5.1 Cluster Planning

  • 2 shards, each a 3-node replica set
  • 3 config servers (one replica set)
  • 2 routers (mongos)
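
The plan above translates directly into the number of processes to provision. A small worked calculation, with a hypothetical `plan` object and helper name:

```javascript
// Cluster plan taken from the list above.
const plan = { shards: 2, membersPerShard: 3, configServers: 3, routers: 2 };

// Count the mongod and mongos processes the plan requires.
function processCounts(p) {
  const mongod = p.shards * p.membersPerShard + p.configServers;
  const mongos = p.routers;
  return { mongod, mongos, total: mongod + mongos };
}

console.log(processCounts(plan));
```

For this plan that is 2 × 3 + 3 = 9 mongod processes plus 2 mongos processes, 11 in total.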

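5.2 Shard Key Selection

For the orders collection, choose a compound shard key:

// Shard key for the orders collection
sh.shardCollection("ecommerce.orders", { "userId": 1, "orderDate": 1 })

Rationale:

  • userId: keeps each user's orders together, so queries by user can be routed to one shard or a few; as a high-cardinality leading field it also spreads writes from different users across the shards
  • orderDate: lets the orders of a very active user be split across several chunks instead of growing into one oversized chunk, and supports date-range queries within a user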
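
The effect of field order in a compound key can be seen by sorting a few key values the way a range-sharded key space is ordered: first by `userId`, then by `orderDate`. This is a simplified sketch with hypothetical sample data; dates are written as ISO strings so that plain string comparison orders them correctly.

```javascript
// Compare two compound shard key values field by field, in key order.
function compareCompound(a, b, fields) {
  for (const field of fields) {
    if (a[field] < b[field]) return -1;
    if (a[field] > b[field]) return 1;
  }
  return 0;
}

const keyFields = ["userId", "orderDate"];

// Hypothetical orders, in arrival order.
const orders = [
  { userId: "u2", orderDate: "2024-03-01" },
  { userId: "u1", orderDate: "2024-02-10" },
  { userId: "u2", orderDate: "2024-01-15" },
  { userId: "u1", orderDate: "2024-01-05" },
];

const sorted = [...orders].sort((a, b) => compareCompound(a, b, keyFields));
console.log(sorted.map((o) => `${o.userId}/${o.orderDate}`));
```

In the sorted key space each user's orders are adjacent and ordered by date, so a chunk boundary can fall between two dates of the same user when that user accumulates many orders.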

5.3 Configuration Steps

# 1. Start the config servers (one per host: server1, server2, server3)
mongod --configsvr --replSet configRS --port 27019 --dbpath /data/configdb1
mongod --configsvr --replSet configRS --port 27019 --dbpath /data/configdb2
mongod --configsvr --replSet configRS --port 27019 --dbpath /data/configdb3

# 2. Initialize the config server replica set
mongo --port 27019
rs.initiate({
   _id: "configRS",
   configsvr: true,
   members: [
      { _id: 0, host: "server1:27019" },
      { _id: 1, host: "server2:27019" },
      { _id: 2, host: "server3:27019" }
   ]
});

# 3. Start the shard1 replica set members (one per host: server1, server2, server3)
mongod --shardsvr --replSet shard1RS --port 27018 --dbpath /data/shard1db1
mongod --shardsvr --replSet shard1RS --port 27018 --dbpath /data/shard1db2
mongod --shardsvr --replSet shard1RS --port 27018 --dbpath /data/shard1db3

# 4. Initialize the shard1 replica set
mongo --port 27018
rs.initiate({
   _id: "shard1RS",
   members: [
      { _id: 0, host: "server1:27018" },
      { _id: 1, host: "server2:27018" },
      { _id: 2, host: "server3:27018" }
   ]
});

# 5. Configure shard2 in the same way (shard2RS on server4, server5, server6)

# 6. Start mongos (repeat on each of the two router hosts)
mongos --configdb configRS/server1:27019,server2:27019,server3:27019 --port 27017

# 7. Connect to mongos, add the shards, and enable sharding
mongo --port 27017
sh.addShard("shard1RS/server1:27018,server2:27018,server3:27018")
sh.addShard("shard2RS/server4:27018,server5:27018,server6:27018")
sh.enableSharding("ecommerce")
sh.shardCollection("ecommerce.orders", { "userId": 1, "orderDate": 1 })

6. Interactive Exercises

Question 1: Which components make up a MongoDB sharded cluster?

A. Shards, config servers, routers

B. Primary node, secondary node, arbiter node

C. Primary server, backup server, monitoring server

D. Application server, database server, cache server

Question 2: Which of the following is NOT a desirable property of a shard key?

A. High cardinality

B. Even write distribution

C. Values that change frequently

D. Matches query patterns

Question 3: What is the main drawback of an increasing shard key?

A. Inefficient range queries

B. Write hotspots

C. Uneven data distribution

D. Poor query performance

Question 4: How do you view the status of a sharded cluster?

A. rs.status()

B. db.serverStatus()

C. sh.status()

D. show dbs

7. Recommended Links