1. Model Deployment Overview
Model deployment is the process of putting a trained machine learning model into a production environment. TensorFlow offers several deployment options suited to different scenarios and devices. Choosing the right one depends on several factors, including:
- Deployment environment: servers, mobile devices, embedded devices, web browsers, etc.
- Performance requirements: latency, throughput, resource usage, etc.
- Development cost: deployment complexity, maintenance cost, etc.
- Ecosystem: how easily it integrates with existing systems, etc.
1.1 TensorFlow Deployment Solutions
TensorFlow provides several deployment solutions, the main ones being:
- TensorFlow Serving: server-side deployment with model version management and REST/gRPC APIs
- TensorFlow Lite: deployment on mobile and embedded devices, with small model size and high performance
- TensorFlow.js: deployment in web browsers and Node.js environments, running in JavaScript
- SavedModel: a general-purpose model format usable in many deployment scenarios
- ONNX: the Open Neural Network Exchange format, enabling cross-framework deployment
2. Model Save Formats
Before deploying a model, the trained model must be saved in a suitable format. TensorFlow supports several save formats, the most common being SavedModel.
2.1 SavedModel Format
SavedModel is TensorFlow's standard model save format. It contains the model's computation graph and weights and supports cross-platform deployment. With the Keras API, saving a model in SavedModel format is straightforward:
import tensorflow as tf
from tensorflow.keras import layers

# Create a simple model
model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Save in SavedModel format
tf.saved_model.save(model, 'saved_model/my_model')
After saving, a directory structure is generated containing the model's computation graph and weight files:
saved_model/
└── my_model/
├── assets/
├── saved_model.pb # model computation graph
└── variables/
├── variables.data-00000-of-00001
└── variables.index
2.2 HDF5 Format
Besides SavedModel, Keras also supports the HDF5 format, which is suitable for saving a complete Keras model:
# Save in HDF5 format
model.save('my_model.h5')
# Load the HDF5 model
loaded_model = tf.keras.models.load_model('my_model.h5')
Tip
For Keras models, both SavedModel and HDF5 can be used, but SavedModel is better suited for production deployment because it supports more flexible deployment options and better performance.
3. TensorFlow Serving Deployment
TensorFlow Serving is a high-performance serving system for machine learning models, designed for production environments. It supports model version management, REST and gRPC APIs, and automatic hot-reloading of models.
3.1 Installing TensorFlow Serving
TensorFlow Serving can be deployed with Docker or installed directly:
3.1.1 Deploying with Docker
# Pull the TensorFlow Serving image
docker pull tensorflow/serving
# Start a TensorFlow Serving container
docker run -p 8501:8501 -p 8500:8500 \
--mount type=bind,source="$(pwd)/saved_model/my_model",target=/models/my_model \
-e MODEL_NAME=my_model -t tensorflow/serving
3.1.2 Direct Installation
On Ubuntu, it can be installed with apt-get:
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install tensorflow-model-server
3.2 Accessing the Model via the REST API
Once TensorFlow Serving is running, the model can be queried through the REST API:
import requests
import numpy as np

# Prepare test data
test_data = np.random.rand(1, 784).tolist()
# Send the request
response = requests.post(
    'http://localhost:8501/v1/models/my_model:predict',
    json={'instances': test_data}
)
# Process the response
predictions = response.json()['predictions']
print(predictions)
3.3 Accessing the Model via the gRPC API
The gRPC API offers higher performance than the REST API and suits high-throughput scenarios:
import grpc
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import tensorflow as tf

# Create a gRPC channel
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
# Build the request
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'
# Prepare test data (the input key must match the model's signature)
test_data = tf.random.normal([1, 784])
request.inputs['dense_input'].CopyFrom(
    tf.make_tensor_proto(test_data)
)
# Send the request with a 10-second timeout
response = stub.Predict(request, 10.0)
# Process the response (the output key must match the model's signature)
predictions = tf.make_ndarray(response.outputs['dense_1'])
print(predictions)
4. TensorFlow Lite Deployment
TensorFlow Lite is TensorFlow's lightweight solution, designed for mobile, embedded, and IoT devices. It offers smaller model sizes and faster inference.
4.1 Model Conversion
To use TensorFlow Lite, convert the SavedModel to the TensorFlow Lite format (.tflite):
import tensorflow as tf

# Create a converter from the SavedModel
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/my_model')
# Optimize the model (optional)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Convert the model
tflite_model = converter.convert()
# Save the TensorFlow Lite model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
4.2 Model Inference
In a Python environment, the TensorFlow Lite Interpreter can run inference:
import tensorflow as tf
import numpy as np

# Load the TensorFlow Lite model
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
# Get the input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Prepare test data
input_data = np.random.rand(1, 784).astype(np.float32)
# Set the input data
interpreter.set_tensor(input_details[0]['index'], input_data)
# Run inference
interpreter.invoke()
# Read the output
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
4.3 Mobile Deployment
On mobile devices, use the TensorFlow Lite Android or iOS libraries:
4.3.1 Android Deployment
Add the dependency to the Android project's build.gradle:
dependencies {
    implementation 'org.tensorflow:tensorflow-lite:2.15.0'
    implementation 'org.tensorflow:tensorflow-lite-gpu:2.15.0' // optional, for GPU acceleration
}
4.3.2 iOS Deployment
Add the dependency to the iOS project's Podfile:
pod 'TensorFlowLiteSwift', '~> 2.15.0'
5. TensorFlow.js Deployment
TensorFlow.js is the JavaScript implementation of TensorFlow; it runs in web browsers and in Node.js. It supports converting Keras models or SavedModels into TensorFlow.js models.
5.1 Model Conversion
Use the tfjs-converter tool to convert a Keras model or SavedModel to TensorFlow.js format:
# Install tfjs-converter
pip install tensorflowjs
# Convert a Keras model
tensorflowjs_converter --input_format=keras my_model.h5 tfjs_model
# Convert a SavedModel
tensorflowjs_converter --input_format=tf_saved_model saved_model/my_model tfjs_model
5.2 Using the Model in the Browser
Include the TensorFlow.js library in an HTML file and load the model:
<!DOCTYPE html>
<html>
<head>
  <title>TensorFlow.js Model Deployment</title>
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@4.10.0/dist/tf.min.js"></script>
</head>
<body>
  <script>
    async function run() {
      // Load the model
      const model = await tf.loadLayersModel('tfjs_model/model.json');
      // Prepare test data
      const input = tf.randomNormal([1, 784]);
      // Run inference
      const output = model.predict(input);
      output.print();
    }
    run();
  </script>
</body>
</html>
5.3 Using the Model in Node.js
In a Node.js environment, use the @tensorflow/tfjs-node library:
// Install the dependencies first:
// npm install @tensorflow/tfjs @tensorflow/tfjs-node
const tf = require('@tensorflow/tfjs');
require('@tensorflow/tfjs-node');

async function run() {
  // Load the model
  const model = await tf.loadLayersModel('file://tfjs_model/model.json');
  // Prepare test data
  const input = tf.randomNormal([1, 784]);
  // Run inference
  const output = model.predict(input);
  output.print();
}
run();
6. Model Optimization
Before deployment, a model is usually optimized to improve performance and reduce resource usage. TensorFlow provides several model optimization techniques:
6.1 Model Quantization
Quantization converts a floating-point model into a fixed-point one (e.g., 8-bit integers), reducing model size and speeding up inference:
# Quantize with TensorFlow Lite
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/my_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Dynamic-range quantization (default)
tflite_model_dynamic = converter.convert()
# Full-integer quantization (requires calibration data)
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 784).astype(np.float32)]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model_int8 = converter.convert()
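Full-integer quantization maps each float tensor onto int8 codes through an affine scheme, q = round(x / scale) + zero_point. A minimal pure-Python sketch of that mapping (the scale and zero-point below are illustrative picks, not values TensorFlow Lite would actually compute from the calibration data):

```python
def quantize(x, scale, zero_point):
    """Affine quantization: map a float to an int8 code."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    """Recover an approximate float from the int8 code."""
    return (q - zero_point) * scale

# Illustrative parameters for values roughly in [-1, 1]
scale, zero_point = 1 / 127, 0
x = 0.5
q = quantize(x, scale, zero_point)
print(q, dequantize(q, scale, zero_point))  # small rounding error vs. 0.5
```

The rounding step is where quantization error comes from; the calibration data fed through representative_dataset is what lets the converter choose scales that keep this error small for realistic inputs.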
6.2 Model Pruning
Pruning removes unimportant weights and neurons from a model, shrinking it and speeding up inference:
import tensorflow_model_optimization as tfmot

# Create a simple model
model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])
# Apply pruning
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
# Define the pruning parameters
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.50,
                                                             final_sparsity=0.80,
                                                             begin_step=0,
                                                             end_step=1000)
}
# Create the pruned model
model_for_pruning = prune_low_magnitude(model, **pruning_params)
# Compile the model
model_for_pruning.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])
# Train the model (x_train and y_train are assumed to be defined)
model_for_pruning.fit(x_train, y_train, epochs=10)
# Strip the pruning wrappers before export
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
# Save the model
tf.saved_model.save(model_for_export, 'saved_model/pruned_model')
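The low-magnitude criterion behind prune_low_magnitude can be illustrated with plain NumPy: zero out the fraction of weights with the smallest absolute values. This is a toy sketch of the idea, not the tfmot implementation:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)
print(np.mean(pruned == 0))  # about half the weights are now zero
```

In tfmot the sparsity ramps from initial_sparsity to final_sparsity over training steps (the PolynomialDecay schedule above), so the network can recover accuracy as weights are gradually removed.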
6.3 Knowledge Distillation
Knowledge distillation transfers the knowledge of a complex model (the teacher) into a simple model (the student), reducing model complexity while largely preserving performance:
# Create the teacher model (complex)
teacher_model = tf.keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Train the teacher model
teacher_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
teacher_model.fit(x_train, y_train, epochs=20)
# Create the student model (simple)
student_model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])
# Define the knowledge distillation loss
# (temperature scaling is properly applied to logits; here it is applied to
# the models' softmax outputs as a simplification)
def distillation_loss(y_true, y_pred, teacher_pred, temperature=2.0, alpha=0.5):
    # Soft-target loss
    soft_loss = tf.keras.losses.categorical_crossentropy(
        tf.nn.softmax(teacher_pred / temperature),
        tf.nn.softmax(y_pred / temperature)
    ) * (temperature ** 2)
    # Hard-target loss
    hard_loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    # Total loss
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Train the student model (train_dataset is assumed to be defined)
optimizer = tf.keras.optimizers.Adam()
for epoch in range(20):
    for step, (x_batch, y_batch) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            # Teacher predictions
            teacher_pred = teacher_model(x_batch, training=False)
            # Student predictions
            student_pred = student_model(x_batch, training=True)
            # Compute the loss
            loss = distillation_loss(y_batch, student_pred, teacher_pred)
        # Compute the gradients
        gradients = tape.gradient(loss, student_model.trainable_variables)
        # Update the weights
        optimizer.apply_gradients(zip(gradients, student_model.trainable_variables))
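The role of the temperature in the soft-target loss can be seen in isolation with NumPy: dividing logits by a temperature above 1 flattens the softmax distribution, so the student also learns from the teacher's relative preferences among the wrong classes. The logit values below are made up purely for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.0])  # hypothetical teacher logits
p_sharp = softmax(logits)        # temperature = 1: peaked distribution
p_soft = softmax(logits / 4.0)   # temperature = 4: flattened distribution
print(np.round(p_sharp, 3), np.round(p_soft, 3))
```

Because the flattened targets carry smaller gradients, the soft loss is conventionally rescaled by temperature squared, as in distillation_loss above.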
7. Model Monitoring and Maintenance
Once a model is deployed to production, it needs monitoring and maintenance to ensure its performance and accuracy:
7.1 Monitoring Metrics
- Performance metrics: latency, throughput, resource usage (CPU, memory, GPU)
- Accuracy metrics: accuracy, precision, recall, F1 score, etc.
- Business metrics: conversion rate, click-through rate, user satisfaction, etc.
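As a concrete starting point for the performance metrics above, here is a small pure-Python latency tracker that records per-request latencies and reports percentiles. It is only a sketch; in production these numbers would usually be exported to a monitoring system such as Prometheus:

```python
import math
import time
from bisect import insort

class LatencyMonitor:
    """Collects request latencies in seconds and reports percentiles."""

    def __init__(self):
        self.samples = []  # kept sorted so percentile lookup is cheap

    def observe(self, seconds):
        insort(self.samples, seconds)

    def percentile(self, p):
        """Nearest-rank percentile for 0 < p <= 100."""
        rank = math.ceil(len(self.samples) * p / 100)
        return self.samples[min(len(self.samples) - 1, max(0, rank - 1))]

    def timed(self, fn, *args):
        """Call fn, record its wall-clock latency, and return its result."""
        start = time.perf_counter()
        result = fn(*args)
        self.observe(time.perf_counter() - start)
        return result

monitor = LatencyMonitor()
for ms in [12, 15, 11, 230, 14]:  # simulated request latencies in ms
    monitor.observe(ms / 1000)
print(monitor.percentile(50), monitor.percentile(99))  # median vs. tail
```

Tail percentiles (p95, p99) matter more than the average for serving: a single slow model version or cold cache shows up there first.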
7.2 modelupdate
当modelperformance under 降 or 业务requirements变化时, 需要updatemodel:
- in 线update: using new data in 线微调model
- 离线update: using new data重 new 训练model, 然 after deploymentupdate after model
- modelversionmanagement: usingTensorFlow Servingetc.toolmanagement many 个modelversion, supportA/Btest and 灰度release
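The version-management features mentioned above are driven by a model config file in TensorFlow Serving. A sketch (the paths and version numbers are illustrative) that pins two specific versions of a model so both can be served side by side for A/B testing:

```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}
```

The file is passed to the server via the --model_config_file flag, e.g. tensorflow_model_server --model_config_file=/path/models.config; without a version policy, Serving defaults to serving only the latest version found under base_path.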
8. Exercises
Exercise 1: Saving and Converting a Model
- Create a simple Keras model (for example, an MNIST handwritten-digit classifier)
- Train the model and save it in SavedModel format
- Convert the SavedModel to a TensorFlow Lite model
- Run inference with the TensorFlow Lite Interpreter
Exercise 2: TensorFlow Serving Deployment
- Start a TensorFlow Serving container with Docker
- Query the model via the REST API
- Query the model via the gRPC API
- Compare the performance of the REST and gRPC APIs
Exercise 3: Model Optimization
- Apply dynamic-range quantization to a model
- Apply full-integer quantization to a model
- Compare the size and performance of the original and quantized models