1. Model Deployment Overview
Model deployment is the process of putting a trained machine learning model into a production environment. TensorFlow offers several deployment options suited to different scenarios and devices. Choosing the right one depends on several factors, including:
- Deployment environment: servers, mobile devices, embedded devices, web browsers, etc.
- Performance requirements: latency, throughput, resource usage, etc.
- Development cost: deployment complexity, maintenance cost, etc.
- Ecosystem: how easily it integrates with existing systems, etc.
1.1 TensorFlow Deployment Solutions
TensorFlow provides several deployment solutions, the main ones being:
- TensorFlow Serving: server-side deployment with model version management and REST/gRPC APIs
- TensorFlow Lite: deployment on mobile and embedded devices, with small model size and high performance
- TensorFlow.js: deployment in web browsers and Node.js environments, running in JavaScript
- SavedModel: a general-purpose model format usable in many deployment scenarios
- ONNX: the Open Neural Network Exchange format, enabling cross-framework deployment
2. Model Save Formats
Before deploying a model, the trained model must be saved in a suitable format. TensorFlow supports several save formats, the most common being SavedModel.
2.1 SavedModel Format
SavedModel is TensorFlow's standard model save format. It contains the model's computation graph and weights and supports cross-platform deployment. With the Keras API, saving a model in SavedModel format is straightforward:
import tensorflow as tf
from tensorflow.keras import layers

# Create a simple model
model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Save in SavedModel format
tf.saved_model.save(model, 'saved_model/my_model')
After saving, a directory structure is generated containing the model's computation graph and weight files:
saved_model/
└── my_model/
├── assets/
├── saved_model.pb # model computation graph
└── variables/
├── variables.data-00000-of-00001
└── variables.index
2.2 HDF5 Format
Besides SavedModel, Keras also supports the HDF5 format, which is suitable for saving a complete Keras model:
# Save in HDF5 format
model.save('my_model.h5')
# Load the HDF5 model
loaded_model = tf.keras.models.load_model('my_model.h5')
Tip
For Keras models, both SavedModel and HDF5 can be used, but SavedModel is better suited for production deployment because it supports more flexible deployment options and better performance.
3. TensorFlow Serving Deployment
TensorFlow Serving is a high-performance serving system for machine learning models, designed for production environments. It supports model version management, REST and gRPC APIs, and automatic hot-reloading of models.
3.1 Installing TensorFlow Serving
TensorFlow Serving can be deployed with Docker or installed directly:
3.1.1 Deploying with Docker
# Pull the TensorFlow Serving image
docker pull tensorflow/serving
# Start a TensorFlow Serving container
docker run -p 8501:8501 -p 8500:8500 \
--mount type=bind,source="$(pwd)/saved_model/my_model",target=/models/my_model \
-e MODEL_NAME=my_model -t tensorflow/serving
3.1.2 Direct Installation
On Ubuntu, it can be installed with apt-get:
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install tensorflow-model-server
3.2 Accessing the Model via the REST API
Once TensorFlow Serving is running, the model can be queried through the REST API:
import requests
import numpy as np

# Prepare test data
test_data = np.random.rand(1, 784).tolist()
# Send the request
response = requests.post(
    'http://localhost:8501/v1/models/my_model:predict',
    json={'instances': test_data}
)
# Process the response
predictions = response.json()['predictions']
print(predictions)
3.3 Accessing the Model via the gRPC API
The gRPC API offers higher performance than the REST API and suits high-throughput scenarios:
import grpc
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import tensorflow as tf

# Create a gRPC channel
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
# Build the request
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'
# Prepare test data (the input key must match the model's signature)
test_data = tf.random.normal([1, 784])
request.inputs['dense_input'].CopyFrom(
    tf.make_tensor_proto(test_data)
)
# Send the request with a 10-second timeout
response = stub.Predict(request, 10.0)
# Process the response (the output key must match the model's signature)
predictions = tf.make_ndarray(response.outputs['dense_1'])
print(predictions)
4. TensorFlow Lite Deployment
TensorFlow Lite is TensorFlow's lightweight solution, designed for mobile, embedded, and IoT devices. It offers smaller model sizes and faster inference.
4.1 Model Conversion
To use TensorFlow Lite, convert the SavedModel to the TensorFlow Lite format (.tflite):
import tensorflow as tf

# Create a converter from the SavedModel
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/my_model')
# Optimize the model (optional)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Convert the model
tflite_model = converter.convert()
# Save the TensorFlow Lite model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
4.2 Model Inference
In a Python environment, the TensorFlow Lite Interpreter can run inference:
import tensorflow as tf
import numpy as np

# Load the TensorFlow Lite model
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
# Get the input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Prepare test data
input_data = np.random.rand(1, 784).astype(np.float32)
# Set the input data
interpreter.set_tensor(input_details[0]['index'], input_data)
# Run inference
interpreter.invoke()
# Read the output
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
4.3 Mobile Deployment
On mobile devices, use the TensorFlow Lite Android or iOS libraries:
4.3.1 Android Deployment
Add the dependency to the Android project's build.gradle:
dependencies {
    implementation 'org.tensorflow:tensorflow-lite:2.15.0'
    implementation 'org.tensorflow:tensorflow-lite-gpu:2.15.0' // optional, for GPU acceleration
}
4.3.2 iOS Deployment
Add the dependency to the iOS project's Podfile:
pod 'TensorFlowLiteSwift', '~> 2.15.0'
5. TensorFlow.js Deployment
TensorFlow.js is the JavaScript implementation of TensorFlow; it runs in web browsers and in Node.js. It supports converting Keras models or SavedModels into TensorFlow.js models.
5.1 Model Conversion
Use the tfjs-converter tool to convert a Keras model or SavedModel to TensorFlow.js format:
# Install tfjs-converter
pip install tensorflowjs
# Convert a Keras model
tensorflowjs_converter --input_format=keras my_model.h5 tfjs_model
# Convert a SavedModel
tensorflowjs_converter --input_format=tf_saved_model saved_model/my_model tfjs_model
5.2 Using the Model in the Browser
Include the TensorFlow.js library in an HTML file and load the model:
<!DOCTYPE html>
<html>
<head>
  <title>TensorFlow.js Model Deployment</title>
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@4.10.0/dist/tf.min.js"></script>
</head>
<body>
  <script>
    async function run() {
      // Load the model
      const model = await tf.loadLayersModel('tfjs_model/model.json');
      // Prepare test data
      const input = tf.randomNormal([1, 784]);
      // Run inference
      const output = model.predict(input);
      output.print();
    }
    run();
  </script>
</body>
</html>
5.3 Using the Model in Node.js
In a Node.js environment, use the @tensorflow/tfjs-node library:
// Install the dependencies first:
// npm install @tensorflow/tfjs @tensorflow/tfjs-node
const tf = require('@tensorflow/tfjs');
require('@tensorflow/tfjs-node');

async function run() {
  // Load the model
  const model = await tf.loadLayersModel('file://tfjs_model/model.json');
  // Prepare test data
  const input = tf.randomNormal([1, 784]);
  // Run inference
  const output = model.predict(input);
  output.print();
}
run();
6. Model Optimization
Before deployment, a model is usually optimized to improve performance and reduce resource usage. TensorFlow provides several model optimization techniques:
6.1 Model Quantization
Quantization converts a floating-point model into a fixed-point one (e.g., 8-bit integers), reducing model size and speeding up inference:
# Quantize with TensorFlow Lite
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/my_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Dynamic-range quantization (default)
tflite_model_dynamic = converter.convert()
# Full-integer quantization (requires calibration data)
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 784).astype(np.float32)]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model_int8 = converter.convert()
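Full-integer quantization maps each float tensor onto int8 codes through an affine scheme, q = round(x / scale) + zero_point. A minimal pure-Python sketch of that mapping (the scale and zero-point below are illustrative picks, not values TensorFlow Lite would actually compute from the calibration data):

```python
def quantize(x, scale, zero_point):
    """Affine quantization: map a float to an int8 code."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    """Recover an approximate float from the int8 code."""
    return (q - zero_point) * scale

# Illustrative parameters for values roughly in [-1, 1]
scale, zero_point = 1 / 127, 0
x = 0.5
q = quantize(x, scale, zero_point)
print(q, dequantize(q, scale, zero_point))  # small rounding error vs. 0.5
```

The rounding step is where quantization error comes from; the calibration data fed through representative_dataset is what lets the converter choose scales that keep this error small for realistic inputs.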
6.2 Model Pruning
Pruning removes unimportant weights and neurons from a model, shrinking it and speeding up inference:
import tensorflow_model_optimization as tfmot

# Create a simple model
model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])
# Apply pruning
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
# Define the pruning parameters
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.50,
                                                             final_sparsity=0.80,
                                                             begin_step=0,
                                                             end_step=1000)
}
# Create the pruned model
model_for_pruning = prune_low_magnitude(model, **pruning_params)
# Compile the model
model_for_pruning.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])
# Train the model (x_train and y_train are assumed to be defined)
model_for_pruning.fit(x_train, y_train, epochs=10)
# Strip the pruning wrappers before export
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
# Save the model
tf.saved_model.save(model_for_export, 'saved_model/pruned_model')
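The low-magnitude criterion behind prune_low_magnitude can be illustrated with plain NumPy: zero out the fraction of weights with the smallest absolute values. This is a toy sketch of the idea, not the tfmot implementation:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)
print(np.mean(pruned == 0))  # about half the weights are now zero
```

In tfmot the sparsity ramps from initial_sparsity to final_sparsity over training steps (the PolynomialDecay schedule above), so the network can recover accuracy as weights are gradually removed.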
6.3 Knowledge Distillation
Knowledge distillation transfers the knowledge of a complex model (the teacher) into a simple model (the student), reducing model complexity while largely preserving performance:
# Create the teacher model (complex)
teacher_model = tf.keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Train the teacher model
teacher_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
teacher_model.fit(x_train, y_train, epochs=20)
# Create the student model (simple)
student_model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])
# Define the knowledge distillation loss
# (temperature scaling is properly applied to logits; here it is applied to
# the models' softmax outputs as a simplification)
def distillation_loss(y_true, y_pred, teacher_pred, temperature=2.0, alpha=0.5):
    # Soft-target loss
    soft_loss = tf.keras.losses.categorical_crossentropy(
        tf.nn.softmax(teacher_pred / temperature),
        tf.nn.softmax(y_pred / temperature)
    ) * (temperature ** 2)
    # Hard-target loss
    hard_loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    # Total loss
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Train the student model (train_dataset is assumed to be defined)
optimizer = tf.keras.optimizers.Adam()
for epoch in range(20):
    for step, (x_batch, y_batch) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            # Teacher predictions
            teacher_pred = teacher_model(x_batch, training=False)
            # Student predictions
            student_pred = student_model(x_batch, training=True)
            # Compute the loss
            loss = distillation_loss(y_batch, student_pred, teacher_pred)
        # Compute the gradients
        gradients = tape.gradient(loss, student_model.trainable_variables)
        # Update the weights
        optimizer.apply_gradients(zip(gradients, student_model.trainable_variables))
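The role of the temperature in the soft-target loss can be seen in isolation with NumPy: dividing logits by a temperature above 1 flattens the softmax distribution, so the student also learns from the teacher's relative preferences among the wrong classes. The logit values below are made up purely for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.0])  # hypothetical teacher logits
p_sharp = softmax(logits)        # temperature = 1: peaked distribution
p_soft = softmax(logits / 4.0)   # temperature = 4: flattened distribution
print(np.round(p_sharp, 3), np.round(p_soft, 3))
```

Because the flattened targets carry smaller gradients, the soft loss is conventionally rescaled by temperature squared, as in distillation_loss above.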
7. Model Monitoring and Maintenance
Once a model is deployed to production, it needs monitoring and maintenance to ensure its performance and accuracy:
7.1 Monitoring Metrics
- Performance metrics: latency, throughput, resource usage (CPU, memory, GPU)
- Accuracy metrics: accuracy, precision, recall, F1 score, etc.
- Business metrics: conversion rate, click-through rate, user satisfaction, etc.
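As a concrete starting point for the performance metrics above, here is a small pure-Python latency tracker that records per-request latencies and reports percentiles. It is only a sketch; in production these numbers would usually be exported to a monitoring system such as Prometheus:

```python
import math
import time
from bisect import insort

class LatencyMonitor:
    """Collects request latencies in seconds and reports percentiles."""

    def __init__(self):
        self.samples = []  # kept sorted so percentile lookup is cheap

    def observe(self, seconds):
        insort(self.samples, seconds)

    def percentile(self, p):
        """Nearest-rank percentile for 0 < p <= 100."""
        rank = math.ceil(len(self.samples) * p / 100)
        return self.samples[min(len(self.samples) - 1, max(0, rank - 1))]

    def timed(self, fn, *args):
        """Call fn, record its wall-clock latency, and return its result."""
        start = time.perf_counter()
        result = fn(*args)
        self.observe(time.perf_counter() - start)
        return result

monitor = LatencyMonitor()
for ms in [12, 15, 11, 230, 14]:  # simulated request latencies in ms
    monitor.observe(ms / 1000)
print(monitor.percentile(50), monitor.percentile(99))  # median vs. tail
```

Tail percentiles (p95, p99) matter more than the average for serving: a single slow model version or cold cache shows up there first.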
7.2 modelupdate
当modelperformance under 降 or 业务requirements变化时, 需要updatemodel:
- in 线update: using new data in 线微调model
- 离线update: using new data重 new 训练model, 然 after deploymentupdate after model
- modelversionmanagement: usingTensorFlow Servingetc.toolmanagement many 个modelversion, supportA/Btest and 灰度release
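The version-management features mentioned above are driven by a model config file in TensorFlow Serving. A sketch (the paths and version numbers are illustrative) that pins two specific versions of a model so both can be served side by side for A/B testing:

```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}
```

The file is passed to the server via the --model_config_file flag, e.g. tensorflow_model_server --model_config_file=/path/models.config; without a version policy, Serving defaults to serving only the latest version found under base_path.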
8. Exercises
Exercise 1: Saving and Converting a Model
- Create a simple Keras model (for example, an MNIST handwritten-digit classifier)
- Train the model and save it in SavedModel format
- Convert the SavedModel to a TensorFlow Lite model
- Run inference with the TensorFlow Lite Interpreter
Exercise 2: TensorFlow Serving Deployment
- Start a TensorFlow Serving container with Docker
- Query the model via the REST API
- Query the model via the gRPC API
- Compare the performance of the REST and gRPC APIs
Exercise 3: Model Optimization
- Apply dynamic-range quantization to a model
- Apply full-integer quantization to a model
- Compare the size and performance of the original and quantized models