TensorFlow Model Deployment and Productionization

Learn how to deploy a trained model to a production environment, covering TensorFlow Serving, TensorFlow Lite, web deployment, and related methods.

1. Model Deployment Overview

Model deployment is the process of putting a trained machine learning model to work in a real production environment. TensorFlow offers several deployment options suited to different scenarios and devices. Choosing the right one depends on several factors, including:

  • Deployment environment: servers, mobile devices, embedded devices, web browsers, etc.
  • Performance requirements: latency, throughput, resource usage, etc.
  • Development cost: deployment complexity, maintenance cost, etc.
  • Ecosystem: how easily it integrates with existing systems, etc.

1.1 TensorFlow Deployment Solutions

TensorFlow provides several deployment solutions, the main ones being:

  • TensorFlow Serving: server-side deployment with model version management and REST/gRPC APIs
  • TensorFlow Lite: deployment on mobile and embedded devices, with small model size and high performance
  • TensorFlow.js: deployment in web browsers and Node.js, runnable from JavaScript
  • SavedModel: a universal model format usable in many deployment scenarios
  • ONNX: the Open Neural Network Exchange format, supporting cross-framework deployment

2. Model Saving Formats

Before deploying a model, the trained model must be saved in a suitable format. TensorFlow supports several model formats, of which SavedModel is the most commonly used.

2.1 The SavedModel Format

SavedModel is TensorFlow's standard model format. It contains the model's computation graph and weights and supports cross-platform deployment. The Keras API makes it easy to save a model in SavedModel format:

import tensorflow as tf
from tensorflow.keras import layers

# Create a simple model
model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Save in SavedModel format
tf.saved_model.save(model, 'saved_model/my_model')

After saving, a directory structure is generated containing the model's computation graph and weight files:

saved_model/
└── my_model/
    ├── assets/
    ├── saved_model.pb  # model computation graph
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index
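Once exported, a SavedModel can be loaded back and its serving signatures inspected from Python. The sketch below uses a minimal `tf.Module` (a stand-in for the Keras model above, so it runs on its own), saves it to a temporary directory, and verifies that the `serving_default` signature was exported:

```python
import tempfile

import tensorflow as tf

# A minimal serveable module (illustrative stand-in for the Keras model above)
class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None, 8], tf.float32)])
    def serve(self, x):
        return {'y': x * 2.0}

export_dir = tempfile.mkdtemp()
module = Doubler()
tf.saved_model.save(module, export_dir, signatures={'serving_default': module.serve})

# Reload the SavedModel and inspect the exported serving signatures
reloaded = tf.saved_model.load(export_dir)
print(sorted(reloaded.signatures.keys()))  # ['serving_default']
```

The signature names printed here are exactly what TensorFlow Serving exposes, so this is a quick sanity check before deployment.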

2.2 The HDF5 Format

Besides SavedModel, Keras also supports the HDF5 format, which is suitable for saving a complete Keras model:

# Save in HDF5 format
model.save('my_model.h5')

# Load an HDF5-format model
loaded_model = tf.keras.models.load_model('my_model.h5')

Tip

For Keras models, both the SavedModel and HDF5 formats work, but SavedModel is better suited to production deployment because it supports more flexible deployment options and offers better performance.

3. Deploying with TensorFlow Serving

TensorFlow Serving is a high-performance serving system for machine learning models, designed for production environments. It supports model version management, REST and gRPC APIs, and automatic hot-reloading of models.

3.1 Installing TensorFlow Serving

TensorFlow Serving can be deployed with Docker or installed directly:

3.1.1 Deploying with Docker

# Pull the TensorFlow Serving image
docker pull tensorflow/serving

# Start a TensorFlow Serving container
docker run -p 8501:8501 -p 8500:8500 \
  --mount type=bind,source="$(pwd)/saved_model/my_model",target=/models/my_model \
  -e MODEL_NAME=my_model -t tensorflow/serving

3.1.2 Direct Installation

On Ubuntu, it can be installed with apt-get:

echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install tensorflow-model-server

3.2 Accessing the Model via the REST API

Once TensorFlow Serving is running, the model can be queried through the REST API:

import requests
import numpy as np

# Prepare test data
test_data = np.random.rand(1, 784).tolist()

# Send the request
response = requests.post(
    'http://localhost:8501/v1/models/my_model:predict',
    json={'instances': test_data}
)

# Process the response
predictions = response.json()['predictions']
print(predictions)
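For reference, TensorFlow Serving's REST predict endpoint accepts two documented payload shapes: the row-oriented "instances" format used above, and a column-oriented "inputs" format keyed by tensor name. The sketch below builds both with the standard library only, with no server required; "dense_input" is a placeholder name that would in practice come from the model's serving signature:

```python
import json

# A single flattened 784-element input vector (zeros, for illustration)
example = [0.0] * 784

# Row format: a list of input rows under "instances"
row_payload = json.dumps({"instances": [example]})

# Columnar format: inputs keyed by tensor name under "inputs"
# ("dense_input" is a placeholder; the real name comes from the signature)
col_payload = json.dumps({"inputs": {"dense_input": [example]}})

print(len(json.loads(row_payload)["instances"][0]))  # 784
```

Either payload can be POSTed to the same `:predict` URL shown above; the row format is the simpler choice when all inputs share one tensor.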

3.3 Accessing the Model via the gRPC API

The gRPC API offers higher performance than the REST API and is suited to high-throughput scenarios:

import grpc
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import tensorflow as tf

# Create a gRPC channel
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build the request
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'
request.model_spec.signature_name = 'serving_default'

# Prepare test data (the input/output tensor names below come from the
# model's serving signature)
test_data = tf.random.normal([1, 784])
request.inputs['dense_input'].CopyFrom(
    tf.make_tensor_proto(test_data)
)

# Send the request with a 10-second timeout
response = stub.Predict(request, 10.0)

# Process the response
predictions = tf.make_ndarray(response.outputs['dense_1'])
print(predictions)

4. TensorFlow Lite Deployment

TensorFlow Lite is TensorFlow's lightweight solution, designed for mobile, embedded, and IoT devices. It offers smaller model sizes and faster inference.

4.1 Model Conversion

To use TensorFlow Lite, the SavedModel must be converted to the TensorFlow Lite model format (.tflite):

import tensorflow as tf

# Create a converter from the SavedModel directory
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/my_model')

# Optimize the model (optional)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Convert the model
tflite_model = converter.convert()

# Save the TensorFlow Lite model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

4.2 Model Inference

In a Python environment, inference can be run with the TensorFlow Lite Interpreter:

import tensorflow as tf
import numpy as np

# Load the TensorFlow Lite model
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

# Get the input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Prepare test data
input_data = np.random.rand(1, 784).astype(np.float32)

# Set the input tensor
interpreter.set_tensor(input_details[0]['index'], input_data)

# Run inference
interpreter.invoke()

# Get the output
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

4.3 Mobile Deployment

On mobile devices, the TensorFlow Lite Android or iOS libraries are used for deployment:

4.3.1 Android Deployment

Add the dependencies to the Android project's build.gradle:

dependencies {
    implementation 'org.tensorflow:tensorflow-lite:2.15.0'
    implementation 'org.tensorflow:tensorflow-lite-gpu:2.15.0'  // Optional: GPU acceleration
}

4.3.2 iOS Deployment

Add the dependency to the iOS project's Podfile:

pod 'TensorFlowLiteSwift', '~> 2.15.0'

5. TensorFlow.js Deployment

TensorFlow.js is the JavaScript implementation of TensorFlow and runs in web browsers and Node.js. It supports converting Keras models or SavedModels to the TensorFlow.js format.

5.1 Model Conversion

Use the tfjs-converter tool to convert a Keras model or SavedModel to a TensorFlow.js model:

# Install tfjs-converter
pip install tensorflowjs

# Convert a Keras model
tensorflowjs_converter --input_format=keras my_model.h5 tfjs_model

# Convert a SavedModel
tensorflowjs_converter --input_format=tf_saved_model saved_model/my_model tfjs_model

5.2 Using in the Browser

Include the TensorFlow.js library in the HTML file and load the model:

<!DOCTYPE html>
<html>
  <head>
    <title>TensorFlow.js Model Deployment</title>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@4.10.0/dist/tf.min.js"></script>
  </head>
  <body>
    <script>
      async function run() {
        // Load the model
        const model = await tf.loadLayersModel('tfjs_model/model.json');
        
        // Prepare test data
        const input = tf.randomNormal([1, 784]);
        
        // Run inference
        const output = model.predict(input);
        output.print();
      }
      
      run();
    </script>
  </body>
</html>

5.3 Using in Node.js

In a Node.js environment, use the @tensorflow/tfjs-node library:

// Install dependencies:
// npm install @tensorflow/tfjs @tensorflow/tfjs-node

const tf = require('@tensorflow/tfjs');
require('@tensorflow/tfjs-node');

async function run() {
  // Load the model
  const model = await tf.loadLayersModel('file://tfjs_model/model.json');
  
  // Prepare test data
  const input = tf.randomNormal([1, 784]);
  
  // Run inference
  const output = model.predict(input);
  output.print();
}

run();

6. Model Optimization

Before deployment, a model is usually optimized to improve performance and reduce resource usage. TensorFlow provides several model optimization techniques:

6.1 Model Quantization

Quantization converts a floating-point model to a fixed-point one (e.g., 8-bit integers), reducing model size and speeding up inference:

import numpy as np

# Quantize with TensorFlow Lite
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/my_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Dynamic-range quantization (the default)
tflite_model_dynamic = converter.convert()

# Full-integer quantization (requires calibration data)
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 784).astype(np.float32)]

converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model_int8 = converter.convert()
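To see what dynamic-range quantization does numerically, the NumPy-only sketch below (not the actual TFLite kernel, just the same underlying idea) applies symmetric per-tensor int8 quantization and checks that the round-trip error is bounded by half a quantization step:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)

# Symmetric per-tensor quantization: one float scale per tensor
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize and measure the round-trip error
dequantized = q.astype(np.float32) * scale
max_err = np.abs(weights - dequantized).max()
print(max_err <= scale / 2 + 1e-6)  # True: error bounded by half a step
```

This is why quantization shrinks models by roughly 4x (int8 vs. float32) at a small, bounded cost in precision per weight.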

6.2 Model Pruning

Pruning removes unimportant weights and neurons from a model to reduce its size and speed up inference:

import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow.keras import layers

# Create a simple model
model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])

# Apply pruning
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Define the pruning parameters
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.50,
                                                             final_sparsity=0.80,
                                                             begin_step=0,
                                                             end_step=1000)
}

# Create the pruned model
model_for_pruning = prune_low_magnitude(model, **pruning_params)

# Compile the model
model_for_pruning.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])

# Train the model (x_train and y_train are the training data;
# the UpdatePruningStep callback is required during training)
model_for_pruning.fit(x_train, y_train, epochs=10,
                      callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

# Save the model
tf.saved_model.save(model_for_export, 'saved_model/pruned_model')

6.3 Knowledge Distillation

Knowledge distillation transfers knowledge from a complex model (the teacher) to a simple model (the student), reducing model complexity while retaining much of the performance:

# Create the teacher model (complex)
teacher_model = tf.keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Train the teacher model
teacher_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
teacher_model.fit(x_train, y_train, epochs=20)

# Create the student model (simple)
student_model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax')
])

# Define the knowledge-distillation loss function
# (distillation is usually formulated on logits; here the models output
# softmax probabilities, so the temperature scaling is approximate)
def distillation_loss(y_true, y_pred, teacher_pred, temperature=2.0, alpha=0.5):
    # Soft-target loss
    soft_loss = tf.keras.losses.categorical_crossentropy(
        tf.nn.softmax(teacher_pred / temperature),
        tf.nn.softmax(y_pred / temperature)
    ) * (temperature ** 2)

    # Hard-target loss
    hard_loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)

    # Total loss
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Train the student model (train_dataset yields (x_batch, y_batch) pairs)
optimizer = tf.keras.optimizers.Adam()
for epoch in range(20):
    for step, (x_batch, y_batch) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            # Teacher predictions
            teacher_pred = teacher_model(x_batch, training=False)
            # Student predictions
            student_pred = student_model(x_batch, training=True)
            # Compute the loss
            loss = distillation_loss(y_batch, student_pred, teacher_pred)

        # Compute gradients
        gradients = tape.gradient(loss, student_model.trainable_variables)
        # Update the weights
        optimizer.apply_gradients(zip(gradients, student_model.trainable_variables))
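The role of the temperature in the soft-target loss can be seen in isolation: dividing scores by T > 1 before the softmax flattens the distribution, which is what exposes the teacher's relative preferences among non-target classes. A NumPy-only sketch, independent of the training loop above:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 2.0, 0.5])

p1 = softmax(logits, temperature=1.0)
p4 = softmax(logits, temperature=4.0)

# Higher temperature -> flatter (higher-entropy) distribution
print(p1.max() > p4.max())  # True
```

The factor of `temperature ** 2` in the loss above compensates for the smaller gradients produced by these softened targets, a standard detail in distillation setups.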

7. Model Monitoring and Maintenance

After a model is deployed to production, it must be monitored and maintained to ensure its performance and accuracy:

7.1 Monitoring Metrics

  • Performance metrics: latency, throughput, resource usage (CPU, memory, GPU)
  • Accuracy metrics: accuracy, precision, recall, F1 score, etc.
  • Business metrics: conversion rate, click-through rate, user satisfaction, etc.
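As a small sketch of how the performance metrics above might be aggregated, the snippet below computes latency percentiles from a list of hypothetical per-request timings (the values are made up for illustration):

```python
import numpy as np

# Hypothetical per-request latencies in milliseconds
latencies_ms = np.array([12.1, 9.8, 11.5, 48.0, 10.2,
                         10.9, 13.4, 9.5, 95.0, 11.1])

# Tail percentiles matter more than the mean for serving SLOs
p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])

print(round(float(p50), 1))  # median latency in ms
```

In production these numbers would typically come from a metrics system rather than an in-memory array, but the percentile view is the same.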

7.2 Model Updates

When model performance degrades or business requirements change, the model needs to be updated:

  • Online updates: fine-tune the model online with new data
  • Offline updates: retrain the model with new data, then deploy the updated model
  • Model version management: use tools such as TensorFlow Serving to manage multiple model versions, supporting A/B testing and canary releases
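As a sketch of version management, here is a hypothetical TensorFlow Serving models.config that pins two specific versions of my_model to serve side by side (e.g., for an A/B test); the file is passed to the server with the --model_config_file flag:

```
model_config_list {
  config {
    name: 'my_model'
    base_path: '/models/my_model'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}
```

Without a version policy, TensorFlow Serving defaults to serving only the latest version found under base_path.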

8. Exercises

Exercise 1: Saving and Converting a Model

  1. Create a simple Keras model (e.g., an MNIST handwritten-digit classifier)
  2. Train the model and save it in SavedModel format
  3. Convert the SavedModel to a TensorFlow Lite model
  4. Run inference with the TensorFlow Lite Interpreter

Exercise 2: TensorFlow Serving Deployment

  1. Start a TensorFlow Serving container with Docker
  2. Query the model via the REST API
  3. Query the model via the gRPC API
  4. Compare the performance of the REST and gRPC APIs

Exercise 3: Model Optimization

  1. Apply dynamic-range quantization to a model
  2. Apply full-integer quantization to a model
  3. Compare the size and performance of the original and quantized models