TensorFlow Convolutional Neural Networks (CNN)

A detailed introduction to the principles and applications of convolutional neural networks, covering convolutional layers, pooling layers, activation functions, and implementations of classic CNN architectures.

1. Introduction to Convolutional Neural Networks

A convolutional neural network (CNN) is a deep learning model designed for data with a grid-like structure. It has achieved great success in image recognition, computer vision, and related fields. Through local connectivity, weight sharing, and pooling, a CNN can extract local image features efficiently and exhibits good translation invariance.

The main characteristics of CNNs include:

  • Local connectivity: each neuron is connected only to a local region of the input
  • Weight sharing: a convolution kernel's weights are shared across the entire input
  • Pooling: downsampling reduces the dimensions of the feature maps while retaining important information
  • Translation invariance: the same feature can be recognized at different positions in an image
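The payoff of local connectivity and weight sharing shows up directly in parameter counts. A quick sketch comparing a small convolutional layer with a fully connected layer over the same 28x28 image:

```python
import tensorflow as tf

# A conv layer's parameter count depends only on its kernels, not the image size
conv = tf.keras.Sequential([
    tf.keras.layers.Input((28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3)),
])
print(conv.count_params())  # 32 * (3*3*1 weights + 1 bias) = 320

# A fully connected layer over the same flattened image needs far more parameters
dense = tf.keras.Sequential([
    tf.keras.layers.Input((28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32),
])
print(dense.count_params())  # 784 * 32 + 32 = 25120
```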

2. Basic Components of a CNN

A CNN is mainly composed of the following layer types:

2.1 Convolutional Layer

The convolutional layer is the core component of a CNN, used to extract local features from the input data. It convolves the input with a kernel (also called a filter) to produce a feature map.

2.1.1 How the Convolution Operation Works

The convolution operation slides the kernel over the input, multiplying element-wise and summing at each position to produce the output feature map. For a 2D image, the operation (in the cross-correlation form used by deep learning libraries) can be written as:

Y[i, j] = Σₖ Σₗ X[i+k, j+l] · K[k, l]

where:

  • X is the input image
  • K is the convolution kernel
  • Y is the output feature map
  • i, j index positions in the output feature map
  • k, l index positions within the kernel
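The formula translates almost line for line into NumPy. A minimal sketch of a "valid" convolution (no padding, stride 1):

```python
import numpy as np

def conv2d(X, K):
    """Valid 2D cross-correlation, directly following the formula above."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Element-wise product of the kernel with the window at (i, j), then sum
            Y[i, j] = np.sum(X[i:i + h, j:j + w] * K)
    return Y

X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
K = np.array([[1., 0.],
              [0., -1.]])
print(conv2d(X, K))
# [[-4. -4.]
#  [-4. -4.]]
```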

2.1.2 Convolutional Layer Parameters

The main parameters of a convolutional layer include:

  • Number of kernels: determines the depth of the output feature map
  • Kernel size: usually an odd size such as 3x3 or 5x5
  • Stride: the step size with which the kernel slides over the input
  • Padding: zeros added around the input borders to control the output size
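These parameters determine the output size. For input size I, kernel size K, padding P, and stride S, the output size is O = ⌊(I − K + 2P) / S⌋ + 1. A quick shape check in TensorFlow:

```python
import tensorflow as tf

x = tf.zeros((1, 28, 28, 1))  # one 28x28 single-channel dummy image

# 'valid' (no padding), stride 1: O = (28 - 3)/1 + 1 = 26
valid = tf.keras.layers.Conv2D(8, (3, 3), strides=1, padding='valid')(x)
print(valid.shape)  # (1, 26, 26, 8)

# 'same' padding, stride 2: output is ceil(28 / 2) = 14
same = tf.keras.layers.Conv2D(8, (3, 3), strides=2, padding='same')(x)
print(same.shape)  # (1, 14, 14, 8)
```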

2.1.3 Convolutional Layers in TensorFlow

# Create a convolutional layer in TensorFlow
conv_layer = tf.keras.layers.Conv2D(
    filters=32,  # number of kernels
    kernel_size=(3, 3),  # kernel size
    strides=(1, 1),  # stride
    padding='same',  # padding mode: 'same' or 'valid'
    activation='relu',  # activation function
    input_shape=(28, 28, 1)  # input shape
)

2.2 Activation Functions

Activation functions introduce non-linearity, enabling the network to learn complex non-linear relationships. Commonly used activation functions include:

2.2.1 ReLU

ReLU (Rectified Linear Unit) is the most widely used activation function. It sets all negative values to zero and leaves positive values unchanged:

ReLU(x) = max(0, x)

# Using the ReLU activation function
relu_layer = tf.keras.layers.ReLU()

2.2.2 Leaky ReLU

Leaky ReLU addresses the "dying ReLU" problem by giving negative inputs a small slope:

LeakyReLU(x) = max(αx, x), where α is typically 0.01

# Using the Leaky ReLU activation function
leaky_relu_layer = tf.keras.layers.LeakyReLU(alpha=0.01)

2.2.3 ELU

ELU (Exponential Linear Unit) combines the advantages of ReLU and Leaky ReLU, with a smooth negative region:

ELU(x) = x for x > 0, and α(eˣ − 1) for x ≤ 0

# Using the ELU activation function
elu_layer = tf.keras.layers.ELU(alpha=1.0)
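A minimal sketch comparing the three activations on the same inputs makes the differences concrete:

```python
import tensorflow as tf

x = tf.constant([-2.0, -1.0, 0.0, 1.0, 2.0])

relu_out = tf.keras.layers.ReLU()(x)
leaky_out = tf.keras.layers.LeakyReLU(alpha=0.01)(x)
elu_out = tf.keras.layers.ELU(alpha=1.0)(x)

print(relu_out.numpy())   # negatives clipped to zero: [0. 0. 0. 1. 2.]
print(leaky_out.numpy())  # negatives scaled by 0.01: [-0.02 -0.01 0. 1. 2.]
print(elu_out.numpy())    # smooth exponential for x < 0: roughly [-0.86 -0.63 0. 1. 2.]
```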

2.3 Pooling Layer

Pooling layers downsample the feature maps, reducing their spatial dimensions while retaining important information. Common pooling operations include max pooling and average pooling.

2.3.1 Max Pooling

Max pooling takes the maximum value within each local region:

# Using a max pooling layer
max_pool_layer = tf.keras.layers.MaxPooling2D(
    pool_size=(2, 2),  # pooling window size
    strides=(2, 2),  # stride
    padding='valid'  # padding mode
)

2.3.2 Average Pooling

Average pooling takes the mean value within each local region:

# Using an average pooling layer
avg_pool_layer = tf.keras.layers.AveragePooling2D(
    pool_size=(2, 2),
    strides=(2, 2),
    padding='valid'
)
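A small numeric example makes the difference concrete; pooling a single 4x4 feature map with both layers:

```python
import tensorflow as tf

# A single 4x4 feature map with one channel, values 1..16
x = tf.reshape(tf.range(1., 17.), (1, 4, 4, 1))

max_out = tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
print(tf.squeeze(max_out).numpy())
# [[ 6.  8.]
#  [14. 16.]]

avg_out = tf.keras.layers.AveragePooling2D((2, 2), strides=(2, 2))(x)
print(tf.squeeze(avg_out).numpy())
# [[ 3.5  5.5]
#  [11.5 13.5]]
```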

2.4 Batch Normalization Layer

Batch normalization normalizes the feature maps, which speeds up model convergence and helps reduce overfitting:

# Using a batch normalization layer
bn_layer = tf.keras.layers.BatchNormalization()
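A minimal sketch of the effect: in training mode, batch normalization standardizes each feature using the statistics of the current batch:

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()

# A batch whose features are far from zero mean / unit variance
x = tf.random.normal((256, 4), mean=5.0, stddev=3.0)

# training=True uses batch statistics; at inference, running averages are used instead
y = bn(x, training=True)
print(float(tf.reduce_mean(y)))      # close to 0
print(float(tf.math.reduce_std(y)))  # close to 1
```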

2.5 Fully Connected Layer

Fully connected layers map the features extracted by the convolutional layers to the final output classes:

# Using a fully connected (dense) layer
fc_layer = tf.keras.layers.Dense(
    units=128,  # number of neurons
    activation='relu'  # activation function
)

2.6 Dropout Layer

The Dropout layer provides regularization to help prevent overfitting:

# Using a Dropout layer
dropout_layer = tf.keras.layers.Dropout(rate=0.5)  # drop probability
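Dropout is only active during training; at inference it passes inputs through unchanged. A quick sketch:

```python
import tensorflow as tf

dropout = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 10))

# Inference mode: Dropout is a no-op
infer_out = dropout(x, training=False)
print(infer_out.numpy())  # all ones

# Training mode: ~half the units are zeroed, survivors scaled by 1/(1-rate) = 2
train_out = dropout(x, training=True)
print(train_out.numpy())  # a mix of 0.0 and 2.0
```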

3. Building a Simple CNN Model

Now let's use TensorFlow to build a simple CNN model for MNIST handwritten digit recognition:

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Data preprocessing
X_train = X_train.reshape((-1, 28, 28, 1))  # reshape to a 4D tensor (samples, height, width, channels)
X_test = X_test.reshape((-1, 28, 28, 1))
X_train = X_train.astype('float32') / 255.0  # normalize to the [0, 1] range
X_test = X_test.astype('float32') / 255.0
y_train = to_categorical(y_train)  # one-hot encode the labels
Y_test = to_categorical(y_test)

# Build the CNN model
model = tf.keras.Sequential([
    # First convolutional block
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2)),
    
    # Second convolutional block
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2)),
    
    # Third convolutional block
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2)),
    
    # Flatten layer
    tf.keras.layers.Flatten(),
    
    # Fully connected layer
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    
    # Output layer
    tf.keras.layers.Dense(10, activation='softmax')
])

# View the model summary
model.summary()

3.1 Compiling and Training the Model

# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=64,
    validation_data=(X_test, Y_test),
    verbose=1
)

# Evaluate the model
loss, accuracy = model.evaluate(X_test, Y_test, verbose=1)
print(f"Test loss: {loss}")
print(f"Test accuracy: {accuracy}")

3.2 Visualizing the Training Process

import matplotlib.pyplot as plt

# Plot the training curves
plt.figure(figsize=(12, 4))

# Loss curves
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Model Loss')
plt.legend()

# Accuracy curves
plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training accuracy')
plt.plot(history.history['val_accuracy'], label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Model Accuracy')
plt.legend()

plt.tight_layout()
plt.show()

4. Classic CNN Architectures

4.1 LeNet-5

LeNet-5 is one of the earliest CNN architectures, proposed by Yann LeCun et al. in 1998 for handwritten digit recognition.

# Implementation of the LeNet-5 model
lenet5 = tf.keras.Sequential([
    # C1: convolutional layer with six 5x5 kernels
    tf.keras.layers.Conv2D(6, (5, 5), activation='tanh', padding='same', input_shape=(28, 28, 1)),
    # S2: average pooling layer with a 2x2 window
    tf.keras.layers.AveragePooling2D((2, 2), strides=(2, 2)),
    # C3: convolutional layer with sixteen 5x5 kernels
    tf.keras.layers.Conv2D(16, (5, 5), activation='tanh'),
    # S4: average pooling layer with a 2x2 window
    tf.keras.layers.AveragePooling2D((2, 2), strides=(2, 2)),
    # C5: convolutional layer with 120 5x5 kernels
    tf.keras.layers.Conv2D(120, (5, 5), activation='tanh'),
    # Flatten layer
    tf.keras.layers.Flatten(),
    # F6: fully connected layer with 84 neurons
    tf.keras.layers.Dense(84, activation='tanh'),
    # Output layer with 10 neurons
    tf.keras.layers.Dense(10, activation='softmax')
])
])

lenet5.summary()
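As a sanity check, the shapes can be traced through the network (a sketch rebuilding the same model with per-layer shape comments); the total comes out to the classic figure of 61,706 parameters:

```python
import tensorflow as tf

lenet5 = tf.keras.Sequential([
    tf.keras.layers.Input((28, 28, 1)),
    tf.keras.layers.Conv2D(6, (5, 5), activation='tanh', padding='same'),  # -> 28x28x6
    tf.keras.layers.AveragePooling2D((2, 2), strides=(2, 2)),              # -> 14x14x6
    tf.keras.layers.Conv2D(16, (5, 5), activation='tanh'),                 # -> 10x10x16
    tf.keras.layers.AveragePooling2D((2, 2), strides=(2, 2)),              # -> 5x5x16
    tf.keras.layers.Conv2D(120, (5, 5), activation='tanh'),                # -> 1x1x120
    tf.keras.layers.Flatten(),                                             # -> 120
    tf.keras.layers.Dense(84, activation='tanh'),                          # -> 84
    tf.keras.layers.Dense(10, activation='softmax'),                       # -> 10
])

print(lenet5.count_params())  # 61706
print(lenet5(tf.zeros((1, 28, 28, 1))).shape)  # (1, 10)
```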

4.2 AlexNet

AlexNet is a CNN architecture proposed by Alex Krizhevsky et al. in 2012 that achieved a breakthrough result in the ImageNet competition.

# Implementation of an AlexNet-style model
alexnet = tf.keras.Sequential([
    # Layer 1: convolution + ReLU + pooling
    tf.keras.layers.Conv2D(96, (11, 11), strides=(4, 4), activation='relu', input_shape=(227, 227, 3)),
    tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2)),
    tf.keras.layers.BatchNormalization(),
    
    # Layer 2: convolution + ReLU + pooling
    tf.keras.layers.Conv2D(256, (5, 5), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2)),
    tf.keras.layers.BatchNormalization(),
    
    # Layer 3: convolution + ReLU
    tf.keras.layers.Conv2D(384, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.BatchNormalization(),
    
    # Layer 4: convolution + ReLU
    tf.keras.layers.Conv2D(384, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.BatchNormalization(),
    
    # Layer 5: convolution + ReLU + pooling
    tf.keras.layers.Conv2D(256, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2)),
    tf.keras.layers.BatchNormalization(),
    
    # Fully connected layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    
    # Output layer
    tf.keras.layers.Dense(1000, activation='softmax')
])
])

alexnet.summary()

4.3 VGGNet

VGGNet is a CNN architecture from the Visual Geometry Group at the University of Oxford, known for its simple design and strong performance.

# Implementation of the VGG16 model
def vgg_block(num_convs, num_filters):
    """Create a VGG block."""
    block = tf.keras.Sequential()
    for _ in range(num_convs):
        block.add(tf.keras.layers.Conv2D(num_filters, (3, 3), padding='same', activation='relu'))
    block.add(tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2)))
    return block

# Build the VGG16 model
vgg16 = tf.keras.Sequential([
    # Input layer
    tf.keras.layers.Input(shape=(224, 224, 3)),
    
    # First VGG block: 2 convolutional layers, 64 filters
    vgg_block(2, 64),
    
    # Second VGG block: 2 convolutional layers, 128 filters
    vgg_block(2, 128),
    
    # Third VGG block: 3 convolutional layers, 256 filters
    vgg_block(3, 256),
    
    # Fourth VGG block: 3 convolutional layers, 512 filters
    vgg_block(3, 512),
    
    # Fifth VGG block: 3 convolutional layers, 512 filters
    vgg_block(3, 512),
    
    # Fully connected layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    
    # Output layer
    tf.keras.layers.Dense(1000, activation='softmax')
])

vgg16.summary()
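The regularity of the design is easy to verify: each block keeps the spatial size through its 3x3 'same' convolutions and then halves it with pooling. A sketch (redefining the helper so the snippet is self-contained):

```python
import tensorflow as tf

def vgg_block(num_convs, num_filters):
    """Stack of 3x3 'same' convolutions followed by 2x2 max pooling."""
    block = tf.keras.Sequential()
    for _ in range(num_convs):
        block.add(tf.keras.layers.Conv2D(num_filters, (3, 3), padding='same', activation='relu'))
    block.add(tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2)))
    return block

x = tf.zeros((1, 224, 224, 3))
y = vgg_block(2, 64)(x)
print(y.shape)  # (1, 112, 112, 64): convolutions keep 224x224, pooling halves it
```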

4.4 ResNet

ResNet (Residual Network) is a CNN architecture from Microsoft Research that uses residual connections to overcome the difficulty of training very deep networks.

# Residual block implementation for ResNet
def residual_block(x, filters, stride=1, downsample=None):
    """Create a residual block."""
    identity = x
    
    # First convolutional layer (conv -> BN -> ReLU)
    x = tf.keras.layers.Conv2D(filters, (3, 3), strides=stride, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    
    # Second convolutional layer (conv -> BN, no activation before the addition)
    x = tf.keras.layers.Conv2D(filters, (3, 3), padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    
    # Downsample the identity branch if needed (when the spatial size or depth changes)
    if downsample is not None:
        identity = downsample(identity)
    
    # Residual (skip) connection
    x = tf.keras.layers.Add()([x, identity])
    x = tf.keras.layers.Activation('relu')(x)
    
    return x

# Build a ResNet-18 model
def build_resnet18(input_shape=(32, 32, 3), num_classes=10):
    inputs = tf.keras.Input(shape=input_shape)
    
    # Initial convolutional layer
    x = tf.keras.layers.Conv2D(64, (3, 3), padding='same')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    
    # Residual stages
    # First group of residual blocks
    x = residual_block(x, 64)
    x = residual_block(x, 64)
    
    # Second group of residual blocks (stride 2, depth 64 -> 128)
    downsample1 = tf.keras.Sequential([
        tf.keras.layers.Conv2D(128, (1, 1), strides=2, padding='same'),
        tf.keras.layers.BatchNormalization()
    ])
    x = residual_block(x, 128, stride=2, downsample=downsample1)
    x = residual_block(x, 128)
    
    # Third group of residual blocks
    downsample2 = tf.keras.Sequential([
        tf.keras.layers.Conv2D(256, (1, 1), strides=2, padding='same'),
        tf.keras.layers.BatchNormalization()
    ])
    x = residual_block(x, 256, stride=2, downsample=downsample2)
    x = residual_block(x, 256)
    
    # Fourth group of residual blocks
    downsample3 = tf.keras.Sequential([
        tf.keras.layers.Conv2D(512, (1, 1), strides=2, padding='same'),
        tf.keras.layers.BatchNormalization()
    ])
    x = residual_block(x, 512, stride=2, downsample=downsample3)
    x = residual_block(x, 512)
    
    # Global average pooling
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    
    # Output layer
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
    
    # Create the model
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model

# Create a ResNet-18 model
resnet18 = build_resnet18(input_shape=(32, 32, 3), num_classes=10)
resnet18.summary()

5. Advanced CNN Techniques

5.1 Data Augmentation

Data augmentation is an important technique for improving a CNN's generalization: applying random transformations to the training data increases its diversity.

# Image data augmentation
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomHeight(0.1),
    tf.keras.layers.RandomWidth(0.1),
])

# Use data augmentation inside a model
augmented_model = tf.keras.Sequential([
    data_augmentation,
    # subsequent layers...
])
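Like Dropout, these augmentation layers are active only in training mode, so the same model can be used unchanged at inference time. A small sketch:

```python
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.1),
])

x = tf.random.uniform((4, 32, 32, 3))

train_out = augment(x, training=True)   # randomly transformed images
infer_out = augment(x, training=False)  # passed through unchanged
print(bool(tf.reduce_all(infer_out == x)))  # True
```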

5.2 Using Pretrained Models

A pretrained model is one that has already been trained on a large dataset; with transfer learning it can be reused for a new task.

# Load a pretrained VGG16 model
base_model = tf.keras.applications.VGG16(
    weights='imagenet',  # use ImageNet pretrained weights
    include_top=False,  # exclude the top classifier layers
    input_shape=(224, 224, 3)  # input shape
)

# Freeze the base model's weights
base_model.trainable = False

# Add custom top layers
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(1024, activation='relu')(x)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)

# Create the model
model = tf.keras.Model(inputs, outputs)

# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
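Freezing works the same way on any Keras model. A minimal stand-in model (hypothetical, used here instead of the real VGG16 to avoid downloading weights) shows what base_model.trainable = False does:

```python
import tensorflow as tf

# A tiny stand-in "base model" (not the real VGG16)
base = tf.keras.Sequential([
    tf.keras.layers.Input((8, 8, 3)),
    tf.keras.layers.Conv2D(4, (3, 3)),
])

n_before = len(base.trainable_weights)  # 2: the conv kernel and its bias
base.trainable = False
n_after = len(base.trainable_weights)   # 0: frozen weights get no gradient updates

print(n_before, n_after)  # 2 0
```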

5.3 Fine-Tuning a Pretrained Model

Fine-tuning means unfreezing some layers of the pretrained model and training them together with the custom top layers, so the model adapts to the new task.

# Unfreeze the base model
base_model.trainable = True

# Freeze every layer except the last four, so only the top of the base model is fine-tuned
for layer in base_model.layers[:-4]:
    layer.trainable = False

# Recompile the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # use a smaller learning rate
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Fine-tune the model
history_fine = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=32,
    validation_data=(X_test, Y_test),
    verbose=1
)

6. CNN Application Examples

6.1 Image Classification

Image classification is the classic CNN application. Below is an example that classifies CIFAR-10 images with a CNN:

# CIFAR-10 image classification
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load the CIFAR-10 dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# Data preprocessing
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Build the CNN model
model = tf.keras.Sequential([
    # First convolutional block
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),
    
    # Second convolutional block
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.3),
    
    # Third convolutional block
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.4),
    
    # Fully connected layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    
    # Output layer
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=64,
    validation_data=(X_test, y_test),
    verbose=1
)

7. Exercises

Exercise 1: Build a CNN Model

  1. Build a CNN model on the MNIST dataset with at least two convolutional blocks.
  2. Add data augmentation to improve model performance.
  3. Try different kernel sizes and numbers of kernels, and observe their effect on performance.
  4. Visualize the outputs of intermediate layers to understand the feature extraction process.

Exercise 2: Implement Classic CNN Architectures

  1. Implement the LeNet-5 model for MNIST handwritten digit recognition.
  2. Implement the VGG16 model for CIFAR-10 image classification.
  3. Compare the performance of the different CNN architectures.

Exercise 3: Transfer Learning

  1. Use a pretrained VGG16 or ResNet model for image classification.
  2. Freeze the base model's weights and add custom top layers.
  3. Fine-tune the model to improve classification performance.
  4. Compare the performance of frozen weights versus fine-tuning.