PyTorch Convolutional Neural Networks (CNN)
1. Overview of Convolutional Neural Networks
A convolutional neural network (Convolutional Neural Network, CNN) is a deep learning model designed for data with a grid-like structure, and it has been especially successful in fields such as image recognition and computer vision. The core ideas of a CNN are weight sharing and local connectivity, which let it process high-dimensional data efficiently while keeping the number of model parameters small.
1.1 Basic Components of a CNN
A typical CNN is built from the following main components:
- Convolutional layer (Convolutional Layer): extracts local features from the input data
- Activation function (Activation Function): introduces a non-linear transformation
- Pooling layer (Pooling Layer): reduces the spatial dimensions of the feature maps
- Fully connected layer (Fully Connected Layer): maps the features to the output classes
- Dropout layer (Dropout Layer): helps prevent overfitting
2. Convolutional Layers
The convolutional layer is the core component of a CNN: it extracts local features from the input via the convolution operation. Convolution is a linear operation in which a small window called a convolution kernel (or filter) slides over the input and computes the dot product between the elements in the window and the kernel.
2.1 How the Convolution Operation Works
Suppose we have a 3x3 input image and a 2x2 convolution kernel. The convolution proceeds as follows:
- Place the kernel at the top-left corner of the input image
- Compute the dot product between the kernel and the region it covers, producing one output value
- Slide the kernel one stride (usually 1) to the right and repeat step 2
- When the kernel reaches the right edge of the image, slide it down by one stride and start again from the left
- Repeat the process above until the whole image has been covered
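The sliding-window procedure above can be verified with a small sketch using torch.nn.functional.conv2d. (Note that, like most deep learning libraries, PyTorch actually computes cross-correlation, i.e. the kernel is not flipped, which matches the dot-product description above.) The specific image and kernel values here are made up for illustration:

```python
import torch
import torch.nn.functional as F

# 3x3 input image and 2x2 kernel, reshaped to (batch, channels, height, width)
image = torch.tensor([[1., 2., 3.],
                      [4., 5., 6.],
                      [7., 8., 9.]]).reshape(1, 1, 3, 3)
kernel = torch.tensor([[1., 0.],
                       [0., 1.]]).reshape(1, 1, 2, 2)

# stride 1, no padding: the kernel fits in 2x2 positions, so the output is 2x2
out = F.conv2d(image, kernel)
print(out.reshape(2, 2))
# tensor([[ 6.,  8.],
#         [12., 14.]])  e.g. top-left window [[1,2],[4,5]] · [[1,0],[0,1]] = 1 + 5 = 6
```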
2.2 Convolutional Layers in PyTorch
PyTorch provides the torch.nn.Conv2d class to implement two-dimensional convolutional layers for image data.
import torch
import torch.nn as nn

# create a convolutional layer
conv_layer = nn.Conv2d(
    in_channels=1,    # number of input channels: 1 for grayscale images, 3 for RGB
    out_channels=6,   # number of output channels, i.e. the number of kernels
    kernel_size=3,    # kernel size, 3x3
    stride=1,         # stride
    padding=1         # padding, keeps the output the same size as the input
)

# inspect the layer's parameters
print(conv_layer)
print(f"Number of kernels: {conv_layer.out_channels}")
print(f"Kernel size: {conv_layer.kernel_size}")
print(f"Stride: {conv_layer.stride}")
print(f"Padding: {conv_layer.padding}")

# example input: one grayscale image of size 28x28
x = torch.randn(1, 1, 28, 28)

# forward pass
output = conv_layer(x)
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
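The output spatial size follows a simple formula: out = floor((in + 2 * padding - kernel_size) / stride) + 1. A small helper (not part of PyTorch, defined here just for illustration) makes it easy to verify the shapes printed above:

```python
def conv_out_size(n, kernel_size, stride=1, padding=0):
    # floor((n + 2*padding - kernel_size) / stride) + 1
    return (n + 2 * padding - kernel_size) // stride + 1

# the Conv2d above: 28x28 input, kernel_size=3, stride=1, padding=1 -> 28x28 output
print(conv_out_size(28, kernel_size=3, stride=1, padding=1))  # 28
# without padding the output would shrink
print(conv_out_size(28, kernel_size=3, stride=1, padding=0))  # 26
```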
3. Activation Functions
Activation functions introduce non-linear transformations, allowing a CNN to learn complex non-linear relationships. Commonly used activation functions include:
- ReLU (Rectified Linear Unit): f(x) = max(0, x)
- Sigmoid: f(x) = 1 / (1 + exp(-x))
- Tanh: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
- Leaky ReLU: f(x) = max(0.01x, x)
- ELU (Exponential Linear Unit): f(x) = x if x > 0 else α(exp(x) - 1)
3.1 Activation Functions in PyTorch
# activation function examples
relu = nn.ReLU()
sigmoid = nn.Sigmoid()
tanh = nn.Tanh()
leaky_relu = nn.LeakyReLU()
elu = nn.ELU()

# usage example
x = torch.tensor([-1.0, 0.0, 1.0])
print(f"Input: {x}")
print(f"ReLU output: {relu(x)}")
print(f"Sigmoid output: {sigmoid(x)}")
print(f"Tanh output: {tanh(x)}")
print(f"Leaky ReLU output: {leaky_relu(x)}")
print(f"ELU output: {elu(x)}")
4. Pooling Layers
Pooling layers reduce the spatial dimensions of the feature maps, which lowers the number of model parameters and helps prevent overfitting. There are two common pooling operations: max pooling (Max Pooling) and average pooling (Average Pooling).
4.1 Max Pooling vs. Average Pooling
- Max pooling: takes the maximum value in the pooling window; it preserves texture information well
- Average pooling: takes the average value in the pooling window; it preserves overall information
4.2 Pooling Layers in PyTorch
# max pooling layer
max_pool = nn.MaxPool2d(
    kernel_size=2,  # pooling window size
    stride=2        # stride
)

# average pooling layer
avg_pool = nn.AvgPool2d(
    kernel_size=2,  # pooling window size
    stride=2        # stride
)

# example input: 6 feature maps, each of size 28x28
x = torch.randn(1, 6, 28, 28)

# max pooling
max_output = max_pool(x)
print(f"Input shape: {x.shape}")
print(f"Max pooling output shape: {max_output.shape}")

# average pooling
avg_output = avg_pool(x)
print(f"Average pooling output shape: {avg_output.shape}")
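To make the max vs. average distinction concrete, here is a small sketch on a hand-made 4x4 input (the values are chosen arbitrarily for illustration). Each 2x2 window is collapsed to its maximum or its mean:

```python
import torch
import torch.nn as nn

# a 1x1x4x4 input with distinct values so the two poolings differ visibly
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 8., 1., 2.],
                  [7., 6., 3., 4.]]).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

print(max_pool(x).reshape(2, 2))  # [[4., 8.], [9., 4.]]  -- max of each 2x2 window
print(avg_pool(x).reshape(2, 2))  # [[2.5, 6.5], [7.5, 2.5]]  -- mean of each window
```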
5. Building a Simple CNN Model
Now let's build a simple CNN model for MNIST handwritten digit recognition.
5.1 Model Definition
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # convolutional layer 1: 1 input channel, 16 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
        # activation function
        self.relu = nn.ReLU()
        # pooling layer: 2x2 max pooling
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # convolutional layer 2: 16 input channels, 32 output channels, 3x3 kernel
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1)
        # fully connected layer 1: 32*7*7 inputs, 128 outputs
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        # fully connected layer 2: 128 inputs, 10 outputs (MNIST has 10 classes)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # first convolutional block
        x = self.conv1(x)
        x = self.relu(x)
        x = self.pool(x)  # output shape: (batch_size, 16, 14, 14)
        # second convolutional block
        x = self.conv2(x)
        x = self.relu(x)
        x = self.pool(x)  # output shape: (batch_size, 32, 7, 7)
        # flatten the feature maps
        x = x.view(-1, 32 * 7 * 7)  # output shape: (batch_size, 32*7*7)
        # fully connected layers
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)  # output shape: (batch_size, 10)
        return x

# create a model instance
model = SimpleCNN()
print(model)
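As a quick sanity check on the shape comments in forward, the same stack of layers can be written as an nn.Sequential (using nn.Flatten in place of the view call) and run on a dummy batch; this is only a shape check, not part of the model above:

```python
import torch
import torch.nn as nn

# an nn.Sequential equivalent of SimpleCNN's forward pass
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),   # -> (N, 16, 14, 14)
    nn.Conv2d(16, 32, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # -> (N, 32, 7, 7)
    nn.Flatten(),                                                              # -> (N, 32*7*7)
    nn.Linear(32 * 7 * 7, 128), nn.ReLU(), nn.Linear(128, 10),                 # -> (N, 10)
)

out = net(torch.randn(4, 1, 28, 28))  # dummy batch of 4 grayscale 28x28 images
print(out.shape)  # torch.Size([4, 10])
```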
5.2 Training the Model
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# select the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),  # convert to a tensor
    transforms.Normalize((0.1307,), (0.3081,))  # normalize with the MNIST dataset's mean and std
])

# load the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# create the data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# initialize the model, loss function, and optimizer
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# train the model
epochs = 5
for epoch in range(epochs):
    run_loss = 0.0
    correct = 0
    total = 0
    model.train()  # switch to training mode
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        # forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        # backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # accumulate loss and accuracy statistics
        run_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        total += targets.size(0)
        correct += (predicted == targets).sum().item()
    # compute the average loss and accuracy
    avg_loss = run_loss / len(train_loader.dataset)
    accuracy = correct / total
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}, Accuracy: {accuracy:.4f}')

# evaluate the model
model.eval()  # switch to evaluation mode
test_correct = 0
test_total = 0
with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        test_total += targets.size(0)
        test_correct += (predicted == targets).sum().item()
test_accuracy = test_correct / test_total
print(f'Test Accuracy: {test_accuracy:.4f}')
6. Classic CNN Architectures
Since CNNs were first proposed, many classic architectures have emerged, achieving excellent results on a variety of computer vision tasks. Some widely used classic CNN architectures are listed below:
6.1 LeNet-5
LeNet-5, proposed by Yann LeCun et al. in 1998, was the first CNN model successfully applied to handwritten digit recognition. It contains 2 convolutional layers, 2 pooling layers, and 3 fully connected layers.
6.2 AlexNet
AlexNet, proposed by Alex Krizhevsky et al. in 2012, achieved a breakthrough result in the ImageNet competition and sparked the deep learning boom. It contains 5 convolutional layers, 3 pooling layers, and 3 fully connected layers, and it made use of ReLU activations and the Dropout technique.
6.3 VGGNet
VGGNet was proposed by the Visual Geometry Group at the University of Oxford in 2014. Its distinguishing feature is the extensive use of small 3x3 convolution kernels together with pooling layers, yielding fairly deep network structures (VGG16 and VGG19).
6.4 GoogLeNet
GoogLeNet (also known as Inception v1), proposed by a team at Google in 2014, introduced the Inception module, which applies convolution kernels of different sizes within the same layer, increasing the model's expressive power.
6.5 ResNet
ResNet (Residual Network), proposed by a team at Microsoft in 2015, introduced residual connections (Residual Connection) to address the vanishing gradient problem in training deep networks, making it feasible to train networks with hundreds or even over a thousand layers.
6.6 Pretrained Models in PyTorch
PyTorch's torchvision.models module provides many pretrained CNN models, which we can use directly for transfer learning or fine-tuning.
from torchvision import models

# load a pretrained ResNet18 model
# (newer torchvision versions prefer the weights= argument over pretrained=True)
resnet18 = models.resnet18(pretrained=True)
print(resnet18)

# load a pretrained VGG16 model
vgg16 = models.vgg16(pretrained=True)
print(vgg16)

# load a pretrained GoogLeNet model
googlenet = models.googlenet(pretrained=True)
print(googlenet)

# freeze the model's convolutional parameters
for param in resnet18.parameters():
    param.requires_grad = False

# replace the last fully connected layer to fit a new classification task
num_ftrs = resnet18.fc.in_features
resnet18.fc = nn.Linear(num_ftrs, 10)  # 10 classes
print(resnet18.fc)
7. Advanced CNN Techniques
Beyond the basic CNN architecture, several advanced techniques can further improve model performance:
7.1 Data Augmentation
Data augmentation is a widely used technique that applies random transformations (such as rotation, scaling, and flipping) to the training data, increasing its diversity and thereby improving the model's generalization ability.
7.2 Batch Normalization
Batch normalization speeds up model convergence, mitigates the vanishing gradient problem, and allows higher learning rates. PyTorch provides the nn.BatchNorm2d class to implement batch normalization.
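A minimal sketch of nn.BatchNorm2d (the channel count and input shape below are arbitrary): it normalizes each channel over the batch and spatial dimensions, so the output has roughly zero mean and unit variance per channel:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=16)  # one learnable scale/shift pair per channel

x = torch.randn(8, 16, 14, 14) * 5 + 3  # input with a shifted mean and large variance
y = bn(x)

# after normalization the mean is ~0 and the std is ~1
print(round(y.mean().item(), 2), round(y.std().item(), 2))
```

In a CNN, batch normalization is typically inserted between a convolution and its activation, e.g. Conv2d -> BatchNorm2d -> ReLU.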
7.3 Dropout
Dropout is a technique for preventing overfitting: during training it randomly drops a fraction of the neurons. PyTorch provides the nn.Dropout class to implement Dropout.
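A minimal sketch of nn.Dropout behavior: during training each element is zeroed with probability p and the survivors are scaled by 1/(1-p), while in evaluation mode dropout is a no-op:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()     # training mode: random elements are zeroed
print(drop(x))   # roughly half zeros, the rest scaled to 2.0 (= 1/(1-p))

drop.eval()      # evaluation mode: dropout does nothing
print(drop(x))   # identical to x
```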
7.4 migrationLearning
migrationLearning is a利用预训练model来解决 new task techniques, 它可以节省训练时间 and 计算resource, improvingmodel performance.
Hands-on Exercises
Exercise 1: Build a CNN Model
Use PyTorch to build a more complex CNN model for the CIFAR10 classification task. The CIFAR10 dataset contains color images in 10 classes, each of size 32x32.
Exercise 2: Use a Pretrained Model
Load a pretrained ResNet model, replace its last layer for the CIFAR10 classification task, and compare the model's performance before and after fine-tuning.
Exercise 3: Apply Advanced CNN Techniques
Building on Exercise 1, add advanced techniques such as batch normalization and Dropout, and observe their effect on model performance.
8. Summary
This tutorial covered the implementation of convolutional neural networks (CNN) in PyTorch, including:
- The basic components and principles of CNNs
- Implementing and using convolutional layers
- Common activation functions and pooling layers
- Building a simple CNN model
- An introduction to classic CNN architectures
- Pretrained models in PyTorch
- Advanced CNN techniques
CNNs are a core technology in computer vision, and mastering their principles and implementation is essential for working in deep learning and computer vision. In practice, we can choose an appropriate CNN architecture based on the complexity of the task, or use a pretrained model for transfer learning.