PyTorch Convolutional Neural Networks (CNN)
1. Overview of Convolutional Neural Networks
A convolutional neural network (Convolutional Neural Network, CNN) is a deep learning model designed for data with a grid-like structure, and it has been especially successful in fields such as image recognition and computer vision. The core ideas of a CNN are weight sharing and local connectivity, which let it process high-dimensional data efficiently while keeping the number of model parameters small.
1.1 Basic Components of a CNN
A typical CNN is built from the following main components:
- Convolutional layer (Convolutional Layer): extracts local features from the input data
- Activation function (Activation Function): introduces a non-linear transformation
- Pooling layer (Pooling Layer): reduces the spatial dimensions of the feature maps
- Fully connected layer (Fully Connected Layer): maps the features to the output classes
- Dropout layer (Dropout Layer): helps prevent overfitting
2. Convolutional Layers
The convolutional layer is the core component of a CNN: it extracts local features from the input via the convolution operation. Convolution is a linear operation in which a small window called a convolution kernel (or filter) slides over the input and computes the dot product between the elements in the window and the kernel.
2.1 How the Convolution Operation Works
Suppose we have a 3x3 input image and a 2x2 convolution kernel. The convolution proceeds as follows:
- Place the kernel at the top-left corner of the input image
- Compute the dot product between the kernel and the region it covers, producing one output value
- Slide the kernel one stride (usually 1) to the right and repeat step 2
- When the kernel reaches the right edge of the image, slide it down by one stride and start again from the left
- Repeat the process above until the whole image has been covered
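The sliding-window procedure above can be verified with a small sketch using torch.nn.functional.conv2d. (Note that, like most deep learning libraries, PyTorch actually computes cross-correlation, i.e. the kernel is not flipped, which matches the dot-product description above.) The specific image and kernel values here are made up for illustration:

```python
import torch
import torch.nn.functional as F

# 3x3 input image and 2x2 kernel, reshaped to (batch, channels, height, width)
image = torch.tensor([[1., 2., 3.],
                      [4., 5., 6.],
                      [7., 8., 9.]]).reshape(1, 1, 3, 3)
kernel = torch.tensor([[1., 0.],
                       [0., 1.]]).reshape(1, 1, 2, 2)

# stride 1, no padding: the kernel fits in 2x2 positions, so the output is 2x2
out = F.conv2d(image, kernel)
print(out.reshape(2, 2))
# tensor([[ 6.,  8.],
#         [12., 14.]])  e.g. top-left window [[1,2],[4,5]] · [[1,0],[0,1]] = 1 + 5 = 6
```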
2.2 Convolutional Layers in PyTorch
PyTorch provides the torch.nn.Conv2d class to implement two-dimensional convolutional layers for image data.
import torch
import torch.nn as nn

# create a convolutional layer
conv_layer = nn.Conv2d(
    in_channels=1,    # number of input channels: 1 for grayscale images, 3 for RGB
    out_channels=6,   # number of output channels, i.e. the number of kernels
    kernel_size=3,    # kernel size, 3x3
    stride=1,         # stride
    padding=1         # padding, keeps the output the same size as the input
)

# inspect the layer's parameters
print(conv_layer)
print(f"Number of kernels: {conv_layer.out_channels}")
print(f"Kernel size: {conv_layer.kernel_size}")
print(f"Stride: {conv_layer.stride}")
print(f"Padding: {conv_layer.padding}")

# example input: one grayscale image of size 28x28
x = torch.randn(1, 1, 28, 28)

# forward pass
output = conv_layer(x)
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
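The output spatial size follows a simple formula: out = floor((in + 2 * padding - kernel_size) / stride) + 1. A small helper (not part of PyTorch, defined here just for illustration) makes it easy to verify the shapes printed above:

```python
def conv_out_size(n, kernel_size, stride=1, padding=0):
    # floor((n + 2*padding - kernel_size) / stride) + 1
    return (n + 2 * padding - kernel_size) // stride + 1

# the Conv2d above: 28x28 input, kernel_size=3, stride=1, padding=1 -> 28x28 output
print(conv_out_size(28, kernel_size=3, stride=1, padding=1))  # 28
# without padding the output would shrink
print(conv_out_size(28, kernel_size=3, stride=1, padding=0))  # 26
```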
3. Activation Functions
Activation functions introduce non-linear transformations, allowing a CNN to learn complex non-linear relationships. Commonly used activation functions include:
- ReLU (Rectified Linear Unit): f(x) = max(0, x)
- Sigmoid: f(x) = 1 / (1 + exp(-x))
- Tanh: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
- Leaky ReLU: f(x) = max(0.01x, x)
- ELU (Exponential Linear Unit): f(x) = x if x > 0 else α(exp(x) - 1)
3.1 Activation Functions in PyTorch
# activation function examples
relu = nn.ReLU()
sigmoid = nn.Sigmoid()
tanh = nn.Tanh()
leaky_relu = nn.LeakyReLU()
elu = nn.ELU()

# usage example
x = torch.tensor([-1.0, 0.0, 1.0])
print(f"Input: {x}")
print(f"ReLU output: {relu(x)}")
print(f"Sigmoid output: {sigmoid(x)}")
print(f"Tanh output: {tanh(x)}")
print(f"Leaky ReLU output: {leaky_relu(x)}")
print(f"ELU output: {elu(x)}")
4. Pooling Layers
Pooling layers reduce the spatial dimensions of the feature maps, which lowers the number of model parameters and helps prevent overfitting. There are two common pooling operations: max pooling (Max Pooling) and average pooling (Average Pooling).
4.1 Max Pooling vs. Average Pooling
- Max pooling: takes the maximum value in the pooling window; it preserves texture information well
- Average pooling: takes the average value in the pooling window; it preserves overall information
4.2 Pooling Layers in PyTorch
# max pooling layer
max_pool = nn.MaxPool2d(
    kernel_size=2,  # pooling window size
    stride=2        # stride
)

# average pooling layer
avg_pool = nn.AvgPool2d(
    kernel_size=2,  # pooling window size
    stride=2        # stride
)

# example input: 6 feature maps, each of size 28x28
x = torch.randn(1, 6, 28, 28)

# max pooling
max_output = max_pool(x)
print(f"Input shape: {x.shape}")
print(f"Max pooling output shape: {max_output.shape}")

# average pooling
avg_output = avg_pool(x)
print(f"Average pooling output shape: {avg_output.shape}")
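To make the max vs. average distinction concrete, here is a small sketch on a hand-made 4x4 input (the values are chosen arbitrarily for illustration). Each 2x2 window is collapsed to its maximum or its mean:

```python
import torch
import torch.nn as nn

# a 1x1x4x4 input with distinct values so the two poolings differ visibly
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 8., 1., 2.],
                  [7., 6., 3., 4.]]).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

print(max_pool(x).reshape(2, 2))  # [[4., 8.], [9., 4.]]  -- max of each 2x2 window
print(avg_pool(x).reshape(2, 2))  # [[2.5, 6.5], [7.5, 2.5]]  -- mean of each window
```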
5. Building a Simple CNN Model
Now let's build a simple CNN model for MNIST handwritten digit recognition.
5.1 Model Definition
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # convolutional layer 1: 1 input channel, 16 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
        # activation function
        self.relu = nn.ReLU()
        # pooling layer: 2x2 max pooling
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # convolutional layer 2: 16 input channels, 32 output channels, 3x3 kernel
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1)
        # fully connected layer 1: 32*7*7 inputs, 128 outputs
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        # fully connected layer 2: 128 inputs, 10 outputs (MNIST has 10 classes)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # first convolutional block
        x = self.conv1(x)
        x = self.relu(x)
        x = self.pool(x)  # output shape: (batch_size, 16, 14, 14)
        # second convolutional block
        x = self.conv2(x)
        x = self.relu(x)
        x = self.pool(x)  # output shape: (batch_size, 32, 7, 7)
        # flatten the feature maps
        x = x.view(-1, 32 * 7 * 7)  # output shape: (batch_size, 32*7*7)
        # fully connected layers
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)  # output shape: (batch_size, 10)
        return x

# create a model instance
model = SimpleCNN()
print(model)
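As a quick sanity check on the shape comments in forward, the same stack of layers can be written as an nn.Sequential (using nn.Flatten in place of the view call) and run on a dummy batch; this is only a shape check, not part of the model above:

```python
import torch
import torch.nn as nn

# an nn.Sequential equivalent of SimpleCNN's forward pass
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),   # -> (N, 16, 14, 14)
    nn.Conv2d(16, 32, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # -> (N, 32, 7, 7)
    nn.Flatten(),                                                              # -> (N, 32*7*7)
    nn.Linear(32 * 7 * 7, 128), nn.ReLU(), nn.Linear(128, 10),                 # -> (N, 10)
)

out = net(torch.randn(4, 1, 28, 28))  # dummy batch of 4 grayscale 28x28 images
print(out.shape)  # torch.Size([4, 10])
```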
5.2 Training the Model
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# select the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),  # convert to a tensor
    transforms.Normalize((0.1307,), (0.3081,))  # normalize with the MNIST dataset's mean and std
])

# load the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# create the data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# initialize the model, loss function, and optimizer
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# train the model
epochs = 5
for epoch in range(epochs):
    run_loss = 0.0
    correct = 0
    total = 0
    model.train()  # switch to training mode
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        # forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        # backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # accumulate loss and accuracy statistics
        run_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        total += targets.size(0)
        correct += (predicted == targets).sum().item()
    # compute the average loss and accuracy
    avg_loss = run_loss / len(train_loader.dataset)
    accuracy = correct / total
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}, Accuracy: {accuracy:.4f}')

# evaluate the model
model.eval()  # switch to evaluation mode
test_correct = 0
test_total = 0
with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        test_total += targets.size(0)
        test_correct += (predicted == targets).sum().item()
test_accuracy = test_correct / test_total
print(f'Test Accuracy: {test_accuracy:.4f}')
6. Classic CNN Architectures
Since CNNs were first proposed, many classic architectures have emerged, achieving excellent results on a variety of computer vision tasks. Some widely used classic CNN architectures are listed below:
6.1 LeNet-5
LeNet-5, proposed by Yann LeCun et al. in 1998, was the first CNN model successfully applied to handwritten digit recognition. It contains 2 convolutional layers, 2 pooling layers, and 3 fully connected layers.
6.2 AlexNet
AlexNet, proposed by Alex Krizhevsky et al. in 2012, achieved a breakthrough result in the ImageNet competition and sparked the deep learning boom. It contains 5 convolutional layers, 3 pooling layers, and 3 fully connected layers, and it made use of ReLU activations and the Dropout technique.
6.3 VGGNet
VGGNet was proposed by the Visual Geometry Group at the University of Oxford in 2014. Its distinguishing feature is the extensive use of small 3x3 convolution kernels together with pooling layers, yielding fairly deep network structures (VGG16 and VGG19).
6.4 GoogLeNet
GoogLeNet (also known as Inception v1), proposed by a team at Google in 2014, introduced the Inception module, which applies convolution kernels of different sizes within the same layer, increasing the model's expressive power.
6.5 ResNet
ResNet (Residual Network), proposed by a team at Microsoft in 2015, introduced residual connections (Residual Connection) to address the vanishing gradient problem in training deep networks, making it feasible to train networks with hundreds or even over a thousand layers.
6.6 Pretrained Models in PyTorch
PyTorch's torchvision.models module provides many pretrained CNN models, which we can use directly for transfer learning or fine-tuning.
from torchvision import models

# load a pretrained ResNet18 model
# (newer torchvision versions prefer the weights= argument over pretrained=True)
resnet18 = models.resnet18(pretrained=True)
print(resnet18)

# load a pretrained VGG16 model
vgg16 = models.vgg16(pretrained=True)
print(vgg16)

# load a pretrained GoogLeNet model
googlenet = models.googlenet(pretrained=True)
print(googlenet)

# freeze the model's convolutional parameters
for param in resnet18.parameters():
    param.requires_grad = False

# replace the last fully connected layer to fit a new classification task
num_ftrs = resnet18.fc.in_features
resnet18.fc = nn.Linear(num_ftrs, 10)  # 10 classes
print(resnet18.fc)
7. Advanced CNN Techniques
Beyond the basic CNN architecture, several advanced techniques can further improve model performance:
7.1 Data Augmentation
Data augmentation is a widely used technique that applies random transformations (such as rotation, scaling, and flipping) to the training data, increasing its diversity and thereby improving the model's generalization ability.
7.2 Batch Normalization
Batch normalization speeds up model convergence, mitigates the vanishing gradient problem, and allows higher learning rates. PyTorch provides the nn.BatchNorm2d class to implement batch normalization.
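A minimal sketch of nn.BatchNorm2d (the channel count and input shape below are arbitrary): it normalizes each channel over the batch and spatial dimensions, so the output has roughly zero mean and unit variance per channel:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=16)  # one learnable scale/shift pair per channel

x = torch.randn(8, 16, 14, 14) * 5 + 3  # input with a shifted mean and large variance
y = bn(x)

# after normalization the mean is ~0 and the std is ~1
print(round(y.mean().item(), 2), round(y.std().item(), 2))
```

In a CNN, batch normalization is typically inserted between a convolution and its activation, e.g. Conv2d -> BatchNorm2d -> ReLU.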
7.3 Dropout
Dropout is a technique for preventing overfitting: during training it randomly drops a fraction of the neurons. PyTorch provides the nn.Dropout class to implement Dropout.
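A minimal sketch of nn.Dropout behavior: during training each element is zeroed with probability p and the survivors are scaled by 1/(1-p), while in evaluation mode dropout is a no-op:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()     # training mode: random elements are zeroed
print(drop(x))   # roughly half zeros, the rest scaled to 2.0 (= 1/(1-p))

drop.eval()      # evaluation mode: dropout does nothing
print(drop(x))   # identical to x
```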
7.4 migrationLearning
migrationLearning is a利用预训练model来解决 new task techniques, 它可以节省训练时间 and 计算resource, improvingmodel performance.
Hands-on Exercises
Exercise 1: Build a CNN Model
Use PyTorch to build a more complex CNN model for the CIFAR10 classification task. The CIFAR10 dataset contains color images in 10 classes, each of size 32x32.
Exercise 2: Use a Pretrained Model
Load a pretrained ResNet model, replace its last layer for the CIFAR10 classification task, and compare the model's performance before and after fine-tuning.
Exercise 3: Apply Advanced CNN Techniques
Building on Exercise 1, add advanced techniques such as batch normalization and Dropout, and observe their effect on model performance.
8. Summary
This tutorial covered the implementation of convolutional neural networks (CNN) in PyTorch, including:
- The basic components and principles of CNNs
- Implementing and using convolutional layers
- Common activation functions and pooling layers
- Building a simple CNN model
- An introduction to classic CNN architectures
- Pretrained models in PyTorch
- Advanced CNN techniques
CNNs are a core technology in computer vision, and mastering their principles and implementation is essential for working in deep learning and computer vision. In practice, we can choose an appropriate CNN architecture based on the complexity of the task, or use a pretrained model for transfer learning.