PyTorch Model Training and Optimization
1. The Basic Model Training Workflow
In PyTorch, the basic workflow for training a model consists of the following steps:
- Prepare the data (using DataLoader)
- Define the model
- Choose a loss function
- Choose an optimizer
- Train the model (forward pass, compute the loss, backward pass, update the parameters)
- Evaluate the model
- Save the model
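The steps above can be sketched end to end. This is a minimal illustration using synthetic data and an arbitrary small linear model; the dataset, sizes, and hyperparameters are placeholders, not a recommended configuration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 1. Prepare the data (synthetic, for illustration only)
X = torch.randn(64, 10)
y = torch.randn(64, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=16)

model = torch.nn.Linear(10, 1)                            # 2. define the model
criterion = torch.nn.MSELoss()                            # 3. choose a loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # 4. choose an optimizer

for epoch in range(5):                                    # 5. train the model
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)                   # forward pass + loss
        loss.backward()                                   # backward pass
        optimizer.step()                                  # parameter update

model.eval()                                              # 6. evaluate the model
with torch.no_grad():
    final_loss = criterion(model(X), y).item()

torch.save(model.state_dict(), 'model.pth')               # 7. save the model
```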
2. Choosing a Loss Function
The loss function is one of the core components of model training: it measures the difference between the model's predictions and the ground-truth values. PyTorch provides many loss functions, and we need to choose one appropriate for the task type.
2.1 Regression Tasks
For regression tasks, commonly used loss functions include:
- torch.nn.MSELoss: mean squared error, suitable for most regression tasks
- torch.nn.L1Loss: mean absolute error, less sensitive to outliers
- torch.nn.SmoothL1Loss: smooth L1 loss, combining the advantages of MSE and L1
# Regression loss function examples
criterion = torch.nn.MSELoss()         # mean squared error
# criterion = torch.nn.L1Loss()        # mean absolute error
# criterion = torch.nn.SmoothL1Loss()  # smooth L1 loss
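To make the differences concrete, the three losses can be compared on the same predictions and targets. The values below are chosen arbitrarily for illustration.

```python
import torch

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.5, 2.0, 5.0])

mse = torch.nn.MSELoss()(pred, target)          # mean of squared errors
l1 = torch.nn.L1Loss()(pred, target)            # mean of absolute errors
smooth = torch.nn.SmoothL1Loss()(pred, target)  # quadratic near 0, linear beyond

# MSE = (0.25 + 0 + 4) / 3 ≈ 1.417; L1 = (0.5 + 0 + 2) / 3 ≈ 0.833
print(mse.item(), l1.item(), smooth.item())
```

Note how the large error on the third element dominates the MSE but contributes only linearly to L1 and SmoothL1, which is why the latter are less sensitive to outliers.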
2.2 Classification Tasks
For classification tasks, commonly used loss functions include:
- torch.nn.CrossEntropyLoss: cross-entropy loss, suitable for multi-class classification; applies Softmax internally, so the model should output raw logits
- torch.nn.BCELoss: binary cross-entropy loss, for binary classification; must be used together with Sigmoid
- torch.nn.BCEWithLogitsLoss: binary cross-entropy with a built-in Sigmoid; more numerically stable and efficient
# Classification loss function examples
# Multi-class classification
criterion = torch.nn.CrossEntropyLoss()
# Binary classification
# criterion = torch.nn.BCELoss()          # requires Sigmoid applied first
criterion = torch.nn.BCEWithLogitsLoss()  # Sigmoid built in
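A quick check shows that BCEWithLogitsLoss is indeed equivalent to applying Sigmoid followed by BCELoss on the same logits. The logits and targets below are arbitrary illustrative values.

```python
import torch

logits = torch.tensor([0.5, -1.2, 2.0])
targets = torch.tensor([1.0, 0.0, 1.0])

# Fused version: takes raw logits, applies Sigmoid internally
loss_fused = torch.nn.BCEWithLogitsLoss()(logits, targets)
# Manual version: Sigmoid first, then BCELoss on probabilities
loss_manual = torch.nn.BCELoss()(torch.sigmoid(logits), targets)

# The two match up to floating-point error; the fused version is more
# numerically stable for large-magnitude logits.
print(loss_fused.item(), loss_manual.item())
```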
3. Choosing and Configuring an Optimizer
The optimizer updates the model parameters. PyTorch provides many optimizers, each with its own use cases and hyperparameters.
3.1 Common Optimizers
- torch.optim.SGD: stochastic gradient descent, the most basic optimizer
- SGD with momentum (the momentum argument of torch.optim.SGD): accelerates convergence
- torch.optim.Adam: adaptive moment estimation, combining the advantages of momentum and RMSProp
- torch.optim.RMSprop: adaptive learning-rate optimizer
- torch.optim.Adagrad: adaptive learning rate, well suited to sparse data
- torch.optim.Adadelta: requires no manual learning-rate tuning
# Optimizer examples
# SGD optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# RMSprop optimizer
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001, alpha=0.99)
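The update rule can be made concrete with a single SGD step on one parameter. This is a toy sketch: with no momentum, the update is new_w = w - lr * grad.

```python
import torch

# One scalar parameter with an arbitrary starting value
w = torch.nn.Parameter(torch.tensor([2.0]))
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()  # d(loss)/dw = 2w = 4.0 at w = 2.0
optimizer.zero_grad()
loss.backward()
optimizer.step()       # w ← 2.0 - 0.1 * 4.0 = 1.6
print(w.item())
```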
3.2 Tuning Optimizer Hyperparameters
Different optimizers have different hyperparameters, which we need to tune for the specific task:
- Learning rate (lr): the most important hyperparameter; controls the step size of each parameter update
- Momentum (momentum): accelerates convergence and helps escape local optima
- Weight decay (weight_decay): a regularization term that helps prevent overfitting
- β1, β2 (Adam): decay rates for the momentum and RMSProp components
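These hyperparameters are passed to the optimizer's constructor and stored in optimizer.param_groups, where they can be inspected or modified later. The values below are illustrative, not a recommendation.

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,            # learning rate
    betas=(0.9, 0.999),  # β1, β2
    weight_decay=1e-4)   # weight decay (L2 regularization)

# Every hyperparameter lives in the parameter group dictionaries
group = optimizer.param_groups[0]
print(group['lr'], group['betas'], group['weight_decay'])
```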
4. Learning-Rate Scheduling Strategies
The learning rate is one of the most important hyperparameters in model training. A fixed learning rate can make convergence slow or lead to overfitting, so we need an appropriate learning-rate schedule.
4.1 Learning-Rate Schedulers
PyTorch provides many learning-rate schedulers for adjusting the learning rate dynamically:
- torch.optim.lr_scheduler.StepLR: lowers the learning rate every fixed number of steps
- torch.optim.lr_scheduler.MultiStepLR: lowers the learning rate at specified steps
- torch.optim.lr_scheduler.ExponentialLR: decays the learning rate exponentially
- torch.optim.lr_scheduler.CosineAnnealingLR: cosine annealing of the learning rate
- torch.optim.lr_scheduler.ReduceLROnPlateau: lowers the learning rate automatically based on the validation loss
# Learning-rate scheduler examples
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# StepLR: every 30 epochs, multiply the learning rate by 0.1
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
# MultiStepLR: lower the learning rate at the specified epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)
# ReduceLROnPlateau: lower the learning rate when the validation loss stops decreasing
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)
# Using a scheduler in the training loop
epochs = 100
for epoch in range(epochs):
    # training code...
    # update the learning rate
    scheduler.step()
    # for ReduceLROnPlateau, pass in the validation loss instead:
    # scheduler.step(val_loss)
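The schedule can be verified directly by recording the learning rate each epoch. This toy run uses step_size=2 so the decay is visible in a few iterations; the tiny model is just a placeholder.

```python
import torch

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by 0.1 after every 2 scheduler steps
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

lrs = []
for epoch in range(6):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()    # optimizer.step() should precede scheduler.step()
    scheduler.step()
# Two epochs at 0.1, two at 0.01, two at 0.001
print(lrs)
```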
4.2 Custom Learning-Rate Strategies
Besides the built-in learning-rate schedulers, we can also define a custom learning-rate strategy:
# Custom learning-rate strategy example
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Training loop
epochs = 100
for epoch in range(epochs):
    # custom learning rate: decreases linearly as epochs progress
    lr = 0.1 * (1 - epoch / epochs)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    # training code...
5. Preventing Overfitting
Overfitting occurs when a model performs well on the training set but poorly on the test set. We can use the following methods to prevent it:
5.1 Data Augmentation
Data augmentation increases data diversity by applying random transformations to the training data, thereby improving the model's ability to generalize. We already covered data augmentation methods in Lesson 5.
5.2 Regularization
Regularization limits model complexity by adding a regularization term to the loss function:
- L1 regularization: adds the sum of the absolute values of the weights; produces sparse weights
- L2 regularization: adds the sum of the squared weights; also known as weight decay
# Add L2 regularization (weight decay)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)
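While weight_decay gives L2 regularization for free, L1 has no built-in optimizer switch, but it can be added to the loss by hand. A minimal sketch with made-up data and an illustrative penalty strength:

```python
import torch

model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()
X, y = torch.randn(8, 10), torch.randn(8, 1)

l1_lambda = 0.01  # strength of the L1 penalty (illustrative value)
data_loss = criterion(model(X), y)
# Sum of absolute values of all parameters
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = data_loss + l1_lambda * l1_penalty  # total loss with the L1 term
print(loss.item())
```

Calling loss.backward() on this total loss then produces gradients that include the L1 term, which pushes small weights toward exactly zero.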
5.3 Dropout
Dropout is a commonly used regularization technique that randomly drops a fraction of neurons during training, preventing the model from over-relying on particular neurons.
# Dropout example
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = torch.nn.Linear(784, 256)
        self.dropout = torch.nn.Dropout(0.5)  # Dropout layer with drop probability 0.5
        self.fc2 = torch.nn.Linear(256, 128)
        self.fc3 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = torch.nn.functional.relu(self.fc1(x))
        x = self.dropout(x)  # apply Dropout
        x = torch.nn.functional.relu(self.fc2(x))
        x = self.dropout(x)  # apply Dropout
        x = self.fc3(x)
        return x
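Keep in mind that Dropout only drops neurons in training mode; in evaluation mode (model.eval()) it is the identity. A quick demonstration on a standalone Dropout layer:

```python
import torch

torch.manual_seed(0)  # for reproducibility
dropout = torch.nn.Dropout(0.5)
x = torch.ones(1000)

dropout.train()
out_train = dropout(x)  # roughly half the entries zeroed, survivors scaled by 2

dropout.eval()
out_eval = dropout(x)   # identity: the input passes through unchanged

print((out_train == 0).float().mean().item())  # fraction of zeroed entries, ~0.5
print(torch.equal(out_eval, x))
```

This is why forgetting model.eval() before validation or inference silently degrades accuracy.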
5.4 Early Stopping
Early stopping means halting training when the validation loss stops decreasing, preventing the model from continuing to fit the noise in the training set.
# Early stopping example
best_val_loss = float('inf')
patience = 10
epochs_no_improve = 0
for epoch in range(epochs):
    # training code...
    # validation code...
    val_loss = validate(model, val_loader, criterion)
    # check for improvement
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_no_improve = 0
        # save the best model
        torch.save(model.state_dict(), 'best_model.pth')
    else:
        epochs_no_improve += 1
    # early stopping check
    if epochs_no_improve >= patience:
        print(f"Early stopping at epoch {epoch+1}")
        break
6. Model Training Tips
Below are some techniques for improving model training:
6.1 Gradient Clipping
Gradient clipping prevents exploding gradients by limiting the maximum norm of the gradients:
# Gradient clipping example
optimizer.zero_grad()
y_pred = model(X)
loss = criterion(y_pred, y)
loss.backward()
# clip gradients to a maximum norm of 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
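The effect of clipping can be verified directly: after the call, the total gradient norm is at most max_norm. The model and data below are placeholders, with the loss scaled up to inflate the gradients.

```python
import torch

model = torch.nn.Linear(10, 1)
X, y = torch.randn(32, 10), torch.randn(32, 1)

loss = torch.nn.MSELoss()(model(X), y) * 100.0  # scale up to inflate gradients
loss.backward()

# clip_grad_norm_ returns the total norm *before* clipping
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# recompute the total norm after clipping
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
print(norm_before.item(), norm_after.item())
```

Note that clipping rescales all gradients by the same factor, so the gradient direction is preserved; only the step size is bounded.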
6.2 Mixed-Precision Training
Mixed-precision training computes with a mix of half-precision (float16) and single-precision (float32) floating-point numbers, which can speed up training and reduce memory usage:
# Mixed-precision training example (PyTorch 1.6+)
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()
for epoch in range(epochs):
    for inputs, targets in train_loader:
        inputs, targets = inputs.cuda(), targets.cuda()
        optimizer.zero_grad()
        # forward pass under autocast
        with autocast():
            outputs = model(inputs)
            loss = criterion(outputs, targets)
        # backward pass through the gradient scaler
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
6.3 Batch Normalization
Batch normalization speeds up model convergence and mitigates the vanishing gradient problem:
# Batch normalization example
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = torch.nn.Linear(784, 256)
        self.bn1 = torch.nn.BatchNorm1d(256)  # batch normalization layer
        self.fc2 = torch.nn.Linear(256, 128)
        self.bn2 = torch.nn.BatchNorm1d(128)  # batch normalization layer
        self.fc3 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = torch.nn.functional.relu(self.bn1(self.fc1(x)))
        x = torch.nn.functional.relu(self.bn2(self.fc2(x)))
        x = self.fc3(x)
        return x
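What batch normalization actually does can be seen on a standalone layer: in training mode, each feature is normalized over the batch to roughly zero mean and unit variance (before the learnable affine transform, which starts at identity). The shifted, scaled input here is arbitrary.

```python
import torch

torch.manual_seed(0)
bn = torch.nn.BatchNorm1d(8)
bn.train()
x = torch.randn(64, 8) * 5.0 + 3.0  # input with non-zero mean and large scale

out = bn(x)
print(out.mean(dim=0))  # per-feature means, close to 0
print(out.std(dim=0))   # per-feature stds, close to 1
```

Like Dropout, BatchNorm behaves differently at evaluation time: in model.eval() it normalizes with the running statistics accumulated during training rather than the current batch's statistics.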
Hands-On Exercises
Exercise 1: Comparing Optimizers
Using the same model and dataset, compare the training results of different optimizers (SGD, Adam, RMSprop), including convergence speed and final accuracy.
Exercise 2: Learning-Rate Schedulers
Try different learning-rate schedulers (StepLR, MultiStepLR, ReduceLROnPlateau) and observe their effect on model training.
Exercise 3: Preventing Overfitting
Design a model that overfits easily, then apply several overfitting-prevention techniques (Dropout, regularization, early stopping, etc.) and observe how the model's performance changes.
7. Summary
This tutorial covered advanced techniques for PyTorch model training and optimization, including:
- choosing a loss function for different tasks
- configuring and using common optimizers
- learning-rate scheduling strategies
- methods for preventing overfitting
- practical model training tips
Mastering these techniques will help you train more efficient and more accurate deep learning models. In practice, we need to choose an appropriate training strategy for the specific task and dataset.