PyTorch Model Training and Optimization

A deeper look at the model training process, covering loss function selection, optimizer configuration, learning rate scheduling, overfitting prevention, and other advanced techniques.

1. Basic Model Training Workflow

In PyTorch, the basic model training workflow consists of the following steps:

  1. Prepare the data (using DataLoader)
  2. Define the model
  3. Choose a loss function
  4. Choose an optimizer
  5. Train the model (forward pass, compute the loss, backward pass, update the parameters)
  6. Evaluate the model
  7. Save the model
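The seven steps above can be sketched end to end. This is a minimal illustration on synthetic data; the layer sizes, learning rate, and epoch count are illustrative assumptions, not recommendations:

```python
import torch

# 1. Prepare the data (using DataLoader) -- synthetic regression data
X = torch.randn(64, 10)
y = torch.randn(64, 1)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X, y), batch_size=16, shuffle=True)

# 2. Define the model
model = torch.nn.Linear(10, 1)

# 3. Choose a loss function
criterion = torch.nn.MSELoss()

# 4. Choose an optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 5. Train the model (forward pass, loss, backward pass, parameter update)
for epoch in range(5):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

# 6. Evaluate the model
model.eval()
with torch.no_grad():
    eval_loss = criterion(model(X), y).item()

# 7. Save the model
torch.save(model.state_dict(), 'model.pth')
```

Each of these steps is expanded on in the sections that follow.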

2. Choosing a Loss Function

The loss function is one of the core components of model training: it measures the difference between the model's predictions and the ground truth. PyTorch provides many loss functions, and we need to choose one appropriate for the task type.

2.1 Regression Tasks

For regression tasks, commonly used loss functions include:

  • torch.nn.MSELoss: mean squared error, suitable for most regression tasks
  • torch.nn.L1Loss: mean absolute error, less sensitive to outliers
  • torch.nn.SmoothL1Loss: smooth L1 loss, combining the advantages of MSE and L1
# Loss functions for regression tasks
criterion = torch.nn.MSELoss()  # mean squared error
# criterion = torch.nn.L1Loss()  # mean absolute error
# criterion = torch.nn.SmoothL1Loss()  # smooth L1 loss

2.2 Classification Tasks

For classification tasks, commonly used loss functions include:

  • torch.nn.CrossEntropyLoss: cross-entropy loss for multi-class classification; applies Softmax internally, so the model should output raw logits
  • torch.nn.BCELoss: binary cross-entropy loss for binary classification; expects probabilities, so use it together with Sigmoid
  • torch.nn.BCEWithLogitsLoss: binary cross-entropy with a built-in Sigmoid; more numerically stable
# Loss functions for classification tasks
# multi-class classification
criterion = torch.nn.CrossEntropyLoss()

# binary classification
# criterion = torch.nn.BCELoss()  # apply Sigmoid first
criterion = torch.nn.BCEWithLogitsLoss()  # built-in Sigmoid

3. Choosing and Configuring an Optimizer

The optimizer updates the model parameters. PyTorch provides many optimizers, each with its own use cases and hyperparameters.

3.1 Common Optimizers

  • torch.optim.SGD: stochastic gradient descent, the basic optimizer
  • SGD with momentum: pass the momentum argument to torch.optim.SGD to accelerate convergence
  • torch.optim.Adam: adaptive moment estimation, combining the advantages of momentum and RMSProp
  • torch.optim.RMSprop: an adaptive learning rate optimizer
  • torch.optim.Adagrad: adaptive learning rates, well suited to sparse data
  • torch.optim.Adadelta: requires no manual learning rate tuning
# Optimizer examples
# SGD optimizer (with momentum)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# RMSprop optimizer
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001, alpha=0.99)

3.2 Tuning Optimizer Hyperparameters

Different optimizers have different hyperparameters, which need to be tuned for the task at hand:

  • Learning rate (lr): the most important hyperparameter; controls the step size of parameter updates
  • Momentum (momentum): accelerates convergence and helps escape poor local optima
  • Weight decay (weight_decay): a regularization term that helps prevent overfitting
  • β1, β2 (Adam): decay rates for the momentum and RMSProp terms
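As a sketch of how these hyperparameters are passed in practice (the specific values are illustrative assumptions, not recommendations):

```python
import torch

model = torch.nn.Linear(10, 1)

# SGD with learning rate, momentum, and weight decay (L2 regularization)
sgd = torch.optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=1e-4)

# Adam: betas are the decay rates of the momentum and RMSProp terms
adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))

# Hyperparameters can also differ per parameter group,
# e.g. a smaller learning rate for the bias:
grouped = torch.optim.SGD([
    {'params': [model.weight], 'lr': 0.01},
    {'params': [model.bias], 'lr': 0.001},
], momentum=0.9)
```

Per-parameter groups are also the mechanism the custom learning rate strategy in section 4.2 relies on.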

4. Learning Rate Scheduling Strategies

The learning rate is one of the most important hyperparameters in model training. A fixed learning rate can lead to slow convergence or overfitting, so a suitable scheduling strategy is needed.

4.1 Learning Rate Schedulers

PyTorch provides many learning rate schedulers for adjusting the learning rate dynamically:

  • torch.optim.lr_scheduler.StepLR: decays the learning rate every fixed number of steps
  • torch.optim.lr_scheduler.MultiStepLR: decays the learning rate at specified milestones
  • torch.optim.lr_scheduler.ExponentialLR: exponential learning rate decay
  • torch.optim.lr_scheduler.CosineAnnealingLR: cosine annealing schedule
  • torch.optim.lr_scheduler.ReduceLROnPlateau: lowers the learning rate automatically based on the validation loss
# Learning rate scheduler examples
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# StepLR: multiply the learning rate by 0.1 every 30 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# MultiStepLR: lower the learning rate at the specified epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)

# ReduceLROnPlateau: lower the learning rate when the validation loss stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)

# Use the scheduler inside the training loop
epochs = 100
for epoch in range(epochs):
    # training code...

    # update the learning rate
    scheduler.step()
    # for ReduceLROnPlateau, pass in the validation loss instead:
    # scheduler.step(val_loss)

4.2 Custom Learning Rate Strategies

Besides the built-in schedulers, we can also implement a custom learning rate strategy:

# Custom learning rate strategy example
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# training loop
epochs = 100
for epoch in range(epochs):
    # custom schedule: decay the learning rate linearly over the epochs
    lr = 0.1 * (1 - epoch / epochs)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    # training code...

5. Preventing Overfitting

Overfitting occurs when a model performs well on the training set but poorly on the test set. The following methods can help prevent it:

5.1 Data Augmentation

Data augmentation increases data diversity by applying random transformations to the training data, improving the model's ability to generalize. Data augmentation methods were covered in lesson 5.
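As a minimal sketch of the idea using only torch (real pipelines typically use torchvision.transforms instead; the flip probability and noise scale here are illustrative assumptions):

```python
import torch

def augment(img):
    """Randomly flip a CxHxW image tensor horizontally and add light noise."""
    if torch.rand(1).item() < 0.5:
        img = torch.flip(img, dims=[-1])        # horizontal flip
    return img + 0.01 * torch.randn_like(img)   # light Gaussian noise

img = torch.randn(3, 32, 32)
augmented = augment(img)  # same shape, randomly transformed
```

Because the transformation is random, each epoch effectively sees a slightly different version of every training image.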

5.2 Regularization

Regularization limits model complexity by adding a penalty term to the loss function:

  • L1 regularization: adds the sum of the absolute values of the weights; produces sparse weights
  • L2 regularization: adds the sum of the squared weights; also known as weight decay
# Add L2 regularization (weight decay)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)
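weight_decay covers only the L2 case. L1 regularization is not built into the PyTorch optimizers; a common pattern is to add the penalty to the loss by hand. The l1_lambda value below is an illustrative assumption:

```python
import torch

model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()
l1_lambda = 1e-4  # regularization strength (illustrative)

X = torch.randn(8, 10)
y = torch.randn(8, 1)

# add the sum of absolute weight values to the data loss
loss = criterion(model(X), y)
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = loss + l1_lambda * l1_penalty
loss.backward()  # gradients now include the L1 term
```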

5.3 Dropout

Dropout is a commonly used regularization technique that randomly drops a fraction of neurons during training, preventing the model from relying too heavily on any particular neurons.

# Dropout example
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = torch.nn.Linear(784, 256)
        self.dropout = torch.nn.Dropout(0.5)  # Dropout layer with drop probability 0.5
        self.fc2 = torch.nn.Linear(256, 128)
        self.fc3 = torch.nn.Linear(128, 10)
    
    def forward(self, x):
        x = torch.nn.functional.relu(self.fc1(x))
        x = self.dropout(x)  # apply Dropout
        x = torch.nn.functional.relu(self.fc2(x))
        x = self.dropout(x)  # apply Dropout
        x = self.fc3(x)
        return x

5.4 Early Stopping

Early stopping halts training when the validation loss stops improving, preventing the model from continuing to fit noise in the training set.

# Early stopping example
best_val_loss = float('inf')
patience = 10
epochs_no_improve = 0

for epoch in range(epochs):
    # training code...
    
    # validation code...
    val_loss = validate(model, val_loader, criterion)
    
    # check for improvement
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_no_improve = 0
        # save the best model
        torch.save(model.state_dict(), 'best_model.pth')
    else:
        epochs_no_improve += 1
    
    # early stopping check
    if epochs_no_improve >= patience:
        print(f"Early stopping at epoch {epoch+1}")
        break

6. Model Training Tips

Here are some techniques for improving training results:

6.1 Gradient Clipping

Gradient clipping prevents exploding gradients by limiting the maximum norm of the gradients:

# Gradient clipping example
optimizer.zero_grad()
y_pred = model(X)
loss = criterion(y_pred, y)
loss.backward()

# clip gradients to a maximum norm of 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()

6.2 Mixed Precision Training

Mixed precision training computes with a mix of half-precision (float16) and single-precision (float32) floating point, which speeds up training and reduces memory usage:

# Mixed precision training example (PyTorch 1.6+)
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for epoch in range(epochs):
    for inputs, targets in train_loader:
        inputs, targets = inputs.cuda(), targets.cuda()
        
        optimizer.zero_grad()
        
        # forward pass under autocast
        with autocast():
            outputs = model(inputs)
            loss = criterion(outputs, targets)
        
        # backward pass and optimizer step via the GradScaler
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

6.3 Batch Normalization

Batch normalization speeds up model convergence and mitigates the vanishing gradient problem:

# Batch normalization example
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = torch.nn.Linear(784, 256)
        self.bn1 = torch.nn.BatchNorm1d(256)  # batch normalization layer
        self.fc2 = torch.nn.Linear(256, 128)
        self.bn2 = torch.nn.BatchNorm1d(128)  # batch normalization layer
        self.fc3 = torch.nn.Linear(128, 10)
    
    def forward(self, x):
        x = torch.nn.functional.relu(self.bn1(self.fc1(x)))
        x = torch.nn.functional.relu(self.bn2(self.fc2(x)))
        x = self.fc3(x)
        return x

Hands-on Exercises

Exercise 1: Comparing Optimizers

Using the same model and dataset, compare the training results of different optimizers (SGD, Adam, RMSprop), including convergence speed and final accuracy.
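A possible starting point for this exercise is a small harness like the one below, which trains the same model with each optimizer from an identical initialization; the synthetic data, model size, and step count are placeholder assumptions to be replaced with your own dataset:

```python
import torch

def train_with(optimizer_cls, **kwargs):
    torch.manual_seed(0)  # identical initialization and data for a fair comparison
    model = torch.nn.Linear(10, 1)
    criterion = torch.nn.MSELoss()
    optimizer = optimizer_cls(model.parameters(), **kwargs)
    X, y = torch.randn(64, 10), torch.randn(64, 1)
    for _ in range(50):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    return loss.item()

results = {
    'SGD': train_with(torch.optim.SGD, lr=0.01, momentum=0.9),
    'Adam': train_with(torch.optim.Adam, lr=0.001),
    'RMSprop': train_with(torch.optim.RMSprop, lr=0.001),
}
for name, final_loss in results.items():
    print(f'{name}: final loss {final_loss:.4f}')
```

Recording the loss at every step, rather than only at the end, lets you plot and compare convergence curves as well.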

Exercise 2: Learning Rate Schedulers

Try different learning rate schedulers (StepLR, MultiStepLR, ReduceLROnPlateau) and observe how they affect training.

Exercise 3: Preventing Overfitting

Design a model that overfits easily, then apply several overfitting-prevention techniques (Dropout, regularization, early stopping, etc.) and observe how the model's performance changes.

7. Summary

This tutorial covered advanced techniques for PyTorch model training and optimization, including:

  • choosing loss functions for different tasks
  • configuring and using common optimizers
  • learning rate scheduling strategies
  • methods for preventing overfitting
  • practical training tips

Mastering these techniques helps you train more efficient and more accurate deep learning models. In practice, the training strategy should be chosen based on the specific task and dataset.