PyTorch Automatic Differentiation Mechanism

This chapter covers PyTorch's automatic differentiation mechanism, including basic Autograd concepts, computation graphs, gradient computation, and tensor gradients.

1. Basic Autograd Concepts

PyTorch's autograd package is the core of its automatic differentiation mechanism: it lets us compute tensor gradients automatically.

1.1 What Is Automatic Differentiation?

Automatic differentiation is a technique for computing the derivatives of functions. It can compute the gradient of an arbitrarily complex function automatically, with no need to derive gradient formulas by hand. In deep learning, we need the gradient of the loss function with respect to the model parameters so that gradient descent can update them.

1.2 How PyTorch Autograd Works

PyTorch's Autograd is built on a computation graph:

  • When we create tensors and operate on them, PyTorch automatically builds a computation graph that records every operation as it executes
  • The graph's nodes are tensors; its edges are the operations between tensors
  • When we call the .backward() method, PyTorch starts at the graph's output node, traverses the graph in reverse, and applies the chain rule to compute gradients for all leaf nodes
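The backward traversal described above can be sketched by walking the public grad_fn / next_functions attributes from an output; this is only an illustration of the graph's structure, not how backward() is implemented internally:

```python
import torch

def print_graph(fn, depth=0):
    # Recursively print the backward graph rooted at a grad_fn node.
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        print_graph(next_fn, depth + 1)

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x * y + x**2
print_graph(z.grad_fn)
# Shows an AddBackward0 root with MulBackward0 and PowBackward0 children,
# ending in AccumulateGrad nodes for the leaf tensors x and y.
```

The AccumulateGrad leaves are where .backward() deposits the computed gradients into x.grad and y.grad.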

2. Creating the Computation Graph

By default, PyTorch records operations into a computation graph whenever a tensor involved requires gradients.

2.1 Creating Tensors That Require Gradients

To create a tensor whose gradient should be computed, set requires_grad=True when creating it:

import torch

# Create tensors that require gradients
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

print(f"x: {x}, requires_grad: {x.requires_grad}")
print(f"y: {y}, requires_grad: {y.requires_grad}")

2.2 Performing Tensor Operations

When we operate on tensors that require gradients, PyTorch automatically builds the computation graph:

# Perform operations, building the computation graph
z = x * y + x**2
print(f"z: {z}")

# Inspect the operation history
print(f"z.grad_fn: {z.grad_fn}")
print(f"z.grad_fn.next_functions: {z.grad_fn.next_functions}")

3. Gradient Computation

To compute gradients, call the tensor's backward() method.

3.1 Basic Gradient Computation

# Compute gradients
z.backward()

# Inspect the gradients
print(f"x.grad: {x.grad}")  # dz/dx = y + 2x = 3 + 4 = 7
print(f"y.grad: {y.grad}")  # dz/dy = x = 2

3.2 Gradient Accumulation

PyTorch accumulates gradients, so they must be cleared before each iteration:

# Perform the operation again
z2 = x * y + x**2

# Calling backward() again accumulates onto the previous gradients
z2.backward()
print(f"x.grad after second backward: {x.grad}")  # 7 + 7 = 14

# Clear the gradients in place (note: there is no torch.zero_grad())
x.grad.zero_()
y.grad.zero_()

# Recompute the gradient
z3 = x * y + x**2
z3.backward()
print(f"x.grad after zeroing and third backward: {x.grad}")  # 7

3.3 Gradients of Scalar Outputs

When the output is a scalar, backward() can be called with no arguments:

x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 2*x + 1

# y is a scalar, so backward() can be called directly
y.backward()
print(f"dy/dx at x=2: {x.grad}")  # 2x + 2 = 6

3.4 Gradients of Tensor Outputs

When the output is a tensor, we must pass backward() a gradient tensor of the same shape as the output:

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x**2  # y is a tensor of shape (3,)

# Provide the gradient tensor
y.backward(torch.tensor([1.0, 1.0, 1.0]))
print(f"dy/dx: {x.grad}")  # [2.0, 4.0, 6.0]
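The gradient tensor passed to backward() is the v in a vector-Jacobian product vᵀJ, so a non-uniform v weights each output's contribution. A small sketch:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x**2

# backward(v) computes the vector-Jacobian product v^T J.
# Here J = diag(2x), so the result is v * 2x element-wise.
v = torch.tensor([0.1, 1.0, 10.0])
y.backward(v)
print(x.grad)  # tensor([ 0.2000,  4.0000, 60.0000])
```

Passing all-ones, as above, is simply the special case that sums each output's gradient with equal weight.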

4. Disabling the Computation Graph

When gradients are not needed, we can disable graph construction to improve performance.

4.1 Using torch.no_grad()

x = torch.tensor(2.0, requires_grad=True)

# Inside the no_grad() context manager, no computation graph is created
with torch.no_grad():
    y = x * x
    print(f"y: {y}, requires_grad: {y.requires_grad}")  # False

# Alternatively, detach() creates a tensor that shares data with the original but needs no gradient
z = x.detach()
print(f"z: {z}, requires_grad: {z.requires_grad}")  # False
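One caveat worth illustrating: detach() shares storage with the original tensor, so an in-place write through the detached tensor is visible in the source. A small sketch:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
z = x.detach()

# z shares storage with x: an in-place write through z shows up in x.
z[0] = 100.0
print(x)  # tensor([100.,   2.], requires_grad=True)
```

When a truly independent copy is needed, use x.detach().clone(); mutating a shared buffer that autograd still needs can otherwise trigger in-place modification errors on a later backward().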

4.2 Using torch.inference_mode()

PyTorch 1.9+ introduced inference_mode(), which is even more efficient than no_grad():

with torch.inference_mode():
    y = x * x
    print(f"y: {y}, requires_grad: {y.requires_grad}")  # False
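Beyond skipping graph construction, tensors created under inference_mode() are marked as inference tensors and, to the best of my understanding, cannot later take part in autograd-recorded operations (unlike tensors created under no_grad()). A sketch:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

with torch.inference_mode():
    y = x * x

# y is an "inference tensor": it has no graph, and (unlike a tensor made
# under no_grad()) it cannot later be saved for a backward pass.
print(y.is_inference(), y.requires_grad)  # True False

try:
    (y * x).sum().backward()
except RuntimeError as e:
    print("RuntimeError:", e)
```

This restriction is the price of inference_mode()'s extra speed; prefer no_grad() when the result may still feed into gradient-tracked computation.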

5. Custom Gradients

We can implement custom forward and backward passes by subclassing torch.autograd.Function.

5.1 Defining a Custom Function

class MyFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Forward pass
        result = x ** 3
        # Save values needed in the backward pass
        ctx.save_for_backward(x)
        return result
    
    @staticmethod
    def backward(ctx, grad_output):
        # Backward pass
        x, = ctx.saved_tensors
        # Gradient: d/dx(x^3) = 3x^2
        grad_x = 3 * x ** 2 * grad_output
        return grad_x

# Use the custom Function
x = torch.tensor(2.0, requires_grad=True)
y = MyFunction.apply(x)
print(f"y: {y}")

# Compute the gradient
y.backward()
print(f"x.grad: {x.grad}")  # 3*(2)^2 = 12
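To validate a custom Function's backward() against finite differences, PyTorch provides torch.autograd.gradcheck. A sketch using a CubeFunction that mirrors MyFunction above (gradcheck expects double-precision inputs):

```python
import torch

class CubeFunction(torch.autograd.Function):
    """Elementwise x**3 with a hand-written backward (mirrors MyFunction)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return 3 * x ** 2 * grad_output

# gradcheck perturbs the inputs and compares the analytic backward()
# against central finite differences; it needs float64 for accuracy.
x = torch.randn(3, dtype=torch.float64, requires_grad=True)
print(torch.autograd.gradcheck(CubeFunction.apply, (x,)))  # True
```

gradcheck returns True on success and raises with a detailed mismatch report otherwise, which makes it convenient in unit tests.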

6. Advanced Autograd Features

6.1 Higher-Order Derivatives

PyTorch supports computing higher-order derivatives via torch.autograd.grad:

x = torch.tensor(2.0, requires_grad=True)
y = x**3

# First derivative; create_graph=True builds a graph for the backward
# pass itself, so it can be differentiated again
grad_x, = torch.autograd.grad(y, x, create_graph=True)
print(f"First derivative dy/dx: {grad_x}")  # 3x^2 = 12

# Second derivative
grad2_x, = torch.autograd.grad(grad_x, x)
print(f"Second derivative d²y/dx²: {grad2_x}")  # 6x = 12

6.2 Gradient Checking

We can check that the gradient computed by automatic differentiation matches a manually derived one:

def f(x):
    return x**3 + 2*x**2 + 1

def grad_f(x):
    return 3*x**2 + 4*x

# Create the tensor
x = torch.tensor(2.0, requires_grad=True)
y = f(x)
y.backward()

# Gradient from automatic differentiation
auto_grad = x.grad.item()
# Manually derived gradient
manual_grad = grad_f(2.0)

print(f"Autograd gradient: {auto_grad}")
print(f"Manual gradient: {manual_grad}")
print(f"Error: {abs(auto_grad - manual_grad)}")
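The check above relies on a hand-derived formula; a genuinely numerical check uses finite differences, e.g. the central-difference approximation (a sketch with an assumed step size h):

```python
import torch

def f(x):
    return x**3 + 2*x**2 + 1

# Central-difference approximation: f'(x) ≈ (f(x+h) - f(x-h)) / (2h)
def numeric_grad(f, x0, h=1e-4):
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

x = torch.tensor(2.0, requires_grad=True)
f(x).backward()

auto = x.grad.item()            # autograd: 3*4 + 4*2 = 20
numeric = numeric_grad(f, 2.0)  # finite differences, no formula needed
print(auto, numeric, abs(auto - numeric))
```

This is essentially what torch.autograd.gradcheck automates, with perturbations applied to every input element.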

7. Autograd and Neural Networks

In real neural network training, Autograd handles the complex gradient computations automatically.

7.1 A Simple Neural Network Example

# Create a simple neural network
model = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1)
)

# Define the loss function
loss_fn = torch.nn.MSELoss()

# Create data
x_train = torch.randn(100, 1)
y_train = 2 * x_train + 1 + torch.randn(100, 1) * 0.1

# Train the model
learning_rate = 0.01

for epoch in range(100):
    # Forward pass
    y_pred = model(x_train)
    
    # Compute the loss
    loss = loss_fn(y_pred, y_train)
    
    # Backward pass
    loss.backward()
    
    # Update parameters
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
    
    # Clear gradients
    model.zero_grad()
    
    # Print the loss
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/100], Loss: {loss.item():.4f}")
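The manual parameter update and gradient zeroing above are exactly what torch.optim.SGD packages up. The same loop, rewritten as a sketch with an optimizer (same synthetic data):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1),
)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x_train = torch.randn(100, 1)
y_train = 2 * x_train + 1 + torch.randn(100, 1) * 0.1

for epoch in range(100):
    optimizer.zero_grad()                        # clear accumulated gradients
    loss = loss_fn(model(x_train), y_train)      # forward pass + loss
    loss.backward()                              # autograd computes gradients
    optimizer.step()                             # apply the SGD update
```

The optimizer version is also the natural path to fancier update rules (momentum, Adam) without touching the training loop.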

8. Autograd Best Practices

  • Set requires_grad=True only on tensors that actually need gradients: this keeps the computation graph small and improves performance
  • Disable Autograd when gradients are not needed: use no_grad() or inference_mode()
  • Clear gradients promptly: call zero_grad() before each iteration
  • Use the computation graph judiciously: avoid building graphs you do not need
  • Implement complex operations as custom Functions: for complicated operations, a custom Function can improve efficiency
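As an illustration of the first practice, parameters can be frozen with requires_grad_(False), which removes them from the graph entirely (a sketch with an arbitrary small model):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)

# Freeze the first Linear layer: its parameters drop out of the graph,
# so backward() neither computes nor stores gradients for them.
for param in model[0].parameters():
    param.requires_grad_(False)

out = model(torch.randn(3, 4)).sum()
out.backward()

print(model[0].weight.grad)        # None (frozen)
print(model[2].weight.grad.shape)  # torch.Size([2, 8])
```

This is the standard idiom for fine-tuning only part of a pretrained model.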