Neural Network Basics

Understand the basic structure, working principles, and mathematical foundations of neural networks.

1. What Is a Neural Network?

A neural network is a computational model inspired by the structure and function of neurons in the human brain. It consists of a large number of interconnected artificial neurons and can learn to extract features and patterns from data.

Tip

Neural networks are one of the core algorithms of machine learning and deep learning. They can handle complex nonlinear problems and have achieved remarkable results in fields such as image recognition and natural language processing.

1.1 Biological Inspiration for Neural Networks

The design of neural networks is inspired by the structure of neurons in the human brain:

  • Neuron: the basic computational unit of the brain; it receives and transmits signals through synapses.
  • Synapse: the connection between neurons; the strength of the transmitted signal can be adjusted.
  • Activation: when the signal a neuron receives exceeds a threshold, the neuron fires an action potential.

1.2 Basic Components of an Artificial Neural Network

An artificial neural network is made up of the following basic components:

  • Input layer: the layer of neurons that receives the raw data.
  • Hidden layers: layers of neurons between the input and output layers, used to extract features.
  • Output layer: the layer of neurons that produces the final result.
  • Weights: parameters on the connections between neurons that determine signal strength.
  • Biases: per-neuron threshold parameters that affect whether a neuron activates.
  • Activation function: the nonlinear function that determines a neuron's output.

2. The Perceptron Model

The perceptron is the simplest neural network model, proposed by the American psychologist Frank Rosenblatt in 1957.

2.1 Perceptron Structure

A perceptron consists of the following parts:

  • One or more input nodes
  • One output node
  • Weights connecting the inputs to the output
  • A bias term
  • An activation function (usually a step function)

2.2 How the Perceptron Works

The perceptron operates as follows:

  1. Compute the weighted sum of the inputs: z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
  2. Pass the weighted sum through the activation function: ŷ = f(z)
  3. Adjust the weights based on the gap between the true label y and the prediction ŷ: wᵢ = wᵢ + η(y - ŷ)xᵢ
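The three steps above can be sketched in a few lines of NumPy. The code below is a minimal illustration (variable names and hyperparameters are our own choices), training a perceptron with a step activation on the logical AND problem:

```python
import numpy as np

# A minimal perceptron following the three steps above, trained on AND.
def step(z):
    return 1 if z >= 0 else 0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])   # AND labels

w = np.zeros(2)
b = 0.0
eta = 0.1                    # learning rate

for epoch in range(20):
    for xi, yi in zip(X, y):
        y_hat = step(np.dot(w, xi) + b)   # steps 1-2: weighted sum + activation
        w = w + eta * (yi - y_hat) * xi   # step 3: wᵢ = wᵢ + η(y - ŷ)xᵢ
        b = b + eta * (yi - y_hat)

print([step(np.dot(w, xi) + b) for xi in X])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop reaches a perfect classifier; on XOR the same loop would never settle.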

2.3 Limitations of the Perceptron

A perceptron can only solve linearly separable problems; it cannot correctly classify nonlinear problems such as XOR. Marvin Minsky and Seymour Papert pointed out this limitation in their 1969 book Perceptrons, which contributed to the first AI winter.

3. Multi-layer Neural Networks

To overcome the limitations of the perceptron, researchers proposed the multi-layer neural network, also known as the multi-layer perceptron (MLP).

3.1 Structure of a Multi-layer Neural Network

A multi-layer neural network typically contains:

  • Input layer: receives the raw data
  • One or more hidden layers: extract features
  • Output layer: produces the final result

3.2 Forward Propagation

Forward propagation is the process by which signals travel from the input layer through the hidden layers to the output layer:

  1. The input layer receives the data: x
  2. The hidden layer computes: z₁ = w₁x + b₁, a₁ = f(z₁)
  3. The output layer computes: z₂ = w₂a₁ + b₂, ŷ = f(z₂)
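The forward pass above is a pair of matrix multiplications in NumPy. The shapes and random initial weights below are illustrative assumptions, not part of the original text:

```python
import numpy as np

# Forward propagation through one hidden layer, following the steps above.
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))          # one sample with 3 features
w1 = rng.normal(size=(3, 4)) * 0.1   # input -> hidden weights
b1 = np.zeros((1, 4))
w2 = rng.normal(size=(4, 2)) * 0.1   # hidden -> output weights
b2 = np.zeros((1, 2))

z1 = x @ w1 + b1      # hidden pre-activation
a1 = sigmoid(z1)      # hidden activation
z2 = a1 @ w2 + b2     # output pre-activation
y_hat = sigmoid(z2)   # network output

print(y_hat.shape)    # (1, 2)
```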

3.3 Backpropagation

Backpropagation is the process by which the error propagates from the output layer back toward the input layer, updating the weights along the way (here ŷ is the network's prediction and y the true label):

  1. Compute the output-layer error: δ₂ = (ŷ - y) * f'(z₂)
  2. Compute the hidden-layer error: δ₁ = w₂ᵀδ₂ * f'(z₁)
  3. Update the output-layer weights: w₂ = w₂ - ηδ₂a₁ᵀ
  4. Update the hidden-layer weights: w₁ = w₁ - ηδ₁xᵀ
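A useful way to verify update rules like these is a numerical gradient check: compare the analytic gradient against a finite-difference approximation of the loss. The sketch below does this for a tiny one-layer sigmoid network with MSE loss; all names and sizes are our own illustrative choices:

```python
import numpy as np

# Numerical gradient check for a one-layer sigmoid network with MSE loss.
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))    # 5 samples, 3 features
y = rng.uniform(size=(5, 1))   # targets
w = rng.normal(size=(3, 1))    # weights

def loss(w):
    y_hat = sigmoid(x @ w)
    return 0.5 * np.mean((y_hat - y) ** 2)

# Analytic gradient: delta = (ŷ - y) * f'(z), grad = xᵀ delta / n
y_hat = sigmoid(x @ w)
delta = (y_hat - y) * y_hat * (1 - y_hat)   # sigmoid'(z) = f(z)(1 - f(z))
grad_analytic = x.T @ delta / len(x)

# Finite-difference approximation of the same gradient
eps = 1e-6
grad_numeric = np.zeros_like(w)
for i in range(w.size):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus.flat[i] += eps
    w_minus.flat[i] -= eps
    grad_numeric.flat[i] = (loss(w_plus) - loss(w_minus)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-7))  # True
```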

4. Activation Functions

The activation function is a key component of a neural network: it introduces nonlinearity, which is what allows the network to learn complex nonlinear relationships.

4.1 Common Activation Functions

4.1.1 Step Function

f(z) = 1 if z ≥ 0, else 0

Properties: simple but not differentiable; suitable only for the perceptron.

4.1.2 Sigmoid Function

f(z) = 1 / (1 + e^(-z))

Properties: output in the range (0, 1); differentiable; but suffers from the vanishing gradient problem.

4.1.3 Tanh Function

f(z) = (e^z - e^(-z)) / (e^z + e^(-z))

Properties: output in the range (-1, 1); differentiable; milder vanishing gradients than Sigmoid.

4.1.4 ReLU Function

f(z) = max(0, z)

Properties: cheap to compute and alleviates the vanishing gradient problem, but can suffer from the dying ReLU problem.

4.1.5 Leaky ReLU Function

f(z) = max(αz, z), where α is a small positive number

Properties: fixes the dying ReLU problem while keeping the advantages of ReLU.

4.1.6 Softmax Function

f(zᵢ) = e^zᵢ / Σe^zⱼ

Properties: outputs a probability distribution; commonly used in the output layer for multi-class classification.

4.2 Choosing an Activation Function

  • Hidden layers: usually ReLU or Leaky ReLU
  • Binary classification output layer: usually Sigmoid
  • Multi-class classification output layer: usually Softmax
  • Regression output layer: usually no activation function (or a linear activation)
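For reference, the functions above can be written out in a few lines of NumPy. This is a minimal sketch; the max-subtraction in softmax is a standard numerical-stability trick, and α = 0.01 is a common Leaky ReLU default:

```python
import numpy as np

# The activation functions above, written out in NumPy.
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max to avoid overflow
    return e / e.sum()

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))         # [0. 0. 2.]
print(leaky_relu(z))   # negative inputs are scaled by alpha
print(softmax(z))      # a probability distribution summing to 1
```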

5. Training a Neural Network

Training a neural network involves the following ingredients:

5.1 Loss Functions

A loss function measures the difference between the model's predictions and the true values:

5.1.1 Mean Squared Error (MSE)

L = (1/n)Σ(yᵢ - ŷᵢ)², suitable for regression problems.

5.1.2 Cross-Entropy Loss

L = -Σyᵢlog(ŷᵢ), suitable for classification problems.
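Both losses are one-liners in NumPy. The sketch below uses toy values of our own choosing and the binary form of cross-entropy; the small epsilon guards against log(0):

```python
import numpy as np

# Both loss functions on toy predictions.
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8])

# Mean squared error: L = (1/n) Σ (yᵢ - ŷᵢ)²
mse = np.mean((y_true - y_pred) ** 2)

# Binary cross-entropy, with epsilon to keep log() finite
eps = 1e-12
bce = -np.mean(y_true * np.log(y_pred + eps)
               + (1 - y_true) * np.log(1 - y_pred + eps))

print(round(mse, 4))  # 0.03
print(round(bce, 4))  # 0.1839
```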

5.2 Optimization Algorithms

Optimization algorithms minimize the loss function by updating the network's parameters:

5.2.1 Gradient Descent

The most basic optimization algorithm: update the parameters along the negative gradient of the loss function: θ = θ - η∇L(θ)

5.2.2 Stochastic Gradient Descent (SGD)

Computes the gradient and updates the parameters using one sample at a time; fast per step but noisy.

5.2.3 Mini-batch Gradient Descent (Mini-batch SGD)

Computes the gradient and updates the parameters using a small batch of samples at a time, balancing speed and stability.

5.2.4 Momentum

Introduces a momentum term that accelerates optimization and damps oscillation: v = γv + η∇L(θ), θ = θ - v

5.2.5 Adaptive Learning-Rate Algorithms

Algorithms such as Adam and RMSprop automatically adapt a per-parameter learning rate, accelerating convergence.
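The effect of the momentum term can be seen on a toy quadratic loss. The matrix and hyperparameters below are illustrative assumptions; on this badly conditioned problem, momentum ends closer to the optimum than plain gradient descent after the same number of steps:

```python
import numpy as np

# Gradient descent with and without momentum on L(θ) = ½ θᵀAθ.
A = np.diag([1.0, 50.0])          # badly conditioned quadratic
grad = lambda theta: A @ theta    # ∇L(θ) = Aθ

def run(gamma):
    theta = np.array([1.0, 1.0])
    v = np.zeros(2)
    eta = 0.02
    for _ in range(100):
        v = gamma * v + eta * grad(theta)   # v = γv + η∇L(θ)
        theta = theta - v                   # θ = θ - v
    return np.linalg.norm(theta)            # distance from the optimum at 0

print(run(0.0))             # plain gradient descent
print(run(0.9) < run(0.0))  # True: momentum gets closer in the same steps
```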

5.3 Overfitting

Overfitting is the phenomenon where a model performs well on the training data but poorly on the test data.

5.3.1 Causes of Overfitting

  • The model is too complex
  • There is not enough training data
  • The training data is too noisy

5.3.2 Ways to Prevent Overfitting

  • Regularization: e.g. L1 or L2 regularization
  • Dropout: randomly deactivate a fraction of the neurons during training
  • Early stopping: stop training when validation performance stops improving
  • Data augmentation: increase the diversity of the training data
  • Batch normalization: speeds up convergence and reduces overfitting
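As a concrete example of the first item, L2 regularization (weight decay) adds λ‖w‖² to the loss, which adds 2λw to the gradient. The single update step below, with illustrative numbers of our own, shows how this pulls weights toward zero:

```python
import numpy as np

# One gradient step with L2 regularization (weight decay).
w = np.array([3.0, -2.0, 0.5])
grad_data = np.array([0.1, -0.1, 0.0])   # gradient of the data loss (toy values)
lam, eta = 0.01, 0.1                     # regularization strength, learning rate

grad_total = grad_data + 2 * lam * w     # data gradient + regularization term
w_new = w - eta * grad_total

# Compared with the unregularized step, every weight shrinks toward zero:
print(np.abs(w_new) < np.abs(w - eta * grad_data))  # [ True  True  True]
```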

6. Code Example: Implementing a Simple Neural Network

Below is a simple neural network implemented in Python with NumPy, used to solve the XOR problem:

import numpy as np

# Define the activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Note: x here is the sigmoid output, so f'(z) = f(z)(1 - f(z)) = x(1 - x)
    return x * (1 - x)

# Define the neural network class
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights and biases
        self.weights1 = np.random.randn(input_size, hidden_size) * 0.01
        self.bias1 = np.zeros((1, hidden_size))
        self.weights2 = np.random.randn(hidden_size, output_size) * 0.01
        self.bias2 = np.zeros((1, output_size))
    
    def forward(self, X):
        # Forward propagation
        self.z1 = np.dot(X, self.weights1) + self.bias1
        self.a1 = sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.weights2) + self.bias2
        self.a2 = sigmoid(self.z2)
        return self.a2
    
    def backward(self, X, y, output, learning_rate):
        # Backpropagation
        # Compute the output-layer error
        output_error = y - output
        output_delta = output_error * sigmoid_derivative(output)
        
        # Compute the hidden-layer error
        hidden_error = np.dot(output_delta, self.weights2.T)
        hidden_delta = hidden_error * sigmoid_derivative(self.a1)
        
        # Update the weights and biases
        self.weights2 += np.dot(self.a1.T, output_delta) * learning_rate
        self.bias2 += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights1 += np.dot(X.T, hidden_delta) * learning_rate
        self.bias1 += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate
    
    def train(self, X, y, epochs, learning_rate):
        for i in range(epochs):
            output = self.forward(X)
            self.backward(X, y, output, learning_rate)
            if (i + 1) % 1000 == 0:
                loss = np.mean(np.square(y - output))
                print(f"Epoch {i+1}, Loss: {loss:.4f}")
    
    def predict(self, X):
        return self.forward(X)

# Test the neural network
if __name__ == "__main__":
    # XOR problem data
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([[0], [1], [1], [0]])
    
    # Create the network
    nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
    
    # Train the network
    print("Training Neural Network...")
    nn.train(X, y, epochs=10000, learning_rate=0.1)
    
    # Test the network
    print("\nTesting Neural Network...")
    predictions = nn.predict(X)
    print("Input:\n", X)
    print("Expected Output:\n", y)
    print("Predicted Output:\n", np.round(predictions, 4))
    print("Rounded Predictions:\n", np.round(predictions))

7. Practical Case: Handwritten Digit Recognition with a Neural Network

Below is a neural network implemented in Python with NumPy that recognizes MNIST handwritten digits:

7.1 Loading and Preprocessing the Data

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the MNIST dataset
print("Loading MNIST dataset...")
x, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

# Preprocess the data
x = x.astype('float32')
y = y.astype('int')

# Standardize the features
scaler = StandardScaler()
x = scaler.fit_transform(x)

# One-hot encode the labels
y_one_hot = np.zeros((y.shape[0], 10))
y_one_hot[np.arange(y.shape[0]), y] = 1

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y_one_hot, test_size=0.2, random_state=42
)

print(f"Training data shape: {x_train.shape}")
print(f"Testing data shape: {x_test.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Testing labels shape: {y_test.shape}")

7.2 Defining and Training the Neural Network

# Define the activation functions
def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return np.where(x > 0, 1, 0)

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Define the neural network class
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights and biases
        self.weights1 = np.random.randn(input_size, hidden_size) * 0.01
        self.bias1 = np.zeros((1, hidden_size))
        self.weights2 = np.random.randn(hidden_size, output_size) * 0.01
        self.bias2 = np.zeros((1, output_size))
    
    def forward(self, X):
        # Forward propagation
        self.z1 = np.dot(X, self.weights1) + self.bias1
        self.a1 = relu(self.z1)
        self.z2 = np.dot(self.a1, self.weights2) + self.bias2
        self.a2 = softmax(self.z2)
        return self.a2
    
    def backward(self, X, y, output, learning_rate):
        # Backpropagation
        m = X.shape[0]  # batch size: average the gradients over the batch
        # Output-layer error (softmax + cross-entropy simplifies to output - y)
        output_error = (output - y) / m

        # Compute the hidden-layer error
        hidden_error = np.dot(output_error, self.weights2.T)
        hidden_delta = hidden_error * relu_derivative(self.z1)

        # Update the weights and biases
        self.weights2 -= np.dot(self.a1.T, output_error) * learning_rate
        self.bias2 -= np.sum(output_error, axis=0, keepdims=True) * learning_rate
        self.weights1 -= np.dot(X.T, hidden_delta) * learning_rate
        self.bias1 -= np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate
    
    def train(self, X, y, epochs, batch_size, learning_rate):
        n_samples = X.shape[0]
        for i in range(epochs):
            # Shuffle the data
            permutation = np.random.permutation(n_samples)
            X_shuffled = X[permutation]
            y_shuffled = y[permutation]
            
            # Mini-batch training
            for j in range(0, n_samples, batch_size):
                end = min(j + batch_size, n_samples)
                batch_X = X_shuffled[j:end]
                batch_y = y_shuffled[j:end]
                
                output = self.forward(batch_X)
                self.backward(batch_X, batch_y, output, learning_rate)
            
            if (i + 1) % 10 == 0:
                output = self.forward(X)
                loss = -np.mean(np.sum(y * np.log(output + 1e-10), axis=1))
                accuracy = np.mean(np.argmax(output, axis=1) == np.argmax(y, axis=1))
                print(f"Epoch {i+1}, Loss: {loss:.4f}, Accuracy: {accuracy:.4f}")
    
    def predict(self, X):
        output = self.forward(X)
        return np.argmax(output, axis=1)

# Create and train the neural network
print("\nCreating and training neural network...")
nn = NeuralNetwork(input_size=784, hidden_size=128, output_size=10)
nn.train(x_train, y_train, epochs=50, batch_size=128, learning_rate=0.01)

# Test the neural network
print("\nTesting neural network...")
y_pred = nn.predict(x_test)
y_test_labels = np.argmax(y_test, axis=1)
accuracy = np.mean(y_pred == y_test_labels)
print(f"Test accuracy: {accuracy:.4f}")

# Visualize the predictions
print("\nVisualizing predictions...")
fig, axes = plt.subplots(2, 5, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    img = x_test[i].reshape(28, 28)
    ax.imshow(img, cmap='gray')
    ax.set_title(f"Pred: {y_pred[i]}, True: {y_test_labels[i]}")
    ax.axis('off')
plt.tight_layout()
plt.show()

8. Interactive Exercises

Exercise 1: Implementing a Perceptron

  1. Implement a perceptron model in Python.
  2. Use the perceptron to solve the logical AND problem.
  3. Try to solve the XOR problem with the perceptron and observe the result.
  4. Explain why the perceptron cannot solve the XOR problem.

Exercise 2: Tuning Neural Network Parameters

  1. Start from the neural network code above.
  2. Try different hidden-layer sizes (e.g. 2, 4, 8, 16).
  3. Try different learning rates (e.g. 0.001, 0.01, 0.1, 0.5).
  4. Try different activation functions (e.g. Sigmoid, ReLU, Tanh).
  5. Compare model performance under the different parameter settings.

Exercise 3: Visualizing a Neural Network

  1. Choose a simple dataset (e.g. the iris dataset).
  2. Train a small neural network.
  3. Visualize the network's decision boundary.
  4. Analyze how hidden-layer size affects the decision boundary.