Introduction to Deep Learning

Understand the concepts, history, and core techniques of deep learning

1. What Is Deep Learning?

Deep learning is a branch of machine learning that uses neural networks with many layers to learn complex representations of data. Compared with traditional machine learning, deep learning can extract features from data automatically, with no need for manual feature engineering.

Tip

The "depth" in deep learning refers to the number of hidden layers in a neural network. Deeper networks can learn more abstract and more complex feature representations.
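
As a minimal illustration (the layer sizes here are arbitrary, not from this tutorial), "depth" in PyTorch is simply the number of stacked hidden layers:

```python
import torch
import torch.nn as nn

# A shallow network: a single hidden layer
shallow = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# A deeper network: three hidden layers, each able to build a more
# abstract representation on top of the previous one
deep = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(32, 784)  # a batch of 32 flattened 28x28 images
print(shallow(x).shape, deep(x).shape)  # both map to 10 class scores
```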

1.1 Deep Learning vs. Traditional Machine Learning

| Aspect | Traditional machine learning | Deep learning |
| --- | --- | --- |
| Feature extraction | Features must be designed by hand | Features are learned automatically from data |
| Data requirements | Works on small to medium datasets | Needs large amounts of data to show its strengths |
| Compute resources | Relatively low compute requirements | Needs powerful compute (GPU/TPU) |
| Interpretability | Usually fairly interpretable | Often seen as a "black box" with poor interpretability |
| Typical tasks | Structured data, simple patterns | Unstructured data, complex patterns |

2. The History of Deep Learning

Deep learning has developed through several important stages:

2.1 Early Development (1940s-1980s)

  • 1943: McCulloch and Pitts propose the artificial neuron model
  • 1957: Rosenblatt invents the perceptron
  • 1969: Minsky and Papert point out the limitations of the perceptron
  • 1986: Hinton and colleagues propose the backpropagation algorithm

2.2 Revival (2000s-2010s)

  • 2006: Hinton et al. propose deep belief networks and pre-training methods
  • 2009: Fei-Fei Li et al. create the ImageNet dataset
  • 2012: AlexNet achieves a breakthrough in the ImageNet competition, marking the rise of deep learning
  • 2014: Generative adversarial networks (GANs) are proposed
  • 2015: ResNet is proposed, addressing the vanishing-gradient problem in very deep networks
  • 2017: The Transformer architecture is proposed, revolutionizing natural language processing

2.3 Boom (2010s-present)

  • 2018: BERT is proposed, greatly improving natural language processing capabilities
  • 2020: GPT-3 is released, scaling to 175 billion parameters
  • 2022: ChatGPT is released, sparking a global AI boom
  • 2023: GPT-4, Claude, and other large language models are released in quick succession

3. Core Deep Learning Architectures

3.1 Convolutional Neural Networks (CNN)

A convolutional neural network (CNN) is a deep learning architecture specialized for grid-like data such as images.

3.1.1 Basic Components of a CNN

  • Convolutional layers: extract local features using convolution kernels
  • Pooling layers: shrink the feature maps while keeping the important information
  • Fully connected layers: perform the final classification or regression
  • Activation functions: introduce non-linearity
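
The four components above can be sketched in a few lines of PyTorch (the input assumes a 28x28 grayscale image, as in MNIST; the channel counts are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)

conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # convolution: extract local features
pool = nn.MaxPool2d(2, 2)                          # pooling: halve the spatial size
relu = nn.ReLU()                                   # activation: add non-linearity

h = pool(relu(conv(x)))                # -> (1, 16, 14, 14)
fc = nn.Linear(16 * 14 * 14, 10)       # fully connected: final classification
out = fc(h.view(1, -1))                # -> (1, 10) class scores
print(h.shape, out.shape)
```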

3.1.2 CNN Applications

  • Image classification
  • Object detection
  • Image segmentation
  • Face recognition
  • Medical image analysis

3.2 Recurrent Neural Networks (RNN)

A recurrent neural network (RNN) is a deep learning architecture specialized for sequential data.

3.2.1 How RNNs Work

By introducing recurrent connections into the network, an RNN can model the temporal dependencies in sequential data. However, traditional RNNs suffer from vanishing and exploding gradients.

3.2.2 RNN Variants

  • LSTM (Long Short-Term Memory): addresses long-range dependency problems
  • GRU (Gated Recurrent Unit): a simplified variant of the LSTM with better computational efficiency
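
A minimal sketch of the two variants in PyTorch (the sizes are illustrative); the GRU's simpler gating shows up directly in its parameter count:

```python
import torch
import torch.nn as nn

seq = torch.randn(8, 20, 32)  # (batch, sequence length, feature size)

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

lstm_out, (h_n, c_n) = lstm(seq)  # LSTM keeps both a hidden and a cell state
gru_out, g_n = gru(seq)           # GRU merges them into one state

lstm_params = sum(p.numel() for p in lstm.parameters())
gru_params = sum(p.numel() for p in gru.parameters())
print(lstm_out.shape, gru_out.shape)
print(f'LSTM params: {lstm_params}, GRU params: {gru_params}')
```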

3.2.3 RNN Applications

  • Natural language processing
  • Speech recognition
  • Time-series forecasting
  • Machine translation
  • Video analysis

3.3 Generative Adversarial Networks (GAN)

A generative adversarial network (GAN) is a deep learning architecture for generating new data.

3.3.1 How GANs Work

A GAN consists of two networks:

  • Generator: tries to produce realistic data
  • Discriminator: tries to distinguish real data from generated data

Through adversarial training, the two networks continuously improve each other; eventually the generator can produce high-quality synthetic data.
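
A minimal sketch of the two networks (the layer sizes are illustrative, and the adversarial training loop itself is omitted):

```python
import torch
import torch.nn as nn

# Generator: maps random noise to a fake flattened 28x28 "image"
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)

# Discriminator: outputs the probability that its input is real
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(16, 100)     # a batch of noise vectors
fake = generator(z)          # the generator tries to fool...
score = discriminator(fake)  # ...the discriminator
print(fake.shape, score.shape)
```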

3.3.2 GAN Applications

  • Image generation
  • Style transfer
  • Super-resolution reconstruction
  • Text-to-image generation
  • Data augmentation

3.4 The Transformer Architecture

The Transformer is a deep learning architecture based on self-attention. Originally designed for machine translation, it is now widely used across natural language processing tasks.

3.4.1 How Transformers Work

The Transformer relies entirely on attention, discarding the traditional recurrent and convolutional structures. It can process sequences in parallel, which greatly improves training efficiency.
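
The core of the architecture, scaled dot-product self-attention, can be sketched as follows (a bare-bones version without the learned query/key/value projections a real Transformer uses; dimensions are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 10, 64)  # (batch, sequence length, model dimension)

# In self-attention, queries, keys, and values all come from the same input.
q, k, v = x, x, x
d_k = q.size(-1)

# Every position attends to every other position at once --
# no recurrence, so the whole sequence is processed in parallel.
scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (2, 10, 10)
weights = F.softmax(scores, dim=-1)            # each row of weights sums to 1
out = weights @ v                              # weighted mix of values
print(weights.shape, out.shape)
```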

3.4.2 Transformer Applications

  • Natural language processing (BERT, GPT, etc.)
  • Machine translation
  • Speech recognition
  • Image processing
  • Multimodal learning

4. Deep Learning Frameworks

Deep learning frameworks provide developers with tools and interfaces for building, training, and deploying deep learning models.

4.1 Mainstream Deep Learning Frameworks

4.1.1 TensorFlow

  • Developed by Google
  • Supports static computation graphs
  • Well suited to production deployment
  • Rich ecosystem

4.1.2 PyTorch

  • Developed by Facebook (now Meta)
  • Supports dynamic computation graphs
  • Well suited to research and prototyping
  • Pythonic API design

4.1.3 Other Frameworks

  • Keras: a high-level API that can run on top of TensorFlow or Theano
  • Caffe: well suited to computer vision tasks
  • MXNet: efficient distributed training
  • JAX: a numerical computing library developed by Google

5. The Deep Learning Training Process

5.1 Data Preparation

  • Data collection
  • Data cleaning
  • Data augmentation
  • Data splitting (training, validation, and test sets)
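
The splitting step can be sketched with PyTorch's random_split (the 80/10/10 ratio here is a common convention, not a rule):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# A toy dataset of 1000 samples standing in for real collected data
dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

# 80% training, 10% validation, 10% test
train_set, val_set, test_set = random_split(
    dataset, [800, 100, 100],
    generator=torch.Generator().manual_seed(42),  # reproducible split
)
print(len(train_set), len(val_set), len(test_set))
```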

5.2 Model Design

  • Choose a suitable network architecture
  • Decide the network's depth and width
  • Choose activation functions
  • Design the loss function

5.3 Model Training

  • Choose an optimization algorithm (e.g., Adam, SGD)
  • Set the learning rate and batch size
  • Monitor the training process
  • Prevent overfitting (regularization, dropout, etc.)
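
The last point can be sketched like this (the dropout rate and weight-decay value are illustrative, not recommendations):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout: randomly zero activations during training
    nn.Linear(128, 10),
)

# weight_decay adds L2 regularization to every parameter update
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

model.train()  # dropout active during training
train_out = model(torch.randn(4, 784))
model.eval()   # dropout disabled during evaluation
eval_out = model(torch.randn(4, 784))
print(train_out.shape, eval_out.shape)
```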

5.4 Model Evaluation

  • Evaluate model performance on the validation set
  • Tune the hyperparameters
  • Run a final evaluation on the test set

5.5 Model Deployment

  • Export the model
  • Optimize the model (quantization, pruning, etc.)
  • Deploy to the production environment
  • Monitor and maintain the model
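
As one hedged example of model optimization, PyTorch's dynamic quantization stores Linear weights as int8 after training (support depends on the backend and layer types; the model here is a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()  # quantization is applied post-training

# Dynamic quantization: weights stored as int8, activations quantized on the fly
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(1, 784))
print(type(quantized[0]).__name__, out.shape)
```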

6. Code Example: Building a Simple CNN with PyTorch

Below is an example that uses PyTorch to build a simple CNN for MNIST handwritten digit recognition:

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

# Data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load the MNIST dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, 
                                      download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, 
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.MNIST(root='./data', train=False, 
                                     download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, 
                                         shuffle=False, num_workers=2)

# Define the CNN model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # Pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layers
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        # Activation function
        self.relu = nn.ReLU()
    
    def forward(self, x):
        # First convolutional block
        x = self.pool(self.relu(self.conv1(x)))
        # Second convolutional block
        x = self.pool(self.relu(self.conv2(x)))
        # Flatten
        x = x.view(-1, 32 * 7 * 7)
        # Fully connected layers
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Create a model instance
net = Net()

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Train the model
epochs = 5
for epoch in range(epochs):
    run_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        
        # Zero the parameter gradients
        optimizer.zero_grad()
        
        # Forward pass
        outputs = net(inputs)
        
        # Compute the loss
        loss = criterion(outputs, labels)
        
        # Backward pass
        loss.backward()
        
        # Update the parameters
        optimizer.step()
        
        run_loss += loss.item()
        if i % 100 == 99:
            print(f'[{epoch + 1}, {i + 1}] loss: {run_loss / 100:.3f}')
            run_loss = 0.0

print('Finished Training')

# Test the model
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total}%')

# Visualize some predictions
dataiter = iter(testloader)
images, labels = next(dataiter)  # the .next() method was removed; use next()

# Print the predictions
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join(f'{predicted[j]}' for j in range(4)))
print('Actual:    ', ' '.join(f'{labels[j]}' for j in range(4)))

# Display the images
fig, axes = plt.subplots(1, 4, figsize=(10, 3))
for i in range(4):
    img = images[i].numpy().squeeze()
    axes[i].imshow(img, cmap='gray')
    axes[i].set_title(f'Pred: {predicted[i]}, True: {labels[i]}')
    axes[i].axis('off')
plt.tight_layout()
plt.show()

7. Hands-On Example: Building a Simple RNN with TensorFlow

Below is an example that uses TensorFlow to build a simple RNN for text classification:

7.1 Data Preparation and Model Definition

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

# Set the hyperparameters
max_features = 10000  # Vocabulary size
maxlen = 500          # Maximum sequence length
batch_size = 32       # Batch size

# Load the IMDB dataset
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(f'Training data shape: {x_train.shape}')
print(f'Testing data shape: {x_test.shape}')

# Pad the sequences to a fixed length
print('Padding sequences...')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print(f'Training data shape after padding: {x_train.shape}')
print(f'Testing data shape after padding: {x_test.shape}')

# Build the RNN model
print('Building model...')
model = Sequential()
model.add(Embedding(max_features, 32))  # Embedding layer
model.add(SimpleRNN(32))                # RNN layer
model.add(Dense(1, activation='sigmoid'))  # Output layer

# Compile the model
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Show the model structure
model.summary()

7.2 Training and Evaluating the Model

# Train the model
print('Training model...')
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=batch_size,
                    validation_split=0.2)

# Evaluate the model
print('Evaluating model...')
test_loss, test_acc = model.evaluate(x_test, y_test, batch_size=batch_size)
print(f'Test accuracy: {test_acc:.4f}')

# Plot the training curves
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

8. Deep Learning Application Areas

  • Computer vision: image recognition, object detection, face recognition, autonomous driving, etc.
  • Natural language processing: machine translation, sentiment analysis, text generation, question-answering systems, etc.
  • Speech recognition: speech-to-text, voice assistants, speech synthesis, etc.
  • Recommender systems: e-commerce recommendations, content recommendations, personalized services, etc.
  • Healthcare: disease diagnosis, medical image analysis, drug discovery, etc.
  • Financial services: fraud detection, risk assessment, algorithmic trading, etc.
  • Gaming: game AI, character animation, procedural content generation, etc.
  • Robotics: robot perception, motion control, autonomous navigation, etc.

9. Interactive Exercises

Exercise 1: Building a CNN Model

  1. Build a CNN model with PyTorch or TensorFlow.
  2. Train and test it on the CIFAR-10 dataset.
  3. Experiment with different network structures (e.g., adding convolutional layers, adjusting pooling layers).
  4. Compare the performance of the different structures.

Exercise 2: RNN Text Classification

  1. Build an RNN model with PyTorch or TensorFlow.
  2. Perform sentiment analysis on the IMDB dataset.
  3. Try different RNN variants (e.g., LSTM, GRU).
  4. Compare the models' performance and training speed.

Exercise 3: Deep Learning Application Survey

  1. Choose a deep learning application area that interests you.
  2. Survey the latest techniques and application cases in that area.
  3. Analyze the challenges the area faces and its future directions.
  4. Write a short survey report.