Introduction to Deep Learning

Understand the concepts, history, and core techniques of deep learning

1. What Is Deep Learning?

Deep learning is a branch of machine learning that uses neural networks with many layers to learn complex representations of data. Compared with traditional machine learning, deep learning can extract features from data automatically, with no need for manual feature engineering.

Tip

The "depth" in deep learning refers to the number of hidden layers in a neural network. Deeper networks can learn more abstract and more complex feature representations.
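
As a minimal illustration (the layer sizes here are arbitrary, not from this tutorial), "depth" in PyTorch is simply the number of stacked hidden layers:

```python
import torch
import torch.nn as nn

# A shallow network: a single hidden layer
shallow = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# A deeper network: three hidden layers, each able to build a more
# abstract representation on top of the previous one
deep = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(32, 784)  # a batch of 32 flattened 28x28 images
print(shallow(x).shape, deep(x).shape)  # both map to 10 class scores
```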

1.1 Deep Learning vs. Traditional Machine Learning

| Aspect | Traditional machine learning | Deep learning |
| --- | --- | --- |
| Feature extraction | Features must be designed by hand | Features are learned automatically from data |
| Data requirements | Works on small to medium datasets | Needs large amounts of data to show its strengths |
| Compute resources | Relatively low compute requirements | Needs powerful compute (GPU/TPU) |
| Interpretability | Usually fairly interpretable | Often seen as a "black box" with poor interpretability |
| Typical tasks | Structured data, simple patterns | Unstructured data, complex patterns |

2. The History of Deep Learning

Deep learning has developed through several important stages:

2.1 Early Development (1940s-1980s)

  • 1943: McCulloch and Pitts propose the artificial neuron model
  • 1957: Rosenblatt invents the perceptron
  • 1969: Minsky and Papert point out the limitations of the perceptron
  • 1986: Hinton and colleagues propose the backpropagation algorithm

2.2 Revival (2000s-2010s)

  • 2006: Hinton et al. propose deep belief networks and pre-training methods
  • 2009: Fei-Fei Li et al. create the ImageNet dataset
  • 2012: AlexNet achieves a breakthrough in the ImageNet competition, marking the rise of deep learning
  • 2014: Generative adversarial networks (GANs) are proposed
  • 2015: ResNet is proposed, addressing the vanishing-gradient problem in very deep networks
  • 2017: The Transformer architecture is proposed, revolutionizing natural language processing

2.3 Boom (2010s-present)

  • 2018: BERT is proposed, greatly improving natural language processing capabilities
  • 2020: GPT-3 is released, scaling to 175 billion parameters
  • 2022: ChatGPT is released, sparking a global AI boom
  • 2023: GPT-4, Claude, and other large language models are released in quick succession

3. Core Deep Learning Architectures

3.1 Convolutional Neural Networks (CNN)

A convolutional neural network (CNN) is a deep learning architecture specialized for grid-like data such as images.

3.1.1 Basic Components of a CNN

  • Convolutional layers: extract local features using convolution kernels
  • Pooling layers: shrink the feature maps while keeping the important information
  • Fully connected layers: perform the final classification or regression
  • Activation functions: introduce non-linearity
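
The four components above can be sketched in a few lines of PyTorch (the input assumes a 28x28 grayscale image, as in MNIST; the channel counts are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)

conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # convolution: extract local features
pool = nn.MaxPool2d(2, 2)                          # pooling: halve the spatial size
relu = nn.ReLU()                                   # activation: add non-linearity

h = pool(relu(conv(x)))                # -> (1, 16, 14, 14)
fc = nn.Linear(16 * 14 * 14, 10)       # fully connected: final classification
out = fc(h.view(1, -1))                # -> (1, 10) class scores
print(h.shape, out.shape)
```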

3.1.2 CNN Applications

  • Image classification
  • Object detection
  • Image segmentation
  • Face recognition
  • Medical image analysis

3.2 Recurrent Neural Networks (RNN)

A recurrent neural network (RNN) is a deep learning architecture specialized for sequential data.

3.2.1 How RNNs Work

By introducing recurrent connections into the network, an RNN can model the temporal dependencies in sequential data. However, traditional RNNs suffer from vanishing and exploding gradients.

3.2.2 RNN Variants

  • LSTM (Long Short-Term Memory): addresses long-range dependency problems
  • GRU (Gated Recurrent Unit): a simplified variant of the LSTM with better computational efficiency
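
A minimal sketch of the two variants in PyTorch (the sizes are illustrative); the GRU's simpler gating shows up directly in its parameter count:

```python
import torch
import torch.nn as nn

seq = torch.randn(8, 20, 32)  # (batch, sequence length, feature size)

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

lstm_out, (h_n, c_n) = lstm(seq)  # LSTM keeps both a hidden and a cell state
gru_out, g_n = gru(seq)           # GRU merges them into one state

lstm_params = sum(p.numel() for p in lstm.parameters())
gru_params = sum(p.numel() for p in gru.parameters())
print(lstm_out.shape, gru_out.shape)
print(f'LSTM params: {lstm_params}, GRU params: {gru_params}')
```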

3.2.3 RNN Applications

  • Natural language processing
  • Speech recognition
  • Time-series forecasting
  • Machine translation
  • Video analysis

3.3 Generative Adversarial Networks (GAN)

A generative adversarial network (GAN) is a deep learning architecture for generating new data.

3.3.1 How GANs Work

A GAN consists of two networks:

  • Generator: tries to produce realistic data
  • Discriminator: tries to distinguish real data from generated data

Through adversarial training, the two networks continuously improve each other; eventually the generator can produce high-quality synthetic data.
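
A minimal sketch of the two networks (the layer sizes are illustrative, and the adversarial training loop itself is omitted):

```python
import torch
import torch.nn as nn

# Generator: maps random noise to a fake flattened 28x28 "image"
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)

# Discriminator: outputs the probability that its input is real
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(16, 100)     # a batch of noise vectors
fake = generator(z)          # the generator tries to fool...
score = discriminator(fake)  # ...the discriminator
print(fake.shape, score.shape)
```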

3.3.2 GAN Applications

  • Image generation
  • Style transfer
  • Super-resolution reconstruction
  • Text-to-image generation
  • Data augmentation

3.4 The Transformer Architecture

The Transformer is a deep learning architecture based on self-attention. Originally designed for machine translation, it is now widely used across natural language processing tasks.

3.4.1 How Transformers Work

The Transformer relies entirely on attention, discarding the traditional recurrent and convolutional structures. It can process sequences in parallel, which greatly improves training efficiency.
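
The core of the architecture, scaled dot-product self-attention, can be sketched as follows (a bare-bones version without the learned query/key/value projections a real Transformer uses; dimensions are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 10, 64)  # (batch, sequence length, model dimension)

# In self-attention, queries, keys, and values all come from the same input.
q, k, v = x, x, x
d_k = q.size(-1)

# Every position attends to every other position at once --
# no recurrence, so the whole sequence is processed in parallel.
scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (2, 10, 10)
weights = F.softmax(scores, dim=-1)            # each row of weights sums to 1
out = weights @ v                              # weighted mix of values
print(weights.shape, out.shape)
```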

3.4.2 Transformer Applications

  • Natural language processing (BERT, GPT, etc.)
  • Machine translation
  • Speech recognition
  • Image processing
  • Multimodal learning

4. Deep Learning Frameworks

Deep learning frameworks provide developers with tools and interfaces for building, training, and deploying deep learning models.

4.1 Mainstream Deep Learning Frameworks

4.1.1 TensorFlow

  • Developed by Google
  • Supports static computation graphs
  • Well suited to production deployment
  • Rich ecosystem

4.1.2 PyTorch

  • Developed by Facebook (now Meta)
  • Supports dynamic computation graphs
  • Well suited to research and prototyping
  • Pythonic API design

4.1.3 Other Frameworks

  • Keras: a high-level API that can run on top of TensorFlow or Theano
  • Caffe: well suited to computer vision tasks
  • MXNet: efficient distributed training
  • JAX: a numerical computing library developed by Google

5. The Deep Learning Training Process

5.1 Data Preparation

  • Data collection
  • Data cleaning
  • Data augmentation
  • Data splitting (training, validation, and test sets)
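
The splitting step can be sketched with PyTorch's random_split (the 80/10/10 ratio here is a common convention, not a rule):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# A toy dataset of 1000 samples standing in for real collected data
dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

# 80% training, 10% validation, 10% test
train_set, val_set, test_set = random_split(
    dataset, [800, 100, 100],
    generator=torch.Generator().manual_seed(42),  # reproducible split
)
print(len(train_set), len(val_set), len(test_set))
```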

5.2 Model Design

  • Choose a suitable network architecture
  • Decide the network's depth and width
  • Choose activation functions
  • Design the loss function

5.3 Model Training

  • Choose an optimization algorithm (e.g., Adam, SGD)
  • Set the learning rate and batch size
  • Monitor the training process
  • Prevent overfitting (regularization, dropout, etc.)
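
The last point can be sketched like this (the dropout rate and weight-decay value are illustrative, not recommendations):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout: randomly zero activations during training
    nn.Linear(128, 10),
)

# weight_decay adds L2 regularization to every parameter update
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

model.train()  # dropout active during training
train_out = model(torch.randn(4, 784))
model.eval()   # dropout disabled during evaluation
eval_out = model(torch.randn(4, 784))
print(train_out.shape, eval_out.shape)
```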

5.4 Model Evaluation

  • Evaluate model performance on the validation set
  • Tune the hyperparameters
  • Run a final evaluation on the test set

5.5 Model Deployment

  • Export the model
  • Optimize the model (quantization, pruning, etc.)
  • Deploy to the production environment
  • Monitor and maintain the model
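
As one hedged example of model optimization, PyTorch's dynamic quantization stores Linear weights as int8 after training (support depends on the backend and layer types; the model here is a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()  # quantization is applied post-training

# Dynamic quantization: weights stored as int8, activations quantized on the fly
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(1, 784))
print(type(quantized[0]).__name__, out.shape)
```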

6. Code Example: Building a Simple CNN with PyTorch

Below is an example that uses PyTorch to build a simple CNN for MNIST handwritten digit recognition:

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

# Data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load the MNIST dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, 
                                      download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, 
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.MNIST(root='./data', train=False, 
                                     download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, 
                                         shuffle=False, num_workers=2)

# Define the CNN model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # Pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layers
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        # Activation function
        self.relu = nn.ReLU()
    
    def forward(self, x):
        # First convolutional block
        x = self.pool(self.relu(self.conv1(x)))
        # Second convolutional block
        x = self.pool(self.relu(self.conv2(x)))
        # Flatten
        x = x.view(-1, 32 * 7 * 7)
        # Fully connected layers
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Create a model instance
net = Net()

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Train the model
epochs = 5
for epoch in range(epochs):
    run_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        
        # Zero the parameter gradients
        optimizer.zero_grad()
        
        # Forward pass
        outputs = net(inputs)
        
        # Compute the loss
        loss = criterion(outputs, labels)
        
        # Backward pass
        loss.backward()
        
        # Update the parameters
        optimizer.step()
        
        run_loss += loss.item()
        if i % 100 == 99:
            print(f'[{epoch + 1}, {i + 1}] loss: {run_loss / 100:.3f}')
            run_loss = 0.0

print('Finished Training')

# Test the model
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total}%')

# Visualize some predictions
dataiter = iter(testloader)
images, labels = next(dataiter)  # the .next() method was removed; use next()

# Print the predictions
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join(f'{predicted[j]}' for j in range(4)))
print('Actual:    ', ' '.join(f'{labels[j]}' for j in range(4)))

# Display the images
fig, axes = plt.subplots(1, 4, figsize=(10, 3))
for i in range(4):
    img = images[i].numpy().squeeze()
    axes[i].imshow(img, cmap='gray')
    axes[i].set_title(f'Pred: {predicted[i]}, True: {labels[i]}')
    axes[i].axis('off')
plt.tight_layout()
plt.show()

7. Hands-On Example: Building a Simple RNN with TensorFlow

Below is an example that uses TensorFlow to build a simple RNN for text classification:

7.1 Data Preparation and Model Definition

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

# Set the hyperparameters
max_features = 10000  # Vocabulary size
maxlen = 500          # Maximum sequence length
batch_size = 32       # Batch size

# Load the IMDB dataset
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(f'Training data shape: {x_train.shape}')
print(f'Testing data shape: {x_test.shape}')

# Pad the sequences to a fixed length
print('Padding sequences...')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print(f'Training data shape after padding: {x_train.shape}')
print(f'Testing data shape after padding: {x_test.shape}')

# Build the RNN model
print('Building model...')
model = Sequential()
model.add(Embedding(max_features, 32))  # Embedding layer
model.add(SimpleRNN(32))                # RNN layer
model.add(Dense(1, activation='sigmoid'))  # Output layer

# Compile the model
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Show the model structure
model.summary()

7.2 Training and Evaluating the Model

# Train the model
print('Training model...')
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=batch_size,
                    validation_split=0.2)

# Evaluate the model
print('Evaluating model...')
test_loss, test_acc = model.evaluate(x_test, y_test, batch_size=batch_size)
print(f'Test accuracy: {test_acc:.4f}')

# Plot the training curves
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

8. Deep Learning Application Areas

  • Computer vision: image recognition, object detection, face recognition, autonomous driving, etc.
  • Natural language processing: machine translation, sentiment analysis, text generation, question-answering systems, etc.
  • Speech recognition: speech-to-text, voice assistants, speech synthesis, etc.
  • Recommender systems: e-commerce recommendations, content recommendations, personalized services, etc.
  • Healthcare: disease diagnosis, medical image analysis, drug discovery, etc.
  • Financial services: fraud detection, risk assessment, algorithmic trading, etc.
  • Gaming: game AI, character animation, procedural content generation, etc.
  • Robotics: robot perception, motion control, autonomous navigation, etc.

9. Interactive Exercises

Exercise 1: Building a CNN Model

  1. Build a CNN model with PyTorch or TensorFlow.
  2. Train and test it on the CIFAR-10 dataset.
  3. Experiment with different network structures (e.g., adding convolutional layers, adjusting pooling layers).
  4. Compare the performance of the different structures.

Exercise 2: RNN Text Classification

  1. Build an RNN model with PyTorch or TensorFlow.
  2. Perform sentiment analysis on the IMDB dataset.
  3. Try different RNN variants (e.g., LSTM, GRU).
  4. Compare the models' performance and training speed.

Exercise 3: Deep Learning Application Survey

  1. Choose a deep learning application area that interests you.
  2. Survey the latest techniques and application cases in that area.
  3. Analyze the challenges the area faces and its future directions.
  4. Write a short survey report.