1. Introduction to Recurrent Neural Networks
A Recurrent Neural Network (RNN) is a neural network model designed for processing sequential data. Unlike a traditional feedforward network, an RNN has memory: it can use historical information when processing the current input.
The main characteristics of RNNs include:
- Memory: remembers information from earlier inputs and uses it for the current prediction
- Sequence processing: well suited to sequential data such as time series, text, and speech
- Parameter sharing: the same parameters are shared across all time steps, reducing model complexity
- Variable-length input: can handle input sequences of different lengths
2. Basic Principles of RNNs
An RNN implements memory through recurrent connections. Its basic structure is as follows:
2.1 The RNN Mathematical Model
At time step t, the RNN computation can be written as:
h_t = tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
y_t = W_{hy} h_t + b_y
where:
- x_t is the input at time step t
- h_t is the hidden state (memory) at time step t
- y_t is the output at time step t
- W_{xh}, W_{hh}, W_{hy} are weight matrices
- b_h, b_y are bias terms
- tanh is the activation function
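The two equations above can be sketched as a single forward step in NumPy. The sizes (3-dim input, 5-dim hidden state, 2-dim output) and the random initialization are illustrative assumptions, not part of the original:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One RNN time step: returns the new hidden state and the output."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # update the memory
    y_t = W_hy @ h_t + b_y                           # output at this step
    return h_t, y_t

# Illustrative sizes: 3-dim input, 5-dim hidden state, 2-dim output
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=(2, 5))
b_h, b_y = np.zeros(5), np.zeros(2)

h = np.zeros(5)                        # initial hidden state
for x in rng.normal(size=(4, 3)):      # a length-4 input sequence
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy, b_h, b_y)
```

Note that the same weight matrices are reused at every step of the loop; this is the parameter sharing described above.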
2.2 Unrolling an RNN
Unrolled over time, an RNN can be viewed as a feedforward network with many repeated layers, one per time step. This unrolled view helps in understanding how an RNN works and how it is trained.
3. Training RNNs
RNNs are trained with backpropagation, but because of their recurrent structure they require the BPTT (Back Propagation Through Time) algorithm, i.e. backpropagation on the unrolled network.
3.1 The BPTT Algorithm
The basic steps of BPTT:
- Forward pass: compute the hidden state and output at each time step
- Compute the loss: use a loss function to measure the difference between predictions and true values
- Backward pass: starting from the last time step, compute the gradient of each weight
- Update parameters: update the model parameters with an optimization algorithm
3.2 Vanishing and Exploding Gradients
Because of the recurrent structure, BPTT runs into vanishing and exploding gradient problems during training:
- Vanishing gradients: as the number of time steps grows, gradients decay exponentially, so the weights at early time steps cannot be updated effectively
- Exploding gradients: gradients grow exponentially, making training unstable
To address these problems, researchers proposed improved RNN variants such as LSTM and GRU.
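In practice, exploding gradients are also commonly handled directly by gradient clipping, a technique not covered above. A minimal sketch with TensorFlow (the threshold 1.0 is an illustrative choice):

```python
import tensorflow as tf

# Rescale a gradient vector whose norm exceeds the threshold
grad = tf.constant([3.0, 4.0])        # norm = 5.0
clipped = tf.clip_by_norm(grad, 1.0)  # rescaled so its norm is 1.0

# Keras optimizers accept the same idea via the clipnorm argument:
# every gradient is clipped before the update is applied
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
```

Clipping bounds the size of each update but does not change the direction of small gradients, so it addresses explosion, not vanishing.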
4. RNN Variants
4.1 Long Short-Term Memory (LSTM)
LSTM (Long Short-Term Memory) is a special kind of RNN whose gating mechanism addresses the vanishing gradient problem, enabling it to learn long-term dependencies.
4.1.1 The LSTM Gating Mechanism
An LSTM controls the flow of information through three gates:
- Forget gate: decides which historical information to discard
- Input gate: decides which new information to store
- Output gate: decides the output of the current hidden state
4.1.2 The LSTM Mathematical Model
The LSTM computation proceeds as follows:
1. Forget gate
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
2. Input gate
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
~C_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
3. Cell state update
C_t = f_t * C_{t-1} + i_t * ~C_t
4. Output gate
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
where:
- σ is the sigmoid activation function
- * denotes element-wise multiplication
- C_t is the cell state (long-term memory)
- h_t is the hidden state (short-term memory)
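The four steps above can be sketched as one LSTM step in NumPy, with [h_{t-1}, x_t] implemented as vector concatenation. The sizes (3-dim input, 4-dim hidden/cell state) are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM time step following the equations above."""
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate
    i_t = sigmoid(W_i @ z + b_i)        # input gate
    C_tilde = np.tanh(W_C @ z + b_C)    # candidate cell state ~C_t
    C_t = f_t * C_prev + i_t * C_tilde  # cell state update
    o_t = sigmoid(W_o @ z + b_o)        # output gate
    h_t = o_t * np.tanh(C_t)            # hidden state
    return h_t, C_t

# Illustrative sizes: 3-dim input, 4-dim hidden/cell state
rng = np.random.default_rng(1)
W = lambda: rng.normal(size=(4, 7))     # 7 = hidden (4) + input (3)
b = np.zeros(4)
h, C = np.zeros(4), np.zeros(4)
h, C = lstm_step(rng.normal(size=3), h, C, W(), W(), W(), W(), b, b, b, b)
```

Because f_t and i_t are sigmoid outputs in (0, 1), the cell state update is an interpolation rather than a repeated matrix product, which is what keeps gradients from vanishing as quickly.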
4.2 Gated Recurrent Unit (GRU)
GRU (Gated Recurrent Unit) is a simplified version of the LSTM; by merging gate units it reduces the number of model parameters and improves training efficiency.
4.2.1 The GRU Gating Mechanism
A GRU has only two gates:
- Update gate: controls the mix of historical and new information
- Reset gate: decides how historical information is used
4.2.2 The GRU Mathematical Model
The GRU computation proceeds as follows:
1. Update gate and reset gate
z_t = σ(W_z · [h_{t-1}, x_t] + b_z)
r_t = σ(W_r · [h_{t-1}, x_t] + b_r)
2. Candidate hidden state
~h_t = tanh(W · [r_t * h_{t-1}, x_t] + b)
3. Hidden state update
h_t = (1 - z_t) * h_{t-1} + z_t * ~h_t
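The GRU equations above can be sketched the same way in NumPy. The sizes (3-dim input, 4-dim hidden state) are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W, b_z, b_r, b):
    """One GRU time step following the equations above."""
    hx = np.concatenate([h_prev, x_t])         # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ hx + b_z)              # update gate
    r_t = sigmoid(W_r @ hx + b_r)              # reset gate
    # Candidate state ~h_t uses the reset gate to scale the old hidden state
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]) + b)
    return (1 - z_t) * h_prev + z_t * h_tilde  # hidden state update

# Illustrative sizes: 3-dim input, 4-dim hidden state
rng = np.random.default_rng(2)
Wm = lambda: rng.normal(size=(4, 7))           # 7 = hidden (4) + input (3)
b = np.zeros(4)
h = gru_step(rng.normal(size=3), np.zeros(4), Wm(), Wm(), Wm(), b, b, b)
```

Compared with the LSTM step, there is no separate cell state: the update gate z_t plays the combined role of the forget and input gates.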
5. RNN Implementations in TensorFlow
5.1 The Basic RNN Layer
TensorFlow provides the SimpleRNN layer for building a basic RNN:
# Using the SimpleRNN layer
import tensorflow as tf

rnn_layer = tf.keras.layers.SimpleRNN(
    units=64,                 # number of hidden units
    activation='tanh',        # activation function
    return_sequences=False,   # whether to return the output at every time step
    input_shape=(timesteps, input_dim)  # input shape; timesteps and input_dim come from your data
)
5.2 The LSTM Layer
Use the LSTM layer to build a long short-term memory network:
# Using the LSTM layer
lstm_layer = tf.keras.layers.LSTM(
    units=64,                        # number of hidden units
    activation='tanh',               # activation for the hidden state
    recurrent_activation='sigmoid',  # activation for the gates
    return_sequences=False,          # whether to return the output at every time step
    return_state=False,              # whether to also return the final hidden and cell states
    input_shape=(timesteps, input_dim)  # input shape
)
5.3 The GRU Layer
Use the GRU layer to build a gated recurrent unit network:
# Using the GRU layer
gru_layer = tf.keras.layers.GRU(
    units=64,                        # number of hidden units
    activation='tanh',               # activation for the hidden state
    recurrent_activation='sigmoid',  # activation for the gates
    return_sequences=False,          # whether to return the output at every time step
    return_state=False,              # whether to also return the final hidden state
    input_shape=(timesteps, input_dim)  # input shape
)
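The effect of return_sequences can be checked directly on a random batch; the sizes below (batch 2, 5 time steps, 8 features, 4 units) are illustrative:

```python
import tensorflow as tf

x = tf.random.normal((2, 5, 8))  # (batch, timesteps, features)

# return_sequences=True: one output per time step -> shape (2, 5, 4)
y_seq = tf.keras.layers.GRU(units=4, return_sequences=True)(x)

# return_sequences=False: only the last time step's output -> shape (2, 4)
y_last = tf.keras.layers.GRU(units=4, return_sequences=False)(x)
```

This is why stacked recurrent layers (Section 7.2) set return_sequences=True on every layer except the last: the next recurrent layer needs the full sequence as its input.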
6. RNN Application Examples
6.1 Text Classification
Using an LSTM for text classification, with IMDB movie-review sentiment analysis as an example:
# Load the IMDB dataset
imdb = tf.keras.datasets.imdb
max_features = 10000  # keep only the 10,000 most frequent words
maxlen = 200          # truncate or pad each review to 200 words
# Load the data
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
# Pad the sequences
X_train = tf.keras.preprocessing.sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = tf.keras.preprocessing.sequence.pad_sequences(X_test, maxlen=maxlen)
# Build the LSTM model
model = tf.keras.Sequential([
    # Embedding layer: maps word indices to word vectors
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=128, input_length=maxlen),
    # LSTM layers
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    tf.keras.layers.LSTM(units=32),
    # Fully connected layer
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    # Output layer
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# Inspect the model summary
model.summary()
# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=64,
    validation_split=0.2,
    verbose=1
)
# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test, verbose=1)
print(f"Test loss: {loss}")
print(f"Test accuracy: {accuracy}")
6.2 Time Series Prediction
Using an LSTM for time-series prediction, with sine-wave prediction as an example:
# Generate sine-wave data
import numpy as np
import matplotlib.pyplot as plt
# Generate the data
time = np.arange(0, 1000, 0.1)
sin_wave = np.sin(time)
# Build the dataset with a sliding window
def create_dataset(data, window_size):
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size])
    return np.array(X), np.array(y)
# Set the window size
window_size = 20
# Create the dataset
X, y = create_dataset(sin_wave, window_size)
# Split into training and test sets
split_index = int(len(X) * 0.8)
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]
# Reshape the data to (samples, timesteps, features)
X_train = X_train.reshape(-1, window_size, 1)
X_test = X_test.reshape(-1, window_size, 1)
# Build the LSTM model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(units=50, return_sequences=True, input_shape=(window_size, 1)),
    tf.keras.layers.LSTM(units=50),
    tf.keras.layers.Dense(1)
])
# Compile the model
model.compile(
    optimizer='adam',
    loss='mse'
)
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)
# Predict
y_pred = model.predict(X_test)
# Visualize the predictions
plt.figure(figsize=(12, 6))
plt.plot(y_test, label='True values')
plt.plot(y_pred, label='Predictions')
plt.title('Sine Wave Prediction')
plt.xlabel('Time step')
plt.ylabel('Value')
plt.legend()
plt.show()
6.3 Sequence Generation
Using an LSTM to generate text sequences, with classical Chinese poetry as an example:
# A simple text-generation example
# Assume we have a text file 'poems.txt' containing classical poems
# Read the text data
with open('poems.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Create character-to-index mappings
chars = sorted(list(set(text)))
char2idx = {char: i for i, char in enumerate(chars)}
idx2char = {i: char for i, char in enumerate(chars)}
vocab_size = len(chars)
# Create the training data
max_length = 100  # sequence length
step = 5          # stride between windows
sentences = []
next_chars = []
for i in range(0, len(text) - max_length, step):
    sentences.append(text[i:i+max_length])
    next_chars.append(text[i+max_length])
# Convert the text to one-hot arrays
X = np.zeros((len(sentences), max_length, vocab_size), dtype=bool)
y = np.zeros((len(sentences), vocab_size), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char2idx[char]] = 1
    y[i, char2idx[next_chars[i]]] = 1
# Build the text-generation model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(max_length, vocab_size), return_sequences=True),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(vocab_size, activation='softmax')
])
# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy'
)
# Train the model
history = model.fit(
    X, y,
    epochs=50,
    batch_size=128,
    verbose=1
)
# Generate text
def generate_text(model, start_string, num_generate=1000, temperature=1.0):
    generated = start_string
    for _ in range(num_generate):
        # The model was trained on one-hot windows, so inference encodes
        # the last max_length characters the same way
        seed = generated[-max_length:]
        x = np.zeros((1, max_length, vocab_size), dtype=bool)
        for t, char in enumerate(seed):
            x[0, t, char2idx[char]] = 1
        # Predict the distribution over the next character
        preds = model.predict(x, verbose=0)[0]
        # Use temperature to adjust the randomness of the distribution
        preds = np.log(preds + 1e-8) / temperature
        preds = np.exp(preds) / np.sum(np.exp(preds))
        # Sample the next character index and append the character
        next_idx = np.random.choice(vocab_size, p=preds)
        generated += idx2char[next_idx]
    return generated
# Generate text
start_string = "床前明月光"
generated_text = generate_text(model, start_string, num_generate=500, temperature=0.8)
print(generated_text)
7. Advanced RNN Techniques
7.1 Bidirectional RNNs
A bidirectional RNN considers both past and future information, which can improve model performance:
# Using a bidirectional LSTM
bi_lstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    merge_mode='concat'  # merge mode: 'concat', 'sum', 'mul', or 'ave'
)
7.2 Multi-layer RNNs
Stacking multiple RNN layers increases model depth:
# Build a multi-layer LSTM model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=128, input_length=maxlen),
    # First LSTM layer: returns the output at every time step
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    # Second LSTM layer: returns the output at every time step
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    # Third LSTM layer: returns only the last time step's output
    tf.keras.layers.LSTM(units=32),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
7.3 Using an Attention Mechanism
An attention mechanism lets the model focus on the important parts of a sequence:
# Implement a simple attention layer
class AttentionLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(AttentionLayer, self).__init__()

    def build(self, input_shape):
        # Create the weights and bias
        self.W = self.add_weight(name='attention_weights', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)

    def call(self, inputs):
        # Compute the attention scores
        attention_scores = tf.matmul(inputs, self.W) + self.b
        attention_scores = tf.squeeze(attention_scores, axis=-1)
        # Turn scores into attention weights with softmax
        attention_weights = tf.nn.softmax(attention_scores, axis=1)
        attention_weights = tf.expand_dims(attention_weights, axis=-1)
        # Weighted sum over time steps
        context_vector = inputs * attention_weights
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights
# Use the attention layer in a model
inputs = tf.keras.Input(shape=(maxlen, 128))
lstm_output = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)
context_vector, attention_weights = AttentionLayer()(lstm_output)
dense_output = tf.keras.layers.Dense(64, activation='relu')(context_vector)
output = tf.keras.layers.Dense(1, activation='sigmoid')(dense_output)
model = tf.keras.Model(inputs=inputs, outputs=output)
7.4 Transfer Learning
Pretrained word embeddings or language models can improve the performance of an RNN model:
# Using pretrained GloVe word embeddings
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Load the GloVe word vectors
def load_glove_vectors(glove_file):
    word_vectors = {}
    with open(glove_file, 'r', encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            vector = np.asarray(values[1:], dtype='float32')
            word_vectors[word] = vector
    return word_vectors
# Load the pretrained vectors
glove_file = 'glove.6B.100d.txt'
word_vectors = load_glove_vectors(glove_file)
# Build the embedding matrix; texts is assumed to be your corpus (a list of strings)
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
word_index = tokenizer.word_index
embedding_dim = 100
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    if word in word_vectors:
        embedding_matrix[i] = word_vectors[word]
# Use a pretrained embedding layer
embedding_layer = tf.keras.layers.Embedding(
    input_dim=len(word_index) + 1,
    output_dim=embedding_dim,
    weights=[embedding_matrix],
    input_length=maxlen,
    trainable=False  # freeze the pretrained embeddings during training
)
8. RNN Application Domains
RNNs are widely used across many domains, including:
8.1 Natural Language Processing
- Text classification
- Sentiment analysis
- Machine translation
- Text generation
- Named entity recognition
- Syntactic parsing
8.2 Speech Processing
- Speech recognition
- Speech synthesis
- Speaker identification
- Emotion recognition
8.3 Time Series Prediction
- Stock price prediction
- Weather forecasting
- Traffic flow prediction
- Electric load forecasting
8.4 Video Processing
- Action recognition
- Video classification
- Video generation
- Video captioning
9. Exercises
Exercise 1: Text Classification
- Build an LSTM model for sentiment analysis on the IMDB dataset.
- Try different network architectures, such as multi-layer LSTMs and bidirectional LSTMs.
- Compare the performance of the different models.
Exercise 2: Time Series Prediction
- Build an LSTM model to predict stock prices or weather data.
- Experiment with different window sizes and model parameters.
- Visualize the predictions and evaluate model performance.
Exercise 3: Text Generation
- Train a text-generation model on text you like (novels, poetry, etc.).
- Try different temperature values and observe the diversity of the generated text.
- Generate a passage of 500-1000 characters.
Exercise 4: Using an Attention Mechanism
- Add an attention layer to an LSTM model to improve performance.
- Visualize the attention weights and analyze which parts of the text the model focuses on.
- Compare model performance before and after adding the attention layer.