1. Introduction to Recurrent Neural Networks
A Recurrent Neural Network (RNN) is a neural network model designed for processing sequential data. Unlike a traditional feedforward network, an RNN has memory: it can use historical information when processing the current input.
The main characteristics of RNNs include:
- Memory: remembers information from earlier inputs and uses it for the current prediction
- Sequence processing: well suited to sequential data such as time series, text, and speech
- Parameter sharing: the same parameters are shared across all time steps, reducing model complexity
- Variable-length input: can handle input sequences of different lengths
2. Basic Principles of RNNs
An RNN implements memory through recurrent connections. Its basic structure is as follows:
2.1 The RNN Mathematical Model
At time step t, the RNN computation can be written as:
h_t = tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
y_t = W_{hy} h_t + b_y
where:
- x_t is the input at time step t
- h_t is the hidden state (memory) at time step t
- y_t is the output at time step t
- W_{xh}, W_{hh}, W_{hy} are weight matrices
- b_h, b_y are bias terms
- tanh is the activation function
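The two equations above can be sketched as a single forward step in NumPy. The sizes (3-dim input, 5-dim hidden state, 2-dim output) and the random initialization are illustrative assumptions, not part of the original:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One RNN time step: returns the new hidden state and the output."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # update the memory
    y_t = W_hy @ h_t + b_y                           # output at this step
    return h_t, y_t

# Illustrative sizes: 3-dim input, 5-dim hidden state, 2-dim output
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=(2, 5))
b_h, b_y = np.zeros(5), np.zeros(2)

h = np.zeros(5)                        # initial hidden state
for x in rng.normal(size=(4, 3)):      # a length-4 input sequence
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy, b_h, b_y)
```

Note that the same weight matrices are reused at every step of the loop; this is the parameter sharing described above.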
2.2 Unrolling an RNN
Unrolled over time, an RNN can be viewed as a feedforward network with many repeated layers, one per time step. This unrolled view helps in understanding how an RNN works and how it is trained.
3. Training RNNs
RNNs are trained with backpropagation, but because of their recurrent structure they require the BPTT (Back Propagation Through Time) algorithm, i.e. backpropagation on the unrolled network.
3.1 The BPTT Algorithm
The basic steps of BPTT:
- Forward pass: compute the hidden state and output at each time step
- Compute the loss: use a loss function to measure the difference between predictions and true values
- Backward pass: starting from the last time step, compute the gradient of each weight
- Update parameters: update the model parameters with an optimization algorithm
3.2 Vanishing and Exploding Gradients
Because of the recurrent structure, BPTT runs into vanishing and exploding gradient problems during training:
- Vanishing gradients: as the number of time steps grows, gradients decay exponentially, so the weights at early time steps cannot be updated effectively
- Exploding gradients: gradients grow exponentially, making training unstable
To address these problems, researchers proposed improved RNN variants such as LSTM and GRU.
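In practice, exploding gradients are also commonly handled directly by gradient clipping, a technique not covered above. A minimal sketch with TensorFlow (the threshold 1.0 is an illustrative choice):

```python
import tensorflow as tf

# Rescale a gradient vector whose norm exceeds the threshold
grad = tf.constant([3.0, 4.0])        # norm = 5.0
clipped = tf.clip_by_norm(grad, 1.0)  # rescaled so its norm is 1.0

# Keras optimizers accept the same idea via the clipnorm argument:
# every gradient is clipped before the update is applied
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
```

Clipping bounds the size of each update but does not change the direction of small gradients, so it addresses explosion, not vanishing.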
4. RNN Variants
4.1 Long Short-Term Memory (LSTM)
LSTM (Long Short-Term Memory) is a special kind of RNN whose gating mechanism addresses the vanishing gradient problem, enabling it to learn long-term dependencies.
4.1.1 The LSTM Gating Mechanism
An LSTM controls the flow of information through three gates:
- Forget gate: decides which historical information to discard
- Input gate: decides which new information to store
- Output gate: decides the output of the current hidden state
4.1.2 The LSTM Mathematical Model
The LSTM computation proceeds as follows:
1. Forget gate
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
2. Input gate
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
~C_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
3. Cell state update
C_t = f_t * C_{t-1} + i_t * ~C_t
4. Output gate
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
where:
- σ is the sigmoid activation function
- * denotes element-wise multiplication
- C_t is the cell state (long-term memory)
- h_t is the hidden state (short-term memory)
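The four steps above can be sketched as one LSTM step in NumPy, with [h_{t-1}, x_t] implemented as vector concatenation. The sizes (3-dim input, 4-dim hidden/cell state) are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM time step following the equations above."""
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate
    i_t = sigmoid(W_i @ z + b_i)        # input gate
    C_tilde = np.tanh(W_C @ z + b_C)    # candidate cell state ~C_t
    C_t = f_t * C_prev + i_t * C_tilde  # cell state update
    o_t = sigmoid(W_o @ z + b_o)        # output gate
    h_t = o_t * np.tanh(C_t)            # hidden state
    return h_t, C_t

# Illustrative sizes: 3-dim input, 4-dim hidden/cell state
rng = np.random.default_rng(1)
W = lambda: rng.normal(size=(4, 7))     # 7 = hidden (4) + input (3)
b = np.zeros(4)
h, C = np.zeros(4), np.zeros(4)
h, C = lstm_step(rng.normal(size=3), h, C, W(), W(), W(), W(), b, b, b, b)
```

Because f_t and i_t are sigmoid outputs in (0, 1), the cell state update is an interpolation rather than a repeated matrix product, which is what keeps gradients from vanishing as quickly.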
4.2 Gated Recurrent Unit (GRU)
GRU (Gated Recurrent Unit) is a simplified version of the LSTM; by merging gate units it reduces the number of model parameters and improves training efficiency.
4.2.1 The GRU Gating Mechanism
A GRU has only two gates:
- Update gate: controls the mix of historical and new information
- Reset gate: decides how historical information is used
4.2.2 The GRU Mathematical Model
The GRU computation proceeds as follows:
1. Update gate and reset gate
z_t = σ(W_z · [h_{t-1}, x_t] + b_z)
r_t = σ(W_r · [h_{t-1}, x_t] + b_r)
2. Candidate hidden state
~h_t = tanh(W · [r_t * h_{t-1}, x_t] + b)
3. Hidden state update
h_t = (1 - z_t) * h_{t-1} + z_t * ~h_t
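The GRU equations above can be sketched the same way in NumPy. The sizes (3-dim input, 4-dim hidden state) are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W, b_z, b_r, b):
    """One GRU time step following the equations above."""
    hx = np.concatenate([h_prev, x_t])         # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ hx + b_z)              # update gate
    r_t = sigmoid(W_r @ hx + b_r)              # reset gate
    # Candidate state ~h_t uses the reset gate to scale the old hidden state
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]) + b)
    return (1 - z_t) * h_prev + z_t * h_tilde  # hidden state update

# Illustrative sizes: 3-dim input, 4-dim hidden state
rng = np.random.default_rng(2)
Wm = lambda: rng.normal(size=(4, 7))           # 7 = hidden (4) + input (3)
b = np.zeros(4)
h = gru_step(rng.normal(size=3), np.zeros(4), Wm(), Wm(), Wm(), b, b, b)
```

Compared with the LSTM step, there is no separate cell state: the update gate z_t plays the combined role of the forget and input gates.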
5. RNN Implementations in TensorFlow
5.1 The Basic RNN Layer
TensorFlow provides the SimpleRNN layer for building a basic RNN:
# Using the SimpleRNN layer
import tensorflow as tf

rnn_layer = tf.keras.layers.SimpleRNN(
    units=64,                 # number of hidden units
    activation='tanh',        # activation function
    return_sequences=False,   # whether to return the output at every time step
    input_shape=(timesteps, input_dim)  # input shape; timesteps and input_dim come from your data
)
5.2 The LSTM Layer
Use the LSTM layer to build a long short-term memory network:
# Using the LSTM layer
lstm_layer = tf.keras.layers.LSTM(
    units=64,                        # number of hidden units
    activation='tanh',               # activation for the hidden state
    recurrent_activation='sigmoid',  # activation for the gates
    return_sequences=False,          # whether to return the output at every time step
    return_state=False,              # whether to also return the final hidden and cell states
    input_shape=(timesteps, input_dim)  # input shape
)
5.3 The GRU Layer
Use the GRU layer to build a gated recurrent unit network:
# Using the GRU layer
gru_layer = tf.keras.layers.GRU(
    units=64,                        # number of hidden units
    activation='tanh',               # activation for the hidden state
    recurrent_activation='sigmoid',  # activation for the gates
    return_sequences=False,          # whether to return the output at every time step
    return_state=False,              # whether to also return the final hidden state
    input_shape=(timesteps, input_dim)  # input shape
)
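The effect of return_sequences can be checked directly on a random batch; the sizes below (batch 2, 5 time steps, 8 features, 4 units) are illustrative:

```python
import tensorflow as tf

x = tf.random.normal((2, 5, 8))  # (batch, timesteps, features)

# return_sequences=True: one output per time step -> shape (2, 5, 4)
y_seq = tf.keras.layers.GRU(units=4, return_sequences=True)(x)

# return_sequences=False: only the last time step's output -> shape (2, 4)
y_last = tf.keras.layers.GRU(units=4, return_sequences=False)(x)
```

This is why stacked recurrent layers (Section 7.2) set return_sequences=True on every layer except the last: the next recurrent layer needs the full sequence as its input.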
6. RNN Application Examples
6.1 Text Classification
Using an LSTM for text classification, with IMDB movie-review sentiment analysis as an example:
# Load the IMDB dataset
imdb = tf.keras.datasets.imdb
max_features = 10000  # keep only the 10,000 most frequent words
maxlen = 200          # truncate or pad each review to 200 words
# Load the data
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
# Pad the sequences
X_train = tf.keras.preprocessing.sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = tf.keras.preprocessing.sequence.pad_sequences(X_test, maxlen=maxlen)
# Build the LSTM model
model = tf.keras.Sequential([
    # Embedding layer: maps word indices to word vectors
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=128, input_length=maxlen),
    # LSTM layers
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    tf.keras.layers.LSTM(units=32),
    # Fully connected layer
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    # Output layer
    tf.keras.layers.Dense(1, activation='sigmoid')
])
# Inspect the model summary
model.summary()
# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=64,
    validation_split=0.2,
    verbose=1
)
# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test, verbose=1)
print(f"Test loss: {loss}")
print(f"Test accuracy: {accuracy}")
6.2 Time Series Prediction
Using an LSTM for time-series prediction, with sine-wave prediction as an example:
# Generate sine-wave data
import numpy as np
import matplotlib.pyplot as plt
# Generate the data
time = np.arange(0, 1000, 0.1)
sin_wave = np.sin(time)
# Build the dataset with a sliding window
def create_dataset(data, window_size):
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size])
    return np.array(X), np.array(y)
# Set the window size
window_size = 20
# Create the dataset
X, y = create_dataset(sin_wave, window_size)
# Split into training and test sets
split_index = int(len(X) * 0.8)
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]
# Reshape the data to (samples, timesteps, features)
X_train = X_train.reshape(-1, window_size, 1)
X_test = X_test.reshape(-1, window_size, 1)
# Build the LSTM model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(units=50, return_sequences=True, input_shape=(window_size, 1)),
    tf.keras.layers.LSTM(units=50),
    tf.keras.layers.Dense(1)
])
# Compile the model
model.compile(
    optimizer='adam',
    loss='mse'
)
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)
# Predict
y_pred = model.predict(X_test)
# Visualize the predictions
plt.figure(figsize=(12, 6))
plt.plot(y_test, label='True values')
plt.plot(y_pred, label='Predictions')
plt.title('Sine Wave Prediction')
plt.xlabel('Time step')
plt.ylabel('Value')
plt.legend()
plt.show()
6.3 Sequence Generation
Using an LSTM to generate text sequences, with classical Chinese poetry as an example:
# A simple text-generation example
# Assume we have a text file 'poems.txt' containing classical poems
# Read the text data
with open('poems.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Create character-to-index mappings
chars = sorted(list(set(text)))
char2idx = {char: i for i, char in enumerate(chars)}
idx2char = {i: char for i, char in enumerate(chars)}
vocab_size = len(chars)
# Create the training data
max_length = 100  # sequence length
step = 5          # stride between windows
sentences = []
next_chars = []
for i in range(0, len(text) - max_length, step):
    sentences.append(text[i:i+max_length])
    next_chars.append(text[i+max_length])
# Convert the text to one-hot arrays
X = np.zeros((len(sentences), max_length, vocab_size), dtype=bool)
y = np.zeros((len(sentences), vocab_size), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char2idx[char]] = 1
    y[i, char2idx[next_chars[i]]] = 1
# Build the text-generation model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(max_length, vocab_size), return_sequences=True),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(vocab_size, activation='softmax')
])
# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy'
)
# Train the model
history = model.fit(
    X, y,
    epochs=50,
    batch_size=128,
    verbose=1
)
# Generate text
def generate_text(model, start_string, num_generate=1000, temperature=1.0):
    generated = start_string
    for _ in range(num_generate):
        # The model was trained on one-hot windows, so inference encodes
        # the last max_length characters the same way
        seed = generated[-max_length:]
        x = np.zeros((1, max_length, vocab_size), dtype=bool)
        for t, char in enumerate(seed):
            x[0, t, char2idx[char]] = 1
        # Predict the distribution over the next character
        preds = model.predict(x, verbose=0)[0]
        # Use temperature to adjust the randomness of the distribution
        preds = np.log(preds + 1e-8) / temperature
        preds = np.exp(preds) / np.sum(np.exp(preds))
        # Sample the next character index and append the character
        next_idx = np.random.choice(vocab_size, p=preds)
        generated += idx2char[next_idx]
    return generated
# Generate text
start_string = "床前明月光"
generated_text = generate_text(model, start_string, num_generate=500, temperature=0.8)
print(generated_text)
7. Advanced RNN Techniques
7.1 Bidirectional RNNs
A bidirectional RNN considers both past and future information, which can improve model performance:
# Using a bidirectional LSTM
bi_lstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    merge_mode='concat'  # merge mode: 'concat', 'sum', 'mul', or 'ave'
)
7.2 Multi-layer RNNs
Stacking multiple RNN layers increases model depth:
# Build a multi-layer LSTM model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=128, input_length=maxlen),
    # First LSTM layer: returns the output at every time step
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    # Second LSTM layer: returns the output at every time step
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    # Third LSTM layer: returns only the last time step's output
    tf.keras.layers.LSTM(units=32),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
7.3 Using an Attention Mechanism
An attention mechanism lets the model focus on the important parts of a sequence:
# Implement a simple attention layer
class AttentionLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(AttentionLayer, self).__init__()

    def build(self, input_shape):
        # Create the weights and bias
        self.W = self.add_weight(name='attention_weights', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)

    def call(self, inputs):
        # Compute the attention scores
        attention_scores = tf.matmul(inputs, self.W) + self.b
        attention_scores = tf.squeeze(attention_scores, axis=-1)
        # Turn scores into attention weights with softmax
        attention_weights = tf.nn.softmax(attention_scores, axis=1)
        attention_weights = tf.expand_dims(attention_weights, axis=-1)
        # Weighted sum over time steps
        context_vector = inputs * attention_weights
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights
# Use the attention layer in a model
inputs = tf.keras.Input(shape=(maxlen, 128))
lstm_output = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)
context_vector, attention_weights = AttentionLayer()(lstm_output)
dense_output = tf.keras.layers.Dense(64, activation='relu')(context_vector)
output = tf.keras.layers.Dense(1, activation='sigmoid')(dense_output)
model = tf.keras.Model(inputs=inputs, outputs=output)
7.4 Transfer Learning
Pretrained word embeddings or language models can improve the performance of an RNN model:
# Using pretrained GloVe word embeddings
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Load the GloVe word vectors
def load_glove_vectors(glove_file):
    word_vectors = {}
    with open(glove_file, 'r', encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            vector = np.asarray(values[1:], dtype='float32')
            word_vectors[word] = vector
    return word_vectors
# Load the pretrained vectors
glove_file = 'glove.6B.100d.txt'
word_vectors = load_glove_vectors(glove_file)
# Build the embedding matrix; texts is assumed to be your corpus (a list of strings)
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
word_index = tokenizer.word_index
embedding_dim = 100
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    if word in word_vectors:
        embedding_matrix[i] = word_vectors[word]
# Use a pretrained embedding layer
embedding_layer = tf.keras.layers.Embedding(
    input_dim=len(word_index) + 1,
    output_dim=embedding_dim,
    weights=[embedding_matrix],
    input_length=maxlen,
    trainable=False  # freeze the pretrained embeddings during training
)
8. RNN Application Domains
RNNs are widely used across many domains, including:
8.1 Natural Language Processing
- Text classification
- Sentiment analysis
- Machine translation
- Text generation
- Named entity recognition
- Syntactic parsing
8.2 Speech Processing
- Speech recognition
- Speech synthesis
- Speaker identification
- Emotion recognition
8.3 Time Series Prediction
- Stock price prediction
- Weather forecasting
- Traffic flow prediction
- Electric load forecasting
8.4 Video Processing
- Action recognition
- Video classification
- Video generation
- Video captioning
9. Exercises
Exercise 1: Text Classification
- Build an LSTM model for sentiment analysis on the IMDB dataset.
- Try different network architectures, such as multi-layer LSTMs and bidirectional LSTMs.
- Compare the performance of the different models.
Exercise 2: Time Series Prediction
- Build an LSTM model to predict stock prices or weather data.
- Experiment with different window sizes and model parameters.
- Visualize the predictions and evaluate model performance.
Exercise 3: Text Generation
- Train a text-generation model on text you like (novels, poetry, etc.).
- Try different temperature values and observe the diversity of the generated text.
- Generate a passage of 500-1000 characters.
Exercise 4: Using an Attention Mechanism
- Add an attention layer to an LSTM model to improve performance.
- Visualize the attention weights and analyze which parts of the text the model focuses on.
- Compare model performance before and after adding the attention layer.