TensorFlow Recurrent Neural Networks (RNN)

Learn the principles and applications of recurrent neural networks, including variants such as LSTM and GRU, and their use in sequence data processing.

1. Introduction to Recurrent Neural Networks

A Recurrent Neural Network (RNN) is a neural network model designed for processing sequential data. Unlike a traditional feed-forward network, an RNN has memory: it can use information from earlier in the sequence when processing the current input.

The main characteristics of RNNs include:

  • Memory: remembers earlier inputs and uses them for the current prediction
  • Sequence processing: well suited to time series, text, speech, and other sequential data
  • Parameter sharing: the same parameters are shared across time steps, reducing model complexity
  • Variable-length input: can handle sequences of different lengths

2. Basic Principles of RNNs

An RNN implements its memory through recurrent connections. Its basic structure is described below.

2.1 The RNN Mathematical Model

At time step t, the RNN computation can be written as:

h_t = tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)

y_t = W_{hy} h_t + b_y

where:

  • x_t is the input at time step t
  • h_t is the hidden state (memory) at time step t
  • y_t is the output at time step t
  • W_{xh}, W_{hh}, W_{hy} are the weight matrices
  • b_h, b_y are the bias terms
  • tanh is the activation function
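To make the recurrence concrete, here is a minimal NumPy sketch of a single-layer RNN processing a short sequence (the sizes and random weights are illustrative only, not a trained model):

```python
import numpy as np

# Illustrative sizes; any values work
input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)

W_xh = rng.normal(0, 0.1, (hidden_dim, input_dim))
W_hh = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Process a sequence of 5 time steps, reusing the same weights at every step
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = rnn_step(x_t, h)

print(h.shape)  # (4,)
```

Note that the loop applies the same `W_xh`, `W_hh`, and `b_h` at every time step: this is the parameter sharing described above.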

2.2 Unrolling an RNN

Unrolled over time, an RNN can be viewed as a feed-forward network with many repeated layers, one per time step. This unrolled view helps in understanding how an RNN works and how it is trained.

3. Training RNNs

RNNs are trained with backpropagation, but because of their recurrent structure they require BPTT (Backpropagation Through Time): backpropagation applied to the network unrolled over time.

3.1 The BPTT Algorithm

The basic steps of BPTT are:

  1. Forward pass: compute the hidden state and output at each time step
  2. Compute the loss: use a loss function to measure the difference between predictions and ground truth
  3. Backward pass: starting from the last time step, compute the gradient for each weight
  4. Update parameters: use an optimization algorithm to update the model parameters
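As a sketch of these steps, the following toy example runs BPTT by hand on a scalar RNN h_t = tanh(w_x·x_t + w_h·h_{t-1}) whose loss depends only on the final hidden state (a simplification for illustration; real implementations vectorize this and typically sum losses over all time steps):

```python
import numpy as np

def forward(w_x, w_h, xs):
    # Step 1: forward pass, recording every hidden state for the backward pass
    hs = [0.0]
    for x in xs:
        hs.append(np.tanh(w_x * x + w_h * hs[-1]))
    return hs

def bptt(w_x, w_h, xs, target):
    hs = forward(w_x, w_h, xs)
    # Step 2: loss = 0.5 * (h_T - target)^2, so dL/dh_T = h_T - target
    dh = hs[-1] - target
    # Step 3: walk backwards from the last time step, accumulating gradients
    dL_dwx, dL_dwh = 0.0, 0.0
    for t in reversed(range(len(xs))):
        da = dh * (1.0 - hs[t + 1] ** 2)  # backprop through tanh
        dL_dwx += da * xs[t]              # the same w_x is shared by every step
        dL_dwh += da * hs[t]              # the same w_h is shared by every step
        dh = da * w_h                     # propagate the gradient to h_{t-1}
    # Step 4 (parameter update) would be e.g. w_h -= lr * dL_dwh
    return dL_dwx, dL_dwh
```

The accumulation across time steps is where parameter sharing shows up: the single recurrent weight receives a gradient contribution from every step, and the repeated multiplication by w_h in the backward loop is exactly what makes gradients vanish or explode over long sequences, as discussed next.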

3.2 Vanishing and Exploding Gradients

Because of the recurrent structure, BPTT runs into vanishing and exploding gradients during training:

  • Vanishing gradients: as the number of time steps grows, gradients decay exponentially, so the weights for early time steps cannot be updated effectively
  • Exploding gradients: gradients grow exponentially, making training unstable
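Exploding gradients in particular are usually handled by gradient clipping: rescaling the gradients whenever their overall norm exceeds a threshold. Here is a minimal NumPy sketch of clipping by global norm (Keras optimizers expose the same idea via their clipnorm and clipvalue arguments):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # Jointly rescale all gradient arrays so their global L2 norm is at most max_norm
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

# Example: two gradient arrays with a combined L2 norm of 5
grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
```

Scaling all gradients by a single factor preserves their direction; only the step size shrinks.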

To address these problems, researchers have proposed improved RNN variants such as LSTM and GRU.

4. RNN Variants

4.1 Long Short-Term Memory (LSTM)

LSTM (Long Short-Term Memory) is a special kind of RNN that uses gating to mitigate the vanishing-gradient problem, allowing it to learn long-term dependencies.

4.1.1 The LSTM Gating Mechanism

An LSTM controls the flow of information through three gates:

  • Forget gate: decides which historical information to discard
  • Input gate: decides which new information to store
  • Output gate: decides what the current hidden state outputs

4.1.2 The LSTM Mathematical Model

The LSTM computation proceeds as follows:

1. Forget gate

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

2. Input gate

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)

~C_t = tanh(W_C · [h_{t-1}, x_t] + b_C)

3. Cell state update

C_t = f_t * C_{t-1} + i_t * ~C_t

4. Output gate

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)

h_t = o_t * tanh(C_t)

where:

  • σ is the sigmoid activation function
  • * denotes element-wise multiplication
  • C_t is the cell state (long-term memory)
  • h_t is the hidden state (short-term memory)
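The equations above translate almost line-for-line into NumPy. In this single-step sketch the weights are random (not trained), and each W has shape (hidden, hidden + input) because it acts on the concatenation [h_{t-1}, x_t]:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f = sigmoid(p['W_f'] @ z + p['b_f'])        # forget gate f_t
    i = sigmoid(p['W_i'] @ z + p['b_i'])        # input gate i_t
    C_tilde = np.tanh(p['W_C'] @ z + p['b_C'])  # candidate cell state ~C_t
    C = f * C_prev + i * C_tilde                # cell state update
    o = sigmoid(p['W_o'] @ z + p['b_o'])        # output gate o_t
    h = o * np.tanh(C)                          # new hidden state
    return h, C

# Random parameters, for shape illustration only
input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
p = {f'W_{g}': rng.normal(0, 0.1, (hidden_dim, hidden_dim + input_dim)) for g in 'fiCo'}
p.update({f'b_{g}': np.zeros(hidden_dim) for g in 'fiCo'})

h, C = lstm_step(rng.normal(size=input_dim), np.zeros(hidden_dim), np.zeros(hidden_dim), p)
print(h.shape, C.shape)  # (4,) (4,)
```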

4.2 Gated Recurrent Unit (GRU)

The GRU (Gated Recurrent Unit) is a simplified variant of the LSTM: by merging gates it reduces the number of model parameters and improves training efficiency.

4.2.1 The GRU Gating Mechanism

A GRU has only two gates:

  • Update gate: controls the mix of historical and new information
  • Reset gate: decides how historical information is used

4.2.2 The GRU Mathematical Model

The GRU computation proceeds as follows:

1. Update gate and reset gate

z_t = σ(W_z · [h_{t-1}, x_t] + b_z)

r_t = σ(W_r · [h_{t-1}, x_t] + b_r)

2. Candidate hidden state

~h_t = tanh(W · [r_t * h_{t-1}, x_t] + b)

3. Hidden state update

h_t = (1 - z_t) * h_{t-1} + z_t * ~h_t
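The same kind of single-step NumPy sketch works for the GRU (random, untrained weights for illustration; each weight matrix acts on a concatenation of the previous hidden state and the input):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x_t, h_prev, p):
    zx = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    z = sigmoid(p['W_z'] @ zx + p['b_z'])  # update gate z_t
    r = sigmoid(p['W_r'] @ zx + p['b_r'])  # reset gate r_t
    # Candidate hidden state: the reset gate scales how much history enters
    h_tilde = np.tanh(p['W'] @ np.concatenate([r * h_prev, x_t]) + p['b'])
    # Interpolate between the old state and the candidate
    return (1 - z) * h_prev + z * h_tilde

# Random parameters, for shape illustration only
input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
p = {'W_z': rng.normal(0, 0.1, (hidden_dim, hidden_dim + input_dim)),
     'W_r': rng.normal(0, 0.1, (hidden_dim, hidden_dim + input_dim)),
     'W':   rng.normal(0, 0.1, (hidden_dim, hidden_dim + input_dim)),
     'b_z': np.zeros(hidden_dim), 'b_r': np.zeros(hidden_dim), 'b': np.zeros(hidden_dim)}

h = gru_step(rng.normal(size=input_dim), np.zeros(hidden_dim), p)
print(h.shape)  # (4,)
```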

5. Implementing RNNs in TensorFlow

5.1 The Basic RNN Layer

TensorFlow provides the SimpleRNN layer for building a basic RNN:

# Using the SimpleRNN layer
rnn_layer = tf.keras.layers.SimpleRNN(
    units=64,  # number of hidden units
    activation='tanh',  # activation function
    return_sequences=False,  # whether to return the output at every time step
    input_shape=(timesteps, input_dim)  # input shape
)

5.2 The LSTM Layer

Use the LSTM layer to build a long short-term memory network:

# Using the LSTM layer
lstm_layer = tf.keras.layers.LSTM(
    units=64,  # number of hidden units
    activation='tanh',  # activation function for the hidden state
    recurrent_activation='sigmoid',  # activation function for the gates
    return_sequences=False,  # whether to return the output at every time step
    return_state=False,  # whether to also return the final hidden and cell states
    input_shape=(timesteps, input_dim)  # input shape
)

5.3 The GRU Layer

Use the GRU layer to build a gated recurrent unit:

# Using the GRU layer
gru_layer = tf.keras.layers.GRU(
    units=64,  # number of hidden units
    activation='tanh',  # activation function for the hidden state
    recurrent_activation='sigmoid',  # activation function for the gates
    return_sequences=False,  # whether to return the output at every time step
    return_state=False,  # whether to also return the final hidden state
    input_shape=(timesteps, input_dim)  # input shape
)

6. RNN Application Examples

6.1 Text Classification

Using an LSTM for text classification, with sentiment analysis on the IMDB movie review dataset as the example:

import tensorflow as tf

# Load the IMDB dataset
imdb = tf.keras.datasets.imdb
max_features = 10000  # keep only the 10,000 most common words
maxlen = 200  # truncate or pad each review to 200 words

# Load the data
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

# Pad the sequences
X_train = tf.keras.preprocessing.sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = tf.keras.preprocessing.sequence.pad_sequences(X_test, maxlen=maxlen)

# Build the LSTM model
model = tf.keras.Sequential([
    # Embedding layer: map word indices to word vectors
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=128, input_length=maxlen),

    # LSTM layers
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    tf.keras.layers.LSTM(units=32),

    # Fully connected layer
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),

    # Output layer
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Print a model summary
model.summary()

# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=64,
    validation_split=0.2,
    verbose=1
)

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test, verbose=1)
print(f"Test loss: {loss}")
print(f"Test accuracy: {accuracy}")

6.2 Time Series Prediction

Using an LSTM for time series prediction, with forecasting a sine wave as the example:

# Generate sine wave data
import numpy as np
import matplotlib.pyplot as plt

# Generate the data
time = np.arange(0, 1000, 0.1)
sin_wave = np.sin(time)

# Build the dataset: each sample is a window of past values, the target is the next value
def create_dataset(data, window_size):
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size])
    return np.array(X), np.array(y)

# Set the window size
window_size = 20

# Create the dataset
X, y = create_dataset(sin_wave, window_size)

# Split into training and test sets
split_index = int(len(X) * 0.8)
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Reshape to (samples, time steps, features)
X_train = X_train.reshape(-1, window_size, 1)
X_test = X_test.reshape(-1, window_size, 1)

# Build the LSTM model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(units=50, return_sequences=True, input_shape=(window_size, 1)),
    tf.keras.layers.LSTM(units=50),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='mse'
)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

# Predict
y_pred = model.predict(X_test)

# Visualize the predictions
plt.figure(figsize=(12, 6))
plt.plot(y_test, label='Ground truth')
plt.plot(y_pred, label='Prediction')
plt.title('Sine wave prediction')
plt.xlabel('Time step')
plt.ylabel('Value')
plt.legend()
plt.show()

6.3 Sequence Generation

Using an LSTM to generate text sequences, with classical Chinese poetry generation as the example:

# A simple text generation example
# Assume we have a text file 'poems.txt' containing classical poems

# Read the text data
with open('poems.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Create character-to-index mappings
chars = sorted(list(set(text)))
char2idx = {char: i for i, char in enumerate(chars)}
idx2char = {i: char for i, char in enumerate(chars)}
vocab_size = len(chars)

# Create the training data
max_length = 100  # sequence length
step = 5  # stride between successive sequences

sentences = []
next_chars = []

for i in range(0, len(text) - max_length, step):
    sentences.append(text[i:i+max_length])
    next_chars.append(text[i+max_length])

# One-hot encode the text
X = np.zeros((len(sentences), max_length, vocab_size), dtype=bool)
y = np.zeros((len(sentences), vocab_size), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char2idx[char]] = 1
    y[i, char2idx[next_chars[i]]] = 1

# Build the text generation model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(max_length, vocab_size), return_sequences=True),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(vocab_size, activation='softmax')
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy'
)

# Train the model
history = model.fit(
    X, y,
    epochs=50,
    batch_size=128,
    verbose=1
)

# Generate text
def generate_text(model, start_string, num_generate=1000, temperature=1.0):
    # Convert the seed string to character indices
    generated_ids = [char2idx[s] for s in start_string]
    text_generated = []

    for _ in range(num_generate):
        # One-hot encode the most recent max_length characters as model input,
        # matching the representation used during training
        window = generated_ids[-max_length:]
        input_eval = np.zeros((1, max_length, vocab_size), dtype=bool)
        for t, idx in enumerate(window):
            input_eval[0, t, idx] = 1

        # Predict the probability distribution over the next character
        predictions = model.predict(input_eval, verbose=0)[0]

        # Use the temperature to adjust the randomness of the distribution
        logits = np.log(predictions + 1e-8) / temperature

        # Sample the index of the next character
        predicted_id = tf.random.categorical(logits[np.newaxis, :], num_samples=1)[0, 0].numpy()

        # Append the sampled character and feed it back as input for the next step
        generated_ids.append(predicted_id)
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)

# Generate text
start_string = "床前明月光"
generated_text = generate_text(model, start_string, num_generate=500, temperature=0.8)
print(generated_text)

7. Advanced RNN Techniques

7.1 Bidirectional RNNs

A bidirectional RNN processes the sequence in both directions, so each output can draw on both past and future context, which often improves model performance:

# Using a bidirectional LSTM
bi_lstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    merge_mode='concat'  # how to merge the two directions: 'concat', 'sum', 'mul', 'ave'
)

7.2 Multi-layer RNNs

Stacking multiple RNN layers increases the depth of the model:

# Build a multi-layer LSTM model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=128, input_length=maxlen),

    # First LSTM layer, returns the output at every time step
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    tf.keras.layers.Dropout(0.2),

    # Second LSTM layer, returns the output at every time step
    tf.keras.layers.LSTM(units=64, return_sequences=True),
    tf.keras.layers.Dropout(0.2),

    # Third LSTM layer, returns only the output at the last time step
    tf.keras.layers.LSTM(units=32),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(1, activation='sigmoid')
])

7.3 Using an Attention Mechanism

An attention mechanism lets the model focus on the important parts of a sequence:

# Implement a simple attention layer
class AttentionLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(AttentionLayer, self).__init__()

    def build(self, input_shape):
        # Create the weights and bias
        self.W = self.add_weight(name='attention_weights', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)

    def call(self, inputs):
        # Compute attention scores
        attention_scores = tf.matmul(inputs, self.W) + self.b
        attention_scores = tf.squeeze(attention_scores, axis=-1)

        # Turn the scores into attention weights with softmax
        attention_weights = tf.nn.softmax(attention_scores, axis=1)
        attention_weights = tf.expand_dims(attention_weights, axis=-1)

        # Weighted sum over the time dimension
        context_vector = inputs * attention_weights
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights

# Use the attention layer in a model
inputs = tf.keras.Input(shape=(maxlen, 128))
lstm_output = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)
context_vector, attention_weights = AttentionLayer()(lstm_output)
dense_output = tf.keras.layers.Dense(64, activation='relu')(context_vector)
output = tf.keras.layers.Dense(1, activation='sigmoid')(dense_output)

model = tf.keras.Model(inputs=inputs, outputs=output)

7.4 Transfer Learning

Use pre-trained word embeddings or language models to improve RNN performance:

# Using pre-trained GloVe word embeddings
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load GloVe word vectors
def load_glove_vectors(glove_file):
    word_vectors = {}
    with open(glove_file, 'r', encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            vector = np.asarray(values[1:], dtype='float32')
            word_vectors[word] = vector
    return word_vectors

# Load the pre-trained word vectors
glove_file = 'glove.6B.100d.txt'
word_vectors = load_glove_vectors(glove_file)

# Create the embedding matrix
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)  # 'texts' is your training corpus (a list of strings)
word_index = tokenizer.word_index

embedding_dim = 100
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))

for word, i in word_index.items():
    if word in word_vectors:
        embedding_matrix[i] = word_vectors[word]

# Embedding layer initialized with the pre-trained vectors
embedding_layer = tf.keras.layers.Embedding(
    input_dim=len(word_index) + 1,
    output_dim=embedding_dim,
    weights=[embedding_matrix],
    input_length=maxlen,
    trainable=False  # whether to update the embedding vectors during training
)

8. RNN Application Domains

RNNs are widely used across many domains, including:

8.1 Natural Language Processing

  • Text classification
  • Sentiment analysis
  • Machine translation
  • Text generation
  • Named entity recognition
  • Syntactic parsing

8.2 Speech Processing

  • Speech recognition
  • Speech synthesis
  • Speaker identification
  • Emotion recognition

8.3 Time Series Prediction

  • Stock price prediction
  • Weather forecasting
  • Traffic flow prediction
  • Power load forecasting

8.4 Video Processing

  • Action recognition
  • Video classification
  • Video generation
  • Video captioning

9. Exercises

Exercise 1: Text Classification

  1. Using the IMDB dataset, build an LSTM model for sentiment analysis.
  2. Try different network structures, such as multi-layer and bidirectional LSTMs.
  3. Compare the performance of the different models.

Exercise 2: Time Series Prediction

  1. Using stock price or weather data, build an LSTM model for forecasting.
  2. Try different window sizes and model parameters.
  3. Visualize the predictions and evaluate model performance.

Exercise 3: Text Generation

  1. Train a text generation model on text you like (e.g. novels or poetry).
  2. Try different temperature values and observe the diversity of the generated text.
  3. Generate a passage of 500-1000 characters.

Exercise 4: Using an Attention Mechanism

  1. Add an attention layer to an LSTM model to improve performance.
  2. Visualize the attention weights and analyze which parts of the text the model attends to.
  3. Compare model performance before and after adding the attention layer.