Deep Learning Overview
Deep learning is a branch of machine learning that uses multi-layer neural networks, loosely inspired by the structure of the brain, to learn from data. Deep learning models learn feature representations automatically from raw data, without manual feature engineering, and have achieved significant results in image recognition, natural language processing, speech recognition, and other fields.
Key characteristics of deep learning include:
- Multi-layer structure: By stacking multiple neural network layers, models can learn multi-level feature representations
- Automatic feature extraction: Automatically learning useful features from raw data
- Strong expressive power: Capable of modeling complex non-linear relationships
- Requires large amounts of data: Typically needs large labeled datasets for training
- Computationally intensive: Training process requires powerful computing resources
Neural Network Fundamentals
1. Perceptron
The perceptron is the basic building block of neural networks. It receives multiple inputs, computes a weighted sum, and then outputs the result through an activation function.
import numpy as np

class Perceptron:
    def __init__(self, input_size, learning_rate=0.01):
        self.weights = np.zeros(input_size + 1)  # +1 for bias
        self.learning_rate = learning_rate

    def predict(self, inputs):
        # Calculate weighted sum (including bias)
        summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
        # Apply step function
        return 1 if summation > 0 else 0

    def train(self, training_inputs, labels, epochs):
        for epoch in range(epochs):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                # Update weights
                self.weights[1:] += self.learning_rate * (label - prediction) * inputs
                self.weights[0] += self.learning_rate * (label - prediction)

# Test the perceptron
if __name__ == "__main__":
    # Training data (logical AND)
    training_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    labels = np.array([0, 0, 0, 1])
    perceptron = Perceptron(2)
    perceptron.train(training_inputs, labels, 10)
    # Test
    test_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    for inputs in test_inputs:
        print(f"Input: {inputs}, Prediction: {perceptron.predict(inputs)}")
2. Multi-layer Neural Network
A multi-layer neural network (also known as a feedforward neural network) consists of an input layer, hidden layers, and an output layer. The presence of hidden layers allows neural networks to learn complex non-linear relationships.
import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights
        self.weights1 = np.random.randn(input_size, hidden_size)
        self.bias1 = np.zeros((1, hidden_size))
        self.weights2 = np.random.randn(hidden_size, output_size)
        self.bias2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # Note: x is expected to already be a sigmoid output,
        # so sigma'(z) = sigma(z) * (1 - sigma(z)) = x * (1 - x)
        return x * (1 - x)

    def forward(self, inputs):
        # Forward propagation
        self.layer1 = self.sigmoid(np.dot(inputs, self.weights1) + self.bias1)
        self.output = self.sigmoid(np.dot(self.layer1, self.weights2) + self.bias2)
        return self.output

    def backward(self, inputs, labels, learning_rate):
        # Output layer error and delta
        output_error = labels - self.output
        output_delta = output_error * self.sigmoid_derivative(self.output)
        # Hidden layer error and delta
        layer1_error = output_delta.dot(self.weights2.T)
        layer1_delta = layer1_error * self.sigmoid_derivative(self.layer1)
        # Update weights and biases
        self.weights2 += self.layer1.T.dot(output_delta) * learning_rate
        self.bias2 += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights1 += inputs.T.dot(layer1_delta) * learning_rate
        self.bias1 += np.sum(layer1_delta, axis=0, keepdims=True) * learning_rate

    def train(self, inputs, labels, epochs, learning_rate):
        for epoch in range(epochs):
            self.forward(inputs)
            self.backward(inputs, labels, learning_rate)
            if epoch % 1000 == 0:
                loss = np.mean(np.square(labels - self.output))
                print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Test the neural network
if __name__ == "__main__":
    # Training data (logical XOR)
    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    labels = np.array([[0], [1], [1], [0]])
    nn = NeuralNetwork(2, 4, 1)
    nn.train(inputs, labels, 10000, 0.1)
    # Test
    print("\nTest Results:")
    for i, input_data in enumerate(inputs):
        prediction = nn.forward(input_data.reshape(1, -1))
        print(f"Input: {input_data}, Prediction: {prediction[0][0]:.4f}, Target: {labels[i][0]}")
Activation Functions
1. Common Activation Functions
Activation functions introduce non-linearity to neural networks, enabling models to learn complex function relationships. Common activation functions include:
import numpy as np
import matplotlib.pyplot as plt

# Define activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

def softmax(x):
    # Subtract the max for numerical stability
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

# Generate data
x = np.linspace(-10, 10, 100)

# Plot activation functions
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].plot(x, sigmoid(x))
axes[0, 0].set_title('Sigmoid')
axes[0, 0].grid(True)

axes[0, 1].plot(x, tanh(x))
axes[0, 1].set_title('Tanh')
axes[0, 1].grid(True)

axes[0, 2].plot(x, relu(x))
axes[0, 2].set_title('ReLU')
axes[0, 2].grid(True)

axes[1, 0].plot(x, leaky_relu(x))
axes[1, 0].set_title('Leaky ReLU')
axes[1, 0].grid(True)

# Softmax example
x_softmax = np.array([-1, 0, 1, 2])
y_softmax = softmax(x_softmax)
axes[1, 1].bar(range(len(x_softmax)), y_softmax)
axes[1, 1].set_title('Softmax')
axes[1, 1].grid(True)

# Hide the unused subplot
axes[1, 2].axis('off')

plt.tight_layout()
plt.show()
2. Choosing Activation Functions
Different activation functions are suitable for different scenarios:
- Sigmoid: Suitable for the output layer of binary classification problems, but prone to vanishing gradients when used in hidden layers
- Tanh: Zero-centered and with somewhat better gradient properties than sigmoid, but still susceptible to vanishing gradients
- ReLU: The most common choice for hidden layers; computationally efficient and mitigates vanishing gradients, but neurons can "die" (get stuck outputting zero)
- Leaky ReLU: Addresses the dying-neuron problem of ReLU by allowing a small gradient for negative inputs
- Softmax: Suitable for the output layer of multi-class classification problems; converts raw outputs into a probability distribution
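The vanishing-gradient behavior described above can be checked numerically: the sigmoid derivative peaks at 0.25 and decays toward zero as |x| grows, while the ReLU gradient stays at exactly 1 for any positive input. A minimal sketch:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

xs = np.array([0.0, 2.0, 5.0, 10.0])
for x, sg, rg in zip(xs, sigmoid_grad(xs), relu_grad(xs)):
    print(f"x={x:5.1f}  sigmoid'={sg:.6f}  relu'={rg:.1f}")
```

Stacking many sigmoid layers multiplies these small derivatives together, which is why deep sigmoid networks train slowly while ReLU networks do not suffer the same shrinkage on the positive side.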
Loss Functions and Optimizers
1. Loss Functions
Loss functions measure the difference between model predictions and true values, serving as the objective function for model training. Common loss functions include:
import numpy as np

# Mean Squared Error (MSE) - regression problems
def mean_squared_error(y_true, y_pred):
    return np.mean(np.square(y_true - y_pred))

# Mean Absolute Error (MAE) - regression problems
def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# Binary cross-entropy - binary classification problems
def binary_cross_entropy(y_true, y_pred):
    # Clip predictions to avoid log(0)
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Categorical cross-entropy - multi-class classification problems
def categorical_cross_entropy(y_true, y_pred):
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Examples
if __name__ == "__main__":
    # Regression example
    y_true_reg = np.array([1.0, 2.0, 3.0])
    y_pred_reg = np.array([0.9, 1.8, 3.1])
    print(f"MSE: {mean_squared_error(y_true_reg, y_pred_reg):.4f}")
    print(f"MAE: {mean_absolute_error(y_true_reg, y_pred_reg):.4f}")

    # Binary classification example
    y_true_bin = np.array([1, 0, 1])
    y_pred_bin = np.array([0.9, 0.2, 0.8])
    print(f"Binary Cross Entropy: {binary_cross_entropy(y_true_bin, y_pred_bin):.4f}")

    # Multi-class classification example
    y_true_cat = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    y_pred_cat = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
    print(f"Categorical Cross Entropy: {categorical_cross_entropy(y_true_cat, y_pred_cat):.4f}")
2. Optimizers
Optimizers update model parameters to minimize the loss function. Common optimizers include:
import numpy as np

class SGD:
    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def update(self, params, gradients):
        for param_name, param in params.items():
            params[param_name] -= self.learning_rate * gradients[param_name]
        return params

class Momentum:
    def __init__(self, learning_rate=0.01, momentum=0.9):
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.velocities = {}

    def update(self, params, gradients):
        if not self.velocities:
            for param_name, param in params.items():
                self.velocities[param_name] = np.zeros_like(param)
        for param_name, param in params.items():
            self.velocities[param_name] = (self.momentum * self.velocities[param_name]
                                           + self.learning_rate * gradients[param_name])
            params[param_name] -= self.velocities[param_name]
        return params

class Adam:
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.m = {}
        self.v = {}
        self.t = 0

    def update(self, params, gradients):
        self.t += 1
        if not self.m:
            for param_name, param in params.items():
                self.m[param_name] = np.zeros_like(param)
                self.v[param_name] = np.zeros_like(param)
        for param_name, param in params.items():
            # Update first moment estimate
            self.m[param_name] = self.beta1 * self.m[param_name] + (1 - self.beta1) * gradients[param_name]
            # Update second moment estimate
            self.v[param_name] = self.beta2 * self.v[param_name] + (1 - self.beta2) * np.square(gradients[param_name])
            # Bias correction
            m_hat = self.m[param_name] / (1 - np.power(self.beta1, self.t))
            v_hat = self.v[param_name] / (1 - np.power(self.beta2, self.t))
            # Update parameters
            params[param_name] -= self.learning_rate * m_hat / (np.sqrt(v_hat) + self.epsilon)
        return params

# Example
if __name__ == "__main__":
    # Simple parameter and gradient example
    params = {
        'weights': np.array([0.5, -0.5]),
        'bias': np.array([0.0])
    }
    gradients = {
        'weights': np.array([0.1, -0.1]),
        'bias': np.array([0.05])
    }

    # Test different optimizers
    print("Initial params:", params)

    sgd = SGD(learning_rate=0.1)
    sgd_params = sgd.update(params.copy(), gradients)
    print("SGD updated params:", sgd_params)

    momentum = Momentum(learning_rate=0.1)
    momentum_params = momentum.update(params.copy(), gradients)
    print("Momentum updated params:", momentum_params)

    adam = Adam(learning_rate=0.1)
    adam_params = adam.update(params.copy(), gradients)
    print("Adam updated params:", adam_params)
Building Neural Networks with TensorFlow/Keras
TensorFlow and Keras are powerful tools for building and training deep learning models. Keras is a high-level API for TensorFlow, providing a concise interface for defining and training neural networks.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten the images and scale pixel values to [0, 1]
x_train = x_train.reshape(-1, 28*28) / 255.0
x_test = x_test.reshape(-1, 28*28) / 255.0
# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build model
model = Sequential([
    Dense(128, activation='relu', input_shape=(28*28,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Print model summary
model.summary()

# Train model
history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate model
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# Save model (legacy HDF5 format; recent Keras versions also offer the native .keras format)
model.save('mnist_model.h5')

# Load model
loaded_model = load_model('mnist_model.h5')

# Use loaded model for prediction
predictions = loaded_model.predict(x_test[:5])
print("\nPredictions for first 5 test images:")
print(np.argmax(predictions, axis=1))
print("Actual labels:")
print(np.argmax(y_test[:5], axis=1))
Practical Case: Image Classification
In this practical case, we'll use TensorFlow and Keras to build a Convolutional Neural Network (CNN) for image classification on the CIFAR-10 dataset.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load and preprocess data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Scale pixel values to [0, 1] and one-hot encode the labels
x_train = x_train / 255.0
x_test = x_test / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Visualize some training images
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i])
    plt.xlabel(class_names[np.argmax(y_train[i])])
plt.tight_layout()
plt.show()

# Build CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Print model summary
model.summary()

# Train model
history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2)

# Evaluate model
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# Plot training and validation accuracy
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

# Plot training and validation loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

# Save model
model.save('cifar10_cnn_model.h5')
Interactive Exercises
Exercise 1: Neural Network Implementation
Implement a simple three-layer neural network using NumPy for binary classification.
- Implement a neural network with input layer, hidden layer, and output layer
- Use sigmoid as the activation function
- Implement forward propagation and backpropagation
- Use gradient descent optimizer
- Train and test the model on a synthetic dataset
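One possible starting point for this exercise: a 2-4-1 network trained with full-batch gradient descent on a synthetic two-blob dataset. All sizes, seeds, and hyperparameters here are illustrative choices, not part of the exercise specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary dataset: two well-separated Gaussian blobs
n = 100
x0 = rng.normal(loc=-1.0, scale=0.5, size=(n, 2))
x1 = rng.normal(loc=1.0, scale=0.5, size=(n, 2))
X = np.vstack([x0, x1])
y = np.vstack([np.zeros((n, 1)), np.ones((n, 1))])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Parameters of a 2-4-1 network
W1 = rng.normal(size=(2, 4)) * 0.5
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)) * 0.5
b2 = np.zeros((1, 1))

lr = 0.5
for epoch in range(2000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass (binary cross-entropy with a sigmoid output
    # simplifies the output-layer delta to out - y)
    d_out = (out - y) / len(X)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0, keepdims=True)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
accuracy = (pred == y).mean()
print(f"Training accuracy: {accuracy:.2f}")
```

From here, the exercise asks you to split off a held-out test set and compare activation functions and learning rates.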
Exercise 2: Building Models with Keras
Build a neural network using TensorFlow and Keras to predict Boston housing prices.
- Load the Boston housing dataset
- Standardize the data
- Build a neural network with multiple hidden layers
- Compile and train the model
- Evaluate model performance
- Try different optimizers and activation functions
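A sketch of this workflow, using synthetic data with the same 13-feature shape as the housing set so the snippet stays self-contained; for the actual exercise, replace the synthetic arrays with `tensorflow.keras.datasets.boston_housing.load_data()`. Layer sizes and epoch counts are illustrative.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Synthetic stand-in for the housing data (13 features, linear target plus noise)
rng = np.random.default_rng(0)
x_train = rng.normal(size=(400, 13))
true_w = rng.normal(size=(13, 1))
y_train = x_train @ true_w + rng.normal(scale=0.1, size=(400, 1))

# Standardize features (fit the statistics on the training set only)
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
x_train = (x_train - mean) / std

# Regression network: linear output, MSE loss
model = Sequential([
    Dense(64, activation='relu', input_shape=(13,)),
    Dense(64, activation='relu'),
    Dense(1)  # linear output for regression
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
history = model.fit(x_train, y_train, epochs=20, batch_size=32, verbose=0)
print(f"Final training MSE: {history.history['loss'][-1]:.4f}")
```

When switching to the real dataset, remember to apply the training-set mean and standard deviation to the test features as well.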
Exercise 3: CNN Image Classification
Build a convolutional neural network using TensorFlow and Keras for classifying the Fashion MNIST dataset.
- Load the Fashion MNIST dataset
- Preprocess the data
- Build a CNN model
- Train the model and monitor performance
- Evaluate model performance on the test set
- Visualize model predictions
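A starting point for the model-building step: the CIFAR-10 architecture above adapted to the 28x28 grayscale Fashion MNIST inputs, with a forward-pass sanity check on random data. The layer sizes are illustrative; loading the real dataset (via `tensorflow.keras.datasets.fashion_mnist`) and training on it is left as part of the exercise.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# CNN adapted to 28x28 single-channel images, 10 clothing classes
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Sanity-check the forward pass on random data before training on the real dataset
dummy = np.random.rand(4, 28, 28, 1).astype('float32')
probs = model.predict(dummy, verbose=0)
print(probs.shape)  # (4, 10)
```

Checking that the untrained model accepts the expected input shape and emits one probability row per image catches shape mistakes early, before any training time is spent.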