Deep Learning Overview

Deep learning is an important branch of machine learning that uses multi-layer neural networks, loosely inspired by the structure of the brain, to learn hierarchical representations of data. Deep learning models learn feature representations automatically from raw data, without manual feature engineering, and have achieved significant results in image recognition, natural language processing, speech recognition, and other fields.

Key characteristics of deep learning include:

  • Multi-layer structure: By stacking multiple neural network layers, models can learn multi-level feature representations
  • Automatic feature extraction: Automatically learning useful features from raw data
  • Strong expressive power: Capable of modeling complex non-linear relationships
  • Requires large amounts of data: Typically needs large labeled datasets for training
  • Computationally intensive: Training process requires powerful computing resources

Neural Network Fundamentals

1. Perceptron

The perceptron is the basic building block of neural networks. It receives multiple inputs, computes a weighted sum, and then outputs the result through an activation function.

import numpy as np

class Perceptron:
    def __init__(self, input_size, learning_rate=0.01):
        self.weights = np.zeros(input_size + 1)  # +1 for bias
        self.learning_rate = learning_rate

    def predict(self, inputs):
        # Calculate weighted sum (including bias)
        summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
        # Apply step function
        return 1 if summation > 0 else 0

    def train(self, training_inputs, labels, epochs):
        for epoch in range(epochs):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                # Update weights
                self.weights[1:] += self.learning_rate * (label - prediction) * inputs
                self.weights[0] += self.learning_rate * (label - prediction)

# Test the perceptron
if __name__ == "__main__":
    # Training data (logical AND)
    training_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    labels = np.array([0, 0, 0, 1])

    perceptron = Perceptron(2)
    perceptron.train(training_inputs, labels, 10)

    # Test
    test_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    for inputs in test_inputs:
        print(f"Input: {inputs}, Prediction: {perceptron.predict(inputs)}")

2. Multi-layer Neural Network

A multi-layer neural network (also known as a feedforward neural network) consists of an input layer, hidden layers, and an output layer. The presence of hidden layers allows neural networks to learn complex non-linear relationships.

import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights
        self.weights1 = np.random.randn(input_size, hidden_size)
        self.bias1 = np.zeros((1, hidden_size))
        self.weights2 = np.random.randn(hidden_size, output_size)
        self.bias2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # Note: x is expected to already be a sigmoid activation,
        # so sigma'(z) = sigma(z) * (1 - sigma(z)) = x * (1 - x)
        return x * (1 - x)

    def forward(self, inputs):
        # Forward propagation
        self.layer1 = self.sigmoid(np.dot(inputs, self.weights1) + self.bias1)
        self.output = self.sigmoid(np.dot(self.layer1, self.weights2) + self.bias2)
        return self.output

    def backward(self, inputs, labels, learning_rate):
        # Calculate output error
        output_error = labels - self.output
        output_delta = output_error * self.sigmoid_derivative(self.output)

        # Hidden layer error
        layer1_error = output_delta.dot(self.weights2.T)
        layer1_delta = layer1_error * self.sigmoid_derivative(self.layer1)

        # Update weights
        self.weights2 += self.layer1.T.dot(output_delta) * learning_rate
        self.bias2 += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights1 += inputs.T.dot(layer1_delta) * learning_rate
        self.bias1 += np.sum(layer1_delta, axis=0, keepdims=True) * learning_rate

    def train(self, inputs, labels, epochs, learning_rate):
        for epoch in range(epochs):
            self.forward(inputs)
            self.backward(inputs, labels, learning_rate)
            if epoch % 1000 == 0:
                loss = np.mean(np.square(labels - self.output))
                print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Test the neural network
if __name__ == "__main__":
    # Training data (logical XOR)
    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    labels = np.array([[0], [1], [1], [0]])

    nn = NeuralNetwork(2, 4, 1)
    nn.train(inputs, labels, 10000, 0.1)

    # Test
    print("\nTest Results:")
    for i, input_data in enumerate(inputs):
        prediction = nn.forward(input_data.reshape(1, -1))
        print(f"Input: {input_data}, Prediction: {prediction[0][0]:.4f}, Target: {labels[i][0]}")

Activation Functions

1. Common Activation Functions

Activation functions introduce non-linearity to neural networks, enabling models to learn complex function relationships. Common activation functions include:

import numpy as np
import matplotlib.pyplot as plt

# Define activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

# Generate data
x = np.linspace(-10, 10, 100)

# Plot activation functions
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].plot(x, sigmoid(x))
axes[0, 0].set_title('Sigmoid')
axes[0, 0].grid(True)

axes[0, 1].plot(x, tanh(x))
axes[0, 1].set_title('Tanh')
axes[0, 1].grid(True)

axes[0, 2].plot(x, relu(x))
axes[0, 2].set_title('ReLU')
axes[0, 2].grid(True)

axes[1, 0].plot(x, leaky_relu(x))
axes[1, 0].set_title('Leaky ReLU')
axes[1, 0].grid(True)

# Softmax example
x_softmax = np.array([-1, 0, 1, 2])
y_softmax = softmax(x_softmax)
axes[1, 1].bar(range(len(x_softmax)), y_softmax)
axes[1, 1].set_title('Softmax')
axes[1, 1].grid(True)

# Hide the last subplot
axes[1, 2].axis('off')

plt.tight_layout()
plt.show()

2. Choosing Activation Functions

Different activation functions are suitable for different scenarios:

  • Sigmoid: Suitable for the output layer of binary classification problems, but prone to gradient vanishing in hidden layers
  • Tanh: Has better gradient properties than Sigmoid, but may still cause gradient vanishing
  • ReLU: Widely used in hidden layers, computationally efficient, mitigates gradient vanishing, but may cause neuron death
  • Leaky ReLU: Addresses the neuron death problem of ReLU
  • Softmax: Suitable for the output layer of multi-class classification problems; converts raw outputs into a probability distribution
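The gradient-vanishing claim above can be checked numerically: the sigmoid derivative never exceeds 0.25, so a gradient propagated back through many sigmoid layers is multiplied by factors of at most 0.25 per layer, while ReLU's gradient is exactly 1 for positive inputs. A minimal NumPy sketch (not part of the tutorial's code):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.linspace(-10, 10, 1001)

# The sigmoid derivative peaks at x = 0, where it equals 0.25
print(f"Max sigmoid gradient: {sigmoid_grad(x).max():.4f}")  # ~0.25

# Gradient shrinkage bound after passing through 10 sigmoid layers
print(f"Upper bound after 10 layers: {0.25 ** 10:.2e}")

# ReLU's gradient is 1 for all positive inputs, which is why it
# mitigates this shrinkage in deep networks
relu_grad = (x > 0).astype(float)
print(f"ReLU gradient for positive inputs: {relu_grad[x > 0].min():.1f}")
```

This is one concrete reason ReLU-family activations dominate in hidden layers of deep networks.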

Loss Functions and Optimizers

1. Loss Functions

Loss functions measure the difference between model predictions and true values, serving as the objective function for model training. Common loss functions include:

import numpy as np

# Mean Squared Error (MSE) - regression problems
def mean_squared_error(y_true, y_pred):
    return np.mean(np.square(y_true - y_pred))

# Mean Absolute Error (MAE) - regression problems
def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# Binary cross-entropy - binary classification problems
def binary_cross_entropy(y_true, y_pred):
    # Avoid log(0) cases
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Categorical cross-entropy - multi-class classification problems
def categorical_cross_entropy(y_true, y_pred):
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Examples
if __name__ == "__main__":
    # Regression example
    y_true_reg = np.array([1.0, 2.0, 3.0])
    y_pred_reg = np.array([0.9, 1.8, 3.1])
    print(f"MSE: {mean_squared_error(y_true_reg, y_pred_reg):.4f}")
    print(f"MAE: {mean_absolute_error(y_true_reg, y_pred_reg):.4f}")

    # Binary classification example
    y_true_bin = np.array([1, 0, 1])
    y_pred_bin = np.array([0.9, 0.2, 0.8])
    print(f"Binary Cross Entropy: {binary_cross_entropy(y_true_bin, y_pred_bin):.4f}")

    # Multi-class classification example
    y_true_cat = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    y_pred_cat = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
    print(f"Categorical Cross Entropy: {categorical_cross_entropy(y_true_cat, y_pred_cat):.4f}")

2. Optimizers

Optimizers are used to minimize loss functions and update model parameters. Common optimizers include:

import numpy as np

class SGD:
    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def update(self, params, gradients):
        for param_name, param in params.items():
            params[param_name] -= self.learning_rate * gradients[param_name]
        return params

class Momentum:
    def __init__(self, learning_rate=0.01, momentum=0.9):
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.velocities = {}

    def update(self, params, gradients):
        if not self.velocities:
            for param_name, param in params.items():
                self.velocities[param_name] = np.zeros_like(param)
        for param_name, param in params.items():
            self.velocities[param_name] = (self.momentum * self.velocities[param_name]
                                           + self.learning_rate * gradients[param_name])
            params[param_name] -= self.velocities[param_name]
        return params

class Adam:
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.m = {}
        self.v = {}
        self.t = 0

    def update(self, params, gradients):
        self.t += 1
        if not self.m:
            for param_name, param in params.items():
                self.m[param_name] = np.zeros_like(param)
                self.v[param_name] = np.zeros_like(param)
        for param_name, param in params.items():
            # Update first moment estimate
            self.m[param_name] = self.beta1 * self.m[param_name] + (1 - self.beta1) * gradients[param_name]
            # Update second moment estimate
            self.v[param_name] = self.beta2 * self.v[param_name] + (1 - self.beta2) * np.square(gradients[param_name])
            # Bias correction
            m_hat = self.m[param_name] / (1 - np.power(self.beta1, self.t))
            v_hat = self.v[param_name] / (1 - np.power(self.beta2, self.t))
            # Update parameters
            params[param_name] -= self.learning_rate * m_hat / (np.sqrt(v_hat) + self.epsilon)
        return params

# Example
if __name__ == "__main__":
    # Simple parameter and gradient example
    params = {
        'weights': np.array([0.5, -0.5]),
        'bias': np.array([0.0])
    }
    gradients = {
        'weights': np.array([0.1, -0.1]),
        'bias': np.array([0.05])
    }

    # Test different optimizers.
    # Note: copy each array, not just the dict -- the updates modify the
    # arrays in place, so a shallow params.copy() would let each optimizer
    # overwrite the shared initial values.
    print("Initial params:", params)

    sgd = SGD(learning_rate=0.1)
    sgd_params = sgd.update({k: v.copy() for k, v in params.items()}, gradients)
    print("SGD updated params:", sgd_params)

    momentum = Momentum(learning_rate=0.1)
    momentum_params = momentum.update({k: v.copy() for k, v in params.items()}, gradients)
    print("Momentum updated params:", momentum_params)

    adam = Adam(learning_rate=0.1)
    adam_params = adam.update({k: v.copy() for k, v in params.items()}, gradients)
    print("Adam updated params:", adam_params)

Building Neural Networks with TensorFlow/Keras

TensorFlow and Keras are powerful tools for building and training deep learning models. Keras is a high-level API for TensorFlow, providing a concise interface for defining and training neural networks.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Data preprocessing
x_train = x_train.reshape(-1, 28*28) / 255.0
x_test = x_test.reshape(-1, 28*28) / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build model
model = Sequential([
    Dense(128, activation='relu', input_shape=(28*28,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Print model summary
model.summary()

# Train model
history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate model
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# Save model
model.save('mnist_model.h5')

# Load model
loaded_model = load_model('mnist_model.h5')

# Use loaded model for prediction
predictions = loaded_model.predict(x_test[:5])
print("\nPredictions for first 5 test images:")
print(np.argmax(predictions, axis=1))
print("Actual labels:")
print(np.argmax(y_test[:5], axis=1))

Practical Case: Image Classification

In this practical case, we'll use TensorFlow and Keras to build a Convolutional Neural Network (CNN) for image classification on the CIFAR-10 dataset.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load and preprocess data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Data preprocessing
x_train = x_train / 255.0
x_test = x_test / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Visualize some training images
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i])
    plt.xlabel(class_names[np.argmax(y_train[i])])
plt.tight_layout()
plt.show()

# Build CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Print model summary
model.summary()

# Train model
history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2)

# Evaluate model
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# Plot training and validation accuracy
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

# Plot training and validation loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

# Save model
model.save('cifar10_cnn_model.h5')

Interactive Exercises

Exercise 1: Neural Network Implementation

Implement a simple three-layer neural network using NumPy for binary classification.

  1. Implement a neural network with input layer, hidden layer, and output layer
  2. Use sigmoid as the activation function
  3. Implement forward propagation and backpropagation
  4. Use gradient descent optimizer
  5. Train and test the model on a synthetic dataset
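One possible starting point for the steps above (a minimal sketch, not a reference solution: the 2-4-1 architecture, the synthetic two-blob dataset, and the hyperparameters are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 5's synthetic dataset: two Gaussian blobs, labels 0 and 1
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y = np.vstack([np.zeros((100, 1)), np.ones((100, 1))])

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# A three-layer network: 2 inputs, 4 hidden units, 1 output
W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros((1, 1))

lr = 0.5
for epoch in range(2000):
    # Forward propagation
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backpropagation (MSE loss; sigmoid derivative is a * (1 - a))
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent updates, averaged over the batch
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);  b1 -= lr * d_h.mean(axis=0)

accuracy = ((out > 0.5) == (y == 1)).mean()
print(f"Training accuracy: {accuracy:.2f}")
```

For a real attempt, also hold out a test split and track the loss per epoch, as in the NeuralNetwork class earlier in this tutorial.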

Exercise 2: Building Models with Keras

Build a neural network using TensorFlow and Keras to predict Boston housing prices.

  1. Load the Boston housing dataset
  2. Standardize the data
  3. Build a neural network with multiple hidden layers
  4. Compile and train the model
  5. Evaluate model performance
  6. Try different optimizers and activation functions

Exercise 3: CNN Image Classification

Build a convolutional neural network using TensorFlow and Keras for classifying the Fashion MNIST dataset.

  1. Load the Fashion MNIST dataset
  2. Preprocess the data
  3. Build a CNN model
  4. Train the model and monitor performance
  5. Evaluate model performance on the test set
  6. Visualize model predictions

Recommended Tutorials

Unsupervised Learning

Explore clustering, dimensionality reduction, and anomaly detection algorithms


Model Evaluation

Learn model evaluation metrics and model selection methods to ensure reliability and generalization


Feature Engineering

Master feature engineering techniques to improve model performance and generalization
