Deep Learning Overview
Deep learning is a branch of machine learning that uses multi-layer neural networks, loosely inspired by the structure of the brain, to learn from data. Deep learning models learn feature representations automatically from raw data, without manual feature engineering, and have achieved significant results in image recognition, natural language processing, speech recognition, and other fields.
Key characteristics of deep learning include:
- Multi-layer structure: By stacking multiple neural network layers, models can learn multi-level feature representations
- Automatic feature extraction: Automatically learning useful features from raw data
- Strong expressive power: Capable of modeling complex non-linear relationships
- Requires large amounts of data: Typically needs large labeled datasets for training
- Computationally intensive: Training process requires powerful computing resources
Neural Network Fundamentals
1. Perceptron
The perceptron is the basic building block of neural networks. It receives multiple inputs, computes a weighted sum, and then outputs the result through an activation function.
import numpy as np

class Perceptron:
    def __init__(self, input_size, learning_rate=0.01):
        self.weights = np.zeros(input_size + 1)  # +1 for bias
        self.learning_rate = learning_rate

    def predict(self, inputs):
        # Calculate weighted sum (including bias)
        summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
        # Apply step function
        return 1 if summation > 0 else 0

    def train(self, training_inputs, labels, epochs):
        for epoch in range(epochs):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                # Update weights
                self.weights[1:] += self.learning_rate * (label - prediction) * inputs
                self.weights[0] += self.learning_rate * (label - prediction)

# Test the perceptron
if __name__ == "__main__":
    # Training data (logical AND)
    training_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    labels = np.array([0, 0, 0, 1])
    perceptron = Perceptron(2)
    perceptron.train(training_inputs, labels, 10)
    # Test
    test_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    for inputs in test_inputs:
        print(f"Input: {inputs}, Prediction: {perceptron.predict(inputs)}")
2. Multi-layer Neural Network
A multi-layer neural network (also known as a feedforward neural network) consists of an input layer, hidden layers, and an output layer. The presence of hidden layers allows neural networks to learn complex non-linear relationships.
import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights
        self.weights1 = np.random.randn(input_size, hidden_size)
        self.bias1 = np.zeros((1, hidden_size))
        self.weights2 = np.random.randn(hidden_size, output_size)
        self.bias2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # Note: x is expected to already be a sigmoid output,
        # so sigma'(z) = sigma(z) * (1 - sigma(z)) = x * (1 - x)
        return x * (1 - x)

    def forward(self, inputs):
        # Forward propagation
        self.layer1 = self.sigmoid(np.dot(inputs, self.weights1) + self.bias1)
        self.output = self.sigmoid(np.dot(self.layer1, self.weights2) + self.bias2)
        return self.output

    def backward(self, inputs, labels, learning_rate):
        # Output layer error and delta
        output_error = labels - self.output
        output_delta = output_error * self.sigmoid_derivative(self.output)
        # Hidden layer error and delta
        layer1_error = output_delta.dot(self.weights2.T)
        layer1_delta = layer1_error * self.sigmoid_derivative(self.layer1)
        # Update weights and biases
        self.weights2 += self.layer1.T.dot(output_delta) * learning_rate
        self.bias2 += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights1 += inputs.T.dot(layer1_delta) * learning_rate
        self.bias1 += np.sum(layer1_delta, axis=0, keepdims=True) * learning_rate

    def train(self, inputs, labels, epochs, learning_rate):
        for epoch in range(epochs):
            self.forward(inputs)
            self.backward(inputs, labels, learning_rate)
            if epoch % 1000 == 0:
                loss = np.mean(np.square(labels - self.output))
                print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Test the neural network
if __name__ == "__main__":
    # Training data (logical XOR)
    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    labels = np.array([[0], [1], [1], [0]])
    nn = NeuralNetwork(2, 4, 1)
    nn.train(inputs, labels, 10000, 0.1)
    # Test
    print("\nTest Results:")
    for i, input_data in enumerate(inputs):
        prediction = nn.forward(input_data.reshape(1, -1))
        print(f"Input: {input_data}, Prediction: {prediction[0][0]:.4f}, Target: {labels[i][0]}")
Activation Functions
1. Common Activation Functions
Activation functions introduce non-linearity to neural networks, enabling models to learn complex function relationships. Common activation functions include:
import numpy as np
import matplotlib.pyplot as plt

# Define activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

def softmax(x):
    # Subtract the max for numerical stability
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

# Generate data
x = np.linspace(-10, 10, 100)

# Plot activation functions
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].plot(x, sigmoid(x))
axes[0, 0].set_title('Sigmoid')
axes[0, 0].grid(True)

axes[0, 1].plot(x, tanh(x))
axes[0, 1].set_title('Tanh')
axes[0, 1].grid(True)

axes[0, 2].plot(x, relu(x))
axes[0, 2].set_title('ReLU')
axes[0, 2].grid(True)

axes[1, 0].plot(x, leaky_relu(x))
axes[1, 0].set_title('Leaky ReLU')
axes[1, 0].grid(True)

# Softmax example
x_softmax = np.array([-1, 0, 1, 2])
y_softmax = softmax(x_softmax)
axes[1, 1].bar(range(len(x_softmax)), y_softmax)
axes[1, 1].set_title('Softmax')
axes[1, 1].grid(True)

# Hide the unused subplot
axes[1, 2].axis('off')

plt.tight_layout()
plt.show()
2. Choosing Activation Functions
Different activation functions are suitable for different scenarios:
- Sigmoid: Suitable for the output layer of binary classification problems, but prone to vanishing gradients when used in hidden layers
- Tanh: Zero-centered and with somewhat better gradient properties than sigmoid, but still susceptible to vanishing gradients
- ReLU: The most common choice for hidden layers; computationally efficient and mitigates vanishing gradients, but neurons can "die" (get stuck outputting zero)
- Leaky ReLU: Addresses the dying-neuron problem of ReLU by allowing a small gradient for negative inputs
- Softmax: Suitable for the output layer of multi-class classification problems; converts raw outputs into a probability distribution
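The vanishing-gradient behavior described above can be checked numerically: the sigmoid derivative peaks at 0.25 and decays toward zero as |x| grows, while the ReLU gradient stays at exactly 1 for any positive input. A minimal sketch:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

xs = np.array([0.0, 2.0, 5.0, 10.0])
for x, sg, rg in zip(xs, sigmoid_grad(xs), relu_grad(xs)):
    print(f"x={x:5.1f}  sigmoid'={sg:.6f}  relu'={rg:.1f}")
```

Stacking many sigmoid layers multiplies these small derivatives together, which is why deep sigmoid networks train slowly while ReLU networks do not suffer the same shrinkage on the positive side.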
Loss Functions and Optimizers
1. Loss Functions
Loss functions measure the difference between model predictions and true values, serving as the objective function for model training. Common loss functions include:
import numpy as np

# Mean Squared Error (MSE) - regression problems
def mean_squared_error(y_true, y_pred):
    return np.mean(np.square(y_true - y_pred))

# Mean Absolute Error (MAE) - regression problems
def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# Binary cross-entropy - binary classification problems
def binary_cross_entropy(y_true, y_pred):
    # Clip predictions to avoid log(0)
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Categorical cross-entropy - multi-class classification problems
def categorical_cross_entropy(y_true, y_pred):
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Examples
if __name__ == "__main__":
    # Regression example
    y_true_reg = np.array([1.0, 2.0, 3.0])
    y_pred_reg = np.array([0.9, 1.8, 3.1])
    print(f"MSE: {mean_squared_error(y_true_reg, y_pred_reg):.4f}")
    print(f"MAE: {mean_absolute_error(y_true_reg, y_pred_reg):.4f}")

    # Binary classification example
    y_true_bin = np.array([1, 0, 1])
    y_pred_bin = np.array([0.9, 0.2, 0.8])
    print(f"Binary Cross Entropy: {binary_cross_entropy(y_true_bin, y_pred_bin):.4f}")

    # Multi-class classification example
    y_true_cat = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    y_pred_cat = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
    print(f"Categorical Cross Entropy: {categorical_cross_entropy(y_true_cat, y_pred_cat):.4f}")
2. Optimizers
Optimizers update model parameters to minimize the loss function. Common optimizers include:
import numpy as np

class SGD:
    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def update(self, params, gradients):
        for param_name, param in params.items():
            params[param_name] -= self.learning_rate * gradients[param_name]
        return params

class Momentum:
    def __init__(self, learning_rate=0.01, momentum=0.9):
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.velocities = {}

    def update(self, params, gradients):
        if not self.velocities:
            for param_name, param in params.items():
                self.velocities[param_name] = np.zeros_like(param)
        for param_name, param in params.items():
            self.velocities[param_name] = (self.momentum * self.velocities[param_name]
                                           + self.learning_rate * gradients[param_name])
            params[param_name] -= self.velocities[param_name]
        return params

class Adam:
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.m = {}
        self.v = {}
        self.t = 0

    def update(self, params, gradients):
        self.t += 1
        if not self.m:
            for param_name, param in params.items():
                self.m[param_name] = np.zeros_like(param)
                self.v[param_name] = np.zeros_like(param)
        for param_name, param in params.items():
            # Update first moment estimate
            self.m[param_name] = self.beta1 * self.m[param_name] + (1 - self.beta1) * gradients[param_name]
            # Update second moment estimate
            self.v[param_name] = self.beta2 * self.v[param_name] + (1 - self.beta2) * np.square(gradients[param_name])
            # Bias correction
            m_hat = self.m[param_name] / (1 - np.power(self.beta1, self.t))
            v_hat = self.v[param_name] / (1 - np.power(self.beta2, self.t))
            # Update parameters
            params[param_name] -= self.learning_rate * m_hat / (np.sqrt(v_hat) + self.epsilon)
        return params

# Example
if __name__ == "__main__":
    # Simple parameter and gradient example
    params = {
        'weights': np.array([0.5, -0.5]),
        'bias': np.array([0.0])
    }
    gradients = {
        'weights': np.array([0.1, -0.1]),
        'bias': np.array([0.05])
    }

    # Test different optimizers
    print("Initial params:", params)

    sgd = SGD(learning_rate=0.1)
    sgd_params = sgd.update(params.copy(), gradients)
    print("SGD updated params:", sgd_params)

    momentum = Momentum(learning_rate=0.1)
    momentum_params = momentum.update(params.copy(), gradients)
    print("Momentum updated params:", momentum_params)

    adam = Adam(learning_rate=0.1)
    adam_params = adam.update(params.copy(), gradients)
    print("Adam updated params:", adam_params)
Building Neural Networks with TensorFlow/Keras
TensorFlow and Keras are powerful tools for building and training deep learning models. Keras is a high-level API for TensorFlow, providing a concise interface for defining and training neural networks.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten the images and scale pixel values to [0, 1]
x_train = x_train.reshape(-1, 28*28) / 255.0
x_test = x_test.reshape(-1, 28*28) / 255.0
# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build model
model = Sequential([
    Dense(128, activation='relu', input_shape=(28*28,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Print model summary
model.summary()

# Train model
history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate model
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# Save model (legacy HDF5 format; recent Keras versions also offer the native .keras format)
model.save('mnist_model.h5')

# Load model
loaded_model = load_model('mnist_model.h5')

# Use loaded model for prediction
predictions = loaded_model.predict(x_test[:5])
print("\nPredictions for first 5 test images:")
print(np.argmax(predictions, axis=1))
print("Actual labels:")
print(np.argmax(y_test[:5], axis=1))
Practical Case: Image Classification
In this practical case, we'll use TensorFlow and Keras to build a Convolutional Neural Network (CNN) for image classification on the CIFAR-10 dataset.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load and preprocess data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Scale pixel values to [0, 1] and one-hot encode the labels
x_train = x_train / 255.0
x_test = x_test / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Visualize some training images
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i])
    plt.xlabel(class_names[np.argmax(y_train[i])])
plt.tight_layout()
plt.show()

# Build CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Print model summary
model.summary()

# Train model
history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2)

# Evaluate model
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# Plot training and validation accuracy
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

# Plot training and validation loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

# Save model
model.save('cifar10_cnn_model.h5')
Interactive Exercises
Exercise 1: Neural Network Implementation
Implement a simple three-layer neural network using NumPy for binary classification.
- Implement a neural network with input layer, hidden layer, and output layer
- Use sigmoid as the activation function
- Implement forward propagation and backpropagation
- Use gradient descent optimizer
- Train and test the model on a synthetic dataset
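One possible starting point for this exercise: a 2-4-1 network trained with full-batch gradient descent on a synthetic two-blob dataset. All sizes, seeds, and hyperparameters here are illustrative choices, not part of the exercise specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary dataset: two well-separated Gaussian blobs
n = 100
x0 = rng.normal(loc=-1.0, scale=0.5, size=(n, 2))
x1 = rng.normal(loc=1.0, scale=0.5, size=(n, 2))
X = np.vstack([x0, x1])
y = np.vstack([np.zeros((n, 1)), np.ones((n, 1))])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Parameters of a 2-4-1 network
W1 = rng.normal(size=(2, 4)) * 0.5
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)) * 0.5
b2 = np.zeros((1, 1))

lr = 0.5
for epoch in range(2000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass (binary cross-entropy with a sigmoid output
    # simplifies the output-layer delta to out - y)
    d_out = (out - y) / len(X)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0, keepdims=True)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
accuracy = (pred == y).mean()
print(f"Training accuracy: {accuracy:.2f}")
```

From here, the exercise asks you to split off a held-out test set and compare activation functions and learning rates.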
Exercise 2: Building Models with Keras
Build a neural network using TensorFlow and Keras to predict Boston housing prices.
- Load the Boston housing dataset
- Standardize the data
- Build a neural network with multiple hidden layers
- Compile and train the model
- Evaluate model performance
- Try different optimizers and activation functions
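A sketch of this workflow, using synthetic data with the same 13-feature shape as the housing set so the snippet stays self-contained; for the actual exercise, replace the synthetic arrays with `tensorflow.keras.datasets.boston_housing.load_data()`. Layer sizes and epoch counts are illustrative.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Synthetic stand-in for the housing data (13 features, linear target plus noise)
rng = np.random.default_rng(0)
x_train = rng.normal(size=(400, 13))
true_w = rng.normal(size=(13, 1))
y_train = x_train @ true_w + rng.normal(scale=0.1, size=(400, 1))

# Standardize features (fit the statistics on the training set only)
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
x_train = (x_train - mean) / std

# Regression network: linear output, MSE loss
model = Sequential([
    Dense(64, activation='relu', input_shape=(13,)),
    Dense(64, activation='relu'),
    Dense(1)  # linear output for regression
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
history = model.fit(x_train, y_train, epochs=20, batch_size=32, verbose=0)
print(f"Final training MSE: {history.history['loss'][-1]:.4f}")
```

When switching to the real dataset, remember to apply the training-set mean and standard deviation to the test features as well.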
Exercise 3: CNN Image Classification
Build a convolutional neural network using TensorFlow and Keras for classifying the Fashion MNIST dataset.
- Load the Fashion MNIST dataset
- Preprocess the data
- Build a CNN model
- Train the model and monitor performance
- Evaluate model performance on the test set
- Visualize model predictions
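A starting point for the model-building step: the CIFAR-10 architecture above adapted to the 28x28 grayscale Fashion MNIST inputs, with a forward-pass sanity check on random data. The layer sizes are illustrative; loading the real dataset (via `tensorflow.keras.datasets.fashion_mnist`) and training on it is left as part of the exercise.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# CNN adapted to 28x28 single-channel images, 10 clothing classes
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Sanity-check the forward pass on random data before training on the real dataset
dummy = np.random.rand(4, 28, 28, 1).astype('float32')
probs = model.predict(dummy, verbose=0)
print(probs.shape)  # (4, 10)
```

Checking that the untrained model accepts the expected input shape and emits one probability row per image catches shape mistakes early, before any training time is spent.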