Implement the XOR Problem using a Multi-Layer Perceptron

XOR Problem:

Below is a clear, step-by-step implementation of the XOR problem using a Multi-Layer Perceptron (MLP). I will first explain the theory, then show the mathematical steps, and finally provide a simple implementation.

Step 1: Understand the XOR Problem

The XOR (Exclusive OR) function outputs:

x₁   x₂   XOR
0    0    0
0    1    1
1    0    1
1    1    0

Key observation:
XOR is not linearly separable, so it cannot be solved by a single-layer perceptron.
Hence, we need a Multi-Layer Perceptron with at least one hidden layer.
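
As a quick check (not part of the original derivation), the truth table above can be reproduced with Python's built-in bitwise XOR operator:

# Reproduce the XOR truth table with Python's bitwise XOR operator
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, x1 ^ x2)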

Step 2: Network Architecture

We choose a 2–2–1 MLP architecture:

  • Input layer: 2 neurons (x₁, x₂)
  • Hidden layer: 2 neurons
  • Output layer: 1 neuron

Activation function:

  • Hidden layer → Sigmoid
  • Output layer → Sigmoid

Step 3: Initialize Parameters

Let:

  • Weights from input to hidden layer → W₁ (shape 2×2)
  • Bias for hidden layer → b₁ (shape 1×2)
  • Weights from hidden to output layer → W₂ (shape 2×1)
  • Bias for output layer → b₂ (shape 1×1)

Initialize weights and biases with small random values.
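
A minimal sketch of this initialization; the same names and shapes (W1, b1, W2, b2) are used in the complete program at the end of this post:

import numpy as np

np.random.seed(0)              # fixed seed so runs are reproducible
W1 = np.random.randn(2, 2)     # input -> hidden weights, shape (2, 2)
b1 = np.zeros((1, 2))          # hidden-layer bias, shape (1, 2)
W2 = np.random.randn(2, 1)     # hidden -> output weights, shape (2, 1)
b2 = np.zeros((1, 1))          # output-layer bias, shape (1, 1)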

Step 4: Forward Propagation

4.1 Hidden Layer Computation

z₁ = X·W₁ + b₁
a₁ = σ(z₁)

4.2 Output Layer Computation

z₂ = a₁·W₂ + b₂
ŷ = σ(z₂)

Where the sigmoid function is:

σ(x) = 1 / (1 + e⁻ˣ)
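
Continuing from the parameters above, a short NumPy sketch of this forward pass (the same lines appear inside the training loop of the full program):

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # the four XOR inputs

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z1 = np.dot(X, W1) + b1    # hidden pre-activation, shape (4, 2)
a1 = sigmoid(z1)           # hidden activation
z2 = np.dot(a1, W2) + b2   # output pre-activation, shape (4, 1)
y_pred = sigmoid(z2)       # network output, one value per input row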


Step 5: Loss Function

Use Mean Squared Error (MSE):

L = (1/N) Σᵢ (yᵢ − ŷᵢ)²

where N is the number of training samples (N = 4 for the XOR dataset).
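
Continuing the sketch, with y holding the XOR targets, the loss is a one-liner in NumPy:

y = np.array([[0], [1], [1], [0]])    # XOR targets
loss = np.mean((y - y_pred) ** 2)     # mean squared error over the 4 samples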


Step 6: Backpropagation

6.1 Output Layer Error

δ_output = (ŷ − y) ⊙ σ′(z₂)

6.2 Hidden Layer Error

δ_hidden = (δ_output · W₂ᵀ) ⊙ σ′(z₁)

Where σ′ is the derivative of the sigmoid function:

σ′(x) = σ(x) · (1 − σ(x))

and ⊙ denotes element-wise multiplication.
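
The same two error terms in NumPy. Note that sigmoid_derivative is written to take the already-activated values (a1, y_pred), which is why it is simply s * (1 - s):

def sigmoid_derivative(s):
    # s is assumed to be a sigmoid output, i.e. s = sigmoid(x)
    return s * (1 - s)

delta_output = (y_pred - y) * sigmoid_derivative(y_pred)            # error at the output layer
delta_hidden = np.dot(delta_output, W2.T) * sigmoid_derivative(a1)  # error pushed back to the hidden layer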


Step 7: Update Weights and Biases

Using Gradient Descent, every weight and bias is moved against its gradient:

W₂ ← W₂ − η · a₁ᵀ δ_output
b₂ ← b₂ − η · Σ δ_output

W₁ ← W₁ − η · Xᵀ δ_hidden
b₁ ← b₁ − η · Σ δ_hidden

Where η is the learning rate.
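
In NumPy, the corresponding updates (identical to the lines inside the training loop below), with learning_rate standing in for η:

learning_rate = 0.1

W2 -= learning_rate * np.dot(a1.T, delta_output)
b2 -= learning_rate * np.sum(delta_output, axis=0, keepdims=True)
W1 -= learning_rate * np.dot(X.T, delta_hidden)
b1 -= learning_rate * np.sum(delta_hidden, axis=0, keepdims=True)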

Step 8: Algorithm (Step-by-Step)

  1. Initialize weights and biases
  2. Perform forward propagation
  3. Compute loss
  4. Perform backpropagation
  5. Update weights and biases
  6. Repeat for multiple epochs

Step 9: Complete Python Implementation

The full NumPy program below puts all of the above steps together, trains the network, and plots the training loss:

import numpy as np
import matplotlib.pyplot as plt

# -------------------------------
# Step 1: XOR Dataset
# -------------------------------
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

y = np.array([[0],
              [1],
              [1],
              [0]])

# -------------------------------
# Step 2: Initialize Parameters
# -------------------------------
np.random.seed(0)

W1 = np.random.randn(2, 2)
b1 = np.zeros((1, 2))

W2 = np.random.randn(2, 1)
b2 = np.zeros((1, 1))

# -------------------------------
# Step 3: Activation Functions
# -------------------------------
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is expected to already be a sigmoid output (a1 or y_pred),
    # so s * (1 - s) equals the derivative of the sigmoid at the pre-activation
    return s * (1 - s)

# -------------------------------
# Step 4: Training Parameters
# -------------------------------
learning_rate = 0.1
epochs = 10000
losses = []

# -------------------------------
# Step 5: Training Loop
# -------------------------------
for epoch in range(epochs):

    # Forward Propagation
    z1 = np.dot(X, W1) + b1
    a1 = sigmoid(z1)

    z2 = np.dot(a1, W2) + b2
    y_pred = sigmoid(z2)

    # Mean Squared Error Loss
    loss = np.mean((y - y_pred) ** 2)
    losses.append(loss)

    # Backpropagation
    delta_output = (y_pred - y) * sigmoid_derivative(y_pred)
    delta_hidden = np.dot(delta_output, W2.T) * sigmoid_derivative(a1)

    # Update Weights and Biases
    W2 -= learning_rate * np.dot(a1.T, delta_output)
    b2 -= learning_rate * np.sum(delta_output, axis=0, keepdims=True)

    W1 -= learning_rate * np.dot(X.T, delta_hidden)
    b1 -= learning_rate * np.sum(delta_hidden, axis=0, keepdims=True)

# -------------------------------
# Step 6: Plot Training Loss Graph
# -------------------------------
plt.figure()
plt.plot(losses)
plt.xlabel("Epochs")
plt.ylabel("Mean Squared Error")
plt.title("Training Loss Curve for XOR using MLP")
plt.show()

# -------------------------------
# Step 7: Final Prediction
# -------------------------------
print("Final Predicted Output:")
print(np.round(y_pred))
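
To query the trained network on individual inputs, a small helper can be appended to the script. The predict function below is a hypothetical addition for illustration, not part of the original program:

def predict(x):
    # One forward pass through the trained 2-2-1 network (helper added for illustration)
    a1 = sigmoid(np.dot(x, W1) + b1)
    return sigmoid(np.dot(a1, W2) + b2)

for sample in X:
    p = predict(sample.reshape(1, -1))
    print(sample, "->", int(np.round(p[0, 0])))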



