Linear regression is one of the most basic and widely used statistical techniques in machine learning. It models the relationship between a dependent variable (target) and one or more independent variables (features). The objective is to find the linear equation that best fits the data, which can then be used to predict the dependent variable from the values of the independent variables.
Simple Linear Regression
In simple linear regression there is one independent variable (feature) and a single dependent variable (target), and their relationship is depicted by a straight line.
It involves one independent variable and is written as:
y = β₀ + β₁x + ε
where:
- y is the dependent variable (target).
- x is the independent variable (feature).
- β₀ is the y-intercept (constant term).
- β₁ is the slope of the line (coefficient for the feature).
- ε is the error term (residual).
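As a quick illustration, the equation can be evaluated directly once the parameters are known. The coefficients below (β₀ = 0.4, β₁ = 0.8) are hypothetical values chosen only to show the computation, not fitted from data:
# Hypothetical coefficients for illustration only (not fitted from data)
beta_0 = 0.4                  # y-intercept
beta_1 = 0.8                  # slope
x = 3.0                       # value of the independent variable
y_hat = beta_0 + beta_1 * x   # predicted value: 0.4 + 0.8 * 3.0 = 2.8
print(y_hat)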
Multiple Linear Regression
In multiple linear regression, there are numerous independent variables (features). A hyperplane in higher dimensions represents the relationship between the dependent variable and the independent variables.
It involves multiple independent variables and is written as:
y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε
where x₁, x₂, …, xₙ are the independent variables (features) and β₁, β₂, …, βₙ are their coefficients.
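For intuition, a prediction under this model is simply the intercept plus a weighted sum of the features. The coefficients and feature values below are hypothetical, purely to illustrate the computation:
import numpy as np

beta_0 = 1.0                        # hypothetical intercept
betas = np.array([0.5, -0.2, 0.3])  # hypothetical coefficients for three features
x = np.array([2.0, 1.0, 4.0])       # one observation with three features
y_hat = beta_0 + np.dot(betas, x)   # 1.0 + (1.0 - 0.2 + 1.2) = 3.0
print(y_hat)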
Working Process of Linear Regression
Hypothesis Representation:
- The relationship between the dependent and independent variables is represented as a linear equation. The objective is to find the best-fitting line (in simple linear regression) or hyperplane (in multiple linear regression).
Cost Function (Mean Squared Error):
- The cost function measures the difference between the predicted and actual values. The most common cost function for linear regression is Mean Squared Error (MSE):
MSE = (1/n) Σᵢ (ŷᵢ − yᵢ)²
where ŷᵢ is the predicted value, yᵢ is the actual value, and n is the number of samples.
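A minimal sketch of computing MSE with NumPy, using small made-up arrays for the actual and predicted values:
import numpy as np

y_true = np.array([1.0, 3.0, 2.0, 3.0, 5.0])  # actual values
y_pred = np.array([1.4, 2.2, 3.0, 3.8, 4.6])  # predicted values (made up)
mse = np.mean((y_pred - y_true) ** 2)         # average of the squared errors
print(mse)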
Optimization (Gradient Descent):
- The goal is to minimize the cost function by adjusting the parameters β₀ and β₁. Gradient Descent is a common optimization technique used to find the minimum of the cost function:
βⱼ := βⱼ − α · ∂J/∂βⱼ
where α is the learning rate and ∂J/∂βⱼ is the partial derivative of the cost function with respect to βⱼ.
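The update rule can be sketched for simple linear regression as follows; the learning rate and the number of iterations are arbitrary choices made only for illustration:
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 2.0, 3.0, 5.0])

b0, b1 = 0.0, 0.0          # initial parameters
alpha = 0.01               # learning rate (arbitrary)
n = len(X)

for _ in range(2000):      # fixed number of iterations (arbitrary)
    y_hat = b0 + b1 * X
    error = y_hat - y
    # Partial derivatives of the MSE cost with respect to b0 and b1
    grad_b0 = (2 / n) * np.sum(error)
    grad_b1 = (2 / n) * np.sum(error * X)
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print(b0, b1)              # approaches the least-squares solution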
Model Evaluation:
- The performance of the linear regression model can be evaluated using metrics such as R-squared (R²), Mean Absolute Error (MAE), or Root Mean Squared Error (RMSE).
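These metrics are available in scikit-learn; a short sketch using made-up arrays of actual and predicted values:
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

y_true = np.array([1.0, 3.0, 2.0, 3.0, 5.0])
y_pred = np.array([1.4, 2.2, 3.0, 3.8, 4.6])

print("R-squared:", r2_score(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))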
Implementation of Simple Linear Regression:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 3, 2, 3, 5])
# Creating and training the model
model = LinearRegression()
model.fit(X, y)
# Predicting
y_pred = model.predict(X)
# Plotting the data points and the regression line
plt.figure(figsize=(8, 6))
plt.scatter(X, y, color='blue', label='Data Points', s=100) # Larger dots for clarity
plt.plot(X, y_pred, color='red', linewidth=2, label='Regression Line') # Thicker line for better visibility
plt.xlabel('X', fontsize=14)
plt.ylabel('y', fontsize=14)
plt.title('Simple Linear Regression', fontsize=16)
plt.legend()
plt.grid(True) # Adding grid for better reference
plt.show()
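After fitting, the learned intercept and slope can be inspected directly on the model. This snippet continues the example above, reusing the fitted model object:
# Inspect the fitted parameters of the simple model above
print("Intercept (b0):", model.intercept_)
print("Slope (b1):", model.coef_[0])
# Predict for a new input value
print("Prediction for X = 6:", model.predict([[6]])[0])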
Implementation of Multiple Linear Regression:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.linear_model import LinearRegression
# Sample data
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([2, 3, 4, 5, 6])
# Creating and training the model
model = LinearRegression()
model.fit(X, y)
# Predicting
y_pred = model.predict(X)
# Plotting the data points and the regression plane
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], y, color='blue', label='Data Points')
# Creating the plane
x_surf, y_surf = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
z_surf = model.predict(np.c_[x_surf.ravel(), y_surf.ravel()]).reshape(x_surf.shape)
# Plotting the plane
ax.plot_surface(x_surf, y_surf, z_surf, color='red', alpha=0.5)
ax.set_xlabel('X1')
ax.set_ylabel('X2')
ax.set_zlabel('y')
ax.set_title('Multiple Linear Regression')
ax.legend()  # Show the 'Data Points' label
plt.show()