
Polynomial Regression

Polynomial Regression is an extension of Linear Regression that models the relationship between the input features and the target variable as an n-th degree polynomial. Unlike linear regression, which fits a straight line, polynomial regression can fit a curve to the data.

1. What is Polynomial Regression?

In Polynomial Regression, the relationship between the input variable x and the output variable y is modeled as an n-th degree polynomial. The general form of a polynomial regression model is:

\hat{y} = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \dots + \theta_n x^n

Where:

  • \hat{y} is the predicted value.
  • x is the input feature.
  • \theta_0, \theta_1, \theta_2, \dots, \theta_n are the parameters (coefficients) of the model.
  • n is the degree of the polynomial.
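
As a quick illustration of this equation, the sketch below evaluates a degree-3 polynomial in NumPy. The coefficient values are hypothetical, chosen only to show how the terms combine:

import numpy as np

# Hypothetical coefficients theta_0, theta_1, theta_2, theta_3 (illustration only)
theta = np.array([1.0, 0.5, -0.2, 0.03])

def predict(x, theta):
    # y_hat = theta_0 + theta_1*x + theta_2*x^2 + ... + theta_n*x^n
    powers = np.array([x ** i for i in range(len(theta))])
    return theta @ powers

print(predict(2.0, theta))  # prediction at x = 2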

2. Why Use Polynomial Regression?

Polynomial Regression is used when the data shows a nonlinear relationship between the input features and the target variable. If a linear model fails to capture the pattern in the data, a polynomial model can provide a better fit by capturing the curvature in the data.

  • Nonlinear Relationships: Polynomial regression is useful when the relationship between the input features and the target variable is nonlinear.
  • Flexibility: The degree of the polynomial can be adjusted to control the model's flexibility. Higher degrees can fit more complex patterns, but they also increase the risk of overfitting.
  • Better Fit: In cases where linear regression fails to capture the underlying trend in the data, polynomial regression can provide a better fit by modeling the curvature of the data.
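
To make this concrete, the sketch below fits a straight line and a degree-2 polynomial to the same synthetic curved data and compares their training errors. The data and exact numbers are illustrative only, but the quadratic model should show a much smaller error:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

# Synthetic curved data: y is roughly quadratic in x
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(0, 0.2, 30)

linear = LinearRegression().fit(X, y)
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("Linear MSE:   ", mean_squared_error(y, linear.predict(X)))
print("Quadratic MSE:", mean_squared_error(y, quadratic.predict(X)))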

3. Working Process of Polynomial Regression

Step 1: Data Transformation

To use polynomial regression, the input feature x is converted into polynomial features. For example, if x = [1, 2, 3] and we want to fit a cubic (3rd-degree) polynomial, the transformed features would be:

\text{Original } X: [1, 2, 3] \quad \text{Transformed } X: \left[\begin{array}{ccc} 1 & 1^2 & 1^3 \\ 2 & 2^2 & 2^3 \\ 3 & 3^2 & 3^3 \end{array}\right] = \left[\begin{array}{ccc} 1 & 1 & 1 \\ 2 & 4 & 8 \\ 3 & 9 & 27 \end{array}\right]
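
The same transformation can be generated with scikit-learn's PolynomialFeatures. A minimal sketch, using degree 3 and omitting the bias column so the output matches the matrix above:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1], [2], [3]])
poly = PolynomialFeatures(degree=3, include_bias=False)
print(poly.fit_transform(X))
# [[ 1.  1.  1.]
#  [ 2.  4.  8.]
#  [ 3.  9. 27.]]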

Step 2: Fit the Polynomial Model

After transforming the input features, you fit a linear regression model to the transformed features. The model learns the parameters \theta_0, \theta_1, \theta_2, \dots that minimize the error between the predicted and actual values.
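
A minimal sketch of this step, continuing the degree-3 idea above with hypothetical data generated from y = x^3 + 1. The learned intercept and coefficients play the roles of \theta_0, \theta_1, \theta_2, \theta_3:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 9, 28, 65, 126])  # y = x^3 + 1, for illustration

# Expand x into [x, x^2, x^3] and fit ordinary linear regression on those columns
X_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X)
reg = LinearRegression().fit(X_poly, y)

print("theta_0 (intercept):", reg.intercept_)
print("theta_1..theta_3:   ", reg.coef_)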

Step 3: Make Predictions

Using the fitted model, you can make predictions for new data by applying the learned parameters to the polynomial features of the new input data.
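
A self-contained sketch of this step, reusing the same hypothetical y = x^3 + 1 example. The key point is that new inputs must be expanded with the same polynomial transformer before calling predict:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 9, 28, 65, 126])  # y = x^3 + 1, for illustration

poly = PolynomialFeatures(degree=3, include_bias=False)
reg = LinearRegression().fit(poly.fit_transform(X), y)

# New inputs go through the same fitted transformer before prediction
X_new = np.array([[6], [7]])
print(reg.predict(poly.transform(X_new)))  # approximately [217, 344]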

Implementation of Polynomial Regression

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Sample data
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25, 36, 49, 64, 81])

# Transforming the data to include polynomial features
degree = 2  # You can change this to higher degrees to fit more complex curves
polynomial_features = PolynomialFeatures(degree=degree, include_bias=False)

# Create a pipeline that first transforms the data to polynomial features and then fits a linear model
model = make_pipeline(polynomial_features, LinearRegression())

# Fit the model
model.fit(X, y)

# Predicting
X_new = np.linspace(0, 10, 100).reshape(100, 1)
y_new = model.predict(X_new)

# Plotting the results
plt.figure(figsize=(8, 6))
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X_new, y_new, color='red', linewidth=2, label='Polynomial Regression Line')
plt.xlabel('X', fontsize=14)
plt.ylabel('y', fontsize=14)
plt.title(f'Polynomial Regression (degree={degree})', fontsize=16)
plt.legend()
plt.grid(True)
plt.show()

Advantages:

  1. Flexibility in Modeling Nonlinear Relationships:

    • Polynomial regression can model complex, nonlinear relationships between the input features and the target variable. This makes it a versatile tool for capturing trends that linear regression cannot.
  2. Easy to Implement:

    • Polynomial regression is relatively straightforward to implement, especially using tools like scikit-learn. It builds on the principles of linear regression, making it accessible for those familiar with basic regression techniques.
  3. Interpretability:

    • Although more complex than linear regression, polynomial regression still maintains a degree of interpretability, particularly for low-degree polynomials. You can understand the impact of each term in the polynomial equation.
  4. Good Fit for Small Datasets:

    • Polynomial regression can be effective for small datasets where the relationship between variables is inherently nonlinear. It can provide a better fit than linear regression when the dataset is small and well-behaved.

Disadvantages:

  1. Overfitting:

    • A major risk of polynomial regression is overfitting, especially when the degree of the polynomial is high. The model may fit the training data very well but fail to generalize to new, unseen data, leading to poor performance on test datasets (see the sketch after this list).
  2. Extrapolation Issues:

    • Polynomial regression models can behave unpredictably when making predictions outside the range of the training data. The curve can become extremely steep or oscillatory, leading to unrealistic predictions.
  3. Complexity with High-Degree Polynomials:

    • As the degree of the polynomial increases, the model becomes increasingly complex, and it becomes harder to interpret the relationship between the features and the target variable. High-degree polynomials also require more computational resources.
  4. Sensitive to Outliers:

    • Polynomial regression is sensitive to outliers. Since the model tries to minimize the error for all points, an outlier can significantly skew the polynomial curve, leading to a poor fit for the majority of the data.
  5. Multicollinearity:

    • When using polynomial regression with multiple features, there can be a high degree of multicollinearity (correlation between the polynomial terms). This can make the model unstable and difficult to interpret.
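
As a concrete illustration of the overfitting risk mentioned above, the sketch below fits a low-degree and a high-degree polynomial to the same noisy synthetic data and compares train and test errors. The exact numbers depend on the random seed, but the high-degree model typically shows a lower training error and a noticeably worse test error:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Noisy, roughly quadratic synthetic data (illustration only)
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(-3, 3, 40)).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(0, 1.0, 40)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (2, 15):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.2f}  test MSE={test_mse:.2f}")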





