
Logistic Regression

Logistic regression is a statistical method used for binary classification problems. It's particularly useful when you need to predict the probability of a binary outcome based on one or more predictor variables. Here's a breakdown:

What is Logistic Regression?

  • Purpose: It models the probability of a binary outcome (e.g., yes/no, success/failure) using a logistic function (sigmoid function).
  • Function: The logistic function maps predicted values (which are in a range from negative infinity to positive infinity) to a probability range between 0 and 1.
  • Formula: The model is typically expressed as P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}, where P(Y = 1 | X) is the probability of the outcome being 1 given predictor X, and \beta_0 and \beta_1 are coefficients estimated during model training.

When to Apply Logistic Regression

  • Binary Outcomes: Use logistic regression when your outcome variable is binary or categorical with two levels.
  • Predicting Probabilities: When you need to predict the probability of a certain event occurring.
  • Feature Types: It's suitable for continuous or categorical predictor variables.

1. Understanding Logistic Regression

Logistic regression is used for binary classification problems, where the goal is to predict the probability of a binary outcome (e.g., yes/no, 0/1, success/failure). The model predicts the probability that a given input belongs to a particular class.

2. The Logistic Regression Model

a. Linear Combination of Inputs

The foundation of logistic regression is a linear model, similar to linear regression:

z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n
  • z: The linear combination of the input features.
  • \beta_0: The intercept (also called the bias term).
  • \beta_1, \beta_2, \dots, \beta_n: The coefficients (weights) for each feature.
  • X_1, X_2, \dots, X_n: The input features.

This linear combination z is not directly used for classification; instead, it is transformed using the logistic (sigmoid) function to predict probabilities.

b. Sigmoid Function (Logistic Function)

The sigmoid function maps the linear combination z to a value between 0 and 1, representing the probability of the positive class (e.g., Y = 1):

P(Y=1|\mathbf{X}) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}
  • P(Y=1|\mathbf{X}): The probability that the outcome Y is 1, given the input features \mathbf{X}.
  • The sigmoid function ensures the output is between 0 and 1, which can be interpreted as a probability.
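
The mapping from the linear score z to a probability is easy to check numerically. Below is a minimal NumPy sketch; the coefficient and feature values are made up purely for illustration:

import numpy as np

def sigmoid(z):
    # Map any real-valued score to the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients [beta_0, beta_1, beta_2] and one input with two features
beta = np.array([-1.5, 0.8, 0.4])
x = np.array([1.0, 2.0, 3.0])  # the leading 1.0 multiplies the intercept

z = beta @ x          # linear combination: beta_0 + beta_1*X_1 + beta_2*X_2
p = sigmoid(z)
print(f"z = {z:.2f}, P(Y=1|X) = {p:.3f}")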

3. Decision Rule

After calculating the probability, a decision rule is applied to classify the input:

\text{Predicted Class} = \begin{cases} 1 & \text{if } P(Y=1|\mathbf{X}) \geq 0.5 \\ 0 & \text{if } P(Y=1|\mathbf{X}) < 0.5 \end{cases}
  • Threshold: The default threshold is 0.5, but it can be adjusted depending on the problem's requirements (see the sketch below).
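
As a quick illustration of the rule, here is a small NumPy sketch; the probabilities are hypothetical, and the threshold is kept as a variable so it can be moved away from 0.5:

import numpy as np

probs = np.array([0.12, 0.48, 0.50, 0.91])   # hypothetical predicted probabilities
threshold = 0.5                              # default cut-off; adjust per problem
predicted_class = (probs >= threshold).astype(int)
print(predicted_class)                       # [0 0 1 1]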

4. Model Training: Maximum Likelihood Estimation (MLE)

To train the model, we need to find the parameters \beta_0, \beta_1, \dots, \beta_n that best fit the data. This is done by maximizing the likelihood of observing the given data.

a. Likelihood Function

Assuming the training instances are independent, the likelihood of observing all the outcomes y_i is the product of the individual probabilities:

L(\beta) = \prod_{i=1}^{m} P(y_i|\mathbf{X}_i)
  • m: The number of training instances.
  • y_i: The actual outcome for the i-th instance.

For logistic regression, the likelihood can be expressed as:

L(\beta) = \prod_{i=1}^{m} \left[ P(Y=1|\mathbf{X}_i)^{y_i} \times (1 - P(Y=1|\mathbf{X}_i))^{1-y_i} \right]

b. Log-Likelihood Function

To simplify calculations, we take the logarithm of the likelihood function (log-likelihood):

\log L(\beta) = \sum_{i=1}^{m} \left[ y_i \cdot \log(P(Y=1|\mathbf{X}_i)) + (1 - y_i) \cdot \log(1 - P(Y=1|\mathbf{X}_i)) \right]
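
As a minimal sketch, the log-likelihood can be computed directly in NumPy, assuming y holds the 0/1 labels and p holds the model's predicted probabilities P(Y=1|X_i); the values below are made up, and the probabilities are clipped to avoid log(0):

import numpy as np

def log_likelihood(y, p, eps=1e-12):
    # y: 0/1 labels, p: predicted probabilities P(Y=1|X_i)
    p = np.clip(p, eps, 1 - eps)   # guard against log(0)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])           # hypothetical labels
p = np.array([0.9, 0.2, 0.7, 0.6])   # hypothetical predicted probabilities
print(log_likelihood(y, p))

Maximizing this quantity is equivalent to minimizing the cross-entropy (log loss) used by most software implementations.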

5. Optimization: Gradient Descent

To find the optimal parameters \beta_0, \beta_1, \dots, \beta_n, we maximize the log-likelihood function using an iterative optimization algorithm such as gradient descent (a from-scratch sketch follows the update rule below).

a. Gradient Descent Update Rule

The gradient descent algorithm updates the parameters iteratively:

\beta_j := \beta_j + \alpha \cdot \frac{\partial}{\partial \beta_j} \log L(\beta)
  • \alpha: The learning rate, which controls the step size in the parameter space.
  • \frac{\partial}{\partial \beta_j} \log L(\beta): The partial derivative of the log-likelihood function with respect to \beta_j. Because we are maximizing the log-likelihood, the update moves in the direction of the gradient (gradient ascent); minimizing the negative log-likelihood with gradient descent is equivalent.
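
Putting sections 2-5 together, the following from-scratch sketch applies the update rule above (gradient ascent on the log-likelihood). The toy data, learning rate, and iteration count are illustrative assumptions, not tuned values:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 instances, 2 features (hypothetical values)
X = np.array([[0.5, 1.2], [1.5, 0.3], [3.0, 2.2], [2.2, 2.8]])
y = np.array([0, 0, 1, 1])

# Add a column of ones so beta[0] acts as the intercept
X_b = np.c_[np.ones((X.shape[0], 1)), X]
beta = np.zeros(X_b.shape[1])

alpha = 0.1   # learning rate
for _ in range(1000):
    p = sigmoid(X_b @ beta)         # predicted probabilities P(Y=1|X_i)
    gradient = X_b.T @ (y - p)      # partial derivatives of log L(beta) w.r.t. beta
    beta = beta + alpha * gradient  # step in the direction that increases log L(beta)

print("Estimated coefficients:", beta)

In practice, libraries such as scikit-learn use more robust solvers (e.g., L-BFGS) and apply L2 regularization by default, but the underlying idea is the same maximization of the (penalized) log-likelihood.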

6. Interpretation of Coefficients

  • Intercept \beta_0: The log-odds of the outcome when all features X_1, X_2, \dots, X_n are zero.
  • Coefficients \beta_1, \dots, \beta_n: Each coefficient represents the change in the log-odds of the outcome for a one-unit increase in the corresponding feature, holding all other features constant (exponentiating a coefficient converts it to an odds ratio; see the sketch below).
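
Since the coefficients are on the log-odds scale, here is a minimal sketch of that conversion, using hypothetical fitted coefficients:

import numpy as np

# Hypothetical fitted coefficients on the log-odds scale: [intercept, beta_1, beta_2]
beta = np.array([-2.0, 0.7, -0.3])

odds_ratios = np.exp(beta[1:])
print(odds_ratios)
# exp(0.7) ≈ 2.01: a one-unit increase in X_1 roughly doubles the odds of Y = 1,
# holding the other features constant; exp(-0.3) ≈ 0.74 shrinks the odds by about 26%.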

7. Example: Binary Classification with a Single Feature

For a single feature X, the logistic regression model simplifies to:

P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
  • This gives the probability that Y is 1 given the value of X.
  • The decision rule is applied to classify the input as 0 or 1 based on this probability.

Implementation of Logistic Regression with scikit-learn:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load dataset
data = pd.read_csv('housing.csv')

# Display the first few rows
print(data.head())

# Check for missing values
print(data.isnull().sum())

# For simplicity, drop rows with missing values
data = data.dropna()

# Convert categorical columns to numerical if any (e.g., using one-hot encoding)
# data = pd.get_dummies(data, drop_first=True)

# Example: Suppose we want to predict whether 'Price' is above the median
median_price = data['Price'].median()
data['Above_Median'] = (data['Price'] > median_price).astype(int)

# Features and target variable
X = data.drop(['Price', 'Above_Median'], axis=1)
y = data['Above_Median']

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the model
log_reg = LogisticRegression()

# Train the model
log_reg.fit(X_train, y_train)

# Predict on the test set
y_pred = log_reg.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:')
print(conf_matrix)

# Classification Report
class_report = classification_report(y_test, y_pred)
print('Classification Report:')
print(class_report)
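
As noted in the decision-rule section, the 0.5 threshold is not fixed. With the model fitted above, a different cut-off can be applied by working with predict_proba instead of predict; the 0.3 value here is purely illustrative:

# Probability of the positive class for each test instance
probs = log_reg.predict_proba(X_test)[:, 1]

# Custom threshold (illustrative): lowering it trades precision for recall
custom_threshold = 0.3
y_pred_custom = (probs >= custom_threshold).astype(int)
print(f'Accuracy at threshold {custom_threshold}: {accuracy_score(y_test, y_pred_custom):.2f}')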
