
Logistic Regression

Logistic regression is a statistical method used for binary classification problems. It's particularly useful when you need to predict the probability of a binary outcome based on one or more predictor variables. Here's a breakdown:

What is Logistic Regression?

  • Purpose: It models the probability of a binary outcome (e.g., yes/no, success/failure) using a logistic function (sigmoid function).
  • Function: The logistic function maps predicted values (which are in a range from negative infinity to positive infinity) to a probability range between 0 and 1.
  • Formula: The model is typically expressed as P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}, where P(Y = 1 | X) is the probability of the outcome being 1 given predictor X, and \beta_0 and \beta_1 are coefficients estimated during model training.

When to Apply Logistic Regression

  • Binary Outcomes: Use logistic regression when your outcome variable is binary or categorical with two levels.
  • Predicting Probabilities: When you need to predict the probability of a certain event occurring.
  • Feature Types: It's suitable for continuous or categorical predictor variables.

1. Understanding Logistic Regression

Logistic regression is used for binary classification problems, where the goal is to predict the probability of a binary outcome (e.g., yes/no, 0/1, success/failure). The model predicts the probability that a given input belongs to a particular class.

2. The Logistic Regression Model

a. Linear Combination of Inputs

The foundation of logistic regression is a linear model, similar to linear regression:

z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n
  • z: The linear combination of the input features.
  • \beta_0: The intercept (also called the bias term).
  • \beta_1, \beta_2, \dots, \beta_n: The coefficients (weights) for each feature.
  • X_1, X_2, \dots, X_n: The input features.

This linear combination z is not directly used for classification; instead, it is transformed using the logistic (sigmoid) function to predict probabilities.

b. Sigmoid Function (Logistic Function)

The sigmoid function maps the linear combination z to a value between 0 and 1, representing the probability of the positive class (e.g., Y = 1):

P(Y=1|\mathbf{X}) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}
  • P(Y=1|\mathbf{X}): The probability that the outcome Y is 1, given the input features \mathbf{X}.
  • The sigmoid function ensures the output is between 0 and 1, which can be interpreted as a probability.
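
The mapping from the linear score z to a probability is easy to check numerically. Below is a minimal NumPy sketch; the coefficient and feature values are made up purely for illustration:

import numpy as np

def sigmoid(z):
    # Map any real-valued score to the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients [beta_0, beta_1, beta_2] and one input with two features
beta = np.array([-1.5, 0.8, 0.4])
x = np.array([1.0, 2.0, 3.0])  # the leading 1.0 multiplies the intercept

z = beta @ x          # linear combination: beta_0 + beta_1*X_1 + beta_2*X_2
p = sigmoid(z)
print(f"z = {z:.2f}, P(Y=1|X) = {p:.3f}")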

3. Decision Rule

After calculating the probability, a decision rule is applied to classify the input:

\text{Predicted Class} = \begin{cases} 1 & \text{if } P(Y=1|\mathbf{X}) \geq 0.5 \\ 0 & \text{if } P(Y=1|\mathbf{X}) < 0.5 \end{cases}
  • Threshold: The default threshold is 0.5, but it can be adjusted depending on the problem's requirements (see the sketch below).
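
As a quick illustration of the rule, here is a small NumPy sketch; the probabilities are hypothetical, and the threshold is kept as a variable so it can be moved away from 0.5:

import numpy as np

probs = np.array([0.12, 0.48, 0.50, 0.91])   # hypothetical predicted probabilities
threshold = 0.5                              # default cut-off; adjust per problem
predicted_class = (probs >= threshold).astype(int)
print(predicted_class)                       # [0 0 1 1]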

4. Model Training: Maximum Likelihood Estimation (MLE)

To train the model, we need to find the parameters \beta_0, \beta_1, \dots, \beta_n that best fit the data. This is done by maximizing the likelihood of observing the given data.

a. Likelihood Function

Assuming the training instances are independent, the likelihood of observing all the outcomes y_i is the product of the individual probabilities:

L(\beta) = \prod_{i=1}^{m} P(y_i|\mathbf{X}_i)
  • m: The number of training instances.
  • y_i: The actual outcome for the i-th instance.

For logistic regression, the likelihood can be expressed as:

L(\beta) = \prod_{i=1}^{m} \left[ P(Y=1|\mathbf{X}_i)^{y_i} \times (1 - P(Y=1|\mathbf{X}_i))^{1-y_i} \right]

b. Log-Likelihood Function

To simplify calculations, we take the logarithm of the likelihood function (log-likelihood):

\log L(\beta) = \sum_{i=1}^{m} \left[ y_i \cdot \log(P(Y=1|\mathbf{X}_i)) + (1 - y_i) \cdot \log(1 - P(Y=1|\mathbf{X}_i)) \right]
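
As a minimal sketch, the log-likelihood can be computed directly in NumPy, assuming y holds the 0/1 labels and p holds the model's predicted probabilities P(Y=1|X_i); the values below are made up, and the probabilities are clipped to avoid log(0):

import numpy as np

def log_likelihood(y, p, eps=1e-12):
    # y: 0/1 labels, p: predicted probabilities P(Y=1|X_i)
    p = np.clip(p, eps, 1 - eps)   # guard against log(0)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])           # hypothetical labels
p = np.array([0.9, 0.2, 0.7, 0.6])   # hypothetical predicted probabilities
print(log_likelihood(y, p))

Maximizing this quantity is equivalent to minimizing the cross-entropy (log loss) used by most software implementations.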

5. Optimization: Gradient Descent

To find the optimal parameters \beta_0, \beta_1, \dots, \beta_n, we maximize the log-likelihood function using an iterative optimization algorithm such as gradient descent (a from-scratch sketch follows the update rule below).

a. Gradient Descent Update Rule

The gradient descent algorithm updates the parameters iteratively:

\beta_j := \beta_j + \alpha \cdot \frac{\partial}{\partial \beta_j} \log L(\beta)
  • \alpha: The learning rate, which controls the step size in the parameter space.
  • \frac{\partial}{\partial \beta_j} \log L(\beta): The partial derivative of the log-likelihood function with respect to \beta_j. Because we are maximizing the log-likelihood, the update moves in the direction of the gradient (gradient ascent); minimizing the negative log-likelihood with gradient descent is equivalent.
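
Putting sections 2-5 together, the following from-scratch sketch applies the update rule above (gradient ascent on the log-likelihood). The toy data, learning rate, and iteration count are illustrative assumptions, not tuned values:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 instances, 2 features (hypothetical values)
X = np.array([[0.5, 1.2], [1.5, 0.3], [3.0, 2.2], [2.2, 2.8]])
y = np.array([0, 0, 1, 1])

# Add a column of ones so beta[0] acts as the intercept
X_b = np.c_[np.ones((X.shape[0], 1)), X]
beta = np.zeros(X_b.shape[1])

alpha = 0.1   # learning rate
for _ in range(1000):
    p = sigmoid(X_b @ beta)         # predicted probabilities P(Y=1|X_i)
    gradient = X_b.T @ (y - p)      # partial derivatives of log L(beta) w.r.t. beta
    beta = beta + alpha * gradient  # step in the direction that increases log L(beta)

print("Estimated coefficients:", beta)

In practice, libraries such as scikit-learn use more robust solvers (e.g., L-BFGS) and apply L2 regularization by default, but the underlying idea is the same maximization of the (penalized) log-likelihood.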

6. Interpretation of Coefficients

  • Intercept \beta_0: The log-odds of the outcome when all features X_1, X_2, \dots, X_n are zero.
  • Coefficients \beta_1, \dots, \beta_n: Each coefficient represents the change in the log-odds of the outcome for a one-unit increase in the corresponding feature, holding all other features constant (exponentiating a coefficient converts it to an odds ratio; see the sketch below).
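
Since the coefficients are on the log-odds scale, here is a minimal sketch of that conversion, using hypothetical fitted coefficients:

import numpy as np

# Hypothetical fitted coefficients on the log-odds scale: [intercept, beta_1, beta_2]
beta = np.array([-2.0, 0.7, -0.3])

odds_ratios = np.exp(beta[1:])
print(odds_ratios)
# exp(0.7) ≈ 2.01: a one-unit increase in X_1 roughly doubles the odds of Y = 1,
# holding the other features constant; exp(-0.3) ≈ 0.74 shrinks the odds by about 26%.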

7. Example: Binary Classification with a Single Feature

For a single feature X, the logistic regression model simplifies to:

P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
  • This gives the probability that Y is 1 given the value of X.
  • The decision rule is applied to classify the input as 0 or 1 based on this probability.

Implementation of Logistic Regression with scikit-learn:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load dataset
data = pd.read_csv('housing.csv')

# Display the first few rows
print(data.head())

# Check for missing values
print(data.isnull().sum())

# For simplicity, drop rows with missing values
data = data.dropna()

# Convert categorical columns to numerical if any (e.g., using one-hot encoding)
# data = pd.get_dummies(data, drop_first=True)

# Example: Suppose we want to predict whether 'Price' is above the median
median_price = data['Price'].median()
data['Above_Median'] = (data['Price'] > median_price).astype(int)

# Features and target variable
X = data.drop(['Price', 'Above_Median'], axis=1)
y = data['Above_Median']

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the model
log_reg = LogisticRegression()

# Train the model
log_reg.fit(X_train, y_train)

# Predict on the test set
y_pred = log_reg.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:')
print(conf_matrix)

# Classification Report
class_report = classification_report(y_test, y_pred)
print('Classification Report:')
print(class_report)
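
As noted in the decision-rule section, the 0.5 threshold is not fixed. With the model fitted above, a different cut-off can be applied by working with predict_proba instead of predict; the 0.3 value here is purely illustrative:

# Probability of the positive class for each test instance
probs = log_reg.predict_proba(X_test)[:, 1]

# Custom threshold (illustrative): lowering it trades precision for recall
custom_threshold = 0.3
y_pred_custom = (probs >= custom_threshold).astype(int)
print(f'Accuracy at threshold {custom_threshold}: {accuracy_score(y_test, y_pred_custom):.2f}')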
