Logistic regression is a statistical method used for binary classification problems. It's particularly useful when you need to predict the probability of a binary outcome based on one or more predictor variables. Here's a breakdown:
What is Logistic Regression?
- Purpose: It models the probability of a binary outcome (e.g., yes/no, success/failure) using a logistic function (sigmoid function).
- Function: The logistic function maps predicted values (which are in a range from negative infinity to positive infinity) to a probability range between 0 and 1.
- Formula: The model is typically expressed as $P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$, where $P(y = 1 \mid x)$ is the probability of the outcome being 1 given predictor $x$, and $\beta_0$ and $\beta_1$ are coefficients estimated during model training.
When to Apply Logistic Regression
- Binary Outcomes: Use logistic regression when your outcome variable is binary or categorical with two levels.
- Predicting Probabilities: When you need to predict the probability of a certain event occurring, not just a hard class label.
- Feature Types: It's suitable for continuous or categorical predictor variables.
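As a quick illustration before the detailed walkthrough, here is a minimal sketch using scikit-learn's LogisticRegression on synthetic data; the data and seed are invented for demonstration, and note that scikit-learn applies L2 regularization by default:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: one continuous predictor, outcome more
# likely to be 1 as the predictor grows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression()  # applies L2 regularization by default
model.fit(X, y)

# Predicted probabilities of the positive class for three new inputs.
print(model.predict_proba([[-2.0], [0.0], [2.0]])[:, 1])
```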
1. Understanding Logistic Regression
Logistic regression is used for binary classification problems, where the goal is to predict the probability of a binary outcome (e.g., yes/no, 0/1, success/failure). The model predicts the probability that a given input belongs to a particular class.
2. The Logistic Regression Model
a. Linear Combination of Inputs
The foundation of logistic regression is a linear model, similar to linear regression:

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

- $\beta_0$: The intercept (also called the bias term).
- $\beta_1, \dots, \beta_n$: The coefficients (weights) for each feature.
- $x_1, \dots, x_n$: The input features.
This linear combination is not directly used for classification; instead, it is transformed using the logistic (sigmoid) function to predict probabilities.
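For concreteness, here is what that linear combination looks like in NumPy; the coefficient and input values below are arbitrary, purely for illustration:

```python
import numpy as np

beta_0 = -1.0                      # intercept (bias term)
beta = np.array([0.8, -0.4, 2.1])  # one weight per feature
x = np.array([1.5, 3.0, 0.5])      # a single input example

# z = beta_0 + beta_1*x_1 + beta_2*x_2 + beta_3*x_3
z = beta_0 + beta @ x
print(z)  # a raw score in (-inf, +inf), not yet a probability
```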
b. Sigmoid Function (Logistic Function)
The sigmoid function maps the linear combination $z$ to a value between 0 and 1, representing the probability of the positive class (e.g., $y = 1$):

$$P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}$$

- $P(y = 1 \mid x)$: The probability that the outcome is 1, given the input features $x$.
- The sigmoid function ensures the output is between 0 and 1, which can be interpreted as a probability.
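A direct translation of the sigmoid into Python, with a few sample inputs to show how raw scores map to probabilities:

```python
import numpy as np

def sigmoid(z):
    """Map a raw score in (-inf, +inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(-4.0))  # ~0.018: strongly favors class 0
print(sigmoid(0.0))   # 0.5: maximally uncertain
print(sigmoid(4.0))   # ~0.982: strongly favors class 1
```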
3. Decision Rule
After calculating the probability, a decision rule is applied to classify the input:

$$\hat{y} = \begin{cases} 1 & \text{if } P(y = 1 \mid x) \geq 0.5 \\ 0 & \text{otherwise} \end{cases}$$

- Threshold: The default threshold is 0.5, but it can be adjusted depending on the problem's requirements.
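In code, the decision rule is a one-line threshold check; the function name here is just illustrative:

```python
def classify(probability, threshold=0.5):
    """Predict class 1 when P(y=1|x) meets the threshold, else class 0."""
    return 1 if probability >= threshold else 0

print(classify(0.73))                 # 1 with the default 0.5 threshold
print(classify(0.73, threshold=0.8))  # 0 with a stricter threshold
```

Raising the threshold trades recall for precision on the positive class, which is why it is often tuned on a validation set rather than left at 0.5.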
4. Model Training: Maximum Likelihood Estimation (MLE)
To train the model, we need to find the parameters $\beta$ that best fit the data. This is done by maximizing the likelihood of observing the given data.
a. Likelihood Function
For each training instance, the likelihood of the observed outcome is $P(y_i \mid x_i; \beta)$, and the likelihood of the full dataset is the product over all instances:

$$L(\beta) = \prod_{i=1}^{m} P(y_i \mid x_i; \beta)$$

- $m$: The number of training instances.
- $y_i$: The actual outcome for the $i$-th instance.

For logistic regression, the likelihood can be expressed as:

$$L(\beta) = \prod_{i=1}^{m} \sigma(z_i)^{y_i} \left(1 - \sigma(z_i)\right)^{1 - y_i}$$
b. Log-Likelihood Function
To simplify calculations, we take the logarithm of the likelihood function (log-likelihood), which turns the product into a sum:

$$\ell(\beta) = \sum_{i=1}^{m} \left[ y_i \log \sigma(z_i) + (1 - y_i) \log\left(1 - \sigma(z_i)\right) \right]$$
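As a sketch, the log-likelihood can be computed directly in NumPy; the toy data below is arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(beta_0, beta, X, y):
    """Sum of y_i*log(p_i) + (1 - y_i)*log(1 - p_i) over all m instances."""
    p = sigmoid(beta_0 + X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[0.5], [1.5], [-1.0], [2.0]])  # toy feature matrix (m=4, n=1)
y = np.array([0, 1, 0, 1])                   # toy binary outcomes
print(log_likelihood(-0.5, np.array([1.2]), X, y))
```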
5. Optimization: Gradient Descent
To find the optimal parameters $\beta$, we maximize the log-likelihood function (equivalently, minimize the negative log-likelihood) using an optimization algorithm such as gradient descent.
a. Gradient Descent Update Rule
The gradient descent algorithm updates the parameters iteratively. Because we are maximizing the log-likelihood, the update steps in the direction of the gradient (gradient ascent):

$$\beta_j := \beta_j + \alpha \frac{\partial \ell(\beta)}{\partial \beta_j}$$

- $\alpha$: The learning rate, which controls the step size in the parameter space.
- $\frac{\partial \ell(\beta)}{\partial \beta_j}$: The derivative of the log-likelihood function with respect to $\beta_j$. For logistic regression it takes the simple form $\sum_{i=1}^{m} \left( y_i - \sigma(z_i) \right) x_{ij}$.
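Putting the update rule into a training loop gives a minimal from-scratch fit; the learning rate, iteration count, and data here are arbitrary illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=1000):
    """Gradient ascent on the log-likelihood of logistic regression."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])  # prepend a 1s column for the intercept
    beta = np.zeros(n + 1)
    for _ in range(n_iters):
        p = sigmoid(Xb @ beta)            # current predicted probabilities
        gradient = Xb.T @ (y - p)         # d(log-likelihood)/d(beta)
        beta += alpha * gradient / m      # move *up* the gradient (ascent)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
print(fit_logistic(X, y))  # [intercept, slope]
```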
6. Interpretation of Coefficients
- Intercept $\beta_0$: The log-odds of the outcome when all features are zero.
- Coefficients $\beta_j$: Each coefficient represents the change in the log-odds of the outcome for a one-unit increase in the corresponding feature, holding all other features constant.
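Because coefficients act on the log-odds, exponentiating them yields odds ratios, which are often easier to communicate. A sketch with hypothetical fitted coefficients:

```python
import numpy as np

beta = np.array([0.69, -0.22])  # hypothetical fitted coefficients
print(np.exp(beta))             # ~[2.0, 0.8]
# A one-unit increase in feature 1 roughly doubles the odds of y = 1;
# a one-unit increase in feature 2 multiplies the odds by about 0.8.
```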
7. Example: Binary Classification with a Single Feature
For a single feature $x$, the logistic regression model simplifies to:

$$P(y = 1 \mid x) = \sigma(\beta_0 + \beta_1 x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

- This gives the probability that $y$ is 1 given the value of $x$.
- The decision rule is applied to classify the input as 0 or 1 based on this probability.
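Putting the pieces together for the single-feature case, here is a minimal end-to-end sketch; the 'hours studied vs. pass/fail' framing and all values are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic single-feature data, e.g. hours studied vs. pass/fail.
rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, size=100).reshape(-1, 1)
passed = (hours[:, 0] + rng.normal(scale=2.0, size=100) > 5).astype(int)

model = LogisticRegression().fit(hours, passed)
print("beta_0:", model.intercept_[0], "beta_1:", model.coef_[0, 0])

# Probability and class for a new input of 6 hours, default 0.5 threshold.
p = model.predict_proba([[6.0]])[0, 1]
print("P(y=1 | x=6) =", p, "-> class", int(p >= 0.5))
```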