Fine-tuning

Fine-tuning in machine learning refers to optimizing a model’s performance by adjusting its hyperparameters, improving data processing steps, or refining the model architecture.
Here are some common fine-tuning methods, each with an example:

1. Hyperparameter Tuning

Hyperparameters are settings that are not learned from the data but are set before training begins. Tuning searches for the combination of hyperparameter values that gives the best model performance.

a. Grid Search
Grid Search is an exhaustive search over a specified parameter grid. It trains a model for every combination of hyperparameters and selects the best combination based on cross-validation performance.

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Sample Data (replace with your actual dataset)
data = {
    'Age': [25, 30, 35, 40, 50, 28, 45],
    'Income': [50000, 60000, 45000, 52000, 58000, 61000, 55000],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female'],
    'Purchased': [1, 0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)

# Encode categorical data
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})

# Define features and target variable
X = df[['Age', 'Income', 'Gender']]
y = df['Purchased']

# Split the data into training and test sets (stratify keeps both classes in each split of this tiny toy dataset)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the model
model = RandomForestClassifier(random_state=42)

# Define the hyperparameters grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Implement Grid Search with 2-Fold Cross-Validation (the toy training set is too small for more folds)
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=2, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Best hyperparameters
best_params = grid_search.best_params_
print("Best hyperparameters:", best_params)

# Best model
best_model = grid_search.best_estimator_

Sample output:

Best hyperparameters: {'max_depth': None, 'min_samples_split': 5, 'n_estimators': 300}
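Once the search is done, the selected model can be checked on the held-out test set; a minimal sketch using the X_test/y_test split created above:

from sklearn.metrics import accuracy_score

# Evaluate the tuned model on data it has never seen
y_pred = best_model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))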

b. Randomized Search

Randomized Search is similar to Grid Search, but instead of trying every combination it samples a fixed number of hyperparameter settings from specified distributions, which allows a broader search in less time.

import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from scipy.stats import randint

# Sample Data (replace with your actual dataset)
data = {
    'Age': [25, 30, 35, 40, 50, 28, 45],
    'Income': [50000, 60000, 45000, 52000, 58000, 61000, 55000],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female'],
    'Purchased': [1, 0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)

# Encode categorical data
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})

# Define features and target variable
X = df[['Age', 'Income', 'Gender']]
y = df['Purchased']

# Split the data into training and test sets (stratified, as above)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the model
model = RandomForestClassifier(random_state=42)

# Define the hyperparameters distribution
param_dist = {
    'n_estimators': randint(100, 500),
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': randint(2, 15)
}

# Implement Randomized Search with 2-Fold Cross-Validation (again limited by the tiny toy dataset)
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=10, cv=2, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

# Best hyperparameters
best_params = random_search.best_params_
print("Best hyperparameters:", best_params)

# Best model
best_model = random_search.best_estimator_

2. Feature Engineering

Feature engineering entails developing new features or altering existing ones to improve model performance.

Polynomial Features: Adding polynomial features lets a linear model capture non-linear relationships between the inputs and the target.

from sklearn.preprocessing import PolynomialFeatures

# Generate degree-2 polynomial features from the scaled data
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
print("Expanded feature count:", X_train_poly.shape[1])
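Polynomial expansion is only one option. As another illustration, hypothetical derived features (a ratio and a simple interaction) could be added to the toy DataFrame before splitting and scaling:

# Hypothetical derived features on the toy DataFrame
df['Income_per_Age'] = df['Income'] / df['Age']   # ratio feature
df['Age_x_Gender'] = df['Age'] * df['Gender']     # interaction between age and the encoded gender
print(df[['Income_per_Age', 'Age_x_Gender']].head())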

3. Model Regularization

Regularization techniques help prevent overfitting by penalizing more complex models.

a. L1 and L2 Regularization

L1 (Lasso) and L2 (Ridge) regularization add a penalty term to the loss function that constrains the model coefficients. The snippet below uses scikit-learn's regression estimators to illustrate the API; a regularized logistic regression sketch, which better matches the binary target used in this post, follows it.

from sklearn.linear_model import Ridge, Lasso

# Ridge Regression (L2 Regularization); alpha controls the penalty strength
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Lasso Regression (L1 Regularization); larger alpha pushes more coefficients to exactly zero
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

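Since the target in this post is binary, a closer fit is regularized logistic regression; a minimal sketch (the penalty strengths here are illustrative):

from sklearn.linear_model import LogisticRegression

# L2-regularized logistic regression (C is the inverse of the regularization strength)
log_reg_l2 = LogisticRegression(penalty='l2', C=1.0)
log_reg_l2.fit(X_train, y_train)

# L1-regularized logistic regression (needs a solver that supports L1, e.g. liblinear)
log_reg_l1 = LogisticRegression(penalty='l1', C=1.0, solver='liblinear')
log_reg_l1.fit(X_train, y_train)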

4. Learning Rate Scheduling

Learning rate scheduling adjusts the learning rate while a model, typically a neural network, is being trained. Gradually lowering the learning rate can speed up convergence and keep the optimizer from oscillating around a good minimum.

Common strategies include step decay, exponential decay, time-based decay, and custom schedules; each is simulated in the list below, and a sketch using Keras's built-in schedule objects follows the list. First, a Keras example that applies step decay through the LearningRateScheduler callback:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import LearningRateScheduler

# Define a simple neural network model
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Define a learning rate schedule function
def step_decay(epoch):
    initial_lr = 0.01
    drop = 0.5
    epochs_drop = 10
    lr = initial_lr * (drop ** (epoch // epochs_drop))
    return lr

# Implement the learning rate scheduler
lr_scheduler = LearningRateScheduler(step_decay)

# Fit the model with the learning rate scheduler
model.fit(X_train, y_train, epochs=50, callbacks=[lr_scheduler], validation_data=(X_test, y_test))

  • Step Decay: Reducing the learning rate by a factor after a fixed number of epochs.
# Initial parameters
initial_lr = 0.1
drop = 0.5
epochs_drop = 10
num_epochs = 50

def step_decay(epoch):
    return initial_lr * (drop ** (epoch // epochs_drop))

# Simulating the learning rate schedule over epochs
for epoch in range(num_epochs):
    lr = step_decay(epoch)
    print(f"Epoch {epoch+1}/{num_epochs}, Learning Rate: {lr:.4f}")

  • Exponential Decay: Reducing the learning rate exponentially with the epoch number.
import math

# Initial parameters
initial_lr = 0.1
decay_rate = 0.05  # decay constant chosen purely for illustration
num_epochs = 50

def exponential_decay(epoch):
    return initial_lr * math.exp(-decay_rate * epoch)

# Simulating the learning rate schedule over epochs
for epoch in range(num_epochs):
    lr = exponential_decay(epoch)
    print(f"Epoch {epoch+1}/{num_epochs}, Learning Rate: {lr:.4f}")

  • Time-Based Decay: The learning rate shrinks in inverse proportion to the epoch number, lr = initial_lr / (1 + decay_rate * epoch).
# Initial parameters
initial_lr = 0.1
decay_rate = 0.01
num_epochs = 50

def time_based_decay(epoch):
    return initial_lr / (1 + decay_rate * epoch)

# Simulating the learning rate schedule over epochs
for epoch in range(num_epochs):
    lr = time_based_decay(epoch)
    print(f"Epoch {epoch+1}/{num_epochs}, Learning Rate: {lr:.4f}")

  • Custom Schedule: You can also define a custom schedule where the learning rate is adjusted at specific epochs.
# Initial parameters
initial_lr = 0.1
num_epochs = 50

def custom_schedule(epoch):
    if epoch < 10:
        return initial_lr
    elif epoch < 30:
        return initial_lr * 0.5
    else:
        return initial_lr * 0.1

# Simulating the learning rate schedule over epochs
for epoch in range(num_epochs):
    lr = custom_schedule(epoch)
    print(f"Epoch {epoch+1}/{num_epochs}, Learning Rate: {lr:.4f}")
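Keras also provides built-in schedule objects that can be passed straight to an optimizer instead of using a callback; a minimal sketch with ExponentialDecay (the decay_steps and decay_rate values are illustrative):

import tensorflow as tf

# The learning rate decays smoothly by a factor of decay_rate every decay_steps optimizer updates
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,
    decay_steps=100,
    decay_rate=0.96)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])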

5. Early Stopping

Early stopping is a technique used to stop training when the model’s performance on a validation set stops improving, which helps to prevent overfitting.

from sklearn.model_selection import train_test_split

# Assuming X_train and y_train are your full training data
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

from tensorflow.keras.callbacks import EarlyStopping

# Define early stopping: halt training once validation loss has not improved for 5 consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=5)

# Train the model with early stopping
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, callbacks=[early_stopping])
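Early stopping is not limited to neural networks. For instance, scikit-learn's gradient boosting can stop adding trees once an internal validation split stops improving; a minimal standalone sketch on synthetic data (generated here because the toy split above is too small to carve out a validation fraction):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data purely for illustration
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=42)

# Stop adding trees once the score on a 20% internal validation split
# has not improved for 5 consecutive boosting iterations
gb_early = GradientBoostingClassifier(
    n_estimators=500,
    validation_fraction=0.2,
    n_iter_no_change=5,
    random_state=42)
gb_early.fit(X_demo, y_demo)
print("Boosting stages actually used:", gb_early.n_estimators_)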

6. Ensemble Methods

Ensemble methods combine multiple models to improve overall performance. Common approaches include bagging, boosting, and stacking; an example of each follows below.

Example:

  • Random Forest (Bagging):
from sklearn.ensemble import RandomForestClassifier

# Train a Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

  • Gradient Boosting (Boosting):
from sklearn.ensemble import GradientBoostingClassifier

# Train a Gradient Boosting model
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gb_model.fit(X_train, y_train)

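  • Stacking: A final estimator is trained on the out-of-fold predictions of several base models. A minimal sketch, with an illustrative choice of base learners and final estimator (cv=2 only because the toy dataset is tiny):
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Base learners whose out-of-fold predictions become inputs to the final estimator
base_learners = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('dt', DecisionTreeClassifier(random_state=42))
]

# Logistic regression learns how to combine the base models' predictions;
# fit here on the full toy feature table X, y purely for illustration
stack_model = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression(), cv=2)
stack_model.fit(X, y)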
