Fine-tuning

Fine-tuning in machine learning refers to optimizing a model’s performance by adjusting its hyperparameters, improving data processing steps, or refining the model architecture.
Here are some common fine-tuning methods, with examples:

1. Hyperparameter Tuning

Hyperparameters are settings that are not learned from the data but are set before training begins. Fine-tuning involves searching for the set of hyperparameters that gives the best model performance.

a. Grid Search

Grid Search is an exhaustive search over a specified parameter grid. It trains a model for every combination of hyperparameters and selects the best combination based on cross-validation performance.

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Sample Data (replace with your actual dataset)
data = {
    'Age': [25, 30, 35, 40, 50, 28, 45],
    'Income': [50000, 60000, 45000, 52000, 58000, 61000, 55000],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female'],
    'Purchased': [1, 0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)

# Encode categorical data
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})

# Define features and target variable
X = df[['Age', 'Income', 'Gender']]
y = df['Purchased']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the model
model = RandomForestClassifier(random_state=42)

# Define the hyperparameters grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Implement Grid Search with 3-Fold Cross-Validation
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Best hyperparameters
best_params = grid_search.best_params_
print("Best hyperparameters:", best_params)

# Best model
best_model = grid_search.best_estimator_

Example output:

Best hyperparameters: {'max_depth': None, 'min_samples_split': 5, 'n_estimators': 300}
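
To see how the tuned model generalizes, it can be evaluated on the held-out test set (a minimal usage sketch, assuming accuracy is an appropriate metric for your data):

from sklearn.metrics import accuracy_score

# Evaluate the best model found by Grid Search on the test set
y_pred = best_model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))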

b. Randomized Search

Randomized Search is similar to Grid Search but searches over a random set of hyperparameters from the specified distributions, which allows for a broader search in less time.

import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from scipy.stats import randint

# Sample Data (replace with your actual dataset)
data = {
    'Age': [25, 30, 35, 40, 50, 28, 45],
    'Income': [50000, 60000, 45000, 52000, 58000, 61000, 55000],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female'],
    'Purchased': [1, 0, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)

# Encode categorical data
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})

# Define features and target variable
X = df[['Age', 'Income', 'Gender']]
y = df['Purchased']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the model
model = RandomForestClassifier(random_state=42)

# Define the hyperparameters distribution
param_dist = {
    'n_estimators': randint(100, 500),
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': randint(2, 15)
}

# Implement Randomized Search with 3-Fold Cross-Validation
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=10, cv=3, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

# Best hyperparameters
best_params = random_search.best_params_
print("Best hyperparameters:", best_params)

# Best model
best_model = random_search.best_estimator_

2. Feature Engineering

Feature engineering entails developing new features or altering existing ones to improve model performance.

Polynomial Features: Polynomial features can be added so that a linear model can capture non-linear relationships in the data.

from sklearn.preprocessing import PolynomialFeatures

# Generate polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
print(X_train_poly.shape)  # (rows, expanded feature count)
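
Beyond polynomial expansion, feature engineering can also mean deriving new columns from existing ones. A small sketch on the sample dataframe above (the ratio and the age bins are purely illustrative):

# Create a derived feature from existing columns (illustrative)
df['Income_per_Age'] = df['Income'] / df['Age']

# Bucket Age into coarse groups (bin edges are illustrative)
df['Age_Group'] = pd.cut(df['Age'], bins=[0, 30, 40, 100], labels=[0, 1, 2]).astype(int)

print(df[['Age', 'Income', 'Income_per_Age', 'Age_Group']].head())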

3. Model Regularization

Regularization techniques help prevent overfitting by penalizing more complex models.

a. L1 and L2 Regularization

L1 (Lasso) and L2 (Ridge) regularization add a penalty to the loss function to constrain the model coefficients.

from sklearn.linear_model import Ridge, Lasso

# Ridge Regression (L2 Regularization)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Lasso Regression (L1 Regularization)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
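
Since the target in the sample data above is binary, the same L1/L2 penalties can also be applied through a regularized classifier. A minimal sketch using LogisticRegression (the C value is illustrative; smaller C means stronger regularization):

from sklearn.linear_model import LogisticRegression

# L2-regularized logistic regression (the default penalty)
log_reg_l2 = LogisticRegression(penalty='l2', C=1.0)
log_reg_l2.fit(X_train, y_train)

# L1-regularized logistic regression (liblinear supports the l1 penalty)
log_reg_l1 = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
log_reg_l1.fit(X_train, y_train)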

4. Learning Rate Scheduling

Learning rate scheduling adjusts the learning rate during the training of a machine learning model, particularly a neural network. Adapting the learning rate as training progresses can help the model converge faster and train more stably.

The Keras example below applies a step-decay schedule through the LearningRateScheduler callback; the common decay strategies are listed after it.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import LearningRateScheduler

# Define a simple neural network model
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Define a learning rate schedule function
def step_decay(epoch):
    initial_lr = 0.01
    drop = 0.5
    epochs_drop = 10
    lr = initial_lr * (drop ** (epoch // epochs_drop))
    return lr

# Implement the learning rate scheduler
lr_scheduler = LearningRateScheduler(step_decay)

# Fit the model with the learning rate scheduler
model.fit(X_train, y_train, epochs=50, callbacks=[lr_scheduler], validation_data=(X_test, y_test))

  • Step Decay: Reducing the learning rate by a factor after a fixed number of epochs.
# Initial parameters
initial_lr = 0.1
drop = 0.5
epochs_drop = 10
num_epochs = 50

def step_decay(epoch):
    return initial_lr * (drop ** (epoch // epochs_drop))

# Simulating the learning rate schedule over epochs
for epoch in range(num_epochs):
    lr = step_decay(epoch)
    print(f"Epoch {epoch+1}/{num_epochs}, Learning Rate: {lr:.4f}")

  • Exponential Decay: Reducing the learning rate exponentially with the epoch number.
import math

# Initial parameters
initial_lr = 0.1
decay_rate = 0.05  # decay constant (illustrative value)
num_epochs = 50

def exponential_decay(epoch):
    return initial_lr * math.exp(-decay_rate * epoch)

# Simulating the learning rate schedule over epochs
for epoch in range(num_epochs):
    lr = exponential_decay(epoch)
    print(f"Epoch {epoch+1}/{num_epochs}, Learning Rate: {lr:.4f}")

  • Time-Based Decay: The learning rate decays gradually with the epoch number, typically as initial_lr / (1 + decay_rate * epoch).
# Initial parameters
initial_lr = 0.1
decay_rate = 0.01
num_epochs = 50

def time_based_decay(epoch):
    return initial_lr / (1 + decay_rate * epoch)

# Simulating the learning rate schedule over epochs
for epoch in range(num_epochs):
    lr = time_based_decay(epoch)
    print(f"Epoch {epoch+1}/{num_epochs}, Learning Rate: {lr:.4f}")

  • Custom Schedule: You can also define a custom schedule where the learning rate is adjusted at specific epochs.
# Initial parameters
initial_lr = 0.1
num_epochs = 50

def custom_schedule(epoch):
    if epoch < 10:
        return initial_lr
    elif epoch < 30:
        return initial_lr * 0.5
    else:
        return initial_lr * 0.1

# Simulating the learning rate schedule over epochs
for epoch in range(num_epochs):
    lr = custom_schedule(epoch)
    print(f"Epoch {epoch+1}/{num_epochs}, Learning Rate: {lr:.4f}")

5. Early Stopping

Early stopping is a technique used to stop training when the model’s performance on a validation set stops improving, which helps to prevent overfitting.

from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping

# Assuming X_train and y_train are your full training data,
# hold out part of it as a validation set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Define early stopping on the validation loss
early_stopping = EarlyStopping(monitor='val_loss', patience=5)

# Train the model with early stopping
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, callbacks=[early_stopping])

6. Ensemble Methods

Ensemble methods combine multiple models to improve overall performance. Common approaches include bagging, boosting, and stacking.

Examples:

  • Random Forest (Bagging):
from sklearn.ensemble import RandomForestClassifier

# Train a Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

  • Gradient Boosting (Boosting):
from sklearn.ensemble import GradientBoostingClassifier

# Train a Gradient Boosting model
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gb_model.fit(X_train, y_train)

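  • Stacking: Stacking trains a meta-model on the out-of-fold predictions of several base models. A minimal sketch using scikit-learn's StackingClassifier (the choice of base learners and final estimator is illustrative, and the toy sample above is likely too small for its internal cross-validation, so assume a realistically sized dataset):
from sklearn.ensemble import StackingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Combine the two base models with a logistic regression meta-learner
stack_model = StackingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
        ('gb', GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42))
    ],
    final_estimator=LogisticRegression()
)
stack_model.fit(X_train, y_train)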
