
Posts

Showing posts from September, 2024

Bagging & Random Forest

What is Bagging?

Bagging, short for Bootstrap Aggregating, is an ensemble learning technique designed to improve the stability and accuracy of machine learning algorithms. It works in three steps:

1. Generating Multiple Datasets: It creates multiple subsets of the original training data through bootstrapping, i.e., random sampling with replacement. Some observations may appear multiple times in a subset while others may not appear at all.

2. Training Multiple Models: A separate model is trained on each subset. Any model can be used, but decision trees are common because they are high-variance learners prone to overfitting, which is exactly the kind of error that averaging across many models reduces.

3. Aggregating Results: Once all the models are trained, their predictions are combined into a final output: a majority vote for classification, or the average of the predictions for regression.

What are Random Forests?

Random Forests is a specific implementation of Bagging that employs decision tre...

Random Forest

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.utils import resample

class CustomRandomForest:
    def __init__(self, n_estimators=100, max_features='sqrt', random_state=None):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.random_state = random_state
        self.trees = []
        np.random.seed(self.random_state)

    def fit(self, X, y):
        # Create multiple bootstrapped datasets and train decision trees
        self.trees = []
        for _ in range(self.n_estimators):
            # Create a bootstrapped sample (random sampling with replacement)
            X_bootstrap, y_bootstrap = resample(X, y)
            # Each tree considers a random subset of features at every split
            tree = DecisionTreeClassifier(max_features=self.max_features)
            tree.fit(X_bootstrap, y_bootstrap)
            self.trees.append(tree)

    def predict(self, X):
        # Collect predictions from every tree: shape (n_estimators, n_samples)
        predictions = np.array([tree.predict(X) for tree in self.trees])
        # Aggregate by majority vote across trees for each sample
        return np.apply_along_axis(
            lambda votes: np.bincount(votes).argmax(), axis=0, arr=predictions
        )
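The imports above point at the Iris dataset, so for comparison scikit-learn's built-in RandomForestClassifier, which implements the same bootstrap-plus-random-feature-subset recipe, can be run end to end. The split size and seed here are illustrative assumptions, not values from the post:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Same hyperparameters as the custom class: 100 trees, sqrt(n_features)
# candidate features per split.
clf = RandomForestClassifier(
    n_estimators=100, max_features='sqrt', random_state=42
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Because the custom class and RandomForestClassifier follow the same procedure, their test accuracies on Iris should land in the same range, which makes the library version a useful sanity check for the hand-rolled one.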