
Posts

Showing posts from September, 2024

Bagging & Random Forest

What is Bagging?

Bagging, short for Bootstrap Aggregating, is an ensemble learning technique designed to improve the stability and accuracy of machine learning algorithms. It works in three steps:

1. Generating Multiple Datasets: It creates multiple subsets of the original training data through bootstrapping, i.e., random sampling with replacement. Some observations may appear multiple times in a subset while others may not appear at all.

2. Training Multiple Models: A separate model is trained on each subset. Any model can be used, but decision trees are common because they are high-variance learners prone to overfitting, which is exactly the kind of error that averaging across many models reduces.

3. Aggregating Results: Once all the models are trained, their predictions are combined into a final output: a majority vote for classification, or the average of the predictions for regression.

What are Random Forests?

Random Forests is a specific implementation of Bagging that employs decision tre...

Random Forest

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.utils import resample

class CustomRandomForest:
    def __init__(self, n_estimators=100, max_features='sqrt', random_state=None):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.random_state = random_state
        self.trees = []
        np.random.seed(self.random_state)

    def fit(self, X, y):
        # Create multiple bootstrapped datasets and train decision trees
        self.trees = []
        for _ in range(self.n_estimators):
            # Create a bootstrapped sample (random sampling with replacement)
            X_bootstrap, y_bootstrap = resample(X, y)
            # Each tree considers a random subset of features at every split
            tree = DecisionTreeClassifier(max_features=self.max_features)
            tree.fit(X_bootstrap, y_bootstrap)
            self.trees.append(tree)

    def predict(self, X):
        # Collect predictions from every tree: shape (n_estimators, n_samples)
        predictions = np.array([tree.predict(X) for tree in self.trees])
        # Aggregate by majority vote across trees for each sample
        return np.apply_along_axis(
            lambda votes: np.bincount(votes).argmax(), axis=0, arr=predictions
        )
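The imports above point at the Iris dataset, so for comparison scikit-learn's built-in RandomForestClassifier, which implements the same bootstrap-plus-random-feature-subset recipe, can be run end to end. The split size and seed here are illustrative assumptions, not values from the post:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Same hyperparameters as the custom class: 100 trees, sqrt(n_features)
# candidate features per split.
clf = RandomForestClassifier(
    n_estimators=100, max_features='sqrt', random_state=42
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Because the custom class and RandomForestClassifier follow the same procedure, their test accuracies on Iris should land in the same range, which makes the library version a useful sanity check for the hand-rolled one.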