Posts

Bagging & Random Forest

What is Bagging? Bagging, short for Bootstrap Aggregating, is an ensemble learning technique designed to improve the stability and accuracy of machine learning algorithms. It works by:

Generating Multiple Datasets: It creates multiple subsets of the original training data through bootstrapping, which involves random sampling with replacement. This means that some observations may appear multiple times in a subset while others may not appear at all.

Training Multiple Models: For each of these subsets, a separate model is trained. This can be any model, but decision trees are commonly used because they are prone to overfitting, which the averaging step of bagging counteracts.

Aggregating Results: Once all the models are trained, their predictions are aggregated to produce a final output. For classification tasks, the most common approach is to take a majority vote, while for regression, the average of the predictions is used. A minimal sketch of this workflow appears below.

What are Random Forests? Random Forests is a specific implementation of Bagging that employs decision trees...
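As a quick illustration of the three steps above, here is a minimal sketch using scikit-learn's BaggingClassifier, whose default base estimator is a decision tree; the Iris data, estimator count, and split parameters are arbitrary choices for the example.

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Illustrative data and split; any classification dataset works here
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Each of the 50 trees is trained on a bootstrap sample (sampling with replacement);
# for classification, the ensemble aggregates predictions by majority vote
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=42)
bagging.fit(X_train, y_train)
print("Test accuracy:", bagging.score(X_test, y_test))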

Random Forest

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.utils import resample

class CustomRandomForest:
    def __init__(self, n_estimators=100, max_features='sqrt', random_state=None):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.random_state = random_state
        self.trees = []
        np.random.seed(self.random_state)

    def fit(self, X, y):
        # Create multiple bootstrapped datasets and train decision trees
        self.trees = []
        for _ in range(self.n_estimators):
            # Create a bootstrapped sample (random sampling with replacement)
            X_bootstrap, y_bootstrap = resample(X, y)
            # Random feature subsetting happens at each split via max_features
            tree = DecisionTreeClassifier(max_features=self.max_features)
            tree.fit(X_bootstrap, y_bootstrap)
            self.trees.append(tree)

    def predict(self, X):
        # Collect each tree's predictions, then take a majority vote per sample
        all_preds = np.array([tree.predict(X) for tree in self.trees])
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(),
                                   axis=0, arr=all_preds)
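A short usage sketch on the Iris data (the split and estimator count are illustrative, and the predict method above is a completion of the truncated original following its own majority-vote description):

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

forest = CustomRandomForest(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))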

Machine Learning Programs

1. Implement matrix operations using both NumPy and pandas (a starter sketch follows this list)
2. Use matplotlib to perform data visualization on a standard dataset
3. Implement Linear Regression using ordinary least squares (OLS) and Gradient Descent methods
4. Implement Quadratic Regression
5. Implement Logistic Regression
6. Evaluate performance measures on regression models (Linear, Quadratic, and Logistic)
7. Implement classification using SVM
8. Implement Decision-tree learning
9. Implement Bagging using Random Forests
10. Implement K-means Clustering to find natural patterns in data
11. Implement DBSCAN clustering
12. Implement the Gaussian Mixture Model
13. Solve the curse of dimensionality by implementing the PCA algorithm on a high-dimensional dataset
14. Compare Machine Learning algorithms
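As a starting point for the first program, here is a minimal sketch of basic matrix operations in NumPy and pandas; the matrices are arbitrary examples:

import numpy as np
import pandas as pd

# Two small example matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# NumPy: addition, matrix product, transpose, inverse
print(A + B)
print(A @ B)
print(A.T)
print(np.linalg.inv(A))

# pandas: the same element-wise addition and matrix product on DataFrames
dfA, dfB = pd.DataFrame(A), pd.DataFrame(B)
print(dfA + dfB)
print(dfA.dot(dfB))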

Evaluation of Performance Measures for Regression Models

You need to read the CSV file into a DataFrame first.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import (mean_absolute_error, mean_squared_error, r2_score,
                             accuracy_score, precision_score, recall_score, f1_score,
                             confusion_matrix, roc_curve, auc, RocCurveDisplay)
import matplotlib.pyplot as plt

# Assuming you have a dataset loaded as `data`
# For simplicity, let's assume 'X' are the features and 'y' is the target
# Splitting the dataset into train and test sets
X = data.drop(columns=['target'])
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)  # split parameters assumed
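The excerpt cuts off before the metrics themselves; a minimal sketch of the regression measures it imports might continue like this, using the assumed split above:

# Fit a linear model and compute the regression metrics imported above
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # root mean squared error
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.3f}  MSE: {mse:.3f}  RMSE: {rmse:.3f}  R^2: {r2:.3f}")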

Logistic Regression

Logistic regression is a statistical method used for binary classification problems. It's particularly useful when you need to predict the probability of a binary outcome based on one or more predictor variables. Here's a breakdown:

What is Logistic Regression?

Purpose: It models the probability of a binary outcome (e.g., yes/no, success/failure) using a logistic function (sigmoid function).

Function: The logistic function maps predicted values (which range from negative infinity to positive infinity) to a probability between 0 and 1.

Formula: The model is typically expressed as:

P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}

where P(Y = 1 | X) is the probability of the outcome being 1 given predictor X, and \beta_0 and \beta_1 are coefficients estimated during model training.

When to Apply Logistic Regression...
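To make the formula concrete, here is a minimal sketch that evaluates the sigmoid directly and fits scikit-learn's LogisticRegression on a toy binary problem; the coefficients and data are made up for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # Maps any real value into the (0, 1) probability range
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients beta_0 = -1.0, beta_1 = 0.5, evaluated at X = 3.0
print(sigmoid(-1.0 + 0.5 * 3.0))  # P(Y = 1 | X = 3.0) ≈ 0.62

# Fitting the same model form from data: a toy 1-D binary dataset
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression()
clf.fit(X, y)
print(clf.intercept_, clf.coef_)   # learned beta_0 and beta_1
print(clf.predict_proba([[2.0]]))  # class probabilities for a new point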

Main Challenges of Machine Learning

Machine Learning (ML) offers powerful capabilities, but it also comes with a set of significant challenges that must be addressed to ensure successful model development and deployment. Here are some of the main challenges in ML:

1. Data Quality and Quantity

Data Quality: ML models require high-quality data to make accurate predictions. Poor data quality, such as missing values, noise, or inconsistencies, can lead to biased or incorrect models. Ensuring data is clean, well-labeled, and relevant is a crucial challenge.

Data Quantity: ML models often require large amounts of data to learn effectively. Inadequate data can lead to underfitting, where the model fails to capture the underlying patterns. Gathering sufficient data, especially for rare events or new applications, can be difficult. Example: Medical Diagnosis

2. Overfitting and Underfitting

Overfitting: This occurs when a model becomes too complex and starts to learn noise and irrelevant details from the training data, leading...
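As a small illustration of overfitting, the sketch below compares an unconstrained decision tree with a depth-limited one on the same split; the synthetic dataset, noise level, and depths are arbitrary choices for the example:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A noisy synthetic problem where an unconstrained tree can memorize the training set
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None lets the tree grow until it fits the training data
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")

# A large gap between train and test accuracy for the unconstrained tree signals overfitting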

Why Use Machine Learning

Automation of Complex Tasks: ML can automate decision-making processes, handling tasks that are too complex for traditional rule-based systems.

Handling Large-Scale Data: ML algorithms can process and analyze vast amounts of data, uncovering patterns and insights that would be impossible to identify manually.

Improved Accuracy: In many cases, ML models can make predictions and decisions with greater accuracy than humans, especially when dealing with complex data.

Adaptability: ML models can adapt to new data, continuously improving their performance over time as they are exposed to more information.

Use Cases:

Healthcare: Disease prediction, personalized medicine, medical image analysis.
Finance: Fraud detection, algorithmic trading, credit scoring.
Retail: Customer segmentation, recommendation systems, demand forecasting.
Manufacturing: Predictive maintenance, quality control, supply chain optimization.
Transportation: Autonomous vehicles, route...