
Lab Internal



S.No

Program

1.

a)     Write a program in Python to read and write different types of files (CSV, JSON, TXT, etc.).

b)     Write a program in Python to perform statistical analysis on a given data set.

2.

a)     Write a program in Python to manipulate, aggregate, and analyze data using NumPy.

b)     Write a program in Python to handle and analyze data using pandas.

3.

a)     Working with vectors and matrices in Python.

b)     Working with the Matplotlib and seaborn packages in Python.

4.

a)     Write a Python program to get the number of observations, missing values, and NaN values from the given dataset.

b)     Write a Python program to get the observations of each species from the iris data and plot them using the seaborn or Matplotlib packages.

5.

a)     Write a Python program to create a joint plot that shows the individual distributions of sepal length and sepal width on the same plot.

b)     Write a Python program using seaborn to create a kernel density estimate (KDE) plot of petal length versus petal width for the setosa species.

6.

a)     Write a Python program to split the iris dataset into its attributes (X) and labels (Y). The X variable contains the first four columns.

b)     Write a Python program using scikit-learn to split the iris dataset into 80% train data and 20% test data. Out of 150 total records, the training set will contain 120 records and the test set 30. Print both datasets.

7.

a)     Write a Python program to split the iris dataset into its attributes (X) and labels (Y). The X variable contains the first four columns.

b)     Write a Python program using scikit-learn to convert the species column of the iris data frame into a numerical column. To encode the data, map each value to a number, e.g. Iris-setosa: 0, Iris-versicolor: 1, and Iris-virginica: 2. Now split the iris data into 80% train data and 20% test data. Out of 150 total records, the training set will contain 120 records and the test set 30. Print both datasets.

8.

a)     Write a Python program for the data preparation phase on the bank marketing data set: add an index field, change misleading field values, re-express categorical data as numerical data, standardize numerical fields, and identify outliers.

b)     Write a Python program to compute correlation.

9.

a)     Write a Python program to find the address of a specified latitude and longitude using the Nominatim API and the GeoPy package.

b)     Write a Python function to get the city, state, and country names of a specified latitude and longitude using the Nominatim API and the GeoPy package.

10.

a)     Write a Python program to search for the street address and name from given location information using the Nominatim API and the GeoPy package.

b)     Write a Python program to search for the country name from a given state name using the Nominatim API and the GeoPy package.
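
As a starting point for programs 6 and 7, here is a minimal sketch of the attribute/label split, the label encoding, and the 80/20 train/test split. It assumes the copy of the iris data bundled with scikit-learn (`load_iris`), whose species names are `setosa`, `versicolor`, and `virginica` rather than the `Iris-` prefixed names in a CSV copy. The `label_map` dictionary, `random_state=0`, and `stratify=y` are illustrative choices, not requirements of the exercise.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the bundled iris data (assumption: the lab may use a CSV copy instead).
iris = load_iris()

# X holds the first four columns (the measurements), one row per flower.
X = iris.data

# Rebuild the species names as strings so the encoding step is visible.
species = iris.target_names[iris.target]

# Map each species name to a number (hypothetical mapping for illustration).
label_map = {"setosa": 0, "versicolor": 1, "virginica": 2}
y = np.array([label_map[name] for name in species])

# 80% train / 20% test; stratify keeps the three species balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print("Train:", X_train.shape, y_train.shape)
print("Test: ", X_test.shape, y_test.shape)
```

With 150 records, `test_size=0.2` leaves 120 rows for training and 30 for testing, which matches the counts stated in the exercises.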

 
