Machine Learning Pipeline

The machine learning pipeline is a systematic, organized way to move through an ML project. Each step is essential and builds on the previous one, forming a path from understanding your problem to deploying a solution. Following this pipeline ensures a disciplined approach, which is vital for the…
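As a toy illustration of steps building on one another, the sketch below chains load → preprocess → train → evaluate stages; the data, function names, and the trivial threshold "model" are all illustrative, not from the post itself.

```python
# Minimal pipeline sketch: each stage consumes the previous stage's output.
def load_data():
    # Toy dataset of (feature, label) pairs
    return [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

def preprocess(rows):
    # Scale features to [0, 1]
    xs = [x for x, _ in rows]
    lo, hi = min(xs), max(xs)
    return [((x - lo) / (hi - lo), y) for x, y in rows]

def train(rows):
    # "Model" is just a threshold halfway between the class means
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def evaluate(threshold, rows):
    preds = [1 if x >= threshold else 0 for x, _ in rows]
    return sum(p == y for p, (_, y) in zip(preds, rows)) / len(rows)

data = preprocess(load_data())
model = train(data)
accuracy = evaluate(model, data)
```

Real projects would swap each stage for substantial code, but the shape — each step feeding the next — is the point.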

Continue reading

Principal Component Analysis

Principal Component Analysis (PCA) is a dimensionality reduction technique commonly used in data analysis and machine learning. One of its primary objectives is to capture the most variance in the data while reducing the dimensionality of the dataset. Variance is a statistical measure that quantifies the spread or dispersion of…

Continue reading

Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors are fundamental concepts in linear algebra that play a key role in various data science algorithms, notably in dimensionality reduction techniques like Principal Component Analysis (PCA). In simple terms, an eigenvector is a vector that only scales (stretches or compresses) and does not change its direction when…
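The defining property — the matrix only scales an eigenvector, never rotates it — can be checked directly with NumPy; the example matrix here is arbitrary.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)

# For each pair, A @ v equals lambda * v: the vector is only scaled,
# its direction is unchanged.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```

For this symmetric matrix the eigenvalues are 3 and 1, and the eigenvectors are orthogonal — the property PCA relies on when it diagonalizes a covariance matrix.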

Continue reading

Handling Imbalanced Datasets

Imbalanced datasets are a common challenge in machine learning, where one class significantly outnumbers the others. This imbalance can lead to biased models that favor the majority class and perform poorly on minority classes. Fortunately, there are several strategies to address this issue and improve the performance of machine learning…
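One of the simplest such strategies is random oversampling: duplicate minority-class rows until the classes are balanced. A pure-Python sketch (the dataset and function name are illustrative):

```python
import random

# Toy imbalanced dataset: 90 majority rows (label 0) vs 10 minority (label 1)
data = [(i, 0) for i in range(90)] + [(i, 1) for i in range(10)]

def oversample_minority(rows, seed=0):
    """Randomly duplicate minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    majority = [r for r in rows if r[1] == 0]
    minority = [r for r in rows if r[1] == 1]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra

balanced = oversample_minority(data)
counts = {label: sum(1 for _, y in balanced if y == label) for label in (0, 1)}
```

In practice oversampling should be applied only to the training split (never before the train/test split), or the duplicated rows will leak into evaluation.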

Continue reading

Handling Outliers

Handling outliers is a crucial aspect of data preprocessing in data science projects. Outliers can significantly affect various aspects of data analysis, from basic statistics to the behavior and performance of predictive models. Outliers are data points that deviate significantly from other observations. They can arise due to: Measurement errors…
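A common way to flag such points is the interquartile-range (IQR) rule; the sketch below uses the standard library and an illustrative dataset in which one value plays the role of a measurement error.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (the classic boxplot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

data = [10, 12, 11, 13, 12, 11, 95]  # 95 stands in for a measurement error
outliers = iqr_outliers(data)
```

Other detection rules (z-scores, model-based methods) exist, and whether to drop, cap, or keep a flagged point depends on whether it is an error or a genuine extreme observation.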

Continue reading

Handling Missing Values

Handling missing values is a critical step in the data preprocessing phase of building a machine learning model. Missing data can be problematic because most machine learning algorithms require complete datasets to train on. Here are some commonly used techniques to handle missing values: Removing Data: Listwise Deletion: This involves…
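Two of the techniques the excerpt names — listwise deletion and simple imputation — fit in a few lines of pure Python; `None` marks a missing value, and the data and helper names are illustrative.

```python
def listwise_delete(rows):
    """Drop any row that contains a missing value."""
    return [r for r in rows if None not in r]

def impute_mean(column):
    """Replace missing values in a numeric column with the observed mean."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

ages = [25, None, 31, 40, None, 28]
filled = impute_mean(ages)          # missing ages become 31.0 (the mean)
rows = [(25, 1), (None, 0), (31, 1)]
kept = listwise_delete(rows)        # the row with None is dropped
```

Listwise deletion is safe only when data are missing at random and the dataset is large; mean imputation keeps every row but shrinks the column's variance.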

Continue reading

Multicollinearity

Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related. In other words, one predictor variable can be used to predict the other. This situation poses various problems for the model, the most notable being that it makes it…
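A standard diagnostic is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 − R²). The sketch below hand-rolls this with NumPy on synthetic data where one column is nearly a copy of another; the data and function name are illustrative.

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for column j: regress X[:, j] on the rest."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])   # add an intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)  # nearly a linear copy of x1
x3 = rng.normal(size=100)                   # independent predictor
X = np.column_stack([x1, x2, x3])

vifs = [vif(X, j) for j in range(3)]
```

A common rule of thumb treats VIF above 5–10 as problematic; here the two near-duplicate columns show very large VIFs while the independent column stays near 1.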

Continue reading