Conducted ANOVA and logistic regression analyses in Python, using Matplotlib to visualize the results.

Last update: Feb 06, 2022

Overview

Intro-to-Data-Science

This repository contains two projects: a one-way ANOVA and a logistic regression analysis.

Project ANOVA

The main aim of this project is to perform a one-way ANOVA in Python on the given data set (values observed at various levels of education). We build a model that outputs a summary and an ANOVA table. We state the hypotheses for the data and calculate the F-statistic, from which the p-value is obtained. If the p-value is less than the significance level, we reject the null hypothesis that all group means are equal and conclude that at least one group mean differs, so the observed difference in means is unlikely to be due to sampling variability alone. After the hypothesis test, we run multiple pairwise comparisons between groups using t-tests to determine which means differ. In conclusion, we determine whether the mean is the same across all levels of education or, if not, which levels have different means.
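The calculation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's actual code: the group names and values are made-up toy data, and the helper names `one_way_anova_f` and `pooled_t_statistic` are hypothetical. The sketch computes the F-statistic from the between-group and within-group sums of squares, then a pooled two-sample t-statistic for each pair of groups.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def pooled_t_statistic(a, b):
    """Two-sample t-statistic assuming equal variances in both groups."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


# Toy data (assumption): one small sample per education level.
samples = {
    "high_school": [1.0, 2.0, 3.0],
    "bachelor": [2.0, 3.0, 4.0],
    "master": [5.0, 6.0, 7.0],
}

f_stat, df_b, df_w = one_way_anova_f(list(samples.values()))
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")

# Pairwise comparisons, run after the overall test.
pairwise_t = {
    (name_a, name_b): pooled_t_statistic(samples[name_a], samples[name_b])
    for name_a, name_b in combinations(samples, 2)
}
for pair, t_stat in pairwise_t.items():
    print(pair, f"t = {t_stat:.3f}")
```

The p-value is the upper-tail probability of the F distribution with `df_b` and `df_w` degrees of freedom. With SciPy installed, `scipy.stats.f.sf(f_stat, df_b, df_w)` would give it, and `scipy.stats.f_oneway` performs the whole test in one call.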

Project Logistic regression analysis

The main aim of this project is to perform a logistic regression analysis on a data set that labels each e-mail as spam or not spam. The data set contains 20 features used to predict that label. Before fitting the logistic regression, we perform feature elimination so that only significant features are used in the model. After fitting the model, we evaluate it at various probability thresholds and check the sensitivity and specificity at each one.

Our goal is therefore to find the threshold at which the true positive rate (sensitivity) is close to 1, giving a classification model that reliably separates spam e-mails from ham.
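The threshold sweep can be sketched as follows. This is a minimal illustration, not the project's code: the labels and predicted probabilities are toy values (in practice they would come from the fitted model, for example scikit-learn's `predict_proba`), and the helper names `sensitivity_specificity` and `pick_threshold` are hypothetical.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Return (sensitivity, specificity) when predicting spam if prob >= threshold."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


def pick_threshold(results, min_sensitivity):
    """Largest threshold whose sensitivity meets the target, or None."""
    eligible = [t for t, (sens, _) in results.items() if sens >= min_sensitivity]
    return max(eligible) if eligible else None


# Toy data (assumption): 1 = spam, 0 = ham, with made-up predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.95, 0.80, 0.60, 0.35, 0.40, 0.30, 0.20, 0.05]

thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]
results = {t: sensitivity_specificity(y_true, y_prob, t) for t in thresholds}

for t, (sens, spec) in results.items():
    print(f"threshold={t:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")

best = pick_threshold(results, min_sensitivity=1.0)
print("chosen threshold:", best)
```

Taking the largest threshold that still meets the sensitivity target keeps specificity as high as possible among the eligible thresholds, since raising the threshold can only reduce the number of false positives.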

Outline

Abstract
Theory
Exploratory Data Analysis
Analysis Results & Explanation
Conclusion

Owner

Chris Yuan
