A data preprocessing and feature engineering script for a machine learning pipeline

Last update: Apr 21, 2022

Related tags

Machine Learning Feature-Engineering

Overview

Feature-Engineering

A data preprocessing and feature engineering script needs to be prepared for a machine learning pipeline.

Once the dataset has been passed through this script, it is expected to be ready for modeling.
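A minimal sketch of what such a script could look like, assuming the raw data arrives as a list of CSV rows (one dict of strings per passenger). The function names, the median imputation for `Age`, the `'S'` default for a missing `Embarked`, and the `0.0` default for a missing `Fare` are illustrative assumptions, not the author's implementation. Plain Python is used instead of pandas so the sketch has no dependencies.

```python
import statistics
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]
PORTS = ("C", "Q", "S")  # Cherbourg, Queenstown, Southampton


def impute_age(rows: List[Row]) -> List[float]:
    """Fill missing Age values with the median of the observed ages (assumed strategy)."""
    observed = [float(r["Age"]) for r in rows if r.get("Age")]
    median_age = statistics.median(observed)
    return [float(r["Age"]) if r.get("Age") else median_age for r in rows]


def one_hot_embarked(port: Optional[str]) -> List[float]:
    """One-hot encode the port of embarkation; a missing port defaults to 'S' (assumption)."""
    port = port or "S"
    return [1.0 if port == p else 0.0 for p in PORTS]


def preprocess(rows: List[Row]) -> Tuple[List[List[float]], List[int]]:
    """Turn raw passenger rows into a numeric feature matrix X and a target vector y."""
    X: List[List[float]] = []
    y: List[int] = []
    for row, age in zip(rows, impute_age(rows)):
        features = [
            float(row["Pclass"]),
            1.0 if row["Sex"] == "male" else 0.0,  # binary-encode Sex
            age,
            float(row["SibSp"]),
            float(row["Parch"]),
            float(row["Fare"] or 0.0),  # missing Fare -> 0.0 (assumption)
        ]
        X.append(features + one_hot_embarked(row.get("Embarked")))
        y.append(int(row["Survived"]))
    return X, y


# Usage on three hypothetical rows (the second has no Age, the third no Embarked).
sample_rows: List[Row] = [
    {"Survived": "0", "Pclass": "3", "Sex": "male", "Age": "22",
     "SibSp": "1", "Parch": "0", "Fare": "7.25", "Embarked": "S"},
    {"Survived": "1", "Pclass": "1", "Sex": "female", "Age": "",
     "SibSp": "1", "Parch": "0", "Fare": "71.2833", "Embarked": "C"},
    {"Survived": "1", "Pclass": "3", "Sex": "female", "Age": "26",
     "SibSp": "0", "Parch": "0", "Fare": "7.925", "Embarked": ""},
]
X, y = preprocess(sample_rows)
```

The imputation statistic is computed over the whole batch passed in; in a real train/test setup it would normally be computed on the training data only and reused for new data.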

Dataset Story

The dataset contains records of the passengers aboard the Titanic when it sank. It consists of 891 observations and 12 variables. The target variable is "Survived": 1 indicates that the passenger survived, 0 that they did not.
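Before engineering features, it helps to confirm the shape of the data and how balanced the target is. The sketch below parses CSV text with the standard `csv` module and computes the share of each `Survived` class. The helper names `load_rows` and `target_distribution` and the four sample rows are illustrative assumptions in the same 12-column layout as the variable list; a real run would read the full CSV file instead.

```python
import csv
import io
from collections import Counter
from typing import Dict, List


def load_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header line into one dict per passenger."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def target_distribution(rows: List[Dict[str, str]], target: str = "Survived") -> Dict[int, float]:
    """Return the share of each target class (0 = did not survive, 1 = survived)."""
    counts = Counter(int(row[target]) for row in rows)
    total = sum(counts.values())
    return {label: counts[label] / total for label in (0, 1)}


# Illustrative sample rows; quoted Name fields may contain commas.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
"""

rows = load_rows(SAMPLE_CSV)
n_observations, n_variables = len(rows), len(rows[0])
balance = target_distribution(rows)  # {0: 0.25, 1: 0.75} for this sample
```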

Variables

PassengerId: ID of the passenger
Survived: Survival status (0: not survived, 1: survived)
Pclass: Ticket class (1: 1st class (upper), 2: 2nd class (middle), 3: 3rd class (lower))
Name: Name of the passenger
Sex: Gender of the passenger (male, female)
Age: Age in years
SibSp: Number of siblings/spouses aboard the Titanic
- Sibling = Brother, sister, stepbrother, stepsister
- Spouse = Husband, wife (mistresses and fiancés were ignored)
Parch: Number of parents/children aboard the Titanic
- Parent = Mother, father
- Child = Daughter, son, stepdaughter, stepson
- Some children travelled only with a nanny, therefore Parch = 0 for them.
Ticket: Ticket number
Fare: Passenger fare
Cabin: Cabin number
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
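The variable definitions above suggest a few derived features. The sketch below shows choices that are common for this dataset but are assumed here, not taken from the original script: `FamilySize` from `SibSp` and `Parch`, an `IsAlone` flag, the honorific `Title` parsed from `Name`, and cabin-based `HasCabin` and `Deck` columns (treating the cabin's first letter as the deck is also an assumption).

```python
from typing import Dict, Union

Feature = Union[int, str]


def extract_title(name: str) -> str:
    """Return the honorific between the comma and the next period,
    e.g. 'Braund, Mr. Owen Harris' -> 'Mr'. Falls back to 'Unknown'."""
    if "," not in name:
        return "Unknown"
    after_comma = name.split(",", 1)[1]
    if "." not in after_comma:
        return "Unknown"
    return after_comma.split(".", 1)[0].strip()


def engineer_features(row: Dict[str, str]) -> Dict[str, Feature]:
    """Derive illustrative features from the raw Titanic variables."""
    family_size = int(row["SibSp"]) + int(row["Parch"]) + 1  # relatives aboard + the passenger
    cabin = (row.get("Cabin") or "").strip()
    return {
        "FamilySize": family_size,
        "IsAlone": int(family_size == 1),
        "Title": extract_title(row["Name"]),
        "HasCabin": int(bool(cabin)),
        "Deck": cabin[0] if cabin else "Unknown",  # first letter of the cabin number
    }


# Usage on sample rows (only the columns the function reads are filled in).
braund = engineer_features({"Name": "Braund, Mr. Owen Harris",
                            "SibSp": "1", "Parch": "0", "Cabin": ""})
heikkinen = engineer_features({"Name": "Heikkinen, Miss. Laina",
                               "SibSp": "0", "Parch": "0", "Cabin": ""})
cumings = engineer_features({"Name": "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
                             "SibSp": "1", "Parch": "0", "Cabin": "C85"})
```

Because some children travelled only with a nanny (Parch = 0), `IsAlone` marks them as alone; the flag is better read as "no relatives aboard" than as "no companions".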

REFERENCE: Data Science and ML Boot Camp, 2021, Veri Bilimi Okulu (https://www.veribilimiokulu.com/)

Owner

kemalgunay

A single Python file with some tools for visualizing machine learning in the terminal.

PLUR is a collection of source code datasets suitable for graph-based machine learning.

Primitives for machine learning and data science.

This is my implementation of the K-nearest neighbors algorithm from scratch using Python

A simple application that calculates the probability distribution of a normal distribution

CobraML: a completely customizable Python ML library designed to give the end user full control

Magenta: Music and Art Generation with Machine Intelligence

A penguin species predictor app, built with Python's scikit-learn, FastAPI, NumPy and joblib packages, that classifies penguin species.

OptaPy is an AI constraint solver for Python to optimize planning and scheduling problems.

Automatically create Faiss knn indices with the most optimal similarity search parameters.

Probabilistic time series modeling in Python

A toolkit for making real world machine learning and data analysis applications in C++

A quick reference guide to the most commonly used patterns and functions in PySpark SQL

Reproducibility and Replicability of Web Measurement Studies

Real-time domain adaptation for semantic segmentation

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

Graphsignal is a machine learning model monitoring platform.

Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale.

Kalman filter library

This repository demonstrates the usage of hover to understand and supervise a machine learning task.