Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

Overview

Retrainable-Faulty-Wafer-Detector

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced) and then train the model on this data so that it can continuously update itself with the environment and become more generalized with time. In this regard, a training and prediction dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values then they are considered good files on which we can train or predict our model, however, if it didn't match then they are moved to the bad folder. Moreover, we also identify the columns with null values. If all the data in a column is missing then the file is also moved to the bad folder. On the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and considered it as good data.

Data Insertion in Database:

First, open a connection to the database if it exists otherwise create a new one. A table with the name train_good_raw_dt or pred_good_raw_dt is created in the database, based on the training or prediction process, for inserting the good data files obtained from the data validation step. If the table is already present then new files are inserted in that table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file, to be used for the model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The columns with zero standard deviation were also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, Random Forest, K-Neighbours, Logistic Regression and XGBoost. For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for all the models and select the one with the best score. Similarly, the best model is selected for each cluster. For every cluster, the models are saved so that they can be used in future predictions. In the end, the confusion matrix of the model associated with every cluster is also saved to give the a glance over the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Retraining:

After the prediction, the prediction data is merged with the previous training dataset and then the models were retrained on this data using the hyperparameter values obtained from the GridSearch. The cycle repeats with every prediction it does and learns from the newly acquired data, making it more robust.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Patch PL to disable LK verification. Patch LK to disable boot/recovery verification.

Simple Python(3) script to disable LK verification in Amazon Preloader images and boot/recovery image verification in Amazon LK ("Little Kernel") images.

Roger Ortiz 18 Mar 17, 2022
Data on Free Food at MIT

MIT Free Food Timing Procrastinating research by plotting data on how long it takes emails on the free-food at mit edu mailing list to go through. Dat

Peter Sharpe 2 Nov 01, 2021
Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.

Retrying Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just

Ray Holder 1.9k Dec 29, 2022
Cross-platform MachO/ObjC Static binary analysis tool & library. class-dump + otool + lipo + more

ktool Static Mach-O binary metadata analysis tool / information dumper pip3 install k2l Development is currently taking place on the @python3.10 branc

Kritanta 301 Dec 28, 2022
"Hacking" the (Telekom) Zyxel GPON SFP module (PMG3000-D20B)

"Hacking" the (Telekom) Zyxel GPON SFP module (PMG3000-D20B) The SFP can be sour

Matthias Riegler 52 Jan 03, 2023
A general-purpose wallet generator, for supported coins only

2gen A general-purpose generator for keys. Designed for all cryptocurrencies supporting the Bitcoin format of keys and addresses. Functions To enable

Vlad Usatii 1 Jan 12, 2022
TinyBar - Tiny MacOS menu bar utility to track price dynamics for assets on TinyMan.org

📃 About A simple MacOS menu bar app to display current coins from most popular

Al 8 Dec 23, 2022
Software that extracts spreadsheets from various .pdf files to .csv

Extração de planilhas de diversos arquivos .pdf para .csv O código inteiro foi desenvolvido em Python. Foi utilizado o pacote "tabula" e a biblioteca

Marcos Silva 2 Jan 09, 2022
A Python application that helps users determine their calorie intake, and automatically generates customized weekly meal and workout plans based on metrics computed using their physical parameters

A Python application that helps users determine their calorie intake, and automatically generates customized weekly meal and workout plans based on metrics computed using their physical parameters

Anam Iqbal 1 Jan 13, 2022
Turn a raspberry pi into a Bluetooth Midi device

PiBluetoothMidSetup This will change serveral system wide packages/configurations Do not run this on your primary machine or anything you don't know h

MyLab6 40 Sep 19, 2022
WordPress-style shortcodes for Python

Python Shortcodes WordPress-style shortcodes for Python Create and use WordPress-style shortcodes in your Python based app. Example # static output de

Bob 1 Dec 22, 2021
Moji sends text and fun facts from different APIs wit da use of a notification deamon

Moji sends text and fun facts from different APIs wit da use of a notification deamon. Can be runned via dmenu or rofi.

kshly 2 Jan 12, 2022
LeetComp - Background tasks powering the static content at LeetComp

LeetComp Analysing compensations mentioned on the Leetcode forums (https://kuuts

Kumar Utsav 125 Dec 21, 2022
Fast STL (ASCII & Binary) importer for Blender

blender-fast-stl-importer Fast STL (ASCII & Binary) importer for Blender based on https://en.wikipedia.org/wiki/STL_(file_format) Technical notes: flo

Iyad Ahmed 7 Apr 17, 2022
A compilation of useful scripts to automate common tasks

Scripts-To-Automate-This A compilation of useful scripts for common tasks Name What it does Type Add file extensions Adds ".png" to a list of file nam

0 Nov 05, 2021
This library attempts to abstract the handling of Sigma rules in Python

This library attempts to abstract the handling of Sigma rules in Python. The rules are parsed using a schema defined with pydantic, and can be easily loaded from YAML files into a structured Python o

Caleb Stewart 44 Oct 29, 2022
A framework that let's you compose websites in Python with ease!

Perry Perry = A framework that let's you compose websites in Python with ease! Perry works similar to Qt and Flutter, allowing you to create componen

Linkus 13 Oct 09, 2022
All kinds of programs are accepted here, raise a genuine PR, and claim a PR, Make 4 successful PR's and get the Stickers and T-Shirt from hacktoberfest 2021

this repository is excluded from hacktoberfest Hacktoberfest-2021 This repository aims to help code beginners with their first successful pull request

34 Sep 11, 2022
Created a Python Keylogger script.

Python Script Simple Keylogger Script WHAT IS IT? Created a Python Keylogger script. HOW IT WORKS Once the script has been executed, it will automatic

AC 0 Dec 12, 2021
A general purpose low level programming language written in Python.

A general purpose low level programming language written in Python. Basal is an easy mid level programming language compiling to C. It has an easy syntax, similar to Python, Rust etc.

Snm Logic 6 Mar 30, 2022