A naive Bayes model for cancer classification using a set of documents

Last update: Nov 24, 2021

Related tags

Machine Learning naivebayes

Overview

Naive Bayes text classification model for cancer and non-cancer documents

Author: Alex King

Purpose
Requirements/files used
How to use

1. Purpose

The purpose of this program is to read CSV files containing two columns:

                    Document | classification
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer
                    xxxxxx   | cancer/nocancer

The program reads the data into classes that hold each document. One file is used as the training set and the other as the testing set. Both sets go through the same tokenization. The model is then trained on the training set and evaluated on the testing set.
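The train-then-test flow described above can be sketched as follows. This is a minimal illustration, not the code in naivesbayes.py: the function names `train` and `classify`, the add-one (Laplace) smoothing, and the toy documents are all assumptions, and `math.log` stands in for the numpy logarithm the project uses.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit a multinomial naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # token frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, tokens):
    """Return the class with the highest log posterior for the tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        # Add-one smoothing (an assumption): every vocabulary word gets +1.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data in the same Document | classification shape as the CSV files.
training = [
    ("tumor biopsy malignant".split(), "cancer"),
    ("chemotherapy tumor growth".split(), "cancer"),
    ("weather sunny picnic".split(), "nocancer"),
    ("football match sunny".split(), "nocancer"),
]
model = train(training)
print(classify(model, "malignant tumor".split()))  # cancer
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together, which is presumably why the project needs a logarithm function.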

2. Requirements/files used

* python3
* numpy library - for calculating logarithms
* pandas library - for reading in csv files
* main.py and naivesbayes.py
* stopwords.txt - list of stop words
* Scoring.docx - list of scores for precision, recall, and F-score
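For reference, the three metrics reported in Scoring.docx can be computed as in the sketch below. It assumes that "F-score" means the balanced F1 score and that `cancer` is treated as the positive class; the function name `score` and the sample labels are hypothetical and do not come from the project.

```python
def score(gold, predicted, positive="cancer"):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


gold = ["cancer", "cancer", "nocancer", "nocancer"]
pred = ["cancer", "nocancer", "cancer", "nocancer"]
print(score(gold, pred))  # (0.5, 0.5, 0.5)
```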

3. How to use

This program has three modes of operation for tokenizing your sets: standard, binary, and stopwords. Standard tokenization is the default:

                $python3 main.py -train 1 -test 1

This first command runs standard tokenization on training set 1 and test set 1. To use the other training set, change the 1 after -train to a 2:

                $python3 main.py -train 2 -test 1

NOTE: Do not change the testing set number; leave it as 1. The option exists because the program was intended to support multiple testing sets.
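A command line of this shape could be parsed as in the following sketch. It is an assumption about how main.py might handle its arguments, not a copy of it. The flag names `-train`, `-test`, `-b`, and `-s` come from the commands in this section; the help strings, the default value, and the `build_parser` name are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-train", type=int, required=True,
                        help="number of the training set to use")
    parser.add_argument("-test", type=int, default=1,
                        help="number of the testing set (leave as 1)")
    parser.add_argument("-b", action="store_true",
                        help="use binary tokenization")
    parser.add_argument("-s", action="store_true",
                        help="remove the words listed in stopwords.txt")
    return parser


# Equivalent to: python3 main.py -train 2 -test 1 -b
args = build_parser().parse_args(["-train", "2", "-test", "1", "-b"])
print(args.train, args.test, args.b, args.s)  # 2 1 True False
```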

For binary (replace # with the training set number):

                $python3 main.py -train # -test 1 -b

For stopwords:

                $python3 main.py -train # -test 1 -s

For both stopwords and binary:

                $python3 main.py -train # -test 1 -b -s
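The effect of the two switches on a single document can be illustrated with the sketch below. The exact rules in the project are not documented here, so this makes three assumptions: standard tokenization lowercases the text and splits on non-alphanumeric characters, `-b` keeps each word at most once per document, and `-s` drops any word found in stopwords.txt. The `tokenize` function and the small stop word set are hypothetical.

```python
import re


def tokenize(text, binary=False, stopwords=None):
    """Split text into lowercase tokens, optionally filtered or deduplicated."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if stopwords:
        tokens = [t for t in tokens if t not in stopwords]
    if binary:
        # Keep the first occurrence of each token, preserving order.
        tokens = list(dict.fromkeys(tokens))
    return tokens


stop = {"the", "of", "a"}  # stand-in for the contents of stopwords.txt
doc = "The growth of the tumor: tumor cells"

print(tokenize(doc))
# ['the', 'growth', 'of', 'the', 'tumor', 'tumor', 'cells']
print(tokenize(doc, binary=True))
# ['the', 'growth', 'of', 'tumor', 'cells']
print(tokenize(doc, stopwords=stop))
# ['growth', 'tumor', 'tumor', 'cells']
print(tokenize(doc, binary=True, stopwords=stop))
# ['growth', 'tumor', 'cells']
```

Applying the same function with the same switches to both the training and testing sets keeps the two vocabularies comparable.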


Owner

Alex W King
