Sentiment Analysis Project

This project contains two sentiment analysis programs for hotel reviews, both using a Hotel Reviews dataset from Datafiniti. The models are trained on features built with Count Vectorizer (in the countvectorizer.py program) or TF-IDF Vectorizer (in the tdidf.py program). Run the two programs separately to compare the implementation and accuracy of each vectorizer. TF-IDF Vectorizer is usually considered the more accurate of the two because it down-weights words that appear in most reviews, but the result depends on the dataset and the model.
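As a rough illustration of how the two vectorizers differ, the sketch below runs both on three made-up reviews. The review texts are hypothetical and are not taken from the Datafiniti dataset; both vectorizers are used with their scikit-learn defaults, which may not match the settings in countvectorizer.py or tdidf.py.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy reviews, not taken from the Datafiniti dataset.
reviews = [
    "the room was clean and the staff was friendly",
    "the room was dirty",
    "friendly staff and great location",
]

# CountVectorizer stores raw term counts per review.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(reviews)

# TfidfVectorizer rescales counts so that terms found in fewer reviews weigh more.
tfidf_vec = TfidfVectorizer()
weights = tfidf_vec.fit_transform(reviews)

# Compare two words in the first review: "room" occurs in two reviews,
# "clean" occurs in only one.
room_c = counts[0, count_vec.vocabulary_["room"]]
clean_c = counts[0, count_vec.vocabulary_["clean"]]
room_w = weights[0, tfidf_vec.vocabulary_["room"]]
clean_w = weights[0, tfidf_vec.vocabulary_["clean"]]

print("counts  room/clean:", room_c, clean_c)
print("tf-idf  room/clean:", round(room_w, 3), round(clean_w, 3))
```

Both words occur once in the first review, so their counts are equal, while TF-IDF gives the rarer word "clean" a larger weight than "room".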

System Requirements

Install the required packages with pip (pip install pandas numpy scikit-learn). The programs use the following imports:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier

Usage (description of actions performed)

1. dataset imported
2. null values deleted
3. 30% representative sample taken to avoid slowing down the system
4. sentiments column added
5. input training features and labels defined
6. dataset split into training and testing sets
7. text data vectorized (using CountVectorizer or TF-IDF Vectorizer)
8. models trained:
 -  Logistic Regression (linear classification)
 -  Support Vector Machine (linear or non-linear data separated into classes by a line/hyperplane)
 -  K Nearest Neighbor (local approximation)
9. Accuracy Scores, Confusion Matrix, and True Positive and True Negative Rates printed for all three models
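The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from countvectorizer.py or tdidf.py: the in-memory DataFrame stands in for the Datafiniti file, the column names text and rating, the "rating of 4 or more is positive" rule, the 25% test split, and the default model settings are all hypothetical choices made for the example.

```python
import pandas as pd
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Stand-in for the imported dataset (hypothetical columns and values).
positive = ["great clean room and friendly staff", "lovely stay, excellent breakfast"]
negative = ["dirty room and rude staff", "terrible stay, awful noisy room"]
rows = [(t, 5) for t in positive] * 50 + [(t, 1) for t in negative] * 50
rows.append((None, 3))  # one row with a missing review text
df = pd.DataFrame(rows, columns=["text", "rating"])

# 2. Delete null values.
df = df.dropna()

# 3. Take a 30% random sample to keep the run fast.
sample = df.sample(frac=0.3, random_state=42)

# 4. Add a sentiments column (assumed rule: rating >= 4 is positive).
sample = sample.assign(sentiment=(sample["rating"] >= 4).astype(int))

# 5. Define input features and labels.
X, y = sample["text"], sample["sentiment"]

# 6. Split into training and testing sets (assumed 75/25 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 7. Vectorize the text; swap in CountVectorizer for the other variant.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# 8. Train the three models with (assumed) default-style settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": svm.SVC(),
    "K Nearest Neighbor": KNeighborsClassifier(),
}

# 9. Print accuracy, confusion matrix, and true positive/negative rates.
results = {}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    pred = model.predict(X_test_vec)
    cm = confusion_matrix(y_test, pred, labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    tpr = tp / (tp + fn)  # true positive rate
    tnr = tn / (tn + fp)  # true negative rate
    acc = accuracy_score(y_test, pred)
    results[name] = (acc, tpr, tnr)
    print(name, "accuracy:", acc, "TPR:", tpr, "TNR:", tnr)
    print(cm)
```

The toy data is highly repetitive, so the scores it prints say nothing about real hotel reviews; the point is only the order of the steps. The vectorizer is fitted on the training set alone and then applied to the test set, so that no vocabulary statistics leak from the test data.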

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Owner

Simran Farrukh
