Sentiment-Analysis and EDA on the IMDB Movie Review Dataset

Last update: Jan 12, 2022

Overview

The main part of the work explores and studies different approaches used for Sentiment Analysis (e.g., Bag of Words, TF-IDF, Word Embeddings). In addition, the work applies and compares different classification algorithms for Sentiment Analysis tasks in Natural Language Processing (e.g., tree-based algorithms, linear models and Support Vector Machines).
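To make the difference between the Bag of Words and TF-IDF representations concrete, here is a minimal pure-Python sketch. It is not the project's implementation. The tokenizer, the function names `tokenize`, `bag_of_words` and `tf_idf`, and the smoothed IDF formula idf(t) = ln((1 + N) / (1 + df(t))) + 1 followed by L2 row normalisation (the defaults of scikit-learn's `TfidfVectorizer`) are all assumptions chosen for illustration. Word embeddings are not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and extract word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, count vectors) with one raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(docs):
    """Re-weight bag-of-words counts by smoothed IDF, then L2-normalise each row."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(counts)
    # Document frequency: how many documents contain each vocabulary term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, so rarer terms get larger weights.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    rows = []
    for vec in counts:
        row = [c * w for c, w in zip(vec, idf)]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid dividing by zero
        rows.append([x / norm for x in row])
    return vocab, rows


reviews = ["A great, great movie", "A boring movie", "Great acting"]
vocab, counts = bag_of_words(reviews)
print(vocab)      # ['a', 'acting', 'boring', 'great', 'movie']
print(counts[0])  # [1, 0, 0, 2, 1]
_, weights = tf_idf(reviews)
# In the second review, 'boring' (in 1 of 3 reviews) outweighs 'a' and 'movie' (2 of 3 each).
print(weights[1])
```

Bag of Words keeps raw counts, so frequent but uninformative words dominate. TF-IDF down-weights terms that occur in many documents, which is why it is often a stronger baseline for linear classifiers.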

Author: Nikolas Petrou, MSc in Data Science

Technical Report and Code Availability

The complete text and analysis of the work are available in the EDA-and-Sentiment-Analysis-on IMDB-Dataset.pdf file.
The implementation and code of the project are located in the Implementation-Python Files folder.

Dataset

For this work, a large dataset of movie reviews was used: specifically, the publicly available Internet Movie Database (IMDB) review dataset.

The data can be obtained from Kaggle or directly from Stanford.
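The Stanford release (the Large Movie Review Dataset) extracts to an `aclImdb/` folder with `train/` and `test/` splits, each containing `pos/` and `neg/` subfolders that hold one review per `.txt` file. A minimal loader sketch, assuming that layout and UTF-8 files; the function name `load_split` and the 1/0 label encoding are illustrative choices, not taken from the project's code.

```python
from pathlib import Path


def load_split(root, split="train"):
    """Read (review_text, label) pairs from <root>/<split>/pos and <root>/<split>/neg.

    Label 1 means positive and 0 means negative. Other sub-folders, such as the
    unlabelled train/unsup folder, are skipped.
    """
    samples = []
    for folder_name, label in (("pos", 1), ("neg", 0)):
        folder = Path(root) / split / folder_name
        # sorted() makes the file order deterministic across file systems.
        for path in sorted(folder.glob("*.txt")):
            samples.append((path.read_text(encoding="utf-8"), label))
    return samples
```

For example, `train = load_split("aclImdb", "train")` after extracting the archive. With the standard release, each split should yield 25,000 labelled reviews, half positive and half negative.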

Methodology

An abstract scheme of the methodology is illustrated in the following figure.

In summary, the initial questions were first defined with respect to the dataset. Next, data scraping and data collection were performed. After the data preprocessing steps, various data analytics techniques were employed to better understand the insights in the data. Finally, in the final analysis stage, different methodologies and models were used to classify the textual data by sentiment. It is important to note that the whole process followed a cyclical (iterative) scheme.
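The preprocessing and classification stages described above can be sketched end to end in pure Python. This is a toy illustration under stated assumptions, not the report's pipeline: the cleaning rules (removing the `<br />` tags that appear in IMDB review text, lowercasing, extracting word tokens), the binary bag-of-words features, the choice of logistic regression trained with stochastic gradient descent as the linear model, the hyperparameters, and the four training sentences are all illustrative.

```python
import math
import random
import re


def preprocess(text):
    """Replace HTML line breaks, lowercase, and split into word tokens."""
    text = re.sub(r"<br\s*/?>", " ", text, flags=re.IGNORECASE)
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def featurize(text):
    """Binary bag-of-words: every distinct token gets the value 1.0."""
    return {tok: 1.0 for tok in set(preprocess(text))}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, x):
    """Linear score w . x + b over a sparse feature dict."""
    weights, bias = model
    return bias + sum(weights.get(f, 0.0) * v for f, v in x.items())


def train_logreg(samples, epochs=20, lr=0.1, seed=0):
    """Fit logistic regression by SGD on (text, label) pairs with label in {0, 1}."""
    data = [(featurize(text), label) for text, label in samples]
    weights, bias = {}, 0.0
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Gradient of the log-loss with respect to the score is (p - y).
            grad = sigmoid(score((weights, bias), x)) - y
            bias -= lr * grad
            for f, v in x.items():
                weights[f] = weights.get(f, 0.0) - lr * grad * v
    return weights, bias


def predict(model, text):
    """Return 1 (positive) if the score is above zero, else 0 (negative)."""
    return 1 if score(model, featurize(text)) > 0 else 0


def accuracy(model, samples):
    """Fraction of (text, label) pairs classified correctly."""
    return sum(predict(model, t) == y for t, y in samples) / len(samples)


# Toy data made up for illustration (not taken from the IMDB dataset).
train = [
    ("A wonderful film.<br /><br />Great acting", 1),
    ("Great story and wonderful music", 1),
    ("Boring plot, terrible acting", 0),
    ("A terrible, boring waste of time", 0),
]
model = train_logreg(train)
print(predict(model, "Wonderful! Great!"))  # 1
print(predict(model, "Terrible. Boring."))  # 0
```

In practice the same structure holds with real data: clean the text, build features, fit each candidate classifier on the training split, and compare `accuracy` (or another metric) on held-out reviews. Swapping the model for a tree-based method or an SVM changes only the training step.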
