In this project, we predict emojis from tweets and investigate the relationship between words and emojis.

Last update: Jan 17, 2022

Overview

Making Emojis More Predictable

By Karan Abrol, Karanjot Singh, and Pritish Wadhwa for Natural Language Processing (CSE546), under the guidance of Dr. Shad Akhtar, Indraprastha Institute of Information Technology, Delhi.

Introduction

The advent of social media platforms such as WhatsApp, Facebook (Meta), and Twitter has changed natural language conversations forever. Emojis are small ideograms depicting objects, people, and scenes (Cappallo et al., 2015). They complement short text messages with a visual enhancement and have become a de facto standard for online communication. Our aim is to predict the single emoji that appears in an input tweet.

Beyond prediction, we also investigate the relationship between words and emojis.

Project Pipeline Summary

We started by collecting the data, which we then studied thoroughly and preprocessed. Key features were also extracted at this stage. Due to computational restrictions, we took a subset of the data and divided it into training, testing, and validation splits such that the distribution of every class was the same across all sets. We then applied various machine learning and deep learning models to the dataset and generated and analysed the results.
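The preprocessing step is not spelled out here, so the following is only a minimal sketch of what tweet normalization commonly looks like, not the project's actual code. The function names `normalize_tweet` and `tokenize`, the `<url>` and `<user>` placeholder tokens, and the choice to lowercase are all assumptions made for illustration.

```python
import re

# Assumed cleaning rules; the project's real preprocessing may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def normalize_tweet(text: str) -> str:
    """Lowercase a tweet and replace URLs and user mentions with placeholders."""
    text = text.lower()
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    # Collapse runs of whitespace left over from the substitutions.
    return WHITESPACE_RE.sub(" ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split a tweet into placeholder, hashtag, and word tokens."""
    return re.findall(r"<url>|<user>|#?\w+", normalize_tweet(text))
```

Replacing URLs and mentions with shared placeholders keeps the vocabulary small, since individual links and usernames rarely carry signal about which emoji was used.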

Deployment

Emoji Prediction Website

Screenshots

Dataset

The data we used consists of tweets, each associated with a single emoji and labelled with one of 20 classes, one per emoji. 500,000 tweets by users in the United States, posted from October 2015 to January 2018, were retrieved using the Twitter API. The script for scraping this dataset was made available by the SemEval 2018 challenge. Due to computational limitations, we merged the test and trial data and divided the result into training, trial, and test sets with a 70:10:20 split. We maintained the label ratios for each emoji across the three sets to best reflect how frequently the emojis are used in real life.
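A 70:10:20 split that preserves label ratios can be sketched as below. This is an illustrative standard-library version under stated assumptions, not the project's own script: the function name `stratified_split`, the fixed seed, and the rounding of per-class sizes are hypothetical choices.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Return (train, trial, test) index lists that keep per-label proportions."""
    rng = random.Random(seed)

    # Group example indices by their emoji label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, trial, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_trial = round(fractions[1] * len(indices))
        # Split each class separately so every set sees the same label ratios.
        train.extend(indices[:n_train])
        trial.extend(indices[n_train:n_train + n_trial])
        test.extend(indices[n_train + n_trial:])
    return train, trial, test
```

Because each class is split on its own, a frequent emoji stays frequent and a rare one stays rare in all three sets, up to rounding.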

Models

Machine Learning Models:
- Logistic Regression
- K-Nearest Neighbours
- Stochastic Gradient Descent
- Random Forest Classifier
- Naive Bayes
- Adaboost Classifier
- Support Vector Machine

Deep Learning Models:
- RNN
- LSTM
- BiLSTM
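
As a rough illustration of one of the classical baselines listed above, here is a minimal multinomial Naive Bayes classifier over bag-of-words counts with Laplace smoothing. It is a from-scratch sketch, not the implementation used in the project; the class name `NaiveBayesEmojiClassifier`, the whitespace tokenization, the toy tweets, and the readable label names are all made up for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesEmojiClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of tweets
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for illustration only.
toy_texts = [
    "i love you so much",
    "love my family",
    "this is so funny lol",
    "lol that joke was funny",
]
toy_labels = ["red_heart", "red_heart", "tears_of_joy", "tears_of_joy"]

model = NaiveBayesEmojiClassifier().fit(toy_texts, toy_labels)
print(model.predict("love you"))      # red_heart
print(model.predict("so funny lol"))  # tears_of_joy
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and add-one smoothing keeps a word unseen for one emoji from zeroing out that emoji's score.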

Contact

For further queries, feel free to reach out to the following contributors.
Karan Abrol ([email protected])
Karanjot Singh ([email protected])
Pritish Wadhwa ([email protected])
