pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

Last update: Dec 29, 2022

A Transformer-based library for SocialNLP classification tasks.

Currently supports:

Sentiment Analysis (Spanish, English)
Emotion Analysis (Spanish, English)

Just run pip install pysentimiento and start using it:

from pysentimiento import SentimentAnalyzer
analyzer = SentimentAnalyzer(lang="es")

analyzer.predict("Qué gran jugador es Messi")
# returns SentimentOutput(output=POS, probas={POS: 0.998, NEG: 0.002, NEU: 0.000})
analyzer.predict("Esto es pésimo")
# returns SentimentOutput(output=NEG, probas={NEG: 0.999, POS: 0.001, NEU: 0.000})
analyzer.predict("Qué es esto?")
# returns SentimentOutput(output=NEU, probas={NEU: 0.993, NEG: 0.005, POS: 0.002})

analyzer.predict("jejeje no te creo mucho")
# returns SentimentOutput(output=NEG, probas={NEG: 0.587, NEU: 0.408, POS: 0.005})

# Emotion Analysis in English
from pysentimiento import EmotionAnalyzer

emotion_analyzer = EmotionAnalyzer(lang="en")

emotion_analyzer.predict("yayyy")
# returns EmotionOutput(output=joy, probas={joy: 0.723, others: 0.198, surprise: 0.038, disgust: 0.011, sadness: 0.011, fear: 0.010, anger: 0.009})
emotion_analyzer.predict("fuck off")
# returns EmotionOutput(output=anger, probas={anger: 0.798, surprise: 0.055, fear: 0.040, disgust: 0.036, joy: 0.028, others: 0.023, sadness: 0.019})

You can also use the pretrained models directly with the transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("finiteautomata/beto-sentiment-analysis")

model = AutoModelForSequenceClassification.from_pretrained("finiteautomata/beto-sentiment-analysis")
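The model loaded above returns raw logits rather than the SentimentOutput objects shown earlier. Below is a minimal sketch of turning logits into a predicted label plus a sorted probability dictionary, assuming a softmax over the classes. The helper logits_to_output is hypothetical (it is not part of pysentimiento or transformers), and the id2label mapping is an illustrative assumption; for a real model, read it from model.config.id2label.

```python
import math


def logits_to_output(logits, id2label):
    """Hypothetical helper: convert raw classifier logits into (label, probas).

    Applies a numerically stable softmax, then sorts the class probabilities
    from most to least probable.
    """
    # Subtract the largest logit before exponentiating to avoid overflow
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probas = {id2label[i]: e / total for i, e in enumerate(exps)}
    # Sort classes by probability, highest first
    probas = dict(sorted(probas.items(), key=lambda kv: kv[1], reverse=True))
    # The predicted label is the most probable class
    label = next(iter(probas))
    return label, probas


# Illustrative logits only; with a real model they could come from
# model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()
id2label = {0: "NEG", 1: "NEU", 2: "POS"}  # assumed mapping, check model.config.id2label
label, probas = logits_to_output([-2.0, 0.5, 3.0], id2label)
print(label, {k: round(v, 3) for k, v in probas.items()})
```

Taking the most probable class as the label matches the examples above, where output is always the class with the highest probability.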

Preprocessing

pysentimiento features a tweet preprocessor specially suited for tweet classification with transformer-based models.

from pysentimiento.preprocessing import preprocess_tweet

# Replaces user handles and URLs with special tokens
preprocess_tweet("@perezjotaeme debería cambiar esto http://bit.ly/sarasa") # "@usuario debería cambiar esto url"

# Shortens repeated characters
preprocess_tweet("no entiendo naaaaaaaadaaaaaaaa", shorten=2) # "no entiendo naadaa"

# Normalizes laughter
preprocess_tweet("jajajajaajjajaajajaja no lo puedo creer ajajaj") # "jaja no lo puedo creer jaja"

# Handles hashtags
preprocess_tweet("esto es #UnaGenialidad")
# "esto es una genialidad"

# Handles emojis
preprocess_tweet("🎉🎉", lang="en")
# 'emoji party popper emoji emoji party popper emoji'
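As a rough illustration of what these steps involve, here is a minimal regex-based sketch for Spanish text. The function simple_preprocess and its parameters are hypothetical: this is not the library's implementation, its rules are simplified (for example, laughter detection only covers words made of the letters j and a), its default shorten value is an arbitrary choice, and emoji handling is omitted.

```python
import re

USER_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
# Whole words of 4+ characters made only of the letters j/a (case-insensitive)
LAUGH_RE = re.compile(r"\b[ja]{4,}\b", re.IGNORECASE)
# CamelCase pieces, lowercase runs, or digit runs inside a hashtag
CAMEL_RE = re.compile(r"[A-ZÁÉÍÓÚÑ][a-záéíóúñ]*|[a-záéíóúñ]+|\d+")


def simple_preprocess(text, shorten=3, user_token="@usuario", url_token="url"):
    """Hypothetical regex sketch of tweet preprocessing (Spanish, no emoji handling)."""
    # 1. Replace user handles and URLs with placeholder tokens
    text = USER_RE.sub(user_token, text)
    text = URL_RE.sub(url_token, text)

    # 2. Collapse runs longer than `shorten` identical characters to exactly `shorten`
    repeat_re = re.compile(r"(.)\1{%d,}" % shorten)
    text = repeat_re.sub(lambda m: m.group(1) * shorten, text)

    # 3. Normalize laughter such as "jajajaj" or "ajajaj" to "jaja"
    def _laugh(m):
        word = m.group(0)
        return "jaja" if set(word.lower()) == {"j", "a"} else word

    text = LAUGH_RE.sub(_laugh, text)

    # 4. Split CamelCase hashtags into lowercase words
    def _hashtag(m):
        parts = CAMEL_RE.findall(m.group(1))
        return " ".join(p.lower() for p in parts) if parts else m.group(1)

    return HASHTAG_RE.sub(_hashtag, text)


print(simple_preprocess("@perezjotaeme debería cambiar esto http://bit.ly/sarasa"))
# "@usuario debería cambiar esto url"
print(simple_preprocess("no entiendo naaaaaaaadaaaaaaaa", shorten=2))
# "no entiendo naadaa"
print(simple_preprocess("jajajajaajjajaajajaja no lo puedo creer ajajaj"))
# "jaja no lo puedo creer jaja"
print(simple_preprocess("esto es #UnaGenialidad"))
# "esto es una genialidad"
```

Handles and URLs are replaced first so that the later steps never rewrite characters inside them.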

Trained models so far

Check CLASSIFIERS.md for details on the reported performance of each model.

Spanish models

English models

Instructions for developers

First, download the TASS 2020 data to data/tass2020 (registration is required to download the dataset).

Labels must be placed under data/tass2020/test1.1/labels
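A minimal sketch for checking that the data is where the training scripts expect it, assuming only the two locations mentioned above. The helper check_tass_layout is hypothetical and does not inspect individual files, since their names depend on the TASS 2020 release.

```python
from pathlib import Path


def check_tass_layout(root="data/tass2020"):
    """Hypothetical helper: return the expected TASS 2020 directories that are missing."""
    root = Path(root)
    # Only the dataset root and the labels directory named in this README are checked
    expected = [root, root / "test1.1" / "labels"]
    return [str(p) for p in expected if not p.is_dir()]


if __name__ == "__main__":
    missing = check_tass_layout()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("TASS 2020 layout looks OK")
```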

Run the script to train the models

Check TRAIN_EVALUATE.md

Upload models to Hugging Face's Model Hub

Follow the "Model sharing and upload" instructions in the Hugging Face docs.

License

pysentimiento is an open-source library. However, please note that the models are trained on third-party datasets and are subject to their respective licenses, many of which restrict use to non-commercial purposes:

TASS Dataset license (License for Sentiment Analysis in Spanish, Emotion Analysis in Spanish & English)
SemEval 2017 Dataset license (Sentiment Analysis in English)

Citation

If you use pysentimiento in your work, please cite this paper:

@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
      author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
      year={2021},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

TODO:

Upload some other models
Train in other languages

Suggestions and bugfixes

Please use the repository's issue tracker to report bugs and make suggestions (new models, other datasets, other languages, etc.).
