Data and code accompanying the paper Politics and Virality in the Time of Twitter

Last update: Jul 02, 2022

Overview

Politics and Virality in the Time of Twitter

Data and code accompanying the paper Politics and Virality in the Time of Twitter.

Specifically, this repository contains:

the code used to train our models (./code/finetune_models.py and ./code/finetune_multi_cv.py)
a Jupyter Notebook containing the major parts of our analysis (./code/analysis.ipynb)
the model that was selected and used for the sentiment analysis
the manually annotated data used for training (./data/annotation/)
the IDs of the tweets used in our analysis and control experiments (./data/main/ & ./data/control)
the names, parties and handles of the MPs that were tracked (./data/mps_list.csv)

Annotated Data (./data/annotation/)

One folder for each language (English, Spanish, Greek).
Each folder contains three files:
1. *_900.csv contains the 900 tweets that annotators labelled individually (300 tweets per annotator).
2. *_tiebreak_100.csv contains the initial 100 tweets that all annotators labelled. 'annotator_3' indicates the annotator that was used as a tiebreaker.
3. *_combined.csv contains all tweets labelled for the language.
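The layout above can be read with a short helper. This is a minimal sketch, not part of the repository's code: the function name `load_annotations` and the keys `individual`, `tiebreak` and `combined` are hypothetical, and it makes no assumption about the language folder names or the CSV column names beyond each file having a header row.

```python
import csv
from pathlib import Path


def load_annotations(root):
    """Return {language_folder: {file_kind: list of row dicts}}.

    Hypothetical helper: it only relies on the three file-name
    patterns described above, not on specific folder or column names.
    """
    patterns = {
        "individual": "*_900.csv",          # tweets labelled by one annotator each
        "tiebreak": "*_tiebreak_100.csv",   # tweets labelled by all annotators
        "combined": "*_combined.csv",       # all labelled tweets for the language
    }
    data = {}
    for lang_dir in sorted(Path(root).iterdir()):
        if not lang_dir.is_dir():
            continue
        data[lang_dir.name] = {}
        for kind, pattern in patterns.items():
            matches = sorted(lang_dir.glob(pattern))
            if not matches:
                continue  # skip silently if a file is missing
            with matches[0].open(newline="", encoding="utf-8") as fh:
                data[lang_dir.name][kind] = list(csv.DictReader(fh))
    return data


# Example call, assuming the repository has been cloned locally:
# annotations = load_annotations("./data/annotation/")
```

Each row is returned as a dictionary keyed by the CSV header, so the actual column names can be inspected before any further processing.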

Model

We plan to upload all the models trained for our experiments to huggingface.co. For now, only the main model used in our analysis is available, at: https://drive.google.com/file/d/1_Ngmh-uHGWEbKHFpKmQ1DhVf6LtDTglx/view?usp=sharing

The model, 'xlm-roberta-sentiment-multilingual', is based on the implementation of 'cardiffnlp/twitter-xlm-roberta-base-sentiment' and is further fine-tuned on the annotated dataset.

Example usage

from transformers import AutoModelForSequenceClassification, pipeline

# Load the fine-tuned model from the local model directory
model = AutoModelForSequenceClassification.from_pretrained('./xlm-roberta-sentiment-multilingual/')
# The tokenizer is taken from the base model named above
sentiment_analysis_task = pipeline("sentiment-analysis", model=model, tokenizer="cardiffnlp/twitter-xlm-roberta-base-sentiment")

sentiment_analysis_task('Today is a good day')
Out: [{'label': 'Positive', 'score': 0.978614866733551}]
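When the pipeline is applied to many tweets, its output can be summarised as a share per label. The following is a minimal sketch under the assumption that predictions have the shape shown above (a list of dictionaries with a 'label' key); the helper name `label_distribution` is hypothetical and not part of the repository.

```python
from collections import Counter


def label_distribution(predictions):
    """Return the fraction of predictions per label.

    `predictions` is assumed to be a list of dicts with a 'label' key,
    as in the example output above.
    """
    counts = Counter(p["label"] for p in predictions)
    total = sum(counts.values())
    if total == 0:
        return {}  # no predictions, nothing to summarise
    return {label: n / total for label, n in counts.items()}


# Possible usage with the pipeline defined earlier; passing a list of
# strings yields one prediction dict per input:
# predictions = sentiment_analysis_task(["Today is a good day", "What a mess"])
# print(label_distribution(predictions))
```

The helper ignores the 'score' field; filtering on it first would be one way to keep only high-confidence predictions.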

Reference paper

For more details, please check the reference paper. If you use the data contained in this repository for your research, please cite the paper using the following bib entry:

@inproceedings{antypas2022politics,
  title={{Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom}},
  author={Antypas, Dimosthenis and Preece, Alun and Camacho-Collados, Jose},
  booktitle={arXiv preprint arXiv:2202.00396},
  year={2022}
}


Owner

Cardiff NLP
