AI-UPV at IberLEF-2021 EXIST task: Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models

Last update: Jun 08, 2022

Description

This repository contains the code for the paper "Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models". The paper will be published at SEPLN-WS-IberLEF 2021 (the 3rd Workshop on Iberian Languages Evaluation Forum at the SEPLN 2021 Conference). The implementation and the dataset are described in the paper (link: Paper is soon...).

Paper Abstract

The popularity of social media has created problems such as hate speech and sexism. Identifying and classifying sexism in social media are highly relevant tasks, as they would allow building a healthier social environment. Nevertheless, these tasks are considerably challenging. This work proposes a system that uses multilingual and monolingual BERT, data point translation, and ensemble strategies for sexism identification and classification in English and Spanish. It was conducted in the context of the sEXism Identification in Social neTworks (EXIST 2021) shared task, proposed by the Iberian Languages Evaluation Forum (IberLEF). The proposed system and its main components are described, and an in-depth hyperparameter analysis is conducted. The main results observed were: (i) the system obtained better results than the baseline model (multilingual BERT); (ii) ensemble models obtained better results than monolingual models; and (iii) the E6 model (ensemble model considering all individual models and the best standardized values) obtained the best accuracies and F1-scores for both tasks. This work obtained first place in both tasks at EXIST, with the highest accuracies (0.780 for task 1 and 0.658 for task 2) and F1-scores (F1-binary of 0.780 for task 1 and F1-macro of 0.579 for task 2).
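The abstract mentions an ensemble that considers all individual models and "the best standardized values". The exact procedure is defined in the paper, not in this README, so the following is only a minimal sketch of one plausible reading: each model's class scores are z-score standardized, and for every tweet the class holding the largest standardized score across models is chosen. The function names, model names, and scores below are hypothetical and used for illustration only.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Z-score one model's scores over all samples and classes.

    ASSUMPTION: standardizing over the whole score matrix is an
    illustrative choice, not necessarily the paper's procedure.
    """
    flat = [s for row in scores for s in row]
    mu, sigma = mean(flat), pstdev(flat)
    if sigma == 0:
        # All scores identical: no model preference, return zeros.
        return [[0.0 for _ in row] for row in scores]
    return [[(s - mu) / sigma for s in row] for row in scores]


def ensemble_predict(model_scores):
    """Combine per-model class scores into one label per sample.

    model_scores maps a (hypothetical) model name to a list of rows,
    one row of class scores per sample. For each sample, the class with
    the highest standardized score among all models is selected.
    """
    standardized = [standardize(s) for s in model_scores.values()]
    n_samples = len(standardized[0])
    predictions = []
    for i in range(n_samples):
        rows = [model[i] for model in standardized]
        # Best standardized value for each class across the models.
        best_per_class = [max(col) for col in zip(*rows)]
        predictions.append(best_per_class.index(max(best_per_class)))
    return predictions


# Made-up scores for two samples and two classes (0 = not sexist, 1 = sexist).
model_scores = {
    "multilingual_bert": [[2.0, 0.5], [0.2, 0.4]],
    "english_bert": [[1.0, 1.2], [0.1, 3.0]],
}
print(ensemble_predict(model_scores))
```

Standardizing before comparing keeps a model with a larger raw score range from dominating the vote. Other combination rules, such as averaging standardized scores, would fit the same interface.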

Credits

EXIST Shared Task Organizers

Task website: http://nlp.uned.es/exist2021/

Contact: [email protected]

Owner

Angel de Paula
