Code for reproducing our paper: LMSOC: An Approach for Socially Sensitive Pretraining

Overview

LMSOC: An Approach for Socially Sensitive Pretraining

Open All Collab Binder

image

Code for reproducing the paper LMSOC: An Approach for Socially Sensitive Pretraining to appear at 2021 Conference on Empirical Methods in Natural Language Processing: Findings.

Abstract

While large-scale pretrained language models have been shown to learn effective linguistic representations for many NLP tasks, there remain many real-world contextual aspects of language that current approaches do not capture. For instance, consider a cloze-test "I enjoyed the ____ game this weekend": the correct answer depends heavily on where the speaker is from, when the utterance occurred, and the speaker's broader social milieu and preferences. Although language depends heavily on the geographical, temporal, and other social contexts of the speaker, these elements have not been incorporated into modern transformer-based language models. We propose a simple but effective approach to incorporate speaker social context into the learned representations of large-scale language models. Our method first learns dense representations of social contexts using graph representation learning algorithms and then primes language model pretraining with these social context representations. We evaluate our approach on geographically-sensitive language-modeling tasks and show a substantial improvement (more than 100% relative lift on MRR) compared to baselines.

Citation

Please cite as:

Kulkarni, V., Mishra, S., & Haghighi, A. (2021). LMSOC: An Approach for Socially Sensitive Pretraining. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Findings. arXiv

@inproceedings{kulkarni2021lmsoc,
  title={LMSOC: An Approach for Socially Sensitive Pretraining},
  author={Kulkarni, Vivek and Mishra, Shubhanshu and Haghighi, Aria},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Findings},
  year={2021}
  address={Online},
  publisher={Association for Computational Linguistics},
  pages={1--9},
  eprint={2110.10319},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Reproducibility

NOTE: Dependencies are specified in the notebooks. But we have also encluded an requirements.txt and environment.yml files to install dependencies using pip or conda.

  • Create Social Context Embeddings via the example notebook embed_time_toy_task.ipynb which contains the implementation of how to embed time for Task 1 in the paper.
  • Upload the files in data/ to the location where you will run the next notebook.
  • The notebook lmsoc_train_and_eval_toy_task.ipynb contains the LMSOC training code.
    • NOTE: This notebook assumes you have already trained social context embeddings for the data you have (for example, here the social context is time).
    • It is a runnable colab notebook which demonstrates the entire process of training and evaluating LMSOC as described in the paper.
    • If run, it will reproduce the experimental setup for Task 1 and ultimately yield Figure 2.
    • In order to run this notebook in colab, open this notebook in Google Colab and upload the files in "data" directory to your colab workspace.

Security Issues?

Please report sensitive security issues via Twitter's bug-bounty program (https://hackerone.com/twitter) rather than GitHub.

Owner
Twitter Research
Twitter #opensource projects related to our published research
Twitter Research
Clustering with variational Bayes and population Monte Carlo

pypmc pypmc is a python package focusing on adaptive importance sampling. It can be used for integration and sampling from a user-defined target densi

45 Feb 06, 2022
Create Own QR code with Python

Create-Own-QR-code Create Own QR code with Python SO guys in here, you have to install pyqrcode 2. open CMD and type python -m pip install pyqrcode

JehanKandy 10 Jul 13, 2022
Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR

Official implementation for paper "Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR"

Ziyue Feng 72 Dec 09, 2022
This is the official code for the paper "Ad2Attack: Adaptive Adversarial Attack for Real-Time UAV Tracking".

Ad^2Attack:Adaptive Adversarial Attack on Real-Time UAV Tracking Demo video 📹 Our video on bilibili demonstrates the test results of Ad^2Attack on se

Intelligent Vision for Robotics in Complex Environment 10 Nov 07, 2022
[CVPR'21] MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation

MonoRUn MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation. CVPR 2021. [paper] Hansheng Chen, Yuyao Huang, Wei Tian*

同济大学智能汽车研究所综合感知研究组 ( Comprehensive Perception Research Group under Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University) 96 Dec 10, 2022
Convert ONNX model graph to Keras model format.

Convert ONNX model graph to Keras model format.

Grigory Malivenko 175 Dec 28, 2022
PyTorch implementation of MLP-Mixer

PyTorch implementation of MLP-Mixer MLP-Mixer: an all-MLP architecture composed of alternate token-mixing and channel-mixing operations. The token-mix

Duo Li 33 Nov 27, 2022
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with ONNX, TensorRT, ncnn, and OpenVINO supported.

Introduction YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and ind

7.7k Jan 03, 2023
Multi Agent Path Finding Algorithms

MATP-solver Simulator collision check path step random initial states or given states Traditional method Seperate A* algorithem Confict-based Search S

30 Dec 12, 2022
A Next Generation ConvNet by FaceBookResearch Implementation in PyTorch(Original) and TensorFlow.

ConvNeXt A Next Generation ConvNet by FaceBookResearch Implementation in PyTorch(Original) and TensorFlow. A FacebookResearch Implementation on A Conv

Raghvender 2 Feb 14, 2022
A visualization tool to show a TensorFlow's graph like TensorBoard

tfgraphviz tfgraphviz is a module to visualize a TensorFlow's data flow graph like TensorBoard using Graphviz. tfgraphviz enables to provide a visuali

44 Nov 09, 2022
Automatic Number Plate Recognition using Contours and Convolution Neural Networks (CNN)

Cite our paper if you find this project useful https://www.ijariit.com/manuscripts/v7i4/V7I4-1139.pdf Abstract Image processing technology is used in

Adithya M 2 Jun 28, 2022
This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

SO-Pose This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation This paper is basically an

shangbuhuan 52 Nov 25, 2022
Automated detection of anomalous exoplanet transits in light curve data.

Automatically detecting anomalous exoplanet transits This repository contains the source code for the paper "Automatically detecting anomalous exoplan

1 Feb 01, 2022
Pytorch implementation of various High Dynamic Range (HDR) Imaging algorithms

Deep High Dynamic Range Imaging Benchmark This repository is the pytorch impleme

Tianhong Dai 5 Nov 16, 2022
PyTorch implementation of an end-to-end Handwritten Text Recognition (HTR) system based on attention encoder-decoder networks

AttentionHTR PyTorch implementation of an end-to-end Handwritten Text Recognition (HTR) system based on attention encoder-decoder networks. Scene Text

Dmitrijs Kass 31 Dec 22, 2022
Open-Ended Commonsense Reasoning (NAACL 2021)

Open-Ended Commonsense Reasoning Quick links: [Paper] | [Video] | [Slides] | [Documentation] This is the repository of the paper, Differentiable Open-

(Bill) Yuchen Lin 31 Oct 19, 2022
CAST: Character labeling in Animation using Self-supervision by Tracking

CAST: Character labeling in Animation using Self-supervision by Tracking (Published as a conference paper at EuroGraphics 2022) Note: The CAST paper c

15 Nov 18, 2022
In this project, we create and implement a deep learning library from scratch.

ARA In this project, we create and implement a deep learning library from scratch. Table of Contents Deep Leaning Library Table of Contents About The

22 Aug 23, 2022
Visualization toolkit for neural networks in PyTorch! Demo -->

FlashTorch A Python visualization toolkit, built with PyTorch, for neural networks in PyTorch. Neural networks are often described as "black box". The

Misa Ogura 692 Dec 29, 2022