The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color

Last update: Nov 13, 2021

Overview

Code and dataset for The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color.

This repository is split roughly into two parts:

probing: The probing implementations, including the code for generating CoDa.
mturk-survey: Instruction pages used for crowdsourcing annotations.

How to use

Using CoDa

If you'd like to use CoDa, we highly recommend using the version hosted on the Hugging Face Hub, since it requires nothing beyond the Hugging Face datasets library.

from datasets import load_dataset

ds = load_dataset('corypaik/coda')

You can find more details about how to use Hugging Face Datasets here.
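Called without a split argument, load_dataset returns a DatasetDict keyed by split name. A small helper like the one below is a quick way to see which splits and columns the Hub version exposes. This is a minimal sketch: describe_splits is a hypothetical helper and is not part of this repository. It assumes only that each split supports len() and integer indexing that returns a dict, which holds for datasets.Dataset. It makes no assumption about CoDa's actual split or column names.

```python
from typing import Any, Dict, List, Mapping, Tuple


def describe_splits(ds: Mapping[str, Any]) -> Dict[str, Tuple[int, List[str]]]:
    """Return {split_name: (num_rows, column_names)} for a DatasetDict-like mapping.

    Hypothetical helper: it relies only on len(split) and on split[0] returning
    a dict (true for datasets.Dataset), so it also works on plain lists of dicts.
    """
    summary: Dict[str, Tuple[int, List[str]]] = {}
    for name, split in ds.items():
        num_rows = len(split)
        # Column names are read from the first row; an empty split has none.
        columns = list(split[0].keys()) if num_rows else []
        summary[name] = (num_rows, columns)
    return summary
```

For example, after ds = load_dataset('corypaik/coda'), calling describe_splits(ds) lists each split with its row count and column names. This is a cheap way to check the schema before writing analysis code against it.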

Running experiments

This repository is developed and tested on Linux and uses Bazel. On other platforms, consider running Bazel in a Docker container. If you'd like more guidance on this, please open an issue on GitHub.

First, clone the project:

# clone project
git clone https://github.com/nala-cub/coda

# go to the project directory
cd coda

You can run the individual tasks as follows:

# run zeroshot
bazel run //projects/coda/probing/zeroshot
# representation probing
bazel run //projects/coda/probing/representations
# ngrams
bazel run //projects/coda/probing/ngram_stats
# generate dataset from annotations (relative to workspace root)
bazel run //projects/coda/probing/dataset:create_dataset -- \
  --coda_ds_export_dir=<export_dir>

To see help for any of the commands, use:

bazel run <target> -- --help
# for example:
# bazel run //projects/coda/probing/zeroshot -- --help
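To drive the experiments from Python (for example, to run several probes from one script), you can wrap the commands above with subprocess. This is a sketch under stated assumptions. run_task and bazel_command are hypothetical helpers and are not part of the repository. The target labels and the --coda_ds_export_dir flag are copied from the commands above. The sketch also assumes it runs from the root of the cloned coda checkout, with Bazel on PATH. Passing dry_run=True only prints the command, which lets you check it before a long run.

```python
import shlex
import subprocess
from typing import List, Optional

# Bazel target labels copied from the README commands.
TARGETS = {
    "zeroshot": "//projects/coda/probing/zeroshot",
    "representations": "//projects/coda/probing/representations",
    "ngram_stats": "//projects/coda/probing/ngram_stats",
    "create_dataset": "//projects/coda/probing/dataset:create_dataset",
}


def bazel_command(task: str, *program_args: str) -> List[str]:
    """Build argv for `bazel run <target> [-- <program_args>]`.

    Arguments after `--` are passed to the program itself, not to Bazel.
    """
    cmd = ["bazel", "run", TARGETS[task]]
    if program_args:
        cmd += ["--", *program_args]
    return cmd


def run_task(task: str, *program_args: str, dry_run: bool = False,
             cwd: Optional[str] = None) -> int:
    """Run one task via Bazel; with dry_run=True, only print the command.

    cwd should point at the repository root (the cloned coda directory).
    """
    cmd = bazel_command(task, *program_args)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    # check=True raises CalledProcessError if the Bazel run fails.
    return subprocess.run(cmd, check=True, cwd=cwd).returncode
```

For example, run_task('zeroshot', '--help', dry_run=True) prints bazel run //projects/coda/probing/zeroshot -- --help. Likewise, run_task('create_dataset', '--coda_ds_export_dir=<export_dir>') mirrors the dataset-generation command, with <export_dir> left for you to fill in.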

Annotation Instructions

Annotations were collected using an Angular app on Firebase. The included files contain all of the instructions, but not the app itself. If you're interested in the app, please open an issue on GitHub.

Citation

If this code was useful, please cite the paper:

@misc{paik2021world,
      title={The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color},
      author={Cory Paik and Stéphane Aroca-Ouellette and Alessandro Roncone and Katharina Kann},
      year={2021},
      eprint={2110.08182},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

CoDa is licensed under the Apache 2.0 license. The text of the license can be found here.
