Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets

Last update: Dec 30, 2022

Overview

Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses
metrics of uncertainty, consistency, and agreement with the aggregate
loaders for popular crowdsourced datasets

The library is currently under heavy development, and interfaces are subject to change.

Installing

Installing Crowd-Kit is as easy as running: pip install crowd-kit

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, you need to read your annotations into a Pandas DataFrame with columns task, worker, label. (Before v1.0.0 the worker column was called performer.) Alternatively, you can download an example dataset.

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then you can aggregate the worker responses as easily as in scikit-learn:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
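
To make the expected input shape concrete, here is a dependency-free sketch that runs a plain majority vote over (task, worker, label) records. It is a baseline for comparison only: it is neither Dawid-Skene nor Crowd-Kit's own implementation, and the function name majority_vote and the sample records are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample annotations: (task, worker, label).
records = [
    ("t1", "w1", "cat"),
    ("t1", "w2", "cat"),
    ("t1", "w3", "dog"),
    ("t2", "w1", "dog"),
    ("t2", "w2", "dog"),
    ("t3", "w3", "cat"),
]


def majority_vote(records):
    """Return {task: most frequent label}; ties go to the label seen first."""
    votes = defaultdict(Counter)
    for task, _worker, label in records:
        votes[task][label] += 1
    # Counter.most_common orders equal counts by first insertion.
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


aggregated = majority_vote(records)
print(aggregated)
```

Unlike this baseline, Dawid-Skene also estimates how reliable each worker is, so the two can disagree on tasks where the votes are split.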

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available (✅) and in progress (🟡).

Categorical Responses

Method	Status
Majority Vote	✅
Dawid-Skene	✅
Gold Majority Vote	✅
M-MSR	✅
Wawa	✅
Zero-Based Skill	✅
GLAD	✅
BCC	🟡

Textual Responses

Method	Status
RASA	✅
HRRASA	✅
ROVER	✅

Image Segmentation

Method	Status
Segmentation MV	✅
Segmentation RASA	✅
Segmentation EM	✅

Pairwise Comparisons

Method	Status
Bradley-Terry	✅
Noisy Bradley-Terry	✅

Citation

Ustalov D., Pavlichenko N., Losev V., Giliazev I., and Tulin E. A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python. The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track. HCOMP 2021. 2021. arXiv: 2109.08584 [cs.HC].

@inproceedings{HCOMP2021/CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Losev, Vladimir and Giliazev, Iulian and Tulin, Evgeny},
  title     = {{A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python}},
  year      = {2021},
  booktitle = {The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track},
  series    = {HCOMP~2021},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://www.humancomputation.com/assets/wips_demos/HCOMP_2021_paper_85.pdf},
  language  = {english},
}

Questions and Bug Reports

To report bugs, please use the Toloka/bugreport page.
Join our English-speaking Slack community for both technical and abstract questions.

License

Comments

Crowd-Kit Learning

This is just an example of what this subpackage will contain.

We need to configure setup.cfg and add new tests. Here I suggest discussing the concept.

opened by pilot7747 10
Fix the documentation generation issues
Stick to YAML files hosted in https://github.com/Toloka/docs and use the proper includes.

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[x] My change requires a change to the documentation.

[x] I have updated the documentation accordingly.

[ ] I have added tests to cover my changes.

[ ] All new and existing tests passed.

documentation enhancement
opened by dustalov 9
Add MACE

Would it be possible to add MACE? It is often used in my field, but there is only a Java implementation that is hard to integrate into Python projects.
enhancement good first issue

opened by jcklie 4
Add MACE aggregation model
I have added the MACE aggregation model. https://www.cs.cmu.edu/~hovy/papers/13HLT-MACE.pdf

Description

Based on the original VB inference implementation, I wrote it in Python.

Connected issues (if any)

https://github.com/Toloka/crowd-kit/issues/5

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[x] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[ ] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.
opened by pilot7747 3
Documentation updates
Updated index.md and the Classification section:

added extra information to the models descriptions;

added descriptions for parameters;

fixed errors and typos in descriptions.
opened by Natalyl3 2
Binary Relevance aggregation
Description

I have added code for Binary Relevance aggregation, a simple method for multi-label classification. This approach treats each label as a class in a binary classification task and aggregates it separately.
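
The idea can be sketched without any dependencies. The snippet below is an illustration only, not the code from this pull request: it assumes a per-label majority vote as the binary aggregator (any binary aggregation method could be plugged in instead), and the function name binary_relevance and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical multi-label annotations: (task, worker, set of chosen labels).
records = [
    ("t1", "w1", {"cat", "outdoor"}),
    ("t1", "w2", {"cat"}),
    ("t1", "w3", {"cat", "outdoor"}),
    ("t2", "w1", {"dog"}),
    ("t2", "w2", {"dog", "outdoor"}),
    ("t2", "w3", {"dog"}),
]


def binary_relevance(records):
    """Aggregate each label independently as a yes/no majority vote."""
    all_labels = set().union(*(labels for _, _, labels in records))
    yes_votes = defaultdict(Counter)  # task -> label -> number of "yes" votes
    n_workers = Counter()             # task -> number of workers who answered
    for task, _worker, labels in records:
        n_workers[task] += 1
        for label in labels:
            yes_votes[task][label] += 1
    # Keep a label when strictly more than half of the task's workers chose it.
    return {
        task: {l for l in all_labels if 2 * yes_votes[task][l] > n_workers[task]}
        for task in n_workers
    }


multi_labels = binary_relevance(records)
print(multi_labels)
```

Because every label is decided on its own, a task can end up with several labels or with none.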

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[x] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[ ] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.
opened by denaxen 2
Use mypy --strict
Description

This pull request enforces a stricter set of mypy type checks by enabling the strict mode. It also fixes several type inconsistencies. As the NumPy type annotations were introduced in version 1.20 (January 2021), some Crowd-Kit installations might break, but I believe it is a worthy contribution.

Connected issues (if any)

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[x] Breaking change (fix or feature that would cause existing functionality to change)

[ ] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.

enhancement
opened by dustalov 2
Run Jupyter notebooks with tests
Description

This pull request runs the Jupyter notebooks with examples on the current version of Crowd-Kit with the rest of the test suite on GitHub Actions.

Connected issues (if any)

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.

enhancement good first issue
opened by dustalov 2
Dramatically improve the code maintainability
This pull request is probably the best thing that could happen to Crowd-Kit code maintainability.

Description

In this pull request, we switch from unnecessarily verbose Python stub files to more convenient inline type annotations. During this, many type annotations were fixed. We also removed the manage_docstring decorator and the corresponding utility functions.

I think this change might break the documentation generation process. We will release a new version of Crowd-Kit only after this is fixed.

Connected issues (if any)

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[x] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.

bug documentation enhancement
opened by dustalov 2
Add header and LM-based aggregation item
Description

This pull request makes README.md nicer. It adds the missing language model-based textual aggregation method.

Connected issues (if any)

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[ ] I have added tests to cover my changes.

[x] All new and existing tests passed.

documentation
opened by dustalov 2
Renamed columns?

Hi, the guide says

df = pd.read_csv('results.csv') # should contain columns: task, performer, label

but when I load this file, then the second column is worker and not performer. I had used crowdkit with dataframes that had columns: task, performer, label, but after an update, it broke.

opened by jcklie 2
Ordinal Labels
Would it be possible to support aggregation of ordinal labels as part of this toolkit via the following reduction algorithm?

Labels are categorical but have an ordering defined: 1 < ... < K.

The K-class ordinal labels are transformed into K−1 binary-label datasets.

Each binary task is then aggregated via crowdkit to estimate Pr[yi > c] for c = 1, ..., K−1.

The probability of each actual class value can then be obtained as Pr[yi = c] = Pr[yi > c−1 and yi ≤ c] = Pr[yi > c−1] − Pr[yi > c].

The class with the maximum probability is assigned to the instance.
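
The reduction described in this issue can be sketched as follows. This is a minimal illustration under stated assumptions, not a Crowd-Kit feature: the binary aggregator prob_greater is a stand-in that simply uses the share of workers voting "label > c", and all names and sample records are hypothetical. With this stand-in the result collapses to the per-task mode. A model-based binary aggregator would give different estimates and might need clipping if a difference turns out negative.

```python
from collections import defaultdict

K = 4  # hypothetical number of ordinal classes, labelled 1..K

# Hypothetical annotations: (task, worker, ordinal label).
records = [
    ("t1", "w1", 2), ("t1", "w2", 3), ("t1", "w3", 3),
    ("t2", "w1", 1), ("t2", "w2", 1), ("t2", "w3", 4),
]


def prob_greater(records, c):
    """Stand-in binary aggregator: share of workers who said label > c."""
    hits, totals = defaultdict(int), defaultdict(int)
    for task, _worker, label in records:
        totals[task] += 1
        hits[task] += label > c
    return {task: hits[task] / totals[task] for task in totals}


def aggregate_ordinal(records, k):
    tasks = {task for task, _, _ in records}
    # Pr[y > 0] = 1 and Pr[y > k] = 0 by definition; the rest are estimated.
    greater = {0: {t: 1.0 for t in tasks}, k: {t: 0.0 for t in tasks}}
    for c in range(1, k):
        greater[c] = prob_greater(records, c)
    result = {}
    for t in tasks:
        # Pr[y = c] = Pr[y > c - 1] - Pr[y > c]
        probs = {c: greater[c - 1][t] - greater[c][t] for c in range(1, k + 1)}
        result[t] = max(probs, key=probs.get)
    return result


ordinal_labels = aggregate_ordinal(records, K)
print(ordinal_labels)
```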

enhancement
opened by vikasraykar 2

Releases(v1.2.0)

v1.2.0(Dec 14, 2022)
Crowd-Kit Learning subpackage introducing implementations of deep learning from crowds methods: CoNAL and CrowdLayer

Added Multi-Binary aggregation

v1.2.0.rc1(Dec 13, 2022)

v1.1.0(Sep 27, 2022)
New aggregation methods: One-Coin Dawid-Skene, MACE, and KOS

Fixed bugs in Dawid-Skene implementation

Improved maintainability by removing stub files

Switched to setup.cfg from setup.py

v1.1.0.rc4(Sep 26, 2022)

v1.1.0.rc3(Sep 23, 2022)

v1.1.0.rc2(Jul 28, 2022)

v1.1.0.rc1(Jul 28, 2022)

v1.0.0(Mar 22, 2022)
Not a backward-compatible change:

Replaced all mentions of "performer" with "worker". This change is not backward compatible because parameter names and DataFrame/Series columns are also affected.
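
For data files exported before this release, one way to adapt is to rename the column header before loading. The sketch below uses only the standard library; the function name rename_performer_column and the sample CSV text are hypothetical and not part of Crowd-Kit.

```python
import csv
import io

# Hypothetical pre-1.0 export that still uses the old column name.
old_csv = "task,performer,label\nt1,w1,cat\nt1,w2,dog\n"


def rename_performer_column(text):
    """Return CSV text with a 'performer' header renamed to 'worker'."""
    rows = list(csv.reader(io.StringIO(text)))
    # Only the header row changes; data rows are written back untouched.
    rows[0] = ["worker" if name == "performer" else name for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


new_csv = rename_performer_column(old_csv)
print(new_csv)
```

If the data is already in a pandas DataFrame, df.rename(columns={'performer': 'worker'}) achieves the same result.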

Improvements:

GoldMajorityVote true_labels argument now supports multiple ground truth values for a single task.

Added tol optimization parameter as a tolerance stopping criterion for iterative methods with a variable number of steps.

Python 3.10 support added.

Enhanced aggregation methods descriptions.

v0.0.9(Nov 30, 2021)
Added TextSummarization aggregation

Added new datasets

Added entropy_threshold method

Added names for pd.Series which are available after fit

Added on_missing_skill and default_skill params for models that use skills

v0.0.8(Oct 14, 2021)
Added GLAD aggregation

Fixed https://github.com/Toloka/crowd-kit/issues/6

Fixed https://github.com/Toloka/crowd-kit/issues/3

v0.0.7(Sep 2, 2021)
Added segmentation EM

Added ROVER

Fixed HRRASA and refactored TextRASA and TextHRRASA

v0.0.6(Aug 18, 2021)

crowd-kit==0.0.6 release
v0.0.5(Jul 18, 2021)

v0.0.4(May 19, 2021)

v0.0.3(Apr 12, 2021)

v0.0.2(Apr 7, 2021)

v0.0.1(Mar 2, 2021)

Owner

Toloka

Data labeling platform for ML

GitHub Repository

Swapping face using Face Mesh with TensorFlow Lite

17 Apr 26, 2022

PyTorch code for the ICCV'21 paper: "Always Be Dreaming: A New Approach for Class-Incremental Learning"

Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning PyTorch code for the ICCV 2021 paper: Always Be Dreaming: A New Approach f

49 Dec 21, 2022

Code and datasets for the paper "KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction"

KnowPrompt Code and datasets for our paper "KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction" Requireme

137 Dec 31, 2022

FwordCTF 2021 Infrastructure and Source code of Web/Bash challenges

FwordCTF 2021 You can find here the source code of the challenges I wrote (Web and Bash) in FwordCTF 2021 and the source code of the platform with our

5 Nov 25, 2022

Automatic voice-synthetised summaries of latest research papers on arXiv

PaperWhisperer PaperWhisperer is a Python application that keeps you up-to-date with research papers. How? It retrieves the latest articles from arXiv

124 Dec 20, 2022

CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes (AAAI2022)

CMUA-Watermark The official code for CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes (AAAI2022) arxiv. It is bas

50 Nov 26, 2022

Baseline of DCASE 2020 task 4

Couple Learning for SED This repository provides the data and source code for sound event detection (SED) task. The improvement of the Couple Learning

21 Oct 18, 2022

IOT: Instance-wise Layer Reordering for Transformer Structures

Introduction This repository contains the code for Instance-wise Ordered Transformer (IOT), which is introduced in the ICLR2021 paper IOT: Instance-wi

19 Nov 15, 2022

Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication"

NFFT4ANOVA Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication" This package uses th

1 Aug 10, 2022

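3D-Reconstruction: monocular multi-view 3D reconstruction based on deep learning methods

Monocular multi-view 3D reconstruction based on deep learning methods. Part I. 3D reconstruction code: Part1. Technical documentation: [Markdown] [PDF]. Original images: Original Images. Point cloud results: Point Cloud Results-1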

19 Dec 26, 2022

Like a cowsay but without cows!

Foxsay This is a simple program that generates pictures of a cute fox with a message. It is like a cowsay but without cows! Fox girls are better! Usag

28 Feb 20, 2022

Sub-Cluster AdaCos: Learning Representations for Anomalous Sound Detection.

Accompanying code for the paper Sub-Cluster AdaCos: Learning Representations for Anomalous Sound Detection.

6 Dec 01, 2022

Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

105 Jan 03, 2023

Live Hand Tracking Using Python

Live-Hand-Tracking-Using-Python Project Description: In this project, we will be

2 Jan 06, 2022

This is a Pytorch implementation of paper: DropEdge: Towards Deep Graph Convolutional Networks on Node Classification

DropEdge: Towards Deep Graph Convolutional Networks on Node Classification This is a Pytorch implementation of paper: DropEdge: Towards Deep Graph Con

401 Dec 16, 2022

Python scripts for performing road segemtnation and car detection using the HybridNets multitask model in ONNX.

ONNX-HybridNets-Multitask-Road-Detection Python scripts for performing road segemtnation and car detection using the HybridNets multitask model in ONN

45 Jan 01, 2023

[CVPR 2020] Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

Contents Local and Global GAN Cross-View Image Translation Semantic Image Synthesis Acknowledgments Related Projects Citation Contributions Collaborat

131 Dec 07, 2022
