This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing.

Last update: Dec 14, 2022

Overview

Feedback Prize - Evaluating Student Writing

This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing. The competition can be found here: https://www.kaggle.com/competitions/feedback-prize-2021/

Datasets required

Download competition data from https://www.kaggle.com/competitions/feedback-prize-2021/data and extract it to ../input/feedback-prize-2021/
Download folds from https://www.kaggle.com/datasets/ubamba98/feedbackgroupshufflesplit1337 and extract it to ../input/feedbackgroupshufflesplit1337/groupshufflesplit_1337.p
Download and convert LSG Roberta from https://github.com/ccdv-ai/convert_checkpoint_to_lsg/blob/main/convert_roberta_checkpoint.py and place it in ../input/lsg-roberta-large

Use this command to convert roberta-large to LSG

$ python convert_roberta_checkpoint.py \
                        --initial_model roberta-large \
                        --model_name lsg-roberta-large \
                        --max_sequence_length 1536

Download fast tokenizer for Deberta V2/V3 from https://www.kaggle.com/datasets/nbroad/deberta-v2-3-fast-tokenizer and extract it to ../input/deberta-v2-3-fast-tokenizer/

Follow following instructions to manually add fast tokenizer to transformer library:

# The following is necessary if you want to use the fast tokenizer for deberta v2 or v3
# This must be done before importing transformers
import shutil
from pathlib import Path

# Path to installed transformer library
transformers_path = Path("/opt/conda/lib/python3.7/site-packages/transformers")

input_dir = Path("../input/deberta-v2-3-fast-tokenizer")

convert_file = input_dir / "convert_slow_tokenizer.py"
conversion_path = transformers_path/convert_file.name

if conversion_path.exists():
    conversion_path.unlink()

shutil.copy(convert_file, transformers_path)
deberta_v2_path = transformers_path / "models" / "deberta_v2"

for filename in ['tokenization_deberta_v2.py', 'tokenization_deberta_v2_fast.py']:
    filepath = deberta_v2_path/filename
    if filepath.exists():
        filepath.unlink()

    shutil.copy(input_dir/filename, filepath)

After this ../input directory should look something like this.

.
├── input
│   ├── feedback-prize-2021
│   │   ├── train/
│   │   ├── test/
│   │   ├── sample_submission.csv
│   │   └── train.csv
│   ├── lsg-roberta-large
│   │   ├── config.json
│   │   ├── merges.txt
│   │   ├── modeling.py
│   │   ├── pytorch_model.bin
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── deberta-v2-3-fast-tokenizer
│   │   ├── convert_slow_tokenizer.py
│   │   ├── deberta__init__.py
│   │   ├── tokenization_auto.py
│   │   ├── tokenization_deberta_v2.py
│   │   ├── tokenization_deberta_v2_fast.py
│   │   └── transformers__init__.py
│   └── feedbackgroupshufflesplit1337
│       └── groupshufflesplit_1337.p

or you can change the DATA_BASE_DIR in SETTINGS.json to download the files in your desired location.

Models and Training

Deberta large, Deberta xlarge, Deberta v2 xlarge, Deberta v3 large, Funnel transformer large and BigBird are trained using trainer.py

Example:

$ python trainer.py --fold 0 --pretrained_model google/bigbird-roberta-large

where pretrained_model can be microsoft/deberta-large, microsoft/deberta-xlarge, microsoft/deberta-v2-xlarge, microsoft/deberta-v3-large, funnel-transformer/large or google/bigbird-roberta-large

Deberta large with LSTM head and jaccard loss is trained using debertabilstm_trainer.py

Example:

$ python debertabilstm_trainer.py --fold 0

Longformer large with LSTM head is trained using longformerwithbilstm_trainer.py

Example:

$ python longformerwithbilstm_trainer.py --fold 0

LSG Roberta is trained with lsgroberta_trainer.py

Example:

$ python lsgroberta_trainer.py --fold 0

YOSO is trained with yoso_trainer.py

Example:

$ python yoso_trainer.py --fold 0

Inference

After training all the models, the outputs were pushed to Kaggle Datasets.

And the final inference kernel can be found here: https://www.kaggle.com/code/cdeotte/2nd-place-solution-cv741-public727-private740?scriptVersionId=90301836

Solution writeup: https://www.kaggle.com/competitions/feedback-prize-2021/discussion/313389

This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing.

Related tags

Overview

Feedback Prize - Evaluating Student Writing

Datasets required

Models and Training

Inference

Owner

Udbhav Bamba

CrossMLP - The repository offers the official implementation of our BMVC 2021 paper (oral) in PyTorch.

Data Preparation, Processing, and Visualization for MoVi Data

Pytorch implementation of the paper "Class-Balanced Loss Based on Effective Number of Samples"

PushForKiCad - AISLER Push for KiCad EDA

A higher performance pytorch implementation of DeepLab V3 Plus(DeepLab v3+)

Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning

SLAMP: Stochastic Latent Appearance and Motion Prediction

A simple, clean TensorFlow implementation of Generative Adversarial Networks with a focus on modeling illustrations.

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

PyTorch implementation of Graph Convolutional Networks in Feature Space for Image Deblurring and Super-resolution, IJCNN 2021.

PyTorch implementation of DeepUME: Learning the Universal Manifold Embedding for Robust Point Cloud Registration (BMVC 2021)

The software associated with a paper accepted at EMNLP 2021 titled "Open Knowledge Graphs Canonicalization using Variational Autoencoders".

Assginment for UofT CSC420: Intro to Image Understanding

Text2Art is an AI art generator powered with VQGAN + CLIP and CLIPDrawer models

List of content farm sites like g.penzai.com.

Repository for RNNs using TensorFlow and Keras - LSTM and GRU Implementation from Scratch - Simple Classification and Regression Problem using RNNs

Code and datasets for the paper "Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction" (RA-L, 2021)

Constraint-based geometry sketcher for blender

AutoDeeplab / auto-deeplab / AutoML for semantic segmentation, implemented in Pytorch

Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)