CorNet: Correlation Networks for Extreme Multi-label Text Classification

Last update: Dec 31, 2022

Prerequisites

python==3.6.3
pytorch==1.2.0
torchgpipe==0.0.5
click==7.0
ruamel.yaml==0.16.5
numpy==1.16.2
scipy==1.2.1
scikit-learn==0.20.3
gensim==3.7.2
nltk==3.2.4
tqdm==4.31.1
joblib==0.13.2
logzero==1.5.0

Datasets

Pretrained Word Embeddings in gensim format

GloVe embeddings (840B tokens, 300d)
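The raw GloVe download is a plain text file with one token and its vector per line, while gensim's word2vec text loader expects a leading "vocab_size dimension" header line. The following is a minimal sketch of that conversion using only the standard library. It assumes that "gensim format" here means something loadable with gensim's KeyedVectors.load_word2vec_format; the function name and file paths are hypothetical and not taken from this repository's preprocessing scripts.

```python
def glove_to_word2vec_text(glove_path, out_path):
    """Copy a GloVe text file, prepending the 'vocab_size dim' header line.

    Hypothetical helper for illustration; not part of the CorNet code base.
    Returns the (vocab_size, dim) pair written to the header.
    """
    vocab_size = 0
    dim = None
    # First pass: count non-empty lines and infer the vector dimension
    # from the first entry (token followed by `dim` floats).
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue
            if dim is None:
                dim = len(line.rstrip().split(" ")) - 1
            vocab_size += 1
    if dim is None:
        dim = 0  # empty input file

    # Second pass: write the header, then the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.write(f"{vocab_size} {dim}\n")
        for line in src:
            if line.strip():
                dst.write(line if line.endswith("\n") else line + "\n")
    return vocab_size, dim
```

The converted file could then be loaded with KeyedVectors.load_word2vec_format(out_path, binary=False) and, if desired, saved in gensim's native format. The dimension is inferred from the first line only, so a token that itself contains a space would need extra handling.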

Run

Preprocess (the EUR-Lex dataset is already tokenized)

./scripts/preprocess_eurlex.sh

or, for the other datasets (these must first be tokenized using NLTK)

./scripts/preprocess_others.sh

Train and evaluate

./scripts/run_models.sh
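Extreme multi-label classifiers are commonly evaluated with ranking metrics such as precision at k (P@k), the fraction of the k highest-scored labels that are correct, averaged over test documents. The sketch below illustrates that metric in plain Python. It is an assumption that run_models.sh reports a metric of this kind; the function name and input layout are hypothetical and not taken from the repository.

```python
def precision_at_k(scores, true_labels, k):
    """Mean precision at k over a batch of documents.

    scores:      list of per-document score lists, one score per label index
    true_labels: list of sets holding the correct label indices per document
    k:           number of top-ranked labels to inspect
    Hypothetical helper for illustration; not part of the CorNet code base.
    """
    total = 0.0
    for row, gold in zip(scores, true_labels):
        # Indices of the k highest-scored labels for this document.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        # Fraction of those k labels that are correct.
        total += sum(1 for j in top_k if j in gold) / k
    return total / len(scores)


example_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.1]]
example_gold = [{0, 1}, {1}]
p_at_1 = precision_at_k(example_scores, example_gold, k=1)
p_at_2 = precision_at_k(example_scores, example_gold, k=2)
```

In this toy example both documents have a correct top-ranked label, so p_at_1 is 1.0, while only one of each document's top two labels is correct, so p_at_2 is 0.5.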

Baselines

The code for the baseline models is adapted from the following repositories: XML-CNN, BERT, MeSHProbeNet, and AttentionXML.

Owner

Guangxu Xun

GitHub Repository
