2021海华AI挑战赛·中文阅读理解·技术组·第三名

Overview

海华中文阅读理解比赛

队名:ATTOY排名:第三名

赛题背景

https://www.biendata.xyz/competition/haihua_2021

文字是人类用以记录和表达的最基本工具,也是信息传播的重要媒介。透过文字与符号,我们可以追寻人类文明的起源,可以传播知识与经验,读懂文字是认识与了解的第一步。对于人工智能而言,它的核心问题之一就是认知,而认知的核心则是语义理解。

机器阅读理解(Machine Reading Comprehension)是自然语言处理和人工智能领域的前沿课题,对于使机器拥有认知能力、提升机器智能水平具有重要价值,拥有广阔的应用前景。机器的阅读理解是让机器阅读文本,然后回答与阅读内容相关的问题,体现的是人工智能对文本信息获取、理解和挖掘的能力,在对话、搜索、问答、同声传译等领域,机器阅读理解可以产生的现实价值正在日益凸显,长远的目标则是能够为各行各业提供解决方案。

《2021海华AI挑战赛·中文阅读理解》大赛由中关村海华信息技术前沿研究院与清华大学交叉信息研究院联合主办,腾讯云计算协办。共设置题库16000条数据,总奖金池30万元,且腾讯云计算为中学组赛道提供独家算力资源支持。

本次比赛的数据来自小学/中高考语文阅读理解题库(其中,技术组的数据主要为中高考语文试题,中学组的数据主要来自小学语文试题)。相较于英文,中文阅读理解有着更多的歧义性和多义性,然而璀璨的中华文明得以绵延数千年,离不开每一个时代里努力钻研、坚守传承的人,这也正是本次大赛的魅力与挑战,让机器读懂文字,让机器学习文明。秉承着人才培养的初心,我们继续保留针对中学组以及技术组的两条平行赛道,科技创新,时代有我,期待你们的回响。

比赛任务

本次比赛技术组的数据来自中高考语文阅读理解题库。每条数据都包括一篇文章,至少一个问题和多个候选选项。参赛选手需要搭建模型,从候选选项中选出正确的一个。

2021海华AI挑战赛·中文阅读理解·技术组 第三名(ATTOY团队)解决方案

算法方案

1.预训练模型:MacBERT-Large

2.对抗训练

FreeLB ICLR 2020

3.知识蒸馏

Born Again Neural Networks ICML 2018

环境要求

tqdm==4.50.2 numpy==1.19.2 pandas==1.1.3 transformers==3.5.1 torch==1.7.0+cu110 scikit_learn==0.24.2

运行方法

bash bash.sh

超参数

FreeLB训练参数配置

'fold_num': 4, 
'seed': 42,
'model': 'hfl/chinese-macbert-large', 
'max_len': 512, 
'epochs': 12,
'train_bs': 4, 
'valid_bs': 4,
'lr': 2e-5,  
'lrSelf': 1e-4,  
'accum_iter': 8, 
'weight_decay': 1e-4, 
'adv_lr': 0.01,
'adv_norm_type': 'l2',
'adv_init_mag': 0.03,
'adv_max_norm': 1.0,
'ip': 2

EKD训练参数配置

'fold_num': 4, 
'seed': 42,
'model': 'hfl/chinese-macbert-large', 
'max_len': 256, 
'epochs': 12,
'train_bs': 4, 
'valid_bs': 4,
'lr': 2e-5,  
'lrSelf': 1e-4,  
'accum_iter': 8, 
'weight_decay': 1e-4, 
'adv_lr': 0.01,
'adv_norm_type': 'l2',
'adv_init_mag': 0.03,
'adv_max_norm': 1.0,
'ip': 2
TTS is a library for advanced Text-to-Speech generation.

TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pretra

Mozilla 6.5k Jan 08, 2023
PyTorch implementation of Tacotron speech synthesis model.

tacotron_pytorch PyTorch implementation of Tacotron speech synthesis model. Inspired from keithito/tacotron. Currently not as much good speech quality

Ryuichi Yamamoto 279 Dec 09, 2022
Finally decent dictionaries based on Wiktionary for your beloved eBook reader.

eBook Reader Dictionaries Finally, decent dictionaries based on Wiktionary for your beloved eBook reader. Dictionaries Catalan 🚧 Ελληνικά (help welco

Mickaël Schoentgen 163 Dec 31, 2022
Multi Task Vision and Language

12-in-1: Multi-Task Vision and Language Representation Learning Please cite the following if you use this code. Code and pre-trained models for 12-in-

Meta Research 711 Jan 08, 2023
Source code for AAAI20 "Generating Persona Consistent Dialogues by Exploiting Natural Language Inference".

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference Source code for RCDG model in AAAI20 Generating Persona Consistent Di

16 Oct 08, 2022
Resources for "Natural Language Processing" Coursera course.

Natural Language Processing course resources This github contains practical assignments for Natural Language Processing course by Higher School of Eco

Advanced Machine Learning specialisation by HSE 1.1k Jan 01, 2023
Simple Text-To-Speech Bot For Discord

Simple Text-To-Speech Bot For Discord This is a very simple TTS bot for discord made with python. For this bot you need FFMPEG, see installation to se

1 Sep 26, 2022
Code for paper: An Effective, Robust and Fairness-awareHate Speech Detection Framework

BiQQLSTM_HS Code and data for paper: Title: An Effective, Robust and Fairness-awareHate Speech Detection Framework. Authors: Guanyi Mou and Kyumin Lee

Guanyi Mou 2 Dec 27, 2022
Machine learning classifiers to predict American Sign Language .

ASL-Classifiers American Sign Language (ASL) is a natural language that serves as the predominant sign language of Deaf communities in the United Stat

Tarek idrees 0 Feb 08, 2022
Multilingual text (NLP) processing toolkit

polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p

RAMI ALRFOU 2.1k Jan 07, 2023
Semi-automated vocabulary generation from semantic vector models

vec2word Semi-automated vocabulary generation from semantic vector models This script generates a list of potential conlang word forms along with asso

9 Nov 25, 2022
precise iris segmentation

PI-DECODER Introduction PI-DECODER, a decoder structure designed for Precise Iris Segmentation and Location. The decoder structure is shown below: Ple

8 Aug 08, 2022
:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

(Framework for Adapting Representation Models) What is it? FARM makes Transfer Learning with BERT & Co simple, fast and enterprise-ready. It's built u

deepset 1.6k Dec 27, 2022
BERN2: an advanced neural biomedical namedentity recognition and normalization tool

BERN2 We present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by

DMIS Laboratory - Korea University 99 Jan 06, 2023
To be a next-generation DL-based phenotype prediction from genome mutations.

Sequence -----------+-- 3D_structure -- 3D_module --+ +-- ? | |

Eric Alcaide 18 Jan 11, 2022
Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors"

SWRM Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors" Clone Clone th

14 Jan 03, 2023
SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

SNCSE SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples This is the repository for SNCSE. SNCSE aims to allev

Sense-GVT 59 Jan 02, 2023
Natural Language Processing Specialization

Natural Language Processing Specialization In this folder, Natural Language Processing Specialization projects and notes can be found. WHAT I LEARNED

Kaan BOKE 3 Oct 06, 2022
基于GRU网络的句子判断程序/A program based on GRU network for judging sentences

SentencesJudger SentencesJudger 是一个基于GRU神经网络的句子判断程序,基本的功能是判断文章中的某一句话是否为一个优美的句子。 English 如何使用SentencesJudger 确认Python运行环境 安装pyTorch与LTP python3 -m pip

8 Mar 24, 2022
Big Bird: Transformers for Longer Sequences

BigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the c

Google Research 457 Dec 23, 2022