This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

Last update: Jan 11, 2022

Related tags

Deep Learning text-representation

Overview

Introduction

This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

If you find this code useful, please cite the following paper:

@article{tan2022coherence,
  title = {Coherence-Based Distributed Document Representation Learning for Scientific Documents},
  author = {Tan, Shicheng and Zhao, Shu and Zhang, Yanping},
  journal = {arXiv},
  year = {2022},
  type = {Journal Article}
}

Run

Install the environment (see requirements.txt).
Download the data: https://pan.baidu.com/s/1EEJk0_P55Ov5ReXsmyVZPA (password: rkh0)
python _av_CTE.py
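
Before launching the script, it can help to confirm that the packages named in requirements.txt are actually installed. The following is a minimal sketch, not part of the repository: the function name `missing_requirements` is hypothetical, and it assumes requirements.txt uses the usual one-package-per-line pip format.

```python
import re
from importlib import metadata


def missing_requirements(path="requirements.txt"):
    """Return the requirement names listed in `path` that are not installed."""
    missing = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Drop trailing comments and surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            if not line or line.startswith("-"):
                continue  # skip blank lines and pip options such as "-r other.txt"
            # Keep only the distribution name, dropping version specifiers,
            # extras, and environment markers.
            name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
            if not name:
                continue
            try:
                metadata.version(name)
            except metadata.PackageNotFoundError:
                missing.append(name)
    return missing
```

Calling `missing_requirements()` from the repository root returns an empty list when every listed package is found; any names it returns still need to be installed before running `python _av_CTE.py`.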

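Guide to running on the information retrieval data

Data processing (4 files): use "...data helper-IR.py" to obtain 3 outputs (the intermediate file from processing the raw data, the corpus of that intermediate file, and the constructed dataset), then use "_aj_get dataset corpus.py" to obtain the corpus of the constructed dataset.
Word embedding training (4 files): use "_ak_get word embedding.py" to train on the 2 corpora from the first step, producing 2 vocabularies and 2 word embedding files; for GloVe, the ".txt" suffix must be removed.
Run "_al_em-avg.py" 5 times to get 5 results: avg-word2vec, avg-word2vec(globe), avg-glove, avg-glove(globe), random embedding.
Run "_ac_tf-idf.py" to get a distance matrix and 1 result; the matrix is used by the CTE method.
Run the corresponding file once for each of LDA, doc2vec, BM25, LSI, GPT2, XLNet, GPT, Transformer-XL, and XLM to get 9 results.
Run "_ah_WMD.py" 4 times to get 4 results: WMD-word2vec, WMD-word2vec(globe), WMD-glove, WMD-glove(globe).
Run "_at_BERT.py" twice to get 2 results: BERT-Large uncased, BERT-Large uncased(wwm).
Run "_at_ELMo.py" twice to get 2 results: ELMo-Original(5.5B), ELMo-Original(5.5B, concatenated).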
Run "_av_CTE.py" 13 times to get 13 results, based on 13 kinds of base word embeddings, including random embedding.
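
Two of the baselines listed above, the averaged word embedding and the TF-IDF distance matrix, can be illustrated with a short sketch. This is not the code in "_al_em-avg.py" or "_ac_tf-idf.py": the function names, the toy vectors, the idf formula log(N / df), and the choice of cosine distance are all assumptions made for illustration, since the guide does not specify them.

```python
import math
from collections import Counter


def average_embedding(tokens, vectors, dim):
    """Mean of the vectors of in-vocabulary tokens; zero vector if none match."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def tfidf_vectors(docs):
    """Sparse TF-IDF weights per document, assuming idf = log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


def cosine_distance(a, b):
    """1 - cosine similarity for sparse dict vectors; 1.0 if either is all zero."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(docs):
    """Pairwise cosine distances between the TF-IDF vectors of tokenized docs."""
    vecs = tfidf_vectors(docs)
    return [[cosine_distance(u, v) for v in vecs] for u in vecs]


# Toy data (hypothetical): three tokenized documents and 2-dimensional vectors.
docs = [
    ["graph", "neural", "network"],
    ["neural", "network", "training"],
    ["protein", "folding"],
]
toy_vectors = {"graph": [1.0, 0.0], "neural": [0.0, 1.0], "network": [1.0, 1.0]}

doc_vector = average_embedding(docs[0], toy_vectors, dim=2)
matrix = distance_matrix(docs)
```

In this toy example, documents that share no terms end up at distance 1.0, and a document is at distance approximately 0 from itself.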

Owner

tsc