Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"

Related tags

Computer VisionCSCBLI
Overview

CSCBLI

Code for our ACL Findings 2021 paper,
"Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction".

Requirements

python >= 3.6
numpy >= 1.9.0
pytorch >= 1.0

Supervised

How to train

CUDA_VISIBLE_DEVICES=0 python train.py --src_lang $lg --tgt_lang en\
        --static_src_emb_path $ssemb --static_tgt_emb_path $stemb\
        --context_src_emb_path $csemb --context_tgt_emb_path $ctemb\
        --train_data_path $data_path --save_path $save_path
--static_src_emb_path   aligned source static embedding path 
--static_tgt_emb_path   aligned target static embedding path
--context_src_emb_path  source context embedding path
--context_tgt_emb_path  target context embedding path

How to Test

CUDA_VISIBLE_DEVICES=0 python test_on_all_word.py --src_lang $lg\
        --tgt_lang en --model_path $model_path\
        --dict_path $dict_path\
        --vecmap_context_src_emb_path $vcpath\
        --vecmap_context_tgt_emb_path $vspath\
        --vecmap
--vecmap_context_src_emb_path aligned source context embedding path
--vecmap_context_tgt_emb_path aligned target context embedding path
--vecmap use interpolation method, else unified method

Unsupervised

How to train

lg=ar
CUDA_VISIBLE_DEVICES=0 python train.py --src_lang en --tgt_lang $lg\
  --static_src_emb_path $ssemb --static_tgt_emb_path $stemb\
  --context_src_emb_path $csemb --context_tgt_emb_path $ctemb\
   --save_path $save_path 
--static_src_emb_path   aligned source static embedding path 
--static_tgt_emb_path   aligned target static embedding path
--context_src_emb_path  source context embedding path
--context_tgt_emb_path  target context embedding path

How to Test

src=ar
tgt=en
model_path=../checkpoints/$src-$tgt-add_orign_nw.pkl_last
CUDA_VISIBLE_DEVICES=0 python test.py  --model_path $model_path \
        --dict_path ../$src-$tgt.5000-6500.txt  --mode v2 \
        --src_lang $src --tgt_lang $tgt  \
        --reload_src_ctx   $path1 \
        --reload_tgt_ctx   $path2 --lambda_w1 0.11
--mode type    use v1 for unified method and v2 for interpolated 
--lambda_w1    the weight for interpolation
--reload_src_ctx   aligned source context embedding
--reload_tgt_ctx   aligned targte context embedding
Owner
Jinpeng Zhang
Jinpeng Zhang
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

Pascal Fischer 178 Dec 27, 2022
Extract tables from scanned image PDFs using Optical Character Recognition.

ocr-table This project aims to extract tables from scanned image PDFs using Optical Character Recognition. Install Requirements Tesseract OCR sudo apt

Abhijeet Singh 209 Dec 06, 2022
Optical character recognition for Japanese text, with the main focus being Japanese manga

Manga OCR Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Tran

Maciej Budyś 327 Jan 01, 2023
TextBoxes re-implement using tensorflow

TextBoxes-TensorFlow TextBoxes re-implementation using tensorflow. This project is greatly inspired by slim project And many functions are modified ba

Gu Xiaodong 44 Dec 29, 2022
learn how to use Gesture Control to change the volume of a computer

Volume-Control-using-gesture In this project we are going to learn how to use Gesture Control to change the volume of a computer. We first look into h

Diwas Pandey 49 Sep 22, 2022
Automatically download multiple papers by keywords in CVPR

CVFPaperHelper Automatically download multiple papers by keywords in CVPR Install mkdir PapersToRead cd PaperToRead pip install requests tqdm git clon

46 Jun 08, 2022
This can be use to convert text in a file to handwritten text.

TextToHandwriting This can be used to convert text to handwriting. Clone this project or download the code. Run TextToImage.py give the filename of th

Ashutosh Mahapatra 2 Feb 06, 2022
A community-supported supercharged version of paperless: scan, index and archive all your physical documents

Paperless-ngx Paperless-ngx is a document management system that transforms your physical documents into a searchable online archive so you can keep,

5.2k Jan 04, 2023
A tool for extracting text from scanned documents (via OCR), with user-defined post-processing.

The project is based on older versions of tesseract and other tools, and is now superseded by another project which allows for more granular control o

Maxim 32 Jul 24, 2022
一键翻译各类图片内文字

一键翻译各类图片内文字 针对群内、各个图站上大量不太可能会有人去翻译的图片设计,让我这种日语小白能够勉强看懂图片 主要支持日语,不过也能识别汉语和小写英文 支持简单的涂白和嵌字

574 Dec 28, 2022
OpenCV-Erlang/Elixir bindings

evision [WIP] : OS : arch Build Status Ubuntu 20.04 arm64 Ubuntu 20.04 armv7 Ubuntu 20.04 s390x Ubuntu 20.04 ppc64le Ubuntu 20.04 x86_64 macOS 11 Big

Cocoa 194 Jan 05, 2023
An organized collection of tutorials and projects created for aspriring computer vision students.

A repository created with the purpose of teaching students in BME lab 308A- Hanoi University of Science and Technology

Givralnguyen 5 Nov 24, 2021
A dataset handling library for computer vision datasets in LOST-fromat

A dataset handling library for computer vision datasets in LOST-fromat

8 Dec 15, 2022
Let's explore how we can extract text from forms

Form Segmentation Let's explore how we can extract text from any forms / scanned pages. Objectives The goal is to find an algorithm that can extract t

Philip Doxakis 42 Jun 05, 2022
Repository relating to the CVPR21 paper TimeLens: Event-based Video Frame Interpolation

TimeLens: Event-based Video Frame Interpolation This repository is about the High Speed Event and RGB (HS-ERGB) dataset, used in the 2021 CVPR paper T

Robotics and Perception Group 544 Dec 19, 2022
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

Jacobo José Guijarro Villalba 75 Oct 21, 2022
RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection

RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection For more details, please refer to our paper. Citing Please cite the related works

Minghui Liao 102 Jun 29, 2022
A toolbox of scene text detection and recognition

FudanOCR This toolbox contains the implementations of the following papers: Scene Text Telescope: Text-Focused Scene Image Super-Resolution [Chen et a

FudanVIC Team 170 Dec 26, 2022
a micro OCR network with 0.07mb params.

MicroOCR a micro OCR network with 0.07mb params. Layer (type) Output Shape Param # Conv2d-1 [-1, 64, 8,

william 29 Aug 06, 2022
a deep learning model for page layout analysis / segmentation.

OCR Segmentation a deep learning model for page layout analysis / segmentation. dependencies tensorflow1.8 python3 dataset: uw3-framed-lines-degraded-

99 Dec 12, 2022