Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0.

Overview

Smaller Multilingual Transformers

This repository shares smaller versions of multilingual transformers that keep the same representations offered by the original ones. The idea came from a simple observation: after massively multilingual pretraining, not all embeddings are needed to perform finetuning and inference. In practice one would rarely require a model that supports more than 100 languages as the original mBERT. Therefore, we extracted several smaller versions that handle fewer languages. Since most of the parameters of multilingual transformers are located in the embeddings layer, our models are between 21% and 45% smaller in size.

The table bellow compares two of our exracted versions with the original mBERT. It shows the models size, memory footprint and the obtained accuracy on the XNLI dataset (Cross-lingual Transfer from english for french). These measurements have been computed on a Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB).

Model Num parameters Size Memory Accuracy
bert-base-multilingual-cased 178 million 714 MB 1400 MB 73.8
Geotrend/bert-base-15lang-cased 141 million 564 MB 1098 MB 74.1
Geotrend/bert-base-en-fr-cased 112 million 447 MB 878 MB 73.8

Reducing the size of multilingual transformers facilitates their deployment on public cloud platforms. For instance, Google Cloud Platform requires that the model size on disk should be lower than 500 MB for serveless deployments (Cloud Functions / Cloud ML).

For more information, please refer to our paper: Load What You Need.

Available Models

Until now, we generated 70 smaller models from the original mBERT cased version. These models have been uploaded to the Hugging Face Model Hub in order to facilitate their use: https://huggingface.co/Geotrend.

They can be downloaded easily using the transformers library:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-fr-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-en-fr-cased")

More models will be released soon.

Generating new Models

We also share a python script that allows users to generate smaller transformers by their own based on a subset of the original vocabulary (the method does not only concern multilingual transformers):

pip install -r requirements.txt

python3 reduce_model.py \
	--source_model bert-base-multilingual-cased \
	--vocab_file vocab_5langs.txt \
	--output_model bert-base-5lang-cased \
	--convert_to_tf False

Where:

  • --source_model is the multilingual transformer to reduce
  • --vocab_file is the intended vocabulary file path
  • --output_model is the name of the final reduced model
  • --convert_to_tf tells the scipt whether to generate a tenserflow version or not

How to Cite

@inproceedings{smallermbert,
  title={Load What You Need: Smaller Versions of Multilingual BERT},
  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
  booktitle={SustaiNLP / EMNLP},
  year={2020}
}

Contact

Please contact [email protected] for any question, feedback or request.

Owner
Geotrend
Geotrend
Pytorch implementation of paper: "NeurMiPs: Neural Mixture of Planar Experts for View Synthesis"

NeurMips: Neural Mixture of Planar Experts for View Synthesis This is the official repo for PyTorch implementation of paper "NeurMips: Neural Mixture

James Lin 101 Dec 13, 2022
Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships.

feature-set-comp Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships. Reposito

Trent Henderson 7 May 25, 2022
dualFace: Two-Stage Drawing Guidance for Freehand Portrait Sketching (CVMJ)

dualFace dualFace: Two-Stage Drawing Guidance for Freehand Portrait Sketching (CVMJ) We provide python implementations for our CVM 2021 paper "dualFac

Haoran XIE 46 Nov 10, 2022
Semi-SDP Semi-supervised parser for semantic dependency parsing.

Semi-SDP Semi-supervised parser for semantic dependency parsing. This repo contains the code used for the semi-supervised semantic dependency parser i

12 Sep 17, 2021
PG2Net: Personalized and Group PreferenceGuided Network for Next Place Prediction

PG2Net PG2Net:Personalized and Group Preference Guided Network for Next Place Prediction Datasets Experiment results on two Foursquare check-in datase

Urban Mobility 5 Dec 20, 2022
This program uses trial auth token of Azure Cognitive Services to do speech synthesis for you.

🗣️ aspeak A simple text-to-speech client using azure TTS API(trial). 😆 TL;DR: This program uses trial auth token of Azure Cognitive Services to do s

Levi Zim 359 Jan 05, 2023
DrNAS: Dirichlet Neural Architecture Search

This paper proposes a novel differentiable architecture search method by formulating it into a distribution learning problem. We treat the continuously relaxed architecture mixing weight as random va

Xiangning Chen 37 Jan 03, 2023
🍷 Gracefully claim weekly free games and monthly content from Epic Store.

EPIC 免费人 🚀 优雅地领取 Epic 免费游戏 Introduction 👋 Epic AwesomeGamer 帮助玩家优雅地领取 Epic 免费游戏。 使用 「Epic免费人」可以实现如下需求: get:搬空游戏商店,获取所有常驻免费游戏与免费附加内容; claim:领取周免游戏及其免

571 Dec 28, 2022
Keep CALM and Improve Visual Feature Attribution

Keep CALM and Improve Visual Feature Attribution Jae Myung Kim1*, Junsuk Choe1*, Zeynep Akata2, Seong Joon Oh1† * Equal contribution † Corresponding a

NAVER AI 90 Dec 07, 2022
CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

UC2 UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu,

Mingyang Zhou 28 Dec 30, 2022
Deep Learning for Human Part Discovery in Images - Chainer implementation

Deep Learning for Human Part Discovery in Images - Chainer implementation NOTE: This is not official implementation. Original paper is Deep Learning f

Shintaro Shiba 63 Sep 25, 2022
Multiband spectro-radiometric satellite image analysis with K-means cluster algorithm

Multi-band Spectro Radiomertric Image Analysis with K-means Cluster Algorithm Overview Multi-band Spectro Radiomertric images are images comprising of

Chibueze Henry 6 Mar 16, 2022
SMPL-X: A new joint 3D model of the human body, face and hands together

SMPL-X: A new joint 3D model of the human body, face and hands together [Paper Page] [Paper] [Supp. Mat.] Table of Contents License Description News I

Vassilis Choutas 1k Jan 09, 2023
Deep learned, hardware-accelerated 3D object pose estimation

Isaac ROS Pose Estimation Overview This repository provides NVIDIA GPU-accelerated packages for 3D object pose estimation. Using a deep learned pose e

NVIDIA Isaac ROS 41 Dec 18, 2022
A torch implementation of "Pixel-Level Domain Transfer"

Pixel Level Domain Transfer A torch implementation of "Pixel-Level Domain Transfer". based on dcgan.torch. Dataset The dataset used is "LookBook", fro

Fei Xia 260 Sep 02, 2022
Compare GAN code.

Compare GAN This repository offers TensorFlow implementations for many components related to Generative Adversarial Networks: losses (such non-saturat

Google 1.8k Jan 05, 2023
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim

16 Dec 08, 2022
Open source person re-identification library in python

Open-ReID Open-ReID is a lightweight library of person re-identification for research purpose. It aims to provide a uniform interface for different da

Tong Xiao 1.3k Jan 01, 2023
PyTorch GPU implementation of the ES-RNN model for time series forecasting

Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm A GPU-enabled version of the hybrid ES-RNN model by Slawek et al that won the M4 time-series

Kaung 305 Jan 03, 2023
Training DALL-E with volunteers from all over the Internet using hivemind and dalle-pytorch (NeurIPS 2021 demo)

Training DALL-E with volunteers from all over the Internet This repository is a part of the NeurIPS 2021 demonstration "Training Transformers Together

<a href=[email protected]"> 19 Dec 13, 2022