AllenNLP integration for Shiba: Japanese CANINE model

Last update: Feb 16, 2022

Overview

AllenNLP Integration for Shiba

allennlp-shiba-model is a Python library that provides AllenNLP integration for shiba-model.

SHIBA is an approximate reimplementation of CANINE [1] in raw PyTorch, pretrained on the Japanese Wikipedia corpus using random span masking. If you are unfamiliar with CANINE, you can think of it as a character-level BERT model that is roughly 4x as efficient. Of course, the name SHIBA comes from the identically named Japanese canine.
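To make "character-level" concrete, here is a simplified sketch of the CANINE-style input path in plain Python. It is not SHIBA's code: the hash scheme, the default of 8 hashes and 16384 buckets, and the mean-pooling stand-in for CANINE's strided convolution are all illustrative assumptions. It shows that text becomes Unicode code points with no vocabulary, and that downsampling by a rate of 4 shortens the sequence the deep Transformer layers have to process.

```python
from typing import List


def to_codepoints(text: str) -> List[int]:
    # Character-level input: every character maps to its Unicode code point,
    # so no subword vocabulary or word segmenter is required.
    return [ord(ch) for ch in text]


def hash_buckets(codepoint: int, num_hashes: int = 8, num_buckets: int = 16384) -> List[int]:
    # Illustrative multi-hash bucketing (assumed scheme, not SHIBA's):
    # hash k sends the code point to one row of a hypothetical embedding table.
    return [((codepoint + 1) * (2 * k + 31)) % num_buckets for k in range(num_hashes)]


def downsample(states: List[List[float]], rate: int = 4) -> List[List[float]]:
    # Mean-pool each window of `rate` consecutive positions, a simple stand-in
    # for the strided convolution CANINE uses to shorten the sequence.
    pooled = []
    for start in range(0, len(states), rate):
        window = states[start:start + rate]
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled


text = "柴犬はかわいい"  # 7 characters
ids = to_codepoints(text)
# Toy 2-dimensional "embeddings": the first hash bucket plus the raw code point.
states = [[float(hash_buckets(cp)[0]), float(cp)] for cp in ids]
print(len(states), len(downsample(states)))  # 7 positions -> 2 after downsampling
```

In this toy run, 7 character positions collapse to 2 pooled positions, which is the kind of length reduction that lets a character-level model avoid running its full Transformer stack at character resolution.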

Installation

The library and its dependencies can be installed with pip:

pip install allennlp-shiba

Example

This library lets you specify the Shiba tokenizer, token indexer, and token embedder in a jsonnet config file. Here is an example config:

{
    "dataset_reader": {
        "tokenizer": {
            "type": "shiba",
        },
        "token_indexers": {
            "tokens": {
                "type": "shiba",
            }
        },
    },
    "model": {
        "shiba_embedder": {
            "type": "basic",
            "token_embedders": {
                // this key should match the token indexer name ("tokens") above
                "tokens": {
                    "type": "shiba",
                    "eval_model": true,
                }
            }
        }
    }
}
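AllenNLP's `basic` text field embedder expects the names under `token_embedders` to match the names under the dataset reader's `token_indexers`, and raises a `ConfigurationError` at forward time when they differ. The sketch below is a hypothetical pre-flight check, assuming the config has already been loaded into a plain Python dict; `check_embedder_keys` is an illustrative helper, not part of allennlp-shiba or AllenNLP.

```python
from typing import Any, Dict


def check_embedder_keys(config: Dict[str, Any], embedder_field: str) -> None:
    """Raise ValueError if token indexer names and token embedder names differ."""
    indexer_names = set(config["dataset_reader"]["token_indexers"])
    embedder_names = set(config["model"][embedder_field]["token_embedders"])
    if indexer_names != embedder_names:
        raise ValueError(
            f"indexers {sorted(indexer_names)} != embedders {sorted(embedder_names)}"
        )


# Hypothetical dict form of a config like the one above.
config = {
    "dataset_reader": {
        "tokenizer": {"type": "shiba"},
        "token_indexers": {"tokens": {"type": "shiba"}},
    },
    "model": {
        "shiba_embedder": {
            "type": "basic",
            "token_embedders": {"tokens": {"type": "shiba", "eval_model": True}},
        }
    },
}

check_embedder_keys(config, "shiba_embedder")  # passes: both sides use "tokens"
```

Running a check like this before training surfaces a naming mismatch immediately instead of after the data has been read and the first batch reaches the model.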

Reference

Joshua Tanner and Masato Hagiwara (2021). SHIBA: Japanese CANINE model. GitHub repository.


Releases (latest: v0.1.1)

v0.1.1 (Jun 26, 2021)

v0.1.0 (Jun 26, 2021)

v0.0.1 (Jun 26, 2021)

Owner

Shunsuke KITADA
