This repository contains the data used in the NAACL 2021 paper "Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems".

Last update: Dec 04, 2022

Overview

Proteno

This is the data release for the NAACL 2021 paper "Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems" (https://arxiv.org/abs/2104.07777).

Security

See CONTRIBUTING for more information.

License

This project is released under CC-BY-NC-4.0 and other licenses:

English: CC-BY-SA
Spanish: CC-BY-SA
Tamil: CC-BY-NC-SA

Citation

If you use our data, please cite the following paper:

@inproceedings{tyagi-etal-2021-proteno,
    title = "Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems",
    author = "Tyagi, Shubhi  and
      Bonafonte, Antonio  and
      Lorenzo-Trueba, Jaime  and
      Latorre, Javier",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-industry.10",
    pages = "72--79",
}
