EmoBERT-MLOps

The goal of this repository is to build an end-to-end MLOps pipeline based on the MLOps course from Made with ML. This project differs from the course in its design, tools, and frameworks, with the aim of practicing the material and offering a different angle on, and implementation of, the original course.

This project uses a BERT model for emotion classification and is based on the GoEmotions dataset.

Content list

TODO

Dataset description

Taken from https://ai.googleblog.com/2021/10/goemotions-dataset-for-fine-grained.html

In “GoEmotions: A Dataset of Fine-Grained Emotions”, we describe GoEmotions, a human-annotated dataset of 58k Reddit comments extracted from popular English-language subreddits and labeled with 27 emotion categories. As the largest fully annotated English language fine-grained emotion dataset to date, we designed the GoEmotions taxonomy with both psychology and data applicability in mind. In contrast to the basic six emotions, which include only one positive emotion (joy), our taxonomy includes 12 positive, 11 negative, 4 ambiguous emotion categories and 1 “neutral”, making it widely suitable for conversation understanding tasks that require a subtle differentiation between emotion expressions.
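As a rough illustration of the taxonomy described above, the sketch below groups the 27 emotion names by sentiment, adds "neutral", and multi-hot encodes the labels of a single comment (a comment can carry more than one emotion). The grouping is assumed to match the sentiment mapping published with GoEmotions and should be checked against the dataset files. The helper `encode_labels` and the index order are hypothetical choices for this sketch, not code from this repository.

```python
# Sentiment grouping assumed from the mapping published with GoEmotions;
# verify against the dataset files before relying on it.
POSITIVE = [
    "admiration", "amusement", "approval", "caring", "desire", "excitement",
    "gratitude", "joy", "love", "optimism", "pride", "relief",
]
NEGATIVE = [
    "anger", "annoyance", "disappointment", "disapproval", "disgust",
    "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
]
AMBIGUOUS = ["confusion", "curiosity", "realization", "surprise"]

# Index order is a choice made for this sketch: alphabetical, "neutral" last.
LABELS = sorted(POSITIVE + NEGATIVE + AMBIGUOUS) + ["neutral"]
LABEL_TO_ID = {name: index for index, name in enumerate(LABELS)}


def encode_labels(names):
    """Return a multi-hot list of 0/1 values for the given emotion names."""
    vector = [0] * len(LABELS)
    for name in names:
        vector[LABEL_TO_ID[name]] = 1
    return vector


if __name__ == "__main__":
    # Category counts per sentiment group, then the total including "neutral".
    print(len(POSITIVE), len(NEGATIVE), len(AMBIGUOUS), len(LABELS))
    # A comment annotated with two emotions sets two positions to 1.
    print(encode_labels(["joy", "gratitude"]))
```

A multi-hot target like this is the usual input for a multi-label classification head, where each emotion is predicted independently.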

Model description

TODO

End-to-end MLOps pipeline of a BERT model for emotion classification.

Owner

Dimitre Oliveira
