offline-training-pipeline

This is the offline-training-pipeline for our project.

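We follow an offline-training, online-prediction structure for the machine learning system: the model is trained and updated offline, and the trained model is then used to serve predictions online.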
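The split between the two stages can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `train_offline` and `predict_online`, the JSON artifact, and the toy word-count "model" are hypothetical stand-ins and not this project's code. The point is only that the offline stage writes an artifact and the online stage reads it.

```python
import json
import tempfile
from pathlib import Path


def train_offline(examples, artifact_path):
    """Offline stage: fit a toy model on labelled texts and save it to disk."""
    # Toy stand-in for real training: count word occurrences per label.
    counts = {"fake": {}, "real": {}}
    for text, label in examples:
        for word in text.lower().split():
            counts[label][word] = counts[label].get(word, 0) + 1
    artifact_path.write_text(json.dumps(counts))


def predict_online(text, artifact_path):
    """Online stage: load the saved artifact and classify one incoming text."""
    counts = json.loads(artifact_path.read_text())
    words = text.lower().split()
    # Score each label by how many known words of that label appear in the text.
    scores = {
        label: sum(vocab.get(word, 0) for word in words)
        for label, vocab in counts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical training batch, for illustration only.
training_batch = [
    ("miracle cure kills virus overnight", "fake"),
    ("health ministry reports new cases", "real"),
]

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.json"
    train_offline(training_batch, artifact)  # runs on a schedule, offline
    prediction_a = predict_online("ministry reports cases today", artifact)
    prediction_b = predict_online("miracle cure found", artifact)

print(prediction_a, prediction_b)
```

Because the online stage only depends on the saved artifact, the offline stage can be rerun on newer data and the artifact replaced without changing the prediction code.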

We took the pre-trained DistilBERT language model and fine-tuned it for the downstream fake news classification task.
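A single fine-tuning step for a DistilBERT sequence classifier might look like the sketch below, which uses the Hugging Face `transformers` library and PyTorch. It is not this project's training code. To stay runnable without downloading weights, it builds a deliberately tiny, randomly initialised model from a `DistilBertConfig`; the config sizes, the learning rate, the random token ids, and the helper names `build_tiny_classifier` and `train_step` are all assumptions chosen for illustration.

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification


def build_tiny_classifier(num_labels=2):
    """Build a small, randomly initialised DistilBERT classifier (no download).

    For actual fine-tuning one would load pre-trained weights instead, e.g.
    DistilBertForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)
    """
    config = DistilBertConfig(
        vocab_size=100,              # assumed toy vocabulary size
        dim=32,                      # hidden size, must be divisible by n_heads
        n_layers=1,
        n_heads=2,
        hidden_dim=64,               # feed-forward size
        max_position_embeddings=64,
        num_labels=num_labels,       # two classes: real vs. fake
    )
    return DistilBertForSequenceClassification(config)


def train_step(model, optimizer, input_ids, attention_mask, labels):
    """Run one optimisation step and return the classification loss."""
    model.train()
    optimizer.zero_grad()
    # Passing labels makes the model compute the cross-entropy loss itself.
    output = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    output.loss.backward()
    optimizer.step()
    return output.loss.item()


torch.manual_seed(0)
model = build_tiny_classifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Random token ids stand in for a tokenised batch of 4 texts, 16 tokens each.
input_ids = torch.randint(0, 100, (4, 16))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 0, 1])

loss = train_step(model, optimizer, input_ids, attention_mask, labels)

# Inference: one row of logits per text, one column per class.
model.eval()
with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
```

In a real run the random `input_ids` would come from the matching DistilBERT tokenizer, and the step would be repeated over batches of the labelled dataset.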

The initial fine-tuning dataset is taken from the CONSTRAINT workshop at AAAI-21. For routine offline training and updates in the future, we will adopt FakeNewsNet, a data repository with news content, social context, and spatiotemporal information for studying fake news on social media. FakeNewsNet offers up-to-date datasets and is updated on a regular basis. By retraining our model offline, we hope to track the latest trends in popular fake news, cover broader fake news topics, and achieve better performance in online prediction.
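One way to prepare data for such routine retraining is to merge newly collected labelled examples into the existing training set while skipping duplicates. The sketch below is a hedged illustration only: the column names `tweet` and `label`, the label strings `real` and `fake`, the sample rows, and the helper names are assumptions, and the actual dataset files may be laid out differently.

```python
import csv
import io

# Assumed label strings mapped to class ids.
LABEL_TO_ID = {"real": 0, "fake": 1}


def load_labelled_rows(csv_file, text_column="tweet", label_column="label"):
    """Read (text, label_id) pairs from a CSV file object, skipping unknown labels."""
    rows = []
    for row in csv.DictReader(csv_file):
        label = row[label_column].strip().lower()
        if label in LABEL_TO_ID:
            rows.append((row[text_column].strip(), LABEL_TO_ID[label]))
    return rows


def merge_for_retraining(existing, incoming):
    """Append incoming rows to existing ones, dropping case-insensitive duplicate texts."""
    seen = set()
    merged = []
    for text, label in existing + incoming:
        key = text.lower()
        if key not in seen:
            seen.add(key)
            merged.append((text, label))
    return merged


# In-memory CSVs stand in for the initial dataset and a later update.
initial_csv = io.StringIO(
    "id,tweet,label\n"
    "1,Vaccine trial enters phase three,real\n"
    "2,Garlic water cures the virus,fake\n"
)
new_batch_csv = io.StringIO(
    "id,tweet,label\n"
    "7,garlic water cures the virus,fake\n"
    "8,City announces new testing sites,real\n"
)

initial_rows = load_labelled_rows(initial_csv)
new_rows = load_labelled_rows(new_batch_csv)
training_set = merge_for_retraining(initial_rows, new_rows)
print(len(training_set))
```

The merged list keeps the first occurrence of each text, so examples already in the training set are not counted twice when an update repeats them.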

References:

@misc{patwa2020fighting, title={Fighting an Infodemic: COVID-19 Fake News Dataset}, author={Parth Patwa and Shivam Sharma and Srinivas PYKL and Vineeth Guptha and Gitanjali Kumari and Md Shad Akhtar and Asif Ekbal and Amitava Das and Tanmoy Chakraborty}, year={2020}, eprint={2011.03327}, archivePrefix={arXiv}, primaryClass={cs.CL} }

@article{sanh2019distilbert, title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter}, author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas}, journal={arXiv preprint arXiv:1910.01108}, year={2019} }

@article{shu2020fakenewsnet, title={Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media}, author={Shu, Kai and Mahudeswaran, Deepak and Wang, Suhang and Lee, Dongwon and Liu, Huan}, journal={Big data}, volume={8}, number={3}, pages={171--188}, year={2020}, publisher={Mary Ann Liebert, Inc., publishers 140 Huguenot Street, 3rd Floor New~…} }


A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

[ICLR 2021 Spotlight] Pytorch implementation for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."

Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Tool which allow you to detect and translate text.

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks,

API for the GPT-J language model 🦜. Including a FastAPI backend and a streamlit frontend

Pytorch implementation of winner from VQA Chllange Workshop in CVPR'17

Spacy-ginza-ner-webapi - Named Entity Recognition API with spaCy and GiNZA

Dope Wars game engine on StarkNet L2 roll-up

PyTorch implementation of the paper: Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

NVDA, the free and open source Screen Reader for Microsoft Windows

Label data using HuggingFace's transformers and automatically get a prediction service

Global Rhythm Style Transfer Without Text Transcriptions

In this Notebook I've build some machine-learning and deep-learning to classify corona virus tweets, in both multi class classification and binary classification.

Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2

Fully featured implementation of Routing Transformer

Japanese Long-Unit-Word Tokenizer with RemBertTokenizerFast of Transformers