In this Notebook I've build some machine-learning and deep-learning to classify corona virus tweets, in both multi class classification and binary classification.
Overview
Main repository for the chatbot Bobotinho.
Bobotinho Bot Main repository for the chatbot Bobotinho. ℹ️ Introduction Twitch chatbot with entertainment commands. 💻 Technologies Concurrent code
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first step in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is regular-expression based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
Ucto for Python This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first step in almost any Natural Language Processing task,
Synthetic data for the people.
zpy: Synthetic data in Blender. Website • Install • Docs • Examples • CLI • Contribute • Licence Abstract Collecting, labeling, and cleaning data for
SGMC: Spectral Graph Matrix Completion
SGMC: Spectral Graph Matrix Completion Code for AAAI21 paper "Scalable and Explainable 1-Bit Matrix Completion via Graph Signal Learning". Data Format
EMNLP 2021 paper "Pre-train or Annotate? Domain Adaptation with a Constrained Budget".
Pre-train or Annotate? Domain Adaptation with a Constrained Budget This repo contains code and data associated with EMNLP 2021 paper "Pre-train or Ann
中文医疗信息处理基准CBLUE: A Chinese Biomedical LanguageUnderstanding Evaluation Benchmark
English | 中文说明 CBLUE AI (Artificial Intelligence) is playing an indispensabe role in the biomedical field, helping improve medical technology. For fur
Built for cleaning purposes in military institutions
Ferramenta do AL Construído para fins de limpeza em instituições militares. Instalação Requer python = 3.2 pip install -r requirements.txt Usagem Exe
Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"
PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art
Every Google, Azure & IBM text to speech voice for free
TTS-Grabber Quick thing i made about a year ago to download any text with any tts voice, over 630 voices to choose from currently. It will split the i
[WWW 2021 GLB] New Benchmarks for Learning on Non-Homophilous Graphs
New Benchmarks for Learning on Non-Homophilous Graphs Here are the codes and datasets accompanying the paper: New Benchmarks for Learning on Non-Homop
Simple virtual assistant using pyttsx3 and speech recognition optionally with pywhatkit and pther libraries.
VirtualAssistant Simple virtual assistant using pyttsx3 and speech recognition optionally with pywhatkit and pther libraries. Third Party Libraries us
My implementation of Safaricom Machine Learning Codility test. The code has bugs, logical I guess I made errors and any correction will be appreciated.
Safaricom_Codility Machine Learning 2022 The test entails two questions. Question 1 was on Machine Learning. Question 2 was on SQL I ran out of time.
Free and Open Source Machine Translation API. 100% self-hosted, offline capable and easy to setup.
LibreTranslate Try it online! | API Docs | Community Forum Free and Open Source Machine Translation API, entirely self-hosted. Unlike other APIs, it d
Codes for coreference-aware machine reading comprehension
Data and code for the paper "Tracing Origins: Coreference-aware Machine Reading Comprehension" at ACL2022. Dataset There are three folders for our thr
Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs
Google Text-To-Speech Batch Prompt File Maker Are you in the need of IVR prompts, but you have no voice actors? Let Google talk your prompts like a pr
Unsupervised intent recognition
INTENT author: steeve LAQUITAINE description: deployment pattern: currently batch only Setup & run git clone https://github.com/slq0/intent.git bash
SDL: Synthetic Document Layout dataset
SDL is the project that synthesizes document images. It facilitates multiple-level labeling on document images and can generate in multiple languages.
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset
PyTorch Large-Scale Language Model A Large-Scale PyTorch Language Model trained on the 1-Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perp
Python implementation of TextRank for phrase extraction and summarization of text documents
PyTextRank PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, used to: extract the top-ranked phrases from text document
RecipeReduce: Simplified Recipe Processing for Lazy Programmers
RecipeReduce This repo will help you figure out the amount of ingredients to buy for a certain number of meals with selected recipes. RecipeReduce Get