Machine learning models from Singapore's NLP research community

SG-NLP

Machine learning models from Singapore's natural language processing (NLP) research community.

sgnlp is a Python package that makes it easy to get started with various NLP models implemented using the PyTorch and Transformers frameworks.

We have an accompanying demo site where you can interact with our models and get a better understanding of how they work.

Installation

  • Python >= 3.8

    pip install sgnlp

Documentation

Visit our documentation for tutorials.

License

Code and models from this project are released under the MIT License unless otherwise stated. If a model's code is under a separate license, it can be found in the respective model's folder.

Comments
  • Change demo api to use gevent worker

    • Using multiple workers of the default type 'sync' in gunicorn is not working on Kubernetes
    • Workers constantly terminated due to signal 9
    • Try gevent to see if it works out
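
One way to try the gevent worker is through a gunicorn configuration file, which is plain Python. This is a minimal sketch, assuming gevent is installed alongside gunicorn; the file name, bind address, worker count and timeout are illustrative values, not the project's actual settings.

```python
# gunicorn.conf.py (hypothetical file name and values, for illustration only)

# Address the demo API listens on; the port is an assumption.
bind = "0.0.0.0:8000"

# Number of worker processes; tune to the CPU and memory limits of the pod.
workers = 2

# Use gevent's cooperative workers instead of the default "sync" class.
# Requires the gevent package to be installed.
worker_class = "gevent"

# Seconds a worker may stay silent before gunicorn restarts it.
timeout = 120
```

Such a file could be passed with `gunicorn -c gunicorn.conf.py api:app`, where `api:app` is a placeholder for the actual module and application object.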
    opened by jonheng 2
  • UFD use case tutorial and usability improvement

    • Added additional tutorial on how to use UFD to train and evaluate on custom dataset
    • Bug fix for UFD parse_args_and_load_config util function
    • Added feature to create folder if folder doesn't exist
    • Added some training argument parameters to the evaluation arguments to improve usability
    • Made caching optional
    • Added validation to make debugging easier
    • Added links to config file examples for reccon models
    opened by vincenttzc 1
  • Wrong assert comparison for SenticGCN dataclass

    In the latest SenticGCN implementation on the dev branch, the __post_init__ method of SenticGCNTrainArgs in dataclass.py contains the following assertions:

    assert self.repeats > 1, "Repeats value must be at least 1."
    assert self.patience > 1, "Patience value must be at least 1." 
    

    Both messages say "at least 1", but > 1 rejects the value 1, so the comparison operator should be >= instead.
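
A minimal sketch of the corrected checks follows. The class below is a hypothetical two-field stand-in, not the real SenticGCNTrainArgs, which has more fields.

```python
from dataclasses import dataclass


@dataclass
class TrainArgsSketch:
    """Hypothetical stand-in for SenticGCNTrainArgs with only two fields."""

    repeats: int = 1
    patience: int = 1

    def __post_init__(self):
        # ">= 1" accepts the value 1, which is what the messages promise.
        assert self.repeats >= 1, "Repeats value must be at least 1."
        assert self.patience >= 1, "Patience value must be at least 1."


# A value of 1 is accepted; 0 would still raise AssertionError.
args = TrainArgsSketch(repeats=1, patience=1)
```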

    bug 
    opened by raymondng76 0
  • 47 centralized logging

    • Create a centralized logger for 'sgnlp' base logger
    • The 'sgnlp' logger is created from a JSON config and is initialized in the 'sgnlp' module's __init__.py
    • Replace all logging method calls with each script's own logger
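
The pattern described in these bullets can be sketched with the standard library's `logging.config.dictConfig`. The JSON content, the helper name `setup_base_logger`, and the child logger name are assumptions made for illustration; the package's real config file and layout may differ.

```python
import json
import logging
import logging.config

# Hypothetical logging config; in the package this would live in a JSON file.
LOGGING_CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}
  },
  "handlers": {
    "console": {
      "class": "logging.StreamHandler",
      "formatter": "simple",
      "level": "INFO"
    }
  },
  "loggers": {
    "sgnlp": {"handlers": ["console"], "level": "INFO", "propagate": false}
  }
}
"""


def setup_base_logger(config_json: str = LOGGING_CONFIG_JSON) -> logging.Logger:
    """Configure the 'sgnlp' base logger from a JSON string and return it."""
    logging.config.dictConfig(json.loads(config_json))
    return logging.getLogger("sgnlp")


# Would run once, for example from the package's __init__.py.
base_logger = setup_base_logger()

# Each script then requests its own child logger, which inherits the
# level and handlers of the 'sgnlp' base logger through propagation.
logger = logging.getLogger("sgnlp.models.example")  # hypothetical module name
logger.info("training started")
```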
    opened by raymondng76 0
  • Add parent class for preprocessor

    • [x] Create a module named sgnlp.base
    • [x] Add abstractmethods for preprocess, save, load
    • [x] Add batch iteration to parent __call__
    • [x] Parent __call__ should return a dictionary
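
The checklist above could be realized roughly as follows. This is a sketch under assumptions: the class names, the `batch_size` parameter and the method signatures are hypothetical and are not taken from sgnlp.base.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class PreprocessorBase(ABC):
    """Hypothetical parent class; names and signatures are assumptions."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size

    @abstractmethod
    def preprocess(self, batch: List[str]) -> Dict[str, List[Any]]:
        """Turn one batch of raw inputs into a dictionary of features."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist any state the preprocessor needs."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore state written by save()."""

    def __call__(self, data: List[str]) -> Dict[str, List[Any]]:
        # Iterate over the input in batches and merge the per-batch
        # dictionaries into a single dictionary of lists.
        output: Dict[str, List[Any]] = {}
        for start in range(0, len(data), self.batch_size):
            batch_output = self.preprocess(data[start:start + self.batch_size])
            for key, values in batch_output.items():
                output.setdefault(key, []).extend(values)
        return output


class WhitespacePreprocessor(PreprocessorBase):
    """Toy subclass used only to exercise the parent class."""

    def preprocess(self, batch):
        return {"tokens": [text.split() for text in batch]}

    def save(self, path):
        pass  # nothing to persist in this toy example

    def load(self, path):
        pass  # nothing to restore in this toy example


features = WhitespacePreprocessor(batch_size=2)(["a b", "c", "d e f"])
```

Keeping the batching in the parent `__call__` means subclasses only implement `preprocess` for a single batch.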
    enhancement 
    opened by jonheng 0
  • 46 senticgcn bugfix

    • Add multi-word aspect support
    • Update documentation to reflect multi-word support
    • Update unit tests
    • Update usage example to include multi-word support
    opened by raymondng76 0
  • Fix multi-word aspect issue with Sentic-GCN preprocessor

    The current implementation of the preprocessor matches a single aspect index for the purpose of matching the postprocessor output. The aspect index field of the process_input payload should be expanded to handle aspects with multiple indexes.
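
To illustrate what multiple indexes per aspect means, here is a small sketch. It assumes whitespace tokenization and exact token matching, and the function name `find_aspect_indices` is hypothetical; it is not the Sentic-GCN preprocessor's actual logic.

```python
from typing import List


def find_aspect_indices(tokens: List[str], aspect: str) -> List[List[int]]:
    """Return the token indices of every occurrence of `aspect` in `tokens`.

    Each occurrence is a list of consecutive indices, so a multi-word
    aspect yields several indices rather than a single one.
    """
    aspect_tokens = aspect.split()
    width = len(aspect_tokens)
    matches: List[List[int]] = []
    if width == 0:
        return matches
    # Slide a window of the aspect's length over the sentence tokens.
    for start in range(len(tokens) - width + 1):
        if tokens[start:start + width] == aspect_tokens:
            matches.append(list(range(start, start + width)))
    return matches


tokens = "the battery life is great but the screen is dim".split()
print(find_aspect_indices(tokens, "battery life"))  # [[1, 2]]
print(find_aspect_indices(tokens, "screen"))        # [[7]]
```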

    bug 
    opened by raymondng76 0
  • Add Sentic-GCN demo_api to SGNlp

    Close #43

    This pull request adds the Sentic-GCN demo_api models to sgnlp. It includes the following components:

    • model_card
    • api.py
    • dockerfiles
    • requirements.txt
    • usage.py
    opened by K-WeiMing 0
  • Add Sentic-GCN to SGNlp

    close #41

    This pull request adds the Sentic-GCN models to sgnlp. It includes the following components:

    • Models
    • Configs
    • Tokenizers
    • Embedding models
    • Trainer/Evaluator
    • Unit test
    • documentation

    This does not include demo_api, which is covered in a separate issue.

    opened by raymondng76 0
  • download_pretrained for demo API does not cache downloaded files/models

    To allow the containers to start up quicker, models and files were downloaded and cached during build time.

    Recent changes in the Hugging Face transformers package have broken this functionality:

    • Released in v4.22.0
    • Issue

    Possible choices moving forward:

    • Write a simple caching utility function
    • Stick to versions of transformers before 4.22.0
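
The first option could look something like the sketch below. The function name `cached_download` and its parameters are hypothetical; it relies only on the standard library and does not reproduce how download_pretrained or transformers resolve files.

```python
import os
import urllib.request
from pathlib import Path


def cached_download(url: str, cache_dir: str, filename: str) -> Path:
    """Download `url` into `cache_dir` once; later calls reuse the local copy."""
    target = Path(cache_dir) / filename
    if target.exists():
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so that a failed download never
    # leaves a partial file that looks like a valid cache entry.
    tmp_path = target.with_name(target.name + ".part")
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, target)
    return target
```

Calling this helper during the image build would place the files in `cache_dir`, and the same call at container start-up would then return immediately.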
    opened by jonheng 0
  • Add Stance Detection model

    opened by atenzer 0
Releases (v0.4.0)
Owner
AI Singapore | AI Makerspace
Growing local AI talent and empowering start-ups, SMEs and enterprises with AI components, frameworks, platforms and advisory services.