Implementation of legal QA system based on SentenceKoBART

Last update: Dec 27, 2022

Overview

LegalQA using SentenceKoBART

Shows how to train SentenceKoBART
Built on the Jina neural search engine
Provides Korean legal QA data (1,830 pairs)

Setup

# Install Git LFS (see https://github.com/git-lfs/git-lfs/wiki/Installation)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git clone https://github.com/haven-jeon/LegalQA.git
cd LegalQA
git lfs pull
pip install -r requirements.txt
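
Once the repository is cloned, the QA data file named in the License section (data/legalqa.jsonlines) can be inspected directly. The following is a minimal sketch, assuming only that the file follows the usual JSON Lines convention of one JSON object per line. The helper name load_jsonlines is illustrative and not part of the repository, and the field names are not documented here, so the script prints them instead of assuming them.

```python
import json
from pathlib import Path


def load_jsonlines(path):
    """Read a JSON Lines file: one JSON value per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Path taken from the License section; run from the repository root.
    data_path = Path("data/legalqa.jsonlines")
    if data_path.exists():
        records = load_jsonlines(data_path)
        print(f"{len(records)} records")
        # Inspect the field names rather than assuming them.
        if records and isinstance(records[0], dict):
            print("fields:", sorted(records[0].keys()))
    else:
        print(f"{data_path} not found; run this from the LegalQA directory")
```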

Index

python app.py -t index

GPU-based indexing is available as an option. To enable it, set the following in pods/encoder.yml:

on_gpu: true

Search

With REST API

To start the Jina server for REST API:

python app.py -t query_restful

Then use a client to query:

curl --request POST -d '{"top_k": 1, "mode": "search",  "data": ["상속 관련 문의"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:1234/api/search'

Or use Jinabox with endpoint http://127.0.0.1:1234/api/search
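
The same query can also be sent from Python. The sketch below mirrors the curl call using only the standard library. The function names build_search_request and search are illustrative, not part of the repository, and the response is printed as returned because its structure is not documented here. It assumes the server was started with python app.py -t query_restful and is listening on port 1234.

```python
import json
import urllib.request

# Endpoint as given in this section.
ENDPOINT = "http://127.0.0.1:1234/api/search"


def build_search_request(query, top_k=1, endpoint=ENDPOINT):
    """Build the same POST request as the curl example."""
    payload = {"top_k": top_k, "mode": "search", "data": [query]}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, top_k=1, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the decoded JSON response unchanged."""
    req = build_search_request(query, top_k, endpoint)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        result = search("상속 관련 문의")
        print(json.dumps(result, ensure_ascii=False, indent=2))
    except OSError as exc:  # includes URLError, e.g. server not running
        print(f"request failed: {exc}")
```

Keeping the request construction separate from the network call makes the payload easy to check without a running server.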

From the terminal

python app.py -t query

Demo

http://ec2-3-36-123-253.ap-northeast-2.compute.amazonaws.com:7874/

Citation

Model training, data crawling, and demo system were all supported by the AWS Hero program.

@misc{heewon2021,
author = {Heewon Jeon},
title = {LegalQA using SentenceKoBART},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/LegalQA}}
}

License

The QA data in data/legalqa.jsonlines was crawled from www.freelawfirm.co.kr in accordance with its robots.txt. The data is for academic use only; commercial use is prohibited.
We are not responsible for any legal decisions you make based on the resources provided here.

Owner

Heewon Jeon (gogamza)
