This is an NLP-based project that extracts the effective date of a contract from its text file.

Last update: Jan 26, 2022

Overview

Date-Extraction-from-Contracts

Problem statement

The task is to identify the effective date of each contract from the contract's text. The dates can appear in any format, for example: 01/01/2022; 1st Jan, 2022; 1st January, 2022; 01 Jan 2022.
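To make the format variety concrete, here is a minimal sketch that normalizes the four example formats above into a (day, month, year) triple using only the standard library. The function name `parse_date`, the format list, and the day-first reading of `01/01/2022` are assumptions made for illustration. This is not the project's extraction method, which uses a trained model; it also assumes English month names.

```python
import re
from datetime import datetime

# Assumed formats, covering only the examples in the problem statement.
# "%d/%m/%Y" reads slash dates day-first, which is an assumption.
FORMATS = ["%d/%m/%Y", "%d %b %Y", "%d %B %Y"]

def parse_date(text):
    """Return (day, month, year) for a date string, or None if no format fits."""
    # Drop ordinal suffixes that directly follow a digit: "1st" -> "1".
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text.strip(), flags=re.IGNORECASE)
    # Treat commas as spaces and collapse repeated whitespace.
    cleaned = " ".join(cleaned.replace(",", " ").split())
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
        return parsed.day, parsed.month, parsed.year
    return None

for sample in ["01/01/2022", "1st Jan, 2022", "1st January, 2022", "01 Jan 2022"]:
    print(sample, "->", parse_date(sample))
```

A helper like this could be used to turn labelled dates into numeric day, month, and year targets, though the source does not say how the labels were encoded.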

Libraries Used

NumPy
TensorFlow
Keras
NLTK
scikit-learn
Matplotlib
pandas

Approach

Data preprocessing

A custom function was developed to preprocess the text, because conventional libraries are not designed for preprocessing dates in a text corpus. NLTK was used for the required tokenization and vectorization instead of the TensorFlow or Keras text preprocessors. Preprocessing covers data cleaning (removing improper data labels and file names), stopword removal, punctuation removal that preserves punctuation inside dates such as '/', separating dates from adjacent words (in some cases the digits of a date were joined to the preceding word), tokenization, and vectorization. A plain word-based vectorization was used, since TF-IDF would not have made much difference for date extraction.
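The steps described above can be sketched roughly as follows. This is an illustrative approximation, not the project's actual function: the names `preprocess`, `build_vocab`, and `vectorize` are hypothetical, the stopword set is a tiny hand-written stand-in for a full stopword list, and plain `str.split` stands in for a proper tokenizer so the example needs no downloads.

```python
import re

# Tiny illustrative stopword set (assumption); a real list would be much longer.
STOPWORDS = {"a", "an", "and", "as", "by", "in", "is", "of", "on", "the", "this", "to"}

def preprocess(text):
    """Lowercase, split dates from preceding words, strip punctuation except '/'."""
    text = text.lower()
    # Insert a space where a letter is directly followed by a digit:
    # "date01/01/2022" -> "date 01/01/2022". "1st" is left untouched.
    text = re.sub(r"(?<=[a-z])(?=\d)", " ", text)
    # Replace every character that is not a letter, digit, '/', or whitespace.
    text = re.sub(r"[^a-z0-9/\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]

def build_vocab(token_lists):
    """Map each distinct token to an integer id; 0 is reserved for padding."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def vectorize(tokens, vocab, max_len):
    """Word-index vectorization, truncated or zero-padded to max_len.

    Unknown tokens also map to 0 in this simplified sketch.
    """
    ids = [vocab.get(tok, 0) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))

tokens = preprocess("This Agreement is effective as of the date01/01/2022, by the parties.")
vocab = build_vocab([tokens])
print(tokens)
print(vectorize(tokens, vocab, max_len=8))
```

Keeping '/' in the allowed character set is what lets a date such as 01/01/2022 survive as a single token.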

Preprocessed data before vectorization:

Model Building

The model for this problem is an RNN with a single bidirectional LSTM layer. It takes the preprocessed, vectorized text as input and has three outputs, which predict the day, month, and year of the effective date respectively.
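A model of this shape could be sketched in Keras as below. Only the single bidirectional LSTM layer and the three outputs come from the description above; everything else is an assumption made for illustration, including the vocabulary size, sequence length, embedding and LSTM widths, the number of year classes, and the choice to treat each output as a softmax classification head.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# All sizes below are assumed values, not taken from the project.
VOCAB_SIZE = 5000   # number of distinct token ids, including padding id 0
MAX_LEN = 200       # length of each padded input sequence
NUM_YEARS = 50      # number of year classes the year head can predict

def build_model():
    inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, 64)(inputs)
    # The single bidirectional LSTM layer described in the text.
    x = layers.Bidirectional(layers.LSTM(64))(x)
    # Three heads: one each for day, month, and year (assumed to be classifiers).
    day = layers.Dense(31, activation="softmax", name="day")(x)
    month = layers.Dense(12, activation="softmax", name="month")(x)
    year = layers.Dense(NUM_YEARS, activation="softmax", name="year")(x)
    model = tf.keras.Model(inputs, [day, month, year])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="sparse_categorical_crossentropy",
    )
    return model

model = build_model()
dummy_batch = np.zeros((2, MAX_LEN), dtype="int32")
print([p.shape for p in model.predict(dummy_batch, verbose=0)])
```

With classification heads, the day and month targets would be zero-based class indices (for example, day minus one). Regression outputs would be an equally plausible reading of the description.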

The model was trained for 80 epochs with a batch size of 8, using a decaying learning rate that starts at 0.001.
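The source does not say which decay schedule was used. As one possible reading, the sketch below applies exponential decay per epoch, starting from 0.001 over 80 epochs. The function name and the decay rate of 0.96 are assumptions chosen for illustration.

```python
INITIAL_LR = 0.001   # from the text
EPOCHS = 80          # from the text
DECAY_RATE = 0.96    # assumed; the actual decay factor is not stated

def decayed_learning_rate(epoch, initial_lr=INITIAL_LR, decay_rate=DECAY_RATE):
    """Exponential decay: the rate is multiplied by decay_rate once per epoch."""
    return initial_lr * decay_rate ** epoch

for epoch in (0, 1, 40, EPOCHS - 1):
    print(epoch, decayed_learning_rate(epoch))
```

Keras ships comparable built-in schedules, such as `tf.keras.optimizers.schedules.ExponentialDecay`, which decays per training step rather than per epoch.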

Model Architecture:

Results

The model performed quite well for a baseline that extracts dates using just a single bidirectional LSTM layer. The prediction file is attached for reference.

Owner

Sambhav Garg
