Quantifiers-and-Negations-in-RE-Documents

This project was part of my work for a seminar at the Technical University of Munich (TUM) during my bachelor's studies in 2019. The Python project finds quantifiers and negations in documents and searches for problematic findings, for example sentences that combine quantifiers and negations in a way that is ambiguous, meaning the sentence has multiple valid interpretations. For instance, "All users cannot log in" can mean either that no user can log in or that not every user can. The project can extract such sentences and report them.

Motivation:

Ambiguous sentences are best avoided, as they can cause problems that are hard to find and possibly hard to fix. This is especially true for technical specifications and similar documents. In this project we compare two different approaches to finding ambiguous sentences:

String-based search
NLP-based search

We want to find out whether the computational overhead of NLP is justified by better results than standard string-based search methods.

Features:

Detect quantifiers and negations in .xml or .txt documents
Search with either a string-based or an NLP-based approach (the latter using Stanford's CoreNLP library [1])
Extract possibly ambiguous sentences
Compare string search results with NLP search results
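
As a rough illustration of the string-based approach, the following minimal sketch flags sentences that contain at least one quantifier and one negation. It is not the project's actual implementation: the word lists, the regular expressions, and the function name find_candidates are assumptions made for this example, and the project's own rules for which combinations count as ambiguous may be stricter.

```python
import re

# Hypothetical word lists for illustration; the project's actual lists may differ.
QUANTIFIERS = {"all", "every", "each", "any", "some", "many", "few"}
NEGATIONS = {"not", "no", "never", "none", "cannot"}

# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
# Lowercase words, keeping apostrophes so contractions like "doesn't" stay whole
TOKEN = re.compile(r"[a-z']+")


def find_candidates(text):
    """Return sentences that contain at least one quantifier and one negation."""
    candidates = []
    for sentence in SENTENCE_SPLIT.split(text.strip()):
        tokens = TOKEN.findall(sentence.lower())
        has_quantifier = any(t in QUANTIFIERS for t in tokens)
        has_negation = any(t in NEGATIONS or t.endswith("n't") for t in tokens)
        if has_quantifier and has_negation:
            candidates.append(sentence)
    return candidates


if __name__ == "__main__":
    sample = (
        "All users cannot log in. "
        "The system shall log every request. "
        "The server doesn't accept any empty request."
    )
    for hit in find_candidates(sample):
        print(hit)
```

A pure string match like this cannot tell whether the negation actually interacts with the quantifier, which is the kind of distinction the NLP-based search is meant to address.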

Prerequisites:

Java 8 or higher
Python 3.6 or higher as project interpreter
Stanford CoreNLP library: https://stanfordnlp.github.io/CoreNLP/download.html
Environment variable "CORENLP_HOME" set to the directory where the CoreNLP library is stored
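
Before running the NLP-based search, it can help to verify that the environment variable is set up as expected. The check below is a small sketch and not part of the project; the helper name corenlp_home is hypothetical, and it only confirms that "CORENLP_HOME" points to an existing directory, not that the directory contains a working CoreNLP installation.

```python
import os
from pathlib import Path


def corenlp_home(env=os.environ):
    """Return the directory named by CORENLP_HOME, or raise a clear error."""
    value = env.get("CORENLP_HOME")
    if not value:
        raise RuntimeError("CORENLP_HOME is not set")
    path = Path(value)
    if not path.is_dir():
        raise RuntimeError(f"CORENLP_HOME does not point to a directory: {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"Using CoreNLP at {corenlp_home()}")
    except RuntimeError as err:
        print(f"Setup problem: {err}")
```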

References:

[1] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.

Owner

Nicolas Ruscher
