Automate the case review on legal case documents and find the most critical cases using network analysis

Overview

Automation on Legal Court Cases Review

This project automates the review of legal case documents and identifies the most critical cases using network analysis.

Short write-up

Affiliation: Institute for Social and Economic Research and Policy, Columbia University

Project Information:

Keywords: Automation, PDF parse, String Extraction, Network Analysis

Software:

  • Python: pdfminer, LexNLP, nltk, sklearn
  • R: igraph

Scope:

  1. Parse court documents, extract citations from raw text.
  2. Build citation network, identify important cases in the network.
  3. Extract judge's opinion text and meta information including opinion author, court, decision.
  4. Train a model to predict the court decision from the opinion text.
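Step 1 of the scope can be illustrated with a minimal sketch. It assumes citations follow the common U.S. reporter pattern of volume, reporter abbreviation, and page (for example `550 U.S. 398`). The regular expression, the short reporter list, and the function name `extract_citations` are illustrative assumptions, not the project's actual extraction code, which relies on pdfminer and LexNLP.

```python
import re

# Hypothetical pattern: "<volume> <reporter> <page>", e.g. "550 U.S. 398".
# The reporter list is a small illustrative subset, not an exhaustive one.
CITATION_RE = re.compile(
    r"\b(\d{1,4})\s+"
    r"(U\.S\.|S\. ?Ct\.|F\. ?Supp\. ?[23]d|F\. ?Supp\.|F\.[23]d)"
    r"\s+(\d{1,5})\b"
)


def extract_citations(raw_text):
    """Return the unique citations in raw_text, in order of first appearance."""
    seen = []
    for volume, reporter, page in CITATION_RE.findall(raw_text):
        citation = f"{volume} {reporter} {page}"
        if citation not in seen:
            seen.append(citation)
    return seen


# Short sample sentence written for this example.
sample = (
    "See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398 (2007); "
    "see also Phillips v. AWH Corp., 415 F.3d 1303 (Fed. Cir. 2005). "
    "KSR, 550 U.S. 398, controls here."
)
citations = extract_citations(sample)
print(citations)
```

Each extracted citation can then be paired with the citing file to form one edge of the citation network.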

Pilot Study on 159 Legal Court Documents (in pilot_159 folder)

1. Process PDF documents using Python

| IPython Notebook | Description |
| --- | --- |
| 1.Extraction by LexNLP.ipynb | Extract meta information using the LexNLP package. |
| 2.Layer Analysis on Sigle File. ipynb | Use pdfminer to extract the raw text and the paragraph segmentation of the PDF document. |
| 3.Patent Position by Layer.ipynb | Identify the position of the patent number in the layers extracted from the PDF. |
| 4.Opinion and Author by Layer.ipynb | Extract opinion text, author, and decisions from the list of layers. |
| 5.Wrap up to Meta Data.ipynb | Store the extracted meta data as .json or .csv. |
| 6.Visualize citation frequency.ipynb | Bar plot of the citation frequencies. |
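The idea behind the layer notebooks can be sketched in a few lines. This sketch assumes that a "layer" is a paragraph separated from its neighbours by a blank line and that patent numbers appear in the form `U.S. Patent No. 6,123,456`. Both assumptions, and the helper names `split_layers` and `find_patent_layers`, are hypothetical; the notebooks themselves work on pdfminer's layout output, which is not reproduced here.

```python
import re

# Hypothetical pattern for patent numbers such as "U.S. Patent No. 6,123,456".
PATENT_RE = re.compile(r"U\.S\.\s+Patent\s+No\.\s+(\d{1,2},\d{3},\d{3})")


def split_layers(raw_text):
    """Split raw text into layers, assuming blank lines separate paragraphs."""
    blocks = re.split(r"\n\s*\n", raw_text)
    # Collapse line breaks inside a paragraph so patterns can span them.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def find_patent_layers(layers):
    """Return (layer_index, patent_number) for every patent number found."""
    hits = []
    for index, layer in enumerate(layers):
        for number in PATENT_RE.findall(layer):
            hits.append((index, number))
    return hits


# Made-up text; the patent number is a placeholder, not a real reference.
raw_text = """UNITED STATES COURT OF APPEALS

Appellant owns U.S. Patent
No. 6,123,456, which claims a widget.

AFFIRMED
"""
layers = split_layers(raw_text)
patent_hits = find_patent_layers(layers)
print(patent_hits)
```

Knowing which layer holds the patent number gives a fixed reference point for locating nearby fields such as the opinion author or the decision.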

2. Data: Parse PDF documents via Python

These datasets are NOT included in this public repository because of intellectual property and privacy concerns.

| File | Description |
| --- | --- |
| pdf2text159.json | A dictionary of 3 lists: file_name, raw_text, layers. |
| cite_edge159.csv | Edge list of the citation network. |
| cite_node159.csv | Meta information for each case: case_number, court, dates. |
| reference_extract.csv | Cited cases as a list for every case; untidy format for analysis. |
| citation159.csv | File-citation pairs; tidy format for calculation. |
| regulation159.csv | File-regulation pairs; tidy format for calculation. |
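The difference between the untidy and tidy files can be shown with a small sketch. Because the datasets are not public, the column names (`file`, `cited_cases`), the `;` delimiter, and the sample rows below are all assumptions made for illustration.

```python
import csv
import io

# Made-up rows standing in for reference_extract.csv (one row per file,
# with all cited cases packed into a single delimited field).
untidy_csv = """file,cited_cases
case_A.pdf,550 U.S. 398;415 F.3d 1303
case_B.pdf,550 U.S. 398
case_C.pdf,
"""


def to_tidy_pairs(csv_text, delimiter=";"):
    """Expand one row per file into one (file, citation) pair per cited case."""
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for citation in row["cited_cases"].split(delimiter):
            citation = citation.strip()
            if citation:  # skip files with no extracted citations
                pairs.append((row["file"], citation))
    return pairs


pairs = to_tidy_pairs(untidy_csv)

# Serialize the tidy pairs as CSV (into a string here, not onto disk).
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["file", "citation"])
writer.writerows(pairs)
tidy_csv = out.getvalue()
print(tidy_csv)
```

The tidy form has one citation per row, so it can be used directly as an edge list or grouped to count frequencies.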

3. Analyze and Visualize using R

| File | Description |
| --- | --- |
| Calculate Citation Frequency.Rmd | Analyze reference_extract.csv |
| Citation Network.Rmd | Analyze cite_edge159 |
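The analysis itself is done in R with igraph. As a language-neutral illustration, the sketch below uses plain Python to count how often each case is cited (its in-degree in the citation network), which is one simple way to rank cases by importance. The edge list is made up, and in-degree is an assumed stand-in: the R notebooks may use other centrality measures.

```python
from collections import Counter

# Made-up edge list standing in for cite_edge159.csv: (citing, cited).
edges = [
    ("case_A", "case_X"),
    ("case_B", "case_X"),
    ("case_C", "case_X"),
    ("case_A", "case_Y"),
    ("case_B", "case_Y"),
    ("case_C", "case_Z"),
    ("case_C", "case_Z"),  # repeated citation inside the same document
]


def citation_frequency(edge_list):
    """Count how many distinct cases cite each cited case (in-degree)."""
    unique_edges = set(edge_list)  # ignore repeats within one citing case
    return Counter(cited for _, cited in unique_edges)


freq = citation_frequency(edges)

# Sort by descending frequency, breaking ties alphabetically.
ranking = sorted(freq.items(), key=lambda item: (-item[1], item[0]))
print(ranking)
```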

4. Visualization Chart Sample

Citation Frequency (figure: case_freq)

Citation Network (figure: citation_net)

Network Visualization and Predictive Modeling on 854 Legal Court Cases (in Extraction_Modelling folder)

1. Extract opinion and meta information from raw text data

| .ipynb notebook | Description |
| --- | --- |
| Full Dataset Merge.ipynb | Merge the 854 cases dataset |
| Edge and Node List.ipynb | Create edge and node list |
| Full Extractions.ipynb | Extract author, judge panel, opinion text |
| Clean Opinion Text.ipynb | Remove references and special characters in opinion text |
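A minimal sketch of the cleaning step is shown below. The patterns are assumptions chosen for illustration (reporter citations, parenthetical years, and anything that is not a letter); the notebook's actual cleaning rules are not documented in this README.

```python
import re

# Assumed patterns, for illustration only.
# Reporter citation with optional pin cite, e.g. "550 U.S. 398, 418".
CITATION_RE = re.compile(
    r"\b\d{1,4}\s+(?:U\.S\.|F\.[23]d)\s+\d{1,5}(?:,\s*\d{1,5})?"
)
# Parenthetical court/year, e.g. "(2007)" or "(Fed. Cir. 2005)".
PAREN_YEAR_RE = re.compile(r"\((?:[A-Za-z. ]+\s)?\d{4}\)")
# Anything that is not a lowercase letter or whitespace.
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_opinion(text):
    """Strip references and special characters, then normalize whitespace."""
    text = CITATION_RE.sub(" ", text)
    text = PAREN_YEAR_RE.sub(" ", text)
    text = text.lower()
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())


opinion = "The claims are obvious. See KSR, 550 U.S. 398, 418 (2007). We AFFIRM!"
cleaned = clean_opinion(opinion)
print(cleaned)
```

Citations are removed before lowercasing and punctuation stripping, because the citation pattern depends on the capital letters and periods in the reporter abbreviation.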

2. Datasets

These datasets are NOT included in this public repository because of intellectual property and privacy concerns.

| Dataset | Description |
| --- | --- |
| amy_cases.json | Large dictionary {file name: raw text} for 854 cases, from Lilian's PDF parsing. |
| full_name_text.json | The key-value pairs of amy_cases.json converted to two lists: file_name, raw_text. |
| cite_edge.csv | Edge list of citations. |
| cite_node.csv | Node list containing case_code, case_name, court_from, court_type. |
| extraction854.csv | Full extractions, including case_code, case_name, court_from, court_type, result, author, judge_panel. |
| decision_text.json | JSON file including author, decision (result of the case), opinion (opinion text), cleaned_text (cleaned opinion text). |
| cleaned_text.csv | CSV file containing all the cleaned text. |
| predict_data.csv | Cleaned dataset for the NLP model that predicts the court decision. |
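The relationship between amy_cases.json and full_name_text.json can be sketched as follows. The sample dictionary is made up, and sorting the file names is an assumption added here so that the two lists have a reproducible order.

```python
import json

# Made-up stand-in for amy_cases.json: {file name: raw text}.
cases_json = json.dumps({
    "case_B.pdf": "Raw text of case B",
    "case_A.pdf": "Raw text of case A",
})


def to_parallel_lists(json_text):
    """Convert {file name: raw text} into two index-aligned lists."""
    cases = json.loads(json_text)
    file_names = sorted(cases)  # fixed order keeps both lists aligned
    raw_texts = [cases[name] for name in file_names]
    return {"file_name": file_names, "raw_text": raw_texts}


full_name_text = to_parallel_lists(cases_json)
print(json.dumps(full_name_text, indent=2))
```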

3. Visualization using R

| R Markdown file | Description |
| --- | --- |
| Full Network Graph.Rmd | Draw the full citation network |
| Citation Betwwen Nodes.Rmd | Draw citations between all the available cases |
| Clean Data For Predictive Modelling.rmd | Clean text data for predictive modeling |

Interactive Graph

Play with Interactive Graph

Full Citation Network (all cases and cited cases)

Citation Between Available Cases
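The two graphs differ only in which edges they keep. A minimal sketch, assuming an edge list of (citing, cited) pairs and a set of cases for which documents are available (both made up here):

```python
# Made-up data: cases we hold documents for, and the full edge list.
available_cases = {"case_A", "case_B", "case_C"}
full_edges = [
    ("case_A", "case_B"),  # both ends available
    ("case_A", "case_X"),  # cited case is not in the collection
    ("case_C", "case_A"),  # both ends available
    ("case_B", "case_Y"),  # cited case is not in the collection
]


def edges_between_available(edges, available):
    """Keep only citations where both the citing and cited case are available."""
    return [(src, dst) for src, dst in edges if src in available and dst in available]


# Full network: every case that appears at either end of any edge.
all_nodes = sorted({node for edge in full_edges for node in edge})

# Restricted network: citations among the available cases only.
internal_edges = edges_between_available(full_edges, available_cases)
print(all_nodes)
print(internal_edges)
```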

4. Predictive Modeling using Python

| .ipynb notebook | Description |
| --- | --- |
| NLP Predictive Modeling.ipynb | Try different preprocessing steps and build a logistic regression model to predict the court decision. |
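The modeling idea, bigram counts fed into a logistic regression, can be sketched without any dependencies. The notebook uses scikit-learn; the version below swaps in a hand-written batch gradient descent so the block runs on its own. The four documents, their labels, and the hyperparameters are made up for illustration and say nothing about the real model's results.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the word bigrams of a text as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def train_logistic(docs, labels, epochs=200, lr=0.5):
    """Fit logistic regression on bigram counts with batch gradient descent."""
    vocab = sorted({bg for doc in docs for bg in bigrams(doc)})
    index = {bg: i for i, bg in enumerate(vocab)}
    # Sparse rows: {feature index: count} for each document.
    rows = [{index[bg]: n for bg, n in Counter(bigrams(doc)).items()} for doc in docs]
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            z = bias + sum(weights[i] * n for i, n in row.items())
            error = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - label
            for i, n in row.items():
                grad_w[i] += error * n
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return vocab, weights, bias


def predict(text, vocab, weights, bias):
    """Return 1 if the linear score is positive, else 0."""
    counts = Counter(bigrams(text))
    z = bias + sum(w * counts[bg] for bg, w in zip(vocab, weights))
    return 1 if z > 0 else 0


# Made-up training data: 1 = affirmed, 0 = reversed.
docs = [
    "the judgment is affirmed",
    "we affirm the judgment",
    "the judgment is reversed",
    "we reverse the judgment",
]
labels = [1, 1, 0, 0]

vocab, weights, bias = train_logistic(docs, labels)

# Rank bigrams by coefficient, as in the bigram chart.
ranked = sorted(zip(vocab, weights), key=lambda item: item[1])
most_negative, most_positive = ranked[0][0], ranked[-1][0]
print(most_positive, most_negative)
print(predict("the decision is affirmed", vocab, weights, bias))
```

Sorting the learned coefficients is what makes a linear model convenient here: the bigrams at either end of the ranking are the ones that push a prediction most strongly toward one decision or the other.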

Visualization of the bigrams (word pairs) with the strongest coefficients

Bigram

Owner
Yi Yin
Tech & Business Alignment @ Wolfram Research, Social Sciences Research @ Columbia University