A Python client for the Softcite software mention recognizer server

Overview

Softcite software mention recognizer client

Python client for using the Softcite software mention recognition service. It can be applied to

  • individual PDF files

  • recursively to a local directory, processing all the encountered PDF

  • to a collection of documents harvested by biblio-glutton-harvester and article-dataset-builder, with the benefit of re-using the collection manifest for injectng metadata and keeping track of progress. The collection can be stored locally or on a S3 storage.

Requirements

The client has been tested with Python 3.5-3.7.

The client requires a working Softcite software mention recognition service. Service host and port can be changed in the config.json file of the client.

Install

cd software_mention_client/

It is advised to setup first a virtual environment to avoid falling into one of these gloomy python dependency marshlands:

virtualenv --system-site-packages -p python3 env

source env/bin/activate

Install the dependencies, use:

pip3 install -r requirements.txt

Usage and options

usage: software_mention_client.py [-h] [--repo-in REPO_IN] [--file-in FILE_IN]
                                  [--file-out FILE_OUT]
                                  [--data-path DATA_PATH] [--config CONFIG]
                                  [--reprocess] [--reset] [--load]
                                  [--diagnostic] [--scorched-earth]

Softcite software mention recognizer client

optional arguments:
  -h, --help            show this help message and exit
  --repo-in REPO_IN     path to a directory of PDF files to be processed by
                        the Softcite software mention recognizer
  --file-in FILE_IN     a single PDF input file to be processed by the
                        Softcite software mention recognizer
  --file-out FILE_OUT   path to a single output the software mentions in JSON
                        format, extracted from the PDF file-in
  --data-path DATA_PATH
                        path to the resource files created/harvested by
                        biblio-glutton-harvester
  --config CONFIG       path to the config file, default is ./config.json
  --reprocess           reprocessed failed PDF
  --reset               ignore previous processing states and re-init the
                        annotation process from the beginning
  --load                load json files into the MongoDB instance, the --repo-
                        in parameter must indicate the path to the directory
                        of resulting json files to be loaded
  --diagnostic          perform a full count of annotations and diagnostic
                        using MongoDB regarding the harvesting and
                        transformation process
  --scorched-earth      remove a PDF file after its successful processing in
                        order to save storage space, careful with this!

The logs are written by default in a file ./client.log, but the location of the logs can be changed in the configuration file (default ./config.json).

Processing local PDF files

For processing a single file., the resulting json being written as file at the indicated output path:

python3 software_mention_client.py --file-in toto.pdf --file-out toto.json

For processing recursively a directory of PDF files, the results will be:

  • written to a mongodb server and database indicated in the config file

  • and in the directory of PDF files, as json files, together with each processed PDF

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/

The default config file is ./config.json, but could also be specified via the parameter --config:

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/ --config ./my_config.json

Processing a collection of PDF harvested by biblio-glutton-harvester

biblio-glutton-harvester and article-dataset-builder creates a collection manifest as a LMDB database to keep track of the harvesting of large collection of files. Storage of the resource can be located on a local file system or on a AWS S3 storage. The software-mention client will use the collection manifest to process these harvested documents.

  • locally:

python3 software_mention_client.py --data-path /mnt/data/biblio-glutton-harvester/data/

--data-path indicates the path to the repository of data harvested by biblio-glutton-harvester.

The resulting JSON files will be enriched by the metadata records of the processed PDF and will be stored together with each processed PDF in the data repository.

If the harvested collection is located on a S3 storage, the access information must be indicated in the configuration file of the client config.json. The extracted software mention will be written in a file with extension .software.json, for example:

-rw-rw-r-- 1 lopez lopez 1.1M Aug  8 03:26 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.pdf
-rw-rw-r-- 1 lopez lopez  485 Aug  8 03:41 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.software.json

If a MongoDB server access information is indicated in the configuration file config.json, the extracted information will additionally be written in MongoDB.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

Main author and contact: Patrice Lopez ([email protected])

A Telegram bot to all media and documents files to web link .

FileStreamBot A Telegram bot to all media and documents files to web link . Report a Bug | Request Feature 🍁 About This Bot : This bot will give you

Code X Mania 129 Jan 03, 2023
Github integration with Telegram

The Telegram bot myGit is your GiHub assistant. In your conversations with your team, you can simply insert the information about the projects you are working at.

Alexandru Buzescu 2 Jan 06, 2022
Its Is A Telegram Maths Basic Calculator Bot

Its Is A Telegram Maths Basic Calculator Bot

ANKIT KUMAR 1 Dec 26, 2021
Mikasa is a 100% Spanish bot, a multifunctional bot, Mikasa is in beta.

Mikasa Miaksa, It is a multi-functional discord bot that is currently in development, this is not complete, there are still many things to fix and imp

Made in 2 Oct 05, 2021
TeslaPy - A Python implementation based on unofficial documentation of the client side interface to the Tesla Motors Owner API

TeslaPy - A Python implementation based on unofficial documentation of the client side interface to the Tesla Motors Owner API, which provides functiona

Tim Dorssers 233 Dec 30, 2022
SEBUAH TOOLS CRACK FACEBOOK & INSTAGRAM DENGAN FITUR YANGMENDUKUNG

SEBUAH TOOLS CRACK FACEBOOK & INSTAGRAM DENGAN FITUR YANGMENDUKUNG

Jeeck X Nano 1 Dec 27, 2021
A simple python bot that serves to send some notifications about GitHub events to Slack.

github alerts slack bot 🤖 What is it? 🔍 This is a simple bot that serves to send some notifications about GitHub events to Slack channels. These are

Jackson Alves 10 Dec 10, 2022
dex.guru python sdk

dexguru-sdk.py dexguru-sdk.py allows you to access dex.guru public methods from your async python scripts. Installation To install latest version, jus

DexGuru 17 Dec 06, 2022
Busty - A bot for the Busty Discord server

Busty Discord bot used for the Busty server. Install You'll need at least Python

Andrew Morgan 7 Dec 05, 2022
Web3 Ethereum DeFi toolkit for smart contracts, Uniswap and PancakeSwap trades, Ethereum JSON-RPC utilities, wallets and automated test suites.

Web3 Ethereum Defi This project contains common Ethereum smart contracts and utilities, for trading, wallets,automated test suites and backend integra

Trading Strategy 222 Jan 04, 2023
ABACUS Aroio API for Webinterfaces and App-Connections

ABACUS Aroio API for Webinterfaces and App-Connections Setup Start virtual python environment if you don't have python3 running setup: $ python3 -m ve

Abacus Aroio Developer Team 1 Apr 01, 2021
A tool to build scripts to toggle between minimal & default services in Windows based on user defined lists.

A tool to build scripts to toggle between minimal & default services in Windows based on user defined lists.

AMIT 29 Jan 01, 2023
Unlimited Filter Telegram Bot 2

Mother NAther Bot Features Auto Filter Manuel Filter IMDB Admin Commands Broadcast Index IMDB search Inline Search Random pics ids and User info Stats

LɪᴏɴKᴇᴛᴛʏUᴅ 1 Oct 30, 2021
Python Dialogflow CX Scripting API (SCRAPI)

Python Dialogflow CX Scripting API (SCRAPI) A high level scripting API for bot builders, developers, and maintainers. Table of Contents Introduction W

Google Cloud Platform 39 Dec 09, 2022
PerrOS - The operating system for your discord server.

PerrOS PerrOS is a Opensource Discord Bot to do it all! Installation Use the package manager pip to install the python3 requirements. pip3 install -r

Webshort 2 Jun 20, 2022
Ark API Wrapper in Python

Pythark Ark API Wrapper in Python. Built with Python Requests Installation Pythark uses Arky to create a new transaction, if you want to use this feat

Jolan 14 Mar 11, 2021
API which returns cusswords , can be used to check cusswords in bots etc.

Anti-abuse-api-flask API which returns cusswords , can be used to check cusswords in bots etc. Run pip install -r requirements.txt py app.py API Endpo

8 Jan 03, 2023
Balsam Python client API & SDK

balsam No description provided (generated by Openapi Generator https://github.com/openapitools/openapi-generator) This Python package is automatically

Darren Govoni 1 Oct 22, 2021
A simple url uploader bot with permenent thumbnail support

URL-Uploader A simple url uploader bot with permenent thumbnail support Scrapped some code from @SpEcHIDe's AnyDLBot Repository Please fork this repos

Fayas Noushad 40 Nov 29, 2021
Python wrapper for eBay API

python-ebay - Python Wrapper for eBay API This project intends to create a simple python wrapper around eBay APIs. Development and Download Sites The

Roopesh 99 Nov 16, 2022