This repository provides a set functions to extract paragraphs from AWS Textract responses.

Overview

extract-paragraphs-with-aws-textract

Since AWS Textract (the AWS OCR service) does not have a native function to extract paragraphs, this repository provides a set of Python 3.X functions built on top of the AWS Python SDK (boto3) to extract paragraphs from AWS Textract responses.

PLEASE NOTE THAT:

  1. It is assumed that your client has the neccesary IAM permissions to access the different AWS resources required.
  2. Since AWS Textract analyze PDF files by running asynchronous operations, the current version assumes that you've already created an s3 bucket and that the PDF files are already stored there. If not, please go to the boto3 docs to know how to create a bucket as well as upload files.
  3. The paragraph_constructor is an ad hoc function for my use case. You may have to adapt it based on the space between lines in your data.

UPCOMING FEATURES:

  • Address abstract cases with the paragrpah_constructor function.
  • Export data in different formats.
  • AWS CloudFormation template for a serverless architecture to execute the functions when a new object is uploaded in your S3 bucket.

Please feel free to suggest new features or improvements to the current code. <3

Owner
Juan Anzola
Juan Anzola
Python Script to download hundreds of images from 'Google Images'. It is a ready-to-run code!

Google Images Download Python Script for 'searching' and 'downloading' hundreds of Google images to the local hard disk! Documentation Documentation H

Hardik Vasa 8.2k Jan 05, 2023
A community made discord bot coded in Python and running on AWS.

Pogbot Project Open Group Discord This is an open source community ran project. Join the discord for more information on how to participate. Coded in

Project Open Group 2 Jul 27, 2022
Zaid Vc Player Allows u to steam Songs/music on vc chat

ᴢᴀɪᴅ ᴠᴄ ᴘʟᴀʏᴇʀ 🔥 SORRY FOR OUR PROJECTS DELETED BY GITHUB FLAGGED ᴢᴀɪᴅ ᴠᴄ ᴘʟᴀᴇʀ ɪꜱ ᴀ ᴛᴇʟᴇɢʀᴀᴍ ᴘʀᴏᴊᴇᴄᴛ ʙᴀꜱᴇᴅ ᴏɴ ᴘʏʀᴏɢʀᴀᴍ ꜰᴏʀ ᴘʟᴀʏ ᴍᴜꜱɪᴄꜱ ɪɴ ᴠᴄ ᴄʜᴀᴛꜱ..

Zaid 117 Dec 29, 2022
Powerful Ethereum Smart-Contract Toolkit

Heimdall Heimdall is an advanced and modular smart-contract toolkit which aims to make dealing with smart contracts on EVM based chains easier. Instal

Jonathan Becker 69 Dec 26, 2022
A program to convert YouTube channel registration information into Json files for ThirdTube.

ThirdTubeImporter A program to convert YouTube channel registration information into Json files for ThirdTube. Usage Japanese https://takeout.google.c

Hidegon 2 Dec 18, 2021
Instagram - Instagram Account Reporting Tool

Instagram Instagram Account Reporting Tool Installation On Termux $ apt update $

Aryan 6 Nov 18, 2022
Tracks twitter spaces and sends it to a discord webhook.

Tracks twitter spaces and sends it to a discord webhook. Uses the twitter api to find twitter spaces and then the m3u8 url for the space is found using selenium and will have it posted using a discor

Sam Phung 20 Dec 17, 2022
Tinkoff social pulse api wrapper

Tinkoff social pulse api wrapper

Semenov Artur 9 Dec 20, 2022
A python package to easy the integration with Direct Online Pay (Mpesa, TigoPesa, AirtelMoney, Card Payments)

python-dpo A python package to easy the integration with Direct Online Pay (DPO) which easily allow you easily integrate with payment options once wit

NEUROTECH 15 Oct 08, 2022
An asyncio Python wrapper around the Discord API, forked off of Rapptz's Discord.py.

Novus A modern, easy to use, feature-rich, and async ready API wrapper for Discord written in Python. A full fork of Rapptz's Discord.py library, with

Voxel Fox 60 Jan 03, 2023
AminoSpamKilla - Spam bot for amino that uses multiprocessing module

AminoSpamKilla Spam bot for amino that uses multiprocessing module Pydroid Open

4 Jun 27, 2022
Discord Mass Edit is a unique, purging related Discord tool that differs from the regular mass delete.

Discord Mass Edit is a unique, purging related Discord tool that differs from the regular mass delete. This tool will automatically edit every message in a chosen channel and change it to a random st

c0mpt0 1 Jul 27, 2022
Um simples bot público para todos usarem no discord!

Discord Bot - Código Público Características: Linguagem de Programação: Python Quantidade de comandos: 17 Comandos: Prefixo do bot: O prefixo desse bo

Kevin 3 Dec 31, 2021
This script books automatically a slot on Doctolib in one of the public vaccination centers in Berlin.

BOOKING IN BERLINS VACCINATION CENTERS This python script books automatically a slot on Doctolib in one of the public vaccination centers in Berlin. T

17 Jan 13, 2022
Dicha herramienta esta creada con una api... esta api permite enviar un SMS cada 12 horas dependiendo del pais... Hay algunos paises y operadoras no están soportados.

SMSFree pkg install python3 pip install requests git clone https://github.com/Hidden-parker/SMSFree cd SMSFree python sms.py DISFRUTA... Dicha herrami

piter 2 Nov 14, 2021
A Characther powerful in saints saiya anime and modular telegram group management bot built using python3

Kaneki Ken A Powerful and Modular Saint Aries is a Characther powerful in saints saiya anime and modular telegram group management bot built using pyt

1 Dec 21, 2021
An attendance bot that joins google meet automatically according to schedule and marks present in the google meet.

Google-meet-self-attendance-bot An attendance bot which joins google meet automatically according to schedule and marks present in the google meet. I

Sarvesh Wadi 12 Sep 20, 2022
Quot-a-lecture - Lecture transcript question extraction

Setup virtualenv venv source venv/bin/activate pip install -r requirements.txt

Pratyaksh Sharma 5 Sep 12, 2022
Python wrapper for eBay API

python-ebay - Python Wrapper for eBay API This project intends to create a simple python wrapper around eBay APIs. Development and Download Sites The

Roopesh 99 Nov 16, 2022
Accurately dump Commodore 64 tapes

TrueTape64 A cheap, easy to build adapter to interface a Commodore 1530 (C2N) Datasette to your PC to dump and preserve your aging Commodore 64 softwa

francesco 38 Dec 03, 2022