Repositorio com arquivos processados da CPI da COVID para facilitar analise

Related tags

Miscellaneouscpi4all
Overview

cpi4all

Repositorio com arquivos processados da CPI da COVID para facilitar analise

Organização

No site do senado é possivel encontrar a lista de todos os documentos coletados pela CPI da COVID.

A tabela no site possui a seguinte estrutura:

No Arquivos Data de recebimento Remetente Origem Descrição Caixa Em Resposta
1 Link1 ... ... ... ... ... ...
2 Link2/link3 ... ... ... ... ... ...

Esses links levam ao download de arquivos PDF com os documentos em questão.

Nesse repositorio você podera encontrar a versão txt desses arquivos. O nome do arquivo nesse repositorio é formado por <No do documento>_<numero do link>. Por exemplo:

link1 = 1_1 porque ele é relativo ao arquivo No 1, e é o primeiro link.

link2 = 2_1 porque ele é relativo ao arquivo No 2, e é o primeiro link dessa linha.

link3 = 2_2 porque ele é relativo ao arquivo No 2, e é o segundo link da linha.

A versão texto de todos os documentos está na pasta database/txts/.

Exemplos:

Arquivo No 1, primeiro link: 1_1

Arquivo No 4, quarto link: 3_4

Nota 1: Nem todos os arquivos foram convertidos ainda

Nota 2: A conversão usa reconhecimento de imagem e pode ficar bem ruim as vezes, gerando erros ortograficos ou palavras sem nexo algum.

Para desenvolvedores

Os scripts funcionam na seguinte sequencia:

  1. extract_rows.py: Vai no site do senado e extrai as informações de cada linha da tabela. Todos os dados são salvos em database/rows.
  2. extract_headers.py: Para cada link em cada linha, esse script pega metadados do arquivo (tamanho, tipo) que vão ser uteis depois. Esses dados são salvos em database/headers.
  3. download_pdfs.py: Baixa todos os PDFs descritos em database/headers e salva em database/pdfs.
  4. convert_pdf_to_jpg.py: Converte todos os PDFs em database/pdfs para imagens em database/jpgs.
  5. convert_jpg_to_txt.py: Converte todos as imagens em database/jpgs para texto em database/txt.

Por motivos de performance, apenas as pastas database/rows, database/headers e database/txts sao salvas nesse repositorio.

TODO: 0. Melhorar esse readme :)

  1. Usar o githubpages para gerar um site estatico que permite pesquisar em todos os txt
  2. Terminar de converter todos os arquivos
  3. Investigar arquivos em que a conversão ficou pessima.
  4. Fazer extração automatica de datas e prover um json com a ordem cronologica dos arquivos.
Owner
Breno Rodrigues Guimarães
Breno Rodrigues Guimarães
Binary++ is an esoteric programming language based on* binary

Binary++ is an esoteric programming language based on* binary. * It's meant to be based on binary, but you can write Binary++ code using different mea

Supercolbat 3 Feb 18, 2022
Tugas kelompok Struktur Data

Binary-Tree Tugas kelompok Struktur Data Silahkan jika ingin mengubah tipe data pada operasi binary tree *Boleh juga semua program kelompok bisa disat

Usmar manalu 2 Nov 28, 2022
Install JetBrains Toolbox

ansible-role-jetbrains-toolbox Install JetBrains Toolbox Example Playbook This example is taken from molecule/default/converge.yml and is tested on ea

Antoine Mace 2 Feb 04, 2022
Stopmagic gives you the power of creating amazing Stop Motion animations faster and easier than ever before.

Stopmagic gives you the power of creating amazing Stop Motion animations faster and easier than ever before. This project is maintained by Aldrin Mathew.

Aldrin's Art Factory 67 Dec 31, 2022
OpenTracing API for Python

OpenTracing API for Python This library is a Python platform API for OpenTracing. Required Reading In order to understand the Python platform API, one

OpenTracing API 767 Dec 16, 2022
Pyjiting is a experimental Python-JIT compiler, which is the product of my undergraduate thesis

Pyjiting is a experimental Python-JIT compiler, which is the product of my undergraduate thesis. The goal is to implement a light-weight miniature general-purpose Python JIT compiler.

Lance.Moe 10 Apr 17, 2022
A promo calculator for sports betting odds.

Sportbetter Calculation Toolkit Parlay Calculator This is a quick parlay calculator that considers some of the common promos offered. It is used to id

Luke Bhan 1 Sep 08, 2022
GWCelery is a simple and reliable package for annotating and orchestrating LIGO/Virgo alerts

GWCelery is a simple and reliable package for annotating and orchestrating LIGO/Virgo alerts, built from widely used open source components.

Min-A Cho Zeno 1 Nov 02, 2021
Some shitty programs just to brush up on my understanding of binary conversions.

Binary Converters Some shitty programs just to brush up on my understanding of binary conversions. Supported conversions formats = "unsigned-binary" |

Tim 2 Jan 09, 2022
Blender-3D-SH-Dma-plugin - Import and export Sonic Heroes Delta Morph animations (.anm) into Blender 3D

io_scene_sonic_heroes_dma This plugin for Blender 3D allows you to import and ex

Psycrow 3 Mar 22, 2022
Here, I have discuss the three methods of list reversion. The three methods are built-in method, slicing method and position changing method.

Three-different-method-for-list-reversion Here, I have discuss the three methods of list reversion. The three methods are built-in method, slicing met

Sachin Vinayak Dabhade 4 Sep 24, 2021
Integration of Hotwire's Turbo library with Flask.

turbo-flask Integration of Hotwire's Turbo library with Flask, to allow you to create applications that look and feel like single-page apps without us

Miguel Grinberg 240 Jan 06, 2023
A small script I made that takes any standard Decklist of magic the gathering cards and pulls all card images from scryfall at once!

A small script I made that takes any standard Decklist of magic the gathering cards and pulls all card images from scryfall at once!

15 Aug 26, 2022
Anki Addon idea by gbrl.sc to see previous ratings of a card in the reviewer

Card History At A Glance Stop having to press card browser and ctrl+i for every card and then WINCING to see it's history of reviews FEATURES Visualiz

Jerry Zhou 11 Dec 19, 2022
Iss-tracker - ISS tracking script in python using NASA's API

ISS Tracker Tracking International Space Station using NASA's API and plotting i

Partho 9 Nov 29, 2022
Step by step development of a vending coffee machine project, including tkinter, sqlite3, simulation, etc.

Step by step development of a vending coffee machine project, including tkinter, sqlite3, simulation, etc.

Nikolaos Avouris 2 Dec 05, 2021
This is a pretty basic but relatively nice looking Python Pomodoro Timer.

Python Pomodoro-Timer This is a pretty basic but relatively nice looking Pomodoro Timer. Currently its set to a very basic mode, but the funcationalit

EmmHarris 2 Oct 18, 2021
A module that can manage you're gtps

Growtopia Private Server Controler Module For Controle Your GTPS | Build in Python3 Creator Information

iFanpS 6 Jan 14, 2022
Percolation simulation using python

PythonPercolation Percolation simulation using python Exemple de percolation : Etude statistique sur le pourcentage de remplissage jusqu'à percolation

Tony Chouteau 1 Sep 08, 2022
Simple python script for AD enumeration

AutoAD - Simple python script for AD enumeration This tool was created on my spare time to help fellow penetration testers in automating the basic enu

Mohammad Arman 28 Jun 21, 2022