Repositorio com arquivos processados da CPI da COVID para facilitar analise

Related tags

Miscellaneouscpi4all
Overview

cpi4all

Repositorio com arquivos processados da CPI da COVID para facilitar analise

Organização

No site do senado é possivel encontrar a lista de todos os documentos coletados pela CPI da COVID.

A tabela no site possui a seguinte estrutura:

No Arquivos Data de recebimento Remetente Origem Descrição Caixa Em Resposta
1 Link1 ... ... ... ... ... ...
2 Link2/link3 ... ... ... ... ... ...

Esses links levam ao download de arquivos PDF com os documentos em questão.

Nesse repositorio você podera encontrar a versão txt desses arquivos. O nome do arquivo nesse repositorio é formado por <No do documento>_<numero do link>. Por exemplo:

link1 = 1_1 porque ele é relativo ao arquivo No 1, e é o primeiro link.

link2 = 2_1 porque ele é relativo ao arquivo No 2, e é o primeiro link dessa linha.

link3 = 2_2 porque ele é relativo ao arquivo No 2, e é o segundo link da linha.

A versão texto de todos os documentos está na pasta database/txts/.

Exemplos:

Arquivo No 1, primeiro link: 1_1

Arquivo No 4, quarto link: 3_4

Nota 1: Nem todos os arquivos foram convertidos ainda

Nota 2: A conversão usa reconhecimento de imagem e pode ficar bem ruim as vezes, gerando erros ortograficos ou palavras sem nexo algum.

Para desenvolvedores

Os scripts funcionam na seguinte sequencia:

  1. extract_rows.py: Vai no site do senado e extrai as informações de cada linha da tabela. Todos os dados são salvos em database/rows.
  2. extract_headers.py: Para cada link em cada linha, esse script pega metadados do arquivo (tamanho, tipo) que vão ser uteis depois. Esses dados são salvos em database/headers.
  3. download_pdfs.py: Baixa todos os PDFs descritos em database/headers e salva em database/pdfs.
  4. convert_pdf_to_jpg.py: Converte todos os PDFs em database/pdfs para imagens em database/jpgs.
  5. convert_jpg_to_txt.py: Converte todos as imagens em database/jpgs para texto em database/txt.

Por motivos de performance, apenas as pastas database/rows, database/headers e database/txts sao salvas nesse repositorio.

TODO: 0. Melhorar esse readme :)

  1. Usar o githubpages para gerar um site estatico que permite pesquisar em todos os txt
  2. Terminar de converter todos os arquivos
  3. Investigar arquivos em que a conversão ficou pessima.
  4. Fazer extração automatica de datas e prover um json com a ordem cronologica dos arquivos.
Owner
Breno Rodrigues Guimarães
Breno Rodrigues Guimarães
A simple and easy to use Python's PIP configuration manager, similar to the Arch Linux's Java manager.

PIPCONF - The PIP configuration manager If you need to manage multiple configurations containing indexes and trusted hosts for PIP, this project was m

João Paulo Carvalho 11 Nov 30, 2022
A silly RPG(Not MMO) made in python

Project_PyMMo A silly RPG(Not MMO) made in python, FOR WINDOWS 10 ONLY! Hello tester, to install pymmo follow the steps bellow: 1.First install python

0 Feb 08, 2022
An osu! cheat made in c++ rewritten in python and currently undetected.

megumi-python An osu! cheat made in c++ rewritten in python and currently undetected. Installation Guide Download python 3.9 from https://python.org C

Elaina 2 Nov 18, 2022
Attempt at a Windows version of the plotman Chia Plot Manager system

windows plotman: an attempt to get plotman to work on windows THIS IS A BETA. Not ready for production use just yet. Almost, but not quite there yet.

59 May 11, 2022
An OBS script to fuze files together

OBS TEXT FUZE Fuze text files and inject the output into a text source. The Index file directory should be a list of file directorys for the text file

SuperZooper3 1 Dec 27, 2021
Python Interactive Graphical System made during Computer Graphics classes (INE5420-2021.1)

PY-IGS - The PYthon Interactive Graphical System The PY-IGS Installation To install this software you will need these dependencies (with their thevelo

Enzo Coelho Albornoz 4 Dec 03, 2021
Um Script De Mensagem anonimas Para linux e Termux Feito em python

Um Script De Mensagem anonimas Para linux e Termux Feito em python feito em um celular

6 Sep 09, 2021
:fishing_pole_and_fish: List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

pre-commit-dbt List of pre-commit hooks to ensure the quality of your dbt projects. BETA NOTICE: This tool is still BETA and may have some bugs, so pl

Offbi 262 Nov 25, 2022
A collection of Workflows samples for various use cases

Workflows Samples Workflows allow you to orchestrate and automate Google Cloud and HTTP-based API services with serverless workflows.

Google Cloud Platform 76 Jan 07, 2023
✔️ Create to-do lists to easily manage your ideas and work.

Todo List + Add task + Remove task + List completed task + List not completed task + Set clock task time + View task statistics by date Changelog v 1.

Abbas Ataei 30 Nov 28, 2022
A Python script made for the Python Discord Pixels event.

Python Discord Pixels A Python script made for the Python Discord Pixels event. Usage Create an image.png RGBA image with your pattern. Transparent pi

Stanisław Jelnicki 4 Mar 23, 2022
Python library for ODE integration via Taylor's method and LLVM

heyoka.py Modern Taylor's method via just-in-time compilation Explore the docs » Report bug · Request feature · Discuss The heyókȟa [...] is a kind of

Francesco Biscani 45 Dec 21, 2022
Identify and annotate mutations from genome editing assays.

CRISPR-detector Here we propose our CRISPR-detector to facilitate the CRISPR-edited amplicon and whole genome sequencing data analysis, with functions

hlcas 2 Feb 20, 2022
A simple wrapper to analyse and visualise reinforcement learning agents' behaviour in the environment.

Visrl Visrl (pronounced "visceral") is a simple wrapper to analyse and visualise reinforcement learning agents' behaviour in the environment. Reinforc

Jet New 14 Jun 27, 2022
Blender 3.1 Alpha (and later) PLY importer that correctly loads point clouds (and all PLY models as point clouds)

import-ply-as-verts Blender 3.1 Alpha (and later) PLY importer that correctly loads point clouds (and all PLY models as point clouds) Latest News Mand

Michael Prostka 82 Dec 20, 2022
Wrapper for the undocumented CodinGame API. Can be used both synchronously and asynchronlously.

codingame API wrapper Pythonic wrapper for the undocumented CodinGame API. Installation Python 3.6 or higher is required. Install codingame with pip:

Takos 19 Jun 20, 2022
Anki Cards for the HSK vocabulary Chinese-German

Anki-HanyuShuipingKaoshi Anki Cards for the HSK vocabulary Chinese-German Das Deck baut auf folgenden Quellen auf: China Endecken Wortschatz von wohok

1 Jan 07, 2022
Necst-lib - Pure Python tools for NECST

necst-lib Pure Python tools for NECST. Features This library provides: something

NANTEN2 Group 5 Dec 15, 2022
Old versions of Deadcord that are problematic or used as reference.

⚠️ Unmaintained and broken. We have decided to release the old version of Deadcord before our v1.0 rewrite. (which will be equiped with much more feat

Galaxzy 1 Feb 10, 2022
Team collaborative evaluation tracker.

Team collaborative evaluation tracker.

2 Dec 19, 2021