Dados Públicos de CNPJ disponibilizados pela Receita Federal do Brasil

Overview

Dados Públicos CNPJ

  • Fonte oficial da Receita Federal do Brasil, aqui.
  • Layout dos arquivos, aqui.

A Receita Federal do Brasil disponibiliza bases com os dados públicos do cadastro nacional de pessoas jurídicas (CNPJ).

De forma geral, nelas constam as mesmas informações que conseguimos ver no cartão do CNPJ, quando fazemos uma consulta individual, acrescidas de outros dados de Simples Nacional, sócios e etc. Análises muito ricas podem sair desses dados, desde econômicas, mercadológicas até investigações.

Nesse repositório consta um processo de ETL para i) baixar os arquivos; ii) descompactar; iii) ler, tratar e iv) inserir num banco de dados relacional PostgreSQL.


Infraestrutura necessária:

  • Python 3.8 - libraries:

    • wget
    • pandas
    • ftplib
    • datetime
    • gzip
    • urllib
    • bs4
    • re
    • os
    • zipfile
    • sqlalchemy
    • psycopg2
    • time
  • Banco de dados:


How to use:

  1. Com o Postgre instalado, inicie a instância do servidor (pode ser local) e crie o banco de dados conforme o arquivo banco_de_dados.sql.

  2. Conforme o seu ambiente, substitua as variáveis abaixo no arquivo ETL_coletar_dados_e_gravar_BD.py:

    • output_files: diretório de destino para o donwload dos arquivos
    • user: usuário do banco de dados criado pelo arquivo banco_de_dados.sql
    • passw: senha do usuário do BD
    • host: host da conexão com o BD
    • port: porta da conexão com o BD
    • database: nome da base de dados na instância (Dados_RFB - conforme arquivo banco_de_dados.sql)
  3. Executar o arquivo ETL_coletar_dados_e_gravar_BD.py e aguardar a finalização do processo.

    • Os arquivos são grandes: dependendo da infraestrutura isso deve levar muitas horas para conclusão.
    • Arquivos de 08/05/2021: 4,68 GB compactados e 17,1 GB descompactados.

Tabelas geradas:

  • Para maiores informações, consulte o layout.

    • empresa: dados cadastrais da empresa em nível de matriz
    • estabelecimento: dados analíticos da empresa por unidade / estabelecimento (telefones, endereço, filial, etc)
    • socios: dados cadastrais dos sócios das empresas
    • simples: dados de MEI e Simples Nacional
    • cnae: código e descrição dos CNAEs
    • quals: tabela de qualificação das pessoas físicas - sócios, responsável e representante legal.
    • natju: tabela de naturezas jurídicas - código e descrição.
    • moti: tabela de motivos da situação cadastral - código e descrição.
    • pais: tabela de países - código e descrição.
    • munic: tabela de municípios - código e descrição.
  • Pelo volume de dados, as tabelas empresa, estabelecimento, socios e simples possuem índices para a coluna cnpj_basico, que é a principal chave de ligação entre elas.

Modelo de Entidade Relacionamento:

alt text

Owner
Aphonso Henrique do Amaral Rafael
Economist, accountant and data & analytics enthusiastic. Data science and statistics permanently student.
Aphonso Henrique do Amaral Rafael
FTX auto lending bot with python

FTX auto lending bot Get the API key Check my article for step by step + screenshots Setup & Run Install python 3 Install dependency pip install -r re

Patompong Manprasatkul 1 Dec 24, 2021
♻️ API to run evaluations of the FAIR principles (Findable, Accessible, Interoperable, Reusable) on online resources

♻️ FAIR enough 🎯 An OpenAPI where anyone can run evaluations to assess how compliant to the FAIR principles is a resource, given the resource identif

Maastricht University IDS 4 Oct 20, 2022
Telegram bot for stream music on telegram, powered by py-tgcalls and Pyrogram

Telegram Streamer Bot Telegram bot for stream music on telegram, powered by py-tgcalls and Pyrogram ✨ Features Coming soon, help me to improve it 🛠 C

Shohih Abdul 11 Oct 21, 2022
This is a Python bot, which automates logging in, purchasing and planting the seeds. Open source bot and completely free.

🌻 Sunflower Land Bot 🌻 ⚠️ Warning I am not responsible for any penalties incurred by those who use the bot, use it at your own risk. This BOT is com

Newerton 18 Aug 31, 2022
A python api to get info on covid-19

A python api to get info on covid-19

roof 2 Sep 18, 2022
This repository contains code written in the AWS Cloud Development Kit (CDK)

This repository contains code written in the AWS Cloud Development Kit (CDK) which launches infrastructure across two different regions to demonstrate using AWS AppSync in a multi-region setup.

AWS Samples 5 Jun 03, 2022
Python module and command line script client for http://urbandictionary.com

py-urbandict py-urbandict is a client for urbandictionary.com. Project page on github: https://github.com/novel/py-urbandict PyPI: https://pypi.org/pr

Roman Bogorodskiy 32 Oct 01, 2022
Multi-Branch CI/CD Pipeline using CDK Pipelines.

Using AWS CDK Pipelines and AWS Lambda for multi-branch pipeline management and infrastructure deployment. This project shows how to use the AWS CDK P

AWS Samples 36 Dec 23, 2022
A discord bot written in discord.py to manage custom roles assigned to boosters of your server.

BBotty A discord bot written in discord.py to manage custom roles assigned to boosters of your server. v0.0.1-alpha released! This version is incomple

Oui002 1 Nov 27, 2021
A frame to create discord bots (for myself) that uses cogs, JSON, activities, and more.

dpy-frame A frame to create discord bots (for myself) that uses cogs, JSON, activities, and more. NOTE: Documentation is incomplete, so please wait un

Apple Discord 1 Nov 06, 2021
Termux Pkg

PKG Install Termux All Basic Pkg. Installation : pkg update && pkg upgrade && pkg install python && pkg install python2 && pkg install git && git clon

ɴᴏʙɪᴛᴀシ︎ 1 Oct 28, 2021
An advanced crypto trading bot written in Python

Jesse Jesse is an advanced crypto trading framework which aims to simplify researching and defining trading strategies. Why Jesse? In short, Jesse is

Jesse 4.4k Jan 09, 2023
A simple Python library to integrate with the Heron Data API

Heron Python This library provides easy access to the Heron Data API from applications written in Python. Documentation No language-specific docs are

Heron Data 11 Nov 11, 2022
Discord Voice Call DoS

VC DoS Simple, effective Discord DM/GC voice call Denial of Service. How to Use & FAQ 1. Download the script (obviously). 2. In CMD prompt, find the l

Roover 4 Feb 28, 2022
Telegram Bot to save Posts or Files that can be Accessed via Special Links

OKAERI-FILE Bot Telegram untuk menyimpan Posting atau File yang dapat Diakses melalui Link Khusus. Jika Anda memerlukan tambahan module lagi dalam rep

Wahyusaputra 5 Aug 04, 2022
Lumi-Bot - Discord bot that fetches cryptocurrency prices utilizing CoinGeko API

Lumi-Bot Discord bot that fetches and monitors cryptocurrency prices utilizing C

Diego Castro 2 Oct 08, 2022
An interactive and multi-function Telegram bot, made especially for Telegram groups.

PyKorone An interaction and fun bot for Telegram groups, having some useful and other useless commands. Created as an experiment and learning bot but

Amano Team 17 Nov 12, 2022
Python wrapper for the Intercom API.

python-intercom Not officially supported Please note that this is NOT an official Intercom SDK. The third party that maintained it reached out to us t

Intercom 215 Dec 22, 2022
This repository is used to simplify the process of cloning the SSM documents across the AWS regions.

SSM Cloner Introduction This module is created in order to simplify the process of copying the SSM documents from one region to another regions. As an

6 Jun 04, 2022
Changes your desktop wallpaper based on the weather.

WallPaperChanger 🖼️ Description ⛈️ This Python script changes your desktop wallpaper based on the weather. Cloning 🌀 $ git clone https://github.com/

Clarence Yang 74 Nov 29, 2022