Query multiple mongoDB database collections easily

Overview

leakscoop

Perform queries across multiple MongoDB databases and collections, where the field names and the field content structure in each database may vary.

The Problem

Suppose you've got two database collections, "leak1" and "leak2"

In leak1, the schema looks like this:

FIRST_NAME: "JOHN"
LAST_NAME: "DOE"

and in leak2, the schema looks like this:

FName: "John"
LName: "Doe"

A simple program to iterate through all your collections and perform queries wouldn't work, because:

  • the field names are different. Notice that in leak1, the first name field is FIRST_NAME, while in leak2, the first name field is named FName.
  • the field values might be structured differently. In leak1, everything is captialized. In leak2, it's all title-case.

This program lets you write a configuration for each collection, specifying, in JSON, how to query each field.

It's a work in progress, but so far, it works pretty well. It'll probably be easier to understand if you take a look at the config files under ./collections/. Each JSON file under ./collections/ should be an array of objects. The program automatically processes all JSON files under that directory.

Some more info for how the configurations work can be found in the wiki.

Example Usage:

Find all records of a guy named John Doe.

python3 -m dev --firstname John --lastname Doe

Each database will be searched, and results will be put into a new file under ./results/

Find all records for someone with an address of "1234 NW Long St" python3 -m dev --address "1234 NW long st"

Adding a zipcode to the end, or a state/province might speed up the query (depending on how you index your databases)

Screenshot

(information redacted for this person's privacy)

image Configured fields will print to the console, while all the other fields in a result will be saved under ./results/.

Owner
bagel
bagel
Logica is a logic programming language that compiles to StandardSQL and runs on Google BigQuery.

Logica: language of Big Data Logica is an open source declarative logic programming language for data manipulation. Logica is a successor to Yedalog,

Evgeny Skvortsov 1.5k Dec 30, 2022
Application which allows you to make PostgreSQL databases with Python

Automate PostgreSQL Databases with Python Application which allows you to make PostgreSQL databases with Python I used the psycopg2 library which is u

Marc-Alistair Coffi 0 Dec 31, 2021
Apache Libcloud is a Python library which hides differences between different cloud provider APIs and allows you to manage different cloud resources through a unified and easy to use API

Apache Libcloud - a unified interface for the cloud Apache Libcloud is a Python library which hides differences between different cloud provider APIs

The Apache Software Foundation 1.9k Dec 25, 2022
Use SQL query in a jupyter notebook!

SQL-query Use SQL query in a jupyter notebook! The table I used can be found on UN Data. Or you can just click the link and download the file undata_s

Chuqin 2 Oct 05, 2022
A Redis client library for Twisted Python

txRedis Asynchronous Redis client for Twisted Python. Install Install via pip. Usage examples can be found in the examples/ directory of this reposito

Dorian Raymer 127 Oct 23, 2022
Find graph motifs using intuitive notation

d o t m o t i f Find graph motifs using intuitive notation DotMotif is a library that identifies subgraphs or motifs in a large graph. It looks like t

APL BRAIN 45 Jan 02, 2023
aioodbc - is a library for accessing a ODBC databases from the asyncio

aioodbc aioodbc is a Python 3.5+ module that makes it possible to access ODBC databases with asyncio. It relies on the awesome pyodbc library and pres

aio-libs 253 Dec 31, 2022
A library for python made by me,to make the use of MySQL easier and more pythonic

my_ezql A library for python made by me,to make the use of MySQL easier and more pythonic This library was made by Tony Hasson , a 25 year old student

3 Nov 19, 2021
Create a database, insert data and easily select it with Sqlite

sqliteBasics create a database, insert data and easily select it with Sqlite Watch on YouTube a step by step tutorial explaining this code: https://yo

Mariya 27 Dec 27, 2022
MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)

mongo-connector The mongo-connector project originated as a MongoDB mongo-labs project and is now community-maintained under the custody of YouGov, Pl

YouGov 1.9k Jan 04, 2023
PyMongo - the Python driver for MongoDB

PyMongo Info: See the mongo site for more information. See GitHub for the latest source. Documentation: Available at pymongo.readthedocs.io Author: Mi

mongodb 3.7k Jan 08, 2023
Dinamopy is a python helper library for dynamodb

Dinamopy is a python helper library for dynamodb. You can define your access patterns in a json file and can use dynamic method names to make operations.

Rasim Andıran 2 Jul 18, 2022
Pure-python PostgreSQL driver

pg-purepy pg-purepy is a pure-Python PostgreSQL wrapper based on the anyio library. A lot of this library was inspired by the pg8000 library. Credits

Lura Skye 11 May 23, 2022
SQL for Humans™

Records: SQL for Humans™ Records is a very simple, but powerful, library for making raw SQL queries to most relational databases. Just write SQL. No b

Ken Reitz 6.9k Jan 03, 2023
MySQL database connector for Python (with Python 3 support)

mysqlclient This project is a fork of MySQLdb1. This project adds Python 3 support and fixed many bugs. PyPI: https://pypi.org/project/mysqlclient/ Gi

PyMySQL 2.2k Dec 25, 2022
pandas-gbq is a package providing an interface to the Google BigQuery API from pandas

pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda

Google APIs 348 Jan 03, 2023
Python client for Apache Kafka

Kafka Python client Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the offici

Dana Powers 5.1k Jan 08, 2023
Python version of the TerminusDB client - for TerminusDB API and WOQLpy

TerminusDB Client Python Development status ⚙️ Python Package status 📦 Python version of the TerminusDB client - for TerminusDB API and WOQLpy Requir

TerminusDB 66 Dec 02, 2022
Pony Object Relational Mapper

Downloads Pony Object-Relational Mapper Pony is an advanced object-relational mapper. The most interesting feature of Pony is its ability to write que

3.1k Jan 04, 2023
python-bigquery Apache-2python-bigquery (🥈34 · ⭐ 3.5K · 📈) - Google BigQuery API client library. Apache-2

Python Client for Google BigQuery Querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google

Google APIs 550 Jan 01, 2023