Python function to query SQLite files stored on S3

Overview

sqlite-s3-query

Python function to query a SQLite file stored on S3. It uses multiple HTTP range requests per query to avoid downloading the entire file, and so is suitable for large databases.
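To illustrate what a single range request asks for, the sketch below builds the Range header value for one fixed-size chunk of a remote file. The function name range_header and the 4096-byte chunk size are assumptions made for illustration; they are not part of the sqlite-s3-query API, and the library's own request logic is not shown here.

```python
def range_header(offset, length):
    """Return the HTTP Range header value for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive at both ends, hence the -1.
    """
    if offset < 0 or length <= 0:
        raise ValueError('offset must be >= 0 and length must be > 0')
    return f'bytes={offset}-{offset + length - 1}'


# Hypothetical example: the third 4096-byte chunk of a database file
headers = {'range': range_header(offset=2 * 4096, length=4096)}
```

Because only the requested bytes are transferred, the cost of a query depends on how many chunks it touches rather than on the total size of the file.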

All the HTTP requests for a query request the same version of the database object in S3, so queries should complete successfully even if the database is replaced concurrently by another S3 client. Versioning must be enabled on the S3 bucket.
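One way to picture the version pinning: S3 accepts a versionId query parameter on object GET requests, so once a version is known, every later request can name it explicitly. The helper below is a hypothetical sketch and not the library's internal code; the name with_version_id and the version ID value are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version_id(url, version_id):
    """Return `url` with a versionId query parameter set to `version_id`."""
    parts = urlsplit(url)
    # Drop any existing versionId so the result names exactly one version
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != 'versionId']
    query.append(('versionId', version_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


# 'example-version-id' is a placeholder, not a real S3 version ID
pinned_url = with_version_id(
    'https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
    'example-version-id',
)
```

If another client uploads a new copy of the file midway through a query, requests for the pinned URL would still refer to the older version, which is why bucket versioning is required.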

Operations that write to the database are not supported.

Inspired by phiresky's sql.js-httpvfs, and dacort's Stack Overflow answer.

Installation

sqlite-s3-query depends on APSW, which is not available on PyPI, but can be installed directly from GitHub.

pip install sqlite_s3_query
pip install https://github.com/rogerbinns/apsw/releases/download/3.36.0-r1/apsw-3.36.0-r1.zip --global-option=fetch --global-option=--version --global-option=3.36.0 --global-option=--all --global-option=build --global-option=--enable-all-extensions

Usage

from sqlite_s3_query import sqlite_s3_query

results_iter = sqlite_s3_query(
    'SELECT * FROM my_table WHERE my_column = ?', params=('my-value',),
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
)

for row in results_iter:
    print(row)

If your project runs multiple queries against the same file, functools.partial can be used to bind the URL once and reduce duplication.

from functools import partial
from sqlite_s3_query import sqlite_s3_query

query_my_db = partial(sqlite_s3_query,
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
)

for row in query_my_db('SELECT * FROM my_table WHERE my_col = ?', params=('my-value',)):
    print(row)

for row in query_my_db('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
    print(row)

The AWS region and the credentials are taken from environment variables by default, but this can be changed using the get_credentials parameter. The default implementation, which can be overridden, is shown below.

import os
from functools import partial
from sqlite_s3_query import sqlite_s3_query

query_my_db = partial(sqlite_s3_query,
    url='https://my-bucket.s3.eu-west-2.amazonaws.com/my-db.sqlite',
    get_credentials=lambda: (
        os.environ['AWS_DEFAULT_REGION'],
        os.environ['AWS_ACCESS_KEY_ID'],
        os.environ['AWS_SECRET_ACCESS_KEY'],
        os.environ.get('AWS_SESSION_TOKEN'),  # Only needed for temporary credentials
    ),
)

for row in query_my_db('SELECT * FROM my_table_2 WHERE my_col = ?', params=('my-value',)):
    print(row)
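
As a sketch of a custom override, the function below builds the same four-element tuple from any mapping and raises a descriptive error when a required setting is missing. The name credentials_from is hypothetical, and the tuple order (region, access key ID, secret access key, session token) is assumed from the default implementation shown above.

```python
import os


def credentials_from(env):
    """Build the (region, access key ID, secret access key, session token) tuple."""
    required = ('AWS_DEFAULT_REGION', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError('Missing AWS settings: ' + ', '.join(missing))
    # The session token is optional; None when not using temporary credentials
    return tuple(env[name] for name in required) + (env.get('AWS_SESSION_TOKEN'),)


# Example values are placeholders, not real credentials
example = credentials_from({
    'AWS_DEFAULT_REGION': 'eu-west-2',
    'AWS_ACCESS_KEY_ID': 'example-key-id',
    'AWS_SECRET_ACCESS_KEY': 'example-secret',
})
```

Under these assumptions it could be supplied as get_credentials=lambda: credentials_from(os.environ), or given a different mapping in tests.
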
Owner
Michal Charemza