Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

Overview

open(LARGE)

Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

Motivation

Oftentimes, especially when working with data-heavy applications, large files can proliferate in a repository. Version controlling them is an obvious next step, however, GitHub's git LFS implementation doesn't support the deletion of large files, making it easy for them to eat-up the LFS quota and explode the size of your repos.

Solution

pip install open-large

Simple example

from open_large import LargeFile

LargeFile.configure_credentials({
    "aws_region_name": "your_region_like_eu-west-2",
    "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
    "aws_secret_access_key": "YOUR_VERY_SECRET_ACCESS_KEY",
    "large_files_bucket_name": "create_a_bucket_and_put_its_name_here",
})

# Creates a new version and deletes the older version leaving the 3 most recently used intact
with LargeFile("test.txt", "w", keep_last_n=3) as f:
    for i in range(100000):
        f.write('test\n')

# By default the latest version is returned
# but an optional `version` keyword argument can be provided as well
with LargeFile("test.txt", "r") as f:
    print(f.readlines()[0])

Automatically creates a file, writes to it, uploads it to S3, and then queries the most recent version of it. In this case, the latest version is already in the local cache, no download is required.

More details

LargeFile behaves like an opened file (in the background it is a temp file after all). Binary reading and writing is supported along with the different keywords open() accepts.

The local cache can be configured with these properties:

LargeFile.cache_path = Path('.cache')
LargeFile.max_cache_size = "30 GB"

I only need a path

In case you only need a path to the "remote" file, this pattern can be applied:

path_to_model = LargeFile("folder-of-my-bert-model", version=31).get()

This will first download the file/folder into your local cache folder. Then, it returns a Path object to the local version. Which can be turned into a string with str(path_to_model).

The same approach works for uploads:

LargeFile("folder-of-my-bert-model").push('path_to_local/folder_or_file')

This way, both regular files and folders can be handled. The uploaded file is called folder-of-my-bert-model, the local name is ignored.

Lastly, all version of the remote object can be deleted by calling LargeFile("my-file").delete(). It will still reside in your local cache afterwards, its deletion will happen next time your local cache has to be pruned.

Command-line example

The package can be used as a module from the command-line to give you more flexibility.

Setup

Create an .ini file (or use ~/.aws/credentials). It may look like this:

[DEFAULT]
aws_region_name = your_region_like_eu-west-2
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_VERY_SECRET_ACCESS_KEY
large_files_bucket_name = my_large_files

Just like in example secrets.

Print the expected options

python3 -m open_large --help

Upload some files

python3 -m open_large --secrets secrets.ini --push my_first_file.json folder/my_second_file my_folder

Only the filename is used as the S3 name, the rest of the path is ignored.

Download some files to the local cache

This can be useful when building a Docker image for example. This way, the files can already reside inside the container and need not be downloaded later.

python3 -m open_large --secrets ~/.aws/credentials --cache my_first_file.json:3 my_second_file my_folder:0

Versions may be specified by using :-s.

Delete remote files

python3 -m open_large --secrets ~/.aws/credentials --delete my_first_file.json
👻🟡 Download all Snapchat video & photo memories from a data export.

Snapchat "Memories" Fetcher In compliance with the California Consumer Privacy Act of 2018 (“CCPA”), businesses which collect and store user data must

Todd Birchard 18 Dec 26, 2022
Python script to download all images/webms of a 4chan thread

Python3 script to continuously download all images/webms of multiple 4chan thread simultaneously - without installation

Micha Fink 208 Jan 04, 2023
Python script designed to search and fetch direct download links from nxbrew.com

SwitchGamesDownloader Only for windows nxbrew.com is a website, accessible only using a proxy, where the majority of games for the Nintendo Switch are

Backend 91 Dec 28, 2022
squid-dl is a massively parallel yt-dlp-based YouTube downloader.

squid-dl squid-dl is a massively parallel yt-dlp-based YouTube downloader. Installation Run the setup.py, which will install squid-dl and its two depe

tuxlovesyou 51 Jan 05, 2023
A prometheus exporter for torrent downloader like qbittorrent/transmission/deluge

downloader-exporter A prometheus exporter for qBitorrent/Transmission/Deluge. Get metrics from multiple servers and offers them in a prometheus format

Lei Shi 41 Nov 18, 2022
Programmers-quest - Programmer's Quest! An open source MMO built on top of the Panda3D game engine and Astron server

Programmer's Quest! Programmer's Quest! The open source Python 3 2D MMORPG showc

Jordan Maxwell 5 Oct 07, 2022
Make YouTube videos tasks in Todoist faster and time efficient!

Youtubist Basically fork of yt-dlp python module to my needs. You can paste playlist or channel link on the YouTube. It will automatically format to s

Konrad Konieczny 1 Dec 04, 2022
Easily download audio described movies and TV shows found on audiovault.net

AudioVault Downloader A convenient downloader for audio described movies and TV shows found on the Audio Vault. get latest binary release for Windows

Carter Temm 5 Feb 10, 2022
The lyrics module of the repository apple-playlist-downloader

This is the lyrics module of the repository apple-playlist-downloader. With this code you can download the .lrc file (time synced lyrics) from yours t

Antoine Bollengier 6 Oct 07, 2022
Downloads photos you saved from a specific profile.

instagram-download-saved A python script that downloads photos you saved from a specific profile. The only dependency is instaloader, an open-source p

Aviv 1 Dec 19, 2021
An Inline Telegram bot that can download YouTube videos with permanent thumbnail support

Tube (YouTube Downloader) An Inline Telegram bot that can download YouTube videos with permanent thumbnail support About Bot need to be in Inline Mode

Renjith Mangal 30 Dec 14, 2022
Download from HBO-MAX-BLIM-TV-Paramount

#HBO MAX- BlimTV -Paramount plus 4K Downloader Tool To download 4K HDR DV SDR from HBO MAX- BlimTV -Paramount plus Hello Fellow Developers/ ! Hi! M

4 Dec 25, 2021
A Celery application to collect data, download media and extract information from social media APIs

Project IBEX A Celery application to collect data, download media and extract information from social media APIs. Requirements You must have a Redis D

ibex 4 Dec 15, 2022
Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included

WV-AMZN-4K-RIPPER Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included For CDM You can Mail :- 11 Dec 23, 2021

Simple Python script to download images and videos from public subreddits without using Reddit's API 😎

Subreddit Media Downloader Download images and videos from any public subreddit without using Reddit's API Made with ❤ by Nico 💬 About: This script a

Nico 106 Jan 07, 2023
Persepolis Download Manager is a GUI for aria2.

Persepolis Download Manager Content About FAQ Screenshots Credits About Persepolis is a download manager & a GUI for Aria2. It's written in Python. Pe

Persepolis 5.6k Dec 31, 2022
Python script to download entire campaign images and navigation.

Squidle campaign downloader Python script to download entire campaign images and navigation. usage: squidle_campaign_downloader.py [-h] [--api-token A

Miquel Massot 2 Nov 17, 2021
Python/Selenium script to scrape data about university courses

university-courses Python/Selenium script to scrape data about university courses. Script first extracts URLs of each courses homepage, then trawls ea

Sam Brown 1 Feb 02, 2022
Apple Music Animated Artwork Fetcher

A python script for downloading the animated artwork of an Apple Music album.

bunny 46 Jan 03, 2023
Spotify Playlist Downloader With Python

Spotify Playlist Downloader This will let you download Spotify playlists for free without Premium. It gets all the songs from the API and downloads th

Yasho 16 Sep 28, 2022