Extract an archive file (zip file or tar file) stored on AWS S3

Overview

S3 Extract

Extract an archive file (zip file or tar file) stored on AWS S3.

Details

Downloads archive from S3 into memory, then extract and re-upload to given destination.

The following S3 information is expected to be given as Environment Variables:

  • AWS_ENDPOINT_URL
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • CERT_PATH (optional)
  • CERT_DL_URL (optional)
  • SIGNATURE_VERSION (optional, defaults to "s3v4")
  • REGION_NAME (optional, defaults to "us-east-1")

Additionally, these information needed for ClearML remote execution can also be given as Env Variable (optional, args will override env var):

  • DEFAULT_DOCKER_IMG
  • DEFAULT_QUEUE

For those not familiar, environment variables can be set through various ways, some being:

  • export = in terminal
  • can be set in ~/.bashrc as well for more permanence

Iteratively extracting a folder of zips/tars is supported as well, through --src-is-dir flag.

Usage

usage: run.py [-h] [--src-is-dir] [--dst-bucket DST_BUCKET] [--verbose] [--remote] [--clml-proj CLML_PROJ] [--clml-task-name CLML_TASK_NAME] [--clml-task-type CLML_TASK_TYPE]
              [--docker-img DOCKER_IMG] [--queue QUEUE]
              src_bucket src_path dst_path

positional arguments:
  src_bucket            Source bucket
  src_path              Source path
  dst_path              Destination path

optional arguments:
  -h, --help            show this help message and exit
  --src-is-dir          Flag to indicate that given src path is a directory. Will iteratively extract any files in it ending with .zip or .tar.
  --dst-bucket DST_BUCKET
                        Destination bucket (optional), will default to Source bucket.
  --verbose             print out current upload filename as it progresses
  --remote              use clearml to remotely run job
  --clml-proj CLML_PROJ
                        ClearML Project Name
  --clml-task-name CLML_TASK_NAME
                        ClearML Task Name
  --clml-task-type CLML_TASK_TYPE
                        ClearML Task Type, e.g. training, testing, inference, etc
  --docker-img DOCKER_IMG
                        Base docker image to pull for ClearML remote execution
  --queue QUEUE         ClearML remote execution queue

Example usage:

python run.py my-bucket dataset/coco/images.tar dataset/coco/ --verbose --remote --clml-proj coco --clml-task-name coco_extraction --docker-img ubuntu/20.04 --queue 1xGPU
Owner
Evan
Evan
An universal file format tool kit. At present will handle the ico format problem.

An universal file format tool kit. At present will handle the ico format problem.

Sadam·Sadik 1 Dec 26, 2021
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Samuel Ko 1 Feb 05, 2022
fast change directory with python and ruby

fcdir fast change directory with python and ruby run run python script , chose drirectoy and change your directory need you need python and ruby deskt

XCO 2 Jun 20, 2022
An easy-to-use library for emulating code in minidump files.

dumpulator Note: This is a work-in-progress prototype, please treat it as such. An easy-to-use library for emulating code in minidump files. Example T

Duncan Ogilvie 362 Dec 31, 2022
Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags.

Tiff Tools Pure Python tools for reading and writing all TIFF IFDs, sub-IFDs, and tags. Developed by Kitware, Inc. with funding from The National Canc

Digital Slide Archive 32 Dec 14, 2022
Read and write TIFF files

Read and write TIFF files Tifffile is a Python library to store numpy arrays in TIFF (Tagged Image File Format) files, and read image and metadata fro

Christoph Gohlke 346 Dec 18, 2022
CSV-Handler written in Python3

CSVHandler This code allows you to work intelligently with CSV files. A file in CSV syntax is converted into several lists, which are combined in a to

Max Tischberger 1 Jan 13, 2022
A python script to convert an ucompressed Gnucash XML file to a text file for Ledger and hledger.

README 1 gnucash2ledger gnucash2ledger is a Python script based on the Github Gist by nonducor (nonducor/gcash2ledger.py). This Python script will tak

Thomas Freeman 0 Jan 28, 2022
This is a file deletion program that asks you for an extension of a file (.mp3, .pdf, .docx, etc.) to delete all of the files in a dir that have that extension.

FileBulk This is a file deletion program that asks you for an extension of a file (.mp3, .pdf, .docx, etc.) to delete all of the files in a dir that h

Enoc Mena 1 Jun 26, 2022
Organize the files into the relevant sub-folders

This program can be used to organize files in a directory by their file extension. And move duplicate files to a duplicates folder.

Thushara Thiwanka 2 Dec 15, 2021
RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem

RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem. These files are exposed either in their original format, or as PDF files that contain your annotations. This le

Robert Schroll 82 Nov 24, 2022
Python library and shell utilities to monitor filesystem events.

Watchdog Python API and shell utilities to monitor file system events. Works on 3.6+. If you want to use Python 2.6, you should stick with watchdog

Yesudeep Mangalapilly 5.6k Jan 04, 2023
Instant Fuzzy File Search for Alfred

List all the files inside a folder using fd, and instantly fuzzy-search through all of them using fzf, all from inside Alfred with a single keyword: fzf.

Mr. Pennyworth 37 Nov 30, 2022
File storage with API access. Used as a part of the Swipio project

API File storage File storage with API access. Used as a part of the Swipio project 📝 About The Project File storage allows you to upload and downloa

25 Sep 17, 2022
Listreqs is a simple requirements.txt generator. It's an alternative to pipreqs

⚡ Listreqs Listreqs is a simple requirements.txt generator. It's an alternative to pipreqs. Where in Pipreqs, it helps you to Generate requirements.tx

Soumyadip Sarkar 4 Oct 15, 2021
CredSweeper is a tool to detect credentials in any directories or files.

CredSweeper is a tool to detect credentials in any directories or files. CredSweeper could help users to detect unwanted exposure of credentials (such as personal information, token, passwords, api k

Samsung 54 Dec 13, 2022
Nintendo Game Boy music assembly files parser into musicxml format

GBMusicParser Nintendo Game Boy music assembly files parser into musicxml format This python code will get an file.asm from the disassembly of a Game

1 Dec 11, 2021
This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it

This python project contains a class FileProcessor which allows one to grab a file and get some meta data and header information from it. In the current state, it outputs a PrettyTable to txt file as

Joshua Wren 1 Nov 09, 2021
dotsend is a web application which helps you to upload your large files and share file via link

dotsend is a web application which helps you to upload your large files and share file via link

Devocoe 0 Dec 03, 2022
Import Python modules from any file system path

pathimp Import Python modules from any file system path. Installation pip3 install pathimp Usage import pathimp

Danijar Hafner 2 Nov 29, 2021