Python plugin/extra to load data files from an external source (such as AWS S3) to a local directory

Data Loader Plugin - Python

Overview

The data loader plugin aims to support running programs (e.g., API service backends) that need to download data from cloud services such as AWS S3. It provides a base Python library, namely data-loader-plugin, offering a few methods to download data files from AWS S3.
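The library's own download methods are not reproduced here. As a rough sketch of what fetching a public S3 object involves, the block below splits an `s3://bucket/key` URL and downloads the object anonymously. It assumes the third-party boto3 package, which is not a stated dependency of data-loader-plugin, and the helper names `split_s3_url` and `download_public_s3_object` are hypothetical.

```python
from typing import Tuple


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); the key may contain spaces."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError(f"not an S3 URL: {url!r}")
    # Everything up to the first '/' is the bucket, the rest is the object key.
    bucket, _, key = url[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URL needs both a bucket and a key: {url!r}")
    return bucket, key


def download_public_s3_object(url: str, local_path: str) -> None:
    """Hypothetical helper: anonymous download, akin to the CLI's --no-sign-request."""
    # boto3 is imported lazily so that split_s3_url() works without it installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, key = split_s3_url(url)
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    s3.download_file(bucket, key, local_path)
```

Note that the key is passed as a plain string, so a space such as the one in `trip data` needs no escaping, unlike on a shell command line.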

Installation

Clone this Git repository

$ mkdir -p ~/dev/infra && \
  git clone git@github.com:cloud-helpers/python-plugin-data-loader.git ~/dev/infra/python-plugin-data-loader
$ cd ~/dev/infra/python-plugin-data-loader

Python environment

  • If not already done, install pyenv, Python 3.9, pip and pipenv:
    • pyenv:
$ git clone https://github.com/pyenv/pyenv.git ${HOME}/.pyenv
$ cat >> ~/.profile2 << _EOF

# Python
eval "\$(pyenv init --path)"

_EOF
$ cat >> ~/.bashrc << _EOF

# Python
export PYENV_ROOT="\${HOME}/.pyenv"
export PATH="\${PYENV_ROOT}/bin:\${PATH}"
. ~/.profile2
if command -v pyenv 1>/dev/null 2>&1
then
        eval "\$(pyenv init -)"
fi
if command -v pipenv 1>/dev/null 2>&1
then
        eval "\$(pipenv --completion)"
fi

_EOF
$ . ~/.bashrc
  • Python 3.9:
$ pyenv install 3.9.8 && pyenv local 3.9.8
  • pip:
$ python -mpip install -U pip
  • pipenv:
$ python -mpip install -U pipenv
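To double-check the toolchain installed by the steps above, a small sketch such as the following can be run with the interpreter selected by pyenv. The function name `check_toolchain` and the 3.9 minimum are assumptions drawn from this section, not part of data-loader-plugin.

```python
import platform
import shutil
import sys
from typing import Dict, Optional, Tuple


def check_toolchain(min_python: Tuple[int, int] = (3, 9)) -> Dict[str, Optional[str]]:
    """Report the running Python version and where pyenv/pipenv are found on PATH."""
    if sys.version_info[:2] < min_python:
        wanted = ".".join(map(str, min_python))
        raise RuntimeError(f"Python {wanted}+ required, running {platform.python_version()}")
    return {
        "python": platform.python_version(),
        # shutil.which() returns None when the command is not on PATH.
        "pyenv": shutil.which("pyenv"),
        "pipenv": shutil.which("pipenv"),
    }
```

Calling `check_toolchain()` from a Python shell returns a dictionary; a `None` value means the corresponding tool is not on the `PATH` of the current shell.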

Usage

Install the data-loader-plugin module

  • There are at least two ways to install the data-loader-plugin module: in the Python user space with pip, or in a dedicated virtual environment with pipenv.

    • Both options can be installed side by side.
    • The Python user space (typically /usr/local/opt/python@3.9 on MacOS or ~/.pyenv/versions/3.9.8 on Linux) may already contain many other modules, which gets in the way of fine-grained control over the version of every Python dependency. If all the versions are compatible, that option is convenient, as the module is then available from the whole user space, not just from this sub-directory.
  • In the remainder of this Usage section, it is assumed that the data-loader-plugin module has been installed and is readily available from the environment, whether that environment is virtual or not. In other words, to adapt the documentation to the case where pipenv is used, just prefix every Python-related command with pipenv run.

Installation in the Python user space

  • Install and use the data-loader-plugin module in the user space (with pip):
$ python -mpip uninstall data-loader-plugin
$ python -mpip install -U data-loader-plugin

Installation in a dedicated Python virtual environment

  • Install and use the data-loader-plugin module in a virtual environment:
$ pipenv shell
(python-...-JwpAHotb) ✔ python -mpip install -U data-loader-plugin
(python-...-JwpAHotb) ✔ exit
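Since both installations can coexist, it may help to check which one a given interpreter actually sees. The sketch below relies on the standard `importlib.metadata` module (Python 3.8+) and on the usual `sys.prefix` versus `sys.base_prefix` test for virtual environments; the function names and the script name `check_install.py` used below are hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "data-loader-plugin") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def in_virtual_env() -> bool:
    """True inside a venv/virtualenv (such as the one pipenv creates)."""
    # Legacy virtualenv (< 20) set sys.real_prefix instead of sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def describe_installation(dist_name: str = "data-loader-plugin") -> str:
    """One-line summary of where (and whether) the distribution is installed."""
    where = "a virtual environment" if in_virtual_env() else "the Python user space"
    version = installed_version(dist_name)
    if version is None:
        return f"{dist_name} is not installed in {where} ({sys.prefix})"
    return f"{dist_name} {version} is installed in {where} ({sys.prefix})"


if __name__ == "__main__":
    print(describe_installation())
```

Running `python check_install.py` and `pipenv run python check_install.py` side by side would then report the two environments separately.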

Use data-loader-plugin as a module from another Python program

  • Check the data file with the AWS command-line interface (CLI):
$ aws s3 ls --human s3://nyc-tlc/trip\ data/yellow_tripdata_2021-07.csv --no-sign-request
2021-10-29 20:44:34  249.3 MiB yellow_tripdata_2021-07.csv
  • Module import statements:
>>> import importlib
>>> from types import ModuleType
>>> from data_loader_plugin.base import DataLoaderBase
  • Create an instance of the DataLoader class provided by the copyfile plugin (typed as a DataLoaderBase):
>>> plugin: ModuleType = importlib.import_module("data_loader_plugin.copyfile")
>>> data_loader: DataLoaderBase = plugin.DataLoader(
...     local_path='/tmp/yellow_tripdata_2021-07.csv',
...     external_url='s3://nyc-tlc/trip data/yellow_tripdata_2021-07.csv',
... )
  • Load the data file; load() returns a success flag and a message:
>>> data_load_success, message = data_loader.load()
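The plugin mechanism above boils down to importing a module by name and instantiating its DataLoader class. The sketch below imitates that pattern with hypothetical stand-ins (`DataLoaderBase`, `CopyFileDataLoader`, `make_loader` and the in-memory module `demo_copyfile_plugin`). It does not reproduce the actual data_loader_plugin code, and the `(success, message)` contract of `load()` is inferred from the call above.

```python
import importlib
import shutil
import sys
import types
from abc import ABC, abstractmethod
from typing import Tuple


class DataLoaderBase(ABC):
    """Hypothetical stand-in for data_loader_plugin.base.DataLoaderBase."""

    def __init__(self, local_path: str, external_url: str) -> None:
        self.local_path = local_path
        self.external_url = external_url

    @abstractmethod
    def load(self) -> Tuple[bool, str]:
        """Fetch external_url into local_path and return (success, message)."""


class CopyFileDataLoader(DataLoaderBase):
    """Hypothetical plugin: treats external_url as a path on the local filesystem."""

    def load(self) -> Tuple[bool, str]:
        try:
            shutil.copyfile(self.external_url, self.local_path)
        except OSError as err:
            # Report the failure through the return value instead of raising.
            return False, f"copy failed: {err}"
        return True, f"copied {self.external_url} to {self.local_path}"


def make_loader(module_name: str, local_path: str, external_url: str) -> DataLoaderBase:
    """Import a plugin module by name and instantiate its DataLoader class."""
    plugin = importlib.import_module(module_name)
    loader = plugin.DataLoader(local_path=local_path, external_url=external_url)
    if not isinstance(loader, DataLoaderBase):
        raise TypeError(f"{module_name}.DataLoader does not derive from DataLoaderBase")
    return loader


# Register an in-memory plugin module so the sketch runs without the real package;
# importlib.import_module() returns entries already present in sys.modules.
demo_plugin = types.ModuleType("demo_copyfile_plugin")
demo_plugin.DataLoader = CopyFileDataLoader
sys.modules["demo_copyfile_plugin"] = demo_plugin
```

Because the plugin is selected through a module-name string, switching data sources only requires changing that string, for example from a configuration file.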

Development / Contribution

  • Build the source distribution and Python artifacts (wheels):
$ rm -rf _skbuild/ build/ dist/ .tox/ __pycache__/ .pytest_cache/ MANIFEST *.egg-info/
$ pipenv run python setup.py sdist bdist_wheel
  • Upload to TestPyPI (PyPI rejects Linux binary wheels with a plain linux_* platform tag; they must carry a manylinux or musllinux tag):
$ PYPIURL="https://test.pypi.org"
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://test.pypi.org/legacy/
Uploading data_loader_plugin-0.0.1-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.1.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://test.pypi.org/project/data-loader-plugin/0.0.1/
  • Upload/release the Python packages onto the PyPI repository:
    • Register the authentication token for access to PyPI:
$ PYPIURL="https://upload.pypi.org"
$ pipenv run keyring set ${PYPIURL}/ __token__
Password for '__token__' in 'https://upload.pypi.org/':
    • Upload the packages onto PyPI:
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://upload.pypi.org/legacy/
Uploading data_loader_plugin-0.0.1-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.1.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://pypi.org/project/data-loader-plugin/0.0.1/
  • Build the HTML documentation with Sphinx:
$ pipenv run python setup.py build_sphinx
running build_sphinx
Running Sphinx v4.3.0
[autosummary] generating autosummary for: README.md
myst v0.15.2: ..., words_per_minute=200)
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
reading sources... [100%] README
...
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] README
...
build succeeded.

The HTML pages are in build/sphinx/html.
  • Re-generate the Python dependency files (requirements.txt) for the CI/CD pipeline (currently Travis CI):
$ pipenv --rm; rm -f Pipfile.lock; pipenv install; pipenv install --dev
$ git add Pipfile.lock
$ pipenv lock -r > ci/requirements.txt
$ pipenv lock --dev -r > ci/requirements-dev.txt
$ git add ci/requirements.txt ci/requirements-dev.txt
$ git commit -m "[CI] Upgraded the Python dependencies for the Travis CI pipeline"
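The wheel built above, data_loader_plugin-0.0.1-py3-none-any.whl, is pure Python, so PyPI's restriction on Linux binary wheels does not apply to it. That restriction concerns the platform tag, which can be read from the file name following the PEP 427 convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The helpers below (`WheelTags`, `parse_wheel_filename`, `pypi_accepts_platform`) are a hypothetical sketch that ignores some edge cases; `packaging.utils.parse_wheel_filename` is a more complete option.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelTags(dist, version, build, py, abi, plat)


def pypi_accepts_platform(tags: WheelTags) -> bool:
    """False if any platform tag is a plain linux_* tag, which PyPI rejects."""
    # Compressed tag sets such as 'manylinux_2_17_x86_64.manylinux2014_x86_64' use '.'.
    return not any(p.startswith("linux_") for p in tags.platform.split("."))
```

For example, `parse_wheel_filename("data_loader_plugin-0.0.1-py3-none-any.whl").platform` is `"any"`, which PyPI accepts.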

Test the data loader plugin Python module

  • Enter the pipenv shell:
$ pipenv shell
(python-...-iVzKEypY) ✔ python -V
Python 3.9.8
  • Uninstall any previously installed data-loader-plugin module/library:
(python-...-iVzKEypY) ✔ python -mpip uninstall data-loader-plugin
  • Launch a simple test with pytest:
(python-...-iVzKEypY) ✔ python -mpytest tests
=================== test session starts ==================
platform darwin -- Python 3.9.8, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: ~/dev/infra/python-plugin-data-loader
plugins: cov-3.0.0
collected 3 items

tests/test_copyfile.py .                             [ 33%]
tests/test_s3.py ..                                  [100%]
====================== 3 passed in 1.22s ==================
  • Exit the pipenv shell:
(python-...-iVzKEypY) ✔ exit