Python plugin/extra to load data files from an external source (such as AWS S3) to a local directory

Overview

Data Loader Plugin - Python

Table of Content (ToC)

Table of contents generated with markdown-toc

Overview

The data loader plugin, aims at supporting running programs (e.g., API service backends) when downloading data from cloud services such as AWS S3. It provides a base Python library, namely data-loader-plugin, offering a few methods to download data files from AWS S3.

References

Python module

Python virtual environments

Installation

Clone this Git repository

$ mkdir -p ~/dev/infra && \
  git clone [email protected]:cloud-helpers/python-plugin-data-loader.git ~/dev/infra/python-plugin-data-loader
$ cd ~/dev/infra/python-plugin-data-loader

Python environment

  • If not already done so, install pyenv, Python 3.9 and, pip and pipenv
    • PyEnv:
$ git clone https://github.com/pyenv/pyenv.git ${HOME}/.pyenv
$ cat >> ~/.profile2 << _EOF

# Python
eval "\$(pyenv init --path)"

_EOF
$ cat >> ~/.bashrc << _EOF

# Python
export PYENV_ROOT="\${HOME}/.pyenv"
export PATH="\${PYENV_ROOT}/bin:\${PATH}"
. ~/.profile2
if command -v pyenv 1>/dev/null 2>&1
then
        eval "\$(pyenv init -)"
fi
if command -v pipenv 1>/dev/null 2>&1
then
        eval "\$(pipenv --completion)"
fi

_EOF
$ . ~/.bashrc
  • Python 3.9:
$ pyenv install 3.9.8 && pyenv local 3.9.8
  • pip:
$ python -mpip install -U pip
  • pipenv:
$ python -mpip install -U pipenv

Usage

Install the data-loader-plugin module

  • There are at least two ways to install the data-loader-plugin module, in the Python user space with pip and in a dedicated virtual environment with pipenv.

    • Both options may be installed in parallel
    • The Python user space (typically, /usr/local/opt/[email protected] on MacOS or ~/.pyenv/versions/3.9.8 on Linux) may already have many other modules installed, parasiting a fine-grained control over the versions of every Python dependency. If all the versions are compatible, then that option is convenient as it is available from the whole user space, not just from this sub-directory
  • In the remainder of that Usage section, it will be assumed that the data-loader-plugin module has been installed and readily available from the environment, whether that environment is virtual or not. In other words, to adapt the documentation for the case where pipenv is used, just add pipenv run in front of every Python-related command.

Install in the Python user space

  • Install and use the data-loader-plugin module in the user space (with pip):
$ python -mpip uninstall data-loader-plugin
$ python -mpip install -U data-loader-plugin

Installation in a dedicated Python virtual environment

  • Install and use the data-loader-plugin module in a virtual environment:
$ pipenv shell
(python-...-JwpAHotb) ✔ python -mpip install -U data-loader-plugin
(python-...-JwpAHotb) ✔ python -mpip install -U data-loader-plugin
(python-...-JwpAHotb) ✔ exit

Use data-loader-plugin as a module from another Python program

  • Check the data file with the AWS command-line (CLI):
$ aws s3 ls --human s3://nyc-tlc/trip\ data/yellow_tripdata_2021-07.csv --no-sign-request
2021-10-29 20:44:34  249.3 MiB yellow_tripdata_2021-07.csv
  • Module import statements:
>>> import importlib
>>> from types import ModuleType
>>> from data_loader_plugin.base import DataLoaderBase
  • Create an instance of the DataLoaderBase Python class:
>>> plugin: ModuleType = importlib.import_module("data_loader_plugin.copyfile")
>>> data_loader: DataLoaderBase = plugin.DataLoader(
        local_path='/tmp/yellow_tripdata_2021-07.csv',
        external_url='s3://nyc-tlc/trip\ data/yellow_tripdata_2021-07.csv',
    )
>>> data_load_success, message = data_loader.load()

Development / Contribution

  • Build the source distribution and Python artifacts (wheels):
$ rm -rf _skbuild/ build/ dist/ .tox/ __pycache__/ .pytest_cache/ MANIFEST *.egg-info/
$ pipenv run python setup.py sdist bdist_wheel
  • Upload to Test PyPi (no Linux binary wheel can be uploaded on PyPi):
$ PYPIURL="https://test.pypi.org"
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://test.pypi.org/legacy/
Uploading data_loader_plugin-0.0.1-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.1.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://test.pypi.org/project/data-loader-plugin/0.0.1/
  • Upload/release the Python packages onto the PyPi repository:
    • Register the authentication token for access to PyPi:
$ PYPIURL="https://upload.pypi.org"
$ pipenv run keyring set ${PYPIURL}/ __token__
Password for '__token__' in '${PYPIURL}/':
  • Register the authentication token for access to PyPi:
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://upload.pypi.org/legacy/
Uploading data_loader_plugin-0.0.1-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.1.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://pypi.org/project/data-loader-plugin/0.0.1/
$ pipenv run python setup.py build_sphinx
running build_sphinx
Running Sphinx v4.3.0
[autosummary] generating autosummary for: README.md
myst v0.15.2: ..., words_per_minute=200)
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
reading sources... [100%] README
...
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] README
...
build succeeded.

The HTML pages are in build/sphinx/html.
  • Re-generate the Python dependency files (requirements.txt) for the CI/CD pipeline (currently Travis CI):
$ pipenv --rm; rm -f Pipfile.lock; pipenv install; pipenv install --dev
$ git add Pipfile.lock
$ pipenv lock -r > ci/requirements.txt
$ pipenv lock --dev -r > ci/requirements-dev.txt
$ git add ci/requirements.txt ci/requirements-dev.txt
$ git commit -m "[CI] Upgraded the Python dependencies for the Travis CI pipeline"

Test the data loader plugin Python module

  • Enter into the pipenv Shell:
$ pipenv shell
(python-...-iVzKEypY) ✔ python -V
Python 3.9.8
  • Uninstall any previously installed data-loader-plugin module/library:
(python-...-iVzKEypY) ✔ python -mpip uninstall data-loader-plugin
  • Launch a simple test with pytest
(python-iVzKEypY) ✔ python -mpytest tests
=================== test session starts ==================
platform darwin -- Python 3.9.8, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: ~/dev/infra/python-plugin-data-loader
plugins: cov-3.0.0
collected 3 items

tests/test_copyfile.py .                             [ 33%]
tests/test_s3.py ..                                  [100%]
====================== 3 passed in 1.22s ==================
  • Exit the pipenv Shell:
(python-...-iVzKEypY) ✔ exit
Owner
Cloud Helpers
Cloud helper tools and documentation
Cloud Helpers
Simple project to assist in tracking/logging my working hours

Fill working hours Basic script to assist in the logging/tracking of my working hours How it works Create a file called projects.json in this director

Robin Kennedy-Reid 2 Oct 31, 2022
Just another sentiment wrapper.

sentimany Just a simple sentiment tool. It just grabs a set of pre-made sentiment models that you can quickly use to attach sentiment scores to text.

vincent d warmerdam 15 Dec 27, 2022
Virtual webcam that takes real webcam footage and replaces the background in order to have Virtual Backgrounds in MS Teams for Linux where the feature is unimplemented.

Background Remover The Need It's been good long while since Microsoft first released a Teams version for Linux and yet, one of Teams' coolest features

Dylan Turner 80 Dec 20, 2022
The next generation Canto RSS daemon

Canto Daemon This is the RSS backend for Canto clients. Canto-curses is the default client at: http://github.com/themoken/canto-curses Requirements De

Jack Miller 155 Dec 28, 2022
CaskDB is a disk-based, embedded, persistent, key-value store based on the Riak's bitcask paper, written in Python.

CaskDB - Disk based Log Structured Hash Table Store CaskDB is a disk-based, embedded, persistent, key-value store based on the Riak's bitcask paper, w

886 Dec 27, 2022
A simple countdown timer in eazy code to show timer with python

Countdown_Timer The simple CLI countdown timer in eazy code to show timer How Work First you fill the input by int-- (Enter the time in Seconds:) for

Yasin Rezvani 3 Nov 15, 2022
Python pyside2 kütüphanesi ile oluşturduğum drone için yer kontrol istasyonu yazılımı.

Ground Control Station (Yer Kontrol İstasyonu) Teknofest yarışmasında yerlilik kısmında Yer Kontrol İstasyonu yazılımı seçeneği bulunuyordu. Bu yüzden

Emirhan Bülbül 4 May 14, 2022
A Curated Collection of Awesome Python Scripts

A Curated Collection of Awesome Python Scripts that will make you go wow. This repository will help you in getting those green squares. Hop in and enjoy the journey of open source. 🚀

Prathima Kadari 248 Dec 31, 2022
Student Result Management System Project in tkinter created based on python, tkinter, and SQLITE3 Database

Student-Result-Management-System This Student Result Management System Project in tkinter created based on python, tkinter, and SQLITE3 Database. The

Ravi Chauhan 2 Aug 03, 2022
Sorter makes file organisation and management easier.

Sorter Sorter makes file organisation easier. It simply helps you organise several files that contain similar characteristics into a single folder. Yo

Aswa Paul 34 Aug 14, 2022
The bidirectional mapping library for Python.

bidict The bidirectional mapping library for Python. Status bidict: has been used for many years by several teams at Google, Venmo, CERN, Bank of Amer

Joshua Bronson 1.2k Dec 31, 2022
This is a simple SV calling package for diploid assemblies.

dipdiff This is a simple SV calling package for diploid assemblies. It uses a modified version of svim-asm. The package includes its own version minim

Mikhail Kolmogorov 11 Jan 05, 2023
Python requirements.txt Guesser

Python-Requirements-Guesser ⚠️ This is alpha quality software. Work in progress Attempt to guess requirements.txt modules versions based on Git histor

Jerome 9 May 24, 2022
A shim for the typeshed changes in mypy 0.900

types-all A shim for the typeshed changes in mypy 0.900 installation pip install types-all why --install-types is annoying, this installs all the thin

Anthony Sottile 28 Oct 20, 2022
Python MQTT v5.0 async client

gmqtt: Python async MQTT client implementation. Installation The latest stable version is available in the Python Package Index (PyPi) and can be inst

Gurtam 306 Jan 03, 2023
HOWTO: Downgrade from nYNAB to YNAB4

HOWTO: Downgrade from nYNAB to YNAB4 This page explains how to move from nYNAB to YNAB4 while retaining as much information as possible. See Appendix

Tobias Kunze 10 Dec 29, 2022
How to build an Fahrenheit to Celsius Converter in Python

Generally to measure the temperature we make use of one of these two popular units i.e. Fahrenheit & Celsius.

PyLaboratory 0 Feb 07, 2022
Run the Tianxunet software on the Xiaoyao Android simulator

Run the Tianxunet software on the Xiaoyao Android simulator, and automatically fill in the answers of English listening on the premise of having answers

1 Feb 13, 2022
"Cambio de monedas" Change-making problem with Python, dynamic programming best solutions,

Change-making-problem / Cambio de monedas Entendiendo el problema Dada una cantidad de dinero y una lista de denominaciones de monedas, encontrar el n

Juan Antonio Ayola Cortes 1 Dec 08, 2021
Toppr Os Auto Class Joiner

Toppr Os Auto Class Joiner Toppr os is a irritating platform to work with especially for students it takes a while and is problematic most of the time

1 Dec 18, 2021