Python plugin/extra to load data files from an external source (such as AWS S3) to a local directory

Overview

Data Loader Plugin - Python

Table of Content (ToC)

Table of contents generated with markdown-toc

Overview

The data loader plugin, aims at supporting running programs (e.g., API service backends) when downloading data from cloud services such as AWS S3. It provides a base Python library, namely data-loader-plugin, offering a few methods to download data files from AWS S3.

References

Python module

Python virtual environments

Installation

Clone this Git repository

$ mkdir -p ~/dev/infra && \
  git clone [email protected]:cloud-helpers/python-plugin-data-loader.git ~/dev/infra/python-plugin-data-loader
$ cd ~/dev/infra/python-plugin-data-loader

Python environment

  • If not already done so, install pyenv, Python 3.9 and, pip and pipenv
    • PyEnv:
$ git clone https://github.com/pyenv/pyenv.git ${HOME}/.pyenv
$ cat >> ~/.profile2 << _EOF

# Python
eval "\$(pyenv init --path)"

_EOF
$ cat >> ~/.bashrc << _EOF

# Python
export PYENV_ROOT="\${HOME}/.pyenv"
export PATH="\${PYENV_ROOT}/bin:\${PATH}"
. ~/.profile2
if command -v pyenv 1>/dev/null 2>&1
then
        eval "\$(pyenv init -)"
fi
if command -v pipenv 1>/dev/null 2>&1
then
        eval "\$(pipenv --completion)"
fi

_EOF
$ . ~/.bashrc
  • Python 3.9:
$ pyenv install 3.9.8 && pyenv local 3.9.8
  • pip:
$ python -mpip install -U pip
  • pipenv:
$ python -mpip install -U pipenv

Usage

Install the data-loader-plugin module

  • There are at least two ways to install the data-loader-plugin module, in the Python user space with pip and in a dedicated virtual environment with pipenv.

    • Both options may be installed in parallel
    • The Python user space (typically, /usr/local/opt/[email protected] on MacOS or ~/.pyenv/versions/3.9.8 on Linux) may already have many other modules installed, parasiting a fine-grained control over the versions of every Python dependency. If all the versions are compatible, then that option is convenient as it is available from the whole user space, not just from this sub-directory
  • In the remainder of that Usage section, it will be assumed that the data-loader-plugin module has been installed and readily available from the environment, whether that environment is virtual or not. In other words, to adapt the documentation for the case where pipenv is used, just add pipenv run in front of every Python-related command.

Install in the Python user space

  • Install and use the data-loader-plugin module in the user space (with pip):
$ python -mpip uninstall data-loader-plugin
$ python -mpip install -U data-loader-plugin

Installation in a dedicated Python virtual environment

  • Install and use the data-loader-plugin module in a virtual environment:
$ pipenv shell
(python-...-JwpAHotb) ✔ python -mpip install -U data-loader-plugin
(python-...-JwpAHotb) ✔ python -mpip install -U data-loader-plugin
(python-...-JwpAHotb) ✔ exit

Use data-loader-plugin as a module from another Python program

  • Check the data file with the AWS command-line (CLI):
$ aws s3 ls --human s3://nyc-tlc/trip\ data/yellow_tripdata_2021-07.csv --no-sign-request
2021-10-29 20:44:34  249.3 MiB yellow_tripdata_2021-07.csv
  • Module import statements:
>>> import importlib
>>> from types import ModuleType
>>> from data_loader_plugin.base import DataLoaderBase
  • Create an instance of the DataLoaderBase Python class:
>>> plugin: ModuleType = importlib.import_module("data_loader_plugin.copyfile")
>>> data_loader: DataLoaderBase = plugin.DataLoader(
        local_path='/tmp/yellow_tripdata_2021-07.csv',
        external_url='s3://nyc-tlc/trip\ data/yellow_tripdata_2021-07.csv',
    )
>>> data_load_success, message = data_loader.load()

Development / Contribution

  • Build the source distribution and Python artifacts (wheels):
$ rm -rf _skbuild/ build/ dist/ .tox/ __pycache__/ .pytest_cache/ MANIFEST *.egg-info/
$ pipenv run python setup.py sdist bdist_wheel
  • Upload to Test PyPi (no Linux binary wheel can be uploaded on PyPi):
$ PYPIURL="https://test.pypi.org"
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://test.pypi.org/legacy/
Uploading data_loader_plugin-0.0.1-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.1.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://test.pypi.org/project/data-loader-plugin/0.0.1/
  • Upload/release the Python packages onto the PyPi repository:
    • Register the authentication token for access to PyPi:
$ PYPIURL="https://upload.pypi.org"
$ pipenv run keyring set ${PYPIURL}/ __token__
Password for '__token__' in '${PYPIURL}/':
  • Register the authentication token for access to PyPi:
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://upload.pypi.org/legacy/
Uploading data_loader_plugin-0.0.1-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.1.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://pypi.org/project/data-loader-plugin/0.0.1/
$ pipenv run python setup.py build_sphinx
running build_sphinx
Running Sphinx v4.3.0
[autosummary] generating autosummary for: README.md
myst v0.15.2: ..., words_per_minute=200)
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
reading sources... [100%] README
...
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] README
...
build succeeded.

The HTML pages are in build/sphinx/html.
  • Re-generate the Python dependency files (requirements.txt) for the CI/CD pipeline (currently Travis CI):
$ pipenv --rm; rm -f Pipfile.lock; pipenv install; pipenv install --dev
$ git add Pipfile.lock
$ pipenv lock -r > ci/requirements.txt
$ pipenv lock --dev -r > ci/requirements-dev.txt
$ git add ci/requirements.txt ci/requirements-dev.txt
$ git commit -m "[CI] Upgraded the Python dependencies for the Travis CI pipeline"

Test the data loader plugin Python module

  • Enter into the pipenv Shell:
$ pipenv shell
(python-...-iVzKEypY) ✔ python -V
Python 3.9.8
  • Uninstall any previously installed data-loader-plugin module/library:
(python-...-iVzKEypY) ✔ python -mpip uninstall data-loader-plugin
  • Launch a simple test with pytest
(python-iVzKEypY) ✔ python -mpytest tests
=================== test session starts ==================
platform darwin -- Python 3.9.8, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: ~/dev/infra/python-plugin-data-loader
plugins: cov-3.0.0
collected 3 items

tests/test_copyfile.py .                             [ 33%]
tests/test_s3.py ..                                  [100%]
====================== 3 passed in 1.22s ==================
  • Exit the pipenv Shell:
(python-...-iVzKEypY) ✔ exit
Owner
Cloud Helpers
Cloud helper tools and documentation
Cloud Helpers
YBlade - Import QBlade blades into Fusion 360

YBlade - Import QBlade blades into Fusion 360 Simple script for Fusion 360 that takes QBlade blade description and constructs the blade: Usage First,

Jan Mrázek 37 Sep 25, 2022
Consulta cpf fds

Consulta-cpf Consulta cpf fds Instalação: apt-get update -y

Moleey 1 Nov 24, 2021
Collection of script & resources for Foundry's Nuke software.

Author: Liam Collod. Collections of scripting stuff I wrote for Foundry's Nuke software. Utilisation You can have a look at the README.md file in each

Liam Collod 1 May 14, 2022
This project is about for notifying moderators about uploaded photos on server.

This project is about for notifying moderators (people who moderate data from photos) about uploaded photos on server.

1 Nov 24, 2021
urlwatch is intended to help you watch changes in webpages and get notified of any changes.

urlwatch is intended to help you watch changes in webpages and get notified (via e-mail, in your terminal or through various third party services) of any changes.

Thomas Perl 2.5k Jan 08, 2023
Location of public benchmarking; primarily final results

CSL_public_benchmark This repo is intended to provide a periodically-updated, public view into genome sequencing benchmarks managed by HudsonAlpha's C

HudsonAlpha Institute for Biotechnology 15 Jun 13, 2022
Union oichecklists For Python

OI Checklist Union Auto-Union user's OI Checklists. Just put your checklist's ID in and it works. How to use it? Put all your OI Checklist IDs (that i

FHVirus 4 Mar 30, 2022
A web interface for a soft serve Git server.

Soft Serve monitor Soft Sevre is a very nice git server. It offers a really nice TUI to browse the repositories on the server. Unfortunately, it does

Maxime Bouillot 5 Apr 26, 2022
Contains the code of my learning of Python OOP.

OOP Python This repository contains the code of my learning of Python OOP. All the code: is following PEP 8 ✅ has proper concept illustrations and com

Samyak Jain 2 Jan 15, 2022
🤡 Multiple Discord selfbot src deobfuscated !

Deobfuscated selfbot sources About. If you whant to add src, please make pull requests. If you whant to deobfuscate src, send mail to

Sreecharan 5 Sep 13, 2021
Tips that improve your life in one way or another

Tips that improve your life in one way or another. This software downloads life tips from reddit.com/r/LifeProTips and tweet the most upvoted tips on Twitter.

Burak Tokman 2 Aug 04, 2022
Youtube Channel Website

Videos-By-Sanjeevi Youtube Channel Website YouTube Channel Website Features: Free Hosting using GitHub Pages and open-source code base in GitHub. It c

Sanjeevi Subramani 5 Mar 26, 2022
NFT generator for Solana!

Solseum NFT Generator for Solana! Check this guide here! Creating your randomized uniques NFTs, getting rarity information and displaying it on a webp

Solseum™ VR NFTs 145 Dec 30, 2022
Simple utlity for sniffing decrypted HTTP/HTTPS traffic on a jailbroken iOS device into an HAR format.

Description iOS devices contain a hidden feature for sniffing decrypted HTTP/HTTPS traffic from all processes using the CFNetwork framework into an HA

83 Dec 25, 2022
Blender addon for executing the operator in response to the received OSC message.

I/F Joiner 受信したOSCメッセージに応じてオペレータ(bpy.ops)を実行するアドオンです. OSC通信に対応したコントローラやアプリをインストールしたスマートフォンを使用してBlenderを操作することが可能になります. 同時開発しているAndroidコントローラ化アプリMocopa

simasimataiyo 6 Oct 02, 2022
a package that provides a marketstrategy for whitelisting on golem

filterms a package that provides a marketstrategy for whitelisting on golem watching requestor logs distribute 10 tasks asynchronously is fun. but you

KJM 3 Aug 03, 2022
This app is to use algorithms to find the root of the equation

In this repository, I made an amazing app with tkinter python language and other libraries the idea of this app is to use algorithms to find the root of the equation I used three methods from numeric

Mohammad Al Jadallah 3 Sep 16, 2022
Curses frontend for Canto daemon

Canto Curses The curses (text) client for canto-daemon. Canto-daemon is required to work and is found at: http://github.com/themoken/canto-next Requir

Jack Miller 86 Dec 28, 2022
The ldapconsole script allows you to perform custom LDAP requests to a Windows domain

ldapconsole The ldapconsole script allows you to perform custom LDAP requests to a Windows domain. Features Authenticate with password Authenticate wi

Podalirius 38 Dec 09, 2022
🇮🇳 A Indian Flag Animation Project Made With Python

🇮🇳 A Indian Flag Animation Project Made With Python

MuFaz-TG 2 Oct 21, 2022