Convert monolithic Jupyter notebooks into Ploomber pipelines.

Last update: Dec 16, 2022

Overview

Soorgeon

Note: Soorgeon is in alpha; help us make it better.

Install

pip install soorgeon

Usage

# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py
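
The options above are shown one at a time. The following is a minimal sketch of a single call that uses all three, assuming they can be combined (the commands above do not state this explicitly). The variable names NOTEBOOK and OUT_DIR are hypothetical and only used for illustration. The script prints the assembled command and runs it only if soorgeon is installed and the notebook exists.

```shell
#!/bin/sh
# Sketch only: assumes --df-format, --product-prefix and --file-format
# can be passed together in one call.
# NOTEBOOK and OUT_DIR are hypothetical names chosen for illustration.
NOTEBOOK="nb.ipynb"
OUT_DIR="some-directory"

# Build the command as positional parameters so that names
# containing spaces stay intact.
set -- soorgeon refactor "$NOTEBOOK" \
    --df-format parquet \
    --product-prefix "$OUT_DIR" \
    --file-format py

# Print the command for review.
echo "$@"

# Run it only when soorgeon is on PATH and the notebook is present.
if command -v soorgeon >/dev/null 2>&1 && [ -f "$NOTEBOOK" ]; then
    "$@"
fi
```

Printing the command first makes it easy to check the chosen options before any files are generated.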

To learn more, check out our guide.

Examples

git clone https://github.com/ploomber/soorgeon

Exploratory data analysis notebook:

# run from the directory where you cloned the repository
cd soorgeon/examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build

Machine learning notebook:

# run from the directory where you cloned the repository
cd soorgeon/examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
