A proof-of-concept jupyter extension which converts english queries into relevant python code

Overview

Text2Code for Jupyter notebook

A proof-of-concept jupyter extension which converts english queries into relevant python code.

Blog post with more details:

Data analysis made easy: Text2Code for Jupyter notebook

Demo Video:

Text2Code for Jupyter notebook

Supported Operating Systems:

  • Ubuntu
  • macOS

Installation

NOTE: We have renamed the plugin from mopp to jupyter-text2code. Uninstall mopp before installing new jupyter-text2code version.

pip uninstall mopp

CPU-only install:

For Mac and other Ubuntu installations not having a nvidia GPU, we need to explicitly set an environment variable at time of install.

export JUPYTER_TEXT2CODE_MODE="cpu"

GPU install dependencies:

sudo apt-get install libopenblas-dev libomp-dev

Installation commands:

git clone https://github.com/deepklarity/jupyter-text2code.git
cd jupyter-text2code
pip install .
jupyter nbextension enable jupyter-text2code/main

Uninstallation:

pip uninstall jupyter-text2code

Usage Instructions:

  • Start Jupyter notebook server by running the following command: jupyter notebook
  • If you don't see Nbextensions tab in Jupyter notebook run the following command:jupyter contrib nbextension install --user
  • You can open the sample notebooks/ctds.ipynb notebook for testing
  • If installation happened successfully, then for the first time, Universal Sentence Encoder model will be downloaded from tensorflow_hub
  • Click on the Terminal Icon which appears on the menu (to activate the extension)
  • Type "help" to see a list of currently supported commands in the repo
  • Watch Demo video for some examples

Docker containers for jupyter-text2code

We have published CPU and GPU images to docker hub with all dependencies pre-installed.

Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and usage instructions.
CPU image size: 1.51 GB
GPU image size: 2.56 GB

Model training:

Generate training data:

From a list of templates present at jupyter_text2code/jupyter_text2code_serverextension/data/ner_templates.csv, generate training data by running the following command:

cd scripts && python generate_training_data.py

This command will generate data for intent matching and NER(Named Entity Recognition).

Create intent index faiss

Use the generated data to create a intent-matcher using faiss.

cd scripts && python create_intent_index.py

Train NER model

cd scripts && python train_spacy_ner.py

Steps to add more intents:

  • Add more templates in ner_templates with a new intent_id
  • Generate training data. Modify generate_training_data.py if different generation techniques are needed or if introducing a new entity.
  • Train intent index
  • Train NER model
  • modify jupyter_text2code/jupyter_text2code_serverextension/__init__.py with new intent's condition and add actual code for the intent
  • Reinstall plugin by running: pip install .

TODO:

  • Publish Docker image
  • Refactor code and make it mode modular, remove duplicate code, etc
  • Add support for Windows
  • Add support for more commands
  • Improve intent detection and NER
  • Explore sentence Paraphrasing to generate higher-quality training data
  • Gather real-world variable names, library names as opposed to randomly generating them
  • Try NER with a transformer-based model
  • With enough data, train a language model to directly do English->code like GPT-3 does, instead of having separate stages in the pipeline
  • Create a survey to collect linguistic data
  • Add Speech2Code support

Authored By:

Owner
DeepKlarity
DeepKlarity
Read images to numpy arrays

mahotas-imread: Read Image Files IO with images and numpy arrays. Mahotas-imread is a simple module with a small number of functions: imread Reads an

Luis Pedro Coelho 67 Jan 07, 2023
A ninja python package that unifies the Google Earth Engine ecosystem.

A Python package that unifies the Google Earth Engine ecosystem. EarthEngine.jl | rgee | rgee+ | eemont GitHub: https://github.com/r-earthengine/ee_ex

47 Dec 27, 2022
Read and write rasters in parallel using Rasterio and Dask

dask-rasterio dask-rasterio provides some methods for reading and writing rasters in parallel using Rasterio and Dask arrays. Usage Read a multiband r

Dymaxion Labs 85 Aug 30, 2022
A python package that extends Google Earth Engine.

A python package that extends Google Earth Engine GitHub: https://github.com/davemlz/eemont Documentation: https://eemont.readthedocs.io/ PyPI: https:

David Montero Loaiza 307 Jan 01, 2023
Simulation and Parameter Estimation in Geophysics

Simulation and Parameter Estimation in Geophysics - A python package for simulation and gradient based parameter estimation in the context of geophysical applications.

SimPEG 390 Dec 15, 2022
A multi-page streamlit app for the geospatial community.

A multi-page streamlit app for the geospatial community.

Qiusheng Wu 522 Dec 30, 2022
Program that shows all the details of the given IP address. Build with Python and ipinfo.io API

ip-details This is a program that shows all the details of the given IP address. Build with Python and ipinfo.io API Usage To use this program, run th

4 Mar 01, 2022
Interactive Maps with Geopandas

Create Interactive maps 🗺️ with your geodataframe Geopatra extends geopandas for interactive mapping and attempts to wrap the goodness of amazing map

sangarshanan 46 Aug 16, 2022
A trivia questions about Europe

EUROPE TRIVIA QUIZ IN PYTHON Project Outline Ask user if he / she knows more about Europe. If yes show the Trivia main screen, else show the end Trivi

David Danso 1 Nov 17, 2021
Python module and script to interact with the Tractive GPS tracker.

pyTractive GPS Python module and script to interact with the Tractive GPS tracker. Requirements Python 3 geopy folium pandas pillow usage: main.py [-h

Dr. Usman Kayani 3 Nov 16, 2022
ProjPicker (projection picker) is a Python module that allows the user to select all coordinate reference systems (CRSs)

ProjPicker ProjPicker (projection picker) is a Python module that allows the user to select all coordinate reference systems (CRSs) whose extent compl

Huidae Cho 4 Feb 06, 2022
Solving the Traveling Salesman Problem using Self-Organizing Maps

Solving the Traveling Salesman Problem using Self-Organizing Maps This repository contains an implementation of a Self Organizing Map that can be used

Diego Vicente 3.1k Dec 31, 2022
A part of HyRiver software stack for handling geospatial data manipulations

Package Description Status PyNHD Navigate and subset NHDPlus (MR and HR) using web services Py3DEP Access topographic data through National Map's 3DEP

Taher Chegini 5 Dec 14, 2022
Streamlit Component for rendering Folium maps

streamlit-folium This Streamlit Component is a work-in-progress to determine what functionality is desirable for a Folium and Streamlit integration. C

Randy Zwitch 224 Dec 30, 2022
Tools for the extraction of OpenStreetMap street network data

OSMnet Tools for the extraction of OpenStreetMap (OSM) street network data. Intended to be used in tandem with Pandana and UrbanAccess libraries to ex

Urban Data Science Toolkit 47 Sep 21, 2022
WhiteboxTools Python Frontend

whitebox-python Important Note This repository is related to the WhiteboxTools Python Frontend only. You can report issues to this repo if you have pr

Qiusheng Wu 304 Dec 15, 2022
A package to fetch sentinel 2 Satellite data from Google.

Sentinel 2 Data Fetcher Installation Create a Virtual Environment and activate it. python3 -m venv venv . venv/bin/activate Install the Package via pi

1 Nov 18, 2021
iNaturalist observations along hiking trails

This tool reads the route of a hike and generates a table of iNaturalist observations along the trails. It also shows the observations and the route of the hike on a map. Moreover, it saves waypoints

7 Nov 11, 2022
Get Landsat surface reflectance time-series from google earth engine

geextract Google Earth Engine data extraction tool. Quickly obtain Landsat multispectral time-series for exploratory analysis and algorithm testing On

Loïc Dutrieux 50 Dec 15, 2022
This is the antenna performance plotted from tinyGS reception data.

tinyGS-antenna-map This is the antenna performance plotted from tinyGS reception data. See their repository. The code produces a plot that provides Az

Martin J. Levy 14 Aug 21, 2022