Type4Py: Deep Similarity Learning-Based Type Inference for Python

Overview

This repository contains the implementation of Type4Py and instructions for reproducing the results of the paper.

Dataset

For Type4Py, we use the ManyTypes4Py dataset. You can download the latest version of the dataset here. Also, note that the dataset is already de-duplicated.

Code De-duplication

If you want to use your own dataset, it is essential to de-duplicate the dataset by using a tool like CD4Py.
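
To illustrate what de-duplication means in its simplest form, the sketch below groups Python files whose contents are byte-for-byte identical. It is not how CD4Py works and is not a replacement for it; the function name `find_exact_duplicates` is hypothetical, and exact hashing will miss files that differ only slightly.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root):
    """Group *.py files under `root` whose bytes are identical.

    Hypothetical helper for illustration only; it does not detect
    near-duplicates and is unrelated to CD4Py's implementation.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*.py")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_digest[digest].append(path)
    # Only groups with more than one file are duplicates.
    return [group for group in by_digest.values() if len(group) > 1]
```

Each returned group lists the paths that share the same content, so keeping one file per group removes the exact copies.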

Installation Guide

Requirements

  • Linux-based OS
  • Python 3.5 or newer
  • An NVIDIA GPU with CUDA support
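
A quick way to sanity-check these requirements before installing is sketched below. The function name `check_environment` and its result keys are hypothetical, and finding `nvidia-smi` on the `PATH` is only a rough hint that an NVIDIA driver is installed; it does not prove that CUDA works.

```python
import platform
import shutil
import sys


def check_environment():
    """Return a rough yes/no report for the three requirements listed above."""
    return {
        "linux": platform.system() == "Linux",
        "python_3_5_plus": sys.version_info >= (3, 5),
        # Heuristic only: the NVIDIA driver normally ships the nvidia-smi tool.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }
```

Calling `check_environment()` returns a dictionary of booleans, which can be printed or inspected before running `pip install .`.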

Quick Install

git clone https://github.com/saltudelft/type4py.git && cd type4py
pip install .

Usage Guide

Follow the steps below to train and evaluate the Type4Py model.

1. Extraction

NOTE: Skip this step if you're using the ManyTypes4Py dataset.

$ type4py extract --c $DATA_PATH --o $OUTPUT_DIR --d $DUP_FILES --w $CORES

Description:

  • $DATA_PATH: The path to the Python corpus or dataset.
  • $OUTPUT_DIR: The path to store processed projects.
  • $DUP_FILES: The path to the duplicate files, i.e., the *.jsonl.gz file produced by CD4Py. [Optional]
  • $CORES: Number of CPU cores to use for processing projects.
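
If you want to inspect the `*.jsonl.gz` file before passing it as `$DUP_FILES`, a gzipped JSON Lines file can be read with the standard library alone. The sketch below makes no assumption about what each record contains, since the record layout produced by CD4Py is not described here; the function name `read_jsonl_gz` is hypothetical.

```python
import gzip
import json


def read_jsonl_gz(path):
    """Yield one parsed JSON value per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `next(read_jsonl_gz(path))` returns the first record, which is usually enough to see how the file is structured.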

2. Preprocessing

$ type4py preprocess --o $OUTPUT_DIR --l $LIMIT

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects. For the ManyTypes4Py dataset, use the directory in which the dataset is extracted.
  • $LIMIT: The number of projects to be processed. [Optional]

3. Vectorizing

$ type4py vectorize --o $OUTPUT_DIR

Description:

  • $OUTPUT_DIR: The path that was used in the previous step to store processed projects.

4. Learning

$ type4py learn --o $OUTPUT_DIR --c --p $PARAM_FILE

Description:

  • $OUTPUT_DIR: The path that was used in the previous step to store processed projects.

  • --c: Trains the complete model. Use type4py learn -h to see other configurations.

  • --p $PARAM_FILE: The path to user-provided hyper-parameters for the model. See this file as an example. [Optional]

5. Testing

$ type4py predict --o $OUTPUT_DIR --c

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects.
  • --c: Predicts using the complete model. Use type4py predict -h to see other configurations.

6. Evaluating

$ type4py eval --o $OUTPUT_DIR --t c --tp 10

Description:

  • $OUTPUT_DIR: The path that was used in the first step to store processed projects.
  • --t: Evaluates the model considering different prediction tasks. E.g., --t c considers all prediction tasks, i.e., parameters, return, and variables. [Default: c]
  • --tp 10: Considers Top-10 predictions for evaluation. For this argument, you can choose a positive integer between 1 and 10. [Default: 10]
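
To make the idea of Top-n evaluation concrete, the sketch below counts a sample as correct when its true type appears among the first n ranked predictions. This is an illustration only, not Type4Py's evaluation code: the function name `top_n_match_rate` is hypothetical, and comparing types by exact string equality is an assumption.

```python
def top_n_match_rate(predictions, ground_truth, n=10):
    """Fraction of samples whose true type is among the first n ranked predictions.

    `predictions` is a list of ranked type lists (best guess first) and
    `ground_truth` is the list of true types, in the same order.
    """
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    # Assumption: two types match only when their strings are identical.
    hits = sum(truth in ranked[:n] for ranked, truth in zip(predictions, ground_truth))
    return hits / len(ground_truth)
```

A larger n can only keep the rate the same or raise it, because every prediction list considered for a smaller n is a prefix of the one considered for a larger n.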

Use type4py eval -h to see other options.
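
Steps 2 to 6 above can be chained from a small script. The sketch below only assembles the commands exactly as they are written in this guide and, by default, prints them without running anything. The function names `build_pipeline` and `run_pipeline` are hypothetical and not part of Type4Py, and the script assumes that the `type4py` command is on your `PATH` when `dry_run` is disabled.

```python
import shlex
import subprocess


def build_pipeline(output_dir, limit=None, param_file=None, top_n=10):
    """Return argument lists for steps 2-6, using only flags shown in this guide."""
    preprocess = ["type4py", "preprocess", "--o", output_dir]
    if limit is not None:
        preprocess += ["--l", str(limit)]  # optional project limit
    learn = ["type4py", "learn", "--o", output_dir, "--c"]
    if param_file is not None:
        learn += ["--p", param_file]  # optional hyper-parameter file
    return [
        preprocess,
        ["type4py", "vectorize", "--o", output_dir],
        learn,
        ["type4py", "predict", "--o", output_dir, "--c"],
        ["type4py", "eval", "--o", output_dir, "--t", "c", "--tp", str(top_n)],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for argv in commands:
        print("$ " + " ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

Calling `run_pipeline(build_pipeline(output_dir))` with your own output directory prints the five commands in order; pass `dry_run=False` to execute them one after another.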

Converting Type4Py to ONNX

To convert the pre-trained Type4Py model to the ONNX format, use the following command:

$ type4py to_onnx --o $OUTPUT_DIR

Description:

  • $OUTPUT_DIR: The path that was used in the usage section to store processed projects and the model.

VSCode Extension

Type4Py can be used in VSCode through an extension that provides ML-based type auto-completion for Python files. The extension can be installed from the VS Marketplace here.

Type4Py Server

We host a Type4Py server that exposes a public API and powers the VSCode extension. If you would like to deploy the Type4Py server on your own machine, you can adapt the server code here. If you have questions about deployment, using the pre-trained Type4Py model, or training your own model, feel free to reach out by creating an issue.

Citing Type4Py

@article{mir2021type4py,
  title={Type4Py: Deep Similarity Learning-Based Type Inference for Python},
  author={Mir, Amir M and Latoskinas, Evaldas and Proksch, Sebastian and Gousios, Georgios},
  journal={arXiv preprint arXiv:2101.04470},
  year={2021}
}

Owner

Software Analytics Lab @ TU Delft