Type4Py: Deep Similarity Learning-Based Type Inference for Python

Last update: Dec 15, 2022

Overview

This repository contains the implementation of Type4Py and instructions for reproducing the results of the paper.

Dataset
Installation Guide
Usage Guide
Converting Type4Py to ONNX
VSCode Extension
Type4Py Server
Citing Type4Py

Dataset

For Type4Py, we use the ManyTypes4Py dataset. You can download the latest version of the dataset here. Note that the dataset is already de-duplicated.

Code De-duplication

If you want to use your own dataset, it is essential to de-duplicate the dataset by using a tool like CD4Py.
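To illustrate what de-duplication means here, the following is a minimal sketch that groups byte-identical Python files by content hash. It is not how CD4Py works and does not produce CD4Py's *.jsonl.gz output; the function name `find_exact_duplicates` is hypothetical, and exact hashing misses files that differ only slightly. A dedicated tool such as CD4Py is still the recommended route.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def find_exact_duplicates(root: str) -> Dict[str, List[str]]:
    """Group .py files under `root` whose contents are byte-identical.

    Hypothetical helper for illustration only; it detects exact copies,
    not near-duplicates.
    """
    groups: Dict[str, List[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups.setdefault(digest, []).append(str(path))
    # Keep only the hashes that are shared by more than one file.
    return {h: files for h, files in groups.items() if len(files) > 1}


if __name__ == "__main__":
    # "." is a placeholder; point this at your own corpus.
    for digest, files in find_exact_duplicates(".").items():
        print(digest[:12], files)
```

Each returned group lists files with identical contents, so keeping one file per group removes the exact copies.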

Installation Guide

Requirements

Linux-based OS
Python 3.5 or newer
An NVIDIA GPU with CUDA support
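As a quick sanity check before installing, the requirements above can be probed with a few standard-library calls. This is a rough sketch, not part of Type4Py: the function name `check_requirements` is hypothetical, and finding `nvidia-smi` on the PATH is only a proxy for a working NVIDIA GPU with CUDA support.

```python
import platform
import shutil
import sys
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report whether the listed requirements appear to be met."""
    return {
        "linux": platform.system() == "Linux",
        "python_3_5_or_newer": sys.version_info >= (3, 5),
        # Assumption: an installed NVIDIA driver ships the nvidia-smi tool.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print("{}: {}".format(name, "ok" if ok else "missing"))
```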

Quick Install

git clone https://github.com/saltudelft/type4py.git && cd type4py
pip install .

Usage Guide

Follow the steps below to train and evaluate the Type4Py model.

1. Extraction

NOTE: Skip this step if you're using the ManyTypes4Py dataset.

$ type4py extract --c $DATA_PATH --o $OUTPUT_DIR --d $DUP_FILES --w $CORES

Description:

$DATA_PATH: The path to the Python corpus or dataset.
$OUTPUT_DIR: The path to store processed projects.
$DUP_FILES: The path to the duplicate files, i.e., the *.jsonl.gz file produced by CD4Py. [Optional]
$CORES: Number of CPU cores to use for processing projects.

2. Preprocessing

$ type4py preprocess --o $OUTPUT_DIR --l $LIMIT

Description:

$OUTPUT_DIR: The path that was used in the first step to store processed projects. For the ManyTypes4Py dataset, use the directory in which the dataset is extracted.
$LIMIT: The number of projects to be processed. [Optional]

3. Vectorizing

$ type4py vectorize --o $OUTPUT_DIR

Description:

$OUTPUT_DIR: The path that was used in the previous step to store processed projects.

4. Learning

$ type4py learn --o $OUTPUT_DIR --c --p $PARAM_FILE

Description:

$OUTPUT_DIR: The path that was used in the previous step to store processed projects.
--c: Trains the complete model. Use type4py learn -h to see other configurations.
--p $PARAM_FILE: The path to user-provided hyper-parameters for the model. See this file as an example. [Optional]

5. Testing

$ type4py predict --o $OUTPUT_DIR --c

Description:

$OUTPUT_DIR: The path that was used in the first step to store processed projects.
--c: Predicts using the complete model. Use type4py predict -h to see other configurations.

6. Evaluating

$ type4py eval --o $OUTPUT_DIR --t c --tp 10

Description:

$OUTPUT_DIR: The path that was used in the first step to store processed projects.
--t: Evaluates the model on different prediction tasks. E.g., --t c considers all prediction tasks, i.e., parameters, return types, and variables. [Default: c]
--tp 10: Considers the Top-10 predictions for evaluation. For this argument, you can choose a positive integer between 1 and 10. [Default: 10]
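To make the Top-n idea concrete, the sketch below counts a prediction as correct when the true type appears among the first n ranked candidates. It is an illustration only: the function name `top_n_accuracy` is hypothetical, exact string matching is an assumption, and the criteria used by type4py eval may differ.

```python
from typing import Sequence


def top_n_accuracy(
    predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    n: int = 10,
) -> float:
    """Fraction of samples whose true type is in the first n candidates."""
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        return 0.0
    hits = sum(
        1 for ranked, truth in zip(predictions, truths) if truth in ranked[:n]
    )
    return hits / len(truths)


if __name__ == "__main__":
    # Made-up ranked candidates for three samples, best candidate first.
    preds = [["int", "str", "float"], ["str", "bytes"], ["List[int]", "list"]]
    truths = ["int", "bytes", "dict"]
    print(top_n_accuracy(preds, truths, n=1))  # only the first sample is a hit
    print(top_n_accuracy(preds, truths, n=3))  # the first two samples are hits
```

A larger n can only keep or raise the score, which is why Top-1 is the stricter measure.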

Use type4py eval -h to see other options.
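Steps 2 to 6 above can be chained from a small driver script. The sketch below only assembles the command lines, using the flags exactly as documented in this guide; the function name `build_pipeline` and the placeholder path are hypothetical, and the extraction step is left out because it can be skipped for the ManyTypes4Py dataset.

```python
import shlex
from typing import List, Optional


def build_pipeline(
    output_dir: str,
    param_file: Optional[str] = None,
    top_n: int = 10,
) -> List[List[str]]:
    """Return the type4py commands for steps 2-6 as argument lists."""
    learn = ["type4py", "learn", "--o", output_dir, "--c"]
    if param_file is not None:
        # --p is optional and points to user-provided hyper-parameters.
        learn += ["--p", param_file]
    return [
        ["type4py", "preprocess", "--o", output_dir],
        ["type4py", "vectorize", "--o", output_dir],
        learn,
        ["type4py", "predict", "--o", output_dir, "--c"],
        ["type4py", "eval", "--o", output_dir, "--t", "c", "--tp", str(top_n)],
    ]


if __name__ == "__main__":
    # "/path/to/dataset" is a placeholder for your own $OUTPUT_DIR.
    for cmd in build_pipeline("/path/to/dataset"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

To execute the steps rather than print them, each argument list can be passed to `subprocess.run(cmd, check=True)`, which stops the pipeline at the first failing step.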

Converting Type4Py to ONNX

To convert the pre-trained Type4Py model to the ONNX format, use the following command:

$ type4py to_onnx --o $OUTPUT_DIR

Description:

$OUTPUT_DIR: The path that was used in the usage section to store processed projects and the model.

VSCode Extension

Type4Py can be used in VSCode through an extension that provides ML-based type auto-completion for Python files. Type4Py's VSCode extension can be installed from the VS Marketplace here.

Type4Py Server

We host a Type4Py server that exposes a public API and powers the VSCode extension. If you would like to deploy the Type4Py server on your own machine, you can adapt the server code here. If you have questions about deployment, using the pre-trained Type4Py model, or training your own model, feel free to reach out to us by creating an issue.

Citing Type4Py

@article{mir2021type4py,
  title={Type4Py: Deep Similarity Learning-Based Type Inference for Python},
  author={Mir, Amir M and Latoskinas, Evaldas and Proksch, Sebastian and Gousios, Georgios},
  journal={arXiv preprint arXiv:2101.04470},
  year={2021}
}

Owner

Software Analytics Lab