This repository contains the code used in the paper "Prompt-Based Multi-Modal Image Segmentation".

Last update: Dec 30, 2022



Prompt-Based Multi-Modal Image Segmentation


The system allows creating segmentation models without training, based on either:

An arbitrary text query
An image with a mask highlighting stuff or an object

Quick Start

The Quickstart.ipynb notebook provides the code for using a pre-trained CLIPSeg model. It can also be run interactively on MyBinder (please note that the VM does not use a GPU, so inference takes a few seconds).

Dependencies

This code base depends on pytorch, torchvision and clip (pip install git+https://github.com/openai/CLIP.git). Additional dependencies are hidden for double blind review.
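
Before running the notebook, it can help to confirm that the listed packages are importable. The following is a minimal sketch using only the Python standard library. The import names torch, torchvision and clip are assumed to match the packages named above, and the helper missing_modules is a hypothetical name, not part of this repository.

```python
import importlib.util

# Import names assumed for the packages listed above; "clip" is the module
# expected to be provided by the OpenAI CLIP repository.
REQUIRED_MODULES = ("torch", "torchvision", "clip")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check only looks up the modules without importing them, so it stays fast even when pytorch is installed.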

Datasets

PhraseCut and PhraseCutPlus: Referring expression dataset
PFEPascalWrapper: Wrapper class for PFENet's Pascal-5i implementation
PascalZeroShot: Wrapper class for PascalZeroShot
COCOWrapper: Wrapper class for COCO.

Models

CLIPDensePredT: CLIPSeg model with transformer-based decoder.
ViTDensePredT: CLIPSeg model with transformer-based decoder.

Third Party Dependencies

Some of the datasets require third-party dependencies. Run the following commands in the third_party folder.

git clone https://github.com/cvlab-yonsei/JoEm
git clone https://github.com/Jia-Research-Lab/PFENet.git
git clone https://github.com/ChenyunWu/PhraseCutDataset.git
git clone https://github.com/juhongm999/hsnet.git
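
The clone commands above can also be scripted, which makes the setup repeatable. The sketch below is one possible way to do this and is not part of the repository: the helper names repo_dir_name and clone_third_party and the dry_run flag are hypothetical, and the script assumes git is available on the PATH.

```python
import subprocess
from pathlib import Path

# Repository URLs copied from the commands above.
THIRD_PARTY_REPOS = [
    "https://github.com/cvlab-yonsei/JoEm",
    "https://github.com/Jia-Research-Lab/PFENet.git",
    "https://github.com/ChenyunWu/PhraseCutDataset.git",
    "https://github.com/juhongm999/hsnet.git",
]


def repo_dir_name(url):
    """Directory name git uses by default: last URL component minus '.git'."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_third_party(target="third_party", dry_run=False):
    """Clone each repository into `target`, skipping ones already present.

    Returns the list of git commands that were (or, with dry_run, would be) run.
    """
    target_path = Path(target)
    commands = []
    for url in THIRD_PARTY_REPOS:
        if (target_path / repo_dir_name(url)).exists():
            continue  # already cloned
        cmd = ["git", "clone", url]
        commands.append(cmd)
        if not dry_run:
            target_path.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, cwd=target_path, check=True)
    return commands


if __name__ == "__main__":
    # Show what would be cloned without touching the network.
    for cmd in clone_third_party(dry_run=True):
        print(" ".join(cmd))
```

Calling clone_third_party() without dry_run performs the actual clones. Repositories whose folder already exists are skipped, so the script is safe to re-run.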

Weights

CLIPSeg-D64 (4.1MB, without CLIP weights)
CLIPSeg-D16 (1.1MB, without CLIP weights)

Training

See the experiment folder for yaml definitions of the training configurations. The training code is in experiment_setup.py.

Usage of PFENet Wrappers

In order to use the dataset and model wrappers for PFENet, the PFENet repository needs to be cloned to the root folder.

git clone https://github.com/Jia-Research-Lab/PFENet.git

Citation

@article{lueddecke21,
    title={Prompt-Based Multi-Modal Image Segmentation},
    author={Timo Lüddecke and Alexander Ecker},
    journal={arXiv preprint arXiv:2112.10003},
    year={2021}
}


Owner

Timo Lüddecke
