PASSL包含 SimCLR,MoCo,BYOL,CLIP等基于对比学习的图像自监督算法以及 Vision-Transformer,Swin-Transformer,BEiT,CVT,T2T,MLP_Mixer等视觉Transformer算法

Overview

PASSL

Introduction

PASSL is a Paddle based vision library for state-of-the-art Self-Supervised Learning research with PaddlePaddle. PASSL aims to accelerate research cycle in self-supervised learning: from designing a new self-supervised task to evaluating the learned representations.

  • Reproducible implementation of SOTA in Self-Supervision: Existing SOTA in Self-Supervision are implemented - SimCLR, MoCo(v1),MoCo(v2), MoCo-BYOL, CLIP. BYOL is coming soon. Also supports supervised trainings.
  • Modular: Easy to build new tasks and reuse the existing components from other tasks (Trainer, models and heads, data transforms, etc.).

Installation

Implemented Models

Benchmark Linear Image Classification on ImageNet-1K

epochs official results passl results Backbone Model
MoCo 200 60.6 60.64 ResNet-50 download
SimCLR 100 64.5 65.3 ResNet-50 download
MoCo v2 200 67.7 67.72 ResNet-50 download
MoCo-BYOL 300 71.56 72.10 ResNet-50 download
BYOL 300 72.50 71.62 ResNet-50 download

Getting Started

Please see GETTING_STARTED.md for the basic usage of PASSL.

Tutorials

Comments
  • MLP-Mixer: An all-MLP Architecture for Vision

    MLP-Mixer: An all-MLP Architecture for Vision

    readme文件里的两个模型的TOP1 是不是写反了?模型大的准确度比模型小的准确度小一些?

    Arch | Weight | Top-1 Acc | Top-5 Acc | Crop ratio | # Params -- | -- | -- | -- | -- | -- mlp_mixer_b16_224 | pretrain 1k | 76.60 | 92.23 | 0.875 | 60.0M mlp_mixer_l16_224 | pretrain 1k | 72.06 | 87.67 | 0.875 | 208.2M

    opened by gaorui999 3
  • 我很关注图像分类的自监督进展

    我很关注图像分类的自监督进展

    小弟想问问,对于图像分类的自监督,目前是什么进展呢?比如猫狗分类这种典型的二分类准确率如何?imagenet1k分类准确率如何?PASSL里面的关于图像分类的自监督算法或者模型,有哪些?能给个例子,让我知道如何使用吗?目前看到PASSLissues才1条,文档完全没看到.方便加个微信或者QQ聊几句吗?小弟对于图像分类的自监督高度重视.还有一个疑问,关于图像分类的自监督模型,是不是我给一堆图片,模型运行后,就会把图片归类呢?我需不需要给出类别的数量呢?说白了,我想知道图像分类的自监督的一个使用流程.现在都1.0了,该有点用处了吧.如果一个模型运行后,图像就分好类了,归纳为N类,我有什么办法判断分类的正确性呢?这方面有算法吗? 提了很多问题,跪求每个问题都回答一下,谢谢大佬.

    opened by yuwoyizhan 2
  • Unintended behavior in clip_logit_scale

    Unintended behavior in clip_logit_scale

    https://github.com/PaddlePaddle/PASSL/blob/83c49e6a5ba3444cee7f054122559d7759152764/passl/modeling/backbones/clip.py#L317

    check this issue for reference https://github.com/PaddlePaddle/Paddle/issues/43710

    Suggested approach (with non-public API)

    logit_scale_buffer = self.logit_scale.clip(-4.6, 4.6)
    logit_scale_buffer._share_buffer_to(self.logit_scale)
    
    opened by minogame 1
  • 建议

    建议

    1.passl很多文字都是英文的,包括快速使用等文档,希望可以提供中文文档. 2.希望知道图像分类自监督学习的技术研究目前到达什么程度了.比如猫狗这种二分类准确率如何,imagenet准确率如何,使用passl进行图像分类,需要给类别总数量吗? 3.能加个QQ或者微信聊几句吗?有些疑问,拜托了,大佬. QQ:1226194560 微信:18820785964

    opened by yuwoyizhan 1
  • fix bug of mixup for DeiT

    fix bug of mixup for DeiT

    DeiT/B-16 pretrained on ImageNet1K:

    [01/21 02:54:46] passl.engine.trainer INFO: Validate Epoch [290] acc1 (81.336), acc5 (95.544)
    [01/21 03:02:31] passl.engine.trainer INFO: Validate Epoch [291] acc1 (81.328), acc5 (95.580)
    [01/21 03:10:20] passl.engine.trainer INFO: Validate Epoch [292] acc1 (81.390), acc5 (95.608)
    [01/21 03:18:10] passl.engine.trainer INFO: Validate Epoch [293] acc1 (81.484), acc5 (95.636)
    [01/21 03:26:00] passl.engine.trainer INFO: Validate Epoch [294] acc1 (81.452), acc5 (95.600)
    [01/21 03:33:52] passl.engine.trainer INFO: Validate Epoch [295] acc1 (81.354), acc5 (95.528)
    [01/21 03:41:38] passl.engine.trainer INFO: Validate Epoch [296] acc1 (81.338), acc5 (95.562)
    [01/21 03:49:25] passl.engine.trainer INFO: Validate Epoch [297] acc1 (81.344), acc5 (95.542)
    [01/21 03:57:15] passl.engine.trainer INFO: Validate Epoch [298] acc1 (81.476), acc5 (95.550)
    [01/21 04:05:03] passl.engine.trainer INFO: Validate Epoch [299] acc1 (81.476), acc5 (95.572)
    [01/21 04:12:51] passl.engine.trainer INFO: Validate Epoch [300] acc1 (81.386), acc5 (95.536)
    
    opened by GuoxiaWang 1
  • BYOL的预训练中好像使用了gt_label?

    BYOL的预训练中好像使用了gt_label?

    • 在byol的config 中设置了 num_classes=1000: https://github.com/PaddlePaddle/PASSL/blob/9d7a9fd4af41772e29120553dddab1c162e4cb70/configs/byol/byol_r50_IM.yaml#L34
    • 在model中设置了self.classifier = nn.Linear(embedding_dim, num_classes),并且forward中将classif_out和label一起传给了head

    image

    https://github.com/PaddlePaddle/PASSL/blob/9d7a9fd4af41772e29120553dddab1c162e4cb70/passl/modeling/architectures/BYOL.py#L263

    • 在L2 Head中将对比loss和有监督的CE loss加在了一起返回

    image

    https://github.com/PaddlePaddle/PASSL/blob/9d7a9fd4af41772e29120553dddab1c162e4cb70/passl/modeling/heads/l2_head.py#L43

    opened by youqingxiaozhua 0
  • [飞桨论文复现挑战赛(第六期)] (85) Emerging Properties in Self-Supervised Vision Transformers

    [飞桨论文复现挑战赛(第六期)] (85) Emerging Properties in Self-Supervised Vision Transformers

    PR types

    New features

    PR changes

    APIs

    Describe

    • Task: https://github.com/PaddlePaddle/Paddle/issues/41482
    • 添加 passl.model.architectures.dino

    Peformance

    | Model | Official | Passl | | ---- | ---- | ---- | | DINO | 74.0 | 73.6 |

    • [x] 预训练和linear probe代码
    • [ ] 预训练和linear probe权重
    • [ ] 文档
    • [ ] TIPC
    opened by fuqianya 0
Releases(v1.0.0)
  • v1.0.0(Feb 24, 2022)

    • 新增 XCiT 视觉 Transformer 模型 xcit_nano_12_p8_224 蒸馏模型训练指标对齐,感谢 @BrilliantYuKaimin 的高质量贡献 🎉 🎉 🎉

    PASSL飞桨自监督领域核心学习库,提供大量高精度的视觉自监督模型、视觉 Transformer 模型,并支持超大视觉模型分布式训练功能,旨在提升飞桨开发者在自监督领域建模效率,并提供基于飞桨框架2.2的超大视觉模型领域最佳实践

    Source code(tar.gz)
    Source code(zip)
Simple reimplemetation experiments about FcaNet

FcaNet-CIFAR An implementation of the paper FcaNet: Frequency Channel Attention Networks on CIFAR10/CIFAR100 dataset. how to run Code: python Cifar.py

76 Feb 04, 2021
[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression

Delving into Deep Imbalanced Regression This repository contains the implementation code for paper: Delving into Deep Imbalanced Regression Yuzhe Yang

Yuzhe Yang 568 Dec 30, 2022
AQP is a modular pipeline built to enable the comparison and testing of different quality metric configurations.

Audio Quality Platform - AQP An Open Modular Python Platform for Objective Speech and Audio Quality Metrics AQP is a highly modular pipeline designed

Jack Geraghty 24 Oct 01, 2022
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

AST: Audio Spectrogram Transformer Introduction Citing Getting Started ESC-50 Recipe Speechcommands Recipe AudioSet Recipe Pretrained Models Contact I

Yuan Gong 603 Jan 07, 2023
Research using Cirq!

ReCirq Research using Cirq! This project contains modules for running quantum computing applications and experiments through Cirq and Quantum Engine.

quantumlib 230 Dec 29, 2022
we propose a novel deep network, named feature aggregation and refinement network (FARNet), for the automatic detection of anatomical landmarks.

Feature Aggregation and Refinement Network for 2D Anatomical Landmark Detection Overview Localization of anatomical landmarks is essential for clinica

aoyueyuan 0 Aug 28, 2022
A module that used for encrypt code which includes RSA and AES

软件加密模块 requirement: Crypto,pycryptodome,pyqt5 本地加密信息为随机字符串 使用说明 命令行参数 -h 帮助 -checkWorking 检查是否能正常工作,后接1确认指令 -checkEndDate 检查截至日期,后接1确认指令 -activateCode

2 Sep 27, 2022
Hypercomplex Neural Networks with PyTorch

HyperNets Hypercomplex Neural Networks with PyTorch: this repository would be a container for hypercomplex neural network modules to facilitate resear

Eleonora Grassucci 21 Dec 27, 2022
RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

RETRO - Pytorch (wip) Implementation of RETRO, Deepmind's Retrieval based Attent

Phil Wang 556 Jan 04, 2023
PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.

End-to-End Coreference Resolution with Different Higher-Order Inference Methods This repository contains the implementation of the paper: Revealing th

Liyan 52 Jan 04, 2023
MAVE: : A Product Dataset for Multi-source Attribute Value Extraction

The dataset contains 3 million attribute-value annotations across 1257 unique categories on 2.2 million cleaned Amazon product profiles. It is a large, multi-sourced, diverse dataset for product attr

Google Research Datasets 89 Jan 08, 2023
pytorch bert intent classification and slot filling

pytorch_bert_intent_classification_and_slot_filling 基于pytorch的中文意图识别和槽位填充 说明 基本思路就是:分类+序列标注(命名实体识别)同时训练。 使用的预训练模型:hugging face上的chinese-bert-wwm-ext 依

西西嘛呦 33 Dec 15, 2022
Linear image-to-image translation

Linear (Un)supervised Image-to-Image Translation Examples for linear orthogonal transformations in PCA domain, learned without pairing supervision. Tr

Eitan Richardson 40 Aug 31, 2022
VideoGPT: Video Generation using VQ-VAE and Transformers

VideoGPT: Video Generation using VQ-VAE and Transformers [Paper][Website][Colab][Gradio Demo] We present VideoGPT: a conceptually simple architecture

Wilson Yan 470 Dec 30, 2022
Source code for "Pack Together: Entity and Relation Extraction with Levitated Marker"

PL-Marker Source code for Pack Together: Entity and Relation Extraction with Levitated Marker. Quick links Overview Setup Install Dependencies Data Pr

THUNLP 173 Dec 30, 2022
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

Swin Transformer 1.4k Dec 30, 2022
Repository of 3D Object Detection with Pointformer (CVPR2021)

3D Object Detection with Pointformer This repository contains the code for the paper 3D Object Detection with Pointformer (CVPR 2021) [arXiv]. This wo

Zhuofan Xia 117 Jan 06, 2023
COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset

COPA-SSE Repository for COPA-SSE: Semi-Structured Explanations for Commonsense Reasoning. COPA-SSE contains crowdsourced explanations for the Balanced

Ana Brassard 5 Jul 31, 2022
An open source library for face detection in images. The face detection speed can reach 1000FPS.

libfacedetection This is an open source library for CNN-based face detection in images. The CNN model has been converted to static variables in C sour

Shiqi Yu 11.4k Dec 27, 2022
HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation Official PyTorch Implementation

: We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder. Furthermore, to allow maximal adaptivity, the w

Yuval Nirkin 182 Dec 14, 2022