Overview

Updates

  • (2021/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Pyramid Vision Transformer

The image is from Transformers: Revenge of the Fallen.

This repository contains the official implementation of PVTv1 & PVTv2 in image classification, object detection, and semantic segmentation tasks.

Model Zoo

Image Classification

For classification configs & weights, see >>>here<<<.

  • PVTv2 on ImageNet-1K
| Method          | Size | Acc@1 (%) | #Params (M) |
|-----------------|------|-----------|-------------|
| PVTv2-B0        | 224  | 70.5      | 3.7         |
| PVTv2-B1        | 224  | 78.7      | 14.0        |
| PVTv2-B2-Linear | 224  | 82.1      | 22.6        |
| PVTv2-B2        | 224  | 82.0      | 25.4        |
| PVTv2-B3        | 224  | 83.1      | 45.2        |
| PVTv2-B4        | 224  | 83.6      | 62.6        |
| PVTv2-B5        | 224  | 83.8      | 82.0        |
  • PVTv1 on ImageNet-1K
| Method     | Size | Acc@1 (%) | #Params (M) |
|------------|------|-----------|-------------|
| PVT-Tiny   | 224  | 75.1      | 13.2        |
| PVT-Small  | 224  | 79.8      | 24.5        |
| PVT-Medium | 224  | 81.2      | 44.2        |
| PVT-Large  | 224  | 81.7      | 61.4        |
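
A minimal usage sketch for the classification checkpoints (an illustration, not an official snippet): it assumes the PVTv2 models are registered with timm under names like pvt_v2_b2 (as in this repo's classification code), and the checkpoint filename is a placeholder for a file downloaded from the links above.

    import torch
    import timm  # assumes pvt_v2_* models are registered (e.g. via classification/pvt_v2.py)

    # Build the architecture only; weights come from the downloaded checkpoint below.
    model = timm.create_model('pvt_v2_b2', pretrained=False, num_classes=1000)

    # Placeholder path to a checkpoint downloaded from the model zoo.
    state_dict = torch.load('pvt_v2_b2.pth', map_location='cpu')
    if 'model' in state_dict:            # some releases wrap the weights in a 'model' key
        state_dict = state_dict['model']
    model.load_state_dict(state_dict, strict=False)
    model.eval()

    # Dummy 224x224 input, matching the Size column above.
    logits = model(torch.randn(1, 3, 224, 224))
    print(logits.shape)  # torch.Size([1, 1000])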

Object Detection

For detection configs & weights, see >>>here<<<.

  • PVTv2 on COCO

Baseline Detectors

| Method     | Backbone | Pretrain    | Lr schd | Aug | box AP | mask AP |
|------------|----------|-------------|---------|-----|--------|---------|
| RetinaNet  | PVTv2-b0 | ImageNet-1K | 1x      | No  | 37.2   | -       |
| RetinaNet  | PVTv2-b1 | ImageNet-1K | 1x      | No  | 41.2   | -       |
| RetinaNet  | PVTv2-b2 | ImageNet-1K | 1x      | No  | 44.6   | -       |
| RetinaNet  | PVTv2-b3 | ImageNet-1K | 1x      | No  | 45.9   | -       |
| RetinaNet  | PVTv2-b4 | ImageNet-1K | 1x      | No  | 46.1   | -       |
| RetinaNet  | PVTv2-b5 | ImageNet-1K | 1x      | No  | 46.2   | -       |
| Mask R-CNN | PVTv2-b0 | ImageNet-1K | 1x      | No  | 38.2   | 36.2    |
| Mask R-CNN | PVTv2-b1 | ImageNet-1K | 1x      | No  | 41.8   | 38.8    |
| Mask R-CNN | PVTv2-b2 | ImageNet-1K | 1x      | No  | 45.3   | 41.2    |
| Mask R-CNN | PVTv2-b3 | ImageNet-1K | 1x      | No  | 47.0   | 42.5    |
| Mask R-CNN | PVTv2-b4 | ImageNet-1K | 1x      | No  | 47.5   | 42.7    |
| Mask R-CNN | PVTv2-b5 | ImageNet-1K | 1x      | No  | 47.4   | 42.5    |

Advanced Detectors

| Method             | Backbone        | Pretrain    | Lr schd | Aug | box AP | mask AP |
|--------------------|-----------------|-------------|---------|-----|--------|---------|
| Cascade Mask R-CNN | PVTv2-b2-Linear | ImageNet-1K | 3x      | Yes | 50.9   | 44.0    |
| Cascade Mask R-CNN | PVTv2-b2        | ImageNet-1K | 3x      | Yes | 51.1   | 44.4    |
| ATSS               | PVTv2-b2-Linear | ImageNet-1K | 3x      | Yes | 48.9   | -       |
| ATSS               | PVTv2-b2        | ImageNet-1K | 3x      | Yes | 49.9   | -       |
| GFL                | PVTv2-b2-Linear | ImageNet-1K | 3x      | Yes | 49.2   | -       |
| GFL                | PVTv2-b2        | ImageNet-1K | 3x      | Yes | 50.2   | -       |
| Sparse R-CNN       | PVTv2-b2-Linear | ImageNet-1K | 3x      | Yes | 48.9   | -       |
| Sparse R-CNN       | PVTv2-b2        | ImageNet-1K | 3x      | Yes | 50.1   | -       |
  • PVTv1 on COCO
| Detector   | Backbone  | Pretrain    | Lr schd | box AP | mask AP |
|------------|-----------|-------------|---------|--------|---------|
| RetinaNet  | PVT-Tiny  | ImageNet-1K | 1x      | 36.7   | -       |
| RetinaNet  | PVT-Small | ImageNet-1K | 1x      | 40.4   | -       |
| Mask R-CNN | PVT-Tiny  | ImageNet-1K | 1x      | 36.7   | 35.1    |
| Mask R-CNN | PVT-Small | ImageNet-1K | 1x      | 40.4   | 37.8    |
| DETR       | PVT-Small | ImageNet-1K | 50ep    | 34.7   | -       |
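
A minimal inference sketch for the detection checkpoints, assuming the detection code follows MMDetection's standard API (which this repository builds on); the config name comes from this repo's configs, while the checkpoint and image paths are placeholders.

    from mmdet.apis import init_detector, inference_detector

    config = 'configs/retinanet_pvt_v2_b2_fpn_1x_coco.py'      # config from this repo
    checkpoint = 'retinanet_pvt_v2_b2_fpn_1x_coco.pth'         # placeholder for a model-zoo weight
    model = init_detector(config, checkpoint, device='cuda:0')
    result = inference_detector(model, 'demo.jpg')             # placeholder image
    model.show_result('demo.jpg', result, out_file='demo_out.jpg')  # write result instead of displaying it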

Semantic Segmentation

For segmentation configs & weights, see >>>here<<<.

  • PVTv1 on ADE20K
| Method       | Backbone   | Pretrain    | Iters | mIoU |
|--------------|------------|-------------|-------|------|
| Semantic FPN | PVT-Tiny   | ImageNet-1K | 40K   | 35.7 |
| Semantic FPN | PVT-Small  | ImageNet-1K | 40K   | 39.8 |
| Semantic FPN | PVT-Medium | ImageNet-1K | 40K   | 41.6 |
| Semantic FPN | PVT-Large  | ImageNet-1K | 40K   | 42.1 |
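
A matching inference sketch for the segmentation checkpoints, assuming the Semantic FPN experiments follow MMSegmentation's standard API; the config and file names are placeholders.

    from mmseg.apis import init_segmentor, inference_segmentor

    config = 'configs/sem_fpn/fpn_pvt_s_ade20k_40k.py'   # placeholder config name
    checkpoint = 'fpn_pvt_s_ade20k_40k.pth'              # placeholder for a model-zoo weight
    model = init_segmentor(config, checkpoint, device='cuda:0')
    result = inference_segmentor(model, 'demo.jpg')      # list with one per-pixel class-index map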

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Citation

If you use this code for a paper, please cite:

PVTv1

@misc{wang2021pyramid,
      title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2102.12122},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

PVTv2

@misc{wang2021pvtv2,
      title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2106.13797},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

This repo is currently maintained by Wenhai Wang (@whai362), Enze Xie (@xieenze), and Zhe Chen (@czczup).

Comments
  • Mask R-CNN configs

    Hi, thank you for your great work! We would like to compare your model with ours on the Mask R-CNN results. Could you provide the configs for the Mask R-CNN settings? Thanks!

    opened by xwjabc 10
  • semantic segmentation code

    Hi, thanks for your excellent work! I have read your paper 'Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions' and would like to apply it to semantic segmentation in my own work. When will you make the semantic segmentation code and models public?

    opened by hgmlu 8
  • About FLOPs calculation in Table 2

    Hi Wenhai, thanks for this great work.

    I have a few questions about the FLOPs calculation in this paper. Previously I tested the DeiT models with ptflops and got 2.51G, 9.20G, and 35.13G FLOPs for DeiT-Tiny, DeiT-Small, and DeiT-Base, respectively.

    By the way, I also included the matrix multiplications in the self-attention layer, namely q @ k and attn @ v. I assume there is something wrong with my calculation; may I know how you calculate FLOPs in your experiments?

    Thanks.
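
    A back-of-the-envelope sketch of the two attention matmuls mentioned above, counted as multiply-accumulates (this only illustrates the quantity being asked about, not the paper's counting script):

        def attention_matmul_macs(num_tokens, embed_dim, depth):
            """MACs of q @ k^T and attn @ v, summed over all transformer layers."""
            per_layer = 2 * num_tokens * num_tokens * embed_dim   # q@k^T plus attn@v
            return depth * per_layer

        # DeiT-Small: 197 tokens (14*14 patches + cls), width 384, 12 layers.
        macs = attention_matmul_macs(197, 384, 12)
        print(f"{macs / 1e9:.3f} GMACs")   # ~0.358 GMACs; some profilers report 2x this as FLOPs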

    opened by HubHop 6
  • tkinter.tclerror

    Thanks for your work. I ran demo.py and hit this problem; if I comment out model.show_result, I can obtain the result normally.

        Traceback (most recent call last):
          File "demo.py", line 62, in <module>
            main(args)
          File "demo.py", line 35, in main
            model.show_result(
          File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 327, in show_result
            img = imshow_det_bboxes(
          File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/mmdet/core/visualization/image.py", line 113, in imshow_det_bboxes
            fig = plt.figure(win_name, frameon=False)
          File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/pyplot.py", line 687, in figure
            figManager = new_figure_manager(num, figsize=figsize,
          File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/pyplot.py", line 315, in new_figure_manager
            return _backend_mod.new_figure_manager(*args, **kwargs)
          File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 3494, in new_figure_manager
            return cls.new_figure_manager_given_figure(num, fig)
          File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/site-packages/matplotlib/backends/_backend_tk.py", line 885, in new_figure_manager_given_figure
            window = tk.Tk(className="matplotlib")
          File "/home/test/anaconda3/envs/pytensorrt/lib/python3.8/tkinter/__init__.py", line 2261, in __init__
            self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
        _tkinter.TclError: couldn't connect to display "localhost:10.0"
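
    A possible workaround, assuming the machine has no display attached (a guess, not a confirmed fix): force a non-interactive matplotlib backend before any plotting code runs, or write the visualization to a file instead of showing it.

        import matplotlib
        matplotlib.use('Agg')   # must run before pyplot is imported anywhere

        # mmdet's show_result also accepts show/out_file arguments, which skip the GUI:
        # model.show_result(img, result, show=False, out_file='result.jpg')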

    opened by shengyuan-tang 4
  • How can I load the pickle file?

    Thanks for sharing the code. I'm trying to load a pickle file with these commands:

        import pickle
        infile = open('data.pkl', 'rb')
        new_dict = pickle.load(infile)
        infile.close()
        print(type(new_dict))

    but the error is _pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified. I searched for a solution and found that the pickle file appears to use advanced features and was probably never supposed to be loaded directly this way. Can you help, please?
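
    One likely explanation, assuming the file was produced by torch.save: such checkpoints embed pickle persistent ids for tensor storage, so they have to be read with torch.load rather than pickle.load, roughly:

        import torch

        # 'data.pkl' is the filename from the question above.
        new_dict = torch.load('data.pkl', map_location='cpu')
        print(type(new_dict))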

    opened by mathshangw 4
  • question for PVTv2: in the paper the reduction ratio is 7 in Linear SRA, but in the code  is sr_ratios=[8, 4, 2, 1]

    For PVTv2: in the paper the reduction ratio of Linear SRA is 7, but in the code it is sr_ratios=[8, 4, 2, 1].

    Is there something wrong with my understanding?
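
    A sketch of how the two numbers might relate, based on the paper's description rather than this repository's exact code: sr_ratios=[8, 4, 2, 1] is the per-stage key/value downsampling of ordinary SRA, while Linear SRA instead average-pools keys/values to a fixed 7x7 grid in every stage.

        import torch
        import torch.nn as nn

        x = torch.randn(1, 64, 56, 56)    # stage-1 feature map (B, C, H, W)

        # Ordinary SRA (per-stage sr_ratio): keys/values reduced by a strided projection,
        # sketched here with a stride-8 convolution for stage 1.
        sra = nn.Conv2d(64, 64, kernel_size=8, stride=8)
        print(sra(x).shape)               # torch.Size([1, 64, 7, 7])

        # Linear SRA (the paper's pooling size 7): keys/values average-pooled to a fixed
        # 7x7 grid, independent of the stage's input resolution.
        linear_sra = nn.AdaptiveAvgPool2d(7)
        print(linear_sra(x).shape)        # torch.Size([1, 64, 7, 7])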

    opened by StormArcher 3
  • Low mAP on coco val

    Hello, thanks for your work. I was trying to train the RetinaNet-FPN-PVTv2-B2-1x model on COCO 2017. The reported mAP on the val set is 44.6, but the result I got after training was only 33.5. Is there anything wrong?

    I trained on 8 V100 GPUs using your provided pre-trained model pvt_v2_b2.pth. Training script was: ./dist_train.sh configs/retinanet_pvt_v2_b2_fpn_1x_coco.py 8

    The config file was: model = dict( type='RetinaNet', pretrained='/opt/tiger/wanxingyu_tfm/pvt/pretrained/pvt_v2_b2.pth', backbone=dict( type='pvt_v2_b2', depth=50, num_stages=4, out_indices=(0, 1, 2, 3), frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=True), norm_eval=True, style='pytorch'), neck=dict( type='FPN', in_channels=[64, 128, 320, 512], out_channels=256, start_level=1, add_extra_convs='on_input', num_outs=5), bbox_head=dict( type='RetinaHead', num_classes=80, in_channels=256, stacked_convs=4, feat_channels=256, anchor_generator=dict( type='AnchorGenerator', octave_base_scale=4, scales_per_octave=3, ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64, 128]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=1.0)), train_cfg=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.4, min_pos_iou=0, ignore_iof_thr=-1), allowed_border=-1, pos_weight=-1, debug=False), test_cfg=dict( nms_pre=1000, min_bbox_size=0, score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100)) dataset_type = 'CocoDataset' data_root = '/opt/tiger/wanxingyu_tfm/datasets/coco/' img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type='CocoDataset', ann_file= '/opt/tiger/wanxingyu_tfm/datasets/coco/annotations/instances_train2017.json', img_prefix='/opt/tiger/wanxingyu_tfm/datasets/coco/train2017/', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']) ]), val=dict( type='CocoDataset', ann_file= '/opt/tiger/wanxingyu_tfm/datasets/coco/annotations/instances_val2017.json', img_prefix='/opt/tiger/wanxingyu_tfm/datasets/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]), test=dict( type='CocoDataset', ann_file= 
'/opt/tiger/wanxingyu_tfm/datasets/coco/annotations/instances_val2017.json', img_prefix='/opt/tiger/wanxingyu_tfm/datasets/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ])) evaluation = dict(interval=1, metric='bbox') optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) optimizer_config = dict(grad_clip=None) lr_config = dict( policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001, step=[8, 11]) runner = dict(type='EpochBasedRunner', max_epochs=12) checkpoint_config = dict(interval=1) log_config = dict(interval=5, hooks=[dict(type='TextLoggerHook')]) custom_hooks = [dict(type='NumClassCheckHook')] dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] work_dir = './work_dirs/retinanet_pvt_v2_b2_fpn_1x_coco' gpu_ids = range(0, 8)

    Test script was: ./dist_test.sh configs/retinanet_pvt_v2_b2_fpn_1x_coco.py work_dirs/retinanet_pvt_v2_b2_fpn_1x_coco/epoch_12.pth 8 --eval bbox

    The result i got: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.335 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.514 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.352 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.190 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.356 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.450 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.525 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.525 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.525 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.325 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.561 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.683 OrderedDict([('bbox_mAP', 0.335), ('bbox_mAP_50', 0.514), ('bbox_mAP_75', 0.352), ('bbox_mAP_s', 0.19), ('bbox_mAP_m', 0.356), ('bbox_mAP_l', 0.45), ('bbox_mAP_copypaste', '0.335 0.514 0.352 0.190 0.356 0.450')])

    opened by memorywxy 3
  • How can I get small_pvt.pth?

    I ran your main.py. I'm confused about what this class does; it gave me the accuracy and loss for 500 epochs, right? When I tried to train on my own images with the command 'dist_train.sh configs/retinanet_pvt_s_fpn_1x_coco_640.py 1',

    I got an error that small_pvt.pth was not found. Excuse me, will that file be the weights or a checkpoint?

    Is the small_pvt.pth here https://drive.google.com/file/d/1vtcyoU8KUqNzktlMGXZrYcMRsNNiVZFQ/view?usp=sharing for ImageNet? And how can I get a .pth file if the dataset is different? Appreciating your reply. Thanks

    opened by SamMohel 3
  • problems about loading pretrained model with pytorch version below 1.6

    PyTorch 1.6 switched torch.save to a zip file-based format by default, rather than the old pickle-based format. This means PyTorch versions below 1.6 cannot load the pretrained models at all.

    Could you pass _use_new_zipfile_serialization=False when calling torch.save(), e.g. torch.save(m.state_dict(), 'mymodel.pt', _use_new_zipfile_serialization=False), and provide another version of the pretrained models?

    Thanks a lot!!!!
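
    Alternatively, a checkpoint can be converted locally by anyone with a PyTorch >= 1.6 environment, along these lines (a sketch; the filenames are placeholders):

        import torch

        # Load a new-format (zip) checkpoint and re-save it in the legacy format so that
        # PyTorch < 1.6 can read it. Run once in a PyTorch >= 1.6 environment.
        state = torch.load('pvt_small.pth', map_location='cpu')
        torch.save(state, 'pvt_small_legacy.pth', _use_new_zipfile_serialization=False)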

    opened by WxWstranger 3
  • PVT Large doesn't converge

    Thanks for your great work. But when I trained PVT-Large (pvt_large) with your default settings, the model did not converge. The loss declined correctly for the first 37 epochs and the accuracy reached 57%, but the model went wrong at the 38th epoch. I used your code without any change. What could be the problem? Thank you!

    Below is a part of my training log.

    Test: Total time: 0:01:55 (0.4429 s / it)

    • [email protected] 57.009 [email protected] 81.174 loss 1.948 Accuracy of the network on the 50000 test images: 57.0% Max accuracy: 57.01% Epoch: [38] [ 0/1251] eta: 2:06:33 lr: 0.000963 loss: 4.9324 (4.9324) time: 6.0701 data: 3.6057 max mem: 25529 Epoch: [38] [ 10/1251] eta: 0:31:59 lr: 0.000963 loss: 4.5930 (4.5768) time: 1.5465 data: 0.3281 max mem: 25529 Epoch: [38] [ 20/1251] eta: 0:27:07 lr: 0.000963 loss: 4.6624 (4.6160) time: 1.0843 data: 0.0003 max mem: 25529 Epoch: [38] [ 30/1251] eta: 0:25:15 lr: 0.000963 loss: 4.7355 (4.5806) time: 1.0737 data: 0.0003 max mem: 25529 Epoch: [38] [ 40/1251] eta: 0:24:16 lr: 0.000963 loss: 4.6986 (4.5811) time: 1.0784 data: 0.0003 max mem: 25529 Epoch: [38] [ 50/1251] eta: 0:23:33 lr: 0.000963 loss: 4.6986 (4.5609) time: 1.0766 data: 0.0003 max mem: 25529 Epoch: [38] [ 60/1251] eta: 0:23:07 lr: 0.000963 loss: 4.7104 (4.5901) time: 1.0864 data: 0.0003 max mem: 25529 Epoch: [38] [ 70/1251] eta: 0:22:39 lr: 0.000963 loss: 4.8095 (4.6143) time: 1.0854 data: 0.0003 max mem: 25529 Epoch: [38] [ 80/1251] eta: 0:22:17 lr: 0.000963 loss: 4.7373 (4.5898) time: 1.0721 data: 0.0003 max mem: 25529 Epoch: [38] [ 90/1251] eta: 0:21:55 lr: 0.000963 loss: 4.4603 (4.5742) time: 1.0696 data: 0.0003 max mem: 25529 Epoch: [38] [ 100/1251] eta: 0:21:37 lr: 0.000963 loss: 4.5539 (4.5777) time: 1.0682 data: 0.0003 max mem: 25529 Epoch: [38] [ 110/1251] eta: 0:21:21 lr: 0.000963 loss: 4.9701 (4.5993) time: 1.0787 data: 0.0003 max mem: 25529 Epoch: [38] [ 120/1251] eta: 0:21:06 lr: 0.000963 loss: 4.9029 (4.5914) time: 1.0811 data: 0.0003 max mem: 25529 Epoch: [38] [ 130/1251] eta: 0:20:50 lr: 0.000963 loss: 4.7300 (4.5999) time: 1.0711 data: 0.0003 max mem: 25529 Epoch: [38] [ 140/1251] eta: 0:20:35 lr: 0.000963 loss: 4.7998 (4.5936) time: 1.0630 data: 0.0003 max mem: 25529 Epoch: [38] [ 150/1251] eta: 0:20:23 lr: 0.000963 loss: 4.8562 (4.5969) time: 1.0850 data: 0.0003 max mem: 25529 Epoch: [38] [ 160/1251] eta: 0:20:09 lr: 0.000963 loss: 4.8583 (4.5961) time: 1.0852 data: 0.0003 max mem: 25529 Epoch: [38] [ 170/1251] eta: 0:19:55 lr: 0.000963 loss: 4.8583 (4.6029) time: 1.0677 data: 0.0003 max mem: 25529 Epoch: [38] [ 180/1251] eta: 0:19:42 lr: 0.000963 loss: 5.0298 (4.6202) time: 1.0675 data: 0.0003 max mem: 25529 Epoch: [38] [ 190/1251] eta: 0:19:28 lr: 0.000963 loss: 4.8480 (4.6175) time: 1.0634 data: 0.0003 max mem: 25529 Epoch: [38] [ 200/1251] eta: 0:19:15 lr: 0.000963 loss: 4.6446 (4.6124) time: 1.0629 data: 0.0003 max mem: 25529 Epoch: [38] [ 210/1251] eta: 0:19:04 lr: 0.000963 loss: 4.8329 (4.6245) time: 1.0741 data: 0.0003 max mem: 25529 Epoch: [38] [ 220/1251] eta: 0:18:52 lr: 0.000963 loss: 4.9058 (4.6362) time: 1.0833 data: 0.0003 max mem: 25529 Epoch: [38] [ 230/1251] eta: 0:18:40 lr: 0.000963 loss: 4.7250 (4.6332) time: 1.0764 data: 0.0003 max mem: 25529 Epoch: [38] [ 240/1251] eta: 0:18:28 lr: 0.000963 loss: 4.6894 (4.6391) time: 1.0808 data: 0.0003 max mem: 25529 Epoch: [38] [ 250/1251] eta: 0:18:16 lr: 0.000963 loss: 4.8600 (4.6438) time: 1.0789 data: 0.0003 max mem: 25529 Epoch: [38] [ 260/1251] eta: 0:18:04 lr: 0.000963 loss: 4.9939 (4.6550) time: 1.0710 data: 0.0003 max mem: 25529 Epoch: [38] [ 270/1251] eta: 0:17:53 lr: 0.000963 loss: 4.7281 (4.6478) time: 1.0717 data: 0.0003 max mem: 25529 Epoch: [38] [ 280/1251] eta: 0:17:41 lr: 0.000963 loss: 4.3858 (4.6383) time: 1.0664 data: 0.0003 max mem: 25529 Epoch: [38] [ 290/1251] eta: 0:17:29 lr: 0.000963 loss: 4.5126 (4.6390) time: 1.0627 data: 0.0003 max mem: 25529 Epoch: [38] [ 300/1251] eta: 
0:17:17 lr: 0.000963 loss: 4.3964 (4.6302) time: 1.0638 data: 0.0003 max mem: 25529 Epoch: [38] [ 310/1251] eta: 0:17:05 lr: 0.000963 loss: 4.3964 (4.6284) time: 1.0683 data: 0.0003 max mem: 25529 Epoch: [38] [ 320/1251] eta: 0:16:54 lr: 0.000963 loss: 4.4917 (4.6220) time: 1.0689 data: 0.0003 max mem: 25529 Epoch: [38] [ 330/1251] eta: 0:16:42 lr: 0.000963 loss: 4.7606 (4.6335) time: 1.0695 data: 0.0003 max mem: 25529 Epoch: [38] [ 340/1251] eta: 0:16:31 lr: 0.000963 loss: 5.0333 (4.6346) time: 1.0699 data: 0.0003 max mem: 25529 Epoch: [38] [ 350/1251] eta: 0:16:20 lr: 0.000963 loss: 4.6795 (4.6276) time: 1.0700 data: 0.0003 max mem: 25529 Epoch: [38] [ 360/1251] eta: 0:16:08 lr: 0.000963 loss: 4.7723 (4.6305) time: 1.0728 data: 0.0003 max mem: 25529 Epoch: [38] [ 370/1251] eta: 0:15:57 lr: 0.000963 loss: 4.8322 (4.6305) time: 1.0767 data: 0.0003 max mem: 25529 Epoch: [38] [ 380/1251] eta: 0:15:46 lr: 0.000963 loss: 4.7535 (4.6310) time: 1.0725 data: 0.0003 max mem: 25529 Epoch: [38] [ 390/1251] eta: 0:15:35 lr: 0.000963 loss: 4.5236 (4.6247) time: 1.0746 data: 0.0003 max mem: 25529 Epoch: [38] [ 400/1251] eta: 0:15:24 lr: 0.000963 loss: 4.5129 (4.6280) time: 1.0783 data: 0.0003 max mem: 25529 Epoch: [38] [ 410/1251] eta: 0:15:13 lr: 0.000963 loss: 4.6520 (4.6250) time: 1.0803 data: 0.0003 max mem: 25529 Epoch: [38] [ 420/1251] eta: 0:15:02 lr: 0.000963 loss: 4.6115 (4.6235) time: 1.0841 data: 0.0003 max mem: 25529 Epoch: [38] [ 430/1251] eta: 0:14:51 lr: 0.000963 loss: 4.5550 (4.6176) time: 1.0788 data: 0.0003 max mem: 25529 Epoch: [38] [ 440/1251] eta: 0:14:40 lr: 0.000963 loss: 4.3985 (4.6097) time: 1.0745 data: 0.0003 max mem: 25529 Epoch: [38] [ 450/1251] eta: 0:14:29 lr: 0.000963 loss: 4.5041 (4.6144) time: 1.0711 data: 0.0004 max mem: 25529 Epoch: [38] [ 460/1251] eta: 0:14:18 lr: 0.000963 loss: 4.7949 (4.6127) time: 1.0769 data: 0.0003 max mem: 25529 Epoch: [38] [ 470/1251] eta: 0:14:07 lr: 0.000963 loss: 4.7556 (4.6148) time: 1.0773 data: 0.0003 max mem: 25529 Epoch: [38] [ 480/1251] eta: 0:13:56 lr: 0.000963 loss: 5.0523 (4.6200) time: 1.0845 data: 0.0003 max mem: 25529 Epoch: [38] [ 490/1251] eta: 0:13:45 lr: 0.000963 loss: 4.5865 (4.6152) time: 1.0781 data: 0.0003 max mem: 25529 Epoch: [38] [ 500/1251] eta: 0:13:34 lr: 0.000963 loss: 4.6311 (4.6210) time: 1.0776 data: 0.0003 max mem: 25529 Epoch: [38] [ 510/1251] eta: 0:13:23 lr: 0.000963 loss: 4.8767 (4.6208) time: 1.0855 data: 0.0003 max mem: 25529 Epoch: [38] [ 520/1251] eta: 0:13:13 lr: 0.000963 loss: 4.7439 (4.6204) time: 1.0891 data: 0.0003 max mem: 25529 Epoch: [38] [ 530/1251] eta: 0:13:02 lr: 0.000963 loss: 4.7974 (4.6190) time: 1.0813 data: 0.0003 max mem: 25529 Epoch: [38] [ 540/1251] eta: 0:12:51 lr: 0.000963 loss: 4.6865 (4.6171) time: 1.0676 data: 0.0003 max mem: 25529 Epoch: [38] [ 550/1251] eta: 0:12:40 lr: 0.000963 loss: 4.4560 (4.6144) time: 1.0727 data: 0.0003 max mem: 25529 Epoch: [38] [ 560/1251] eta: 0:12:29 lr: 0.000963 loss: 4.2302 (4.6069) time: 1.0761 data: 0.0003 max mem: 25529 Epoch: [38] [ 570/1251] eta: 0:12:18 lr: 0.000963 loss: 4.3246 (4.6080) time: 1.0741 data: 0.0003 max mem: 25529 Epoch: [38] [ 580/1251] eta: 0:12:07 lr: 0.000963 loss: 4.5513 (4.6052) time: 1.0661 data: 0.0003 max mem: 25529 Epoch: [38] [ 590/1251] eta: 0:11:56 lr: 0.000963 loss: 4.4924 (4.6075) time: 1.0740 data: 0.0003 max mem: 25529 Epoch: [38] [ 600/1251] eta: 0:11:45 lr: 0.000963 loss: 4.5949 (4.6052) time: 1.0817 data: 0.0003 max mem: 25529 Epoch: [38] [ 610/1251] eta: 0:11:34 lr: 0.000963 loss: 4.5321 (4.6035) time: 
1.0638 data: 0.0003 max mem: 25529 Epoch: [38] [ 620/1251] eta: 0:11:23 lr: 0.000963 loss: 4.7689 (4.6075) time: 1.0604 data: 0.0003 max mem: 25529 Epoch: [38] [ 630/1251] eta: 0:11:12 lr: 0.000963 loss: 4.7689 (4.6088) time: 1.0649 data: 0.0003 max mem: 25529 Epoch: [38] [ 640/1251] eta: 0:11:01 lr: 0.000963 loss: 4.4721 (4.6039) time: 1.0580 data: 0.0003 max mem: 25529 Epoch: [38] [ 650/1251] eta: 0:10:50 lr: 0.000963 loss: 4.5410 (4.6067) time: 1.0654 data: 0.0003 max mem: 25529 Epoch: [38] [ 660/1251] eta: 0:10:39 lr: 0.000963 loss: 4.5659 (4.5996) time: 1.0689 data: 0.0003 max mem: 25529 Epoch: [38] [ 670/1251] eta: 0:10:28 lr: 0.000963 loss: 4.4456 (4.5999) time: 1.0727 data: 0.0003 max mem: 25529 Epoch: [38] [ 680/1251] eta: 0:10:17 lr: 0.000963 loss: 4.8766 (4.6035) time: 1.0818 data: 0.0003 max mem: 25529 Epoch: [38] [ 690/1251] eta: 0:10:06 lr: 0.000963 loss: 4.8766 (4.6041) time: 1.0854 data: 0.0003 max mem: 25529 Epoch: [38] [ 700/1251] eta: 0:09:55 lr: 0.000963 loss: 4.9327 (4.6104) time: 1.0805 data: 0.0003 max mem: 25529 Epoch: [38] [ 710/1251] eta: 0:09:44 lr: 0.000963 loss: 5.0049 (4.6129) time: 1.0702 data: 0.0003 max mem: 25529 Epoch: [38] [ 720/1251] eta: 0:09:34 lr: 0.000963 loss: 4.6922 (4.6117) time: 1.0673 data: 0.0003 max mem: 25529 Epoch: [38] [ 730/1251] eta: 0:09:23 lr: 0.000963 loss: 4.6331 (4.6107) time: 1.0810 data: 0.0003 max mem: 25529 Epoch: [38] [ 740/1251] eta: 0:09:12 lr: 0.000963 loss: 4.5547 (4.6111) time: 1.0795 data: 0.0003 max mem: 25529 Epoch: [38] [ 750/1251] eta: 0:09:01 lr: 0.000963 loss: 4.8843 (4.6181) time: 1.0719 data: 0.0003 max mem: 25529 Epoch: [38] [ 760/1251] eta: 0:08:50 lr: 0.000963 loss: 4.8843 (4.6160) time: 1.0851 data: 0.0003 max mem: 25529 Epoch: [38] [ 770/1251] eta: 0:08:40 lr: 0.000963 loss: 4.2934 (4.6119) time: 1.0840 data: 0.0003 max mem: 25529 Epoch: [38] [ 780/1251] eta: 0:08:29 lr: 0.000963 loss: 4.1930 (4.6087) time: 1.0784 data: 0.0003 max mem: 25529 Epoch: [38] [ 790/1251] eta: 0:08:18 lr: 0.000963 loss: 4.4176 (4.6073) time: 1.0748 data: 0.0003 max mem: 25529 Epoch: [38] [ 800/1251] eta: 0:08:07 lr: 0.000963 loss: 4.7402 (4.6115) time: 1.0681 data: 0.0003 max mem: 25529 Epoch: [38] [ 810/1251] eta: 0:07:56 lr: 0.000963 loss: 4.7749 (4.6094) time: 1.0713 data: 0.0003 max mem: 25529 Epoch: [38] [ 820/1251] eta: 0:07:45 lr: 0.000963 loss: 4.6709 (4.6079) time: 1.0732 data: 0.0003 max mem: 25529 Epoch: [38] [ 830/1251] eta: 0:07:34 lr: 0.000963 loss: 4.7506 (4.6088) time: 1.0641 data: 0.0003 max mem: 25529 Epoch: [38] [ 840/1251] eta: 0:07:23 lr: 0.000963 loss: 4.8636 (4.6112) time: 1.0592 data: 0.0003 max mem: 25529 Epoch: [38] [ 850/1251] eta: 0:07:13 lr: 0.000963 loss: 4.9930 (4.6116) time: 1.0767 data: 0.0003 max mem: 25529 Epoch: [38] [ 860/1251] eta: 0:07:02 lr: 0.000963 loss: 5.0639 (4.6155) time: 1.0766 data: 0.0003 max mem: 25529 Epoch: [38] [ 870/1251] eta: 0:06:51 lr: 0.000963 loss: 5.0486 (4.6160) time: 1.0683 data: 0.0003 max mem: 25529 Epoch: [38] [ 880/1251] eta: 0:06:40 lr: 0.000963 loss: 4.6785 (4.6145) time: 1.0654 data: 0.0003 max mem: 25529 Epoch: [38] [ 890/1251] eta: 0:06:29 lr: 0.000963 loss: 4.6382 (4.6126) time: 1.0603 data: 0.0003 max mem: 25529 Epoch: [38] [ 900/1251] eta: 0:06:18 lr: 0.000963 loss: 4.9989 (4.6179) time: 1.0642 data: 0.0003 max mem: 25529 Epoch: [38] [ 910/1251] eta: 0:06:08 lr: 0.000963 loss: 5.0227 (4.6205) time: 1.0740 data: 0.0003 max mem: 25529 Epoch: [38] [ 920/1251] eta: 0:05:57 lr: 0.000963 loss: 4.7505 (4.6198) time: 1.0733 data: 0.0003 max mem: 25529 Epoch: [38] [ 
930/1251] eta: 0:05:46 lr: 0.000963 loss: 4.6593 (4.6196) time: 1.0636 data: 0.0003 max mem: 25529 Epoch: [38] [ 940/1251] eta: 0:05:35 lr: 0.000963 loss: 4.7349 (4.6184) time: 1.0697 data: 0.0003 max mem: 25529 Epoch: [38] [ 950/1251] eta: 0:05:24 lr: 0.000963 loss: 4.8424 (4.6185) time: 1.0741 data: 0.0003 max mem: 25529 Epoch: [38] [ 960/1251] eta: 0:05:13 lr: 0.000963 loss: 4.5308 (4.6170) time: 1.0704 data: 0.0003 max mem: 25529 Epoch: [38] [ 970/1251] eta: 0:05:03 lr: 0.000963 loss: 4.6764 (4.6186) time: 1.0749 data: 0.0003 max mem: 25529 Epoch: [38] [ 980/1251] eta: 0:04:52 lr: 0.000963 loss: 4.6764 (4.6176) time: 1.0768 data: 0.0004 max mem: 25529 Epoch: [38] [ 990/1251] eta: 0:04:41 lr: 0.000963 loss: 4.5145 (4.6176) time: 1.0677 data: 0.0004 max mem: 25529 Epoch: [38] [1000/1251] eta: 0:04:30 lr: 0.000963 loss: 4.5645 (4.6202) time: 1.0686 data: 0.0003 max mem: 25529 Epoch: [38] [1010/1251] eta: 0:04:19 lr: 0.000963 loss: 5.3548 (4.6373) time: 1.0613 data: 0.0003 max mem: 25529 Epoch: [38] [1020/1251] eta: 0:04:09 lr: 0.000963 loss: 6.9353 (4.6599) time: 1.0595 data: 0.0003 max mem: 25529 Epoch: [38] [1030/1251] eta: 0:03:58 lr: 0.000963 loss: 6.9423 (4.6820) time: 1.0729 data: 0.0003 max mem: 25529 Epoch: [38] [1040/1251] eta: 0:03:47 lr: 0.000963 loss: 6.9381 (4.7036) time: 1.0715 data: 0.0003 max mem: 25529 Epoch: [38] [1050/1251] eta: 0:03:36 lr: 0.000963 loss: 6.9351 (4.7248) time: 1.0717 data: 0.0003 max mem: 25529 Epoch: [38] [1060/1251] eta: 0:03:25 lr: 0.000963 loss: 6.9315 (4.7456) time: 1.0655 data: 0.0003 max mem: 25529 Epoch: [38] [1070/1251] eta: 0:03:15 lr: 0.000963 loss: 6.9319 (4.7660) time: 1.0609 data: 0.0003 max mem: 25529 Epoch: [38] [1080/1251] eta: 0:03:04 lr: 0.000963 loss: 6.9287 (4.7860) time: 1.0717 data: 0.0003 max mem: 25529 Epoch: [38] [1090/1251] eta: 0:02:53 lr: 0.000963 loss: 6.9198 (4.8055) time: 1.0834 data: 0.0003 max mem: 25529 Epoch: [38] [1100/1251] eta: 0:02:42 lr: 0.000963 loss: 6.9219 (4.8248) time: 1.0835 data: 0.0003 max mem: 25529 Epoch: [38] [1110/1251] eta: 0:02:32 lr: 0.000963 loss: 6.9286 (4.8437) time: 1.1036 data: 0.0003 max mem: 25529 Epoch: [38] [1120/1251] eta: 0:02:21 lr: 0.000963 loss: 6.9209 (4.8622) time: 1.0965 data: 0.0003 max mem: 25529 Epoch: [38] [1130/1251] eta: 0:02:10 lr: 0.000963 loss: 6.9212 (4.8804) time: 1.0701 data: 0.0003 max mem: 25529 Epoch: [38] [1140/1251] eta: 0:01:59 lr: 0.000963 loss: 6.9192 (4.8983) time: 1.0686 data: 0.0003 max mem: 25529 Epoch: [38] [1150/1251] eta: 0:01:48 lr: 0.000963 loss: 6.9192 (4.9159) time: 1.0640 data: 0.0003 max mem: 25529 Epoch: [38] [1160/1251] eta: 0:01:38 lr: 0.000963 loss: 6.9231 (4.9332) time: 1.0687 data: 0.0003 max mem: 25529 Epoch: [38] [1170/1251] eta: 0:01:27 lr: 0.000963 loss: 6.9241 (4.9502) time: 1.0702 data: 0.0003 max mem: 25529 Epoch: [38] [1180/1251] eta: 0:01:16 lr: 0.000963 loss: 6.9240 (4.9669) time: 1.0687 data: 0.0003 max mem: 25529 Epoch: [38] [1190/1251] eta: 0:01:05 lr: 0.000963 loss: 6.9198 (4.9833) time: 1.0668 data: 0.0003 max mem: 25529 Epoch: [38] [1200/1251] eta: 0:00:54 lr: 0.000963 loss: 6.9150 (4.9993) time: 1.0864 data: 0.0003 max mem: 25529 Epoch: [38] [1210/1251] eta: 0:00:44 lr: 0.000963 loss: 6.9144 (5.0152) time: 1.0855 data: 0.0003 max mem: 25529 Epoch: [38] [1220/1251] eta: 0:00:33 lr: 0.000963 loss: 6.9167 (5.0308) time: 1.0714 data: 0.0003 max mem: 25529 Epoch: [38] [1230/1251] eta: 0:00:22 lr: 0.000963 loss: 6.9167 (5.0461) time: 1.0702 data: 0.0003 max mem: 25529 Epoch: [38] [1240/1251] eta: 0:00:11 lr: 0.000963 loss: 6.9135 
(5.0612) time: 1.0574 data: 0.0005 max mem: 25529 Epoch: [38] [1250/1251] eta: 0:00:01 lr: 0.000963 loss: 6.9179 (5.0760) time: 1.0532 data: 0.0004 max mem: 25529 Epoch: [38] Total time: 0:22:28 (1.0781 s / it) Averaged stats: lr: 0.000963 loss: 6.9179 (5.0558) Test: [ 0/261] eta: 0:31:19 loss: 6.8103 (6.8103) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 7.2018 data: 6.7932 max mem: 25529 Test: [ 10/261] eta: 0:04:17 loss: 6.9766 (6.9290) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 1.0263 data: 0.6262 max mem: 25529 Test: [ 20/261] eta: 0:02:56 loss: 6.9750 (6.9375) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 0.4103 data: 0.0066 max mem: 25529 Test: [ 30/261] eta: 0:02:25 loss: 6.9495 (6.9457) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.0000) time: 0.4091 data: 0.0024 max mem: 25529 Test: [ 40/261] eta: 0:02:06 loss: 6.9158 (6.9258) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.6352) time: 0.4017 data: 0.0010 max mem: 25529 Test: [ 50/261] eta: 0:01:53 loss: 6.8871 (6.9364) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.5106) time: 0.3975 data: 0.0007 max mem: 25529 Test: [ 60/261] eta: 0:01:43 loss: 6.9326 (6.9323) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.4269) time: 0.3969 data: 0.0007 max mem: 25529 Test: [ 70/261] eta: 0:01:35 loss: 6.8942 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3668) time: 0.3951 data: 0.0016 max mem: 25529 Test: [ 80/261] eta: 0:01:27 loss: 6.8974 (6.9259) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3215) time: 0.3954 data: 0.0025 max mem: 25529 Test: [ 90/261] eta: 0:01:21 loss: 6.9066 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2862) time: 0.3983 data: 0.0017 max mem: 25529 Test: [100/261] eta: 0:01:15 loss: 6.9556 (6.9323) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2578) time: 0.3960 data: 0.0009 max mem: 25529 Test: [110/261] eta: 0:01:09 loss: 6.9268 (6.9298) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2346) time: 0.3962 data: 0.0010 max mem: 25529 Test: [120/261] eta: 0:01:04 loss: 6.8970 (6.9270) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.2152) time: 0.4211 data: 0.0242 max mem: 25529 Test: [130/261] eta: 0:00:59 loss: 6.8970 (6.9251) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.1988) time: 0.4183 data: 0.0242 max mem: 25529 Test: [140/261] eta: 0:00:54 loss: 6.9251 (6.9268) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3694) time: 0.3986 data: 0.0018 max mem: 25529 Test: [150/261] eta: 0:00:49 loss: 6.9534 (6.9264) acc1: 0.0000 (0.0000) acc5: 0.0000 (0.3449) time: 0.4021 data: 0.0045 max mem: 25529 Test: [160/261] eta: 0:00:45 loss: 6.8927 (6.9243) acc1: 0.0000 (0.1617) acc5: 0.0000 (0.4852) time: 0.4124 data: 0.0182 max mem: 25529 Test: [170/261] eta: 0:00:40 loss: 6.8886 (6.9231) acc1: 0.0000 (0.1523) acc5: 0.0000 (0.4569) time: 0.4112 data: 0.0157 max mem: 25529 Test: [180/261] eta: 0:00:35 loss: 6.9188 (6.9233) acc1: 0.0000 (0.1439) acc5: 0.0000 (0.4316) time: 0.3997 data: 0.0016 max mem: 25529 Test: [190/261] eta: 0:00:31 loss: 6.9170 (6.9216) acc1: 0.0000 (0.1363) acc5: 0.0000 (0.4090) time: 0.4233 data: 0.0265 max mem: 25529 Test: [200/261] eta: 0:00:26 loss: 6.9137 (6.9224) acc1: 0.0000 (0.1296) acc5: 0.0000 (0.3887) time: 0.4463 data: 0.0536 max mem: 25529 Test: [210/261] eta: 0:00:22 loss: 6.9097 (6.9210) acc1: 0.0000 (0.1234) acc5: 0.0000 (0.3703) time: 0.5000 data: 0.1046 max mem: 25529 Test: [220/261] eta: 0:00:18 loss: 6.8762 (6.9184) acc1: 0.0000 (0.1178) acc5: 0.0000 (0.3535) time: 0.4731 data: 0.0773 max mem: 25529 Test: [230/261] eta: 0:00:13 loss: 6.8775 (6.9185) acc1: 0.0000 (0.1127) acc5: 0.0000 (0.4509) time: 0.3974 data: 0.0048 max mem: 25529 Test: [240/261] eta: 0:00:09 
loss: 6.9246 (6.9183) acc1: 0.0000 (0.1081) acc5: 0.0000 (0.4322) time: 0.4009 data: 0.0050 max mem: 25529 Test: [250/261] eta: 0:00:04 loss: 6.9132 (6.9190) acc1: 0.0000 (0.1038) acc5: 0.0000 (0.5188) time: 0.3949 data: 0.0010 max mem: 25529 Test: [260/261] eta: 0:00:00 loss: 6.9128 (6.9180) acc1: 0.0000 (0.1000) acc5: 0.0000 (0.5000) time: 0.3788 data: 0.0001 max mem: 25529 Test: Total time: 0:01:54 (0.4370 s / it)
    • Acc@1 0.100 Acc@5 0.500 loss 6.918 Accuracy of the network on the 50000 test images: 0.1% Max accuracy: 57.01%
    opened by VictorLlu 3
  • pretrained model load

    Hello, I am very interested in your work. I met a problem when loading the pretrained model: checkpoint = torch.load(args.finetune, map_location='cpu')

    Debugging shows that the line pos_embed_checkpoint = checkpoint_model['pos_embed'] fails: the checkpoint has "pos_embed1", "pos_embed2", "pos_embed3", and "pos_embed4", but no "pos_embed".
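
    One plausible reading, not confirmed here: the fine-tuning script expects a ViT/DeiT-style checkpoint with a single pos_embed, while PVT stores one positional embedding per stage, so the resizing step would need to loop over the per-stage keys, roughly:

        import torch

        # Placeholder path standing in for args.finetune from the question above.
        checkpoint = torch.load('pvt_small.pth', map_location='cpu')
        checkpoint_model = checkpoint.get('model', checkpoint)

        # PVT keeps one positional embedding per stage instead of a single 'pos_embed'.
        for i in range(1, 5):
            key = f'pos_embed{i}'
            if key in checkpoint_model:
                pos_embed_checkpoint = checkpoint_model[key]
                # ... apply the same resizing/interpolation the script does for 'pos_embed' ...
                checkpoint_model[key] = pos_embed_checkpoint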

    opened by surelyee 3
  • Why is there no DETR+PVTv2 in object detection?

    I noticed that there is DETR+PVTv1, although its AP is not satisfactory. Why is there no implementation of DETR+PVTv2? Is it ineffective, or just not provided yet?

    opened by yuhua666 0
  • Did you train PVT on ImageNet22k?

    Thank you for your great work! As the title says, I want to know about your ImageNet-22K results. I saw a checkpoint of PVT_v2_b5 trained on ImageNet-22K in your releases. Is it useful?

    opened by Roger-Liang 0
  • Question about cls token

    Hi author, thanks for your nice work.

    I have a question about the cls token in PVT.

    In ViT and DeiT, the cls token is appended during the input embedding process, but PVT appends the cls token only at the input of the last stage.

    Why doesn't PVT append the cls token during the input embedding process?

    Thanks.
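
    For reference, a minimal sketch of the difference being asked about (shapes only, not the repository's exact code):

        import torch

        B, C = 2, 512
        cls_token = torch.zeros(B, 1, C)

        # ViT/DeiT: the cls token is prepended before the first block, so every layer
        # sees N+1 tokens.
        vit_tokens = torch.cat([cls_token, torch.randn(B, 196, C)], dim=1)   # (B, 197, C)

        # PVT: stages 1-3 keep a pure HxW token grid so features can be reshaped back into
        # 2D maps for the pyramid; the cls token is concatenated only at stage 4's input,
        # where it is needed for the classification head.
        stage4_tokens = torch.randn(B, 7 * 7, C)                             # (B, 49, C)
        pvt_stage4_tokens = torch.cat([cls_token, stage4_tokens], dim=1)     # (B, 50, C)
        print(vit_tokens.shape, pvt_stage4_tokens.shape)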

    opened by eremo2002 0
  • without Convolutions?

    The paper presents a convolution-free architecture, but the implementation contains convolutions. In the PVTv2 paper the authors say the spatial reduction is done with a convolution, but I could not see that stated for PVTv1. Is there another way to do the spatial reduction?
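
    One way to see the relationship, offered as a sketch rather than the authors' answer: a convolution whose kernel size equals its stride is arithmetically the same as flattening each non-overlapping patch and applying a shared linear layer, so the spatial reduction can be written in either form.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        B, C, H, W, r = 1, 64, 56, 56, 8
        x = torch.randn(B, C, H, W)

        # Convolutional form: kernel_size == stride == r (non-overlapping patches).
        conv = nn.Conv2d(C, C, kernel_size=r, stride=r)

        # Equivalent "convolution-free" form: unfold r x r patches and apply a shared
        # linear layer carrying the same weights.
        linear = nn.Linear(C * r * r, C)
        linear.weight.data = conv.weight.data.reshape(C, C * r * r)
        linear.bias.data = conv.bias.data

        patches = F.unfold(x, kernel_size=r, stride=r)             # (B, C*r*r, H/r * W/r)
        out_linear = linear(patches.transpose(1, 2))               # (B, H/r * W/r, C)
        out_conv = conv(x).flatten(2).transpose(1, 2)              # (B, H/r * W/r, C)
        print(torch.allclose(out_conv, out_linear, atol=1e-5))     # True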

    opened by Oguzhanercan 0
  • Question about pooling size

    Hi @whai362

    I was wondering why the pooling size is set to 7 for all stages. Have you tried a larger pooling size (i.e., more keys and values) in the earlier stages, decreasing it in the later stages?

    opened by magehrig 0