Code for "AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling"

Last update: Oct 26, 2022

Overview

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

This repository contains our PyTorch training code, evaluation code and pretrained models for AttentiveNAS.

[Update 06/21] Recently, we improved AttentiveNAS with an adaptive knowledge distillation training strategy; see our AlphaNet repo for details. AlphaNet has been accepted by ICML'21.

[Update 07/21] We provide example code for searching for the models with the best FLOPs vs. accuracy trade-offs here.

For more details, please see AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling by Dilin Wang, Meng Li, Chengyue Gong and Vikas Chandra.

If you find this repo useful in your research, please consider citing our work:

@article{wang2020attentivenas,
  title={AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling},
  author={Wang, Dilin and Li, Meng and Gong, Chengyue and Chandra, Vikas},
  journal={arXiv preprint arXiv:2011.09011},
  year={2020}
}

Evaluation

To reproduce our results:

First, download our pretrained AttentiveNAS models from a Google Drive path and place them in your local ./attentive_nas_data folder.

To evaluate our pretrained AttentiveNAS models, from AttentiveNAS-A0 to A6, on ImageNet with a single GPU, please run the following command, where a[0-6] stands for one of a0 through a6:

python test_attentive_nas.py --config-file ./configs/eval_attentive_nas_models.yml --model a[0-6]
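
To evaluate all seven models in one pass, the command above can be generated once per model. The following is a minimal sketch, assuming that `--model` accepts a single value such as `a0`; the helper name `build_eval_commands` is hypothetical and not part of this repository.

```python
import shlex

# Script and config paths are copied from the command above.
SCRIPT = "test_attentive_nas.py"
CONFIG = "./configs/eval_attentive_nas_models.yml"


def build_eval_commands(model_ids=range(7)):
    """Return one argv list per model, a0 through a6 by default (hypothetical helper)."""
    return [
        ["python", SCRIPT, "--config-file", CONFIG, "--model", f"a{i}"]
        for i in model_ids
    ]


if __name__ == "__main__":
    for argv in build_eval_commands():
        # Only prints each command; to execute it, pass argv to
        # subprocess.run(argv, check=True) instead.
        print(shlex.join(argv))
```

Printing the commands first makes it easy to check the paths before launching seven ImageNet evaluations.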

Expected results:

Name	MFLOPs	Top-1 (%)
AttentiveNAS-A0	203	77.3
AttentiveNAS-A1	279	78.4
AttentiveNAS-A2	317	78.8
AttentiveNAS-A3	357	79.1
AttentiveNAS-A4	444	79.8
AttentiveNAS-A5	491	80.1
AttentiveNAS-A6	709	80.7

Training

To train our AttentiveNAS models from scratch, please run:

python train_attentive_nas.py --config-file configs/train_attentive_nas_models.yml --machine-rank ${machine_rank} --num-machines ${num_machines} --dist-url ${dist_url}

We adopt SGD training on 64 GPUs. The mini-batch size is 32 per GPU; all training hyper-parameters are specified in train_attentive_nas_models.yml.
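
As a quick sanity check on the setup described above, the global batch size follows from the two stated numbers. This sketch assumes synchronous data-parallel training in which every GPU processes its own mini-batch at each step; the function name is hypothetical.

```python
# Values stated in the text above.
NUM_GPUS = 64
BATCH_PER_GPU = 32


def global_batch_size(num_gpus, batch_per_gpu):
    """Samples contributing to one synchronous SGD step (assumed data parallelism)."""
    return num_gpus * batch_per_gpu


print(global_batch_size(NUM_GPUS, BATCH_PER_GPU))  # 2048
```

When training on fewer GPUs, the global batch size shrinks accordingly, so the hyper-parameters in train_attentive_nas_models.yml may need to be revisited.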

Additional data

A (subnetwork config, FLOPs) lookup table, which can be used to construct the architecture distribution under FLOPs constraints.
An accuracy predictor trained with scikit-learn, which takes a subnetwork configuration as input and outputs its predicted accuracy on ImageNet.
- To convert a subnetwork configuration into an input compatible with our accuracy predictor:
```
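import numpy as np

# cfg is a subnetwork configuration dict; flatten it into a single feature row
res = [cfg['resolution']]
for k in ['width', 'depth', 'kernel_size', 'expand_ratio']:
    res += cfg[k]  # each of these entries is a list of per-stage values
inputs = np.asarray(res).reshape((1, -1))  # shape (1, n_features), as scikit-learn expects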
```
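
The (subnetwork config, FLOPs) lookup table mentioned above can be turned into a FLOPs-conditioned sampler. The following is a minimal sketch under stated assumptions, not the repository's implementation: the table is assumed to be a list of `(cfg, mflops)` pairs, the fixed-width binning is one simple choice among many, all function names are hypothetical, and the toy entries are made up for illustration.

```python
import random
from collections import defaultdict


def build_flops_bins(lookup, bin_width=25.0):
    """Group configs by FLOPs bin, where bin index = floor(mflops / bin_width)."""
    bins = defaultdict(list)
    for cfg, mflops in lookup:
        bins[int(mflops // bin_width)].append(cfg)
    return dict(bins)


def flops_distribution(bins):
    """Empirical probability of each FLOPs bin."""
    total = sum(len(cfgs) for cfgs in bins.values())
    return {b: len(cfgs) / total for b, cfgs in bins.items()}


def sample_config(bins, target_mflops, bin_width=25.0, rng=random):
    """Draw one config uniformly from the bin that contains target_mflops."""
    candidates = bins.get(int(target_mflops // bin_width))
    if not candidates:
        raise ValueError(f"no config recorded near {target_mflops} MFLOPs")
    return rng.choice(candidates)


# Toy table with made-up entries; real configs also carry width, depth,
# kernel_size and expand_ratio.
toy_lookup = [
    ({"resolution": 192}, 203.0),
    ({"resolution": 208}, 214.0),
    ({"resolution": 224}, 279.0),
    ({"resolution": 256}, 491.0),
]
bins = build_flops_bins(toy_lookup)
probs = flops_distribution(bins)                 # share of configs per FLOPs bin
cfg = sample_config(bins, target_mflops=210.0)   # a config from the 200-225 MFLOPs bin
```

A sampled `cfg` can then be flattened as shown earlier and passed to the accuracy predictor to rank candidates at a given FLOPs budget.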

License

The majority of AttentiveNAS is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Once For All is licensed under the Apache 2.0 license.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more info.

Owner

Facebook Research
