This package proposes simplified exporting pytorch models to ONNX and TensorRT, and also gives some base interface for model inference.

Overview

PyTorch Infer Utils

This package proposes simplified exporting pytorch models to ONNX and TensorRT, and also gives some base interface for model inference.

To install

git clone https://github.com/gorodnitskiy/pytorch_infer_utils.git
pip install /path/to/pytorch_infer_utils/

Export PyTorch model to ONNX

  • Check model for denormal weights to achieve better performance. Use load_weights_rounded_model func to load model with weights rounding:
    from pytorch_infer_utils import load_weights_rounded_model
    
    model = ModelClass()
    load_weights_rounded_model(
        model,
        "/path/to/model_state_dict",
        map_location=map_location
    )
    
  • Use ONNXExporter.torch2onnx method to export pytorch model to ONNX:
    from pytorch_infer_utils import ONNXExporter
    
    model = ModelClass()
    model.load_state_dict(
        torch.load("/path/to/model_state_dict", map_location=map_location)
    )
    model.eval()
    
    exporter = ONNXExporter()
    input_shapes = [-1, 3, 224, 224] # -1 means that is dynamic shape
    exporter.torch2onnx(model, "/path/to/model.onnx", input_shapes)
    
  • Use ONNXExporter.optimize_onnx method to optimize ONNX via onnxoptimizer:
    from pytorch_infer_utils import ONNXExporter
    
    exporter = ONNXExporter()
    exporter.optimize_onnx("/path/to/model.onnx", "/path/to/optimized_model.onnx")
    
  • Use ONNXExporter.optimize_onnx_sim method to optimize ONNX via onnx-simplifier. Be careful with onnx-simplifier not to lose dynamic shapes.
    from pytorch_infer_utils import ONNXExporter
    
    exporter = ONNXExporter()
    exporter.optimize_onnx_sim("/path/to/model.onnx", "/path/to/optimized_model.onnx")
    
  • Also, a method combined the above methods is available ONNXExporter.torch2optimized_onnx:
    from pytorch_infer_utils import ONNXExporter
    
    model = ModelClass()
    model.load_state_dict(
        torch.load("/path/to/model_state_dict", map_location=map_location)
    )
    model.eval()
    
    exporter = ONNXExporter()
    input_shapes = [-1, 3, -1, -1] # -1 means that is dynamic shape
    exporter.torch2optimized_onnx(model, "/path/to/model.onnx", input_shapes)
    
  • Other params that can be used in class initialization:
    • default_shapes: default shapes if dimension is dynamic, default = [1, 3, 224, 224]
    • onnx_export_params:
      • export_params: store the trained parameter weights inside the model file, default = True
      • do_constant_folding: whether to execute constant folding for optimization, default = True
      • input_names: the model's input names, default = ["input"]
      • output_names: the model's output names, default = ["output"]
      • opset_version: the ONNX version to export the model to, default = 11
    • onnx_optimize_params:
      • fixed_point: use fixed point, default = False
      • passes: optimization passes, default = [ "eliminate_deadend", "eliminate_duplicate_initializer", "eliminate_identity", "eliminate_if_with_const_cond", "eliminate_nop_cast", "eliminate_nop_dropout", "eliminate_nop_flatten", "eliminate_nop_monotone_argmax", "eliminate_nop_pad", "eliminate_nop_transpose", "eliminate_unused_initializer", "extract_constant_to_initializer", "fuse_add_bias_into_conv", "fuse_bn_into_conv", "fuse_consecutive_concats", "fuse_consecutive_log_softmax", "fuse_consecutive_reduce_unsqueeze", "fuse_consecutive_squeezes", "fuse_consecutive_transposes", "fuse_matmul_add_bias_into_gemm", "fuse_pad_into_conv", "fuse_transpose_into_gemm", "lift_lexical_references", "nop" ]

Export ONNX to TensorRT

  • Check TensorRT health via check_tensorrt_health func
  • Use TRTEngineBuilder.build_engine method to export ONNX to TensorRT:
    from pytorch_infer_utils import TRTEngineBuilder
    
    exporter = TRTEngineBuilder()
    # get engine by itself
    engine = exporter.build_engine("/path/to/model.onnx")
    # or save engine to /path/to/model.trt
    exporter.build_engine("/path/to/model.onnx", engine_path="/path/to/model.trt")
    
  • fp16_mode is available:
    from pytorch_infer_utils import TRTEngineBuilder
    
    exporter = TRTEngineBuilder()
    engine = exporter.build_engine("/path/to/model.onnx", fp16_mode=True)
    
  • int8_mode is available. It requires calibration_set of images as List[Any], load_image_func - func to correctly read and process images, max_image_shape - max image size as [C, H, W] to allocate correct size of memory:
    from pytorch_infer_utils import TRTEngineBuilder
    
    exporter = TRTEngineBuilder()
    engine = exporter.build_engine(
        "/path/to/model.onnx",
        int8_mode=True,
        calibration_set=calibration_set,
        max_image_shape=max_image_shape,
        load_image_func=load_image_func,
    )
    
  • Also, additional params for builder config builder.create_builder_config can be put to kwargs.
  • Other params that can be used in class initialization:
    • opt_shape_dict: optimal shapes, default = {'input': [[1, 3, 224, 224], [1, 3, 224, 224], [1, 3, 224, 224]]}
    • max_workspace_size: max workspace size, default = [1, 30]
    • stream_batch_size: batch size for forward network during transferring to int8, default = 100
    • cache_file: int8_mode cache filename, default = "model.trt.int8calibration"

Inference via onnxruntime on CPU and onnx_tensort on GPU

  • Base class ONNXWrapper __init__ has the structure as below:
    def __init__(
        self,
        onnx_path: str,
        gpu_device_id: Optional[int] = None,
        intra_op_num_threads: Optional[int] = 0,
        inter_op_num_threads: Optional[int] = 0,
    ) -> None:
        """
        :param onnx_path: onnx-file path, required
        :param gpu_device_id: gpu device id to use, default = 0
        :param intra_op_num_threads: ort_session_options.intra_op_num_threads,
            to let onnxruntime choose by itself is required 0, default = 0
        :param inter_op_num_threads: ort_session_options.inter_op_num_threads,
            to let onnxruntime choose by itself is required 0, default = 0
        :type onnx_path: str
        :type gpu_device_id: int
        :type intra_op_num_threads: int
        :type inter_op_num_threads: int
        """
        if gpu_device_id is None:
            import onnxruntime
    
            self.is_using_tensorrt = False
            ort_session_options = onnxruntime.SessionOptions()
            ort_session_options.intra_op_num_threads = intra_op_num_threads
            ort_session_options.inter_op_num_threads = inter_op_num_threads
            self.ort_session = onnxruntime.InferenceSession(
                onnx_path, ort_session_options
            )
    
        else:
            import onnx
            import onnx_tensorrt.backend as backend
    
            self.is_using_tensorrt = True
            model_proto = onnx.load(onnx_path)
            for gr_input in model_proto.graph.input:
                gr_input.type.tensor_type.shape.dim[0].dim_value = 1
    
            self.engine = backend.prepare(
                model_proto, device=f"CUDA:{gpu_device_id}"
            )
    
  • ONNXWrapper.run method assumes the use of such a structure:
    img = self._process_img_(img)
    if self.is_using_tensorrt:
        preds = self.engine.run(img)
    else:
        ort_inputs = {self.ort_session.get_inputs()[0].name: img}
        preds = self.ort_session.run(None, ort_inputs)
    
    preds = self._process_preds_(preds)
    

Inference via onnxruntime on CPU and TensorRT on GPU

  • Base class TRTWrapper __init__ has the structure as below:
    def __init__(
        self,
        onnx_path: Optional[str] = None,
        trt_path: Optional[str] = None,
        gpu_device_id: Optional[int] = None,
        intra_op_num_threads: Optional[int] = 0,
        inter_op_num_threads: Optional[int] = 0,
        fp16_mode: bool = False,
    ) -> None:
        """
        :param onnx_path: onnx-file path, default = None
        :param trt_path: onnx-file path, default = None
        :param gpu_device_id: gpu device id to use, default = 0
        :param intra_op_num_threads: ort_session_options.intra_op_num_threads,
            to let onnxruntime choose by itself is required 0, default = 0
        :param inter_op_num_threads: ort_session_options.inter_op_num_threads,
            to let onnxruntime choose by itself is required 0, default = 0
        :param fp16_mode: use fp16_mode if class initializes only with
            onnx_path on GPU, default = False
        :type onnx_path: str
        :type trt_path: str
        :type gpu_device_id: int
        :type intra_op_num_threads: int
        :type inter_op_num_threads: int
        :type fp16_mode: bool
        """
        if gpu_device_id is None:
            import onnxruntime
    
            self.is_using_tensorrt = False
            ort_session_options = onnxruntime.SessionOptions()
            ort_session_options.intra_op_num_threads = intra_op_num_threads
            ort_session_options.inter_op_num_threads = inter_op_num_threads
            self.ort_session = onnxruntime.InferenceSession(
                onnx_path, ort_session_options
            )
    
        else:
            self.is_using_tensorrt = True
            if trt_path is None:
                builder = TRTEngineBuilder()
                trt_path = builder.build_engine(onnx_path, fp16_mode=fp16_mode)
    
            self.trt_session = TRTRunWrapper(trt_path)
    
  • TRTWrapper.run method assumes the use of such a structure:
    img = self._process_img_(img)
    if self.is_using_tensorrt:
        preds = self.trt_session.run(img)
    else:
        ort_inputs = {self.ort_session.get_inputs()[0].name: img}
        preds = self.ort_session.run(None, ort_inputs)
    
    preds = self._process_preds_(preds)
    

Environment

TensorRT

  • TensorRT installing guide is here
  • Required CUDA-Runtime, CUDA-ToolKit
  • Also, required additional python packages not included to setup.cfg (it depends upon CUDA environment version):
    • pycuda
    • nvidia-tensorrt
    • nvidia-pyindex

onnx_tensorrt

  • onnx_tensorrt requires cuda-runtime and tensorrt.
  • To install:
    git clone --depth 1 --branch 21.02 https://github.com/onnx/onnx-tensorrt.git
    cd onnx-tensorrt
    cp -r onnx_tensorrt /usr/local/lib/python3.8/dist-packages
    cd ..
    rm -rf onnx-tensorrt
    
Owner
Alex Gorodnitskiy
Computer Vision Engineer 🤖
Alex Gorodnitskiy
unofficial pytorch implementation of RefineGAN

RefineGAN unofficial pytorch implementation of RefineGAN (https://arxiv.org/abs/1709.00753) for CSMRI reconstruction, the official code using tensorpa

xinby17 5 Jul 21, 2022
This is the code of using DQN to play Sekiro .

Update for using DQN to play sekiro 2021.2.2(English Version) This is the code of using DQN to play Sekiro . I am very glad to tell that I have writen

144 Dec 25, 2022
A new benchmark for Icon Question Answering (IconQA) and a large-scale icon dataset Icon645.

IconQA About IconQA is a new diverse abstract visual question answering dataset that highlights the importance of abstract diagram understanding and c

Pan Lu 24 Dec 30, 2022
Utilities to bridge Canvas-generated course rosters with GitLab's API.

gitlab-canvas-utils A collection of scripts originally written for CSE 13S. Oversees everything from GitLab course group creation, student repository

Eugene Chou 5 Jun 08, 2022
Official implementation of "Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks", NeurIPS 2021.

PHDimGeneralization Official implementation of "Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks", NeurIPS 2021. Overvie

Tolga Birdal 13 Nov 08, 2022
Production First and Production Ready End-to-End Speech Recognition Toolkit

WeNet 中文版 Discussions | Docs | Papers | Runtime (x86) | Runtime (android) | Pretrained Models We share neural Net together. The main motivation of WeN

2.7k Jan 04, 2023
This repository contains the implementation of Deep Detail Enhancment for Any Garment proposed in Eurographics 2021

Deep-Detail-Enhancement-for-Any-Garment Introduction This repository contains the implementation of Deep Detail Enhancment for Any Garment proposed in

40 Dec 13, 2022
Fast convergence of detr with spatially modulated co-attention

Fast convergence of detr with spatially modulated co-attention Usage There are no extra compiled components in SMCA DETR and package dependencies are

peng gao 135 Dec 07, 2022
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

CALVIN CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks Oier Mees, Lukas Hermann, Erick Rosete,

Oier Mees 107 Dec 26, 2022
A library for augmentation of a YOLO-formated dataset

YOLO Dataset Augmentation lib Инструкция по использованию этой библиотеки Запуск всех файлов осуществлять из консоли. GoogleCrawl_to_Dataset.py Это ск

Egor Orel 1 Dec 10, 2022
Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets.

Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets.

beringresearch 285 Jan 04, 2023
💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena.

Heidelberg-NLP 17 Nov 07, 2022
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.

What's New Below we share, in reverse chronological order, the updates and new releases in VISSL. All VISSL releases are available here. [Oct 2021]: V

Meta Research 2.9k Jan 07, 2023
Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR, 2022)

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding (CVPR'22) Paper Link | Project Page Abstract : Manual an

Mohamed Afham 152 Dec 23, 2022
Single-Shot Motion Completion with Transformer

Single-Shot Motion Completion with Transformer 👉 [Preprint] 👈 Abstract Motion completion is a challenging and long-discussed problem, which is of gr

FuxiCV 78 Dec 29, 2022
Software & Hardware to do multi color printing with Sharpies

3D Print Colorizer is a combination of 3D printed parts and a Cura plugin which allows anyone with an Ender 3 like 3D printer to produce multi colored

343 Jan 06, 2023
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.

Introduction This is an official implementation of CvT: Introducing Convolutions to Vision Transformers. We present a new architecture, named Convolut

Bin Xiao 175 Jan 08, 2023
Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021]

Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021] Paper: https://arxiv.org/abs/2104.11208 Introduction Despite the significa

76 Dec 07, 2022
Linescanning - Package for (pre)processing of anatomical and (linescanning) fMRI data

line scanning repository This repository contains all of the tools used during the acquisition and postprocessing of line scanning data at the Spinoza

Jurjen Heij 4 Sep 14, 2022
A list of all papers and resoureces on Semantic Segmentation

Semantic-Segmentation A list of all papers and resoureces on Semantic Segmentation. Dataset importance SemanticSegmentation_DL Some implementation of

Alan Tang 1.1k Dec 12, 2022