Releases: NVIDIA/TensorRT

22.06

09 Jun 02:54

Commit used by the 22.06 TensorRT NGC container.

Changelog

Added

  • None

Changed

  • Disentangled attention (DMHA) plugin refactored
  • ONNX parser updated to 8.2GA

Removed

  • None

22.05

13 May 21:52

Commit used by the 22.05 TensorRT NGC container.

Changelog

Added

  • Disentangled attention plugin for DeBERTa
  • DMHA (multiscaleDeformableAttnPlugin) plugin for DDETR
  • Performance benchmarking mode to HuggingFace demo

Changed

  • Updated base TensorRT version to 8.2.5.1
  • Updated onnx-graphsurgeon v0.3.19 CHANGELOG
  • fp16 support for pillarScatterPlugin
  • #1939 - Fixed path in quantization classification_flow
  • Fixed GPT2 onnx export failure due to 2G limitation
  • Use axis0 as default for deconv in pytorch-quantization toolkit
  • Updated onnx export script for CoordConvAC sample
  • Install devtoolset-8 for updated g++ version in CentOS7 container

Removed

  • Usage of deprecated TensorRT APIs in samples removed
  • quant_bert.py module removed from pytorch-quantization

22.04

14 Apr 01:19

Commit used by the 22.04 TensorRT NGC container.

Changelog

Added

  • TensorRT Engine Explorer v0.1.0 README
  • Detectron 2 Mask R-CNN R50-FPN python sample
  • Model export script for sampleOnnxMnistCoordConvAC

Changed

  • Updated base TensorRT version to 8.2.4.2
  • Updated copyright headers with SPDX identifiers
  • Updated onnx-graphsurgeon v0.3.17 CHANGELOG
  • PyramidROIAlign plugin refactor and bug fixes
  • Fixed MultilevelCropAndResize crashes on Windows
  • #1583 - sublicense ieee/half.h under Apache2
  • Updated demo/BERT performance tables for rel-8.2
  • #1774 - Fixed Python hang on IndexError when TF is imported after TensorRT
  • Various bugfixes in demos - BERT, Tacotron2 and HuggingFace GPT/T5 notebooks
  • Cleaned up sample READMEs

Removed

  • sampleNMT removed from samples

22.03

24 Mar 05:20

Commit used by the 22.03 TensorRT NGC container.

Changelog

Added

  • EfficientDet sample enhancements
    • Added support for EfficientDet Lite and AdvProp models.
    • Added dynamic batch support.
    • Added mixed precision engine builder.

Changed

  • Better decoupling of HuggingFace demo tests

22.02

04 Feb 18:40

Commit used by the 22.02 TensorRT NGC container.

Changelog

Added

  • None

Changed

  • Extend Megatron LayerNorm plugins to support larger hidden sizes
  • Refactored EfficientNMS plugin for TFTRT and added implicit batch mode support
  • Update base TensorRT version to 8.2.3.0
  • GPT-2 greedy search speedup - now runs on GPU
  • Updates to TensorRT developer tools
  • Updated ONNX parser to v8.2.3.0
  • Minor updates and bugfixes
    • Samples: TFOD, GPT-2, demo/BERT
    • Plugins: proposalPlugin, geluPlugin, bertQKVToContextPlugin, batchedNMS

Removed

  • Unused source file(s) in demo/BERT

22.01

24 Jan 23:49

Commit used by the 22.01 TensorRT NGC container.

TensorRT OSS v8.2.1 GA

24 Nov 18:19

TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.

  • Updates since TensorRT 8.2.0 EA release.

  • Please refer to the TensorRT 8.2.1 GA release notes for more information.

  • ONNX parser v8.2.1

    • Removed duplicate constant layer checks that caused some performance regressions
    • Fixed expand dynamic shape calculations
    • Added parser-side checks for Scatter layer support
  • Sample updates

    • Added TensorFlow Object Detection API converter samples, including Single Shot Detector, Faster R-CNN and Mask R-CNN models
    • Multiple enhancements in HuggingFace transformer demos
      • Added multi-batch support
      • Fixed resultant performance regression in batchsize=1
      • Fixed T5 large/T5-3B accuracy issues
      • Added notebooks for T5 and GPT-2
      • Added CPU benchmarking option
    • Deprecated kSTRICT_TYPES (strict type constraints). Equivalent behavior is now achieved by setting PREFER_PRECISION_CONSTRAINTS, DIRECT_IO, and REJECT_EMPTY_ALGORITHMS together
    • Removed sampleMovieLens
    • Renamed sampleReformatFreeIO to sampleIOFormats
    • Add idleTime option for samples to control qps
    • Specify default value for precisionConstraints
    • Fixed reporting of TensorRT build version in trtexec
    • Fixed combineDescriptions typo in trtexec/tracer.py
    • Fixed usages of kDIRECT_IO
  • Plugin updates

    • EfficientNMS plugin support extended to TF-TRT and to clang builds
    • Sanitize header definitions for BERT fused MHA plugin
    • Separate C++ and cu files in splitPlugin to avoid PTX generation (required for CUDA enhanced compatibility support)
    • Enable C++14 build for plugins
  • ONNX tooling updates

  • Build and container fixes

    • Add SM86 target to default GPU_ARCHS for platforms with cuda-11.1+
    • Remove deprecated SM_35 and add SM_60 to default GPU_ARCHS
    • Skip CUB builds for cuda 11.0+ #1455
    • Fixed cuda-10.2 container build failures in Ubuntu 20.04
    • Add native ARM server build container
    • Install devtoolset-8 for updated g++ version in CentOS7
    • Added a note on supporting c++14 builds for CentOS7
    • Fixed docker build for large UIDs #1373
    • Updated README instructions for Jetpack builds
  • demo enhancements

    • Updated Tacotron2 instructions and added CPU benchmarking
    • Fixed issues in demoBERT python notebook
  • Documentation updates

    • Updated Python documentation for add_reduce, add_top_k, and ISoftMaxLayer
    • Renamed default GitHub branch to main and updated hyperlinks
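
The kSTRICT_TYPES deprecation listed under the sample updates above can be illustrated with a minimal sketch. It assumes the TensorRT Python API exposes the three replacement flags on `trt.BuilderFlag` and that the builder config offers `set_flag`; the helper name `set_strict_types_equivalent` and the `FakeBuilderFlag` / `FakeConfig` stand-ins are hypothetical and exist only so the snippet runs without TensorRT installed.

```python
import enum

# Flag names taken from the release note above.
STRICT_TYPES_REPLACEMENTS = (
    "PREFER_PRECISION_CONSTRAINTS",
    "DIRECT_IO",
    "REJECT_EMPTY_ALGORITHMS",
)


def set_strict_types_equivalent(config, builder_flag_enum):
    """Set the three flags that together replace the deprecated strict-types flag.

    `config` is assumed to behave like a TensorRT builder config (has set_flag),
    and `builder_flag_enum` like trt.BuilderFlag. Both are passed in so that the
    helper can be exercised with stand-ins.
    """
    for name in STRICT_TYPES_REPLACEMENTS:
        config.set_flag(getattr(builder_flag_enum, name))


# Hypothetical stand-ins, not part of TensorRT.
class FakeBuilderFlag(enum.Enum):
    PREFER_PRECISION_CONSTRAINTS = 1
    DIRECT_IO = 2
    REJECT_EMPTY_ALGORITHMS = 3


class FakeConfig:
    def __init__(self):
        self.flags = set()

    def set_flag(self, flag):
        self.flags.add(flag)


if __name__ == "__main__":
    cfg = FakeConfig()
    set_strict_types_equivalent(cfg, FakeBuilderFlag)
    print(sorted(flag.name for flag in cfg.flags))
```

With a real TensorRT installation, the call would presumably look like `set_strict_types_equivalent(builder.create_builder_config(), trt.BuilderFlag)`; check the TensorRT 8.2.1 GA release notes before relying on it.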

21.10

05 Oct 17:03

Commit used by the 21.10 TensorRT NGC container.

Changelog

Added

  • Benchmark script for demoBERT-Megatron
  • Dynamic Input Shape support for EfficientNMS plugin
  • Support empty dimensions in ONNX
  • INT32 and dynamic clips through elementwise in ONNX parser

Changed

  • Bump TensorRT version to 8.0.3.4
  • Use static shape only for single-batch, single-sequence input in demo/BERT
  • Revert to using the native FC layer in demo/BERT; use FCPlugin only on older GPUs
  • Update demo/Tacotron2 for TensorRT 8.0
  • Updates to TensorRT developer tools
    • Polygraphy v0.33.0
      • Added various examples, a CLI User Guide and how-to guides.
      • Added experimental support for DLA.
      • Added a data to-input tool that can combine inputs/outputs created by --save-inputs/--save-outputs.
      • Added a PluginRefRunner which provides CPU reference implementations for TensorRT plugins
      • Made several performance improvements in the Polygraphy CUDA wrapper.
      • Removed the to-json tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.
    • Bugfixes and documentation updates in pytorch-quantization toolkit.
  • Bumped up package versions: tensorflow-gpu 2.5.1, pillow 8.3.2
  • ONNX parser enhancements and bugfixes
    • Update ONNX submodule to v1.8.0
    • Update convDeconvMultiInput function to properly handle deconvs
    • Update RNN documentation
    • Update QDQ axis assertion
    • Fix bidirectional activation alpha and beta values
    • Fix opset10 Resize
    • Fix shape tensor unsqueeze
    • Mark BOOL tiles as unsupported
    • Remove unnecessary shape tensor checks

Removed

  • N/A

TensorRT OSS v8.2.0 EA

05 Oct 19:03
Pre-release

TensorRT OSS release corresponding to TensorRT 8.2.0.6 EA release.

Added

  • Demo applications showcasing TensorRT inference of HuggingFace Transformers.
    • Support is currently extended to GPT-2 and T5 models.
  • Added support for the following ONNX operators:
    • Einsum
    • IsNan
    • GatherND
    • Scatter
    • ScatterElements
    • ScatterND
    • Sign
    • Round
  • Added support for building TensorRT Python API on Windows.

Updated

  • Notable API updates in TensorRT 8.2.0.6 EA release. See TensorRT Developer Guide for details.
    • Added three new IExecutionContext APIs: getEnqueueEmitsProfile(), setEnqueueEmitsProfile(), and reportToProfiler(). They can be used to collect layer profiling info when inference is launched as a CUDA graph.
    • Eliminated the global logger; each Runtime, Builder or Refitter now has its own logger.
    • Added new operators: IAssertionLayer, IConditionLayer, IEinsumLayer, IIfConditionalBoundaryLayer, IIfConditionalOutputLayer, IIfConditionalInputLayer, and IScatterLayer.
    • Added new IGatherLayer modes: kELEMENT and kND
    • Added new ISliceLayer modes: kFILL, kCLAMP, and kREFLECT
    • Added new IUnaryLayer operators: kSIGN and kROUND
    • Added new runtime class IEngineInspector that can be used to inspect the detailed information of an engine, including the layer parameters, the chosen tactics, the precision used, etc.
    • ProfilingVerbosity enums have been updated to show their functionality more explicitly.
  • Updated TensorRT OSS container defaults to cuda 11.4
  • Updated CMake to target C++14 builds.
  • Updated the following ONNX operators:
    • Gather and GatherElements implementations to natively support negative indices
    • Pad layer to support ND padding, along with edge and reflect padding mode support
    • If layer with general performance improvements.

Removed

  • Removed sampleMLP.
  • Several flags of trtexec have been deprecated:
    • --explicitBatch flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
    • --explicitPrecision flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
    • --nvtxMode=[verbose|default|none] has been deprecated in favor of --profilingVerbosity=[detailed|layer_names_only|none] to show its functionality more explicitly.
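
The trtexec deprecations above amount to a mechanical rewrite of existing command lines, sketched below. The helper `migrate_trtexec_args` is hypothetical and not part of trtexec. The value mapping (verbose to detailed, default to layer_names_only, none to none) is an assumption read off the order in which the note lists the options.

```python
# Assumed value mapping from --nvtxMode to --profilingVerbosity,
# taken from the order of the options in the release note.
NVTX_TO_PROFILING = {
    "verbose": "detailed",
    "default": "layer_names_only",
    "none": "none",
}

# Flags the release note describes as deprecated and having no effect.
NO_EFFECT_FLAGS = {"--explicitBatch", "--explicitPrecision"}


def migrate_trtexec_args(args):
    """Return a copy of `args` with the deprecated trtexec flags rewritten or dropped."""
    migrated = []
    for arg in args:
        if arg in NO_EFFECT_FLAGS:
            # Safe to drop: the batch/precision mode is now chosen automatically.
            continue
        if arg.startswith("--nvtxMode="):
            mode = arg.split("=", 1)[1]
            if mode not in NVTX_TO_PROFILING:
                raise ValueError(f"unknown --nvtxMode value: {mode!r}")
            migrated.append("--profilingVerbosity=" + NVTX_TO_PROFILING[mode])
            continue
        migrated.append(arg)
    return migrated


if __name__ == "__main__":
    old = ["trtexec", "--onnx=model.onnx", "--explicitBatch", "--nvtxMode=verbose"]
    print(" ".join(migrate_trtexec_args(old)))
```

All other arguments pass through untouched, so the helper can be applied to existing scripts without knowing the full trtexec option set.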

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

21.09

22 Sep 17:28

Commit used by the 21.09 TensorRT NGC container.

Changelog

Added

  • Add ONNX2TRT_VERSION overwrite in CMake.

Changed

  • Updates to TensorRT developer tools
  • Fix assertion in EfficientNMSPlugin

Removed

  • N/A