
Releases: NVIDIA/TensorRT

TensorRT OSS v8.6.1

05 May 00:34
a25ca8b

TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.

Key Features and Updates:

  • Added a new flag --use-cuda-graph to demoDiffusion to improve performance.
  • Optimized GPT2 and T5 HuggingFace demos to use fp16 I/O tensors for fp16 networks.
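
Using fp16 I/O tensors avoids extra casts and halves transfer sizes, at the cost of half precision's narrow significand. A minimal stdlib sketch (not part of the demos) of what rounding a value through IEEE 754 half precision does, using `struct`'s `e` format code:

```python
import struct

def to_fp16_and_back(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Values exactly representable in fp16 survive unchanged.
print(to_fp16_and_back(1.5))   # 1.5
# Others are rounded to fp16's ~11 significand bits.
print(to_fp16_and_back(0.1))
```

Keeping I/O in fp16 for an fp16 network means this rounding happens once, not on every host/device boundary crossing.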

TensorRT OSS v8.6.0

17 Mar 04:06

TensorRT OSS release corresponding to TensorRT 8.6.0.12 EA release.

Key Features and Updates:

  • demoDiffusion acceleration is now supported out of the box in TensorRT without requiring plugins.
    • The following plugins have been removed accordingly: GroupNorm, LayerNorm, MultiHeadCrossAttention, MultiHeadFlashAttention, SeqLen2Spatial, and SplitGeLU.
  • Added a new sample called onnx_custom_plugin.
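
For reference, the removed LayerNorm plugin computed standard layer normalization, which TensorRT 8.6 can now express with native layers. A pure-Python sketch of that computation (illustrative only, not the plugin's implementation):

```python
import math

def layer_norm(x, eps=1e-5, gamma=None, beta=None):
    """Normalize a vector to zero mean / unit variance, then
    optionally scale (gamma) and shift (beta) elementwise."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    y = [(v - mean) / math.sqrt(var + eps) for v in x]
    if gamma is not None:
        y = [g * v for g, v in zip(gamma, y)]
    if beta is not None:
        y = [v + b for v, b in zip(y, beta)]
    return y

print(layer_norm([1.0, 2.0, 3.0, 4.0]))
```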

We needed to force-push main and release/8.6 branches and v8.6.0 release. If you cloned/pulled the repo recently, please rebase the affected branches. Our apologies for this inconvenience.

TensorRT OSS v8.5.3

03 Feb 20:28
b0c259a

TensorRT OSS release corresponding to TensorRT 8.5.3.1 GA release.

Key Features and Updates:

  • Added the following HuggingFace demos: GPT-J-6B, GPT2-XL, and GPT2-Medium
  • Added nvinfer1::plugin namespace
  • Optimized KV Cache performance for T5
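
A KV cache speeds up autoregressive decoding by storing each layer's attention keys and values so they are computed once per token instead of once per step. A minimal Python sketch of the bookkeeping (a toy illustration, not the T5 demo's code):

```python
class KVCache:
    """Append-only cache of per-step key/value pairs, one list per layer.
    Each decode step computes K/V only for the newest token and appends."""
    def __init__(self, num_layers):
        self.keys = [[] for _ in range(num_layers)]
        self.values = [[] for _ in range(num_layers)]

    def append(self, layer, k, v):
        self.keys[layer].append(k)
        self.values[layer].append(v)

    def get(self, layer):
        # Full K/V history for this layer, consumed by attention each step.
        return self.keys[layer], self.values[layer]

cache = KVCache(num_layers=2)
for step in range(3):                    # three decode steps
    cache.append(0, f"k{step}", f"v{step}")
ks, vs = cache.get(0)
print(len(ks))   # 3: K was computed once per token, never recomputed
```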

TensorRT OSS v8.5.2

14 Dec 00:46
ad932f7

TensorRT OSS release corresponding to TensorRT 8.5.2.2 GA release.

Updates since TensorRT 8.5.1 GA release.
Please refer to the TensorRT 8.5.2 GA release notes for more information.

Key Features and Updates:

22.12

08 Dec 06:19

Commit used by the 22.12 TensorRT NGC container.

Added

  • Stable Diffusion demo using TensorRT Plugins
  • KV-cache and beam search to GPT2 and T5 demos
  • Perplexity calculation to all HF demos
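
Perplexity is the exponential of the average negative log-likelihood the model assigns to the tokens; lower is better. A minimal sketch of the standard formula (illustrative; not the demos' code):

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood over the sequence."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that assigns probability 0.5 to every token has perplexity 2:
# on average it is as uncertain as a fair coin flip per token.
print(perplexity([0.5, 0.5, 0.5, 0.5]))  # ~2.0
```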

Changed

  • Updated trex to v0.1.5
  • Increased default workspace size in demoBERT to build BS=128 fp32 engines
  • Used avg_iter=8 and a timing cache to make demoBERT performance more stable
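
Averaging latency over several iterations (the avg_iter idea) keeps one-off outliers from dominating a benchmark. A generic sketch of the technique (not demoBERT's actual script):

```python
import time

def avg_latency(fn, avg_iter=8, warmup=2):
    """Mean latency over avg_iter timed runs, preceded by warmup runs
    so cold caches and first-call overheads don't skew the average."""
    for _ in range(warmup):
        fn()
    total = 0.0
    for _ in range(avg_iter):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / avg_iter

print(f"{avg_latency(lambda: sum(range(10000))) * 1e6:.1f} us")
```

A timing cache complements this by persisting kernel-tactic timings across builds, so repeated engine builds do not re-measure from scratch.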

Removed

  • None

TensorRT OSS v8.5.1

02 Nov 23:53
1d6bf36

TensorRT OSS release corresponding to TensorRT 8.5.1.7 GA release.

Key Features and Updates:

  • Samples enhancements

    • Added sampleNamedDimensions, which demonstrates the use of named input dimensions.
    • Updated sampleINT8API and introductory_parser_samples to use ONNX models over Caffe/UFF
    • Removed UFF/Caffe samples including sampleMNIST, end_to_end_tensorflow_mnist, sampleINT8, sampleMNISTAPI, sampleUffMNIST, sampleUffPluginV2Ext, engine_refit_mnist, int8_caffe_mnist, uff_custom_plugin, sampleFasterRCNN, sampleUffFasterRCNN, sampleGoogleNet, sampleSSD, sampleUffSSD, sampleUffMaskRCNN and uff_ssd.
  • Plugin enhancements

    • Added GridAnchorRectPlugin to support rectangular feature maps in gridAnchorPlugin.
    • Added ROIAlignPlugin to support the ONNX operator RoiAlign. The ONNX parser will automatically route ROIAlign ops through the plugin.
    • Added Hopper support for the BERTQKVToContextPlugin plugin.
    • Exposed the use_int8_scale_max attribute in the BERTQKVToContextPlugin plugin, allowing users to disable the default use of INT8 scale factors that optimize the softmax MAX reduction in versions 2 and 3 of the plugin.
  • ONNX-TensorRT changes

  • Build containers

    • Updated default cuda versions to 11.8.0.
  • Tooling enhancements
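
The use_int8_scale_max attribute above concerns INT8 scale factors. For context, a generic sketch of symmetric INT8 quantization with a max-abs scale (illustrative only; not the plugin's implementation):

```python
def int8_quantize(values):
    """Symmetric INT8 quantization: the scale maps the largest
    magnitude in the tensor to 127."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def int8_dequantize(q, scale):
    """Approximate reconstruction of the original values."""
    return [v * scale for v in q]

q, s = int8_quantize([-1.0, 0.25, 0.5, 1.0])
print(q)  # [-127, 32, 64, 127]
```

Knowing the per-tensor max lets a kernel bound intermediate values, which is the kind of shortcut a scale-aware softmax reduction can exploit.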

TensorRT OSS v8.4.3

19 Aug 22:51

TensorRT OSS release corresponding to TensorRT 8.4.3.1 release.

Key Updates:

  • Added Python packages for Python 3.10.
  • Fixed a potential overlap between H2D copies and inference execution in trtexec.

22.08

17 Aug 00:14

Commit used by the 22.08 TensorRT NGC container.

Changelog

Updated TensorRT version to 8.4.2. See the TensorRT 8.4.2 release notes for more information.

Changed

  • Updated default protobuf version to 3.20.x
  • Updated ONNX-TensorRT submodule version to 22.08 tag
  • Updated sampleIOFormats and sampleAlgorithmSelector to use ONNX models over Caffe

Fixes

  • Fixed missing serialization member in CustomClipPlugin plugin
  • Fixed various Python import issues

Added

  • Added new DeBERTA demo
  • Added version 2 for disentangledAttentionPlugin to support DeBERTA v2

Removed

  • None

22.07

22 Jul 02:46

Commit used by the 22.07 TensorRT NGC container.

Changelog

Added

  • polygraphy-trtexec-plugin tool for Polygraphy
  • Multi-profile support for demoBERT
  • KV cache support for HF BART demo

Changed

  • Updated ONNX-GS to v0.3.20

Removed

  • None

TensorRT OSS v8.4.1 GA

14 Jun 21:25

TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.

Key Features and Updates:

  • Samples enhancements

  • EfficientDet sample

    • Added support for EfficientDet Lite and AdvProp models.
    • Added dynamic batch support.
    • Added mixed precision engine builder.
  • HuggingFace transformer demo

    • Added BART model.
    • Performance speedup of GPT-2 greedy search using GPU implementation.
    • Fixed GPT2 ONNX export failure due to the 2 GB file size limitation.
    • Extended Megatron LayerNorm plugins to support larger hidden sizes.
    • Added performance benchmarking mode.
    • Enabled TF32 format by default.
  • demoBERT enhancements

    • Added --duration flag to the perf benchmarking script.
    • Fixed import of nvinfer_plugins library in demoBERT on Windows.
  • Torch-QAT toolkit

    • Removed the quant_bert.py module; it is now upstreamed to HuggingFace QDQBERT.
    • Use axis 0 as the default for deconv.
    • #1939: Fixed path in the classification_flow example.
  • Plugin enhancements

  • Build containers

    • Updated default cuda versions to 11.6.2.
    • CentOS Linux 8 reached end of life on Dec 31, 2021; the corresponding container has been removed from TensorRT-OSS.
    • Installed devtoolset-8 for updated g++ versions in the CentOS 7 container.
  • Tooling enhancements

  • trtexec enhancements

    • Added --layerPrecisions and --layerOutputTypes flags for specifying layer-wise precision and output type constraints.
    • Added --memPoolSize flag to specify the workspace size as well as the DLA memory pool sizes via a unified interface. Correspondingly, the --workspace flag has been deprecated.
    • The "End-To-End Host Latency" metric has been removed; use the "Host Latency" metric instead. For more information, refer to the Benchmarking Network section in the TensorRT Developer Guide.
    • Use enqueueV2() instead of enqueue() when the engine has explicit batch dimensions.
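
On the GPT-2 greedy-search speedup above: greedy search simply picks the highest-scoring token at every step, and the gain came from running that loop on the GPU. A toy CPU sketch of the decode loop, with a hypothetical scoring function standing in for the model:

```python
def greedy_decode(next_token_logits, start_token, eos_token, max_len=10):
    """Repeatedly pick the highest-scoring next token until EOS or max_len.
    next_token_logits(seq) -> {token: score} for the given prefix."""
    seq = [start_token]
    while len(seq) < max_len:
        logits = next_token_logits(seq)
        token = max(logits, key=logits.get)   # argmax: the 'greedy' step
        seq.append(token)
        if token == eos_token:
            break
    return seq

def toy_model(seq):
    """Hypothetical scorer: prefers 'b' after 'a', 'c' after 'b', then EOS."""
    last = seq[-1]
    if last == "a":
        return {"b": 1.0, "c": 0.5, "<eos>": 0.1}
    if last == "b":
        return {"c": 1.0, "<eos>": 0.5}
    return {"<eos>": 1.0}

print(greedy_decode(toy_model, "a", "<eos>"))  # ['a', 'b', 'c', '<eos>']
```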