09 Jun 02:54

rajeevsrao

156c59a

22.06

Commit used by the 22.06 TensorRT NGC container.

Changelog

Added

None

Changed

Disentangled attention (DMHA) plugin refactored
ONNX parser updated to 8.2 GA

Removed

None

13 May 21:52

rajeevsrao

22.05

99a11a5

22.05

Commit used by the 22.05 TensorRT NGC container.

Changelog

Added

Disentangled attention plugin for DeBERTa
DMHA (multiscaleDeformableAttnPlugin) plugin for DDETR
Performance benchmarking mode to HuggingFace demo

Changed

Updated base TensorRT version to 8.2.5.1
Updated onnx-graphsurgeon to v0.3.19 (see its CHANGELOG)
fp16 support for pillarScatterPlugin
#1939 - Fixed path in quantization classification_flow
Fixed GPT2 onnx export failure due to 2G limitation
Use axis0 as default for deconv in pytorch-quantization toolkit
Updated onnx export script for CoordConvAC sample
Install devtoolset-8 for updated g++ version in CentOS7 container

Removed

Usage of deprecated TensorRT APIs in samples removed
quant_bert.py module removed from pytorch-quantization

14 Apr 01:19

rajeevsrao

22.04

f4a8635

22.04

Commit used by the 22.04 TensorRT NGC container.

Changelog

Added

TensorRT Engine Explorer v0.1.0 README
Detectron 2 Mask R-CNN R50-FPN python sample
Model export script for sampleOnnxMnistCoordConvAC

Changed

Updated base TensorRT version to 8.2.4.2
Updated copyright headers with SPDX identifiers
Updated onnx-graphsurgeon to v0.3.17 (see its CHANGELOG)
PyramidROIAlign plugin refactor and bug fixes
Fixed MultilevelCropAndResize crashes on Windows
#1583 - sublicense ieee/half.h under Apache2
Updated demo/BERT performance tables for rel-8.2
#1774 Fix python hangs at IndexErrors when TF is imported after TensorRT
Various bugfixes in demos - BERT, Tacotron2 and HuggingFace GPT/T5 notebooks
Cleaned up sample READMEs

Removed

sampleNMT removed from samples

24 Mar 05:20

rajeevsrao

22.03

46253b6

22.03

Commit used by the 22.03 TensorRT NGC container.

Changelog

Added

EfficientDet sample enhancements
- Added support for EfficientDet Lite and AdvProp models.
- Added dynamic batch support.
- Added mixed precision engine builder.

Changed

Better decoupling of HuggingFace demo tests

04 Feb 18:40

rajeevsrao

22.02

42805f0

22.02

Commit used by the 22.02 TensorRT NGC container.

Changelog

Added

New plugins: decodeBbox3DPlugin, pillarScatterPlugin, and voxelGeneratorPlugin

Changed

Extend Megatron LayerNorm plugins to support larger hidden sizes
Refactored EfficientNMS plugin for TFTRT and added implicit batch mode support
Update base TensorRT version to 8.2.3.0
GPT-2 greedy search speedup - now runs on GPU
Updates to TensorRT developer tools
- Polygraphy v0.35.1
- onnx-graphsurgeon v0.3.15
Updated ONNX parser to v8.2.3.0
Minor updates and bugfixes
- Samples: TFOD, GPT-2, demo/BERT
- Plugins: proposalPlugin, geluPlugin, bertQKVToContextPlugin, batchedNMS

Removed

Unused source file(s) in demo/BERT

24 Jan 23:49

rajeevsrao

22.01

498dcb0

22.01

Commit used by the 22.01 TensorRT NGC container.

24 Nov 18:19

rajeevsrao

8.2.1

6f38570

TensorRT OSS v8.2.1 GA

TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.

Updates since TensorRT 8.2.0 EA release.
Please refer to the TensorRT 8.2.1 GA release notes for more information.
ONNX parser v8.2.1
- Removed duplicate constant layer checks that caused some performance regressions
- Fixed expand dynamic shape calculations
- Added parser-side checks for Scatter layer support
Sample updates
- Added TensorFlow Object Detection API converter samples, including Single Shot Detector, Faster R-CNN and Mask R-CNN models
- Multiple enhancements in HuggingFace transformer demos
  - Added multi-batch support
  - Fixed resultant performance regression in batchsize=1
  - Fixed T5 large/T5-3B accuracy issues
  - Added notebooks for T5 and GPT-2
  - Added CPU benchmarking option
- Deprecated kSTRICT_TYPES (strict type constraints). Equivalent behaviour now achieved by setting PREFER_PRECISION_CONSTRAINTS, DIRECT_IO, and REJECT_EMPTY_ALGORITHMS
- Removed sampleMovieLens
- Renamed sampleReformatFreeIO to sampleIOFormats
- Add idleTime option for samples to control qps
- Specify default value for precisionConstraints
- Fixed reporting of TensorRT build version in trtexec
- Fixed combineDescriptions typo in trtexec/tracer.py
- Fixed usages of kDIRECT_IO
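The kSTRICT_TYPES deprecation noted above names three builder flags that together give the equivalent behaviour. A minimal sketch of setting them, assuming the TensorRT Python API (8.2 or later) exposes these flags as members of `tensorrt.BuilderFlag` and that they are applied with `IBuilderConfig.set_flag`. The helper name `set_strict_type_equivalents` is hypothetical, and the flag enum is passed in as a parameter so the helper itself has no import dependency:

```python
# Flag names taken from the changelog entry above.
STRICT_TYPE_EQUIVALENTS = (
    "PREFER_PRECISION_CONSTRAINTS",
    "DIRECT_IO",
    "REJECT_EMPTY_ALGORITHMS",
)


def set_strict_type_equivalents(config, builder_flag):
    """Set the three flags that replace the deprecated strict-types flag.

    config       -- any object with a set_flag() method
                    (assumed: a tensorrt.IBuilderConfig)
    builder_flag -- the enum holding the flags
                    (assumed: tensorrt.BuilderFlag)
    Returns the list of flag values that were set, in order.
    """
    applied = []
    for name in STRICT_TYPE_EQUIVALENTS:
        flag = getattr(builder_flag, name)  # AttributeError if the flag is missing
        config.set_flag(flag)
        applied.append(flag)
    return applied
```

With TensorRT installed, this would presumably be called as `set_strict_type_equivalents(builder.create_builder_config(), trt.BuilderFlag)`; the release notes remain the authority on the exact semantics of each flag.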
Plugin updates
- EfficientNMS plugin support extended to TF-TRT, and for clang builds.
- Sanitize header definitions for BERT fused MHA plugin
- Separate C++ and cu files in splitPlugin to avoid PTX generation (required for CUDA enhanced compatibility support)
- Enable C++14 build for plugins
ONNX tooling updates
- onnx-graphsurgeon upgraded to v0.3.14
- Polygraphy upgraded to v0.33.2
- pytorch-quantization toolkit upgraded to v2.1.2
Build and container fixes
- Add SM86 target to default GPU_ARCHS for platforms with cuda-11.1+
- Remove deprecated SM_35 and add SM_60 to default GPU_ARCHS
- Skip CUB builds for cuda 11.0+ #1455
- Fixed cuda-10.2 container build failures in Ubuntu 20.04
- Add native ARM server build container
- Install devtoolset-8 for updated g++ version in CentOS7
- Added a note on supporting c++14 builds for CentOS7
- Fixed docker build for large UIDs #1373
- Updated README instructions for Jetpack builds
Demo enhancements
- Updated Tacotron2 instructions and added CPU benchmarking
- Fixed issues in demoBERT python notebook
Documentation updates
- Updated Python documentation for add_reduce, add_top_k, and ISoftMaxLayer
- Renamed default GitHub branch to main and updated hyperlinks

05 Oct 17:03

rajeevsrao

21.10

80674b3

21.10

Commit used by the 21.10 TensorRT NGC container.

Changelog

Added

Benchmark script for demoBERT-Megatron
Dynamic Input Shape support for EfficientNMS plugin
Support empty dimensions in ONNX
Support for INT32 and dynamic clips through elementwise in ONNX parser

Changed

Bump TensorRT version to 8.0.3.4
Use static shapes only for single-batch, single-sequence input in demo/BERT
Revert to using the native FC layer in demo/BERT, and use FCPlugin only on older GPUs.
Update demo/Tacotron2 for TensorRT 8.0
Updates to TensorRT developer tools
- Polygraphy v0.33.0
  - Added various examples, a CLI User Guide and how-to guides.
  - Added experimental support for DLA.
  - Added a data to-input tool that can combine inputs/outputs created by --save-inputs/--save-outputs.
  - Added a PluginRefRunner which provides CPU reference implementations for TensorRT plugins
  - Made several performance improvements in the Polygraphy CUDA wrapper.
  - Removed the to-json tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.
- Bugfixes and documentation updates in pytorch-quantization toolkit.
Bumped up package versions: tensorflow-gpu 2.5.1, pillow 8.3.2
ONNX parser enhancements and bugfixes
- Update ONNX submodule to v1.8.0
- Update convDeconvMultiInput function to properly handle deconvs
- Update RNN documentation
- Update QDQ axis assertion
- Fix bidirectional activation alpha and beta values
- Fix opset10 Resize
- Fix shape tensor unsqueeze
- Mark BOOL tiles as unsupported
- Remove unnecessary shape tensor checks

Removed

05 Oct 19:03

rajeevsrao

8.2.0-EA

2d517d2

TensorRT OSS v8.2.0 EA

Pre-release

TensorRT OSS release corresponding to TensorRT 8.2.0.6 EA release.

Added

Demo applications showcasing TensorRT inference of HuggingFace Transformers.
- Support is currently extended to GPT-2 and T5 models.
Added support for the following ONNX operators:
- Einsum
- IsNan
- GatherND
- Scatter
- ScatterElements
- ScatterND
- Sign
- Round
Added support for building TensorRT Python API on Windows.

Updated

Notable API updates in TensorRT 8.2.0.6 EA release. See TensorRT Developer Guide for details.
- Added three new IExecutionContext APIs, getEnqueueEmitsProfile(), setEnqueueEmitsProfile(), and reportToProfiler(), which can be used to collect layer profiling info when the inference is launched as a CUDA graph.
- Eliminated the global logger; each Runtime, Builder or Refitter now has its own logger.
- Added new operators: IAssertionLayer, IConditionLayer, IEinsumLayer, IIfConditionalBoundaryLayer, IIfConditionalOutputLayer, IIfConditionalInputLayer, and IScatterLayer.
- Added new IGatherLayer modes: kELEMENT and kND
- Added new ISliceLayer modes: kFILL, kCLAMP, and kREFLECT
- Added new IUnaryLayer operators: kSIGN and kROUND
- Added new runtime class IEngineInspector that can be used to inspect the detailed information of an engine, including the layer parameters, the chosen tactics, the precision used, etc.
- ProfilingVerbosity enums have been updated to show their functionality more explicitly.
Updated TensorRT OSS container defaults to cuda 11.4
Updated CMake to target C++14 builds.
Updated the following ONNX operators:
- Gather and GatherElements implementations to natively support negative indices
- Pad layer to support ND padding, along with edge and reflect padding mode support
- If layer with general performance improvements.
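To make the Gather and Pad entries above concrete, the following is a plain-Python illustration of the operator semantics as ONNX defines them: a negative index counts back from the end of the axis, `edge` padding repeats the border value, and `reflect` padding mirrors the input without repeating the border. It is a 1-D sketch for intuition only, not the parser's implementation; the function names `gather_1d` and `pad_1d` are hypothetical.

```python
def gather_1d(data, indices):
    """1-D gather where a negative index i refers to element len(data) + i."""
    n = len(data)
    out = []
    for i in indices:
        if not -n <= i < n:
            raise IndexError(f"index {i} out of range for length {n}")
        out.append(data[i + n if i < 0 else i])
    return out


def pad_1d(data, before, after, mode="edge"):
    """Pad a non-empty 1-D list using 'edge' or 'reflect' semantics."""
    n = len(data)
    if n == 0:
        raise ValueError("input must not be empty")
    if mode == "edge":
        # Repeat the first and last elements.
        left = [data[0]] * before
        right = [data[-1]] * after
    elif mode == "reflect":
        # Mirror around the border element without repeating it.
        if max(before, after) > n - 1:
            raise ValueError("reflect padding must be smaller than the input")
        left = [data[k] for k in range(before, 0, -1)]
        right = [data[n - 1 - k] for k in range(1, after + 1)]
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return left + list(data) + right


print(gather_1d([10, 20, 30], [0, -1, -3]))      # [10, 30, 10]
print(pad_1d([1, 2, 3, 4], 2, 2, "edge"))        # [1, 1, 1, 2, 3, 4, 4, 4]
print(pad_1d([1, 2, 3, 4], 2, 2, "reflect"))     # [3, 2, 1, 2, 3, 4, 3, 2]
```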

Removed

Removed sampleMLP.
Several flags of trtexec have been deprecated:
- --explicitBatch flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
- --explicitPrecision flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
- --nvtxMode=[verbose|default|none] has been deprecated in favor of --profilingVerbosity=[detailed|layer_names_only|none] to show its functionality more explicitly.
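Scripts that still pass the deprecated trtexec flags can be updated mechanically. The sketch below drops the two flags that no longer have any effect and rewrites `--nvtxMode` to `--profilingVerbosity`. It assumes the old and new values pair up in the order they are listed above (verbose to detailed, default to layer_names_only, none to none); the helper name `migrate_trtexec_args` is hypothetical and not part of trtexec.

```python
# Assumed pairing of old --nvtxMode values with new --profilingVerbosity
# values, following the order in which the changelog lists them.
NVTX_TO_PROFILING = {
    "verbose": "detailed",
    "default": "layer_names_only",
    "none": "none",
}

# Flags the changelog describes as deprecated with no effect.
NO_EFFECT_FLAGS = {"--explicitBatch", "--explicitPrecision"}


def migrate_trtexec_args(args):
    """Return a copy of args with deprecated trtexec flags removed or renamed."""
    migrated = []
    for arg in args:
        if arg in NO_EFFECT_FLAGS:
            continue  # safe to drop: the mode is now chosen automatically
        if arg.startswith("--nvtxMode="):
            value = arg.split("=", 1)[1]
            if value not in NVTX_TO_PROFILING:
                raise ValueError(f"unknown --nvtxMode value: {value}")
            arg = "--profilingVerbosity=" + NVTX_TO_PROFILING[value]
        migrated.append(arg)
    return migrated


old = ["trtexec", "--onnx=model.onnx", "--explicitBatch", "--nvtxMode=verbose"]
print(migrate_trtexec_args(old))
# ['trtexec', '--onnx=model.onnx', '--profilingVerbosity=detailed']
```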

Signed-off-by: Rajeev Rao rajeevrao@nvidia.com

22 Sep 17:28

rajeevsrao

21.09

dc51748

21.09

Commit used by the 21.09 TensorRT NGC container.

Changelog

Added

Add ONNX2TRT_VERSION overwrite in CMake.

Changed

Updates to TensorRT developer tools
- ONNX-GraphSurgeon v0.3.12
- pytorch-quantization toolkit v2.1.1
Fix assertion in EfficientNMSPlugin

Removed

Releases: NVIDIA/TensorRT
