Releases: NVIDIA/TensorRT
22.06
Commit used by the 22.06 TensorRT NGC container.
Changelog
Added
- None
Changed
- Disentangled attention (DMHA) plugin refactored
- ONNX parser updated to 8.2GA
Removed
- None
22.05
Commit used by the 22.05 TensorRT NGC container.
Changelog
Added
- Disentangled attention plugin for DeBERTa
- DMHA (multiscaleDeformableAttnPlugin) plugin for DDETR
- Performance benchmarking mode to HuggingFace demo
Changed
- Updated base TensorRT version to 8.2.5.1
- Updated onnx-graphsurgeon to v0.3.19 (see its CHANGELOG)
- fp16 support for pillarScatterPlugin
- #1939 - Fixed path in quantization classification_flow
- Fixed GPT-2 ONNX export failure due to the 2 GB model size limit
- Use axis0 as default for deconv in pytorch-quantization toolkit
- Updated onnx export script for CoordConvAC sample
- Install devtoolset-8 for updated g++ version in CentOS7 container
Removed
- Usage of deprecated TensorRT APIs in samples removed
- quant_bert.py module removed from pytorch-quantization
22.04
Commit used by the 22.04 TensorRT NGC container.
Changelog
Added
- TensorRT Engine Explorer v0.1.0 (see its README)
- Detectron 2 Mask R-CNN R50-FPN python sample
- Model export script for sampleOnnxMnistCoordConvAC
Changed
- Updated base TensorRT version to 8.2.4.2
- Updated copyright headers with SPDX identifiers
- Updated onnx-graphsurgeon to v0.3.17 (see its CHANGELOG)
- PyramidROIAlign plugin refactor and bug fixes
- Fixed MultilevelCropAndResize crashes on Windows
- #1583 - sublicense ieee/half.h under Apache2
- Updated demo/BERT performance tables for rel-8.2
- #1774 - Fixed Python hangs on IndexError when TF is imported after TensorRT
- Various bugfixes in demos - BERT, Tacotron2 and HuggingFace GPT/T5 notebooks
- Cleaned up sample READMEs
Removed
- sampleNMT removed from samples
22.03
Commit used by the 22.03 TensorRT NGC container.
Changelog
Added
- EfficientDet sample enhancements
- Added support for EfficientDet Lite and AdvProp models.
- Added dynamic batch support.
- Added mixed precision engine builder.
Changed
- Better decoupling of HuggingFace demo tests
22.02
Commit used by the 22.02 TensorRT NGC container.
Changelog
Added
- New plugins: decodeBbox3DPlugin, pillarScatterPlugin, and voxelGeneratorPlugin
Changed
- Extended Megatron LayerNorm plugins to support larger hidden sizes
- Refactored EfficientNMS plugin for TFTRT and added implicit batch mode support
- Updated base TensorRT version to 8.2.3.0
- GPT-2 greedy search speedup - now runs on GPU
- Updates to TensorRT developer tools
- Updated ONNX parser to v8.2.3.0
- Minor updates and bugfixes
- Samples: TFOD, GPT-2, demo/BERT
- Plugins: proposalPlugin, geluPlugin, bertQKVToContextPlugin, batchedNMS
Removed
- Unused source file(s) in demo/BERT
22.01
Commit used by the 22.01 TensorRT NGC container.
TensorRT OSS v8.2.1 GA
TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
- Updates since TensorRT 8.2.0 EA release.
- Please refer to the TensorRT 8.2.1 GA release notes for more information.
- ONNX parser v8.2.1
- Removed duplicate constant layer checks that caused some performance regressions
- Fixed expand dynamic shape calculations
- Added parser-side checks for Scatter layer support
- Sample updates
- Added Tensorflow Object Detection API converter samples, including Single Shot Detector, Faster R-CNN and Mask R-CNN models
- Multiple enhancements in HuggingFace transformer demos
- Added multi-batch support
- Fixed resultant performance regression in batchsize=1
- Fixed T5 large/T5-3B accuracy issues
- Added notebooks for T5 and GPT-2
- Added CPU benchmarking option
- Deprecated kSTRICT_TYPES (strict type constraints). Equivalent behaviour is now achieved by setting the PREFER_PRECISION_CONSTRAINTS, DIRECT_IO, and REJECT_EMPTY_ALGORITHMS builder flags.
- Removed sampleMovieLens
- Renamed sampleReformatFreeIO to sampleIOFormats
- Add idleTime option for samples to control QPS
- Specify default value for precisionConstraints
- Fixed reporting of TensorRT build version in trtexec
- Fixed combineDescriptions typo in trtexec/tracer.py
- Fixed usages of kDIRECT_IO
- Plugin updates
- EfficientNMS plugin support extended to TF-TRT and to clang builds.
- Sanitize header definitions for BERT fused MHA plugin
- Separate C++ and cu files in splitPlugin to avoid PTX generation (required for CUDA enhanced compatibility support)
- Enable C++14 build for plugins
- ONNX tooling updates
- onnx-graphsurgeon upgraded to v0.3.14
- Polygraphy upgraded to v0.33.2
- pytorch-quantization toolkit upgraded to v2.1.2
- Build and container fixes
- Add SM86 target to default GPU_ARCHS for platforms with cuda-11.1+
- Remove deprecated SM_35 and add SM_60 to default GPU_ARCHS
- Skip CUB builds for cuda 11.0+ #1455
- Fixed cuda-10.2 container build failures in Ubuntu 20.04
- Add native ARM server build container
- Install devtoolset-8 for updated g++ version in CentOS7
- Added a note on supporting c++14 builds for CentOS7
- Fixed docker build for large UIDs #1373
- Updated README instructions for Jetpack builds
- Demo enhancements
- Updated Tacotron2 instructions and added CPU benchmarking
- Fixed issues in demoBERT python notebook
- Documentation updates
- Updated Python documentation for add_reduce, add_top_k, and ISoftMaxLayer
- Renamed default GitHub branch to main and updated hyperlinks
21.10
Commit used by the 21.10 TensorRT NGC container.
Changelog
Added
- Benchmark script for demoBERT-Megatron
- Dynamic Input Shape support for EfficientNMS plugin
- Support empty dimensions in ONNX
- Support for INT32 and dynamic Clip operations, implemented through elementwise layers, in the ONNX parser
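The Clip entry above describes lowering Clip onto elementwise operations. As a rough plain-Python illustration (not the parser's code), clipping against bounds that are only known at runtime reduces to an elementwise max followed by an elementwise min, assuming lo <= hi element by element. The helper names are hypothetical stand-ins for elementwise layers, and the bounds are assumed to be already broadcast to the input's length.

```python
from typing import List, Sequence


def elementwise_max(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for an elementwise MAX operation."""
    return [max(x, y) for x, y in zip(a, b)]


def elementwise_min(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for an elementwise MIN operation."""
    return [min(x, y) for x, y in zip(a, b)]


def dynamic_clip(x: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Clip(x, lo, hi) written as min(max(x, lo), hi).

    lo and hi are runtime values here (already broadcast to len(x)) rather
    than build-time constants; the identity assumes lo <= hi elementwise.
    Integer inputs work unchanged, mirroring the INT32 case.
    """
    return elementwise_min(elementwise_max(x, lo), hi)


print(dynamic_clip([-5, 0, 7], [-1, -1, -1], [3, 3, 3]))  # [-1, 0, 3]
```

Composing two elementwise operations means no special handling is needed for whether the bounds are constants or tensors computed at runtime.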
Changed
- Bump TensorRT version to 8.0.3.4
- Use static shape for only single batch single sequence input in demo/BERT
- Revert to using native FC layer in demo/BERT and FCPlugin only on older GPUs.
- Update demo/Tacotron2 for TensorRT 8.0
- Updates to TensorRT developer tools
- Polygraphy v0.33.0
- Added various examples, a CLI User Guide and how-to guides.
- Added experimental support for DLA.
- Added a data to-input tool that can combine inputs/outputs created by --save-inputs/--save-outputs.
- Added a PluginRefRunner which provides CPU reference implementations for TensorRT plugins
- Made several performance improvements in the Polygraphy CUDA wrapper.
- Removed the to-json tool, which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.
- Bugfixes and documentation updates in pytorch-quantization toolkit.
- Bumped up package versions: tensorflow-gpu 2.5.1, pillow 8.3.2
- ONNX parser enhancements and bugfixes
- Update ONNX submodule to v1.8.0
- Update convDeconvMultiInput function to properly handle deconvs
- Update RNN documentation
- Update QDQ axis assertion
- Fix bidirectional activation alpha and beta values
- Fix opset10 Resize
- Fix shape tensor unsqueeze
- Mark BOOL tiles as unsupported
- Remove unnecessary shape tensor checks
Removed
- N/A
TensorRT OSS v8.2.0 EA
TensorRT OSS release corresponding to TensorRT 8.2.0.6 EA release.
Added
- Demo applications showcasing TensorRT inference of HuggingFace Transformers.
- Support is currently extended to GPT-2 and T5 models.
- Added support for the following ONNX operators: Einsum, IsNan, GatherND, Scatter, ScatterElements, ScatterND, Sign, and Round
- Added support for building TensorRT Python API on Windows.
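To make the semantics of one of the newly supported operators concrete, here is a plain-Python sketch of ONNX ScatterElements restricted to a 2-D input and axis=0. It follows the ONNX operator definition, not TensorRT's IScatterLayer implementation, and the function name is hypothetical.

```python
from typing import List


def scatter_elements_axis0(
    data: List[List[float]],
    indices: List[List[int]],
    updates: List[List[float]],
) -> List[List[float]]:
    """ONNX ScatterElements with axis=0 on a 2-D input (illustrative sketch).

    For every position (i, j) in `indices`, output[indices[i][j]][j] is set
    to updates[i][j]; every other element is copied from `data`. Negative
    indices count back from the end of axis 0, as the ONNX spec allows.
    """
    rows = len(data)
    out = [list(row) for row in data]  # copy so the input is left untouched
    for i, idx_row in enumerate(indices):
        for j, idx in enumerate(idx_row):
            if not -rows <= idx < rows:
                raise IndexError(f"index {idx} out of range for axis of size {rows}")
            out[idx % rows][j] = updates[i][j]  # % maps -1 to rows - 1, etc.
    return out


data = [[0.0, 0.0, 0.0] for _ in range(3)]
indices = [[1, 0, 2], [0, 2, 1]]
updates = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]
print(scatter_elements_axis0(data, indices, updates))
# [[2.0, 1.1, 0.0], [1.0, 0.0, 2.2], [0.0, 2.1, 1.2]]
```

The inputs in the usage example match the ScatterElements example in the ONNX operator documentation.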
Updated
- Notable API updates in TensorRT 8.2.0.6 EA release. See TensorRT Developer Guide for details.
- Added three new IExecutionContext APIs, getEnqueueEmitsProfile(), setEnqueueEmitsProfile(), and reportToProfiler(), which can be used to collect layer profiling info when inference is launched as a CUDA graph.
- Eliminated the global logger; each Runtime, Builder, or Refitter now has its own logger.
- Added new operators: IAssertionLayer, IConditionLayer, IEinsumLayer, IIfConditionalBoundaryLayer, IIfConditionalOutputLayer, IIfConditionalInputLayer, and IScatterLayer.
- Added new IGatherLayer modes: kELEMENT and kND
- Added new ISliceLayer modes: kFILL, kCLAMP, and kREFLECT
- Added new IUnaryLayer operators: kSIGN and kROUND
- Added new runtime class IEngineInspector that can be used to inspect the detailed information of an engine, including the layer parameters, the chosen tactics, and the precision used.
- ProfilingVerbosity enums have been updated to show their functionality more explicitly.
- Updated TensorRT OSS container defaults to cuda 11.4
- CMake updated to target C++14 builds.
- Updated the following ONNX operators:
- Gather and GatherElements implementations to natively support negative indices
- Pad layer to support ND padding, along with edge and reflect padding mode support
- If layer with general performance improvements.
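The Gather and Pad updates above follow ONNX semantics. The sketch below illustrates those semantics on 1-D Python lists only: negative Gather indices count back from the end of the axis, edge padding repeats the border value, and reflect padding mirrors the input around its border element without repeating it. It is not TensorRT's implementation, the helper names are hypothetical, and only non-negative pad amounts are handled.

```python
from typing import List, Sequence

PAD_MODES = ("constant", "edge", "reflect")


def gather_1d(data: Sequence[float], indices: Sequence[int]) -> List[float]:
    """ONNX Gather along axis 0 of a 1-D input; indices may be negative."""
    n = len(data)
    out = []
    for idx in indices:
        if not -n <= idx < n:
            raise IndexError(f"index {idx} out of range for size {n}")
        # A negative index counts back from the end: -1 is the last element.
        out.append(data[idx + n if idx < 0 else idx])
    return out


def pad_1d(data: Sequence[float], left: int, right: int,
           mode: str = "constant", value: float = 0.0) -> List[float]:
    """ONNX-style Pad of a 1-D input in constant, edge, or reflect mode."""
    if mode not in PAD_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    if left < 0 or right < 0:
        raise ValueError("this sketch only handles non-negative pads")
    n = len(data)
    if n == 0 and mode != "constant":
        raise ValueError("edge/reflect padding needs a non-empty input")
    if mode == "reflect" and (left >= n or right >= n):
        raise ValueError("reflect pads must be smaller than the input length")

    def source(i: int) -> float:
        # i indexes the unpadded input and may fall outside [0, n).
        if 0 <= i < n:
            return data[i]
        if mode == "constant":
            return value
        if mode == "edge":
            return data[0] if i < 0 else data[n - 1]
        # reflect: mirror around the border element without repeating it
        return data[-i] if i < 0 else data[2 * (n - 1) - i]

    return [source(i) for i in range(-left, n + right)]


print(gather_1d([10, 20, 30], [0, -1, -3]))     # [10, 30, 10]
print(pad_1d([1, 2, 3], 2, 2, mode="reflect"))  # [3, 2, 1, 2, 3, 2, 1]
print(pad_1d([1, 2, 3], 2, 1, mode="edge"))     # [1, 1, 1, 2, 3, 3]
```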
Removed
- Removed sampleMLP.
- Several flags of trtexec have been deprecated:
- --explicitBatch flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
- --explicitPrecision flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
- --nvtxMode=[verbose|default|none] has been deprecated in favor of --profilingVerbosity=[detailed|layer_names_only|none] to show its functionality more explicitly.
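The trtexec deprecations above amount to a mechanical rewrite of existing command lines. The hypothetical Python helper below sketches that rewrite: it drops the two no-op flags and maps each --nvtxMode value to the --profilingVerbosity value in the same position of the bracketed lists (verbose to detailed, default to layer_names_only, none to none). It only rewrites argument strings and does not invoke trtexec.

```python
from typing import List

# Positional correspondence between the two bracketed value lists above.
NVTX_TO_PROFILING_VERBOSITY = {
    "verbose": "detailed",
    "default": "layer_names_only",
    "none": "none",
}

# Flags that are deprecated and have no effect.
NO_OP_FLAGS = {"--explicitBatch", "--explicitPrecision"}


def modernize_trtexec_args(args: List[str]) -> List[str]:
    """Rewrite a trtexec argument list to avoid deprecated flags (hypothetical helper)."""
    result = []
    for arg in args:
        if arg in NO_OP_FLAGS:
            continue  # no effect any more, so it can simply be dropped
        if arg.startswith("--nvtxMode="):
            mode = arg.split("=", 1)[1]
            if mode not in NVTX_TO_PROFILING_VERBOSITY:
                raise ValueError(f"unknown --nvtxMode value: {mode}")
            result.append("--profilingVerbosity=" + NVTX_TO_PROFILING_VERBOSITY[mode])
            continue
        result.append(arg)  # every other argument passes through unchanged
    return result


print(modernize_trtexec_args(["--onnx=model.onnx", "--explicitBatch", "--nvtxMode=verbose"]))
# ['--onnx=model.onnx', '--profilingVerbosity=detailed']
```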
Signed-off-by: Rajeev Rao rajeevrao@nvidia.com