Merge pull request #4062 from NVIDIA/dev-brb-update-for-10.3-GA

Release 10.3-GA
NVIDIA · Aug 8, 2024 · c5b9de3
2 parents 4575799 + 84dd6ed
Showing 80 changed files with 2,701 additions and 533 deletions.
diff --git a/.clang-format b/.clang-format
@@ -74,7 +74,7 @@ SpacesInContainerLiterals: true
 SpacesInParentheses: false
 SpacesInSquareBrackets: false
 Standard:        Cpp11
-StatementMacros: [API_ENTRY_TRY]
+StatementMacros: [API_ENTRY_TRY,TRT_TRY]
 TabWidth:        4
 UseTab:          Never
 ...
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,6 +1,22 @@
 # TensorRT OSS Release Changelog
 
-## 10.2.0 GA - 2024-07-10
+## 10.3.0 GA - 2024-08-07
+
+Key Features and Updates:
+
+ - Demo changes
+   - Added [Stable Video Diffusion](demo/Diffusion) (`SVD`) pipeline.
+ - Plugin changes
+   - Deprecated Version 1 of [ScatterElements plugin](plugin/scatterElementsPlugin). It is superseded by Version 2, which implements the `IPluginV3` interface.
+ - Quickstart guide
+   - Updated the [SemanticSegmentation](quickstart/SemanticSegmentation) guide with latest APIs.
+ - Parser changes
+   - Added support for tensor `axes` inputs for `Slice` node.
+   - Updated `ScatterElements` importer to use Version 2 of [ScatterElements plugin](plugin/scatterElementsPlugin), which implements the `IPluginV3` interface.
+ - Updated tooling
+   - Polygraphy v0.49.13
+
+## 10.2.0 GA - 2024-07-09
 
 Key Features and Updates:
 

diff --git a/LICENSE b/LICENSE
@@ -337,10 +337,11 @@
      limitations under the License.
 
    > demo/Diffusion/utilities.py
+   > demo/Diffusion/stable_video_diffusion_pipeline.py
 
      HuggingFace diffusers library.
 
-     Copyright 2022 The HuggingFace Team.
+     Copyright 2024 The HuggingFace Team.
 
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
@@ -380,3 +381,21 @@
       LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
       OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
       SOFTWARE.
+
+   > demo/Diffusion/utilities.py
+
+      ModelScope library.
+
+      Copyright (c) Alibaba, Inc. and its affiliates.
+
+      Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License.
+      You may obtain a copy of the License at
+
+         http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+      See the License for the specific language governing permissions and
+      limitations under the License.
diff --git a/README.md b/README.md
@@ -26,7 +26,7 @@ You can skip the **Build** section to enjoy TensorRT with Python.
 To build the TensorRT-OSS components, you will first need the following software packages.
 
 **TensorRT GA build**
-* TensorRT v10.2.0.19
+* TensorRT v10.3.0.26
   * Available from direct download links listed below
 
 **System Packages**
@@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software
     If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.
 
     Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
-      - [TensorRT 10.2.0.19 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-      - [TensorRT 10.2.0.19 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz)
-      - [TensorRT 10.2.0.19 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-11.8.zip)
-      - [TensorRT 10.2.0.19 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip)
+      - [TensorRT 10.3.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+      - [TensorRT 10.3.0.26 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz)
+      - [TensorRT 10.3.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-11.8.zip)
+      - [TensorRT 10.3.0.26 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip)
 
 
     **Example: Ubuntu 20.04 on x86-64 with cuda-12.5**
 
     ```bash
     cd ~/Downloads
-    tar -xvzf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz
-    export TRT_LIBPATH=`pwd`/TensorRT-10.2.0.19
+    tar -xvzf TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz
+    export TRT_LIBPATH=`pwd`/TensorRT-10.3.0.26
     ```
 
     **Example: Windows on x86-64 with cuda-12.5**
 
     ```powershell
-    Expand-Archive -Path TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip
-    $env:TRT_LIBPATH="$pwd\TensorRT-10.2.0.19\lib"
+    Expand-Archive -Path TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip
+    $env:TRT_LIBPATH="$pwd\TensorRT-10.3.0.26\lib"
     ```
 
 ## Setting Up The Build Environment

diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-10.2.0.19
+10.3.0.26
diff --git a/demo/BERT/README.md b/demo/BERT/README.md
@@ -75,7 +75,7 @@ The following software version configuration has been tested:
 |Software|Version|
 |--------|-------|
 |Python|>=3.8|
-|TensorRT|10.2.0.19|
+|TensorRT|10.3.0.26|
 |CUDA|12.5|
 
 ## Setup

diff --git a/demo/Diffusion/README.md b/demo/Diffusion/README.md
@@ -48,14 +48,14 @@ onnx                1.15.0
 onnx-graphsurgeon   0.5.2
 onnxruntime         1.16.3
 polygraphy          0.49.9
-tensorrt            10.2.0.19
+tensorrt            10.3.0.26
 tokenizers          0.13.3
 torch               2.2.0
 transformers        4.33.1
 controlnet-aux      0.0.6
 nvidia-modelopt     0.11.2
 ```
-> NOTE: optionally install HuggingFace [accelerate](https://pypi.org/project/accelerate/) package for faster and less memory-intense model loading.
+> NOTE: Optionally install the HuggingFace [accelerate](https://pypi.org/project/accelerate/) package for faster, less memory-intensive model loading. Installing accelerate is known to cause failures when running certain pipelines in Torch Compile mode ([known issue](https://github.com/huggingface/diffusers/issues/9091)).
 
 # Running demoDiffusion
 
@@ -178,6 +178,28 @@ python3 demo_txt2img_sd3.py "dog wearing a sweater and a blue collar" --version
 
 Note that a denoising percentage is applied to the number of denoising steps when an input image conditioning is provided. Its default value is 0.6. This parameter can be set using `--denoising-percentage`.
 
+### Image-to-video using SVD (Stable Video Diffusion)
+
+Download the pre-exported ONNX model
+
+```bash
+git lfs install
+git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1-tensorrt onnx-svd-xt-1-1
+cd onnx-svd-xt-1-1 && git lfs pull && cd ..
+```
+
+SVD-XT-1.1 (25 frames at resolution 576x1024)
+```bash
+python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --hf-token=$HF_TOKEN
+```
+
+You may also specify a custom conditioning image using `--input-image`:
+```bash
+python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --input-image https://www.hdcarwallpapers.com/walls/2018_chevrolet_camaro_zl1_nascar_race_car_2-HD.jpg --hf-token=$HF_TOKEN
+```
+
+NOTE: The minimum and maximum guidance scales are configured using `--min-guidance-scale` and `--max-guidance-scale`, respectively.
+
 ## Configuration options
 - Noise scheduler can be set using `--scheduler <scheduler>`. Note: not all schedulers are available for every version.
 - To accelerate engine building time use `--timing-cache <path to cache file>`. The cache file will be created if it does not already exist. Note that performance may degrade if cache files are used across multiple GPU targets. It is recommended to use timing caches only during development. To achieve the best performance in deployment, please build engines without timing cache.
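
The help text for `--min-guidance-scale` and `--max-guidance-scale` in `demo_img2vid.py` says the minimum applies to the first frame and the maximum to the last. How the frames in between are handled is not shown in this diff. The sketch below assumes a plain linear ramp across frames; the function name `per_frame_guidance` is hypothetical and is not part of the demo.

```python
from typing import List


def per_frame_guidance(min_scale: float, max_scale: float, num_frames: int) -> List[float]:
    """Return one guidance scale per frame, ramping linearly from min to max.

    Assumption: linear interpolation between the first and last frame.
    The actual pipeline may compute this differently.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [min_scale]
    step = (max_scale - min_scale) / (num_frames - 1)
    return [min_scale + i * step for i in range(num_frames)]


# Defaults taken from the demo's argument parser: 1.0 to 3.0, 25 frames for SVD-XT-1.1.
scales = per_frame_guidance(1.0, 3.0, 25)
```

Under this assumption, raising `--max-guidance-scale` strengthens conditioning mostly on later frames, while `--min-guidance-scale` controls the first frame.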

diff --git a/demo/Diffusion/demo_img2vid.py b/demo/Diffusion/demo_img2vid.py
@@ -0,0 +1,117 @@
+#
+# SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import argparse
+
+from PIL import Image
+
+from stable_video_diffusion_pipeline import StableVideoDiffusionPipeline
+from utilities import (
+    PIPELINE_TYPE,
+    add_arguments,
+    download_image,
+)
+
+def parseArgs():
+    parser = argparse.ArgumentParser(description="Options for Stable Diffusion Img2Vid Demo", conflict_handler='resolve')
+    parser = add_arguments(parser)
+    parser.add_argument('--version', type=str, default="svd-xt-1.1", choices=["svd-xt-1.1"], help="Version of Stable Video Diffusion")
+    parser.add_argument('--input-image', type=str, default="", help="URL of the input conditioning image")
+    parser.add_argument('--height', type=int, default=576, help="Height of image to generate (must be multiple of 8)")
+    parser.add_argument('--width', type=int, default=1024, help="Width of image to generate (must be multiple of 8)")
+    parser.add_argument('--min-guidance-scale', type=float, default=1.0, help="The minimum guidance scale. Used for the classifier free guidance with first frame")
+    parser.add_argument('--max-guidance-scale', type=float, default=3.0, help="The maximum guidance scale. Used for the classifier free guidance with last frame")
+    parser.add_argument('--denoising-steps', type=int, default=25, help="Number of denoising steps")
+    parser.add_argument('--num-warmup-runs', type=int, default=1, help="Number of warmup runs before benchmarking performance")
+    return parser.parse_args()
+
+def process_pipeline_args(args):
+
+    if not args.input_image:
+        args.input_image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true"
+    if isinstance(args.input_image, str):
+        input_image = download_image(args.input_image).resize((args.width, args.height))
+    elif isinstance(args.input_image, Image.Image):
+        input_image = args.input_image.resize((args.width, args.height))
+    else:
+        raise ValueError(f"Input image(s) must be of type `PIL.Image.Image` or `str` (URL) but is {type(args.input_image)}")
+
+    if args.height % 8 != 0 or args.width % 8 != 0:
+        raise ValueError(f"Image height and width have to be divisible by 8 but are: {args.height} and {args.width}.")
+
+    # TODO enable BS>1
+    max_batch_size = 1
+    args.build_static_batch = True
+
+    if args.batch_size > max_batch_size:
+        raise ValueError(f"Batch size {args.batch_size} is larger than allowed {max_batch_size}.")
+
+    if not args.build_static_batch or args.build_dynamic_shape:
+        raise ValueError("Dynamic shapes not supported. Do not specify `--build-dynamic-shape`")
+
+    kwargs_init_pipeline = {
+        'version': args.version,
+        'max_batch_size': max_batch_size,
+        'denoising_steps': args.denoising_steps,
+        'scheduler': args.scheduler,
+        'min_guidance_scale': args.min_guidance_scale,
+        'max_guidance_scale': args.max_guidance_scale,
+        'output_dir': args.output_dir,
+        'hf_token': args.hf_token,
+        'verbose': args.verbose,
+        'nvtx_profile': args.nvtx_profile,
+        'use_cuda_graph': args.use_cuda_graph,
+        'framework_model_dir': args.framework_model_dir,
+        'torch_inference': args.torch_inference,
+    }
+
+    kwargs_load_engine = {
+        'onnx_opset': args.onnx_opset,
+        'opt_batch_size': args.batch_size,
+        'opt_image_height': args.height,
+        'opt_image_width': args.width,
+        'static_batch': args.build_static_batch,
+        'static_shape': not args.build_dynamic_shape,
+        'enable_all_tactics': args.build_all_tactics,
+        'enable_refit': args.build_enable_refit,
+        'timing_cache': args.timing_cache,
+    }
+
+    args_run_demo = (input_image, args.height, args.width, args.batch_size, args.batch_count, args.num_warmup_runs, args.use_cuda_graph)
+
+    return kwargs_init_pipeline, kwargs_load_engine, args_run_demo
+
+if __name__ == "__main__":
+    print("[I] Initializing StableDiffusion img2vid demo using TensorRT")
+    args = parseArgs()
+    kwargs_init_pipeline, kwargs_load_engine, args_run_demo = process_pipeline_args(args)
+
+    # Initialize demo
+    demo = StableVideoDiffusionPipeline(
+        pipeline_type=PIPELINE_TYPE.IMG2VID,
+        **kwargs_init_pipeline)
+    demo.loadEngines(
+        args.engine_dir,
+        args.framework_model_dir,
+        args.onnx_dir,
+        **kwargs_load_engine)
+    demo.loadResources(args.height, args.width, args.batch_size, args.seed)
+
+    # Run inference
+    demo.run(*args_run_demo)
+
+    demo.teardown()
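
The argument checks in `process_pipeline_args` can be exercised without downloading an image or building engines. The following is a minimal standalone sketch of those checks, not the code from this commit: the function name `validate_img2vid_args` is hypothetical, and it takes plain values instead of the parsed `args` namespace.

```python
def validate_img2vid_args(height: int, width: int, batch_size: int,
                          max_batch_size: int = 1,
                          build_dynamic_shape: bool = False) -> None:
    """Mirror the checks in process_pipeline_args; raise ValueError on bad input."""
    # Both image dimensions must be multiples of 8.
    if height % 8 != 0 or width % 8 != 0:
        raise ValueError(
            f"Image height and width have to be divisible by 8 but are: {height} and {width}."
        )
    # The demo currently fixes the maximum batch size at 1.
    if batch_size > max_batch_size:
        raise ValueError(f"Batch size {batch_size} is larger than allowed {max_batch_size}.")
    # Dynamic shapes are rejected for the img2vid pipeline.
    if build_dynamic_shape:
        raise ValueError("Dynamic shapes not supported. Do not specify `--build-dynamic-shape`")


# The demo defaults (576x1024, batch size 1) pass without raising.
validate_img2vid_args(576, 1024, batch_size=1)
```

Running the checks before any network access would let invalid dimensions fail fast, whereas the demo as written downloads and resizes the image first.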