Draft of ORT GPU build #5622

Draft · wants to merge 8 commits into master
Conversation

ChSonnabend

This is a draft PR to discuss possible changes to onnxruntime.sh for GPU builds on the EPNs and potentially CUDA (to be tested).

@ChSonnabend (Author)

Ping @davidrohr

@davidrohr (Contributor) left a comment

You should also pick up a few environment variables that we use in o2.sh and handle them analogously: https://github.com/alisw/alidist/blob/1916f6d88d42959097998d9481b517dc1c1ea84d/o2.sh#L191C9-L191C30

  • ALIBUILD_O2_FORCE_GPU
  • DISABLE_GPU
  • ALIBUILD_ENABLE_CUDA
  • ALIBUILD_ENABLE_HIP
  • ALIBUILD_O2_OVERRIDE_HIP_ARCHS
  • ALIBUILD_O2_OVERRIDE_CUDA_ARCHS

If ENABLE_CUDA or ENABLE_HIP is set, the build should fail if it cannot build CUDA/HIP.
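The suggested handling could be sketched roughly like this. This is a minimal sketch for the CUDA case only: the variable names come from the list above, the exact o2.sh semantics may differ, and HIP would be handled analogously.

```shell
# Sketch of the suggested env-variable handling (names from the review
# comment; exact o2.sh semantics may differ). HIP would be analogous.
ALIBUILD_ENABLE_CUDA=${ALIBUILD_ENABLE_CUDA:-}
DISABLE_GPU=${DISABLE_GPU:-}

ORT_CUDA=0
if [ -z "$DISABLE_GPU" ] && [ -n "$ALIBUILD_ENABLE_CUDA" ]; then
  if command -v nvcc >/dev/null 2>&1; then
    ORT_CUDA=1
  else
    # CUDA was requested explicitly: fail instead of silently
    # falling back to a CPU-only build.
    echo "ERROR: ALIBUILD_ENABLE_CUDA is set but nvcc was not found" >&2
    exit 1
  fi
fi
echo "ORT_CUDA=$ORT_CUDA"
```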

onnxruntime.sh Outdated
"
elif command -v nvcc >/dev/null 2>&1; then
CUDA_VERSION=$(nvcc --version | grep "release" | awk '{print $NF}' | cut -d. -f1)
if [[ "$CUDA_VERSION" == "V11" ]]; then
Contributor

I think you can drop CUDA 11 and assume only >= 12.
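A version gate along those lines might look like the following. The helper is hypothetical, and the usual `release X.Y` wording of the `nvcc --version` banner is assumed; it also avoids the fragile `awk '{print $NF}'` parsing from the diff above.

```shell
# Hypothetical helper: extract the CUDA major version from the
# `nvcc --version` output (assumes the usual "release X.Y" format).
cuda_major_from_nvcc() {
  sed -n 's/.*release \([0-9][0-9]*\)\..*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  CUDA_MAJOR=$(nvcc --version | cuda_major_from_nvcc)
  # Require CUDA >= 12, as suggested in the review.
  if [ "${CUDA_MAJOR:-0}" -lt 12 ]; then
    echo "ERROR: CUDA >= 12 required, found ${CUDA_MAJOR:-unknown}" >&2
    exit 1
  fi
fi

# Parsing check on a canned nvcc banner line:
sample='Cuda compilation tools, release 12.4, V12.4.131'
echo "$sample" | cuda_major_from_nvcc
```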

ORT_BUILD_FLAGS=""
case $ARCHITECTURE in
osx_*)
if [[ $ARCHITECTURE == *_x86-64 ]]; then
Contributor

I would leave out printouts like these; they are mainly for debugging.

Author

Yes, but I assume there is also a macOS build that targets the Mac GPU. I need to dig around a bit more; the if block could then be used to put the build flags in there. But yes, I will of course remove the printouts at the end.

fi
;;
*)
if command -v rocminfo >/dev/null 2>&1; then
Contributor

  • The ROCm version check is missing.
  • It is not clear whether rocminfo is in the path; you should at least also test /opt/rocm/bin/rocminfo. And migraphx is a separate ROCm package, so the presence of rocminfo does not mean that migraphx is available. You should test explicitly for migraphx.

Author

Good point, I'll check that again.
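The two review points could be addressed with something like the sketch below. The /opt/rocm locations and the migraphx header path are assumptions for a default ROCm install, not verified on the EPNs.

```shell
# Sketch: locate rocminfo also under /opt/rocm/bin, and probe for
# migraphx separately, since it is a separate ROCm package.
# Paths below are assumptions for a default ROCm install.
ROCMINFO=
for cand in rocminfo /opt/rocm/bin/rocminfo; do
  if command -v "$cand" >/dev/null 2>&1; then
    ROCMINFO=$cand
    break
  fi
done

HAVE_MIGRAPHX=0
if [ -n "$ROCMINFO" ] && [ -f /opt/rocm/include/migraphx/migraphx.h ]; then
  HAVE_MIGRAPHX=1
fi
echo "ROCMINFO=$ROCMINFO HAVE_MIGRAPHX=$HAVE_MIGRAPHX"
```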

onnxruntime.sh Outdated
ORT_BUILD_FLAGS=" -Donnxruntime_USE_CUDA=ON \
-DCUDA_TOOLKIT_ROOT_DIR=$CUDA_ROOT \
-Donnxruntime_USE_CUDA_NHWC_OPS=ON \
-Donnxruntime_CUDA_USE_TENSORRT=ON \
Contributor

If you use TensorRT, do you then need to check whether it is explicitly installed? Or does it always come with the CUDA SDK?

Author

It does not seem to come along automatically (https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html)... OK, I'll build in a check for that as well.
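Such a check could look roughly like this. NvInfer.h is the public TensorRT header; the list of candidate directories and the `TENSORRT_ROOT` variable are assumptions for typical installs.

```shell
# Sketch: only enable the TensorRT flag if the headers are actually
# present, since TensorRT is installed separately from the CUDA SDK.
# NvInfer.h is the public TensorRT header; the candidate directories
# and TENSORRT_ROOT are assumptions for typical installs.
HAVE_TENSORRT=0
for dir in "${TENSORRT_ROOT:-/nonexistent}/include" /usr/include /usr/local/include; do
  if [ -f "$dir/NvInfer.h" ]; then
    HAVE_TENSORRT=1
    break
  fi
done

ORT_TENSORRT_FLAG=
if [ "$HAVE_TENSORRT" = 1 ]; then
  ORT_TENSORRT_FLAG="-Donnxruntime_CUDA_USE_TENSORRT=ON"
fi
echo "HAVE_TENSORRT=$HAVE_TENSORRT"
```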

onnxruntime.sh Outdated
-Donnxruntime_USE_CUDA_NHWC_OPS=ON \
-Donnxruntime_CUDA_USE_TENSORRT=ON \
"
elif [[ "$CUDA_VERSION" == "V12" ]]; then
Contributor

What if both ROCm and CUDA are present? Can't we then build both?

@ChSonnabend (Author) · Sep 17, 2024

@ktf
ktf (Member) commented Sep 17, 2024

Exactly...

…adding env-variables for GPU enabling during code execution. For al9_gpu container and simultaneous CUDA & ROCm build, this requires ChSonnabend/onnxruntime@6ffc40c
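The simultaneous CUDA & ROCm case mentioned in the commit message could be sketched by enabling each execution provider independently, so a host with both toolchains gets both. As noted above, this relies on the patched onnxruntime (ChSonnabend/onnxruntime@6ffc40c); `onnxruntime_USE_ROCM` is assumed to be the relevant CMake option.

```shell
# Sketch: accumulate flags per detected toolchain instead of an
# either/or elif chain. A simultaneous CUDA & ROCm build requires the
# patched onnxruntime referenced in the commit message above.
ORT_BUILD_FLAGS=""
if command -v nvcc >/dev/null 2>&1; then
  ORT_BUILD_FLAGS="$ORT_BUILD_FLAGS -Donnxruntime_USE_CUDA=ON"
fi
if command -v rocminfo >/dev/null 2>&1 || [ -x /opt/rocm/bin/rocminfo ]; then
  ORT_BUILD_FLAGS="$ORT_BUILD_FLAGS -Donnxruntime_USE_ROCM=ON"
fi
echo "ORT_BUILD_FLAGS=$ORT_BUILD_FLAGS"
```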