[libcxx] Add testing configuration for GPU targets #104515

jhuber6 · 2024-08-15T22:07:42Z

Summary:
The GPU runs these tests using the files built from the libc project.
These will be placed in include/<triple> and lib/<triple>. We use
the amdhsa-loader and nvptx-loader tools, which are also provided by
libc. These launch a kernel called _start which calls main so we
can pretend like GPU programs are normal terminal applications.

We force serial execution here, because llvm-lit runs way too many
processes in parallel, which has a bad habit of making the GPU drivers
hang or run out of resources. This allows the compilation to be run in
parallel while the jobs themselves are serialized via a file lock.
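
As a rough illustration of the idea (not the loaders' actual implementation, which lives in the libc project), serializing only the execution step behind a file lock could look like the sketch below. The `run_serialized` helper and the lock path are hypothetical names chosen for this example, and it assumes a POSIX host where `fcntl.flock` is available.

```python
import fcntl
import subprocess


def run_serialized(cmd, lock_path):
    """Run `cmd` while holding an exclusive lock on `lock_path`.

    Compilation can still happen in parallel elsewhere; only callers of
    this function wait for each other, so one job touches the GPU at a time.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.run(cmd, capture_output=True, text=True)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A test runner would wrap only the loader invocation in `run_serialized`, leaving the compile and link steps outside the lock.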

In the future this can likely be refined to accept user specified
architectures, or better handle including the root directory by exposing
that instead of just include/<triple>/c++/v1/.

This currently fails ~1% of the tests on AMDGPU and ~3% of the tests on
NVPTX. This will hopefully be reduced further, and later patches can
XFAIL a lot of them once it's down to a reasonable number.

Future support will likely want to allow passing in a custom
architecture instead of simply relying on -mcpu=native.
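
A minimal sketch of what accepting a custom architecture might look like when building the `%{flags}` substitution. The `gpu_target_flags` function and its `arch` parameter are hypothetical and are not part of this patch; the flag spellings (`-mcpu=` for AMDGPU, `-march=` for NVPTX) are taken from the two config files in this PR.

```python
def gpu_target_flags(triple, arch=None):
    """Build the %{flags} string, falling back to 'native' when no arch is given."""
    # Hypothetical helper: the configs in this patch hard-code 'native'.
    arch = arch or "native"
    # The AMDGPU config uses -mcpu=, the NVPTX config uses -march=.
    arch_flag = "-mcpu" if triple.startswith("amdgcn") else "-march"
    return f"--target={triple} -Wno-multi-gpu -flto {arch_flag}={arch}"
```

With no `arch` argument this reproduces the strings used today, so a user-specified architecture would be a strict extension.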

llvmbot · 2024-08-15T22:08:15Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-libcxx

Author: Joseph Huber (jhuber6)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/104515.diff

6 Files Affected:

(added) libcxx/cmake/caches/AMDGPU.cmake (+40)
(added) libcxx/cmake/caches/NVPTX.cmake (+40)
(added) libcxx/test/configs/amdgpu-libc++-shared.cfg.in (+29)
(added) libcxx/test/configs/nvptx-libc++-shared.cfg.in (+31)
(modified) libcxx/test/std/containers/sequences/deque/deque.modifiers/insert_range.pass.cpp (+4)
(modified) libcxx/test/std/strings/basic.string/string.modifiers/string_replace/replace_with_range.pass.cpp (+4)

diff --git a/libcxx/cmake/caches/AMDGPU.cmake b/libcxx/cmake/caches/AMDGPU.cmake
new file mode 100644
index 00000000000000..00549c69af00fb
--- /dev/null
+++ b/libcxx/cmake/caches/AMDGPU.cmake
@@ -0,0 +1,40 @@
+# Configuration options for libcxx.
+set(LIBCXX_ABI_VERSION 2 CACHE STRING "")
+set(LIBCXX_CXX_ABI none CACHE STRING "")
+set(LIBCXX_ENABLE_SHARED OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_STATIC ON CACHE BOOL "")
+set(LIBCXX_ENABLE_FILESYSTEM OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_RANDOM_DEVICE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_LOCALIZATION OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_UNICODE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_WIDE_CHARACTERS OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_EXCEPTIONS OFF CACHE BOOL "")
+set(LIBCXX_HAS_TERMINAL_AVAILABLE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_RTTI OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_STATIC_ABI_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_STATICALLY_LINK_ABI_IN_STATIC_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_ENABLE_THREADS OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_MONOTONIC_CLOCK ON CACHE BOOL "")
+set(LIBCXX_INSTALL_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_LIBC "llvm-libc" CACHE STRING "")
+set(LIBCXX_USE_COMPILER_RT ON CACHE BOOL "")
+set(LIBCXX_ENABLE_NEW_DELETE_DEFINITIONS ON CACHE BOOL "")
+
+# Configuration options for libcxxabi.
+set(LIBCXXABI_BAREMETAL ON CACHE BOOL "")
+set(LIBCXXABI_ENABLE_SHARED OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_EXCEPTIONS OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_THREADS OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_NEW_DELETE_DEFINITIONS OFF CACHE BOOL "")
+set(LIBCXXABI_USE_LLVM_UNWINDER OFF CACHE BOOL "")
+
+# Test configuration.
+set(LIBCXX_TEST_CONFIG "amdgpu-libc++-shared.cfg.in" CACHE STRING "")
+set(LIBCXX_TEST_PARAMS "long_tests=False;executor=amdhsa-loader" CACHE STRING "")
+
+# Necessary compile flags for AMDGPU.
+set(LIBCXX_ADDITIONAL_COMPILE_FLAGS
+    "-nogpulib;-flto;-fconvergent-functions;-Xclang;-mcode-object-version=none" CACHE STRING "")
+set(LIBCXXABI_ADDITIONAL_COMPILE_FLAGS
+    "-nogpulib;-flto;-fconvergent-functions;-Xclang;-mcode-object-version=none" CACHE STRING "")
+set(CMAKE_REQUIRED_FLAGS "-nogpulib -nodefaultlibs" CACHE STRING "")
diff --git a/libcxx/cmake/caches/NVPTX.cmake b/libcxx/cmake/caches/NVPTX.cmake
new file mode 100644
index 00000000000000..dae83940af5b04
--- /dev/null
+++ b/libcxx/cmake/caches/NVPTX.cmake
@@ -0,0 +1,40 @@
+# Configuration options for libcxx.
+set(LIBCXX_ABI_VERSION 2 CACHE STRING "")
+set(LIBCXX_CXX_ABI none CACHE STRING "")
+set(LIBCXX_ENABLE_SHARED OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_STATIC ON CACHE BOOL "")
+set(LIBCXX_ENABLE_FILESYSTEM OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_RANDOM_DEVICE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_LOCALIZATION OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_UNICODE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_WIDE_CHARACTERS OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_EXCEPTIONS OFF CACHE BOOL "")
+set(LIBCXX_HAS_TERMINAL_AVAILABLE OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_RTTI OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_STATIC_ABI_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_STATICALLY_LINK_ABI_IN_STATIC_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_ENABLE_THREADS OFF CACHE BOOL "")
+set(LIBCXX_ENABLE_MONOTONIC_CLOCK ON CACHE BOOL "")
+set(LIBCXX_INSTALL_LIBRARY ON CACHE BOOL "")
+set(LIBCXX_LIBC "llvm-libc" CACHE STRING "")
+set(LIBCXX_USE_COMPILER_RT ON CACHE BOOL "")
+set(LIBCXX_ENABLE_NEW_DELETE_DEFINITIONS ON CACHE BOOL "")
+
+# Configuration options for libcxxabi.
+set(LIBCXXABI_BAREMETAL ON CACHE BOOL "")
+set(LIBCXXABI_ENABLE_SHARED OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_EXCEPTIONS OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_THREADS OFF CACHE BOOL "")
+set(LIBCXXABI_ENABLE_NEW_DELETE_DEFINITIONS OFF CACHE BOOL "")
+set(LIBCXXABI_USE_LLVM_UNWINDER OFF CACHE BOOL "")
+
+# Test configuration.
+set(LIBCXX_TEST_CONFIG "nvptx-libc++-shared.cfg.in" CACHE STRING "")
+set(LIBCXX_TEST_PARAMS "long_tests=False;executor=nvptx-loader" CACHE STRING "")
+
+# Necessary compile flags for NVPTX.
+set(LIBCXX_ADDITIONAL_COMPILE_FLAGS
+    "-nogpulib;-flto;-fconvergent-functions;--cuda-feature=+ptx63" CACHE STRING "")
+set(LIBCXXABI_ADDITIONAL_COMPILE_FLAGS
+    "-nogpulib;-flto;-fconvergent-functions;--cuda-feature=+ptx63" CACHE STRING "")
+set(CMAKE_REQUIRED_FLAGS "-nogpulib -nodefaultlibs -flto -c" CACHE STRING "")
diff --git a/libcxx/test/configs/amdgpu-libc++-shared.cfg.in b/libcxx/test/configs/amdgpu-libc++-shared.cfg.in
new file mode 100644
index 00000000000000..9b37a81f8de5d4
--- /dev/null
+++ b/libcxx/test/configs/amdgpu-libc++-shared.cfg.in
@@ -0,0 +1,29 @@
+lit_config.load_config(config, '@CMAKE_CURRENT_BINARY_DIR@/cmake-bridge.cfg')
+
+config.substitutions.append(('%{flags}',
+  f'--target={config.target_triple} -Wno-multi-gpu -flto -mcpu=native'))
+config.substitutions.append(('%{compile_flags}',
+    '-nogpulib -fno-builtin-printf -nogpuinc -nostdlibinc '
+    '-I %{include-dir} -I %{target-include-dir}/../../ '
+    '-I %{target-include-dir} -I %{libcxx-dir}/test/support'
+))
+config.substitutions.append(('%{link_flags}',
+  '-O1 -nostdinc++ -nostdlib++ %{lib-dir}/crt1.o '
+  '-L %{lib-dir} -lc++ -lc++abi -lclang_rt.builtins '
+))
+
+config.substitutions.append(('%{exec}',
+    '%{executor} --no-parallelism'
+))
+
+config.stdlib = 'llvm-libc++'
+
+import os, site
+site.addsitedir(os.path.join('@LIBCXX_SOURCE_DIR@', 'utils'))
+import libcxx.test.params, libcxx.test.config
+libcxx.test.config.configure(
+    libcxx.test.params.DEFAULT_PARAMETERS,
+    libcxx.test.features.DEFAULT_FEATURES,
+    config,
+    lit_config
+)
diff --git a/libcxx/test/configs/nvptx-libc++-shared.cfg.in b/libcxx/test/configs/nvptx-libc++-shared.cfg.in
new file mode 100644
index 00000000000000..26d93b29183f72
--- /dev/null
+++ b/libcxx/test/configs/nvptx-libc++-shared.cfg.in
@@ -0,0 +1,31 @@
+lit_config.load_config(config, '@CMAKE_CURRENT_BINARY_DIR@/cmake-bridge.cfg')
+
+config.substitutions.append(('%{flags}',
+  f'--target={config.target_triple} -Wno-multi-gpu -flto -march=native'))
+config.substitutions.append(('%{compile_flags}',
+    '-nogpulib -fno-builtin-printf -nogpuinc -nostdlibinc '
+    '-I %{include-dir} -I %{target-include-dir}/../../ '
+    '-I %{target-include-dir} -I %{libcxx-dir}/test/support'
+))
+config.substitutions.append(('%{link_flags}',
+   '-nostdinc++ -nostdlib++ %{lib-dir}/crt1.o '
+   '-L %{lib-dir} -lc++ -lc++abi -lclang_rt.builtins '
+   '-Wl,--suppress-stack-size-warning '
+   '-Wl,-mllvm,-nvptx-lower-global-ctor-dtor=1 '
+   '-Wl,-mllvm,-nvptx-emit-init-fini-kernel'
+))
+config.substitutions.append(('%{exec}',
+    '%{executor} --no-parallelism'
+))
+
+config.stdlib = 'llvm-libc++'
+
+import os, site
+site.addsitedir(os.path.join('@LIBCXX_SOURCE_DIR@', 'utils'))
+import libcxx.test.params, libcxx.test.config
+libcxx.test.config.configure(
+    libcxx.test.params.DEFAULT_PARAMETERS,
+    libcxx.test.features.DEFAULT_FEATURES,
+    config,
+    lit_config
+)
diff --git a/libcxx/test/std/containers/sequences/deque/deque.modifiers/insert_range.pass.cpp b/libcxx/test/std/containers/sequences/deque/deque.modifiers/insert_range.pass.cpp
index a5f5455297ad44..b0218cb75aca93 100644
--- a/libcxx/test/std/containers/sequences/deque/deque.modifiers/insert_range.pass.cpp
+++ b/libcxx/test/std/containers/sequences/deque/deque.modifiers/insert_range.pass.cpp
@@ -6,6 +6,10 @@
 //
 //===----------------------------------------------------------------------===//
 
+// FIXME: This takes over an hour to compile, disable for now.
+// UNSUPPORTED: target=amdgcn-amd-amdhsa
+// UNSUPPORTED: target=nvptx64-nvidia-cuda
+
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20
 // UNSUPPORTED: GCC-ALWAYS_INLINE-FIXME
 
diff --git a/libcxx/test/std/strings/basic.string/string.modifiers/string_replace/replace_with_range.pass.cpp b/libcxx/test/std/strings/basic.string/string.modifiers/string_replace/replace_with_range.pass.cpp
index 03e82590ed4ef6..d4b16b79a0b8dc 100644
--- a/libcxx/test/std/strings/basic.string/string.modifiers/string_replace/replace_with_range.pass.cpp
+++ b/libcxx/test/std/strings/basic.string/string.modifiers/string_replace/replace_with_range.pass.cpp
@@ -6,6 +6,10 @@
 //
 //===----------------------------------------------------------------------===//
 
+// FIXME: This takes over an hour to compile, disable for now.
+// UNSUPPORTED: target=amdgcn-amd-amdhsa
+// UNSUPPORTED: target=nvptx64-nvidia-cuda
+
 // UNSUPPORTED: c++03, c++11, c++14, c++17, c++20
 // ADDITIONAL_COMPILE_FLAGS(has-fconstexpr-steps): -fconstexpr-steps=10000000
 // ADDITIONAL_COMPILE_FLAGS(has-fconstexpr-ops-limit): -fconstexpr-ops-limit=70000000

jhuber6 · 2024-08-29T17:53:25Z

ping

ldionne

Is it possible to get the added CI setup alongside this?

jhuber6 · 2024-08-30T14:52:00Z

> Is it possible to get the added CI setup alongside this?

What would that look like? I've talked w/ @jplehr and @Artem-B about getting a build for these targets at least set up. I don't think a full CI tester will be available for a while -- there are still lots of failing tests and it takes a long time to run.

Right now I need to redo this since %{lib-dir} no longer points to the right spot. I probably just need to add a new variable that gives me access to ${LLVM_BINARY_DIR}/lib and ${LLVM_BINARY_DIR}/include.

ldionne · 2024-08-30T15:21:51Z

> > Is it possible to get the added CI setup alongside this?
>
> What would that look like? I've talked w/ @jplehr and @Artem-B about getting a build for these targets at least set up. I don't think a full CI tester will be available for a while -- there are still lots of failing tests and it takes a long time to run.

How long does it take to run the tests? What's the reason for it being that slow?

jhuber6 · 2024-08-30T15:31:41Z

> How long does it take to run the tests? What's the reason for it being that slow?

There are a lot of factors that contribute to it being really slow.

- GPU backends in general are slow due to a lot of extra IPO passes (attributor) and more complicated instruction scheduling.
- Everything needs to be done through LTO because there's no backwards compatibility. This effectively allows us to defer the final architecture until it's linked in by the user. The libc test suite actually has an entirely separate target that's used for testing, so we could build that separately for a single target. Normally I'd say AMDGPU ELF linking isn't supported at all, but there's a good chance it would work anyway since all these massive unit tests aren't using anything problematic and likely use 100% of the register budget anyway.
- These tests are all running on a single GPU thread. This makes it really easy to test things on the GPU, but GPUs aren't known for their single-threaded performance. Performance-wise, you're probably looking at two orders of magnitude slower than a server CPU. This is exacerbated by the fact that I force all of these jobs to be run serially, because the GPU drivers are prone to locking up and spuriously failing if you spam them with >64 processes trying to use the GPU at once. (This basically just claims a file lock internally so only one test can use the GPU at a time.)

From my current configuration, I'd say an average test run takes about an hour on my server. The bot that runs the NVPTX tests for example is nowhere near as powerful as my computer, so expect that to take like 10 hours?

jhuber6 · 2024-08-30T16:16:56Z

I added some CMake that gets the llvm-libc include and library directories so I can use them. Hopefully this is fine even though it's only relevant here.
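
For readers unfamiliar with lit, here is a sketch of how such directories could be surfaced to the test configs as substitutions. This is not the CMake that was actually added; the `%{libc-include-dir}` and `%{libc-lib-dir}` names and the `add_libc_dirs` helper are hypothetical, and the `include/<triple>` and `lib/<triple>` layout is the one described in the PR summary.

```python
from types import SimpleNamespace


def add_libc_dirs(config, llvm_binary_dir, triple):
    """Append hypothetical substitutions pointing at the llvm-libc outputs."""
    config.substitutions.append(
        ("%{libc-include-dir}", f"{llvm_binary_dir}/include/{triple}"))
    config.substitutions.append(
        ("%{libc-lib-dir}", f"{llvm_binary_dir}/lib/{triple}"))
    return config


# Stand-in for lit's config object, which keeps substitutions as a list of
# (pattern, replacement) tuples.
config = SimpleNamespace(substitutions=[])
add_libc_dirs(config, "/path/to/build", "amdgcn-amd-amdhsa")
```

The link line could then reference `%{libc-lib-dir}/crt1.o` instead of relying on `%{lib-dir}`.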

jhuber6 · 2024-09-11T14:34:31Z

ping

jhuber6 · 2024-09-18T12:45:54Z

ping

jhuber6 · 2024-09-30T21:02:52Z

ping

ldionne

A few comments, should be fine to merge once addressed.

libcxx/test/configs/amdgpu-libc++-shared.cfg.in

libcxx/test/std/containers/sequences/deque/deque.modifiers/insert_range.pass.cpp

jhuber6 · 2024-10-01T19:58:25Z

It seems that some changes moved

A few comments, should be fine to merge once addressed.

Thanks, made some updates since I apparently didn't merge any of my changes last time. Since the recent changes made the libc++ libs go to a separate install, I need access to the C library directory.

jhuber6 requested a review from a team as a code owner August 15, 2024 22:07

jhuber6 requested review from ldionne, mordante and philnik777 August 15, 2024 22:07

llvmbot added the libc++ and backend:AMDGPU labels Aug 15, 2024

jhuber6 force-pushed the libcxx-wip-v3 branch from 0589aaa to a8e6bc5 Compare August 21, 2024 17:11

ldionne reviewed Aug 30, 2024

ldionne reviewed Oct 1, 2024

jhuber6 force-pushed the libcxx-wip-v3 branch from 2d6d1a2 to d1aef6d Compare October 1, 2024 19:59

Merge branch 'main' into libcxx-wip-v3

a8a9ba2
