Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb][libc++] Hide all libc++ implementation details from stacktraces #108870

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

vogelsgesang
Copy link
Member

@vogelsgesang vogelsgesang commented Sep 16, 2024

This commit changes the libc++ frame recognizer to hide implementation details of libc++ more aggressively. The applied heuristic is rather straightforward: We consider every function name starting with __ as an implementation detail.

This works pretty neatly for std::invoke, std::function, std::sort, std::map::emplace and many others. Also, this should align quite nicely with libc++'s general coding convention of using the __ for their implementation details, thereby keeping the future maintenance effort low.

However, this heuristic by itself does not work in 100% of the cases: E.g., std::ranges::sort is not a function, but an object with an overloaded operator(), which means that there is no actual call std::ranges::sort in the call stack. Instead, there is a std::ranges::__sort::operator() call. To make sure that we don't hide this stack frame, we never hide the frame which represents the entry point from user code into libc++ code

@llvmbot llvmbot added libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. lldb labels Sep 16, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 16, 2024

@llvm/pr-subscribers-lldb

@llvm/pr-subscribers-libcxx

Author: Adrian Vogelsgesang (vogelsgesang)

Changes

This commit changes the libc++ frame recognizer to hide implementation details of libc++ more aggressively. The applied heuristic is rather straightforward: We consider every function name starting with __ as an implementation detail.

This works pretty neatly for std::invoke, std::function, std::sort, std::map::emplace and many others. Also, this should align quite nicely with libc++'s general coding convention of using the __ for their implementation details, thereby keeping the future maintenance effort low.

However, it is noteworthy, that this does not work 100% in all cases: E.g., for std::ranges::sort, itself is not really a function call, but an object with an overloaded operator(), which means that there is no actual call std::ranges::sort in the call stack.


Full diff: https://github.com/llvm/llvm-project/pull/108870.diff

5 Files Affected:

  • (modified) libcxx/docs/UserDocumentation.rst (+26)
  • (modified) lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp (+8-19)
  • (added) lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile (+5)
  • (added) lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py (+56)
  • (added) lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp (+86)
diff --git a/libcxx/docs/UserDocumentation.rst b/libcxx/docs/UserDocumentation.rst
index 3651e52ed77a75..f48d65ee0e753b 100644
--- a/libcxx/docs/UserDocumentation.rst
+++ b/libcxx/docs/UserDocumentation.rst
@@ -343,6 +343,32 @@ Third-party Integrations
 
 Libc++ provides integration with a few third-party tools.
 
+Debugging libc++ internals in LLDB
+----------------------------------
+
+LLDB hides the implementation details of libc++ by default.
+
+E.g., when setting a breakpoint in a comparator passed to ``std::sort``, the
+backtrace will read as
+
+.. code-block::
+
+  (lldb) thread backtrace
+  * thread #1, name = 'a.out', stop reason = breakpoint 3.1
+    * frame #0: 0x000055555555520e a.out`my_comparator(a=1, b=8) at test-std-sort.cpp:6:3
+      frame #7: 0x0000555555555615 a.out`void std::__1::sort[abi:ne200000]<std::__1::__wrap_iter<int*>, bool (*)(int, int)>(__first=(item = 8), __last=(item = 0), __comp=(a.out`my_less(int, int) at test-std-sort.cpp:5)) at sort.h:1003:3
+      frame #8: 0x000055555555531a a.out`main at test-std-sort.cpp:24:3
+
+Note how the caller of ``my_comparator`` is shown as ``std::sort``. Looking at
+the frame numbers, we can see that frames #1 until #6 were hidden. Those frames
+represent internal implementation details such as ``__sort4`` and similar
+utility functions.
+
+To also show those implementation details, use ``thread backtrace -u``.
+Alternatively, to disable those compact backtraces for good, use
+``frame recognizer list`` and ``frame recognizer delete`` to delete the libc++
+frame recognizer.
+
 GDB Pretty printers for libc++
 ------------------------------
 
diff --git a/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp b/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
index faa05e8f834ea1..d0e84bdeb94f01 100644
--- a/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
+++ b/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
@@ -45,7 +45,7 @@ char CPPLanguageRuntime::ID = 0;
 /// A frame recognizer that is installed to hide libc++ implementation
 /// details from the backtrace.
 class LibCXXFrameRecognizer : public StackFrameRecognizer {
-  std::array<RegularExpression, 4> m_hidden_regex;
+  std::array<RegularExpression, 2> m_hidden_regex;
   RecognizedStackFrameSP m_hidden_frame;
 
   struct LibCXXHiddenFrame : public RecognizedStackFrame {
@@ -55,28 +55,17 @@ class LibCXXFrameRecognizer : public StackFrameRecognizer {
 public:
   LibCXXFrameRecognizer()
       : m_hidden_regex{
-            // internal implementation details of std::function
+            // internal implementation details in the `std::` namespace
             //    std::__1::__function::__alloc_func<void (*)(), std::__1::allocator<void (*)()>, void ()>::operator()[abi:ne200000]
             //    std::__1::__function::__func<void (*)(), std::__1::allocator<void (*)()>, void ()>::operator()
             //    std::__1::__function::__value_func<void ()>::operator()[abi:ne200000]() const
-            RegularExpression{""
-              R"(^std::__[^:]*::)" // Namespace.
-              R"(__function::.*::operator\(\))"},
-            // internal implementation details of std::function in ABI v2
             //    std::__2::__function::__policy_invoker<void (int, int)>::__call_impl[abi:ne200000]<std::__2::__function::__default_alloc_func<int (*)(int, int), int (int, int)>>
-            RegularExpression{""
-              R"(^std::__[^:]*::)" // Namespace.
-              R"(__function::.*::__call_impl)"},
-            // internal implementation details of std::invoke
-            //   std::__1::__invoke[abi:ne200000]<void (*&)()>
-            RegularExpression{
-              R"(^std::__[^:]*::)" // Namespace.
-              R"(__invoke)"},
-            // internal implementation details of std::invoke
-            //   std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne200000]<void (*&)()>
-            RegularExpression{
-              R"(^std::__[^:]*::)" // Namespace.
-              R"(__invoke_void_return_wrapper<.*>::__call)"}
+            //    std::__1::__invoke[abi:ne200000]<void (*&)()>
+            //    std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne200000]<void (*&)()>
+            RegularExpression{R"(^std::__[^:]*::__)"},
+            // internal implementation details in the `std::ranges` namespace
+            //    std::__1::ranges::__sort::__sort_fn_impl[abi:ne200000]<std::__1::__wrap_iter<int*>, std::__1::__wrap_iter<int*>, bool (*)(int, int), std::__1::identity>
+            RegularExpression{R"(^std::__[^:]*::ranges::__)"},
         },
         m_hidden_frame(new LibCXXHiddenFrame()) {}
 
diff --git a/lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile
new file mode 100644
index 00000000000000..bb571299664934
--- /dev/null
+++ b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile
@@ -0,0 +1,5 @@
+CXX_SOURCES := main.cpp
+USE_LIBCPP := 1
+CXXFLAGS_EXTRAS := -std=c++20
+
+include Makefile.rules
diff --git a/lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py
new file mode 100644
index 00000000000000..a5b4e4fe995c38
--- /dev/null
+++ b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py
@@ -0,0 +1,56 @@
+import lldb
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+
+class LibCxxInternalsRecognizerTestCase(TestBase):
+    NO_DEBUG_INFO_TESTCASE = True
+
+    @add_test_categories(["libc++"])
+    def test_frame_recognizer(self):
+        """Test that implementation details of libc++ are hidden"""
+        self.build()
+        (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
+            self, "break here", lldb.SBFileSpec("main.cpp")
+        )
+
+        expected_parents = {
+            "sort_less(int, int)": ["::sort", "test_algorithms"],
+            # `std::ranges::sort` is implemented as an object of types `__sort`,
+            # which unfortunately means that there is no `std::ranges::sort`
+            # stack frame, and `main` is the direct parent of `my_less_ranges`.
+            "ranges_sort_less(int, int)": ["test_algorithms"],
+            # `ranges::views::transform` internally uses `std::invoke`, and that
+            # call also shows up in the stack trace
+            "view_transform(int)": ["::invoke", "ranges::transform_view", "test_algorithms"],
+            # Various types of `invoke` calls
+            "consume_number(int)": ["::invoke", "test_invoke"],
+            "invoke_add(int, int)": ["::invoke", "test_invoke"],
+            "Callable::member_function(int) const": ["::invoke", "test_invoke"],
+            "Callable::operator()(int) const": ["::invoke", "test_invoke"],
+            # Containers
+            "MyKey::operator<(MyKey const&) const": ["less", "::emplace", "test_containers"],
+        }
+        stop_set = set()
+        while process.GetState() != lldb.eStateExited:
+            fn = thread.GetFrameAtIndex(0).GetFunctionName()
+            stop_set.add(fn)
+            self.assertIn(fn, expected_parents.keys())
+            frame_id = 1
+            for expected_parent in expected_parents[fn]:
+                # Skip all hidden frames
+                while (
+                    frame_id < thread.GetNumFrames()
+                    and thread.GetFrameAtIndex(frame_id).IsHidden()
+                ):
+                    frame_id = frame_id + 1
+                # Expect the correct parent frame
+                self.assertIn(
+                    expected_parent, thread.GetFrameAtIndex(frame_id).GetFunctionName()
+                )
+                frame_id = frame_id + 1
+            process.Continue()
+
+        # Make sure that we actually verified all intended scenarios
+        self.assertEqual(len(stop_set), len(expected_parents))
diff --git a/lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp
new file mode 100644
index 00000000000000..870301b0970439
--- /dev/null
+++ b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp
@@ -0,0 +1,86 @@
+#include <algorithm>
+#include <functional>
+#include <map>
+#include <ranges>
+#include <vector>
+
+bool sort_less(int a, int b) {
+  __builtin_printf("break here");
+  return a < b;
+}
+
+bool ranges_sort_less(int a, int b) {
+  __builtin_printf("break here");
+  return a < b;
+}
+
+int view_transform(int a) {
+  __builtin_printf("break here");
+  return a * a;
+}
+
+void test_algorithms() {
+  std::vector<int> vec{8, 1, 3, 2};
+
+  // The internal frames for `std::sort` should be hidden
+  std::sort(vec.begin(), vec.end(), sort_less);
+
+  // The internal frames for `ranges::sort` should be hidden
+  std::ranges::sort(vec.begin(), vec.end(), ranges_sort_less);
+
+  // Same for views
+  for (auto x : vec | std::ranges::views::transform(view_transform)) {
+    // no-op
+  }
+}
+
+void consume_number(int i) { __builtin_printf("break here"); }
+
+int invoke_add(int i, int j) {
+  __builtin_printf("break here");
+  return i + j;
+}
+
+struct Callable {
+  Callable(int num) : num_(num) {}
+  void operator()(int i) const { __builtin_printf("break here"); }
+  void member_function(int i) const { __builtin_printf("break here"); }
+  int num_;
+};
+
+void test_invoke() {
+  // Invoke a void-returning function
+  std::invoke(consume_number, -9);
+
+  // Invoke a non-void-returning function
+  std::invoke(invoke_add, 1, 10);
+
+  // Invoke a member function
+  const Callable foo(314159);
+  std::invoke(&Callable::member_function, foo, 1);
+
+  // Invoke a function object
+  std::invoke(Callable(12), 18);
+}
+
+struct MyKey {
+  int x;
+  bool operator==(const MyKey &) const = default;
+  bool operator<(const MyKey &other) const {
+    __builtin_printf("break here");
+    return x < other.x;
+  }
+};
+
+void test_containers() {
+  std::map<MyKey, int> map;
+  map.emplace(MyKey{1}, 2);
+  map.emplace(MyKey{2}, 3);
+}
+
+int main() {
+  test_algorithms();
+  test_invoke();
+  test_containers();
+  return 0;
+}

Copy link

github-actions bot commented Sep 16, 2024

✅ With the latest revision this PR passed the Python code formatter.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a good idea, but I am also a bit scared of hiding too much stuff and making it difficult to figure out what's going wrong from a stack trace. It's not rational, I have no evidence to back this up and perhaps this is only fear of unknown.

Factually, the heuristic of using __ makes sense to me since all implementation-detail functions start with __, and nothing in the public libc++ API starts with __ (off the top of my head).

Overall, I think I like this patch but like I said I'm a bit scared of it being too aggressive.

@vogelsgesang
Copy link
Member Author

vogelsgesang commented Sep 17, 2024

[...] like I said I'm a bit scared of it being too aggressive.

Yes, this is also my main concern here. E.g., for std::ranges::sort, it already turns out to be slightly too aggressive, as discussed in the commit message and the test case comment. The "std::ranges::sort calls an operator() of an internal struct type" pattern is already slightly problematic...

Do you have an idea how we could de-risk this change? E.g., would do you have a couple more scenarios in mind for which I should add test cases? Are there similar patterns to the operator() which I should also cover in test cases?

Do we have some libc++ developers which are using lldb top-of-tree for their day-to-day work on libc++ and who would make us aware in case this commit turns out to be too aggressive?

@Michael137
Copy link
Member

Michael137 commented Sep 17, 2024

When you're stopped in a frame and you have libc++ frames above you in the backtrace, you're either stopped in a callback (like std::function, std::make_unique, std::sort), or you're deliberately stepping into libc++ code. For the latter, you probably don't want to hide any of the implementation details (though the libc++ devs might be able to comment better on their workflow here). I would find it weird if I stepped into a series of function calls manually but the backtrace omitted my path down.

I wonder if a heuristic for the callback case could be: mark all but the first libc++ entry point as an implementation detail. Would this be even more aggressive? Also, the implementation of this might be a bit awkward. Not sure. Wdyt?

@ldionne
Copy link
Member

ldionne commented Sep 17, 2024

Do you have an idea how we could de-risk this change? E.g., would do you have a couple more scenarios in mind for which I should add test cases? Are there similar patterns to the operator() which I should also cover in test cases?

What I could imagine is to drive this via attributes instead of a heuristic. We could potentially mark implementation details of libc++ as such to control what the debugging experience is like at a much finer grain if we used attributes, but obviously this would also increase complexity in the code and it would be yet another thing that we have to slap on almost every declaration.

Do we have some libc++ developers which are using lldb top-of-tree for their day-to-day work on libc++ and who would make us aware in case this commit turns out to be too aggressive?

I'm not aware of anyone. I just use lldb provided by the platform usually.

@jimingham
Copy link
Collaborator

Do we need to worry about whether this recognizer hides frames that might be useful to someone actually working on the libc++ they are debugging? I would imagine in that case, you'd probably want to turn off this recognizer anyway, since you want to see the details.

Seems to me like this feature should be tuned to "users of libc++" not "developers of libc++".

We really should add a "disable" to the frame recognizers. Once you delete them you can't really get the built-in ones back. People who are working ON libc++ still spend a bunch of their time debugging "uses" of it, and likely would benefit from the implementation hiding, then every so often want to see the details. We should make that easier.

@vogelsgesang
Copy link
Member Author

We really should add a "disable" to the frame recognizers

Do you mean something like #109219? (Test cases still missing; rest should be ready for review)

Seems to me like this feature should be tuned to "users of libc++" not "developers of libc++".

Agree, I think that should be the goal here

I wonder if a heuristic for the callback case could be: mark all but the first libc++ entry point as an implementation detail. Would this be even more aggressive? Also, the implementation of this might be a bit awkward. Not sure. Wdyt?

🤔 Might be a good idea... I will have to experiment a bit with this...

@vogelsgesang
Copy link
Member Author

vogelsgesang commented Sep 18, 2024

What I could imagine is to drive this via attributes instead of a heuristic. We could potentially mark implementation details of libc++ as such to control what the debugging experience is like at a much finer grain if we used attributes, but obviously this would also increase complexity in the code and it would be yet another thing that we have to slap on almost every declaration.

Maybe we could also find some other convention. E.g. hiding the std::*::__detail namespace. (You could still move all those __ methods into a different namespace, right? They are not part of the stable ABI, are they?). Using a namespace would have the benefits that

  1. you don't need to annotate each individual method, you can simply use a scope
  2. we don't need to add a new attribute to clang, dwarf etc.

This commit changes the libc++ frame recognizer to hide implementation
details of libc++ more aggressively. The applied heuristic is rather
straightforward: We consider every function name starting with `__` as
an implementation detail.

This works pretty neatly for `std::invoke`, `std::function`,
`std::sort`, `std::map::emplace` and many others. Also, this should
align quite nicely with libc++'s general coding convention of using the
`__` for their implementation details, thereby keeping the future
maintenance effort low.

However, it is noteworthy, that this does not work 100% in all cases:
E.g., for `std::ranges::sort`, itself is not really a function call, but
an object with an overloaded `operator()`, which means that there is no
actual call `std::ranges::sort` in the call stack.
Copy link

github-actions bot commented Sep 21, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@vogelsgesang
Copy link
Member Author

We really should add a "disable" to the frame recognizers

Merged #109219

I wonder if a heuristic for the callback case could be: mark all but the first libc++ entry point as an implementation detail. Would this be even more aggressive? Also, the implementation of this might be a bit awkward. Not sure. Wdyt?

I combined your approach with the __-based heuristic. I am quite happy with the result. It's more robust now, as can be seen by the updated test expectations for ranges::sort: The top-level std::__1::ranges::__sort::operator() is no longer hidden


expected_parents = {
"sort_less(int, int)": ["::sort", "test_algorithms"],
# `std::ranges::sort` is implemented as an object of types `__sort`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There're a large number of such customization point objects (and niebloids, which will be respecified as CPOs soon, see P3136R0) since C++20. Should we invent some convention to recognize them uniformly?

Copy link
Member Author

@vogelsgesang vogelsgesang Sep 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that such a convention would be necessary. With the current heuristic ("never hide a frame which was immediately called from user code"), the end result to the user is fine, despite the debug info containing the __sort::operator() function name

@vogelsgesang
Copy link
Member Author

@Michael137 thanks for the review! I updated the pull request accordingly

@jimingham
Copy link
Collaborator

jimingham commented Sep 23, 2024 via email

@vogelsgesang
Copy link
Member Author

vogelsgesang commented Sep 24, 2024

@jimingham I agree with everything you wrote. I think you misunderstood the context based on the way GitHub provided you some misleading context in the email notification

@jimingham
Copy link
Collaborator

@jimingham I agree with everything you wrote. I think you misunderstood the context based on the way GitHub provided you some misleading context in the email notification

Yeah, I should stop trying to read the notifications in the GitHub emails and just use them as a link to the website. Sigh...

Anyway, the real question was "what does an empty string do in these regex's". The answer is the one you probably suspected, an empty string means "no regular expression to apply", not ".*".

@vogelsgesang
Copy link
Member Author

(I will be waiting for one or two more weeks before merging this, in case there are any additional comments)

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regexes look sane (and sufficiently restricted) to me, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. lldb
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants