[NativeAOT] Cache location of unwind sections #82994
Conversation
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas

Issue Details: TBD: Just testing build on different configurations...
@filipnavara The result is awesome. I ran the tests I used in my analysis with this change. Originally, Linux NativeAOT was clearly not scaling at all; now the multi-threaded performance is only about 10% worse than the single-threaded one.
LGTM, thank you!
This can improve GC root reporting too, since that also performs stack walks, and in the server GC case does so on multiple threads.
/azp run runtime-extra-platforms

Azure Pipelines successfully started running 1 pipeline(s).
Very nice! Thanks!!
Abstract
In issue #77568 the exception handling performance was tested in various scenarios. For Linux AOT, a bottleneck was identified in the `findUnwindSections` method. Specifically, in the multi-threaded scenario there is a significant performance penalty due to the use of the `dl_iterate_phdr` API, which internally takes a lock.

A simple observation is that nearly all the frames we try to unwind belong to the compiled managed code, which always uses the same unwind table. We can cache that value up front and avoid the lookups entirely.

A side effect is that this also helps the code paths that perform thread hijacking during GC, potentially avoiding some locks in those paths as well.
Implementation
The implementation reshuffles the `FindProcInfo` and `VirtualUnwind` methods and moves them into `UnixCodeManager`, where the `UnwindInfoSections` value is cached.

The llvm-libunwind API offers two ways to inject the cached information about the unwind sections. It can be done through a custom `AddressSpace` class implementation, which has the benefit that the high-level C++ API can be reused by simply switching one template parameter. Alternatively, the low-level C++ API can be used directly and the information passed to it. Since the unwinding code already used the low-level API in most cases, I opted for that route.

Testing
Test code
The test code was injected into an empty application created with `dotnet new console` and then compiled with `dotnet publish -p:PublishAot=true -r linux-x64 -c Debug`.

My test configuration is a Ryzen 7950X machine running Ubuntu 22.04.2 LTS in Windows Subsystem for Linux. The baseline is .NET 8 Preview 1, where I get ~19,500 exceptions per second. With this PR I get around 145,000 exceptions per second, more than 7× the throughput.
I also briefly tested on a MacBook Air M1 in the osx-arm64 configuration. The throughput with this PR is about 1.78× that of the .NET 8 Preview 1 baseline.