Mark XxHash64.Complete as noinline #90142

Merged: 3 commits, Aug 8, 2023

EgorBo (Member) commented on Aug 8, 2023

Fixes #90090

public static IEnumerable<byte[]> TestData()
{
    yield return "aadwejkadjgb8c27tr874c3/./[}|P{OP&^&$%^^TGERfgea"u8.ToArray();
}

[Benchmark]
[ArgumentsSource(nameof(TestData))]
public byte[] XxHash64_(byte[] data) => XxHash64.Hash(data);

| Method | Toolchain | data | Mean |
|---|---|---|---|
| XxHash64_ | \Core_Root\corerun.exe | Byte[48] | 11.322 ns |
| XxHash64_ | \Core_Root_base\corerun.exe | Byte[48] | 30.245 ns |

Temporary workaround for the inliner's budget problem. The JIT seems to decide to inline this function because it is on a hot path, PGO data is available, and it contains multiple recognizable intrinsics; it also looks like the inliner mistakenly recognized foldable expressions here. Inlining it does make the code faster in isolation, but since we then run out of budget, we give up on inlining simple things such as Span.Slice, and that makes overall perf worse. There is no impact on large inputs, as the main work is done in this loop and Complete is called only once, as the final step, so only small inputs are affected.

It's not an issue for XxHash32, where Complete() is 2x simpler and is inlined just fine. XxHash128 is not affected either, due to a different code structure, although it hit a similar issue in the past.
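The shape of the workaround can be sketched as follows. This is only an illustration, not the actual System.IO.Hashing source: the `HashSketch` type, the `Mix` helper, the multiplier constant, and the toy hashing logic are all hypothetical. The only real API assumed is `MethodImplOptions.NoInlining` from `System.Runtime.CompilerServices`, which asks the JIT to keep the finalization step out of line so it cannot consume the caller's inlining budget.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.CompilerServices;

// Hypothetical stand-in type; NOT the real XxHash64 implementation.
static class HashSketch
{
    // Hot entry point. It relies on cheap helpers (Span slicing, Mix)
    // being inlined by the JIT.
    public static ulong Hash(ReadOnlySpan<byte> source)
    {
        ulong acc = (ulong)source.Length;

        // Main loop: consumes full 8-byte chunks. For large inputs
        // almost all of the time is spent here.
        while (source.Length >= sizeof(ulong))
        {
            acc = Mix(acc, BinaryPrimitives.ReadUInt64LittleEndian(source));
            source = source.Slice(sizeof(ulong));
        }

        // Tail handling runs once per call, so the cost of a real
        // call here only matters for small inputs.
        return Complete(acc, source);
    }

    // Toy mixing step (hypothetical constant); small enough to inline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static ulong Mix(ulong acc, ulong value) =>
        (acc ^ value) * 0x9E3779B185EBCA87UL;

    // Kept out of line so that inlining it cannot exhaust the budget
    // the JIT needs for the simple helpers used in Hash.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static ulong Complete(ulong acc, ReadOnlySpan<byte> remaining)
    {
        foreach (byte b in remaining)
        {
            acc = Mix(acc, b);
        }
        return acc ^ (acc >> 32);
    }
}
```

Calling `HashSketch.Hash(data)` on a 48-byte array would run six loop iterations and then one out-of-line `Complete` call. Whether `NoInlining` helps or hurts in a given method depends on the JIT version and the surrounding code, so a change like this should be confirmed with a benchmark such as the one above.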

adamsitnik added the tenet-performance (Performance related issue) label on Aug 8, 2023
EgorBo merged commit bcf9938 into dotnet:main on Aug 8, 2023 (103 checks passed)
EgorBo deleted the xxhash-noinline branch on August 8, 2023 13:16
ghost locked the conversation as resolved and limited it to collaborators on Sep 7, 2023
Labels: area-System.IO, tenet-performance (Performance related issue)
Linked issue (may be closed by merging this pull request): When calculating xxhash, net7.0 is slower than net6.0