Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental compilation relies on hashes for soundness #129016

Open
RalfJung opened this issue Aug 12, 2024 · 1 comment
Open

Incremental compilation relies on hashes for soundness #129016

RalfJung opened this issue Aug 12, 2024 · 1 comment
Labels
A-incr-comp Area: Incremental compilation C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Aug 12, 2024

Incremental compilation relies on hashes for soundness -- specifically, on SipHash-1-3 with an all-zero key, which is not a cryptographic hash function. Collisions in that hash can lead to UB and other ill effects. Is that a problem? Should we do things to mitigate the risk of that? Or do we tell people they are expected to use non-incremental builds if they actually want to rely on the soundness of the result? Or is this not a bug at all and the current hashing scheme is "good enough"? After all, as @the8472 argues, if the programmer doesn't actively try to exploit the hard-coded all-zero key they are quite unlikely to hit a collision by pure chance. (That should follow from the fact that SipHash is a PRF, but I am no cryptographer.)

Fixing this by using a cryptographic hash is likely to be very bad for performance. However, @michaelwoerister mentions that there could be cheaper mitigation techniques:

Unless I'm overlooking something all incr. comp. specific hashes only need to be stable between successive incremental builds. So we can easily harden our use of SipHash there by generating random keys and then caching these in the incr. comp. cache.

If we use a truly random key for this, we could rely on SipHash being a cryptographic PRF -- without knowing the key, it's supposed to be very hard to find collisions. However, that promise is typically made for SipHash-2-4, not the weakened variant SipHash-1-3 that rustc uses. There seems to be evidence that even the weaker function is "good", but I'll leave it to cryptographic experts to evaluate the evidence here. @briansmith would be good to get your take on this.

Nominating for t-lang discussion to see what their stance is on the soundness requirements for incremental compilation, and whether relying on a non-collision-resistant hash function for soundness is "good enough" -- or whether this should be considered an implementation decision, to be made by t-compiler. (Previously, t-lang ruled that TypeId should use a "full (non-truncated) cryptographic hash", but the tradeoffs are quite different here so it's not at all clear that the same decision would apply to incremental hashes.)

EDIT: I have moved the nomination to #129030 as that broader question directly affects this more specific question.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 12, 2024
@RalfJung RalfJung added the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 12, 2024
@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-incr-comp Area: Incremental compilation C-bug Category: This is a bug. labels Aug 12, 2024
@RalfJung RalfJung removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 12, 2024
@jieyouxu jieyouxu added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Aug 12, 2024
@michaelwoerister
Copy link
Member

Some more, possibly interesting, points:

  • We never claimed that incr. comp. hashes are resistant against a deliberate attack. The hashes are meant to be fingerprints with the property of virtual uniqueness, i.e. the chance of a collision is supposed to be so small to be negligible in practice.
  • As far as I understand it, using a cryptographic hash function would only really improve the situation if we also widened hashes to at least 256 bits (we are using 128 bits at the moment).
  • There are hash functions that are designed specifically for the fingerprinting case, like XXH3. These make no claims about collision resistance.
  • The UMASH hash function might be interesting for this use case. It is not a cryptographic hash function but it provides a collision probability bound, which is the thing we are most interested in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants