Ensure let stmt compound assignment removal suggestion respect codepoint boundaries #128865

Merged
merged 2 commits into from
Aug 9, 2024

Conversation

jieyouxu
Member

@jieyouxu jieyouxu commented Aug 9, 2024

Previously, we would try to issue a suggestion for `let x <op>= 1`, i.e.
a compound assignment within a `let` binding, to remove the `<op>`. The
suggestion code incorrectly assumed that the `<op>` is always a
single-byte ASCII character. That assumption does not hold, because
we also recover Unicode confusables such as `➖=` as `-=`. In that case,
the suggestion code used `+ BytePos(1)` to calculate the span of the
`<op>` codepoint that looks like `-`, but with the multi-byte Unicode
look-alike the suggested removal span ended in the middle of a
multi-byte codepoint, triggering a codepoint boundary assertion.

The fix is to use `SourceMap::start_point(token_span)`, which properly accounts for codepoint boundaries.
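
As a rough illustration of the failure mode (not the compiler code itself), the sketch below uses plain `str` operations from the standard library to show why "start plus one byte" is only a valid boundary for ASCII operators. The helper `first_char_len` is a hypothetical name introduced for this example; it only mimics the idea of measuring the first codepoint instead of assuming one byte.

```rust
/// Byte length of the first codepoint in `snippet`, or 0 if it is empty.
/// Hypothetical helper for illustration only; this is not a rustc API.
fn first_char_len(snippet: &str) -> usize {
    snippet.chars().next().map_or(0, char::len_utf8)
}

fn main() {
    let ascii = "-= 1"; // `-` is a single byte in UTF-8
    let confusable = "➖= 1"; // `➖` (U+2796) takes three bytes in UTF-8

    // Offset 1 is a valid boundary only for the ASCII operator.
    assert!(ascii.is_char_boundary(1));
    assert!(!confusable.is_char_boundary(1));

    // Measuring the first codepoint gives a valid boundary in both cases.
    assert_eq!(first_char_len(ascii), 1);
    assert_eq!(first_char_len(confusable), 3);

    // Slicing at the measured length is safe; slicing `confusable` at
    // byte 1 would panic because it lands inside the codepoint.
    assert_eq!(&ascii[first_char_len(ascii)..], "= 1");
    assert_eq!(&confusable[first_char_len(confusable)..], "= 1");
}
```

The same reasoning applies to spans: an end position derived from the actual width of the first codepoint stays on a boundary, whereas a fixed one-byte offset does not.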

Fixes #128845.

cc #128790

r? @fmease

For codepoint boundary assertion triggered by a let stmt compound
assignment removal suggestion when encountering recovered multi-byte
compound ops.

Issue: <rust-lang#128845>
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 9, 2024
@jieyouxu jieyouxu changed the title Ensure let stmt compount assignment removal suggestion respect codepoint boundaries Ensure let stmt compound assignment removal suggestion respect codepoint boundaries Aug 9, 2024
…t codepoint boundaries

Issue: <rust-lang#128845>
Member

@Urgau Urgau left a comment

Thanks.

r=me when CI is ready (unless you want fmease to review it)

@jieyouxu
Member Author

jieyouxu commented Aug 9, 2024

unless you want fmease to review it

Nah this was mostly a "lol look another BytePos(1)" 😄

@bors r=@Urgau rollup

@bors
Contributor

bors commented Aug 9, 2024

📌 Commit d65f131 has been approved by Urgau

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 9, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 9, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 9, 2024
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#128742 (miri: make vtable addresses not globally unique)
 - rust-lang#128815 (Add `Steal::is_stolen()`)
 - rust-lang#128859 (Fix the name of signal 19 in library/std/src/sys/pal/unix/process/process_unix/tests.rs for mips/sparc linux)
 - rust-lang#128864 (Use `SourceMap::end_point` instead of `- BytePos(1)` in arg removal suggestion)
 - rust-lang#128865 (Ensure let stmt compound assignment removal suggestion respect codepoint boundaries)
 - rust-lang#128874 (Disable verbose bootstrap command failure logging by default)

r? `@ghost`
`@rustbot` modify labels: rollup
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 9, 2024
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Aug 9, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 9, 2024
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#128815 (Add `Steal::is_stolen()`)
 - rust-lang#128817 (VxWorks code refactored )
 - rust-lang#128822 (add `builder-config` into tarball sources)
 - rust-lang#128838 (rustdoc: do not run doctests with invalid langstrings)
 - rust-lang#128852 (use stable sort to sort multipart diagnostics)
 - rust-lang#128859 (Fix the name of signal 19 in library/std/src/sys/pal/unix/process/process_unix/tests.rs for mips/sparc linux)
 - rust-lang#128864 (Use `SourceMap::end_point` instead of `- BytePos(1)` in arg removal suggestion)
 - rust-lang#128865 (Ensure let stmt compound assignment removal suggestion respect codepoint boundaries)
 - rust-lang#128874 (Disable verbose bootstrap command failure logging by default)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 665a1a4 into rust-lang:master Aug 9, 2024
6 checks passed
@rustbot rustbot added this to the 1.82.0 milestone Aug 9, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Aug 9, 2024
Rollup merge of rust-lang#128865 - jieyouxu:unicurd, r=Urgau

bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 9, 2024
…llaumeGomez

Rollup of 8 pull requests

Successful merges:

 - rust-lang#128815 (Add `Steal::is_stolen()`)
 - rust-lang#128822 (add `builder-config` into tarball sources)
 - rust-lang#128838 (rustdoc: do not run doctests with invalid langstrings)
 - rust-lang#128852 (use stable sort to sort multipart diagnostics)
 - rust-lang#128859 (Fix the name of signal 19 in library/std/src/sys/pal/unix/process/process_unix/tests.rs for mips/sparc linux)
 - rust-lang#128864 (Use `SourceMap::end_point` instead of `- BytePos(1)` in arg removal suggestion)
 - rust-lang#128865 (Ensure let stmt compound assignment removal suggestion respect codepoint boundaries)
 - rust-lang#128874 (Disable verbose bootstrap command failure logging by default)

r? `@ghost`
`@rustbot` modify labels: rollup
@jieyouxu jieyouxu deleted the unicurd branch August 10, 2024 00:39
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

ICE: assertion failed: bpos.to_u32() >= mbc.pos.to_u32() + mbc.bytes as u32
5 participants