
Rollup of 9 pull requests #94333

Merged: 27 commits into rust-lang:master on Feb 25, 2022
Conversation

Dylan-DPC
Member

Successful merges:

Failed merges:

r? @ghost
@rustbot modify labels: rollup

Create a similar rollup

scottmcm and others added 27 commits February 21, 2022 00:54
Like I previously did for `reverse`, this leaves it to LLVM to pick how to vectorize it, since it knows better what chunk size to use, compared to the "32 bytes always" approach we currently have.

It does still need logic to type-erase where appropriate, though, as while LLVM is now smart enough to vectorize over slices of things like `[u8; 4]`, it fails to do so over slices of `[u8; 3]`.

As a bonus, this also means one no longer gets the spurious `memcpy`(s?) at the end when swapping a slice of `__m256`s: <https://rust.godbolt.org/z/joofr4v8Y>
- Test the combinations of --check-cfg with partial values() and --cfg
- Test that we detect an unexpected value when none are expected
`scan_escape` currently has a fast path (for when the first char isn't
'\\') and a slow path.

This commit changes `scan_escape` so it only handles the slow path, i.e.
the actual escaping code. The fast path is inlined into the two call
sites.

This change makes the code faster, because there is no function call
overhead on the fast path. (`scan_escape` is a big function and doesn't
get inlined.)

This change also improves readability, because it removes a bunch of
mode checks on the fast paths.
The change looks big because `rustfmt` rearranges things, but the only
real change is the inlining annotation.
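For illustration, a minimal sketch of that inlining pattern; the function names here are made up for the example and are not the real rustc_lexer API:

```rust
use std::str::Chars;

// Stand-in for the real `scan_escape`: a large match over \n, \t, \x.., \u{..}, ...
// that is too big to be inlined.
fn scan_escape_slow(chars: &mut Chars<'_>) -> Result<char, ()> {
    match chars.next() {
        Some('n') => Ok('\n'),
        Some('t') => Ok('\t'),
        Some('\\') => Ok('\\'),
        _ => Err(()),
    }
}

fn next_unescaped(chars: &mut Chars<'_>) -> Option<Result<char, ()>> {
    let c = chars.next()?;
    if c != '\\' {
        // Fast path, written inline at the call site: ordinary characters pay
        // no function-call overhead and need no per-mode checks.
        return Some(Ok(c));
    }
    // Slow path: only actual escape sequences reach the big function.
    Some(scan_escape_slow(chars))
}

fn main() {
    let mut chars = r"a\nb".chars();
    while let Some(res) = next_unescaped(&mut chars) {
        println!("{:?}", res);
    }
}
```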
…-in macros

Previously it always returned `MacroKind::Bang`, even though some of those macros are actually attributes or derives.
To make the `macro_rules` flag more readily available without decoding everything else
resolve/metadata: Stop encoding macros as reexports

Supersedes rust-lang#88335.
r? `@cjgillot`
…error-span, r=jackh726

better ObligationCause for normalization errors in `can_type_implement_copy`

Some logic is needed so we can point to the field when given totally nonsensical types like `struct Foo(<u32 as Iterator>::Item);`.

Fixes rust-lang#93687
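
For reference, a minimal program of the kind described above (it deliberately fails to compile); with this change the normalization error can point at the offending field:

```rust
// Deliberately ill-formed: `u32` does not implement `Iterator`, so the
// projection `<u32 as Iterator>::Item` cannot be normalized. Deriving `Copy`
// exercises `can_type_implement_copy`, and the better ObligationCause lets
// the resulting error point at this field rather than at the whole type.
#[derive(Clone, Copy)]
struct Foo(<u32 as Iterator>::Item);

fn main() {}
```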
…rochenkov

Improve `--check-cfg` implementation

This pull-request is a mix of improvements regarding the `--check-cfg` implementation:

- Simpler internal representation (usage of `Option` instead of separate bool)
- Add --check-cfg to the unstable book (based on the RFC)
- Improved diagnostics:
    * List possible values when the value is unexpected
    * Suggest if possible a name or value that is similar
- Add more tests (well known names, mix of combinations, ...)

r? `@petrochenkov`
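
As a rough illustration of the improved diagnostics (the exact unstable flag syntax may differ; this sketch follows the `values()` form mentioned above):

```rust
// Hypothetically compiled with something along the lines of:
//
//     rustc --check-cfg 'values(feature, "serde", "std")' --cfg 'feature="std"' lib.rs
//
// A cfg using an unexpected value, as below, would then be reported together
// with the list of expected values and, where possible, a suggestion for a
// similar known name or value.
#[cfg(feature = "sedre")] // typo for "serde"; this is what would be flagged
pub fn with_serde() {}

#[cfg(feature = "std")]
pub fn with_std() {}
```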
Stop manually SIMDing in `swap_nonoverlapping`

Like I previously did for `reverse` (rust-lang#90821), this leaves it to LLVM to pick how to vectorize it, since it knows better what chunk size to use, compared to the "32 bytes always" approach we currently have.

A variety of codegen tests are included to confirm that the various cases are still being vectorized.

It does still need logic to type-erase in some cases, though, as while LLVM is now smart enough to vectorize over slices of things like `[u8; 4]`, it fails to do so over slices of `[u8; 3]`.

As a bonus, this change also means one no longer gets the spurious `memcpy`(s?) at the end when swapping a slice of `__m256`s: <https://rust.godbolt.org/z/joofr4v8Y>

<details>

<summary>ASM for this example</summary>

## Before (from godbolt)

Note the `push`/`pop`s and the `memcpy` calls.

```x86
swap_m256_slice:
        push    r15
        push    r14
        push    r13
        push    r12
        push    rbx
        sub     rsp, 32
        cmp     rsi, rcx
        jne     .LBB0_6
        mov     r14, rsi
        shl     r14, 5
        je      .LBB0_6
        mov     r15, rdx
        mov     rbx, rdi
        xor     eax, eax
.LBB0_3:
        mov     rcx, rax
        vmovaps ymm0, ymmword ptr [rbx + rax]
        vmovaps ymm1, ymmword ptr [r15 + rax]
        vmovaps ymmword ptr [rbx + rax], ymm1
        vmovaps ymmword ptr [r15 + rax], ymm0
        add     rax, 32
        add     rcx, 64
        cmp     rcx, r14
        jbe     .LBB0_3
        sub     r14, rax
        jbe     .LBB0_6
        add     rbx, rax
        add     r15, rax
        mov     r12, rsp
        mov     r13, qword ptr [rip + memcpy@GOTPCREL]
        mov     rdi, r12
        mov     rsi, rbx
        mov     rdx, r14
        vzeroupper
        call    r13
        mov     rdi, rbx
        mov     rsi, r15
        mov     rdx, r14
        call    r13
        mov     rdi, r15
        mov     rsi, r12
        mov     rdx, r14
        call    r13
.LBB0_6:
        add     rsp, 32
        pop     rbx
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        vzeroupper
        ret
```

## After (from my machine)

Note that there is no `rsp` manipulation; sorry for the different ASM syntax (AT&T here rather than Intel above).

```x86
swap_m256_slice:
	cmpq	%r9, %rdx
	jne	.LBB1_6
	testq	%rdx, %rdx
	je	.LBB1_6
	cmpq	$1, %rdx
	jne	.LBB1_7
	xorl	%r10d, %r10d
	jmp	.LBB1_4
.LBB1_7:
	movq	%rdx, %r9
	andq	$-2, %r9
	movl	$32, %eax
	xorl	%r10d, %r10d
	.p2align	4, 0x90
.LBB1_8:
	vmovaps	-32(%rcx,%rax), %ymm0
	vmovaps	-32(%r8,%rax), %ymm1
	vmovaps	%ymm1, -32(%rcx,%rax)
	vmovaps	%ymm0, -32(%r8,%rax)
	vmovaps	(%rcx,%rax), %ymm0
	vmovaps	(%r8,%rax), %ymm1
	vmovaps	%ymm1, (%rcx,%rax)
	vmovaps	%ymm0, (%r8,%rax)
	addq	$2, %r10
	addq	$64, %rax
	cmpq	%r10, %r9
	jne	.LBB1_8
.LBB1_4:
	testb	$1, %dl
	je	.LBB1_6
	shlq	$5, %r10
	vmovaps	(%rcx,%r10), %ymm0
	vmovaps	(%r8,%r10), %ymm1
	vmovaps	%ymm1, (%rcx,%r10)
	vmovaps	%ymm0, (%r8,%r10)
.LBB1_6:
	vzeroupper
	retq
```

</details>

This does all its copying operations as either the original type or as `MaybeUninit`s, so as far as I know there should be no potential abstract machine issues with reading padding bytes as integers.
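
A rough sketch of that idea (simplified, not the actual `core::ptr::swap_nonoverlapping` implementation): copying element-sized `MaybeUninit<T>` values moves every byte, padding included, without reading them as integers, and leaves the loop for LLVM to vectorize as it sees fit.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Simplified sketch: swap `count` elements by reading and writing whole
// `MaybeUninit<T>` values, so padding bytes are never inspected and LLVM is
// free to choose the vectorization strategy for the loop.
unsafe fn swap_nonoverlapping_sketch<T>(x: *mut T, y: *mut T, count: usize) {
    let x = x.cast::<MaybeUninit<T>>();
    let y = y.cast::<MaybeUninit<T>>();
    for i in 0..count {
        let a = ptr::read(x.add(i));
        let b = ptr::read(y.add(i));
        ptr::write(x.add(i), b);
        ptr::write(y.add(i), a);
    }
}

fn main() {
    let mut a = [[1u8, 2, 3]; 4];
    let mut b = [[9u8, 8, 7]; 4];
    unsafe { swap_nonoverlapping_sketch(a.as_mut_ptr(), b.as_mut_ptr(), a.len()) };
    assert_eq!(a, [[9u8, 8, 7]; 4]);
    assert_eq!(b, [[1u8, 2, 3]; 4]);
}
```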

<details>

<summary>Perf is essentially unchanged</summary>

Though perhaps with more target features enabled this would help more, since it could pick bigger chunks.

## Before

```
running 10 tests
test slice::swap_with_slice_4x_usize_30                            ... bench:         894 ns/iter (+/- 11)
test slice::swap_with_slice_4x_usize_3000                          ... bench:      99,476 ns/iter (+/- 2,784)
test slice::swap_with_slice_5x_usize_30                            ... bench:       1,257 ns/iter (+/- 7)
test slice::swap_with_slice_5x_usize_3000                          ... bench:     139,922 ns/iter (+/- 959)
test slice::swap_with_slice_rgb_30                                 ... bench:         328 ns/iter (+/- 27)
test slice::swap_with_slice_rgb_3000                               ... bench:      16,215 ns/iter (+/- 176)
test slice::swap_with_slice_u8_30                                  ... bench:         312 ns/iter (+/- 9)
test slice::swap_with_slice_u8_3000                                ... bench:       5,401 ns/iter (+/- 123)
test slice::swap_with_slice_usize_30                               ... bench:         368 ns/iter (+/- 3)
test slice::swap_with_slice_usize_3000                             ... bench:      28,472 ns/iter (+/- 3,913)
```

## After

```
running 10 tests
test slice::swap_with_slice_4x_usize_30                            ... bench:         868 ns/iter (+/- 36)
test slice::swap_with_slice_4x_usize_3000                          ... bench:      99,642 ns/iter (+/- 1,507)
test slice::swap_with_slice_5x_usize_30                            ... bench:       1,194 ns/iter (+/- 11)
test slice::swap_with_slice_5x_usize_3000                          ... bench:     139,761 ns/iter (+/- 5,018)
test slice::swap_with_slice_rgb_30                                 ... bench:         324 ns/iter (+/- 6)
test slice::swap_with_slice_rgb_3000                               ... bench:      15,962 ns/iter (+/- 287)
test slice::swap_with_slice_u8_30                                  ... bench:         281 ns/iter (+/- 5)
test slice::swap_with_slice_u8_3000                                ... bench:       5,324 ns/iter (+/- 40)
test slice::swap_with_slice_usize_30                               ... bench:         275 ns/iter (+/- 5)
test slice::swap_with_slice_usize_3000                             ... bench:      28,277 ns/iter (+/- 277)
```

</details>
…ointer, r=michaelwoerister

properly handle fat pointers to uninhabitable types

Calculate the pointee metadata size by using `tcx.struct_tail_erasing_lifetimes` instead of duplicating the logic in `fat_pointer_kind`. Open to alternative suggestions on how to fix this.

Fixes rust-lang#94149

r? `@michaelwoerister` since you touched this code last, I think!
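
For context, a hypothetical illustration of the shape of type involved (not necessarily the exact reproduction from rust-lang#94149): a fat pointer to an uninhabited pointee, where only the pointer, and its debuginfo, ever exist.

```rust
// `Never` is uninhabited, but a fat (slice) pointer to it is an ordinary
// value. Emitting debuginfo for that pointer needs the pointee's metadata
// size, which is where the struct-tail logic mentioned above comes in.
// No value of `Never` is ever created.
enum Never {}

fn main() {
    let empty: &[Never] = &[];
    let fat: *const [Never] = empty;
    println!("slice length carried in the fat pointer: {}", empty.len());
    let _ = fat;
}
```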
…i-obk

Normalize main return type during mono item collection & codegen

The issue can be observed with `-Zprint-mono-items=lazy` in:

```rust
#![feature(termination_trait_lib)]
fn main() -> impl std::process::Termination { }
```
```
BEFORE: MONO_ITEM fn std::rt::lang_start::<impl std::process::Termination> @@ t.93933fa2-cgu.2[External]
AFTER:  MONO_ITEM fn std::rt::lang_start::<()> @@ t.df56e625-cgu.1[External]
```
…unescaping, r=petrochenkov

Improve string literal unescaping

Some easy wins that affect a few popular crates.

r? `@matklad`
…etrochenkov

Avoid emitting full macro body into JSON errors

While investigating rust-lang#94322, it was noted that currently the JSON diagnostics for macro backtraces include the full def_site span -- the whole macro body.

It seems like this shouldn't be necessary, so this PR adjusts the span to just be the "guessed head", typically the macro name. It doesn't look like we keep enough information to synthesize a nicer span here at this time.

Atop rust-lang#92123, this reduces output for the src/test/ui/suggestions/missing-lifetime-specifier.rs test from 660 KB to 156 KB locally.
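
To make that concrete, a hypothetical example of the situation (it intentionally fails to compile): an error raised inside a macro expansion, for which the JSON backtrace previously carried the def_site span of the entire macro body and now carries only the guessed head, roughly the macro name.

```rust
// Intentionally ill-formed: the type error occurs inside the expansion of
// `long_macro`. The backtrace's def_site span used to cover this whole macro
// definition; with this change it is trimmed to the "guessed head".
macro_rules! long_macro {
    ($x:expr) => {{
        let a = $x;
        let b = a + 1;
        let c: &'static str = b; // type error inside the expansion
        c
    }};
}

fn main() {
    let _ = long_macro!(1);
}
```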
@rustbot rustbot added the T-compiler, T-rustdoc, and rollup labels Feb 24, 2022
@Dylan-DPC
Member Author

@bors r+ rollup=never p=5

@bors
Contributor

bors commented Feb 24, 2022

📌 Commit 3bd163f has been approved by Dylan-DPC

@bors bors added the S-waiting-on-bors label Feb 24, 2022
@bors
Contributor

bors commented Feb 24, 2022

⌛ Testing commit 3bd163f with merge 4e82f35...

@bors
Contributor

bors commented Feb 25, 2022

☀️ Test successful - checks-actions
Approved by: Dylan-DPC
Pushing 4e82f35 to master...

@bors bors added the merged-by-bors label Feb 25, 2022
@bors bors merged commit 4e82f35 into rust-lang:master Feb 25, 2022
@rustbot rustbot added this to the 1.61.0 milestone Feb 25, 2022
@Dylan-DPC Dylan-DPC deleted the rollup-7yxtywp branch February 25, 2022 01:59
@rust-timer
Collaborator

Finished benchmarking commit (4e82f35): comparison url.

Summary: This benchmark run shows 14 relevant improvements 🎉 but 9 relevant regressions 😿 to instruction counts.

  • Arithmetic mean of relevant regressions: 0.8%
  • Arithmetic mean of relevant improvements: -5.9%
  • Arithmetic mean of all relevant changes: -3.3%
  • Largest improvement in instruction counts: -14.4% on incr-unchanged builds of encoding check
  • Largest regression in instruction counts: 1.1% on incr-patched: println builds of cargo opt

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

@rustbot rustbot added the perf-regression label Feb 25, 2022
@RalfJung RalfJung mentioned this pull request Feb 25, 2022
@RalfJung
Member

RalfJung commented Feb 25, 2022

This rollup also had surprising consequences in Miri: starting with this PR, Miri complains about UB in the libtest harness (see rust-lang/miri#1986).

EDIT: The culprit appears to be #94212

Labels: merged-by-bors, perf-regression, rollup, S-waiting-on-bors, T-compiler, T-rustdoc