Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dead code pass no longer considers enum used in pub extern "C" fn as used #126706

Open
chinedufn opened this issue Jun 19, 2024 · 24 comments
Open
Labels
A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-bug Category: This is a bug. L-dead_code Lint: dead_code S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@chinedufn
Copy link
Contributor

A repository that I maintain recently received a bug report about our proc macro generating a warning after they upgraded to 1.79.0.

I haven't bisected for the exact commit that introduces the warning, but I have been able to reproduce it in 1.79.0.

Minimal Reproduction

fn main() {}

mod private_mod {
    #[repr(C)]
    pub enum MyEnum {
        Variant(u32)
    }
    
    #[no_mangle]
    pub extern "C" fn some_function() -> MyEnum {
        MyEnum::Variant(5)
    }
}
   Compiling playground v0.0.1 (/playground)
warning: field `0` is never read
 --> src/main.rs:6:17
  |
6 |         Variant(u32)
  |         ------- ^^^
  |         |
  |         field in this variant
  |
  = note: `#[warn(dead_code)]` on by default
help: consider changing the field to be of unit type to suppress this warning while preserving the field numbering, or remove the field
  |
6 |         Variant(())
  |                 ~~

warning: `playground` (bin "playground") generated 1 warning
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.85s
     Running `target/debug/playground`

In the playground -> https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3f5c81c0749130a6e46fbea14cbb055f

Potential Candidates

This comment links to some rustc commits that may have introduced the issue chinedufn/swift-bridge#278 (comment)

As mentioned, I haven't taken the time to bisect since I know that it's possible that a core maintainer already has a good sense of when/how this regression could have been introduced.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 19, 2024
@jieyouxu jieyouxu added A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. L-dead_code Lint: dead_code C-bug Category: This is a bug. S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 19, 2024
@jieyouxu
Copy link
Member

Also tagging T-lang on this because this involves user-observable changes to the dead_code lint. Feel free to readjust as appropriate.

@chinedufn
Copy link
Contributor Author

Just updated to the latest nightly and came across another similar regression:

// src/lib.rs

pub mod module {
    #[repr(C)]
    pub struct Foo(u32);
}

No warning on stable 1.79.0

Has a warning on latest nightly (I haven't bisected)


Stable (no warning) -> https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=08aa241f1e1544fae1c50f66d16c94f6
Nightly (warning) -> https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=9cacd97f9ce149c18cfd8f2e5052032a

Here is the output on nightly-2024-07-02:

   Compiling playground v0.0.1 (/playground)
warning: struct `Foo` is never constructed
 --> src/main.rs:4:16
  |
4 |     pub struct Foo(u32);
  |                ^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: `playground` (bin "playground") generated 1 warning
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.38s
     Running `target/debug/playground`

Ah, actually the pub mod my_mod { .. } wrapper isn't necessary to reproduce the warning.

Here's an illustration of code that we don't want to lead to a warning, but does on latest nightly.

// crate `foo`'s src/lib.rs

#[repr(C)]
pub struct Foo(u32);
    
impl Foo{
    pub fn print(&self) {
        println!("{}", self.0)
    }
}

It isn't dead code because another crate might use Foo like this:

// In another crate that depends on crate `foo` above
#[no_mangle)
pub extern "C" fn print_foo(foo: Foo) {
    foo.print();
}

@traviscross
Copy link
Contributor

traviscross commented Jul 6, 2024

Regarding the original report, this doesn't seem related to extern "C". E.g., this also warns:

mod private_mod {
    pub enum MyEnum {
        Variant(u32)
        //~^ WARN field `0` is never read
    }

    pub fn some_function() -> MyEnum {
        MyEnum::Variant(5)
    }
}

fn main() {
    let _ = private_mod::some_function();
}

And the warning is correct. The field is never read. If you read the field, the warning goes away, both here and in your original example.

Regarding your second example, the warning there also seems correct. The Foo struct has a private field, and it's never constructed. If you make that field public, the warning disappears:

// lib.rs
pub struct Foo(pub u32); //~ No warning

If this analysis understands your report correctly and you agree there's no issue here, then please close the issue.

@chinedufn
Copy link
Contributor Author

chinedufn commented Jul 7, 2024

The analysis misunderstands the problem.

In both my first and second example there is a dead code warning for code that is not dead (note that this is the dead_code lint emitting the warning).


The reason that the extern "C" is important is that it's what makes the field alive.

In your example code that removes the #[repr(C)] the enum's field is never read.

If you add #[repr(C)] and pass it over FFI then the field might be read by a caller (i.e. a C program) that Rust cannot see.


Recall that the error says:

warning: field `0` is never read

When the #[repr(C)] is present, Rust cannot know this.
For instance, a C program might be making use of that field 0.


I'm going to add a #[repr(C)] to your pub struct Foo(pub u32) to fully illustrate why it does not solve the problem.

mod foo {
    #[repr(C)]
    pub struct Foo(pub u32);

    impl Foo {
        pub fn new(val: u32) -> Self {
            Self(val)
        }
    }

    pub extern "C" fn return_foo() -> Foo {
        Foo(5);
    }
    // Doubles the value held inside `Foo`
    extern "C" {
        fn double_foo(foo: Foo) -> Foo;
    }
}

fn main() {
    let foo = Foo::new(10);

    // Good. We only want Rust callers to be able to double `Foo`.
    // We don't want them to be able to do anything else.
    let foo = unsafe { foo::double_foo(foo) };

    // BAD: We don't want `Foo`'s internal to be accessible outside of `mod foo`.
    // To solve this, we need to change `pub struct Foo(pub u32)` to `pub struct Foo(u32)`.
    foo.0 = 18;
}

In this example your suggestion changes the semantics of the program.

Foo(pub u32) has broken our intended invariant (Rust users are only able to double the inner value).


The simplify:

The reason that this bazz function below is not dead code is that it might be called by a caller that rustc cannot see.

// src/lib.rs
pub function bazz() {}

Similarly, the reason that the #[repr(C)] enum variant is alive in my examples above is alive is that it might be used by a caller that rustc cannot see.

@chinedufn
Copy link
Contributor Author

chinedufn commented Jul 7, 2024

Note that this gives us warning:

mod foo {
    #[repr(C)]
    pub enum MyEnum {
        Variant(u32)
    }

    #[no_mangle]
    pub extern "C" fn make_enum() -> MyEnum {
        MyEnum::Variant(10)
    }
}

fn main() {

But this does not warn:

// src/lib.rs

pub enum MyEnum {
    Variant(u32)
}

In the second example rustc is properly recognizing that MyEnum::Variant might be used by a downstream crate.

Similarly, the pub extern "C" fn foo() -> MyEnum {} with #[repr(C)] enum MyEnum { Variant(u32) } means that MyEnum::Variant might be used by some FFI caller.


Hoping that all of the above clearly illustrates the problem.

@traviscross
Copy link
Contributor

traviscross commented Jul 7, 2024

Regarding your first example, is it correct to say that you believe that this should not warn?:

// lib.rs

#[repr(C, u8)]
enum MyEnum {
    Variant { inner: u8 },
    //~^ WARN field `inner` is never read
}

#[no_mangle]
extern "C" fn some_function() -> MyEnum {
    MyEnum::Variant { inner: 0 }
}

...under the interpretation that the #[no_mangle] should act like #[used] and ensure that some_function ends up in and exported in the object file, and that some program could link this object and some C function could therefore call some_function, get a value of type MyEnum, and, under the RFC 2195 semantics, read the field in that variant?

(According to Godbolt, this program has always produced a warning, which isn't to say whether that's correct or not.)

@traviscross
Copy link
Contributor

traviscross commented Jul 7, 2024

Regarding your second example, is it correct to say that you believe that this should not warn:

// lib.rs

#[repr(transparent)]
pub struct Foo {
    //~^ WARN struct `Foo` is never constructed
    inner: u8,
}

impl Foo {
    pub fn print(&self) {
        println!("{}", self.inner)
    }
}

...because a downstream crate could do this?:

fn main() {
    let x: Foo = unsafe { core::mem::transmute(0u8) };
    x.print();
}

Note that this warning is in beta as well as nightly.

@chinedufn
Copy link
Contributor Author

chinedufn commented Jul 7, 2024

You are getting //~^ WARN field inner is never read because the function isn't public.

The following does not give a warning:

[toolchain]
# arbitrarily chosen nightly from before this regression
channel = "nightly-2024-03-01"
// src/lib.rs

// This DOES NOT give a warning (`nightly-2024-03-01`)

#[repr(C, u8)]
pub enum MyEnum {
    Variant { inner: u8 },
}

#[no_mangle]
pub extern "C" fn some_function() -> MyEnum {
    MyEnum::Variant { inner: 0 }
}

I also checked nightly-2018-03-01 and the code block above (... Variant { inner: u8 } ...) did not produce a warning.

So, this has all worked properly for at least the last 6 years until a recent regression.


For your second question about the #[repr(transparent)] struct:

According to the nomicon, "the goal of repr(transparent) is to make it possible to transmute between the single field and the struct/enum".

So, since #[repr(transparent)] exists to facilitate transmutations, if a type #[repr(transparent)] type is publicly accessible then its field should be considered alive.
#[repr(transparent)] signals that we intend for the type's field to be accessed in a way that rustc cannot see.

More simply:

  • the dead code lint should treat #[repr(transparent)] as #[used] if the type CAN be accessed outside of the crate
  • the dead code lint should not treat #[repr(transparent)] as #[used] if the type CAN NOT be accessed outside of the crate.

Note that your #[repr(transparent)] example warns in nightly-2024-07-01 but does not warn in nightly-2024-03-01 (I chose those two dates 2024-03-01 arbitrarily).


The following does not warn in nightly-2024-03-01 but does warn on nightly-2024-07-01.

// lib.rs

pub struct Foo {
    inner: u8,
}

impl Foo {
    pub fn print(&self) {
        println!("{}", self.inner)
    }
}

This is an improvement. inner should be considered dead here.


The following would solve the original issue:

  • if MyType is used as an argument or return value in a #[no_mangle] pub extern "C" my_func(..) -> .., then all of MyType's fields are alive

Then to fix #[repr(transparent)] liveness checks:

  • If #[repr(transparent) FooType is visible outside of the crate then FooType's field is alive

@traviscross
Copy link
Contributor

Regarding the first example, with your revision:

// lib.rs

#[repr(C, u8)]
pub enum MyEnum {
    Variant { inner: u8 },
}

#[no_mangle]
pub extern "C" fn some_function() -> MyEnum {
    MyEnum::Variant { inner: 0 }
}

This does not warn in the latest nightly either:

Playground link

So I'm unclear what you're trying to say here, and I'm still curious if you think the version I gave should warn, as that's spiritually similar to your original example that used a private module.


Regarding the second example:

// lib.rs

#[repr(transparent)]
pub struct Foo {
    //~^ WARN struct `Foo` is never constructed
    inner: u8,
}

impl Foo {
    pub fn print(&self) {
        println!("{}", self.inner)
    }
}

Since inner is not a public field, it seems valid to me for something within this crate to construct a value of type Foo with unsafe { core::mem::transmute::<u8, Foo>(0) }, but it doesn't seem valid for another crate to do that. You suggest that this should be valid:

So, since #[repr(transparent)] exists to facilitate transmutations, if a type #[repr(transparent)] type is publicly accessible then its field should be considered alive.

Can you point us at any documentation or earlier decisions that we've made that would suggest that this should be supported?

@chinedufn
Copy link
Contributor Author

chinedufn commented Jul 7, 2024

Oops, I meant:

// warns on 2024-07-01
// does not warn on 2024-03-01
mod private_mod {
    #[repr(C, u8)]
    pub enum MyEnum {
        Variant { inner: u8 },
    }

    #[no_mangle]
    pub extern "C" fn some_function() -> MyEnum {
        MyEnum::Variant { inner: 0 }
    }
}

The above warns in 2024-07-01 but not in 2024-03-01.


Regarding your question about #[no_mangle] extern "C" fn some_function() -> MyEnum (without the pub keyword)

  • it should warn if some_function is not reachable outside the crate
  • it should not warn if some_function is not reachable outside the crate

I don't not whether a non-pub #[no_mangle] extern "C" fn some_function() -> MyEnum is stripped from the final library.


I can't point to earlier documentation or decisions.

Your counterpoint is reasonable. I would have to think more deeply about this to know where I stand.

This is not a problem that I have faced. I do not currently have evidence of real code that would run into this #[repr(transparent)] warning.


I'd like to remain focused on #[repr(C)] types that are exposed in a pub extern "C' fn.

Reasons:

  • it can be solved independently of every other example that you've raised
  • it is the actual, real papercut that I am currently facing. I have had to change my code generation to account for this warning. I'd rather this issue remains focused on that so that we can fix it and I can remove the #[allow(unused)] from my code generation.

So, here is exactly what I would prefer this issue to remain focused on:

// This should not warn
mod private_mod {
    #[repr(C)]
    pub enum MyEnum {
        Variant(u8)
    }

    #[no_mangle]
    pub extern "C" fn some_function() -> MyEnum {
        MyEnum::Variant(5)
    }
}

If we agree that this should not produce a warning, then I'd like to move forward to selecting a solution.

If you believe that this should produce a warning, please state why. I am unclear on where you stand on this specific example.

@traviscross
Copy link
Contributor

traviscross commented Jul 8, 2024

Here's a comparison of the behaviors of some different cases:

Playground link

I would expect E1, E2, S1, and S2 to have the same behavior, whatever that is. And similarly, I'd expect E3, E4, S3, and S4 to have the same behavior.

I'd be interested to hear whether there are any reasons that the first set (E1, E2, S1, and S2) and the last set (E3, E4, S3, and S4) should have different behaviors from each other (e.g. perhaps related to RFC 2145).

I'm interested to hear what others think here about the situation and the proper behavior.

cc @petrochenkov @shepmaster @WaffleLapkin

@shepmaster
Copy link
Member

cc @mu001999

@shepmaster
Copy link
Member

Before reading the comments deeply, these are my opinions:

  1. These should all behave identically:
    enum MyEnum { Variant(u32) }
    enum MyEnum { Variant { name: u32 } }
    struct MyStruct(u32)
    struct MyStruct { name: u32 }
  2. #[repr(C)] should have no impact on the dead_code lint. repr(C) only controls layout. You can have repr(C) types that are not exported.
  3. extern "C" should have no impact on the dead_code lint. extern "..." only controls ABI. You can have extern "C" functions that are not exported.
  4. #[no_mangle] should probably have an impact on the dead_code lint. Note that #[no_mangle] fn some_function() {} causes some_function to be present in the assembly regardless of (a) debug / release mode, (b) if the function is pub or not, (c) if the function is extern "C" or not.

@mu001999
Copy link
Contributor

mu001999 commented Jul 8, 2024

There is a related issue (#126169) about the example in #126706 (comment)

@mu001999
Copy link
Contributor

mu001999 commented Jul 8, 2024

2. `#[repr(C)]` should have no impact on the `dead_code` lint. `repr(C)` only controls layout. You can have `repr(C)` types that are not exported.

@shepmaster IIUC, this is not true, there is some logic for repr(C) in the dead_code lint, you can search unconditionally_treat_fields_as_live in the code.

In short, fields will be marked live if there is a repr(C). But I don't know why this doesn't work for the examples in the comments.

@shepmaster
Copy link
Member

shepmaster commented Jul 8, 2024

IIUC, this is not true, there is some logic for repr(C) in the dead_code lint

You are right, I should have been more specific — I wish that the lint's logic did not involve repr at all, but it already did and then we extended that.

@mu001999
Copy link
Contributor

mu001999 commented Jul 8, 2024

@shepmaster oic, I agree with #126706 (comment). And in fact repr(C) implies more semantics.

@chinedufn
Copy link
Contributor Author

It seems like having #[no_mangle] make a #[repr(C)] type count as used is uncontroversial so far?

As in

  • IF #[no_mangle] fn some_function(arg: ReprCType1) -> ReprCType2
  • then: ReprCType1 and ReprCType2 would never emit a dead code warning

@mu001999 would you be willing to mentor me towards implementing this?

I could start by submitting a PR that either gets it working or gets close, and then you could review and answer any questions that I might leave for you in the PR?

Then if this issue reaches a clear consensus on the #[no_mangle] change being okay, we can merge that PR.

@shepmaster
Copy link
Member

It seems like having #[no_mangle] make a #[repr(C)] type count as used is uncontroversial so far?

This is not the mindset I would use: repr(C) should have no bearing here. #[no_mangle] should make the function it is attached to effectively public1. Everything else should fall out from that2 — since we cannot tell that the function is unused, then we cannot tell that the types used by the function are unused and they should not be reported by the lint.

Footnotes

  1. Now, should it actually count as truly public? I don't know. For example, a pub function with a non-pub return type triggers a private-in-public warning. Should marking the function as #[no_mangle] instead of pub also warn that the type is more public than the function?

  2. I actually wonder if this would allow us to roll back some of the earlier decisions about how repr(C) / repr(transparent) interact with the dead_code lint. That is, does treating #[no_mangle] as a usage site allow us to remove the special-casing of repr?

@traviscross
Copy link
Contributor

traviscross commented Jul 8, 2024

Now, should it actually count as truly public? I don't know. For example, a pub function with a non-pub return type triggers a private-in-public warning. Should marking the function as #[no_mangle] instead of pub also warn that the type is more public than the function?

It does seem unfortunate to have #[no_mangle] act as a separate way (than pub visibility) to make something crate-public. If I were reviewing code that relied on such a function being crate-public without it being crate-public according to visibility, I'd find that surprising and want that changed. I'm struggling to think of a good reason to do this, and intuitively, I kind of want to lint against it.

@chinedufn
Copy link
Contributor Author

chinedufn commented Jul 8, 2024

Shepmaster I'm not sure that we're on the same page.

Are you suggesting that once a type is exposed by a #[no_mangle] function, we should always
consider all of that type's fields as publicly accessible?

For instance, are you suggesting that the following should not produce a "field 0 is never read`" warning?

// This SHOULD warn that field 0 is never read
mod private_mod {
    pub enum MyEnum {
        Variant(Vec<u8>)
    }
    
    #[no_mangle]
    pub fn new_bar() -> MyEnum {
        MyEnum::Variant(vec![5])
    }
}

I am suggesting that the above code snippet should still produce a warning (of some sort .. maybe a different one ..),
under the idea that there is no reason to believe that field 0 is read.

If it is being read it's because the user is depending on an unspecified memory layout.
Maybe the warning would be about that. Not sure. I see this as a separate issue/topic that would require some design work.


The reason that #[repr(C)] is relevant is that the following should not give a warning: field 0 is never read:

// This SHOULD NOT warn that field 0 is never read
mod private_mod {
    #[repr(C)]
    pub enum MyEnumC {
        Variant(u32)
    }
    
    #[no_mangle]
    pub extern "C" fn new_bar() -> MyEnumC {
        MyEnumC::Variant(5)
    }
}

I am suggesting that in this second code snippet we have given enough of a signal for the linter to treat the u32 field as alive, since it can be read/used by a C caller.

So, it should treat the u32 field as alive.

The reason that #[repr(C)] is important is that it is a very strong signal that all of the fields are publicly accessible to a C caller.
When you combine that with a #[no_mangle] extern "C" fn some_function we are essentially saying that the types fields are all readable by a C caller, and visible to a C caller, so all of the fields are alive.


I'm treating #[repr(C)] + #[no_mangle] as a strong-enough signal of certain usage patterns that it should be a special case where the fields are obviously alive.

The other cases like having no repr at all, having #[repr(something_else)] seem to draw more debate/considerations, hence me wanting to treat them separately.

Of course I'm a bit biased towards carving out a solution for #[repr(C)] since it is the real papercut that I am facing in real code, but after seeing all of the discussion in some of the linked issues I do think it's reasonable to separate out the uncontroversial cases from the controversial ones so that some amount of progress can be made.


If you can point to where you disagree with the above then I think we'll be on the same page.

It seems like you've addressed something like this before, so feel free to link me to previous discussions and I'll happily read them.

@mu001999
Copy link
Contributor

mu001999 commented Jul 9, 2024

would you be willing to mentor me towards implementing this?

@chinedufn thanks, I am not the member of compiler team so I don't have the access to approve. But I will have a look if you open such a PR.


I think #[no_mangle] implies the function may be called by external code, but #[repr(C)] implies another thing that the type will also be used by external code but not only constructed. Without #[repr(C)], the type may only be constructed externally and used in Rust crate, this allows us to lint such never-read fields.

So #[no_mangle] + #[repr(C)] suppresses the lint for never-read fields looks good to me.

And for current dead code lint, #[no_mangle] would mark the type live and then #[repr(C)] would mark all the fields live. I think it's a regression and fixing it for private Enums may be good enough.

@chinedufn
Copy link
Contributor Author

chinedufn commented Jul 31, 2024

I'm not entirely sure, but it seems like #128404 will close this issue.

Conversation that led to that PR -> #127104 (comment)

@mu001999
Copy link
Contributor

mu001999 commented Jul 31, 2024

I'm not entirely sure, but it seems like #128404 will close this issue.

Conversation that led to that PR -> #127104 (comment)

that's not true, those changes are not in 1.79

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-bug Category: This is a bug. L-dead_code Lint: dead_code S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants