What are the special magic rules around `malloc`? #535

RalfJung · 2024-10-06T06:56:56Z

Taken from #534:

// use a mutable reference to prevent the MIR opt from happening
#[no_mangle]
pub fn src(x: &mut &u8) -> impl Sized {
    let y = **x;
    let mut z = Box::new(0);
    // a bunch of code that operates on the `Box`, however, 
    // nothing else can potentially access the underlying `u8`
    // that's behind the double reference besides the `__rust_alloc` call.
    

    // optimizable to `true`?
    **x == y
}

Currently, LLVM doesn't do the second optimization. However, it does perform it if you manually set System to be the global allocator: https://rust.godbolt.org/z/a77PWjeKE ¹. This is due to this line, which is used by their GVN pass.
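For readers who want to reproduce the setup described above, here is a minimal sketch of "manually setting System to be the global allocator". The return type and the `*z += 1` line are illustrative assumptions (the original returns `impl Sized` and elides the code that uses the `Box`), and `#[inline(never)]` stands in for `#[no_mangle]` so the snippet compiles the same way on every edition. Whether the final comparison is actually folded to `true` depends on the compiler version and optimization level, so check the generated code rather than taking this sketch as proof.

```rust
use std::alloc::System;

// Route every heap allocation (including `Box::new`) straight to the
// platform allocator instead of the default global allocator.
#[global_allocator]
static GLOBAL: System = System;

// Same shape as the function quoted above. Returning the `Box` keeps the
// allocation observable so it is not trivially removed.
#[inline(never)]
pub fn src(x: &mut &u8) -> (bool, Box<i32>) {
    let y = **x;
    let mut z = Box::new(0);
    *z += 1; // stand-in for "a bunch of code that operates on the `Box`"

    // If the allocation is known to be fresh, nothing above can have
    // changed `**x`, so this comparison is a candidate for folding to `true`.
    (**x == y, z)
}
```

Calling it as `let v = 7u8; let mut r = &v; let (same, z) = src(&mut r);` yields `same == true` and `*z == 1` at run time regardless of whether the optimization fires; the optimization only changes the emitted code.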

There are clearly special magic rules applying specifically for malloc that mean that its memory must be truly fresh for the Abstract Machine, and cannot be part of any previously existing stack/heap/other allocation. This is "fine" as long as malloc is called via FFI and all the state it works with is completely hidden from the current compilation unit. It becomes rather incoherent if there is ever a chance of malloc itself being inlined into surrounding code, or of it exchanging data with surrounding code via global state, so we had better have rules in place against things like that. I think we should say that malloc is reserved to be provided by the underlying runtime system, and that it must be called via FFI in a way that makes inlining impossible.

Note that this is separate from Rust's #[global_allocator] attribute, which does not get all the same magic that malloc gets. See #442 for discussion of the semantics of that attribute.

¹ You also get the malloc -> calloc transformation for types other than these hardcoded ones if you manually set System to be the global allocator.


VorpalBlade · 2024-10-06T07:42:48Z

The issue with this magic that I see is if you implement malloc itself in Rust.

If it is in a completely different cdylib/staticlib, that is probably still fine(?)
I'm not sure what happens if you implement a libc that both provides malloc and uses that same malloc itself. This is actually required: some functions in libc, such as strdup (and many more), are documented to return allocations from malloc that should be freed with free.
If it is part of the same compilation graph (as is usually the case for embedded, for example), you might run into issues(?).

Another issue is LTO or even cross-language LTO.
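To make the strdup point concrete, here is a small sketch of that contract seen from Rust: the pointer returned by `strdup` comes from the C library's `malloc` and has to be released with `free`. It assumes a POSIX-like libc is linked (as Rust's `std` does on Unix targets) and a toolchain that accepts `unsafe extern` blocks; the helper name `duplicate_via_libc` is made up for illustration.

```rust
use std::ffi::{c_char, c_void, CStr, CString};

// Declarations of the C library functions (assumed to come from the
// platform libc that `std` already links against on Unix targets).
unsafe extern "C" {
    fn strdup(s: *const c_char) -> *mut c_char;
    fn free(p: *mut c_void);
}

/// Hypothetical helper: duplicates `input` with `strdup`, copies the result
/// into an owned Rust `String`, and releases the C allocation with `free`.
pub fn duplicate_via_libc(input: &str) -> Option<String> {
    // Fails (returns `None`) if `input` contains an interior NUL byte.
    let c_input = CString::new(input).ok()?;

    // SAFETY: `c_input` is a valid NUL-terminated string during the call.
    let copy = unsafe { strdup(c_input.as_ptr()) };
    if copy.is_null() {
        return None; // `strdup` signals allocation failure with null
    }

    // SAFETY: `copy` is non-null and NUL-terminated, as produced by `strdup`.
    let owned = unsafe { CStr::from_ptr(copy) }
        .to_string_lossy()
        .into_owned();

    // SAFETY: `copy` was allocated by libc's `malloc` (via `strdup`), so it
    // is released with `free`, exactly once, and not used afterwards.
    unsafe { free(copy.cast::<c_void>()) };

    Some(owned)
}
```

A libc written in Rust would sit on the other side of this contract: its `strdup` has to obtain memory from the very `malloc` it also exports, which is the situation the comment above is asking about.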

RalfJung · 2024-10-06T09:41:05Z

I agree that this magic is potentially problematic. I don't know if LLVM has a way to disable it though.

VorpalBlade · 2024-10-06T14:26:57Z

> I agree that this magic is potentially problematic. I don't know if LLVM has a way to disable it though.

Fair enough. But I do believe Rust/LLVM need an answer for how to properly handle the above scenarios. How do I do these things soundly in Rust? Can I or can I not use LTO when making a libc, for example?

Also, as I understand it, any soundness issue that cannot be traced to an unsafe block (or an unsafe attribute, or unsafe command-line flags, though I don't think those exist yet) is a compiler bug? In this case I guess the unsafe bit is the no-mangle export of a function called malloc, but that feels like a cop-out and would make it really difficult to write a libc in Rust.

Diggsey · 2024-10-06T15:30:06Z

> There are clearly special magic rules applying specifically for malloc that mean that its memory must be truly fresh for the Abstract Machine, and cannot be part of any previously existing stack/heap/other allocation.

Could I dig a bit more into why this is important? Could we avoid such issues by having the malloc implementation explicitly "carve out" an existing allocation and give it back to the Abstract Machine, minting a new allocation? I imagine this "carving out" would come with significant limitations, such as no access being allowed to that region of memory until it is returned.

In this model, the malloc implementation accessing the memory after carving it out would be UB.
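As a rough illustration of what "carving out" could look like on the implementation side, here is a sketch of a bump arena that hands out non-overlapping ranges of one pre-existing buffer and never touches a range again after handing it out. All names (`Arena`, `carve`, `ARENA_SIZE`) are hypothetical. Today's Rust has no operation that "mints a new allocation" for the Abstract Machine, so the returned pointers still belong to the arena's allocation; the sketch only shows the discipline such a model would demand of the allocator.

```rust
use std::cell::UnsafeCell;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

pub const ARENA_SIZE: usize = 1024;

/// Hypothetical arena: one existing allocation that gets carved up.
pub struct Arena {
    bytes: UnsafeCell<[u8; ARENA_SIZE]>,
    next: AtomicUsize, // offset of the first byte not yet handed out
}

// SAFETY: the arena never reads or writes `bytes` itself, and every carved
// range is handed to exactly one caller.
unsafe impl Sync for Arena {}

impl Arena {
    pub const fn new() -> Self {
        Arena {
            bytes: UnsafeCell::new([0; ARENA_SIZE]),
            next: AtomicUsize::new(0),
        }
    }

    /// Carves `size` bytes aligned to `align` out of the arena, or returns
    /// `None` if `align` is not a power of two or the arena is exhausted.
    pub fn carve(&self, size: usize, align: usize) -> Option<NonNull<u8>> {
        if !align.is_power_of_two() {
            return None;
        }
        let base = self.bytes.get().cast::<u8>();
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            // Round the absolute address up to the requested alignment.
            let addr = (base as usize).checked_add(current)?;
            let aligned = addr.checked_add(align - 1)? & !(align - 1);
            let start = aligned - base as usize;
            let end = start.checked_add(size)?;
            if end > ARENA_SIZE {
                return None;
            }
            // Claim [start, end) atomically; retry if another thread raced us.
            match self.next.compare_exchange_weak(
                current,
                end,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                // SAFETY: `start <= end <= ARENA_SIZE`, so the offset stays
                // within the buffer (or one past its end when `size == 0`).
                Ok(_) => return NonNull::new(unsafe { base.add(start) }),
                Err(observed) => current = observed,
            }
        }
    }
}
```

Note that the arena never frees or reuses a range. Under the model proposed above, any later access by the arena to a carved-out range would be UB until that range is returned, and this sketch sidesteps the question by never accessing it at all.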

RalfJung mentioned this issue on Oct 6, 2024 in "What memory is the Global allocator allowed to access" #534 (closed).
