Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect unsafe code example in standard library #77220

Closed
TrolledWoods opened this issue Sep 26, 2020 · 19 comments · Fixed by #77385
Closed

Incorrect unsafe code example in standard library #77220

TrolledWoods opened this issue Sep 26, 2020 · 19 comments · Fixed by #77385
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@TrolledWoods
Copy link
Contributor

The safety clause on the Vec::set_len function says that:

  • new_len must be less than or equal to capacity().
  • The elements at old_len..new_len must be initialized.

However, the code example for std::ptr::copy violates the second rule

use std::ptr;

unsafe fn from_buf_raw<T>(ptr: *const T, elts: usize) -> Vec<T> {
    let mut dst = Vec::with_capacity(elts);
    dst.set_len(elts);
    ptr::copy(ptr, dst.as_mut_ptr(), elts);
    dst
}

The set_len is called before the values are initialized with the copy.

This could be fixed by either making the safety clause for set_len more inclusive, so that as long as you don't use the vector before initializing the values it's considered safe, or by switching the dst.set_len(elts); and ptr::copy(ptr, dst.as_mut_ptr(), alts); lines around in the example.

@leonardo-m
Copy link

Eventually Rust will need pre-conditions and post-conditions to verify statically and automatically that (most) invariants of unsafe functions hold :-)

@TrolledWoods
Copy link
Contributor Author

Isn't the point of unsafe code that you yourself uphold the invariants?

@lcnr
Copy link
Contributor

lcnr commented Sep 26, 2020

yeah, afaict the safety conditions of set_len currently do not guarantee that set_len doesn't look at any elements, so this would be an allowed implementation of set_len afaict.

fn set_len<T>(v: &mut Vec<T>, len: usize) {
    v.len = len;
    if let Some(last) = v.last() {
        let _x: ManuallyDrop<T> = unsafe {
            std::ptr::read(last as *const T as *const ManuallyDrop<T>)
        };
        // put copy of a `T` on the stack, 
        // requires that that value satisfies the language invariants of `T`
    }

We "obviously" do not do this, but I guess cc @rust-lang/wg-unsafe-code-guidelines on what's the right wording here.

@leonardo-m
Copy link

leonardo-m commented Sep 26, 2020

Currently yes, but surely that isn't a requirement. Having some mechanical ways to avoid some bugs in unsafe{} sections will be very good, even if it doesn't cover everything. There are people working on the formalization of (unsafe) Rust semantics that have this purposes too.

And in some cases it's not that hard, is well within the capabilities of other research languages, like here:
https://raw.github.com/rust-lang/rust/f976b4641b963dbcc1ed4edb155fce5db28807b1/library/alloc/src/vec.rs

The second SAFETY note says:

    // SAFETY: A `Vec`'s pointer is always aligned properly, and
    // the alignment the array needs is the same as the items.
    // We checked earlier that we have sufficient items.
    // The items will not double-drop as the `set_len`
    // tells the `Vec` not to also drop them.

Zig language gives pointers a static alignment annotation in the type system. And probably the annotations of a language like Why3 are enough for the other part.

@TrolledWoods
Copy link
Contributor Author

@leonardo-m I think this discussion is irrelevant to the issue. What we should be discussing is what to do; change set_len safety guarantees, because "We 'obviously' do not do this", or change the example. I'm leaning towards making the safety guarantees for set_len less strict, because that would allow for more flexible usage of the function.

@TrolledWoods
Copy link
Contributor Author

However, changing the safety on the set_len would break the invariants of Vec, and if you want to do crazy things with a vector you can use into_raw_parts and from_raw_parts. In my experience rust doesn't like invariants to be broken even for just short periods of time inside unsafe code, so just fixing the example may be a better choice that fits better with how rust works.

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 26, 2020
@Lokathor
Copy link
Contributor

The real heart of things is that code within the module is allowed to break the safety invariant as long as they don't break the validity invariant. As long as things are all put back together properly by the end of the function.

@TrolledWoods
Copy link
Contributor Author

A code example is not within the module

@Lokathor
Copy link
Contributor

Ah my mistake! I thought this was a vector construction method on the vec type.

If this is anywhere else at all then the two lines need to be swapped so that set_len comes after the initialization.

@RalfJung
Copy link
Member

What we should be discussing is what to do; change set_len safety guarantees, because "We 'obviously' do not do this", or change the example. I'm leaning towards making the safety guarantees for set_len less strict, because that would allow for more flexible usage of the function.

Exactly.

Give that Vec already makes plenty of guarantees to unsafe code, I think it would be reasonable to say that set_len itself is safe no matter which length you set, but before you use the vector again (i.e., before you call any vector operation, including drop), you must ensure that the given invariants hold.

However, to make the example code correct, we'd additionally have to say that as_mut_ptr is always safe to call, despite the "do not call any other operation" in set_len.

All of this is a roundabout way to say that set_len can violate the safety invariant (or "library invariant") of the type, and that as_mut_ptr is okay to call even when the safety invariant is currently violated.

@lcnr
Copy link
Contributor

lcnr commented Sep 27, 2020

I think it would be reasonable to say that set_len itself is safe no matter which length you set, but before you use the vector again (i.e., before you call any vector operation, including drop), you must ensure that the given invariants hold.

Note that we currently have a debug assert in set_len for len <= capacity, so setting the len to something greater than the capacity can currently panic which probably can cause unsoundness as it would be unexpected.

If we want to change the safety guarantees here we probably should remove that debug_assert (even if they aren't actually part of the std used with rustup afaik)

@RalfJung
Copy link
Member

@lcnr good point. Or we could say that a length larger than the capacity is insta-UB (library UB).

@TrolledWoods
Copy link
Contributor Author

If get_mut_ptr is always safe even if invariants are broken by set_len, then shouldn't get_ptr, len, capacity also be safe in those situations?

@TrolledWoods
Copy link
Contributor Author

Not sure if it's worth the complexity to have all these edge cases when into_raw_parts and from_raw_parts are simpler alternatives. Even if it's fine for Vec, what about other collections? Should those also have edge cases? I think it would get messy very fast.

@RalfJung
Copy link
Member

If get_mut_ptr is always safe even if invariants are broken by set_len, then shouldn't get_ptr, len, capacity also be safe in those situations?

Yes, probably.

Even if it's fine for Vec, what about other collections? Should those also have edge cases? I think it would get messy very fast.

Vec is special in that it already makes a bunch of extra guarantees.

@scottmcm
Copy link
Member

scottmcm commented Oct 1, 2020

Is needing to move the set_len later painful enough in practice to be worth the more complicated soundness requirements on it? Any length increase would need to be after the population for panic-safety in general anyway (of course a memmove as here is fine, but in other places that's not always obvious).

@Lokathor
Copy link
Contributor

Lokathor commented Oct 1, 2020

We should just move the line in the example. Complicating the method's rules would be a bad time.

@scottmcm
Copy link
Member

scottmcm commented Oct 1, 2020

See "Code 15" this paper for another example of "calling set_let before populating is bad": https://arxiv.org/pdf/2003.03296.pdf -- and then "calling set_let after taking is bad" in "Code 20".

@RalfJung
Copy link
Member

RalfJung commented Oct 1, 2020

We should just move the line in the example. Complicating the method's rules would be a bad time.

No strong opinion from my side, I just didn't want to unnecessarily exclude existing unsafe code that is using set_len correctly (such as the example, in fact).

JohnTitor added a commit to JohnTitor/rust that referenced this issue Oct 1, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Oct 1, 2020
@bors bors closed this as completed in 20202da Oct 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants