Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add CStr method that accepts any slice containing a nul-terminated string #94984

Merged
merged 1 commit into from
Mar 19, 2022

Conversation

ericseppanen
Copy link
Contributor

@ericseppanen ericseppanen commented Mar 16, 2022

I haven't created an issue (tracking or otherwise) for this yet; apologies if my approach isn't correct. This is my first code contribution.

This change adds a member fn that converts a slice into a CStr; it is intended to be safer than from_ptr (which is unsafe and may read out of bounds), and more useful than from_bytes_with_nul (which requires that the caller already know where the nul byte is).

The reason I find this useful is for situations like this:

let mut buffer = [0u8; 32];
unsafe {
    some_c_function(buffer.as_mut_ptr(), buffer.len());
}
let result = CStr::from_bytes_with_nul(&buffer).unwrap();

This code above returns an error with kind = InteriorNul, because from_bytes_with_nul expects that the caller has passed in a slice with the NUL byte at the end of the slice. But if I just got back a nul-terminated string from some FFI function, I probably don't know where the NUL byte is.

I would wish for a CStr constructor with the following properties:

  • Accept &[u8] as input
  • Scan for the first NUL byte and return the CStr that spans the correct sub-slice (see future note below).
  • Return an error if no NUL byte is found within the input slice

I asked on Zulip whether this sounded like a good idea, and got a couple of positive-sounding responses from @joshtriplett and @AzureMarker.

This is my first draft, so feedback is welcome.

A few issues that definitely need feedback:

  1. Naming. @joshtriplett called this from_bytes_with_internal_nul on Zulip, but after staring at all of the available methods, I believe that this function is probably what end users want (rather than the existing fn from_bytes_with_nul). Giving it a simpler name (from_bytes) implies that this should be their first choice.
  2. Should I add a similar method on CString that accepts Vec<u8>? I'd assume the answer is probably yes, but I figured I'd try to get early feedback before making this change bigger.
  3. What should the error type look like? I made a unit struct since CStr::from_bytes can only fail in one obvious way, but if I need to do this for CString as well then that one may want to return FromVecWithNulError. And maybe that should dictate the shape of the CStr error type also?

Also, cc @poliorcetics who wrote #73139 containing similar fns.

@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 16, 2022
@m-ou-se
Copy link
Member

m-ou-se commented Mar 16, 2022

I think this should have a more detailed name than from_bytes. It's not clear from the name that the result will not include all the bytes, but only the part until the first nul byte. from_bytes_until_nul maybe?

@poliorcetics
Copy link
Contributor

I agree with @m-ou-se, especially since this is mostly intended for FFI, a very clear name is better.

The alternative for CString would be nice too, but that can be done in a separate PR (and marked as TODO in the tracking issue)

You said you didn't know how to create a tracking issue. For this you can simply copy an existing one and rework its informations to be about this PR :)

@m-ou-se
Copy link
Member

m-ou-se commented Mar 16, 2022

You said you didn't know how to create a tracking issue. For this you can simply copy an existing one and rework its informations to be about this PR :)

We have a template for this. :)

@ericseppanen
Copy link
Contributor Author

ericseppanen commented Mar 16, 2022

The tracking issue template says

If the new feature is small, it may be fine to skip the RFC process. In that
case, you can use use issue = "none" in your initial implementation PR. The
reviewer will ask you to open a tracking issue if they agree your feature can be
added without an RFC.

Can I take the above comments as approval to go ahead and create one now?

edit: I went ahead and created one: #95027.

@ericseppanen
Copy link
Contributor Author

ericseppanen commented Mar 16, 2022

Updated since the first draft:

  • renamed to from_bytes_until_nul (and FromBytesUntilNulError)
  • fixed Error type (impl Error + Display, add private () member)
  • improved docs

I think this is ready for (another) review.

@m-ou-se
Copy link
Member

m-ou-se commented Mar 17, 2022

the CStr that spans the correct sub-slice

It's important to note that while &CStr is currently a wide pointer (pointer+length), the plan is to change that to be just a single pointer, like a const char * in C, as soon as we have the language features to do so. When that happens, from_ptr will become O(1), and len will become O(N). Then, this from_bytes_until_nul will basically become a version of the trivial from_ptr that in addition scans if there's really a terminating nul byte, but does not return that length. Calling len after from_bytes_until_nul will have to scan the bytes again.

@Mark-Simulacrum
Copy link
Member

r=me with commits squashed into one. I also updated the tracking issue with @m-ou-se's concern.

@ericseppanen
Copy link
Contributor Author

I also updated the tracking issue with @m-ou-se's concern.

I had read that message as informative, just warning that the word "span" may no longer apply in the future. Can you clarify what is the remaining concern?

Even in a future where CStr is just a pointer, I think the need for this function still exists (because from_bytes_with_nul would still return an error if the nul isn't at the end of the slice).

This adds a member fn that converts a slice into a CStr; it is intended
to be safer than from_ptr (which is unsafe and may read out of bounds),
and more useful than from_bytes_with_nul (which requires that the caller
already know where the nul byte is).

feature gate: cstr_from_bytes_until_nul

Also add an error type FromBytesUntilNulError for this fn.
@m-ou-se
Copy link
Member

m-ou-se commented Mar 18, 2022

Once we 'fix' &CStr, this function basically just does a strlen() and then throws away the result. I'm not saying it'd be useless, but it doesn't fit as nicely into the API as it does today. So we should at least think about that a bit more before stabilizing.

@ericseppanen
Copy link
Contributor Author

Once we 'fix' &CStr, this function basically just does a strlen() and then throws away the result. I'm not saying it'd be useless, but it doesn't fit as nicely into the API as it does today. So we should at least think about that a bit more before stabilizing.

We don't need the length, we need the safety (a guarantee that there must be a null byte in that slice somewhere). I don't see any alternative.

@ericseppanen
Copy link
Contributor Author

@Mark-Simulacrum I pushed a rebased + squashed version.

(Apologies, the rebase makes github's "compare" broken-- I didn't realize that would happen.)

@Mark-Simulacrum
Copy link
Member

@bors r+ rollup

Yes, rebase fixing compare is fine and expected.

@bors
Copy link
Contributor

bors commented Mar 18, 2022

📌 Commit d5fe4ca has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 18, 2022
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Mar 19, 2022
…k-Simulacrum

add `CStr` method that accepts any slice containing a nul-terminated string

I haven't created an issue (tracking or otherwise) for this yet; apologies if my approach isn't correct. This is my first code contribution.

This change adds a member fn that converts a slice into a `CStr`; it is intended to be safer than `from_ptr` (which is unsafe and may read out of bounds), and more useful than `from_bytes_with_nul` (which requires that the caller already know where the nul byte is).

The reason I find this useful is for situations like this:
```rust
let mut buffer = [0u8; 32];
unsafe {
    some_c_function(buffer.as_mut_ptr(), buffer.len());
}
let result = CStr::from_bytes_with_nul(&buffer).unwrap();
```

This code above returns an error with `kind = InteriorNul`, because `from_bytes_with_nul` expects that the caller has passed in a slice with the NUL byte at the end of the slice. But if I just got back a nul-terminated string from some FFI function, I probably don't know where the NUL byte is.

I would wish for a `CStr` constructor with the following properties:
- Accept `&[u8]` as input
- Scan for the first NUL byte and return the `CStr` that spans the correct sub-slice (see [future note below](rust-lang#94984 (comment))).
- Return an error if no NUL byte is found within the input slice

I asked on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/CStr.20from.20.26.5Bu8.5D.20without.20knowing.20the.20NUL.20location.3F) whether this sounded like a good idea, and got a couple of positive-sounding responses from `@joshtriplett` and `@AzureMarker.`

This is my first draft, so feedback is welcome.

A few issues that definitely need feedback:

1. Naming. `@joshtriplett` called this `from_bytes_with_internal_nul` on Zulip, but after staring at all of the available methods, I believe that this function is probably what end users want (rather than the existing fn `from_bytes_with_nul`). Giving it a simpler name (**`from_bytes`**) implies that this should be their first choice.
2. Should I add a similar method on `CString` that accepts `Vec<u8>`? I'd assume the answer is probably yes, but I figured I'd try to get early feedback before making this change bigger.
3. What should the error type look like? I made a unit struct since `CStr::from_bytes` can only fail in one obvious way, but if I need to do this for `CString` as well then that one may want to return `FromVecWithNulError`. And maybe that should dictate the shape of the `CStr` error type also?

Also, cc `@poliorcetics` who wrote rust-lang#73139 containing similar fns.
This was referenced Mar 19, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 19, 2022
Rollup of 6 pull requests

Successful merges:

 - rust-lang#92519 (Use verbatim paths for `process::Command` if necessary)
 - rust-lang#92612 (Update stdlib for the l4re target)
 - rust-lang#92663 (Implement `Write for Cursor<[u8; N]>`, plus `A: Allocator` cursor support)
 - rust-lang#93263 (Consistently present absent stdio handles on Windows as NULL handles.)
 - rust-lang#93692 (keyword_docs: document use of `in` with `pub` keyword)
 - rust-lang#94984 (add `CStr` method that accepts any slice containing a nul-terminated string)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 30b4182 into rust-lang:master Mar 19, 2022
@rustbot rustbot added this to the 1.61.0 milestone Mar 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants