ACP: BufReader::peek #417

lolbinarycat · 2024-07-24T16:01:15Z

Proposal

Problem statement

It is often desirable to look ahead a small amount in an unseekable stream, such as a unix pipe.

Motivating examples or use cases

The main use case is inspecting the magic number of a file, as is being attempted here, but it could also be useful for some simple parsers.

Solution sketch

Add a method `BufReader::peek` with the signature `fn peek(&mut self, n: usize) -> io::Result<&[u8]>`.

The method would work as follows:

1. Clear any consumed bytes from the front of the buffer.
2. Issue `read` calls on the underlying `Read` object to fill the buffer until it reaches any of: the requested length, the `BufReader`'s capacity, or end of file.
3. Return a slice of the refilled buffer.
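The three steps can be sketched outside std as follows. `BufReader`'s fields are private, so this uses a hypothetical `PeekReader` type that mirrors the idea (a fixed-capacity buffer with `pos`/`filled` cursors). It is an illustration of the proposal under those assumptions, not the std implementation.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for BufReader's internals: `buf[pos..filled]`
/// holds bytes that were read but not yet consumed.
pub struct PeekReader<R> {
    inner: R,
    buf: Box<[u8]>,
    pos: usize,
    filled: usize,
}

impl<R: Read> PeekReader<R> {
    pub fn with_capacity(capacity: usize, inner: R) -> Self {
        PeekReader { inner, buf: vec![0; capacity].into_boxed_slice(), pos: 0, filled: 0 }
    }

    /// Sketch of the proposed `peek(n)`.
    pub fn peek(&mut self, n: usize) -> io::Result<&[u8]> {
        // Step 1: drop consumed bytes by moving the unconsumed ones to the front.
        self.buf.copy_within(self.pos..self.filled, 0);
        self.filled -= self.pos;
        self.pos = 0;

        // Step 2: keep reading until n bytes are buffered, the buffer is full, or EOF.
        let target = n.min(self.buf.len());
        while self.filled < target {
            match self.inner.read(&mut self.buf[self.filled..]) {
                Ok(0) => break, // EOF
                Ok(k) => self.filled += k,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }

        // Step 3: return up to n bytes; the slice is short at EOF or if n > capacity.
        Ok(&self.buf[..self.filled.min(n)])
    }

    /// Mark `amt` bytes as consumed (peek itself never consumes).
    pub fn consume(&mut self, amt: usize) {
        self.pos = (self.pos + amt).min(self.filled);
    }
}
```

Each `read` call targets the whole spare capacity rather than exactly `n` bytes, which matches "fill the buffer" in step 2 but is a design choice of this sketch.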

Alternatives

1. Give it the same signature as `Read::read` (adds an extra copy, but has nice parity).
2. A `Peek` trait that abstracts over all streams that can do a limited lookahead.
3. Instead of limiting the lookahead to the capacity of the buffer, automatically grow the buffer if a larger lookahead is requested (less performant and more complex).
4. Leave this up to other crates (this would require those crates to reimplement basically all of `BufReader`).

Links and related work

The original thread is linked above.

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capacity becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

We think this problem seems worth solving, and the standard library might be the right place to solve it.
We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.


the8472 · 2024-07-24T16:18:53Z

This doesn't differ much from fill_buf, so what's the important distinguishing factor here?

lolbinarycat · 2024-07-24T16:39:57Z

Using `peek` is more robust since it will perform multiple reads if needed, similar to `read_exact`, but without returning an error at end of file. This is important for properly handling line-buffered ptys and other edge cases.
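A rough illustration of the difference: in the current std implementation, `fill_buf` on an empty `BufReader` issues a single read on the inner reader, so a source that delivers data in pieces (as pipes or line-buffered ptys may) can expose fewer bytes than the caller needs even though more are coming. The `first_fill` helper is hypothetical, and `Read::chain` merely simulates a source that hands over `ab` and `cdef` in separate reads.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: returns the bytes exposed by one `fill_buf` call
/// on a freshly created BufReader.
pub fn first_fill<R: Read>(inner: R, capacity: usize) -> io::Result<Vec<u8>> {
    let mut reader = BufReader::with_capacity(capacity, inner);
    Ok(reader.fill_buf()?.to_vec())
}
```

With the chained source, the first `fill_buf` only sees `ab`, even though the buffer has room for all six bytes.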

(Side note: when looking for a function to compare it to, I was surprised that Rust doesn't appear to have an equivalent of Go's `io.ReadFull`, which performs a similar task of doing multiple underlying reads, but doesn't error on end of file like `read_exact`.)
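For reference, Go's `io.ReadFull` keeps reading until the buffer is full; if EOF arrives after a partial read it returns the byte count together with `io.ErrUnexpectedEOF`. A Rust helper with the behavior described in the side note (loop on `read`, stop quietly at EOF, report the count) fits in a few lines. `read_full` is a hypothetical name, not a std function.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper (not in std): read until `buf` is full or EOF is hit,
/// returning how many bytes were read instead of erroring on a short read.
pub fn read_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut total = 0;
    while total < buf.len() {
        match reader.read(&mut buf[total..]) {
            Ok(0) => break, // EOF: report the short count
            Ok(n) => total += n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(total)
}
```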

the8472 · 2024-07-24T17:13:00Z

> clear from the front of the buffer any consumed bytes

So, backshifting? Yeah, people have asked for this.

Though I think a backshift-and-read-once-more function might be a possible middle ground. If the read doesn't read enough you can call it again without having to fill the entire buffer.
That could help with variable-length messages where the caller doesn't know how long it's going to be.
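A minimal sketch of those two building blocks, assuming a hypothetical `Buffered` type with `backshift` and `read_more` methods (illustrative names, not a std API). `read_more` moves unconsumed bytes to the front and then performs exactly one read into the freed space; the caller loops until it has a complete variable-length message, here a newline-terminated line.

```rust
use std::io::{self, Read};

/// Hypothetical buffered reader exposing backshift / read_more.
pub struct Buffered<R> {
    inner: R,
    buf: Box<[u8]>,
    pos: usize,    // start of unconsumed data
    filled: usize, // end of valid data
}

impl<R: Read> Buffered<R> {
    pub fn with_capacity(cap: usize, inner: R) -> Self {
        Buffered { inner, buf: vec![0; cap].into_boxed_slice(), pos: 0, filled: 0 }
    }

    /// Unconsumed bytes currently in the buffer.
    pub fn buffer(&self) -> &[u8] {
        &self.buf[self.pos..self.filled]
    }

    /// Move unconsumed bytes to the front, freeing space at the end.
    pub fn backshift(&mut self) {
        self.buf.copy_within(self.pos..self.filled, 0);
        self.filled -= self.pos;
        self.pos = 0;
    }

    /// Backshift, then issue exactly one read into the spare capacity.
    /// Returns the number of new bytes (0 means EOF or a full buffer).
    pub fn read_more(&mut self) -> io::Result<usize> {
        self.backshift();
        let n = self.inner.read(&mut self.buf[self.filled..])?;
        self.filled += n;
        Ok(n)
    }

    pub fn consume(&mut self, amt: usize) {
        self.pos = (self.pos + amt).min(self.filled);
    }
}

/// Caller-side loop: length of the next line including b'\n',
/// or None if EOF or a full buffer is reached first.
pub fn peek_line_len<R: Read>(r: &mut Buffered<R>) -> io::Result<Option<usize>> {
    loop {
        if let Some(i) = r.buffer().iter().position(|&b| b == b'\n') {
            return Ok(Some(i + 1));
        }
        if r.read_more()? == 0 {
            return Ok(None);
        }
    }
}
```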

kennytm · 2024-07-24T17:14:42Z

> a Peek trait that abstracts over all streams that can do a limited lookahead

The other obvious `Peek` candidates are `TcpStream` and `UdpSocket`, but I think these have quite different semantics from the current suggestion.

> instead of limiting the lookahead to the capacity of the buffer, automatically grow the buffer if a larger lookahead is requested (less performant and more complexity)

None of the `BufReader` methods resize the buffer. The buffer is now actually implemented as a `Box<[MaybeUninit<u8>]>` rather than a `Vec<u8>`, so it is clearly intended to stay unchanged.
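The fixed size is observable through the stable `BufReader::with_capacity` and `BufReader::capacity` methods: reading far more data than the capacity through the reader leaves the reported capacity unchanged. The `capacity_after_reading` helper is only for illustration.

```rust
use std::io::{BufReader, Read};

/// Reads all of `data` through a BufReader of capacity `cap` and returns
/// (bytes read, capacity reported afterwards).
pub fn capacity_after_reading(data: &[u8], cap: usize) -> (usize, usize) {
    let mut reader = BufReader::with_capacity(cap, data);
    let mut out = Vec::new();
    reader.read_to_end(&mut out).unwrap();
    (out.len(), reader.capacity())
}
```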

the8472 · 2024-07-24T17:50:04Z

> The buffer is actually now implemented as a `Box<[MaybeUninit<u8>]>` rather than a `Vec<u8>`, so clearly it is intended to be staying unchanged.

It used to be a `Vec`, so that's more of an implementation detail and could be changed. But then we should have explicit resize methods; introducing resizing implicitly would be odd.

kennytm · 2024-07-24T18:06:55Z

> the method would work as follows:
>
> 1. clear from the front of the buffer any consumed bytes
> 2. do read calls on the underlying Read object to fill the buffer until it reaches any of: the length requested, the BufReader's capacity, or end of file
> 3. return the slice of this new buffer

I think the peek(n) method should instead do the following:

1. If `self.buffer().len() >= n`, just return `buffer[..n]`.
2. Otherwise, clear the consumed bytes (memmove the buffer to the beginning), and then refill the buffer.
3. Return `buffer[..min(n, capacity)]`.

Step 1 ensures that if the buffer already holds enough bytes to peek `n`, the buffer position is not modified, which preserves the performance of `seek_relative(-1)`.

Step 2 may refill the buffer up to capacity or up to `n`. I prefer refilling up to capacity because that's also what the normal `fill_buf()` method does.
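A sketch of this revised algorithm, again on a hypothetical `Reader` type since `BufReader`'s fields are private. Two assumptions go beyond the comment: step 2 keeps reading (into the whole spare capacity) until `n` bytes are available, the buffer is full, or EOF; and step 3 clamps to the bytes actually buffered, so the slice is also short at EOF, not only when `n` exceeds the capacity.

```rust
use std::io::{self, Read};

/// Hypothetical reader used only to illustrate the revised peek(n).
pub struct Reader<R> {
    inner: R,
    buf: Box<[u8]>,
    pos: usize,
    filled: usize,
}

impl<R: Read> Reader<R> {
    pub fn with_capacity(cap: usize, inner: R) -> Self {
        Reader { inner, buf: vec![0; cap].into_boxed_slice(), pos: 0, filled: 0 }
    }

    pub fn pos(&self) -> usize {
        self.pos
    }

    pub fn consume(&mut self, amt: usize) {
        self.pos = (self.pos + amt).min(self.filled);
    }

    pub fn peek(&mut self, n: usize) -> io::Result<&[u8]> {
        // Step 1: enough bytes are already buffered, so leave `pos` untouched;
        // bytes before `pos` stay available for a cheap seek_relative(-1).
        if self.filled - self.pos >= n {
            return Ok(&self.buf[self.pos..self.pos + n]);
        }
        // Step 2: backshift, then refill until n bytes are available,
        // the buffer is full, or the reader reports EOF.
        self.buf.copy_within(self.pos..self.filled, 0);
        self.filled -= self.pos;
        self.pos = 0;
        while self.filled < n.min(self.buf.len()) {
            match self.inner.read(&mut self.buf[self.filled..]) {
                Ok(0) => break,
                Ok(k) => self.filled += k,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
                Err(e) => return Err(e),
            }
        }
        // Step 3: at most n bytes; shorter at EOF or when n exceeds capacity.
        Ok(&self.buf[..self.filled.min(n)])
    }
}
```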

lolbinarycat · 2024-07-24T18:26:03Z

> None of the BufReader methods resizes the buffer. The buffer is actually now implemented as a `Box<[MaybeUninit<u8>]>` rather than a `Vec<u8>`, so clearly it is intended to be staying unchanged.

Yep, I looked at the implementation; that's why this is an alternative and not the main proposal.

> I think the peek(n) method should instead do the following:
>
> 1. if `self.buffer().len() >= n`, just return `buffer[..n]`.
> 2. otherwise, clear the consumed bytes (memmove the buffer to the beginning), and then refill the buffer
> 3. return `buffer[..min(n, capacity)]`

Seems like it handles that one edge case a lot better with no real downside. Nice!

the8472 · 2024-07-30T16:31:55Z

We discussed this during today's libs-API meeting. We're generally OK with adding this method, but the tracking issue should note the open questions of whether requesting an `n` larger than the remaining capacity should result in an error, a panic, or a short slice, and whether it should read the full spare capacity or only as much as is requested.

Speaking for myself, I think we should also have lower-level building blocks, `backshift` and `read_more`, and I will file a separate ACP for them.

blueglyph · 2024-08-04T08:15:26Z

I first posted this in the tracking issue, but I think this is a better place. What I'm quoting below is the question from that tracking issue.

> what should happen if n > capacity? options are: short slice, error, or panic.

Another possibility, more in line with other methods that read more than one item (`read`, `read_until`, `read_line`, etc.) and with `TcpStream::peek(&self, buf: &mut [u8]) -> Result<usize>`, is to take a mutable reference to a buffer and return a `Result`:

impl BufReader {
    fn peek(&mut self, n: usize, buf: &mut [u8]) -> io::Result<usize>
}

I'm suggesting a `Result<usize>` to indicate either that the peek went well (though the size could still differ from `n`) or that an error occurred while trying to perform the peek. There are other possibilities, like returning an error if the number of bytes isn't `n` (`Result<()>`), though that's less informative (unless the error type details what went wrong...).
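A read-style variant can be approximated today on top of the existing `fill_buf`, with one caveat: because it relies on a single `fill_buf` call, it may copy fewer than `n` bytes even when more data is coming, which is exactly what the proposed `peek` is meant to avoid. `peek_into` is a hypothetical free function, not part of std.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical read-style peek: copies up to `n` buffered bytes into `buf`
/// without consuming them and returns how many were copied.
pub fn peek_into<R: Read>(reader: &mut BufReader<R>, n: usize, buf: &mut [u8]) -> io::Result<usize> {
    let available = reader.fill_buf()?; // at most one read on the inner reader
    let len = n.min(buf.len()).min(available.len());
    buf[..len].copy_from_slice(&available[..len]);
    Ok(len) // nothing is consumed, so later reads still see these bytes
}
```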

It could also be convenient to have it as a trait rather than an implementation on `BufReader` only.

lolbinarycat · 2024-08-05T17:59:46Z

@blueglyph you seem to be suggesting using alternatives #1 and #2 together.

blueglyph · 2024-08-05T19:22:17Z

@lolbinarycat Heh, you're right. I first saw the tracking issue and the questions I quoted, without the alternatives. I replied there, then saw there was a proposal for it here, quickly moved the suggestion, and misread the alternatives in my hurry.

Sorry. Just ignore my earlier comment.

FWIW, I also think returning a shorter slice instead of an error is more useful and perhaps more conventional. Errors should be reserved for unexpected I/O issues, but when someone peeks further without knowing the total size, it's hardly an error to request `n` bytes when fewer are available. On the other hand, a programmer might forget to check the length (which is another reason why I prefer `read`'s signature).

PS: I had also thought of the [Peekable](https://doc.rust-lang.org/std/iter/struct.Peekable.html#method.peek) mentioned earlier for TcpStream and UdpSocket, but it's an iterator.

lolbinarycat · 2024-08-05T19:42:44Z

> On the other hand, a programmer might forget to check the length (which is another reason why I prefer read's signature).

With a function like `read`, forgetting to check the length means reusing the old contents of the buffer, a silent logic error (a very nasty class of error).

With a function like the proposed `peek`, forgetting to check the length will simply panic.
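The two failure modes can be shown side by side. Both helpers are hypothetical and deliberately make the same mistake (they skip the length check): the read-style one silently mixes in stale bytes from a previous call, while the slice-style one panics when the peeked slice is too short.

```rust
use std::io::Read;

/// Read-style: the returned byte count is ignored (on purpose, to show the bug),
/// so bytes left over in `buf` from an earlier call are treated as new data.
pub fn magic_via_read(mut src: &[u8], buf: &mut [u8; 4]) -> [u8; 4] {
    let _ = src.read(buf);
    *buf
}

/// Slice-style: slicing a short peek result with `[..4]` panics instead.
pub fn magic_via_slice(peeked: &[u8]) -> [u8; 4] {
    let mut magic = [0u8; 4];
    magic.copy_from_slice(&peeked[..4]); // panics if fewer than 4 bytes
    magic
}
```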

blueglyph · 2024-08-05T21:15:07Z

> with a function like read, forgetting to check the length means reusing the old contents of the buffer, a silent logic error (a very nasty class of error)
>
> with a function like the proposed peek, forgetting to check the length will simply panic.

That's a good point. The length is returned explicitly, but ignoring it could go undetected, and it may be easier to ignore when `n` is already given as an argument.

Note that you can use the #[must_use] attribute on a function or trait method declaration to produce a warning if the return value is ignored. I don't think you can trigger an error, though.
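A small illustration with a hypothetical read-style `peek_count` function: marking it `#[must_use]` (optionally with a message) makes the `unused_must_use` lint warn when a caller writes `peek_count(...);` and discards the count. The attribute on its own only produces a warning, but a crate can opt into treating it as an error with `#![deny(unused_must_use)]`.

```rust
/// Hypothetical read-style peek over an already-buffered slice.
/// Discarding the return value triggers an `unused_must_use` warning.
#[must_use = "the number of bytes peeked may be less than requested"]
pub fn peek_count(available: &[u8], n: usize, buf: &mut [u8]) -> usize {
    let len = n.min(buf.len()).min(available.len());
    buf[..len].copy_from_slice(&available[..len]);
    len
}
```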

lolbinarycat added the api-change-proposal (A proposal to add or alter unstable APIs in the standard libraries) and T-libs-api labels Jul 24, 2024

the8472 closed this as completed Jul 30, 2024

the8472 added the ACP-accepted label Jul 30, 2024 (API Change Proposal is accepted, seconded with no objections)

lolbinarycat mentioned this issue Jul 30, 2024

Tracking Issue for bufreader_peek rust-lang/rust#128405 (open)
