Audit use of unsafe in uri/mod.rs #417

Open. Wants to merge 7 commits into master.

sbosnick
Contributor

Added tests for attempts to parse various kinds of invalid Uris, including ones that contain invalid UTF-8 bytes. Added a test for parsing a &[u8] as a Uri where the fragment contains invalid UTF-8 bytes. This test accepts the Uri as valid because Uri (currently) does not expose the fragment, so those bytes are never interpreted as a &str.
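To illustrate why invalid bytes in the fragment can be harmless, here is a minimal std-only sketch. `without_fragment` is a hypothetical helper written for this example, not part of the `http` crate, and it does not reproduce the crate's parser. It only shows that bytes after `#` are never validated or viewed as a `&str`, while an invalid byte earlier in the input is rejected.

```rust
/// Hypothetical helper (not the `http` crate's API): returns the part of
/// `input` before the first `#` as a `&str`, or `None` if that part is not
/// valid UTF-8. Bytes in the fragment are never inspected.
fn without_fragment(input: &[u8]) -> Option<&str> {
    let end = input
        .iter()
        .position(|&b| b == b'#')
        .unwrap_or(input.len());
    std::str::from_utf8(&input[..end]).ok()
}

fn main() {
    // 0xFF never appears in valid UTF-8.
    let bad_fragment = b"http://example.com/path#\xFF";
    let bad_scheme = b"ht\xFFtp://example.com/";

    // The invalid byte sits in the fragment, which is discarded unread.
    assert_eq!(
        without_fragment(bad_fragment),
        Some("http://example.com/path")
    );
    // The invalid byte sits before the fragment, so validation fails.
    assert_eq!(without_fragment(bad_scheme), None);
}
```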

Refactored the parse_full() function to eliminate some code duplication and thereby simplify the function. Finally, added comments to parse_full() and the functions it calls to document the postconditions that parse_full() relies on to make its use of unsafe sound.
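The pattern of documenting a postcondition that a later unsafe block relies on might look roughly like the sketch below. Both `scan_scheme` and `scheme_str` are hypothetical names invented for this illustration and are not the functions in uri/mod.rs; the accepted character set is also an assumption made for the example.

```rust
/// Hypothetical validator, not the crate's parser.
///
/// Postcondition: on `Ok(n)`, every byte in `bytes[..n]` is ASCII,
/// so `bytes[..n]` is valid UTF-8.
fn scan_scheme(bytes: &[u8]) -> Result<usize, ()> {
    for (i, &b) in bytes.iter().enumerate() {
        match b {
            // Only ASCII bytes are accepted before the ':' terminator.
            b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'+' | b'-' | b'.' => {}
            b':' => return Ok(i),
            // Anything else, including bytes >= 0x80, is an error.
            _ => return Err(()),
        }
    }
    Err(())
}

/// Hypothetical caller that relies on the postcondition above.
fn scheme_str(bytes: &[u8]) -> Result<&str, ()> {
    let n = scan_scheme(bytes)?;
    // SAFETY: `scan_scheme` returned `Ok(n)`, so by its documented
    // postcondition `bytes[..n]` is ASCII and therefore valid UTF-8.
    Ok(unsafe { std::str::from_utf8_unchecked(&bytes[..n]) })
}

fn main() {
    assert_eq!(scheme_str(b"http://example.com"), Ok("http"));
    // An invalid UTF-8 byte after valid scheme bytes is rejected
    // before the unsafe conversion is ever reached.
    assert_eq!(scheme_str(b"ht\xFFtp://example.com"), Err(()));
}
```

The point of the comment pair is that the unsafe block stays sound only as long as the validator keeps its promise, so the promise is written down next to the code that must uphold it.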

This PR has a weak dependency on #414 and #416, in the sense that some of the comments added in this PR make more sense in light of the comments added in the earlier two PRs. This PR does not depend on the earlier PRs to build or to run tests.

This is a part of #412.

The tests include an example of invalid UTF-8, which is an error.
This test targets an invalid UTF-8 byte after valid bytes for a Uri
scheme.
The refactoring simplifies the handling of the authority component of a
Uri by eliminating code duplication.
The two uses of unsafe rely on postconditions in other functions, which
are also documented to show this reliance.
The nightly build was failing because of a known regression concerning
the use of the new "unused_braces" lint in doc tests.
src/lib.rs (outdated, resolved)
return Ok(Uri {
    scheme: scheme.into(),
    authority: authority,
    path_and_query: PathAndQuery::empty(),
Member

This used to just assume empty(), and now it's calling parse(s)?, right?

Contributor Author

Not exactly. It used to assume PathAndQuery::empty() and now it is calling PathAndQuery::from_shared(s) where s is an empty Bytes. We know that s is an empty Bytes because we test that authority_end equals the whole length of s at line 821 (of the changed file) and then call s.split_to(authority_end) at line 831. This latter call becomes (in this case) s.split_to(s.len()) which results in s being left empty.
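As a rough illustration of the splitting behaviour described above, here is a std-only sketch. The free function `split_to` below is a stand-in written for this example on top of `Vec<u8>`; it mimics the documented semantics of `Bytes::split_to` (return `[0, at)`, leave `[at, len)` behind) but is not the `bytes` crate's implementation, and the sample input is made up.

```rust
/// Stand-in for `Bytes::split_to`, written against `Vec<u8>` so the example
/// needs no external crates. Returns the first `at` bytes and leaves `buf`
/// holding the remainder. Panics if `at > buf.len()`.
fn split_to(buf: &mut Vec<u8>, at: usize) -> Vec<u8> {
    // `split_off` keeps [0, at) in `buf` and returns [at, len) ...
    let tail = buf.split_off(at);
    // ... so swap them: `buf` becomes the tail, the head is returned.
    std::mem::replace(buf, tail)
}

fn main() {
    // Assumed input: a Uri consisting only of an authority.
    let mut s = b"example.com".to_vec();
    let authority_end = s.len();

    // Equivalent to split_to(s.len()) in this case.
    let authority = split_to(&mut s, authority_end);

    assert_eq!(authority, b"example.com".to_vec());
    // Nothing is left over, so the path-and-query input is empty.
    assert!(s.is_empty());
}
```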

PathAndQuery::empty() and PathAndQuery::from_shared(s), where s is empty, both produce a PathAndQuery whose data is equivalent to ByteStr::new() and whose query is NONE (the sentinel value declared in uri::path).

Logically this change still produces the same result in the scheme.is_none() case.

Performance-wise, this particular case isn't covered by any of the existing benchmarks in benches/uri.rs. Although PathAndQuery::from_shared(s) is in general linear in the size of s, when s is empty it is a constant-time operation. Without a benchmark, though, that does not fully answer the performance question.

If there are any concerns about the performance implications of this change I would be happy to add a benchmark to benches/uri.rs to cover this case.
