Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit use of unsafe in uri/mod.rs #417

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 17 additions & 18 deletions src/uri/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -785,23 +785,27 @@ impl From<Uri> for Parts {
}
}

// parse_full() parses a Uri that includes more than just a path. It
// expects that at least one of the scheme or authority will be present
// as well.
fn parse_full(mut s: Bytes) -> Result<Uri, InvalidUri> {
// Parse the scheme
let scheme = match Scheme2::parse(&s[..])? {
Scheme2::None => Scheme2::None,
Scheme2::Standard(p) => {
// TODO: use truncate
let _ = s.split_to(p.len() + 3);
Scheme2::Standard(p)
}
Scheme2::Other(n) => {
// Grab the protocol
let mut scheme = s.split_to(n + 3);

// Strip ://, TODO: truncate
let _ = scheme.split_off(n);
// Strip ://
scheme.truncate(n);

// Allocate the ByteStr
// Safety: the postcondition on Scheme2::parse() means that
// s[0..n+3] is valid UTF-8. scheme is a subslice of s[0..n+3].
let val = unsafe { ByteStr::from_utf8_unchecked(scheme) };

Scheme2::Other(Box::new(val))
Expand All @@ -813,28 +817,23 @@ fn parse_full(mut s: Bytes) -> Result<Uri, InvalidUri> {
let authority_end = Authority::parse(&s[..])?;

if scheme.is_none() {
// Path is not allowed if there is no scheme.
if authority_end != s.len() {
return Err(ErrorKind::InvalidFormat.into());
}

let authority = Authority {
data: unsafe { ByteStr::from_utf8_unchecked(s) },
};

return Ok(Uri {
scheme: scheme.into(),
authority: authority,
path_and_query: PathAndQuery::empty(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to just assume empty(), and now its calling parse(s)?, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. It used to assume PathAndQuery::empty() and now it is calling PathAndQuery::from_shared(s) where s is an empty Bytes. We know that s is an empty Bytes because we test that authority_end equals the whole length of s at line 821 (of the changed file) and then call s.split_to(authority_end) at line 831. This latter call becomes (in this case) s.split_to(s.len()) which results in s being left empty.

PathAndQuery::empty() and PathAndQuery::from_shared(s) where s is empty both produce a PathAndQuery with data the equivalent of ByteStr::new() and query as NONE (the sentinel value declared in uri::path).

Logically this change still produces the same result in the scheme.is_none() case.

Performance wise, this particular case isn't covered by any of the existing benchmarks in benches/uri.rs. Although PathAndQuery::from_shared(s) in general is linear in the size of s, when s is empty this a constant operation, though this doesn't answer the performance question without having benchmarks.

If there are any concerns about the performance implications of this change I would be happy to add a benchmark to benches/uri.rs to cover this case.

});
}

// Authority is required when absolute
if authority_end == 0 {
return Err(ErrorKind::InvalidFormat.into());
} else {
// Authority is required when absolute
if authority_end == 0 {
return Err(ErrorKind::InvalidFormat.into());
}
}

let authority = s.split_to(authority_end);
let authority = Authority {
// Safety: The postcondition on Authority::parse() means that
// s[0..authority_end] is valid UTF-8 after that call. The call
// to s.split_to() means that authority here is what s[0..authority_end]
// was after the call to Authority::parse().
data: unsafe { ByteStr::from_utf8_unchecked(authority) },
};

Expand Down
9 changes: 9 additions & 0 deletions src/uri/scheme.rs
Original file line number Diff line number Diff line change
Expand Up @@ -253,6 +253,7 @@ impl Scheme2<usize> {
}
}

// Postcondition: On Ok(Scheme2::Other(n)) return, s[0..n+3] is valid UTF-8
pub(super) fn parse(s: &[u8]) -> Result<Scheme2<usize>, InvalidUri> {
if s.len() >= 7 {
// Check for HTTP
Expand All @@ -270,6 +271,9 @@ impl Scheme2<usize> {
}

if s.len() > 3 {
// The only Ok(Scheme2::Option(n)) return from this function is an
// early exit from this loop. This loop checks each byte in s against
// SCHEME_CHARS until until one of the early exit conditions.
for i in 0..s.len() {
let b = s[i];

Expand All @@ -290,10 +294,15 @@ impl Scheme2<usize> {
}

// Return scheme
// Postcondition: Every byte in s[0..i] has matched the
// _ arm so is valid UTF-8. s[i..i+3] matches "://" which
// is also valid UTF-8. Thus s[0..i+3] is valid UTF-8.
return Ok(Scheme2::Other(i));
}
// Invald scheme character, abort
0 => break,
// Valid scheme character: imples that b is a valid, single
// byte UTF-8 codepoint.
_ => {}
}
}
Expand Down
49 changes: 49 additions & 0 deletions src/uri/tests.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
use std::str::FromStr;
use std::convert::TryFrom;

use super::{ErrorKind, InvalidUri, Port, Uri, URI_CHARS};

Expand Down Expand Up @@ -517,3 +518,51 @@ fn test_partial_eq_path_with_terminating_questionmark() {

assert_eq!(uri, a);
}

#[test]
fn test_uri_from_u8_slice() {
let uri = Uri::try_from(b"http://example.com".as_ref()).expect("conversion");

assert_eq!(uri, "http://example.com");
}

#[test]
fn test_uri_from_u8_slice_error() {
fn err(s: &[u8]) {
Uri::try_from(s).unwrap_err();
}

err(b"http://");
err(b"htt:p//host");
err(b"hyper.rs/");
err(b"hyper.rs?key=val");
err(b"?key=val");
err(b"localhost/");
err(b"localhost?key=val");
err(b"\0");
err(b"http://[::1");
err(b"http://::1]");
err(b"localhost:8080:3030");
err(b"@");
err(b"http://username:password@/wut");

// illegal queries
err(b"/?foo\rbar");
err(b"/?foo\nbar");
err(b"/?<");
err(b"/?>");

// invalid UTF-8
err(&[0xc0]);
err(&[b'h', b't', b't', b'p', b':', b'/', b'/', 0xc0]);
err(b"http://example.com/\0xc0");
err(b"http://example.com/path?\0xc0");
}

#[test]
fn test_uri_with_invalid_fragment_is_valid() {
let uri_bytes = b"http://example.com/path?query#\0xc0";
let uri = Uri::try_from(uri_bytes.as_ref()).expect("conversion error");

assert_eq!(uri, "http://example.com/path?query");
}