Fix handling of urlencoded '+' #168

sprohaska · 2015-06-17T15:02:38Z

Previously, '%2B' was incorrectly converted to space. The reason was
that page.js's Context() calls decodeURLEncodedURIComponent() when
initializing pathname and Route.match() calls
decodeURLEncodedURIComponent() again on the matched params after
splitting pathname. Since decodeURLEncodedURIComponent() replaces
plus with space and the path is already decoded when calling it on the
params, pluses were incorrectly converted to spaces.
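The double decoding described above can be reproduced in isolation. The following is a minimal sketch: `pageStyleDecode` is a hypothetical stand-in that mimics what `decodeURLEncodedURIComponent()` is described as doing (replace plus with space, then percent-decode). It is not page.js's actual code.

```javascript
// Hypothetical stand-in for page.js's decodeURLEncodedURIComponent(), based
// only on the description above: replace '+' with a space, then percent-decode.
function pageStyleDecode(value) {
  return decodeURIComponent(value.replace(/\+/g, ' '));
}

const raw = 'abc%2Bdef';                 // urlencoded form of "abc+def"
const once = pageStyleDecode(raw);       // first pass, as in Context(): "abc+def"
const twice = pageStyleDecode(once);     // second pass, as in Route.match(): "abc def"

console.log(JSON.stringify(once));
console.log(JSON.stringify(twice));
```

After the first pass the plus is a literal character again, so the second pass treats it as an encoded space.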

page.js's Route.match() calls decodeURIComponent() to decode pathname
before applying the path regex, so there is no need to decode the entire
path when initializing the Context().

page.js's behavior seems odd. But I haven't fully understood whether it
is a bug, so this fixes it by disabling page.js's
decodeURLEncodedURIComponent() via the config object.

The tests are updated to exercise the param matching with a few
urlencoded special characters.

arunoda · 2015-06-17T22:13:34Z

test/client/router.core.spec.js

      rendered++;
    }
  });

-  FlowRouter.go(pathDef, {key: "abc"});
+  FlowRouter.go(pathDef, {key: "abc%20%2B%40%25"});


Why do you need to put an encoded string here? I think that should be done by the router.

sprohaska · 2015-06-18

I think you are right that the router should encode it. See the longer comment below.

arunoda · 2015-06-17T22:14:31Z

I'm not quite sure about this. I'd like to hear from others as well. @elidoran @delgermurun

delgermurun · 2015-06-18T01:36:54Z

What is the problem? What is the exact behavior?

elidoran · 2015-06-18T04:07:35Z

pagejs appears to decode twice as @sprohaska describes. See pagejs line 491 and line 497.

I don't know why they would decode it twice. The first time always happens and the second happens unless you disable it. The second one also explicitly changes pluses to spaces before decoding (here).

I'm curious to know why they do that. Perhaps raise this issue with pagejs and see what they have to say?

It seems like outright disabling it would alter behavior and possibly break backwards compatibility. How about providing a way for a user to specify that FlowRouter should disable the extra decoding, while leaving it enabled by default?

Also, when I do the calls listed below I get an error: URIError: URI malformed. Should it encode it for us?

FlowRouter.go('/app/abc +@%')
FlowRouter.go('/app/abc%20%2B%40%25')
FlowRouter.go('/app/:param', {param:'abc%20%2B%40%25'})
FlowRouter.go('/app/home?test=abc +@%')

Without the % and %25 they work fine and all return abc @ (plus sign is replaced with a space).
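A likely explanation for the `URIError`, assuming the value really is decoded twice as described earlier in the thread: the first pass turns `%25` into a bare `%`, and a bare `%` is not valid input for `decodeURIComponent()`. The helper `tryDecode` below is hypothetical and exists only to make the failure visible.

```javascript
// Hypothetical helper: returns the decoded string, or the error name if
// decodeURIComponent() rejects the input.
function tryDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    return err.name;
  }
}

const encoded = 'abc%20%2B%40%25';
const decodedOnce = tryDecode(encoded);       // 'abc +@%'
const decodedTwice = tryDecode(decodedOnce);  // 'URIError': the trailing '%' is not a valid escape

console.log(decodedOnce);
console.log(decodedTwice);
```

The same applies to the unencoded call: `'abc +@%'` already contains a bare `%`, so even a single decoding pass fails.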

When I do FlowRouter.go('/app/home?test=abc%20%2B%40%25') it works and the browser shows the encoded version. FlowRouter.getQueryParam('test') returns the decoded version "abc +@%".

So, this seems to require some review to determine what the behavior should actually be.

sprohaska · 2015-06-18T11:03:13Z

I agree that a good solution may require clarification of the behavior in pagejs.

I'm a bit unsure what I'd exactly expect from flow router. It seems reasonable that whatever I type in the browser's address field should be put into the params as urldecoded strings and I can just use them without any further decoding.

I'm less sure what I'd expect from FlowRouter.go(). One option would be that FlowRouter.go(path) behaves as if I had typed path into the browser's address field, which means that it would not urlencode path. But for symmetry with route matching, FlowRouter.go('/app/:param', {param: String}) should probably urlencode the String: flow router places decoded strings into params when matching a path, so it should encode them when constructing a path. In combination, flow router would not encode a direct path argument (assuming that it is already encoded) but would encode strings that are passed as params. I'm unsure whether this is a sensible choice.
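The proposed semantics can be sketched roughly as follows. `buildPath` is a hypothetical helper written for illustration, not FlowRouter's implementation. It assumes simple `:name` placeholders and leaves everything else in the path definition untouched.

```javascript
// Hypothetical helper, not FlowRouter's actual code: substitutes ':name'
// placeholders in pathDef with urlencoded values from params. The rest of
// pathDef is assumed to be encoded already and is passed through unchanged.
function buildPath(pathDef, params) {
  return pathDef.replace(/:(\w+)/g, function (match, name) {
    const value = params[name];
    return value === undefined ? '' : encodeURIComponent(String(value));
  });
}

const built = buildPath('/app/:param', {param: 'abc +@%'});
const direct = buildPath('/app/abc%20%2B%40%25', {});

console.log(built);
console.log(direct);
```

Both calls produce the same path, and decoding the param segment once yields the original string again, which is the symmetry with route matching mentioned above.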

Any thoughts?

Previously, '%2B' (aka '+') was incorrectly converted to space. Other urlencoded characters were affected, too. The reason was that pagejs's `Context()` calls `decodeURLEncodedURIComponent()` when initializing `pathname` and `Route.match()` calls `decodeURLEncodedURIComponent()` again on the matched `params` after splitting `pathname`. Since `decodeURLEncodedURIComponent()` replaces plus with space and the path is already decoded when calling it on the `params`, pluses were incorrectly converted to spaces; other urlencoded values were affected if the second urldecoding was not idempotent.

pagejs's `Route.match()` calls `decodeURIComponent()` to decode `pathname` before applying the path regex, so there is no need to decode the entire path when initializing the `Context()`.

pagejs's behavior seems odd at least; maybe it is a bug. This commit fixes the problem by disabling pagejs's `decodeURLEncodedURIComponent()` via the config object. There is no reasonable scenario in which the old behavior was correct and would be changed by fixing the duplicate decoding. `params` are now correctly handled, and `FlowRouter` uses `qs.parse()` to handle query strings, which handles urldecoding, so pagejs should leave the query string alone.

The semantics of `FlowRouter.path(pathDef, params, queryParams)` are clarified to urlencode `params` and `queryParams`, but not the `pathDef`.

The tests are updated to exercise the encoding by using a few special characters.

sprohaska · 2015-06-18T18:41:27Z

I've updated the commit to implement the urlencoding in FlowRouter.path() as described in my previous comment.

sprohaska · 2015-06-18T19:04:34Z

Note that encoding of params may be unintuitive for tail params as in /:foo/:bar+. :bar captures all tail directory levels, so /a/b/c/d is matched with :bar = b/c/d. FlowRouter.path('/:foo/:bar+', {foo: 'x', bar: 'b/c/d'}) will not return the original path but /x/b%2Fc%2Fd, which is correct but looks a bit odd.
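The effect on tail params can be seen with plain `encodeURIComponent()`, which also encodes the slash. This sketch builds the path by hand instead of calling `FlowRouter.path()`, so it only illustrates the encoding step.

```javascript
// encodeURIComponent() treats '/' as a character to escape, so a tail param
// that spans several directory levels comes back with '%2F' separators.
const foo = 'x';
const bar = 'b/c/d';

const encodedBar = encodeURIComponent(bar);
const tailPath = '/' + encodeURIComponent(foo) + '/' + encodedBar;

console.log(tailPath);
console.log(decodeURIComponent(encodedBar));
```

Decoding the encoded tail gives back `b/c/d`, so the value survives the round trip even though the path no longer looks like the original.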

I'm unsure whether this is acceptable. I'm not fully convinced, but also do not yet have a better proposal.

arunoda · 2015-06-30T09:39:57Z

I had to add a few modifications to this.
See this: 750321c
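The referenced commit is titled "Encoding params 2times". Assuming that means each param is urlencoded twice so that a double decoding restores the original value, the idea can be sketched like this. `decodeLikePage` is a hypothetical stand-in for the decoding step described earlier in the thread, not code taken from the commit or from page.js.

```javascript
// Hypothetical stand-in for one page.js decoding pass as described in this
// thread: replace '+' with a space, then percent-decode.
function decodeLikePage(value) {
  return decodeURIComponent(value.replace(/\+/g, ' '));
}

const param = 'abc +@%';

// Encode twice so that two decoding passes cancel out exactly.
const encodedTwice = encodeURIComponent(encodeURIComponent(param));
const roundTrip = decodeLikePage(decodeLikePage(encodedTwice));

console.log(encodedTwice);
console.log(roundTrip);
```

The doubly encoded form contains sequences such as `%2520` in place of `%20`, so the URL shown in the address bar is longer and harder to read than a singly encoded one.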

sprohaska · 2015-07-01T14:46:15Z

It looks correct and works, but it creates ugly URLs. I'm wondering whether flow router could provide an opt-in option to disable page.js URL encoding. It would preserve backward compatibility but also provide nicer URLs by avoiding the page.js weirdness.

I haven't thought about it in more detail. Maybe I'll propose something when I find more time.

arunoda reviewed Jun 17, 2015
sprohaska force-pushed the fix-urlencoded-plus branch from 2d5266c to 4a792fd Compare June 18, 2015 18:38

arunoda merged commit 4a792fd into kadirahq:master Jun 30, 2015

arunoda added a commit that referenced this pull request Jun 30, 2015

Encoding params 2times to fix #168

750321c

sprohaska deleted the fix-urlencoded-plus branch July 1, 2015 14:46

Sing-Li mentioned this pull request Sep 25, 2015

Can't create cyrillic room name RocketChat/Rocket.Chat#892

Closed
