Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No longer able to monkey-patch HTTP parser #23716

Closed
Jimbly opened this issue Oct 17, 2018 · 5 comments
Closed

No longer able to monkey-patch HTTP parser #23716

Jimbly opened this issue Oct 17, 2018 · 5 comments
Labels
http_parser Issues and PRs related to the HTTP Parser dependency or the http_parser binding.

Comments

@Jimbly
Copy link
Contributor

Jimbly commented Oct 17, 2018

Hi, I'm the maintainer of http-parser-js, a pure Javascript implementation of the HTTP parser.

Starting with Node v10.1.0, specifically this commit, users can no longer reliably "monkey-patch" this module in to replace the internal HTTP parser (usually done with process.binding('http_parser').HTTPParser = require('http-parser-js').HTTPParser), because some boot-strapping code is requiring _http_common before any user-level code is possibly ran (note: only goes wrong in a forked/child/cluster process, but those are pretty common). For reference our discussion has been over here.

I realize support for replacing internal modules is pretty low priority, and when maintaining the Node.js internal modules, maintainers should never have to worry about what 3rd party modules might be doing with their API, but it would be great if we could find some way to keep this module working with future versions of Node, but I've been unable to figure out any workaround as things stand now.

First, a quick note on why this module is important to our users:

  • The Node internal http parser is very strict, and just aborts when it gets things that don't conform to the HTTP spec, and http-parser-js is more forgiving
    • For client uses (the most common among our users), this is important for web-crawling tasks that come across old or buggy pages serving slightly incorrect HTTP (often garbled headers or incorrect content-length values, etc, due to caching/proxying), which browsers will usually still display fine, yet Node cannot (normally) parse
    • For server uses, this is important to allow legacy clients (usually non-browsers) that send slightly invalid HTTP requests to still get a response
  • It's important to be able to monkey-patch this in at a low level so users can still take advantage of the rich stack and ecosystem (e.g. the NPM modules request and express)
  • The Node internal http parser has, historically, been rather crashy on some unparseable data (would crash the process), whereas http-parser-js just throws a JS-level error. It looks like the native parser has perhaps been rewritten since I last saw it crashing (v6.x or so, I think), so maybe this is no longer an issue.

Anyway, I'd love some help getting this module working reliably again, on Node v10.x+.

Without making changers to Node, is there some other way to monkey-patch our module in? If we accept that, now, some Node modules will have cached the HTTPParser constructor (but never called it) before our user code is ran, maybe there's some other way we can monkey-patch our code in. We should be able to modify HTTPParser.prototype to ours, but the actual constructor call seems like it will always call the native constructor (and we would never have a chance to catch the arguments to said constructor). My research says "no", but is there any trick we can do to get the HTTPParser constructor to call our replacement (even if it calls both the native one and then ours)?

If we were to make a small change to Node, what would be best?

  1. Wrapping HTTPParser in a Javascript object, so even if its cached, the internals could be swapped out. I'm guessing "no" to this one, since this would have a performance impact, but it would be the most transparent to the users of it
  2. Changing module-level caching to cache just internalBinding('http_parser') instead of internalBinding('http_parser').HTTPParser as it's doing now. This might have a (more minor) performance impact, as code would need to reference http_parser.HTTPParser instead of just HTTPParser everywhere. This might also require code that conflicts with the style that is normally used in internal Node modules.
  3. Change bootstrapping to not cache HTTPParser until after user-level code has a chance to run. This is less likely to stay "fixed", but would likely be a small change with no performance impact:
    • Could just change child_process.js's handleConversion[net.Socket] code to lazily require _http_common and HTTPParser the first time it sends a socket over a pipe. This is already a relatively high-overhead operation, so the performance impact of this would be negligible.
    • Could add a .freeParser() method to net.Sockets, and child_process.js simply calls this if it exists, instead of dealing with instanceof checks and parsers directly. This probably makes for cleaner code in child_process.js anyway and is just and extra method on the Socket prototype?

Any advice would help, would any PRs doing any of the above be acceptable?

@mmarchini mmarchini added http_parser Issues and PRs related to the HTTP Parser dependency or the http_parser binding. v10.x labels Oct 17, 2018
@mmarchini
Copy link
Contributor

cc @nodejs/http-parser

@devsnek
Copy link
Member

devsnek commented Oct 17, 2018

i'd be interested in seeing a parser option added to the public api.

@Jimbly
Copy link
Contributor Author

Jimbly commented Oct 17, 2018

@devsnek Right now the http parser seems often pretty coupled with the internal http modules (there's some two-way communication about things like upgrade requests) in ways that make it hard to have a good clean API (and require a bunch of Node-version-specific behaviors in http-parser-js), but obviously that could be improved, but I think that's a much larger undertaking than anything I'm inquiring about =).

Jimbly added a commit to Jimbly/node that referenced this issue Oct 31, 2018
Lazy load _http_common and HTTPParser so that the 'http_parser' binding
can be monkey patched before any internal modules require it. This also
probably improves startup performance minimally for programs that never
require the HTTP stack.

Fixes: nodejs#23716
Fixes: creationix/http-parser-js#57
@Jimbly
Copy link
Contributor Author

Jimbly commented Oct 31, 2018

I didn't hear anything back, so investigated Node-level fixes more. Adding a freeParser callback to sockets proved problematic, because it can't just be added to the prototype lest we just move the require() of _http_common into net.js instead (even worse). Simplest approach seemed to be to change internal/child_process to lazy-require the http stack when needed, and there was some precedence for lazy-requiring in that module already. PR #24006 created for this.

Trott pushed a commit to Trott/io.js that referenced this issue Nov 8, 2018
Lazy load _http_common and HTTPParser so that the 'http_parser' binding
can be monkey patched before any internal modules require it. This also
probably improves startup performance minimally for programs that never
require the HTTP stack.

Fixes: nodejs#23716
Fixes: creationix/http-parser-js#57

PR-URL: nodejs#24006
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@Trott
Copy link
Member

Trott commented Nov 8, 2018

Fixed in 7d86a53

@Trott Trott closed this as completed Nov 8, 2018
BridgeAR pushed a commit that referenced this issue Nov 14, 2018
Lazy load _http_common and HTTPParser so that the 'http_parser' binding
can be monkey patched before any internal modules require it. This also
probably improves startup performance minimally for programs that never
require the HTTP stack.

Fixes: #23716
Fixes: creationix/http-parser-js#57

PR-URL: #24006
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
kiyomizumia pushed a commit to kiyomizumia/node that referenced this issue Nov 15, 2018
Lazy load _http_common and HTTPParser so that the 'http_parser' binding
can be monkey patched before any internal modules require it. This also
probably improves startup performance minimally for programs that never
require the HTTP stack.

Fixes: nodejs#23716
Fixes: creationix/http-parser-js#57

PR-URL: nodejs#24006
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
codebytere pushed a commit that referenced this issue Dec 13, 2018
Lazy load _http_common and HTTPParser so that the 'http_parser' binding
can be monkey patched before any internal modules require it. This also
probably improves startup performance minimally for programs that never
require the HTTP stack.

Fixes: #23716
Fixes: creationix/http-parser-js#57

PR-URL: #24006
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
MylesBorins pushed a commit that referenced this issue Dec 26, 2018
Lazy load _http_common and HTTPParser so that the 'http_parser' binding
can be monkey patched before any internal modules require it. This also
probably improves startup performance minimally for programs that never
require the HTTP stack.

Fixes: #23716
Fixes: creationix/http-parser-js#57

PR-URL: #24006
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http_parser Issues and PRs related to the HTTP Parser dependency or the http_parser binding.
Projects
None yet
Development

No branches or pull requests

4 participants