
TLSWrap leaking huge amounts of memory #3047

Closed
arthurschreiber opened this issue Sep 24, 2015 · 13 comments
Labels
memory Issues and PRs related to the memory management or memory footprint. tls Issues and PRs related to the tls subsystem.

Comments

@arthurschreiber

We recently upgraded our application to node 4.0 (and 4.1). During load and performance testing we recognized that memory usage is going through the roof.

Here's a graph that shows this:

[graph: memory usage during the load test]

Basically, with the start of the load test, memory usage quickly climbed from ~150MB up to ~1GB and then stabilized at around 800MB.

I went ahead and took a heapdump and opened it in the Chrome Developer Tools.

[screenshot: heap snapshot in Chrome Developer Tools]

Here, you can see that ~400MB are used by 2 TLSWrap instances.

(EDIT: This is on 4.1, but we've seen this also on 4.0)

//cc @indutny @ChALkeR

@ChALkeR ChALkeR added the memory Issues and PRs related to the memory management or memory footprint. label Sep 24, 2015
@mscdex mscdex added the tls Issues and PRs related to the tls subsystem. label Sep 24, 2015
@PSanni

PSanni commented Sep 24, 2015

Same issue here.
It's consuming a massive amount of memory.

@indutny
Member

indutny commented Sep 24, 2015

Hello! Thanks for submitting this. Is there any way to reproduce it locally?

@arthurschreiber
Author

Nope, no idea. So far I've made several attempts at reproducing this locally, and none of them worked.

I pulled heapdumps from many other workers we have running (we're using the node.js cluster module), and I can see similar behaviour on other nodes as well. It looks like some of these TLSWrap instances are simply left behind and never freed.

I know that a locally reproducible test case would be awesome, but I don't have any. 😞 Any ideas on how I could provide more info on this issue?

@yjhjstz

yjhjstz commented Sep 25, 2015

If you don't mind, could you upload the heap dump? I can analyze it further. @arthurschreiber

@arthurschreiber
Author

@yjhjstz Can I send you the dump via email? I don't want to upload it to a public place, as I'm not sure how much sensitive data is in there. I can compress the dumps down to ~30MB. Is that ok?

@arthurschreiber
Author

Ok, this is fixed by #3059.

Our application has an uncaughtException handler which does catch exceptions and forces the process to continue. (We know this is bad, and are actively working on changing it to gracefully restart the process instead).

The issue fixed by #3059, combined with this uncaughtException handler, caused requests and their associated TLSWrap objects to leak and never be collected.
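
For illustration, a minimal sketch of the kind of handler described above (the logging is hypothetical, not our actual code): swallowing the exception keeps the process alive, so any request that was in flight when it fired never completes and its TLSWrap stays reachable.

```js
// Hypothetical sketch of the anti-pattern described above (not the actual code).
// Swallowing the exception keeps the process alive, but any request that was in
// flight when the exception fired never completes, so its socket and TLSWrap
// stay reachable and are never garbage collected.
process.on('uncaughtException', (err) => {
  console.error('uncaught exception, continuing anyway:', err.stack);
  // no process.exit() here: the event loop just keeps running
});
```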

Thanks everyone for the help!

@ChALkeR
Member

ChALkeR commented Sep 26, 2015

Everything below is off-topic and represents my personal opinion (well, except the «don't continue after an uncaughtException» part).

Our application has an uncaughtException handler which does catch exceptions and forces the process to continue. (We know this is bad, and are actively working on changing it to gracefully restart the process instead).

That is very bad and leaves the process in a corrupted state.

Also, restarting your process from within the uncaughtException handler is not an ideal approach either: your process could die for other reasons, for example an assertion on the C++ side (yes, that would be a bug) or an OOM. Or your process could hang if the logic is broken somewhere in your code. None of that would be covered by restarting the process from within the uncaughtException handler.

IMO, the ideal approach in production would be a system-level supervisor that launches your process and automatically restarts it when it dies for any reason. Also, by sending periodic «I'm alive» pings from your process to the supervisor, you can protect your process from hangs: if the process doesn't send a ping within the given time, the supervisor will restart it.

See http://0pointer.de/blog/projects/watchdog.html for an example.
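
As a rough illustration of the supervisor-plus-ping idea (the file names, the 'alive' message, and the timings below are made-up assumptions, not any particular library's API):

```js
// supervisor.js: a minimal sketch of the idea above. Restart the worker when
// it exits, and also kill and restart it when it stops sending periodic pings
// over the IPC channel.
const fork = require('child_process').fork;

const PING_TIMEOUT_MS = 10000;

function start() {
  const worker = fork('./worker.js'); // hypothetical worker entry point
  let watchdog = armWatchdog(worker);

  worker.on('message', (msg) => {
    if (msg === 'alive') {            // worker pings us periodically
      clearTimeout(watchdog);
      watchdog = armWatchdog(worker);
    }
  });

  worker.on('exit', (code, signal) => {
    clearTimeout(watchdog);
    console.error('worker died (code %s, signal %s), restarting', code, signal);
    start();
  });
}

function armWatchdog(worker) {
  return setTimeout(() => {
    console.error('no ping received in time, killing hung worker');
    worker.kill('SIGKILL');           // the exit handler above restarts it
  }, PING_TIMEOUT_MS);
}

start();

// In worker.js, the ping would be something like:
//   setInterval(() => process.send('alive'), 5000);
```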

@arthurschreiber
Author

@ChALkeR Thanks for your insight.

The problem is that just restarting the application while a lot of requests are in flight is really uncool.

We already have a process supervisor and a watchdog + ping in place, so that's already covered.

The approach we want to take is to use the uncaught exception handler to try to shut everything down gracefully, but to force a shutdown after some time in case the exception has left the process in a state where a graceful shutdown is not possible.
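
Roughly, a minimal sketch of that approach (the `server` variable, the 30-second timeout, and the exit code are assumptions for illustration):

```js
// Sketch only: `server` is assumed to be the HTTPS/TLS server handling requests.
process.on('uncaughtException', (err) => {
  console.error('uncaught exception, shutting down:', err.stack);

  // Stop accepting new connections and exit once in-flight requests drain.
  server.close(() => process.exit(1));

  // If draining takes too long (or the process is too broken to shut down
  // cleanly), force an exit after 30 seconds.
  const killTimer = setTimeout(() => process.exit(1), 30000);
  killTimer.unref(); // don't let this timer itself keep the process alive
});
```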

@indutny
Member

indutny commented Sep 26, 2015

LET THE UNCAUGHT EXCEPTION CONTROVERSY BEGIN

@arthurschreiber
Author

LET THE UNCAUGHT EXCEPTION CONTROVERSY BEGIN

@defunctzombie
Contributor

@ChALkeR The issue with uncaughtException is that it can range from a typo in JavaScript code to something fundamentally wrong with sockets (as in this case). Just dying and restarting is indeed destructive for in-flight requests when the cause is a typo that realistically might affect only a single user or call, versus taking the whole system down, and there may be a non-trivial cost to booting back up. I'm not arguing one way or the other, but it is important to understand why uncaughtException leads folks to try to continue: sometimes you just miss a silly thing and there are no consequences to continuing.

@Fishrock123
Contributor

Refs: #3068

@its-eli-z

Hey guys,

[screenshot: heap snapshot showing multiple large TLSWrap objects]

I have the same issue with these TLSWrap objects. After a few hours of running, there are more than 10 huge objects per worker.

  • Happens with node v0.12.7, v4.1.1, and v4.1.2.
  • I'm running forever with the cluster module (2 workers).
  • Normal memory usage is ~450MB; after a few hours it suddenly jumps to 3.5GB (at which point latency gets too high and my load balancer takes the machine out of rotation). My heapdump looks almost the same as the one posted earlier.
  • I have an uncaughtException handler that just logs the error and then immediately calls process.exit(). Isn't that the same as when node crashes and forever automatically restarts it?

Any ideas?
Thanks,
